pfam21|Q9ULD7|Q9ULD7_HUMAN | up|P14046|A1I3_RAT |
---|---|

Description | Min | Average | Max |
---|---|---|---|
Number of sequences per DA | 2 | 111.0 | 1315 |
Lengths of each of the query sequences | 170 | 759.7 | 1800 |
| | 170 | 750.9 | 1800 |
| | 198 | 768.5 | 1749 |
Number of domains per query | 2 | 2.9 | 16 |

    #Taxon DA Relevant_DAs
    pfam21|O00197|O00197|HUMAN da00353 da00352,da00698,da00923,da00972,da01542,da01962,da02225,da02278,da02288,da02397,da02425
    pfam21|O00222|MGR8|HUMAN da00069 da00192,da00679,da02044
    pfam21|O00329|PK3CD|HUMAN da00020 da00179,da00324,da00325,da00336,da00615,da00714,da00747,da01306,da02021,da02047,da02059
    ...

The first column is the label of the multi-domain sequence. The second column is a DA identifier. The third (last) column is a comma-separated list of other DAs that would result in a relevant ("True Positive") match.

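
As an illustration of how the three columns can be used, the following minimal sketch reads the relevance file into a dictionary and checks whether a hit counts as relevant for a query. It assumes the columns are whitespace-separated and that a hit sharing the query's own DA is also relevant (implied by the word "other" above); the function names `parse_relevance` and `is_relevant` are hypothetical and are not part of the benchmark scripts.

```python
def parse_relevance(lines):
    """Map each sequence label to (its DA, set of other relevant DAs).

    Assumes whitespace-separated columns: label, DA, comma-separated DA list.
    """
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "#Taxon DA Relevant_DAs" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # e.g. an elision line such as "..."
        label, da = fields[0], fields[1]
        # The third column may be absent if no other DA is relevant.
        relevant = set(fields[2].split(",")) if len(fields) > 2 else set()
        table[label] = (da, relevant)
    return table


def is_relevant(table, query, hit):
    """Return True if the hit's DA equals the query's DA or is listed as relevant.

    Treating an identical DA as relevant is an assumption of this sketch.
    """
    query_da, query_relevant = table[query]
    hit_da, _ = table[hit]
    return hit_da == query_da or hit_da in query_relevant
```

A file would typically be passed as `parse_relevance(open("relevanceInfo.tab"))`.
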

    >pfam21|Q40577|5EAS_TOBAC
    PF01397 25 219
    PF03936 224 492
    >up|O70212|5HT3A_CAVPO
    PF02931 39 248
    PF02932 255 481
    >up|P46098|5HT3A_HUMAN
    ...

In the domain locations file, each of the 227,512 sequences is introduced by a line consisting of a greater-than sign, ">", followed by the sequence label. Each of the following lines has three columns: the domain name and its starting and ending positions. This file is a subset of family_members.annot from RefProtDom v1.2 (Gonzalez and Pearson, 2010).

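
The record layout described above can be read with a few lines of code. This is a minimal sketch, assuming whitespace-separated columns and integer coordinates; the function name `parse_domain_locs` is hypothetical and does not correspond to any script in the benchmark.

```python
def parse_domain_locs(lines):
    """Map each sequence label to a list of (domain, start, end) tuples."""
    domains = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; drop the leading ">".
            current = line[1:]
            domains[current] = []
            continue
        fields = line.split()
        if current is None or len(fields) < 3:
            continue  # ignore anything that is not a domain line
        name, start, end = fields[:3]
        domains[current].append((name, int(start), int(end)))
    return domains
```

The number of domains per sequence is then `len(domains[label])`, which is the kind of per-sequence statistic the benchmark reports.
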
Script / File / Directory | Description |
---|---|
makeBenchmark.sh | Makes the entire benchmark |
perl/ | Directory of supporting Perl modules |
scripts/domainLocsParser.pl | Determines the number of domains, presence of a repeated domain and/or an embedded domain for each sequence |
scripts/getDomainSizes.pl | Retrieves a subset of the statistics about each sequence based on a file listing the random queries |
scripts/getDASizes.pl | Determines the DA associated with each specified query and calculates the number of sequences with the same DAs |
scripts/getDAs.pl | Filters DAs and randomly chooses one query sequence per DA |
scripts/lengths.sh | Calculates the number of residues in each sequence |
scripts/fileSplicer.pl | Breaks up a file into sets |
scripts/removeTaxa.pl | Removes taxa from a specified file that match labels from a specified list |
taxonomy/diverseDAs.py | Determines if a domain architecture (DA) has members (sequences) in multiple taxonomical domains |
taxonomy/speclist-20060516.txt | Older controlled vocabulary of species (for deprecated species) |
taxonomy/uniprot2taxonomy.pl | Determines the taxonomy for a Uniprot accession |
taxonomy/idmapping_selected-cached.tab | Simplified cached copy of accessions to taxon ID mappings |
COPYING | GNU GENERAL PUBLIC LICENSE v3 |
Script / File | Description |
---|---|
calculateTap-iterations.sh | Calculates TAP scores for each of the iterations |
calculateTap.sh | Calculates TAP scores |
calculateTapKs.sh | Calculates TAP-k scores |
calculateTiming.sh | Gathers all timing information |
classifyRelevance.pl | Main relevancy script. Given an output file from BLAST, PSI-BLAST, or HMMER3, it determines whether each retrieved sequence is a relevant or irrelevant match. |
hmmerWrapper.sh | Executes phmmer or jackhmmer, including setting up a directory, calling the database search program and determining relevancy |
jackhmmerWrapper.sh | Symbolic link to hmmerWrapper.sh |
jobsManager.sh | Creates submission (or simply an execution) scripts and schedules jobs that are not pending or running on PBS-like systems. Alternatively, it can execute jobs serially (without a scheduler). |
numRelevantRecords.pl | Calculates and displays the number of relevant records |
phmmerWrapper.sh | Symbolic link to hmmerWrapper.sh |
psi-non_iterativeWrapper.sh | Executes a single iteration of PSI-BLAST, including setting up a directory, calling the database search program and determining relevancy |
psiWrapper.sh | Executes PSI-BLAST, including setting up a directory, calling the database search program and determining relevancy. |
psiblast-non_iterativeWrapper.sh | Simple wrapper to psi-non_iterativeWrapper.sh |
psiblastWrapper.sh | Simple wrapper to psiWrapper.sh |
removeEmptyFiles.sh | Utility script that removes all non-hidden, zero-length files |
runAll.sh | Highest level script to manage execution of runs |
spouge2spougeE.pl | Truncates .spouge formatted files to a threshold (e.g., E-value) |
COPYING | GNU GENERAL PUBLIC LICENSE v3 |

    # Query non-iterative::phmmer non-iterative::psiblast iterative::jackhmmer iterative::psiblast
    # Threshold for 0.5 Quantile 5.3e-137 4e-131 2e-190 0
    # Unweighted Average TAP-1 0.2564 0.2807 0.1381 0.3081
    pfam21|P91550|P91550_CAEEL 0.0573 0.1031 0.0019 0.0868
    pfam21|Q1SJT6|Q1SJT6_MEDTR 0.4948 0.5686 0.6759 0.6836
    pfam21|Q1TLH7|Q1TLH7_9MYCO 0.6418 0.6589 0.5253 0.6221
    pfam21|Q1UW50|Q1UW50_9MYCO 0.0315 0.0394 0.0000 0.0000
    ...

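
The per-query rows of the summary above can be loaded and averaged per method with a short script. This is a minimal sketch, assuming whitespace-separated columns and that the method names come from the `# Query` header line; `parse_tap` and `unweighted_average` are hypothetical names, not benchmark scripts. Averaging only the excerpted rows will not reproduce the reported "Unweighted Average" line, which covers all queries.

```python
def parse_tap(lines):
    """Return (method names, {query label: [score per method]})."""
    methods, scores = [], {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Only the "# Query ..." comment line carries the column names.
            if fields and fields[0] == "Query":
                methods = fields[1:]
            continue
        fields = line.split()
        if len(fields) != len(methods) + 1:
            continue  # skip elisions ("...") and malformed rows
        scores[fields[0]] = [float(x) for x in fields[1:]]
    return methods, scores


def unweighted_average(methods, scores):
    """Average each method's column over all queries, one weight per query."""
    n = len(scores)
    return {
        method: sum(row[i] for row in scores.values()) / n
        for i, method in enumerate(methods)
    }
```
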

    (time -p psiblast -db finalDatabase -query queries-multiDomain/pfam21_Q1UW50_Q1UW50_9MYCO.fa -num_threads 1 -evalue 1000 -out pfam21_Q1UW50_Q1UW50_9MYCO.psiblast-non_iterative.hits.final.txt -num_descriptions 9999 -num_alignments 9999) &> pfam21_Q1UW50_Q1UW50_9MYCO.psiblast-non_iterative.final.out

    classifyRelevance.pl -v -psiblast=pfam21_Q1UW50_Q1UW50_9MYCO.psiblast-non_iterative.hits.final.txt -rel=relevanceInfo.tab --spouge=pfam21_Q1UW50_Q1UW50_9MYCO.psiblast-non_iterative.spouge.1000 --spougeext -taxon=pfam21_Q1UW50_Q1UW50_9MYCO --domainLocs=domainLocs.tab --overlap=50 --norandomsAsIrrelevants --combineHSPs --useFold

(Note: the --useFold flag tells classifyRelevance.pl to use the primary and superset DAs for classification.)


    (time -p psiblast -db nrdb90/nr90 -query queries-multiDomain/pfam21_Q1UW50_Q1UW50_9MYCO.fa -num_iterations 5 -num_threads 1 -out_pssm pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.nr.pssm -out_ascii_pssm pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.nr.pssm.txt) &> pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.iterationsDb.out

    (time -p psiblast -db finalDatabase -in_pssm pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.nr.pssm -num_iterations 1 -evalue 1000 -num_threads 1 -out pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.hits.final.out -num_descriptions 9999 -num_alignments 9999) &> pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.final.out

    classifyRelevance.pl -v -psiblast=pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.hits.final.out -rel=relevanceInfo.tab --spouge=pfam21_Q1UW50_Q1UW50_9MYCO.psiblast.spouge.1000 --spougeext -taxon=pfam21_Q1UW50_Q1UW50_9MYCO --domainLocs=domainLocs.tab --overlap=50 --norandomsAsIrrelevants --combineHSPs --useFold


    hmmbuild pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.finalIn.hmm queries-multiDomain/pfam21_Q1UW50_Q1UW50_9MYCO.fa

    (time -p hmmsearch -E 1000 --cpu 1 -o pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.hits.final.out pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.finalIn.hmm finalDatabase.fa) &> pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.final.out

    classifyRelevance.pl -v -phmmer=pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.hits.final.out -rel=relevanceInfo.tab --spouge=pfam21_Q1UW50_Q1UW50_9MYCO.phmmer.spouge.1000 --spougeext -taxon=pfam21_Q1UW50_Q1UW50_9MYCO --domainLocs=domainLocs.tab --overlap=50 --norandomsAsIrrelevants --combineHSPs --useFold

(Note: the --combineHSPs flag is ignored, but supplied to classifyRelevance.pl for uniformity.)


    (time -p jackhmmer --noali -N 5 --cpu 1 --chkhmm pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.iterationsDb queries-multiDomain/pfam21_Q1UW50_Q1UW50_9MYCO.fa nrdb90/nr90.fa | grep '^\[ok\]$') &> pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.iterationsDb.out

    ln -s "pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.iterationsDb-5.hmm" "pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.finalIn.hmm"

    (time -p hmmsearch -E 1000 --cpu 1 -o pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.hits.final.out pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.finalIn.hmm finalDatabase.fa) &> pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.final.out

    classifyRelevance.pl -v -jackhmmer=pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.hits.final.out -rel=relevanceInfo.tab --spouge=pfam21_Q1UW50_Q1UW50_9MYCO.jackhmmer.spouge.1000 --spougeext -taxon=pfam21_Q1UW50_Q1UW50_9MYCO --domainLocs=domainLocs.tab --overlap=50 --norandomsAsIrrelevants --combineHSPs --useFold

(Note: the --combineHSPs flag is ignored, but supplied to classifyRelevance.pl for uniformity.)
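
The file names in the commands above appear to share a prefix formed from the query label with each "|" replaced by "_", followed by the method name. The sketch below reproduces that naming for the psiblast, phmmer and jackhmmer final searches. It is inferred from the example commands only, not taken from the wrapper scripts, and the function names `file_prefix` and `output_names` are hypothetical. The non-iterative PSI-BLAST example writes its hits to a `.txt` file instead, so it is not covered.

```python
def file_prefix(label):
    """Turn a sequence label such as 'pfam21|Q1UW50|Q1UW50_9MYCO' into a file prefix."""
    return label.replace("|", "_")


def output_names(label, method):
    """Build the final-search file names seen in the example commands.

    `method` is one of 'psiblast', 'phmmer' or 'jackhmmer' (an assumption
    based on the examples; other methods may be named differently).
    """
    stem = f"{file_prefix(label)}.{method}"
    return {
        "hits": f"{stem}.hits.final.out",   # search program output
        "log": f"{stem}.final.out",         # captured timing and stderr
        "spouge": f"{stem}.spouge.1000",    # classifyRelevance.pl --spouge target
    }
```
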