PSI-SemiGLOBAL combines the heuristics of PSI-BLAST, the ubiquitous database search algorithm, with the semi-global statistical calculations from GLOBAL (Kann et al., 2007). All of the BLAST parameters and optimizations that users are familiar with are retained and combined with the new statistical calculation. PSI-SemiGLOBAL also accepts additional command-line arguments such as -blocks BLOCKS_FILENAME and -autoblocks. If -blocks BLOCKS_FILENAME is given, PSI-SemiGLOBAL divides the query input into blocks at the positions listed in BLOCKS_FILENAME. Otherwise, if -autoblocks is specified, PSI-SemiGLOBAL looks for low-complexity columns in the multiple sequence alignment (MSA) or position-specific scoring matrix (PSSM); if a query sequence is given instead, it divides the query into blocks of equal length.
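This README does not say how the equal-length blocks are computed or what a blocks file looks like. As an illustration only, the hypothetical shell function equal_blocks below splits a sequence of LENGTH residues into blocks of roughly TARGET residues each and prints 1-based, inclusive start and end positions, one block per line. The rounding rule, the output format, and any link between TARGET and -target_block_length are assumptions; this is not PSI-SemiGLOBAL's actual algorithm or BLOCKS_FILENAME format.

```shell
# Hypothetical illustration of an equal-length block split (not PSI-SemiGLOBAL's code).
# Usage: equal_blocks LENGTH TARGET
# Prints "start end" (1-based, inclusive) for each block.
equal_blocks() {
  local len=$1 target=$2 n base extra start=1 i=0 size
  if [ "$target" -lt 1 ]; then
    echo "equal_blocks: TARGET must be a positive integer" >&2
    return 1
  fi
  # Number of blocks: LENGTH / TARGET rounded to the nearest integer, at least one.
  n=$(( (len + target / 2) / target ))
  if [ "$n" -lt 1 ]; then n=1; fi
  base=$(( len / n ))    # every block gets at least this many residues
  extra=$(( len % n ))   # the first $extra blocks get one more residue
  while [ "$i" -lt "$n" ]; do
    size=$base
    if [ "$i" -lt "$extra" ]; then size=$(( size + 1 )); fi
    echo "$start $(( start + size - 1 ))"
    start=$(( start + size ))
    i=$(( i + 1 ))
  done
}
```

For a 100-residue query and a target of 30, equal_blocks 100 30 prints three blocks: 1 34, 35 67 and 68 100. Spreading the remainder over the first blocks keeps every block within one residue of the others.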
We tested PSI-SemiGLOBAL on a multiple-domain benchmark database, MultiDomainBenchmark, and compared its results with those of PSI-BLAST. On this database, PSI-SemiGLOBAL achieved better Threshold Average Precision-k (TAP-k) (Carroll et al., 2010) values than PSI-BLAST, with a TAP-1 value of 0.44 compared to 0.37 for PSI-BLAST.
PSI-SemiGLOBAL is compiled in the same manner as BLAST and can be used on any platform that BLAST is designed for.
To compile:
1. Download the source code for BLAST+ version 2.2.31.
2. Apply the PSI-SemiGLOBAL-specific changes contained in the tarball psiSemiGlobalFiles.tgz by untarring it over the original v2.2.31 files (i.e., tar -xzf psiSemiGlobalFiles.tgz).
3. If you are using a version later than 2.2.31, instead apply the per-file differences for those same files from psiSemiGlobalFiles.diff.
4. After applying the PSI-SemiGLOBAL-specific modifications, compile the code the same way you would compile BLAST.
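The compile steps can be sketched as two hypothetical shell helpers. Only the file names psiSemiGlobalFiles.tgz and psiSemiGlobalFiles.diff come from this README; the function names, the argument order, the assumption that both archives are relative to the top of the BLAST+ source tree, the -p1 strip level for patch, and the configure/make build under the c++/ subdirectory are all assumptions to check against your own source tree.

```shell
# Hypothetical helper: overlay the PSI-SemiGLOBAL changes on a BLAST+ source tree.
# Usage: apply_psisemiglobal_changes SRC_DIR VERSION
# PSISG_TGZ / PSISG_DIFF may point at the archives; defaults are the README names.
apply_psisemiglobal_changes() {
  local src_dir=$1 version=$2
  local tgz=${PSISG_TGZ:-psiSemiGlobalFiles.tgz}
  local diff_file=${PSISG_DIFF:-psiSemiGlobalFiles.diff}
  case "$version" in
    2.2.31)
      # Unpack the modified files over the original v2.2.31 sources.
      # Assumption: paths inside the tarball are relative to SRC_DIR.
      tar -xzf "$tgz" -C "$src_dir"
      ;;
    *)
      # Versions after 2.2.31: apply the per-file differences instead.
      # Assumption: the diff applies from SRC_DIR with one leading path component stripped.
      patch -d "$src_dir" -p1 < "$diff_file"
      ;;
  esac
}

# Hypothetical helper: build the patched tree as you would stock BLAST+.
# Assumption: the usual BLAST+ layout with configure under SRC_DIR/c++.
build_blast_tree() {
  local src_dir=$1
  (cd "$src_dir/c++" && ./configure && make)
}
```

For example, after unpacking the 2.2.31 sources into a directory such as ncbi-blast-2.2.31+-src (the name is an assumption based on the usual BLAST+ source tarball naming), run apply_psisemiglobal_changes ncbi-blast-2.2.31+-src 2.2.31 and then build_blast_tree ncbi-blast-2.2.31+-src. Keeping the overlay and the build in separate functions lets you re-run only the build after fixing a configure problem.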
Usage: psisemiglobal [-h] [-help] [-import_search_strategy filename] [-export_search_strategy filename] [-db database_name] [-dbsize num_letters] [-gilist filename] [-seqidlist filename] [-negative_gilist filename] [-entrez_query entrez_query] [-subject subject_input_file] [-subject_loc range] [-query input_file] [-out output_file] [-evalue evalue] [-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty] [-xdrop_ungap float_value] [-xdrop_gap float_value] [-xdrop_gap_final float_value] [-searchsp int_value] [-max_hsps_per_subject int_value] [-seg SEG_options] [-soft_masking soft_masking] [-matrix matrix_name] [-threshold float_value] [-culling_limit int_value] [-best_hit_overhang float_value] [-best_hit_score_edge float_value] [-window_size int_value] [-lcase_masking] [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis] [-num_descriptions int_value] [-num_alignments int_value] [-html] [-max_target_seqs num_sequences] [-num_threads int_value] [-remote] [-gap_trigger float_value] [-num_iterations int_value] [-out_pssm checkpoint_file] [-out_ascii_pssm ascii_mtx_file] [-in_msa align_restart] [-in_msa_full align_restart] [-in_pssm psi_chkpt_file] [-blocks blocks_info_file] [-blocks_out blocks_info_file] [-blocks_gap_percent blocks_gap_percent] [-exit_after_last_pssm_calculation] [-col_count int_value] [-col_content float_value] [-global_pssm] [-target_block_length int_value] [-update_block_indices] [-block_hit_length_minimum int_value] [-pseudocount pseudocount] [-inclusion_ethresh ethresh] [-version]
For a more detailed explanation of each parameter, see the help page (psisemiglobal -help).
PUBLIC DOMAIN NOTICE National Center for Biotechnology Information This software/database is a "United States Government Work" under the terms of the United States Copyright Act. It was written as part of the author's official duties as a United States Government employee and thus cannot be copyrighted. This software/database is freely available to the public for use. The National Library of Medicine and the U.S. Government have not placed any restriction on its use or reproduction. Although all reasonable efforts have been taken to ensure the accuracy and reliability of the software and data, the NLM and the U.S. Government do not and cannot warrant the performance or results that may be obtained by using this software or data. The NLM and the U.S. Government disclaim all warranties, express or implied, including warranties of performance, merchantability or fitness for any particular purpose. Please cite the author in any work or product based on this material.
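Example: the first command searches nr90 for five iterations with the query t039239.fa, saving the block information to t039239-blocksOut.txt and the PSSM to t039239.pssm.
psisemiglobal -db nr90 -query t039239.fa -blocks_out t039239-blocksOut.txt -num_iterations 5 -out_pssm t039239.pssm -global_pssm -target_block_length 30
The second command reuses that block file and PSSM for a single iteration against MultiDomainBenchmark, writing the hits to t039239-hits_final.out.
psisemiglobal -db MultiDomainBenchmark -blocks t039239-blocksOut.txt -in_pssm t039239.pssm -num_iterations 1 -evalue 1000 -out t039239-hits_final.out -global_pssm -target_block_length 30 -num_descriptions 9999 -num_alignments 9999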
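To run the same two-step search for other queries, the example commands can be wrapped in a small shell function. The function name two_step_search and its single QUERY_BASENAME argument are hypothetical; the flags and values are copied unchanged from the two example commands, and the function assumes the query is stored as QUERY_BASENAME.fa.

```shell
# Hypothetical wrapper around the two example psisemiglobal commands.
# Usage: two_step_search QUERY_BASENAME   (reads QUERY_BASENAME.fa)
two_step_search() {
  local q=$1
  # Step 1: five iterations against nr90; save the block information and the PSSM.
  psisemiglobal -db nr90 -query "$q.fa" -blocks_out "$q-blocksOut.txt" \
    -num_iterations 5 -out_pssm "$q.pssm" -global_pssm -target_block_length 30 \
    || return 1
  # Step 2: reuse the saved blocks and PSSM for one iteration against the benchmark.
  psisemiglobal -db MultiDomainBenchmark -blocks "$q-blocksOut.txt" -in_pssm "$q.pssm" \
    -num_iterations 1 -evalue 1000 -out "$q-hits_final.out" -global_pssm \
    -target_block_length 30 -num_descriptions 9999 -num_alignments 9999
}
```

Calling two_step_search t039239 reproduces the two example commands. The return after step 1 keeps step 2 from running against a missing or stale PSSM if the first search fails.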