# ===========================================================================
#
#                            PUBLIC DOMAIN NOTICE
#               National Center for Biotechnology Information
#
#  This software/database is a "United States Government Work" under the
#  terms of the United States Copyright Act.  It was written as part of
#  the author's official duties as a United States Government employee and
#  thus cannot be copyrighted.  This software/database is freely available
#  to the public for use. The National Library of Medicine and the U.S.
#  Government have not placed any restriction on its use or reproduction.
#
#  Although all reasonable efforts have been taken to ensure the accuracy
#  and reliability of the software and data, the NLM and the U.S.
#  Government do not and cannot warrant the performance or results that
#  may be obtained by using this software or data. The NLM and the U.S.
#  Government disclaim all warranties, express or implied, including
#  warranties of performance, merchantability or fitness for any particular
#  purpose.
#
#  Please cite the author in any work or product based on this material.
#
# ===========================================================================


The NCBI SRA ( Sequence Read Archive )


Contact: sra-tools@ncbi.nlm.nih.gov
http://trace.ncbi.nlm.nih.gov/Traces/sra/std


Stand-alone BLAST searches against SRA runs in their native format.
-------------------------------------------------------------------

This distribution includes a stand-alone blastn application that performs
BLAST searches directly against native SRA files. The application has been
tested in-house at NCBI but has not been heavily used, so it should be
considered a preliminary (alpha) release for a few experienced users. A 64-bit
Linux build has been produced for this testing.

The application is called "blastn_vdb".

The application can be invoked in much the same manner as the standard
blastn application:

1) Running "blastn_vdb -help" or "blastn_vdb -h" produces usage messages.

2) The BLAST+ command-line manual at http://www.ncbi.nlm.nih.gov/books/NBK1763/
provides more details on the options, though not all blastn options are
available with blastn_vdb. Some options simply do not apply to sequences in SRA
(e.g., -gilist is missing because these sequences have not been assigned GIs).
Other options have not yet been implemented (e.g., -num_threads is currently
disabled).


To search cached or on-demand SRA objects.
------------------------------------------
An example blastn_vdb command line would be:

./blastn_vdb -db "ERR039542 ERR047215 ERR039539 ERR039540" -query nt.test -out test.out

The file nt.test contains the query in FASTA format, and it will be searched against
the reads in the runs with accessions ERR039542, ERR047215, ERR039539 and ERR039540.

If you have not already downloaded these objects using the vdb "prefetch" tool,
they will be retrieved on demand from NCBI under the standard configuration. For
alternative configuration information, please see the "README-vdb-config" file
in this distribution.
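The prefetch-then-search workflow described above can be sketched as a small
shell helper. This is only an illustration: the function name "plan_search" is
hypothetical, and it prints the commands rather than running them, so it can be
tried without the toolkit installed. It assumes "prefetch" is invoked with one
accession at a time.

```shell
# Print (not run) the commands for a prefetch-then-search workflow.
# Usage: plan_search QUERY_FILE OUTPUT_FILE ACCESSION...
# "plan_search" is a hypothetical helper, not part of the distribution.
plan_search() {
    query=$1
    out=$2
    shift 2
    # One prefetch command per accession, so each run is cached locally.
    for acc in "$@"; do
        printf 'prefetch %s\n' "$acc"
    done
    # "$*" joins the remaining arguments with single spaces, which is the
    # form the -db option expects for several runs.
    printf './blastn_vdb -db "%s" -query %s -out %s\n' "$*" "$query" "$out"
}

plan_search nt.test test.out ERR039542 ERR047215
```

Skipping the prefetch lines should still work under the standard
configuration, since the runs are then retrieved on demand.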

Searching with manually downloaded files.
-----------------------------------------
If you have manually downloaded files (e.g., via Aspera or wget), they may
be referred to as "local" files. You can pass one or more file paths to be used
collectively as the database. In this case the blastn_vdb command line would be:

./blastn_vdb -db <SRR_file> -query <input_file> -out <output_file>

Where
<SRR_file> is the path (relative or absolute) and name of the SRRxxxxx file,
<input_file> is a FASTA file containing the sequence(s) to be BLASTed, and
<output_file> is the name specified for the output report of the BLAST search.

Example:

./blastn_vdb -db ./subdir/ERR039542.sra -query nt.test -out test.out

Querying multiple SRR files simultaneously:

./blastn_vdb -db "<SRR_file1> <SRR_file2> <SRR_file3>" -query <input_file> -out <output_file>

Enclose the group of files to be searched in double quotes, e.g.
"./SRR_file1.sra ./SRR_file2.sra ./SRR_file3.sra"

Example:

./blastn_vdb -db "./ERR039542.sra ./ERR047215.sra ./ERR039539.sra ./ERR039540.sra" -query nt.test -out test.out
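When the list of local files is assembled by a script, the quoted -db value can
be built as in the sketch below. The function name "join_db" is hypothetical.
The sketch rejects paths that contain spaces, on the assumption (inferred from
the quoting convention above, not confirmed by the tool's documentation) that a
space always separates entries. The final command is echoed rather than run.

```shell
# Join run file paths into one space-separated string for -db.
# "join_db" is a hypothetical helper, not part of the distribution.
join_db() {
    for path in "$@"; do
        case $path in
            # Assumption: a space separates -db entries, so a path that
            # contains one cannot be passed safely.
            *" "*) echo "path contains a space: $path" >&2; return 1 ;;
        esac
    done
    # "$*" joins the arguments with single spaces.
    printf '%s' "$*"
}

db=$(join_db ./ERR039542.sra ./ERR047215.sra) &&
    echo ./blastn_vdb -db "\"$db\"" -query nt.test -out test.out
```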

Caveats
-------
There are some limitations on the currently available application:

1) Individual SRA data files containing more than 2 billion reads are not yet
supported. For a paired-end experiment, this amounts to a limit of about
1 billion "spots".

2) Compressed SRA ("cSRA") is not yet fully supported. Currently, only the
unaligned fraction of reads is searched. Compressed SRA runs are runs that
contain alignments (e.g., ERR230455). Runs can be checked with "vdb-dump" to
report whether they contain alignment information:

    $ vdb-dump -E ERR230455
    enumerating the tables of database 'ERR230455'
    tbl #1: PRIMARY_ALIGNMENT
    tbl #2: REFERENCE
    tbl #3: SEQUENCE

3) You may need to prefix "./" to the run name for files in your current
directory.

4) The blast_formatter cannot currently read native SRA files, so
reformatting of results saved as a BLAST archive is not yet supported.
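The cSRA check in caveat 2 can be scripted by looking for an alignment table in
the "vdb-dump -E" listing. The sketch below is a minimal illustration: the
function name "has_alignments" is hypothetical, and treating the presence of a
PRIMARY_ALIGNMENT line as the indicator is an assumption based on the sample
output above. The sample listing is embedded so the block runs without the
toolkit installed.

```shell
# Succeed if the table listing read from stdin names a PRIMARY_ALIGNMENT table.
# "has_alignments" is a hypothetical helper, not part of the distribution.
has_alignments() {
    grep -q 'PRIMARY_ALIGNMENT'
}

# Sample listing copied from the ERR230455 example above.
listing="enumerating the tables of database 'ERR230455'
tbl #1: PRIMARY_ALIGNMENT
tbl #2: REFERENCE
tbl #3: SEQUENCE"

if printf '%s\n' "$listing" | has_alignments; then
    echo "alignment table listed: only unaligned reads would be searched"
else
    echo "no alignment table listed"
fi
```

With the toolkit installed, the same check would read the live listing instead,
e.g. by piping the output of "vdb-dump -E ERR230455" into has_alignments.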

Common errors and fixes.
------------------------

1) Failure to provide a relative or absolute path to a manually downloaded SRR file:

./blastn_vdb -db SRR770754.sra -query srr770754_test.fa -out test.out
Error: NCBI C++ Exception:
    "vdb2blast_util.cpp", line 253: Error: ncbi::CVDBBlastUtil::x_MakeSRASeqSrc()
    - VDB BlastSeqSrc construction failed: Failed to add any run to VDB runset: unsupported while allocating

Fix:
Include a relative (e.g., "../" or "./") or absolute (e.g., "/home/user/SRA_BLAST_data/")
path with the SRR file name.
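A script can guard against this error by qualifying bare file names before
passing them to -db. The sketch below is one possible approach, not the tool's
own behavior, and the function name "qualify_path" is hypothetical.

```shell
# Prefix "./" to a run file name that has no directory component.
# "qualify_path" is a hypothetical helper, not part of the distribution.
qualify_path() {
    case $1 in
        */*) printf '%s\n' "$1" ;;    # already has a relative or absolute path
        *)   printf './%s\n' "$1" ;;  # bare name: make it explicitly relative
    esac
}

qualify_path SRR770754.sra
qualify_path /home/user/SRA_BLAST_data/SRR770754.sra
```

Names that already contain a "/" are passed through unchanged, so the helper
can be applied to every path without checking first.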