Name               Date         Size      #Lines     LOC

..                 03-May-2022  -
build/             03-May-2022  -         26,868     20,820
ncbi-vdb/          03-May-2022  -         1,402,609  970,311
ngs/               15-Mar-2021  -         69,905     45,023
scripts/           03-May-2022  -         1,646      1,286
setup/             03-May-2022  -         3,635      3,121
shared/            15-Mar-2021  -         104        40
test/              15-Mar-2021  -         32,678     24,958
tools/             15-Mar-2021  -         415,100    307,064
tools2/            15-Mar-2021  -         5,839      4,903
.gitignore         15-Mar-2021  257       30         29
CHANGES.md         15-Mar-2021  30.2 KiB  504        413
LICENSE            15-Mar-2021  2.8 KiB   67         48
Makefile           15-Mar-2021  5.2 KiB   188        90
README-blastn      15-Mar-2021  5.5 KiB   134        97
README-vdb-config  15-Mar-2021  4.9 KiB   116        86
README.md          15-Mar-2021  5.2 KiB   81         59
configure          15-Mar-2021  1.7 KiB   46         16

README-blastn

# ===========================================================================
#
#                            PUBLIC DOMAIN NOTICE
#               National Center for Biotechnology Information
#
#  This software/database is a "United States Government Work" under the
#  terms of the United States Copyright Act.  It was written as part of
#  the author's official duties as a United States Government employee and
#  thus cannot be copyrighted.  This software/database is freely available
#  to the public for use. The National Library of Medicine and the U.S.
#  Government have not placed any restriction on its use or reproduction.
#
#  Although all reasonable efforts have been taken to ensure the accuracy
#  and reliability of the software and data, the NLM and the U.S.
#  Government do not and cannot warrant the performance or results that
#  may be obtained by using this software or data. The NLM and the U.S.
#  Government disclaim all warranties, express or implied, including
#  warranties of performance, merchantability or fitness for any particular
#  purpose.
#
#  Please cite the author in any work or product based on this material.
#
# ===========================================================================


The NCBI SRA (Sequence Read Archive)


Contact: sra-tools@ncbi.nlm.nih.gov
http://trace.ncbi.nlm.nih.gov/Traces/sra/std


Stand-alone BLAST searches against SRA runs in their native format.
-------------------------------------------------------------------

A stand-alone blastn application to perform BLAST searches directly against
native SRA files is included in this distribution. This application has been
tested in-house at the NCBI, but has not been heavily used, so this should be
considered a preliminary (alpha) release to a few experienced users. A 64-bit
Linux application has been built for this testing.

The application is called "blastn_vdb".

The application can be invoked in much the same manner as the standard
blastn application:

1) blastn_vdb -help or blastn_vdb -h will produce usage messages.

2) The BLAST+ command-line manual at http://www.ncbi.nlm.nih.gov/books/NBK1763/
provides more details on the options, though not all blastn options are
available with blastn_vdb. Some options simply do not apply to sequences in SRA
(e.g., -gilist is missing as these sequences have not been assigned GIs). Some
options have not yet been implemented (e.g., -num_threads is currently disabled).


To search cached or on-demand SRA objects.
------------------------------------------
An example blastn_vdb command line would be:

./blastn_vdb -db "ERR039542 ERR047215 ERR039539 ERR039540" -query nt.test -out test.out

The file nt.test contains the query in FASTA format, and it will be searched against
the reads in runs with accessions ERR039542 ERR047215 ERR039539 ERR039540.

If you have not already downloaded these objects using the vdb "prefetch" tool,
they will be retrieved on-demand from NCBI under standard configuration. For
alternative configuration information, please see the "README-vdb-config" file
in this distribution.
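
When the list of runs comes from a script rather than being typed by hand, the
quoted -db value has to be assembled from separate accessions. The following is
a minimal sketch of that step, not part of the distribution: the function name
build_blastn_vdb_cmd is hypothetical, and the function only prints the command
line (a dry run) so it can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the toolkit): print a blastn_vdb command
# line that searches one query file against several runs.
build_blastn_vdb_cmd() {
    query=$1
    out=$2
    shift 2
    # "$*" joins the remaining arguments with single spaces (default IFS),
    # giving the quoted, space-separated list that -db is given above.
    printf './blastn_vdb -db "%s" -query %s -out %s\n' "$*" "$query" "$out"
}

# Dry run: prints the command instead of executing it.
build_blastn_vdb_cmd nt.test test.out ERR039542 ERR047215 ERR039539 ERR039540
```

With the four accessions from the example above, the printed line matches the
command shown there.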

Searching with manually downloaded files.
-----------------------------------------
If you have manually downloaded files, e.g. via aspera or wget, etc., they may
be referred to as "local" files. You can pass one or more file paths to be used
collectively as the database. In this case the blastn_vdb command line would be:

./blastn_vdb -db <SRR_file> -query <input_file> -out <output_file>

Where
<SRR_file> is the path (relative or absolute) and name of the SRRxxxxx file
<input_file> is a FASTA file containing the sequence(s) to be BLASTed
<output_file> is the name specified for the output report of the BLAST search.

Example:

./blastn_vdb -db ./subdir/ERR039542.sra -query nt.test -out test.out

Querying multiple SRR files simultaneously:

./blastn_vdb -db "<SRR_file1> <SRR_file2> <SRR_file3>" -query <input.fa> -out <output_file>

Enclose the group of files to be included in the search set in "quotes", e.g.
"./SRR_file1.sra ./SRR_file2.sra ./SRR_file3.sra"

Example:

./blastn_vdb -db "./ERR039542.sra ./ERR047215.sra ./ERR039539.sra ./ERR039540.sra" -query nt.test -out test.out

Caveats
-------
There are some limitations on the currently available application:

1) Individual SRA data files containing more than 2 billion reads are not yet supported. For a
paired-end experiment, where each spot holds two reads, this is a limit of about 1 billion "spots".

2) Compressed SRA ("cSRA") is not yet fully supported. Currently, only the
unaligned fraction of reads is searched. Compressed SRA are runs containing
alignments (e.g., ERR230455). Runs can be checked with "vdb-dump" to report if
they contain alignment information:

    $ vdb-dump -E ERR230455
    enumerating the tables of database 'ERR230455'
    tbl #1: PRIMARY_ALIGNMENT
    tbl #2: REFERENCE
    tbl #3: SEQUENCE

3) You may need to prefix "./" to the run name for files in your current
directory.

4) The blast_formatter is not currently able to read native SRA files, so
reformatting of results saved as a blast archive is not yet supported.
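
The vdb-dump check in caveat 2 can be scripted. The sketch below is an
illustration only: has_alignments is a hypothetical helper, and it assumes that
a PRIMARY_ALIGNMENT line in the "vdb-dump -E" listing is what marks a run as
containing alignments, as in the sample output above. So that it runs without
the toolkit installed, it is fed that sample output instead of calling vdb-dump.

```shell
#!/bin/sh
# Hypothetical helper: reads a "vdb-dump -E" table listing on stdin and
# succeeds if a PRIMARY_ALIGNMENT table is listed (assumed cSRA marker).
has_alignments() {
    grep -q 'PRIMARY_ALIGNMENT'
}

# Sample listing copied from the caveat above, used in place of a live call.
sample_listing="enumerating the tables of database 'ERR230455'
tbl #1: PRIMARY_ALIGNMENT
tbl #2: REFERENCE
tbl #3: SEQUENCE"

if printf '%s\n' "$sample_listing" | has_alignments; then
    echo "alignment tables listed: only unaligned reads would be searched"
else
    echo "no alignment tables listed"
fi
```

Against a real run, the same helper would be used as
"vdb-dump -E ERR230455 | has_alignments".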

Common errors and fixes.
------------------------

1) Failure to provide a relative path to a manually downloaded SRR file:

./blastn_vdb -db SRR770754.sra -query srr770754_test.fa -out test.out
Error: NCBI C++ Exception:
    "vdb2blast_util.cpp", line 253: Error: ncbi::CVDBBlastUtil::x_MakeSRASeqSrc()
    - VDB BlastSeqSrc construction failed: Failed to add any run to VDB runset: unsupported while allocating

Fix:
Include a relative (e.g., "../" or "./") or absolute (e.g., "/home/user/SRA_BLAST_data/") file path with the SRR file name.
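
A wrapper script can apply this fix automatically. The following is a minimal
sketch, assuming every argument is meant as a local file: qualify_run_path is a
hypothetical helper that prefixes "./" to a bare file name and leaves any name
that already contains a directory part untouched. It should not be applied to
bare accessions intended for on-demand retrieval.

```shell
#!/bin/sh
# Hypothetical helper: make sure a local run file name carries a path.
qualify_run_path() {
    case $1 in
        */*) printf '%s\n' "$1" ;;     # already relative or absolute
        *)   printf './%s\n' "$1" ;;   # bare name in the current directory
    esac
}

qualify_run_path SRR770754.sra
qualify_run_path ../SRR770754.sra
qualify_run_path /home/user/SRA_BLAST_data/SRR770754.sra
```

Only the first name is changed (to ./SRR770754.sra); the other two are printed
as given.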

README-vdb-config

The tool 'vdb-config' can be used to inspect or change the configuration
of the sra-toolkit.

When called without any parameters, the tool reports the current configuration
in XML format. No changes are made.

-----------------------------------------------------------------------------

vdb-config --restore-defaults

If called with this parameter, the tool resets the configuration to its
default state.
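
Because a call without parameters only reports the configuration, its output
can be kept as a record before the defaults are restored. The sketch below is
one possible way to do that, not a toolkit feature: snapshot_vdb_config and the
VDB_CONFIG override variable are hypothetical names introduced here, and the
saved file is a plain record of the report rather than something the toolkit
is known to re-import.

```shell
#!/bin/sh
# Hypothetical helper: save the current configuration report to a file.
# VDB_CONFIG is an assumed override hook for testing, not a toolkit variable.
snapshot_vdb_config() {
    tool=${VDB_CONFIG:-vdb-config}
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool not found on PATH; nothing saved" >&2
        return 1
    fi
    # Called without parameters, the tool only reports; it changes nothing.
    "$tool" > "$1"
}

if snapshot_vdb_config vdb-config-before-reset.xml; then
    echo "report saved to vdb-config-before-reset.xml"
else
    echo "snapshot skipped"
fi
```

After a successful snapshot, "vdb-config --restore-defaults" can be run as
described above.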

-----------------------------------------------------------------------------

vdb-config -i

This will present the user with a colored configuration dialog.

The tab-key and the cursor-keys navigate the dialog. The item with the little
red square has the focus. A button or a checkbox can be 'pressed' with the
space or enter-key. To get out of the dialog without saving any changes,
press the '6'-key or the 'q'-key, or navigate to the 'exit'-button at the
bottom of the dialog and press the space or enter-key.


The "data source" part:

The "NCBI SRA" labeled checkbox enables/disables remote access to the
SRA-accessions stored at NCBI. As long as the computer has internet access and
this checkbox is enabled, the user can access SRA-accessions directly without
downloading them.

A command like 'sra-pileup SRR341578' at the command line will produce pileup
output of the given accession even if this accession has not been downloaded
before.
The tool will download the data on the fly from our servers.


There might be a checkbox labeled "site" below the "NCBI SRA" one. If this
checkbox is not available, you do not have a 'site'-installation of SRA-data.
If it is visible, you do have such a site-installation and you can disable
access to this data.


The "local workspaces" part:

At the top are 2 buttons: "import dbGaP-project" and "set default import path".

If you are not using dbGaP-projects (the database of Genotypes and Phenotypes),
you can ignore these 2 buttons.

The "import dbGaP-project" button presents you with another dialog to select
an ngc-file. You can navigate the directories of your computer to find and
select one of these files. By default the focus is in the 'files'-list. It may
be empty.
Use the cursor-key 'up' to focus the 'directories'-list. If you press enter
on any of the listed directory-names, you change into this directory.
The '[ .. ]' entry brings you back into the parent directory. If you see
ngc-files in the lower 'files'-list, press the tab-key to switch to the
'files'-list. Press enter on one of them to select this file for import. You
will see a success-message if the import was performed without errors.
On Windows you cannot switch from one drive-letter
to another when selecting.

The "set default import path" button gives you the opportunity to specify a
different default location for dbGaP-projects, for instance if your home
directory is not big enough. You can always change the location for your
dbGaP-project after the import.


Below the 2 buttons is a list of local repositories. If there are no
dbGaP-projects, this list has only one entry: "Open Access Data". This is the
location where accessions get downloaded and cached. You can change these
locations if, for instance, your home directory, where they are created by
default, does not have enough space.
The change button brings up a directory-select dialog.

If you made any changes, such as enabling/disabling a checkbox or changing a
location, the change is only written to the configuration if you exit the
dialog via the 'save and exit' button.

-----------------------------------------------------------------------------


vdb-config -i --interactive-mode textual

This will present the user with a purely textual and sequential dialog. It is
intended to be used if the colored mode does not work, maybe because of
console issues.
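
A login script could choose between the two interactive modes up front. The
sketch below is a heuristic of our own, not toolkit behavior: it assumes that
an unset TERM, or a TERM of "dumb" or "unknown", indicates a console on which
the colored dialog is unlikely to work. The function name is hypothetical, and
it only prints the command it would suggest.

```shell
#!/bin/sh
# Hypothetical helper: suggest an interactive vdb-config invocation.
# The TERM test is an assumed heuristic for "colored mode may not work".
vdb_config_interactive_cmd() {
    case ${TERM:-dumb} in
        dumb|unknown) echo 'vdb-config -i --interactive-mode textual' ;;
        *)            echo 'vdb-config -i' ;;
    esac
}

vdb_config_interactive_cmd
```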

README.md

# The NCBI SRA (Sequence Read Archive)

### Contact:
email: sra@ncbi.nlm.nih.gov

### Download
Visit our [download page](https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit) for pre-built binaries.

### Change Log
Please check the CHANGES.md file for change history.

## The SRA Toolkit
The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for
using data in the INSDC Sequence Read Archives.

### ANNOUNCEMENT:
NIH has released a request for information (RFI) to solicit community feedback on new proposed Sequence Read Archive (SRA) data formats. Learn more and share your thoughts at https://go.usa.gov/xvhdr. The response deadline is July 17th, 2020. We’d encourage you all to share with your colleagues and networks, and respond if you are an SRA submitter or data user.

SRA Toolkit 2.11.0 March 15, 2021

  **fasterq-dump**: does not exit with 0 any more if the given path is not found
  **fasterq-dump**: does not exit with 0 if accession is not found
  **fasterq-dump**: does not fail when requested to dump a run file with non-standard name
  **fasterq-dump**: available on Windows
  **kfg, prefetch, vfs**: resolve WGS reference sequences into "Accession Directory"
  **kfg, sra-tools, vfs**: dropped support of protected repositories
  **kns, sra-tools**: fixed formatting of HTTP requests for proxy
  **ncbi-vdb, ngs, ngs-tools, sra-tools, vdb**: added support for 64-bit ARM (AArch64, Apple Silicon)
  **prefetch, vfs**: fixed download of protected non-run files
  **prefetch, vfs**: fixed segfault during download of JWT cart
  **prefetch, vfs**: respect requested version when downloading WGS files
  **sra-pileup**: now silent if requested slice has no alignments or reference-name does not exist
  **sratools**: added description and documentation of the sratools driver tool to GitHub wiki
  **sra-tools**: created a script to fix names of downloaded sra files
  **sra-tools**: created a script to move downloaded sra run files into proper directories
  **sratools**: disable-multithreading option removed from help text for tools that do not support it
  **sratools**: does not access remote repository when it is disabled
  **sra-tools, vfs**: recognize sra file names with version
  **vdb-dump**: exits with a non-zero value if asked for a non-existent column

SRA Toolkit 2.10.8

kproc, fasterq-dump: fixed problem with seg-faults caused by too small stack used by threads
kdbmeta: allows working with remote runs
kdb, vdb, vfs, sra-tools: fixed bug preventing use of path to directory created by prefetch if it ends with '/'
vfs, sra-tools, ngs-tools: report an error when file was encrypted for a different ngc file
prefetch: print error message when cannot resolve reference sequence
vfs, prefetch: download encrypted phenotype files with encrypted extension
vdb, sra-docker: config can auto-generate LIBS/GUID when in a docker container

SRA Toolkit 2.10.5
sratools: fixed a potential build problem in libutf8proc
ncbi-vdb, ngs, ngs-tools, sra-tools: all Linux builds now use g++ 7.3 (C++11 ABI)
prefetch: improvements were made to work in environments with bad network connections
prefetch, sratools: fixed the names of the --min-size and --max-size command line arguments when running prefetch

SRA Toolkit 2.10.4
kns, sra-tools: fixed errors when using ngc file

SRA Toolkit 2.10.3
sraxf, fasterq-dump, fastq-dump, sam-dump: fixed a problem resulting in a segmentation fault

Release 2.10.2 of `sra-tools` provides access to all the **public and controlled-access dbGaP** data of SRA in the AWS and GCP environments _(Linux only for this release)_. This vast archive's original submission format and SRA-formatted data can both be accessed and computed on these clouds, eliminating the need to download from NCBI FTP as well as improving performance.

The `prefetch` tool also retrieves **original submission files** in addition to ETL data for public and controlled-access dbGaP data.

With release 2.10.0 of `sra-tools` we have added cloud-native operation for AWS and GCP environments _(Linux only for this release)_, for use with the public SRA. `prefetch` is capable of retrieving original submission files in addition to ETL data.

With release 2.9.1 of `sra-tools` we have finally made available the tool `fasterq-dump`, a replacement for the much older `fastq-dump` tool. As its name implies, it runs faster, and is better suited for large-scale conversion of SRA objects into FASTQ files that are common on sites with enough disk space for temporary files. `fasterq-dump` is multi-threaded and performs bulk joins in a way that improves performance as compared to `fastq-dump`, which performs joins on a per-record basis _(and is single-threaded)_.

`fastq-dump` is still supported as it handles more corner cases than `fasterq-dump`, but it is likely to be deprecated in the future.
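
Since both tools may be present, and installations older than 2.9.1 have only `fastq-dump`, a script might prefer `fasterq-dump` and fall back to `fastq-dump`. The sketch below illustrates that choice only; `pick_dump_tool` is a hypothetical helper name, and the script reports which tool it would use without running any conversion.

```shell
#!/bin/sh
# Hypothetical helper: print the preferred FASTQ conversion tool found on
# PATH, trying fasterq-dump first and fastq-dump second.
pick_dump_tool() {
    for candidate in fasterq-dump fastq-dump; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if chosen=$(pick_dump_tool); then
    echo "would convert with: $chosen"
else
    echo "neither fasterq-dump nor fastq-dump found on PATH"
fi
```

Whether the fallback is appropriate depends on the data: as noted above, `fastq-dump` handles more corner cases, so some workflows may want it first.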
72
73You can get more information about `fasterq-dump` in our Wiki at [https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump](https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump).
74
75For additional information on using, configuring, and building the toolkit,
76please visit our [wiki](https://github.com/ncbi/sra-tools/wiki)
77or our web site at [NCBI](http://www.ncbi.nlm.nih.gov/Traces/sra/?view=toolkit_doc)
78
79
80SRA Toolkit Development Team
81