README

FASTX-Toolkit
=============


Short Summary
=============

The FASTX-Toolkit is a collection of command-line tools for preprocessing
short-read FASTA/FASTQ files.



More Details
============

Next-generation sequencing machines usually produce FASTA or FASTQ files
containing multiple short-read sequences (possibly with quality information).

The main processing of such FASTA/FASTQ files is mapping (aka aligning)
the sequences to reference genomes or other databases using specialized
programs.

Examples of such mapping programs are:
Blat (http://www.kentinformatics.com/index.asp),
SHRiMP (http://compbio.cs.toronto.edu/shrimp),
LastZ (http://www.bx.psu.edu/miller_lab),
MAQ (http://maq.sourceforge.net/),
and many others.

However, it is sometimes more productive to preprocess the FASTA/FASTQ files
before mapping the sequences to the genome, manipulating the sequences to
produce better mapping results.

The FASTX-Toolkit tools perform some of these preprocessing tasks.



Available Tools
===============

FASTQ-to-FASTA - Converts a FASTQ file to a FASTA file.

FASTQ-Statistics - Scans a FASTQ file and produces some statistics about the
	quality and the sequences in the file.

FASTQ-Quality-BoxPlot, and
FASTQ-Nucleotides-Distribution - Generate charts based on the statistics
	produced by FASTQ-Statistics. These charts can be used to quickly
	assess the quality of the sequenced library.

FASTQ-Quality-Converter - Converts from ASCII to numeric quality scores.

FASTQ-Quality-Filter - Removes low-quality sequences from FASTQ files.

FASTX-Artifacts-Filter - Removes some sequencing artifacts from FASTA/Q files.

FASTX-Barcode-Splitter - A common practice is to sequence multiple biological
	samples in the same library (marking each sample with a dedicated
	barcode). The resulting FASTA/Q file contains intermixed sequences
	from those samples. This tool separates a FASTA/Q file into several
	individual files, based on the barcodes.

FASTX-Clipper - Adapters (aka linkers) are added to the library (before
	sequencing), and should be removed from the resulting FASTA/Q file.
	This tool removes (clips) adapters.

FASTA-Clipping-Histogram - After clipping a FASTA file, this tool generates a
	chart showing the lengths of the clipped sequences.

FASTX-Reverse-Complement - Produces the reverse-complement of each sequence
	in a FASTA/Q file.
	If a FASTQ file is given, the quality scores are also reversed.

FASTX-Trimmer - Extracts sub-sequences from a FASTA/Q file. Two examples are:
	Removing barcodes from the 5'-end of all sequences in a FASTQ file;
	Cutting 7 nucleotides from the 3'-end of all sequences in a FASTA file.
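
To make three of the operations above concrete (FASTQ-to-FASTA conversion,
trimming, and reverse-complementing), here is a minimal awk sketch. It is not
the toolkit's implementation: the function names are made up for illustration,
and it assumes well-formed 4-line FASTQ records with no validation.

```shell
# Conceptual sketches only. Function names are hypothetical, and input is
# assumed to be well-formed FASTQ with exactly 4 lines per record.

# FASTQ -> FASTA: keep the header (replacing '@' with '>') and the sequence.
fastq_to_fasta_sketch() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }'
}

# Keep bases FIRST..LAST (1-based) of every record. The quality line is cut
# the same way so sequence and quality stay the same length.
fastq_trim_sketch() {
    awk -v first="$1" -v last="$2" '
        NR % 4 == 2 || NR % 4 == 0 {
            print substr($0, first, last - first + 1)
            next
        }
        { print }'
}

# Reverse-complement every record: bases are reversed and complemented,
# quality scores are reversed only.
fastq_revcomp_sketch() {
    awk '
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i > 0; i--) out = out substr(s, i, 1)
            return out
        }
        function complement(s,    i, c, p, out) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                p = index("ACGTacgt", c)
                out = out (p ? substr("TGCAtgca", p, 1) : c)
            }
            return out
        }
        NR % 4 == 2 { print complement(reverse($0)); next }
        NR % 4 == 0 { print reverse($0); next }
        { print }'
}
```

For example, cutting 7 nucleotides from the 3'-end of 36-nt reads corresponds
to "fastq_trim_sketch 1 29 < reads.fq", and removing a 4-nt barcode from the
5'-end corresponds to "fastq_trim_sketch 5 36 < reads.fq" (reads.fq is a
placeholder file name).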



Galaxy
======

Galaxy (http://g2.bx.psu.edu) is a web-based framework for computational biology.

While the programs in the FASTX-Toolkit are command-line based, the package
includes the files necessary to integrate the tools into a Galaxy server,
allowing users to run these tools from their web browser.

If you run your own local mirror of a Galaxy server, you can integrate the
FASTX-Toolkit into your Galaxy server.



Software Requirements
=====================

1. GCC is required to compile most tools.

2. The FASTA-Clipping-Histogram tool requires Perl and the "PerlIO::gzip" and
   "GD::Graph::bars" modules.

   Installing the Perl modules can be accomplished by running:

   $ sudo cpan 'PerlIO::gzip'
   $ sudo cpan 'GD::Graph::bars'

3. FASTX-Barcode-Splitter requires the GNU Sed program.

4. FASTQ-Quality-Boxplot and FASTQ-Nucleotides-Distribution require the
   'gnuplot' program.
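
The requirements above can be checked before building. The helper functions
below are a hedged sketch, not part of the toolkit: their names are invented
here, and they only test that a program is on PATH or that a Perl module
loads (they do not verify that 'sed' is specifically GNU Sed).

```shell
# Print the name of every given program that is not found on PATH.
# Returns non-zero if at least one program is missing.
missing_programs() {
    status=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            printf '%s\n' "$prog"
            status=1
        fi
    done
    return "$status"
}

# Succeed if the named Perl module can be loaded by the 'perl' on PATH.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example calls (commented out so that sourcing this file has no effect):
# missing_programs gcc perl sed gnuplot
# have_perl_module PerlIO::gzip    || echo "missing PerlIO::gzip"
# have_perl_module GD::Graph::bars || echo "missing GD::Graph::bars"
```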


Installation
============

To compile the tools, run:

  $ ./configure
  $ make

To install the tools, run (as root):

  $ sudo make install

This will install the tools into /usr/local/bin.
To install the tools to a different location, change the 'configure' step to:

  $ ./configure --prefix=/DESTINATION/DIRECTORY
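
As an illustration of the '--prefix' option, the sketch below installs into a
directory under your home directory, so root access is not needed. The
"$HOME/fastx" location and both function names are assumptions made for this
example. With the usual autotools layout, programs end up in PREFIX/bin.

```shell
# Hypothetical install location; any directory you can write to will do.
PREFIX="${PREFIX:-$HOME/fastx}"

# Configure, build, and install into $PREFIX.
# Run this from the top of the source directory.
install_fastx() {
    ./configure --prefix="$PREFIX" && make && make install
}

# Print a PATH value that includes DIR/bin, adding it in front only if it is
# not already present.
path_with_prefix() {
    case ":$PATH:" in
        *":$1/bin:"*) printf '%s\n' "$PATH" ;;
        *)            printf '%s\n' "$1/bin:$PATH" ;;
    esac
}

# Example (commented out so that sourcing this file changes nothing):
# install_fastx && PATH=$(path_with_prefix "$PREFIX") && export PATH
```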



Command Line Usage
==================

Most tools support the "-h" argument, which shows a short help screen.
Better documentation is not available at this moment.
Some more details and examples are available in the <help> section
of the XML tool files (in the 'galaxy' subdirectory).
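
To read a <help> section without opening the XML file in an editor, something
like the following sketch can be used. The function name is made up, and it
assumes that the <help> and </help> tags each sit on their own line.

```shell
# Print the lines between <help> and </help> in an XML tool file.
# Assumes the opening and closing tags are each on a line of their own.
show_tool_help() {
    sed -n '/<help>/,/<\/help>/p' "$1" | sed '1d;$d'
}

# Example (commented out): show the help text of every XML file found
# under the 'galaxy' subdirectory.
# find galaxy -name '*.xml' | while read -r f; do
#     echo "== $f"; show_tool_help "$f"
# done
```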


Galaxy Installation
===================

Galaxy installation should be done manually, and requires a technical
understanding of the Galaxy framework.

1. Build and install the command-line tools (as described above).

2. Make a backup of your Galaxy installation (better safe than sorry).

3. Run the 'install_galaxy_files.sh' script,
   and specify the Galaxy root directory.
   This script copies the files from the 'galaxy' sub-directory into
   your Galaxy mirror directory.

4. Manually add the content of the ./galaxy/fastx_toolkit_conf.xml file
   to your Galaxy's tool_conf.xml.

5. Edit the [YOUR-GALAXY]/tool-data/fastx_clipper_sequences.txt file,
   and add your custom adapters/linkers.

6. Modify the "fastx_barcode_splitter_galaxy_wrapper.sh" script as explained
   below (see section "Special configuration for Barcode-Splitter").

7. Restart Galaxy.

Always make a backup of your Galaxy server files before trying to install
the FASTX-Toolkit.
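
One simple way to make the backup in step 2 is a timestamped tar archive. This
is only a sketch: the function name is invented here, and a real Galaxy server
may also need its database backed up separately.

```shell
# Create DIR-backup-YYYYMMDD-HHMMSS.tar.gz next to DIR and print its path.
backup_dir() {
    src=${1%/}
    archive="${src}-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
        printf '%s\n' "$archive"
}

# Example (commented out; /path/to/galaxy is a placeholder):
# backup_dir /path/to/galaxy
```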



Galaxy Testing
==============

The following tools support Galaxy's functional testing
(run from Galaxy's main directory):

  $ sh run_functional_tests.sh -id cshl_fastq_qual_conv
  $ sh run_functional_tests.sh -id cshl_fastq_to_fasta
  $ sh run_functional_tests.sh -id cshl_fastq_qual_stat
  $ sh run_functional_tests.sh -id cshl_fastx_trimmer
  $ sh run_functional_tests.sh -id cshl_fastx_reverse_complement
  $ sh run_functional_tests.sh -id cshl_fastx_artifacts_filter
  $ sh run_functional_tests.sh -id cshl_fasta_collapser
  $ sh run_functional_tests.sh -id cshl_fastx_clipper
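
The commands above can also be run in a loop. In the sketch below, the
function name and the DRY_RUN switch are conveniences invented for this
example; the tool IDs are the ones listed above.

```shell
# Tool IDs copied from the list of supported functional tests.
FASTX_TEST_IDS="cshl_fastq_qual_conv cshl_fastq_to_fasta cshl_fastq_qual_stat
cshl_fastx_trimmer cshl_fastx_reverse_complement cshl_fastx_artifacts_filter
cshl_fasta_collapser cshl_fastx_clipper"

# Run every test in turn (from Galaxy's main directory) and stop at the
# first failure. With DRY_RUN=1, only print the commands.
run_fastx_tests() {
    for id in $FASTX_TEST_IDS; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "sh run_functional_tests.sh -id $id"
        else
            sh run_functional_tests.sh -id "$id" || return 1
        fi
    done
}
```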


Special configuration for Barcode-Splitter
==========================================

When running the barcode-splitter tool from the command line, you specify a
prefix directory, and the output files are written to that directory (similar
to the usage of GNU's split program).
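
The idea of prefix-based output can be sketched with awk. This is not the
toolkit's splitter: the function name is invented, and the barcode file format
(a sample name and a barcode per line, separated by whitespace), the
5'-end-only exact match, the "unmatched" file name, and the single-line FASTA
sequences are all assumptions made for this example.

```shell
# split_by_barcode BARCODE_FILE PREFIX < input.fa
# Each record whose sequence starts with a listed barcode is appended to the
# file PREFIX<name>; every other record is appended to PREFIXunmatched.
split_by_barcode() {
    awk -v prefix="$2" '
        NR == FNR { if (NF >= 2) barcode[$1] = $2; next }  # load barcode table
        /^>/      { header = $0; next }                    # remember the header
        {
            out = prefix "unmatched"
            for (name in barcode)
                if (index($0, barcode[name]) == 1) { out = prefix name; break }
            print header >> (out)
            print $0 >> (out)
            close(out)
        }' "$1" -
}

# Example (commented out; file names are placeholders):
# split_by_barcode barcodes.txt /tmp/split_ < reads.fa
```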

Running the barcode-splitter inside Galaxy requires a special hack, because
Galaxy cannot (or I don't know how to make it) create a variable number of
output datasets.
The number of required output files is determined by the tool only AFTER reading
the barcodes description file.

The Galaxy version of Barcode-Splitter works like this:
1. A FASTA/FASTQ file and a barcode description file are fed to the tool.
2. The tool produces a single output dataset (inside Galaxy). This output
   is an HTML file containing links to the split FASTA files.
3. Users can use the links to get the split FASTA files.
   (Since Galaxy's 'upload data' tool accepts URLs, this is not a real problem.)

4. As the Galaxy administrator, you'll have to edit the
   'fastx_barcode_splitter_galaxy_wrapper.sh' script and change BASEPATH and
   PUBLICURL to point to a publicly accessible path on your server.

Example:

fastx_barcode_splitter_galaxy_wrapper.sh contains:

   BASEPATH="/media/sdb1/galaxy/barcode_splits/"
   PUBLICURL="http://tango.cshl.edu/barcode_splits/"

When a user runs the barcode splitter tool, the FASTA files will be generated in
"/media/sdb1/galaxy/barcode_splits/".
The URL "http://tango.cshl.edu/barcode_splits" is set (in an Apache server) to
serve files from "/media/sdb1/galaxy/barcode_splits/", with the following
configuration:

    Alias /barcode_splits "/media/sdb1/galaxy/barcode_splits/"
    <Directory "/media/sdb1/galaxy/barcode_splits/">
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
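
The relationship between BASEPATH and PUBLICURL can be sketched as follows.
This only illustrates the mapping and is not the code of the wrapper script:
both function names and the HTML markup are invented for this example, and
the two values are the ones from the example above.

```shell
BASEPATH="/media/sdb1/galaxy/barcode_splits/"
PUBLICURL="http://tango.cshl.edu/barcode_splits/"

# Translate a file path under BASEPATH into the URL that serves that file.
public_url_for() {
    case "$1" in
        "$BASEPATH"*) printf '%s%s\n' "$PUBLICURL" "${1#"$BASEPATH"}" ;;
        *) echo "not under BASEPATH: $1" >&2; return 1 ;;
    esac
}

# Print one HTML link for a split file (hypothetical markup).
html_link_for() {
    url=$(public_url_for "$1") || return 1
    printf '<a href="%s">%s</a><br>\n' "$url" "$(basename "$1")"
}
```

If the two settings do not point at the same directory, the links in the
HTML output will not resolve, which is why both must be changed together.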




Licenses
========

FASTX-Toolkit is distributed under the Affero GPL version 3 or later (AGPLv3),

EXCEPT

All files under the 'galaxy' sub-directory are distributed under the
same license as Galaxy itself (which is an MIT-style license).


While IANAL, these licenses basically mean that:
1. You're free to use FASTX-toolkit.

2. You're free to integrate FASTX-toolkit in your Galaxy mirror server
   (or any other server).

3. You're free to modify the files under 'galaxy',
   without making your modifications public.

4. If you modify the FASTX-toolkit tools and make those modifications
   publicly available (either as downloadable tools, as part of another
   product, or as a web-based server), you must make the modified source code
   freely available (free as in speech).

See the COPYING file for the full Affero GPL.
See the GALAXY-LICENSE file for Galaxy's license.

Please remember:
  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.


=============
Please send all comments, suggestions, bug reports (or better yet - bug fixes)
to assafgordon@gmail.com .
