Yara - Yet another read aligner
===============================


Overview
--------

Yara is an *exact* tool for aligning DNA sequencing reads to reference genomes.

Main features
~~~~~~~~~~~~~

* Exhaustive enumeration of sub-*optimal* end-to-end alignments under the edit distance.
* Excellent speed, memory footprint and accuracy.
* Accurate mapping quality computation.
* Support for reference genomes consisting of millions of contigs.
* Direct output in SAM/BAM format.

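The edit distance is the minimum number of substitutions, insertions and deletions needed to turn one sequence into another. An end-to-end alignment places the whole read, from its first to its last base, against some substring of the reference. The Python sketch below illustrates that scoring model with the textbook dynamic program. It is not how Yara searches, since Yara works on a reference index, and the function name ``best_end_to_end_distance`` is hypothetical.

```python
def best_end_to_end_distance(read: str, ref: str) -> int:
    """Smallest edit distance between the whole read and any substring of ref."""
    # prev[j] holds the best distance of the current read prefix against a
    # reference substring ending at ref[j - 1].
    # Row 0 is all zeros: the alignment may start anywhere in the reference.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: i read bases aligned to nothing cost i insertions.
        curr = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # insertion: read base with no reference base
                curr[j - 1] + 1,     # deletion: reference base with no read base
            )
        prev = curr
    # The alignment may also end anywhere in the reference.
    return min(prev)


print(best_end_to_end_distance("ACGA", "TTACGTTT"))  # 1 (one substitution vs. ACGT)
```

The running time grows with the product of the read and reference lengths. That cost makes this direct approach impractical against a whole genome, which is one reason read mappers narrow down candidate locations with an index first.
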
Supported data
~~~~~~~~~~~~~~

Yara has been tested on DNA reads (i.e., Whole Genome, Exome, ChIP-seq, MeDIP-seq) produced by the following sequencing platforms:

* Illumina GA II, HiSeq and MiSeq (single-end and paired-end).
* Life Technologies Ion Torrent Proton and PGM.

Quality trimming is *necessary* for Ion Torrent reads and recommended for Illumina reads.

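Quality trimming removes low-quality bases, typically from the 3' end of each read, before mapping. The sketch below is a hypothetical illustration only. It trims trailing bases whose Phred quality falls below a threshold, assuming FASTQ qualities encoded with the common ASCII offset of 33. The function name, the offset and the default threshold of 20 are assumptions, not Yara parameters. Dedicated trimming tools apply more elaborate rules, such as sliding-window averages.

```python
def trim_3prime(seq: str, qual: str, min_qual: int = 20, offset: int = 33):
    """Drop trailing bases whose Phred quality is below min_qual (assumed Phred+33)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the decoded base quality is too low.
    while end > 0 and ord(qual[end - 1]) - offset < min_qual:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3prime("ACGTAC", "IIII##"))  # ('ACGT', 'IIII')
```
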
Unsupported data
~~~~~~~~~~~~~~~~

* RNA-seq reads spanning splicing sites.
* Long noisy reads (e.g., Pacific Biosciences RSII, Oxford Nanopore MinION).

Installation from sources
-------------------------

The following instructions assume Linux or OS X. For more information, including Windows instructions, refer to the `SeqAn getting started tutorial <http://trac.seqan.de/wiki/Tutorial/GettingStarted>`_.

Software requirements
~~~~~~~~~~~~~~~~~~~~~

**A modern C++11 compiler with OpenMP 3.0 extensions is required to build Yara. If unsure, use GNU G++ 4.9 or newer.**

* Git.
* CMake 3.2 or newer.
* G++ 4.9 or newer.

Download
~~~~~~~~

Yara sources are hosted on GitHub within the SeqAn library. Download the sources by executing:

::

  $ git clone https://github.com/seqan/seqan.git

Configuration
~~~~~~~~~~~~~

Create a build project by executing CMake as follows:

::

  $ mkdir yara-build
  $ cd yara-build
  $ cmake ../seqan -DSEQAN_BUILD_SYSTEM=APP:yara -DCMAKE_CXX_COMPILER=/usr/bin/g++-4.9

Build
~~~~~

Invoke make as follows:

::

  $ make all

Installation
~~~~~~~~~~~~

Copy the binaries to a folder in your *PATH*, e.g.:

::

  # cp bin/yara* /usr/local/bin


Usage
-----

Yara consists of two executables:

* **yara_indexer** builds the index of a reference genome.
* **yara_mapper** maps DNA reads to the indexed reference genome.

This document explains only basic usage. For complete usage descriptions, invoke each tool with ``-h`` or ``--help``.

Indexer
~~~~~~~

Index a reference genome *REF.fasta.gz* by executing:

::

  $ yara_indexer REF.fasta.gz -o REF.index

**The indexer needs at least 25 times the space of the uncompressed reference genome.**
Make sure that this much space is available in the output folder.
Indexing the human reference genome takes about one to two hours.
On success, the tool creates various files called ``REF.index.*``.

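Here is a worked example of the space requirement. The human reference genome has roughly 3.1 billion bases, so its uncompressed FASTA occupies roughly 3.1 GB (about one byte per base, ignoring headers and line breaks). Twenty-five times that is about 78 GB. The Python sketch below applies the same 25x rule of thumb as a check before launching the indexer. The helper names are hypothetical. ``shutil.disk_usage`` reports the free space of the filesystem holding the given folder.

```python
import shutil

# Rule of thumb from this README: the indexer needs at least 25 times
# the size of the uncompressed reference genome.
SPACE_FACTOR = 25


def required_index_space(uncompressed_bytes: int, factor: int = SPACE_FACTOR) -> int:
    """Bytes the indexer is expected to need in its output folder."""
    return factor * uncompressed_bytes


def has_room_for_index(uncompressed_bytes: int, out_dir: str = ".") -> bool:
    """True if the filesystem holding out_dir has at least that much free space."""
    return shutil.disk_usage(out_dir).free >= required_index_space(uncompressed_bytes)


# Worked example: about 3.1 GB of uncompressed human FASTA.
human_bytes = 3_100_000_000
print(required_index_space(human_bytes) / 1e9, "GB")  # 77.5 GB
```

You can measure the uncompressed size of *REF.fasta.gz* without unpacking it to disk. One way is to stream the file through ``gzip.open`` and sum the lengths of the chunks read.
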
**The indexer does not work over GPFS and may have problems on other network filesystems.**

Mapper
~~~~~~

Single-end reads
^^^^^^^^^^^^^^^^

Map single-end DNA reads to the indexed reference genome by executing:

::

  $ yara_mapper REF.index READS.fastq.gz -o READS.bam

By default, the tool reports all co-optimal mapping locations per read within an error rate of 5%.
The results are stored in a BAM file called *READS.bam*.

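The error rate is relative to the read length. Suppose the error budget is the read length times the error rate, rounded down (this README does not state the exact rounding). Then a 100 bp read may carry up to 5 errors at 5%. The Python sketch below keeps the candidate hits within that budget and returns those whose edit distance equals the smallest one found, i.e., the co-optimal locations. Their count corresponds to the X0 tag. The names, the record layout and the candidate hits are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit record: (contig, begin, strand, edit distance).
Hit = Tuple[str, int, str, int]


def max_errors(read_length: int, error_percent: int = 5) -> int:
    """Error budget, assuming read length times error rate, rounded down."""
    return read_length * error_percent // 100


def co_optimal(hits: List[Hit], read_length: int, error_percent: int = 5) -> List[Hit]:
    """Keep hits within the error budget that share the smallest edit distance."""
    budget = max_errors(read_length, error_percent)
    admissible = [h for h in hits if h[3] <= budget]
    if not admissible:
        return []  # no location within the budget: the read stays unmapped
    best = min(h[3] for h in admissible)
    return [h for h in admissible if h[3] == best]


hits = [("chr1", 1000, "+", 2), ("chr7", 5230, "-", 2),
        ("chr2", 88, "+", 4), ("chrX", 12, "+", 9)]
print(len(co_optimal(hits, read_length=100)))  # 2
```
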
Paired-end reads
^^^^^^^^^^^^^^^^

Map paired-end reads by providing two DNA read files:

::

  $ yara_mapper REF.index READS_1.fastq.gz READS_2.fastq.gz -o READS.bam

Output format
^^^^^^^^^^^^^

Output files follow the `SAM/BAM format specification <http://samtools.github.io/hts-specs/SAMv1.pdf>`_.
In addition, Yara generates the following optional tags:

+-----+----------------------------------------------------+
| Tag | Meaning                                            |
+=====+====================================================+
| NM  | Edit distance                                      |
+-----+----------------------------------------------------+
| X0  | Number of co-optimal mapping locations             |
+-----+----------------------------------------------------+
| X1  | Number of sub-optimal mapping locations            |
+-----+----------------------------------------------------+
| XA  | Alternative locations: (chr,begin,end,strand,NM;)* |
+-----+----------------------------------------------------+

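The XA value packs each alternative location as a comma-separated record terminated by a semicolon. The Python sketch below splits such a value into records. It assumes contig names contain no commas or semicolons and keeps the strand exactly as written. The table above does not say whether begin and end are 0- or 1-based, so the sketch leaves them as plain integers. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class AltLocation(NamedTuple):
    chrom: str
    begin: int
    end: int
    strand: str
    nm: int


def parse_xa(value: str) -> List[AltLocation]:
    """Split an XA value of the form (chr,begin,end,strand,NM;)* into records."""
    locations = []
    for entry in value.split(";"):
        if not entry:
            continue  # the trailing ';' leaves an empty last field
        chrom, begin, end, strand, nm = entry.split(",")
        locations.append(AltLocation(chrom, int(begin), int(end), strand, int(nm)))
    return locations


alts = parse_xa("chr1,1000,1100,+,2;chr7,5230,5330,-,3;")
for alt in alts:
    print(alt.chrom, alt.begin, alt.end, alt.strand, alt.nm)
```

When reading BAM files with pysam, you can obtain the raw tag value of an alignment record with ``AlignedSegment.get_tag("XA")`` and pass it to ``parse_xa``.
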

Contact
-------

For questions or comments, feel free to contact: Enrico Siragusa <enrico.siragusa@fu-berlin.de>


References
----------

1. Siragusa, E. (2015). Approximate string matching for high-throughput sequencing. PhD Dissertation, Free University of Berlin.
2. Siragusa, E., Weese, D., and Reinert, K. (2013). Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Research, 1–8.
