# Graph-Based Genome Alignment and Genotyping with HISAT2 and HISAT-genotype

## Contact

[Daehwan Kim](https://kim-lab.org) (infphilo@gmail.com) and [Chanhee Park](https://www.linkedin.com/in/chanhee-park-97677297/) (parkchanhee@gmail.com)

## Abstract

Rapid advances in next-generation sequencing technologies have dramatically changed our ability to perform genome-scale analyses. The human reference genome used for most genomic analyses represents only a small number of individuals, which limits its usefulness for genotyping. We designed a novel method, HISAT2, for representing and searching an expanded model of the human reference genome, in which a large catalogue of known genomic variants and haplotypes is incorporated into the data structure used for searching and alignment. This strategy for representing a population of genomes, along with a fast and memory-efficient search algorithm, enables more detailed and accurate variant analyses than previous methods. We demonstrate two initial applications of HISAT2: HLA typing, a critical need in human organ transplantation, and DNA fingerprinting, widely used in forensics. These applications are part of HISAT-genotype, whose performance not only surpasses earlier computational methods but also matches or exceeds the accuracy of laboratory-based assays.

![](HISAT2-genotype.png)

For more information, see the following websites:
* [HISAT2 website](http://ccb.jhu.edu/software/hisat2)
* [HISAT-genotype website](http://ccb.jhu.edu/software/hisat-genotype)

## HISAT2
HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (from whole-genome, transcriptome, and exome sequencing) to a population of human genomes as well as to a single reference genome. Based on an extension of the BWT to graphs [1], we designed and implemented a graph FM index (GFM); to the best of our knowledge, this is an original approach and its first implementation. In addition to one global GFM index that represents the general population, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index represents a 56-Kbp genomic region, and 55,000 indexes are needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable effective alignment of sequencing reads. We call this indexing scheme the Hierarchical Graph FM index (HGFM). We developed HISAT2 based on the HISAT [2] and Bowtie 2 [3] implementations. See the [HISAT2 website](http://ccb.jhu.edu/software/hisat2/index.shtml) for more information.
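
The GFM generalizes the FM index, which is built on the Burrows-Wheeler transform (BWT), from a single string to a graph. As background only, the Python sketch below shows backward search on an ordinary linear FM index. It is not HISAT2's GFM or HGFM code: it uses a naive suffix-array construction, and the names `build_fm_index` and `backward_search` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM index over text; a '$' sentinel (the smallest symbol) is appended."""
    text += "$"
    # Naive suffix array: fine for illustration, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are lexicographically smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(index, pattern):
    """Return the sorted start positions of pattern, matching from its last character."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return []
        # LF-mapping: keep only suffixes that start with c followed by the part matched so far.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("ACGTACGA")
print(backward_search(index, "ACG"))  # [0, 4]
```

Each pattern character narrows the interval with two table lookups, so query time depends on pattern length rather than genome length; a graph FM index keeps this property while also encoding paths through variants.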

A few notes:

1) HISAT2's index (HGFM) for the human reference genome and 12.3 million common SNPs is 6.2 GB in size. The SNPs consist of 11 million single nucleotide polymorphisms, 728,000 deletions, and 555,000 insertions. The insertions and deletions used in this index are small (usually <20 bp). We plan to incorporate structural variations (SVs) into this index.

2) The memory footprint of HISAT2 is relatively low, at 6.7 GB.

3) HISAT2 is estimated to run somewhat slower than HISAT (30–100% slower for some data sets).

4) HISAT2 provides greater accuracy for alignment of reads containing SNPs.

5) We released the first (beta) version of HISAT2 on September 8, 2015.

## License

[GPL-3.0](LICENSE)

# For reviewers: reproducing some of the results in the manuscript

## Code

We use a specific version of [HISAT2 and HISAT-genotype](http://github.com/infphilo/hisat2) from GitHub (branch name: hisat2_v2.2.0_beta).

## Initial setup

HISAT-genotype requires a 64-bit computer running either Linux or Mac OS X and at least 8 GB of RAM (16 GB is preferred). All commands should be run from a Unix shell prompt in a terminal window; in the examples below, each command is preceded by a prompt ending in a '$' character.

We refer to <b>hisat-genotype-top</b> as the top directory where all of our programs are located. <b>hisat-genotype-top</b> is a placeholder that can be renamed according to user preference.
Run the following commands to install HISAT2 and HISAT-genotype.

    $ git clone https://github.com/infphilo/hisat2 hisat-genotype-top
    $ cd hisat-genotype-top
    hisat-genotype-top$ git checkout hisat2_v2.2.0_beta
    hisat-genotype-top$ make hisat2-align-s hisat2-build-s hisat2-inspect-s

To make the binaries built above and the other Python scripts available from any directory, add the following lines to your shell startup file (e.g. ~/.bashrc), replacing hisat-genotype-top with the absolute path to that directory:

    export PATH=hisat-genotype-top:hisat-genotype-top/hisatgenotype_scripts:$PATH
    export PYTHONPATH=hisat-genotype-top/hisatgenotype_modules:$PYTHONPATH

To apply the change to your current shell, run the following command:

    $ source ~/.bashrc
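
After sourcing the startup file, a quick check can confirm that the tools named in this guide are reachable. The sketch below uses Python's standard `shutil.which`; the helper names are hypothetical, and the tool list (including samtools, which is needed later for HLA typing) is an assumption you may adjust. It also flags relative PATH entries, which are resolved against the current working directory and therefore only work from some locations.

```python
import os
import shutil

# Executables mentioned in this guide (assumed list; adjust as needed).
TOOLS = ["hisat2", "hisat2-build", "hisat2-inspect", "hisatgenotype_locus.py", "samtools"]


def missing_tools(names=TOOLS):
    """Return the names that cannot be found as executables on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


def relative_path_entries(path_value=None):
    """Return PATH entries that are not absolute (they depend on the current directory)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [p for p in path_value.split(os.pathsep) if p and not os.path.isabs(p)]


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("relative PATH entries:", relative_path_entries() or "none")
```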

Download real reads, simulated reads, and HISAT2 indexes, then move them into appropriate directories:

    hisat-genotype-top$ cd evaluation
    hisat-genotype-top/evaluation$ wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/hisat2_20181025.tar.gz
    hisat-genotype-top/evaluation$ tar xvzf hisat2_20181025.tar.gz
    hisat-genotype-top/evaluation$ mkdir aligners aligners/bin; cd aligners/bin; ln -s ../../../hisat2* .; cd ../..
    hisat-genotype-top/evaluation$ mv hisat2/* .
    hisat-genotype-top/evaluation$ cd simulation; ./init.py; cd ../real; ./init.py; cd ..

## Run HISAT2 on the following simulated and real data sets
### 10 million simulated read pairs with SNPs and with sequencing errors

    hisat-genotype-top/evaluation$ cd simulation/10M_DNA_mismatch_snp_reads_genome
    hisat-genotype-top/evaluation/simulation/10M_DNA_mismatch_snp_reads_genome$ ./calculate_read_cost.py --aligner-list hisat2 --paired-end --fresh

### 10 million simulated read pairs with SNPs and without sequencing errors

    hisat-genotype-top/evaluation$ cd simulation/10M_DNA_snp_reads_genome
    hisat-genotype-top/evaluation/simulation/10M_DNA_snp_reads_genome$ ./calculate_read_cost.py --aligner-list hisat2 --paired-end --fresh

### 10 million simulated read pairs without SNPs and with sequencing errors

    hisat-genotype-top/evaluation$ cd simulation/10M_DNA_mismatch_reads_genome
    hisat-genotype-top/evaluation/simulation/10M_DNA_mismatch_reads_genome$ ./calculate_read_cost.py --aligner-list hisat2 --paired-end --fresh

### 10 million simulated read pairs without SNPs and without sequencing errors

    hisat-genotype-top/evaluation$ cd simulation/10M_DNA_reads_genome
    hisat-genotype-top/evaluation/simulation/10M_DNA_reads_genome$ ./calculate_read_cost.py --aligner-list hisat2 --paired-end --fresh

### 10 million real read pairs

    hisat-genotype-top/evaluation$ cd real/DNA/10M
    hisat-genotype-top/evaluation/real/DNA/10M$ ./calculate_read_cost.py --aligner-list hisat2 --paired-end --fresh
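
The five runs above differ only in their working directory. A small Python driver, sketched below on the assumption that it is started from hisat-genotype-top/evaluation after the setup steps, can run them in sequence. `plan_runs` and `run_all` are hypothetical helper names; the arguments are copied unchanged from the commands above.

```python
import subprocess
from pathlib import Path

# Data set directories, relative to hisat-genotype-top/evaluation (taken from the commands above).
DATASETS = [
    "simulation/10M_DNA_mismatch_snp_reads_genome",
    "simulation/10M_DNA_snp_reads_genome",
    "simulation/10M_DNA_mismatch_reads_genome",
    "simulation/10M_DNA_reads_genome",
    "real/DNA/10M",
]
ARGV = ["./calculate_read_cost.py", "--aligner-list", "hisat2", "--paired-end", "--fresh"]


def plan_runs(evaluation_dir):
    """Return (working directory, argument list) pairs, one per data set."""
    return [(Path(evaluation_dir) / d, list(ARGV)) for d in DATASETS]


def run_all(evaluation_dir=".", dry_run=True):
    """Print each command; execute it in its data set directory unless dry_run is True."""
    for cwd, argv in plan_runs(evaluation_dir):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all(".", dry_run=True)
```

The driver defaults to a dry run that only prints the commands; set `dry_run=False` once they look right, keeping in mind that each run can take several hours.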

### Interpreting output

    Example alignment output for simulated reads
    aligned: 1000000, multi aligned: 2654390
        correctly mapped: 999963 (100.00%)
        uniquely and correctly mapped: 967631 (96.76%)
            54694 reads per sec (all)
            Memory Usage: 86MB

The above lines show that 1,000,000 read pairs are aligned and that the total number of alignments is 2,654,390. Of these pairs, 999,963 (100.00% after rounding) are correctly aligned (i.e., at least one of their alignments is correct), and 967,631 (96.76%) are uniquely and correctly aligned. HISAT2 aligns 54,694 reads per second with a peak memory usage of 86 MB of RAM.
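
The percentages in this summary can be recomputed from the raw counts, assuming they are relative to the number of aligned pairs (which reproduces the values shown). The Python sketch below parses the example output; `parse_summary` is a hypothetical helper, and its regular expressions assume exactly the field labels shown above, which may differ in other versions of the evaluation scripts.

```python
import re

EXAMPLE = """\
aligned: 1000000, multi aligned: 2654390
    correctly mapped: 999963 (100.00%)
    uniquely and correctly mapped: 967631 (96.76%)
        54694 reads per sec (all)
        Memory Usage: 86MB
"""


def parse_summary(text):
    """Extract the counts from a summary and recompute the derived values."""
    m = re.search(r"^\s*aligned: (\d+), multi aligned: (\d+)", text, re.M)
    aligned, alignments = int(m.group(1)), int(m.group(2))
    # Anchor at line start so this does not match "uniquely and correctly mapped".
    correct = int(re.search(r"^\s*correctly mapped: (\d+)", text, re.M).group(1))
    unique = int(re.search(r"uniquely and correctly mapped: (\d+)", text).group(1))
    return {
        "aligned_pairs": aligned,
        "alignments_per_pair": alignments / aligned,
        "correct_pct": 100.0 * correct / aligned,
        "unique_correct_pct": 100.0 * unique / aligned,
    }


summary = parse_summary(EXAMPLE)
print(f"{summary['correct_pct']:.2f}% correct, {summary['unique_correct_pct']:.2f}% unique and correct")
# 100.00% correct, 96.76% unique and correct
```

The unrounded correct rate is 99.9963%, which is why the output reports 100.00% even though 37 pairs have no correct alignment.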

Each run may take up to several hours, mostly because HISAT2's reported alignments are compared against the true alignments and repeat alignments are expanded.

## Details on running HISAT-genotype for HLA typing and assembly

Create a directory in which to perform the HLA typing and assembly analysis. Here it is called hla-analysis, but any name can be used:

    hisat-genotype-top/evaluation$ mkdir hla-analysis

Then change the current directory to hla-analysis:

    hisat-genotype-top/evaluation$ cd hla-analysis

Additional program requirement: SAMtools (version 1.3 or later).

### Downloading a Graph Reference and Index
The graph reference used here incorporates variants of numerous HLA alleles into the linear reference using a graph. It also includes some known variants from other regions of the genome (e.g. common small variants). To move the graph reference and its index into the current directory, type:

    hisat-genotype-top/evaluation/hla-analysis$ mv ../hisat2-genotype/* .

### Typing and Assembly
Because whole-genome sequencing (WGS) data contain reads from the entire genome, the first step is to extract the reads that belong to the HLA genes by aligning all reads to the graph reference with HISAT2. We provide these extracted reads in hisat-genotype-top/evaluation/hla-analysis/ILMN_20181025.

HISAT-genotype performs both HLA typing and assembly.
For example, the following command performs HLA typing and assembly of the HLA-A gene using reads from genome NA12892 (sequenced on Illumina's HiSeq 2000 platform):

    hisat-genotype-top/evaluation/hla-analysis$ hisatgenotype_locus.py --base hla --locus-list A --assembly -1 ILMN_20181025/NA12892.hla.extracted.1.fq.gz -2 ILMN_20181025/NA12892.hla.extracted.2.fq.gz

### DNA Fingerprinting
DNA fingerprinting uses the same command as in “Typing and Assembly”, with `--base hla` replaced by `--base codis`.
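
When typing several samples, it can help to build these command lines programmatically. The Python sketch below reuses only the options shown above (`--base`, `--locus-list`, `--assembly`, `-1`, `-2`). The helpers `genotype_command` and `run_genotyping` are hypothetical, and the `<sample>.hla.extracted.{1,2}.fq.gz` naming is assumed from the NA12892 example; check both against the script's usage message before relying on them.

```python
import shlex
import subprocess


def genotype_command(sample, base="hla", locus="A", read_dir="ILMN_20181025", assembly=True):
    """Build the hisatgenotype_locus.py argument list for one paired-end sample.

    The file naming follows the NA12892 example and is an assumption for other samples.
    """
    argv = ["hisatgenotype_locus.py", "--base", base, "--locus-list", locus]
    if assembly:
        argv.append("--assembly")
    argv += ["-1", f"{read_dir}/{sample}.hla.extracted.1.fq.gz",
             "-2", f"{read_dir}/{sample}.hla.extracted.2.fq.gz"]
    return argv


def run_genotyping(sample, dry_run=True, **options):
    """Print the shell-quoted command; run it only when dry_run is False."""
    cmd = genotype_command(sample, **options)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_genotyping("NA12892")
```

Passing `base="codis"` mirrors the DNA fingerprinting substitution, but suitable locus names and read files for that case are not specified in this guide.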

### Interpreting Output

    Typing Output
    Number of reads aligned: 1507
      1 A*02:01:01:02L (count: 571)
      2 A*02:01:31 (count: 557)
      3 A*02:20:02 (count: 557)
      4 A*02:29 (count: 557)
      5 A*02:321N (count: 556)
      6 A*02:372 (count: 556)
      7 A*02:610:02 (count: 556)
      8 A*02:249 (count: 555)
      9 A*02:479 (count: 555)
      10 A*02:11:01 (count: 554)

The above lines show the top ten alleles, ranked by the number of reads mapped to or compatible with each. For example, the top-ranked allele, A\*02:01:01:02L, is compatible with 571 reads. This raw count should not be used to determine the two true alleles, because alleles that resemble both true alleles without being either of them are often compatible with more reads than either true allele. Thus, we apply a statistical model, described in the main text, to identify the two true alleles.
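
The pitfall described above can be illustrated with a toy example. The Python sketch below is not HISAT-genotype's statistical model (see the main text for that); it is a generic expectation-maximization (EM) estimate of allele abundances, assuming each read is equally likely to come from any allele it is compatible with. Allele C stands for a hypothetical allele that resembles both true alleles A and B: it is compatible with the most reads, yet EM assigns it the lowest abundance.

```python
from collections import Counter


def em_abundance(read_sets, max_iter=1000, tol=1e-10):
    """Estimate allele abundances from per-read compatibility sets with EM.

    read_sets: list of sets, each holding the alleles one read is compatible with.
    Simplifying assumption: a read is equally likely to come from any compatible
    allele, weighted only by that allele's current abundance.
    """
    alleles = sorted(set().union(*read_sets))
    theta = {a: 1.0 / len(alleles) for a in alleles}  # start from uniform abundances
    for _ in range(max_iter):
        # E-step: split each read among its compatible alleles in proportion to theta.
        expected = dict.fromkeys(alleles, 0.0)
        for compat in read_sets:
            total = sum(theta[a] for a in compat)
            for a in compat:
                expected[a] += theta[a] / total
        # M-step: new abundances are the normalized expected read counts.
        new_theta = {a: expected[a] / len(read_sets) for a in alleles}
        change = max(abs(new_theta[a] - theta[a]) for a in alleles)
        theta = new_theta
        if change < tol:
            break
    return theta


# Toy data: 6 reads fit A or C, 6 fit B or C, 4 fit only A, 4 fit only B.
reads = [{"A", "C"}] * 6 + [{"B", "C"}] * 6 + [{"A"}] * 4 + [{"B"}] * 4
raw = Counter(a for compat in reads for a in compat)
print(raw["A"], raw["B"], raw["C"])  # 10 10 12: C is compatible with the most reads
abundance = em_abundance(reads)
print({a: round(p, 3) for a, p in abundance.items()})  # A and B near 0.4, C near 0.2
```

Raw counts rank C first, while the maximum-likelihood abundances rank A and B first, which is the reversal the paragraph above warns about.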

    Abundance of alleles
      1 ranked A*02:01:01:01 (abundance: 54.32%)
      2 ranked A*11:01:01:01 (abundance: 45.20%)
      3 ranked A*24:33 (abundance: 0.48%)

The above rankings show the three most abundant alleles in the sample. Normally, the top two alleles in this estimate (e.g. A\*02:01:01:01 and A\*11:01:01:01) are taken to be the two alleles that best match the given sequencing data.
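
Picking the two alleles from this output amounts to taking the two highest-ranked entries. The sketch below parses lines in the format shown above; `top_two_alleles` is a hypothetical helper, and its pattern assumes the exact `N ranked ALLELE (abundance: X%)` layout.

```python
import re

ABUNDANCE_EXAMPLE = """\
Abundance of alleles
  1 ranked A*02:01:01:01 (abundance: 54.32%)
  2 ranked A*11:01:01:01 (abundance: 45.20%)
  3 ranked A*24:33 (abundance: 0.48%)
"""

# One match per ranked line: rank, allele name, abundance percentage.
LINE = re.compile(r"^\s*(\d+) ranked (\S+) \(abundance: ([\d.]+)%\)", re.M)


def top_two_alleles(text):
    """Return the two highest-ranked (allele, percent) pairs, ordered by rank."""
    entries = sorted((int(rank), allele, float(pct)) for rank, allele, pct in LINE.findall(text))
    return [(allele, pct) for _, allele, pct in entries[:2]]


print(top_two_alleles(ABUNDANCE_EXAMPLE))
# [('A*02:01:01:01', 54.32), ('A*11:01:01:01', 45.2)]
```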

Additional tutorials and details are available at the HISAT-genotype website: https://ccb.jhu.edu/hisat-genotype

## Data

The Data directory (`/data`) contains all input files needed to reproduce some of our results: the evaluation of HISAT2 and other programs on simulated and real reads, HLA typing and assembly of the Illumina Platinum Genomes with HISAT-genotype, and building a HISAT2 graph index.

* **Simulated read pairs**

| Type | Number of pairs | Path |
| --- | --- | --- |
| SNPs and sequencing errors included | 10,000,000 | hisat-genotype-top/evaluation/reads/simulation/10M_DNA_mismatch_snp_reads_genome |
| SNPs included | 10,000,000 | hisat-genotype-top/evaluation/reads/simulation/10M_DNA_snp_reads_genome |
| Sequencing errors included | 10,000,000 | hisat-genotype-top/evaluation/reads/simulation/10M_DNA_mismatch_reads_genome |
| Neither SNPs nor sequencing errors included | 10,000,000 | hisat-genotype-top/evaluation/reads/simulation/10M_DNA_reads_genome |

Each directory comes with a true alignment file in SAM format, recording where in the human reference genome each read was generated.

* **Real read pairs**

| Number of read pairs | Path |
| --- | --- |
| 10,000,000 | hisat-genotype-top/evaluation/reads/real/DNA |

* **Human reference genome, SNPs, haplotypes, and HISAT2's indexes**

| Type | Path |
| --- | --- |
| GRCh38 reference | hisat-genotype-top/evaluation/data/genome.fa |
| SNPs | hisat-genotype-top/evaluation/data/genome.snp |
| Haplotypes | hisat-genotype-top/evaluation/data/genome.haplotype |
| HISAT2's prebuilt graph index for comparison with other aligners | hisat-genotype-top/evaluation/indexes/HISAT2/genome.[1-8].ht2 |
| HISAT2's prebuilt graph index for genotyping | hisat-genotype-top/evaluation/hla-analysis/genotype_genome.[1-8].ht2 |
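
Each prebuilt index above is split across eight files, `<prefix>.1.ht2` through `<prefix>.8.ht2`. A quick completeness check before running the aligner is sketched below with Python's `pathlib`; `missing_index_files` is a hypothetical helper, the count of eight comes from the `[1-8]` pattern in the table, and hisat-genotype-top should be replaced with your actual top directory.

```python
from pathlib import Path


def missing_index_files(prefix, n_files=8):
    """Return the expected <prefix>.<i>.ht2 files (i = 1..n_files) that do not exist."""
    expected = [Path(f"{prefix}.{i}.ht2") for i in range(1, n_files + 1)]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    for prefix in ("hisat-genotype-top/evaluation/indexes/HISAT2/genome",
                   "hisat-genotype-top/evaluation/hla-analysis/genotype_genome"):
        missing = missing_index_files(prefix)
        print(prefix, "OK" if not missing else f"missing {len(missing)} file(s)")
```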

## References

[1] Sirén J, Välimäki N, Mäkinen V. Indexing graphs for path queries with applications in genome research. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;11:375–388. doi: 10.1109/tcbb.2013.2297101

[2] Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015.

[3] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9:357–359.