[![Build Status](https://travis-ci.org/lh3/bwa.svg?branch=dev)](https://travis-ci.org/lh3/bwa)

## Getting started

    git clone https://github.com/lh3/bwa.git
    cd bwa; make
    ./bwa index ref.fa
    ./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
    ./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz

## Introduction

BWA is a software package for mapping DNA sequences against a large reference
genome, such as the human genome. It consists of three algorithms:
BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina
sequence reads up to 100bp, while the other two are designed for longer
sequences ranging from 70bp to a few megabases. BWA-MEM and BWA-SW share
similar features such as support for long reads and chimeric alignment, but
BWA-MEM, the latest of the three, is generally recommended because it is faster
and more accurate. BWA-MEM also performs better than BWA-backtrack on 70-100bp
Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the
reference genome (the **index** command). Alignment algorithms are invoked with
different sub-commands: **aln/samse/sampe** for BWA-backtrack,
**bwasw** for BWA-SW and **mem** for the BWA-MEM algorithm.

## Availability

BWA is released under [GPLv3][1]. The latest source code is [freely
available on GitHub][2]. Released packages can [be downloaded][3] from
SourceForge. After you acquire the source code, simply use `make` to compile
and copy the single executable `bwa` to the destination you want. The only
dependency required to build BWA is [zlib][14].

Since 0.7.11, a precompiled binary for x86\_64-linux is available in [bwakit][17].
In addition to BWA, this self-contained package also comes with bwa-associated
and 3rd-party tools for proper BAM-to-FASTQ conversion, mapping to ALT contigs,
adapter trimming, duplicate marking, HLA typing and associated data files.
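The FM-index built by `bwa index` is based on the Burrows-Wheeler transform (BWT). As a toy illustration only, the BWT of a short string can be obtained by appending a `$` sentinel, sorting all rotations of the string, and taking the last character of each sorted rotation. The function name `toy_bwt` is hypothetical and not part of BWA. BWA itself is written in C and uses far more efficient construction algorithms than this quadratic sort.

```bash
#!/usr/bin/env bash
# Toy Burrows-Wheeler transform by sorting all rotations (quadratic; illustration only).
# toy_bwt is a hypothetical helper, not part of BWA.
toy_bwt() {
  local s="$1\$" n i        # append the '$' sentinel, which sorts first in the C locale
  n=${#s}
  for (( i = 0; i < n; i++ )); do
    # Emit the i-th rotation of s on its own line.
    printf '%s%s\n' "${s:i}" "${s:0:i}"
  done | LC_ALL=C sort | while IFS= read -r rot; do
    # The BWT is the last column of the sorted rotation matrix.
    printf '%s' "${rot: -1}"
  done
  printf '\n'
}

toy_bwt banana   # prints annb$aa
```

Sorting with `LC_ALL=C` keeps the comparison byte-wise, so the sentinel sorts before every base letter, as the transform requires.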

## Seeking help

The detailed usage is described in the man page available together with the
source code. You can use `man ./bwa.1` to view the man page in a terminal. The
[HTML version][4] of the man page can be found at the [BWA website][5]. If you
have questions about BWA, you may [sign up for the mailing list][6] and then
send the questions to [bio-bwa-help@sourceforge.net][7]. You may also ask
questions in forums such as [BioStar][8] and [SEQanswers][9].

## Citing BWA

* Li H. and Durbin R. (2009) Fast and accurate short read alignment with
  Burrows-Wheeler transform. *Bioinformatics*, **25**, 1754-1760. [PMID:
  [19451168][10]]. (if you use the BWA-backtrack algorithm)

* Li H. and Durbin R. (2010) Fast and accurate long-read alignment with
  Burrows-Wheeler transform. *Bioinformatics*, **26**, 589-595. [PMID:
  [20080505][11]]. (if you use the BWA-SW algorithm)

* Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs
  with BWA-MEM. [arXiv:1303.3997v2][12] [q-bio.GN]. (if you use the BWA-MEM
  algorithm or the **fastmap** command, or want to cite the whole BWA package)

Please note that the last reference is a preprint hosted at [arXiv.org][13]. I
do not plan to submit it to a peer-reviewed journal in the near future.

## Frequently asked questions (FAQs)

1. [What types of data does BWA work with?](#type)
2. [Why does a read appear multiple times in the output SAM?](#multihit)
3. [Does BWA work on reference sequences longer than 4GB in total?](#4gb)
4. [Why can one read in a pair have a high mapping quality but the other has zero?](#pe0)
5. [How can a BWA-backtrack alignment stand out of the end of a chromosome?](#endref)
6. [Does BWA work with ALT contigs in the GRCh38 release?](#altctg)
7. [Can I just run BWA-MEM against GRCh38+ALT without post-processing?](#postalt)

#### <a name="type"></a>1. What types of data does BWA work with?

BWA works with a variety of types of DNA sequence data, though the optimal
algorithm and settings may vary. The following list gives the recommended
settings:

* Illumina/454/IonTorrent single-end reads longer than ~70bp or assembly
  contigs up to a few megabases mapped to a closely related reference genome:

      bwa mem ref.fa reads.fq > aln.sam

* Illumina single-end reads shorter than ~70bp:

      bwa aln ref.fa reads.fq > reads.sai; bwa samse ref.fa reads.sai reads.fq > aln-se.sam

* Illumina/454/IonTorrent paired-end reads longer than ~70bp:

      bwa mem ref.fa read1.fq read2.fq > aln-pe.sam

* Illumina paired-end reads shorter than ~70bp:

      bwa aln ref.fa read1.fq > read1.sai; bwa aln ref.fa read2.fq > read2.sai
      bwa sampe ref.fa read1.sai read2.sai read1.fq read2.fq > aln-pe.sam

* PacBio subreads or Oxford Nanopore reads to a reference genome:

      bwa mem -x pacbio ref.fa reads.fq > aln.sam
      bwa mem -x ont2d ref.fa reads.fq > aln.sam

BWA-MEM is recommended for query sequences longer than ~70bp across a range of
error rates (or sequence divergence). Generally, BWA-MEM becomes more tolerant
of errors as query sequences get longer, because the chance of missing all
seeds becomes small. As shown above, with non-default settings, BWA-MEM works
with Oxford Nanopore reads with a sequencing error rate over 20%.

#### <a name="multihit"></a>2. Why does a read appear multiple times in the output SAM?

BWA-SW and BWA-MEM perform local alignments. If there is a translocation, a gene
fusion or a long deletion, a read bridging the break point may have two hits,
occupying two lines in the SAM output. With the default setting of BWA-MEM,
exactly one of these lines is primary and soft clipped; the other lines are
tagged with the 0x800 SAM flag (supplementary alignment) and are hard clipped.

#### <a name="4gb"></a>3. Does BWA work on reference sequences longer than 4GB in total?

Yes.
Since 0.6.x, all BWA algorithms work with a genome whose total length exceeds
4GB. However, no individual chromosome should be longer than 2GB.

#### <a name="pe0"></a>4. Why can one read in a pair have a high mapping quality but the other has zero?

This is expected. Mapping quality is assigned to each read individually, not to
a read pair. It is possible that one read can be mapped unambiguously while its
mate falls in a tandem repeat, so that the mate's exact position cannot be
determined.

#### <a name="endref"></a>5. How can a BWA-backtrack alignment stand out of the end of a chromosome?

Internally, BWA concatenates all reference sequences into one long sequence. A
read may be mapped to the junction of two adjacent reference sequences. In this
case, BWA-backtrack will flag the read as unmapped (0x4), but you will still
see its position, CIGAR and all the tags. A similar issue may occur with BWA-SW
alignments as well. BWA-MEM does not have this problem.

#### <a name="altctg"></a>6. Does BWA work with ALT contigs in the GRCh38 release?

Yes. Since 0.7.11, BWA-MEM officially supports mapping to GRCh38+ALT.
BWA-backtrack and BWA-SW do not yet properly support ALT mapping. Please
see [README-alt.md][18] for details. Briefly, it is recommended to use
[bwakit][17], the binary release of BWA, for generating the reference genome
and for mapping.

#### <a name="postalt"></a>7. Can I just run BWA-MEM against GRCh38+ALT without post-processing?

If you are not interested in hits to ALT contigs, it is okay to run BWA-MEM
without post-processing. The alignments produced this way are very close to
alignments against GRCh38 without ALT contigs. Nonetheless, post-processing
helps to reduce false mappings caused by reads from the diverged parts of ALT
contigs and also enables HLA typing, so running the post-processing script is
recommended.
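The SAM flags mentioned in FAQ 2 (0x800, supplementary) and FAQ 5 (0x4, unmapped) live in the FLAG field, column 2 of each SAM record. As a minimal sketch, assuming plain-text SAM on standard input, the hypothetical Bash function `classify_sam` below labels each record as primary, secondary (0x100), supplementary or unmapped and prints its reference name and position. It is not part of BWA; for real pipelines, the `-f`/`-F` flag filters of `samtools view` are a more robust choice.

```bash
#!/usr/bin/env bash
# Label each SAM record by the FLAG bits discussed in the FAQ (values from the SAM spec):
#   0x4   segment unmapped
#   0x100 secondary alignment
#   0x800 supplementary alignment (e.g. the clipped part of a chimeric read)
# classify_sam is a hypothetical helper for illustration; it is not part of BWA.
classify_sam() {
  local qname flag rname pos rest kind
  # SAM columns are tab-separated: QNAME FLAG RNAME POS ...
  while IFS=$'\t' read -r qname flag rname pos rest; do
    [[ $qname == @* ]] && continue   # skip @HD/@SQ/@PG header lines
    if (( flag & 0x4 )); then
      kind=unmapped                  # may still carry RNAME/POS (see FAQ 5)
    elif (( flag & 0x100 )); then
      kind=secondary
    elif (( flag & 0x800 )); then
      kind=supplementary
    else
      kind=primary
    fi
    printf '%s\t%s\t%s:%s\n' "$qname" "$kind" "$rname" "$pos"
  done
}
```

For example, `bwa mem ref.fa reads.fq | classify_sam` lists every output line of each read with its role. The unmapped test comes first so that a record is reported as unmapped even when it still carries a position.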

[1]: http://en.wikipedia.org/wiki/GNU_General_Public_License
[2]: https://github.com/lh3/bwa
[3]: http://sourceforge.net/projects/bio-bwa/files/
[4]: http://bio-bwa.sourceforge.net/bwa.shtml
[5]: http://bio-bwa.sourceforge.net/
[6]: https://lists.sourceforge.net/lists/listinfo/bio-bwa-help
[7]: mailto:bio-bwa-help@sourceforge.net
[8]: http://biostars.org
[9]: http://seqanswers.com/
[10]: http://www.ncbi.nlm.nih.gov/pubmed/19451168
[11]: http://www.ncbi.nlm.nih.gov/pubmed/20080505
[12]: http://arxiv.org/abs/1303.3997
[13]: http://arxiv.org/
[14]: http://zlib.net/
[15]: https://github.com/lh3/bwa/tree/mem
[16]: ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/
[17]: http://sourceforge.net/projects/bio-bwa/files/bwakit/
[18]: https://github.com/lh3/bwa/blob/master/README-alt.md