Release 0.7.17 (23 October 2017)
--------------------------------

This release adds option -q to preserve the mapping quality of split alignments
with lower alignment scores than the primary alignment. Option -5
automatically applies -q as well.

(0.7.17: 23 October 2017, r1188)



Release 0.7.16 (30 July 2017)
-----------------------------

This release added a couple of minor features and incorporated multiple pull
requests, including:

 * Added option -5, which is useful for some Hi-C pipelines.

 * Fixed an error with samtools sorting (#129). Updated download link for
   GRCh38 (#123). Fixed README Markdown formatting (#70). Addressed multiple
   issues via a collected pull request #139 by @jmarshall. Avoid malformed
   SAM header when -R is used with TAB (#84). Output mate CIGAR (#138).

(0.7.16: 30 July 2017, r1180)



Release 0.7.15 (31 May 2016)
----------------------------

Fixed a long-existing bug that could lead to an underestimated insert size
upper bound. This bug should have little effect in practice.

(0.7.15: 31 May 2016, r1140)



Release 0.7.14 (4 May 2016)
---------------------------

In the ALT mapping mode, this release adds the "AH:*" header tag to SQ lines
corresponding to alternate haplotypes.

(0.7.14: 4 May 2016, r1136)



Release 0.7.13 (23 February 2016)
---------------------------------

This release fixes a few minor bugs in the previous version and adds a few
minor features. All BWA algorithms should produce identical output to 0.7.12
when there are no ALT contigs.

Detailed changes:

 * Fixed a bug in "bwa-postalt.js". The old version may produce 0.5% wrong
   bases for reads mapped to the ALT contigs.

 * Fixed a potential bug in the multithreading mode. It may occur when mapping
   is much faster than file reading, which should almost never happen in
   practice.

 * Changed the download URL of GRCh38.

 * Removed the read overlap mode. It was not working well.

 * Added the ropebwt2 algorithm as an alternative for indexing large genomes.
   Ropebwt2 is slower than the "bwtsw" algorithm, but it has a permissive
   license. This allows us to create an Apache2-licensed BWA (in the "Apache2"
   branch) for commercial users who are concerned about the GPL.

(0.7.13: 23 February 2016, r1126)



Release 0.7.12 (28 December 2014)
---------------------------------

This release fixed a bug in the paired-end mode when ALT contigs are present.
The bug led to undercalling in regions overlapping ALT contigs.

(0.7.12: 28 December 2014, r1039)



Release 0.7.11 (23 December, 2014)
----------------------------------

A major change to BWA-MEM is support for mapping to ALT contigs in addition
to the primary assembly. Part of the ALT mapping strategy is implemented in
BWA-MEM and the rest in a postprocessing script for now. Due to the extra
layer of complexity in generating the reference genome and in the two-step
mapping, starting with this release we provide a wrapper script and
precompiled binaries. The package may be more convenient for some specific use
cases. For general use, the single BWA binary still works as before.

Another major addition to BWA-MEM is HLA typing, which is made possible by the
new ALT mapping strategy. The necessary data and programs are included in the
binary release. The wrapper script also optionally performs HLA typing when HLA
genes are included in the reference genome as additional ALT contigs.

Other notable changes to BWA-MEM:

 * Added option `-b` to `bwa index`. This option tunes the batch size used in
   the construction of the BWT. A large `-b` is advised for huge reference
   sequences such as the BLAST *nt* database.

 * Optimized for PacBio data. This includes a change to scoring based on a
   study done by Aaron Quinlan and a heuristic speedup.
   Further speedup is possible, but needs more careful investigation.

 * Dropped PacBio read-to-read alignment for now. BWA-MEM is good at finding
   the best hit, but is not very sensitive to suboptimal hits. Option `-x pbread`
   is still available, but hidden on the command line. It may be removed in
   future releases.

 * Added a new preset for Oxford Nanopore 2D reads. LAST is still a little
   more sensitive on older bacterial data, but bwa-mem is as good on more
   recent data and is much faster for mapping against mammalian genomes.

 * Added LAST-like seeding. This improves the accuracy for longer reads.

 * Added option `-H` to insert arbitrary header lines.

 * Smarter option `-p`. Given an interleaved FASTQ stream, the old bwa-mem
   identifies the 2i-th and (2i+1)-th reads as a read pair. The new version
   identifies adjacent reads with the same read name as a read pair. It is
   possible to mix single-end and paired-end reads in one FASTQ.

 * Improved parallelization. The old bwa-mem waits for I/O. The new version
   puts I/O on a separate thread, so it performs mapping while reading FASTQ
   and writing SAM. This saves significant wall-clock time when reading from
   or writing to a slow Unix pipe.

With the new release, the recommended way to map Illumina reads to GRCh38 is to
use the bwakit binary package:

    bwa.kit/run-gen-ref hs38DH
    bwa.kit/bwa index hs38DH.fa
    bwa.kit/run-bwamem -t8 -H -o out-prefix hs38DH.fa read1.fq.gz read2.fq.gz | sh

Please check bwa.kit/README.md for details and command line options.

(0.7.11: 23 December 2014, r1034)



Release 0.7.10 (13 July, 2014)
------------------------------

Notable changes to BWA-MEM:

 * Fixed a segmentation fault due to an alignment bridging the forward-reverse
   boundary. This is a bug.

 * Use the PacBio heuristic to map contigs to the reference genome.
   The old heuristic evaluates the necessity of full extension for each chain.
   This may not work in long low-complexity regions. The PacBio heuristic
   performs SSE2-SW around each short seed, which works better. Note that the
   heuristic is only applied to long query sequences. For Illumina reads, the
   output is identical to the previous version.

(0.7.10: 13 July 2014, r789)



Release 0.7.9 (19 May, 2014)
----------------------------

This release brings several major changes to BWA-MEM. Notably, BWA-MEM now
formally supports PacBio read-to-reference alignment and experimentally supports
PacBio read-to-read alignment. BWA-MEM also runs faster at a minor cost of
accuracy. The speedup is more significant when GRCh38 is in use. More
specifically:

 * Support PacBio subread-to-reference alignment. Although older BWA-MEM works
   with PacBio data in principle, the resulting alignments are frequently
   fragmented. In this release, we fine-tuned existing methods and introduced
   new heuristics to improve PacBio alignment. These changes are not used by
   default. Users need to add option "-x pacbio" to enable the feature.

 * Support PacBio subread-to-subread alignment (EXPERIMENTAL). This feature is
   enabled with option "-x pbread". In this mode, the output only gives the
   overlapping region between a pair of reads without detailed alignment.

 * Output alternative hits in the XA tag if there are not too many of them.
   This is a BWA-backtrack feature.

 * Support mapping to ALT contigs in GRCh38 (EXPERIMENTAL). We provide a script
   to postprocess hits in the XA tag to adjust the mapping quality and generate
   new primary alignments to all overlapping ALT contigs. We would *NOT*
   recommend this feature for production use.

 * Improved alignments to many short reference sequences.
   Older BWA-MEM may generate an alignment bridging two or more adjacent
   reference sequences. Such alignments are split at a later postprocessing
   step. This approach is complex and does not always work. This release
   forbids these alignments from the very beginning. BWA-MEM should no longer
   produce an alignment bridging two or more reference sequences.

 * Reduced the maximum seed occurrence from 10000 to 500. Reduced the maximum
   rounds of Smith-Waterman mate rescue from 100 to 50. Added a heuristic to
   lower the mapping quality if a read contains seeds with excessive
   occurrences. These changes make BWA-MEM faster at a minor cost of accuracy
   in highly repetitive regions.

 * Added an option "-Y" to use soft clipping for supplementary alignments.

 * Bugfix: incomplete alignment extension in corner cases.

 * Bugfix: integer overflow when aligning long query sequences.

 * Bugfix: chain score is not computed correctly (almost no practical effect).

 * General code cleanup.

 * Added FAQs to README.

Changes in BWA-backtrack:

 * Bugfix: a segmentation fault when an alignment extends past the end of the
   last chromosome.

(0.7.9: 19 May 2014, r783)



Release 0.7.8 (31 March, 2014)
------------------------------

Changes in BWA-MEM:

 * Bugfix: off-diagonal X-dropoff (option -d) not working as intended.
   Short-read alignment is not affected.

 * Bugfix: unnecessarily large bandwidth used during global alignment,
   which reduces the mapping speed by ~5% for short reads. Results are not
   affected.

 * Bugfix: when the matching score is not one, paired-end mapping quality is
   inaccurate.

 * When the matching score (option -A) is changed, scale all score-related
   options accordingly unless overridden by users.

 * Allow specifying separate gap open (or extension) penalties for deletions
   and insertions.

 * Allow specifying the insert size distribution.

 * Better and more detailed debugging information.

With the default setting, 0.7.8 and 0.7.7 gave identical output on one million
100bp read pairs.

(0.7.8: 31 March 2014, r455)



Release 0.7.7 (25 February, 2014)
---------------------------------

This release fixes incorrect MD tags in the BWA-MEM output.

A note about short-read mapping to GRCh38. The new human reference genome
GRCh38 contains 60Mbp of program-generated alpha repeat arrays, some of which
are hard-masked as they cannot be localized. These highly repetitive arrays
make BWA-MEM ~50% slower. If you are concerned with the performance of
BWA-MEM, you may consider using options "-c2000 -m50". On simulated data, this
setting helps the performance at a very minor cost in accuracy. I may consider
changing the default in future releases.

(0.7.7: 25 February 2014, r441)



Release 0.7.6 (31 January, 2014)
--------------------------------

Changes in BWA-MEM:

 * Changed the way mapping quality is estimated. The new method tends to give
   the same alignment a higher mapping quality. For paired-end reads, the
   change is minor because, with pairing, the mapping quality is usually high.
   For short single-end reads, the difference is considerable.

 * Improved load balance when many threads are spawned. However, bwa-mem is
   still not very thread-efficient, probably due to frequent heap memory
   allocation. Further improvement is a little difficult and may affect the
   code stability.

 * Allow using different clipping penalties for the 5'- and 3'-ends. This
   helps when we do not want to clip one end.

 * Print the @PG line, including the command line options.

 * Improved the band width estimate: a) fixed a bug causing the band width
   estimated from extension not to be used in the final global alignment;
   b) try a doubled band width if the global alignment score is smaller.
   Insufficient band width leads to wrong CIGAR and spurious mismatches/indels.

 * Added a new option -D to fine-tune a heuristic on dropping suboptimal hits.
   Reducing -D increases accuracy but decreases the mapping speed. If unsure,
   leave it at the default.

 * Bugfix: for a repetitive single-end read, the reported hit is not randomly
   distributed among equally best hits.

 * Bugfix: missing paired-end hits due to an unsorted list of SE hits.

 * Bugfix: incorrect CIGAR caused by a defect in the global alignment.

 * Bugfix: incorrect CIGAR caused by failed SW rescue.

 * Bugfix: alignments largely mapped to the same position are regarded as
   distinct from each other, which leads to underestimated mapping quality.

 * Added the MD tag.

There are no changes to BWA-backtrack in this release. However, it has a few
known issues yet to be fixed. If you prefer BWA-backtrack, it is still advised
to use bwa-0.6.x.

While developing BWA-MEM, I also found a few issues with BWA-SW. It is now
possible to improve BWA-SW with the lessons learned from BWA-MEM. However, as
BWA-MEM is usually better, I will not improve BWA-SW until I find applications
where BWA-SW may excel.

(0.7.6: 31 January 2014, r432)



Release 0.7.5a (30 May, 2013)
-----------------------------

Fixed a bug in BWA-backtrack which leads to off-by-one mapping errors in rare
cases.

(0.7.5a: 30 May 2013, r405)



Release 0.7.5 (29 May, 2013)
----------------------------

Changes in all components:

 * Improved error checking on memory allocation and file I/O. Patches provided
   by Rob Davies.

 * Updated README.

 * Bugfix: return code is zero upon errors.

Changes in BWA-MEM:

 * Changed the way a chimeric alignment is reported (conforming to the upcoming
   SAM spec v1.5). With 0.7.5, if the read has a chimeric alignment, the paired
   or the top hit uses soft clipping and is marked with neither the 0x800 nor
   the 0x100 bit. All other hits that are part of the chimeric alignment use
   hard clipping and are marked with 0x800 if option "-M" is not in use, or
   with 0x100 otherwise.

 * Other hits that are part of a chimeric alignment are now reported in the SA
   tag, conforming to the SAM spec v1.5.

 * Better method for resolving an alignment bridging two or more short
   reference sequences. The current strategy maps the query to the reference
   sequence that covers the middle point of the alignment. For most
   applications, this change has no effect.

Changes in BWA-backtrack:

 * Added a magic number to .sai files. This prevents samse/sampe from reading
   corrupted .sai files (e.g. a .sai file containing an LSF log) or
   incompatible .sai files generated by a different version of bwa.

 * Bugfix: alignments in the XA:Z: tag were wrong.

 * Keep track of #ins and #del during backtracking. This simplifies the code
   and reduces errors in rare corner cases. I should have done this in the
   early days of bwa.

In addition, if you use BWA-MEM or the fastmap command of BWA, please cite:

 - Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs
   with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN].

Thank you.

(0.7.5: 29 May 2013, r404)



Release 0.7.4 (23 April, 2013)
------------------------------

This is a bugfix release. Most of the bugs are minor and occur only very
rarely.

 * Bugfix: wrong CIGAR when a query sequence bridges three or more target
   sequences. This only happens when aligning reads to short assembly contigs.

 * Bugfix: leading "D" operator in CIGAR.

 * Extend more seeds for better alignment around tandem repeats. This is also
   a cause of the leading "D" operator in CIGAR.

 * Bugfix: SSE2-SSW may occasionally find an incorrect query starting position
   around tandem repeats. This leads to a suboptimal CIGAR in BWA-MEM and
   a wrong CIGAR in BWA.

 * Bugfix: clipping penalty does not work as intended when there is a gap
   towards the end of a read.

 * Fixed an issue caused by a bug in the libc from Mac/Darwin. In Darwin,
   fread() is unable to read a data block longer than 2GB due to an integer
   overflow bug in its implementation.

Since version 0.7.4, BWA-MEM is considered to have reached stability similar
to BWA-backtrack for short-read mapping.

(0.7.4: 23 April 2013, r385)



Release 0.7.3a (15 March, 2013)
-------------------------------

In 0.7.3, the wrong CIGAR bug was only fixed in one scenario, but not in
another corner case.

(0.7.3a: 15 March 2013, r367)



Release 0.7.3 (15 March, 2013)
------------------------------

Changes to BWA-MEM:

 * Bugfix: pairing score is inaccurate when option -A does not take the default
   value. This is a very minor issue even if it happens.

 * Bugfix: occasionally wrong CIGAR. This happens when the alignment has a
   1bp deletion and a 1bp insertion close to the end of the read, and there
   are no other substitutions or indels. BWA-MEM would not do a gapped
   alignment due to the bug.

 * New feature: output other non-overlapping alignments in the XP tag so that
   the entire picture of the alignment can be seen from one SAM line. XP gives
   the position, CIGAR, NM and mapQ of each aligned subsequence of the query.

BWA-MEM has been used to align ~300Gbp of 100-700bp SE/PE reads. SNP/indel
calling has also been evaluated on part of these data.
BWA-MEM generally gives better pre-filtered SNP calls than BWA. No significant
issues have been observed since 0.7.2, though minor improvements or bugs (e.g.
the bug fixed in this release) are still possible. If you find potential
issues, please send bug reports to <bio-bwa-help@lists.sourceforge.net> (free
registration required).

In addition, a more detailed description of the BWA-MEM algorithm can be found
at <https://github.com/lh3/mem-paper>.

(0.7.3: 15 March 2013, r366)



Release 0.7.2 (9 March, 2013)
-----------------------------

Emergency bug fix: 0.7.0 and 0.7.1 give a wrong sign to TLEN. In addition,
flagging of 'properly paired' is also improved a little.

(0.7.2: 9 March 2013, r351)



Release 0.7.1 (8 March, 2013)
-----------------------------

Changes to BWA-MEM:

 * Bugfix: rare segmentation fault caused by a partial hit to the end of the
   last sequence.

 * Bugfix: occasional mis-pairing given an interleaved fastq.

 * Bugfix: wrong mate information when the mate is unmapped. SAM generated by
   BWA-MEM can now be validated with Picard.

 * Improved the performance and accuracy for ultra-long query sequences.
   Short-read alignment is not affected.

Changes to other components:

 * In BWA-backtrack and BWA-SW, replaced the code for global alignment,
   Smith-Waterman and SW extension. The performance and accuracy of the two
   algorithms stay the same.

 * Added an experimental subcommand to merge overlapping paired ends. The
   algorithm is very conservative: it may miss true overlaps but rarely makes
   mistakes.

An important note is that, like BWA-SW, BWA-MEM may output multiple primary
alignments for a read, which may cause problems for some tools. For aligning
sequence reads, it is advised to use '-M' to flag extra hits as secondary.
This 509option is not the default because multiple primary alignments are theoretically 510possible in sequence alignment. 511 512(0.7.1: 8 March 2013, r347) 513 514 515 516Beta Release 0.7.0 (28 Feburary, 2013) 517-------------------------------------- 518 519This release comes with a new alignment algorithm, BWA-MEM, for 70bp-1Mbp query 520sequences. BWA-MEM essentially seeds alignments with a variant of the fastmap 521algorithm and extends seeds with banded affine-gap-penalty dynamic programming 522(i.e. the Smith-Waterman-Gotoh algorithm). For typical Illumina 100bp reads or 523longer low-divergence query sequences, BWA-MEM is about twice as fast as BWA 524and BWA-SW and is more accurate. It also supports split alignments like BWA-SW 525and may optionally output multiple hits like BWA. BWA-MEM does not guarantee 526to find hits within a certain edit distance, but BWA is not efficient for such 527task given longer reads anyway, and the edit-distance criterion is arguably 528not as important in long-read alignment. 529 530In addition to the algorithmic improvements, BWA-MEM also implements a few 531handy features in practical aspects: 532 533 1. BWA-MEM automatically switches between local and glocal (global wrt reads; 534 local wrt reference) alignment. It reports the end-to-end glocal alignment 535 if the glocal alignment is not much worse than the optimal local alignment. 536 Glocal alignment reduces reference bias. 537 538 2. BWA-MEM automatically infers pair orientation from a batch of single-end 539 alignments. It allows more than one orientations if there are sufficient 540 supporting reads. This feature has not been tested on reads from Illumina 541 jumping library yet. (EXPERIMENTAL) 542 543 3. BWA-MEM optionally takes one interleaved fastq for paired-end mapping. It 544 is possible to convert a name-sorted BAM to an interleaved fastq on the fly 545 and feed the data stream to BWA-MEM for mapping. 546 547 4. 
BWA-MEM optionally copies FASTA/Q comments to the final SAM output, which 548 helps to transfer individual read annotations to the output. 549 550 5. BWA-MEM supports more advanced piping. Users can now run: 551 (bwa mem ref.fa '<bzcat r1.fq.bz2' '<bzcat r2.fq.bz2') to map bzip'd read 552 files without replying on bash features. 553 554 6. BWA-MEM provides a few basic APIs for single-end mapping. The 'example.c' 555 program in the source code directory implements a full single-end mapper in 556 50 lines of code. 557 558The BWA-MEM algorithm is in the beta phase. It is not advised to use BWA-MEM 559for production use yet. However, when the implementation becomes stable after a 560few release cycles, existing BWA users are recommended to migrate to BWA-MEM 561for 76bp or longer Illumina reads and long query sequences. The original BWA 562short-read algorithm will not deliver satisfactory results for 150bp+ Illumina 563reads. Change of mappers will be necessary sooner or later. 564 565(0.7.0 beta: 28 Feburary 2013, r313) 566 567 568 569Release 0.6.2 (19 June, 2012) 570----------------------------- 571 572This is largely a bug-fix release. Notable changes in BWA-short and BWA-SW: 573 574 * Bugfix: BWA-SW may give bad alignments due to incorrect band width. 575 576 * Bugfix: A segmentation fault due to an out-of-boundary error. The fix is a 577 temporary solution. The real cause has not been identified. 578 579 * Attempt to read index from prefix.64.bwt, such that the 32-bit and 64-bit 580 index can coexist. 581 582 * Added options '-I' and '-S' to control BWA-SW pairing. 583 584(0.6.2: 19 June 2012, r126) 585 586 587 588Release 0.6.1 (28 November, 2011) 589--------------------------------- 590 591Notable changes to BWA-short: 592 593 * Bugfix: duplicated alternative hits in the XA tag. 594 595 * Bugfix: when trimming enabled, bwa-aln trims 1bp less. 596 597 * Disabled the color-space alignment. 0.6.x is not working with SOLiD reads at 598 present. 
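Downstream scripts often need to read the XA tag whose duplicated entries the first bugfix above addresses. The 0.5.6 notes give its format as `(chr,pos,CIGAR,NM;)*`. The sketch below is a minimal parser under two assumptions: each entry is terminated by `;`, and the position field carries a leading `+` or `-` strand sign, which is how BWA output normally writes it. The names `AltHit` and `parse_xa` are hypothetical helpers and are not part of BWA.

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    """One alternative hit from an XA:Z tag (hypothetical helper type)."""
    chrom: str
    strand: str  # '+' or '-', taken from the sign of the position field
    pos: int     # leftmost position with the strand sign removed
    cigar: str
    nm: int      # edit distance of this hit


def parse_xa(value: str, dedup: bool = True) -> List[AltHit]:
    """Parse an XA:Z value such as 'chr1,+100,50M,0;chr2,-200,50M,1;'.

    Assumes every entry is 'chr,pos,CIGAR,NM' terminated by ';' and that
    pos starts with a '+' or '-' strand sign. With dedup=True, exact
    duplicate entries are dropped while keeping the original order.
    """
    hits: List[AltHit] = []
    seen = set()
    for entry in value.split(";"):
        if not entry:  # the trailing ';' leaves an empty last field
            continue
        chrom, pos_field, cigar, nm = entry.split(",")
        strand = "-" if pos_field.startswith("-") else "+"
        hit = AltHit(chrom, strand, abs(int(pos_field)), cigar, int(nm))
        if dedup and hit in seen:
            continue  # skip an exact repeat of an earlier entry
        seen.add(hit)
        hits.append(hit)
    return hits
```

For example, `parse_xa("chr1,+1000,76M,0;chr5,-2500,70M6S,2;chr1,+1000,76M,0;")` returns two hits, because the repeated `chr1` entry is dropped. Pass `dedup=False` to keep the tag exactly as written.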

Notable changes to BWA-SW:

 * Bugfix: segfault due to excessive ambiguous bases.

 * Bugfix: incorrect mate position in the SE mode.

 * Bugfix: rare segfault in the PE mode.

 * When macro _NO_SSE2 is in use, fall back to the standard Smith-Waterman
   instead of SSE2-SW.

 * Optionally mark split hits with lower alignment scores as secondary.

Changes to fastmap:

 * Bugfix: infinite loop caused by ambiguous bases.

 * Optionally output the query sequence.

(0.6.1: 28 November 2011, r104)



Release 0.5.10 and 0.6.0 (12 November, 2011)
--------------------------------------------

The 0.6.0 release comes with two major changes. Firstly, the index data
structure has been changed to support genomes longer than 4GB. The forward and
reverse backward genome is now integrated into one index. This change speeds up
BWA-short by about 20% and BWA-SW by 90% with the mapping accuracy largely
unchanged. The tradeoff is that BWA requires more memory, but this is the price
almost all mappers that index the genome have to pay.

Secondly, BWA-SW in 0.6.0 now works with paired-end data. It is more accurate
for highly unique reads and more robust to long indels and structural
variations. However, BWA-short still has an edge for reads with many
suboptimal hits. It is not yet known which algorithm is best for variant
calling.

0.5.10 is a bugfix release only and is likely to be the last release in the 0.5
branch unless I find critical bugs in the future.

Other notable changes:

 * Added the 'fastmap' command that finds super-maximal exact matches. It does
   not give the final alignment, but runs much faster. It can be a building
   block for other alignment algorithms. [0.6.0 only]

 * Output the timing information before BWA exits. This also tells users that
   the task has finished instead of being killed or aborted.
   [0.6.0 only]

 * Sped up multi-threading when using many (>20) CPU cores.

 * Check I/O errors.

 * Increased the maximum barcode length to 63bp.

 * Automatically choose the indexing algorithm.

 * Bugfix: very rare segfault due to an uninitialized variable. The bug also
   affects the placement of suboptimal alignments. The effect is very minor.

This release involves quite a lot of tricky changes. Although it has been
tested on a few data sets, subtle bugs may still be hidden. It is *NOT*
recommended to use this release in a production pipeline. In the future,
however, BWA-SW may be better as reads continue to get longer. I would
encourage users to try the 0.6 release. I would also like to hear about users'
experience. Thank you.

(0.6.0: 12 November 2011, r85)



Beta Release 0.5.9 (24 January, 2011)
-------------------------------------

Notable changes:

 * Feature: barcode support via the '-B' option.

 * Feature: Illumina 1.3+ read format support via the '-I' option.

 * Bugfix: RG tags are not attached to unmapped reads.

 * Bugfix: very rare bwasw mismappings.

 * Recommend options for PacBio reads in the bwasw help message.

Also, since January 13, the BWA master repository has moved to GitHub:

    https://github.com/lh3/bwa

The revision number has been reset. All recent changes will be committed to
this repository first.

(0.5.9: 24 January 2011, r16)



Beta Release Candidate 0.5.9rc1 (10 December, 2010)
---------------------------------------------------

Notable changes in bwasw:

 * Output unmapped reads.

 * For a repetitive read, choose a random hit instead of a fixed
   one. This is not well tested.

Notable changes in bwa-short:

 * Fixed a bug in the SW scoring system, which may lead to unexpected
   gaps towards the end of a read.

 * Fixed a bug which invalidates the randomness of repetitive reads.

 * Fixed a rare memory leak.

 * Allow specifying the read group on the command line.

 * Take name-grouped BAM files as input.

Changes in this release are usually safe in that they do not interfere
with the key functionality. However, the release has only been tested on
small samples instead of on large-scale real data. If anything weird
happens, please report the bugs to the bio-bwa-help mailing list.

(0.5.9rc1: 10 December 2010, r1561)



Beta Release 0.5.8 (8 June, 2010)
---------------------------------

Notable changes in bwasw:

 * Fixed an issue with missing alignments. This should happen rarely and
   only when the contig/read alignment is multi-part. Very rarely, bwasw
   may still miss a segment in a multi-part alignment. This is difficult
   to fix, although possible.

Notable changes in bwa-short:

 * Discard the SW alignment when the best single-end alignment is much
   better. Such an SW alignment may be caused by structural variations, and
   forcing it to be aligned leads to false alignment. This fix has not
   been tested thoroughly. It would be great to receive more user
   feedback on this issue.

 * Fixed a typo/bug in sampe which leads to unnecessarily large memory
   usage in some cases.

 * Further reduced the chance of reporting 'weird pairing'.

(0.5.8: 8 June 2010, r1442)



Beta Release 0.5.7 (1 March, 2010)
----------------------------------

This release only has an effect on paired-end data with a fat insert-size
distribution. Users are still recommended to update, as the new release
improves robustness to poor data.

 * The fix for 'weird pairing' was not working in version 0.5.6, as pointed
   out by Carol Scott. It should work now.

 * Optionally output to a normal file rather than to stdout (by Tim
   Fennel).
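This release concerns paired-end data whose insert-size distribution is "fat", and the 0.5.6 notes mention that bwa reports the skewness and kurtosis of that distribution. The sketch below is only an illustration of those summary statistics: it computes the mean, standard deviation, skewness and excess kurtosis from a list of observed insert sizes. It assumes population (divide-by-n) moments and reports kurtosis minus 3, so a normal distribution scores about 0. The function name is hypothetical. These notes do not say which estimator BWA uses or what threshold counts as "too fat", so the sketch sets no threshold.

```python
import math
from typing import Dict, Sequence


def insert_size_moments(sizes: Sequence[float]) -> Dict[str, float]:
    """Mean, standard deviation, skewness and excess kurtosis of insert sizes.

    Uses population (divide-by-n) moments; this estimator choice is an
    assumption made for illustration, not taken from BWA.
    """
    n = len(sizes)
    if n < 2:
        raise ValueError("need at least two insert sizes")
    mean = sum(sizes) / n
    var = sum((x - mean) ** 2 for x in sizes) / n
    std = math.sqrt(var)
    if std == 0:
        raise ValueError("all insert sizes are identical")
    # Standardized third and fourth moments of the distribution.
    skew = sum(((x - mean) / std) ** 3 for x in sizes) / n
    kurt = sum(((x - mean) / std) ** 4 for x in sizes) / n - 3.0
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}
```

A symmetric sample such as `[1, 2, 3, 4, 5]` gives a skewness of 0. A sample with a long right tail, such as `[100, 100, 100, 100, 500]`, gives a positive skewness.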
770 771(0.5.7: 1 March 2010, r1310) 772 773 774 775Beta Release 0.5.6 (10 Feburary, 2010) 776-------------------------------------- 777 778Notable changes in bwa-short: 779 780 * Report multiple hits in the SAM format at a new tag XA encoded as: 781 (chr,pos,CIGAR,NM;)*. By default, if a paired or single-end read has 782 4 or fewer hits, they will all be reported; if a read in a anomalous 783 pair has 11 or fewer hits, all of them will be reported. 784 785 * Perform Smith-Waterman alignment also for anomalous read pairs when 786 both ends have quality higher than 17. This reduces false positives 787 for some SV discovery algorithms. 788 789 * Do not report "weird pairing" when the insert size distribution is 790 too fat or has a mean close to zero. 791 792 * If a read is bridging two adjacent chromsomes, flag it as unmapped. 793 794 * Fixed a small but long existing memory leak in paired-end mapping. 795 796 * Multiple bug fixes in SOLiD mapping: a) quality "-1" can be correctly 797 parsed by solid2fastq.pl; b) truncated quality string is resolved; c) 798 SOLiD read mapped to the reverse strand is complemented. 799 800 * Bwa now calculates skewness and kurtosis of the insert size 801 distribution. 802 803 * Deploy a Bayesian method to estimate the maximum distance for a read 804 pair considered to be paired properly. The method is proposed by 805 Gerton Lunter, but bwa only implements a simplified version. 806 807 * Export more functions for Java bindings, by Matt Hanna (See: 808 http://www.broadinstitute.org/gsa/wiki/index.php/Sting_BWA/C_bindings) 809 810 * Abstract bwa CIGAR for further extension, by Rodrigo Goya. 811 812(0.5.6: 10 Feburary 2010, r1303) 813 814 815 816Beta Release 0.5.5 (10 November, 2009) 817-------------------------------------- 818 819This is a bug fix release: 820 821 * Fixed a serious bug/typo in aln which does not occur given short 822 reads, but will lead to segfault for >500bp reads. 
   Of course, the aln command is not recommended for reads longer than
   200bp, but this is a bug anyway.

 * Fixed a minor bug/typo which leads to incorrect single-end mapping
   quality when one end is moved to meet the mate-pair requirement.

 * Fixed a bug in samse for mapping in the color space. This bug is
   caused by the quality filtration added in 0.5.1.

(0.5.5: 10 November 2009, r1273)



Beta Release 0.5.4 (9 October, 2009)
------------------------------------

Starting with this version, the default seed length used in the "aln"
command is 32.

Notable changes in bwa-short:

 * Added a new tag "XC:i" which gives the length of clipped reads.

 * In sampe, skip alignments in case of a bug in the Smith-Waterman
   alignment module.

 * In sampe, fixed a bug in pairing when the read sequence is identical
   to its reverse complement.

 * In sampe, optionally preload the entire FM-index into memory to
   reduce disk operations.

Notable changes in dBWT-SW/BWA-SW:

 * Renamed dBWT-SW to BWA-SW.

 * Optionally use "hard clipping" in the SAM output.

(0.5.4: 9 October 2009, r1245)



Beta Release 0.5.3 (15 September, 2009)
---------------------------------------

Fixed a critical bug in bwa-short: reads mapped to the reverse strand
were not complemented.

(0.5.3: 15 September 2009, r1225)



Beta Release 0.5.2 (13 September, 2009)
---------------------------------------

Notable changes in bwa-short:

 * Optionally trim reads before alignment. See the manual page on
   'aln -q' for a detailed description.

 * Fixed a bug in calculating the NM tag for a gapped alignment.

 * Fixed a bug given a mixture of reads with some longer than the seed
   length and some shorter.

 * Print SAM header.

Notable changes in dBWT-SW:

 * Changed the default value of -T to 30.
   As a result, the accuracy is a little higher for short reads at the
   cost of speed.

(0.5.2: 13 September 2009, r1223)



Beta Release 0.5.1 (2 September, 2009)
--------------------------------------

Notable changes in the short read alignment component:

 * Fixed a bug in samse: do not write mate coordinates.

Notable changes in dBWT-SW:

 * Randomly choose one alignment if the read is repetitive.

 * Fixed a flaw when a read is mapped across two adjacent reference
   sequences. However, incorrect alignments may still be reported in
   rare cases.

 * Changed the default band width to 50. This change makes alignment
   slower.

 * Slightly improved the mapping quality for long query sequences.

(0.5.1: 2 September 2009, r1209)



Beta Release 0.5.0 (20 August, 2009)
------------------------------------

This release implements a novel algorithm, dBWT-SW, specifically
designed for long reads. It is 10-50 times faster than SSAHA2, depending
on the characteristics of the input data, and achieves comparable
alignment accuracy while allowing chimera detection. Compared with
BLAT, dBWT-SW is several times faster and much more accurate, especially
when the error rate is high. Please read the manual page for more
information.

The dBWT-SW algorithm is intended for future sequencing technologies
that produce much longer reads with a slightly higher error rate. It is
still at an early stage of development. Some features are missing and
it may be buggy, although I have evaluated it on several simulated and
real data sets. But following the "release early" paradigm, I would
like users to try it out early.

Other notable changes in BWA are:

 * Fixed a rare bug in the Smith-Waterman alignment module.

 * Fixed a rare bug that gave a wrong alignment coordinate when a read
   is poorly aligned.

 * Fixed a bug in generating the "mate-unmap" SAM tag when both ends in
   a pair are unmapped.

(0.5.0: 20 August 2009, r1200)



Beta Release 0.4.9 (19 May, 2009)
---------------------------------

Interestingly, the integer overflow bug claimed to be fixed in 0.4.7 was
not in fact fixed. It is fixed now. Sorry for this, and thanks to Quan
Long for pointing out the bug (again).

(0.4.9: 19 May 2009, r1075)



Beta Release 0.4.8 (18 May, 2009)
---------------------------------

One change to "aln -R": by default, if there are no more than '-R'
equally best hits, bwa now searches for suboptimal hits. This change
affects the ability to find SNPs in segmental duplications.

I have not tested this option thoroughly, but this simple change is
unlikely to cause new bugs. I hope I am right.

(0.4.8: 18 May 2009, r1073)



Beta Release 0.4.7 (12 May, 2009)
---------------------------------

Notable changes:

 * Output the SM (single-end mapping quality) and AM (the smaller
   mapping quality of the two ends) tags in SAM output.

 * Improved the functionality of stdsw.

 * Made the XN tag more accurate.

 * Fixed a very rare segfault caused by integer overflow.

 * Improved the insert size estimation.

 * Fixed compilation errors on some Linux systems.

(0.4.7: 12 May 2009, r1066)



Beta Release 0.4.6 (9 March, 2009)
----------------------------------

This release improves SOLiD support. First, a script for converting
SOLiD raw data is provided. This script is adapted from solid2fastq.pl
in the MAQ package. Second, a nucleotide reference file can be used
directly with 'bwa index'. Third, SOLiD paired-end support is now
complete. Fourth, color-space reads are converted to nucleotides when
SAM output is generated. Color errors are corrected in this process.
Please note that, like MAQ, BWA cannot make use of the primer base and
the first color.

In addition, the mapping quality calculation is slightly improved,
although end users will barely notice the difference.

(0.4.6: 9 March 2009, r915)



Beta Release 0.4.5 (18 February, 2009)
--------------------------------------

Not much has changed, but I think it is good to let users have the
latest version.

Notable changes (thanks to Bob Handsaker for catching the two bugs):

 * Improved the boundary check. The previous version could still give
   incorrect alignment coordinates in rare cases.

 * Fixed a bug in SW alignment when no residue matches. This only
   affects the 'sampe' command.

 * Robustly estimate the insert size without setting the maximum on the
   command line. Since this release, 'sampe -a' only has an effect if
   there are not enough good pairs to infer the insert size
   distribution.

 * Slightly reduced false PE alignments by using the inferred insert
   size distribution. This fix may be more important for long insert
   size libraries.

(0.4.5: 18 February 2009, r829)



Beta Release 0.4.4 (15 February, 2009)
--------------------------------------

This is mainly a bug fix release. Notable changes are:

 * Added a boundary check when extracting a subsequence from the
   genome. Previously, this caused memory problems in rare cases.

 * Fixed a bug in detecting whether an alignment overlaps with N on
   the genome.

 * Changed the MD tag to meet the latest SAM specification.

(0.4.4: 15 February 2009, r815)



Beta Release 0.4.3 (22 January, 2009)
-------------------------------------

Notable changes:

 * Treat an ambiguous base N as a mismatch. Previous versions did not
   map reads containing any N.

 * Automatically choose the maximum allowed number of differences. This
   is important when reads of different lengths are mixed together.

 * Print the mate coordinate if only one end is unmapped.

 * Generate the MD tag. This tag encodes the mismatching positions and
   the reference bases at these positions. Deletions from the reference
   are also printed.

 * Optionally dump multiple hits from samse in a separate, concise
   format instead of SAM.

 * Optionally disable iterative search. This is VERY SLOOOOW, though.

 * Fixed a bug in generating SAM output.

(0.4.3: 22 January 2009, r787)



Beta Release 0.4.2 (9 January, 2009)
------------------------------------

Aaron Quinlan found a bug in the indexer: the bwa indexer segfaults if
there is no comment text in the FASTA header. This is a critical bug.
Nothing else was changed.

(0.4.2: 9 January 2009, r769)



Beta Release 0.4.1 (7 January, 2009)
------------------------------------

I am sorry for the quick updates these days. I would like to set a
milestone for BWA, and this release seems to be one. For paired-end
reads, BWA also performs Smith-Waterman alignment for an unmapped read
whose mate can be mapped confidently. With this strategy, BWA achieves
accuracy similar to maq. The benchmark has also been updated
accordingly.

(0.4.1: 7 January 2009, r760)



Beta Release 0.4.0 (6 January, 2009)
------------------------------------

Compared with the release two days ago, this release is mainly tuned for
performance, using some tricks I learnt from Bowtie. However, as the
index format has also changed, I have increased the version number to
0.4.0 to emphasize that the *DATABASE MUST BE RE-INDEXED* with
'bwa index'.

 * Improved the speed by about 20%.

 * Added multi-threading to 'bwa aln'.

(0.4.0: 6 January 2009, r756)



Beta Release 0.3.0 (4 January, 2009)
------------------------------------

 * Added paired-end support by separating SA calculation and alignment
   output.

 * Added SAM output.

 * Added evaluation to the documentation.

(0.3.0: 4 January 2009, r741)



Beta Release 0.2.0 (15 August, 2008)
------------------------------------

 * Take the subsequence at the 5'-end as the seed. Seeding greatly
   improves the speed for long reads, at the cost of missing a few true
   hits that contain many differences in the seed. Seeding also
   increases memory usage by 800MB.

 * Fixed a bug which could miss some gapped alignments. The fix also
   slows alignment down a little.

(0.2.0: 15 August 2008, r428)



Beta Release 0.1.6 (08 August, 2008)
------------------------------------

 * Give an accurate CIGAR string.

 * Add a simple interface to SW/NW alignment.

(0.1.6: 08 August 2008, r414)



Beta Release 0.1.5 (27 July, 2008)
----------------------------------

 * Improve the speed. This version is expected to give the same results.

(0.1.5: 27 July 2008, r400)



Beta Release 0.1.4 (22 July, 2008)
----------------------------------

 * Fixed a bug which may cause missing gapped alignments.

 * More clearly define what alignments can be found by BWA (see the
   manual). BWA now runs a little slower because it visits more
   potential gapped alignments.

 * A bit of code clean-up.

(0.1.4: 22 July 2008, r387)



Beta Release 0.1.3 (21 July, 2008)
----------------------------------

Improved the speed with some tricks for retrieving occurrences. The
results should be exactly the same as those of 0.1.2.

(0.1.3: 21 July 2008, r382)



Beta Release 0.1.2 (17 July, 2008)
----------------------------------

Support gapped alignment. Code for ungapped alignment has been removed.

(0.1.2: 17 July 2008, r371)



Beta Release 0.1.1 (03 June, 2008)
----------------------------------

This is the first release of BWA, the Burrows-Wheeler Alignment tool.
Please read the man page for more information about this software.

(0.1.1: 03 June 2008, r349)