1## Introduction 2 3KSW2 is a library to align a pair of biological sequences based on dynamic 4programming (DP). So far it comes with global alignment and alignment extension 5(no local alignment yet) under an affine gap cost function: gapCost(*k*) = 6*q*+*k*\**e*, or a two-piece affine gap cost: gapCost2(*k*) = min{*q*+*k*\**e*, 7*q2*+*k*\**e2*}. For the latter cost function, if *q*+*e*<*q2*+*e2* and *e*>*e2*, 8(*q*,*e*) is effectively applied to short gaps only, while (*q2*,*e2*) applied 9to gaps no shorter than ceil((*q2*-*q*)/(*e*-*e2*)-1). It helps to retain long 10gaps. The algorithm behind the two-piece cost is close to [Gotoh 11(1990)][piece-affine]. 12 13KSW2 supports fixed banding and optionally produces alignment paths (i.e. 14CIGARs) with gaps either left- or right-aligned. It provides implementations 15using SSE2 and SSE4.1 intrinsics based on [Hajime Suzuki][hs]'s diagonal 16[formulation][hs-eq] which enables 16-way SSE parallelization for the most part 17of the inner loop, regardless of the maximum score of the alignment. 18 19KSW2 implements the Suzuki-Kasahara algorithm and is a component of 20[minimap2][mm2]. If you use KSW2 in your work, please cite: 21 22> * Suzuki, H. and Kasahara, M. (2018). Introducing difference recurrence relations for faster semi-global alignment of long sequences. *BMC Bioinformatics*, **19**:45. 23> * Li, H (2018) Minimap2: pairwise alignment for nucleotide sequences. *Bioinformatics*, **34**:3094-3100. 24 25## Usage 26 27Each `ksw2_*.c` file implements a single function and is independent of each 28other. Here are brief descriptions about what each file implements: 29 30* [ksw2_gg.c](ksw2_gg.c): global alignment; Green's standard formulation 31* [ksw2_gg2.c](ksw2_gg2.c): global alignment; Suzuki's diagonal formulation 32* [ksw2_gg2_sse.c](ksw2_gg2_sse.c): global alignment with SSE intrinsics; Suzuki's 33* [ksw2_extz.c](ksw2_extz.c): global and extension alignment; Green's formulation 34* [ksw2_extz2_sse.c](ksw2_extz2_sse.c): global and extension with SSE intrinsics; Suzuki's 35* [ksw2_extd.c](ksw2_extd.c): global and extension alignment, dual gap cost; Green's formulation 36* [ksw2_extd2_sse.c](ksw2_extd2_sse.c): global and extension, dual gap cost, with SSE intrinsics; Suzuki's 37 38Users are encouraged to copy the header file `ksw2.h` and relevant 39`ksw2_*.c` file to their own source code trees. On x86 CPUs with SSE2 40intrinsics, `ksw2_extz2_sse.c` is recommended in general. It supports global 41alignment, alignment extension with Z-drop, score-only alignment, global-only 42alignment and right-aligned CIGARs. `ksw2_gg*.c` are mostly for demonstration 43and comparison purposes. They are annotated with more comments and easier to 44understand than `ksw2_ext*.c`. Header file [ksw2.h](ksw2.h) contains brief 45documentations. TeX file [ksw2.tex](tex/ksw2.tex) gives brief derivation. 46 47To compile the test program `ksw-test`, just type `make`. It takes the 48advantage of SSE4.1 when available. To compile with SSE2 only, use `make 49sse2=1` instead. If you have installed [parasail][para], use `make 50parasail=prefix`, where `prefix` points to the parasail install directory (e.g. 51`/usr/local`). 52 53The following shows a complete example about how to use the library. 54```c 55#include <string.h> 56#include <stdio.h> 57#include "ksw2.h" 58 59void align(const char *tseq, const char *qseq, int sc_mch, int sc_mis, int gapo, int gape) 60{ 61 int i, a = sc_mch, b = sc_mis < 0? sc_mis : -sc_mis; // a>0 and b<0 62 int8_t mat[25] = { a,b,b,b,0, b,a,b,b,0, b,b,a,b,0, b,b,b,a,0, 0,0,0,0,0 }; 63 int tl = strlen(tseq), ql = strlen(qseq); 64 uint8_t *ts, *qs, c[256]; 65 ksw_extz_t ez; 66 67 memset(&ez, 0, sizeof(ksw_extz_t)); 68 memset(c, 4, 256); 69 c['A'] = c['a'] = 0; c['C'] = c['c'] = 1; 70 c['G'] = c['g'] = 2; c['T'] = c['t'] = 3; // build the encoding table 71 ts = (uint8_t*)malloc(tl); 72 qs = (uint8_t*)malloc(ql); 73 for (i = 0; i < tl; ++i) ts[i] = c[(uint8_t)tseq[i]]; // encode to 0/1/2/3 74 for (i = 0; i < ql; ++i) qs[i] = c[(uint8_t)qseq[i]]; 75 ksw_extz(0, ql, qs, tl, ts, 5, mat, gapo, gape, -1, -1, 0, &ez); 76 for (i = 0; i < ez.n_cigar; ++i) // print CIGAR 77 printf("%d%c", ez.cigar[i]>>4, "MID"[ez.cigar[i]&0xf]); 78 putchar('\n'); 79 free(ez.cigar); free(ts); free(qs); 80} 81 82int main(int argc, char *argv[]) 83{ 84 align("ATAGCTAGCTAGCAT", "AGCTAcCGCAT", 1, -2, 2, 1); 85 return 0; 86} 87``` 88 89## Performance Analysis 90 91The following table shows timing on two pairs of long sequences (both in the 92"test" directory). 93 94|Data set|Command line options |Time (s)|CIGAR|Ext|SIMD|Source | 95|:-------|:--------------------------------|:-------|:---:|:-:|:--:|:-------| 96|50k |-t gg -s |7.3 |N |N |N |ksw2 | 97| |-t gg2 -s |19.8 |N |N |N |ksw2 | 98| |-t extz -s |9.2 |N |Y |N |ksw2 | 99| |-t ps\_nw |9.8 |N |N |N |parasail| 100| |-t ps\_nw\_striped\_sse2\_128\_32|2.9 |N |N |SSE2|parasail| 101| |-t ps\_nw\_striped\_32 |2.2 |N |N |SSE4|parasail| 102| |-t ps\_nw\_diag\_32 |3.0 |N |N |SSE4|parasail| 103| |-t ps\_nw\_scan\_32 |3.0 |N |N |SSE4|parasail| 104| |-t extz2\_sse -sg |0.96 |N |N |SSE2|ksw2 | 105| |-t extz2\_sse -sg |0.84 |N |N |SSE4|ksw2 | 106| |-t extz2\_sse -s |3.0 |N |Y |SSE2|ksw2 | 107| |-t extz2\_sse -s |2.7 |N |Y |SSE4|ksw2 | 108|16.5k |-t gg -s |0.84 |N |N |N |ksw2 | 109| |-t gg |1.6 |Y |N |N |ksw2 | 110| |-t gg2 |3.3 |Y |N |N |ksw2 | 111| |-t extz |2.0 |Y |Y |N |ksw2 | 112| |-t extz2\_sse |0.40 |Y |Y |SSE4|ksw2 | 113| |-t extz2\_sse -g |0.18 |Y |N |SSE4|ksw2 | 114 115The standard DP formulation is about twice as fast as Suzuki's diagonal 116formulation (`-tgg` vs `-tgg2`), but SSE-based diagonal formulation 117is several times faster than the standard DP. If we only want to compute one 118global alignment score, we can use 16-way parallelization in the entire inner 119loop. For extension alignment, though, we need to keep an array of 32-bit 120scores and have to use 4-way parallelization for part of the inner loop. This 121significantly reduces performance (`-sg` vs `-s`). KSW2 is faster than 122parasail partly because the former uses one score for all matches and another 123score for all mismatches. For diagonal formulations, vectorization is more 124complex given a generic scoring matrix. 125 126It is possible to further accelerate global alignment with dynamic banding as 127is implemented in [edlib][edlib]. However, it is not as effective for extension 128alignment. Another idea is [adaptive banding][adap-band], which might be worth 129trying at some point. 130 131## Alternative Libraries 132 133|Library |CIGAR|Intra-seq|Affine-gap|Local |Global |Glocal |Extension| 134|:---------------|:---:|:-------:|:--------:|:-------:|:-------:|:-------:|:-------:| 135|[edlib][edlib] |Yes |Yes |No |Very fast|Very fast|Very fast|N/A | 136|[KSW][klib] |Yes |Yes |Yes |Fast |Slow |N/A |Slow | 137|KSW2 |Yes |Yes |Yes/dual |N/A |Fast |N/A |Fast | 138|[libgaba][gaba] |Yes |Yes |Yes |N/A? |N/A? |N/A? |Fast | 139|[libssa][ssa] |No |No? |Yes |Fast |Fast |N/A |N/A | 140|[Opal][opal] |No |No |Yes |Fast |Fast |Fast |N/A | 141|[Parasail][para]|No |Yes |Yes |Fast |Fast |Fast |N/A | 142|[SeqAn][seqan] |Yes |Yes |Yes |Slow |Slow |Slow |N/A | 143|[SSW][ssw] |Yes |Yes |Yes |Fast |N/A |N/A |N/A | 144|[SWIPE][swipe] |Yes |No |Yes |Fast |N/A? |N/A? |N/A | 145|[SWPS3][swps3] |No |Yes |Yes |Fast |N/A? |N/A |N/A | 146 147 148 149[hs]: https://github.com/ocxtal 150[hs-eq]: https://github.com/ocxtal/diffbench 151[edlib]: https://github.com/Martinsos/edlib 152[klib]: https://github.com/attractivechaos/klib 153[para]: https://github.com/jeffdaily/parasail 154[opal]: https://github.com/Martinsos/opal 155[ssw]: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library 156[ssa]: https://github.com/RonnySoak/libssa 157[gaba]: https://github.com/ocxtal/libgaba 158[adap-band]: https://github.com/ocxtal/adaptivebandbench 159[swipe]: https://github.com/torognes/swipe 160[swps3]: http://lab.dessimoz.org/swps3/ 161[seqan]: http://seqan.de 162[piece-affine]: https://www.ncbi.nlm.nih.gov/pubmed/2165832 163[mm2]: https://github.com/lh3/minimap2 164