Name | Date | Size | Lines | LOC
---- | ---- | ---- | ----- | ---
.. | 03-May-2022 | - | |
cram/ | 22-Oct-2021 | - | 22,939 | 15,411
htscodecs/ | 07-Jul-2021 | - | 27,419 | 22,571
htslib/ | 22-Oct-2021 | - | 11,575 | 4,221
m4/ | 22-Oct-2021 | - | 551 | 503
os/ | 22-Oct-2021 | - | 184 | 118
test/ | 07-May-2022 | - | 39,136 | 35,599
INSTALL | 15-Jun-2021 | 10.7 KiB | 272 | 200
LICENSE | 22-Jun-2021 | 3.5 KiB | 70 | 52
Makefile | 03-May-2022 | 37.9 KiB | 823 | 559
NEWS | 22-Oct-2021 | 66.4 KiB | 1,590 | 1,132
README | 12-Apr-2021 | 2.6 KiB | 28 | 23
README.large_positions.md | 10-Dec-2019 | 7.6 KiB | 235 | 174
bcf_sr_sort.c | 12-Aug-2021 | 22.4 KiB | 708 | 594
bcf_sr_sort.h | 12-Sep-2019 | 3.8 KiB | 109 | 67
bgzf.c | 27-Sep-2021 | 79.7 KiB | 2,569 | 1,967
bgzip.1 | 22-Oct-2021 | 5.5 KiB | 187 | 146
bgzip.c | 28-Sep-2021 | 15.3 KiB | 425 | 367
builddir_vars.mk.in | 14-Apr-2021 | 2.4 KiB | 59 | 48
config.h.in | 22-Oct-2021 | 3.7 KiB | 137 | 93
config.mk.in | 06-Dec-2020 | 3.5 KiB | 115 | 89
configure | 22-Oct-2021 | 192.6 KiB | 6,917 | 5,742
configure.ac | 16-Jun-2021 | 18.7 KiB | 510 | 435
errmod.c | 19-Nov-2019 | 6.6 KiB | 209 | 131
faidx.5 | 13-Jun-2018 | 6.1 KiB | 239 | 213
faidx.c | 11-Jul-2020 | 26.5 KiB | 952 | 741
header.c | 19-Mar-2021 | 78.3 KiB | 2,741 | 1,951
header.h | 08-Jan-2020 | 10.2 KiB | 320 | 113
hfile.c | 21-May-2021 | 38.8 KiB | 1,406 | 983
hfile_gcs.c | 22-Jun-2021 | 5 KiB | 161 | 109
hfile_internal.h | 07-Jan-2021 | 7.8 KiB | 204 | 62
hfile_libcurl.c | 07-Jan-2021 | 47.8 KiB | 1,557 | 1,185
hfile_s3.c | 16-Sep-2021 | 35.7 KiB | 1,311 | 990
hfile_s3_write.c | 16-Sep-2020 | 23.6 KiB | 897 | 615
hts.c | 02-Oct-2021 | 150.1 KiB | 4,746 | 3,718
hts_expr.c | 02-Feb-2021 | 20.2 KiB | 706 | 484
hts_internal.h | 19-Oct-2020 | 5.6 KiB | 153 | 67
hts_os.c | 03-Feb-2021 | 1.9 KiB | 60 | 23
htscodecs_bundled.mk | 25-Feb-2021 | 3 KiB | 63 | 29
htscodecs_external.mk | 25-Feb-2021 | 1.7 KiB | 47 | 20
htsfile.1 | 22-Oct-2021 | 3.8 KiB | 95 | 71
htsfile.c | 03-Feb-2021 | 9.4 KiB | 330 | 260
htslib-s3-plugin.7 | 22-Oct-2021 | 4.5 KiB | 140 | 104
htslib.mk | 22-Jun-2021 | 6.7 KiB | 195 | 124
htslib.pc.in | 17-Feb-2017 | 490 | 16 | 13
htslib_vars.mk | 03-Feb-2021 | 3.2 KiB | 55 | 27
kfunc.c | 03-Feb-2021 | 10.6 KiB | 314 | 203
kstring.c | 29-Jan-2021 | 10.3 KiB | 445 | 356
md5.c | 02-Oct-2019 | 10.4 KiB | 389 | 252
multipart.c | 19-Nov-2019 | 8.3 KiB | 268 | 185
plugin.c | 03-Feb-2021 | 6.3 KiB | 221 | 158
probaln.c | 03-Feb-2021 | 15.7 KiB | 449 | 321
realn.c | 22-Jun-2021 | 12.2 KiB | 312 | 246
regidx.c | 02-Oct-2019 | 19.4 KiB | 687 | 567
region.c | 02-Oct-2019 | 7.6 KiB | 277 | 196
sam.5 | 16-Sep-2020 | 2.7 KiB | 69 | 44
sam.c | 15-Sep-2021 | 198.6 KiB | 6,383 | 5,087
sam_internal.h | 16-Apr-2021 | 3.4 KiB | 106 | 57
synced_bcf_reader.c | 07-Sep-2021 | 45.3 KiB | 1,443 | 1,234
tabix.1 | 22-Oct-2021 | 6.8 KiB | 205 | 168
tabix.c | 04-May-2021 | 25.7 KiB | 721 | 636
tbx.c | 05-Aug-2020 | 15.3 KiB | 482 | 404
textutils.c | 17-Jul-2020 | 11.7 KiB | 498 | 388
textutils_internal.h | 23-Oct-2020 | 14.4 KiB | 403 | 158
thread_pool.c | 15-Jul-2021 | 44.7 KiB | 1,536 | 921
thread_pool_internal.h | 04-Mar-2019 | 5.7 KiB | 170 | 69
vcf.5 | 19-Nov-2019 | 3.5 KiB | 121 | 94
vcf.c | 10-Sep-2021 | 160.7 KiB | 4,833 | 4,206
vcf_sweep.c | 11-Jul-2020 | 5.4 KiB | 191 | 136
vcfutils.c | 22-Jun-2021 | 33.9 KiB | 850 | 768
version.sh | 22-Oct-2021 | 2.2 KiB | 60 | 23

README

HTSlib is an implementation of a unified C library for accessing common file
formats, such as SAM, CRAM, VCF, and BCF, used for high-throughput sequencing
data.  It is the core library used by samtools and bcftools.

See INSTALL for building and installation instructions.

Please cite this paper when using HTSlib for your publications:

HTSlib: C library for reading/writing high-throughput sequencing data
James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
GigaScience, Volume 10, Issue 2, February 2021, giab007, https://doi.org/10.1093/gigascience/giab007

@article{10.1093/gigascience/giab007,
    author = {Bonfield, James K and Marshall, John and Danecek, Petr and Li, Heng and Ohan, Valeriu and Whitwham, Andrew and Keane, Thomas and Davies, Robert M},
    title = "{HTSlib: C library for reading/writing high-throughput sequencing data}",
    journal = {GigaScience},
    volume = {10},
    number = {2},
    year = {2021},
    month = {02},
    abstract = "{Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files. To facilitate this a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.}",
    issn = {2047-217X},
    doi = {10.1093/gigascience/giab007},
    url = {https://doi.org/10.1093/gigascience/giab007},
    note = {giab007},
    eprint = {https://academic.oup.com/gigascience/article-pdf/10/2/giab007/36332285/giab007.pdf},
}

README.large_positions.md

# HTSlib 64 bit reference positions

From version 1.10 onwards, HTSlib internally uses 64 bit reference positions.
This is to support analysis of species like axolotl, tulip and marbled lungfish
which have, or are expected to have, chromosomes longer than two gigabases.

# File format support

Currently 64 bit positions can only be stored in SAM and VCF format files.
Binary BAM, CRAM and BCF cannot be used due to limitations in the formats
themselves.  As SAM and VCF are text formats, they have no limit on the
size of numeric values. Note that while 64 bit positions are supported by
default for SAM, for VCF they must be enabled explicitly at compile time
by editing `Makefile` and adding `-DVCF_ALLOW_INT64=1` to `CFLAGS`.

# Compatibility issues to check

Various data structure members, function parameters, and return values have
been expanded from 32 to 64 bits.  As a result, some changes may be needed to
code that uses the library, even if it does not support long references.

## Variadic functions taking format strings

The type of various structure members (e.g. `bam1_core_t::pos`) and return
values from some functions (e.g. `bam_cigar2rlen()`) have been changed to
`hts_pos_t`, which is a 64-bit signed integer.  Using these in 32-bit
code will generally work (as long as the stored positions are within range),
however care needs to be taken when these values are passed directly
to functions like `printf()` which take a variable-length argument list and
a format string.

Header file `htslib/hts.h` defines macro `PRIhts_pos` which can be
used in `printf()` format strings to get the correct format specifier for
an `hts_pos_t` value.  Code that needs to print positions should be
changed from:

```c
printf("Position is %d\n", bam->core.pos);
```

to:

```c
printf("Position is %"PRIhts_pos"\n", bam->core.pos);
```

If for some reason compatibility with older versions of HTSlib (which do
not have `hts_pos_t` or `PRIhts_pos`) is needed, the value can be cast to
`int64_t` and printed as an explicitly 64-bit value:

```c
#include <inttypes.h> // For PRId64 and int64_t

printf("Position is %" PRId64 "\n", (int64_t) bam->core.pos);
```

Passing incorrect types to variadic functions like `printf()` can lead
to incorrect behaviour and security risks, so it is important to track down
and fix all of the places where this may happen.  Modern C compilers like
gcc (version 3.0 onwards) and clang can check `printf()` and `scanf()`
parameter types for compatibility against the format string.  To
enable this, build code with `-Wall` or `-Wformat` and fix all the
reported warnings.

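As a self-contained illustration of the change described above, the sketch
below formats a position beyond the 32-bit range into a buffer.  So that it
builds without HTSlib installed, `pos64_t` and `PRIpos64` are hypothetical
stand-ins assumed to behave like `hts_pos_t` and `PRIhts_pos`, and
`format_position()` is an illustrative helper, not part of the library.

```c
#include <inttypes.h>  /* PRId64, int64_t */
#include <stdio.h>     /* snprintf, size_t */

/* Hypothetical stand-ins so the sketch builds without HTSlib;
   real code would include htslib/hts.h and use hts_pos_t / PRIhts_pos. */
typedef int64_t pos64_t;
#define PRIpos64 PRId64

/* Write "Position is <pos>" into buf; returns the snprintf() result. */
int format_position(char *buf, size_t len, pos64_t pos)
{
    /* The format macro is spliced into the string literal, so the
       specifier always matches the 64-bit argument. */
    return snprintf(buf, len, "Position is %" PRIpos64, pos);
}
```

Calling `format_position(buf, sizeof buf, 3000000000)` with a 64-byte buffer
produces the text `Position is 3000000000`, a value that `%d` cannot represent
where `int` is 32 bits.
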
Where functions that take `printf`-style format strings are implemented,
they should use the appropriate gcc attributes to enable format string
checking.  `htslib/hts_defs.h` includes macros `HTS_FORMAT` and
`HTS_PRINTF_FMT` which can be used to provide the attribute declaration
in a portable way.  For example, `test/sam.c` uses them for a function
that prints error messages:

```c
void HTS_FORMAT(HTS_PRINTF_FMT, 1, 2) fail(const char *fmt, ...) { /* ... */ }
```

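A minimal sketch of how such a function might be written without HTSlib is
shown below.  `MY_FORMAT` and `format_message()` are hypothetical names used
only for illustration; the macro is assumed to play the same role as
`HTS_FORMAT`, expanding to the gcc/clang `format` attribute and to nothing on
other compilers.

```c
#include <stdarg.h>  /* va_list, va_start, va_end */
#include <stdio.h>   /* vsnprintf, size_t */

/* Hypothetical stand-in for HTS_FORMAT: enable format checking on
   gcc and clang, and expand to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define MY_FORMAT(type, idx, first) __attribute__((__format__(type, idx, first)))
#else
#define MY_FORMAT(type, idx, first)
#endif

/* Argument 3 is the format string and the values to check start at
   argument 4, hence (__printf__, 3, 4). */
int MY_FORMAT(__printf__, 3, 4)
format_message(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, len, fmt, args);
    va_end(args);
    return n;
}
```

With the attribute in place, passing a 64-bit value for a `%d` specifier
should draw a `-Wformat` warning from gcc or clang at the call site.
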
## Implicit type conversions

Conversion of signed `int` or `int32_t` to `hts_pos_t` will always work.

Conversion of `hts_pos_t` to `int` or `int32_t` will work as long as the value
converted is within the range that can be stored in the destination.

Code that casts unsigned `uint32_t` values to signed with the expectation
that the result may be negative will no longer work as `hts_pos_t` can store
values over `UINT32_MAX`.  Such code should be changed to use signed values.

Functions `hts_parse_region()` and `hts_parse_reg64()` return the special value
`HTS_POS_MAX` for regions which extend to the end of the reference.
This value is slightly smaller than `INT64_MAX`, but should be larger than
any reference that is likely to be used.  When cast to `int32_t` the
result should be `INT32_MAX`.

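The narrowing rule above can be made explicit with a range check before the
conversion.  The sketch below uses `int64_t` as a stand-in for `hts_pos_t` so
that it builds without HTSlib; `pos_to_int32()` is a hypothetical helper, not
a library function.

```c
#include <stdbool.h>  /* bool */
#include <stdint.h>   /* int32_t, int64_t, INT32_MIN, INT32_MAX */

/* Stand-in for hts_pos_t (a 64-bit signed integer per the text above). */
typedef int64_t pos64_t;

/* Store pos in *out and return true only if it fits in int32_t;
   otherwise leave *out untouched and return false. */
bool pos_to_int32(pos64_t pos, int32_t *out)
{
    if (pos < INT32_MIN || pos > INT32_MAX)
        return false;
    *out = (int32_t) pos;
    return true;
}
```

Callers that still hold positions in 32-bit variables can use a check of this
kind to reject long references instead of silently truncating them.
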
# Upgrading code to work with 64 bit positions

Variables used to store reference positions should be changed to
type `hts_pos_t`.  Use `PRIhts_pos` in format strings when printing them.

When converting positions stored in strings, use `strtoll()` in place of
`atoi()` or `strtol()` (which produces a 32 bit value on 64-bit Windows and
all 32-bit platforms).

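One way to do the string conversion with error checking is sketched below.
`parse_position()` is a hypothetical helper, and `int64_t` is used in place of
`hts_pos_t` so that the example builds without HTSlib.

```c
#include <errno.h>   /* errno, ERANGE */
#include <stdint.h>  /* int64_t */
#include <stdlib.h>  /* strtoll */

/* Parse a decimal position from str into *pos.
   Returns 0 on success, -1 on empty input, trailing junk or overflow. */
int parse_position(const char *str, int64_t *pos)
{
    char *end;
    errno = 0;
    long long value = strtoll(str, &end, 10);
    if (end == str || *end != '\0' || errno == ERANGE)
        return -1;
    *pos = (int64_t) value;
    return 0;
}
```

Unlike `atoi()`, this reports failure for inputs such as `"12x"` or a value
too large for 64 bits, and it accepts positions above `INT32_MAX` such as
`"3000000000"`.
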
Programs which need to look up a reference sequence length from a `sam_hdr_t`
structure should use `sam_hdr_tid2len()` instead of the old
`sam_hdr_t::target_len` array (which is left as 32-bit for reasons of
compatibility).  `sam_hdr_tid2len()` returns `hts_pos_t`, so works correctly
for large references.

Various functions which take pointer arguments have new versions which
support `hts_pos_t *` arguments.  Code supporting 64-bit positions should
use the new versions.  These are:

Original function  | 64-bit version
------------------ | --------------------
fai_fetch()        | fai_fetch64()
fai_fetchqual()    | fai_fetchqual64()
faidx_fetch_seq()  | faidx_fetch_seq64()
faidx_fetch_qual() | faidx_fetch_qual64()
hts_parse_reg()    | hts_parse_reg64() or hts_parse_region()
bam_plp_auto()     | bam_plp64_auto()
bam_plp_next()     | bam_plp64_next()
bam_mplp_auto()    | bam_mplp64_auto()

Limited support has been added for 64-bit INFO values in VCF files, for large
values in structural variant END tags.  New functions `bcf_update_info_int64()`
and `bcf_get_info_int64()` can be used to set and fetch 64-bit INFO values.
They both take arrays of `int64_t`.  `bcf_int64_missing` and
`bcf_int64_vector_end` can be used to set missing and vector end values in
these arrays.  The INFO data is stored in the minimum size needed, so there
is no harm in using these functions to store smaller integer values.

# Structure members that have changed size

```
File htslib/hts.h:
   hts_pair32_t::begin
   hts_pair32_t::end

   (typedef hts_pair_pos_t is provided as a better-named replacement for hts_pair32_t)

   hts_reglist_t::min_beg
   hts_reglist_t::max_end

   hts_itr_t::beg
   hts_itr_t::end
   hts_itr_t::curr_beg
   hts_itr_t::curr_end

File htslib/regidx.h:
   reg_t::start
   reg_t::end

File htslib/sam.h:
   bam1_core_t::pos
   bam1_core_t::mpos
   bam1_core_t::isize

File htslib/synced_bcf_reader.h:
   bcf_sr_regions_t::start
   bcf_sr_regions_t::end
   bcf_sr_regions_t::prev_start

File htslib/vcf.h:
   bcf_idinfo_t::info

   bcf_info_t::v1::i

   bcf1_t::pos
   bcf1_t::rlen
```

# Functions where parameters or the return value have changed size

Functions are annotated as follows:

* `[new]`  The function has been added since version 1.9
* `[parameters]` Function parameters have changed size
* `[return]` Function return value has changed size

```
File htslib/faidx.h:

   [new]        fai_fetch64()
   [new]        fai_fetchqual64()
   [new]        faidx_fetch_seq64()
   [new]        faidx_fetch_qual64()
   [new]        fai_parse_region()

File htslib/hts.h:

   [parameters] hts_idx_push()
   [new]        hts_parse_reg64()
   [parameters] hts_itr_query()
   [parameters] hts_reg2bin()

File htslib/kstring.h:

   [new]        kputll()

File htslib/regidx.h:

   [parameters] regidx_overlap()

File htslib/sam.h:

   [new]        sam_hdr_tid2len()
   [return]     bam_cigar2qlen()
   [return]     bam_cigar2rlen()
   [return]     bam_endpos()
   [parameters] bam_itr_queryi()
   [parameters] sam_itr_queryi()
   [new]        bam_plp64_next()
   [new]        bam_plp64_auto()
   [new]        bam_mplp64_auto()
   [parameters] sam_cap_mapq()
   [parameters] sam_prob_realn()

File htslib/synced_bcf_reader.h:

   [parameters] bcf_sr_seek()
   [parameters] bcf_sr_regions_overlap()

File htslib/tbx.h:

   [parameters] tbx_readrec()

File htslib/vcf.h:

   [parameters] bcf_readrec()
   [new]        bcf_update_info_int64()
   [new]        bcf_get_info_int64()
   [return]     bcf_dec_int1()
   [return]     bcf_dec_typed_int1()

```