README.md

![Master build status](https://travis-ci.org/dpryan79/libBigWig.svg?branch=master) [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.45278.svg)](http://dx.doi.org/10.5281/zenodo.45278)

A C library for reading/parsing local and remote bigWig and bigBed files. While Kent's source code is free to use for these purposes, it is poorly suited to use as library code, since it calls `exit()` whenever there's an error. If it's used inside something like Python, the Python interpreter then gets killed. This library aims to resolve these sorts of issues. It also relies on more standard tools such as curl and has a friendlier license.

Documentation is automatically generated by doxygen and can be found under `docs/html` or online [here](https://cdn.rawgit.com/dpryan79/libBigWig/master/docs/html/index.html).

# Example

The only functions and structures that end users need to care about are in `bigWig.h`. Below is a commented example. See the files under `test/` for further examples.

    #include "bigWig.h"
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        bigWigFile_t *fp = NULL;
        bwOverlappingIntervals_t *intervals = NULL;
        double *stats = NULL;
        if(argc != 2) {
            fprintf(stderr, "Usage: %s {file.bw|URL://path/file.bw}\n", argv[0]);
            return 1;
        }

        //Initialize enough space to hold 128KiB (1<<17) of data at a time
        if(bwInit(1<<17) != 0) {
            fprintf(stderr, "Received an error in bwInit\n");
            return 1;
        }

        //Open the local/remote file
        fp = bwOpen(argv[1], NULL, "r");
        if(!fp) {
            fprintf(stderr, "An error occurred while opening %s\n", argv[1]);
            bwCleanup();
            return 1;
        }

        //Get values in a range (0-based, half open) without NAs
        intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 0);
        bwDestroyOverlappingIntervals(intervals); //Free allocated memory

        //Get values in a range (0-based, half open) with NAs
        intervals = bwGetValues(fp, "chr1", 10000000, 10000100, 1);
        bwDestroyOverlappingIntervals(intervals); //Free allocated memory

        //Get the full intervals that overlap
        intervals = bwGetOverlappingIntervals(fp, "chr1", 10000000, 10000100);
        bwDestroyOverlappingIntervals(intervals);

        //Get an example statistic - standard deviation
        //We want ~4 bins in the range
        stats = bwStats(fp, "chr1", 10000000, 10000100, 4, dev);
        if(stats) {
            printf("chr1:10000000-10000100 std. dev.: %f %f %f %f\n", stats[0], stats[1], stats[2], stats[3]);
            free(stats);
        }

        bwClose(fp);
        bwCleanup();
        return 0;
    }
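
The ranges passed to `bwGetValues()` and `bwGetOverlappingIntervals()` above are 0-based and half open, so `10000000, 10000100` covers exactly 100 bases. As a minimal sketch of that convention, two half-open ranges overlap only when each one starts before the other ends. The `range_length()` and `overlaps()` helpers below are hypothetical and written only for this illustration; they are not part of libBigWig.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating 0-based, half-open ranges [start, end).
 * They are not part of libBigWig. */

/* Number of bases covered by [start, end); 0 if the range is empty or inverted. */
uint32_t range_length(uint32_t start, uint32_t end) {
    return (end > start) ? end - start : 0;
}

/* Returns 1 if [s1, e1) and [s2, e2) share at least one base, else 0.
 * Assumes both ranges are non-empty (start < end). */
int overlaps(uint32_t s1, uint32_t e1, uint32_t s2, uint32_t e2) {
    return s1 < e2 && s2 < e1;
}
```

For example, `overlaps(0, 10, 10, 20)` is 0, because base 10 belongs only to the second range.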

## Writing example

N.B., creation of bigBed files is not supported (there are no plans to change this).

Below is an example of how to write bigWig files. You can also find this file under `test/exampleWrite.c`. Unlike with Kent's tools, you can create bigWig files entry by entry without needing an intermediate wiggle or bedGraph file. Entries in bigWig files are stored in blocks, with each entry in a block referring to the same chromosome and having the same type, of which there are three (see the [wiggle specification](http://genome.ucsc.edu/goldenpath/help/wiggle.html) for more information on this).

    #include "bigWig.h"
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        bigWigFile_t *fp = NULL;
        char *chroms[] = {"1", "2"};
        char *chromsUse[] = {"1", "1", "1"};
        uint32_t chrLens[] = {1000000, 1500000};
        uint32_t starts[] = {0, 100, 125,
                             200, 220, 230,
                             500, 600, 625,
                             700, 800, 850};
        uint32_t ends[] = {5, 120, 126,
                           205, 226, 231};
        float values[] = {0.0f, 1.0f, 200.0f,
                          -2.0f, 150.0f, 25.0f,
                          0.0f, 1.0f, 200.0f,
                          -2.0f, 150.0f, 25.0f,
                          -5.0f, -20.0f, 25.0f,
                          -5.0f, -20.0f, 25.0f};

        if(bwInit(1<<17) != 0) {
            fprintf(stderr, "Received an error in bwInit\n");
            return 1;
        }

        fp = bwOpen("example_output.bw", NULL, "w");
        if(!fp) {
            fprintf(stderr, "An error occurred while opening example_output.bw for writing\n");
            return 1;
        }

        //Allow up to 10 zoom levels, though fewer will be used in practice
        if(bwCreateHdr(fp, 10)) goto error;

        //Create the chromosome lists
        fp->cl = bwCreateChromList(chroms, chrLens, 2);
        if(!fp->cl) goto error;

        //Write the header
        if(bwWriteHdr(fp)) goto error;

        //Some example bedGraph-like entries
        if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;
        //We can continue appending similarly formatted entries
        //N.B. you can't append a different chromosome (those always go into different blocks)
        if(bwAppendIntervals(fp, starts+3, ends+3, values+3, 3)) goto error;

        //Add a new block of entries with a span. Since bwAdd/AppendIntervals was just used we MUST create a new block
        if(bwAddIntervalSpans(fp, "1", starts+6, 20, values+6, 3)) goto error;
        //We can continue appending similarly formatted entries
        if(bwAppendIntervalSpans(fp, starts+9, values+9, 3)) goto error;

        //Add a new block of fixed-step entries
        if(bwAddIntervalSpanSteps(fp, "1", 900, 20, 30, values+12, 3)) goto error;
        //The start is then 990 (900 + 3*30), since that's where the previous step ended
        if(bwAppendIntervalSpanSteps(fp, values+15, 3)) goto error;

        //Add a new chromosome
        chromsUse[0] = "2";
        chromsUse[1] = "2";
        chromsUse[2] = "2";
        if(bwAddIntervals(fp, chromsUse, starts, ends, values, 3)) goto error;

        //Closing the file causes the zoom levels to be created
        bwClose(fp);
        bwCleanup();

        return 0;

    error:
        fprintf(stderr, "Received an error somewhere!\n");
        bwClose(fp);
        bwCleanup();
        return 1;
    }
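
The fixed-step calls in the example rely on a little arithmetic: entry `i` of a block starts at `start + i*step` and covers `span` bases, and appended entries continue from where the previous ones stopped. The sketch below spells this out. The `fixed_step_*` helpers are hypothetical names written for this illustration, not libBigWig functions, and the continuation rule is an assumption taken from the comment in the example above.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of libBigWig. */

/* Start of the i-th entry (0-based) in a fixed-step block. */
uint32_t fixed_step_start(uint32_t start, uint32_t step, uint32_t i) {
    return start + i * step;
}

/* End (exclusive) of the i-th entry, which covers `span` bases. */
uint32_t fixed_step_end(uint32_t start, uint32_t span, uint32_t step, uint32_t i) {
    return fixed_step_start(start, step, i) + span;
}

/* Start of the first entry appended after n entries, assuming that
 * appending continues one step past the start of the last entry. */
uint32_t fixed_step_next(uint32_t start, uint32_t step, uint32_t n) {
    return start + n * step;
}
```

With a start of 900, a span of 20, a step of 30, and three entries, the entries cover 900-920, 930-950, and 960-980, and `fixed_step_next(900, 30, 3)` gives 990.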

# Testing file types

As of version 0.3.0, this library supports accessing bigBed files, which are related to bigWig files. Applications that need to support both bigWig and bigBed input can use the `bwIsBigWig` and `bbIsBigBed` functions to determine whether their inputs are bigWig or bigBed files:

    ...code...
    if(bwIsBigWig(input_file_name, NULL)) {
        //do something
    } else if(bbIsBigBed(input_file_name, NULL)) {
        //do something else
    } else {
        //handle unknown input
    }

Note that these two functions rely on the "magic number" at the beginning of each file, which differs between bigWig and bigBed files.
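
To make the magic-number idea concrete, here is a minimal sketch that classifies a file from its first four bytes. The two constants are the magic numbers defined by the UCSC formats. The `file_kind_t` type and `classify_magic()` function are hypothetical and only illustrate the principle; this is not how libBigWig implements `bwIsBigWig` or `bbIsBigBed`.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic numbers from the UCSC bigWig/bigBed formats. */
#define BIGWIG_MAGIC 0x888FFC26u
#define BIGBED_MAGIC 0x8789F2EBu

/* Hypothetical result type for this sketch; not a libBigWig type. */
typedef enum { KIND_UNKNOWN = 0, KIND_BIGWIG, KIND_BIGBED } file_kind_t;

/* Classify a file from its first bytes, read as a little-endian 32-bit integer.
 * Byte-swapped (big-endian) files are reported as unknown in this sketch. */
file_kind_t classify_magic(const unsigned char *buf, size_t len) {
    uint32_t magic;
    if(!buf || len < 4) return KIND_UNKNOWN;
    magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
            ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    if(magic == BIGWIG_MAGIC) return KIND_BIGWIG;
    if(magic == BIGBED_MAGIC) return KIND_BIGBED;
    return KIND_UNKNOWN;
}
```

A caller would read the first four bytes of the input (for example with `fread`) and pass them to `classify_magic()`.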

# bigBed support

Support for accessing bigBed files was added in version 0.3.0. The function names used for accessing bigBed files are similar to those used for bigWig files.

Function | Use
--- | ---
bbOpen | Opens a bigBed file
bbGetSQL | Returns the SQL string (if it exists) in a bigBed file
bbGetOverlappingEntries | Returns all entries overlapping an interval (either with or without their associated strings)
bbDestroyOverlappingEntries | Frees memory allocated by the above command

Other functions, such as `bwClose` and `bwInit`, are shared between bigWig and bigBed files. See `test/testBigBed.c` for a full example.

# A note on bigBed entries

Inside bigBed files, entries are stored as chromosome, start, and end coordinates with an (optional) associated string. For example, a "bedRNAElements" file from ENCODE has name, score, strand, "level", "significance", and "score2" values associated with each entry. These are stored inside the bigBed files as a single tab-separated character vector (`char *`), which makes parsing difficult. The names of the various fields inside bigBed files are stored as an SQL string, for example:

    table RnaElements
    "BED6 + 3 scores for RNA Elements data "
        (
        string chrom;      "Reference sequence chromosome or scaffold"
        uint   chromStart; "Start position in chromosome"
        uint   chromEnd;   "End position in chromosome"
        string name;       "Name of item"
        uint   score;      "Normalized score from 0-1000"
        char[1] strand;    "+ or - or . for unknown"
        float level;       "Expression level such as RPKM or FPKM. Set to -1 for no data."
        float signif;      "Statistical significance such as IDR. Set to -1 for no data."
        uint score2;       "Additional measurement/count e.g. number of reads. Set to 0 for no data."
        )

Entries will then be of the form (one per line):

    59426	115	-	0.021	0.48	218
    51	209	+	0.071	0.74	130
    52	170	+	0.045	0.61	171
    59433	178	-	0.049	0.34	296
    53	156	+	0.038	0.19	593
    59436	186	-	0.054	0.15	1010
    59437	506	-	1.560	0.00	430611

Note that chromosome and start/end intervals are stored separately, so there's no need to parse them out of the string. libBigWig can return these entries, either with or without the above associated strings. Parsing these strings is left to the application requiring them and is currently outside the scope of this library.
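
Since parsing is left to the application, the following is a minimal sketch of how an application might split one such string into typed fields. It assumes the six-field layout of the `RnaElements` table shown above; other bigBed files will have different fields. The `rna_element_t` struct and `parse_rna_element()` function are hypothetical names, not part of libBigWig.

```c
#include <stdio.h>

/* Hypothetical struct for the six extra fields of the RnaElements table. */
typedef struct {
    char name[64];
    unsigned int score;
    char strand;
    float level;
    float signif;
    unsigned int score2;
} rna_element_t;

/* Parse one tab-separated entry string into `out`.
 * Returns 0 on success and -1 if any field is missing or malformed.
 * Names longer than 63 characters are not handled by this sketch. */
int parse_rna_element(const char *str, rna_element_t *out) {
    int n;
    if(!str || !out) return -1;
    n = sscanf(str, "%63[^\t]\t%u\t%c\t%f\t%f\t%u",
               out->name, &out->score, &out->strand,
               &out->level, &out->signif, &out->score2);
    return (n == 6) ? 0 : -1;
}
```

Checking the return value matters here, since a file with a different schema will generally fail to match all six conversions.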

# Interval/Entry iterators

Sometimes it is desirable to request a large number of intervals from a bigWig file or entries from a bigBed file, but not hold them all in memory at once (e.g., to save memory). To support this, libBigWig (since version 0.3.0) supports two kinds of iterators, one for bigWig intervals and one for bigBed entries. The general process of using iterators is: (1) iterator creation, (2) traversal, and finally (3) iterator destruction. Only iterator creation differs between bigWig and bigBed files.

Importantly, iterators return results one or more blocks at a time. This is for convenience, since bigWig intervals and bigBed entries are stored together in fixed-size groups called blocks. The number of blocks returned per iteration is therefore an option that can be specified to balance performance and memory usage.

## Iterator creation

For bigWig files, iterators are created with the `bwOverlappingIntervalsIterator()` function. This function takes chromosomal bounds (chromosome name, start, and end position) as well as a number of blocks. The equivalent function for bigBed files is `bbOverlappingEntriesIterator()`, which additionally takes a `withString` argument that dictates whether the returned entries include the associated string values.

Each of the aforementioned functions returns a pointer to a `bwOverlapIterator_t` object. The only important parts of this structure for end users are the following members: `entries`, `intervals`, and `data`. `entries` is a pointer to a `bbOverlappingEntries_t` object, or `NULL` if a bigWig file is being used. Likewise, `intervals` is a pointer to a `bwOverlappingIntervals_t` object, or `NULL` if a bigBed file is being used. `data` is a special pointer used to signify the end of iteration: when `data` is a `NULL` pointer, iteration has ended.

## Iterator traversal

Regardless of whether a bigWig or bigBed file is being used, the `bwIteratorNext()` function will free currently used memory and load the appropriate intervals or entries for the next block(s). On error, this will return a `NULL` pointer (memory is already internally freed in this case).

## Iterator destruction

`bwOverlapIterator_t` objects MUST be destroyed after use. This can be done with the `bwIteratorDestroy()` function.

## Example

A full example is provided in `test/testIterator.c`, but a small example of iterating over all bigWig intervals in `chr1:0-10000000` in chunks of 5 blocks follows:

    iter = bwOverlappingIntervalsIterator(fp, "chr1", 0, 10000000, 5);
    while(iter && iter->data) {
        //Do stuff with iter->intervals
        iter = bwIteratorNext(iter); //Returns NULL on error
    }
    bwIteratorDestroy(iter);
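
To show the create, traverse, destroy lifecycle in a form that can be compiled without any input file, here is a toy iterator over an in-memory array. Every name in it (`toyIterator_t`, `toyIteratorCreate()`, and so on) is hypothetical. It only mimics the pattern described above, including a `data` member that becomes `NULL` when iteration ends, and it says nothing about how libBigWig is implemented.

```c
#include <stdlib.h>

/* Toy iterator mimicking the create / traverse / destroy lifecycle.
 * All names are hypothetical; none of this is libBigWig code. */
typedef struct {
    const float *values;  /* Current chunk; plays the role of `intervals` */
    size_t n;             /* Number of values in the current chunk */
    const void *data;     /* NULL once iteration has ended */
    const float *next;    /* Where the following chunk starts */
    size_t remaining;     /* Values left after the current chunk */
    size_t chunkSize;     /* Maximum number of values per chunk */
} toyIterator_t;

/* Load the next chunk, or set `data` to NULL if nothing is left. */
toyIterator_t *toyIteratorNext(toyIterator_t *it) {
    if(!it) return NULL;
    if(it->remaining == 0) {
        it->values = NULL;
        it->n = 0;
        it->data = NULL;
        return it;
    }
    it->values = it->next;
    it->n = (it->remaining < it->chunkSize) ? it->remaining : it->chunkSize;
    it->next += it->n;
    it->remaining -= it->n;
    it->data = it->values;
    return it;
}

/* Create an iterator and load its first chunk. Returns NULL on error. */
toyIterator_t *toyIteratorCreate(const float *values, size_t n, size_t chunkSize) {
    toyIterator_t *it;
    if(!values || chunkSize == 0) return NULL;
    it = calloc(1, sizeof(toyIterator_t));
    if(!it) return NULL;
    it->next = values;
    it->remaining = n;
    it->chunkSize = chunkSize;
    return toyIteratorNext(it);
}

/* Free the iterator; safe to call with NULL. */
void toyIteratorDestroy(toyIterator_t *it) {
    free(it);
}

/* Sum all values chunk by chunk, following the three-step process. */
double toySum(const float *values, size_t n, size_t chunkSize) {
    double total = 0.0;
    size_t i;
    toyIterator_t *it = toyIteratorCreate(values, n, chunkSize);
    while(it && it->data) {
        for(i = 0; i < it->n; i++) total += it->values[i];
        it = toyIteratorNext(it);
    }
    toyIteratorDestroy(it);
    return total;
}
```

The loop in `toySum()` checks both the iterator pointer and its `data` member, so it stops cleanly whether iteration finished or creation failed.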

# A note on bigWig statistics

The results of `min`, `max`, and `mean` should be the same as those from `BigWigSummary`. `stdev` and `coverage`, however, may differ, because Kent's tools produce incorrect results (at least for `coverage`, and apparently for `stdev` as well).

# Python interface

There are currently two Python interfaces that make use of libBigWig: [pyBigWig](https://github.com/dpryan79/pyBigWig) by me and [bw-python](https://github.com/brentp/bw-python) by Brent Pedersen. Those interested are encouraged to give both a try!