1.. _genomecov:
2
3###############
4*genomecov*
5###############
6
7|
8
9.. image:: ../images/tool-glyphs/genomecov-glyph.png
10    :width: 600pt
11
12|
13
14``bedtools genomecov`` computes histograms (default), per-base reports (``-d``)
15and BEDGRAPH (``-bg``) summaries of feature coverage (e.g., aligned sequences)
16for a given genome.
17
18.. note::
19
20  1. If using BED/GFF/VCF, the input (``-i``) file must be grouped by
21  chromosome. A simple  ``sort -k 1,1 in.bed > in.sorted.bed`` will suffice.
22  Also, if using BED/GFF/VCF, one must provide a genome file via the ``-g``
23  argument.
24
25  2. If the input is in BAM (-ibam) format, the BAM file must be sorted
26  by position.  Using ``samtools sort aln.bam aln.sorted`` will suffice.
27
28
29===============================
30Usage and option summary
31===============================
32**Usage**:
33::
34
35  bedtools genomecov [OPTIONS] [-i|-ibam] -g (iff. -i)
36
37**(or)**:
38::
39
40  genomeCoverageBed [OPTIONS] [-i|-ibam] -g (iff. -i)
41
42
43
44===========================      ===============================================================================================================================================================================================================
45 Option                           Description
46===========================      ===============================================================================================================================================================================================================
47**-ibam**                        | BAM file as input for coverage. Each BAM alignment in A added to the total coverage for the genome.
48                                 | Use "stdin" or simply "-" if passing it with a UNIX pipe: For example:
49                                 | ``samtools view -b <BAM> | genomeCoverageBed -ibam stdin -g hg18.genome``
50**-d**                           Report the depth at each genome position with 1-based coordinates.
51**-dz**                          Report the depth at each genome position with 0-based coordinates.
52                                 Unlike, `-d`, this reports only non-zero positions.
53**-bg**                          Report depth in BedGraph format. For details, see: http://genome.ucsc.edu/goldenPath/help/bedgraph.html
54**-bga**                         Report depth in BedGraph format, as above (i.e., -bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output.
55**-split**                       Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12).
56**-strand**                      Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6).
57**-5**                           Calculate coverage of 5' positions (instead of entire interval).
58**-3**                           Calculate coverage of 3' positions (instead of entire interval).
59**-max**                         Combine all positions with a depth >= max into a single bin in the histogram.
60**-scale**                       | Scale the coverage by a constant factor.
61                                 | Each coverage value is multiplied by this factor before being reported.
62                                 | Useful for normalizing coverage by, e.g., reads per million (RPM).
63                                 | ``Default is 1.0; i.e., unscaled.``
64**-trackline**                   | Adds a UCSC/Genome-Browser track line definition in the first line of the output.
65                                 | See `here <http://genome.ucsc.edu/goldenPath/help/bedgraph.html>`_ for more details about track line definition:
66**-trackopts**                   Writes additional track line definition parameters in the first line.
67**-pc**                          | Calculates coverage of intervals from left point of a pair reads to the right point.
68                                 | Works for BAM files only
69**-fs**                          | Forces to use fragment size instead of read length
70                                 | Works for BAM files only
71
72===========================      ===============================================================================================================================================================================================================
73
74
75
76
77==========================================================================
78Default behavior
79==========================================================================
80By default, ``bedtools genomecov`` will compute a histogram of coverage for
81the genome file provided. The default output format is as follows:
82
831. chromosome (or entire genome)
842. depth of coverage from features in input file
853. number of bases on chromosome (or genome) with depth equal to column 2.
864. size of chromosome (or entire genome) in base pairs
875. fraction of bases on chromosome (or entire genome) with depth equal to column 2.
88
89For example:
90
91.. code-block:: bash
92
93  $ cat A.bed
94  chr1  10  20
95  chr1  20  30
96  chr2  0   500
97
98  $ cat my.genome
99  chr1  1000
100  chr2  500
101
102  $ bedtools genomecov -i A.bed -g my.genome
103  chr1   0  980  1000  0.98
104  chr1   1  20   1000  0.02
105  chr2   1  500  500   1
106  genome 0  980  1500  0.653333
107  genome 1  520  1500  0.346667
108
109
110==========================================================================
111``-max`` Controlling the histogram's maximum depth
112==========================================================================
113Using the ``-max`` option, ``bedtools genomecov`` will "lump" all positions in
114the genome having feature coverage greater than or equal to ``-max`` into
115the ``-max`` histogram bin. For example, if one sets ``-max``
116equal to 50, the max depth reported in the output will be 50 and all positions
117with a depth >= 50 will be represented in bin 50.
118
119
120==========================================================================
121``-d`` Reporting "per-base" genome coverage
122==========================================================================
123Using the ``-d`` option, ``bedtools genomecov`` will compute the depth of
124feature coverage for each base on each chromosome in genome file provided.
125
126The "per-base" output format is as follows:
127
1281. chromosome
1292. chromosome position
1303. depth (number) of features overlapping this chromosome position.
131
132For example:
133
134.. code-block:: bash
135
136  $ cat A.bed
137  chr1  10  20
138  chr1  20  30
139  chr2  0   500
140
141  $ cat my.genome
142  chr1  1000
143  chr2  500
144
145  $ bedtools genomecov -i A.bed -g my.genome -d | \
146        head -15 | \
147        tail -n 10
148  chr1  6  0
149  chr1  7  0
150  chr1  8  0
151  chr1  9  0
152  chr1  10 0
153  chr1  11 1
154  chr1  12 1
155  chr1  13 1
156  chr1  14 1
157  chr1  15 1
158
159
160==========================================================================
161``-bg`` Reporting genome coverage in BEDGRAPH format.
162==========================================================================
163Whereas the ``-d`` option reports an output line describing the observed
164coverage at each and every position in the genome, the ``-bg`` option instead
165produces genome-wide coverage output in
166`BEDGRAPH <http://genome.ucsc.edu/goldenPath/help/bedgraph.html>`_ format.
167This is a much more concise representation since consecutive positions with the
168same coverage are reported as a single output line describing the start and end
169coordinate of the interval having the coverage level, followed by the coverage
170level itself.
171
172
173For example, below is a snippet of BEDGRAPH output of the coverage from a 1000
174Genome Project BAM file:
175
176.. code-block:: bash
177
178  $ bedtools genomecov -ibam NA18152.bam -bg | head
179  chr1	554304	554309	5
180  chr1	554309	554313	6
181  chr1	554313	554314	1
182  chr1	554315	554316	6
183  chr1	554316	554317	5
184  chr1	554317	554318	1
185  chr1	554318	554319	2
186  chr1	554319	554321	6
187  chr1	554321	554323	1
188  chr1	554323	554334	7
189
190Using this format, one can quickly identify regions of the genome with
191sufficient coverage (in this case, 10 or more reads) by piping the
192output to an ``awk`` filter.
193
194.. code-block:: bash
195
196  $ bedtools genomecov -ibam NA18152.bam -bg | \
197      awk '$4 > 9' | \
198      head
199  chr1	554377	554381	11
200  chr1	554381	554385	12
201  chr1	554385	554392	16
202  chr1	554392	554408	17
203  chr1	554408	554410	19
204  chr1	554410	554422	20
205  chr1	554422	554423	19
206  chr1	554423	554430	22
207  chr1	554430	554440	24
208  chr1	554440	554443	25
209
210
211==========================================================================
212``-bga`` Reporting genome coverage for *all* positions in BEDGRAPH format.
213==========================================================================
214The ``-bg`` option reports coverage in BEDGRAPH format only for those regions
215of the genome that actually have coverage.  But what about the uncovered portion
216of the genome?  By using the ``-bga`` option, one receives a complete report
217including the regions with zero coverage.
218
219For example, compare the output from ``-bg``:
220
221.. code-block:: bash
222
223  $ bedtools genomecov -ibam NA18152.bam -bg | head
224  chr1	554304	554309	5
225  chr1	554309	554313	6
226  chr1	554313	554314	1
227  chr1	554315	554316	6
228  chr1	554316	554317	5
229  chr1	554317	554318	1
230  chr1	554318	554319	2
231  chr1	554319	554321	6
232  chr1	554321	554323	1
233  chr1	554323	554334	7
234
235to the output from ``-bga``:
236
237.. code-block:: bash
238
239  # Note the first record reports that the first 554304
240  # base pairs of chr1 had zero coverage
241  $ bedtools genomecov -ibam NA18152.bam -bga | head
242  chr1	0	554304	0
243  chr1	554304	554309	5
244  chr1	554309	554313	6
245  chr1	554313	554314	1
246  chr1	554314	554315	0
247  chr1	554315	554316	6
248  chr1	554316	554317	5
249  chr1	554317	554318	1
250  chr1	554318	554319	2
251  chr1	554319	554321	6
252
253
254==========================================================================
255``-strand`` Reporting genome coverage for a specific strand.
256==========================================================================
257Whereas the default is to count coverage regardless of strand, the ``-strand``
258option allows one to report the coverage observed for a specific strand.
259
260Compare:
261
262.. code-block:: bash
263
264  $ bedtools genomecov -ibam NA18152.bam -bg | head
265  chr1	554304	554309	5
266  chr1	554309	554313	6
267  chr1	554313	554314	1
268  chr1	554315	554316	6
269  chr1	554316	554317	5
270  chr1	554317	554318	1
271  chr1	554318	554319	2
272  chr1	554319	554321	6
273  chr1	554321	554323	1
274  chr1	554323	554334	7
275
276to
277
278.. code-block:: bash
279
280  $ bedtools genomecov -ibam NA18152.bam -bg -strand + | head
281  chr1	554385	554392	4
282  chr1	554392	554408	5
283  chr1	554408	554430	6
284  chr1	554430	554451	7
285  chr1	554451	554455	8
286  chr1	554455	554490	9
287  chr1	554490	554495	10
288  chr1	554495	554496	9
289  chr1	554496	554574	10
290  chr1	554574	554579	11
291
292
293==========================================================================
294``-scale`` Scaling coverage by a constant factor.
295==========================================================================
296The ``-scale`` option allows one to scale the coverage observed in an interval
297file by a constant factor. Each coverage value is multiplied by this factor
298before being reported. This can be useful for normalizing coverage by,
299e.g., metrics such as reads per million (RPM).
300
301Compare:
302
303.. code-block:: bash
304
305  $ bedtools genomecov -ibam NA18152.bam -bg | head
306  chr1	554304	554309	5
307  chr1	554309	554313	6
308  chr1	554313	554314	1
309  chr1	554315	554316	6
310  chr1	554316	554317	5
311  chr1	554317	554318	1
312  chr1	554318	554319	2
313  chr1	554319	554321	6
314  chr1	554321	554323	1
315  chr1	554323	554334	7
316
317to
318
319.. code-block:: bash
320
321  $ bedtools genomecov -ibam NA18152.bam -bg -scale 10.0 | head
322  chr1	554304	554309	50
323  chr1	554309	554313	60
324  chr1	554313	554314	10
325  chr1	554315	554316	60
326  chr1	554316	554317	50
327  chr1	554317	554318	10
328  chr1	554318	554319	20
329  chr1	554319	554321	60
330  chr1	554321	554323	10
331  chr1	554323	554334	70
332
333
334==============================================================================
335``-split`` Reporting coverage with spliced alignments or blocked BED features
336==============================================================================
337``bedtools genomecov`` will, by default, screen for overlaps against the
338entire span of a spliced/split BAM alignment or blocked BED12 feature. When
339dealing with RNA-seq reads, for example, one typically wants to only screen
340for overlaps for the portions of the reads that come from exons (and ignore the
341interstitial intron sequence). The ``-split`` command allows for such
342overlaps to be performed.
343
344
345==============================================================================
346Coverage by fragment
347==============================================================================
348
349|
350
351.. image:: ../images/tool-glyphs/barski_binding_site.png
352
353|
354
355In ChiP-Seq the binding site is usually not at the coordinate where reads map,
356but in the middle of the fragment. For this reason we often try to estimate average fragment size
357for single-read experiment and extend the reads in the 5’-3’ direction up to the estimated fragment length.
358The coverage "by estimated fragments" or by actual pair-end fragments graph is expected to peak at the actual binding site.
359
360
361``-fs`` Forces to use provided fragment size.
362
363
364``-pc`` Calculates coverage for paired-end reads, coverage is calculated as the number of fragments covering each base pair
365