1.. _genomecov: 2 3############### 4*genomecov* 5############### 6 7| 8 9.. image:: ../images/tool-glyphs/genomecov-glyph.png 10 :width: 600pt 11 12| 13 14``bedtools genomecov`` computes histograms (default), per-base reports (``-d``) 15and BEDGRAPH (``-bg``) summaries of feature coverage (e.g., aligned sequences) 16for a given genome. 17 18.. note:: 19 20 1. If using BED/GFF/VCF, the input (``-i``) file must be grouped by 21 chromosome. A simple ``sort -k 1,1 in.bed > in.sorted.bed`` will suffice. 22 Also, if using BED/GFF/VCF, one must provide a genome file via the ``-g`` 23 argument. 24 25 2. If the input is in BAM (-ibam) format, the BAM file must be sorted 26 by position. Using ``samtools sort aln.bam aln.sorted`` will suffice. 27 28 29=============================== 30Usage and option summary 31=============================== 32**Usage**: 33:: 34 35 bedtools genomecov [OPTIONS] [-i|-ibam] -g (iff. -i) 36 37**(or)**: 38:: 39 40 genomeCoverageBed [OPTIONS] [-i|-ibam] -g (iff. -i) 41 42 43 44=========================== =============================================================================================================================================================================================================== 45 Option Description 46=========================== =============================================================================================================================================================================================================== 47**-ibam** | BAM file as input for coverage. Each BAM alignment in A added to the total coverage for the genome. 48 | Use "stdin" or simply "-" if passing it with a UNIX pipe: For example: 49 | ``samtools view -b <BAM> | genomeCoverageBed -ibam stdin -g hg18.genome`` 50**-d** Report the depth at each genome position with 1-based coordinates. 51**-dz** Report the depth at each genome position with 0-based coordinates. 52 Unlike, `-d`, this reports only non-zero positions. 53**-bg** Report depth in BedGraph format. For details, see: http://genome.ucsc.edu/goldenPath/help/bedgraph.html 54**-bga** Report depth in BedGraph format, as above (i.e., -bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. 55**-split** Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). 56**-strand** Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). 57**-5** Calculate coverage of 5' positions (instead of entire interval). 58**-3** Calculate coverage of 3' positions (instead of entire interval). 59**-max** Combine all positions with a depth >= max into a single bin in the histogram. 60**-scale** | Scale the coverage by a constant factor. 61 | Each coverage value is multiplied by this factor before being reported. 62 | Useful for normalizing coverage by, e.g., reads per million (RPM). 63 | ``Default is 1.0; i.e., unscaled.`` 64**-trackline** | Adds a UCSC/Genome-Browser track line definition in the first line of the output. 65 | See `here <http://genome.ucsc.edu/goldenPath/help/bedgraph.html>`_ for more details about track line definition: 66**-trackopts** Writes additional track line definition parameters in the first line. 67**-pc** | Calculates coverage of intervals from left point of a pair reads to the right point. 68 | Works for BAM files only 69**-fs** | Forces to use fragment size instead of read length 70 | Works for BAM files only 71 72=========================== =============================================================================================================================================================================================================== 73 74 75 76 77========================================================================== 78Default behavior 79========================================================================== 80By default, ``bedtools genomecov`` will compute a histogram of coverage for 81the genome file provided. The default output format is as follows: 82 831. chromosome (or entire genome) 842. depth of coverage from features in input file 853. number of bases on chromosome (or genome) with depth equal to column 2. 864. size of chromosome (or entire genome) in base pairs 875. fraction of bases on chromosome (or entire genome) with depth equal to column 2. 88 89For example: 90 91.. code-block:: bash 92 93 $ cat A.bed 94 chr1 10 20 95 chr1 20 30 96 chr2 0 500 97 98 $ cat my.genome 99 chr1 1000 100 chr2 500 101 102 $ bedtools genomecov -i A.bed -g my.genome 103 chr1 0 980 1000 0.98 104 chr1 1 20 1000 0.02 105 chr2 1 500 500 1 106 genome 0 980 1500 0.653333 107 genome 1 520 1500 0.346667 108 109 110========================================================================== 111``-max`` Controlling the histogram's maximum depth 112========================================================================== 113Using the ``-max`` option, ``bedtools genomecov`` will "lump" all positions in 114the genome having feature coverage greater than or equal to ``-max`` into 115the ``-max`` histogram bin. For example, if one sets ``-max`` 116equal to 50, the max depth reported in the output will be 50 and all positions 117with a depth >= 50 will be represented in bin 50. 118 119 120========================================================================== 121``-d`` Reporting "per-base" genome coverage 122========================================================================== 123Using the ``-d`` option, ``bedtools genomecov`` will compute the depth of 124feature coverage for each base on each chromosome in genome file provided. 125 126The "per-base" output format is as follows: 127 1281. chromosome 1292. chromosome position 1303. depth (number) of features overlapping this chromosome position. 131 132For example: 133 134.. code-block:: bash 135 136 $ cat A.bed 137 chr1 10 20 138 chr1 20 30 139 chr2 0 500 140 141 $ cat my.genome 142 chr1 1000 143 chr2 500 144 145 $ bedtools genomecov -i A.bed -g my.genome -d | \ 146 head -15 | \ 147 tail -n 10 148 chr1 6 0 149 chr1 7 0 150 chr1 8 0 151 chr1 9 0 152 chr1 10 0 153 chr1 11 1 154 chr1 12 1 155 chr1 13 1 156 chr1 14 1 157 chr1 15 1 158 159 160========================================================================== 161``-bg`` Reporting genome coverage in BEDGRAPH format. 162========================================================================== 163Whereas the ``-d`` option reports an output line describing the observed 164coverage at each and every position in the genome, the ``-bg`` option instead 165produces genome-wide coverage output in 166`BEDGRAPH <http://genome.ucsc.edu/goldenPath/help/bedgraph.html>`_ format. 167This is a much more concise representation since consecutive positions with the 168same coverage are reported as a single output line describing the start and end 169coordinate of the interval having the coverage level, followed by the coverage 170level itself. 171 172 173For example, below is a snippet of BEDGRAPH output of the coverage from a 1000 174Genome Project BAM file: 175 176.. code-block:: bash 177 178 $ bedtools genomecov -ibam NA18152.bam -bg | head 179 chr1 554304 554309 5 180 chr1 554309 554313 6 181 chr1 554313 554314 1 182 chr1 554315 554316 6 183 chr1 554316 554317 5 184 chr1 554317 554318 1 185 chr1 554318 554319 2 186 chr1 554319 554321 6 187 chr1 554321 554323 1 188 chr1 554323 554334 7 189 190Using this format, one can quickly identify regions of the genome with 191sufficient coverage (in this case, 10 or more reads) by piping the 192output to an ``awk`` filter. 193 194.. code-block:: bash 195 196 $ bedtools genomecov -ibam NA18152.bam -bg | \ 197 awk '$4 > 9' | \ 198 head 199 chr1 554377 554381 11 200 chr1 554381 554385 12 201 chr1 554385 554392 16 202 chr1 554392 554408 17 203 chr1 554408 554410 19 204 chr1 554410 554422 20 205 chr1 554422 554423 19 206 chr1 554423 554430 22 207 chr1 554430 554440 24 208 chr1 554440 554443 25 209 210 211========================================================================== 212``-bga`` Reporting genome coverage for *all* positions in BEDGRAPH format. 213========================================================================== 214The ``-bg`` option reports coverage in BEDGRAPH format only for those regions 215of the genome that actually have coverage. But what about the uncovered portion 216of the genome? By using the ``-bga`` option, one receives a complete report 217including the regions with zero coverage. 218 219For example, compare the output from ``-bg``: 220 221.. code-block:: bash 222 223 $ bedtools genomecov -ibam NA18152.bam -bg | head 224 chr1 554304 554309 5 225 chr1 554309 554313 6 226 chr1 554313 554314 1 227 chr1 554315 554316 6 228 chr1 554316 554317 5 229 chr1 554317 554318 1 230 chr1 554318 554319 2 231 chr1 554319 554321 6 232 chr1 554321 554323 1 233 chr1 554323 554334 7 234 235to the output from ``-bga``: 236 237.. code-block:: bash 238 239 # Note the first record reports that the first 554304 240 # base pairs of chr1 had zero coverage 241 $ bedtools genomecov -ibam NA18152.bam -bga | head 242 chr1 0 554304 0 243 chr1 554304 554309 5 244 chr1 554309 554313 6 245 chr1 554313 554314 1 246 chr1 554314 554315 0 247 chr1 554315 554316 6 248 chr1 554316 554317 5 249 chr1 554317 554318 1 250 chr1 554318 554319 2 251 chr1 554319 554321 6 252 253 254========================================================================== 255``-strand`` Reporting genome coverage for a specific strand. 256========================================================================== 257Whereas the default is to count coverage regardless of strand, the ``-strand`` 258option allows one to report the coverage observed for a specific strand. 259 260Compare: 261 262.. code-block:: bash 263 264 $ bedtools genomecov -ibam NA18152.bam -bg | head 265 chr1 554304 554309 5 266 chr1 554309 554313 6 267 chr1 554313 554314 1 268 chr1 554315 554316 6 269 chr1 554316 554317 5 270 chr1 554317 554318 1 271 chr1 554318 554319 2 272 chr1 554319 554321 6 273 chr1 554321 554323 1 274 chr1 554323 554334 7 275 276to 277 278.. code-block:: bash 279 280 $ bedtools genomecov -ibam NA18152.bam -bg -strand + | head 281 chr1 554385 554392 4 282 chr1 554392 554408 5 283 chr1 554408 554430 6 284 chr1 554430 554451 7 285 chr1 554451 554455 8 286 chr1 554455 554490 9 287 chr1 554490 554495 10 288 chr1 554495 554496 9 289 chr1 554496 554574 10 290 chr1 554574 554579 11 291 292 293========================================================================== 294``-scale`` Scaling coverage by a constant factor. 295========================================================================== 296The ``-scale`` option allows one to scale the coverage observed in an interval 297file by a constant factor. Each coverage value is multiplied by this factor 298before being reported. This can be useful for normalizing coverage by, 299e.g., metrics such as reads per million (RPM). 300 301Compare: 302 303.. code-block:: bash 304 305 $ bedtools genomecov -ibam NA18152.bam -bg | head 306 chr1 554304 554309 5 307 chr1 554309 554313 6 308 chr1 554313 554314 1 309 chr1 554315 554316 6 310 chr1 554316 554317 5 311 chr1 554317 554318 1 312 chr1 554318 554319 2 313 chr1 554319 554321 6 314 chr1 554321 554323 1 315 chr1 554323 554334 7 316 317to 318 319.. code-block:: bash 320 321 $ bedtools genomecov -ibam NA18152.bam -bg -scale 10.0 | head 322 chr1 554304 554309 50 323 chr1 554309 554313 60 324 chr1 554313 554314 10 325 chr1 554315 554316 60 326 chr1 554316 554317 50 327 chr1 554317 554318 10 328 chr1 554318 554319 20 329 chr1 554319 554321 60 330 chr1 554321 554323 10 331 chr1 554323 554334 70 332 333 334============================================================================== 335``-split`` Reporting coverage with spliced alignments or blocked BED features 336============================================================================== 337``bedtools genomecov`` will, by default, screen for overlaps against the 338entire span of a spliced/split BAM alignment or blocked BED12 feature. When 339dealing with RNA-seq reads, for example, one typically wants to only screen 340for overlaps for the portions of the reads that come from exons (and ignore the 341interstitial intron sequence). The ``-split`` command allows for such 342overlaps to be performed. 343 344 345============================================================================== 346Coverage by fragment 347============================================================================== 348 349| 350 351.. image:: ../images/tool-glyphs/barski_binding_site.png 352 353| 354 355In ChiP-Seq the binding site is usually not at the coordinate where reads map, 356but in the middle of the fragment. For this reason we often try to estimate average fragment size 357for single-read experiment and extend the reads in the 5’-3’ direction up to the estimated fragment length. 358The coverage "by estimated fragments" or by actual pair-end fragments graph is expected to peak at the actual binding site. 359 360 361``-fs`` Forces to use provided fragment size. 362 363 364``-pc`` Calculates coverage for paired-end reads, coverage is calculated as the number of fragments covering each base pair 365