###############
Example usage
###############

Below are several examples of basic bedtools usage. Example BED files are
provided in the ``/data`` directory of the bedtools distribution.



==========================================================================
bedtools intersect
==========================================================================


Report the base-pair overlap between sequence alignments and genes.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed



Report whether each alignment overlaps one or more genes. If not, the alignment is not reported.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -u



Report those alignments that overlap NO genes, similar to ``grep -v``.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -v


Report the number of genes that each alignment overlaps.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -c


Report the entire, original alignment entry for each overlap with a gene.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -wa



Report the entire, original gene entry for each overlap with an alignment.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -wb



Report the entire, original alignment and gene entries for each overlap.

.. code-block:: bash

  bedtools intersect -a reads.bed -b genes.bed -wa -wb



Only report an overlap with a repeat if it spans at least 50% of the exon.

.. code-block:: bash

  bedtools intersect -a exons.bed -b repeatMasker.bed -f 0.50

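The arithmetic behind ``-f`` can be sketched in plain shell for a single pair
of features. This is only an illustration: the coordinates and variable names
are hypothetical, and it does not call bedtools. It computes how much of the
exon (the A feature) is covered by the repeat.

.. code-block:: bash

  # Hypothetical intervals: an exon (A) and a repeat (B) on the same chromosome.
  exon_start=100; exon_end=200       # the exon is 100 bp long
  rep_start=140;  rep_end=300        # the repeat covers the last 60 bp of the exon

  # Overlap = min(ends) - max(starts), clamped at zero.
  ov_start=$(( exon_start > rep_start ? exon_start : rep_start ))
  ov_end=$(( exon_end < rep_end ? exon_end : rep_end ))
  overlap=$(( ov_end > ov_start ? ov_end - ov_start : 0 ))

  # Percentage of the exon that is covered (integer arithmetic).
  pct=$(( 100 * overlap / (exon_end - exon_start) ))
  echo "overlap=${overlap} bp, ${pct}% of exon"

Here the repeat covers more than half of the exon, so the pair would pass a
50% threshold measured against A.
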


Only report an overlap if it comprises at least 50% of the structural variant and at least 50% of the segmental duplication; that is, the overlap must be reciprocally at least 50%.

.. code-block:: bash

  bedtools intersect -a SV.bed -b segmentalDups.bed -f 0.50 -r

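To see why the reciprocal requirement matters, the sketch below computes the
overlap as a fraction of each feature for one made-up pair. The file names
``pair.tsv`` and ``pair.result.tsv`` and all coordinates are hypothetical, and
the script only mimics the arithmetic rather than calling bedtools.

.. code-block:: bash

  # Hypothetical SV (columns 1-3) and segmental duplication (columns 4-6).
  printf 'chr1\t1000\t2000\tchr1\t1400\t4000\n' > pair.tsv

  # Print the overlap, its fraction of each feature, and a verdict.
  awk 'BEGIN{FS=OFS="\t"}
  {
    s  = ($2 > $5) ? $2 : $5          # overlap start = max of the starts
    e  = ($3 < $6) ? $3 : $6          # overlap end   = min of the ends
    ov = (e > s) ? e - s : 0
    fa = ov / ($3 - $2)               # fraction of A (the SV)
    fb = ov / ($6 - $5)               # fraction of B (the duplication)
    verdict = (fa >= 0.5 && fb >= 0.5) ? "reciprocal" : "not reciprocal"
    print ov, fa, fb, verdict
  }' pair.tsv > pair.result.tsv
  cat pair.result.tsv

The overlap covers 60% of the SV but well under half of the much longer
duplication, so this pair would satisfy ``-f 0.50`` alone yet fail once ``-r``
is added.
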


Read BED A from stdin. For example, find genes that overlap LINEs but not SINEs.
The first command uses ``-u`` so that each overlapping gene is passed on once, in full.

.. code-block:: bash

  bedtools intersect -a genes.bed -b LINES.bed -u | bedtools intersect -a stdin -b SINEs.bed -v



Retain only single-end BAM alignments that overlap exons.

.. code-block:: bash

  bedtools intersect -abam reads.bam -b exons.bed > reads.touchingExons.bam



Retain only single-end BAM alignments that do not overlap simple sequence
repeats.

.. code-block:: bash

  bedtools intersect -abam reads.bam -b SSRs.bed -v > reads.noSSRs.bam




==========================================================================
bedtools bamtobed
==========================================================================

Convert BAM alignments to BED format.

.. code-block:: bash

  bedtools bamtobed -i reads.bam > reads.bed



Convert BAM alignments to BED format using the BAM edit distance (NM) as the
BED "score".

.. code-block:: bash

  bedtools bamtobed -i reads.bam -ed > reads.bed



Convert BAM alignments to BEDPE format.

.. code-block:: bash

  bedtools bamtobed -i reads.bam -bedpe > reads.bedpe





==========================================================================
bedtools window
==========================================================================



Report all genes that are within 10000 bp upstream or downstream of CNVs.

.. code-block:: bash

  bedtools window -a CNVs.bed -b genes.bed -w 10000



Report all genes that are within 10000 bp upstream or 5000 bp downstream of
CNVs.

.. code-block:: bash

  bedtools window -a CNVs.bed -b genes.bed -l 10000 -r 5000


Report all SNPs that are within 5000 bp upstream or 1000 bp downstream of genes.
Define upstream and downstream based on strand.

.. code-block:: bash

  bedtools window -a genes.bed -b snps.bed -l 5000 -r 1000 -sw

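The effect of defining the window by strand can be sketched with ``awk`` on two
toy genes. This is an illustration under the assumption that ``-sw`` swaps the
``-l`` and ``-r`` distances for minus-strand features. The file names
``toy.genes.bed`` and ``toy.windows.bed`` and the coordinates are hypothetical,
and bedtools itself is not invoked.

.. code-block:: bash

  # Two hypothetical genes (BED6) with identical coordinates, one per strand.
  printf 'chr1\t10000\t12000\tgeneA\t0\t+\n'  > toy.genes.bed
  printf 'chr1\t10000\t12000\tgeneB\t0\t-\n' >> toy.genes.bed

  # l = upstream distance, r = downstream distance, both relative to the strand.
  awk -v l=5000 -v r=1000 'BEGIN{FS=OFS="\t"}
  {
    if ($6 == "-") { s = $2 - r; e = $3 + l }   # minus strand: sides swap
    else           { s = $2 - l; e = $3 + r }
    if (s < 0) s = 0                            # stay within the chromosome
    print $1, s, e, $4, $6
  }' toy.genes.bed > toy.windows.bed
  cat toy.windows.bed

The plus-strand gene is searched from 5000 to 13000, while the minus-strand
gene is searched from 9000 to 17000.
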




==========================================================================
bedtools closest
==========================================================================

.. note::

    By default, if there is a tie for closest, all ties will be reported.
    ``bedtools closest`` allows overlapping features to be the closest.



Find the closest ALU to each gene.

.. code-block:: bash

  bedtools closest -a genes.bed -b ALUs.bed



Find the closest ALU to each gene, choosing the first ALU in the file if there is a
tie.

.. code-block:: bash

  bedtools closest -a genes.bed -b ALUs.bed -t first



Find the closest ALU to each gene, choosing the last ALU in the file if there is a
tie.

.. code-block:: bash

  bedtools closest -a genes.bed -b ALUs.bed -t last




==========================================================================
bedtools subtract
==========================================================================

.. note::

    If a feature in A is entirely "spanned" by any feature in B, it will not be reported.

Remove introns from gene features. The intervals that remain should correspond to the exons.

.. code-block:: bash

  bedtools subtract -a genes.bed -b introns.bed

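What subtraction does to a single A feature can be sketched with a small shell
function. The helper ``subtract_one`` is hypothetical, assumes both intervals
lie on ``chr1``, and handles only one B interval at a time; it illustrates the
interval arithmetic and is not how bedtools is implemented.

.. code-block:: bash

  # Subtract one B interval from one A interval (both assumed to be on chr1).
  subtract_one() {   # usage: subtract_one a_start a_end b_start b_end
    a_s=$1; a_e=$2; b_s=$3; b_e=$4
    if [ "$b_e" -le "$a_s" ] || [ "$b_s" -ge "$a_e" ]; then
      printf 'chr1\t%d\t%d\n' "$a_s" "$a_e"      # no overlap: A is unchanged
      return 0
    fi
    # Keep whatever part of A lies to the left and to the right of B.
    [ "$b_s" -gt "$a_s" ] && printf 'chr1\t%d\t%d\n' "$a_s" "$b_s"
    [ "$b_e" -lt "$a_e" ] && printf 'chr1\t%d\t%d\n' "$b_e" "$a_e"
    return 0
  }

  subtract_one 100 500 200 300   # intron inside the gene: two pieces remain
  subtract_one 100 500  50 600   # B spans A entirely: nothing is printed

The second call mirrors the note above: a feature in A that is entirely
spanned by a feature in B produces no output.
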

==========================================================================
bedtools merge
==========================================================================

.. note::

    ``merge`` requires that the input is sorted by chromosome and then by start
    coordinate.  For example, for BED files, one would first sort the input
    as follows: ``sort -k1,1 -k2,2n input.bed > input.sorted.bed``

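The sort step from the note can be tried on a small toy file. The contents of
``input.bed`` below are hypothetical and chosen so that a plain lexical sort
would misplace the start coordinates (``1200`` would sort before ``900``).

.. code-block:: bash

  # A small, unsorted toy BED file (made-up coordinates).
  printf 'chr2\t500\t800\nchr1\t1200\t1500\nchr1\t900\t1000\n' > input.bed

  # Sort by chromosome (column 1), then numerically by start (column 2).
  sort -k1,1 -k2,2n input.bed > input.sorted.bed
  cat input.sorted.bed
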
Merge overlapping repetitive elements into a single entry.

.. code-block:: bash

  bedtools merge -i repeatMasker.bed



Merge overlapping repetitive elements into a single entry, returning the number of
entries merged.

.. code-block:: bash

  bedtools merge -i repeatMasker.bed -n


Merge nearby (within 1000 bp) repetitive elements into a single entry.

.. code-block:: bash

  bedtools merge -i repeatMasker.bed -d 1000


==========================================================================
bedtools coverage
==========================================================================


Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome.

.. code-block:: bash

  bedtools coverage -a reads.bed -b windows10kb.bed | head
  chr1 0     10000 0  10000 0.00
  chr1 10001 20000 33 10000 0.21
  chr1 20001 30000 42 10000 0.29
  chr1 30001 40000 71 10000 0.36

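A windows file such as ``windows10kb.bed`` has to be prepared beforehand. One
way to sketch this is with ``awk`` on a genome file (chromosome name, TAB,
length). The file names ``toy.genome`` and ``toy.windows10kb.bed`` and the
chromosome sizes are hypothetical, and the windows are written as 0-based,
half-open BED intervals. bedtools also provides a ``makewindows`` tool for the
same purpose.

.. code-block:: bash

  # Toy genome file (chrom<TAB>size); the sizes are made up.
  printf 'chr1\t25000\nchr2\t10000\n' > toy.genome

  # Emit consecutive 10 kb windows; the last window stops at the chromosome end.
  awk -v w=10000 'BEGIN{FS=OFS="\t"}
  {
    for (s = 0; s < $2; s += w) {
      e = (s + w < $2) ? s + w : $2
      print $1, s, e
    }
  }' toy.genome > toy.windows10kb.bed
  cat toy.windows10kb.bed
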


Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome and create a BEDGRAPH of the number of aligned reads in each window for
display on the UCSC browser.

.. code-block:: bash

  bedtools coverage -a reads.bed -b windows10kb.bed | cut -f 1-4 > windows10kb.cov.bedg

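For display as a custom track, the UCSC browser expects a bedGraph file to
begin with a ``track type=bedGraph`` line. A minimal sketch of prepending one
is shown below. The file names ``toy.cov.bedg`` and ``toy.cov.track.bedg``,
the data values, and the track name and description are all placeholders.

.. code-block:: bash

  # Toy stand-in for windows10kb.cov.bedg (hypothetical values).
  printf 'chr1\t0\t10000\t0\nchr1\t10000\t20000\t33\n' > toy.cov.bedg

  # Prepend a bedGraph track line, then append the data unchanged.
  {
    echo 'track type=bedGraph name="read_counts" description="Reads per 10 kb window"'
    cat toy.cov.bedg
  } > toy.cov.track.bedg
  head -n 1 toy.cov.track.bedg
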


Compute the coverage of aligned sequences on 10 kilobase "windows" spanning the
genome and create a BEDGRAPH of the fraction of each window covered by at least
one aligned read for display on the UCSC browser.

.. code-block:: bash

  bedtools coverage -a reads.bed -b windows10kb.bed | \
     awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$6}' \
     > windows10kb.pctcov.bedg




==========================================================================
bedtools complement
==========================================================================


Report all intervals in the human genome that are not covered by repetitive
elements.

.. code-block:: bash

  bedtools complement -i repeatMasker.bed -g hg18.genome

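The ``-g`` argument names a genome file: a tab-delimited list of chromosome
names and their lengths. The sketch below writes a toy version and checks its
layout. The file name ``toy.genome`` and the sizes are hypothetical; a real
``hg18.genome`` would list every chromosome of that build.

.. code-block:: bash

  # Write a two-chromosome toy genome file: name, TAB, length in bp.
  printf 'chr1\t1000\nchr2\t800\n' > toy.genome

  # Sanity check: each line has exactly two fields and a numeric size.
  awk -F'\t' 'NF != 2 || $2 !~ /^[0-9]+$/ { bad++ } END { exit (bad > 0) }' toy.genome \
    && echo "toy.genome looks well-formed"
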


==========================================================================
bedtools shuffle
==========================================================================


Randomly place all discovered variants in the genome. However, prevent them
from being placed in known genome gaps.

.. code-block:: bash

  bedtools shuffle -i variants.bed -g hg18.genome -excl genome_gaps.bed



Randomly place all discovered variants in the genome. However, prevent them
from being placed in known genome gaps and require that the variants be randomly
placed on the same chromosome.

.. code-block:: bash

  bedtools shuffle -i variants.bed -g hg18.genome -excl genome_gaps.bed -chrom
