.\"	$OpenBSD: bpf.4,v 1.38 2016/04/28 19:07:19 natano Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: April 28 2016 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf .
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
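.Pp
The sequence described above can be sketched as follows.
This is an illustration rather than a reference implementation:
.Fn bpf_attach
is a hypothetical helper name, error handling is reduced to the minimum,
and the caller is assumed to have permission to open
.Pa /dev/bpf .
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

#include <net/if.h>
#include <net/bpf.h>

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the device, bind it to ifname, report
 * the read buffer size in *blen and lock the descriptor.
 * Returns the descriptor, or -1 on failure.
 */
int
bpf_attach(const char *ifname, u_int *blen)
{
	struct ifreq ifr;
	int fd;

	if ((fd = open("/dev/bpf", O_RDONLY)) == -1)
		return (-1);

	memset(&ifr, 0, sizeof(ifr));
	strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
	if (ioctl(fd, BIOCSETIF, &ifr) == -1 ||
	    ioctl(fd, BIOCGBLEN, blen) == -1 ||
	    ioctl(fd, BIOCLOCK, NULL) == -1) {
		close(fd);
		return (-1);
	}
	return (fd);
}
.Ed
.Pp
A caller would then allocate a buffer of
.Va *blen
bytes and pass exactly that size to
.Xr read 2 .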
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Pp
.nr nS 1
.In sys/types.h
.In sys/time.h
.In sys/ioctl.h
.In net/bpf.h
.nr nS 0
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write, and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
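.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be written as a small predicate.
The function name
.Fn bpf_version_compatible
is made up for this sketch.
It takes plain integers so that it does not depend on any header; a caller
would pass
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
together with the values returned in
.Li struct bpf_version .
.Bd -literal -offset indent
/*
 * Nonzero if a filter built against app_major.app_minor may be
 * installed on a kernel reporting k_major.k_minor: the major
 * numbers must match and the application minor must not exceed
 * the kernel minor.
 */
int
bpf_version_compatible(unsigned int app_major, unsigned int app_minor,
    unsigned int k_major, unsigned int k_minor)
{
	return (app_major == k_major && app_minor <= k_minor);
}
.Ed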
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
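.Pp
Putting the header fields and
.Dv BPF_WORDALIGN
together, a loop over the result of one
.Xr read 2
might look like the sketch below.
To keep it compilable without
.In net/bpf.h ,
it uses local stand-ins that are assumptions of this example:
.Li struct rec_hdr
mirrors the field order of
.Li struct bpf_hdr
with an assumed two-word timestamp, and
.Dv WORDALIGN
mimics
.Dv BPF_WORDALIGN
for an assumed 4-byte word.
Real code should use the definitions from the header instead.
.Fn walk_buffer
and
.Vt pkt_cb
are likewise hypothetical names.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for BPF_ALIGNMENT and BPF_WORDALIGN (assumed 4 bytes). */
#define ALIGNMENT	sizeof(uint32_t)
#define WORDALIGN(x)	(((x) + (ALIGNMENT - 1)) & ~(ALIGNMENT - 1))

/* Stand-in for struct bpf_hdr; the layout is an assumption. */
struct rec_hdr {
	uint32_t	ts_sec;
	uint32_t	ts_usec;
	uint32_t	caplen;
	uint32_t	datalen;
	uint16_t	hdrlen;
};

/* Bytes of rec_hdr that carry data, excluding tail padding. */
#define HDR_MIN	(offsetof(struct rec_hdr, hdrlen) + sizeof(uint16_t))

typedef void (*pkt_cb)(const unsigned char *, uint32_t, void *);

/*
 * Visit every packet in the n bytes returned by one read(2).
 * cb may be NULL.  Returns the number of packets visited; the
 * walk stops early at a malformed or truncated record.
 */
size_t
walk_buffer(const unsigned char *buf, size_t n, pkt_cb cb, void *arg)
{
	struct rec_hdr h;
	size_t off = 0, count = 0;

	while (off < n && n - off >= HDR_MIN) {
		memcpy(&h, buf + off, HDR_MIN);
		if (h.hdrlen < HDR_MIN || h.hdrlen > n - off ||
		    h.caplen > n - off - h.hdrlen)
			break;
		if (cb != NULL)
			cb(buf + off + h.hdrlen, h.caplen, arg);
		count++;
		/* Records are padded so the next one is word aligned. */
		off += WORDALIGN((size_t)h.hdrlen + h.caplen);
	}
	return (count);
}
.Ed
.Pp
The header is copied out with
.Xr memcpy 3
rather than accessed through a cast, so the sketch does not rely on the
alignment of
.Va buf .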
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
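.Pp
To make the notation concrete, the sketch below interprets a small subset
of the machine: absolute loads, the
.Dv BPF_JEQ
comparison against a constant, and a constant return.
It is a toy written for this page, not the kernel implementation.
.Li struct insn ,
the enumeration constants (whose numeric values are arbitrary) and
.Fn run_filter
are stand-ins invented here so that the code does not need
.In net/bpf.h .
The program is assumed to be well formed, that is, to branch forward only
and to end in a return.
In this sketch an out-of-range load rejects the packet.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

struct insn {			/* stand-in for struct bpf_insn */
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};

enum {				/* stand-ins for opcode sums */
	LD_W_ABS = 1,		/* BPF_LD+BPF_W+BPF_ABS */
	LD_H_ABS,		/* BPF_LD+BPF_H+BPF_ABS */
	LD_B_ABS,		/* BPF_LD+BPF_B+BPF_ABS */
	JEQ_K,			/* BPF_JMP+BPF_JEQ+BPF_K */
	RET_K			/* BPF_RET+BPF_K */
};

/* Returns the number of bytes to accept, or 0 to reject. */
uint32_t
run_filter(const struct insn *pc, const uint8_t *p, size_t len)
{
	uint32_t A = 0;
	size_t i, n;

	for (;; pc++) {
		switch (pc->code) {
		case LD_W_ABS:
		case LD_H_ABS:
		case LD_B_ABS:
			n = pc->code == LD_W_ABS ? 4 :
			    pc->code == LD_H_ABS ? 2 : 1;
			if (pc->k > len || n > len - pc->k)
				return (0);
			/* A <- P[k:n], most significant byte first */
			for (A = 0, i = 0; i < n; i++)
				A = (A << 8) | p[pc->k + i];
			break;
		case JEQ_K:
			/* pc += (A == k) ? jt : jf */
			pc += (A == pc->k) ? pc->jt : pc->jf;
			break;
		case RET_K:
			return (pc->k);
		default:
			return (0);	/* outside the subset */
		}
	}
}
.Ed
.Pp
Branch offsets are relative to the instruction following the jump, which
is why the loop still advances
.Va pc
by one after adding
.Fa jt
or
.Fa jf .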
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf -compact
.It Pa /dev/bpf
.Nm
device
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
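.Pp
None of the examples above shows how the array reaches the kernel.
One possible way to install any of them is sketched below, assuming
.Va fd
is an open
.Nm
descriptor that has not been locked;
.Fn install_filter
is a hypothetical helper name.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>

#include <net/bpf.h>

/* Hypothetical helper: returns 0 on success, -1 on failure. */
int
install_filter(int fd, struct bpf_insn *insns, u_int ninsns)
{
	struct bpf_program prog;

	prog.bf_len = ninsns;		/* length in instructions */
	prog.bf_insns = insns;

	/* Also performs the actions of BIOCFLUSH. */
	return (ioctl(fd, BIOCSETF, &prog));
}
.Ed
.Pp
A typical call would be
.Li install_filter(fd, insns, sizeof(insns) / sizeof(insns[0])) .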
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%D January 1993
.%J 1993 Winter USENIX Conference
.%T The BSD Packet Filter: A New Architecture for User-level Packet Capture
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to
.Bx
and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steve McCanne
of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
1051packets.
1052