.\"	$OpenBSD: bpf.4,v 1.35 2015/01/15 20:37:36 schwarze Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd $Mdocdate: January 15 2015 $
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
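.Pp
The rule that a busy minor device fails with
.Er EBUSY
suggests a simple scan over the device nodes.
The following is a minimal sketch of that scan, not part of the
.Nm
interface:
.Fn open_first_free_bpf ,
the
.Vt open_fn
callback type and the stand-in
.Fn fake_open
are hypothetical names introduced for illustration, and a real program
would pass a callback that wraps
.Xr open 2 .
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical callback type: tries to open one device path and
 * returns a descriptor, or -1 with errno set, as open(2) does.
 */
typedef int (*open_fn)(const char *path);

/*
 * Try /dev/bpf0, /dev/bpf1, ... for max_units devices and return the
 * first descriptor that opens.  EBUSY means that minor device is in
 * use, so the next one is tried; any other error stops the scan.
 * Returns -1 if no device could be opened.
 */
int
open_first_free_bpf(open_fn try_open, int max_units)
{
	char path[32];
	int fd, i;

	for (i = 0; i < max_units; i++) {
		snprintf(path, sizeof(path), "/dev/bpf%d", i);
		fd = try_open(path);
		if (fd != -1)
			return fd;
		if (errno != EBUSY)
			return -1;
	}
	return -1;
}

/*
 * Stand-in opener for demonstration only: pretends that every unit
 * except /dev/bpf2 is busy and hands back a made-up descriptor.
 */
int
fake_open(const char *path)
{
	if (strcmp(path, "/dev/bpf2") == 0)
		return 42;
	errno = EBUSY;
	return -1;
}
```
.Ed
.Pp
With the stand-in opener, a scan of four units skips the two busy
devices and returns the made-up descriptor, while a scan limited to two
units finds nothing and returns \-1.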
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.In net/bpf.h .
All commands require these includes:
.Pp
.nr nS 1
.In sys/types.h
.In sys/time.h
.In sys/ioctl.h
.In net/bpf.h
.nr nS 0
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.In sys/socket.h
and
.In net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EINVAL
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.In net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field, and the caller supplies the capacity of that array, counted in
.Vt u_int
elements, in the
.Va bfl_len
field.
.Er ENOMEM
is returned if the array is too small to hold all of the types, and
.Er EFAULT
is returned if a bad address is encountered.
On return, the
.Va bfl_len
field is set to the number of
.Vt u_int
elements actually stored in the array.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is instead set to the number of
.Vt u_int
elements required to hold the full list.
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCFLUSH ,
.Dv BIOCGBLEN ,
.Dv BIOCGDIRFILT ,
.Dv BIOCGDLT ,
.Dv BIOCGDLTLIST ,
.Dv BIOCGETIF ,
.Dv BIOCGHDRCMPLT ,
.Dv BIOCGRSIG ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCGSTATS ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCLOCK ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCVERSION ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Sets or gets the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enables or disables
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	u_int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.In net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Sets or gets the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Sets or gets the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCSFILDROP Fa "u_int *"
.It Dv BIOCGFILDROP Fa "u_int *"
Sets or gets the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.Pp
.It Dv BIOCSDIRFILT Fa "u_int *"
.It Dv BIOCGDIRFILT Fa "u_int *"
Sets or gets the status of the
.Dq direction filter
flag.
If non-zero, packets matching the specified direction (either
.Dv BPF_DIRECTION_IN
or
.Dv BPF_DIRECTION_OUT )
will be ignored.
.El
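.Pp
The compatibility rule given under
.Dv BIOCVERSION
reduces to a two-part comparison.
The helper below is a minimal sketch of that rule only;
.Fn filter_version_ok
is a hypothetical name, and the four arguments stand for the
application's
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
and the
.Fa bv_major
and
.Fa bv_minor
values reported by the kernel.
.Bd -literal -offset indent
```c
/*
 * Return 1 if a filter built for version app_major.app_minor may be
 * installed on a kernel reporting kern_major.kern_minor, else 0.
 * The majors must match and the application minor must not exceed
 * the kernel minor.
 */
int
filter_version_ok(unsigned int app_major, unsigned int app_minor,
    unsigned int kern_major, unsigned int kern_minor)
{
	return app_major == kern_major && app_minor <= kern_minor;
}
```
.Ed
.Pp
An application would typically call such a check once, after the
.Dv BIOCVERSION
ioctl succeeds and before
.Dv BIOCSETF
or
.Dv BIOCSETWF .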
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv FIONBIO Fa "int *"
Sets or clears non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enables or disables asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Sets or gets the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.In net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
is a
.Vt "struct bpf_hdr *"
that points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
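.Pp
The packet-to-packet stepping described above can be sketched without
the kernel headers.
The example below is illustrative only:
.Vt struct x_hdr
is a hypothetical stand-in that carries just the length fields of
.Vt struct bpf_hdr
(the real structure begins with a timestamp),
.Dv X_WORDALIGN
imitates
.Dv BPF_WORDALIGN
under the assumption of a 4-byte word, and
.Fn count_records
is a made-up helper name.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed word size; the real value is BPF_ALIGNMENT in <net/bpf.h>. */
#define X_ALIGNMENT	sizeof(uint32_t)
/* Round n up to the next multiple of X_ALIGNMENT. */
#define X_WORDALIGN(n)	(((n) + (X_ALIGNMENT - 1)) & ~(X_ALIGNMENT - 1))

/* Hypothetical stand-in: only the length fields of struct bpf_hdr. */
struct x_hdr {
	uint32_t caplen;	/* captured bytes that follow the header */
	uint32_t datalen;	/* original length on the wire */
	uint16_t hdrlen;	/* header length, including any padding */
};

/*
 * Count the packet records stored in the first len bytes of buf.
 * Each record is a header followed by caplen captured bytes, and the
 * next record starts at the word-aligned offset past it.  The walk
 * stops at the first record that does not fit in the buffer.
 */
size_t
count_records(const unsigned char *buf, size_t len)
{
	struct x_hdr h;
	size_t off = 0, n = 0, step;

	while (len - off >= sizeof(h)) {
		/* Copy the header out so no unaligned access occurs. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.hdrlen < sizeof(h))
			break;
		if ((size_t)h.hdrlen + h.caplen > len - off)
			break;
		n++;
		step = X_WORDALIGN((size_t)h.hdrlen + h.caplen);
		if (step > len - off)
			break;	/* final record, trailing padding absent */
		off += step;
	}
	return n;
}
```
.Ed
.Pp
A real reader would use
.Vt struct bpf_hdr ,
the
.Fa bh_hdrlen
and
.Fa bh_caplen
fields, and
.Dv BPF_WORDALIGN
from
.In net/bpf.h
in place of the stand-ins, and would bound the walk by the byte count
returned from
.Xr read 2 .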
.Ss Filter machine
A filter program is an array of instructions in which all branches are
directed forward, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.In net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] is the packet data, and M[] is the scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + Dv BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + Dv BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + Dv BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + Dv BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
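.Pp
To make the instruction semantics concrete, the following is a small
user-space interpreter for a subset of the machine: absolute and indexed
loads, the
.Dv BPF_MSH
index load, the
.Dv BPF_JEQ
and
.Dv BPF_JSET
jumps with a constant operand, and a constant return.
It is a sketch for study, not the kernel implementation.
All
.Li X_
and
.Li x_
names are hypothetical, and the opcode bit values are assumptions made
for this example; the authoritative definitions are in
.In net/bpf.h .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Opcode bits assumed for this sketch; see <net/bpf.h> for the real ones. */
enum {
	X_LD = 0x00, X_LDX = 0x01, X_JMP = 0x05, X_RET = 0x06,
	X_W = 0x00, X_H = 0x08, X_B = 0x10,
	X_ABS = 0x20, X_IND = 0x40, X_MSH = 0xa0,
	X_JEQ = 0x10, X_JSET = 0x40, X_K = 0x00
};

/* Same field order as struct bpf_insn. */
struct x_insn {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	uint32_t k;
};

/* Array-initializer helpers in the spirit of BPF_STMT and BPF_JUMP. */
#define X_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define X_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* Load n bytes (1, 2 or 4) at offset off, most significant byte first. */
static int
x_load(const uint8_t *p, size_t len, size_t off, size_t n, uint32_t *out)
{
	uint32_t v = 0;
	size_t j;

	if (off > len || n > len - off)
		return 0;
	for (j = 0; j < n; j++)
		v = (v << 8) | p[off + j];
	*out = v;
	return 1;
}

/*
 * Run prog over one packet and return the number of bytes to accept;
 * 0 means reject.  Out-of-range loads and opcodes outside the subset
 * also reject.  The loop increment supplies the implicit pc + 1, so a
 * jump only has to add jt or jf.
 */
uint32_t
x_filter(const struct x_insn *prog, size_t ninsn, const uint8_t *p,
    size_t len)
{
	uint32_t A = 0, X = 0, t;
	size_t pc, n;

	for (pc = 0; pc < ninsn; pc++) {
		const struct x_insn *in = &prog[pc];

		/* Operand size for loads: halfword, byte, or word. */
		n = (in->code & 0x18) == X_H ? 2 :
		    (in->code & 0x18) == X_B ? 1 : 4;
		switch (in->code) {
		case X_LD | X_W | X_ABS:
		case X_LD | X_H | X_ABS:
		case X_LD | X_B | X_ABS:
			if (!x_load(p, len, in->k, n, &A))
				return 0;
			break;
		case X_LD | X_W | X_IND:
		case X_LD | X_H | X_IND:
		case X_LD | X_B | X_IND:
			if (!x_load(p, len, (size_t)X + in->k, n, &A))
				return 0;
			break;
		case X_LDX | X_B | X_MSH:
			if (!x_load(p, len, in->k, 1, &t))
				return 0;
			X = 4 * (t & 0xf);
			break;
		case X_JMP | X_JEQ | X_K:
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case X_JMP | X_JSET | X_K:
			pc += (A & in->k) ? in->jt : in->jf;
			break;
		case X_RET | X_K:
			return in->k;
		default:
			return 0;	/* opcode outside this subset */
		}
	}
	return 0;	/* ran off the end without a return */
}
```
.Ed
.Pp
A program for this interpreter is written as an array of
.Vt struct x_insn
initialized with
.Dv X_STMT
and
.Dv X_JUMP ,
in the same shape as the filters shown in the
.Sx EXAMPLES
section.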
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
.Nm
devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between hosts 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
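.Pp
The hexadecimal constants in the host-pair filter are the two IPv4
addresses written as 32-bit words, and the offsets 26 and 30 are the
source and destination address fields of an IPv4 header that follows a
14-byte Ethernet header.
The helper below sketches how such a constant can be derived;
.Fn ipv4_word
is a hypothetical name, not something provided by
.In net/bpf.h .
.Bd -literal -offset indent
```c
#include <stdint.h>

/*
 * Pack the dotted-quad address a.b.c.d into the 32-bit value that a
 * word load of those four packet bytes yields, with the first byte
 * in the most significant position.
 */
uint32_t
ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}
```
.Ed
.Pp
For the two hosts in the example, the helper reproduces the constants
0x8003700f and 0x80037023 used in the filter.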
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to
.Bx
and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
.An -nosplit
.An Steve McCanne
of Lawrence Berkeley Laboratory implemented BPF in the summer of 1990.
Much of the design is due to
.An Van Jacobson .
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
1060packets.
1061