.\"	$OpenBSD: bpf.4,v 1.24 2005/01/08 00:23:05 jmc Exp $
.\"     $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd May 23, 1991
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
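.Pp
As a rough illustration of the behaviour described above, the following
sketch tries successive device nodes and moves on whenever one fails with
.Er EBUSY .
The function name
.Li open_first_free
and its parameters are hypothetical and not part of
.Nm ;
only
.Xr open 2
and the
.Er EBUSY
convention come from the text above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try prefix0, prefix1, ... up to max_units nodes
 * and return the first descriptor that opens.  A node that is already
 * in use fails with EBUSY, so the search moves on to the next minor;
 * any other error ends the search.  Returns -1 with errno set.
 */
int
open_first_free(const char *prefix, int max_units, int flags)
{
	char path[64];
	int fd, unit;

	errno = ENOENT;		/* reported if no node is tried at all */
	for (unit = 0; unit < max_units; unit++) {
		snprintf(path, sizeof(path), "%s%d", prefix, unit);
		fd = open(path, flags);
		if (fd != -1)
			return fd;
		if (errno != EBUSY)
			break;
	}
	return -1;
}
```

A caller might invoke this as
.Li open_first_free("/dev/bpf", 10, O_RDWR)
and then bind the returned descriptor to an interface with
.Dv BIOCSETIF .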
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and under BSD, can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet, SLIP, and PPP drivers have been modified to
interact with
.Nm bpf .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
ioctl.
.Ss Ioctls
The ioctl command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Bl -tag -width Ds
.It Dv BIOCGBLEN ( Li int )
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN ( Li u_int )
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT ( Li u_int )
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.It Dv BIOCGDLTLIST ( Li "struct bpf_dltlist" )
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.It Dv BIOCSDLT ( Li u_int )
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCGBLEN ,
.Dv BIOCFLUSH ,
.Dv BIOCGDLT ,
.Dv BIOCGETIF ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCGSTATS ,
.Dv BIOCVERSION ,
.Dv BIOCGRSIG ,
.Dv BIOCGHDRCMPLT ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.It Dv BIOCGETIF ( Li "struct ifreq" )
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.It Dv BIOCSETIF ( Li "struct ifreq" )
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT , BIOCGRTIMEOUT ( Li "struct timeval" )
Set or get the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS ( Li "struct bpf_stat" )
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.It Dv BIOCIMMEDIATE ( Li u_int )
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF ( Li "struct bpf_program" )
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCSETWF ( Li "struct bpf_program" )
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.It Dv BIOCVERSION ( Li "struct bpf_version" )
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG , BIOCGRSIG ( Li u_int )
Set or get the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.It Dv BIOCSHDRCMPLT , BIOCGHDRCMPLT ( Li u_int )
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.El
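.Pp
The compatibility rule stated under
.Dv BIOCVERSION
can be expressed as a small predicate.
This is only a sketch: the names
.Li ex_bpf_version
and
.Li ex_version_compatible
are hypothetical, and the structure is a local stand-in that mirrors
.Li struct bpf_version
so that the fragment builds without
.Aq Pa net/bpf.h .

```c
#include <stdbool.h>

/* Local stand-in mirroring struct bpf_version as shown above. */
struct ex_bpf_version {
	unsigned short bv_major;
	unsigned short bv_minor;
};

/*
 * Compatible when the major numbers match and the application minor
 * is less than or equal to the kernel minor.
 */
bool
ex_version_compatible(const struct ex_bpf_version *app,
    const struct ex_bpf_version *kernel)
{
	return app->bv_major == kernel->bv_major &&
	    app->bv_minor <= kernel->bv_minor;
}
```

In a real program the kernel value would presumably be filled in by
.Dv BIOCVERSION
and the application value taken from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION .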
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Bl -tag -width Ds
.It Dv FIONREAD ( Li int )
Returns the number of bytes that are immediately available for reading.
.It Dv SIOCGIFADDR ( Li "struct ifreq" )
Returns the address associated with the interface.
.It Dv FIONBIO ( Li int )
Set or clear non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC ( Li int )
Enable or disable asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN , FIOGETOWN ( Li int )
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion.)
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
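.Pp
The packet-to-packet step above can be sketched as a loop over one buffer
returned by
.Xr read 2 .
Everything prefixed with
.Li ex_
or
.Li EX_
is a hypothetical stand-in defined locally so that the fragment builds
without
.Aq Pa net/bpf.h ;
in particular, the two 32-bit timestamp fields and the 4-byte alignment are
assumptions, and a real program would use
.Li struct bpf_hdr
and
.Dv BPF_WORDALIGN
instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for <net/bpf.h>; timestamp layout and alignment are assumed. */
struct ex_bpf_timeval { uint32_t tv_sec, tv_usec; };
struct ex_bpf_hdr {
	struct ex_bpf_timeval bh_tstamp;
	uint32_t	bh_caplen;
	uint32_t	bh_datalen;
	uint16_t	bh_hdrlen;
};
#define EX_ALIGNMENT	sizeof(uint32_t)
#define EX_WORDALIGN(x)	(((x) + (EX_ALIGNMENT - 1)) & ~(EX_ALIGNMENT - 1))
/* Bytes up to and including bh_hdrlen, without trailing struct padding. */
#define EX_HDR_MIN \
	(offsetof(struct ex_bpf_hdr, bh_hdrlen) + sizeof(uint16_t))

typedef void (*ex_packet_cb)(const unsigned char *, uint32_t, void *);

/* Example callback: add each captured length to the size_t behind arg. */
void
ex_sum_caplen(const unsigned char *data, uint32_t caplen, void *arg)
{
	(void)data;
	*(size_t *)arg += caplen;
}

/* Visit every record in buf[0..len) and return how many were seen. */
size_t
ex_walk_buffer(const unsigned char *buf, size_t len, ex_packet_cb cb,
    void *arg)
{
	size_t off = 0, count = 0;

	while (off + EX_HDR_MIN <= len) {
		struct ex_bpf_hdr hdr;
		size_t left = len - off;

		/* Copy the header out so no alignment is assumed here. */
		memset(&hdr, 0, sizeof(hdr));
		memcpy(&hdr, buf + off, EX_HDR_MIN);
		if (hdr.bh_hdrlen < EX_HDR_MIN || hdr.bh_hdrlen > left ||
		    hdr.bh_caplen > left - hdr.bh_hdrlen)
			break;		/* malformed or cut-off record */
		if (cb != NULL)
			cb(buf + off + hdr.bh_hdrlen, hdr.bh_caplen, arg);
		count++;
		/* Same step as the BPF_WORDALIGN expression above. */
		off += EX_WORDALIGN((size_t)hdr.bh_hdrlen + hdr.bh_caplen);
	}
	return count;
}
```

The length check uses the header size without trailing padding because
.Fa bh_hdrlen
may differ from
.Li sizeof(struct bpf_hdr) ,
as noted above.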
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char		jt;
	u_char		jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
560.Bl -tag -width Ds
561.It Dv BPF_LD
562These instructions copy a value into the accumulator.
563The type of the source operand is specified by an
564.Dq addressing mode
565and can be a constant
566.Pf ( Dv BPF_IMM ) ,
567packet data at a fixed offset
568.Pf ( Dv BPF_ABS ) ,
569packet data at a variable offset
570.Pf ( Dv BPF_IND ) ,
571the packet length
572.Pf ( Dv BPF_LEN ) ,
573or a word in the scratch memory store
574.Pf ( Dv BPF_MEM ) .
575For
576.Dv BPF_IND
577and
578.Dv BPF_ABS ,
579the data size must be specified as a word
580.Pf ( Dv BPF_W ) ,
581halfword
582.Pf ( Dv BPF_H ) ,
583or byte
584.Pf ( Dv BPF_B ) .
585The semantics of all recognized
586.Dv BPF_LD
587instructions follow.
588.Pp
589.Bl -tag -width 32n -compact
590.Sm off
591.It Xo Dv BPF_LD No + Dv BPF_W No +
592.Dv BPF_ABS
593.Xc
594.Sm on
595A <- P[k:4]
596.Sm off
597.It Xo Dv BPF_LD No + Dv BPF_H No +
598.Dv BPF_ABS
599.Xc
600.Sm on
601A <- P[k:2]
602.Sm off
603.It Xo Dv BPF_LD No + Dv BPF_B No +
604.Dv BPF_ABS
605.Xc
606.Sm on
607A <- P[k:1]
608.Sm off
609.It Xo Dv BPF_LD No + Dv BPF_W No +
610.Dv BPF_IND
611.Xc
612.Sm on
613A <- P[X+k:4]
614.Sm off
615.It Xo Dv BPF_LD No + Dv BPF_H No +
616.Dv BPF_IND
617.Xc
618.Sm on
619A <- P[X+k:2]
620.Sm off
621.It Xo Dv BPF_LD No + Dv BPF_B No +
622.Dv BPF_IND
623.Xc
624.Sm on
625A <- P[X+k:1]
626.Sm off
627.It Xo Dv BPF_LD No + Dv BPF_W No +
628.Dv BPF_LEN
629.Xc
630.Sm on
631A <- len
632.Sm off
633.It Dv BPF_LD No + Dv BPF_IMM
634.Sm on
635A <- k
636.Sm off
637.It Dv BPF_LD No + Dv BPF_MEM
638.Sm on
639A <- M[k]
640.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
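.Pp
To make the semantics listed above concrete, the following user-space sketch
interprets a small subset of the instruction set.
It is not the kernel implementation: the function
.Li ex_filter
and the helper
.Li pkt_load
are hypothetical, only the handful of opcodes used by simple Ethernet filters
is handled, and the constants, the structure and the two initializer macros
are defined locally with values assumed to match
.Aq Pa net/bpf.h
(so the fragment should not be combined with that header).
Rejecting a packet when a load runs past its end is likewise an assumption of
this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies; values are assumed to match <net/bpf.h>. */
#define BPF_LD		0x00
#define BPF_LDX		0x01
#define BPF_JMP		0x05
#define BPF_RET		0x06
#define BPF_W		0x00
#define BPF_H		0x08
#define BPF_B		0x10
#define BPF_ABS		0x20
#define BPF_IND		0x40
#define BPF_MSH		0xa0
#define BPF_JEQ		0x10
#define BPF_JSET	0x40
#define BPF_K		0x00

struct bpf_insn {
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};
#define BPF_STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define BPF_JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* P[i:n]: n bytes at offset i in network order; clears *ok if out of range. */
static uint32_t
pkt_load(const uint8_t *p, size_t len, size_t i, size_t n, int *ok)
{
	uint32_t v = 0;
	size_t j;

	if (i > len || n > len - i) {
		*ok = 0;
		return 0;
	}
	for (j = 0; j < n; j++)
		v = (v << 8) | p[i + j];
	return v;
}

/*
 * Run a filter over a packet and return the number of bytes to accept,
 * or 0 to reject.  The program is trusted to end in a return instruction.
 */
uint32_t
ex_filter(const struct bpf_insn *pc, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0;
	int ok = 1;

	for (;; pc++) {		/* pc++ makes offsets relative to the next insn */
		switch (pc->code) {
		case BPF_RET+BPF_K:
			return pc->k;
		case BPF_LD+BPF_W+BPF_ABS:
			A = pkt_load(p, len, pc->k, 4, &ok);
			break;
		case BPF_LD+BPF_H+BPF_ABS:
			A = pkt_load(p, len, pc->k, 2, &ok);
			break;
		case BPF_LD+BPF_B+BPF_ABS:
			A = pkt_load(p, len, pc->k, 1, &ok);
			break;
		case BPF_LD+BPF_H+BPF_IND:
			A = pkt_load(p, len, (size_t)X + pc->k, 2, &ok);
			break;
		case BPF_LDX+BPF_B+BPF_MSH:
			X = 4 * (pkt_load(p, len, pc->k, 1, &ok) & 0xf);
			break;
		case BPF_JMP+BPF_JEQ+BPF_K:
			pc += (A == pc->k) ? pc->jt : pc->jf;
			break;
		case BPF_JMP+BPF_JSET+BPF_K:
			pc += (A & pc->k) ? pc->jt : pc->jf;
			break;
		default:
			return 0;	/* opcode outside this subset */
		}
		if (!ok)
			return 0;	/* load past the end of the packet */
	}
}
```

An array built with
.Dv BPF_STMT
and
.Dv BPF_JUMP
can be passed directly as the first argument, which makes it possible to try
a filter against a captured packet before installing it.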
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
BPF devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
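.Pp
In the filter above, jump offsets count from the instruction that follows
the jump, so the false offset 3 in the second instruction lands on the final
.Dv BPF_RET .
The sketch below computes jump targets that way and checks that every jump
stays inside the program and that the last instruction is a return.
All names prefixed with
.Li ex_
or
.Li EX_
are hypothetical stand-ins with values assumed to match
.Aq Pa net/bpf.h ,
and the check is only an illustration, not the validation performed by the
kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct bpf_insn. */
struct ex_insn {
	uint16_t	code;
	uint8_t		jt;
	uint8_t		jf;
	uint32_t	k;
};

/* Class and operation bits; values assumed to match <net/bpf.h>. */
#define EX_CLASS(code)	((code) & 0x07)
#define EX_OP(code)	((code) & 0xf0)
#define EX_JMP		0x05
#define EX_RET		0x06
#define EX_JA		0x00

/* Index reached from the jump at index i with offset off. */
size_t
ex_jump_target(size_t i, uint32_t off)
{
	return i + 1 + off;
}

/* True if a jump of off from index i (i < n) stays inside the program. */
static bool
ex_in_range(size_t i, uint32_t off, size_t n)
{
	return off < n - i - 1;
}

/*
 * Offsets are unsigned, so every branch is forward by construction;
 * what remains is to check the bounds and the terminating return.
 */
bool
ex_program_ok(const struct ex_insn *prog, size_t n)
{
	size_t i;

	if (n == 0 || EX_CLASS(prog[n - 1].code) != EX_RET)
		return false;
	for (i = 0; i < n; i++) {
		if (EX_CLASS(prog[i].code) != EX_JMP)
			continue;
		if (EX_OP(prog[i].code) == EX_JA) {
			if (!ex_in_range(i, prog[i].k, n))
				return false;
		} else if (!ex_in_range(i, prog[i].jt, n) ||
		    !ex_in_range(i, prog[i].jf, n))
			return false;
	}
	return true;
}
```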
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
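.Pp
The constants 0x8003700f and 0x80037023 in the filter above are the two host
addresses written as 32-bit words, and the offsets 26 and 30 are the source
and destination address fields of the IP header behind a 14-byte Ethernet
header.
The helper below, whose name
.Li ex_ipv4_word
is hypothetical, shows how such a constant can be derived from a dotted quad.

```c
#include <stdint.h>

/*
 * Pack the dotted quad a.b.c.d into the 32-bit value that a BPF_W load
 * of the address field leaves in the accumulator.
 */
uint32_t
ex_ipv4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}
```

For instance,
.Li ex_ipv4_word(128, 3, 112, 15)
yields 0x8003700f.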
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
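.Pp
The arithmetic behind the finger filter can be spelled out in a few lines.
The helpers below are hypothetical and exist only to show how
.Dv BPF_MSH
turns the first byte of the IP header into a header length, which offsets the
two
.Dv BPF_IND
loads reach as a result, and what the
.Dv BPF_JSET
mask tests; a 14-byte Ethernet header is assumed, as in the filter.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ETHER_HDR_LEN	14	/* assumed link header length */

/* X <- 4*(P[k:1]&0xf): IP header length in bytes from its first byte. */
uint32_t
ex_msh(uint8_t vhl)
{
	return 4u * (vhl & 0xf);
}

/* Packet offset read by BPF_LD+BPF_H+BPF_IND with k = 14 (source port). */
size_t
ex_tcp_sport_off(uint8_t vhl)
{
	return (size_t)ex_msh(vhl) + EX_ETHER_HDR_LEN;
}

/* Packet offset read with k = 16 (destination port). */
size_t
ex_tcp_dport_off(uint8_t vhl)
{
	return (size_t)ex_msh(vhl) + EX_ETHER_HDR_LEN + 2;
}

/*
 * Nonzero when the low 13 bits of the halfword at offset 20, the
 * fragment offset, are not all zero, i.e. no TCP header follows.
 */
int
ex_is_later_fragment(uint16_t frag_field)
{
	return (frag_field & 0x1fff) != 0;
}
```

With the common first byte 0x45 the IP header is 20 bytes long, so the two
ports are read from packet offsets 34 and 36.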
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
.Pp
Data link protocols with variable length headers are not currently supported.