.\" $OpenBSD: bpf.4,v 1.24 2005/01/08 00:23:05 jmc Exp $
.\" $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd May 23, 1991
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and under BSD, can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet, SLIP, and PPP drivers have been modified to
interact with
.Nm bpf .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
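.Pp
As a rough illustration of the byte-order advice above, the sketch below
pulls two multi-byte fields out of a captured frame and converts them to
host order.
It is not part of
.Nm ;
the helper names
.Fn frame_ethertype
and
.Fn frame_ipv4_src
are hypothetical, and the offsets (12 and 26) assume an untagged Ethernet
frame carrying IPv4, as in the filters under
.Sx EXAMPLES .
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: Ethernet type field in host order.
 * The caller must supply at least 14 bytes of frame data.
 */
static uint16_t
frame_ethertype(const unsigned char *frame)
{
	uint16_t v;

	/* Copy the bytes out instead of casting the pointer. */
	memcpy(&v, frame + 12, sizeof(v));
	return ntohs(v);
}

/*
 * Hypothetical helper: IPv4 source address in host order.
 * The caller must supply at least 30 bytes of frame data.
 */
static uint32_t
frame_ipv4_src(const unsigned char *frame)
{
	uint32_t v;

	memcpy(&v, frame + 26, sizeof(v));
	return ntohl(v);
}
```
.Ed
.Pp
The values are copied out with
.Xr memcpy 3
rather than read through a cast pointer, since the link level header does
not necessarily leave these fields naturally aligned.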
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
ioctl.
.Ss Ioctls
The ioctl command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Bl -tag -width Ds
.It Dv BIOCGBLEN ( Li int )
Returns the required buffer length for reads on
.Nm
files.
.It Dv BIOCSBLEN ( Li u_int )
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT ( Li u_int )
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.It Dv BIOCGDLTLIST ( Li "struct bpf_dltlist" )
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int	bfl_len;
	u_int	*bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.It Dv BIOCSDLT ( Li u_int )
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCGBLEN ,
.Dv BIOCFLUSH ,
.Dv BIOCGDLT ,
.Dv BIOCGETIF ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCGSTATS ,
.Dv BIOCVERSION ,
.Dv BIOCGRSIG ,
.Dv BIOCGHDRCMPLT ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.It Dv BIOCGETIF ( Li "struct ifreq" )
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.It Dv BIOCSETIF ( Li "struct ifreq" )
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT , BIOCGRTIMEOUT ( Li "struct timeval" )
Set or get the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS ( Li "struct bpf_stat" )
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int	bs_recv;
	u_int	bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.It Dv BIOCIMMEDIATE ( Li u_int )
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF ( Li "struct bpf_program" )
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	int	bf_len;
	struct bpf_insn	*bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCSETWF ( Li "struct bpf_program" )
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.It Dv BIOCVERSION ( Li "struct bpf_version" )
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short	bv_major;
	u_short	bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG , BIOCGRSIG ( Li u_int )
Set or get the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.It Dv BIOCSHDRCMPLT , BIOCGHDRCMPLT ( Li u_int )
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.El
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Bl -tag -width Ds
.It Dv FIONREAD ( Li int )
Returns the number of bytes that are immediately available for reading.
.It Dv SIOCGIFADDR ( Li "struct ifreq" )
Returns the address associated with the interface.
.It Dv FIONBIO ( Li int )
Set or clear non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC ( Li int )
Enable or disable asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN , FIOGETOWN ( Li int )
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval	bh_tstamp;
	u_int32_t	bh_caplen;
	u_int32_t	bh_datalen;
	u_int16_t	bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
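.Pp
The packet-to-packet stepping described above might be sketched as follows.
So that it compiles without
.Aq Pa net/bpf.h ,
the sketch uses hypothetical stand-ins:
.Vt struct demo_hdr
mirrors only the fields of
.Vt struct bpf_hdr
listed above,
.Dv DEMO_WORDALIGN
assumes a 4-byte word in place of
.Dv BPF_ALIGNMENT ,
and
.Fn count_packets
is an illustrative name.
Real code would use the definitions from the header instead.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for illustration only (assumed 4-byte word). */
#define DEMO_ALIGNMENT sizeof(uint32_t)
#define DEMO_WORDALIGN(x) (((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

/* Hypothetical mirror of the bpf_hdr fields described in the text. */
struct demo_hdr {
	uint32_t tv_sec;	/* timestamp, seconds */
	uint32_t tv_usec;	/* timestamp, microseconds */
	uint32_t caplen;	/* captured length */
	uint32_t datalen;	/* length on the wire */
	uint16_t hdrlen;	/* header length including padding */
};

/*
 * Walk the nread bytes returned by one read and count the packets.
 * The sum of the captured lengths is stored in *captured.
 */
static size_t
count_packets(const unsigned char *buf, size_t nread, size_t *captured)
{
	size_t off = 0, n = 0;

	assert(buf != NULL && captured != NULL);
	*captured = 0;
	while (off + sizeof(struct demo_hdr) <= nread) {
		struct demo_hdr h;

		memcpy(&h, buf + off, sizeof(h));
		/* Stop on a record that is malformed or cut short. */
		if (h.hdrlen == 0 ||
		    (size_t)h.hdrlen + h.caplen > nread - off)
			break;
		/* Packet data begins at buf + off + h.hdrlen. */
		*captured += h.caplen;
		n++;
		/* Advance to the next word aligned record. */
		off += DEMO_WORDALIGN((size_t)h.hdrlen + h.caplen);
	}
	return n;
}
```
.Ed
.Pp
A caller would pass the buffer filled by
.Xr read 2
together with the byte count that the read returned.
The bounds checks are a defensive choice of this sketch, not something the
interface requires.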
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t	code;
	u_char	jt;
	u_char	jf;
	u_int32_t	k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
BPF devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%T "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
.Pp
Data link protocols with variable length headers are not currently supported.