.\" $OpenBSD: bpf.4,v 1.27 2005/11/03 20:00:18 reyk Exp $
.\" $NetBSD: bpf.4,v 1.7 1995/09/27 18:31:50 thorpej Exp $
.\"
.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd May 23, 1991
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter"
.Sh DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link layers in
a protocol-independent fashion.
All packets on the network, even those destined for other hosts, are
accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a specific
network interface with the
.Dv BIOCSETIF
.Xr ioctl 2 .
A given interface can be shared between multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
The number of open files can be increased by creating additional
device nodes with the
.Xr MAKEDEV 8
script.
.Pp
Associated with each open instance of a
.Nm
file is a user-settable
packet filter.
Whenever a packet is received by an interface, all file descriptors
listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets that have matched
the filter.
To improve performance, the buffer passed to read must be the same size as
the buffers used internally by
.Nm bpf .
This size is returned by the
.Dv BIOCGBLEN
.Xr ioctl 2
and can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers.
Currently, only Ethernet, SLIP, and PPP drivers have been modified to
interact with
.Nm bpf .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor.
Each descriptor can also have a user-settable filter
for controlling the writes.
Only packets matching the filter are sent out of the interface.
The writes are unbuffered, meaning only one packet can be processed per write.
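.Pp
Because each minor device can be held open by only one process at a time,
programs commonly probe the device nodes in order until one opens.
The following is a minimal sketch of that pattern, not part of the
.Nm
interface: the function name
.Fn open_first_free
and its parameters are hypothetical, and a caller would be expected to pass a
prefix such as
.Pa /dev/bpf .
.Bd -literal -offset indent
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Try prefix0, prefix1, ... for at most max_units devices and return
 * the first descriptor that opens, or -1 with errno set.  EBUSY means
 * that minor device is already in use, so the next one is tried; any
 * other error (for example ENOENT when no further node exists) ends
 * the search.  The name and parameters are illustrative only.
 */
int
open_first_free(const char *prefix, int max_units)
{
	char path[64];
	int fd, n, unit;

	for (unit = 0; unit < max_units; unit++) {
		n = snprintf(path, sizeof(path), "%s%d", prefix, unit);
		if (n < 0 || (size_t)n >= sizeof(path)) {
			errno = ENAMETOOLONG;
			return -1;
		}
		fd = open(path, O_RDWR);
		if (fd != -1)
			return fd;
		if (errno != EBUSY)
			return -1;
	}
	/* Every unit probed was busy, or no units were requested. */
	errno = EBUSY;
	return -1;
}
```
.Ed
.Pp
A descriptor obtained this way is not yet attached to an interface; it still
has to be bound with
.Dv BIOCSETIF
before packets can be read.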
.Pp
Once a descriptor is configured, further changes to the configuration
can be prevented using the
.Dv BIOCLOCK
.Xr ioctl 2 .
.Sh IOCTL INTERFACE
The
.Xr ioctl 2
command codes below are defined in
.Aq Pa net/bpf.h .
All commands require these includes:
.Bd -unfilled -offset indent
.Cd #include <sys/types.h>
.Cd #include <sys/time.h>
.Cd #include <sys/ioctl.h>
.Cd #include <net/bpf.h>
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Aq Pa sys/socket.h
and
.Aq Pa net/if.h .
.Pp
The (third) argument to the
.Xr ioctl 2
call should be a pointer to the type indicated.
.Pp
.Bl -tag -width Ds -compact
.It Dv BIOCGBLEN Fa "u_int *"
Returns the required buffer length for reads on
.Nm
files.
.Pp
.It Dv BIOCSBLEN Fa "u_int *"
Sets the buffer length for reads on
.Nm
files.
The buffer must be set before the file is attached to an interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest allowable
size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.Pp
.It Dv BIOCGDLT Fa "u_int *"
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in
.Aq Pa net/bpf.h .
.Pp
.It Dv BIOCGDLTLIST Fa "struct bpf_dltlist *"
Returns an array of the available types of the data link layer
underlying the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
	u_int bfl_len;
	u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field while their length in
.Vt u_int
is supplied to the
.Va bfl_len
field.
.Er ENOMEM
is returned if there is not enough buffer space and
.Er EFAULT
is returned if a bad address is encountered.
The
.Va bfl_len
field is modified on return to indicate the actual length in
.Vt u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is set to indicate the required length of the array in
.Vt u_int .
.Pp
.It Dv BIOCSDLT Fa "u_int *"
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.Pp
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface, a listener
that opened its interface non-promiscuously may receive packets promiscuously.
This problem can be remedied with an appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.Pp
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets and resets the statistics that are
returned by
.Dv BIOCGSTATS .
.Pp
.It Dv BIOCLOCK
This ioctl is designed to prevent the security issues associated
with an open
.Nm
descriptor in unprivileged programs.
Even with dropped privileges, an open
.Nm
descriptor can be abused by a rogue program to listen on any interface
on the system, send packets on these interfaces if the descriptor was
opened read-write and send signals to arbitrary processes using the
signaling mechanism of
.Nm bpf .
By allowing only
.Dq known safe
ioctls, the
.Dv BIOCLOCK
ioctl prevents this abuse.
The allowable ioctls are
.Dv BIOCGBLEN ,
.Dv BIOCFLUSH ,
.Dv BIOCGDLT ,
.Dv BIOCGETIF ,
.Dv BIOCGRTIMEOUT ,
.Dv BIOCSRTIMEOUT ,
.Dv BIOCIMMEDIATE ,
.Dv BIOCGSTATS ,
.Dv BIOCVERSION ,
.Dv BIOCGRSIG ,
.Dv BIOCGHDRCMPLT ,
.Dv TIOCGPGRP ,
and
.Dv FIONREAD .
Use of any other ioctl is denied with error
.Er EPERM .
Once a descriptor is locked, it is not possible to unlock it.
A process with root privileges is not affected by the lock.
.Pp
A privileged program can open a
.Nm
device, drop privileges, set the interface, filters and modes on the
descriptor, and lock it.
Once the descriptor is locked, the system is safe
from further abuse through the descriptor.
Locking a descriptor does not prevent writes.
If the application does not need to send packets through
.Nm bpf ,
it can open the device read-only to prevent writing.
If sending packets is necessary, a write-filter can be set before locking the
descriptor to prevent arbitrary packets from being sent out.
.Pp
.It Dv BIOCGETIF Fa "struct ifreq *"
Returns the name of the hardware interface that the file is listening on.
The name is returned in the
.Fa ifr_name
field of the
.Li struct ifreq .
All other fields are undefined.
.Pp
.It Dv BIOCSETIF Fa "struct ifreq *"
Sets the hardware interface associated with the file.
This command must be performed before any packets can be read.
The device is indicated by name using the
.Fa ifr_name
field of the
.Li struct ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.Pp
.It Dv BIOCSRTIMEOUT Fa "struct timeval *"
.It Dv BIOCGRTIMEOUT Fa "struct timeval *"
Set or get the read timeout parameter.
The
.Ar timeval
specifies the length of time to wait before timing out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.Pp
.It Dv BIOCGSTATS Fa "struct bpf_stat *"
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
	u_int bs_recv;
	u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv
.It Fa bs_recv
Number of packets received by the descriptor since opened or reset (including
any buffered since the last read call).
.It Fa bs_drop
Number of packets which were accepted by the filter but dropped by the kernel
because of buffer overflows (i.e., the application's reads aren't keeping up
with the packet traffic).
.El
.Pp
.It Dv BIOCIMMEDIATE Fa "u_int *"
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet reception.
Otherwise, a read will block until either the kernel buffer becomes full or a
timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.Pp
.It Dv BIOCSETF Fa "struct bpf_program *"
Sets the filter program used by the kernel to discard uninteresting packets.
An array of instructions and its length are passed in using the following
structure:
.Bd -literal -offset indent
struct bpf_program {
	int bf_len;
	struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Fa bf_insns
field, while its length in units of
.Li struct bpf_insn
is given by the
.Fa bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sx FILTER MACHINE
for an explanation of the filter language.
.Pp
.It Dv BIOCSETWF Fa "struct bpf_program *"
Sets the filter program used by the kernel to filter the packets
written to the descriptor before the packets are sent out on the
network.
See
.Dv BIOCSETF
for a description of the filter program.
This ioctl also acts as
.Dv BIOCFLUSH .
.Pp
Note that the filter operates on the packet data written to the descriptor.
If the
.Dq header complete
flag is not set, the kernel sets the link-layer source address
of the packet after filtering.
.Pp
.It Dv BIOCVERSION Fa "struct bpf_version *"
Returns the major and minor version numbers of the filter language currently
recognized by the kernel.
Before installing a filter, applications must check that the current version
is compatible with the running kernel.
Version numbers are compatible if the major numbers match and the application
minor is less than or equal to the kernel minor.
The kernel version number is returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
	u_short bv_major;
	u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from
.Aq Pa net/bpf.h .
An incompatible filter may result in undefined behavior (most likely, an
error returned by
.Xr ioctl 2
or haphazard packet matching).
.Pp
.It Dv BIOCSRSIG Fa "u_int *"
.It Dv BIOCGRSIG Fa "u_int *"
Set or get the receive signal.
This signal will be sent to the process or process group specified by
.Dv FIOSETOWN .
It defaults to
.Dv SIGIO .
.Pp
.It Dv BIOCSHDRCMPLT Fa "u_int *"
.It Dv BIOCGHDRCMPLT Fa "u_int *"
Set or get the status of the
.Dq header complete
flag.
Set to zero if the link level source address should be filled in
automatically by the interface output routine.
Set to one if the link level source address will be written,
as provided, to the wire.
This flag is initialized to zero by default.
.Pp
.It Dv BIOCGFILDROP Fa "u_int *"
.It Dv BIOCSFILDROP Fa "u_int *"
Get or set the status of the
.Dq filter drop
flag.
If non-zero, packets matching any filters will be reported to the
associated interface so that they can be dropped.
.El
.Ss Standard ioctls
.Nm
now supports several standard ioctls which allow the user to do asynchronous
and/or non-blocking I/O to an open
.Nm
file descriptor.
.Pp
.Bl -tag -width Ds -compact
.It Dv FIONREAD Fa "int *"
Returns the number of bytes that are immediately available for reading.
.Pp
.It Dv SIOCGIFADDR Fa "struct ifreq *"
Returns the address associated with the interface.
.Pp
.It Dv FIONBIO Fa "int *"
Set or clear non-blocking I/O.
If the argument is non-zero, enable non-blocking I/O.
If the argument is zero, disable non-blocking I/O.
If non-blocking I/O is enabled, the return value of a read while no data
is available will be 0.
The non-blocking read behavior is different from performing non-blocking
reads on other file descriptors, which will return \-1 and set
.Va errno
to
.Er EAGAIN
if no data is available.
Note: setting this overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.Pp
.It Dv FIOASYNC Fa "int *"
Enable or disable asynchronous I/O.
When enabled (argument is non-zero), the process or process group specified
by
.Dv FIOSETOWN
will start receiving
.Dv SIGIO
signals when packets arrive.
Note that you must perform an
.Dv FIOSETOWN
command in order for this to take effect, as the system will not do it by
default.
The signal may be changed via
.Dv BIOCSRSIG .
.Pp
.It Dv FIOSETOWN Fa "int *"
.It Dv FIOGETOWN Fa "int *"
Set or get the process or process group (if negative) that should receive
.Dv SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
.Ss BPF header
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
	struct bpf_timeval bh_tstamp;
	u_int32_t bh_caplen;
	u_int32_t bh_datalen;
	u_int16_t bh_hdrlen;
};
.Ed
.Pp
The fields, stored in host order, are as follows:
.Bl -tag -width Ds
.It Fa bh_tstamp
Time at which the packet was processed by the packet filter.
.It Fa bh_caplen
Length of the captured portion of the packet.
This is the minimum of the truncation amount specified by the filter and the
length of the packet.
.It Fa bh_datalen
Length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Fa bh_hdrlen
Length of the BPF header, which may not be equal to
.Li sizeof(struct bpf_hdr) .
.El
.Pp
The
.Fa bh_hdrlen
field exists to account for padding between the header and the link level
protocol.
The purpose here is to guarantee proper alignment of the packet data
structures, which is required on alignment-sensitive architectures and
improves performance on many other architectures.
The packet filter ensures that the
.Fa bpf_hdr
and the network layer header will be word aligned.
Suitable precautions must be taken when accessing the link layer protocol
fields on alignment restricted machines.
(This isn't a problem on an Ethernet, since the type field is a
.Li short
falling on an even offset, and the addresses are probably accessed in a
bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts on a
word boundary.
This requires that an application has some knowledge of how to get from packet
to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Aq Pa net/bpf.h
to facilitate this process.
It rounds up its argument to the nearest word aligned value (where a word is
.Dv BPF_ALIGNMENT
bytes wide).
For example, if
.Va p
points to the start of a packet, this expression will advance it to the
next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen);
.Pp
For the alignment mechanisms to work properly, the buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
.Ss Filter machine
A filter program is an array of instructions with all branches forwardly
directed, terminated by a
.Dq return
instruction.
Each instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory store, and
implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
	u_int16_t code;
	u_char jt;
	u_char jf;
	u_int32_t k;
};
.Ed
.Pp
The
.Fa k
field is used in different ways by different instructions, and the
.Fa jt
and
.Fa jf
fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions:
.Dv BPF_LD ,
.Dv BPF_LDX ,
.Dv BPF_ST ,
.Dv BPF_STX ,
.Dv BPF_ALU ,
.Dv BPF_JMP ,
.Dv BPF_RET ,
and
.Dv BPF_MISC .
Various other mode and operator bits are logically OR'd into the class to
give the actual instructions.
The classes and modes are defined in
.Aq Pa net/bpf.h .
Below are the semantics for each defined
.Nm
instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only addressed
in word units.
The memory store is indexed from 0 to
.Dv BPF_MEMWORDS Ns \-1 .
.Fa k ,
.Fa jt ,
and
.Fa jf
are the corresponding fields in the instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width Ds
.It Dv BPF_LD
These instructions copy a value into the accumulator.
The type of the source operand is specified by an
.Dq addressing mode
and can be a constant
.Pf ( Dv BPF_IMM ) ,
packet data at a fixed offset
.Pf ( Dv BPF_ABS ) ,
packet data at a variable offset
.Pf ( Dv BPF_IND ) ,
the packet length
.Pf ( Dv BPF_LEN ) ,
or a word in the scratch memory store
.Pf ( Dv BPF_MEM ) .
For
.Dv BPF_IND
and
.Dv BPF_ABS ,
the data size must be specified as a word
.Pf ( Dv BPF_W ) ,
halfword
.Pf ( Dv BPF_H ) ,
or byte
.Pf ( Dv BPF_B ) .
The semantics of all recognized
.Dv BPF_LD
instructions follow.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_ABS
.Xc
.Sm on
A <- P[k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:4]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_H No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:2]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_B No +
.Dv BPF_IND
.Xc
.Sm on
A <- P[X+k:1]
.Sm off
.It Xo Dv BPF_LD No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
A <- len
.Sm off
.It Dv BPF_LD No + Dv BPF_IMM
.Sm on
A <- k
.Sm off
.It Dv BPF_LD No + Dv BPF_MEM
.Sm on
A <- M[k]
.El
.It Dv BPF_LDX
These instructions load a value into the index register.
Note that the addressing modes are more restricted than those of the
accumulator loads, but they include
.Dv BPF_MSH ,
a hack for efficiently loading the IP header length.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_IMM
.Xc
.Sm on
X <- k
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_MEM
.Xc
.Sm on
X <- M[k]
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_W No +
.Dv BPF_LEN
.Xc
.Sm on
X <- len
.Sm off
.It Xo Dv BPF_LDX No + Dv BPF_B No +
.Dv BPF_MSH
.Xc
.Sm on
X <- 4*(P[k:1]&0xf)
.El
.It Dv BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility for
the destination.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_ST
M[k] <- A
.El
.It Dv BPF_STX
This instruction stores the index register in the scratch memory store.
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_STX
M[k] <- X
.El
.It Dv BPF_ALU
The ALU instructions perform operations between the accumulator and index
register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Pf ( Dv BPF_K
or
.Dv BPF_X ) .
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_K
.Xc
.Sm on
A <- A + k
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_K
.Xc
.Sm on
A <- A - k
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_K
.Xc
.Sm on
A <- A * k
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_K
.Xc
.Sm on
A <- A / k
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_K
.Xc
.Sm on
A <- A & k
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_K
.Xc
.Sm on
A <- A | k
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A << k
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_K
.Xc
.Sm on
A <- A >> k
.Sm off
.It Xo Dv BPF_ALU No + BPF_ADD No +
.Dv BPF_X
.Xc
.Sm on
A <- A + X
.Sm off
.It Xo Dv BPF_ALU No + BPF_SUB No +
.Dv BPF_X
.Xc
.Sm on
A <- A - X
.Sm off
.It Xo Dv BPF_ALU No + BPF_MUL No +
.Dv BPF_X
.Xc
.Sm on
A <- A * X
.Sm off
.It Xo Dv BPF_ALU No + BPF_DIV No +
.Dv BPF_X
.Xc
.Sm on
A <- A / X
.Sm off
.It Xo Dv BPF_ALU No + BPF_AND No +
.Dv BPF_X
.Xc
.Sm on
A <- A & X
.Sm off
.It Xo Dv BPF_ALU No + BPF_OR No +
.Dv BPF_X
.Xc
.Sm on
A <- A | X
.Sm off
.It Xo Dv BPF_ALU No + BPF_LSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A << X
.Sm off
.It Xo Dv BPF_ALU No + BPF_RSH No +
.Dv BPF_X
.Xc
.Sm on
A <- A >> X
.Sm off
.It Dv BPF_ALU No + BPF_NEG
.Sm on
A <- -A
.El
.It Dv BPF_JMP
The jump instructions alter flow of control.
Conditional jumps compare the accumulator against a constant
.Pf ( Dv BPF_K )
or the index register
.Pf ( Dv BPF_X ) .
If the result is true (or non-zero), the true branch is taken, otherwise the
false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Pf ( Dv BPF_JA )
opcode uses the 32-bit
.Fa k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_JMP No + BPF_JA
.Sm on
pc += k
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_K
.Xc
.Sm on
pc += (A > k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_K
.Xc
.Sm on
pc += (A >= k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_K
.Xc
.Sm on
pc += (A == k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_K
.Xc
.Sm on
pc += (A & k) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGT No +
.Dv BPF_X
.Xc
.Sm on
pc += (A > X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JGE No +
.Dv BPF_X
.Xc
.Sm on
pc += (A >= X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JEQ No +
.Dv BPF_X
.Xc
.Sm on
pc += (A == X) ? jt : jf
.Sm off
.It Xo Dv BPF_JMP No + BPF_JSET No +
.Dv BPF_X
.Xc
.Sm on
pc += (A & X) ? jt : jf
.El
.It Dv BPF_RET
The return instructions terminate the filter program and specify the
amount of packet to accept (i.e., they return the truncation amount)
or, for the write filter, the maximum acceptable size for the packet
(i.e., the packet is dropped if it is larger than the returned
amount).
A return value of zero indicates that the packet should be ignored/dropped.
The return value is either a constant
.Pf ( Dv BPF_K )
or the accumulator
.Pf ( Dv BPF_A ) .
.Pp
.Bl -tag -width 32n -compact
.It Dv BPF_RET No + Dv BPF_A
Accept A bytes.
.It Dv BPF_RET No + Dv BPF_K
Accept k bytes.
.El
.It Dv BPF_MISC
The miscellaneous category was created for anything that doesn't fit into
the above classes, and for any new instructions that might need to be added.
Currently, these are the register transfer instructions that copy the index
register to the accumulator or vice versa.
.Pp
.Bl -tag -width 32n -compact
.Sm off
.It Dv BPF_MISC No + Dv BPF_TAX
.Sm on
X <- A
.Sm off
.It Dv BPF_MISC No + Dv BPF_TXA
.Sm on
A <- X
.El
.El
.Pp
The
.Nm
interface provides the following macros to facilitate array initializers:
.Bd -filled -offset indent
.Dv BPF_STMT ( Ns Ar opcode ,
.Ar operand )
.Pp
.Dv BPF_JUMP ( Ns Ar opcode ,
.Ar operand ,
.Ar true_offset ,
.Ar false_offset )
.Ed
.Sh FILES
.Bl -tag -width /dev/bpf[0-9] -compact
.It Pa /dev/bpf[0-9]
.Nm
devices
.El
.Sh EXAMPLES
The following filter is taken from the Reverse ARP daemon.
It accepts only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
	    sizeof(struct ether_header)),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Pp
Finally, this filter returns only TCP finger packets.
We must parse the IP header to reach the TCP header.
The
.Dv BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure that we
have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
	BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
	BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
	BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
	BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
	BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
	BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr MAKEDEV 8 ,
.Xr tcpdump 8
.Rs
.%A McCanne, S.
.%A Jacobson, V.
.%J "An efficient, extensible, and portable network monitor"
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and Rick Rashid
at Carnegie-Mellon University.
Jeffrey Mogul, at Stanford, ported the code to BSD and continued its
development from 1983 on.
Since then, it has evolved into the Ultrix Packet Filter at DEC, a STREAMS
NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steve McCanne of Lawrence Berkeley Laboratory implemented BPF in Summer 1990.
Much of the design is due to Van Jacobson.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this mode on
the same hardware interface.
This could be fixed in the kernel with additional processing overhead.
However, we favor the model where all files must assume that the interface
is promiscuous, and if so desired, must utilize a filter to reject foreign
packets.
.Pp
Data link protocols with variable length headers are not currently supported.