.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: src/share/man/man4/multicast.4,v 1.4 2004/07/09 09:22:36 ru Exp $
.\" $OpenBSD: multicast.4,v 1.15 2019/03/10 21:31:49 jmc Exp $
.\" $NetBSD: multicast.4,v 1.3 2004/09/12 13:12:26 wiz Exp $
.\"
.Dd $Mdocdate: March 10 2019 $
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd multicast routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
Multicast routing is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast were used to replicate the data to all listeners,
some of the network links might carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to (at most) one copy
per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF) and
Core Based Trees (CBT)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding via the
.Xr sysctl 8
variables
.Va net.inet.ip.mforwarding
and/or
.Va net.inet.ip6.mforwarding ,
and set
.Va multicast
to
.Dq YES
in
.Xr rc.conf.local 8 .
The user must also run a user-level process capable of multicast routing,
such as
.Xr mrouted 8 .
From a developer's point of view,
the API described in the
.Sx Programming Guide
section should be used to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control multicast forwarding in the kernel.
Note that most of the operations below require root privilege:
.Bd -literal -offset indent
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal -offset indent
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(for IPv4 or IPv6, respectively)
to send or receive IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
to send and receive the IGMP or MLD messages, respectively.
In the case of
.Bx Ns -derived
kernels,
it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux)
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
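.Pp
The socket setup above can be wrapped in a small helper that picks the
protocol matching each address family.
This is only a sketch: the names
.Fn mrouter_protocol
and
.Fn open_mrouter_socket
are hypothetical, and only the standard socket headers are assumed.
Without root privilege,
.Xr socket 2
is expected to fail, in which case the helper returns \-1 with
.Va errno
set:
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/* Raw protocol used for the multicast routing socket of a family. */
static int
mrouter_protocol(int family)
{
    switch (family) {
    case AF_INET:
        return IPPROTO_IGMP;    /* IPv4: IGMP socket */
    case AF_INET6:
        return IPPROTO_ICMPV6;  /* IPv6: MLD travels over ICMPv6 */
    default:
        return -1;
    }
}

/*
 * Open the multicast routing socket for AF_INET or AF_INET6.
 * Returns the descriptor, or -1 with errno set on failure.
 */
static int
open_mrouter_socket(int family)
{
    int proto = mrouter_protocol(family);

    if (proto == -1) {
        errno = EAFNOSUPPORT;
        return -1;
    }
    return socket(family, SOCK_RAW, proto);
}
.Ed
.Pp
Following the portability advice above, the returned descriptor would then
be used both for
.Dv MRT_INIT
or
.Dv MRT6_INIT
and for any IGMP or MLD traffic.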
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal -offset 5n
/* IPv4 */
int v = 1;      /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, &v, sizeof(v));
.Ed
.Bd -literal -offset 5n
/* IPv6 */
int v = 1;      /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, &v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, &filter,
    sizeof(filter));
.Ed
.Pp
For each network interface (e.g., a physical interface or a virtual tunnel)
that will be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal -offset 3n
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
        sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it would be 1.
The
.Va max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address for DVMRP multicast tunnels.
.Bd -literal -offset indent
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, &mc, sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal -offset indent
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, &vifi,
    sizeof(vifi));
.Ed
.Bd -literal -offset indent
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, &mifi,
    sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled and the multicast virtual
interfaces have been added,
the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the
.Va im_mbz
field set to zero.
Note that this header follows the layout of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero, which allows upcalls to be distinguished from ordinary IGMP
packets received on the same socket.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the
.Va im6_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
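.Pp
Because IPv4 upcalls and ordinary IGMP packets arrive on the same socket,
the receiver has to tell them apart.
The following sketch does this for a buffer returned by
.Xr recvfrom 2 .
It assumes the traditional
.Bx
layout of
.Vt "struct igmpmsg" ,
in which
.Va im_msgtype ,
.Va im_mbz
(overlaying
.Va ip_p ) ,
.Va im_vif ,
.Va im_src ,
and
.Va im_dst
sit at byte offsets 8, 9, 10, 12, and 16; verify these against the local
.In netinet/ip_mroute.h .
The
.Vt upcall_info
structure and
.Fn parse_igmp_upcall
are hypothetical names:
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Assumed byte offsets of the struct igmpmsg fields (BSD layout). */
#define UPC_OFF_MSGTYPE 8       /* im_msgtype (ip_ttl in struct ip) */
#define UPC_OFF_MBZ     9       /* im_mbz (ip_p in struct ip) */
#define UPC_OFF_VIF     10      /* im_vif */
#define UPC_OFF_SRC     12      /* im_src (ip_src in struct ip) */
#define UPC_OFF_DST     16      /* im_dst (ip_dst in struct ip) */
#define UPC_HDR_LEN     20      /* size of a minimal struct ip */

struct upcall_info {
    int msgtype;                /* one of the IGMPMSG_* values */
    int vif;                    /* vif the packet arrived on */
    struct in_addr src, dst;    /* source and group of the packet */
};

/*
 * Returns 1 and fills *info if buf holds a kernel upcall,
 * 0 if it is an ordinary IP packet (ip_p != 0), or -1 if too short.
 */
static int
parse_igmp_upcall(const unsigned char *buf, size_t len,
    struct upcall_info *info)
{
    if (len < UPC_HDR_LEN)
        return -1;
    if (buf[UPC_OFF_MBZ] != 0)
        return 0;
    info->msgtype = buf[UPC_OFF_MSGTYPE];
    info->vif = buf[UPC_OFF_VIF];
    memcpy(&info->src, buf + UPC_OFF_SRC, sizeof(info->src));
    memcpy(&info->dst, buf + UPC_OFF_DST, sizeof(info->dst));
    return 1;
}
.Ed
.Pp
A receive loop would pass each datagram read from
.Va mrouter_s4
to this function, dispatch on
.Va msgtype
when it returns 1, and hand the packet to its IGMP processing when it
returns 0.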
.Pp
The upcall header contains the
.Va im_msgtype
(IPv4) or
.Va im6_msgtype
(IPv6) field, which holds the type of the upcall:
one of the
.Dv IGMPMSG_*
or
.Dv MRT6MSG_*
values, respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
.Pp
An MFC entry is added by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC, &mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC, &mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
fields are the source and group address of the multicast packet (as set
in the upcall message).
The
.Va iif_index
is the virtual interface index of the multicast interface on which the
multicast packets for this specific source and group address should be
received.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
must have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified.
.Pp
An MFC entry is deleted by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC, &mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC, &mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API)
while at the same time allowing user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One of the mechanisms that allows preserving backward
compatibility is a sort of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only the set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API feature negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently a
.Vt uint32_t ,
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
An example:
.Bd -literal -offset 3n
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, &v, &len);
.Ed
.Pp
This would set
.Va v
to the pre-defined bits that the kernel API supports.
The eight least significant bits in the
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the newly defined
.Vt "struct mfcctl2"
(these flags are described below), which leaves 24 flags for other new
features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
.Pp
To modify the API and enable a specific feature in the kernel, use:
.Bd -literal -offset 3n
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, &v, sizeof(v)) != 0)
    return (ERROR);
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);    /* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, its
argument specifies the desired set of features to
be enabled in the API and the kernel.
The value returned in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
To obtain the same set of enabled features later, use:
.Bd -literal -offset indent
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, &v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
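.Pp
The negotiation steps described above can be combined into one routine.
The following is a sketch only:
.Fn negotiate_mrt_api
and
.Fn api_missing
are hypothetical helpers, and the option names are passed in by the caller
(normally
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG
from
.In netinet/ip_mroute.h )
so that the code does not depend on a particular header.
It reads the enabled set back with
.Fn getsockopt MRT_API_CONFIG
rather than relying on
.Fn setsockopt
updating its argument:
.Bd -literal -offset indent
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Feature bits as listed in this page, if the header is not included. */
#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)
#endif
#ifndef MRT_MFC_RP
#define MRT_MFC_RP                     (1 << 8)
#endif
#ifndef MRT_MFC_BW_UPCALL
#define MRT_MFC_BW_UPCALL              (1 << 9)
#endif

/* Features that were requested but not granted by the kernel. */
static uint32_t
api_missing(uint32_t wanted, uint32_t enabled)
{
    return wanted & ~enabled;
}

/*
 * Ask the kernel for the features in "wanted" and store the set it
 * actually enabled in *enabled.  Returns 0 on success, -1 on error.
 */
static int
negotiate_mrt_api(int s, int support_opt, int config_opt,
    uint32_t wanted, uint32_t *enabled)
{
    uint32_t supported, v;
    socklen_t len = sizeof(supported);

    /* Step 1: learn which features this kernel knows about. */
    if (getsockopt(s, IPPROTO_IP, support_opt, &supported, &len) == -1)
        return -1;
    /* Step 2: request only the supported subset. */
    v = wanted & supported;
    if (setsockopt(s, IPPROTO_IP, config_opt, &v, sizeof(v)) == -1)
        return -1;
    /* Step 3: read back what was actually enabled. */
    len = sizeof(v);
    if (getsockopt(s, IPPROTO_IP, config_opt, &v, &len) == -1)
        return -1;
    *enabled = v;
    return 0;
}
.Ed
.Pp
Masking the request with the supported set in step 2 is a defensive choice
of this sketch; the page itself only states that the kernel returns the
subset it is willing to enable.
A caller would run the routine right after
.Fn setsockopt MRT_INIT
and check
.Fn api_missing
before relying on a feature such as
.Dv MRT_MFC_BW_UPCALL .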
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;        /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;      /* multicast group associated*/
        vifi_t          mfcc_parent;        /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS]; /* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags */
        struct in_addr  mfcc_rp;            /* the RP address            */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
This signal should not be delivered for interfaces that are neither in
the outgoing interface set nor expected to become the incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, a data packet that arrives on one of those interfaces for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, a signal is triggered (the default action).
.Pp
A multicast routing user-level process often needs to know the
forwarding bandwidth of some data flows.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then, based on those numbers, estimate whether a source
had been idle, or whether the source's transmission bandwidth was above a
threshold.
That solution does not scale well, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., <, <=, ==, !=, >, >=), only the
<= and >= operations are supported,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Furthermore, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added/deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument, of course).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS  (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES    (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ           (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ           (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL    (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX                          128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
    if (bw_upcall_oper IS ">=") {
        if (((bw_upcall_unit & PACKETS) == PACKETS &&
             measured_packets >= threshold_packets) ||
            ((bw_upcall_unit & BYTES) == BYTES &&
             measured_bytes >= threshold_bytes))
            SEND_UPCALL("measured bandwidth is >= threshold");
    }
    if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
        if (((bw_upcall_unit & PACKETS) == PACKETS &&
             measured_packets <= threshold_packets) ||
            ((bw_upcall_unit & BYTES) == BYTES &&
             measured_bytes <= threshold_bytes))
            SEND_UPCALL("measured bandwidth is <= threshold");
    }
.Ed
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, the bandwidth
estimation might be less accurate, or the potentially very high frequency
of the generated upcalls might introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
so the upcall may be delivered earlier.
For the <= operation, however, we must wait
until the threshold interval has expired to know the answer.
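.Pp
The pseudo-algorithm above can be written as a small C function, for
example to unit-test filter parameters in user space before installing
them.
This is a sketch of the documented semantics, not the kernel's code:
.Vt "struct bw_counts"
and
.Fn bw_filter_fires
are hypothetical names, the flag values are copied from
.Vt "struct bw_upcall"
above, and the measurement interval is reduced to a flag supplied by the
caller:
.Bd -literal -offset indent
#include <stdint.h>

#ifndef BW_UPCALL_UNIT_PACKETS
#define BW_UPCALL_UNIT_PACKETS  (1 << 0)
#define BW_UPCALL_UNIT_BYTES    (1 << 1)
#define BW_UPCALL_GEQ           (1 << 2)
#define BW_UPCALL_LEQ           (1 << 3)
#endif

struct bw_counts {
    uint64_t packets;
    uint64_t bytes;
};

/*
 * Returns 1 if the filter would trigger an upcall, 0 if not, and -1 if
 * the flags are invalid.  "interval_over" is nonzero once the measured
 * interval has reached the threshold interval (needed only for <=).
 */
static int
bw_filter_fires(uint32_t flags, const struct bw_counts *thr,
    const struct bw_counts *meas, int interval_over)
{
    int geq = (flags & BW_UPCALL_GEQ) != 0;
    int leq = (flags & BW_UPCALL_LEQ) != 0;
    int pkts = (flags & BW_UPCALL_UNIT_PACKETS) != 0;
    int bytes = (flags & BW_UPCALL_UNIT_BYTES) != 0;

    if (geq == leq)
        return -1;      /* GEQ and LEQ are exclusive; one is required */
    if (geq)            /* >= may be decided at any time */
        return (pkts && meas->packets >= thr->packets) ||
            (bytes && meas->bytes >= thr->bytes);
    if (!interval_over)
        return 0;       /* <= is only decided at the end of the interval */
    return (pkts && meas->packets <= thr->packets) ||
        (bytes && meas->bytes <= thr->bytes);
}
.Ed
.Pp
With no unit flag set, neither comparison applies and the function returns
0, as in the pseudo-algorithm.
Enforcing the minimum interval of
.Dv BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC
seconds is left to the caller in this sketch.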
.Sh EXAMPLES
.Bd -literal -offset indent
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
    &bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL ,
with the fields of
.Va bw_upcall
set to exactly the same values as when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall"
need to be set, and the
.Dv BW_UPCALL_DELETE_ALL
flag must be the only flag set in the
.Va bu_flags
field.
.Pp
The bandwidth upcalls are delivered aggregated in a new upcall
message type:
.Bd -literal -offset indent
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
elapsed since the previous upcall (whichever comes first).
In each
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mrouted 8 ,
.Xr sysctl 8
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
.Pp
The IPv6 multicast support was implemented by the KAME project
.Pq Lk http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).