.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD: src/share/man/man4/multicast.4,v 1.4 2004/07/09 09:22:36 ru Exp $
.\" $OpenBSD: multicast.4,v 1.15 2019/03/10 21:31:49 jmc Exp $
.\" $NetBSD: multicast.4,v 1.3 2004/09/12 13:12:26 wiz Exp $
.\"
.Dd $Mdocdate: March 10 2019 $
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd multicast routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
Multicast routing is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF) and
Core Based Trees (CBT)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding via the
.Xr sysctl 8
variables
.Va net.inet.ip.mforwarding
and/or
.Va net.inet.ip6.mforwarding ,
and set
.Va multicast
to
.Dq YES
in
.Xr rc.conf.local 8 .
The user must also run a multicast routing capable user-level process,
such as
.Xr mrouted 8 .
From a developer's point of view,
the programming guide described in the
.Sx Programming Guide
section should be used to control the multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control the multicast forwarding in the kernel.
Note that most operations below require root privilege:
.Bd -literal -offset indent
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal -offset indent
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
If the router also needs to send or receive
IGMP (IPv4) or MLD (IPv6) multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used for those messages.
In the case of
.Bx Ns -derived
kernels,
it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux)
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal -offset 5n
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, &v, sizeof(v));
.Ed
.Bd -literal -offset 5n
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, &v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, &filter,
           sizeof(filter));
.Ed
.Pp
For each network interface (e.g., a physical interface or a virtual tunnel)
that will be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal -offset 3n
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = max_rate_limit;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
if (vc.vifc_flags & VIFF_TUNNEL)
    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
           sizeof(vc.vifc_rmt_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it would be 1.
The
.Va max_rate_limit
contains the maximum rate (in bits/s) of the multicast data packets forwarded
on that vif.
A value of 0 means no limit.
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address for DVMRP multicast tunnels.
.Bd -literal -offset indent
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, &mc, sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal -offset indent
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, &vifi,
           sizeof(vifi));
.Ed
.Bd -literal -offset indent
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, &mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled, and the multicast virtual
interfaces have been
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the
.Va im_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the
.Va im6_mbz
field set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the
.Va im_msgtype
and
.Va im6_msgtype
fields, with the type of the upcall
.Dv IGMPMSG_*
and
.Dv MRT6MSG_*
for IPv4 and IPv6, respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
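.Pp
As a rough illustration of the upcall header checks described above,
the following sketch classifies a buffer read from the IPv4 multicast
routing socket.
It is not taken from the kernel sources:
.Fn ex_upcall_msgtype
and the
.Dv EX_*
constants are hypothetical names, and the byte offsets are an assumption
derived from the statement that the upcall header overlays
.Vt "struct ip"
(where
.Va ip_p
is the tenth byte).
Real code should cast the buffer to
.Vt "struct igmpmsg"
rather than use raw offsets.
.Bd -literal -offset indent
#include <stddef.h>

/* Assumed layout: im_msgtype overlays ip_ttl, im_mbz overlays ip_p. */
#define EX_HDR_LEN      20      /* IPv4 header without options */
#define EX_OFF_MSGTYPE  8
#define EX_OFF_MBZ      9

/*
 * Return the upcall type if buf holds a kernel upcall, or -1 if the
 * buffer is too short or carries a regular packet (protocol != 0).
 */
int
ex_upcall_msgtype(const unsigned char *buf, size_t len)
{
        if (buf == NULL || len < EX_HDR_LEN)
                return (-1);
        if (buf[EX_OFF_MBZ] != 0)
                return (-1);
        return (buf[EX_OFF_MSGTYPE]);
}
.Ed
.Pp
A caller would compare the returned value against the
.Dv IGMPMSG_*
constants and treat \-1 as an ordinary IGMP packet.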
.Pp
An MFC entry is added by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC, &mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC, &mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
fields are the source and group address of the multicast packet (as set
in the upcall message).
The
.Va iif_index
is the virtual interface index of the multicast interface the multicast
packets for this specific source and group address should be received on.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
should have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified.
.Pp
An MFC entry is deleted by:
.Bd -literal -offset indent
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC, &mc, sizeof(mc));
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC, &mc, sizeof(mc));
.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal -offset indent
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal -offset indent
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API)
while still allowing user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One mechanism that preserves backward
compatibility is a negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only the set of features
the kernel has agreed on.
.El
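.Pp
The three steps above can be sketched in plain C without involving the
kernel.
In the sketch below,
.Fn ex_negotiate
merely stands in for the kernel side of the negotiation;
all names and bit values are hypothetical and chosen for illustration only.
.Bd -literal -offset indent
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define EX_FEATURE_A    (UINT32_C(1) << 0)
#define EX_FEATURE_B    (UINT32_C(1) << 8)
#define EX_FEATURE_C    (UINT32_C(1) << 9)

/* Step 2: only requested features that are supported are granted. */
uint32_t
ex_negotiate(uint32_t requested, uint32_t supported)
{
        return (requested & supported);
}

/* Step 3: a feature may be relied on only if it was granted. */
int
ex_feature_enabled(uint32_t granted, uint32_t feature)
{
        return ((granted & feature) == feature);
}
.Ed
.Pp
A process that requests features A and C from a kernel supporting
A and B ends up with A only, and must fall back to the basic API for
whatever C would have provided.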
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in
.Vt uint32_t ,
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
An example:
.Bd -literal -offset 3n
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, &v, &len);
.Ed
.Pp
This would set
.Va v
to the pre-defined bits that the kernel API supports.
The eight least significant bits in
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the new definition of
.Vt "struct mfcctl"
(see below about those flags), which leaves 24 flags for other new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
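.Pp
The split between the eight per-entry flag bits and the remaining 24
feature bits can be expressed with two masks.
The sketch below is illustrative only; the
.Dv EX_*
mask names and both helper functions are hypothetical and are not
defined in any system header.
.Bd -literal -offset indent
#include <stdint.h>

#define EX_MFC_FLAGS_MASK  UINT32_C(0x000000ff)  /* low eight bits */
#define EX_FEATURES_MASK   UINT32_C(0xffffff00)  /* other 24 bits */

/* Bits that may also appear in the per-interface mfcc_flags[]. */
uint8_t
ex_mfc_flag_bits(uint32_t v)
{
        return ((uint8_t)(v & EX_MFC_FLAGS_MASK));
}

/* Bits that describe other API features. */
uint32_t
ex_other_feature_bits(uint32_t v)
{
        return (v & EX_FEATURES_MASK);
}
.Ed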
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal -offset 3n
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, &v, sizeof(v)) != 0)
    return (ERROR);
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
To later obtain the same set of features that were enabled, use:
.Bd -literal -offset indent
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, &v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
That signal should not be delivered for interfaces that are not in
the outgoing interface set and that are not expected to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on one of those interfaces for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution does not scale well, hence the need for a new
mechanism for bandwidth monitoring.
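.Pp
The polling approach described above amounts to comparing two successive
readings of the per-(S,G) counters.
The following sketch shows the arithmetic only;
.Vt "struct ex_sample" ,
.Fn ex_rate
and
.Fn ex_is_idle
are hypothetical names, and the counter values are assumed to have been
obtained elsewhere (for example with
.Dv SIOCGETSGCNT ) .
.Bd -literal -offset indent
#include <stdint.h>

struct ex_sample {
        uint64_t bytes;         /* cumulative byte count for (S,G) */
        double   when;          /* sample time in seconds */
};

/* Average rate in bytes per second; -1 if the samples are unusable. */
double
ex_rate(const struct ex_sample *prev, const struct ex_sample *cur)
{
        double dt = cur->when - prev->when;

        if (dt <= 0.0 || cur->bytes < prev->bytes)
                return (-1.0);
        return ((double)(cur->bytes - prev->bytes) / dt);
}

/* The source is treated as idle if the counter did not move. */
int
ex_is_idle(const struct ex_sample *prev, const struct ex_sample *cur)
{
        return (cur->bytes == prev->bytes);
}
.Ed
.Pp
Every monitored (S,G) needs its own periodic query with this scheme,
which is the scaling problem the bandwidth upcalls are meant to avoid.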
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Furthermore, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added and deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX                          128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
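.Pp
The pseudo-algorithm above can be written as a small C function.
This is a sketch for reasoning about filter behaviour, not the kernel
implementation: the
.Dv EX_*
flags mirror the bit values of the
.Dv BW_UPCALL_*
flags shown earlier, while
.Vt "struct ex_counts" ,
.Fn ex_should_upcall
and the
.Fa interval_done
argument are hypothetical.
.Bd -literal -offset indent
#include <stdint.h>

/* Same bit values as the BW_UPCALL_* flags. */
#define EX_UNIT_PACKETS (1 << 0)
#define EX_UNIT_BYTES   (1 << 1)
#define EX_GEQ          (1 << 2)
#define EX_LEQ          (1 << 3)

struct ex_counts {
        uint64_t packets;
        uint64_t bytes;
};

/*
 * Return 1 if an upcall should be sent.  interval_done is nonzero
 * once the measured interval has reached the threshold interval.
 */
int
ex_should_upcall(uint32_t flags, const struct ex_counts *m,
    const struct ex_counts *t, int interval_done)
{
        int pk = (flags & EX_UNIT_PACKETS) != 0;
        int by = (flags & EX_UNIT_BYTES) != 0;
        int send = 0;

        if (flags & EX_GEQ)
                send |= (pk && m->packets >= t->packets) ||
                    (by && m->bytes >= t->bytes);
        if ((flags & EX_LEQ) && interval_done)
                send |= (pk && m->packets <= t->packets) ||
                    (by && m->bytes <= t->bytes);
        return (send);
}
.Ed
.Pp
Note that a >= filter can fire before the interval ends, whereas a <=
filter is only evaluated once
.Fa interval_done
is set.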
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values are allowed, then the bandwidth
estimation may be less accurate, or the potentially very high frequency
of the generated upcalls may introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
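.Pp
A user-level process may want to check these two constraints before
handing a filter to the kernel.
The sketch below is one possible pre-check, under the assumption that
exactly one of GEQ and LEQ must be set; how the kernel itself reacts to
an invalid filter is not described here.
.Fn ex_filter_valid
and the
.Dv EX_*
names are hypothetical; the numeric values mirror the definitions shown
earlier.
.Bd -literal -offset indent
#include <stdint.h>
#include <sys/time.h>

#define EX_GEQ          (1 << 2)        /* as BW_UPCALL_GEQ */
#define EX_LEQ          (1 << 3)        /* as BW_UPCALL_LEQ */
#define EX_MIN_SEC      3               /* minimum interval, seconds */
#define EX_MIN_USEC     0               /* minimum interval, usec part */

/*
 * Return 1 if exactly one of GEQ and LEQ is set and the interval
 * is not shorter than the minimum; otherwise return 0.
 */
int
ex_filter_valid(uint32_t flags, const struct timeval *interval)
{
        int geq = (flags & EX_GEQ) != 0;
        int leq = (flags & EX_LEQ) != 0;

        if (geq == leq)
                return (0);
        if (interval->tv_sec < EX_MIN_SEC)
                return (0);
        if (interval->tv_sec == EX_MIN_SEC &&
            interval->tv_usec < EX_MIN_USEC)
                return (0);
        return (1);
}
.Ed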
.Sh EXAMPLES
The following code installs a bandwidth upcall filter for a given
source and group:
.Bd -literal -offset indent
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
           &bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
772.Pp
773To delete a single filter, use
774.Dv MRT_DEL_BW_UPCALL ,
775and the fields of bw_upcall must be set to
776exactly same as when
777.Dv MRT_ADD_BW_UPCALL
778was called.
779.Pp
780To delete all bandwidth filters for a given (S,G), then
781only the
782.Va bu_src
783and
784.Va bu_dst
785fields in
786.Vt "struct bw_upcall"
787need to be set, and then just set only the
788.Dv BW_UPCALL_DELETE_ALL
789flag inside field
790.Va bw_upcall.bu_flags .
791.Pp
792The bandwidth upcalls are received by aggregating them in the new upcall
793message:
794.Bd -literal -offset indent
795#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
796.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr icmp6 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mrouted 8 ,
.Xr sysctl 8
.\"
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
.Pp
The IPv6 multicast support was implemented by the KAME project
.Pq Lk http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).
888(ICSI).
889