1.\" Copyright (c) 1996-1999 Whistle Communications, Inc.
2.\" All rights reserved.
3.\"
4.\" Subject to the following obligations and disclaimer of warranty, use and
5.\" redistribution of this software, in source or object code forms, with or
6.\" without modifications are expressly permitted by Whistle Communications;
7.\" provided, however, that:
8.\" 1. Any and all reproductions of the source or object code must include the
9.\"    copyright notice above and the following disclaimer of warranties; and
10.\" 2. No rights are granted, in any manner or form, to use Whistle
11.\"    Communications, Inc. trademarks, including the mark "WHISTLE
12.\"    COMMUNICATIONS" on advertising, endorsements, or otherwise except as
13.\"    such appears in the above copyright notice or in the software.
14.\"
15.\" THIS SOFTWARE IS BEING PROVIDED BY WHISTLE COMMUNICATIONS "AS IS", AND
16.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, WHISTLE COMMUNICATIONS MAKES NO
17.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING THIS SOFTWARE,
18.\" INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED WARRANTIES OF
19.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
20.\" WHISTLE COMMUNICATIONS DOES NOT WARRANT, GUARANTEE, OR MAKE ANY
21.\" REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS OF THE USE OF THIS
22.\" SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY OR OTHERWISE.
23.\" IN NO EVENT SHALL WHISTLE COMMUNICATIONS BE LIABLE FOR ANY DAMAGES
24.\" RESULTING FROM OR ARISING OUT OF ANY USE OF THIS SOFTWARE, INCLUDING
25.\" WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
26.\" PUNITIVE, OR CONSEQUENTIAL DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR
27.\" SERVICES, LOSS OF USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY
28.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
29.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
30.\" THIS SOFTWARE, EVEN IF WHISTLE COMMUNICATIONS IS ADVISED OF THE POSSIBILITY
31.\" OF SUCH DAMAGE.
32.\"
33.\" Authors: Julian Elischer <julian@FreeBSD.org>
34.\"          Archie Cobbs <archie@FreeBSD.org>
35.\"
36.\" $Whistle: netgraph.4,v 1.7 1999/01/28 23:54:52 julian Exp $
37.\"
38.Dd September 29, 2021
39.Dt NETGRAPH 4
40.Os
41.Sh NAME
42.Nm netgraph
43.Nd "graph based kernel networking subsystem"
44.Sh DESCRIPTION
45The
46.Nm
47system provides a uniform and modular system for the implementation
48of kernel objects which perform various networking functions.
49The objects, known as
50.Em nodes ,
51can be arranged into arbitrarily complicated graphs.
52Nodes have
53.Em hooks
54which are used to connect two nodes together, forming the edges in the graph.
55Nodes communicate along the edges to process data, implement protocols, etc.
56.Pp
57The aim of
58.Nm
59is to supplement rather than replace the existing kernel networking
60infrastructure.
61It provides:
62.Pp
63.Bl -bullet -compact
64.It
65A flexible way of combining protocol and link level drivers.
66.It
67A modular way to implement new protocols.
68.It
69A common framework for kernel entities to inter-communicate.
70.It
71A reasonably fast, kernel-based implementation.
72.El
73.Ss Nodes and Types
74The most fundamental concept in
75.Nm
76is that of a
77.Em node .
78All nodes implement a number of predefined methods which allow them
79to interact with other nodes in a well defined manner.
80.Pp
81Each node has a
82.Em type ,
83which is a static property of the node determined at node creation time.
84A node's type is described by a unique
85.Tn ASCII
86type name.
87The type implies what the node does and how it may be connected
88to other nodes.
89.Pp
90In object-oriented language, types are classes, and nodes are instances
91of their respective class.
92All node types are subclasses of the generic node
93type, and hence inherit certain common functionality and capabilities
94(e.g., the ability to have an
95.Tn ASCII
96name).
97.Pp
98Nodes may be assigned a globally unique
99.Tn ASCII
100name which can be
101used to refer to the node.
102The name must not contain the characters
103.Ql .\&
104or
105.Ql \&: ,
106and is limited to
107.Dv NG_NODESIZ
108characters (including the terminating
109.Dv NUL
110character).
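.Pp
The naming rules above can be sketched as a small check.
This is an illustrative userland sketch, not part of the
.Nm
API: the function name
.Fn sketch_name_ok
and the constant
.Dv SKETCH_NODESIZ
are hypothetical, and the value 32 is an assumption standing in for
.Dv NG_NODESIZ ;
real code should use the definition from
.In netgraph/ng_message.h .
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NODESIZ 32   /* assumed stand-in for NG_NODESIZ */

/*
 * Return 1 if name satisfies the rules described above, else 0.
 * An empty string is treated as "no name" (an assumption of this sketch).
 */
static int
sketch_name_ok(const char *name)
{
    size_t len;

    if (name == NULL)
        return (0);
    len = strlen(name);
    /* The limit includes the terminating NUL character. */
    if (len == 0 || len >= SKETCH_NODESIZ)
        return (0);
    /* Periods and colons are reserved as address separators. */
    if (strchr(name, '.') != NULL || strchr(name, ':') != NULL)
        return (0);
    return (1);
}
```
.Ed
.Pp
The same test applies to hook names, with
.Dv NG_HOOKSIZ
as the limit.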
111.Pp
112Each node instance has a unique
113.Em ID number
114which is expressed as a 32-bit hexadecimal value.
115This value may be used to refer to a node when there is no
116.Tn ASCII
117name assigned to it.
118.Ss Hooks
119Nodes are connected to other nodes by connecting a pair of
120.Em hooks ,
121one from each node.
122Data flows bidirectionally between nodes along
123connected pairs of hooks.
124A node may have as many hooks as it
125needs, and may assign whatever meaning it wants to a hook.
126.Pp
127Hooks have these properties:
128.Bl -bullet
129.It
130A hook has an
131.Tn ASCII
132name which is unique among all hooks
133on that node (other hooks on other nodes may have the same name).
134The name must not contain the characters
135.Ql .\&
136or
137.Ql \&: ,
138and is
139limited to
140.Dv NG_HOOKSIZ
141characters (including the terminating
142.Dv NUL
143character).
144.It
145A hook is always connected to another hook.
146That is, hooks are
147created at the time they are connected, and breaking an edge by
148removing either hook destroys both hooks.
149.It
150A hook can be set into a state where incoming packets are always queued
151by the input queueing system, rather than being delivered directly.
152This can be used when the data is sent from an interrupt handler,
153and processing must be quick so as not to block other interrupts.
154.It
155A hook may supply overriding receive data and receive message functions,
156which should be used for data and messages received through that hook
157in preference to the general node-wide methods.
158.El
159.Pp
160A node may decide to assign special meaning to some hooks.
161For example, connecting to the hook named
162.Va debug
163might trigger
164the node to start sending debugging information to that hook.
165.Ss Data Flow
166Two types of information flow between nodes: data messages and
167control messages.
168Data messages are passed in
169.Vt mbuf chains
170along the edges
171in the graph, one edge at a time.
172The first
173.Vt mbuf
174in a chain must have the
175.Dv M_PKTHDR
176flag set.
177Each node decides how to handle data received through one of its hooks.
178.Pp
179Along with data, nodes can also receive control messages.
180There are generic and type-specific control messages.
181Control messages have a common
182header format, followed by type-specific data, and are binary structures
183for efficiency.
184However, node types may also support conversion of the
185type-specific data between binary and
186.Tn ASCII
187formats,
188for debugging and human interface purposes (see the
189.Dv NGM_ASCII2BINARY
190and
191.Dv NGM_BINARY2ASCII
192generic control messages below).
193Nodes are not required to support these conversions.
194.Pp
195There are three ways to address a control message.
196If there is a sequence of edges connecting the two nodes, the message
197may be
198.Dq source routed
199by specifying the corresponding sequence
200of
201.Tn ASCII
202hook names as the destination address for the message (relative
203addressing).
204If the destination is adjacent to the source, then the source
205node may simply specify (as a pointer in the code) the hook across which the
206message should be sent.
207Otherwise, the recipient node's global
208.Tn ASCII
209name
210(or equivalent ID-based name) is used as the destination address
211for the message (absolute addressing).
212The two types of
213.Tn ASCII
214addressing
215may be combined, by specifying an absolute start node and a sequence
216of hooks.
217Only the
218.Tn ASCII
219addressing modes are available to control programs outside the kernel;
220use of direct pointers is limited to kernel modules.
221.Pp
222Messages often represent commands that are followed by a reply message
223in the reverse direction.
224To facilitate this, the recipient of a
225control message is supplied with a
226.Dq return address
227that is suitable for addressing a reply.
228.Pp
229Each control message contains a 32-bit value, called a
230.Dq typecookie ,
231indicating the type of the message, i.e.\& how to interpret it.
232Typically each type defines a unique typecookie for the messages
233that it understands.
234However, a node may choose to recognize and
implement more than one type of message.
236.Pp
237If a message is delivered to an address that implies that it arrived
238at that node through a particular hook (as opposed to having been directly
239addressed using its ID or global name) then that hook is identified to the
240receiving node.
241This allows a message to be re-routed or passed on, should
242a node decide that this is required, in much the same way that data packets
243are passed around between nodes.
The base system defines a set of standard
messages for flow control and link management,
which are usually
passed around in this manner.
Flow control messages usually travel
in the opposite direction to the data to which they pertain.
250.Ss Netgraph is (Usually) Functional
251In order to minimize latency, most
252.Nm
253operations are functional.
254That is, data and control messages are delivered by making function
255calls rather than by using queues and mailboxes.
256For example, if node
257A wishes to send a data
258.Vt mbuf
259to neighboring node B, it calls the
260generic
261.Nm
262data delivery function.
263This function in turn locates
264node B and calls B's
265.Dq receive data
266method.
267There are exceptions to this.
268.Pp
269Each node has an input queue, and some operations can be considered to
270be
271.Em writers
272in that they alter the state of the node.
273Obviously, in an SMP
274world it would be bad if the state of a node were changed while another
275data packet were transiting the node.
276For this purpose, the input queue implements a
277.Em reader/writer
278semantic so that when there is a writer in the node, all other requests
are queued, and while there are readers, a writer and any packets
following it are queued.
281In the case where there is no reason to queue the
282data, the input method is called directly, as mentioned above.
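.Pp
The queueing decision described above can be modelled with a toy,
single-threaded sketch.
It is not the kernel implementation: the structure
.Vt sk_node_q
and the function
.Fn sk_try_enter
are hypothetical names, and no real locking is shown.
.Bd -literal -offset indent
```c
/* Toy model of one node's input queue state (no real locking). */
struct sk_node_q {
    int readers;    /* readers currently inside the node */
    int writer;     /* 1 while a writer is inside the node */
    int queued;     /* requests waiting in the input queue */
};

enum sk_kind { SK_READER, SK_WRITER };

/* Return 1 if the request may be delivered directly, 0 if it is queued. */
static int
sk_try_enter(struct sk_node_q *q, enum sk_kind kind)
{
    /* A writer inside, or anything already queued, holds back the rest. */
    if (q->writer || q->queued > 0) {
        q->queued++;
        return (0);
    }
    if (kind == SK_WRITER) {
        /* A writer must wait until all readers have left. */
        if (q->readers > 0) {
            q->queued++;
            return (0);
        }
        q->writer = 1;
        return (1);
    }
    q->readers++;
    return (1);
}
```
.Ed
.Pp
In this model a reader arriving after a queued writer is itself queued,
which preserves the ordering of requests.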
283.Pp
284A node may declare that all requests should be considered as writers,
285or that requests coming in over a particular hook should be considered to
286be a writer, or even that packets leaving or entering across a particular
287hook should always be queued, rather than delivered directly (often useful
for interrupt routines that need to get back to the hardware quickly).
By default, all control messages are considered to be writers
unless specifically declared to be readers in their definition.
291(See
292.Dv NGM_READONLY
293in
294.In netgraph/ng_message.h . )
295.Pp
296While this mode of operation
297results in good performance, it has a few implications for node
298developers:
299.Bl -bullet
300.It
301Whenever a node delivers a data or control message, the node
302may need to allow for the possibility of receiving a returning
303message before the original delivery function call returns.
304.It
305.Nm Netgraph
306provides internal synchronization between nodes.
307Data always enters a
308.Dq graph
309at an
310.Em edge node .
311An
312.Em edge node
313is a node that interfaces between
314.Nm
315and some other part of the system.
316Examples of
317.Dq edge nodes
318include device drivers, the
319.Vt socket , ether , tty ,
320and
321.Vt ksocket
node types.
323In these
324.Em edge nodes ,
325the calling thread directly executes code in the node, and from that code
326calls upon the
327.Nm
328framework to deliver data across some edge
329in the graph.
330From an execution point of view, the calling thread will execute the
331.Nm
332framework methods, and if it can acquire a lock to do so,
333the input methods of the next node.
334This continues until either the data is discarded or queued for some
335device or system entity, or the thread is unable to acquire a lock on
336the next node.
337In that case, the data is queued for the node, and execution rewinds
338back to the original calling entity.
339The queued data will be picked up and processed by either the current
340holder of the lock when they have completed their operations, or by
341a special
342.Nm
343thread that is activated when there are such items
344queued.
345.It
346It is possible for an infinite loop to occur if the graph contains cycles.
347.El
348.Pp
So far, these issues have not proven problematic in practice.
350.Ss Interaction with Other Parts of the Kernel
351A node may have a hidden interaction with other components of the
352kernel outside of the
353.Nm
354subsystem, such as device hardware,
355kernel protocol stacks, etc.
356In fact, one of the benefits of
357.Nm
358is the ability to join disparate kernel networking entities together in a
359consistent communication framework.
360.Pp
361An example is the
362.Vt socket
363node type which is both a
364.Nm
365node and a
366.Xr socket 2
367in the protocol family
368.Dv PF_NETGRAPH .
369Socket nodes allow user processes to participate in
370.Nm .
371Other nodes communicate with socket nodes using the usual methods, and the
372node hides the fact that it is also passing information to and from a
373cooperating user process.
374.Pp
375Another example is a device driver that presents
376a node interface to the hardware.
377.Ss Node Methods
378Nodes are notified of the following actions via function calls
379to the following node methods,
380and may accept or reject that action (by returning the appropriate
381error code):
382.Bl -tag -width 2n
383.It Creation of a new node
384The constructor for the type is called.
If creation of a new node is allowed, the constructor method may allocate any
386special resources it needs.
387For nodes that correspond to hardware, this is typically done during the
388device attach routine.
389Often a global
390.Tn ASCII
391name corresponding to the
392device name is assigned here as well.
393.It Creation of a new hook
394The hook is created and tentatively
395linked to the node, and the node is told about the name that will be
396used to describe this hook.
397The node sets up any special data structures
398it needs, or may reject the connection, based on the name of the hook.
399.It Successful connection of two hooks
400After both ends have accepted their
401hooks, and the links have been made, the nodes get a chance to
402find out who their peer is across the link, and can then decide to reject
403the connection.
404Tear-down is automatic.
405This is also the time at which
406a node may decide whether to set a particular hook (or its peer) into
407the
408.Em queueing
409mode.
410.It Destruction of a hook
411The node is notified of a broken connection.
412The node may consider some hooks
413to be critical to operation and others to be expendable: the disconnection
414of one hook may be an acceptable event while for another it
415may effect a total shutdown for the node.
416.It Preshutdown of a node
417This method is called before real shutdown, which is discussed below.
418While in this method, the node is fully operational and can send a
419.Dq goodbye
420message to its peers, or it can exclude itself from the chain and reconnect
421its peers together, like the
422.Xr ng_tee 4
423node type does.
424.It Shutdown of a node
425This method allows a node to clean up
426and to ensure that any actions that need to be performed
427at this time are taken.
428The method is called by the generic (i.e., superclass)
429node destructor which will get rid of the generic components of the node.
430Some nodes (usually associated with a piece of hardware) may be
431.Em persistent
432in that a shutdown breaks all edges and resets the node,
433but does not remove it.
434In this case, the shutdown method should not
435free its resources, but rather, clean up and then call the
436.Fn NG_NODE_REVIVE
437macro to signal the generic code that the shutdown is aborted.
438In the case where the shutdown is started by the node itself due to hardware
439removal or unloading (via
440.Fn ng_rmnode_self ) ,
441it should set the
442.Dv NGF_REALLY_DIE
443flag to signal to its own shutdown method that it is not to persist.
444.El
445.Ss Sending and Receiving Data
446Two other methods are also supported by all nodes:
447.Bl -tag -width 2n
448.It Receive data message
449A
450.Nm
451.Em queueable request item ,
452usually referred to as an
453.Em item ,
454is received by this function.
455The item contains a pointer to an
456.Vt mbuf .
457.Pp
458The node is notified on which hook the item has arrived,
459and can use this information in its processing decision.
460The receiving node must always
461.Fn NG_FREE_M
462the
463.Vt mbuf chain
464on completion or error, or pass it on to another node
465(or kernel module) which will then be responsible for freeing it.
466Similarly, the
467.Em item
468must be freed if it is not to be passed on to another node, by using the
469.Fn NG_FREE_ITEM
470macro.
471If the item still holds references to
472.Vt mbufs
473at the time of
474freeing then they will also be appropriately freed.
475Therefore, if there is any chance that the
476.Vt mbuf
477will be
478changed or freed separately from the item, it is very important
479that it be retrieved using the
480.Fn NGI_GET_M
481macro that also removes the reference within the item.
(Otherwise, multiple frees of the same object will occur.)
483.Pp
484If it is only required to examine the contents of the
485.Vt mbufs ,
486then it is possible to use the
487.Fn NGI_M
macro to both read and rewrite the
489.Vt mbuf
490pointer inside the item.
491.Pp
To pass any meta information along with the
.Vt mbuf chain ,
use the
.Xr mbuf_tags 9
framework.
.Bf -symbolic
Note that the old
.Nm
specific meta-data format is now obsolete.
.Ef
.Pp
502.Pp
503The receiving node may decide to defer the data by queueing it in the
504.Nm
505NETISR system (see below).
506It achieves this by setting the
507.Dv HK_QUEUE
508flag in the flags word of the hook on which that data will arrive.
509The infrastructure will respect that bit and queue the data for delivery at
510a later time, rather than deliver it directly.
511A node may decide to set
512the bit on the
513.Em peer
514node, so that its own output packets are queued.
515.Pp
516The node may elect to nominate a different receive data function
517for data received on a particular hook, to simplify coding.
518It uses the
519.Fn NG_HOOK_SET_RCVDATA hook fn
520macro to do this.
The function takes the same arguments as the node-wide method,
but receives all (and only) packets arriving on that hook.
523.It Receive control message
524This method is called when a control message is addressed to the node.
525As with the received data, an
526.Em item
527is received, with a pointer to the control message.
528The message can be examined using the
529.Fn NGI_MSG
530macro, or completely extracted from the item using the
531.Fn NGI_GET_MSG
macro, which also removes the reference within the item.
533If the item still holds a reference to the message when it is freed
534(using the
535.Fn NG_FREE_ITEM
536macro), then the message will also be freed appropriately.
537If the
538reference has been removed, the node must free the message itself using the
539.Fn NG_FREE_MSG
540macro.
541A return address is always supplied, giving the address of the node
542that originated the message so a reply message can be sent anytime later.
543The return address is retrieved from the
544.Em item
545using the
546.Fn NGI_RETADDR
547macro and is of type
548.Vt ng_ID_t .
549All control messages and replies are
550allocated with the
551.Xr malloc 9
552type
.Dv M_NETGRAPH_MSG ;
however, it is more convenient to use the
555.Fn NG_MKMESSAGE
556and
557.Fn NG_MKRESPONSE
558macros to allocate and fill out a message.
559Messages must be freed using the
560.Fn NG_FREE_MSG
561macro.
562.Pp
563If the message was delivered via a specific hook, that hook will
564also be made known, which allows the use of such things as flow-control
565messages, and status change messages, where the node may want to forward
the message out of a hook other than the one on which it arrived.
567.Pp
568The node may elect to nominate a different receive message function
569for messages received on a particular hook, to simplify coding.
570It uses the
571.Fn NG_HOOK_SET_RCVMSG hook fn
572macro to do this.
The function takes the same arguments as the node-wide method,
but receives all (and only) messages arriving on that hook.
575.El
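.Pp
The ownership rule for data items described above (take the buffer out of
the item before handling it separately, otherwise freeing the item frees
the buffer too) can be illustrated with a toy userland model.
This sketch does not use the real
.Fn NGI_GET_M
or
.Fn NG_FREE_ITEM
macros; every
.Li sk_
name below is hypothetical and only mimics the described behaviour.
.Bd -literal -offset indent
```c
#include <stdlib.h>

struct sk_buf { int len; };             /* stands in for an mbuf chain */
struct sk_item { struct sk_buf *m; };   /* stands in for an item */

static int sk_free_count;               /* number of buffers freed so far */

static struct sk_item *
sk_item_new(void)
{
    struct sk_item *it = malloc(sizeof(*it));

    if (it == NULL)
        return (NULL);
    it->m = malloc(sizeof(*it->m));
    return (it);
}

static void
sk_free_buf(struct sk_buf *m)
{
    if (m != NULL) {
        sk_free_count++;
        free(m);
    }
}

/* Take ownership of the buffer and clear the reference in the item. */
static struct sk_buf *
sk_item_take(struct sk_item *it)
{
    struct sk_buf *m = it->m;

    it->m = NULL;
    return (m);
}

/* Free the item together with any buffer it still references. */
static void
sk_item_free(struct sk_item *it)
{
    sk_free_buf(it->m);
    free(it);
}
```
.Ed
.Pp
Because
.Fn sk_item_take
clears the pointer inside the item, a later
.Fn sk_item_free
cannot free the same buffer a second time.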
576.Pp
Reference counts are used extensively, so that nodes that lose
their last reference are automatically freed, and this behaviour
579has been tested and debugged to present a consistent and trustworthy
580framework for the
581.Dq type module
582writer to use.
583.Ss Addressing
584The
585.Nm
586framework provides an unambiguous and simple to use method of specifically
587addressing any single node in the graph.
588The naming of a node is
589independent of its type, in that another node, or external component
590need not know anything about the node's type in order to address it so as
591to send it a generic message type.
592Node and hook names should be
593chosen so as to make addresses meaningful.
594.Pp
595Addresses are either absolute or relative.
596An absolute address begins
597with a node name or ID, followed by a colon, followed by a sequence of hook
598names separated by periods.
599This addresses the node reached by starting
600at the named node and following the specified sequence of hooks.
601A relative address includes only the sequence of hook names, implicitly
602starting hook traversal at the local node.
603.Pp
604There are a couple of special possibilities for the node name.
605The name
606.Ql .\&
607(referred to as
608.Ql .: )
609always refers to the local node.
610Also, nodes that have no global name may be addressed by their ID numbers,
611by enclosing the hexadecimal representation of the ID number within
square brackets.
613Here are some examples of valid
614.Nm
615addresses:
.Bd -literal -offset indent
\&.:
[3f]:
foo:
\&.:hook1
foo:hook1.hook2
[d80]:hook1
.Ed
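.Pp
The address syntax can be sketched as a small userland parser that
splits an address into its optional node part and its hook path.
This is an illustration only, not the parser used by
.Nm :
the names
.Vt sk_addr
and
.Fn sk_parse_addr
are hypothetical, and the size limits are assumptions of the sketch.
It does not validate the node name beyond what is needed to split it.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

#define SK_MAXHOOKS 8   /* arbitrary limit for this sketch */
#define SK_NAMESIZ  32  /* assumed stand-in for NG_NODESIZ and NG_HOOKSIZ */

struct sk_addr {
    int absolute;               /* 1 if a "node:" prefix was present */
    int by_id;                  /* 1 if the node was given as [hexid] */
    unsigned long id;           /* valid only when by_id is set */
    char node[SK_NAMESIZ];      /* node part, e.g. "foo", "." or "[3f]" */
    char hooks[SK_MAXHOOKS][SK_NAMESIZ];
    int nhooks;
};

/* Parse addr into out; return 0 on success, -1 on a malformed address. */
static int
sk_parse_addr(const char *addr, struct sk_addr *out)
{
    const char *colon, *path;
    char *end;
    size_t len;

    memset(out, 0, sizeof(*out));
    path = addr;
    /* Split at the colon first: the local node name "." contains a period. */
    colon = strchr(addr, ':');
    if (colon != NULL) {
        len = (size_t)(colon - addr);
        if (len == 0 || len >= sizeof(out->node))
            return (-1);
        memcpy(out->node, addr, len);
        out->absolute = 1;
        path = colon + 1;
        if (out->node[0] == '[') {
            /* ID form: hexadecimal digits enclosed in square brackets. */
            if (len < 3 || out->node[len - 1] != ']')
                return (-1);
            out->id = strtoul(out->node + 1, &end, 16);
            if (end != out->node + len - 1)
                return (-1);
            out->by_id = 1;
        }
    }
    if (strchr(path, ':') != NULL)
        return (-1);
    /* The rest is a sequence of hook names separated by periods. */
    while (*path != 0) {
        len = strcspn(path, ".");
        if (len == 0 || len >= SK_NAMESIZ || out->nhooks == SK_MAXHOOKS)
            return (-1);
        memcpy(out->hooks[out->nhooks++], path, len);
        path += len;
        if (*path == '.' && *++path == 0)
            return (-1);        /* trailing period */
    }
    return (0);
}
```
.Ed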
624.Pp
625The following set of nodes might be created for a site with
626a single physical frame relay line having two active logical DLCI channels,
627with RFC 1490 frames on DLCI 16 and PPP frames over DLCI 20:
.Bd -literal
[type SYNC ]                  [type FRAME]                 [type RFC1490]
[ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named>  ]
[    A     ]                  [    B     ](dlci20)<---+    [     C      ]
                                                      |
                                                      |      [ type PPP ]
                                                      +>(mux)[<un-named>]
                                                             [    D     ]
.Ed
637.Pp
638One could always send a control message to node C from anywhere
639by using the name
640.Dq Li Frame1:uplink.dlci16 .
641In this case, node C would also be notified that the message
642reached it via its hook
643.Va mux .
644Similarly,
645.Dq Li Frame1:uplink.dlci20
646could reliably be used to reach node D, and node A could refer
647to node B as
648.Dq Li .:uplink ,
649or simply
650.Dq Li uplink .
651Conversely, B can refer to A as
652.Dq Li data .
653The address
654.Dq Li mux.data
655could be used by both nodes C and D to address a message to node A.
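.Pp
The hook-by-hook resolution in these examples can be traced with a
table-driven sketch of the frame relay graph shown above.
The table and the function
.Fn sk_walk
are hypothetical; nodes are identified by the letters A to D used in the
diagram, and the sketch only models address resolution, not message delivery.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/* One direction of an edge: leaving node "from" through "hook" reaches "to". */
struct sk_edge {
    char from;
    const char *hook;
    char to;
};

static const struct sk_edge sk_graph[] = {
    { 'A', "uplink", 'B' }, { 'B', "data", 'A' },
    { 'B', "dlci16", 'C' }, { 'C', "mux", 'B' },
    { 'B', "dlci20", 'D' }, { 'D', "mux", 'B' },
};

/*
 * Follow a period-separated hook path starting at node "start".
 * Return the letter of the node reached, or 0 if a hook does not exist.
 */
static char
sk_walk(char start, const char *path)
{
    const size_t nedges = sizeof(sk_graph) / sizeof(sk_graph[0]);
    char cur = start;

    while (*path != 0) {
        size_t len = strcspn(path, ".");
        char next = 0;
        size_t i;

        for (i = 0; i < nedges; i++) {
            const struct sk_edge *e = &sk_graph[i];

            if (e->from == cur && strlen(e->hook) == len &&
                strncmp(e->hook, path, len) == 0) {
                next = e->to;
                break;
            }
        }
        if (next == 0)
            return (0);
        cur = next;
        path += len;
        if (*path == '.')
            path++;
    }
    return (cur);
}
```
.Ed
.Pp
Starting at node A, which is named
.Dq Li Frame1 ,
the path
.Dq Li uplink.dlci16
ends at node C, matching the first example above.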
656.Pp
657Note that this is only for
658.Em control messages .
659In each of these cases, where a relative addressing mode is
660used, the recipient is notified of the hook on which the
661message arrived, as well as
662the originating node.
663This allows the option of hop-by-hop distribution of messages and
664state information.
665Data messages are
666.Em only
667routed one hop at a time, by specifying the departing
668hook, with each node making
669the next routing decision.
670So when B receives a frame on hook
671.Va data ,
672it decodes the frame relay header to determine the DLCI,
673and then forwards the unwrapped frame to either C or D.
674.Pp
675In a similar way, flow control messages may be routed in the reverse
676direction to outgoing data.
677For example a
678.Dq "buffer nearly full"
679message from
680.Dq Li Frame1:
681would be passed to node B
682which might decide to send similar messages to both nodes
683C and D.
684The nodes would use
685.Em "direct hook pointer"
686addressing to route the messages.
687The message may have travelled from
688.Dq Li Frame1:
689to B
690as a synchronous reply, saving time and cycles.
691.Ss Netgraph Structures
692Structures are defined in
693.In netgraph/netgraph.h
694(for kernel structures only of interest to nodes)
695and
696.In netgraph/ng_message.h
697(for message definitions also of interest to user programs).
698.Pp
699The two basic object types that are of interest to node authors are
700.Em nodes
701and
702.Em hooks .
703These two objects have the following
704properties that are also of interest to the node writers.
705.Bl -tag -width 2n
706.It Vt "struct ng_node"
707Node authors should always use the following
708.Ic typedef
709to declare
710their pointers, and should never actually declare the structure.
711.Pp
712.Fd "typedef struct ng_node *node_p;"
713.Pp
714The following properties are associated with a node, and can be
715accessed in the following manner:
716.Bl -tag -width 2n
717.It Validity
718A driver or interrupt routine may want to check whether
719the node is still valid.
720It is assumed that the caller holds a reference
721on the node so it will not have been freed, however it may have been
722disabled or otherwise shut down.
723Using the
724.Fn NG_NODE_IS_VALID node
725macro will return this state.
726Eventually it should be almost impossible
727for code to run in an invalid node but at this time that work has not been
728completed.
729.It Node ID Pq Vt ng_ID_t
730This property can be retrieved using the macro
731.Fn NG_NODE_ID node .
732.It Node name
733Optional globally unique name,
734.Dv NUL
735terminated string.
736If there
737is a value in here, it is the name of the node.
.Bd -literal -offset indent
if (NG_NODE_NAME(node)[0] != '\e0') ...

if (strcmp(NG_NODE_NAME(node), "fred") == 0) ...
.Ed
743.It A node dependent opaque cookie
744Anything of the pointer type can be placed here.
745The macros
746.Fn NG_NODE_SET_PRIVATE node value
747and
748.Fn NG_NODE_PRIVATE node
749set and retrieve this property, respectively.
750.It Number of hooks
751The
752.Fn NG_NODE_NUMHOOKS node
753macro is used
754to retrieve this value.
755.It Hooks
756The node may have a number of hooks.
757A traversal method is provided to allow all the hooks to be
758tested for some condition.
It is invoked as
.Fn NG_NODE_FOREACH_HOOK node fn arg rethook ,
where
.Fa fn
is a function that is called for each hook in the form
.Fn fn hook arg
and returns 0 to terminate the search.
766If the search is terminated, then
767.Fa rethook
768will be set to the hook at which the search was terminated.
769.El
770.It Vt "struct ng_hook"
771Node authors should always use the following
772.Ic typedef
773to declare
774their hook pointers.
775.Pp
776.Fd "typedef struct ng_hook *hook_p;"
777.Pp
778The following properties are associated with a hook, and can be
779accessed in the following manner:
780.Bl -tag -width 2n
781.It A hook dependent opaque cookie
782Anything of the pointer type can be placed here.
783The macros
784.Fn NG_HOOK_SET_PRIVATE hook value
785and
786.Fn NG_HOOK_PRIVATE hook
787set and retrieve this property, respectively.
.It \&An associated node
789The macro
790.Fn NG_HOOK_NODE hook
791finds the associated node.
792.It A peer hook Pq Vt hook_p
793The other hook in this connected pair.
794The
795.Fn NG_HOOK_PEER hook
796macro finds the peer.
797.It References
798The
799.Fn NG_HOOK_REF hook
800and
801.Fn NG_HOOK_UNREF hook
802macros
803increment and decrement the hook reference count accordingly.
804After decrement you should always assume the hook has been freed
805unless you have another reference still valid.
806.It Override receive functions
807The
808.Fn NG_HOOK_SET_RCVDATA hook fn
809and
810.Fn NG_HOOK_SET_RCVMSG hook fn
811macros can be used to set override methods that will be used in preference
812to the generic receive data and receive message functions.
813To unset these, use the macros to set them to
814.Dv NULL .
815They will only be used for data and
816messages received on the hook on which they are set.
817.El
818.Pp
819The maintenance of the names, reference counts, and linked list
820of hooks for each node is handled automatically by the
821.Nm
822subsystem.
823Typically a node's private info contains a back-pointer to the node or hook
824structure, which counts as a new reference that must be included
825in the reference count for the node.
When the node constructor is called,
this reference has already been counted, so
when the node is destroyed, it should call
.Fn NG_NODE_UNREF
on the node.
831.Pp
832From a hook you can obtain the corresponding node, and from
833a node, it is possible to traverse all the active hooks.
834.Pp
835A current example of how to define a node can always be seen in
836.Pa src/sys/netgraph/ng_sample.c
837and should be used as a starting point for new node writers.
838.El
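.Pp
The hook traversal convention described above, in which the callback
returns 0 to stop and the hook at which the search stopped is reported back,
can be illustrated with a userland analogue.
This is not the
.Fn NG_NODE_FOREACH_HOOK
macro itself; the
.Li sk_
names and the array-based hook list are assumptions of the sketch.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

struct sk_hook {
    const char *name;
};

typedef int sk_hook_fn(struct sk_hook *hook, void *arg);

/*
 * Call fn for each hook until it returns 0.
 * Return the hook at which the search stopped, or NULL if it never stopped.
 */
static struct sk_hook *
sk_foreach_hook(struct sk_hook *hooks, size_t nhooks, sk_hook_fn *fn, void *arg)
{
    size_t i;

    for (i = 0; i < nhooks; i++) {
        if (fn(&hooks[i], arg) == 0)
            return (&hooks[i]);
    }
    return (NULL);
}

/* Example callback: keep searching while the name differs from arg. */
static int
sk_name_differs(struct sk_hook *hook, void *arg)
{
    return (strcmp(hook->name, (const char *)arg) != 0);
}
```
.Ed
.Pp
Note the inverted sense of the return value: a callback that has found
what it is looking for returns 0.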
839.Ss Netgraph Message Structure
840Control messages have the following structure:
.Bd -literal
#define NG_CMDSTRSIZ    32      /* Max command string (including null) */

struct ng_mesg {
  struct ng_msghdr {
    u_char      version;        /* Must equal NG_VERSION */
    u_char      spare;          /* Pad to 4 bytes */
    uint16_t    spare2;
    uint32_t    arglen;         /* Length of cmd/resp data */
    uint32_t    cmd;            /* Command identifier */
    uint32_t    flags;          /* Message status flags */
    uint32_t    token;          /* Reply should have the same token */
    uint32_t    typecookie;     /* Node type understanding this message */
    u_char      cmdstr[NG_CMDSTRSIZ];  /* cmd string + \0 */
  } header;
  char  data[];                 /* placeholder for actual data */
};

#define NG_ABI_VERSION  12              /* Netgraph kernel ABI version */
#define NG_VERSION      8               /* Netgraph message version */
#define NGF_ORIG        0x00000000      /* The msg is the original request */
#define NGF_RESP        0x00000001      /* The message is a response */
.Ed
.Pp
Control messages have the fixed header shown above, followed by a
variable length data section which depends on the type cookie
and the command.
Each field is explained below:
.Bl -tag -width indent
.It Va version
Indicates the version of the
.Nm
message protocol itself.
The current version is
.Dv NG_VERSION .
.It Va arglen
This is the length of any extra arguments, which begin at
.Va data .
.It Va flags
Indicates whether this is a command or a response control message.
.It Va token
The
.Va token
is a means by which a sender can match a reply message to the
corresponding command message; the reply always has the same token.
.It Va typecookie
The corresponding node type's unique 32-bit value.
If a node does not recognize the type cookie it must reject the message
by returning
.Er EINVAL .
.Pp
Each type should have an include file that defines the commands,
argument format, and cookie for its own messages.
The typecookie
ensures that the same header file was included by both sender and
receiver; when an incompatible change in the header file is made,
the typecookie
.Em must
be changed.
The de-facto method for generating unique type cookies is to take the
seconds from the Epoch at the time the header file is written
(i.e., the output of
.Dq Nm date Fl u Li +%s ) .
.Pp
There is a predefined typecookie
.Dv NGM_GENERIC_COOKIE
for the
.Vt generic
node type, and
a corresponding set of generic messages which all nodes understand.
The handling of these messages is automatic.
.It Va cmd
The identifier for the message command.
This is type specific,
and is defined in the same header file as the typecookie.
.It Va cmdstr
Room for a short human-readable version of
.Va cmd
(for debugging purposes only).
.El
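.Pp
The header fields described above can be illustrated with a short
user-space sketch.
It is not taken from the
.Nm
sources: the structure is a local copy of the layout shown earlier,
the value used for
.Dv NG_CMDSTRSIZ
is an assumption, and
.Fn msg_build
and
.Fn msg_is_reply_to
are hypothetical helpers.
Real code would include
.In netgraph/ng_message.h
instead of redefining anything.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for <netgraph/ng_message.h> (sketch only). */
#define NG_VERSION	8
#define NGF_ORIG	0x00000000
#define NGF_RESP	0x00000001
#define NG_CMDSTRSIZ	32		/* assumed size */

struct ng_mesg {
	struct ng_msghdr {
		unsigned char	version;
		unsigned char	spare;
		uint16_t	spare2;
		uint32_t	arglen;
		uint32_t	cmd;
		uint32_t	flags;
		uint32_t	token;
		uint32_t	typecookie;
		unsigned char	cmdstr[NG_CMDSTRSIZ];
	} header;
	char	data[];
};

/* Hypothetical helper: allocate header plus arglen bytes and fill it in. */
static struct ng_mesg *
msg_build(uint32_t cookie, uint32_t cmd, uint32_t token,
    const void *args, uint32_t arglen)
{
	struct ng_mesg *m;

	assert(args != NULL || arglen == 0);
	m = calloc(1, sizeof(*m) + arglen);
	if (m == NULL)
		return (NULL);
	m->header.version = NG_VERSION;
	m->header.arglen = arglen;	/* arguments begin at data */
	m->header.cmd = cmd;
	m->header.flags = NGF_ORIG;	/* this is a request */
	m->header.token = token;
	m->header.typecookie = cookie;
	if (arglen != 0)
		memcpy(m->data, args, arglen);
	return (m);
}

/* Hypothetical helper: is resp a response carrying req's token? */
static int
msg_is_reply_to(const struct ng_mesg *resp, const struct ng_mesg *req)
{
	return ((resp->header.flags & NGF_RESP) != 0 &&
	    resp->header.token == req->header.token);
}
```
.Ed
.Pp
The reply check relies only on the response flag and the token,
which are the two properties this page guarantees for replies.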
.Pp
Some modules may choose to implement messages from more than one
of the header files and thus recognize more than one type cookie.
.Ss Control Message ASCII Form
Control messages are in binary format for efficiency.
However, for
debugging and human interface purposes, and if the node type supports
it, control messages may be converted to and from an equivalent
.Tn ASCII
form.
The
.Tn ASCII
form is similar to the binary form, with two exceptions:
.Bl -enum
.It
The
.Va cmdstr
header field must contain the
.Tn ASCII
name of the command, corresponding to the
.Va cmd
header field.
.It
The arguments field contains a
.Dv NUL Ns
-terminated
.Tn ASCII
string version of the message arguments.
.El
.Pp
In general, the arguments field of a control message can be any
arbitrary C data type.
.Nm Netgraph
includes parsing routines to support
some pre-defined datatypes in
.Tn ASCII
with this simple syntax:
.Bl -bullet
.It
Integer types are represented by base 8, 10, or 16 numbers.
.It
Strings are enclosed in double quotes and respect the normal
C language backslash escapes.
.It
IP addresses have the obvious form.
.It
Arrays are enclosed in square brackets, with the elements listed
consecutively starting at index zero.
An element may have an optional index and equals sign
.Pq Ql =
preceding it.
Whenever an element
does not have an explicit index, the index is implicitly the previous
element's index plus one.
.It
Structures are enclosed in curly braces, and each field is specified
in the form
.Ar fieldname Ns = Ns Ar value .
.It
Any array element or structure field whose value is equal to its
.Dq default value
may be omitted.
For integer types, the default value
is usually zero; for string types, the empty string.
.It
Array elements and structure fields may be specified in any order.
.El
.Pp
Each node type may define its own arbitrary types by providing
the necessary routines to parse and unparse.
.Tn ASCII
forms defined
for a specific node type are documented in the corresponding man page.
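.Pp
The array rules above (optional explicit index, implicit
.Dq previous index plus one ,
omitted elements taking the default value, and any ordering)
can be made concrete with a small parser.
This is a simplified user-space sketch for integer arrays only, not the
parsing code used by
.Nm ;
.Fn parse_int_array
is a hypothetical name.
It relies on
.Xr strtol 3
with base 0 to accept base 8, 10 and 16 numbers.
.Bd -literal
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Parse "[ v v idx=v ... ]" into out[0..n-1].  Elements that are not
 * mentioned keep the default value zero.  Returns the highest index
 * set plus one, or -1 on a syntax error or out-of-range index.
 */
static int
parse_int_array(const char *s, long *out, int n)
{
	int next = 0, len = 0;
	char *end;

	assert(s != NULL && out != NULL);
	for (int i = 0; i < n; i++)
		out[i] = 0;
	while (isspace((unsigned char)*s))
		s++;
	if (*s++ != '[')
		return (-1);
	for (;;) {
		long v;

		while (isspace((unsigned char)*s))
			s++;
		if (*s == ']')
			return (len);
		v = strtol(s, &end, 0);	/* base 0: octal, decimal, hex */
		if (end == s)
			return (-1);
		if (*end == '=') {	/* explicit index before the value */
			if (v < 0 || v >= n)
				return (-1);
			next = (int)v;
			s = end + 1;
			v = strtol(s, &end, 0);
			if (end == s)
				return (-1);
		}
		if (next >= n)
			return (-1);
		out[next++] = v;	/* implicit index: previous plus one */
		if (next > len)
			len = next;
		s = end;
	}
}
```
.Ed
.Pp
With this sketch, the input
.Ql "[ 5 6 4=0x10 7 ]"
sets elements 0, 1, 4 and 5 and leaves elements 2 and 3 at zero.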
.Ss Generic Control Messages
There are a number of standard predefined messages that will work
for any node, as they are supported directly by the framework itself.
These are defined in
.In netgraph/ng_message.h
along with the basic layout of messages and other similar information.
.Bl -tag -width indent
.It Dv NGM_CONNECT
Connect to another node, using the supplied hook names on either end.
.It Dv NGM_MKPEER
Construct a node of the given type and then connect to it using the
supplied hook names.
.It Dv NGM_SHUTDOWN
The target node should disconnect from all its neighbors and shut down.
Persistent nodes such as those representing physical hardware
might not disappear from the node namespace, but only reset themselves.
The node must disconnect all of its hooks.
This may result in neighbors shutting themselves down, and possibly a
cascading shutdown of the entire connected graph.
.It Dv NGM_NAME
Assign a name to a node.
Nodes can exist without having a name, and this
is the default for nodes created using the
.Dv NGM_MKPEER
method.
Such nodes can only be addressed relatively or by their ID number.
.It Dv NGM_RMHOOK
Ask the node to break a hook connection to one of its neighbors.
Both nodes will have their
.Dq disconnect
method invoked.
Either node may elect to totally shut down as a result.
.It Dv NGM_NODEINFO
Asks the target node to describe itself.
The four returned fields
are the node name (if named), the node type, the node ID and the
number of hooks attached.
The ID is an internal number unique to that node.
.It Dv NGM_LISTHOOKS
This returns the information given by
.Dv NGM_NODEINFO ,
but in addition
includes an array of fields describing each link, and the description for
the node at the far end of that link.
.It Dv NGM_LISTNAMES
This returns an array of node descriptions (as for
.Dv NGM_NODEINFO )
where each entry of the array describes a named node.
All named nodes will be described.
.It Dv NGM_LISTNODES
This is the same as
.Dv NGM_LISTNAMES
except that all nodes are listed regardless of whether they have a name or not.
.It Dv NGM_LISTTYPES
This returns a list of all currently installed
.Nm
types.
.It Dv NGM_TEXT_STATUS
The node may return a text formatted status message.
The status information is determined entirely by the node type.
It is the only
.Dq generic
message
that requires any support within the node itself and as such the node may
elect to not support this message.
The text response must be less than
.Dv NG_TEXTRESPONSE
bytes in length (presently 1024).
This can be used to return general
status information in human readable form.
.It Dv NGM_BINARY2ASCII
This message converts a binary control message to its
.Tn ASCII
form.
The entire control message to be converted is contained within the
arguments field of the
.Dv NGM_BINARY2ASCII
message itself.
If successful, the reply will contain the same control
message in
.Tn ASCII
form.
A node will typically only know how to translate messages that it
itself understands, so the target node of the
.Dv NGM_BINARY2ASCII
is often the same node that would actually receive that message.
.It Dv NGM_ASCII2BINARY
The opposite of
.Dv NGM_BINARY2ASCII .
The entire control message to be converted, in
.Tn ASCII
form, is contained
in the arguments section of the
.Dv NGM_ASCII2BINARY
and need only have the
.Va flags , cmdstr ,
and
.Va arglen
header fields filled in, plus the
.Dv NUL Ns
-terminated string version of
the arguments in the arguments field.
If successful, the reply
contains the binary version of the control message.
.El
.Ss Flow Control Messages
In addition to the control messages that affect nodes with respect to the
graph, there are also a number of
.Em flow control
messages defined.
At present these are
.Em not
handled automatically by the system, so
nodes need to handle them if they are going to be used in a graph utilising
flow control, and will be in the likely path of these messages.
The default action of a node that does not understand these messages should
be to pass them on to the next node.
Hopefully some helper functions will assist in this eventually.
These messages are also defined in
.In netgraph/ng_message.h
and have a separate cookie
.Dv NG_FLOW_COOKIE
to help identify them.
They will not be covered in depth here.
.Sh INITIALIZATION
The base
.Nm
code may either be statically compiled
into the kernel or else loaded dynamically as a KLD via
.Xr kldload 8 .
In the former case, include
.Pp
.D1 Cd "options NETGRAPH"
.Pp
in your kernel configuration file.
You may also include selected
node types in the kernel compilation, for example:
.Pp
.D1 Cd "options NETGRAPH"
.D1 Cd "options NETGRAPH_SOCKET"
.D1 Cd "options NETGRAPH_ECHO"
.Pp
Once the
.Nm
subsystem is loaded, individual node types may be loaded at any time
as KLD modules via
.Xr kldload 8 .
Moreover,
.Nm
knows how to automatically do this; when a request to create a new
node of unknown type
.Ar type
is made,
.Nm
will attempt to load the KLD module
.Pa ng_ Ns Ao Ar type Ac Ns Pa .ko .
.Pp
Types can also be installed at boot time, as certain device drivers
may want to export each instance of the device as a
.Nm
node.
.Pp
In general, new types can be installed at any time from within the
kernel by calling
.Fn ng_newtype ,
supplying a pointer to the type's
.Vt "struct ng_type"
structure.
.Pp
The
.Fn NETGRAPH_INIT
macro automates this process by using a linker set.
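.Pp
The naming rule for automatically loaded node types described above
can be sketched as follows.
This only illustrates how the module file name is derived from the type name;
it is not the kernel's loading code, and
.Fn ng_kld_name
is a hypothetical helper.
.Bd -literal
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "ng_<type>.ko" into buf.  Returns 0 on success, or -1 if the
 * type name is empty or the result does not fit in len bytes.
 */
static int
ng_kld_name(const char *type, char *buf, size_t len)
{
	int n;

	assert(type != NULL && buf != NULL);
	if (strlen(type) == 0)
		return (-1);
	n = snprintf(buf, len, "ng_%s.ko", type);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```
.Ed
.Pp
For the type name
.Ql echo
the sketch produces
.Pa ng_echo.ko .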
.Sh EXISTING NODE TYPES
Several node types currently exist.
Each is fully documented in its own man page:
.Bl -tag -width indent
.It SOCKET
The socket type implements two new sockets in the new protocol domain
.Dv PF_NETGRAPH .
The new socket protocols are
.Dv NG_DATA
and
.Dv NG_CONTROL ,
both of type
.Dv SOCK_DGRAM .
Typically one of each is associated with a socket node.
When both sockets have closed, the node will shut down.
The
.Dv NG_DATA
socket is used for sending and receiving data, while the
.Dv NG_CONTROL
socket is used for sending and receiving control messages.
Data and control messages are passed using the
.Xr sendto 2
and
.Xr recvfrom 2
system calls, using a
.Vt "struct sockaddr_ng"
socket address.
.It HOLE
Responds only to generic messages and is a
.Dq black hole
for data.
Useful for testing.
Always accepts new hooks.
.It ECHO
Responds only to generic messages and always echoes data back through the
hook from which it arrived.
Returns any non-generic messages as their own response.
Useful for testing.
Always accepts new hooks.
.It TEE
This node is useful for
.Dq snooping .
It has 4 hooks:
.Va left , right , left2right ,
and
.Va right2left .
Data entering from the
.Va right
is passed to the
.Va left
and duplicated on
.Va right2left ,
and data entering from the
.Va left
is passed to the
.Va right
and duplicated on
.Va left2right .
Data entering from
.Va left2right
is sent to the
.Va right
and data from
.Va right2left
to
.Va left .
.It RFC1490 MUX
Encapsulates/de-encapsulates frames encoded according to RFC 1490.
Has a hook for the encapsulated packets
.Pq Va downstream
and one hook
for each protocol (i.e., IP, PPP, etc.).
.It FRAME RELAY MUX
Encapsulates/de-encapsulates Frame Relay frames.
Has a hook for the encapsulated packets
.Pq Va downstream
and one hook
for each DLCI.
.It FRAME RELAY LMI
Automatically handles frame relay
.Dq LMI
(link management interface) operations and packets.
Automatically probes and detects which of several LMI standards
is in use at the exchange.
.It TTY
This node is also a line discipline.
It simply converts between
.Vt mbuf
frames and sequential serial data, allowing a TTY to appear as a
.Nm
node.
It has a programmable
.Dq hotkey
character.
.It ASYNC
This node encapsulates and de-encapsulates asynchronous frames
according to RFC 1662.
This is used in conjunction with the TTY node
type for supporting PPP links over asynchronous serial lines.
.It ETHERNET
This node is attached to every Ethernet interface in the system.
It allows capturing raw Ethernet frames from the network, as well as
sending frames out of the interface.
.It INTERFACE
This node is also a system networking interface.
It has hooks representing each protocol family (IP, IPv6)
and appears in the output of
.Xr ifconfig 8 .
The interfaces are named
.Dq Li ng0 ,
.Dq Li ng1 ,
etc.
.It ONE2MANY
This node implements a simple round-robin multiplexer.
It can be used,
for example, to make several LAN ports act together to get a higher speed
link between two machines.
.It Various PPP related nodes
There is a full multilink PPP implementation that runs in
.Nm .
The
.Pa net/mpd5
port can use these modules to make a very low latency, high
capacity PPP system.
It also supports
.Tn PPTP
VPNs using the PPTP node.
.It PPPOE
A server and client side implementation of PPPoE.
Used in conjunction with
either
.Xr ppp 8
or the
.Pa net/mpd5
port.
.It BRIDGE
This node, together with the Ethernet nodes, allows a very flexible
bridging system to be implemented.
.It KSOCKET
This intriguing node looks like a socket to the system but diverts
all data to and from the
.Nm
system for further processing.
This allows
such things as UDP tunnels to be almost trivially implemented from the
command line.
.El
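.Pp
The forwarding rules given for the TEE node in the list above can be
summarized as a small lookup function.
This is an illustrative user-space sketch written from that description,
not code from the node itself; the enumeration, the structure and
.Fn tee_route_for
are hypothetical names.
.Bd -literal
```c
#include <assert.h>

enum tee_hook {
	TEE_LEFT, TEE_RIGHT, TEE_LEFT2RIGHT, TEE_RIGHT2LEFT, TEE_NONE
};

struct tee_route {
	enum tee_hook dest;	/* hook the data is passed to */
	enum tee_hook dup;	/* hook that gets a copy, or TEE_NONE */
};

/* Map the hook on which data arrived to its destination hooks. */
static struct tee_route
tee_route_for(enum tee_hook in)
{
	struct tee_route r = { TEE_NONE, TEE_NONE };

	assert(in != TEE_NONE);
	switch (in) {
	case TEE_RIGHT:
		r.dest = TEE_LEFT;
		r.dup = TEE_RIGHT2LEFT;
		break;
	case TEE_LEFT:
		r.dest = TEE_RIGHT;
		r.dup = TEE_LEFT2RIGHT;
		break;
	case TEE_LEFT2RIGHT:
		r.dest = TEE_RIGHT;	/* injected data, no copy */
		break;
	case TEE_RIGHT2LEFT:
		r.dest = TEE_LEFT;	/* injected data, no copy */
		break;
	default:
		break;
	}
	return (r);
}
```
.Ed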
.Pp
Refer to the
.Sx SEE ALSO
section for more node types.
.Sh NOTES
Whether a named node exists can be checked by trying to send a control message
to it (e.g.,
.Dv NGM_NODEINFO ) .
If it does not exist,
.Er ENOENT
will be returned.
.Pp
All data messages are
.Vt mbuf chains
with the
.Dv M_PKTHDR
flag set.
.Pp
Nodes are responsible for freeing what they allocate.
There are three exceptions:
.Bl -enum
.It
.Vt Mbufs
sent across a data link are never to be freed by the sender.
In the
case of error, they should be considered freed.
.It
Messages sent using one of the
.Fn NG_SEND_MSG_*
family of macros are freed by the recipient.
As in the case above, the addresses
associated with the message are freed by whatever allocated them so the
recipient should copy them if it wants to keep that information.
.It
Both control messages and data are delivered and queued with a
.Nm
.Em item .
The item must be freed using
.Fn NG_FREE_ITEM item
or passed on to another node.
.El
.Sh FILES
.Bl -tag -width indent
.It In netgraph/netgraph.h
Definitions for use solely within the kernel by
.Nm
nodes.
.It In netgraph/ng_message.h
Definitions needed by any file that needs to deal with
.Nm
messages.
.It In netgraph/ng_socket.h
Definitions needed to use
.Nm
.Vt socket
type nodes.
.It In netgraph/ng_ Ns Ao Ar type Ac Ns Pa .h
Definitions needed to use
.Nm
.Ar type
nodes, including the type cookie definition.
.It Pa /boot/kernel/netgraph.ko
The
.Nm
subsystem loadable KLD module.
.It Pa /boot/kernel/ng_ Ns Ao Ar type Ac Ns Pa .ko
Loadable KLD module for node type
.Ar type .
.It Pa src/sys/netgraph/ng_sample.c
Skeleton
.Nm
node.
Use this as a starting point for new node types.
.El
.Sh USER MODE SUPPORT
There is a library for supporting user-mode programs that wish
to interact with the
.Nm
system.
See
.Xr netgraph 3
for details.
.Pp
Two user-mode support programs,
.Xr ngctl 8
and
.Xr nghook 8 ,
are available to assist manual configuration and debugging.
.Pp
There are a few useful techniques for debugging new node types.
First, prototyping new node types in user mode
makes debugging easier.
The
.Vt tee
node type is also useful for debugging, especially in conjunction with
.Xr ngctl 8
and
.Xr nghook 8 .
.Pp
Also look in
.Pa /usr/share/examples/netgraph
for solutions to several
common networking problems, solved using
.Nm .
.Sh SEE ALSO
.Xr socket 2 ,
.Xr netgraph 3 ,
.Xr ng_async 4 ,
.Xr ng_bluetooth 4 ,
.Xr ng_bpf 4 ,
.Xr ng_bridge 4 ,
.Xr ng_btsocket 4 ,
.Xr ng_car 4 ,
.Xr ng_cisco 4 ,
.Xr ng_device 4 ,
.Xr ng_echo 4 ,
.Xr ng_eiface 4 ,
.Xr ng_etf 4 ,
.Xr ng_ether 4 ,
.Xr ng_frame_relay 4 ,
.Xr ng_gif 4 ,
.Xr ng_gif_demux 4 ,
.Xr ng_hci 4 ,
.Xr ng_hole 4 ,
.Xr ng_hub 4 ,
.Xr ng_iface 4 ,
.Xr ng_ip_input 4 ,
.Xr ng_ipfw 4 ,
.Xr ng_ksocket 4 ,
.Xr ng_l2cap 4 ,
.Xr ng_l2tp 4 ,
.Xr ng_lmi 4 ,
.Xr ng_mppc 4 ,
.Xr ng_nat 4 ,
.Xr ng_netflow 4 ,
.Xr ng_one2many 4 ,
.Xr ng_patch 4 ,
.Xr ng_ppp 4 ,
.Xr ng_pppoe 4 ,
.Xr ng_pptpgre 4 ,
.Xr ng_rfc1490 4 ,
.Xr ng_socket 4 ,
.Xr ng_split 4 ,
.Xr ng_tee 4 ,
.Xr ng_tty 4 ,
.Xr ng_ubt 4 ,
.Xr ng_UI 4 ,
.Xr ng_vjc 4 ,
.Xr ng_vlan 4 ,
.Xr ngctl 8 ,
.Xr nghook 8
.Sh HISTORY
The
.Nm
system was designed and first implemented at Whistle Communications, Inc.\&
in a version of
.Fx 2.2
customized for the Whistle InterJet.
It made its debut in the main tree in
.Fx 3.4 .
.Sh AUTHORS
.An -nosplit
.An Julian Elischer Aq Mt julian@FreeBSD.org ,
with contributions by
.An Archie Cobbs Aq Mt archie@FreeBSD.org .