1.\"
2.\" This file and its contents are supplied under the terms of the
3.\" Common Development and Distribution License ("CDDL"), version 1.0.
4.\" You may only use this file in accordance with the terms of version
5.\" 1.0 of the CDDL.
6.\"
7.\" A full copy of the text of the CDDL should have accompanied this
8.\" source.  A copy of the CDDL is also available via the Internet at
9.\" http://www.illumos.org/license/CDDL.
10.\"
11.\"
12.\" Copyright 2019 Joyent, Inc.
13.\" Copyright 2020 RackTop Systems, Inc.
14.\" Copyright 2023 Oxide Computer Company
15.\"
16.Dd March 4, 2023
17.Dt MAC 9E
18.Os
19.Sh NAME
20.Nm mac ,
21.Nm GLDv3
22.Nd MAC networking device driver overview
23.Sh SYNOPSIS
24.In sys/mac_provider.h
25.In sys/mac_ether.h
26.Sh INTERFACE LEVEL
27illumos DDI specific
28.Sh DESCRIPTION
29The
30.Sy MAC
31framework provides a means for implementing high-performance networking
32device drivers.
33It is the successor to the GLD interfaces and is sometimes referred to as the
34GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term
.Em MAC framework
to refer to the device driver interface.
41.Pp
42MAC device drivers are character devices.
43They define the standard
44.Xr _init 9E ,
45.Xr _fini 9E ,
46and
47.Xr _info 9E
48entry points to initialize the module, as well as
49.Xr dev_ops 9S
50and
51.Xr cb_ops 9S
52structures.
53.Pp
54The main interface with MAC is through a series of callbacks defined in
55a
56.Xr mac_callbacks 9S
57structure.
58These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
61.Pp
62The MAC framework takes care of many aspects of the device driver's
63management.
64A device that uses the MAC framework does not have to worry about creating
65device nodes or implementing
66.Xr open 9E
67or
68.Xr close 9E
69routines.
70In addition, all of the work to interact with
71.Xr dlpi 4P
72is taken care of automatically and transparently.
73.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
76.Bl -enum -offset indent
77.It
78Sending frames
79.It
80Receiving frames
81.It
82Managing device configuration and metadata
83.El
84.Pp
85When sending frames, the MAC framework always calls functions registered
86in the
87.Xr mac_callbacks 9S
88structure to have the driver transmit frames on hardware.
89When receiving frames, the driver will generally receive an interrupt which will
90cause it to check for incoming data and deliver it to the MAC framework.
91.Pp
Configuration of a device, such as whether auto-negotiation should be
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames, is driven by
properties.
96The functions to get, set, and obtain information about properties are
97defined through callback functions specified in the
98.Xr mac_callbacks 9S
99structure.
100The full list of properties and a description of the relevant callbacks
101can be found in the
102.Sx PROPERTIES
103section.
104.Pp
105The MAC framework is designed to take advantage of various modern
106features provided by hardware, such as checksumming, segmentation
107offload, and hardware filtering.
108The MAC framework assumes none of these advanced features are present
109and allows device drivers to negotiate them through a capability system.
110Drivers can declare that they support various capabilities by
111implementing the optional
112.Xr mc_getcapab 9E
113entry point.
114Each capability has its associated entry points and structures to fill
115out.
116The capabilities are detailed in the
117.Sx CAPABILITIES
118section.
119.Pp
120The following sections describe the flow of a basic device driver.
121For advanced device drivers, the flow is generally the same.
122The primary distinction is in how frames are sent and received.
123.Ss Initializing MAC Support
124For a device to be used by the MAC framework, it must register with the
125framework and take specific actions during
126.Xr _init 9E ,
127.Xr attach 9E ,
128.Xr detach 9E ,
129and
130.Xr _fini 9E .
131.Pp
132All device drivers have to define a
133.Xr dev_ops 9S
134structure which is pointed to by a
135.Xr modldrv 9S
136structure and the corresponding NULL-terminated
137.Xr modlinkage 9S
138structure.
139The
140.Xr dev_ops 9S
141structure should have a
142.Xr cb_ops 9S
143structure defined for it; however, it does not need to implement any of
144the standard
145.Xr cb_ops 9S
146entry points unless it also exposes a custom set of device nodes not
147otherwise managed by the MAC framework.
148See the
149.Sx Custom Device Nodes
150section for more details.
151.Pp
152Normally, in a driver's
153.Xr _init 9E
154entry point, it passes its
155.Xr modlinkage 9S
156structure directly to
157.Xr mod_install 9F .
158To properly register with MAC, the driver must call
159.Xr mac_init_ops 9F
160before it calls
161.Xr mod_install 9F .
162If for some reason the
163.Xr mod_install 9F
164function fails, then the driver must be removed by a call to
165.Xr mac_fini_ops 9F .
166.Pp
167Conversely, in the driver's
168.Xr _fini 9E
169routine, it should call
170.Xr mac_fini_ops 9F
171after it successfully calls
172.Xr mod_remove 9F .
173For an example of how to use the
174.Xr mac_init_ops 9F
175and
176.Xr mac_fini_ops 9F
177functions, see the examples section in
178.Xr mac_init_ops 9F .
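.Pp
The ordering described above can be sketched as follows.
This is a user-space illustration only, not the example from
.Xr mac_init_ops 9F :
the stub routines, the call log, and the
.Sy example_
names are hypothetical stand-ins that merely record the order of calls.

```c
#include <string.h>

/*
 * Hypothetical user-space stand-ins for kernel types and routines.
 * Each stub only records that it was called.
 */
struct dev_ops { int unused; };
struct modlinkage { int unused; };

static char call_log[128];	/* order in which the stubs were called */
static int install_result;	/* value returned by the mod_install() stub */
static int remove_result;	/* value returned by the mod_remove() stub */

static void
log_call(const char *name)
{
	(void) strcat(call_log, name);
	(void) strcat(call_log, " ");
}

static void
mac_init_ops(struct dev_ops *ops, const char *name)
{
	(void) ops;
	(void) name;
	log_call("mac_init_ops");
}

static void
mac_fini_ops(struct dev_ops *ops)
{
	(void) ops;
	log_call("mac_fini_ops");
}

static int
mod_install(struct modlinkage *ml)
{
	(void) ml;
	log_call("mod_install");
	return (install_result);
}

static int
mod_remove(struct modlinkage *ml)
{
	(void) ml;
	log_call("mod_remove");
	return (remove_result);
}

static struct dev_ops example_dev_ops;
static struct modlinkage example_modlinkage;

/* _init(9E): set up MAC's ops first, undo that if the install fails. */
static int
example_init(void)
{
	int ret;

	mac_init_ops(&example_dev_ops, "example");
	ret = mod_install(&example_modlinkage);
	if (ret != 0)
		mac_fini_ops(&example_dev_ops);
	return (ret);
}

/* _fini(9E): only tear down MAC's ops once the module was removed. */
static int
example_fini(void)
{
	int ret;

	ret = mod_remove(&example_modlinkage);
	if (ret == 0)
		mac_fini_ops(&example_dev_ops);
	return (ret);
}
```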
179.Ss Custom Device Nodes
180A device may want to provide its own minor nodes as simple character or block
181devices backed by the usual
182.Xr cb_ops 9S
183routines.
184The MAC framework allows for this by leaving a portion of the minor
185number space available for private driver use.
186.Xr mac_private_minor 9F
187returns the first minor number a driver may use for its own purposes,
188e.g., to pass to
189.Xr ddi_create_minor_node 9F .
190.Pp
191A driver making use of this ability must provide its own
192.Xr getinfo 9E
193implementation that is aware of any such minor nodes.
194It must also delegate back to the MAC framework as appropriate via either
195calls to
196.Xr mac_getinfo 9F
197or
198.Xr mac_devt_to_instance 9F
199for MAC reserved minor nodes.
It should also take care not to affect MAC reserved minors.
For example, the following call removes all minor nodes associated with a
device, including those reserved by MAC, and should be avoided:
.Bd -literal -offset indent
    ddi_remove_minor_node(dip, NULL);
.Ed
205.Ss Registering with MAC
206Every instance of a device should register separately with MAC.
207To register with MAC, a driver must allocate a
208.Xr mac_register 9S
209structure, fill it in, and then call
210.Xr mac_register 9F .
211The
212.Vt mac_register_t
213structure contains information about the device and all of the required
214function pointers that will be used as callbacks by the framework.
215.Pp
216These steps should all be taken during a device's
217.Xr attach 9E
218entry point.
219It is recommended that the driver perform this sequence of steps after the
220device has finished its initialization of the chipset and interrupts, though
221interrupts should not be enabled at that point.
222After it calls
223.Xr mac_register 9F
224it will start receiving callbacks from the MAC framework.
225.Pp
226To allocate the registration structure, the driver should call
227.Xr mac_alloc 9F .
Device drivers should generally pass the symbol
229.Dv MAC_VERSION
230as the argument to
231.Xr mac_alloc 9F .
232Upon successful completion, the driver will receive a
233.Vt mac_register_t
234structure which it should fill in.
235The structure and its members are documented in
236.Xr mac_register 9S .
237.Pp
238The
239.Xr mac_callbacks 9S
240structure is not allocated as a part of the
241.Xr mac_register 9S
242structure.
243In general, device drivers declare this statically.
244See the
245.Sx MAC Callbacks
246section for more information on how to fill it out.
247.Pp
248Once the structure has been filled in, the driver should call
249.Xr mac_register 9F
250to register itself with MAC.
The MAC handle that this call fills in should be kept in the driver's soft
state.
It will be used in various other support functions and callbacks.
254.Pp
255If the call is successful, then the device driver
256should enable interrupts and finish any other initialization required.
257If the call to
258.Xr mac_register 9F
259failed, then it should unwind its initialization and should return
260.Dv DDI_FAILURE
261from its
262.Xr attach 9E
263routine.
264.Pp
265The driver does not need to hold onto an allocated
266.Xr mac_register 9S
267structure after it has called the
268.Xr mac_register 9F
269function.
270Whether the
271.Xr mac_register 9F
272function returns successfully or not, the driver may free its
273.Xr mac_register 9S
274structure by calling the
275.Xr mac_free 9F
276function.
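.Pp
The registration sequence above can be sketched as follows.
This is a user-space illustration under stated assumptions: the
.Vt mac_register_t
shown is a reduced mock with only a few members, the value of
.Dv MAC_VERSION
is a placeholder, and the stub routines and
.Sy example_
names are hypothetical.

```c
#include <stdlib.h>

#define	MAC_VERSION	1	/* placeholder value for this sketch */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Reduced mocks of the real structures; most members are omitted. */
typedef struct mac_callbacks { int unused; } mac_callbacks_t;
typedef struct mac_register {
	unsigned int	m_version;
	void		*m_driver;
	mac_callbacks_t	*m_callbacks;
	unsigned int	m_max_sdu;
} mac_register_t;
typedef void *mac_handle_t;

static int register_result;	/* value returned by the mac_register stub */
static int live_allocs;		/* registration structures not yet freed */

static mac_register_t *
mac_alloc(unsigned int version)
{
	mac_register_t *regp = calloc(1, sizeof (*regp));

	if (regp != NULL) {
		regp->m_version = version;
		live_allocs++;
	}
	return (regp);
}

static void
mac_free(mac_register_t *regp)
{
	free(regp);
	live_allocs--;
}

static int
mac_register(mac_register_t *regp, mac_handle_t *mhp)
{
	if (register_result == 0)
		*mhp = regp->m_driver;	/* any non-NULL token will do here */
	return (register_result);
}

typedef struct example_softc {
	mac_handle_t	sc_mh;		/* handle kept in soft state */
	int		sc_intr_enabled;
} example_softc_t;

static mac_callbacks_t example_callbacks;	/* declared statically */

/* The MAC-related portion of a hypothetical attach(9E) routine. */
static int
example_attach_mac(example_softc_t *sc)
{
	mac_register_t *regp;
	int ret;

	if ((regp = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);
	regp->m_driver = sc;
	regp->m_callbacks = &example_callbacks;
	regp->m_max_sdu = 1500;

	ret = mac_register(regp, &sc->sc_mh);
	mac_free(regp);		/* allowed whether or not registration worked */
	if (ret != 0)
		return (DDI_FAILURE);	/* caller unwinds the rest of attach */

	sc->sc_intr_enabled = 1;	/* enable interrupts only on success */
	return (DDI_SUCCESS);
}
```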
277.Ss MAC Callbacks
278The MAC framework interacts with a device driver through a series of
279callbacks.
280These callbacks are described in their individual manual pages and the
281collection of callbacks is indicated in the
282.Xr mac_callbacks 9S
283manual page.
284This section does not focus on the specific functions, but rather on
285interactions between them and the rest of the device driver framework.
286.Pp
287A device driver should make no assumptions about when the various
288callbacks will be called and whether or not they will be called
289simultaneously.
290For example, a device driver may be asked to transmit data through a call to its
291.Xr mc_tx 9E
292entry point while it is being asked to get a device property through a
293call to its
294.Xr mc_getprop 9E
295entry point.
296As such, while some calls may be serialized to the device, such as setting
297properties, the device driver should always presume that all of its data needs
298to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
301.Bl -bullet -offset indent -compact
302.It
303.Xr mac_hcksum_get 9F
304.It
305.Xr mac_hcksum_set 9F
306.It
307.Xr mac_lso_get 9F
308.It
309.Xr mac_maxsdu_update 9F
310.It
311.Xr mac_prop_info_set_default_link_flowctrl 9F
312.It
313.Xr mac_prop_info_set_default_str 9F
314.It
315.Xr mac_prop_info_set_default_uint8 9F
316.It
317.Xr mac_prop_info_set_default_uint32 9F
318.It
319.Xr mac_prop_info_set_default_uint64 9F
320.It
321.Xr mac_prop_info_set_perm 9F
322.It
323.Xr mac_prop_info_set_range_uint32 9F
324.El
325.Pp
326Any other MAC related routines should not be called with locks held,
327such as
328.Xr mac_link_update 9F
329or
330.Xr mac_rx 9F .
331Other routines in the DDI may be called while locks are held; however,
332device driver writers should be careful about calling blocking routines
333while locks are held or in interrupt context, even when it is
334legal to do so as this may cause all other callers that need a given
335lock to back up behind such an operation.
336.Ss Receiving Data
337A device driver will often receive data through the means of an
338interrupt or by being asked to poll for frames.
339When this occurs, zero or more frames, each with optional metadata, may
340be ready for the device driver to consume.
341Often each frame has a corresponding descriptor which has information about
342whether or not there were errors or whether or not the device successfully
343checksummed the packet.
344In addition to the per-packet flow described below, there are certain
345requirements that drivers must adhere to when programming the hardware
346to receive data.
347See the section
348.Sx RECEIVE DESCRIPTOR LAYOUT
349for more information.
350.Pp
351During a single interrupt or poll request, a device driver should process
352a fixed number of frames.
353For each frame the device driver should:
354.Bl -enum -offset indent
355.It
356Ensure that all of the DMA memory for the descriptor ring is synchronized with
357the
358.Xr ddi_dma_sync 9F
359function and check the handle for errors if the device driver has enabled DMA
360error reporting as part of the Fault Management Architecture (FMA).
361If the driver does not rely on DMA, then it may skip this step.
362It is recommended that this is performed once per interrupt or poll for
363the entire region and not on a per-packet basis.
364.It
365First check whether or not the frame has errors.
366If errors were detected, then the frame should not be sent to the operating
367system.
368It is recommended that devices keep kstats (see
369.Xr kstat_create 9F
370for more information) and bump the counter whenever such an error is
371detected.
372If the device distinguishes between the types of errors, then separate kstats
373for each class of error are recommended.
374See the
375.Sx STATISTICS
376section for more information on the various error cases that should be
377considered.
378.It
379Once the frame has been determined to be valid, the device driver should
380transform the frame into a
381.Xr mblk 9S .
382See the section
383.Sx MBLKS AND DMA
384for more information on how to transform and prepare a message block.
385.It
386If the device supports hardware checksumming (see the
387.Sx CAPABILITIES
388section for more information on checksumming), then the device driver
389should set the corresponding checksumming information with a call to
390.Xr mac_hcksum_set 9F .
391.It
392It should then append this new message block to the
393.Em end
394of the message block chain, linking it to the
395.Fa b_next
396pointer.
397It is vitally important that all the frames be chained in the order that they
398were received.
399If the device driver mistakenly reorders frames, then it may cause performance
400impacts in the TCP stack and potentially impact application correctness.
401.El
402.Pp
403Once all the frames have been processed and assembled, the device driver
404should deliver them to the rest of the operating system by calling
405.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as possible to
the system at once.
408It
409.Em should not
410call
411.Xr mac_rx 9F
412once for every assembled mblk_t.
413.Pp
414The device driver must not hold any locks across the call to
415.Xr mac_rx 9F .
416When this function is called, received data will be pushed through the
417networking stack and some replies may be generated and given to the
418driver to send out.
419.Pp
420It is not the device driver's responsibility to determine whether or not
421the system can keep up with a driver's delivery rate of frames.
422The rest of the networking stack will handle issues related to keeping up
423appropriately and ensure that kernel memory is not exhausted by packets
424that are not being processed.
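.Pp
The per-frame flow above can be sketched as follows.
This is a user-space illustration only: the
.Vt mblk_t
shown is a reduced mock, the descriptor layout and the
.Sy example_
names are hypothetical, and
.Fn mac_rx_stub
merely stands in for
.Xr mac_rx 9F .
It shows frames with errors being counted and skipped, valid frames being
appended to the end of the chain in arrival order, and a single delivery call
for the whole chain.

```c
#include <stddef.h>
#include <stdlib.h>

/* Reduced mock of a message block; only the chain pointer is modeled. */
typedef struct mblk {
	struct mblk	*b_next;	/* next frame in the chain */
	int		frame_id;	/* hypothetical, for illustration */
} mblk_t;

/* Hypothetical receive descriptor. */
typedef struct example_desc {
	int	d_id;		/* identifies the frame */
	int	d_error;	/* non-zero if hardware flagged an error */
} example_desc_t;

static unsigned long rx_error_count;	/* stand-in for an error kstat */
static int rx_calls;			/* deliveries made to the stub */
static mblk_t *rx_delivered;		/* last chain given to the stub */

/* Hypothetical stand-in for mac_rx(9F). */
static void
mac_rx_stub(mblk_t *chain)
{
	rx_calls++;
	rx_delivered = chain;
}

/* Process at most 'budget' descriptors and deliver one ordered chain. */
static void
example_rx(const example_desc_t *descs, size_t ndescs, size_t budget)
{
	mblk_t *head = NULL;
	mblk_t **tailp = &head;
	size_t i;

	for (i = 0; i < ndescs && i < budget; i++) {
		mblk_t *mp;

		if (descs[i].d_error != 0) {
			rx_error_count++;	/* count it, do not deliver */
			continue;
		}
		if ((mp = calloc(1, sizeof (*mp))) == NULL)
			break;
		mp->frame_id = descs[i].d_id;

		/* Append at the tail so that arrival order is preserved. */
		*tailp = mp;
		tailp = &mp->b_next;
	}

	/* A real driver must have dropped its locks before this point. */
	if (head != NULL)
		mac_rx_stub(head);
}
```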
425.Pp
426If the device driver has negotiated the
427.Dv MAC_CAPAB_RINGS
428capability
429.Pq discussed in Xr mac_capab_rings 9E
430then it should call
431.Xr mac_rx_ring 9F
432and not
433.Xr mac_rx 9F .
434A given interrupt may correspond to more than one ring that needs to be
435checked.
436The set of rings is likely to span different groups that were registered
437with MAC through the
438.Xr mr_gget 9E
439interface.
440In those cases, the driver should follow the above procedure
441independently for each ring.
442That means it will call
443.Xr mac_rx_ring 9F
444once for each ring using the handle that it received from when MAC
445called the driver's
446.Xr mr_rget 9E
447entry point.
448When it is looking at the rings, the driver will need to make sure that
449the ring has not had interrupts disabled
450.Pq due to a pending change to polling mode .
451This is discussed in greater detail in the
452.Xr mac_capab_rings 9E
453and
454.Xr mri_poll 9E
455manual pages.
456.Pp
457Finally, the device driver should make sure that any other housekeeping
458activities required for the ring are taken care of such that more data
459can be received.
460.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
463.Xr mc_tx 9E
464entry point called.
465While the driver is processing the message blocks, it may run out of resources.
466For example, a transmit descriptor ring may become full.
467At that point, the device driver should return the remaining unprocessed frames.
468The act of returning frames indicates that the device has asserted flow control.
469Once this has been done, no additional calls will be made to the
470driver's transmit entry point and the back pressure will be propagated
471throughout the rest of the networking stack.
472.Pp
Once resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, the device driver must notify the
system that it can continue transmission.
477To do this, the driver should call
478.Xr mac_tx_update 9F .
479After that point, the driver will receive calls to its
480.Xr mc_tx 9E
481entry point again.
482As mentioned in the section on callbacks, the device driver should avoid holding
483any particular locks across the call to
484.Xr mac_tx_update 9F .
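.Pp
The back pressure protocol above can be sketched as follows.
This is a user-space illustration only: the ring structure, the
.Sy example_
names, and
.Fn mac_tx_update_stub
are hypothetical, and the transmit function takes the ring directly rather
than the driver argument that a real
.Xr mc_tx 9E
entry point receives.

```c
#include <stddef.h>

/* Reduced mock of a message block; only the chain pointer is modeled. */
typedef struct mblk {
	struct mblk	*b_next;
} mblk_t;

/* Hypothetical transmit ring state. */
typedef struct example_txring {
	unsigned int	tx_free;	/* free transmit descriptors */
	int		tx_blocked;	/* frames were returned to MAC */
	int		tx_updates;	/* calls made to the update stub */
} example_txring_t;

/* Hypothetical stand-in for mac_tx_update(9F). */
static void
mac_tx_update_stub(example_txring_t *ring)
{
	ring->tx_updates++;
}

/*
 * Consume frames while descriptors remain.  Returning a non-NULL chain
 * tells the caller that flow control has been asserted.
 */
static mblk_t *
example_m_tx(example_txring_t *ring, mblk_t *chain)
{
	while (chain != NULL) {
		mblk_t *next;

		if (ring->tx_free == 0) {
			ring->tx_blocked = 1;
			return (chain);		/* unprocessed remainder */
		}
		next = chain->b_next;
		chain->b_next = NULL;
		ring->tx_free--;		/* frame handed to hardware */
		chain = next;
	}
	return (NULL);
}

/* Called when hardware reports that 'completed' descriptors are done. */
static void
example_tx_recycle(example_txring_t *ring, unsigned int completed)
{
	ring->tx_free += completed;
	if (ring->tx_blocked && ring->tx_free > 0) {
		ring->tx_blocked = 0;
		/* A real driver drops its locks before this call. */
		mac_tx_update_stub(ring);
	}
}
```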
485.Ss Interrupt Coalescing
486For devices operating at higher data rates, interrupt coalescing is an
487important part of a well functioning device and may impact the
488performance of the device.
489Not all devices support interrupt coalescing.
490If interrupt coalescing is supported on the device, it is recommended that
491device driver writers provide private properties for their device to control the
492interrupt coalescing rate.
493This will make it much easier to perform experiments and observe the impact of
494different interrupt rates on the rest of the system.
495.Ss Polling
Even with interrupt coalescing, once the incoming packet rate is high enough it
can make more sense to actively poll the device, asking for more packets
rather than constantly taking interrupts.
499When a device driver supports the
500.Xr mac_capab_rings 9E
501capability and therefore polling on receive rings, the MAC framework will ask
502the driver to disable interrupts, with its
503.Xr mi_disable 9E
504entry point, and then subsequently call its polling entry point,
505.Xr mri_poll 9E .
506.Pp
507As long as a device driver implements the needed entry points, then there is
508nothing else that it needs to do to take advantage of polling.
509A driver should not attempt to spin up its own threads, task queues, or
510creatively use timeouts, to try to simulate polling for received packets.
511.Ss MAC Address Filter Management
512The MAC framework will attempt to use as many MAC address filters as a
513device has.
514To program a multicast address filter, the driver's
515.Xr mc_multicst 9E
516entry point will be called.
517If the device driver runs out of filters, it should not take any special action
518and just return the appropriate error as documented in the corresponding manual
519pages for the entry points.
520The framework will ensure that the device is placed in promiscuous mode
521if it needs to.
522.Pp
523If the hardware supports more than one unicast filter then the device
524driver should consider implementing the
525.Dv MAC_CAPAB_RINGS
526capability, which exposes a means for multiple unicast MAC address filters to be
527used by the broader system.
528It is still useful to implement this on hardware which only has a single ring.
529See
530.Xr mac_capab_rings 9E
531for more information.
532.Ss Receive Side Scaling
533Receive side scaling is where a hardware device supports multiple,
534independent queues of frames that can be received.
535Each of these queues is generally associated with an independent
536interrupt and the hardware usually performs some form of hash across the
537queues.
538Hardware which supports this should look at implementing the
539.Dv MAC_CAPAB_RINGS
540capability and see
541.Xr mac_capab_rings 9E
542for more information.
543.Ss Link Updates
544It is the responsibility of the device driver to keep track of the
545data link's state.
546Many devices provide a means of receiving an interrupt when the state of the
547link changes.
548When such a change happens, the driver should update its internal data
549structures and then call
550.Xr mac_link_update 9F
551to inform the MAC layer that this has occurred.
552If the device driver does not properly inform the system about link changes,
553then various features like link aggregations and other mechanisms that leverage
554the link state will not work correctly.
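.Pp
A minimal sketch of this pattern follows.
It is a user-space illustration only: the soft state layout and the
.Sy example_
names are hypothetical, and
.Fn mac_link_update_stub
stands in for
.Xr mac_link_update 9F .
The driver records the new state first and only then notifies MAC, outside of
any lock.

```c
typedef enum {
	LINK_STATE_UNKNOWN = -1,
	LINK_STATE_DOWN,
	LINK_STATE_UP
} link_state_t;

/* Hypothetical per-instance soft state. */
typedef struct example_softc {
	link_state_t	sc_link;	/* driver's view of the link */
	int		sc_updates;	/* calls made to the stub */
	link_state_t	sc_reported;	/* last state given to the stub */
} example_softc_t;

/* Hypothetical stand-in for mac_link_update(9F). */
static void
mac_link_update_stub(example_softc_t *sc, link_state_t state)
{
	sc->sc_updates++;
	sc->sc_reported = state;
}

/* Called from a hypothetical link-change interrupt handler. */
static void
example_link_intr(example_softc_t *sc, int hw_link_up)
{
	link_state_t state = hw_link_up ? LINK_STATE_UP : LINK_STATE_DOWN;

	/* A real driver would hold its lock while touching sc_link. */
	if (state == sc->sc_link)
		return;
	sc->sc_link = state;

	/* The lock must have been dropped before notifying MAC. */
	mac_link_update_stub(sc, state);
}
```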
555.Ss Link Speed and Auto-negotiation
556Many networking devices support more than one possible speed that they
557can operate at.
558The selection of a speed is often performed through
559.Em auto-negotiation ,
560though some devices allow the user to control what speeds are advertised
561and used.
562.Pp
563Logically, there are two different sets of things that the device driver
564needs to keep track of while it's operating:
565.Bl -enum
566.It
567The supported speeds in hardware.
568.It
569The enabled speeds from the user.
570.El
571.Pp
572By default, when a link first comes up, the device driver should
573generally configure the link to support the common set of speeds and
574perform auto-negotiation.
575.Pp
576A user can control what speeds a device advertises via auto-negotiation
577and whether or not it performs auto-negotiation at all by using a series
578of properties that have
579.Sy _EN_
580in the name.
581These are read/write properties and there is one for each speed supported in the
582operating system.
583For a full list of them, see the
584.Sx PROPERTIES
585section.
586.Pp
587In addition to these properties, there is a corresponding set of
588properties with
589.Sy _ADV_
590in the name.
591These are similar to the
592.Sy _EN_
593family of properties, but they are read-only and indicate what the
594device has actually negotiated.
595While they are generally similar to the
596.Sy _EN_
597family of properties, they may change depending on power settings.
598See the
599.Sy Ethernet Link Properties
600section in
601.Xr dladm 8
602for more information.
603.Pp
604It's worth discussing how these different values get used throughout the
605different entry points.
606The first entry point to consider is the
607.Xr mc_propinfo 9E
608entry point.
609For a given speed, the driver should consult whether or not the hardware
610supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
614This holds for both the
615.Sy _EN_
616and
617.Sy _ADV_
618family of properties.
619.Pp
620The next entry point is
621.Xr mc_getprop 9E .
622Here, the device should first consult whether the given speed is
623supported.
624If it is not, then the driver should return
625.Er ENOTSUP .
626If it does, then it should return the current value of the property.
627.Pp
The last property-related entry point is
.Xr mc_setprop 9E .
631Here, the same logic applies.
632Before the driver considers whether or not the property is writable, it should
633first check whether or not it's a supported property.
634If it's not, then it should return
635.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is and the new value is valid, then it should update the property and
restart the link's negotiation.
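.Pp
The get and set logic described above can be sketched as follows.
This is a user-space illustration only: the speed enumeration, soft state
layout, and
.Sy example_
names are hypothetical and do not match the real
.Xr mc_getprop 9E
or
.Xr mc_setprop 9E
signatures.
Returning
.Er ENOTSUP
for an attempt to set a read-only
.Sy _ADV_
property and
.Er EINVAL
for an out-of-range value are assumptions of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical set of speeds that this example device knows about. */
typedef enum {
	EX_SPEED_100,
	EX_SPEED_1000,
	EX_SPEED_10G,
	EX_NSPEEDS
} example_speed_t;

typedef struct example_softc {
	uint8_t	sc_hw_supported[EX_NSPEEDS];	/* what hardware can do */
	uint8_t	sc_en[EX_NSPEEDS];		/* _EN_ values, read/write */
	uint8_t	sc_adv[EX_NSPEEDS];		/* _ADV_ values, read-only */
	int	sc_renegotiations;		/* link restarts requested */
} example_softc_t;

/* Get either the _EN_ or the _ADV_ value for a speed. */
static int
example_getprop(const example_softc_t *sc, example_speed_t speed,
    int is_adv, uint8_t *valp)
{
	if (!sc->sc_hw_supported[speed])
		return (ENOTSUP);
	*valp = is_adv ? sc->sc_adv[speed] : sc->sc_en[speed];
	return (0);
}

/* Set a speed property; only supported _EN_ properties may change. */
static int
example_setprop(example_softc_t *sc, example_speed_t speed, int is_adv,
    uint8_t val)
{
	/* Check for support first, before considering writability. */
	if (!sc->sc_hw_supported[speed])
		return (ENOTSUP);
	if (is_adv)
		return (ENOTSUP);	/* read-only; assumed error value */
	if (val > 1)
		return (EINVAL);	/* assumed error value */

	sc->sc_en[speed] = val;
	sc->sc_renegotiations++;	/* restart link negotiation */
	return (0);
}
```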
639.Pp
640Finally, there is the
641.Xr mc_getstat 9E
642entry point.
643Several of the statistics that are queried relate to auto-negotiation and
644hardware capabilities.
645When a statistic relates to the hardware supporting a given speed, the
646.Sy _EN_
647properties should be ignored.
648The only thing that should be consulted is what the hardware itself supports.
649Otherwise, the statistics should look at what is currently being advertised by
650the device.
651.Ss Unregistering from MAC
652During a driver's
653.Xr detach 9E
654routine, it should unregister the device instance from MAC by calling
655.Xr mac_unregister 9F
with the handle that it obtained when it registered.
657If the call to
658.Xr mac_unregister 9F
659failed, then the device is likely still in use and the driver should
660fail the call to
661.Xr detach 9E .
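.Pp
A minimal sketch of this check follows.
It is a user-space illustration only: the soft state layout, the
.Sy example_
names, and
.Fn mac_unregister_stub ,
which stands in for
.Xr mac_unregister 9F ,
are hypothetical.
The point is that no resources are released unless unregistering succeeded.

```c
#include <errno.h>

#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Hypothetical per-instance soft state. */
typedef struct example_softc {
	int	sc_registered;		/* still registered with MAC */
	int	sc_in_use;		/* simulates an active consumer */
	int	sc_resources_freed;	/* teardown has run */
} example_softc_t;

/* Hypothetical stand-in for mac_unregister(9F). */
static int
mac_unregister_stub(example_softc_t *sc)
{
	if (sc->sc_in_use)
		return (EBUSY);
	sc->sc_registered = 0;
	return (0);
}

/* The MAC-related portion of a hypothetical detach(9E) routine. */
static int
example_detach(example_softc_t *sc)
{
	if (mac_unregister_stub(sc) != 0)
		return (DDI_FAILURE);	/* device still in use; keep state */

	sc->sc_resources_freed = 1;	/* safe to tear down now */
	return (DDI_SUCCESS);
}
```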
662.Ss Interacting with Devices
663Administrators always interact with devices through the
664.Xr dladm 8
665command line interface.
666The state of devices such as whether the link is considered up or down,
667various link properties such as the MTU, auto-negotiation state, and
668flow control state, are all exposed.
669It is also the preferred way that these properties are set and configured.
670.Pp
671While device tunables may be presented in a
672.Xr driver.conf 5
673file, it is recommended instead to expose such things through
674.Xr dladm 8
675private properties, whether explicitly documented or not.
676.Sh CAPABILITIES
Capabilities in the MAC framework are optional and indicate various
hardware features that a device supports.
The two capabilities that the system currently supports are the ability to have
hardware perform large send offloads (LSO), often also known as TCP
segmentation offload, and the ability for hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
684.Pp
685The MAC framework will query a device for support of a capability
686through the
687.Xr mc_getcapab 9E
688function.
689Each capability has its own constant and may have corresponding data that goes
690along with it and a specific structure that the device is required to fill in.
691Note, the set of capabilities changes over time and there are also private
692capabilities in the system.
693Several of the capabilities are used in the implementation of the MAC framework.
694Others, like
695.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them is not guaranteed.
698It is important that the device driver handles unknown capabilities correctly.
699For more information, see
700.Xr mc_getcapab 9E .
701.Pp
702The following capabilities are
703stable and defined in the system:
704.Ss Dv MAC_CAPAB_HCKSUM
705The
706.Dv MAC_CAPAB_HCKSUM
707capability indicates to the system that the device driver supports some
708amount of checksumming.
709The specific data for this capability is a pointer to a
710.Vt uint32_t .
711To indicate no support for any kind of checksumming, the driver should
712either set this value to zero or simply return that it doesn't support
713the capability.
714.Pp
715Note, the values that the driver declares in this capability indicate
716what it can do when it transmits data.
717If the driver can only verify checksums when receiving data, then it should not
718indicate that it supports this capability.
719The following set of flags may be combined through a bitwise inclusive OR:
720.Bl -tag -width Ds
721.It Dv HCKSUM_INET_PARTIAL
722This indicates that the hardware can calculate a partial checksum for
723both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
724checksum be calculated for it.
725The pseudo-header checksum will be available for the mblk_t when calling
726.Xr mac_hcksum_get 9F .
727Note this does not imply that the hardware is capable of calculating
728the partial checksum for other L4 protocols or the IPv4 header checksum.
729That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
731.It Dv HCKSUM_INET_FULL_V4
732This indicates that the hardware will fully calculate the L4 checksum for
733outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
734checksum.
735Note this does not imply that the hardware is capable of calculating the
736checksum for other L4 protocols or the IPv4 header checksum.
737That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
739.It Dv HCKSUM_INET_FULL_V6
740This indicates that the hardware will fully calculate the L4 checksum for
741outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
742checksum.
743Note this does not imply that the hardware is capable of calculating the
744checksum for any other L4 protocols.
745.It Dv HCKSUM_IPHDRCKSUM
746This indicates that the hardware supports calculating the checksum for
747the IPv4 header itself.
748.El
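.Pp
A sketch of how a driver might answer a
.Dv MAC_CAPAB_HCKSUM
query follows.
It is a user-space illustration only: the numeric values of the flags and of
the capability constants are placeholders rather than the values from the
system headers, the soft state layout and
.Sy example_
names are hypothetical, and the function returns an
.Vt int
where the real
.Xr mc_getcapab 9E
entry point returns a
.Vt boolean_t .

```c
#include <stdint.h>

/* Placeholder values; the real ones come from the system headers. */
#define	HCKSUM_INET_PARTIAL	0x01
#define	HCKSUM_INET_FULL_V4	0x02
#define	HCKSUM_INET_FULL_V6	0x04
#define	HCKSUM_IPHDRCKSUM	0x08

typedef enum {
	MAC_CAPAB_HCKSUM = 1,	/* placeholder value */
	MAC_CAPAB_LSO		/* placeholder value */
} mac_capab_t;

/* Hypothetical per-instance soft state. */
typedef struct example_softc {
	int	sc_tx_cksum_enabled;	/* administrator tunable */
	int	sc_hw_partial;		/* hardware does partial L4 sums */
	int	sc_hw_iphdr;		/* hardware does IPv4 header sums */
} example_softc_t;

/* Returns 1 if the capability is supported and 0 otherwise. */
static int
example_m_getcapab(void *arg, mac_capab_t capab, void *cap_data)
{
	example_softc_t *sc = arg;

	switch (capab) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flagsp = cap_data;

		/* The flags describe transmit checksumming only. */
		if (!sc->sc_tx_cksum_enabled)
			return (0);
		*flagsp = 0;
		if (sc->sc_hw_partial)
			*flagsp |= HCKSUM_INET_PARTIAL;
		if (sc->sc_hw_iphdr)
			*flagsp |= HCKSUM_IPHDRCKSUM;
		return (1);
	}
	default:
		/* Unknown or unimplemented capabilities are unsupported. */
		return (0);
	}
}
```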
749.Pp
750When in a driver's transmit function, the driver will be processing a
751single frame.
752It should call
753.Xr mac_hcksum_get 9F
754to see what checksum flags are set on it.
755Note that the flags that are set on it are different from the ones described
756above and are documented in its manual page.
757These flags indicate how the driver is expected to program the hardware and what
758checksumming is required.
759Not all frames will require hardware checksumming or will ask the hardware to
760checksum it.
761.Pp
762If a driver supports offloading the receive checksum and verification,
763it should check to see what the hardware indicated was verified.
764The driver should then call
765.Xr mac_hcksum_set 9F .
766The flags used are different from the ones above and are discussed in
767detail in the
768.Xr mac_hcksum_set 9F
769manual page.
770If there is no checksum information available or the driver does not support
771checksumming, then it should simply not call
772.Xr mac_hcksum_set 9F .
773.Pp
774Note that the checksum flags should be set on the first
775mblk_t that makes up a given message.
776In other words, if multiple mblk_t structures are linked together by the
777.Fa b_cont
778member to describe a single frame, then it should only be called on the
779first mblk_t of that set.
780However, each distinct message should have the checksum bits set on it, if
781applicable.
782In other words, each mblk_t that is linked together by the
783.Fa b_next
784pointer may have checksum flags set.
785.Pp
786It is recommended that device drivers provide a private property or
787.Xr driver.conf 5
788property to control whether or not checksumming is enabled for both rx
789and tx; however, the default disposition is recommended to be enabled
790for both.
791This way if hardware bugs are found in the checksumming implementation, they can
792be disabled without requiring software updates.
793The transmit property should be checked when determining how to reply to
794.Xr mc_getcapab 9E
795and the receive property should be checked in the context of the receive
796function.
797.Ss Dv MAC_CAPAB_LSO
798The
799.Dv MAC_CAPAB_LSO
800capability indicates that the driver supports various forms of large
801send offload (LSO).
802The private data is a pointer to a
.Vt mac_capab_lso_t
804structure.
805The system currently supports offloading TCP packets over both IPv4 and
806IPv6.
807This structure has the following members which are used to indicate
808various types of LSO support.
809.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
813.Ed
814.Pp
815The
816.Fa lso_flags
817member is used to indicate which members are valid and should be
818considered.
819Each flag represents a different form of LSO.
820The member should be set to the bitwise inclusive OR of the following values:
821.Bl -tag -width Dv -offset indent
822.It Dv LSO_TX_BASIC_TCP_IPV4
823This indicates hardware support for performing TCP segmentation
824offloading over IPv4.
825When this flag is set, the
826.Fa lso_basic_tcp_ipv4
827member must be filled in.
828.It Dv LSO_TX_BASIC_TCP_IPV6
829This indicates hardware support for performing TCP segmentation
830offloading over IPv6.
831The IPv6 packet will have no extension headers present.
832When this flag is set, the
833.Fa lso_basic_tcp_ipv6
834member must be filled in.
835.El
836.Pp
837The
838.Fa lso_basic_tcp_ipv4
839member is a structure with the following members:
840.Bd -literal -offset indent
t_uscalar_t	lso_max;
842.Ed
843.Bd -filled -offset indent
844The
845.Fa lso_max
846member should be set to the maximum size of the TCP data
847payload that can be offloaded to the hardware.
848.Ed
849.Pp
850The
851.Fa lso_basic_tcp_ipv6
852member is a structure with the following members:
853.Bd -literal -offset indent
t_uscalar_t	lso_max;
855.Ed
856.Bd -filled -offset indent
857The
858.Fa lso_max
859member should be set to the maximum size of the TCP data
860payload that can be offloaded to the hardware.
861.Ed
862.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that come up with LSO to be worked around without
requiring additional driver work.
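.Pp
The following is a minimal sketch of how a driver's
.Xr mc_getcapab 9E
entry point might advertise LSO support.
The
.Vt example_t
soft state, its
.Fa ex_lso_enable
member, and the
.Dv EXAMPLE_LSO_MAX
limit are hypothetical and stand in for whatever the driver and its
hardware actually use.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/mac_provider.h>

/* Hypothetical hardware limit on the offloaded TCP payload. */
#define	EXAMPLE_LSO_MAX	65535

/* Hypothetical soft state; only the LSO tunable is shown. */
typedef struct example {
	boolean_t	ex_lso_enable;
} example_t;

static boolean_t
example_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	example_t *ex = arg;
	mac_capab_lso_t *lso;

	switch (cap) {
	case MAC_CAPAB_LSO:
		/* Honor the tunable that allows LSO to be disabled. */
		if (!ex->ex_lso_enable)
			return (B_FALSE);

		lso = cap_data;
		/* Each flag marks the matching member as valid. */
		lso->lso_flags = LSO_TX_BASIC_TCP_IPV4 |
		    LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv4.lso_max = EXAMPLE_LSO_MAX;
		lso->lso_basic_tcp_ipv6.lso_max = EXAMPLE_LSO_MAX;
		return (B_TRUE);
	default:
		/* Capabilities that are not handled are unsupported. */
		return (B_FALSE);
	}
}
.Ed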
867.Sh EVOLVING CAPABILITIES
868The following capabilities are still evolving in the operating system.
869They are documented such that device driver writers may experiment with
870them.
871However, if such drivers are not present inside the core operating
872system repository, they may be subject to API and ABI breakage.
873.Ss Dv MAC_CAPAB_RINGS
874The
875.Dv MAC_CAPAB_RINGS
876capability is very important for implementing a high-performing device
877driver.
878Networking hardware structures the queues of packets to be sent
879and received into a ring.
880Each entry in this ring has a descriptor, which describes the address
881and options for a packet which is going to
882be transmitted or received.
883While simple networking devices only have a single ring, most high-speed
884networking devices have support for many rings.
885.Pp
886Rings are used for two important purposes.
887The first is receive side scaling (RSS), which is the ability to have
888the hardware hash the contents of a packet based on some of the protocol
889headers, and send it to one of several rings.
890These different rings may each have their own interrupt associated with
891them, allowing the card to receive traffic in parallel.
892Similar logic can be performed when sending traffic, to leverage
893multiple hardware resources, thus increasing capacity.
894.Pp
895The second use of rings is to group them together and apply filtering
896rules.
897For example, if a packet matches a specific VLAN or MAC address,
898then it can be sent to a specific ring or a specific group of rings.
899This is especially useful when there are multiple different virtual NICs
900or zones in play as the operating system will be able to use the
hardware classification features to already know where a given packet
902needs to be delivered internally rather than having to determine that
903for each packet.
904.Pp
905From the MAC framework's perspective, a driver can have one or more
906groups.
907A group consists of the following:
.Bl -bullet -offset indent
909.It
910One or more hardware rings.
911.It
912One or more MAC address or VLAN filters.
913.El
914.Pp
915The details around how a device driver changes when rings are employed,
916the data structures that a driver must implement, and more are available
917in
918.Xr mac_capab_rings 9E .
919.Ss Dv MAC_CAPAB_TRANSCEIVER
920Many networking devices leverage external transceivers that adhere to
921standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
923The
924.Dv MAC_CAPAB_TRANSCEIVER
925capability provides a means of discovering the number of transceivers,
926their types, and reading the data from a transceiver.
927This allows administrators and users to determine if devices are
928present, if the hardware can use them, and in many cases, detailed
929information about the device ranging from its manufacturer and
930serial numbers to specific information about its health.
931Implementing this capability will lead to the operating system being
932able to discover and display transceivers as part of its fault
933management topology.
934.Pp
935See
936.Xr mac_capab_transceiver 9E
937for more details on the capability structure and the various function
938entry points that come along with it.
939.Ss Dv MAC_CAPAB_LED
940The
941.Dv MAC_CAPAB_LED
942capability provides a means to access and control the LEDs on a network
943interface card.
944This is then made available to the broader operating system and consumed
945by facilities such as the Fault Management Architecture.
946See
947.Xr mac_capab_led 9E
948for more details on the structure and requirements of the capability.
949.Sh PROPERTIES
950Properties in the MAC framework represent aspects of a link.
951These include things like the link's current state and MTU.
952Many of the properties in the system are focused around auto-negotiation and
953controlling what link speeds are advertised.
954Information about properties is covered by three different device entry points.
955The
956.Xr mc_propinfo 9E
957entry point obtains metadata about the property.
958The
959.Xr mc_getprop 9E
960entry point obtains the property.
961The
962.Xr mc_setprop 9E
963entry point updates the property to a new value.
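.Pp
As a minimal sketch, a
.Xr mc_getprop 9E
implementation covering a few of the read-only properties described
below might look like the following.
The
.Vt example_t
soft state and its members are hypothetical, and locking is omitted for
brevity.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/systm.h>
#include <sys/mac_provider.h>

/* Hypothetical soft state caching the current link parameters. */
typedef struct example {
	uint64_t	ex_speed;	/* bits per second */
	link_duplex_t	ex_duplex;
	link_state_t	ex_state;
} example_t;

static int
example_m_getprop(void *arg, const char *pr_name, mac_prop_id_t pr_num,
    uint_t pr_valsize, void *pr_val)
{
	example_t *ex = arg;

	switch (pr_num) {
	case MAC_PROP_SPEED:
		if (pr_valsize < sizeof (uint64_t))
			return (EOVERFLOW);
		bcopy(&ex->ex_speed, pr_val, sizeof (uint64_t));
		break;
	case MAC_PROP_DUPLEX:
		if (pr_valsize < sizeof (link_duplex_t))
			return (EOVERFLOW);
		bcopy(&ex->ex_duplex, pr_val, sizeof (link_duplex_t));
		break;
	case MAC_PROP_STATUS:
		if (pr_valsize < sizeof (link_state_t))
			return (EOVERFLOW);
		bcopy(&ex->ex_state, pr_val, sizeof (link_state_t));
		break;
	default:
		/* Properties the driver does not handle are an error. */
		return (ENOTSUP);
	}

	return (0);
}
.Ed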
964.Pp
965Many of the properties listed below are read-only.
966Each property indicates whether it's read-only or it's read/write.
967However, driver writers may not implement the ability to set all writable
968properties.
969Many of these depend on the card itself.
970In particular, all properties that relate to auto-negotiation and are read/write
971may not be updated if the hardware in question does not support toggling what
972link speeds are auto-negotiated.
973While copper Ethernet often does not have this restriction, it often exists with
974various fiber standards and phys.
975.Pp
976The following properties are the subset of MAC framework properties that
977driver writers should be aware of and handle.
978While other properties exist in the system, driver writers should always return
979an error when a property not listed below is encountered.
980See
981.Xr mc_getprop 9E
982and
983.Xr mc_setprop 9E
984for more information on how to handle them.
985.Bl -hang -width Ds
986.It Dv MAC_PROP_DUPLEX
987.Bd -filled -compact
988Type:
989.Vt link_duplex_t |
990Permissions:
991.Sy Read-Only
992.Ed
993.Pp
994The
995.Dv MAC_PROP_DUPLEX
996property is used to indicate whether or not the link is duplex.
997A duplex link may have traffic flowing in both directions at the same time.
998The
999.Vt link_duplex_t
1000is an enumeration which may be set to any of the following values:
1001.Bl -tag -width Ds
1002.It Dv LINK_DUPLEX_UNKNOWN
1003The current state of the link is unknown.
1004This may be because the link has not negotiated to a specific speed or it is
1005down.
1006.It Dv LINK_DUPLEX_HALF
1007The link is running at half duplex.
1008Communication may travel in only one direction on the link at a given time.
1009.It Dv LINK_DUPLEX_FULL
1010The link is running at full duplex.
1011Communication may travel in both directions on the link simultaneously.
1012.El
1013.It Dv MAC_PROP_SPEED
1014.Bd -filled -compact
1015Type:
1016.Vt uint64_t |
1017Permissions:
1018.Sy Read-Only
1019.Ed
1020.Pp
1021The
1022.Dv MAC_PROP_SPEED
1023property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
1025A link that is running at 40 Gbit/s would store the value 40000000000ULL.
1026.It Dv MAC_PROP_STATUS
1027.Bd -filled -compact
1028Type:
1029.Vt link_state_t |
1030Permissions:
1031.Sy Read-Only
1032.Ed
1033.Pp
1034The
1035.Dv MAC_PROP_STATUS
1036property is used to indicate the current state of the link.
1037It indicates whether the link is up or down.
1038The
1039.Vt link_state_t
1040is an enumeration which may be set to any of the following values:
1041.Bl -tag -width Ds
1042.It Dv LINK_STATE_UNKNOWN
1043The current state of the link is unknown.
1044This may be because the driver's
1045.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
1047.It Dv LINK_STATE_DOWN
1048The link is down.
1049This may be because of a negotiation problem, a cable problem, or some other
1050device specific issue.
1051.It Dv LINK_STATE_UP
1052The link is up.
1053If auto-negotiation is in use, it should have completed.
1054Traffic should be able to flow over the link, barring other issues.
1055.El
1056.It Dv MAC_PROP_MEDIA
1057.Bd -filled -compact
1058Type:
1059.Vt uint32_t No (Varies) |
1060Permissions:
1061.Sy Read-Only
1062.Ed
1063.Pp
1064The
1065.Dv MAC_PROP_MEDIA
1066property indicates the current type of media on the link.
1067The type of media is class-specific and determined based on the
1068.Fa m_type_ident
1069field in the
1070.Vt mac_register_t
1071structure used when calling
1072.Xr mac_register 9F .
1073The media is always read-only.
This property is not used to control how auto-negotiation should be
performed; the existing speed-based properties are used instead.
1076This property should be updated after auto-negotiation has completed.
1077If device hardware and firmware do not provide a way to accurately
1078determine this, then it is much better to return that the media is
1079unknown rather than to lie or guess.
1080A common case where this comes up is when a network card uses an
1081SFP-based device.
1082If the underlying negotiated type of the link isn't made available and
1083therefore the driver can't distinguish between say 40GBASE-SR4 and
108440GBASE-LR4, then drivers should return that the media is unknown.
1085.Pp
Similarly, many types here represent an electrical interface that is
1087often used between a MAC and a PHY, but also for chip-to-chip
1088connectivity or on a backplane.
1089When connecting to a PHY these shouldn't generally be used as the user
1090is concerned with what is actually on the link they plug in, not the
1091internals of the device.
1092.Pp
1093Currently media values are defined for Ethernet-based devices and use
1094the enumeration
1095.Vt mac_ether_media_t .
1096These are defined in
1097.In sys/mac_ether.h
1098and generally follow the IEEE standardized physical medium dependent
1099.Pq PMD
1100layer in 802.3.
1101.Bl -tag -width Ds
1102.It Dv ETHER_MEDIA_UNKNOWN
1103This indicates that the type of the link media is unknown to the driver.
1104This may be because the link is in a state where this information is
1105unknown or the hardware, firmware, and device driver cannot figure it
1106out.
1107If there is no media present and the link is down, use
1108.Dv ETHER_MEDIA_NONE
1109instead.
1110.It Dv ETHER_MEDIA_NONE
1111Represents the case that there is no specific media in use.
1112This should generally be used when the link is down.
1113.It Dv ETHER_MEDIA_10BASE_T
Traditional 10 Mbit/s Ethernet utilizing CAT-3 cabling.
1115Defined in 802.3i.
1116.It Dv ETHER_MEDIA_10BASE_T1
1117A more recent variant of 10 Mbit/s Ethernet that uses a single twisted
1118pair.
1119Defined in 802.3cg.
1120.It Dv ETHER_MEDIA_100BASE_TX
1121The most common form of 100 Mbit/s Ethernet that utilizes two twisted
1122pairs over a CAT-5 cable.
1123Defined in 802.3u.
1124.It Dv ETHER_MEDIA_100BASE_FX
1125100 Mbit/s Ethernet operating over multi-mode fiber.
1126Defined in 802.3u.
1127.It Dv ETHER_MEDIA_100BASE_X
1128This is a general term that covers operating in one of the 100BASE-?X
1129variants.
1130This is here because some PHYs do not distinguish between operating in
1131100BASE-TX and 100BASE-FX.
1132If the driver can determine if it is operating with a BASE-T or fiber
1133based PHY, prefer the more specific types instead.
1134.It Dv ETHER_MEDIA_100BASE_T4
1135This is an uncommon half-duplex variant of 100 Mbit/s Ethernet that
1136operates over CAT-3 cable using four twisted pairs.
1137Defined in 802.3u.
1138.It Dv ETHER_MEDIA_100BASE_T2
1139This is another uncommon variant of 100 Mbit/s Ethernet that only
1140requires two twisted pairs, but unlike 100BASE-TX requires CAT-3 cables.
1141Defined in 802.3y.
1142.It Dv ETHER_MEDIA_100BASE_T1
1143A more recent form of 100 Mbit/s Ethernet that requires only a single
1144twisted pair.
1145Defined in 802.3bw.
1146.It Dv ETHER_MEDIA_100_SGMII
1147This form of 100 Mbit/s Ethernet is generally used for chip-to-chip
1148connectivity and utilizes the SGMII
1149.Pq Serial gigabit media-independent interface
1150specification.
1151.It Dv ETHER_MEDIA_1000BASE_X
1152This is a general catch-all for all 1 Gbit/s fiber-based operation.
1153This is here for compatibility with the generic information returned by
1154traditional 802.3-compatible PHYs.
1155When more specific information is available, that should be used
1156instead.
1157.It Dv ETHER_MEDIA_1000BASE_T
1158Traditional 1 Gbit/s Ethernet that utilizes a CAT-5 cable with four
1159twisted pairs.
1160Defined in 802.3ab.
1161.It Dv ETHER_MEDIA_1000BASE_T1
1162A more recent form of 1 Gbit/s Ethernet that only requires a single
1163twisted pair.
1164.It Dv ETHER_MEDIA_1000BASE_KX
1165This form of 1 Gbit/s Ethernet is designed for operating over a backplane.
1166Defined in 802.3ap.
1167.It Dv ETHER_MEDIA_1000BASE_CX
1168An older form of 1 Gbit/s Ethernet that operates over balanced copper
1169cables.
1170Defined in 802.3z.
1171.It Dv ETHER_MEDIA_1000BASE_SX
11721 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1173each direction.
1174.It Dv ETHER_MEDIA_1000BASE_LX
11751 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1176each direction.
1177.It Dv ETHER_MEDIA_1000BASE_BX
11781 Gbit/s Ethernet operating over a single piece of single-mode fiber.
1179This media operates bi-directionally as opposed to how 1000BASE-LX and
11801000BASE-SX operate.
1181.It Dv ETHER_MEDIA_1000_SGMII
1182A form of 1 Gbit/s Ethernet defined by Cisco that is used for
1183chip-to-chip connectivity.
1184.It Dv ETHER_MEDIA_2500BASE_T
11852.5 Gbit/s Ethernet based on four copper twisted-pairs.
1186Defined in 802.3bz.
1187.It Dv ETHER_MEDIA_2500BASE_KX
11882.5 Gbit/s Ethernet that is designed for operating over a backplane
1189interconnect.
1190Defined in 802.3cb.
1191.It Dv ETHER_MEDIA_2500BASE_X
1192This is a variant of 2.5 Gbit/s Ethernet that took the 1000BASE-X IEEE
1193standard and ran it with a 2.5x faster clock.
It is a de facto standard.
1195.It Dv ETHER_MEDIA_5000BASE_T
11965.0 Gbit/s Ethernet based on four copper twisted-pairs.
1197Defined in 802.3bz.
1198.It Dv ETHER_MEDIA_5000BASE_KR
11995.0 Gbit/s Ethernet that is designed for operating over a backplane
1200interconnect.
1201Defined in 802.3cb.
1202.It Dv ETHER_MEDIA_10GBASE_T
120310 Gbit/s Ethernet operating over four copper twisted pairs utilizing
1204CAT-6a cables.
1205Defined in 802.3an.
1206.It Dv ETHER_MEDIA_10GBASE_SR
120710 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1208each direction.
1209Defined in 802.3ae.
1210.It Dv ETHER_MEDIA_10GBASE_LR
121110 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1212each direction.
1213The maximum fiber length is 10km.
1214Defined in 802.3ae.
1215.It Dv ETHER_MEDIA_10GBASE_ER
121610 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1217each direction.
1218The maximum fiber length is 30km.
1219Defined in 802.3ae.
1220.It Dv ETHER_MEDIA_10GBASE_LRM
122110 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1222each direction.
This has a reach of up to 220m, a longer distance than
10GBASE-SR.
1225Defined in 802.3aq.
1226.It Dv ETHER_MEDIA_10GBASE_KR
122710 Gbit/s Ethernet operating over a single lane backplane.
Defined in 802.3ap.
1229.It Dv ETHER_MEDIA_10GBASE_CX4
123010 Gbit/s Ethernet operating over a group of four shielded copper cables.
1231Defined in 802.3ak.
1232.It Dv ETHER_MEDIA_10GBASE_KX4
123310 Gbit/s Ethernet operating over a four lane backplane.
Defined in 802.3ap.
1235.It Dv ETHER_MEDIA_10GBASE_CR
123610 Gbit/s Ethernet that is built using a passive copper
1237SFP-compatible cable.
1238This is sometimes called 10GSFP+Cu passive.
1239Defined in SFF-8431.
1240.It Dv ETHER_MEDIA_10GBASE_AOC
124110 Gbit/s Ethernet that is built using a short-range active
1242optical cable that is SFP+-compatible.
1243Defined in SFF-8431.
1244.It Dv ETHER_MEDIA_10GBASE_ACC
124510 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 10GBASE-CR.
1247Defined in SFF-8431.
1248.It Dv ETHER_MEDIA_10G_XAUI
124910 Gbit/s signalling that is defined for use between a MAC and PHY.
1250This is the roman numeral X and attachment unit interface.
1251Sometimes used for chip-to-chip interconnects.
1252Defined in 802.3ae.
1253.It Dv ETHER_MEDIA_10G_SFI
125410 Gbit/s signalling that is defined for use between a MAC and an
1255SFP-based transceiver.
1256Defined in SFF-8431.
1257.It Dv ETHER_MEDIA_10G_XFI
125810 Gbit/s signalling that is defined for use between a MAC and an
1259XFP-based transceiver.
1260Defined in INF-8077i
1261.Pq XFP MSA .
1262.It Dv ETHER_MEDIA_25GBASE_T
126325 Gbit/s Ethernet based upon four twisted pair cables using CAT-8
1264cable.
1265Defined in 802.3bq.
1266.It Dv ETHER_MEDIA_25GBASE_SR
126725 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1268each direction.
1269Defined in 802.3by.
1270.It Dv ETHER_MEDIA_25GBASE_LR
127125 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1272each direction.
1273The maximum fiber length is 10km.
1274Defined in 802.3cc.
1275.It Dv ETHER_MEDIA_25GBASE_ER
127625 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1277each direction.
1278The maximum fiber length is 30km.
1279Defined in 802.3cc.
1280.It Dv ETHER_MEDIA_25GBASE_KR
128125 Gbit/s Ethernet operating over a backplane with a single lane.
1282Defined in 802.3by.
1283.It Dv ETHER_MEDIA_25GBASE_CR
128425 Gbit/s Ethernet operating over a single lane of copper cable.
1285Generally used with an SFP28 style connector.
1286Defined in 802.3by.
1287.It Dv ETHER_MEDIA_25GBASE_AOC
25 Gbit/s Ethernet that is built using a short-range active
1289optical cable that is SFP28-compatible.
1290Defined loosely by SFF-8402 and often utilizes 25GBASE-SR.
1291.It Dv ETHER_MEDIA_25GBASE_ACC
129225 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 25GBASE-CR.
1294Defined loosely by SFF-8402.
1295.It Dv ETHER_MEDIA_25G_AUI
129625 Gbit/s signalling that is defined for use between a MAC and PHY and
1297for chip-to-chip connectivity.
1298Defined by 802.3by.
1299.It Dv ETHER_MEDIA_40GBASE_T
130040 Gbit/s Ethernet based upon four twisted-pairs of CAT-8 cables.
1301Defined in 802.3bq.
1302.It Dv ETHER_MEDIA_40GBASE_CR4
130340 Gbit/s Ethernet utilizing four lanes of twinaxial copper cabling
1304each operating at 10 Gbit/s.
1305This is generally used with a QSFP+ connector defined in SFF-8635.
1306Defined in 802.3ba.
1307.It Dv ETHER_MEDIA_40GBASE_KR4
130840 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1309operating at 10 Gbit/s.
1310Defined in 802.3ba.
1311.It Dv ETHER_MEDIA_40GBASE_SR4
131240 Gbit/s Ethernet based upon using four pairs of multi-mode fiber, each
1313operating at 10 Gbit/s, with one fiber in the pair being used for
1314transmit and the other for receive.
1315Generally utilizes a QSFP+ connector.
1316Defined in 802.3ba.
1317.It Dv ETHER_MEDIA_40GBASE_LR4
131840 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1319for each direction.
1320Utilizes wavelength multiplexing as the electrical interface is four 10
1321Gbit/s signals.
1322The maximum fiber length is 10km.
1323Defined in 802.3ba.
1324.It Dv ETHER_MEDIA_40GBASE_ER4
132540 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1326for each direction.
1327Utilizes wavelength multiplexing as the electrical interface is four 10
1328Gbit/s signals and generally based upon a QSFP+ connector.
1329The maximum fiber length is 40km.
1330Defined in 802.3bm.
1331.It Dv ETHER_MEDIA_40GBASE_LM4
133240 Gbit/s Ethernet based upon using one pair of multi-mode fibers, one
1333for each direction.
1334Utilizes wavelength multiplexing as the electrical interface is four 10
1335Gbit/s signals and generally based upon a QSFP+ connector.
1336Defined by a specific MSA.
1337.It Dv ETHER_MEDIA_40GBASE_AOC4
133840 Gbit/s Ethernet based upon a QSFP+ based cable with built-in
1339optical transceivers.
1340The electrical interface is four lanes running at 10 Gbit/s.
1341.It Dv ETHER_MEDIA_40GBASE_ACC4
134240 Gbit/s Ethernet based upon four copper lanes each running at 10
1343Gbit/s with some additional component compared to 40GBASE-CR4.
1344.It Dv ETHER_MEDIA_40G_XLAUI
134540 Gbit/s signalling operating across four lanes that is defined for use
1346between a MAC and a PHY or for chip-to-chip connectivity.
1347Defined by 802.3ba.
1348.It Dv ETHER_MEDIA_40G_XLPPI
134940 Gbit/s signalling operating across four lanes that is designed to
1350connect between a chip and a module, generally a QSFP+ based device.
1351Defined in 802.3ba.
1352.It Dv ETHER_MEDIA_50GBASE_KR2
135350 Gbit/s Ethernet which operates over a two lane copper backplane.
1354Each lane operates at 25 Gbit/s.
1355Defined by the 25G and 50G Ethernet consortium.
1356This did not become an IEEE standard.
1357.It Dv ETHER_MEDIA_50GBASE_CR2
135850 Gbit/s Ethernet which operates over two lane copper twinaxial cable,
1359generally with a QSFP+ connector.
1360Each lane operates at 25 Gbit/s.
1361Defined by the 25G and 50G Ethernet consortium.
1362.It Dv ETHER_MEDIA_50GBASE_SR2
50 Gbit/s Ethernet based upon using two pairs of multi-mode fiber, each
1364operating at 25 Gbit/s, with one fiber in the pair being used for
1365transmit and the other for receive.
1366Generally utilizes a QSFP+ connector.
1367Defined by the 25G and 50G Ethernet consortium.
1368.It Dv ETHER_MEDIA_50GBASE_LR2
136950 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1370for each direction.
1371Utilizes wavelength multiplexing as the electrical interface is two 25
1372Gbit/s signals.
1373Defined by the 25G and 50G Ethernet consortium.
1374.It Dv ETHER_MEDIA_50GBASE_AOC2
137550 Gbit/s Ethernet generally based upon a QSFP+ based cable with built-in
1376optical transceivers.
1377The electrical interface is two lanes running at 25 Gbit/s.
1378.It Dv ETHER_MEDIA_50GBASE_ACC2
137950 Gbit/s Ethernet based upon two copper twinaxial lanes each running at
138025 Gbit/s with some additional component compared to 50GBASE-CR2.
1381.It Dv ETHER_MEDIA_50GBASE_KR
138250 Gbit/s Ethernet operating over a single lane backplane.
1383Defined by 802.3cd.
1384.It Dv ETHER_MEDIA_50GBASE_CR
138550 Gbit/s Ethernet operating over a single lane twinaxial copper cable
1386generally utilizing an SFP56 interface.
1387Defined by 802.3cd.
1388.It Dv ETHER_MEDIA_50GBASE_SR
138950 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1390each direction.
1391Defined by 802.3cd.
1392.It Dv ETHER_MEDIA_50GBASE_LR
139350 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1394each direction.
1395The maximum fiber length is 10km.
1396Defined in 802.3cd.
1397.It Dv ETHER_MEDIA_50GBASE_ER
139850 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1399each direction.
1400The maximum fiber length is 40km.
1401Defined in 802.3cd.
1402.It Dv ETHER_MEDIA_50GBASE_FR
140350 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1404each direction.
1405The maximum fiber length is 2km.
1406Defined in 802.3cd.
1407.It Dv ETHER_MEDIA_50GBASE_AOC
140850 Gbit/s Ethernet that is built using a short-range active optical
1409cable that is generally SFP56 compatible.
1410The electrical interface operates at 25 Gbit/s PAM4 signaling.
1411.It Dv ETHER_MEDIA_50GBASE_ACC
141250 Gbit/s Ethernet that is built using a single lane twinaxial
1413cable that is generally SFP56 compatible but uses an active component
1414such as a retimer or redriver when compared to 50GBASE-CR.
1415.It Dv ETHER_MEDIA_100GBASE_CR10
1416100 Gbit/s Ethernet operating over ten lanes of shielded twinaxial
1417copper cable, each operating at 10 Gbit/s.
1418Defined in 802.3ba.
1419.It Dv ETHER_MEDIA_100GBASE_SR10
1420100 Gbit/s Ethernet based upon using ten pairs of multi-mode fiber, each
1421operating at 10 Gbit/s, with one fiber in the pair being used for
1422transmit and the other for receive.
1423.It Dv ETHER_MEDIA_100GBASE_SR4
1424100 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1425each operating at 25 Gbit/s, with one fiber in the pair being used for
1426transmit and the other for receive.
1427Defined by 802.3bm.
1428.It Dv ETHER_MEDIA_100GBASE_LR4
1429100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1430for each direction.
1431Utilizes wavelength multiplexing as the electrical interface is four 25
1432Gbit/s signals and generally based upon a QSFP28 connector.
1433The maximum fiber length is 10km.
1434Defined by 802.3ba.
1435.It Dv ETHER_MEDIA_100GBASE_ER4
1436100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1437for each direction.
1438Utilizes wavelength multiplexing as the electrical interface is four 25
1439Gbit/s signals and generally based upon a QSFP28 connector.
1440The maximum fiber length is 40km.
1441Defined by 802.3ba.
1442.It Dv ETHER_MEDIA_100GBASE_KR4
1443100 Gbit/s Ethernet based upon using a four lane copper backplane.
1444Each lane operates at 25 Gbit/s.
1445Defined in 802.3bj.
1446.It Dv ETHER_MEDIA_100GBASE_CAUI4
1447100 Gbit/s signalling used for chip-to-chip and chip-to-module
1448connectivity.
1449Defined in 802.3bm.
1450.It Dv ETHER_MEDIA_100GBASE_CR4
1451100 Gbit/s Ethernet based upon using a four lane copper twinaxial cable.
1452Each lane operates at 25 Gbit/s and generally utilizes a QSFP28
1453connector.
1454Defined in 802.3bj.
1455.It Dv ETHER_MEDIA_100GBASE_AOC4
1456100 Gbit/s Ethernet that utilizes an active optical cable with
1457short-range optical transceivers.
1458Electrically operates as four lanes of 25 Gbit/s and most commonly uses
1459a QSFP28 connector.
1460.It Dv ETHER_MEDIA_100GBASE_ACC4
1461100 Gbit/s Ethernet that utilizes a four lane copper twinaxial cable
1462that unlike 100GBASE-CR4 has an active component such as a retimer or
1463redriver.
1464.It Dv ETHER_MEDIA_100GBASE_KR2
1465100 Gbit/s Ethernet based upon using a two lane copper backplane.
1466Each lane operates at 50 Gbit/s.
1467Defined in 802.3cd.
1468.It Dv ETHER_MEDIA_100GBASE_CR2
1469100 Gbit/s Ethernet that utilizes a two lane copper twinaxial cable.
1470Each lane operates at 50 Gbit/s.
1471Defined by 802.3cd.
1472.It Dv ETHER_MEDIA_100GBASE_SR2
1473100 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1474each operating at 50 Gbit/s, with one fiber in the pair being used for
1475transmit and the other for receive.
1476Defined by 802.3cd.
1477.It Dv ETHER_MEDIA_100GBASE_KR
1478100 Gbit/s Ethernet operating over a single lane copper backplane.
1479Defined by 802.3ck.
1480.It Dv ETHER_MEDIA_100GBASE_CR
1481100 Gbit/s Ethernet operating over a single lane copper twinaxial cable.
1482Generally uses an SFP112 connector.
1483Defined by 802.3ck.
1484.It Dv ETHER_MEDIA_100GBASE_SR
1485100 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1486transmitting and one for receiving.
1487The maximum fiber length is 60-100m depending on the fiber type
1488.Pq OM3, OM4 .
1489Defined by 802.3db.
1490.It Dv ETHER_MEDIA_100GBASE_DR
1491100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1492transmitting and one for receiving.
1493Designed to be used with a parallel DR4/DR8 interface.
1494The maximum fiber length is 500m.
1495Defined by 802.3cd.
1496.It Dv ETHER_MEDIA_100GBASE_LR
1497100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1498transmitting and one for receiving.
1499The maximum fiber length is 10km.
1500Defined by 802.3cu.
1501.It Dv ETHER_MEDIA_100GBASE_FR
1502100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1503transmitting and one for receiving.
1504The maximum fiber length is 2km.
1505Defined by 802.3cu.
1506.It Dv ETHER_MEDIA_200GBASE_CR4
1507200 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
1508cable.
1509Each lane operates at 50 Gbit/s and the connector is generally based on
1510QSFP56.
1511Defined by 802.3cd.
1512.It Dv ETHER_MEDIA_200GBASE_KR4
1513200 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1514operating at 50 Gbit/s.
1515Defined by 802.3cd.
1516.It Dv ETHER_MEDIA_200GBASE_SR4
1517200 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1518each operating at 50 Gbit/s, with one fiber in the pair being used for
1519transmit and the other for receive.
1520Defined by 802.3cd.
1521.It Dv ETHER_MEDIA_200GBASE_DR4
1522200 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
1523each operating at 50 Gbit/s, with one fiber in the pair being used for
1524transmit and the other for receive.
1525Defined by 802.3bs.
1526.It Dv ETHER_MEDIA_200GBASE_FR4
1527200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1528for transmitting and one for receiving.
1529Utilizes wavelength multiplexing as the electrical interface is four 50
1530Gbit/s signals and generally based upon a QSFP56 connector.
1531The maximum fiber length is 2km.
1532Defined by 802.3bs.
1533.It Dv ETHER_MEDIA_200GBASE_LR4
1534200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1535for transmitting and one for receiving.
1536Utilizes wavelength multiplexing as the electrical interface is four 50
1537Gbit/s signals and generally based upon a QSFP56 connector.
1538The maximum fiber length is 10km.
1539Defined by 802.3bs.
1540.It Dv ETHER_MEDIA_200GBASE_ER4
1541200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1542for transmitting and one for receiving.
1543Utilizes wavelength multiplexing as the electrical interface is four 50
1544Gbit/s signals and generally based upon a QSFP56 connector.
1545The maximum fiber length is 40km.
1546Defined by 802.3bs.
1547.It Dv ETHER_MEDIA_200GAUI_4
1548200 Gbit/s signalling utilizing four lanes each operating at 50 Gbit/s.
1549Used for chip-to-chip and chip-to-module connections.
1550Defined by 802.3bs.
1551.It Dv ETHER_MEDIA_200GBASE_KR2
1552200 Gbit/s Ethernet utilizing two lanes over a copper backplane each
1553operating at 100 Gbit/s.
1554Defined by 802.3ck.
1555.It Dv ETHER_MEDIA_200GBASE_CR2
1556200 Gbit/s Ethernet utilizing a two lane passive copper twinaxial
1557cable.
1558Each lane operates at 100 Gbit/s.
1559Defined by 802.3ck.
1560.It Dv ETHER_MEDIA_200GBASE_SR2
1561200 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1562each operating at 100 Gbit/s, with one fiber in the pair being used for
1563transmit and the other for receive.
1564Defined by 802.3db.
1565.It Dv ETHER_MEDIA_200GAUI_2
1566200 Gbit/s signalling utilizing two lanes each operating at 100 Gbit/s.
1567Used for chip-to-chip and chip-to-module connections.
1568Defined by 802.3ck.
1569.It Dv ETHER_MEDIA_400GBASE_KR8
1570400 Gbit/s Ethernet utilizing eight lanes over a copper backplane each
1571operating at 50 Gbit/s.
1572Defined by the 25/50 Gigabit Ethernet Consortium.
1573.It Dv ETHER_MEDIA_400GBASE_FR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1575for transmitting and one for receiving.
1576Utilizes wavelength multiplexing as the electrical interface is eight 50
1577Gbit/s signals and generally based upon a QSFP-DD connector.
1578The maximum fiber length is 2km.
1579Defined by 802.3bs.
1580.It Dv ETHER_MEDIA_400GBASE_LR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1582for transmitting and one for receiving.
1583Utilizes wavelength multiplexing as the electrical interface is eight 50
1584Gbit/s signals and generally based upon a QSFP-DD connector.
1585The maximum fiber length is 10km.
1586Defined by 802.3bs.
1587.It Dv ETHER_MEDIA_400GBASE_ER8
1588200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1589for transmitting and one for receiving.
1590Utilizes wavelength multiplexing as the electrical interface is eight 50
1591Gbit/s signals and generally based upon a QSFP-DD connector.
1592The maximum fiber length is 40km.
1593Defined by 802.3cn.
.It Dv ETHER_MEDIA_400GAUI_8
400 Gbit/s signalling utilizing eight lanes each operating at 50 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_KR4
400 Gbit/s Ethernet utilizing four lanes over a copper backplane each
operating at 100 Gbit/s.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_400GBASE_CR4
400 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
cable.
Each lane operates at 100 Gbit/s and generally uses a QSFP112 connector.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_400GBASE_SR4
400 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
each operating at 100 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3db.
.It Dv ETHER_MEDIA_400GBASE_DR4
400 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
each operating at 100 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
The maximum fiber length is 500m.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_FR4
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is four 100
Gbit/s signals and generally based upon a QSFP112 connector.
The maximum fiber length is 2km.
Defined by 802.3cu.
.It Dv ETHER_MEDIA_400GAUI_4
400 Gbit/s signalling utilizing four lanes each operating at 100 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
Defined by 802.3ck.
.El
.It Dv MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation.
A value of
.Sy 0
indicates that auto-negotiation is disabled.
A
.Sy non-zero
value indicates that auto-negotiation is enabled.
Devices should generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state.
When setting this property, if the device supports operating in the requested
mode, then the device driver should reset the link to negotiate to the new speed
after updating any internal registers.
.It Dv MAC_PROP_MTU
.Bd -filled -compact
Type:
.Vt uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_MTU
property determines the maximum transmission unit (MTU).
This indicates the maximum size packet that the device can transmit, ignoring
its own headers.
For an Ethernet device, this would exclude the size of the Ethernet header and
any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when
added to its margin and header sizes, does not exceed its maximum frame size.
.Pp
By default, drivers for Ethernet should initialize this value and the
MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU.
When setting this property, the driver should first validate that it is within
the device's valid range and then it must call
.Xr mac_maxsdu_update 9F .
Note that the call may fail.
If the call completes successfully, the driver should update the hardware with
the new value of the MTU and perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then driver writers should return
.Er EBUSY .
.It Dv MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Vt link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control.
Note, this only describes what this device will advertise.
What is actually enabled may be different and is subject to the rules of
auto-negotiation.
The
.Vt link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Dv LINK_FLOWCTRL_NONE
Flow control is disabled.
No pause frames should be generated or honored.
.It Dv LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Dv LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Dv LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated.
When setting the property, it should update the hardware and allow the link to
potentially perform auto-negotiation again.
.It Dv MAC_PROP_EN_FEC_CAP
.Bd -filled -compact
Type:
.Vt link_fec_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_FEC_CAP
property indicates which Forward Error Correction (FEC) code is advertised
by the device.
.Pp
The
.Vt link_fec_t
is an enumeration that may be a combination of the following bit values:
.Bl -tag -width Ds
.It Dv LINK_FEC_NONE
No FEC over the link.
.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
.Dv LINK_FEC_AUTO
cannot be set along with any of the other values.
This is the default setting the device driver should use.
.It Dv LINK_FEC_RS
The link may use Reed-Solomon FEC coding.
.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
.El
.Pp
When setting the property, the driver should update the hardware with the
requested coding or combination of requested codings.
If a particular combination of codings is not supported by the hardware,
the device driver should return
.Er EINVAL .
When retrieving this property, the device driver should return the current
value of the property.
.It Dv MAC_PROP_ADV_FEC_CAP
.Bd -filled -compact
Type:
.Vt link_fec_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
.Dv MAC_PROP_EN_FEC_CAP .
The property indicates which Forward Error Correction (FEC) code has been
negotiated over the link.
.El
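.Pp
The set path described for
.Dv MAC_PROP_MTU
above can be sketched as follows.
This is a userland illustration rather than driver code: the
.Vt ex_softc_t
structure, the
.Dv EX_MTU_MIN
and
.Dv EX_MTU_MAX
limits, and
.Fn ex_maxsdu_update
are hypothetical stand-ins, the last one taking the place of
.Xr mac_maxsdu_update 9F .
Returning
.Er EINVAL
for an out-of-range value is likewise an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins so that the sketch builds outside the kernel.
 * A real driver would use <sys/mac_provider.h> and mac_maxsdu_update(9F).
 */
#define	EX_MTU_MIN	68U	/* assumed device minimum */
#define	EX_MTU_MAX	9000U	/* assumed device maximum */

typedef struct ex_softc {
	bool		ex_started;	/* mc_start(9E) has been called */
	bool		ex_mtu_live;	/* MTU may change while started */
	uint32_t	ex_mtu;		/* currently recorded MTU */
	uint32_t	ex_hw_mtu;	/* value programmed into the device */
} ex_softc_t;

/* Stand-in for mac_maxsdu_update(9F); returns 0 on success. */
static int
ex_maxsdu_update(ex_softc_t *sc, uint32_t sdu)
{
	(void) sc;
	(void) sdu;
	return (0);
}

/* Handle a request to set MAC_PROP_MTU. */
int
ex_set_mtu(ex_softc_t *sc, uint32_t mtu)
{
	int ret;

	/* A device that cannot change its MTU once started is busy. */
	if (sc->ex_started && !sc->ex_mtu_live)
		return (EBUSY);

	/* Validate against the device's range first. */
	if (mtu < EX_MTU_MIN || mtu > EX_MTU_MAX)
		return (EINVAL);

	/* Inform the framework; this call may fail. */
	if ((ret = ex_maxsdu_update(sc, mtu)) != 0)
		return (ret);

	/* Only on success update the hardware and the recorded value. */
	sc->ex_hw_mtu = mtu;
	sc->ex_mtu = mtu;
	return (0);
}
```

The ordering is the point of the sketch: the hardware and the recorded MTU are only touched after both the range check and the framework update have succeeded.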
.Pp
The remaining properties are all about various auto-negotiation link
speeds.
They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name.
For any given supported speed, there is one of each.
The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device.
When these are retrieved, they should return the current value of the property.
When they are set, they should change how the hardware advertises the specific
speed and trigger any kind of link reset and auto-negotiation, if enabled, to
occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties.
They are meant to reflect what has actually been negotiated.
These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Dv MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Dv MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Dv MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
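.Pp
The split between the
.Sy _EN_
and
.Sy _ADV_
families can be illustrated with a single speed.
The following is a minimal userland sketch, not driver code: the
.Vt ex_prop_t
and
.Vt ex_link_t
types are hypothetical stand-ins for the real property identifiers and driver
state, the reset counter stands in for reprogramming the hardware, and
returning
.Er ENOTSUP
when asked to set a read-only property is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for one _ADV_/_EN_ property pair. */
typedef enum ex_prop {
	EX_PROP_ADV_1000FDX_CAP,
	EX_PROP_EN_1000FDX_CAP
} ex_prop_t;

typedef struct ex_link {
	uint8_t		exl_en_1000fdx;		/* administratively enabled */
	uint8_t		exl_adv_1000fdx;	/* state reported by the link */
	bool		exl_autoneg;		/* auto-negotiation enabled */
	unsigned int	exl_resets;		/* link resets triggered */
} ex_link_t;

/* Both families may be read; each returns its current value. */
int
ex_getprop(const ex_link_t *lp, ex_prop_t prop, uint8_t *valp)
{
	switch (prop) {
	case EX_PROP_EN_1000FDX_CAP:
		*valp = lp->exl_en_1000fdx;
		return (0);
	case EX_PROP_ADV_1000FDX_CAP:
		*valp = lp->exl_adv_1000fdx;
		return (0);
	default:
		return (ENOTSUP);
	}
}

/* Only the _EN_ family may be written. */
int
ex_setprop(ex_link_t *lp, ex_prop_t prop, uint8_t val)
{
	switch (prop) {
	case EX_PROP_EN_1000FDX_CAP:
		lp->exl_en_1000fdx = (val != 0) ? 1 : 0;
		/* Reset the link so that negotiation runs again. */
		if (lp->exl_autoneg)
			lp->exl_resets++;
		return (0);
	case EX_PROP_ADV_1000FDX_CAP:
	default:
		/* Read-only or unknown property. */
		return (ENOTSUP);
	}
}
```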
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties.
These private properties are device-specific properties.
All private properties share the same constant,
.Dv MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string.
The list of such private properties is defined when registering with mac in the
.Fa m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties.
They will not be listed when running
.Xr dladm 8 ,
unless explicitly requested by name.
All such properties should start with a leading underscore character and then
consist of alphanumeric ASCII characters and additional underscores or hyphens.
.Pp
Properties of type
.Dv MAC_PROP_PRIVATE
may show up in all three property related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the property name against the set of properties that the
driver knows about.
When encountering properties that it doesn't know, it should treat them
like all other unknown properties.
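.Pp
Telling private properties apart by name can be sketched as follows.
This is a userland illustration that uses the C library
.Fn strcmp
in place of
.Xr strcmp 9F .
The property names
.Sy _rx_intr_throttle
and
.Sy _tx_copy_thresh ,
the
.Vt ex_softc_t
structure, and the choice of
.Er ENOTSUP
for an unknown name are all assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver state backing two private properties. */
typedef struct ex_softc {
	uint32_t	ex_rx_intr_throttle;
	uint32_t	ex_tx_copy_thresh;
} ex_softc_t;

/*
 * Look up a private property by name. Every name starts with a leading
 * underscore, as required for private properties.
 */
int
ex_priv_getprop(const ex_softc_t *sc, const char *name, uint32_t *valp)
{
	if (strcmp(name, "_rx_intr_throttle") == 0) {
		*valp = sc->ex_rx_intr_throttle;
		return (0);
	}

	if (strcmp(name, "_tx_copy_thresh") == 0) {
		*valp = sc->ex_tx_copy_thresh;
		return (0);
	}

	/* Unknown names are handled like any other unknown property. */
	return (ENOTSUP);
}
```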
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement.
Statistics are retrieved through the
.Xr mc_getstat 9E
entry point.
There are both statistics that are required for all devices and then there is a
separate set of Ethernet specific statistics.
Not all devices will support every statistic.
In many cases, several device registers will need to be combined to create the
proper stat.
.Pp
In general, if the device is not keeping track of these statistics, then
it is recommended that the driver store these values as a
.Vt uint64_t
to ensure that overflow does not occur.
.Pp
If a device does not support a specific statistic, then it is fine to
return that it is not supported.
The same should be done for unrecognized statistics.
See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
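.Pp
The points above can be sketched in a small, self-contained example.
It is a userland illustration rather than a real
.Xr mc_getstat 9E
implementation: the
.Vt ex_stat_t
identifiers and the
.Vt ex_stats_t
counters are hypothetical, the particular counters summed for the input error
statistic are an assumption about one imaginary device, and
.Er ENOTSUP
is assumed to be the way an unsupported statistic is reported.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few statistic identifiers. */
typedef enum ex_stat {
	EX_STAT_IPACKETS,
	EX_STAT_IERRORS,
	EX_STAT_COLLISIONS
} ex_stat_t;

/* Counters are kept as 64-bit values so that they do not overflow. */
typedef struct ex_stats {
	uint64_t	exs_ipackets;	/* maintained by the driver */
	uint64_t	exs_crc_errs;	/* accumulated from a device register */
	uint64_t	exs_len_errs;	/* accumulated from a device register */
	uint64_t	exs_missed;	/* accumulated from a device register */
} ex_stats_t;

int
ex_getstat(const ex_stats_t *sp, ex_stat_t stat, uint64_t *valp)
{
	switch (stat) {
	case EX_STAT_IPACKETS:
		*valp = sp->exs_ipackets;
		return (0);
	case EX_STAT_IERRORS:
		/* Several device counters combine into a single statistic. */
		*valp = sp->exs_crc_errs + sp->exs_len_errs + sp->exs_missed;
		return (0);
	default:
		/* Unsupported and unrecognized statistics are treated alike. */
		return (ENOTSUP);
	}
}
```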
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Dv MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Dv MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Dv MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Dv MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Dv MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Dv MAC_STAT_IERRORS
The total number of errors detected on input.
.It Dv MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Dv MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Dv MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Dv MAC_STAT_COLLISIONS
Total number of collisions encountered by the transmitter.
.It Dv MAC_STAT_RBYTES
The total number of bytes received by the device, regardless of packet
type.
.It Dv MAC_STAT_IPACKETS
The total number of packets received by the device, regardless of packet type.
.It Dv MAC_STAT_OBYTES
The total number of bytes transmitted by the device, regardless of packet type.
.It Dv MAC_STAT_OPACKETS
The total number of packets sent by the device, regardless of packet type.
.It Dv MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Dv MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices.
They refer to values from RFC 1643 and include various MII/GMII specific stats.
Many of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Dv ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Dv ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Dv ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Dv ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Dv ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Dv ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device.
This is a count of packets that were not an integral number of octets and failed
the FCS check.
.It Dv ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Dv ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Dv ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Dv ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Dv ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Dv ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Dv ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the device was unable to
transmit the frame due to being busy and had to try again.
.It Dv ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Dv ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Dv ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Dv ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Dv ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Dv ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Dv ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link.
The values used here should be the same as documented for
.Dv MAC_PROP_DUPLEX .
.It Dv ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Dv ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Dv ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Dv ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Dv ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Dv ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Dv ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Dv ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Dv ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Dv ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred.
The specific conditions for this error are documented in IEEE 802.3.
.It Dv ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Dv ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Dv ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
Use the
.Vt mac_ether_media_t
enumeration values described in the discussion of
.Dv MAC_PROP_MEDIA
above.
These definitions are compatible with the older subset of
XCVR_* macros.
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh RECEIVE DESCRIPTOR LAYOUT
One of the important things that a device driver must do is lay out DMA
memory, generally in a ring of descriptors, into which received Ethernet
frames will be placed.
When performing this, there are a few things that drivers should
generally do:
.Bl -enum -offset indent
.It
Drivers should lay out memory so that the IP header will be 4-byte
aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4-
or 8-byte aligned address by default.
The IP header is at a 14 byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
does not change the alignment at which the IP header is found.
.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame at two bytes off of the start of the
DMA buffer.
This will make sure that, whether or not VLAN tags are present,
the IP header will be 4-byte aligned.
.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that would not be possible, the driver should try to
ensure that there is enough space for all of the initial Ethernet and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
.It
As discussed in the
.Sx MBLKS AND DMA
section, there are multiple strategies for managing the relationship
between DMA data, receive descriptors, and the operating system
representation of a packet in the
.Xr mblk 9S
structure.
Drivers must limit their resource consumption.
See the
.Sy Considerations
section of
.Sx MBLKS AND DMA
for more on this.
.El
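.Pp
The alignment arithmetic behind the first item in the list above can be
checked with a few lines of C.
This is a sketch under the assumption that the DMA buffer itself begins at an
address that is at least 4-byte aligned; the
.Dv EX_
constants and both helper functions are names invented for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define	EX_ETHER_HDR_LEN	14U	/* untagged Ethernet header */
#define	EX_VLAN_TAG_LEN		4U	/* each VLAN tag */
#define	EX_IP_ALIGN		4U	/* alignment the IP stack expects */

/*
 * Offset of the IP header from the start of the DMA buffer when the
 * frame is placed pad bytes into the buffer and carries nvlan VLAN tags.
 */
size_t
ex_ip_offset(size_t pad, unsigned int nvlan)
{
	return (pad + EX_ETHER_HDR_LEN + (size_t)nvlan * EX_VLAN_TAG_LEN);
}

/* True if the IP header lands on a 4-byte boundary. */
bool
ex_ip_aligned(size_t pad, unsigned int nvlan)
{
	return (ex_ip_offset(pad, nvlan) % EX_IP_ALIGN == 0);
}
```

With no padding the offset is 14, which is 2 modulo 4.
With two bytes of padding it is 16, and each additional VLAN tag adds 4, so
the header stays aligned.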
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important when designing and
implementing the device driver to pay particular attention to RAS
(Reliability, Availability, and Serviceability).
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High rates of invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect and notice transport errors as well as report other errors that
exist.
While a device driver does not need to register all of the capabilities
described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record the
last time it observed that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
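.Pp
The following is a minimal, user-level sketch of the bookkeeping described
above; it is not taken from any existing driver.
The structure, the function names, and the two second threshold are all
assumptions made for illustration.
A real driver would take timestamps from a kernel clock source and would call
the check from a periodic callback such as one scheduled with
.Xr timeout 9F .

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; all names are illustrative only. */
typedef struct tx_ring_watch {
	uint64_t trw_last_completion;	/* time of last TX completion (ns) */
	uint32_t trw_outstanding;	/* descriptors owned by the hardware */
} tx_ring_watch_t;

/* Assumed threshold: no completion for two seconds counts as a stall. */
#define	TX_STALL_NS	(2ULL * 1000000000ULL)

/* Transmit path: descriptors were just handed to the device. */
void
tx_watch_submit(tx_ring_watch_t *w, uint64_t now, uint32_t ndesc)
{
	/* Start the clock when the ring goes from idle to busy. */
	if (w->trw_outstanding == 0)
		w->trw_last_completion = now;
	w->trw_outstanding += ndesc;
}

/* Completion path: the device reported that descriptors were sent. */
void
tx_watch_complete(tx_ring_watch_t *w, uint64_t now, uint32_t ndesc)
{
	assert(ndesc <= w->trw_outstanding);
	w->trw_outstanding -= ndesc;
	w->trw_last_completion = now;
}

/* Periodic check: only a ring with work pending can be stalled. */
bool
tx_watch_stalled(const tx_ring_watch_t *w, uint64_t now)
{
	if (w->trw_outstanding == 0)
		return (false);
	return (now - w->trw_last_completion > TX_STALL_NS);
}
```

Checking for outstanding work first means that an idle ring is never reported
as stalled, which matches the guidance to poll only while something is actively
being transmitted.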
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error occurs, for example a transport error, a detected transmit
stall, or an uncorrectable error indicated by the device, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not a transport error that was indicated by the fault
management architecture, for example a transmit stall that the driver itself
detected, then the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
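.Pp
The ordering of the steps above can be sketched with a small user-level model.
This is an illustration only and makes several assumptions: the state
structure, the event names, and the
.Fn record
helper are hypothetical, and
.Fn record
merely notes where a real driver would call
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_service_impact 9F ,
or its device-specific reset and restore routines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Events noted by record() below, in the order they would be issued. */
typedef enum {
	EV_EREPORT, EV_SERVICE_LOST, EV_RESET, EV_RESTORE, EV_SERVICE_RESTORED
} event_t;

/* Hypothetical driver soft state; only what the sketch needs. */
typedef struct drv_state {
	bool ds_error;		/* step 1: error condition flag */
	event_t ds_log[8];
	size_t ds_nlog;
} drv_state_t;

/* Stand-in for the real DDI and device calls: just log the event. */
static void
record(drv_state_t *ds, event_t ev)
{
	assert(ds->ds_nlog < sizeof (ds->ds_log) / sizeof (ds->ds_log[0]));
	ds->ds_log[ds->ds_nlog++] = ev;
}

/* Transmit path: while the flag is set, accept packets and drop them. */
bool
drv_tx_should_drop(const drv_state_t *ds)
{
	return (ds->ds_error);
}

/*
 * Fatal error handling. 'fma_reported' is true when the fault management
 * architecture already indicated the error. 'reset_ok' models whether the
 * device-specific reset succeeded.
 */
void
drv_fatal_error(drv_state_t *ds, bool fma_reported, bool reset_ok)
{
	ds->ds_error = true;			/* 1. set the error flag */
	if (!fma_reported)
		record(ds, EV_EREPORT);		/* 2. post an ereport */
	record(ds, EV_SERVICE_LOST);		/* 3. DDI_SERVICE_LOST */
	record(ds, EV_RESET);			/* 4. device-specific reset */
	if (!reset_ok)
		return;				/* flag stays set */
	record(ds, EV_RESTORE);			/* 5. reprogram MTU, filters */
	record(ds, EV_SERVICE_RESTORED);	/* 6. DDI_SERVICE_RESTORED */
	ds->ds_error = false;			/* device is usable again */
}
```

Note that the error flag is only cleared after the reset and restore have
completed, so the transmit path keeps dropping packets if the reset fails.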
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device driver again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
.Pp
If the only way to reset a given function is through the device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for handling these
two different cases: copying and binding.
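.Pp
To make the two kinds of linkage concrete, the following user-level sketch
walks a simplified stand-in for the
.Xr mblk 9S
structure.
The
.Vt mini_mblk_t
type and both functions are hypothetical and carry only the members discussed
here; a real driver would use the actual mblk_t definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for mblk_t with only the members discussed here. */
typedef struct mini_mblk {
	struct mini_mblk *b_next;	/* next message in the chain */
	struct mini_mblk *b_cont;	/* next block of this message */
	unsigned char *b_rptr;		/* first valid byte */
	unsigned char *b_wptr;		/* one past the last valid byte */
} mini_mblk_t;

/* Total bytes in one message: walk its blocks through b_cont. */
size_t
mini_msgsize(const mini_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	return (len);
}

/* Number of messages in a chain: walk the messages through b_next. */
size_t
mini_chain_count(const mini_mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}
```

For example, a chain of two messages in which the first message is split into
a 14 byte block and a 100 byte block has a chain count of 2, and the size of
its first message is 114 bytes.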
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving data, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then increment the mblk_t's
.Fa b_wptr
member.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
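.Pp
The receive side of the copying strategy can be modeled at user level as
follows.
This is a sketch under stated assumptions rather than driver code: the
.Vt rx_mblk_t
type, the allocator argument, and the drop counter are hypothetical stand-ins
for an mblk_t obtained from
.Xr allocb 9F ,
for the allocation itself, and for a kstat, while
.Fn memcpy
stands in for
.Xr bcopy 9F .

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified message block: the payload follows the header. */
typedef struct rx_mblk {
	unsigned char *b_rptr;	/* first valid byte */
	unsigned char *b_wptr;	/* one past the last valid byte */
	unsigned char b_data[];
} rx_mblk_t;

/* Stand-in for a kstat that counts frames dropped for lack of memory. */
typedef struct rx_stats {
	unsigned long rs_norcvbuf;
} rx_stats_t;

/* Allocator hook so that the failure path can be exercised. */
typedef void *(*rx_alloc_f)(size_t);

/* An allocator that always fails, used only to demonstrate the drop path. */
void *
rx_alloc_fail(size_t len)
{
	(void) len;
	return (NULL);
}

/*
 * Copy one received frame out of the DMA buffer into a new message
 * block. Returns NULL and bumps the counter if allocation fails, in
 * which case the frame is dropped and the DMA buffer can be reused.
 */
rx_mblk_t *
rx_copy_frame(const unsigned char *dma, size_t len, rx_alloc_f alloc,
    rx_stats_t *st)
{
	rx_mblk_t *mp;

	assert(dma != NULL);
	mp = alloc(sizeof (*mp) + len);
	if (mp == NULL) {
		st->rs_norcvbuf++;
		return (NULL);
	}
	mp->b_rptr = mp->b_wptr = mp->b_data;
	memcpy(mp->b_wptr, dma, len);	/* bcopy(9F) in a driver */
	mp->b_wptr += len;		/* mark the copied bytes valid */
	return (mp);
}
```

Because the data has been copied, the DMA buffer can be handed straight back
to the device regardless of how long the stack holds on to the message block.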
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try and create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If they were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors that are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to make the decision as to whether or not to copy or bind DMA data
a separate private property for both transmitting and receiving.
That private property should indicate the size of the frame at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
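.Pp
A minimal sketch of such a size-based switch follows.
The type and member names are hypothetical, the choice of binding at or above
the threshold is an assumption, and no particular threshold values are
recommended; the appropriate values are exactly what the private properties
are meant to let an operator measure.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { XFER_COPY, XFER_BIND } xfer_method_t;

/*
 * Hypothetical tunables, one per direction. In a driver these would
 * be backed by two driver-private properties.
 */
typedef struct xfer_tunables {
	size_t xt_tx_bind_threshold;	/* bind TX frames this size or larger */
	size_t xt_rx_bind_threshold;	/* bind RX frames this size or larger */
} xfer_tunables_t;

/* Small frames are copied; frames at or above the threshold are bound. */
xfer_method_t
xfer_choose_tx(const xfer_tunables_t *t, size_t len)
{
	assert(t != NULL);
	return (len >= t->xt_tx_bind_threshold ? XFER_BIND : XFER_COPY);
}

xfer_method_t
xfer_choose_rx(const xfer_tunables_t *t, size_t len)
{
	assert(t != NULL);
	return (len >= t->xt_rx_bind_threshold ? XFER_BIND : XFER_COPY);
}
```

Keeping the transmit and receive thresholds separate allows each direction to
be tuned independently while data is gathered on a given platform.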
.Sh SEE ALSO
.Xr dlpi 4P ,
.Xr driver.conf 5 ,
.Xr ieee802.3 7 ,
.Xr dladm 8 ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mac_capab_led 9E ,
.Xr mac_capab_rings 9E ,
.Xr mac_capab_transceiver 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_devt_to_instance 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_getinfo 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_private_minor 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re
.Rs
.%A IEEE Computer Standard
.%T IEEE 802.3
.%T IEEE Standard for Ethernet
.%D 2022
.Re