.\"
.\" Copyright (c) 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd October 4, 2022
.Dt PCI 4
.Os
.Sh NAME
.Nm pci
.Nd generic PCI/PCIe bus driver
.Sh SYNOPSIS
To compile the PCI bus driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd device pci
.Ed
.Pp
To compile in support for Single Root I/O Virtualization
.Pq SR-IOV :
.Bd -ragged -offset indent
.Cd options PCI_IOV
.Ed
.Pp
To compile in support for native PCI-express HotPlug:
.Bd -ragged -offset indent
.Cd options PCI_HP
.Ed
.Sh DESCRIPTION
The
.Nm
driver provides support for
.Tn PCI
and
.Tn PCIe
devices in the kernel and limited access to
.Tn PCI
devices for userland.
.Pp
The
.Nm
driver provides a
.Pa /dev/pci
character device that can be used by userland programs to read and write
.Tn PCI
configuration registers.
Programs can also use this device to get a list of all
.Tn PCI
devices, or all
.Tn PCI
devices that match various patterns.
.Pp
Since the
.Nm
driver provides a write interface for
.Tn PCI
configuration registers, system administrators should exercise caution when
granting access to the
.Nm
device.
If used improperly, this driver can allow userland applications to
crash a machine or cause data loss.
In particular, the driver only allows operations on the opened
.Pa /dev/pci
to modify system state if the file descriptor was opened for writing.
For instance, the
.Dv PCIOCREAD
and
.Dv PCIOCBARMMAP
operations require a writable descriptor, because reading a configuration
register or a BAR can have function-specific side effects.
.Pp
The
.Nm
driver implements the
.Tn PCI
bus in the kernel.
It enumerates any devices on the
.Tn PCI
bus and gives
.Tn PCI
client drivers the chance to attach to them.
It assigns resources to children when the BIOS does not.
It takes care of routing interrupts when necessary.
It reprobes the unattached
.Tn PCI
children when
.Tn PCI
client drivers are dynamically
loaded at runtime.
The
.Nm
driver also includes support for PCI-PCI bridges,
various platform-specific Host-PCI bridges,
and basic support for
.Tn PCI
VGA adapters.
.Sh IOCTLS
The following
.Xr ioctl 2
calls are supported by the
.Nm
driver.
They are defined in the header file
.In sys/pciio.h .
.Bl -tag -width 012345678901234
.It PCIOCGETCONF
This
.Xr ioctl 2
takes a
.Va pci_conf_io
structure.
It allows the user to retrieve information on all
.Tn PCI
devices in the system, or on
.Tn PCI
devices matching patterns supplied by the user.
The call may set
.Va errno
to any value specified in either
.Xr copyin 9
or
.Xr copyout 9 .
The
.Va pci_conf_io
structure consists of a number of fields:
.Bl -tag -width match_buf_len
.It pat_buf_len
The length, in bytes, of the buffer filled with user-supplied patterns.
.It num_patterns
The number of user-supplied patterns.
.It patterns
Pointer to a buffer filled with user-supplied patterns.
.Va patterns
is a pointer to
.Va num_patterns
.Va pci_match_conf
structures.
The
.Va pci_match_conf
structure consists of the following elements:
.Bl -tag -width pd_vendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pd_name
.Tn PCI
device driver name.
.It pd_unit
.Tn PCI
device driver unit number.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_class
.Tn PCI
device class.
.It flags
The flags describe which of the fields the kernel should match against.
A device must match all specified fields in order to be returned.
The match flags are enumerated in the
.Va pci_getconf_flags
enumeration.
Each flag is named after the field it selects; for example,
.Dv PCI_GETCONF_MATCH_VENDOR
requests a match on
.Va pc_vendor .
.El
.It match_buf_len
Length of the
.Va matches
buffer allocated by the user to hold the results of the
.Dv PCIOCGETCONF
query.
.It num_matches
Number of matches returned by the kernel.
.It matches
Buffer containing matching devices returned by the kernel.
The items in this buffer are of type
.Va pci_conf ,
which consists of the following items:
.Bl -tag -width pc_subvendor
.It pc_sel
.Tn PCI
domain, bus, slot and function.
.It pc_hdr
.Tn PCI
header type.
.It pc_subvendor
.Tn PCI
subvendor ID.
.It pc_subdevice
.Tn PCI
subdevice ID.
.It pc_vendor
.Tn PCI
vendor ID.
.It pc_device
.Tn PCI
device ID.
.It pc_class
.Tn PCI
device class.
.It pc_subclass
.Tn PCI
device subclass.
.It pc_progif
.Tn PCI
device programming interface.
.It pc_revid
.Tn PCI
revision ID.
.It pd_name
Driver name.
.It pd_unit
Driver unit number.
.El
.It offset
The offset is passed in by the user to tell the kernel where it should
start traversing the device list.
The value passed out by the kernel
points to the record immediately after the last one returned.
The user may
pass the value returned by the kernel in subsequent calls to the
.Dv PCIOCGETCONF
ioctl.
If the user does not intend to use the offset, it must be set to zero.
.It generation
.Tn PCI
configuration generation.
This value only needs to be set if the offset is set.
The kernel will compare the current generation number of its internal
device list to the generation passed in by the user to determine whether
its device list has changed since the user last called the
.Dv PCIOCGETCONF
ioctl.
If the device list has changed, a status of
.Va PCI_GETCONF_LIST_CHANGED
will be passed back.
.It status
The status tells the user the disposition of the request for a device list.
The possible status values are:
.Bl -ohang
.It PCI_GETCONF_LAST_DEVICE
This means that there are no more devices in the PCI device list matching
the specified criteria after the
ones returned in the
.Va matches
buffer.
.It PCI_GETCONF_LIST_CHANGED
This status tells the user that the
.Tn PCI
device list has changed since the last call to the
.Dv PCIOCGETCONF
ioctl, and that the caller must reset the
.Va offset
and
.Va generation
to zero to start over at the beginning of the list.
.It PCI_GETCONF_MORE_DEVS
This tells the user that the buffer was not large enough to hold all of the
remaining devices in the device list that match the criteria.
.It PCI_GETCONF_ERROR
This indicates a general error while servicing the user's request.
If the
.Va pat_buf_len
is not equal to
.Va num_patterns
times
.Fn sizeof "struct pci_match_conf" ,
.Va errno
will be set to
.Er EINVAL .
.El
.El
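.Pp
The following is a minimal sketch, not a reference implementation, of how the
fields above fit together when listing every device with a given vendor ID.
The function names
.Fn format_selector
and
.Fn list_by_vendor
are hypothetical.
The sketch assumes that the selector members are named
.Va pc_domain , pc_bus , pc_dev
and
.Va pc_func ,
and that the vendor match flag is
.Dv PCI_GETCONF_MATCH_VENDOR ;
check both against
.In sys/pciio.h
on the target system.
The ioctl part is only compiled on
.Fx .

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a selector as "pci<D>:<B>:<S>:<F>", e.g. "pci0:6:0:0". */
int
format_selector(char *buf, size_t len, unsigned domain, unsigned bus,
    unsigned slot, unsigned func)
{
	return (snprintf(buf, len, "pci%u:%u:%u:%u", domain, bus, slot, func));
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/pciio.h>

/*
 * Print every device whose vendor ID equals `vendor` (hypothetical helper).
 * `fd` is an open /dev/pci descriptor.
 * Returns the number of devices printed, or -1 on error.
 */
int
list_by_vendor(int fd, uint16_t vendor)
{
	struct pci_match_conf pat;
	struct pci_conf matches[32];
	struct pci_conf_io pc;
	char name[32];
	int total = 0;

	/* One pattern: match on the vendor ID only. */
	memset(&pat, 0, sizeof(pat));
	pat.pc_vendor = vendor;
	pat.flags = PCI_GETCONF_MATCH_VENDOR;

	/* Zeroing the request also sets offset and generation to zero. */
	memset(&pc, 0, sizeof(pc));
	pc.pat_buf_len = sizeof(pat);	/* num_patterns * sizeof(pattern) */
	pc.num_patterns = 1;
	pc.patterns = &pat;
	pc.match_buf_len = sizeof(matches);
	pc.matches = matches;

	do {
		if (ioctl(fd, PCIOCGETCONF, &pc) == -1)
			return (-1);
		/* On LIST_CHANGED the caller may reset and start over. */
		if (pc.status == PCI_GETCONF_ERROR ||
		    pc.status == PCI_GETCONF_LIST_CHANGED)
			return (-1);
		for (uint32_t i = 0; i < pc.num_matches; i++) {
			const struct pci_conf *p = &matches[i];

			format_selector(name, sizeof(name),
			    p->pc_sel.pc_domain, p->pc_sel.pc_bus,
			    p->pc_sel.pc_dev, p->pc_sel.pc_func);
			printf("%s vendor=0x%04x device=0x%04x driver=%s%lu\n",
			    name, p->pc_vendor, p->pc_device,
			    p->pd_name, p->pd_unit);
			total++;
		}
		/* The kernel updated offset and generation; keep going. */
	} while (pc.status == PCI_GETCONF_MORE_DEVS);

	return (total);
}
#endif
```

Because the kernel writes the next
.Va offset
and the current
.Va generation
back into the request, the loop can reuse the same structure until the status
is no longer
.Dv PCI_GETCONF_MORE_DEVS .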
.It PCIOCREAD
This
.Xr ioctl 2
reads the
.Tn PCI
configuration registers specified by the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure consists of the following fields:
.Bl -tag -width pi_width
.It pi_sel
A
.Va pcisel
structure which specifies the domain, bus, slot and function the user would
like to query.
If the specified bus is not found,
.Va errno
will be set to
.Er ENODEV
and -1 returned from the ioctl.
.It pi_reg
The
.Tn PCI
configuration register the user would like to access.
.It pi_width
The width, in bytes, of the data the user would like to read.
This value
may be either 1, 2, or 4.
3-byte reads and reads larger than 4 bytes are
not supported.
If an invalid width is passed,
.Va errno
will be set to
.Er EINVAL .
.It pi_data
The data returned by the kernel.
.El
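.Pp
As a hedged illustration of the
.Va pci_io
fields, the sketch below wraps a single configuration read.
The names
.Fn pci_cfg_width_ok
and
.Fn pci_cfg_read
are hypothetical, and the width check simply mirrors the rule stated above so
that a bad request is rejected before reaching the kernel.
The ioctl part is only compiled on
.Fx .

```c
#include <assert.h>
#include <stdint.h>

/* Widths accepted for configuration access: 1, 2, or 4 bytes. */
int
pci_cfg_width_ok(int width)
{
	return (width == 1 || width == 2 || width == 4);
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/pciio.h>
#include <errno.h>
#include <string.h>

/*
 * Read `width` bytes from configuration register `reg` of device `sel`
 * (hypothetical helper).
 * `fd` must be /dev/pci opened for writing, for example with O_RDWR,
 * because PCIOCREAD requires a writable descriptor.
 * Returns 0 and stores the value in *out, or -1 with errno set.
 */
int
pci_cfg_read(int fd, struct pcisel sel, int reg, int width, uint32_t *out)
{
	struct pci_io io;

	if (!pci_cfg_width_ok(width)) {
		errno = EINVAL;
		return (-1);
	}
	memset(&io, 0, sizeof(io));
	io.pi_sel = sel;
	io.pi_reg = reg;
	io.pi_width = width;
	if (ioctl(fd, PCIOCREAD, &io) == -1)
		return (-1);
	*out = io.pi_data;
	return (0);
}
#endif
```

For example, the 16-bit vendor ID lives at configuration offset 0x00, so a
caller could request register 0 with a width of 2.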
.It PCIOCWRITE
This
.Xr ioctl 2
allows users to write to the
.Tn PCI
configuration registers specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above.
The limitations on data width described for
reading registers, above, also apply to writing
.Tn PCI
configuration registers.
.It PCIOCATTACHED
This
.Xr ioctl 2
allows users to query if a driver is attached to the
.Tn PCI
device specified in the passed-in
.Va pci_io
structure.
The
.Va pci_io
structure is described above; however, the
.Va pi_reg
and
.Va pi_width
fields are not used.
The status of the device is stored in the
.Va pi_data
field.
A value of 0 indicates no driver is attached, while a value larger than 0
indicates that a driver is attached.
.It PCIOCBARMMAP
This
.Xr ioctl 2
command allows a userspace process to
.Xr mmap 2
a memory-mapped PCI BAR into its address space.
The input parameters and results are passed in the
.Va pci_bar_mmap
structure, which has the following fields:
.Bl -tag -width "Vt struct pcisel pbm_sel"
.It Vt uint64_t pbm_map_base
Reports the established mapping base to the caller.
If the
.Dv PCIIO_BAR_MMAP_FIXED
flag was specified, then this field must be filled before the call
with the desired address for the mapping.
.It Vt uint64_t pbm_map_length
Reports the mapped length of the BAR, in bytes.
Its
.Vt uint64_t
value is always a multiple of the machine page size.
.It Vt int64_t pbm_bar_length
Reports the length of the BAR as exposed by the device.
.It Vt int pbm_bar_off
Reports the offset from the mapped base to the start of the
first register in the BAR.
.It Vt struct pcisel pbm_sel
Should be filled before the call.
Describes the device to operate on.
.It Vt int pbm_reg
The BAR index to mmap.
.It Vt int pbm_flags
Flags that modify the operation.
See below.
.It Vt int pbm_memattr
The caching attribute for the mapping.
Typical values are
.Dv VM_MEMATTR_UNCACHEABLE
for control register BARs, and
.Dv VM_MEMATTR_WRITE_COMBINING
for frame buffers.
A regular memory-like BAR should be mapped with the
.Dv VM_MEMATTR_DEFAULT
attribute.
.El
.Pp
Currently defined flags are:
.Bl -tag -width PCIIO_BAR_MMAP_ACTIVATE
.It PCIIO_BAR_MMAP_FIXED
The resulting mapping must be established at the address
specified by the
.Va pbm_map_base
member; otherwise the operation fails.
.It PCIIO_BAR_MMAP_EXCL
Must be used together with
.Dv PCIIO_BAR_MMAP_FIXED .
If the specified base contains already established mappings, the
operation fails instead of implicitly unmapping them.
.It PCIIO_BAR_MMAP_RW
The requested mapping allows both reading and writing.
Without the flag, a read-only mapping is established.
Note that it is common for the device registers to have side effects
even on reads.
.It PCIIO_BAR_MMAP_ACTIVATE
(Unimplemented) If the BAR is not activated, activate it in the course
of mapping.
Currently, an attempt to mmap an inactive BAR results in an error.
.El
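.Pp
The sketch below shows one way the
.Va pci_bar_mmap
results might be used; it is an illustration under assumptions rather than a
definitive implementation.
The helpers
.Fn bar_reg32
and
.Fn bar_peek32
are hypothetical names.
The sketch assumes that
.Dv VM_MEMATTR_UNCACHEABLE
is visible to userland through
.In vm/vm.h ,
and it passes
.Va pbm_reg
through unchanged, so the caller should verify what value the running kernel
expects there.
The ioctl part is only compiled on
.Fx .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The mapping starts on a page boundary, so the first register of the BAR
 * sits bar_off bytes past the mapped base.  Return a pointer to the 32-bit
 * register located reg_off bytes into the BAR.
 */
volatile uint32_t *
bar_reg32(void *map_base, int bar_off, size_t reg_off)
{
	return ((volatile uint32_t *)((uintptr_t)map_base + bar_off + reg_off));
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/pciio.h>
#include <vm/vm.h>		/* assumed to provide VM_MEMATTR_* */
#include <string.h>

/*
 * Map a BAR read-only, read its first 32-bit register into *val, and unmap
 * it again (hypothetical helper).
 * `fd` must be /dev/pci opened for writing; `reg` is stored in pbm_reg as is.
 * Returns 0 on success or -1 with errno set.
 */
int
bar_peek32(int fd, struct pcisel sel, int reg, uint32_t *val)
{
	struct pci_bar_mmap pbm;
	void *base;

	memset(&pbm, 0, sizeof(pbm));
	pbm.pbm_sel = sel;
	pbm.pbm_reg = reg;
	pbm.pbm_flags = 0;	/* no PCIIO_BAR_MMAP_RW: read-only mapping */
	pbm.pbm_memattr = VM_MEMATTR_UNCACHEABLE;	/* control registers */
	if (ioctl(fd, PCIOCBARMMAP, &pbm) == -1)
		return (-1);

	/* The cast works whether the field is a pointer or an integer. */
	base = (void *)(uintptr_t)pbm.pbm_map_base;
	*val = *bar_reg32(base, pbm.pbm_bar_off, 0);
	return (munmap(base, (size_t)pbm.pbm_map_length));
}
#endif
```

Keeping the mapping open instead of unmapping it after one access is usually
the point of this ioctl; the immediate
.Xr munmap 2
here only keeps the sketch short.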
.It PCIOCBARIO
This
.Xr ioctl 2
command allows users to read from and write to BARs.
The I/O request parameters are passed in a
.Vt struct pci_bar_ioreq ,
which has the following fields:
.Bl -tag
.It Vt struct pcisel pbi_sel
Describes the device to operate on.
.It Vt int pbi_op
The operation to perform.
Currently supported values are
.Dv PCIBARIO_READ
and
.Dv PCIBARIO_WRITE .
.It Vt uint32_t pbi_bar
The index of the BAR on which to operate.
.It Vt uint32_t pbi_offset
The offset into the BAR at which to operate.
.It Vt uint32_t pbi_width
The size, in bytes, of the I/O operation.
1-byte, 2-byte, 4-byte and 8-byte operations are supported.
.It Vt uint32_t pbi_value
For reads, the value is returned in this field.
For writes, the caller specifies the value to be written in this field.
.Pp
Note that this operation maps and unmaps the corresponding resource and
so is relatively expensive for memory BARs.
The
.Dv PCIOCBARMMAP
.Xr ioctl 2
can be used to create a persistent userspace mapping for such BARs instead.
.El
.El
.Sh LOADER TUNABLES
Tunables can be set at the
.Xr loader 8
prompt before booting the kernel, or stored in
.Xr loader.conf 5 .
The current value of these tunables can be examined at runtime via
.Xr sysctl 8
nodes of the same name.
Unless otherwise specified,
each of these tunables is a boolean that can be enabled by setting the
tunable to a non-zero value.
.Bl -tag -width indent
.It Va hw.pci.clear_bars Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resources.
This forces the
.Tn PCI
bus driver to allocate resource ranges for memory and I/O port resources
from scratch.
.It Va hw.pci.clear_buses Pq Defaults to 0
Ignore any firmware-assigned bus number registers in PCI-PCI bridges.
This forces the
.Tn PCI
bus driver and PCI-PCI bridge driver to allocate bus numbers for secondary
buses behind PCI-PCI bridges.
.It Va hw.pci.clear_pcib Pq Defaults to 0
Ignore any firmware-assigned memory and I/O port resource windows in PCI-PCI
bridges.
This forces the PCI-PCI bridge driver to allocate memory and I/O port resources
for resource windows from scratch.
.Pp
By default the PCI-PCI bridge driver will allocate windows that
contain the firmware-assigned resources of devices behind the bridge.
In addition, the PCI-PCI bridge driver will suballocate from existing window
regions when possible to satisfy a resource request.
As a result,
both
.Va hw.pci.clear_bars
and
.Va hw.pci.clear_pcib
must be enabled to fully ignore firmware-supplied resource assignments.
.It Va hw.pci.default_vgapci_unit Pq Defaults to -1
By default,
the first
.Tn PCI
VGA adapter encountered by the system is assumed to be the boot display device.
This tunable can be set to choose a specific VGA adapter by specifying the
unit number of the associated
.Va vgapci Ns Ar X
device.
.It Va hw.pci.do_power_nodriver Pq Defaults to 0
Place devices into a low power state
.Pq D3
when a suitable device driver is not found.
Can be set to one of the following values:
.Bl -tag -width indent
.It 3
Powers down all
.Tn PCI
devices without a device driver.
.It 2
Powers down most devices without a device driver.
PCI devices with the display, memory, and base peripheral device classes
are not powered down.
.It 1
Similar to a setting of 2 except that storage controllers are also not
powered down.
.It 0
All devices are left fully powered.
.El
.Pp
A
.Tn PCI
device must support power management to be powered down.
Placing a device into a low power state may not reduce power consumption.
.It Va hw.pci.do_power_resume Pq Defaults to 1
Place
.Tn PCI
devices into the fully powered state when resuming either the system or an
individual device.
Setting this to zero is discouraged as the system will not attempt to power
up non-powered PCI devices after a suspend.
.It Va hw.pci.do_power_suspend Pq Defaults to 1
Place
.Tn PCI
devices into a low power state when suspending either the system or individual
devices.
Normally the D3 state is used as the low power state,
but firmware may override the desired power state during a system suspend.
.It Va hw.pci.enable_ari Pq Defaults to 1
Enable support for PCI-express Alternative RID Interpretation.
This is often used in conjunction with SR-IOV.
.It Va hw.pci.enable_io_modes Pq Defaults to 1
Enable memory or I/O port decoding in a PCI device's command register if it has
firmware-assigned memory or I/O port resources.
The firmware
.Pq BIOS
in some systems does not enable memory or I/O port decoding for some devices
even when it has assigned resources to the device.
This enables decoding for such resources during bus probe.
.It Va hw.pci.enable_msi Pq Defaults to 1
Enable support for Message Signalled Interrupts
.Pq MSI .
MSI interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_msix Pq Defaults to 1
Enable support for extended Message Signalled Interrupts
.Pq MSI-X .
MSI-X interrupts can be disabled by setting this tunable to 0.
.It Va hw.pci.enable_pcie_ei Pq Defaults to 0
Enable support for PCI-express Electromechanical Interlock.
.It Va hw.pci.enable_pcie_hp Pq Defaults to 1
Enable support for native PCI-express HotPlug.
.It Va hw.pci.honor_msi_blacklist Pq Defaults to 1
MSI and MSI-X interrupts are disabled for certain chipsets known to have
broken MSI and MSI-X implementations when this tunable is set.
It can be set to zero to permit use of MSI and MSI-X interrupts if the
chipset match is a false positive.
.It Va hw.pci.iov_max_config Pq Defaults to 1MB
The maximum amount of memory permitted for the configuration parameters
used when creating Virtual Functions via SR-IOV.
This tunable can also be changed at runtime via
.Xr sysctl 8 .
.It Va hw.pci.realloc_bars Pq Defaults to 0
Attempt to allocate a new resource range during the initial device scan
for any memory or I/O port resources with firmware-assigned ranges that
conflict with another active resource.
.It Va hw.pci.usb_early_takeover Pq Defaults to 1 on Tn amd64 and Tn i386
Disable legacy device emulation of USB devices during the initial device
scan.
Set this tunable to zero to use USB devices via legacy emulation when
using a custom kernel without USB controller drivers.
.It Va hw.pci<D>.<B>.<S>.INT<P>.irq
These tunables can be used to override the interrupt routing for legacy
PCI INTx interrupts.
Unlike other tunables in this list,
these do not have corresponding sysctl nodes.
The tunable name includes the address of the PCI device as well as the
pin of the desired INTx IRQ to override:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <P>
The interrupt pin of the PCI slot to override.
One of
.Ql A ,
.Ql B ,
.Ql C ,
or
.Ql D .
.El
.Pp
The value of the tunable is the raw IRQ value to use for the INTx interrupt
pin identified by the tunable name.
Mapping of IRQ values to platform interrupt sources is machine dependent.
.El
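.Pp
As an illustration only, a
.Xr loader.conf 5
fragment using some of the tunables above might look like the following.
The device address (domain 0, bus 2, slot 5) and the IRQ number are made-up
values chosen to show the naming scheme, not recommendations.

```
# Disable MSI and MSI-X, falling back to legacy INTx interrupts.
hw.pci.enable_msi="0"
hw.pci.enable_msix="0"
# Power down every PCI device that has no driver attached.
hw.pci.do_power_nodriver="3"
# Route INTA of the device at domain 0, bus 2, slot 5 to IRQ 10.
hw.pci0.2.5.INTA.irq="10"
```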
.Sh DEVICE WIRING
You can wire the device unit at a given location with
.Xr device.hints 5 .
Entries of the form
.Va hint.<name>.<unit>.at="pci<B>:<S>:<F>"
or
.Va hint.<name>.<unit>.at="pci<D>:<B>:<S>:<F>"
will force the driver
.Va name
to probe and attach at unit
.Va unit
for any PCI device found to match the specification, where:
.Bl -tag -width indent
.It <D>
The domain
.Pq or segment
of the PCI device in decimal.
Defaults to 0 if unspecified.
.It <B>
The bus address of the PCI device in decimal.
.It <S>
The slot of the PCI device in decimal.
.It <F>
The function of the PCI device in decimal.
.El
.Pp
The code to do the matching requires an exact string match.
Do not specify the angle brackets
.Pq < >
in the hints file.
Wiring multiple devices to the same
.Va name
and
.Va unit
produces undefined results.
.Ss Examples
Given the following lines in
.Pa /boot/device.hints :
.Bd -literal -offset indent
hint.nvme.3.at="pci6:0:0"
hint.igb.8.at="pci14:0:0"
.Ed
.Pp
If there is a device that supports
.Xr igb 4
at PCI bus 14 slot 0 function 0,
then it will be assigned igb8 for probe and attach.
Likewise, if there is an
.Xr nvme 4
card at PCI bus 6 slot 0 function 0,
then it will be assigned nvme3 for probe and attach.
If another type of card is in either of these locations, the name and
unit of that card will be the default names and will be unaffected by
these hints.
If other igb or nvme cards are located elsewhere, they will be
assigned their unit numbers sequentially, skipping the unit numbers
that have 'at' hints.
.Sh FILES
.Bl -tag -width /dev/pci -compact
.It Pa /dev/pci
Character device for the
.Nm
driver.
.El
.Sh SEE ALSO
.Xr pciconf 8
.Sh HISTORY
The
.Nm
driver (not the kernel's
.Tn PCI
support code) first appeared in
.Fx 2.2 ,
and was written by Stefan Esser and Garrett Wollman.
Support for device listing and matching was re-implemented by
Kenneth Merry, and first appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
It is not possible for users to specify an accurate offset into the device
list without calling the
.Dv PCIOCGETCONF
ioctl at least once, since they have no way of knowing the current generation
number otherwise.
This probably is not a serious problem, though, since
users can easily narrow their search by specifying a pattern or patterns
for the kernel to match against.