.. SPDX-License-Identifier: GPL-2.0-only

===============================
 Qualcomm Cloud AI 100 (AIC100)
===============================

Overview
========

The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
inference workloads. They are AI accelerators.

The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
(x8). An individual SoC on a card can have up to 16 NSPs for running workloads.
Each SoC has an A53 management CPU. On card, there can be up to 32 GB of DDR.

Multiple AIC100 cards can be hosted in a single system to scale overall
performance. AIC100 cards are multi-user capable and able to execute workloads
from multiple users in a concurrent manner.

Hardware Description
====================

An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of
miscellaneous peripherals (PMICs, etc.).

An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
or a Dual M.2 card. Both use PCIe to connect to the host system.

As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
DeviceID(DID) combination to uniquely identify itself to the host. AIC100
uses the standard Qualcomm VID (0x17cb). All AIC100 SKUs use the same
AIC100 DID (0xa100).

AIC100 does not implement FLR (function level reset).

AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge). Because MSI vectors are allocated
in powers of two, obtaining 17 MSIs means reserving 32. Falling back to 1 MSI
is possible in scenarios where reserving 32 MSIs isn't feasible.
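
The vector-count choice described above can be sketched as follows. This is an
illustration only: ``aic100_msi_request_count()`` and its ``msi_available``
parameter are hypothetical names, not part of the qaic driver, and the fallback
policy shown is simply one reading of the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

#define AIC100_MSI_MHI        1u  /* one vector for MHI */
#define AIC100_MSI_DMA_BRIDGE 16u /* one vector per DMA Bridge channel */

static_assert(AIC100_MSI_MHI + AIC100_MSI_DMA_BRIDGE == 17u,
	      "AIC100 prefers 17 MSIs");

/* Round n up to the next power of two (valid for 1 <= n <= 2^31). */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical policy: request the full power-of-two block when the
 * platform can reserve it, otherwise fall back to one shared vector.
 */
static uint32_t aic100_msi_request_count(uint32_t msi_available)
{
	uint32_t wanted = round_up_pow2(AIC100_MSI_MHI + AIC100_MSI_DMA_BRIDGE);

	return msi_available >= wanted ? wanted : 1u;
}
```

With 32 vectors available the sketch requests 32; with fewer it requests 1, in
which case MHI and all DMA Bridge channels would share that vector.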

As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
hardware. AIC100 provides three 64-bit BARs.

* The first BAR is 4K in size, and exposes the MHI interface to the host.

* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
  host.

* The third BAR is variable in size based on an individual AIC100's
  configuration, but defaults to 64K. This BAR currently has no purpose.

From the host perspective, AIC100 has several key hardware components:

* MHI (Modem Host Interface)
* QSM (QAIC Service Manager)
* NSPs (Neural Signal Processors)
* DMA Bridge
* DDR

MHI
---

AIC100 has one MHI interface over PCIe. MHI itself is documented at
Documentation/mhi/index.rst. MHI is the mechanism the host uses to communicate
with the QSM. Except for workload data via the DMA Bridge, all interaction with
the device occurs via MHI.

QSM
---

QAIC Service Manager. This is an ARM A53 CPU that runs the primary
firmware of the card and performs on-card management tasks. It also
communicates with the host via MHI. Each AIC100 has one of
these.

NSP
---

Neural Signal Processor. Each AIC100 has up to 16 of these. These are
the processors that run the workloads on AIC100. Each NSP is a Qualcomm Hexagon
(Q6) DSP with HVX and HMX. Each NSP can only run one workload at a time, but
multiple NSPs may be assigned to a single workload. Since each NSP can only run
one workload, AIC100 is limited to 16 concurrent workloads. Workload
"scheduling" is under the purview of the host. AIC100 does not automatically
timeslice.

DMA Bridge
----------

The DMA Bridge is a custom DMA engine that manages the flow of data
in and out of workloads. AIC100 has one of these. The DMA Bridge has 16
channels, each consisting of a set of request/response FIFOs. Each active
workload is assigned a single DMA Bridge channel. The DMA Bridge exposes
hardware registers to manage the FIFOs (head/tail pointers), but requires host
memory to store the FIFOs.

DDR
---

AIC100 has on-card DDR. In total, an AIC100 can have up to 32 GB of DDR.
This DDR stores workloads and the data for the workloads, and is also used by
the QSM for managing the device. NSPs are granted access to sections of the
DDR by the QSM. The host does not have direct access to the DDR, and must make
requests to the QSM to transfer data to the DDR.

High-level Use Flow
===================

AIC100 is a multi-user, programmable accelerator typically used for running
neural networks in inferencing mode to efficiently perform AI operations.
AIC100 is not intended for training neural networks. AIC100 can also be
utilized for generic compute workloads.

Assuming a user wants to utilize AIC100, they would follow these steps:

1. Compile the workload into an ELF targeting the NSP(s).
2. Make requests to the QSM to load the workload and related artifacts into the
   device DDR.
3. Make a request to the QSM to activate the workload onto a set of idle NSPs.
4. Make requests to the DMA Bridge to send input data to the workload to be
   processed, and other requests to receive processed output data from the
   workload.
5. Once the workload is no longer required, make a request to the QSM to
   deactivate the workload, thus putting the NSPs back into an idle state.
6. Once the workload and related artifacts are no longer needed for future
   sessions, make requests to the QSM to unload the data from DDR. This frees
   the DDR to be used by other users.
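
The ordering of steps 2 through 6 can be pictured as a small state machine. The
sketch below is a simplification inferred from the list above, not a description
of how the QSM tracks workloads; the ``wl_state`` and ``wl_event`` names and the
transition rules are assumptions made for illustration. Step 1 (compilation)
happens entirely on the host and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum wl_state { WL_UNLOADED, WL_LOADED, WL_ACTIVE };

enum wl_event {
	WL_EV_LOAD,       /* step 2: QSM loads artifacts into DDR */
	WL_EV_ACTIVATE,   /* step 3: QSM activates onto idle NSPs */
	WL_EV_TRANSFER,   /* step 4: DMA Bridge moves input/output data */
	WL_EV_DEACTIVATE, /* step 5: QSM returns the NSPs to idle */
	WL_EV_UNLOAD,     /* step 6: QSM frees the DDR */
};

/* Apply ev to *state; returns false if the step is out of order. */
static bool wl_step(enum wl_state *state, enum wl_event ev)
{
	assert(state != NULL);

	switch (ev) {
	case WL_EV_LOAD:
		if (*state != WL_UNLOADED)
			return false;
		*state = WL_LOADED;
		return true;
	case WL_EV_ACTIVATE:
		if (*state != WL_LOADED)
			return false;
		*state = WL_ACTIVE;
		return true;
	case WL_EV_TRANSFER:
		/* data only flows while the workload is active */
		return *state == WL_ACTIVE;
	case WL_EV_DEACTIVATE:
		if (*state != WL_ACTIVE)
			return false;
		*state = WL_LOADED;
		return true;
	case WL_EV_UNLOAD:
		if (*state != WL_LOADED)
			return false;
		*state = WL_UNLOADED;
		return true;
	}
	return false;
}
```

Note that deactivation returns to the loaded state rather than the unloaded
one, which reflects that artifacts stay in DDR until explicitly unloaded.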

Boot Flow
=========

AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.

When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
from ROM. PBL enumerates the PCIe link, and initializes the BHI (Boot Host
Interface) component of MHI.

Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
image. The PBL pulls the image from the host, validates it, and begins
execution of SBL.

SBL initializes MHI, and uses MHI to notify the host that the device has entered
the SBL stage. SBL performs a number of operations:

* SBL initializes the majority of hardware (anything PBL left uninitialized),
  including DDR.
* SBL offloads the bootlog to the host.
* SBL synchronizes timestamps with the host for future logging.
* SBL uses the Sahara protocol to obtain the runtime firmware images from the
  host.

Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
of reset, and jumps into the QSM.

The QSM uses MHI to notify the host that the device has entered the QSM stage
(AMSS in MHI terms). At this point, the AIC100 device is fully functional, and
ready to process workloads.
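
As a compact summary of the stages above, the following sketch enumerates them
and records what the host contributes at each one. The enum and function names
are hypothetical and exist only for this illustration; they do not correspond
to MHI or qaic driver identifiers.

```c
#include <assert.h>

enum aic100_boot_stage {
	AIC100_STAGE_PBL, /* Primary Bootloader, executes from ROM */
	AIC100_STAGE_SBL, /* Secondary Bootloader, pulled from the host */
	AIC100_STAGE_QSM, /* runtime firmware; AMSS in MHI terms */
	AIC100_STAGE_COUNT
};

static_assert(AIC100_STAGE_COUNT == 3, "three boot stages are described");

/* Stage that follows s; QSM is the final stage. */
static enum aic100_boot_stage aic100_next_stage(enum aic100_boot_stage s)
{
	switch (s) {
	case AIC100_STAGE_PBL:
		return AIC100_STAGE_SBL;
	default:
		return AIC100_STAGE_QSM;
	}
}

/* What the host provides so the device can leave stage s. */
static const char *aic100_host_supplies(enum aic100_boot_stage s)
{
	switch (s) {
	case AIC100_STAGE_PBL:
		return "SBL image, located via BHI";
	case AIC100_STAGE_SBL:
		return "runtime firmware images, via the Sahara protocol";
	default:
		return "nothing further";
	}
}
```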

Userspace components
====================

Compiler
--------

An open compiler for AIC100 based on upstream LLVM can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc

Usermode Driver (UMD)
---------------------

An open UMD that interfaces with the qaic kernel driver can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100

Sahara loader
-------------

An open implementation of the Sahara protocol called kickstart can be found at:
https://github.com/andersson/qdl

MHI Channels
============

AIC100 defines a number of MHI channels for different purposes. This is a list
of the defined channels, and their uses.

+----------------+---------+----------+----------------------------------------+
| Channel name   | IDs     | EEs      | Purpose                                |
+================+=========+==========+========================================+
| QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any data sent to the device on this    |
|                |         |          | channel is sent back to the host.      |
+----------------+---------+----------+----------------------------------------+
| QAIC_SAHARA    | 2 & 3   | SBL      | Used by SBL to obtain the runtime      |
|                |         |          | firmware from the host.                |
+----------------+---------+----------+----------------------------------------+
| QAIC_DIAG      | 4 & 5   | AMSS     | Used to communicate with QSM via the   |
|                |         |          | DIAG protocol.                         |
+----------------+---------+----------+----------------------------------------+
| QAIC_SSR       | 6 & 7   | AMSS     | Used to notify the host of subsystem   |
|                |         |          | restart events, and to offload SSR     |
|                |         |          | crashdumps.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_QDSS      | 8 & 9   | AMSS     | Used for the Qualcomm Debug Subsystem. |
+----------------+---------+----------+----------------------------------------+
| QAIC_CONTROL   | 10 & 11 | AMSS     | Used for the Neural Network Control    |
|                |         |          | (NNC) protocol. This is the primary    |
|                |         |          | channel between host and QSM for       |
|                |         |          | managing workloads.                    |
+----------------+---------+----------+----------------------------------------+
| QAIC_LOGGING   | 12 & 13 | SBL      | Used by the SBL to send the bootlog to |
|                |         |          | the host.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_STATUS    | 14 & 15 | AMSS     | Used to notify the host of Reliability,|
|                |         |          | Availability, Serviceability (RAS)     |
|                |         |          | events.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TELEMETRY | 16 & 17 | AMSS     | Used to get/set power/thermal/etc      |
|                |         |          | attributes.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 20 & 21 | SBL      | Used to synchronize timestamps in the  |
|                |         |          | device side logs with the host time    |
|                |         |          | source.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 22 & 23 | AMSS     | Used to periodically synchronize       |
| _PERIODIC      |         |          | timestamps in the device side logs with|
|                |         |          | the host time source.                  |
+----------------+---------+----------+----------------------------------------+
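
For quick reference, the table can be restated as a lookup structure. The
sketch below is a plain userspace illustration and not the MHI channel
configuration used by the driver; the struct, the ``id_lo``/``id_hi`` field
names and ``aic100_find_chan()`` are hypothetical. The table does not say which
ID of each pair carries which direction, so the sketch does not either.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum aic100_ee { AIC100_EE_SBL, AIC100_EE_AMSS };

struct aic100_mhi_chan {
	const char *name;
	unsigned int id_lo; /* first ID of the pair */
	unsigned int id_hi; /* second ID of the pair */
	enum aic100_ee ee;  /* execution environment the channel is valid in */
};

static const struct aic100_mhi_chan aic100_chans[] = {
	{ "QAIC_LOOPBACK",           0,  1, AIC100_EE_AMSS },
	{ "QAIC_SAHARA",             2,  3, AIC100_EE_SBL  },
	{ "QAIC_DIAG",               4,  5, AIC100_EE_AMSS },
	{ "QAIC_SSR",                6,  7, AIC100_EE_AMSS },
	{ "QAIC_QDSS",               8,  9, AIC100_EE_AMSS },
	{ "QAIC_CONTROL",           10, 11, AIC100_EE_AMSS },
	{ "QAIC_LOGGING",           12, 13, AIC100_EE_SBL  },
	{ "QAIC_STATUS",            14, 15, AIC100_EE_AMSS },
	{ "QAIC_TELEMETRY",         16, 17, AIC100_EE_AMSS },
	{ "QAIC_DEBUG",             18, 19, AIC100_EE_AMSS },
	{ "QAIC_TIMESYNC",          20, 21, AIC100_EE_SBL  },
	{ "QAIC_TIMESYNC_PERIODIC", 22, 23, AIC100_EE_AMSS },
};

static_assert(sizeof(aic100_chans) / sizeof(aic100_chans[0]) == 12,
	      "the table lists 12 channels");

/* Return the table entry for name, or NULL if no such channel is listed. */
static const struct aic100_mhi_chan *aic100_find_chan(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(aic100_chans) / sizeof(aic100_chans[0]); i++)
		if (strcmp(aic100_chans[i].name, name) == 0)
			return &aic100_chans[i];
	return NULL;
}
```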

DMA Bridge
==========

Overview
--------

The DMA Bridge is one of the main interfaces to the host from the device
(the other being MHI). As part of activating a workload to run on NSPs, the QSM
assigns that workload a DMA Bridge channel. A workload's DMA Bridge channel
(DBC for short) is solely for the use of that workload and is not shared with
other workloads.

Each DBC is a pair of FIFOs that manage data in and out of the workload. One
FIFO is the request FIFO. The other FIFO is the response FIFO.

Each DBC contains 4 registers in hardware:

* Request FIFO head pointer (offset 0x0). Read only by the host. Indicates the
  latest item in the FIFO the device has consumed.
* Request FIFO tail pointer (offset 0x4). Read/write by the host. Host
  increments this register to add new items to the FIFO.
* Response FIFO head pointer (offset 0x8). Read/write by the host. Indicates
  the latest item in the FIFO the host has consumed.
* Response FIFO tail pointer (offset 0xc). Read only by the host. Device
  increments this register to add new items to the FIFO.

The values in each register are indexes in the FIFO. To get the location of the
FIFO element pointed to by the register: FIFO base address + register value *
element size.

DBC registers are exposed to the host via the second BAR. Each DBC consumes
4KB of space in the BAR.
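
The register offsets and the element-address formula above translate into a
few lines of arithmetic. In this sketch the helper names are hypothetical, and
two things are assumed rather than stated by this document: that the 4KB
register windows of consecutive DBCs are laid out back to back, and that the
offset of the first window within the BAR is supplied by the caller as
``region_base``.

```c
#include <assert.h>
#include <stdint.h>

#define DBC_COUNT        16u     /* the DMA Bridge has 16 channels */
#define DBC_REG_STRIDE   0x1000u /* each DBC consumes 4KB of the BAR */
#define DBC_REQ_HEAD_OFF 0x0u    /* request FIFO head, read only */
#define DBC_REQ_TAIL_OFF 0x4u    /* request FIFO tail, read/write */
#define DBC_RSP_HEAD_OFF 0x8u    /* response FIFO head, read/write */
#define DBC_RSP_TAIL_OFF 0xcu    /* response FIFO tail, read only */

/*
 * Byte offset of one DBC register from the start of the second BAR.
 * region_base (offset of DBC 0 in the BAR) is an assumption of this sketch.
 */
static uint32_t dbc_reg_offset(uint32_t region_base, uint32_t dbc_id,
			       uint32_t reg_off)
{
	assert(dbc_id < DBC_COUNT);
	return region_base + dbc_id * DBC_REG_STRIDE + reg_off;
}

/* FIFO base address + register value * element size. */
static uint64_t fifo_elem_addr(uint64_t fifo_base, uint32_t index,
			       uint32_t elem_size)
{
	return fifo_base + (uint64_t)index * elem_size;
}
```

For example, with a ``region_base`` of 0 the response tail pointer of DBC 3
would sit at offset 0x300c, and index 2 of a FIFO of 64-byte elements based at
0x1000 would be at 0x1080.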

The actual FIFOs are backed by host memory. When sending a request to the QSM
to activate a workload, the host must donate memory to be used for the FIFOs.
Due to internal mapping limitations of the device, a single contiguous chunk of
memory must be provided per DBC, which hosts both FIFOs. The request FIFO will
consume the beginning of the memory chunk, and the response FIFO will consume
the end of the memory chunk.
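
A minimal sketch of that placement follows. It assumes both FIFOs hold the same
number of elements, which this document does not state, and it takes the element
sizes (64 and 4 bytes) from the structure layouts given in the Request FIFO and
Response FIFO sections. ``dbc_fifo_layout()`` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REQ_ELEM_SIZE 64u /* bytes in one request FIFO element */
#define RSP_ELEM_SIZE 4u  /* bytes in one response FIFO element */

struct dbc_fifo_layout {
	uint64_t req_base; /* request FIFO: start of the chunk */
	uint64_t rsp_base; /* response FIFO: tail end of the chunk */
};

/*
 * Place both FIFOs inside one contiguous chunk. Returns false if the
 * chunk cannot hold nelem request and nelem response elements.
 */
static bool dbc_fifo_layout(uint64_t chunk_base, uint64_t chunk_size,
			    uint32_t nelem, struct dbc_fifo_layout *out)
{
	uint64_t req_bytes = (uint64_t)nelem * REQ_ELEM_SIZE;
	uint64_t rsp_bytes = (uint64_t)nelem * RSP_ELEM_SIZE;

	assert(out != NULL);
	if (nelem == 0 || req_bytes + rsp_bytes > chunk_size)
		return false;

	out->req_base = chunk_base;
	out->rsp_base = chunk_base + chunk_size - rsp_bytes;
	return true;
}
```

Any space left between the end of the request FIFO and the start of the
response FIFO is simply unused in this model.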

Request FIFO
------------

A request FIFO element has the following structure:

.. code-block:: c

  /* reserved members are numbered so that every member name is unique */
  struct request_elem {
	u16 req_id;
	u8  seq_id;
	u8  pcie_dma_cmd;
	u32 reserved0;
	u64 pcie_dma_source_addr;
	u64 pcie_dma_dest_addr;
	u32 pcie_dma_len;
	u32 reserved1;
	u64 doorbell_addr;
	u8  doorbell_attr;
	u8  reserved2;
	u16 reserved3;
	u32 doorbell_data;
	u32 sem_cmd0;
	u32 sem_cmd1;
	u32 sem_cmd2;
	u32 sem_cmd3;
  };

Request field descriptions:

req_id
	request ID. A request FIFO element and a response FIFO element with
	the same request ID refer to the same command.

seq_id
	sequence ID within a request. Ignored by the DMA Bridge.

pcie_dma_cmd
	describes the DMA element of this request.

	* Bit(7) is the force MSI flag. If QSM has configured the DMA Bridge
	  to look at this bit, it overrides the DMA Bridge MSI logic and
	  generates an MSI when this request is complete.
	* Bits(6:5) are reserved.
	* Bit(4) is the completion code flag, and indicates that the DMA Bridge
	  shall generate a response FIFO element when this request is
	  complete.
	* Bit(3) indicates if this request is a linked list transfer(0) or a bulk
	  transfer(1).
	* Bit(2) is reserved.
	* Bits(1:0) indicate the type of transfer. No transfer(0), to device(1),
	  from device(2). Value 3 is illegal.

pcie_dma_source_addr
	source address for a bulk transfer, or the address of the linked list.

pcie_dma_dest_addr
	destination address for a bulk transfer.

pcie_dma_len
	length of the bulk transfer. Note that the size of this field
	limits transfers to 4G in size.

doorbell_addr
	address of the doorbell to ring when this request is complete.

doorbell_attr
	doorbell attributes.

	* Bit(7) indicates if a write to a doorbell is to occur.
	* Bits(6:2) are reserved.
	* Bits(1:0) contain the encoding of the doorbell length. 0 is 32-bit,
	  1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
	  must be naturally aligned to the specified length.

doorbell_data
	data to write to the doorbell. Only the bits corresponding to
	the doorbell length are valid.

sem_cmdN
	semaphore command.

	* Bit(31) indicates this semaphore command is enabled.
	* Bit(30) is the to-device DMA fence. Block this request until all
	  to-device DMA transfers are complete.
	* Bit(29) is the from-device DMA fence. Block this request until all
	  from-device DMA transfers are complete.
	* Bits(28:27) are reserved.
	* Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
	  specified value. 2 is increment. 3 is decrement. 4 is wait
	  until the semaphore is equal to the specified value. 5 is wait
	  until the semaphore is greater than or equal to the specified
	  value. 6 is "P", wait until semaphore is greater than 0, then
	  decrement by 1. 7 is reserved.
	* Bit(23) is reserved.
	* Bit(22) is the semaphore sync. 0 is post sync, which means that the
	  semaphore operation is done after the DMA transfer. 1 is
	  presync, which gates the DMA transfer. Only one presync is
	  allowed per request.
	* Bit(21) is reserved.
	* Bits(20:16) are the index of the semaphore to operate on.
	* Bits(15:12) are reserved.
	* Bits(11:0) are the semaphore value to use in operations.
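
The bit layouts of ``pcie_dma_cmd``, ``doorbell_attr`` and ``sem_cmdN`` can be
packed with helpers along the following lines. The macro, enum and function
names are hypothetical and chosen for this sketch only; the bit positions are
the ones listed above, and reserved bits are left at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* pcie_dma_cmd */
#define DMA_CMD_FORCE_MSI  (1u << 7)
#define DMA_CMD_COMPLETION (1u << 4)
#define DMA_CMD_BULK       (1u << 3) /* 0 = linked list, 1 = bulk */
enum xfer_type { XFER_NONE = 0, XFER_TO_DEV = 1, XFER_FROM_DEV = 2 };

static uint8_t encode_dma_cmd(enum xfer_type type, bool bulk,
			      bool completion, bool force_msi)
{
	uint8_t cmd;

	assert((unsigned int)type <= XFER_FROM_DEV); /* value 3 is illegal */
	cmd = (uint8_t)type;
	if (bulk)
		cmd |= DMA_CMD_BULK;
	if (completion)
		cmd |= DMA_CMD_COMPLETION;
	if (force_msi)
		cmd |= DMA_CMD_FORCE_MSI;
	return cmd;
}

/* doorbell_attr */
enum db_len { DB_LEN_32 = 0, DB_LEN_16 = 1, DB_LEN_8 = 2 };

static uint8_t encode_doorbell_attr(bool write, enum db_len len)
{
	return (uint8_t)((write ? 1u << 7 : 0u) | ((unsigned int)len & 0x3u));
}

/* sem_cmdN */
#define SEM_ENABLE         (1u << 31)
#define SEM_FENCE_TO_DEV   (1u << 30)
#define SEM_FENCE_FROM_DEV (1u << 29)
#define SEM_PRESYNC        (1u << 22)
enum sem_op {
	SEM_NOP = 0, SEM_INIT = 1, SEM_INC = 2, SEM_DEC = 3,
	SEM_WAIT_EQ = 4, SEM_WAIT_GE = 5, SEM_P = 6,
};

/* fences is any OR of SEM_FENCE_TO_DEV and SEM_FENCE_FROM_DEV, or 0. */
static uint32_t encode_sem_cmd(enum sem_op op, uint32_t index, uint32_t value,
			       bool presync, uint32_t fences)
{
	assert(index < 32u && value < 4096u); /* 5-bit index, 12-bit value */
	return SEM_ENABLE |
	       (fences & (SEM_FENCE_TO_DEV | SEM_FENCE_FROM_DEV)) |
	       ((uint32_t)op << 24) |
	       (presync ? SEM_PRESYNC : 0u) |
	       (index << 16) |
	       value;
}
```

As an example, a bulk to-device transfer that requests a completion element
encodes as 0x19, and a presync "wait until semaphore 3 is at least 1" command
encodes as 0x85430001.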

Overall, a request is processed in 4 steps:

1. If specified, the presync semaphore condition must be true
2. If enabled, the DMA transfer occurs
3. If specified, the postsync semaphore conditions must be true
4. If enabled, the doorbell is written

By using the semaphores in conjunction with the workload running on the NSPs,
the data pipeline can be synchronized such that the host can queue multiple
requests of data for the workload to process, but the DMA Bridge will only copy
the data into the memory of the workload when the workload is ready to process
the next input.
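
To make the gating behaviour concrete, the following is a host-side software
model of how a single semaphore command acts on a semaphore value. It is a
reading aid, not a description of the hardware: ``sem_apply()`` is a
hypothetical name, and overflow or underflow behaviour, which this document
does not describe, is left unmodelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sem_op {
	SEM_NOP = 0, SEM_INIT = 1, SEM_INC = 2, SEM_DEC = 3,
	SEM_WAIT_EQ = 4, SEM_WAIT_GE = 5, SEM_P = 6,
};

/*
 * Apply one command to *sem. Returns true if the command completed and
 * false if the request would stay blocked on this semaphore.
 */
static bool sem_apply(uint32_t *sem, enum sem_op op, uint32_t value)
{
	assert(sem != NULL);

	switch (op) {
	case SEM_NOP:
		return true;
	case SEM_INIT:
		*sem = value;
		return true;
	case SEM_INC:
		(*sem)++;
		return true;
	case SEM_DEC:
		(*sem)--; /* behaviour at zero is not specified here */
		return true;
	case SEM_WAIT_EQ:
		return *sem == value;
	case SEM_WAIT_GE:
		return *sem >= value;
	case SEM_P:
		if (*sem == 0)
			return false;
		(*sem)--;
		return true;
	}
	return false;
}
```

In this model, a queued request whose presync command is "P" stays blocked
while the semaphore is 0. Once the workload signals readiness by incrementing
the semaphore, the "P" succeeds, the semaphore drops back to 0, and the DMA
transfer for that one request may proceed.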

Response FIFO
-------------

Once a request is fully processed, a response FIFO element is generated if
specified in pcie_dma_cmd. The structure of a response FIFO element:

.. code-block:: c

  struct response_elem {
	u16 req_id;
	u16 completion_code;
  };

req_id
	matches the req_id of the request that generated this element.

completion_code
	status of this request. 0 is success. Non-zero is an error.

The DMA Bridge will generate an MSI to the host as a reaction to activity in the
response FIFO of a DBC. The DMA Bridge hardware has an IRQ storm mitigation
algorithm, where it will only generate an MSI when the response FIFO transitions
from empty to non-empty (unless force MSI is enabled and triggered). In
response to this MSI, the host is expected to drain the response FIFO, and must
take care to handle any race conditions between draining the FIFO, and the
device inserting elements into the FIFO.
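
One way to handle that race is to re-read the tail pointer after publishing the
new head, and to keep draining until the two agree. The sketch below models the
FIFO in plain memory; a real driver would use MMIO accessors and appropriate
barriers. It assumes the usual ring conventions (indexes wrap modulo the element
count, and the FIFO is empty when head equals tail), which this document does
not spell out. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct response_elem {
	uint16_t req_id;
	uint16_t completion_code;
};

static_assert(sizeof(struct response_elem) == 4, "response element is 4 bytes");

/* Software stand-in for the response side of one DBC. */
struct rsp_fifo {
	const struct response_elem *elems; /* host memory backing the FIFO */
	uint32_t nelem;                    /* FIFO capacity in elements */
	volatile uint32_t head;            /* host-written register */
	volatile uint32_t tail;            /* device-written register */
};

typedef void (*rsp_handler)(const struct response_elem *e, void *ctx);

/* Drain the FIFO, calling fn on each element; returns the number handled. */
static unsigned int rsp_fifo_drain(struct rsp_fifo *f, rsp_handler fn,
				   void *ctx)
{
	unsigned int handled = 0;
	uint32_t head = f->head;
	uint32_t tail = f->tail;

	assert(f->nelem != 0);
	while (head != tail) {
		while (head != tail) {
			fn(&f->elems[head], ctx);
			head = (head + 1) % f->nelem;
			handled++;
		}
		f->head = head; /* publish consumption to the device */
		tail = f->tail; /* re-read: elements may have arrived meanwhile */
	}
	return handled;
}

/* Example handler: counts responses with a non-zero completion code. */
static void count_errors(const struct response_elem *e, void *ctx)
{
	if (e->completion_code != 0)
		(*(unsigned int *)ctx)++;
}
```

The outer loop matters because an element added while the host is still
draining does not cause an empty to non-empty transition, so no further MSI
would announce it.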

Neural Network Control (NNC) Protocol
=====================================

The NNC protocol is how the host makes requests to the QSM to manage workloads.
It uses the QAIC_CONTROL MHI channel.

Each NNC request is packaged into a message. Each message is a series of
transactions. A passthrough type transaction can contain elements known as
commands.

QSM requires NNC messages be little endian encoded and the fields be naturally
aligned. Since there are 64-bit elements in some NNC messages, 64-bit alignment
must be maintained.

A message contains a header and then a series of transactions. A message may be
at most 4K in size from QSM to the host. From the host to the QSM, a message
can be at most 64K (maximum size of a single MHI packet), but there is a
continuation feature where message N+1 can be marked as a continuation of
message N. This is used for exceedingly large DMA xfer transactions.
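
The size and alignment rules above can be expressed as two small helpers. This
is a sketch under stated assumptions: the names are hypothetical, the message
header layout is not modelled, and treating an oversized QSM-to-host message as
an error is an interpretation, since continuation is only described for the
host-to-QSM direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NNC_MAX_FROM_QSM ((size_t)4 * 1024)  /* QSM to host */
#define NNC_MAX_TO_QSM   ((size_t)64 * 1024) /* host to QSM, one MHI packet */
#define NNC_ALIGN        ((size_t)8)         /* 64-bit alignment */

static_assert(NNC_MAX_TO_QSM % NNC_ALIGN == 0, "limit is 64-bit aligned");

enum nnc_fit { NNC_FITS, NNC_NEEDS_CONTINUATION, NNC_TOO_LARGE };

/* Round len up so the next transaction starts 64-bit aligned. */
static size_t nnc_align_up(size_t len)
{
	return (len + NNC_ALIGN - 1) & ~(NNC_ALIGN - 1);
}

/* Classify a message of len bytes for the given direction. */
static enum nnc_fit nnc_classify(size_t len, bool to_qsm)
{
	if (to_qsm)
		return len <= NNC_MAX_TO_QSM ? NNC_FITS
					     : NNC_NEEDS_CONTINUATION;
	return len <= NNC_MAX_FROM_QSM ? NNC_FITS : NNC_TOO_LARGE;
}
```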

Transaction descriptions
------------------------

passthrough
	Allows userspace to send an opaque payload directly to the QSM.
	This is used for NNC commands. Userspace is responsible for managing
	the QSM message requirements in the payload.

dma_xfer
	DMA transfer. Describes an object that the QSM should DMA into the
	device via address and size tuples.

activate
	Activate a workload onto NSPs. The host must provide memory to be
	used by the DBC.

deactivate
	Deactivate an active workload and return the NSPs to idle.

status
	Query the QSM about its NNC implementation. Returns the NNC version,
	and if CRC is used.

terminate
	Release a user's resources.

dma_xfer_cont
	Continuation of a previous DMA transfer. If a DMA transfer
	cannot be specified in a single message (highly fragmented), this
	transaction can be used to specify more ranges.

validate_partition
	Query the QSM to determine if a partition identifier is valid.

Each message is tagged with a user id, and a partition id. The user id allows
QSM to track resources, and release them when the user goes away (e.g. the
process crashes). A partition id identifies the resource partition that QSM
manages, which this message applies to.

Messages may have CRCs. Messages should have CRCs applied until the QSM
reports via the status transaction that CRCs are not needed. The QSM on the
SA9000P requires CRCs for black channel safing.

Subsystem Restart (SSR)
=======================

SSR is the concept of limiting the impact of an error. An AIC100 device may
have multiple users, each with their own workload running. If the workload of
one user crashes, the fallout of that should be limited to that workload and not
impact other workloads. SSR accomplishes this.

If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
channel. This notification identifies the workload by its assigned DBC. A
multi-stage recovery process is then used to clean up both sides, and get the
DBC/NSPs into a working state.

When SSR occurs, any state in the workload is lost. Any inputs that were in
process, or queued but not yet serviced, are lost. The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it desires to recover the workload.

Reliability, Availability, Serviceability (RAS)
===============================================

AIC100 is expected to be deployed in server systems where RAS ideology is
applied. Simply put, RAS is the concept of detecting, classifying, and
reporting errors. While PCIe has AER (Advanced Error Reporting) which factors
into RAS, AER does not allow for a device to report details about internal
errors. Therefore, AIC100 implements a custom RAS mechanism. When a RAS event
occurs, QSM will report the event with appropriate details via the QAIC_STATUS
MHI channel. A sysadmin may determine that a particular device needs
additional service based on RAS reports.

Telemetry
=========

QSM has the ability to report various physical attributes of the device, and in
some cases, to allow the host to control them. Examples include thermal limits,
thermal readings, and power readings. These items are communicated via the
QAIC_TELEMETRY MHI channel.