.. SPDX-License-Identifier: GPL-2.0-only

===============================
 Qualcomm Cloud AI 100 (AIC100)
===============================

Overview
========

The Qualcomm Cloud AI 100/AIC100 family of products (including SA9000P - part of
Snapdragon Ride) are PCIe adapter cards which contain a dedicated SoC ASIC for
the purpose of efficiently running Artificial Intelligence (AI) Deep Learning
inference workloads. They are AI accelerators.

The PCIe interface of AIC100 is capable of PCIe Gen4 speeds over eight lanes
(x8). An individual SoC on a card can have up to 16 NSPs for running workloads.
Each SoC has an A53 management CPU. On card, there can be up to 32 GB of DDR.

Multiple AIC100 cards can be hosted in a single system to scale overall
performance. AIC100 cards are multi-user capable and able to execute workloads
from multiple users concurrently.

Hardware Description
====================

An AIC100 card consists of an AIC100 SoC, on-card DDR, and a set of
miscellaneous peripherals (PMICs, etc.).

An AIC100 card can either be a PCIe HHHL form factor (a traditional PCIe card),
or a Dual M.2 card. Both use PCIe to connect to the host system.

As a PCIe endpoint/adapter, AIC100 uses the standard VendorID(VID)/
DeviceID(DID) combination to uniquely identify itself to the host. AIC100
uses the standard Qualcomm VID (0x17cb). All AIC100 SKUs use the same
AIC100 DID (0xa100).

AIC100 does not implement FLR (function level reset).

AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to
operate (1 for MHI, 16 for the DMA Bridge). Because MSI vectors are allocated
in powers of two, this means reserving 32 MSIs. Falling back to 1 MSI is
possible in scenarios where reserving 32 MSIs isn't feasible.

As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device
hardware. AIC100 provides three 64-bit BARs.

* The first BAR is 4K in size, and exposes the MHI interface to the host.

* The second BAR is 2M in size, and exposes the DMA Bridge interface to the
  host.

* The third BAR is variable in size based on an individual AIC100's
  configuration, but defaults to 64K. This BAR currently has no purpose.

From the host perspective, AIC100 has several key hardware components:

* MHI (Modem Host Interface)
* QSM (QAIC Service Manager)
* NSPs (Neural Signal Processors)
* DMA Bridge
* DDR

MHI
---

AIC100 has one MHI interface over PCIe.
MHI itself is documented at
Documentation/mhi/index.rst. MHI is the mechanism the host uses to communicate
with the QSM. Except for workload data via the DMA Bridge, all interaction with
the device occurs via MHI.

QSM
---

QAIC Service Manager. This is an ARM A53 CPU that runs the primary
firmware of the card and performs on-card management tasks. It also
communicates with the host via MHI. Each AIC100 has one of
these.

NSP
---

Neural Signal Processor. Each AIC100 has up to 16 of these. These are
the processors that run the workloads on AIC100. Each NSP is a Qualcomm Hexagon
(Q6) DSP with HVX and HMX. Each NSP can only run one workload at a time, but
multiple NSPs may be assigned to a single workload. Since each NSP can only run
one workload, AIC100 is limited to 16 concurrent workloads. Workload
"scheduling" is under the purview of the host. AIC100 does not automatically
timeslice.

DMA Bridge
----------

The DMA Bridge is a custom DMA engine that manages the flow of data
in and out of workloads. AIC100 has one of these. The DMA Bridge has 16
channels, each consisting of a set of request/response FIFOs. Each active
workload is assigned a single DMA Bridge channel.
The DMA Bridge exposes
hardware registers to manage the FIFOs (head/tail pointers), but requires host
memory to store the FIFOs.

DDR
---

AIC100 has on-card DDR. In total, an AIC100 can have up to 32 GB of DDR.
This DDR stores workloads and the data for the workloads, and is also used by
the QSM for managing the device. NSPs are granted access to sections of the DDR
by the QSM. The host does not have direct access to the DDR, and must make
requests to the QSM to transfer data to the DDR.

High-level Use Flow
===================

AIC100 is a multi-user, programmable accelerator typically used for running
neural networks in inferencing mode to efficiently perform AI operations.
AIC100 is not intended for training neural networks. AIC100 can be utilized
for generic compute workloads.

Assuming a user wants to utilize AIC100, they would follow these steps:

1. Compile the workload into an ELF targeting the NSP(s)
2. Make requests to the QSM to load the workload and related artifacts into the
   device DDR
3. Make a request to the QSM to activate the workload onto a set of idle NSPs
4. Make requests to the DMA Bridge to send input data to the workload to be
   processed, and other requests to receive processed output data from the
   workload.
5. Once the workload is no longer required, make a request to the QSM to
   deactivate the workload, thus putting the NSPs back into an idle state.
6. Once the workload and related artifacts are no longer needed for future
   sessions, make requests to the QSM to unload the data from DDR. This frees
   the DDR to be used by other users.


Boot Flow
=========

AIC100 uses a flashless boot flow, derived from Qualcomm MSMs.

When AIC100 is first powered on, it begins executing PBL (Primary Bootloader)
from ROM. PBL enumerates the PCIe link, and initializes the BHI (Boot Host
Interface) component of MHI.

Using BHI, the host points PBL to the location of the SBL (Secondary Bootloader)
image. The PBL pulls the image from the host, validates it, and begins
execution of SBL.

SBL initializes MHI, and uses MHI to notify the host that the device has entered
the SBL stage. SBL performs a number of operations:

* SBL initializes the majority of hardware (anything PBL left uninitialized),
  including DDR.
* SBL offloads the bootlog to the host.
* SBL synchronizes timestamps with the host for future logging.
* SBL uses the Sahara protocol to obtain the runtime firmware images from the
  host.
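
The hand-off between the boot stages described in this section can be pictured
as a small state machine. The following is only an illustrative sketch: the
enum and function names are hypothetical and are not taken from the qaic driver
or the MHI core.

.. code-block:: c

  /* Hypothetical model of the AIC100 boot stages; names are illustrative. */
  enum aic100_boot_stage {
          AIC100_STAGE_PBL,       /* Primary Bootloader, executes from ROM */
          AIC100_STAGE_SBL,       /* Secondary Bootloader, located via BHI */
          AIC100_STAGE_QSM,       /* runtime firmware (AMSS in MHI terms) */
  };

  /* What the device needs from the host in order to leave a given stage. */
  static inline const char *aic100_stage_needs(enum aic100_boot_stage stage)
  {
          switch (stage) {
          case AIC100_STAGE_PBL:
                  return "SBL image, located via BHI";
          case AIC100_STAGE_SBL:
                  return "runtime firmware images, obtained via Sahara";
          default:
                  return "nothing, the device is fully booted";
          }
  }

  /* Stages only advance in order; QSM is the final stage. */
  static inline enum aic100_boot_stage
  aic100_next_stage(enum aic100_boot_stage stage)
  {
          if (stage == AIC100_STAGE_QSM)
                  return AIC100_STAGE_QSM;
          return (enum aic100_boot_stage)(stage + 1);
  }

The sketch deliberately leaves out error handling (for example, a failed image
validation), which this document does not describe.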

Once SBL has obtained and validated the runtime firmware, it brings the NSPs out
of reset, and jumps into the QSM.

The QSM uses MHI to notify the host that the device has entered the QSM stage
(AMSS in MHI terms). At this point, the AIC100 device is fully functional, and
ready to process workloads.

Userspace components
====================

Compiler
--------

An open compiler for AIC100 based on upstream LLVM can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100-cc

Usermode Driver (UMD)
---------------------

An open UMD that interfaces with the qaic kernel driver can be found at:
https://github.com/quic/software-kit-for-qualcomm-cloud-ai-100

Sahara loader
-------------

An open implementation of the Sahara protocol called kickstart can be found at:
https://github.com/andersson/qdl

MHI Channels
============

AIC100 defines a number of MHI channels for different purposes. This is a list
of the defined channels, and their uses.

+----------------+---------+----------+----------------------------------------+
| Channel name   | IDs     | EEs      | Purpose                                |
+================+=========+==========+========================================+
| QAIC_LOOPBACK  | 0 & 1   | AMSS     | Any data sent to the device on this    |
|                |         |          | channel is sent back to the host.      |
+----------------+---------+----------+----------------------------------------+
| QAIC_SAHARA    | 2 & 3   | SBL      | Used by SBL to obtain the runtime      |
|                |         |          | firmware from the host.                |
+----------------+---------+----------+----------------------------------------+
| QAIC_DIAG      | 4 & 5   | AMSS     | Used to communicate with QSM via the   |
|                |         |          | DIAG protocol.                         |
+----------------+---------+----------+----------------------------------------+
| QAIC_SSR       | 6 & 7   | AMSS     | Used to notify the host of subsystem   |
|                |         |          | restart events, and to offload SSR     |
|                |         |          | crashdumps.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_QDSS      | 8 & 9   | AMSS     | Used for the Qualcomm Debug Subsystem. |
+----------------+---------+----------+----------------------------------------+
| QAIC_CONTROL   | 10 & 11 | AMSS     | Used for the Neural Network Control    |
|                |         |          | (NNC) protocol. This is the primary    |
|                |         |          | channel between host and QSM for       |
|                |         |          | managing workloads.                    |
+----------------+---------+----------+----------------------------------------+
| QAIC_LOGGING   | 12 & 13 | SBL      | Used by the SBL to send the bootlog to |
|                |         |          | the host.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_STATUS    | 14 & 15 | AMSS     | Used to notify the host of Reliability,|
|                |         |          | Availability, Serviceability (RAS)     |
|                |         |          | events.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TELEMETRY | 16 & 17 | AMSS     | Used to get/set power/thermal/etc      |
|                |         |          | attributes.                            |
+----------------+---------+----------+----------------------------------------+
| QAIC_DEBUG     | 18 & 19 | AMSS     | Not used.                              |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 20 & 21 | SBL      | Used to synchronize timestamps in the  |
|                |         |          | device side logs with the host time    |
|                |         |          | source.                                |
+----------------+---------+----------+----------------------------------------+
| QAIC_TIMESYNC  | 22 & 23 | AMSS     | Used to periodically synchronize       |
| _PERIODIC      |         |          | timestamps in the device side logs with|
|                |         |          | the host time source.                  |
+----------------+---------+----------+----------------------------------------+

DMA Bridge
==========

Overview
--------

The DMA Bridge is one of the main interfaces to the host from the device
(the other being MHI). As part of activating a workload to run on NSPs, the QSM
assigns that workload a DMA Bridge channel. A workload's DMA Bridge channel
(DBC for short) is solely for the use of that workload and is not shared with
other workloads.

Each DBC is a pair of FIFOs that manage data in and out of the workload. One
FIFO is the request FIFO. The other FIFO is the response FIFO.

Each DBC contains 4 registers in hardware:

* Request FIFO head pointer (offset 0x0). Read only by the host. Indicates the
  latest item in the FIFO the device has consumed.
* Request FIFO tail pointer (offset 0x4). Read/write by the host. Host
  increments this register to add new items to the FIFO.
* Response FIFO head pointer (offset 0x8). Read/write by the host. Indicates
  the latest item in the FIFO the host has consumed.
* Response FIFO tail pointer (offset 0xc). Read only by the host. Device
  increments this register to add new items to the FIFO.

The values in each register are indexes in the FIFO.
To get the location of the
FIFO element pointed to by the register: FIFO base address + register * element
size.

DBC registers are exposed to the host via the second BAR. Each DBC consumes
4KB of space in the BAR.

The actual FIFOs are backed by host memory. When sending a request to the QSM
to activate a workload, the host must donate memory to be used for the FIFOs.
Due to internal mapping limitations of the device, a single contiguous chunk of
memory must be provided per DBC, which hosts both FIFOs. The request FIFO will
consume the beginning of the memory chunk, and the response FIFO will consume
the end of the memory chunk.

Request FIFO
------------

A request FIFO element has the following structure:

.. code-block:: c

  struct request_elem {
        u16 req_id;
        u8  seq_id;
        u8  pcie_dma_cmd;
        u32 reserved0;
        u64 pcie_dma_source_addr;
        u64 pcie_dma_dest_addr;
        u32 pcie_dma_len;
        u32 reserved1;
        u64 doorbell_addr;
        u8  doorbell_attr;
        u8  reserved2;
        u16 reserved3;
        u32 doorbell_data;
        u32 sem_cmd0;
        u32 sem_cmd1;
        u32 sem_cmd2;
        u32 sem_cmd3;
  };

Request field descriptions:

req_id
        request ID. A request FIFO element and a response FIFO element with
        the same request ID refer to the same command.

seq_id
        sequence ID within a request. Ignored by the DMA Bridge.

pcie_dma_cmd
        describes the DMA element of this request.

        * Bit(7) is the force MSI flag, which overrides the DMA Bridge MSI logic
          and generates an MSI when this request is complete and QSM has
          configured the DMA Bridge to look at this bit.
        * Bits(6:5) are reserved.
        * Bit(4) is the completion code flag, and indicates that the DMA Bridge
          shall generate a response FIFO element when this request is
          complete.
        * Bit(3) indicates if this request is a linked list transfer(0) or a bulk
          transfer(1).
        * Bit(2) is reserved.
        * Bits(1:0) indicate the type of transfer. No transfer(0), to device(1),
          from device(2). Value 3 is illegal.

pcie_dma_source_addr
        source address for a bulk transfer, or the address of the linked list.

pcie_dma_dest_addr
        destination address for a bulk transfer.

pcie_dma_len
        length of the bulk transfer. Note that the size of this field
        limits transfers to 4G in size.

doorbell_addr
        address of the doorbell to ring when this request is complete.

doorbell_attr
        doorbell attributes.

        * Bit(7) indicates if a write to a doorbell is to occur.
        * Bits(6:2) are reserved.
        * Bits(1:0) contain the encoding of the doorbell length. 0 is 32-bit,
          1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
          must be naturally aligned to the specified length.

doorbell_data
        data to write to the doorbell. Only the bits corresponding to
        the doorbell length are valid.

sem_cmdN
        semaphore command.

        * Bit(31) indicates this semaphore command is enabled.
        * Bit(30) is the to-device DMA fence.
          Block this request until all
          to-device DMA transfers are complete.
        * Bit(29) is the from-device DMA fence. Block this request until all
          from-device DMA transfers are complete.
        * Bits(28:27) are reserved.
        * Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
          specified value. 2 is increment. 3 is decrement. 4 is wait
          until the semaphore is equal to the specified value. 5 is wait
          until the semaphore is greater than or equal to the specified value.
          6 is "P", wait until semaphore is greater than 0, then
          decrement by 1. 7 is reserved.
        * Bit(23) is reserved.
        * Bit(22) is the semaphore sync. 0 is post sync, which means that the
          semaphore operation is done after the DMA transfer. 1 is
          presync, which gates the DMA transfer. Only one presync is
          allowed per request.
        * Bit(21) is reserved.
        * Bits(20:16) are the index of the semaphore to operate on.
        * Bits(15:12) are reserved.
        * Bits(11:0) are the semaphore value to use in operations.

Overall, a request is processed in 4 steps:

1. If specified, the presync semaphore condition must be true
2. If enabled, the DMA transfer occurs
3. If specified, the postsync semaphore conditions must be true
4. If enabled, the doorbell is written

By using the semaphores in conjunction with the workload running on the NSPs,
the data pipeline can be synchronized such that the host can queue multiple
requests of data for the workload to process, but the DMA Bridge will only copy
the data into the memory of the workload when the workload is ready to process
the next input.

Response FIFO
-------------

Once a request is fully processed, a response FIFO element is generated if
specified in pcie_dma_cmd. The structure of a response FIFO element:

.. code-block:: c

  struct response_elem {
        u16 req_id;
        u16 completion_code;
  };

req_id
        matches the req_id of the request that generated this element.

completion_code
        status of this request. 0 is success. Non-zero is an error.

The DMA Bridge will generate an MSI to the host as a reaction to activity in the
response FIFO of a DBC. The DMA Bridge hardware has an IRQ storm mitigation
algorithm, where it will only generate an MSI when the response FIFO transitions
from empty to non-empty (unless force MSI is enabled and triggered).
In
response to this MSI, the host is expected to drain the response FIFO, and must
take care to handle any race conditions between draining the FIFO, and the
device inserting elements into the FIFO.

Neural Network Control (NNC) Protocol
=====================================

The NNC protocol is how the host makes requests to the QSM to manage workloads.
It uses the QAIC_CONTROL MHI channel.

Each NNC request is packaged into a message. Each message is a series of
transactions. A passthrough type transaction can contain elements known as
commands.

QSM requires NNC messages be little endian encoded and the fields be naturally
aligned. Since there are 64-bit elements in some NNC messages, 64-bit alignment
must be maintained.

A message contains a header and then a series of transactions. A message may be
at most 4K in size from QSM to the host. From the host to the QSM, a message
can be at most 64K (maximum size of a single MHI packet), but there is a
continuation feature where message N+1 can be marked as a continuation of
message N. This is used for exceedingly large DMA xfer transactions.

Transaction descriptions
------------------------

passthrough
        Allows userspace to send an opaque payload directly to the QSM.
        This is used for NNC commands.
        Userspace is responsible for managing
        the QSM message requirements in the payload.

dma_xfer
        DMA transfer. Describes an object that the QSM should DMA into the
        device via address and size tuples.

activate
        Activate a workload onto NSPs. The host must provide memory to be
        used by the DBC.

deactivate
        Deactivate an active workload and return the NSPs to idle.

status
        Query the QSM about its NNC implementation. Returns the NNC version,
        and if CRC is used.

terminate
        Release a user's resources.

dma_xfer_cont
        Continuation of a previous DMA transfer. If a DMA transfer
        cannot be specified in a single message (highly fragmented), this
        transaction can be used to specify more ranges.

validate_partition
        Query the QSM to determine if a partition identifier is valid.

Each message is tagged with a user id, and a partition id. The user id allows
QSM to track resources, and release them when the user goes away (e.g. the
process crashes). A partition id identifies the resource partition that QSM
manages, which this message applies to.

Messages may have CRCs.
Messages should have CRCs applied until the QSM
reports via the status transaction that CRCs are not needed. The QSM on the
SA9000P requires CRCs for black channel safing.

Subsystem Restart (SSR)
=======================

SSR is the concept of limiting the impact of an error. An AIC100 device may
have multiple users, each with their own workload running. If the workload of
one user crashes, the fallout of that should be limited to that workload and not
impact other workloads. SSR accomplishes this.

If a particular workload crashes, QSM notifies the host via the QAIC_SSR MHI
channel. This notification identifies the workload by its assigned DBC. A
multi-stage recovery process is then used to clean up both sides, and get the
DBC/NSPs into a working state.

When SSR occurs, any state in the workload is lost. Any inputs that were in
process, or queued but not yet serviced, are lost. The loaded artifacts will
remain in on-card DDR, but the host will need to re-activate the workload if
it desires to recover the workload.

Reliability, Availability, Serviceability (RAS)
===============================================

AIC100 is expected to be deployed in server systems where RAS ideology is
applied. Simply put, RAS is the concept of detecting, classifying, and
reporting errors.
While PCIe has AER (Advanced Error Reporting) which factors
into RAS, AER does not allow for a device to report details about internal
errors. Therefore, AIC100 implements a custom RAS mechanism. When a RAS event
occurs, QSM will report the event with appropriate details via the QAIC_STATUS
MHI channel. A sysadmin may determine that a particular device needs
additional service based on RAS reports.

Telemetry
=========

QSM has the ability to report various physical attributes of the device, and in
some cases, to allow the host to control them. Examples include thermal limits,
thermal readings, and power readings. These items are communicated via the
QAIC_TELEMETRY MHI channel.
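
Returning to the DMA Bridge request FIFO described earlier in this document,
the bit layouts of pcie_dma_cmd and sem_cmdN, and the FIFO element address
calculation, can be sketched as a few helpers. This is a minimal sketch under
stated assumptions: every macro, enum, and function name below is hypothetical
and is not taken from the qaic driver. Only the bit positions and the address
formula come from the field descriptions above.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical names; bit positions follow the pcie_dma_cmd description. */
  #define DBC_DMA_CMD_FORCE_MSI   (1u << 7)
  #define DBC_DMA_CMD_COMPLETION  (1u << 4)
  #define DBC_DMA_CMD_BULK        (1u << 3)

  enum dbc_dma_dir {
          DBC_DMA_NONE = 0,
          DBC_DMA_TO_DEVICE = 1,
          DBC_DMA_FROM_DEVICE = 2,        /* value 3 is illegal */
  };

  enum dbc_sem_op {
          DBC_SEM_NOP = 0, DBC_SEM_INIT = 1, DBC_SEM_INC = 2, DBC_SEM_DEC = 3,
          DBC_SEM_WAIT_EQ = 4, DBC_SEM_WAIT_GE = 5, DBC_SEM_P = 6,
  };

  /* Build the 8-bit pcie_dma_cmd field. */
  static inline uint8_t dbc_encode_dma_cmd(enum dbc_dma_dir dir, int bulk,
                                           int completion, int force_msi)
  {
          uint8_t cmd = (uint8_t)dir & 0x3;       /* Bits(1:0): direction */

          if (bulk)
                  cmd |= DBC_DMA_CMD_BULK;        /* Bit(3): 1 is bulk */
          if (completion)
                  cmd |= DBC_DMA_CMD_COMPLETION;  /* Bit(4) */
          if (force_msi)
                  cmd |= DBC_DMA_CMD_FORCE_MSI;   /* Bit(7) */
          return cmd;
  }

  /* Build an enabled sem_cmdN word; both DMA fence bits are left clear. */
  static inline uint32_t dbc_encode_sem_cmd(enum dbc_sem_op op, unsigned int index,
                                            unsigned int value, int presync)
  {
          uint32_t cmd = 1u << 31;                /* Bit(31): enabled */

          cmd |= ((uint32_t)op & 0x7) << 24;      /* Bits(26:24): command */
          if (presync)
                  cmd |= 1u << 22;                /* Bit(22): 1 is presync */
          cmd |= ((uint32_t)index & 0x1f) << 16;  /* Bits(20:16): index */
          cmd |= (uint32_t)value & 0xfff;         /* Bits(11:0): value */
          return cmd;
  }

  /* FIFO base address + register * element size. */
  static inline uint64_t dbc_fifo_elem_addr(uint64_t base, uint32_t reg,
                                            uint32_t elem_size)
  {
          return base + (uint64_t)reg * elem_size;
  }

For example, under these assumptions a to-device bulk transfer that requests a
response element encodes as ``dbc_encode_dma_cmd(DBC_DMA_TO_DEVICE, 1, 1, 0)``,
which evaluates to 0x19.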