==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

This document describes the vfio support for I/O subchannel devices on
Linux/s390. The motivation for vfio-ccw is to pass subchannels through to
a virtual machine; vfio is the means.

Unlike other hardware architectures, s390 defines a unified I/O access
method called Channel I/O. It has its own access patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem accesses any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.

Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev is added to
an iommu group so that it can be managed by the vfio framework. We add
read/write callbacks for special vfio I/O regions to pass the channel
programs from the mdev to its parent device (the real I/O subchannel
device), which does further address translation and performs the I/O
instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information can be found here:

- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing QEMU code, which implements a simple emulated channel
  subsystem, is also a good reference and makes the flow easier to
  follow:
  qemu/hw/s390x/css.c

For the vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-device.rst

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a QEMU virtual machine. This includes devices that
don't have a virtio counterpart (e.g. tape drives) or that have specific
characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc., they can all be accessed by a
well-defined access method and they present I/O completion in a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which specifies the format of
the CCWs and other control information. The operating system signals the
I/O channel subsystem to begin executing the channel program with a SSCH
(start subchannel) instruction. The central processor is then free to
proceed with non-I/O instructions until interrupted. The I/O completion
result is received by the interrupt handler in the form of an interrupt
response block (IRB).
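
To make the notion of a channel program more concrete, the following is a
minimal userspace sketch of a format-1 CCW and of walking a chain of them.
The struct is a local mirror written for illustration, and
``ccw_chain_len()`` is a hypothetical helper, not a kernel API. As a
simplification, it assumes the CCWs are laid out contiguously and does not
follow transfer-in-channel (TIC) CCWs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of a format-1 CCW, for illustration only. */
struct ccw1 {
	uint8_t  cmd_code;	/* command code */
	uint8_t  flags;		/* chaining and control flags */
	uint16_t count;		/* byte count */
	uint32_t cda;		/* data address */
} __attribute__((packed, aligned(8)));

#define CCW_FLAG_DC 0x80	/* chain data: next CCW continues the transfer */
#define CCW_FLAG_CC 0x40	/* chain command: another CCW follows */

/*
 * Hypothetical helper: count the CCWs in a contiguous chain.
 * A CCW with neither chaining flag set ends the chain.
 * Returns the chain length, or -1 if no end is found within max entries.
 */
int ccw_chain_len(const struct ccw1 *ccws, size_t max)
{
	size_t i;

	for (i = 0; i < max; i++) {
		if (!(ccws[i].flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
			return (int)(i + 1);
	}
	return -1;
}
```

A caller would pass the maximum chain length it is willing to accept as
``max`` and treat a return value of -1 as an invalid channel program.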

Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real
  addresses and starts the I/O by issuing a privileged Channel I/O
  instruction (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions. The
  result is copied as an IRB to user space, which passes it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw polices and
translates the channel program in software before it gets sent to
hardware.

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  implements a group of callbacks and registers with the mdev framework
  as a parent (physical) device. In return, mdev provides vfio_ccw with
  a generic interface (sysfs) to create mdev devices. A vfio mdev can
  then be created by vfio_ccw and added to the mediated bus. It is the
  vfio device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store I/O interrupt results for user
  space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It implements a group of vfio device driver callbacks, adds itself to
  a vfio group, and registers itself with the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that for a device programmed in a VM with guest physical
  addresses, the vfio kernel code can convert an address to a process
  virtual address, pin the page and program the hardware with the host
  physical address in one step.
  For an mdev, the vfio iommu backend does not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework only maintains a database
  of the iova<->vaddr mappings in this operation. The vfio_pin_pages and
  vfio_unpin_pages interfaces are exported from the vfio iommu backend
  for the physical devices to pin and unpin pages on demand.

Below is a high-level block diagram::

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_parent() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together:

1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) with the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev related file
   nodes are then created under the device node in sysfs for the
   subchannel device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one
   (and only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It probes the mdev and
   adds it to an iommu_group and a vfio_group. Then we can pass the
   mdev through to a guest.


VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store I/O interrupt results for user space to retrieve. The
definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;

This region is always available.

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.
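
As an illustration of how user space might prepare a start request, here
is a minimal sketch. The struct is a userspace mirror of the layout shown
above, using ``<stdint.h>`` types instead of the kernel's ``__u8`` and
``__u32``; ``io_region_prepare()`` is a hypothetical helper, and the ORB
and SCSW are treated as opaque byte arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* Userspace mirror of the I/O region layout, for illustration only. */
struct ccw_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

/*
 * Hypothetical helper: clear the region, then copy in the guest ORB and
 * the SCSW of the virtual subchannel. irb_area and ret_code start out
 * zeroed; they are outputs of the request.
 */
void io_region_prepare(struct ccw_io_region *region,
		       const uint8_t orb[ORB_AREA_SIZE],
		       const uint8_t scsw[SCSW_AREA_SIZE])
{
	memset(region, 0, sizeof(*region));
	memcpy(region->orb_area, orb, ORB_AREA_SIZE);
	memcpy(region->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

With the packed layout, the region is 124 bytes long: 12 for the ORB, 12
for the SCSW, 96 for the IRB and 4 for the return code.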

ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The ORB specified transport mode or the
  SCSW specified a function other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The ORB specified a chain longer than 255 CCWs, or an internal error
  occurred.
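
The return codes above can be grouped by how a caller might react to
them. The following sketch shows one possible grouping; the
``io_outcome`` enum and ``io_classify()`` are hypothetical names chosen
for this example and are not part of the vfio-ccw interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical grouping of I/O region return codes. */
enum io_outcome {
	IO_OK,		/* 0: the operation was successful */
	IO_RETRY,	/* -EAGAIN: the caller should retry */
	IO_BUSY,	/* -EBUSY: status pending, busy, or request active */
	IO_NOT_OPER,	/* -EACCES, -ENODEV: path or device not operational */
	IO_ERROR	/* anything else, e.g. -EOPNOTSUPP, -EIO, -EINVAL */
};

enum io_outcome io_classify(uint32_t ret_code)
{
	/* ret_code is declared unsigned but carries 0 or a negative errno. */
	int32_t ret = (int32_t)ret_code;

	switch (ret) {
	case 0:
		return IO_OK;
	case -EAGAIN:
		return IO_RETRY;
	case -EBUSY:
		return IO_BUSY;
	case -EACCES:
	case -ENODEV:
		return IO_NOT_OPER;
	default:
		return IO_ERROR;
	}
}
```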


vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
         __u32 command;
         __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.
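
A caller can avoid the ``-EINVAL`` case by checking the command before
writing it to the region. The sketch below uses a userspace mirror of the
region and a hypothetical ``cmd_region_check()`` helper. It assumes that
setting both command bits at once counts as "a command other than halt or
clear".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* Userspace mirror of the cmd region layout, for illustration only. */
struct ccw_cmd_region {
	uint32_t command;
	uint32_t ret_code;
} __attribute__((packed));

/*
 * Hypothetical pre-check: accept exactly one of halt or clear.
 * Returns 0 if the command is acceptable, -EINVAL otherwise.
 */
int cmd_region_check(const struct ccw_cmd_region *region)
{
	if (region->command != VFIO_CCW_ASYNC_CMD_HSCH &&
	    region->command != VFIO_CCW_ASYNC_CMD_CSCH)
		return -EINVAL;
	return 0;
}
```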

vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
         __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
         __u32 crw;
         __u32 pad;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again will return the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.
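
The read-until-zero behavior described above can be sketched as a small
drain loop. Everything except the region layout is hypothetical here: the
``crw_read_fn`` callback stands in for a read of the crw region on the
device file descriptor, and ``fake_crw_read()`` is a stand-in source that
exists only so that the loop can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the crw region layout, for illustration only. */
struct ccw_crw_region {
	uint32_t crw;
	uint32_t pad;
} __attribute__((packed));

/* Hypothetical reader: fills *region, returns 0 on success. */
typedef int (*crw_read_fn)(void *ctx, struct ccw_crw_region *region);

/*
 * Read CRWs until an all-zero one is returned or out[] is full.
 * Returns the number of CRWs stored, or -1 if a read failed.
 */
int crw_drain(crw_read_fn read_crw, void *ctx, uint32_t *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		struct ccw_crw_region region = { 0, 0 };

		if (read_crw(ctx, &region) != 0)
			return -1;
		if (region.crw == 0)
			break;		/* nothing pending any more */
		out[n++] = region.crw;
	}
	return (int)n;
}

/* Stand-in source: hands out queued CRWs in order, then zeroes. */
struct fake_crw_queue {
	const uint32_t *crws;
	size_t len;
	size_t pos;
};

int fake_crw_read(void *ctx, struct ccw_crw_region *region)
{
	struct fake_crw_queue *q = ctx;

	region->crw = q->pos < q->len ? q->crws[q->pos++] : 0;
	region->pad = 0;
	return 0;
}
```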

vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (starting with `cp_`) to do CCW translation. The CCWs
  passed in by a user space program are organized with their guest
  physical memory addresses. These APIs copy the CCWs into kernel
  space, and assemble a runnable kernel channel program by replacing the
  guest physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, which does further CCW translation
  before issuing it to a real device.
  This also provides the VFIO_DEVICE_SET_IRQS ioctl to set up an event
  notifier that notifies the user space program of the I/O completion
  asynchronously.
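
One reason the translation step needs address lists is that a data area
that is contiguous in guest physical memory may cross page boundaries,
and each page has to be translated on its own. The sketch below is not
the kernel's `cp_` code; it only illustrates the splitting, assuming
4 KiB blocks. Both helpers and the ``SKETCH_`` constants are hypothetical.
As in an IDAL, the first entry keeps the original start address and every
further entry is block-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB blocks */

/* Number of blocks touched by the byte range [addr, addr + len). */
uint64_t pages_spanned(uint64_t addr, uint64_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = addr >> SKETCH_PAGE_SHIFT;
	last = (addr + len - 1) >> SKETCH_PAGE_SHIFT;
	return last - first + 1;
}

/*
 * Split a data area into one address per block. In a real translation,
 * each entry would then be replaced by the matching host address.
 * Returns the number of entries, or -1 if out[] is too small.
 */
int addr_list_build(uint64_t addr, uint64_t len, uint64_t *out, size_t max)
{
	uint64_t n = pages_spanned(addr, len);
	uint64_t i;

	if (n > max)
		return -1;
	for (i = 0; i < n; i++) {
		if (i == 0)
			out[i] = addr;
		else
			out[i] = ((addr >> SKETCH_PAGE_SHIFT) + i)
				 << SKETCH_PAGE_SHIFT;
	}
	return (int)n;
}
```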

The use of vfio-ccw is not limited to QEMU, but QEMU is definitely a
good example for understanding how these patches work. Here is a little
more detail on how an I/O request triggered by the QEMU guest is
handled (without error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
    Get I/O region info during initialization.

Q2.
    Set up event notifier and handler to handle I/O completion.

... ...

Q3.
    Intercept an ssch instruction.
Q4.
    Write the guest channel program and ORB to the I/O region.

    K1.
        Copy from guest to kernel.
    K2.
        Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3.
        With the necessary information contained in the ORB passed in
        by QEMU, issue the ccwchain to the device.
    K4.
        Return the ssch CC code.
Q5.
    Return the CC code to the guest.

... ...

    K5.
        Interrupt handler gets the I/O result and writes the result to
        the I/O region.
    K6.
        Signal QEMU to retrieve the result.

Q6.
    Get the signal; the event handler reads the result from the I/O
    region.
Q7.
    Update the IRB for the guest.
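
Steps Q4 and Q6 boil down to one write and one read of the I/O region at
the offset obtained in Q1. The following userspace sketch shows that
shape only. It is not QEMU's code, and the helper names are hypothetical.
It assumes that a failed request shows up as a failed ``pwrite()`` with
``errno`` set, and that ``fd`` and ``offset`` come from the vfio device
setup, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace mirror of the I/O region layout, for illustration only. */
struct ccw_io_region {
	uint8_t  orb_area[12];
	uint8_t  scsw_area[12];
	uint8_t  irb_area[96];
	uint32_t ret_code;
} __attribute__((packed));

/*
 * Q4: hand the prepared region to the kernel in a single write.
 * Returns 0 on success or a negative errno value.
 */
int io_region_submit(int fd, off_t offset, const struct ccw_io_region *region)
{
	ssize_t n = pwrite(fd, region, sizeof(*region), offset);

	if (n == (ssize_t)sizeof(*region))
		return 0;
	return n < 0 ? -errno : -EIO;	/* short write treated as an error */
}

/*
 * Q6: after the completion signal, read the region back; the I/O result
 * is then available in region->irb_area.
 * Returns 0 on success or a negative errno value.
 */
int io_region_fetch(int fd, off_t offset, struct ccw_io_region *region)
{
	ssize_t n = pread(fd, region, sizeof(*region), offset);

	if (n == (ssize_t)sizeof(*region))
		return 0;
	return n < 0 ? -errno : -EIO;	/* short read treated as an error */
}
```

A caller would typically retry ``io_region_submit()`` when it returns
``-EAGAIN``, in line with the return code list for the I/O region.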

Limitations
-----------

The current vfio-ccw implementation focuses on supporting the basic
commands needed to implement block device functionality (read/write) of
DASD/ECKD devices only. Some commands may need special handling in the
future, for example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can now bring a
passed-through DASD/ECKD device online in a guest and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self-modifying channel
programs are not supported. For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------

1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/arch/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-device.rst