.. SPDX-License-Identifier: GPL-2.0

==================================
relay interface (formerly relayfs)
==================================

The relay interface provides a means for kernel applications to
efficiently log and transfer large quantities of data from the kernel
to userspace via user-defined 'relay channels'.

A 'relay channel' is a kernel->user data relay mechanism implemented
as a set of per-cpu kernel buffers ('channel buffers'), each
represented as a regular file ('relay file') in user space.  Kernel
clients write into the channel buffers using efficient write
functions; these automatically log into the current cpu's channel
buffer.  User space applications mmap() or read() from the relay files
and retrieve the data as it becomes available.  The relay files
themselves are files created in a host filesystem, e.g. debugfs, and
are associated with the channel buffers using the API described below.

The format of the data logged into the channel buffers is completely
up to the kernel client; the relay interface does however provide
hooks which allow kernel clients to impose some structure on the
buffer data.  The relay interface doesn't implement any form of data
filtering - this also is left to the kernel client.  The purpose is to
keep things as simple as possible.

This document provides an overview of the relay interface API.  The
details of the function parameters are documented along with the
functions in the relay interface code - please see that for details.

Semantics
=========

Each relay channel has one buffer per CPU, and each buffer has one or
more sub-buffers.  Messages are written to the first sub-buffer until it
is too full to contain a new message, in which case the message is
written to the next sub-buffer (if available).  Messages are never split
across sub-buffers.  Once the kernel has moved on to the next
sub-buffer, userspace can be notified so that it empties the first
sub-buffer while the kernel continues writing to the next.

When notified that a sub-buffer is full, the kernel knows how many
bytes of it are padding, i.e. unused space occurring because a complete
message couldn't fit into a sub-buffer.  Userspace can use this
knowledge to copy only valid data.
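
As a rough illustration of how a consumer might use the padding count,
the userspace sketch below copies only the valid bytes of one full
sub-buffer.  The helper names, and the assumption that the padding
count has already reached userspace by some application-defined means,
are made up for this example and are not part of the relay interface::

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Number of valid bytes in a full sub-buffer, given its padding. */
    static size_t subbuf_valid_bytes(size_t subbuf_size, size_t padding)
    {
            assert(padding <= subbuf_size);
            return subbuf_size - padding;
    }

    /*
     * Copy the valid part of one sub-buffer into dst and return the
     * number of bytes copied.  dst must have room for subbuf_size bytes.
     */
    static size_t copy_valid_data(void *dst, const void *subbuf,
                                  size_t subbuf_size, size_t padding)
    {
            size_t len = subbuf_valid_bytes(subbuf_size, padding);

            memcpy(dst, subbuf, len);
            return len;
    }
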

After copying it, userspace can notify the kernel that a sub-buffer
has been consumed.

A relay channel can operate in a mode where it will overwrite data not
yet collected by userspace, and not wait for it to be consumed.

The relay channel itself does not provide for communication of such
data between userspace and kernel, allowing the kernel side to remain
simple and not impose a single interface on userspace.  It does
provide a set of examples and a separate helper though, described
below.

The read() interface both removes padding and internally consumes the
read sub-buffers; thus in cases where read(2) is being used to drain
the channel buffers, special-purpose communication between kernel and
user isn't necessary for basic operation.
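
For instance, a consumer that relies only on read() could be structured
roughly as follows.  This is a userspace sketch, not code taken from
relay-apps: drain_relay_fd() and write_all() are names invented here,
and the caller is assumed to have already opened a relay file and an
output file::

    #include <assert.h>
    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>

    /* Write len bytes to fd, retrying on short writes.  0 on success. */
    static int write_all(int fd, const char *buf, size_t len)
    {
            while (len > 0) {
                    ssize_t n = write(fd, buf, len);

                    if (n < 0) {
                            if (errno == EINTR)
                                    continue;
                            return -1;
                    }
                    buf += n;
                    len -= (size_t)n;
            }
            return 0;
    }

    /*
     * Copy data from relay_fd to out_fd until read() reports no more
     * data or nothing arrives within timeout_ms.  Returns the number
     * of bytes copied, or -1 on error.
     */
    static long drain_relay_fd(int relay_fd, int out_fd, int timeout_ms)
    {
            char chunk[4096];
            long total = 0;

            assert(relay_fd >= 0 && out_fd >= 0);
            for (;;) {
                    struct pollfd pfd = { .fd = relay_fd, .events = POLLIN };
                    int ready = poll(&pfd, 1, timeout_ms);
                    ssize_t n;

                    if (ready < 0 && errno == EINTR)
                            continue;
                    if (ready < 0)
                            return -1;
                    if (ready == 0)
                            return total;   /* timed out */

                    n = read(relay_fd, chunk, sizeof(chunk));
                    if (n < 0 && errno == EINTR)
                            continue;
                    if (n < 0)
                            return -1;
                    if (n == 0)
                            return total;   /* nothing left to read */
                    if (write_all(out_fd, chunk, (size_t)n) < 0)
                            return -1;
                    total += n;
            }
    }

Because read() strips the padding and consumes the sub-buffers it
returns, the loop needs no side channel to the kernel client.
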

One of the major goals of the relay interface is to provide a low
overhead mechanism for conveying kernel data to userspace.  While the
read() interface is easy to use, it's not as efficient as the mmap()
approach; the example code attempts to make the tradeoff between the
two approaches as small as possible.

klog and relay-apps example code
================================

The relay interface itself is ready to use, but to make things easier,
a couple of simple utility functions and a set of examples are provided.

The relay-apps example tarball, available on the relay sourceforge
site, contains a set of self-contained examples, each consisting of a
pair of .c files containing boilerplate code for each of the user and
kernel sides of a relay application.  When combined these two sets of
boilerplate code provide glue to easily stream data to disk, without
having to bother with mundane housekeeping chores.

The 'klog debugging functions' patch (klog.patch in the relay-apps
tarball) provides a couple of high-level logging functions to the
kernel which allow writing formatted text or raw data to a channel,
regardless of whether a channel to write into exists or not, or even
whether the relay interface is compiled into the kernel or not.  These
functions allow you to put unconditional 'trace' statements anywhere
in the kernel or kernel modules; only when there is a 'klog handler'
registered will data actually be logged (see the klog and kleak
examples for details).

It is of course possible to use the relay interface from scratch,
i.e. without using any of the relay-apps example code or klog, but
you'll have to implement communication between userspace and kernel,
allowing both to convey the state of buffers (full, empty, amount of
padding).  The read() interface both removes padding and internally
consumes the read sub-buffers; thus in cases where read(2) is being
used to drain the channel buffers, special-purpose communication
between kernel and user isn't necessary for basic operation.  Things
such as buffer-full conditions would still need to be communicated via
some channel though.

klog and the relay-apps examples can be found in the relay-apps
tarball on http://relayfs.sourceforge.net

The relay interface user space API
==================================

The relay interface implements basic file operations for user space
access to relay channel buffer data.  Here are the file operations
that are available and some comments regarding their behavior:

=========== ============================================================
open()      enables a user to open an _existing_ channel buffer.

mmap()      results in the channel buffer being mapped into the
            caller's memory space.  Note that you can't do a partial
            mmap - you must map the entire file, which is
            NRBUF * SUBBUFSIZE.

read()      reads the contents of a channel buffer.  The bytes read are
            'consumed' by the reader, i.e. they won't be available
            again to subsequent reads.  If the channel is being used
            in no-overwrite mode (the default), it can be read at any
            time even if there's an active kernel writer.  If the
            channel is being used in overwrite mode and there are
            active channel writers, results may be unpredictable -
            users should make sure that all logging to the channel has
            ended before using read() with overwrite mode.  Sub-buffer
            padding is automatically removed and will not be seen by
            the reader.

sendfile()  transfers data from a channel buffer to an output file
            descriptor.  Sub-buffer padding is automatically removed
            and will not be seen by the reader.

poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
            notified when sub-buffer boundaries are crossed.

close()     decrements the channel buffer's refcount.  When the refcount
            reaches 0, i.e. when no process or kernel client has the
            buffer open, the channel buffer is freed.
=========== ============================================================
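
Because a partial mapping is not allowed, an mmap()-based consumer has
to know the sub-buffer geometry chosen by the kernel client.  A hedged
sketch of the mapping step follows; map_relay_file() and subbuf_at()
are hypothetical helpers, subbuf_size and n_subbufs are assumed to
match the values the kernel client passed to relay_open(), and the
choice of a read-only MAP_SHARED mapping is an assumption of this
example::

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /*
     * Map a whole relay file read-only.  The length is always
     * n_subbufs * subbuf_size and is stored in *map_len.  Returns
     * NULL on failure.
     */
    static void *map_relay_file(int fd, size_t subbuf_size,
                                size_t n_subbufs, size_t *map_len)
    {
            void *addr;

            assert(subbuf_size > 0 && n_subbufs > 0);
            if (subbuf_size > SIZE_MAX / n_subbufs)
                    return NULL;    /* total size would overflow */

            *map_len = subbuf_size * n_subbufs;
            addr = mmap(NULL, *map_len, PROT_READ, MAP_SHARED, fd, 0);
            return addr == MAP_FAILED ? NULL : addr;
    }

    /* Address of sub-buffer idx inside a mapping returned above. */
    static const char *subbuf_at(const void *map, size_t subbuf_size,
                                 size_t idx)
    {
            return (const char *)map + idx * subbuf_size;
    }

The caller releases the mapping with munmap(map, map_len) when it is
done with the channel buffer.
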

In order for a user application to make use of relay files, the
host filesystem must be mounted.  For example::

    mount -t debugfs debugfs /sys/kernel/debug

.. Note::

    the host filesystem doesn't need to be mounted for kernel
    clients to create or use channels - it only needs to be
    mounted when user space applications need access to the buffer
    data.


The relay interface kernel API
==============================

Here's a summary of the API the relay interface provides to in-kernel clients:

channel management functions::

    relay_open(base_filename, parent, subbuf_size, n_subbufs,
               callbacks, private_data)
    relay_close(chan)
    relay_flush(chan)
    relay_reset(chan)

channel management typically called on instigation of userspace::

    relay_subbufs_consumed(chan, cpu, subbufs_consumed)

write functions::

    relay_write(chan, data, length)
    __relay_write(chan, data, length)
    relay_reserve(chan, length)

callbacks::

    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
    buf_mapped(buf, filp)
    buf_unmapped(buf, filp)
    create_buf_file(filename, parent, mode, buf, is_global)
    remove_buf_file(dentry)

helper functions::

    relay_buf_full(buf)
    subbuf_start_reserve(buf, length)


Creating a channel
------------------

relay_open() is used to create a channel, along with its per-cpu
channel buffers.  Each channel buffer will have an associated file
created for it in the host filesystem, which can be mmapped or
read from in user space.  The files are named basename0...basenameN-1
where N is the number of online cpus, and by default will be created
in the root of the filesystem (if the parent param is NULL).  If you
want a directory structure to contain your relay files, you should
create it using the host filesystem's directory creation function,
e.g. debugfs_create_dir(), and pass the parent directory to
relay_open().  Users are responsible for cleaning up any directory
structure they create, when the channel is closed - again the host
filesystem's directory removal functions should be used for that,
e.g. debugfs_remove().
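
On the consumer side, this naming scheme means the relay file for a
given cpu can be located by appending the cpu number to the base
filename.  A small userspace sketch, assuming the files live in a
directory under a mounted debugfs; relay_file_path() is a hypothetical
helper, not part of any relay library::

    #include <stddef.h>
    #include <stdio.h>

    /*
     * Build the path of the relay file for one cpu, for example
     * "/sys/kernel/debug/cpu3".  Returns 0 on success, or -1 if the
     * result does not fit in path.
     */
    static int relay_file_path(char *path, size_t size, const char *dir,
                               const char *base_filename, unsigned int cpu)
    {
            int n = snprintf(path, size, "%s/%s%u", dir, base_filename, cpu);

            return (n < 0 || (size_t)n >= size) ? -1 : 0;
    }
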

In order for a channel to be created and the host filesystem's files
associated with its channel buffers, the user must provide definitions
for two callback functions, create_buf_file() and remove_buf_file().
create_buf_file() is called once for each per-cpu buffer from
relay_open() and allows the user to create the file which will be used
to represent the corresponding channel buffer.  The callback should
return the dentry of the file created to represent the channel buffer.
remove_buf_file() must also be defined; it's responsible for deleting
the file(s) created in create_buf_file() and is called during
relay_close().

Here are some typical definitions for these callbacks, in this case
using debugfs::

    /*
     * create_buf_file() callback.  Creates relay file in debugfs.
     */
    static struct dentry *create_buf_file_handler(const char *filename,
                                                  struct dentry *parent,
                                                  umode_t mode,
                                                  struct rchan_buf *buf,
                                                  int *is_global)
    {
            return debugfs_create_file(filename, mode, parent, buf,
                                       &relay_file_operations);
    }

    /*
     * remove_buf_file() callback.  Removes relay file from debugfs.
     */
    static int remove_buf_file_handler(struct dentry *dentry)
    {
            debugfs_remove(dentry);

            return 0;
    }

    /*
     * relay interface callbacks
     */
    static struct rchan_callbacks relay_callbacks =
    {
            .create_buf_file = create_buf_file_handler,
            .remove_buf_file = remove_buf_file_handler,
    };

And an example relay_open() invocation using them::

    chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);

If the create_buf_file() callback fails, or isn't defined, channel
creation and thus relay_open() will fail.

The total size of each per-cpu buffer is calculated by multiplying the
number of sub-buffers by the sub-buffer size passed into relay_open().
The idea behind sub-buffers is that they're basically an extension of
double-buffering to N buffers, and they also allow applications to
easily implement random-access-on-buffer-boundary schemes, which can
be important for some high-volume applications.  The number and size
of sub-buffers is completely dependent on the application and even for
the same application, different conditions will warrant different
values for these parameters at different times.  Typically, the right
values to use are best decided after some experimentation; in general,
though, it's safe to assume that having only 1 sub-buffer is a bad
idea - you're guaranteed to either overwrite data or lose events
depending on the channel mode being used.
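
As a worked example of this arithmetic, the sketch below computes the
memory a channel needs.  The helper names and the figures in the note
that follows are illustrative assumptions; any rounding the kernel may
apply to the allocation is ignored::

    #include <stddef.h>
    #include <stdint.h>

    /* Size of one per-cpu buffer, or 0 if the product overflows. */
    static size_t relay_buf_bytes(size_t subbuf_size, size_t n_subbufs)
    {
            if (n_subbufs != 0 && subbuf_size > SIZE_MAX / n_subbufs)
                    return 0;
            return subbuf_size * n_subbufs;
    }

    /* Size of all buffers of a channel with one buffer per cpu. */
    static size_t relay_channel_bytes(size_t subbuf_size, size_t n_subbufs,
                                      size_t n_cpus)
    {
            size_t per_cpu = relay_buf_bytes(subbuf_size, n_subbufs);

            return relay_buf_bytes(per_cpu, n_cpus);
    }

With 8 sub-buffers of 256 KiB each, every per-cpu buffer takes 2 MiB,
so a system with 4 online cpus would dedicate 8 MiB to the channel.
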

The create_buf_file() implementation can also be defined in such a way
as to allow the creation of a single 'global' buffer instead of the
default per-cpu set.  This can be useful for applications interested
mainly in seeing the relative ordering of system-wide events without
the need to bother with saving explicit timestamps for the purpose of
merging/sorting per-cpu files in a postprocessing step.

To have relay_open() create a global buffer, the create_buf_file()
implementation should set the value of the is_global outparam to a
non-zero value in addition to creating the file that will be used to
represent the single buffer.  In the case of a global buffer,
create_buf_file() and remove_buf_file() will be called only once.  The
normal channel-writing functions, e.g. relay_write(), can still be
used - writes from any cpu will transparently end up in the global
buffer - but since it is a global buffer, callers should make sure
they use the proper locking for such a buffer, either by wrapping
writes in a spinlock, or by copying a write function from relay.h and
creating a local version that internally does the proper locking.

The private_data passed into relay_open() allows clients to associate
user-defined data with a channel, and is immediately available
(including in create_buf_file()) via chan->private_data or
buf->chan->private_data.

Buffer-only channels
--------------------

These channels have no files associated and can be created with
relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
as when doing early tracing in the kernel, before the VFS is up. In these
cases, one may open a buffer-only channel and then call
relay_late_setup_files() when the kernel is ready to handle files,
to expose the buffered data to userspace.

Channel 'modes'
---------------

Relay channels can be used in either of two modes - 'overwrite' or
'no-overwrite'.  The mode is entirely determined by the implementation
of the subbuf_start() callback, as described below.  The default if no
subbuf_start() callback is defined is 'no-overwrite' mode.  If the
default mode suits your needs, and you plan to use the read()
interface to retrieve channel data, you can ignore the details of this
section, as it pertains mainly to mmap() implementations.

In 'overwrite' mode, also known as 'flight recorder' mode, writes
continuously cycle around the buffer and will never fail, but will
unconditionally overwrite old data regardless of whether it's actually
been consumed.  In no-overwrite mode, writes will fail, i.e. data will
be lost, if the number of unconsumed sub-buffers equals the total
number of sub-buffers in the channel.  It should be clear that if
there is no consumer or if the consumer can't consume sub-buffers fast
enough, data will be lost in either case; the only difference is
whether data is lost from the beginning or the end of a buffer.
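
The difference between the two modes can be made concrete with a small
userspace model that tracks whole sub-buffers only.  It is an
illustrative simulation, not kernel code: struct model_buf and the
model_*() functions are invented names, and bumping the consumed count
when old data is overwritten is a modelling shortcut::

    #include <stdbool.h>
    #include <stddef.h>

    #define MODEL_MAX_SUBBUFS 8

    /* Userspace model of one channel buffer (not struct rchan_buf). */
    struct model_buf {
            size_t n_subbufs;       /* at most MODEL_MAX_SUBBUFS */
            bool overwrite;         /* true for flight-recorder mode */
            size_t produced;        /* sub-buffers accepted so far */
            size_t consumed;        /* sub-buffers released so far */
            size_t lost;            /* sub-buffers dropped or overwritten */
            unsigned int id[MODEL_MAX_SUBBUFS];  /* data held per slot */
    };

    /* Full when every sub-buffer holds unconsumed data. */
    static bool model_full(const struct model_buf *b)
    {
            return b->produced - b->consumed == b->n_subbufs;
    }

    /* The writer tries to store one sub-buffer of data tagged id. */
    static bool model_fill(struct model_buf *b, unsigned int id)
    {
            if (model_full(b)) {
                    b->lost++;
                    if (!b->overwrite)
                            return false;   /* newest data is dropped */
                    b->consumed++;          /* oldest data is overwritten */
            }
            b->id[b->produced % b->n_subbufs] = id;
            b->produced++;
            return true;
    }

    /* The reader releases up to n sub-buffers it has finished with. */
    static void model_consume(struct model_buf *b, size_t n)
    {
            size_t ready = b->produced - b->consumed;

            b->consumed += n < ready ? n : ready;
    }

Filling a two-sub-buffer model four times without a consumer leaves
data 3 and 4 in overwrite mode (the beginning is lost) and data 1 and 2
in no-overwrite mode (the end is lost).
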

As explained above, a relay channel is made up of one or more
per-cpu channel buffers, each implemented as a circular buffer
subdivided into one or more sub-buffers.  Messages are written into
the current sub-buffer of the channel's current per-cpu buffer via the
write functions described below.  Whenever a message can't fit into
the current sub-buffer, because there's no room left for it, the
client is notified via the subbuf_start() callback that a switch to a
new sub-buffer is about to occur.  The client uses this callback to 1)
initialize the next sub-buffer if appropriate, 2) finalize the previous
sub-buffer if appropriate, and 3) return a boolean value indicating
whether or not to actually move on to the next sub-buffer.

To implement 'no-overwrite' mode, the kernel client would provide
an implementation of the subbuf_start() callback something like the
following::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            /* Store the padding count in the previous sub-buffer's header. */
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            /* Refuse the switch while all sub-buffers are unconsumed. */
            if (relay_buf_full(buf))
                    return 0;

            /* Reserve header room for this sub-buffer's padding count. */
            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

If the current buffer is full, i.e. all sub-buffers remain unconsumed,
the callback returns 0 to indicate that the buffer switch should not
occur yet, i.e. until the consumer has had a chance to read the
current set of ready sub-buffers.  For the relay_buf_full() function
to make sense, the consumer is responsible for notifying the relay
interface when sub-buffers have been consumed via
relay_subbufs_consumed().  Any subsequent attempts to write into the
buffer will again invoke the subbuf_start() callback with the same
parameters; only when the consumer has consumed one or more of the
ready sub-buffers will relay_buf_full() return 0, in which case the
buffer switch can continue.

The implementation of the subbuf_start() callback for 'overwrite' mode
would be very similar::

    static int subbuf_start(struct rchan_buf *buf,
                            void *subbuf,
                            void *prev_subbuf,
                            size_t prev_padding)
    {
            if (prev_subbuf)
                    *((unsigned *)prev_subbuf) = prev_padding;

            subbuf_start_reserve(buf, sizeof(unsigned int));

            return 1;
    }

In this case, the relay_buf_full() check is meaningless and the
callback always returns 1, causing the buffer switch to occur
unconditionally.  It's also meaningless for the client to use the
relay_subbufs_consumed() function in this mode, as it's never
consulted.

The default subbuf_start() implementation, used if the client doesn't
define any callbacks, or doesn't define the subbuf_start() callback,
implements the simplest possible 'no-overwrite' mode, i.e. it reserves
no header space and does nothing but return 0 while the buffer is full
and 1 otherwise.
405*56e6d5c0SMauro Carvalho Chehab
406*56e6d5c0SMauro Carvalho ChehabHeader information can be reserved at the beginning of each sub-buffer
407*56e6d5c0SMauro Carvalho Chehabby calling the subbuf_start_reserve() helper function from within the
408*56e6d5c0SMauro Carvalho Chehabsubbuf_start() callback.  This reserved area can be used to store
409*56e6d5c0SMauro Carvalho Chehabwhatever information the client wants.  In the example above, room is
410*56e6d5c0SMauro Carvalho Chehabreserved in each sub-buffer to store the padding count for that
411*56e6d5c0SMauro Carvalho Chehabsub-buffer.  This is filled in for the previous sub-buffer in the
412*56e6d5c0SMauro Carvalho Chehabsubbuf_start() implementation; the padding value for the previous
413*56e6d5c0SMauro Carvalho Chehabsub-buffer is passed into the subbuf_start() callback along with a
414*56e6d5c0SMauro Carvalho Chehabpointer to the previous sub-buffer, since the padding value isn't
415*56e6d5c0SMauro Carvalho Chehabknown until a sub-buffer is filled.  The subbuf_start() callback is
416*56e6d5c0SMauro Carvalho Chehabalso called for the first sub-buffer when the channel is opened, to
417*56e6d5c0SMauro Carvalho Chehabgive the client a chance to reserve space in it.  In this case the
418*56e6d5c0SMauro Carvalho Chehabprevious sub-buffer pointer passed into the callback will be NULL, so
419*56e6d5c0SMauro Carvalho Chehabthe client should check the value of the prev_subbuf pointer before
420*56e6d5c0SMauro Carvalho Chehabwriting into the previous sub-buffer.
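
On the user-space side, a reader can use the reserved header to skip
the padding at the end of each sub-buffer.  The following is a minimal
sketch, assuming the layout produced by the example above: a leading
unsigned int holding the padding count, then the logged data, then the
padding.  The name subbuf_payload() and its error convention are
hypothetical and are not part of the relay interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space helper, not part of the relay interface.
 * Assumes each sub-buffer starts with an unsigned int padding count,
 * as reserved by subbuf_start_reserve(buf, sizeof(unsigned int)).
 *
 * On success, stores the start of the logged data in *data and returns
 * its length in bytes.  Returns -1 if the header is inconsistent with
 * subbuf_size.
 */
long subbuf_payload(const void *subbuf, size_t subbuf_size,
		    const void **data)
{
	unsigned int padding;

	if (subbuf_size < sizeof(padding))
		return -1;

	/* memcpy() avoids any alignment assumption about the mapping */
	memcpy(&padding, subbuf, sizeof(padding));

	if (padding > subbuf_size - sizeof(padding))
		return -1;

	*data = (const char *)subbuf + sizeof(padding);
	return (long)(subbuf_size - sizeof(padding) - padding);
}
```

A reader would call this once per completed sub-buffer and process only
the returned number of bytes, ignoring the padding that follows them.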

Writing to a channel
--------------------

Kernel clients write data into the current cpu's channel buffer using
relay_write() or __relay_write().  relay_write() is the main logging
function - it uses local_irq_save() to protect the buffer and should be
used if you might be logging from interrupt context.  If you know
you'll never be logging from interrupt context, you can use
__relay_write(), which only disables preemption.  These functions
don't return a value, so the caller can't tell whether a write
failed.  The assumption is that you wouldn't want to check a return
value in the fast logging path anyway, and that writes always succeed
unless the buffer is full and no-overwrite mode is being used.  In
that case you can detect the failed write in the subbuf_start()
callback by calling the relay_buf_full() helper function.
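
The way failed writes show up in no-overwrite mode can be illustrated
with a small user-space model.  This is a sketch only, not kernel code,
and every toy_* name is hypothetical.  It assumes the buffer counts as
full when all of its sub-buffers have been produced but not yet
consumed, and that the callback is invoked again on every write
attempted while the buffer is full.

```c
#include <stddef.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_buf {
	size_t n_subbufs;	/* sub-buffers in the buffer */
	size_t produced;	/* sub-buffers completed by the writer */
	size_t consumed;	/* sub-buffers released by the reader */
	size_t dropped;		/* switches refused because the buffer was full */
	int stalled;		/* non-zero while writes are being refused */
};

/* Models the relay_buf_full() check. */
int toy_buf_full(const struct toy_buf *b)
{
	return b->produced - b->consumed >= b->n_subbufs;
}

/* Models a no-overwrite subbuf_start(): refuse the switch when full. */
int toy_subbuf_start(struct toy_buf *b)
{
	if (toy_buf_full(b)) {
		b->dropped++;
		return 0;
	}
	return 1;
}

/*
 * Models a write that needs a new sub-buffer.  The current sub-buffer
 * is counted as produced once, then the callback decides whether the
 * writer may continue.  Returns 1 if the write can proceed.
 */
int toy_switch(struct toy_buf *b)
{
	if (!b->stalled)
		b->produced++;
	b->stalled = !toy_subbuf_start(b);
	return !b->stalled;
}
```

In this model the dropped counter plays the role a client-maintained
counter would play in a real subbuf_start() callback: it is the only
place where the client learns that a write did not go through.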

relay_reserve() is used to reserve a slot in a channel buffer which
can be written to later.  This would typically be used in applications
that need to write directly into a channel buffer without having to
stage data in a temporary buffer beforehand.  Because the actual write
may not happen immediately after the slot is reserved, applications
using relay_reserve() can keep a count of the number of bytes actually
written, either in space reserved in the sub-buffers themselves or as
a separate array.  See the 'reserve' example in the relay-apps tarball
at http://relayfs.sourceforge.net for an example of how this can be
done.  Because the write is under control of the client and is
separated from the reserve, relay_reserve() doesn't protect the buffer
at all - it's up to the client to provide the appropriate
synchronization when using relay_reserve().
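
The split between reserving a slot and filling it in, and the reason
for keeping a separate count of bytes written, can be sketched in user
space as follows.  This is a toy model under stated assumptions, not
the relay implementation: the toy_* names are hypothetical, only a
single fixed-size sub-buffer is modelled, and no synchronization is
shown.

```c
#include <stddef.h>
#include <string.h>

/* Toy user-space model of one sub-buffer; not a kernel structure. */
struct toy_subbuf {
	char data[64];
	size_t offset;		/* next free byte, advanced by toy_reserve() */
	size_t written;		/* bytes actually filled in by the client */
};

/* Hands out a slot of len bytes, or NULL if it doesn't fit. */
void *toy_reserve(struct toy_subbuf *sb, size_t len)
{
	void *slot;

	if (len > sizeof(sb->data) - sb->offset)
		return NULL;

	slot = sb->data + sb->offset;
	sb->offset += len;
	return slot;
}

/* Fills a previously reserved slot and accounts for the bytes. */
void toy_commit(struct toy_subbuf *sb, void *slot,
		const void *src, size_t len)
{
	memcpy(slot, src, len);
	sb->written += len;
}
```

While written is smaller than offset, some reserved slots have not been
filled in yet, so a consumer of the data knows the contents are not yet
complete.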

Closing a channel
-----------------

The client calls relay_close() when it's finished using the channel.
The channel and its associated buffers are destroyed when there are no
longer any references to any of the channel buffers.  relay_flush()
forces a sub-buffer switch on all the channel buffers, and can be used
to finalize and process the last sub-buffers before the channel is
closed.

Misc
----

Some applications may want to keep a channel around and re-use it
rather than open and close a new channel for each use.  relay_reset()
can be used for this purpose - it resets a channel to its initial
state without reallocating channel buffer memory or destroying
existing mappings.  It should however only be called when it's safe to
do so, i.e. when the channel isn't currently being written to.

Finally, there are a couple of utility callbacks that can be used for
different purposes.  buf_mapped() is called whenever a channel buffer
is mmapped from user space and buf_unmapped() is called when it's
unmapped.  The client can use this notification to trigger actions
within the kernel application, such as enabling/disabling logging to
the channel.


Resources
=========

For news, example code, mailing list, etc. see the relay interface homepage:

    http://relayfs.sourceforge.net


Credits
=======

The ideas and specs for the relay interface came about as a result of
discussions on tracing involving the following:

Michel Dagenais		<michel.dagenais@polymtl.ca>
Richard Moore		<richardj_moore@uk.ibm.com>
Bob Wisniewski		<bob@watson.ibm.com>
Karim Yaghmour		<karim@opersys.com>
Tom Zanussi		<zanussi@us.ibm.com>

Also thanks to Hubertus Franke for a lot of useful suggestions and bug
reports.