1# Hacking
2
This section covers quick notes on using and modifying the code-base. The main
point of interest should be the a12 state machine and its decode and encode
translation units.
6
7## Exporting shmif
8
The default implementation for this is in 'a12\_helper\_srv.c'.
10
11With exporting shmif we have a local shmif-server that listens on a connection
12point, translates into a12 and sends over some communication channel to an
13a12-server.
14
This is initiated by the 'a12\_channel\_open' call. This takes an optional
authentication key used for preauthenticated setup where both sides have
performed the key exchange in advance.
18
The connection point management is outside of the scope here, see the
arcan\_shmif\_server.h API to the libarcan-shmif-srv library, or the
corresponding a12\_helper\_srv.c.
22
23## Importing shmif
24
25The default implementation for this is in 'a12\_helper\_cl.c'.
26
27With importing shmif we have an a12-server that listens for incoming
28connections, unpacks and maps into shmif connections. From the perspective of a
29local arcan instance, it is just another client.
30
This is initiated by the 'a12\_channel\_build' call. This takes an optional
authentication key used for preauthenticated setup where both sides have
performed the key exchange in advance.
34
35## Unpacking
36
37In both export and import you should have access to a shmif\_cont. This
38should be bound to a channel id via:
39
40    a12_set_destination(S, &shmif_cont, 0)
41
There can only be one context assigned to a channel number. Calling it
multiple times with the same channel ID will replace the context, likely
breaking the internal state of the shmif context.
45
46When data has been received over the communication channel, it needs to be
47unpacked into the a12 state machine:
48
49    a12_channel_unpack(S, my_data, number_of_bytes, void_tag, on_event)
50
The state machine will take care of signalling and modifying the shmif context
as well, but you will want to provide an 'on\_event' handler to intercept event
delivery. This will look like the processing after arcan\_shmif\_dequeue.
54
55    on_event(struct arcan_shmif_cont*, int channel, struct arcan_event*, void_tag)
56
57Forward relevant events into the context by arcan\_shmif\_enqueue:ing into it.
58
59## Output
60
When the communication channel is available for writing, check with:

    out_sz = a12_channel_flush(S, &buf);
    if (out_sz)
        send(buf, out_sz);

Repeat until it no longer produces any output. The a12 state machine assumes
you are done with the buffer and its contents by the next time you call any
a12 function.
70
71## Events
72
Forwarding events works just like the normal processing of a shmif\_wait or
poll call. Send the event to a12\_channel\_enqueue and it will take care of
repacking and forwarding. A few events require special treatment: those that
carry a descriptor pointing to other data, as any one channel can only have a
single binary data stream transfer in flight. There is a helper in
arcan\_shmif\_descrevent(ev) that will tell you if it is such an event or not.
79
80If the enqueue call fails, it is likely due to congestion from existing transfers.
81When that happens, defer processing and focus on flushing out data.
82
83Some are also order dependent, so you can't reliably forward other data in between:
84
85* FONTHINT : needs to be completed before vframe contents will be correct again
86* BCHUNK\_OUT, BCHUNK\_IN : can be interleaved with other non-descriptor events
87* STORE, RESTORE : needs to be completed before anything else is accepted
* DEVICEHINT : does not work cross-network
89* PRIVDROP : not supposed to pass networked barriers
90
91PRIVDROP should also be emitted on client to local server, in order to mark that
92it comes from an external/networked source.
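
The list above can be read as a small forwarding policy per event kind. The sketch below is one possible way to encode it. The enum names and the 'descr\_event\_policy' function are hypothetical and chosen for illustration; the real shmif headers define their own event identifiers.

```c
#include <assert.h>

/* Hypothetical labels for the event kinds listed above; the real shmif
 * headers use their own enum names and values. */
enum descr_event {
    EV_FONTHINT,
    EV_BCHUNK_IN,
    EV_BCHUNK_OUT,
    EV_STORE,
    EV_RESTORE,
    EV_DEVICEHINT,
    EV_PRIVDROP
};

/* Hypothetical forwarding policies derived from the list above. */
enum fwd_policy {
    FWD_HOLD_VIDEO, /* finish the transfer before sending more vframes */
    FWD_INTERLEAVE, /* non-descriptor events may be sent in between */
    FWD_HOLD_ALL,   /* finish the transfer before anything else */
    FWD_DROP        /* never forwarded across the network */
};

/* Map an event kind to the policy a proxy could apply to it. */
enum fwd_policy descr_event_policy(enum descr_event ev)
{
    switch (ev){
    case EV_FONTHINT:
        return FWD_HOLD_VIDEO;
    case EV_BCHUNK_IN:
    case EV_BCHUNK_OUT:
        return FWD_INTERLEAVE;
    case EV_STORE:
    case EV_RESTORE:
        return FWD_HOLD_ALL;
    case EV_DEVICEHINT:
    case EV_PRIVDROP:
        return FWD_DROP;
    }
    assert(0 && "unknown event kind");
    return FWD_DROP;
}
```

A proxy loop could consult such a table before calling a12\_channel\_enqueue, and defer or discard the event accordingly.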
93
94## Audio / Video
95
96Both audio and video may need to be provided by each side depending on segment
97type, as the ENCODE/sharing scenario changes the directionality, though it is
98decided at allocation time.
99
The structures for defining audio and video parameters actually come from
the shmif\_srv API, though they are synched locally in a12\_int.h. Thus, in
order to send a video frame:

    struct shmifsrv_vbuffer vb = shmifsrv_video(client);
    a12_channel_vframe(S, channel, &vb, &(struct a12_vframe_opts){...});
106
107With the vframe-opts carrying hints to the encoder stage, the typical pattern
108is to select those based on some feedback from the communication combined with
109the type of the segment itself.
110
111## Multiple Channels
112
One or many channels can, and should, be multiplexed over the same
carrier. Thus, there is a 1:1 relationship between channel id and shmif
context in use. Since this is a state within the A12 context, use
116
117    a12_channel_setid(S, chid);
118
Before enqueueing events or video. On the importer side, every time a
new segment gets allocated, it should be mapped via:
121
122    a12_set_destination(S, shmif_context, chid);
123
124The actual allocation of the channel ids is performed in the server itself
125as part of the event unpack stage with a NEWSEGMENT. In the event handler
126callback you can thus get an event, but a NULL segment and a channel ID.
127That is where to setup the new destination.
128
129Basically just use the tag field in the shmif context to remember chid
130and there should not be much more work to it.
131
Internally, it is quite a headache, as we have four different views of
the same action (shmif-client, local-server, remote-server and remote-arcan):
134
1351. (opt.) shmif-client sends SEGREQ
1362. (opt.) local-server forwards to remote server.
1373. (opt.) remote-server sends to remote-arcan.
1384. (opt.) remote-arcan maps new subsegment (NEWSEGMENT event).
1395. remote-server maps this subsegment, assigns channel-ID. sends command.
1406. local-server gets command, converts into NEWSEGMENT event, maps into
141   channel.
142
143## Packet construction
144
Going into the a12 internals, the first thing to follow is how a packet is
constructed.
147
148For sending/providing output, first build the appropriate control command.
149When such a buffer is finished, send it to the "a12int\_append\_out"
150function.
151
152    uint8_t hdr_buf[CONTROL_PACKET_SIZE];
153    /* populate hdr_buf, see a12int_vframehdr_build as an example */
154    a12int_append_out(S, STATE_CONTROL_PACKET, hdr_buf, CONTROL_PACKET_SIZE, NULL, 0);
155
156This will take care of buffering, encryption and updating the authentication
157code.
158
159Continue in a similar way with any subpacket types, make sure to chunk output
160in reasonably sized chunks so that interleaving of other packet types is
161possible. This is needed in order to prevent audio / video from stuttering or
162saturating other events.
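
As an illustration of the chunking advice above, here is a minimal sketch of the arithmetic involved. The helper names 'chunk\_count' and 'chunk\_size\_at' are hypothetical and not part of the a12 code-base, and the maximum chunk size is left to the caller since the text does not fix one.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to carry 'total' bytes when each chunk holds at
 * most 'max_chunk' bytes. */
size_t chunk_count(size_t total, size_t max_chunk)
{
    assert(max_chunk > 0);
    return total / max_chunk + (total % max_chunk != 0);
}

/* Size of chunk number 'index' (0-based), or 0 if index is past the end.
 * Every chunk is full except possibly the last one. */
size_t chunk_size_at(size_t total, size_t max_chunk, size_t index)
{
    if (index >= chunk_count(total, max_chunk))
        return 0;

    size_t left = total - index * max_chunk;
    return left < max_chunk ? left : max_chunk;
}
```

Between two chunks the sender gets a chance to interleave other packet types, which is the point of keeping 'max\_chunk' reasonably small.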
163
164"a12\_channel\_vframe" is probably the best example of providing output and
165sending, since it needs to treat many options, large data and different
166encoding schemes.
167
168# Notes / Flaws
169
170* vpts, origo\_ll and alpha flags are not yet covered
171
172* custom timers should be managed locally, so the proxy server will still
173  tick etc. without forwarding it remote...
174
* should we allow session-resume with a timeout? (pair authk in HELLO)
176
177# Critical Path / Security Notes
178
The implementation is intended to be run as a per-user server with the same
level of privileges that the user would have on the server through any
ssh-like session. Therefore, the normal culprits, video/image decoders, are
not as vital, though they should still be subject to further privsep. The main
culprits are the ones that work on data while it is unauthenticated or when
negotiating keys.
185
186The following functions should be the hotpath for vulnerability research:
187
188- a12.c:a12\_unpack (input buffer)
189- a12.c:process\_srvfirst
190- a12.c:process\_nopacket
191- a12.c:process\_control (MAC check + HELLO command)
192- a12.c:process\_event   (up until MAC check)
193- a12.c:process\_video   (up until MAC check)
194- a12.c:process\_audio   (up until MAC check)
195- a12.c:process\_blob    (up until MAC check)
196
197# Protocol
198
199This section mostly covers a rough draft of things as they evolve. A more
200'real' spec is to be written separately towards the end of the subproject in an
201RFC like style and the a12 state machine will be decoupled from the current
202shmif dependency.
203
204Each arcan segment correlates to a 'channel' that can be multiplexed over one
205of these transports, with a sequence number used as a synchronization primitive
206for drift measurement, scheduling re-keying and (later) out-of-order processing.
207
208For each channel, a number of streams can be defined, each with a unique 32-bit
209identifier.
210
211A stream corresponds to one binary, audio or video transfer operation.
212
213One stream of each type (i.e. binary, audio and video) can be in flight at the
214same time.
215
216The first message has the structure of:
217
    |---------------------|
    | 8 bytes MAC         |
    | 8 bytes nonce       | | from CSPRNG
    |---------------------|
    | 8 byte sequence     | | encrypted block
    | 1 byte command-code | | encrypted block   (HELLO command)
    | command-code data   | | keymaterial       (HELLO command)
    |---------------------|
226
The MAC comes from BLAKE3 in keyed mode (normally with an output size of 16
bytes). The key is either a pre-shared secret or the default 'SETECASTRONOMY',
passed through BLAKE3 in KDF mode using the message "arcan-a12 init-packet".
230
The forced encryption of the first packet is to avoid any predictable bytes
(forcing fingerprinting to be based on packet sizes, rate and timestamps rather
than any metadata) and hide the fact that X25519 is used. The use of pre-shared
secrets and X25519 is to allow for a PKI-less first-time authentication of
public keys.
236
Only the first 8 bytes of MAC output are used for the first HELLO packet in
order to make it easier for implementations to avoid radically different code
paths in parsing for these packets.
240
241KDF(secret) -> Kmac.
242
The cipher is strictly ChaCha8 (rounds reduced from the normal 20 due to cycles
per byte, see the paper 'Too Much Crypto' by Jean-Philippe Aumasson) with the
nonce from the message and keyed using:
246
247KDF(Kmac) -> Kcl.
248KDF(Kcl) -> Ksrv.
249
The subsequent HELLO command contains data for 1 or 2 rounds of X25519. If 2
round-trips are used, the first round uses ephemeral key-pairs to further
hide the actual key-pair, forcing an active MitM in order for Eve to log/track
Kp use.
254
Each message after completed key-exchange has the outer structure of:

    |---------------------|
    | 16 byte MAC         |
    |---------------------|  |
    | 8 byte sequence     |  |
    | 1 byte command code |  | encrypted block
    | command-code data   |  |
    |---------------------|  |
    | command-variable    |  |
    |---------------------|
265
266The 8-byte LSB sequence number is incremented for each message.
267
268## Notes
269
There are a few details with the cryptography setup that are still in flux due
to unfinished prototyping of rekeying or the immature state.
272
One is that the setup above requires one or two round-trips before a session is
established. Ideally we should be able to operate in one of three ways: a
weaker 'use the pre-shared secret' mode that immediately schedules a rekey; a
user-unfriendly mode that pins the server public key; or, through
session-resume, a pre-established shared secret.
277
278Another is related in that a directory service would be useful for both dynamic
279discovery, and reducing setup latency. See the minimaLT paper for a possible
280construction of that.
281
All of these could partially be addressed through modifications to the HELLO
command. All cryptography setup comes through the HELLO command or the REKEY
command.
285
286A detail though is that the client cannot do very much until the preroll stage
287of the server-end has completed (SHMIF terminology, initial burst of WM/server
288side state needed to produce correct contents). This is a forced round-trip that
289masks the 1-round x25519 one.
290
291## Commands
292
The command-code can be one out of the following types:
294
1. control (128 byte fixed size)
2. event, tied to the format of arcan\_shmif\_eventpack()
2973. video-stream data chunk
2984. audio-stream data chunk
2995. binary-stream data chunk
300
Starting with the control commands, these affect connection status, but are
also used for defining new video/audio and binary streams.
303
304## Control (1)
305- [0..7]    last-seen seqnr : uint64
306- [8..15]   entropy         : uint8[8]
307- [16]      channel-id      : uint8
308- [17]      command         : uint8
309
The last-seen sequence number is used both as a timing channel and to
determine drift.
311
312If the two sides start drifting outside a certain window, measures to reduce
313bandwidth should be taken, including increasing compression parameters,
lowering sample and frame rates, cancelling ongoing frames, merging /
315delaying input events, scaling window sizes, ... If they still keep drifting,
316show a user notice and destroy the channel. The drift window should also be
317used with a safety factor to determine the sequence number at which rekeying
318occurs.
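
The drift handling described above could be sketched as follows. The text gives no concrete window sizes, so the two limits are parameters here. The names 'drift\_distance' and 'drift\_classify' are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of a drift check. */
enum drift_action {
    DRIFT_OK = 0,     /* within the window, nothing to do */
    DRIFT_REDUCE = 1, /* outside the window, reduce bandwidth use */
    DRIFT_DROP = 2    /* still drifting, notify the user and drop */
};

/* Number of locally sent messages the peer has not yet reported as seen. */
uint64_t drift_distance(uint64_t local_seqnr, uint64_t last_seen)
{
    return local_seqnr >= last_seen ? local_seqnr - last_seen : 0;
}

/* Compare the drift against a soft and a hard limit (both assumptions). */
enum drift_action drift_classify(uint64_t local_seqnr, uint64_t last_seen,
    uint64_t soft_limit, uint64_t hard_limit)
{
    assert(soft_limit <= hard_limit);
    uint64_t d = drift_distance(local_seqnr, last_seen);

    if (d > hard_limit)
        return DRIFT_DROP;
    if (d > soft_limit)
        return DRIFT_REDUCE;
    return DRIFT_OK;
}
```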
319
320The entropy contents can be used as input to the mix pool of a local CSPRNG to
321balance out the risk of a runtime-compromised entropy pool. For multiple HELLO
322roundtrips doing x25519 exchange, the entropy field is also used for cipher
323nonce.
324
The channel ID will have a zero value for the first channel and, after
negotiation via [define-channel], specify the channel the command affects.
Discard messages to an unused channel (within reasonable tolerances), as
asynchronous races can happen with data in flight during channel tear-down
from interleaving.
330
331### command = 0, hello
332- [18]      Version major : uint8 (shmif-version until 1.0)
333- [19]      Version minor : uint8 (shmif-version until 1.0)
334- [20]      Mode          : uint8
335- [21+ 32]  x25519 Pk     : blob
336
337The hello message contains key-material for normal x25519, according to
338the Mode byte [20].
339
Accepted encryption values:

- 0 : no-exchange - Keep using the shared secret key for all communication.
- 1 : X25519 direct - Authenticate supplied Pk, return server Pk and switch
  auth and cipher to computed session key for all subsequent packets.
- 2 : X25519 nested - Supplied Pk is ephemeral, return ephemeral Pk, switch
  to computed session key and treat next hello as direct.
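
To make the three modes concrete, here is a small sketch that reads the Mode byte from a control packet and reports how many X25519 rounds it implies. The enum and function names are hypothetical. The round counts follow from the descriptions above: none for no-exchange, one for direct, and two for nested, since the next hello is treated as direct.

```c
#include <assert.h>
#include <stdint.h>

/* Mode values as listed above; the identifiers themselves are made up. */
enum hello_mode {
    HELLO_NOEXCHANGE = 0,
    HELLO_X25519_DIRECT = 1,
    HELLO_X25519_NESTED = 2
};

/* The Mode byte sits at offset 20 of a hello control packet. */
uint8_t hello_mode_from_packet(const uint8_t* buf)
{
    assert(buf);
    return buf[20];
}

/* Number of X25519 rounds implied by a mode, or -1 for an unknown mode. */
int hello_rounds(uint8_t mode)
{
    switch (mode){
    case HELLO_NOEXCHANGE:
        return 0;
    case HELLO_X25519_DIRECT:
        return 1;
    case HELLO_X25519_NESTED:
        return 2;
    default:
        return -1;
    }
}
```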
348
349### command = 1, shutdown
350- [18..n] : last\_words : UTF-8
351
352Destroy the segment defined by the header command-channel.
353Destroying the primary segment kills all others as well.
354
355### command = 2, define-channel
356- [18]     channel-id : uint8
357- [19]     type       : uint8
358- [20]     direction  : uint8 (0 = seg to srv, 1 = srv to seg)
359- [21..24] cookie     : uint32
360
361This corresponds to a slightly altered version of the NEWSEGMENT event,
362which should be absorbed and translated in each proxy.
363
364### command = 3, stream-cancel
365- [18..21] stream-id : uint32
366- [22]     code      : uint8
367- [23]     type      : uint8
368
369This command carries a 4 byte stream ID that refers to the identifier
370of an ongoing video, audio or bstream.
371
372The code dictates if the cancel is due to the information being dated or
373undesired (0), encoded in an unhandled format (1) or data is already known
374(cached, 2).
375
In the event of vstreams or astreams receiving an unhandled format (possible
for H264 and future hardware- or licensing-dependent encodings), the client
should attempt to resend / reencode the buffer with one of the built-in
formats.

The type indicates if the stream is video (0), audio (1) or binary
data (2).
383
### command = 4, define vstream
385- [18..21] : stream-id: uint32
386- [22    ] : format: uint8
387- [23..24] : surfacew: uint16
388- [25..26] : surfaceh: uint16
389- [27..28] : startx: uint16 (0..outw-1)
390- [29..30] : starty: uint16 (0..outh-1)
391- [31..32] : framew: uint16 (outw-startx + framew < outw)
392- [33..34] : frameh: uint16 (outh-starty + frameh < outh)
393- [35    ] : dataflags: uint8
394- [36..39] : length: uint32
395- [40..43] : expanded length: uint32
396- [44]     : commit: uint8
397
398The format field defines the encoding method applied. Current values are:
399
    R8G8B8A8 = 0 : raw 8-bit red, green, blue and alpha values
    R8G8B8   = 1 : raw 8-bit red, green and blue values
    RGB565   = 2 : raw 5 bit red, 6 bit green, 5 bit blue
    DMINIZ   = 3 : [deprecated] DEFLATE compressed block, set as ^ delta from last
    MINIZ    = 4 : [deprecated] DEFLATE compressed block
    H264     = 5 : h264 stream
    TZ       = 6 : [deprecated] DEFLATE compressed tpack block
    TZSTD    = 7 : ZSTD compressed tpack block
    ZSTD     = 8 : ZSTD compressed block
    DZSTD    = 9 : ZSTD compressed block, set as ^ delta from last
410
411This list is likely to be reviewed / compressed into only ZSTD and H264
412variants, as well as allowing a FourCC passthrough block for hardware decoding.
413
This defines a new video stream frame. The length field covers how many bytes
need to be buffered for the data to be decoded. This can be chunked up
into 1..n packages, depending on interleaving and so on.
417
418Commit indicates if this is the final (1) update before the accumulation
419buffer can be forwarded without tearing, or if there are more blocks to come.
420
The dataflags field is a bitmask that indicates if there is any special kind of
post-processing to apply. The currently defined one is origo\_ll (1) which means
that the completed frame is to be presented with the y axis inverted.
424
425The length field indicates the number of total bytes for all the payloads
426in subsequent vstream-data packets.
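
The field offsets for 'define vstream' can be turned into a packing routine along these lines. This is a sketch only: the struct and function names are hypothetical, little-endian byte order is assumed, and the common header bytes [0..16] are expected to be filled in elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 128 byte control packets; 4 is the define vstream command. */
enum { CONTROL_SIZE = 128, CMD_DEFINE_VSTREAM = 4 };

/* Hypothetical in-memory form of the fields listed above. */
struct vstream_def {
    uint32_t stream_id;
    uint8_t format;
    uint16_t surfacew, surfaceh;
    uint16_t startx, starty;
    uint16_t framew, frameh;
    uint8_t dataflags;
    uint32_t length;
    uint32_t expanded_length;
    uint8_t commit;
};

void pack_u16le(uint16_t v, uint8_t* dst)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}

void pack_u32le(uint32_t v, uint8_t* dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Fill in the command byte and the command specific fields [18..44]. */
void vstream_def_pack(uint8_t buf[CONTROL_SIZE], const struct vstream_def* d)
{
    assert(buf && d);
    memset(&buf[17], 0, CONTROL_SIZE - 17);
    buf[17] = CMD_DEFINE_VSTREAM;
    pack_u32le(d->stream_id, &buf[18]);
    buf[22] = d->format;
    pack_u16le(d->surfacew, &buf[23]);
    pack_u16le(d->surfaceh, &buf[25]);
    pack_u16le(d->startx, &buf[27]);
    pack_u16le(d->starty, &buf[29]);
    pack_u16le(d->framew, &buf[31]);
    pack_u16le(d->frameh, &buf[33]);
    buf[35] = d->dataflags;
    pack_u32le(d->length, &buf[36]);
    pack_u32le(d->expanded_length, &buf[40]);
    buf[44] = d->commit;
}
```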
427
### command = 5, define astream
429- [18..21] stream-id  : uint32
430- [22]     channel    : uint8
431- [23]     encoding   : uint8
432- [24..25] nsamples   : uint16
433- [26..29] rate       : uint32
434
The encoding field determines the size of each sample. Multiply it by the
number of samples and by the number of channels to get the size of the stream.
The nsamples field determines how many samples are sent in this stream.
440
The following encodings are allowed:

    S16 = 0 : signed 16-bit
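
The size calculation for an audio stream can be sketched as below. The function names are hypothetical, and the sketch assumes that the 'channel' field of the command holds the number of audio channels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding value from the list above; the identifier is made up. */
enum { ASTREAM_S16 = 0 };

/* Bytes per sample for an encoding, or 0 if the encoding is unknown. */
size_t astream_sample_size(uint8_t encoding)
{
    return encoding == ASTREAM_S16 ? 2 : 0;
}

/* Total payload size: sample size * channel count * number of samples. */
size_t astream_payload_size(uint8_t encoding, uint8_t channels,
    uint16_t nsamples)
{
    return astream_sample_size(encoding) * (size_t)channels * (size_t)nsamples;
}
```

For example, 1024 samples of stereo S16 audio come out at 4096 bytes of astream-data payload.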
443
### command = 6, define bstream
445- [18..21] stream-id   : uint32
446- [22..29] total-size  : uint64 (0 on streaming source)
447- [30]     stream-type : uint8 (0: state, 1:bchunk, 2: font, 3: font-secondary)
448- [31..34] id-token    : uint32 (used for bchunk pairing on _out/_store)
449- [35 +16] blake3-hash : blob (0 if unknown)
450
451This defines a new or continued binary transfer stream. The block-size sets the
452number of continuous bytes in the stream until the point where another transfer
453can be interleaved. There can thus be multiple binary streams in flight in
454order to interrupt an ongoing one with a higher priority one.
455
### command = 7, ping
457- [18..21] stream-id : uint32
458
459The stream-id is that of the last completed stream (if any).
460
### command = 8, rekey
- [0..7]  future-seqnr : uint64
463- [8..39] new (P)key   : uint8[32]
464
This command indicates a sequence number in the future, outside of the
expected established drift range along with a safety factor. When this sequence
number has been seen, new message and cipher keys will be derived from the new
key and used instead of the old one, which is discarded and safely erased. The
same packet with a new corresponding public key will be sent from the other
side as well.
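
One way to pick and test the future sequence number is sketched below. The formula (current sequence number plus drift window times safety factor) is an assumption made for illustration, as the text does not specify one. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick a sequence number beyond the drift window, scaled by a safety factor
 * (assumed formula, not taken from the a12 sources). */
uint64_t rekey_target(uint64_t current_seqnr, uint64_t drift_window,
    uint64_t safety_factor)
{
    assert(safety_factor >= 1);
    return current_seqnr + drift_window * safety_factor;
}

/* True once the agreed sequence number has been seen and keys should switch. */
bool rekey_due(uint64_t seen_seqnr, uint64_t target)
{
    return seen_seqnr >= target;
}
```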
471
## Event (2), fixed length
473- [0..7] sequence number : uint64
474- [8   ] channel-id      : uint8
475- [9+  ] event-data      : special
476
477The event data does not currently have a fixed packing format as the model is
478still being refined and thus we use the opaque format from
479arcan\_shmif\_eventpack.
480
Worthy of note is that this message type is the most sensitive to side channel
analysis as input device events are driven by user interaction. Combatting this
by injecting discard-events is kept outside the protocol implementation, and
deferred to the UI/window manager.
485
486## Vstream-data (3), Astream-data (4), Bstream-data (5) (variable length)
487- [0   ] channel-id : uint8
488- [1..4] stream-id  : uint32
489- [5..6] length     : uint16
490
The data messages themselves will make up the bulk of communication,
and tie to a pre-defined channel/stream.
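
Parsing the 7 byte header shared by the three data packet types could look like this. The struct and function names are hypothetical, and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the fields [0..6] listed above. */
enum { STREAM_HDR_SIZE = 7 };

/* Hypothetical decoded form of a stream-data header. */
struct stream_hdr {
    uint8_t channel;
    uint32_t stream_id;
    uint16_t length;
};

/* Decode the header; returns false if the buffer is too short. */
bool stream_hdr_parse(const uint8_t* buf, size_t buf_sz,
    struct stream_hdr* out)
{
    assert(buf && out);
    if (buf_sz < STREAM_HDR_SIZE)
        return false;

    out->channel = buf[0];
    out->stream_id =
        (uint32_t)buf[1] |
        ((uint32_t)buf[2] << 8) |
        ((uint32_t)buf[3] << 16) |
        ((uint32_t)buf[4] << 24);
    out->length = (uint16_t)(buf[5] | ((uint16_t)buf[6] << 8));
    return true;
}
```

The 'length' value then tells the receiver how many payload bytes follow the header in this packet.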
493
494# Compressions and Codecs
495
To make audio/video/data transfers bandwidth efficient, the contents must be
compressed in some way. This is a rat's nest of issues, ranging from patents
in less civilized parts of the world to complete dependency and hardware hell.
499
500The current approach to 'feature negotiation' is simply for the sender to
501try the best one it can, and fall back to inefficient safe-defaults if the
502recipient returns a remark of the frame as unsupported.
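
The "try the best, fall back on rejection" approach could be sketched as a simple fallback chain. The ordering used here (H264, then ZSTD, then raw R8G8B8A8) is an assumption chosen for illustration, and 'vformat\_fallback' is a hypothetical name. The numeric values are the ones from the vstream format table.

```c
#include <assert.h>
#include <stdint.h>

/* Format values copied from the define vstream format list. */
enum vformat {
    VFMT_R8G8B8A8 = 0,
    VFMT_H264 = 5,
    VFMT_ZSTD = 8
};

/* Given a format the recipient rejected, pick the next one to try.
 * Returns -1 when the raw format itself was rejected and nothing is left.
 * The ordering is a hypothetical policy, not taken from the a12 sources. */
int vformat_fallback(uint8_t rejected)
{
    if (rejected == VFMT_R8G8B8A8)
        return -1;
    if (rejected == VFMT_H264)
        return VFMT_ZSTD;
    return VFMT_R8G8B8A8;
}
```

A sender would call this after receiving a stream-cancel with code 1 (unhandled format) and reencode the buffer with the returned format.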
503
504# Event Model
505
506The more complicated interactions come from events and their ordering,
507especially when trying to interleave or deal with data carrying events.
508
The only stage where this 'should' require special treatment is for font
transfers for client side text rendering, which typically only happen before
activation.
511
For other kinds of transfers, state and clipboard, they act as a mask/block
for certain other events. If a state-restore is requested for instance, it
makes no sense trying to interleave input events that assume state has been
restored.
516
517Input events are the next complication in line as any input event that relies
518on a 'press-hold-release' pattern interpreted on the other end may be held for
519too long, or with clients that deal with raw codes and repeats, cause
520extraneous key-repeats or 'shadow releases'. To minimize the harm here, a more
521complex state machine will be needed that tries to determine if the channel
522blocks or not by relying on a ping-stream.
523