# Hacking

This section covers quick notes on using and modifying the code-base. The main
point of interest should be the a12 state machine and its decode and encode
translation units.

## Exporting shmif

The default implementation for this is in 'a12\_helper\_srv.c'.

With exporting shmif we have a local shmif-server that listens on a connection
point, translates into a12 and sends over some communication channel to an
a12-server.

This is initiated by the 'a12\_channel\_open' call. This takes an optional
authentication key used for preauthenticated setup where both sides have
performed the key-exchange in advance.

The connection point management is outside of the scope here, see the
arcan\_shmif\_server.h API to the libarcan-shmif-srv library, or the
corresponding a12\_helper\_srv.c.

## Importing shmif

The default implementation for this is in 'a12\_helper\_cl.c'.

With importing shmif we have an a12-server that listens for incoming
connections, unpacks and maps into shmif connections. From the perspective of a
local arcan instance, it is just another client.

This is initiated by the 'a12\_channel\_build' call. This takes an optional
authentication key used for preauthenticated setup where both sides have
performed the key exchange in advance.

## Unpacking

In both export and import you should have access to a shmif\_cont. This
should be bound to a channel id via:

    a12_set_destination(S, &shmif_cont, 0)

There can only be one context assigned to a channel number. Calling it
multiple times with the same channel ID will replace the context, likely
breaking the internal state of the shmif context.

When data has been received over the communication channel, it needs to be
unpacked into the a12 state machine:

    a12_channel_unpack(S, my_data, number_of_bytes, void_tag, on_event)

The state machine will take care of signalling and modifying the shmif context
as well, but you will want to provide an 'on\_event' handler to intercept event
delivery. This will look like the processing after arcan\_shmif\_dequeue.

    on_event(struct arcan_shmif_cont*, int channel, struct arcan_event*, void_tag)

Forward relevant events into the context by arcan\_shmif\_enqueue:ing into it.

## Output

When the communication channel is available for writing, check with:

    out_sz = a12_channel_flush(S, &buf);
    if (out_sz)
        send(buf, out_sz)

Repeat until it no longer produces any output. The a12 state machine assumes
you are done with the buffer and its contents by the next time you call any a12
function.

## Events

Forwarding events works just like the normal processing of a shmif\_wait or
poll call. Send it to a12\_channel\_enqueue and it will take care of repacking
and forwarding. A few events require special treatment, namely those that carry
a descriptor pointing to other data, as any one channel can only have a single
binary data stream transfer in flight. There is a helper in
arcan\_shmif\_descrevent(ev) that will tell you if it is such an event or not.

If the enqueue call fails, it is likely due to congestion from existing transfers.
When that happens, defer processing and focus on flushing out data.
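
As a minimal sketch of the "defer on congestion" advice above, the following
keeps a small FIFO of events that could not be enqueued and retries them in
order before anything newer is sent. Every name here is hypothetical:
`sketch_event` stands in for struct arcan\_event, and `sketch_enqueue_fn`
stands in for whatever wrapper you put around the real enqueue call. It is not
taken from the a12 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PENDING_MAX 64

/* hypothetical stand-in for struct arcan_event */
struct sketch_event { int id; };

/* hypothetical stand-in for the enqueue call, false means congestion */
typedef bool (*sketch_enqueue_fn)(void* tag, const struct sketch_event* ev);

struct sketch_pending {
    struct sketch_event ev[SKETCH_PENDING_MAX];
    size_t head, count;
};

/* Retry deferred events oldest-first and stop at the first failure.
 * Returns the number of events that are still pending. */
static size_t sketch_drain(
    struct sketch_pending* q, sketch_enqueue_fn enq, void* tag)
{
    while (q->count > 0 && enq(tag, &q->ev[q->head])){
        q->head = (q->head + 1) % SKETCH_PENDING_MAX;
        q->count--;
    }
    return q->count;
}

/* Forward one event while preserving order: deferred events go first.
 * Returns false only when the event was dropped because the queue is full. */
static bool sketch_forward(struct sketch_pending* q,
    sketch_enqueue_fn enq, void* tag, const struct sketch_event* ev)
{
    if (sketch_drain(q, enq, tag) == 0 && enq(tag, ev))
        return true;

    if (q->count == SKETCH_PENDING_MAX)
        return false;

    q->ev[(q->head + q->count) % SKETCH_PENDING_MAX] = *ev;
    q->count++;
    return true;
}

/* Demo stand-in for a congested link: tag points to an int budget and
 * each accepted event consumes one unit of it. */
static bool sketch_budget_enqueue(void* tag, const struct sketch_event* ev)
{
    int* budget = tag;
    (void) ev;
    if (*budget <= 0)
        return false;
    (*budget)--;
    return true;
}
```

In a real loop, `sketch_drain` would be called again after each round of
flushing output, since that is when congestion is most likely to have cleared.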

Some events are also order dependent, so you can't reliably forward other data
in between:

* FONTHINT : needs to be completed before vframe contents will be correct again
* BCHUNK\_OUT, BCHUNK\_IN : can be interleaved with other non-descriptor events
* STORE, RESTORE : needs to be completed before anything else is accepted
* DEVICEHINT : does not work cross-network
* PRIVDROP : not supposed to pass networked barriers

PRIVDROP should also be emitted on client to local server, in order to mark that
it comes from an external/networked source.

## Audio / Video

Both audio and video may need to be provided by each side depending on segment
type, as the ENCODE/sharing scenario changes the directionality, though it is
decided at allocation time.

The structures for defining audio and video parameters actually come from
the shmif\_srv API, though they are synched locally in a12\_int.h. Thus in order
to send a video frame:

    struct shmifsrv_vbuffer vb = shmifsrv_video(shmifsrv_video(client));
    a12_channel_vframe(S, channel, &vb, &(struct a12_vframe_opts){...});

With the vframe-opts carrying hints to the encoder stage, the typical pattern
is to select those based on some feedback from the communication combined with
the type of the segment itself.

## Multiple Channels

One or many channels can, and should, be multiplexed over the same
carrier. Thus, there is a 1:1 relationship between channel id and shmif
contents in use. Since this is a state within the A12 context, use

    a12_channel_setid(S, chid);

before enqueueing events or video. On the importer side, every time a
new segment gets allocated, it should be mapped via:

    a12_set_destination(S, shmif_context, chid);

The actual allocation of the channel ids is performed in the server itself
as part of the event unpack stage with a NEWSEGMENT.
In the event handler
callback you can thus get an event, but a NULL segment and a channel ID.
That is where to set up the new destination.

Basically just use the tag field in the shmif context to remember chid
and there should not be much more work to it.

Internally, it is quite a headache, as we have '4' different views of
the same action.

1. (opt.) shmif-client sends SEGREQ
2. (opt.) local-server forwards to remote server.
3. (opt.) remote-server sends to remote-arcan.
4. (opt.) remote-arcan maps new subsegment (NEWSEGMENT event).
5. remote-server maps this subsegment, assigns channel-ID. sends command.
6. local-server gets command, converts into NEWSEGMENT event, maps into
   channel.

## Packet construction

Going into the a12 internals, the first thing to follow is how a packet is
constructed.

For sending/providing output, first build the appropriate control command.
When such a buffer is finished, send it to the "a12int\_append\_out"
function.

    uint8_t hdr_buf[CONTROL_PACKET_SIZE];
    /* populate hdr_buf, see a12int_vframehdr_build as an example */
    a12int_append_out(S, STATE_CONTROL_PACKET, hdr_buf, CONTROL_PACKET_SIZE, NULL, 0);

This will take care of buffering, encryption and updating the authentication
code.

Continue in a similar way with any subpacket types. Make sure to chunk output
in reasonably sized chunks so that interleaving of other packet types is
possible. This is needed in order to prevent audio / video from stuttering or
saturating other events.

"a12\_channel\_vframe" is probably the best example of providing output and
sending, since it needs to treat many options, large data and different
encoding schemes.

# Notes / Flaws

* vpts, origo\_ll and alpha flags are not yet covered

* custom timers should be managed locally, so the proxy server will still
  tick etc. without forwarding it remote...

* should we allow session-resume with a timeout? (pair authk in HELLO)

# Critical Path / Security Notes

The implementation is intended to be run as a per-user server with the same
level of privileges that the user would have on the server through any ssh-like
session. Therefore, the normal culprits, video/image decoders, are not
as vital, though they should still be subject to further privsep. The main
culprits are the ones that work on data while it is unauthenticated or when
negotiating keys.

The following functions should be the hotpath for vulnerability research:

- a12.c:a12\_unpack (input buffer)
- a12.c:process\_srvfirst
- a12.c:process\_nopacket
- a12.c:process\_control (MAC check + HELLO command)
- a12.c:process\_event (up until MAC check)
- a12.c:process\_video (up until MAC check)
- a12.c:process\_audio (up until MAC check)
- a12.c:process\_blob (up until MAC check)

# Protocol

This section mostly covers a rough draft of things as they evolve. A more
'real' spec is to be written separately towards the end of the subproject in an
RFC-like style and the a12 state machine will be decoupled from the current
shmif dependency.

Each arcan segment correlates to a 'channel' that can be multiplexed over one
of these transports, with a sequence number used as a synchronization primitive
for drift measurement, scheduling re-keying and (later) out-of-order processing.

For each channel, a number of streams can be defined, each with a unique 32-bit
identifier.

A stream corresponds to one binary, audio or video transfer operation.

One stream of each type (i.e. binary, audio and video) can be in flight at the
same time.

The first message has the structure of:

    |---------------------|
    | 8 bytes MAC         |
    | 8 bytes nonce       | | from CSPRNG
    |---------------------|
    | 8 byte sequence     | | encrypted block
    | 1 byte command-code | | encrypted block (HELLO command)
    | command-code data   | | keymaterial (HELLO command)
    |---------------------|

The MAC comes from BLAKE3 in keyed mode (normally, output size of 16 bytes)
using a pre-shared secret or the default 'SETECASTRONOMY' that comes from
BLAKE3 in KDF mode using the message "arcan-a12 init-packet".

The forced encryption of the first packet is to avoid any predictable bytes
(forcing fingerprinting to be based on packet sizes, rate and timestamps rather
than any metadata) and hide the fact that X25519 is used. The use of pre-shared
secrets and X25519 is to allow for a PKI-less first-time authentication of public
keys.

Only the first 8 bytes of MAC output are used for the first HELLO packet in order
to make it easier for implementations to avoid radically different code paths in
parsing for these packets.

    KDF(secret) -> Kmac.

The cipher is strictly Chacha8 (reduced rounds from normal 20 due to cycles per
byte, see the paper 'Too Much Crypto' by Jean-Philippe Aumasson) with nonce from
message and keyed using:

    KDF(Kmac) -> Kcl.
    KDF(Kcl) -> Ksrv.

The subsequent HELLO command contains data for 1 or 2 rounds of X25519. If 2
round-trips are used, the first round uses ephemeral key-pairs to further
hide the actual key-pair, forcing an active MitM in order for Eve to log/track
Kp use.
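
As a rough illustration of the first-message layout described in this section,
the sketch below splits a received buffer into its MAC, nonce and encrypted
regions. It does no MAC verification and no decryption. The type and function
names are made up for the example and are not part of a12.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SKETCH_FIRST_MAC = 8,   /* truncated MAC in the first packet */
    SKETCH_FIRST_NONCE = 8, /* nonce from CSPRNG */
    SKETCH_SEQ = 8,         /* sequence number, inside the encrypted block */
    SKETCH_CMD = 1          /* command code, inside the encrypted block */
};

/* hypothetical view over a received first packet, points into the buffer */
struct sketch_first_packet {
    const uint8_t* mac;
    const uint8_t* nonce;
    const uint8_t* ciphertext; /* sequence + command code + command data */
    size_t ciphertext_len;
};

/* Split buf into the regions of the first-message diagram.
 * Returns false if the buffer cannot hold even an empty command. */
static bool sketch_split_first(
    const uint8_t* buf, size_t len, struct sketch_first_packet* out)
{
    const size_t hdr = SKETCH_FIRST_MAC + SKETCH_FIRST_NONCE;

    if (!buf || !out || len < hdr + SKETCH_SEQ + SKETCH_CMD)
        return false;

    out->mac = buf;
    out->nonce = buf + SKETCH_FIRST_MAC;
    out->ciphertext = buf + hdr;
    out->ciphertext_len = len - hdr;
    return true;
}
```

Keeping the split separate from verification makes it easier to audit that
nothing in the encrypted region is interpreted before the MAC has been checked.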

Each message after completed key-exchange has the outer structure of:

    |---------------------|
    | 16 byte MAC         |
    |---------------------| |
    | 8 byte sequence     | |
    | 1 byte command code | | encrypted block
    | command-code data   | |
    |---------------------| |
    | command-variable    | |

The 8-byte LSB sequence number is incremented for each message.

## Notes

There are a few details with the cryptography setup that are still in flux due
to unfinished prototyping of rekeying or the immature state.

One is that the setup above requires one or two round-trips before a session is
established. Ideally we should be able to operate in a weaker 'use the
pre-shared secret' mode and immediately schedule a rekey, pin the server public
key (user unfriendly) or, through session-resume, use a pre-established shared
secret.

Another is related in that a directory service would be useful for both dynamic
discovery and reducing setup latency. See the minimaLT paper for a possible
construction of that.

All of these could partially be addressed through modifications to the HELLO
command. All cryptography setup comes through the HELLO command or the REKEY
command.

A detail though is that the client cannot do very much until the preroll stage
of the server-end has completed (SHMIF terminology, initial burst of WM/server
side state needed to produce correct contents). This is a forced round-trip that
masks the 1-round x25519 one.

## Commands

The command-code can be one out of the following types:

1. control (128b fixed size)
2. event, tied to the format of arcan\_shmif\_evpack()
3. video-stream data chunk
4. audio-stream data chunk
5. binary-stream data chunk

Starting with the control commands, these affect connection status, but are
also used for defining new video/audio and binary streams.

## Control (1)
- [0..7] last-seen seqnr : uint64
- [8..15] entropy : uint8[8]
- [16] channel-id : uint8
- [17] command : uint8

The last-seen sequence number is used both as a timing channel and to determine
drift.

If the two sides start drifting outside a certain window, measures to reduce
bandwidth should be taken, including increasing compression parameters,
lowering sample- and frame-rates, cancelling ongoing frames, merging /
delaying input events, scaling window sizes, ... If they still keep drifting,
show a user notice and destroy the channel. The drift window should also be
used with a safety factor to determine the sequence number at which rekeying
occurs.

The entropy contents can be used as input to the mix pool of a local CSPRNG to
balance out the risk of a runtime-compromised entropy pool. For multiple HELLO
roundtrips doing x25519 exchange, the entropy field is also used for cipher
nonce.

The channel ID will have a zero value for the first channel, and after
negotiation via [define-channel], specify the channel the command affects.
Discard messages to an unused channel (within reasonable tolerances) as
asynchronous races can happen with data in-flight during channel tear-down from
interleaving.

### command = 0, hello
- [18] Version major : uint8 (shmif-version until 1.0)
- [19] Version minor : uint8 (shmif-version until 1.0)
- [20] Mode : uint8
- [21 +32] x25519 Pk : blob

The hello message contains key-material for normal x25519, according to
the Mode byte [20].

Accepted Mode values:

0 : no-exchange - Keep using the shared secret key for all communication

1 : X25519 direct - Authenticate supplied Pk, return server Pk and switch
auth and cipher to computed session key for all subsequent packets.

2 : X25519 nested - Supplied Pk is ephemeral, return ephemeral Pk, switch
to computed session key and treat next hello as direct.
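
To make the offsets above concrete, here is a minimal sketch that fills out the
plaintext of a hello control packet. It rests on two assumptions that the text
does not state outright: that the "128b" control packet size means 128 bytes,
and that multi-byte integers are packed least significant byte first. The
`sketch_` names are invented for this example; see a12int\_vframehdr\_build for
how the actual code populates a header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: "128b fixed size" is read as 128 bytes */
#define SKETCH_CONTROL_SIZE 128

/* Mode byte values as listed for the hello command */
enum sketch_hello_mode {
    SKETCH_HELLO_NOEXCHANGE = 0,
    SKETCH_HELLO_DIRECT = 1,
    SKETCH_HELLO_NESTED = 2
};

/* Assumption: least significant byte first */
static void sketch_pack_u64(uint8_t* dst, uint64_t v)
{
    for (size_t i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* Fill out an unencrypted hello control packet, unused bytes are zeroed */
static void sketch_build_hello(uint8_t out[SKETCH_CONTROL_SIZE],
    uint64_t last_seen, const uint8_t entropy[8], uint8_t chid,
    uint8_t vmajor, uint8_t vminor, uint8_t mode, const uint8_t pk[32])
{
    memset(out, 0, SKETCH_CONTROL_SIZE);
    sketch_pack_u64(&out[0], last_seen); /* [0..7]   last-seen seqnr */
    memcpy(&out[8], entropy, 8);         /* [8..15]  entropy */
    out[16] = chid;                      /* [16]     channel-id */
    out[17] = 0;                         /* [17]     command = 0, hello */
    out[18] = vmajor;                    /* [18]     version major */
    out[19] = vminor;                    /* [19]     version minor */
    out[20] = mode;                      /* [20]     mode */
    memcpy(&out[21], pk, 32);            /* [21 +32] x25519 Pk */
}
```

The resulting buffer is what would then be handed to the output stage for
encryption and MAC calculation; none of that is covered by the sketch.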

### command = 1, shutdown
- [18..n] : last\_words : UTF-8

Destroy the segment defined by the header command-channel.
Destroying the primary segment kills all others as well.

### command = 2, define-channel
- [18] channel-id : uint8
- [19] type : uint8
- [20] direction : uint8 (0 = seg to srv, 1 = srv to seg)
- [21..24] cookie : uint32

This corresponds to a slightly altered version of the NEWSEGMENT event,
which should be absorbed and translated in each proxy.

### command = 3, stream-cancel
- [18..21] stream-id : uint32
- [22] code : uint8
- [23] type : uint8

This command carries a 4 byte stream ID that refers to the identifier
of an ongoing video, audio or bstream.

The code dictates if the cancel is due to the information being dated or
undesired (0), encoded in an unhandled format (1) or data is already known
(cached, 2).

In the event of vstreams or astreams arriving in an unhandled format (possible
for H264 and future hardware-/licensing-dependent encodings), the client
should attempt to resend / reencode the buffer with one of the built-in
formats.

The type indicates if the stream is video (0), audio (1) or binary
data (2).

### command = 4, define vstream
- [18..21] : stream-id: uint32
- [22 ] : format: uint8
- [23..24] : surfacew: uint16
- [25..26] : surfaceh: uint16
- [27..28] : startx: uint16 (0..outw-1)
- [29..30] : starty: uint16 (0..outh-1)
- [31..32] : framew: uint16 (outw-startx + framew < outw)
- [33..34] : frameh: uint16 (outh-starty + frameh < outh)
- [35 ] : dataflags: uint8
- [36..39] : length: uint32
- [40..43] : expanded length: uint32
- [44] : commit: uint8

The format field defines the encoding method applied.
Current values are:

    R8G8B8A8 = 0 : raw 8-bit red, green, blue and alpha values
    R8G8B8 = 1 : raw 8-bit red, green and blue values
    RGB565 = 2 : raw 5 bit red, 6 bit green, 5 bit blue
    DMINIZ = 3 : [deprecated] DEFLATE compressed block, set as ^ delta from last
    MINIZ = 4 : [deprecated] DEFLATE compressed block
    H264 = 5 : h264 stream
    TZ = 6 : [deprecated] DEFLATE compressed tpack block
    TZSTD = 7 : ZSTD compressed tpack block
    ZSTD = 8 : ZSTD compressed block
    DZSTD = 9 : ZSTD compressed block, set as ^ delta from last

This list is likely to be reviewed / compressed into only ZSTD and H264
variants, as well as allowing a FourCC passthrough block for hardware decoding.

This defines a new video stream frame. The length field covers how many bytes
need to be buffered for the data to be decoded. This can be chunked up
into 1..n packages, depending on interleaving and so on.

Commit indicates if this is the final (1) update before the accumulation
buffer can be forwarded without tearing, or if there are more blocks to come.

The dataflags field is a bitmask that indicates if there is any special kind of
post-processing to apply. The currently defined one is origo\_ll (1) which means
that the completed frame is to be presented with the y axis inverted.

The length field indicates the number of total bytes for all the payloads
in subsequent vstream-data packets.

### command = 5, define astream
- [18..21] stream-id : uint32
- [22] channel : uint8
- [23] encoding : uint8
- [24..25] nsamples : uint16
- [26..29] rate : uint32

The encoding field determines the size of each sample. Multiply that by the
number of samples and by the number of channels to get the size of
the stream. The nsamples field determines how many samples are sent in this
stream.

The size of the sample is determined by the encoding.
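
The size calculation described above can be sketched as follows. The helper
names are hypothetical, and the only encoding value handled is S16 = 0, the one
this document lists. Unknown encodings yield a size of 0 so that the caller can
reject the stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* S16 = 0 is the encoding value given in this section */
#define SKETCH_ENC_S16 0

/* Bytes per sample for an encoding value, 0 for an unknown encoding */
static size_t sketch_sample_size(uint8_t encoding)
{
    return encoding == SKETCH_ENC_S16 ? sizeof(int16_t) : 0;
}

/* sample size * number of samples * number of channels */
static size_t sketch_astream_bytes(
    uint8_t encoding, uint16_t nsamples, uint8_t channels)
{
    return sketch_sample_size(encoding) * (size_t)nsamples * (size_t)channels;
}
```

For instance, 1024 samples of S16 over 2 channels comes out as
2 * 1024 * 2 = 4096 bytes of astream-data payload.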

The following encodings are allowed:

    S16 = 0 : signed 16-bit

### command = 6, define bstream
- [18..21] stream-id : uint32
- [22..29] total-size : uint64 (0 on streaming source)
- [30] stream-type : uint8 (0: state, 1:bchunk, 2: font, 3: font-secondary)
- [31..34] id-token : uint32 (used for bchunk pairing on \_out/\_store)
- [35 +16] blake3-hash : blob (0 if unknown)

This defines a new or continued binary transfer stream. The block-size sets the
number of continuous bytes in the stream until the point where another transfer
can be interleaved. There can thus be multiple binary streams in flight in
order to interrupt an ongoing one with a higher priority one.

### command = 7, ping
- [18..21] stream-id : uint32

The stream-id is that of the last completed stream (if any).

### command = 8, rekey
- [0..7] future-seqnr : uint64
- [8..39] new (P)key : uint8[32]

This command indicates a sequence number in the future outside of the expected
established drift range along with a safety factor. When this sequence number
has been seen, new message and cipher keys will be derived from the new key and
used instead of the old one, which is discarded and safely erased. The same
packet with a new corresponding public key will be sent from the other side
as well.

## Event (2), fixed length
- [0..7] sequence number : uint64
- [8 ] channel-id : uint8
- [9+ ] event-data : special

The event data does not currently have a fixed packing format as the model is
still being refined and thus we use the opaque format from
arcan\_shmif\_eventpack.

Worthy of note is that this message type is the most sensitive to side channel
analysis as input device events are driven by user interaction. Combatting this
by injecting discard events is kept outside the protocol implementation, and
deferred to the UI/window manager.
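
A hedged sketch of reading the event header fields listed above from an already
authenticated and decrypted buffer. The struct and function names are
hypothetical, the byte order is assumed to be least significant byte first, and
the event payload is left opaque since its packing is not fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical parsed view, data points into the caller's buffer */
struct sketch_event_hdr {
    uint64_t seqnr;      /* [0..7] sequence number */
    uint8_t chid;        /* [8]    channel-id */
    const uint8_t* data; /* [9+]   opaque packed event */
    size_t data_len;
};

/* Assumption: least significant byte first */
static uint64_t sketch_unpack_u64(const uint8_t* src)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)src[i] << (8 * i);
    return v;
}

/* Returns false if the buffer is too short to hold the 9 header bytes */
static bool sketch_parse_event(
    const uint8_t* buf, size_t len, struct sketch_event_hdr* out)
{
    if (!buf || !out || len < 9)
        return false;

    out->seqnr = sketch_unpack_u64(buf);
    out->chid = buf[8];
    out->data = buf + 9;
    out->data_len = len - 9;
    return true;
}
```

The channel id recovered here is what would be used to look up the shmif
context that the unpacked event should be forwarded to.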

## Vstream-data (3), Astream-data (4), Bstream-data (5) (variable length)
- [0 ] channel-id : uint8
- [1..4] stream-id : uint32
- [5..6] length : uint16

The data messages themselves will make up the bulk of communication,
and tie to a pre-defined channel/stream.

# Compressions and Codecs

To get audio/video/data transfers bandwidth efficient, the contents must be
compressed in some way. This is a rat's nest of issues, ranging from patents
in less civilized parts of the world to complete dependency and hardware hell.

The current approach to 'feature negotiation' is simply for the sender to
try the best one it can, and fall back to inefficient safe-defaults if the
recipient reports the frame as unsupported.

# Event Model

The more complicated interactions come from events and their ordering,
especially when trying to interleave or deal with data carrying events.

The only stage where this 'should' require special treatment is font transfers
for client side text rendering, which typically only happen before activation.

Other kinds of transfers, state and clipboard, act as a mask/block
for certain other events. If a state-restore is requested for instance, it
makes no sense trying to interleave input events that assume state has been
restored.

Input events are the next complication in line, as any input event that relies
on a 'press-hold-release' pattern interpreted on the other end may be held for
too long or, with clients that deal with raw codes and repeats, cause
extraneous key-repeats or 'shadow releases'. To minimize the harm here, a more
complex state machine will be needed that tries to determine if the channel
blocks or not by relying on a ping-stream.