1\# This file is so named for tradition's sake: it contains what we
2\# always used to refer to, before they were written down, as
3\# PuTTY's `unwritten design principles'. It has nothing to do with
4\# the User Datagram Protocol.
5
6\A{udp} PuTTY hacking guide
7
8This appendix lists a selection of the design principles applying to
9the PuTTY source code. If you are planning to send code
10contributions, you should read this first.
11
12\H{udp-portability} Cross-OS portability
13
14Despite Windows being its main area of fame, PuTTY is no longer a
15Windows-only application suite. It has a working Unix port; a Mac
16port is in progress; more ports may or may not happen at a later
17date.
18
19Therefore, embedding Windows-specific code in core modules such as
20\cw{ssh.c} is not acceptable. We went to great lengths to \e{remove}
21all the Windows-specific stuff from our core modules, and to shift
22it out into Windows-specific modules. Adding large amounts of
23Windows-specific stuff in parts of the code that should be portable
24is almost guaranteed to make us reject a contribution.
25
26The PuTTY source base is divided into platform-specific modules and
27platform-generic modules. The Unix-specific modules are all in the
28\c{unix} subdirectory; the Windows-specific modules are in the
29\c{windows} subdirectory.
30
31All the modules in the main source directory - notably \e{all} of
32the code for the various back ends - are platform-generic. We want
33to keep them that way.
34
35This also means you should stick to the C semantics guaranteed by the
36C standard: try not to make assumptions about the precise size of
37basic types such as \c{int} and \c{long int}; don't use pointer casts
38to do endianness-dependent operations, and so on.
39
40(Even \e{within} a platform front end you should still be careful of
41some of these portability issues. The Windows front end compiles on
42both 32- and 64-bit x86 and also Arm.)
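
As an illustration of avoiding endianness-dependent pointer casts, here is a minimal sketch that decodes and encodes a 32-bit big-endian value one byte at a time. The helper names \cw{read_u32_be} and \cw{write_u32_be} are hypothetical, invented for this example rather than taken from the PuTTY source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a big-endian 32-bit value byte by byte.
 * Unlike *(const uint32_t *)p, this gives the same answer on any host
 * byte order and has no alignment requirement. */
static uint32_t read_u32_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: the matching encoder. */
static void write_u32_be(unsigned char *p, uint32_t value)
{
    assert(p != NULL);
    p[0] = (unsigned char)((value >> 24) & 0xFF);
    p[1] = (unsigned char)((value >> 16) & 0xFF);
    p[2] = (unsigned char)((value >> 8) & 0xFF);
    p[3] = (unsigned char)(value & 0xFF);
}
```

Because the shifts operate on values rather than on the in-memory representation, nothing here depends on the width of \c{int} beyond the 32-bit minimum, or on the byte order of the host.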
43
44Our current choice of C standards version is \e{mostly} C99. With a
45couple of exceptions, you can assume that C99 features are available
46(in particular \cw{<stdint.h>}, \cw{<stdbool.h>} and \c{inline}), but
47you shouldn't use things that are new in C11 (such as \cw{<uchar.h>}
48or \cw{_Generic}).
49
50The exceptions to that rule are due to the need for Visual Studio
51compatibility:
52
53\b Don't use variable-length arrays. Visual Studio doesn't support
54them even now that it's adopted the rest of C99. We use \cw{-Wvla}
55when building with gcc and clang, to make it easier to avoid
56accidentally breaking that rule.
57
\b For historical reasons, we still build with one older VS version
which lacks \cw{<inttypes.h>}. So that file is included centrally in
\c{defs.h}, which also provides a set of workaround definitions for the
\cw{PRIx64}-type macros we use. If you need to use another one of
those macros, add a workaround definition for it in \c{defs.h}, and
don't casually re-include \cw{<inttypes.h>} anywhere else in the
source.
65
66Here are a few portability assumptions that we \e{do} currently allow
67(because we'd already have to thoroughly vet the existing code if they
68ever needed to change, and it doesn't seem worth doing that unless we
69really have to):
70
71\b You can assume \c{int} is \e{at least} 32 bits wide. (We've never
72tried to port PuTTY to a platform with 16-bit \cw{int}, and it doesn't
73look likely to be necessary in future.)
74
75\b Similarly, you can assume \c{char} is exactly 8 bits. (Exceptions
76to that are even less likely to be relevant to us than short
77\cw{int}.)
78
79\b You can assume that using \c{memset} to write zero bytes over a
80whole structure will have the effect of setting all its pointer fields
81to \cw{NULL}. (The standard itself guarantees this for \e{integer}
82fields, but not for pointers.)
83
84\b You can assume that \c{time_t} has POSIX semantics, i.e. that it
85represents an integer number of non-leap seconds since 1970-01-01
8600:00:00 UTC. (Times in this format are used in X authorisation, but
87we could work around that by carefully distinguishing local \c{time_t}
88from time values used in the wire protocol; but these semantics of
89\c{time_t} are also baked into the shared library API used by the
90GSSAPI authentication code, which would be much harder to change.)
91
92\b You can assume that the execution character encoding is a superset
93of the printable characters of ASCII. (In particular, it's fine to do
94arithmetic on a \c{char} value representing a Latin alphabetic
95character, without bothering to allow for EBCDIC or other
96non-consecutive encodings of the alphabet.)
97
98On the other hand, here are some particular things \e{not} to assume:
99
100\b Don't assume anything about the \e{signedness} of \c{char}. In
101particular, you \e{must} cast \c{char} values to \c{unsigned char}
102before passing them to any \cw{<ctype.h>} function (because those
103expect a non-negative character value, or \cw{EOF}). If you need a
104particular signedness, explicitly specify \c{signed char} or
105\c{unsigned char}, or use C99 \cw{int8_t} or \cw{uint8_t}.
106
107\b From past experience with MacOS, we're still a bit nervous about
108\cw{'\\n'} and \cw{'\\r'} potentially having unusual meanings on a
109given platform. So it's fine to say \c{\\n} in a string you're passing
110to \c{printf}, but in any context where those characters appear in a
111standardised wire protocol or a binary file format, they should be
112spelled \cw{'\\012'} and \cw{'\\015'} respectively.
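
The two points above can be sketched as follows. The function \cw{count_digits} and the constant \cw{wire_crlf} are hypothetical names used only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a NUL-terminated string. The cast to
 * unsigned char matters when plain char is signed: a byte such as 0xE9
 * would otherwise reach isdigit() as a negative int, which is
 * undefined behaviour unless it happens to equal EOF. */
static size_t count_digits(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s; s++)
        if (isdigit((unsigned char)*s))
            n++;
    return n;
}

/* Line terminator for a wire protocol, spelled numerically (CR = 015,
 * LF = 012) rather than relying on the platform's idea of \r and \n. */
static const char wire_crlf[] = "\015\012";
```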
113
114\H{udp-multi-backend} Multiple backends treated equally
115
116PuTTY is not an SSH client with some other stuff tacked on the side.
117PuTTY is a generic, multiple-backend, remote VT-terminal client
118which happens to support one backend which is larger, more popular
119and more useful than the rest. Any extra feature which can possibly
120be general across all backends should be so: localising features
121unnecessarily into the SSH back end is a design error. (For example,
122we had several code submissions for proxy support which worked by
123hacking \cw{ssh.c}. Clearly this is completely wrong: the
124\cw{network.h} abstraction is the place to put it, so that it will
125apply to all back ends equally, and indeed we eventually put it
126there after another contributor sent a better patch.)
127
128The rest of PuTTY should try to avoid knowing anything about
129specific back ends if at all possible. To support a feature which is
130only available in one network protocol, for example, the back end
131interface should be extended in a general manner such that \e{any}
132back end which is able to provide that feature can do so. If it so
133happens that only one back end actually does, that's just the way it
134is, but it shouldn't be relied upon by any code.
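
A rough sketch of what \q{extended in a general manner} might mean in practice is given here. The structure \cw{ExampleBackend} and its \cw{send_keepalive} operation are purely hypothetical and much simpler than PuTTY's real back end interface; the point is only that generic code tests for the capability, not for a particular protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ExampleBackend ExampleBackend;

/* Hypothetical, heavily reduced back end interface. The optional
 * operation is part of the generic interface, so any back end may
 * fill it in; NULL means "not supported". */
struct ExampleBackend {
    const char *name;
    bool (*send_keepalive)(ExampleBackend *be);   /* may be NULL */
    int keepalives_sent;
};

/* One back end that happens to provide the feature. */
static bool example_proto_a_keepalive(ExampleBackend *be)
{
    be->keepalives_sent++;
    return true;
}

/* Generic code asks "can this back end do it?", never "is this
 * protocol A?". Returns false if the feature is unavailable. */
static bool backend_try_keepalive(ExampleBackend *be)
{
    assert(be != NULL);
    if (!be->send_keepalive)
        return false;
    return be->send_keepalive(be);
}
```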
135
136\H{udp-globals} Multiple sessions per process on some platforms
137
138Some ports of PuTTY - notably the in-progress Mac port - are
139constrained by the operating system to run as a single process
140potentially managing multiple sessions.
141
142Therefore, the platform-independent parts of PuTTY never use global
143variables to store per-session data. The global variables that do
144exist are tolerated because they are not specific to a particular
145login session. The random number state in \cw{sshrand.c}, the timer
146list in \cw{timing.c} and the queue of top-level callbacks in
147\cw{callback.c} serve all sessions equally. But most data is specific
148to a particular network session, and is therefore stored in
149dynamically allocated data structures, and pointers to these
150structures are passed around between functions.
151
152Platform-specific code can reverse this decision if it likes. The
153Windows code, for historical reasons, stores most of its data as
154global variables. That's OK, because \e{on Windows} we know there is
155only one session per PuTTY process, so it's safe to do that. But
156changes to the platform-independent code should avoid introducing
157global variables, unless they are genuinely cross-session.
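
As a minimal sketch of the per-session style described above, the following keeps a byte counter in a dynamically allocated structure instead of a global variable. \cw{ExampleSession} and its functions are hypothetical, and plain \cw{malloc} and \cw{free} stand in for whatever allocation wrappers the surrounding code would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-session state: everything that belongs to one
 * session lives in a structure like this, never in a global. */
typedef struct ExampleSession {
    unsigned long bytes_received;
} ExampleSession;

/* Allocate a fresh session; returns NULL if allocation fails. */
static ExampleSession *example_session_new(void)
{
    ExampleSession *s = malloc(sizeof(*s));
    if (s)
        s->bytes_received = 0;
    return s;
}

/* Every function that touches session data takes the session pointer. */
static void example_session_data(ExampleSession *s, size_t len)
{
    assert(s != NULL);
    s->bytes_received += (unsigned long)len;
}

static void example_session_free(ExampleSession *s)
{
    free(s);
}
```

Two sessions created this way can coexist in one process without interfering, which is exactly what a global counter would prevent.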
158
159\H{udp-pure-c} C, not C++
160
161PuTTY is written entirely in C, not in C++.
162
163We have made \e{some} effort to make it easy to compile our code
164using a C++ compiler: notably, our \c{snew}, \c{snewn} and
165\c{sresize} macros explicitly cast the return values of \cw{malloc}
166and \cw{realloc} to the target type. (This has type checking
167advantages even in C: it means you never accidentally allocate the
168wrong size piece of memory for the pointer type you're assigning it
169to. C++ friendliness is really a side benefit.)
170
171We want PuTTY to continue being pure C, at least in the
172platform-independent parts and the currently existing ports. Patches
173which switch the Makefiles to compile it as C++ and start using
174classes will not be accepted. Also, in particular, we disapprove of
175\cw{//} comments, at least for the moment. (Perhaps once C99 becomes
176genuinely widespread we might be more lenient.)
177
178The one exception: a port to a new platform may use languages other
179than C if they are necessary to code on that platform. If your
180favourite PDA has a GUI with a C++ API, then there's no way you can
181do a port of PuTTY without using C++, so go ahead and use it. But
182keep the C++ restricted to that platform's subdirectory; if your
183changes force the Unix or Windows ports to be compiled as C++, they
184will be unacceptable to us.
185
186\H{udp-security} Security-conscious coding
187
188PuTTY is a network application and a security application. Assume
189your code will end up being fed deliberately malicious data by
190attackers, and try to code in a way that makes it unlikely to be a
191security risk.
192
193In particular, try not to use fixed-size buffers for variable-size
194data such as strings received from the network (or even the user).
195We provide functions such as \cw{dupcat} and \cw{dupprintf}, which
196dynamically allocate buffers of the right size for the string they
197construct. Use these wherever possible.
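
To show the general idea of sizing the buffer to the data, here is a stand-in written for this example. It is \e{not} PuTTY's \cw{dupprintf}, whose implementation may differ; the name \cw{example_dupprintf} is hypothetical, and the sketch assumes a C99-conforming \cw{vsnprintf} that reports the required length when given a zero-sized buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for illustration only, NOT PuTTY's dupprintf: format into
 * a freshly allocated buffer of exactly the right size. Returns NULL
 * on failure; the caller releases the result with free(). */
static char *example_dupprintf(const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *buf;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);
    len = vsnprintf(NULL, 0, fmt, ap);       /* first pass: measure */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)len + 1);
    if (buf)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);   /* second pass: write */
    va_end(ap2);
    return buf;
}
```

The caller never has to guess a maximum length, so there is no fixed-size buffer for hostile input to overflow.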
198
199\H{udp-multi-compiler} Independence of specific compiler
200
201Windows PuTTY can currently be compiled with any of three Windows
202compilers: MS Visual C, the Cygwin / \cw{mingw32} GNU tools, and
203\cw{clang} (in MS compatibility mode).
204
205This is a really useful property of PuTTY, because it means people
206who want to contribute to the coding don't depend on having a
207specific compiler; so they don't have to fork out money for MSVC if
208they don't already have it, but on the other hand if they \e{do}
209have it they also don't have to spend effort installing \cw{gcc}
210alongside it. They can use whichever compiler they happen to have
211available, or install whichever is cheapest and easiest if they
212don't have one.
213
214Therefore, we don't want PuTTY to start depending on which compiler
215you're using. Using GNU extensions to the C language, for example,
216would ruin this useful property (not that anyone's ever tried it!);
217and more realistically, depending on an MS-specific library function
218supplied by the MSVC C library (\cw{_snprintf}, for example) is a
219mistake, because that function won't be available under the other
220compilers. Any function supplied in an official Windows DLL as part
221of the Windows API is fine, and anything defined in the C library
222standard is also fine, because those should be available
223irrespective of compilation environment. But things in between,
224available as non-standard library and language extensions in only
225one compiler, are disallowed.
226
227(\cw{_snprintf} in particular should be unnecessary, since we
228provide \cw{dupprintf}; see \k{udp-security}.)
229
230Compiler independence should apply on all platforms, of course, not
231just on Windows.
232
233\H{udp-small} Small code size
234
235PuTTY is tiny, compared to many other Windows applications. And it's
236easy to install: it depends on no DLLs, no other applications, no
237service packs or system upgrades. It's just one executable. You
238install that executable wherever you want to, and run it.
239
240We want to keep both these properties - the small size, and the ease
241of installation - if at all possible. So code contributions that
242depend critically on external DLLs, or that add a huge amount to the
243code size for a feature which is only useful to a small minority of
244users, are likely to be thrown out immediately.
245
246We do vaguely intend to introduce a DLL plugin interface for PuTTY,
247whereby seriously large extra features can be implemented in plugin
248modules. The important thing, though, is that those DLLs will be
249\e{optional}; if PuTTY can't find them on startup, it should run
250perfectly happily and just won't provide those particular features.
251A full installation of PuTTY might one day contain ten or twenty
252little DLL plugins, which would cut down a little on the ease of
253installation - but if you really needed ease of installation you
254\e{could} still just install the one PuTTY binary, or just the DLLs
255you really needed, and it would still work fine.
256
257Depending on \e{external} DLLs is something we'd like to avoid if at
258all possible (though for some purposes, such as complex SSH
259authentication mechanisms, it may be unavoidable). If it can't be
260avoided, the important thing is to follow the same principle of
261graceful degradation: if a DLL can't be found, then PuTTY should run
262happily and just not supply the feature that depended on it.
263
264\H{udp-single-threaded} Single-threaded code
265
266PuTTY and its supporting tools, or at least the vast majority of
267them, run in only one OS thread.
268
269This means that if you're devising some piece of internal mechanism,
270there's no need to use locks to make sure it doesn't get called by
271two threads at once. The only way code can be called re-entrantly is
272by recursion.
273
274That said, most of Windows PuTTY's network handling is triggered off
275Windows messages requested by \cw{WSAAsyncSelect()}, so if you call
276\cw{MessageBox()} deep within some network event handling code you
277should be aware that you might be re-entered if a network event
278comes in and is passed on to our window procedure by the
279\cw{MessageBox()} message loop.
280
281Also, the front ends (in particular Windows Plink) can use multiple
282threads if they like. However, Windows Plink keeps \e{very} tight
283control of its auxiliary threads, and uses them pretty much
284exclusively as a form of \cw{select()}. Pretty much all the code
285outside \cw{windows/winplink.c} is \e{only} ever called from the one
286primary thread; the others just loop round blocking on file handles
287and send messages to the main thread when some real work needs
288doing. This is not considered a portability hazard because that bit
289of \cw{windows/winplink.c} will need rewriting on other platforms in
290any case.
291
292One important consequence of this: PuTTY has only one thread in
293which to do everything. That \q{everything} may include managing
294more than one login session (\k{udp-globals}), managing multiple
295data channels within an SSH session, responding to GUI events even
296when nothing is happening on the network, and responding to network
297requests from the server (such as repeat key exchange) even when the
298program is dealing with complex user interaction such as the
299re-configuration dialog box. This means that \e{almost none} of the
300PuTTY code can safely block.
301
302\H{udp-keystrokes} Keystrokes sent to the server wherever possible
303
304In almost all cases, PuTTY sends keystrokes to the server. Even
305weird keystrokes that you think should be hot keys controlling
306PuTTY. Even Alt-F4 or Alt-Space, for example. If a keystroke has a
307well-defined escape sequence that it could usefully be sending to
308the server, then it should do so, or at the very least it should be
309configurably able to do so.
310
311To unconditionally turn a key combination into a hot key to control
312PuTTY is almost always a design error. If a hot key is really truly
313required, then try to find a key combination for it which \e{isn't}
314already used in existing PuTTYs (either it sends nothing to the
315server, or it sends the same thing as some other combination). Even
316then, be prepared for the possibility that one day that key
317combination might end up being needed to send something to the
318server - so make sure that there's an alternative way to invoke
319whatever PuTTY feature it controls.
320
321\H{udp-640x480} 640\u00D7{x}480 friendliness in configuration panels
322
323There's a reason we have lots of tiny configuration panels instead
324of a few huge ones, and that reason is that not everyone has a
3251600\u00D7{x}1200 desktop. 640\u00D7{x}480 is still a viable
326resolution for running Windows (and indeed it's still the default if
327you start up in safe mode), so it's still a resolution we care
328about.
329
330Accordingly, the PuTTY configuration box, and the PuTTYgen control
331window, are deliberately kept just small enough to fit comfortably
332on a 640\u00D7{x}480 display. If you're adding controls to either of
333these boxes and you find yourself wanting to increase the size of
334the whole box, \e{don't}. Split it into more panels instead.
335
336\H{udp-makefiles-auto} Automatically generated \cw{Makefile}s
337
338PuTTY is intended to compile on multiple platforms, and with
339multiple compilers. It would be horrifying to try to maintain a
340single \cw{Makefile} which handled all possible situations, and just
341as painful to try to directly maintain a set of matching
342\cw{Makefile}s for each different compilation environment.
343
344Therefore, we have moved the problem up by one level. In the PuTTY
345source archive is a file called \c{Recipe}, which lists which source
346files combine to produce which binaries; and there is also a script
347called \cw{mkfiles.pl}, which reads \c{Recipe} and writes out the
348real \cw{Makefile}s. (The script also reads all the source files and
349analyses their dependencies on header files, so we get an extra
350benefit from doing it this way, which is that we can supply correct
351dependency information even in environments where it's difficult to
352set up an automated \c{make depend} phase.)
353
354You should \e{never} edit any of the PuTTY \cw{Makefile}s directly.
355They are not stored in our source repository at all. They are
356automatically generated by \cw{mkfiles.pl} from the file \c{Recipe}.
357
358If you need to add a new object file to a particular binary, the
359right thing to do is to edit \c{Recipe} and re-run \cw{mkfiles.pl}.
360This will cause the new object file to be added in every tool that
361requires it, on every platform where it matters, in every
362\cw{Makefile} to which it is relevant, \e{and} to get all the
363dependency data right.
364
365If you send us a patch that modifies one of the \cw{Makefile}s, you
366just waste our time, because we will have to convert it into a
367change to \c{Recipe}. If you send us a patch that modifies \e{all}
368of the \cw{Makefile}s, you will have wasted a lot of \e{your} time
369as well!
370
371(There is a comment at the top of every \cw{Makefile} in the PuTTY
372source archive saying this, but many people don't seem to read it,
373so it's worth repeating here.)
374
375\H{udp-ssh-coroutines} Coroutines in the SSH code
376
377Large parts of the code in the various SSH modules (in fact most of
378the protocol layers) are structured using a set of macros that
379implement (something close to) Donald Knuth's \q{coroutines} concept
380in C.
381
Essentially, the purpose of these macros is to arrange that a
function can call \cw{crReturn()} to return to its caller, and the
next time it is called control will resume from just after that
\cw{crReturn} statement.
386
387This means that any local (automatic) variables declared in such a
388function will be corrupted every time you call \cw{crReturn}. If you
389need a variable to persist for longer than that, you \e{must} make it
390a field in some appropriate structure containing the persistent state
391of the coroutine \dash typically the main state structure for an SSH
392protocol layer.
393
394See
395\W{https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html}\c{https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html}
396for a more in-depth discussion of what these macros are for and how
397they work.
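
The rule about persistent variables can be seen in a much-simplified sketch. The macros \cw{CR_BEGIN}, \cw{CR_RETURN} and \cw{CR_FINISH} below are stand-ins written for this example in the switch-on-line-number style; they are not PuTTY's own definitions, and \cw{counter_state} and \cw{counter_next} are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the coroutine macros (assumed names, not
 * PuTTY's definitions). 'line' is an int lvalue that remembers where
 * to resume on the next call. */
#define CR_BEGIN(line)  switch (line) { case 0:
#define CR_RETURN(line, value) \
    do { (line) = __LINE__; return (value); case __LINE__:; } while (0)
#define CR_FINISH(line, value)  } (line) = 0; return (value)

struct counter_state {
    int line;   /* resume point; must start at 0 */
    int i;      /* loop counter: must persist, so it lives here */
};

/* Yields 0, 1, 2 on successive calls, then -1. */
static int counter_next(struct counter_state *s)
{
    assert(s != NULL);
    CR_BEGIN(s->line);
    for (s->i = 0; s->i < 3; s->i++)
        CR_RETURN(s->line, s->i);
    CR_FINISH(s->line, -1);
}
```

If \cw{i} were an ordinary local variable instead of a field of the state structure, its value would be indeterminate each time the function was re-entered, and the loop would misbehave.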
398
399Another caveat: most of these coroutines are not \e{guaranteed} to run
400to completion, because the SSH connection (or whatever) that they're
401part of might be interrupted at any time by an unexpected network
402event or user action. So whenever a coroutine-managed variable refers
403to a resource that needs releasing, you should also ensure that the
404cleanup function for its containing state structure can reliably
405release it even if the coroutine is aborted at an arbitrary point.
406
407For example, if an SSH packet protocol layer has to have a field that
408sometimes points to a piece of allocated memory, then you should
409ensure that when you free that memory you reset the pointer field to
410\cw{NULL}. Then, no matter when the protocol layer's cleanup function
411is called, it can reliably free the memory if there is any, and not
412crash if there isn't.
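
A minimal sketch of that discipline follows. The structure \cw{example_layer} and its functions are hypothetical, and standard \cw{free} is used here in place of PuTTY's own allocation wrappers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state structure for a protocol layer. */
struct example_layer {
    char *pending;   /* sometimes points to allocated memory, else NULL */
};

/* Release the buffer in the middle of the coroutine, and record that
 * it is gone. */
static void example_layer_drop_pending(struct example_layer *layer)
{
    assert(layer != NULL);
    free(layer->pending);
    layer->pending = NULL;   /* so a later cleanup cannot double-free */
}

/* Cleanup function: safe whether or not 'pending' is currently set. */
static void example_layer_free(struct example_layer *layer)
{
    if (!layer)
        return;
    free(layer->pending);    /* free(NULL) is a no-op */
    free(layer);
}
```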
413
414\H{udp-traits} Explicit vtable structures to implement traits
415
416A lot of PuTTY's code is written in a style that looks structurally
417rather like an object-oriented language, in spite of PuTTY being a
418pure C program.
419
For example, there's a single data type called \cw{ssh_hash}, which is
an abstraction of a secure hash function, and a bunch of functions
called things like \cw{ssh_hash_}\e{foo} that do things with objects
of that type. But in fact, PuTTY supports many different hash functions,
and each one has to provide its own implementation of those functions.
425
426In C++ terms, this is rather like having a single abstract base class,
427and multiple concrete subclasses of it, each of which fills in all the
428pure virtual methods in a way that's compatible with the data fields
429of the subclass. The implementation is more or less the same, as well:
430in C, we do explicitly in the source code what the C++ compiler will
431be doing behind the scenes at compile time.
432
433But perhaps a closer analogy in functional terms is the Rust concept
434of a \q{trait}, or the Java idea of an \q{interface}. C++ supports a
435multi-level hierarchy of inheritance, whereas PuTTY's system \dash
436like traits or interfaces \dash has only two levels, one describing a
437generic object of a type (e.g. a hash function) and another describing
438a specific implementation of that type (e.g. SHA-256).
439
440The PuTTY code base has a standard idiom for doing this in C, as
441follows.
442
443Firstly, we define two \cw{struct} types for our trait. One of them
444describes a particular \e{kind} of implementation of that trait, and
445it's full of (mostly) function pointers. The other describes a
446specific \e{instance} of an implementation of that trait, and it will
447contain a pointer to a \cw{const} instance of the first type. For
448example:
449
450\c typedef struct MyAbstraction MyAbstraction;
451\c typedef struct MyAbstractionVtable MyAbstractionVtable;
452\c
453\c struct MyAbstractionVtable {
454\c     MyAbstraction *(*new)(const MyAbstractionVtable *vt);
455\c     void (*free)(MyAbstraction *);
456\c     void (*modify)(MyAbstraction *, unsigned some_parameter);
457\c     unsigned (*query)(MyAbstraction *, unsigned some_parameter);
458\c };
459\c
460\c struct MyAbstraction {
461\c     const MyAbstractionVtable *vt;
462\c };
463
464Here, we imagine that \cw{MyAbstraction} might be some kind of object
465that contains mutable state. The associated vtable structure shows
466what operations you can perform on a \cw{MyAbstraction}: you can
467create one (dynamically allocated), free one you already have, or call
468the example methods \q{modify} (to change the state of the object in
469some way) and \q{query} (to return some value derived from the
470object's current state).
471
472(In most cases, the vtable structure has a name ending in \cq{vtable}.
473But for historical reasons a lot of the crypto primitives that use
474this scheme \dash ciphers, hash functions, public key methods and so
475on \dash instead have names ending in \cq{alg}, on the basis that the
476primitives they implement are often referred to as \q{encryption
477algorithms}, \q{hash algorithms} and so forth.)
478
Now, to define a concrete implementation of this trait, you'd define a
\cw{struct} that contains a \cw{MyAbstraction} field, plus any other
data it might need:
482
\c typedef struct MyImplementation MyImplementation;
\c
\c struct MyImplementation {
484\c     unsigned internal_data[16];
485\c     SomeOtherType *dynamic_subthing;
486\c
487\c     MyAbstraction myabs;
488\c };
489
490Next, you'd implement all the necessary methods for that
491implementation of the trait, in this kind of style:
492
493\c static MyAbstraction *myimpl_new(const MyAbstractionVtable *vt)
494\c {
495\c     MyImplementation *impl = snew(MyImplementation);
496\c     memset(impl, 0, sizeof(*impl));
497\c     impl->dynamic_subthing = allocate_some_other_type();
498\c     impl->myabs.vt = vt;
499\c     return &impl->myabs;
500\c }
501\c
502\c static void myimpl_free(MyAbstraction *myabs)
503\c {
504\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
505\c     free_other_type(impl->dynamic_subthing);
506\c     sfree(impl);
507\c }
508\c
509\c static void myimpl_modify(MyAbstraction *myabs, unsigned param)
510\c {
511\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
512\c     impl->internal_data[param] += do_something_with(impl->dynamic_subthing);
513\c }
514\c
515\c static unsigned myimpl_query(MyAbstraction *myabs, unsigned param)
516\c {
517\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
518\c     return impl->internal_data[param];
519\c }
520
521Having defined those methods, now we can define a \cw{const} instance
522of the vtable structure containing pointers to them:
523
524\c const MyAbstractionVtable MyImplementation_vt = {
525\c     .new = myimpl_new,
526\c     .free = myimpl_free,
527\c     .modify = myimpl_modify,
528\c     .query = myimpl_query,
529\c };
530
531\e{In principle}, this is all you need. Client code can construct a
532new instance of a particular implementation of \cw{MyAbstraction} by
533digging out the \cw{new} method from the vtable and calling it (with
534the vtable itself as a parameter), which returns a \cw{MyAbstraction
535*} pointer that identifies a newly created instance, in which the
536\cw{vt} field will contain a pointer to the same vtable structure you
537passed in. And once you have an instance object, say \cw{MyAbstraction
538*myabs}, you can dig out one of the other method pointers from the
539vtable it points to, and call that, passing the object itself as a
540parameter.
541
542But in fact, we don't do that, because it looks pretty ugly at all the
543call sites. Instead, what we generally do in this code base is to
544write a set of \cw{static inline} wrapper functions in the same header
545file that defined the \cw{MyAbstraction} structure types, like this:
546
\c static inline MyAbstraction *myabs_new(const MyAbstractionVtable *vt)
\c { return vt->new(vt); }
\c static inline void myabs_free(MyAbstraction *myabs)
\c { myabs->vt->free(myabs); }
\c static inline void myabs_modify(MyAbstraction *myabs, unsigned param)
\c { myabs->vt->modify(myabs, param); }
\c static inline unsigned myabs_query(MyAbstraction *myabs, unsigned param)
\c { return myabs->vt->query(myabs, param); }
555
556And now call sites can use those reasonably clean-looking wrapper
557functions, and shouldn't ever have to directly refer to the \cw{vt}
558field inside any \cw{myabs} object they're holding. For example, you
559might write something like this:
560
\c MyAbstraction *myabs = myabs_new(&MyImplementation_vt);
\c myabs_modify(myabs, 10);
563\c unsigned output = myabs_query(myabs, 2);
564\c myabs_free(myabs);
565
and then all this code can use a different implementation of the same
abstraction simply by changing which vtable pointer is passed in on the
first line.
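
Putting the pieces of this idiom together, here is a condensed, self-contained sketch with two implementations of one trait. All of the names (\cw{Acc}, \cw{SumAcc}, \cw{MaxAcc} and so on) are hypothetical, \cw{malloc} and \cw{free} stand in for PuTTY's allocation macros, and the \cw{container_of} definition is the usual \cw{offsetof}-based form, assumed here for completeness and not necessarily identical to PuTTY's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed offsetof-based definition, shown for self-containment. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Acc Acc;
typedef struct AccVtable AccVtable;

struct AccVtable {
    Acc *(*new)(const AccVtable *vt);
    void (*free)(Acc *);
    void (*modify)(Acc *, unsigned value);
    unsigned (*query)(Acc *);
};
struct Acc { const AccVtable *vt; };

/* Wrappers, so call sites never touch the vt field directly. */
static inline Acc *acc_new(const AccVtable *vt) { return vt->new(vt); }
static inline void acc_free(Acc *a) { a->vt->free(a); }
static inline void acc_modify(Acc *a, unsigned v) { a->vt->modify(a, v); }
static inline unsigned acc_query(Acc *a) { return a->vt->query(a); }

/* Implementation 1: running sum. The Acc field is deliberately not
 * the first member, so a plain cast would give the wrong pointer. */
typedef struct SumAcc { unsigned total; Acc acc; } SumAcc;

static Acc *sum_new(const AccVtable *vt)
{
    SumAcc *s = malloc(sizeof(*s));
    if (!s)
        return NULL;
    s->total = 0;
    s->acc.vt = vt;
    return &s->acc;                       /* up-cast */
}
static void sum_free(Acc *a) { free(container_of(a, SumAcc, acc)); }
static void sum_modify(Acc *a, unsigned v)
{ container_of(a, SumAcc, acc)->total += v; }
static unsigned sum_query(Acc *a)
{ return container_of(a, SumAcc, acc)->total; }

static const AccVtable SumAcc_vt = {
    .new = sum_new, .free = sum_free,
    .modify = sum_modify, .query = sum_query,
};

/* Implementation 2: running maximum. */
typedef struct MaxAcc { Acc acc; unsigned max; } MaxAcc;

static Acc *max_new(const AccVtable *vt)
{
    MaxAcc *m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    m->max = 0;
    m->acc.vt = vt;
    return &m->acc;
}
static void max_free(Acc *a) { free(container_of(a, MaxAcc, acc)); }
static void max_modify(Acc *a, unsigned v)
{
    MaxAcc *m = container_of(a, MaxAcc, acc);   /* down-cast */
    if (v > m->max)
        m->max = v;
}
static unsigned max_query(Acc *a)
{ return container_of(a, MaxAcc, acc)->max; }

static const AccVtable MaxAcc_vt = {
    .new = max_new, .free = max_free,
    .modify = max_modify, .query = max_query,
};

/* Client code: identical for every implementation; only the vtable
 * pointer passed in differs. */
static unsigned run_example(const AccVtable *vt)
{
    Acc *a = acc_new(vt);
    unsigned result;
    assert(a != NULL);
    acc_modify(a, 10);
    acc_modify(a, 7);
    result = acc_query(a);
    acc_free(a);
    return result;
}
```

Calling \cw{run_example} with each of the two vtables runs the same client code against the sum and the maximum implementations respectively.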
569
570Some things to note about this system:
571
\b The implementation instance type (here \cq{MyImplementation})
573contains the abstraction type (\cq{MyAbstraction}) as one of its
574fields. But that field is not necessarily at the start of the
575structure. So you can't just \e{cast} pointers back and forth between
576the two types. Instead:
577
578\lcont{
579
580\b You \q{up-cast} from implementation to abstraction by taking the
581address of the \cw{MyAbstraction} field. You can see the example
582\cw{new} method above doing this, returning \cw{&impl->myabs}. All
583\cw{new} methods do this on return.
584
585\b Going in the other direction, each method that was passed a generic
586\cw{MyAbstraction *myabs} parameter has to recover a pointer to the
587specific implementation type \cw{MyImplementation *impl}. The idiom
588for doing that is to use the \cq{container_of} macro, also seen in the
589Linux kernel code. Generally, \cw{container_of(p, Type, field)} says:
590\q{I'm confident that the pointer value \cq{p} is pointing to the
591field called \cq{field} within a larger \cw{struct} of type \cw{Type}.
592Please return me the pointer to the containing structure.} So in this
593case, we take the \cq{myabs} pointer passed to the function, and
594\q{down-cast} it into a pointer to the larger and more specific
595structure type \cw{MyImplementation}, by adjusting the pointer value
596based on the offset within that structure of the field called
597\cq{myabs}.
598
599This system is flexible enough to permit \q{multiple inheritance}, or
600rather, multiple \e{implementation}: having one object type implement
601more than one trait. For example, the \cw{Proxy} type implements both
602the \cw{Socket} trait and the \cw{Plug} trait that connects to it,
603because it has to act as an adapter between another instance of each
604of those types.
605
606It's also perfectly possible to have the same object implement the
607\e{same} trait in two different ways. At the time of writing this I
608can't think of any case where we actually do this, but a theoretical
609example might be if you needed to support a trait like \cw{Comparable}
610in two ways that sorted by different criteria. There would be no
611difficulty doing this in the PuTTY system: simply have your
612implementation \cw{struct} contain two (or more) fields of the same
613abstraction type. The fields will have different names, which makes it
614easy to explicitly specify which one you're returning a pointer to
615during up-casting, or which one you're down-casting from using
616\cw{container_of}. And then both sets of implementation methods can
617recover a pointer to the same containing structure.
618
619}
620
621\b Unlike in C++, all objects in PuTTY that use this system are
622dynamically allocated. The \q{constructor} functions (whether they're
623virtualised across the whole abstraction or specific to each
624implementation) always allocate memory and return a pointer to it. The
625\q{free} method (our analogue of a destructor) always expects the
626input pointer to be dynamically allocated, and frees it. As a result,
627client code doesn't need to know how large the implementing object
628type is, because it will never need to allocate it (on the stack or
629anywhere else).
630
631\b Unlike in C++, the abstraction's \q{vtable} structure does not only
632hold methods that you can call on an instance object. It can also
633hold several other kinds of thing:
634
635\lcont{
636
637\b Methods that you can call \e{without} an instance object, given
638only the vtable structure identifying a particular implementation of
639the trait. You might think of these as \q{static methods}, as in C++,
640except that they're \e{virtual} \dash the same code can call the
641static method of a different \q{class} given a different vtable
642pointer. So they're more like \q{virtual static methods}, which is a
643concept C++ doesn't have. An example is the \cw{pubkey_bits} method in
644\cw{ssh_keyalg}.
645
646\b The most important case of a \q{virtual static method} is the
647\cw{new} method that allocates and returns a new object. You can think
648of it as a \q{virtual constructor} \dash another concept C++ doesn't
649have. (However, not all types need one of these: see below.)
650
651\b The vtable can also contain constant data relevant to the class as
652a whole \dash \q{virtual constant data}. For example, a cryptographic
653hash function will contain an integer field giving the length of the
654output hash, and most crypto primitives will contain a string field
655giving the identifier used in the SSH protocol that describes that
656primitive.
657
658The effect of all of this is that you can make other pieces of code
659able to use any instance of one of these types, by passing it an
660actual vtable as a parameter. For example, the \cw{hash_simple}
661function takes an \cw{ssh_hashalg} vtable pointer specifying any hash
662algorithm you like, and internally, it creates an object of that type,
663uses it, and frees it. In C++, you'd probably do this using a
664template, which would mean you had multiple specialisations of
665\cw{hash_simple} \dash and then it would be much more difficult to
666decide \e{at run time} which one you needed to use. Here,
667\cw{hash_simple} is still just one function, and you can decide as
668late as you like which vtable to pass to it.
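
The following sketch illustrates the three kinds of vtable content described above. It assumes a hypothetical \cw{toy_hashalg} trait with two deliberately trivial checksum implementations; none of these names come from the PuTTY source, and the real \cw{ssh_hashalg} has more methods than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical trait, loosely modelled on the description above. */
typedef struct toy_hash toy_hash;
typedef struct toy_hashalg {
    /* Virtual constructor: callable with only the vtable in hand. */
    toy_hash *(*new)(const struct toy_hashalg *alg);
    /* Ordinary methods: these need an instance. */
    void (*put)(toy_hash *h, const unsigned char *data, size_t len);
    void (*digest)(toy_hash *h, unsigned char *out);
    void (*free)(toy_hash *h);
    /* Virtual constant data. */
    size_t hlen;
    const char *text_name;
} toy_hashalg;
struct toy_hash {
    const toy_hashalg *vt;
};

typedef struct toy_state {
    unsigned char acc[2];
    size_t count;
    toy_hash hash;
} toy_state;

static toy_hash *toy_new(const toy_hashalg *alg)
{
    toy_state *s = calloc(1, sizeof(toy_state));
    assert(s);
    s->hash.vt = alg;
    return &s->hash;
}

static void toy_free(toy_hash *h)
{
    free(container_of(h, toy_state, hash));
}

/* "sum8": one output byte, the sum of all input bytes mod 256. */
static void sum8_put(toy_hash *h, const unsigned char *data, size_t len)
{
    toy_state *s = container_of(h, toy_state, hash);
    for (size_t i = 0; i < len; i++)
        s->acc[0] += data[i];
}

/* "xor16": two output bytes, XORs of alternate input bytes. */
static void xor16_put(toy_hash *h, const unsigned char *data, size_t len)
{
    toy_state *s = container_of(h, toy_state, hash);
    for (size_t i = 0; i < len; i++)
        s->acc[s->count++ & 1] ^= data[i];
}

/* Shared digest method: consults the constant hlen in the vtable. */
static void toy_digest(toy_hash *h, unsigned char *out)
{
    toy_state *s = container_of(h, toy_state, hash);
    for (size_t i = 0; i < h->vt->hlen; i++)
        out[i] = s->acc[i];
}

const toy_hashalg toy_sum8 = {
    .new = toy_new, .put = sum8_put, .digest = toy_digest,
    .free = toy_free, .hlen = 1, .text_name = "sum8",
};
const toy_hashalg toy_xor16 = {
    .new = toy_new, .put = xor16_put, .digest = toy_digest,
    .free = toy_free, .hlen = 2, .text_name = "xor16",
};

/* Analogue of hash_simple: one function, algorithm chosen at run time. */
void toy_hash_simple(const toy_hashalg *alg, const void *data, size_t len,
                     unsigned char *out)
{
    toy_hash *h = alg->new(alg);
    h->vt->put(h, data, len);
    h->vt->digest(h, out);
    h->vt->free(h);
}
```

A caller can pick \cw{&toy_sum8} or \cw{&toy_xor16} as late as it likes, and can read \cw{alg->hlen} to size the output buffer before any object exists.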
669
670}
671
672\b The abstract \e{instance} structure can also contain publicly
673visible data fields (this time, usually treated as mutable) which are
674common to all implementations of the trait. For example,
675\cw{BinaryPacketProtocol} has lots of these.
676
677\b Not all abstractions of this kind want virtual constructors. It
678depends on how different the implementations are.
679
680\lcont{
681
682With a crypto primitive like a hash algorithm, the constructor call
683looks the same for every implementing type, so it makes sense to have
684a standardised virtual constructor in the vtable and a
685\cw{ssh_hash_new} wrapper function which can make an instance of
686whatever vtable you pass it. And then you make all the vtable objects
687themselves globally visible throughout the source code, so that any
688module can call (for example) \cw{ssh_hash_new(&ssh_sha256)}.
689
690But with other kinds of object, the constructor for each implementing
691type has to take a different set of parameters. For example,
692implementations of \cw{Socket} are not generally interchangeable at
693construction time, because constructing different kinds of socket
requires totally different kinds of address parameter. In that
695situation, it makes more sense to keep the vtable structure itself
696private to the implementing source file, and instead, publish an
697ordinary constructing function that allocates and returns an instance
698of that particular subtype, taking whatever parameters are appropriate
699to that subtype.
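
A rough sketch of that second arrangement follows. The \cw{Endpoint} trait and both implementations are hypothetical, invented only for illustration; the real \cw{Socket} trait is considerably larger. In a real source tree each implementation would sit in its own source file, which is what makes the \cw{static} vtable genuinely private.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical trait, shared by all implementations. */
typedef struct Endpoint Endpoint;
typedef struct EndpointVtable {
    int (*port)(Endpoint *e);          /* returns -1 if not applicable */
    void (*free)(Endpoint *e);
} EndpointVtable;
struct Endpoint {
    const EndpointVtable *vt;
};

static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    assert(copy);
    strcpy(copy, s);
    return copy;
}

/* ---- Implementation 1: constructed from a host name and a port ---- */
typedef struct NetEndpoint {
    char *host;
    int port;
    Endpoint ep;
} NetEndpoint;

static int net_port(Endpoint *e)
{
    return container_of(e, NetEndpoint, ep)->port;
}
static void net_free(Endpoint *e)
{
    NetEndpoint *n = container_of(e, NetEndpoint, ep);
    free(n->host);
    free(n);
}
/* The vtable is static: only the constructor below is published. */
static const EndpointVtable net_vt = { net_port, net_free };

Endpoint *net_endpoint_new(const char *host, int port)
{
    NetEndpoint *n = malloc(sizeof(NetEndpoint));
    assert(n);
    n->host = copy_string(host);
    n->port = port;
    n->ep.vt = &net_vt;
    return &n->ep;
}

/* ---- Implementation 2: constructed from a path name only ---- */
typedef struct PipeEndpoint {
    char *path;
    Endpoint ep;
} PipeEndpoint;

static int pipe_port(Endpoint *e)
{
    (void)e;
    return -1;
}
static void pipe_free(Endpoint *e)
{
    PipeEndpoint *p = container_of(e, PipeEndpoint, ep);
    free(p->path);
    free(p);
}
static const EndpointVtable pipe_vt = { pipe_port, pipe_free };

Endpoint *pipe_endpoint_new(const char *path)
{
    PipeEndpoint *p = malloc(sizeof(PipeEndpoint));
    assert(p);
    p->path = copy_string(path);
    p->ep.vt = &pipe_vt;
    return &p->ep;
}
```

Once constructed, both kinds of object are used through the same \cw{Endpoint *} interface; only construction differs.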
700
701}
702
703\b If you do have virtual constructors, you can choose whether they
704take a vtable pointer as a parameter (as shown above), or an
705\e{existing} instance object. In the latter case, they can refer to
706the object itself as well as the vtable. For example, you could have a
707trait come with a virtual constructor called \q{clone}, meaning
708\q{Make a copy of this object, no matter which implementation it is.}
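
\lcont{

A minimal sketch of such a \q{clone} method, assuming a hypothetical \cw{Shape} trait with two hypothetical implementations (these are illustrative names only, not anything in the PuTTY source):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct Shape Shape;
typedef struct ShapeVtable {
    /* Virtual constructor taking an existing instance. */
    Shape *(*clone)(Shape *orig);
    int (*area)(Shape *s);
    void (*free)(Shape *s);
} ShapeVtable;
struct Shape {
    const ShapeVtable *vt;
};
static inline Shape *shape_clone(Shape *s) { return s->vt->clone(s); }
static inline int shape_area(Shape *s) { return s->vt->area(s); }
static inline void shape_free(Shape *s) { s->vt->free(s); }

/* ---- Rectangle ---- */
typedef struct Rect { int w, h; Shape shape; } Rect;
Shape *rect_new(int w, int h);
static Shape *rect_clone(Shape *orig)
{
    Rect *r = container_of(orig, Rect, shape);
    return rect_new(r->w, r->h);   /* copies the object's own data */
}
static int rect_area(Shape *s)
{
    Rect *r = container_of(s, Rect, shape);
    return r->w * r->h;
}
static void rect_free(Shape *s) { free(container_of(s, Rect, shape)); }
static const ShapeVtable rect_vt = { rect_clone, rect_area, rect_free };
Shape *rect_new(int w, int h)
{
    Rect *r = malloc(sizeof(Rect));
    assert(r);
    r->w = w;
    r->h = h;
    r->shape.vt = &rect_vt;
    return &r->shape;
}

/* ---- Square ---- */
typedef struct Square { int side; Shape shape; } Square;
Shape *square_new(int side);
static Shape *square_clone(Shape *orig)
{
    return square_new(container_of(orig, Square, shape)->side);
}
static int square_area(Shape *s)
{
    Square *q = container_of(s, Square, shape);
    return q->side * q->side;
}
static void square_free(Shape *s) { free(container_of(s, Square, shape)); }
static const ShapeVtable square_vt = { square_clone, square_area, square_free };
Shape *square_new(int side)
{
    Square *q = malloc(sizeof(Square));
    assert(q);
    q->side = side;
    q->shape.vt = &square_vt;
    return &q->shape;
}
```

Code holding only a \cw{Shape *} can call \cw{shape_clone} without knowing which implementation it has been given.

}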
709
710\b Sometimes, a single vtable structure type can be shared between two
711completely different object types, and contain all the methods for
712both. For example, \cw{ssh_compression_alg} contains methods to
713create, use and free \cw{ssh_compressor} and \cw{ssh_decompressor}
714objects, which are not interchangeable \dash but putting their methods
715in the same vtable means that it's easy to create a matching pair of
716objects that are compatible with each other.
717
718\b Passing the vtable itself as an argument to the \cw{new} method is
719not compulsory: if a given \cw{new} implementation is only used by a
720single vtable, then that function can simply hard-code the vtable
721pointer that it writes into the object it constructs. But passing the
722vtable is more flexible, because it allows a single constructor
723function to be shared between multiple slightly different object
724types. For example, SHA-384 and SHA-512 share the same \cw{new} method
725and the same implementation data type, because they're very nearly the
726same hash algorithm \dash but a couple of the other methods in their
vtables are different, because the \q{reset} method has to set up
728the initial algorithm state differently, and the \q{digest} method has
729to write out a different amount of data.
730
731\lcont{
732
733One practical advantage of having the \cw{myabs_}\e{foo} family of
734inline wrapper functions in the header file is that if you change your
735mind later about whether the vtable needs to be passed to \cw{new},
736you only have to update the \cw{myabs_new} wrapper, and then the
737existing call sites won't need changing.
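
To make that concrete, here is a sketch using the placeholder name \cw{myabs} from above. The two variants and their state values are invented for illustration; they merely mimic the situation of two near-identical algorithms that share a constructor and a data layout but differ in \q{reset} and \q{digest}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct myabs myabs;
typedef struct myabs_vtable {
    myabs *(*new)(const struct myabs_vtable *vt);
    void (*reset)(myabs *m);
    size_t (*digest)(myabs *m, unsigned char *out);
    void (*free)(myabs *m);
} myabs_vtable;
struct myabs {
    const myabs_vtable *vt;
};

/* Inline wrappers. Only myabs_new knows that the vtable is passed to
 * the new method as well as being used to look it up, so that detail
 * can change later without touching any call site. */
static inline myabs *myabs_new(const myabs_vtable *vt)
{ return vt->new(vt); }
static inline size_t myabs_digest(myabs *m, unsigned char *out)
{ return m->vt->digest(m, out); }
static inline void myabs_free(myabs *m)
{ m->vt->free(m); }

/* One implementation struct shared by both variants. */
typedef struct shared_impl {
    unsigned state;
    myabs abs;
} shared_impl;

/* Shared constructor: stores whichever vtable it was given, then lets
 * that vtable's own reset method set up the initial state. */
static myabs *shared_new(const myabs_vtable *vt)
{
    shared_impl *s = malloc(sizeof(shared_impl));
    assert(s);
    s->abs.vt = vt;
    vt->reset(&s->abs);
    return &s->abs;
}
static void shared_free(myabs *m)
{
    free(container_of(m, shared_impl, abs));
}

/* The methods that differ between the two variants. */
static void long_reset(myabs *m)
{ container_of(m, shared_impl, abs)->state = 0x1234; }
static void short_reset(myabs *m)
{ container_of(m, shared_impl, abs)->state = 0x56; }
static size_t long_digest(myabs *m, unsigned char *out)
{
    shared_impl *s = container_of(m, shared_impl, abs);
    out[0] = (s->state >> 8) & 0xFF;
    out[1] = s->state & 0xFF;
    return 2;                        /* writes two bytes */
}
static size_t short_digest(myabs *m, unsigned char *out)
{
    shared_impl *s = container_of(m, shared_impl, abs);
    out[0] = s->state & 0xFF;
    return 1;                        /* writes one byte */
}

const myabs_vtable variant_long = {
    shared_new, long_reset, long_digest, shared_free
};
const myabs_vtable variant_short = {
    shared_new, short_reset, short_digest, shared_free
};
```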
738
739}
740
741\b Another piece of \q{stunt object orientation} made possible by this
742scheme is that you can write two vtables that both use the same
743structure layout for the implementation object, and have an object
744\e{transform from one to the other} part way through its lifetime, by
745overwriting its own vtable pointer field. For example, the
746\cw{sesschan} type that handles the server side of an SSH terminal
747session will sometimes transform in mid-lifetime into an SCP or SFTP
748file-transfer channel in this way, at the point where the client sends
749an \cq{exec} or \cq{subsystem} request that indicates that that's what
750it wants to do with the channel.
751
752\lcont{
753
754This concept would be difficult to arrange in C++. In Rust, it
755wouldn't even \e{make sense}, because in Rust, objects implementing a
756trait don't even contain a vtable pointer at all \dash instead, the
757\q{trait object} type (identifying a specific instance of some
758implementation of a given trait) consists of a pair of pointers, one
759to the object itself and one to the vtable. In that model, the only
way you could make an existing object turn into a different
implementation of its trait would be to know where all the pointers to
it were stored elsewhere in
762the program, and persuade all their owners to rewrite them.
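
Returning to the C scheme, the transformation itself can be sketched as follows. The \cw{Chan} trait and the two vtables here are hypothetical and far simpler than the real \cw{sesschan} code; the point is only that every existing pointer to the object sees the new behaviour, because the vtable pointer lives inside the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical trait for a channel. */
typedef struct Chan Chan;
typedef struct ChanVtable {
    const char *(*describe)(Chan *c);
    void (*request)(Chan *c, const char *type);
    void (*free)(Chan *c);
} ChanVtable;
struct Chan {
    const ChanVtable *vt;
};

/* A single structure layout, used by both vtables. */
typedef struct SessImpl {
    int requests_seen;
    Chan chan;
} SessImpl;

extern const ChanVtable transfer_vt;   /* defined below */

static const char *session_describe(Chan *c)
{
    (void)c;
    return "session";
}
static const char *transfer_describe(Chan *c)
{
    (void)c;
    return "transfer";
}

static void session_request(Chan *c, const char *type)
{
    SessImpl *s = container_of(c, SessImpl, chan);
    s->requests_seen++;
    if (!strcmp(type, "subsystem"))
        c->vt = &transfer_vt;   /* transform: overwrite own vtable pointer */
}
static void transfer_request(Chan *c, const char *type)
{
    (void)type;
    container_of(c, SessImpl, chan)->requests_seen++;
}
static void sess_free(Chan *c)
{
    free(container_of(c, SessImpl, chan));
}

const ChanVtable session_vt = {
    session_describe, session_request, sess_free
};
const ChanVtable transfer_vt = {
    transfer_describe, transfer_request, sess_free
};

Chan *session_new(void)
{
    SessImpl *s = malloc(sizeof(SessImpl));
    assert(s);
    s->requests_seen = 0;
    s->chan.vt = &session_vt;
    return &s->chan;
}
```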
763
764}
765
766\b Another stunt you can do is to have a vtable that doesn't have a
767corresponding implementation structure at all, because the only
768methods implemented in it are the constructors, and they always end up
769returning an implementation of some other vtable. For example, some of
770PuTTY's crypto primitives have a hardware-accelerated version and a
771pure software version, and decide at run time which one to use (based
772on whether the CPU they're running on supports the necessary
773acceleration instructions). So, for example, there are vtables for
774\cw{ssh_sha256_sw} and \cw{ssh_sha256_hw}, each of which has its own
775data layout and its own implementations of all the methods; and then
776there's a top-level vtable \cw{ssh_sha256}, which only provides the
777\q{new} method, and implements it by calling the \q{new} method on one
778or other of the subtypes depending on what it finds out about the
779machine it's running on. That top-level selector vtable is nearly
780always the one used by client code. (Except for the test suite, which
781has to instantiate both of the subtypes in order to make sure they
782both pass the tests.)
783
784\lcont{
785
786As a result, the top-level selector vtable \cw{ssh_sha256} doesn't
need to implement any method that takes an \cw{ssh_hash *}
parameter, because no \cw{ssh_hash} object is ever constructed whose
789\cw{vt} field points to \cw{&ssh_sha256}: they all point to one of the
790other two full implementation vtables.
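
A stripped-down sketch of a selector vtable is shown below. All names are hypothetical, and the \cw{hw_available} flag is an invented stand-in for whatever run-time CPU feature check the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for PuTTY's container_of macro. */
#define container_of(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

typedef struct toy_hash toy_hash;
typedef struct toy_hashalg {
    toy_hash *(*new)(const struct toy_hashalg *alg);
    const char *(*impl_name)(toy_hash *h);
    void (*free)(toy_hash *h);
} toy_hashalg;
struct toy_hash {
    const toy_hashalg *vt;
};

/* Hypothetical stand-in for a run-time CPU feature check. */
static bool hw_available = false;
void toy_set_hw_available(bool v) { hw_available = v; }

/* ---- Software implementation, with its own data layout ---- */
typedef struct sw_state { unsigned acc; toy_hash hash; } sw_state;
static toy_hash *sw_new(const toy_hashalg *alg)
{
    sw_state *s = calloc(1, sizeof(sw_state));
    assert(s);
    s->hash.vt = alg;
    return &s->hash;
}
static const char *sw_name(toy_hash *h) { (void)h; return "software"; }
static void sw_free(toy_hash *h) { free(container_of(h, sw_state, hash)); }
const toy_hashalg toy_sw = { sw_new, sw_name, sw_free };

/* ---- "Hardware" implementation, with a different data layout ---- */
typedef struct hw_state { unsigned lanes[4]; toy_hash hash; } hw_state;
static toy_hash *hw_new(const toy_hashalg *alg)
{
    hw_state *s = calloc(1, sizeof(hw_state));
    assert(s);
    s->hash.vt = alg;
    return &s->hash;
}
static const char *hw_name(toy_hash *h) { (void)h; return "hardware"; }
static void hw_free(toy_hash *h) { free(container_of(h, hw_state, hash)); }
const toy_hashalg toy_hw = { hw_new, hw_name, hw_free };

/* ---- Selector: has no implementation structure of its own ---- */
static toy_hash *select_new(const toy_hashalg *alg)
{
    (void)alg;
    const toy_hashalg *real = hw_available ? &toy_hw : &toy_sw;
    return real->new(real);      /* object's vt points at the subtype */
}
/* Instance methods are left NULL: no object ever points at this vtable. */
const toy_hashalg toy_select = { select_new, NULL, NULL };
```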
791
792}
793
794\H{udp-compile-once} Single compilation of each source file
795
796The PuTTY build system for any given platform works on the following
797very simple model:
798
799\b Each source file is compiled precisely once, to produce a single
800object file.
801
802\b Each binary is created by linking together some combination of
803those object files.
804
805Therefore, if you need to introduce functionality to a particular
806module which is only available in some of the tool binaries (for
807example, a cryptographic proxy authentication mechanism which needs
808to be left out of PuTTYtel to maintain its usability in
809crypto-hostile jurisdictions), the \e{wrong} way to do it is by
810adding \cw{#ifdef}s in (say) \cw{proxy.c}. This would require
811separate compilation of \cw{proxy.c} for PuTTY and PuTTYtel, which
812means that the entire \cw{Makefile}-generation architecture (see
813\k{udp-makefiles-auto}) would have to be significantly redesigned.
814Unless you are prepared to do that redesign yourself, \e{and}
815guarantee that it will still port to any future platforms we might
816decide to run on, you should not attempt this!
817
818The \e{right} way to introduce a feature like this is to put the new
819code in a separate source file, and (if necessary) introduce a
820second new source file defining the same set of functions, but
821defining them as stubs which don't provide the feature. Then the
822module whose behaviour needs to vary (\cw{proxy.c} in this example)
823can call the functions defined in these two modules, and it will
824either provide the new feature or not provide it according to which
825of your new modules it is linked with.
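
As a rough sketch of that layout, with entirely hypothetical file and function names (\cw{feature.h}, \cw{feature.c}, \cw{nofeature.c}, \cw{feature_available} and so on are invented for this example), the pieces might look like the following. They are shown as one listing for brevity; in a real source tree each marked section would be a separate file, and the full version appears only as a comment because it and the stub can never be linked into the same binary.

```c
#include <assert.h>
#include <stdbool.h>

/* ---- feature.h (hypothetical): interface shared by both modules ---- */
bool feature_available(void);
int feature_respond(int challenge);

/* ---- nofeature.c (hypothetical): stub module, linked into the
 * binaries that must not contain the feature ---- */
bool feature_available(void)
{
    return false;
}
int feature_respond(int challenge)
{
    (void)challenge;
    assert(false && "stub: callers must check feature_available() first");
    return -1;
}

/* ---- feature.c (hypothetical) would define the same two functions
 * with real bodies, and be linked into the other binaries instead:
 *
 *     bool feature_available(void) { return true; }
 *     int feature_respond(int challenge) { return challenge ^ 0x5A; }
 */

/* ---- The calling module: compiled once, containing no #ifdef ---- */
int negotiate(int challenge)
{
    if (!feature_available())
        return -1;               /* feature absent from this binary */
    return feature_respond(challenge);
}
```

The calling module's object file is identical in every binary; which behaviour it gets is decided purely at link time.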
826
827Of course, object files are never shared \e{between} platforms; so
828it is allowable to use \cw{#ifdef} to select between platforms. This
829happens in \cw{puttyps.h} (choosing which of the platform-specific
830include files to use), and also in \cw{misc.c} (the Windows-specific
\q{Minefield} memory diagnostic system). Even this platform-selecting
use of \cw{#ifdef} should be kept sparing, though, if used at all.
833
834\H{udp-perfection} Do as we say, not as we do
835
836The current PuTTY code probably does not conform strictly to \e{all}
837of the principles listed above. There may be the occasional
838SSH-specific piece of code in what should be a backend-independent
839module, or the occasional dependence on a non-standard X library
840function under Unix.
841
842This should not be taken as a licence to go ahead and violate the
843rules. Where we violate them ourselves, we're not happy about it,
844and we would welcome patches that fix any existing problems. Please
845try to help us make our code better, not worse!
846