\# This file is so named for tradition's sake: it contains what we
\# always used to refer to, before they were written down, as
\# PuTTY's `unwritten design principles'. It has nothing to do with
\# the User Datagram Protocol.

\A{udp} PuTTY hacking guide

This appendix lists a selection of the design principles applying to
the PuTTY source code. If you are planning to send code
contributions, you should read this first.

\H{udp-portability} Cross-OS portability

Despite Windows being its main area of fame, PuTTY is no longer a
Windows-only application suite. It has a working Unix port; a Mac
port is in progress; more ports may or may not happen at a later
date.

Therefore, embedding Windows-specific code in core modules such as
\cw{ssh.c} is not acceptable. We went to great lengths to \e{remove}
all the Windows-specific stuff from our core modules, and to shift
it out into Windows-specific modules. Adding large amounts of
Windows-specific stuff in parts of the code that should be portable
is almost guaranteed to make us reject a contribution.

The PuTTY source base is divided into platform-specific modules and
platform-generic modules. The Unix-specific modules are all in the
\c{unix} subdirectory; the Windows-specific modules are in the
\c{windows} subdirectory.

All the modules in the main source directory - notably \e{all} of
the code for the various back ends - are platform-generic. We want
to keep them that way.

This also means you should stick to the C semantics guaranteed by the
C standard: try not to make assumptions about the precise size of
basic types such as \c{int} and \c{long int}; don't use pointer casts
to do endianness-dependent operations, and so on.

(Even \e{within} a platform front end you should still be careful of
some of these portability issues. The Windows front end compiles on
both 32- and 64-bit x86 and also Arm.)

Our current choice of C standards version is \e{mostly} C99. With a
couple of exceptions, you can assume that C99 features are available
(in particular \cw{<stdint.h>}, \cw{<stdbool.h>} and \c{inline}), but
you shouldn't use things that are new in C11 (such as \cw{<uchar.h>}
or \cw{_Generic}).

The exceptions to that rule are due to the need for Visual Studio
compatibility:

\b Don't use variable-length arrays. Visual Studio doesn't support
them even now that it's adopted the rest of C99. We use \cw{-Wvla}
when building with gcc and clang, to make it easier to avoid
accidentally breaking that rule.

\b For historical reasons, we still build with one older VS version
which lacks \cw{<inttypes.h>}. So that header is included centrally in
\c{defs.h}, which also contains a set of workaround definitions for the
\cw{PRIx64}-type macros we use. If you need to use another one of
those macros, add a workaround definition in \c{defs.h}, and don't
casually re-include \cw{<inttypes.h>} anywhere else in the source.

Here are a few portability assumptions that we \e{do} currently allow
(because we'd already have to thoroughly vet the existing code if they
ever needed to change, and it doesn't seem worth doing that unless we
really have to):

\b You can assume \c{int} is \e{at least} 32 bits wide. (We've never
tried to port PuTTY to a platform with 16-bit \cw{int}, and it doesn't
look likely to be necessary in future.)

\b Similarly, you can assume \c{char} is exactly 8 bits. (Exceptions
to that are even less likely to be relevant to us than short
\cw{int}.)

\b You can assume that using \c{memset} to write zero bytes over a
whole structure will have the effect of setting all its pointer fields
to \cw{NULL}. (The standard itself guarantees this for \e{integer}
fields, but not for pointers.)

\b You can assume that \c{time_t} has POSIX semantics, i.e.
that it represents an integer number of non-leap seconds since 1970-01-01
00:00:00 UTC. (Times in this format are used in X authorisation, but
we could work around that by carefully distinguishing local \c{time_t}
from time values used in the wire protocol. However, these semantics of
\c{time_t} are also baked into the shared library API used by the
GSSAPI authentication code, which would be much harder to change.)

\b You can assume that the execution character encoding is a superset
of the printable characters of ASCII. (In particular, it's fine to do
arithmetic on a \c{char} value representing a Latin alphabetic
character, without bothering to allow for EBCDIC or other
non-consecutive encodings of the alphabet.)

On the other hand, here are some particular things \e{not} to assume:

\b Don't assume anything about the \e{signedness} of \c{char}. In
particular, you \e{must} cast \c{char} values to \c{unsigned char}
before passing them to any \cw{<ctype.h>} function (because those
expect a non-negative character value, or \cw{EOF}). If you need a
particular signedness, explicitly specify \c{signed char} or
\c{unsigned char}, or use C99 \cw{int8_t} or \cw{uint8_t}.

\b From past experience with MacOS, we're still a bit nervous about
\cw{'\\n'} and \cw{'\\r'} potentially having unusual meanings on a
given platform. So it's fine to say \c{\\n} in a string you're passing
to \c{printf}, but in any context where those characters appear in a
standardised wire protocol or a binary file format, they should be
spelled \cw{'\\012'} and \cw{'\\015'} respectively.

\H{udp-multi-backend} Multiple backends treated equally

PuTTY is not an SSH client with some other stuff tacked on the side.
PuTTY is a generic, multiple-backend, remote VT-terminal client
which happens to support one backend which is larger, more popular
and more useful than the rest.
Any extra feature which can possibly be general across all backends
should be so: localising features
unnecessarily into the SSH back end is a design error. (For example,
we had several code submissions for proxy support which worked by
hacking \cw{ssh.c}. Clearly this is completely wrong: the
\cw{network.h} abstraction is the place to put it, so that it will
apply to all back ends equally, and indeed we eventually put it
there after another contributor sent a better patch.)

The rest of PuTTY should try to avoid knowing anything about
specific back ends if at all possible. To support a feature which is
only available in one network protocol, for example, the back end
interface should be extended in a general manner such that \e{any}
back end which is able to provide that feature can do so. If it so
happens that only one back end actually does, that's just the way it
is, but it shouldn't be relied upon by any code.

\H{udp-globals} Multiple sessions per process on some platforms

Some ports of PuTTY - notably the in-progress Mac port - are
constrained by the operating system to run as a single process
potentially managing multiple sessions.

Therefore, the platform-independent parts of PuTTY never use global
variables to store per-session data. The global variables that do
exist are tolerated because they are not specific to a particular
login session. The random number state in \cw{sshrand.c}, the timer
list in \cw{timing.c} and the queue of top-level callbacks in
\cw{callback.c} serve all sessions equally. But most data is specific
to a particular network session, and is therefore stored in
dynamically allocated data structures, and pointers to these
structures are passed around between functions.

Platform-specific code can reverse this decision if it likes. The
Windows code, for historical reasons, stores most of its data as
global variables.
That's OK, because \e{on Windows} we know there is
only one session per PuTTY process, so it's safe to do that. But
changes to the platform-independent code should avoid introducing
global variables, unless they are genuinely cross-session.

\H{udp-pure-c} C, not C++

PuTTY is written entirely in C, not in C++.

We have made \e{some} effort to make it easy to compile our code
using a C++ compiler: notably, our \c{snew}, \c{snewn} and
\c{sresize} macros explicitly cast the return values of \cw{malloc}
and \cw{realloc} to the target type. (This has type checking
advantages even in C: it means you never accidentally allocate the
wrong size piece of memory for the pointer type you're assigning it
to. C++ friendliness is really a side benefit.)

We want PuTTY to continue being pure C, at least in the
platform-independent parts and the currently existing ports. Patches
which switch the Makefiles to compile it as C++ and start using
classes will not be accepted. Also, in particular, we disapprove of
\cw{//} comments, at least for the moment. (Perhaps once C99 becomes
genuinely widespread we might be more lenient.)

The one exception: a port to a new platform may use languages other
than C if they are necessary to code on that platform. If your
favourite PDA has a GUI with a C++ API, then there's no way you can
do a port of PuTTY without using C++, so go ahead and use it. But
keep the C++ restricted to that platform's subdirectory; if your
changes force the Unix or Windows ports to be compiled as C++, they
will be unacceptable to us.

\H{udp-security} Security-conscious coding

PuTTY is a network application and a security application. Assume
your code will end up being fed deliberately malicious data by
attackers, and try to code in a way that makes it unlikely to be a
security risk.

In particular, try not to use fixed-size buffers for variable-size
data such as strings received from the network (or even the user).
We provide functions such as \cw{dupcat} and \cw{dupprintf}, which
dynamically allocate buffers of the right size for the string they
construct. Use these wherever possible.

\H{udp-multi-compiler} Independence of specific compiler

Windows PuTTY can currently be compiled with any of three Windows
compilers: MS Visual C, the Cygwin / \cw{mingw32} GNU tools, and
\cw{clang} (in MS compatibility mode).

This is a really useful property of PuTTY, because it means people
who want to contribute to the coding don't depend on having a
specific compiler; so they don't have to fork out money for MSVC if
they don't already have it, but on the other hand if they \e{do}
have it they also don't have to spend effort installing \cw{gcc}
alongside it. They can use whichever compiler they happen to have
available, or install whichever is cheapest and easiest if they
don't have one.

Therefore, we don't want PuTTY to start depending on which compiler
you're using. Using GNU extensions to the C language, for example,
would ruin this useful property (not that anyone's ever tried it!);
and more realistically, depending on an MS-specific library function
supplied by the MSVC C library (\cw{_snprintf}, for example) is a
mistake, because that function won't be available under the other
compilers. Any function supplied in an official Windows DLL as part
of the Windows API is fine, and anything defined in the C library
standard is also fine, because those should be available
irrespective of compilation environment. But things in between,
available as non-standard library and language extensions in only
one compiler, are disallowed.

(\cw{_snprintf} in particular should be unnecessary, since we
provide \cw{dupprintf}; see \k{udp-security}.)

Compiler independence should apply on all platforms, of course, not
just on Windows.

\H{udp-small} Small code size

PuTTY is tiny, compared to many other Windows applications. And it's
easy to install: it depends on no DLLs, no other applications, no
service packs or system upgrades. It's just one executable. You
install that executable wherever you want to, and run it.

We want to keep both these properties - the small size, and the ease
of installation - if at all possible. So code contributions that
depend critically on external DLLs, or that add a huge amount to the
code size for a feature which is only useful to a small minority of
users, are likely to be thrown out immediately.

We do vaguely intend to introduce a DLL plugin interface for PuTTY,
whereby seriously large extra features can be implemented in plugin
modules. The important thing, though, is that those DLLs will be
\e{optional}; if PuTTY can't find them on startup, it should run
perfectly happily and just won't provide those particular features.
A full installation of PuTTY might one day contain ten or twenty
little DLL plugins, which would cut down a little on the ease of
installation - but if you really needed ease of installation you
\e{could} still just install the one PuTTY binary, or just the DLLs
you really needed, and it would still work fine.

Depending on \e{external} DLLs is something we'd like to avoid if at
all possible (though for some purposes, such as complex SSH
authentication mechanisms, it may be unavoidable). If it can't be
avoided, the important thing is to follow the same principle of
graceful degradation: if a DLL can't be found, then PuTTY should run
happily and just not supply the feature that depended on it.

\H{udp-single-threaded} Single-threaded code

PuTTY and its supporting tools, or at least the vast majority of
them, run in only one OS thread.

This means that if you're devising some piece of internal mechanism,
there's no need to use locks to make sure it doesn't get called by
two threads at once. The only way code can be called re-entrantly is
by recursion.

That said, most of Windows PuTTY's network handling is triggered off
Windows messages requested by \cw{WSAAsyncSelect()}, so if you call
\cw{MessageBox()} deep within some network event handling code you
should be aware that you might be re-entered if a network event
comes in and is passed on to our window procedure by the
\cw{MessageBox()} message loop.

Also, the front ends (in particular Windows Plink) can use multiple
threads if they like. However, Windows Plink keeps \e{very} tight
control of its auxiliary threads, and uses them pretty much
exclusively as a form of \cw{select()}. Pretty much all the code
outside \cw{windows/winplink.c} is \e{only} ever called from the one
primary thread; the auxiliary threads just loop round blocking on file
handles and send messages to the main thread when some real work needs
doing. This is not considered a portability hazard because that bit
of \cw{windows/winplink.c} will need rewriting on other platforms in
any case.

One important consequence of this: PuTTY has only one thread in
which to do everything. That \q{everything} may include managing
more than one login session (\k{udp-globals}), managing multiple
data channels within an SSH session, responding to GUI events even
when nothing is happening on the network, and responding to network
requests from the server (such as repeat key exchange) even when the
program is dealing with complex user interaction such as the
re-configuration dialog box. This means that \e{almost none} of the
PuTTY code can safely block.

\H{udp-keystrokes} Keystrokes sent to the server wherever possible

In almost all cases, PuTTY sends keystrokes to the server.
Even weird keystrokes that you think should be hot keys controlling
PuTTY. Even Alt-F4 or Alt-Space, for example. If a keystroke has a
well-defined escape sequence that it could usefully be sending to
the server, then it should do so, or at the very least it should be
configurably able to do so.

To unconditionally turn a key combination into a hot key to control
PuTTY is almost always a design error. If a hot key is really truly
required, then try to find a key combination for it which \e{isn't}
already used in existing PuTTYs (either it sends nothing to the
server, or it sends the same thing as some other combination). Even
then, be prepared for the possibility that one day that key
combination might end up being needed to send something to the
server - so make sure that there's an alternative way to invoke
whatever PuTTY feature it controls.

\H{udp-640x480} 640\u00D7{x}480 friendliness in configuration panels

There's a reason we have lots of tiny configuration panels instead
of a few huge ones, and that reason is that not everyone has a
1600\u00D7{x}1200 desktop. 640\u00D7{x}480 is still a viable
resolution for running Windows (and indeed it's still the default if
you start up in safe mode), so it's still a resolution we care
about.

Accordingly, the PuTTY configuration box, and the PuTTYgen control
window, are deliberately kept just small enough to fit comfortably
on a 640\u00D7{x}480 display. If you're adding controls to either of
these boxes and you find yourself wanting to increase the size of
the whole box, \e{don't}. Split it into more panels instead.

\H{udp-makefiles-auto} Automatically generated \cw{Makefile}s

PuTTY is intended to compile on multiple platforms, and with
multiple compilers.
It would be horrifying to try to maintain a
single \cw{Makefile} which handled all possible situations, and just
as painful to try to directly maintain a set of matching
\cw{Makefile}s for each different compilation environment.

Therefore, we have moved the problem up by one level. In the PuTTY
source archive is a file called \c{Recipe}, which lists which source
files combine to produce which binaries; and there is also a script
called \cw{mkfiles.pl}, which reads \c{Recipe} and writes out the
real \cw{Makefile}s. (The script also reads all the source files and
analyses their dependencies on header files, so we get an extra
benefit from doing it this way, which is that we can supply correct
dependency information even in environments where it's difficult to
set up an automated \c{make depend} phase.)

You should \e{never} edit any of the PuTTY \cw{Makefile}s directly.
They are not stored in our source repository at all. They are
automatically generated by \cw{mkfiles.pl} from the file \c{Recipe}.

If you need to add a new object file to a particular binary, the
right thing to do is to edit \c{Recipe} and re-run \cw{mkfiles.pl}.
This will cause the new object file to be added in every tool that
requires it, on every platform where it matters, in every
\cw{Makefile} to which it is relevant, \e{and} to get all the
dependency data right.

If you send us a patch that modifies one of the \cw{Makefile}s, you
just waste our time, because we will have to convert it into a
change to \c{Recipe}. If you send us a patch that modifies \e{all}
of the \cw{Makefile}s, you will have wasted a lot of \e{your} time
as well!

(There is a comment at the top of every \cw{Makefile} in the PuTTY
source archive saying this, but many people don't seem to read it,
so it's worth repeating here.)

\H{udp-ssh-coroutines} Coroutines in the SSH code

Large parts of the code in the various SSH modules (in fact most of
the protocol layers) are structured using a set of macros that
implement (something close to) Donald Knuth's \q{coroutines} concept
in C.

Essentially, the purpose of these macros is to arrange that a
function can call \cw{crReturn()} to return to its caller, and the
next time it is called control will resume from just after that
\cw{crReturn} statement.

This means that any local (automatic) variables declared in such a
function will be corrupted every time you call \cw{crReturn}. If you
need a variable to persist for longer than that, you \e{must} make it
a field in some appropriate structure containing the persistent state
of the coroutine \dash typically the main state structure for an SSH
protocol layer.

See
\W{https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html}\c{https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html}
for a more in-depth discussion of what these macros are for and how
they work.

Another caveat: most of these coroutines are not \e{guaranteed} to run
to completion, because the SSH connection (or whatever) that they're
part of might be interrupted at any time by an unexpected network
event or user action. So whenever a coroutine-managed variable refers
to a resource that needs releasing, you should also ensure that the
cleanup function for its containing state structure can reliably
release it even if the coroutine is aborted at an arbitrary point.

For example, if an SSH packet protocol layer has to have a field that
sometimes points to a piece of allocated memory, then you should
ensure that when you free that memory you reset the pointer field to
\cw{NULL}.
Then, no matter when the protocol layer's cleanup function
is called, it can reliably free the memory if there is any, and not
crash if there isn't.

\H{udp-traits} Explicit vtable structures to implement traits

A lot of PuTTY's code is written in a style that looks structurally
rather like an object-oriented language, in spite of PuTTY being a
pure C program.

For example, there's a single data type called \cw{ssh_hash}, which is
an abstraction of a secure hash function, and a bunch of functions
called things like \cw{ssh_hash_}\e{foo} that do things with those
data types. But in fact, PuTTY supports many different hash functions,
and each one has to provide its own implementation of those functions.

In C++ terms, this is rather like having a single abstract base class,
and multiple concrete subclasses of it, each of which fills in all the
pure virtual methods in a way that's compatible with the data fields
of the subclass. The implementation is more or less the same, as well:
in C, we do explicitly in the source code what the C++ compiler will
be doing behind the scenes at compile time.

But perhaps a closer analogy in functional terms is the Rust concept
of a \q{trait}, or the Java idea of an \q{interface}. C++ supports a
multi-level hierarchy of inheritance, whereas PuTTY's system \dash
like traits or interfaces \dash has only two levels, one describing a
generic object of a type (e.g. a hash function) and another describing
a specific implementation of that type (e.g. SHA-256).

The PuTTY code base has a standard idiom for doing this in C, as
follows.

Firstly, we define two \cw{struct} types for our trait. One of them
describes a particular \e{kind} of implementation of that trait, and
it's full of (mostly) function pointers.
The other describes a
specific \e{instance} of an implementation of that trait, and it will
contain a pointer to a \cw{const} instance of the first type. For
example:

\c typedef struct MyAbstraction MyAbstraction;
\c typedef struct MyAbstractionVtable MyAbstractionVtable;
\c
\c struct MyAbstractionVtable {
\c     MyAbstraction *(*new)(const MyAbstractionVtable *vt);
\c     void (*free)(MyAbstraction *);
\c     void (*modify)(MyAbstraction *, unsigned some_parameter);
\c     unsigned (*query)(MyAbstraction *, unsigned some_parameter);
\c };
\c
\c struct MyAbstraction {
\c     const MyAbstractionVtable *vt;
\c };

Here, we imagine that \cw{MyAbstraction} might be some kind of object
that contains mutable state. The associated vtable structure shows
what operations you can perform on a \cw{MyAbstraction}: you can
create one (dynamically allocated), free one you already have, or call
the example methods \q{modify} (to change the state of the object in
some way) and \q{query} (to return some value derived from the
object's current state).

(In most cases, the vtable structure has a name ending in \cq{vtable}.
But for historical reasons a lot of the crypto primitives that use
this scheme \dash ciphers, hash functions, public key methods and so
on \dash instead have names ending in \cq{alg}, on the basis that the
primitives they implement are often referred to as \q{encryption
algorithms}, \q{hash algorithms} and so forth.)

Now, to define a concrete instance of this trait, you'd define a
\cw{struct} that contains a \cw{MyAbstraction} field, plus any other
data it might need:

\c typedef struct MyImplementation MyImplementation;
\c
\c struct MyImplementation {
\c     unsigned internal_data[16];
\c     SomeOtherType *dynamic_subthing;
\c
\c     MyAbstraction myabs;
\c };

Next, you'd implement all the necessary methods for that
implementation of the trait, in this kind of style:

\c static MyAbstraction *myimpl_new(const MyAbstractionVtable *vt)
\c {
\c     MyImplementation *impl = snew(MyImplementation);
\c     memset(impl, 0, sizeof(*impl));
\c     impl->dynamic_subthing = allocate_some_other_type();
\c     impl->myabs.vt = vt;
\c     return &impl->myabs;
\c }
\c
\c static void myimpl_free(MyAbstraction *myabs)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     free_other_type(impl->dynamic_subthing);
\c     sfree(impl);
\c }
\c
\c static void myimpl_modify(MyAbstraction *myabs, unsigned param)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     impl->internal_data[param] += do_something_with(impl->dynamic_subthing);
\c }
\c
\c static unsigned myimpl_query(MyAbstraction *myabs, unsigned param)
\c {
\c     MyImplementation *impl = container_of(myabs, MyImplementation, myabs);
\c     return impl->internal_data[param];
\c }

Having defined those methods, now we can define a \cw{const} instance
of the vtable structure containing pointers to them:

\c const MyAbstractionVtable MyImplementation_vt = {
\c     .new = myimpl_new,
\c     .free = myimpl_free,
\c     .modify = myimpl_modify,
\c     .query = myimpl_query,
\c };

\e{In principle}, this is all you need.
Client code can construct a
new instance of a particular implementation of \cw{MyAbstraction} by
digging out the \cw{new} method from the vtable and calling it (with
the vtable itself as a parameter), which returns a \cw{MyAbstraction *}
pointer that identifies a newly created instance, in which the
\cw{vt} field will contain a pointer to the same vtable structure you
passed in. And once you have an instance object, say
\cw{MyAbstraction *myabs}, you can dig out one of the other method
pointers from the vtable it points to, and call that, passing the
object itself as a parameter.

But in fact, we don't do that, because it looks pretty ugly at all the
call sites. Instead, what we generally do in this code base is to
write a set of \cw{static inline} wrapper functions in the same header
file that defines the \cw{MyAbstraction} structure types, like this:

\c static inline MyAbstraction *myabs_new(const MyAbstractionVtable *vt)
\c { return vt->new(vt); }
\c static inline void myabs_free(MyAbstraction *myabs)
\c { myabs->vt->free(myabs); }
\c static inline void myabs_modify(MyAbstraction *myabs, unsigned param)
\c { myabs->vt->modify(myabs, param); }
\c static inline unsigned myabs_query(MyAbstraction *myabs, unsigned param)
\c { return myabs->vt->query(myabs, param); }

And now call sites can use those reasonably clean-looking wrapper
functions, and shouldn't ever have to directly refer to the \cw{vt}
field inside any \cw{myabs} object they're holding. For example, you
might write something like this:

\c MyAbstraction *myabs = myabs_new(&MyImplementation_vt);
\c myabs_modify(myabs, 10);
\c unsigned output = myabs_query(myabs, 2);
\c myabs_free(myabs);

and then all this code can use a different implementation of the same
abstraction by just changing which vtable pointer it passes in on the
first line.

Some things to note about this system:

\b The implementation instance type (here \cq{MyImplementation})
contains the abstraction type (\cq{MyAbstraction}) as one of its
fields. But that field is not necessarily at the start of the
structure. So you can't just \e{cast} pointers back and forth between
the two types. Instead:

\lcont{

\b You \q{up-cast} from implementation to abstraction by taking the
address of the \cw{MyAbstraction} field. You can see the example
\cw{new} method above doing this, returning \cw{&impl->myabs}. All
\cw{new} methods do this on return.

\b Going in the other direction, each method that was passed a generic
\cw{MyAbstraction *myabs} parameter has to recover a pointer to the
specific implementation type \cw{MyImplementation *impl}. The idiom
for doing that is to use the \cq{container_of} macro, also seen in the
Linux kernel code. Generally, \cw{container_of(p, Type, field)} says:
\q{I'm confident that the pointer value \cq{p} is pointing to the
field called \cq{field} within a larger \cw{struct} of type \cw{Type}.
Please return me the pointer to the containing structure.} So in this
case, we take the \cq{myabs} pointer passed to the function, and
\q{down-cast} it into a pointer to the larger and more specific
structure type \cw{MyImplementation}, by adjusting the pointer value
based on the offset within that structure of the field called
\cq{myabs}.

This system is flexible enough to permit \q{multiple inheritance}, or
rather, multiple \e{implementation}: having one object type implement
more than one trait. For example, the \cw{Proxy} type implements both
the \cw{Socket} trait and the \cw{Plug} trait that connects to it,
because it has to act as an adapter between another instance of each
of those types.

It's also perfectly possible to have the same object implement the
\e{same} trait in two different ways.
At the time of writing this I
can't think of any case where we actually do this, but a theoretical
example might be if you needed to support a trait like \cw{Comparable}
in two ways that sorted by different criteria. There would be no
difficulty doing this in the PuTTY system: simply have your
implementation \cw{struct} contain two (or more) fields of the same
abstraction type. The fields will have different names, which makes it
easy to explicitly specify which one you're returning a pointer to
during up-casting, or which one you're down-casting from using
\cw{container_of}. And then both sets of implementation methods can
recover a pointer to the same containing structure.

}

\b Unlike in C++, all objects in PuTTY that use this system are
dynamically allocated. The \q{constructor} functions (whether they're
virtualised across the whole abstraction or specific to each
implementation) always allocate memory and return a pointer to it. The
\q{free} method (our analogue of a destructor) always expects the
input pointer to be dynamically allocated, and frees it. As a result,
client code doesn't need to know how large the implementing object
type is, because it will never need to allocate it (on the stack or
anywhere else).

\b Unlike in C++, the abstraction's \q{vtable} structure does not only
hold methods that you can call on an instance object. It can also
hold several other kinds of thing:

\lcont{

\b Methods that you can call \e{without} an instance object, given
only the vtable structure identifying a particular implementation of
the trait. You might think of these as \q{static methods}, as in C++,
except that they're \e{virtual} \dash the same code can call the
static method of a different \q{class} given a different vtable
pointer. So they're more like \q{virtual static methods}, which is a
concept C++ doesn't have.
An example is the \cw{pubkey_bits} method in
\cw{ssh_keyalg}.

\b The most important case of a \q{virtual static method} is the
\cw{new} method that allocates and returns a new object. You can think
of it as a \q{virtual constructor} \dash another concept C++ doesn't
have. (However, not all types need one of these: see below.)

\b The vtable can also contain constant data relevant to the class as
a whole \dash \q{virtual constant data}. For example, the vtable for a
cryptographic hash function will contain an integer field giving the
length of the output hash, and most crypto primitives' vtables will
contain a string field giving the identifier that the SSH protocol
uses for that primitive.

The effect of all of this is that you can make other pieces of code
able to use any implementation of one of these traits, just by passing
them the appropriate vtable as a parameter. For example, the
\cw{hash_simple} function takes an \cw{ssh_hashalg} vtable pointer
specifying any hash algorithm you like; internally, it creates an
object of that type, uses it, and frees it. In C++, you'd probably do
this using a template, which would mean you had multiple
specialisations of \cw{hash_simple} \dash and then it would be much
more difficult to decide \e{at run time} which one you needed to use.
Here, \cw{hash_simple} is still just one function, and you can decide
as late as you like which vtable to pass to it.

}

\b The abstract \e{instance} structure can also contain publicly
visible data fields (this time, usually treated as mutable) which are
common to all implementations of the trait. For example,
\cw{BinaryPacketProtocol} has lots of these.

\b Not all abstractions of this kind want virtual constructors. It
depends on how different the implementations are.

\lcont{

With a crypto primitive like a hash algorithm, the constructor call
looks the same for every implementing type, so it makes sense to have
a standardised virtual constructor in the vtable and an
\cw{ssh_hash_new} wrapper function which can make an instance of
whatever vtable you pass it. And then you make all the vtable objects
themselves globally visible throughout the source code, so that any
module can call (for example) \cw{ssh_hash_new(&ssh_sha256)}.

But with other kinds of object, the constructor for each implementing
type has to take a different set of parameters. For example,
implementations of \cw{Socket} are not generally interchangeable at
construction time, because constructing different kinds of socket
requires totally different kinds of address parameter. In that
situation, it makes more sense to keep the vtable structure itself
private to the implementing source file, and instead, publish an
ordinary constructor function that allocates and returns an instance
of that particular subtype, taking whatever parameters are appropriate
to that subtype.

}

\b If you do have virtual constructors, you can choose whether they
take a vtable pointer as a parameter (as shown above), or an
\e{existing} instance object. In the latter case, they can refer to
the object itself as well as the vtable. For example, you could have a
trait come with a virtual constructor called \q{clone}, meaning
\q{Make a copy of this object, no matter which implementation it is.}

\b Sometimes, a single vtable structure type can be shared between two
completely different object types, and contain all the methods for
both.
For example, \cw{ssh_compression_alg} contains methods to
create, use and free \cw{ssh_compressor} and \cw{ssh_decompressor}
objects, which are not interchangeable \dash but putting their methods
in the same vtable means that it's easy to create a matching pair of
objects that are compatible with each other.

\b Passing the vtable itself as an argument to the \cw{new} method is
not compulsory: if a given \cw{new} implementation is only used by a
single vtable, then that function can simply hard-code the vtable
pointer that it writes into the object it constructs. But passing the
vtable is more flexible, because it allows a single constructor
function to be shared between multiple slightly different object
types. For example, SHA-384 and SHA-512 share the same \cw{new} method
and the same implementation data type, because they're very nearly the
same hash algorithm \dash but a couple of the other methods in their
vtables are different, because the \q{reset} method has to set up
the initial algorithm state differently, and the \q{digest} method has
to write out a different amount of data.

\lcont{

One practical advantage of having the \cw{myabs_}\e{foo} family of
inline wrapper functions in the header file is that if you change your
mind later about whether the vtable needs to be passed to \cw{new},
you only have to update the \cw{myabs_new} wrapper, and then the
existing call sites won't need changing.

}

\b Another piece of \q{stunt object orientation} made possible by this
scheme is that you can write two vtables that both use the same
structure layout for the implementation object, and have an object
\e{transform from one to the other} part way through its lifetime, by
overwriting its own vtable pointer field.
For example, the
\cw{sesschan} type that handles the server side of an SSH terminal
session will sometimes transform in mid-lifetime into an SCP or SFTP
file-transfer channel in this way, at the point where the client sends
an \cq{exec} or \cq{subsystem} request indicating that that's what
it wants to do with the channel.

\lcont{

This concept would be difficult to arrange in C++. In Rust, it
wouldn't even \e{make sense}, because in Rust, objects implementing a
trait don't contain a vtable pointer at all \dash instead, the
\q{trait object} type (identifying a specific instance of some
implementation of a given trait) consists of a pair of pointers, one
to the object itself and one to the vtable. In that model, the only
way you could make an existing object turn into a different
implementation of the trait would be to know where all the pointers
to it were stored elsewhere in the program, and persuade all their
owners to rewrite them.

}

\b Another stunt you can do is to have a vtable that doesn't have a
corresponding implementation structure at all, because the only
methods implemented in it are the constructors, and they always end up
returning an implementation of some other vtable. For example, some of
PuTTY's crypto primitives have a hardware-accelerated version and a
pure software version, and decide at run time which one to use (based
on whether the CPU they're running on supports the necessary
acceleration instructions). So, in the case of SHA-256, there are
vtables \cw{ssh_sha256_sw} and \cw{ssh_sha256_hw}, each of which has
its own data layout and its own implementations of all the methods;
and then there's a top-level vtable \cw{ssh_sha256}, which only
provides the \q{new} method, and implements it by calling the \q{new}
method on one or other of the subtypes depending on what it finds out
about the machine it's running on.
That top-level selector vtable is nearly
always the one used by client code. The exception is the test suite,
which has to instantiate both of the subtypes in order to make sure
they both pass the tests.

\lcont{

As a result, the top-level selector vtable \cw{ssh_sha256} doesn't
need to implement any method that takes an \cw{ssh_hash *}
parameter, because no \cw{ssh_hash} object is ever constructed whose
\cw{vt} field points to \cw{&ssh_sha256}: they all point to one of the
other two full implementation vtables.

}

\H{udp-compile-once} Single compilation of each source file

The PuTTY build system for any given platform works on the following
very simple model:

\b Each source file is compiled precisely once, to produce a single
object file.

\b Each binary is created by linking together some combination of
those object files.

Therefore, if you need to add some functionality to a particular
module, but it should only be present in some of the tool binaries
(for example, a cryptographic proxy authentication mechanism which
needs to be left out of PuTTYtel to maintain its usability in
crypto-hostile jurisdictions), the \e{wrong} way to do it is by
adding \cw{#ifdef}s in (say) \cw{proxy.c}. This would require
separate compilation of \cw{proxy.c} for PuTTY and PuTTYtel, which
means that the entire \cw{Makefile}-generation architecture (see
\k{udp-makefiles-auto}) would have to be significantly redesigned.
Unless you are prepared to do that redesign yourself, \e{and}
guarantee that it will still port to any future platforms we might
decide to run on, you should not attempt this!

The \e{right} way to introduce a feature like this is to put the new
code in a separate source file, and (if necessary) introduce a
second new source file defining the same set of functions, but
defining them as stubs which don't provide the feature.
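
As a minimal sketch of that arrangement, the shared header and the
stub module might look like the following. Every file and function
name in it is hypothetical and invented for illustration, not taken
from the PuTTY source. A caller is included too, so the block shows
what a binary linked with the stub ends up containing. The real module
would define the same two functions with a working implementation, and
would be linked into the other binaries instead.

\c /* Hypothetical shared header (say, "cryptoauth.h"). */
\c #include <stdbool.h>
\c #include <stddef.h>
\c
\c bool crypto_auth_supported(void);
\c int crypto_auth_respond(const char *challenge,
\c                         char *out, size_t outlen);
\c
\c /* Hypothetical stub module (say, "nocryptoauth.c"), linked only into
\c  * the binaries that must not have the feature. The real module (say,
\c  * "cryptoauth.c") defines the same two functions properly and is
\c  * linked into the other binaries instead. Neither file uses #ifdef. */
\c bool crypto_auth_supported(void)
\c {
\c     return false;
\c }
\c
\c int crypto_auth_respond(const char *challenge,
\c                         char *out, size_t outlen)
\c {
\c     (void)challenge;  /* unused: the stub has nothing to compute */
\c     (void)out;
\c     (void)outlen;
\c     return -1;        /* report failure if it is ever called */
\c }
\c
\c /* Hypothetical caller standing in for code in proxy.c: compiled
\c  * exactly once, with the same object file used in every binary. */
\c int proxy_count_auth_methods(void)
\c {
\c     int n = 1;                  /* plain password auth: always offered */
\c     if (crypto_auth_supported())
\c         n++;                    /* challenge-response: only if linked in */
\c     return n;
\c }
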
Then the
module whose behaviour needs to vary (\cw{proxy.c} in this example)
can call the functions defined in these two modules, and it will
either provide the new feature or not provide it according to which
of your new modules it is linked with.

Of course, object files are never shared \e{between} platforms, so
it is allowable to use \cw{#ifdef} to select between platforms. This
happens in \cw{puttyps.h} (choosing which of the platform-specific
include files to use), and also in \cw{misc.c} (the Windows-specific
\q{Minefield} memory diagnostic system). It should be used
sparingly, though, if at all.

\H{udp-perfection} Do as we say, not as we do

The current PuTTY code probably does not conform strictly to \e{all}
of the principles listed above. There may be the occasional
SSH-specific piece of code in what should be a backend-independent
module, or the occasional dependence on a non-standard X library
function under Unix.

This should not be taken as a licence to go ahead and violate the
rules. Where we violate them ourselves, we're not happy about it,
and we would welcome patches that fix any existing problems. Please
try to help us make our code better, not worse!
