| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 07-May-2022 | - | | |
| cmake/ | 28-Jul-2021 | - | 44 | 35 |
| CODING.md | 28-Jul-2021 | 2.9 KiB | 116 | 83 |
| CONTRIB.md | 28-Jul-2021 | 164 | 11 | 6 |
| ChangeLog | 28-Jul-2021 | 4.6 KiB | 183 | 130 |
| LICENSE | 28-Jul-2021 | 11.1 KiB | 202 | 169 |
| Makefile.in | 28-Jul-2021 | 3.1 KiB | 142 | 97 |
| README.md | 28-Jul-2021 | 18.1 KiB | 498 | 357 |
| buffer.cc | 28-Jul-2021 | 8.1 KiB | 393 | 289 |
| client.cc | 28-Jul-2021 | 7.6 KiB | 337 | 264 |
| cody.hh | 28-Jul-2021 | 21.5 KiB | 805 | 470 |
| config.h.in | 28-Jul-2021 | 666 | 27 | 17 |
| config.m4 | 28-Jul-2021 | 2.8 KiB | 101 | 93 |
| configure | 28-Jul-2021 | 117.4 KiB | 4,135 | 3,411 |
| configure.ac | 28-Jul-2021 | 1.9 KiB | 82 | 69 |
| fatal.cc | 28-Jul-2021 | 1,022 | 58 | 43 |
| internal.hh | 28-Jul-2021 | 3.1 KiB | 134 | 99 |
| netclient.cc | 28-Jul-2021 | 2.7 KiB | 142 | 107 |
| netserver.cc | 28-Jul-2021 | 3 KiB | 155 | 118 |
| packet.cc | 28-Jul-2021 | 805 | 51 | 36 |
| resolver.cc | 28-Jul-2021 | 4.4 KiB | 212 | 165 |
| server.cc | 28-Jul-2021 | 6.9 KiB | 308 | 251 |

README.md

# libCODY: COmpiler DYnamism<sup><a href="#1">1</a></sup>

Copyright (C) 2020 Nathan Sidwell, nathan@acm.org

libCODY is an implementation of a communication protocol between
compilers and build systems.

**WARNING:**  This is preliminary software.

In addition to supporting C++ modules, this may also support LTO
requirements and could also deal with generated #include files
and feed the compiler with prepruned include paths and whatnot.  (The
system calls involved in include searches can be quite expensive on
some build infrastructures.)

* Client and Server objects
* Direct connection for in-process use
* Testing with Joust (that means nothing to you, doesn't it!)


## Problem Being Solved

The origin is in C++20 modules:
```
import foo;
```

At that import, the compiler needs<sup><a href="#2">2</a></sup> to
load up the compiled serialization of module `foo`.  Where is that
file?  Does it even exist?  Unless the build system already knows the
dependency graph, this might be a completely unknown module.  Now, the
build system knows how to build things, but it might not have complete
information about the dependencies.  The ultimate source of
dependencies is the source code being compiled, and specifying the
same thing in multiple places is a recipe for build skew.

Hence, a protocol by which a compiler can query a build system.  This
was originally described in <a
href="https://wg21.link/p1184r1">p1184r1:A Module Mapper</a>, with a
proof-of-concept hack in GNU Make described in <a
href="https://wg21.link/p1602">p1602:Make Me A Module</a>.  The current
implementation has evolved and an update to p1184 will be forthcoming.

## Packet Encoding

The protocol is turn-based.  The compiler sends a block of one or more
requests to the builder, then waits for a block of responses to all of
those requests.  If the builder needs to compile something to satisfy
a request, there may be some time before the response.  A builder may
service multiple compilers concurrently, each as a separate connection.

When multiple requests are in a block, the responses are also in a
block, and in corresponding order.  The responses must not be
commenced eagerly -- they must wait until the incoming block has ended
(as mentioned above, it is turn-based).  To do otherwise risks
deadlock, as there is no requirement for a sending end of the
communication to listen for incoming responses (or new requests) until
it has completed sending its current block.

Every request has a response.

Requests and responses are user-readable text.  The protocol is not
intended as a transmission medium to send large binary objects (such as
compiled modules).  It is presumed the builder and the compiler share a
file system, for that kind of thing.<sup><a href="#3">3</a></sup>

Message characters are encoded in UTF-8.

Messages are a sequence of octets ending with a NEWLINE (0xa).  The lines
consist of a sequence of words, separated by WHITESPACE (0x20 or 0x9).
Words themselves do not contain WHITESPACE.  Lines consisting solely
of WHITESPACE (or empty) are ignored.

To encode a block of multiple messages, non-final messages end with a
single word of SEMICOLON (0x3b), immediately before the NEWLINE.  Thus
a serial connection can determine whether a block is complete without
decoding the messages.

Words consisting only of characters in the set [-+_/%.A-Za-z0-9] need
not be quoted.  Words containing characters outside that set should be
quoted.  A zero-length word may be achieved with `''`.

Quoted words begin and end with APOSTROPHE (0x27). Within the quoted
word, BACKSLASH (0x5c) is used as an escape mechanism, with the
following meanings:

* \\n - NEWLINE (0xa)
* \\t - TAB (0x9)
* \\' - APOSTROPHE (')
* \\\\ - BACKSLASH (\\)

Other characters in the range [0x00, 0x20) and 0x7f are encoded as
BACKSLASH followed by one or two lowercase hex characters.  Octets in
the range [0x80,0xff) are UTF-8 encodings of Unicode characters outside
the traditional ASCII set and passed as such.

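The quoting rules can be sketched as a small encoder.  `EncodeWord` is a
hypothetical name, not the library's own routine.  It always emits two
hex digits for control characters, which is an assumption made here so
that a following literal hex digit cannot be misread as part of the
escape.

```cpp
#include <string>

// Hypothetical helper (not a libcody API): encode one word for the wire.
std::string EncodeWord (std::string const &word)
{
  static char const safe[] = "-+_/%."
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  if (!word.empty () && word.find_first_not_of (safe) == std::string::npos)
    return word; // only safe characters, no quoting needed

  static char const hex[] = "0123456789abcdef";
  std::string out = "'";
  for (unsigned char c : word)
    switch (c)
      {
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      case '\'': out += "\\'"; break;
      case '\\': out += "\\\\"; break;
      default:
        if (c < 0x20 || c == 0x7f)
          {
            // Remaining control characters: backslash plus two
            // lowercase hex digits (assumed form, see above).
            out += '\\';
            out += hex[c >> 4];
            out += hex[c & 0xf];
          }
        else
          out += char (c); // printable ASCII and UTF-8 octets pass through
      }
  out += '\'';
  return out;
}
```
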
Decoding should be more relaxed.  Unquoted words containing characters
in the range [0x20,0xff] other than BACKSLASH or APOSTROPHE should be
accepted.  In a quoted sequence, `\` followed by one or two lower case
hex characters decode to that octet.  Further, words can be
constructed from a mixture of abutted quoted and unquoted sequences.
For instance `FOO' 'bar` would decode to the word `FOO bar`.

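A relaxed decoder along those lines might look like the following sketch.
`DecodeWords` is a hypothetical helper, not the library's parser.  It
assumes a hex escape greedily takes a second hex digit when one follows,
and it does not diagnose malformed input such as an unterminated quote.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a libcody API): split one message line into
// decoded words, accepting abutted quoted and unquoted sequences.
std::vector<std::string> DecodeWords (std::string const &line)
{
  std::vector<std::string> words;
  std::string word;
  bool in_word = false, quoted = false;

  auto hexval = [] (char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
  };

  for (std::size_t i = 0; i < line.size (); i++)
    {
      char c = line[i];
      if (quoted)
        {
          if (c == '\'')
            quoted = false; // end of this quoted sequence
          else if (c == '\\' && i + 1 < line.size ())
            {
              char e = line[++i];
              if (e == 'n') word += '\n';
              else if (e == 't') word += '\t';
              else if (hexval (e) >= 0)
                {
                  int v = hexval (e);
                  if (i + 1 < line.size () && hexval (line[i + 1]) >= 0)
                    v = v * 16 + hexval (line[++i]);
                  word += char (v);
                }
              else
                word += e; // \' and \\ stand for themselves
            }
          else
            word += c;
        }
      else if (c == ' ' || c == '\t')
        {
          if (in_word)
            words.push_back (word);
          word.clear ();
          in_word = false;
        }
      else
        {
          in_word = true;
          if (c == '\'')
            quoted = true; // a quoted sequence starts or abuts here
          else
            word += c;
        }
    }
  if (in_word)
    words.push_back (word);
  return words;
}
```
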
Notice that the block continuation marker of `;` is not a valid
encoding of the word `;`, which would be `';'`.

It is recommended that words are separated by single SPACE characters.

## Messages

The message descriptions use `$metavariable` examples.

The request messages are specific to a particular action.  The response
messages are more generic, describing their value types, but not their
meaning.  Message consumers need to know the request to decode the
response.  Notice the `Packet::GetRequest()` method records in response
packets what the request being responded to was.  Do not confuse this
with the `Packet::GetCode ()` method.

### Responses

The simplest response is a single:

`OK`

This indicates the request was successful.


An error response is:

`ERROR $message`

The message is a human-readable string.  It indicates failure of the request.

Pathnames are encoded with:

`PATHNAME $pathname`

Boolean responses use:

`BOOL `(`TRUE`|`FALSE`)

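To make the four response shapes concrete, here is a minimal sketch that
classifies a raw response line by its first word.  The names
`ResponseKind`, `Response` and `ClassifyResponse` are invented for this
illustration; in the library itself this job belongs to the `Packet`
class.  The value is returned still encoded, and `foo.gcm` in the usage
line is only a made-up file name.

```cpp
#include <string>

// Hypothetical types for illustration, not the libcody Packet class.
enum class ResponseKind { Ok, Error, Pathname, Bool, Unknown };

struct Response
{
  ResponseKind kind;
  std::string value; // remainder of the line, still in encoded form
};

Response ClassifyResponse (std::string const &line)
{
  // Split off the first word; the rest (if any) is the value.
  std::string::size_type space = line.find_first_of (" \t");
  std::string code = line.substr (0, space);
  std::string value;
  if (space != std::string::npos)
    {
      std::string::size_type start = line.find_first_not_of (" \t", space);
      if (start != std::string::npos)
        value = line.substr (start);
    }

  ResponseKind kind = ResponseKind::Unknown;
  if (code == "OK")
    kind = ResponseKind::Ok;
  else if (code == "ERROR")
    kind = ResponseKind::Error;
  else if (code == "PATHNAME")
    kind = ResponseKind::Pathname;
  else if (code == "BOOL" && (value == "TRUE" || value == "FALSE"))
    kind = ResponseKind::Bool;
  return Response{kind, value};
}
```

For example, `ClassifyResponse ("PATHNAME foo.gcm")` yields kind
`Pathname` with value `foo.gcm`.
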
### Handshake Request

The first message is a handshake:

`HELLO $version $compiler $ident`

The `$version` is a numeric value, currently `1`.  `$compiler` identifies
the compiler &mdash; builders may need to keep compiled modules from
different compilers separate.  `$ident` is an identifier the builder
might use to identify the compilation it is communicating with.

Responses are:

`HELLO $version $builder [$flags]`

A successful handshake.  The communication is now connected and other
messages may be exchanged.  An ERROR response indicates an unsuccessful
handshake.  The communication remains unconnected.

There is nothing restricting a handshake to its own message block.  Of
course, if the handshake fails, subsequent non-handshake messages in
the block will fail (producing error responses).

The `$flags` word, if present, allows a server to control what requests
might be given.  See below.

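Since a handshake may share a block with other requests, a block can be
assembled as in the sketch below.  `MakeBlock` is a hypothetical helper,
not a libcody function, and it expects each message to be already
encoded.  The compiler name `GCC` and the ident `main.cc` in the usage
note are placeholders.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a libcody API): join already-encoded
// messages into one block.  Every message but the last gets the
// SEMICOLON continuation word before its NEWLINE.
std::string MakeBlock (std::vector<std::string> const &messages)
{
  std::string out;
  for (std::size_t i = 0; i < messages.size (); i++)
    {
      out += messages[i];
      if (i + 1 != messages.size ())
        out += " ;";
      out += '\n';
    }
  return out;
}
```

Calling `MakeBlock ({"HELLO 1 GCC main.cc", "MODULE-REPO"})` produces the
two lines `HELLO 1 GCC main.cc ;` and `MODULE-REPO`.
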
### C++ Module Requests

A set of requests is specific to C++ modules:

#### Flags

Several requests and one response have an optional `$flags` word.
These are the `Cody::Flags` value pertaining to that request.  If
omitted, the value 0 is implied.  The following flags are available:

* `0`, `None`: No flags.

* `1<<0`, `NameOnly`: The request is for the name only, and not the
  CMI contents.

The `NameOnly` flag may be provided in a handshake response, and
indicates that the server is interested in requests only for their
implied dependency information.  It may be provided on a request to
indicate that only the CMI name is required, not its contents (for
instance, when preprocessing).  Note that a compiler may still make
`NameOnly` requests even if the server did not ask for such.

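The flag values can be mirrored in a few lines, as sketched below.  The
enumerator values come from the list above, but the rest is an
assumption made for illustration: the text does not say how the `$flags`
word is spelled on the wire, so `ParseFlags` simply assumes a numeric
word, and both helper names are invented here.

```cpp
#include <cstdlib>
#include <string>

// Values as listed above; the real definition is Cody::Flags in cody.hh.
enum Flags : unsigned
{
  None = 0,
  NameOnly = 1u << 0,
};

// Hypothetical helper: an absent (empty) flags word means 0.
// Assumption: a present word is a number in text form.
unsigned ParseFlags (std::string const &word)
{
  if (word.empty ())
    return None;
  return unsigned (std::strtoul (word.c_str (), nullptr, 0));
}

// True when only the CMI name is wanted, not its contents.
bool IsNameOnly (unsigned flags)
{
  return (flags & NameOnly) != 0;
}
```
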
#### Repository

All relative CMI file names are relative to a repository.  (There are
usually no absolute CMI files).  The repository may be determined
with:

`MODULE-REPO`

A PATHNAME response is expected.  The `$pathname` may be an empty
word, which is equivalent to `.`.  When the response is a relative
pathname, it must be relative to the client's current working
directory (which might be a process on a different host to the
server).  You may set the repository to `/`, if you wish to use paths
relative to the root directory.

#### Exporting

A compilation of a module interface, partition or header unit can
inform the builder with:

`MODULE-EXPORT $module [$flags]`

This will result in a PATHNAME response naming the Compiled Module
Interface pathname to write.

The `MODULE-EXPORT` request does not indicate the module has been
successfully compiled.  At most one `MODULE-EXPORT` is to be made, and
as the connection is for a single compilation, the builder may infer
dependency relationships between the module being generated and import
requests made.

Named module names and header unit names are distinguished by making
the latter unambiguously look like file names.  Firstly, they must be
fully resolved according to the compiler's usual include path.  If
that results in an absolute file name (beginning with `/`, or
certain other OS-specific sequences), all is well.  Otherwise a
relative file name must be prefixed by `./` to be distinguished from a
similarly named named module.  This prefixing must occur, even if the
header-unit's name contains characters that cannot appear in a named
module's name.

It is expected that absolute header-unit names convert to relative CMI
names, to keep all CMIs within the CMI repository.  This means that
steps must be taken to distinguish the CMIs for `/here` from `./here`,
and this can be achieved by replacing the leading `./` directory with
`,/`, which is visually similar but does not have the self-reference
semantics of dot.  Likewise, header-unit names containing `..`
directories, can be remapped to `,,`.  (When symlinks are involved
`bob/dob/..` might not be `bob`, of course.)  C++ header-unit
semantics are such that there is no need to resolve multiple ways of
spelling a particular header-unit to a unique CMI file.

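One possible shape for that remapping is sketched below.  `CmiNameForHeader`
is a hypothetical function, not something the library provides, and two
details are assumptions: absolute names are made relative by dropping
the leading `/`, and no CMI file suffix is appended.  With this sketch
`/here` maps to `here`, `./here` maps to `,/here`, and `./a/../b.h` maps
to `,/a/,,/b.h`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a libcody API): map a header-unit name in
// the unambiguous file-name form to a repository-relative CMI name.
std::string CmiNameForHeader (std::string const &header)
{
  std::string out;
  std::size_t pos = 0;
  if (header.compare (0, 1, "/") == 0)
    pos = 1; // absolute name: drop the leading '/' (assumed scheme)
  else if (header.compare (0, 2, "./") == 0)
    {
      out = ",/"; // relative name: leading './' becomes ',/'
      pos = 2;
    }

  // Copy the remaining directories, remapping each '..' to ',,'.
  while (pos <= header.size ())
    {
      std::size_t slash = header.find ('/', pos);
      if (slash == std::string::npos)
        slash = header.size ();
      std::string dir = header.substr (pos, slash - pos);
      out += dir == ".." ? ",," : dir;
      if (slash < header.size ())
        out += '/';
      pos = slash + 1;
    }
  return out;
}
```
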
Successful compilation of an interface is indicated with a subsequent:

`MODULE-COMPILED $module [$flags]`

request.  This indicates the CMI file has been written to disk, so
that any other compilations waiting on it may proceed.  Depending on
compiler implementation, the CMI may be written before the compilation
completes.  A single OK response is expected.

Compilation failure can be inferred by lack of a `MODULE-COMPILED`
request.  It is presumed the builder can determine this, as it is also
responsible for launching and reaping the compiler invocations
themselves.

#### Importing

Importation, including that of header-units, uses:

`MODULE-IMPORT $module [$flags]`

A PATHNAME response names the CMI file to be read.  Should the builder
have to invoke a compilation to produce the CMI, the response should
be delayed until that occurs.  If such a compilation fails, an error
response should be provided to the requestor &mdash; which will then
presumably fail in some manner.

#### Include Translation

Include translation can be determined with:

`INCLUDE-TRANSLATE $header [$flags]`

The header name, `$header`, is the fully resolved header name, in the
above-mentioned unambiguous filename form.  The response will either
be a BOOL response indicating textual inclusion, or a PATHNAME
response naming the CMI for such translation.  The BOOL value is TRUE,
if the header is known to be a textual header, and FALSE if nothing is
known about it -- the latter might cause diagnostics about incomplete
knowledge.

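The three outcomes of an `INCLUDE-TRANSLATE` request can be summarized in
code.  This is only a sketch: `IncludeAction` and `TranslateAction` are
invented names, and the function takes the already-decoded words of the
response rather than a libcody `Packet`.

```cpp
#include <string>
#include <vector>

// Hypothetical result type for illustration, not a libcody API.
enum class IncludeAction
{
  Textual,        // BOOL TRUE: known textual header
  TextualUnknown, // BOOL FALSE: nothing known, include textually
  ImportCmi,      // PATHNAME: translate the include to an import
  Failed          // ERROR or anything unexpected
};

// WORDS holds the decoded words of the response.  On ImportCmi, CMI
// receives the pathname of the CMI to read.
IncludeAction TranslateAction (std::vector<std::string> const &words,
                               std::string &cmi)
{
  cmi.clear ();
  if (words.size () == 2 && words[0] == "BOOL")
    {
      if (words[1] == "TRUE")
        return IncludeAction::Textual;
      if (words[1] == "FALSE")
        return IncludeAction::TextualUnknown;
    }
  else if (words.size () == 2 && words[0] == "PATHNAME")
    {
      cmi = words[1];
      return IncludeAction::ImportCmi;
    }
  return IncludeAction::Failed;
}
```

A caller might emit a diagnostic about incomplete knowledge in the
`TextualUnknown` case before falling back to textual inclusion.
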
### GCC LTO Messages

This set of requests is used for GCC LTO jobserver integration with GNU Make.

## Building libCody

Libcody is written in C++11.  (It's intended for compilers, so
there'd be a bootstrapping problem if it used the latest and greatest.)

### Using configure and make

It supports the usual `configure`, `make`, `make check` & `make install`
sequence.  It does not support building in the source directory --
that just didn't drop out, and it's not how I build things (because,
again, for compilers).  Excitingly it uses my own `joust` test
harness, so you'll need to build and install that somewhere, if you
want the comfort of testing.

The following configure options are available, in addition to the usual set:

* `--enable-checking` Compile with assert-like checking.  Defaults to on.

* `--with-tooldir=DIR` Prepend `DIR` to `PATH` when building (`DIR`
  need not already include the trailing `/bin`, and the right things
  happen).  Use this if you need to point to non-standard tools that
  you usually don't have in your path.  This path is also used when
  the configure script searches for programs.

* `--with-toolinc=DIR`, `--with-toollib=DIR`, include path and library
  path variants of `--with-tooldir`.  If these are siblings of the
  tool bin directory, they'll be found automatically.

* `--with-compiler=NAME` Specify a particular compiler to use.
  Usually what configure finds is sufficiently usable.

* `--with-bugurl=URL` Override the bugreporting URL.  Do this if
  you're providing libcody as part of a package that /you/ are
  supporting.

* `--enable-maintainer-mode` Specify that rules to rebuild things like
  `configure` (with `autoconf`) should be enabled.  When not enabled,
  you'll get a message if these appear out of date, but that can
  happen naturally after an update or clone as `git`, in common with
  other VCs, doesn't preserve the relative ordering of file
  modifications.  You can use `make MAINTAINER=touch` to shut make up,
  if this occurs (or manually execute the `autoconf` and related
  commands).

When building, you can override the default optimization flags with
`CXXFLAGS=$flags`.  I often build a debuggable library with `make
CXXFLAGS=-g3`.

The `Makefile` will also parallelize according to the number of CPUs,
unless you specify explicitly with a `-j` option.  This is a little
clunky, as it's not possible to figure out inside the makefile whether
the user provided `-j`.  (Or at least I've not figured out how.)

### Using cmake and make

#### In the clang/LLVM project

The primary motivation for a cmake implementation is to allow building
libcody "in tree" in clang/LLVM.  In that case, a checkout of libcody
can be placed (or symbolically linked) into clang/tools.  This will
configure and build the library along with other LLVM dependencies.

*NOTE* This is not treated as an installable entity (it is present only
for use by the project).

*NOTE* The testing targets would not be appropriate in this configuration;
it is expected that lit-based testing of the required functionality will be
done by the code using the library.

#### Stand-alone

For use on platforms that don't support configure & make effectively, it
is possible to use the cmake & make process in stand-alone mode (similar
to the configure & make process above).

An example use.
```
cmake -DCMAKE_INSTALL_PREFIX=/path/to/installation -DCMAKE_CXX_COMPILER=clang++ /path/to/libcody/source
make
make install
```
Supported flags (additions to the usual cmake ones).

* `-DCODY_CHECKING=ON,OFF`: Compile with assert-like checking. (defaults ON)

* `-DCODY_WITHEXCEPTIONS=ON,OFF`: Compile with C++ exceptions and RTTI enabled.
(defaults OFF, to be compatible with GCC and LLVM).

*TODO*: At present there is no support for `ctest` integration (this should be
feasible, provided that `joust` is installed and can be discovered by `cmake`).

## API

The library defines entities in the `::Cody` namespace.

There are 4 user-visible classes:

* `Packet`: Responses to requests are `Packets`.  These have a code,
  indicating the response kind, and a payload.

* `Client`: The compiler-end of a connection.  Requests may be made
  and responses are returned.

* `Server`: The builder-end of a connection.  Requests may be waited
  for, and responses made.  Builders that serve multiple concurrent
  connections and spawn compilations to resolve dependencies may need
  to derive from this class to provide response queuing.

* `Resolver`: The processing engine of the builder side.  User code is
  expected to derive from this class and provide virtual function
  overriders to affect the semantics of the resolver.

In addition there are a number of helpers to set up connections.

Logically the Client and the Server communicate via a sequential
channel.  The channel may be provided by:

* two pipes, with different file descriptors for reading and writing
  at each end.

* a socket, which will use the same file descriptor for reading and
  writing.  The socket can be created in a number of ways, including
  Unix domain and IPv6 TCP, for which helpers are provided.

* a direct, in-process, connection, using buffer swapping.

The communication channel is presumed reliable.

Refer to the (currently very sparse) doxygen-generated documentation
for details of the API.

## Examples

To create an in-process resolver, use the following boilerplate:

```
// Derive publicly so the Server can use MyResolver as a Cody::Resolver.
class MyResolver : public Cody::Resolver { ... stuff here ... };

Cody::Client *MakeClient (char const *maybe_ident)
{
  auto *r = new MyResolver (...);
  auto *s = new Cody::Server (r);
  auto *c = new Cody::Client (s);

  // Perform the handshake before issuing any other request.
  auto t = c->ConnectRequest ("ME", maybe_ident);
  if (t.GetCode () == Cody::Client::TC_CONNECT)
    ;// Yay!
  else if (t.GetCode () == Cody::Client::TC_ERROR)
    report_error (t.GetString ());

  return c;
}
```

For a remotely connecting client:
```
Cody::Client *MakeClient (char const *name, int port, char const *maybe_ident)
{
  // On failure, err is set to point at an error message.
  char const *err = nullptr;
  int fd = Cody::OpenInet6 (&err, name, port);
  if (fd < 0)
    { ... error... return nullptr;}

  auto *c = new Cody::Client (fd);

  // Perform the handshake before issuing any other request.
  auto t = c->ConnectRequest ("ME", maybe_ident);
  if (t.GetCode () == Cody::Client::TC_CONNECT)
    ;// Yay!
  else if (t.GetCode () == Cody::Client::TC_ERROR)
    report_error (t.GetString ());

  return c;
}
```

## Future Directions

* Current Directory.  There is no mechanism to check the builder and
  the compiler have the same working directory.  Perhaps that should
  be addressed.

* Include path canonization and/or header file lookup.  This can be
  expensive, particularly with many `-I` options, due to the system
  calls.  Perhaps using a common resource would be cheaper?

* Generated header file lookup/construction.  This is essentially the
  same problem as importing a module, and build systems are crap at
  dealing with this.

* Link-time compilations.  Another place the compiler would like to
  ask the build system to do things.

* C++20 API entrypoints &mdash; `std::string_view` would be nice.

* Exception-safety audit.  Exceptions are not used, but memory
  exhaustion could happen.  And perhaps user's resolver code employs
  exceptions?

<a name="1">1</a>: Or a small town in Wyoming

<a name="2">2</a>: This describes one common implementation technique.
The std itself doesn't require such serializations, but the ability to
create them is kind of the point.  Also, 'compiler' is used where we
mean any consumer of a module, and 'build system' where we mean any
producer of a module.

<a name="3">3</a>: Even when the builder is managing a distributed set
of compilations, the builder must have a mechanism to get source files
to, and object files from, the compilations.  That scheme can also
transfer the CMI files.