
μpb Design
----------

μpb has the following design goals:

- C89 compatible.
- small code size (both for the core library and generated messages).
- high performance (hundreds of MB/s).
- idiomatic for C programs.
- easy to wrap in high-level languages (Python, Ruby, Lua, etc.) with
  good performance and all standard protobuf features.
- hands-off about memory management, allowing for easy integration
  with existing VMs and/or garbage collectors.
- offers binary ABI compatibility between apps, generated messages, and
  the core library (doesn't require regenerating messages or recompiling
  your application when the core library changes).
- provides all features that users expect from a protobuf library
  (generated messages in C, reflection, text format, etc.).
- layered, so the core is small and doesn't require descriptors.
- tidy about symbol references, so that any messages or features that
  aren't used by a C program can have their code GC'd by the linker.
- possible to use protobuf binary format without leaking message/field
  names into the binary.

μpb accomplishes these goals by keeping a very small core that does not contain
descriptors.  We need some way of knowing what fields are in each message and
where they live, but instead of descriptors, we keep a small, lightweight
summary of the .proto file.  We call this a `upb_msglayout`.  It contains the
bare minimum we need to parse protobuf binary format into our internal
representation for messages, `upb_msg`, and to serialize a `upb_msg` back out.

The core then contains functions to parse/serialize a message, given a `upb_msg*`
and a `const upb_msglayout*`.
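To make the split between layout and message concrete, here is a minimal C sketch of a name-free layout and a serializer that walks it. All `sketch_*` types and functions, and the `Sample` message, are hypothetical stand-ins, not upb's real `upb_msglayout` or encoder. It also assumes every field is a `uint32` encoded as a varint and skips zero values as proto3 does.

```c
#include <stddef.h> /* size_t, offsetof */
#include <string.h> /* memcpy */

/* Hypothetical, simplified stand-ins for upb_msglayout, for illustration
 * only; the real upb structs differ. Note that there are no field names. */
typedef struct {
  unsigned int number; /* field number from the .proto file */
  size_t offset;       /* byte offset of the field's storage in the message */
} sketch_fieldlayout;

typedef struct {
  const sketch_fieldlayout *fields;
  size_t field_count;
  size_t size; /* sizeof the in-memory message struct */
} sketch_msglayout;

/* Writes v as a base-128 varint to out; returns the number of bytes written. */
static size_t sketch_put_varint(unsigned char *out, unsigned long v) {
  size_t n = 0;
  while (v >= 0x80) {
    out[n++] = (unsigned char)((v & 0x7f) | 0x80); /* low 7 bits + continuation */
    v >>= 7;
  }
  out[n++] = (unsigned char)v;
  return n;
}

/* Serializes msg using only its layout. Simplification: every field is
 * assumed to be an "unsigned int" encoded as a varint (wire type 0), and
 * zero values are skipped as in proto3. The caller must supply a large
 * enough buffer; bounds checks are omitted for brevity. */
static size_t sketch_encode(const void *msg, const sketch_msglayout *l,
                            unsigned char *out) {
  size_t i, n = 0;
  for (i = 0; i < l->field_count; i++) {
    const sketch_fieldlayout *f = &l->fields[i];
    unsigned int val;
    memcpy(&val, (const char *)msg + f->offset, sizeof val);
    if (val == 0) continue;
    /* Tag = (field number << 3) | wire type; wire type 0 is varint. */
    n += sketch_put_varint(out + n, (unsigned long)f->number << 3);
    n += sketch_put_varint(out + n, (unsigned long)val);
  }
  return n;
}

/* What generated code might emit for:
 *   message Sample { uint32 id = 1; uint32 count = 2; } */
typedef struct {
  unsigned int id;
  unsigned int count;
} Sample;

static const sketch_fieldlayout Sample_fields[2] = {
  {1, offsetof(Sample, id)},
  {2, offsetof(Sample, count)}
};

static const sketch_msglayout Sample_layout = {Sample_fields, 2, sizeof(Sample)};
```

For `id = 150` and `count = 0`, `sketch_encode` writes the three bytes `08 96 01`: the tag `(1 << 3) | 0` followed by 150 as a varint. Nothing in the layout is a string, which is what lets binary-format code avoid carrying field names.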

This approach is similar to [nanopb](https://github.com/nanopb/nanopb), which
also compiles message definitions to a compact, internal representation without
names.  However, nanopb does not aim to be a fully-featured library, and has no
support for text format, JSON, or descriptors.  μpb is unique in that it has a
small core similar to nanopb (though not quite as small), but also offers a
full-featured protobuf library for applications that want reflection, text
format, JSON format, etc.

Without descriptors, the core doesn't have access to field names, so it cannot
parse or serialize protobuf text format or JSON.  Instead, this functionality
lives in separate modules that depend on the module implementing descriptors.
With the descriptor module we can parse/serialize binary descriptors and
validate that they follow all the rules of protobuf schemas.
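The descriptor module is where names and schema rules live. The sketch below uses hypothetical `sketch_*` types rather than upb's descriptor API. It checks a few real protobuf schema rules (field numbers within 1 to 2^29 - 1, none in the reserved range 19000 to 19999, no duplicate numbers or names) and provides the number-to-name lookup that a text-format printer would need.

```c
#include <stddef.h> /* size_t, NULL */
#include <string.h> /* strcmp */

/* Hypothetical descriptor-side types, for illustration only (not upb's
 * real API). Unlike the core layout, these carry field names, which the
 * text-format and JSON modules need. */
typedef struct {
  const char *name;
  unsigned long number;
} sketch_fielddef;

typedef struct {
  const char *full_name;
  const sketch_fielddef *fields;
  size_t field_count;
} sketch_msgdef;

/* Checks a subset of protobuf's schema rules: field numbers must lie in
 * [1, 2^29 - 1], must avoid the reserved range 19000-19999, and both
 * numbers and names must be unique within the message.
 * Returns 1 if valid, 0 otherwise. O(n^2) for brevity. */
static int sketch_msgdef_validate(const sketch_msgdef *m) {
  size_t i, j;
  for (i = 0; i < m->field_count; i++) {
    const sketch_fielddef *f = &m->fields[i];
    if (f->number < 1 || f->number > 536870911UL) return 0;
    if (f->number >= 19000 && f->number <= 19999) return 0;
    for (j = i + 1; j < m->field_count; j++) {
      if (m->fields[j].number == f->number) return 0;
      if (strcmp(m->fields[j].name, f->name) == 0) return 0;
    }
  }
  return 1;
}

/* Maps a field number to its name: exactly what a text-format printer
 * needs and what the core deliberately lacks. Returns NULL if unknown. */
static const char *sketch_msgdef_field_name(const sketch_msgdef *m,
                                            unsigned long number) {
  size_t i;
  for (i = 0; i < m->field_count; i++) {
    if (m->fields[i].number == number) return m->fields[i].name;
  }
  return NULL;
}
```

Since only this layer references field names, a program that sticks to binary format can leave it out of the link entirely.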

To provide binary compatibility, we version the structs that generated messages
use to create a `upb_msglayout*`.  The current initializers are
`upb_msglayout_msginit_v1`, `upb_msglayout_fieldinit_v1`, etc., and
`upb_msglayout*` currently uses them directly as its internal representation.
If upb changes its internal representation for a `upb_msglayout*`, it will also
include code to convert the old representation to the new one.  This conversion
costs some extra memory and CPU at runtime, but apps that statically link μpb
will never need to worry about it.
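The versioning scheme can be sketched as a frozen, versioned initializer struct plus a conversion into whatever internal form the library currently uses. The members below are invented for illustration: the text names `upb_msglayout_msginit_v1` and `upb_msglayout_fieldinit_v1` but not their contents, and this sketch imagines a later release in which the internal representation has diverged from v1. All `sketch_*` names are hypothetical. The caller supplies the storage, in line with μpb's hands-off approach to memory.

```c
#include <stddef.h> /* size_t */

/* Hypothetical frozen ABI structs that generated code would emit. Once
 * published, a "_v1" struct never changes; a new layout gets a "_v2". */
typedef struct {
  unsigned int number;
  unsigned short offset;  /* v1 in this sketch limits messages to 64 KiB */
  unsigned char type;     /* descriptor type code, e.g. 13 = TYPE_UINT32 */
} sketch_fieldinit_v1;

typedef struct {
  const sketch_fieldinit_v1 *fields;
  unsigned short field_count;
  unsigned short size;
} sketch_msginit_v1;

/* Hypothetical internal representation of an imagined newer release. It
 * may change freely because generated code never depends on it. */
typedef struct {
  unsigned int number;
  size_t offset;          /* widened relative to v1 */
  unsigned char type;
} sketch_field;

typedef struct {
  const sketch_field *fields;
  size_t field_count;
  size_t size;
} sketch_msglayout;

/* Converts a v1 initializer into the internal form. "storage" must hold at
 * least init->field_count entries and must outlive "out"; the caller owns
 * it. This copy is the runtime memory/CPU cost of the conversion. */
static void sketch_msglayout_from_v1(const sketch_msginit_v1 *init,
                                     sketch_field *storage,
                                     sketch_msglayout *out) {
  size_t i;
  for (i = 0; i < init->field_count; i++) {
    storage[i].number = init->fields[i].number;
    storage[i].offset = init->fields[i].offset; /* unsigned short -> size_t */
    storage[i].type = init->fields[i].type;
  }
  out->fields = storage;
  out->field_count = init->field_count;
  out->size = init->size;
}
```

Generated code compiled against the v1 structs keeps working across library upgrades because the library, not the application, performs the conversion.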

TODO
----

1. Revise our generated code until we feel comfortable committing to API/ABI
   stability for it.  In particular, there is an open question of whether
   non-ABI-compatible field accesses should have a fast path separate from
   ABI-compatible field accesses.
1. Add missing features (maps, extensions, unknown fields).
1. Flesh out C++ wrappers.
1. *(lower priority)* Revise all of the existing encoders/decoders and
   handlers.  We will probably want to keep handlers, since they let us
   decouple encoders/decoders from `upb_msg`, but we need to simplify all of
   that a lot.  We will likely want to make handlers per-message instead of
   per-field, except for variable-length fields.
