# protoc-gen-star (PG*) [![Build Status](https://travis-ci.org/lyft/protoc-gen-star.svg?branch=master)](https://travis-ci.org/lyft/protoc-gen-star) [![GoDoc](https://godoc.org/github.com/lyft/protoc-gen-star?status.svg)](https://godoc.org/github.com/lyft/protoc-gen-star)

**!!! THIS PROJECT IS A WORK-IN-PROGRESS | THE API SHOULD BE CONSIDERED UNSTABLE !!!**

_PG* is a protoc plugin library for efficient proto-based code generation_

```go
package main

import pgs "github.com/lyft/protoc-gen-star"

func main() {
	pgs.Init(pgs.DebugEnv("DEBUG")).
		RegisterModule(&myPGSModule{}).
		RegisterPostProcessor(&myPostProcessor{}).
		Render()
}
```

## Features

### Documentation

While this README seeks to describe many of the nuances of `protoc` plugin development and using PG*, the true documentation source is the code itself. The Go language is self-documenting and provides tools for easily reading through it and viewing examples. The docs can be viewed on [GoDoc](https://godoc.org/github.com/lyft/protoc-gen-star) or locally by running `make docs`, which will start a `godoc` server and open them in the default browser.

### Roadmap

- [x] Interface-based and fully-linked dependency graph with access to raw descriptors
- [x] Built-in context-aware debugging capabilities
- [x] Exhaustive, near 100% unit test coverage
- [x] End-to-end testable via overridable IO & interface-based API
- [x] [`Visitor`][visitor] pattern and helpers for efficiently walking the dependency graph
- [x] [`BuildContext`][context] to facilitate complex generation
- [x] Parsed, typed command-line [`Parameters`][params] access
- [x] Extensible `ModuleBase` for quickly creating `Modules` and facilitating code generation
- [x] Configurable post-processing (e.g., gofmt) of generated files
- [x] Support processing proto files from multiple packages
- [x] Load comments (via SourceCodeInfo) from proto files into gathered AST for easy access
- [x] Language-specific helper subpackages for handling common, nuanced generation tasks
- [ ] Load plugins/modules at runtime using Go shared libraries

### Examples

[`protoc-gen-example`][pge] can be found in the `testdata` directory. It includes two `Module` implementations using a variety of the features available. Its `protoc` execution is included in the `testdata/generated` [Makefile][make] target. Examples are also accessible via the documentation by running `make docs`.

## How It Works

### The `protoc` Flow

Because the process is somewhat confusing, this section will cover the entire flow of how proto files are converted to generated code, using a hypothetical PG* plugin: `protoc-gen-myplugin`. A typical execution looks like this:

```sh
protoc \
  -I . \
  --myplugin_out="foo=bar:../generated" \
  ./pkg/*.proto
```

`protoc`, the PB compiler, is configured using a set of flags (documented under `protoc -h`) and handed a set of files as arguments. In this case, the `-I` flag, which can be specified multiple times, sets the lookup path used to resolve imported dependencies in a proto file. By default, the official descriptor protos are already included.

`myplugin_out` tells `protoc` to use the `protoc-gen-myplugin` protoc-plugin. These plugins are automatically resolved from the system's `PATH` environment variable, or can be explicitly specified with another flag. The official protoc-plugins (e.g., `protoc-gen-python`) are already registered with `protoc`. The flag's value is specific to the particular plugin, with the exception of the `:../generated` suffix. This suffix indicates the root directory in which `protoc` will place the generated files from that package (relative to the current working directory). This generated output directory is _not_ propagated to `protoc-gen-myplugin`, however, so it needs to be duplicated in the left-hand side of the flag. PG* supports this via an `output_path` parameter.

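To illustrate the parameter half of the flag, here is a minimal sketch of how a plugin might split the string it receives (for example `foo=bar,output_path=../generated`) into key/value pairs. The `parseParams` helper and the comma-separated `key=value` layout are assumptions made for illustration; PG* performs its own parsing and exposes the result as `Parameters`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseParams is a hypothetical helper that turns "k1=v1,k2=v2" into a map.
// A key without "=" maps to the empty string.
func parseParams(s string) map[string]string {
	out := map[string]string{}
	for _, kv := range strings.Split(s, ",") {
		if kv == "" {
			continue
		}
		k, v, _ := strings.Cut(kv, "=")
		out[k] = v
	}
	return out
}

func main() {
	// The output directory is repeated as output_path because protoc does
	// not forward the ":../generated" suffix to the plugin.
	params := parseParams("foo=bar,output_path=../generated")
	fmt.Println(params["foo"], params["output_path"])
}
```
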
`protoc` parses the passed-in proto files, ensures they are syntactically correct, and loads any imported dependencies. It converts these files and the dependencies into descriptors (which are themselves PB messages) and creates a `CodeGeneratorRequest` (yet another PB). `protoc` serializes this request and then executes each configured protoc-plugin, sending the payload via `stdin`.

`protoc-gen-myplugin` starts up, receiving the request payload, which it unmarshals. There are two phases to a PG*-based protoc-plugin. First, PG* unmarshals the `CodeGeneratorRequest` received from `protoc`, and creates a fully connected abstract syntax tree (AST) of each file and all its contained entities. Any parameters specified for this plugin are also parsed for later consumption.

When this step is complete, PG* then executes any registered `Modules`, handing them the constructed AST. `Modules` can be written to generate artifacts (e.g., files) or just perform some form of validation over the provided graph without any other side effects. `Modules` provide great flexibility in terms of operating against the PBs.

Once all `Modules` are run, PG* writes any custom artifacts to the file system or serializes generator-specific ones into a `CodeGeneratorResponse` and sends the data to its `stdout`. `protoc` receives this payload, unmarshals it, and persists any requested files to disk after all its plugins have returned. This whole flow looks something like this:

```
foo.proto → protoc → CodeGeneratorRequest → protoc-gen-myplugin → CodeGeneratorResponse → protoc → foo.pb.go
```

The PG* library hides away nearly all of the complexity required to implement a protoc-plugin!

### Modules

PG* `Modules` are handed a complete AST for those files that are targeted for generation as well as all dependencies. A `Module` can then add files to the protoc `CodeGeneratorResponse` or write files directly to disk as `Artifacts`.

PG* provides a `ModuleBase` struct to simplify developing modules. Out of the box, it satisfies the interface for a `Module`, only requiring the creation of `Name` and `Execute` methods. `ModuleBase` is best used as an anonymous embedded field of a wrapping `Module` implementation. A minimal module would look like the following:

```go
import (
	"bytes"
	"fmt"

	pgs "github.com/lyft/protoc-gen-star"
)

// reportModule creates a report of all the target messages generated by the
// protoc run, writing the file into the /tmp directory.
type reportModule struct {
	*pgs.ModuleBase
}

// New configures the module with an instance of ModuleBase
func New() pgs.Module { return &reportModule{&pgs.ModuleBase{}} }

// Name is the identifier used to identify the module. This value is
// automatically attached to the BuildContext associated with the ModuleBase.
func (m *reportModule) Name() string { return "reporter" }

// Execute is passed the target files as well as its dependencies in the pkgs
// map. The implementation should return a slice of Artifacts that represent
// the files to be generated. In this case, "/tmp/report.txt" will be created
// outside of the normal protoc flow.
func (m *reportModule) Execute(targets map[string]pgs.File, pkgs map[string]pgs.Package) []pgs.Artifact {
	buf := &bytes.Buffer{}

	for _, f := range targets {
		m.Push(f.Name().String()).Debug("reporting")

		fmt.Fprintf(buf, "--- %v ---\n", f.Name())

		for i, msg := range f.AllMessages() {
			fmt.Fprintf(buf, "%03d. %v\n", i, msg.Name())
		}

		m.Pop()
	}

	m.OverwriteCustomFile(
		"/tmp/report.txt",
		buf.String(),
		0644,
	)

	return m.Artifacts()
}
```

`ModuleBase` exposes a PG* [`BuildContext`][context] instance, already prefixed with the module's name. Calling `Push` and `Pop` allows adding further information to error and debugging messages. Above, each file from the target package is pushed onto the context before logging the "reporting" debug message.

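The effect of `Push` and `Pop` can be pictured as a stack of prefixes attached to every message. The sketch below is a standalone illustration, not PG*'s implementation: the `logContext` type and the bracketed prefix format are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// logContext is a hypothetical stand-in for a prefixed debugging context.
type logContext struct {
	prefixes []string
}

// Push returns a child context with one more prefix; the parent is unchanged.
func (c logContext) Push(prefix string) logContext {
	next := make([]string, len(c.prefixes), len(c.prefixes)+1)
	copy(next, c.prefixes)
	return logContext{prefixes: append(next, prefix)}
}

// Pop returns a context without the most recently pushed prefix.
func (c logContext) Pop() logContext {
	if len(c.prefixes) == 0 {
		return c
	}
	return logContext{prefixes: c.prefixes[:len(c.prefixes)-1]}
}

// Format renders msg preceded by every active prefix.
func (c logContext) Format(msg string) string {
	var b strings.Builder
	for _, p := range c.prefixes {
		b.WriteString("[" + p + "]")
	}
	if b.Len() > 0 {
		b.WriteString(" ")
	}
	b.WriteString(msg)
	return b.String()
}

func main() {
	module := logContext{}.Push("reporter") // module name as the base prefix
	file := module.Push("foo.proto")        // per-file prefix inside the loop
	fmt.Println(file.Format("reporting"))
	fmt.Println(file.Pop().Format("done"))
}
```
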
The base also provides helper methods for adding or overwriting both protoc-generated and custom files. The above `Execute` method creates a custom file at `/tmp/report.txt`, specifying that it should overwrite an existing file with that name. If it instead called `AddCustomFile` and the file existed, no file would have been generated (though a debug message would be logged out). Similar methods exist for adding generator files, appends, and injections. Likewise, methods such as `AddCustomTemplateFile` allow for `Templates` to be rendered instead.

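The difference between the add and overwrite helpers comes down to what happens when the target already exists. A minimal sketch, assuming a hypothetical in-memory `memFS` store rather than PG*'s real file handling:

```go
package main

import "fmt"

// memFS is a hypothetical in-memory file store keyed by path.
type memFS map[string]string

// write stores contents at name. When overwrite is false and the file
// already exists, the write is skipped and false is returned.
func (fs memFS) write(name, contents string, overwrite bool) bool {
	if _, exists := fs[name]; exists && !overwrite {
		return false
	}
	fs[name] = contents
	return true
}

func main() {
	fs := memFS{"/tmp/report.txt": "old"}

	// "Add" semantics: the existing file is left alone.
	fmt.Println(fs.write("/tmp/report.txt", "new", false), fs["/tmp/report.txt"])

	// "Overwrite" semantics: the existing file is replaced.
	fmt.Println(fs.write("/tmp/report.txt", "new", true), fs["/tmp/report.txt"])
}
```
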
After all modules have been executed, the returned `Artifacts` are either placed into the `CodeGeneratorResponse` payload for protoc or written out to the file system. For testing purposes, the file system has been abstracted such that a custom one (such as an in-memory FS) can be provided to the PG* generator with the `FileSystem` `InitOption`.

#### Post Processing

`Artifacts` generated by `Modules` sometimes require some mutations prior to writing to disk or sending in the response to protoc. This could range from running `gofmt` against Go source to adding copyright headers to all generated source files. To simplify this task in PG*, a `PostProcessor` can be utilized. A minimal `PostProcessor` implementation might look like this:

```go
import (
	"fmt"
	"time"

	pgs "github.com/lyft/protoc-gen-star"
)

// New returns a PostProcessor that adds a copyright comment to the top
// of all generated files.
func New(owner string) pgs.PostProcessor { return copyrightPostProcessor{owner} }

type copyrightPostProcessor struct {
	owner string
}

// Match returns true only for Custom and Generated files (including templates).
func (cpp copyrightPostProcessor) Match(a pgs.Artifact) bool {
	switch a.(type) {
	case pgs.GeneratorFile, pgs.GeneratorTemplateFile,
		pgs.CustomFile, pgs.CustomTemplateFile:
		return true
	default:
		return false
	}
}

// Process attaches the copyright header to the top of the input bytes
func (cpp copyrightPostProcessor) Process(in []byte) (out []byte, err error) {
	cmt := fmt.Sprintf("// Copyright © %d %s. All rights reserved\n",
		time.Now().Year(),
		cpp.owner)

	return append([]byte(cmt), in...), nil
}
```

The `copyrightPostProcessor` struct satisfies the `PostProcessor` interface by implementing the `Match` and `Process` methods. After PG* receives all `Artifacts`, each is handed in turn to each registered processor's `Match` method. In the above case, we return `true` if the file is one of the targeted `Artifact` types. If `true` is returned, `Process` is immediately called with the rendered contents of the file. This method transforms the input, returning the modified value as `out`, or an error if something goes wrong. Above, the notice is prepended to the input.

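The match-then-process loop described above might be sketched as follows. The `artifact` and `postProcessor` types are simplified stand-ins invented for this example (the real interfaces live in the `pgs` package), and `apply` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// artifact and postProcessor are hypothetical, simplified stand-ins.
type artifact struct {
	name     string
	contents []byte
}

type postProcessor interface {
	Match(a artifact) bool
	Process(in []byte) ([]byte, error)
}

// header prepends a fixed line, but only to Go files.
type header struct{ line string }

func (h header) Match(a artifact) bool { return strings.HasSuffix(a.name, ".go") }

func (h header) Process(in []byte) ([]byte, error) {
	return append([]byte(h.line+"\n"), in...), nil
}

// apply offers the artifact to each processor in order; only processors
// whose Match returns true get to transform the contents.
func apply(a artifact, procs ...postProcessor) (artifact, error) {
	for _, p := range procs {
		if !p.Match(a) {
			continue
		}
		out, err := p.Process(a.contents)
		if err != nil {
			return a, fmt.Errorf("post-processing %s: %w", a.name, err)
		}
		a.contents = out
	}
	return a, nil
}

func main() {
	h := header{"// generated"}
	src, _ := apply(artifact{"a.go", []byte("package a\n")}, h)
	txt, _ := apply(artifact{"notes.txt", []byte("hi\n")}, h)
	fmt.Printf("%q\n%q\n", src.contents, txt.contents)
}
```
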
`PostProcessors` are registered with PG* similarly to `Modules`:

```go
g := pgs.Init(pgs.IncludeGo())
g.RegisterModule(some.NewModule())
g.RegisterPostProcessor(copyright.New("PG* Authors"))
```

## Protocol Buffer AST

While `protoc` ensures that all the dependencies required to generate a proto file are loaded in as descriptors, it's up to the protoc-plugins to recognize the relationships between them. To get around this, PG* constructs an abstract syntax tree (AST) of all the `Entities` loaded into the plugin. This AST is provided to every `Module` to facilitate code generation.

### Hierarchy

The hierarchy generated by the PG* `gatherer` is fully linked, starting at a top-level `Package` down to each individual `Field` of a `Message`. The AST can be represented with the following digraph:

<p align=center><img src="/testdata/ast/ast.png"></p>

A `Package` describes a set of `Files` loaded within the same namespace. As would be expected, a `File` represents a single proto file, which contains any number of `Message`, `Enum` or `Service` entities. An `Enum` describes an integer-based enumeration type, containing each individual `EnumValue`. A `Service` describes a set of RPC `Methods`, which in turn refer to their input and output `Messages`.

A `Message` can contain other nested `Messages` and `Enums` as well as each of its `Fields`. For non-scalar types, a `Field` may also reference its `Message` or `Enum` type. As a mechanism for achieving union types, a `Message` can also contain `OneOf` entities that refer to some of its `Fields`.

### Visitor Pattern

The structure of the AST can be fairly complex and unpredictable. Likewise, `Modules` are typically concerned with only a subset of the entities in the graph. To separate a `Module`'s algorithm from understanding and traversing the structure of the AST, PG* implements the `Visitor` pattern. Implementing this interface is straightforward and can greatly simplify code generation.

Two base `Visitor` structs are provided by PG* to simplify developing implementations. First, the `NilVisitor` returns an instance that short-circuits execution for all Entity types. This is useful when certain branches of the AST are not interesting to code generation. For instance, if the `Module` is only concerned with `Services`, it can use a `NilVisitor` as an anonymous field and only implement the desired interface methods:

```go
// serviceVisitor logs out each Method's name
type serviceVisitor struct {
	pgs.Visitor
	pgs.DebuggerCommon
}

func New(d pgs.DebuggerCommon) pgs.Visitor {
	return serviceVisitor{
		Visitor:        pgs.NilVisitor(),
		DebuggerCommon: d,
	}
}

// Passthrough Packages, Files, and Services. All other methods can be
// ignored since Services can only live in Files and Files can only live in a
// Package.
func (v serviceVisitor) VisitPackage(pgs.Package) (pgs.Visitor, error) { return v, nil }
func (v serviceVisitor) VisitFile(pgs.File) (pgs.Visitor, error)       { return v, nil }
func (v serviceVisitor) VisitService(pgs.Service) (pgs.Visitor, error) { return v, nil }

// VisitMethod logs out ServiceName#MethodName for m.
func (v serviceVisitor) VisitMethod(m pgs.Method) (pgs.Visitor, error) {
	v.Logf("%v#%v", m.Service().Name(), m.Name())
	return nil, nil
}
```

If access to deeply nested `Nodes` is desired, a `PassThroughVisitor` can be used instead. Unlike `NilVisitor` and as the name suggests, this implementation passes through all nodes instead of short-circuiting on the first unimplemented interface method. Setup of this type as an anonymous field is a bit more complex but avoids implementing each method of the interface explicitly:

```go
type fieldVisitor struct {
	pgs.Visitor
	pgs.DebuggerCommon
}

func New(d pgs.DebuggerCommon) pgs.Visitor {
	v := &fieldVisitor{DebuggerCommon: d}
	v.Visitor = pgs.PassThroughVisitor(v)
	return v
}

func (v *fieldVisitor) VisitField(f pgs.Field) (pgs.Visitor, error) {
	v.Logf("%v.%v", f.Message().Name(), f.Name())
	return nil, nil
}
```

Walking the AST with any `Visitor` is straightforward:

```go
v := visitor.New(d)
err := pgs.Walk(v, pkg)
```

All `Entity` types and `Package` can be passed into `Walk`, allowing for starting a `Visitor` lower than the top-level `Package` if desired.

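The short-circuiting behavior is easier to see in a self-contained sketch. Here `node`, `visitor`, and `walk` are hypothetical, stripped-down stand-ins for the PG* types: returning a `nil` visitor skips a node's children, while returning the visitor itself descends into them.

```go
package main

import "fmt"

// node is a hypothetical, simplified AST entry.
type node struct {
	kind, name string
	children   []node
}

// visitor mirrors the idea of the Visitor pattern with a single method.
type visitor interface {
	// Visit returns the visitor to use for n's children, or nil to skip them.
	Visit(n node) (visitor, error)
}

// walk visits n and, unless short-circuited, each of its children in order.
func walk(v visitor, n node) error {
	next, err := v.Visit(n)
	if err != nil || next == nil {
		return err
	}
	for _, c := range n.children {
		if err := walk(next, c); err != nil {
			return err
		}
	}
	return nil
}

// methodCollector descends through packages, files, and services, records
// method names, and skips every other branch.
type methodCollector struct{ seen *[]string }

func (m methodCollector) Visit(n node) (visitor, error) {
	switch n.kind {
	case "package", "file", "service":
		return m, nil
	case "method":
		*m.seen = append(*m.seen, n.name)
	}
	return nil, nil
}

func main() {
	tree := node{"package", "pkg", []node{{"file", "a.proto", []node{
		{"message", "Req", []node{{"field", "id", nil}}},
		{"service", "Svc", []node{{"method", "Get", nil}}},
	}}}}

	var seen []string
	if err := walk(methodCollector{&seen}, tree); err != nil {
		panic(err)
	}
	fmt.Println(seen)
}
```
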
## Build Context

`Modules` registered with the PG* `Generator` are initialized with an instance of `BuildContext` that encapsulates contextual paths, debugging, and parameter information.

### Output Paths

The `BuildContext`'s `OutputPath` method returns the output directory that the PG* plugin is targeting. This path defaults to `.`, which refers to the directory in which `protoc` is executed. This default behavior can be overridden by providing an `output_path` in the flag.

The `OutputPath` can be used to create file names for `Artifacts`, using `JoinPath(name ...string)`, which is essentially an alias for `filepath.Join(ctx.OutputPath(), name...)`. Manually tracking directories relative to the `OutputPath` can be tedious, especially if the names are dynamic. Instead, a `BuildContext` can manage these via `PushDir` and `PopDir`.

```go
ctx.OutputPath()                // foo
ctx.JoinPath("fizz", "buzz.go") // foo/fizz/buzz.go

ctx = ctx.PushDir("bar/baz")
ctx.OutputPath()        // foo/bar/baz
ctx.JoinPath("quux.go") // foo/bar/baz/quux.go

ctx = ctx.PopDir()
ctx.OutputPath() // foo
```

`ModuleBase` wraps these methods to mutate their underlying `BuildContexts`. Those methods should be used instead of the ones on the contained `BuildContext` directly.

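One way to picture how pushed directories accumulate is a linked list of path segments. This is only a sketch under that assumption; `pathContext` is a hypothetical type and says nothing about how `BuildContext` is implemented internally.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pathContext is a hypothetical context that tracks nested output directories.
type pathContext struct {
	parent *pathContext
	dir    string
}

// OutputPath joins this context's directory onto its parent's output path.
func (c *pathContext) OutputPath() string {
	if c.parent == nil {
		return c.dir
	}
	return filepath.Join(c.parent.OutputPath(), c.dir)
}

// JoinPath resolves name relative to the current output path.
func (c *pathContext) JoinPath(name ...string) string {
	return filepath.Join(append([]string{c.OutputPath()}, name...)...)
}

// PushDir returns a child context rooted at dir beneath the current path.
func (c *pathContext) PushDir(dir string) *pathContext {
	return &pathContext{parent: c, dir: dir}
}

// PopDir returns the parent context, or c itself when already at the root.
func (c *pathContext) PopDir() *pathContext {
	if c.parent == nil {
		return c
	}
	return c.parent
}

func main() {
	ctx := &pathContext{dir: "foo"}
	fmt.Println(ctx.JoinPath("fizz", "buzz.go"))

	ctx = ctx.PushDir("bar/baz")
	fmt.Println(ctx.JoinPath("quux.go"))

	ctx = ctx.PopDir()
	fmt.Println(ctx.OutputPath())
}
```
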
### Debugging

The `BuildContext` exposes a `DebuggerCommon` interface which provides utilities for logging, error checking, and assertions. `Log` and the formatted `Logf` print messages to `os.Stderr`, typically prefixed with the `Module` name. `Debug` and `Debugf` behave the same, but only print if enabled via the `DebugMode` or `DebugEnv` `InitOptions`.

`Fail` and `Failf` immediately stop execution of the protoc-plugin and cause `protoc` to fail generation with the provided message. `CheckErr` and `Assert` also fail with the provided messages if an error is passed in or if an expression evaluates to false, respectively.

Additional contextual prefixes can be provided by calling `Push` and `Pop` on the `BuildContext`. This behavior is similar to `PushDir` and `PopDir` but only impacts log messages. `ModuleBase` wraps these methods to mutate their underlying `BuildContexts`. Those methods should be used instead of the ones on the contained `BuildContext` directly.

### Parameters

The `BuildContext` also provides access to the pre-processed `Parameters` from the specified protoc flag. The only PG*-specific key expected is "output_path", which is utilized by a module's `BuildContext` for its `OutputPath`.

PG* permits mutating the `Parameters` via the `MutateParams` `InitOption`. By passing in a `ParamMutator` function here, these KV pairs can be modified or verified before the PG* workflow begins.

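As a rough illustration of what a mutator can do, the sketch below applies defaults to a parameter map before anything else reads it. The `params` and `mutator` types and the `defaultValue` helper are hypothetical stand-ins, not the `pgs` definitions.

```go
package main

import "fmt"

// params and mutator are hypothetical stand-ins for parsed plugin
// parameters and a function that adjusts them in place.
type params map[string]string

type mutator func(p params)

// defaultValue sets key to val only when the caller did not supply it.
func defaultValue(key, val string) mutator {
	return func(p params) {
		if _, ok := p[key]; !ok {
			p[key] = val
		}
	}
}

func main() {
	p := params{"foo": "bar"}

	mutators := []mutator{
		defaultValue("output_path", "."), // missing, so the default applies
		defaultValue("foo", "ignored"),   // present, so the value is kept
	}
	for _, m := range mutators {
		m(p)
	}

	fmt.Println(p["output_path"], p["foo"])
}
```
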
## Language-Specific Subpackages

While implemented in Go, PG* seeks to be language agnostic in what it can do. Therefore, beyond the pre-generated base descriptor types, PG* has no dependencies on the protoc-gen-go (PGG) package. However, there are many nuances that each language's protoc-plugin introduces that can be generalized. For instance, PGG package naming, import paths, and output paths are a complex interaction of the proto package name, the `go_package` file option, and parameters passed to protoc. While PG*'s core API should not be overloaded with many language-specific methods, subpackages can be provided that can operate on `Parameters` and `Entities` to derive the appropriate results.

PG* currently implements the [pgsgo](https://godoc.org/github.com/lyft/protoc-gen-star/lang/go/) subpackage to provide these utilities to plugins targeting the Go language. Future subpackages are planned to support a variety of languages.

## PG* Development & Make Targets

PG* seeks to provide all the tools necessary to rapidly and ergonomically extend and build on top of the Protocol Buffer IDL. Whether the goal is to modify the official protoc-gen-go output or create entirely new files and packages, this library should offer a user-friendly wrapper around the complexities of the PB descriptors and the protoc-plugin workflow.

### Setup

For developing on PG*, you should install the package within the `GOPATH`. PG* uses [glide][glide] for dependency management.

```sh
go get -u github.com/lyft/protoc-gen-star
cd $GOPATH/src/github.com/lyft/protoc-gen-star
make vendor
```

To upgrade dependencies, please make the necessary modifications in `glide.yaml` and run `glide update`.

### Linting & Static Analysis

To avoid style nits and also to enforce some best practices for Go packages, PG* requires passing `golint`, `go vet`, and `gofmt -s` for all code changes.

```sh
make lint
```

### Testing

PG* strives to have near 100% code coverage by unit tests. Most unit tests are run in parallel to catch potential race conditions. After generating the test data, there are three ways of running unit tests, each taking longer than the last but providing more insight into test coverage:

```sh
# run code generation for the data used by the tests
make testdata

# run unit tests without race detection or code coverage reporting
make quick

# run unit tests with race detection and code coverage
make tests

# run unit tests with race detection and generate a code coverage report, opening it in a browser
make cover
```

#### protoc-gen-debug

PG* comes with a specialized protoc-plugin, `protoc-gen-debug`. This plugin captures the CodeGeneratorRequest from a protoc execution and saves the serialized PB to disk. These files can be used as inputs to prevent calling protoc from tests.

### Documentation

Go is a self-documenting language and provides a built-in utility for viewing documentation locally: `godoc`. The following command starts a godoc server and opens a browser window to this package's documentation. If you see a 404 or unavailable page initially, just refresh.

```sh
make docs
```

### Demo

PG* comes with a "kitchen sink" example: [`protoc-gen-example`][pge]. This protoc plugin, built on top of PG*, prints out the target package's AST as a tree to stderr. This provides an end-to-end way of validating each of the nuanced types and nesting in PB descriptors:

```sh
# create the example PG*-based plugin
make bin/protoc-gen-example

# run protoc-gen-example against the demo protos
make testdata/generated
```

#### CI

PG* uses [TravisCI][travis] to validate all code changes. Please view the [configuration][travis.yml] for what tests are involved in the validation.

[glide]: http://glide.sh
[pgg]: https://github.com/golang/protobuf/tree/master/protoc-gen-go
[pge]: https://github.com/lyft/protoc-gen-star/tree/master/testdata/protoc-gen-example
[travis]: https://travis-ci.com/lyft/protoc-gen-star
[travis.yml]: https://github.com/lyft/protoc-gen-star/tree/master/.travis.yml
[module]: https://github.com/lyft/protoc-gen-star/blob/master/module.go
[pb]: https://developers.google.com/protocol-buffers/
[context]: https://github.com/lyft/protoc-gen-star/tree/master/build_context.go
[visitor]: https://github.com/lyft/protoc-gen-star/tree/master/node.go
[params]: https://github.com/lyft/protoc-gen-star/tree/master/parameters.go
[make]: https://github.com/lyft/protoc-gen-star/blob/master/Makefile
[single]: https://github.com/golang/protobuf/pull/40
