<p align="center">
<img align="center" src="data/dragon128x128.png?raw_true">
</p>

# Whole Program LLVM in Go

[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Build Status](https://travis-ci.org/SRI-CSL/gllvm.svg?branch=master)](https://travis-ci.org/SRI-CSL/gllvm)
[![Go Report Card](https://goreportcard.com/badge/github.com/SRI-CSL/gllvm)](https://goreportcard.com/report/github.com/SRI-CSL/gllvm)

**TL;DR:** A drop-in replacement for [wllvm](https://github.com/SRI-CSL/whole-program-llvm) that builds the
bitcode in parallel and is faster. A comparison between the two tools can be gleaned from building the [Linux kernel](https://github.com/SRI-CSL/gllvm/tree/master/examples/linux-kernel).

## Quick Start Comparison Table

| wllvm command/env variable  | gllvm command/env variable  |
|-----------------------------|-----------------------------|
|  wllvm                      | gclang                      |
|  wllvm++                    | gclang++                    |
|  extract-bc                 | get-bc                      |
|  wllvm-sanity-checker       | gsanity-check               |
|  LLVM_COMPILER_PATH         | LLVM_COMPILER_PATH          |
|  LLVM_CC_NAME      ...      | LLVM_CC_NAME          ...   |
|  WLLVM_CONFIGURE_ONLY       | WLLVM_CONFIGURE_ONLY        |
|  WLLVM_OUTPUT_LEVEL         | WLLVM_OUTPUT_LEVEL          |
|  WLLVM_OUTPUT_FILE          | WLLVM_OUTPUT_FILE           |
|  LLVM_COMPILER              | *not supported* (clang only)|
|  LLVM_GCC_PREFIX            | *not supported* (clang only)|
|  LLVM_DRAGONEGG_PLUGIN      | *not supported* (clang only)|

This project, `gllvm`, provides tools for building whole-program (or
whole-library) LLVM bitcode files from an unmodified C or C++
source package. It currently runs on `*nix` platforms such as Linux,
FreeBSD, and Mac OS X. It is a Go port of [wllvm](https://github.com/SRI-CSL/whole-program-llvm).

`gllvm` provides compiler wrappers that work in two
phases. The wrappers first invoke the compiler as normal. Then, for
each object file, they call a bitcode compiler to produce LLVM
bitcode. The wrappers then store the location of the generated bitcode
file in a dedicated section of the object file.  When object files are
linked together, the contents of the dedicated sections are
concatenated (so we don't lose the locations of any of the constituent
bitcode files). After the build completes, one can use a `gllvm`
utility to read the contents of the dedicated section and link all of
the bitcode into a single whole-program bitcode file. This utility
works for both executables and native libraries.

For more details see [wllvm](https://github.com/SRI-CSL/whole-program-llvm).

## Prerequisites

To install `gllvm` you need the Go language [tools](https://golang.org/doc/install).

To use `gllvm` you need `clang`/`clang++` and the LLVM tools `llvm-link` and `llvm-ar`.
`gllvm` is agnostic to the actual LLVM version. `gllvm` also relies on standard build
tools such as `objcopy` and `ld`.

## Installation

To install, run the following (making sure to include the trailing `...`):
```
go get github.com/SRI-CSL/gllvm/cmd/...
```
This should install four binaries: `gclang`, `gclang++`, `get-bc`, and `gsanity-check`
in the `$GOPATH/bin` directory.

## Usage

`gclang` and `gclang++` are the wrappers used to compile C and C++.  `get-bc` is used for
extracting the bitcode from a build product (either an object file, executable, library
or archive). `gsanity-check` can be used for detecting configuration errors.

Here is a simple example. Assuming that clang is in your `PATH`, you can build
bitcode for `pkg-config` as follows:

```
tar xf pkg-config-0.26.tar.gz
cd pkg-config-0.26
CC=gclang ./configure
make
```

This should produce the executable `pkg-config`. To extract the bitcode:
```
get-bc pkg-config
```

which will produce the bitcode module `pkg-config.bc`. For more on this example
see [here](https://github.com/SRI-CSL/gllvm/tree/master/examples/pkg-config).

## Advanced Configuration

If clang and the LLVM tools are not in your `PATH`, you will need to set some
environment variables.

 * `LLVM_COMPILER_PATH` can be set to the absolute path of the directory that
   contains the compiler and the other LLVM tools to be used.

 * `LLVM_CC_NAME` can be set if your clang compiler is not called `clang` but
   something like `clang-3.7`. Similarly `LLVM_CXX_NAME` can be used to
   describe what the C++ compiler is called. We also pay attention to the
   environment variables `LLVM_LINK_NAME` and `LLVM_AR_NAME` in an
   analogous way.
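
As a rough illustration of how these variables combine, here is a minimal Go sketch. It is not the `gllvm` implementation: the function `resolveTool` and its lookup rule (join the directory and the name when `LLVM_COMPILER_PATH` is set, otherwise rely on `PATH`) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveTool is a hypothetical helper, not part of gllvm.
// nameVar is the variable that may rename a tool (for example LLVM_CC_NAME);
// defaultName is used when that variable is unset or empty.
// ASSUMPTION: when LLVM_COMPILER_PATH is set, the tool is looked up in that
// directory; otherwise the bare name is returned and left to PATH lookup.
func resolveTool(getenv func(string) string, nameVar, defaultName string) string {
	name := getenv(nameVar)
	if name == "" {
		name = defaultName
	}
	if dir := getenv("LLVM_COMPILER_PATH"); dir != "" {
		return filepath.Join(dir, name)
	}
	return name
}

func main() {
	fmt.Println("C compiler:  ", resolveTool(os.Getenv, "LLVM_CC_NAME", "clang"))
	fmt.Println("C++ compiler:", resolveTool(os.Getenv, "LLVM_CXX_NAME", "clang++"))
	fmt.Println("linker:      ", resolveTool(os.Getenv, "LLVM_LINK_NAME", "llvm-link"))
	fmt.Println("archiver:    ", resolveTool(os.Getenv, "LLVM_AR_NAME", "llvm-ar"))
}
```

Passing the environment lookup as a function keeps the sketch easy to try with different settings without touching the real environment.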

Another useful, and sometimes necessary, environment variable is `WLLVM_CONFIGURE_ONLY`.

* `WLLVM_CONFIGURE_ONLY` can be set to anything. If it is set, `gclang`
   and `gclang++` behave like a normal C or C++ compiler. They do not
   produce bitcode.  Setting `WLLVM_CONFIGURE_ONLY` may prevent
   configuration errors caused by the unexpected production of hidden
   bitcode files. It is sometimes required when configuring a build.
   For example:
   ```
   WLLVM_CONFIGURE_ONLY=1 CC=gclang ./configure
   make
   ```

## Extracting the Bitcode

The `get-bc` tool is used to extract the bitcode from a build artifact, such as an executable, object file, thin archive, archive, or library. In the simplest use case, as seen above,
one simply does:

```
get-bc -o <name of bitcode file> <path to executable>
```
This will produce the desired bitcode file. The situation is similar for an object file.
For an archive or library, there is a choice as to whether you produce a bitcode module
or a bitcode archive. This choice is made by using the `-b` switch.

Another useful switch is `-m`, which, in addition to producing the
bitcode, also produces a manifest of the bitcode files
that made up the final product. As usual,

```
get-bc -h
```
will list all the command-line switches. Because `get-bc` uses Go's `flag` package,
the switches must precede the artifact path.
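
The ordering requirement follows from how Go's standard `flag` package parses arguments: it stops at the first argument that is not a flag. The sketch below shows this with a stand-in flag set. The function `parseArgs` and the flag set name are made up for this example; only the `-o` switch is taken from the text above.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs is a hypothetical stand-in for a get-bc style command line with
// a single -o switch. It returns the value of -o and the arguments that were
// left unparsed.
func parseArgs(args []string) (output string, rest []string, err error) {
	fs := flag.NewFlagSet("get-bc-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&output, "o", "", "name of the bitcode file")
	err = fs.Parse(args)
	return output, fs.Args(), err
}

func main() {
	// Switch first: -o is recognized and one positional argument remains.
	out, rest, _ := parseArgs([]string{"-o", "prog.bc", "prog"})
	fmt.Printf("output=%q rest=%q\n", out, rest)

	// Artifact first: parsing stops at "prog", so -o is never treated as a
	// switch and all three arguments are left over.
	out, rest, _ = parseArgs([]string{"prog", "-o", "prog.bc"})
	fmt.Printf("output=%q rest=%q\n", out, rest)
}
```

In the second call the output name stays empty, which is why a switch placed after the artifact path has no effect.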

## Preserving bitcode files in a store

Sometimes, because of pathological build systems, it can be useful
to preserve the bitcode files produced in a
build, either to prevent their deletion or to retrieve them later. If the
environment variable `WLLVM_BC_STORE` is set to the absolute path of
an existing directory,
then `gllvm` will copy each produced bitcode file into that directory.
The name of the copied bitcode file is the hash of the path to the
original bitcode file.  For convenience, when using both the manifest
feature of `get-bc` and the store, the manifest will contain both
the original path and the store path.
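
To make the naming scheme concrete, here is a minimal Go sketch of deriving a store file name from the original path. The text above does not say which hash is used, so SHA-256 with hex encoding is an assumption here, as are the helper names `storeName` and `storePath` and the example paths.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// storeName derives a file name from the path of the original bitcode file.
// ASSUMPTION: SHA-256 with hex encoding is chosen for illustration only.
func storeName(bcPath string) string {
	sum := sha256.Sum256([]byte(bcPath))
	return hex.EncodeToString(sum[:])
}

// storePath returns where the copy would live inside the store directory.
func storePath(storeDir, bcPath string) string {
	return filepath.Join(storeDir, storeName(bcPath))
}

func main() {
	// Both paths are hypothetical examples.
	fmt.Println(storePath("/tmp/bc-store", "/home/user/pkg/.main.o.bc"))
}
```

Hashing the full path rather than the base name means that two files called `.main.o.bc` in different directories do not collide in the store.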

## Debugging

The `gllvm` tools can show various levels of output to aid with debugging.
To show this output set the `WLLVM_OUTPUT_LEVEL` environment
variable to one of the following levels:

 * `ERROR`
 * `WARNING`
 * `INFO`
 * `DEBUG`

For example:
```
    export WLLVM_OUTPUT_LEVEL=DEBUG
```
Output will be directed to the standard error stream, unless you specify the
path of a logfile via the `WLLVM_OUTPUT_FILE` environment variable.

For example:
```
    export WLLVM_OUTPUT_FILE=/tmp/gllvm.log
```

## Dragons Begone

`gllvm` does not support the dragonegg plugin.

## Sanity Checking

Too many environment variables? Try doing a sanity check:

```
gsanity-check
```
It might point out what is wrong.

## Under the hood

Both the `wllvm` and `gllvm` toolsets do much the same thing, but the way
they do it is slightly different. The `gllvm` toolset's code base is
written in Go, and is largely derived from `wllvm`'s Python
code base.

Both generate object files and bitcode files using the
compiler. `wllvm` can use `gcc` and `dragonegg`; `gllvm` can only use
`clang`. The `gllvm` toolset does these two tasks in parallel, while
`wllvm` does them sequentially.  This, together with the slowness of
Python's `fork exec`-ing and its interpreted nature, accounts for the
large efficiency gap between the two toolsets.
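
The parallel arrangement can be pictured with a small Go sketch that starts the object-file job and the bitcode job at the same time and waits for both. This is only an illustration of the idea, not `gllvm`'s code: `runBoth` and `run` are hypothetical helpers, and the two `clang` command lines and file names in `main` are assumed for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sync"
)

// runBoth starts both jobs concurrently, waits for both to finish, and
// returns the error (if any) from each job.
func runBoth(compile, bitcode func() error) (compileErr, bitcodeErr error) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); compileErr = compile() }()
	go func() { defer wg.Done(); bitcodeErr = bitcode() }()
	wg.Wait()
	return compileErr, bitcodeErr
}

// run wraps an external command as a job, forwarding its output.
func run(name string, args ...string) func() error {
	return func() error {
		cmd := exec.Command(name, args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		return cmd.Run()
	}
}

func main() {
	// ASSUMPTION: hypothetical command lines for one source file.
	objErr, bcErr := runBoth(
		run("clang", "-c", "hello.c", "-o", "hello.o"),
		run("clang", "-emit-llvm", "-c", "hello.c", "-o", ".hello.o.bc"),
	)
	fmt.Println("object:", objErr, "bitcode:", bcErr)
}
```

A sequential wrapper would pay for the two compiler invocations one after the other. Here the elapsed time is roughly that of the slower job.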

Both inject the path of the bitcode version of the `.o` file into a
dedicated segment of the `.o` file itself. This segment is the same
across toolsets, so extracting the bitcode can be done by the
appropriate tool in either toolset. On `*nix` both toolsets use
`objcopy` to add the segment, while on OS X they use `ld`.

When the object files are linked into the resulting library or
executable, the bitcode path segments are appended, so the resulting
binary contains the paths of all the bitcode files that constitute the
binary.  To extract the sections the `gllvm` toolset uses the Go
packages `"debug/elf"` and `"debug/macho"`, while the `wllvm` toolset
uses `objdump` on `*nix`, and `otool` on OS X.
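
For readers curious what reading such a section looks like with `debug/elf`, here is a minimal sketch for ELF files only. It is not the `get-bc` source. The section name `.llvm_bc` and the newline-separated layout of the paths are assumptions for this example, and the helpers `splitPaths` and `bitcodePaths` are hypothetical.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// sectionName is an ASSUMPTION for this sketch; check your own binaries.
const sectionName = ".llvm_bc"

// splitPaths turns the raw section contents into one path per entry.
// ASSUMPTION: each object file contributed one newline-terminated path.
func splitPaths(data []byte) []string {
	var paths []string
	for _, line := range bytes.Split(data, []byte("\n")) {
		if p := string(bytes.TrimSpace(line)); p != "" {
			paths = append(paths, p)
		}
	}
	return paths
}

// bitcodePaths reads the dedicated section of an ELF artifact.
func bitcodePaths(artifact string) ([]string, error) {
	f, err := elf.Open(artifact)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sec := f.Section(sectionName)
	if sec == nil {
		return nil, fmt.Errorf("%s: no %s section", artifact, sectionName)
	}
	data, err := sec.Data()
	if err != nil {
		return nil, err
	}
	return splitPaths(data), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: sketch <elf-artifact>")
		return
	}
	paths, err := bitcodePaths(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

A Mach-O variant would follow the same shape using `debug/macho`.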

Both tools then use `llvm-link` or `llvm-ar` to combine the bitcode
files into the desired form.

## Customization under the hood

You can specify the exact version of `objcopy` and `ld` that `gllvm` uses
to manipulate the artifacts by setting the `GLLVM_OBJCOPY` and `GLLVM_LD`
environment variables. For more details of what's under the `gllvm` hood, try
```
gsanity-check -e
```

## Customizing the BitCode Generation (e.g. LTO)

In some situations it is desirable to pass certain flags to `clang` in the step that
produces the bitcode. This can be done by setting the
`LLVM_BITCODE_GENERATION_FLAGS` environment variable to the desired
flags, for example `"-flto -fwhole-program-vtables"`.
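
One way to picture the effect of this variable is a sketch that splits its value on whitespace and appends the pieces to the argument list of the bitcode step. This is an illustration only: the helper `bitcodeArgs`, the base arguments `-emit-llvm -c <src> -o <out>`, and the file names are assumptions, and the way `gllvm` actually orders or parses the flags may differ.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// bitcodeArgs builds a hypothetical argument list for the bitcode step.
// ASSUMPTION: the step is modelled as `-emit-llvm -c <src> -o <out>`
// followed by the extra flags.
func bitcodeArgs(extraFlags, src, out string) []string {
	args := []string{"-emit-llvm", "-c", src, "-o", out}
	// strings.Fields splits on whitespace; shell-style quoting inside the
	// variable is not handled by this sketch.
	return append(args, strings.Fields(extraFlags)...)
}

func main() {
	extra := os.Getenv("LLVM_BITCODE_GENERATION_FLAGS")
	fmt.Println(bitcodeArgs(extra, "hello.c", ".hello.o.bc"))
}
```

Under this model the extra flags reach only the bitcode step, so the ordinary object-file compilation is left unchanged.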

## Beware of link time optimization

If the package you are building happens to take advantage of recent `clang` developments
such as *link time optimization* (indicated by the presence of the compiler flag `-flto`), then
your build is unlikely to produce anything that `get-bc` will work on. This is to be
expected.

## Developer tools

Debugging usually boils down to looking in the logs, maybe adding a print statement or two.
There is an additional executable, not mentioned above, called `gparse` that gets installed
along with `gclang`, `gclang++`, `get-bc` and `gsanity-check`. `gparse` takes the command line
arguments to the compiler, and outputs how it parsed them. This can sometimes be helpful.

## License

`gllvm` is released under a BSD license. See the file [`LICENSE`](LICENSE) for details.

---

This material is based upon work supported by the National Science
Foundation under Grant
[ACI-1440800](http://www.nsf.gov/awardsearch/showAward?AWD_ID=1440800). Any
opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily
reflect the views of the National Science Foundation.