# Running and Benchmarking Halide Generators

## Overview

`RunGen` is a simple(ish) wrapper that allows an arbitrary Generator to be built
into a single executable that can be run directly from bash, without needing to
wrap it in your own custom `main()` driver. It also implements rudimentary
benchmarking and memory-usage tracking.

If you use the standard CMake rules for Generators, you get RunGen functionality
automatically. (If you use Make, you might need to add an extra rule or two to
your Makefile; all the examples in `apps/` already have these rules.)

For every `halide_library` (or `halide_library_from_generator`) rule, there is
an implicit `name.rungen` rule that generates an executable that wraps the
Generator library:

```
# In addition to defining a static library named "local_laplacian", this rule
# also implicitly defines an executable target named "local_laplacian.rungen"
halide_library(
    local_laplacian
    SRCS local_laplacian_generator.cc
)
```

You can build and run this like any other executable:

```
$ make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
...typical "usage" text...
```

To be useful, you need to pass in values for the Generator's inputs (and
locations for the output(s)) on the command line, of course. You can use the
`--describe` flag to see the names and expected types:

```
# ('make bin/local_laplacian.rungen && ' prefix omitted henceforth for clarity)
$ ./bin/local_laplacian.rungen --describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```
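
When scripting around RunGen, it can be handy to pull the input names out of the
`--describe` text. The following is a minimal sketch, assuming the output keeps
the layout shown above; `list_inputs` is a hypothetical helper (not part of
RunGen), and the sample text is simply copied from the example.

```sh
# Sample text copied from the --describe example above. In real use you would
# pipe the output of `./bin/local_laplacian.rungen --describe` instead.
describe_output='Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions'

# list_inputs (hypothetical helper): split each line on double quotes and
# print the quoted name of every line whose first field starts with "Input".
list_inputs() {
    awk -F'"' '$1 ~ /^[ \t]*Input / { print $2 }'
}

printf '%s\n' "$describe_output" | list_inputs
```

Each printed name is what goes on the left-hand side of a `name=value` argument.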

Warning: Outputs may have `$X` (where `X` is a small integer) appended to their
names in some cases (or, in the case of Generators that don't explicitly declare
outputs via `Output<>`, an autogenerated name of the form `fX`). If this
happens, don't forget to escape the `$` with a backslash as necessary. These are
both bugs we intend to fix; see https://github.com/halide/Halide/issues/2194
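
A quick way to check the quoting is to print the argument before handing it to
RunGen. This sketch uses a hypothetical output name, `output$1`, purely for
illustration, and shows two equivalent spellings:

```sh
# Hypothetical output name containing a literal '$'.
# Single quotes stop the shell from expanding "$1" as a positional parameter.
arg_single='output$1=/tmp/out.png'

# The same argument with a backslash escape inside double quotes.
arg_escaped="output\$1=/tmp/out.png"

# Both spellings pass the identical string to the program.
printf '%s\n' "$arg_single" "$arg_escaped"
```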

As a convenience, there is also an implicit target that builds-and-runs, named
simply "NAME.run":

```
# This is equivalent to "make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen"
$ make bin/local_laplacian.run
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]

# To pass arguments to local_laplacian.rungen, set the RUNARGS var:
$ make bin/local_laplacian.run RUNARGS=--describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```

Inputs are specified as `name=value` pairs, in any order. Scalar inputs are
specified in the typical text form, while buffer inputs (and outputs) are
specified via paths to image files. RunGen currently can read/write image files
in any format supported by `halide_image_io.h`; at this time, that means .png,
.jpg, .ppm, .pgm, and .tmp formats. (We plan to add .tiff and .mat (level 5) in
the future.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
$ display /tmp/out.png
```

You can also specify any scalar input as `default` or `estimate`, which will use
the default value specified for the input, or the value specified by
`set_estimate` for that input. (If the relevant value isn't set for that input,
a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=estimate beta=default output=/tmp/out.png
$ display /tmp/out.png
```

If you specify an input or output file format that doesn't match the required
type/dimensions for an argument (e.g., using an 8-bit PNG for an `Input<float>`,
or a grayscale image for a 3-dimensional input), RunGen will try to coerce the
inputs to something sensible; that said, it's hard to always get this right, so
warnings are **always** issued whenever an input or output is modified in any
way.

```
# This filter expects a 16-bit RGB image as input, but we're giving it an 8-bit grayscale image:
$ ./bin/local_laplacian.rungen input=../images/gray.png levels=8 alpha=1 beta=1 output=/tmp/out.png
Warning: Image for Input "input" has 2 dimensions, but this argument requires at least 3 dimensions: adding dummy dimensions of extent 1.
Warning: Image loaded for argument "input" is type uint8 but this argument expects type uint16; data loss may have occurred.
```

By default, we try to guess a suitable size for the output image(s), based
mainly on the size of the input images (if any); you can also specify explicit
output extents with `--output_extents`. (Note that the output extents are
subject to constraints already imposed by the particular Generator's logic, so
arbitrary values for `--output_extents` may produce runtime errors.)

```
# Constrain output extents to 100x200x3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
```
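
One practical caveat when typing these commands: square brackets are pattern
characters in most shells, so an unquoted `[100,200,3]` can be rewritten by the
shell if a matching file name happens to exist (and zsh, by default, reports an
error when nothing matches). The sketch below demonstrates the effect in a
scratch directory with a contrived file name; it does not call RunGen.

```sh
# Work in a scratch directory so the demonstration cannot touch real files.
scratch=$(mktemp -d)
# A contrived file whose name matches the bracket pattern used below.
touch "$scratch/--output_extents=1"

# Unquoted: the shell treats [100,200,3] as a character class and, because a
# matching file exists, replaces the argument with that file name.
unquoted=$(cd "$scratch" && printf '%s\n' --output_extents=[100,200,3])

# Quoted: the argument reaches the program unchanged.
quoted=$(cd "$scratch" && printf '%s\n' '--output_extents=[100,200,3]')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -rf "$scratch"
```

Wrapping bracketed arguments in single quotes sidesteps the issue.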

Sometimes you don't care what the particular element values for an input are
(e.g., for benchmarking), and you just want an image of a particular size; in
that case, you can use the `zero:[]` pseudo-file; it infers the _type_ from the
Generator, and initializes every element to zero:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
# (blurring an image of all zeroes isn't very interesting, of course)
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify arbitrary (nonzero) constants:

```
# Input is a 3-dimensional image with extent 123, 456, and 3,
# filled with a constant value of 42
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=constant:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can create identity images, in which only the diagonal elements
are 1 (the rest are 0), by using the `identity:[]` pseudo-file. Diagonal
elements are defined as those whose first two coordinates are equal.
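
To make the definition concrete, this small sketch prints one 2-D plane of such
an image: an element is 1 when its first two coordinates are equal and 0
otherwise. It only illustrates the definition; `identity_plane` is a
hypothetical helper and does not call RunGen.

```sh
# identity_plane N (hypothetical helper): print an N x N plane in which the
# element at (x, y) is 1 when x == y and 0 otherwise.
identity_plane() {
    awk -v n="$1" 'BEGIN {
        for (y = 0; y < n; y++)
            for (x = 0; x < n; x++)
                printf "%d%s", (x == y), (x < n - 1 ? " " : "\n")
    }'
}

identity_plane 3
```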

There's also a `random:SEED:[]` pseudo-file, which fills the image with uniform
noise based on a specific random-number seed:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=random:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Instead of specifying an explicit set of extents for a pseudo-input, you can use
the string `auto`, which will run a bounds query to choose a legal set of
extents for that input given the known output extents. (This is only useful when
used in conjunction with the `--output_extents` flag.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify `estimate` for the extents, which will use the estimate
values provided, typically (but not necessarily) for auto_schedule. (If there
aren't estimates for all of the buffer's dimensions, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can combine the two and specify `estimate_then_auto` for the extents, which
will attempt to use the estimate values; if a given input buffer has no
estimates, it will fall back to the bounds-query result for that input:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate_then_auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can use `estimate` for `--output_extents`, which will use the
estimate values for each output. (If there aren't estimates for all of the
outputs, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=estimate input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

If you don't want to explicitly specify all (or any!) of the input values, you
can use the `--default_input_buffers` and `--default_input_scalars` flags, which
provide wildcards for any omitted inputs:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] --default_input_buffers=random:0:auto --default_input_scalars=estimate output=/tmp/out.png
```

In this case, all input buffers will be sized according to a bounds query and
filled with random values generated from seed 0; all input scalars will be
initialized to their declared default values. (If they have no declared default
value, a zero of the appropriate type will be used.)

Note: `--default_input_buffers` can produce surprising sizes! For instance, any
input that uses `BoundaryConditions::repeat_edge` to wrap itself can legally be
set to almost any size, so you may legitimately get an input with extent=1 in
all dimensions; whether this is useful to you or not depends on the code. It's
highly recommended you do testing with the `--verbose` flag (which will log the
calculated sizes) to reality-check that you are getting what you expect,
especially for benchmarking.

A common case (especially for benchmarking) is to specify using estimates for
all inputs and outputs; for this, you can specify `--estimate_all`, which is
just a shortcut for
`--default_input_buffers=estimate_then_auto --default_input_scalars=estimate --output_extents=estimate`.

## Benchmarking

To run a benchmark, use the `--benchmarks=all` flag:

```
$ ./bin/local_laplacian.rungen --benchmarks=all input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[1920,1080,3]
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```
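
The throughput figure can be sanity-checked by hand. The sample numbers are
consistent with dividing the output size, counted in units of 2^20 pixels, by
the best time per iteration, for a 1920x1080 output. Both the unit and the
output size are inferred from the sample numbers rather than taken from the
RunGen source, so treat them as assumptions; `throughput` is a hypothetical
helper.

```sh
# throughput WIDTH HEIGHT SECONDS (hypothetical helper): output megapixels
# (assumed here to mean 2^20 pixels) divided by the best time per iteration.
throughput() {
    awk -v w="$1" -v h="$2" -v t="$3" 'BEGIN {
        mpix = (w * h) / (1024 * 1024)
        printf "%.5f mpix, %.2f mpix/sec\n", mpix, mpix / t
    }'
}

throughput 1920 1080 0.0494629
```

With the sample timing this prints `1.97754 mpix, 39.98 mpix/sec`, which agrees
with the reported throughput to the digits shown.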

You can use `--default_input_buffers` and `--default_input_scalars` here as
well:

```
$ ./bin/local_laplacian.rungen --benchmarks=all --default_input_buffers --default_input_scalars --output_extents=estimate
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```
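
To compare timings across image sizes, the benchmark command can be wrapped in a
loop. The sketch below is a dry run by default (it prints each command instead
of executing it); the `sweep` function and the `RUNGEN` variable are
hypothetical names, and whether a given extent is legal depends on the
Generator, as noted earlier for `--output_extents`.

```sh
# Dry run by default: prints each command instead of executing it.
# Set RUNGEN=./bin/local_laplacian.rungen to actually run the benchmarks.
RUNGEN=${RUNGEN:-echo ./bin/local_laplacian.rungen}

# sweep EXTENTS... (hypothetical helper): benchmark once per extents string,
# using the same extents for the zero-filled input and for the output.
sweep() {
    for extents in "$@"; do
        # $RUNGEN is intentionally unquoted so "echo <path>" splits into words.
        $RUNGEN --benchmarks=all "input=zero:$extents" levels=8 alpha=1 beta=1 \
            "--output_extents=$extents"
    done
}

# The extents are quoted so the shell does not treat the brackets as a pattern.
sweep '[640,480,3]' '[1920,1080,3]'
```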

Note: `halide_benchmark.h` is known to be inaccurate for GPU filters; see
https://github.com/halide/Halide/issues/2278

## Measuring Memory Usage

To track memory usage, use the `--track_memory` flag, which measures the
high-water-mark of CPU memory usage.

```
$ ./bin/local_laplacian.rungen --track_memory input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[1920,1080,3]
Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.
```
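
For comparing runs, it can help to normalize the reported figure. The sketch
below parses a line in the format shown above and derives the peak in MiB and in
bytes per output pixel; `memory_summary` is a hypothetical helper, and treating
1 mpix as 2^20 pixels is an assumption.

```sh
# memory_summary (hypothetical helper): read RunGen output on stdin and
# summarize the "Maximum Halide memory" line.
memory_summary() {
    awk '/^Maximum Halide memory:/ {
        bytes = $4                      # e.g. 82688420
        pixels = $9 * 1024 * 1024       # assumes 1 mpix = 2^20 pixels
        printf "%.1f MiB, %.1f bytes per output pixel\n",
               bytes / (1024 * 1024), bytes / pixels
    }'
}

# Sample line copied from the example above.
printf '%s\n' 'Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.' |
    memory_summary
```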

Warning: `--track_memory` may degrade performance; don't combine it with
`--benchmarks` or expect meaningful timing measurements when using it.

## Using RunGen in Make

To add support for RunGen to your Makefile, you need to add rules something like
this (see `apps/support/Makefile.inc` for an example):

```
HALIDE_DISTRIB ?= /path/to/halide/distrib/folder

# Note: recipe lines must begin with a tab character, not spaces.
$(BIN)/RunGenMain.o: $(HALIDE_DISTRIB)/tools/RunGenMain.cpp
	@mkdir -p $(@D)
	@$(CXX) -c $< $(CXXFLAGS) $(LIBPNG_CXX_FLAGS) $(LIBJPEG_CXX_FLAGS) -I$(BIN) -o $@

.PRECIOUS: $(BIN)/%.rungen
$(BIN)/%.rungen: $(BIN)/%.a $(BIN)/%.registration.cpp $(BIN)/RunGenMain.o
	$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBPNG_LIBS) $(LIBJPEG_LIBS) $(LDFLAGS)

RUNARGS ?=

$(BIN)/%.run: $(BIN)/%.rungen
	@$(CURDIR)/$< $(RUNARGS)
```

Note that the `%.registration.cpp` file is created by running a generator and
specifying `registration` in the comma-separated list of files to emit; it is
also generated by default if `-e` is not used on the generator command line.

## Known Issues & Caveats

- If your Generator uses `define_extern()`, you must have all link-time
  dependencies declared properly via `FILTER_DEPS`; otherwise, you'll fail to
  link.
- The code does its best to detect when inputs or outputs need to be
  chunky/interleaved (rather than planar), but in unusual cases it might guess
  wrong; if your Generator uses buffers with unusual stride setups, RunGen might
  fail at runtime. (If this happens, please file a bug!)
- The code for deducing good output sizes is rudimentary and needs improvement;
  it will sometimes make bad decisions that prevent the filter from executing.
  (If this happens, please file a bug!)