# Running and Benchmarking Halide Generators

## Overview

`RunGen` is a simple(ish) wrapper that allows an arbitrary Generator to be built
into a single executable that can be run directly from bash, without needing to
wrap it in your own custom `main()` driver. It also implements rudimentary
benchmarking and memory-usage tracking.

If you use the standard CMake rules for Generators, you get RunGen functionality
automatically. (If you use Make, you might need to add an extra rule or two to
your Makefile; all the examples in `apps/` already have these rules.)

For every `halide_library` (or `halide_library_from_generator`) rule, there is
an implicit `name.rungen` rule that generates an executable that wraps the
Generator library:

```
# In addition to defining a static library named "local_laplacian", this rule
# also implicitly defines an executable target named "local_laplacian.rungen"
halide_library(
    local_laplacian
    SRCS local_laplacian_generator.cc
)
```

You can build and run this like any other executable:

```
$ make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
...typical "usage" text...
```

To be useful, you need to pass in values for the Generator's inputs (and
locations for the outputs) on the command line.
You can use the
`--describe` flag to see the names and expected types:

```
# ('make bin/local_laplacian.rungen && ' prefix omitted henceforth for clarity)
$ ./bin/local_laplacian.rungen --describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```

Warning: Outputs may have `$X` (where `X` is a small integer) appended to their
names in some cases (or, in the case of Generators that don't explicitly declare
outputs via `Output<>`, an autogenerated name of the form `fX`). If this
happens, don't forget to escape the `$` with a backslash as necessary. These are
both bugs we intend to fix; see https://github.com/halide/Halide/issues/2194

As a convenience, there is also an implicit target that builds and runs, named
simply `NAME.run`:

```
# This is equivalent to "make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen"
$ make bin/local_laplacian.run
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]

# To pass arguments to local_laplacian.rungen, set the RUNARGS var:
$ make bin/local_laplacian.run RUNARGS=--describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```

Inputs are specified as `name=value` pairs, in any order. Scalar inputs are
specified in the typical text form, while buffer inputs (and outputs) are
specified via paths to image files. RunGen currently can read/write image files
in any format supported by `halide_image_io.h`; at this time, that means .png,
.jpg, .ppm, .pgm, and .tmp formats.
(We plan to add .tiff and .mat (level 5) in the
future.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
$ display /tmp/out.png
```

You can also specify any scalar input as `default` or `estimate`, which will use
the default value specified for the input, or the value specified by
`set_estimate` for that input. (If the relevant value isn't set for that input,
a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=estimate beta=default output=/tmp/out.png
$ display /tmp/out.png
```

If you specify an input or output file format that doesn't match the required
type/dimensions for an argument (e.g., using an 8-bit PNG for an `Input<float>`,
or a grayscale image for a 3-dimensional input), RunGen will try to coerce the
inputs to something sensible; that said, it's hard to always get this right, so
warnings are **always** issued whenever an input or output is modified in any
way.

```
# This filter expects a 16-bit RGB image as input, but we're giving it an 8-bit grayscale image:
$ ./bin/local_laplacian.rungen input=../images/gray.png levels=8 alpha=1 beta=1 output=/tmp/out.png
Warning: Image for Input "input" has 2 dimensions, but this argument requires at least 3 dimensions: adding dummy dimensions of extent 1.
Warning: Image loaded for argument "input" is type uint8 but this argument expects type uint16; data loss may have occurred.
```

By default, we try to guess a suitable size for the output image(s), based
mainly on the size of the input images (if any); you can also specify explicit
output extents. (Note that output extents are subject to constraints already
imposed by the particular Generator's logic, so arbitrary values for
`--output_extents` may produce runtime errors.)

```
# Constrain output extents to 100x200x3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Sometimes you don't care what the particular element values for an input are
(e.g. for benchmarking), and you just want an image of a particular size; in
that case, you can use the `zero:[]` pseudo-file, which infers the _type_ from
the Generator and initializes every element to zero:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
# (blurring an image of all zeroes isn't very interesting, of course)
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify arbitrary (nonzero) constants:

```
# Input is a 3-dimensional image with extent 123, 456, and 3,
# filled with a constant value of 42
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=constant:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can create identity images, where only the diagonal elements are
1s (the rest are 0s), by using `identity:[]`. Diagonal elements are defined as
those whose first two coordinates are equal.

There's also a `random:SEED:[]` pseudo-file, which fills the image with uniform
noise based on a specific random-number seed:

```
# Input is a 3-dimensional image with extent 123, 456, and 3
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=random:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Instead of specifying an explicit set of extents for a pseudo-input, you can use
the string `auto`, which will run a bounds query to choose a legal set of
extents for that input given the known output extents. (This is only useful when
used in conjunction with the `--output_extents` flag.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can also specify `estimate` for the extents, which will use the estimate
values provided, typically (but not necessarily) for auto_schedule. (If there
aren't estimates for all of the buffer's dimensions, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate levels=8 alpha=1 beta=1 output=/tmp/out.png
```

You can combine the two and specify `estimate_then_auto` for the extents, which
will attempt to use the estimate values; if a given input buffer has no
estimates, it will fall back to the bounds-query result for that input:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate_then_auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

Similarly, you can use `estimate` for `--output_extents`, which will use the
estimate values for each output. (If there aren't estimates for all of the
outputs, a runtime error occurs.)

```
$ ./bin/local_laplacian.rungen --output_extents=estimate input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
```

If you don't want to explicitly specify all (or any!) of the input values, you
can use the `--default_input_buffers` and `--default_input_scalars` flags, which
provide wildcards for any omitted inputs:

```
$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] --default_input_buffers=random:0:auto --default_input_scalars=estimate output=/tmp/out.png
```

In this case, all input buffers will be sized according to bounds query, and
filled with random data generated from seed 0; all input scalars will be
initialized to their declared default values. (If they have no declared default
value, a zero of the appropriate type will be used.)

Note: `--default_input_buffers` can produce surprising sizes!
For instance, any
input that uses `BoundaryConditions::repeat_edge` to wrap itself can legally be
set to almost any size, so you may legitimately get an input with extent=1 in
all dimensions; whether this is useful to you or not depends on the code. It's
highly recommended you do testing with the `--verbose` flag (which will log the
calculated sizes) to reality-check that you are getting what you expect,
especially for benchmarking.

A common case (especially for benchmarking) is to specify using estimates for
all inputs and outputs; for this, you can specify `--estimate_all`, which is
just a shortcut for
`--default_input_buffers=estimate_then_auto --default_input_scalars=estimate --output_extents=estimate`.

## Benchmarking

To run a benchmark, use the `--benchmarks=all` flag:

```
$ ./bin/local_laplacian.rungen --benchmarks=all input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```

You can use `--default_input_buffers` and `--default_input_scalars` here as
well:

```
$ ./bin/local_laplacian.rungen --benchmarks=all --default_input_buffers --default_input_scalars --output_extents=estimate
Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
Best output throughput is 39.9802 mpix/sec.
```

Note: `halide_benchmark.h` is known to be inaccurate for GPU filters; see
https://github.com/halide/Halide/issues/2278

## Measuring Memory Usage

To track memory usage, use the `--track_memory` flag, which measures the
high-water-mark of CPU memory usage.

```
$ ./bin/local_laplacian.rungen --track_memory input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.
```

Warning: `--track_memory` may degrade performance; don't combine it with
`--benchmarks=all` or expect meaningful timing measurements when using it.

## Using RunGen in Make

To add support for RunGen to your Makefile, you need to add rules something like
this (see `apps/support/Makefile.inc` for an example):

```
HALIDE_DISTRIB ?= /path/to/halide/distrib/folder

$(BIN)/RunGenMain.o: $(HALIDE_DISTRIB)/tools/RunGenMain.cpp
	@mkdir -p $(@D)
	@$(CXX) -c $< $(CXXFLAGS) $(LIBPNG_CXX_FLAGS) $(LIBJPEG_CXX_FLAGS) -I$(BIN) -o $@

.PRECIOUS: $(BIN)/%.rungen
$(BIN)/%.rungen: $(BIN)/%.a $(BIN)/%.registration.cpp $(BIN)/RunGenMain.o
	$(CXX) $(CXXFLAGS) $^ -o $@ $(LIBPNG_LIBS) $(LIBJPEG_LIBS) $(LDFLAGS)

RUNARGS ?=

$(BIN)/%.run: $(BIN)/%.rungen
	@$(CURDIR)/$< $(RUNARGS)
```

Note that the `%.registration.cpp` file is created by running a generator and
specifying `registration` in the comma-separated list of files to emit; these
are also generated by default if `-e` is not used on the generator command line.

## Known Issues & Caveats

- If your Generator uses `define_extern()`, you must have all link-time
  dependencies declared properly via `FILTER_DEPS`; otherwise, you'll fail to
  link.
- The code does its best to detect when inputs or outputs need to be
  chunky/interleaved (rather than planar), but in unusual cases it might guess
  wrong; if your Generator uses buffers with unusual stride setups, RunGen might
  fail at runtime. (If this happens, please file a bug!)
- The code for deducing good output sizes is rudimentary and needs to be
  made smarter; it will sometimes make bad decisions which will prevent the
  filter from executing. (If this happens, please file a bug!)
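
## Scripting Benchmark Runs

When comparing schedules, it can be handy to pull the timing figure out of the
benchmark report rather than reading it by eye. The following is a minimal
sketch, assuming the report line keeps the format shown in the Benchmarking
section above; `best_sec_per_iter` is a hypothetical helper name and is not part
of RunGen.

```shell
#!/bin/sh
# Sketch: extract the best-case sec/iter value from RunGen's benchmark report.
# Assumes the report line looks like the one shown in this document:
#   Benchmark for NAME produces best case of X sec/iter, over N blocks of M iterations.
# best_sec_per_iter is a hypothetical helper, not something RunGen provides.
best_sec_per_iter() {
    # Print only the captured number; lines that don't match are dropped.
    sed -n 's/^Benchmark for .* produces best case of \([0-9.eE+-]*\) sec\/iter.*$/\1/p'
}

# Against a real binary you would pipe the report in, for example:
#   ./bin/local_laplacian.rungen --benchmarks=all --estimate_all | best_sec_per_iter

# Demonstration using the sample report line from this document:
sample='Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.'
printf '%s\n' "$sample" | best_sec_per_iter
```

Because the helper only reads standard input, it can be tried on saved logs
without rebuilding anything. If the report format differs in your version of
Halide, the pattern will need adjusting.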