| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| .github/workflows/ | 05-Sep-2020 | - | 880 | 763 |
| apps/ | 03-May-2022 | - | 47,822 | 35,500 |
| cmake/ | 05-Sep-2020 | - | 880 | 732 |
| dependencies/ | 03-May-2022 | - | | |
| doc/ | 03-May-2022 | - | | |
| packaging/ | 03-May-2022 | - | 164 | 138 |
| python_bindings/ | 03-May-2022 | - | 11,025 | 6,848 |
| src/ | 03-May-2022 | - | 180,924 | 132,287 |
| test/ | 03-May-2022 | - | 65,040 | 48,510 |
| tools/ | 03-May-2022 | - | 6,358 | 4,963 |
| tutorial/ | 03-May-2022 | - | 6,954 | 2,992 |
| util/ | 03-May-2022 | - | 5,200 | 4,755 |
| .clang-format | 05-Sep-2020 | 1.5 KiB | 52 | 50 |
| .clang-format-ignore | 05-Sep-2020 | 265 | 15 | 14 |
| .clang-tidy | 05-Sep-2020 | 469 | 8 | 7 |
| .gitattributes | 05-Sep-2020 | 342 | 14 | 11 |
| .gitignore | 05-Sep-2020 | 1.1 KiB | 95 | 85 |
| CODE_OF_CONDUCT.md | 05-Sep-2020 | 3.5 KiB | 64 | 54 |
| Makefile | 05-Sep-2020 | 92.6 KiB | 2,290 | 1,755 |
| README.md | 05-Sep-2020 | 19.1 KiB | 512 | 379 |
| README_cmake.md | 05-Sep-2020 | 66.5 KiB | 1,222 | 989 |
| README_rungen.md | 05-Sep-2020 | 12.1 KiB | 284 | 222 |
| README_webassembly.md | 05-Sep-2020 | 7.5 KiB | 164 | 131 |

README.md

# Halide

Halide is a programming language designed to make it easier to write
high-performance image and array processing code on modern machines. Halide
currently targets:

- CPU architectures: X86, ARM, MIPS, Hexagon, PowerPC
- Operating systems: Linux, Windows, Mac OS X, Android, iOS, Qualcomm QuRT
- GPU Compute APIs: CUDA, OpenCL, OpenGL, OpenGL Compute Shaders, Apple Metal,
  Microsoft DirectX 12

Rather than being a standalone programming language, Halide is embedded in C++.
This means you write C++ code that builds an in-memory representation of a
Halide pipeline using Halide's C++ API. You can then compile this representation
to an object file, or JIT-compile it and run it in the same process. Halide also
offers Python bindings that fully support writing Halide embedded in Python
without C++.

For more detail about what Halide is, see http://halide-lang.org.

For API documentation, see http://halide-lang.org/docs.

To see some example code, look in the `tutorial` directory.

If you've acquired a full source distribution and want to build Halide, see the
notes below.

# Building Halide with Make

### TL;DR

Have llvm-9.0 (or greater) installed and run `make` in the root directory of the
repository (where this README is).

### Acquiring LLVM

At any point in time, building Halide requires either the latest stable version
of LLVM, the previous stable version of LLVM, or trunk. At the time of writing,
this means versions 10.0 and 9.0 are supported, but 8.0 is not. The commands
`llvm-config` and `clang` must be somewhere in the path.
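
The version rule above can be checked before starting a build. The following is a minimal sketch, assuming that `llvm-config --version` prints a string such as `10.0.0`; the helper name `llvm_version_supported` is hypothetical and is not part of Halide's build system.

```shell
# Hypothetical helper (not part of Halide): succeed when the given LLVM
# version string has a major version of 9 or newer, per the rule above.
llvm_version_supported() {
    major=${1%%.*}                 # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;   # not a plain number
    esac
    [ "$major" -ge 9 ]
}

# Example use: only query llvm-config when it is actually on the PATH.
if command -v llvm-config >/dev/null 2>&1; then
    if llvm_version_supported "$(llvm-config --version)"; then
        echo "LLVM version looks usable"
    else
        echo "LLVM version is too old for this Halide checkout"
    fi
fi
```

The check only looks at the major version, so a trunk build that reports something like `11.0.0git` passes as well.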

If your OS does not have packages for LLVM, you can find binaries for it at
http://llvm.org/releases/download.html. Download an appropriate package and then
either install it, or at least put the `bin` subdirectory in your path. (This
works well on OS X and Ubuntu.)

If you want to build it yourself, first check it out from GitHub:

```
% git clone --depth 1 --branch llvmorg-10.0.0 https://github.com/llvm/llvm-project.git
```

(If you want to build LLVM 9.x, use branch `release/9.x`; for current trunk, use
`master`.)

Then build it like so:

```
% mkdir llvm-build
% cd llvm-build
% cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../llvm-install \
        -DLLVM_ENABLE_PROJECTS="clang;lld;clang-tools-extra" \
        -DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;Mips;Hexagon" \
        -DLLVM_ENABLE_TERMINFO=OFF -DLLVM_ENABLE_ASSERTIONS=ON \
        -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_BUILD_32_BITS=OFF \
        ../llvm-project/llvm
% cmake --build . --target install
```

Then point Halide to it:

```
export LLVM_CONFIG=<path to llvm>/llvm-install/bin/llvm-config
```

Note that you _must_ add `clang` to `LLVM_ENABLE_PROJECTS`; adding `lld` to
`LLVM_ENABLE_PROJECTS` is only required when using WebAssembly, and adding
`clang-tools-extra` is only necessary if you plan to contribute code to Halide
(so that you can run clang-tidy on your pull requests). We recommend enabling
both in all cases, to simplify builds. You can disable exception handling (EH)
and RTTI if you don't want the Python bindings.

### Building Halide with make

With `LLVM_CONFIG` set (or `llvm-config` in your path), you should be able to
just run `make` in the root directory of the Halide source tree.
`make run_tests` will run the JIT test suite, and `make test_apps` will make
sure all the apps compile and run (but won't check their output).

There is no `make install` yet. If you want to make an install package, run
`make distrib`.

### Building Halide out-of-tree with make

If you wish to build Halide in a separate directory, you can do that like so:

    % cd ..
    % mkdir halide_build
    % cd halide_build
    % make -f ../Halide/Makefile

# Building Halide with CMake

### macOS and Linux

Follow the above instructions to build LLVM or acquire a suitable binary
release. Then create a separate build folder for Halide and run CMake, pointing
it to your LLVM installation.

```
% mkdir Halide-build
% cd Halide-build
% cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_DIR=/path/to/llvm-install/lib/cmake/llvm /path/to/Halide
% cmake --build .
```

`LLVM_DIR` should be the folder in the LLVM installation or build tree that
contains `LLVMConfig.cmake`. It is not required if you have a suitable
system-wide version installed. If you have multiple system-wide versions
installed, you can specify the version with `HALIDE_REQUIRE_LLVM_VERSION`. Add
`-G Ninja` if you prefer to build with the Ninja generator.
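
Since a wrong `LLVM_DIR` is an easy mistake to make, it can help to confirm that `LLVMConfig.cmake` is really there before configuring. The sketch below assumes the `lib/cmake/llvm` layout used in the example command above; `llvm_dir_for_prefix` is a hypothetical helper name, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helper: print the directory to pass as LLVM_DIR for a given
# LLVM install prefix, or fail if LLVMConfig.cmake is not where we expect it.
llvm_dir_for_prefix() {
    candidate="$1/lib/cmake/llvm"
    if [ -f "$candidate/LLVMConfig.cmake" ]; then
        printf '%s\n' "$candidate"
    else
        echo "no LLVMConfig.cmake under $candidate" >&2
        return 1
    fi
}

# Example use (placeholder paths):
#   cmake -DCMAKE_BUILD_TYPE=Release \
#         -DLLVM_DIR="$(llvm_dir_for_prefix /path/to/llvm-install)" \
#         /path/to/Halide
```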

### Windows

We recommend building with MSVC 2019, but MSVC 2017 is also supported. Be sure
to install the CMake Individual Component in the Visual Studio 2019 installer.
For older versions of Visual Studio, do not install the CMake tools, but instead
acquire CMake and Ninja from their respective project websites.

These instructions start from the `D:` drive. We assume this git repo is cloned
to `D:\Halide`. We also assume that your shell environment is set up correctly.
For a 64-bit build, run:

```
D:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
```

For a 32-bit build, run:

```
D:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64_x86
```

#### Managing dependencies with vcpkg

The best way to get compatible dependencies on Windows is to use
[vcpkg](https://github.com/Microsoft/vcpkg). Install it like so:

```
D:\> git clone https://github.com/Microsoft/vcpkg.git
D:\> cd vcpkg
D:\vcpkg> .\bootstrap-vcpkg.bat
D:\vcpkg> .\vcpkg integrate install
...
CMake projects should use: "-DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake"
```

Then install the libraries. For a 64-bit build, run:

```
D:\vcpkg> .\vcpkg install libpng:x64-windows libjpeg-turbo:x64-windows llvm[target-all,clang-tools-extra]:x64-windows
```

To support 32-bit builds, also run:

```
D:\vcpkg> .\vcpkg install libpng:x86-windows libjpeg-turbo:x86-windows llvm[target-all,clang-tools-extra]:x86-windows
```

#### Building Halide

Create a separate build tree and call CMake with vcpkg's toolchain. This will
build in either 32-bit or 64-bit depending on the environment script (`vcvars`)
that was run earlier.

```
D:\> md Halide-build
D:\> cd Halide-build
D:\Halide-build> cmake -G Ninja ^
                       -DCMAKE_BUILD_TYPE=Release ^
                       -DCMAKE_TOOLCHAIN_FILE=D:/vcpkg/scripts/buildsystems/vcpkg.cmake ^
                       ..\Halide
```

**Note:** If building with Python bindings on 32-bit (enabled by default), be
sure to point CMake to the installation path of a 32-bit Python 3. You can do
this by specifying, for example:
`"-DPython3_ROOT_DIR=C:\Program Files (x86)\Python38-32"`.

Then run the build with:

```
D:\Halide-build> cmake --build . --config Release -j %NUMBER_OF_PROCESSORS%
```

To run all the tests:

```
D:\Halide-build> ctest -C Release
```

Subsets of the tests can be selected with `-L` and include `correctness`,
`python`, `error`, and the other directory names under `/test`.

#### Building LLVM (optional)

Follow these steps if you want to build LLVM yourself. First, download LLVM's
sources (these instructions use the latest 10.0 release):

```
D:\> git clone --depth 1 --branch llvmorg-10.0.0 https://github.com/llvm/llvm-project.git
```

For a 64-bit build, run:

```
D:\> md llvm-build
D:\> cd llvm-build
D:\llvm-build> cmake -G Ninja ^
                     -DCMAKE_BUILD_TYPE=Release ^
                     -DCMAKE_INSTALL_PREFIX=../llvm-install ^
                     -DLLVM_ENABLE_PROJECTS=clang;lld;clang-tools-extra ^
                     -DLLVM_ENABLE_TERMINFO=OFF ^
                     -DLLVM_TARGETS_TO_BUILD=X86;ARM;NVPTX;AArch64;Mips;Hexagon ^
                     -DLLVM_ENABLE_ASSERTIONS=ON ^
                     -DLLVM_ENABLE_EH=ON ^
                     -DLLVM_ENABLE_RTTI=ON ^
                     -DLLVM_BUILD_32_BITS=OFF ^
                     ..\llvm-project\llvm
```

For a 32-bit build, run:

```
D:\> md llvm32-build
D:\> cd llvm32-build
D:\llvm32-build> cmake -G Ninja ^
                       -DCMAKE_BUILD_TYPE=Release ^
                       -DCMAKE_INSTALL_PREFIX=../llvm32-install ^
                       -DLLVM_ENABLE_PROJECTS=clang;lld;clang-tools-extra ^
                       -DLLVM_ENABLE_TERMINFO=OFF ^
                       -DLLVM_TARGETS_TO_BUILD=X86;ARM;NVPTX;AArch64;Mips;Hexagon ^
                       -DLLVM_ENABLE_ASSERTIONS=ON ^
                       -DLLVM_ENABLE_EH=ON ^
                       -DLLVM_ENABLE_RTTI=ON ^
                       -DLLVM_BUILD_32_BITS=ON ^
                       ..\llvm-project\llvm
```

Finally, run:

```
D:\llvm-build> cmake --build . --config Release --target install -j %NUMBER_OF_PROCESSORS%
```

You can substitute `Debug` for `Release` in the above `cmake` commands if you
want a debug build. Make sure to add `-DLLVM_DIR=D:/llvm-install/lib/cmake/llvm`
to the Halide CMake command to override `vcpkg`'s LLVM.

**MSBuild:** If you want to build LLVM with MSBuild instead of Ninja, use
`-G "Visual Studio 16 2019" -Thost=x64 -A x64` or
`-G "Visual Studio 16 2019" -Thost=x64 -A Win32` in place of `-G Ninja`.

#### If all else fails...

Do what the build-bots do: https://buildbot.halide-lang.org/master/#/builders

If the column that best matches your system is red, then maybe things aren't
just broken for you. If it's green, then you can click the "stdio" links in the
latest build to see what commands the build bots run, and what the output was.

# Some useful environment variables

`HL_TARGET=...` will set Halide's AOT compilation target.

`HL_JIT_TARGET=...` will set Halide's JIT compilation target.

`HL_DEBUG_CODEGEN=1` will print out pseudocode for what Halide is compiling.
Higher numbers will print more detail.

`HL_NUM_THREADS=...` specifies the number of threads to create for the thread
pool. When the async scheduling directive is used, more threads than this number
may be required and thus allocated. A maximum of 256 threads is allowed. (By
default, the number of cores on the host is used.)
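
As an illustration of the 256-thread limit, a launcher script might clamp a requested value before exporting it. This is only a sketch: `clamp_hl_threads` is a hypothetical helper, and the lower bound of 1 is an assumption of this example rather than something stated above.

```shell
# Hypothetical helper: choose a value for HL_NUM_THREADS that never exceeds
# the 256-thread maximum mentioned above. The floor of 1 is an assumption.
clamp_hl_threads() {
    requested=$1
    if [ "$requested" -gt 256 ]; then
        echo 256
    elif [ "$requested" -lt 1 ]; then
        echo 1
    else
        echo "$requested"
    fi
}

HL_NUM_THREADS=$(clamp_hl_threads 512)
export HL_NUM_THREADS
echo "HL_NUM_THREADS=$HL_NUM_THREADS"   # prints HL_NUM_THREADS=256
```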

`HL_TRACE_FILE=...` specifies a binary target file to dump tracing data into
(ignored unless at least one `trace_` feature is enabled in `HL_TARGET` or
`HL_JIT_TARGET`). The output can be parsed programmatically by starting from the
code in `util/HalideTraceViz.cpp`.

# Using Halide on OSX

Precompiled Halide distributions are built using XCode's command-line tools with
Apple clang 500.2.76. This means that we link against libc++ instead of
libstdc++. You may need to adjust compiler options accordingly if you're using
an older XCode which does not default to libc++.

# Halide OpenGL/GLSL backend

Halide's OpenGL backend offloads image processing operations to the GPU by
generating GLSL-based fragment shaders.

Compared to other GPU-based processing options such as CUDA and OpenCL, OpenGL
has two main advantages: it is available on basically every desktop computer and
mobile device, and it is generally well supported across different hardware
vendors.

The main disadvantage of OpenGL as an image processing framework is that the
computational capabilities of fragment shaders are quite restricted. In general,
the processing model provided by OpenGL is most suitable for filters where each
output pixel can be expressed as a simple function of the input pixels. This
covers a wide range of interesting operations like point-wise filters and
convolutions; but a few common image processing operations such as histograms or
recursive filters are notoriously hard to express in GLSL.

#### Writing OpenGL-Based Filters

To enable code generation for OpenGL, include `opengl` in the target specifier
passed to Halide. Since OpenGL shaders are limited in their computational power,
you must also specify a CPU target for those parts of the filter that cannot or
should not be computed on the GPU. Examples of valid target specifiers are

```
host-opengl
x86-opengl-debug
```

Adding `debug`, as in the second example, adds additional logging output and is
highly recommended during development.
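
Because target specifiers are dash-separated lists of tokens, a script can test for a feature with plain string matching. The helper below is a hypothetical sketch, not a Halide tool; it treats the specifier purely as text and does not validate it against Halide's real target parser.

```shell
# Hypothetical helper: succeed if a dash-separated target string
# (for example "x86-opengl-debug") contains the given feature token.
target_has_feature() {
    case "-$1-" in
        *"-$2-"*) return 0 ;;
        *) return 1 ;;
    esac
}

HL_TARGET=x86-opengl-debug
if target_has_feature "$HL_TARGET" opengl; then
    echo "OpenGL code generation requested"
fi
if target_has_feature "$HL_TARGET" debug; then
    echo "debug logging requested"
fi
```

Wrapping both strings in dashes keeps the match exact, so a token such as `openglcompute` is not mistaken for `opengl`.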

By default, filters compiled for OpenGL targets run completely on the CPU.
Execution on the GPU must be enabled for individual Funcs by appropriate
scheduling calls.

GLSL fragment shaders implicitly iterate over two spatial dimensions x,y and the
color channel. Due to the way color channels are handled in GLSL, only filters
for which the color index is a compile-time constant can be scheduled. The main
consequence is that the range of color variables must be explicitly specified
for both input and output buffers before scheduling:

```
ImageParam input;
Func f;
Var x, y, c;
f(x, y, c) = ...;

input.set_bounds(2, 0, 3);   // specify color range for input
f.bound(c, 0, 3);            // and output
f.glsl(x, y, c);
```

#### JIT Compilation

For JIT compilation, Halide attempts to load the system libraries for OpenGL and
creates a new context to use for each module. Windows is not yet supported.

Examples for JIT execution of OpenGL-based filters can be found in test/opengl.

#### AOT Compilation

When AOT (ahead-of-time) compilation is used, Halide generates OpenGL-enabled
object files that can be linked to and called from a host application. In
general, this is fairly straightforward, but a few things must be taken care of.

On Linux, OS X, and Android, Halide creates its own OpenGL context unless the
current thread already has an active context. On other platforms you have to
link implementations of the following two functions with your Halide code:

```
extern "C" int halide_opengl_create_context(void *) {
    return 0;  // if successful
}

extern "C" void *halide_opengl_get_proc_addr(void *, const char *name) {
    ...
}
```

Halide allocates and deletes textures as necessary. Applications may manage the
textures by hand by setting the `halide_buffer_t::device` field; this is most
useful for reusing image data that is already stored in textures. Some
rudimentary checks are performed to ensure that externally allocated textures
have the correct format, but in general that's the responsibility of the
application.

It is possible to let Halide render directly to the current framebuffer; to do
this, set the `dev` field of the output buffer to the value returned by
`halide_opengl_output_client_bound`. The example in apps/HelloAndroidGL
demonstrates this technique.

Some operating systems can delete the OpenGL context of suspended applications.
If this happens, Halide needs to re-initialize itself with the new context after
the application resumes. Call `halide_opengl_context_lost` to reset Halide's
OpenGL state after this has happened.

#### Limitations

The current implementation of the OpenGL backend targets the common subset of
OpenGL 2.0 and OpenGL ES 2.0 which is widely available on both mobile devices
and traditional computers. As a consequence, only a subset of the Halide
language can be scheduled to run using OpenGL. Some important limitations are:

- Reductions cannot be implemented in GLSL and must be run on the CPU.

- OpenGL ES 2.0 only supports uint8 buffers.

  Support for floating-point textures is available, but requires OpenGL (ES) 3.0
  or the texture_float extension, which may not work on all mobile devices.

- OpenGL ES 2.0 has very limited support for integer arithmetic. For maximum
  compatibility, consider doing all computations using floating point, even when
  using integer textures.

- Only 2D images with 3 or 4 color channels can be scheduled. Images with one or
  two channels require OpenGL (ES) 3.0 or the texture_rg extension.

- Not all builtin functions provided by Halide are currently supported; for
  example, `fast_log`, `fast_exp`, `fast_pow`, `reinterpret`, bit operations,
  `random_float`, and `random_int` cannot be used in GLSL code.

The maximum texture size in OpenGL is `GL_MAX_TEXTURE_SIZE`, which is often
smaller than the image of interest; on mobile devices, for example,
`GL_MAX_TEXTURE_SIZE` is commonly 2048. Tiling must be used to process larger
images.
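
To get a feel for how many tiles a large image needs, the count along one dimension is a ceiling division of the image extent by the texture limit. The function name `tiles_needed` and the 5000-pixel width below are illustrative assumptions; 2048 is the commonly seen limit mentioned above.

```shell
# Number of tiles needed along one dimension when each tile may be at most
# max_size pixels wide (ceiling division using integer arithmetic).
tiles_needed() {
    extent=$1
    max_size=$2
    echo $(( (extent + max_size - 1) / max_size ))
}

tiles_needed 5000 2048   # prints 3
```

A 5000-pixel-wide image therefore needs three tiles per row when the limit is 2048, and the same calculation applies independently to the height.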

Planned features:

- Support for half-float textures and arithmetic

- Support for integer textures and arithmetic

(Note that OpenGL Compute Shaders are supported with a separate OpenGLCompute
backend.)

# Halide for Hexagon HVX

Halide supports offloading work to the Qualcomm Hexagon DSP on Qualcomm
Snapdragon 820 devices or newer. The Hexagon DSP provides a set of 64 and 128
byte vector instructions - the Hexagon Vector eXtensions (HVX). HVX is well
suited to image processing, and Halide for Hexagon HVX will generate the
appropriate HVX vector instructions from a program authored in Halide.

Halide can be used to compile Hexagon object files directly, by using a target
such as `hexagon-32-qurt-hvx_64` or `hexagon-32-qurt-hvx_128`.

Halide can also be used to offload parts of a pipeline to Hexagon using the
`hexagon` scheduling directive. To enable the `hexagon` scheduling directive,
include the `hvx_64` or `hvx_128` target features in your target. The currently
supported combination of targets is to use the HVX target features with an x86
linux host (to use the simulator) or with an ARM android target (to use Hexagon
DSP hardware). For examples of using the `hexagon` scheduling directive on both
the simulator and a Hexagon DSP, see the blur example app.

To build and run an example app using the Hexagon target:

1. Obtain and build trunk LLVM and Clang. (Earlier versions of LLVM may work but
   are not actively tested and thus not recommended.)
2. Download and install the Hexagon SDK and version 8.0 Hexagon Tools.
3. Build and run an example for Hexagon HVX.

### 1. Obtain and build trunk LLVM and Clang

(Instructions given previously; just be sure to check out the `master` branch.)

### 2. Download and install the Hexagon SDK and version 8.0 Hexagon Tools

Go to https://developer.qualcomm.com/software/hexagon-dsp-sdk/tools

1. Select the Hexagon Series 600 Software and download the 3.0 version for
   Linux.
2. Untar the installer.
3. Run the extracted installer to install the Hexagon SDK and Hexagon Tools,
   selecting Installation of Hexagon SDK into `/location/of/SDK/Hexagon_SDK/3.0`
   and the Hexagon tools into `/location/of/SDK/Hexagon_Tools/8.0`.
4. Set an environment variable to point to the SDK installation location:
   ```
   export SDK_LOC=/location/of/SDK
   ```
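
Before moving on, it may be worth confirming that the two install locations chosen in step 3 exist under `SDK_LOC`. The following sketch assumes exactly the `Hexagon_SDK/3.0` and `Hexagon_Tools/8.0` directory names given above; `check_hexagon_install` is a hypothetical helper, not something shipped with the SDK or with Halide.

```shell
# Hypothetical helper: verify that the SDK and tools directories from step 3
# exist below the given install root.
check_hexagon_install() {
    for dir in "$1/Hexagon_SDK/3.0" "$1/Hexagon_Tools/8.0"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            return 1
        fi
    done
    return 0
}

# Example use with the SDK_LOC variable from step 4 (skipped when unset).
if [ -n "${SDK_LOC:-}" ]; then
    if check_hexagon_install "$SDK_LOC"; then
        echo "Hexagon SDK and tools found"
    fi
fi
```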

### 3. Build and run an example for Hexagon HVX

In addition to running Hexagon code on device, Halide also supports running
Hexagon code on the simulator from the Hexagon tools.

To build and run the blur example in Halide/apps/blur on the simulator:

```
cd apps/blur
export HL_HEXAGON_SIM_REMOTE=../../src/runtime/hexagon_remote/bin/v60/hexagon_sim_remote
export HL_HEXAGON_TOOLS=$SDK_LOC/Hexagon_Tools/8.0/Tools/
LD_LIBRARY_PATH=../../src/runtime/hexagon_remote/bin/host/:$HL_HEXAGON_TOOLS/lib/iss/:. HL_TARGET=host-hvx_128 make test
```

### To build and run the blur example in Halide/apps/blur on Android:

To build the example for Android, first ensure that you have a standalone
toolchain created from the NDK using the make-standalone-toolchain.sh script:

```
export ANDROID_NDK_HOME=$SDK_LOC/Hexagon_SDK/3.0/tools/android-ndk-r10d/
export ANDROID_ARM64_TOOLCHAIN=<path to put new arm64 toolchain>
$ANDROID_NDK_HOME/build/tools/make-standalone-toolchain.sh --arch=arm64 --platform=android-21 \
    --install-dir=$ANDROID_ARM64_TOOLCHAIN
```

Now build and run the blur example using the script to run it on device:

```
export HL_HEXAGON_TOOLS=$SDK_LOC/Hexagon_Tools/8.0/Tools/
HL_TARGET=arm-64-android-hvx_128 ./adb_run_on_device.sh
```


README_cmake.md

# Halide and CMake

This is a comprehensive guide to the three main usage stories of the Halide
CMake build.

1. Compiling or packaging Halide from source.
2. Building Halide programs using the official CMake package.
3. Contributing to Halide and updating the build files.

The following sections cover each in detail.

## Table of Contents

- [Getting started](#getting-started)
  - [Installing CMake](#installing-cmake)
  - [Installing dependencies](#installing-dependencies)
- [Building Halide with CMake](#building-halide-with-cmake)
  - [Basic build](#basic-build)
  - [Build options](#build-options)
    - [Find module options](#find-module-options)
- [Using Halide from your CMake build](#using-halide-from-your-cmake-build)
  - [A basic CMake project](#a-basic-cmake-project)
  - [JIT mode](#jit-mode)
  - [AOT mode](#aot-mode)
    - [Autoschedulers](#autoschedulers)
    - [RunGenMain](#rungenmain)
  - [Halide package documentation](#halide-package-documentation)
    - [Components](#components)
    - [Variables](#variables)
    - [Imported targets](#imported-targets)
    - [Functions](#functions)
      - [`add_halide_library`](#add_halide_library)
- [Contributing CMake code to Halide](#contributing-cmake-code-to-halide)
  - [General guidelines and best practices](#general-guidelines-and-best-practices)
    - [Prohibited commands list](#prohibited-commands-list)
    - [Prohibited variables list](#prohibited-variables-list)
  - [Adding tests](#adding-tests)
  - [Adding apps](#adding-apps)

# Getting started

This section covers installing a recent version of CMake and the correct
dependencies for building and using Halide. If you have not used CMake before,
we strongly suggest reading through the [CMake documentation][cmake-docs] first.

## Installing CMake

Halide requires at least CMake version 3.16, which was released in November
2019. Fortunately, getting a recent version of CMake couldn't be easier, and
there are multiple good options on any system to do so. Generally, one should
always have the most recent version of CMake installed system-wide. CMake is
committed to backwards compatibility and even the most recent release can build
projects over a decade old.

### Cross-platform

The Python package manager `pip3` has the newest version of CMake at all times.
This might be the most convenient method since Python 3 is an optional
dependency for Halide, anyway.

```
$ pip3 install --upgrade cmake
```

See the [PyPI website][pypi-cmake] for more details.

### Windows

On Windows, there are three primary methods for installing an up-to-date CMake:

1. If you have Visual Studio 2019 installed, you can get CMake 3.17 through the
   Visual Studio installer. This is the recommended way of getting CMake if you
   are able to use Visual Studio 2019. See Microsoft's
   [documentation][vs2019-cmake-docs] for more details.
2. If you use [Chocolatey][chocolatey], its [CMake package][choco-cmake] is kept
   up to date. It should be as simple as `choco install cmake`.
3. Otherwise, you should install CMake from [Kitware's website][cmake-download].

### macOS

On macOS, the [Homebrew][homebrew] [CMake package][brew-cmake] is kept up to
date. Simply run:

```
$ brew update
$ brew install cmake
```

to install the newest version of CMake. If your environment prevents you from
installing Homebrew, the binary release on [Kitware's website][cmake-download]
is also a viable option.

### Ubuntu Linux

There are a few good ways to install a modern CMake on Ubuntu:

1. If you're on Ubuntu Linux 20.04 (focal), then simply running
   `sudo apt install cmake` will get you CMake 3.16.
2. If you are on an older Ubuntu release or would like to use the newest CMake,
   try installing via the snap store: `snap install cmake`. Be sure you do not
   already have `cmake` installed via APT. The snap package automatically stays
   up to date.
3. For older versions of Debian, Ubuntu, Mint, and derivatives, Kitware provides
   an [APT repository][cmake-apt] with up-to-date releases. Note that this is
   still useful for Ubuntu 20.04 because it will remain up to date.
4. If all else fails, you might need to build CMake from source (e.g. on old
   Ubuntu versions running on ARM). In that case, follow the directions posted
   on [Kitware's website][cmake-from-source].

For other Linux distributions, check with your distribution's package manager or
use pip as detailed above. Snap packages might also be available.

**Note:** On WSL 1, the snap service is not available; in this case, prefer to
use the APT repository. On WSL 2, all methods are available.

## Installing dependencies

We generally recommend using a package manager to fetch Halide's dependencies.
Except where noted, we recommend using [vcpkg][vcpkg] on Windows,
[Homebrew][homebrew] on macOS, and APT on Ubuntu 20.04 LTS.

Only LLVM and Clang are _absolutely_ required to build Halide. Halide always
supports three LLVM versions: the current major version, the previous major
version, and trunk. The LLVM and Clang versions must match exactly. For most
users, we recommend using a binary release of LLVM rather than building it
yourself.

However, to run all of the tests and apps, an extended set is needed. This
includes [lld][lld], [Python 3][python], [libpng][libpng], [libjpeg][libjpeg],
[Doxygen][doxygen], [OpenBLAS][openblas], [ATLAS][atlas], and [Eigen3][eigen].
While not required to build any part of Halide, we find that [Ninja][ninja] is
the best backend build tool across all platforms.

Note that CMake has many special variables for overriding the locations of
packages and executables. A partial list can be found in the
["find module options"](#find-module-options) section below, and more can be
found in the documentation for the CMake [find_package][find_package] command.
Normally, you should prefer to make sure your environment is set up so that
CMake can find dependencies automatically. For instance, if you want CMake to
use a particular version of Python, create a [virtual environment][venv] and
activate it _before_ configuring Halide.

### Windows

We assume you have vcpkg installed at `D:\vcpkg`. Follow the instructions in the
[vcpkg README][vcpkg] to install. Start by installing LLVM.

```
D:\vcpkg> .\vcpkg install llvm[target-all,enable-assertions,clang-tools-extra]:x64-windows
D:\vcpkg> .\vcpkg install llvm[target-all,enable-assertions,clang-tools-extra]:x86-windows
```

This will also install Clang and LLD. The `enable-assertions` option is not
strictly necessary but will make debugging during development much smoother.
These builds will take a long time and a lot of disk space. After they are
built, it is safe to delete the intermediate build files and caches in
`D:\vcpkg\buildtrees` and `%APPDATA%\local\vcpkg`.

Then install the other libraries:

```
D:\vcpkg> .\vcpkg install libpng:x64-windows libjpeg-turbo:x64-windows openblas:x64-windows eigen3:x64-windows
D:\vcpkg> .\vcpkg install libpng:x86-windows libjpeg-turbo:x86-windows openblas:x86-windows eigen3:x86-windows
```

To build the documentation, you will need to install [Doxygen][doxygen]. This
can be done either through [Chocolatey][choco-doxygen] or from the [Doxygen
website][doxygen-download].

```
> choco install doxygen
```

To build the Python bindings, you will need to install Python 3. This should be
done by running the official installer from the [Python website][python]. Be
sure to download the debugging symbols through the installer. This will require
using the "Advanced Installation" workflow. Although it is not strictly
necessary, it is convenient to install Python system-wide on Windows (i.e.
`C:\Program Files`). This makes it easy for CMake to find without needing to
manually set the `PATH`.

Once Python is installed, you can install the Python module dependencies either
globally or in a [virtual environment][venv] by running

```
> pip3 install -r .\python_bindings\requirements.txt
```

from the root of the repository.

If you would like to use [Ninja][ninja], note that it is installed alongside
CMake when using the Visual Studio 2019 installer. Alternatively, you can
install via [Chocolatey][choco-ninja] or place the [pre-built
binary][ninja-download] from their website in the PATH.

```
> choco install ninja
```

### macOS

On macOS, it is possible to install all dependencies via [Homebrew][homebrew]:

```
$ brew install llvm libpng libjpeg python@3.8 openblas doxygen ninja
```

The `llvm` package includes `clang`, `clang-format`, and `lld`, too. Don't
forget to install the Python module dependencies:

```
$ pip3 install -r python_bindings/requirements.txt
```

### Ubuntu

Finally, on Ubuntu 20.04 LTS, you should install the following packages (this
includes the Python module dependencies):

```
dev@ubuntu:~$ sudo apt install \
                  clang-tools lld llvm-dev libclang-dev liblld-10-dev \
                  libpng-dev libjpeg-dev libgl-dev \
                  python3-dev python3-numpy python3-scipy python3-imageio python3-pybind11 \
                  libopenblas-dev libeigen3-dev libatlas-base-dev \
                  doxygen ninja-build
```

229# Building Halide with CMake
230
231## Basic build
232
233These instructions assume that your working directory is the Halide repo root.
234
235### Windows
236
237If you plan to use the Ninja generator, be sure to be in the developer command
238prompt corresponding to your intended environment. Note that whatever your
239intended target system (x86, x64, or arm), you must use the 64-bit _host tools_
240because the 32-bit tools run out of memory during the linking step with LLVM.
241More information is available from [Microsoft's documentation][msvc-cmd].
242
You should either open the correct Developer Command Prompt directly or run the
[`vcvarsall.bat`][vcvarsall] script with the correct argument, i.e. one of the
following:
246
```
D:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
D:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64_x86
D:\> "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64_arm
```
252
253Then, assuming that vcpkg is installed to `D:\vcpkg`, simply run:
254
```
> cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=D:\vcpkg\scripts\buildsystems\vcpkg.cmake -S . -B build
> cmake --build .\build
```
259
260Valid values of [`CMAKE_BUILD_TYPE`][cmake_build_type] are `Debug`,
261`RelWithDebInfo`, `MinSizeRel`, and `Release`. When using a single-configuration
262generator (like Ninja) you must specify a build type when configuring Halide (or
263any other CMake project).
264
265Otherwise, if you wish to create a Visual Studio based build system, you can
266configure with:
267
```
> cmake -G "Visual Studio 16 2019" -Thost=x64 -A x64 ^
        -DCMAKE_TOOLCHAIN_FILE=D:\vcpkg\scripts\buildsystems\vcpkg.cmake ^
        -S . -B build
> cmake --build .\build --config Release -j %NUMBER_OF_PROCESSORS%
```
274
275Because the Visual Studio generator is a _multi-config generator_, you don't set
276`CMAKE_BUILD_TYPE` at configure-time, but instead pass the configuration to the
277build (and test/install) commands with the `--config` flag. More documentation
278is available in the [CMake User Interaction Guide][cmake-user-interaction].
279
280The process is similar for 32-bit:
281
```
> cmake -G "Visual Studio 16 2019" -Thost=x64 -A Win32 ^
        -DCMAKE_TOOLCHAIN_FILE=D:\vcpkg\scripts\buildsystems\vcpkg.cmake ^
        -S . -B build
> cmake --build .\build --config Release -j %NUMBER_OF_PROCESSORS%
```
288
289In both cases, the `-Thost=x64` flag ensures that the correct host tools are
290used.
291
**Note:** due to limitations in MSBuild, incremental builds using the VS
generators will not detect changes to headers in the `src/runtime` folder. We
recommend using Ninja for day-to-day development and using Visual Studio only if
you need it for packaging.
296
297### macOS and Linux
298
299The instructions here are straightforward. Assuming your environment is set up
300correctly, just run:
301
```
dev@host:~/Halide$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
dev@host:~/Halide$ cmake --build ./build
```
306
307If you omit `-G Ninja`, a Makefile-based generator will likely be used instead.
308In either case, [`CMAKE_BUILD_TYPE`][cmake_build_type] must be set to one of the
309standard types: `Debug`, `RelWithDebInfo`, `MinSizeRel`, or `Release`.
310
311## Installing
312
313Once built, Halide will need to be installed somewhere before using it in a
314separate project. On any platform, this means running the
315[`cmake --install`][cmake-install] command in one of two ways. For a
316single-configuration generator (like Ninja), run either:
317
```
dev@host:~/Halide$ cmake --install ./build --prefix /path/to/Halide-install
> cmake --install .\build --prefix X:\path\to\Halide-install
```
322
323For a multi-configuration generator (like Visual Studio) run:
324
```
dev@host:~/Halide$ cmake --install ./build --prefix /path/to/Halide-install --config Release
> cmake --install .\build --prefix X:\path\to\Halide-install --config Release
```
329
330Of course, make sure that you build the corresponding config before attempting
331to install it.
332
333## Build options
334
335Halide reads and understands several options that can configure the build. The
336following are the most consequential and control how Halide is actually
337compiled.
338
339| Option                                   | Default               | Description                                                                                                      |
340| ---------------------------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------- |
341| [`BUILD_SHARED_LIBS`][build_shared_libs] | `ON`                  | Standard CMake variable that chooses whether to build as a static or shared library.                             |
342| `Halide_BUNDLE_LLVM`                     | `OFF`                 | When building Halide as a static library, unpack the LLVM static libraries and add those objects to libHalide.a. |
343| `Halide_SHARED_LLVM`                     | `OFF`                 | Link to the shared version of LLVM. Not available on Windows.                                                    |
344| `Halide_ENABLE_RTTI`                     | _inherited from LLVM_ | Enable RTTI when building Halide. Recommended to be set to `ON`                                                  |
345| `Halide_ENABLE_EXCEPTIONS`               | `ON`                  | Enable exceptions when building Halide                                                                           |
346| `Halide_USE_CODEMODEL_LARGE`             | `OFF`                 | Use the Large LLVM codemodel                                                                                     |
347| `Halide_TARGET`                          | _empty_               | The default target triple to use for `add_halide_library` (and the generator tests, by extension)                |
348
The following options are only available when building Halide directly, i.e. not
through the [`add_subdirectory`][add_subdirectory] or
[`FetchContent`][fetchcontent] mechanisms. They control whether non-essential
targets (like tests and documentation) are built.
353
354| Option                 | Default              | Description                                                                              |
355| ---------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
356| `WITH_TESTS`           | `ON`                 | Enable building unit and integration tests                                               |
357| `WITH_APPS`            | `ON`                 | Enable testing sample applications (run `ctest -L apps` to actually build and test them) |
358| `WITH_PYTHON_BINDINGS` | `ON` if Python found | Enable building Python 3.x bindings                                                      |
359| `WITH_DOCS`            | `OFF`                | Enable building the documentation via Doxygen                                            |
360| `WITH_UTILS`           | `ON`                 | Enable building various utilities including the trace visualizer                         |
361| `WITH_TUTORIALS`       | `ON`                 | Enable building the tutorials                                                            |
362
363The following options control whether to build certain test subsets. They only
364apply when `WITH_TESTS=ON`:
365
366| Option                    | Default | Description                       |
367| ------------------------- | ------- | --------------------------------- |
368| `WITH_TEST_AUTO_SCHEDULE` | `ON`    | enable the auto-scheduling tests  |
369| `WITH_TEST_CORRECTNESS`   | `ON`    | enable the correctness tests      |
370| `WITH_TEST_ERROR`         | `ON`    | enable the expected-error tests   |
371| `WITH_TEST_WARNING`       | `ON`    | enable the expected-warning tests |
372| `WITH_TEST_PERFORMANCE`   | `ON`    | enable performance testing        |
373| `WITH_TEST_OPENGL`        | `OFF`   | enable the OpenGL tests           |
374| `WITH_TEST_GENERATOR`     | `ON`    | enable the AOT generator tests    |
375
376The following options enable/disable various LLVM backends (they correspond to
377LLVM component names):
378
379| Option               | Default              | Description                                              |
380| -------------------- | -------------------- | -------------------------------------------------------- |
381| `TARGET_AARCH64`     | `ON`, _if available_ | Enable the AArch64 backend                               |
382| `TARGET_AMDGPU`      | `ON`, _if available_ | Enable the AMD GPU backend                               |
383| `TARGET_ARM`         | `ON`, _if available_ | Enable the ARM backend                                   |
384| `TARGET_HEXAGON`     | `ON`, _if available_ | Enable the Hexagon backend                               |
385| `TARGET_MIPS`        | `ON`, _if available_ | Enable the MIPS backend                                  |
386| `TARGET_NVPTX`       | `ON`, _if available_ | Enable the NVidia PTX backend                            |
387| `TARGET_POWERPC`     | `ON`, _if available_ | Enable the PowerPC backend                               |
388| `TARGET_RISCV`       | `ON`, _if available_ | Enable the RISC V backend                                |
389| `TARGET_WEBASSEMBLY` | `ON`, _if available_ | Enable the WebAssembly backend. Only valid for LLVM 11+. |
390| `TARGET_X86`         | `ON`, _if available_ | Enable the x86 (and x86_64) backend                      |
391
392The following options enable/disable various Halide-specific backends:
393
394| Option                | Default | Description                            |
395| --------------------- | ------- | -------------------------------------- |
396| `TARGET_OPENCL`       | `ON`    | Enable the OpenCL-C backend            |
397| `TARGET_OPENGL`       | `ON`    | Enable the OpenGL/GLSL backend         |
398| `TARGET_METAL`        | `ON`    | Enable the Metal backend               |
399| `TARGET_D3D12COMPUTE` | `ON`    | Enable the Direct3D 12 Compute backend |
400
401The following options are WebAssembly-specific. They only apply when
402`TARGET_WEBASSEMBLY=ON`:
403
404| Option            | Default | Description                                                |
405| ----------------- | ------- | ---------------------------------------------------------- |
406| `WITH_WABT`       | `ON`    | Include WABT Interpreter for WASM testing                  |
407| `WITH_WASM_SHELL` | `ON`    | Download a wasm shell (e.g. d8) for testing AOT wasm code. |
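
As a rough illustration of how the options above combine, the sketch below collects a few of them in a CMake initial-cache script, which CMake pre-loads when given `-C <file>`. The file name `halide-options.cmake` and the particular choices are assumptions made for the example, not a recommended configuration.

```cmake
# halide-options.cmake (hypothetical file name)
# Load with: cmake -G Ninja -C halide-options.cmake -DCMAKE_BUILD_TYPE=Release -S . -B build

# Build Halide as a static library and fold LLVM's objects into libHalide.a.
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build Halide as a static library")
set(Halide_BUNDLE_LLVM ON CACHE BOOL "Bundle LLVM into the static Halide library")

# Skip some non-essential targets to shorten the build.
set(WITH_TESTS OFF CACHE BOOL "Do not build the tests")
set(WITH_APPS OFF CACHE BOOL "Do not test the sample applications")
set(WITH_TUTORIALS OFF CACHE BOOL "Do not build the tutorials")

# Disable a Halide-specific backend that is not needed here.
set(TARGET_D3D12COMPUTE OFF CACHE BOOL "Disable the Direct3D 12 Compute backend")
```

Passing the same settings individually with `-D` on the command line should have the same effect.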
408
409### Find module options
410
411Halide uses the following find modules to search for certain dependencies. These
412modules accept certain variables containing hints for the search process. Before
413setting any of these variables, closely study the [`find_package`][find_package]
414documentation.
415
416All of these variables should be set at the CMake command line via the `-D`
417flag.
418
419First, Halide expects to find LLVM and Clang through the `CONFIG` mode of
420`find_package`. You can tell Halide where to find these dependencies by setting
421the corresponding `_DIR` variables:
422
423| Variable    | Description                                          |
424| ----------- | ---------------------------------------------------- |
425| `LLVM_DIR`  | Path to the directory containing `LLVMConfig.cmake`  |
426| `Clang_DIR` | Path to the directory containing `ClangConfig.cmake` |
427
428When using CMake 3.18 or above, some of Halide's tests will search for CUDA
429using the [`FindCUDAToolkit`][findcudatoolkit] module. If it doesn't find your
430CUDA installation automatically, you can point it to it by setting:
431
432| Variable           | Description                                       |
433| ------------------ | ------------------------------------------------- |
434| `CUDAToolkit_ROOT` | Path to the directory containing `bin/nvcc[.exe]` |
435| `CUDA_PATH`        | _Environment_ variable, same as above.            |
436
437If the CMake version is lower than 3.18, the deprecated [`FindCUDA`][findcuda]
438module will be used instead. It reads the variable `CUDA_TOOLKIT_ROOT_DIR`
439instead of `CUDAToolkit_ROOT` above.
440
441When targeting OpenGL, the [`FindOpenGL`][findopengl] and [`FindX11`][findx11]
442modules will be used to link AOT generated binaries. These modules can be
443overridden by setting the following variables:
444
445| Variable                | Description                      |
446| ----------------------- | -------------------------------- |
447| `OPENGL_egl_LIBRARY`    | Path to the EGL library.         |
448| `OPENGL_glu_LIBRARY`    | Path to the GLU library.         |
449| `OPENGL_glx_LIBRARY`    | Path to the GLVND GLX library.   |
450| `OPENGL_opengl_LIBRARY` | Path to the GLVND OpenGL library |
451| `OPENGL_gl_LIBRARY`     | Path to the OpenGL library.      |
452
453The OpenGL paths will need to be set if you intend to use OpenGL with X11 on
454macOS.
455
456Halide also searches for `libpng` and `libjpeg-turbo` through the
457[`FindPNG`][findpng] and [`FindJPEG`][findjpeg] modules, respectively. They can
458be overridden by setting the following variables.
459
460| Variable            | Description                                        |
461| ------------------- | -------------------------------------------------- |
462| `PNG_LIBRARIES`     | Paths to the libraries to link against to use PNG. |
463| `PNG_INCLUDE_DIRS`  | Path to `png.h`, etc.                              |
464| `JPEG_LIBRARIES`    | Paths to the libraries needed to use JPEG.         |
465| `JPEG_INCLUDE_DIRS` | Paths to `jpeglib.h`, etc.                         |
466
467When `WITH_DOCS` is set to `ON`, Halide searches for Doxygen using the
468[`FindDoxygen`][finddoxygen] module. It can be overridden by setting the
469following variable.
470
471| Variable             | Description                     |
472| -------------------- | ------------------------------- |
473| `DOXYGEN_EXECUTABLE` | Path to the Doxygen executable. |
474
When compiling for an OpenCL target, Halide uses the [`FindOpenCL`][findopencl]
module to locate the libraries and include paths. These can be overridden by
setting the following variables:
478
479| Variable              | Description                                           |
480| --------------------- | ----------------------------------------------------- |
481| `OpenCL_LIBRARIES`    | Paths to the libraries to link against to use OpenCL. |
482| `OpenCL_INCLUDE_DIRS` | Include directories for OpenCL.                       |
483
Lastly, Halide searches for Python 3 using the [`FindPython3`][findpython3]
module, _not_ the deprecated `FindPythonInterp` and `FindPythonLibs` modules
that other projects you might have encountered may still use. You can select
which Python installation to use by setting the following variable.
488
489| Variable           | Description                                           |
490| ------------------ | ----------------------------------------------------- |
491| `Python3_ROOT_DIR` | Define the root directory of a Python 3 installation. |
492
493# Using Halide from your CMake build
494
495This section assumes some basic familiarity with CMake but tries to be explicit
496in all its examples. To learn more about CMake, consult the
497[documentation][cmake-docs] and engage with the community on the [CMake
498Discourse][cmake-discourse].
499
500Note: previous releases bundled a `halide.cmake` module that was meant to be
501[`include()`][include]-ed into your project. This has been removed. Please
502upgrade to the new package config module.
503
504## A basic CMake project
505
506There are two main ways to use Halide in your application: as a **JIT compiler**
507for dynamic pipelines or an **ahead-of-time (AOT) compiler** for static
508pipelines. CMake provides robust support for both use cases.
509
510No matter how you intend to use Halide, you will need some basic CMake
511boilerplate.
512
```cmake
cmake_minimum_required(VERSION 3.16)
project(HalideExample)

set(CMAKE_CXX_STANDARD 11)  # or newer
set(CMAKE_CXX_STANDARD_REQUIRED YES)
set(CMAKE_CXX_EXTENSIONS NO)

find_package(Halide REQUIRED)
```
523
524The [`cmake_minimum_required`][cmake_minimum_required] command is required to be
525the first command executed in a CMake program. It disables all of the deprecated
526behavior ("policies" in CMake lingo) from earlier versions. The
527[`project`][project] command sets the name of the project (and has arguments for
528versioning, language support, etc.) and is required by CMake to be called
529immediately after setting the minimum version.
530
531The next three variables set the project-wide C++ standard. The first,
532[`CMAKE_CXX_STANDARD`][cmake_cxx_standard], simply sets the standard version.
533Halide requires at least C++11. The second,
534[`CMAKE_CXX_STANDARD_REQUIRED`][cmake_cxx_standard_required], tells CMake to
535fail if the compiler cannot provide the requested standard version. Lastly,
536[`CMAKE_CXX_EXTENSIONS`][cmake_cxx_extensions] tells CMake to disable
537vendor-specific extensions to C++. This is not necessary to simply use Halide,
538but we require it when authoring new code in the Halide repo.
539
Finally, we use [`find_package`][find_package] to locate Halide on your system.
Halide ships a package configuration module, so if Halide is not globally
installed, you will need to add the root of the Halide installation directory to
`CMAKE_PREFIX_PATH` (the variable `find_package` consults when searching for
package configuration files) at the CMake command line.

```
dev@ubuntu:~/myproj$ cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/path/to/Halide-install" -S . -B build
```
548
549## JIT mode
550
551To use Halide in JIT mode (like the [tutorials][halide-tutorials] do, for
552example), you can simply link to `Halide::Halide`.
553
```cmake
# ... same project setup as before ...
add_executable(my_halide_app main.cpp)
target_link_libraries(my_halide_app PRIVATE Halide::Halide)
```
559
560Then `Halide.h` will be available to your code and everything should just work.
561That's it!
562
563## AOT mode
564
565Using Halide in AOT mode is more complicated so we'll walk through it step by
566step. Note that this only applies to Halide generators, so it might be useful to
567re-read the [tutorial][halide-generator-tutorial] on generators. Assume (like in
568the tutorial) that you have a source file named `my_generators.cpp` and that in
569it you have generator classes `MyFirstGenerator` and `MySecondGenerator` with
570registered names `my_first_generator` and `my_second_generator` respectively.
571
572Then the first step is to add a **generator executable** to your build:
573
```cmake
# ... same project setup as before ...
add_executable(my_generators my_generators.cpp)
target_link_libraries(my_generators PRIVATE Halide::Generator)
```
579
580Using the generator executable, we can add a Halide library corresponding to
581`MyFirstGenerator`.
582
```cmake
# ... continuing from above
add_halide_library(my_first_generator FROM my_generators)
```
587
588This will create a static library target in CMake that corresponds to the output
589of running your generator. The second generator in the file requires generator
590parameters to be passed to it. These are also easy to handle:
591
```cmake
# ... continuing from above
add_halide_library(my_second_generator FROM my_generators
                   PARAMS parallel=false scale=3.0 rotation=ccw output.type=uint16)
```
597
598Adding multiple configurations is easy, too:
599
```cmake
# ... continuing from above
add_halide_library(my_second_generator_2 FROM my_generators
                   GENERATOR my_second_generator
                   PARAMS scale=9.0 rotation=ccw output.type=float32)

add_halide_library(my_second_generator_3 FROM my_generators
                   GENERATOR my_second_generator
                   PARAMS parallel=false output.type=float64)
```
610
611Here, we had to specify which generator to use (`my_second_generator`) since it
612uses the target name by default. The functions in these libraries will be named
613after the target names, `my_second_generator_2` and `my_second_generator_3`, by
614default, but it is possible to control this via the `FUNCTION_NAME` parameter.
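
To make the role of `FUNCTION_NAME` concrete, here is a minimal sketch. The target name `my_second_generator_4`, the function name `scale_and_rotate`, and the parameter values are hypothetical and chosen only for illustration.

```cmake
# ... continuing from above
# The CMake target is still called my_second_generator_4, but the generated
# function is named scale_and_rotate (a hypothetical name) instead.
add_halide_library(my_second_generator_4 FROM my_generators
                   GENERATOR my_second_generator
                   FUNCTION_NAME scale_and_rotate
                   PARAMS scale=2.0 rotation=ccw output.type=uint8)
```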
615
616Each one of these targets, `<GEN>`, carries an associated `<GEN>.runtime`
617target, which is also a static library containing the Halide runtime. It is
618transitively linked through `<GEN>` to targets that link to `<GEN>`. On an
619operating system like Linux, where weak linking is available, this is not an
620issue. However, on Windows, this can fail due to symbol redefinitions. In these
621cases, you must declare that two Halide libraries share a runtime, like so:
622
```cmake
# ... updating above
add_halide_library(my_second_generator_2 FROM my_generators
                   GENERATOR my_second_generator
                   USE_RUNTIME my_first_generator.runtime
                   PARAMS scale=9.0 rotation=ccw output.type=float32)

add_halide_library(my_second_generator_3 FROM my_generators
                   GENERATOR my_second_generator
                   USE_RUNTIME my_first_generator.runtime
                   PARAMS parallel=false output.type=float64)
```
635
This will even work correctly when different combinations of targets are
specified for each Halide library. A "greatest common denominator" target will
be chosen that is compatible with all of them (or the build will fail).
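
Since the targets created by `add_halide_library` are static libraries, an application can presumably consume them like any other CMake library target. The sketch below assumes a hypothetical application target `my_app` built from a hypothetical `app_main.cpp`, and assumes that the generated headers are made available through the library targets.

```cmake
# ... continuing from above
# my_app and app_main.cpp are hypothetical names used for illustration.
add_executable(my_app app_main.cpp)

# Linking to the Halide libraries also pulls in the runtime they share,
# because the runtime is linked transitively through each library target.
target_link_libraries(my_app PRIVATE my_first_generator my_second_generator_2)
```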
639
640### Autoschedulers
641
642When the autoschedulers are included in the release package, they are very
643simple to apply to your own generators. For example, we could update the
644definition of the `my_first_generator` library above to use the `Adams2019`
645autoscheduler:
646
```cmake
add_halide_library(my_first_generator FROM my_generators
                   AUTOSCHEDULER Halide::Adams2019
                   PARAMS auto_schedule=true)
```
652
653### RunGenMain
654
Halide provides a generic driver for generators to be used during development
for benchmarking and debugging. Suppose you have a generator executable called
`my_gen` and a generator within called `my_filter`. You can pass a variable name
to the `REGISTRATION` parameter of `add_halide_library`; that variable will then
contain the path to a generated C++ source file. Compile that file into an
executable and link it to both `Halide::RunGenMain` and `my_filter`.
661
662For example:
663
```cmake
add_halide_library(my_filter FROM my_gen
                   REGISTRATION filter_reg_cpp)
add_executable(runner ${filter_reg_cpp})
target_link_libraries(runner PRIVATE my_filter Halide::RunGenMain)
```
670
671Then you can run, debug, and benchmark your generator through the `runner`
672executable.
673
674## Halide package documentation
675
676Halide provides a CMake _package configuration_ module. The intended way to use
677the CMake build is to run `find_package(Halide ...)` in your `CMakeLists.txt`
678file. Closely read the [`find_package` documentation][find_package] before
679proceeding.
680
681### Components
682
683The Halide package script understands a handful of optional components when
684loading the package.
685
686First, if you plan to use the Halide Image IO library, you will want to include
687the `png` and `jpeg` components when loading Halide.
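
A minimal sketch of requesting those components follows. The target name `my_halide_app` and the source file `main.cpp` are placeholders reused from the JIT example.

```cmake
# Ask for the optional image IO dependencies when loading the package.
find_package(Halide REQUIRED COMPONENTS png jpeg)

add_executable(my_halide_app main.cpp)
# Halide::ImageIO supplies the include paths and the PNG / JPEG dependencies.
target_link_libraries(my_halide_app PRIVATE Halide::Halide Halide::ImageIO)
```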
688
689Second, Halide releases can contain a variety of configurations: static, shared,
690debug, release, etc. CMake handles Debug/Release configurations automatically,
691but generally only allows one type of library to be loaded.
692
693The package understands two components, `static` and `shared`, that specify
694which type of library you would like to load. For example, if you want to make
695sure that you link against shared Halide, you can write:
696
```cmake
find_package(Halide REQUIRED COMPONENTS shared)
```
700
701If the shared libraries are not available, this will result in a failure.
702
703If no component is specified, then the `Halide_SHARED_LIBS` variable is checked.
704If it is defined and set to true, then the shared libraries will be loaded or
705the package loading will fail. Similarly, if it is defined and set to false, the
706static libraries will be loaded.
707
708If no component is specified and `Halide_SHARED_LIBS` is _not_ defined, then the
709[`BUILD_SHARED_LIBS`][build_shared_libs] variable will be inspected. If it is
710**not defined** or **defined and set to true**, then it will attempt to load the
711shared libs and fall back to the static libs if they are not available.
712Similarly, if `BUILD_SHARED_LIBS` is **defined and set to false**, then it will
713try the static libs first then fall back to the shared libs.
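
For instance, the following sketch forces the static libraries through the variable rather than through a component. Based on the rules above, it should behave like requesting the `static` component.

```cmake
# Insist on static Halide regardless of BUILD_SHARED_LIBS. Package loading is
# expected to fail if the static libraries are not part of the installation.
set(Halide_SHARED_LIBS NO)
find_package(Halide REQUIRED)

# Alternative spelling using a component instead of the variable:
#   find_package(Halide REQUIRED COMPONENTS static)
```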
714
715### Variables
716
717Variables that control package loading:
718
719| Variable             | Description                                                                                                                                                                   |
720| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
721| `Halide_SHARED_LIBS` | override `BUILD_SHARED_LIBS` when loading the Halide package via `find_package`. Has no effect when using Halide via `add_subdirectory` as a Git or `FetchContent` submodule. |
722
723Variables set by the package:
724
725| Variable                   | Description                                                      |
726| -------------------------- | ---------------------------------------------------------------- |
727| `Halide_VERSION`           | The full version string of the loaded Halide package             |
728| `Halide_VERSION_MAJOR`     | The major version of the loaded Halide package                   |
729| `Halide_VERSION_MINOR`     | The minor version of the loaded Halide package                   |
730| `Halide_VERSION_PATCH`     | The patch version of the loaded Halide package                   |
731| `Halide_VERSION_TWEAK`     | The tweak version of the loaded Halide package                   |
732| `Halide_HOST_TARGET`       | The Halide target triple corresponding to "host" for this build. |
733| `Halide_ENABLE_EXCEPTIONS` | Whether Halide was compiled with exception support               |
734| `Halide_ENABLE_RTTI`       | Whether Halide was compiled with RTTI                            |
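
As a small illustration, the variables set by the package can be inspected right after loading it. The warning text below is an arbitrary example rather than something Halide itself prints.

```cmake
find_package(Halide REQUIRED)

# Report what was found.
message(STATUS "Halide version: ${Halide_VERSION}")
message(STATUS "Halide host target: ${Halide_HOST_TARGET}")

# React to how the loaded Halide was compiled.
if (NOT Halide_ENABLE_RTTI)
    message(WARNING "The loaded Halide package was compiled without RTTI")
endif ()
```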
735
736### Imported targets
737
738Halide defines the following targets that are available to users:
739
740| Imported target      | Description                                                                                                                          |
741| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
742| `Halide::Halide`     | this is the JIT-mode library to use when using Halide from C++.                                                                      |
743| `Halide::Generator`  | this is the target to use when defining a generator executable. It supplies a `main()` function.                                     |
744| `Halide::Runtime`    | adds include paths to the Halide runtime headers                                                                                     |
745| `Halide::Tools`      | adds include paths to the Halide tools, including the benchmarking utility.                                                          |
746| `Halide::ImageIO`    | adds include paths to the Halide image IO utility and sets up dependencies to PNG / JPEG if they are available.                      |
747| `Halide::RunGenMain` | used with the `REGISTRATION` parameter of `add_halide_library` to create simple runners and benchmarking tools for Halide libraries. |
748
749The following targets are not guaranteed to be available:
750
| Imported target   | Description                                                                                                                              |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `Halide::Python`  | this is a Python 3 module that can be referenced as `$<TARGET_FILE:Halide::Python>` when setting up Python tests or the like from CMake. |
| `Halide::Adams19` | the Adams et al. 2019 autoscheduler (no GPU support)                                                                                     |
| `Halide::Li18`    | the Li et al. 2018 gradient autoscheduler (limited GPU support)                                                                          |
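
Because these targets may be missing, it is safest to guard any use of them with `if (TARGET ...)`. The sketch below sets up a Python test in that way. The test name `my_python_test` and the script `my_test.py` are hypothetical, and the use of `$<TARGET_FILE_DIR:...>` to populate `PYTHONPATH` is an assumption about how you would want to locate the module.

```cmake
enable_testing()

find_package(Halide REQUIRED)
find_package(Python3 REQUIRED COMPONENTS Interpreter)

if (TARGET Halide::Python)
    # my_python_test and my_test.py are hypothetical names.
    add_test(NAME my_python_test
             COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/my_test.py")
    # Put the directory that holds the Halide Python module on PYTHONPATH.
    set_tests_properties(my_python_test PROPERTIES
                         ENVIRONMENT "PYTHONPATH=$<TARGET_FILE_DIR:Halide::Python>")
endif ()
```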
756
757### Functions
758
759Currently, only one function is defined:
760
761#### `add_halide_library`
762
763This is the main function for managing generators in AOT compilation. The full
764signature follows:
765
```
add_halide_library(<target> FROM <generator-target>
                   [GENERATOR generator-name]
                   [FUNCTION_NAME function-name]
                   [USE_RUNTIME hl-target]
                   [PARAMS param1 [param2 ...]]
                   [TARGETS target1 [target2 ...]]
                   [FEATURES feature1 [feature2 ...]]
                   [PLUGINS plugin1 [plugin2 ...]]
                   [AUTOSCHEDULER scheduler-name]
                   [GRADIENT_DESCENT]
                   [C_BACKEND]
                   [REGISTRATION OUTVAR]
                   [<extra-output> OUTVAR])

extra-output = ASSEMBLY | BITCODE | COMPILER_LOG | CPP_STUB
             | FEATURIZATION | LLVM_ASSEMBLY | PYTHON_EXTENSION
             | PYTORCH_WRAPPER | SCHEDULE | STMT | STMT_HTML
```
785
This function creates a target called `<target>` corresponding to running the
`<generator-target>` (an executable target which links to `Halide::Generator`)
one time, using command line arguments derived from the other parameters.
789
790The arguments `GENERATOR` and `FUNCTION_NAME` default to `<target>`. They
791correspond to the `-g` and `-f` command line flags, respectively.
792
If `USE_RUNTIME` is not specified, this function will create another target
called `<target>.runtime` which corresponds to running the generator with `-r`
and a compatible list of targets. This runtime target is an INTERFACE dependency
of `<target>`. If multiple runtime targets need to be linked together, setting
`USE_RUNTIME` to another Halide library, `<target2>`, will prevent the generation
of `<target>.runtime` and instead use `<target2>.runtime`.
799
Parameters can be passed to a generator via the `PARAMS` argument. Parameters
should be space-separated. Similarly, `TARGETS` is a space-separated list of
targets for which to generate code in a single function. They must all share the
same platform/bits/os triple (e.g. `arm-32-linux`). Features that are common to
all targets, including device libraries (like `cuda`), should go in
`FEATURES`.
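
A sketch of how `TARGETS` and `FEATURES` might be combined is shown below. The specific target strings are illustrative assumptions; what matters is that every entry starts with the same `x86-64-linux` triple and that the feature shared by all of them is listed once under `FEATURES`.

```cmake
# Standalone sketch reusing the generator names from the walkthrough above.
add_halide_library(my_first_generator FROM my_generators
                   # Per-target instruction set features stay on each target...
                   TARGETS x86-64-linux-avx2 x86-64-linux-sse41 x86-64-linux
                   # ...while features common to every target go here.
                   FEATURES cuda)
```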
806
807Every element of `TARGETS` must begin with the same `arch-bits-os` triple. This
808function understands two _meta-triples_, `host` and `cmake`. The meta-triple
809`host` is equal to the `arch-bits-os` triple used to compile Halide along with
810all of the supported instruction set extensions. On platforms that support
811running both 32 and 64-bit programs, this will not necessarily equal the
812platform the compiler is running on or that CMake is targeting.
813
814The meta-triple `cmake` is equal to the `arch-bits-os` of the current CMake
815target. This is useful if you want to make sure you are not unintentionally
816cross-compiling, which would result in an [`IMPORTED` target][imported-target]
817being created. When `TARGETS` is empty and the `host` target would not
818cross-compile, then `host` will be used. Otherwise, `cmake` will be used and an
819author warning will be issued.
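
A short sketch of the two meta-triples, again assuming a hypothetical
`my_generators` executable that registers a generator named `blur`:

```cmake
# Generate for the platform CMake itself is targeting.
add_halide_library(blur FROM my_generators
                   TARGETS cmake)

# Generate for the platform Halide was compiled for, including all of the
# instruction set extensions it supports.
add_halide_library(blur_host FROM my_generators
                   GENERATOR blur
                   TARGETS host)
```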
820
821To set the default autoscheduler, set the `AUTOSCHEDULER` argument to a target
822named like `Namespace::Scheduler`, for example `Halide::Adams19`. This will set
823the `-s` flag on the generator command line to `Scheduler` and add the target to
824the list of plugins. Additional plugins can be loaded by setting the `PLUGINS`
825argument. If the argument to `AUTOSCHEDULER` does not contain `::` or it does
826not name a target, it will be passed to the `-s` flag verbatim.
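
The two forms of `AUTOSCHEDULER` described above might look as follows. The
names `my_generators`, `MyScheduler` and `my_scheduler_plugin` are hypothetical,
and `PLUGINS` is assumed here to accept a CMake target.

```cmake
# Names a target: passes `-s Adams19` and adds Halide::Adams19 to the plugins.
add_halide_library(blur_auto FROM my_generators
                   GENERATOR blur
                   AUTOSCHEDULER Halide::Adams19)

# No `::` in the name: "MyScheduler" is passed to `-s` verbatim, so the plugin
# that provides it has to be listed explicitly.
add_halide_library(blur_custom FROM my_generators
                   GENERATOR blur
                   AUTOSCHEDULER MyScheduler
                   PLUGINS my_scheduler_plugin)
```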
827
828If `GRADIENT_DESCENT` is set, then the module will be built suitably for
829gradient descent calculation in TensorFlow or PyTorch. See
830`Generator::build_gradient_module()` for more documentation. This corresponds to
831passing `-d 1` at the generator command line.
832
833If the `C_BACKEND` option is set, this command will invoke the configured C++
834compiler on a generated source. Note that a `<target>.runtime` target is _not_
835created in this case, and the `USE_RUNTIME` option is ignored. Other options
836work as expected.
837
838If `REGISTRATION` is set, the path to the generated `.registration.cpp` file
839will be set in `OUTVAR`. This can be used to generate a runner for a Halide
840library that is useful for benchmarking and testing, as documented above. This
841is equivalent to setting `-e registration` at the generator command line.
842
843Lastly, each of the `extra-output` arguments directly correspond to an extra
844output (via `-e`) from the generator. The value `OUTVAR` names a variable into
845which a path (relative to
846[`CMAKE_CURRENT_BINARY_DIR`][cmake_current_binary_dir]) to the extra file will
847be written.
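
A sketch of capturing output paths, under two labeled assumptions: that `STMT`
is one of the `extra-output` keywords, and that `Halide::RunGenMain` is the
imported target that supplies the runner's `main()`. The variable names and
`my_generators` are hypothetical.

```cmake
add_halide_library(blur FROM my_generators
                   # Path to the generated .registration.cpp file.
                   REGISTRATION BLUR_REGISTRATION
                   # Assumed extra-output keyword; its path goes into BLUR_STMT.
                   STMT BLUR_STMT)

message(STATUS "Registration source: ${BLUR_REGISTRATION}")
message(STATUS "Statement file: ${BLUR_STMT}")

# A runner for benchmarking and testing, built from the registration file.
add_executable(blur_rungen ${BLUR_REGISTRATION})
target_link_libraries(blur_rungen PRIVATE blur Halide::RunGenMain)
```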
848
## Cross compiling
850
851Cross-compiling in CMake can be tricky, since CMake doesn't easily support
852compiling for both the host platform and the cross-platform within the same
853build. Unfortunately, Halide generator executables are just about always
854designed to run on the host platform. Each project will be set up differently
855and have different requirements, but here are some suggestions for effective use
856of CMake in these scenarios.
857
### Use a super-build
859
860A CMake super-build consists of breaking down a project into sub-projects that
861are isolated by [toolchain][cmake-toolchains]. The basic structure is to have an
862outermost project that only coordinates the sub-builds via the
863[`ExternalProject`][externalproject] module.
864
865One would then use Halide to build a generator executable in one self-contained
866project, then export that target to be used in a separate project. The second
867project would be configured with the target [toolchain][cmake-toolchains] and
868would call `add_halide_library` with no `TARGETS` option and set `FROM` equal to
869the name of the imported generator executable. Obviously, this is a significant
870increase in complexity over a typical CMake project.
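
One possible shape for the outermost project is sketched below. The directory
names, the package name `my_generators` and the cache variable
`MY_TARGET_TOOLCHAIN` are all hypothetical; the `generators` sub-project is
assumed to export its generator executable from its build tree, and the `app`
sub-project is assumed to import it with `find_package`.

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_superbuild LANGUAGES NONE)

include(ExternalProject)

# Hypothetical cache variable naming the toolchain file for the target platform.
set(MY_TARGET_TOOLCHAIN "" CACHE FILEPATH "Toolchain file for the cross build")

# Sub-project 1: builds the generator executable with the host toolchain.
ExternalProject_Add(generators
    SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/generators"
    INSTALL_COMMAND "")
ExternalProject_Get_Property(generators BINARY_DIR)

# Sub-project 2: cross-compiled. It locates the exported generator through
# my_generators_DIR and calls add_halide_library(... FROM ...).
ExternalProject_Add(app
    SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/app"
    CMAKE_ARGS
        "-DCMAKE_TOOLCHAIN_FILE=${MY_TARGET_TOOLCHAIN}"
        "-Dmy_generators_DIR=${BINARY_DIR}"
    INSTALL_COMMAND ""
    DEPENDS generators)
```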
871
### Use `ExternalProject` directly
873
874A lighter weight alternative to the above is to use
875[`ExternalProject`][externalproject] directly in your parent build. Configure
876the parent build with the target [toolchain][cmake-toolchains], and configure
877the inner project to use the host toolchain. Then, manually create an
878[`IMPORTED` target][imported-executable] for your generator executable and call
879`add_halide_library` as described above.
880
881The main drawback of this approach is that creating accurate `IMPORTED` targets
882is difficult since predicting the names and locations of your binaries across
883all possible platform and CMake project generators is difficult. In particular,
884it is hard to predict executable extensions in cross-OS builds.
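
A rough sketch of this approach follows. The project layout and the names
`host_generators` and `my_generators` are hypothetical, and it is an assumption
that not forwarding the toolchain file is enough for the inner project to pick
the host compiler. The hard-coded executable path shows the drawback described
above: it is only correct for a single-config generator on a host without
executable suffixes.

```cmake
include(ExternalProject)

# Inner project: builds the generator for the host.
ExternalProject_Add(host_generators
    SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/generators"
    INSTALL_COMMAND ""
    BUILD_BYPRODUCTS "<BINARY_DIR>/my_generators")
ExternalProject_Get_Property(host_generators BINARY_DIR)

# Manually describe the executable that the inner project produces.
add_executable(my_generators IMPORTED)
set_target_properties(my_generators PROPERTIES
    IMPORTED_LOCATION "${BINARY_DIR}/my_generators")
add_dependencies(my_generators host_generators)

# No TARGETS option, as described above.
add_halide_library(blur FROM my_generators)
```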
885
### Use an emulator or run on device
887
888The [`CMAKE_CROSSCOMPILING_EMULATOR`][cmake_crosscompiling_emulator] variable
889allows one to specify a command _prefix_ to run a target-system binary on the
890host machine. One could set this to a custom shell script that uploads the
891generator executable, runs it on the device and copies back the results.
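
For instance, a toolchain file for the target platform might contain the line
below. The script `run-on-device.sh` is hypothetical: it is assumed to upload
the executable given as its first argument, run it on the device with the
remaining arguments, and copy the generated files back.

```cmake
# CMake prefixes this command when it runs a target-system executable.
set(CMAKE_CROSSCOMPILING_EMULATOR "${CMAKE_CURRENT_LIST_DIR}/run-on-device.sh")
```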
892
### Bypass CMake
894
895The previous two options ensure that the targets generated by
896`add_halide_library` will be _normal_ static libraries. This approach does not
897use [`ExternalProject`][externalproject], but instead produces `IMPORTED`
898targets. The main drawback of `IMPORTED` targets is that they are considered
899second-class in CMake. In particular, they cannot be installed with the typical
900[`install(TARGETS)` command][install-targets]. Instead, they must be installed
901using [`install(FILES)`][install-files] and the
902[`$<TARGET_FILE:tgt>`][target-file] generator expression.
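
A minimal sketch of installing such a library, assuming a Halide library target
named `blur` (a hypothetical name) that ended up as an `IMPORTED` target:

```cmake
include(GNUInstallDirs)

# install(TARGETS blur) does not work for an IMPORTED target, so install the
# file it refers to instead.
install(FILES $<TARGET_FILE:blur>
        DESTINATION ${CMAKE_INSTALL_LIBDIR})
```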
903
# Contributing CMake code to Halide
905
906When contributing new CMake code to Halide, keep in mind that the minimum
907version is 3.16. Therefore, it is possible (and indeed required) to use modern
908CMake best practices.
909
910Like any large and complex system with a dedication to preserving backwards
911compatibility, CMake is difficult to learn and full of traps. While not
912comprehensive, the following serves as a guide for writing quality CMake code
913and outlines the code quality expectations we have as they apply to CMake.
914
## General guidelines and best practices
916
917The following are some common mistakes that lead to subtly broken builds.
918
919- **Reading the build directory.** While setting up the build, the build
920  directory should be considered _write only_. Using the build directory as a
921  read/write temporary directory is acceptable as long as all temp files are
922  cleaned up by the end of configuration.
923- **Not using [generator expressions][cmake-genex].** Declarative is better than
924  imperative and this is no exception. Conditionally adding to a target property
925  can leak unwanted details about the build environment into packages. Some
926  information is not accurate or available except via generator expressions, eg.
927  the build configuration.
928- **Using the wrong variable.** `CMAKE_SOURCE_DIR` doesn't always point to the
929  Halide source root. When someone uses Halide via
930  [`FetchContent`][fetchcontent], it will point to _their_ source root instead.
931  The correct variable is [`Halide_SOURCE_DIR`][project-name_source_dir]. If you
932  want to know if the compiler is MSVC, check it directly with the
933  [`MSVC`][msvc] variable; don't use [`WIN32`][win32]. That will be wrong when
934  compiling with clang on Windows. In most cases, however, a generator
935  expression will be more appropriate.
936- **Using directory properties.** Directory properties have vexing behavior and
937  are essentially deprecated from CMake 3.0+. Propagating target properties is
938  the way of the future.
939- **Using the wrong visibility.** Target properties can be `PRIVATE`,
940  `INTERFACE`, or both (aka `PUBLIC`). Pick the most conservative one for each
941  scenario. Refer to the [transitive usage requirements][cmake-propagation] docs
942  for more information.
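
The guidelines above can be combined in a short sketch. All target, file,
directory and macro names here (`my_lib`, `my_dep`, `MY_LIB_DEBUG`, the
`my_lib/include` directory) are hypothetical.

```cmake
add_library(my_lib STATIC my_lib.cpp)

# Generator expressions: evaluated per configuration and per compiler when the
# build system is generated, rather than with if() at configure time.
target_compile_definitions(my_lib PRIVATE $<$<CONFIG:Debug>:MY_LIB_DEBUG>)
target_compile_options(my_lib PRIVATE $<$<CXX_COMPILER_ID:MSVC>:/W4>)

# Right variable: Halide's own source root, also correct under FetchContent.
# Visibility: PUBLIC, because consumers include these headers too.
target_include_directories(
    my_lib PUBLIC $<BUILD_INTERFACE:${Halide_SOURCE_DIR}/my_lib/include>)

# Visibility: PRIVATE, because my_dep is an implementation detail.
target_link_libraries(my_lib PRIVATE my_dep)
```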
943
### Prohibited commands list
945
946As mentioned above, using directory properties is brittle and they are therefore
947_not allowed_. The following functions may not appear in any new CMake code.
948
| Command                             | Alternative                                                                                        |
| ----------------------------------- | -------------------------------------------------------------------------------------------------- |
| `add_compile_definitions`           | Use [`target_compile_definitions`][target_compile_definitions]                                     |
| `add_compile_options`               | Use [`target_compile_options`][target_compile_options]                                             |
| `add_definitions`                   | Use [`target_compile_definitions`][target_compile_definitions]                                     |
| `add_link_options`                  | Use [`target_link_options`][target_link_options], but prefer not to use either                     |
| `get_directory_property`            | Use cache variables or target properties                                                           |
| `get_property(... DIRECTORY)`       | Use cache variables or target properties                                                           |
| `include_directories`               | Use [`target_include_directories`][target_include_directories]                                     |
| `link_directories`                  | Use [`target_link_libraries`][target_link_libraries]                                               |
| `link_libraries`                    | Use [`target_link_libraries`][target_link_libraries]                                               |
| `remove_definitions`                | [Generator expressions][cmake-genex] in [`target_compile_definitions`][target_compile_definitions] |
| `set_directory_properties`          | Use cache variables or target properties                                                           |
| `set_property(... DIRECTORY)`       | Use cache variables or target properties                                                           |
| `target_link_libraries(target lib)` | Use [`target_link_libraries`][target_link_libraries] _with a visibility specifier_ (eg. `PRIVATE`) |
964
965As an example, it was once common practice to write code similar to this:
966
```cmake
# WRONG: do not do this
include_directories(include)
add_library(my_lib source1.cpp ...)
```
972
However, this has two major pitfalls. First, it applies to _all_ targets created
in that directory, even those before the call to `include_directories` and those
created in [`include()`][include]-ed CMake files. As CMake files get larger and
more complex, this behavior gets harder to pinpoint. This is particularly vexing
when using the `link_libraries` or `add_definitions` commands. Second, this form
does not provide a way to _propagate_ the include directory to consumers of
`my_lib`. The correct way to do this is:
980
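```cmake
# CORRECT
add_library(my_lib source1.cpp ...)
# A path inside a generator expression is not made absolute automatically,
# so spell out the source directory.
target_include_directories(
    my_lib PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>)
```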
986
987This is better in many ways. It only affects the target in question. It
988propagates the include path to the targets linking to it (via `PUBLIC`). It also
989does not incorrectly export the host-filesystem-specific include path when
990installing or packaging the target (via `$<BUILD_INTERFACE>`).
991
992If common properties need to be grouped together, use an INTERFACE target
993(better) or write a function (worse). There are also several functions that are
994disallowed for other reasons:
995
| Command                         | Reason                                                                            | Alternative                                                                            |
| ------------------------------- | --------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
| `aux_source_directory`          | Interacts poorly with incremental builds and Git                                  | List source files explicitly                                                           |
| `build_command`                 | CTest internal function                                                           | Use CTest build-and-test mode via [`CMAKE_CTEST_COMMAND`][cmake_ctest_command]         |
| `cmake_host_system_information` | Usually misleading information.                                                   | Inspect [toolchain][cmake-toolchains] variables and use generator expressions.         |
| `cmake_policy(... OLD)`         | OLD policies are deprecated by definition.                                        | Instead, fix the code to work with the new policy.                                     |
| `create_test_sourcelist`        | We use our own unit testing solution                                              | See the [adding tests](#adding-tests) section.                                         |
| `define_property`               | Adds unnecessary complexity                                                       | Use a cache variable. Exceptions under special circumstances.                          |
| `enable_language`               | Halide is C/C++ only                                                              | [`FindCUDAToolkit`][findcudatoolkit] or [`FindCUDA`][findcuda], appropriately guarded. |
| `file(GLOB ...)`                | Interacts poorly with incremental builds and Git                                  | List source files explicitly. Allowed if not globbing for source files.                |
| `fltk_wrap_ui`                  | Halide does not use FLTK                                                          | None                                                                                   |
| `include_external_msproject`    | Halide must remain portable                                                       | Write a CMake package config file or find module.                                      |
| `include_guard`                 | Use of recursive inclusion is not allowed                                         | Write (recursive) functions.                                                           |
| `include_regular_expression`    | Changes default dependency checking behavior                                      | None                                                                                   |
| `load_cache`                    | Superseded by [`FetchContent`][fetchcontent]/[`ExternalProject`][externalproject] | Use aforementioned modules                                                             |
| `macro`                         | CMake macros are not hygienic and are therefore error-prone                       | Use functions instead.                                                                 |
| `site_name`                     | Privacy: do not want to leak host name information                                | Provide a cache variable, generate a unique name.                                      |
| `variable_watch`                | Debugging helper                                                                  | None. Not needed in production.                                                        |
1014
1015Lastly, do not introduce any dependencies via [`find_package`][find_package]
1016without broader approval. Confine dependencies to the `dependencies/` subtree.
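
Returning to the advice on grouping common properties, the two options might be
sketched as follows. Every name here (`my_common_settings`,
`my_apply_common_settings`, `MY_PROJECT_BUILD`, the libraries and their source
files) is hypothetical.

```cmake
# Better: an INTERFACE target carries the shared settings.
add_library(my_common_settings INTERFACE)
target_compile_features(my_common_settings INTERFACE cxx_std_11)
target_compile_definitions(my_common_settings INTERFACE MY_PROJECT_BUILD)

add_library(my_lib STATIC my_lib.cpp)
target_link_libraries(my_lib PRIVATE my_common_settings)

# Worse, but acceptable: a function (not a macro) that applies them.
function(my_apply_common_settings target)
    target_compile_features(${target} PRIVATE cxx_std_11)
    target_compile_definitions(${target} PRIVATE MY_PROJECT_BUILD)
endfunction()

add_library(my_other_lib STATIC my_other_lib.cpp)
my_apply_common_settings(my_other_lib)
```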
1017
### Prohibited variables list
1019
1020Any variables that are specific to languages that are not enabled should, of
1021course, be avoided. But of greater concern are variables that are easy to misuse
1022or should not be overridden for our end-users. The following (non-exhaustive)
1023list of variables shall not be used in code merged into master.
1024
| Variable                        | Reason                                        | Alternative                                                                                             |
| ------------------------------- | --------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| `CMAKE_ROOT`                    | Code smell                                    | Rely on `find_package` search options; include `HINTS` if necessary                                     |
| `CMAKE_DEBUG_TARGET_PROPERTIES` | Debugging helper                              | None                                                                                                    |
| `CMAKE_FIND_DEBUG_MODE`         | Debugging helper                              | None                                                                                                    |
| `CMAKE_RULE_MESSAGES`           | Debugging helper                              | None                                                                                                    |
| `CMAKE_VERBOSE_MAKEFILE`        | Debugging helper                              | None                                                                                                    |
| `CMAKE_BACKWARDS_COMPATIBILITY` | Deprecated                                    | None                                                                                                    |
| `CMAKE_BUILD_TOOL`              | Deprecated                                    | `${CMAKE_COMMAND} --build` or [`CMAKE_MAKE_PROGRAM`][cmake_make_program] (but see below)                |
| `CMAKE_CACHEFILE_DIR`           | Deprecated                                    | [`CMAKE_BINARY_DIR`][cmake_binary_dir], but see below                                                   |
| `CMAKE_CFG_INTDIR`              | Deprecated                                    | `$<CONFIG>`, `$<TARGET_FILE:..>`, target resolution of [`add_custom_command`][add_custom_command], etc. |
| `CMAKE_CL_64`                   | Deprecated                                    | [`CMAKE_SIZEOF_VOID_P`][cmake_sizeof_void_p]                                                            |
| `CMAKE_COMPILER_IS_*`           | Deprecated                                    | [`CMAKE_<LANG>_COMPILER_ID`][cmake_lang_compiler_id]                                                    |
| `CMAKE_HOME_DIRECTORY`          | Deprecated                                    | [`CMAKE_SOURCE_DIR`][cmake_source_dir], but see below                                                   |
| `CMAKE_DIRECTORY_LABELS`        | Directory property                            | None                                                                                                    |
| `CMAKE_BUILD_TYPE`              | Only applies to single-config generators.     | `$<CONFIG>`                                                                                             |
| `CMAKE_*_FLAGS*` (w/o `_INIT`)  | User-only                                     | Write a [toolchain][cmake-toolchains] file with the corresponding `_INIT` variable                      |
| `CMAKE_COLOR_MAKEFILE`          | User-only                                     | None                                                                                                    |
| `CMAKE_ERROR_DEPRECATED`        | User-only                                     | None                                                                                                    |
| `CMAKE_CONFIGURATION_TYPES`     | We only support the four standard build types | None                                                                                                    |
1045
Of course, feel free to insert debugging helpers _while developing_, but please
remove them before review. Finally, the following variables are allowed, but
their use must be motivated:
1049
| Variable                                       | Reason                                              | Alternative                                                                                  |
| ---------------------------------------------- | --------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| [`CMAKE_SOURCE_DIR`][cmake_source_dir]         | Points to global source root, not Halide's.         | [`Halide_SOURCE_DIR`][project-name_source_dir] or [`PROJECT_SOURCE_DIR`][project_source_dir] |
| [`CMAKE_BINARY_DIR`][cmake_binary_dir]         | Points to global build root, not Halide's           | [`Halide_BINARY_DIR`][project-name_binary_dir] or [`PROJECT_BINARY_DIR`][project_binary_dir] |
| [`CMAKE_MAKE_PROGRAM`][cmake_make_program]     | CMake abstracts over differences in the build tool. | Prefer CTest's build and test mode or CMake's `--build` mode                                 |
| [`CMAKE_CROSSCOMPILING`][cmake_crosscompiling] | Often misleading.                                   | Inspect relevant variables directly, eg. [`CMAKE_SYSTEM_NAME`][cmake_system_name]            |
| [`BUILD_SHARED_LIBS`][build_shared_libs]       | Could override user setting                         | None, but be careful to restore value when overriding for a dependency                       |
1057
1058Any use of these functions and variables will block a PR.
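
A few of the alternatives from the tables above, sketched with hypothetical
names (`my_lib`, `my_dependency`, the `MY_LIB_*` macros and
`my_saved_shared_libs`):

```cmake
add_library(my_lib STATIC my_lib.cpp)

# Instead of CMAKE_CL_64: check the pointer size.
if (CMAKE_SIZEOF_VOID_P EQUAL 8)
    target_compile_definitions(my_lib PRIVATE MY_LIB_64_BIT)
endif ()

# Instead of CMAKE_COMPILER_IS_GNUCXX: check the compiler ID.
if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
    target_compile_options(my_lib PRIVATE -Wall)
endif ()

# Instead of CMAKE_SOURCE_DIR: use the project-specific variable.
target_include_directories(
    my_lib PRIVATE "${Halide_SOURCE_DIR}/my_lib/include")

# BUILD_SHARED_LIBS: restore the user's value after overriding it.
set(my_saved_shared_libs "${BUILD_SHARED_LIBS}")
set(BUILD_SHARED_LIBS OFF)
add_subdirectory(my_dependency)
set(BUILD_SHARED_LIBS "${my_saved_shared_libs}")
```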
1059
## Adding tests
1061
When adding a file to any of the folders under `test`, be aware that CI expects
that every `.c` and `.cpp` file appears in the `CMakeLists.txt` file _on its own
line_, possibly as a comment. This is to avoid globbing and also to ensure that
added files are not missed.
1066
1067For most test types, it should be as simple as adding to the existing lists,
1068which must remain in alphabetical order. Generator tests are trickier, but
1069following the existing examples is a safe way to go.
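
As a purely illustrative sketch of that layout: the variable name and the file
names below are hypothetical, and Halide's real test `CMakeLists.txt` files use
their own helper functions rather than a bare `set`.

```cmake
set(my_test_sources
    # Every file sits on its own line, in alphabetical order.
    bounds.cpp
    # broken_for_now.cpp  (disabled, but still listed so CI can see it)
    convolution.cpp
    vectorize.cpp)
```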
1070
## Adding apps
1072
1073If you're contributing a new app to Halide: great! Thank you! There are a few
1074guidelines you should follow when writing a new app.
1075
- Write the app as if it were a top-level project. You should call
  `find_package(Halide)` and set the C++ version to 11.
- Call [`enable_testing()`][enable_testing] and add a small test that runs the
  app.
- Don't assume your app will have access to a GPU. Write your schedules to be
  robust to varying buildbot hardware.
- Don't assume your app will be run on a specific OS, architecture, or bitness.
  Write your apps to be robust (ideally efficient) on all supported platforms.
- If you rely on any additional packages, don't include them as `REQUIRED`;
  instead, test to see if their targets are available and, if not, call
  `return()` before creating any targets. In this case, print a
  `message(STATUS "[SKIP] ...")`, too.
- Look at the existing apps for examples.
- Test your app with ctest before opening a PR. Apps are built as part of the
  test, rather than the main build.
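
Put together, an app's `CMakeLists.txt` might start out like the sketch below.
The app name `my_app`, its source files and its reliance on libpng are
hypothetical, and the generator is assumed to be registered under the name
`my_app`.

```cmake
cmake_minimum_required(VERSION 3.16)
project(my_app)

enable_testing()

# Set up the C++ language version.
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED YES)
set(CMAKE_CXX_EXTENSIONS NO)

find_package(Halide REQUIRED)

# Optional dependency: skip the app instead of failing the configure step.
find_package(PNG)
if (NOT TARGET PNG::PNG)
    message(STATUS "[SKIP] my_app requires libpng")
    return()
endif ()

# Generator executable and the Halide library produced by running it.
add_executable(my_app.generator my_app_generator.cpp)
target_link_libraries(my_app.generator PRIVATE Halide::Generator)
add_halide_library(my_app FROM my_app.generator)

# The app itself, plus a small test that runs it.
add_executable(my_app_run main.cpp)
target_link_libraries(my_app_run PRIVATE my_app PNG::PNG)
add_test(NAME my_app_run COMMAND my_app_run)
```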
1091
1092[add_custom_command]:
1093  https://cmake.org/cmake/help/latest/command/add_custom_command.html
1094[add_library]: https://cmake.org/cmake/help/latest/command/add_library.html
1095[add_subdirectory]:
1096  https://cmake.org/cmake/help/latest/command/add_subdirectory.html
1097[atlas]: http://math-atlas.sourceforge.net/
1098[brew-cmake]: https://formulae.brew.sh/cask/cmake#default
1099[build_shared_libs]:
1100  https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html
1101[choco-cmake]: https://chocolatey.org/packages/cmake/
1102[choco-doxygen]: https://chocolatey.org/packages/doxygen.install
1103[choco-ninja]: https://chocolatey.org/packages/ninja
1104[chocolatey]: https://chocolatey.org/
1105[cmake-apt]: https://apt.kitware.com/
1106[cmake-discourse]: https://discourse.cmake.org/
1107[cmake-docs]: https://cmake.org/cmake/help/latest/
1108[cmake-download]: https://cmake.org/download/
1109[cmake-from-source]: https://cmake.org/install/
1110[cmake-genex]:
1111  https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html
1112[cmake-install]:
1113  https://cmake.org/cmake/help/latest/manual/cmake.1.html#install-a-project
1114[cmake-propagation]:
1115  https://cmake.org/cmake/help/latest/manual/cmake-buildsystem.7.html#transitive-usage-requirements
1116[cmake-toolchains]:
1117  https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html
1118[cmake-user-interaction]:
1119  https://cmake.org/cmake/help/latest/guide/user-interaction/index.html#setting-build-variables
1120[cmake_binary_dir]:
1121  https://cmake.org/cmake/help/latest/variable/CMAKE_BINARY_DIR.html
1122[cmake_build_type]:
1123  https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html
1124[cmake_crosscompiling]:
1125  https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING.html
1126[cmake_crosscompiling_emulator]:
1127  https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING_EMULATOR.html
1128[cmake_ctest_command]:
1129  https://cmake.org/cmake/help/latest/variable/CMAKE_CTEST_COMMAND.html
1130[cmake_current_binary_dir]:
1131  https://cmake.org/cmake/help/latest/variable/CMAKE_CURRENT_BINARY_DIR.html
1132[cmake_cxx_extensions]:
1133  https://cmake.org/cmake/help/latest/variable/CMAKE_CXX_EXTENSIONS.html
1134[cmake_cxx_standard]:
1135  https://cmake.org/cmake/help/latest/variable/CMAKE_CXX_STANDARD.html
1136[cmake_cxx_standard_required]:
1137  https://cmake.org/cmake/help/latest/variable/CMAKE_CXX_STANDARD_REQUIRED.html
1138[cmake_lang_compiler_id]:
1139  https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER_ID.html
1140[cmake_make_program]:
1141  https://cmake.org/cmake/help/latest/variable/CMAKE_MAKE_PROGRAM.html
1142[cmake_minimum_required]:
1143  https://cmake.org/cmake/help/latest/command/cmake_minimum_required.html
1144[cmake_module_path]:
1145  https://cmake.org/cmake/help/latest/variable/CMAKE_MODULE_PATH.html
1146[cmake_sizeof_void_p]:
1147  https://cmake.org/cmake/help/latest/variable/CMAKE_SIZEOF_VOID_P.html
1148[cmake_source_dir]:
1149  https://cmake.org/cmake/help/latest/variable/CMAKE_SOURCE_DIR.html
1150[cmake_system_name]:
1151  https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_NAME.html
1152[doxygen-download]: https://www.doxygen.nl/download.html
1153[doxygen]: https://www.doxygen.nl/index.html
1154[eigen]: http://eigen.tuxfamily.org/index.php?title=Main_Page
1155[enable_testing]:
1156  https://cmake.org/cmake/help/latest/command/enable_testing.html
1157[externalproject]:
1158  https://cmake.org/cmake/help/latest/module/ExternalProject.html
1159[fetchcontent]: https://cmake.org/cmake/help/latest/module/FetchContent.html
1160[find_package]: https://cmake.org/cmake/help/latest/command/find_package.html
1161[findcuda]: https://cmake.org/cmake/help/latest/module/FindCUDA.html
1162[findcudatoolkit]:
1163  https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html
1164[finddoxygen]: https://cmake.org/cmake/help/latest/module/FindDoxygen.html
1165[findjpeg]: https://cmake.org/cmake/help/latest/module/FindJPEG.html
1166[findopencl]: https://cmake.org/cmake/help/latest/module/FindOpenCL.html
1167[findopengl]: https://cmake.org/cmake/help/latest/module/FindOpenGL.html
1168[findpng]: https://cmake.org/cmake/help/latest/module/FindPNG.html
1169[findpython3]: https://cmake.org/cmake/help/latest/module/FindPython3.html
1170[findx11]: https://cmake.org/cmake/help/latest/module/FindX11.html
1171[halide-generator-tutorial]:
1172  https://halide-lang.org/tutorials/tutorial_lesson_15_generators.html
1173[halide-tutorials]: https://halide-lang.org/tutorials/tutorial_introduction.html
1174[homebrew]: https://brew.sh
1175[imported-executable]:
1176  https://cmake.org/cmake/help/latest/command/add_executable.html#imported-executables
1177[imported-target]:
1178  https://cmake.org/cmake/help/latest/manual/cmake-buildsystem.7.html#imported-targets
1179[include]: https://cmake.org/cmake/help/latest/command/include.html
1180[install-files]: https://cmake.org/cmake/help/latest/command/install.html#files
1181[install-targets]:
1182  https://cmake.org/cmake/help/latest/command/install.html#targets
1183[libjpeg]: https://www.libjpeg-turbo.org/
1184[libpng]: http://www.libpng.org/pub/png/libpng.html
1185[lld]: https://lld.llvm.org/
1186[msvc]: https://cmake.org/cmake/help/latest/variable/MSVC.html
1187[msvc-cmd]:
1188  https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=vs-2019
1189[ninja-download]: https://github.com/ninja-build/ninja/releases
1190[ninja]: https://ninja-build.org/
1191[openblas]: https://www.openblas.net/
1192[project]: https://cmake.org/cmake/help/latest/command/project.html
1193[project-name_binary_dir]:
1194  https://cmake.org/cmake/help/latest/variable/PROJECT-NAME_BINARY_DIR.html
1195[project-name_source_dir]:
1196  https://cmake.org/cmake/help/latest/variable/PROJECT-NAME_SOURCE_DIR.html
1197[project_source_dir]:
1198  https://cmake.org/cmake/help/latest/variable/PROJECT_SOURCE_DIR.html
1199[project_binary_dir]:
1200  https://cmake.org/cmake/help/latest/variable/PROJECT_BINARY_DIR.html
1201[pypi-cmake]: https://pypi.org/project/cmake/
1202[python]: https://www.python.org/downloads/
1203[target-file]:
1204  https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#target-dependent-queries
1205[target_compile_definitions]:
1206  https://cmake.org/cmake/help/latest/command/target_compile_definitions.html
1207[target_compile_options]:
1208  https://cmake.org/cmake/help/latest/command/target_compile_options.html
1209[target_include_directories]:
1210  https://cmake.org/cmake/help/latest/command/target_include_directories.html
1211[target_link_libraries]:
1212  https://cmake.org/cmake/help/latest/command/target_link_libraries.html
1213[target_link_options]:
1214  https://cmake.org/cmake/help/latest/command/target_link_options.html
1215[vcpkg]: https://github.com/Microsoft/vcpkg
1216[vcvarsall]:
1217  https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=vs-2019#vcvarsall-syntax
1218[venv]: https://docs.python.org/3/tutorial/venv.html
1219[vs2019-cmake-docs]:
1220  https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=vs-2019
1221[win32]: https://cmake.org/cmake/help/latest/variable/WIN32.html
1222

README_rungen.md

# Running and Benchmarking Halide Generators
2
## Overview
4
`RunGen` is a simple(ish) wrapper that allows an arbitrary Generator to be built
into a single executable that can be run directly from bash, without needing to
wrap it in your own custom main() driver. It also implements rudimentary
benchmarking and memory-usage functionality.
9
10If you use the standard CMake rules for Generators, you get RunGen functionality
11automatically. (If you use Make, you might need to add an extra rule or two to
12your Makefile; all the examples in `apps/` already have these rules.)
13
14For every `halide_library` (or `halide_library_from_generator`) rule, there is
15an implicit `name.rungen` rule that generates an executable that wraps the
16Generator library:
17
```
# In addition to defining a static library named "local_laplacian", this rule
# also implicitly defines an executable target named "local_laplacian.rungen"
halide_library(
    local_laplacian
    SRCS local_laplacian_generator.cc
)
```
26
27You can build and run this like any other executable:
28
```
$ make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen
Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
...typical "usage" text...
```
34
35To be useful, you need to pass in values for the Generator's inputs (and
36locations for the output(s)) on the command line, of course. You can use the
37`--describe` flag to see the names and expected types:
38
```
# ('make bin/local_laplacian.rungen && ' prefix omitted henceforth for clarity)
$ ./bin/local_laplacian.rungen --describe
Filter name: "local_laplacian"
  Input "input" is of type Buffer<uint16> with 3 dimensions
  Input "levels" is of type int32
  Input "alpha" is of type float32
  Input "beta" is of type float32
  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
```
49
50Warning: Outputs may have `$X` (where `X` is a small integer) appended to their
51names in some cases (or, in the case of Generators that don't explicitly declare
52outputs via `Output<>`, an autogenerated name of the form `fX`). If this
53happens, don't forget to escape the `$` with a backslash as necessary. These are
54both bugs we intend to fix; see https://github.com/halide/Halide/issues/2194
55
56As a convenience, there is also an implicit target that builds-and-runs, named
57simply "NAME.run":
58
59```
60# This is equivalent to "make bin/local_laplacian.rungen && ./bin/local_laplacian.rungen"
61$ make bin/local_laplacian.run
62Usage: local_laplacian.rungen argument=value [argument=value... ] [flags]
63
64# To pass arguments to local_laplacian.rungen, set the RUNARGS var:
65$ make bin/local_laplacian.run RUNARGS=--describe
66Filter name: "local_laplacian"
67  Input "input" is of type Buffer<uint16> with 3 dimensions
68  Input "levels" is of type int32
69  Input "alpha" is of type float32
70  Input "beta" is of type float32
71  Output "local_laplacian" is of type Buffer<uint16> with 3 dimensions
72```
73
74Inputs are specified as `name=value` pairs, in any order. Scalar inputs are
75specified in the typical text form, while buffer inputs (and outputs) are specified
76via paths to image files. RunGen currently can read/write image files in any
77format supported by halide_image_io.h; at this time, that means .png, .jpg,
78.ppm, .pgm, and .tmp formats. (We plan to add .tiff and .mat (level 5) in the
79future.)
80
81```
82$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
83$ display /tmp/out.png
84```
85
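When driving RunGen from a script, the `name=value` convention maps naturally onto a dictionary. The following Python sketch is illustrative only: `rungen_argv` is a hypothetical helper (not part of Halide), and the binary path and argument names are copied from the example above.

```python
def rungen_argv(binary, args, flags=()):
    """Build an argv list for a RunGen executable.

    `args` maps input/output names to values or file paths; RunGen accepts
    these name=value pairs in any order. `flags` holds options such as
    "--describe". This helper is hypothetical, not part of Halide.
    """
    return [binary, *flags, *(f"{name}={value}" for name, value in args.items())]


argv = rungen_argv(
    "./bin/local_laplacian.rungen",
    {
        "input": "../images/rgb_small16.png",
        "levels": 8,
        "alpha": 1,
        "beta": 1,
        "output": "/tmp/out.png",
    },
)
print(" ".join(argv))
```

Once the executable has been built, the resulting list can be handed to `subprocess.run(argv, check=True)`.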
86You can also specify any scalar input as `default` or `estimate`, which will use
87the default value specified for the input, or the value specified by
88`set_estimate` for that input. (If the relevant value isn't set for that input,
89a runtime error occurs.)
90
91```
92$ ./bin/local_laplacian.rungen input=../images/rgb_small16.png levels=8 alpha=estimate beta=default output=/tmp/out.png
93$ display /tmp/out.png
94```
95
96If you specify an input or output file format that doesn't match the required
97type/dimensions for an argument (e.g., using an 8-bit PNG for an Input<float>,
98or a grayscale image for a 3-dimensional input), RunGen will try to coerce the
99inputs to something sensible; that said, it's hard to always get this right, so
100warnings are **always** issued whenever an input or output is modified in any
101way.
102
103```
104# This filter expects a 16-bit RGB image as input, but we're giving it an 8-bit grayscale image:
105$ ./bin/local_laplacian.rungen input=../images/gray.png levels=8 alpha=1 beta=1 output=/tmp/out.png
106Warning: Image for Input "input" has 2 dimensions, but this argument requires at least 3 dimensions: adding dummy dimensions of extent 1.
107Warning: Image loaded for argument "input" is type uint8 but this argument expects type uint16; data loss may have occurred.
108```
109
110By default, we try to guess a suitable size for the output image(s), based
111mainly on the size of the input images (if any); you can also specify explicit
112output extents. (Note that output_extents are subject to constraints already
113imposed by the particular Generator's logic, so arbitrary values for
114--output_extents may produce runtime errors.)
115
116```
117# Constrain output extents to 100x200x3
118$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=../images/rgb_small16.png levels=8 alpha=1 beta=1 output=/tmp/out.png
119```
120
121Sometimes you don't care what the particular element values for an input are
122(e.g. for benchmarking), and you just want an image of a particular size; in
123that case, you can use the `zero:[]` pseudo-file; it infers the _type_ from the
124Generator, and inits every element to zero:
125
126```
127# Input is a 3-dimensional image with extent 123, 456, and 3
128# (blurring an image of all zeroes isn't very interesting, of course)
129$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
130```
131
132You can also specify arbitrary (nonzero) constants:
133
134```
135# Input is a 3-dimensional image with extent 123, 456, and 3,
136# filled with a constant value of 42
137$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=constant:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
138```
139
140Similarly, you can create identity images where only the diagonal elements are
1411-s (rest are 0-s) by invoking `identity:[]`. Diagonal elements are defined as
142those whose first two coordinates are equal.
143
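To make the "diagonal" rule concrete, here is a minimal Python sketch of the fill pattern described above. It is not RunGen's implementation; `identity_value` and `identity_image` are hypothetical names, and the sketch assumes the image has at least two dimensions.

```python
from itertools import product


def identity_value(coords):
    """Return 1 when the first two coordinates are equal, else 0."""
    return 1 if coords[0] == coords[1] else 0


def identity_image(extents):
    """Map every coordinate tuple inside `extents` to its identity value."""
    return {c: identity_value(c) for c in product(*(range(e) for e in extents))}


# A 3x3x2 image: each of the 2 channels gets 1s on its x == y diagonal.
img = identity_image([3, 3, 2])
print(img[(1, 1, 0)], img[(1, 2, 0)], img[(2, 2, 1)])
```

Note that under this rule any dimensions beyond the first two do not affect the value, so each "plane" of a 3-dimensional image carries its own diagonal.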
144There's also a `random:SEED:[]` pseudo-file, which fills the image with uniform
145noise based on a specific random-number seed:
146
147```
148# Input is a 3-dimensional image with extent 123, 456, and 3
149$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=random:42:[123,456,3] levels=8 alpha=1 beta=1 output=/tmp/out.png
150```
151
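If you generate many such invocations from a script, the pseudo-file strings can be composed programmatically. This is a sketch under the syntax shown in the examples above; `format_extents` and `pseudo_file` are hypothetical helpers, not part of Halide.

```python
def format_extents(extents):
    """Render [123, 456, 3] as the bracketed list "[123,456,3]"."""
    return "[" + ",".join(str(e) for e in extents) + "]"


def pseudo_file(kind, extents, value=None):
    """Compose a pseudo-file spec such as "random:42:[123,456,3]".

    `value` is the fill constant for "constant" or the seed for "random";
    "zero" and "identity" take no value.
    """
    if kind in ("zero", "identity"):
        if value is not None:
            raise ValueError(f"{kind} takes no value")
        return f"{kind}:{format_extents(extents)}"
    if kind in ("constant", "random"):
        if value is None:
            raise ValueError(f"{kind} requires a value")
        return f"{kind}:{value}:{format_extents(extents)}"
    raise ValueError(f"unknown pseudo-file kind: {kind}")


print(pseudo_file("zero", [123, 456, 3]))
print(pseudo_file("constant", [123, 456, 3], value=42))
print(pseudo_file("random", [123, 456, 3], value=42))
```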
152Instead of specifying an explicit set of extents for a pseudo-input, you can use
153the string `auto`, which will run a bounds query to choose a legal set of
154extents for that input given the known output extents. (This is only useful when
155used in conjunction with the `--output_extents` flag.)
156
157```
158$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
159```
160
161You can also specify `estimate` for the extents, which will use the estimate
162values provided, typically (but not necessarily) for auto_schedule. (If there
163aren't estimates for all of the buffer's dimensions, a runtime error occurs.)
164
165```
166$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate levels=8 alpha=1 beta=1 output=/tmp/out.png
167```
168
169You can combine the two and specify `estimate_then_auto` for the extents, which
170will attempt to use the estimate values; if a given input buffer has no
171estimates, it will fall back to the bounds-query result for that input:
172
173```
174$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] input=zero:estimate_then_auto levels=8 alpha=1 beta=1 output=/tmp/out.png
175```
176
177Similarly, you can use `estimate` for `--output_extents`, which will use the
178estimate values for each output. (If there aren't estimates for all of the
179outputs, a runtime error occurs.)
180
181```
182$ ./bin/local_laplacian.rungen --output_extents=estimate input=zero:auto levels=8 alpha=1 beta=1 output=/tmp/out.png
183```
184
185If you don't want to explicitly specify all (or any!) of the input values, you
186can use the `--default_input_buffers` and `--default_input_scalars` flags, which
187provide wildcards for any omitted inputs:
188
189```
190$ ./bin/local_laplacian.rungen --output_extents=[100,200,3] --default_input_buffers=random:0:auto --default_input_scalars=estimate output=/tmp/out.png
191```
192
193In this case, all input buffers will be sized according to a bounds query and
194filled with random data (seed 0); all input scalars will be initialized to their
195declared default values. (If they have no declared default value, a zero of the
196appropriate type will be used.)
197
198Note: `--default_input_buffers` can produce surprising sizes! For instance, any
199input that uses `BoundaryConditions::repeat_edge` to wrap itself can legally be
200set to almost any size, so you may legitimately get an input with extent=1 in
201all dimensions; whether this is useful to you or not depends on the code. It's
202highly recommended you do testing with the `--verbose` flag (which will log the
203calculated sizes) to reality-check that you are getting what you expect,
204especially for benchmarking.
205
206A common case (especially for benchmarking) is to use estimates for
207all inputs and outputs; for this, you can specify `--estimate_all`, which is
208just a shortcut for
209`--default_input_buffers=estimate_then_auto --default_input_scalars=estimate --output_extents=estimate`.
210
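For scripts that log the exact flags of a run, the shortcut can be spelled out explicitly. The sketch below only mirrors the equivalence stated above; `expand_estimate_all` is a hypothetical helper, and RunGen itself needs no such preprocessing.

```python
ESTIMATE_ALL_EXPANSION = [
    "--default_input_buffers=estimate_then_auto",
    "--default_input_scalars=estimate",
    "--output_extents=estimate",
]


def expand_estimate_all(flags):
    """Replace each --estimate_all with its three documented equivalents."""
    expanded = []
    for flag in flags:
        if flag == "--estimate_all":
            expanded.extend(ESTIMATE_ALL_EXPANSION)
        else:
            expanded.append(flag)
    return expanded


print(expand_estimate_all(["--estimate_all", "--verbose"]))
```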
211## Benchmarking
212
213To run a benchmark, use the `--benchmarks=all` flag:
214
215```
216$ ./bin/local_laplacian.rungen --benchmarks=all input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
217Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
218Best output throughput is 39.9802 mpix/sec.
219```
220
221You can use `--default_input_buffers` and `--default_input_scalars` here as
222well:
223
224```
225$ ./bin/local_laplacian.rungen --benchmarks=all --default_input_buffers --default_input_scalars --output_extents=estimate
226Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, over 3 blocks of 10 iterations.
227Best output throughput is 39.9802 mpix/sec.
228```
229
230Note: `halide_benchmark.h` is known to be inaccurate for GPU filters; see
231https://github.com/halide/Halide/issues/2278
232
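If you collect benchmark numbers across many runs, the two report lines can be parsed mechanically. The following Python sketch assumes the output format is exactly as in the sample above (it may differ between Halide versions); `parse_benchmark` is a hypothetical helper, not part of Halide.

```python
import re

# Patterns modeled on the sample output above; the exact wording is an assumption.
BENCH_RE = re.compile(
    r"Benchmark for (?P<name>\S+) produces best case of (?P<sec>[0-9.eE+-]+) sec/iter, "
    r"over (?P<blocks>\d+) blocks of (?P<iters>\d+) iterations\."
)
THROUGHPUT_RE = re.compile(
    r"Best output throughput is (?P<mpix>[0-9.eE+-]+) mpix/sec\."
)


def parse_benchmark(text):
    """Extract timing fields from RunGen benchmark output."""
    bench = BENCH_RE.search(text)
    if bench is None:
        raise ValueError("no benchmark line found")
    throughput = THROUGHPUT_RE.search(text)
    return {
        "name": bench["name"],
        "sec_per_iter": float(bench["sec"]),
        "blocks": int(bench["blocks"]),
        "iterations": int(bench["iters"]),
        # The throughput line is treated as optional.
        "mpix_per_sec": float(throughput["mpix"]) if throughput else None,
    }


sample = (
    "Benchmark for local_laplacian produces best case of 0.0494629 sec/iter, "
    "over 3 blocks of 10 iterations.\n"
    "Best output throughput is 39.9802 mpix/sec.\n"
)
print(parse_benchmark(sample))
```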
233## Measuring Memory Usage
234
235To track memory usage, use the `--track_memory` flag, which measures the
236high-water-mark of CPU memory usage.
237
238```
239$ ./bin/local_laplacian.rungen --track_memory input=zero:[1920,1080,3] levels=8 alpha=1 beta=1 --output_extents=[100,200,3]
240Maximum Halide memory: 82688420 bytes for output of 1.97754 mpix.
241```
242
243Warning: `--track_memory` may degrade performance; don't combine it with
244`--benchmarks` or expect meaningful timing measurements when using it.
245
246## Using RunGen in Make
247
248To add support for RunGen to your Makefile, you need to add rules something like
249this (see `apps/support/Makefile.inc` for an example):
250
251```
252HALIDE_DISTRIB ?= /path/to/halide/distrib/folder
253
254$(BIN)/RunGenMain.o: $(HALIDE_DISTRIB)/tools/RunGenMain.cpp
255  @mkdir -p $(@D)
256  @$(CXX) -c $< $(CXXFLAGS) $(LIBPNG_CXX_FLAGS) $(LIBJPEG_CXX_FLAGS) -I$(BIN) -o $@
257
258.PRECIOUS: $(BIN)/%.rungen
259$(BIN)/%.rungen: $(BIN)/%.a $(BIN)/%.registration.cpp $(BIN)/RunGenMain.o
260  $(CXX) $(CXXFLAGS) $^ -o $@ $(LIBPNG_LIBS) $(LIBJPEG_LIBS) $(LDFLAGS)
261
262RUNARGS ?=
263
264$(BIN)/%.run: $(BIN)/%.rungen
265  @$(CURDIR)/$< $(RUNARGS)
266```
267
268Note that the `%.registration.cpp` file is created by running a generator and
269specifying `registration` in the comma-separated list of files to emit; these
270are also generated by default if `-e` is not used on the generator command line.
271
272## Known Issues & Caveats
273
274- If your Generator uses `define_extern()`, you must have all link-time
275  dependencies declared properly via `FILTER_DEPS`; otherwise, you'll fail to
276  link.
277- The code does its best to detect when inputs or outputs need to be
278  chunky/interleaved (rather than planar), but in unusual cases it might guess
279  wrong; if your Generator uses buffers with unusual stride setups, RunGen might
280  fail at runtime. (If this happens, please file a bug!)
281- The code for deducing good output sizes is rudimentary and needs to be
282  smartened; it will sometimes make bad decisions which will prevent the filter
283  from executing. (If this happens, please file a bug!)
284

README_webassembly.md

1# WebAssembly Support for Halide
2
3Halide supports WebAssembly (Wasm) code generation using the LLVM
4backend.
5
6As WebAssembly itself is still under active development, Halide's support has
7some limitations. Some of the most important:
8
9-   We require using LLVM 11 or later for Wasm codegen; earlier versions of LLVM
10    will not work.
11-   Fixed-width SIMD (128 bit) can be enabled via Target::WasmSimd128.
12-   Sign-extension operations can be enabled via Target::WasmSignExt.
13-   Non-trapping float-to-int conversions can be enabled via
14    Target::WasmSatFloatToInt.
15-   Threads are not available yet. We'd like to support this in the future but
16    don't yet have a timeline.
17-   Halide's JIT for Wasm is extremely limited and really useful only for
18    internal testing purposes.
19
20# Additional Tooling Requirements:
21
22-   In addition to the usual install of LLVM and clang, you'll need lld. All
23    should be v11 or later (codegen will be improved under LLVM
24    v12/trunk, at least as of July 2020).
25-   Locally-installed version of Emscripten, 1.39.19+
26
27Note that for all of the above, earlier versions might work, but have not been
28tested.
29
30# AOT Limitations
31
32Halide outputs a Wasm object (.o) or static library (.a) file, much like any
33other architecture; to use it, of course, you must link it to suitable calling
34code. Additionally, you must link to something that provides an implementation
35of `libc`; as a practical matter, this means using the Emscripten tool to do
36your linking, as it provides the most complete such implementation we're aware
37of at this time.
38
39-   Halide ahead-of-time tests assume/require that you have Emscripten installed
40    and available on your system, with the `EMSDK` environment variable set
41    properly.
42
43# JIT Limitations
44
45It's important to reiterate that the WebAssembly JIT mode is not (and will never
46be) appropriate for anything other than limited self tests, for a number of
47reasons:
48
49-   It actually uses an interpreter (from the WABT toolkit
50    [https://github.com/WebAssembly/wabt]) to execute wasm bytecode; not
51    surprisingly, this can be *very* slow.
52-   Wasm effectively runs in a private, 32-bit memory address space; while the
53    host has access to that entire space, the reverse is not true, and thus any
54    `define_extern` calls require copying all `halide_buffer_t` data across the
55    Wasm<->host boundary in both directions. This has severe implications for
56    existing benchmarks, which don't currently attempt to account for this extra
57    overhead. (This could possibly be improved by modeling the Wasm JIT's buffer
58    support as a `device` model that would allow lazy copy-on-demand.)
59-   Host functions used via `define_extern` or `HalideExtern` cannot accept or
60    return values that are pointer types or 64-bit integer types; this includes
61    things like `const char *` and `user_context`. Fixing this is tractable, but
62    is currently omitted as the fix is nontrivial and the tests that are
63    affected are mostly non-critical. (Note that `halide_buffer_t*` is
64    explicitly supported as a special case, however.)
65-   Threading isn't supported at all (yet); all `parallel()` schedules will be
66    run serially.
67-   The `.async()` directive isn't supported at all, not even in
68    serial-emulation mode.
69-   You can't use `Param<void *>` (or any other arbitrary pointer type) with the
70    Wasm JIT.
71-   You can't use `Func.debug_to_file()`, `Func.set_custom_do_par_for()`,
72    `Func.set_custom_do_task()`, or `Func.set_custom_allocator()`.
73-   The implementation of `malloc()` used by the JIT is incredibly simpleminded
74    and unsuitable for anything other than the most basic of tests.
75-   GPU usage (or any buffer usage that isn't 100% host-memory) isn't supported
76    at all yet. (This should be doable, just omitted for now.)
77
78Note that while some of these limitations may be improved in the future, some
79are effectively intrinsic to the nature of this problem. Realistically, this JIT
80implementation is intended solely for running Halide self-tests (and even then,
81a number of them are fundamentally impractical to support in a hosted-Wasm
82environment and are disabled).
83
84In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
85the Halide library itself.
86
87# To Use Halide For WebAssembly:
88
89-   Ensure WebAssembly is in LLVM_TARGETS_TO_BUILD; if you use the default
90    (`"all"`) then it's already present, but otherwise, add it explicitly:
91
92```
93-DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;Mips;PowerPC;Hexagon;WebAssembly"
94```
95
96## Enabling wasm JIT
97
98If you want to run `test_correctness` and other interesting parts of the Halide
99test suite (and you almost certainly will), you'll need to ensure that LLVM is
100built with wasm-ld:
101
102-   Ensure that you have lld in LLVM_ENABLE_PROJECTS:
103
104```
105cmake -DLLVM_ENABLE_PROJECTS="clang;lld" ...
106```
107
108-   To run the JIT tests, set `HL_JIT_TARGET=wasm-32-wasmrt` (possibly adding
109    `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
110    CMake/CTest normally. Note that wasm testing is only supported under CMake
111    (not via Make).
112
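As an illustration of how the target string is assembled, here is a small Python sketch. It assumes Halide's usual convention of joining target features with dashes; `wasm_target` and `jit_test_env` are hypothetical helpers, not part of Halide.

```python
import os

WASM_BASE_TARGET = "wasm-32-wasmrt"
WASM_FEATURES = ("wasm_simd128", "wasm_signext", "wasm_sat_float_to_int")


def wasm_target(*features):
    """Join the base Wasm target with optional features, dash-separated."""
    unknown = [f for f in features if f not in WASM_FEATURES]
    if unknown:
        raise ValueError(f"unknown Wasm feature(s): {unknown}")
    return "-".join([WASM_BASE_TARGET, *features])


def jit_test_env(*features):
    """Return a copy of the current environment with HL_JIT_TARGET set."""
    env = dict(os.environ)
    env["HL_JIT_TARGET"] = wasm_target(*features)
    return env


print(wasm_target("wasm_simd128", "wasm_signext"))
```

The environment returned by `jit_test_env` could then be passed as `env=` to `subprocess.run` when invoking CTest from your build directory; the same string format applies when setting `HL_TARGET`.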
113## Enabling wasm AOT
114
115If you want to test ahead-of-time code generation (and you almost certainly
116will), you need to install Emscripten locally.
117
118-   The simplest way to install is probably via the Emscripten emsdk
119    (https://emscripten.org/docs/getting_started/downloads.html).
120
121-   To run the AOT tests, set `HL_TARGET=wasm-32-wasmrt` (possibly adding
122    `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
123    CMake/CTest normally. Note that wasm testing is only supported under CMake
124    (not via Make).
125
126# Running benchmarks
127
128The `test_performance` benchmarks are misleading (and thus useless) for Wasm, as
129they include JIT overhead as described elsewhere. Suitable benchmarks for Wasm
130will be provided at a later date. (See
131https://github.com/halide/Halide/issues/5119 and
132https://github.com/halide/Halide/issues/5047 to track progress.)
133
134# Known Limitations And Caveats
135
136-   Current trunk LLVM (as of July 2020) doesn't reliably generate all of the
137    Wasm SIMD ops that are available; see
138    https://github.com/halide/Halide/issues/5130 for tracking information as
139    these are fixed.
140-   Using the JIT requires that we link the `wasm-ld` tool into libHalide; with
141    some work this need could possibly be eliminated.
142-   OSX and Linux-x64 have been tested. Windows hasn't; it should be supportable
143    with some work. (Patches welcome.)
144-   None of the `apps/` folder has been investigated yet. Many of them should be
145    supportable with some work. (Patches welcome.)
146-   We currently use v8/d8 as a test environment for AOT code; we may want to
147    consider using Node or (better yet) headless Chrome instead (which is
148    probably required to allow for using threads in AOT code).
149
150# Known TODO:
151
152-   There's some invasive hackiness in CodeGen_LLVM to support the JIT
153    trampolines; this really should be refactored to be less hacky.
154-   Can we rework JIT to avoid the need to link in wasm-ld? This might be
155    doable, as the wasm object files produced by the LLVM backend are close
156    enough to an executable form that we could likely make it work with some
157    massaging on our side, but it's not clear whether this would be a bad idea
158    or not (i.e., would it be unreasonably fragile).
159-   Buffer-copying overhead in the JIT could possibly be dramatically improved
160    by modeling the copy as a "device" (i.e. `copy_to_device()` would copy from
161    host -> wasm); this would make the performance benchmarks much more useful.
162-   Can we support threads in the JIT without an unreasonable amount of work?
163    Unknown at this point.
164