# WebAssembly Support for Halide

Halide supports WebAssembly (Wasm) code generation using the LLVM backend.

As WebAssembly itself is still under active development, Halide's support has
some limitations. Some of the most important:

-   We require LLVM 11 or later for Wasm codegen; earlier versions of LLVM
    will not work.
-   Fixed-width SIMD (128 bit) can be enabled via `Target::WasmSimd128`.
-   Sign-extension operations can be enabled via `Target::WasmSignExt`.
-   Non-trapping float-to-int conversions can be enabled via
    `Target::WasmSatFloatToInt`.
-   Threads are not available yet. We'd like to support this in the future but
    don't yet have a timeline.
-   Halide's JIT for Wasm is extremely limited and really useful only for
    internal testing purposes.
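In a Halide target string, these optional features appear as lowercase names (`wasm_simd128`, `wasm_signext`, `wasm_sat_float_to_int`) appended to the base target with dashes. As a minimal sketch, a shell helper could assemble such a string; the function name `halide_wasm_target` is hypothetical and not part of Halide:

```shell
#!/bin/sh
# Hypothetical helper (not part of Halide): build a Wasm target string.
# Usage: halide_wasm_target [feature ...]
halide_wasm_target() {
    target="wasm-32-wasmrt"            # base Wasm target
    for feature in "$@"; do
        target="${target}-${feature}"  # target strings are dash-separated
    done
    printf '%s\n' "$target"
}

halide_wasm_target wasm_simd128 wasm_signext
# prints: wasm-32-wasmrt-wasm_simd128-wasm_signext
```

The resulting string is the kind of value that would be assigned to `HL_TARGET` or `HL_JIT_TARGET`.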

# Additional Tooling Requirements:

-   In addition to the usual install of LLVM and clang, you'll need lld. All
    should be v11 or later (codegen will be improved under LLVM v12/trunk, at
    least as of July 2020).
-   A locally installed version of Emscripten, 1.39.19 or later.

Note that for all of the above, earlier versions might work, but have not been
tested.
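One way to check an installed tool against these minimums is a small version comparison in the shell. The sketch below is illustrative only: `version_at_least` is a hypothetical helper, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Only query LLVM if llvm-config is actually on the PATH.
if command -v llvm-config >/dev/null 2>&1; then
    if version_at_least "$(llvm-config --version)" 11; then
        echo "LLVM is new enough for Wasm codegen"
    else
        echo "LLVM is older than 11"
    fi
fi
```

The same helper could be applied to an Emscripten version number, for example `version_at_least 1.39.20 1.39.19`.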

# AOT Limitations

Halide outputs a Wasm object (.o) or static library (.a) file, much like any
other architecture; to use it, of course, you must link it to suitable calling
code. Additionally, you must link to something that provides an implementation
of `libc`; as a practical matter, this means using the Emscripten tool to do
your linking, as it provides the most complete such implementation we're aware
of at this time.

-   Halide ahead-of-time tests assume/require that you have Emscripten installed
    and available on your system, with the `EMSDK` environment variable set
    properly.
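Before running the ahead-of-time tests, it can help to confirm that `EMSDK` is usable. The following is a minimal sketch under stated assumptions: `check_emsdk` is a hypothetical helper, and it only verifies that the variable is set and names an existing directory, not that the Emscripten install inside it is complete.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if EMSDK is set and is a directory.
check_emsdk() {
    if [ -z "${EMSDK:-}" ]; then
        echo "EMSDK is not set" >&2
        return 1
    fi
    if [ ! -d "$EMSDK" ]; then
        echo "EMSDK does not point to a directory: $EMSDK" >&2
        return 1
    fi
    return 0
}

if check_emsdk; then
    echo "EMSDK looks usable: $EMSDK"
fi
```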

# JIT Limitations

It's important to reiterate that the WebAssembly JIT mode is not (and will never
be) appropriate for anything other than limited self tests, for a number of
reasons:

-   It actually uses an interpreter (from the WABT toolkit
    [https://github.com/WebAssembly/wabt]) to execute wasm bytecode; not
    surprisingly, this can be *very* slow.
-   Wasm effectively runs in a private, 32-bit memory address space; while the
    host has access to that entire space, the reverse is not true, and thus any
    `define_extern` calls require copying all `halide_buffer_t` data across the
    Wasm<->host boundary in both directions. This has severe implications for
    existing benchmarks, which don't currently attempt to account for this extra
    overhead. (This could possibly be improved by modeling the Wasm JIT's buffer
    support as a `device` model that would allow lazy copy-on-demand.)
-   Host functions used via `define_extern` or `HalideExtern` cannot accept or
    return values that are pointer types or 64-bit integer types; this includes
    things like `const char *` and `user_context`. Fixing this is tractable, but
    is currently omitted as the fix is nontrivial and the tests that are
    affected are mostly non-critical. (Note that `halide_buffer_t*` is
    explicitly supported as a special case, however.)
-   Threading isn't supported at all (yet); all `parallel()` schedules will be
    run serially.
-   The `.async()` directive isn't supported at all, not even in
    serial-emulation mode.
-   You can't use `Param<void *>` (or any other arbitrary pointer type) with the
    Wasm JIT.
-   You can't use `Func.debug_to_file()`, `Func.set_custom_do_par_for()`,
    `Func.set_custom_do_task()`, or `Func.set_custom_allocator()`.
-   The implementation of `malloc()` used by the JIT is incredibly simpleminded
    and unsuitable for anything other than the most basic of tests.
-   GPU usage (or any buffer usage that isn't 100% host-memory) isn't supported
    at all yet. (This should be doable, just omitted for now.)

Note that while some of these limitations may be improved in the future, some
are effectively intrinsic to the nature of this problem. Realistically, this JIT
implementation is intended solely for running Halide self-tests (and even then,
a number of them are fundamentally impractical to support in a hosted-Wasm
environment and are disabled).

In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
the Halide library itself.

# To Use Halide For WebAssembly:

-   Ensure WebAssembly is in `LLVM_TARGETS_TO_BUILD`; if you use the default
    (`"all"`) then it's already present, but otherwise, add it explicitly:

```
-DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;Mips;PowerPC;Hexagon;WebAssembly"
```

## Enabling wasm JIT

If you want to run `test_correctness` and other interesting parts of the Halide
test suite (and you almost certainly will), you'll need to ensure that LLVM is
built with wasm-ld:

-   Ensure that you have lld in `LLVM_ENABLE_PROJECTS`:

```
cmake -DLLVM_ENABLE_PROJECTS="clang;lld" ...
```

-   To run the JIT tests, set `HL_JIT_TARGET=wasm-32-wasmrt` (possibly adding
    `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
    CMake/CTest normally. Note that wasm testing is only supported under CMake
    (not via Make).
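As a rough sketch of that last step, the target can be exported and CTest run from the build tree. The `BUILD_DIR` variable and its default of `build` are assumptions for illustration; substitute the path of an already-configured Halide CMake build.

```shell
#!/bin/sh
# Assumption: BUILD_DIR is a hypothetical path to a configured Halide build tree.
BUILD_DIR="${BUILD_DIR:-build}"

# Base JIT target plus one optional feature, joined with a dash.
HL_JIT_TARGET="wasm-32-wasmrt-wasm_simd128"
export HL_JIT_TARGET

if [ -d "$BUILD_DIR" ] && command -v ctest >/dev/null 2>&1; then
    # Run the tests from inside the build tree, as with any other CTest run.
    (cd "$BUILD_DIR" && ctest)
else
    echo "No build tree at '$BUILD_DIR' (or ctest not found); nothing was run." >&2
fi
```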

## Enabling wasm AOT

If you want to test ahead-of-time code generation (and you almost certainly
will), you need to install Emscripten locally.

-   The simplest way to install is probably via the Emscripten emsdk
    (https://emscripten.org/docs/getting_started/downloads.html).

-   To run the AOT tests, set `HL_TARGET=wasm-32-wasmrt` (possibly adding
    `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
    CMake/CTest normally. Note that wasm testing is only supported under CMake
    (not via Make).

# Running benchmarks

The `test_performance` benchmarks are misleading (and thus useless) for Wasm, as
they include JIT overhead as described elsewhere. Suitable benchmarks for Wasm
will be provided at a later date. (See
https://github.com/halide/Halide/issues/5119 and
https://github.com/halide/Halide/issues/5047 to track progress.)

# Known Limitations And Caveats

-   Current trunk LLVM (as of July 2020) doesn't reliably generate all of the
    Wasm SIMD ops that are available; see
    https://github.com/halide/Halide/issues/5130 for tracking information as
    these are fixed.
-   Using the JIT requires that we link the `wasm-ld` tool into libHalide; with
    some work this need could possibly be eliminated.
-   OSX and Linux-x64 have been tested. Windows hasn't; it should be supportable
    with some work. (Patches welcome.)
-   None of the `apps/` folder has been investigated yet. Many of them should be
    supportable with some work. (Patches welcome.)
-   We currently use v8/d8 as a test environment for AOT code; we may want to
    consider using Node or (better yet) headless Chrome instead (which is
    probably required to allow for using threads in AOT code).

# Known TODO:

-   There's some invasive hackiness in `CodeGen_LLVM` to support the JIT
    trampolines; this really should be refactored to be less hacky.
-   Can we rework JIT to avoid the need to link in wasm-ld? This might be
    doable, as the wasm object files produced by the LLVM backend are close
    enough to an executable form that we could likely make it work with some
    massaging on our side, but it's not clear whether this would be a bad idea
    or not (i.e., would it be unreasonably fragile).
-   Buffer-copying overhead in the JIT could possibly be dramatically improved
    by modeling the copy as a "device" (i.e. `copy_to_device()` would copy from
    host -> wasm); this would make the performance benchmarks much more useful.
-   Can we support threads in the JIT without an unreasonable amount of work?
    Unknown at this point.
