# WebAssembly Support for Halide

Halide supports WebAssembly (Wasm) code generation using the LLVM backend.

As WebAssembly itself is still under active development, Halide's support has
some limitations. Some of the most important:

- We require using LLVM 11 or later for Wasm codegen; earlier versions of LLVM
  will not work.
- Fixed-width SIMD (128 bit) can be enabled via `Target::WasmSimd128`.
- Sign-extension operations can be enabled via `Target::WasmSignExt`.
- Non-trapping float-to-int conversions can be enabled via
  `Target::WasmSatFloatToInt`.
- Threads are not available yet. We'd like to support this in the future but
  don't yet have a timeline.
- Halide's JIT for Wasm is extremely limited and really useful only for
  internal testing purposes.

# Additional Tooling Requirements:

- In addition to the usual install of LLVM and clang, you'll need lld. All
  should be v11 or later (codegen will be improved under LLVM v12/trunk, at
  least as of July 2020).
- Locally-installed version of Emscripten, 1.39.19+

Note that for all of the above, earlier versions might work, but have not been
tested.

# AOT Limitations

Halide outputs a Wasm object (.o) or static library (.a) file, much like any
other architecture; to use it, of course, you must link it to suitable calling
code. Additionally, you must link to something that provides an implementation
of `libc`; as a practical matter, this means using the Emscripten tool to do
your linking, as it provides the most complete such implementation we're aware
of at this time.

- Halide ahead-of-time tests assume/require that you have Emscripten installed
  and available on your system, with the `EMSDK` environment variable set
  properly.
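
Before running the ahead-of-time tests, it can help to confirm that `EMSDK` is
actually set. The following is a minimal sketch, not part of Halide: the
function name `check_emscripten_env` is hypothetical, and it assumes `EMSDK`
should point at the root directory of an emsdk installation.

```shell
# Hypothetical pre-flight check (not provided by Halide or Emscripten).
# Assumption: EMSDK holds the path of the emsdk root directory.
check_emscripten_env() {
  # Fail if EMSDK is unset or empty.
  if [ -z "${EMSDK:-}" ]; then
    echo "EMSDK is not set" >&2
    return 1
  fi
  # Fail if EMSDK does not name an existing directory.
  if [ ! -d "$EMSDK" ]; then
    echo "EMSDK does not point at a directory: $EMSDK" >&2
    return 1
  fi
  return 0
}
```

Calling `check_emscripten_env || exit 1` at the top of a test script would stop
early with a readable message instead of failing later at link time.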

# JIT Limitations

It's important to reiterate that the WebAssembly JIT mode is not (and will never
be) appropriate for anything other than limited self tests, for a number of
reasons:

- It actually uses an interpreter (from the
  [WABT toolkit](https://github.com/WebAssembly/wabt)) to execute Wasm
  bytecode; not surprisingly, this can be *very* slow.
- Wasm effectively runs in a private, 32-bit memory address space; while the
  host has access to that entire space, the reverse is not true, and thus any
  `define_extern` calls require copying all `halide_buffer_t` data across the
  Wasm<->host boundary in both directions. This has severe implications for
  existing benchmarks, which don't currently attempt to account for this extra
  overhead. (This could possibly be improved by modeling the Wasm JIT's buffer
  support as a `device` model that would allow lazy copy-on-demand.)
- Host functions used via `define_extern` or `HalideExtern` cannot accept or
  return values that are pointer types or 64-bit integer types; this includes
  things like `const char *` and `user_context`. Fixing this is tractable, but
  is currently omitted as the fix is nontrivial and the tests that are
  affected are mostly non-critical. (Note that `halide_buffer_t*` is
  explicitly supported as a special case, however.)
- Threading isn't supported at all (yet); all `parallel()` schedules will be
  run serially.
- The `.async()` directive isn't supported at all, not even in
  serial-emulation mode.
- You can't use `Param<void *>` (or any other arbitrary pointer type) with the
  Wasm JIT.
- You can't use `Func.debug_to_file()`, `Func.set_custom_do_par_for()`,
  `Func.set_custom_do_task()`, or `Func.set_custom_allocator()`.
- The implementation of `malloc()` used by the JIT is incredibly simpleminded
  and unsuitable for anything other than the most basic of tests.
- GPU usage (or any buffer usage that isn't 100% host-memory) isn't supported
  at all yet. (This should be doable, just omitted for now.)

Note that while some of these limitations may be improved in the future, some
are effectively intrinsic to the nature of this problem. Realistically, this JIT
implementation is intended solely for running Halide self-tests (and even then,
a number of them are fundamentally impractical to support in a hosted-Wasm
environment and are disabled).

In sum: don't plan on using Halide JIT mode with Wasm unless you are working on
the Halide library itself.

# To Use Halide For WebAssembly:

- Ensure WebAssembly is in `LLVM_TARGETS_TO_BUILD`; if you use the default
  (`"all"`) then it's already present, but otherwise, add it explicitly:

```
-DLLVM_TARGETS_TO_BUILD="X86;ARM;NVPTX;AArch64;Mips;PowerPC;Hexagon;WebAssembly"
```

## Enabling wasm JIT

If you want to run `test_correctness` and other interesting parts of the Halide
test suite (and you almost certainly will), you'll need to ensure that LLVM is
built with wasm-ld:

- Ensure that you have lld in `LLVM_ENABLE_PROJECTS`:

```
cmake -DLLVM_ENABLE_PROJECTS="clang;lld" ...
```

- To run the JIT tests, set `HL_JIT_TARGET=wasm-32-wasmrt` (possibly adding
  `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
  CMake/CTest normally. Note that wasm testing is only supported under CMake
  (not via Make).

## Enabling wasm AOT

If you want to test ahead-of-time code generation (and you almost certainly
will), you need to install Emscripten locally.

- The simplest way to install is probably via the Emscripten emsdk
  (https://emscripten.org/docs/getting_started/downloads.html).

- To run the AOT tests, set `HL_TARGET=wasm-32-wasmrt` (possibly adding
  `wasm_simd128`, `wasm_signext`, and/or `wasm_sat_float_to_int`) and run
  CMake/CTest normally.
  Note that wasm testing is only supported under CMake (not via Make).

# Running benchmarks

The `test_performance` benchmarks are misleading (and thus useless) for Wasm, as
they include JIT overhead as described elsewhere. Suitable benchmarks for Wasm
will be provided at a later date. (See
https://github.com/halide/Halide/issues/5119 and
https://github.com/halide/Halide/issues/5047 to track progress.)

# Known Limitations And Caveats

- Current trunk LLVM (as of July 2020) doesn't reliably generate all of the
  Wasm SIMD ops that are available; see
  https://github.com/halide/Halide/issues/5130 for tracking information as
  these are fixed.
- Using the JIT requires that we link the `wasm-ld` tool into libHalide; with
  some work this need could possibly be eliminated.
- OSX and Linux-x64 have been tested. Windows hasn't; it should be supportable
  with some work. (Patches welcome.)
- None of the `apps/` folder has been investigated yet. Many of them should be
  supportable with some work. (Patches welcome.)
- We currently use v8/d8 as a test environment for AOT code; we may want to
  consider using Node or (better yet) headless Chrome instead (which is
  probably required to allow for using threads in AOT code).

# Known TODO:

- There's some invasive hackiness in CodeGen_LLVM to support the JIT
  trampolines; this really should be refactored to be less hacky.
- Can we rework JIT to avoid the need to link in wasm-ld? This might be
  doable, as the wasm object files produced by the LLVM backend are close
  enough to an executable form that we could likely make it work with some
  massaging on our side, but it's not clear whether this would be a bad idea
  or not (i.e., would it be unreasonably fragile).
- Buffer-copying overhead in the JIT could possibly be dramatically improved
  by modeling the copy as a "device" (i.e.
  `copy_to_device()` would copy from
  host -> wasm); this would make the performance benchmarks much more useful.
- Can we support threads in the JIT without an unreasonable amount of work?
  Unknown at this point.
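
As a convenience for the `HL_JIT_TARGET` and `HL_TARGET` settings described in
the sections on enabling wasm JIT and AOT, the target string can be assembled
with a small shell helper. This is a minimal sketch: the function name
`wasm_target` is hypothetical, and it assumes the usual Halide convention of
joining target features with hyphens.

```shell
# Hypothetical helper (not part of Halide): build a Wasm target string from
# the base target plus any optional feature names passed as arguments.
wasm_target() {
  target="wasm-32-wasmrt"
  for feature in "$@"; do
    # Append each requested feature, separated by a hyphen.
    target="${target}-${feature}"
  done
  printf '%s\n' "$target"
}

# Example: enable SIMD and sign-extension for the JIT tests.
HL_JIT_TARGET="$(wasm_target wasm_simd128 wasm_signext)"
export HL_JIT_TARGET
```

With no arguments the helper yields the plain `wasm-32-wasmrt` target, so the
same call site works whether or not optional features are wanted.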