# Acquiring dynamic libraries

_Draft_

## Overview

The normal model for shared libraries in Unix is that libraries are installed
by a package manager into the centralised filesystem locations `/lib` and
`/usr/lib`. Native Client, however, does not have a built-in filesystem, and
the concept of a centralised package manager is not applicable to web apps.
Instead, we propose to use a virtualised filesystem namespace, implemented via
IPC calls. Each NaCl process may be launched with a custom filesystem namespace
populated with the library versions the web app chooses to use.

## How files are fetched

Each library and executable can be fetched from a URL.

There are at least two interfaces through which libraries can be fetched for use
in NaCl processes:

* XMLHttpRequest
* NaCl's `__urlAsNaClDesc()`, a method provided to Javascript on the NaCl plugin
  object. This returns a NaCl file descriptor via an asynchronous callback.

In principle, any mechanism that Javascript code can currently use for fetching
data can be used for fetching libraries.

However, using the latter, NaCl-specific descriptor-based interface has two
advantages:

* It reduces the need to copy data between processes via sockets.
* It can be used with an `mmap()` interface which has the potential to allow
  library and executable code to be mapped rather than copied into memory.
  Whether this potential can be realised depends on the underlying mechanism
  for dynamic loading; see
  [mmapping code](DynamicLoadingOptions#mmapping_code_to_share_memory.md) in
  DynamicLoadingOptions. If mapping can be used, it means that two NaCl
  processes using the same library will share physical memory for the library,
  provided that the library is retained in the browser's cache.

The basic interface for fetching files is therefore a Javascript API. We need a
way to hook that up to Native Client.

## How files are requested by NaCl processes

NaCl processes will open files by making requests over IPC, using NaCl IMC
sockets. Javascript code running in the browser can handle these requests and
call `__urlAsNaClDesc()` on behalf of the NaCl process. Javascript objects can
provide a virtual file namespace that may contain a Unix-like file layout.

The `open()` library function will be implemented as a remote procedure call
which sends a message across an IMC socket and expects to receive a reply
message containing a file descriptor. This `open()` implementation will be used
by the dynamic linker and can be made available in libc/libnacl.

* This solves a bootstrap problem. The NaCl file namespace does not need to be
  implemented by another NaCl process that would need to load its own files
  somehow. The file namespace does not have to be implemented by trusted code.
* This allows namespaces to be defined in a flexible way. Rules for mapping
  filenames to URLs can be written in a scripting language.
* The task of mapping filenames to URLs is not computationally intensive, so
  using Javascript should not be a performance problem. Javascript code passes
  NaCl file descriptors around. File data does not need to be copied to or
  from Javascript strings.
* We can provide a sample implementation or standard library that implements
  the Javascript side and provides the kind of file namespaces that developers
  are likely to need.
* This design does not involve adding too many NaCl-specific interfaces to
  trusted code.

### Receiving messages from NaCl asynchronously in Javascript

The current NaCl Javascript API does not allow Javascript code to receive
messages asynchronously from NaCl processes. We propose to extend the Javascript
API to allow this. Javascript will need to be able to receive `open()` requests
from the NaCl process. Currently the only way to do this is to busy-wait.
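
To make the request flow concrete, here is a minimal sketch of how Javascript might service an `open()` request once it can receive messages asynchronously. All of the names (`makeNamespace`, `handleOpenRequest`, `fetchDesc`, `reply`) and the reply strings are assumptions made for illustration; they are not part of any existing NaCl API. `fetchDesc` stands in for a descriptor-returning fetch such as `__urlAsNaClDesc()`, and `reply` for whatever mechanism sends a message plus descriptors back to the process.

```javascript
// Sketch only: every name below is hypothetical, not an existing NaCl API.

// Build a virtual file namespace from a table of filename -> URL rules.
function makeNamespace(table) {
  return {
    resolve: function (filename) {
      // hasOwnProperty guards against inherited names such as "toString".
      return Object.prototype.hasOwnProperty.call(table, filename)
        ? table[filename]
        : null;
    }
  };
}

// Service one open() request from the NaCl process.
//   fetchDesc(url, callback): assumed to call callback(desc) on success
//     and callback(null) on failure.
//   reply(text, descs): assumed to send a string and an array of
//     descriptor wrapper objects back to the process.
function handleOpenRequest(namespace, filename, fetchDesc, reply) {
  var url = namespace.resolve(filename);
  if (url === null) {
    // The filename is not in this process's namespace.
    reply("ENOENT", []);
    return;
  }
  fetchDesc(url, function (desc) {
    if (desc === null) {
      reply("EIO", []);
    } else {
      // Only the descriptor is passed on; file data never becomes a
      // Javascript string.
      reply("OK", [desc]);
    }
  });
}
```

The fetch and reply functions are passed in as arguments so that the namespace logic does not depend on which fetching interface the page uses.
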

Implementing asynchronous message delivery in the NaCl NPAPI plugin will require
using
[NPN\_PluginThreadAsyncCall()](https://developer.mozilla.org/en/NPN_PluginThreadAsyncCall).

### Initial socket connections

The current interface assumes that the Javascript code will be sending requests
to the NaCl process. The NaCl plugin creates the NaCl process with a BoundSocket
descriptor. The NaCl process is expected to start by going into an
`imc_accept()` loop on this descriptor to receive connections from Javascript.

We would like to remove this assumption and allow the reverse arrangement. It
should be possible to start the NaCl process with a SocketAddress descriptor --
or ideally, an array of NaCl descriptors of any descriptor type. The NaCl
process should be able to send `open()` requests early on and should not need to
call `imc_accept()` on startup.

### Prototype implementation

I wrote a prototype of this earlier in 2009. As an example web app I implemented
a [Python read-eval-print loop (REPL)](http://lackingrhoticity.blogspot.com/2009/06/python-standard-library-in-native.html),
using CPython running under Native Client with dynamic linking. It is able to
use Python extension modules such as Sqlite. The prototype works in Firefox on
Linux.

The code is in Git:

* [hello.html](http://repo.or.cz/w/nativeclient.git/blob/7b77b13ebfae704ac4492827d4431b5e70789c37:/imcplugin/hello.html):
  contains the Javascript side of the Python REPL
* [imcplugin.c](http://repo.or.cz/w/nativeclient.git/blob/7b77b13ebfae704ac4492827d4431b5e70789c37:/imcplugin/imcplugin.c):
  a minimal trusted NPAPI plugin for Native Client allowing Javascript to send
  and receive asynchronous messages
* [demo.py](http://repo.or.cz/w/nativeclient.git/blob/7b77b13ebfae704ac4492827d4431b5e70789c37:/imcplugin/demo.py):
  the Python code

`imcplugin` provides the following interfaces to Javascript:

* `plugin.get_file(url_string, function(nacl_file) { ... })`

> Fetches a file from the given URL. When the file becomes available, the plugin
> calls the callback function, passing a Javascript wrapper object for a NaCl
> file descriptor. This simple interface lacks error handling for when the URL
> cannot be fetched.

* `plugin.launch(nacl_file, [arg1, arg2, ...], function(msg) { ... }) -> proc`

> Spawns a NaCl process. Under the hood, this runs `sel_ldr`.
>
> * Takes a NaCl file object specifying the executable to run.
> * Takes an array of strings to pass as command line arguments, which the NaCl
>   process receives via `main()`.
> * Takes a callback function which receives messages from the NaCl process.
>   Each message is a string.
> * Returns an object which can be used to send messages to the process.

* `proc.send(string_arg, [nacl_file, ...])`

> Sends a message to the process. Messages consist of an array of bytes
> (represented as a Javascript string) and an array of file descriptor wrapper
> objects. (The latter array may of course be empty.)

### Call-return over IMC

There are two ways we might implement call-return on top of IMC sockets.

Option 1: Use the same channel, C, for sending and receiving:

* `imc_sendmsg(C, request)`
* `imc_recvmsg(C) -> reply`
* This does not allow the channel to be shared between processes.

Option 2: Create a new channel for each request:

* `imc_connect(C) -> D`
* `imc_sendmsg(D, request)`
* `imc_recvmsg(D) -> reply`
* `close(D)`

See [IMCSockets](imc_sockets.md) for a further discussion.

### Questions

How will this interact with Web Workers?

## Sharing libraries across sites

It will be desirable to share library files across sites, so that the browser
does not have to download identical files multiple times. This issue already
occurs for Javascript libraries. NaCl executables and libraries are expected to
be larger than Javascript libraries, which makes this issue more important for
NaCl.

### Background: Same Origin Policy

XMLHttpRequest is constrained by a Same Origin Policy (SOP). `__urlAsNaClDesc`
will also be constrained by a SOP. (Note that the NaCl NPAPI plugin has to
implement the SOP itself because NPAPI does not provide a way to reuse the
browser's SOP.)

The main reason for the SOP is that XMLHttpRequest requests convey cookies -- a
type of [ambient authority](http://en.wikipedia.org/wiki/Ambient_authority). The
Same Origin Policy is not intended to prevent web apps from sending messages
across origins; it is only intended to prevent the web app from seeing the
server's response to the request. (Sending cross-origin messages can already be
done using mechanisms other than XMLHttpRequest, including redirects and `<img>`
elements.)

### Comparison: `script` element

Loading libraries in NaCl is analogous to loading Javascript files via the
`<script src=...>` element. Interestingly, `<script>` is not constrained by the
SOP.
By setting the response's content-type to `text/javascript`, the server
effectively opts in to revealing the response to the web app. Supposedly, the
response is not revealed directly to the web app. The DOM, a trusted part of the
browser, evaluates the Javascript code, and the web app gets access only to the
values the script assigns to variables. In practice, one cannot rely on
`text/javascript` data being kept hidden across origins.

In NaCl's case, however, interpreting `.so` files is unambiguously the
responsibility of untrusted code. We have to reveal the fetched data to the web
app, so NaCl cannot be as unconstrained as the `<script>` tag.

The `<script>` element permits a centralised model for sharing library code.
Suppose multiple web apps use the library `libjfoo.js`. If this is hosted at
`http://libjfoo.org/libjfoo-1.0.js`, the web apps can opt to link to this URL.
The downside of using the `<script>` element in this way is that the web apps
will be vulnerable to the centralised site, `libjfoo.org`. This site can change
the file contents it serves up (there was
[an example of this happening with json.org's copy of json.js](http://www.stevesouders.com/blog/2009/12/10/crockford-alert/))
and thereby run arbitrary Javascript in the context of the web apps. Since the
script text is not available across origins, the web app cannot check the text
against a hash before `eval`'ing it.

### Fetching libraries across origins

For NaCl, web apps could fetch libraries using
[Uniform Messaging](http://lists.w3.org/Archives/Public/public-webapps/2009OctDec/att-0931/draft.html)
(formerly known as GuestXHR) or [CORS](http://www.w3.org/Security/wiki/CORS),
which are not NaCl-specific.

We might also wish to allow decentralised sharing of files. For example, sites A
and B both host `libfoo.so`.
If the browser has already downloaded `libfoo.so`
from site A, it won't need to download it again from site B, and vice-versa.
Schemes for doing this by embedding secure hashes into URLs have been proposed;
for example, see
[Douglas Crockford's post](http://profiles.yahoo.com/blog/GSBHPXZFNRM2QRAP3PXNGFMFVU?eid=vbNraNs6kXn9E4kaLDYAml5ESuTWLnf9pNVJDWj5zGMu8Ltwiw).

This problem is not unique to NaCl, so we should not adopt a solution which is
NaCl-specific.

## Trust relationship between Javascript and NaCl process

In the above scheme, there are two principals:

* the untrusted NaCl process running under the NaCl trusted runtime;
* the Javascript code running on the web page under the browser.

The NaCl process depends on the Javascript code to provide its execution
environment. The Javascript code provides all the code running in the NaCl
process. The Javascript code therefore has at least as much authority as the
NaCl process.

This is at odds with the current same origin policy, described in
[issue 238](http://code.google.com/p/nativeclient/issues/detail?id=238). In the
current scheme, executable `http://a.com/foo.nexe` can be embedded in the page
`http://b.com`. Javascript on `b.com`'s page can use `__urlAsNaClDesc()` to
fetch an `a.com` URL, getting a file descriptor in return. However, only the
NaCl process can use the file descriptor to read the file's contents. The NaCl
process therefore has strictly greater authority than the Javascript. However,
it has no trusted path for fetching files from `a.com`. This is a dangerous
situation which is likely to lead to XSRF-like Confused Deputy vulnerabilities.
`foo.nexe` is expected to distrust the messages and file descriptors it receives
from the page; this is difficult or impossible to achieve.
It is incompatible
with the dynamic library scenario above in which the NaCl process must trust the
library data supplied by the page.

We propose that if `__urlAsNaClDesc()` (or a similar API) is to follow a same
origin policy at all, it should use the origin of the page, not the origin of
the executable's URL.

It may be that directly embedding a NaCl plugin object across origins should not
be permitted at all. In this case, it would still be possible to embed a NaCl
plugin object across origins indirectly, through a cross-origin iframe. In such
a scenario, one is embedding a combination of Javascript and NaCl code in which
the latter can legitimately trust the former.

## Prefetching files

The simplest approach to fetching library files is to fetch them one by one, as
`ld.so` makes synchronous `open()` calls. However, this means the inbound network
connection will be idle after the end of a file is received by the client and
before `ld.so`'s request for the next file is received by the server. This costs
one network round trip per file.

We could reduce the time taken to fetch the whole set of files by pipelining the
requests. A simple way to do this, which does not involve changing the dynamic
linker, is to list up-front all the libraries we expect to load. The Javascript
code could request the files on startup in order to pre-populate the browser's
cache.

## Versioning

As with static linking, each web app gets to choose its own version of libc and
other libraries. Furthermore, different NaCl processes in the same web app can
use different libc versions. Libc is not supplied by the browser.

We don't expect there to be a huge number of libc versions, but older and newer
versions of the same libc are likely to be around at the same time, as are
different libc implementations (such as newlib and glibc).

Web apps get to pick a set of libraries that are known to work well together.
This is analogous to selecting a set of Javascript libraries, or selecting a set
of packages for a software distribution such as Debian or Fedora. This way we
can avoid "[DLL hell](http://en.wikipedia.org/wiki/Dll_hell)"; libraries are not
the responsibility of the end user.

This provides extra flexibility that is not available to typical applications on
Linux when packaged with commonly-used packaging systems like dpkg or RPM.
Packaging systems such as Zero-Install and Nix allow multiple library versions
to coexist in the same way that I am proposing for NaCl.

Though we have this extra flexibility, we will still have all the versioning
mechanisms that are normally available in ELF shared libraries: libraries can
opt to provide stable ABIs and declare interfaces via
[sonames](http://en.wikipedia.org/wiki/Soname) and ELF symbol versioning; we get
the benefit of separate compilation.

Upgrading libraries is the responsibility of the web app. A web app may choose
to delegate this responsibility to another site by fetching libraries from that
site.
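
As an illustration of how a web app's chosen library set could also serve as the up-front list described under "Prefetching files", here is a minimal sketch. The table contents (URLs and version numbers), the function name `prefetchAll`, and the `fetchDesc(url, callback)` interface are all made up for the example; `fetchDesc` stands in for whichever fetching mechanism the page uses and is assumed to call its callback with a descriptor, or with `null` on failure.

```javascript
// Sketch only: the URLs, version numbers and function names are invented.

// The set of libraries this web app has chosen, keyed by the filename the
// NaCl process will open.
var librarySet = {
  "/lib/ld.so": "http://example.com/libs/glibc-2.9/ld.so",
  "/lib/libc.so.6": "http://example.com/libs/glibc-2.9/libc.so.6",
  "/usr/lib/libfoo.so.1": "http://example.com/libs/libfoo-1.0/libfoo.so.1"
};

// Issue every request before waiting for any reply, so the fetches are
// pipelined rather than costing one round trip per file.
function prefetchAll(libraries, fetchDesc, onDone) {
  var names = Object.keys(libraries);
  var fetched = {};
  var pending = names.length;
  if (pending === 0) {
    onDone(fetched);
    return;
  }
  names.forEach(function (name) {
    fetchDesc(libraries[name], function (desc) {
      // A failed fetch is recorded as null under its filename.
      fetched[name] = desc;
      pending -= 1;
      if (pending === 0) {
        onDone(fetched);
      }
    });
  });
}
```

Upgrading a library then amounts to editing one entry in the table, and two processes in the same web app can be given different tables.
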