# he [![Build status](https://travis-ci.org/mathiasbynens/he.svg?branch=master)](https://travis-ci.org/mathiasbynens/he) [![Code coverage status](http://img.shields.io/coveralls/mathiasbynens/he/master.svg)](https://coveralls.io/r/mathiasbynens/he) [![Dependency status](https://gemnasium.com/mathiasbynens/he.svg)](https://gemnasium.com/mathiasbynens/he)

_he_ (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports [all standardized named character references as per HTML](http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html), handles [ambiguous ampersands](https://mathiasbynens.be/notes/ambiguous-ampersands) and other edge cases [just like a browser would](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references), has an extensive test suite, and, contrary to many other JavaScript solutions, handles astral Unicode symbols just fine. [An online demo is available.](http://mothereff.in/html-entities)

## Installation

Via [npm](http://npmjs.org/):

```bash
npm install he
```

Via [Bower](http://bower.io/):

```bash
bower install he
```

Via [Component](https://github.com/component/component):

```bash
component install mathiasbynens/he
```

In a browser:

```html
<script src="he.js"></script>
```

In [Narwhal](http://narwhaljs.org/), [Node.js](http://nodejs.org/), and [RingoJS](http://ringojs.org/):

```js
var he = require('he');
```

In [Rhino](http://www.mozilla.org/rhino/):

```js
load('he.js');
```

Using an AMD loader like [RequireJS](http://requirejs.org/):

```js
require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);
```

## API

### `he.version`

A string representing the semantic version number.
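
Because `he.version` is a semantic version string, it can be split into numeric parts when a script needs to check which release is loaded. The following is only a minimal sketch: `parseVersion` is a hypothetical helper that is not part of _he_, and `'1.2.0'` is an illustrative value rather than the library's actual version.

```js
// Hypothetical helper (not part of he): split a semantic version string,
// such as the one exposed by `he.version`, into its numeric parts.
function parseVersion(version) {
  var match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new TypeError('Not a semantic version: ' + version);
  }
  return {
    'major': Number(match[1]),
    'minor': Number(match[2]),
    'patch': Number(match[3])
  };
}

// '1.2.0' is an illustrative value; in real code, pass `he.version` instead.
var parsed = parseVersion('1.2.0');
console.log(parsed.major, parsed.minor, parsed.patch);
// → 1 2 0
```

Any pre-release or build suffix after the patch number is ignored by this sketch.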

### `he.encode(text, options)`

This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols, as well as `&`, `<`, `>`, `"`, `'`, and `` ` ``, replacing them with character references.

```js
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```

As long as the input string contains [allowed code points](http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream) only, the return value of this function is always valid HTML. Any [(invalid) code points that cannot be represented using a character reference](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#table-charref-overrides) in the input are not encoded.

```js
he.encode('foo \0 bar');
// → 'foo \0 bar'
```

The `options` object is optional. It recognizes the following properties:

#### `useNamedReferences`

The default value for the `useNamedReferences` option is `false`. This means that `encode()` will not use any named character references (e.g. `&copy;`) in the output; hexadecimal escapes (e.g. `&#xA9;`) will be used instead. Set it to `true` to enable the use of named references.

**Note that if compatibility with older browsers is a concern, this option should remain disabled.**

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```

#### `encodeEverything`

The default value for the `encodeEverything` option is `false`.
This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option takes precedence over `allowUnsafeSymbols` (i.e. setting the latter to `true` in such a case has no effect).

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
```

#### `strict`

The default value for the `strict` option is `false`. This means that `encode()` will encode any HTML text content you feed it, even if it contains any symbols that cause [parse errors](http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
```

#### `allowUnsafeSymbols`

The default value for the `allowUnsafeSymbols` option is `false`.
This means that characters that are unsafe for use in HTML content (`&`, `<`, `>`, `"`, `'`, and `` ` ``) will be encoded. When set to `true`, only non-ASCII characters will be encoded. If the `encodeEverything` option is set to `true`, this option will be ignored.

```js
he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'
```

#### Overriding default `encode` options globally

The global default setting can be overridden by modifying the `he.encode.options` object. This saves you from passing in an `options` object for every call to `encode` if you want to use the non-default setting.

```js
// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```

### `he.decode(html, options)`

This function takes a string of HTML and decodes any named and numerical character references in it using [the algorithm described in section 12.2.4.69 of the HTML spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references).

```js
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
```

The `options` object is optional. It recognizes the following properties:

#### `isAttributeValue`

The default value for the `isAttributeValue` option is `false`. This means that `decode()` will decode the string as if it were used in [a text context in an HTML document](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state).
HTML has different rules for [parsing character references in attribute values](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-attribute-value-state); set this option to `true` to treat the input string as if it were used as an attribute value.

```js
// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
```

#### `strict`

The default value for the `strict` option is `false`. This means that `decode()` will decode any HTML text content you feed it, even if it contains any entities that cause [parse errors](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
```

#### Overriding default `decode` options globally

The global default settings for the `decode` function can be overridden by modifying the `he.decode.options` object.
This saves you from passing in an `options` object for every call to `decode` if you want to use a non-default setting.

```js
// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'
```

### `he.escape(text)`

This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: `&`, `<`, `>`, `"`, `'`, and `` ` ``.

```js
he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```

### `he.unescape(html, options)`

`he.unescape` is an alias for `he.decode`. It takes a string of HTML and decodes any named and numerical character references in it.

### Using the `he` binary

To use the `he` binary in your shell, simply install _he_ globally using npm:

```bash
npm install -g he
```

After that you will be able to encode/decode HTML entities from the command line:

```bash
$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz
```

Read a local text file, encode it for use in an HTML text context, and save the result to a new file:

```bash
$ he --encode < foo.txt > foo-escaped.html
```

Or do the same with an online text file:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
```

Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:

```bash
$ he --decode < foo-escaped.html > foo.txt
```

Or do the same with an online HTML
snippet:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
```

See `he --help` for the full list of options.

## Support

_he_ has been tested in at least Chrome 27-29, Firefox 3-22, Safari 4-6, Opera 10-12, IE 6-10, Node.js v0.10.0, Narwhal 0.3.2, RingoJS 0.8-0.9, PhantomJS 1.9.0, and Rhino 1.7RC4.

## Unit tests & code coverage

After cloning this repository, run `npm install` to install the dependencies needed for _he_ development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.

Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.

To generate the code coverage report, use `grunt cover`.

## Acknowledgements

Thanks to [Simon Pieters](http://simon.html5.org/) ([@zcorpan](https://twitter.com/zcorpan)) for the many suggestions.

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

_he_ is available under the [MIT](http://mths.be/mit) license.