# he [![Build status](https://travis-ci.org/mathiasbynens/he.svg?branch=master)](https://travis-ci.org/mathiasbynens/he) [![Code coverage status](http://img.shields.io/coveralls/mathiasbynens/he/master.svg)](https://coveralls.io/r/mathiasbynens/he) [![Dependency status](https://gemnasium.com/mathiasbynens/he.svg)](https://gemnasium.com/mathiasbynens/he)

_he_ (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports [all standardized named character references as per HTML](http://www.whatwg.org/specs/web-apps/current-work/multipage/named-character-references.html), handles [ambiguous ampersands](https://mathiasbynens.be/notes/ambiguous-ampersands) and other edge cases [just like a browser would](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references), has an extensive test suite, and, contrary to many other JavaScript solutions, handles astral Unicode symbols (code points beyond U+FFFF) correctly. [An online demo is available.](http://mothereff.in/html-entities)

## Installation

Via [npm](http://npmjs.org/):

```bash
npm install he
```

Via [Bower](http://bower.io/):

```bash
bower install he
```

Via [Component](https://github.com/component/component):

```bash
component install mathiasbynens/he
```

In a browser:

```html
<script src="he.js"></script>
```

In [Narwhal](http://narwhaljs.org/), [Node.js](http://nodejs.org/), and [RingoJS](http://ringojs.org/):

```js
var he = require('he');
```

In [Rhino](http://www.mozilla.org/rhino/):

```js
load('he.js');
```

Using an AMD loader like [RequireJS](http://requirejs.org/):

```js
require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);
```

## API

### `he.version`

A string representing the semantic version number.

### `he.encode(text, options)`

This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols, as well as `&`, `<`, `>`, `"`, `'`, and `` ` ``, replacing them with character references.

```js
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```
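The astral symbol in the example above (U+1D306) is stored in a JavaScript string as two UTF-16 code units, so an encoder has to work on code points rather than code units. The following is a minimal sketch of that idea only, not _he_’s actual implementation: `encodeNonAsciiSketch` is a hypothetical name, and it treats control characters and invalid code points more simply than _he_ does.

```js
// Hypothetical helper, for illustration only; not part of he's API.
// Encodes everything outside printable ASCII, plus the six unsafe symbols,
// as hexadecimal character references.
function encodeNonAsciiSketch(text) {
  const unsafe = '&<>"\'`';
  let output = '';
  // `for...of` iterates by code point, so a surrogate pair stays together.
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    const isPrintableAscii = codePoint >= 0x20 && codePoint <= 0x7E;
    if (isPrintableAscii && unsafe.indexOf(symbol) === -1) {
      output += symbol;
    } else {
      output += '&#x' + codePoint.toString(16).toUpperCase() + ';';
    }
  }
  return output;
}

// '\uD834\uDF06' is U+1D306 written as a surrogate pair.
console.log(encodeNonAsciiSketch('foo \xA9 bar \u2260 baz \uD834\uDF06 qux'));
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'
```

Iterating by code unit instead (for example with `charCodeAt` in an index loop) would emit two separate references for the two halves of the surrogate pair, which is the kind of bug the introduction alludes to.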

As long as the input string contains only [allowed code points](http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream), the return value of this function is always valid HTML. Any [(invalid) code points that cannot be represented using a character reference](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#table-charref-overrides) are left unencoded.

```js
he.encode('foo \0 bar');
// → 'foo \0 bar'
```

The `options` object is optional. It recognizes the following properties:

#### `useNamedReferences`

The default value for the `useNamedReferences` option is `false`. This means that `encode()` will not use any named character references (e.g. `&copy;`) in the output; hexadecimal escapes (e.g. `&#xA9;`) will be used instead. Set it to `true` to enable the use of named references.

**Note that if compatibility with older browsers is a concern, this option should remain disabled.**

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```
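In the last example above, `𝌆` still comes out as a hexadecimal escape because HTML defines no named reference for it. That fallback can be sketched as a table lookup. This is an illustration under assumptions, not _he_’s code: `namedReferenceSketch` and `referenceFor` are hypothetical names, and the table holds only two of the entries defined by the HTML spec.

```js
// Hypothetical, heavily truncated lookup table; the real list of named
// character references in the HTML spec has over two thousand entries.
const namedReferenceSketch = new Map([
  ['\xA9', 'copy'], // ©
  ['\u2260', 'ne']  // ≠
]);

// Returns a named reference when the table has one, else a hex escape.
function referenceFor(symbol) {
  const name = namedReferenceSketch.get(symbol);
  if (name !== undefined) {
    return '&' + name + ';';
  }
  return '&#x' + symbol.codePointAt(0).toString(16).toUpperCase() + ';';
}

console.log(referenceFor('\xA9'));         // → '&copy;'
console.log(referenceFor('\uD834\uDF06')); // → '&#x1D306;'
```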

#### `encodeEverything`

The default value for the `encodeEverything` option is `false`. This means that `encode()` will not use any character references for printable ASCII symbols that don’t need escaping. Set it to `true` to encode every symbol in the input string. When set to `true`, this option takes precedence over `allowUnsafeSymbols` (i.e. setting the latter to `true` in such a case has no effect).

```js
// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
```

#### `strict`

The default value for the `strict` option is `false`. This means that `encode()` will encode any HTML text content you feed it, even if it contains any symbols that cause [parse errors](http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#preprocessing-the-input-stream). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
```
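Conceptually, strict mode adds a validation step in front of the encoder. The sketch below shows one way such a check could look. It is an assumption-laden illustration rather than _he_’s implementation: `disallowedSketch` and `checkInputSketch` are hypothetical names, the character class covers only a simplified subset of the code points the HTML spec flags, and the error message is made up.

```js
// Simplified, illustrative subset of code points that are parse errors in
// HTML input; he's own list is more complete.
const disallowedSketch = /[\x01-\x08\x0B\x0E-\x1F\x7F-\x9F\uFDD0-\uFDEF]/;

// Hypothetical helper: throws in strict mode, otherwise returns the input.
function checkInputSketch(text, strict) {
  if (strict && disallowedSketch.test(text)) {
    throw new Error('Parse error: disallowed code point in input');
  }
  return text;
}

console.log(checkInputSketch('\x01', false) === '\x01'); // → true

try {
  checkInputSketch('\x01', true);
} catch (error) {
  console.log(error.message);
  // → 'Parse error: disallowed code point in input'
}
```

When validating untrusted input with the real `strict` option, wrapping the call in `try`/`catch` in the same way keeps a single bad symbol from crashing the surrounding program.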

#### `allowUnsafeSymbols`

The default value for the `allowUnsafeSymbols` option is `false`. This means that characters that are unsafe for use in HTML content (`&`, `<`, `>`, `"`, `'`, and `` ` ``) will be encoded. When set to `true`, only non-ASCII characters will be encoded. If the `encodeEverything` option is set to `true`, this option will be ignored.

```js
he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'
```

#### Overriding default `encode` options globally

The global default settings for the `encode` function can be overridden by modifying the `he.encode.options` object. This saves you from passing in an `options` object for every call to `encode` if you want to use a non-default setting.

```js
// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
```
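The pattern of defaults stored on the function itself, with per-call options taking priority, can be sketched in a few lines. This is only an illustration of the pattern, not _he_’s source: `encodeCopyrightSketch` is a hypothetical function that handles nothing but `©`.

```js
// Hypothetical stand-in for a function with overridable defaults.
function encodeCopyrightSketch(text, options) {
  // Per-call options win; anything left unset falls back to the defaults.
  const settings = Object.assign({}, encodeCopyrightSketch.options, options);
  const reference = settings.useNamedReferences ? '&copy;' : '&#xA9;';
  return text.replace(/\xA9/g, reference);
}
// Global defaults live on the function itself, mirroring `he.encode.options`.
encodeCopyrightSketch.options = { useNamedReferences: false };

console.log(encodeCopyrightSketch('\xA9')); // → '&#xA9;'

encodeCopyrightSketch.options.useNamedReferences = true;
console.log(encodeCopyrightSketch('\xA9')); // → '&copy;'

// A per-call option still overrides the changed default:
console.log(encodeCopyrightSketch('\xA9', { useNamedReferences: false }));
// → '&#xA9;'
```

One consequence of this design is that a changed default is visible to every caller sharing the same function object, so passing an explicit `options` object is the safer choice in library code.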

### `he.decode(html, options)`

This function takes a string of HTML and decodes any named and numerical character references in it using [the algorithm described in section 12.2.4.69 of the HTML spec](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references).

```js
he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
```
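For the numeric half of that job, the core step is turning digits back into a code point. The following sketch shows that step in isolation. It is not the spec algorithm or _he_’s code: `decodeNumericSketch` is a hypothetical name, it requires the trailing semicolon, and it skips the spec’s override table for references in the range 0x80 to 0x9F.

```js
// Hypothetical helper, for illustration only; handles numeric references
// that end in a semicolon and nothing else.
function decodeNumericSketch(html) {
  const pattern = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;
  return html.replace(pattern, function (match, hex, decimal) {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(decimal, 10);
    const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
    if (codePoint === 0 || codePoint > 0x10FFFF || isSurrogate) {
      return '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER
    }
    // `String.fromCodePoint` produces a surrogate pair for astral symbols.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericSketch('&#xA9; &#169; &#x1D306;') === '\xA9 \xA9 \uD834\uDF06');
// → true
```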

The `options` object is optional. It recognizes the following properties:

#### `isAttributeValue`

The default value for the `isAttributeValue` option is `false`. This means that `decode()` will decode the string as if it were used in [a text context in an HTML document](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#data-state). HTML has different rules for [parsing character references in attribute values](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#character-reference-in-attribute-value-state); set this option to `true` to treat the input string as if it were used as an attribute value.

```js
// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
```
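The difference in the examples above comes from one rule: inside an attribute value, a named reference that lacks its semicolon and is followed by `=` or an alphanumeric character is left alone, so that URLs such as `?a=1&copy=2` survive. A rough sketch of that rule follows. The names `legacyReferencesSketch` and `decodeLegacySketch` are hypothetical, the table is truncated to four entries, and this is not how _he_ itself is written.

```js
// Hypothetical, truncated table of references that may omit the semicolon.
const legacyReferencesSketch = { amp: '&', lt: '<', gt: '>', copy: '\xA9' };

function decodeLegacySketch(html, isAttributeValue) {
  // Group 1: name, group 2: optional semicolon, group 3: the next character
  // if it is `=` or alphanumeric (captured in a lookahead, so not consumed).
  const pattern = /&(amp|lt|gt|copy)(;?)(?=([=a-zA-Z0-9]?))/g;
  return html.replace(pattern, function (match, name, semicolon, next) {
    if (isAttributeValue && semicolon === '' && next !== '') {
      return match; // left untouched, as in `?a=1&copy=2`
    }
    return legacyReferencesSketch[name];
  });
}

console.log(decodeLegacySketch('foo&ampbar', false)); // → 'foo&bar'
console.log(decodeLegacySketch('foo&ampbar', true));  // → 'foo&ampbar'
console.log(decodeLegacySketch('foo&amp;bar', true)); // → 'foo&bar'
```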

#### `strict`

The default value for the `strict` option is `false`. This means that `decode()` will decode any HTML text content you feed it, even if it contains any entities that cause [parse errors](http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#tokenizing-character-references). To throw an error when such invalid HTML is encountered, set the `strict` option to `true`. This option makes it possible to use _he_ as part of HTML parsers and HTML validators.

```js
// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
```

#### Overriding default `decode` options globally

The global default settings for the `decode` function can be overridden by modifying the `he.decode.options` object. This saves you from passing in an `options` object for every call to `decode` if you want to use a non-default setting.

```js
// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'
```

### `he.escape(text)`

This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: `&`, `<`, `>`, `"`, `'`, and `` ` ``.

```js
he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```
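Because only six characters are involved, this kind of escaping reduces to a lookup table and a single `replace` call. The sketch below illustrates the technique with hypothetical names (`escapeMapSketch`, `escapeSketch`). The replacement strings for `<`, `>`, `"`, and `'` match the output shown above, while those for `&` and `` ` `` are assumptions.

```js
// Hypothetical lookup table, for illustration only.
const escapeMapSketch = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'
};

function escapeSketch(text) {
  // A single pass: each `&` in the input is escaped once, and the `&`
  // characters inside the replacements are never scanned again.
  return text.replace(/[&<>"'`]/g, function (symbol) {
    return escapeMapSketch[symbol];
  });
}

console.log(escapeSketch('<a href="?x=1&y=2">'));
// → '&lt;a href=&quot;?x=1&amp;y=2&quot;&gt;'
```

Doing the work in one pass avoids the classic double-escaping bug that appears when several chained `replace` calls handle `&` after the other characters.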

### `he.unescape(html, options)`

`he.unescape` is an alias for `he.decode`. It takes a string of HTML and decodes any named and numerical character references in it.

### Using the `he` binary

To use the `he` binary in your shell, simply install _he_ globally using npm:

```bash
npm install -g he
```

After that you will be able to encode/decode HTML entities from the command line:

```bash
$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz
```

Read a local text file, encode it for use in an HTML text context, and save the result to a new file:

```bash
$ he --encode < foo.txt > foo-escaped.html
```

Or do the same with an online text file:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html
```

Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:

```bash
$ he --decode < foo-escaped.html > foo.txt
```

Or do the same with an online HTML snippet:

```bash
$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt
```

See `he --help` for the full list of options.

## Support

_he_ has been tested in at least Chrome 27-29, Firefox 3-22, Safari 4-6, Opera 10-12, IE 6-10, Node.js v0.10.0, Narwhal 0.3.2, RingoJS 0.8-0.9, PhantomJS 1.9.0, and Rhino 1.7RC4.

## Unit tests & code coverage

After cloning this repository, run `npm install` to install the dependencies needed for _he_ development and testing. You may want to install Istanbul _globally_ using `npm install istanbul -g`.

Once that’s done, you can run the unit tests in Node using `npm test` or `node tests/tests.js`. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use `grunt test`.

To generate the code coverage report, use `grunt cover`.

## Acknowledgements

Thanks to [Simon Pieters](http://simon.html5.org/) ([@zcorpan](https://twitter.com/zcorpan)) for the many suggestions.

## Author

| [![twitter/mathias](https://gravatar.com/avatar/24e08a9ea84deb17ae121074d0f17125?s=70)](https://twitter.com/mathias "Follow @mathias on Twitter") |
|---|
| [Mathias Bynens](https://mathiasbynens.be/) |

## License

_he_ is available under the [MIT](http://mths.be/mit) license.