1#+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier
2#+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier
3#+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript
4#+STYLE: <link rel="stylesheet" type="text/css" href="docstyle.css" />
5#+AUTHOR: Mihai Bazon
6#+EMAIL: mihai.bazon@gmail.com
7
8* UglifyJS --- a JavaScript parser/compressor/beautifier
9
10This package implements a general-purpose JavaScript
11parser/compressor/beautifier toolkit.  It is developed on [[http://nodejs.org/][NodeJS]], but it
12should work on any JavaScript platform supporting the CommonJS module system
13(and if your platform of choice doesn't support CommonJS, you can easily
14implement it, or discard the =exports.*= lines from UglifyJS sources).
15
16The tokenizer/parser generates an abstract syntax tree from JS code.  You
17can then traverse the AST to learn more about the code, or do various
18manipulations on it.  This part is implemented in [[../lib/parse-js.js][parse-js.js]] and it's a
19port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library from [[http://marijn.haverbeke.nl/][Marijn
20Haverbeke]].
21
(See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of
UglifyJS.)
24
25The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and
26manipulates the AST generated by the parser to provide the following:
27
28- ability to re-generate JavaScript code from the AST.  Optionally
29  indented---you can use this if you want to “beautify” a program that has
30  been compressed, so that you can inspect the source.  But you can also run
31  our code generator to print out an AST without any whitespace, so you
32  achieve compression as well.
33
34- shorten variable names (usually to single characters).  Our mangler will
35  analyze the code and generate proper variable names, depending on scope
36  and usage, and is smart enough to deal with globals defined elsewhere, or
37  with =eval()= calls or =with{}= statements.  In short, if =eval()= or
38  =with{}= are used in some scope, then all variables in that scope and any
39  variables in the parent scopes will remain unmangled, and any references
40  to such variables remain unmangled as well.
41
42- various small optimizations that may lead to faster code but certainly
43  lead to smaller code.  Where possible, we do the following:
44
45  - foo["bar"]  ==>  foo.bar
46
47  - remove block brackets ={}=
48
49  - join consecutive var declarations:
50    var a = 10; var b = 20; ==> var a=10,b=20;
51
  - resolve simple constant expressions: 1 + 2 * 3 ==> 7.  We only do the
    replacement if the result occupies fewer bytes; for example 1/3 would
54    translate to 0.333333333333, so in this case we don't replace it.
55
56  - consecutive statements in blocks are merged into a sequence; in many
57    cases, this leaves blocks with a single statement, so then we can remove
58    the block brackets.
59
60  - various optimizations for IF statements:
61
62    - if (foo) bar(); else baz(); ==> foo?bar():baz();
63    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
64    - if (foo) bar(); ==> foo&&bar();
65    - if (!foo) bar(); ==> foo||bar();
66    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
67    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}
68
69  - remove some unreachable code and warn about it (code that follows a
70    =return=, =throw=, =break= or =continue= statement, except
71    function/variable declarations).
72
  - act as a limited version of a pre-processor (cf. the pre-processor of
74    C/C++) to allow you to safely replace selected global symbols with
75    specified values.  When combined with the optimisations above this can
76    make UglifyJS operate slightly more like a compilation process, in
77    that when certain symbols are replaced by constant values, entire code
78    blocks may be optimised away as unreachable.
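
The constant-folding rule above (fold an expression only when the printed result is shorter) can be sketched on a toy AST.  This is not UglifyJS's code.  The node shapes =["num", value]= and =["binary", op, left, right]= are an assumption, loosely modelled on parse-js's array-based nodes, and only four arithmetic operators are handled:

#+BEGIN_SRC js
// Toy constant folder.  Node shapes are hypothetical:
//   ["num", value]               a numeric literal
//   ["binary", op, left, right]  a binary expression
const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

function fold(node) {
  if (node[0] !== "binary") return node;
  const op = node[1];
  // Fold the operands first, so nested constant expressions collapse bottom-up.
  const left = fold(node[2]);
  const right = fold(node[3]);
  if (left[0] === "num" && right[0] === "num" &&
      Object.prototype.hasOwnProperty.call(OPS, op)) {
    const value = OPS[op](left[1], right[1]);
    const source = String(left[1]) + op + String(right[1]);
    // Only replace the expression when the printed result is shorter.
    if (String(value).length < source.length) return ["num", value];
  }
  return ["binary", op, left, right];
}

// 1 + 2 * 3 folds all the way down to 7
console.log(fold(["binary", "+", ["num", 1], ["binary", "*", ["num", 2], ["num", 3]]]));
// 1 / 3 is kept, because 0.3333333333333333 is longer than "1/3"
console.log(fold(["binary", "/", ["num", 1], ["num", 3]]));
#+END_SRC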
79
80** <<Unsafe transformations>>
81
82The following transformations can in theory break code, although they're
83probably safe in most practical cases.  To enable them you need to pass the
84=--unsafe= flag.
85
86*** Calls involving the global Array constructor
87
88The following transformations occur:
89
90#+BEGIN_SRC js
91new Array(1, 2, 3, 4)  => [1,2,3,4]
92Array(a, b, c)         => [a,b,c]
93new Array(5)           => Array(5)
94new Array(a)           => Array(a)
95#+END_SRC
96
These are all safe if the Array name isn't redefined.  JavaScript does allow
one to globally redefine Array (and pretty much everything, in fact) but I
personally don't see why anyone would do that.
100
101UglifyJS does handle the case where Array is redefined locally, or even
102globally but with a =function= or =var= declaration.  Therefore, in the
103following cases UglifyJS *doesn't touch* calls or instantiations of Array:
104
105#+BEGIN_SRC js
106// case 1.  globally declared variable
107  var Array;
108  new Array(1, 2, 3);
109  Array(a, b);
110
111  // or (can be declared later)
112  new Array(1, 2, 3);
113  var Array;
114
115  // or (can be a function)
116  new Array(1, 2, 3);
117  function Array() { ... }
118
119// case 2.  declared in a function
120  (function(){
121    a = new Array(1, 2, 3);
122    b = Array(5, 6);
123    var Array;
124  })();
125
126  // or
127  (function(Array){
128    return Array(5, 6, 7);
129  })();
130
131  // or
132  (function(){
133    return new Array(1, 2, 3, 4);
134    function Array() { ... }
135  })();
136
137  // etc.
138#+END_SRC
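
The following plain JavaScript check is not specific to UglifyJS, and its variable and function names are made up for the example.  It shows why the rules above treat several arguments differently from a single one, and why a local =Array= has to be left alone:

#+BEGIN_SRC js
// Several arguments: the constructor builds an array of those values,
// so the literal [1,2,3,4] is an exact replacement.
const viaConstructor = new Array(1, 2, 3, 4);
const viaLiteral = [1, 2, 3, 4];

// A single numeric argument means "length", not "element": new Array(5)
// is an empty array of length 5, while [5] has one element.  Only the
// `new` keyword can be dropped here, since Array(5) behaves the same.
const sized = new Array(5);
const sizedNoNew = Array(5);

// If Array is shadowed locally, a literal would bypass the local function,
// which is why such calls must not be rewritten.
function withLocalArray() {
  function Array() { return "local Array"; }
  return { call: Array(1, 2), literal: [1, 2] };
}

console.log(viaConstructor, viaLiteral);
console.log(sized.length, 0 in sized, sizedNoNew.length, [5].length);
console.log(withLocalArray());
#+END_SRC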
139
*** =obj.toString()= ==> =obj+""=
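
With =--unsafe= (see the option list below), =foo.toString()= is rewritten to =foo+""=.  The two expressions agree only when the value's string conversion goes through =toString()=.  The plain JavaScript sketch below uses made-up example values to show two cases where they differ.  The first is an object with its own =valueOf()=, which the =+= operator consults first.  The second is =null= (or =undefined=), where the method call throws:

#+BEGIN_SRC js
// An object that defines both valueOf() and toString().
const money = {
  valueOf() { return 42; },
  toString() { return "forty-two"; },
};

// obj.toString() calls the method directly...
const explicit = money.toString();
// ...but obj+"" converts via ToPrimitive, which tries valueOf() first.
const concatenated = money + "";

// For null (or undefined) the method call throws, while concatenation works.
function safeToString(value) {
  try {
    return value.toString();
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other error";
  }
}

console.log(explicit, concatenated); // two different strings
console.log(safeToString(null), null + "");
#+END_SRC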
141
142** Install (NPM)
143
144UglifyJS is now available through NPM --- =npm install uglify-js= should do
145the job.
146
147** Install latest code from GitHub
148
149#+BEGIN_SRC sh
150## clone the repository
151mkdir -p /where/you/wanna/put/it
152cd /where/you/wanna/put/it
git clone https://github.com/mishoo/UglifyJS.git
154
155## make the module available to Node
156mkdir -p ~/.node_libraries/
157cd ~/.node_libraries/
158ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js
159
160## and if you want the CLI script too:
161mkdir -p ~/bin
162cd ~/bin
163ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs
164  # (then add ~/bin to your $PATH if it's not there already)
165#+END_SRC
166
167** Usage
168
169There is a command-line tool that exposes the functionality of this library
170for your shell-scripting needs:
171
172#+BEGIN_SRC sh
173uglifyjs [ options... ] [ filename ]
174#+END_SRC
175
176=filename= should be the last argument and should name the file from which
177to read the JavaScript code.  If you don't specify it, it will read code
178from STDIN.
179
180Supported options:
181
182- =-b= or =--beautify= --- output indented code; when passed, additional
183  options control the beautifier:
184
185  - =-i N= or =--indent N= --- indentation level (number of spaces)
186
187  - =-q= or =--quote-keys= --- quote keys in literal objects (by default,
    only keys that cannot be identifier names will be quoted).
189
190- =--ascii= --- pass this argument to encode non-ASCII characters as
191  =\uXXXX= sequences.  By default UglifyJS won't bother to do it and will
  output Unicode characters instead.  (The output is always encoded in
  UTF-8, but if you pass this option you'll only get ASCII.)
194
195- =-nm= or =--no-mangle= --- don't mangle variable names
196
197- =-ns= or =--no-squeeze= --- don't call =ast_squeeze()= (which does various
198  optimizations that result in smaller, less readable code).
199
200- =-mt= or =--mangle-toplevel= --- mangle names in the toplevel scope too
201  (by default we don't do this).
202
203- =--no-seqs= --- when =ast_squeeze()= is called (thus, unless you pass
204  =--no-squeeze=) it will reduce consecutive statements in blocks into a
205  sequence.  For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();".  In many cases, this allows us to discard the
207  block brackets (since the block becomes a single statement).  This is ON
208  by default because it seems safe and saves a few hundred bytes on some
209  libs that I tested it on, but pass =--no-seqs= to disable it.
210
211- =--no-dead-code= --- by default, UglifyJS will remove code that is
212  obviously unreachable (code that follows a =return=, =throw=, =break= or
213  =continue= statement and is not a function/variable declaration).  Pass
214  this option to disable this optimization.
215
- =-nc= or =--no-copyright= --- by default, =uglifyjs= will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.).  Pass this option to discard them.
219
220- =-o filename= or =--output filename= --- put the result in =filename=.  If
221  this isn't given, the result goes to standard output (or see next one).
222
223- =--overwrite= --- if the code is read from a file (not from STDIN) and you
224  pass =--overwrite= then the output will be written in the same file.
225
226- =--ast= --- pass this if you want to get the Abstract Syntax Tree instead
227  of JavaScript as output.  Useful for debugging or learning more about the
228  internals.
229
230- =-v= or =--verbose= --- output some notes on STDERR (for now just how long
231  each operation takes).
232
- =-d SYMBOL[=VALUE]= or =--define SYMBOL[=VALUE]= --- will replace
  all instances of the specified symbol where used as an identifier
  (except where the symbol has been properly declared by a =var=
  declaration, used as a function parameter, or similar) with the
  specified value.  This argument may be specified multiple times to
  define multiple symbols.  If no value is specified, the symbol will be
  replaced with the value =true=; otherwise you can specify a numeric
  value (such as =1024=), a quoted string value (such as ="object"= or
  ='https://github.com'=), or the name of another symbol or keyword
  (such as =null= or =document=).
  This allows you, for example, to assign meaningful names to key
  constant values but discard the symbolic names in the uglified
  version for brevity/efficiency, or, when used with care, allows
  UglifyJS to operate as a form of *conditional compilation*
  whereby defining appropriate values may, by dint of the constant
  folding and dead code removal features above, remove entire
  superfluous code blocks (e.g. completely remove instrumentation or
  trace code for production use).
  Where string values are being defined, the handling of quotes is
  likely to be subject to the specifics of your command shell
  environment, so you may need to experiment with quoting styles
  depending on your platform, or you may find the option
  =--define-from-module= more suitable for use.
256
- =--define-from-module SOMEMODULE= --- will load the named module (as
258  per the NodeJS =require()= function) and iterate all the exported
259  properties of the module defining them as symbol names to be defined
260  (as if by the =--define= option) per the name of each property
261  (i.e. without the module name prefix) and given the value of the
262  property. This is a much easier way to handle and document groups of
263  symbols to be defined rather than a large number of =--define=
264  options.
265
266- =--unsafe= --- enable other additional optimizations that are known to be
267  unsafe in some contrived situations, but could still be generally useful.
268  For now only these:
269
270  - foo.toString()  ==>  foo+""
271  - new Array(x,...)  ==> [x,...]
272  - new Array(x) ==> Array(x)
273
274- =--max-line-len= (default 32K characters) --- add a newline after around
275  32K characters.  I've seen both FF and Chrome croak when all the code was
  on a single line of around 670K.  Pass =--max-line-len 0= to disable this
277  safety feature.
278
- =--reserved-names= --- some libraries rely on certain names to be used, as
  pointed out in issues #92 and #81, so this option allows you to exclude
  such names from the mangler.  For example, to keep names =require= and
  =$super= intact you'd specify =--reserved-names "require,$super"=.
283
284- =--inline-script= -- when you want to include the output literally in an
285  HTML =<script>= tag you can use this option to prevent =</script= from
286  showing up in the output.
287
288- =--lift-vars= -- when you pass this, UglifyJS will apply the following
289  transformations (see the notes in API, =ast_lift_variables=):
290
291  - put all =var= declarations at the start of the scope
292  - make sure a variable is declared only once
293  - discard unused function arguments
294  - discard unused inner (named) functions
295  - finally, try to merge assignments into that one =var= declaration, if
296    possible.
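
The =--reserved-names= option, like the =except= option of =ast_mangle= described under API, comes down to skipping certain names when handing out short identifiers.  Below is a minimal sketch of such a name generator.  The alphabet order, the keyword list and the function names are assumptions made for illustration; this is not UglifyJS's actual implementation:

#+BEGIN_SRC js
// Hypothetical alphabet: identifiers may start with a letter, $ or _,
// and later characters may also be digits.
const FIRST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_";
const REST = FIRST + "0123456789";

// Reserved words of up to three letters (including the old ES3 "int");
// generating longer names would require the full keyword list.
const KEYWORDS = new Set(["do", "if", "in", "for", "int", "let", "new", "try", "var"]);

// Map 0, 1, 2, ... onto "a", "b", ..., "_", "aa", "ba", ... (bijective numbering).
function shortName(n) {
  let name = FIRST[n % FIRST.length];
  n = Math.floor(n / FIRST.length);
  while (n > 0) {
    n -= 1;
    name += REST[n % REST.length];
    n = Math.floor(n / REST.length);
  }
  return name;
}

// Yield candidate names in order, skipping keywords and reserved names.
function* mangledNames(reserved) {
  const skip = new Set(reserved);
  for (let n = 0; ; n++) {
    const name = shortName(n);
    if (!KEYWORDS.has(name) && !skip.has(name)) yield name;
  }
}

// Keep "a" and "c" intact, as if passed via --reserved-names "a,c".
const names = mangledNames(["a", "c"]);
console.log(names.next().value, names.next().value, names.next().value); // b d e
#+END_SRC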
297
298*** API
299
300To use the library from JavaScript, you'd do the following (example for
301NodeJS):
302
303#+BEGIN_SRC js
304var jsp = require("uglify-js").parser;
305var pro = require("uglify-js").uglify;
306
307var orig_code = "... JS code here";
308var ast = jsp.parse(orig_code); // parse code and get the initial AST
309ast = pro.ast_mangle(ast); // get a new AST with mangled names
310ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
311var final_code = pro.gen_code(ast); // compressed code here
312#+END_SRC
313
314The above performs the full compression that is possible right now.  As you
can see, there is a sequence of steps which you can apply.  For example if
316you want compressed output but for some reason you don't want to mangle
317variable names, you would simply skip the line that calls
318=pro.ast_mangle(ast)=.
319
320Some of these functions take optional arguments.  Here's a description:
321
322- =jsp.parse(code, strict_semicolons)= -- parses JS code and returns an AST.
323  =strict_semicolons= is optional and defaults to =false=.  If you pass
324  =true= then the parser will throw an error when it expects a semicolon and
325  it doesn't find it.  For most JS code you don't want that, but it's useful
326  if you want to strictly sanitize your code.
327
328- =pro.ast_lift_variables(ast)= -- merge and move =var= declarations to the
  top of the scope; discard unused function arguments or variables; discard
330  unused (named) inner functions.  It also tries to merge assignments
331  following the =var= declaration into it.
332
  If your code is very hand-optimized concerning =var= declarations,
  lifting variable declarations might actually increase size.  For me it
  helps out, but on jQuery it adds 865 bytes (243 after gzip).  YMMV.
  Also note that (since it's not enabled by default) this operation isn't
  yet heavily tested (please report if you find issues!).

  Although it might increase the output size, it's technically more
  correct: in certain situations, dead code removal might drop variable
  declarations, which would not happen if the variables are lifted in
  advance.
343
344  Here's an example of what it does:
345
346#+BEGIN_SRC js
347function f(a, b, c, d, e) {
348    var q;
349    var w;
350    w = 10;
351    q = 20;
352    for (var i = 1; i < 10; ++i) {
353        var boo = foo(a);
354    }
355    for (var i = 0; i < 1; ++i) {
356        var boo = bar(c);
357    }
358    function foo(){ ... }
359    function bar(){ ... }
360    function baz(){ ... }
361}
362
363// transforms into ==>
364
365function f(a, b, c) {
366    var i, boo, w = 10, q = 20;
367    for (i = 1; i < 10; ++i) {
368        boo = foo(a);
369    }
370    for (i = 0; i < 1; ++i) {
371        boo = bar(c);
372    }
373    function foo() { ... }
374    function bar() { ... }
375}
376#+END_SRC
377
378- =pro.ast_mangle(ast, options)= -- generates a new AST containing mangled
379  (compressed) variable and function names.  It supports the following
380  options:
381
382  - =toplevel= -- mangle toplevel names (by default we don't touch them).
  - =except= -- an array of names to exclude from mangling.
384  - =defines= -- an object with properties named after symbols to
385    replace (see the =--define= option for the script) and the values
386    representing the AST replacement value.
387
388- =pro.ast_squeeze(ast, options)= -- employs further optimizations designed
389  to reduce the size of the code that =gen_code= would generate from the
390  AST.  Returns a new AST.  =options= can be a hash; the supported options
391  are:
392
393  - =make_seqs= (default true) which will cause consecutive statements in a
394    block to be merged using the "sequence" (comma) operator
395
396  - =dead_code= (default true) which will remove unreachable code.
397
398- =pro.gen_code(ast, options)= -- generates JS code from the AST.  By
399  default it's minified, but using the =options= argument you can get nicely
400  formatted output.  =options= is, well, optional :-) and if you pass it it
401  must be an object and supports the following properties (below you can see
402  the default values):
403
404  - =beautify: false= -- pass =true= if you want indented output
405  - =indent_start: 0= (only applies when =beautify= is =true=) -- initial
406    indentation in spaces
407  - =indent_level: 4= (only applies when =beautify= is =true=) --
408    indentation level, in spaces (pass an even number)
409  - =quote_keys: false= -- if you pass =true= it will quote all keys in
410    literal objects
  - =space_colon: false= (only applies when =beautify= is =true=) -- whether
412    to put a space before the colon in object literals
413  - =ascii_only: false= -- pass =true= if you want to encode non-ASCII
414    characters as =\uXXXX=.
415  - =inline_script: false= -- pass =true= to escape occurrences of
416    =</script= in strings
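
Two of the block-level rewrites mentioned above can be sketched on a toy AST.  One drops unreachable statements (=dead_code=); the other joins consecutive expression statements with the comma operator (=make_seqs=).  The node tags (=stat=, =seq=, =return=, =var=, =defun=, ...) are assumptions loosely modelled on parse-js's array-based nodes.  This is an illustration of the idea, not =ast_squeeze= itself:

#+BEGIN_SRC js
// Hypothetical statement shapes (expressions are plain strings for brevity):
//   ["stat", expr]                                        expression statement
//   ["return", expr], ["throw", expr], ["break"], ["continue"]   jumps
//   ["var", ...], ["defun", ...]                          declarations (hoisted)
const JUMPS = new Set(["return", "throw", "break", "continue"]);
const DECLARATIONS = new Set(["var", "defun"]);

// dead_code: after a jump, keep only declarations (kept whole for simplicity;
// any initializer in an unreachable var never runs anyway).
function dropDeadCode(statements) {
  const out = [];
  let reachable = true;
  for (const st of statements) {
    if (reachable || DECLARATIONS.has(st[0])) out.push(st);
    if (JUMPS.has(st[0])) reachable = false;
  }
  return out;
}

// make_seqs: merge runs of expression statements into one ["seq", a, b] chain.
function makeSeqs(statements) {
  const out = [];
  for (const st of statements) {
    const prev = out[out.length - 1];
    if (st[0] === "stat" && prev !== undefined && prev[0] === "stat") {
      out[out.length - 1] = ["stat", ["seq", prev[1], st[1]]];
    } else {
      out.push(st);
    }
  }
  return out;
}

// a = 10; b = 20; foo(); return a; bar();   (the call to bar() is unreachable)
const block = [
  ["stat", "a = 10"],
  ["stat", "b = 20"],
  ["stat", "foo()"],
  ["return", "a"],
  ["stat", "bar()"],
];
console.log(JSON.stringify(makeSeqs(dropDeadCode(block))));
#+END_SRC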
417
418*** Beautifier shortcoming -- no more comments
419
420The beautifier can be used as a general purpose indentation tool.  It's
421useful when you want to make a minified file readable.  One limitation,
422though, is that it discards all comments, so you don't really want to use it
423to reformat your code, unless you don't have, or don't care about, comments.
424
In fact it's not the beautifier that discards comments --- they are dumped at
426the parsing stage, when we build the initial AST.  Comments don't really
427make sense in the AST, and while we could add nodes for them, it would be
428inconvenient because we'd have to add special rules to ignore them at all
429the processing stages.
430
431*** Use as a code pre-processor
432
433The =--define= option can be used, particularly when combined with the
434constant folding logic, as a form of pre-processor to enable or remove
435particular constructions, such as might be used for instrumenting
436development code, or to produce variations aimed at a specific
437platform.
438
439The code below illustrates the way this can be done, and how the
440symbol replacement is performed.
441
442#+BEGIN_SRC js
443CLAUSE1: if (typeof DEVMODE === 'undefined') {
444    DEVMODE = true;
445}
446
447CLAUSE2: function init() {
448    if (DEVMODE) {
449        console.log("init() called");
450    }
    // ...
    DEVMODE && console.log("init() complete");
}

CLAUSE3: function reportDeviceStatus(device) {
    var DEVMODE = device.mode, DEVNAME = device.name;
    if (DEVMODE === 'open') {
        // ...
459    }
460}
461#+END_SRC
462
463When the above code is normally executed, the undeclared global
464variable =DEVMODE= will be assigned the value *true* (see =CLAUSE1=)
465and so the =init()= function (=CLAUSE2=) will write messages to the
466console log when executed, but in =CLAUSE3= a locally declared
467variable will mask access to the =DEVMODE= global symbol.
468
469If the above code is processed by UglifyJS with an argument of
470=--define DEVMODE=false= then UglifyJS will replace =DEVMODE= with the
471boolean constant value *false* within =CLAUSE1= and =CLAUSE2=, but it
472will leave =CLAUSE3= as it stands because there =DEVMODE= resolves to
473a validly declared variable.
474
475And more so, the constant-folding features of UglifyJS will recognise
476that the =if= condition of =CLAUSE1= is thus always false, and so will
477remove the test and body of =CLAUSE1= altogether (including the
478otherwise slightly problematical statement =false = true;= which it
479will have formed by replacing =DEVMODE= in the body).  Similarly,
480within =CLAUSE2= both calls to =console.log()= will be removed
481altogether.
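
The two steps just described can be sketched on a toy AST: first substitute the defined value, then discard the branch that can never run.  The node shapes and function names are hypothetical.  Unlike UglifyJS, the sketch does not check whether a symbol is shadowed by a local declaration (the =CLAUSE3= case), and it does not fold expressions such as =typeof false === 'undefined'=.  It only collapses an =if= whose condition has become a literal =true= or =false=:

#+BEGIN_SRC js
// Hypothetical node shapes: ["name", id], ["num", n], ["string", s],
// ["if", cond, then, else?], ["block", ...stmts], ["stat", expr],
// ["call", fn, args], ["dot", expr, prop].  true/false are ["name", "true"/"false"].
function literalFor(value) {
  if (typeof value === "boolean") return ["name", String(value)];
  if (typeof value === "number") return ["num", value];
  return ["string", String(value)];
}

// Step 1: replace every ["name", SYMBOL] that has a defined value.
function substitute(node, defines) {
  if (!Array.isArray(node)) return node;
  if (node[0] === "name" && Object.prototype.hasOwnProperty.call(defines, node[1])) {
    return literalFor(defines[node[1]]);
  }
  return node.map((child) => substitute(child, defines));
}

// Step 2: an `if` with a constant condition collapses to the branch taken
// (an empty block is left behind when there is no else branch).
function foldIfs(node) {
  if (!Array.isArray(node)) return node;
  const folded = node.map(foldIfs);
  if (folded[0] === "if" && folded[1][0] === "name") {
    if (folded[1][1] === "true") return folded[2];
    if (folded[1][1] === "false") return folded[3] || ["block"];
  }
  return folded;
}

// Body of init(): if (DEVMODE) { console.log("init() called"); } work();
const initBody = ["block",
  ["if", ["name", "DEVMODE"],
    ["block", ["stat", ["call", ["dot", ["name", "console"], "log"],
                        [["string", "init() called"]]]]]],
  ["stat", ["call", ["name", "work"], []]],
];
console.log(JSON.stringify(foldIfs(substitute(initBody, { DEVMODE: false }))));
#+END_SRC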
482
483In this way you can mimic, to a limited degree, the functionality of
484the C/C++ pre-processor to enable or completely remove blocks
depending on how certain symbols are defined, perhaps using UglifyJS
to generate different versions of source aimed at different
environments.
488
It is recommended (but not made mandatory) that symbols designed for
490this purpose are given names consisting of =UPPER_CASE_LETTERS= to
491distinguish them from other (normal) symbols and avoid the sort of
492clash that =CLAUSE3= above illustrates.
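
When several such symbols are involved, =--define-from-module= lets you keep them in one place.  Below is a minimal sketch of such a module.  The file name and symbol names are made up for the example.  The sketch assumes that boolean, numeric and string property values are used as the corresponding constants, as with =--define=.  The module is loaded with NodeJS's =require()=, so its name must be resolvable the way =require()= resolves it:

#+BEGIN_SRC js
// build-defines.js (hypothetical file name).
// Every exported property becomes a symbol definition, named after the
// property (without any module prefix).
module.exports = {
  DEVMODE: false,                 // as if --define DEVMODE=false
  MAX_RETRIES: 3,                 // a numeric constant
  API_ROOT: "https://github.com", // a string constant
};
#+END_SRC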
493
494** Compression -- how good is it?
495
Here are updated statistics.  (I also updated my Google Closure and YUI
installations.)  Sizes are in bytes; the values in parentheses are the
running times (minutes:seconds).
498
499We're still a lot better than YUI in terms of compression, though slightly
500slower.  We're still a lot faster than Closure, and compression after gzip
501is comparable.
502
503| File                        | UglifyJS         | UglifyJS+gzip | Closure          | Closure+gzip | YUI              | YUI+gzip |
504|-----------------------------+------------------+---------------+------------------+--------------+------------------+----------|
505| jquery-1.6.2.js             | 91001 (0:01.59)  |         31896 | 90678 (0:07.40)  |        31979 | 101527 (0:01.82) |    34646 |
506| paper.js                    | 142023 (0:01.65) |         43334 | 134301 (0:07.42) |        42495 | 173383 (0:01.58) |    48785 |
507| prototype.js                | 88544 (0:01.09)  |         26680 | 86955 (0:06.97)  |        26326 | 92130 (0:00.79)  |    28624 |
508| thelib-full.js (DynarchLIB) | 251939 (0:02.55) |         72535 | 249911 (0:09.05) |        72696 | 258869 (0:01.94) |    76584 |
509
510** Bugs?
511
512Unfortunately, for the time being there is no automated test suite.  But I
513ran the compressor manually on non-trivial code, and then I tested that the
514generated code works as expected.  A few hundred times.
515
DynarchLIB was started in times when there was no good JS minifier.
Therefore I was quite religious about trying to write short code manually,
and as such DL contains a lot of syntactic hacks[1] such as
=foo == bar ? a = 10 : b = 20=, though the more readable version would
clearly be to use =if/else=.
521
522Since the parser/compressor runs fine on DL and jQuery, I'm quite confident
523that it's solid enough for production use.  If you can identify any bugs,
524I'd love to hear about them ([[http://groups.google.com/group/uglifyjs][use the Google Group]] or email me directly).
525
526[1] I even reported a few bugs and suggested some fixes in the original
527    [[http://marijn.haverbeke.nl/parse-js/][parse-js]] library, and Marijn pushed fixes literally in minutes.
528
529** Links
530
531- Twitter: [[http://twitter.com/UglifyJS][@UglifyJS]]
532- Project at GitHub: [[http://github.com/mishoo/UglifyJS][http://github.com/mishoo/UglifyJS]]
533- Google Group: [[http://groups.google.com/group/uglifyjs][http://groups.google.com/group/uglifyjs]]
534- Common Lisp JS parser: [[http://marijn.haverbeke.nl/parse-js/][http://marijn.haverbeke.nl/parse-js/]]
535- JS-to-Lisp compiler: [[http://github.com/marijnh/js][http://github.com/marijnh/js]]
536- Common Lisp JS uglifier: [[http://github.com/mishoo/cl-uglify-js][http://github.com/mishoo/cl-uglify-js]]
537
538** License
539
540UglifyJS is released under the BSD license:
541
542#+BEGIN_EXAMPLE
543Copyright 2010 (c) Mihai Bazon <mihai.bazon@gmail.com>
544Based on parse-js (http://marijn.haverbeke.nl/parse-js/).
545
546Redistribution and use in source and binary forms, with or without
547modification, are permitted provided that the following conditions
548are met:
549
550    * Redistributions of source code must retain the above
551      copyright notice, this list of conditions and the following
552      disclaimer.
553
554    * Redistributions in binary form must reproduce the above
555      copyright notice, this list of conditions and the following
556      disclaimer in the documentation and/or other materials
557      provided with the distribution.
558
559THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
560EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
561IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
562PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
563LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
564OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
565PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
566PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
567THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
568TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
569THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
570SUCH DAMAGE.
571#+END_EXAMPLE
572