#+TITLE: UglifyJS -- a JavaScript parser/compressor/beautifier
#+KEYWORDS: javascript, js, parser, compiler, compressor, mangle, minify, minifier
#+DESCRIPTION: a JavaScript parser/compressor/beautifier in JavaScript
#+STYLE: <link rel="stylesheet" type="text/css" href="docstyle.css" />
#+AUTHOR: Mihai Bazon
#+EMAIL: mihai.bazon@gmail.com

* UglifyJS --- a JavaScript parser/compressor/beautifier

This package implements a general-purpose JavaScript
parser/compressor/beautifier toolkit. It is developed on [[http://nodejs.org/][NodeJS]], but it
should work on any JavaScript platform supporting the CommonJS module system
(and if your platform of choice doesn't support CommonJS, you can easily
implement it, or discard the =exports.*= lines from UglifyJS sources).

The tokenizer/parser generates an abstract syntax tree from JS code. You
can then traverse the AST to learn more about the code, or do various
manipulations on it. This part is implemented in [[../lib/parse-js.js][parse-js.js]] and it's a
port to JavaScript of the excellent [[http://marijn.haverbeke.nl/parse-js/][parse-js]] Common Lisp library from
[[http://marijn.haverbeke.nl/][Marijn Haverbeke]].

( See [[http://github.com/mishoo/cl-uglify-js][cl-uglify-js]] if you're looking for the Common Lisp version of
UglifyJS. )

The second part of this package, implemented in [[../lib/process.js][process.js]], inspects and
manipulates the AST generated by the parser to provide the following:

- ability to re-generate JavaScript code from the AST. Optionally
  indented---you can use this if you want to “beautify” a program that has
  been compressed, so that you can inspect the source. But you can also run
  our code generator to print out an AST without any whitespace, so you
  achieve compression as well.

- shorten variable names (usually to single characters).
  Our mangler will analyze the code and generate proper variable names,
  depending on scope and usage, and is smart enough to deal with globals
  defined elsewhere, or with =eval()= calls or =with{}= statements. In
  short, if =eval()= or =with{}= are used in some scope, then all variables
  in that scope and any variables in the parent scopes will remain
  unmangled, and any references to such variables remain unmangled as well.

- various small optimizations that may lead to faster code but certainly
  lead to smaller code. Where possible, we do the following:

  - foo["bar"] ==> foo.bar

  - remove block brackets ={}=

  - join consecutive var declarations:
    var a = 10; var b = 20; ==> var a=10,b=20;

  - resolve simple constant expressions: 1 + 2 * 3 ==> 7. We only do the
    replacement if the result occupies fewer bytes; for example 1/3 would
    translate to 0.333333333333, so in this case we don't replace it.

  - consecutive statements in blocks are merged into a sequence; in many
    cases, this leaves blocks with a single statement, so then we can remove
    the block brackets.

  - various optimizations for IF statements:

    - if (foo) bar(); else baz(); ==> foo?bar():baz();
    - if (!foo) bar(); else baz(); ==> foo?baz():bar();
    - if (foo) bar(); ==> foo&&bar();
    - if (!foo) bar(); ==> foo||bar();
    - if (foo) return bar(); else return baz(); ==> return foo?bar():baz();
    - if (foo) return bar(); else something(); ==> {if(foo)return bar();something()}

  - remove some unreachable code and warn about it (code that follows a
    =return=, =throw=, =break= or =continue= statement, except
    function/variable declarations).

  - act as a limited version of a pre-processor (cf. the pre-processor of
    C/C++) to allow you to safely replace selected global symbols with
    specified values.
    When combined with the optimisations above this can make UglifyJS
    operate slightly more like a compilation process, in that when certain
    symbols are replaced by constant values, entire code blocks may be
    optimised away as unreachable.

** <<Unsafe transformations>>

The following transformations can in theory break code, although they're
probably safe in most practical cases. To enable them you need to pass the
=--unsafe= flag.

*** Calls involving the global Array constructor

The following transformations occur:

#+BEGIN_SRC js
new Array(1, 2, 3, 4)  => [1,2,3,4]
Array(a, b, c)         => [a,b,c]
new Array(5)           => Array(5)
new Array(a)           => Array(a)
#+END_SRC

These are all safe if the Array name isn't redefined. JavaScript does allow
one to globally redefine Array (and pretty much everything, in fact) but I
personally don't see why anyone would do that.

UglifyJS does handle the case where Array is redefined locally, or even
globally but with a =function= or =var= declaration. Therefore, in the
following cases UglifyJS *doesn't touch* calls or instantiations of Array:

#+BEGIN_SRC js
// case 1. globally declared variable
  var Array;
  new Array(1, 2, 3);
  Array(a, b);

  // or (can be declared later)
  new Array(1, 2, 3);
  var Array;

  // or (can be a function)
  new Array(1, 2, 3);
  function Array() { ... }

// case 2. declared in a function
  (function(){
    a = new Array(1, 2, 3);
    b = Array(5, 6);
    var Array;
  })();

  // or
  (function(Array){
    return Array(5, 6, 7);
  })();

  // or
  (function(){
    return new Array(1, 2, 3, 4);
    function Array() { ... }
  })();

  // etc.
#+END_SRC

*** =obj.toString()= ==> =obj+“”=

** Install (NPM)

UglifyJS is now available through NPM --- =npm install uglify-js= should do
the job.

** Install latest code from GitHub

#+BEGIN_SRC sh
## clone the repository
mkdir -p /where/you/wanna/put/it
cd /where/you/wanna/put/it
git clone git://github.com/mishoo/UglifyJS.git

## make the module available to Node
mkdir -p ~/.node_libraries/
cd ~/.node_libraries/
ln -s /where/you/wanna/put/it/UglifyJS/uglify-js.js

## and if you want the CLI script too:
mkdir -p ~/bin
cd ~/bin
ln -s /where/you/wanna/put/it/UglifyJS/bin/uglifyjs
  # (then add ~/bin to your $PATH if it's not there already)
#+END_SRC

** Usage

There is a command-line tool that exposes the functionality of this library
for your shell-scripting needs:

#+BEGIN_SRC sh
uglifyjs [ options... ] [ filename ]
#+END_SRC

=filename= should be the last argument and should name the file from which
to read the JavaScript code. If you don't specify it, it will read code
from STDIN.

Supported options:

- =-b= or =--beautify= --- output indented code; when passed, additional
  options control the beautifier:

  - =-i N= or =--indent N= --- indentation level (number of spaces)

  - =-q= or =--quote-keys= --- quote keys in literal objects (by default,
    only keys that cannot be identifier names will be quoted).

- =--ascii= --- pass this argument to encode non-ASCII characters as
  =\uXXXX= sequences. By default UglifyJS won't bother to do it and will
  output Unicode characters instead. (The output is always encoded in UTF-8,
  but if you pass this option you'll only get ASCII.)

- =-nm= or =--no-mangle= --- don't mangle variable names

- =-ns= or =--no-squeeze= --- don't call =ast_squeeze()= (which does various
  optimizations that result in smaller, less readable code).

- =-mt= or =--mangle-toplevel= --- mangle names in the toplevel scope too
  (by default we don't do this).

- =--no-seqs= --- when =ast_squeeze()= is called (thus, unless you pass
  =--no-squeeze=) it will reduce consecutive statements in blocks into a
  sequence. For example, "a = 10; b = 20; foo();" will be written as
  "a=10,b=20,foo();". On various occasions, this allows us to discard the
  block brackets (since the block becomes a single statement). This is ON
  by default because it seems safe and saves a few hundred bytes on some
  libs that I tested it on, but pass =--no-seqs= to disable it.

- =--no-dead-code= --- by default, UglifyJS will remove code that is
  obviously unreachable (code that follows a =return=, =throw=, =break= or
  =continue= statement and is not a function/variable declaration). Pass
  this option to disable this optimization.

- =-nc= or =--no-copyright= --- by default, =uglifyjs= will keep the initial
  comment tokens in the generated code (assumed to be copyright information
  etc.). If you pass this option, they will be discarded.

- =-o filename= or =--output filename= --- put the result in =filename=. If
  this isn't given, the result goes to standard output (or see the next
  option).

- =--overwrite= --- if the code is read from a file (not from STDIN) and you
  pass =--overwrite= then the output will be written to the same file.

- =--ast= --- pass this if you want to get the Abstract Syntax Tree instead
  of JavaScript as output. Useful for debugging or learning more about the
  internals.

- =-v= or =--verbose= --- output some notes on STDERR (for now just how long
  each operation takes).

- =-d SYMBOL[=VALUE]= or =--define SYMBOL[=VALUE]= --- will replace
  all instances of the specified symbol where used as an identifier
  (except where the symbol has been properly declared by a =var=
  declaration, used as a function parameter, or similar) with the
  specified value.
  This argument may be specified multiple times to define multiple
  symbols. If no value is specified, the symbol will be replaced with
  the value =true=; alternatively you can specify a numeric value (such
  as =1024=), a quoted string value (such as ="object"= or
  ='https://github.com'=), or the name of another symbol or keyword
  (such as =null= or =document=).
  This allows you, for example, to assign meaningful names to key
  constant values but discard the symbolic names in the uglified
  version for brevity/efficiency, or, when used with care, allows
  UglifyJS to operate as a form of *conditional compilation*
  whereby defining appropriate values may, by dint of the constant
  folding and dead code removal features above, remove entire
  superfluous code blocks (e.g. completely remove instrumentation or
  trace code for production use).
  Where string values are being defined, the handling of quotes is
  likely to be subject to the specifics of your command shell
  environment, so you may need to experiment with quoting styles
  depending on your platform, or you may find the option
  =--define-from-module= more suitable for use.

- =--define-from-module SOMEMODULE= --- will load the named module (as
  per the NodeJS =require()= function) and iterate over all the exported
  properties of the module, defining each one as a symbol (as if by the
  =--define= option) named after the property (i.e. without the module
  name prefix) and given the value of the property. This is a much
  easier way to handle and document groups of symbols to be defined
  than a large number of =--define= options.

- =--unsafe= --- enable other additional optimizations that are known to be
  unsafe in some contrived situations, but could still be generally useful.
  For now only these:

  - foo.toString() ==> foo+""
  - new Array(x,...) ==> [x,...]
  - new Array(x) ==> Array(x)

- =--max-line-len= (default 32K characters) --- add a newline after around
  32K characters. I've seen both FF and Chrome croak when all the code was
  on a single line of around 670K. Pass --max-line-len 0 to disable this
  safety feature.

- =--reserved-names= --- some libraries rely on certain names to be used, as
  pointed out in issues #92 and #81, so this option allows you to exclude
  such names from the mangler. For example, to keep names =require= and
  =$super= intact you'd specify --reserved-names "require,$super".

- =--inline-script= -- when you want to include the output literally in an
  HTML =<script>= tag you can use this option to prevent =</script= from
  showing up in the output.

- =--lift-vars= -- when you pass this, UglifyJS will apply the following
  transformations (see the notes in API, =ast_lift_variables=):

  - put all =var= declarations at the start of the scope
  - make sure a variable is declared only once
  - discard unused function arguments
  - discard unused inner (named) functions
  - finally, try to merge assignments into that one =var= declaration, if
    possible.

*** API

To use the library from JavaScript, you'd do the following (example for
NodeJS):

#+BEGIN_SRC js
var jsp = require("uglify-js").parser;
var pro = require("uglify-js").uglify;

var orig_code = "... JS code here";
var ast = jsp.parse(orig_code); // parse code and get the initial AST
ast = pro.ast_mangle(ast); // get a new AST with mangled names
ast = pro.ast_squeeze(ast); // get an AST with compression optimizations
var final_code = pro.gen_code(ast); // compressed code here
#+END_SRC

The above performs the full compression that is possible right now. As you
can see, there is a sequence of steps which you can apply.
For example, if you want compressed output but for some reason you don't
want to mangle variable names, you would simply skip the line that calls
=pro.ast_mangle(ast)=.

Some of these functions take optional arguments. Here's a description:

- =jsp.parse(code, strict_semicolons)= -- parses JS code and returns an AST.
  =strict_semicolons= is optional and defaults to =false=. If you pass
  =true= then the parser will throw an error when it expects a semicolon and
  it doesn't find it. For most JS code you don't want that, but it's useful
  if you want to strictly sanitize your code.

- =pro.ast_lift_variables(ast)= -- merge and move =var= declarations to the
  top of the scope; discard unused function arguments or variables; discard
  unused (named) inner functions. It also tries to merge assignments
  following the =var= declaration into it.

  If your code is very hand-optimized concerning =var= declarations,
  lifting variable declarations might actually increase size. For me it
  helps out. On jQuery it adds 865 bytes (243 after gzip). YMMV. Also
  note that (since it's not enabled by default) this operation isn't yet
  heavily tested (please report if you find issues!).

  Note that although it might increase the code size, it's technically
  more correct: in certain situations, dead code removal might drop
  variable declarations, which would not happen if the variables are
  lifted in advance.

  Here's an example of what it does:

#+BEGIN_SRC js
function f(a, b, c, d, e) {
  var q;
  var w;
  w = 10;
  q = 20;
  for (var i = 1; i < 10; ++i) {
    var boo = foo(a);
  }
  for (var i = 0; i < 1; ++i) {
    var boo = bar(c);
  }
  function foo(){ ... }
  function bar(){ ... }
  function baz(){ ...
  }
}

// transforms into ==>

function f(a, b, c) {
  var i, boo, w = 10, q = 20;
  for (i = 1; i < 10; ++i) {
    boo = foo(a);
  }
  for (i = 0; i < 1; ++i) {
    boo = bar(c);
  }
  function foo() { ... }
  function bar() { ... }
}
#+END_SRC

- =pro.ast_mangle(ast, options)= -- generates a new AST containing mangled
  (compressed) variable and function names. It supports the following
  options:

  - =toplevel= -- mangle toplevel names (by default we don't touch them).
  - =except= -- an array of names to exclude from compression.
  - =defines= -- an object with properties named after symbols to
    replace (see the =--define= option for the script) and the values
    representing the AST replacement value.

- =pro.ast_squeeze(ast, options)= -- employs further optimizations designed
  to reduce the size of the code that =gen_code= would generate from the
  AST. Returns a new AST. =options= can be a hash; the supported options
  are:

  - =make_seqs= (default true) which will cause consecutive statements in a
    block to be merged using the "sequence" (comma) operator

  - =dead_code= (default true) which will remove unreachable code.

- =pro.gen_code(ast, options)= -- generates JS code from the AST. By
  default it's minified, but using the =options= argument you can get nicely
  formatted output.
  =options= is, well, optional :-) and if you pass it, it must be an
  object and supports the following properties (below you can see the
  default values):

  - =beautify: false= -- pass =true= if you want indented output
  - =indent_start: 0= (only applies when =beautify= is =true=) -- initial
    indentation in spaces
  - =indent_level: 4= (only applies when =beautify= is =true=) --
    indentation level, in spaces (pass an even number)
  - =quote_keys: false= -- if you pass =true= it will quote all keys in
    literal objects
  - =space_colon: false= (only applies when =beautify= is =true=) -- whether
    to put a space before the colon in object literals
  - =ascii_only: false= -- pass =true= if you want to encode non-ASCII
    characters as =\uXXXX=.
  - =inline_script: false= -- pass =true= to escape occurrences of
    =</script= in strings

*** Beautifier shortcoming -- no more comments

The beautifier can be used as a general purpose indentation tool. It's
useful when you want to make a minified file readable. One limitation,
though, is that it discards all comments, so you don't really want to use it
to reformat your code, unless you don't have, or don't care about, comments.

In fact it's not the beautifier that discards comments --- they are dumped at
the parsing stage, when we build the initial AST. Comments don't really
make sense in the AST, and while we could add nodes for them, it would be
inconvenient because we'd have to add special rules to ignore them at all
the processing stages.

*** Use as a code pre-processor

The =--define= option can be used, particularly when combined with the
constant folding logic, as a form of pre-processor to enable or remove
particular constructions, such as might be used for instrumenting
development code, or to produce variations aimed at a specific
platform.

The code below illustrates the way this can be done, and how the
symbol replacement is performed.

#+BEGIN_SRC js
CLAUSE1: if (typeof DEVMODE === 'undefined') {
  DEVMODE = true;
}

CLAUSE2: function init() {
  if (DEVMODE) {
    console.log("init() called");
  }
  ....
  DEVMODE && console.log("init() complete");
}

CLAUSE3: function reportDeviceStatus(device) {
  var DEVMODE = device.mode, DEVNAME = device.name;
  if (DEVMODE === 'open') {
    ....
  }
}
#+END_SRC

When the above code is normally executed, the undeclared global
variable =DEVMODE= will be assigned the value *true* (see =CLAUSE1=)
and so the =init()= function (=CLAUSE2=) will write messages to the
console log when executed, but in =CLAUSE3= a locally declared
variable will mask access to the =DEVMODE= global symbol.

If the above code is processed by UglifyJS with an argument of
=--define DEVMODE=false= then UglifyJS will replace =DEVMODE= with the
boolean constant value *false* within =CLAUSE1= and =CLAUSE2=, but it
will leave =CLAUSE3= as it stands because there =DEVMODE= resolves to
a validly declared variable.

Moreover, the constant-folding features of UglifyJS will recognise
that the =if= condition of =CLAUSE1= is thus always false, and so will
remove the test and body of =CLAUSE1= altogether (including the
otherwise slightly problematic statement =false = true;= which it
will have formed by replacing =DEVMODE= in the body). Similarly,
within =CLAUSE2= both calls to =console.log()= will be removed
altogether.
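
The two steps described above (substituting a defined symbol unless it
is locally declared, then dropping branches whose test has become
constant) can be sketched in a few lines of plain JavaScript. This is
a toy model only: the node shapes and the names =applyDefines=,
=applyList= and =foldDeadCode= are hypothetical, invented for this
illustration, and are not UglifyJS's real AST or implementation.

#+BEGIN_SRC js
// Toy node shapes (hypothetical, NOT the real UglifyJS AST):
//   ["name", id]                           identifier reference
//   ["lit", value]                         constant literal
//   ["call", id, [args]]                   function call
//   ["if", test, [stmts]]                  if statement without else
//   ["function", id, [declared], [stmts]]  declared = parameters and vars

function applyList(stmts, defines, shadowed) {
    return stmts.map(function (s) { return applyDefines(s, defines, shadowed); });
}

// Replace free references to defined symbols with literals; names that
// are declared in an enclosing function are left alone.
function applyDefines(node, defines, shadowed) {
    shadowed = shadowed || Object.create(null);
    switch (node[0]) {
    case "name":
        var defined = Object.prototype.hasOwnProperty.call(defines, node[1]);
        return defined && !shadowed[node[1]] ? ["lit", defines[node[1]]] : node;
    case "lit":
        return node;
    case "call":
        return ["call", node[1], applyList(node[2], defines, shadowed)];
    case "if":
        return ["if", applyDefines(node[1], defines, shadowed),
                applyList(node[2], defines, shadowed)];
    case "function":
        var inner = Object.create(shadowed); // inherits outer declarations
        node[2].forEach(function (n) { inner[n] = true; });
        return ["function", node[1], node[2], applyList(node[3], defines, inner)];
    }
    throw new Error("unknown node type: " + node[0]);
}

// Drop "if" statements whose test is a constant: keep the body when the
// constant is truthy, remove the whole statement otherwise.
function foldDeadCode(stmts) {
    var out = [];
    stmts.forEach(function (s) {
        if (s[0] === "if" && s[1][0] === "lit") {
            if (s[1][1]) out.push.apply(out, foldDeadCode(s[2]));
        } else if (s[0] === "if") {
            out.push(["if", s[1], foldDeadCode(s[2])]);
        } else if (s[0] === "function") {
            out.push(["function", s[1], s[2], foldDeadCode(s[3])]);
        } else {
            out.push(s);
        }
    });
    return out;
}

var program = [
    // like CLAUSE2: DEVMODE is a free (global) symbol here
    ["function", "init", [], [
        ["if", ["name", "DEVMODE"], [["call", "log", [["lit", "init() called"]]]]]
    ]],
    // like CLAUSE3: DEVMODE is declared locally, so it must not be replaced
    ["function", "reportDeviceStatus", ["device", "DEVMODE"], [
        ["if", ["name", "DEVMODE"], [["call", "open", []]]]
    ]]
];

var result = foldDeadCode(applyList(program, { DEVMODE: false }));
console.log(JSON.stringify(result, null, 2));
#+END_SRC

In this sketch =init= ends up with an empty body, while
=reportDeviceStatus= comes back unchanged, mirroring the =CLAUSE2= and
=CLAUSE3= behaviour described above.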

In this way you can mimic, to a limited degree, the functionality of
the C/C++ pre-processor to enable or completely remove blocks
depending on how certain symbols are defined, perhaps using UglifyJS
to generate different versions of source aimed at different
environments.

It is recommended (but not made mandatory) that symbols designed for
this purpose are given names consisting of =UPPER_CASE_LETTERS= to
distinguish them from other (normal) symbols and avoid the sort of
clash that =CLAUSE3= above illustrates.

** Compression -- how good is it?

Here are updated statistics. (I also updated my Google Closure and YUI
installations).

We're still a lot better than YUI in terms of compression, though slightly
slower. We're still a lot faster than Closure, and compression after gzip
is comparable. Sizes are in bytes; processing times are given in
parentheses.

| File                        | UglifyJS         | UglifyJS+gzip | Closure          | Closure+gzip | YUI              | YUI+gzip |
|-----------------------------+------------------+---------------+------------------+--------------+------------------+----------|
| jquery-1.6.2.js             | 91001 (0:01.59)  |         31896 | 90678 (0:07.40)  |        31979 | 101527 (0:01.82) |    34646 |
| paper.js                    | 142023 (0:01.65) |         43334 | 134301 (0:07.42) |        42495 | 173383 (0:01.58) |    48785 |
| prototype.js                | 88544 (0:01.09)  |         26680 | 86955 (0:06.97)  |        26326 | 92130 (0:00.79)  |    28624 |
| thelib-full.js (DynarchLIB) | 251939 (0:02.55) |         72535 | 249911 (0:09.05) |        72696 | 258869 (0:01.94) |    76584 |

** Bugs?

Unfortunately, for the time being there is no automated test suite. But I
ran the compressor manually on non-trivial code, and then I tested that the
generated code works as expected. A few hundred times.

DynarchLIB was started in times when there was no good JS minifier.
Therefore I was quite religious about trying to write short code manually,
and as such DL contains a lot of syntactic hacks[1] such as “foo == bar ?
a = 10 : b = 20”, though the more readable version would clearly be to use
“if/else”.

Since the parser/compressor runs fine on DL and jQuery, I'm quite confident
that it's solid enough for production use. If you can identify any bugs,
I'd love to hear about them ([[http://groups.google.com/group/uglifyjs][use the Google Group]] or email me directly).

[1] I even reported a few bugs and suggested some fixes in the original
    [[http://marijn.haverbeke.nl/parse-js/][parse-js]] library, and Marijn pushed fixes literally in minutes.

** Links

- Twitter: [[http://twitter.com/UglifyJS][@UglifyJS]]
- Project at GitHub: [[http://github.com/mishoo/UglifyJS][http://github.com/mishoo/UglifyJS]]
- Google Group: [[http://groups.google.com/group/uglifyjs][http://groups.google.com/group/uglifyjs]]
- Common Lisp JS parser: [[http://marijn.haverbeke.nl/parse-js/][http://marijn.haverbeke.nl/parse-js/]]
- JS-to-Lisp compiler: [[http://github.com/marijnh/js][http://github.com/marijnh/js]]
- Common Lisp JS uglifier: [[http://github.com/mishoo/cl-uglify-js][http://github.com/mishoo/cl-uglify-js]]

** License

UglifyJS is released under the BSD license:

#+BEGIN_EXAMPLE
Copyright 2010 (c) Mihai Bazon <mihai.bazon@gmail.com>
Based on parse-js (http://marijn.haverbeke.nl/parse-js/).

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

    * Redistributions of source code must retain the above
      copyright notice, this list of conditions and the following
      disclaimer.

    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials
      provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER “AS IS” AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
#+END_EXAMPLE