# $Id$

Introduction
============

Text_Highlighter is a class for syntax highlighting. The main idea is to
simplify the creation of subclasses implementing syntax highlighting for a
particular language. Subclasses do not implement any new functionality; they
just provide syntax highlighting rules. The rules are written in XML. To
create a highlighter for a language, there is no need to code a new class
manually: simply describe the rules in an XML file and use
Text_Highlighter_Generator to create a new class.


This document does not contain a formal description of the API. The API is
very simple, and I believe some code examples are sufficient.


Highlighter XML source
======================

Basics
------

Creating a new syntax highlighter begins with describing the highlighting
rules. There are two basic elements: block and region. A block is just a
portion of text matching a regular expression and highlighted with a single
color. A keyword is an example of a block. A region is defined by two regular
expressions: one for the start of the region, and another for the end. The
main difference from a block is that a region can contain blocks and regions
(including regions of the same name). An example of a region is a group of
statements enclosed in curly brackets (used in many languages, for example
PHP and C). Also, the characters matching the start and end of a region may
be highlighted with their own color, and the region contents with another.
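A toy tokenizer can make the difference between the two elements concrete. This PHP sketch is illustrative only and is not how Text_Highlighter works internally: it hard-codes one block (numbers, matched by \d+) and one region (C-style comments, start \/\* and end \*\/), and the group names default, number, comment and comment-delim are made up for the example.

```php
<?php
// Toy tokenizer for illustration only; NOT Text_Highlighter's real algorithm.
// It contrasts a block (one regex, one color group) with a region (start and
// end regexes, separate groups for the delimiters and the contents).
function toyTokens($code)
{
    $tokens = array();
    $pos = 0;
    $len = strlen($code);
    while ($pos < $len) {
        if (preg_match('/\G\/\*/', $code, $m, 0, $pos)) {
            // Region start "/*": the delimiter gets the region's delimGroup.
            $tokens[] = array('comment-delim', $m[0]);
            $pos += strlen($m[0]);
            // Look for the end regex "*/"; an unterminated region runs to EOF.
            if (preg_match('/\*\//', $code, $e, PREG_OFFSET_CAPTURE, $pos)) {
                $end = $e[0][1];
            } else {
                $end = $len;
            }
            if ($end > $pos) {
                // Region contents get the region's innerGroup.
                $tokens[] = array('comment', substr($code, $pos, $end - $pos));
            }
            if ($end < $len) {
                $tokens[] = array('comment-delim', '*/');
                $end += 2;
            }
            $pos = $end;
        } elseif (preg_match('/\G\d+/', $code, $m, 0, $pos)) {
            // Block: a single regex match highlighted with one color group.
            $tokens[] = array('number', $m[0]);
            $pos += strlen($m[0]);
        } else {
            // Unmatched text goes to the default group; merge adjacent chars.
            $last = count($tokens) - 1;
            if ($last >= 0 && $tokens[$last][0] === 'default') {
                $tokens[$last][1] .= $code[$pos];
            } else {
                $tokens[] = array('default', $code[$pos]);
            }
            $pos++;
        }
    }
    return $tokens;
}

// Groups produced: default, number, default, comment-delim, comment, comment-delim
foreach (toyTokens('x = 42; /* note */') as $t) {
    echo $t[0], ': "', $t[1], "\"\n";
}
```

A real region can also contain blocks and nested regions; the sketch leaves that out to keep the two ideas apart.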

Blocks and regions may be declared as contained. Contained blocks and regions
can only appear inside regions. If a region or a block is not declared as
contained, it can appear both at the top level and inside regions. A block or
region declared as not-contained can only appear at the top level.

For any region, you can specify a list of blocks and regions that are allowed
to appear inside it.
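The placement rules above can be summarized as a small decision function. This is a hypothetical sketch, not code from the package: mayAppear() and the array layout of $element are invented for illustration, and $allowedInParent stands for the parent region's list of allowed element names (null meaning the top level).

```php
<?php
// Hypothetical helper (not part of Text_Highlighter) encoding the
// contained / not-contained rules. $allowedInParent is null at the top level,
// otherwise the list of element names the parent region allows.
function mayAppear(array $element, $allowedInParent = null)
{
    $contained = !empty($element['contained']);
    $neverContained = !empty($element['never-contained']);

    if ($allowedInParent === null) {
        // Top level: anything that is not declared as contained.
        return !$contained;
    }
    if ($neverContained) {
        // Not-contained elements may only appear at the top level.
        return false;
    }
    // Inside a region: the region's list of allowed elements decides.
    return in_array($element['name'], $allowedInParent, true);
}

$measure = array('name' => 'measure', 'contained' => 'yes');
var_dump(mayAppear($measure));                   // bool(false)
var_dump(mayAppear($measure, array('measure'))); // bool(true)
```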

In this document, the term "color group" is used. Chunks of text assigned to
the same color group are highlighted with the same color. Note that in
versions prior to 0.5.0, color groups were referred to as CSS classes. Since
0.5.0, output formats other than HTML are supported, so "color group" is the
more appropriate term.

Elements
--------

The top-level element is <highlight>. The lang attribute is required and
denotes the name of the language. Its value is used as part of the generated
class name, and must only contain letters, digits and underscores. The
optional case attribute, when given the value yes, makes the language case
sensitive (the default is case insensitive). Allowed subelements are:

    * <authors>: Information about the authors of the file.
      <author>: Information about a single author of the file. (May be used
      multiple times, one per author.)
        - name="...": Author's name. Required.
        - email="...": Author's email address. Optional.

    * <default>: Default color group.
        - innerGroup="...": Color group name. Required.

    * <region>: Region definition.
        - name="...": Region name. Required.
        - innerGroup="...": Default color group of region contents. Required.
        - delimGroup="...": Color group of the start and end of the region.
          Optional, defaults to the value of the innerGroup attribute.
        - start="...", end="...": Regular expressions matching the start and
          end of the region. Required. Regular expression delimiters are
          optional, but if you need to specify them, use /. Delimiters are
          only needed when specifying regular expression modifiers, such as
          m or U. Examples: \/\* or /$/m.
        - contained="yes": Marks the region as contained.
        - never-contained="yes": Marks the region as not-contained.
        - <contains>: Elements allowed inside this region. May be used
          multiple times.
            - all="yes": The region can contain any other region or block
              (except not-contained ones).
                - <but>: Do not allow certain regions or blocks.
                    - region="...": Name of a region not allowed within
                      the current region.
                    - block="...": Name of a block not allowed within the
                      current region.
            - region="...": Name of a region allowed within the current
              region.
            - block="...": Name of a block allowed within the current region.
        - <onlyin>: Only allow this region within certain regions. May be
          used multiple times.
            - region="...": Name of the parent region.

    * <block>: Block definition.
        - name="...": Block name. Required.
        - innerGroup="...": Color group of block contents. Optional. If not
          specified, the color group of the parent region or the default
          color group is used. You would only want to omit this attribute
          if there are keyword groups (see below) inherited from this block,
          and no special highlighting should apply when the block does not
          match a keyword.
        - match="...": Regular expression matching the block. Required.
          Regular expression delimiters are optional, but if you need to
          specify them, use /. Delimiters are only needed when specifying
          regular expression modifiers, such as m or U.
          Examples: #|\/\/ or /$/m.
        - contained="yes": Marks the block as contained.
        - never-contained="yes": Marks the block as not-contained.
        - <onlyin>: Only allow this block within certain regions. May be
          used multiple times.
            - region="...": Name of the parent region.
        - multiline="yes": Marks the block as multi-line. By default, a
          block is assumed to fit on a single line, which makes highlighting
          faster. Use this attribute if you need to declare a multi-line
          block.
        - <partGroup>: Assigns another color group to the part of the block
          that matched a subpattern.
            - index="n": Subpattern index. Required.
            - innerGroup="...": Color group name. Required.

          This is an example from the CSS highlighter: the measure is matched
          as a whole, but the measurement units are highlighted with a
          different color.

            <block name="measure" match="\d*\.?\d+(\%|em|ex|pc|pt|px|in|mm|cm)"
                   innerGroup="number" contained="yes">
                <onlyin region="property"/>
                <partGroup index="1" innerGroup="string" />
            </block>

    * <keywords>: Keyword group definition. Keyword groups are useful when you
      want to highlight, with a different color, some words that match a
      block. Keywords are defined as literal strings, not regular
      expressions. For example, suppose you have a block named identifier
      matching a general identifier, and you want to highlight reserved words
      (which match this block as well) with a different color. You inherit a
      keyword group "reserved" from the "identifier" block.
        - name="...": Keyword group name. Required.
        - ifdef="...", ifndef="...": Conditional declaration. See
          "Conditions" below.
        - inherits="...": Name of the inherited block. Required.
        - innerGroup="...": Color group of the keyword group. Required.
        - case="yes|no": Overrides the case sensitivity of the language.
          Optional, defaults to the global value.
        - <keyword>: Single keyword definition.
            - match="...": The keyword. Note: this is not a regular
              expression, but a literal match (possibly case insensitive).
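The relationship between a block and a keyword group can be sketched as follows. This is a hypothetical illustration, not the generated highlighter code: groupFor(), the identifier pattern and the keyword list are assumptions. A word is first matched by the block's regular expression, then compared literally (not as a regex) against the keyword list, honoring the case setting.

```php
<?php
// Hypothetical sketch of a keyword group refining a block match.
// The identifier pattern and group names are illustrative assumptions.
function groupFor($word, array $keywords, $caseSensitive)
{
    if (!preg_match('/^[a-z_]\w*$/i', $word)) {
        return null; // not matched by the "identifier" block at all
    }
    foreach ($keywords as $keyword) {
        // Literal comparison, optionally case insensitive.
        $same = $caseSensitive
            ? strcmp($word, $keyword) === 0
            : strcasecmp($word, $keyword) === 0;
        if ($same) {
            return 'reserved'; // the keyword group's innerGroup
        }
    }
    return 'identifier'; // the block's own innerGroup
}

$reserved = array('if', 'else', 'while');
echo groupFor('While', $reserved, false), "\n"; // reserved
echo groupFor('While', $reserved, true), "\n";  // identifier
echo groupFor('count', $reserved, false), "\n"; // identifier
```

This is also why keyword matching is cheap compared to regular expressions, and why a very long keyword list can still be worth disabling (see "Conditions").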

Note that, for backward compatibility (BC), the element partClass is an alias
for partGroup, and the attributes innerClass and delimClass are aliases of
innerGroup and delimGroup, respectively.
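Putting the elements together, a complete rules file can be quite small. The sketch below is a hypothetical definition for a made-up language called toy; the group names, patterns and author are placeholders, and whether this exact set of rules produces pleasant highlighting is untested. It is kept in a PHP nowdoc string so that SimpleXML can check that it is well-formed before you save it (for example as toy.xml) and pass it to the generator.

```php
<?php
// Hypothetical minimal rules file for a made-up language "toy".
// A nowdoc (<<<'XML') keeps the backslashes in the regexes untouched.
$rules = <<<'XML'
<highlight lang="toy">
  <authors>
    <author name="Jane Doe" email="jane@example.com"/>
  </authors>
  <default innerGroup="code"/>
  <region name="comment" innerGroup="comment" start="\/\*" end="\*\/"/>
  <block name="identifier" match="[a-z_]\w*" innerGroup="identifier"/>
  <block name="number" match="\d+" innerGroup="number"/>
  <keywords name="reserved" inherits="identifier" innerGroup="reserved">
    <keyword match="if"/>
    <keyword match="else"/>
  </keywords>
</highlight>
XML;

// Check well-formedness and the lang attribute before saving the file.
$doc = simplexml_load_string($rules);
if ($doc === false) {
    exit("The rules are not well-formed XML.\n");
}
if (!preg_match('/^[A-Za-z0-9_]+$/', (string) $doc['lang'])) {
    exit("lang may only contain letters, digits and underscores.\n");
}
echo 'Rules for "', $doc['lang'], '" look well-formed.', "\n";
```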


Conditions
----------

Conditional declarations allow enabling or disabling certain highlighting
rules at runtime. For example, the Java highlighter has a very large list of
keywords matching Java standard classes. Finding a match in this list can
take a long time. For that reason, the corresponding keyword group is declared
with the "ifdef" attribute:

    <keywords name="builtin" inherits="identifier" innerClass="builtin"
              case="yes" ifdef="java.builtins">
        <keyword match="AbstractAction" />
        <keyword match="AbstractBorder" />
        <keyword match="AbstractButton" />
        ...
        ...
        <keyword match="_Remote_Stub" />
        <keyword match="_ServantActivatorStub" />
        <keyword match="_ServantLocatorStub" />
    </keywords>

This keyword group is only enabled when "java.builtins" is passed as an
element of the "defines" option:

    require_once 'Text/Highlighter.php';
    $options = array(
        'defines' => array(
            'java.builtins',
        ),
        'numbers' => HL_NUMBERS_TABLE,
    );
    $highlighter = Text_Highlighter::factory('java', $options);

The "ifndef" attribute has the opposite meaning: the declaration is enabled
only when the given name is not passed in "defines".

Currently, the "ifdef" and "ifndef" attributes are only supported for the
<keywords> tag.
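The enabling rule can be written down as a small predicate. This is a sketch of the logic as described above, not the package's actual implementation; isRuleEnabled() and the array shape of $rule are hypothetical.

```php
<?php
// Hypothetical predicate for conditional declarations (assumed logic):
// ifdef="x" is active only if "x" is in 'defines'; ifndef="x" only if not.
function isRuleEnabled(array $rule, array $defines)
{
    if (isset($rule['ifdef']) && !in_array($rule['ifdef'], $defines, true)) {
        return false;
    }
    if (isset($rule['ifndef']) && in_array($rule['ifndef'], $defines, true)) {
        return false;
    }
    return true; // unconditional rules are always enabled
}

$builtin = array('name' => 'builtin', 'ifdef' => 'java.builtins');
var_dump(isRuleEnabled($builtin, array()));                // bool(false)
var_dump(isRuleEnabled($builtin, array('java.builtins'))); // bool(true)
```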



Class generation
================

Creating the XML description of the highlighting rules is the most
complicated part of the process. Generating the class takes just a few lines
of code:

    <?php
    require_once 'Text/Highlighter/Generator.php';
    // Parse the rules, generate the class code and write it to a file.
    $generator = new Text_Highlighter_Generator('php.xml');
    $generator->generate();
    $generator->saveCode('PHP.php');
    ?>



Command-line class generation tool
==================================

The example from the previous section looks pretty simple, but it does not
handle any errors that may occur while parsing the XML source. The package
provides a command-line script that makes class generation even simpler and
takes care of possible errors. It is called generate (on Unix/Linux) or
generate.bat (on Windows). This script can process multiple files in one run,
and can also read XML from standard input and write the generated code to
standard output.

    Usage:
    generate options

    Options:
    -x filename, --xml=filename
        source XML file. Multiple input files can be specified, in which
        case each -x option must be followed by -p unless -d is specified.
        Defaults to stdin
    -p filename, --php=filename
        destination PHP file. Defaults to stdout. If specified multiple times,
        each -p must follow -x
    -d dirname, --dir=dirname
        default destination directory. File names will be taken from the XML
        input ("lang" attribute of the <highlight> tag)
    -h, --help
        this help

Examples

    Read from php.xml, write to PHP.php

        generate -x php.xml -p PHP.php

    Read from php.xml, write to standard output

        generate -x php.xml

    Read from php.xml, write to PHP.php, read from xml.xml, write to XML.php

        generate -x php.xml -p PHP.php -x xml.xml -p XML.php

    Read from php.xml, write to /some/dir/PHP.php, read from xml.xml, write to
    /some/dir/XML.php (assuming that xml.xml contains <highlight lang="xml">,
    and php.xml contains <highlight lang="php">)

        generate -x php.xml -x xml.xml -d /some/dir/
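To predict where -d will put a generated class, you can read the lang attribute yourself. This helper is hypothetical, and its naming rule (upper-cased lang plus ".php") is an assumption inferred only from the examples above, where lang="php" becomes PHP.php and lang="xml" becomes XML.php.

```php
<?php
// Hypothetical helper: predicts the output path "generate -d" would use,
// ASSUMING the file name is the upper-cased "lang" attribute plus ".php".
function outputPathFor($xmlSource, $dir)
{
    $doc = simplexml_load_string($xmlSource);
    if ($doc === false || !isset($doc['lang'])) {
        return null; // not well-formed, or no lang attribute
    }
    return rtrim($dir, '/') . '/' . strtoupper((string) $doc['lang']) . '.php';
}

echo outputPathFor('<highlight lang="xml"/>', '/some/dir/'), "\n"; // /some/dir/XML.php
```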



Renderers
=========

Introduction
------------

Text_Highlighter supports renderers. Using renderers, you can get output in
different formats. Two renderers are included in the package:

    - HTML renderer. Generates HTML output. A style sheet should be linked to
      the document to display colored text.

    - Console renderer. Can be used to output highlighted text to
      color-capable terminals, either directly or through less -r.


Renderers API
-------------

Renderers are subclasses of Text_Highlighter_Renderer. A renderer should
override at least two methods: acceptToken and getOutput. Overriding other
methods is optional, depending on the nature of the renderer's output and the
details of the implementation.

    void reset()
        Resets the renderer state. This method is called every time before a
        new source file is highlighted.

    string preprocess(string $code)
        Preprocesses code. Can be used, for example, to normalize whitespace
        before highlighting. Returns the preprocessed string.

    void acceptToken(string $group, string $content)
        The core method of the renderer. The highlighter passes chunks of
        text to this method in $content, and their color group in $group.

    void finalize()
        Signals the renderer that no more tokens are available.

    mixed getOutput()
        Returns the generated output.
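The method list above can be illustrated with a stand-alone renderer. In real code the class would extend Text_Highlighter_Renderer; the base class is left out here so the sketch runs without the package. The class name BracketRenderer and its "[group]text" output format are hypothetical, and the tab handling is a simplification (each tab becomes 'tabsize' spaces rather than advancing to the next tab stop).

```php
<?php
// Hypothetical renderer following the documented method list. A real one
// would extend Text_Highlighter_Renderer; the base class is omitted here.
class BracketRenderer
{
    private $options;
    private $output = '';

    public function __construct($options = array())
    {
        $this->options = $options; // renderer-specific options array
    }

    public function reset()
    {
        $this->output = ''; // called before each new source file
    }

    public function preprocess($code)
    {
        // Simplification: replace each tab with 'tabsize' spaces instead of
        // expanding to real tab stops.
        $size = isset($this->options['tabsize']) ? $this->options['tabsize'] : 4;
        return str_replace("\t", str_repeat(' ', $size), $code);
    }

    public function acceptToken($group, $content)
    {
        $this->output .= '[' . $group . ']' . $content;
    }

    public function finalize()
    {
        $this->output .= "\n"; // no more tokens will arrive
    }

    public function getOutput()
    {
        return $this->output;
    }
}

// Drive it by hand, roughly the way a highlighter would.
$r = new BracketRenderer(array('tabsize' => 2));
$r->reset();
$code = $r->preprocess("\tif");
$r->acceptToken('default', substr($code, 0, 2));
$r->acceptToken('reserved', substr($code, 2));
$r->finalize();
echo $r->getOutput(); // [default]  [reserved]if
```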


Setting renderer options
------------------------

Renderer constructors accept an optional argument: an options array. The
elements of this array are renderer-specific.

HTML renderer
-------------

The HTML renderer produces HTML output with optional line numbering. The
renderer itself does not provide information about the actual colors of the
highlighted text. Instead, <span class="hl-XXX"> is used, where XXX is
replaced with the color group name (hl-var, hl-string, etc.). It is up to you
to create a CSS stylesheet. If the 'use_language' option is passed with a
value that evaluates to true, class names are formatted as "LANG-hl-XXX",
where LANG is the language name as defined in the highlighter XML source
("lang" attribute of the <highlight> tag) in lower case.

There are 3 special CSS classes:

    hl-main   - applies to the whole output or to the right table column,
                depending on the 'numbers' option
    hl-gutter - applies to the left table column
    hl-table  - applies to the whole table
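Since the stylesheet is up to you, it can be generated from a map of groups to colors. This helper is a hypothetical sketch: highlighterCss() is not part of the package, the colors are arbitrary, and group names other than var and string are illustrative. The optional $lang argument follows the "LANG-hl-XXX" form described above; whether the special classes are prefixed the same way under 'use_language' is not covered here.

```php
<?php
// Hypothetical stylesheet builder for the HTML renderer's class names.
// Passing $lang mimics the 'use_language' form "lang-hl-group".
function highlighterCss(array $colors, $lang = null)
{
    $prefix = $lang === null ? 'hl-' : strtolower($lang) . '-hl-';
    $css = '';
    foreach ($colors as $group => $color) {
        $css .= '.' . $prefix . $group . ' { color: ' . $color . "; }\n";
    }
    return $css;
}

echo highlighterCss(array(
    'main'   => 'black', // hl-main: whole output or right table column
    'gutter' => 'gray',  // hl-gutter: left (line number) column
    'var'    => 'teal',
    'string' => 'maroon',
));
```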

The HTML renderer accepts the following options (each one is optional):

    * numbers - line numbering style.
        0                - no numbering (default)
        HL_NUMBERS_LI    - use <ol></ol> for line numbering
        HL_NUMBERS_TABLE - create a 2-column table, with line numbers in the
                           left column and highlighted text in the right column

    * tabsize - tab size. Defaults to 4.

    Example:

        require_once 'Text/Highlighter.php';
        require_once 'Text/Highlighter/Renderer/Html.php';
        $options = array(
            'numbers' => HL_NUMBERS_LI,
            'tabsize' => 8,
        );
        $renderer = new Text_Highlighter_Renderer_HTML($options);

Console renderer
----------------

The Console renderer produces output for display on a color-capable terminal,
either directly or through less -r, using ANSI escape sequences. By default,
this renderer only highlights the most common color groups. Additional colors
can be specified using the 'colors' option. This renderer also accepts a
'numbers' option (a boolean value) and a 'tabsize' option.

    Example:

        require_once 'Text/Highlighter/Renderer/Console.php';
        $colors = array(
            'prepro' => "\033[35m",
            'types' => "\033[32m",
        );
        $options = array(
            'numbers' => true,
            'tabsize' => 8,
            'colors' => $colors,
        );
        $renderer = new Text_Highlighter_Renderer_Console($options);


ANSI color escape sequences have the following format:

    ESC[#;#;....;#m

where ESC is the character with ASCII code 27 (033 octal, 0x1B hexadecimal),
and each # is one of the following:

    0    normal display
    1    bold on
    4    underline (mono only)
    5    blink on
    7    reverse video on
    8    nondisplayed (invisible)
    30   black foreground
    31   red foreground
    32   green foreground
    33   yellow foreground
    34   blue foreground
    35   magenta foreground
    36   cyan foreground
    37   white foreground
    40   black background
    41   red background
    42   green background
    43   yellow background
    44   blue background
    45   magenta background
    46   cyan background
    47   white background
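Rather than typing escape sequences by hand, the 'colors' option can be filled from lists of the codes above. The ansi() helper is a hypothetical convenience, not part of the package; "prepro" and "types" are the groups from the Console example, while "reserved" is an illustrative group name.

```php
<?php
// Hypothetical helper: builds ESC[#;...;#m from a list of ANSI codes.
function ansi(array $codes)
{
    return "\033[" . implode(';', $codes) . 'm';
}

$colors = array(
    'prepro'   => ansi(array(35)),    // magenta foreground
    'types'    => ansi(array(32)),    // green foreground
    'reserved' => ansi(array(1, 34)), // bold on + blue foreground
);
// Print a sample, then switch back to normal display (code 0).
echo $colors['reserved'], 'while', ansi(array(0)), "\n";
```

The resulting $colors array has the same shape as the one passed to Text_Highlighter_Renderer_Console in the example above.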


How to use the Text_Highlighter class
=====================================

Creating a highlighter object
-----------------------------

To create a highlighter for a certain language, use the
Text_Highlighter::factory() static method:

    require_once 'Text/Highlighter.php';
    $hl = Text_Highlighter::factory('php');


Setting a renderer
------------------

The actual output is produced by a renderer:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);

Note that, for backward compatibility, it is possible to use the highlighter
without setting a renderer. If no renderer is set, the HTML renderer is used
by default. In this case, you should pass the options as the second parameter
to the factory method. The following example works exactly like the previous
one:

    require_once 'Text/Highlighter.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $hl = Text_Highlighter::factory('php', $options);


Getting output
--------------

Finally, do the highlighting and get the output:

    require_once 'Text/Highlighter.php';
    require_once 'Text/Highlighter/Renderer/Html.php';
    $options = array(
        'numbers' => HL_NUMBERS_LI,
        'tabsize' => 8,
    );
    $renderer = new Text_Highlighter_Renderer_HTML($options);
    $hl = Text_Highlighter::factory('php');
    $hl->setRenderer($renderer);
    $html = $hl->highlight(file_get_contents('example.php'));

# vim: set autoindent tabstop=4 shiftwidth=4 softtabstop=4 tw=78: */