Lognormalizer
=============

Lognormalizer is a sample tool that is often used to test and debug
rulebases before they are put to real use. Nevertheless, it can also be
used in production as a simple command line interface to liblognorm.

This tool reads log lines from its standard input and prints the results
to standard output. Use shell redirection if you want to read from or
write to files.

An example of the command::

    $ lognormalizer -r messages.sampdb -e json <messages.log
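
Because the tool only works on standard streams, it also fits into shell
pipelines. The following is a minimal sketch for gzip-compressed logs. The
helper name ``normalize_gz`` and the file names are hypothetical; only the
``-r`` and ``-e`` options are taken from this page::

    # normalize_gz RULEBASE FILE.gz (hypothetical helper):
    # decompress FILE.gz on the fly and feed it to lognormalizer on
    # standard input; the JSON results go to standard output.
    normalize_gz() {
        gzip -dc "$2" | lognormalizer -r "$1" -e json
    }

    # Usage:
    #   normalize_gz messages.sampdb messages.log.1.gz >messages.json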

Command line options
--------------------

::

    -V

Output version information, including the version of the installed
liblognorm library and its optional features. This option can therefore
also be used to check which library version is currently installed.

::

    -r <FILENAME>

Specifies the name of the file containing the rulebase.

::

    -v

Increase the verbosity level. Can be given several times. If given three
times, internal data structures are dumped (this is useful to developers
only).

::

    -p

Print only successfully parsed messages.

::

    -P

Print only messages **not** successfully parsed.

::

    -L

Add line number information to events that were not successfully parsed.
This is meant as a troubleshooting aid when working with unparsable
events, as the information can be used to go directly to the line in
question in the source data file. The line number is contained in a field
named ``lognormalizer.line_nbr``.
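
With JSON output, these numbers can be used to print the offending input
lines directly. This is a sketch only. It assumes that
``lognormalizer.line_nbr`` appears as a flat top-level key with an integer
value counted from 1, and the helper names ``extract_line_nbrs`` and
``show_unparsed`` are hypothetical::

    # extract_line_nbrs (hypothetical filter): read lognormalizer JSON
    # output on stdin and print the value of "lognormalizer.line_nbr",
    # one number per line. Assumes the key is a flat top-level member.
    extract_line_nbrs() {
        sed -n 's/.*"lognormalizer\.line_nbr": *\([0-9][0-9]*\).*/\1/p'
    }

    # show_unparsed RULEBASE LOGFILE (hypothetical helper): print every
    # unparsable input line prefixed with its line number.
    # -P keeps only unparsed events, -L adds the line numbers.
    show_unparsed() {
        lognormalizer -r "$1" -e json -P -L <"$2" | extract_line_nbrs |
        while read -r n; do
            printf '%s: ' "$n"
            sed -n "${n}p" "$2"     # print line n of the original file
        done
    }

The sketch rereads the log file once per unparsed line. That is fine for
troubleshooting, but slow for large inputs with many failures.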

::

    -t <TAG>

Print only messages that have this tag.

::

    -T

Include the ``event.tags`` attribute when the output is in JSON format.
This attribute contains the list of tags of the matched rule.

::

    -E <DATA>

Encoder-specific data. For CSV, it is the list of fields to be output,
separated by comma or space. It is currently unused for other formats.

::

    -d <FILENAME>

Generate a DOT file describing the parse tree. It can be used to plot the
parse graph with Graphviz.

::

    -H

At the end of the run, print a summary line to stdout with the number of
messages processed, parsed, and unparsed.

::

    -U

At the end of the run, print a summary line to stdout with the number of
unparsed messages. Note that this line is only printed if there was at
least one unparsable message.

::

    -o <OPTION>

Special options. The following ones can be set:

   * **allowRegex** Permits the use of regular expressions inside the v1
     engine. This is deprecated and should not be used for new deployments.

   * **addExecPath** Includes metadata in the event describing how it was
     parsed (or how parsing was attempted). This can be useful when
     troubleshooting normalization problems.

   * **addOriginalMsg** Always add the "original-msg" data item. By
     default, this is only done when a message could not be parsed.

   * **addRule** Add a mockup of the rule that was processed. Note that
     it is *not* an exact copy of the rule, but a rule that correctly
     describes the parsed message. Most importantly, prefixes are
     appended and custom data types are expanded (and no longer visible
     as such). This option is primarily meant for postprocessing, e.g.
     as input to an anonymizer.

   * **addRuleLocation** For successfully parsed messages, add the
     location of the matching rule inside the rulebase. Both the file
     name and the line number are given. If two rules evaluate to the
     same end node, only a single rule location is given. However, this
     is extremely unlikely in practice, so for practical purposes the
     information can be considered reliable.
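
Assuming each ``-o`` takes a single option name and repeated ``-o`` flags
accumulate, several special options can be requested as in the following
sketch. The wrapper name ``normalize_with_rule_info`` is hypothetical::

    # normalize_with_rule_info RULEBASE (hypothetical wrapper): read log
    # lines from stdin and emit JSON events that additionally carry the
    # reconstructed rule (addRule) and its location (addRuleLocation).
    normalize_with_rule_info() {
        lognormalizer -r "$1" -e json -o addRule -o addRuleLocation
    }

    # Usage:
    #   normalize_with_rule_info messages.rb <messages.log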

::

    -s <FILENAME>

At the end of the run, print internal parse DAG statistics and exit. This
option is meant for developers and researchers who want insight into the
quality of the algorithm and/or how efficiently the rulebase can be
processed. It is **NOT** intended for end users. This option is
performance-intensive.

::

    -S <FILENAME>

Even more detailed statistics than ``-s``. Requires a version compiled
with ``--enable-advanced-statistics``, which causes a considerable
performance loss.

::

    -x <FILENAME>

Print statistics as a DOT file. In order to keep the graph readable,
information is only emitted for called nodes.

::

    -e <json|xml|csv|raw|cee-syslog>

Output format. By default, output is in JSON format. With this option,
you can change it to a different one.

Supported Output Formats
........................

The JSON, XML, and CSV formats should be self-explanatory.

The cee-syslog format emits messages according to the MITRE CEE spec.
Note that the cee-syslog format is supported primarily for backward
compatibility. It does **not** support nested data items and therefore
cannot be used when the rulebase makes use of this feature (which we
assume is the common case nowadays). We strongly recommend not using it
for new deployments. Support may be removed in later releases.

The raw format outputs an exact copy of the input message, with no
normalization visible. The prime use case of "raw" is to extract either
all messages that could be normalized or all that could not. To do so,
specify the -p or -P option. It also works in combination with the
-t option to extract a subset based on tagging. In any case, the core
use is to prepare a subset of the original file for further processing.
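
A minimal sketch of that workflow, which splits a log into the messages
the rulebase handles and those it does not. The helper name
``split_by_parse_status`` and the ``.parsed``/``.unparsed`` file name
suffixes are hypothetical::

    # split_by_parse_status RULEBASE LOGFILE (hypothetical helper):
    # write exact raw copies of the input lines to LOGFILE.parsed (-p)
    # and LOGFILE.unparsed (-P). Adding "-t TAG" to the first call
    # would keep only parsed messages carrying that tag.
    split_by_parse_status() {
        lognormalizer -r "$1" -e raw -p <"$2" >"$2.parsed" &&
            lognormalizer -r "$1" -e raw -P <"$2" >"$2.unparsed"
    }

    # Usage:
    #   split_by_parse_status messages.rb messages.log

The sketch simply runs lognormalizer twice, once per subset. This keeps
each call limited to the options described above, at the cost of reading
the input twice.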

Examples
--------

These examples were created using the sample rulebase from the source
package.
cee-syslog output::

    $ lognormalizer -r rulebases/sample.rulebase -e cee-syslog
    Weight: 42kg
    [cee@115 event.tags="tag2" unit="kg" N="42" fat="free"]
    Snow White and the Seven Dwarfs
    [cee@115 event.tags="tale" company="the Seven Dwarfs"]
    2012-10-11 src=127.0.0.1 dst=88.111.222.19
    [cee@115 dst="88.111.222.19" src="127.0.0.1" date="2012-10-11"]

JSON output, flat tags enabled::

    $ lognormalizer -r rulebases/sample.rulebase -e json -T
    %%
    { "event.tags": [ "tag3", "percent" ], "percent": "100", "part": "wha", "whole": "whale" }
    Weight: 42kg
    { "unit": "kg", "N": "42", "event.tags": [ "tag2" ], "fat": "free" }

CSV output with fixed field list::

    $ lognormalizer -r rulebases/sample.rulebase -e csv -E'N unit'
    Weight: 42kg
    "42","kg"
    Weight: 115lbs
    "115","lbs"
    Anything not matching the rule
    ,
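
As the last example shows, CSV output has no header row, and input that
matches no rule still yields a row of empty fields. The following sketch
addresses both by printing the ``-E`` list as a header and adding ``-p``
so that only successfully parsed messages are output. The helper name
``csv_with_header`` is hypothetical, and the field list is assumed to be
space-separated::

    # csv_with_header RULEBASE "FIELD ..." (hypothetical helper):
    # print the space-separated field list as a comma-separated header,
    # then the CSV rows of successfully parsed messages only (-p).
    csv_with_header() {
        printf '%s\n' "$2" | tr ' ' ','
        lognormalizer -r "$1" -e csv -E "$2" -p
    }

    # Usage:
    #   csv_with_header rulebases/sample.rulebase 'N unit' <weights.log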

Creating a graph of the rulebase
--------------------------------

To get a better overview of a rulebase, you can create a graph that shows
the chain of normalization (the parse tree).

First, you have to install an additional package called graphviz. Graphviz
is a tool that creates such a graph with the help of a control file
(created from the rulebase). `Here <http://www.graphviz.org/>`_ you will
find more information about Graphviz.

To install it, you can use the package manager. For example, on Red Hat
systems, use the yum command::

    $ sudo yum install graphviz

The next step is to create the control file for Graphviz. To do so, run
lognormalizer with the option -d followed by the preferred file name for
the control file, and the option -r followed by the rulebase::

    $ lognormalizer -d control.dot -r messages.rb

Please note that there is no need for an input or output file.
If you look at the control file now, you will see that its content is
a little bit confusing, but it includes all the information Graphviz
needs to create the graph, such as the nodes, fields, and parsers. Of
course you can edit that file, but please note that this is a lot of work.

Now we can create the graph by typing::

    $ dot control.dot -Tpng >graph.png

In other words, ``dot`` is called with the name of the control file, the
``-T`` option followed by the desired file format, and a redirection to
the output file.

That is just one example of using Graphviz; of course you can do many
other great things with it. But I think even this "simple" graph can be
very helpful when working with the normalizer.
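
The two steps can also be wrapped in a small helper, for example to
produce SVG instead of PNG. This is a sketch: the function name
``rulebase_graph`` is hypothetical, and the ``</dev/null`` redirection is
only a precaution so that lognormalizer cannot wait for terminal input::

    # rulebase_graph RULEBASE FORMAT OUTFILE (hypothetical helper):
    # write the DOT control file for RULEBASE to a temporary file and
    # render it with Graphviz dot in FORMAT (e.g. png or svg).
    rulebase_graph() {
        dotfile=$(mktemp) || return 1
        lognormalizer -d "$dotfile" -r "$1" </dev/null &&
            dot -T"$2" "$dotfile" >"$3"
        status=$?
        rm -f "$dotfile"          # always clean up the temporary file
        return "$status"
    }

    # Usage:
    #   rulebase_graph messages.rb svg graph.svg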

Below is a sample of such a graph. Please note that it is not a
particularly pretty one: such a graph can grow very quickly as you edit
your rulebase.

.. figure:: graph.png
   :width: 90 %
   :alt: graph sample

265