How to configure
================

To use liblognorm, you need three things:

1. An installed and working copy of liblognorm. The installation process
   has been discussed in the chapter :doc:`installation`.
2. Log files.
3. A rulebase, which is the heart of liblognorm configuration.

Log files
---------

A log file is a text file, which typically holds many lines. Each line is
a log message. These are often hard to read, and thus hard to analyze,
especially when many different devices all emit log messages in different
formats.

Rulebase
--------

The rulebase holds all the schemes for your logs. It basically consists of
many lines that reflect the structure of your log messages. When the
normalization process is started, a parse tree is generated from
the rulebase and loaded into memory. This tree is then used to parse the
log messages.

Each line in the rulebase file is evaluated separately.

Rulebase Versions
-----------------
This documentation is for liblognorm version 2 and above. Version 2 is a
complete rewrite of liblognorm which offers many enhanced features but
is incompatible with some pre-v2 rulebase commands. For details, see
the compatibility document.

Note that liblognorm v2 contains a full copy of the v1 engine. As such
it is fully compatible with old rulebases. In order to use the new v2
engine, you need to explicitly opt in. To do so, you need to add
the line::

    version=2

to the top of your rulebase file. Currently, it is very important that

 * the line is given exactly as above
 * no whitespace within the sequence is permitted (e.g. "version = 2"
   is invalid)
 * no whitespace or comment after the "2" is permitted
   (e.g. "version=2 # comment" is invalid)
 * this line **must** be the **very** first line of the file; this
   also means there **must** not be any comment or empty lines in
   front of it

Only if the version indicator is properly detected is the v2 engine
used; otherwise, the v1 engine is used. So if you use v2 features but
got the version line wrong, you'll end up with error messages from the
v1 engine.

The v2 engine understands almost all v1 parsers, and most importantly all
that are typically used. It does not understand these parsers:

 * tokenized
 * recursive
 * descent
 * regex
 * interpret
 * suffixed
 * named_suffixed

The recursive and descent parsers should be replaced by user-defined
types. The tokenized parsers should be replaced by repeat. The interpret
functionality is provided via the parsers' "format" parameters. For the
others there currently exists no replacement, but, with the exception of
regex, replacements will be added based on demand. If you think regex
support is urgently needed, please read our
`related issue on github <https://github.com/rsyslog/liblognorm/issues/143>`_,
where you can also cast your ballot in favor of it. If you need any of
these parsers, you need to use the v1 engine. That of course means you
cannot use the v2 enhancements, so converting as much as possible makes
sense.

Commentaries
------------

To keep your rulebase tidy, you can use comments. Start a comment
with "#", as in many other configuration formats. It should look like this::

    # The following prefix and rules are for firewall logs

Note that the comment character MUST be in the first column of the line.
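Putting the rules above together, a minimal valid v2 rulebase might look
like this (a sketch; the rule itself is purely illustrative)::

    version=2
    # normalize "listening" notices from example devices
    rule=:%date:date-rfc3164% %host:word% listening on %ip:ipv4%

Note how the version indicator comes first, with comments only after it.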
Empty lines are simply skipped; they can be inserted for readability.

User-Defined Types
------------------

If the line starts with ``type=``, then it contains a user-defined type.
You can use a user-defined type wherever you use a built-in type; they
are equivalent. That also means you can use user-defined types in the
definition of other user-defined types (they can be used recursively).
The only restriction is that you must define a type **before** you can
use it.

This line has the following format::

    type=<typename>:<match description>

Everything before the colon is treated as the type name. User-defined types
must always start with "@". So "@mytype" is a valid name, whereas "mytype"
is invalid and will lead to an error.

After the colon, a match description should be given. It is exactly the
same as the one given in rule lines (see below).

A generic IP address type could look as follows::

    type=@IPaddr:%ip:ipv4%
    type=@IPaddr:%ip:ipv6%

This creates a type "@IPaddr", which consists of either an IPv4 or IPv6
address. Note how we use two different lines to create an alternative
representation. This is how things generally work with types: you can use
as many "type" lines for a single type as you need to define your object.
Note that pure alternatives could also be defined via the "alternative"
parser - which option to choose is left to the user. They are equivalent.
The ability to use multiple type lines for a definition, however, brings
more power than just defining alternatives.

Includes
--------
Includes come in handy especially with user-defined types. With an include,
you can pull definitions already made elsewhere into the current
rule set (just like the "include" directive works in many programming
languages). An include is done by a line starting with ``include=``,
where the rest of the line is the actual file name, as in this
example::

    include=/var/lib/liblognorm/stdtypes.rb

The definition is included right at the position where it occurs.
Processing of the original file continues when the included file
has been fully processed. Includes can be nested.

To facilitate repositories of common rules, liblognorm honors the
``LIBLOGNORM_RULEBASES`` environment variable. If it is set, liblognorm
tries to locate the file inside the path pointed to by
``LIBLOGNORM_RULEBASES`` if

* the provided file cannot be found, and
* the provided file name is not an absolute path (does not start with "/")

So assuming we have::

    export LIBLOGNORM_RULEBASES=/var/lib/liblognorm

the above example can be re-written as follows::

    include=stdtypes.rb

Note, however, that if ``stdtypes.rb`` exists in the current working
directory, that file will be loaded instead of the one from
``/var/lib/liblognorm``.

This facilitates building a library of standard type definitions. Note
that the liblognorm project also ships type definitions for common
scenarios.
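To illustrate how type definitions and rules interact, here is a small
sketch reusing the @IPaddr type from above (whether the type lines live in
the same file or are pulled in via ``include=`` makes no difference)::

    type=@IPaddr:%ip:ipv4%
    type=@IPaddr:%ip:ipv6%
    rule=:connection from %addr:@IPaddr% accepted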
Rules
-----

If the line starts with ``rule=``, then it contains a rule. This line has
the following format::

    rule=[<tag1>[,<tag2>...]]:<match description>

Everything before the colon is treated as a comma-separated list of tags,
which will be attached to a match. After the colon, a match description
should be given. It consists of string literals and field selectors.
String literals must match exactly, whereas field selectors may match
variable parts of a message.

A rule could look like this (in legacy format)::

    rule=:%date:date-rfc3164% %host:word% %tag:char-to:\x3a%: no longer listening on %ip:ipv4%#%port:number%

This excerpt is a common rule. A rule always contains several different
"parts"/properties and reflects the structure of the message you want to
normalize (e.g. Host, IP, Source, Syslogtag...).


Literals
--------

A literal is just a sequence of characters, which must match exactly.
Percent sign characters must be escaped to prevent them from accidentally
starting a field. Replace each "%" with "\\x25" or "%%" when it occurs
in a string literal.

Fields
------

There are different formats for field specification:

 * legacy format
 * condensed format
 * full JSON format

Legacy Format
#############
Legacy format is exactly identical to the v1 engine. This permits you to use
existing v1 rulebases without any modification with the v2 engine, except for
adding the ``version=2`` header line to the top of the file. Remember: some
v1 types are not supported - if you are among the few who use them, you need
to do some manual conversion. For almost all users, manual conversion should
not be necessary.

Legacy format is not documented here. If you want to use it, see the v1
documentation.

Condensed Format
################
The goal of this format is to be as brief as possible, permitting you an
as-clear-as-possible view of your rule. It is very similar to the legacy
format and is recommended for simple types which do not need any parser
parameters.

Its structure is as follows::

    %<field name>:<field type>{<parameters>}%

**field name** -> this name can be selected freely. It should be a
description of what kind of information the field holds, e.g. SRC if the
field contains the source IP address of the message. These names should
also be chosen carefully, since a field name can be used in every rule and
therefore should fit the same kind of information across different rules.

Some special field names exist:

* **dash** ("-"): this field is matched but not saved
* **dot** ("."): this is useful if a parser returns a set of fields. Usually,
  it does so by creating a JSON subtree. If the field is named ".", then
  no subtree is created; instead, the subfields are moved into the main
  hierarchy.
* **two dots** (".."): similar to ".", but can be used at the lower level to
  denote that a field is to be included with the name given by the
  upper-level object. Note that ".." is only acted on if a subelement
  contains a single field. The reason is that if there were more, we could
  not assign all of them to the *single* name given by the upper-level
  object. The prime use case for this special name is in user-defined
  types that parse only a single value. Without "..", they would always
  become a JSON subtree, which seems unnatural and is different from
  built-in types. So it is suggested to name such fields "..", which means
  that the user can assign a name of his liking, just like in the case of
  built-in parsers (see the sketch below).
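To make the ".." mechanism concrete, consider this sketch (the type and
field names are hypothetical)::

    type=@port:%..:number%
    rule=:listening on port %p:@port%

For a message like "listening on port 514", the value should land directly
in ``p`` (i.e. ``{"p": "514"}``) instead of in a one-field subtree beneath
``p``.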
**field type** -> selects the corresponding parser; the available parsers
are described below.

Special characters that need to be escaped when used inside a field
description are "%" and ":". It is strongly recommended **not** to use them.

**parameters** -> an optional set of parameters, given in pure JSON
format. Parameters can be generic (e.g. "priority") or specific to a
parser (e.g. "extradata"). Generic parameters are described below in their
own section, parser-specific ones in the relevant type documentation.

As an example, the "char-to" parser accepts a parameter named "extradata",
which describes up to which character it shall match (the name "extradata"
stems back to the legacy v1 system)::

    %tag:char-to{"extradata":":"}%

Whitespace, including LF, is permitted inside a field definition after
the opening percent sign and before the closing one. This can be used to
make complex rules more readable. So the example rule from the overview
section above could be rewritten as::

    rule=:%
          date:date-rfc3164
          % %
          host:word
          % %
          tag:char-to{"extradata":":"}
          %: no longer listening on %
          ip:ipv4
          %#%
          port:number
          %

When doing this, note well that whitespace IS important inside the
literal text. So e.g. in the second example line above, "% %", we require
a single SP as literal text. Note that any combination of your liking is
valid, so it could also be written as::

    rule=:%date:date-rfc3164% %host:word% % tag:char-to{"extradata":":"}
          %: no longer listening on % ip:ipv4 %#% port:number %

To prevent a typical user error, continuation lines are **not** permitted
to start with ``rule=``. There are some obscure cases where this could
be a valid rule, and it can be re-formatted in that case. More often, this
is the result of a missing percent sign, as in this sample::

    rule=:test%field:word ... missing percent sign ...
    rule=:%f:word%

If we permitted ``rule=`` at the start of a continuation line, these kinds
of problems would be very hard to detect.

Full JSON Format
################
This format is best for complex definitions or if there are many parser
parameters.

Its structure is as follows::

    %JSON%

where JSON is the configuration expressed in JSON. To get you started,
let's rewrite the above sample in pure JSON form::

    rule=:%[ {"type":"date-rfc3164", "name":"date"},
             {"type":"literal", "text":" "},
             {"type":"char-to", "name":"host", "extradata":":"},
             {"type":"literal", "text":": no longer listening on "},
             {"type":"ipv4", "name":"ip"},
             {"type":"literal", "text":"#"},
             {"type":"number", "name":"port"}
         ]%

A couple of things to note:

 * we express everything in this example in a *single* parser definition
 * this is done by using a **JSON array**; whenever an array is used,
   multiple parsers can be specified. They are executed one after the
   other in the given order.
 * literal text is matched here via an explicit parser call; as specified
   below, this is recommended only for specific use cases with the
   current version of liblognorm
 * parser parameters (both generic and parser-specific ones) are given
   on the main JSON level
 * the literal text shall not be stored inside an output variable; for
   this reason no name attribute is given (we could also have used
   ``"name":"-"``, which achieves the same effect but is more verbose).
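For orientation, a message matching this rule, such as::

    Oct 29 09:47:08 myhost: no longer listening on 127.0.0.1#514

should normalize to roughly the following (a sketch; all values shown in
the default string format)::

    { "date": "Oct 29 09:47:08", "host": "myhost", "ip": "127.0.0.1", "port": "514" }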
With the literal parser calls replaced by actual literals, the sample
looks like this::

    rule=:%{"type":"date-rfc3164", "name":"date"}
          % %
          {"type":"char-to", "name":"host", "extradata":":"}
          %: no longer listening on %
          {"type":"ipv4", "name":"ip"}
          %#%
          {"type":"number", "name":"port"}
          %

Which format you use, and how exactly you use it, is up to you.

Some guidelines:

 * using the "literal" parser in JSON should currently be avoided; the
   experimental version does have some rough edges where conflicts
   in literal processing will not be properly handled. This should not
   be an issue in "closed environments", like "repeat", where no such
   conflict can occur.
 * otherwise, JSON is perfect for very complex things (like nesting of
   parsers) - it is **not** suggested to use any other format for these
   kinds of things.
 * if a field needs to be matched but the result of that match is not
   needed, omit the "name" attribute; specifically avoid using
   the more verbose ``"name":"-"``.
 * it is a good idea to start each definition with ``"type":"..."``,
   as this provides a good quick overview of what is being defined.

Mandatory Parameters
....................

type
~~~~
The field type; selects the parser to use. See "Field types" below for
descriptions.

Optional Generic Parameters
...........................

name
~~~~
The field name to use. If "-" is used, the field is matched but not stored.
In this case, you can also simply **not** specify a field name, which is
the preferred way of doing this.

priority
~~~~~~~~
The priority to assign to this parser. Priorities are numerical values in the
range from 0 (highest) to 65535 (lowest). If multiple parsers could match at
a given character position of a log line, parsers are tried in priority order.
Different priorities can lead to different parsing. For example, if the
greedy "rest" type is assigned priority 0, and no other parser is assigned the
same priority, no other parser will ever match (because "rest" is very greedy
and always matches the rest of the message).

Note that liblognorm internally has a parser-specific priority, which is
selected by the program developer based on the specificity of a type. If
the user assigns equal priorities, parsers are executed based on the
parser-specific priority.

The default priority value is 30,000.

Field types
-----------
We have legacy and regular field types. Pre-v2, we did not have user-defined
types. As such, there was a relatively large number of parsers that handled
very similar cases, for example for strings. These parsers still work and
may even provide the best performance in extreme cases. In v2, we focus on
fewer, but more generic parsers, which are then tailored via parameters.

There is nothing bad about using legacy parsers, and there is no plan to
phase them out at any time in the future. We just wanted to let you know,
especially if you wonder about some "weird" parsers. In v1, parsers could
have only a single parameter, which was called "extradata" at that time.
This is why some of the legacy parsers require or support a parameter named
"extradata" and do not use a better name for it (internally, the legacy
format creates a v2 parser definition with "extradata" being populated from
the legacy "extradata" part of the configuration).
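As a concrete example of that mapping, the following two rules should be
equivalent, the first in legacy format and the second in condensed format
(the surrounding literal text is purely illustrative)::

    rule=:%tag:char-to:\x3a%: service started
    rule=:%tag:char-to{"extradata":":"}%: service started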
number
######

One or more decimal digits.

Parameters
..........

format
~~~~~~

Specifies the format of the JSON object. Possible values are "string" and
"number", with string being the default. If "number" is used, the JSON
object will be a native JSON integer.

maxval
~~~~~~

Maximum value permitted for this number. If the value is higher than this,
it will not be detected by this parser definition and an alternate detection
path will be pursued.

float
#####

A floating-point number represented in non-scientific form.

Parameters
..........

format
~~~~~~

Specifies the format of the JSON object. Possible values are "string" and
"number", with string being the default. If "number" is used, the JSON
object will be a native JSON floating point number. Note that we try to
preserve the original string serialization format, but keep in mind
that floating point numbers are inherently imprecise, so slight variance
may occur when processing them.


hexnumber
#########

A hexadecimal number as seen by this parser begins with the string
"0x", is followed by 1 or more hex digits and is terminated by white
space. Any interleaved non-hex digits will cause non-detection. The
rules are strict to avoid false positives.

Parameters
..........

format
~~~~~~

Specifies the format of the JSON object. Possible values are "string" and
"number", with string being the default. If "number" is used, the JSON
object will be a native JSON integer. Note that JSON numbers are always
decimal, so if "number" is selected, the hex number will be converted
to decimal. The original hex string is no longer available in this case.

maxval
~~~~~~

Maximum value permitted for this number. If the value is higher than this,
it will not be detected by this parser definition and an alternate detection
path will be pursued. This is most useful if fixed-size hex numbers need to
be processed. For example, for byte values "maxval" could be set to 255,
which ensures that invalid values are not misdetected.


kernel-timestamp
################

Parses a Linux kernel timestamp, which has the format::

    [ddddd.dddddd]

where "d" is a decimal digit. The part before the period has to
have at least 5 digits, as per kernel code. There is no upper
limit per se inside the kernel, but liblognorm does not accept
more than 12 digits, which seems more than sufficient (we may reduce
the max count if misdetections occur). The part after the period
has to have exactly 6 digits.


whitespace
##########

This parses all whitespace until the first non-whitespace character
is found. The check is performed using the ``isspace()`` C library
function, which covers space, horizontal tab, newline, vertical tab,
form feed and carriage return characters.

This parser is primarily a tool to skip to the next "word" if
the exact number of whitespace characters (and the type of whitespace)
is not known. The current parsing position MUST be on a whitespace,
else the parser does not match.

Remember that to just parse but not preserve the field contents, the
dash ("-") is used as the field name in compact format, or the "name"
parameter is simply omitted in JSON format. This is almost always
what you want with the *whitespace* type.
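A short sketch of the typical usage pattern, skipping a variable-width gap
between two words (the field names are illustrative)::

    rule=:%w1:word%%-:whitespace%%w2:word%

This should match both "abc def" and "abc    def", storing only ``w1``
and ``w2``.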
string
######

This is a highly customizable parser that can be used to extract
many types of strings. It is meant to be used for most cases. It
is suggested that specific string types be created as user-defined
types using this parser.

This parser supports:

* various quoting modes for strings
* escape character processing

Parameters
..........

quoting.mode
~~~~~~~~~~~~
Specifies how the string is quoted. Possible modes:

* **none** - no quoting is permitted
* **required** - quotes must be present
* **auto** - quotes are permitted, but not required

Default is ``auto``.

quoting.escape.mode
~~~~~~~~~~~~~~~~~~~

Specifies how quote character escaping is handled. Possible modes:

* **none** - there are no escapes, quote characters are *not* permitted in
  the value
* **double** - the ending quote character is duplicated to indicate
  a single quote without termination of the value (e.g. ``""``)
* **backslash** - a backslash is prepended to the quote character (e.g. ``\"``)
* **both** - both double and backslash escaping can happen and are supported

Default is ``both``.

Note that turning on ``backslash`` mode (or ``both``) has the side effect
that backslash escaping is enabled in general. This usually is what you
want if this option is selected (e.g. otherwise you could no longer
represent a backslash).

**NOTE**: this parameter also affects operation if quoting is **turned
off**. That is somewhat counter-intuitive, but has traditionally been the
case - which means we cannot change it.

quoting.char.begin
~~~~~~~~~~~~~~~~~~

Sets the begin quote character.

Default is ".

quoting.char.end
~~~~~~~~~~~~~~~~

Sets the end quote character.

Default is ".

Note that setting the begin and end quote characters permits you to
support more quoting modes. For example, brackets and braces are
used by some software for quoting. To handle such strings, you can for
example use a configuration like this::

    rule=:a %f:string{"quoting.char.begin":"[", "quoting.char.end":"]"}% b

which matches strings like this::

    a [test test2] b

matching.permitted
~~~~~~~~~~~~~~~~~~

This permits specifying a set of characters allowed in the to-be-parsed
field. It is primarily a utility to extract things like
programming-language style names (e.g. consisting of letters, digits and a
limited set of special characters), or alphanumeric or alphabetic strings.

If this parameter is not specified, all characters are permitted. If it
is specified, only the configured characters are permitted.

Note that this option works reliably only on US-ASCII data. Multi-byte
character encodings may lead to strange results.

There are two ways to specify permitted characters. The simple one is to
specify them directly for the parameter::

    rule=:%f:string{"matching.permitted":"abc"}%

This only supports literal characters, and all must be given as a single
parameter. For more advanced use cases, an array of permitted characters
can be provided::

    rule=:%f:string{"matching.permitted":[
                       {"class":"digit"},
                       {"chars":"xX"}
                    ]}%

Here, ``class`` selects one of the usual character classes, with
support for:

* digit
* hexdigit
* alpha
* alnum

In contrast, ``chars`` permits specifying literal characters. Both
``class`` and ``chars`` may be specified multiple times inside
the array. For example, the ``alnum`` class could also be permitted as
follows::

    rule=:%f:string{"matching.permitted":[
                       {"class":"digit"},
                       {"class":"alpha"}
                    ]}%
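Following the suggestion above to wrap specific string flavors into
user-defined types, a hypothetical identifier type could be sketched like
this (the name and permitted character set are made up for illustration)::

    type=@identifier:%..:string{"matching.permitted":[
                                   {"class":"alnum"},
                                   {"chars":"_-"}
                                ]}%
    rule=:process %name:@identifier% started

Thanks to the ".." field name, the matched value should be stored directly
under the name chosen at the use site ("name" in this rule).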
matching.mode
~~~~~~~~~~~~~

This parameter permits relaxing liblognorm's strict matching requirement,
under which each parser must be terminated by a space character. Possible
values are:

* **strict** - which requires that space
* **lazy** - which does not

Default is ``strict``. This parameter is available starting with version
2.0.6.

In ``lazy`` mode, the parser always matches if at least one character can
be matched. This can lead to unexpected results, so use it with care.

Example: assume the following message (without quotes)::

    "12:34 56"

And the following parser definition::

    rule=:%f:string{"matching.permitted":[ {"class":"digit"} ]}
          %%r:rest%

This will be unresolvable, as ":" is not a digit. With this definition::

    rule=:%f:string{"matching.permitted":[ {"class":"digit"} ], "matching.mode":"lazy"}
          %%r:rest%

it becomes resolvable, and ``f`` will contain "12" and ``r`` will contain
":34 56". This also shows the associated risk: the result obtained may not
necessarily be what was intended.


word
####

One or more characters, up to the next space (\\x20), or
up to the end of line.

string-to
#########

One or more characters, up to the next string given in
"extradata".

alpha
#####

One or more alphabetic characters, up to the next whitespace, punctuation,
decimal digit or control character.

char-to
#######

One or more characters, up to the next character(s) given in
extradata.

Parameters
..........

extradata
~~~~~~~~~

This is a mandatory parameter. It contains one or more characters, each of
which terminates the match.


char-sep
########

Zero or more characters, up to the next character(s) given in extradata.

Parameters
..........

extradata
~~~~~~~~~

This is a mandatory parameter. It contains one or more characters, each of
which terminates the match.

rest
####

Zero or more characters until the end of line. Must always be at the end
of the rule, even though this condition is currently **not** checked. In
any case, any definitions after *rest* are ignored.

Note that the *rest* syntax should be avoided because it generates
a very broad match. If it needs to be used, the user should assign it
the lowest priority among his parser definitions. Note that its
parser-specific priority is also the lowest, so by default it will only
match if nothing else matches.

quoted-string
#############

Zero or more characters, surrounded by double quote marks.
Quote marks are stripped from the match.

op-quoted-string
################

Zero or more characters, possibly surrounded by double quote marks.
If the first character is a quote mark, it operates like quoted-string;
otherwise, it operates like "word". Quote marks are stripped from the
match.

date-iso
########
Date in ISO format ('YYYY-MM-DD').

time-24hr
#########

Time of format 'HH:MM:SS', where HH is 00..23.

time-12hr
#########

Time of format 'HH:MM:SS', where HH is 00..12.
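As a quick sketch (the field names are arbitrary), these date and time
parsers compose like any others::

    rule=:%d:date-iso% %t:time-24hr% %msg:rest%

A line such as "2024-01-15 13:45:00 backup finished" should then yield the
fields ``d``, ``t`` and ``msg``.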
duration
########

A duration is similar to a timestamp, except that
it tells about time elapsed. As such, hours can be larger than 23
and hours may also be specified by a single digit (this, for example,
is commonly done in Cisco software).

Examples of durations are "12:05:01", "0:00:01" and "37:59:59", but not
"00:60:00" (MM and SS must still be within the usual range for
minutes and seconds).


date-rfc3164
############

Valid date/time in RFC 3164 format, i.e.: 'Oct 29 09:47:08'.
This parser implements several quirks to match malformed
timestamps from some devices.

Parameters
..........

format
~~~~~~

Specifies the format of the JSON object. Possible values are

- **string** - string representation as given in the input data
- **timestamp-unix** - string converted to a Unix timestamp (seconds since
  the epoch)
- **timestamp-unix-ms** - a kind of Unix timestamp, but with millisecond
  resolution. This format is understood, for example, by Elasticsearch.
  Note that RFC 3164 does **not** provide subsecond resolution, so this
  option makes no sense for RFC 3164 data alone. It is useful, however,
  when processing mixed sources, some of which contain higher precision.


date-rfc5424
############

Valid date/time in RFC 5424 format, i.e.:
'1985-04-12T19:20:50.52-04:00'.
Slightly different formats are allowed.

Parameters
..........

format
~~~~~~

Specifies the format of the JSON object. Possible values are

- **string** - string representation as given in the input data
- **timestamp-unix** - string converted to a Unix timestamp (seconds since
  the epoch). If subsecond resolution is given in the original timestamp,
  it is lost.
- **timestamp-unix-ms** - a kind of Unix timestamp, but with millisecond
  resolution. This format is understood, for example, by Elasticsearch.
  Note that an RFC 5424 timestamp can contain higher than millisecond
  resolution. If so, the timestamp is truncated to millisecond resolution.



ipv4
####

IPv4 address, in dot-decimal notation (AAA.BBB.CCC.DDD).

ipv6
####

IPv6 address, in textual notation as specified in RFC 4291.
All formats specified in section 2.2 are supported, including
embedded IPv4 addresses (e.g. "::13.1.68.3"). Note that a
**pure** IPv4 address ("13.1.68.3") is **not** valid and as
such not recognized.

To avoid false positives, there must be either a whitespace
character after the IPv6 address or the end of string must be
reached.

mac48
#####

The standard (IEEE 802) format for printing MAC-48 addresses in
human-friendly form is six groups of two hexadecimal digits,
separated by hyphens (-) or colons (:), in transmission order
(e.g. 01-23-45-67-89-ab or 01:23:45:67:89:ab).
This form is also commonly used for EUI-64
(from: http://en.wikipedia.org/wiki/MAC_address).

cef
###

This parses the ArcSight Common Event Format (CEF) as described in
the "Implementing ArcSight CEF" manual revision 20 (2013-06-15).

It matches a format that closely follows the spec. The header fields
are extracted into the field name container; all extensions are
extracted into a container called "Extensions" beneath it.

Example
.......
Rule (compact format)::

    rule=:%f:cef%

Data::

    CEF:0|Vendor|Product|Version|Signature ID|some name|Severity| aa=field1 bb=this is a value cc=field 3

Result::

    {
      "f": {
        "DeviceVendor": "Vendor",
        "DeviceProduct": "Product",
        "DeviceVersion": "Version",
        "SignatureID": "Signature ID",
        "Name": "some name",
        "Severity": "Severity",
        "Extensions": {
          "aa": "field1",
          "bb": "this is a value",
          "cc": "field 3"
        }
      }
    }

checkpoint-lea
##############

This supports the LEA on-disk format. Unfortunately, the format
is underdocumented; the Check Point docs we could get hold of just
describe the API and provide a field dictionary. In a nutshell, what
we do is extract field names up to the colon and values up to the
semicolon. No escaping rules are known to us, so we assume none
exist (and as such no semicolon can be part of a value). This
format needs to continue until the end of the log message.

We have also seen some samples of a LEA format that has data **after**
the format described above, so it does not end at the end of the log
line. We guess that this is LEA when used inside (syslog) messages. We
have one sample where the format ends at a bracket (`; ]`). To support
this, the `terminator` parameter exists (see below).

If someone has a definitive reference or a sample set to contribute
to the project, please let us know and we will check if we need to
add additional transformations.

Parameters
..........

terminator
~~~~~~~~~~
Must be a single character. If used, the LEA format is terminated when
this character is encountered instead of a field name. Note that the
terminator character is **not** part of LEA. If it should be skipped, it
must be specified as a literal after the parser. We have implemented it
in this way as this provides the most options for this format - about
which we do not know any details.

Example
.......

This configures a LEA parser for use with the syslog transfer format
(if we guess right). It terminates when a bracket is detected.

Rule (condensed format)::

    rule=:%field:checkpoint-lea{"terminator": "]"}%]

Data::

    tcp_flags: RST-ACK; src: 192.168.0.1; ]

Result::

    { "field": { "tcp_flags": "RST-ACK", "src": "192.168.0.1" } }


cisco-interface-spec
####################

A Cisco interface specifier, as for example seen in PIX or ASA.
The format contains a number of optional parts and is described
as follows (in ABNF-like manner where square brackets indicate
optional parts):

::

    [interface:]ip/port [SP (ip2/port2)] [[SP](username)]

Samples of such specs are:

 * outside:192.168.52.102/50349
 * inside:192.168.1.15/56543 (192.168.1.112/54543)
 * outside:192.168.1.13/50179 (192.168.1.13/50179)(LOCAL\some.user)
 * outside:192.168.1.25/41850(LOCAL\RG-867G8-DEL88D879BBFFC8)
 * inside:192.168.1.25/53 (192.168.1.25/53) (some.user)
 * 192.168.1.15/0(LOCAL\RG-867G8-DEL88D879BBFFC8)

Note that the current version of liblognorm does not permit sole
IP addresses to be detected as a Cisco interface spec. However, we
are reviewing more Cisco messages and need to decide if this is
to be supported. The problem here is that this would create a much
broader parser, which would potentially match many things that are
**not** Cisco interface specs.
As this object extracts multiple subelements, it creates a JSON
structure.

Let's for example look at this definition (compact format)::

    %ifaddr:cisco-interface-spec%

and assume the following message is to be parsed::

    outside:192.168.1.13/50179 (192.168.1.13/50179) (LOCAL\some.user)

Then the resulting JSON will be as follows::

    { "ifaddr": { "interface": "outside", "ip": "192.168.1.13", "port": "50179", "ip2": "192.168.1.13", "port2": "50179", "user": "LOCAL\\some.user" } }

Subcomponents that are not given in the to-be-normalized string are
also not present in the resulting JSON.

iptables
########

Name=value pairs, separated by spaces, as in Netfilter log messages.
The name of the selector is not used; names from the line are
used instead. This selector always matches everything up to the
end of the line. It cannot match zero characters.

json
####
This parses native JSON from the message. All data up to the first
non-JSON content is parsed into the field. There may be any other field
after the JSON, including another JSON section.

Note that any whitespace after the actual JSON
is considered **to be part of the JSON**. So you cannot filter on
whitespace after the JSON.

Example
.......

Rule (compact format)::

    rule=:%field1:json%interim text %field2:json%

Data::

    {"f1": "1"} interim text {"f2": 2}

Result::

    { "field2": { "f2": 2 }, "field1": { "f1": "1" } }

Note also that the space before "interim" must **not** be given in the
rule, as it is consumed by the JSON parser. However, the space after
"text" is required.

alternative
###########

This type permits specifying alternative ways of parsing within a single
definition. This can make writing rulebases easier. It also permits the
v2 engine to create a more efficient parsing data structure, resulting in
better performance (noticeable only in extreme cases, though).

An example explains this parser best::

    rule=:a %
            {"type":"alternative",
             "parser": [
                {"name":"num", "type":"number"},
                {"name":"hex", "type":"hexnumber"}
             ]
            }% b

This rule matches messages like these::

    a 1234 b
    a 0xff b

Note that the "parser" parameter here needs to be provided with an array
of *alternatives*. In this case, the JSON array is **not** interpreted as
a sequence. Note, though, that you can nest definitions by using custom
types.
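For the two sample messages above, the normalized output should look
roughly like this (a sketch; both values in the default string format)::

    { "num": "1234" }
    { "hex": "0xff" }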
repeat
######
This parser is used to extract a repeated sequence with the same pattern.

An example explains this parser best::

    rule=:a %
            {"name":"numbers", "type":"repeat",
             "parser":[
                {"type":"number", "name":"n1"},
                {"type":"literal", "text":":"},
                {"type":"number", "name":"n2"}
             ],
             "while":[
                {"type":"literal", "text":", "}
             ]
            }% b

This matches lines like this::

    a 1:2, 3:4, 5:6, 7:8 b

and will generate this JSON::

    { "numbers": [
        { "n2": "2", "n1": "1" },
        { "n2": "4", "n1": "3" },
        { "n2": "6", "n1": "5" },
        { "n2": "8", "n1": "7" }
      ]
    }

As can be seen, there are two parameters to "repeat". The "parser"
parameter specifies which type should be repeatedly parsed out of
the input data. We could use a single parser for that, but in the example
above we parse a sequence. Note the nested array in the "parser" parameter.

If we just wanted to match a single list of numbers like::

    a 1, 2, 3, 4 b

we could use this definition::

    rule=:a %
            {"name":"numbers", "type":"repeat",
             "parser":
                {"type":"number", "name":"n"},
             "while":
                {"type":"literal", "text":", "}
            }% b

Note that in this example we also removed the redundant single-element
array in "while".

The "while" parameter tells "repeat" how long to continue repeat
processing. It is specified by any parser, including a nested sequence of
parsers (array). As long as the "while" part matches, the repetition is
continued. If it no longer matches, "repeat" processing is successfully
completed. Note that the "parser" parameter **must** match at least once,
otherwise "repeat" fails.

In the above sample, "while" mismatches after "4", because no ", "
follows. Then the parser terminates, and according to the definition the
literal " b" is matched, which results in a successful rule match (note:
the "a ", " b" literals are just here for explanatory purposes and could
be any other rule elements).

Sometimes we need to deal with malformed messages. For example, we
could have a sequence like this::

    a 1:2, 3:4,5:6, 7:8 b

Note the missing space after "4,". To handle such cases, we can nest the
"alternative" parser inside "while"::

    rule=:a %
            {"name":"numbers", "type":"repeat",
             "parser":[
                {"type":"number", "name":"n1"},
                {"type":"literal", "text":":"},
                {"type":"number", "name":"n2"}
             ],
             "while": {
                "type":"alternative", "parser": [
                    {"type":"literal", "text":", "},
                    {"type":"literal", "text":","}
                ]
             }
            }% b

This definition handles numbers delimited by either ", " or ",".

For people with programming skills, the "repeat" parser is described
by this pseudocode::

    do
        parse via parsers given in "parser"
        if parsing fails
            abort "repeat" unsuccessful
        parse via parsers given in "while"
    while the "while" parsers parsed successfully
    if not aborted, flag "repeat" as successful

Parameters
..........

option.permitMismatchInParser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If set to "True", permits "repeat" to be flagged as successful even when
the "parser" processing failed. This is false by default, and can be
set to true to cover some border cases where the "while" part cannot
definitively detect the end of processing.
An example of such a border case is a listing of flags, terminated by a
double space, where each flag is delimited by single spaces. For example,
Cisco products generate such messages (note the flags part)::

    Aug 18 13:18:45 192.168.0.1 %ASA-6-106015: Deny TCP (no connection) from 10.252.88.66/443 to 10.79.249.222/52746 flags RST on interface outside

cee-syslog
##########
This parses CEE syslog from the message. This format has been defined
by MITRE CEE as well as Project Lumberjack.

This format essentially is JSON with additional restrictions:

 * The message must start with "@cee:"
 * a JSON **object** must immediately follow (whitespace before it is
   permitted, but a JSON array is **not** permitted)
 * after the JSON, there must be no other non-whitespace characters.

In other words: the message must consist of a single JSON object only,
prefixed by the "@cee:" cookie.

Note that the CEE cookie is case sensitive, so "@CEE:" is **NOT** valid.

Prefixes
--------

Several rules can have a common prefix. You can set it once with this
syntax::

    prefix=<prefix match description>

The prefix match description syntax is the same as the rule match
description. Every following rule will be treated as an addition to this
prefix.

The prefix can be reset to the default (empty value) by the line::

    prefix=

You can define a prefix for devices that produce the same header in each
message. We assume that you have your rules sorted by device. In such a
case you can take the header of the rules and use it with the prefix
variable. Here is an example of a rule for iptables (legacy format, to be
converted later)::

    prefix=%date:date-rfc3164% %host:word% %tag:char-to:-\x3a%:
    rule=:INBOUND%INBOUND:char-to:-\x3a%: IN=%IN:word% PHYSIN=%PHYSIN:word% OUT=%OUT:word% PHYSOUT=%PHYSOUT:word% SRC=%source:ipv4% DST=%destination:ipv4% LEN=%LEN:number% TOS=%TOS:char-to: % PREC=%PREC:word% TTL=%TTL:number% ID=%ID:number% DF PROTO=%PROTO:word% SPT=%SPT:number% DPT=%DPT:number% WINDOW=%WINDOW:number% RES=0x00 ACK SYN URGP=%URGP:number%

Usually, every rule would repeat what is defined in the prefix at its
beginning. But since we can define the prefix once, we save that work in
every line and just write the rules for the log lines themselves. This
saves a lot of work and even saves space.

Obviously, you can use multiple prefixes in a rulebase. A prefix is used
for the rules that follow it. If another prefix is then set, the first
one is erased, and the new one is used for the following rules.

Rule tags
---------

The rule tagging capability permits very easy classification of syslog
messages and log records in general. So you can not only extract data from
your various log sources, you can also classify events, for example, as
being a "login", a "logout" or a firewall "denied access". This makes it
very easy to look at specific subsets of messages and process them in ways
specific to the information being conveyed.

To see how it works, let's first define what a tag is:

A tag is a simple alphanumeric string that identifies a specific type of
object, action, status, etc. For example, we can have object tags for
firewalls and servers. For simplicity, let's call them "firewall" and
"server". Then, we can have action tags like "login", "logout" and
"connectionOpen". Status tags could include "success" or "fail", among
others. Tags form a flat space; there is no inherent relationship between
them (but this may be added later on top of the current implementation).
Think of tags like the tag cloud in a blogging system. Tags can be defined
for any reason and need. A single event can be associated with as many
tags as required.

Assigning tags to messages is simple. A rule contains both the sample of
the message (including the extracted fields) as well as the tags.
Have a look at this sample::

    rule=:sshd[%pid:number%]: Invalid user %user:word% from %src-ip:ipv4%

Here, we have a rule that shows an invalid ssh login request. The various
fields are used to extract information into a well-defined structure. Have
you ever wondered why every rule starts with a colon? Now, here is the
answer: the colon separates the tag part from the actual sample part.
Now, you can create a rule like this::

    rule=ssh,user,login,fail:sshd[%pid:number%]: Invalid user %user:word% from %src-ip:ipv4%

Note the "ssh,user,login,fail" part in front of the colon. These are the
four tags the user has decided to assign to this event. What now happens
is that the normalizer does not only extract the information from the
message if it finds a match, but it also adds the tags as metadata. Once
normalization is done, one can not only query the individual fields, but
also query whether a specific tag is associated with the event. For
example, to find all ssh-related events (provided the rules are built
that way), you can normalize a large log and select only that subset of
the normalized log that contains the tag "ssh".
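Once such a tagged rule matches, the tags travel with the normalized
event. For the rule above, the output should look roughly like this (a
sketch; the field values are illustrative, and liblognorm reports tags in
the "event.tags" array)::

    { "event.tags": [ "ssh", "user", "login", "fail" ],
      "pid": "4711", "user": "admin", "src-ip": "192.0.2.1" }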
Log annotations
---------------

In short, annotations allow adding arbitrary attributes to a parsed
message, depending on rule tags. The values of these attributes are
fixed; they cannot be derived from variable fields. The syntax is as
follows::

    annotate=<tag>:+<field name>="<field value>"

The field value should always be enclosed in double quote marks.

There can be multiple annotations for the same tag.

Examples
--------

Look at the :doc:`sample rulebase <sample_rulebase>` for configuration
examples and matching log lines. Note that the examples are currently
in legacy format, only.
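As a final sketch tying tags and annotations together, events matching the
"ssh"-tagged rule from the rule tags section could be annotated as follows
(the attribute names and values are hypothetical)::

    annotate=ssh:+origin="sshd"
    annotate=ssh:+classification="authentication"

Every event carrying the "ssh" tag then receives these two fixed
attributes in its normalized output.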