Regex Module

Iñaki Baz Castillo

   <ibc@aliax.net>

Edited by

Iñaki Baz Castillo

   <ibc@aliax.net>

   Copyright © 2009 Iñaki Baz Castillo
     __________________________________________________________________

   Table of Contents

   1. Admin Guide

        1. Overview
        2. Dependencies

              2.1. Kamailio Modules
              2.2. External Libraries or Applications

        3. Parameters

              3.1. file (string)
              3.2. max_groups (int)
              3.3. group_max_size (int)
              3.4. pcre_caseless (int)
              3.5. pcre_multiline (int)
              3.6. pcre_dotall (int)
              3.7. pcre_extended (int)

        4. Functions

              4.1. pcre_match (string, pcre_regex)
              4.2. pcre_match_group (string [, group])

        5. RPC Commands

              5.1. regex.reload

        6. Installation and Running

              6.1. File format

   List of Examples

   1.1. Set file parameter
   1.2. Set max_groups parameter
   1.3. Set group_max_size parameter
   1.4. Set pcre_caseless parameter
   1.5. Set pcre_multiline parameter
   1.6. Set pcre_dotall parameter
   1.7. Set pcre_extended parameter
   1.8. pcre_match usage (forcing case insensitive)
   1.9. pcre_match usage (using "end of line" symbol)
   1.10. pcre_match_group usage
   1.11. pcre_match_group usage (using a pseudo-variable as group)
   1.12. regex file
   1.13. Using with pua_usrloc
   1.14. Incorrect groups file

Chapter 1. Admin Guide

1. Overview

   This module offers matching operations using regular expressions based
   on the powerful PCRE library.

   A text file containing regular expressions categorized in groups is
   compiled when the module is loaded, and the resulting PCRE objects are
   stored in an array. A function to match a string or pseudo-variable
   against any of these groups is provided. The text file can be modified
   and reloaded at any time via an RPC command. The module also offers a
   function to perform a PCRE matching operation against a regular
   expression provided as a function parameter.

   For a detailed list of PCRE features read the man page of the library.

2. Dependencies

2.1. Kamailio Modules

   The following modules must be loaded before this module:
     * No dependencies on other Kamailio modules.

2.2. External Libraries or Applications

   The following libraries or applications must be installed before
   running Kamailio with this module loaded:
     * libpcre - the libraries of PCRE.

3. Parameters

3.1. file (string)

   Text file containing the regular expression groups. It must be set in
   order to enable the group matching function.

   Default value is “NULL”.

   Example 1.1. Set file parameter
...
modparam("regex", "file", "/etc/kamailio/regex_groups")
...

3.2. max_groups (int)

   Maximum number of regular expression groups in the text file.

   Default value is “20”.

   Example 1.2. Set max_groups parameter
...
modparam("regex", "max_groups", 40)
...

3.3. group_max_size (int)

   Maximum content size of a group in the text file.

   Default value is “8192”.

   Example 1.3. Set group_max_size parameter
...
modparam("regex", "group_max_size", 16384)
...

3.4. pcre_caseless (int)

   If this option is set, matching is case-insensitive. It is equivalent
   to Perl's /i option, and it can be changed within a pattern by a (?i)
   or (?-i) option setting.

   Default value is “0”.

   Example 1.4. Set pcre_caseless parameter
...
modparam("regex", "pcre_caseless", 1)
...

3.5. pcre_multiline (int)

   By default, PCRE treats the subject string as consisting of a single
   line of characters (even if it actually contains newlines). The "start
   of line" metacharacter (^) matches only at the start of the string,
   while the "end of line" metacharacter ($) matches only at the end of
   the string, or before a terminating newline.

   When this option is set, the "start of line" and "end of line"
   constructs match immediately following or immediately before internal
   newlines in the subject string, respectively, as well as at the very
   start and end. This is equivalent to Perl's /m option, and it can be
   changed within a pattern by a (?m) or (?-m) option setting. If there
   are no newlines in a subject string, or no occurrences of ^ or $ in a
   pattern, setting this option has no effect.

   Default value is “0”.

   Example 1.5. Set pcre_multiline parameter
...
modparam("regex", "pcre_multiline", 1)
...

3.6. pcre_dotall (int)

   If this option is set, a dot metacharacter in the pattern matches all
   characters, including those that indicate newline. Without it, a dot
   does not match when the current position is at a newline. This option
   is equivalent to Perl's /s option, and it can be changed within a
   pattern by a (?s) or (?-s) option setting.

   Default value is “0”.

   Example 1.6. Set pcre_dotall parameter
...
modparam("regex", "pcre_dotall", 1)
...

3.7. pcre_extended (int)

   If this option is set, whitespace data characters in the pattern are
   totally ignored except when escaped or inside a character class.
   Whitespace does not include the VT character (code 11). In addition,
   characters between an unescaped # outside a character class and the
   next newline, inclusive, are also ignored. This is equivalent to Perl's
   /x option, and it can be changed within a pattern by a (?x) or (?-x)
   option setting.

   Default value is “0”.

   Example 1.7. Set pcre_extended parameter
...
modparam("regex", "pcre_extended", 1)
...

4. Functions

4.1. pcre_match (string, pcre_regex)

   Matches the given string parameter against the regular expression
   pcre_regex, which is compiled at runtime into a PCRE object. Returns
   TRUE if it matches, FALSE otherwise.

   Meaning of the parameters is as follows:
     * string - String or pseudo-variable to compare.
     * pcre_regex - Regular expression to be compiled into a PCRE object.
       It can be a string or pseudo-variable.

   NOTE: To use the "end of line" symbol '$' in the pcre_regex parameter
   use '$$'.

   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.

   Example 1.8. pcre_match usage (forcing case insensitive)
...
if (pcre_match("$ua", "(?i)^twinkle")) {
    xlog("L_INFO", "User-Agent matches\n");
}
...

   Example 1.9. pcre_match usage (using "end of line" symbol)
...
if (pcre_match("$rU", "^user[1234]$$")) {  # Will be converted to "^user[1234]$"
    xlog("L_INFO", "RURI username matches\n");
}
...

4.2. pcre_match_group (string [, group])

   Tries to match the given string against a specific group in the text
   file (see Section 6.1, “File format”). Returns TRUE if it matches,
   FALSE otherwise.

   Meaning of the parameters is as follows:
     * string - String or pseudo-variable to compare.
     * group - Number of the group to use in the operation. If not
       specified then 0 (the first group) is used. A pseudo-variable
       containing an integer can also be used.

   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.

   Example 1.10. pcre_match_group usage
...
if (pcre_match_group("$rU", "2")) {
    xlog("L_INFO", "RURI username matches group 2\n");
}
...

   Example 1.11. pcre_match_group usage (using a pseudo-variable as
   group)
...
$avp(i:10) = 5;  # Perhaps obtained from a DB query.
if (pcre_match_group("$ua", "$avp(i:10)")) {
    xlog("L_INFO", "User-Agent matches group 5\n");
}
...

5. RPC Commands

5.1. regex.reload

   Causes the regex module to re-read the content of the text file and
   re-compile the regular expressions. The number of groups in the file
   can be modified safely.

   Name: regex.reload

   Parameters: none

   RPC Command Example:
...
kamcmd regex.reload
...

6. Installation and Running

6.1. File format

   The file contains regular expressions categorized in groups. Each group
   starts with a "[number]" line. Lines starting with a space, tab, CR, LF
   or # (comments) are ignored. Each regular expression must fit on a
   single line; it cannot be split across several lines.

   An example of the file format would be the following:

   Example 1.12. regex file
### List of User-Agents publishing presence status
[0]

# Softphones
^Twinkle/1
^X-Lite
^eyeBeam
^Bria
^SIP Communicator
^Linphone

# Deskphones
^Snom

# Others
^SIPp
^PJSUA


### Blacklisted source IP's
[1]

^190\.232\.250\.226$
^122\.5\.27\.125$
^86\.92\.112\.


### Free PSTN destinations in Spain
[2]

^1\d{3}$
^((\+|00)34)?900\d{6}$

   The module compiles the text above to the following regular
   expressions:
group 0: ((^Twinkle/1)|(^X-Lite)|(^eyeBeam)|(^Bria)|(^SIP Communicator)|
          (^Linphone)|(^Snom)|(^SIPp)|(^PJSUA))
group 1: ((^190\.232\.250\.226$)|(^122\.5\.27\.125$)|(^86\.92\.112\.))
group 2: ((^1\d{3}$)|(^((\+|00)34)?900\d{6}$))

   The first group can be used to avoid auto-generated PUBLISH (pua_usrloc
   module) for UAs that already support presence:

   Example 1.13. Using with pua_usrloc
route[REGISTER] {
    if (! pcre_match_group("$ua", "0")) {
        xlog("L_INFO", "Auto-generated PUBLISH for $fu ($ua)\n");
        pua_set_publish();
    }
    save("location");
    exit;
}

   NOTE: It's important to understand that the numbers in the group
   headers ([number]) must start at 0. If not, the real group number will
   not match the number appearing in the file. For example, the following
   text file:

   Example 1.14. Incorrect groups file
[1]
^aaa
^bbb

[2]
^ccc
^ddd

   will generate the following regular expressions:
group 0: ((^aaa)|(^bbb))
group 1: ((^ccc)|(^ddd))

   Note that the real index doesn't match the group number in the file.
   That is, compiled group 0 always points to the first group in the file,
   regardless of its number in the file. In fact, the group number
   appearing in the file is used only to delimit groups.

   NOTE: A line containing a regular expression cannot start with '[',
   since it would be treated as the start of a new group. The same applies
   to lines starting with a space, tab, or '#' (they would be ignored by
   the parser). As a workaround, enclose the expression in parentheses:
[0]
([0-9]{9})
( #abcde)
( qwerty)
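
   The grouping rules described in this section can be tried outside
   Kamailio. The following Python snippet is only an illustrative sketch,
   not the module's own code: compile_groups is a hypothetical helper,
   Python's re module stands in for PCRE, and skipping expressions that
   appear before the first group header is an assumption. It shows why the
   number written in a header does not determine the compiled group index.

```python
import re


def compile_groups(text):
    """Build one alternation pattern per group from a groups file (sketch)."""
    groups = []  # one list of expression lines per group, in file order
    for line in text.splitlines():
        # Blank lines and lines starting with space, tab or '#' are ignored.
        if not line or line[0] in " \t\r#":
            continue
        if line.startswith("["):
            # Any "[...]" header opens the next group; its number is unused.
            groups.append([])
            continue
        if not groups:
            # Assumption: expressions before the first header are skipped.
            continue
        groups[-1].append(line)
    # Wrap each expression in parentheses and join them with '|'.
    return ["(" + "|".join("(" + p + ")" for p in pats) + ")" for pats in groups]


# The "incorrect groups file": headers start at 1, indexes still start at 0.
sample = "[1]\n^aaa\n^bbb\n\n[2]\n^ccc\n^ddd\n"
compiled = compile_groups(sample)
for index, pattern in enumerate(compiled):
    print("group %d: %s" % (index, pattern))
# group 0: ((^aaa)|(^bbb))
# group 1: ((^ccc)|(^ddd))

# Matching against compiled group 0 uses the first group in the file.
print(bool(re.search(compiled[0], "aaa123")))  # True
```

   Because the index comes from the order of the headers, numbering the
   groups in the file from 0 upwards keeps the file and the group argument
   of pcre_match_group consistent.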