1Regex Module
2
3Iñaki Baz Castillo
4
5   <ibc@aliax.net>
6
7Edited by
8
9Iñaki Baz Castillo
10
11   <ibc@aliax.net>
12
13   Copyright © 2009 Iñaki Baz Castillo
14     __________________________________________________________________
15
16   Table of Contents
17
18   1. Admin Guide
19
20        1. Overview
21        2. Dependencies
22
23              2.1. Kamailio Modules
24              2.2. External Libraries or Applications
25
26        3. Parameters
27
28              3.1. file (string)
29              3.2. max_groups (int)
30              3.3. group_max_size (int)
31              3.4. pcre_caseless (int)
32              3.5. pcre_multiline (int)
33              3.6. pcre_dotall (int)
34              3.7. pcre_extended (int)
35
36        4. Functions
37
38              4.1. pcre_match (string, pcre_regex)
39              4.2. pcre_match_group (string [, group])
40
41        5. RPC Commands
42
43              5.1. regex.reload
44
45        6. Installation and Running
46
47              6.1. File format
48
49   List of Examples
50
51   1.1. Set file parameter
52   1.2. Set max_groups parameter
53   1.3. Set group_max_size parameter
54   1.4. Set pcre_caseless parameter
55   1.5. Set pcre_multiline parameter
56   1.6. Set pcre_dotall parameter
57   1.7. Set pcre_extended parameter
58   1.8. pcre_match usage (forcing case insensitive)
59   1.9. pcre_match usage (using "end of line" symbol)
60   1.10. pcre_match_group usage
61   1.11. pcre_match_group usage (using a pseudo-variable as group)
62   1.12. regex file
63   1.13. Using with pua_usrloc
64   1.14. Incorrect groups file
65
66Chapter 1. Admin Guide
67
991. Overview
100
101   This module offers matching operations using regular expressions based
102   on the powerful PCRE library.
103
   A text file containing regular expressions categorized in groups is
   compiled when the module is loaded, and the resulting PCRE objects are
   stored in an array. A function to match a string or pseudo-variable
   against any of these groups is provided. The text file can be modified
   and reloaded at any time via an RPC command. The module also offers a
   function to perform a PCRE matching operation against a regular
   expression provided as a function parameter.
111
112   For a detailed list of PCRE features read the man page of the library.
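
   As a rough orientation, a minimal setup could look like the sketch
   below. The file path and the log texts are placeholders chosen for
   illustration, group 0 of the file is assumed to hold User-Agent
   patterns, and the xlog module is assumed to be loaded:
loadmodule "regex.so"
modparam("regex", "file", "/etc/kamailio/regex_groups")

request_route {
    # Match against group 0, compiled from the text file at startup.
    if (pcre_match_group("$ua", "0")) {
        xlog("L_INFO", "User-Agent found in group 0\n");
    }
    # Match against a regular expression given directly in the script.
    if (pcre_match("$ua", "(?i)^twinkle")) {
        xlog("L_INFO", "User-Agent looks like Twinkle\n");
    }
}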
113
1142. Dependencies
115
1192.1. Kamailio Modules
120
121   The following modules must be loaded before this module:
122     * No dependencies on other Kamailio modules.
123
1242.2. External Libraries or Applications
125
126   The following libraries or applications must be installed before
127   running Kamailio with this module loaded:
     * libpcre - the PCRE library.
129
1303. Parameters
131
1403.1. file (string)
141
142   Text file containing the regular expression groups. It must be set in
143   order to enable the group matching function.
144
145   Default value is “NULL”.
146
147   Example 1.1. Set file parameter
148...
149modparam("regex", "file", "/etc/kamailio/regex_groups")
150...
151
1523.2. max_groups (int)
153
   Maximum number of regular expression groups in the text file.
155
156   Default value is “20”.
157
158   Example 1.2. Set max_groups parameter
159...
160modparam("regex", "max_groups", 40)
161...
162
1633.3. group_max_size (int)
164
   Maximum content size of a group in the text file.
166
167   Default value is “8192”.
168
169   Example 1.3. Set group_max_size parameter
170...
171modparam("regex", "group_max_size", 16384)
172...
173
1743.4. pcre_caseless (int)
175
   If this option is set, matching is case-insensitive. It is equivalent
   to Perl's /i option, and it can be changed within a pattern by a (?i)
   or (?-i) option setting.
179
180   Default value is “0”.
181
182   Example 1.4. Set pcre_caseless parameter
183...
184modparam("regex", "pcre_caseless", 1)
185...
186
1873.5. pcre_multiline (int)
188
189   By default, PCRE treats the subject string as consisting of a single
190   line of characters (even if it actually contains newlines). The "start
191   of line" metacharacter (^) matches only at the start of the string,
192   while the "end of line" metacharacter ($) matches only at the end of
193   the string, or before a terminating newline.
194
195   When this option is set, the "start of line" and "end of line"
196   constructs match immediately following or immediately before internal
197   newlines in the subject string, respectively, as well as at the very
198   start and end. This is equivalent to Perl's /m option, and it can be
199   changed within a pattern by a (?m) or (?-m) option setting. If there
200   are no newlines in a subject string, or no occurrences of ^ or $ in a
201   pattern, setting this option has no effect.
202
203   Default value is “0”.
204
205   Example 1.5. Set pcre_multiline parameter
206...
207modparam("regex", "pcre_multiline", 1)
208...
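
   The same behaviour can also be requested for a single pattern with the
   inline (?m) setting, without touching the global parameter. The sketch
   below assumes a request carrying an SDP body and a PCRE build whose
   newline convention recognizes LF; the pattern and the log text are
   illustrative only:
...
# With (?m), ^ also matches right after each internal newline of the body.
if (pcre_match("$rb", "(?m)^a=sendonly")) {
    xlog("L_INFO", "Body contains a line starting with a=sendonly\n");
}
...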
209
2103.6. pcre_dotall (int)
211
212   If this option is set, a dot metacharacter in the pattern matches all
213   characters, including those that indicate newline. Without it, a dot
214   does not match when the current position is at a newline. This option
215   is equivalent to Perl's /s option, and it can be changed within a
216   pattern by a (?s) or (?-s) option setting.
217
218   Default value is “0”.
219
220   Example 1.6. Set pcre_dotall parameter
221...
222modparam("regex", "pcre_dotall", 1)
223...
224
2253.7. pcre_extended (int)
226
227   If this option is set, whitespace data characters in the pattern are
228   totally ignored except when escaped or inside a character class.
229   Whitespace does not include the VT character (code 11). In addition,
230   characters between an unescaped # outside a character class and the
231   next newline, inclusive, are also ignored. This is equivalent to Perl's
232   /x option, and it can be changed within a pattern by a (?x) or (?-x)
233   option setting.
234
235   Default value is “0”.
236
237   Example 1.7. Set pcre_extended parameter
238...
239modparam("regex", "pcre_extended", 1)
240...
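
   Extended mode is mostly useful for keeping long patterns readable. The
   sketch below enables it for one pattern through the inline (?x)
   setting, so it does not depend on the global parameter; the pattern
   and the log text are illustrative only:
...
# The unescaped spaces are ignored: same as "^900[0-9]{6}$$".
if (pcre_match("$rU", "(?x) ^ 900 [0-9]{6} $$")) {
    xlog("L_INFO", "RURI username is a 900 number\n");
}
...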
241
2424. Functions
243
2474.1.  pcre_match (string, pcre_regex)
248
   Matches the given string parameter against the regular expression
   pcre_regex, which is compiled at runtime into a PCRE object. Returns
   TRUE if it matches, FALSE otherwise.
252
253   Meaning of the parameters is as follows:
254     * string - String or pseudo-variable to compare.
     * pcre_regex - Regular expression to be compiled into a PCRE object.
       It can be a string or pseudo-variable.
257
   NOTE: To use the "end of line" symbol '$' in the pcre_regex parameter,
   write it as '$$'. A single '$' would be read as the start of a
   pseudo-variable.
260
261   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
262   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.
263
264   Example 1.8.  pcre_match usage (forcing case insensitive)
265...
266if (pcre_match("$ua", "(?i)^twinkle")) {
267    xlog("L_INFO", "User-Agent matches\n");
268}
269...
270
271   Example 1.9.  pcre_match usage (using "end of line" symbol)
272...
273if (pcre_match("$rU", "^user[1234]$$")) {  # Will be converted to "^user[1234]$"
274    xlog("L_INFO", "RURI username matches\n");
275}
276...
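
   Since pcre_regex may also be a pseudo-variable, the expression can be
   chosen at runtime. A small sketch, in which the variable name and the
   pattern are assumptions made for the example:
...
$var(re) = "(?i)^(twinkle|linphone)";  # Could come from a DB query.
if (pcre_match("$ua", "$var(re)")) {
    xlog("L_INFO", "User-Agent matches $var(re)\n");
}
...

   Keep in mind that the expression is compiled on every call. For a
   fixed list of patterns, pcre_match_group works on expressions that
   were already compiled when the text file was loaded.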
277
2784.2.  pcre_match_group (string [, group])
279
280   Tries to match the given string against a specific group in the text
281   file (see Section 6.1, “File format”). Returns TRUE if it matches,
282   FALSE otherwise.
283
284   Meaning of the parameters is as follows:
285     * string - String or pseudo-variable to compare.
     * group - Number of the group to use in the operation. If not
       specified, 0 (the first group) is used. A pseudo-variable
       containing an integer can also be used.
289
290   This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
291   ONREPLY_ROUTE, BRANCH_ROUTE and LOCAL_ROUTE.
292
293   Example 1.10.  pcre_match_group usage
294...
295if (pcre_match_group("$rU", "2")) {
296    xlog("L_INFO", "RURI username matches group 2\n");
297}
298...
299
300   Example 1.11.  pcre_match_group usage (using a pseudo-variable as
301   group)
302...
303$avp(i:10) = 5;  # Maybe got from a DB query.
304if (pcre_match_group("$ua", "$avp(i:10)")) {
305    xlog("L_INFO", "User-Agent matches group 5\n");
306}
307...
308
3095. RPC Commands
310
3135.1.  regex.reload
314
   Causes the regex module to re-read the content of the text file and
   recompile the regular expressions. The number of groups in the file
   can be modified safely.
318
319   Name: regex.reload
320
321   Parameters: none
322
323   RPC Command Example:
324...
325kamcmd regex.reload
326...
327
3286. Installation and Running
329
3326.1. File format
333
   The file contains regular expressions categorized in groups. Each group
   starts with a "[number]" line. Lines starting with space, tab, CR, LF
   or # (comments) are ignored. Each regular expression must take up
   exactly one line; a regular expression cannot be split across several
   lines.
339
340   An example of the file format would be the following:
341
342   Example 1.12. regex file
343### List of User-Agents publishing presence status
344[0]
345
346# Softphones
347^Twinkle/1
348^X-Lite
349^eyeBeam
350^Bria
351^SIP Communicator
352^Linphone
353
354# Deskphones
355^Snom
356
357# Others
358^SIPp
359^PJSUA
360
361
362### Blacklisted source IP's
363[1]
364
365^190\.232\.250\.226$
366^122\.5\.27\.125$
367^86\.92\.112\.
368
369
370### Free PSTN destinations in Spain
371[2]
372
373^1\d{3}$
374^((\+|00)34)?900\d{6}$
375
376   The module compiles the text above to the following regular
377   expressions:
378group 0: ((^Twinkle/1)|(^X-Lite)|(^eyeBeam)|(^Bria)|(^SIP Communicator)|
379          (^Linphone)|(^Snom)|(^SIPp)|(^PJSUA))
380group 1: ((^190\.232\.250\.226$)|(^122\.5\.27\.125$)|(^86\.92\.112\.))
381group 2: ((^1\d{3}$)|(^((\+|00)34)?900\d{6}$))
382
   The first group can be used to avoid auto-generated PUBLISH requests
   (pua_usrloc module) for UAs that already support presence:
385
386   Example 1.13. Using with pua_usrloc
387route[REGISTER] {
388    if (! pcre_match_group("$ua", "0")) {
389        xlog("L_INFO", "Auto-generated PUBLISH for $fu ($ua)\n");
390        pua_set_publish();
391    }
392    save("location");
393    exit;
394}
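
   The other two groups of the example file can be used in the same way.
   The following sketch assumes that dropping the request is the desired
   reaction to a blacklisted source address; that policy and the log
   texts are assumptions made for the example:
request_route {
    # Group 1: blacklisted source IPs.
    if (pcre_match_group("$si", "1")) {
        xlog("L_WARN", "Dropping request from blacklisted IP $si\n");
        drop;
    }
    # Group 2: free PSTN destinations in Spain.
    if (pcre_match_group("$rU", "2")) {
        xlog("L_INFO", "Call to free PSTN destination $rU\n");
    }
}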
395
   NOTE: It's important to understand that the numbers in the group
   headers ([number]) must start at 0. Otherwise, the real group number
   will not match the number appearing in the file. For example, the
   following text file:
400
401   Example 1.14. Incorrect groups file
402[1]
403^aaa
404^bbb
405
406[2]
407^ccc
408^ddd
409
410   will generate the following regular expressions:
411group 0: ((^aaa)|(^bbb))
412group 1: ((^ccc)|(^ddd))
413
   Note that the real index doesn't match the group number in the file.
   That is, compiled group 0 always points to the first group in the file,
   regardless of its number in the file. In fact, the group number
   appearing in the file is used only to delimit the groups.
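
   With the file of Example 1.14, a script therefore has to pass the
   compiled index, not the number written in the file. A short sketch
   (the log texts are illustrative):
...
# Matches ^aaa or ^bbb, although the file labels this group as [1].
if (pcre_match_group("$rU", "0")) {
    xlog("L_INFO", "RURI username matches the first group in the file\n");
}
# Matches ^ccc or ^ddd, labelled [2] in the file.
if (pcre_match_group("$rU", "1")) {
    xlog("L_INFO", "RURI username matches the second group in the file\n");
}
...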
419
   NOTE: A line containing a regular expression cannot start with '[',
   since it would be treated as a new group. The same applies to lines
   starting with space, tab, or '#' (they would be ignored by the parser).
   As a workaround, enclose the expression in parentheses:
424[0]
425([0-9]{9})
426( #abcde)
427( qwerty)
428