.\" $OpenBSD: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.\" Copyright (c) 2015 Reyk Floeter <reyk@openbsd.org>
.\" Copyright (C) 1994-2015 Lua.org, PUC-Rio.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining
.\" a copy of this software and associated documentation files (the
.\" "Software"), to deal in the Software without restriction, including
.\" without limitation the rights to use, copy, modify, merge, publish,
.\" distribute, sublicense, and/or sell copies of the Software, and to
.\" permit persons to whom the Software is furnished to do so, subject to
.\" the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be
.\" included in all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
.\" EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
.\" IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
.\" CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
.\" TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
.\" SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
.\"
.\" Derived from section 6.4.1 in manual.html of Lua 5.3.1:
.\" $Id: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.Dd $Mdocdate: November 8 2023 $
.Dt PATTERNS 7
.Os
.Sh NAME
.Nm patterns
.Nd Lua's pattern matching rules
.Sh DESCRIPTION
Pattern matching in
.Xr httpd 8
is based on the implementation of the Lua scripting language and
provides a simple and fast alternative to the regular expressions (REs) that
are described in
.Xr re_format 7 .
Patterns are described by regular strings, which are interpreted as
patterns by the pattern-matching
.Dq find
and
.Dq match
functions.
This document describes the syntax and the meaning (that is, what they
match) of these strings.
.Sh CHARACTER CLASS
A character class is used to represent a set of characters.
The following combinations are allowed in describing a character
class:
.Bl -tag -width Ds
.It Ar x
(where
.Ar x
is not one of the magic characters
.Sq ^$()%.[]*+-? )
represents the character
.Ar x
itself.
.It .
(a dot) represents all characters.
.It %a
represents all letters.
.It %c
represents all control characters.
.It %d
represents all digits.
.It %g
represents all printable characters except space.
.It %l
represents all lowercase letters.
.It %p
represents all punctuation characters.
.It %s
represents all space characters.
.It %u
represents all uppercase letters.
.It %w
represents all alphanumeric characters.
.It %x
represents all hexadecimal digits.
.It Pf % Ar x
(where
.Ar x
is any non-alphanumeric character) represents the character
.Ar x .
This is the standard way to escape the magic characters.
Any non-alphanumeric character (including all punctuation characters,
even the non-magical) can be preceded by a
.Sq %
when used to represent itself in a pattern.
.It Bq Ar set
represents the class which is the union of all
characters in
.Ar set .
A range of characters can be specified by separating the end
characters of the range, in ascending order, with a
.Sq - .
All classes
.Sq Ar %x
described above can also be used as components in
.Ar set .
All other characters in
.Ar set
represent themselves.
For example,
.Sq [%w_]
(or
.Sq [_%w] )
represents all alphanumeric characters plus the underscore,
.Sq [0-7]
represents the octal digits,
and
.Sq [0-7%l%-]
represents the octal digits plus the lowercase letters plus the
.Sq -
character.
.Pp
The interaction between ranges and classes is not defined.
Therefore, patterns like
.Sq [%a-z]
or
.Sq [a-%%]
have no meaning.
.It Bq Ar ^set
represents the complement of
.Ar set ,
where
.Ar set
is interpreted as above.
.El
.Pp
For all classes represented by single letters (
.Sq %a ,
.Sq %c ,
etc.),
the corresponding uppercase letter represents the complement of the class.
For instance,
.Sq %S
represents all non-space characters.
.Pp
The definitions of letter, space, and other character groups depend on
the current locale.
In particular, the class
.Sq [a-z]
may not be equivalent to
.Sq %l .
.Sh PATTERN ITEM
A pattern item can be
.Bl -bullet
.It
a single character class, which matches any single character in the class;
.It
a single character class followed by
.Sq * ,
which matches zero or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq + ,
which matches one or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq - ,
which also matches zero or more repetitions of characters in the class.
Unlike
.Sq * ,
these repetition items will always match the shortest possible sequence;
.It
a single character class followed by
.Sq \&? ,
which matches zero or one occurrence of a character in the class.
It always matches one occurrence if possible;
.It
.Sq Pf % Ar n ,
for
.Ar n
between 1 and 9;
such an item matches a substring equal to the n-th captured string (see below);
.It
.Sq Pf %b Ar xy ,
where
.Ar x
and
.Ar y
are two distinct characters;
such an item matches strings that start with
.Ar x ,
end with
.Ar y ,
and where the
.Ar x
and
.Ar y
are
.Em balanced .
This means that if one reads the string from left to right, counting
.Em +1
for an
.Ar x
and
.Em -1
for a
.Ar y ,
the ending
.Ar y
is the first
.Ar y
where the count reaches 0.
For instance, the item
.Sq %b()
matches expressions with balanced parentheses.
.It
.Sq Pf %f Bq Ar set ,
a
.Em frontier pattern ;
such an item matches an empty string at any position such that the next
character belongs to
.Ar set
and the previous character does not belong to
.Ar set .
The set
.Ar set
is interpreted as previously described.
The beginning and the end of the subject are handled as if
they were the character
.Sq \e0 .
.El
.Sh PATTERN
A pattern is a sequence of pattern items.
A caret
.Sq ^
at the beginning of a pattern anchors the match at the beginning of
the subject string.
A
.Sq $
at the end of a pattern anchors the match at the end of the subject string.
At other positions,
.Sq ^
and
.Sq $
have no special meaning and represent themselves.
.Sh CAPTURES
A pattern can contain sub-patterns enclosed in parentheses; they
describe captures.
When a match succeeds, the substrings of the subject string that match
captures are stored (captured) for future use.
Captures are numbered according to their left parentheses.
For instance, in the pattern
.Qq (a*(.)%w(%s*)) ,
the part of the string matching
.Qq a*(.)%w(%s*)
is stored as the first capture (and therefore has number 1);
the character matching
.Qq \&.
is captured with number 2,
and the part matching
.Qq %s*
has number 3.
.Pp
As a special case, the empty capture
.Sq ()
captures the current string position (a number).
For instance, if we apply the pattern
.Qq ()aa()
on the string
.Qq flaaap ,
there will be two captures: 2 and 4.
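.Pp
As an illustration only, the rule that captures are numbered by their
left parentheses can be sketched in Python.
The sketch is not part of
.Xr httpd 8
or of Lua: the function name
.Fn number_captures
is hypothetical, and the code assumes a well-formed pattern with no
parentheses inside a bracketed set.
It lists the body of each capture in the order of its left parenthesis.
.Bd -literal -offset indent
```python
def number_captures(pattern):
    """Return the text inside each capture, ordered by capture number.

    Hypothetical helper for illustration; assumes balanced parentheses
    and no parentheses inside a [set].
    """
    bodies = []  # bodies[n - 1] holds the text inside capture number n
    stack = []   # (capture number, offset after '(') of open captures
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            # '%x' consumes one following character; '%bxy' consumes
            # the 'b' and both delimiter characters.
            i += 4 if pattern[i + 1:i + 2] == "b" else 2
            continue
        if ch == "(":
            bodies.append(None)  # the number is fixed at the '('
            stack.append((len(bodies), i + 1))
        elif ch == ")":
            number, start = stack.pop()
            bodies[number - 1] = pattern[start:i]
        i += 1
    return bodies


print(number_captures("(a*(.)%w(%s*))"))
# prints ['a*(.)%w(%s*)', '.', '%s*']
```
.Ed
.Pp
The outer capture receives number 1 even though it closes last, because
numbering follows the opening parenthesis.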
.Sh SEE ALSO
.Xr fnmatch 3 ,
.Xr re_format 7 ,
.Xr httpd 8
.Rs
.%A Roberto Ierusalimschy
.%A Luiz Henrique de Figueiredo
.%A Waldemar Celes
.%Q Lua.org
.%Q PUC-Rio
.%D June 2015
.%R Lua 5.3 Reference Manual
.%T Patterns
.%U https://www.lua.org/manual/5.3/manual.html#6.4.1
.Re
.Sh HISTORY
The first implementation of the pattern rules was introduced with Lua 2.5.
Almost twenty years later,
an implementation based on Lua 5.3.1 appeared in
.Ox 5.8 .
.Sh AUTHORS
The pattern matching is derived from the original implementation of
the Lua scripting language written by
.An -nosplit
.An Roberto Ierusalimschy ,
.An Waldemar Celes ,
and
.An Luiz Henrique de Figueiredo
at PUC-Rio.
It was turned into a native C API for
.Xr httpd 8
by
.An Reyk Floeter Aq Mt reyk@openbsd.org .
.Sh CAVEATS
A notable difference from the Lua implementation is the position in the string
returned by captures.
It follows the C-style indexing (position starting from 0)
instead of Lua-style indexing (position starting from 1).
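.Pp
The difference in indexing can be sketched in Python for the
.Qq ()aa()
example from
.Sx CAPTURES .
This is an illustration under stated assumptions, not the
.Xr httpd 8
implementation: the function name
.Fn position_captures
is hypothetical, and the text between the two empty captures is assumed
to be a literal string with no magic characters.
.Bd -literal -offset indent
```python
def position_captures(subject, literal, base=0):
    """Positions that '()literal()' would capture at the first match.

    Hypothetical helper for illustration: base=0 gives C-style
    positions, base=1 gives Lua-style positions.
    """
    start = subject.find(literal)  # 0-based offset, or -1 if absent
    if start < 0:
        return None
    return (start + base, start + len(literal) + base)


print(position_captures("flaaap", "aa"))          # prints (2, 4)
print(position_captures("flaaap", "aa", base=1))  # prints (3, 5)
```
.Ed
.Pp
With a base of 0 the result agrees with the positions 2 and 4 given in
.Sx CAPTURES ;
a base of 1 yields 3 and 5, the positions Lua itself reports for the
same pattern and string.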