.\"	$OpenBSD: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.\" Copyright (c) 2015 Reyk Floeter <reyk@openbsd.org>
.\" Copyright (C) 1994-2015 Lua.org, PUC-Rio.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining
.\" a copy of this software and associated documentation files (the
.\" "Software"), to deal in the Software without restriction, including
.\" without limitation the rights to use, copy, modify, merge, publish,
.\" distribute, sublicense, and/or sell copies of the Software, and to
.\" permit persons to whom the Software is furnished to do so, subject to
.\" the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be
.\" included in all copies or substantial portions of the Software.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
.\" EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
.\" IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
.\" CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
.\" TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
.\" SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
.\"
.\" Derived from section 6.4.1 in manual.html of Lua 5.3.1:
.\" $Id: patterns.7,v 1.8 2023/11/08 11:17:20 deraadt Exp $
.\"
.Dd $Mdocdate: November 8 2023 $
.Dt PATTERNS 7
.Os
.Sh NAME
.Nm patterns
.Nd Lua's pattern matching rules
.Sh DESCRIPTION
Pattern matching in
.Xr httpd 8
is based on the implementation of the Lua scripting language and
provides a simple and fast alternative to the regular expressions (REs) that
are described in
.Xr re_format 7 .
Patterns are described by regular strings, which are interpreted as
patterns by the pattern-matching
.Dq find
and
.Dq match
functions.
This document describes the syntax and the meaning (that is, what they
match) of these strings.
.Sh CHARACTER CLASS
A character class is used to represent a set of characters.
The following combinations are allowed in describing a character
class:
.Bl -tag -width Ds
.It Ar x
(where
.Ar x
is not one of the magic characters
.Sq ^$()%.[]*+-? )
represents the character
.Ar x
itself.
.It .
(a dot) represents all characters.
.It %a
represents all letters.
.It %c
represents all control characters.
.It %d
represents all digits.
.It %g
represents all printable characters except space.
.It %l
represents all lowercase letters.
.It %p
represents all punctuation characters.
.It %s
represents all space characters.
.It %u
represents all uppercase letters.
.It %w
represents all alphanumeric characters.
.It %x
represents all hexadecimal digits.
.It Pf % Ar x
(where
.Ar x
is any non-alphanumeric character) represents the character
.Ar x .
This is the standard way to escape the magic characters.
Any non-alphanumeric character (including all punctuation characters,
even the non-magical) can be preceded by a
.Sq %
when used to represent itself in a pattern.
.It Bq Ar set
represents the class which is the union of all
characters in
.Ar set .
A range of characters can be specified by separating the end
characters of the range, in ascending order, with a
.Sq - .
All classes
.Sq Ar %x
described above can also be used as components in
.Ar set .
All other characters in
.Ar set
represent themselves.
For example,
.Sq [%w_]
(or
.Sq [_%w] )
represents all alphanumeric characters plus the underscore,
.Sq [0-7]
represents the octal digits,
and
.Sq [0-7%l%-]
represents the octal digits plus the lowercase letters plus the
.Sq -
character.
.Pp
The interaction between ranges and classes is not defined.
Therefore, patterns like
.Sq [%a-z]
or
.Sq [a-%%]
have no meaning.
.It Bq Ar ^set
represents the complement of
.Ar set ,
where
.Ar set
is interpreted as above.
.El
.Pp
For all classes represented by single letters (
.Sq %a ,
.Sq %c ,
etc.),
the corresponding uppercase letter represents the complement of the class.
For instance,
.Sq %S
represents all non-space characters.
.Pp
The definitions of letter, space, and other character groups depend on
the current locale.
In particular, the class
.Sq [a-z]
may not be equivalent to
.Sq %l .
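.Pp
As an illustrative sketch, assuming the
.Ic location match
and
.Ic block return
directives described in
.Xr httpd.conf 5 ,
the fragment below combines a class with an escaped magic character.
The server name and paths are hypothetical:
.Bd -literal -offset indent
server "www.example.com" {
	listen on * port 80
	# %d stands for one digit; %. is a literal dot, whereas an
	# unescaped dot would stand for any character
	location match "/v%d/index%.html" {
		block return 403
	}
}
.Ed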
.Sh PATTERN ITEM
A pattern item can be
.Bl -bullet
.It
a single character class, which matches any single character in the class;
.It
a single character class followed by
.Sq * ,
which matches zero or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq + ,
which matches one or more repetitions of characters in the class.
These repetition items will always match the longest possible sequence;
.It
a single character class followed by
.Sq - ,
which also matches zero or more repetitions of characters in the class.
Unlike
.Sq * ,
these repetition items will always match the shortest possible sequence;
.It
a single character class followed by
.Sq \&? ,
which matches zero or one occurrence of a character in the class.
It always matches one occurrence if possible;
.It
.Sq Pf % Ar n ,
for
.Ar n
between 1 and 9;
such an item matches a substring equal to the n-th captured string (see below);
.It
.Sq Pf %b Ar xy ,
where
.Ar x
and
.Ar y
are two distinct characters;
such an item matches strings that start with
.Ar x ,
end with
.Ar y ,
and where the
.Ar x
and
.Ar y
are
.Em balanced .
This means that if one reads the string from left to right, counting
.Em +1
for an
.Ar x
and
.Em -1
for a
.Ar y ,
the ending
.Ar y
is the first
.Ar y
where the count reaches 0.
For instance, the item
.Sq %b()
matches expressions with balanced parentheses.
.It
.Sq Pf %f Bq Ar set ,
a
.Em frontier pattern ;
such an item matches an empty string at any position such that the next
character belongs to
.Ar set
and the previous character does not belong to
.Ar set .
The set
.Ar set
is interpreted as previously described.
The beginning and the end of the subject are handled as if
they were the character
.Sq \e0 .
.El
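.Pp
The items above can be illustrated with a few worked cases.
The subject strings are hypothetical and the results follow from the
rules in this section; in each case the first match found when
scanning the subject from the left is shown:
.Bd -literal -offset indent
pattern    subject         matched substring
/.*/       /a/b/c/         /a/b/c/
/.-/       /a/b/c/         /a/
%d+        id=2023x        2023
x%d?       xy              x
%b()       f(a(b)c) (d)    (a(b)c)
.Ed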
.Sh PATTERN
A pattern is a sequence of pattern items.
A caret
.Sq ^
at the beginning of a pattern anchors the match at the beginning of
the subject string.
A
.Sq $
at the end of a pattern anchors the match at the end of the subject string.
At other positions,
.Sq ^
and
.Sq $
have no special meaning and represent themselves.
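.Pp
A few worked cases, with hypothetical subject strings, sketch how the
anchors behave under the rules above:
.Bd -literal -offset indent
pattern    subject         result
^/img/     /img/a.png      matches /img/
^/img/     /x/img/a.png    no match
%.png$     /img/a.png      matches .png
%.png$     /a.png.txt      no match
a^b$c      a^b$c           matches a^b$c
.Ed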
.Sh CAPTURES
A pattern can contain sub-patterns enclosed in parentheses; they
describe captures.
When a match succeeds, the substrings of the subject string that match
captures are stored (captured) for future use.
Captures are numbered according to their left parentheses.
For instance, in the pattern
.Qq (a*(.)%w(%s*)) ,
the part of the string matching
.Qq a*(.)%w(%s*)
is stored as the first capture (and therefore has number 1);
the character matching
.Qq \&.
is captured with number 2,
and the part matching
.Qq %s*
has number 3.
.Pp
As a special case, the empty capture
.Sq ()
captures the current string position (a number).
For instance, if we apply the pattern
.Qq ()aa()
on the string
.Qq flaaap ,
there will be two captures: 2 and 4.
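.Pp
As a minimal sketch of how a capture might be used, assuming the
.Ic location match
directive and the
.Sq %n
macro of
.Ic block return
as documented in
.Xr httpd.conf 5 ,
the fragment below redirects a numeric page path.
The server name and target URI are hypothetical:
.Bd -literal -offset indent
server "www.example.com" {
	listen on * port 80
	# (%d+) is capture number 1: one or more digits
	location match "^/page/(%d+)$" {
		# %1 expands to the text matched by capture 1
		block return 301 "/page.php?id=%1"
	}
}
.Ed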
.Sh SEE ALSO
.Xr fnmatch 3 ,
.Xr re_format 7 ,
.Xr httpd 8
.Rs
.%A Roberto Ierusalimschy
.%A Luiz Henrique de Figueiredo
.%A Waldemar Celes
.%Q Lua.org
.%Q PUC-Rio
.%D June 2015
.%R Lua 5.3 Reference Manual
.%T Patterns
.%U https://www.lua.org/manual/5.3/manual.html#6.4.1
.Re
.Sh HISTORY
The first implementation of the pattern rules was introduced with Lua 2.5.
Almost twenty years later,
an implementation based on Lua 5.3.1 appeared in
.Ox 5.8 .
.Sh AUTHORS
The pattern matching is derived from the original implementation of
the Lua scripting language written by
.An -nosplit
.An Roberto Ierusalimschy ,
.An Waldemar Celes ,
and
.An Luiz Henrique de Figueiredo
at PUC-Rio.
It was turned into a native C API for
.Xr httpd 8
by
.An Reyk Floeter Aq Mt reyk@openbsd.org .
.Sh CAVEATS
A notable difference from the Lua implementation is the position in the string
returned by captures.
Positions follow C-style indexing (starting from 0)
instead of Lua-style indexing (starting from 1).
310