1.\" Copyright (c) 1991, 1993 2.\" The Regents of the University of California. All rights reserved. 3.\" 4.\" Redistribution and use in source and binary forms, with or without 5.\" modification, are permitted provided that the following conditions 6.\" are met: 7.\" 1. Redistributions of source code must retain the above copyright 8.\" notice, this list of conditions and the following disclaimer. 9.\" 2. Redistributions in binary form must reproduce the above copyright 10.\" notice, this list of conditions and the following disclaimer in the 11.\" documentation and/or other materials provided with the distribution. 12.\" 3. Neither the name of the University nor the names of its contributors 13.\" may be used to endorse or promote products derived from this software 14.\" without specific prior written permission. 15.\" 16.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 17.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 18.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 19.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 20.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 21.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 22.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 23.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 24.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 25.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 26.\" SUCH DAMAGE. 27.\" 28.\" from: @(#)regexp.3 8.1 (Berkeley) 6/4/93 29.\" $NetBSD: regexp.3,v 1.16 2008/09/08 22:14:11 apb Exp $ 30.\" 31.Dd June 4, 1993 32.Dt REGEXP 3 33.Os 34.Sh NAME 35.Nm regcomp , 36.Nm regexec , 37.Nm regsub , 38.Nm regerror 39.Nd obsolete "'regexp'" regular expression handlers 40.Sh LIBRARY 41.Lb libcompat 42.Sh SYNOPSIS 43.In regexp.h 44.Ft regexp * 45.Fn regcomp "const char *exp" 46.Ft int 47.Fn regexec "const regexp *prog" "const char *string" 48.Ft void 49.Fn regsub "const regexp *prog" "const char *source" "char *dest" 50.Ft void 51.Fn regerror "const char *msg" 52.Sh DESCRIPTION 53.Bf -symbolic 54This interface is made obsolete by 55.Xr regex 3 . 56It is available from the compatibility library, libcompat. 57.Ef 58.Pp 59The 60.Fn regcomp , 61.Fn regexec , 62.Fn regsub , 63and 64.Fn regerror 65functions implement 66.Xr egrep 1 Ns -style 67regular expressions and supporting facilities. 68.Pp 69The 70.Fn regcomp 71function 72compiles a regular expression into a structure of type 73.Em regexp , 74and returns a pointer to it. 75The space has been allocated using 76.Xr malloc 3 77and may be released by 78.Xr free 3 . 79.Pp 80The 81.Fn regexec 82function 83matches a 84.Dv NUL Ns -terminated 85.Fa string 86against the compiled regular expression 87in 88.Fa prog . 89It returns 1 for success and 0 for failure, and adjusts the contents of 90.Fa prog Ns 's 91.Em startp 92and 93.Em endp 94(see below) accordingly. 95.Pp 96The members of a 97.Em regexp 98structure include at least the following (not necessarily in order): 99.Bd -literal -offset indent 100char *startp[NSUBEXP]; 101char *endp[NSUBEXP]; 102.Ed 103.Pp 104where 105.Dv NSUBEXP 106is defined (as 10) in the header file. 107Once a successful 108.Fn regexec 109has been done using the 110.Fn regexp , 111each 112.Em startp Ns - Em endp 113pair describes one substring 114within the 115.Fa string , 116with the 117.Em startp 118pointing to the first character of the substring and 119the 120.Em endp 121pointing to the first character following the substring. 122The 0th substring is the substring of 123.Fa string 124that matched the whole 125regular expression. 126The others are those substrings that matched parenthesized expressions 127within the regular expression, with parenthesized expressions numbered 128in left-to-right order of their opening parentheses. 129.Pp 130The 131.Fn regsub 132function 133copies 134.Fa source 135to 136.Fa dest , 137making substitutions according to the 138most recent 139.Fn regexec 140performed using 141.Fa prog . 142Each instance of `\*[Am]' in 143.Fa source 144is replaced by the substring 145indicated by 146.Em startp Ns Bq 147and 148.Em endp Ns Bq . 149Each instance of 150.Sq \e Ns Em n , 151where 152.Em n 153is a digit, is replaced by 154the substring indicated by 155.Em startp Ns Bq Em n 156and 157.Em endp Ns Bq Em n . 158To get a literal `\*[Am]' or 159.Sq \e Ns Em n 160into 161.Fa dest , 162prefix it with `\e'; 163to get a literal `\e' preceding `\*[Am]' or 164.Sq \e Ns Em n , 165prefix it with 166another `\e'. 167.Pp 168The 169.Fn regerror 170function 171is called whenever an error is detected in 172.Fn regcomp , 173.Fn regexec , 174or 175.Fn regsub . 176The default 177.Fn regerror 178writes the string 179.Fa msg , 180with a suitable indicator of origin, 181on the standard 182error output 183and invokes 184.Xr exit 3 . 185The 186.Fn regerror 187function 188can be replaced by the user if other actions are desirable. 189.Sh REGULAR EXPRESSION SYNTAX 190A regular expression is zero or more 191.Em branches , 192separated by `|'. 193It matches anything that matches one of the branches. 194.Pp 195A branch is zero or more 196.Em pieces , 197concatenated. 198It matches a match for the first, followed by a match for the second, etc. 199.Pp 200A piece is an 201.Em atom 202possibly followed by `*', `+', or `?'. 203An atom followed by `*' matches a sequence of 0 or more matches of the atom. 204An atom followed by `+' matches a sequence of 1 or more matches of the atom. 205An atom followed by `?' matches a match of the atom, or the null string. 206.Pp 207An atom is a regular expression in parentheses (matching a match for the 208regular expression), a 209.Em range 210(see below), `.' 211(matching any single character), `^' (matching the null string at the 212beginning of the input string), `$' (matching the null string at the 213end of the input string), a `\e' followed by a single character (matching 214that character), or a single character with no other significance 215(matching that character). 216.Pp 217A 218.Em range 219is a sequence of characters enclosed in `[]'. 220It normally matches any single character from the sequence. 221If the sequence begins with `^', 222it matches any single character 223.Em not 224from the rest of the sequence. 225If two characters in the sequence are separated by `\-', this is shorthand 226for the full list of 227.Tn ASCII 228characters between them 229(e.g. `[0-9]' matches any decimal digit). 230To include a literal `]' in the sequence, make it the first character 231(following a possible `^'). 232To include a literal `\-', make it the first or last character. 233.Sh AMBIGUITY 234If a regular expression could match two different parts of the input string, 235it will match the one which begins earliest. 236If both begin in the same place but match different lengths, or match 237the same length in different ways, life gets messier, as follows. 238.Pp 239In general, the possibilities in a list of branches are considered in 240left-to-right order, the possibilities for `*', `+', and `?' are 241considered longest-first, nested constructs are considered from the 242outermost in, and concatenated constructs are considered leftmost-first. 243The match that will be chosen is the one that uses the earliest 244possibility in the first choice that has to be made. 245If there is more than one choice, the next will be made in the same manner 246(earliest possibility) subject to the decision on the first choice. 247And so forth. 248.Pp 249For example, 250.Sq Li (ab|a)b*c 251could match 252`abc' in one of two ways. 253The first choice is between `ab' and `a'; since `ab' is earlier, and does 254lead to a successful overall match, it is chosen. 255Since the `b' is already spoken for, 256the `b*' must match its last possibility\(emthe empty string\(emsince 257it must respect the earlier choice. 258.Pp 259In the particular case where no `|'s are present and there is only one 260`*', `+', or `?', the net effect is that the longest possible 261match will be chosen. 262So 263.Sq Li ab* , 264presented with `xabbbby', will match `abbbb'. 265Note that if 266.Sq Li ab* , 267is tried against `xabyabbbz', it 268will match `ab' just after `x', due to the begins-earliest rule. 269(In effect, the decision on where to start the match is the first choice 270to be made, hence subsequent choices must respect it even if this leads them 271to less-preferred alternatives.) 272.Sh RETURN VALUES 273The 274.Fn regcomp 275function 276returns 277.Dv NULL 278for a failure 279.Pf ( Fn regerror 280permitting), 281where failures are syntax errors, exceeding implementation limits, 282or applying `+' or `*' to a possibly-null operand. 283.Sh SEE ALSO 284.Xr ed 1 , 285.Xr egrep 1 , 286.Xr ex 1 , 287.Xr expr 1 , 288.Xr fgrep 1 , 289.Xr grep 1 , 290.Xr regex 3 291.Sh HISTORY 292Both code and manual page for 293.Fn regcomp , 294.Fn regexec , 295.Fn regsub , 296and 297.Fn regerror 298were written at the University of Toronto 299and appeared in 300.Bx 4.3 tahoe . 301They are intended to be compatible with the Bell V8 302.Xr regexp 3 , 303but are not derived from Bell code. 304.Sh BUGS 305Empty branches and empty regular expressions are not portable to V8. 306.Pp 307The restriction against 308applying `*' or `+' to a possibly-null operand is an artifact of the 309simplistic implementation. 310.Pp 311Does not support 312.Xr egrep 1 Ns 's 313newline-separated branches; 314neither does the V8 315.Xr regexp 3 , 316though. 317.Pp 318Due to emphasis on 319compactness and simplicity, 320it's not strikingly fast. 321It does give special attention to handling simple cases quickly. 322