
README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and no longer maintained. The latest release of PCRE2 is available
in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:

https://github.com/PhilipHazel/pcre2/releases

There is a mailing list for discussion about the development of PCRE2 at
pcre2-dev@googlegroups.com. You can subscribe by sending an email to
pcre2-dev+subscribe@googlegroups.com.

You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.
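As an illustration of how the three libraries differ, here is a small sketch that computes how many code units one Unicode code point needs in UTF-8, UTF-16 and UTF-32, the encodings handled by the 8-bit, 16-bit and 32-bit libraries respectively when UTF mode is in use. The function utf_code_units is hypothetical and is not part of PCRE2; it assumes the input is a valid code point (not a surrogate).

```c
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): the number of code units needed
   to encode one valid Unicode code point in UTF-8 (width 8), UTF-16 (width 16)
   or UTF-32 (width 32). Returns -1 for an unsupported width or a value above
   U+10FFFF. Surrogate code points are assumed not to be passed in. */
int utf_code_units(uint32_t cp, int width)
{
  if (cp > 0x10FFFF) return -1;

  switch (width)
    {
    case 8:
    if (cp < 0x80) return 1;        /* ASCII */
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;

    case 16:
    return (cp < 0x10000) ? 1 : 2;  /* a surrogate pair above the BMP */

    case 32:
    return 1;                       /* always a single code unit */

    default:
    return -1;
    }
}
```

For example, U+00E9 needs two code units in UTF-8 but one in UTF-16 and UTF-32, while U+1F600 needs four, two and one respectively. This is also why the \C escape described below can split a character in the 8-bit and 16-bit libraries.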

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.
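The POSIX calling interface referred to above consists of regcomp(), regexec() and regfree() operating on a regex_t. The sketch below is written against the system <regex.h> so that it compiles anywhere; the assumption is that, once pcre2posix.h has been renamed or linked as regex.h as described (or the program is changed to include pcre2posix.h and linked with libpcre2-posix), the same calls would be served by PCRE2, with the pattern then following Perl syntax. The helper name posix_find is hypothetical. REG_EXTENDED is passed so that the system matcher treats "+" as a quantifier.

```c
#include <regex.h>

/* Hypothetical helper: find the first match of pattern in subject using the
   POSIX calling interface. Returns 1 and sets *start and *end (byte offsets of
   the whole match) on success, 0 if there is no match, or -1 if the pattern
   does not compile. */
int posix_find(const char *pattern, const char *subject,
               regoff_t *start, regoff_t *end)
{
  regex_t re;
  regmatch_t m[1];
  int rc;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;

  rc = regexec(&re, subject, 1, m, 0);   /* m[0] describes the whole match */
  if (rc == 0)
    {
    *start = m[0].rm_so;
    *end = m[0].rm_eo;
    }

  regfree(&re);                          /* release the compiled pattern */
  return (rc == 0) ? 1 : 0;              /* other results count as no match */
}
```

For example, posix_find("ab+c", "xxabbbc", &s, &e) returns 1 with s set to 2 and e set to 7.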


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc.). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile-time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT in an SELinux environment, you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support the fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you obey "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This facility can be disabled by adding
  --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the option to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio versions earlier than 2013
  when __STDC_VERSION__ is defined and has a value greater than or equal to
  199901L (indicating C99). However, there is at least one environment that
  claims to be C99 but does not support these modifiers. If
  --disable-percent-zt is specified, no use is made of the z or t modifiers.
  Instead of %td or %zu, %lu is used, with a cast for size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random options bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified by
  arguments: if an argument starts with "=" the rest of it is a literal input
  string. Otherwise, it is assumed to be a file name, and the contents of the
  file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete since pcre2_match() was refactored always to use the heap (in
  a much more efficient way than before). This option is retained for backwards
  compatibility, but has no effect other than to output a warning.
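The link-size rules in the list above amount to a small calculation. The sketch below uses hypothetical helpers (not PCRE2 code) to map a requested --with-link-size value to the number of bytes actually used for internal offsets, following the text: the 8-bit library uses the value as given, the 16-bit library treats 3 as 4, and the 32-bit library always uses 4. It also computes the largest value an n-byte offset can hold, 2^(8n) - 1; for two bytes that is 65535, consistent with the "around 64 kibibytes" limit quoted for the default 8-bit build.

```c
#include <stdint.h>

/* Hypothetical helper: bytes used for internal offsets for a code unit width
   of 8, 16 or 32 and a requested --with-link-size of 2, 3 or 4.
   Returns -1 for invalid arguments. */
int effective_link_bytes(int width, int requested)
{
  if (requested < 2 || requested > 4) return -1;

  switch (width)
    {
    case 8:  return requested;                  /* used as given */
    case 16: return (requested == 2) ? 2 : 4;   /* 3 behaves like 4 */
    case 32: return 4;                          /* setting is ignored */
    default: return -1;
    }
}

/* Largest value an offset of nbytes bytes can hold: 2^(8 * nbytes) - 1.
   Returns 0 for sizes outside 1..4. */
uint64_t max_link_value(int nbytes)
{
  if (nbytes < 1 || nbytes > 4) return 0;
  return ((uint64_t)1 << (8 * nbytes)) - 1;
}
```

With --with-link-size=3 in the 8-bit library the largest offset becomes 16777215, which is why that option raises the compiled pattern size limit.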
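The two pcre2grep buffer settings described above interact in a simple way. The following sketch of the arithmetic uses hypothetical function names (these are not functions from pcre2grep itself): the memory used is three times the starting buffer size, and the default maximum is the larger of 1048576 and the starting size.

```c
#include <stddef.h>

/* Hypothetical helper: memory used by pcre2grep for a given starting buffer
   size, which the text above gives as three times that size (to allow for
   "before" and "after" lines). */
size_t pcre2grep_initial_memory(size_t bufsize)
{
  return 3 * bufsize;
}

/* Hypothetical helper: the default maximum buffer size when
   --with-pcre2grep-max-bufsize is not given, namely 1048576 or the starting
   size, whichever is the larger. */
size_t pcre2grep_default_max_bufsize(size_t bufsize)
{
  return (bufsize > 1048576) ? bufsize : 1048576;
}
```

With the default starting size of 20480 bytes, this gives 61440 bytes of buffer memory and a default maximum of 1048576 bytes.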
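The pcre2fuzzcheck argument convention described above (a leading "=" marks a literal string; anything else is a file name) can be sketched as follows. The function load_fuzz_input is hypothetical and is not taken from the PCRE2 sources. The buffer it produces is the kind of data that would be handed to LLVMFuzzerTestOneInput(), whose conventional libFuzzer signature is int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn one command-line argument into a test string.
   "=text" yields the literal "text"; any other argument is treated as a file
   name and the file's contents are read. On success, *data receives a
   malloc'd buffer (the caller frees it), *size its length, and 0 is
   returned; -1 is returned on failure. */
int load_fuzz_input(const char *arg, unsigned char **data, size_t *size)
{
  if (arg[0] == '=')
    {
    size_t len = strlen(arg + 1);
    *data = malloc(len + 1);             /* +1 so an empty literal is valid */
    if (*data == NULL) return -1;
    memcpy(*data, arg + 1, len);
    *size = len;
    return 0;
    }

  FILE *f = fopen(arg, "rb");
  if (f == NULL) return -1;
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
  long len = ftell(f);
  if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }

  *data = malloc((size_t)len + 1);
  if (*data == NULL) { fclose(f); return -1; }
  if (fread(*data, 1, (size_t)len, f) != (size_t)len)
    {
    free(*data);
    *data = NULL;
    fclose(f);
    return -1;
    }

  fclose(f);
  *size = (size_t)len;
  return 0;
}
```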

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                       that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only, you must use --disable-shared when
540configuring it. For example:
541
542./configure --prefix=/usr/gnu --disable-shared
543
544Then run "make" in the usual way. Similarly, you can use --disable-static to
545build only shared libraries.
546
547
548Cross-compiling using autotools
549-------------------------------
550
551You can specify CC and CFLAGS in the normal way to the "configure" command, in
552order to cross-compile PCRE2 for some other host. However, you should NOT
553specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
554source file is compiled and run on the local host, in order to generate the
555inbuilt character tables (the pcre2_chartables.c file). This will probably not
556work, because pcre2_dftables.c needs to be compiled with the local compiler,
557not the cross compiler.
558
559When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
560created by making a copy of pcre2_chartables.c.dist, which is a default set of
561tables that assumes ASCII code. Cross-compiling with the default tables should
562not be a problem.
563
564If you need to modify the character tables when cross-compiling, you should
565move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
566hand and run it on the local host to make a new version of
567pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
568at build time" for more details.
569
570
571Making new tarballs
572-------------------
573
574The command "make dist" creates two PCRE2 tarballs, in tar.gz and zip formats.
575The command "make distcheck" does the same, but then does a trial build of the
576new distribution to ensure that it works.
577
578If you have modified any of the man page sources in the doc directory, you
579should first run the PrepareRelease script before making a distribution. This
580script creates the .txt and HTML forms of the documentation from the man pages.
581
582
583Testing PCRE2
584-------------
585
586To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
587There is another script called RunGrepTest that tests the pcre2grep command.
588When JIT support is enabled, a third test program called pcre2_jit_test is
589built. Both the scripts and all the program tests are run if you obey "make
590check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
591
592The RunTest script runs the pcre2test test program (which is documented in its
593own man page) on each of the relevant testinput files in the testdata
594directory, and compares the output with the contents of the corresponding
595testoutput files. RunTest uses a file called testtry to hold the main output
596from pcre2test. Other files whose names begin with "test" are used as working
597files in some tests.
598
599Some tests are relevant only when certain build-time options were selected. For
600example, the tests for UTF-8/16/32 features are run only when Unicode support
601is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, while a lone ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked. This is because it will be
different on different hardware and with different configurations. The test
exists in order to exercise some of pcre2test's code that would not otherwise
be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see "Character tables" below). In some
cases, this may cause failures in the second set of tests. For example, in a
locale where the isprint() function yields TRUE for characters in the range
128-255, the use of [:ascii:] inside a character class defines a different set
of characters, and this shows up in this test as a difference in the compiled
code, which is being listed for checking. For example, where the comparison
test output contains [\x00-\x7f] the actual output might contain [\x00-\xff],
and similarly in some other cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why. If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair is for general cases and Unicode support, respectively.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a new version of pcre2_chartables.c is built by the
program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the pcre2_dftables program is run as a result of specifying
--enable-rebuild-chartables, it uses the default C locale that is set on your
system. It does not pay attention to the LC_xxx environment variables. In other
words, it uses the system's default locale rather than whatever the compiling
user happens to have set. If you really do want to build a source set of
character tables in a locale that is specified by the LC_xxx variables, you can
run the pcre2_dftables program by hand with the -L option. For example:

  ./pcre2_dftables -L pcre2_chartables.c.special

The second argument names the file where the source code for the tables is
written. The first two 256-byte tables provide lower casing and case flipping
functions, respectively. The next table consists of a number of 32-byte bit
maps which identify certain character classes such as digits, "word"
characters, white space, etc. These are used when building 32-byte bit maps
that represent character classes for code points less than 256. The final
256-byte table has bits indicating various character types, as follows:

    1   white space character
    2   letter
    4   lower case letter
    8   decimal digit
   16   alphanumeric or '_'

You can also specify -b (with or without -L) when running pcre2_dftables. This
causes the tables to be written in binary instead of as source code. A set of
binary tables can be loaded into memory by an application and passed to
pcre2_compile() in the same way as tables created dynamically by calling
pcre2_maketables(). The tables are just a string of bytes, independent of
hardware characteristics such as endianness. This means they can be bundled
with an application that runs in different environments, to ensure consistent
behaviour.

See also the pcre2build section "Creating character tables at build time".


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_script_run.c   )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program

(C) Auxiliary files:

  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files

(D) Auxiliary files for cmake support:

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand":

  src/pcre2.h.generic     ) a version of the public PCRE2 header file
                          )   for use in non-"configure" environments
  src/config.h.generic    ) a version of config.h for use in non-"configure"
                          )   environments

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 29 October 2021


README.md

# PCRE2 - Perl-Compatible Regular Expressions

The PCRE2 library is a set of C functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5. PCRE2 has its
own native API, as well as a set of wrapper functions that correspond to the
POSIX regular expression API. The PCRE2 library is free, even for building
proprietary software. It comes in three forms, for processing 8-bit, 16-bit,
or 32-bit code units, in either literal or UTF encoding.

PCRE2 was first released in 2015 to replace the API in the original PCRE
library, which is now obsolete and no longer maintained. As well as a more
flexible API, the code of PCRE2 has been much improved since the fork.

## Download

As well as downloading from the
[GitHub site](https://github.com/PhilipHazel/pcre2), you can download PCRE2
or the older, unmaintained PCRE1 library from an
[*unofficial* mirror](https://sourceforge.net/projects/pcre/files/) at SourceForge.

You can check out the PCRE2 source code via Git or Subversion:

    git clone https://github.com/PhilipHazel/pcre2.git
    svn co    https://github.com/PhilipHazel/pcre2.git

## Contributed Ports

If you just need the command-line PCRE2 tools on Windows, precompiled binary
versions are available at this
[Rexegg page](http://www.rexegg.com/pcregrep-pcretest.html).

A PCRE2 port for z/OS, a mainframe operating system which uses EBCDIC as its
default character encoding, can be found at
[http://www.cbttape.org](http://www.cbttape.org/) (File 939).

## Documentation

You can read the PCRE2 documentation
[here](https://philiphazel.github.io/pcre2/doc/html/index.html).

Comparisons to Perl's regular expression semantics can be found in the
community-authored Wikipedia entry for PCRE.

There is a curated summary of changes for each PCRE release, copies of
documentation from older releases, and other useful information on the
third-party
[RexEgg PCRE Documentation and Change Log page](http://www.rexegg.com/pcre-documentation.html).

## Contact

To report a problem with the PCRE2 library, or to make a feature request, please
use the PCRE2 GitHub issues tracker. There is a mailing list for discussion of
PCRE2 issues and development at pcre2-dev@googlegroups.com, which is where any
announcements will be made. You can browse the
[list archives](https://groups.google.com/g/pcre2-dev).