\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename grep.info
@include version.texi
@settitle GNU Grep @value{VERSION}

@c Combine indices.
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@defcodeindex op
@syncodeindex op cp
@syncodeindex vr cp
@c %**end of header

@documentencoding UTF-8
@c These two require Texinfo 5.0 or later, so use the older
@c equivalent @set variables supported in 4.11 and later.
@ignore
@codequotebacktick on
@codequoteundirected on
@end ignore
@set txicodequoteundirected
@set txicodequotebacktick
@iftex
@c TeX sometimes fails to hyphenate, so help it here.
@hyphenation{spec-i-fied}
@end iftex

@copying
This manual is for @command{grep}, a pattern matching engine.

Copyright @copyright{} 1999--2002, 2005, 2008--2020 Free Software Foundation,
Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts.  A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying

@dircategory Text creation and manipulation
@direntry
* grep: (grep).                 Print lines that match patterns.
@end direntry

@titlepage
@title GNU Grep: Print lines that match patterns
@subtitle version @value{VERSION}, @value{UPDATED}
@author Alain Magloire et al.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents


@ifnottex
@node Top
@top grep

@command{grep} prints lines that contain a match for one or more patterns.

This manual is for version @value{VERSION} of GNU Grep.

@insertcopying
@end ifnottex

@menu
* Introduction::                Introduction.
* Invoking::                    Command-line options, environment, exit status.
* Regular Expressions::         Regular Expressions.
* Usage::                       Examples.
* Performance::                 Performance tuning.
* Reporting Bugs::              Reporting Bugs.
* Copying::                     License terms for this manual.
* Index::                       Combined index.
@end menu


@node Introduction
@chapter Introduction

@cindex searching for patterns

Given one or more patterns, @command{grep} searches input files
for matches to the patterns.
When it finds a match in a line,
it copies the line to standard output (by default),
or produces whatever other sort of output you have requested with options.

Though @command{grep} expects to do the matching on text,
it has no limits on input line length other than available memory,
and it can match arbitrary characters within a line.
If the final byte of an input file is not a newline,
@command{grep} silently supplies one.
Since newline is also a separator for the list of patterns,
there is no way to match newline characters in a text.


@node Invoking
@chapter Invoking @command{grep}

The general synopsis of the @command{grep} command line is

@example
grep [@var{option}...] [@var{patterns}] [@var{file}...]
@end example

@noindent
There can be zero or more @var{option} arguments, and zero or more
@var{file} arguments.  The @var{patterns} argument contains one or
more patterns separated by newlines, and is omitted when patterns are
given via the @samp{-e@ @var{patterns}} or @samp{-f@ @var{file}}
options.  Typically @var{patterns} should be quoted when
@command{grep} is used in a shell command.

@menu
* Command-line Options::        Short and long names, grouped by category.
* Environment Variables::       POSIX, GNU generic, and GNU grep specific.
* Exit Status::                 Exit status returned by @command{grep}.
* grep Programs::               @command{grep} programs.
@end menu
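
As a minimal sketch of the synopsis above, the following commands pass
two patterns to @command{grep}, first as a single quoted
@var{patterns} argument with an embedded newline and then through
repeated @option{-e} options.  The temporary directory and the file
name @file{fruit.txt} are hypothetical and exist only for illustration.

@example
# Assumption: a scratch directory with a small sample file.
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/fruit.txt"

# One quoted PATTERNS argument holding two newline-separated patterns.
grep 'apple
cherry' "$tmp/fruit.txt"

# The same two patterns, each given with its own -e option.
grep -e 'apple' -e 'cherry' "$tmp/fruit.txt"
@end example

@noindent
Both commands should select the lines @samp{apple} and @samp{cherry}.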

@node Command-line Options
@section Command-line Options

@command{grep} comes with a rich set of options:
some from POSIX and some being GNU extensions.
Long option names are always a GNU extension,
even for options that are from POSIX specifications.
Options that are specified by POSIX,
under their short names,
are explicitly marked as such
to facilitate POSIX-portable programming.
A few option names are provided
for compatibility with older or more exotic implementations.

@menu
* Generic Program Information::
* Matching Control::
* General Output Control::
* Output Line Prefix Control::
* Context Line Control::
* File and Directory Selection::
* Other Options::
@end menu

Several additional options control
which variant of the @command{grep} matching engine is used.
@xref{grep Programs}.

@node Generic Program Information
@subsection Generic Program Information

@table @option

@item --help
@opindex --help
@cindex usage summary, printing
Print a usage message briefly summarizing the command-line options
and the bug-reporting address, then exit.

@item -V
@itemx --version
@opindex -V
@opindex --version
@cindex version, printing
Print the version number of @command{grep} to the standard output stream.
This version number should be included in all bug reports.

@end table

@node Matching Control
@subsection Matching Control

@table @option

@item -e @var{patterns}
@itemx --regexp=@var{patterns}
@opindex -e
@opindex --regexp=@var{patterns}
@cindex patterns option
Use @var{patterns} as one or more patterns; newlines within
@var{patterns} separate each pattern from the next.
If this option is used multiple times or is combined with the
@option{-f} (@option{--file}) option, search for all patterns given.
Typically @var{patterns} should be quoted when @command{grep} is used
in a shell command.
(@option{-e} is specified by POSIX.)

@item -f @var{file}
@itemx --file=@var{file}
@opindex -f
@opindex --file
@cindex patterns from file
Obtain patterns from @var{file}, one per line.
If this option is used multiple times or is combined with the
@option{-e} (@option{--regexp}) option, search for all patterns given.
The empty file contains zero patterns, and therefore matches nothing.
(@option{-f} is specified by POSIX.)

@item -i
@itemx -y
@itemx --ignore-case
@opindex -i
@opindex -y
@opindex --ignore-case
@cindex case insensitive search
Ignore case distinctions in patterns and input data,
so that characters that differ only in case
match each other.  Although this is straightforward when letters
differ in case only via lowercase-uppercase pairs, the behavior is
unspecified in other situations.  For example, uppercase ``S'' has an
unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN
SMALL LETTER LONG S) in many locales, and it is unspecified whether
this unusual character matches ``S'' or ``s'' even though uppercasing
it yields ``S''.  Another example: the lowercase German letter ``ß''
(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the
two-character string ``SS'' but it does not match ``SS'', and it might
not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER
SHARP S) even though lowercasing the latter yields the former.

@option{-y} is an obsolete synonym that is provided for compatibility.
(@option{-i} is specified by POSIX.)

@item --no-ignore-case
@opindex --no-ignore-case
Do not ignore case distinctions in patterns and input data.  This is
the default.  This option is useful for passing to shell scripts that
already use @option{-i}, in order to cancel its effects because the
two options override each other.

@item -v
@itemx --invert-match
@opindex -v
@opindex --invert-match
@cindex invert matching
@cindex print non-matching lines
Invert the sense of matching, to select non-matching lines.
(@option{-v} is specified by POSIX.)

@item -w
@itemx --word-regexp
@opindex -w
@opindex --word-regexp
@cindex matching whole words
Select only those lines containing matches that form whole words.
The test is that the matching substring must either
be at the beginning of the line,
or preceded by a non-word constituent character.
Similarly,
it must be either at the end of the line
or followed by a non-word constituent character.
Word constituent characters are letters, digits, and the underscore.
This option has no effect if @option{-x} is also specified.

Because the @option{-w} option can match a substring that does not
begin and end with word constituents, it differs from surrounding a
regular expression with @samp{\<} and @samp{\>}.  For example, although
@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep
'\<@@\>'} cannot match any line because @samp{@@} is not a
word constituent.  @xref{The Backslash Character and Special
Expressions}.

@item -x
@itemx --line-regexp
@opindex -x
@opindex --line-regexp
@cindex match the whole line
Select only those matches that exactly match the whole line.
For regular expression patterns, this is like parenthesizing each
pattern and then surrounding it with @samp{^} and @samp{$}.
(@option{-x} is specified by POSIX.)

@end table
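
The matching-control options above can be compared on one small input.
This is only a sketch: the temporary directory and the file
@file{words.txt} are hypothetical, and GNU @command{grep} is assumed to
be installed as @command{grep}.

@example
# Assumption: a scratch file with five short lines.
tmp=$(mktemp -d)
printf 'foo\nfoobar\nfoo_bar\nfoo bar\nFOO\n' > "$tmp/words.txt"

# -w: the match must not touch a letter, digit, or underscore.
grep -w 'foo' "$tmp/words.txt"       # selects "foo" and "foo bar"

# -x: the pattern must match the whole line.
grep -x 'foo' "$tmp/words.txt"       # selects "foo" only

# -i with -x: case is ignored, so "FOO" is selected too.
grep -i -x 'foo' "$tmp/words.txt"    # selects "foo" and "FOO"

# -v: select the lines that do not contain "bar".
grep -v 'bar' "$tmp/words.txt"       # selects "foo" and "FOO"
@end example

@noindent
The line @samp{foo_bar} is not selected by @option{-w} because the
underscore counts as a word constituent.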

@node General Output Control
@subsection General Output Control

@table @option

@item -c
@itemx --count
@opindex -c
@opindex --count
@cindex counting lines
Suppress normal output;
instead print a count of matching lines for each input file.
With the @option{-v} (@option{--invert-match}) option,
count non-matching lines.
(@option{-c} is specified by POSIX.)

@item --color[=@var{WHEN}]
@itemx --colour[=@var{WHEN}]
@opindex --color
@opindex --colour
@cindex highlight, color, colour
Surround the matched (non-empty) strings, matching lines, context lines,
file names, line numbers, byte offsets, and separators (for fields and
groups of context lines) with escape sequences to display them in color
on the terminal.
The colors are defined by the environment variable @env{GREP_COLORS}
and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
for bold red matched text, magenta file names, green line numbers,
green byte offsets, cyan separators, and default terminal colors otherwise.
The deprecated environment variable @env{GREP_COLOR} is still supported,
but its setting does not have priority;
it defaults to @samp{01;31} (bold red)
which only covers the color for matched text.
@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}.

@item -L
@itemx --files-without-match
@opindex -L
@opindex --files-without-match
@cindex files which don't match
Suppress normal output;
instead print the name of each input file from which
no output would normally have been printed.
The scanning of each file stops on the first match.

@item -l
@itemx --files-with-matches
@opindex -l
@opindex --files-with-matches
@cindex names of matching files
Suppress normal output;
instead print the name of each input file from which
output would normally have been printed.
The scanning of each file stops on the first match.
(@option{-l} is specified by POSIX.)

@item -m @var{num}
@itemx --max-count=@var{num}
@opindex -m
@opindex --max-count
@cindex max-count
Stop after the first @var{num} selected lines.
If the input is standard input from a regular file,
and @var{num} selected lines are output,
@command{grep} ensures that the standard input is positioned
just after the last selected line before exiting,
regardless of the presence of trailing context lines.
This enables a calling process to resume a search.
For example, the following shell script makes use of it:

@example
while grep -m 1 'PATTERN'
do
  echo xxxx
done < FILE
@end example

But the following probably will not work because a pipe is not a regular
file:

@example
# This probably will not work.
cat FILE |
while grep -m 1 'PATTERN'
do
  echo xxxx
done
@end example

@cindex context lines
When @command{grep} stops after @var{num} selected lines,
it outputs any trailing context lines.
When the @option{-c} or @option{--count} option is also used,
@command{grep} does not output a count greater than @var{num}.
When the @option{-v} or @option{--invert-match} option is also used,
@command{grep} stops after outputting @var{num} non-matching lines.

@item -o
@itemx --only-matching
@opindex -o
@opindex --only-matching
@cindex only matching
Print only the matched (non-empty) parts of matching lines,
with each such part on a separate output line.
Output lines use the same delimiters as input, and delimiters are null
bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other
Options}).

@item -q
@itemx --quiet
@itemx --silent
@opindex -q
@opindex --quiet
@opindex --silent
@cindex quiet, silent
Quiet; do not write anything to standard output.
Exit immediately with zero status if any match is found,
even if an error was detected.
Also see the @option{-s} or @option{--no-messages} option.
(@option{-q} is specified by POSIX.)

@item -s
@itemx --no-messages
@opindex -s
@opindex --no-messages
@cindex suppress error messages
Suppress error messages about nonexistent or unreadable files.
Portability note:
unlike GNU @command{grep},
7th Edition Unix @command{grep} did not conform to POSIX,
because it lacked @option{-q}
and its @option{-s} option behaved like
GNU @command{grep}'s @option{-q} option.@footnote{Of course, 7th Edition
Unix predated POSIX by several years!}
USG-style @command{grep} also lacked @option{-q}
but its @option{-s} option behaved like GNU @command{grep}'s.
Portable shell scripts should avoid both
@option{-q} and @option{-s} and should redirect
standard and error output to @file{/dev/null} instead.
(@option{-s} is specified by POSIX.)

@end table
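
The following sketch exercises several of the output-control options
above on two small files.  The temporary directory and the file names
@file{data.txt} and @file{plain.txt} are hypothetical, chosen only for
illustration.

@example
# Assumption: two scratch files, one with digits and one without.
tmp=$(mktemp -d)
printf 'a1\nb22\nc\nd333\n' > "$tmp/data.txt"
printf 'no digits here\n' > "$tmp/plain.txt"

grep -c '[0-9]' "$tmp/data.txt"        # prints 3
grep -c -v '[0-9]' "$tmp/data.txt"     # prints 1

# -o prints each matched part on its own line: 1, 22, 333.
grep -o '[0-9][0-9]*' "$tmp/data.txt"

# -l and -L print file names instead of lines.
grep -l '[0-9]' "$tmp"/*.txt           # names data.txt
grep -L '[0-9]' "$tmp"/*.txt           # names plain.txt

# -q prints nothing; only the exit status reports the match.
if grep -q '[0-9]' "$tmp/data.txt"; then echo found; fi
@end example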

@node Output Line Prefix Control
@subsection Output Line Prefix Control

When several prefix fields are to be output,
the order is always file name, line number, and byte offset,
regardless of the order in which these options were specified.

@table @option

@item -b
@itemx --byte-offset
@opindex -b
@opindex --byte-offset
@cindex byte offset
Print the 0-based byte offset within the input file
before each line of output.
If @option{-o} (@option{--only-matching}) is specified,
print the offset of the matching part itself.

@item -H
@itemx --with-filename
@opindex -H
@opindex --with-filename
@cindex with filename prefix
Print the file name for each match.
This is the default when there is more than one file to search.

@item -h
@itemx --no-filename
@opindex -h
@opindex --no-filename
@cindex no filename prefix
Suppress the prefixing of file names on output.
This is the default when there is only one file
(or only standard input) to search.

@item --label=@var{LABEL}
@opindex --label
@cindex changing name of standard input
Display input actually coming from standard input
as input coming from file @var{LABEL}.
This can be useful for commands that transform a file's contents
before searching; e.g.:

@example
gzip -cd foo.gz | grep --label=foo -H 'some pattern'
@end example

@item -n
@itemx --line-number
@opindex -n
@opindex --line-number
@cindex line numbering
Prefix each line of output with the 1-based line number within its input file.
(@option{-n} is specified by POSIX.)

@item -T
@itemx --initial-tab
@opindex -T
@opindex --initial-tab
@cindex tab-aligned content lines
Make sure that the first character of actual line content lies on a tab stop,
so that the alignment of tabs looks normal.
This is useful with options that prefix their output to the actual content:
@option{-H}, @option{-n}, and @option{-b}.
This may also prepend spaces to output line numbers and byte offsets
so that lines from a single file all start at the same column.

@item -Z
@itemx --null
@opindex -Z
@opindex --null
@cindex zero-terminated file names
Output a zero byte (the ASCII NUL character)
instead of the character that normally follows a file name.
For example,
@samp{grep -lZ} outputs a zero byte after each file name
instead of the usual newline.
This option makes the output unambiguous,
even in the presence of file names containing unusual characters like newlines.
This option can be used with commands like
@samp{find -print0}, @samp{perl -0}, @samp{sort -z}, and @samp{xargs -0}
to process arbitrary file names,
even those that contain newline characters.

@end table
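
As an illustration of the prefix order described at the start of this
section, the sketch below gives the prefix options in reverse order and
still gets file name, line number, and byte offset.  The label
@samp{greek} is a hypothetical name for standard input.

@example
# Options are given as -b, -n, -H, but the prefix order is fixed.
printf 'alpha\nbeta\ngamma\n' | grep -b -n -H --label=greek 'eta'
# Output: greek:2:6:beta

# With -o, the byte offset is that of the matching part itself.
printf 'alpha\nbeta\ngamma\n' | grep -o -b 'eta'
# Output: 7:eta
@end example

@noindent
The offset 6 is where the line @samp{beta} starts, since
@samp{alpha} and its newline occupy bytes 0 through 5.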

@node Context Line Control
@subsection Context Line Control

@cindex context lines
@dfn{Context lines} are non-matching lines that are near a matching line.
They are output only if one of the following options is used.
Regardless of how these options are set,
@command{grep} never outputs any given line more than once.
If the @option{-o} (@option{--only-matching}) option is specified,
these options have no effect and a warning is given upon their use.

@table @option

@item -A @var{num}
@itemx --after-context=@var{num}
@opindex -A
@opindex --after-context
@cindex after context
@cindex context lines, after match
Print @var{num} lines of trailing context after matching lines.

@item -B @var{num}
@itemx --before-context=@var{num}
@opindex -B
@opindex --before-context
@cindex before context
@cindex context lines, before match
Print @var{num} lines of leading context before matching lines.

@item -C @var{num}
@itemx -@var{num}
@itemx --context=@var{num}
@opindex -C
@opindex --context
@opindex -@var{num}
@cindex context lines
Print @var{num} lines of leading and trailing output context.

@item --group-separator=@var{string}
@opindex --group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} are in use,
print @var{string} instead of @option{--} between groups of lines.

@item --no-group-separator
@opindex --no-group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} are in use,
do not print a separator between groups of lines.

@end table

Here are some points about how @command{grep} chooses
the separator to print between prefix fields and line content:

@itemize @bullet
@item
Matching lines normally use @samp{:} as a separator
between prefix fields and actual line content.

@item
Context (i.e., non-matching) lines use @samp{-} instead.

@item
When context is not specified,
matching lines are simply output one right after another.

@item
When context is specified,
lines that are adjacent in the input form a group
and are output one right after another, while
by default a separator appears between non-adjacent groups.

@item
The default separator
is a @samp{--} line; its presence and appearance
can be changed with the options above.

@item
Each group may contain
several matching lines when they are close enough to each other
that two adjacent groups connect and can merge into a single
contiguous one.
@end itemize
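
The separator rules above can be seen in a short sketch.  The input is
hypothetical: seven lines, two of which contain @samp{MATCH}.

@example
printf 'a\nMATCH1\nb\nc\nd\nMATCH2\ne\n' | grep -n -A 1 'MATCH'
# Output:
#   2:MATCH1
#   3-b
#   --
#   6:MATCH2
#   7-e

# The same search with the group separator suppressed.
printf 'a\nMATCH1\nb\nc\nd\nMATCH2\ne\n' |
  grep -n -A 1 --no-group-separator 'MATCH'
@end example

@noindent
Matching lines carry @samp{:} after the line number, context lines
carry @samp{-}, and the two non-adjacent groups are divided by a
@samp{--} line in the first command only.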

@node File and Directory Selection
@subsection File and Directory Selection

@table @option

@item -a
@itemx --text
@opindex -a
@opindex --text
@cindex suppress binary data
@cindex binary files
Process a binary file as if it were text;
this is equivalent to the @samp{--binary-files=text} option.

@item --binary-files=@var{type}
@opindex --binary-files
@cindex binary files
If a file's data or metadata
indicate that the file contains binary data,
assume that the file is of type @var{type}.
Non-text bytes indicate binary data; these are either output bytes that are
improperly encoded for the current locale (@pxref{Environment
Variables}), or null input bytes when the
@option{-z} (@option{--null-data}) option is not given (@pxref{Other
Options}).

By default, @var{type} is @samp{binary}, and @command{grep}
suppresses output after null input binary data is discovered,
and suppresses output lines that contain improperly encoded data.
When some output is suppressed, @command{grep} follows any output
with a one-line message saying that a binary file matches.

If @var{type} is @samp{without-match},
when @command{grep} discovers null input binary data
it assumes that the rest of the file does not match;
this is equivalent to the @option{-I} option.

If @var{type} is @samp{text},
@command{grep} processes binary data as if it were text;
this is equivalent to the @option{-a} option.

When @var{type} is @samp{binary}, @command{grep} may treat non-text
bytes as line terminators even without the @option{-z}
(@option{--null-data}) option.  This means choosing @samp{binary}
versus @samp{text} can affect whether a pattern matches a file.  For
example, when @var{type} is @samp{binary} the pattern @samp{q$} might
match @samp{q} immediately followed by a null byte, even though this
is not matched when @var{type} is @samp{text}.  Conversely, when
@var{type} is @samp{binary} the pattern @samp{.} (period) might not
match a null byte.

@emph{Warning:} The @option{-a} (@option{--binary-files=text}) option
might output binary garbage, which can have nasty side effects if the
output is a terminal and if the terminal driver interprets some of it
as commands.  On the other hand, when reading files whose text
encodings are unknown, it can be helpful to use @option{-a} or to set
@samp{LC_ALL='C'} in the environment, in order to find more matches
even if the matches are unsafe for direct display.

@item -D @var{action}
@itemx --devices=@var{action}
@opindex -D
@opindex --devices
@cindex device search
If an input file is a device, FIFO, or socket, use @var{action} to process it.
If @var{action} is @samp{read},
all devices are read just as if they were ordinary files.
If @var{action} is @samp{skip},
devices, FIFOs, and sockets are silently skipped.
By default, devices are read if they are on the command line or if the
@option{-R} (@option{--dereference-recursive}) option is used, and are
skipped if they are encountered recursively and the @option{-r}
(@option{--recursive}) option is used.
This option has no effect on a file that is read via standard input.

@item -d @var{action}
@itemx --directories=@var{action}
@opindex -d
@opindex --directories
@cindex directory search
@cindex symbolic links
If an input file is a directory, use @var{action} to process it.
By default, @var{action} is @samp{read},
which means that directories are read just as if they were ordinary files
(some operating systems and file systems disallow this,
and will cause @command{grep}
to print error messages for every directory or silently skip them).
If @var{action} is @samp{skip}, directories are silently skipped.
If @var{action} is @samp{recurse},
@command{grep} reads all files under each directory, recursively,
following command-line symbolic links and skipping other symlinks;
this is equivalent to the @option{-r} option.

@item --exclude=@var{glob}
@opindex --exclude
@cindex exclude files
@cindex searching directory trees
Skip any command-line file with a name suffix that matches the pattern
@var{glob}, using wildcard matching; a name suffix is either the whole
name, or a trailing part that starts with a non-slash character
immediately after a slash (@samp{/}) in the name.
When searching recursively, skip any subfile whose base
name matches @var{glob}; the base name is the part after the last
slash.  A pattern can use
@samp{*}, @samp{?}, and @samp{[}...@samp{]} as wildcards,
and @code{\} to quote a wildcard or backslash character literally.

@item --exclude-from=@var{file}
@opindex --exclude-from
@cindex exclude files
@cindex searching directory trees
Skip files whose name matches any of the patterns
read from @var{file} (using wildcard matching as described
under @option{--exclude}).

@item --exclude-dir=@var{glob}
@opindex --exclude-dir
@cindex exclude directories
Skip any command-line directory with a name suffix that matches the
pattern @var{glob}.  When searching recursively, skip any subdirectory
whose base name matches @var{glob}.  Ignore any redundant trailing
slashes in @var{glob}.

@item -I
@opindex -I
Process a binary file as if it did not contain matching data;
this is equivalent to the @samp{--binary-files=without-match} option.

@item --include=@var{glob}
@opindex --include
@cindex include files
@cindex searching directory trees
Search only files whose name matches @var{glob},
using wildcard matching as described under @option{--exclude}.

@item -r
@itemx --recursive
@opindex -r
@opindex --recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand,
read and process all files in that directory, recursively.
Follow symbolic links on the command line, but skip symlinks
that are encountered recursively.
Note that if no file operand is given, @command{grep} searches the
working directory.
This is the same as the @samp{--directories=recurse} option.

@item -R
@itemx --dereference-recursive
@opindex -R
@opindex --dereference-recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand, read and process all files in that
directory, recursively, following all symbolic links.

@end table
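
A recursive search restricted by @option{--include} and
@option{--exclude-dir} might look like the following sketch.  The
directory layout (@file{src}, @file{build}) and the file names are
hypothetical and are created only for the demonstration.

@example
# Assumption: a scratch tree with two directories and three files.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf 'int main(void)\n' > "$tmp/src/a.c"
printf 'main notes\n' > "$tmp/src/notes.txt"
printf 'int main(void)\n' > "$tmp/build/a.c"

# Recurse from ".", keep only *.c files, skip directories named build.
(cd "$tmp" && grep -r -l --include='*.c' --exclude-dir=build 'main' .)
# Output: ./src/a.c
@end example

@noindent
The glob is quoted so that the shell passes @samp{*.c} to
@command{grep} unexpanded.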

@node Other Options
@subsection Other Options

@table @option

@item --
@opindex --
@cindex option delimiter
Delimit the option list.  Later arguments, if any, are treated as
operands even if they begin with @samp{-}.  For example, @samp{grep PAT --
-file1 file2} searches for the pattern PAT in the files named @file{-file1}
and @file{file2}.

@item --line-buffered
@opindex --line-buffered
@cindex line buffering
Use line buffering on output.
This can cause a performance penalty.

@item -U
@itemx --binary
@opindex -U
@opindex --binary
@cindex MS-Windows binary I/O
@cindex binary I/O
On platforms that distinguish between text and binary I/O,
use the latter when reading and writing files other
than the user's terminal, so that all input bytes are read and written
as-is.  This overrides the default behavior where @command{grep}
follows the operating system's advice whether to use text or binary
I/O@.  On MS-Windows when @command{grep} uses text I/O it reads a
carriage return--newline pair as a newline and a Control-Z as
end-of-file, and it writes a newline as a carriage return--newline
pair.

When using text I/O @option{--byte-offset} (@option{-b}) counts and
@option{--binary-files} heuristics apply to input data after text-I/O
processing.  Also, the @option{--binary-files} heuristics need not agree
with the @option{--binary} option; that is, they may treat the data as
text even if @option{--binary} is given, or vice versa.
@xref{File and Directory Selection}.

This option has no effect on GNU and other POSIX-compatible platforms,
which do not distinguish text from binary I/O.

@item -z
@itemx --null-data
@opindex -z
@opindex --null-data
@cindex zero-terminated lines
Treat input and output data as sequences of lines, each terminated by
a zero byte (the ASCII NUL character) instead of a newline.
Like the @option{-Z} or @option{--null} option,
this option can be used with commands like
@samp{sort -z} to process arbitrary file names.

@end table
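
The following sketch shows the @option{--} delimiter and the
@option{-z} option from the table above.  The file name @file{-file1}
is taken from the description of @option{--}; the temporary directory
is an assumption made for the demonstration.

@example
# Assumption: a scratch directory holding a file whose name starts
# with a dash.
tmp=$(mktemp -d)
printf 'PAT here\n' > "$tmp/-file1"

# After "--", "-file1" is treated as a file operand, not an option.
(cd "$tmp" && grep 'PAT' -- -file1)
# Output: PAT here

# -z: records end in NUL bytes; tr makes the result readable.
printf 'one\0two\0three\0' | grep -z 't' | tr '\0' '\n'
# Output:
#   two
#   three
@end example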

@node Environment Variables
@section Environment Variables

The behavior of @command{grep} is affected
by the following environment variables.

@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
The locale for category @w{@code{LC_@var{foo}}}
is specified by examining the three environment variables
@env{LC_ALL}, @w{@env{LC_@var{foo}}}, and @env{LANG},
in that order.
The first of these variables that is set specifies the locale.
For example, if @env{LC_ALL} is not set,
but @env{LC_COLLATE} is set to @samp{pt_BR},
then the Brazilian Portuguese locale is used
for the @env{LC_COLLATE} category.
As a special case for @env{LC_MESSAGES} only, the environment variable
@env{LANGUAGE} can contain a colon-separated list of languages that
overrides the three environment variables that ordinarily specify
the @env{LC_MESSAGES} category.
The @samp{C} locale is used if none of these environment variables are set,
if the locale catalog is not installed,
or if @command{grep} was not compiled
with national language support (NLS).
The shell command @code{locale -a} lists locales that are currently available.

Many of the environment variables in the following list let you
control highlighting using
Select Graphic Rendition (SGR)
commands interpreted by the terminal or terminal emulator.
(See the
section
in the documentation of your text terminal
for permitted values and their meanings as character attributes.)
These substring values are integers in decimal representation
and can be concatenated with semicolons.
@command{grep} takes care of assembling the result
into a complete SGR sequence (@samp{\33[}...@samp{m}).
Common values to concatenate include
@samp{1} for bold,
@samp{4} for underline,
@samp{5} for blink,
@samp{7} for inverse,
@samp{39} for default foreground color,
@samp{30} to @samp{37} for foreground colors,
@samp{90} to @samp{97} for 16-color mode foreground colors,
@samp{38;5;0} to @samp{38;5;255}
for 88-color and 256-color modes foreground colors,
@samp{49} for default background color,
@samp{40} to @samp{47} for background colors,
@samp{100} to @samp{107} for 16-color mode background colors,
and @samp{48;5;0} to @samp{48;5;255}
for 88-color and 256-color modes background colors.

The two-letter names used in the @env{GREP_COLORS} environment variable
(and some of the others) refer to terminal ``capabilities,'' the ability
of a terminal to highlight text, or change its color, and so on.
These capabilities are stored in an online database and accessed by
the @code{terminfo} library.
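
As a rough sketch of how an SGR substring ends up in the output, the
command below sets the @samp{ms} capability to @samp{01;32} (bold
green) and forces color even though the output is a pipe.  The choice
of @samp{01;32} and the use of @command{od} to make the escape bytes
visible are illustrative assumptions, and the exact bytes emitted may
vary between @command{grep} versions.

@example
# The matched "b" should be preceded by ESC [ 0 1 ; 3 2 m.
printf 'abc\n' |
  GREP_COLORS='ms=01;32' grep --color=always 'b' |
  od -c
@end example

@noindent
Setting the variable on the command line like this affects only that
one invocation of @command{grep}.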
878
879@cindex environment variables
880
881@table @env
882
883@item GREP_OPTIONS
884@vindex GREP_OPTIONS @r{environment variable}
885@cindex default options environment variable
886This variable specifies default options to be placed in front of any
887explicit options.
888As this causes problems when writing portable scripts, this feature
889will be removed in a future release of @command{grep}, and @command{grep}
890warns if it is used.  Please use an alias or script instead.
891For example, if @command{grep} is in the directory @samp{/usr/bin} you
892can prepend @file{$HOME/bin} to your @env{PATH} and create an
893executable script @file{$HOME/bin/grep} containing the following:
894
@example
#! /bin/sh
export PATH=/usr/bin
exec grep --color=auto --devices=skip "$@@"
@end example
900
901@item GREP_COLOR
902@vindex GREP_COLOR @r{environment variable}
903@cindex highlight markers
904This variable specifies the color used to highlight matched (non-empty) text.
905It is deprecated in favor of @env{GREP_COLORS}, but still supported.
906The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS}
907have priority over it.
908It can only specify the color used to highlight
909the matching non-empty text in any matching line
910(a selected line when the @option{-v} command-line option is omitted,
911or a context line when @option{-v} is specified).
912The default is @samp{01;31},
913which means a bold red foreground text on the terminal's default background.
914
915@item GREP_COLORS
916@vindex GREP_COLORS @r{environment variable}
917@cindex highlight markers
918This variable specifies the colors and other attributes
919used to highlight various parts of the output.
920Its value is a colon-separated list of @code{terminfo} capabilities
921that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
922with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false).
923Supported capabilities are as follows.
924
925@table @code
926@item sl=
927@vindex sl GREP_COLORS @r{capability}
928SGR substring for whole selected lines
929(i.e.,
930matching lines when the @option{-v} command-line option is omitted,
931or non-matching lines when @option{-v} is specified).
932If however the boolean @samp{rv} capability
933and the @option{-v} command-line option are both specified,
934it applies to context matching lines instead.
935The default is empty (i.e., the terminal's default color pair).
936
937@item cx=
938@vindex cx GREP_COLORS @r{capability}
939SGR substring for whole context lines
940(i.e.,
941non-matching lines when the @option{-v} command-line option is omitted,
942or matching lines when @option{-v} is specified).
943If however the boolean @samp{rv} capability
944and the @option{-v} command-line option are both specified,
945it applies to selected non-matching lines instead.
946The default is empty (i.e., the terminal's default color pair).
947
948@item rv
949@vindex rv GREP_COLORS @r{capability}
950Boolean value that reverses (swaps) the meanings of
951the @samp{sl=} and @samp{cx=} capabilities
952when the @option{-v} command-line option is specified.
953The default is false (i.e., the capability is omitted).
954
955@item mt=01;31
956@vindex mt GREP_COLORS @r{capability}
957SGR substring for matching non-empty text in any matching line
958(i.e.,
959a selected line when the @option{-v} command-line option is omitted,
960or a context line when @option{-v} is specified).
961Setting this is equivalent to setting both @samp{ms=} and @samp{mc=}
962at once to the same value.
963The default is a bold red text foreground over the current line background.
964
965@item ms=01;31
966@vindex ms GREP_COLORS @r{capability}
967SGR substring for matching non-empty text in a selected line.
968(This is used only when the @option{-v} command-line option is omitted.)
969The effect of the @samp{sl=} (or @samp{cx=} if @samp{rv}) capability
970remains active when this takes effect.
971The default is a bold red text foreground over the current line background.
972
973@item mc=01;31
974@vindex mc GREP_COLORS @r{capability}
975SGR substring for matching non-empty text in a context line.
976(This is used only when the @option{-v} command-line option is specified.)
977The effect of the @samp{cx=} (or @samp{sl=} if @samp{rv}) capability
978remains active when this takes effect.
979The default is a bold red text foreground over the current line background.
980
981@item fn=35
982@vindex fn GREP_COLORS @r{capability}
983SGR substring for file names prefixing any content line.
984The default is a magenta text foreground over the terminal's default background.
985
986@item ln=32
987@vindex ln GREP_COLORS @r{capability}
988SGR substring for line numbers prefixing any content line.
989The default is a green text foreground over the terminal's default background.
990
991@item bn=32
992@vindex bn GREP_COLORS @r{capability}
993SGR substring for byte offsets prefixing any content line.
994The default is a green text foreground over the terminal's default background.
995
996@item se=36
@vindex se GREP_COLORS @r{capability}
998SGR substring for separators that are inserted
999between selected line fields (@samp{:}),
1000between context line fields (@samp{-}),
1001and between groups of adjacent lines
1002when nonzero context is specified (@samp{--}).
1003The default is a cyan text foreground over the terminal's default background.
1004
1005@item ne
1006@vindex ne GREP_COLORS @r{capability}
1007Boolean value that prevents clearing to the end of line
1008using Erase in Line (EL) to Right (@samp{\33[K})
1009each time a colorized item ends.
1010This is needed on terminals on which EL is not supported.
1011It is otherwise useful on terminals
1012for which the @code{back_color_erase}
1013(@code{bce}) boolean @code{terminfo} capability does not apply,
1014when the chosen highlight colors do not affect the background,
1015or when EL is too slow or causes too much flicker.
1016The default is false (i.e., the capability is omitted).
1017@end table
1018
1019Note that boolean capabilities have no @samp{=}... part.
1020They are omitted (i.e., false) by default and become true when specified.
1021
1022
1023@item LC_ALL
1024@itemx LC_COLLATE
1025@itemx LANG
1026@vindex LC_ALL @r{environment variable}
1027@vindex LC_COLLATE @r{environment variable}
1028@vindex LANG @r{environment variable}
1029@cindex character type
1030@cindex national language support
1031@cindex NLS
1032These variables specify the locale for the @env{LC_COLLATE} category,
1033which might affect how range expressions like @samp{[a-z]} are
1034interpreted.
1035
1036@item LC_ALL
1037@itemx LC_CTYPE
1038@itemx LANG
1039@vindex LC_ALL @r{environment variable}
1040@vindex LC_CTYPE @r{environment variable}
1041@vindex LANG @r{environment variable}
1042@cindex encoding error
1043@cindex null character
1044These variables specify the locale for the @env{LC_CTYPE} category,
1045which determines the type of characters,
1046e.g., which characters are whitespace.
1047This category also determines the character encoding, that is, whether
1048text is encoded in UTF-8, ASCII, or some other encoding.  In the
1049@samp{C} or @samp{POSIX} locale, all characters are encoded as a
1050single byte and every byte is a valid character.
1051In more-complex encodings such as UTF-8, a sequence of multiple bytes
1052may be needed to represent a character, and some bytes may be encoding
1053errors that do not contribute to the representation of any character.
1054POSIX does not specify the behavior of @command{grep} when patterns or
1055input data contain encoding errors or null characters, so portable
1056scripts should avoid such usage.  As an extension to POSIX, GNU
1057@command{grep} treats null characters like any other character.
1058However, unless the @option{-a} (@option{--binary-files=text}) option
1059is used, the presence of null characters in input or of encoding
1060errors in output causes GNU @command{grep} to treat the file as binary
1061and suppress details about matches.  @xref{File and Directory
1062Selection}.
1063
1064@item LANGUAGE
1065@itemx LC_ALL
1066@itemx LC_MESSAGES
1067@itemx LANG
1068@vindex LANGUAGE @r{environment variable}
1069@vindex LC_ALL @r{environment variable}
1070@vindex LC_MESSAGES @r{environment variable}
1071@vindex LANG @r{environment variable}
1072@cindex language of messages
1073@cindex message language
1074@cindex national language support
1075@cindex translation of message language
1076These variables specify the locale for the @env{LC_MESSAGES} category,
1077which determines the language that @command{grep} uses for messages.
1078The default @samp{C} locale uses American English messages.
1079
1080@item POSIXLY_CORRECT
1081@vindex POSIXLY_CORRECT @r{environment variable}
1082If set, @command{grep} behaves as POSIX requires; otherwise,
1083@command{grep} behaves more like other GNU programs.
1084POSIX
1085requires that options that
1086follow file names must be treated as file names;
1087by default,
1088such options are permuted to the front of the operand list
1089and are treated as options.
1090Also, @env{POSIXLY_CORRECT} disables special handling of an
1091invalid bracket expression.  @xref{invalid-bracket-expr}.
1092
1093@item _@var{N}_GNU_nonoption_argv_flags_
1094@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable}
1095(Here @code{@var{N}} is @command{grep}'s numeric process ID.)
1096If the @var{i}th character of this environment variable's value is @samp{1},
1097do not consider the @var{i}th operand of @command{grep} to be an option,
1098even if it appears to be one.
1099A shell can put this variable in the environment for each command it runs,
1100specifying which operands are the results of file name wildcard expansion
1101and therefore should not be treated as options.
1102This behavior is available only with the GNU C library,
1103and only when @env{POSIXLY_CORRECT} is not set.
1104
1105@end table
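
As a minimal sketch of how the @env{GREP_COLORS} capabilities described
above combine, the following commands highlight matched text in bold green
instead of the default bold red.
The sample input and the SGR value @samp{01;32} are illustrative
assumptions; @option{--color=always} is used here only so that the escape
sequences are emitted even when the output is not a terminal.

@example
# Highlight matches in bold green (SGR 01;32) rather than bold red.
printf 'abc\n' | GREP_COLORS='mt=01;32' grep --color=always 'b'

# Same, but with the boolean ne capability, which suppresses
# the Erase in Line sequence after each colorized item.
printf 'abc\n' | GREP_COLORS='mt=01;32:ne' grep --color=always 'b'

# Pipe through od to inspect the escape sequences themselves.
printf 'abc\n' | GREP_COLORS='mt=01;32:ne' grep --color=always 'b' | od -c
@end example

@noindent
Setting the variable on the command line, as shown, affects only that one
invocation of @command{grep}.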
1106
1107
1108@node Exit Status
1109@section Exit Status
1110@cindex exit status
1111@cindex return status
1112
1113Normally the exit status is 0 if a line is selected, 1 if no lines
1114were selected, and 2 if an error occurred.  However, if the
@option{-L} or @option{--files-without-match} option is used, the exit status
1116is 0 if a file is listed, 1 if no files were listed, and 2 if an error
1117occurred.  Also, if the
1118@option{-q} or @option{--quiet} or @option{--silent} option is used
1119and a line is selected, the exit status is 0 even if an error
1120occurred.  Other @command{grep} implementations may exit with status
1121greater than 2 on error.
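
The three normal exit statuses can be observed with a short sketch such as
the following.
The input text and the file name @file{./no-such-file} are assumptions made
for illustration; the file is assumed not to exist.

@example
# 0: at least one line was selected.
status=0
printf 'hello\n' | grep 'hello' >/dev/null || status=$?
echo "match: $status"

# 1: no lines were selected.
status=0
printf 'hello\n' | grep 'goodbye' >/dev/null || status=$?
echo "no match: $status"

# 2: an error occurred (the named file is assumed not to exist).
status=0
grep 'hello' ./no-such-file 2>/dev/null || status=$?
echo "error: $status"
@end example

@noindent
With the inputs shown, the three @command{echo} commands report statuses
0, 1, and 2 respectively.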
1122
1123@node grep Programs
1124@section @command{grep} Programs
1125@cindex @command{grep} programs
1126@cindex variants of @command{grep}
1127
1128@command{grep} searches the named input files
1129for lines containing a match to the given patterns.
1130By default, @command{grep} prints the matching lines.
1131A file named @file{-} stands for standard input.
1132If no input is specified, @command{grep} searches the working
1133directory @file{.} if given a command-line option specifying
1134recursion; otherwise, @command{grep} searches standard input.
1135There are four major variants of @command{grep},
1136controlled by the following options.
1137
1138@table @option
1139
1140@item -G
1141@itemx --basic-regexp
1142@opindex -G
1143@opindex --basic-regexp
1144@cindex matching basic regular expressions
1145Interpret patterns as basic regular expressions (BREs).
1146This is the default.
1147
1148@item -E
1149@itemx --extended-regexp
1150@opindex -E
1151@opindex --extended-regexp
1152@cindex matching extended regular expressions
1153Interpret patterns as extended regular expressions (EREs).
1154(@option{-E} is specified by POSIX.)
1155
1156@item -F
1157@itemx --fixed-strings
1158@opindex -F
1159@opindex --fixed-strings
1160@cindex matching fixed strings
1161Interpret patterns as fixed strings, not regular expressions.
1162(@option{-F} is specified by POSIX.)
1163
1164@item -P
1165@itemx --perl-regexp
1166@opindex -P
1167@opindex --perl-regexp
1168@cindex matching Perl-compatible regular expressions
1169Interpret patterns as Perl-compatible regular expressions (PCREs).
1170PCRE support is here to stay, but consider this option experimental when
1171combined with the @option{-z} (@option{--null-data}) option, and note that
1172@samp{grep@ -P} may warn of unimplemented features.
1173@xref{Other Options}.
1174
1175@end table
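
As a brief illustration of how these options change the interpretation of
the same pattern, consider the following sketch.
The sample input lines are assumptions chosen for the example, and
@option{-P} is left out because PCRE support may not be available.

@example
# BRE (default): the period matches any character, so both lines match.
printf 'a.c\nabc\n' | grep 'a.c'

# Fixed strings: the period is literal, so only the first line matches.
printf 'a.c\nabc\n' | grep -F 'a.c'

# ERE: + and | are operators without needing a backslash.
printf 'a.c\nabc\nabbc\n' | grep -E 'ab+c|x'
@end example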
1176
1177In addition,
1178two variant programs @command{egrep} and @command{fgrep} are available.
1179@command{egrep} is the same as @samp{grep@ -E}.
1180@command{fgrep} is the same as @samp{grep@ -F}.
1181Direct invocation as either
1182@command{egrep} or @command{fgrep} is deprecated,
1183but is provided to allow historical applications
1184that rely on them to run unmodified.
1185
1186
1187@node Regular Expressions
1188@chapter Regular Expressions
1189@cindex regular expressions
1190
1191A @dfn{regular expression} is a pattern that describes a set of strings.
1192Regular expressions are constructed analogously to arithmetic expressions,
1193by using various operators to combine smaller expressions.
1194@command{grep} understands
1195three different versions of regular expression syntax:
1196basic (BRE), extended (ERE), and Perl-compatible (PCRE).
1197In GNU @command{grep},
1198there is no difference in available functionality between the basic and
1199extended syntaxes.
1200In other implementations, basic regular expressions are less powerful.
1201The following description applies to extended regular expressions;
1202differences for basic regular expressions are summarized afterwards.
1203Perl-compatible regular expressions give additional functionality, and
1204are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
1205pages, but work only if PCRE is available in the system.
1206
1207@menu
1208* Fundamental Structure::
1209* Character Classes and Bracket Expressions::
1210* The Backslash Character and Special Expressions::
1211* Anchoring::
1212* Back-references and Subexpressions::
1213* Basic vs Extended::
1214@end menu
1215
1216@node Fundamental Structure
1217@section Fundamental Structure
1218
1219The fundamental building blocks are the regular expressions that match
1220a single character.
1221Most characters, including all letters and digits,
1222are regular expressions that match themselves.
1223Any meta-character
1224with special meaning may be quoted by preceding it with a backslash.
1225
1226@opindex .
1227@cindex dot
1228@cindex period
1229The period @samp{.} matches any single character.
1230It is unspecified whether @samp{.} matches an encoding error.
1231
1232A regular expression may be followed by one of several
1233repetition operators:
1234
1235@table @samp
1236
1237@item ?
1238@opindex ?
1239@cindex question mark
1240@cindex match expression at most once
1241The preceding item is optional and will be matched at most once.
1242
1243@item *
1244@opindex *
1245@cindex asterisk
1246@cindex match expression zero or more times
1247The preceding item will be matched zero or more times.
1248
1249@item +
1250@opindex +
1251@cindex plus sign
1252@cindex match expression one or more times
1253The preceding item will be matched one or more times.
1254
1255@item @{@var{n}@}
1256@opindex @{@var{n}@}
1257@cindex braces, one argument
1258@cindex match expression @var{n} times
1259The preceding item is matched exactly @var{n} times.
1260
1261@item @{@var{n},@}
1262@opindex @{@var{n},@}
1263@cindex braces, second argument omitted
1264@cindex match expression @var{n} or more times
1265The preceding item is matched @var{n} or more times.
1266
1267@item @{,@var{m}@}
1268@opindex @{,@var{m}@}
1269@cindex braces, first argument omitted
1270@cindex match expression at most @var{m} times
1271The preceding item is matched at most @var{m} times.
1272This is a GNU extension.
1273
1274@item @{@var{n},@var{m}@}
1275@opindex @{@var{n},@var{m}@}
1276@cindex braces, two arguments
1277@cindex match expression from @var{n} to @var{m} times
1278The preceding item is matched at least @var{n} times, but not more than
1279@var{m} times.
1280
1281@end table
1282
1283The empty regular expression matches the empty string.
1284Two regular expressions may be concatenated;
1285the resulting regular expression
1286matches any string formed by concatenating two substrings
1287that respectively match the concatenated expressions.
1288
1289Two regular expressions may be joined by the infix operator @samp{|};
1290the resulting regular expression
1291matches any string matching either alternate expression.
1292
1293Repetition takes precedence over concatenation,
1294which in turn takes precedence over alternation.
1295A whole expression may be enclosed in parentheses
1296to override these precedence rules and form a subexpression.
1297An unmatched @samp{)} matches just itself.
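
The precedence rules above can be checked with a small sketch like the
following.
The input lines are assumptions for illustration, and the @option{-x}
option is used so that a pattern must match the whole line, which makes
the grouping visible.

@example
# Repetition binds tighter than concatenation: ab* is a(b*), not (ab)*.
printf 'a\nabb\nabab\n' | grep -E -x 'ab*'

# Parentheses override this: (ab)* repeats the two-character group.
printf 'a\nabb\nabab\n' | grep -E -x '(ab)*'

# Concatenation binds tighter than alternation: ab|cd is (ab)|(cd).
printf 'ab\ncd\nacd\nabd\n' | grep -E -x 'ab|cd'
@end example

@noindent
The first command selects @samp{a} and @samp{abb}, the second selects only
@samp{abab}, and the third selects @samp{ab} and @samp{cd}.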
1298
1299@node Character Classes and Bracket Expressions
1300@section Character Classes and Bracket Expressions
1301
1302@cindex bracket expression
1303@cindex character class
1304A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
1305@samp{]}.
1306It matches any single character in that list.
1307If the first character of the list is the caret @samp{^},
1308then it matches any character @strong{not} in the list,
1309and it is unspecified whether it matches an encoding error.
1310For example, the regular expression
1311@samp{[0123456789]} matches any single digit,
1312whereas @samp{[^()]} matches any single character that is not
1313an opening or closing parenthesis, and might or might not match an
1314encoding error.
1315
1316@cindex range expression
1317Within a bracket expression, a @dfn{range expression} consists of two
1318characters separated by a hyphen.
1319It matches any single character that
1320sorts between the two characters, inclusive.
1321In the default C locale, the sorting sequence is the native character
1322order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
1323In other locales, the sorting sequence is not specified, and
1324@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
1325@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
1326characters that it matches might even be erratic.
1327To obtain the traditional interpretation
1328of bracket expressions, you can use the @samp{C} locale by setting the
1329@env{LC_ALL} environment variable to the value @samp{C}.
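
For instance, the following sketch forces the @samp{C} locale for a single
invocation, so that the range covers only the four lower-case letters.
The input lines are assumptions for illustration.

@example
# In the C locale, [a-d] means exactly the characters a, b, c, and d.
printf 'b\nB\ne\n' | LC_ALL=C grep '[a-d]'
@end example

@noindent
This prints only the line @samp{b}.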
1330
1331Finally, certain named classes of characters are predefined within
1332bracket expressions, as follows.
1333Their interpretation depends on the @env{LC_CTYPE} locale;
1334for example, @samp{[[:alnum:]]} means the character class of numbers and letters
1335in the current locale.
1336
1337@cindex classes of characters
1338@cindex character classes
1339@table @samp
1340
1341@item [:alnum:]
1342@opindex alnum @r{character class}
1343@cindex alphanumeric characters
1344Alphanumeric characters:
1345@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
1346character encoding, this is the same as @samp{[0-9A-Za-z]}.
1347
1348@item [:alpha:]
1349@opindex alpha @r{character class}
1350@cindex alphabetic characters
1351Alphabetic characters:
1352@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
1353character encoding, this is the same as @samp{[A-Za-z]}.
1354
1355@item [:blank:]
1356@opindex blank @r{character class}
1357@cindex blank characters
1358Blank characters:
1359space and tab.
1360
1361@item [:cntrl:]
1362@opindex cntrl @r{character class}
1363@cindex control characters
1364Control characters.
1365In ASCII, these characters have octal codes 000
1366through 037, and 177 (DEL).
1367In other character sets, these are
1368the equivalent characters, if any.
1369
1370@item [:digit:]
1371@opindex digit @r{character class}
1372@cindex digit characters
1373@cindex numeric characters
1374Digits: @code{0 1 2 3 4 5 6 7 8 9}.
1375
1376@item [:graph:]
1377@opindex graph @r{character class}
1378@cindex graphic characters
1379Graphical characters:
1380@samp{[:alnum:]} and @samp{[:punct:]}.
1381
1382@item [:lower:]
1383@opindex lower @r{character class}
1384@cindex lower-case letters
1385Lower-case letters; in the @samp{C} locale and ASCII character
1386encoding, this is
1387@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.
1388
1389@item [:print:]
1390@opindex print @r{character class}
1391@cindex printable characters
1392Printable characters:
1393@samp{[:alnum:]}, @samp{[:punct:]}, and space.
1394
1395@item [:punct:]
1396@opindex punct @r{character class}
1397@cindex punctuation characters
1398Punctuation characters; in the @samp{C} locale and ASCII character
1399encoding, this is
1400@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.
1401
1402@item [:space:]
1403@opindex space @r{character class}
1404@cindex space characters
1405@cindex whitespace characters
1406Space characters: in the @samp{C} locale, this is
1407tab, newline, vertical tab, form feed, carriage return, and space.
1408@xref{Usage}, for more discussion of matching newlines.
1409
1410@item [:upper:]
1411@opindex upper @r{character class}
1412@cindex upper-case letters
1413Upper-case letters: in the @samp{C} locale and ASCII character
1414encoding, this is
1415@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.
1416
1417@item [:xdigit:]
1418@opindex xdigit @r{character class}
1419@cindex xdigit class
1420@cindex hexadecimal digits
1421Hexadecimal digits:
1422@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.
1423
1424@end table
1425Note that the brackets in these class names are
1426part of the symbolic names, and must be included in addition to
1427the brackets delimiting the bracket expression.
1428
1429@anchor{invalid-bracket-expr}
If you mistakenly omit the outer brackets, and search for, say, @samp{[:upper:]},
1431GNU @command{grep} prints a diagnostic and exits with status 2, on
1432the assumption that you did not intend to search for the nominally
1433equivalent regular expression: @samp{[:epru]}.
1434Set the @env{POSIXLY_CORRECT} environment variable to disable this feature.
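
The difference between the correct and the mistaken form can be seen in a
sketch such as this one.
The input lines are assumptions for illustration, and the last command
assumes that @env{POSIXLY_CORRECT} is not set in the environment.

@example
# Correct: the class name [:upper:] sits inside a bracket expression.
printf 'abc\nABC\n' | grep '[[:upper:]]'

# Classes can be combined with other list items: a digit or an underscore.
printf 'abc\na_c\na1c\n' | grep 'a[_[:digit:]]c'

# Mistake: outer brackets omitted.  GNU grep prints a diagnostic
# and the saved exit status is 2.
status=0
printf 'abc\nABC\n' | grep '[:upper:]' || status=$?
echo "status: $status"
@end example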
1435
1436Most meta-characters lose their special meaning inside bracket expressions.
1437
1438@table @samp
1439@item ]
1440ends the bracket expression if it's not the first list item.
1441So, if you want to make the @samp{]} character a list item,
1442you must put it first.
1443
1444@item [.
1445represents the open collating symbol.
1446
1447@item .]
1448represents the close collating symbol.
1449
1450@item [=
1451represents the open equivalence class.
1452
1453@item =]
1454represents the close equivalence class.
1455
1456@item [:
1457represents the open character class symbol, and should be followed by a
1458valid character class name.
1459
1460@item :]
1461represents the close character class symbol.
1462
1463@item -
1464represents the range if it's not first or last in a list or the ending point
1465of a range.
1466
1467@item ^
1468represents the characters not in the list.
1469If you want to make the @samp{^}
1470character a list item, place it anywhere but first.
1471
1472@end table
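
The placement rules in this table can be combined in one bracket
expression, as in the following sketch; the input lines are assumptions
for illustration.

@example
# ] first, ^ not first, and - last are all ordinary list items here,
# so the bracket expression matches one of the characters ] ^ -.
printf 'a]b\na^b\na-b\naxb\n' | grep 'a[]^-]b'
@end example

@noindent
This selects the first three input lines but not @samp{axb}.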
1473
1474@node The Backslash Character and Special Expressions
1475@section The Backslash Character and Special Expressions
1476@cindex backslash
1477
1478The @samp{\} character,
1479when followed by certain ordinary characters,
1480takes a special meaning:
1481
1482@table @samp
1483
1484@item \b
1485Match the empty string at the edge of a word.
1486
1487@item \B
1488Match the empty string provided it's not at the edge of a word.
1489
1490@item \<
Match the empty string at the beginning of a word.
1492
1493@item \>
Match the empty string at the end of a word.
1495
1496@item \w
Match a word constituent; this is a synonym for @samp{[_[:alnum:]]}.
1498
1499@item \W
Match a non-word constituent; this is a synonym for @samp{[^_[:alnum:]]}.
1501
1502@item \s
Match whitespace; this is a synonym for @samp{[[:space:]]}.
1504
1505@item \S
Match non-whitespace; this is a synonym for @samp{[^[:space:]]}.
1507
1508@end table
1509
For example, @samp{\brat\b} matches the separate word @samp{rat},
whereas @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}.
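
The word-edge operators can be tried out with a sketch such as the
following, which reuses the sample words from the text; the remaining
input is an assumption for illustration.

@example
# \b matches at either edge of a word.
printf 'rat\ncrate\nfurry rat\n' | grep '\brat\b'

# \B matches only where there is no word edge.
printf 'rat\ncrate\nfurry rat\n' | grep '\Brat\B'

# \w matches a word constituent: a letter, digit, or underscore.
printf 'a_1\n+-*\n' | grep '\w'
@end example

@noindent
The first command selects @samp{rat} and @samp{furry rat}, the second
selects only @samp{crate}, and the third selects only @samp{a_1}.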
1512
1513@node Anchoring
1514@section Anchoring
1515@cindex anchoring
1516
1517The caret @samp{^} and the dollar sign @samp{$} are meta-characters that
1518respectively match the empty string at the beginning and end of a line.
1519They are termed @dfn{anchors}, since they force the match to be ``anchored''
1520to beginning or end of a line, respectively.
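
A short sketch of both anchors, using input lines that are assumptions
for illustration:

@example
# Only the line consisting of exactly foo matches ^foo$.
printf 'foo\nfoobar\nbarfoo\n' | grep '^foo$'

# ^foo matches at the start of a line; foo$ matches at the end.
printf 'foo\nfoobar\nbarfoo\n' | grep '^foo'
printf 'foo\nfoobar\nbarfoo\n' | grep 'foo$'
@end example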
1521
1522@node Back-references and Subexpressions
1523@section Back-references and Subexpressions
1524@cindex subexpression
1525@cindex back-reference
1526
1527The back-reference @samp{\@var{n}}, where @var{n} is a single digit, matches
1528the substring previously matched by the @var{n}th parenthesized subexpression
1529of the regular expression.
1530For example, @samp{(a)\1} matches @samp{aa}.
1531When used with alternation, if the group does not participate in the match then
1532the back-reference makes the whole match fail.
1533For example, @samp{a(.)|b\1}
1534will not match @samp{ba}.
1535When multiple regular expressions are given with
1536@option{-e} or from a file (@samp{-f @var{file}}),
1537back-references are local to each expression.
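
The following sketch shows two simple back-references; the input lines
are assumptions for illustration.

@example
# \1 must repeat whatever the first group matched.
printf 'abab\nabba\n' | grep -E -x '(ab)\1'

# Any doubled character: the group captures one character, \1 repeats it.
printf 'hello\nworld\n' | grep -E '(.)\1'
@end example

@noindent
The first command selects only @samp{abab}, and the second selects only
@samp{hello}, because of its doubled @samp{l}.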
1538
1539@xref{Known Bugs}, for some known problems with back-references.
1540
1541@node Basic vs Extended
1542@section Basic vs Extended Regular Expressions
1543@cindex basic regular expressions
1544
1545In basic regular expressions the meta-characters @samp{?}, @samp{+},
1546@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning;
1547instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{},
1548@samp{\|}, @samp{\(}, and @samp{\)}.
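
The difference can be seen by running the same input through both
syntaxes, as in this sketch; the input lines are assumptions for
illustration.

@example
# BRE: | is an ordinary character, so only the literal string a|b matches.
printf 'a|b\na\nb\n' | grep 'a|b'

# BRE with a backslash, or ERE without one, gives alternation.
printf 'a|b\na\nb\n' | grep 'a\|b'
printf 'a|b\na\nb\n' | grep -E 'a|b'
@end example

@noindent
The first command selects one line, whereas each of the other two selects
all three lines.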
1549
1550@cindex interval specifications
1551Traditional @command{egrep} did not support the @samp{@{} meta-character,
1552and some @command{egrep} implementations support @samp{\@{} instead, so
1553portable scripts should avoid @samp{@{} in @samp{grep@ -E} patterns and
1554should use @samp{[@{]} to match a literal @samp{@{}.
1555
1556GNU @command{grep@ -E} attempts to support traditional usage by
1557assuming that @samp{@{} is not special if it would be the start of an
1558invalid interval specification.
1559For example, the command
1560@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1}
1561instead of reporting a syntax error in the regular expression.
1562POSIX allows this behavior as an extension, but portable scripts
1563should avoid it.
1564
1565
1566@node Usage
1567@chapter Usage
1568
1569@cindex usage, examples
1570Here is an example command that invokes GNU @command{grep}:
1571
@example
grep -i 'hello.*world' menu.h main.c
@end example
1575
1576@noindent
1577This lists all lines in the files @file{menu.h} and @file{main.c} that
1578contain the string @samp{hello} followed by the string @samp{world};
1579this is because @samp{.*} matches zero or more characters within a line.
1580@xref{Regular Expressions}.
1581The @option{-i} option causes @command{grep}
1582to ignore case, causing it to match the line @samp{Hello, world!}, which
1583it would not otherwise match.
1584
1585Here is a more complex example session,
1586showing the location and contents of any line
1587containing @samp{f} and ending in @samp{.c},
1588within all files in the current directory whose names
1589contain @samp{g} and end in @samp{.h}.
1590The @option{-n} option outputs line numbers, the @option{--} argument
1591treats any later arguments starting with @samp{-} as file names not
1592options, and the empty file @file{/dev/null} causes file names to be output
1593even if only one file name happens to be of the form @samp{*g*.h}.
1594
@example
$ @kbd{grep -n -- 'f.*\.c$' *g*.h /dev/null}
argmatch.h:1:/* definitions and prototypes for argmatch.c
@end example
1599
1600@noindent
1601The only line that contains a match is line 1 of @file{argmatch.h}.
1602Note that the regular expression syntax used in the pattern differs
1603from the globbing syntax that the shell uses to match file names.
1604
1605@xref{Invoking}, for more details about
1606how to invoke @command{grep}.
1607
1608@cindex using @command{grep}, Q&A
1609@cindex FAQ about @command{grep} usage
1610Here are some common questions and answers about @command{grep} usage.
1611
1612@enumerate
1613
1614@item
1615How can I list just the names of matching files?
1616
@example
grep -l 'main' test-*.c
@end example
1620
1621@noindent
1622lists names of @samp{test-*.c} files in the current directory whose contents
1623mention @samp{main}.
1624
1625@item
1626How do I search directories recursively?
1627
@example
grep -r 'hello' /home/gigi
@end example
1631
1632@noindent
1633searches for @samp{hello} in all files
1634under the @file{/home/gigi} directory.
1635For more control over which files are searched,
1636use @command{find} and @command{grep}.
1637For example, the following command searches only C files:
1638
@example
find /home/gigi -name '*.c' ! -type d \
  -exec grep -H 'hello' '@{@}' +
@end example
1643
1644This differs from the command:
1645
@example
grep -H 'hello' /home/gigi/*.c
@end example
1649
1650which merely looks for @samp{hello} in non-hidden C files in
1651@file{/home/gigi} whose names end in @samp{.c}.
1652The @command{find} command line above is more similar to the command:
1653
@example
grep -r --include='*.c' 'hello' /home/gigi
@end example
1657
1658@item
1659What if a pattern or file has a leading @samp{-}?
1660
@example
grep -- '--cut here--' *
@end example
1664
1665@noindent
1666searches for all lines matching @samp{--cut here--}.
1667Without @option{--},
1668@command{grep} would attempt to parse @samp{--cut here--} as a list of
1669options, and there would be similar problems with any file names
1670beginning with @samp{-}.
1671
1672Alternatively, you can prevent misinterpretation of leading @samp{-}
1673by using @option{-e} for patterns and leading @samp{./} for files:
1674
@example
grep -e '--cut here--' ./*
@end example
1678
1679@item
1680Suppose I want to search for a whole word, not a part of a word?
1681
@example
grep -w 'hello' test*.log
@end example
1685
1686@noindent
1687searches only for instances of @samp{hello} that are entire words;
1688it does not match @samp{Othello}.
1689For more control, use @samp{\<} and
1690@samp{\>} to match the start and end of words.
1691For example:
1692
@example
grep 'hello\>' test*.log
@end example
1696
1697@noindent
1698searches only for words ending in @samp{hello}, so it matches the word
1699@samp{Othello}.
1700
1701@item
1702How do I output context around the matching lines?
1703
@example
grep -C 2 'hello' test*.log
@end example
1707
1708@noindent
1709prints two lines of context around each matching line.
1710
1711@item
1712How do I force @command{grep} to print the name of the file?
1713
1714Append @file{/dev/null}:
1715
@example
grep 'eli' /etc/passwd /dev/null
@end example
1719
1720gets you:
1721
@example
/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
@end example
1725
1726Alternatively, use @option{-H}, which is a GNU extension:
1727
@example
grep -H 'eli' /etc/passwd
@end example
1731
1732@item
1733Why do people use strange regular expressions on @command{ps} output?
1734
@example
ps -ef | grep '[c]ron'
@end example
1738
1739If the pattern had been written without the square brackets, it would
1740have matched not only the @command{ps} output line for @command{cron},
1741but also the @command{ps} output line for @command{grep}.
1742Note that on some platforms,
1743@command{ps} limits the output to the width of the screen;
1744@command{grep} does not have any limit on the length of a line
1745except the available memory.

@item
Why does @command{grep} report ``Binary file matches''?

If @command{grep} listed all matching ``lines'' from a binary file, it
would probably generate output that is not useful, and it might even
muck up your display.
So GNU @command{grep} suppresses output from
files that appear to be binary files.
To force GNU @command{grep}
to output lines even from files that appear to be binary, use the
@option{-a} or @samp{--binary-files=text} option.
To eliminate the
``Binary file matches'' messages, use the @option{-I} or
@samp{--binary-files=without-match} option.
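
The sketch below tries both options on a small temporary file that
contains a null byte, which is assumed here to be enough for GNU
@command{grep} to treat the file as binary:

```shell
# Temporary file with an embedded null byte (hypothetical test data).
tmp=$(mktemp)
printf 'hello\000world\n' > "$tmp"

# -a treats the file as text; -o prints only the matched part.
as_text=$(grep -a -o 'world' "$tmp")

# -I treats the file as if it had no match, so nothing is printed
# and grep exits with a nonzero status, which is ignored here.
skipped=$(grep -I 'world' "$tmp" || true)

printf 'with -a: %s\n' "$as_text"
printf 'with -I: %s\n' "$skipped"
rm -f "$tmp"
```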

@item
Why doesn't @samp{grep -lv} print non-matching file names?

@samp{grep -lv} lists the names of all files containing one or more
lines that do not match.
To list the names of all files that contain no
matching lines, use the @option{-L} or @option{--files-without-match}
option.
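
The difference is easiest to see on three small files.  The file names
and contents in this sketch are hypothetical:

```shell
# Three hypothetical files in a scratch directory.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/all.txt"
printf 'hello\nworld\n' > "$dir/mixed.txt"
printf 'world\n' > "$dir/none.txt"

# -lv: files with at least one line that does not match.
lv=$(cd "$dir" && grep -lv 'hello' all.txt mixed.txt none.txt)

# -L: files with no matching line at all (exit status ignored here).
big_l=$(cd "$dir" && grep -L 'hello' all.txt mixed.txt none.txt || true)

printf 'grep -lv:\n%s\n' "$lv"
printf 'grep -L:\n%s\n' "$big_l"
rm -rf "$dir"
```

Only the file in which no line matches is reported by @option{-L},
whereas @samp{grep -lv} also reports the file that mixes matching and
non-matching lines.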

@item
I can do ``OR'' with @samp{|}, but what about ``AND''?

@example
grep 'paul' /etc/motd | grep 'franc,ois'
@end example

@noindent
finds all lines that contain both @samp{paul} and @samp{franc,ois}.
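
If a single @command{grep} invocation is preferred, the same ``AND''
can be approximated with @option{-E} and an alternation that allows
either order.  This is a sketch with made-up names; unlike the
pipeline, it requires that the two strings do not overlap in the line:

```shell
# Hypothetical input: two lines contain both names, one contains only one.
text=$(printf 'paul and anna\nanna only\nanna then paul\n')

# Pipeline form: the second grep filters the output of the first.
piped=$(printf '%s\n' "$text" | grep 'paul' | grep 'anna')

# Single-pattern form: either order, joined by an alternation.
single=$(printf '%s\n' "$text" | grep -E 'paul.*anna|anna.*paul')

printf 'piped:\n%s\n' "$piped"
printf 'single:\n%s\n' "$single"
```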

@item
Why does the empty pattern match every input line?

The @command{grep} command searches for lines that contain strings
that match a pattern.  Every line contains the empty string, so an
empty pattern causes @command{grep} to find a match on each line.  It
is not the only such pattern: @samp{^}, @samp{$}, and many
other patterns cause @command{grep} to match every line.

To match empty lines, use the pattern @samp{^$}.  To match blank
lines, use the pattern @samp{^[[:blank:]]*$}.  To match no lines at
all, use the command @samp{grep -f /dev/null}.

@item
How can I search in both standard input and in files?

Use the special file name @samp{-}:

@example
cat /etc/passwd | grep 'alain' - /etc/motd
@end example
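
When several inputs are searched, GNU @command{grep} prefixes each
match with its source.  The @option{--label} option chooses the name
shown for standard input, as in this sketch that uses a hypothetical
temporary file:

```shell
# Hypothetical file to search alongside standard input.
tmp=$(mktemp)
printf 'alain in a file\n' > "$tmp"

# "-" stands for standard input; --label names it in the output.
out=$(printf 'alain on stdin\n' | grep --label=stdin 'alain' - "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```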

@item
Why is this back-reference failing?

@example
echo 'ba' | grep -E '(a)\1|b\1'
@end example

This gives no output, because the first alternate @samp{(a)\1} does not match,
as there is no @samp{aa} in the input, so the @samp{\1} in the second alternate
has nothing to refer back to, meaning it will never match anything.
(The second alternate in this example can only match
if the first alternate has matched---making the second one superfluous.)

@item
How can I match across lines?

Standard grep cannot do this, as it is fundamentally line-based.
Therefore, merely using the @code{[:space:]} character class does not
match newlines in the way you might expect.

With the GNU @command{grep} option @option{-z} (@option{--null-data}), each
input and output ``line'' is null-terminated; @pxref{Other Options}.  Thus,
you can match newlines in the input, but typically if there is a match
the entire input is output, so this usage is often combined with
output-suppressing options like @option{-q}, e.g.:

@example
printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
@end example

If this does not suffice, you can transform the input
before giving it to @command{grep}, or turn to @command{awk},
@command{sed}, @command{perl}, or many other utilities that are
designed to operate across lines.
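
Because @option{-q} prints nothing, the result of such a search is read
from the exit status.  A minimal sketch, reusing the input from the
@option{-z} example above and assuming GNU @command{grep}:

```shell
# With -z the whole input is one record, so the pattern can span the newline.
if printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'; then
  across=yes
else
  across=no
fi

# Without -z each line is matched separately and the pattern cannot match.
if printf 'foo\nbar\n' | grep -q 'foo[[:space:]]\+bar'; then
  perline=yes
else
  perline=no
fi

printf 'with -z: %s, without -z: %s\n' "$across" "$perline"
```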

@item
What do @command{grep}, @command{fgrep}, and @command{egrep} stand for?

The name @command{grep} comes from the way line editing was done on Unix.
For example,
@command{ed} uses the following syntax
to print a list of matching lines on the screen:

@example
global/regular expression/print
g/re/p
@end example

@command{fgrep} stands for Fixed @command{grep};
@command{egrep} stands for Extended @command{grep}.

@end enumerate


@node Performance
@chapter Performance

@cindex performance
Typically @command{grep} is an efficient way to search text.  However,
it can be quite slow in some cases, and it can search large files
where even minor performance tweaking can help significantly.
Although the algorithm used by @command{grep} is an implementation
detail that can change from release to release, understanding its
basic strengths and weaknesses can help you improve its performance.

The @command{grep} command operates partly via a set of automata that
are designed for efficiency, and partly via a slower matcher that
takes over when the fast matchers run into unusual features like
back-references.  When feasible, the Boyer--Moore fast string
searching algorithm is used to match a single fixed pattern, and the
Aho--Corasick algorithm is used to match multiple fixed patterns.
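
Fixed patterns are requested with @option{-F}, and several of them can
be given with repeated @option{-e} options.  Which algorithm
@command{grep} then picks is an implementation detail; the sketch below
only shows the invocation, on made-up input:

```shell
# With -F the dot and the star are ordinary characters, not operators.
hits=$(printf 'a.c\nabc\nx*y\nxy\n' | grep -F -e 'a.c' -e 'x*y')

printf '%s\n' "$hits"
```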

@cindex locales
Generally speaking @command{grep} operates more efficiently in
single-byte locales, since it can avoid the special processing needed
for multi-byte characters.  If your patterns will work just as well
that way, setting @env{LC_ALL} to a single-byte locale can help
performance considerably.  Setting @samp{LC_ALL='C'} can be
particularly efficient, as @command{grep} is tuned for that locale.
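
The locale can be set for a single invocation by prefixing the command
with the assignment.  A minimal sketch on made-up ASCII input, for
which the @samp{C} locale gives the same matches as any other:

```shell
# LC_ALL applies to this grep only; the rest of the shell is unaffected.
count=$(printf 'alpha\nbeta\nalphabet\n' | LC_ALL=C grep -c 'alpha')

printf '%s\n' "$count"
```

Whether this is measurably faster depends on the input and on the
@command{grep} release, so it is worth timing on real data.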

@cindex case insensitive search
Outside the @samp{C} locale, case-insensitive search, and search for
bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.

@cindex back-references
A back-reference such as @samp{\1} can hurt performance significantly
in some cases, since back-references cannot in general be implemented
via a finite state automaton, and instead trigger a backtracking
algorithm that can be quite inefficient.  For example, although the
pattern @samp{^(.*)\1@{14@}(.*)\2@{13@}$} matches only lines whose
lengths can be written as a sum @math{15x + 14y} for nonnegative
integers @math{x} and @math{y}, the pattern matcher does not perform
linear Diophantine analysis and instead backtracks through all
possible matching strings, using an algorithm that is exponential in
the worst case.

@cindex holes in files
On some operating systems that support files with holes---large
regions of zeros that are not physically present on secondary
storage---@command{grep} can skip over the holes efficiently without
needing to read the zeros.  This optimization is not available if the
@option{-a} (@option{--binary-files=text}) option is used (@pxref{File and
Directory Selection}), unless the @option{-z} (@option{--null-data})
option is also used (@pxref{Other Options}).

For more about the algorithms used by @command{grep} and about
related string matching algorithms, see:

@frenchspacing on
@itemize @bullet
@item
Aho AV. Algorithms for finding patterns in strings.
In: van Leeuwen J. @emph{Handbook of Theoretical Computer Science}, vol. A.
New York: Elsevier; 1990. p. 255--300.
This surveys classic string matching algorithms, some of which are
used by @command{grep}.

@item
Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
@emph{CACM}. 1975;18(6):333--40.
@url{https://dx.doi.org/10.1145/360825.360855}.
This introduces the Aho--Corasick algorithm.

@item
Boyer RS, Moore JS. A fast string searching algorithm.
@emph{CACM}. 1977;20(10):762--72.
@url{https://dx.doi.org/10.1145/359842.359859}.
This introduces the Boyer--Moore algorithm.

@item
Faro S, Lecroq T. The exact online string matching problem: a review
of the most recent results.
@emph{ACM Comput Surv}. 2013;45(2):13.
@url{https://dx.doi.org/10.1145/2431211.2431212}.
This surveys string matching algorithms that might help improve the
performance of @command{grep} in the future.
@end itemize
@frenchspacing off

@node Reporting Bugs
@chapter Reporting bugs

@cindex bugs, reporting
Bug reports can be found at the
@url{https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep,
GNU bug report logs for @command{grep}}.
If you find a bug not listed there, please email it to
@email{bug-grep@@gnu.org} to create a new bug report.

@menu
* Known Bugs::
@end menu

@node Known Bugs
@section Known Bugs
@cindex Bugs, known

Large repetition counts in the @samp{@{n,m@}} construct may cause
@command{grep} to use lots of memory.
In addition, certain other
obscure regular expressions require exponential time and
space, and may cause @command{grep} to run out of memory.

Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore.  Also, the POSIX specification for
back-references is at times unclear.  Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority---for example, as of 2019 the GNU C
library bug database contained back-reference bugs 52, 10844, 11053,
and 25322, with little sign of forthcoming fixes.  Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.


@node Copying
@chapter Copying
@cindex copying

GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free
software}.

The ``free'' in ``free software'' refers to liberty, not price.  As
some GNU project advocates like to point out, think of ``free speech''
rather than ``free beer''.  In short, you have the right (freedom) to
run and change @command{grep} and distribute it to other people, and---if you
want---charge money for doing either.  The important restriction is
that you have to grant your recipients the same rights and impose the
same restrictions.

This general method of licensing software is sometimes called
@dfn{open source}.  The GNU project prefers the term ``free software''
for reasons outlined at
@url{https://www.gnu.org/philosophy/open-source-misses-the-point.html}.

This manual is free documentation in the same sense.  The
documentation license is included below.  The license for the program
is available with the source code, or at
@url{https://www.gnu.org/licenses/gpl.html}.

@menu
* GNU Free Documentation License::
@end menu

@node GNU Free Documentation License
@section GNU Free Documentation License

@include fdl.texi


@node Index
@unnumbered Index

@printindex cp

@bye