\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename grep.info
@include version.texi
@settitle GNU Grep @value{VERSION}

@c Combine indices.
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@defcodeindex op
@syncodeindex op cp
@syncodeindex vr cp
@c %**end of header

@documentencoding UTF-8
@c These two require Texinfo 5.0 or later, so use the older
@c equivalent @set variables supported in 4.11 and later.
@ignore
@codequotebacktick on
@codequoteundirected on
@end ignore
@set txicodequoteundirected
@set txicodequotebacktick
@iftex
@c TeX sometimes fails to hyphenate, so help it here.
@hyphenation{spec-i-fied}
@end iftex

@copying
This manual is for @command{grep}, a pattern matching engine.

Copyright @copyright{} 1999--2002, 2005, 2008--2020 Free Software Foundation,
Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@end quotation
@end copying

@dircategory Text creation and manipulation
@direntry
* grep: (grep).                 Print lines that match patterns.
@end direntry

@titlepage
@title GNU Grep: Print lines that match patterns
@subtitle version @value{VERSION}, @value{UPDATED}
@author Alain Magloire et al.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@contents


@ifnottex
@node Top
@top grep

@command{grep} prints lines that contain a match for one or more patterns.

This manual is for version @value{VERSION} of GNU Grep.

@insertcopying
@end ifnottex

@menu
* Introduction::                Introduction.
* Invoking::                    Command-line options, environment, exit status.
* Regular Expressions::         Regular Expressions.
* Usage::                       Examples.
* Performance::                 Performance tuning.
* Reporting Bugs::              Reporting Bugs.
* Copying::                     License terms for this manual.
* Index::                       Combined index.
@end menu


@node Introduction
@chapter Introduction

@cindex searching for patterns

Given one or more patterns, @command{grep} searches input files
for matches to the patterns.
When it finds a match in a line,
it copies the line to standard output (by default),
or produces whatever other sort of output you have requested with options.

Though @command{grep} expects to do the matching on text,
it has no limits on input line length other than available memory,
and it can match arbitrary characters within a line.
If the final byte of an input file is not a newline,
@command{grep} silently supplies one.
Since newline is also a separator for the list of patterns,
there is no way to match newline characters in a text.


@node Invoking
@chapter Invoking @command{grep}

The general synopsis of the @command{grep} command line is

@example
grep [@var{option}...] [@var{patterns}] [@var{file}...]
@end example

@noindent
There can be zero or more @var{option} arguments, and zero or more
@var{file} arguments. The @var{patterns} argument contains one or
more patterns separated by newlines, and is omitted when patterns are
given via the @samp{-e@ @var{patterns}} or @samp{-f@ @var{file}}
options. Typically @var{patterns} should be quoted when
@command{grep} is used in a shell command.

@menu
* Command-line Options::        Short and long names, grouped by category.
* Environment Variables::       POSIX, GNU generic, and GNU grep specific.
* Exit Status::                 Exit status returned by @command{grep}.
* grep Programs::               @command{grep} programs.
@end menu

@node Command-line Options
@section Command-line Options

@command{grep} comes with a rich set of options:
some from POSIX and some being GNU extensions.
Long option names are always a GNU extension,
even for options that are from POSIX specifications.
Options that are specified by POSIX,
under their short names,
are explicitly marked as such
to facilitate POSIX-portable programming.
A few option names are provided
for compatibility with older or more exotic implementations.

@menu
* Generic Program Information::
* Matching Control::
* General Output Control::
* Output Line Prefix Control::
* Context Line Control::
* File and Directory Selection::
* Other Options::
@end menu

Several additional options control
which variant of the @command{grep} matching engine is used.
@xref{grep Programs}.

@node Generic Program Information
@subsection Generic Program Information

@table @option

@item --help
@opindex --help
@cindex usage summary, printing
Print a usage message briefly summarizing the command-line options
and the bug-reporting address, then exit.

@item -V
@itemx --version
@opindex -V
@opindex --version
@cindex version, printing
Print the version number of @command{grep} to the standard output stream.
This version number should be included in all bug reports.

@end table

@node Matching Control
@subsection Matching Control

@table @option

@item -e @var{patterns}
@itemx --regexp=@var{patterns}
@opindex -e
@opindex --regexp=@var{patterns}
@cindex patterns option
Use @var{patterns} as one or more patterns; newlines within
@var{patterns} separate each pattern from the next.
If this option is used multiple times or is combined with the
@option{-f} (@option{--file}) option, search for all patterns given.
Typically @var{patterns} should be quoted when @command{grep} is used
in a shell command.
(@option{-e} is specified by POSIX.)

@item -f @var{file}
@itemx --file=@var{file}
@opindex -f
@opindex --file
@cindex patterns from file
Obtain patterns from @var{file}, one per line.
If this option is used multiple times or is combined with the
@option{-e} (@option{--regexp}) option, search for all patterns given.
The empty file contains zero patterns, and therefore matches nothing.
(@option{-f} is specified by POSIX.)

@item -i
@itemx -y
@itemx --ignore-case
@opindex -i
@opindex -y
@opindex --ignore-case
@cindex case insensitive search
Ignore case distinctions in patterns and input data,
so that characters that differ only in case
match each other. Although this is straightforward when letters
differ in case only via lowercase-uppercase pairs, the behavior is
unspecified in other situations. For example, uppercase ``S'' has an
unusual lowercase counterpart ``ſ'' (Unicode character U+017F, LATIN
SMALL LETTER LONG S) in many locales, and it is unspecified whether
this unusual character matches ``S'' or ``s'' even though uppercasing
it yields ``S''. Another example: the lowercase German letter ``ß''
(U+00DF, LATIN SMALL LETTER SHARP S) is normally capitalized as the
two-character string ``SS'' but it does not match ``SS'', and it might
not match the uppercase letter ``ẞ'' (U+1E9E, LATIN CAPITAL LETTER
SHARP S) even though lowercasing the latter yields the former.

@option{-y} is an obsolete synonym that is provided for compatibility.
(@option{-i} is specified by POSIX.)

@item --no-ignore-case
@opindex --no-ignore-case
Do not ignore case distinctions in patterns and input data. This is
the default.
This option is useful for passing to shell scripts that
already use @option{-i}, in order to cancel its effects because the
two options override each other.

@item -v
@itemx --invert-match
@opindex -v
@opindex --invert-match
@cindex invert matching
@cindex print non-matching lines
Invert the sense of matching, to select non-matching lines.
(@option{-v} is specified by POSIX.)

@item -w
@itemx --word-regexp
@opindex -w
@opindex --word-regexp
@cindex matching whole words
Select only those lines containing matches that form whole words.
The test is that the matching substring must either
be at the beginning of the line,
or preceded by a non-word constituent character.
Similarly,
it must be either at the end of the line
or followed by a non-word constituent character.
Word constituent characters are letters, digits, and the underscore.
This option has no effect if @option{-x} is also specified.

Because the @option{-w} option can match a substring that does not
begin and end with word constituents, it differs from surrounding a
regular expression with @samp{\<} and @samp{\>}. For example, although
@samp{grep -w @@} matches a line containing only @samp{@@}, @samp{grep
'\<@@\>'} cannot match any line because @samp{@@} is not a
word constituent. @xref{The Backslash Character and Special
Expressions}.

@item -x
@itemx --line-regexp
@opindex -x
@opindex --line-regexp
@cindex match the whole line
Select only those matches that exactly match the whole line.
For regular expression patterns, this is like parenthesizing each
pattern and then surrounding it with @samp{^} and @samp{$}.
(@option{-x} is specified by POSIX.)
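
As a rough sketch of how @option{-x} compares with @option{-w}
(the three-line input and the shell variable names are illustrative
assumptions, not part of @command{grep}),
the following commands search the same input both ways:

@example
# Hypothetical input: three lines that all contain "foo".
input=$(printf 'foo\nfoo bar\nfoobar\n')

# -x selects only the line that is exactly "foo".
whole_line=$(printf '%s\n' "$input" | grep -x 'foo')

# -w also selects "foo bar", where a space follows the match.
whole_word=$(printf '%s\n' "$input" | grep -w 'foo')

printf '%s\n' "$whole_line" "$whole_word"
@end example

Neither command selects @samp{foobar}: there the match is followed
by a word constituent and does not span the whole line.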

@end table

@node General Output Control
@subsection General Output Control

@table @option

@item -c
@itemx --count
@opindex -c
@opindex --count
@cindex counting lines
Suppress normal output;
instead print a count of matching lines for each input file.
With the @option{-v} (@option{--invert-match}) option,
count non-matching lines.
(@option{-c} is specified by POSIX.)

@item --color[=@var{WHEN}]
@itemx --colour[=@var{WHEN}]
@opindex --color
@opindex --colour
@cindex highlight, color, colour
Surround the matched (non-empty) strings, matching lines, context lines,
file names, line numbers, byte offsets, and separators (for fields and
groups of context lines) with escape sequences to display them in color
on the terminal.
The colors are defined by the environment variable @env{GREP_COLORS}
and default to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
for bold red matched text, magenta file names, green line numbers,
green byte offsets, cyan separators, and default terminal colors otherwise.
The deprecated environment variable @env{GREP_COLOR} is still supported,
but its setting does not have priority;
it defaults to @samp{01;31} (bold red)
which only covers the color for matched text.
@var{WHEN} is @samp{never}, @samp{always}, or @samp{auto}.

@item -L
@itemx --files-without-match
@opindex -L
@opindex --files-without-match
@cindex files which don't match
Suppress normal output;
instead print the name of each input file from which
no output would normally have been printed.
The scanning of each file stops on the first match.

@item -l
@itemx --files-with-matches
@opindex -l
@opindex --files-with-matches
@cindex names of matching files
Suppress normal output;
instead print the name of each input file from which
output would normally have been printed.
The scanning of each file stops on the first match.
(@option{-l} is specified by POSIX.)

@item -m @var{num}
@itemx --max-count=@var{num}
@opindex -m
@opindex --max-count
@cindex max-count
Stop after the first @var{num} selected lines.
If the input is standard input from a regular file,
and @var{num} selected lines are output,
@command{grep} ensures that the standard input is positioned
just after the last selected line before exiting,
regardless of the presence of trailing context lines.
This enables a calling process to resume a search.
For example, the following shell script makes use of it:

@example
while grep -m 1 'PATTERN'
do
  echo xxxx
done < FILE
@end example

But the following probably will not work because a pipe is not a regular
file:

@example
# This probably will not work.
cat FILE |
while grep -m 1 'PATTERN'
do
  echo xxxx
done
@end example

@cindex context lines
When @command{grep} stops after @var{num} selected lines,
it outputs any trailing context lines.
When the @option{-c} or @option{--count} option is also used,
@command{grep} does not output a count greater than @var{num}.
When the @option{-v} or @option{--invert-match} option is also used,
@command{grep} stops after outputting @var{num} non-matching lines.

@item -o
@itemx --only-matching
@opindex -o
@opindex --only-matching
@cindex only matching
Print only the matched (non-empty) parts of matching lines,
with each such part on a separate output line.
Output lines use the same delimiters as input, and delimiters are null
bytes if @option{-z} (@option{--null-data}) is also used (@pxref{Other
Options}).

@item -q
@itemx --quiet
@itemx --silent
@opindex -q
@opindex --quiet
@opindex --silent
@cindex quiet, silent
Quiet; do not write anything to standard output.
Exit immediately with zero status if any match is found,
even if an error was detected.
Also see the @option{-s} or @option{--no-messages} option.
(@option{-q} is specified by POSIX.)

@item -s
@itemx --no-messages
@opindex -s
@opindex --no-messages
@cindex suppress error messages
Suppress error messages about nonexistent or unreadable files.
Portability note:
unlike GNU @command{grep},
7th Edition Unix @command{grep} did not conform to POSIX,
because it lacked @option{-q}
and its @option{-s} option behaved like
GNU @command{grep}'s @option{-q} option.@footnote{Of course, 7th Edition
Unix predated POSIX by several years!}
USG-style @command{grep} also lacked @option{-q}
but its @option{-s} option behaved like GNU @command{grep}'s.
Portable shell scripts should avoid both
@option{-q} and @option{-s} and should redirect
standard and error output to @file{/dev/null} instead.
(@option{-s} is specified by POSIX.)

@end table

@node Output Line Prefix Control
@subsection Output Line Prefix Control

When several prefix fields are to be output,
the order is always file name, line number, and byte offset,
regardless of the order in which these options were specified.

@table @option

@item -b
@itemx --byte-offset
@opindex -b
@opindex --byte-offset
@cindex byte offset
Print the 0-based byte offset within the input file
before each line of output.
If @option{-o} (@option{--only-matching}) is specified,
print the offset of the matching part itself.

@item -H
@itemx --with-filename
@opindex -H
@opindex --with-filename
@cindex with filename prefix
Print the file name for each match.
This is the default when there is more than one file to search.

@item -h
@itemx --no-filename
@opindex -h
@opindex --no-filename
@cindex no filename prefix
Suppress the prefixing of file names on output.
This is the default when there is only one file
(or only standard input) to search.

@item --label=@var{LABEL}
@opindex --label
@cindex changing name of standard input
Display input actually coming from standard input
as input coming from file @var{LABEL}.
This can be useful for commands that transform a file's contents
before searching; e.g.:

@example
gzip -cd foo.gz | grep --label=foo -H 'some pattern'
@end example

@item -n
@itemx --line-number
@opindex -n
@opindex --line-number
@cindex line numbering
Prefix each line of output with the 1-based line number within its input file.
(@option{-n} is specified by POSIX.)

@item -T
@itemx --initial-tab
@opindex -T
@opindex --initial-tab
@cindex tab-aligned content lines
Make sure that the first character of actual line content lies on a tab stop,
so that the alignment of tabs looks normal.
This is useful with options that prefix their output to the actual content:
@option{-H}, @option{-n}, and @option{-b}.
This may also prepend spaces to output line numbers and byte offsets
so that lines from a single file all start at the same column.

@item -Z
@itemx --null
@opindex -Z
@opindex --null
@cindex zero-terminated file names
Output a zero byte (the ASCII NUL character)
instead of the character that normally follows a file name.
For example,
@samp{grep -lZ} outputs a zero byte after each file name
instead of the usual newline.
This option makes the output unambiguous,
even in the presence of file names containing unusual characters like newlines.
This option can be used with commands like
@samp{find -print0}, @samp{perl -0}, @samp{sort -z}, and @samp{xargs -0}
to process arbitrary file names,
even those that contain newline characters.
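
The following is a minimal sketch of such a pipeline, not a
definitive recipe.
It assumes that @command{mktemp -d} and @samp{xargs -0} are available
and that the file system permits newlines in file names;
the directory, file names, and variable names are hypothetical.

@example
# Scratch directory holding one file whose name contains a newline.
dir=$(mktemp -d)
name=$(printf 'two\nlines.txt')
printf 'needle\n' > "$dir/$name"
printf 'hay\n' > "$dir/plain.txt"

# -l lists matching files and -Z ends each name with a zero byte,
# so xargs -0 receives the name intact and cat prints its one line.
count=$(grep -lZ 'needle' "$dir"/* | xargs -0 cat | wc -l)
echo "$count"

rm -r "$dir"
@end example

Without @option{-Z} and @samp{-0}, @command{xargs} would split the
name at the embedded newline and pass two nonexistent file names
to @command{cat}.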

@end table

@node Context Line Control
@subsection Context Line Control

@cindex context lines
@dfn{Context lines} are non-matching lines that are near a matching line.
They are output only if one of the following options is used.
Regardless of how these options are set,
@command{grep} never outputs any given line more than once.
If the @option{-o} (@option{--only-matching}) option is specified,
these options have no effect and a warning is given upon their use.

@table @option

@item -A @var{num}
@itemx --after-context=@var{num}
@opindex -A
@opindex --after-context
@cindex after context
@cindex context lines, after match
Print @var{num} lines of trailing context after matching lines.

@item -B @var{num}
@itemx --before-context=@var{num}
@opindex -B
@opindex --before-context
@cindex before context
@cindex context lines, before match
Print @var{num} lines of leading context before matching lines.

@item -C @var{num}
@itemx -@var{num}
@itemx --context=@var{num}
@opindex -C
@opindex --context
@opindex -@var{num}
@cindex context lines
Print @var{num} lines of leading and trailing output context.

@item --group-separator=@var{string}
@opindex --group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} is in use,
print @var{string} instead of @option{--} between groups of lines.

@item --no-group-separator
@opindex --no-group-separator
@cindex group separator
When @option{-A}, @option{-B} or @option{-C} is in use,
do not print a separator between groups of lines.

@end table

Here are some points about how @command{grep} chooses
the separator to print between prefix fields and line content:

@itemize @bullet
@item
Matching lines normally use @samp{:} as a separator
between prefix fields and actual line content.

@item
Context (i.e., non-matching) lines use @samp{-} instead.

@item
When context is not specified,
matching lines are simply output one right after another.

@item
When context is specified,
lines that are adjacent in the input form a group
and are output one right after another, while
by default a separator appears between non-adjacent groups.

@item
The default separator
is a @samp{--} line; its presence and appearance
can be changed with the options above.

@item
Each group may contain
several matching lines when they are close enough to each other
that two adjacent groups connect and can merge into a single
contiguous one.
@end itemize

@node File and Directory Selection
@subsection File and Directory Selection

@table @option

@item -a
@itemx --text
@opindex -a
@opindex --text
@cindex suppress binary data
@cindex binary files
Process a binary file as if it were text;
this is equivalent to the @samp{--binary-files=text} option.

@item --binary-files=@var{type}
@opindex --binary-files
@cindex binary files
If a file's data or metadata
indicate that the file contains binary data,
assume that the file is of type @var{type}.
Non-text bytes indicate binary data; these are either output bytes that are
improperly encoded for the current locale (@pxref{Environment
Variables}), or null input bytes when the
@option{-z} (@option{--null-data}) option is not given (@pxref{Other
Options}).

By default, @var{type} is @samp{binary}, and @command{grep}
suppresses output after null input binary data is discovered,
and suppresses output lines that contain improperly encoded data.
When some output is suppressed, @command{grep} follows any output
with a one-line message saying that a binary file matches.

If @var{type} is @samp{without-match},
when @command{grep} discovers null input binary data
it assumes that the rest of the file does not match;
this is equivalent to the @option{-I} option.

If @var{type} is @samp{text},
@command{grep} processes binary data as if it were text;
this is equivalent to the @option{-a} option.

When @var{type} is @samp{binary}, @command{grep} may treat non-text
bytes as line terminators even without the @option{-z}
(@option{--null-data}) option. This means choosing @samp{binary}
versus @samp{text} can affect whether a pattern matches a file. For
example, when @var{type} is @samp{binary} the pattern @samp{q$} might
match @samp{q} immediately followed by a null byte, even though this
is not matched when @var{type} is @samp{text}. Conversely, when
@var{type} is @samp{binary} the pattern @samp{.} (period) might not
match a null byte.

@emph{Warning:} The @option{-a} (@option{--binary-files=text}) option
might output binary garbage, which can have nasty side effects if the
output is a terminal and if the terminal driver interprets some of it
as commands. On the other hand, when reading files whose text
encodings are unknown, it can be helpful to use @option{-a} or to set
@samp{LC_ALL='C'} in the environment, in order to find more matches
even if the matches are unsafe for direct display.

@item -D @var{action}
@itemx --devices=@var{action}
@opindex -D
@opindex --devices
@cindex device search
If an input file is a device, FIFO, or socket, use @var{action} to process it.
If @var{action} is @samp{read},
all devices are read just as if they were ordinary files.
If @var{action} is @samp{skip},
devices, FIFOs, and sockets are silently skipped.
By default, devices are read if they are on the command line or if the
@option{-R} (@option{--dereference-recursive}) option is used, and are
skipped if they are encountered recursively and the @option{-r}
(@option{--recursive}) option is used.
This option has no effect on a file that is read via standard input.

@item -d @var{action}
@itemx --directories=@var{action}
@opindex -d
@opindex --directories
@cindex directory search
@cindex symbolic links
If an input file is a directory, use @var{action} to process it.
By default, @var{action} is @samp{read},
which means that directories are read just as if they were ordinary files
(some operating systems and file systems disallow this,
and will cause @command{grep}
to print error messages for every directory or silently skip them).
If @var{action} is @samp{skip}, directories are silently skipped.
If @var{action} is @samp{recurse},
@command{grep} reads all files under each directory, recursively,
following command-line symbolic links and skipping other symlinks;
this is equivalent to the @option{-r} option.

@item --exclude=@var{glob}
@opindex --exclude
@cindex exclude files
@cindex searching directory trees
Skip any command-line file with a name suffix that matches the pattern
@var{glob}, using wildcard matching; a name suffix is either the whole
name, or a trailing part that starts with a non-slash character
immediately after a slash (@samp{/}) in the name.
When searching recursively, skip any subfile whose base
name matches @var{glob}; the base name is the part after the last
slash. A pattern can use
@samp{*}, @samp{?}, and @samp{[}...@samp{]} as wildcards,
and @code{\} to quote a wildcard or backslash character literally.
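
As a small sketch of the recursive case
(the scratch directory, the file names @file{main.c} and @file{main.o},
and the use of @command{mktemp -d} are assumptions made for
illustration), the following skips object files by base name:

@example
# Hypothetical tree with a source file and an object file.
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'TODO\n' > "$dir/src/main.c"
printf 'TODO\n' > "$dir/src/main.o"

# Quote the glob so that the shell passes it to grep unexpanded.
found=$(grep -rl --exclude='*.o' 'TODO' "$dir")
echo "$found"

rm -r "$dir"
@end example

Only the name of @file{main.c} is listed, since the base name
@file{main.o} matches the glob @samp{*.o}.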

@item --exclude-from=@var{file}
@opindex --exclude-from
@cindex exclude files
@cindex searching directory trees
Skip files whose name matches any of the patterns
read from @var{file} (using wildcard matching as described
under @option{--exclude}).

@item --exclude-dir=@var{glob}
@opindex --exclude-dir
@cindex exclude directories
Skip any command-line directory with a name suffix that matches the
pattern @var{glob}. When searching recursively, skip any subdirectory
whose base name matches @var{glob}. Ignore any redundant trailing
slashes in @var{glob}.

@item -I
Process a binary file as if it did not contain matching data;
this is equivalent to the @samp{--binary-files=without-match} option.

@item --include=@var{glob}
@opindex --include
@cindex include files
@cindex searching directory trees
Search only files whose name matches @var{glob},
using wildcard matching as described under @option{--exclude}.

@item -r
@itemx --recursive
@opindex -r
@opindex --recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand,
read and process all files in that directory, recursively.
Follow symbolic links on the command line, but skip symlinks
that are encountered recursively.
Note that if no file operand is given, @command{grep} searches the
working directory.
This is the same as the @samp{--directories=recurse} option.

@item -R
@itemx --dereference-recursive
@opindex -R
@opindex --dereference-recursive
@cindex recursive search
@cindex searching directory trees
@cindex symbolic links
For each directory operand, read and process all files in that
directory, recursively, following all symbolic links.

@end table

@node Other Options
@subsection Other Options

@table @option

@item --
@opindex --
@cindex option delimiter
Delimit the option list.
Later arguments, if any, are treated as
operands even if they begin with @samp{-}. For example, @samp{grep PAT --
-file1 file2} searches for the pattern PAT in the files named @file{-file1}
and @file{file2}.

@item --line-buffered
@opindex --line-buffered
@cindex line buffering
Use line buffering on output.
This can cause a performance penalty.

@item -U
@itemx --binary
@opindex -U
@opindex --binary
@cindex MS-Windows binary I/O
@cindex binary I/O
On platforms that distinguish between text and binary I/O,
use the latter when reading and writing files other
than the user's terminal, so that all input bytes are read and written
as-is. This overrides the default behavior where @command{grep}
follows the operating system's advice whether to use text or binary
I/O@. On MS-Windows when @command{grep} uses text I/O it reads a
carriage return--newline pair as a newline and a Control-Z as
end-of-file, and it writes a newline as a carriage return--newline
pair.

When using text I/O @option{--byte-offset} (@option{-b}) counts and
@option{--binary-files} heuristics apply to input data after text-I/O
processing. Also, the @option{--binary-files} heuristics need not agree
with the @option{--binary} option; that is, they may treat the data as
text even if @option{--binary} is given, or vice versa.
@xref{File and Directory Selection}.

This option has no effect on GNU and other POSIX-compatible platforms,
which do not distinguish text from binary I/O.

@item -z
@itemx --null-data
@opindex -z
@opindex --null-data
@cindex zero-terminated lines
Treat input and output data as sequences of lines, each terminated by
a zero byte (the ASCII NUL character) instead of a newline.
Like the @option{-Z} or @option{--null} option,
this option can be used with commands like
@samp{sort -z} to process arbitrary file names.
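
A minimal sketch of this behavior follows;
the three records and the variable name are made up for illustration.
The second record contains a newline, which @option{-z} treats as
ordinary data because only zero bytes terminate lines.

@example
# Three hypothetical records, each terminated by a zero byte.
# tr turns the zero byte that ends the output into a newline.
matches=$(printf 'one\0two\nlines\0three\0' |
  grep -z 'two' | tr '\0' '\n')
printf '%s\n' "$matches"
@end example

The whole second record is selected, including the text on both
sides of its embedded newline.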

@end table

@node Environment Variables
@section Environment Variables

The behavior of @command{grep} is affected
by the following environment variables.

@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
The locale for category @w{@code{LC_@var{foo}}}
is specified by examining the three environment variables
@env{LC_ALL}, @w{@env{LC_@var{foo}}}, and @env{LANG},
in that order.
The first of these variables that is set specifies the locale.
For example, if @env{LC_ALL} is not set,
but @env{LC_COLLATE} is set to @samp{pt_BR},
then the Brazilian Portuguese locale is used
for the @env{LC_COLLATE} category.
As a special case for @env{LC_MESSAGES} only, the environment variable
@env{LANGUAGE} can contain a colon-separated list of languages that
overrides the three environment variables that ordinarily specify
the @env{LC_MESSAGES} category.
The @samp{C} locale is used if none of these environment variables are set,
if the locale catalog is not installed,
or if @command{grep} was not compiled
with national language support (NLS).
The shell command @code{locale -a} lists locales that are currently available.

Many of the environment variables in the following list let you
control highlighting using
Select Graphic Rendition (SGR)
commands interpreted by the terminal or terminal emulator.
(See the
section
in the documentation of your text terminal
for permitted values and their meanings as character attributes.)
These substring values are integers in decimal representation
and can be concatenated with semicolons.
@command{grep} takes care of assembling the result
into a complete SGR sequence (@samp{\33[}...@samp{m}).
Common values to concatenate include
@samp{1} for bold,
@samp{4} for underline,
@samp{5} for blink,
@samp{7} for inverse,
@samp{39} for default foreground color,
@samp{30} to @samp{37} for foreground colors,
@samp{90} to @samp{97} for 16-color mode foreground colors,
@samp{38;5;0} to @samp{38;5;255}
for 88-color and 256-color modes foreground colors,
@samp{49} for default background color,
@samp{40} to @samp{47} for background colors,
@samp{100} to @samp{107} for 16-color mode background colors,
and @samp{48;5;0} to @samp{48;5;255}
for 88-color and 256-color modes background colors.

The two-letter names used in the @env{GREP_COLORS} environment variable
(and some of the others) refer to terminal ``capabilities,'' the ability
of a terminal to highlight text, or change its color, and so on.
These capabilities are stored in an online database and accessed by
the @code{terminfo} library.

@cindex environment variables

@table @env

@item GREP_OPTIONS
@vindex GREP_OPTIONS @r{environment variable}
@cindex default options environment variable
This variable specifies default options to be placed in front of any
explicit options.
As this causes problems when writing portable scripts, this feature
will be removed in a future release of @command{grep}, and @command{grep}
warns if it is used. Please use an alias or script instead.
For example, if @command{grep} is in the directory @samp{/usr/bin} you
can prepend @file{$HOME/bin} to your @env{PATH} and create an
executable script @file{$HOME/bin/grep} containing the following:

@example
#! /bin/sh
export PATH=/usr/bin
exec grep --color=auto --devices=skip "$@@"
@end example

@item GREP_COLOR
@vindex GREP_COLOR @r{environment variable}
@cindex highlight markers
This variable specifies the color used to highlight matched (non-empty) text.
It is deprecated in favor of @env{GREP_COLORS}, but still supported.
The @samp{mt}, @samp{ms}, and @samp{mc} capabilities of @env{GREP_COLORS}
have priority over it.
It can only specify the color used to highlight
the matching non-empty text in any matching line
(a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
The default is @samp{01;31},
which means a bold red foreground text on the terminal's default background.

@item GREP_COLORS
@vindex GREP_COLORS @r{environment variable}
@cindex highlight markers
This variable specifies the colors and other attributes
used to highlight various parts of the output.
Its value is a colon-separated list of @code{terminfo} capabilities
that defaults to @samp{ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36}
with the @samp{rv} and @samp{ne} boolean capabilities omitted (i.e., false).
Supported capabilities are as follows.

@table @code
@item sl=
@vindex sl GREP_COLORS @r{capability}
SGR substring for whole selected lines
(i.e.,
matching lines when the @option{-v} command-line option is omitted,
or non-matching lines when @option{-v} is specified).
If however the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to context matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item cx=
@vindex cx GREP_COLORS @r{capability}
SGR substring for whole context lines
(i.e.,
non-matching lines when the @option{-v} command-line option is omitted,
or matching lines when @option{-v} is specified).
If however the boolean @samp{rv} capability
and the @option{-v} command-line option are both specified,
it applies to selected non-matching lines instead.
The default is empty (i.e., the terminal's default color pair).

@item rv
@vindex rv GREP_COLORS @r{capability}
Boolean value that reverses (swaps) the meanings of
the @samp{sl=} and @samp{cx=} capabilities
when the @option{-v} command-line option is specified.
The default is false (i.e., the capability is omitted).

@item mt=01;31
@vindex mt GREP_COLORS @r{capability}
SGR substring for matching non-empty text in any matching line
(i.e.,
a selected line when the @option{-v} command-line option is omitted,
or a context line when @option{-v} is specified).
Setting this is equivalent to setting both @samp{ms=} and @samp{mc=}
at once to the same value.
The default is a bold red text foreground over the current line background.

@item ms=01;31
@vindex ms GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a selected line.
(This is used only when the @option{-v} command-line option is omitted.)
The effect of the @samp{sl=} (or @samp{cx=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item mc=01;31
@vindex mc GREP_COLORS @r{capability}
SGR substring for matching non-empty text in a context line.
(This is used only when the @option{-v} command-line option is specified.)
The effect of the @samp{cx=} (or @samp{sl=} if @samp{rv}) capability
remains active when this takes effect.
The default is a bold red text foreground over the current line background.

@item fn=35
@vindex fn GREP_COLORS @r{capability}
SGR substring for file names prefixing any content line.
The default is a magenta text foreground over the terminal's default background.

@item ln=32
@vindex ln GREP_COLORS @r{capability}
SGR substring for line numbers prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item bn=32
@vindex bn GREP_COLORS @r{capability}
SGR substring for byte offsets prefixing any content line.
The default is a green text foreground over the terminal's default background.

@item se=36
@vindex se GREP_COLORS @r{capability}
SGR substring for separators that are inserted
between selected line fields (@samp{:}),
between context line fields (@samp{-}),
and between groups of adjacent lines
when nonzero context is specified (@samp{--}).
The default is a cyan text foreground over the terminal's default background.

@item ne
@vindex ne GREP_COLORS @r{capability}
Boolean value that prevents clearing to the end of line
using Erase in Line (EL) to Right (@samp{\33[K})
each time a colorized item ends.
This is needed on terminals on which EL is not supported.
It is otherwise useful on terminals
for which the @code{back_color_erase}
(@code{bce}) boolean @code{terminfo} capability does not apply,
when the chosen highlight colors do not affect the background,
or when EL is too slow or causes too much flicker.
The default is false (i.e., the capability is omitted).
@end table

Note that boolean capabilities have no @samp{=}... part.
They are omitted (i.e., false) by default and become true when specified.


@item LC_ALL
@itemx LC_COLLATE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_COLLATE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex character type
@cindex national language support
@cindex NLS
These variables specify the locale for the @env{LC_COLLATE} category,
which might affect how range expressions like @samp{[a-z]} are
interpreted.

@item LC_ALL
@itemx LC_CTYPE
@itemx LANG
@vindex LC_ALL @r{environment variable}
@vindex LC_CTYPE @r{environment variable}
@vindex LANG @r{environment variable}
@cindex encoding error
@cindex null character
These variables specify the locale for the @env{LC_CTYPE} category,
which determines the type of characters,
e.g., which characters are whitespace.
This category also determines the character encoding, that is, whether
text is encoded in UTF-8, ASCII, or some other encoding. In the
@samp{C} or @samp{POSIX} locale, all characters are encoded as a
single byte and every byte is a valid character.
In more-complex encodings such as UTF-8, a sequence of multiple bytes
may be needed to represent a character, and some bytes may be encoding
errors that do not contribute to the representation of any character.
POSIX does not specify the behavior of @command{grep} when patterns or
input data contain encoding errors or null characters, so portable
scripts should avoid such usage. As an extension to POSIX, GNU
@command{grep} treats null characters like any other character.
However, unless the @option{-a} (@option{--binary-files=text}) option
is used, the presence of null characters in input or of encoding
errors in output causes GNU @command{grep} to treat the file as binary
and suppress details about matches. @xref{File and Directory
Selection}.

@item LANGUAGE
@itemx LC_ALL
@itemx LC_MESSAGES
@itemx LANG
@vindex LANGUAGE @r{environment variable}
@vindex LC_ALL @r{environment variable}
@vindex LC_MESSAGES @r{environment variable}
@vindex LANG @r{environment variable}
@cindex language of messages
@cindex message language
@cindex national language support
@cindex translation of message language
These variables specify the locale for the @env{LC_MESSAGES} category,
which determines the language that @command{grep} uses for messages.
The default @samp{C} locale uses American English messages.

@item POSIXLY_CORRECT
@vindex POSIXLY_CORRECT @r{environment variable}
If set, @command{grep} behaves as POSIX requires; otherwise,
@command{grep} behaves more like other GNU programs.
POSIX
requires that options that
follow file names must be treated as file names;
by default,
such options are permuted to the front of the operand list
and are treated as options.
Also, @env{POSIXLY_CORRECT} disables special handling of an
invalid bracket expression. @xref{invalid-bracket-expr}.

@item _@var{N}_GNU_nonoption_argv_flags_
@vindex _@var{N}_GNU_nonoption_argv_flags_ @r{environment variable}
(Here @code{@var{N}} is @command{grep}'s numeric process ID.)
If the @var{i}th character of this environment variable's value is @samp{1},
do not consider the @var{i}th operand of @command{grep} to be an option,
even if it appears to be one.
A shell can put this variable in the environment for each command it runs,
specifying which operands are the results of file name wildcard expansion
and therefore should not be treated as options.
This behavior is available only with the GNU C library,
and only when @env{POSIXLY_CORRECT} is not set.

@end table


@node Exit Status
@section Exit Status
@cindex exit status
@cindex return status

Normally the exit status is 0 if a line is selected, 1 if no lines
were selected, and 2 if an error occurred. However, if the
@option{-L} or @option{--files-without-match} option is used, the exit
status is 0 if a file is listed, 1 if no files were listed, and 2 if an
error occurred. Also, if the
@option{-q} or @option{--quiet} or @option{--silent} option is used
and a line is selected, the exit status is 0 even if an error
occurred. Other @command{grep} implementations may exit with status
greater than 2 on error.

@node grep Programs
@section @command{grep} Programs
@cindex @command{grep} programs
@cindex variants of @command{grep}

@command{grep} searches the named input files
for lines containing a match to the given patterns.
By default, @command{grep} prints the matching lines.
A file named @file{-} stands for standard input.
If no input is specified, @command{grep} searches the working
directory @file{.} if given a command-line option specifying
recursion; otherwise, @command{grep} searches standard input.
There are four major variants of @command{grep},
controlled by the following options.

@table @option

@item -G
@itemx --basic-regexp
@opindex -G
@opindex --basic-regexp
@cindex matching basic regular expressions
Interpret patterns as basic regular expressions (BREs).
This is the default.

@item -E
@itemx --extended-regexp
@opindex -E
@opindex --extended-regexp
@cindex matching extended regular expressions
Interpret patterns as extended regular expressions (EREs).
(@option{-E} is specified by POSIX.)

@item -F
@itemx --fixed-strings
@opindex -F
@opindex --fixed-strings
@cindex matching fixed strings
Interpret patterns as fixed strings, not regular expressions.
(@option{-F} is specified by POSIX.)

@item -P
@itemx --perl-regexp
@opindex -P
@opindex --perl-regexp
@cindex matching Perl-compatible regular expressions
Interpret patterns as Perl-compatible regular expressions (PCREs).
PCRE support is here to stay, but consider this option experimental when
combined with the @option{-z} (@option{--null-data}) option, and note that
@samp{grep@ -P} may warn of unimplemented features.
@xref{Other Options}.

@end table

In addition,
two variant programs @command{egrep} and @command{fgrep} are available.
@command{egrep} is the same as @samp{grep@ -E}.
@command{fgrep} is the same as @samp{grep@ -F}.
Direct invocation as either
@command{egrep} or @command{fgrep} is deprecated,
but is provided to allow historical applications
that rely on them to run unmodified.


@node Regular Expressions
@chapter Regular Expressions
@cindex regular expressions

A @dfn{regular expression} is a pattern that describes a set of strings.
Regular expressions are constructed analogously to arithmetic expressions,
by using various operators to combine smaller expressions.
@command{grep} understands
three different versions of regular expression syntax:
basic (BRE), extended (ERE), and Perl-compatible (PCRE).
In GNU @command{grep},
there is no difference in available functionality between the basic and
extended syntaxes.
In other implementations, basic regular expressions are less powerful.
The following description applies to extended regular expressions;
differences for basic regular expressions are summarized afterwards.
Perl-compatible regular expressions give additional functionality, and
are documented in the @i{pcresyntax}(3) and @i{pcrepattern}(3) manual
pages, but work only if PCRE is available in the system.

@menu
* Fundamental Structure::
* Character Classes and Bracket Expressions::
* The Backslash Character and Special Expressions::
* Anchoring::
* Back-references and Subexpressions::
* Basic vs Extended::
@end menu

@node Fundamental Structure
@section Fundamental Structure

The fundamental building blocks are the regular expressions that match
a single character.
Most characters, including all letters and digits,
are regular expressions that match themselves.
Any meta-character
with special meaning may be quoted by preceding it with a backslash.

@opindex .
@cindex dot
@cindex period
The period @samp{.} matches any single character.
It is unspecified whether @samp{.} matches an encoding error.

A regular expression may be followed by one of several
repetition operators:

@table @samp

@item ?
@opindex ?
@cindex question mark
@cindex match expression at most once
The preceding item is optional and will be matched at most once.

@item *
@opindex *
@cindex asterisk
@cindex match expression zero or more times
The preceding item will be matched zero or more times.

@item +
@opindex +
@cindex plus sign
@cindex match expression one or more times
The preceding item will be matched one or more times.

@item @{@var{n}@}
@opindex @{@var{n}@}
@cindex braces, one argument
@cindex match expression @var{n} times
The preceding item is matched exactly @var{n} times.

@item @{@var{n},@}
@opindex @{@var{n},@}
@cindex braces, second argument omitted
@cindex match expression @var{n} or more times
The preceding item is matched @var{n} or more times.

@item @{,@var{m}@}
@opindex @{,@var{m}@}
@cindex braces, first argument omitted
@cindex match expression at most @var{m} times
The preceding item is matched at most @var{m} times.
This is a GNU extension.

@item @{@var{n},@var{m}@}
@opindex @{@var{n},@var{m}@}
@cindex braces, two arguments
@cindex match expression from @var{n} to @var{m} times
The preceding item is matched at least @var{n} times, but not more than
@var{m} times.

@end table

The empty regular expression matches the empty string.
Two regular expressions may be concatenated;
the resulting regular expression
matches any string formed by concatenating two substrings
that respectively match the concatenated expressions.

Two regular expressions may be joined by the infix operator @samp{|};
the resulting regular expression
matches any string matching either alternate expression.

Repetition takes precedence over concatenation,
which in turn takes precedence over alternation.
A whole expression may be enclosed in parentheses
to override these precedence rules and form a subexpression.
An unmatched @samp{)} matches just itself.

@node Character Classes and Bracket Expressions
@section Character Classes and Bracket Expressions

@cindex bracket expression
@cindex character class
A @dfn{bracket expression} is a list of characters enclosed by @samp{[} and
@samp{]}.
It matches any single character in that list.
If the first character of the list is the caret @samp{^},
then it matches any character @strong{not} in the list,
and it is unspecified whether it matches an encoding error.
For example, the regular expression
@samp{[0123456789]} matches any single digit,
whereas @samp{[^()]} matches any single character that is not
an opening or closing parenthesis, and might or might not match an
encoding error.

@cindex range expression
Within a bracket expression, a @dfn{range expression} consists of two
characters separated by a hyphen.
It matches any single character that
sorts between the two characters, inclusive.
In the default C locale, the sorting sequence is the native character
order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
In other locales, the sorting sequence is not specified, and
@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
characters that it matches might even be erratic.
To obtain the traditional interpretation
of bracket expressions, you can use the @samp{C} locale by setting the
@env{LC_ALL} environment variable to the value @samp{C}.

Finally, certain named classes of characters are predefined within
bracket expressions, as follows.
Their interpretation depends on the @env{LC_CTYPE} locale;
for example, @samp{[[:alnum:]]} means the character class of numbers and letters
in the current locale.

@cindex classes of characters
@cindex character classes
@table @samp

@item [:alnum:]
@opindex alnum @r{character class}
@cindex alphanumeric characters
Alphanumeric characters:
@samp{[:alpha:]} and @samp{[:digit:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[0-9A-Za-z]}.

@item [:alpha:]
@opindex alpha @r{character class}
@cindex alphabetic characters
Alphabetic characters:
@samp{[:lower:]} and @samp{[:upper:]}; in the @samp{C} locale and ASCII
character encoding, this is the same as @samp{[A-Za-z]}.

@item [:blank:]
@opindex blank @r{character class}
@cindex blank characters
Blank characters:
space and tab.

@item [:cntrl:]
@opindex cntrl @r{character class}
@cindex control characters
Control characters.
In ASCII, these characters have octal codes 000
through 037, and 177 (DEL).
In other character sets, these are
the equivalent characters, if any.

@item [:digit:]
@opindex digit @r{character class}
@cindex digit characters
@cindex numeric characters
Digits: @code{0 1 2 3 4 5 6 7 8 9}.

@item [:graph:]
@opindex graph @r{character class}
@cindex graphic characters
Graphical characters:
@samp{[:alnum:]} and @samp{[:punct:]}.

@item [:lower:]
@opindex lower @r{character class}
@cindex lower-case letters
Lower-case letters; in the @samp{C} locale and ASCII character
encoding, this is
@code{a b c d e f g h i j k l m n o p q r s t u v w x y z}.

@item [:print:]
@opindex print @r{character class}
@cindex printable characters
Printable characters:
@samp{[:alnum:]}, @samp{[:punct:]}, and space.

@item [:punct:]
@opindex punct @r{character class}
@cindex punctuation characters
Punctuation characters; in the @samp{C} locale and ASCII character
encoding, this is
@code{!@: " # $ % & ' ( ) * + , - .@: / : ; < = > ?@: @@ [ \ ] ^ _ ` @{ | @} ~}.

@item [:space:]
@opindex space @r{character class}
@cindex space characters
@cindex whitespace characters
Space characters: in the @samp{C} locale, this is
tab, newline, vertical tab, form feed, carriage return, and space.
@xref{Usage}, for more discussion of matching newlines.

@item [:upper:]
@opindex upper @r{character class}
@cindex upper-case letters
Upper-case letters: in the @samp{C} locale and ASCII character
encoding, this is
@code{A B C D E F G H I J K L M N O P Q R S T U V W X Y Z}.

@item [:xdigit:]
@opindex xdigit @r{character class}
@cindex xdigit class
@cindex hexadecimal digits
Hexadecimal digits:
@code{0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f}.

@end table
Note that the brackets in these class names are
part of the symbolic names, and must be included in addition to
the brackets delimiting the bracket expression.

@anchor{invalid-bracket-expr}
If you mistakenly omit the outer brackets, and search for, say,
@samp{[:upper:]},
GNU @command{grep} prints a diagnostic and exits with status 2, on
the assumption that you did not intend to search for the nominally
equivalent regular expression: @samp{[:epru]}.
Set the @env{POSIXLY_CORRECT} environment variable to disable this feature.

Most meta-characters lose their special meaning inside bracket expressions.

@table @samp
@item ]
ends the bracket expression if it's not the first list item.
So, if you want to make the @samp{]} character a list item,
you must put it first.

@item [.
represents the open collating symbol.

@item .]
represents the close collating symbol.

@item [=
represents the open equivalence class.

@item =]
represents the close equivalence class.

@item [:
represents the open character class symbol, and should be followed by a
valid character class name.

@item :]
represents the close character class symbol.

@item -
represents the range if it's not first or last in a list or the ending point
of a range.

@item ^
represents the characters not in the list.
If you want to make the @samp{^}
character a list item, place it anywhere but first.

@end table

@node The Backslash Character and Special Expressions
@section The Backslash Character and Special Expressions
@cindex backslash

The @samp{\} character,
when followed by certain ordinary characters,
takes a special meaning:

@table @samp

@item \b
Match the empty string at the edge of a word.

@item \B
Match the empty string provided it's not at the edge of a word.

@item \<
Match the empty string at the beginning of a word.

@item \>
Match the empty string at the end of a word.

@item \w
Match a word constituent; it is a synonym for @samp{[_[:alnum:]]}.

@item \W
Match a non-word constituent; it is a synonym for @samp{[^_[:alnum:]]}.

@item \s
Match whitespace; it is a synonym for @samp{[[:space:]]}.

@item \S
Match non-whitespace; it is a synonym for @samp{[^[:space:]]}.

@end table

For example, @samp{\brat\b} matches the separate word @samp{rat},
whereas @samp{\Brat\B} matches @samp{crate} but not @samp{furry rat}.

@node Anchoring
@section Anchoring
@cindex anchoring

The caret @samp{^} and the dollar sign @samp{$} are meta-characters that
respectively match the empty string at the beginning and end of a line.
They are termed @dfn{anchors}, since they force the match to be ``anchored''
to the beginning or end of a line, respectively.

@node Back-references and Subexpressions
@section Back-references and Subexpressions
@cindex subexpression
@cindex back-reference

The back-reference @samp{\@var{n}}, where @var{n} is a single digit, matches
the substring previously matched by the @var{n}th parenthesized subexpression
of the regular expression.
For example, @samp{(a)\1} matches @samp{aa}.
When used with alternation, if the group does not participate in the match then
the back-reference makes the whole match fail.
For example, @samp{a(.)|b\1}
will not match @samp{ba}.
When multiple regular expressions are given with
@option{-e} or from a file (@samp{-f @var{file}}),
back-references are local to each expression.

@xref{Known Bugs}, for some known problems with back-references.

@node Basic vs Extended
@section Basic vs Extended Regular Expressions
@cindex basic regular expressions

In basic regular expressions the meta-characters @samp{?}, @samp{+},
@samp{@{}, @samp{|}, @samp{(}, and @samp{)} lose their special meaning;
instead use the backslashed versions @samp{\?}, @samp{\+}, @samp{\@{},
@samp{\|}, @samp{\(}, and @samp{\)}.

@cindex interval specifications
Traditional @command{egrep} did not support the @samp{@{} meta-character,
and some @command{egrep} implementations support @samp{\@{} instead, so
portable scripts should avoid @samp{@{} in @samp{grep@ -E} patterns and
should use @samp{[@{]} to match a literal @samp{@{}.

GNU @command{grep@ -E} attempts to support traditional usage by
assuming that @samp{@{} is not special if it would be the start of an
invalid interval specification.
For example, the command
@samp{grep@ -E@ '@{1'} searches for the two-character string @samp{@{1}
instead of reporting a syntax error in the regular expression.
POSIX allows this behavior as an extension, but portable scripts
should avoid it.
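
The correspondence between the basic and extended syntaxes described
above can be sketched with a small shell session.
This is only an illustration, assuming GNU @command{grep} is the
@command{grep} found on @env{PATH}; the sample input and the variable
names @code{input}, @code{bre}, @code{ere}, and @code{lit} are
hypothetical.

```shell
# Hypothetical sample input: one candidate string per line.
input=$(printf 'ab\naab\nb\ncd\n')

# BRE (the default): the operators must be backslashed to be special.
bre=$(printf '%s\n' "$input" | grep 'a\+b\|cd')

# ERE (-E): the same operators are written without backslashes.
ere=$(printf '%s\n' "$input" | grep -E 'a+b|cd')

# In a BRE an unescaped '+' is an ordinary character,
# so this pattern matches only the literal text a+b.
lit=$(printf 'a+b\naab\n' | grep 'a+b')

printf 'BRE selected:\n%s\n' "$bre"
printf 'ERE selected:\n%s\n' "$ere"
printf 'Literal plus selected:\n%s\n' "$lit"
```

Both the basic and the extended form should select the same lines here;
other @command{grep} implementations might not accept the backslashed
operators in basic regular expressions.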


@node Usage
@chapter Usage

@cindex usage, examples
Here is an example command that invokes GNU @command{grep}:

@example
grep -i 'hello.*world' menu.h main.c
@end example

@noindent
This lists all lines in the files @file{menu.h} and @file{main.c} that
contain the string @samp{hello} followed by the string @samp{world};
this is because @samp{.*} matches zero or more characters within a line.
@xref{Regular Expressions}.
The @option{-i} option causes @command{grep}
to ignore case, causing it to match the line @samp{Hello, world!}, which
it would not otherwise match.

Here is a more complex example session,
showing the location and contents of any line
containing @samp{f} and ending in @samp{.c},
within all files in the current directory whose names
contain @samp{g} and end in @samp{.h}.
The @option{-n} option outputs line numbers, the @option{--} argument
treats any later arguments starting with @samp{-} as file names not
options, and the empty file @file{/dev/null} causes file names to be output
even if only one file name happens to be of the form @samp{*g*.h}.

@example
$ @kbd{grep -n -- 'f.*\.c$' *g*.h /dev/null}
argmatch.h:1:/* definitions and prototypes for argmatch.c
@end example

@noindent
The only line that contains a match is line 1 of @file{argmatch.h}.
Note that the regular expression syntax used in the pattern differs
from the globbing syntax that the shell uses to match file names.

@xref{Invoking}, for more details about
how to invoke @command{grep}.

@cindex using @command{grep}, Q&A
@cindex FAQ about @command{grep} usage
Here are some common questions and answers about @command{grep} usage.

@enumerate

@item
How can I list just the names of matching files?

@example
grep -l 'main' test-*.c
@end example

@noindent
lists names of @samp{test-*.c} files in the current directory whose contents
mention @samp{main}.

@item
How do I search directories recursively?

@example
grep -r 'hello' /home/gigi
@end example

@noindent
searches for @samp{hello} in all files
under the @file{/home/gigi} directory.
For more control over which files are searched,
use @command{find} and @command{grep}.
For example, the following command searches only C files:

@example
find /home/gigi -name '*.c' ! -type d \
  -exec grep -H 'hello' '@{@}' +
@end example

This differs from the command:

@example
grep -H 'hello' /home/gigi/*.c
@end example

which merely looks for @samp{hello} in non-hidden C files in
@file{/home/gigi} whose names end in @samp{.c}.
The @command{find} command line above is more similar to the command:

@example
grep -r --include='*.c' 'hello' /home/gigi
@end example

@item
What if a pattern or file has a leading @samp{-}?

@example
grep -- '--cut here--' *
@end example

@noindent
searches for all lines matching @samp{--cut here--}.
Without @option{--},
@command{grep} would attempt to parse @samp{--cut here--} as a list of
options, and there would be similar problems with any file names
beginning with @samp{-}.

Alternatively, you can prevent misinterpretation of leading @samp{-}
by using @option{-e} for patterns and leading @samp{./} for files:

@example
grep -e '--cut here--' ./*
@end example

@item
Suppose I want to search for a whole word, not a part of a word?

@example
grep -w 'hello' test*.log
@end example

@noindent
searches only for instances of @samp{hello} that are entire words;
it does not match @samp{Othello}.
For more control, use @samp{\<} and
@samp{\>} to match the start and end of words.
For example:

@example
grep 'hello\>' test*.log
@end example

@noindent
searches only for words ending in @samp{hello}, so it matches the word
@samp{Othello}.

@item
How do I output context around the matching lines?

@example
grep -C 2 'hello' test*.log
@end example

@noindent
prints two lines of context around each matching line.

@item
How do I force @command{grep} to print the name of the file?

Append @file{/dev/null}:

@example
grep 'eli' /etc/passwd /dev/null
@end example

gets you:

@example
/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
@end example

Alternatively, use @option{-H}, which is a GNU extension:

@example
grep -H 'eli' /etc/passwd
@end example

@item
Why do people use strange regular expressions on @command{ps} output?

@example
ps -ef | grep '[c]ron'
@end example

If the pattern had been written without the square brackets, it would
have matched not only the @command{ps} output line for @command{cron},
but also the @command{ps} output line for @command{grep}.
Note that on some platforms,
@command{ps} limits the output to the width of the screen;
@command{grep} does not have any limit on the length of a line
except the available memory.

@item
Why does @command{grep} report ``Binary file matches''?

If @command{grep} listed all matching ``lines'' from a binary file, it
would probably generate output that is not useful, and it might even
muck up your display.
So GNU @command{grep} suppresses output from
files that appear to be binary files.
To force GNU @command{grep}
to output lines even from files that appear to be binary, use the
@option{-a} or @samp{--binary-files=text} option.
To eliminate the
``Binary file matches'' messages, use the @option{-I} or
@samp{--binary-files=without-match} option.

@item
Why doesn't @samp{grep -lv} print non-matching file names?

@samp{grep -lv} lists the names of all files containing one or more
lines that do not match.
To list the names of all files that contain no
matching lines, use the @option{-L} or @option{--files-without-match}
option.

@item
I can do ``OR'' with @samp{|}, but what about ``AND''?

@example
grep 'paul' /etc/motd | grep 'franc,ois'
@end example

@noindent
finds all lines that contain both @samp{paul} and @samp{franc,ois}.

@item
Why does the empty pattern match every input line?

The @command{grep} command searches for lines that contain strings
that match a pattern. Every line contains the empty string, so an
empty pattern causes @command{grep} to find a match on each line. It
is not the only such pattern: @samp{^}, @samp{$}, and many
other patterns cause @command{grep} to match every line.

To match empty lines, use the pattern @samp{^$}. To match blank
lines, use the pattern @samp{^[[:blank:]]*$}. To match no lines at
all, use the command @samp{grep -f /dev/null}.

@item
How can I search in both standard input and in files?

Use the special file name @samp{-}:

@example
cat /etc/passwd | grep 'alain' - /etc/motd
@end example

@item
Why is this back-reference failing?

@example
echo 'ba' | grep -E '(a)\1|b\1'
@end example

This gives no output, because the first alternate @samp{(a)\1} does not match,
as there is no @samp{aa} in the input, so the @samp{\1} in the second alternate
has nothing to refer back to, meaning it will never match anything.
(The second alternate in this example can only match
if the first alternate has matched---making the second one superfluous.)

@item
How can I match across lines?

Standard @command{grep} cannot do this, as it is fundamentally line-based.
Therefore, merely using the @code{[:space:]} character class does not
match newlines in the way you might expect.

With the GNU @command{grep} option @option{-z} (@option{--null-data}), each
input and output ``line'' is null-terminated; @pxref{Other Options}. Thus,
you can match newlines in the input, but typically if there is a match
the entire input is output, so this usage is often combined with
output-suppressing options like @option{-q}, e.g.:

@example
printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'
@end example

If this does not suffice, you can transform the input
before giving it to @command{grep}, or turn to @command{awk},
@command{sed}, @command{perl}, or many other utilities that are
designed to operate across lines.

@item
What do @command{grep}, @command{fgrep}, and @command{egrep} stand for?

The name @command{grep} comes from the way line editing was done on Unix.
For example,
@command{ed} uses the following syntax
to print a list of matching lines on the screen:

@example
global/regular expression/print
g/re/p
@end example

@command{fgrep} stands for Fixed @command{grep};
@command{egrep} stands for Extended @command{grep}.

@end enumerate


@node Performance
@chapter Performance

@cindex performance
Typically @command{grep} is an efficient way to search text. However,
it can be quite slow in some cases, and it can search large files
where even minor performance tweaking can help significantly.
Although the algorithm used by @command{grep} is an implementation
detail that can change from release to release, understanding its
basic strengths and weaknesses can help you improve its performance.

The @command{grep} command operates partly via a set of automata that
are designed for efficiency, and partly via a slower matcher that
takes over when the fast matchers run into unusual features like
back-references. When feasible, the Boyer--Moore fast string
searching algorithm is used to match a single fixed pattern, and the
Aho--Corasick algorithm is used to match multiple fixed patterns.

@cindex locales
Generally speaking @command{grep} operates more efficiently in
single-byte locales, since it can avoid the special processing needed
for multi-byte characters. If your patterns will work just as well
that way, setting @env{LC_ALL} to a single-byte locale can help
performance considerably. Setting @samp{LC_ALL='C'} can be
particularly efficient, as @command{grep} is tuned for that locale.

@cindex case insensitive search
Outside the @samp{C} locale, case-insensitive search, and search for
bracket expressions like @samp{[a-z]} and @samp{[[=a=]b]}, can be
surprisingly inefficient due to difficulties in fast portable access to
concepts like multi-character collating elements.

@cindex back-references
A back-reference such as @samp{\1} can hurt performance significantly
in some cases, since back-references cannot in general be implemented
via a finite state automaton, and instead trigger a backtracking
algorithm that can be quite inefficient.
For example, although the
pattern @samp{^(.*)\1@{14@}(.*)\2@{13@}$} matches only lines whose
lengths can be written as a sum @math{15x + 14y} for nonnegative
integers @math{x} and @math{y}, the pattern matcher does not perform
linear Diophantine analysis and instead backtracks through all
possible matching strings, using an algorithm that is exponential in
the worst case.

@cindex holes in files
On some operating systems that support files with holes---large
regions of zeros that are not physically present on secondary
storage---@command{grep} can skip over the holes efficiently without
needing to read the zeros. This optimization is not available if the
@option{-a} (@option{--binary-files=text}) option is used (@pxref{File and
Directory Selection}), unless the @option{-z} (@option{--null-data})
option is also used (@pxref{Other Options}).

For more about the algorithms used by @command{grep} and about
related string matching algorithms, see:

@frenchspacing on
@itemize @bullet
@item
Aho AV. Algorithms for finding patterns in strings.
In: van Leeuwen J. @emph{Handbook of Theoretical Computer Science}, vol. A.
New York: Elsevier; 1990. p. 255--300.
This surveys classic string matching algorithms, some of which are
used by @command{grep}.

@item
Aho AV, Corasick MJ. Efficient string matching: an aid to bibliographic search.
@emph{CACM}. 1975;18(6):333--40.
@url{https://dx.doi.org/10.1145/360825.360855}.
This introduces the Aho--Corasick algorithm.

@item
Boyer RS, Moore JS. A fast string searching algorithm.
@emph{CACM}. 1977;20(10):762--72.
@url{https://dx.doi.org/10.1145/359842.359859}.
This introduces the Boyer--Moore algorithm.

@item
Faro S, Lecroq T. The exact online string matching problem: a review
of the most recent results.
@emph{ACM Comput Surv}. 2013;45(2):13.
@url{https://dx.doi.org/10.1145/2431211.2431212}.
This surveys string matching algorithms that might help improve the
performance of @command{grep} in the future.
@end itemize
@frenchspacing off

@node Reporting Bugs
@chapter Reporting bugs

@cindex bugs, reporting
Bug reports can be found at the
@url{https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep,
GNU bug report logs for @command{grep}}.
If you find a bug not listed there, please email it to
@email{bug-grep@@gnu.org} to create a new bug report.

@menu
* Known Bugs::
@end menu

@node Known Bugs
@section Known Bugs
@cindex Bugs, known

Large repetition counts in the @samp{@{n,m@}} construct may cause
@command{grep} to use lots of memory.
In addition, certain other
obscure regular expressions require exponential time and
space, and may cause @command{grep} to run out of memory.

Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore. Also, the POSIX specification for
back-references is at times unclear. Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority---for example, as of 2019 the GNU C
library bug database contained back-reference bugs 52, 10844, 11053,
and 25322, with little sign of forthcoming fixes. Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.


@node Copying
@chapter Copying
@cindex copying

GNU @command{grep} is licensed under the GNU GPL, which makes it @dfn{free
software}.

The ``free'' in ``free software'' refers to liberty, not price.
As
some GNU project advocates like to point out, think of ``free speech''
rather than ``free beer''. In short, you have the right (freedom) to
run and change @command{grep} and distribute it to other people, and---if you
want---charge money for doing either. The important restriction is
that you have to grant your recipients the same rights and impose the
same restrictions.

This general method of licensing software is sometimes called
@dfn{open source}. The GNU project prefers the term ``free software''
for reasons outlined at
@url{https://www.gnu.org/philosophy/open-source-misses-the-point.html}.

This manual is free documentation in the same sense. The
documentation license is included below. The license for the program
is available with the source code, or at
@url{https://www.gnu.org/licenses/gpl.html}.

@menu
* GNU Free Documentation License::
@end menu

@node GNU Free Documentation License
@section GNU Free Documentation License

@include fdl.texi


@node Index
@unnumbered Index

@printindex cp

@bye