xref: /freebsd/usr.bin/awk/awk.1 (revision f552d7ad)
.\"	$OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.Dd July 30, 2021
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl version
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
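.Pp
The following sketch illustrates when operand assignments take effect.
The shell function name
.Li assign_demo
and the variable
.Va tag
are hypothetical and chosen only for illustration.
A value given with
.Fl v
is already visible in
.Ic BEGIN ,
while each operand assignment happens only when the argument list reaches it:

```shell
# assign_demo is a hypothetical name; "-" makes awk read the here-document.
assign_demo() {
    awk -v tag=zero 'BEGIN { print "begin:", tag }
        { print tag, $0 }
        END { print "end:", tag }' tag=first - tag=second /dev/null <<EOF
hello
EOF
}
assign_demo
```

Run as written, this should print three lines: the value from
.Fl v ,
the input line prefixed with the first operand value, and the second operand
value in
.Ic END .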
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl version
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
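.Pp
As a minimal sketch of multi-line records, the shell function below
(the name
.Li paragraph_fields
and the sample input are assumptions made for illustration) sets
.Va RS
to the null string so that each blank-line-separated block becomes one record:

```shell
# paragraph_fields is a hypothetical name; the here-document is sample input.
paragraph_fields() {
    awk 'BEGIN { RS = "" } { print NR ": " NF " fields, last=" $NF }' <<EOF
alice 30
paris

bob 25
rome
EOF
}
paragraph_fields
```

Each record here spans two lines and has three fields, because the newline
inside a record also separates fields.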
.Pp
An input line is normally made up of fields separated by whitespace,
or by the extended regular expression
.Va FS
as described below.
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
While both gawk and mawk have the same behavior, it is unspecified in the
.St -p1003.1-2008
standard.
If
.Va FS
is a single space, then leading and trailing blank and newline characters are
skipped.
Fields are delimited by one or more blank or newline characters.
A blank character is a space or a tab.
If
.Va FS
is a single character, other than space, fields are delimited by each single
occurrence of that character.
The
.Va FS
variable defaults to a single space.
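.Pp
The field-splitting rules above can be compared side by side.
This is only a sketch; the function name
.Li fs_demo
and the input strings are made up for the example:

```shell
# fs_demo is a hypothetical name; each command prints the field count NF.
fs_demo() {
    echo "  a   b  " | awk '{ print NF }'          # default FS: runs of blanks, edges skipped
    echo "a,,b"      | awk -F, '{ print NF }'      # single character: every comma delimits
    echo "a  b"      | awk -F'[ ]' '{ print NF }'  # bracket expression: every space delimits
}
fs_demo
```

The three commands should report 2, 3 and 3 fields; the last two inputs each
contain an empty middle field.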
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
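.Pp
A short sketch of how multiple subscripts are stored follows.
The function name
.Li subsep_demo
and the array names
.Va grid
and
.Va part
are hypothetical:

```shell
# subsep_demo is a hypothetical name; grid and part are illustrative arrays.
subsep_demo() {
    awk 'BEGIN {
        grid[1, 2] = "x"                          # stored under the key 1 SUBSEP 2
        if ((1, 2) in grid) print "pair found"
        if ((1 SUBSEP 2) in grid) print "joined key found"
        for (k in grid) {                         # recover the parts of the joined key
            n = split(k, part, SUBSEP)
            print n, part[1], part[2]
        }
    }'
}
subsep_demo
```

Both membership tests succeed because they name the same single string key,
and splitting that key on
.Va SUBSEP
yields the two original subscripts.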
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
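.Pp
The parameter rules above might be exercised as in the following sketch.
The names
.Li locals_demo ,
.Li sum_to
and
.Li add_tag
are hypothetical; the extra spacing in the parameter list is only a common
convention for marking the excess parameters used as locals:

```shell
# locals_demo, sum_to and add_tag are hypothetical names.
locals_demo() {
    awk 'function sum_to(n,    i, total) {       # i and total are excess parameters, so local
            for (i = 1; i <= n; i++) total += i
            return total
        }
        function add_tag(arr) { arr["b"] = 1 }   # arrays are passed by reference
        BEGIN {
            i = 99; tags["a"] = 1
            add_tag(tags)
            print sum_to(4), i, ("b" in tags)
        }'
}
locals_demo
```

Run as written, the function prints
.Dq 10 99 1 :
the global
.Va i
is untouched by the loop inside
.Li sum_to ,
while the element added inside
.Li add_tag
is visible to the caller.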
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text of
.Fa s
that was matched by the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
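.Pp
A few of the string functions above can be tried together as in this sketch.
The function name
.Li string_demo
and the sample strings are illustrative assumptions:

```shell
# string_demo is a hypothetical name; the strings are sample data.
string_demo() {
    awk 'BEGIN {
        s = "cat hat"
        n = gsub(/[ch]at/, "<&>", s)       # & stands for the matched text
        print n, s
        if (match("foobar", /o+/)) print RSTART, RLENGTH
        print substr("foobar", 4, 10), index("foobar", "bar")
    }'
}
string_demo
```

The
.Fn gsub
call reports two replacements,
.Fn match
finds
.Dq oo
starting at position 2 with length 2, and
.Fn substr
stops at the end of the string even though 10 characters were requested.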
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
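.Pp
The interaction between
.Ic getline
and
.Fn close
on a pipe can be sketched as follows.
The function name
.Li getline_demo
and the command string are assumptions chosen for the example:

```shell
# getline_demo is a hypothetical name; cmd is an arbitrary sample command.
getline_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        while ((cmd | getline line) > 0) n++   # > 0 also stops on the -1 error return
        close(cmd)                             # same string that opened the pipe
        cmd | getline first                    # reopened, so reading starts over
        close(cmd)
        print n, line, first
    }'
}
getline_demo
```

Comparing the return value against 0 avoids an endless loop when
.Ic getline
reports an error, and closing the pipe lets the same command be run again
from its first record.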
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Fl d ,
.Fl safe ,
and
.Fl version
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
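.Pp
The two coercion idioms can be seen in a comparison, as in this sketch
(the function name
.Li coerce_demo
and the values are illustrative only):

```shell
# coerce_demo is a hypothetical name; 1 means true and 0 means false.
coerce_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"
        print (a < b), (a + 0 < b + 0)     # string comparison, then forced numeric
        x = 10; y = 9
        print (x < y), (x "" < y "")       # numeric comparison, then forced string
    }'
}
coerce_demo
```

As strings,
.Dq 10
sorts before
.Dq 9 ,
so the string comparisons are true while the numeric ones are false.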
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
.Sh DEPRECATED BEHAVIOR
One True Awk has accepted
.Fl F Ar t
to mean the same as
.Fl F Ar <TAB>
to make it easier to specify tabs as the separator character.
Upstream One True Awk has deprecated this wart in the name of better
compatibility with other awk implementations like gawk and mawk.
.Pp
Historically,
.Nm
did not accept
.Dq 0x
as a hex string.
However, because One True Awk used strtod to convert strings to floats, and because
.Dq 0x12
is a valid hexadecimal representation of a floating point number,
.Nm
on
.Fx
has accepted this notation as an extension since One True Awk was imported in
.Fx 5.0 .
Upstream One True Awk has restored the historical behavior for better
compatibility between the different awk implementations.
Both gawk and mawk already behave similarly.
Starting with
.Fx 14.0
.Nm
will no longer accept this extension.
.Pp
For many years, the
.Fx
.Nm
set the locale to match the environment it was running in.
This led to pattern ranges, like
.Dq "[A-Z]"
sometimes matching lower case characters in some locales.
This misbehavior was never in upstream One True Awk and has been removed as a
bug in
.Fx 12.3 ,
.Fx 13.1 ,
and
.Fx 14.0 .
855.Fx 14.0 .
856