.\"	$OpenBSD: awk.1,v 1.44 2015/09/14 20:06:58 schwarze Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.\"	$FreeBSD$
.Dd July 30, 2021
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl version
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl version
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
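.Pp
As a brief illustration of the multi-line record mode described above,
the following sketch prints the record number and the first line of each
blank-line-separated record.
Setting
.Va FS
to a newline, so that each line becomes one field, is a choice made for
this example only:
.Bd -literal -offset indent
BEGIN { RS = ""; FS = "\en" }
      { print NR ": " $1 }
.Ed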
.Pp
An input line is normally made up of fields separated by whitespace,
or by the extended regular expression
.Va FS
as described below.
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
While both gawk and mawk have the same behavior, it is unspecified in the
.St -p1003.1-2008
standard.
If
.Va FS
is a single space, then leading and trailing blank and newline characters are
skipped.
Fields are delimited by one or more blank or newline characters.
A blank character is a space or a tab.
If
.Va FS
is a single character, other than space, fields are delimited by each single
occurrence of that character.
The
.Va FS
variable defaults to a single space.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ic if Ar ( expression ) Ar statement Op Ic else Ar statement
.It Ic while Ar ( expression ) Ar statement
.It Ic for Ar ( expression ; expression ; expression ) statement
.It Ic for Ar ( var Ic in Ar array ) statement
.It Ic do Ar statement Ic while Ar ( expression )
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
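.Pp
Several of the statements listed above can be combined as in the
following sketch, which counts how often each whitespace-separated word
occurs in the input.
The names
.Va count ,
.Va i ,
and
.Va w
are arbitrary, and the order in which the
.Ic for
\&...
.Ic in
loop visits the subscripts should not be relied upon:
.Bd -literal -offset indent
{
        # One array element per distinct word.
        for (i = 1; i <= NF; i++)
                count[$i]++
}
END {
        for (w in count)
                print w, count[w]
}
.Ed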
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?\&:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
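.Pp
A minimal sketch of multiple subscripts follows.
Because the combined subscript is an ordinary string, it can be taken
apart again with
.Fn split
and
.Va SUBSEP .
The array names
.Va grid
and
.Va part
are illustrative only:
.Bd -literal -offset indent
BEGIN {
        grid["x", "y"] = 1
        for (k in grid) {
                # k is "x" SUBSEP "y"
                split(k, part, SUBSEP)
                print part[1], part[2]
        }
        if (("x", "y") in grid)
                print "found"
}
.Ed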
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ar file
or
.Pf >> Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
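.Pp
For illustration, the following sketch writes the first field of each
line to a file and pipes the second field through
.Xr sort 1 .
The file name
.Pa first.txt
is hypothetical.
Because identical strings denote the same open stream, the command string
is kept in a variable and reused when the pipe is closed:
.Bd -literal -offset indent
BEGIN { cmd = "sort" }
      { print $1 > "first.txt" }
      { printf "%s\en", $2 | cmd }
END   { close(cmd) }
.Ed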
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
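.Pp
The difference between an isolated regular expression and the match
operators can be sketched as follows.
The first rule tests the whole line, the second tests only the first
field against a regular expression held in a string variable
(the name
.Va num
is illustrative), and the third uses the negated match operator:
.Bd -literal -offset indent
BEGIN        { num = "^[0-9]+$" }
/error/      { print "line mentions error:", $0 }
$1 ~ num     { print "first field is a number:", $1 }
$2 !~ /^#/   { print "second field is not a comment:", $2 }
.Ed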
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
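.Pp
As a small sketch,
.Ic BEGIN
can print a header before any input is read and
.Ic END
can print a summary afterwards; the wording of the output is arbitrary:
.Bd -literal -offset indent
BEGIN { print "line", "fields" }
      { print NR, NF }
END   { print NR, "lines total" }
.Ed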
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
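.Pp
The following sketch uses several of these variables together.
It sets
.Va OFS
to a tab, a choice made for this example, and prints the file name,
the per-file record number and the last field of every record:
.Bd -literal -offset indent
BEGIN { OFS = "\et" }
      { print FILENAME, FNR, $NF }
.Ed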
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
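.Pp
For example, the following sketch defines a hypothetical function
.Fn max
that returns the largest of the first
.Fa n
elements of an array.
The extra parameters
.Fa i
and
.Fa m
are never passed by the caller and so act as local variables;
separating them from the real parameters with extra spaces is a common
convention, not a requirement:
.Bd -literal -offset indent
function max(arr, n,    i, m) {
        m = arr[1]
        for (i = 2; i <= n; i++)
                if (arr[i] > m)
                        m = arr[i]
        return m
}
    { v[NR] = $1 + 0 }          # adding 0 forces a numeric value
END { if (NR > 0) print max(v, NR) }
.Ed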
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
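.Pp
A short sketch of these functions: the first two statements derive pi
from
.Fn atan2 ,
and the remaining statements simulate one roll of a six-sided die.
Calling
.Fn srand
without an argument seeds the generator from the time of day, so the
roll is expected to differ between runs:
.Bd -literal -offset indent
BEGIN {
        pi = atan2(0, \-1)
        printf "%.4f\en", pi
        srand()
        print int(rand() * 6) + 1
}
.Ed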
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the text in
.Fa s
that matched the regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
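.Pp
The following sketch exercises several of the string functions on fixed
strings, so it needs no input.
The variable names are illustrative:
.Bd -literal -offset indent
BEGIN {
        s = "hello world"
        n = gsub(/o/, "[&]", s)         # s becomes "hell[o] w[o]rld"
        if (match(s, /w.*d/))
                print substr(s, RSTART, RLENGTH)
        m = split("a:b:c", part, ":")
        print n, m, toupper(part[2]), index(s, "w")
}
.Ed
.Pp
This is expected to print
.Ql w[o]rld
followed by
.Ql 2 3 B 9 .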
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
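.Pp
A sketch of the file and pipe forms of
.Ic getline
follows.
The file name and the
.Xr date 1
command are chosen for illustration only.
The parentheses around the
.Ic getline
expression and the explicit comparison with 0 are deliberate:
they keep the redirection from being parsed as part of the comparison,
and they stop the loop if
.Ic getline
returns \-1 on error:
.Bd -literal -offset indent
BEGIN {
        file = "/etc/passwd"
        while ((getline line < file) > 0)
                n++
        close(file)
        "date" | getline now
        close("date")
        print n + 0, "lines in", file, "counted on", now
}
.Ed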
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of the integer argument
.Fa x .
.It Fn and v1 v2 ...
Performs a bitwise AND on all arguments provided, as integers.
There must be at least two values.
.It Fn or v1 v2 ...
Performs a bitwise OR on all arguments provided, as integers.
There must be at least two values.
.It Fn xor v1 v2 ...
Performs a bitwise Exclusive-OR on all arguments provided, as integers.
There must be at least two values.
.It Fn lshift x n
Returns the integer argument
.Fa x
shifted left by
.Fa n
bits.
.It Fn rshift x n
Returns the integer argument
.Fa x
shifted right by
.Fa n
bits.
.El
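.Pp
As a quick sketch, the following program applies the bit-operation
functions to small constants.
These functions are extensions, so the program is not expected to run
under every other awk implementation:
.Bd -literal -offset indent
BEGIN {
        print and(12, 10), or(12, 10), xor(12, 10)      # 8 14 6
        print lshift(1, 4), rshift(256, 4)              # 16 16
}
.Ed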
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
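.Pp
For example, in the following sketch the exit status is 1 if any input
line contains the string
.Ql ERROR
and 0 otherwise; the variable name
.Va bad
is illustrative:
.Bd -literal -offset indent
/ERROR/ { bad = 1 }
END     { exit bad }
.Ed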
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr cut 1 ,
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification,
except
.Nm
does not support {n,m} pattern matching.
.Pp
The flags
.Fl d ,
.Fl safe ,
and
.Fl version
as well as the functions
.Fn fflush ,
.Fn compl ,
.Fn and ,
.Fn or ,
.Fn xor ,
.Fn lshift ,
and
.Fn rshift
are extensions to that specification.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
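.Pp
A minimal sketch of why the conversion idiom matters: two string
constants compare as strings, while adding 0 to each forces a numeric
comparison.
The variable names are arbitrary:
.Bd -literal -offset indent
BEGIN {
        a = "10"; b = "9"
        print (a < b)                   # compared as strings: prints 1
        print (a + 0 < b + 0)           # compared as numbers: prints 0
}
.Ed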
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
.Sh DEPRECATED BEHAVIOR
One True Awk has accepted
.Fl F Ar t
to mean the same as
.Fl F Ar <TAB>
to make it easier to specify tabs as the separator character.
Upstream One True Awk has deprecated this wart in the name of better
compatibility with other awk implementations like gawk and mawk.
.Pp
Historically,
.Nm
did not accept
.Dq 0x
as a hex string.
However, because One True Awk used
.Xr strtod 3
to convert strings to floating-point numbers, and because
.Dq 0x12
is a valid hexadecimal representation of a floating-point number,
.Nm
on
.Fx
has accepted this notation as an extension since One True Awk was imported in
.Fx 5.0 .
Upstream One True Awk has restored the historical behavior for better
compatibility between the different awk implementations.
Both gawk and mawk already behave similarly.
Starting with
.Fx 14.0 ,
.Nm
will no longer accept this extension.
.Pp
For many years, the
.Fx
.Nm
set the locale to match the environment it was running in.
This led to pattern ranges, like
.Dq "[A-Z]"
sometimes matching lower case characters in some locales.
This misbehavior was never in upstream One True Awk and has been removed as a
bug in
.Fx 12.3 ,
.Fx 13.1 ,
and
.Fx 14.0 .
857.Fx 14.0 .
858