.\"	$OpenBSD: awk.1,v 1.40 2011/05/02 11:14:11 jmc Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: May 2 2011 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
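.Pp
As an illustrative sketch, assuming a hypothetical input file named
.Pa file ,
the following command prints only its first three lines
by passing the limit in the variable
.Va n
(a name chosen for this example):
.Bd -literal -offset indent
$ awk -v n=3 'NR <= n' file
.Ed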
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
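.Pp
As a minimal sketch, assuming a hypothetical address list in which entries
are separated by blank lines and hold one item per line,
the following prints the first line of each entry.
Setting
.Va FS
to a newline is a choice made for this illustration,
so that each line of an entry becomes one field:
.Bd -literal -offset indent
BEGIN { RS = ""; FS = "\en" }
{ print $1 }
.Ed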
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
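.Pp
For instance, assuming a colon-separated input file laid out like
.Xr passwd 5 ,
a command along these lines would print the first and third fields
of each line:
.Bd -literal -offset indent
$ awk -F: '{ print $1, $3 }' /etc/passwd
.Ed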
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Xo Ic if ( Ar expression ) Ar statement
.Op Ic else Ar statement
.Xc
.It Ic while ( Ar expression ) Ar statement
.It Xo Ic for
.No ( Ar expression ; expression ; expression ) statement
.Xc
.It Xo Ic for
.No ( Ar var Ic in Ar array ) statement
.Xc
.It Xo Ic do
.Ar statement Ic while ( Ar expression )
.Xc
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
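.Pp
The following sketch illustrates output redirection; the file name
.Pa first.out
and the use of
.Xr sort 1
are assumptions chosen for the example.
The first field of every line is written to a file,
while the second is piped through a numeric sort:
.Bd -literal -offset indent
# first.out and "sort -n" are illustrative names
{ print $1 > "first.out" }
{ print $2 | "sort -n" }
END { close("sort -n") }
.Ed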
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
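.Pp
As an illustration of several of these variables,
the following one-line program (a sketch, not part of the original text)
prefixes a summary of each record with the name of the input file
and the record number within that file:
.Bd -literal -offset indent
{ print FILENAME ":" FNR ": " NF " fields, last is " $NF }
.Ed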
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if an array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
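.Pp
A minimal sketch of this idiom follows; the names
.Fn sum ,
.Va f ,
.Va i
and
.Va t
are hypothetical.
The array is passed by reference, and the extra parameters
.Va i
and
.Va t ,
set apart by additional spaces only as a convention,
act as local variables:
.Bd -literal -offset indent
# sum(a, n): add up a[1] through a[n]; i and t are locals
function sum(a, n,    i, t) {
        for (i = 1; i <= n; i++)
                t += a[i]
        return t
}
{ n = split($0, f); print sum(f, n) }
.Ed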
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
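.Pp
As a sketch of how these functions combine,
the following program seeds the generator from the time of day
and prints a random integer between 1 and 6;
the range is an arbitrary choice for the illustration:
.Bd -literal -offset indent
BEGIN { srand(); print int(rand() * 6) + 1 }
.Ed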
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced with the portion of
.Fa s
matched by
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
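.Pp
The following sketch, written for illustration only, combines several
of these functions.
Every run of digits in the line is wrapped in angle brackets using the
ampersand form of the replacement text, and then the number of
replacements is printed together with the first bracketed item:
.Bd -literal -offset indent
{
        n = gsub(/[0-9]+/, "<&>")
        if (match($0, /<[^>]*>/))
                print n, substr($0, RSTART, RLENGTH)
}
.Ed
.Pp
Given the input line
.Sq ab 12 cd 345 ,
this prints
.Sq 2 <12> .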
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
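.Pp
As a sketch of reading from a command, assuming
.Xr date 1
is available, the following reads one record from its output into the
hypothetical variable
.Va now
and then closes the stream using the same string that opened it:
.Bd -literal -offset indent
BEGIN {
        cmd = "date"
        if ((cmd | getline now) > 0)
                print "started at", now
        close(cmd)
}
.Ed
.Pp
The parentheses around the
.Ic getline
expression keep the comparison from being parsed as part of it.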
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument
.Fa x .
.It Fn and x y
Performs a bitwise AND on integer arguments
.Fa x
and
.Fa y .
.It Fn or x y
Performs a bitwise OR on integer arguments
.Fa x
and
.Fa y .
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments
.Fa x
and
.Fa y .
.It Fn lshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the left.
.It Fn rshift x n
Returns integer argument
.Fa x
shifted by
.Fa n
bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
Note, however, that an expression given to the
.Ic exit
statement sets the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
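.Pp
Count how often each word occurs, as a sketch of associative arrays;
the names
.Va count
and
.Va w
are chosen for the example, and the order of the output lines is unspecified:
.Bd -literal -offset indent
{ for (i = 1; i <= NF; i++) count[$i]++ }
END { for (w in count) print w, count[w] }
.Ed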
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support {n,m} pattern matching.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.