.\"	$OpenBSD: awk.1,v 1.13 2003/06/30 23:59:00 millert Exp $
.\" EX/EE is a Bd
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >>
.Pc ,
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
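.Pp
As a minimal sketch, assuming a POSIX shell and
.Xr echo 1
for sample input, the commands below combine
.Fl F ,
.Fl v ,
and a
.Ar var=value
operand placed before
.Sq Pa \-
(the standard input).
The variable names
.Va sep
and
.Va n
are illustrative only.
.Bd -literal -offset indent
# -F: splits on colons; sep (illustrative) is set before prog runs
echo 'a:b:c' | awk -F: -v sep=- '{ print $1 sep $3 }'
# n=1 (illustrative) is assigned just before - (stdin) is read
echo 'hello' | awk '{ print n, $0 }' n=1 -
.Ed
The first command prints
.Ql a-c
and the second prints
.Ql "1 hello" .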
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
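.Pp
A minimal sketch, assuming a POSIX shell and illustrative input; the second
command relies on the null
.Va FS
behavior described above:
.Bd -literal -offset indent
# whitespace-separated fields: prints the field count and the second field
echo 'one two   three' | awk '{ print NF, $2 }'
# null FS: one field per character
echo 'abc' | awk 'BEGIN { FS = "" } { print NF, $2 }'
.Ed
These print
.Ql "3 two"
and
.Ql "3 b"
respectively.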
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
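.Pp
The sketch below, assuming a POSIX shell and made-up input lines, shows a
pattern with a missing action (the matching line is printed) and two
statements separated by a semicolon, one of which has a missing pattern;
the counter
.Va n
is an illustrative name:
.Bd -literal -offset indent
# pattern only: prints just "banana"
{ echo apple; echo banana; echo cherry; } | awk '/an/'
# missing pattern matches every line; END then prints 3
{ echo apple; echo banana; echo cherry; } | awk '{ n++ }; END { print n }'
.Ed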
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
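.Pp
As a minimal sketch of several of these statements, assuming a POSIX shell and
invented input, the program below uses
.Ic next
to skip comment lines and a
.Ic for
loop to print the remaining fields in reverse order; the variables
.Va s
and
.Va i
are illustrative names:
.Bd -literal -offset indent
{ echo '# a comment'; echo 'a b c'; } | awk '
/^#/ { next }              # skip comment lines
{
    s = $NF                # start with the last field
    for (i = NF - 1; i > 0; i--)
        s = s " " $i       # append the earlier fields
    print s                # prints "c b a"
}'
.Ed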
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
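.Pp
The following minimal sketch, assuming a POSIX shell, stores an element under
a two-part subscript, recovers the parts by splitting the key on
.Va SUBSEP ,
and tests membership with the
.Ic in
operator; the array names
.Va a
and
.Va parts
are illustrative:
.Bd -literal -offset indent
awk 'BEGIN {
    a["x", 1] = "first"              # key is "x" SUBSEP "1"
    for (k in a) {
        split(k, parts, SUBSEP)      # recover the subscripts
        print parts[1], parts[2], a[k]
    }
    if (("x", 1) in a)
        print "found"
}'
.Ed
It prints
.Ql "x 1 first"
followed by
.Ql found .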
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present, or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
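.Pp
A minimal sketch of output to a pipe, assuming a POSIX shell and
.Xr sort 1 ;
the variable
.Va cmd
is an illustrative name.
Because both
.Ic print
statements use the same string value, they write to the same open pipe, and
.Fn close
waits for
.Xr sort 1
to finish before the final line is printed:
.Bd -literal -offset indent
awk 'BEGIN {
    cmd = "sort"            # identical strings share one pipe
    print "pear" | cmd
    print "apple" | cmd
    close(cmd)              # sort writes apple, then pear, here
    print "done"
}'
.Ed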
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Pp
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number on [0,1)
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
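.Pp
The sketch below exercises several of these functions on an illustrative
string, assuming a POSIX shell; the variable names are arbitrary.
The expected output of each line is shown in its comment:
.Bd -literal -offset indent
awk 'BEGIN {
    s = "hello, world"
    print length(s), substr(s, 1, 5), index(s, "world")  # 12 hello 8
    if (match(s, /o, w/))
        print RSTART, RLENGTH                              # 5 4
    n = split("a:b:c", parts, ":")
    print n, parts[2]                                      # 3 b
    t = s; c = gsub(/o/, "0", t)
    print c, t                                             # 2 hell0, w0rld
    print toupper(substr(s, 8)), sprintf("%05.1f", 3.14159)  # WORLD 003.1
}'
.Ed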
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
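.Pp
A minimal sketch of
.Ar cmd Ic \&| getline Va x ,
assuming a POSIX shell; the command string and the variables
.Va cmd ,
.Va line
and
.Va n
are illustrative.
The loop stops when
.Ic getline
returns 0 at end of file, and
.Fn close
releases the pipe:
.Bd -literal -offset indent
awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # 1 per line, 0 at end of file
        n++
    close(cmd)
    print n, line                      # prints "2 two"
}'
.Ed
Testing for a value greater than 0, rather than for any non-zero value,
also ends the loop if
.Ic getline
returns \-1.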
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
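.Pp
As a minimal sketch, assuming a POSIX shell and invented input, the
following uses a string held in the illustrative variable
.Va re
as a regular expression with
.Ic ~
and
.Ic !~ ,
combined with a constant regular expression through
.Ic && :
.Bd -literal -offset indent
# re (illustrative) holds a regular expression as an ordinary string
{ echo 'id=42'; echo 'name=bob'; } | awk -v re='=[0-9]+$' '
$0 ~ re             { print "numeric:", $0 }
$0 !~ re && /name/  { print "other:", $0 }'
.Ed
This prints
.Ql "numeric: id=42"
and then
.Ql "other: name=bob" .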
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers to strings
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
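.Pp
The sketch below, assuming a POSIX shell, contrasts
.Va OFS ,
.Va CONVFMT ,
and
.Va OFMT ,
and reads
.Va ENVIRON ;
the variables
.Va x
and
.Va y
are illustrative, and
.Ev GREETING
is a hypothetical environment variable set only for that command:
.Bd -literal -offset indent
awk 'BEGIN {
    OFS = "-"; print "a", "b"   # a-b
    x = 3.14159; CONVFMT = "%.3g"
    y = x ""                    # number to string uses CONVFMT: 3.14
    print x, y                  # printing a number uses OFMT: 3.14159-3.14
}'
GREETING=hi awk 'BEGIN { print ENVIRON["GREETING"] }'   # hi
.Ed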
.Pp
Functions may be defined (at the position of a pattern-action statement)
as follows:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
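.Pp
A minimal sketch of these rules, assuming a POSIX shell; the function names
.Fn fact
and
.Fn fill
and the variable names are illustrative.
The array argument
.Va sq
is modified through the reference, while the extra parameter
.Va i
stays local to
.Fn fill :
.Bd -literal -offset indent
awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }  # recursive
function fill(arr, n,    i) {       # i is an extra, local parameter
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}
BEGIN {
    fill(sq, 3)                     # sq is passed by reference
    print fact(5), sq[3], (i == "") # 120 9 1: global i is untouched
}'
.Ed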
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Pp
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
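.Pp
For example, assuming a POSIX shell, the following minimal sketch compares
the same values as strings and as numbers; the variable names are
illustrative:
.Bd -literal -offset indent
awk 'BEGIN {
    a = "10"; b = "9"                # string constants
    print "string:", (a < b), "numeric:", (a + 0 < b + 0)   # 1, then 0
    x = 10; y = 9                    # numbers
    print "numeric:", (x < y), "string:", (x "" < y "")     # 0, then 1
}'
.Ed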
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.