.\" Copyright (c) 1990 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" %sccs.include.proprietary.roff%
.\"
.\"	@(#)awk.1	6.5 (Berkeley) 04/17/91
.\"
.Dd
.Dt AWK 1
.Os ATT 7
.Sh NAME
.Nm awk
.Nd pattern scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Oo
.Op Fl \&F Ar \&c
.Oo
.Op Fl f Ar prog_file
.Op Ar prog
.Ar
.Sh DESCRIPTION
.Nm Awk
scans each input
.Ar file
for lines that match any of a set of patterns specified in
.Ar prog .
With each pattern in
.Ar prog
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
The set of patterns may appear literally as
.Ar prog
or in a file
specified as
.Fl f
.Ar prog_file .
.Pp
.Tw Ds
.Tp Cx Fl F
.Ar c
.Cx
Specify a field separator of
.Ar c .
.Tp Fl f
Use
.Ar prog_file
as an input
.Ar prog
(an awk script).
.Tp
.Pp
Files are read in order;
if there are no files, the standard input is read.
The file name
.Sq Fl
means the standard input.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
.Pp
An input line is made up of fields separated by white space.
(This default can be changed by using
.Li FS ,
.Em vide infra . )
The fields are denoted $1, $2, ... ;
$0 refers to the entire line.
.Pp
A pattern-action statement has the form
.Pp
.Dl pattern {action}
.Pp
A missing { action } means print the line;
a missing pattern always matches.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Ds I
if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ >expression ]
printf format [, expression-list ] [ >expression ]
next	# skip remaining patterns on this input line
exit	# skip the rest of the input
.De
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty expression-list stands for the whole line.
Expressions take on string or numeric values as appropriate,
and are built using the operators
+, \-, *, /, %, and concatenation (indicated by a blank).
The C operators ++, \-\-, +=, \-=, *=, /=, and %=
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Cx x
.Op i
.Cx )
.Cx
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
String constants are quoted "...".
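.Pp
As an illustrative sketch that uses only constructs documented on this page,
the following program stores every input line in an array
(the name
.Li line
is arbitrary)
and prints the lines in reverse order once the input is exhausted:
.Pp
.Ds I
	{ line[NR] = $0 }
END	{ for (i = NR; i > 0; \-\-i) print line[i] }
.De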
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ar \&>file
is present), separated by the current output field separator,
and terminated by the output record separator.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
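.Pp
A minimal sketch of both statements follows.
The first line writes the first two fields, swapped, to a file
whose name
.Pq Li swapped
is purely hypothetical;
the second numbers each input line on the standard output:
.Pp
.Ds I
{ print $2, $1 > "swapped" }
{ printf "%3d: %s\en", NR, $0 }
.De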
.Pp
The built-in function
.Ic length
returns the length of its argument
taken as a string,
or of the whole line if no argument is given.
There are also built-in functions
.Ic exp ,
.Ic log ,
.Ic sqrt
and
.Ic int .
The last truncates its argument to an integer.
The function
.Fn substr s m n
returns the
.Cx Ar n
.Cx \-
.Cx character
.Cx
substring of
.Ar s
that begins at position
.Ar m .
The
.Fn sprintf fmt expr expr \&...
function
formats the expressions
according to the
.Xr printf 3
format given by
.Ar fmt
and returns the resulting string.
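.Pp
The string functions might be combined as in the sketch below,
which is an illustration rather than a prescribed idiom.
The first line prints the length of the first field
followed by its first three characters;
the second builds a zero-padded record number in the variable
.Li tag
(an arbitrary name)
and prints it in front of the line:
.Pp
.Ds I
{ print length($1), substr($1, 1, 3) }
{ tag = sprintf("%05d", NR); print tag, $0 }
.De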
.Pp
Patterns are arbitrary Boolean combinations
(!, \(or\(or, &&, and parentheses) of
regular expressions and
relational expressions.
Regular expressions must be surrounded
by slashes and are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
between an occurrence of the first pattern
and the next occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Ds I
expression matchop regular-expression
expression relop expression
.De
.Pp
where a relop is any of the six relational operators in C,
and a matchop is either ~ (for contains)
or !~ (for does not contain).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
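.Pp
For illustration only, the two patterns below combine these forms.
The first selects lines whose first field consists entirely of digits
and whose second field exceeds 100
(the threshold is an arbitrary choice);
the second selects lines that do not begin with a
.Sq #
character.
Neither has an action, so matching lines are printed:
.Pp
.Ds I
$1 ~ /^[0-9]+$/ && $2 > 100
!/^#/
.De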
.Pp
The special patterns
.Li BEGIN
and
.Li END
may be used to capture control before the first input line is read
and after the last.
.Li BEGIN
must be the first pattern,
.Li END
the last.
.Pp
A single character
.Ar c
may be used to separate the fields by starting
the program with
.Pp
.Dl BEGIN { FS = "c" }
.Pp
or by using the
.Cx Fl F
.Ar c
.Cx
option.
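.Pp
As a sketch, assuming colon-separated input
such as that described in
.Xr passwd 5 ,
the following program prints the first field of every line:
.Pp
.Ds I
BEGIN	{ FS = ":" }
	{ print $1 }
.De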
.Pp
Other variable names with special meanings
include
.Dp Li NF
the number of fields in the current record;
.Dp Li NR
the ordinal number of the current record;
.Dp Li FILENAME
the name of the current input file;
.Dp Li OFS
the output field separator (default blank);
.Dp Li ORS
the output record separator (default newline);
.Dp Li OFMT
the output format for numbers (default "%.6g").
.Dp
.Pp
.Sh EXAMPLES
.Pp
Print lines longer than 72 characters:
.Pp
.Dl length > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Add up first column, print sum and average:
.Pp
.Ds I
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.De
.Pp
Print fields in reverse order:
.Pp
.Dl { for (i = NF; i > 0; \-\-i) print $i }
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Print all lines whose first field differs from that of the previous line:
.Pp
.Dl $1 != prev { print; prev = $1 }
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Pp
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.Em Awk \- a pattern scanning and processing language
.Sh HISTORY
.Nm Awk
appeared in Version 7 AT&T UNIX.  A much improved
and true to the book version of
.Nm awk
appeared in the AT&T Toolchest in the late 1980's.
The version of
.Nm awk
this manual page describes
is a derivative of the original and not the Toolchest version.
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate "" (an empty
string) to it.
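.Pp
A sketch of both workarounds, with arbitrary variable names
.Li a
and
.Li b :
the first line compares the first two fields as numbers,
the second compares them as strings.
For an input line such as
.Dq 1.0 1
the first comparison would be expected to succeed and the second to fail.
.Pp
.Ds I
{ if ($1 + 0 == $2 + 0) print "numerically equal" }
{ a = $1 ""; b = $2 ""; if (a == b) print "identical strings" }
.De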