.\" $OpenBSD: awk.1,v 1.13 2003/06/30 23:59:00 millert Exp $
.\" EX/EE is a Bd
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd June 29, 1996
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl F Ar fs
.Op Fl v Ar var=value
.Op Fl safe
.Op Fl mr Ar n
.Op Fl mf Ar n
.Op Ar prog | Fl f Ar progfile
.Ar
.Nm nawk
.Ar ...
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files
specified as
.Fl f Ar progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq Pa \-
means the standard input.
Any
.Ar file
of the form
.Ar var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.Fl v
followed by
.Ar var=value
is an assignment to be done before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
The
.Fl F Ar fs
option defines the input field separator to be the regular expression
.Ar fs .
The
.Fl safe
option disables file output
.Po
.Ic print Ic > ,
.Ic print Ic >> ,
.Pc
process creation
.Po
.Ar cmd Ic \&| getline ,
.Ic print \&| , system
.Pc
and access to the environment
.Pq Va ENVIRON .
This
is a first (and not very reliable) approximation to a
.Dq safe
version of
.Nm awk .
.Pp
An input line is normally made up of fields separated by whitespace,
or by regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
.Pp
To compensate for inadequate implementation of storage management,
the
.Fl mr
option can be used to set the maximum size of the input record,
and the
.Fl mf
option to set the maximum number of fields.
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bd -unfilled -offset indent
.Ic if ( Xo
.Ar expression ) statement \&
.Op Ic else Ar statement
.Xc
.Ic while ( Ar expression ) statement
.Ic for ( Xo
.Ar expression ; expression ; expression ) statement
.Xc
.Ic for ( Xo
.Ar var Ic in Ar array ) statement
.Xc
.Ic do Ar statement Ic while ( Ar expression )
.Ic break
.Ic continue
.Ic { Oo Ar statement ... Oc Ic \& }
.Ar expression Xo
.No "# commonly" \&
.Ar var Ic = Ar expression
.Xc
.Ic print Xo
.Op Ar expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic printf Ar format Xo
.Op Ar ... , expression-list
.Op Ic > Ns Ar expression
.Xc
.Ic return Op Ar expression
.Ic next Xo
.No "# skip remaining patterns on this input line"
.Xc
.Ic nextfile Xo
.No "# skip rest of this file, open next, start at top"
.Xc
.Ic delete Ar array Ns Xo
.Ic \&[ Ns Ar expression Ns Ic \&]
.No \& "# delete an array element"
.Xc
.Ic delete Ar array Xo
.No "# delete all elements of array"
.Xc
.Ic exit Xo
.Op Ar expression
.No \& "# exit immediately; status is" Ar expression
.Xc
.Ed
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
(exponentiation), and concatenation (indicated by whitespace).
The operators
.Ic ! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
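.Pp
As a small sketch of string subscripts (the array name
.Va count
and the loop variable
.Va k
are illustrative only), the following program counts how often each
distinct value of the first field occurs and prints the totals at end of
input; the order in which
.Ic for ( Ar var Ic in Ar array )
visits the subscripts is not specified:
.Bd -literal -offset indent
# count is indexed by the text of the first field
{ count[$1]++ }
END { for (k in count) print k, count[k] }
.Ed
.Pp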
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Ic > Ns Ar file
or
.Ic >> Ns Ar file
is present or on a pipe if
.Ic \&| Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 3 ) .
The built-in function
.Fn close expr
closes the file or pipe
.Fa expr .
The built-in function
.Fn fflush expr
flushes any buffered output for the file or pipe
.Fa expr .
.Pp
The mathematical functions
.Fn exp ,
.Fn log ,
.Fn sqrt ,
.Fn sin ,
.Fn cos ,
and
.Fn atan2
are built in.
Other built-in functions:
.Pp
.Bl -tag -width Fn
.It Fn length
the length of its argument
taken as a string,
or of
.Va $0
if no argument.
.It Fn rand
random number on (0,1)
.It Fn srand
sets seed for
.Fn rand
and returns the previous seed.
.It Fn int
truncates to an integer value.
.It Fn substr s m n
the
.Fa n Ns No -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
.It Fn index s t
the position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn match s r
the position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variables
.Va RSTART
and
.Va RLENGTH
are set to the position and length of the matched string.
.It Fn split s a fs
splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sub r t s
substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
.It Fn gsub r t s
same as
.Fn sub
except that all occurrences of the regular expression
are replaced;
.Fn sub
and
.Fn gsub
return the number of replacements.
.It Fn sprintf fmt expr ...
the string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 3
format
.Fa fmt .
.It Fn system cmd
executes
.Fa cmd
and returns its exit status.
.It Fn tolower str
returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Pp
The
.Sq function
.Ic getline
sets
.Va $0
to the next input record from the current input file;
.Ic getline < Ar file
sets
.Va $0
to the next record from
.Ar file .
.Ic getline Va x
sets variable
.Va x
instead.
Finally,
.Ar cmd Ic \&| getline
pipes the output of
.Ar cmd
into
.Ic getline ;
each call of
.Ic getline
returns the next line of output from
.Ar cmd .
In all cases,
.Ic getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.Xr egrep 1 .
Isolated regular expressions
in a pattern apply to the entire line.
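.Pp
For instance, a sketch of a Boolean combination of two isolated regular
expressions, in which the words being matched are arbitrary placeholders:
.Bd -literal -offset indent
# both expressions are tested against the whole line
/error/ && !/debug/ { print NR ": " $0 }
.Ed
This prints, prefixed by its record number, each line that contains
.Dq error
but does not contain
.Dq debug .
.Pp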
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Ic / Ns Ar re Ns Ic /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Bd -unfilled -offset indent
.Ar expression matchop regular-expression
.Ar expression relop expression
.Ar expression Ic in Ar array-name
.Ic \&( Ns Xo
.Ar expr , expr , \&... Ns Ic \&) in
.Ar \& array-name
.Xc
.Ed
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width Va -compact
.It Va CONVFMT
conversion format used when converting numbers
(default
.Qq Li %.6g )
.It Va FS
regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
number of fields in the current record
.It Va NR
ordinal number of the current record
.It Va FNR
ordinal number of the current record in the current file
.It Va FILENAME
the name of the current input file
.It Va RS
input record separator (default newline)
.It Va OFS
output field separator (default blank)
.It Va ORS
output record separator (default newline)
.It Va OFMT
output format for numbers (default
.Qq Li %.6g )
.It Va SUBSEP
separates multiple subscripts (default 034)
.It Va ARGC
argument count, assignable
.It Va ARGV
argument array, assignable;
non-null members are taken as filenames
.It Va ENVIRON
array of environment variables; subscripts are names.
.El
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
.Sh EXAMPLES
.Dl length($0) > 72
Print lines longer than 72 characters.
.Pp
.Dl { print $2, $1 }
Print first two fields in opposite order.
.Pp
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
Same, with input fields separated by comma and/or blanks and tabs.
.Pp
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
Add up first column, print sum and average.
.Pp
.Dl /start/, /stop/
Print all lines between start/stop pairs.
.Pp
.Bd -literal -offset indent
BEGIN { # Simulate echo(1)
        for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
        printf "\en"
        exit }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr sed 1
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
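.Pp
A minimal sketch of the excess-parameter convention for local variables
follows; the function name
.Fn sum_to
and its parameter names are hypothetical and chosen only for illustration.
Callers pass only
.Va n ;
the extra parameters
.Va i
and
.Va total
start out as the null string on every call and do not disturb the global
variable of the same name:
.Bd -literal -offset indent
# i and total are extra parameters used as locals
function sum_to(n,    i, total) {
	for (i = 1; i <= n; i++)
		total += i
	return total
}
BEGIN {
	i = 100			# global i, untouched by sum_to
	print sum_to(4), i	# prints "10 100"
}
.Ed
The extra spaces before the local names in the parameter list are only a
common visual convention and have no meaning to
.Nm .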