.\" $OpenBSD: awk.1,v 1.40 2011/05/02 11:14:11 jmc Exp $
.\"
.\" Copyright (C) Lucent Technologies 1997
.\" All Rights Reserved
.\"
.\" Permission to use, copy, modify, and distribute this software and
.\" its documentation for any purpose and without fee is hereby
.\" granted, provided that the above copyright notice appear in all
.\" copies and that both that the copyright notice and this
.\" permission notice and warranty disclaimer appear in supporting
.\" documentation, and that the name Lucent Technologies or any of
.\" its entities not be used in advertising or publicity pertaining
.\" to distribution of the software without specific, written prior
.\" permission.
.\"
.\" LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
.\" INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
.\" IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
.\" SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
.\" IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
.\" ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
.\" THIS SOFTWARE.
.\"
.Dd $Mdocdate: May 2 2011 $
.Dt AWK 1
.Os
.Sh NAME
.Nm awk
.Nd pattern-directed scanning and processing language
.Sh SYNOPSIS
.Nm awk
.Op Fl safe
.Op Fl V
.Op Fl d Ns Op Ar n
.Op Fl F Ar fs
.Op Fl v Ar var Ns = Ns Ar value
.Op Ar prog | Fl f Ar progfile
.Ar
.Sh DESCRIPTION
.Nm
scans each input
.Ar file
for lines that match any of a set of patterns specified literally in
.Ar prog
or in one or more files specified as
.Fl f Ar progfile .
With each pattern there can be an associated action that will be performed
when a line of a
.Ar file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.Sq -
means the standard input.
Any
.Ar file
of the form
.Ar var Ns = Ns Ar value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
.Pp
The options are as follows:
.Bl -tag -width "-safe "
.It Fl d Ns Op Ar n
Debug mode.
Set debug level to
.Ar n ,
or 1 if
.Ar n
is not specified.
A value greater than 1 causes
.Nm
to dump core on fatal errors.
.It Fl F Ar fs
Define the input field separator to be the regular expression
.Ar fs .
.It Fl f Ar progfile
Read program code from the specified file
.Ar progfile
instead of from the command line.
.It Fl safe
Disable file output
.Pf ( Ic print No > ,
.Ic print No >> ) ,
process creation
.Po
.Ar cmd | Ic getline ,
.Ic print | ,
.Ic system
.Pc
and access to the environment
.Pf ( Va ENVIRON ;
see the section on variables below).
This is a first
.Pq and not very reliable
approximation to a
.Dq safe
version of
.Nm .
.It Fl V
Print the version number of
.Nm
to standard output and exit.
.It Fl v Ar var Ns = Ns Ar value
Assign
.Ar value
to variable
.Ar var
before
.Ar prog
is executed;
any number of
.Fl v
options may be present.
.El
.Pp
The input is normally made up of input lines
.Pq records
separated by newlines, or by the value of
.Va RS .
If
.Va RS
is null, then any number of blank lines are used as the record separator,
and newlines are used as field separators
(in addition to the value of
.Va FS ) .
This is convenient when working with multi-line records.
.Pp
An input line is normally made up of fields separated by whitespace,
or by the regular expression
.Va FS .
The fields are denoted
.Va $1 , $2 , ... ,
while
.Va $0
refers to the entire line.
If
.Va FS
is null, the input line is split into one field per character.
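.Pp
As an illustrative sketch of the multi-line record mode described above
(the sample input is made up for the example),
setting
.Va RS
to the null string makes each block delimited by blank lines one record
and lets newlines separate fields in addition to blanks:
.Bd -literal -offset indent
awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first is " $1 }' <<EOF
alpha beta
gamma

delta
EOF
.Ed
.Pp
This should print
.Dq 1: 3 fields, first is alpha
followed by
.Dq 2: 1 fields, first is delta .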
.Pp
Normally, any number of blanks separate fields.
In order to set the field separator to a single blank, use the
.Fl F
option with a value of
.Sq [\ \&] .
If a field separator of
.Sq t
is specified,
.Nm
treats it as if
.Sq \et
had been specified and uses
.Aq TAB
as the field separator.
In order to use a literal
.Sq t
as the field separator, use the
.Fl F
option with a value of
.Sq [t] .
.Pp
A pattern-action statement has the form
.Pp
.D1 Ar pattern Ic \&{ Ar action Ic \&}
.Pp
A missing
.Ic \&{ Ar action Ic \&}
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.Pp
Newlines are permitted after a terminating statement or following a comma
.Pq Sq ,\& ,
an open brace
.Pq Sq { ,
a logical AND
.Pq Sq && ,
a logical OR
.Pq Sq || ,
after the
.Sq do
or
.Sq else
keywords,
or after the closing parenthesis of an
.Sq if ,
.Sq for ,
or
.Sq while
statement.
Additionally, a backslash
.Pq Sq \e
can be used to escape a newline between tokens.
.Pp
An action is a sequence of statements.
A statement can be one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Xo Ic if ( Ar expression ) Ar statement
.Op Ic else Ar statement
.Xc
.It Ic while ( Ar expression ) Ar statement
.It Xo Ic for
.No ( Ar expression ; expression ; expression ) statement
.Xc
.It Xo Ic for
.No ( Ar var Ic in Ar array ) statement
.Xc
.It Xo Ic do
.Ar statement Ic while ( Ar expression )
.Xc
.It Ic break
.It Ic continue
.It Xo Ic {
.Op Ar statement ...
.Ic }
.Xc
.It Xo Ar expression
.No # commonly
.Ar var No = Ar expression
.Xc
.It Xo Ic print
.Op Ar expression-list
.Op > Ns Ar expression
.Xc
.It Xo Ic printf Ar format
.Op Ar ... , expression-list
.Op > Ns Ar expression
.Xc
.It Ic return Op Ar expression
.It Xo Ic next
.No # skip remaining patterns on this input line
.Xc
.It Xo Ic nextfile
.No # skip rest of this file, open next, start at top
.Xc
.It Xo Ic delete
.Sm off
.Ar array Ic \&[ Ar expression Ic \&]
.Sm on
.No # delete an array element
.Xc
.It Xo Ic delete Ar array
.No # delete all elements of array
.Xc
.It Xo Ic exit
.Op Ar expression
.No # exit immediately; status is Ar expression
.Xc
.El
.Pp
Statements are terminated by
semicolons, newlines or right braces.
An empty
.Ar expression-list
stands for
.Ar $0 .
String constants are quoted
.Li \&"" ,
with the usual C escapes recognized within
(see
.Xr printf 1
for a complete list of these).
Expressions take on string or numeric values as appropriate,
and are built using the operators
.Ic + \- * / % ^
.Pq exponentiation ,
and concatenation
.Pq indicated by whitespace .
The operators
.Ic \&! ++ \-\- += \-= *= /= %= ^=
.Ic > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.Li x[i] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.Li [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.Va SUBSEP
.Pq see the section on variables below .
.Pp
The
.Ic print
statement prints its arguments on the standard output
(or on a file if
.Pf > Ns Ar file
or
.Pf >> Ns Ar file
is present or on a pipe if
.Pf |\ \& Ar cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.Ar file
and
.Ar cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.Ic printf
statement formats its expression list according to the format
(see
.Xr printf 1 ) .
.Pp
Patterns are arbitrary Boolean combinations
(with
.Ic "\&! || &&" )
of regular expressions and
relational expressions.
.Nm
supports extended regular expressions
.Pq EREs .
See
.Xr re_format 7
for more information on regular expressions.
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.Ic ~
and
.Ic !~ .
.Pf / Ns Ar re Ns /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.Pp
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.Pp
A relational expression is one of the following:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It Ar expression matchop regular-expression
.It Ar expression relop expression
.It Ar expression Ic in Ar array-name
.It Xo Ic \&( Ns
.Ar expr , expr , \&... Ns Ic \&) in
.Ar array-name
.Xc
.El
.Pp
where a
.Ar relop
is any of the six relational operators in C, and a
.Ar matchop
is either
.Ic ~
(matches)
or
.Ic !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
.Pp
The special patterns
.Ic BEGIN
and
.Ic END
may be used to capture control before the first input line is read
and after the last.
.Ic BEGIN
and
.Ic END
do not combine with other patterns.
.Pp
Variable names with special meanings:
.Pp
.Bl -tag -width "FILENAME " -compact
.It Va ARGC
Argument count, assignable.
.It Va ARGV
Argument array, assignable;
non-null members are taken as filenames.
.It Va CONVFMT
Conversion format when converting numbers
(default
.Qq Li %.6g ) .
.It Va ENVIRON
Array of environment variables; subscripts are names.
.It Va FILENAME
The name of the current input file.
.It Va FNR
Ordinal number of the current record in the current file.
.It Va FS
Regular expression used to separate fields; also settable
by option
.Fl F Ar fs .
.It Va NF
Number of fields in the current record.
.Va $NF
can be used to obtain the value of the last field in the current record.
.It Va NR
Ordinal number of the current record.
.It Va OFMT
Output format for numbers (default
.Qq Li %.6g ) .
.It Va OFS
Output field separator (default blank).
.It Va ORS
Output record separator (default newline).
.It Va RLENGTH
The length of the string matched by the
.Fn match
function.
.It Va RS
Input record separator (default newline).
.It Va RSTART
The starting position of the string matched by the
.Fn match
function.
.It Va SUBSEP
Separates multiple subscripts (default 034).
.El
.Sh FUNCTIONS
The awk language has a variety of built-in functions:
arithmetic, string, input/output, general, and bit-operation.
.Pp
Functions may be defined (at the position of a pattern-action statement)
thusly:
.Pp
.Dl function foo(a, b, c) { ...; return x }
.Pp
Parameters are passed by value if scalar, and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
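.Pp
A minimal sketch of the excess-parameter idiom described above;
the function name
.Fn sum
and its variables are made up for illustration.
The extra parameters
.Va i
and
.Va total
are never passed by the caller, so they behave as locals
and leave the global
.Va i
untouched:
.Bd -literal -offset indent
awk 'function sum(n,   i, total) {
	for (i = 1; i <= n; i++)
		total += i
	return total
}
BEGIN { i = 99; print sum(4), i }'
.Ed
.Pp
This should print
.Dq 10 99 .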
.Ss Arithmetic Functions
.Bl -tag -width "atan2(y, x)"
.It Fn atan2 y x
Return the arctangent of
.Fa y Ns / Ns Fa x
in radians.
.It Fn cos x
Return the cosine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn exp x
Return the exponential of
.Fa x .
.It Fn int x
Return
.Fa x
truncated to an integer value.
.It Fn log x
Return the natural logarithm of
.Fa x .
.It Fn rand
Return a random number,
.Fa n ,
such that
.Sm off
.Pf 0 \*(Le Fa n No \*(Lt 1 .
.Sm on
.It Fn sin x
Return the sine of
.Fa x ,
where
.Fa x
is in radians.
.It Fn sqrt x
Return the square root of
.Fa x .
.It Fn srand expr
Sets seed for
.Fn rand
to
.Fa expr
and returns the previous seed.
If
.Fa expr
is omitted, the time of day is used instead.
.El
.Ss String Functions
.Bl -tag -width "split(s, a, fs)"
.It Fn gsub r t s
The same as
.Fn sub
except that all occurrences of the regular expression are replaced.
.Fn gsub
returns the number of replacements.
.It Fn index s t
The position in
.Fa s
where the string
.Fa t
occurs, or 0 if it does not.
.It Fn length s
The length of
.Fa s
taken as a string,
or of
.Va $0
if no argument is given.
.It Fn match s r
The position in
.Fa s
where the regular expression
.Fa r
occurs, or 0 if it does not.
The variable
.Va RSTART
is set to the starting position of the matched string
.Pq which is the same as the returned value
or zero if no match is found.
The variable
.Va RLENGTH
is set to the length of the matched string,
or \-1 if no match is found.
.It Fn split s a fs
Splits the string
.Fa s
into array elements
.Va a[1] , a[2] , ... , a[n]
and returns
.Va n .
The separation is done with the regular expression
.Ar fs
or with the field separator
.Va FS
if
.Ar fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.It Fn sprintf fmt expr ...
The string resulting from formatting
.Fa expr , ...
according to the
.Xr printf 1
format
.Fa fmt .
.It Fn sub r t s
Substitutes
.Fa t
for the first occurrence of the regular expression
.Fa r
in the string
.Fa s .
If
.Fa s
is not given,
.Va $0
is used.
An ampersand
.Pq Sq &
in
.Fa t
is replaced in string
.Fa s
with the text matched by regular expression
.Fa r .
A literal ampersand can be specified by preceding it with two backslashes
.Pq Sq \e\e .
A literal backslash can be specified by preceding it with another backslash
.Pq Sq \e\e .
.Fn sub
returns the number of replacements.
.It Fn substr s m n
Return at most the
.Fa n Ns -character
substring of
.Fa s
that begins at position
.Fa m
counted from 1.
If
.Fa n
is omitted, or if
.Fa n
specifies more characters than are left in the string,
the length of the substring is limited by the length of
.Fa s .
.It Fn tolower str
Returns a copy of
.Fa str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.It Fn toupper str
Returns a copy of
.Fa str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.El
.Ss Input/Output and General Functions
.Bl -tag -width "getline [var] < file"
.It Fn close expr
Closes the file or pipe
.Fa expr .
.Fa expr
should match the string that was used to open the file or pipe.
.It Ar cmd | Ic getline Op Va var
Read a record of input from a stream piped from the output of
.Ar cmd .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If the stream is not open, it is opened.
As long as the stream remains open, subsequent calls
will read subsequent records from the stream.
The stream remains open until explicitly closed with a call to
.Fn close .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Fn fflush [expr]
Flushes any buffered output for the file or pipe
.Fa expr ,
or all open files or pipes if
.Fa expr
is omitted.
.Fa expr
should match the string that was used to open the file or pipe.
.It Ic getline
Sets
.Va $0
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NF ,
.Va NR ,
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Ic getline Va var
Sets variable
.Va var
to the next input record from the current input file.
This form of
.Ic getline
sets the variables
.Va NR
and
.Va FNR .
.Ic getline
returns 1 for a successful input, 0 for end of file, and \-1 for an error.
.It Xo
.Ic getline Op Va var
.Pf \ \&< Ar file
.Xc
Reads the next record from
.Ar file .
If
.Va var
is omitted, the variables
.Va $0
and
.Va NF
are set.
Otherwise
.Va var
is set.
If
.Ar file
is not open, it is opened.
As long as the stream remains open, subsequent calls will read subsequent
records from
.Ar file .
.Ar file
remains open until explicitly closed with a call to
.Fn close .
.It Fn system cmd
Executes
.Fa cmd
and returns its exit status.
.El
.Ss Bit-Operation Functions
.Bl -tag -width "lshift(a, b)"
.It Fn compl x
Returns the bitwise complement of integer argument x.
.It Fn and x y
Performs a bitwise AND on integer arguments x and y.
.It Fn or x y
Performs a bitwise OR on integer arguments x and y.
.It Fn xor x y
Performs a bitwise Exclusive-OR on integer arguments x and y.
.It Fn lshift x n
Returns integer argument x shifted by n bits to the left.
.It Fn rshift x n
Returns integer argument x shifted by n bits to the right.
.El
.Sh EXIT STATUS
.Ex -std awk
.Pp
But note that the
.Ic exit
expression can modify the exit status.
.Sh EXAMPLES
Print lines longer than 72 characters:
.Pp
.Dl length($0) > 72
.Pp
Print first two fields in opposite order:
.Pp
.Dl { print $2, $1 }
.Pp
Same, with input fields separated by comma and/or blanks and tabs:
.Bd -literal -offset indent
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.Ed
.Pp
Add up first column, print sum and average:
.Bd -literal -offset indent
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
.Ed
.Pp
Print all lines between start/stop pairs:
.Pp
.Dl /start/, /stop/
.Pp
Simulate echo(1):
.Bd -literal -offset indent
BEGIN {	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.Ed
.Pp
Print an error message to standard error:
.Bd -literal -offset indent
{ print "error!" > "/dev/stderr" }
.Ed
.Sh SEE ALSO
.Xr lex 1 ,
.Xr printf 1 ,
.Xr sed 1 ,
.Xr re_format 7 ,
.Xr script 7
.Rs
.%A A. V. Aho
.%A B. W. Kernighan
.%A P. J. Weinberger
.%T The AWK Programming Language
.%I Addison-Wesley
.%D 1988
.%O ISBN 0-201-07981-X
.Re
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl \&dV
and
.Op Fl safe ,
as well as the commands
.Cm fflush , compl , and , or ,
.Cm xor , lshift , rshift ,
are extensions to that specification.
.Pp
.Nm
does not support {n,m} pattern matching.
.Sh HISTORY
An
.Nm
utility appeared in
.At v7 .
.Sh BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number, add 0 to it;
to force it to be treated as a string, concatenate
.Li \&""
to it.
.Pp
The scope rules for variables in functions are a botch;
the syntax is worse.
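.Pp
As a small sketch of the two conversion idioms mentioned above
(the variable names are arbitrary),
concatenating the null string forces a string comparison
and adding 0 forces a numeric one:
.Bd -literal -offset indent
awk 'BEGIN {
	a = 10; b = 9		# numeric values
	s = "10"; t = "9"	# string values
	r1 = (a < b)		# numeric comparison
	r2 = (a "" < b "")	# forced string comparison
	r3 = (s < t)		# string comparison
	r4 = (s+0 < t+0)	# forced numeric comparison
	print r1, r2, r3, r4
}'
.Ed
.Pp
This should print
.Dq 0 1 1 0 .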