.\" $OpenBSD: sort.1,v 1.17 2002/02/10 15:50:15 aaron Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"     The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"     This product includes software developed by the University of
.\"     California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd June 6, 1993
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinrH
.Op Fl t Ar char
.Op Fl R Ar char
.Oo
.Cm Fl k Ar field1[,field2]
.Oc
.Ar ...
.Op Fl T Ar dir
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The options are as follows:
.Bl -tag -width file Ds
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output, except the error messages on
.Em stderr .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the contents of the environment variable
.Ev TMPDIR
or
.Pa /var/tmp
if
.Ev TMPDIR
does not exist.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60MB.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining key offsets.
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1[,field2]
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a maximal sequence of characters other than the
field separator and record separator
.Pq newline by default .
Initial blank spaces are included in the field unless
.Fl b
has been specified;
the first blank space of a sequence of blank spaces acts as the field
separator and is included in the field (unless
.Fl t
is specified).
For example, by default all blank spaces at the beginning of a line are
considered to be part of the first field.
.Pp
Fields are specified by the
.Fl k Ar field1[,field2]
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
.Em (m,n > 0)
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field; if the
.Fl b
option is in effect,
.Em n
is counted from the first non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first non-blank character in the
.Em m Ns th
field.
.No 1\&. Ns Em n
refers to the
.Em n Ns th
character from the beginning of the line;
if
.Em n
is greater than the length of the line, the field is taken to be empty.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w+1.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
.Pp
The
.Nm
utility shall exit with one of the following values:
.Pp
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh ENVIRONMENT
.Bl -tag -width Fl
.It Ev TMPDIR
Path in which to store temporary files.
Note that
.Ev TMPDIR
may be overridden by the
.Fl T
option.
.El
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
default temporary files
.It Pa Ar output Ns #PID
temporary name for
.Ar output
if
.Ar output
already exists
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr radixsort 3 ,
.Xr uniq 1
.Sh HISTORY
A
.Nm
command appeared in
.At v3 .
.Sh NOTES
.Nm
has no limits on input line length (other than imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
The current sort command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions which
used quick and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
.Sh BUGS
To sort files larger than 60MB, use
.Nm
.Fl H ;
files larger than 704MB must be sorted in smaller pieces, then merged.
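.Pp
The piecewise approach for oversized files can be sketched as follows,
assuming that
.Xr split 1
is available.
The names
.Pa big.txt ,
.Pa big.sorted ,
and
.Pa chunk.*
are hypothetical, and the two-line piece size only keeps the illustration small;
this is one possible arrangement, not a prescribed procedure:
.Bd -literal -offset indent
# Stand-in input; in practice big.txt is the oversized file.
{ echo pear; echo apple; echo fig; echo cherry; } > big.txt

# Cut the input into pieces of 2 lines each (use a far larger
# count for real data); pieces are named chunk.aa, chunk.ab, ...
split -l 2 big.txt chunk.

# Sort each piece in place; -o may name one of the input files.
for f in chunk.*; do
    sort -o "$f" "$f"
done

# Merge the pre-sorted pieces, then remove them.
sort -m -o big.sorted chunk.*
rm -f chunk.*
.Ed
.Pp
For real data, pick a line count that keeps every piece comfortably
below the size limits given above.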