.\" $NetBSD: sort.1,v 1.18 2002/02/08 01:36:33 ross Exp $
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"    This product includes software developed by the University of
.\"    California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd January 13, 2001
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfHinrsS
.Op Fl t Ar char
.Op Fl R Ar char
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl T Ar dir
.Op Fl o Ar output
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60Mb.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Xo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Xc
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
.Sh RETURN VALUES
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh ENVIRONMENT
If the following environment variable exists, it is utilized by
.Nm "" .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Pa Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
To sort files larger than 60Mb, use
.Nm
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
.Sh NOTES
This
.Nm
has no limits on input line length (other than imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with the record separator (which is typically
newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions,
which used quick and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
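.Sh EXAMPLES
The following invocations are a minimal sketch of the
.Fl t
and
.Fl k
syntax described in this page, not an exhaustive reference.
They assume that
.Pa /etc/passwd
has the usual layout of seven colon-separated fields, with the numeric
user ID in the third field and the login shell in the seventh.
.Pp
Sort by the third field, compared numerically:
.Bd -literal -offset indent
sort -t : -k 3,3n /etc/passwd
.Ed
.Pp
Sort on the first character of the first field only:
.Bd -literal -offset indent
sort -t : -k 1.1,1.1 /etc/passwd
.Ed
.Pp
Keep one line for each distinct value of the seventh field:
.Bd -literal -offset indent
sort -t : -k 7,7 -u /etc/passwd
.Ed
.Pp
Giving an explicit
.Ar field2 ,
as in each command above, restricts the key to the named field instead of
letting it run to the end of the line.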