.\" $NetBSD: sort.1,v 1.34 2013/05/29 15:00:35 wiz Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd May 29, 2013
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm
.Op Fl bcdfHilmnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
See also
.Fl u .
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
plus or minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Aq Ar alphanumeric
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers.
It may be faster to sort very large files in pieces and then explicitly
merge them.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) and no restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file doesn't end with a record separator (which is typically newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions, which used
quick and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
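.Sh EXAMPLES
The key options described above can be sketched as follows.
This is an illustration only: the colon-separated records and the shell
variable
.Va data
are hypothetical, and the expected output shown in the comments assumes
that lines with equal keys keep their input order, as the default
stable sort does.

```shell
# Hypothetical sample records of the form name:department:salary.
data='carol:eng:120
alice:ops:95
bob:eng:95
dave:ops:120'

# Numeric sort on field 3 only; -t makes ':' the field separator.
printf '%s\n' "$data" | sort -t : -k 3,3n
# alice:ops:95
# bob:eng:95
# carol:eng:120
# dave:ops:120

# Two keys: department ascending, then salary descending within each.
printf '%s\n' "$data" | sort -t : -k 2,2 -k 3,3nr
# carol:eng:120
# bob:eng:95
# dave:ops:120
# alice:ops:95

# -c only checks the order: no output, exit status 1 on disorder.
printf '%s\n' "$data" | sort -c -t : -k 3,3n 2>/dev/null
echo "exit status: $?"
# exit status: 1
```

.Pp
Giving each key an explicit end field (3,3 rather than 3) keeps the key
short, in line with the advice in
.Sx NOTES .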