.\" $NetBSD: sort.1,v 1.32 2010/12/18 23:36:23 wiz Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd December 18, 2010
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl bcdfHilmnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers.
It may be faster to sort very large files in pieces and then explicitly
merge them.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (which is typically a newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
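.Sh EXAMPLES
The following invocations are illustrative sketches that use only the options
described in this manual page; the file names
.Pa names.txt ,
.Pa a.sorted ,
and
.Pa b.sorted
are hypothetical.
.Pp
Sort a colon-separated file numerically on its third field only:
.Bd -literal -offset indent
sort -t : -k 3,3n /etc/passwd
.Ed
.Pp
Sort on the second blank-separated field, ignoring leading blanks and case:
.Bd -literal -offset indent
sort -b -f -k 2,2 names.txt
.Ed
.Pp
Check whether a file is already sorted, relying on the exit status:
.Bd -literal -offset indent
if sort -c names.txt; then echo sorted; fi
.Ed
.Pp
Remove lines with duplicate keys, writing the result back to the input file:
.Bd -literal -offset indent
sort -u -o names.txt names.txt
.Ed
.Pp
Merge two files that are assumed to be sorted already:
.Bd -literal -offset indent
sort -m a.sorted b.sorted
.Ed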