.\" $OpenBSD: sort.1,v 1.54 2015/04/05 14:20:22 millert Exp $
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd $Mdocdate: April 5 2015 $
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort, merge, or sequence check text and binary files
.Sh SYNOPSIS
.Nm sort
.Op Fl bCcdfgHhiMmnRrsuVz
.Op Fl k Ar field1 Ns Op , Ns Ar field2
.Op Fl o Ar output
.Op Fl S Ar size
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text and binary files by lines.
A line is a record separated from the subsequent record by a
newline (default) or NUL \'\\0\' character (-z option).
A record can contain any printable or unprintable characters.
Comparisons are based on one or more sort keys extracted from
each line of input, and are performed lexicographically,
according to the current locale's collating rules and the
specified command-line options that can tune the actual
sorting behavior.
By default, if keys are not given,
.Nm
uses entire lines for comparison.
.Pp
If no
.Ar file
is specified, or if
.Ar file
is
.Sq - ,
the standard input is used.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl C , Fl Fl check=silent|quiet
Check that the single input file is sorted.
If it is, exit 0; if it's not, exit 1.
In either case, produce no output.
.It Fl c , Fl Fl check
Like
.Fl C ,
but additionally write a message to
.Em stderr
if the input file is not sorted.
.It Fl m , Fl Fl merge
Merge only; the input files are assumed to be pre-sorted.
If they are not sorted, the output order is undefined.
.It Fl o Ar output , Fl Fl output Ns = Ns Ar output
Write the output to the
.Ar output
file instead of the standard output.
This file can be the same as one of the input files.
.It Fl S Ar size , Fl Fl buffer-size Ns = Ns Ar size
Use a memory buffer no larger than
.Ar size .
The modifiers %, b, K, M, G, T, P, E, Z, and Y can be used.
If no memory limit is specified,
.Nm
may use up to about 90% of available memory.
If the input is too big to fit into the memory buffer,
temporary files are used.
.It Fl s
Stable sort; maintains the original record order of records that have
an equal key.
This is a non-standard feature, but it is widely accepted and used.
.It Fl T Ar dir , Fl Fl temporary-directory Ns = Ns Ar dir
Store temporary files in the directory
.Ar dir .
The default path is the value of the environment variable
.Ev TMPDIR
or
.Pa /var/tmp
if
.Ev TMPDIR
is not defined.
.It Fl u , Fl Fl unique
Unique: suppress all but one in each set of lines having equal keys.
This option implies a stable sort (see
.Fl s
above).
If used with
.Fl C
or
.Fl c ,
.Nm
also checks that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
If ordering options appear before the first
.Fl k
option, they apply globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override all global ordering options for that key.
Note that the ordering options intended to apply globally should not
appear after
.Fl k
or results may be unexpected.
.Bl -tag -width indent
.It Fl d , Fl Fl dictionary-order
Consider only blank spaces and alphanumeric characters in comparisons.
.It Fl f , Fl Fl ignore-case
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl g , Fl Fl general-numeric-sort , Fl Fl sort=general-numeric
Sort by general numerical value.
As opposed to
.Fl n ,
this option handles general floating-point numbers.
It has a more
permissive format than that allowed by
.Fl n
but it has a significant performance drawback.
.It Fl h , Fl Fl human-numeric-sort , Fl Fl sort=human-numeric
Sort by numerical value, but take into account the SI suffix,
if present.
Sorts first by numeric sign (negative, zero, or
positive); then by SI suffix (either empty, or `k' or `K', or one
of `MGTPEZY', in that order); and finally by numeric value.
The SI suffix must immediately follow the number.
For example, '12345K' sorts before '1M', because M is "larger" than K.
This sort option is useful for sorting the output of a single invocation
of the
.Xr df 1
command with the
.Fl h
or
.Fl H
options (human-readable).
.It Fl i , Fl Fl ignore-nonprinting
Ignore all non-printable characters.
.It Fl M , Fl Fl month-sort , Fl Fl sort=month
Sort by month abbreviations.
Unknown strings are considered smaller than valid month names.
.It Fl n , Fl Fl numeric-sort , Fl Fl sort=numeric
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
Leading blank characters are ignored.
.It Fl R , Fl Fl random-sort , Fl Fl sort=random
Sort lines in random order.
This is a random permutation of the inputs with the exception that
equal keys sort together.
It is implemented by hashing the input keys and sorting the hash values.
The hash function is randomized with data from
.Xr arc4random_buf 3 ,
or by file content if one is specified via
.Fl Fl random-source .
If multiple sort fields are specified,
the same random hash function is used for all of them.
.It Fl r , Fl Fl reverse
Sort in reverse order.
.It Fl V , Fl Fl version-sort
Sort version numbers.
The input lines are treated as file names in the form
PREFIX VERSION SUFFIX, where SUFFIX matches the regular expression
"(\.([A-Za-z~][A-Za-z0-9~]*)?)*".
The files are compared by their prefixes and versions (leading
zeros are ignored in version numbers, see example below).
If an input string does not match the pattern, then it is compared
using the byte compare function.
All string comparisons are performed in the C locale.
.Pp
For example:
.Bd -literal -offset indent
$ ls sort* | sort -V
sort-1.022.tgz
sort-1.23.tgz
sort-1.23.1.tgz
sort-1.024.tgz
sort-1.024.003.
sort-1.024.003.tgz
sort-1.024.07.tgz
sort-1.024.009.tgz
.Ed
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width indent
.It Fl b , Fl Fl ignore-leading-blanks
Ignore leading blank space when determining the start
and end of a restricted sort key (see
.Fl k ) .
If
.Fl b
is specified before the first
.Fl k
option, it applies globally to all key specifications.
Otherwise,
.Fl b
can be attached independently to each
.Ar field
argument of the key specifications.
Note that
.Fl b
should not appear after
.Fl k ,
and that it has no effect unless key fields are specified.
.It Xo
.Fl k Ar field1 Ns Op , Ns Ar field2 ,
.Fl Fl key Ns = Ns Ar field1 Ns Op , Ns Ar field2
.Xc
Define a restricted sort key that has the starting position
.Ar field1 ,
and optional ending position
.Ar field2
of a key field.
The
.Fl k
option may be specified multiple times,
in which case subsequent keys are compared after earlier keys compare equal.
The
.Fl k
option replaces the obsolete options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 ,
but the old notation is also supported.
.It Fl t Ar char , Fl Fl field-separator Ns = Ns Ar char
Use
.Ar char
as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining key offsets.
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
To use NUL as the field separator, use
.Fl t
\'\\0\'.
.It Fl z , Fl Fl zero-terminated
Use NUL as the record separator.
By default, records in the files are expected to be separated by
newline characters.
With this option, NUL (\'\\0\') is used as the record separator character.
.El
.Pp
Other options:
.Bl -tag -width indent
.It Fl Fl batch-size Ns = Ns Ar num
Specify the maximum number of files that can be opened by
.Nm
at once.
This option affects behavior when having many input files or using
temporary files.
The minimum value is 2.
The default value is 16.
.It Fl Fl compress-program Ns = Ns Ar program
Use
.Ar program
to compress temporary files.
When invoked with no arguments,
.Ar program
must compress standard input to standard output.
When called with the
.Fl d
option, it must decompress standard input to standard output.
If
.Ar program
fails,
.Nm
will exit with an error.
The
.Xr compress 1
and
.Xr gzip 1
utilities meet these requirements.
.It Fl Fl debug
Print some extra information about the sorting process to the
standard output.
.It Fl Fl files0-from Ns = Ns Ar filename
Take the input file list from the file
.Ar filename .
The file names must be separated by NUL
(like the output produced by the command
.Dq find ... -print0 ) .
.It Fl Fl heapsort
Try to use heap sort, if the sort specifications allow.
This sort algorithm cannot be used with
.Fl u
and
.Fl s .
.It Fl Fl help
Print the help text and exit.
.It Fl Fl mergesort , Fl H
Use mergesort.
This is a universal algorithm that can always be used,
but it is not always the fastest.
.It Fl Fl mmap
Try to use the file memory-mapping system call.
It may increase speed in some cases.
.It Fl Fl qsort
Try to use quick sort, if the sort specifications allow.
This sort algorithm cannot be used with
.Fl u
and
.Fl s .
.It Fl Fl radixsort
Try to use radix sort, if the sort specifications allow.
The radix sort can only be used for trivial locales (C and POSIX),
and it cannot be used for numeric or month sort.
Radix sort is very fast and stable.
.It Fl Fl random-source Ns = Ns Ar filename
For random sort, the contents of
.Ar filename
are used as the source of the
.Sq seed
data for the hash function.
Two invocations of random sort with the same seed data will
produce the same result if the input is also identical.
By default, the
.Xr arc4random_buf 3
function is used instead.
.It Fl Fl version
Print the version and exit.
.El
.Pp
A field is defined as a maximal sequence of characters other than the
field separator and record separator
.Pq newline by default .
Initial blank spaces are included in the field unless
.Fl b
has been specified;
the first blank space of a sequence of blank spaces acts as the field
separator and is included in the field (unless
.Fl t
is specified).
For example, by default all blank spaces at the beginning of a line are
considered to be part of the first field.
.Pp
Fields are specified by the
.Fl k Ar field1 Ns Op , Ns Ar field2
option.
If
.Ar field2
is missing, the end of the key defaults to the end of the line.
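.Pp
As a minimal sketch of a restricted key (the colon-separated records are
invented for this example), the following sorts numerically on the second
field only, using
.Fl t
to set the separator and
.Fl k Ar 2,2n
to limit the key to that field:
.Bd -literal -offset indent
sort -t : -k 2,2n <<EOF
b:2
a:10
c:1
EOF
.Ed
.Pp
This should print the records in the order c:1, b:2, a:10.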
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
.Em (m,n > 0)
and can be followed by one or more of the modifiers
.Cm b , d , f , i ,
.Cm n , g , M
and
.Cm r ,
which correspond to the options discussed above.
When
.Cm b
is specified it applies only to
.Ar field1
or
.Ar field2
where it is specified while the rest of the modifiers
apply to the whole key field regardless of whether they are
specified only with
.Ar field1
or
.Ar field2
or both.
A
.Ar field1
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field; if the
.Fl b
option is in effect,
.Em n
is counted from the first non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first non-blank character in the
.Em m Ns th
field.
.No 1\&. Ns Em n
refers to the
.Em n Ns th
character from the beginning of the line;
if
.Em n
is greater than the length of the line, the field is taken to be empty.
.Pp
.Em n Ns th
positions are always counted from the field beginning, even if the field
is shorter than the number of specified positions.
Thus, the key can really start from a position in a subsequent field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character (including separators) from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolete option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
The obsolete
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
.Sh ENVIRONMENT
.Bl -tag -width Fl
.It Ev GNUSORT_NUMERIC_COMPATIBILITY
If defined,
.Fl t
will not override the locale numeric symbols, that is, thousand
separators and decimal separators.
By default, if we specify
.Fl t
with the same symbol as the thousand separator or decimal point,
the symbol will be treated as the field separator.
Older behavior was less definite: the symbol was treated as both field
separator and numeric separator, simultaneously.
This environment variable enables the old behavior.
.It Ev LANG
Used as a last resort to determine different kinds of locale-specific
behavior if neither the respective environment variable nor
.Ev LC_ALL
is set.
.It Ev LC_ALL
Locale settings that override all of the other locale settings.
This environment variable can be used to set all these settings
to the same value at once.
.It Ev LC_COLLATE
Locale settings to be used to determine the collation for
sorting records.
.It Ev LC_CTYPE
Locale settings to be used for case conversion and classification
of characters, that is, which characters are considered
whitespace, etc.
.It Ev LC_MESSAGES
Locale settings that determine the language of output messages
that
.Nm
prints out.
.It Ev LC_NUMERIC
Locale settings that determine the number format used in numeric sort.
.It Ev LC_TIME
Locale settings that determine the month format used in month sort.
.It Ev TMPDIR
Path to the directory in which temporary files will be stored.
Note that
.Ev TMPDIR
may be overridden by the
.Fl T
option.
.El
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/.bsdsort.PID.*
Temporary files.
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It 0
Successfully sorted the input files or, if used with
.Fl C
or
.Fl c ,
the input file already met the sorting criteria.
.It 1
On disorder (or non-uniqueness) with the
.Fl C
or
.Fl c
options.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
The flags
.Op Fl gHhiMRSsTVz
are extensions to that specification.
.Pp
All long options are extensions to the specification.
Some are provided for compatibility with GNU
.Nm ,
others are specific to this implementation.
.Pp
Some implementations of
.Nm
honor the
.Fl b
option even when no key fields are specified.
This implementation follows historic practice and
.St -p1003.1-2008
in only honoring
.Fl b
when it precedes a key field.
.Pp
The historic practice of allowing the
.Fl o
option to appear after the
.Ar file
is supported for compatibility with older versions of
.Nm .
.Pp
The historic key notations
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2
are supported for compatibility with older versions of
.Nm
but their use is highly discouraged.
.Sh HISTORY
A
.Nm
command appeared in
.At v3 .
.Sh AUTHORS
.An Gabor Kovesdan Aq Mt gabor@FreeBSD.org
.An Oleg Moskalenko Aq Mt mom040267@gmail.com
.Sh CAVEATS
This implementation of
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
The performance depends highly on locale settings,
efficient choice of sort keys and key complexity.
The fastest sort is with the C locale, on whole lines, with option
.Fl s .
In general, the C locale is the fastest, followed by single-byte
locales with multi-byte locales being the slowest.
The correct collation order is respected in all cases.
For the key specification, the simpler the lines are to process,
the faster the sort will be.
.Pp
When sorting by arithmetic value, using
.Fl n
results in much better performance than
.Fl g
so its use is encouraged whenever possible.
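.Pp
As a small illustration of a case where
.Fl g
is still needed (the input values are invented for this sketch),
consider a number written in exponent notation, which
.Fl n
reads only up to its leading digits:
.Bd -literal -offset indent
sort -g <<EOF
1e3
20
3
EOF
.Ed
.Pp
With
.Fl g
the lines should come out as 3, 20, 1e3; with
.Fl n
instead, 1e3 would be placed first, since only the leading 1 is read
as the number.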