.\" $OpenBSD: sort.1,v 1.65 2022/03/31 17:27:27 naddy Exp $
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)sort.1 8.1 (Berkeley) 6/6/93
.\"
.Dd $Mdocdate: March 31 2022 $
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort, merge, or sequence check text and binary files
.Sh SYNOPSIS
.Nm sort
.Op Fl bCcdfgHhiMmnRrsuVz
.Op Fl k Ar field1 Ns Op , Ns Ar field2
.Op Fl o Ar output
.Op Fl S Ar size
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text and binary files by lines.
A line is a record separated from the subsequent record by a
newline (default) or NUL
.Ql \e0
character
.Po
.Fl z
option
.Pc .
A record can contain any printable or unprintable characters.
Comparisons are based on one or more sort keys extracted from
each line of input, and are performed lexicographically,
according to the specified command-line options
that can tune the actual sorting behavior.
By default, if keys are not given,
.Nm
uses entire lines for comparison.
.Pp
If no
.Ar file
is specified, or if
.Ar file
is
.Sq - ,
the standard input is used.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl C , Fl Fl check Ns = Ns Cm silent Ns | Ns Cm quiet
Check that the single input file is sorted.
If it is, exit 0; if it's not, exit 1.
In either case, produce no output.
.It Fl c , Fl Fl check
Like
.Fl C ,
but additionally write a message to
.Em stderr
if the input file is not sorted.
.It Fl m , Fl Fl merge
Merge only; the input files are assumed to be pre-sorted.
If they are not sorted, the output order is undefined.
.It Fl o Ar output , Fl Fl output Ns = Ns Ar output
Write the output to the
.Ar output
file instead of the standard output.
This file can be the same as one of the input files.
.It Fl S Ar size , Fl Fl buffer-size Ns = Ns Ar size
Use a memory buffer no larger than
.Ar size .
The modifiers %, b, K, M, G, T, P, E, Z, and Y can be used.
If no memory limit is specified,
.Nm
may use up to about 90% of available memory.
If the input is too big to fit into the memory buffer,
temporary files are used.
.It Fl s
Stable sort; maintains the original record order of records that have
an equal key.
This is a non-standard feature, but it is widely accepted and used.
.It Fl T Ar dir , Fl Fl temporary-directory Ns = Ns Ar dir
Store temporary files in the directory
.Ar dir .
The default path is the value of the environment variable
.Ev TMPDIR
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u , Fl Fl unique
Unique: suppress all but one in each set of lines having equal keys.
This option implies a stable sort (see
.Fl s
above).
If used with
.Fl C
or
.Fl c ,
.Nm
also checks that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
If ordering options appear before the first
.Fl k
option, they apply globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override all global ordering options for that key.
Note that the ordering options intended to apply globally should not
appear after
.Fl k
or results may be unexpected.
.Bl -tag -width indent
.It Fl d , Fl Fl dictionary-order
Consider only blank spaces and alphanumeric characters in comparisons.
.It Fl f , Fl Fl ignore-case
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl g , Fl Fl general-numeric-sort , Fl Fl sort Ns = Ns Cm general-numeric
Sort by general numerical value.
As opposed to
.Fl n ,
this option handles general floating points.
It has a more
permissive format than that allowed by
.Fl n
but it has a significant performance drawback.
.It Fl h , Fl Fl human-numeric-sort , Fl Fl sort Ns = Ns Cm human-numeric
Sort by numerical value, but take into account the SI suffix,
if present.
Sorts first by numeric sign (negative, zero, or
positive); then by SI suffix (either empty, or `k' or `K', or one
of `MGTPEZY', in that order); and finally by numeric value.
The SI suffix must immediately follow the number.
For example, '12345K' sorts before '1M', because M is "larger" than K.
This sort option is useful for sorting the output of a single invocation
of the
.Xr df 1
command with the
.Fl h
or
.Fl H
options (human-readable).
.It Fl i , Fl Fl ignore-nonprinting
Ignore all non-printable characters.
.It Fl M , Fl Fl month-sort , Fl Fl sort Ns = Ns Cm month
Sort by month abbreviations.
Unknown strings are considered smaller than valid month names.
.It Fl n , Fl Fl numeric-sort , Fl Fl sort Ns = Ns Cm numeric
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
is sorted by arithmetic value.
Leading blank characters are ignored.
.It Fl R , Fl Fl random-sort , Fl Fl sort Ns = Ns Cm random
Sort lines in random order.
This is a random permutation of the inputs with the exception that
equal keys sort together.
It is implemented by hashing the input keys and sorting the hash values.
The hash function is randomized with data from
.Xr arc4random_buf 3 ,
or by file content if one is specified via
.Fl Fl random-source .
If multiple sort fields are specified,
the same random hash function is used for all of them.
.It Fl r , Fl Fl reverse
Sort in reverse order.
.It Fl V , Fl Fl version-sort
Sort version numbers.
The input lines are treated as file names in the form
PREFIX VERSION SUFFIX, where SUFFIX matches the regular expression
"(\e.([A-Za-z~][A-Za-z0-9~]*)?)*".
The files are compared by their prefixes and versions (leading
zeros are ignored in version numbers, see example below).
If an input string does not match the pattern, then it is compared
using the byte compare function.
.Pp
For example:
.Bd -literal -offset indent
$ ls sort* | sort -V
sort-1.022.tgz
sort-1.23.tgz
sort-1.23.1.tgz
sort-1.024.tgz
sort-1.024.003.
sort-1.024.003.tgz
sort-1.024.07.tgz
sort-1.024.009.tgz
.Ed
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width indent
.It Fl b , Fl Fl ignore-leading-blanks
Ignore leading blank space when determining the start
and end of a restricted sort key (see
.Fl k ) .
If
.Fl b
is specified before the first
.Fl k
option, it applies globally to all key specifications.
Otherwise,
.Fl b
can be attached independently to each
.Ar field
argument of the key specifications.
Note that
.Fl b
should not appear after
.Fl k ,
and that it has no effect unless key fields are specified.
.It Xo
.Fl k Ar field1 Ns Op , Ns Ar field2 ,
.Fl Fl key Ns = Ns Ar field1 Ns Op , Ns Ar field2
.Xc
Define a restricted sort key that has the starting position
.Ar field1 ,
and optional ending position
.Ar field2
of a key field.
The
.Fl k
option may be specified multiple times,
in which case subsequent keys are compared after earlier keys compare equal.
The
.Fl k
option replaces the obsolete options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 ,
but the old notation is also supported.
.It Fl t Ar char , Fl Fl field-separator Ns = Ns Ar char
Use
.Ar char
as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining key offsets.
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
To use NUL as field separator, use
.Fl t
\(aq\e0\(aq.
.It Fl z , Fl Fl zero-terminated
Use NUL as the record separator.
By default, records in the files are expected to be separated by
the newline characters.
With this option, NUL
.Pq Ql \e0
is used as the record separator character.
.El
.Pp
Other options:
.Bl -tag -width indent
.It Fl Fl batch-size Ns = Ns Ar num
Specify maximum number of files that can be opened by
.Nm
at once.
This option affects behavior when having many input files or using
temporary files.
The minimum value is 2.
The default value is 16.
.It Fl Fl compress-program Ns = Ns Ar program
Use
.Ar program
to compress temporary files.
When invoked with no arguments,
.Ar program
must compress standard input to standard output.
When called with the
.Fl d
option, it must decompress standard input to standard output.
If
.Ar program
fails,
.Nm
will exit with an error.
The
.Xr compress 1
and
.Xr gzip 1
utilities meet these requirements.
.It Fl Fl debug
Print some extra information about the sorting process to the
standard output.
.It Fl Fl files0-from Ns = Ns Ar filename
Take the input file list from the file
.Ar filename .
The file names must be separated by NUL
(like the output produced by the command
.Dq find ... -print0 ) .
.It Fl Fl heapsort
Try to use heap sort, if the sort specifications allow.
This sort algorithm cannot be used with
.Fl u
and
.Fl s .
.It Fl Fl help
Print the help text and exit.
.It Fl H , Fl Fl mergesort
Use mergesort.
This is a universal algorithm that can always be used,
but it is not always the fastest.
.It Fl Fl mmap
Try to use the file memory mapping system call.
It may increase speed in some cases.
.It Fl Fl qsort
Try to use quick sort, if the sort specifications allow.
This sort algorithm cannot be used with
.Fl u
and
.Fl s .
.It Fl Fl radixsort
Try to use radix sort, if the sort specifications allow.
The radix sort can only be used for trivial locales (C and POSIX),
and it cannot be used for numeric or month sort.
Radix sort is very fast and stable.
.It Fl Fl random-source Ns = Ns Ar filename
For random sort, the contents of
.Ar filename
are used as the source of the
.Sq seed
data for the hash function.
Two invocations of random sort with the same seed data
produce the same result if the input is also identical.
By default, the
.Xr arc4random_buf 3
function is used instead.
.It Fl Fl version
Print the version and exit.
.El
.Pp
A field is defined as a maximal sequence of characters other than the
field separator and record separator
.Pq newline by default .
Initial blank spaces are included in the field unless
.Fl b
has been specified;
the first blank space of a sequence of blank spaces acts as the field
separator and is included in the field (unless
.Fl t
is specified).
For example, by default all blank spaces at the beginning of a line are
considered to be part of the first field.
.Pp
Fields are specified by the
.Fl k Ar field1 Ns Op , Ns Ar field2
option.
If
.Ar field2
is missing, the end of the key defaults to the end of the line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
.Em (m,n > 0)
and can be followed by one or more of the modifiers
.Cm b , d , f , i ,
.Cm n , g , M
and
.Cm r ,
which correspond to the options discussed above.
When
.Cm b
is specified, it applies only to
.Ar field1
or
.Ar field2
where it is specified, while the rest of the modifiers
apply to the whole key field regardless of whether they are
specified only with
.Ar field1
or
.Ar field2
or both.
A
.Ar field1
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field; if the
.Fl b
option is in effect,
.Em n
is counted from the first non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first non-blank character in the
.Em m Ns th
field.
.No 1\&. Ns Em n
refers to the
.Em n Ns th
character from the beginning of the line;
if
.Em n
is greater than the length of the line, the field is taken to be empty.
.Pp
.Em n Ns th
positions are always counted from the field beginning, even if the field
is shorter than the number of specified positions.
Thus, the key can really start from a position in a subsequent field.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character (including separators) from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolete option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
The obsolete
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
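.Pp
As a minimal sketch of the key notation described above, the following
script sorts colon-separated records numerically on the third field only,
and then sorts short strings numerically on characters 2 through 3 of the
first field.
The sample records and the function names
.Li by_uid
and
.Li by_offset
are hypothetical and serve only as illustration.
.Bd -literal -offset indent
# Hypothetical sample data in the form login:password:uid.
# Sort on field 3 only, compared as a number.
by_uid() {
	sort -t : -k 3,3n <<EOF
carol:x:1002
alice:x:1000
bob:x:999
EOF
}

# Sort on characters 2-3 of field 1, compared as a number.
by_offset() {
	sort -k 1.2,1.3n <<EOF
x09
y10
z08
EOF
}

by_uid
by_offset
.Ed
.Pp
Giving each key an explicit end position, as in
.Fl k Ar 3,3n ,
keeps the comparison from extending to the end of the line.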
.Sh ENVIRONMENT
.Bl -tag -width Ds
.It Ev TMPDIR
Path to the directory in which temporary files will be stored.
Note that
.Ev TMPDIR
may be overridden by the
.Fl T
option.
.El
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /tmp/.bsdsort.PID.*
Temporary files.
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It 0
Successfully sorted the input files or if used with
.Fl C
or
.Fl c ,
the input file already met the sorting criteria.
.It 1
On disorder (or non-uniqueness) with the
.Fl C
or
.Fl c
options.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification, except that it ignores the user's
.Xr locale 1
and always assumes
.Ev LC_ALL Ns =C.
.Pp
The flags
.Op Fl gHhiMRSsTVz
are extensions to that specification.
.Pp
All long options are extensions to the specification.
Some are provided for compatibility with GNU
.Nm ,
others are specific to this implementation.
.Pp
Some implementations of
.Nm
honor the
.Fl b
option even when no key fields are specified.
This implementation follows historic practice and
.St -p1003.1-2008
in only honoring
.Fl b
when it precedes a key field.
.Pp
The historic practice of allowing the
.Fl o
option to appear after the
.Ar file
is supported for compatibility with older versions of
.Nm .
.Pp
The historic key notations
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2
are supported for compatibility with older versions of
.Nm
but their use is highly discouraged.
.Sh HISTORY
A
.Nm
command appeared in
.At v1 .
.Sh AUTHORS
.An Gabor Kovesdan Aq Mt gabor@FreeBSD.org
.An Oleg Moskalenko Aq Mt mom040267@gmail.com
.Sh CAVEATS
This implementation of
.Nm
has no limits on input line length (other than imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
The performance depends highly on
efficient choice of sort keys and key complexity.
The fastest sort is on whole lines, with option
.Fl s .
For the key specification, the simpler the lines are to process,
the faster the sort will be.
.Pp
When sorting by arithmetic value, using
.Fl n
results in much better performance than
.Fl g
so its use is encouraged whenever possible.
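.Pp
A small sketch may make the difference in accepted formats between the
two options concrete.
The sample values are chosen only for illustration, and the helper function
.Li sample
is hypothetical and not part of
.Nm .
With
.Fl n
only the initial digit of
.Ql 1e3
should be read, whereas
.Fl g
should read the whole string as 1000.
.Bd -literal -offset indent
# Hypothetical sample values; 1e3 is exponent notation for 1000.
sample() {
	cat <<EOF
1e3
5
20
EOF
}

sample | sort -n	# 1e3 is read as 1, so it sorts first
sample | sort -g	# 1e3 is read as 1000, so it sorts last
.Ed
.Pp
For input consisting only of plain decimal numbers,
.Fl n
is the cheaper choice, as noted above.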