.\"	$NetBSD: sort.1,v 1.31 2010/12/18 23:09:48 christos Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd December 18, 2010
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm
.Op Fl bcdfHilmnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
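.Pp
As an illustrative sketch of these options, a script might check a file,
sort it in place while dropping duplicates, and then merge two
pre-sorted files.
The file names
.Pa names.txt ,
.Pa a.sorted ,
.Pa b.sorted ,
and
.Pa all.sorted
are hypothetical.
.Bd -literal -offset indent
# Exit status 0 means names.txt is already sorted; 1 means it is not.
sort -c names.txt

# Sort in place, keeping one line per key; -o may name an input file.
sort -u -o names.txt names.txt

# Merge two files that are already sorted, without re-sorting them.
sort -m -o all.sorted a.sorted b.sorted
.Ed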
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Considers all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
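.Pp
A brief sketch of how the ordering options change the result, assuming a
hypothetical file
.Pa sizes.txt
that holds one integer per line:
.Bd -literal -offset indent
# Lexicographic order: "10" sorts before "9".
sort sizes.txt

# Arithmetic order: 9 sorts before 10.
sort -n sizes.txt

# Arithmetic order, largest value first.
sort -nr sizes.txt
.Ed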
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignores leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
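.Pp
For example, assuming colon-separated records in the style of
.Xr passwd 5 ,
where the third field is a numeric user ID, the following sketch sorts
by that field alone:
.Bd -literal -offset indent
# -t : makes ":" the field separator; -k 3,3n restricts the key to
# field 3 and compares it by arithmetic value.
sort -t : -k 3,3n /etc/passwd
.Ed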
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
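.Pp
The key syntax described above can be sketched with a few invocations.
The file name
.Pa data.txt
and its layout (blank-separated fields) are assumptions made for
illustration only.
.Bd -literal -offset indent
# Key is the second field only, including its leading blanks;
# use -k 2b,2 to skip them.
sort -k 2,2 data.txt

# Key runs from the start of the second field to the end of the line.
sort -k 2 data.txt

# Key starts at the third character of the first field and ends
# with the last character of that field.
sort -k 1.3,1 data.txt

# Primary key: field 3, arithmetic and reversed.
# Ties are broken by field 1.
sort -k 3,3nr -k 1,1 data.txt
.Ed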
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
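.Pp
A shell script can branch on these values.
The following is a minimal sketch; the file name
.Pa list.txt
and the messages are hypothetical.
.Bd -literal -offset indent
sort -c list.txt
case $? in
0) echo "list.txt is sorted" ;;
1) echo "list.txt is out of order" ;;
*) echo "sort failed" ;;
esac
.Ed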
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers.
It may be faster to sort very large files in pieces and then explicitly
merge them.
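.Pp
The piecewise approach can be sketched as follows, using
.Xr split 1
to produce the pieces.
The file names and the chunk size are arbitrary assumptions, and whether
this is actually faster depends on the data and the system.
.Bd -literal -offset indent
# Break big.txt into pieces of 1000000 lines each,
# named piece.aa, piece.ab, and so on.
split -l 1000000 big.txt piece.

# Sort each piece in place.
for f in piece.*; do
	sort -o "$f" "$f"
done

# Merge the sorted pieces without re-sorting them.
sort -m -o big.sorted piece.*
.Ed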
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (typically a newline), the
.Nm
utility silently supplies one.
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends highly on efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.