.\"	$NetBSD: sort.1,v 1.34 2013/05/29 15:00:35 wiz Exp $
.\"
.\" Copyright (c) 2000-2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Ben Harris and Jaromir Dolecek.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd May 29, 2013
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm
.Op Fl bcdfHilmnrSsu
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl o Ar output
.Op Fl R Ar char
.Op Fl T Ar dir
.Op Fl t Ar char
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
See also
.Fl u .
.It Fl H
Ignored for compatibility with earlier versions of
.Nm .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort, which keeps records with equal keys in their original order.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
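.Pp
The documented behavior of
.Fl c ,
.Fl s ,
and
.Fl u
can be modeled with the following Python sketch, which works on lines already held in memory.
It is illustrative only: the helper names
.Fn check_sorted
and
.Fn sort_lines
are hypothetical, keys are compared as plain Python strings rather than the way
.Nm
compares them, and keeping the first line of each group of equal keys under
.Fl u
is an assumption.
.Bd -literal -offset indent
```python
# Hypothetical sketch of sort -c, -s and -u on an in-memory list of lines.

def identity(line):
    return line


def check_sorted(lines, key=identity, unique=False):
    """Exit status documented for "sort -c": 0 if the lines are in
    order, 1 on disorder or, when unique is set (-c -u), on a
    duplicate key."""
    for prev, cur in zip(lines, lines[1:]):
        a, b = key(prev), key(cur)
        if a > b or (unique and a == b):
            return 1
    return 0


def sort_lines(lines, key=identity, unique=False):
    """Stable sort (the default, -s): lines with equal keys keep their
    input order.  With unique set (-u), only the first line of each
    group of equal keys is kept (an assumption about which one)."""
    out = sorted(lines, key=key)  # Python's sorted() is guaranteed stable
    if unique:
        kept = []
        for line in out:
            if not kept or key(kept[-1]) != key(line):
                kept.append(line)
        out = kept
    return out
```
.Ed
.Pp
For example, with a key that selects the first word,
.Fn sort_lines
orders the lines
.Dq b 2 ,
.Dq a 1 ,
.Dq b 1
as
.Dq a 1 ,
.Dq b 2 ,
.Dq b 1 ,
and with
.Fa unique
set it drops the second
.Dq b
line.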
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl l
Sort by the string length of the field, not by the field itself.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
plus or minus sign, and zero or more digits (possibly including a decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
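.Pp
The ordering rules above, including the rule that flags attached to a key replace the global ones, can be sketched in Python as a function that maps a key to the value actually compared.
The helper names are hypothetical, character classes come from Python's string methods rather than the current locale, the order in which several flags are applied is an assumption, and an empty numeric string is treated as zero.
.Bd -literal -offset indent
```python
# Hypothetical sketch of the ordering flags d, f, i, l, n and r.
BLANKS = " " + chr(9)  # space and tab, spelled without escapes


def numeric_prefix(s):
    """Value of the initial numeric string used by -n: optional blanks,
    an optional plus or minus sign, then digits with at most one
    decimal point.  An empty numeric string counts as 0 (assumption)."""
    s = s.lstrip(BLANKS)
    sign, i = 1, 0
    if s[:1] in ("+", "-"):
        sign, i = (-1 if s[0] == "-" else 1), 1
    j, seen_point = i, False
    while j < len(s) and ("0" <= s[j] <= "9" or (s[j] == "." and not seen_point)):
        seen_point = seen_point or s[j] == "."
        j += 1
    digits = s[i:j]
    return sign * float(digits) if digits not in ("", ".") else 0.0


def key_value(key, flags):
    """Map a key to the value compared under the given ordering flags."""
    if "d" in flags:  # keep only blanks and alphanumerics
        key = "".join(c for c in key if c.isalnum() or c in BLANKS)
    if "i" in flags:  # drop non-printable characters
        key = "".join(c for c in key if c.isprintable())
    if "f" in flags:  # fold lowercase onto uppercase
        key = key.upper()
    if "l" in flags:  # compare lengths, not contents
        return len(key)
    if "n" in flags:  # compare the arithmetic value of the numeric prefix
        return numeric_prefix(key)
    return key


def effective_flags(global_flags, key_flags):
    """Ordering flags attached to a -k key replace all global ones."""
    return key_flags if key_flags else global_flags


def sort_by(lines, flags):
    """Stable sort of whole lines; "r" reverses the comparison."""
    return sorted(lines, key=lambda s: key_value(s, flags), reverse="r" in flags)
```
.Ed
.Pp
For example, with the flag
.Cm n
the lines
.Dq 10 apples ,
.Dq 9 pears ,
.Dq -1 figs
sort as
.Dq -1 figs ,
.Dq 9 pears ,
.Dq 10 apples ,
whereas a plain string comparison places
.Dq 10 apples
before
.Dq 9 pears .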
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Aq Ar alphanumeric
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1 Ns Op Li \&, Ns Ar field2
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
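.Pp
The effect of
.Fl t
and
.Fl b
on a key that spans one whole field can be sketched in Python as follows.
The function names are hypothetical, and returning an empty key for a missing field is an assumption rather than documented behavior.
.Bd -literal -offset indent
```python
# Hypothetical sketch of field selection with -t char (and -b).
BLANKS = " " + chr(9)  # space and tab


def fields_with_separator(record, char):
    """Fields of a record under -t char.  Every occurrence of char ends
    a field, so two adjacent separators delimit an empty field, and the
    separator itself belongs to no field."""
    return record.split(char)


def key_field(record, m, char, skip_blanks=False):
    """The whole m-th field (1-based).  skip_blanks mimics -b by
    ignoring the field's leading blank space."""
    fields = fields_with_separator(record, char)
    if m > len(fields):
        return ""  # missing field: empty key (assumption)
    field = fields[m - 1]
    return field.lstrip(BLANKS) if skip_blanks else field
```
.Ed
.Pp
For instance, splitting
.Dq root::0:0
on
.Ql \&:
gives four fields, the second of which is empty.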
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
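.Pp
The default field splitting described above, in which leading blank space belongs to the field it precedes, can be sketched in Python.
The function name is hypothetical, and treating trailing blank space as a final all-blank field is an assumption.
.Bd -literal -offset indent
```python
# Hypothetical sketch of default (no -t) field splitting.
BLANKS = " " + chr(9)  # space and tab


def fields_default(record):
    """Split a record into fields: each field is a run of blanks
    followed by a run of non-blanks, so leading blanks stay with the
    field they precede and joining the fields restores the record."""
    fields = []
    i, n = 0, len(record)
    while i < n:
        start = i
        while i < n and record[i] in BLANKS:
            i += 1
        while i < n and record[i] not in BLANKS:
            i += 1
        fields.append(record[start:i])
    return fields
```
.Ed
.Pp
Because the fields are contiguous slices, joining the fields returned for any record reproduces that record exactly.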
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm l , n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
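.Pp
As an illustration, the following Python sketch computes where a
.Ar field1
position
.Ar m Ns Li \&. Ns Ar n
starts when
.Fl t
is in effect.
The function name and its arguments are hypothetical, and clamping a position that lies past the end of its field to that field's end is an assumption.
.Bd -literal -offset indent
```python
# Hypothetical sketch: offset of a field1 position m.n under -t char.
BLANKS = " " + chr(9)  # space and tab


def field1_offset(record, char, m, n=1, skip_blanks=False):
    """Offset in record where the key m.n starts (m, n > 0).
    With skip_blanks (-b), n counts from the first non-blank
    character of field m, so m.1 with -b is that character."""
    fields = record.split(char)
    if m > len(fields):
        return len(record)  # no such field: start at end of record (assumption)
    # Each earlier field is followed by one separator, which belongs to no field.
    start = sum(len(f) + 1 for f in fields[:m - 1])
    end = start + len(fields[m - 1])
    if skip_blanks:
        field = fields[m - 1]
        start += len(field) - len(field.lstrip(BLANKS))
    return min(start + n - 1, end)  # clamp to the field (assumption)
```
.Ed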
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
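.Pp
Key extraction without
.Fl t
and without
.Fl b ,
following the
.Ar field1
and
.Ar field2
rules above, can be sketched in Python as follows.
The function names are hypothetical, a
.Ar field2
with
.Ar m
= 0 is not modeled, and clamping positions to their field is an assumption.
.Bd -literal -offset indent
```python
# Hypothetical sketch of -k m1.n1[,m2.n2] without -t and without -b.
BLANKS = " " + chr(9)  # space and tab


def field_bounds(record):
    """(start, end) offsets of each field; leading blanks belong to
    the field they precede."""
    bounds = []
    i, n = 0, len(record)
    while i < n:
        start = i
        while i < n and record[i] in BLANKS:
            i += 1
        while i < n and record[i] not in BLANKS:
            i += 1
        bounds.append((start, i))
    return bounds


def field2_end(record, m, n=None):
    """Offset just past the key end m.n.  n counts the characters of
    field m including its leading blanks; a missing n means the last
    character of field m."""
    bounds = field_bounds(record)
    if m > len(bounds):
        return len(record)  # past the last field: end of line (assumption)
    start, end = bounds[m - 1]
    return end if n is None else min(start + n, end)  # clamp (assumption)


def key_text(record, m1, n1=1, m2=None, n2=None):
    """Text of the key; a missing field2 extends it to the end of the line."""
    bounds = field_bounds(record)
    if m1 > len(bounds):
        return ""
    s = bounds[m1 - 1][0] + n1 - 1
    e = len(record) if m2 is None else field2_end(record, m2, n2)
    return record[s:e]
```
.Ed
.Pp
This shows why, without
.Fl b ,
the key
.Fl k Ar 2,2
of a line with two blanks before its second word begins with those two blanks.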
.Sh ENVIRONMENT
If the following environment variable exists, it is used by
.Nm .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Ar output Ns NUMBER
Temporary file used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
POSIX requires that the locale's thousands separator be ignored in numbers;
this implementation does not do so.
It may be faster to sort very large files in pieces and then explicitly
merge them.
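.Pp
The piecewise approach suggested above can be sketched in Python: sort fixed-size chunks into temporary files, then merge the sorted files, which is the job
.Fl m
does for pre-sorted input.
The function name and chunk size are hypothetical; the temporary files come from Python's tempfile module, which honors
.Ev TMPDIR ,
and the merged result is collected into a list for brevity, where a real tool would stream it to the output.
.Bd -literal -offset indent
```python
# Hypothetical sketch: sort a large input in pieces, then merge them.
import heapq
import os
import tempfile


def sort_in_pieces(lines, chunk_size):
    """lines: an iterable of newline-terminated strings."""
    paths, chunk = [], []

    def flush():
        # Write one sorted chunk to its own temporary file.
        fd, path = tempfile.mkstemp(prefix="sort.")
        with os.fdopen(fd, "w") as f:
            f.writelines(sorted(chunk))
        paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(p) for p in paths]
    try:
        # heapq.merge lazily interleaves already-sorted inputs.
        return list(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in paths:
            os.unlink(p)
```
.Ed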
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in directories where it cannot create or remove links.
.Pp
Input files should be text files.
If a file does not end with a record separator (typically newline), the
.Nm
utility silently supplies one.
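.Pp
Splitting input into records, including the silently supplied final separator, might look like the following Python sketch.
The function name is hypothetical; the separator is passed explicitly (newline unless
.Fl R
is given), and each record keeps its terminator.
.Bd -literal -offset indent
```python
# Hypothetical sketch: split data into records terminated by sep.
def split_records(data, sep):
    """Return the records of data, each ending in sep.  If data does
    not end with sep, one is supplied first."""
    if not data:
        return []
    if not data.endswith(sep):
        data += sep
    # The text after the final separator is empty, so drop that piece.
    return [r + sep for r in data.split(sep)[:-1]]
```
.Ed
.Pp
For example, with
.Ql \&|
as the separator, the data
.Dq b|a
yields the records
.Dq b|
and
.Dq a| ,
so sorting and rejoining them produces
.Dq a|b| .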
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends heavily on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
but may take twice as long.
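.Pp
Lexicographic radix sorting can be illustrated with a small most-significant-digit radix sort of byte-string keys in Python.
It is a teaching sketch, not the implementation
.Nm
uses, and the function name is hypothetical; like the utility, it needs every key in memory.
.Bd -literal -offset indent
```python
# Hypothetical sketch: most-significant-digit (MSD) radix sort of byte strings.
def radix_sort(keys, depth=0):
    """Sort byte strings lexicographically, matching bytes comparison.
    Keys are distributed into 256 buckets by their byte at position
    depth; keys that end at depth sort before all longer ones."""
    if len(keys) <= 1:
        return list(keys)
    out = [k for k in keys if len(k) == depth]  # exhausted keys come first
    buckets = [[] for _ in range(256)]
    for k in keys:
        if len(k) > depth:
            buckets[k[depth]].append(k)  # indexing bytes yields an int
    for bucket in buckets:
        out.extend(radix_sort(bucket, depth + 1))
    return out
```
.Ed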