.\"	$NetBSD: sort.1,v 1.18 2002/02/08 01:36:33 ross Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd January 13, 2001
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfHinrsS
.Op Fl t Ar char
.Op Fl R Ar char
.Oo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Oc
.Op Fl T Ar dir
.Op Fl o Ar output
.Op Ar
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The following options are available:
.Bl -tag -width Fl
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the value specified in the environment variable
.Ev TMPDIR ,
or
.Pa /tmp
if
.Ev TMPDIR
is not defined.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width Fl
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
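.Pp
As a minimal sketch of
.Fl u ,
.Fl o ,
and
.Fl m ,
the commands below assume a POSIX shell, the C locale (so that
comparisons are byte-wise), and the hypothetical files
.Pa fruit.txt
and
.Pa more.txt
in the current directory.
Results may differ under other locales.
.Bd -literal -offset indent
```shell
# Assumes a POSIX shell; LC_ALL=C makes comparisons byte-wise.
export LC_ALL=C
# Hypothetical input files.
cat > fruit.txt <<'EOF'
pear
apple
pear
EOF
cat > more.txt <<'EOF'
banana
cherry
EOF
sort -u fruit.txt              # apple, pear: the duplicate line is dropped
sort -o fruit.txt fruit.txt    # sort in place: apple, pear, pear
sort -m fruit.txt more.txt     # merge sorted files: apple, banana, cherry, pear, pear
```
.Ed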
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl S
Do not use a stable sort.
The default is to use a stable sort.
.It Fl s
Use a stable sort.
This is the default.
Provided for compatibility with other
.Nm
implementations only.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60MB.
.El
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width Fl
.It Fl b
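.Pp
The following sketch contrasts the default comparison with
.Fl n ,
.Fl r ,
and
.Fl f .
It assumes a POSIX shell, the C locale, and the hypothetical input files
.Pa nums.txt
and
.Pa words.txt .
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical input files.
cat > nums.txt <<'EOF'
10
9
100
EOF
sort nums.txt        # 10, 100, 9: character-by-character comparison
sort -n nums.txt     # 9, 10, 100: arithmetic value
sort -rn nums.txt    # 100, 10, 9: numeric, reversed
cat > words.txt <<'EOF'
banana
apple
Cherry
EOF
sort words.txt       # Cherry, apple, banana: uppercase sorts first in the C locale
sort -f words.txt    # apple, banana, Cherry: case is folded
```
.Ed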
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar \*[Lt]alphanumeric\*[Gt]
usually produces undesirable results.
The default record separator is newline.
.It Xo
.Fl k
.Ar field1 Ns Op Li \&, Ns Ar field2
.Xc
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
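.Pp
A minimal sketch of
.Fl t
combined with
.Fl k ,
assuming a POSIX shell, the C locale, and hypothetical colon-separated files
.Pa users.txt
and
.Pa sparse.txt ;
the second file shows that two adjacent separators delimit an empty field.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical records of the form name:uid:shell.
cat > users.txt <<'EOF'
root:0:/bin/sh
bob:1001:/bin/ksh
alice:999:/bin/sh
EOF
sort -t : -k 2,2n users.txt   # by numeric uid: root, alice, bob
cat > sparse.txt <<'EOF'
b:x:1
a::3
EOF
sort -t : -k 2,2 sparse.txt   # "a::3" first: its second field is empty
```
.Ed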
.Pp
The following operands are available:
.Bl -tag -width Ar
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if
a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
as part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
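.Pp
The effect of counting leading blanks as part of a field can be seen in
this sketch, which assumes a POSIX shell, the C locale, and a hypothetical file
.Pa blanks.txt ;
a blank sorts before a letter, so the key with more leading blanks wins unless
.Fl b
is given.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical file: field 2 is "  b" (two blanks) on line 1, " a" on line 2.
cat > blanks.txt <<'EOF'
x  b
x a
EOF
sort -k 2,2 blanks.txt      # "x  b" first: the blanks are part of field 2
sort -b -k 2,2 blanks.txt   # "x a" first: leading blanks are ignored
```
.Ed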
.Pp
Fields are specified
by the
.Fl k
.Ar field1 Ns Op \&, Ns Ar field2
argument.
A missing
.Ar field2
argument defaults to the end of a line.
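.Pp
A sketch of the difference between an open-ended key and one closed by
.Ar field2 ,
assuming a POSIX shell, the C locale, and a hypothetical file
.Pa keys.txt .
The
.Fl s
option is given explicitly so that tied keys keep their input order; it is
the default here but may not be in other implementations.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical input file.
cat > keys.txt <<'EOF'
x b 2
x b 1
EOF
sort -k 2 keys.txt        # key runs to end of line: "x b 1" first
sort -s -k 2,2 keys.txt   # key is field 2 only: keys tie, input order kept
```
.Ed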
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Ar m Ns Li \&. Ns Ar n
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Ar m Ns Li \&. Ns Ar n
.Pq Ar m , n No \*[Gt] 0
is interpreted as the
.Ar n Ns th
character in the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Ar m Ns th
field; if the
.Fl b
option is in effect,
.Ar n
is counted from the first non-blank character in the
.Ar m Ns th
field;
.Ar m Ns Li \&.1b
refers to the first non-blank character in the
.Ar m Ns th
field.
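.Pp
A minimal sketch of a
.Ar field1
position of the form
.Ar m Ns Li \&. Ns Ar n ,
assuming a POSIX shell, the C locale, and hypothetical files
.Pa codes.txt
and
.Pa ids.txt .
With
.Fl t ,
the separator itself is not counted as a character of the field.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical input files.
cat > codes.txt <<'EOF'
a3
b1
c2
EOF
sort -k 1.2 codes.txt      # key starts at character 2 of field 1: b1, c2, a3
cat > ids.txt <<'EOF'
id:x9
id:y1
EOF
sort -t : -k 2.2 ids.txt   # key starts at character 2 of field 2: id:y1, id:x9
```
.Ed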
.Pp
A
.Ar field2
position specified by
.Ar m Ns Li \&. Ns Ar n
is interpreted as
the
.Ar n Ns th
character (including separators) of the
.Ar m Ns th
field.
A missing
.Li \&. Ns Ar n
indicates the last character of the
.Ar m Ns th
field;
.Ar m
= \&0
designates the end of a line.
Thus the option
.Fl k
.Sm off
.Xo
.Ar v Li \&. Ar x Li \&,
.Ar w Li \&. Ar y
.Xc
.Sm on
is synonymous with the obsolescent option
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w-\&1 Li \&. Ar y ;
.Sm on
when
.Ar y
is omitted,
.Fl k
.Sm off
.Ar v Li \&. Ar x Li \&, Ar w
.Sm on
is synonymous with
.Sm off
.Cm \(pl Ar v-\&1 Li \&. Ar x-\&1
.Fl Ar w+1 Li \&.0 .
.Sm on
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w Ns Li \&.0b ,
which has no
.Fl k
equivalent.
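.Pp
A sketch of a
.Ar field2
position that ends the key inside a field, assuming a POSIX shell, the C
locale, and a hypothetical file
.Pa prefix.txt ;
only the first two characters are compared, and
.Fl s
keeps tied lines in input order.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical input file.
cat > prefix.txt <<'EOF'
ab2
ab1
aa3
EOF
# The key covers characters 1 through 2 of field 1.
sort -s -k 1.1,1.2 prefix.txt   # aa3, ab2, ab1
```
.Ed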
.Sh RETURN VALUES
The
.Nm
utility exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
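.Pp
The exit values of
.Fl c
can be observed with a small sketch.
It assumes a POSIX shell, the C locale, hypothetical files
.Pa ordered.txt
and
.Pa unordered.txt ,
and a hypothetical helper function
.Fn check
that prints the exit status; diagnostics, if any, are discarded.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# check FILE [OPTION...]: hypothetical helper printing the exit status of sort -c
check() {
    file=$1
    shift
    if sort -c "$@" "$file" 2>/dev/null; then
        echo "exit status 0"
    else
        echo "exit status $?"
    fi
}
# Hypothetical input files.
cat > ordered.txt <<'EOF'
a
a
b
EOF
cat > unordered.txt <<'EOF'
b
a
EOF
check ordered.txt       # exit status 0
check unordered.txt     # exit status 1: disorder
check ordered.txt -u    # exit status 1: duplicate "a" with -u
```
.Ed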
.Sh ENVIRONMENT
If the following environment variable exists, it is utilized by
.Nm "" .
.Bl -tag -width Ev
.It Ev TMPDIR
.Nm
uses the contents of the
.Ev TMPDIR
environment variable as the path in which to store
temporary files.
.El
.Sh FILES
.Bl -tag -width outputNUMBER+some -compact
.It Pa /tmp/sort.*
Default temporary files.
.It Pa Ar output Ns NUMBER
Temporary file which is used for output if
.Ar output
already exists.
Once sorting is finished, this file replaces
.Ar output
(via
.Xr link 2
and
.Xr unlink 2 ) .
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr qsort 3 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v5 .
This
.Nm
implementation appeared in
.Bx 4.4
and has been used since
.Nx 1.6 .
.Sh BUGS
To sort files larger than 60MB, use
.Nm
.Fl H ;
files larger than 704MB must be sorted in smaller pieces, then merged.
.Sh NOTES
This
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in write-protected directories.
.Pp
Input files should be text files.
If a file does not end with a record separator (typically a newline), the
.Nm
utility silently supplies one.
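.Pp
A minimal sketch of this behavior, assuming a POSIX shell and a hypothetical
output file
.Pa sorted.txt :
the input's last line lacks a newline, yet every output line is terminated.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# The last input line, "a", has no trailing newline.
printf '%s' 'b
a' | sort > sorted.txt
wc -l < sorted.txt    # counts newlines: 2, so every line was terminated
```
.Ed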
.Pp
The current
.Nm
uses lexicographic radix sorting, which requires
that sort keys be kept in memory (unlike previous versions, which used quick
and merge sorts and did not).
Thus performance depends heavily on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
For example,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
but may take twice as long.
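.Pp
Both forms produce the same ordering, as this sketch shows; it assumes a POSIX
shell, the C locale, and a hypothetical file
.Pa names.txt .
Timing is not measured here.
.Bd -literal -offset indent
```shell
export LC_ALL=C
# Hypothetical input file.
cat > names.txt <<'EOF'
banana
apple
Cherry
EOF
sort -f names.txt      # apple, banana, Cherry
sort -k 1f names.txt   # same order; the key runs from field 1 to end of line
```
.Ed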
442