.\"	$OpenBSD: sort.1,v 1.17 2002/02/10 15:50:15 aaron Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 6/6/93
.\"
.Dd June 6, 1993
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinrH
.Op Fl t Ar char
.Op Fl R Ar char
.Oo
.Cm Fl k Ar field1[,field2]
.Oc
.Ar ...
.Op Fl T Ar dir
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm
utility sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed lexicographically.
By default, if keys are not given,
.Nm
regards each input line as a single field.
.Pp
The options are as follows:
.Bl -tag -width file Ds
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm
produces the appropriate error messages and exits with code 1; otherwise,
.Nm
returns 0.
.Nm
.Fl c
produces no output, except the error messages on
.Em stderr .
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
The argument given is the name of an
.Ar output
file to be used instead of the standard output.
This file can be the same as one of the input files.
.It Fl T Ar dir
Use
.Ar dir
as the directory for temporary files.
The default is the contents of the environment variable
.Ev TMPDIR
or
.Pa /var/tmp
if
.Ev TMPDIR
is not set.
.It Fl u
Unique: suppress all but one in each set of lines having equal keys.
If used with the
.Fl c
option, check that there are no lines with duplicate keys.
.El
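.Pp
As a minimal sketch of the options above, assuming two hypothetical input
files
.Pa a.txt
and
.Pa b.txt
(neither is part of this manual):
.Bd -literal -offset indent
# Sort each file in place; -o may name one of the input files.
sort -o a.txt a.txt
sort -o b.txt b.txt
# Merge the pre-sorted files, keeping one line per set of equal keys.
sort -m -u -o merged.txt a.txt b.txt
.Ed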
.Pp
The following options override the default ordering rules.
When ordering options appear independent of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional blank space, optional
minus sign, and zero or more digits (including decimal point)
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.It Fl H
Use a merge sort instead of a radix sort.
This option should be used for files larger than 60MB.
.El
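.Pp
For instance, assuming a hypothetical file
.Pa sizes.txt
that holds one non-negative integer per line, the effect of the ordering
options might be seen as follows:
.Bd -literal -offset indent
# Arithmetic order, largest value first.
sort -n -r sizes.txt
# Without -n the comparison is lexicographic, so "10" sorts before "9".
sort sizes.txt
.Ed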
.Pp
The treatment of field separators can be altered using these options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option has no effect unless key fields are specified.
.It Fl t Ar char
.Ar char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining key offsets.
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified, the default field separator is a sequence of
blank-space characters, and consecutive blank spaces do
.Em not
delimit an empty field; further, the initial blank space
.Em is
considered part of a field when determining key offsets.
.It Fl R Ar char
.Ar char
is used as the record separator character.
This should be used with discretion;
.Fl R Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1[,field2]
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
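.Pp
As an illustrative sketch of the separator options, the first command below
sorts
.Pa /etc/passwd
by its third colon-separated field, and the second uses a hypothetical
blank-separated file
.Pa table.txt :
.Bd -literal -offset indent
# Numeric sort on the third ":"-separated field (the user ID).
sort -t : -k 3,3n /etc/passwd
# Sort on the second field, ignoring the blanks that precede it.
sort -b -k 2,2 table.txt
.Ed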
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no
.Ar file
operands are specified, or if a
.Ar file
operand is
.Fl ,
the standard input is used.
.El
.Pp
A field is defined as a maximal sequence of characters other than the
field separator and record separator
.Pq newline by default .
Initial blank spaces are included in the field unless
.Fl b
has been specified;
the first blank space of a sequence of blank spaces acts as the field
separator and is included in the field (unless
.Fl t
is specified).
For example, by default all blank spaces at the beginning of a line are
considered to be part of the first field.
.Pp
Fields are specified by the
.Fl k Ar field1[,field2]
argument.
A missing
.Ar field2
argument defaults to the end of a line.
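.Pp
The difference a missing
.Ar field2
makes can be sketched with a hypothetical blank-separated file
.Pa staff.txt :
.Bd -literal -offset indent
# The key runs from the start of field 2 to the end of the line.
sort -k 2 staff.txt
# The key is restricted to field 2 alone.
sort -k 2,2 staff.txt
.Ed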
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n
.Em (m,n > 0)
and can be followed by one or more of the letters
.Cm b , d , f , i ,
.Cm n ,
and
.Cm r ,
which correspond to the options discussed above.
A
.Ar field1
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character from the beginning of the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field; if the
.Fl b
option is in effect,
.Em n
is counted from the first non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first non-blank character in the
.Em m Ns th
field.
.No 1\&. Ns Em n
refers to the
.Em n Ns th
character from the beginning of the line;
if
.Em n
is greater than the length of the line, the field is taken to be empty.
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w+1.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
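.Pp
Character offsets within a field might be used as in the following sketch,
which assumes a hypothetical file
.Pa log.txt
whose lines begin with a date of the form YYYY-MM-DD:
.Bd -literal -offset indent
# Characters 6 through 7 of field 1 hold the month; sort on it numerically.
sort -k 1.6,1.7n log.txt
.Ed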
.Pp
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
On disorder (or non-uniqueness) with the
.Fl c
option.
.It 2
An error occurred.
.El
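.Pp
A shell script might act on these exit values roughly as follows; the file
name
.Pa names.txt
is hypothetical:
.Bd -literal -offset indent
if sort -c -u names.txt; then
	echo "sorted, with no duplicate keys"
else
	echo "not sorted, duplicate keys, or an error: status $?"
fi
.Ed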
.Sh ENVIRONMENT
.Bl -tag -width Fl
.It Ev TMPDIR
Path in which to store temporary files.
Note that
.Ev TMPDIR
may be overridden by the
.Fl T
option.
.El
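.Pp
Either of the following should place temporary files in the same location;
.Pa /scratch
is a hypothetical directory and
.Pa big.txt
a hypothetical input file:
.Bd -literal -offset indent
# Through the environment.
TMPDIR=/scratch sort -o big.sorted big.txt
# Through the option, which overrides TMPDIR if both are given.
sort -T /scratch -o big.sorted big.txt
.Ed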
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
default temporary directories
.It Pa Ar output Ns #PID
temporary name for
.Ar output
if
.Ar output
already exists
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1 ,
.Xr radixsort 3
.Sh HISTORY
A
.Nm
command appeared in
.At v3 .
.Sh NOTES
.Nm
has no limits on input line length (other than those imposed by available
memory) or any restrictions on bytes allowed within lines.
.Pp
To protect data,
.Nm
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails on protected directories.
.Pp
The current sort command uses lexicographic radix sorting, which requires
that sort keys be kept in memory (as opposed to previous versions which
used quick and merge sorts and did not).
Performance therefore depends heavily on an efficient choice of sort keys,
and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Similarly,
.Nm
.Fl k1f
is equivalent to
.Nm
.Fl f
and may take twice as long.
.Sh BUGS
To sort files larger than 60MB, use
.Nm
.Fl H ;
files larger than 704MB must be sorted in smaller pieces, then merged.
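.Pp
One possible way to sort in pieces and then merge is sketched below, using
.Xr split 1 ;
the file names and the piece size of one million lines are arbitrary
assumptions:
.Bd -literal -offset indent
# Split the input into pieces named piece.aa, piece.ab, and so on.
split -l 1000000 big.txt piece.
# Sort each piece in place.
for f in piece.*; do sort -o "$f" "$f"; done
# Merge the sorted pieces into the final output.
sort -m -o big.sorted piece.*
.Ed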