.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"     @(#)sort.1	8.1 (Berkeley) 06/06/93
.\"
.Dd
.Dt SORT 1
.Os
.Sh NAME
.Nm sort
.Nd sort or merge text files
.Sh SYNOPSIS
.Nm sort
.Op Fl cmubdfinr
.Op Fl t Ar char
.Op Fl T Ar char
.Oo
.Cm Fl k Ar field1[,field2]
.Oc
.Ar ...
.Op Fl o Ar output
.Op Ar file
.Ar ...
.Sh DESCRIPTION
The
.Nm sort
utility
sorts text files by lines.
Comparisons are based on one or more sort keys extracted
from each line of input, and are performed
lexicographically.
By default, if keys are not given,
.Nm sort
regards each input line as a single field.
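.Pp
The following shell sketch illustrates the default, whole-line ordering on a few made-up lines.
It assumes a POSIX-conforming
.Nm sort
on the search path.
It also sets
.Ev LC_ALL
to C.
That setting is an assumption of the example, not a requirement of this page, so that plain byte order decides the result on implementations that honor locales.

```shell
# Assumption: C locale, so bytes (not locale collation) decide the order.
export LC_ALL=C
# No -k option: each whole line is the sort key.
sorted=$(sort <<'EOF'
pear
apple
Fig
fig
EOF
)
echo "$sorted"
```

In byte order every uppercase letter precedes every lowercase letter, so Fig sorts before apple.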
.Pp
The following options are available:
.Bl -tag -width indent
.It Fl c
Check that the single input file is sorted.
If the file is not sorted,
.Nm sort
produces the appropriate error messages and exits with code 1;
otherwise,
.Nm sort
exits with code 0.
.Nm Sort
.Fl c
produces no output other than these error messages.
.It Fl m
Merge only; the input files are assumed to be pre-sorted.
.It Fl o Ar output
Write the sorted result to the file
.Ar output
instead of the standard output.
This file
can be the same as one of the input files.
.It Fl u
Unique: suppress all but one in each set of lines
having equal keys.
If used with the
.Fl c
option,
check that there are no lines with duplicate keys.
.El
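.Pp
A minimal sketch of
.Fl u ,
.Fl o ,
and
.Fl m
follows.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and the common but non-POSIX
.Ic mktemp -d
for a scratch directory.
The file names and contents are hypothetical.

```shell
export LC_ALL=C
tmp=$(mktemp -d)   # scratch directory (mktemp -d is assumed available)
cat > "$tmp/words" <<'EOF'
pear
apple
pear
EOF
cat > "$tmp/more" <<'EOF'
banana
cherry
EOF
# -u keeps one line per set of equal keys; -o names the input file itself,
# so the file is replaced by its sorted, de-duplicated contents.
sort -u -o "$tmp/words" "$tmp/words"
words=$(cat "$tmp/words")
# -m merges inputs that are already sorted instead of sorting them again.
merged=$(sort -m "$tmp/words" "$tmp/more")
echo "$merged"
rm -r "$tmp"
```

Because both inputs to
.Fl m
are already in order, the merge only interleaves them.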
.Pp
The following options override the default ordering rules.
When ordering options appear independently of key field
specifications, the requested field ordering rules are
applied globally to all sort keys.
When attached to a specific key (see
.Fl k ) ,
the ordering options override
all global ordering options for that key.
.Bl -tag -width indent
.It Fl d
Only blank space and alphanumeric characters
.\" according
.\" to the current setting of LC_CTYPE
are used
in making comparisons.
.It Fl f
Consider all lowercase characters that have uppercase
equivalents to be the same as those uppercase equivalents
for purposes of comparison.
.It Fl i
Ignore all non-printable characters.
.It Fl n
An initial numeric string, consisting of optional
blank space, an optional minus sign, and zero or more
digits (including a decimal point),
.\" with
.\" optional radix character and thousands
.\" separator
.\" (as defined in the current locale),
is sorted by arithmetic value.
(The
.Fl n
option no longer implies
the
.Fl b
option.)
.It Fl r
Reverse the sense of comparisons.
.El
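.Pp
The sketch below contrasts the default lexicographic comparison with
.Fl n ,
.Fl r ,
and
.Fl f
on made-up data.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/nums" <<'EOF'
10
9
100
EOF
cat > "$tmp/names" <<'EOF'
cherry
Banana
apple
EOF
# Default: characters are compared left to right, so 10 < 100 < 9.
lex=$(sort "$tmp/nums")
# -n compares the leading numeric string by arithmetic value.
num=$(sort -n "$tmp/nums")
# -r reverses the sense of the comparison.
rev=$(sort -n -r "$tmp/nums")
# -f treats lowercase letters as their uppercase equivalents,
# so apple (APPLE) sorts before Banana (BANANA).
folded=$(sort -f "$tmp/names")
echo "$lex"
echo "$num"
echo "$rev"
echo "$folded"
rm -r "$tmp"
```

Without
.Fl f ,
the C locale would place Banana first, because every uppercase letter precedes every lowercase one.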
.Pp
The treatment of field separators can be altered using the
following options:
.Bl -tag -width indent
.It Fl b
Ignore leading blank space when determining the start
and end of a restricted sort key.
A
.Fl b
option specified before the first
.Fl k
option applies globally to all
.Fl k
options.
Otherwise, the
.Fl b
option can be
attached independently to each
.Ar field
argument of the
.Fl k
option (see below).
Note that the
.Fl b
option
has no effect unless key fields are specified.
.It Fl t Ar char
.Ar Char
is used as the field separator character.
The initial
.Ar char
is not considered to be part of a field when determining
key offsets (see below).
Each occurrence of
.Ar char
is significant (for example,
.Dq Ar charchar
delimits an empty field).
If
.Fl t
is not specified,
blank space characters are used as default field
separators.
.It Fl T Ar char
.Ar Char
is used as the record separator character.
This should be used with discretion;
.Fl T Ar <alphanumeric>
usually produces undesirable results.
The default record separator is newline.
.It Fl k Ar field1[,field2]
Designates the starting position,
.Ar field1 ,
and optional ending position,
.Ar field2 ,
of a key field.
The
.Fl k
option replaces the obsolescent options
.Cm \(pl Ns Ar pos1
and
.Fl Ns Ar pos2 .
.El
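.Pp
As an illustration of
.Fl t
and
.Fl k ,
the sketch below sorts made-up, colon-separated records with a passwd-like layout.
It sorts first numerically on field 2, then on field 3 with field 1 as a tie-breaker.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/users" <<'EOF'
root:0:admin
daemon:1:system
alice:1001:staff
bob:20:staff
EOF
# -t : makes the colon the field separator; -k 2,2n restricts the key
# to field 2 and compares it by arithmetic value.
by_uid=$(sort -t : -k 2,2n "$tmp/users")
# Two keys: field 3 first, then field 1 for lines whose field 3 is equal.
by_group=$(sort -t : -k 3,3 -k 1,1 "$tmp/users")
echo "$by_uid"
echo "$by_group"
rm -r "$tmp"
```

Ending each key at the same field it starts in keeps later fields out of the comparison.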
.Pp
The following operands are available:
.Bl -tag -width indent
.It Ar file
The pathname of a file to be sorted, merged, or checked.
If no file
operands are specified, or if
a file operand is
.Fl ,
the standard input is used.
.El
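.Pp
A short sketch of the
.Fl
operand follows: a line arriving on the standard input is sorted together with the lines of a named file.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .
The file name is hypothetical.

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/fruit" <<'EOF'
cherry
apple
EOF
# "-" names the standard input, so the piped line joins the file's lines.
all=$(echo banana | sort "$tmp/fruit" -)
echo "$all"
rm -r "$tmp"
```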
.Pp
A field is
defined as a minimal sequence of characters followed by a
field separator or a newline character.
By default, the first
blank space of a sequence of blank spaces acts as the field separator.
All blank spaces in a sequence of blank spaces are considered
part of the next field; for example, all blank spaces at
the beginning of a line are considered to be part of the
first field.
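.Pp
To see that leading blanks belong to the first field, the sketch below sorts three made-up lines on field 1, with and without
.Fl b .
It assumes a POSIX-conforming
.Nm sort ,
the C locale (where a blank sorts before any letter), and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
# The first line starts with two blanks, the second with one.
cat > "$tmp/blanks" <<'EOF'
  b
 c
a
EOF
# Without -b the leading blanks are part of field 1 and take part
# in the comparison.
plain=$(sort -k 1,1 "$tmp/blanks")
# A -b before the first -k applies to every key: the blanks are skipped
# when the key is located, so only the letters are compared.
skipped=$(sort -b -k 1,1 "$tmp/blanks")
echo "$plain"
echo "$skipped"
rm -r "$tmp"
```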
.Pp
Fields are specified
by the
.Fl k Ar field1[,field2]
argument.
A missing
.Ar field2
argument defaults to the end of a line.
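.Pp
The effect of a missing
.Ar field2
can be sketched by comparing
.Fl k Ar 2
(field 2 through the end of the line) with
.Fl k Ar 2,2
(field 2 alone).
The data is made up.
The sketch assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/items" <<'EOF'
x b 2
y a 9
z a 1
EOF
# -k 2: the key runs to the end of the line, so field 3 also decides
# between the two lines whose field 2 is "a".
to_eol=$(sort -k 2 "$tmp/items")
# -k 2,2: the key is field 2 alone, so those two lines have equal keys
# and -u keeps only one of them.
distinct=$(sort -u -k 2,2 "$tmp/items" | wc -l)
echo "$to_eol"
echo "$distinct"
rm -r "$tmp"
```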
.Pp
The arguments
.Ar field1
and
.Ar field2
have the form
.Em m.n ,
optionally followed by one or more of the modifier letters
.Cm b , d , f , i , n ,
and
.Cm r ,
written without a leading dash
(for example,
.Fl k Ar 1f ) .
A
.Ar field1
position specified by
.Em m.n
.Em (m,n > 0)
is interpreted as the
.Em n Ns th
character in the
.Em m Ns th
field.
A missing
.Em \&.n
in
.Ar field1
means
.Ql \&.1 ,
indicating the first character of the
.Em m Ns th
field.
If the
.Fl b
option is in effect,
.Em n
is counted from the first
non-blank character in the
.Em m Ns th
field;
.Em m Ns \&.1b
refers to the first
non-blank character in the
.Em m Ns th
field.
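.Pp
A small sketch of a
.Ar field1
position with a character offset follows.
.Fl k Ar 1.2
starts the key at the second character of field 1, so the leading letter of each made-up code is ignored.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/codes" <<'EOF'
a3
b1
c2
EOF
# Key = field 1 from its 2nd character to the end of the line.
by_second=$(sort -k 1.2 "$tmp/codes")
echo "$by_second"
rm -r "$tmp"
```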
.Pp
A
.Ar field2
position specified by
.Em m.n
is interpreted as
the
.Em n Ns th
character (including separators) of the
.Em m Ns th
field.
A missing
.Em \&.n
indicates the last character of the
.Em m Ns th
field;
.Em m
= \&0
designates the end of a line.
Thus the option
.Fl k Ar v.x,w.y
is synonymous with the obsolescent option
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w-\&1.y ;
when
.Em y
is omitted,
.Fl k Ar v.x,w
is synonymous with
.Cm \(pl Ns Ar v-\&1.x-\&1
.Fl Ns Ar w\&.0 .
The obsolescent
.Cm \(pl Ns Ar pos1
.Fl Ns Ar pos2
option is still supported, except for
.Fl Ns Ar w\&.0b ,
which has no
.Fl k
equivalent.
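.Pp
To illustrate a
.Ar field2
position together with a per-key modifier, the sketch below uses two keys on made-up data.
The first key compares only characters 1 and 2 of field 1 in ascending order.
The second key compares the rest of the line, from character 3, in reverse order.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/ids" <<'EOF'
abz
abc
aaz
EOF
# One whole-line key, for comparison.
whole=$(sort "$tmp/ids")
# Key 1 ends at character 2 of field 1 (field2 = 1.2); key 2 starts at
# character 3 and carries the r modifier, which applies to that key only.
split_key=$(sort -k 1.1,1.2 -k 1.3r "$tmp/ids")
echo "$whole"
echo "$split_key"
rm -r "$tmp"
```

Only abz and abc tie on the first key, so the reversed second key decides just their relative order.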
.Sh FILES
.Bl -tag -width Pa -compact
.It Pa /var/tmp/sort.*
Default temporary files.
.It Pa Ar output Ns #PID
Temporary name for
.Ar output
if
.Ar output
already exists.
.El
.Sh SEE ALSO
.Xr comm 1 ,
.Xr join 1 ,
.Xr uniq 1
.Sh RETURN VALUES
.Nm Sort
exits with one of the following values:
.Bl -tag -width flag -compact
.It 0
Normal behavior.
.It 1
Disorder (or non-uniqueness) was found with the
.Fl c
option.
.It 2
An error occurred.
.El
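.Pp
The exit values listed in this section can be observed with the following sketch.
Made-up input that is sorted but contains a repeated line is checked with
.Fl c ,
then with
.Fl c
and
.Fl u ,
and finally a nonexistent file is checked.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .
The file names are hypothetical.

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/dups" <<'EOF'
a
b
b
EOF
# Sorted (equal neighbours are allowed): exit value 0.
sort -c "$tmp/dups"; sorted_status=$?
# With -u the repeated "b" counts as non-uniqueness: exit value 1.
sort -c -u "$tmp/dups" 2>/dev/null; unique_status=$?
# A missing input file is an error: exit value 2.
sort -c "$tmp/missing" 2>/dev/null; error_status=$?
echo "$sorted_status $unique_status $error_status"
rm -r "$tmp"
```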
.Sh BUGS
Lines longer than 65522 characters are discarded and processing continues.
To sort files larger than 60Mb, use
.Nm sort
.Fl H ;
files larger than 704Mb must be sorted in smaller pieces, then merged.
To protect data,
.Nm sort
.Fl o
calls
.Xr link 2
and
.Xr unlink 2 ,
and thus fails in protected directories.
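.Pp
The piecewise approach for very large files can be sketched in three steps.
First split the input with
.Xr split 1 ,
then sort each piece, then merge the sorted pieces with
.Fl m .
The tiny input here only stands in for a large file.
The piece size and file names are hypothetical.
The sketch assumes a POSIX-conforming
.Nm sort
and
.Nm split ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
# Hypothetical stand-in for a file too large to sort in one pass.
cat > "$tmp/big" <<'EOF'
delta
alpha
echo
charlie
bravo
EOF
# 1. Split into pieces of at most 2 lines (piece.aa, piece.ab, ...).
split -l 2 "$tmp/big" "$tmp/piece."
# 2. Sort each piece in place.
for f in "$tmp"/piece.*; do
    sort -o "$f" "$f"
done
# 3. Merge the already sorted pieces.
merged=$(sort -m "$tmp"/piece.*)
echo "$merged"
rm -r "$tmp"
```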
.Sh HISTORY
A
.Nm sort
command appeared in
.At v6 .
.Sh NOTES
The current
.Nm sort
command uses lexicographic radix sorting, which requires
that sort keys be kept in memory; previous versions, which used quicksort
and merge sort, did not.
Performance therefore depends heavily on an efficient choice of sort keys, and the
.Fl b
option and the
.Ar field2
argument of the
.Fl k
option should be used whenever possible.
Needless key specifications also cost time:
.Nm sort
.Fl k1f
is equivalent to
.Nm sort
.Fl f
but may take twice as long.
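.Pp
The equivalence of
.Fl k1f
and
.Fl f
can be checked with the sketch below.
It compares output only, not timing, on made-up names with no leading blanks.
With no leading blanks, field 1 through the end of the line is the whole line.
It assumes a POSIX-conforming
.Nm sort ,
the C locale, and
.Ic mktemp -d .

```shell
export LC_ALL=C
tmp=$(mktemp -d)
cat > "$tmp/names" <<'EOF'
bob
Alice
carol
EOF
# Global -f: the whole line is the key, case folded.
a=$(sort -f "$tmp/names")
# Explicit key: field 1 to the end of the line, case folded.
b=$(sort -k1f "$tmp/names")
[ "$a" = "$b" ] && echo "same output"
rm -r "$tmp"
```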