.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd October 13, 2006
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl Ccsu
.Ar string1 string2
.Nm
.Op Fl Ccu
.Fl d
.Ar string1
.Nm
.Op Fl Ccu
.Fl s
.Ar string1
.Nm
.Op Fl Ccu
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl C
Complement the set of characters in
.Ar string1 ,
that is
.Dq Fl C Li ab
includes every character except for
.Ql a
and
.Ql b .
.It Fl c
Same as
.Fl C
but complement the set of values in
.Ar string1 .
.It Fl d
Delete characters in
.Ar string1
from the input.
.It Fl s
Squeeze multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.It Fl u
Guarantee that any output is unbuffered.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
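.Pp
As an illustrative sketch of this padding rule (the sample input is
arbitrary), the last character of
.Ar string2
is reused for every remaining character of
.Ar string1 :
.Bd -literal -offset indent
printf 'abcde\en' | tr abcde xy
.Ed
.Pp
This should print
.Ql xyyyy .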
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2 or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Bl -column "\ea"
.It "\ea	<alert character>"
.It "\eb	<backspace>"
.It "\ef	<form-feed>"
.It "\en	<newline>"
.It "\er	<carriage return>"
.It "\et	<tab>"
.It "\ev	<vertical tab>"
.El
.Pp
A backslash followed by any other character maps to that character.
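.Pp
The deletion and squeezing forms can be compared on one arbitrary input
line; this is a sketch rather than an exhaustive demonstration:
.Bd -literal -offset indent
# second form: delete every 'l'
printf 'hello  world\en' | tr -d l
# third form: squeeze runs of 'l' and runs of spaces
printf 'hello  world\en' | tr -s 'l '
# fourth form: delete every 'l', then squeeze runs of spaces
printf 'hello  world\en' | tr -ds l ' '
.Ed
.Pp
The expected output lines are
.Ql "heo  word" ,
.Ql "helo world"
and
.Ql "heo word" .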
.It c-c
For non-octal range endpoints,
represents the range of characters between the range endpoints, inclusive,
in ascending order,
as defined by the collation sequence.
If either or both of the range endpoints are octal sequences, it
represents the range of specific coded values between the
range endpoints, inclusive.
.Pp
.Bf Em
See the
.Sx COMPATIBILITY
section below for an important note regarding
the way the current
implementation interprets range expressions differently from
previous implementations.
.Ef
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Bl -column "phonogram"
.It "alnum	<alphanumeric characters>"
.It "alpha	<alphabetic characters>"
.It "blank	<whitespace characters>"
.It "cntrl	<control characters>"
.It "digit	<numeric characters>"
.It "graph	<graphic characters>"
.It "ideogram	<ideographic characters>"
.It "lower	<lower-case alphabetic characters>"
.It "phonogram	<phonographic characters>"
.It "print	<printable characters>"
.It "punct	<punctuation characters>"
.It "rune	<valid characters>"
.It "space	<space characters>"
.It "special	<special characters>"
.It "upper	<upper-case characters>"
.It "xdigit	<hexadecimal characters>"
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
When
.Dq Li [:lower:]
appears in
.Ar string1
and
.Dq Li [:upper:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv toupper
mapping in the
.Ev LC_CTYPE
category of the current locale.
When
.Dq Li [:upper:]
appears in
.Ar string1
and
.Dq Li [:lower:]
appears in the same relative position in
.Ar string2 ,
it represents the character pairs from the
.Dv tolower
mapping in the
.Ev LC_CTYPE
category of the current locale.
.Pp
With the exception of case conversion,
characters in the classes are in unspecified order.
.Pp
For specific information as to which
.Tn ASCII
characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters belonging to the same equivalence class as
.Ar equiv ,
ordered by their encoded values.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
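.Pp
The repetition convention can be sketched as follows, using arbitrary
sample input; the first command pads
.Ar string2
explicitly instead of relying on implicit duplication of its last character:
.Bd -literal -offset indent
# [y*] grows to cover the rest of string1
printf 'abcde\en' | tr abcde 'x[y*]'
# explicit decimal repeat counts
printf 'abcde\en' | tr abcde '[x*2][y*3]'
.Ed
.Pp
These should print
.Ql xyyyy
and
.Ql xxyyy
respectively.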
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
(This should be preferred over the traditional
.Ux
idiom of
.Dq Li "tr a-z A-Z" ,
since it works correctly in all locales.)
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Pp
Remove diacritical marks from all accented variants of the letter
.Ql e :
.Pp
.Dl "tr \*q[=e=]\*q \*qe\*q"
.Sh COMPATIBILITY
Previous
.Fx
implementations of
.Nm
did not order characters in range expressions according to the current
locale's collation order, making it possible to convert unaccented Latin
characters (esp.\& as found in English text) from upper to lower case using
the traditional
.Ux
idiom of
.Dq Li "tr A-Z a-z" .
Since
.Nm
now obeys the locale's collation order, this idiom may not produce
correct results when there is not a 1:1 mapping between lower and
upper case, or when the order of characters within the two cases differs.
As noted in the
.Sx EXAMPLES
section above, the character class expressions
.Dq Li [:lower:]
and
.Dq Li [:upper:]
should be used instead of explicit character ranges like
.Dq Li a-z
and
.Dq Li A-Z .
.Pp
.Dq Li [=equiv=]
expression and collation for ranges
are implemented for single byte locales only.
.Pp
System V has historically implemented character ranges using the syntax
.Dq Li [c-c]
instead of the
.Dq Li c-c
used by historic
.Bx
implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map onto another range, i.e., the command
.Dq Li "tr [a-z] [A-Z]"
will work as it will map the
.Ql \&[
character in
.Ar string1
to the
.Ql \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq Li "tr -d [a-z]" ,
the characters
.Ql \&[
and
.Ql \&]
will be
included in the deletion or compression list, which would not have happened
under a historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq Li a-z
to
represent the three characters
.Ql a ,
.Ql \-
and
.Ql z
will have to be
rewritten as
.Dq Li a\e-z .
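.Pp
The bracket behavior can be observed with a short experiment, shown here as
a sketch that assumes the C locale and quotes the brackets to protect them
from the shell:
.Bd -literal -offset indent
# brackets map to themselves, so translation is unaffected
printf '[abc]\en' | LC_ALL=C tr '[a-z]' '[A-Z]'
# brackets are part of the set, so they are deleted too
printf '[abc]XYZ\en' | LC_ALL=C tr -d '[a-z]'
.Ed
.Pp
Under this implementation the commands should print
.Ql \&[ABC]
and
.Ql XYZ .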
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation has removed this behavior as a bug.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
The
.Dq ideogram ,
.Dq phonogram ,
.Dq rune ,
and
.Dq special
character classes are extensions.
.Pp
It should be noted that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq Li [#*]
convention instead of relying on this behavior.
The
.Fl u
option is an extension to the
.St -p1003.1-2001
standard.
422standard.
423