.\"	$OpenBSD: tr.1,v 1.15 2010/04/05 16:44:52 deraadt Exp $
.\"	$NetBSD: tr.1,v 1.5 1994/12/07 08:35:13 jtc Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\"
.Dd $Mdocdate: April 5 2010 $
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm tr
.Op Fl cs
.Ar string1 string2
.Nm tr
.Op Fl c
.Fl d
.Ar string1
.Nm tr
.Op Fl c
.Fl s
.Ar string1
.Nm tr
.Op Fl c
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl c
Complements the set of characters in
.Ar string1 ;
for instance,
.Dq -c\ ab
includes every character except for
.Dq a
and
.Dq b .
.It Fl d
The
.Fl d
option causes characters to be deleted from the input.
.It Fl s
The
.Fl s
option squeezes multiple occurrences of the characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2
where the first character in
.Ar string1
is translated into the first character in
.Ar string2
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
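.Pp
As a minimal sketch of the first form, the following shell fragment
translates a fixed sample string and then shows the padding of a shorter
.Ar string2 .
The input text and the variable names are assumptions made for illustration;
the second result relies on the duplication described above, which other
implementations need not provide.

```shell
# Position-by-position translation: a->x, b->y, c->z.
mapped=$(printf 'aabbcc\n' | tr 'abc' 'xyz')
echo "$mapped"

# string2 is shorter than string1, so its last character (y) is reused
# for c and d on implementations that pad this way.
padded=$(printf 'abcd\n' | tr 'abcd' 'xy')
echo "$padded"
```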
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
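.Pp
The second, third, and fourth forms can be compared on sample input as
sketched here.
The input strings and variable names are assumptions chosen for
illustration only.

```shell
# Second form: delete every l and o.
deleted=$(printf 'hello world\n' | tr -d 'lo')
echo "$deleted"

# Third form: squeeze each run of spaces into a single space.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')
echo "$squeezed"

# Fourth form: delete the digits first, then squeeze the remaining spaces.
both=$(printf 'a1  b22   c\n' | tr -ds '0-9' ' ')
echo "$both"
```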
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2, or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.Pp
.Bl -column "nn" "<alert character>"
.It \ea	<alert character>
.It \eb	<backspace>
.It \ef	<form-feed>
.It \en	<newline>
.It \er	<carriage return>
.It \et	<tab>
.It \ev	<vertical tab>
.El
.Pp
A backslash followed by any other character maps to that character.
.It c-c
Represents the range of characters between the range endpoints, inclusively.
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.Pp
.Bl -column "xdigit" "<lower-case alphabetic characters>"
.It alnum	<alphanumeric characters>
.It alpha	<alphabetic characters>
.It blank	<blank characters>
.It cntrl	<control characters>
.It digit	<numeric characters>
.It graph	<graphic characters>
.It lower	<lower-case alphabetic characters>
.It print	<printable characters>
.It punct	<punctuation characters>
.It space	<space characters>
.It upper	<upper-case characters>
.It xdigit	<hexadecimal characters>
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
With the exception of the
.Dq upper
and
.Dq lower
classes, characters
in the classes are in unspecified order.
In the
.Dq upper
and
.Dq lower
classes, characters are entered in
ascending order.
.Pp
For specific information as to which ASCII characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters or collating (sorting) elements belonging to
the same equivalence class as
.Ar equiv .
If
there is a secondary ordering within the equivalence class, the characters
are ordered in ascending sequence.
Otherwise, they are ordered by their encoded values.
An example of an equivalence class might be
.Dq c
and
.Dq ch
in Spanish;
English has no equivalence classes.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value; otherwise,
it is interpreted as a decimal value.
.El
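.Pp
Several of the conventions above can be tried on sample input as sketched
here.
The input strings and variable names are assumptions for illustration, and
the range example assumes an ASCII-compatible encoding.

```shell
# \012 is the octal value of <newline>: one colon-separated field per line.
fields=$(printf 'a:b:c\n' | tr ':' '\012')
echo "$fields"

# c-c range: map the lower-case ASCII letters to upper case.
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')
echo "$upper"

# [:class:] with -cd: delete everything that is not a digit.
digits=$(printf 'tel: 555-0100\n' | tr -cd '[:digit:]')
echo "$digits"

# [#*n] in string2: [x*2] stands for xx, so a->x, b->x, c->z.
repeated=$(printf 'abc\n' | tr 'abc' '[x*2]z')
echo "$repeated"
```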
.Pp
.Ex -std tr
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in file1, one per line, where a word is taken to
be a maximal string of letters.
.Pp
.D1 Li "$ tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.Pp
Translate the contents of file1 to upper-case.
.Pp
.D1 Li "$ tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.Pp
Strip out non-printable characters from file1.
.Pp
.D1 Li "$ tr -cd \*q[:print:]\*q < file1"
.Sh SEE ALSO
.Xr sed 1
.Sh STANDARDS
The
.Nm
utility is compliant with the
.St -p1003.1-2008
specification.
.Pp
System V has historically implemented character ranges using the syntax
.Dq [c-c]
instead of the
.Dq c-c
used by historic BSD implementations and
standardized by POSIX.
System V shell scripts should work under this implementation as long as
the range is intended to map to another range; for example, the command
.Dq tr\ [a-z]\ [A-Z]
will work as it will map the
.Dq [
character in
.Ar string1
to the
.Dq [
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Dq tr\ -d\ [a-z] ,
the characters
.Dq [
and
.Dq \&]
will be
included in the deletion or compression list, which would not have happened
under an historic System V implementation.
Additionally, any scripts that depended on the sequence
.Dq a-z
to represent the three characters
.Dq a ,
.Dq - ,
and
.Dq z
will have to be rewritten as
.Dq a\e-z .
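.Pp
The difference between translating and deleting with System V style
brackets can be observed with a sketch like the following.
The sample input and variable names are assumptions for illustration.

```shell
# Translation: [ and ] map to themselves, so the System V spelling still works.
translated=$(printf '[abc] def\n' | tr '[a-z]' '[A-Z]')
echo "$translated"

# Deletion: the brackets are ordinary members of string1 and are deleted too.
deleted=$(printf '[abc] DEF\n' | tr -d '[a-z]')
echo "$deleted"
```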
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, has stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
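.Pp
As a small sketch of NUL handling, the fragment below translates NUL bytes
rather than losing them.
The sample input and the variable name are assumptions for illustration.

```shell
# Replace each NUL byte (octal 0) with a colon instead of dropping it.
joined=$(printf 'a\0b\0c\n' | tr '\0' ':')
echo "$joined"
```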
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors:
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Pp
The feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq [#*]
convention instead of relying on this behavior.
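.Pp
A minimal sketch of the portable spelling follows, with the padding
character written out explicitly.
The sample input and the variable name are assumptions for illustration.

```shell
# [y*] repeats y as often as needed to match the length of string1,
# so b, c, and d all map to y without relying on implicit padding.
portable=$(printf 'abcd\n' | tr 'abcd' 'x[y*]')
echo "$portable"
```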
333