.\"	$NetBSD: tr.1,v 1.24 2021/06/14 17:22:22 christos Exp $
.\"
.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)tr.1	8.1 (Berkeley) 6/6/93
.\"
.Dd June 14, 2021
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl cs
.Ar string1 string2
.Nm
.Op Fl c
.Fl d
.Ar string1
.Nm
.Op Fl c
.Fl s
.Ar string1
.Nm
.Op Fl c
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl c
Complements the set of characters in
.Ar string1 ;
that is,
.Fl c Ar \&ab
includes every character except for
.Sq a
and
.Sq b .
.It Fl d
The
.Fl d
option causes the characters in
.Ar string1
to be deleted from the input.
.It Fl s
The
.Fl s
option squeezes multiple consecutive occurrences of the characters listed
in the last operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2 ,
where the first character in
.Ar string1
is translated into the first character in
.Ar string2 ,
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
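.Pp
The four synopsis forms can be sketched as follows.
This is an illustrative sketch only: the sample input strings are invented
here, and the output noted in each comment assumes the behavior described
above, including the duplication of the last character of
.Ar string2 .
.Bd -literal -offset indent
# Form 1: a maps to x; b and c both map to y.
echo "aabbcc" | tr "abc" "xy"           # prints xxyyyy
# Form 2: delete every l and o.
echo "hello world" | tr -d "lo"         # prints he wrd
# Form 3: squeeze runs of a and of b; c is left alone.
echo "aaabbbccc" | tr -s "ab"           # prints abccc
# Form 4: delete a, then squeeze runs of c.
echo "aaabbbccc" | tr -ds "a" "c"       # prints bbbc
# Squeezing happens after translation: xxxx becomes x.
echo "aabb" | tr -s "ab" "x"            # prints x
.Ed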
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2, or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.sp
.Bl -column cc
.It \ea	<alert character>
.It \eb	<backspace>
.It \ef	<form-feed>
.It \en	<newline>
.It \er	<carriage return>
.It \et	<tab>
.It \ev	<vertical tab>
.El
.sp
A backslash followed by any other character maps to that character.
.It c-c
Represents the range of characters between the range endpoints, inclusive.
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.sp
.Bl -column xdigit
.It alnum	<alphanumeric characters>
.It alpha	<alphabetic characters>
.It blank	<blank characters>
.It cntrl	<control characters>
.It digit	<numeric characters>
.It graph	<graphic characters>
.It lower	<lower-case alphabetic characters>
.It print	<printable characters>
.It punct	<punctuation characters>
.It space	<space characters>
.It upper	<upper-case alphabetic characters>
.It xdigit	<hexadecimal characters>
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
With the exception of the
.Dq upper
and
.Dq lower
classes, characters in the classes are in unspecified order.
In the
.Dq upper
and
.Dq lower
classes, characters are entered in ascending order.
.Pp
For specific information as to which ASCII characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters or collating (sorting) elements belonging to
the same equivalence class as
.Ar equiv .
If there is a secondary ordering within the equivalence class, the
characters are ordered in ascending sequence.
Otherwise, they are ordered by their encoded values.
An example of an equivalence class might be
.Dq \&c
and
.Dq \&ch
in Spanish;
English has no equivalence classes.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This
expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value;
otherwise, it is interpreted as a decimal value.
.El
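.Pp
A few of the conventions above can be sketched as follows.
The input strings are invented for this illustration, and the output noted
in each comment assumes an ASCII-compatible character set:
.Bd -literal -offset indent
# Octal escape: \e040 is the space character.
echo "a b" | tr "\e040" "_"                      # prints a_b
# Ranges: shift each of a through y to the next letter.
echo "hello" | tr "a-y" "b-z"                   # prints ifmmp
# Classes: lower and upper are both in ascending order.
echo "Hello World" | tr "[:lower:]" "[:upper:]" # prints HELLO WORLD
# Repetition with n omitted: pad string2 to the length of string1.
echo "abc123" | tr "[:digit:]" "[#*]"           # prints abc###
# Repetition with a count: three x characters, then y.
echo "abcd" | tr "abcd" "[x*3]y"                # prints xxxy
.Ed
.Pp
The operands are quoted so that the shell passes the brackets and
asterisks to
.Nm
unchanged.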
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in
.Ar file1 ,
one per line, where a word is taken to be a maximal string of letters:
.sp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.sp
Translate the contents of
.Ar file1
to upper-case:
.sp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.sp
Strip out non-printable characters from
.Ar file1 :
.sp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Sh COMPATIBILITY
.At V
has historically implemented character ranges using the syntax
.Dq [c-c]
instead of the
.Dq c-c
used by historic
.Bx
implementations and standardized by POSIX.
.At V
shell scripts should work under this implementation as long as
the range is intended to map onto another range, i.e., the command
.Pp
.Ic "tr [a-z] [A-Z]"
.Pp
will work, because it maps the
.Sq \&[
character in
.Ar string1
to the
.Sq \&[
character in
.Ar string2 ,
and likewise for the
.Sq \&]
character.
However, if the shell script is deleting or squeezing characters as in
the command
.Pp
.Ic "tr -d [a-z]"
.Pp
the characters
.Sq \&[
and
.Sq \&]
will be included in the deletion or compression list, which would
not have happened under an historic
.At V
implementation.
Additionally, any scripts that depended on the sequence
.Dq a-z
to represent the three characters
.Sq \&a ,
.Sq \&- ,
and
.Sq \&z
will have to be rewritten as
.Dq a\e-z .
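.Pp
The two pitfalls above can be sketched as follows.
The input strings are invented for this illustration, and the operands are
quoted here, unlike in the commands above, so that the shell does not treat
the brackets as a pathname pattern:
.Bd -literal -offset indent
# The brackets are ordinary characters, so they are deleted too.
echo "[abc]-XYZ" | tr -d "[a-z]"    # prints -XYZ
# An escaped hyphen is a literal: only a, - and z are deleted.
echo "a-m-z" | tr -d "a\e-z"         # prints m
.Ed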
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit illegal syntax.
.Sh SEE ALSO
.Xr dd 1 ,
.Xr sed 1 ,
.Xr ctype 3
.Sh STANDARDS
The
.Nm
utility is expected to be
.St -p1003.2
compatible.
Note that the feature wherein the last character of
.Ar string2
is duplicated if
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq [#*n]
convention instead of relying on this behavior.
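.Pp
As a minimal sketch of that advice, with an input string invented for this
illustration, the second command below states the padding explicitly
instead of relying on the duplication of the last character:
.Bd -literal -offset indent
# Relies on the optional duplication of the last character.
echo "abc" | tr "abc" "x"       # prints xxx with this implementation
# Explicit padding with the repetition convention.
echo "abc" | tr "abc" "[x*]"    # prints xxx
.Ed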
.Sh BUGS
.Nm
was originally designed to work with
.Tn US-ASCII .
Its use with character sets that do not share all the properties of
.Tn US-ASCII ,
e.g., a symmetric set of upper and lower case characters
that can be algorithmically converted one to the other,
may yield unpredictable results.
.Pp
.Nm
should be internationalized.
354