.\"	$NetBSD: vis.3,v 1.49 2017/08/05 20:22:29 wiz Exp $
.\"	$FreeBSD$
.\"
.\" Copyright (c) 1989, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)vis.3	8.1 (Berkeley) 6/9/93
.\"
.Dd April 22, 2017
.Dt VIS 3
.Os
.Sh NAME
.Nm vis ,
.Nm nvis ,
.Nm strvis ,
.Nm stravis ,
.Nm strnvis ,
.Nm strvisx ,
.Nm strnvisx ,
.Nm strenvisx ,
.Nm svis ,
.Nm snvis ,
.Nm strsvis ,
.Nm strsnvis ,
.Nm strsvisx ,
.Nm strsnvisx ,
.Nm strsenvisx
.Nd visually encode characters
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In vis.h
.Ft char *
.Fn vis "char *dst" "int c" "int flag" "int nextc"
.Ft char *
.Fn nvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc"
.Ft int
.Fn strvis "char *dst" "const char *src" "int flag"
.Ft int
.Fn stravis "char **dst" "const char *src" "int flag"
.Ft int
.Fn strnvis "char *dst" "size_t dlen" "const char *src" "int flag"
.Ft int
.Fn strvisx "char *dst" "const char *src" "size_t len" "int flag"
.Ft int
.Fn strnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag"
.Ft int
.Fn strenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "int *cerr_ptr"
.Ft char *
.Fn svis "char *dst" "int c" "int flag" "int nextc" "const char *extra"
.Ft char *
.Fn snvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc" "const char *extra"
.Ft int
.Fn strsvis "char *dst" "const char *src" "int flag" "const char *extra"
.Ft int
.Fn strsnvis "char *dst" "size_t dlen" "const char *src" "int flag" "const char *extra"
.Ft int
.Fn strsvisx "char *dst" "const char *src" "size_t len" "int flag" "const char *extra"
.Ft int
.Fn strsnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra"
.Ft int
.Fn strsenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra" "int *cerr_ptr"
.Sh DESCRIPTION
The
.Fn vis
function
copies into
.Fa dst
a string which represents the character
.Fa c .
If
.Fa c
needs no encoding, it is copied in unaltered.
The string is null terminated, and a pointer to the end of the string is
returned.
The maximum length of any encoding is four
bytes (not including the trailing
.Dv NUL ) ;
thus, when
encoding a set of characters into a buffer, the size of the buffer should
be four times the number of bytes encoded, plus one for the trailing
.Dv NUL .
The flag parameter is used for altering the default range of
characters considered for encoding and for altering the visual
representation.
The additional character,
.Fa nextc ,
is only used when selecting the
.Dv VIS_CSTYLE
encoding format (explained below).
.Pp
The
.Fn strvis ,
.Fn stravis ,
.Fn strnvis ,
.Fn strvisx ,
and
.Fn strnvisx
functions copy into
.Fa dst
a visual representation of
the string
.Fa src .
The
.Fn strvis
and
.Fn strnvis
functions encode characters from
.Fa src
up to the
first
.Dv NUL .
The
.Fn strvisx
and
.Fn strnvisx
functions encode exactly
.Fa len
characters from
.Fa src
(this
is useful for encoding a block of data that may contain
.Dv NUL Ns 's ) .
Both forms
.Dv NUL
terminate
.Fa dst .
The size of
.Fa dst
must be four times the number
of bytes encoded from
.Fa src
(plus one for the
.Dv NUL ) .
Both
forms return the number of characters in
.Fa dst
(not including the trailing
.Dv NUL ) .
The
.Fn stravis
function allocates space dynamically to hold the string.
The
.Dq Nm n
versions of the functions also take an additional argument
.Fa dlen
that indicates the length of the
.Fa dst
buffer.
If
.Fa dlen
is not large enough to fit the converted string then the
.Fn strnvis
and
.Fn strnvisx
functions return \-1 and set
.Va errno
to
.Dv ENOSPC .
The
.Fn strenvisx
function takes an additional argument,
.Fa cerr_ptr ,
that is used to pass in and out a multibyte conversion error flag.
This is useful when processing single characters at a time when
it is possible that the locale may be set to something other
than the locale of the characters in the input data.
.Pp
The functions
.Fn svis ,
.Fn snvis ,
.Fn strsvis ,
.Fn strsnvis ,
.Fn strsvisx ,
.Fn strsnvisx ,
and
.Fn strsenvisx
correspond to
.Fn vis ,
.Fn nvis ,
.Fn strvis ,
.Fn strnvis ,
.Fn strvisx ,
.Fn strnvisx ,
and
.Fn strenvisx
but have an additional argument
.Fa extra ,
pointing to a
.Dv NUL
terminated list of characters.
These characters will be copied encoded or backslash-escaped into
.Fa dst .
These functions are useful, for example, to remove the special meaning
that certain characters have to shells.
.Pp
The encoding is a unique, invertible representation composed entirely of
graphic characters; it can be decoded back into the original form using
the
.Xr unvis 3 ,
.Xr strunvis 3
or
.Xr strnunvis 3
functions.
.Pp
There are two parameters that can be controlled: the range of
characters that are encoded (applies only to
.Fn vis ,
.Fn nvis ,
.Fn strvis ,
.Fn strnvis ,
.Fn strvisx ,
and
.Fn strnvisx ) ,
and the type of representation used.
By default, all non-graphic characters,
except space, tab, and newline are encoded (see
.Xr isgraph 3 ) .
The following flags
alter this:
.Bl -tag -width VIS_WHITEX
.It Dv VIS_DQ
Also encode double quotes
.It Dv VIS_GLOB
Also encode the magic characters
.Ql ( * ,
.Ql \&? ,
.Ql \&[ ,
and
.Ql # )
recognized by
.Xr glob 3 .
.It Dv VIS_SHELL
Also encode the meta characters used by shells (in addition to the glob
characters):
.Ql ( ' ,
.Ql ` ,
.Ql \&" ,
.Ql \&; ,
.Ql & ,
.Ql < ,
.Ql > ,
.Ql \&( ,
.Ql \&) ,
.Ql \&| ,
.Ql \&] ,
.Ql \e ,
.Ql $ ,
.Ql \&! ,
.Ql \&^ ,
and
.Ql ~ ) .
.It Dv VIS_SP
Also encode space.
.It Dv VIS_TAB
Also encode tab.
.It Dv VIS_NL
Also encode newline.
.It Dv VIS_WHITE
Synonym for
.Dv VIS_SP | VIS_TAB | VIS_NL .
.It Dv VIS_META
Synonym for
.Dv VIS_WHITE | VIS_GLOB | VIS_SHELL .
.It Dv VIS_SAFE
Only encode
.Dq unsafe
characters.
Unsafe means control characters which may cause common terminals to perform
unexpected functions.
Currently this form allows space, tab, newline, backspace, bell, and
return \(em in addition to all graphic characters \(em unencoded.
.El
.Pp
(The above flags have no effect for
.Fn svis ,
.Fn snvis ,
.Fn strsvis ,
.Fn strsnvis ,
.Fn strsvisx ,
and
.Fn strsnvisx .
When using these functions, place all graphic characters to be
encoded in an array pointed to by
.Fa extra .
In general, the backslash character should be included in this array; see the
warning on the use of the
.Dv VIS_NOSLASH
flag below).
.Pp
There are six forms of encoding.
All forms use the backslash character
.Ql \e
to introduce a special
sequence; two backslashes are used to represent a real backslash.
The exceptions are
.Dv VIS_HTTPSTYLE ,
which uses
.Ql % ,
and
.Dv VIS_MIMESTYLE ,
which uses
.Ql = .
These are the visual formats:
.Bl -tag -width VIS_CSTYLE
.It (default)
Use an
.Ql M
to represent meta characters (characters with the 8th
bit set), and use caret
.Ql ^
to represent control characters (see
.Xr iscntrl 3 ) .
The following formats are used:
.Bl -tag -width xxxxx
.It Dv \e^C
Represents the control character
.Ql C .
Spans characters
.Ql \e000
through
.Ql \e037 ,
and
.Ql \e177
(as
.Ql \e^? ) .
.It Dv \eM-C
Represents character
.Ql C
with the 8th bit set.
Spans characters
.Ql \e241
through
.Ql \e376 .
.It Dv \eM^C
Represents control character
.Ql C
with the 8th bit set.
Spans characters
.Ql \e200
through
.Ql \e237 ,
and
.Ql \e377
(as
.Ql \eM^? ) .
.It Dv \e040
Represents
.Tn ASCII
space.
.It Dv \e240
Represents Meta-space.
.El
.It Dv VIS_CSTYLE
Use C-style backslash sequences to represent standard non-printable
characters.
The following sequences are used to represent the indicated characters:
.Bd -unfilled -offset indent
.Li \ea Tn  \(em BEL No (007)
.Li \eb Tn  \(em BS No (010)
.Li \ef Tn  \(em NP No (014)
.Li \en Tn  \(em NL No (012)
.Li \er Tn  \(em CR No (015)
.Li \es Tn  \(em SP No (040)
.Li \et Tn  \(em HT No (011)
.Li \ev Tn  \(em VT No (013)
.Li \e0 Tn  \(em NUL No (000)
.Ed
.Pp
When using this format, the
.Fa nextc
parameter is looked at to determine if a
.Dv NUL
character can be encoded as
.Ql \e0
instead of
.Ql \e000 .
If
.Fa nextc
is an octal digit, the latter representation is used to
avoid ambiguity.
.Pp
Non-printable characters without C-style
backslash sequences use the default representation.
.It Dv VIS_OCTAL
Use a three digit octal sequence.
The form is
.Ql \eddd
where
.Em d
represents an octal digit.
.It Dv VIS_CSTYLE \&| Dv VIS_OCTAL
Same as
.Dv VIS_CSTYLE
except that non-printable characters without C-style
backslash sequences use a three digit octal sequence.
.It Dv VIS_HTTPSTYLE
Use URI encoding as described in RFC 1738.
The form is
.Ql %xx
where
.Em x
represents a lower case hexadecimal digit.
.It Dv VIS_MIMESTYLE
Use MIME Quoted-Printable encoding as described in RFC 2045, except that
lines are not broken and CRLF is not handled.
The form is
.Ql =XX
where
.Em X
represents an upper case hexadecimal digit.
.El
.Pp
There is one additional flag,
.Dv VIS_NOSLASH ,
which inhibits the
doubling of backslashes and the backslash before the default
format (that is, control characters are represented by
.Ql ^C
and
meta characters as
.Ql M-C ) .
With this flag set, the encoding is
ambiguous and non-invertible.
.Sh MULTIBYTE CHARACTER SUPPORT
These functions support multibyte character input.
The encoding conversion is influenced by the setting of the
.Ev LC_CTYPE
environment variable which defines the set of characters
that can be copied without encoding.
.Pp
If
.Dv VIS_NOLOCALE
is set, processing is done assuming the C locale and overriding
any other environment settings.
.Pp
When 8-bit data is present in the input,
.Ev LC_CTYPE
must be set to the correct locale or to the C locale.
If the locales of the data and the conversion are mismatched,
multibyte character recognition may fail and encoding will be performed
byte-by-byte instead.
.Pp
As noted above,
.Fa dst
must be four times the number of bytes processed from
.Fa src .
Note, however, that each multibyte character can be up to
.Dv MB_LEN_MAX
bytes long,
.\" (see
.\" .Xr multibyte 3 )
so in terms of multibyte characters,
.Fa dst
must be four times
.Dv MB_LEN_MAX
times the number of characters processed from
.Fa src .
.Sh ENVIRONMENT
.Bl -tag -width ".Ev LC_CTYPE"
.It Ev LC_CTYPE
Specify the locale of the input data.
Set to C if the input data locale is unknown.
.El
.Sh ERRORS
The functions
.Fn nvis
and
.Fn snvis
return
.Dv NULL ,
and the functions
.Fn strnvis ,
.Fn strnvisx ,
.Fn strsnvis ,
and
.Fn strsnvisx
return \-1, when the destination buffer size
.Fa dlen
is too small to hold the conversion.
In that case
.Va errno
is set to:
.Bl -tag -width ".Bq Er ENOSPC"
.It Bq Er ENOSPC
The destination buffer size is not large enough to perform the conversion.
.El
.Sh SEE ALSO
.Xr unvis 1 ,
.Xr vis 1 ,
.Xr glob 3 ,
.\" .Xr multibyte 3 ,
.Xr unvis 3
.Rs
.%A T. Berners-Lee
.%T Uniform Resource Locators (URL)
.%O "RFC 1738"
.Re
.Rs
.%T "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"
.%O "RFC 2045"
.Re
.Sh HISTORY
The
.Fn vis ,
.Fn strvis ,
and
.Fn strvisx
functions first appeared in
.Bx 4.4 .
The
.Fn svis ,
.Fn strsvis ,
and
.Fn strsvisx
functions appeared in
.Nx 1.5
and
.Fx 9.2 .
The buffer size limited versions of the functions
.Po Fn nvis ,
.Fn strnvis ,
.Fn strnvisx ,
.Fn snvis ,
.Fn strsnvis ,
and
.Fn strsnvisx Pc
appeared in
.Nx 6.0
and
.Fx 9.2 .
Multibyte character support was added in
.Nx 7.0
and
.Fx 9.2 .