.\"	$NetBSD: vis.3,v 1.49 2017/08/05 20:22:29 wiz Exp $
.\"
.\" Copyright (c) 1989, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)vis.3	8.1 (Berkeley) 6/9/93
.\"
.Dd April 9, 2018
.Dt VIS 3
.Os
.Sh NAME
.Nm vis ,
.Nm nvis ,
.Nm strvis ,
.Nm stravis ,
.Nm strnvis ,
.Nm strvisx ,
.Nm strnvisx ,
.Nm strenvisx ,
.Nm svis ,
.Nm snvis ,
.Nm strsvis ,
.Nm strsnvis ,
.Nm strsvisx ,
.Nm strsnvisx ,
.Nm strsenvisx
.Nd visually encode characters
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In vis.h
.Ft char *
.Fn vis "char *dst" "int c" "int flag" "int nextc"
.Ft char *
.Fn nvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc"
.Ft int
.Fn strvis "char *dst" "const char *src" "int flag"
.Ft int
.Fn stravis "char **dst" "const char *src" "int flag"
.Ft int
.Fn strnvis "char *dst" "const char *src" "size_t len" "int flag"
.Ft int
.Fn strvisx "char *dst" "const char *src" "size_t len" "int flag"
.Ft int
.Fn strnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag"
.Ft int
.Fn strenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "int *cerr_ptr"
.Ft char *
.Fn svis "char *dst" "int c" "int flag" "int nextc" "const char *extra"
.Ft char *
.Fn snvis "char *dst" "size_t dlen" "int c" "int flag" "int nextc" "const char *extra"
.Ft int
.Fn strsvis "char *dst" "const char *src" "int flag" "const char *extra"
.Ft int
.Fn strsnvis "char *dst" "size_t dlen" "const char *src" "int flag" "const char *extra"
.Ft int
.Fn strsvisx "char *dst" "const char *src" "size_t len" "int flag" "const char *extra"
.Ft int
.Fn strsnvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra"
.Ft int
.Fn strsenvisx "char *dst" "size_t dlen" "const char *src" "size_t len" "int flag" "const char *extra" "int *cerr_ptr"
.Sh DESCRIPTION
The
.Fn vis
function
copies into
.Fa dst
a string which represents the character
.Fa c .
If
.Fa c
needs no encoding, it is copied in unaltered.
The string is null terminated, and a pointer to the end of the string is
returned.
The maximum length of any encoding is four
bytes (not including the trailing
.Dv NUL ) ;
thus, when
encoding a set of characters into a buffer, the size of the buffer should
be four times the number of bytes encoded, plus one for the trailing
.Dv NUL .
The
.Fa flag
parameter is used for altering the default range of
characters considered for encoding and for altering the visual
representation.
The additional character,
.Fa nextc ,
is only used when selecting the
.Dv VIS_CSTYLE
encoding format (explained below).
.Pp
The
.Fn strvis ,
.Fn stravis ,
.Fn strnvis ,
.Fn strvisx ,
and
.Fn strnvisx
functions copy into
.Fa dst
a visual representation of
the string
.Fa src .
The
.Fn strvis
and
.Fn strnvis
functions encode characters from
.Fa src
up to the
first
.Dv NUL .
The
.Fn strvisx
and
.Fn strnvisx
functions encode exactly
.Fa len
characters from
.Fa src
(this
is useful for encoding a block of data that may contain
.Dv NUL Ns 's ) .
Both forms
.Dv NUL
terminate
.Fa dst .
The size of
.Fa dst
must be four times the number
of bytes encoded from
.Fa src
(plus one for the
.Dv NUL ) .
Both
forms return the number of characters in
.Fa dst
(not including the trailing
.Dv NUL ) .
The
.Fn stravis
function allocates space dynamically to hold the string.
The
.Dq Nm n
versions of the functions also take an additional argument
.Fa dlen
that indicates the length of the
.Fa dst
buffer.
If
.Fa dlen
is not large enough to fit the converted string then the
.Fn strnvis
and
.Fn strnvisx
functions return \-1 and set
.Va errno
to
.Er ENOSPC .
The
.Fn strenvisx
function takes an additional argument,
.Fa cerr_ptr ,
that is used to pass in and out a multibyte conversion error flag.
This is useful when processing single characters at a time when
it is possible that the locale may be set to something other
than the locale of the characters in the input data.
.Pp
The functions
.Fn svis ,
.Fn snvis ,
.Fn strsvis ,
.Fn strsnvis ,
.Fn strsvisx ,
.Fn strsnvisx ,
and
.Fn strsenvisx
correspond to
.Fn vis ,
.Fn nvis ,
.Fn strvis ,
.Fn strnvis ,
.Fn strvisx ,
.Fn strnvisx ,
and
.Fn strenvisx
but have an additional argument
.Fa extra ,
pointing to a
.Dv NUL
terminated list of characters.
These characters will be copied encoded or backslash-escaped into
.Fa dst .
These functions are useful e.g. to remove the special meaning
of certain characters to shells.
.Pp
The encoding is a unique, invertible representation composed entirely of
graphic characters; it can be decoded back into the original form using
the
.Xr unvis 3 ,
.Xr strunvis 3
or
.Xr strnunvis 3
functions.
.Pp
There are two parameters that can be controlled: the range of
characters that are encoded (applies only to
.Fn vis ,
.Fn nvis ,
.Fn strvis ,
.Fn strnvis ,
.Fn strvisx ,
and
.Fn strnvisx ) ,
and the type of representation used.
By default, all non-graphic characters,
except space, tab, and newline are encoded (see
.Xr isgraph 3 ) .
The following flags
alter this:
.Bl -tag -width ".Dv VIS_HTTPSTYLE"
.It Dv VIS_DQ
Also encode double quotes
.It Dv VIS_GLOB
Also encode the magic characters
.Ql ( * ,
.Ql \&? ,
.Ql \&[ ,
and
.Ql # )
recognized by
.Xr glob 3 .
.It Dv VIS_SHELL
Also encode the meta characters used by shells (in addition to the glob
characters):
.Ql ( ' ,
.Ql ` ,
.Ql \&" ,
.Ql \&; ,
.Ql & ,
.Ql < ,
.Ql > ,
.Ql \&( ,
.Ql \&) ,
.Ql \&| ,
.Ql \&] ,
.Ql \e ,
.Ql $ ,
.Ql \&! ,
.Ql \&^ ,
and
.Ql ~ ) .
.It Dv VIS_SP
Also encode space.
.It Dv VIS_TAB
Also encode tab.
.It Dv VIS_NL
Also encode newline.
.It Dv VIS_WHITE
Synonym for
.Dv VIS_SP | VIS_TAB | VIS_NL .
.It Dv VIS_META
Synonym for
.Dv VIS_WHITE | VIS_GLOB | VIS_SHELL .
.It Dv VIS_SAFE
Only encode
.Dq unsafe
characters.
Unsafe means control characters which may cause common terminals to perform
unexpected functions.
Currently this form allows space, tab, newline, backspace, bell, and
return \(em in addition to all graphic characters \(em unencoded.
.El
.Pp
(The above flags have no effect for
.Fn svis ,
.Fn snvis ,
.Fn strsvis ,
.Fn strsnvis ,
.Fn strsvisx ,
and
.Fn strsnvisx .
When using these functions, place all graphic characters to be
encoded in an array pointed to by
.Fa extra .
In general, the backslash character should be included in this array, see the
warning on the use of the
.Dv VIS_NOSLASH
flag below).
.Pp
There are six forms of encoding.
All forms use the backslash character
.Ql \e
to introduce a special
sequence; two backslashes are used to represent a real backslash,
except
.Dv VIS_HTTPSTYLE
that uses
.Ql % ,
or
.Dv VIS_MIMESTYLE
that uses
.Ql = .
These are the visual formats:
.Bl -tag -width ".Dv VIS_HTTPSTYLE"
.It (default)
Use an
.Ql M
to represent meta characters (characters with the 8th
bit set), and use caret
.Ql ^
to represent control characters (see
.Xr iscntrl 3 ) .
The following formats are used:
.Bl -tag -width xxxxx
.It Dv \e^C
Represents the control character
.Ql C .
Spans characters
.Ql \e000
through
.Ql \e037 ,
and
.Ql \e177
(as
.Ql \e^? ) .
.It Dv \eM-C
Represents character
.Ql C
with the 8th bit set.
Spans characters
.Ql \e241
through
.Ql \e376 .
.It Dv \eM^C
Represents control character
.Ql C
with the 8th bit set.
Spans characters
.Ql \e200
through
.Ql \e237 ,
and
.Ql \e377
(as
.Ql \eM^? ) .
.It Dv \e040
Represents
.Tn ASCII
space.
.It Dv \e240
Represents Meta-space.
.El
.It Dv VIS_CSTYLE
Use C-style backslash sequences to represent standard non-printable
characters.
The following sequences are used to represent the indicated characters:
.Pp
.Bl -tag -width ".Li \e0" -offset indent -compact
.It Li \ea
.Dv BEL No (007)
.It Li \eb
.Dv BS No (010)
.It Li \ef
.Dv NP No (014)
.It Li \en
.Dv NL No (012)
.It Li \er
.Dv CR No (015)
.It Li \et
.Dv HT No (011)
.It Li \ev
.Dv VT No (013)
.It Li \e0
.Dv NUL No (000)
.El
.Pp
When using this format, the
.Fa nextc
parameter is looked at to determine if a
.Dv NUL
character can be encoded as
.Ql \e0
instead of
.Ql \e000 .
If
.Fa nextc
is an octal digit, the latter representation is used to
avoid ambiguity.
.Pp
Non-printable characters without C-style
backslash sequences use the default representation.
.It Dv VIS_OCTAL
Use a three digit octal sequence.
The form is
.Ql \eddd
where
.Em d
represents an octal digit.
.It Dv VIS_CSTYLE \&| Dv VIS_OCTAL
Same as
.Dv VIS_CSTYLE
except that non-printable characters without C-style
backslash sequences use a three digit octal sequence.
.It Dv VIS_HTTPSTYLE
Use URI encoding as described in RFC 1738.
The form is
.Ql %xx
where
.Em x
represents a lower case hexadecimal digit.
.It Dv VIS_MIMESTYLE
Use MIME Quoted-Printable encoding as described in RFC 2045, only don't
break lines and don't handle CRLF.
The form is
.Ql =XX
where
.Em X
represents an upper case hexadecimal digit.
.El
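.Pp
The byte ranges given for the default format at the head of the list above
can be restated as a small stand-alone sketch.
This is a simplified illustration for single bytes, assuming the C locale
and no flags; the function
.Fn sketch_vis_default
is hypothetical, is not how libc implements
.Fn vis ,
and ignores multibyte input.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not part of <vis.h>: write the default
 * visual form of one byte into dst, which must hold at least 5 bytes
 * (four for the longest encoding plus the NUL).
 * Returns a pointer to the terminating NUL, like vis().
 */
static char *
sketch_vis_default(char *dst, unsigned char c)
{
	if (c == '\\') {
		/* A real backslash is doubled. */
		*dst++ = '\\';
		*dst++ = '\\';
	} else if (c == ' ' || c == '\t' || c == '\n' ||
	    (c > 040 && c < 0177)) {
		/* Graphic characters, space, tab and newline pass through. */
		*dst++ = (char)c;
	} else if (c == 0240) {
		/* Meta-space is written in octal. */
		*dst++ = '\\';
		*dst++ = '2';
		*dst++ = '4';
		*dst++ = '0';
	} else {
		*dst++ = '\\';
		if (c & 0200) {
			/* 8th bit set: prefix with M and look at the rest. */
			*dst++ = 'M';
			c &= 0177;
			if (c >= 040 && c != 0177) {
				*dst++ = '-';		/* \M-C */
				*dst++ = (char)c;
				*dst = '\0';
				return dst;
			}
		}
		/* Control character: \^C or \M^C, with DEL shown as ?. */
		*dst++ = '^';
		*dst++ = (c == 0177) ? '?' : (char)(c + '@');
	}
	*dst = '\0';
	return dst;
}
```

Under these assumptions, byte 001 yields
.Ql \e^A ,
byte 0241 yields
.Ql \eM-! ,
and byte 0377 yields
.Ql \eM^? ,
matching the ranges listed for the default format.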
.Pp
There is one additional flag,
.Dv VIS_NOSLASH ,
which inhibits the
doubling of backslashes and the backslash before the default
format (that is, control characters are represented by
.Ql ^C
and
meta characters as
.Ql M-C ) .
With this flag set, the encoding is
ambiguous and non-invertible.
.Sh MULTIBYTE CHARACTER SUPPORT
These functions support multibyte character input.
The encoding conversion is influenced by the setting of the
.Ev LC_CTYPE
environment variable which defines the set of characters
that can be copied without encoding.
.Pp
If
.Dv VIS_NOLOCALE
is set, processing is done assuming the C locale and overriding
any other environment settings.
.Pp
When 8-bit data is present in the input,
.Ev LC_CTYPE
must be set to the correct locale or to the C locale.
If the locales of the data and the conversion are mismatched,
multibyte character recognition may fail and encoding will be performed
byte-by-byte instead.
.Pp
As noted above,
.Fa dst
must be four times the number of bytes processed from
.Fa src .
But note that each multibyte character can be up to
.Dv MB_LEN_MAX
bytes
.\" (see
.\" .Xr multibyte 3 )
so in terms of multibyte characters,
.Fa dst
must be four times
.Dv MB_LEN_MAX
times the number of characters processed from
.Fa src .
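.Pp
The two sizing rules can be written out as arithmetic.
The following is a sketch only: the helper names
.Fn vis_dst_size_bytes
and
.Fn vis_dst_size_mbchars
are hypothetical, the extra byte for the trailing
.Dv NUL
is carried over from the DESCRIPTION, and a return value of 0 is this
sketch's own convention for signalling overflow.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed in dst to encode nbytes input bytes,
 * i.e. 4 * nbytes + 1.  Returns 0 if that would overflow size_t.
 */
static size_t
vis_dst_size_bytes(size_t nbytes)
{
	if (nbytes > (SIZE_MAX - 1) / 4)
		return 0;
	return 4 * nbytes + 1;
}

/*
 * Hypothetical helper: bytes needed in dst when the input length is
 * known as a count of multibyte characters, each of which may occupy up
 * to MB_LEN_MAX bytes: 4 * MB_LEN_MAX * nchars + 1.
 * Returns 0 if that would overflow size_t.
 */
static size_t
vis_dst_size_mbchars(size_t nchars)
{
	if (nchars > (SIZE_MAX - 1) / 4 / MB_LEN_MAX)
		return 0;
	return 4 * (size_t)MB_LEN_MAX * nchars + 1;
}
```

When the input length is already known in bytes, the first form gives the
tighter bound; the second is only needed when the caller counts characters.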
.Sh ENVIRONMENT
.Bl -tag -width ".Ev LC_CTYPE"
.It Ev LC_CTYPE
Specify the locale of the input data.
Set to C if the input data locale is unknown.
.El
.Sh ERRORS
The functions
.Fn nvis
and
.Fn snvis
return
.Dv NULL ,
and the functions
.Fn strnvis ,
.Fn strnvisx ,
.Fn strsnvis ,
and
.Fn strsnvisx
return \-1, when the destination buffer size
.Fa dlen
is too small to hold the conversion.
In that case
.Va errno
is set to:
.Bl -tag -width ".Bq Er ENOSPC"
.It Bq Er ENOSPC
The destination buffer size is not large enough to perform the conversion.
.El
.Sh SEE ALSO
.Xr unvis 1 ,
.Xr vis 1 ,
.Xr glob 3 ,
.\" .Xr multibyte 3 ,
.Xr unvis 3
.Rs
.%A T. Berners-Lee
.%T Uniform Resource Locators (URL)
.%O "RFC 1738"
.Re
.Rs
.%T "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies"
.%O "RFC 2045"
.Re
.Sh HISTORY
The
.Fn vis ,
.Fn strvis ,
and
.Fn strvisx
functions first appeared in
.Bx 4.4 .
The
.Fn svis ,
.Fn strsvis ,
and
.Fn strsvisx
functions appeared in
.Nx 1.5 .
The buffer size limited versions of the functions
.Po Fn nvis ,
.Fn strnvis ,
.Fn strnvisx ,
.Fn snvis ,
.Fn strsnvis ,
and
.Fn strsnvisx Pc
appeared in
.Nx 6.0
and
.Fx 9.2 .
Multibyte character support was added in
.Nx 7.0
and
.Fx 9.2 .