Copyright (c) 1989 The Regents of the University of California.
All rights reserved.

Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation,
advertising materials, and other materials related to such
distribution and use acknowledge that the software was developed
by the University of California, Berkeley. The name of the
University may not be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

@(#)vis.3 5.3 (Berkeley) 05/11/90

CENCODE(3)
NAME
cencode, cdecode - encode (decode) non-printing characters
SYNOPSIS
#include <cencode.h>

char *cencode(character, cflag)
char character;
int cflag;

int cdecode(character, store, dflag)
char character, *store;
int dflag;
DESCRIPTION
Cencode converts a non-printing character into a printable, invertible representation; cdecode converts that representation back into the original character. These functions are useful for filtering a stream of characters to and from a visual representation.

Cencode returns a pointer to a string containing the printable representation of the character passed in the argument character. By default, cencode considers the characters selected by isgraph(3), plus space, tab, and newline, to be printable.

There are three possible forms of representation, selected by the cflag argument. All forms use the backslash character (``\'') to introduce a special sequence; two backslashes represent a literal backslash. Cflag is formed by OR-ing one or more of the following values:

CENC_WHITE Setting CENC_WHITE in cflag causes space, tab, and newline characters to be considered non-printable, and therefore encoded.

CENC_CSTYLE Use C-style backslash sequences to represent standard non-printable characters. The following sequences are used to represent the indicated characters:

\a - BEL (007)
\b - BS (010)
\f - NP (014)
\n - NL (012)
\r - CR (015)
\t - HT (011)
\v - VT (013)
\000 - NUL (000)
These are the only characters converted by CENC_CSTYLE. The more familiar abbreviation ``\0'' for NUL cannot be used, because it could be confused with a longer octal sequence if it were followed by other digits.

CENC_GRAPH Use an ``M'' to represent meta characters (characters with the 8th bit set), and use a caret (``^'') to represent control characters (see iscntrl(3)). The following formats are used:

\^C Represents the control character ``C''. Spans characters \000 through \037, and \177 (as ``\^?'').

\M-C Represents character ``C'' with the 8th bit set. Spans characters \240 (\241 if CENC_WHITE is set) through \376.

\M^C Represents control character ``C'' with the 8th bit set. Spans characters \200 through \237, and \377 (as ``\M^?'').

The only characters that cannot be displayed using CENC_GRAPH are space and meta-space, and only when CENC_WHITE is set.

CENC_OCTAL Use a three-digit octal sequence of the form ``\ddd'', where d represents an octal digit. All non-printing characters may be displayed in this form.
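The flag rules above can be tied together in one function. What follows is a minimal sketch, not the library's implementation. The names sketch_cencode and SK_WHITE, SK_CSTYLE, SK_GRAPH, SK_OCTAL are hypothetical stand-ins for cencode and the CENC_* constants, whose real values come from <cencode.h>. Three further assumptions: formats are tried in the order C-style, graphic, octal; the result goes into a caller-supplied buffer; and isgraph(3) runs in the default "C" locale. The sketch also applies the page's rules that backslashes are always doubled and that a character no selected format can represent is passed through unaltered.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for CENC_WHITE, CENC_CSTYLE, CENC_GRAPH and
 * CENC_OCTAL; the real values come from <cencode.h>. */
#define SK_WHITE  0x01
#define SK_CSTYLE 0x02
#define SK_GRAPH  0x04
#define SK_OCTAL  0x08

/*
 * Hypothetical encoder: writes the representation of c into buf (at least
 * 8 bytes) and returns buf. Formats are tried in the order C-style,
 * graphic, octal; that order is an assumption of this sketch.
 */
char *sketch_cencode(int c, int flags, char buf[8])
{
	char *p = buf;

	c &= 0377;                          /* accept signed char input */
	if (c == '\\') {                    /* backslash is always doubled */
		strcpy(buf, "\\\\");
		return buf;
	}
	/* Default printable set: isgraph(3) plus space, tab and newline. */
	if (isgraph(c) || (!(flags & SK_WHITE) &&
	    (c == ' ' || c == '\t' || c == '\n'))) {
		buf[0] = (char)c;
		buf[1] = '\0';
		return buf;
	}
	if (flags & SK_CSTYLE) {
		static const char from[] = "\a\b\f\n\r\t\v";
		static const char to[] = "abfnrtv";
		const char *q;

		if (c == 0) {                   /* NUL is written as \000 */
			strcpy(buf, "\\000");
			return buf;
		}
		if ((q = strchr(from, c)) != NULL) {
			buf[0] = '\\';
			buf[1] = to[q - from];
			buf[2] = '\0';
			return buf;
		}
	}
	if (flags & SK_GRAPH) {
		int low = c & 0177;             /* strip the meta (8th) bit */
		int ctrl = (low < 040 || low == 0177);

		if (c & 0200) {
			/* meta-space is not representable when SK_WHITE is set */
			if (!(low == ' ' && (flags & SK_WHITE))) {
				*p++ = '\\';
				*p++ = 'M';
				*p++ = ctrl ? '^' : '-';
				*p++ = (char)(ctrl ? (low ^ 0100) : low);
				*p = '\0';
				return buf;
			}
		} else if (ctrl) {
			*p++ = '\\';
			*p++ = '^';
			*p++ = (char)(low ^ 0100);  /* 001 -> 'A', 0177 -> '?' */
			*p = '\0';
			return buf;
		}
	}
	if (flags & SK_OCTAL) {
		snprintf(buf, 8, "\\%03o", (unsigned)c);
		return buf;
	}
	/* Not encodable with the selected formats: pass it through unaltered. */
	buf[0] = (char)c;
	buf[1] = '\0';
	return buf;
}
```

For example, with SK_GRAPH the character 0341 comes back as \M-a and 0377 as \M^?. A space with SK_WHITE | SK_GRAPH comes back unaltered, and with no flags at all a backslash is still doubled.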

If the supplied character cannot be encoded (because none of the selected formats can represent it), it is placed in the returned string unaltered. Note that an unencoded NUL therefore appears in the string as two NUL bytes: the character itself followed by the string terminator. If the caller expects to encounter this situation, it suffices to always extract one character from the returned string before checking for the terminating NUL. If CENC_OCTAL is selected, in addition to any other formats, this situation can never arise.
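The "extract one character first" rule amounts to a do-while loop. Here is a small sketch in which copy_encoded is a hypothetical helper, not part of the library. It assumes the string it receives is one returned by cencode, which always holds at least one character.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy one string returned by cencode into out and
 * return the number of bytes copied. The first byte is taken before any
 * NUL check, so an unencoded NUL (returned as two NUL bytes) is copied
 * as a single NUL character.
 */
size_t copy_encoded(const char *s, char *out)
{
	size_t n = 0;

	do {
		out[n] = s[n];
		n++;
	} while (s[n] != '\0');
	return n;
}
```

Passing the literal "\0", which stands in for cencode's result for an unencoded NUL, copies exactly one byte. An ordinary sequence such as "\^A" copies all three of its characters.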

Calling cencode with no requested formats results in no encoding being done; however, backslashes are still doubled.

Cdecode is used to decode data encoded by cencode. Characters are passed to cdecode one at a time until the decoder recognizes a character to return. Dflag is formed by OR-ing one or more of the following values:

CDEC_HAT Treat the caret (``^'') character specially, i.e. decode the sequence ``^C'' as the control character ``C''. This differs from the sequence ``\^C'' output by cencode with the CENC_GRAPH flag set in that it does not require a preceding backslash.

CDEC_END Reset the decoder to its initial state, and flush out any character that has been retained in the decoder.

There are five possible return values from cdecode:

CDEC_NEEDMORE The decoder has not yet recognized a control sequence; supply it with more characters.

CDEC_NOCHAR A valid sequence which did not result in a character was decoded.

CDEC_OK A character was recognized and has been placed in the location pointed to by store.

CDEC_OKPUSH A character was recognized and has been placed in the location pointed to by store; however, the character that was just supplied to cdecode has not yet been used. When processing a stream of characters, the current character should be supplied to cdecode again.

CDEC_SYNBAD An unrecognized backslash sequence was detected. The decoder was automatically reset to its normal state. All characters since the last unescaped backslash constitute the unrecognized sequence.

The following code illustrates the use of cdecode:

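#include <stdio.h>
#include <stdlib.h>
#include <cencode.h>

int
main(void)
{
	int ch;
	char nc;

	while ((ch = getchar()) != EOF) {
again:
		switch (cdecode((char)ch, &nc, 0)) {
		case CDEC_NEEDMORE:
		case CDEC_NOCHAR:
			/* nothing to output yet */
			break;
		case CDEC_OK:
			(void)putchar(nc);
			break;
		case CDEC_OKPUSH:
			/* output the decoded character, then feed ch again */
			(void)putchar(nc);
			goto again;
		case CDEC_SYNBAD:
			(void)fprintf(stderr, "bad sequence!\n");
			exit(1);
		}
	}
	/* flush any character still held by the decoder */
	if (cdecode((char)0, &nc, CDEC_END) == CDEC_OK)
		(void)putchar(nc);
	exit(0);
}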
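To make the return-value protocol concrete, here is a self-contained decoder for a small subset of the encoding: plain characters, a doubled backslash, and octal escapes. It is a sketch under stated assumptions, not the library's decoder. The names sketch_cdecode, DEC_NEEDMORE through DEC_SYNBAD, and DEC_END are hypothetical stand-ins for cdecode and the CDEC_* values, whose real definitions live in <cencode.h>. Three behaviors are assumptions: accepting one to three octal digits, returning DEC_OKPUSH when a shorter octal run is ended by a non-digit, and treating DEC_END on a dangling backslash as a syntax error. Caret, meta, and C-style letter sequences, as well as CDEC_HAT, are left out.

```c
/* Hypothetical stand-ins for the CDEC_* return codes and CDEC_END flag. */
enum { DEC_NEEDMORE, DEC_NOCHAR, DEC_OK, DEC_OKPUSH, DEC_SYNBAD };
#define DEC_END 0x01

/* Decoder states, kept internally as the cdecode interface implies. */
enum { S_GROUND, S_ESCAPE, S_OCTAL };

static int state = S_GROUND;
static int value, ndigits;

int sketch_cdecode(char ch, char *store, int flags)
{
	int c = ch & 0377;

	if (flags & DEC_END) {
		int prev = state;

		state = S_GROUND;
		if (prev == S_OCTAL) {          /* flush a partial \d or \dd */
			*store = (char)value;
			return DEC_OK;
		}
		return prev == S_ESCAPE ? DEC_SYNBAD : DEC_NOCHAR;
	}
	switch (state) {
	case S_GROUND:
		if (c == '\\') {
			state = S_ESCAPE;
			return DEC_NEEDMORE;
		}
		*store = (char)c;
		return DEC_OK;
	case S_ESCAPE:
		if (c == '\\') {                /* doubled backslash */
			state = S_GROUND;
			*store = '\\';
			return DEC_OK;
		}
		if (c >= '0' && c <= '7') {
			state = S_OCTAL;
			value = c - '0';
			ndigits = 1;
			return DEC_NEEDMORE;
		}
		state = S_GROUND;               /* unrecognized escape */
		return DEC_SYNBAD;
	case S_OCTAL:
		if (c >= '0' && c <= '7') {
			value = value * 8 + (c - '0');
			if (++ndigits == 3) {
				state = S_GROUND;
				*store = (char)value;
				return DEC_OK;
			}
			return DEC_NEEDMORE;
		}
		/* a short octal run ended by a non-digit: return its value
		 * and ask the caller to supply c again */
		state = S_GROUND;
		*store = (char)value;
		return DEC_OKPUSH;
	}
	return DEC_SYNBAD;                  /* not reached */
}
```

Feeding it \7x, for instance, returns DEC_OKPUSH with the BEL character when the x arrives. The x must then be supplied again, which is the case the goto again in the loop above handles.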
SEE ALSO
vis(1)