xref: /original-bsd/lib/libc/gen/vis.3 (revision 7717c4d4)
Copyright (c) 1989 The Regents of the University of California.
All rights reserved.

Redistribution and use in source and binary forms are permitted
provided that the above copyright notice and this paragraph are
duplicated in all such forms and that any documentation,
advertising materials, and other materials related to such
distribution and use acknowledge that the software was developed
by the University of California, Berkeley. The name of the
University may not be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

@(#)vis.3 5.2 (Berkeley) 10/13/89

CENCODE(3)
NAME
cencode, cdecode - encode (decode) non-printing characters
SYNOPSIS
 #include <cencode.h>

char *
cencode(c, cflag)
char c;
int cflag;

int
cdecode(c, cp, dflag)
char c, *cp;
int dflag;

DESCRIPTION
Cencode converts a non-printing character into a printable, invertible representation; cdecode converts that representation back to the original character. Both functions pass printable characters through, which makes them useful for filtering a stream of characters to and from a visual representation.

By default, cencode treats as printable every character for which isgraph(c) is true, plus space, tab, and newline. Setting CENC_WHITE in cflag causes space, tab, and newline to be encoded as well.
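As a rough illustration of this default rule, the printable test might look like the following C sketch. The function name sk_is_printable and its white argument (standing in for CENC_WHITE) are hypothetical. This is not the library's source.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of cencode's default "printable" test.
 * white != 0 models CENC_WHITE: space, tab and newline are then
 * treated as non-printing and must be encoded.
 */
static bool sk_is_printable(unsigned char c, int white)
{
	if (isgraph(c))
		return true;
	if (c == ' ' || c == '\t' || c == '\n')
		return !white;
	return false;
}
```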

There are three forms of representation. Any combination of them may be requested, since some forms encode only a subset of the non-printing characters. All forms use the backslash character to introduce the visual sequence; two backslashes represent a literal backslash. Each form is listed below by the name of its flag in cflag, followed by a description:

CENC_CTYPE Use C-style backslash sequences where possible. The following sequences are used to represent the indicated character:


\n - NL (012)
\r - CR (015)
\b - BS (010)
\a - BEL (007)
\v - VT (013)
\t - HT (011)
\f - NP (014)
\000 - NUL (000)

These are the only characters converted by CENC_CTYPE. The more familiar abbreviation \0 for NUL cannot be used: if it were followed by other octal digits, the decoder could not tell it apart from a longer octal sequence.
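The CENC_CTYPE table above can be sketched as a small lookup. The helper sk_ctype_encode is hypothetical and only illustrates the mapping. It is not the library's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the CENC_CTYPE form.  Writes the escape for c
 * into buf (at least 5 bytes) and returns 1, or returns 0 if c is not
 * one of the eight characters this form covers.
 */
static int sk_ctype_encode(unsigned char c, char *buf)
{
	static const struct { unsigned char ch; const char *esc; } map[] = {
		{ '\n', "\\n" }, { '\r', "\\r" }, { '\b', "\\b" },
		{ '\a', "\\a" }, { '\v', "\\v" }, { '\t', "\\t" },
		{ '\f', "\\f" },
		/* NUL uses all three digits: "\0" followed by an octal
		   digit such as '7' would read back as "\07". */
		{ '\0', "\\000" },
	};

	for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
		if (map[i].ch == c) {
			strcpy(buf, map[i].esc);
			return 1;
		}
	}
	return 0;
}
```

A return of 0 is where an encoder would fall back to another requested form.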

CENC_GRAPHIC Use M to represent meta characters (characters with the 8th bit set) and a caret (^) to represent control characters (those for which iscntrl(c) is true). The following forms are possible:


\^C - Represents control character 'C'. Spans
 characters 000 through 037, and 0177 (as \^?).
\M-C - Represents character 'C' with the 8th bit set.
 Spans characters 0240 (0241 if CENC_WHITE is set)
 through 0376.
\M^C - Represents control character 'C' with the 8th
 bit set. Spans characters 0200 through 0237,
 and 0377 (as \M^?).

The only characters that cannot be represented with CENC_GRAPHIC are space and meta-space, and then only when CENC_WHITE is set.
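The three CENC_GRAPHIC forms can be derived from the low seven bits of the character, as in the following sketch. The name sk_graphic_encode and the white argument (modeling CENC_WHITE) are hypothetical. The sketch also assumes that printable characters have already been passed through before it is called.

```c
#include <stdio.h>

/*
 * Hypothetical sketch of the CENC_GRAPHIC form.  Returns 1 and fills
 * buf (at least 5 bytes) with \^C, \M-C or \M^C, or returns 0 when this
 * form cannot represent c.  white != 0 models CENC_WHITE.
 */
static int sk_graphic_encode(unsigned char c, int white, char *buf)
{
	unsigned char low = c & 0177;		/* strip the 8th (meta) bit */
	const char *meta = (c & 0200) ? "M" : "";

	if (low < 040 || low == 0177) {
		/* control character: ^@ through ^_, plus ^? for DEL */
		char ch = (low == 0177) ? '?' : (char)(low + '@');

		snprintf(buf, 5, "\\%s^%c", meta, ch);
		return 1;
	}
	if (c & 0200) {
		/* meta-space: \M- would be followed by a bare space */
		if (low == ' ' && white)
			return 0;
		snprintf(buf, 5, "\\M-%c", (char)low);
		return 1;
	}
	return 0;	/* plain space with CENC_WHITE, or a printable char */
}
```

Tab and newline fall into the control range, so with CENC_WHITE they come out as \^I and \^J. Only space and meta-space are left without a representation.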

CENC_OCTAL Use a three-digit octal sequence. The form is:


\ddd

where d represents an octal digit. All non-printing characters can be displayed in this form.
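A sketch of the octal form is a single formatted print, shown here with a hypothetical helper sk_octal_encode. Padding to exactly three digits is what keeps the sequence unambiguous when octal digits follow it.

```c
#include <stdio.h>

/*
 * Hypothetical sketch of the CENC_OCTAL form: every byte value
 * 0000..0377 fits in exactly three octal digits, so buf needs 5 bytes.
 */
static void sk_octal_encode(unsigned char c, char *buf)
{
	snprintf(buf, 5, "\\%03o", (unsigned)c);
}
```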

Cencode returns a pointer to a string containing the printable representation of the character passed in c. If the character cannot be encoded (because none of the selected forms can represent it), it is placed in the returned string unencoded. In particular, if NUL is not encoded, the returned string holds two NUL bytes: the character itself followed by the terminator. A caller that expects this case should always take the first character of the returned string before checking for the terminating NUL. If CENC_OCTAL is selected, alone or with other forms, this situation cannot arise. Calling cencode with no forms requested performs no encoding, except that backslashes are still doubled.
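The following sketch illustrates both points above using hypothetical helpers that are not part of the library. sk_encode_none models cencode with no forms requested: the backslash is doubled and everything else is copied, so an unencoded NUL yields two NUL bytes. sk_copy_encoded shows the caller taking the first character unconditionally before looking for the terminator.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of cencode with no forms requested: only the
 * backslash is altered (doubled); every other byte, including NUL, is
 * copied as-is.  buf needs at least 3 bytes.
 */
static void sk_encode_none(unsigned char c, char *buf)
{
	buf[0] = (char)c;
	buf[1] = (c == '\\') ? '\\' : '\0';
	buf[2] = '\0';
}

/*
 * Caller side: always take the first byte, then stop at the terminating
 * NUL.  Returns the number of bytes copied to out, so an unencoded NUL
 * yields 1 rather than 0.
 */
static size_t sk_copy_encoded(const char *s, char *out)
{
	size_t n = 0;

	out[n++] = s[0];	/* may itself be an unencoded NUL */
	for (const char *p = s + 1; *p != '\0'; p++)
		out[n++] = *p;
	return n;
}
```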

Decoding previously encoded data with cdecode is a little trickier. Characters are passed to cdecode one at a time until the decoder recognizes a complete character to return. There are five return codes to handle:

CDEC_NEEDMORE The decoder is not done recognizing a control sequence; pass it another character in c.

CDEC_OK A character was recognized and has been placed in *cp.

CDEC_OKPUSH A character was recognized and has been placed in *cp; however, the character just passed in c was not consumed. When processing a stream of characters, pass that same character to cdecode again.

CDEC_NOCHAR A sequence which represents no character was detected.

CDEC_SYNBAD An unrecognized backslash sequence was detected. The decoder was automatically reset to a normal state. All characters since the last un-escaped backslash character constitute the unrecognized sequence.
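To show where these return codes come from, here is a toy decoder in C. It is a hypothetical sketch, not the library's cdecode. The names sk_decode and sk_decode_all, the SK_ codes (which mirror the CDEC_ codes), and the explicit state structure are all assumptions. It understands only plain characters, doubled backslashes, the C-style escapes, and octal sequences of one to three digits. A short octal sequence ended by a non-digit is where the OKPUSH case arises.

```c
#include <stddef.h>

/* Hypothetical codes mirroring CDEC_NEEDMORE, CDEC_OK, and so on. */
enum { SK_NEEDMORE, SK_OK, SK_OKPUSH, SK_NOCHAR, SK_SYNBAD };

/* Decoder states, kept explicitly here for illustration. */
enum { ST_GROUND, ST_ESC, ST_OCT };

struct sk_dec { int state; int val; int ndig; };

/* Feed one character; end != 0 plays the role of CDEC_END. */
static int sk_decode(struct sk_dec *d, char c, char *cp, int end)
{
	static const char from[] = "nrbavtf", to[] = "\n\r\b\a\v\t\f";

	if (end) {
		int was = d->state;

		d->state = ST_GROUND;
		if (was == ST_OCT) {		/* flush a short octal sequence */
			*cp = (char)d->val;
			return SK_OK;
		}
		return was == ST_ESC ? SK_SYNBAD : SK_NOCHAR;
	}
	switch (d->state) {
	case ST_GROUND:
		if (c == '\\') {
			d->state = ST_ESC;
			return SK_NEEDMORE;
		}
		*cp = c;
		return SK_OK;
	case ST_ESC:
		if (c >= '0' && c <= '7') {
			d->state = ST_OCT;
			d->val = c - '0';
			d->ndig = 1;
			return SK_NEEDMORE;
		}
		d->state = ST_GROUND;
		if (c == '\\') {
			*cp = '\\';
			return SK_OK;
		}
		for (size_t i = 0; from[i] != '\0'; i++) {
			if (from[i] == c) {
				*cp = to[i];
				return SK_OK;
			}
		}
		return SK_SYNBAD;	/* unrecognized backslash sequence */
	default:			/* ST_OCT */
		if (c >= '0' && c <= '7') {
			d->val = d->val * 8 + (c - '0');
			if (++d->ndig < 3)
				return SK_NEEDMORE;
			d->state = ST_GROUND;
			*cp = (char)d->val;
			return SK_OK;
		}
		/* A non-digit ends a short sequence but is not consumed. */
		d->state = ST_GROUND;
		*cp = (char)d->val;
		return SK_OKPUSH;
	}
}

/* Decode a string the way the sample loop does; -1 on a bad sequence. */
static int sk_decode_all(const char *in, char *out)
{
	struct sk_dec d = { ST_GROUND, 0, 0 };
	size_t n = 0;
	char nc;

	for (; *in != '\0'; in++) {
again:
		switch (sk_decode(&d, *in, &nc, 0)) {
		case SK_OK:
			out[n++] = nc;
			break;
		case SK_OKPUSH:
			out[n++] = nc;
			goto again;	/* re-feed the same character */
		case SK_SYNBAD:
			return -1;
		}
	}
	if (sk_decode(&d, '\0', &nc, 1) == SK_OK)
		out[n++] = nc;
	out[n] = '\0';
	return (int)n;
}
```

With this sketch, the input \0007 decodes to NUL followed by '7', whereas \07 decodes to a single BEL. That is why CENC_CTYPE spells NUL as \000.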

When the caller has finished feeding characters to cdecode, it should call cdecode one last time with dflag set to CDEC_END to extract any remaining character. The following sample code illustrates the use of cdecode:


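	#include <stdio.h>
	#include <stdlib.h>
	#include <cencode.h>

	int
	main()
	{
		int c;		/* int, so that EOF is distinguishable */
		char nc;

		while ((c = getchar()) != EOF) {
	again:
			switch (cdecode((char)c, &nc, 0)) {
			case CDEC_NEEDMORE:
			case CDEC_NOCHAR:
				break;
			case CDEC_OK:
				putchar(nc);
				break;
			case CDEC_OKPUSH:
				putchar(nc);
				goto again;	/* c was not consumed; feed it again */
			case CDEC_SYNBAD:
				fprintf(stderr, "Bad sequence\n");
				exit(1);
			}
		}
		/* Flush any character still held by the decoder. */
		if (cdecode((char)0, &nc, CDEC_END) == CDEC_OK)
			putchar(nc);
		exit(0);
	}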

SEE ALSO
vis(1)