html-xml-utils/html-xml-utils-8.0/hxunent.1

     "HXUNENT" "1" "10 Jul 2011" "7.x" "HTML-XML-utils"


 NAME
hxunent - replace HTML predefined character entities by UTF-8
 SYNOPSIS
 hxunent [ -b ] [ -f ] [ file ]
 DESCRIPTION

The hxunent command reads the file (or standard input) and copies it to
standard output with &-entities replaced by their equivalent characters
(encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced
by <.
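
The replacement described above can be illustrated with a minimal Python sketch. It is not hxunent's code: it relies on the standard library's html.unescape, which also handles numeric references and may treat edge cases differently, and the helper name unescape_to_utf8 is hypothetical.

```python
import html


def unescape_to_utf8(text: str) -> bytes:
    """Replace character entities, then encode the result as UTF-8.

    Illustration only: html.unescape is Python's own entity decoder,
    not the routine used by hxunent.
    """
    return html.unescape(text).encode("utf-8")


if __name__ == "__main__":
    # &quot; becomes " and &lt; becomes <; &eacute; becomes the
    # two-byte UTF-8 sequence 0xC3 0xA9.
    print(unescape_to_utf8("&quot;a &lt; b&quot; &eacute;"))
```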
 OPTIONS
The following options are supported:
 -b The five builtin entities of XML (&lt; &gt; &quot; &apos; &amp;) are not
replaced but copied unchanged. This is necessary if the output has to
be valid XML or SGML.

 -f This option changes how unknown entities and lone ampersands are
handled. Normally they are copied unchanged, but this option tries to
"fix" them by replacing the ampersand with &amp;. Such stray ampersands
are often the result of copying and pasting URLs into a document, in
which case this option fixes them and makes the document valid.
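
The effect of the two options can be sketched in Python as follows. This is an approximation built from the descriptions above, not the program's actual logic. The function unent and its parameters keep_builtin and fix are hypothetical names, the entity table is Python's html.entities.html5 rather than hxunent's own, and leaving numeric references such as &#38; untouched is an assumption.

```python
import re
from html.entities import html5

# The five built-in XML entities that -b leaves alone.
XML_BUILTINS = {"lt", "gt", "quot", "apos", "amp"}

# An ampersand (not starting a numeric reference), optionally followed
# by "name;". Group 1 is None for a lone ampersand.
_AMP = re.compile(r"&(?!#)(?:([A-Za-z][A-Za-z0-9]*);)?")


def unent(text: str, keep_builtin: bool = False, fix: bool = False) -> str:
    """Sketch of the documented behaviour; keep_builtin ~ -b, fix ~ -f."""

    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name is None:
            # Lone ampersand: copied unchanged unless fixing is requested.
            return "&amp;" if fix else "&"
        if keep_builtin and name in XML_BUILTINS:
            return match.group(0)
        char = html5.get(name + ";")
        if char is not None:
            return char
        # Unknown entity: escape its ampersand only when fixing.
        return "&amp;" + name + ";" if fix else match.group(0)

    return _AMP.sub(_replace, text)
```

For instance, under these assumptions unent("a?x=1&y=2", fix=True) yields a?x=1&amp;y=2, while the same call without fix returns the input unchanged.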
 "DIAGNOSTICS"
The program's exit value is 0 if all went well, otherwise:
 1 The input couldn't be read (file not found, file not readable...)

 2 Wrong command line arguments.
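
A calling script might interpret these exit values roughly as in the sketch below. It assumes hxunent is installed and on PATH. The names EXIT_MEANINGS, describe_exit and run_hxunent are hypothetical, and only the three exit values listed above are taken from this page.

```python
import subprocess

# Exit values as documented in this manual page.
EXIT_MEANINGS = {
    0: "success",
    1: "the input could not be read",
    2: "wrong command line arguments",
}


def describe_exit(code: int) -> str:
    """Map an exit value to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit status {code}")


def run_hxunent(args, data=b""):
    """Run hxunent (assumed to be on PATH) with the given arguments.

    Returns the bytes written to standard output and a description
    of the exit value.
    """
    proc = subprocess.run(["hxunent", *args], input=data, capture_output=True)
    return proc.stdout, describe_exit(proc.returncode)
```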
 "SEE ALSO"
asc2xml(1), xml2asc(1), UTF-8 (RFC 2279)
 BUGS

The program assumes entities are as defined by HTML. It doesn't read a
document's DTD to find the actual definitions in use in a document.
With -f, it will even remove all entities that are not HTML entities.