1.pn 0
2.ls1
3.EQ
4delim $$
5.EN
6.ev1
7.ps-2
8.vs-2
9.ev
10\&
11.sp 5
12.ps+4
13.ce
14ARITHMETIC CODING FOR DATA COMPRESSION
15.ps-4
16.sp4
17.ce
18Ian H. Witten, Radford M. Neal, and John G. Cleary
19.sp2
20.ce4
21Department of Computer Science
22The University of Calgary
232500 University Drive NW
24Calgary, Canada T2N 1N4
25.sp2
26.ce
August 1986, revised January 1987
28.sp 8
29.in+1i
30.ll-1i
31The state of the art in data compression is arithmetic coding, not
32the better-known Huffman method.
33Arithmetic coding gives greater compression, is faster for adaptive models,
34and clearly separates the model from the channel encoding.
35This paper presents a practical implementation of the technique.
36.sp 3
37.in +0.5i
38.ti -0.5i
39\fICR Categories and subject descriptors:\fR
40.br
41E.4 [DATA] Coding and Information Theory \(em Data Compaction and Compression
42.br
43H.1.1 [Models and Principles] Systems and Information Theory \(em Information Theory
44.sp
45.ti -0.5i
46\fIGeneral terms:\fR  algorithms, performance
47.sp
48.ti -0.5i
49\fIAdditional key words and phrases:\fR  arithmetic coding, Huffman coding, adaptive modeling
50.ll+1i
51.in 0
52.bp
53.sh "Introduction"
54.pp
55Arithmetic coding is superior in most respects to the better-known Huffman
56(1952) method.
57.[
58Huffman 1952 method construction minimum-redundancy codes
59.]
60It represents information at least as compactly, sometimes considerably more
61so.
62Its performance is optimal without the need for blocking of input data.
63It encourages a clear separation between the model for representing data and
64the encoding of information with respect to that model.
65It accommodates adaptive models easily.
66It is computationally efficient.
67Yet many authors and practitioners seem unaware of the technique.
68Indeed there is a widespread belief that Huffman coding cannot be improved
69upon.
70.pp
71This paper aims to rectify the situation by presenting an accessible
72implementation of arithmetic coding, and detailing its performance
73characteristics.
74The next section briefly reviews basic concepts of data compression and
75introduces the model-based approach which underlies most modern techniques.
76We then outline the idea of arithmetic coding using a simple example.
Following that, programs are presented for both encoding and decoding, in which
78the model occupies a separate module so that different ones can easily be
79used.
80Next we discuss the construction of fixed and adaptive models.
81The subsequent section details the compression efficiency and execution time
82of the programs, including the effect of different arithmetic word lengths on
83compression efficiency.
84Finally, we outline a few applications where arithmetic coding is appropriate.
85.sh "Data compression"
86.pp
87To many, data compression conjures up an assortment of \fIad hoc\fR
88techniques such as converting spaces in text to tabs, creating special codes
89for common words, or run-length coding of picture data (eg see Held, 1984).
90.[
91Held 1984 data compression techniques applications
92.]
93This contrasts with the more modern model-based paradigm for
94coding, where from an \fIinput string\fR of symbols and a \fImodel\fR, an
95\fIencoded string\fR is produced which is (usually) a compressed version of
96the input.
97The decoder, which must have access to the same model,
98regenerates the exact input string from the encoded string.
99Input symbols are drawn from some well-defined set such as the ASCII or
100binary alphabets;
101the encoded string is a plain sequence of bits.
102The model is a way of calculating, in any given context, the distribution of
103probabilities for the next input symbol.
104It must be possible for the decoder to produce exactly the same probability
105distribution in the same context.
106Compression is achieved by transmitting the more probable symbols in fewer
107bits than the less probable ones.
108.pp
109For example, the model may assign a predetermined probability to each symbol
110in the ASCII alphabet.
111No context is involved.
112These probabilities may be determined by counting frequencies in
113representative samples of text to be transmitted.
114Such a \fIfixed\fR model is communicated in advance to both encoder and
115decoder, after which it is used for many messages.
116.pp
117Alternatively, the probabilities the model assigns may change as each symbol
118is transmitted, based on the symbol frequencies seen \fIso far\fR in this
119message.
120This is an \fIadaptive\fR model.
121There is no need for a representative sample of text, because each message
122is treated as an independent unit, starting from scratch.
123The encoder's model changes with each symbol transmitted, and the decoder's
124changes with each symbol received, in sympathy.
125.pp
126More complex models can provide more accurate probabilistic predictions and
127hence achieve greater compression.
128For example, several characters of previous context could condition the
129next-symbol probability.
Such methods have enabled mixed-case English text to be encoded in around
2.2\ bit/char with two quite different kinds of model
(Cleary & Witten, 1984b; Cormack & Horspool, 1985).
133.[
134Cleary Witten 1984 data compression
135%D 1984b
136.]
137.[
138Cormack Horspool 1985 dynamic Markov
139%O April
140.]
141Techniques which do not separate modeling from coding so distinctly, like
142that of Ziv & Lempel (1978), do not seem to show such great potential for
143compression, although they may be appropriate when the aim is raw speed rather
144than compression performance (Welch, 1984).
145.[
146Ziv Lempel 1978 compression of individual sequences
147.]
148.[
149Welch 1984 data compression
150.]
151.pp
152The effectiveness of any model can be measured by the \fIentropy\fR of the
153message with respect to it, usually expressed in bits/symbol.
154Shannon's fundamental theorem of coding states that given messages randomly
generated from a model, it is impossible to encode them into fewer bits
156(on average) than the entropy of that model (Shannon & Weaver, 1949).
157.[
158Shannon Weaver 1949
159.]
160.pp
161A message can be coded with respect to a model using either Huffman or
162arithmetic coding.
163The former method is frequently advocated as the best possible technique for
164reducing the encoded data rate.
165But it is not.
166Given that each symbol in the alphabet must translate into an integral number
167of bits in the encoding, Huffman coding indeed achieves ``minimum
168redundancy''.
169In other words, it performs optimally if all symbol probabilities are
170integral powers of 1/2.
171But this is not normally the case in practice;
172indeed, Huffman coding can take up to one extra bit per symbol.
173The worst case is realized by a source in which one symbol has probability
174approaching unity.
175Symbols emanating from such a source convey negligible information on average,
176but require at least one bit to transmit (Gallagher, 1978).
177.[
178Gallagher 1978 variations on a theme by Huffman
179.]
180Arithmetic coding dispenses with the restriction that each symbol translates
181into an integral number of bits, thereby coding more efficiently.
182It actually achieves the theoretical entropy bound to compression efficiency
183for any source, including the one just mentioned.
184.pp
185In general, sophisticated models expose the deficiencies of Huffman coding
186more starkly than simple ones.
187This is because they more often predict symbols with probabilities close to
188one, the worst case for Huffman coding.
189For example, the techniques mentioned above which code English text in
1902.2\ bit/char both use arithmetic coding as the final step, and performance
191would be impacted severely if Huffman coding were substituted.
192Nevertheless, since our topic is coding and not modeling, the illustrations in
193this paper all employ simple models.
194Even then, as we shall see, Huffman coding is inferior to arithmetic coding.
195.pp
196The basic concept of arithmetic coding can be traced back to Elias in the
197early 1960s (see Abramson, 1963, pp 61-62).
198.[
199Abramson 1963
200.]
201Practical techniques were first introduced by Rissanen (1976) and
202Pasco (1976), and developed further in Rissanen (1979).
203.[
204Rissanen 1976 Generalized Kraft Inequality
205.]
206.[
207Pasco 1976
208.]
209.[
210Rissanen 1979 number representations
211.]
212.[
213Langdon 1981 tutorial arithmetic coding
214.]
215Details of the implementation presented here have not appeared in the
216literature before; Rubin (1979) is closest to our approach.
217.[
218Rubin 1979 arithmetic stream coding
219.]
220The reader interested in the broader class of arithmetic codes is referred
221to Rissanen & Langdon (1979);
222.[
223Rissanen Langdon 1979 Arithmetic coding
224.]
225a tutorial is available in Langdon (1981).
226.[
227Langdon 1981 tutorial arithmetic coding
228.]
229Despite these publications, the method is not widely known.
230A number of recent books and papers on data compression mention it only in
231passing, or not at all.
232.sh "The idea of arithmetic coding"
233.pp
234In arithmetic coding a message is represented by an
235interval of real numbers between 0 and 1.
236As the message becomes longer, the interval needed to represent it becomes
237smaller, and the number of bits needed to specify that interval grows.
238Successive symbols of the message reduce the size of the
239interval in accordance with the symbol probabilities generated by the
240model.
241The more likely symbols reduce the range by less than the unlikely symbols,
242and hence add fewer bits to the message.
243.pp
244Before anything is transmitted, the range for the message is the entire
245interval [0,\ 1)\(dg.
246.FN
247\(dg [0,\ 1) denotes the half-open interval 0\(<=\fIx\fR<1.
248.EF
249As each symbol is processed, the range is narrowed to that portion of it
250allocated to the symbol.
251For example, suppose the alphabet is {\fIa,\ e,\ i,\ o,\ u,\ !\fR}, and a
252fixed model is used with probabilities shown in Table\ 1.
253Imagine transmitting the message \fIeaii!\fR.
254Initially, both encoder and decoder know that the range is [0,\ 1).
255After seeing the first symbol, \fIe\fR, the encoder narrows it to
256[0.2,\ 0.5), the range the model allocates to this symbol.
257The second symbol, \fIa\fR, will narrow this new range to the first 1/5 of it,
258since \fIa\fR has been allocated [0,\ 0.2).
259This produces [0.2,\ 0.26) since the previous range was 0.3 units long and
2601/5 of that is 0.06.
261The next symbol, \fIi\fR, is allocated [0.5,\ 0.6), which when applied to
262[0.2,\ 0.26) gives the smaller range [0.23,\ 0.236).
263Proceeding in this way, the encoded message builds up as follows:
264.LB
265.nf
266.ta \w'after seeing   'u +0.5i +\w'[0.23354, 'u
267initially		[0,	1)
268after seeing	\fIe\fR	[0.2,	0.5)
269	\fIa\fR	[0.2,	0.26)
270	\fIi\fR	[0.23,	0.236)
271	\fIi\fR	[0.233,	0.2336)
272	\fI!\fR	[0.23354,	0.2336)
273.fi
274.LE
275Figure\ 1 shows another representation of the encoding process.
276The vertical bars with ticks represent the symbol probabilities stipulated
277by the model.
278After the first symbol, \fIe\fR, has been processed, the model is scaled
279into the range [0.2,\ 0.5), as shown in part (a).
280The second symbol, \fIa\fR, scales it again into the range [0.2,\ 0.26).
281But the picture cannot be continued in this way without a magnifying glass!
282Consequently Figure\ 1(b) shows the ranges expanded to full height at every
283stage, and marked with a scale which gives the endpoints as numbers.
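.pp
The narrowing process is easily expressed in code.
The following fragment is a sketch only, not part of the programs of
Figures\ 3 and 4; it hard-codes the Table\ 1 model as a cumulative
probability array and uses double-precision arithmetic to reproduce the
intervals above.
.LB
.nf
/* Sketch: floating-point narrowing with the Table 1 model.              */
/* Symbols 0 to 5 stand for a, e, i, o, u, ! in that order.              */
#include <stdio.h>

static const double cum[7] = { 0.0, 0.2, 0.5, 0.6, 0.8, 0.9, 1.0 };

int main(void)
{   int message[5] = { 1, 0, 2, 2, 5 };        /* the message  e a i i !  */
    double low = 0.0, high = 1.0;
    int i;
    for (i = 0; i < 5; i++) {
        double range = high - low;
        high = low + range*cum[message[i]+1];  /* upper end of the symbol */
        low  = low + range*cum[message[i]];    /* lower end of the symbol */
        printf("[%g, %g)   ", low, high);      /* the intervals tabulated above */
    }
    return 0;
}
.fi
.LE
Note that $high$ must be computed before $low$ is changed, since both use
the old value of $low$.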
284.pp
285Suppose all the decoder knows about the message is the final range,
286[0.23354,\ 0.2336).
287It can immediately deduce that the first character was \fIe\fR, since the
288range lies entirely within the space the model of Table\ 1 allocates for
289\fIe\fR.
290Now it can simulate the operation of the \fIen\fR\^coder:
291.LB
292.nf
293.ta \w'after seeing   'u +0.5i +\w'[0.2, 'u
294initially		[0,	1)
295after seeing	\fIe\fR	[0.2,	0.5)
296.fi
297.LE
298This makes it clear that the second character of the message is \fIa\fR,
299since this will produce the range
300.LB
301.nf
302.ta \w'after seeing   'u +0.5i +\w'[0.2, 'u
303after seeing	\fIa\fR	[0.2,	0.26)
304.fi
305.LE
306which entirely encloses the given range [0.23354,\ 0.2336).
307Proceeding like this, the decoder can identify the whole message.
308.pp
309It is not really necessary for the decoder to know both ends of the range
310produced by the encoder.
311Instead, a single number within the range \(em for example, 0.23355 \(em will
312suffice.
313(Other numbers, like 0.23354, 0.23357, or even 0.23354321, would do just as
314well.)  \c
315However, the decoder will face the problem of detecting the end of the
316message, to determine when to stop decoding.
317After all, the single number 0.0 could represent any of \fIa\fR, \fIaa\fR,
318\fIaaa\fR, \fIaaaa\fR, ...\ .
319To resolve the ambiguity we ensure that each message ends with a special
320EOF symbol known to both encoder and decoder.
321For the alphabet of Table\ 1, \fI!\fR will be used to terminate messages,
322and only to terminate messages.
323When the decoder sees this symbol it stops decoding.
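.pp
To make the decoding process equally concrete, here is a companion sketch,
again not taken from Figure\ 3, which starts from the single number 0.23355
and the Table\ 1 model and recovers \fIeaii!\fR, stopping at the terminating
symbol as just described.
.LB
.nf
/* Sketch: decoding a single number with the Table 1 model.              */
#include <stdio.h>

static const double cum[7] = { 0.0, 0.2, 0.5, 0.6, 0.8, 0.9, 1.0 };
static const char letters[6] = { 'a', 'e', 'i', 'o', 'u', '!' };

int main(void)
{   double value = 0.23355, low = 0.0, high = 1.0;
    for (;;) {
        double range = high - low;
        int s = 0;                             /* find the symbol whose    */
        while (low + range*cum[s+1] <= value)  /* allocated range still    */
            s++;                               /* contains value           */
        putchar(letters[s]);                   /* prints  e a i i !        */
        if (letters[s] == '!')                 /* the terminating symbol   */
            break;
        high = low + range*cum[s+1];           /* narrow the interval just */
        low  = low + range*cum[s];             /* as the encoder did       */
    }
    return 0;
}
.fi
.LE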
324.pp
325Relative to the fixed model of Table\ 1, the entropy of the 5-symbol message
326\fIeaii!\fR is
327.LB
328$- ~ log ~ 0.3 ~ - ~ log ~ 0.2 ~ - ~ log ~ 0.1 ~ - ~ log ~ 0.1 ~ - ~ log ~ 0.1 ~~=~~ - ~ log ~ 0.00006 ~~ approx ~~ 4.22$
329.LE
330(using base 10, since the above encoding was performed in decimal).
331This explains why it takes 5\ decimal digits to encode the message.
332In fact, the size of the final range is $0.2336 ~-~ 0.23354 ~~=~~ 0.00006$,
333and the entropy is the negative logarithm of this figure.
334Of course, we normally work in binary, transmitting binary digits and
335measuring entropy in bits.
336.pp
337Five decimal digits seems a lot to encode a message comprising four vowels!
It is perhaps unfortunate that our example ended up expanding the message
rather than compressing it.
340Needless to say, however, different models will give different entropies.
341The best single-character model of the message \fIeaii!\fR is the set of
342symbol frequencies
343{\fIe\fR\ (0.2),  \fIa\fR\ (0.2),  \fIi\fR\ (0.4),  \fI!\fR\ (0.2)},
344which gives an entropy for \fIeaii!\fR of 2.89\ decimal digits.
345Using this model the encoding would be only 3\ digits long.
Moreover, as noted earlier, more sophisticated models give much better
347performance in general.
348.sh "A program for arithmetic coding"
349.pp
350Figure\ 2 shows a pseudo-code fragment which summarizes the encoding and
351decoding procedures developed in the last section.
352Symbols are numbered 1, 2, 3, ...
353The frequency range for the $i$th symbol is from $cum_freq[i]$ to
354$cum_freq[i-1]$.
355$cum_freq[i]$ increases as $i$ decreases, and $cum_freq[0] = 1$.
356(The reason for this ``backwards'' convention is that later, $cum_freq[0]$
357will contain a normalizing factor, and it will be convenient to have it
358begin the array.)  \c
359The ``current interval'' is [$low$,\ $high$); and for both encoding and
360decoding this should be initialized to [0,\ 1).
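.pp
In this notation, a single encoding step reduces to a few lines of code.
The following fragment is a paraphrase, not a copy of Figure\ 2, with
$low$, $high$, and $cum_freq[ \| ]$ taken to be fractions in [0,\ 1):
.LB
.nf
/* Sketch: one encoding step under the backwards cum_freq[] convention.  */
void encode_step(double *low, double *high,
                 const double cum_freq[], int symbol)
{
    double range = *high - *low;
    *high = *low + range*cum_freq[symbol-1];   /* the larger bound first, */
    *low  = *low + range*cum_freq[symbol];     /* then the smaller one    */
}
.fi
.LE
The decoder performs the same narrowing once it has identified the symbol
whose allocated range contains the received value.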
361.pp
362Unfortunately, Figure\ 2 is overly simplistic.
363In practice, there are several factors which complicate both encoding and
364decoding.
365.LB
366.NP
367Incremental transmission and reception.
368.br
369The encode algorithm as described does not transmit anything until the entire
370message has been encoded; neither does the decode algorithm begin decoding
371until it has received the complete transmission.
372In most applications, an incremental mode of operation is necessary.
373.sp
374.NP
375The desire to use integer arithmetic.
376.br
377The precision required to represent the [$low$, $high$) interval grows with
378the length of the message.
379Incremental operation will help overcome this, but the potential for overflow
380and underflow must still be examined carefully.
381.sp
382.NP
383Representing the model so that it can be consulted efficiently.
384.br
385The representation used for the model should minimize the time required for
386the decode algorithm to identify the next symbol.
387Moreover, an adaptive model should be organized to minimize the
388time-consuming task of maintaining cumulative frequencies.
389.LE
390.pp
391Figure\ 3 shows working code, in C, for arithmetic encoding and decoding.
392It is considerably more detailed than the bare-bones sketch of Figure\ 2!
393Implementations of two different models are given in Figure\ 4;
394the Figure\ 3 code can use either one.
395.pp
396The remainder of this section examines the code of Figure\ 3 more closely,
397including a proof that decoding is still correct in the integer
398implementation and a review of constraints on word lengths in the program.
399.rh "Representing the model."
400Implementations of models are discussed in the next section; here we
401are concerned only with the interface to the model (lines 20-38).
402In C, a byte is represented as an integer between 0 and 255 (call this a
403$char$).
404Internally, we represent a byte as an integer between 1 and 257 inclusive
405(call this an $index$), EOF being treated as a 257th symbol.
406It is advantageous to sort the model into frequency order, to minimize the
407number of executions of the decoding loop (line 189).
408To permit such reordering, the $char$/$index$ translation is implemented as
409a pair of tables, $index_to_char[ \| ]$ and $char_to_index[ \| ]$.
410In one of our models, these tables simply form the $index$ by adding 1 to the
411$char$, but another implements a more complex translation which assigns small
412indexes to frequently-used symbols.
413.pp
414The probabilities in the model are represented as integer frequency counts,
415and cumulative counts are stored in the array $cum_freq[ \| ]$.
As before, this array is ``backwards,'' and the total frequency count \(em
417which is used to normalize all frequencies \(em appears in $cum_freq[0]$.
418Cumulative counts must not exceed a predetermined maximum, $Max_frequency$,
419and the model implementation must prevent overflow by scaling appropriately.
420It must also ensure that neighboring values in the $cum_freq[ \| ]$ array
421differ by at least 1; otherwise the affected symbol could not be transmitted.
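.pp
For concreteness, the interface just described amounts to declarations along
the following lines.
This is a paraphrase rather than a copy of lines 20-38; the names follow the
text, and the constants are chosen to be consistent with the conventions
given above.
.LB
.nf
#define No_of_chars   256                  /* number of character symbols   */
#define EOF_symbol    (No_of_chars + 1)    /* index of the EOF symbol       */
#define No_of_symbols (No_of_chars + 1)    /* total number of symbols       */
#define Max_frequency 16383                /* 2^14 - 1, limit on cum_freq[0]*/

extern int char_to_index[No_of_chars];                 /* char to index     */
extern unsigned char index_to_char[No_of_symbols+1];   /* index to char     */
extern int cum_freq[No_of_symbols+1];      /* backwards cumulative counts   */
.fi
.LE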
422.rh "Incremental transmission and reception."
423Unlike Figure\ 2, the program of Figure\ 3 represents $low$ and $high$ as
424integers.
425A special data type, $code_value$, is defined for these quantities,
426together with some useful constants:  \c
427$Top_value$, representing the largest possible $code_value$, and
428$First_qtr$, $Half$, and $Third_qtr$, representing parts of the range
429(lines 6-16).
430Whereas in Figure\ 2 the current interval is represented by
431[$low$,\ $high$), in Figure\ 3 it is [$low$,\ $high$]; that is, the range now
432includes the value of $high$.
433Actually, it is more accurate (though more confusing) to say that in the
434program of Figure\ 3 the interval represented is
435[$low$,\ $high + 0.11111 ...$).
436This is because when the bounds are scaled up to increase the precision, 0's
437are shifted into the low-order bits of $low$ but 1's are shifted into $high$.
438While it is possible to write the program to use a different convention,
439this one has some advantages in simplifying the code.
440.pp
441As the code range narrows, the top bits of $low$ and $high$ will become the
442same.
443Any bits which are the same can be transmitted immediately, since they cannot
444be affected by future narrowing.
445For encoding, since we know that $low ~ <= ~ high$, this requires code like
446.LB "nnnn"
447.nf
448.ta \w'nnnn'u +\w'if (high < 'u +\w'Half) { 'u +\w'output_bit(1); low = 2*(low\-Half); high = 2*(high\-Half)+1; 'u
449.ne 4
while (high < Half  ||  low $>=$ Half) {
451	if (high <	Half) {	output_bit(0); low = 2*low; high = 2*high+1;	}
452	if (low $>=$	Half) {	output_bit(1); low = 2*(low\-Half); high = 2*(high\-Half)+1;	}
453}
454.fi
455.LE "nnnn"
456which ensures that, upon completion, $low ~ < ~ Half ~ <= ~ high$.
457This can be found in lines 95-113 of $encode_symbol( \| )$,
458although there are some extra complications caused by underflow possibilities
459(see next subsection).
Care is taken to shift 1's in at the bottom when $high$ is scaled, as
461noted above.
462.pp
463Incremental reception is done using a number called $value$ as in Figure\ 2,
464in which processed bits flow out the top (high-significance) end and
465newly-received ones flow in the bottom.
466$start_decoding( \| )$ (lines 168-176) fills $value$ with received bits initially.
467Once $decode_symbol( \| )$ has identified the next input symbol, it shifts out
468now-useless high-order bits which are the same in $low$ and $high$, shifting
469$value$ by the same amount (and replacing lost bits by fresh input bits at the
470bottom end):
471.LB "nnnn"
472.nf
473.ta \w'nnnn'u +\w'if (high < 'u +\w'Half) { 'u +\w'value = 2*(value\-Half)+input_bit(\|); low = 2*(low\-Half); high = 2*(high\-Half)+1; 'u
474.ne 4
while (high < Half  ||  low $>=$ Half) {
476	if (high <	Half) {	value = 2*value+input_bit(\|); low = 2*low; high = 2*high+1;	}
477	if (low $>=$	Half) {	value = 2*(value\-Half)+input_bit(\|); low = 2*(low\-Half); high = 2*(high\-Half)+1;	}
478}
479.fi
480.LE "nnnn"
481(see lines 194-213, again complicated by precautions against underflow
482discussed below).
483.rh "Proof of decoding correctness."
484At this point it is worth checking that the identification of the next
485symbol by $decode_symbol( \| )$ works properly.
486Recall from Figure\ 2 that $decode_symbol( \| )$ must use $value$ to find that
487symbol which, when encoded, reduces the range to one that still includes
488$value$.
489Lines 186-188 in $decode_symbol( \| )$ identify the symbol for which
490.LB
491$cum_freq[symbol] ~ <= ~~
492left f {(value-low+1)*cum_freq[0] ~-~ 1} over {high-low+1} right f
493~~ < ~ cum_freq[symbol-1]$,
494.LE
495where $left f ~ right f$ denotes the ``integer part of'' function that
496comes from integer division with truncation.
497It is shown in the Appendix that this implies
498.LB "nnnn"
499$low ~+~~ left f {(high-low+1)*cum_freq[symbol]} over cum_freq[0] right f
~~ <= ~ value ~ <=  ~~
501low ~+~~ left f {(high-low+1)*cum_freq[symbol-1]} over cum_freq[0] right f ~~ - ~ 1$,
502.LE "nnnn"
503so $value$ lies within the new interval that $decode_symbol( \| )$
504calculates in lines 190-193.
505This is sufficient to guarantee that the decoding operation identifies each
506symbol correctly.
507.rh "Underflow."
508As Figure\ 1 shows, arithmetic coding works by scaling the cumulative
509probabilities given by the model into the interval [$low$,\ $high$] for
510each character transmitted.
511Suppose $low$ and $high$ are very close together, so close that this scaling
512operation maps some different symbols of the model on to the same integer in
513the [$low$,\ $high$] interval.
514This would be disastrous, because if such a symbol actually occurred it would
515not be possible to continue encoding.
516Consequently, the encoder must guarantee that the interval [$low$,\ $high$] is
517always large enough to prevent this.
518The simplest way is to ensure that this interval is at least as large as
519$Max_frequency$, the maximum allowed cumulative frequency count (line\ 36).
520.pp
521How could this condition be violated?
522The bit-shifting operation explained above ensures that $low$ and $high$ can
523only become close together when they straddle $Half$.
524Suppose in fact they become as close as
525.LB
526$First_qtr ~ <= ~ low ~<~ Half ~ <= ~ high ~<~ Third_qtr$.
527.LE
528Then the next two bits sent will have opposite polarity, either 01 or 10.
529For example, if the next bit turns out to be 0 (ie $high$ descends below
530$Half$ and [0,\ $Half$] is expanded to the full interval) the bit after
531that will be 1 since the range has to be above the midpoint of the expanded
532interval.
533Conversely if the next bit happens to be 1 the one after that will be 0.
534Therefore the interval can safely be expanded right now, if only we remember
535that whatever bit actually comes next, its opposite must be transmitted
536afterwards as well.
537Thus lines 104-109 expand [$First_qtr$,\ $Third_qtr$] into the whole interval,
538remembering in $bits_to_follow$ that the bit that is output next must be
539followed by an opposite bit.
540This explains why all output is done via $bit_plus_follow( \| )$
541(lines 128-135) instead of directly with $output_bit( \| )$.
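.pp
The idea behind $bit_plus_follow( \| )$ is small enough to sketch here.
The fragment below is a paraphrase rather than a copy of lines 128-135, and a
trivial $output_bit( \| )$ stub is supplied only so that it stands alone.
.LB
.nf
#include <stdio.h>

static int bits_to_follow = 0;        /* number of opposite bits pending    */

static void output_bit(int bit)       /* stub; the real routine buffers     */
{   putchar(bit ? '1' : '0');   }     /* bits into bytes                    */

static void bit_plus_follow(int bit)
{
    output_bit(bit);                  /* output the bit itself, then any    */
    while (bits_to_follow > 0) {      /* opposite bits saved up while the   */
        output_bit(!bit);             /* range straddled Half               */
        bits_to_follow -= 1;
    }
}
.fi
.LE
If three expansions have been remembered, for example, a call with a 0 bit
emits 0111 and a call with a 1 bit emits 1000, as required.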
542.pp
543But what if, after this operation, it is \fIstill\fR true that
544.LB
545$First_qtr ~ <= ~ low ~<~ Half ~ <= ~ high ~<~ Third_qtr$?
546.LE
547Figure\ 5 illustrates this situation, where the current [$low$,\ $high$]
548range (shown as a thick line) has been expanded a total of three times.
549Suppose the next bit will turn out to be 0, as indicated by the arrow in
550Figure\ 5(a) being below the halfway point.
551Then the next \fIthree\fR bits will be 1's, since not only is the arrow in the
552top half of the bottom half of the original range, it is in the top quarter,
553and moreover the top eighth, of that half \(em that is why the expansion
554can occur three times.
555Similarly, as Figure\ 5(b) shows, if the next bit turns out to be a 1 it will
556be followed by three 0's.
557Consequently we need only count the number of expansions and follow the next
558bit by that number of opposites (lines 106 and 131-134).
559.pp
560Using this technique, the encoder can guarantee that after the shifting
561operations, either
562.LB
563.ta \n(.lu-\n(.iuR
564$low ~<~ First_qtr ~<~ Half ~ <= ~ high$	(1a)
565.LE
566or
567.LB
568.ta \n(.lu-\n(.iuR
569$low ~<~ Half ~<~ Third_qtr ~ <= ~ high$.	(1b)
570.LE
571Therefore as long as the integer range spanned by the cumulative frequencies
572fits into a quarter of that provided by $code_value$s, the underflow problem
573cannot occur.
574This corresponds to the condition
575.LB
576$Max_frequency ~ <= ~~ {Top_value+1} over 4 ~~ + ~ 1$,
577.LE
578which is satisfied by Figure\ 3 since $Max_frequency ~=~ 2 sup 14 - 1$ and
579$Top_value ~=~ 2 sup 16 - 1$ (lines\ 36, 9).
580More than 14\ bits cannot be used to represent cumulative frequency
581counts without increasing the number of bits allocated to $code_value$s.
582.pp
583We have discussed underflow in the encoder only.
584Since the decoder's job, once each symbol has been decoded, is to track the
585operation of the encoder, underflow will be avoided if it performs the same
586expansion operation under the same conditions.
587.rh "Overflow."
588Now consider the possibility of overflow in the integer multiplications
589corresponding to those of Figure\ 2, which occur in lines 91-94 and 190-193
590of Figure\ 3.
591Overflow cannot occur provided the product
592.LB
593$range * Max_frequency$
594.LE
595fits within the integer word length available, since cumulative frequencies
596cannot exceed $Max_frequency$.
597$Range$ might be as large as $Top_value ~+~1$, so the largest possible product
598in Figure 3 is $2 sup 16 ( 2 sup 14 - 1 )$ which is less than $2 sup 30$.
599$Long$ declarations are used for $code_value$ (line\ 7) and $range$
600(lines\ 89, 183) to ensure that arithmetic is done to 32-bit precision.
601.rh "Constraints on the implementation."
602The constraints on word length imposed by underflow and overflow can
603be simplified by assuming frequency counts are represented in $f$\ bits, and
604$code_value$s in $c$\ bits.
605The implementation will work correctly provided
606.LB
607$f ~ <= ~ c ~ - ~2$
608.br
609$f ~+~ c ~ <= ~ p$, the precision to which arithmetic is performed.
610.LE
611In most C implementations, $p=31$ if $long$ integers are used, and $p=32$
612if they are $unsigned ~ long$.
613In Figure\ 3, $f=14$ and $c=16$.
614With appropriately modified declarations, $unsigned ~ long$ arithmetic with
615$f=15$, $c=17$ could be used.
616In assembly language $c=16$ is a natural choice because it expedites some
617comparisons and bit manipulations (eg those of lines\ 95-113 and 194-213).
618.pp
619If $p$ is restricted to 16\ bits, the best values possible are $c=9$ and
620$f=7$, making it impossible to encode a full alphabet of 256\ symbols, as each
621symbol must have a count of at least 1.
622A smaller alphabet (eg the letters, or 4-bit nibbles) could still be
623handled.
624.rh "Termination."
625To finish the transmission, it is necessary to send a unique terminating
626symbol ($EOF_symbol$, line 56) and then follow it by enough bits to ensure
627that the encoded string falls within the final range.
628Since $done_encoding( \| )$ (lines 119-123) can be sure that
629$low$ and $high$ are constrained by either (1a) or (1b) above, it need only
630transmit $01$ in the first case or $10$ in the second to remove the remaining
631ambiguity.
632It is convenient to do this using the $bit_plus_follow( \| )$ procedure
633discussed earlier.
634The $input_bit( \| )$ procedure will actually read a few more bits than were
635sent by $output_bit( \| )$ as it needs to keep the low end of the buffer full.
636It does not matter what value these bits have as the EOF is uniquely
637determined by the last two bits actually transmitted.
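.pp
The finishing step itself can be sketched in a few lines.
The fragment below paraphrases what $done_encoding( \| )$ must do; the
constants are defined to be consistent with lines 6-16, and $low$,
$bits_to_follow$, and $bit_plus_follow( \| )$ are assumed to be those of the
encoder.
.LB
.nf
#define Code_value_bits 16                    /* bits in a code value       */
#define Top_value (((long) 1 << Code_value_bits) - 1)  /* largest code value*/
#define First_qtr (Top_value/4 + 1)           /* point after first quarter  */

extern long low;                         /* low end of the current range    */
extern int  bits_to_follow;              /* opposite bits still pending     */
extern void bit_plus_follow(int bit);

void done_encoding(void)                 /* sketch of the finishing step    */
{
    bits_to_follow += 1;                 /* one more bit, plus its opposite,*/
    if (low < First_qtr)                 /* selects a quarter lying wholly  */
        bit_plus_follow(0);              /* inside the final range:         */
    else                                 /* 01 in case (1a), 10 in case (1b)*/
        bit_plus_follow(1);
}
.fi
.LE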
638.sh "Models for arithmetic coding"
639.pp
640The program of Figure\ 3 must be used with a model which provides
641a pair of translation tables $index_to_char[ \| ]$ and $char_to_index[ \| ]$,
642and a cumulative frequency array $cum_freq[ \| ]$.
643The requirements on the latter are that
644.LB
645.NP
646$cum_freq[ i-1 ] ~ >= ~ cum_freq[ i ]$;
647.NP
648an attempt is never made to encode a symbol $i$ for which
649$cum_freq[i-1] ~=~ cum_freq[i]$;
650.NP
651$cum_freq[0] ~ <= ~ Max_frequency$.
652.LE
653Provided these conditions are satisfied the values in the array need bear
654no relationship to the actual cumulative symbol frequencies in messages.
655Encoding and decoding will still work correctly, although encodings will
656occupy less space if the frequencies are accurate.
(Recall that we successfully encoded \fIeaii!\fR according to the model of
658Table\ 1, which does not actually reflect the frequencies in the message.)  \c
659.rh "Fixed models."
660The simplest kind of model is one in which symbol frequencies are fixed.
661The first model in Figure\ 4 has symbol frequencies which approximate those
662of English (taken from a part of the Brown Corpus, Kucera & Francis, 1967).
663.[
664%A Kucera, H.
665%A Francis, W.N.
666%D 1967
667%T Computational analysis of present-day American English
668%I Brown University Press
669%C Providence, RI
670.]
671However, bytes which did not occur in that sample have been given frequency
672counts of 1 in case they do occur in messages to be encoded
673(so, for example, this model will still work for binary files in which all
674256\ bytes occur).
675Frequencies have been normalized to total 8000.
676The initialization procedure $start_model( \| )$ simply computes a cumulative
677version of these frequencies (lines 48-51), having first initialized the
678translation tables (lines 44-47).
679Execution speed would be improved if these tables were used to re-order
680symbols and frequencies so that the most frequent came first in the
681$cum_freq[ \| ]$ array.
682Since the model is fixed, the procedure $update_model( \| )$, which is
683called from both $encode.c$ and $decode.c$, is null.
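.pp
The cumulative computation itself is brief.
The following sketch, which is not a copy of lines 48-51, builds the
``backwards'' array from a table of counts:
.LB
.nf
/* Sketch: build the backwards cumulative array from frequency counts.   */
void build_cum_freq(const int freq[], int cum_freq[], int no_of_symbols)
{
    int i, cum = 0;
    for (i = no_of_symbols; i >= 1; i--) {   /* cum_freq[i] is the total   */
        cum_freq[i] = cum;                   /* count of symbols numbered  */
        cum += freq[i];                      /* above i                    */
    }
    cum_freq[0] = cum;                       /* the normalizing factor     */
}
.fi
.LE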
684.pp
685An \fIexact\fR model is one where the symbol frequencies in the message are
686exactly as prescribed by the model.
687For example, the fixed model of Figure\ 4 is close to an exact model
688for the particular excerpt of the Brown Corpus from which it was taken.
689To be truly exact, however, symbols that did not occur in the excerpt would
690be assigned counts of 0, not 1 (sacrificing the capability of
691transmitting messages containing those symbols).
692Moreover, the frequency counts would not be scaled to a predetermined
693cumulative frequency, as they have been in Figure\ 4.
694The exact model may be calculated and transmitted before sending the message.
695It is shown in Cleary & Witten (1984a) that, under quite general conditions,
696this will \fInot\fR give better overall compression than adaptive coding,
697described next.
698.[
699Cleary Witten 1984 enumerative adaptive codes
700%D 1984a
701.]
702.rh "Adaptive models."
703An adaptive model represents the changing symbol frequencies seen \fIso far\fR
704in the message.
705Initially all counts might be the same (reflecting no initial information),
706but they are updated as each symbol is seen, to approximate the observed
707frequencies.
708Provided both encoder and decoder use the same initial values (eg equal
709counts) and the same updating algorithm, their models will remain in step.
710The encoder receives the next symbol, encodes it, and updates its model.
711The decoder identifies it according to its current model, and then updates its
712model.
713.pp
714The second half of Figure\ 4 shows such an adaptive model.
715This is the type of model recommended for use with Figure\ 3, for in practice
716it will outperform a fixed model in terms of compression efficiency.
717Initialization is the same as for the fixed model, except that all frequencies
718are set to 1.
719The procedure $update_model(symbol)$ is called by both $encode_symbol( \| )$
720and $decode_symbol( \| )$ (Figure\ 3 lines 54 and 151) after each symbol is
721processed.
722.pp
723Updating the model is quite expensive, because of the need to maintain
724cumulative totals.
725In the code of Figure\ 4, frequency counts, which must be maintained anyway,
726are used to optimize access by keeping the array in frequency order \(em an
727effective kind of self-organizing linear search (Hester & Hirschberg, 1985).
728.[
729Hester Hirschberg 1985
730.]
731$Update_model( \| )$ first checks to see if the new model will exceed
732the cumulative-frequency limit, and if so scales all frequencies down by a
733factor of 2 (taking care to ensure that no count scales to zero) and
734recomputes cumulative values (Figure\ 4, lines\ 29-37).
735Then, if necessary, $update_model( \| )$ re-orders the symbols to place the
736current one in its correct rank in the frequency ordering, altering the
737translation tables to reflect the change.
738Finally, it increments the appropriate frequency count and adjusts cumulative
739frequencies accordingly.
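.pp
The essentials of the update can be sketched as follows.
This fragment is a simplification, not the code of Figure\ 4: it keeps the
same ``backwards'' $cum_freq[ \| ]$ convention and the same halving rule, but
omits the frequency re-ordering and translation-table adjustment described
above.
.LB
.nf
#define No_of_symbols 257                  /* 256 characters plus EOF        */
#define Max_frequency 16383                /* 2^14 - 1, as in Figure 3       */

int freq[No_of_symbols + 1];               /* individual symbol counts       */
int cum_freq[No_of_symbols + 1];           /* backwards cumulative counts    */

/* Sketch of an adaptive update without the re-ordering of Figure 4.        */
void update_model(int symbol)
{   int i;
    if (cum_freq[0] == Max_frequency) {    /* at the limit, halve all counts */
        int cum = 0;                       /* (a count of 1 stays at 1) and  */
        for (i = No_of_symbols; i >= 0; i--) {   /* rebuild the cumulatives  */
            freq[i] = (freq[i] + 1) / 2;
            cum_freq[i] = cum;
            cum += freq[i];
        }
    }
    freq[symbol] += 1;                     /* count the symbol just seen and */
    for (i = symbol - 1; i >= 0; i--)      /* bump every cumulative count    */
        cum_freq[i] += 1;                  /* that includes it               */
}
.fi
.LE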
740.sh "Performance"
741.pp
742Now consider the performance of the algorithm of Figure\ 3, both
743in compression efficiency and execution time.
744.rh "Compression efficiency."
745In principle, when a message is coded using arithmetic coding, the number of
746bits in the encoded string is the same as the entropy of that message with
747respect to the model used for coding.
748Three factors cause performance to be worse than this in practice:
749.LB
750.NP
751message termination overhead
752.NP
the use of fixed-length rather than infinite-precision arithmetic
754.NP
755scaling of counts so that their total is at most $Max_frequency$.
756.LE
757None of these effects is significant, as we now show.
In order to isolate the effect of arithmetic coding, the model will be
759considered to be exact (as defined above).
760.pp
761Arithmetic coding must send extra bits at the end of each message, causing a
762message termination overhead.
763Two bits are needed, sent by $done_encoding( \| )$ (Figure\ 3 lines 119-123),
764in order to disambiguate the final symbol.
765In cases where a bit-stream must be blocked into 8-bit characters before
766encoding, it will be necessary to round out to the end of a block.
767Combining these, an extra 9\ bits may be required.
768.pp
769The overhead of using fixed-length arithmetic
770occurs because remainders are truncated on division.
771It can be assessed by comparing the algorithm's performance with
772the figure obtained from a theoretical entropy calculation
773which derives its frequencies from counts scaled exactly as for coding.
774It is completely negligible \(em on the order of $10 sup -4$ bits/symbol.
775.pp
776The penalty paid by scaling counts is somewhat larger, but still very
777small.
778For short messages (less than $2 sup 14$ bytes) no scaling need be done.
779Even with messages of $10 sup 5$ to $10 sup 6$ bytes, the overhead was found
780experimentally to be less than 0.25% of the encoded string.
781.pp
782The adaptive model of Figure\ 4 scales down all counts whenever the total
783threatens to exceed $Max_frequency$.
This has the effect of weighting recent events more heavily than those
earlier in the message.
786The statistics thus tend to track changes in the input sequence, which can be
787very beneficial.
788(For example, we have encountered cases where limiting counts to 6 or 7\ bits
789gives better results than working to higher precision.)  \c
790Of course, this depends on the source being modeled.
791Bentley \fIet al\fR (1986) consider other, more explicit, ways of
792incorporating a recency effect.
793.[
794Bentley Sleator Tarjan Wei 1986 locally adaptive
795%J Communications of the ACM
796.]
797.rh "Execution time."
798The program in Figure\ 3 has been written for clarity, not execution speed.
799In fact, with the adaptive model of Figure\ 4, it takes about 420\ $mu$s per
800input byte on a VAX-11/780 to encode a text file, and about the same for
801decoding.
802However, easily avoidable overheads such as procedure calls account for much
803of this, and some simple optimizations increase speed by a factor of 2.
804The following alterations were made to the C version shown:
805.LB
806.NP
807the procedures $input_bit( \| )$, $output_bit( \| )$, and
808$bit_plus_follow( \| )$ were converted to macros to eliminate
809procedure-call overhead;
810.NP
811frequently-used quantities were put in register variables;
812.NP
813multiplies by two were replaced by additions (C ``+='');
814.NP
815array indexing was replaced by pointer manipulation in the loops
816at line 189 of Figure\ 3 and lines 49-52 of the adaptive model in Figure\ 4.
817.LE
818.pp
819This mildly-optimized C implementation has an execution time of
214\ $mu$s/262\ $mu$s per input byte
821for encoding/decoding 100,000\ bytes of English text on a VAX-11/780, as shown
822in Table\ 2.
823Also given are corresponding figures for the same program on an
824Apple Macintosh and a SUN-3/75.
825As can be seen, coding a C source program of the same length took slightly
826longer in all cases, and a binary object program longer still.
827The reason for this will be discussed shortly.
828Two artificial test files were included to allow readers to replicate the
829results.
830``Alphabet'' consists of enough copies of the 26-letter alphabet to fill
831out 100,000\ characters (ending with a partially-completed alphabet).
832``Skew-statistics'' contains 10,000 copies of the string
833\fIaaaabaaaac\fR\^; it demonstrates that files may be encoded into less than
8341\ bit per character (output size of 12,092\ bytes = 96,736\ bits).
835All results quoted used the adaptive model of Figure\ 4.
836.pp
837A further factor of 2 can be gained by reprogramming in assembly language.
838A carefully optimized version of Figures\ 3 and 4 (adaptive model) was
839written in both VAX and M68000 assembly language.
840Full use was made of registers.
841Advantage was taken of the 16-bit $code_value$ to expedite some crucial
842comparisons and make subtractions of $Half$ trivial.
843The performance of these implementations on the test files is also shown in
844Table\ 2 in order to give the reader some idea of typical execution speeds.
845.pp
846The VAX-11/780 assembly language timings are broken down in Table\ 3.
847These figures were obtained with the U\s-2NIX\s+2 profile facility and are
848accurate only to within perhaps 10%\(dg.
849.FN
850\(dg This mechanism constructs a histogram of program counter values at
851real-time clock interrupts, and suffers from statistical variation as well as
852some systematic errors.
853.EF
854``Bounds calculation'' refers to the initial part of $encode_symbol( \| )$
855and $decode_symbol( \| )$ (Figure\ 3 lines 90-94 and 190-193)
856which contain multiply and divide operations.
857``Bit shifting'' is the major loop in both the encode and decode routines
858(lines 95-113 and 194-213).
``Symbol decode'' covers the $cum$ calculation in $decode_symbol( \| )$,
which requires a multiply/divide, and the following loop to identify the
next symbol (lines\ 187-189).
862Finally, ``Model update'' refers to the adaptive
863$update_model( \| )$ procedure of Figure\ 4 (lines\ 26-53).
864.pp
865As expected, the bounds calculation and model update take the same time for
866both encoding and decoding, within experimental error.
867Bit shifting was quicker for the text file than for the C program and object
868file because compression performance was better.
869The extra time for decoding over encoding is due entirely to the symbol
870decode step.
871This takes longer in the C program and object file tests because the loop of
872line\ 189 was executed more often (on average 9\ times, 13\ times, and
87335\ times respectively).
This also affects the model update time, because the same number of
cumulative counts must be incremented in Figure\ 4 lines\ 49-52.
876In the worst case, when the symbol frequencies are uniformly distributed,
877these loops are executed an average of 128 times.
878Worst-case performance would be improved by using a more complex tree
879representation for frequencies, but this would likely be slower for text
880files.
881.sh "Some applications"
882.pp
883Applications of arithmetic coding are legion.
884By liberating \fIcoding\fR with respect to a model from the \fImodeling\fR
885required for prediction, it encourages a whole new view of data compression
886(Rissanen & Langdon, 1981).
887.[
888Rissanen Langdon 1981 Universal modeling and coding
889.]
890This separation of function costs nothing in compression performance, since
891arithmetic coding is (practically) optimal with respect to the entropy of
892the model.
893Here we intend to do no more than suggest the scope of this view
894by briefly considering
895.LB
896.NP
897adaptive text compression
898.NP
899non-adaptive coding
900.NP
901compressing black/white images
902.NP
903coding arbitrarily-distributed integers.
904.LE
905Of course, as noted earlier, greater coding efficiencies could easily be
906achieved with more sophisticated models.
907Modeling, however, is an extensive topic in its own right and is beyond the
908scope of this paper.
909.pp
910.ul
911Adaptive text compression
912using single-character adaptive frequencies shows off arithmetic coding to
913good effect.
914The results obtained using the program of Figures\ 3 and 4 vary from
9154.8\-5.3\ bit/char for short English text files ($10 sup 3$\ to $10 sup 4$
916bytes) to 4.5\-4.7\ bit/char for long ones ($10 sup 5$ to $10 sup 6$ bytes).
917Although adaptive Huffman techniques do exist (eg Gallagher, 1978;
918Cormack & Horspool, 1984) they lack the conceptual simplicity of
919arithmetic coding.
920.[
921Gallagher 1978 variations on a theme by Huffman
922.]
923.[
924Cormack Horspool 1984 adaptive Huffman codes
925.]
926While competitive in compression efficiency for many files, they are slower.
927For example, Table\ 4 compares the performance of the mildly-optimized C
928implementation of arithmetic coding with that of the U\s-2NIX\s+2
929\fIcompact\fR program which implements adaptive Huffman coding using
930a similar model\(dg.
931.FN
932\(dg \fICompact\fR's model is essentially the same for long files (like those
933of Table\ 4) but is better for short files than the model used as an example
934in this paper.
935.EF
936Casual examination of \fIcompact\fR indicates that the care taken in
937optimization is roughly comparable for both systems, yet arithmetic coding
938halves execution time.
939Compression performance is somewhat better with arithmetic coding on all the
940example files.
941The difference would be accentuated with more sophisticated models that
942predict symbols with probabilities approaching one under certain circumstances
943(eg letter ``u'' following ``q'').
944.pp
945.ul
946Non-adaptive coding
947can be performed arithmetically using fixed, pre-specified models like that in
948the first part of Figure\ 4.
Compression performance will be better than with Huffman coding.
950In order to minimize execution time, the total frequency count,
951$cum_freq[0]$, should be chosen as a power of two so the divisions
952in the bounds calculations (Figure\ 3 lines 91-94 and 190-193) can be done
953as shifts.
954Encode/decode times of around 60\ $mu$s/90\ $mu$s should then be possible
955for an assembly language implementation on a VAX-11/780.
956A carefully-written implementation of Huffman coding, using table look-up for
957encoding and decoding, would be a bit faster in this application.
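.pp
To make the point concrete, the following sketch shows the bounds
calculation with the division replaced by a shift, on the assumption that
the model is built so that $cum_freq[0]$ is exactly $2 sup 13$; the
variables are otherwise those described for Figure\ 3.
.LB
.nf
#define Cum_freq_shift 13                  /* cum_freq[0] is 1 << 13         */

/* Sketch: narrowing the integer range when the total count is a power of 2. */
void narrow_range(long *low, long *high, const int cum_freq[], int symbol)
{
    long range = *high - *low + 1;
    *high = *low + ((range * cum_freq[symbol-1]) >> Cum_freq_shift) - 1;
    *low  = *low + ((range * cum_freq[symbol])   >> Cum_freq_shift);
}
.fi
.LE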
958.pp
959.ul
960Compressing black/white images
961using arithmetic coding has been investigated by Langdon & Rissanen (1981),
962who achieved excellent results using a model which conditioned the probability
963of a pixel's being black on a template of pixels surrounding it.
964.[
965Langdon Rissanen 1981 compression of black-white images
966.]
967The template contained a total of ten pixels, selected from those above and
968to the left of the current one so that they precede it in the raster scan.
This created 1024 different possible contexts, and for each the probability of
970the pixel being black was estimated adaptively as the picture was transmitted.
971Each pixel's polarity was then coded arithmetically according to this
972probability.
973A 20%\-30% improvement in compression was attained over earlier methods.
974To increase coding speed Langdon & Rissanen used an approximate method
975of arithmetic coding which avoided multiplication by representing
976probabilities as integer powers of 1/2.
977Huffman coding cannot be directly used in this application, as it never
978compresses with a two-symbol alphabet.
979Run-length coding, a popular method for use with two-valued alphabets,
980provides another opportunity for arithmetic coding.
981The model reduces the data to a sequence of lengths of runs of the same symbol
982(eg for picture coding, run-lengths of black followed by white followed by
983black followed by white ...).
984The sequence of lengths must be transmitted.
985The CCITT facsimile coding standard (Hunter & Robinson, 1980), for example,
986bases a Huffman code on the frequencies with which black and white runs of
987different lengths occur in sample documents.
988.[
989Hunter Robinson 1980 facsimile
990.]
991A fixed arithmetic code using these same frequencies would give better
992performance; adapting the frequencies to each particular document would be
993better still.
994.pp
995.ul
996Coding arbitrarily-distributed integers
997is often called for when using more sophisticated models of text, image,
998or other data.
999Consider, for instance, Bentley \fIet al\fR's (1986) locally-adaptive data
1000compression scheme, in which the encoder and decoder cache the last $N$
1001different words seen.
1002.[
1003Bentley Sleator Tarjan Wei 1986 locally adaptive
1004%J Communications of the ACM
1005.]
1006A word present in the cache is transmitted by sending the integer cache index.
1007Words not in the cache are transmitted by sending a new-word marker followed
1008by the characters of the word.
1009This is an excellent model for text in which words are used frequently over
1010short intervals and then fall into long periods of disuse.
1011Their paper discusses several variable-length codings for the integers used
1012as cache indexes.
1013Arithmetic coding allows \fIany\fR probability distribution to be used as the
1014basis for a variable-length encoding, including \(em amongst countless others
1015\(em the ones implied by the particular codes discussed there.
1016It also permits use of an adaptive model for cache
1017indexes, which is desirable if the distribution of cache hits is
1018difficult to predict in advance.
1019Furthermore, with arithmetic coding, the code space allotted to the cache
1020indexes can be scaled down to accommodate any desired probability for the
1021new-word marker.
1022.sh "Acknowledgement"
1023.pp
1024Financial support for this work has been provided by the
1025Natural Sciences and Engineering Research Council of Canada.
1026.sh "References"
1027.sp
1028.in+4n
1029.[
1030$LIST$
1031.]
1032.in 0
1033.bp
1034.sh "APPENDIX: Proof of decoding inequality"
1035.sp
1036Using 1-letter abbreviations for $cum_freq$, $symbol$, $low$, $high$, and
1037$value$, suppose
1038.LB
1039$c[s] ~ <= ~~ left f {(v-l+1) times c[0] ~-~ 1} over {h-l+1} right f ~~ < ~
1040c[s-1]$;
1041.LE
1042in other words,
1043.LB
1044.ta \n(.lu-\n(.iuR
1045$c[s] ~ <= ~~ {(v-l+1) times c[0] ~-~ 1} over {r} ~~-~~ epsilon ~~ <= ~
1046c[s-1] ~-~1$, 	(1)
1047.LE
1048.ta 8n
1049where	$r ~=~ h-l+1$,  $0 ~ <= ~ epsilon ~ <= ~ {r-1} over r $.
1050.sp
1051(The last inequality of (1) derives from the fact that $c[s-1]$ must be an
1052integer.)  \c
1053Then we wish to show that  $l' ~ <= ~ v ~ <= ~ h'$,  where $l'$ and $h'$
1054are the updated values for $low$ and $high$ as defined below.
1055.sp
1056.ta \w'(a)    'u
1057(a)	$l' ~ == ~~ l ~+~~ left f {r times c[s]} over c[0] right f ~~ mark
1058<= ~~ l ~+~~ {r} over c[0] ~ left [ ~ {(v-l+1) times c[0] ~-~ 1} over {r}
1059                          ~~ - ~ epsilon ~ right ]$    from (1),
1060.sp 0.5
1061$lineup <= ~~ v ~ + ~ 1 ~ - ~ 1 over c[0]$ ,
1062.sp 0.5
1063	so   $l' ~ <= ~~ v$   since both $v$ and $l'$ are integers
1064and $c[0] > 0$.
1065.sp
1066(b)	$h' ~ == ~~ l ~+~~ left f {r times c[s-1]} over c[0] right f ~~-~1~~ mark
1067>= ~~ l ~+~~  {r} over c[0] ~ left [ ~ {(v-l+1) times c[0] ~-~ 1} over {r}
1068                          ~~ + ~ 1 ~ - ~ epsilon ~ right ] ~~ - ~ 1
1069$    from (1),
1070.sp 0.5
1071$lineup >= ~~ v ~ + ~~ r over c[0] ~ left [ ~ - ~ 1 over r ~+~ 1
1072                                                  ~-~~ r-1 over r right ]
1073~~ = ~~ v$.
1074.bp
1075.sh "Captions for tables"
1076.sp
1077.nf
1078.ta \w'Figure 1  'u
1079Table 1	Example fixed model for alphabet {\fIa, e, i, o, u, !\fR}
1080Table 2	Results for encoding and decoding 100,000-byte files
1081Table 3	Breakdown of timings for VAX-11/780 assembly language version
1082Table 4	Comparison of arithmetic and adaptive Huffman coding
1083.fi
1084.sh "Captions for figures"
1085.sp
1086.nf
1087.ta \w'Figure 1  'u
1088Figure 1	(a) Representation of the arithmetic coding process
1089	(b) Like (a) but with the interval scaled up at each stage
1090Figure 2	Pseudo-code for the encoding and decoding procedures
1091Figure 3	C implementation of arithmetic encoding and decoding
1092Figure 4	Fixed and adaptive models for use with Figure 3
1093Figure 5	Scaling the interval to prevent underflow
1094.fi
1095.bp 0
1096.ev2
1097.nr x2 \w'symbol'/2
1098.nr x3 (\w'symbol'/2)+0.5i+(\w'probability'/2)
1099.nr x4 (\w'probability'/2)+0.5i
.nr x5 \w'[0.0, '
1101.nr x1 \n(x2+\n(x3+\n(x4+\n(x5+\w'0.0)'
1102.nr x0 (\n(.l-\n(x1)/2
1103.in \n(x0u
1104.ta \n(x2uC +\n(x3uC +\n(x4u +\n(x5u
1105\l'\n(x1u'
1106.sp
1107	symbol	probability	\0\0range
1108\l'\n(x1u'
1109.sp
1110	\fIa\fR	0.2	[0,	0.2)
1111	\fIe\fR	0.3	[0.2,	0.5)
1112	\fIi\fR	0.1	[0.5,	0.6)
1113	\fIo\fR	0.2	[0.6,	0.8)
1114	\fIu\fR	0.1	[0.8,	0.9)
1115	\fI!\fR	0.1	[0.9,	1.0)
1116\l'\n(x1u'
1117.sp
1118.in 0
1119.FE "Table 1  Example fixed model for alphabet {\fIa, e, i, o, u, !\fR}"
1120.bp 0
1121.ev2
1122.nr x1 0.5i+\w'\fIVAX object program\fR      '+\w'100,000      '+\w'time ($mu$s)  '+\w'time ($mu$s)    '+\w'time ($mu$s)  '+\w'time ($mu$s)    '+\w'time ($mu$s)  '+\w'time ($mu$s)'
1123.nr x0 (\n(.l-\n(x1)/2
1124.in \n(x0u
1125.ta 0.5i +\w'\fIVAX object program\fR      'u +\w'100,000      'u +\w'time ($mu$s)  'u +\w'time ($mu$s)    'u +\w'time ($mu$s)  'u +\w'time ($mu$s)    'u +\w'time ($mu$s)  'u
1126\l'\n(x1u'
1127.sp
1128			\0\0VAX-11/780	\0\0\0Macintosh	\0\0\0\0SUN-3/75
1129		output	 encode	 decode	 encode	 decode	 encode	 decode
1130		(bytes)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)	time ($mu$s)
1131\l'\n(x1u'
1132.sp
1133Mildly optimized C implementation
1134.sp
1135	\fIText file\fR	\057718	\0\0214	\0\0262	\0\0687	\0\0881	\0\0\098	\0\0121
1136	\fIC program\fR	\062991	\0\0230	\0\0288	\0\0729	\0\0950	\0\0105	\0\0131
1137	\fIVAX object program\fR	\073501	\0\0313	\0\0406	\0\0950	\01334	\0\0145	\0\0190
1138	\fIAlphabet\fR	\059292	\0\0223	\0\0277	\0\0719	\0\0942	\0\0105	\0\0130
1139	\fISkew-statistics\fR	\012092	\0\0143	\0\0170	\0\0507	\0\0645	\0\0\070	\0\0\085
1140.sp
1141Carefully optimized assembly language implementation
1142.sp
1143	\fIText file\fR	\057718	\0\0104	\0\0135	\0\0194	\0\0243	\0\0\046	\0\0\058
1144	\fIC program\fR	\062991	\0\0109	\0\0151	\0\0208	\0\0266	\0\0\051	\0\0\065
1145	\fIVAX object program\fR	\073501	\0\0158	\0\0241	\0\0280	\0\0402	\0\0\075	\0\0107
1146	\fIAlphabet\fR	\059292	\0\0105	\0\0145	\0\0204	\0\0264	\0\0\051	\0\0\065
1147	\fISkew-statistics\fR	\012092	\0\0\063	\0\0\081	\0\0126	\0\0160	\0\0\028	\0\0\036
1148
1149\l'\n(x1u'
1150.sp 2
1151.nr x0 \n(.l
1152.ll \n(.lu-\n(.iu
1153.fi
1154.in \w'\fINotes:\fR  'u
1155.ti -\w'\fINotes:\fR  'u
1156\fINotes:\fR\ \ \c
1157Times are measured in $mu$s per byte of uncompressed data.
1158.sp 0.5
1159The VAX-11/780 had a floating-point accelerator, which reduces integer
1160multiply and divide times.
1161.sp 0.5
1162The Macintosh uses an 8\ MHz MC68000 with some memory wait states.
1163.sp 0.5
1164The SUN-3/75 uses a 16.67\ MHz MC68020.
1165.sp 0.5
1166All times exclude I/O and operating system overhead in support of I/O.
1167VAX and SUN figures give user time from the U\s-2NIX\s+2 \fItime\fR
1168command; on the Macintosh I/O was explicitly directed to an array.
1169.sp 0.5
1170The 4.2BSD C compiler was used for VAX and SUN; Aztec C 1.06g for Macintosh.
1171.sp
1172.ll \n(x0u
1173.nf
1174.in 0
1175.FE "Table 2  Results for encoding and decoding 100,000-byte files"
1176.bp 0
1177.ev2
1178.nr x1 \w'\fIVAX object program\fR        '+\w'Bounds calculation        '+\w'time ($mu$s)    '+\w'time ($mu$s)'
1179.nr x0 (\n(.l-\n(x1)/2
1180.in \n(x0u
1181.ta \w'\fIVAX object program\fR        'u +\w'Bounds calculation        'u +\w'time ($mu$s)    'u +\w'time ($mu$s)'u
1182\l'\n(x1u'
1183.sp
1184		 encode	 decode
1185		time ($mu$s)	time ($mu$s)
1186\l'\n(x1u'
1187.sp
1188\fIText file\fR	Bounds calculation	\0\0\032	\0\0\031
1189	Bit shifting	\0\0\039	\0\0\030
1190	Model update	\0\0\029	\0\0\029
1191	Symbol decode	\0\0\0\(em	\0\0\045
1192	Other	\0\0\0\04	\0\0\0\00
1193		\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
1194		\0\0104	\0\0135
1195.sp
1196\fIC program\fR	Bounds calculation	\0\0\030	\0\0\028
1197	Bit shifting	\0\0\042	\0\0\035
1198	Model update	\0\0\033	\0\0\036
1199	Symbol decode	\0\0\0\(em	\0\0\051
1200	Other	\0\0\0\04	\0\0\0\01
1201		\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
1202		\0\0109	\0\0151
1203.sp
1204\fIVAX object program\fR	Bounds calculation	\0\0\034	\0\0\031
1205	Bit shifting	\0\0\046	\0\0\040
1206	Model update	\0\0\075	\0\0\075
1207	Symbol decode	\0\0\0\(em	\0\0\094
1208	Other	\0\0\0\03	\0\0\0\01
1209		\0\0\l'\w'100'u'	\0\0\l'\w'100'u'
1210		\0\0158	\0\0241
1211\l'\n(x1u'
1212.in 0
1213.FE "Table 3  Breakdown of timings for VAX-11/780 assembly language version"
1214.bp 0
1215.ev2
1216.nr x1 \w'\fIVAX object program\fR      '+\w'100,000      '+\w'time ($mu$s)  '+\w'time ($mu$s)    '+\w'100,000      '+\w'time ($mu$s)  '+\w'time ($mu$s)'
1217.nr x0 (\n(.l-\n(x1)/2
1218.in \n(x0u
1219.ta \w'\fIVAX object program\fR      'u +\w'100,000      'u +\w'time ($mu$s)  'u +\w'time ($mu$s)    'u +\w'100,000      'u +\w'time ($mu$s)  'u +\w'time ($mu$s)'u
1220\l'\n(x1u'
1221.sp
1222	\0\0\0\0\0\0Arithmetic coding	\0\0\0Adaptive Huffman coding
1223	output	 encode	 decode	output	 encode	 decode
1224	(bytes)	time ($mu$s)	time ($mu$s)	(bytes)	time ($mu$s)	time ($mu$s)
1225\l'\n(x1u'
1226.sp
1227\fIText file\fR	\057718	\0\0214	\0\0262	\057781	\0\0550	\0\0414
1228\fIC program\fR	\062991	\0\0230	\0\0288	\063731	\0\0596	\0\0441
1229\fIVAX object program\fR	\073546	\0\0313	\0\0406	\076950	\0\0822	\0\0606
1230\fIAlphabet\fR	\059292	\0\0223	\0\0277	\060127	\0\0598	\0\0411
1231\fISkew-statistics\fR	\012092	\0\0143	\0\0170	\016257	\0\0215	\0\0132
1232\l'\n(x1u'
1233.sp 2
1234.nr x0 \n(.l
1235.ll \n(.lu-\n(.iu
1236.fi
1237.in +\w'\fINotes:\fR  'u
1238.ti -\w'\fINotes:\fR  'u
1239\fINotes:\fR\ \ \c
1240Mildly optimized C implementation used for arithmetic coding
1241.sp 0.5
1242U\s-2NIX\s+2 \fIcompact\fR used for adaptive Huffman coding
1243.sp 0.5
1244Times are for a VAX-11/780, and exclude I/O and operating system overhead in
1245support of I/O.
1246.sp
1247.ll \n(x0u
1248.nf
1249.in 0
1250.FE "Table 4  Comparison of arithmetic and adaptive Huffman coding"
1251