crc/doc/crc.qbk

[library Boost.CRC
    [quickbook 1.5]
    [version 1.5]
    [id crc]
    [dirname crc]
    [copyright 2001 2003 2012 Daryle Walker]
    [purpose Cyclic Redundancy Code computation]
    [category Miscellaneous]
    [authors [Walker, Daryle]]
    [license
        Distributed under the Boost Software License, Version 1.0.
        (See the accompanying file LICENSE_1_0.txt or a copy at
        [@http://www.boost.org/LICENSE_1_0.txt])
    ]
    [source-mode c++]
]

[/  Macros   ]
[def __RMCA__  Rocksoft\u2122 Model CRC Algorithm]

[section:motivation  What is Boost.CRC?]

CRCs (cyclic redundancy codes) is one common technique to confirming data
integrity after transmission.  The [*Boost.CRC] library provides access to two
styles of CRC computation, one as a function template, the other as a function
template and two computation object class templates, where the two class
templates differ in speed.

[endsect]
[section:introduction  Introduction]

[note
    Some of the introductory material is based on
    [@http://www.ross.net/crc/download/crc_v3.txt ['A Painless Guide to CRC
    Error Detection Algorithms]] by Ross N. Williams at his
    [@http://www.ross.net/crc/ [*The CRC Pitstop]] site.
]

When binary data is transmitted, usually electronically, there is a chance that
the data gets corrupted.  One method to pick up said corruption is to generate
some value that is coded from the original data, send said value to the
receiver, then confirm that the received data generates the same value when it's
coded at the destination.

There are several possibilities after the receiver's check:

* The check value matches at both places and the data was transmitted intact.
* The data got corrupted and the check values do not match.
* The data is intact, but the check value got corrupted.  This can't the
  distinguished from the previous case, but is a (relatively) safe false
  positive.
* Either the data and/or check value gets corrupted, but such that the results
  of the check-coding are still consistent.  This is a dangerous false negative!

The way to minimize false negatives is to choose coding algorithms that cause a
lot of churn per input, especially a variable amount.

The check values are known as [*checksum]s because they are used to /check/ for
data consistency and the first coding algorithms were addition- (i.e.
['sum]ming-) based.

[section:intro_crcs  CRCs]

[*Cyclic Redundancy Codes] are a type of consistency check that treats the
message data as a (long) dividend of a modulo-2 polynomial division.  Modulo-2
arithmetic doesn't use carries/borrows when combining numbers.  A specific CRC
defines a set number of bits to work on at a time, where said number is also the
degree of a fixed polynomial (with modulo-2 coefficients) used as a divisor.

Since ordering doesn't apply to modulo arithmetic, the check between the current
high part of the dividend and the trial partial product (of the divisor and the
trial new quotient coefficient) is done by seeing if the highest-degree
coefficient of the dividend is one.  (The highest-degree coefficient of the
divisor must be one by definition, since it's the only non-zero choice.)  The
remainder after the division is finished is used as the basis of the CRC
checksum.

[section:intro_crcs_impl  CRC Implementation]

For a given degree /x/ for the modulo-2 polynomial divisor, the remainder will
have at most /x/ terms (from degree /x/ - 1 down to the constant term).  The
coefficients are modulo-2, which means that they can be represented by 0's and
1's.  So a remainder can be modeled by an (unsigned) integer of at least /x/
bits in *width*.

The divisor must have its /x/ degree term be one, which means it is always known
and can be implied instead of having to explicitly include in representations.
Its lower /x/ terms must be specified, so a divisor can be modeled the same way
as remainders.  With such a modeling, the divisor representation could be said
to be truncated since the uppermost term's value is implied and not stored.

The remainder and [*(truncated) divisor polynomial]s are stored as basic
computer integers.  This is in contrast to the dividend, which is modeled from
the input stream of data bits, where each new incoming bit is the next lower
term of the dividend polynomial.  Long division can be processed in piecemeal,
reading new upper terms as needed.  This maps to reading the data a byte (or
bit) at a time, generating updated remainders just-in-time, without needing to
read (and/or store(!)) the entire data message at once.

Long division involves appending new dividend terms after the previous terms
have been processed into the (interim) remainder.  So the remainder it the only
thing that has to change during each division step; a new input byte (or bit) is
combined with the remainder to make the interim dividend, and then combined with
the partial product (based on the divisor and top dividend bit(s)) to become a
remainder again.

[endsect]
[section:intro_crcs_augment  Message Augmentation]

When all of the input data has been read during division, the last /x/ bits are
still stuck in the interim remainder.  They have not been pushed through the
division steps; to do so, /x/ zero-valued extra bits must be passed into the
system.  This ensures all of the message's data bits get processed.  The
post-processed remainder is the checksum.  The system requires the message to be
*augmented* with /x/ extra bits to get results.

Alternatively, if the post-division augmentation bits are the expected checksum
instead, then the remainder will "subtract" the checksum with itself, giving
zero as the final remainder.  The remainder will end up non-zero if bit errors
exist in either the data or checksum or both.  This option requires the checksum
to be fed from highest-order bit first on down (i.e. big endian).

Exploiting the properties of how the division is carried out, the steps can be
rearranged such that the post-processing zero-valued bits are not needed; their
effect is merged into the start of the process.  Such systems read *unaugmented*
messages and expose the checksum directly from the interim remainder afterwards.
(You can't use the "augment-message-with-checksum-and-zero-check" technique with
this, of course.)

[endsect]
[section:intro_crcs_param  Other CRC Parameters]

Since long division proceeds from the uppermost terms on down, it's easiest to
treat an incoming byte as the uppermost unprocessed terms, and to read the bits
within that byte as the highest-order bit is the uppermost unprocessed term and
go down.  However, some hardware implementations have an easier time reading
each byte from the lowest-order bit and go up.  To simulate those systems in
software, the program needs to be flagged that *input reflection* needs to be
applied.  Reflecting a built-in integer reverses the order of its bits, such
that the lowest- and highest-order bits swap states, the next-lowest- and
next-highest-order bits swap, etc.  The input reflection can be done by
reflecting each byte as it comes in or keeping the bytes unchanged but reflect
the other internal functioning.  The latter sounds harder, but what it usually
done in the real world, since it's a one-time cost, unlike reflecting the bytes.

Similarly, the final remainder is processed by some hardware in reverse order,
which means software that simulate such systems need to flag that *output
reflection* is in effect.

Some CRCs don't return the remainder directly (reflected or not), but add an
extra step complementing the output bits.  Complementing turns 1 values into 0
values and vice versa.  This can simulated by using a XOR (exclusive-or) bit
mask of all 1-values (of the same bit length as the remainder).  Some systems
use a *final XOR mask* that isn't all 1-values, for variety.  (This mask takes
place /after/ any output reflection.)

At the other end, the built-in-integer register normally starts at zero as the
first bytes are read.  Instead of just doing nothing but load input bits for /x/
steps, some CRC systems use a non-zero *initial remainder* to add extra
processing.  This initial value has to be different for the augmented versus
un-augmented versions of the same system, due to possible incorporation with the
zero-valued augment bits.

[endsect]
[section:intro_crcs_rmca  Compact CRC Parameter Specification]

The __RMCA__, or RMCA for short, was designed by Ross Williams to describe all
the specification points of a given CRC system (quoted):

[variablelist  RMCA Parameters
   [[WIDTH] [This is the width of the algorithm expressed in bits. This
   is one less than the width of the Poly.]]
   [[POLY] [This parameter is the poly. This is a binary value that
   should be specified as a hexadecimal number. The top bit of the
   poly should be omitted. For example, if the poly is 10110, you
   should specify 06. An important aspect of this parameter is that it
   represents the unreflected poly; the bottom bit of this parameter
   is always the LSB of the divisor during the division regardless of
   whether the algorithm being modelled is reflected.]]
   [[INIT] [This parameter specifies the initial value of the register
   when the algorithm starts. This is the value that is to be assigned
   to the register in the direct table algorithm. In the table
   algorithm, we may think of the register always commencing with the
   value zero, and this value being XORed into the register after the
   N'th bit iteration. This parameter should be specified as a
   hexadecimal number.]]
   [[REFIN] [This is a boolean parameter. If it is FALSE, input bytes are
   processed with bit 7 being treated as the most significant bit
   (MSB) and bit 0 being treated as the least significant bit. If this
   parameter is FALSE, each byte is reflected before being processed.]]
   [[REFOUT] [This is a boolean parameter. If it is set to FALSE, the
   final value in the register is fed into the XOROUT stage directly,
   otherwise, if this parameter is TRUE, the final register value is
   reflected first.]]
   [[XOROUT] [This is an W-bit value that should be specified as a
   hexadecimal number. It is XORed to the final register value (after
   the REFOUT) stage before the value is returned as the official
   checksum.]]
]

His description assumes an octet-sized byte.  The /POLY/ is the (truncated)
divisor.  The /INIT/ is the initial remainder, assuming the unaugmented version
of CRC processing is used.  (If you're using an augmented-style CRC, you have to
undo the effect of the built-in zero-augment before initialization.)

[endsect]

The two function templates and two class templates in this library provide ways
to carry out CRC computations.  You give the various __RMCA__ parameters as
template parameters and/or constructor parameters.  You then submit all the
message data bytes at once (for the functions) or piecemeal (for the class
objects).

[endsect]

Note that some error-detection techniques merge their checksum results within
the message data, while CRC checksums are either at the end (when augmented,
without either kind of reflection, with a bit-width that's a multiple of byte
size, and no XOR mask) or out-of-band.

[endsect]
[section:crc_basic  Theoretical CRC Computer]

    #include <cstddef>  // for std::size_t

    namespace boost
    {
        template < std::size_t Bits >
        class crc_basic;
    }

The [*`boost::crc_basic`] class template acts as an unaugmented-CRC processor
that can accept input at the bit-level.  It only takes one __RMCA__ parameter as
a template parameter, the /WIDTH/, which determines the built-in unsigned integer
type used for division registers.  The other __RMCA__ parameters can be passed
on through the constructor.  (Most of the parameters have defaults.)

The integer type used for registers is published as `value_type`, while the
__RMCA__ attributes can be discovered as:

[table:crc_basic_rmca  RMCA Parameters in boost::crc_basic
    [[Parameter]  [Member Name]                 [Kind]                               [Expression Type]]
    [[['WIDTH]]   [`bit_count`]                 [class-static immutable data member] [`std::size_t`]]
    [[['POLY]]    [`get_truncated_polynominal`] [`const` member function]            [`value_type`]]
    [[['INIT]]    [`get_initial_remainder`]     [`const` member function]            [`value_type`]]
    [[['REFIN]]   [`get_reflect_input`]         [`const` member function]            [`bool`]]
    [[['REFOUT]]  [`get_reflect_remainder`]     [`const` member function]            [`bool`]]
    [[['XOROUT]]  [`get_final_xor_value`]       [`const` member function]            [`value_type`]]
]

Since most of the parameters are specified at run-time, you can reuse a single
computer object for separate runs with differing parameters.  The type uses the
automatically-defined copy/move/assign and destruction routines.

[import ./crc_examples.cpp]
[crc_basic_reuse]

This example necessarily demonstrates input and output.  There is one output
member function, `checksum` that returns the remainder from the CRC steps plus
any post-processing reflection and/or XOR-masking.  There are several options
to submit input data for processing:

[table:crc_basic_input  Member Functions for Input in boost::crc_basic
    [[Member Function] [Input Type/Style]]
    [[`process_bit`]   [A single bit.]]
    [[`process_bits`]  [A specified number of bits within the given byte.
     Reading starts from the highest-order desired bit.]]
    [[`process_byte`]  [An entire byte.  The bits within the byte are read in an
     order consistent with `get_reflect_input`.]]
    [[`process_block`] [All bytes in the range.  The range is defined by a
     pointer to the first byte and another pointer to one (byte) past-the-end.]]
    [[`process_bytes`] [All bytes in the range.  The range is defined by a
     pointer to the first byte and a count of number of bytes to read.]]
]

The input functions currently do not return anything.

[crc_piecemeal_run]

The input-influenced state of a computer object may be read with the
`get_interim_remainder` member function.  An object may be loaded with the same
kind of state with the one-argument version of the `reset` member function.
The state is the remainder from the CRC operations performed on all the
already-entered input, without any output post-processing.

The two functions can be used together to save the state to a persistent medium
and restore it later.  Note that the __RMCA__ parameters have to saved/restored
via other means, if they're not fixed within a program.

Calling `reset` with no arguments sets the interim remainder to what /INIT/ was
set at object construction.  This lets one computer object be used for
independent runs with separate data messages using the same CRC standard.

[endsect]
[section:crc_optimal  Optimized CRC Computer]

    #include <boost/cstdint.hpp>  // for boost::uintmax_t
    #include <cstddef>            // for std::size_t

    namespace boost
    {
        template < std::size_t Bits, uintmax_t TruncPoly, uintmax_t InitRem,
         uintmax_t FinalXor, bool ReflectIn, bool ReflectRem >
        class crc_optimal;
    }

The [*`boost::crc_optimal`] class template acts as an unaugmented-CRC processor
that can accept input at the byte-level.  It takes all the __RMCA__ parameters
as template parameters.  Like in `crc_basic`, the /WIDTH/ stays as the first
parameter and determines the built-in unsigned integer type used for division
registers.  The other __RMCA__ parameters move up to the template parameter
field in the same relative order they were in the `crc_basic` constructor.
(Some parameters have defaults.)  Objects based from `crc_optimal` can either be
default-constructed, giving it the same behavior as a `crc_basic` object with
the equivalent settings, or use one parameter, which overrides the initial
remainder value[footnote i.e. The interim-remainder before any input is read.]
without permanently affecting the initial-remainder attribute.

Besides template parameters and construction, `crc_optimal` differs from
`crc_basic` interface-wise by:

* Adding five class-static immutable data members corresponding to the new
  template parameters.
  [table:crc_optimal_rmca  Additional RMCA Expressions in boost::crc_optimal
    [[New Member]              [Equivalent]]
    [[`truncated_polynominal`] [`get_truncated_polynominal`]]
    [[`initial_remainder`]     [`get_initial_remainder`]]
    [[`reflect_input`]         [`get_reflect_input`]]
    [[`reflect_remainder`]     [`get_reflect_remainder`]]
    [[`final_xor_value`]       [`get_final_xor_value`]]
  ]
* Removing the `process_bit` and `process_bits` member functions.
* Adding two versions of the `operator ()` member function.  The single-argument
  version forwards to `process_byte`, making it suitable to STL algorithms that
  take (and possibly return) function objects[footnote Like `std::for_each`.].
  The argument-less version forwards to `checksum`, making it suitable for STL
  algorithms that expect generator objects[footnote Albeit this object won't
  change its return value within code that only uses it as a generator.].
* Merging the two `reset` member functions into one.  (It uses a single
  parameter that can have a default argument).

The major difference between `crc_basic` and `crc_optimal` is the internals.
Objects from `crc_basic` run their CRC algorithm one bit at a time, no matter
which unit is used as input.  Objects from `crc_optimal`, when /WIDTH/ is at
least `CHAR_BIT`[footnote i.e. The optimizations are suspended if the /WIDTH/
only justifies using part of a single byte.], use a byte-indexed table-driven CRC
algorithm which is a *lot* faster than processing individual bits.

Since all of the parameters are specified at compile-time, you cannot reuse a
single computer object for separate runs with differing parameters.  The type
uses the automatically-defined copy/move/assign and destruction routines.

[endsect]
[section:crc_function  CRC Function]

    #include <boost/cstdint.hpp>  // for boost::uintmax_t
    #include <boost/integer.hpp>  // for boost::uint_t
    #include <cstddef>            // for std::size_t

    namespace boost
    {
        template < std::size_t Bits, uintmax_t TruncPoly, uintmax_t InitRem,
         uintmax_t FinalXor, bool ReflectIn, bool ReflectRem >
        typename uint_t<Bits>::fast  crc( void const *buffer, std::size_t
         byte_count );
    }

The [*`boost::crc`] function template computes the unaugmented CRC of a single
data block in one pass.  It has the same template parameter list as
`boost::crc_optimal`, which it uses for implementation.  This function saves you
the trouble of setting up and using the `boost::crc_optimal` object manually.

[endsect]
[section:acrc_function  Augmented-CRC Function]

    #include <boost/cstdint.hpp>  // for boost::uintmax_t
    #include <boost/integer.hpp>  // for boost::uint_t
    #include <cstddef>            // for std::size_t

    namespace boost
    {
        template < std::size_t Bits, uintmax_t TruncPoly >
        typename uint_t<Bits>::fast  augmented_crc( void const *buffer,
         std::size_t byte_count, typename uint_t<Bits>::fast initial_remainder
         = 0u  );
    }

The [*`boost::augmented_crc`] function computes the augmented-style CRC of a
data block.  Like `boost::crc`, the first two template parameters are the
/WIDTH/ and /POLY/.  However, the /INIT/ has moved to being a function
parameter, after the data block's starting address and byte length, defaulting
to zero if not given.

This function uses modulo-2 division at its most raw, and so forfeits the
/REFIN/, /REFOUT/, and /XOROUT/ attributes, setting them to `0` or `false`.
Another difference from `boost::crc` is that a non-zero /INIT/ has to be skewed
when used with this function.  (No conversion functions are currently given.)

[acrc_piecemeal_run]

Since `augmented_crc` doesn't know when your data ends, you must supply the
augment, either /WIDTH/ zero bits or the expected checksum.  The augment can be
either at the end of last data block or from an extra call.  Remember that if an
expected checksum is used as the augment, its bits must be arranged in
big-endian order.  Because `augmented_crc` reads byte-wise, while augments
assume bit-wise reading, augmentations are valid only when /WIDTH/ is a multiple
of the bits-per-byte ratio (`CHAR_BIT`).

[endsect]
[section:crc_samples  Pre-Defined CRC Samples]

    namespace boost
    {
        typedef crc_optimal<16, 0x8005, 0, 0, true, true>
          crc_16_type;

        typedef crc_optimal<16, 0x1021, 0xFFFF, 0, false, false>
          crc_ccitt_false_t, crc_ccitt_type;
        typedef crc_optimal<16, 0x1021, 0, 0, true, true>  crc_ccitt_true_t;

        typedef crc_optimal<16, 0x8408, 0, 0, true, true>    crc_xmodem_type;
        typedef crc_optimal<16, 0x1021, 0, 0, false, false>  crc_xmodem_t;

        typedef crc_optimal<32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF, true, true>
          crc_32_type;
    }

Several sample CRC types are given, representing common CRC algorithms.  The
samples have now been checked against the
[@http://regregex.bbcmicro.net/crc-catalogue.htm ['Catalogue of parametrised CRC
algorithms]], leading to some new type-aliases corresponding to the corrected
profiles.  (Older, incorrect profiles keep their name for backwards
compatibility.)  However, this library is primarily concerned with CRC
implementation, and not with determining "good" sets of CRC parameters.

[table:crc_samples_list  Common CRCs
    [[Computer Type]        [Standard(s)]]
    [[`crc_16_type`]        [BISYNCH, ARC, LHA, ZOO]]
    [[`crc_ccitt_false_t`]  [Commonly misidentified as the standard by CCITT]]
    [[`crc_ccitt_type`]     [`crc_ccitt_false_t` (I made the same mistake.)]]
    [[`crc_ccitt_true_t`]   [Designated by CCITT (Comit\u00E9 Consultatif
     International T\u00E9l\u00E9graphique et T\u00E9l\u00E9phonique), KERMIT]]
    [[`crc_xmodem_type`]    [A mistake I didn't catch in defining
     `crc_xmodem_t`.]]
    [[`crc_xmodem_t`]       [XMODEM, ZMODEM, ACORN]]
    [[`crc_32_type`]        [ADCCP, PKZip, libPNG, AUTODIN II, Ethernet, FDDI]]
]

[endsect]
[section:end  End Matter]
[heading  References]

*The [@boost:/boost/crc.hpp CRC header] itself
*Some [@boost:/libs/crc/test/crc_test.cpp test code] and some
 [@boost:/libs/crc/test/crc_test2.cpp newer tests] that use the
 [@boost:/libs/test/index.html Boost test library]
*Some [@boost:/libs/crc/crc_example.cpp example code]

[variablelist  History
    [[Boost 1.49.0] [Refined implementation, testing, and documentation.  Some
      interfaces were tweaked.]]
    [[Boost 1.30.2 (estimated)] [Released an example program.]]
    [[Boost 1.22.0] [First public release.]]
]

[heading  Acknowledgments]

For giving advice on compiler/C++ compliance, implementation, interface,
algorithms, and bug reports:

*Darin Adler
*Beman Dawes
*Doug Gregor
*John Maddock
*Joe Mariadassou
*Jens Maurer
*Vladimir Prus
*Joel Young

[variablelist  Contributors
    [[Michael Barr [@mailto:mbarr@netrino.com mbarr@netrino.com]]
     [Wrote
      [@http://www.netrino.com/Embedded-Systems/How-To/CRC-Calculation-C-Code
      ["CRC Implementation Code in C]], a less-confusing guide to implementing
      CRC algorithms.  (Originally published as ["Slow and Steady Never Lost the
      Race] in the January 2000 issue of [@http://www.embedded.com ['Embedded
      Systems Programming]], pages 37\u201346.  The web version used to be known
      as [@http://www.netrino.com/Connecting/2000-01/ ['Easier Said Than Done]].)
     ]]
    [[Daryle Walker]
     [Started the library and contributed the theoretical and optimal CRC
      computation class templates and the CRC computing function template.
      Contributed the test and example code.
     ]]
    [[Ross N. Williams]
     [Wrote [@http://www.ross.net/crc/crcpaper.html ['A Painless Guide to CRC
      Error Detection Algorithms]], a definitive source of CRC information.
     ]]
]

[endsect]

[xinclude autodoc.xml]