1\section{Introduction to bands}
2
3% Le plus gros du morceau arrive lorsque l'on s'intéresse à la représentation
4% réelle du document que l'on souhaite imprimer. Dans un premier temps,
5% l'étude de la représentation sera faite (dans cette section même) pour
6% embrayer ensuite sur les différents algorithmes de compression des
7% << bandes >>.
8%
9The largest piece of the puzzle is the real representation of the document you want to print.
10For the first time, the study of this format has been done (in this very section).  Following
11is a discussion of the different ``band'' compression algorithms.
12
13\subsection{From document to bitmap}
14
15\noindent\textit{\underline{Note : }
16Since the present study was only done on a black and white printer, this will
17be the only type of printing discussed.}
18
19\medskip
20In contrast to languages like \emph{PostScript}, that are vector based
21languages, QPDL is a language that describes images dot by dot.
22
23To print a document it is therefore necessary to convert it to an image
24where each bit corresponds to a dot on the page.  The format closest to
25that understood by the printer is PBM\footnote{Portable BitMap}.
26
27To review, the PBM format describes an image bit by bit without compression. A set bit corresponds to a black dot and a zero bit corresponds to a white dot. Adjacent bits represent lines and the byte
28which follows the end of a line starts the beginning of the next.  This continues until the
29end of the file.
30
31\subsection{From image to bands}
32
33Once the document is converted into an image, it is necessary to cut it
34into bands.  The bands represent a fixed number of lines of dots: one band
35represents exactly \emph{128 lines} (that's $0x80$).
36
37\subsection{An image in columns}
38
39In contrast to a PBM image which describes an image line by line, the printer expects the
40image in bands that represent the image \emph{column by column}. In fact, the first
41byte represents the first byte of the first line, the second byte represents the first byte of the second line, \ldots. Then the 129\raise{th} byte represents the second byte of the first line.
42
43Figure \ref{fig:bande} represents this treatment.
44
45\begin{figure}[ht]
46\centering
47\includegraphics{Images/bande.eps}
48\caption{From a document to rearranged data}
49\label{fig:bande}
50\end{figure}
51
52\subsection{Bit inversion}
53Whereas in a PBM file a set bit corresponds to a black bit, for the
54printer is is the reverse.
55It is thus essential to apply the \emph{NOT} operator to each byte in the band.
56
57\section{Band header}
58
59The band header gives invaluable directions on how to convert
60the band and recreate the document. As table
61\ref{tab:entete_bande} shows.
62
63\begin{table}[!ht]
64\centering
65\begin{tabular}{| c | c | c |}
66\hline
67\textbf{Address} & \textbf{Description} & \textbf{Type} \\
68\hline
69\hline
700x0 & Fixed value $0xC$ & Byte \\
710x1 & Band number & Byte \\
720x2 & Width of band & 16-Bits BE \\
730x4 & Height of band & 16-Bits BE \\
740x6 & Compression version $0x11$ (?) & Byte \\
750x7 & Length of following data & 32-Bits BE \\
76\hline
77\end{tabular}
78\caption{Band Header}
79\label{tab:entete_bande}
80\end{table}
81
82\section{Band compression}
83To reduce the quantity of data transmitted to the printer, the band data is compressed. There
84appear to be several compression versions. The following description covers the
85version most often seen, version \textsc{$0x11$}.
86
87\subsection{Compressed data header}
88The  information required to decompress the data is
89provided in the header.  A description of which is found
90in table\ref{tab:entete_compression}.
91In contrast to the other headers, this header can be written
92without worrying about whether the machine is
93\emph{little-endian} or \emph{big-endian}. Indeed, when the header
94signature is written ($0x9ABCDEF$), the printer will autodetect
95the endianness of the following
96data. It is thus enough to write the data without any special conversion.
97
98\begin{table}[!ht]
99\centering
100\begin{tabular}{| c | c | c |}
101\hline
102\textbf{Adresse} & \textbf{Description} & \textbf{Type} \\
103\hline
104\hline
1050x0 & Fixed value $0x9ABCDEF$ & 32-Bits \\
1060x4 & Raw data length ($ \leq 128$) & 32-Bits \\
1070x8 & Pointer table & 64 16-Bit entries \\
1080x88 & Raw data & Variable \\
109Variable & Compressed data & Variable \\
110\hline
111\end{tabular}
112\caption{Compressed data header}
113\label{tab:entete_compression}
114\end{table}
115
116The significance of the various entries follows.
117
118\subsection{The Compression Algorithm}
119The compression algorithm aims to reduce sequences of similar bytes.
120With this aim, a table of $64$ \emph{offsets} is initialized, and the beginning of the uncompressed
121raw data is copied.
122Then, for each entry in the table, one compares the following data to be compressed with previous uncompressed data using the table entry as a negative offset from the current position.
123The index of the table entry that provides the longest sequence is kept and its index
124and the length of the sequence is output.
125If, however, no matching sequence can be found, the following data will be written out
126uncompressed until a sequence using one of the offsets in the table can be found.
127
128\subsection{An example of compression}
129The study of a concrete example permits a better understanding.
130Here's a table with 4 entries:
131
132\begin{exemple}
133Table = \{-1; -3; -4; -5\}
134\end{exemple}
135
136with initial raw data as follows:
137\begin{exemple}
13801 04 03 06 08 0F 0F 0F 0F 04 02 05 08 01 06 03 06 01 06
139\end{exemple}
140
141The following data is to be compressed:
142\begin{exemple}
1430F 0F 01 04 03 06 0F 01 04 05 06 0F 01 04 05 06 0F 0F 0F 0F 0F
144\end{exemple}
145
146To find the best sequence, it is necessary to try each pointer. Each pointer is
147tried relative to the current position.
148Taking the first pointer, the comparison is done between 06 and 0F
149The comparison stops there. It is noticed that for the three other pointers,
150no matching sequence is found. Therefore $0F$ will be written out uncompressed.
151
152For the second character, the first pointer is valid for a sequence of length
153one; and that's the only pointer that matches a sequence.  Unfortunately, if
154the length of a sequence is \emph{less than 3} compression is not useful.
155It is therefore preferable to output the data uncompressed.
156
157This is the same for the four next characters.  With the seventh character,
158the fourth pointer gives us a sequence of three characters.
159
160The next character is written out uncompressed.  Following that, the fourth
161pointer matches a sequence of length seven.
162
163Finally, to finish, the first pointer matches a sequence of four bytes.
164\medskip
165
166To review, the compressed data may be represented as follows:
167\begin{exemple}
168Uncompressed data : 0F 0F 01 04 03 06 \\
169Pointer 4, length = 3 \\
170Uncompressed data : 05 \\
171Pointer 4, length = 7 \\
172Pointer 1, length = 4
173\end{exemple}
174
175
176\subsection{Compression coding}
177As we saw above, a table of pointers is stored in the header.
178Each entry is the absolute value of a displacement.
179So, to represent a displacement of $-1$, the entry will be $1$.
180Uncompressed data follows the header.
181The amount of this uncompressed data is the maximum of $128$ and the largest pointer value (in absolute value). So, if the largest pointer value is greater then $128$, only $128$ bytes
182of uncompressed data follows.
183Finally, this quantity of data is copied out of the header into the start of
184the uncompressed data.
185
186What follows is the compressed data.
187If a sequence of data can be compressed, \emph{bit 7} of the first byte is
188set and the remaining seven bits encode the least significant bits of the
189\emph{sequence length}.  Finally, bits $6$ and $7$ of the second byte encode
190bits seven and eight of the \emph{sequence length} whilst the other bits encode
191an index into the pointer table.
192It should be noted that three must be \emph{subracted} from the sequence length
193before being encoded and this explains why it is not possible to represent
194sequences of less than three bytes.
195
196On the other hand, if a sequence of data cannot be compressed,
197% 128 surely?
198this data is cut into sequences of 64 bytes (in case it's longer)
199and the first byte encodes the length minus one of the uncompressed data that follows.
200
201A small summary is given in figure \ref{fig:code_comp}.
202\begin{figure}[!ht]
203\centering
204\begin{verbatim}
205unsigned short length, ptr;
206
207if (byte1 & 0x80) {
208    length = (byte1 & 0x7F) + ((byte2 & 0xC0) << 1) + 3;
209    ptr = byte2 & 0x3F;
210} else {
211    length = byte1 + 1;
212}
213\end{verbatim}
214\caption{Sequence encoding}
215\label{fig:code_comp}
216\end{figure}
217
218To repeat the previous example, here's how it would be compressed:
219\begin{exemple}
22005 0F 0F 01 04 03 06 80 04 00 05 84 04 81 01
221\end{exemple}
222
223That's six bytes shorter than the uncompressed data.
224
225\section{Checksum}
226To be able to detect data corruption during transit, a
227checksum is placed at the end of the compressed data (it should be noted
228that the length of this checksum is included in the
229data length indicated in the band header).
230This sum is coded
231as 32-bits big-endian and is the simple sum of the bytes between
232the band header and the checksum.
233
234