1\section{Introduction to bands} 2 3% Le plus gros du morceau arrive lorsque l'on s'intéresse à la représentation 4% réelle du document que l'on souhaite imprimer. Dans un premier temps, 5% l'étude de la représentation sera faite (dans cette section même) pour 6% embrayer ensuite sur les différents algorithmes de compression des 7% << bandes >>. 8% 9The largest piece of the puzzle is the real representation of the document you want to print. 10For the first time, the study of this format has been done (in this very section). Following 11is a discussion of the different ``band'' compression algorithms. 12 13\subsection{From document to bitmap} 14 15\noindent\textit{\underline{Note : } 16Since the present study was only done on a black and white printer, this will 17be the only type of printing discussed.} 18 19\medskip 20In contrast to languages like \emph{PostScript}, that are vector based 21languages, QPDL is a language that describes images dot by dot. 22 23To print a document it is therefore necessary to convert it to an image 24where each bit corresponds to a dot on the page. The format closest to 25that understood by the printer is PBM\footnote{Portable BitMap}. 26 27To review, the PBM format describes an image bit by bit without compression. A set bit corresponds to a black dot and a zero bit corresponds to a white dot. Adjacent bits represent lines and the byte 28which follows the end of a line starts the beginning of the next. This continues until the 29end of the file. 30 31\subsection{From image to bands} 32 33Once the document is converted into an image, it is necessary to cut it 34into bands. The bands represent a fixed number of lines of dots: one band 35represents exactly \emph{128 lines} (that's $0x80$). 36 37\subsection{An image in columns} 38 39In contrast to a PBM image which describes an image line by line, the printer expects the 40image in bands that represent the image \emph{column by column}. In fact, the first 41byte represents the first byte of the first line, the second byte represents the first byte of the second line, \ldots. Then the 129\raise{th} byte represents the second byte of the first line. 42 43Figure \ref{fig:bande} represents this treatment. 44 45\begin{figure}[ht] 46\centering 47\includegraphics{Images/bande.eps} 48\caption{From a document to rearranged data} 49\label{fig:bande} 50\end{figure} 51 52\subsection{Bit inversion} 53Whereas in a PBM file a set bit corresponds to a black bit, for the 54printer is is the reverse. 55It is thus essential to apply the \emph{NOT} operator to each byte in the band. 56 57\section{Band header} 58 59The band header gives invaluable directions on how to convert 60the band and recreate the document. As table 61\ref{tab:entete_bande} shows. 62 63\begin{table}[!ht] 64\centering 65\begin{tabular}{| c | c | c |} 66\hline 67\textbf{Address} & \textbf{Description} & \textbf{Type} \\ 68\hline 69\hline 700x0 & Fixed value $0xC$ & Byte \\ 710x1 & Band number & Byte \\ 720x2 & Width of band & 16-Bits BE \\ 730x4 & Height of band & 16-Bits BE \\ 740x6 & Compression version $0x11$ (?) & Byte \\ 750x7 & Length of following data & 32-Bits BE \\ 76\hline 77\end{tabular} 78\caption{Band Header} 79\label{tab:entete_bande} 80\end{table} 81 82\section{Band compression} 83To reduce the quantity of data transmitted to the printer, the band data is compressed. There 84appear to be several compression versions. The following description covers the 85version most often seen, version \textsc{$0x11$}. 86 87\subsection{Compressed data header} 88The information required to decompress the data is 89provided in the header. A description of which is found 90in table\ref{tab:entete_compression}. 91In contrast to the other headers, this header can be written 92without worrying about whether the machine is 93\emph{little-endian} or \emph{big-endian}. Indeed, when the header 94signature is written ($0x9ABCDEF$), the printer will autodetect 95the endianness of the following 96data. It is thus enough to write the data without any special conversion. 97 98\begin{table}[!ht] 99\centering 100\begin{tabular}{| c | c | c |} 101\hline 102\textbf{Adresse} & \textbf{Description} & \textbf{Type} \\ 103\hline 104\hline 1050x0 & Fixed value $0x9ABCDEF$ & 32-Bits \\ 1060x4 & Raw data length ($ \leq 128$) & 32-Bits \\ 1070x8 & Pointer table & 64 16-Bit entries \\ 1080x88 & Raw data & Variable \\ 109Variable & Compressed data & Variable \\ 110\hline 111\end{tabular} 112\caption{Compressed data header} 113\label{tab:entete_compression} 114\end{table} 115 116The significance of the various entries follows. 117 118\subsection{The Compression Algorithm} 119The compression algorithm aims to reduce sequences of similar bytes. 120With this aim, a table of $64$ \emph{offsets} is initialized, and the beginning of the uncompressed 121raw data is copied. 122Then, for each entry in the table, one compares the following data to be compressed with previous uncompressed data using the table entry as a negative offset from the current position. 123The index of the table entry that provides the longest sequence is kept and its index 124and the length of the sequence is output. 125If, however, no matching sequence can be found, the following data will be written out 126uncompressed until a sequence using one of the offsets in the table can be found. 127 128\subsection{An example of compression} 129The study of a concrete example permits a better understanding. 130Here's a table with 4 entries: 131 132\begin{exemple} 133Table = \{-1; -3; -4; -5\} 134\end{exemple} 135 136with initial raw data as follows: 137\begin{exemple} 13801 04 03 06 08 0F 0F 0F 0F 04 02 05 08 01 06 03 06 01 06 139\end{exemple} 140 141The following data is to be compressed: 142\begin{exemple} 1430F 0F 01 04 03 06 0F 01 04 05 06 0F 01 04 05 06 0F 0F 0F 0F 0F 144\end{exemple} 145 146To find the best sequence, it is necessary to try each pointer. Each pointer is 147tried relative to the current position. 148Taking the first pointer, the comparison is done between 06 and 0F 149The comparison stops there. It is noticed that for the three other pointers, 150no matching sequence is found. Therefore $0F$ will be written out uncompressed. 151 152For the second character, the first pointer is valid for a sequence of length 153one; and that's the only pointer that matches a sequence. Unfortunately, if 154the length of a sequence is \emph{less than 3} compression is not useful. 155It is therefore preferable to output the data uncompressed. 156 157This is the same for the four next characters. With the seventh character, 158the fourth pointer gives us a sequence of three characters. 159 160The next character is written out uncompressed. Following that, the fourth 161pointer matches a sequence of length seven. 162 163Finally, to finish, the first pointer matches a sequence of four bytes. 164\medskip 165 166To review, the compressed data may be represented as follows: 167\begin{exemple} 168Uncompressed data : 0F 0F 01 04 03 06 \\ 169Pointer 4, length = 3 \\ 170Uncompressed data : 05 \\ 171Pointer 4, length = 7 \\ 172Pointer 1, length = 4 173\end{exemple} 174 175 176\subsection{Compression coding} 177As we saw above, a table of pointers is stored in the header. 178Each entry is the absolute value of a displacement. 179So, to represent a displacement of $-1$, the entry will be $1$. 180Uncompressed data follows the header. 181The amount of this uncompressed data is the maximum of $128$ and the largest pointer value (in absolute value). So, if the largest pointer value is greater then $128$, only $128$ bytes 182of uncompressed data follows. 183Finally, this quantity of data is copied out of the header into the start of 184the uncompressed data. 185 186What follows is the compressed data. 187If a sequence of data can be compressed, \emph{bit 7} of the first byte is 188set and the remaining seven bits encode the least significant bits of the 189\emph{sequence length}. Finally, bits $6$ and $7$ of the second byte encode 190bits seven and eight of the \emph{sequence length} whilst the other bits encode 191an index into the pointer table. 192It should be noted that three must be \emph{subracted} from the sequence length 193before being encoded and this explains why it is not possible to represent 194sequences of less than three bytes. 195 196On the other hand, if a sequence of data cannot be compressed, 197% 128 surely? 198this data is cut into sequences of 64 bytes (in case it's longer) 199and the first byte encodes the length minus one of the uncompressed data that follows. 200 201A small summary is given in figure \ref{fig:code_comp}. 202\begin{figure}[!ht] 203\centering 204\begin{verbatim} 205unsigned short length, ptr; 206 207if (byte1 & 0x80) { 208 length = (byte1 & 0x7F) + ((byte2 & 0xC0) << 1) + 3; 209 ptr = byte2 & 0x3F; 210} else { 211 length = byte1 + 1; 212} 213\end{verbatim} 214\caption{Sequence encoding} 215\label{fig:code_comp} 216\end{figure} 217 218To repeat the previous example, here's how it would be compressed: 219\begin{exemple} 22005 0F 0F 01 04 03 06 80 04 00 05 84 04 81 01 221\end{exemple} 222 223That's six bytes shorter than the uncompressed data. 224 225\section{Checksum} 226To be able to detect data corruption during transit, a 227checksum is placed at the end of the compressed data (it should be noted 228that the length of this checksum is included in the 229data length indicated in the band header). 230This sum is coded 231as 32-bits big-endian and is the simple sum of the bytes between 232the band header and the checksum. 233 234