Snzip, a compression/decompression tool based on snappy.
========================================================

What is snzip?
--------------

Snzip is a command-line tool that uses [snappy][]. It supports several file
formats: [framing-format][], [old framing-format][], [hadoop-snappy format][], [raw format][],
and three obsolete formats that were used by snzip, [snappy-java][] and [snappy-in-java][]
before the official framing format was defined. The default format is [framing-format][].

Notable Changes
---------------

The default format was changed to [framing-format][] in 1.0.0.
To keep the obsolete snzip format as the default, as in earlier versions, pass
`--with-default-format=snzip` as a configure option.

Installation
------------

### Install from a tar-ball

Download snzip-1.0.5.tar.gz from https://github.com/kubo/snzip/releases,
extract it, and build and install it as follows.

    tar xvfz snzip-1.0.5.tar.gz
    cd snzip-1.0.5
    ./configure
    make
    make install

If snappy is not installed under `/usr` or `/usr/local`, specify its location
with `--with-snappy` as follows.

    # install snzip
    tar xvfz snzip-1.0.5.tar.gz
    cd snzip-1.0.5
    ./configure --with-snappy=/xxx/yyy/
    make
    make install

When both dynamic and static snappy libraries are available, the dynamic one
is used by default, and the compiled `snzip` depends on `libsnappy.so`.
When `--with-static-snappy` is passed as a configure option, the static one
is used instead, and the snappy library is linked into the compiled `snzip`.

Note: `--with-static-snappy` isn't available on some platforms.

You can use `--with-default-format` to change the default compression format.

    ./configure --with-default-format=snzip

### Install as an rpm package

We don't provide rpm packages. Download snzip-1.0.5.tar.gz from
https://github.com/kubo/snzip/releases, then create an rpm package as follows
and install it.

    # The rpm package will be created under $HOME/rpmbuild/RPMS.
    rpmbuild -tb snzip-1.0.5.tar.gz

### Install from the latest source

To build from the source code in the GitHub repository:

    git clone https://github.com/kubo/snzip.git
    cd snzip
    ./autogen.sh
    ./configure
    make
    make install

### Install a Windows package

Download `snzip-1.0.5-win32.zip` or `snzip-1.0.5-win64.zip` from
https://github.com/kubo/snzip/releases and copy `snzip.exe` and `snunzip.exe`
to a directory listed in the PATH environment variable.

Usage
-----

### To compress file.tar

    snzip file.tar

The compressed file is named `file.tar.sz`, and the original file is deleted.
File attributes such as the timestamp, mode and permissions are preserved
as far as possible.

The compressed file is in [framing-format][] by default. To use another format,
add a `-t` option such as `-t snappy-java` or `-t snappy-in-java`.

    snzip -t snappy-java file.tar

or

    snzip -t snappy-in-java file.tar

### To compress file.tar and write to standard output

    snzip -c file.tar > file.tar.sz

or

    cat file.tar | snzip > file.tar.sz

Add `-t [format-name]` to use a format other than [framing-format][].

### To create a new tar file and compress it

    tar cf - files-to-be-archived | snzip > archive.tar.sz

### To uncompress file.tar.sz

    snzip -d file.tar.sz

or

    snunzip file.tar.sz

The uncompressed file is named `file.tar`, and the original file is deleted.
File attributes such as the timestamp, mode and permissions are preserved
as far as possible.

If the program name contains `un`, as in `snunzip`, it behaves as if `-d` were given.

The file format is detected automatically from the file header.
However, detection doesn't work for some formats, such as raw and Apple iWork .iwa;
for these, specify the format with the `-t` option.

### To uncompress file.tar.sz and write to standard output

    snzip -dc file.tar.sz > file.tar
    snunzip -c file.tar.sz > file.tar
    snzcat file.tar.sz > file.tar
    cat file.tar.sz | snzcat > file.tar

If the program name contains `cat`, as in `snzcat`, it behaves as if `-dc` were given.

### To uncompress a compressed tar file and extract it

    snzip -dc archive.tar.sz | tar xf -

Raw format
----------

Raw format is the native format of snappy.
Unlike the other formats, it has a few limitations:

1. The total length of the data before compression must be known when compressing.
2. Automatic file format detection doesn't work when uncompressing.
3. Raw format support is enabled only when snzip is compiled against snappy 1.1.3 or later.

### To compress file.tar in raw format

    snzip -t raw file.tar

or

    snzip -t raw < file.tar > file.tar.raw

In these examples, snzip reads from a file descriptor that refers directly
to `file.tar`, so it can get the length of the data to be compressed.
However, the following command doesn't work.

    cat file.tar | snzip -t raw > file.tar.raw

Here the input comes through a pipe, so snzip cannot get the total length
before compression. In this case, the total length must be specified with
the `-s` option.

    cat file.tar | snzip -t raw -s "size of file.tar" > file.tar.raw

### To uncompress file.tar.sz compressed in raw format

    snzip -t raw -d file.tar.sz

or

    snunzip -t raw file.tar.sz

The `-t raw` option is required to tell snzip the format of the file to be
uncompressed, because the raw format cannot be detected automatically.

Hadoop-snappy format
--------------------

The hadoop-snappy format is one of the compression formats used in Hadoop.
It uses its own framing format:

* A compressed file consists of one or more blocks.
* A block consists of the uncompressed length (4-byte big-endian integer) followed by one or more subblocks.
* A subblock consists of the compressed length (4-byte big-endian integer) followed by raw snappy-compressed data.
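
The block/subblock layout above can be walked without a snappy library, because raw snappy data always begins with its own uncompressed length encoded as a little-endian base-128 varint. The following is a minimal Python sketch, not snzip's implementation, that lists the structure of a hadoop-snappy stream without decompressing it. The function names `read_varint`, `read_exact` and `walk_hadoop_snappy` are hypothetical, and the sketch assumes that the subblocks of a block are read until their uncompressed lengths add up to the block's uncompressed length.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def read_varint(data: bytes) -> int:
    """Decode the little-endian base-128 varint at the start of `data`.

    Raw snappy data starts with its uncompressed length encoded this way.
    """
    result = 0
    for i, byte in enumerate(data[:5]):  # a 32-bit value needs at most 5 bytes
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:             # high bit clear: last byte of the varint
            return result
    raise ValueError("invalid or truncated varint")


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise ValueError."""
    buf = stream.read(n)
    if len(buf) != n:
        raise ValueError("unexpected end of data")
    return buf


def walk_hadoop_snappy(stream: BinaryIO) -> List[Tuple[int, List[Tuple[int, int]]]]:
    """Return [(block_uncompressed_len, [(compressed_len, uncompressed_len), ...]), ...]."""
    blocks = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of file
        if len(header) != 4:
            raise ValueError("truncated block header")
        (block_len,) = struct.unpack(">I", header)  # big-endian 4-byte integer
        subblocks = []
        total = 0
        # Assumption: read subblocks until their uncompressed sizes reach block_len.
        while total < block_len:
            (comp_len,) = struct.unpack(">I", read_exact(stream, 4))
            payload = read_exact(stream, comp_len)  # raw snappy-compressed data
            raw_len = read_varint(payload)          # snappy's length preamble
            subblocks.append((comp_len, raw_len))
            total += raw_len
        if total != block_len:
            raise ValueError("subblock lengths do not match the block length")
        blocks.append((block_len, subblocks))
    return blocks


if __name__ == "__main__":
    # Hand-built raw snappy data for b"hello": varint length 5, then a literal
    # tag ((5 - 1) << 2 == 0x10) followed by the 5 literal bytes.
    payload = b"\x05\x10hello"
    data = struct.pack(">I", 5) + struct.pack(">I", len(payload)) + payload
    print(walk_hadoop_snappy(io.BytesIO(data)))  # [(5, [(7, 5)])]
```

Reading only the varint preamble keeps the sketch dependency-free; a real reader would pass each payload to a snappy decompressor instead.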

### To compress a file in hadoop-snappy format

    snzip -t hadoop-snappy file_name

The default block size used by `snzip` for the hadoop-snappy format is 256 KiB,
the same as the default value of Hadoop's `io.compression.codec.snappy.buffersize`
parameter. If the block size used by `snzip` is larger than this parameter,
you get an InternalError `Could not decompress data. Buffer length is too small`
while Hadoop is reading a file compressed by snzip. If you get this error, change
the block size with the `-b` option as follows.

    # if io.compression.codec.snappy.buffersize is 32768
    snzip -t hadoop-snappy -b 32768 file_name_to_be_compressed

### To uncompress a file compressed in hadoop-snappy format

    snzip -d compressed_file.snappy

The format is detected from the first 8 bytes of the file, so `-t` is not needed.

Apple iWork .iwa format
-----------------------

The .iwa format is the file format used by Apple iWork. It was reverse-engineered
in the [iWorkFileFormat](https://github.com/obriensp/iWorkFileFormat) project.
In essence, an .iwa file is a Protobuf stream [compressed by Snappy](https://github.com/obriensp/iWorkFileFormat/blob/master/Docs/index.md#snappy-compression).

Snzip uncompresses .iwa files to Protobuf streams and compresses Protobuf streams
to .iwa files. Specify `-t iwa` for both compression and uncompression, because
the format is not detected automatically.

SNZ File format
---------------

Note: This format is obsolete. The default format was changed to [framing-format][] in 1.0.0.

The first three bytes are the magic characters `SNZ`.

The fourth byte is the file format version, `0x01`.

The fifth byte is the base-2 logarithm of the block size. The input data
is divided into fixed-length blocks, and each block is compressed
by snappy. When the fifth byte is 16 (the default), the block size is
2^16 bytes (64 KiB).
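
The five header bytes described above can be checked in a few lines. This is a minimal Python sketch based only on the description in this section, not on snzip's source; the function name `parse_snz_header` and the constants are hypothetical, and it does not enforce any upper bound on the block-size exponent.

```python
from typing import Tuple

SNZ_MAGIC = b"SNZ"   # first three bytes
SNZ_VERSION = 0x01   # fourth byte


def parse_snz_header(data: bytes) -> Tuple[int, int]:
    """Return (version, block_size) from the first five bytes of an SNZ file."""
    if len(data) < 5:
        raise ValueError("SNZ header must be 5 bytes")
    if data[:3] != SNZ_MAGIC:
        raise ValueError("not an SNZ file: bad magic")
    version = data[3]
    if version != SNZ_VERSION:
        raise ValueError(f"unsupported SNZ version {version:#04x}")
    order = data[4]               # fifth byte: log2 of the block size
    return version, 1 << order    # e.g. 16 -> 65536 bytes (64 KiB)


if __name__ == "__main__":
    print(parse_snz_header(b"SNZ\x01\x10"))  # (1, 65536)
```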

The rest of the file is a sequence of pairs, each consisting of a compressed
data length and a compressed data block. The compressed data length is encoded
the way `snappy::Varint::Encode32()` encodes it (a little-endian base-128 varint).
A length of zero marks the end of the data.

Any data after the end-of-data marker is currently ignored, but in the future
it may be read as the next compressed file, as gzip does.

Note that the uncompressed length of each compressed data block must be
less than or equal to the block size specified by the fifth byte.

License
-------

2-clause BSD-style license.

[snappy]: https://github.com/google/snappy/blob/master/docs/README.md
[framing-format]: https://github.com/google/snappy/blob/master/framing_format.txt
[old framing-format]: https://github.com/google/snappy/blob/0755c815197dacc77d8971ae917c86d7aa96bf8e/framing_format.txt
[snappy-java]: https://github.com/xerial/snappy-java
[snappy-in-java]: https://github.com/dain/snappy
[raw format]: https://github.com/kubo/snzip#raw-format
[Hadoop-snappy format]: https://github.com/kubo/snzip#hadoop-snappy-format