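| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| cmake/ | 19-Jun-2020 | - | 1,554 | 1,310 |
| images/ | 03-May-2022 | - | | |
| packaging/ | 03-May-2022 | - | 153 | 116 |
| python/ | 03-May-2022 | - | 628 | 447 |
| test_data/ | 03-May-2022 | - | | |
| third_party/ | 19-Jun-2020 | - | 26,826 | 20,116 |
| vbz/ | 03-May-2022 | - | 2,830 | 2,279 |
| vbz_plugin/ | 03-May-2022 | - | 1,123 | 847 |
| .gitmodules | 19-Jun-2020 | 119 | 4 | 3 |
| README.md | 19-Jun-2020 | 2.8 KiB | 78 | 54 |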

README.md

![Oxford Nanopore Technologies logo](images/ONT_logo_590x106.png)

VBZ Compression
===============

VBZ Compression uses variable byte integer encoding to compress nanopore signal data and is built using the following libraries:

  - https://github.com/lemire/streamvbyte
  - https://github.com/facebook/zstd

VBZ achieves its performance by exploiting the properties of the raw signal, so it is most effective when applied to the signal dataset. Other datasets in your Fast5 files will not benefit from the default VBZ compression settings. VBZ will be used as the default compression scheme in a future release of MinKNOW.
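
To make the idea of variable byte integer encoding concrete, here is a minimal Python sketch. It is an illustration only, not VBZ's actual on-disk format: it uses a simple 7-bits-per-byte varint rather than the streamvbyte layout, it skips the zstd stage, and it assumes (this is not stated above) that samples are stored as differences from the previous sample so that the encoded values stay small. The zig zag step corresponds to the option of the same name in the `h5repack` examples in the Installation section. All function names are hypothetical.

```python
def zigzag_encode(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)


def zigzag_decode(value: int) -> int:
    """Inverse of zigzag_encode."""
    return (value >> 1) if value % 2 == 0 else -((value + 1) >> 1)


def varint_encode(values) -> bytes:
    """Store each unsigned integer in as few bytes as possible, 7 bits per byte."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # high bit set: more bytes follow
            v >>= 7
        out.append(v)
    return bytes(out)


def varint_decode(data: bytes) -> list:
    """Inverse of varint_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress(samples) -> bytes:
    # Assumption for illustration: neighbouring samples are close, so store deltas.
    deltas = [cur - prev for prev, cur in zip([0] + list(samples), samples)]
    return varint_encode(zigzag_encode(d) for d in deltas)


def decompress(data: bytes) -> list:
    samples, previous = [], 0
    for delta in (zigzag_decode(v) for v in varint_decode(data)):
        previous += delta
        samples.append(previous)
    return samples


signal = [500, 503, 498, 498, 510]
packed = compress(signal)
assert decompress(packed) == signal
print(len(packed), "bytes instead of", 2 * len(signal))
```

In this toy example the five samples pack into 6 bytes, compared with 10 bytes when each is stored as a 2-byte integer. Only the first value needs two bytes, because the remaining deltas are small.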

Installation
------------

See the [releases](https://github.com/nanoporetech/vbz_compression/releases) section to find the installers for the HDF5 plugin.

After installation, you can use `HDFView`, `h5repack` or `h5py` as you normally would:

```bash
# Invoke h5repack to pack input.fast5 into output.fast5
#
# The integer values specify how the data is packed:
#   - 32020: The id of the filter to apply (vbz in this case)
#   - 5: The number of following arguments
#   - 0: Filter flag for configuring filter version
#   - 0: Padding value for configuring filter version
#   - 2: Packing integers of size 2 bytes
#   - 1: Use zig zag encoding
#   - 1: Use zstd compression level 1
h5repack -f UD=32020,5,0,0,2,1,1 input.fast5 output.fast5

# To compress 4 byte unsigned integers (no zig zag) with level 3 zstd you could use:
h5repack -f UD=32020,5,0,0,4,0,3 input.h5 output.h5

# Invoke h5repack recursively on all reads using 10 processes
find . -name "*.fast5" | xargs -P 10 -I % h5repack -f UD=32020,5,0,0,2,1,1 % %.vbz

# Invoke h5repack recursively on all reads, storing the results in place, using 10 processes
find . -name "*.fast5" | xargs -P 10 -I % sh -c "h5repack -f UD=32020,5,0,0,2,1,1 % %.vbz && mv %.vbz %"
```
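
The `UD=` argument in the commands above follows a fixed pattern, so it can be generated rather than typed by hand. The sketch below is a small Python helper built only from the fields listed in the comments above. The names `vbz_filter_spec` and `h5repack_command` are hypothetical and not part of the plugin.

```python
VBZ_FILTER_ID = 32020  # filter id quoted in the h5repack comments above


def vbz_filter_spec(integer_size: int, zigzag: bool, zstd_level: int) -> str:
    """Build the value passed to `h5repack -f` for the VBZ filter.

    Field order follows the README comments: two zeros for the filter
    version (flag and padding), then integer size in bytes, the zig zag
    switch, and the zstd compression level.
    """
    options = [0, 0, integer_size, int(zigzag), zstd_level]
    fields = [VBZ_FILTER_ID, len(options), *options]
    return "UD=" + ",".join(str(field) for field in fields)


def h5repack_command(source: str, destination: str, spec: str) -> list:
    """Return the argument list for one h5repack invocation (hypothetical helper)."""
    return ["h5repack", "-f", spec, source, destination]


spec = vbz_filter_spec(integer_size=2, zigzag=True, zstd_level=1)
print(" ".join(h5repack_command("input.fast5", "output.fast5", spec)))
```

The resulting list can be passed to `subprocess.run(..., check=True)`, which needs `h5repack` and the plugin to be installed on the machine.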

Benchmarks
----------

VBZ outperforms GZIP in both CPU time (more than 10x faster compression, more than 5x faster decompression) and compression ratio (more than 30% better).

![Compression Ratio](images/vbz_compression_ratio.png)
![Compression Performance](images/vbz_x86_compression.png)
![Decompression Performance](images/vbz_x86_decompression.png)


Development
-----------

To develop the plugin without Conan, you need the following installed:

- cmake 3.11 (https://cmake.org/)

and the following C++ dependencies:

- zstd development libraries available to cmake
- hdf5 development libraries available to cmake (required for testing)

The following Ubuntu packages provide these libraries:
  - libhdf5-dev
  - libzstd-dev

Then configure and build the project using:

```bash
git submodule update --init
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release -D ENABLE_CONAN=OFF -D ENABLE_PERF_TESTING=OFF -D ENABLE_PYTHON=OFF ..
make -j
```