• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

examples/H20-May-2020-594354

.gitignoreH A D20-May-2020272 2620

.travis.ymlH A D20-May-20201.5 KiB7870

LICENSEH A D20-May-20201.1 KiB2418

README.mdH A D20-May-202016.3 KiB396290

appveyor.ymlH A D20-May-2020333 2115

examples_test.goH A D20-May-20204.4 KiB210123

galois.goH A D20-May-2020437.3 KiB930871

galoisAvx512_amd64.goH A D20-May-202010.9 KiB339273

galoisAvx512_amd64.sH A D20-May-20209.7 KiB401296

galoisAvx512_amd64_test.goH A D20-May-202012.5 KiB417322

galois_amd64.goH A D20-May-20202.8 KiB13990

galois_amd64.sH A D20-May-202010 KiB369316

galois_arm64.goH A D20-May-20201.2 KiB6846

galois_arm64.sH A D20-May-20202.7 KiB12679

galois_gen_amd64.goH A D20-May-202014.7 KiB40981

galois_gen_amd64.sH A D20-May-2020394.9 KiB18,52716,680

galois_gen_none.goH A D20-May-2020259 127

galois_gen_switch_amd64.goH A D20-May-20205.9 KiB294282

galois_noasm.goH A D20-May-2020795 4533

galois_notamd64.goH A D20-May-2020492 147

galois_ppc64le.goH A D20-May-20201.4 KiB7642

galois_ppc64le.sH A D20-May-20202.6 KiB12593

galois_test.goH A D20-May-20206.4 KiB273236

gen.goH A D20-May-20206.1 KiB250199

gentables.goH A D20-May-20203.2 KiB133107

go.modH A D20-May-202097 85

go.sumH A D20-May-2020171 32

inversion_tree.goH A D20-May-20205.8 KiB16171

inversion_tree_test.goH A D20-May-20202.9 KiB126101

matrix.goH A D20-May-20206.7 KiB280204

matrix_test.goH A D20-May-20206 KiB218159

options.goH A D20-May-20204.6 KiB176109

reedsolomon.goH A D20-May-202030.2 KiB1,012618

reedsolomon_test.goH A D20-May-202036.9 KiB1,5591,276

streaming.goH A D20-May-202015.4 KiB604383

streaming_test.goH A D20-May-202015 KiB624515

README.md

1# Reed-Solomon
2[![GoDoc][1]][2] [![Build Status][3]][4]
3
4[1]: https://godoc.org/github.com/klauspost/reedsolomon?status.svg
5[2]: https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc
6[3]: https://travis-ci.org/klauspost/reedsolomon.svg?branch=master
7[4]: https://travis-ci.org/klauspost/reedsolomon
8
9Reed-Solomon Erasure Coding in Go, with speeds exceeding 1GB/s/cpu core implemented in pure Go.
10
11This is a Go port of the [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon) library released by
12[Backblaze](http://backblaze.com), with some additional optimizations.
13
14For an introduction on erasure coding, see the post on the [Backblaze blog](https://www.backblaze.com/blog/reed-solomon/).
15
16Package home: https://github.com/klauspost/reedsolomon
17
18Godoc: https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc
19
20# Installation
21To get the package use the standard:
22```bash
23go get -u github.com/klauspost/reedsolomon
24```
25
26# Changes
27
28## May 2020
29
30* ARM64 optimizations, up to 2.5x faster.
31* Added [WithFastOneParityMatrix](https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc#WithFastOneParityMatrix) for faster operation with 1 parity shard.
32* Much better performance when using a limited number of goroutines.
33* AVX512 is now using multiple cores.
34* Stream processing overhaul, big speedups in most cases.
35* AVX512 optimizations
36
37## March 6, 2019
38
39The pure Go implementation is about 30% faster. Minor tweaks to assembler implementations.
40
41## February 8, 2019
42
43AVX512 accelerated version added for Intel Skylake CPUs. This can give up to a 4x speed improvement as compared to AVX2.
44See [here](https://github.com/klauspost/reedsolomon#performance-on-avx512) for more details.
45
46## December 18, 2018
47
48Assembly code for ppc64le has been contributed, this boosts performance by about 10x on this platform.
49
50## November 18, 2017
51
52Added [WithAutoGoroutines](https://godoc.org/github.com/klauspost/reedsolomon#WithAutoGoroutines) which will attempt
53to calculate the optimal number of goroutines to use based on your expected shard size and detected CPU.
54
55## October 1, 2017
56
57* [Cauchy Matrix](https://godoc.org/github.com/klauspost/reedsolomon#WithCauchyMatrix) is now an option.
58Thanks to [templexxx](https://github.com/templexxx) for the basis of this.
59
60* Default maximum number of [goroutines](https://godoc.org/github.com/klauspost/reedsolomon#WithMaxGoroutines)
61has been increased for better multi-core scaling.
62
63* After several requests the Reconstruct and ReconstructData now slices of zero length but sufficient capacity to
64be used instead of allocating new memory.
65
66## August 26, 2017
67
68*  The [`Encoder()`](https://godoc.org/github.com/klauspost/reedsolomon#Encoder) now contains an `Update`
69function contributed by [chenzhongtao](https://github.com/chenzhongtao).
70
71* [Frank Wessels](https://github.com/fwessels) kindly contributed ARM 64 bit assembly,
72which gives a huge performance boost on this platform.
73
74## July 20, 2017
75
76`ReconstructData` added to [`Encoder`](https://godoc.org/github.com/klauspost/reedsolomon#Encoder) interface.
77This can cause compatibility issues if you implement your own Encoder. A simple workaround can be added:
78
79```Go
80func (e *YourEnc) ReconstructData(shards [][]byte) error {
81	return ReconstructData(shards)
82}
83```
84
85You can of course also do your own implementation.
86The [`StreamEncoder`](https://godoc.org/github.com/klauspost/reedsolomon#StreamEncoder)
87handles this without modifying the interface.
88This is a good lesson on why returning interfaces is not a good design.
89
90# Usage
91
92This section assumes you know the basics of Reed-Solomon encoding.
93A good start is this [Backblaze blog post](https://www.backblaze.com/blog/reed-solomon/).
94
95This package performs the calculation of the parity sets. The usage is therefore relatively simple.
96
97First of all, you need to choose your distribution of data and parity shards.
98A 'good' distribution is very subjective, and will depend a lot on your usage scenario.
99A good starting point is above 5 and below 257 data shards (the maximum supported number),
100and the number of parity shards to be 2 or above, and below the number of data shards.
101
102To create an encoder with 10 data shards (where your data goes) and 3 parity shards (calculated):
103```Go
104    enc, err := reedsolomon.New(10, 3)
105```
106This encoder will work for all parity sets with this distribution of data and parity shards.
107The error will only be set if you specify 0 or negative values in any of the parameters,
108or if you specify more than 256 data shards.
109
110If you will primarily be using it with one shard size it is recommended to use
111[`WithAutoGoroutines(shardSize)`](https://pkg.go.dev/github.com/klauspost/reedsolomon?tab=doc#WithAutoGoroutines)
112as an additional parameter. This will attempt to calculate the optimal number of goroutines to use for the best speed.
113It is not required that all shards are this size.
114
115The you send and receive data  is a simple slice of byte slices; `[][]byte`.
116In the example above, the top slice must have a length of 13.
117
118```Go
119    data := make([][]byte, 13)
120```
121You should then fill the 10 first slices with *equally sized* data,
122and create parity shards that will be populated with parity data. In this case we create the data in memory,
123but you could for instance also use [mmap](https://github.com/edsrzf/mmap-go) to map files.
124
125```Go
126    // Create all shards, size them at 50000 each
127    for i := range input {
128      data[i] := make([]byte, 50000)
129    }
130
131
132  // Fill some data into the data shards
133    for i, in := range data[:10] {
134      for j:= range in {
135         in[j] = byte((i+j)&0xff)
136      }
137    }
138```
139
140To populate the parity shards, you simply call `Encode()` with your data.
141```Go
142    err = enc.Encode(data)
143```
144The only cases where you should get an error is, if the data shards aren't of equal size.
145The last 3 shards now contain parity data. You can verify this by calling `Verify()`:
146
147```Go
148    ok, err = enc.Verify(data)
149```
150
151The final (and important) part is to be able to reconstruct missing shards.
152For this to work, you need to know which parts of your data is missing.
153The encoder *does not know which parts are invalid*, so if data corruption is a likely scenario,
154you need to implement a hash check for each shard.
155
156If a byte has changed in your set, and you don't know which it is, there is no way to reconstruct the data set.
157
158To indicate missing data, you set the shard to nil before calling `Reconstruct()`:
159
160```Go
161    // Delete two data shards
162    data[3] = nil
163    data[7] = nil
164
165    // Reconstruct the missing shards
166    err := enc.Reconstruct(data)
167```
168The missing data and parity shards will be recreated. If more than 3 shards are missing, the reconstruction will fail.
169
170If you are only interested in the data shards (for reading purposes) you can call `ReconstructData()`:
171
172```Go
173    // Delete two data shards
174    data[3] = nil
175    data[7] = nil
176
177    // Reconstruct just the missing data shards
178    err := enc.ReconstructData(data)
179```
180
181So to sum up reconstruction:
182* The number of data/parity shards must match the numbers used for encoding.
183* The order of shards must be the same as used when encoding.
184* You may only supply data you know is valid.
185* Invalid shards should be set to nil.
186
187For complete examples of an encoder and decoder see the
188[examples folder](https://github.com/klauspost/reedsolomon/tree/master/examples).
189
190# Splitting/Joining Data
191
192You might have a large slice of data.
193To help you split this, there are some helper functions that can split and join a single byte slice.
194
195```Go
196   bigfile, _ := ioutil.Readfile("myfile.data")
197
198   // Split the file
199   split, err := enc.Split(bigfile)
200```
201This will split the file into the number of data shards set when creating the encoder and create empty parity shards.
202
203An important thing to note is that you have to *keep track of the exact input size*.
204If the size of the input isn't divisible by the number of data shards, extra zeros will be inserted in the last shard.
205
206To join a data set, use the `Join()` function, which will join the shards and write it to the `io.Writer` you supply:
207```Go
208   // Join a data set and write it to io.Discard.
209   err = enc.Join(io.Discard, data, len(bigfile))
210```
211
212# Streaming/Merging
213
214It might seem like a limitation that all data should be in memory,
215but an important property is that *as long as the number of data/parity shards are the same,
216you can merge/split data sets*, and they will remain valid as a separate set.
217
218```Go
219    // Split the data set of 50000 elements into two of 25000
220    splitA := make([][]byte, 13)
221    splitB := make([][]byte, 13)
222
223    // Merge into a 100000 element set
224    merged := make([][]byte, 13)
225
226    for i := range data {
227      splitA[i] = data[i][:25000]
228      splitB[i] = data[i][25000:]
229
230      // Concatenate it to itself
231	  merged[i] = append(make([]byte, 0, len(data[i])*2), data[i]...)
232	  merged[i] = append(merged[i], data[i]...)
233    }
234
235    // Each part should still verify as ok.
236    ok, err := enc.Verify(splitA)
237    if ok && err == nil {
238        log.Println("splitA ok")
239    }
240
241    ok, err = enc.Verify(splitB)
242    if ok && err == nil {
243        log.Println("splitB ok")
244    }
245
246    ok, err = enc.Verify(merge)
247    if ok && err == nil {
248        log.Println("merge ok")
249    }
250```
251
252This means that if you have a data set that may not fit into memory, you can split processing into smaller blocks.
253For the best throughput, don't use too small blocks.
254
255This also means that you can divide big input up into smaller blocks, and do reconstruction on parts of your data.
256This doesn't give the same flexibility of a higher number of data shards, but it will be much more performant.
257
258# Streaming API
259
260There has been added support for a streaming API, to help perform fully streaming operations,
261which enables you to do the same operations, but on streams.
262To use the stream API, use [`NewStream`](https://godoc.org/github.com/klauspost/reedsolomon#NewStream) function
263to create the encoding/decoding interfaces.
264
265You can use [`WithConcurrentStreams`](https://godoc.org/github.com/klauspost/reedsolomon#WithConcurrentStreams)
266to ready an interface that reads/writes concurrently from the streams.
267
268You can specify the size of each operation using
269[`WithStreamBlockSize`](https://godoc.org/github.com/klauspost/reedsolomon#WithStreamBlockSize).
270This will set the size of each read/write operation.
271
272Input is delivered as `[]io.Reader`, output as `[]io.Writer`, and functionality corresponds to the in-memory API.
273Each stream must supply the same amount of data, similar to how each slice must be similar size with the in-memory API.
274If an error occurs in relation to a stream,
275a [`StreamReadError`](https://godoc.org/github.com/klauspost/reedsolomon#StreamReadError)
276or [`StreamWriteError`](https://godoc.org/github.com/klauspost/reedsolomon#StreamWriteError)
277will help you determine which stream was the offender.
278
279There is no buffering or timeouts/retry specified. If you want to add that, you need to add it to the Reader/Writer.
280
281For complete examples of a streaming encoder and decoder see the
282[examples folder](https://github.com/klauspost/reedsolomon/tree/master/examples).
283
284# Advanced Options
285
286You can modify internal options which affects how jobs are split between and processed by goroutines.
287
288To create options, use the WithXXX functions. You can supply options to `New`, `NewStream`.
289If no Options are supplied, default options are used.
290
291Example of how to supply options:
292
293 ```Go
294     enc, err := reedsolomon.New(10, 3, WithMaxGoroutines(25))
295 ```
296
297
298# Performance
299Performance depends mainly on the number of parity shards.
300In rough terms, doubling the number of parity shards will double the encoding time.
301
302Here are the throughput numbers with some different selections of data and parity shards.
303For reference each shard is 1MB random data, and 2 CPU cores are used for encoding.
304
305| Data | Parity | Parity | MB/s   | SSSE3 MB/s  | SSSE3 Speed | Rel. Speed |
306|------|--------|--------|--------|-------------|-------------|------------|
307| 5    | 2      | 40%    | 576,11 | 2599,2      | 451%        | 100,00%    |
308| 10   | 2      | 20%    | 587,73 | 3100,28     | 528%        | 102,02%    |
309| 10   | 4      | 40%    | 298,38 | 2470,97     | 828%        | 51,79%     |
310| 50   | 20     | 40%    | 59,81  | 713,28      | 1193%       | 10,38%     |
311
312If `runtime.GOMAXPROCS()` is set to a value higher than 1,
313the encoder will use multiple goroutines to perform the calculations in `Verify`, `Encode` and `Reconstruct`.
314
315Example of performance scaling on AMD Ryzen 3950X - 16 physical cores, 32 logical cores, AVX 2.
316The example uses 10 blocks with 1MB data each and 4 parity blocks.
317
318| Threads | Speed      |
319|---------|------------|
320| 1       | 9979 MB/s  |
321| 2       | 18870 MB/s |
322| 4       | 33697 MB/s |
323| 8       | 51531 MB/s |
324| 16      | 59204 MB/s |
325
326
327Benchmarking `Reconstruct()` followed by a `Verify()` (=`all`) versus just calling `ReconstructData()` (=`data`) gives the following result:
328```
329benchmark                            all MB/s     data MB/s    speedup
330BenchmarkReconstruct10x2x10000-8     2011.67      10530.10     5.23x
331BenchmarkReconstruct50x5x50000-8     4585.41      14301.60     3.12x
332BenchmarkReconstruct10x2x1M-8        8081.15      28216.41     3.49x
333BenchmarkReconstruct5x2x1M-8         5780.07      28015.37     4.85x
334BenchmarkReconstruct10x4x1M-8        4352.56      14367.61     3.30x
335BenchmarkReconstruct50x20x1M-8       1364.35      4189.79      3.07x
336BenchmarkReconstruct10x4x16M-8       1484.35      5779.53      3.89x
337```
338
339# Performance on AVX512
340
341The performance on AVX512 has been accelerated for Intel CPUs.
342This gives speedups on a per-core basis typically up to 2x compared to
343AVX2 as can be seen in the following table:
344
345```
346[...]
347```
348
349This speedup has been achieved by computing multiple parity blocks in parallel as opposed to one after the other.
350In doing so it is possible to minimize the memory bandwidth required for loading all data shards.
351At the same time the calculations are performed in the 512-bit wide ZMM registers and the surplus of ZMM
352registers (32 in total) is used to keep more data around (most notably the matrix coefficients).
353
354# Performance on ARM64 NEON
355
356By exploiting NEON instructions the performance for ARM has been accelerated.
357Below are the performance numbers for a single core on an EC2 m6g.16xlarge (Graviton2) instance (Amazon Linux 2):
358
359```
360BenchmarkGalois128K-64        119562     10028 ns/op        13070.78 MB/s
361BenchmarkGalois1M-64           14380     83424 ns/op        12569.22 MB/s
362BenchmarkGaloisXor128K-64      96508     12432 ns/op        10543.29 MB/s
363BenchmarkGaloisXor1M-64        10000    100322 ns/op        10452.13 MB/s
364```
365
366# Performance on ppc64le
367
368The performance for ppc64le has been accelerated.
369This gives roughly a 10x performance improvement on this architecture as can been seen below:
370
371```
372benchmark                      old MB/s     new MB/s     speedup
373BenchmarkGalois128K-160        948.87       8878.85      9.36x
374BenchmarkGalois1M-160          968.85       9041.92      9.33x
375BenchmarkGaloisXor128K-160     862.02       7905.00      9.17x
376BenchmarkGaloisXor1M-160       784.60       6296.65      8.03x
377```
378
379# asm2plan9s
380
381[asm2plan9s](https://github.com/fwessels/asm2plan9s) is used for assembling the AVX2 instructions into their BYTE/WORD/LONG equivalents.
382
383# Links
384* [Backblaze Open Sources Reed-Solomon Erasure Coding Source Code](https://www.backblaze.com/blog/reed-solomon/).
385* [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon). Compatible java library by Backblaze.
386* [ocaml-reed-solomon-erasure](https://gitlab.com/darrenldl/ocaml-reed-solomon-erasure). Compatible OCaml implementation.
387* [reedsolomon-c](https://github.com/jannson/reedsolomon-c). C version, compatible with output from this package.
388* [Reed-Solomon Erasure Coding in Haskell](https://github.com/NicolasT/reedsolomon). Haskell port of the package with similar performance.
389* [reed-solomon-erasure](https://github.com/darrenldl/reed-solomon-erasure). Compatible Rust implementation.
390* [go-erasure](https://github.com/somethingnew2-0/go-erasure). A similar library using cgo, slower in my tests.
391* [Screaming Fast Galois Field Arithmetic](http://www.snia.org/sites/default/files2/SDC2013/presentations/NewThinking/EthanMiller_Screaming_Fast_Galois_Field%20Arithmetic_SIMD%20Instructions.pdf). Basis for SSE3 optimizations.
392
393# License
394
395This code, as the original [JavaReedSolomon](https://github.com/Backblaze/JavaReedSolomon) is published under an MIT license. See LICENSE file for more information.
396