Programs and scripts for automated testing of Zstandard
=======================================================

This directory contains the following programs and scripts:
- `datagen` : Synthetic and parametrizable data generator, for tests
- `fullbench` : Precisely measure speed for each zstd inner function
- `fuzzer` : Test tool, to check zstd integrity on target platform
- `paramgrill` : parameter tester for zstd
- `test-zstd-speed.py` : script for testing zstd speed difference between commits
- `test-zstd-versions.py` : compatibility test between zstd versions stored on GitHub (v0.1+)
- `zbufftest` : Test tool to check ZBUFF (a buffered streaming API) integrity
- `zstreamtest` : Fuzzer test tool for the zstd streaming API
- `legacy` : Test tool to check decoding of legacy zstd frames
- `decodecorpus` : Tool to generate valid Zstandard frames, for verifying decoder implementations


#### `test-zstd-versions.py` - script for testing zstd interoperability between versions

This script creates a `versionsTest` directory into which the zstd repository is cloned.
All tagged (released) versions of zstd are then compiled,
and interoperability between these zstd versions is checked.

#### `automated-benchmarking.py` - script for benchmarking zstd PRs against dev

This script benchmarks facebook:dev and the changes from pull requests made to zstd, comparing
them against facebook:dev to detect regressions. It currently runs on a dedicated
desktop machine for every pull request made to the zstd repo, but can also
be run on any machine via the command line interface.
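The regression check at the heart of this workflow can be sketched as a simple threshold comparison between the dev build's benchmark speed and the PR build's. This is an illustrative assumption about the approach, not the script's actual code; the function name and the 3% tolerance are made up for the example.

```python
# Minimal sketch of a speed-regression check, assuming a simple tolerance
# model. The real script compares full zstd benchmark runs between
# facebook:dev and the PR build; this only illustrates the idea.

def is_regression(dev_speed_mbps, pr_speed_mbps, tolerance=0.03):
    """Flag a regression when the PR build is more than `tolerance`
    (as a fraction) slower than the dev build."""
    return pr_speed_mbps < dev_speed_mbps * (1.0 - tolerance)

# A PR build at 480 MB/s vs dev at 500 MB/s is a 4% slowdown:
print(is_regression(500.0, 480.0))  # True: exceeds the 3% tolerance
print(is_regression(500.0, 495.0))  # False: within tolerance
```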

There are three modes of usage for this script: `fastmode` runs a minimal single
build comparison (between facebook:dev and facebook:release); `onetime` pulls all the current
pull requests from the zstd repo and compares facebook:dev to each of them once; `continuous`
keeps fetching pull requests from the zstd repo and runs benchmarks against facebook:dev.

```
Example usage: python automated_benchmarking.py
```

```
usage: automated_benchmarking.py [-h] [--directory DIRECTORY]
                                 [--levels LEVELS] [--iterations ITERATIONS]
                                 [--emails EMAILS] [--frequency FREQUENCY]
                                 [--mode MODE] [--dict DICT]

optional arguments:
  -h, --help            show this help message and exit
  --directory DIRECTORY
                        directory with files to benchmark
  --levels LEVELS       levels to test e.g. ('1,2,3')
  --iterations ITERATIONS
                        number of benchmark iterations to run
  --emails EMAILS       email addresses of people who will be alerted upon
                        regression. Only for continuous mode
  --frequency FREQUENCY
                        specifies the number of seconds to wait before each
                        successive check for new PRs in continuous mode
  --mode MODE           'fastmode', 'onetime', 'current', or 'continuous' (see
                        README.md for details)
  --dict DICT           filename of dictionary to use (when set, this
                        dictionary will be used to compress the files provided
                        inside --directory)
```

#### `test-zstd-speed.py` - script for testing zstd speed difference between commits

DEPRECATED

This script creates a `speedTest` directory into which the zstd repository is cloned.
It then compiles all branches of zstd and performs a speed benchmark for a given list of files (the `testFileNames` parameter).
After `sleepTime` seconds (an optional parameter, default 300), the script checks the repository for new commits.
If a new commit is found, it is compiled and a speed benchmark for this commit is performed.
The results of the speed benchmark are compared to the previous results.
If compression or decompression speed for one of the zstd levels is lower than `lowerLimit` (an optional parameter, default 0.98), the speed benchmark is restarted.
If the second results are also lower than `lowerLimit`, a warning e-mail is sent to the recipients from the list (the `emails` parameter).

Additional remarks:
- To be sure that speed results are accurate, the script should be run on a "stable" target system with no other jobs running in parallel
- Using the script with virtual machines can lead to large variations in speed results
- The speed benchmark is not performed until the computer's load average is lower than `maxLoadAvg` (an optional parameter, default 0.75)
- The script sends e-mails using `mutt`; if `mutt` is not available it sends e-mails without attachments using `mail`; if neither is available it only prints a warning


Example usage with two test files, one e-mail address, and an additional message:
```
./test-zstd-speed.py "silesia.tar calgary.tar" "email@gmail.com" --message "tested on my laptop" --sleepTime 60
```

To run the script in the background, use:
```
nohup ./test-zstd-speed.py testFileNames emails &
```

The full list of parameters:
```
positional arguments:
  testFileNames         file names list for speed benchmark
  emails                list of e-mail addresses to send warnings

optional arguments:
  -h, --help            show this help message and exit
  --message MESSAGE     attach an additional message to e-mail
  --lowerLimit LOWERLIMIT
                        send email if speed is lower than given limit
  --maxLoadAvg MAXLOADAVG
                        maximum load average to start testing
  --lastCLevel LASTCLEVEL
                        last compression level for testing
  --sleepTime SLEEPTIME
                        frequency of repository checking in seconds
```

#### `decodecorpus` - tool to generate Zstandard frames for decoder testing
Command line tool to generate test .zst files.

This tool will generate .zst files with checksums,
and can optionally output the corresponding correct uncompressed data for
extra verification.

Example:
```
./decodecorpus -ptestfiles -otestfiles -n10000 -s5
```
will generate 10,000 sample .zst files using a seed of 5 in the `testfiles` directory,
with the zstd checksum field set,
as well as the 10,000 original files for more detailed comparison of decompression results.

```
./decodecorpus -t -T1mn
```
will choose a random seed, and for 1 minute
generate random test frames and ensure that the
zstd library correctly decompresses them in both simple and streaming modes.

#### `paramgrill` - tool for generating compression table parameters and optimizing parameters on a file given constraints

Full list of arguments:
```
 -T#          : set level 1 speed objective
 -B#          : cut input into blocks of size # (default : single block)
 -S           : benchmarks a single run (example command: -Sl3w10h12)
    w# - windowLog
    h# - hashLog
    c# - chainLog
    s# - searchLog
    l# - minMatch
    t# - targetLength
    S# - strategy
    L# - level
 --zstd=      : Single run, parameter selection syntax same as zstdcli with more parameters
                (Added forceAttachDictionary / fadt)
                When invoked with --optimize, this represents the sample to exceed.
 --optimize=  : find parameters to maximize compression ratio given parameters
                Can use all --zstd= commands to constrain the type of solution found, in addition to the following constraints:
    cSpeed=   : Minimum compression speed
    dSpeed=   : Minimum decompression speed
    cMem=     : Maximum compression memory
    lvl=      : Searches for solutions which are strictly better than that compression lvl in ratio and cSpeed
    stc=      : When invoked with lvl=, represents percentage slack in ratio/cSpeed allowed for a solution to be considered (Default 100%)
              : In normal operation, represents percentage slack in choosing a viable starting strategy when choosing the default parameters
                (Lower values will begin with stronger strategies) (Default 90%)
    speedRatio= (accepts decimals)
              : determines the value of gains in speed vs gains in ratio
                when determining the overall winner (default 5 (1% ratio = 5% speed))
    tries=    : Maximum number of random restarts on a single strategy before switching (Default 5)
                Higher values will make the optimizer run longer, with more chances to find a better solution
    memLog=   : Limits the log of the size of each memotable (1 per strategy). Will use hash tables when the state space is larger than the max size.
                Setting memLog = 0 turns off memoization
 --display=   : specify which parameters are included in the output
                Can use all --zstd= parameter names, and 'cParams' as a shorthand for all parameters used in ZSTD_compressionParameters
                (Default: display all params available)
 -P#          : generated sample compressibility (when no file is provided)
 -t#          : Caps runtime of operation in seconds (default: 99999 seconds (about 27 hours))
 -v           : Prints benchmarking output
 -D           : Next argument is a dictionary file
 -s           : Benchmark all files separately
 -q           : Quiet, repeat for more quiet
                -q    Prints parameters + results whenever a new best is found
                -qq   Only prints parameters whenever a new best is found, prints final parameters + results
                -qqq  Only prints final parameters + results
                -qqqq Only prints final parameter set in the form --zstd=
 -v           : Verbose, cancels quiet, repeat for more volume
                -v    Prints all candidate parameters and results

```
Any inputs afterwards are treated as files to benchmark.
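The `speedRatio=` weighting described above (default 5, i.e. a 1% ratio gain is worth a 5% speed gain) can be illustrated with a toy scoring function. This is only a sketch of the documented trade-off, not paramgrill's internal scoring code; the function name and signature are assumptions.

```python
# Toy illustration of the speedRatio= trade-off: with speed_ratio = 5,
# a 1% gain in compression ratio is weighted the same as a 5% gain in
# speed when picking an overall winner. A sketch, not paramgrill's code.

def score(ratio_gain_pct, speed_gain_pct, speed_ratio=5.0):
    """Combine percentage gains: ratio gains count speed_ratio times
    as much as speed gains."""
    return ratio_gain_pct * speed_ratio + speed_gain_pct

# Candidate A gains +1% ratio; candidate B gains +5% speed.
# With the default weighting the two candidates tie:
print(score(1.0, 0.0))  # 5.0
print(score(0.0, 5.0))  # 5.0
# Lowering speed_ratio makes speed gains relatively more valuable:
print(score(1.0, 0.0, speed_ratio=2.0) > score(0.0, 5.0, speed_ratio=2.0))  # False
```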