# Change History

**3.0.0**
- Implemented TLSH.
- Updated to build with CMake.

**3.0.1**
- Enabled C++ optimization. Runs 4x faster.

**3.0.2**
- Supports Windows and Visual Studio.

**3.0.3**
- Added Python extension library. TLSH is callable in Python.
- Stop generating a hash if the input is less than 512 bytes.
- Cleaned up.

**3.0.4**
- Length difference consideration can be disabled in this version. See `totalDiff` in `tlsh.h`.
- TLSH can be compiled to generate the 70 or 134 character hashes. The longer version is more accurate.

**3.1.0**
- The checksum can be changed from 1 byte to 3 bytes. The collision rate is lower using 3 bytes.
- If the incoming data has few features, the algorithm will not generate a hash value. At least half the buckets must be non-zero.
- Comparison of null or invalid hash strings will return `-EINVAL` (-22).
- Python extension library will read `CMakeLists.txt` to pick the compile options.
- The default build will use half the buckets and 1 byte checksum.
- New executable `tlsh_version` reports number of buckets, checksum length.

**3.1.1**
- Add `make.sh` and `clean.sh` scripts for building/cleaning the project.
- Modifications to `tlsh_unittest.cpp` to write errors to stderr (not stdout) and to continue processing in some error cases. Also handle a listfile (`-l` parameter) which contains both TLSH and filename.
- Updated expected output files based on changes to `tlsh_unittest.cpp`.

**3.1.2**
- Updated the Testing/exp expected results.
- Created a script to ease the creation of the Testing/exp expected results.

**3.1.3**
- Updated `tlsh_util.h`, `tlsh_impl.cpp`, `tlsh_util.cpp` on checksum.
- Updated `destroy_refersh_exp.sh` and Testing/exp results.

**3.2.0**
- Add Visual Studio 2005 and 2008 project and solution files to enable build on Windows environment.
- Added files `WinFunctions.h` and `WinFunctions.cpp` to handle code changes needed for Windows build.
- Modified several unit test expected output files to remove error messages, to allow the running of unit tests on Windows under Cygwin. This was caused by the opposite order in which stdout and stderr are written when stderr is redirected to stdout as `2>&1`. Also modified `test.sh` to write stderr to `/dev/null`.
- Move `rand_tags` executable from tlsh_forest project to tlsh, to reduce the dependencies of the tlsh ROC analysis project, which depends upon `tlsh_unittest` and `rand_tags`.
- Remove `simple_unittest` and `tlsh_version` from bin directory as these executables are for internal testing and source code documentation, and do not need to be exported.
- Add -version flag to `tlsh_unittest` to get the version of the tlsh library.

**3.2.1**
- Pick up fix to `hash_py()` in `py_ext/tlshmodule.cpp` (commit da5370bcfdd40dd6a33c877ee87fe3866188cf2d).

**3.3.0**
- Made the minimum data length = 256 for the C version.
  This was reduced in version 3.5.0 to 50 bytes (the force option)

**3.3.1**
- Fixed bug introduced by commit 1a8f1c581c8b988ced683ff8e0a0f9c574058df4 which caused a different hash value to be generated if there were multiple calls to `Tlsh::update` as opposed to a single call.

**3.4.0**
- Add JavaScript implementation (see directory `js_ext`) - required for [Blackhat presentation](https://www.blackhat.com/us-15/speakers/Sean-Park.html).
- Modify `tlsh_unittest` so that it can output tlsh values and filenames correctly, when the filenames contain embedded newline, linefeed or tab characters.

**3.4.1**
- Thanks to Jeremy Bobbio's `py_ext` patch, TLSH has these enhancements:
- Instead of using a big memory blob, it will calculate the hash incrementally.
- A hashlib-like object-oriented interface has been added to the Python module. See `test.py`.
- Restrict the function to be fed bytes-like objects, to remove surprises like silent UTF-8 decoding.

**3.4.2**
- Back out python regression test as part of the test.sh script, so that the python module does not need to be installed in order to successfully pass the tests run by make.sh

**3.4.3**
- Fix regression tests running on Windows

**3.4.4**
- Specify Tlsh::getHash() is a const method

**3.4.5**
- Pick up Jeremy Bobbio's patches for:
  - Build shared library (libtlsh.so), in addition to static library, on Linux and have tlsh_unittest link to it.
  - Remove TlshImpl symbols from libtlsh.so
  - Add Tlsh_init to py_ext/tlshmodule.cpp, which ensures Tlsh constructor will be called from Tlsh python module
  - Create symbolic link for tlsh -> tlsh_unittest

**3.5.0**
- Added the -force option
  - Allows a user to force the generation of digests for strings down to 50 characters long

**3.5.1**
- Fixed the error in the Python extension

**3.5.2**
- Added the BlackHat Asia tool (presented at Arsenal)

**3.6.0**
- skipped

**3.7.0**
- merged in various fixes - ifdef for SPARC and RH73
- corrected TLSH_CTC_final.pdf (see https://github.com/trendmicro/tlsh/issues/31)
- added a SHA1 to the NOTICE.txt file
- improved the make.sh so that it calls the test.sh (and does regression tests)
- improved regression tests to confirm that the hash is calculated correctly in your environment
- fixed the header file C++ standard violation (reserved identifier violation #21)

**3.7.1**
- resolved issue #29 - the force option for Python
  - Step 1 - adding a regression test for strings of length approx 50
  - Step 2 - add python code

**3.7.2**
- added code to set the distance parameters for ROC analysis.
  To use these settings, change `set(TLSH_DISTANCE_PARAMETERS 0)` to `set(TLSH_DISTANCE_PARAMETERS 1)` in `CMakeLists.txt`.

**3.7.3**
- resolving issue #44
- making static library the default

**3.7.4**
- resolving issue #45
- add a timing test for TLSH
<PRE>
 $ bin/timing_unittest
 build a buffer with a million bytes...
 eval TLSH 50 times...
 TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
 BEFORE ms=1502905567631
 AFTER ms=1502905573523
 TIME ms=5892
 TIME ms=117 per iteration
</PRE>

**3.7.5**
- resolving issue #46
- in `include/tlsh_impl.h`, `#define SLIDING_WND_SIZE 5` can be varied between 4 and 8

**3.8.0**
Adding access functions - required by tools using TLSH library
+ `int Lvalue();`
+ `int Q1ratio();`
+ `int Q2ratio();`

**3.9.0**
resolving issue #48 - tlsh_pattern program.
This tlsh_pattern program should read a pattern file
+ col 1: pattern number
+ col 2: nitems in group
+ col 3: TLSH
+ col 4: radius
+ col 5: pattern label

The input options should match the tlsh program
<PRE>
usage: tlsh_pattern [-xlen] [-force] -pat pattern_file -f file
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -d digest
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -r dir
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -l listfile
</PRE>

**3.9.1**
resolving issue #38:
putting in fix in rand_tags.cpp so that it generates identical output to previous version
while safely working with pointers

**3.9.2**
<PRE>
18/Mar/2019
Also merged the contents of NOTICE.txt into LICENSE.
This was done because NOTICE.txt is sometimes accidentally removed when people clone this repository.
And the LICENSE specifically states that NOTICE.txt should NOT be removed.

Also added command line option -notice which displays the NOTICE.txt file
</PRE>

**3.9.3**
<PRE>
19/Mar/2019
currently tlsh_pattern returns all the matches
modify tlsh_pattern to return the best match

remove the newline from the input fields when reading in the tlsh_pattern file
</PRE>

**3.9.4**
<PRE>
19/Mar/2019
check in order_bug program which demonstrates issue #50
resolved issue #50 - added code to tlsh_impl.cpp to check for invalid call sequences to update() and final()
</PRE>

**3.9.5**
<PRE>
19/Mar/2019
issue #61: added a command line option -notest - do not do any testing
    ./make.sh -notest
</PRE>

**3.9.6**
<PRE>
19/Mar/2019
Have a cmake option to build tlsh with a zero byte checksum (development / research option)
Default build has 1 byte checksum - which is strongly recommended
</PRE>

**3.9.7**
<PRE>
19/Mar/2019
resolving issue #50 for bin/timing_unittest
</PRE>

**3.9.8**
<PRE>
19/Mar/2019
timing_unittest measures the time taken to do distance calculations
add a command line option -size - so that you can measure the time taken to evaluate different sizes of string
</PRE>

**3.9.9**
<PRE>
19/Mar/2019
resolve issue #62
remove dependency on GNUInstallDirs
</PRE>

**3.10.0**
<PRE>
19/Mar/2019
Adding access functions - required by tools using TLSH library
    int BucketValue(int bucket);
    int Checksum(int k);
</PRE>

**3.11.0**
<PRE>
19/Mar/2019
Make calculation of TLSH digests approx 7 times faster (for large files)
done by
 - inline functions
 - unrolling loops
 - fixing the -O2 optimization option

<H3>Timing on Amazon linux</H3>
# "Amazon Linux 2 AMI (HVM), SSD Volume Type"
# Description: Amazon Linux 2 comes with five years support. It provides Linux kernel 4.14 tuned for optimal performance
# on Amazon EC2, systemd 219, GCC 7.3, Glibc 2.26, Binutils 2.29.1, and the latest software packages through extras.

BEFORE
$ ./tlsh_3_10_0/bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.9.9 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE ms=1552963428277
AFTER ms=1552963433258
TIME ms=4981
TIME ms=99 per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE ms=1552963433362
AFTER ms=1552963440723
TIME ms=7361
TIME ms=147 per million iterations

AFTER
$ ./tlsh_3_11_0/bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.11.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE ms=1552963419037
AFTER ms=1552963419628
TIME ms=591
TIME ms=11 per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE ms=1552963419642
AFTER ms=1552963421519
TIME ms=1877
TIME ms=37 per million iterations


<H3>Timing on a Mac (Processor 3.1 GHz - running Sierra)</H3>

BEFORE using tlsh_3_10_0
$ bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.10.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE ms=1552963383885
AFTER ms=1552963387866
TIME ms=3981
TIME ms=79 per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE ms=1552963387951
AFTER ms=1552963392498
TIME ms=4547
TIME ms=90 per million iterations

AFTER using tlsh_3_11_0
$ bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.11.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE ms=1552963360177
AFTER ms=1552963360791
TIME ms=614
TIME ms=12 per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE ms=1552963360808
AFTER ms=1552963365502
TIME ms=4694
TIME ms=93 per million iterations
</PRE>

**3.11.1**
<PRE>
31/May/2019
tidy up:
1. use fast_b_mapping() instead of b_mapping()
2. remove declaration of unsigned r which is never used
3. remove #include which is not required
</PRE>

**3.12.0**
<PRE>
31/May/2019
remove floating point calculations such as log() function
use a lookup table instead
</PRE>

**3.13.0**
<PRE>
31/May/2019
.vcproj files and instructions for building TLSH on Windows using Visual Studio
Thanks Jayson Pryde! :-)
</PRE>

**3.13.1**
<PRE>
31/May/2019
fixing setup.py so that you can install Python Extension on Windows
</PRE>

**3.14.0**
<PRE>
18/July/2019
adding sliding window size to tlsh_version
changing test.sh to read the sliding window size
</PRE>

**3.14.1**
<PRE>
18/July/2019
fixing error in test script for -xlen option (print statements about considering length were incorrect)
improved test.sh - tests for existence of expected output files
</PRE>

**3.15.0**
<PRE>
19/July/2019
Refactor code - so that input of directory or digest is in a struct.
The code to process input is in library code (input_desc.cpp, shared_file_functions.cpp).
The input routines can be used by multiple programs.
Also, preparing for things like csv input files.
</PRE>

**3.15.1**
<PRE>
19/July/2019
added command line option -help to show full help information
</PRE>

**3.15.2**
<PRE>
19/July/2019
tlsh_pattern uses refactored code introduced in 3.15.0
</PRE>

**3.16.0**
<PRE>
19/July/2019
improved tlsh_pattern functionality
usage: tlsh_pattern -f <file> [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -d <digest> [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -r <dir> [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -l <listfile> [-l1|-l2|-lcsv] [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -version: prints version of tlsh library
add options
- to have different columns of a listfile be processed (-l1 or -l2)
- to allow a listfile to be in CSV format (-lcsv)
- to show misses up to threshold T (-showmiss T)
added regression tests for tlsh_pattern
</PRE>

**3.16.1**
<PRE>
19/July/2019
improved tlsh functionality
add options
    -out_fname:     Specifies that only the filename is output when using the -r option (no path included in output)
    -out_dirname    Specifies that the dirname and filename are output when using the -r option (no path included in output)

    -l1 (default)   listfile contains TLSH value in column 1
    -l2             listfile contains TLSH value in column 2
    -lcsv           listfile is csv (comma separated) file (default is TAB separated file)

    -split linenumbers: linenumbers is a comma separated list of line numbers (example 50,100,200 )
                    split the file into components and eval the TLSH for each component
                    example. -split 50,100,200 evals 4 TLSH digests. lines 1-49, 50-99, 100-199, 200-end
                    for the purpose of splitting the file, each line has a max length of 2048 bytes
</PRE>

**3.16.2**
<PRE>
19/July/2019
added regression tests for 3.16.1
by adding tests for -split, swapping columns in input files, and for CSV input file
</PRE>

**3.17.0**
<PRE>
19/July/2019
Make command line option -force (50 character limit) the default behaviour
Add a command line option -conservative (256 character limit)
change the force_option parameter to be a bit field
    force_option == 0    Default (50 char limit)
    force_option == 1    Force behaviour (50 char limit)
    force_option == 2    conservative behaviour (256 char limit)
</PRE>

**3.17.3**
<PRE>
24/March/2020
add checking to confirm that TLSH digests are the correct length in
    -c option
    -d option
    the appropriate column of -l listfile options
</PRE>

**3.18.0**
<PRE>
24/March/2020
resolve issue #72 - remove tlsh_version
</PRE>

**3.19.0**
<PRE>
24/March/2020
preparation for Windows build
- remove ../Testing/ from test.sh script and from regression test results
</PRE>

**3.19.1**
<PRE>
25/March/2020
in test.sh and testlen.sh - make TLSH_PROG a variable
</PRE>

**4.0.0**
<PRE>
26/March/2020
version 4: adding version identifier to each digest: 'T1'
    adding command line option -old to generate old style digests
    In this version - the showvers is defaulted to off - so this will pass the old regression tests
</PRE>

**4.0.1**
<PRE>
26/March/2020
version 4: adding version identifier to each digest: 'T1'
    turning on T1 functionality by setting showvers=1 in main
    updating regression tests to have T1 at the start of digests
</PRE>

**4.1.0**
<PRE>
26/March/2020
    adding -o option for output filename (output will go to stdout if no output file given)
    changed test scripts to use -o option
    adding -ojson option for json output
    added regression test for -ojson option
    adding -onull option to output empty files / files too small as TNULL
</PRE>

**4.2.0**
<PRE>
26/March/2020
    Windows version using MinGW
</PRE>

**4.2.1**
<PRE>
27/March/2020
    resolve issue #78 json objects do not validate on windows
</PRE>

**4.2.2**
<PRE>
17/April/2020
    resolve issue #81
    Pass regression tests
</PRE>

**4.2.3**
<PRE>
22/April/2020
    add regression tests that are compatible with
        https://github.com/glaslos/tlsh
    To use
        $ cd Testing
        edit tlsh_go script to set prog= your Go TLSH application
        $ ./test.sh _go
</PRE>

**4.3.0**
<PRE>
26/June/2020
    issue #79 - divide by 0 if q3 == 0
    solution. if (q3 == 0) return invalid hash
</PRE>

**4.4.0**
<PRE>
08/Nov/2020
    Fixing Python Extension
    - updated python extension to T1 hashes (4.0.0)
    - fixed python_test.sh (which attempted to access old expected results files)
      now passes test
    - added license information to py_ext/tlshmodule.cpp
</PRE>

**4.4.1**
<PRE>
09/Dec/2020
    Command line options to tlsh_digest.py
        -conservative   enforce 256 byte limit
        -old            generate old style hash (without "T1")
    added python functions to tlsh package (for backwards compatibility)
        tlsh.oldhash(data)
        tlsh.conservativehash(data)
        tlsh.oldconservativehash(data)
</PRE>

**4.5.0**
<PRE>
10/Dec/2020
    Checking in files to create pypi package
</PRE>

**4.6.0**
<PRE>
23/04/2021
    Merging in pull requests
    issue #99 - new Java version that solves large file problem (Thanks Daniel)
    Add architecture ppc64le to travis build (Thanks ddeka2910)
    Fix tmpArray is undefined in JavaScript version (Thanks carbureted)
</PRE>

**4.7.0**
<PRE>
29/06/2021
    Release updated package py-tlsh on PyPI
    Merging in pull request that adds functions to Python package
        lvalue, q1ratio, q2ratio, checksum, bucket_value and is_valid
    resolve issue #102 - correct Python version numbers
</PRE>

**4.7.2**
<PRE>
02/07/2021
    Release updated package py-tlsh on PyPI
    regression tests for C++ and Python functions for:
        lvalue, q1ratio, q2ratio, checksum, bucket_value
    resolve issue #95 - allow Requires-Python: >=2.7
</PRE>
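
The `-split` option described under 3.16.1 maps a list of line numbers to components: `-split 50,100,200` yields lines 1-49, 50-99, 100-199 and 200-end. The Python sketch below illustrates that mapping only. It is not the code used by the `tlsh` program; the function name `split_ranges` and the `total_lines` parameter are hypothetical, and the handling of split points beyond the end of the file is an assumption.

```python
def split_ranges(split_points, total_lines):
    """Return 1-based inclusive (first_line, last_line) pairs, one per component.

    Hypothetical helper: illustrates the line ranges described for -split,
    not the actual implementation in the tlsh program.
    """
    ranges = []
    start = 1
    for point in sorted(split_points):
        # A component ends on the line just before the next split point.
        end = min(point - 1, total_lines)
        if start <= end:
            ranges.append((start, end))
        start = point
    # The final component runs from the last split point to the end of the file.
    # Assumption: split points past the end of the file produce no component.
    if start <= total_lines:
        ranges.append((start, total_lines))
    return ranges


if __name__ == "__main__":
    # A 300-line file split at 50,100,200 gives four components.
    print(split_ranges([50, 100, 200], 300))
    # [(1, 49), (50, 99), (100, 199), (200, 300)]
```

Each range corresponds to one component for which a separate TLSH digest would be evaluated.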