# Change History

**3.0.0**
- Implemented TLSH.
- Updated to build with CMake.

**3.0.1**
- Enabled C++ optimization. Runs 4x faster.

**3.0.2**
- Supports Windows and Visual Studio.

**3.0.3**
- Added Python extension library. TLSH is callable in Python.
- Stopped generating a hash if the input is less than 512 bytes.
- Cleaned up.

**3.0.4**
- Length difference consideration can be disabled in this version. See `totalDiff` in `tlsh.h`.
- TLSH can be compiled to generate the 70 or 134 character hashes. The longer version is more accurate.

**3.1.0**
- The checksum can be changed from 1 byte to 3 bytes. The collision rate is lower using 3 bytes.
- If the incoming data has too few features, the algorithm will not generate a hash value. At least half the buckets must be non-zero.
- Comparing null or invalid hash strings returns `-EINVAL` (-22).
- The Python extension library reads `CMakeLists.txt` to pick the compile options.
- The default build uses half the buckets and a 1 byte checksum.
- New executable `tlsh_version` reports the number of buckets and the checksum length.
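
The feature check described for 3.1.0 can be sketched roughly as follows. This is an illustration in Python, not the library's code: the function name `has_enough_features` and the list-of-counts representation are assumptions, the bucket count of 128 is chosen only for the example, and the threshold follows the wording above (at least half the buckets non-zero).

```python
def has_enough_features(buckets):
    """Return True when at least half of the bucket counts are non-zero.

    `buckets` is a hypothetical list of per-bucket counts; the real
    library keeps its buckets in an internal C++ structure.
    """
    nonzero = sum(1 for count in buckets if count != 0)
    return 2 * nonzero >= len(buckets)


# 128 buckets is an assumption made for this example only.
sparse = [0] * 100 + [3] * 28   # 28 non-zero buckets: too few features
dense = [0] * 60 + [3] * 68     # 68 non-zero buckets: enough features
print(has_enough_features(sparse))  # False
print(has_enough_features(dense))   # True
```

In this sketch, input that fails the check would simply produce no hash value.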

**3.1.1**
- Add `make.sh` and `clean.sh` scripts for building/cleaning the project.
- Modifications to `tlsh_unittest.cpp` to write errors to stderr (not stdout) and to continue processing in some error cases. Also handle a listfile (`-l` parameter) which contains both TLSH and filename.
- Updated expected output files based on changes to `tlsh_unittest.cpp`.

**3.1.2**
- Updated the Testing/exp expected results.
- Created a script to ease the creation of the Testing/exp expected results.

**3.1.3**
- Updated `tlsh_util.h`, `tlsh_impl.cpp`, `tlsh_util.cpp` on checksum.
- Updated `destroy_refersh_exp.sh` and Testing/exp results.

**3.2.0**
- Add Visual Studio 2005 and 2008 project and solution files to enable build on Windows environment.
- Added files `WinFunctions.h` and `WinFunctions.cpp` to handle code changes needed for Windows build.
- Modified several unit test expected output files to remove error messages, to allow the running of unit tests on Windows under Cygwin. The change was needed because stdout and stderr are written in the opposite order when stderr is redirected to stdout with `2>&1`. Also modified `test.sh` to write stderr to `/dev/null`.
- Move `rand_tags` executable from tlsh_forest project to tlsh, to reduce the dependencies of the tlsh ROC analysis project, which depends upon `tlsh_unittest` and `rand_tags`.
- Remove `simple_unittest` and `tlsh_version` from bin directory as these executables are for internal testing and source code documentation, and do not need to be exported.
- Add `-version` flag to `tlsh_unittest` to get the version of the tlsh library.

**3.2.1**
- Pick up fix to `hash_py()` in `py_ext/tlshmodule.cpp` (commit da5370bcfdd40dd6a33c877ee87fe3866188cf2d).

**3.3.0**
- Made the minimum data length = 256 bytes for the C version.
  - This was reduced in version 3.5.0 to 50 bytes (the force option).

**3.3.1**
- Fixed bug introduced by commit 1a8f1c581c8b988ced683ff8e0a0f9c574058df4 which caused a different hash value to be generated if there were multiple calls to `Tlsh::update` as opposed to a single call.

**3.4.0**
- Add JavaScript implementation (see directory `js_ext`) - required for [Blackhat presentation](https://www.blackhat.com/us-15/speakers/Sean-Park.html).
- Modify `tlsh_unittest` so that it can output tlsh values and filenames correctly, when the filenames contain embedded newline, linefeed or tab characters.

**3.4.1**
- Thanks to Jeremy Bobbio's `py_ext` patch, TLSH has these enhancements:
  - Instead of using a big memory blob, it calculates the hash incrementally.
  - A hashlib-like object-oriented interface has been added to the Python module. See `test.py`.
  - The function is restricted to bytes-like objects, to remove surprises like silent UTF-8 decoding.

**3.4.2**
- Back out the Python regression test from the `test.sh` script, so that the Python module does not need to be installed in order to pass the tests run by `make.sh`.

**3.4.3**
- Fix regression tests running on Windows.

**3.4.4**
- Specify that `Tlsh::getHash()` is a const method.

**3.4.5**
- Pick up Jeremy Bobbio's patches for:
  - Build shared library (`libtlsh.so`), in addition to static library, on Linux and have `tlsh_unittest` link to it.
  - Remove TlshImpl symbols from `libtlsh.so`.
  - Add `Tlsh_init` to `py_ext/tlshmodule.cpp`, which ensures the Tlsh constructor will be called from the Tlsh Python module.
  - Create symbolic link for tlsh -> tlsh_unittest.

**3.5.0**
- Added the `-force` option.
  - Allows a user to force the generation of digests for strings down to 50 characters long.

**3.5.1**
- Fixed the error in the Python extension.

**3.5.2**
- Added the BlackHat Asia tool (presented at Arsenal).

**3.6.0**
- Skipped.

**3.7.0**
- merged in various fixes:
  - ifdef for SPARC and RH73
  - corrected TLSH_CTC_final.pdf (see https://github.com/trendmicro/tlsh/issues/31)
  - added a SHA1 to the NOTICE.txt file
  - improved `make.sh` so that it calls `test.sh` (and does regression tests)
  - improved regression tests to confirm that the hash is calculated correctly in your environment
  - fixed the header file C++ standard violation (reserved identifier violation #21)

**3.7.1**
- resolved issue #29 - the force option for Python
  - Step 1 - add a regression test for strings of length approximately 50
  - Step 2 - add Python code

**3.7.2**
- added code to set the distance parameters for ROC analysis. To use these settings, change `set(TLSH_DISTANCE_PARAMETERS 0)` to `set(TLSH_DISTANCE_PARAMETERS 1)` in `CMakeLists.txt`.

**3.7.3**
- resolving issue #44
- making static library the default

**3.7.4**
- resolving issue #45
- add a timing test for TLSH
<PRE>
  $ bin/timing_unittest
  build a buffer with a million bytes...
  eval TLSH 50 times...
  TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
  BEFORE	ms=1502905567631
  AFTER	ms=1502905573523
  TIME	ms=5892
  TIME	ms=117	per iteration
</PRE>

**3.7.5**
- resolving issue #46
- in `include/tlsh_impl.h`, `#define SLIDING_WND_SIZE 5` can be varied between 4 and 8

**3.8.0**
- Added access functions, required by tools using the TLSH library:
  - `int Lvalue();`
  - `int Q1ratio();`
  - `int Q2ratio();`

**3.9.0**
- resolving issue #48 - `tlsh_pattern` program
- The `tlsh_pattern` program should read a pattern file with these columns:
  - col 1: pattern number
  - col 2: nitems in group
  - col 3: TLSH
  - col 4: radius
  - col 5: pattern label
- The input options should match the tlsh program.
<PRE>
usage: tlsh_pattern [-xlen] [-force] -pat pattern_file -f file
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -d digest
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -r dir
     : tlsh_pattern [-xlen] [-force] -pat pattern_file -l listfile
</PRE>

**3.9.1**
- resolving issue #38
- put a fix in `rand_tags.cpp` so that it generates identical output to the previous version while working safely with pointers

**3.9.2**
<PRE>
18/Mar/2019
Also merged the contents of NOTICE.txt into LICENSE.
This was done because NOTICE.txt is sometimes accidentally removed when people clone this repository.
And the LICENSE specifically states that NOTICE.txt should NOT be removed.

Also added command line option -notice which displays the NOTICE.txt file
</PRE>

**3.9.3**
<PRE>
19/Mar/2019
currently tlsh_pattern returns all the matches
modify tlsh_pattern to return the best match

remove the newline from the input fields when reading in the tlsh_pattern file
</PRE>

**3.9.4**
<PRE>
19/Mar/2019
check in order_bug program which demonstrates issue #50
resolved issue #50 - added code to tlsh_impl.cpp to check for invalid call sequences to update() and final()
</PRE>

**3.9.5**
<PRE>
19/Mar/2019
issue #61: added a command line option -notest - do not do any testing
	./make.sh -notest
</PRE>

**3.9.6**
<PRE>
19/Mar/2019
Have a cmake option to build tlsh with a zero byte checksum (development / research option)
Default build has 1 byte checksum - which is strongly recommended
</PRE>

**3.9.7**
<PRE>
19/Mar/2019
resolving issue #50 for bin/timing_unittest
</PRE>

**3.9.8**
<PRE>
19/Mar/2019
timing_unittest measures the time taken to do distance calculations
add a command line option -size - so that you can measure the time taken to evaluate different sizes of string
</PRE>

**3.9.9**
<PRE>
19/Mar/2019
resolve issue #62
remove dependency on GNUInstallDirs
</PRE>

**3.10.0**
<PRE>
19/Mar/2019
Adding // access function - required by tools using TLSH library
	int BucketValue(int bucket);
	int Checksum(int k);
</PRE>

**3.11.0**
<PRE>
19/Mar/2019
Make calculation of TLSH digests approx 7 times faster (for large files)
done by
	- inline functions
	- unrolling loops
	- fixing the -O2 optimization option

<H3>Timing on Amazon Linux</H3>
#       "Amazon Linux 2 AMI (HVM), SSD Volume Type"
#       Description: Amazon Linux 2 comes with five years support. It provides Linux kernel 4.14 tuned for optimal performance
#               on Amazon EC2, systemd 219, GCC 7.3, Glibc 2.26, Binutils 2.29.1, and the latest software packages through extras.

BEFORE
$ ./tlsh_3_10_0/bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.9.9 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE	ms=1552963428277
AFTER	ms=1552963433258
TIME	ms=4981
TIME	ms=99	per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE	ms=1552963433362
AFTER	ms=1552963440723
TIME	ms=7361
TIME	ms=147	per million iterations

AFTER
$ ./tlsh_3_11_0/bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.11.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE	ms=1552963419037
AFTER	ms=1552963419628
TIME	ms=591
TIME	ms=11	per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE	ms=1552963419642
AFTER	ms=1552963421519
TIME	ms=1877
TIME	ms=37	per million iterations


<H3>Timing on a Mac (Processor 3.1 GHz - running Sierra)</H3>

BEFORE using tlsh_3_10_0
$ bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.10.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE	ms=1552963383885
AFTER	ms=1552963387866
TIME	ms=3981
TIME	ms=79	per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE	ms=1552963387951
AFTER	ms=1552963392498
TIME	ms=4547
TIME	ms=90	per million iterations

AFTER using tlsh_3_11_0
$ bin/timing_unittest
build a buffer with a million bytes...
eval TLSH (3.11.0 compact hash 1 byte checksum sliding_window=5) 50 times...
TLSH(buffer) = A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008
Test 1: Evaluate TLSH digest
BEFORE	ms=1552963360177
AFTER	ms=1552963360791
TIME	ms=614
TIME	ms=12	per iteration

eval TLSH distance 50 million times...
Test 2: Calc distance TLSH digest
dist=138
BEFORE	ms=1552963360808
AFTER	ms=1552963365502
TIME	ms=4694
TIME	ms=93	per million iterations
</PRE>
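
The per-iteration figures and the speedup can be recomputed from the `TIME` lines above. The following is a small Python sketch: the helper name `per_iteration_ms` is hypothetical, and the use of integer division is an assumption inferred from the reported values (4981 ms over 50 iterations is printed as 99).

```python
def per_iteration_ms(total_ms, iterations=50):
    # Integer division reproduces the truncated values shown in the output.
    return total_ms // iterations


# Test 1 (digest evaluation) totals in milliseconds, copied from the output above.
amazon_before, amazon_after = 4981, 591
mac_before, mac_after = 3981, 614

print(per_iteration_ms(amazon_before))          # 99
print(per_iteration_ms(amazon_after))           # 11
print(round(amazon_before / amazon_after, 1))   # 8.4
print(round(mac_before / mac_after, 1))         # 6.5
```

Both ratios are in the neighbourhood of the "approx 7 times faster" figure quoted for this release.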

**3.11.1**
<PRE>
31/May/2019
tidy up:
1. use fast_b_mapping() instead of b_mapping()
2. remove declaration of unsigned r which is never used
3. remove #include which is not required
</PRE>

**3.12.0**
<PRE>
31/May/2019
remove floating point calculations such as log() function
use a lookup table instead
</PRE>

**3.13.0**
<PRE>
31/May/2019
.vcproj files and instructions for building TLSH on Windows using Visual Studio
Thanks Jayson Pryde! :-)
</PRE>

**3.13.1**
<PRE>
31/May/2019
fixing setup.py so that you can install Python Extension on Windows
</PRE>

**3.14.0**
<PRE>
18/July/2019
adding sliding window size to tlsh_version
changing test.sh to read the sliding window size
</PRE>

**3.14.1**
<PRE>
18/July/2019
fixing error in test script for -xlen option (print statements about considering length were incorrect)
improved test.sh - tests for existence of expected output files
</PRE>

**3.15.0**
<PRE>
19/July/2019
Refactor code - so that input of directory or digest is in a struct.
The code to process input is in library code (input_desc.cpp, shared_file_functions.cpp).
The input routines can be used by multiple programs.
Also, preparing for things like csv input files.
</PRE>

**3.15.1**
<PRE>
19/July/2019
added command line option -help to show full help information
</PRE>

**3.15.2**
<PRE>
19/July/2019
tlsh_pattern uses refactored code introduced in 3.15.0
</PRE>

**3.16.0**
<PRE>
19/July/2019
improved tlsh_pattern functionality
usage: tlsh_pattern -f <file>                     [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -d <digest>                   [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -r <dir>                      [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -l <listfile> [-l1|-l2|-lcsv] [-showmiss T] -pat <pattern_file> [-xlen] [-force]
     : tlsh_pattern -version: prints version of tlsh library
add options
- to have different columns of a listfile be processed (-l1 or -l2)
- to allow a listfile to be in CSV format (-lcsv)
- to show misses up to threshold T (-showmiss T)
added regression tests for tlsh_pattern
</PRE>

**3.16.1**
<PRE>
19/July/2019
improved tlsh functionality
add options
  -out_fname:         Specifies that only the filename is output when using the -r option (no path included in output)
  -out_dirname        Specifies that the dirname and filename are output when using the -r option (no path included in output)

  -l1                 (default) listfile contains TLSH value in column 1
  -l2                           listfile contains TLSH value in column 2
  -lcsv               listfile is csv (comma separated) file (default is TAB separated file)

  -split linenumbers: linenumbers is a comma separated list of line numbers (example 50,100,200 )
                      split the file into components and eval the TLSH for each component
                      example. -split 50,100,200 evals 4 TLSH digests. lines 1-49, 50-99, 100-199, 200-end
                      for the purpose of splitting the file, each line has a max length of 2048 bytes
</PRE>
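
The line ranges produced by `-split` can be illustrated with a short Python sketch. This is not the tool's implementation: the function `split_ranges` and the total line count of 250 are assumptions made for the example, and only the range arithmetic from the description above is reproduced.

```python
def split_ranges(line_numbers, total_lines):
    """Return 1-based inclusive (first, last) line ranges, one per component."""
    starts = [1] + list(line_numbers)
    ends = [n - 1 for n in line_numbers] + [total_lines]
    return list(zip(starts, ends))


# -split 50,100,200 on a hypothetical 250-line file gives 4 components.
print(split_ranges([50, 100, 200], 250))
# [(1, 49), (50, 99), (100, 199), (200, 250)]
```

Each split point starts a new component, so N line numbers give N + 1 components, matching the four digests in the example above.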

**3.16.2**
<PRE>
19/July/2019
added regression tests for 3.16.1
by adding tests for -split, swapping columns in input files, and for CSV input file
</PRE>

**3.17.0**
<PRE>
19/July/2019
Make command line option	-force		(50 character limit) the default behaviour
Add a command line option	-conservative	(256 character limit)
change the force_option parameter to be a bit field
	force_option	==	0	Default (50 char limit)
	force_option	==	1	Force behaviour (50 char limit)
	force_option	==	2	conservative behaviour (256 char limit)
</PRE>
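
The mapping from `force_option` to the minimum input length can be sketched as below. This is a Python illustration only: the constant names and the function `min_length` are hypothetical, and letting the conservative bit take precedence when both bits are set is an assumption not stated above.

```python
FORCE = 1         # force behaviour (50 character limit)
CONSERVATIVE = 2  # conservative behaviour (256 character limit)


def min_length(force_option=0):
    """Minimum input length implied by a force_option bit field."""
    if force_option & CONSERVATIVE:
        return 256  # assumption: the conservative bit wins if both bits are set
    return 50       # the default (0) and FORCE (1) share the 50 character limit


print(min_length(0), min_length(FORCE), min_length(CONSERVATIVE))  # 50 50 256
```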

**3.17.3**
<PRE>
24/March/2020
add checking to confirm that TLSH digests are the correct length in
	-c option
	-d option
	the appropriate column of -l listfile options
</PRE>

**3.18.0**
<PRE>
24/March/2020
resolve issue #72 - remove tlsh_version
</PRE>

**3.19.0**
<PRE>
24/March/2020
preparation for Windows build
- remove ../Testing/ from test.sh script and from regression test results
</PRE>

**3.19.1**
<PRE>
25/March/2020
in test.sh and testlen.sh - make TLSH_PROG a variable
</PRE>

**4.0.0**
<PRE>
26/March/2020
version 4: adding version identifier to each digest: 'T1'
	adding command line option -old to generate old style digests
	In this version - the showvers is defaulted to off - so this will pass the old regression tests
</PRE>

**4.0.1**
<PRE>
26/March/2020
version 4: adding version identifier to each digest: 'T1'
	turning on T1 functionality by setting showvers=1 in main
	updating regression tests to have T1 at the start of digests
</PRE>
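
The relationship between old style digests and `T1` digests can be shown with a short Python sketch. The helper names `add_version` and `strip_version` are hypothetical and are not part of the TLSH library. The sample digest is the 70 character value from the timing output earlier in this file.

```python
def add_version(digest):
    """Prefix an old style digest with the 'T1' version identifier."""
    # Old style digests are hexadecimal, so they can never start with 'T'.
    return digest if digest.startswith("T1") else "T1" + digest


def strip_version(digest):
    """Return the old style form of a digest, dropping a leading 'T1'."""
    return digest[2:] if digest.startswith("T1") else digest


old = "A12500088C838B0A0F0EC3C0ACAB82F3B8228B0308CFA302338C0F0AE2C24F28000008"
new = add_version(old)
print(len(old), len(new))         # 70 72
print(strip_version(new) == old)  # True
```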

**4.1.0**
<PRE>
26/March/2020
	adding -o option for output filename (output will go to stdout if no output file given)
		changed test scripts to use -o option
	adding -ojson option for json output
		added regression test for -ojson option
	adding -onull option to output empty files / files too small as TNULL
</PRE>

**4.2.0**
<PRE>
26/March/2020
	Windows version using MinGW
</PRE>

**4.2.1**
<PRE>
27/March/2020
	resolve issue #78 json objects do not validate on Windows
</PRE>

**4.2.2**
<PRE>
17/April/2020
	resolve issue #81
	Pass regression tests
</PRE>

**4.2.3**
<PRE>
22/April/2020
	add regression tests that are compatible with
		https://github.com/glaslos/tlsh
	To use
		$ cd Testing
		edit tlsh_go script to set prog= your Go TLSH application
		$ ./test.sh _go
</PRE>

**4.3.0**
<PRE>
26/June/2020
	issue #79 - divide by 0 if q3 == 0
		solution. if (q3 == 0) return invalid hash
</PRE>
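
The guard described for issue #79 can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's code: the function `quartile_ratios` is hypothetical, and the ratio formula (a percentage of `q3`, reduced modulo 16) is assumed here purely to show where the division by `q3` would occur.

```python
def quartile_ratios(q1, q2, q3):
    """Return (q1_ratio, q2_ratio), or None when q3 == 0 (invalid hash)."""
    if q3 == 0:
        # Dividing by q3 below would fail, so report an invalid hash instead.
        return None
    # Assumed formula for illustration: percentage of q3, reduced modulo 16.
    return (q1 * 100 // q3) % 16, (q2 * 100 // q3) % 16


print(quartile_ratios(0, 0, 0))   # None
print(quartile_ratios(2, 5, 10))  # (4, 2)
```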

**4.4.0**
<PRE>
08/Nov/2020
	Fixing Python Extension
	- updated python extension to T1 hashes (4.0.0)
	- fixed python_test.sh (which attempted to access old expected results files)
		now passes test
	- added license information to py_ext/tlshmodule.cpp
</PRE>

**4.4.1**
<PRE>
09/Dec/2020
	Command line options to tlsh_digest.py
		-conservative	enforce 256 byte limit
		-old		generate old style hash (without "T1")
	added python functions to tlsh package (for backwards compatibility)
		tlsh.oldhash(data)
		tlsh.conservativehash(data)
		tlsh.oldconservativehash(data)
</PRE>

**4.5.0**
<PRE>
10/Dec/2020
	Checking in files to create PyPI package
</PRE>

**4.6.0**
<PRE>
23/04/2021
	Merging in pull requests
	issue #99 - new Java version that solves large file problem (Thanks Daniel)
	Add architecture ppc64le to travis build (Thanks ddeka2910)
	Fix tmpArray is undefined in JavaScript version (Thanks carbureted)
</PRE>

**4.7.0**
<PRE>
29/06/2021
	Release updated package py-tlsh on PyPI.org
	Merging in pull request that adds functions to Python package
		lvalue, q1ratio, q2ratio, checksum, bucket_value and is_valid
	resolve issue #102 - correct Python version numbers
</PRE>

**4.7.2**
<PRE>
02/07/2021
	Release updated package py-tlsh on PyPI.org
	regression tests for C++ and Python functions for:
		lvalue, q1ratio, q2ratio, checksum, bucket_value
	resolve issue #95 - allow Requires-Python: >=2.7
</PRE>
