README.md

Introduction
-------------------------------------------------------------------------------
jdupes is a program for identifying and taking actions upon duplicate files.

A WORD OF WARNING: jdupes IS NOT a drop-in compatible replacement for fdupes!
Do not blindly replace fdupes with jdupes in scripts and expect everything to
work the same way. Option availability and meanings differ between the two
programs. For example, the `-I` switch in jdupes means "isolate" and blocks
intra-argument matching, while in fdupes it means "immediately delete files
during scanning without prompting the user."

Please consider financially supporting continued development of jdupes:

https://www.subscribestar.com/JodyBruchon


v1.19.0 specific: extfilter behavior has changed, check your scripts!
-------------------------------------------------------------------------------
There were some inconsistencies in the behavior of the extfilter framework that
stemmed from its origins in the exclusion option `-x`. These inconsistencies
have been resolved and extfilters now work correctly. Unfortunately, this also
means that the meaning of several filters has changed, particularly the size
filters. The `-X size[+-=]` option now includes files matching the specified
size criteria rather than excluding them, which will break existing shell
scripts that rely on the old exclusion behavior. It is extremely important that
any shell scripts currently using the size extfilter be revised to take the new
meaning into account. Use `jdupes -v` output in your script to do a version
check if needed.
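A version check in a script can be sketched as follows. This is a hedged example: it assumes the first line of `jdupes -v` output carries the version as the second whitespace-separated field (e.g. "jdupes 1.19.0"); the `version_line` variable below is a stand-in for the real command output, not captured from an actual run.

```shell
# Gate a script on the jdupes version before relying on the new
# (inclusive) size extfilter semantics introduced in v1.19.0.
version_line="jdupes 1.19.0"   # stand-in for: jdupes -v | head -n 1
ver=$(printf '%s\n' "$version_line" | awk '{ print $2 }')
major=${ver%%.*}               # text before the first dot
rest=${ver#*.}
minor=${rest%%.*}              # text between the first and second dots
if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 19 ]; }; then
  echo "new extfilter semantics (size filters include)"
else
  echo "old extfilter semantics (size filters exclude)"
fi
```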

v1.15+ specific: Why is the addition of single files not working?
-------------------------------------------------------------------------------
If a file was added through recursion and also added explicitly, that file
would end up matching itself. This issue can be seen in v1.14.1 or older
versions that support single file addition, using a command like this in the
jdupes source code directory:

/usr/src/jdupes$ jdupes -rH testdir/isolate/1/ testdir/isolate/1/1.txt
testdir/isolate/1/1.txt
testdir/isolate/1/1.txt
testdir/isolate/1/2.txt

Even worse, using the special dot directory will make it happen without the -H
option, which is how I discovered this bug:

/usr/src/jdupes/testdir/isolate/1$ jdupes . 1.txt
./1.txt
./2.txt
1.txt

This works for any path with a single dot directory anywhere in the path, so it
has a good deal of potential for data loss in some use cases. As such, the best
option was to shove out a new minor release with this feature turned off until
some additional checking can be done, e.g. by making sure the canonical paths
aren't identical between any two files.

A future release will fix this safely.
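The canonical-path check described above can be sketched in shell. This is a hypothetical illustration of the idea, not jdupes code: resolve each command-line argument with `realpath(1)` and refuse to add a file whose canonical path has already been seen.

```shell
# Two arguments that resolve to the same canonical path refer to the
# same file and should be scanned only once.
d=$(mktemp -d)
touch "$d/1.txt"
cd "$d"
p1=$(realpath 1.txt)     # explicit file argument
p2=$(realpath ./1.txt)   # same file reached through the dot directory
if [ "$p1" = "$p2" ]; then
  echo "same canonical path: add the file only once"
fi
```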


Why use jdupes instead of the original fdupes or other duplicate finders?
-------------------------------------------------------------------------------
The biggest reason is raw speed. In testing on various data sets, jdupes is
over 7 times faster than fdupes-1.51 on average.

jdupes provides a native Windows port. Most duplicate scanners built on Linux
and other UNIX-like systems do not compile for Windows out-of-the-box, and even
if they do, they don't support Unicode and other Windows-specific quirks and
features.

jdupes is generally stable. All releases of jdupes are compared against
known-working reference versions of fdupes or jdupes to be certain that output
does not change. You get the benefits of an aggressive development process
without putting your data at increased risk.

Code in jdupes is written with data loss avoidance as the highest priority. If
a choice must be made between being aggressive or careful, the careful way is
always chosen.

jdupes includes features that are not always found elsewhere. Examples of such
features include block-level data deduplication and control over which file is
kept when a match set is automatically deleted. jdupes is not afraid of
dropping features of low value; a prime example is the `-1` switch, which
outputs all matches in a set on one line, a feature which was found to be
useless in real-world tests and therefore thrown out.

While jdupes maintains some degree of compatibility with fdupes, from which it
was originally derived, there is no guarantee that it will continue to maintain
such compatibility in the future. However, compatibility will be retained
between minor versions, i.e. jdupes-1.6 and jdupes-1.6.1 should not have any
significant differences in results with identical command lines.

If the program eats your dog or sets fire to your lawn, the authors cannot be
held responsible. If you notice a bug, please report it.


What jdupes is not: a similar (but not identical) file finding tool
-------------------------------------------------------------------------------
Please note that jdupes ONLY works on 100% exact matches. It does not have any
sort of "similarity" matching, nor does it know anything about any specific
file formats such as images or sounds. Something as simple as a change in
embedded metadata, such as the ID3 tags in an MP3 file or the EXIF information
in a JPEG image, will not change the sound or image presented to the user when
opened, but technically it makes the file no longer identical to the original.

Plenty of excellent tools already exist to "fuzzy match" specific file types
using knowledge of their file formats to help. There are no plans to add this
type of matching to jdupes.

There are some match options available in jdupes that enable dangerous file
matching based on partial or likely but not 100% certain matching. These are
considered expert options for special situations and are clearly and loudly
documented as being dangerous. The `-Q` and `-T` options are notable examples,
and the extreme danger of the `-T` option is safeguarded by a requirement to
specify it twice so it can't be used accidentally.


How can I do stuff with jdupes that isn't supported by jdupes?
-------------------------------------------------------------------------------
The standard output format of jdupes is extremely simple. Match sets are
presented with one file path per line, and match sets are separated by a blank
line. This is easy to process with fairly simple shell scripts. You can find
example shell scripts in the "example_scripts" directory in the jdupes source
code. The main example script, "example.sh", is easy to modify to take basic
actions on each file in a match set. These scripts are used by piping the
standard jdupes output to them:

jdupes dir1 dir2 dir3 | example.sh scriptparameters

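Because match sets are just blank-line-separated groups of paths, a script can split them with awk's paragraph mode (`RS=""`). A minimal sketch follows; the here-document stands in for real jdupes output and its file names are purely illustrative.

```shell
# Count the files in each jdupes-style match set. With RS="" awk reads
# one blank-line-separated record at a time; with FS="\n", NF is the
# number of paths in that set.
sets=$(awk 'BEGIN { RS=""; FS="\n" } { print "set", NR ":", NF, "files" }' <<'EOF'
a/file1
a/file2

b/song.mp3
c/song.mp3
c/song_copy.mp3
EOF
)
printf '%s\n' "$sets"
```

A real pipeline would replace the here-document with `jdupes dir1 dir2 | awk …`.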

Usage
-------------------------------------------------------------------------------
```
Usage: jdupes [options] DIRECTORY...
```

Duplicate file sets will be printed by default unless a different action
option is specified (delete, summarize, link, dedupe, etc.)

```
 -@ --loud              output annoying low-level debug info while running
 -0 --printnull         output nulls instead of CR/LF (like 'find -print0')
 -1 --one-file-system   do not match files on different filesystems/devices
 -A --nohidden          exclude hidden files from consideration
 -B --dedupe            do a copy-on-write (reflink/clone) deduplication
 -C --chunksize=#       override I/O chunk size (min 4096, max 16777216)
 -d --delete            prompt user for files to preserve and delete all
                        others; important: under particular circumstances,
                        data may be lost when using this option together
                        with -s or --symlinks, or when specifying a
                        particular directory more than once; refer to the
                        documentation for additional information
 -D --debug             output debug statistics after completion
 -f --omitfirst         omit the first file in each set of matches
 -h --help              display this help message
 -H --hardlinks         treat any linked files as duplicate files. Normally
                        linked files are treated as non-duplicates for safety
 -i --reverse           reverse (invert) the match sort order
 -I --isolate           files in the same specified directory won't match
 -j --json              produce JSON (machine-readable) output
 -l --linksoft          make relative symlinks for duplicates w/o prompting
 -L --linkhard          hard link all duplicate files without prompting
                        Windows allows a maximum of 1023 hard links per file
 -m --summarize         summarize dupe information
 -M --printwithsummary  will print matches and --summarize at the end
 -N --noprompt          together with --delete, preserve the first file in
                        each set of duplicates and delete the rest without
                        prompting the user
 -o --order=BY          select sort order for output, linking and deleting:
                        by mtime (BY=time) or filename (BY=name, the default)
 -O --paramorder        sort output files in order of command line parameter
                        sequence
                        Parameter order is more important than selected -o sort
                        which applies should several files share the same
                        parameter order
 -p --permissions       don't consider files with different owner/group or
                        permission bits as duplicates
 -P --print=type        print extra info (partial, early, fullhash)
 -q --quiet             hide progress indicator
 -Q --quick             skip byte-by-byte duplicate verification. WARNING:
                        this may delete non-duplicates! Read the manual first!
 -r --recurse           for every directory, process its subdirectories too
 -R --recurse:          for each directory given after this option follow
                        subdirectories encountered within (note the ':' at
                        the end of the option, manpage for more details)
 -s --symlinks          follow symlinks
 -S --size              show size of duplicate files
 -t --nochangecheck     disable security check for file changes (aka TOCTTOU)
 -T --partial-only      match based on partial hashes only. WARNING:
                        EXTREMELY DANGEROUS paired with destructive actions!
                        -T must be specified twice to work. Read the manual!
 -u --printunique       print only a list of unique (non-matched) files
 -U --notravcheck       disable double-traversal safety check (BE VERY CAREFUL)
                        This fixes a Google Drive File Stream recursion issue
 -v --version           display jdupes version and license information
 -X --extfilter=x:y     filter files based on specified criteria
                        Use '-X help' for detailed extfilter help
 -z --zeromatch         consider zero-length files to be duplicates
 -Z --softabort         If the user aborts (i.e. CTRL-C) act on matches so far
                        You can send SIGUSR1 to the program to toggle this


Detailed help for jdupes -X/--extfilter options
General format: jdupes -X filter[:value][size_suffix]

noext:ext1[,ext2,...]           Exclude files with certain extension(s)

onlyext:ext1[,ext2,...]         Only include files with certain extension(s)

size[+-=]:size[suffix]          Only include files matching size criteria
                                Size specs: + larger, - smaller, = equal to
                                Specs can be mixed, i.e. size+=:100k will
                                only include files 100KiB or more in size.

nostr:text_string               Exclude all paths containing the string
onlystr:text_string             Only allow paths containing the string
                                HINT: you can use these for directories:
                                -X nostr:/dir_x/  or  -X onlystr:/dir_x/
newer:datetime                  Only include files newer than specified date
older:datetime                  Only include files older than specified date
                                Date/time format: "YYYY-MM-DD HH:MM:SS"
                                Time is optional (remember to escape spaces!)

Some filters take no value or multiple values. Filters that can take
a numeric option generally support the size multipliers K/M/G/T/P/E
with or without an added iB or B. Multipliers are binary-style unless
the -B suffix is used, which will use decimal multipliers. For example,
16k or 16kib = 16384; 16kb = 16000. Multipliers are case-insensitive.

Filters have cumulative effects: jdupes -X size+:99 -X size-:101 will
cause only files of exactly 100 bytes in size to be included.

Extension matching is case-insensitive.
Path substring matching is case-sensitive.
```
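The binary vs. decimal multiplier rule quoted above can be sanity-checked with plain shell arithmetic:

```shell
# Size multipliers as the extfilter help describes them:
# "16k"/"16kib" are binary (x1024), "16kb" is decimal (x1000).
kib=$((16 * 1024))   # 16k or 16kib
kb=$((16 * 1000))    # 16kb
echo "16k = $kib bytes, 16kb = $kb bytes"
```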

The `-U`/`--notravcheck` option disables the double-traversal prevention tree.
In the VAST MAJORITY of circumstances, this SHOULD NOT BE DONE, as it protects
against several dangerous user errors, including specifying the same files or
directories twice, causing them to match themselves and potentially be lost or
irreversibly damaged, or a symbolic link to a directory making an endless loop
of recursion that will cause the program to hang indefinitely. This option was
added because Google Drive File Stream presents directories in the virtual hard
drive used by GDFS with identical device:inode pairs despite the directories
actually being different. This triggers double-traversal prevention against
every directory, effectively blocking all recursion. Disabling this check will
reduce safety, but will allow duplicate scanning inside Google Drive File
Stream drives. This also results in a very minor speed boost during recursion,
but the boost is unlikely to be noticeable.

The `-t`/`--nochangecheck` option disables file change checks during/after
scanning. This opens a security hole known as a TOCTTOU (time of check to time
of use) vulnerability. The program normally runs checks immediately before
scanning or taking action upon a file to see if the file has changed in some
way since it was last checked. With this option enabled, the program will not
run any of these checks, making the algorithm slightly faster, but also
increasing the risk that the program scans a file, the file is changed after
the scan, and the program still acts as if the file was in its previous state.
This is particularly dangerous when considering actions such as linking and
deleting. In the most extreme case, a file could be deleted during scanning
but match other files prior to that deletion; if the file is the first in the
list of duplicates and auto-delete is used, all of the remaining matched files
will be deleted as well. This option was added due to user reports of some
filesystems (particularly network filesystems) changing the reported file
information inappropriately, rendering the entire program unusable on such
filesystems.

The `-n`/`--noempty` option was removed for safety. Matching zero-length files
as duplicates now requires explicit use of the `-z`/`--zeromatch` option
instead.

Duplicate files are listed together in groups, with each file displayed on a
separate line. The groups are then separated from each other by blank lines.

The `-s`/`--symlinks` option will treat symlinked files as regular files, but
direct symlinks will be treated as if they are hard linked files and the
`-H`/`--hardlinks` option will apply to them in the same manner.

When using `-d` or `--delete`, care should be taken to ensure against
accidental data loss. While no information will be immediately lost, using this
option together with `-s` or `--symlinks` can lead to confusing information
being presented to the user when prompted for files to preserve. Specifically,
a user could accidentally preserve a symlink while deleting the file it points
to. A similar problem arises when specifying a particular directory more than
once. All files within that directory will be listed as their own duplicates,
leading to data loss should a user preserve a file without its "duplicate" (the
file itself!)

Using `-1` or `--one-file-system` prevents matches that cross filesystems, but
a more relaxed form of this option may be added that allows cross-matching for
all filesystems that each parameter is present on.

`-Z` or `--softabort` used to be `--hardabort` in jdupes prior to v1.5 and had
the opposite behavior. Defaulting to taking action on abort is probably not
what most users would expect. The decision to invert rather than reassign to a
different option was made because this feature was still fairly new at the time
of the change.

On non-Windows platforms that support SIGUSR1, you can toggle the state of the
`-Z` option by sending a SIGUSR1 to the program. This is handy if you want to
abort jdupes, didn't specify `-Z`, and changed your mind and don't want to lose
all the work that was done so far. Just do '`killall -USR1 jdupes`' and you will
be able to abort with `-Z`. This works in reverse: if you want to prevent a
`-Z` from happening, a SIGUSR1 will toggle it back off. That's a lot less
useful because you can just stop and kill the program to get the same effect,
but it's there if you want it for some reason. Sending the signal twice while
the program is stopped will behave as if it was only sent once, as per normal
POSIX signal behavior.
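The toggle mechanism can be illustrated with a tiny shell sketch. This is not jdupes code, just the same trap-and-flip pattern applied to a script signaling itself:

```shell
# Flip a soft-abort flag each time SIGUSR1 arrives.
softabort=0
trap 'softabort=$((1 - softabort))' USR1
kill -USR1 $$    # first signal: toggle on
echo "softabort=$softabort"
kill -USR1 $$    # second signal: toggle back off
echo "softabort=$softabort"
```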

The `-O` or `--paramorder` option allows the user greater control over what
appears in the first position of a match set, specifically for keeping the `-N`
option from deleting all but one file in a set in a seemingly random way. All
directories specified on the command line will be used as the sorting order of
result sets first, followed by the sorting algorithm set by the `-o` or
`--order` option. This means that the order of all match pairs for a single
directory specification will retain the old sorting behavior even if this
option is specified.

When used together with options `-s` or `--symlinks`, a user could accidentally
preserve a symlink while deleting the file it points to.

The `-Q` or `--quick` option only reads each file once, hashes it, and performs
comparisons based solely on the hashes. There is a small but significant risk
of a hash collision, which is the purpose of the failsafe byte-for-byte
comparison that this option explicitly bypasses. Do not use it on ANY data set
for which any amount of data loss is unacceptable. You have been warned!

The `-T` or `--partial-only` option produces results based on a hash of the
first block of file data in each file, ignoring everything else in the file.
Partial hash checks have always been an important exclusion step in the jdupes
algorithm, usually hashing the first 4096 bytes of data and allowing files that
are different at the start to be rejected early. In certain scenarios it may be
a useful heuristic for a user to see that a set of files has the same size and
the same starting data, even if the remaining data does not match; one example
of this would be comparing files with data blocks that are damaged or missing,
such as an incomplete file transfer, or checking a data recovery against
known-good copies to see what damaged data can be deleted in favor of restoring
the known-good copy. This option is meant to be used with informational actions
and can result in EXTREME DATA LOSS if used with options that delete files,
create hard links, or perform other destructive actions on data based on the
matching output. Because of the potential for massive data destruction, this
option MUST BE SPECIFIED TWICE to take effect and will error out if it is only
specified once.

The `-I`/`--isolate` option attempts to block matches that are contained in the
same specified directory parameter on the command line. Due to the underlying
nature of the jdupes algorithm, a lot of matches will be blocked by this option
that probably should not be. This code could use improvement.

The `-C`/`--chunksize` option overrides the size of the I/O "chunk" used for
all file operations. Larger numbers will increase the amount of data read at
once from each file and may improve performance when scanning lots of files
that are larger than the default chunk size by reducing "thrashing" of the hard
disk heads. Smaller numbers may increase algorithm speed depending on the
characteristics of your CPU, but will usually increase I/O and system call
overhead as well. The number also directly affects memory usage: the I/O chunk
size is used for at least three allocations in the program, so using a chunk
size of 16777216 (16 MiB) will require 48 MiB of RAM. The default is usually
between 32768 and 65536, which results in the fastest raw speed of the
algorithm and generally good all-around performance. Feel free to experiment
with the number on your data set and report your experiences (preferably with
benchmarks and info on your data set.)
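The memory arithmetic above is easy to verify: with at least three chunk-sized allocations, the memory cost is roughly three times the chosen chunk size.

```shell
# Memory impact of -C/--chunksize at its documented maximum.
chunk=16777216            # 16 MiB
mem=$((chunk * 3))        # three chunk-sized allocations
echo "chunk: $((chunk / 1048576)) MiB -> about $((mem / 1048576)) MiB of RAM"
```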

Using `-P`/`--print` will cause the program to print extra information that may
be useful but will pollute the output in a way that makes scripted handling
difficult. Its current purpose is to reveal more information about the file
matching process by printing match pairs that pass certain steps of the process
prior to full file comparison. This can be useful if you have two files that
are passing early checks but failing after full checks.


Hard and soft (symbolic) linking status symbols and behavior
-------------------------------------------------------------------------------
A set of arrows is used in file linking to show what action was taken on each
link candidate. These arrows are as follows:

`---->` File was hard linked to the first file in the duplicate chain

`-@@->` File was symlinked to the first file in the chain

`-==->` Already a hard link to the first file in the chain

`-//->` File linking failed due to an error during the linking process

If your data set has linked files and you do not use `-H` to always consider
them as duplicates, you may still see linked files appear together in match
sets. This is caused by a separate file that matches the linked files
independently and is the correct behavior. See notes below on the "triangle
problem" in jdupes for technical details.


Microsoft Windows platform-specific notes
-------------------------------------------------------------------------------
Windows has a hard limit of 1024 hard links per file. There is no way to change
this. The documentation for CreateHardLink() states: "The maximum number of
hard links that can be created with this function is 1023 per file. If more
than 1023 links are created for a file, an error results." (The number is
actually 1024, but they're ignoring the first file.)


The current jdupes algorithm's "triangle problem"
-------------------------------------------------------------------------------
Pairs of files are excluded individually based on how the two files compare.
For example, if `--hardlinks` is not specified then two files which are hard
linked will not match one another for duplicate scanning purposes. The problem
with only examining files in pairs is that certain circumstances will lead to
the exclusion being overridden.

Let's say we have three files with identical contents:

```
a/file1
a/file2
a/file3
```

and `a/file1` is linked to `a/file3`. Here's how `jdupes a/` sees them:

---
        Are 'a/file1' and 'a/file2' matches? Yes
        [point a/file1->duplicates to a/file2]

        Are 'a/file1' and 'a/file3' matches? No (hard linked already, `-H` off)

        Are 'a/file2' and 'a/file3' matches? Yes
        [point a/file2->duplicates to a/file3]
---

Now you have the following duplicate list:

```
a/file1->duplicates ==> a/file2->duplicates ==> a/file3
```

The solution is to split match sets into multiple sets, but doing this will
also remove the guarantee that files will only ever appear in one match set and
could result in data loss if handled improperly. In the future, options for
"greedy" and "sparse" may be introduced to switch between allowing triangle
matches to be in the same set vs. splitting sets after matching finishes
without the "only ever appears once" guarantee.


Does jdupes meet the "Good Practice when Deleting Duplicates" by rmlint?
-------------------------------------------------------------------------------
Yes. If you've not read this list of cautions, it is available at
http://rmlint.readthedocs.io/en/latest/cautions.html

Here's a breakdown of how jdupes addresses each of the items listed.

### "Backup your data"/"Measure twice, cut once"
These guidelines are for the user of duplicate scanning software, not the
software itself. Back up your files regularly. Use jdupes to print a list of
what is found as duplicated and check that list very carefully before
automatically deleting the files.

### "Beware of unusual filename characters"
The only character that poses a concern in jdupes is a newline `\n`, and that
is only a problem because the duplicate set printer uses them to separate file
names. Actions taken by jdupes are not parsed like a command line, so spaces
and other weird characters in names aren't a problem. Escaping the names
properly if acting on the printed output is a problem for the user's shell
script or other external program.

### "Consider safe removal options"
This is also an exercise for the user.

### "Traversal Robustness"
jdupes tracks each directory traversed by dev:inode pair to avoid adding the
contents of the same directory twice. This prevents the user from being able to
register all of their files twice by duplicating an entry on the command line.
Symlinked directories are only followed if they weren't already followed
earlier. Files are renamed to a temporary name before any linking is done and,
if the link operation fails, they are renamed back to the original name.
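The dev:inode tracking idea can be sketched in shell. This is an illustration of the concept, not jdupes's C implementation, and it assumes GNU coreutils (`stat -c`):

```shell
# Skip any directory whose device:inode pair has already been visited,
# even when it is named by two different paths.
visited=""
already_visited() {
  key=$(stat -c '%d:%i' "$1")
  case " $visited " in
    *" $key "*) return 0 ;;    # seen before: caller should skip it
  esac
  visited="$visited $key"      # first visit: remember it
  return 1
}
d=$(mktemp -d)
already_visited "$d"   || echo "scanning $d"
already_visited "$d/." && echo "skipping duplicate traversal"
rmdir "$d"
```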

### "Collision Robustness"
jdupes uses xxHash for file data hashing. This hash is extremely fast with a
low collision rate, but it still encounters collisions as any hash function
will ("secure" or otherwise) due to the pigeonhole principle. This is why
jdupes performs a full-file verification before declaring a match. It's slower
than matching by hash only, but the pigeonhole principle puts all data sets
larger than the hash at risk of collision, meaning a false duplicate detection
and data loss. The slower completion time is not as important as data
integrity. Checking for a match based on hashes alone is irresponsible, and
using secure hashes like MD5 or the SHA families is orders of magnitude slower
than xxHash while still suffering from the risk brought about by the
pigeonholing. An example of this problem is as follows: if you have 365 days in
a year and 366 people, having at least two birthdays on the same day is
guaranteed; likewise, even though SHA512 is a 512-bit (64-byte) wide hash,
there are guaranteed to be at least 256 pairs of data streams that cause a
collision once any of the data streams being hashed for comparison is 65 bytes
(520 bits) or larger.

### "Unusual Characters Robustness"
jdupes does not protect the user from putting ASCII control characters in their
file names; they will mangle the output if printed, but they can still be
operated upon by the actions (delete, link, etc.) in jdupes.

### "Seek Thrash Robustness"
jdupes uses an I/O chunk size that is optimized for reading as much as possible
from disk at once, to take advantage of high sequential read speeds in
traditional rotating media drives while balancing against the significantly
higher rate of CPU cache misses triggered by an excessively large I/O buffer
size. Enlarging the I/O buffer further may allow for lots of large files to be
read with less head seeking, but the CPU cache misses slow the algorithm down
and memory usage increases to hold these large buffers. jdupes is benchmarked
periodically to make sure that the chosen I/O chunk size is the best compromise
for a wide variety of data sets.

### "Memory Usage Robustness"
This is a very subjective concern considering that even a cell phone in
someone's pocket has at least 1GB of RAM; however, it still applies in the
embedded device world, where 32MB of RAM might be all that you can have. Even
when processing a data set with over a million files, jdupes memory usage
(tested on Linux x86-64 with -O3 optimization) doesn't exceed 2GB. A low
memory mode can be chosen at compile time to reduce overall memory usage with a
small performance penalty.


Contact information
-------------------------------------------------------------------------------
For all jdupes inquiries, contact Jody Bruchon <jody@jodybruchon.com>.
Please DO NOT contact Adrian Lopez about issues with jdupes.


Legal information and software license
-------------------------------------------------------------------------------
jdupes is Copyright (C) 2015-2020 by Jody Bruchon <jody@jodybruchon.com>
Derived from the original 'fdupes' 1.51 (C) 1999-2014 by Adrian Lopez
Includes other code libraries which are (C) 2015-2020 by Jody Bruchon

The MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


README.stupid_dupes

Introduction
--------------------------------------------------------------------------
stupid_dupes is a shell script that copies the most basic capabilities of
jdupes. It is inefficient. It barely has enough features to be worthy of
using the word "features" at all. Despite all of that, it's pretty safe
and produces the same simple match set printouts as jdupes.

This program illustrates how a duplicate scanner works on a basic level.
It has a minimal set of requirements:

* GNU bash
* find with support for -type and -maxdepth
* stat
* cat
* jodyhash (or any other program that outputs ONLY a hash)
* dd (for partial hashing)

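The pipeline those tools form can be sketched as a bash function (a simplified
illustration written for this README, not stupid_dupes' actual code:
`sha256sum` stands in for jodyhash, GNU `stat -c %s` is assumed, and the
partial-hash pass with dd is omitted):

```shell
# find_dupes DIR: print duplicate file pairs found directly under DIR.
# For brevity this hashes every file; a real scanner would only hash
# files whose sizes collide.
find_dupes() {
    declare -A seen   # maps "size:hash" -> first file seen with that key
    local f size hash key
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        size=$(stat -c %s "$f")                  # file size: cheap first filter
        hash=$(sha256sum "$f" | cut -d ' ' -f 1) # full-file hash
        key="$size:$hash"
        if [ -n "${seen[$key]}" ]; then
            printf '%s\n%s\n\n' "${seen[$key]}" "$f"   # jdupes-style match pair
        else
            seen[$key]="$f"
        fi
    done
}
```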
It's slow.

Real slow.

You're welcome.

Please consider financially supporting continued development of
stupid_dupes (like you'd spend the money so smartly otherwise):

https://www.subscribestar.com/JodyBruchon


Contact information
--------------------------------------------------------------------------
For stupid_dupes inquiries, contact Jody Bruchon <jody@jodybruchon.com>
and be sure to say something really stupid when you do.


Legal information and software license
--------------------------------------------------------------------------
stupid_dupes is Copyright (C) 2020 by Jody Bruchon <jody@jodybruchon.com>
and for some reason Jody is willing to admit to writing it.

The MIT License

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
