| Name | Date | Size | #Lines | LOC |
| --- | --- | --- | --- | --- |
| build-aux/ | 28-Jan-2012 | - | 5,420 | 4,459 |
| lib/ | 28-Jan-2012 | - | 684 | 540 |
| m4/ | 28-Jan-2012 | - | 2,299 | 2,185 |
| man/ | 28-Jan-2012 | - | 637 | 568 |
| po/ | 03-May-2022 | - | 1,704 | 1,440 |
| src/ | 28-Jan-2012 | - | 4,997 | 3,544 |
| ABOUT-NLS | 28-Jan-2012 | 91.6 KiB | 1,283 | 1,244 |
| AUTHORS | 10-Apr-2011 | 643 | 21 | 13 |
| COPYING | 10-Apr-2011 | 926 | 23 | 16 |
| ChangeLog | 28-Jan-2012 | 13.4 KiB | 323 | 230 |
| HACKING | 24-Jan-2012 | 2.2 KiB | 77 | 52 |
| INSTALL | 28-Jan-2012 | 15.2 KiB | 366 | 284 |
| Makefile.am | 28-Jan-2012 | 178 | 12 | 5 |
| Makefile.in | 28-Jan-2012 | 24.7 KiB | 772 | 686 |
| NEWS | 28-Jan-2012 | 2.1 KiB | 76 | 58 |
| README | 28-Jan-2012 | 6 KiB | 214 | 142 |
| README.SHA | 10-Apr-2011 | 8.3 KiB | 224 | 182 |
| TODO | 28-Jan-2012 | 700 | 23 | 18 |
| aclocal.m4 | 28-Jan-2012 | 36.8 KiB | 1,024 | 924 |
| configure | 28-Jan-2012 | 284.9 KiB | 9,773 | 8,224 |
| configure.ac | 28-Jan-2012 | 1.2 KiB | 48 | 38 |
| join-duplicates.sh | 03-May-2022 | 926 | 48 | 30 |
README
duff - Duplicate file finder
============================

0. Introduction
===============

Duff is a command-line utility for identifying duplicates in a given set of
files. It attempts to be usably fast and uses the SHA family of message
digests as a part of the comparisons.

The project website is here:

  http://duff.sourceforge.net/

Duff resides in a public Git repository on SourceForge.net:

  git://duff.git.sourceforge.net/gitroot/duff/duff

The version numbering scheme for duff is as follows:

 * The first number is the major version. This will be updated upon what the
   author considers a round of feature completion.

 * The second number is the minor version number. This is updated for releases
   that include minor new features, or features that do not change the
   functionality of the program.

 * The third number, if present, is the bugfix release number. This indicates
   a release which only fixes bugs present in a previous major or minor release.


1. License and copyright
========================

Duff is copyright (c) 2005 Camilla Berglund <elmindreda@elmindreda.org>

Duff is licensed under the zlib/libpng license. See the file `COPYING' for
license details. The license is also included at the top of each source file.

Duff contains shaX-asaddi.
Copyright (c) 2001-2003 Allan Saddi <allan@saddi.com>
See the files `src/sha*.c' and `src/sha*.h' for license details.

Duff uses the gettext.h convenience header from GNU gettext.
Copyright (C) 1995-1998, 2000-2002, 2004-2006, 2009 Free Software Foundation,
Inc. See the file `lib/gettext.h' for license details.

Duff comes with a number of files provided by the GNU autoconf, automake and
gettext packages. See the individual files in question for license details.


2. Project news
===============

See the file `NEWS'.


3. Building Duff
================

If you got this source tree from a Git repository then you will need to
bootstrap the build environment using first `gettextize' and then `autoreconf
-i'. Note that this requires that GNU autoconf, automake and gettext are
installed. Also note that running gettextize may cause a few duplicate entries
in various build files. If you got the source tree using Git, you can remove
these with `git reset --hard' before moving on.

If (or once) you have a `configure' script, go ahead and run it. No additional
magic should be required. If it is, then that's a bug and should be reported.

This release of duff has been successfully built on the following systems:

  Cygwin 1.7 i686
  Mac OS X 10.7 x86_64
  Ubuntu Natty x86_64

Earlier releases have been successfully built on the following systems:

  Arch Linux x86
  Cygwin 1.7 i686
  Darwin 7.9.0 powerpc
  Debian Etch powerpc
  Debian Etch x86
  Debian Lenny x86
  Debian Sarge alpha
  Debian Wheezy amd64
  FreeBSD 4.11 x86
  FreeBSD 5.4 x86
  FreeBSD 8.2 i386
  Mac OS X 10.3 powerpc
  Mac OS X 10.4 powerpc
  Mac OS X 10.6 i386
  Mac OS X 10.6 x86_64
  Mac OS X 10.6 x86_64 (with MacPorts gettext)
  NetBSD 1.6.1 sparc
  Red Hat Enterprise 4.0 x86
  SunOS 5.9 sparc64
  Ubuntu Breezy x86
  Ubuntu Jaunty x86
  Ubuntu Lucid amd64
  Ubuntu Maverick amd64

The tools used were GCC and GNU or BSD make. However, it should build on most
Unix systems without modifications.


4. Installing Duff
==================

See the file `INSTALL'.


5. Using Duff
=============

See the accompanying manpage duff(1).

To read the manpage before installation, use the following command:

  groff -mdoc -Tascii duff.1 | less -R

On GNU/Linux systems, however, the following command may suffice:

  man -l duff.1


6. Hacking Duff
===============

See the file `HACKING'.


7. Bugs, feedback and patches
=============================

Please send bug reports, feedback, patches and cookies to:

  Camilla Berglund <elmindreda@elmindreda.org>

Or, if you prefer, you may use the trackers on SF.net to report bugs, submit
patches or request features:

  http://sourceforge.net/projects/duff

For more involved discussions, please join the mailing list:

  http://lists.sourceforge.net/lists/listinfo/duff-devel


8. Credits and thanks
=====================

The following (alphabetically listed) people have contributed to duff, either
by reporting bugs, suggesting new features or submitting patches:

Harald Barth
Alexander Bostrom
Magnus Danielsson
Stephan Hegel
Patrik Jarnefelt
Rasmus Kaj
Mika Kuoppala
Richard Levitte
Fernando Lopez
Clemens Lucas Fries
Kamal Mostafa
Ross Newell
Allan Saddi <allan@saddi.com>

...and everyone I forgot. Did I forget you? Drop me an email.


9. Disambiguation
=================

This is duff the Unix command-line utility, not DUFF the Windows program.
If you wish to find duplicate files on Windows, use DUFF.

DUFF also has a SourceForge.net URL:

  http://dff.sourceforge.net/


10. Release history
===================

Version 0.1 was named `duplicate' and was never released anywhere.

Version 0.2 was the first release named duff. It lacked a real checksumming
algorithm, and was thus only released to a few individuals, during the first
half of 2005.

Version 0.3 was the first official release, on November 22, 2005, after a
long search for a suitably licensed implementation of SHA1.

Version 0.3.1 was a bugfix release, on November 27, 2005, adding a single
feature (-z), which just happened to get included.

Version 0.4 was the second feature release, on January 13, 2006, adding a
number of missing and/or requested features as well as bug fixes. It was the
first release to be considered stable and safe enough for everyday use.

Version 0.5 was the third feature release, on April 11, 2011, adding a number
of minor features and fixing a number of bugs. It was mostly intended to get
the ball rolling again and thus low on features.

Version 0.5.1 was a bugfix release, on January 17, 2012, adding a single bugfix
and a new default cluster header for thorough mode.

Version 0.5.2 was a minor release, on January 29, 2012, adding a number of
optimizations, prefixing error and warning messages with the program name and
modifying the default sampling limit.


README.SHA
shaX-asaddi (X = 1, 256, 384, 512)
==================================
Copyright (c) 2001-2003 Allan Saddi <allan@saddi.com>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY ALLAN SADDI AND HIS CONTRIBUTORS ``AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL ALLAN SADDI OR HIS CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

Introduction
------------
These are portable implementations of the National Institute of
Standards and Technology's Secure Hash Algorithms. Implementations
for SHA-1, SHA-256, SHA-384, and SHA-512 are available. All are
equally portable, assuming your compiler supports 64-bit integers
(which gcc does).

For more information on SHA (the algorithms), visit:
http://csrc.nist.gov/encryption/tkhash.html

The following documentation and examples will refer to the SHA-1
implementation. However, they equally apply to the SHA-256, SHA-384,
and SHA-512 implementations except where noted.

API
---
SHA1Context
  This is the hash context. There should be one SHA1Context for each
  object to be hashed. (This only applies if hashing is being done
  in parallel. Otherwise, it's perfectly safe to reuse a SHA1Context
  to hash objects serially, e.g. one file at a time.)

  A SHA1Context can be declared static, automatic, or allocated from
  the heap. There are certain alignment restrictions, but they shouldn't
  be of any concern in normal usage (malloc() should return suitably
  aligned memory, and the compiler will take care of the other cases).

  There's nothing really special about a SHA1Context. It should be
  safe to copy it, e.g. using memcpy() or bcopy().

void SHA1Init (SHA1Context *sc);
  Initializes a SHA1Context. This should be called before any of the
  following functions are called.

void SHA1Update (SHA1Context *sc, const void *data, uint32_t len);
  Hashes some data. len is in bytes.

void SHA1Final (SHA1Context *sc, uint8_t hash[SHA1_HASH_SIZE]);
  Gets the SHA-1 hash and "closes" the context. Because of padding,
  etc., the context should no longer be used afterwards. If you wish
  to hash a new set of data using the same SHA1Context, be sure to
  call SHA1Init() first. If you want to continue hashing data using
  the same context, simply make a copy of the context and call
  SHA1Final() on the copy.

  hash may be NULL, in which case no hash is generated (but the
  context is still closed). Regardless of whether hash is NULL, a
  word representation of the hash (32-bit words for SHA-1 and SHA-256,
  64-bit words for SHA-384 and SHA-512) is available in
  sc->hash[0..SHA1_HASH_WORDS-1]. This may be useful in other
  applications.

  If being used for cryptography, it's probably a good idea to zero-out
  the SHA1Context after you're done.

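A plain memset() on a context that is never read again may be optimized
away by some compilers. The following is a minimal sketch of one common
workaround and is not part of shaX-asaddi: secure_zero() is a hypothetical
helper name, and whether writing through a volatile pointer is sufficient
for your compiler is an assumption you should verify.

```c
#include <stddef.h>

/* Hypothetical helper, not part of shaX-asaddi: overwrite len bytes
 * starting at p with zeros. The stores go through a volatile pointer,
 * which discourages the compiler from discarding them as dead code. */
static void secure_zero (void *p, size_t len)
{
  volatile unsigned char *vp = (volatile unsigned char *) p;

  while (len--)
    *vp++ = 0;
}
```

It would be called as secure_zero (&sha, sizeof (sha)) once SHA1Final()
has returned.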
Compile-Time Options
--------------------
HAVE_CONFIG_H
  Define this if you want the code to include <config.h>. This is useful
  if you use GNU configure.

HAVE_INTTYPES_H
HAVE_STDINT_H
  Define one of these to 1 if you have the respective header file. If you
  have neither, be sure to typedef/define uint8_t, uint32_t, and uint64_t
  appropriately (perhaps in config.h above).

WORDS_BIGENDIAN
  Define this if you're on a big-endian processor.

RUNTIME_ENDIAN
  Define this if you would rather determine processor endianness at
  runtime. WORDS_BIGENDIAN will be ignored if this is defined. The
  generated code may be slightly slower, but at least you won't
  have to worry about big-endian vs. little-endian!

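For illustration, runtime endianness detection is commonly done by looking
at the first byte in memory of a known multi-byte value. This is a sketch
of that general idiom, not necessarily the check the shaX-asaddi sources
perform; is_big_endian() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a big-endian host and 0 on a
 * little-endian one, by checking which byte of the probe value is
 * stored first in memory. */
static int is_big_endian (void)
{
  const uint32_t probe = 0x01020304u;
  const unsigned char *bytes = (const unsigned char *) &probe;

  return bytes[0] == 0x01;
}
```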
SHA1_FAST_COPY
  Defining this will eliminate some copying overhead of hashed data.
  Also, calculating the hash in SHA1Final() should be slightly faster.
  This isn't on by default because of alignment issues. See Portability
  Notes.

SHA1_UNROLL
  If undefined, it will default to 1. This is the number of rounds
  to perform in a loop iteration. The larger the number, the bigger
  the code, but also the less loop overhead there will be. It must
  be between 1 and 20 inclusive, and it must be a factor of 20 or
  a product of some of its factors. (Don't worry, you'll get a nice
  error message if you defined it wrong.)

  SHA-256 is the only other implementation that has something
  similar (SHA256_UNROLL). It must be a power of 2 between 1 and
  64 inclusive and it defaults to 1.

  You may want to experiment with different values. I've generally
  found that big code is slower, despite being more efficient. This
  is most likely due to cache space limitations.

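The constraints on the unroll factors can be written down as two small
predicates. This is a sketch under an assumption: it reads the SHA1_UNROLL
rule as "the value must divide 20 evenly" (SHA-1 runs its 80 rounds in four
stages of 20), which may not match the exact check in sha1.c. Both function
names are hypothetical.

```c
/* Hypothetical check, assuming SHA1_UNROLL must divide the 20 rounds
 * of a SHA-1 stage evenly: accepts 1, 2, 4, 5, 10 and 20. */
static int sha1_unroll_ok (int n)
{
  return n >= 1 && n <= 20 && 20 % n == 0;
}

/* Hypothetical check for SHA256_UNROLL: a power of 2 between 1 and 64.
 * A positive power of 2 has exactly one bit set, so n & (n - 1) is 0. */
static int sha256_unroll_ok (int n)
{
  return n >= 1 && n <= 64 && (n & (n - 1)) == 0;
}
```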
SHA1_TEST
  Define this to compile a simple test program. See the comments in
  sha1.c for what the output should look like. If the output doesn't
  look right, try flipping WORDS_BIGENDIAN (define it if you didn't
  define it, undefine it if you did). For example:

  > gcc -Wall -O2 -DSHA1_TEST -o test sha1.c

Portability Notes
-----------------
As was mentioned, you need a compiler that supports 64-bit integers.
You will also need <inttypes.h> for uint8_t, uint32_t, uint64_t. I'm not
sure how common or standard this include file is, but it was available
on all platforms I tested.

It was actually surprising to find that all but one of the processors
tested supported unaligned word accesses. (I came from a MC680x0 +
MIPS background.) I developed the code on i386 and powerpc architectures,
which both supported unaligned words. It wasn't until I tried out my
code on a sparc that I realized I needed to be a little more careful.
(Bus errors... yum!)

With SHA1_FAST_COPY undefined, the code should be very portable. If you
define it, the code may be slightly faster, but there are a few things
you need to be careful about, especially on architectures that don't
support unaligned word accesses. Here are some general guidelines:

Use SHA1_FAST_COPY if:

  * You call SHA1Update() with a consistent buffer size every time.
    (The last time you call it before calling SHA1Final() can be the
    exception.) And:

  * The buffer size is a multiple of 64 bytes (SHA-1, SHA-256) or
    128 bytes (SHA-384, SHA-512). And:

  * The buffer address is evenly divisible by 4 (SHA-1, SHA-256) or
    evenly divisible by 8 (SHA-384, SHA-512). And finally:

  * The hash address passed to SHA1Final() is evenly divisible by
    4 (SHA-1, SHA-256) or evenly divisible by 8 (SHA-384, SHA-512).

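The size and alignment guidelines above can be checked mechanically. Here
is a minimal sketch for SHA-1 and SHA-256; the helper names and the two
macros are hypothetical and not part of shaX-asaddi. The first guideline
(a consistent buffer size across calls) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define FAST_COPY_BLOCK 64u /* assumed: use 128 for SHA-384, SHA-512 */
#define FAST_COPY_ALIGN 4u  /* assumed: use 8 for SHA-384, SHA-512 */

/* Hypothetical check: 1 if a buffer passed to SHA1Update() has a size
 * that is a multiple of the block size and a suitably aligned address.
 * Converting the pointer to uintptr_t to test alignment is common
 * practice, though the exact mapping is implementation-defined. */
static int fast_copy_buffer_ok (const void *buf, size_t len)
{
  return len % FAST_COPY_BLOCK == 0
         && (uintptr_t) buf % FAST_COPY_ALIGN == 0;
}

/* Hypothetical check: 1 if the hash address passed to SHA1Final() is
 * suitably aligned. */
static int fast_copy_hash_ok (const void *hash)
{
  return (uintptr_t) hash % FAST_COPY_ALIGN == 0;
}
```

A caller could assert both conditions in debug builds before enabling
SHA1_FAST_COPY on a strict-alignment architecture.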
You can ensure proper address alignment by using malloc() (read your
man page to verify this) or by doing something like:

  union {
    uint32_t w; /* use uint64_t for SHA-384, SHA-512 */
    uint8_t b[SHA1_HASH_SIZE];
  } hash;
  ...
  SHA1Final (&sha, hash.b);

If you're on an architecture that supports unaligned word accesses,
it may be safe to define SHA1_FAST_COPY anyway. However, it would be
a good idea to experiment, since unaligned word accesses may actually
take longer and cancel the benefits of faster code.

Example
-------
  #include <inttypes.h> /* for uint8_t, etc. */
  #include <string.h>   /* for memset() */

  #include "sha1.h"

  ...
  SHA1Context sha;
  uint8_t hash[SHA1_HASH_SIZE];
  ...
  SHA1Init (&sha);
  ...
  SHA1Update (&sha, buffer, length);
  ...
  SHA1Update (&sha, buffer2, length2);
  ...
  /* call SHA1Update() with more data */
  ...
  SHA1Final (&sha, hash);
  memset (&sha, 0, sizeof (sha)); /* for the truly paranoid */
  ...
  /* do something with hash */
  ...

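One common way to "do something with hash" is to print it as a hexadecimal
string. The following is a minimal sketch; hash_to_hex() is a hypothetical
helper and not part of shaX-asaddi. It takes the length as a parameter so
the same code would serve the other digest sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the lowercase hex form of hash[0..len-1]
 * into out, followed by a terminating NUL. out must have room for
 * 2 * len + 1 characters. */
static void hash_to_hex (const uint8_t *hash, size_t len, char *out)
{
  static const char digits[] = "0123456789abcdef";
  size_t i;

  for (i = 0; i < len; i++) {
    out[2 * i] = digits[hash[i] >> 4];       /* high nibble */
    out[2 * i + 1] = digits[hash[i] & 0x0f]; /* low nibble */
  }
  out[2 * len] = '\0';
}
```

For SHA-1, the output buffer would be declared as
char hex[2 * SHA1_HASH_SIZE + 1].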
Platforms Tested
----------------
gcc was the compiler used on all tested platforms.

FreeBSD i386
Darwin powerpc
Linux i386
Linux alpha
Linux powerpc
Solaris sparc

Comments? Suggestions? Bugs?
----------------------------
Please let me know!

- Allan Saddi <allan@saddi.com>
