Name                 Date          Size        #Lines   LOC
build-aux/           28-Jan-2012   -           5,420    4,459
lib/                 28-Jan-2012   -           684      540
m4/                  28-Jan-2012   -           2,299    2,185
man/                 28-Jan-2012   -           637      568
po/                  03-May-2022   -           1,704    1,440
src/                 28-Jan-2012   -           4,997    3,544
ABOUT-NLS            28-Jan-2012   91.6 KiB    1,283    1,244
AUTHORS              10-Apr-2011   643         21       13
COPYING              10-Apr-2011   926         23       16
ChangeLog            28-Jan-2012   13.4 KiB    323      230
HACKING              24-Jan-2012   2.2 KiB     77       52
INSTALL              28-Jan-2012   15.2 KiB    366      284
Makefile.am          28-Jan-2012   178         12       5
Makefile.in          28-Jan-2012   24.7 KiB    772      686
NEWS                 28-Jan-2012   2.1 KiB     76       58
README               28-Jan-2012   6 KiB       214      142
README.SHA           10-Apr-2011   8.3 KiB     224      182
TODO                 28-Jan-2012   700         23       18
aclocal.m4           28-Jan-2012   36.8 KiB    1,024    924
configure            28-Jan-2012   284.9 KiB   9,773    8,224
configure.ac         28-Jan-2012   1.2 KiB     48       38
join-duplicates.sh   03-May-2022   926         48       30

README

duff - Duplicate file finder
============================

0. Introduction
===============

Duff is a command-line utility for identifying duplicates in a given set of
files.  It attempts to be usably fast and uses the SHA family of message
digests as a part of the comparisons.

The project website is here:

  http://duff.sourceforge.net/

Duff resides in a public Git repository on SourceForge.net:

  git://duff.git.sourceforge.net/gitroot/duff/duff

The version numbering scheme for duff is as follows:

 * The first number is the major version.  This will be updated upon what the
   author considers a round of feature completion.

 * The second number is the minor version number.  This is updated for releases
   that include minor new features, or features that do not change the
   functionality of the program.

 * The third number, if present, is the bugfix release number.  This indicates
   a release which only fixes bugs present in a previous major or minor release.
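
As an illustration of the numbering scheme above, here is a minimal C sketch
that splits a version string into its parts.  The DuffVersion type and the
parse_version() function are hypothetical names made up for this example; they
are not part of duff.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type for this example; duff itself defines nothing like it. */
typedef struct {
  int major;
  int minor;
  int bugfix;      /* 0 when the third number is absent */
  int has_bugfix;  /* 1 if a third number was present, else 0 */
} DuffVersion;

/* Parses "MAJOR.MINOR" or "MAJOR.MINOR.BUGFIX"; trailing text is ignored.
   Returns 1 on success, 0 if fewer than two numbers were found. */
int parse_version (const char *text, DuffVersion *out)
{
  int fields;

  assert (text != NULL && out != NULL);

  out->bugfix = 0;
  fields = sscanf (text, "%d.%d.%d", &out->major, &out->minor, &out->bugfix);
  if (fields < 2)
    return 0;

  out->has_bugfix = (fields == 3);
  return 1;
}
```

With this sketch, "0.5.2" parses as major 0, minor 5, bugfix 2, while "0.4"
parses as major 0, minor 4 with no bugfix number.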


1. License and copyright
========================

Duff is copyright (c) 2005 Camilla Berglund <elmindreda@elmindreda.org>

Duff is licensed under the zlib/libpng license.  See the file `COPYING' for
license details.  The license is also included at the top of each source file.

Duff contains shaX-asaddi.
Copyright (c) 2001-2003 Allan Saddi <allan@saddi.com>
See the files `src/sha*.c' and `src/sha*.h' for license details.

Duff uses the gettext.h convenience header from GNU gettext.
Copyright (C) 1995-1998, 2000-2002, 2004-2006, 2009 Free Software Foundation,
Inc.  See the file `lib/gettext.h' for license details.

Duff comes with a number of files provided by the GNU autoconf, automake and
gettext packages.  See the individual files in question for license details.


2. Project news
===============

See the file `NEWS'.


3. Building Duff
================

If you got this source tree from a Git repository then you will need to
bootstrap the build environment using first `gettextize' and then `autoreconf
-i'.  Note that this requires that GNU autoconf, automake and gettext are
installed.  Also note that running gettextize may cause a few duplicate entries
in various build files.  If you got the source tree using Git, you can remove
these with `git reset --hard' before moving on.

If (or once) you have a `configure' script, go ahead and run it.  No additional
magic should be required.  If it is, then that's a bug and should be reported.

This release of duff has been successfully built on the following systems:

  Cygwin 1.7 i686
  Mac OS X 10.7 x86_64
  Ubuntu Natty x86_64

Earlier releases have been successfully built on the following systems:

  Arch Linux x86
  Cygwin 1.7 i686
  Darwin 7.9.0 powerpc
  Debian Etch powerpc
  Debian Etch x86
  Debian Lenny x86
  Debian Sarge alpha
  Debian Wheezy amd64
  FreeBSD 4.11 x86
  FreeBSD 5.4 x86
  FreeBSD 8.2 i386
  Mac OS X 10.3 powerpc
  Mac OS X 10.4 powerpc
  Mac OS X 10.6 i386
  Mac OS X 10.6 x86_64
  Mac OS X 10.6 x86_64 (with MacPorts gettext)
  NetBSD 1.6.1 sparc
  Red Hat Enterprise 4.0 x86
  SunOS 5.9 sparc64
  Ubuntu Breezy x86
  Ubuntu Jaunty x86
  Ubuntu Lucid amd64
  Ubuntu Maverick amd64

The tools used were GCC and GNU or BSD make.  However, it should build on most
Unix systems without modifications.


4. Installing Duff
==================

See the file `INSTALL'.


5. Using Duff
=============

See the accompanying manpage duff(1).

To read the manpage before installation, use the following command:

  groff -mdoc -Tascii duff.1 | less -R

On GNU/Linux systems, however, the following command may suffice:

  man -l duff.1


6. Hacking Duff
===============

See the file `HACKING'.


7. Bugs, feedback and patches
=============================

Please send bug reports, feedback, patches and cookies to:

  Camilla Berglund <elmindreda@elmindreda.org>

Or, if you prefer, you may use the trackers on SF.net to report bugs, submit
patches or request features:

  http://sourceforge.net/projects/duff

For more involved discussions, please join the mailing list:

  http://lists.sourceforge.net/lists/listinfo/duff-devel


8. Credits and thanks
=====================

The following (alphabetically listed) people have contributed to duff, either
by reporting bugs, suggesting new features or submitting patches:

Harald Barth
Alexander Bostrom
Magnus Danielsson
Stephan Hegel
Patrik Jarnefelt
Rasmus Kaj
Mika Kuoppala
Richard Levitte
Fernando Lopez
Clemens Lucas Fries
Kamal Mostafa
Ross Newell
Allan Saddi <allan@saddi.com>

...and everyone I forgot.  Did I forget you?  Drop me an email.


9. Disambiguation
=================

This is duff the Unix command-line utility, not DUFF the Windows program.
If you wish to find duplicate files on Windows, use DUFF.

DUFF also has a SourceForge.net URL:

  http://dff.sourceforge.net/


10. Release history
===================

Version 0.1 was named `duplicate' and was never released anywhere.

Version 0.2 was the first release named duff.  It lacked a real checksumming
algorithm, and was thus only released to a few individuals, during the first
half of 2005.

Version 0.3 was the first official release, on November 22, 2005, after a
long search for a suitably licensed implementation of SHA1.

Version 0.3.1 was a bugfix release, on November 27, 2005, adding a single
feature (-z), which just happened to get included.

Version 0.4 was the second feature release, on January 13, 2006, adding a
number of missing and/or requested features as well as bug fixes.  It was the
first release to be considered stable and safe enough for everyday use.

Version 0.5 was the third feature release, on April 11, 2011, adding a number
of minor features and fixing a number of bugs.  It was mostly intended to get
the ball rolling again and thus low on features.

Version 0.5.1 was a bugfix release, on January 17, 2012, adding a single bugfix
and a new default cluster header for thorough mode.

Version 0.5.2 was a minor release, on January 29, 2012, adding a number of
optimizations, prefixing error and warning messages with the program name and
modifying the default sampling limit.



README.SHA

shaX-asaddi (X = 1, 256, 384, 512)
==================================
Copyright (c) 2001-2003 Allan Saddi <allan@saddi.com>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY ALLAN SADDI AND HIS CONTRIBUTORS ``AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.  IN NO EVENT SHALL ALLAN SADDI OR HIS CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

Introduction
------------
These are portable implementations of the National Institute of
Standards and Technology's Secure Hash Algorithms. Implementations
for SHA-1, SHA-256, SHA-384, and SHA-512 are available. All are
equally portable, assuming your compiler supports 64-bit integers
(which gcc does).

For more information on SHA (the algorithms), visit:
http://csrc.nist.gov/encryption/tkhash.html

The following documentation and examples will refer to the SHA-1
implementation. However, they equally apply to the SHA-256, SHA-384,
and SHA-512 implementations except where noted.

API
---
SHA1Context
  This is the hash context. There should be one SHA1Context for each
  object to be hashed. (This only applies if hashing is being done
  in parallel. Otherwise, it's perfectly safe to reuse a SHA1Context
  to hash objects serially, e.g. one file at a time.)

  A SHA1Context can be declared static, automatic, or allocated from
  the heap. There are certain alignment restrictions, but they shouldn't
  be of any concern in normal usage (malloc() should return suitably
  aligned memory, and the compiler will take care of the other cases).

  There's nothing really special about a SHA1Context. It should be
  safe to copy it, e.g. using memcpy() or bcopy().

void SHA1Init (SHA1Context *sc);
  Initializes a SHA1Context. This should be called before any of the
  following functions are called.

void SHA1Update (SHA1Context *sc, const void *data, uint32_t len);
  Hashes some data. len is in bytes.

void SHA1Final (SHA1Context *sc, uint8_t hash[SHA1_HASH_SIZE]);
  Gets the SHA-1 hash and "closes" the context. Because of padding,
  etc., the context should no longer be used afterwards. If you wish
  to hash a new set of data using the same SHA1Context, be sure to call
  SHA1Init(). If you want to continue hashing data using the
  same context, simply make a copy of the context and call
  SHA1Final() on the copy.

  hash may be NULL, in which case no hash is generated (but the
  context is still closed). Regardless of whether hash is NULL, a
  word representation of the hash (32-bit words for SHA-1 and SHA-256,
  64-bit words for SHA-384 and SHA-512) is available in
  sc->hash[0..SHA1_HASH_WORDS-1]. This may be useful in other
  applications.

  If being used for cryptography, it's probably a good idea to zero-out
  the SHA1Context after you're done.
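
The copy-then-finalize idea described for SHA1Final() can be sketched without
the library itself. The block below uses a toy FNV-1a style context as a
stand-in for SHA1Context, since sha1.h is not reproduced here. ToyContext,
ToyInit(), ToyUpdate(), ToyFinal() and ToyPeek() are hypothetical names for
this illustration only, and the toy hash is in no way a substitute for SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for SHA1Context: a 32-bit FNV-1a state plus a byte count. */
typedef struct {
  uint32_t state;
  uint32_t length;
} ToyContext;

void ToyInit (ToyContext *tc)
{
  assert (tc != NULL);
  tc->state = 2166136261u;   /* FNV-1a 32-bit offset basis */
  tc->length = 0;
}

void ToyUpdate (ToyContext *tc, const void *data, uint32_t len)
{
  const uint8_t *bytes = data;
  uint32_t i;

  assert (tc != NULL && (data != NULL || len == 0));
  for (i = 0; i < len; i++) {
    tc->state ^= bytes[i];
    tc->state *= 16777619u;  /* FNV-1a 32-bit prime */
  }
  tc->length += len;
}

/* Folds the length into the state, which "closes" the context: further
   updates would no longer match a hash of the plain data stream. */
uint32_t ToyFinal (ToyContext *tc)
{
  assert (tc != NULL);
  tc->state ^= tc->length;
  tc->state *= 16777619u;
  return tc->state;
}

/* Digest of everything hashed so far; *tc stays open for more updates. */
uint32_t ToyPeek (const ToyContext *tc)
{
  ToyContext copy;

  assert (tc != NULL);
  copy = *tc;                /* plain struct copy, equivalent to memcpy() */
  return ToyFinal (&copy);
}
```

Finalizing the copy leaves the original context untouched, so hashing can
continue on it as if no intermediate digest had been taken.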

Compile-Time Options
--------------------
HAVE_CONFIG_H
  Define this if you want the code to include <config.h>. This is useful
  if you use GNU configure.

HAVE_INTTYPES_H
HAVE_STDINT_H
  Define one of these to 1 if you have the respective header file. If you
  have neither, be sure to typedef/define uint8_t, uint32_t, and uint64_t
  appropriately (perhaps in config.h above).

WORDS_BIGENDIAN
  Define this if you're on a big-endian processor.

RUNTIME_ENDIAN
  Define this if you would rather determine processor endianness at
  runtime. WORDS_BIGENDIAN will be ignored if this is defined. The
  generated code may be slightly slower, but at least you won't
  have to worry about big-endian vs. little-endian!
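
For readers unfamiliar with runtime endianness detection, here is a minimal
sketch of one common way to do it. It is an assumption that the library does
something along these lines when RUNTIME_ENDIAN is defined; the function names
host_is_big_endian() and store_be32() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host and 0 on a little-endian one. */
int host_is_big_endian (void)
{
  const uint32_t probe = 0x01020304u;
  uint8_t first_byte;

  /* Look at the byte stored at the lowest address of the word. */
  memcpy (&first_byte, &probe, 1);
  return first_byte == 0x01;
}

/* Writes a 32-bit word as big-endian bytes, whatever the host order is.
   The SHA algorithms define their digests in big-endian byte order. */
void store_be32 (uint8_t out[4], uint32_t word)
{
  assert (out != NULL);

  if (host_is_big_endian ()) {
    memcpy (out, &word, 4);          /* bytes are already in order */
  } else {
    out[0] = (uint8_t) (word >> 24);
    out[1] = (uint8_t) (word >> 16);
    out[2] = (uint8_t) (word >> 8);
    out[3] = (uint8_t) word;
  }
}
```

The extra branch on every store is the kind of cost behind the "slightly
slower" remark above; a compile-time WORDS_BIGENDIAN setting avoids it.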

SHA1_FAST_COPY
  Defining this will eliminate some copying overhead of hashed data.
  Also, calculating the hash in SHA1Final() should be slightly faster.
  This isn't on by default because of alignment issues. See Portability
  Notes.

SHA1_UNROLL
  If undefined, it will default to 1. This is the number of rounds
  to perform in a loop iteration. The larger the number, the bigger
  the code, but also the less loop overhead there will be. It must
  be between 1 and 20 inclusive, and it must be a factor of 20 or
  a product of some of its factors. (Don't worry, you'll get a nice
  error message if you defined it wrong.)

  SHA-256 is the only other implementation that has something
  similar (SHA256_UNROLL). It must be a power of 2 between 1 and
  64 inclusive and it defaults to 1.

  You may want to experiment with different values. I've generally
  found that big code is slower, despite being more efficient. This
  is most likely due to cache space limitations.
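
The SHA256_UNROLL rule stated above (a power of 2 between 1 and 64) can be
expressed as a preprocessor check plus a run-time equivalent. This is a sketch
of how such a check could be written, not a copy of the check in the library
source; is_valid_sha256_unroll() and sha256_loop_iterations() are hypothetical
helper names.

```c
#include <assert.h>

/* Mirror the documented default of 1 when nothing was defined. */
#ifndef SHA256_UNROLL
#define SHA256_UNROLL 1
#endif

/* Compile-time check: a power of 2 between 1 and 64 inclusive. */
#if SHA256_UNROLL < 1 || SHA256_UNROLL > 64 \
    || (SHA256_UNROLL & (SHA256_UNROLL - 1)) != 0
#error "SHA256_UNROLL must be a power of 2 between 1 and 64"
#endif

/* Run-time version of the same rule, handy when experimenting. */
int is_valid_sha256_unroll (int rounds)
{
  return rounds >= 1 && rounds <= 64 && (rounds & (rounds - 1)) == 0;
}

/* Number of loop iterations needed to cover the 64 rounds of SHA-256. */
int sha256_loop_iterations (int rounds)
{
  assert (is_valid_sha256_unroll (rounds));
  return 64 / rounds;
}
```

An unroll factor of 16, for example, means 4 loop iterations of 16 rounds
each, trading code size for less loop overhead.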

SHA1_TEST
  Define this to compile a simple test program. See the comments in
  sha1.c for what the output should look like. If the output doesn't
  look right, try flipping WORDS_BIGENDIAN (define it if you didn't
  define it, undefine it if you did). For example:

  > gcc -Wall -O2 -DSHA1_TEST -o test sha1.c

Portability Notes
-----------------
As was mentioned, you need a compiler that supports 64-bit integers.
You will also need <inttypes.h> for uint8_t, uint32_t, uint64_t. I'm not
sure how common or standard this include file is, but it was available
on all platforms I tested.

It was actually surprising to find that all but one of the processors
tested supported unaligned word accesses. (I came from an MC680x0 +
MIPS background.) I developed the code on i386 and powerpc architectures,
which both supported unaligned words. It wasn't until I tried out my
code on a sparc that I realized I needed to be a little more careful.
(Bus errors... yum!)

With SHA1_FAST_COPY undefined, the code should be very portable. If you
define it, the code may be slightly faster, but there are a few things
you need to be careful about, especially on architectures that don't
support unaligned word accesses. Here are some general guidelines:

Use SHA1_FAST_COPY if:

  * You call SHA1Update() with a consistent buffer size every time.
    (The last time you call it before calling SHA1Final() can be the
    exception.) And:

  * The buffer size is a multiple of 64 bytes (SHA-1, SHA-256) or
    128 bytes (SHA-384, SHA-512). And:

  * The buffer address is evenly divisible by 4 (SHA-1, SHA-256) or
    evenly divisible by 8 (SHA-384, SHA-512). And finally:

  * The hash address passed to SHA1Final() is evenly divisible by
    4 (SHA-1, SHA-256) or evenly divisible by 8 (SHA-384, SHA-512).
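
The size and alignment guidelines above can be turned into a small sanity
check. The following is a sketch using the SHA-1/SHA-256 values (64-byte
blocks, 4-byte words); is_aligned() and fast_copy_looks_safe() are
hypothetical helpers, not library functions, and the first guideline (a
consistent buffer size across calls) cannot be verified from a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values for SHA-1 and SHA-256; SHA-384 and SHA-512 would use 128 and 8. */
#define BLOCK_BYTES 64
#define WORD_BYTES  4

/* Returns 1 if the address is evenly divisible by alignment, else 0. */
int is_aligned (const void *address, size_t alignment)
{
  assert (alignment != 0);
  return ((uintptr_t) address % alignment) == 0;
}

/* Checks the per-call guidelines: block-sized buffer, aligned buffer
   address, and aligned output address for the hash. */
int fast_copy_looks_safe (const void *buffer, size_t buffer_size,
                          const void *hash)
{
  return buffer_size != 0
      && buffer_size % BLOCK_BYTES == 0
      && is_aligned (buffer, WORD_BYTES)
      && is_aligned (hash, WORD_BYTES);
}
```

A check like this could be placed in an assert() around calls to SHA1Update()
and SHA1Final() while testing a build with SHA1_FAST_COPY defined.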

You can ensure proper address alignment by using malloc() (read your
man page to verify this) or by doing something like:

  union {
    uint32_t w; /* use uint64_t for SHA-384, SHA-512 */
    uint8_t b[SHA1_HASH_SIZE];
  } hash;
  ...
  SHA1Final (&sha, hash.b);

If you're on an architecture that supports unaligned word accesses,
it may be safe to define SHA1_FAST_COPY anyway. However, it would be
a good idea to experiment, since unaligned word accesses may actually
take longer and cancel the benefits of faster code.

Example
-------
  #include <inttypes.h> /* for uint8_t, etc. */
  #include <string.h> /* for memset() */

  #include "sha1.h"

  ...
    SHA1Context sha;
    uint8_t hash[SHA1_HASH_SIZE];
    ...
    SHA1Init (&sha);
    ...
    SHA1Update (&sha, buffer, length);
    ...
    SHA1Update (&sha, buffer2, length2);
    ...
    /* call SHA1Update() with more data */
    ...
    SHA1Final (&sha, hash);
    memset (&sha, 0, sizeof (sha)); /* for the truly paranoid */
    ...
    /* do something with hash */
  ...

Platforms Tested
----------------
gcc was the compiler used on all tested platforms.

FreeBSD  i386
Darwin   powerpc
Linux    i386
Linux    alpha
Linux    powerpc
Solaris  sparc

Comments? Suggestions? Bugs?
----------------------------
Please let me know!

- Allan Saddi <allan@saddi.com>
