Name          Date         Size        #Lines     LOC

man/          24-Mar-2009  -              931     863
php/          24-Mar-2009  -            4,895   3,976
src/          03-May-2022  -            2,857   2,539
AUTHORS       24-Mar-2009  32               2       1
COPYING       24-Mar-2009  802             16      15
INSTALL       20-Mar-2009  11.1 KiB       292     219
Makefile.am   24-Mar-2009  93              11       8
Makefile.in   24-Mar-2009  19.7 KiB       653     571
README        24-Mar-2009  6.2 KiB        216     125
README-PHP    24-Mar-2009  2.1 KiB         77      46
THANKS        24-Mar-2009  93               7       6
aclocal.m4    24-Mar-2009  307.7 KiB    8,853   7,931
config.guess  20-Mar-2009  44.4 KiB     1,543   1,328
config.h.in   24-Mar-2009  2.6 KiB        101      68
config.sub    20-Mar-2009  33 KiB       1,678   1,533
configure     24-Mar-2009  576.3 KiB   19,672  15,478
configure.ac  24-Mar-2009  2.1 KiB         71      60
install-sh    20-Mar-2009  13.3 KiB       520     344
ltmain.sh     20-Mar-2009  237.6 KiB    8,407   6,475
missing       20-Mar-2009  10.9 KiB       368     275
README


                               .:. LIBPUZZLE .:.

                         http://libpuzzle.pureftpd.org


            ------------------------ BLURB ------------------------


The Puzzle library is designed to quickly find visually similar images (gif,
png, jpg), even if they have been resized, recompressed, recolored or slightly
modified.

The library is free, lightweight yet very fast, configurable, easy to use and
it has been designed with security in mind. This is a C library, but it also
comes with a command-line tool and PHP bindings.


          ------------------------ REFERENCE ------------------------


The Puzzle library is an implementation of "An image signature for any kind of
image", by H. Chi WONG, Marshall BERN and David GOLDBERG.


         ------------------------ COMPILATION ------------------------


In order to load images, the library relies on the GD2 library.
You need to install gdlib2 and its development headers before compiling
libpuzzle.
The GD2 library is available as a pre-built package for most operating systems.
Debian and Ubuntu users should install the "libgd2-dev" or the "libgd2-xpm-dev"
package.
Gentoo users should install "media-libs/gd".
OpenBSD, NetBSD and DragonFly BSD users should install the "gd" package.
MacPorts users should install the "gd2" package.
X11 support is not required for the Puzzle library.

Once GD2 has been installed, configure the Puzzle library as usual:

./configure

This is a standard autoconf script. If you're not familiar with it, please
have a look at the INSTALL file.

Compile the beast:

make

Try the built-in tests:

make check

If everything looks fine, install the software:

make install

If anything goes wrong, please submit a bug report to:
                       libpuzzle [at] pureftpd [dot] org


            ------------------------ USAGE ------------------------


The API is documented in the libpuzzle(3) and puzzle_set(3) man pages.
You can also play with the puzzle-diff test application.
See puzzle-diff(8) for more info about the puzzle-diff application.

In order to be thread-safe, every exported function of the library requires a
PuzzleContext object. That object stores various run-time tunables.

Out of a bitmap picture, the Puzzle library can fill a PuzzleCvec object:

  PuzzleContext context;
  PuzzleCvec cvec;

  puzzle_init_context(&context);
  puzzle_init_cvec(&context, &cvec);
  puzzle_fill_cvec_from_file(&context, &cvec, "directory/filename.jpg");

The PuzzleCvec structure holds two fields:
  signed char *vec:  a pointer to the first element of the vector
  size_t sizeof_vec: the number of elements

The size depends on the "lambdas" value (see puzzle_set(3)).

PuzzleCvec structures can be compared:

  d = puzzle_vector_normalized_distance(&context, &cvec1, &cvec2, 1);

d is the normalized distance between both vectors. If d is below 0.6, pictures
are probably similar.

If you need further help, feel free to subscribe to the mailing-list (see
below).


          ------------------------ INDEXING ------------------------


How can you quickly find similar pictures when there are millions of records?

The original paper has a simple, yet efficient answer.

Cut the vector into fixed-length words. For instance, let's consider the
following vector:

[ a b c d e f g h i j k l m n o p q r s t u v w x y z ]

With a word length (K) of 10, you can get the following words:

[ a b c d e f g h i j ] found at position 0
[ b c d e f g h i j k ] found at position 1
[ c d e f g h i j k l ] found at position 2
etc. until position N-1, N being the number of words you index

Then, index your vector with a compound index of (word + position).

Even with millions of images, K = 10 and N = 100 should be enough to have very
few entries sharing the same index.
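
The cutting step described above can be sketched as follows. This is a minimal
illustration, not code from the library: WORD_LEN, IndexEntry and
split_into_words() are hypothetical names, and the way each (word + position)
pair is stored is left to your database.

```c
#include <stddef.h>
#include <string.h>

#define WORD_LEN 10   /* K in the text above */

/* One index entry: a word of K elements and the position where it starts. */
typedef struct {
    size_t      position;
    signed char word[WORD_LEN];
} IndexEntry;

/*
 * Cut up to max_words overlapping words out of vec (vec_len elements),
 * starting at positions 0, 1, 2...
 * Entries are written to out, which must have room for max_words entries.
 * Returns the number of entries written, which is lower than max_words
 * when the vector is too short.
 */
size_t split_into_words(const signed char *vec, size_t vec_len,
                        IndexEntry *out, size_t max_words)
{
    if (vec_len < WORD_LEN) {
        return 0;
    }
    size_t available = vec_len - WORD_LEN + 1;
    if (max_words > available) {
        max_words = available;
    }
    for (size_t pos = 0; pos < max_words; pos++) {
        out[pos].position = pos;
        memcpy(out[pos].word, vec + pos, WORD_LEN);
    }
    return max_words;
}
```

Applied to the 26-element example vector with max_words set to 100, the
function stops at position 16, the last position where a full 10-element word
still fits.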

Here's a very basic sample database schema:

+-----------------------------+
|          signatures         |
+-----------------------------+
| sig_id | signature | pic_id |
+--------+-----------+--------+

+--------------------------+
|           words          |
+--------------------------+
| pos_and_word | fk_sig_id |
+--------------+-----------+

I'd recommend splitting at least the "words" table into multiple tables and/or
servers.
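
With such a schema, a lookup amounts to finding the signatures that share at
least one (position, word) pair with the query, then computing the real
distance on those candidates only. The sketch below performs the same test
directly on two in-memory vectors. count_shared_words() is a hypothetical
helper, not a libpuzzle function; a real deployment would let the database
answer this through the "words" table.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the positions at which vectors a and b hold the same word.
 * Both vectors hold vec_len elements; words are word_len elements long and
 * only the first max_words positions are examined.
 * A result greater than zero means b would be returned as a candidate
 * when looking up a in the index.
 */
size_t count_shared_words(const signed char *a, const signed char *b,
                          size_t vec_len, size_t word_len, size_t max_words)
{
    if (word_len == 0 || vec_len < word_len) {
        return 0;
    }
    size_t available = vec_len - word_len + 1;
    if (max_words > available) {
        max_words = available;
    }
    size_t shared = 0;
    for (size_t pos = 0; pos < max_words; pos++) {
        if (memcmp(a + pos, b + pos, word_len) == 0) {
            shared++;
        }
    }
    return shared;
}
```

A single differing element only invalidates the words that overlap it, so
slightly modified pictures usually keep a few words in common.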

By default (lambdas=9), signatures are 544 bytes long. In order to save storage
space, they can be compressed to one third of their original size through the
puzzle_compress_cvec() function. Before use, they must be uncompressed with
puzzle_uncompress_cvec().
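
To see why a ratio of one third is achievable, assume that every element of a
signature takes one of five levels, from -2 to +2, as in the reference paper.
Three such elements then fit in a single byte, since 5 * 5 * 5 = 125. The
sketch below shows that packing. It is an assumption made for illustration,
not necessarily the format produced by puzzle_compress_cvec(); pack3(),
unpack3() and packed_size() are hypothetical names.

```c
#include <stddef.h>

/* Pack three elements, each in the range -2..+2, into one byte (0..124). */
unsigned char pack3(signed char a, signed char b, signed char c)
{
    return (unsigned char) ((a + 2) + 5 * (b + 2) + 25 * (c + 2));
}

/* Reverse pack3(): recover the three elements from a packed byte. */
void unpack3(unsigned char packed, signed char out[3])
{
    out[0] = (signed char) ((packed % 5) - 2);
    out[1] = (signed char) (((packed / 5) % 5) - 2);
    out[2] = (signed char) (((packed / 25) % 5) - 2);
}

/* Number of bytes needed to store n elements, three per byte. */
size_t packed_size(size_t n)
{
    return (n + 2) / 3;
}
```

Under that assumption, a 544-element signature needs 182 bytes once packed.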


         ------------------------ PUZZLE-DIFF ------------------------


A command-line tool is also available for scripting or testing.

It is installed as "puzzle-diff" and comes with a man page.

Sample usage:

- Output distance between two images:

$ puzzle-diff pic-a-0.jpg pic-a-1.jpg
0.102286

- Compare two images, exit with 10 if they look the same, exit with 20 if
they don't (may be useful for scripts):

$ puzzle-diff -e pic-a-0.jpg pic-a-1.jpg
$ echo $?
10

- Compute distance, without cropping and with computing the average intensity
of the whole blocks:

$ puzzle-diff -p 1.0 -c pic-a-0.jpg pic-a-1.jpg
0.0523151


  ------------------------ COMPARING IMAGES WITH PHP ------------------------


A PHP extension is bundled with the Libpuzzle package, and it provides PHP
bindings to most functions of the library.

Documentation for the Libpuzzle PHP extension is available in the README-PHP
file.


    ------------------------ APPS USING LIBPUZZLE ------------------------


Here are third-party projects using libpuzzle:

* ftwin - http://jok.is-a-geek.net/ftwin.php
  ftwin is a tool for finding duplicate files on your file system according
  to their content.


           ------------------------ CONTACT ------------------------


The main web site for the project is: http://libpuzzle.pureftpd.org

If you need to share ideas with other users, or if you need help, feel free to
subscribe to the mailing-list.

In order to subscribe, just send a mail with random content to:

listpuzzle-subscribe at pureftpd dot org

For anything else, you can get in touch with me at:

libpuzzle at pureftpd dot org

If you are interested in bindings for Ruby, Python, PHP, etc. just ask!


Thank you,

               -Frank.

README-PHP


                       .:. LIBPUZZLE - PHP EXTENSION .:.

                         http://libpuzzle.pureftpd.org


        ------------------------ PHP EXTENSION ------------------------


The Puzzle library can also be used through PHP, using a native extension.

Prerequisites are the PHP headers, libtool, autoconf and automake.

Here are the basic steps in order to install the extension:

(on OpenBSD: export AUTOMAKE_VERSION=1.9 ; export AUTOCONF_VERSION=2.61)

cd php/libpuzzle
phpize
./configure --with-libpuzzle
make clean
make
make install

If libpuzzle is installed in a non-standard location, use:
./configure --with-libpuzzle=/base/directory/for/libpuzzle

Then edit your php.ini file and add:

extension=libpuzzle.so


            ------------------------ USAGE ------------------------


The PHP extension provides bindings for the following tuning functions:
- puzzle_set_max_width()
- puzzle_set_max_height()
- puzzle_set_lambdas()
- puzzle_set_noise_cutoff()
- puzzle_set_p_ratio()
- puzzle_set_contrast_barrier_for_cropping()
- puzzle_set_max_cropping_ratio()
- puzzle_set_autocrop()

Have a look at the puzzle_set(3) man page for more info about those.

Getting the signature of a picture is as simple as:

$signature = puzzle_fill_cvec_from_file($filename);

In order to compute the similarity between two pictures using their
signatures, use:

$d = puzzle_vector_normalized_distance($signature1, $signature2);

The result is between 0.0 and 1.0, with 0.6 being a good threshold to detect
visually similar pictures.

The PUZZLE_CVEC_SIMILARITY_THRESHOLD, PUZZLE_CVEC_SIMILARITY_HIGH_THRESHOLD,
PUZZLE_CVEC_SIMILARITY_LOW_THRESHOLD and PUZZLE_CVEC_SIMILARITY_LOWER_THRESHOLD
constants can also be used to get common thresholds:

if ($d < PUZZLE_CVEC_SIMILARITY_THRESHOLD) {
  echo "Pictures look similar\n";
}

Before storing a signature into a database, you can compress it in order to
save some storage space:

$compressed_signature = puzzle_compress_cvec($signature);

Before use, those compressed signatures must be uncompressed with:

$signature = puzzle_uncompress_cvec($compressed_signature);