# libpillowfight / pypillowfight

Really simple C library containing various image processing algorithms.
It includes Python 3 bindings designed to operate on Pillow images (PIL.Image).

The C library depends only on the libc.
The Python bindings depend only on Pillow.

APIs are designed to be as simple to use as possible. Default values are provided
for every parameter.

Python 2.x is *not* supported.

Available algorithms are listed below.

## Available algorithms

* [Unpaper](https://github.com/Flameeyes/unpaper)'s algorithms
  * Blackfilter
  * Noisefilter
  * Blurfilter
  * Masks
  * Grayfilter
  * Border
* Canny edge detection
* Sobel operator
* Gaussian blur
* ACE (Automatic Color Equalization; parallelized implementation)
* SWT (Stroke Width Transformation)
* Compare: Compares two images (grayscale) and makes the pixels that differ
  clearly visible (red).
* Scan borders: Tries to detect the borders of a page in an image coming from
  a scanner.


## Python API

The Python API can be compiled, installed and used without installing the C library.

### Installation

Latest release:

```sh
$ sudo pip3 install pypillowfight
```

Development version:

```sh
$ git clone https://github.com/openpaperwork/libpillowfight.git
$ cd libpillowfight

# Both C library and Python module
$ make
$ sudo make install  # will run python3 ./setup.py install + make install (CMake)

# Or just the Python bindings
$ make build_py
$ make install_py  # will run only python3 ./setup.py install
```

### Usage

For each algorithm, a function is available. It takes a PIL.Image instance as parameter.
It may take other optional parameters. The return value is another PIL.Image instance.


Example:

```py
import PIL.Image
import pillowfight

input_img = PIL.Image.open("tests/data/brightness_problem.jpg")
output_img = pillowfight.ace(input_img)
```

### Tests

```sh
make check  # will check style
make test  # will run the tests (will require tox)
```

Test reference images are made on amd64. They should also match on i386.
On other architectures however, due to slight differences regarding floating
point numbers, results may vary slightly and tests may not pass.


## C library

### Installation

```sh
# C library only (will use CMake)
$ make build_c
$ sudo make install_c
```

### Usage

#### C code

For each algorithm, a function is available. It takes a ```struct pf_bitmap```
as input. As output, it fills in another ```struct pf_bitmap```.

```struct pf_bitmap``` is a really simple structure:

```C
struct pf_bitmap {
    struct {
        int x;
        int y;
    } size;
    union pf_pixel *pixels;
};
```

```(struct pf_bitmap).size.x``` is the width of the image.

```(struct pf_bitmap).size.y``` is the height of the image.

```union pf_pixel``` is basically a 32-bit integer, defined in a manner convenient
to retrieve each color independently (RGB). Each color is on one byte. The 4th byte is
unused (no alpha channel taken into account).

```(struct pf_bitmap).pixels``` must point to a memory area containing the image.
The image must contain ```size.x * size.y``` elements of type ```union pf_pixel```.


#### Compilation with GCC

```
$ gcc -Wall -Werror -o test test.c -lpillowfight
```


## Note regarding Unpaper's algorithms

Many algorithms in this library are re-implementations of algorithms used
by [Unpaper](https://github.com/Flameeyes/unpaper). To make the API simpler
to use (and to implement), a lot of settings have been hard-coded.


Unpaper applies them in the following order:

* Blackfilter
* Noisefilter
* Blurfilter
* Masks
* Grayfilter
* Border

I would advise applying automatic color equalization (ACE) first.

Basic documentation for some of the algorithms can be found in
[Unpaper's documentation](https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md).

| Input | Output |
| ----- | ------ |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [ACE + Unpapered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_all.jpg) |
| [Brightness problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/brightness_problem.jpg) | [ACE + Unpapered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/brightness_problem_all.jpg) |

## Available algorithms

### Automatic Color Equalization (ACE)

| Input | Output |
| ----- | ------ |
| [Brightness problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/brightness_problem.jpg) | [Corrected](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/brightness_problem_ace.jpg) |

This algorithm is quite slow (~40s for one big image with one thread
on my machine), so this version is parallelized (down to ~15s on a
4-core computer).


#### Python API

```py
out_img = pillowfight.ace(img_in,
    slope=10,
    limit=1000,
    samples=100,
    seed=None)
```

It uses as many threads as there are cores on the computer (up to 32).

This algorithm uses random numbers. If you need consistent results
(for unit tests for instance), you can specify a seed for the
random number generator. Otherwise, time.time() will be used.

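
To illustrate why a fixed seed gives consistent results, here is a minimal pure-Python sketch. It is not the library's code: `pick_sample_points` is a hypothetical helper, and it only assumes that the `samples` parameter corresponds to randomly chosen pixel positions. The fallback to `time.time()` mirrors the behaviour described above.

```python
import random
import time


def pick_sample_points(width, height, samples=100, seed=None):
    """Return `samples` random (x, y) positions inside a width x height image.

    Hypothetical helper for illustration only; it is not part of pillowfight.
    """
    if seed is None:
        # No seed given: fall back to the current time, so every run differs.
        seed = time.time()
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(samples)]


# The same seed always yields the same sample positions.
first = pick_sample_points(640, 480, seed=42)
second = pick_sample_points(640, 480, seed=42)
print(first == second)
```

In a unit test, passing the same `seed` to `pillowfight.ace()` on every run plays the same role: the random choices, and therefore the output image, stay stable between runs.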


#### C API

```C
#define PF_DEFAULT_ACE_SLOPE 10
#define PF_DEFAULT_ACE_LIMIT 1000
#define PF_DEFAULT_ACE_NB_SAMPLES 100
#define PF_DEFAULT_ACE_NB_THREADS 2
extern void pf_ace(const struct pf_bitmap *in, struct pf_bitmap *out,
                   int nb_samples, double slope, double limit,
                   int nb_threads);
```

This function uses random numbers coming from ```rand()```.
You *should* call ```srand()``` before calling this function.


#### Sources

* "A new algorithm for unsupervised global and local color correction." - A. Rizzi, C. Gatta and D. Marini
* http://argmax.jp/index.php?colorcorrect


### Scan border

This algorithm tries to find page borders in a scanned image. It is designed
to operate on images coming from a flatbed scanner or a scanner with an
automatic document feeder.

This algorithm looks for horizontal and vertical lines, and returns the
smallest rectangle that includes all those lines. To get the lines, it runs
the Sobel operator on the input image and only keeps the points with
an angle of [0°, 90°, 180°, 270°] ±5°.

This algorithm does not always work:
- It's quite sensitive to noise: dust, hair, etc. may easily be counted
  erroneously as lines.
- Some scanners or drivers (most Brother scanners for instance) "clean up"
  the image before returning it. Unfortunately they often remove most of the
  page borders in the process.

Still, this algorithm can help users of GUI applications by pre-selecting a
cropping area.

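
The angle filtering and the "smallest rectangle" step can be sketched in pure Python as follows. This is a simplified illustration, not the library's implementation: it works on individual edge points instead of detected lines, the function names are hypothetical, and the `(left, top, right, bottom)` tuple layout is an assumption.

```python
def is_axis_aligned(angle_deg, tolerance_deg=5.0):
    """True if the angle is within the tolerance of 0, 90, 180 or 270 degrees."""
    remainder = angle_deg % 90.0
    return min(remainder, 90.0 - remainder) <= tolerance_deg


def bounding_rectangle(edge_points):
    """Smallest rectangle containing every axis-aligned edge point.

    `edge_points` is an iterable of (x, y, angle_deg) tuples, for instance
    gradient angles computed with a Sobel operator. Returns
    (left, top, right, bottom), or None if no point qualifies.
    """
    kept = [(x, y) for (x, y, angle) in edge_points if is_axis_aligned(angle)]
    if not kept:
        return None
    xs = [x for x, _ in kept]
    ys = [y for _, y in kept]
    return (min(xs), min(ys), max(xs), max(ys))


points = [(10, 20, 1.0), (300, 400, 92.0), (5, 5, 45.0), (50, 600, 268.5)]
print(bounding_rectangle(points))  # the 45 degree point is ignored
```

A single stray point with a near-horizontal or near-vertical angle is enough to enlarge the rectangle, which is why noise such as dust or hair affects the result so easily.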


| Input | Output |
| ----- | ------ |
| [brother_mfc7360n](https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight/raw/master/tests/data/brother_mfc7360.jpeg) | (56, 8, 1637, 2275) |
| [epson_xp425](https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight/raw/master/tests/data/epson_xp425) | (4, 5, 2484, 3498) |
| [brother_ds620](https://gitlab.gnome.org/World/OpenPaperwork/libpillowfight/raw/master/tests/data/brother_ds620.jpeg) | (3, 3, 2507, 3527) |

#### Python API

```py
frame = pillowfight.find_scan_borders(img_in)
```


#### C API

```C
struct pf_point {
    int x;
    int y;
};

struct pf_rectangle {
    struct pf_point a;
    struct pf_point b;
};

struct pf_rectangle pf_find_scan_borders(const struct pf_bitmap *img_in);
```


#### Sources

* ["Detecting Text in Natural Scenes with Stroke Width Transform"](http://cmp.felk.cvut.cz/~cernyad2/TextCaptchaPdf/Detecting%20Text%20in%20Natural%20Scenes%20with%20Stroke%20Width%20Transform.pdf) - Boris Epshtein, Eyal Ofek, Yonatan Wexler
* https://github.com/aperrau/DetectText

### Canny's edge detection

| Input | Output |
| ----- | ------ |
| [Crappy background](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background.jpg) | [Canny output](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background_canny.jpg) |


#### Python API

```py
img_out = pillowfight.canny(img_in)
```


#### C API

```C
extern void pf_canny(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* "A Computational Approach to Edge Detection" - John Canny
* https://en.wikipedia.org/wiki/Canny_edge_detector


### Compare

Simple algorithm showing the difference between two images.
Note that it converts the images to grayscale first.

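
As a rough picture of what such a comparison does, here is a minimal pure-Python sketch operating on nested lists of RGB tuples. It is an illustration under assumptions, not the library's code: `compare_sketch` is a hypothetical name, grayscale is taken as a plain channel average, and differences up to a threshold are ignored while larger ones are painted red.

```python
def to_gray(pixel):
    """Average the three channels (assumed grayscale conversion)."""
    r, g, b = pixel
    return (r + g + b) // 3


def compare_sketch(img_a, img_b, tolerance=10):
    """Compare two same-sized images given as rows of (r, g, b) tuples.

    Returns (nb_diff, out): the number of differing pixels and an output
    image where differing pixels are red and the others are grayscale.
    """
    nb_diff = 0
    out = []
    for row_a, row_b in zip(img_a, img_b):
        out_row = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            gray_a, gray_b = to_gray(pixel_a), to_gray(pixel_b)
            if abs(gray_a - gray_b) > tolerance:
                nb_diff += 1
                out_row.append((255, 0, 0))  # highlight the difference
            else:
                out_row.append((gray_a, gray_a, gray_a))
        out.append(out_row)
    return nb_diff, out


img_1 = [[(0, 0, 0), (100, 100, 100)]]
img_2 = [[(5, 5, 5), (200, 200, 200)]]
nb_diff, diff_img = compare_sketch(img_1, img_2, tolerance=10)
print(nb_diff)  # only the second pixel differs by more than the tolerance
```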

It accepts a parameter 'tolerance': for each pixel, the difference with
the corresponding pixel from the other image is computed. If the
difference is between 0 and 'tolerance', it is ignored (pixels
are considered equal).

| Input | Input2 | Output |
| ----- | ------ | ------ |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [Black border problem + blackfilter](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_blackfilter.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_diff.jpg) |

#### Python API

```py
(nb_diff, out_img) = pillowfight.compare(img_in, img_in2, tolerance=10)
```


#### C API

```C
extern int pf_compare(const struct pf_bitmap *in, const struct pf_bitmap *in2,
                      struct pf_bitmap *out, int tolerance);
```

Returns the number of pixels that are different between both images.


### Gaussian

| Input | Output |
| ----- | ------ |
| [Crappy background](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background.jpg) | [Gaussed](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background_gaussian.jpg) |

One of the parameters is ```sigma```. If it is equal to 0.0, it will be computed automatically
using the following formula (same as OpenCV):

```C
sigma = 0.3 * ((nb_stddev - 1) * 0.5 - 1) + 0.8;
```

#### Python API

```py
img_out = pillowfight.gaussian(img_in, sigma=2.0, nb_stddev=5)
```


#### C API

```C
extern void pf_gaussian(const struct pf_bitmap *in, struct pf_bitmap *out,
                        double sigma, int nb_stddev);
```


#### Sources

* https://en.wikipedia.org/wiki/Gaussian_blur
* https://en.wikipedia.org/wiki/Gaussian_function


### Sobel operator

| Input | Output |
| ----- | ------ |
| [Crappy background](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background.jpg) | [Sobel](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background_sobel.jpg) |


#### Python API

```py
img_out = pillowfight.sobel(img_in)
```


#### C API

```C
extern void pf_sobel(const struct pf_bitmap *in_img, struct pf_bitmap *out_img);
```


#### Sources

* https://www.researchgate.net/publication/239398674_An_Isotropic_3_3_Image_Gradient_Operator
* https://en.wikipedia.org/wiki/Sobel_operator


### Stroke Width Transformation

This algorithm extracts text from natural scene images.

To find text, it looks for strokes. Note that it doesn't appear to work well on
scanned documents because strokes are too small.

This implementation can provide the output in 3 different ways:

* Black & White: Detected text is black. Background is white.
* Grayscale: Detected text is gray. Its exact color is proportional to the stroke width detected.
* Original boxes: The rectangle around the detected text is copied as is in the output image. The rest of the image is white.


(The following examples use original boxes and black & white text.)

| Input | Output |
| ----- | ------ |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [SWT (SWT_OUTPUT_ORIGINAL_BOXES)](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_swt.jpg) |
| [Crappy background](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background.jpg) | [SWT (SWT_OUTPUT_ORIGINAL_BOXES)](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/crappy_background_swt.jpg) |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/47b1f59ce9a5fb3816e3abd186c28cc4c6092e13/tests/data/black_border_problem.jpg) | [SWT (SWT_OUTPUT_BW_TEXT)](https://raw.githubusercontent.com/openpaperwork/libpillowfight/47b1f59ce9a5fb3816e3abd186c28cc4c6092e13/tests/data/black_border_problem_swt.jpg) |
| [Crappy background](https://raw.githubusercontent.com/openpaperwork/libpillowfight/47b1f59ce9a5fb3816e3abd186c28cc4c6092e13/tests/data/crappy_background.jpg) | [SWT (SWT_OUTPUT_BW_TEXT)](https://raw.githubusercontent.com/openpaperwork/libpillowfight/47b1f59ce9a5fb3816e3abd186c28cc4c6092e13/tests/data/crappy_background_swt.jpg) |


#### Python API

```py
# SWT_OUTPUT_BW_TEXT = 0  # default
# SWT_OUTPUT_GRAYSCALE_TEXT = 1
# SWT_OUTPUT_ORIGINAL_BOXES = 2

img_out = pillowfight.swt(img_in, output_type=pillowfight.SWT_OUTPUT_ORIGINAL_BOXES)
```


#### C API

```C
enum pf_swt_output
{
    PF_SWT_OUTPUT_BW_TEXT = 0,
    PF_SWT_OUTPUT_GRAYSCALE_TEXT,
    PF_SWT_OUTPUT_ORIGINAL_BOXES,
};
#define PF_DEFAULT_SWT_OUTPUT PF_SWT_OUTPUT_BW_TEXT

extern void pf_swt(const struct pf_bitmap *in_img, struct pf_bitmap *out_img,
                   enum pf_swt_output output_type);
```


#### Sources

* ["Detecting Text in Natural Scenes with Stroke Width Transform"](http://cmp.felk.cvut.cz/~cernyad2/TextCaptchaPdf/Detecting%20Text%20in%20Natural%20Scenes%20with%20Stroke%20Width%20Transform.pdf) - Boris Epshtein, Eyal Ofek, Yonatan Wexler
* https://github.com/aperrau/DetectText


### Unpaper's Blackfilter

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [Filtered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_blackfilter.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_blackfilter_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_blackfilter(img_in)
```


#### C API

```C
extern void pf_unpaper_blackfilter(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


### Unpaper's Blurfilter

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [Filtered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_blurfilter.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_blurfilter_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_blurfilter(img_in)
```


#### C API

```C
extern void pf_unpaper_blurfilter(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


### Unpaper's Border

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem 3](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem3.jpg) | [Border](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem3_border.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem3_border_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_border(img_in)
```


#### C API

```C
extern void pf_unpaper_border(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


### Unpaper's Grayfilter

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [Filtered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_grayfilter.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_grayfilter_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_grayfilter(img_in)
```


#### C API

```C
extern void pf_unpaper_grayfilter(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


### Unpaper's Masks

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem 2](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem2.jpg) | [Masks](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem2_masks.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem2_masks_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_masks(img_in)
```


#### C API

```C
extern void pf_unpaper_masks(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


### Unpaper's Noisefilter

| Input | Output | Diff |
| ----- | ------ | ---- |
| [Black border problem](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem.jpg) | [Filtered](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_noisefilter.jpg) | [Diff](https://raw.githubusercontent.com/openpaperwork/libpillowfight/master/tests/data/black_border_problem_noisefilter_diff.jpg) |


#### Python API

```py
img_out = pillowfight.unpaper_noisefilter(img_in)
```


#### C API

```C
extern void pf_unpaper_noisefilter(const struct pf_bitmap *in, struct pf_bitmap *out);
```


#### Sources

* https://github.com/Flameeyes/unpaper
* https://github.com/Flameeyes/unpaper/blob/master/doc/basic-concepts.md


## Contact

* [Mailing-list](https://github.com/openpaperwork/paperwork/wiki/Contact#mailing-list)
* [Bug tracker](https://github.com/openpaperwork/libpillowfight/issues/)