README.md

Scene Text Detection and Recognition in Natural Scene Images
============================================================

The module contains algorithms to detect text, segment words and recognise the text.
It is mainly intended for "text in the wild", i.e. short phrases and separate words that occur on navigation signs and the like. It is not an OCR tool for scanned documents, so do not treat it as one.
The detection part can in theory handle different languages, but will likely fail on hieroglyphic texts.

The recognition part currently uses the open-source Tesseract OCR (https://code.google.com/p/tesseract-ocr/). If Tesseract OCR is not installed on your system, the corresponding part of the functionality will be unavailable.

Here are instructions on how to install Tesseract on your machine (Linux or Mac; Windows users should look for precompiled binaries or try to adapt the instructions below):

Tesseract installation instructions (Linux, Mac)
------------------------------------------------

0. Linux users may try to install tesseract-3.03-rc1 (or later) and leptonica-1.70 (or later) with the corresponding development packages using their package manager. Mac users may try brew. The instructions below are for those who want to build Tesseract from source.

1. Download the leptonica 1.70 tarball (a helper image processing library used by Tesseract; later versions might work too):
http://www.leptonica.com/download.html
Unpack and build it:

cd leptonica-1.70
mkdir build && cd build && ../configure && make && sudo make install

leptonica will be installed to /usr/local.

2. Download the tesseract-3.03-rc1 tarball from https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing
Unpack and build it:

# needed only to build tesseract
export LIBLEPT_HEADERSDIR=/usr/local/include/
cd tesseract-3.03
mkdir build && cd build
../configure --with-extra-includes=/usr/local --with-extra-libraries=/usr/local
make && sudo make install

Tesseract will be installed to /usr/local.

3. Download the pre-trained classifier data for the English language:
https://code.google.com/p/tesseract-ocr/downloads/detail?name=eng.traineddata.gz

Decompress it (gzip -d eng.traineddata.gz) and copy the result to /usr/local/share/tessdata.
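
The steps above can be checked with a small script. This is only a sketch: `check_tess_install` is a hypothetical helper, not part of Tesseract or OpenCV, and the header and binary paths it probes (`include/leptonica/allheaders.h`, `bin/tesseract`) are assumptions about where a default `make install` places files under the prefix. Only the tessdata location is taken from step 3.

```shell
# Hypothetical helper: report whether the files that steps 1-3 are expected
# to install exist under a prefix (default /usr/local, as used above).
# The header and binary locations are assumptions; adjust for your build.
check_tess_install() {
    prefix="${1:-/usr/local}"
    status=0
    for f in \
        "$prefix/include/leptonica/allheaders.h" \
        "$prefix/bin/tesseract" \
        "$prefix/share/tessdata/eng.traineddata"
    do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1        # remember that at least one file is absent
        fi
    done
    return "$status"
}
```

Calling `check_tess_install /usr/local` prints one line per file and returns a non-zero status if anything is missing, which may help narrow down which step needs to be repeated.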

Notes
-----
1. Google has announced that code.google.com will be closed, so at some point you may have to find Tesseract 3.03rc1 or later elsewhere.

2. The Tesseract configure script may fail to detect leptonica, so you may have to edit the configure script: comment out the if's around the resulting error message and retain only the "then" branch.

3. You are encouraged to search the Net for better pre-trained classifiers, as well as classifiers for other languages.
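
After adding classifiers for other languages, it can be useful to see which ones are in place. The sketch below assumes that each language is stored as a `<code>.traineddata` file in the tessdata directory from step 3. `list_tessdata_langs` is a hypothetical helper name, not a Tesseract command.

```shell
# Hypothetical helper: print the language code of every *.traineddata file
# in a tessdata directory (default taken from step 3 above).
list_tessdata_langs() {
    dir="${1:-/usr/local/share/tessdata}"
    for f in "$dir"/*.traineddata; do
        [ -e "$f" ] || continue          # the glob matched nothing
        basename "$f" .traineddata       # eng.traineddata -> eng
    done
}
```

For example, `list_tessdata_langs /usr/local/share/tessdata` prints one code per line. Tesseract builds that support the `--list-langs` option should report similar information.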


Text Detection CNN
==================

Intro
-----

The text module now includes text detection and recognition based on deep CNNs. The text detector is a deep CNN that takes an image which may contain multiple words and outputs a list of Rects with bounding boxes, each with the probability that it contains text. The text recognizer provides a probability over a given vocabulary for each of these Rects.