TODO LIST

Please send good ideas.

Next release (0.5x)
- compile-test with dietlibc and TinyCC (tcc)
  CC="diet -Os gcc -Wall -s -nostdinc" ./configure
- statistics photo vs. smoothing: follow neighbour pixels with max gradient
  to max and min (contrast? distance? number of max/min centers)
  for a better (local) threshold
- 1/8^2 reduction for fast text array/image/line detection via statistics?
  (min. 5x7 font + 1 pixel space)
- replace getTextLine-loop(line) by output_text() (libs have to use pipes)
  simplify code
- allow uppercase suffixes (.JPG as well as .jpg)
- get rid of global variable job_t *JOB
- make stderr silent for verbose=0 (when using libPgm2asc), see mail 2010-07-13
  add debug messages to a debug string on the object list?
  (clickable in an XML front end?)
- quality test script for groups of samples (to ensure improvement in recognition)
  single chars or single chars in words, clean and difficult sets,
  formatted texts
  false negatives + false positives
  start with numbers? option numbers only?
  output: find fname.{jpg,png} + fname.txt + compare gocr output against txt
  options: [testpath] [testchars]
  bin/gocr_chk.sh testbase/free/{clean,glued,dusty}/{numbers,text}/
  bin/gocr_chk.sh jocr/examples tmp09
- fix problem with cutting melted chars (using vector frames)
- better detection of agglutinated serifs (Gutenberg scans 086.png+171.png)
- vectorize recognition (big step!, relation to other OSS?)
  (find min distance to ideal vector patterns, start with <>())
- frame_nn marks only the borders, like frame_vector, and is removed later
- handle broken and glued chars by the database algorithm (-m 256 -m 130)
- improve get_line2(), implement distance_to_point and distance_to_line
- dot-matrix printouts (examples/matrix.jpg) (German: Nadeldrucker)
- examples/inverse.pcx + examples/rotate45.pcx by nearest-box-to-line alg.
  or mean nearest box (or its 4 edges) directions,
  rotate only boxes (by creating new larger boxes and treat them as a new image)
- check replacement of getTextLine by getXMLline via pipe(?) or stdout
  are pipes available on all platforms?
- docs on using ispell via XML (what is needed, test in gocr.tcl)
- replace the remaining UNDEFINED entries in unicode.c with their correct strings
- add probability for box->m1..m4 (to reduce errors caused by bad line-scan)
  call the line detection function a second time to improve uncertain line data
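
Sketch for the suffix item above (.JPG vs. .jpg): one possible approach is to
compare the file-name suffix case-insensitively. has_suffix_nocase() is a
hypothetical helper name, not an existing gocr function, and only ASCII case
folding is assumed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not part of the gocr sources):
 * returns 1 if `name` ends with `suffix`, ignoring ASCII case, else 0. */
static int has_suffix_nocase(const char *name, const char *suffix)
{
  size_t n, s, i;

  assert(name != NULL && suffix != NULL);
  n = strlen(name);
  s = strlen(suffix);
  if (s > n) return 0;          /* suffix longer than the whole name */
  name += n - s;                /* point at the last s characters */
  for (i = 0; i < s; i++) {
    /* cast to unsigned char: tolower() is undefined for negative values */
    if (tolower((unsigned char)name[i]) != tolower((unsigned char)suffix[i]))
      return 0;
  }
  return 1;
}
```

With this, has_suffix_nocase(fname, ".jpg") would accept scan.JPG, scan.Jpg
and scan.jpg alike.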

Next release (0.5x)
- reduce pixel data by vectorization (big change, faster)
- writing images through pipes (like reading)
- use a dictionary (optional) to replace unrecognized chars
- Karsten.Hilbert@gmx.net: use OCRchie WordBox-format
  see http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html
  aspell instead of ispell

Near future: (planned version)
- rewrite install-routine
- perspective distortion (for cameras)
- genetic algorithms engine (already in development, 0.8?). It includes
  feature extraction and classification
- support for other languages (may affect context_correction(), etc.) (0.6?)
- support for diagramation. Can be done using the Unicode+new stuff. I (bbg)
  have some ideas.

Far future:
- gimp plugin
- color support
- Braille detection (useful for blind people?)
  see: American Journal of Physics Vol. 70, No. 7, p 684-688 (2002)
  or use special foils
- read image in smaller parts, to reduce memory usage.
- frames should be recognized
- better distance function (comparison of characters)
- detection of orientation (e.g. 90, 180, 270 deg rotation)
- picture extraction
- math formula detection, font type detection
- handwritten texts (block letters)
   --- uff, really a lot of work ---
- Feel free to add your suggestions and wishes,
  or tell me what the most important point is for you.