TODO LIST

Please send good ideas.

Next release (0.5x)
- compile-test dietlibc, TinyC tcc
    CC="diet -Os gcc -Wall -s -nostdinc" ./configure
- statistics photo vs. smoothing, follow neighbour pixels with max gradient
  to max and min (contrast? distance? number of max/min centers)
  for better (local) threshold
- 1/8^2 reduction for fast text array/image/line detection via statistics?
  (min. 5x7 font + 1 pixel space)
- replace getTextLine-loop(line) by output_text() (libs have to use pipes)
  simplify code
- allow uppercase suffixes .JPG or .jpg
- get rid of global variable job_t *JOB
- make stderr silent for verbose=0 (using the libPgm2asc) see mail 2010-07-13
  add debug messages to a debugstring on object list?
  (clickable on XML front end)?
- quality test script for groups of samples (to ensure improvement in recognition)
  single chars or single chars in words, clean and difficult sets,
  formatted texts
  false negative + false positive
  start with numbers? option numbers only?
  output: find fname.{jpg,png} + fname.txt + compare output gocr against txt
  options: [testpath] [testchars]
    bin/gocr_chk.sh testbase/free/{clean,glued,dusty}/{numbers,text}/
    bin/gocr_chk.sh jocr/examples tmp09
- fix problem with cutting melted chars (using vector frames)
- better detection of agglutinated serifs (Gutenberg scans 086.png+171.png)
- vectorize recognition (big step!, relation to other OSS?)
  (find min distance to ideal vector patterns, start with <>())
- frame_nn is marking only the borders like frame_vector and removed later
- handle broken and glued chars by the database algorithm (-m 256 -m 130)
- improve get_line2(), implement distance_to_point and distance_to_line
- dot-matrix printouts (examples/matrix.jpg) (German: Nadeldrucker)
- examples/inverse.pcx + examples/rotate45.pcx by nearest-box-to-line alg.
  or mean nearest box (or its 4 edges) directions,
  rotate only boxes (by creating new greater boxes and treat as new image)
- check replacement of getTextLine by getXMLline via pipe(?) or stdout
  is pipe available on all platforms?
- docs about using ispell via XML (what is needed, test in gocr.tcl)
- replace rest of UNDEFINED in unicode.c by its correct strings
- add probability for box->m1..m4 (to reduce errors caused by bad line-scan)
  call line detection function second time to improve unsure line data

Next release (0.5x)
- reduce pixel data by vectorization (big change, faster)
- writing images through pipes (like reading)
- using dictionary (optional) for replacing unrecognized chars
- Karsten.Hilbert@gmx.net: use OCRchie WordBox-format
  see http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html
  aspell instead of ispell

Near future: (planned version)
- rewrite install routine
- perspective distortion (for cameras)
- genetic algorithms engine (already in development, 0.8?). It includes
  feature extraction and classification
- support for other languages (may affect context_correction(), etc) (0.6?)
- support for diagramation. Can be done using the Unicode+new stuff. I (bbg)
  have some ideas.

Far future:
- gimp plugin
- color support
- Braille detection (useful for blind people?)
  see: American Journal of Physics Vol. 70, No. 7, p 684-688 (2002)
  or use special foils
- read image in smaller parts, to reduce memory usage.
- frames should be recognized
- better distance function (comparison of characters)
- detection of orientation (e.g. 90, 180, 270 deg rotation)
- picture extraction
- math formula detection, font type detection
- handwritten texts (block letters)
  --- uff, really a lot of work ---
- Feel free to add your suggestions and wishes,
  or tell me what is the most important point for you.