History: (Changes,ChangeLog)

0.52 Sep18
  2018-10 fix endianness of 16bit-pnm (NetPBM: most significant byte first)
          it works partly with (old) wrong order, but noisy contour
  2018-09 improve tests: random | base64 as FreeMono-Regular 80pt
  2018-09 fix bad 7 as T detection, fix corner vectors for thin fonts
          fix debug-option-dependence -v32 of iIl|-vert.-line-detection
          other chars may have that problem also
  2018-09 skip UTF8 code above 16bit if __WCHAR_MAX__ is 16bit (VS)
  2018-09 simplify xml-format, add xml-sample to the README
          achars (alternative chars) include the main char now
  2018-09 fix reading P5-PGM/PNM format (pnmtoplainpnm)
  2018-09 clean up some compiler warnings, set default --with-debug
  2018-09 error + exit on bad option, fix missing -h for --help

0.51 Jun13-Aug17
  2017-08 fix some 8x9 (unsharp) screen fonts (0O,il1I,e),
          from old samples and patches received 2005
  2017-04 fix NULL-pointer access by Norbert M.
          fix range check in nearest_frame_vector() (does not affect users)
          add appended argument to option ("-v33" or "-v 33")
          fix J vs. 3 (13x20)
          fix compiler warning by typo, thx to Senh Liu, Jun2013
          (still a lot on my todo list)

0.50 Sep10-May12,Mar13
  just release it to avoid questions about old problems, give a sign of life ;)
  fix 4 parfait problems against 0.48 (thanks to Rich Burridge)
  adding qrcode detection and decoding (no error correction, no skewing)
  spacing slightly improved
  context correction of hex codes (e.g.
  hex fingerprints)
  some threshold value adaptions (not finished)
  try to fix double output of XML code <...> and removed additional \n
  improved quotation detection ,, ''
  improved monospaced spacing (video text)

0.49 Aug09-Sep10
  fix dot handling for ':' and ';' (vector code)
  fix '@' for 7x9 and 5x8 fonts
  fix double counting of subboxes (affects "0" (zero) with dot in it)
  character "l" of width 1 improved
  bug fix gluing chars ij of width=1
  bug fix thresholding (small gray images)
  return error code -1 on ERROR pnm.c unexpected EOF
  fix conflicts with unicode_defs.h TRUE definition on gcc/alphaev7-osf/3.4
  further fixes for lib by D. Katsubo
  fix #3039007 "struct list" in list.h conflicts with STL (ocr_object_list)
  fix #3039006 INFINITY macro in unicode.h conflicts with math.h
  bugfix barcode 128, switch from mode mC to mA (":1")
  bugfix: MultiPNM + database - ID: 2957140
  improved barcode recognition - ID: 2859644 (bars wider than spaces)
  quality test-script bin/gocr_chk.sh added
  initial datamatrix support (ASCII + ASCII numeric only, no ErrCorrection)

0.48 Jul09
  fix buffer overflow introduced in 0.46 for filenames
  add codabar barcode
  fix bug, removing melted serifs
  add patch by Chris Lee, i25 barcode recognition + modifications
  fix some false positive numbers "34" (video, gas meter)
  fix problems with 2zZ4 for 10x10 screen font
  better debug output for :;,.
  remove examples, doc and libs part from configure (see below)
  remove doc and examples from the (make install) part to reduce
   dependencies (gs and transfig are not needed for rpm/ebuild)
   gocr may depend only on netpbm, but can live without it too
   this will help to install gocr on "exotic" (nonlinux) platforms
  fix gentoo app-text/gocr Bug 243250 src/Makefile: $(CC) $(LDFLAGS) ...

0.47  fix database recognition for certainty 100 (-a 100)
      insert spaces with certainty 100 (old: 99) to let -a 100 work
      new option -u string for unrecognized chars
      fix: No contrast in image causes division by zero
      reduced false positive recognition of scanned "a496" (Gutenberg Project)
      "d as a" patch ID: 1556112
      add "Windows Pipe Fix", but I hate extra code for bad environments
      improve 7x10, sample 0811qemu1.png (ToDo: not finished)
      change black:white from >4:1 to >3.5:1 as criterion for inversion
      reintroduce static library libPgm2asc.a (make libs) for OSRA project
      add dynamic library (make libs), unused but may help other projects

0.46  improved context correction (especially helvetica "Il")
      improved recognition of tiny chars "$1", fat "s", "rw" ","
      fix blank spaces problem in filenames
       (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=316511)
       !!! please check on other platforms and report to me !!!
       there are still problems with special chars (double quotes, backslash)
       better use this way: djpeg -gray -pnm strangefilename.jpg | gocr -
      fix possible problem with database and UTF8 input
      fix hidden bug in pitch/spacing initialization
      reactivate code for output of glued chars and strings
      fix wrong close() call
      remove creation of pgm2asc.a for simplicity (see SF-patch 1827477)

0.45  minor corrections for c and k
      minus sign is filtered by option -C "--" now, ("\-" was parsed badly)
      clean up old unused code for simplicity (api, frontend)
      fix problem with low height barcodes and barcode removing
      fix problem with readpgm (for multiple images) and database
      PACKAGE_VERSION defined by configure.in AC_INIT + gocr.spec

0.44  add volume to boxes (negative means white areas inside black areas)
      fix overflow in despeckling routine (verbose mode, dust removing)
      reactivate composed chars, fix merge_boxes
      fix problems with uncertain line detection and not recognized "7"
      option -a has an effect now for the output
      adaptions to MICR E13-B font (see GnuMICR), ToDo: 4 extra-chars
      fix num_boxes in merge_boxes (affects line detection)
      reduce 2 prompts to one per char in database mode, ^A for skip all
      fix problem with smaller headlines
      fix problems with tall font (4)
      fix includes for non-linux-platforms

0.43  fix problem with dark frame around image
      support multiple images, ex: giftopnm -image=all a.gif | gocr -
      invert if obviously white on black (black_mass>=4*white_mass)
      improve thresholding for discrete histograms
       (note: this can particularly lead to bad results, will be fixed later)
      speedup for big boxes (especially dark background)
      fix memory leak (setas(same string) + detect_barcode)
      fix uninitialized variables after insert spaces (num_frames)
      fix frame_vector for single pixels (twice + ERROR idx out of range)

0.42  further parts of recognition engine replaced by vector version
      changed colored debug output for out??.png, instead of out30.bmp
      division of glued chars replaced (slower but more accurate)
      fix framing of small font
      fix problem with uninitialized pnm_readpaminit call (CPS 21Nov06)
      better progress output (see progress.[ch]), new image debug output
      switch to the new improved rotation detection

0.41  (buggy if --with-netpbm=no, apply the pgm-patch!)
      otsu.c concentrates now only on high contrast regions
      fix pnm reads for 2 byte pixels (--with-libpbm=no)
      update man-page (mail me your suggestions)
      fix g++ warnings, float-OPs replaced by int-OPs
      spacing reviewed; make distance() more sensitive
      xml-objects (barcode, melted chars) now also handled with weights
      fix division by zero bug for vertically positioned characters
      default output is UTF8 now, UTF-encoding bug fixed
      added certainty option
      added uninstall to Makefile
      debug image format changed to png (using pipe) or ppm (fall-back)
      much better word spacing (line-by-line based)
      better DOT_ABOVE recognition
      fix output of char groups or strings stored in database, utf8 input
      fix buffer overflow in barcode decode39
      fix lost comma at end of line
      internal vector format added for future use (faster, scalable, rotatable)
      line detection extended
      internal list management rewritten to fix memory leaks and segfaults

0.40  update PNM file reader to maxval > 255
      (make rpm) updated
      barcode-patch UPC_addon by Michael van Rooyen
      CAPITAL_LETTER_A_WITH_OGONEK added
      no "(PICTURE)" output for UTF8+ASCII (better for Mobile OCR project)
      smooth_borders() bug fixed and reworked
      5x7 and prop10 font adaptions
      objects now detected by flood-fill algorithm (better?)
      XML-output changed
      changed auto dust detection (not final)

0.39  XML output added (subject to change, suggestions are welcome)
      netpbm-link-error fixed in gocr.c and configure.in:
       gocr.c: <config.h> changed to "config.h"
       configure-option --with-netpbm=PATH and --without-netpbm added
      update configure.in according to autoconf 2.57
      wchar_h misconfiguration fixed in pgm2asc.c
      fix compiler warnings
      char filter accepts abbreviations now, like "0-9A-F" (but slow)
      update READMEde.txt
      output barcode tags (also improved recognition)
      fix pnm.c for files like example.eps.pbm
      fix detect.c for barcodes
      fix ocr0n.c 0<->8g

0.38  move UTF/HTML/TeX decoding to getTextLine, return (char *) now
      out_format HTML step towards detailed XML output
      correct line detection for footnotes (detect.c)
      "y" now seen as vowel (pgm2asc.c), I<vowel> substituted by l<vowel>
      é-detection, á-output fixed
      default dust_size is -1 now (auto detection = mean_size/10)
      char filter added
       ex: -C 0123456789ABCDEF - recognize only hexcodes
      man page updated (hopefully correct syntax)
      database bug fixed (small fonts, example by Chris)
      several bugs fixed by W. Webber (thanks)
      speed improved by 3rd-pass matrix filter in pixel() (pixel.c) (code from W.
       Webber)
      bug in remove_dust (remove.c) fixed
      for fonts bigger than 20x40 smooth_borders() changed (b/w-scans)
      bug in O0-detection fixed

0.37  best-fit generates probability, not perfect but better results
      bug in line detection removed (happens for a lot of small boxes)
      progress output (option -x <fileID|fname>)
      counting version number as floating point now
      MACRON and DOT_ABOVE (not complete) defined (latin2)
      adaptions for 5x7 and 6x12 screen font
      doc/ocr.tex changed to doc/gocr.html (now independent of LaTeX)
      symbols {} added
      OCR-B font tested successfully
      better headline/picture distinction
      bug removed (struct box.modifier is wchar_t now)

      known bugs: too many newlines

0.3.6
  CARON and Omega defined,
  output of not defined chars (HTML="&#xxx;", TeX="\symbol{xxx}")
  system dependent bug: isupper(>255) SIGSEGV fixed
  better line detection for lines with lowercase chars only
  a lot of possible SIGSEGV in list_del() fixed
  barcode recognition (UPC,code128)
  .ps .eps via pstopnm supported
  -m 256 switches off the main ocr engine (useful together with -m 2 for identical chars)
  strings added to database ("ff","ft","special-symbol")
  gocr.tcl adapted to gocr v0.3
  internal detection probability introduced

0.3.5
  minor and major fixes (string\0 bugs)
  memory leak fixes by Duncan Edwards
  layout analysis or zoning (-m 4) improved,
   now it detects pictures and columns much better
  the behavior of setting threshold (-l) is slightly changed
  wcsdup defined for non-gnu-systems (BSD), further problems?
  better context correction for 10 (IO,lO)
  fixes for S.Koledin examples "GlS"
  Euro-currency-sign detection added
  better pitch estimation for proportional font (needs to be improved)
  make install DESTDIR= instead of configure --prefix= (better?)
  use wchar_t by default, simpler code and -f works with nonLinuxOS
  line detection more robust against vertically glued chars (js)
  -f UTF8 added (useful for xterm -u8), should be default?
  handle vertically glued boxes (ex: g over T)
0.3.4
  some BSD adaptions (no WCHAR?), tell me if there are still problems
  use unicode in database (4-8 hex digits)
  new option: -p database_path/
  TILDE fixed, #, Æ, Å, etc. added (Swedish, Norwegian)
  layout analysis improved
0.3.3
  database (-m 2) bug fixed and interactive mode (-m 130) added
   it's not finished, but you can test it
   result should be ok for machine generated images (no scans)
  engine improved a bit
0.3.2
  ocr-engine improved for screen fonts (thanks for examples)
  option -f [HTML,TeX,...] added
0.3.1
  make install updated
0.3.0  some parts of the code reviewed (most work done by Bruno Barberi Gnecco)
       tkispell patch from David Pinson (exec bug fixed)
       gnome frontend added (Dany De Bontrider)
       acute, grave, circumflex ...
       detection
       C++ parts rewritten into C, and much more (see REVIEW)
0.2.7  lib-patch from Klaas Freitag inserted, engine improved
       option -n 1 detect only numbers, get threshold value by otsu.cc
       xxx.pnm.bz2 can be used on linux systems with bzip2 installed
0.2.6  pipes used on POSIX2-systems for easier use of jpg,gif,tiff,pnm.gz-files
        example: gocr text.jpg; gocr text.pnm.gz
       verbose output on stderr, text output on stdout,
       redirection of output possible (-e, -o, example: -e /dev/stdout)
       engine upgraded a bit (thx for the new sample files)
       gocr.tcl upgraded (save options, save text)
       DOS/WIN95-EXE created, download GOCREXE.ZIP (v0.2.5)
0.2.5  program convert renamed to jconv
       you can choose stdin as input now, for using conversion tools
        example: djpeg -pnm -gray text.jpg | gocr -i -
       option "--help" added, some bugs removed
       amiga.h added for SAS/C under AmigaOS (suggested by Uffe Holst)
       line detection changed (faster?)
       importing gocr in your C++ application is easier now (see function pgm2asc)
       argument can be given instead of option -i (this is more natural)
       some reorganization of code (not finished)
       2000 downloads counted !!! Jun2000
       SourceForge.net used for gocr (project: jocr, other gocr exist there)
       bugs in dust removing, line detection and zoning fixed (rewritten)
       first version of tcl/tk-GUI, test it!
       recursive function frame_nn() replaced by labyrinth algorithm (no extensive stack use)
       gluing of broken chars added, removing glued serifs (on small fonts)
       new bugs added :;
0.2.4a2 some details are added (better dust removing and char division)
0.2.4  three char division (connected chars), dust removing
0.2.3  add layout analysis (very slow, try -m 4), engine modified
       better distance function, engine updated, database added for testing
       1000 downloads counted !!!
       May2000
0.2.2  gocr_0_2.tgz expands into gocr_0_2 directory (thanks to zz99zz)
       engine upgraded a bit, some bugs fixed (umlaut, thin lines)
       short documentation added (ocr.tex)
       colored output (out30.bmp, later out30.png) for test/development-mode
       bug: read ASC-PBM and PCX (1 bit) fixed
0.2.1  first official release on freshmeat.net March 2000
0.2    line scanning added
0.1    project started (not documented), autumn 1998 - summer 1999