This is a list of reviews, so that developers don't get confused by the new
changes. Please document what you do here.

() done
[] done, but should be reviewed
|| to be done, do if you can
?? question: if you know, solve it
Writer: put your initials inside.

5/8
(bbg) changed box1 to head_data.
(bbg) changed boxd to head_db.
|bbg| box2 static variables do not exist anymore. They were confusing,
      since these names were used as arguments to functions too. From what I've
      seen, this will not affect the program.
      (js) agree
(bbg) added new file: box.c, with all the functions that deal with boxes.
(bbg) added new file: database.c, with all the functions that deal with db.
[bbg] ocr_db()
|bbg| split pgm2asc(): the function would still exist, but call other
      subfunctions which are part of it now. I suggest:
        - find_letters()
        - remove_dust()
        - remove_pictures()
        - remove_melted_serifs()
        - find_longest_line()
        - detect_lines()
      etc: they are practically already defined between {}.
?bbg? pgm2asc: it has 2 frame letter codes, 2 find pictures, 2 remove
      dust
      |js| It is the result of the last (unfinished) big rewrite of the code;
           it will be changed later.
|bbg| vvv (or equivalent) should be global.

6/8
(bbg) general linked lists done (list.*).
|bbg| Functions in box.c should be replaced by the new general ones.
      So should linearr.
(bbg) strc() and in_str() replaced by strchr().
?bbg? shouldn't test_umlaut() belong to another file, like ocr0.cc?
      js: good idea
      (bbg) moved to ocr0.cc
?bbg? there are several functions with 2 or 3 versions (detect_lines,
      detect_lines2, for example). Do they do the same thing (and one
      is older) or different things?
      js: different; they are different children of the same mother,
          and they will diverge further in the future.
?bbg? What do ini_list(), excude() and getresult() do? They don't seem
      to be used. If they are useless, please delete them.
      js: read the new comments in the file (should they be in a separate file?)
|bbg| document pixel().
      :done.
|bbg| Many, many commented-out lines all over the file; should be cleaned.
      :done.
|bbg| Change new->malloc, delete->free.
      :done

7/8
(js) some comments added in pgm2asc.cc
|bbg| instead of p.p[x+y*p.x], use the new macro pixel_at(p, x, y).
      Not all substitutions done yet.
      :done.
(bbg) excude() renamed to exclude().
?bbg? what do the pixel bits mean? A table would be useful.
      (js) not stable: lowest (3-4?) bits are for temporary use,
           highest (4?) bits are for intensity
|bbg| must change all "type &p" arguments.
      :done
|bbg| should make argument order standard: at present, sometimes the
      order is x0,x1,y0,y1, sometimes x0,y0,x1,y1, pix is not always the
      first, etc.
(bbg) turmite(): changed for+if -> while. Is it correct? I think so, but
      a test wouldn't hurt. Also optimized the function.
      (js) I will test it after the complete review.

8/8
(js) pgm2asc.cc: generally pix *p used, pgm2asc() partly rewritten
|js| ocr0.cc: pix *p must be used everywhere
      :done
(bbg) pixel(): first part (c33 filling) optimized.
(bbg) minor optimizations, cut superfluous counters.
?bbg? copybox: shouldn't b->p be free()d, or realloc()ed? There's also a new
      faster version (using memcpy), but there's no unmarking of pixels; I'm
      also not sure if they are equivalent, since I'm not calling pixel. It's
      between #ifs.

9/8
(bbg) Started to change linearr to linked lists; linearr is obsolete now,
      and so is textlines.
(bbg) free_textlines() and getTextLine() use linked lists (instead of linearr)
      now. The first may be deleted in the future.
[bbg] store_boxtree_textlines() uses linked lists (instead of linearr) now, but
      should be tested and reviewed; especially if (box2->c == ' '), since it
      doesn't check for an \n anymore (why should it?). Also the extra list_app
      should be checked.
      Also check whether "if (!(mo & 8))" should come before
      "if (box2->c != '_')".
      :done
(bbg) Moved copybox() to box.c
?bbg? put(): What are ia and io?
      js: ia = int_and, io = int_or: new_pixel = (old_pixel & ia) | io,
          for more practical use
|bbg| What about putting remove_dust, remove_picture, remove_etc in another
      file, such as clean.c or remove.c?
      js: it should be done
      bbg: done
|bbg| several #ifdef's pending approval or review. Get rid of them.
      :done
[bbg] Added code to open pgm files using libpgm. Not tested, but should be OK.
[bbg] started to write writepbm, writeppm.

10/8
[js] pgm2asc() new functions created
(bbg) moved remove_* functions to remove.c
(bbg) added doc/ directory; moved ocr.tex to doc/; created examples.txt.
(bbg) added UNICODE support: unicode.h contains all symbols we will ever need,
      unicode.c contains two conversion functions.
(bbg) most of the Unicode->TeX convert() (0x7F-0xFF) codes are done.
[bbg] several compose() codes done.

11/8
(bbg) new pnm IO functions, much better now.
[bbg] fixed part of ocr0.cc: since arguments changed from pix &p to pix *p, it
      wasn't compiling. Must fix the pix b; should it be changed to pix *b?
      :done
|bbg| Use new Unicode code; change all old code using unsigned char to wchar_t.
      Work started. Take care: old libc functions won't work anymore. Use those
      defined in <wchar.h>. Perhaps we should wait until the review is done and
      gocr is working again.
      :do it in 0.3.1 only.
?bbg? should we move wert code to a new file?
      (js) we should
?bbg? should we move all code to a /src subdirectory?
      (js) we should; the same step should rename .cc to .c
|bbg| fix Makefile (or better, configure.in) to reflect new files.
      :done
[bbg] started to change all the for(box=head_data) to for_each_data. CAREFUL:
      you cannot use for_each_data recursively.
      On the next layer, you must use
      something like for(box=list_get_header; box; box=list_next(box));
      :done

13/8
(bbg) finished changing for(box=head_data). Old box_* functions still used,
      must fix.
(bbg) cleaned 99% of the warnings/errors when compiling pgm2asc.cc
[bbg] fix: warning: control reaches end of non-void function
      `compare_unknown_with_known_chars(pix *, int)'
      :done
[bbg] check if pgm2asc.cc:474:i = ((pixel(p, x, y) < cs) ? 0 : 1); is useless
      or not.
      (js) useless, a relic of older review changes
(bbg) vvv is now part of struct environment.
(bbg) removed Uchar, since it may conflict with other libs. Use unsigned char.

15/8
(bbg) Tim Waugh sent a man page; added him to the thanks in README.txt
[bbg] should split README.txt into: INSTALL, CREDITS, TODO, HISTORY.
      :done
(bbg) more changes from box_* to list_*

16/8
(bbg) added Greek letters to the unicode.c functions. Not complete.
(bbg) added punctuation symbols to the unicode.c functions. Not complete.
(bbg) added ISO8859-1 support to the unicode.c functions. 0x20-0xFF supported,
      ligatures supported.

18/8
[bbg] fixed try_to_divide_boxes for the new general linked list.
[bbg] most for_each_data loops have a line like box2 = list_get_current, which
      is unnecessary. Replace box2 with list_get_current. To avoid clumsiness,
      it may be a good idea to create a new #define lgc(a) list_get_current(a).

25/8
(js) ./src created, .cc,.h,.c moved to src, .cc renamed to .c
[js] try to get make working (make jconv) see config.h: USE_LIBPNM => HAVE_PNM_H
|js| list.c L66 data_before ??? what does that mean?
      (bbg) mistyped, fixed.
      list_del should be tested
      :done
?js? unicode.c: there are undefined DEFS at L89++ (I have put them in /* */),
      L150++, L185++, L633++
      (bbg) fixed
[js] make works again (now you can run compilation tests on changes)
      ...
      but SEGFAULT

26/8
(bbg) fixed casting warnings in unicode.c; fixed undefined DEFS.
|bbg| fix the missing LaTeX codes in unicode.c.

28/8
?js? configure does not find /usr/X11R6/include/pnm.h, why?
[js] fixed some bugs to get gocr working, but there are still big bugs

30/8
(bbg) fixed all gcc warnings. Some bugs killed in the process.
(bbg) fixed pnm.c compilation problem
|bbg| have to fix database.c:102
|bbg| MANDATORY: change all old linked list code to the new one. This means:
      get rid of box_* functions, the header->data stuff, etc. The new linked
      list code is probably working 100%.
(bbg) changed $(CXX) to $(CC). C conversion completed!
(bbg) added jconv and gocr sections to src/Makefile.
(bbg) new ISO8859-1 codes in unicode.c:convert.
(bbg) fixed several bugs in list.c, and patched list_del() to return 2 if the
      deleted data was list->current.

1/9
(bbg) added new list_init(). Use it.
(bbg) wrote a fix for the list_del()+for_each_data bug. It's not very good, but
      it works. It fixes the nested loops too. Now you can use for_each_data
      recursively.

8/9
(bbg) more fixes from the old LL system to the new one.
(js) added list_higher_level(), fixed list_lower_level()
      bug fixed in free_textlines(), store_boxtree_lines(), store_boxtree_lines()
(js) review of pgm2asc() finished

9/9
(bbg) more fixes from the old LL system to the new one.
(bbg) fixed the realloc fixes of 8/9 in list_lower_level(),
      free_textlines() and store_boxtree_lines(), and fixed list_higher_level(),
      to avoid a crash if realloc fails. Now things continue working.
(js) compilation bug removed
(bbg) reviewed context_correction().
(bbg) moved output_list() and write_img() to output.c. Added output.h.
?bbg? Since output.c functions are only for debugging, what do you think about
      adding some #ifdef DEBUGs to make the code smaller and faster?
      It doesn't have to be done now, but it's an idea.
      (js) it's a good idea in general, but I would use a DEBUG level,
           DEBUG > 2 (or similar);
           during development (until version 1.0) DEBUG should be activated
           by default,
           so people could experiment with it, and experts could deactivate DEBUG
(bbg) cleaned old textline functions.

11/9
(js) fixed two major bugs; gocr works again (but only -m 56 works)

18/9
(js) pedantic compiler warnings removed
      bbg: compiling with the pam (netpbm, see 22/9) functions generates some
      warnings, which are caused by pam.h and do not affect gocr.
(js) list_del() did not work properly in context_correction(), fixed
      bbg: list_del's return value should always be checked. I'm not sure what
      behaviour you want, so I didn't add the checks.
(js) malloc_box() bug fixed (memcpy src-dest mismatch)

21/9
(js) pgm2asc.c L1954, 2nd scan removed, now much faster, but
      handling of umlauts etc. is missing (similar to the gluing function)
      bbg: umlauts are detected. Are you sure it's missing?
(js) I am away until Sept 30th 2000

22/9
|bbg| pnm_readpnmrow() SIGSEGVs. Don't know why.
      ?js? (when does it happen? example file?)
      bbg: any file, here. Does it work for you? Are you sure it isn't the pam
      functions?
(bbg) added new code to read pnm files using the netpbm package (pam*
      functions). It works, but the drawback is that only recent (August 2000)
      libraries have these functions. Added a test for pam.h in configure.
[bbg] moved example files to ~/tests/. Updated Makefile.in to reflect changes
      and moved the pertinent stuff to tests/Makefile. I don't have experience
      with autoconf, but I think I did it right. :)
(bbg) fixed tests for unistd.h, which were missing.
|bbg| How can the proportions of gocr.tcl's split windows be changed? It'd be
      nice to be able to have a larger output area.
|bbg| gotta fix make.bat.
      There's a make for DOS, however; should we delete
      make.bat?
(bbg) wrote INSTALL. Moved (no changes) history from README.txt to HISTORY.
|bbg| review README.txt. Create a TODO file, BUGS, etc.
      :done

23/9
?bbg? database.c::ocr_db(): is there any use for box3?
(bbg) added "Elapsed time".
(bbg) minor fixes.

24/9
|bbg| remove_dust() broken? It's not being used currently, since it's the last
      piece of code using old list functions, and if it's updated, ocr1.c returns
      a bunch of "#hmm, something was going wrong". So, the problem may be in
      ocr1.c. I wrote the patch for remove_dust(); it's in #ifdefs.
      :done
(bbg) patched the rest of remove.c to work with the new list functions.
[bbg] remove_pictures() loop is weird. I followed what was there, but it should
      be checked.

1/10
(bbg) for_each_data now checks the return value of list_higher_level().
(bbg) moved some functions to pixel.c
(bbg) some cleanup

3/10
(js) minor changes in Makefiles, add_line_infos() improved

6/10
(bbg) fixed a bug in list_ins() when inserting a node before the header.
(bbg) fixed a bug that made for_each_data use list->header twice: l->fix wasn't
      properly initialized.
(bbg) added list_sort(). Ran some tests; it seems to be OK.

7/10
(bbg) updated list_sort() to a faster version. Ran tests.
(bbg) some minor cosmetic fixes
|bbg| amiga.h: is it really needed? Can someone test it?
|bbg| As soon as remove_dust is fixed, the old list code can be deleted, which
      would be nice. So, Joerg, can you please fix it? Thanks.
      [js] Have I fixed it?
      :done

9/10
(mg) moved gocr.tcl and create_db to new bin folder
(mg) cleaned up Makefile.in (some subdirs not included yet)
(mg) created new sub-folder lib
(mg) created new sub-folder include
[mg] installation for src is done in a very rudimentary way

13/10
(bbg) split README.txt into BUGS, TODO and CREDITS.
      Joerg: Please write an AUTHORS
      file.
(bbg) renamed README.txt to README.
(bbg) added list_empty(). Fixed list_higher_level to avoid empty lists. BTW: I
      think that passing an empty list (l->header==NULL) may blow up some of the
      list functions.

16/10
(bbg) added AUTHORS.

22/10
(js) minor changes, glue_broken_chars() improved, ocr0 updated
?js? list_del() seems not to work in nested for_each_element() loops
[js] remove_dust fixed?
      :done

23/10
(bbg) rewrote list_del, using a new kind of fix. It will work as long as the
      data is not freed by the user. A solution to fix this is to pass a pointer
      to a function that will free the data.

25/10
(js) glue_broken_chars() recognition of "=" fixed, some other bugs fixed
(js) some improvements for recognition (ocr0n.c, ocr0.c)

27/10
(js) further improvements for better recognition results, polish.tex added

28/10
(bbg) changed free_boxtree() to new code
|bbg| have to fix the rest of the old box code
      :done

4/11
[bbg] one of the frees of the new free_boxtree code SIGSEGVs. I dunno why.
      (js) boxlist->element->p is a pointer to the existing whole pixmap; do not
      free it!
[bbg] I commented out the following lines of pgm2asc.c: 1430,43,55,67,81,95,
      since they seem to be unnecessary.
      (js) hmm: they are not needed for list_ins, but could be useful for
      an engine which looks for surrounding boxes. Do you know what I mean?
      It was the last step of the conversion to the
      new list routines. *uf*.
(bbg) ALL OLD CODE ROUTINES ARE DEPRECATED NOW. Mission successful, waiting for
      permission to delete. :)
      (js) permission granted
(bbg) Just to reassure, I tested gocr today and it's 100%
[bbg] changed sort_boxes to a call to list_sort. sort_box_func should be
      reviewed
      (js) looks ok to me.
      or tini_list(){
      (js) ???
?bbg? ini_list(), exclude() & getresult(): should they be deleted?
      (js) not found :(
      I'm working on a probability-based engine, using neural networks.

6/11
(js) got a segmentation fault => fixed

14/11
(bbg) deleted old list code.
(bbg) moved all detect* functions to detect.c
(bbg) moved lines functions to lines.c

22/11
(bbg) reviewed frame_nn(), mark_nn(), num_hole(), num_obj(). Some other minor
      revisions.
(bbg) added the outbounds() macro to gocr.h. It should be used from now on.

24/11
(js) some patches by J.R.V. Zandt added
?js? still have problems detecting pnm-libs via autoconf/configure

25/11
(bbg) added HTML support for all ISO8859-1 characters in unicode.

27/11
(bbg) some fixes in unicode.*
|bbg| whatletter needs a serious revision.

30/11
[js] split ocr0.c (compiles faster now), gcc 2.95.1 "bug" removed?

10/12
[bbg] Unicode is now supported. Some tests should be made, especially testing
      UNKNOWN/PICTURE and >0xFF characters.

17/12
[js] pamlib, netlib, pnmlib detection (configure.in) changed

21/12
[jrv] pgm2asc changes:
      context_correction() BUGFIX: test whether
        previous(previous(current)) exists before dereferencing the pointer.
      follow_path() introduced: follow a path, recording transitions
        between dark and light.
      xrealloc() introduced: safe memory allocation
      loop() optimization: move direction test outside loop, simplify loop test
      measure_pitch() introduced: detect monospaced font and measure the pitch
      pgm2asc(): if font is monospaced, set spc from the measured character
        spacing.
      A few wording fixes.

05/01
(js) some fixes in glue_broken_chars() result in good ";" and "!" detection;
      engine updated (examples/font1.pbm should work perfectly)

19/02
(js) "make dist" now automatically builds the package (gocr-x.y.z.tgz)
      does the Makefile in api work (especially make clean/proper)?
      "make install" should work too
      bbg: API makefiles work, but there's no make proper (use distclean instead).
      I'd appreciate it if someone could test configure and libpnm.
      "make install" is not tested yet.

19/04
[js] USE_UNICODE defined as default in gocr.h (test it), some fixes to get
      -f TeX, -f HTML running, fixed bug in lines.c (unterminated string),
      measure_pitch extended to 12<width<150 (see example)

15/05
[js] list.c line 225+226 realloc( .., (l->level+2)..) correct? (S.Niemz bug)

30/07
?js? readpnm with USE_libpnm does not do the job of the old readpgm (for pbm
      files)

22/08
(js) gcc -Wall -pedantic warnings fixed, ocr0.c: wchar_t used by default;
      we should switch to wchar_t, it's more flexible and ANSI (?),
      and the huge number of "#ifdef USE_UNICODE" statements looks bad
?js? can someone run a compile test on a machine without wchar.h? (FreeBSD?)

25/08
?js? I got the following warning from gcc 2.95.3 20010315 (-Wall -pedantic):
        lines.c: In function 'store_boxtree_lines':
        lines.c:136: warning: implicit declaration of function 'wcsdup'
      but I included <wchar.h>. Does anybody understand this???
      hmmm ... Is it because it is not ANSI but a GNU extension?
      What should we do with non-standard C functions?
      Maybe that is why I got the complaints from FreeBSD users?

30/08
(js) pgm2asc.c, output.c simplified, improved character division

08/02/2002
?js?
      all possible SIGSEGVs in list.c fixed

15/02/2002
[js] ocr0.c will use setac() in the future to manipulate struct box (tac, wac
      and c); if chars are very similar, this will make context correction
      easier, and filtering easier too (not fully implemented)

25/02/2002
[js] box->obj added for storing more than one char

02/06/2002
(jb) added special encodings for '&', '<', '>' to HTML decoder

05/06/2002
[jb] job_t introduced (same patch as sent to the mailing list)
      So far only the local variables (configuration and pixmap) of main()
      have been moved into job_t.
[jb] boxlist and linelist converted to job_t
[jb] Temporary introduction of global variable JOB (of type job_t)
      This variable will be removed as all functions are converted to
      receive either a job_t pointer or the needed values.
[jb] n_run, env, db_path and ppo converted to job_t
[jb] converted dblist
[jb] renamed init_job and free_job to job_init and job_free
[jb] converted nearly all global variables (except warn and debug) to
      job_t. Used JOB as the new temporary global variable.
      Overview of the renaming (some old variables are combined into a new one):

        OLD VAR NAMES            NEW VAR NAME
        * main():inam            JOB->src.fname
        * main():p               JOB->src.p
          env.p
        * main():init            JOB->tmp.init_time
        * ppo                    JOB->tmp.ppo
        * n_run                  JOB->tmp.n_run
        * dblist                 JOB->tmp.dblist
        * boxlist                JOB->res.boxlist
        * linelist               JOB->res.linelist
        * lines                  JOB->res.lines
        * env.avX                JOB->res.avX
        * env.avY                JOB->res.avY
        * env.sumX               JOB->res.sumX
        * env.sumY               JOB->res.sumY
        * env.numC               JOB->res.numC
        * main():cs              JOB->cfg.cs
          env.cs
        * main():spc             JOB->cfg.spc
        * main():mo              JOB->cfg.mode
          env.mode
        * main():dust_size       JOB->cfg.dust_size
        * main():numo            JOB->cfg.only_numbers
          only_numbers
        * main():verbose         JOB->cfg.verbose
          env.vvv
        * main():out_format      JOB->cfg.out_format
        * main():lc              JOB->cfg.lc
        * env.db_path            JOB->cfg.db_path

[jb] introduced list_and_data_free

|jb| get rid of JOB.
|jb| decide on the last global variables "warn" and "debug"
      js: they are meant for debugging; we should use macros WARN or DEBUG
|jb| eliminate static buffers inside functions.

30/06/2002
(js) gocr.h: struct box: modifier now wchar_t
      mark things that need changes with "ToDo:" directly in the sources,
      so that we can grep for them and see the planned changes

05/01/2003
(js) ocr0.c: engine split into groups of characters (mostly pairs)
      num_hole is called once for every char; the result is stored in a table
?js? cvs ci jocr has not worked since 2003? (it tries to update jocr/CVS)
      I have cvs-1.11.1p1
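Sketch for the pixel_at() (7/8) and put() (9/8) entries. It assumes a pix struct with a byte buffer p, width x and height y, as the old p.p[x+y*p.x] expression suggests. The bounds check and put()'s return value are assumptions for illustration, not gocr's actual code. put() applies new_pixel = (old_pixel & ia) | io, so ia clears bits and io sets them:

```c
/* Hypothetical image type inferred from p.p[x+y*p.x]:
   p is a row-major buffer of x*y bytes, x is the width, y the height. */
typedef struct {
    unsigned char *p;
    int x, y;
} pix;

/* Replaces the open-coded p->p[x + y * p->x]; usable as an lvalue. */
#define pixel_at(pp, xx, yy) ((pp)->p[(xx) + (yy) * (pp)->x])

/* Sketch of put(): new_pixel = (old_pixel & ia) | io.
   ia ("int and") keeps only the bits set in ia, io ("int or") sets bits.
   Returns the old pixel value, or -1 if (x, y) lies outside the image
   (both the return value and the check are assumptions). */
int put(pix *p, int x, int y, int ia, int io)
{
    int old;
    if (x < 0 || y < 0 || x >= p->x || y >= p->y)
        return -1;
    old = pixel_at(p, x, y);
    pixel_at(p, x, y) = (unsigned char)((old & ia) | io);
    return old;
}
```

For example, with the low bits reserved for temporary marks (7/8), put(p, x, y, ~8, 0) would clear bit 3 and put(p, x, y, 0xff, 8) would set it, leaving the intensity bits untouched. The macro parameters are deliberately not named p, x, y: the preprocessor would otherwise also replace the member names in (p)->p and (p)->x.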
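Sketch for the xrealloc() entry (21/12) and the 9/9 concern about realloc failing. It assumes the common GNU-style contract: report the error and exit instead of returning NULL. gocr's own xrealloc may behave differently. The point of such a wrapper is that p = realloc(p, n) leaks the old block when realloc returns NULL, while p = xrealloc(p, n) cannot:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of a "safe" realloc wrapper (assumed semantics: never returns
   NULL; on failure it prints a message and exits). */
void *xrealloc(void *ptr, size_t size)
{
    void *tmp;
    if (size == 0)
        size = 1;  /* realloc(p, 0) may free p and return NULL */
    tmp = realloc(ptr, size);
    if (tmp == NULL) {
        fprintf(stderr, "xrealloc: out of memory (%lu bytes)\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return tmp;
}
```

Where exiting is not acceptable (e.g. inside list_lower_level()), the same protection comes from assigning realloc's result to a temporary first and only overwriting the old pointer when it is non-NULL.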
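A possible answer to the 25/08 question. wcsdup() is not part of ISO C. At the time it was a GNU extension (POSIX.1-2008 later standardized it), so glibc's <wchar.h> most likely declared it only when a feature-test macro such as _GNU_SOURCE was defined before the include. Other libcs may not have provided it at all, which could explain the FreeBSD reports, though that is unverified. A fallback built only from ISO C functions avoids the problem. The name gocr_wcsdup is hypothetical:

```c
#include <stdlib.h>
#include <wchar.h>

/* Portable replacement for wcsdup(), using only ISO C functions
   (wcslen and wmemcpy from <wchar.h>).
   Returns a malloc()ed copy of s, or NULL if allocation fails. */
wchar_t *gocr_wcsdup(const wchar_t *s)
{
    size_t n = wcslen(s) + 1;  /* include the terminating L'\0' */
    wchar_t *copy = malloc(n * sizeof *copy);
    if (copy != NULL)
        wmemcpy(copy, s, n);
    return copy;
}
```

With autoconf, AC_CHECK_FUNCS(wcsdup) in configure.in defines HAVE_WCSDUP when the C library provides the function, so the fallback can be compiled only when it is needed.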
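Sketch of the DEBUG-level idea from 9/9 (a level such as DEBUG > 2, enabled by default until 1.0) combined with the WARN/DEBUG macros suggested on 05/06/2002. Several details are assumptions: the macro names DBG_ON and DBG, the default level 3, and the use of C99 variadic macros. A C89 build would need a vfprintf()-based function instead:

```c
#include <stdio.h>

/* Compile-time debug level (assumed default 3 during development);
   build with -DDEBUG=0 to disable all debug output. */
#ifndef DEBUG
#define DEBUG 3
#endif

/* True if messages of the given level should be printed. */
#define DBG_ON(level) ((level) <= DEBUG)

/* Print to stderr when level <= DEBUG. The do/while(0) wrapper makes
   the macro behave like a single statement after if/else. */
#define DBG(level, ...)                       \
    do {                                      \
        if (DBG_ON(level))                    \
            fprintf(stderr, __VA_ARGS__);     \
    } while (0)

/* Warnings are level 1, so they stay on unless DEBUG is 0. */
#define WARN(...) DBG(1, __VA_ARGS__)
```

Unlike #ifdef DEBUG blocks, disabled calls are still compiled and type-checked. Because the condition is a constant, an optimizing compiler normally removes them, so building with -DDEBUG=0 gives the smaller, faster code asked for on 9/9.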
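Sketch for the 23/10 idea of passing a function that frees the data, which list_and_data_free (05/06/2002) presumably implements. The node and list types and the function name list_free_with are hypothetical and much simpler than gocr's list.c. They only show the callback pattern:

```c
#include <stdlib.h>

/* Hypothetical, minimal singly linked list (gocr's list.c differs). */
typedef struct node {
    void *data;
    struct node *next;
} node;

typedef struct {
    node *header;
} list;

/* Free all nodes. If free_data is not NULL, each element's data is also
   passed to it, so the list owner decides how (or whether) data is freed. */
void list_free_with(list *l, void (*free_data)(void *))
{
    node *n = l->header;
    while (n != NULL) {
        node *next = n->next;  /* save before n is freed */
        if (free_data != NULL)
            free_data(n->data);
        free(n);
        n = next;
    }
    l->header = NULL;
}
```

A type-specific callback can free a box while leaving fields such as box->p alone; according to the 4/11 note, those point into the shared pixmap and must not be freed. Passing NULL keeps the data entirely.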