This is a list of reviews, so that developers don't get confused with the new
changes. Please document what you do here.

() done
[] done, but should be reviewed
|| to be done, do if you can
?? question: if you know, solve it
Writer: put your initials inside.

5/8
(bbg) changed box1 to head_data.
(bbg) changed boxd to head_db.
|bbg| box2 static variables do not exist anymore. They were confusing,
     since these names were used as arguments to functions too. From what I've
     seen, this will not affect the program.
     (js) agree
(bbg) added new file: box.c, with all the functions that deal with boxes.
(bbg) added new file: database.c, with all the functions that deal with db.
[bbg] ocr_db()
|bbg| split pgm2asc(): the function would still exist, but call other
     subfunctions which are part of it now. I suggest:
     - find_letters()
     - remove_dust()
     - remove_pictures()
     - remove_melted_serifs()
     - find_longest_line()
     - detect_lines()
     etc: they are practically already defined between {}.
?bbg? pgm2asc: it has 2 frame letter codes, 2 find pictures, 2 remove
     dust
     |js| It is the result of the last (unfinished) big rewrite of the code;
          will be changed later
|bbg| vvv (or equivalent) should be global.

6/8
(bbg) general linked lists done (list.*).
|bbg| Functions in box.c should be replaced by the new general ones.
     linearr should too.
(bbg) strc() and in_str() replaced by strchr.
?bbg? shouldn't test_umlaut() belong to another file, like ocr0.cc?
      js: good idea
      (bbg) moved to ocr0.cc
?bbg? there are several functions with 2 or 3 versions (detect_lines,
     detect_lines2, for example). Do they do the same thing (and one
     is older) or different things?
     js: different, they are different children of the same mother;
         they will diverge further in future
?bbg? What do ini_list(), excude() and getresult() do? They don't seem
     to be used. If they are useless, please delete them.
     js: read the new comments in the file (should be in separate file?)
|bbg| document pixel().
     :done.
|bbg| Many, many commented lines all over the file; should be cleaned.
     :done.
|bbg| Change new->malloc, delete->free.
     :done

7/8
(js) some comments added in pgm2asc.cc
|bbg| instead of p.p[x+y*p.x], use the new macro pixel_at(p, x, y).
     Not all substitutions done yet.
     :done.
(bbg) excude() renamed to exclude().
?bbg? what do the pixel bits mean? A table would be useful.
     (js) not stable: lowest (3-4?) bits are for temporary use
              highest (4?) bits are for intensity
|bbg| must change all "type &p" arguments.
     :done
|bbg| should make argument order standard: at present, sometimes the
     order is x0,x1,y0,y1, sometimes x0,y0,x1,y1, pix is not always the
     first, etc.
(bbg) turmite(): changed for+if -> while. Is it correct? I think so, but
     a test wouldn't hurt. Also optimized the function.
   (js) I will test it after the review is complete.
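
Note on the pixel_at() item of 7/8 (a sketch, not the gocr source): the macro
could look roughly like this. The pix field layout (p, x, y) is assumed from
the expression p.p[x+y*p.x], and the macro takes a pointer because the
"type &p" arguments were changed to pointers.

```c
/* Assumed layout, inferred from the expression p.p[x+y*p.x]. */
typedef struct {
  unsigned char *p;  /* pixel buffer, stored row by row */
  int x, y;          /* width and height in pixels */
} pix;

/* Index pixel (xx, yy). Every argument is parenthesized so that
   calls such as pixel_at(p, i + 1, j) expand correctly. */
#define pixel_at(pic, xx, yy) ((pic)->p[(xx) + (yy) * (pic)->x])
```

Because the macro expands to an lvalue, it can be used for both reading and
writing a pixel.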

8/8
(js) pgm2asc.cc: generally pix *p used, pgm2asc() partly rewritten
|js| ocr0.cc: pix *p must be used everywhere
    :done
(bbg) pixel(): first part (c33 filling) optimized.
(bbg) minor optimizations, cut superfluous counters.
?bbg? copybox: shouldn't b->p be free()d, or realloc()ed? There's also a new
     faster version (using memcpy), but there's no unmarking of pixels; I'm
     also not sure if they are equivalent, since I'm not calling pixel. It's
     between #ifs.

9/8
(bbg) Started to change linearr to linked lists; linearr is obsolete now,
     and textlines too.
(bbg) free_textlines() and getTextLine() use linked lists (instead of linearr)
     now. The first may be deleted in the future.
[bbg] store_boxtree_textlines use linked lists (instead of linearr) now, but
     should be tested and reviewed; especially if (box2->c == ' ') since it
     doesn't check for an \n anymore (why should it?). Also the extra list_app
     should be checked. Check too if "if (!(mo & 8))" shouldn't come before
     "if (box2->c != '_')".
     :done
(bbg) Moved copybox() to box.c
?bbg? put(): What are ia and io?
      js: ia int_and, io int_or  new_pixel=(old_pixel & ia) | io
          for more practical use
|bbg| What about putting remove_dust, remove_picture, remove_etc in another
     file, such as clean.c or remove.c?
      js: should be done
      bbg: done
|bbg| several #ifdef's pending approval or review. Get rid of them.
      :done
[bbg] Added code to open pgm files using libpgm. Not tested, but should be OK.
[bbg] started to write writepbm, writeppm.
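
Note on the put() question of 9/8 (a sketch under assumptions): the rule
new_pixel=(old_pixel & ia) | io can be written as below. The name put_sketch,
its argument order and the pix layout are hypothetical; only the and/or rule
comes from the answer above.

```c
/* Assumed pixmap layout: row-by-row byte buffer of width x, height y. */
typedef struct {
  unsigned char *p;
  int x, y;
} pix;

/* Apply new_pixel = (old_pixel & ia) | io to the pixel at (x, y):
   ia selects the bits to keep, io selects the bits to set. */
void put_sketch(pix *p, int x, int y, int ia, int io) {
  unsigned char *cell = &p->p[x + y * p->x];
  *cell = (unsigned char)((*cell & ia) | io);
}
```

For example, ia=0xF0 with io=0x01 keeps the upper four bits of a pixel and
sets the lowest bit, while ia=0 simply overwrites the pixel with io.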

10/8
[js] pgm2asc() new functions created
(bbg) moved remove_* functions to remove.c
(bbg) added doc/ directory; moved ocr.tex to doc/; created examples.txt.
(bbg) added UNICODE support: unicode.h contains all symbols we will ever need,
     unicode.c contains two conversion functions.
(bbg) most of Unicode->TeX convert() (0x7F-0xFF) codes are done.
[bbg] several compose() codes done.

11/8
(bbg) new pnm IO functions, much better now.
[bbg] fixed part of ocr0.cc: since arguments changed from pix &p to pix *p, it
     wasn't compiling. Must fix the pix b; should it be changed to pix *b?
     :done
|bbg| Use new Unicode code; change all old code using unsigned char to wchar_t.
     Work started. Take care: old libc functions won't work anymore. Use those
     defined in <wchar.h>. Perhaps we should wait until the review is done and
     gocr is working again.
     :do it in 0.3.1 only.
?bbg? should we move wert code to a new file?
     (js) we should
?bbg? should we move all code to a /src subdirectory?
     (js) we should; the same step should rename .cc to .c
|bbg| fix Makefile (or better, configure.in) to reflect new files.
     :done
[bbg] started to change all the for(box=head_data) to for_each_data. CAREFUL:
     you cannot use for_each_data recursively. On the next layer, you must use
     something like for(box=list_get_header; box; box=list_next(box));
     :done

13/8
(bbg) finished changing for(box=head_data). Old box_* functions still used,
     must fix.
(bbg) cleaned 99% of the warning/errors when compiling pgm2asc.cc
[bbg] fix: warning: control reaches end of non-void function
     `compare_unknown_with_known_chars(pix *, int)'
     :done
[bbg] check if pgm2asc.cc:474:i = ((pixel(p, x, y) < cs) ? 0 : 1); is useless
     or not.
     (js) useless, relic from older review changes
(bbg) vvv is now part of struct environment.
(bbg) removed Uchar, may conflict with other libs. Use unsigned char.

15/8
(bbg) Tim Waugh sent man page; added him in the thanks of README.txt
[bbg] should split README.txt into: INSTALL, CREDITS, TODO, HISTORY.
     :done
(bbg) more changes from box_* to list_*

16/8
(bbg) added greek letters to the unicode.c functions. Not complete.
(bbg) added punctuation symbols to the unicode.c functions. Not complete.
(bbg) added ISO8859-1 support to the unicode.c functions. 0x20-0xFF supported,
     ligatures supported.

18/8
[bbg] fix try_to_divide_boxes to the new general linked list.
[bbg] most for_each_data have a line like box2 = list_get_current, which is
     unnecessary. Replace box2 with list_get_current. To avoid clumsiness, it
     may be a good idea to create a new #define lgc(a) list_get_current(a).

25/8
(js) ./src created, .cc,.h,.c moved to src, .cc renamed to .c
[js] try to get make working (make jconv) see config.h: USE_LIBPNM => HAVE_PNM_H
|js| list.c L66 data_before ??? what does that mean?
     (bbg) mistyped, fixed.
     list_del should be tested
     :done
?js? unicode.c there are undefined DEFS L89++ (I have put them in /* */)
     L150++,L185++,L633++
     (bbg) fixed
[js] make works again (now you can make compilation tests on changes)
     ... but SEGFAULT

26/8
(bbg) fixed casting warnings in unicode.c; fixed undefined DEFS.
|bbg| fix the missing LaTeX codes in unicode.c.

28/8
?js? configure does not find /usr/X11R6/include/pnm.h, why?
[js] fixed some bugs to get gocr working, but there are still big bugs

30/8
(bbg) fixed all gcc warnings. Some bugs killed in the process.
(bbg) fixed pnm.c compilation problem
|bbg| have to fix database.c:102
|bbg| MANDATORY: change all old linked list code to the new one. This means:
     get rid of box_* functions, the header->data stuff, etc. The new linked
     list code is probably working 100%.
(bbg) changed $(CXX) to $(CC). C conversion completed!
(bbg) added jconv and gocr sections to src/Makefile.
(bbg) new ISO8859-1 codes in unicode.c:convert.
(bbg) fixed several bugs in list.c, and patched list_del() to return 2 if
     deleted data was list->current.

1/9
(bbg) added new list_init(). Use it.
(bbg) wrote a fix to the list_del()+for_each_data bug. It's not very good, but
     works. Fixes the nested loops too. Now you can use for_each_data
     recursively.

8/9
(bbg) more fixes from the old LL system to the new one.
(js) added list_higher_level(), fix list_lower_level()
     bug fixed in free_textlines(), store_boxtree_lines(), store_boxtree_lines()
(js) review of pgm2asc() finished

9/9
(bbg) more fixes from the old LL system to the new one.
(bbg) fixed the realloc fixes of 8/9 in list_lower_level(),
     free_textlines(), store_boxtree_lines(), and fixed list_higher_level(),
     to avoid a blow-up if realloc fails. Now things continue working.
(js) compilation bug removed
(bbg) reviewed context_correction().
(bbg) moved output_list() and write_img() to output.c. Added output.h.
?bbg? Since output.c functions are only for debugging, what do you think about
     adding some #ifdef DEBUGs to make a smaller, faster code? It doesn't have
     to be done now, but it's an idea.
   (js) it's generally a good idea, but I would use a DEBUG level
           DEBUG > 2 (or similar)
          during development (until version 1.0) DEBUG should be activated
          by default,
          people could experiment with it and experts could deactivate DEBUG
(bbg) cleaned old textline functions.

11/9
(js) have fixed two major bugs, gocr now works again (but only -m 56 works)

18/9
(js) pedantic compiler warnings removed
    bbg: compiling using pam (netpbm, see 22/9) functions generates some
     warnings, which are caused by pam.h and do not affect gocr.
(js) list_del() does not work properly in context_correction(), fixed
    bbg: list_del return value should be checked always. I'm not sure what
     behaviour you want, so I didn't add the checks.
(js) malloc_box() bug fixed (memcpy src-dest mismatch)

21/9
(js) pgm2asc.c L1954, 2nd scan removed, now much faster, but
     handling of umlaut etc. is missing (similar to gluing function)
     bbg: umlaut is detected. Are you sure it's missing?
(js) I am away until Sept 30th 2000

22/9
|bbg| pnm_readpnmrow() SIGSEGV's. Don't know why.
     ?js? (when does it happen? example file?)
     bbg: any file, here. Does it work for you? Are you sure it isn't the pam
      functions?
(bbg) added new code to read pnm's using netpbm package (pam* functions). It
     works, but the drawback is that only recent (August 2000) libraries have
     these functions. Added test for pam.h in configure.
[bbg] moved example files to ~/tests/. Updated Makefile.in to reflect changes
     and moved the pertinent stuff to tests/Makefile. I don't have experience
     with autoconf, but I think I did it right. :)
(bbg) fixed tests for unistd.h, which were missing.
|bbg| How to change gocr.tcl split-window proportions? It'd be nice to be
     able to have a larger output area.
|bbg| gotta fix make.bat. There's a make for DOS, however; should we delete
     make.bat?
(bbg) wrote INSTALL. Moved (no changes) history from README.txt to HISTORY.
|bbg| review README.txt. Create a TODO file, BUGS, etc.
     :done

23/9
?bbg? database.c::ocr_db(), is there any use for box3?
(bbg) added "Elapsed time".
(bbg) minor fixes.

24/9
|bbg| remove_dust() broken? It's not being used currently, since it's the last
     piece of code using old list functions, and if it's updated, ocr1.c returns
     a bunch of "#hmm, something was going wrong". So, the problem may be in
     ocr1.c. I wrote the patch for remove_dust(), it's in #ifdefs.
     :done
(bbg) patched rest of remove.c to work with new list functions.
[bbg] remove_pictures() loop is weird. I followed what was there, but it should
     be checked.

1/10
(bbg) for_each_data now checks the return value of list_higher_level().
(bbg) moved some functions to pixel.c
(bbg) some cleanup

3/10
(js)  minor changes in Makefiles, add_line_infos() improved

6/10
(bbg) fixed bug when inserting a node before header, in list_ins().
(bbg) fixed bug that made for_each_data use list->header twice: l->fix wasn't
     properly initialized.
(bbg) added list_sort(). Ran some tests, seems to be OK.

7/10
(bbg) updated list_sort() to a faster version. Ran tests.
(bbg) some minor cosmetic fixes
|bbg| amiga.h: is it really needed, can someone test it?
|bbg| As soon as remove_dust is fixed, the old list code can be deleted, which
     would be nice. So, Joerg, can you please fix it? Thanks.
     [js] Have I fixed it?
     :done

9/10
(mg)  moved gocr.tcl and create_db to new bin folder
(mg)  cleaned up Makefile.in (some subdirs not included yet)
(mg)  created new sub-folder lib
(mg)  created new sub-folder include
[mg]  installation for src is very rudimentary

13/10
(bbg) split README.txt into BUGS, TODO and CREDITS. Joerg: Please write an
     AUTHORS file.
(bbg) renamed README.txt to README.
(bbg) added list_empty(). Fixed list_higher_level to avoid empty lists. BTW: I
    think that passing an empty list (l->header==NULL) may blow some of the
    list functions.

16/10
(bbg) added AUTHORS.

22/10
(js) minor changes, glue_broken_chars() improved, ocr0 updated
?js? list_del() seems not to work in nested for_each_element()-loops
[js] remove_dust fixed?
     :done

23/10
(bbg) rewrote list_del, using a new kind of fix. It will work as long as the
     data is not freed by the user. A solution to fix this is to pass a pointer
     to a function that will free the data.
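
Note on the list_del idea of 23/10 (a sketch, not the list.c code): passing a
pointer to a function that frees the data could look like this. The types
node_t and list_t and both function names are hypothetical, and the real list
in list.c has more state (levels, current element) that is left out here.

```c
#include <stdlib.h>

/* Hypothetical minimal singly linked list, for illustration only. */
typedef struct node {
  void *data;
  struct node *next;
} node_t;

typedef struct {
  node_t *header;
} list_t;

/* Insert data at the front. Returns 0 on success, 1 if out of memory. */
int list_push_sketch(list_t *l, void *data) {
  node_t *n = malloc(sizeof *n);
  if (n == NULL)
    return 1;
  n->data = data;
  n->next = l->header;
  l->header = n;
  return 0;
}

/* Unlink the node holding data. If free_data is not NULL the list frees
   the data itself, so the caller never frees it behind the list's back.
   Returns 0 if a node was deleted, 1 if data was not found. */
int list_del_sketch(list_t *l, void *data, void (*free_data)(void *)) {
  node_t **link = &l->header;
  node_t *dead;

  while (*link != NULL && (*link)->data != data)
    link = &(*link)->next;
  if (*link == NULL)
    return 1;
  dead = *link;
  *link = dead->next;
  if (free_data != NULL)
    free_data(dead->data);
  free(dead);
  return 0;
}
```

A caller that allocated the data with malloc() would pass free as the last
argument; a caller that stores pointers to static data would pass NULL.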

25/10
(js) glue_broken_chars() recognition of "=" fixed, some other bugs fixed
(js) some improvements for recognition (ocr0n.c,ocr0.c)

27/10
(js) further improvements for better recognition results, polish.tex added

28/10
(bbg) changed free_boxtree() to new code
|bbg| have to fix the rest of the old box code
     :done

4/11
[bbg] one of the frees of the new free_boxtree code segfaults. I dunno why.
     (js) boxlist->element->p is a pointer to existing whole pixmap, do not
          free this!
[bbg] I commented the following lines of pgm2asc.c: 1430,43,55,67,81,95, since
     they seem to be unnecessary.
       (js) hmm: they are not needed for list_ins, but could be useful for
       an engine which looks for surrounding boxes. Do you know what I mean?
     It was the last step of the conversion to the
     new list routines. *uf*.
(bbg) ALL OLD CODE ROUTINES ARE DEPRECATED NOW. Mission successful, waiting for
     permission to delete. :)
       (js) permission granted
(bbg) Just to reassure, I tested gocr today and it's 100%
[bbg] change sort_boxes to a call to list_sort. sort_box_func should be
     reviewed
      (js) looks ok to me.
     or tini_list(){
      (js) ???
?bbg? ini_list(), exclude() & getresult(): should they be deleted?
       (js) not found :(
     I'm working
     on a probability based engine, using neural networks.

6/11
(js) get Segmentation fault, => fixed

14/11
(bbg) deleted old list code.
(bbg) moved all detect* functions to detect.c
(bbg) moved lines functions to lines.c

22/11
(bbg) reviewed frame_nn(), mark_nn(), num_hole(), num_obj(). Some other minor
    revisions.
(bbg) added outbounds() macro in gocr.h. It should be used from now on.
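
Note on the outbounds() item of 22/11 (a guess at the shape, not copied from
gocr.h): a bounds-check macro of this kind could be written as below. The
argument list and the pix layout are assumptions.

```c
/* Assumed pixmap layout: width x, height y, row-by-row byte buffer. */
typedef struct {
  unsigned char *p;
  int x, y;
} pix;

/* True (non-zero) when (xx, yy) lies outside the pixmap. Arguments are
   evaluated more than once, so avoid side effects such as x++. */
#define outbounds(pic, xx, yy) \
  ((xx) < 0 || (yy) < 0 || (xx) >= (pic)->x || (yy) >= (pic)->y)
```

A pixel access would then be guarded with something like
if (!outbounds(p, x, y)) before touching p->p.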

24/11
(js) some patches by J.R.V. Zandt added
?js? have still problems detecting pnm-libs via autoconf/configure

25/11
(bbg) added HTML support for all ISO8859-1 characters in unicode.

27/11
(bbg) some fixes in unicode.*
|bbg| whatletter needs a serious revision.

30/11
[js] split ocr0.c (compiles faster now), gcc 2.95.1 "bug" removed?

10/12
[bbg] Unicode is now supported. Some tests should be made, especially testing
    UNKNOWN/PICTURE and >0xFF characters.

17/12
[js] pamlib,netlib,pnmlib detection (configure.in) changed

21/12
[jrv] pgm2asc changes:
  context_correction() BUGFIX: test whether
previous(previous(current)) exists, before trying to dereference the pointer.
  follow_path() introduced: follow a path, recording transitions
between dark and light.
  xrealloc() introduced: safe memory allocation
  loop() optimization: move direction test outside loop, simplify loop test
  measure_pitch() introduced: detect monospaced font and measure the pitch
  pgm2asc(): if font is monospaced, set spc per measured character spacing.
  A few wording fixes.
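
Note on the xrealloc() item of 21/12 (a sketch of one possible reading): this
log does not show what xrealloc() actually does, so the helper below is
hypothetical. It only illustrates the usual way to keep a failed realloc()
from losing the old block, which is also the problem mentioned on 9/9.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: behaves like realloc(), but reports a failure on
   stderr. On failure the old block is left untouched and still valid. */
void *xrealloc_sketch(void *ptr, size_t size) {
  void *pn = realloc(ptr, size);
  if (pn == NULL && size > 0)
    fprintf(stderr, "xrealloc_sketch: out of memory (%lu bytes)\n",
            (unsigned long)size);
  return pn;
}
```

The caller keeps the old pointer until the new one is known to be good:
tmp = xrealloc_sketch(buf, n); if (tmp != NULL) buf = tmp;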

05/01
(js) some fixes in glue_broken_chars result in good ";" and "!" detection
     engine updated (examples/font1.pbm should work perfectly)

19/02
(js) "make dist" now automatically builds the package (gocr-x.y.z.tgz)
    is the Makefile in api working (especially make clean/proper)?
   "make install" should work too
   bbg: API makefiles work, but there's no make proper (use distclean instead)
    I'd appreciate if someone could test the configure and libpnm. install is
    not tested yet.

19/04
[js] USE_UNICODE defined as default in gocr.h (test it), some fixes to get
   -f TeX, -f HTML running, fixed bug in lines.c (unterminated string)
  measure_pitch extended to 12<width<150 (see example)

15/05
[js] list.c line 225+226 realloc( .., (l->level+2)..) correct? (S.Niemz bug)

30/07
?js? readpnm if USE_libpnm does not do the job of old readpgm (for pbm-files)

22/08
(js) gcc -Wall -pedantic warnings fixed, ocr0.c wchar_t used by default
    we should switch to wchar_t, it's more flexible and ANSI (?)
    and the huge number of "#ifdef USE_UNICODE" statements are looking bad
?js? can someone compile test on a machine without wchar.h? (FREEBSD?)

25/08
?js? I got the following warning from gcc 2.95.3 20010315 (-Wall -pedantic):
  lines.c: In function 'store_boxtree_lines':
  lines.c:136: warning: implicit declaration of function 'wcsdup'
  but I included <wchar.h>. Does anybody understand this???
  hmmm ... Is it because it is not ANSI (but a GNU extension)?
  What should we do with non-standard C functions?
  Maybe that is why I got the complaints from FREEBSD users?
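
Note on the wcsdup question of 25/08 (a sketch, assuming the guess above is
right): wcsdup() is not an ISO C function, so <wchar.h> may not declare it
unless an extension macro such as _GNU_SOURCE is defined. One way to avoid
depending on it is a small local replacement built only from ISO C calls. The
name wcsdup_sketch is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Duplicate a wide string using only ISO C functions. Returns a newly
   malloc()ed copy that the caller must free(), or NULL if out of memory. */
wchar_t *wcsdup_sketch(const wchar_t *s) {
  size_t n = wcslen(s) + 1;               /* include the terminating L'\0' */
  wchar_t *copy = malloc(n * sizeof *copy);
  if (copy != NULL)
    memcpy(copy, s, n * sizeof *copy);
  return copy;
}
```

This still needs <wchar.h> itself, so it does not help on a machine that
lacks the header altogether.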

30/08
(js) pgm2asc.c, output.c simplified, improvement of char division

08/02/2002
?js? all possible SIGSEGVs in list.c fixed

15/02/2002
[js] ocr0.c will use setac() in future when manipulating struct box (tac, wac
  and c); if chars are very similar this will make context correction
  easier, and filtering will be easier too (not fully implemented)

25/02/2002
[js] box->obj added for storing more than one char

02/06/2002
(jb) added special encodings for '&', '<', '>' to HTML decoder
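
Note on the HTML item of 02/06/2002 (a sketch, not the unicode.c code): the
three characters have to be written as character references in HTML output.
The function name and its return convention are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Return the HTML character reference for the characters that must not
   appear literally in HTML text, or NULL if c needs no special encoding. */
const char *html_entity_sketch(wchar_t c) {
  switch (c) {
    case L'&': return "&amp;";
    case L'<': return "&lt;";
    case L'>': return "&gt;";
    default:   return NULL;
  }
}
```

A caller would print the returned string when it is not NULL and fall back to
its normal character conversion otherwise.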

05/06/2002
[jb] job_t introduced (same patch as sent to the mailing list)
     Only local variables (configuration and pixmap) of main() are
     moved inside job_t yet.
[jb] boxlist and linelist converted to job_t
[jb] Temporary introduction of global variable JOB (of type job_t)
     This variable will be removed, while all functions are converted to
     receive either a job_t pointer or the needed values.
[jb] n_run, env, db_path and ppo converted to job_t
[jb] converted dblist
[jb] renamed init_job and free_job to job_init and job_free
[jb] converted nearly all global variables (except warn and debug) to
     job_t. Used JOB as new temporary global variable.
     Overview of the renaming (some old variables are combined to a new one):

       OLD VAR NAMES      NEW VAR NAME
     * main():inam        JOB->src.fname
     * main():p           JOB->src.p
       env.p
     * main():init        JOB->tmp.init_time
     * ppo                JOB->tmp.ppo
     * n_run              JOB->tmp.n_run
     * dblist             JOB->tmp.dblist
     * boxlist            JOB->res.boxlist
     * linelist           JOB->res.linelist
     * lines              JOB->res.lines
     * env.avX            JOB->res.avX
     * env.avY            JOB->res.avY
     * env.sumX           JOB->res.sumX
     * env.sumY           JOB->res.sumY
     * env.numC           JOB->res.numC
     * main():cs          JOB->cfg.cs
       env.cs
     * main():spc         JOB->cfg.spc
     * main():mo          JOB->cfg.mode
       env.mode
     * main():dust_size   JOB->cfg.dust_size
     * main():numo        JOB->cfg.only_numbers
       only_numbers
     * main():verbose     JOB->cfg.verbose
       env.vvv
     * main():out_format  JOB->cfg.out_format
     * main():lc          JOB->cfg.lc
     * env.db_path        JOB->cfg.db_path

[jb] introduced list_and_data_free

|jb| get rid of JOB.
|jb| decide on last global variables "warn" and "debug"
  js: it's meant for debugging, we should use MACROS WARN or DEBUG
|jb| eliminate static buffers inside functions.
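
Note on the job_t renaming of 05/06/2002 (a sketch under assumptions): the
member names and their grouping into src, tmp, res and cfg come from the
table above; every type is a placeholder, and the names job_sketch_t,
job_sketch_init and job_sketch_mode_has are hypothetical.

```c
#include <string.h>

/* Placeholder types throughout; only the member names and the four
   groups are taken from the renaming table. */
typedef struct {
  struct {                 /* input */
    char *fname;           /* was main():inam */
    void *p;               /* was main():p and env.p */
  } src;
  struct {                 /* temporary state */
    long init_time;        /* was main():init */
    int ppo, n_run;
    void *dblist;
  } tmp;
  struct {                 /* results */
    void *boxlist, *linelist, *lines;
    int avX, avY, sumX, sumY, numC;
  } res;
  struct {                 /* configuration */
    int cs, spc, mode, dust_size, only_numbers, verbose, out_format;
    char *lc, *db_path;
  } cfg;
} job_sketch_t;

/* Clear the whole struct so that a new job starts from a known state. */
void job_sketch_init(job_sketch_t *job) {
  memset(job, 0, sizeof *job);
}

/* Example of a function that receives the job as an argument instead of
   reading a global: test one bit of the mode flags. */
int job_sketch_mode_has(const job_sketch_t *job, int bit) {
  return (job->cfg.mode & bit) != 0;
}
```

Passing the job pointer explicitly, as in the last function, is the pattern
that would let the temporary global JOB be removed.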

30/06/2002
(js) gocr.h: struct box: modifier now wchar_t
     mark things which need changes with "ToDo:" directly in the sources,
      so that we can grep for it and see that there are planned changes

05/01/2003
(js) ocr0.c: engine split into groups of characters (mostly pairs)
     num_hole called once for every char, result is stored in a table
?js? cvs ci jocr   does not work since 2003? (it tries to update jocr/CVS)
   have cvs-1.11.1p1
