WORDLIST2DAWG(1)
================
:doctype: manpage

NAME
----
wordlist2dawg - convert a wordlist to a DAWG for Tesseract

SYNOPSIS
--------
*wordlist2dawg* 'WORDLIST' 'DAWG' 'lang.unicharset'

*wordlist2dawg* -t 'WORDLIST' 'DAWG' 'lang.unicharset'

*wordlist2dawg* -r 1 'WORDLIST' 'DAWG' 'lang.unicharset'

*wordlist2dawg* -r 2 'WORDLIST' 'DAWG' 'lang.unicharset'

*wordlist2dawg* -l <short> <long> 'WORDLIST' 'DAWG' 'lang.unicharset'

DESCRIPTION
-----------
wordlist2dawg(1) converts a wordlist to a Directed Acyclic Word Graph
(DAWG) for use with Tesseract. A DAWG is a compressed, space- and
time-efficient representation of a word list.

OPTIONS
-------
-t
    Verify that a given dawg file is equivalent to a given wordlist.

-r 1
    Reverse a word if it contains an RTL character.

-r 2
    Reverse all words.

-l <short> <long>
    Produce a file with several dawgs in it, one each for words
    of length <short>, <short+1>, ... <long>.

ARGUMENTS
---------

'WORDLIST'
    A plain text file in UTF-8, one word per line.

'DAWG'
    The output DAWG to write.

'lang.unicharset'
    The unicharset of the language. This is the unicharset
    generated by mftraining(1).

SEE ALSO
--------
tesseract(1), combine_tessdata(1), dawg2wordlist(1)

<https://tesseract-ocr.github.io/tessdoc/Training-Tesseract.html>

COPYING
-------
Copyright \(C) 2006 Google, Inc.
Licensed under the Apache License, Version 2.0

AUTHOR
------
The Tesseract OCR engine was written by Ray Smith and his research groups
at Hewlett Packard (1985-1995) and Google (2006-present).
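
EXAMPLES
--------
The following is a minimal sketch of a typical build-and-verify run,
not an authoritative recipe. The file names `words.txt`, `eng.word-dawg`
and `eng.unicharset` are placeholders chosen for illustration, and the
script assumes that a unicharset for the language already exists in the
current directory. It uses only the invocation forms listed in SYNOPSIS.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own language data.
workdir=$(mktemp -d)
wordlist="$workdir/words.txt"
dawg="$workdir/eng.word-dawg"
unicharset="eng.unicharset"   # assumed to exist already (see ARGUMENTS)

# WORDLIST: plain UTF-8 text, one word per line.
printf '%s\n' apple banana cherry > "$wordlist"

# Run the tool only if it and the unicharset are actually available.
if command -v wordlist2dawg >/dev/null 2>&1 && [ -f "$unicharset" ]; then
    # Build the DAWG from the wordlist.
    wordlist2dawg "$wordlist" "$dawg" "$unicharset"
    # Verify that the DAWG is equivalent to the wordlist.
    wordlist2dawg -t "$wordlist" "$dawg" "$unicharset"
else
    echo "wordlist2dawg or $unicharset not found; skipping conversion" >&2
fi
```

For a wordlist in a right-to-left script, the same positional arguments
can be combined with `-r 1` or `-r 2` as described in OPTIONS.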