1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename flite.info
4@settitle Flite: a small, fast speech synthesis engine
5@finalout
6@setchapternewpage odd
7@c %**end of header
8
9@c This document was modelled on the numerous examples of texinfo
10@c documentation available with GNU software, primarily the hello
11@c world example, but many others too.  I happily acknowledge their
12@c aid in producing this document -- awb
13
14@set EDITION 2.0
15@set VERSION 2.0
16@set UPDATED 18th November 2014
17
18@ifinfo
19This file documents @code{Flite}, a small, fast run-time speech
20synthesis engine.
21
22Copyright (C) 2001-2014 Carnegie Mellon University
23
24Permission is granted to make and distribute verbatim copies of
25this manual provided the copyright notice and this permission notice
26are preserved on all copies.
27
28@ignore
29Permission is granted to process this file through TeX, or otherwise and
30print the results, provided the printed document carries copying
31permission notice identical to this one except for the removal of this
32paragraph (this paragraph not being relevant to the printed manual).
33
34@end ignore
35Permission is granted to copy and distribute modified versions of this
36manual under the conditions for verbatim copying, provided that the entire
37resulting derived work is distributed under the terms of a permission
38notice identical to this one.
39
40Permission is granted to copy and distribute translations of this manual
41into another language, under the above conditions for modified versions,
42except that this permission notice may be stated in a translation approved
43by the authors.
44@end ifinfo
45
46@titlepage
47@title Flite: a small, fast speech synthesis engine
48@subtitle System documentation
49@subtitle Edition @value{EDITION}, for Flite version @value{VERSION}
50@subtitle @value{UPDATED}
51@author by Alan W Black and Kevin A. Lenzo
52
53@page
54@vskip 0pt plus 1filll
55Copyright @copyright{} 2001-2014 Carnegie Mellon University, all rights
56reserved.
57
58Permission is granted to make and distribute verbatim copies of
59this manual provided the copyright notice and this permission notice
60are preserved on all copies.
61
62Permission is granted to copy and distribute modified versions of this
63manual under the conditions for verbatim copying, provided that the entire
64resulting derived work is distributed under the terms of a permission
65notice identical to this one.
66
67Permission is granted to copy and distribute translations of this manual
68into another language, under the above conditions for modified versions,
69except that this permission notice may be stated in a translation approved
70by the Carnegie Mellon University
71@end titlepage
72
73@node Top, , , (dir)
74
75@menu
76* Abstract::            initial comments
77* Copying::             How you can copy and share the code
78* Acknowledgements::    List of contributors
79* Installation::        Compilation and Installation
80* Flite Design::
81* APIs::                 Standard functions
82* Converting FestVox Voices:: building flite voices from FestVox ones
83
84@end menu
85
86@node Abstract, Copying, , Top
87@chapter Abstract
88
89This document provides a user manual for flite, a small, fast
90run-time speech synthesis engine.
91
92This manual is nowhere near complete.
93
94Flite offers text to speech synthesis in a small and efficient binary.
It is designed for embedded systems like PDAs as well as large server
installations which must serve synthesis to many ports.  Flite is part
97of the suite of free speech synthesis tools which include Edinburgh
98University's Festival Speech Synthesis System
99@url{http://www.festvox.org/festival} and Carnegie
100Mellon University's FestVox project @url{http://festvox.org}, which
101provides tools, scripts, and documentation for building new synthetic
102voices.
103
104Flite is written in ANSI C, and is designed to be portable
105to almost any platform, including very small hardware.
106
Flite is really just a synthesis library that can be linked into other
programs.  It includes two simple voices with the distribution: an old
diphone voice and an example limited domain voice which uses the newer
unit selection techniques we have been developing.  Neither of these
voices would be considered production voices but they serve as examples;
new voices will be released as they are developed.
113
114The latest versions, comments, new voices etc for Flite are available
115from its home page which may be found at
116@example
117@url{http://cmuflite.org}
118@end example
119
120@node Copying, Acknowledgements, Abstract, Top
121@chapter Copying
122
123Flite is free software.  It is distributed under an X11-like license.
124Apart from the few exceptions noted below (which still have
125similarly open licenses) the general license is
126@example
127                  Language Technologies Institute
128                    Carnegie Mellon University
129                     Copyright (c) 1999-2014
130                       All Rights Reserved.
131
132 Permission is hereby granted, free of charge, to use and distribute
133 this software and its documentation without restriction, including
134 without limitation the rights to use, copy, modify, merge, publish,
135 distribute, sublicense, and/or sell copies of this work, and to
136 permit persons to whom this work is furnished to do so, subject to
137 the following conditions:
138  1. The code must retain the above copyright notice, this list of
139     conditions and the following disclaimer.
140  2. Any modifications must be clearly marked as such.
141  3. Original authors' names are not deleted.
142  4. The authors' names are not used to endorse or promote products
143     derived from this software without specific prior written
144     permission.
145
146 CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK
147 DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
148 ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
149 SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE
150 FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
151 WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
152 AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
153 ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
154 THIS SOFTWARE.
155@end example
156
157@node Acknowledgements, Installation, Copying, Top
158@chapter Acknowledgements
159
The initial development of flite was primarily done by awb while
travelling (perhaps the name is doubly appropriate, as a substantial
amount of the coding was done over 30,000ft).  During most of that
163time awb was funded by the Language Technologies Institute at
164Carnegie Mellon University.
165
166Kevin A. Lenzo was involved in the design, conversion techniques and
167representations for the voice distributed with flite (as well as being
168the actual voice itself).
169
170Other contributions are:
171@itemize @bullet
172@item Nagoya Institute of Technology
The MLSA, MLPG code comes directly from NITECH's hts engine code, though we
174have done some optimizations.
175@item Marcela Charfuelan (DFKI)
176For the mixed-excitation techniques (but no direct code).  These
177originally came from NITECH but we understood the techniques from
178Marcela's Open Mary Java code and implemented them in our optimized
179version of MLSA.
180@item David Huggins-Daines:
181much of the very early clunits code, porting to multiple platforms,
182substantial code tidy up and configure/autoconf guidance (up to 2001).
183@item Cepstral, LLC (@url{http://cepstral.com}):
184For supporting DHD to spend time (in early 2001) on flite and passing
185back the important fixes and enhancements while on a project funded by
186the Portuguese Foundation for Science and Technology (FCT) Praxis XXI
187program specifically to produce an open source synthesizer.
188@item Willie Walker <william.walker@@sun.com> and the Sun Speech Group:
189lots of low level bugs (and fixes).
190@item Henry Spencer:
191For the regex code
192@item University of Edinburgh:
for releasing Festival for free, making a companion runtime synthesizer
a practical project.  Much of the design of flite relies on the
architecture decisions made in the Festival Speech Synthesis System and
the Edinburgh Speech Tools.

The duration cart tree and intonation (accent and F0) models for the
US English voice were derived from the models in the Festival
distribution, which in turn were trained from the Boston University FM
Radio Data Corpus.
202
203@item Carnegie Mellon University
204The included lexicon is derived from CMULEX and the letter to sound
205rules are constructed using the Lenzo and Black techniques for
206building LTS decision graphs.
207@item Craig Reese: IDA/Supercomputing Research Center and Joe Campbell: Department of Defense
208who wrote the ulaw conversion routines in src/speech/cst_wave_utils.c
209@end itemize
210
211
212@node Installation, Flite Design, Acknowledgements, Top
213@chapter Installation
214
Flite consists simply of a set of C files.  GNU configure is
216used to configure the engine and will work on most
217major architectures.
218
219In general, the following should build the system
220@example
221tar zxvf flite-XXX.tar.gz
222cd flite-XXX
223./configure
224make
225@end example
226However you will need to explicitly call GNU make
227@code{gmake} if @code{make} is not GNU make on your system.
228
The configuration process builds a file @file{config/config} which under
some circumstances may need to be edited, e.g. to add unusual options or
to deal with cross compilation.
232
233On Linux systems, we also support shared libraries which are useful for
keeping space down when multiple different applications are linked to the
235flite libraries.  For development we strongly discourage use of shared
236libraries as it is too easy to either not set them up correctly or
237accidentally pick up the wrong version.  But for installation they are
definitely encouraged.  That is, if you are just going to make and
install, they are good; but unless you know what @var{LD_LIBRARY_PATH}
240does, it may be better to use static libraries (the default) if you are
241changing C code or building your own voices.
242@example
243./configure --enable-shared
244make
245@end example
This will build both shared and static versions of the libraries but
will link the executables to the @emph{shared} libraries, so you will
need to install the libraries in a place where your dynamic linker will
find them (cf. @file{/etc/ld.so.conf}) or set @var{LD_LIBRARY_PATH}
appropriately.
251
252@example
253make install
254@end example
This will install the binaries (@file{bin/flite*}), include files and
256libraries in appropriate subdirectories of the defined install
257directory, @file{/usr/local} by default.  You can change this at configure
258time with
259@example
260./configure --prefix=/opt
261@end example
262
263@section Windows Support
264
@section Windows CE Support
266
267@emph{NOTE: as Windows CE is somewhat rare now, we do not guarantee this
268still works.}
269
270Flite has been successfully compiled by a number of different groups
271under Windows CE.  The system should compile under Embedded Visual
Studio but we do not have the full details.
273
274The system as distributed does compile under the gcc @file{mingw32ce}
275toolchain available from @url{http://cegcc.sourceforge.net/}.  The
276current version can be compiled and run under WinCE with a primitive
277application called @file{flowm}.  @file{flowm} is a simple application
278that allows playing of typed-in text, or full text to speech on
a file.  Files should be simple ASCII text files @code{*.txt}.  The
280application allows the setting of the byte position to start synthesis
281from.
282
283Assuming you have @file{mingw32ce} installed you can configure as
284@example
285./configure --target=arm-wince
286make
287@end example
288The resulting binary is given in @file{wince/flowm.exe}.  If you copy
289this onto your Windows Mobile device and run it, it should allow you
290to speak typed-in text and any @file{*.txt} files you have on your
291device.
292
The application uses @code{cmu_us_kal} as the default voice.
Although it is possible to include the clustergen voices, they may be
too slow to be really practical.  An 8KHz clustergen voice with a
reduced order of 13 gives a voice that runs acceptably on an hp2755
(624MHz) but is still marginal on an AT&T Tilt (400MHz).
298
Building 8KHz clustergen voices is currently a bit of a hack.  We take the
standard waveforms and resample them to 8KHz, then relabel the sample
rate to be 16KHz.  Then build the voice as normal (as if the speaker
spoke twice as fast).  You may need to tune the F0 parameters in
@file{etc/f0.params}.  This seems to basically work.  Then after the
waveform is synthesized (still in the ``chipmunk'' domain) we
play it back at 8KHz.  This effectively means we generate half the
number of samples and the frames are really at 10ms.  A second
reduction is an option on the basic @file{build_flite} command.  A
second argument can specify order reduction, thus instead of the
standard 25 static parameters (plus their deltas) we can reduce this to
13 and still get acceptable results
311@example
312./bin/build_flite cg 13
313cd flite
314make
315@end example
Importantly this uses less space, and takes less time to synthesize.
317These @code{SPEECH_HACKS} in @file{src/cg/cst_mlsa.c} are switched on
318by default when @code{UNDER_CE} is defined.
319
320The reduced order properly extracts the statics (and stddev) and
321deltas (and stddev) from the predicted parameter clusters and makes it
as if those were the sizes of parameters that were used to train
the voice.
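
The sample-rate arithmetic behind the 8KHz trick can be checked with a
small sketch.  This is only an illustration: the function names are
invented here, and the 5ms nominal frame shift is an assumption inferred
from the statement that the frames are ``really at 10ms''; it is not
taken from the Flite sources.

```c
#include <assert.h>

/* Number of samples in one synthesis frame, given the sample rate the
   voice was labelled with and the frame shift in milliseconds.
   (Hypothetical helper, not part of Flite.) */
static int samples_per_frame(int labelled_rate_hz, int frame_shift_ms)
{
    assert(labelled_rate_hz > 0 && frame_shift_ms > 0);
    return labelled_rate_hz * frame_shift_ms / 1000;
}

/* Real duration, in milliseconds, of n_samples when they are played
   back at playback_rate_hz.  (Hypothetical helper, not part of Flite.) */
static double duration_ms(int n_samples, int playback_rate_hz)
{
    assert(n_samples >= 0 && playback_rate_hz > 0);
    return 1000.0 * n_samples / playback_rate_hz;
}
```

With an assumed 5ms shift, @code{samples_per_frame(16000, 5)} gives 80
samples; @code{duration_ms(80, 16000)} is 5ms in the relabelled domain,
while @code{duration_ms(80, 8000)} is 10ms at the real playback rate, so
each second of audible speech needs half as many generated samples.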
324
325@section PalmOS Support
326
327@emph{NOTE: as PalmOS is somewhat rare now, we do not guarantee this
328still works.}
329
Support for PalmOS was removed in 1.9, as I no longer have any working
PalmOS devices.  This section remains for people who do, but they
may need to update something to make this work.
333
334Starting with 1.3 we have initial support for PalmOS using the free
335development tools.  The compilation method assumes the target device
336is running PalmOS 5.0 (or later) on an ARM processor.  Following
337convention in the Palm world, the app that the user interacts with is
actually a m68k application compiled with the m68k gcc cross compiler;
the resulting code is interpreted by the PalmOS 5.0 device.  The core
flite code is in native ARM, and hence uses the ARM gcc cross
compiler.  An interesting amount of support code is required to
get all this to work properly.
343
344The user app is called @code{flop} (FLite on Palm) and like most apps
345written by awb, is functional, but ugly.  You should not let a
346short-sighted Scotsman, who still thinks command line interfaces are
347cool, design a graphical app.  But it does work and can read typed-in
348text.  The @file{armflite.ro} resources are designed with the idea
349that proper applications will be written using it as a library.
350
351The @file{flop.prc} application is distributed separately so it can be used
without having to install all these tools.  But if you want to do PalmOS
development, here is what you need to do to compile Flite for PalmOS and
the flop application.
355
There are a number of different application development environments for
Palm; here I only describe the Unix based one as this is what was
used.  You will need the PalmOS SDK 5.0 from palmOne
@url{http://www.palmone.com/us/developers/}.  This is
free but does require registration.  Out of the lots of different
files you can get from palmOne you will eventually find
@file{palmos-sdk-5.0r3-1.noarch.rpm}; install that on your Linux
machine
364@example
365rpm -i palmos-sdk-5.0r3-1.noarch.rpm
366@end example
367You will also need the various gcc based cross compilers
368@url{http://prc-tools.sourceforge.net/}
369@example
370prc-tools-2.3-1.i386.rpm
371prc-tools-arm-2.3-1.i386.rpm
372prc-tools-htmldocs-2.3-1.noarch.rpm
373@end example
374The Palm Resource compiler
375@url{http://pilrc.sourceforge.net/}
376@example
377pilrc-3.1-1.i386.rpm
378@end example
379And maybe the emulator
380@url{http://www.palmos.com/dev/tools/emulator/}
381@example
382pose-3.5-2.i386.rpm
383pose-skins-1.9-1.noarch.rpm
384pose-skins-handspring-3.1H4-1.noarch.rpm
385@end example
Note that POSE doesn't support ARM code (@file{Simulator} does, but
that only works under Windows), so POSE is only useful for debugging the
m68k parts of the app.
389
390Install these
391@example
392rpm -i prc-tools-2.3-1.i386.rpm
393rpm -i prc-tools-arm-2.3-1.i386.rpm
394rpm -i prc-tools-htmldocs-2.3-1.noarch.rpm
395rpm -i pilrc-3.1-1.i386.rpm
396rpm -i pose-3.5-2.i386.rpm
397rpm -i pose-skins-1.9-1.noarch.rpm
398rpm -i pose-skins-handspring-3.1H4-1.noarch.rpm
399@end example
400We also need the prc-tools to know which SDK is available
401@example
402palmdev-prep
403@end example
404In addition we use Greg Parker's PEAL
405@url{http://www.sealiesoftware.com/peal/} ELF ARM loader. You need to
406download this and compile and install it yourself, so that
407@code{peal-postlink} is in your path.  Greg was very helpful and even
408added support for large data segments for this work (though in the end
409we don't actually use them).  Some peal code is in our distribution
410(which is valid under his licence) but if you use a different version
411of peal you may need to ensure they are matched, by updating
412the peal code in @file{palm/}.  We used version @file{peal-2004-12-29}.
413
414The other palm specific function we require is @code{par}
415@url{http://www.djw.org/product/palm/par/} which is part of the
416@code{prc.tgz} distribution.  We use @code{par} to construct resources
417from raw binary files.  There are other programs that can do this but
418we found this one adequate.  Again you must compile this and ensure
419@code{par} is in your path.  Note no part of @code{par} ends up
420in the distributed system.
421
422Given all of the above you should be able to compile the
423Palm code and the @code{flop} application.
424@example
425   ./configure --target=arm-palmos
426   make
427@end example
428The resulting application should be in @file{palm/flop/flop.prc}
which can then be installed on your Palm device
430@example
431   pilot-xfer -i palm/flop/flop.prc
432@end example
433Setting up the tools, and getting a working Linux/Palm conduit is not
434particularly easy but it is possible.  Although some attempt was made
to use the Simulator (PalmOS 5.0/ARM simulator) under Windows, it
never really contributed to the development.  The POSE (m68k) emulator
though was used to develop the @code{flop} application itself.
438
439@subsection Some notes on the PalmOS port
440
Throughout the PalmOS developer documentation they continually remind
you that a Palm device is not a full computer, it's an extension of the
desktop.  But seeing devices like the Treo 600 can easily make one
forget and want the device to do real computational work.  PalmOS is
designed for small lightweight devices so it is easy to start hitting
the boundaries of its capabilities when trying to port larger
applications.
448
PalmOS 5.0 still has interesting limitations: in the m68k domain,
@code{int}s are 16 bit and using memory segments greater than 65K
requires special work.  Quaint as these are, they do significantly
452affect the port.  At first we thought that only the key
453computationally expensive parts would be in ARM (so-called
454@code{armlets}) but trying to compile the whole flite code in m68k
455with long/short distinctions and sub-64K code segment limitations was
456just too hard.
457
458Thus all the Flite code, USEnglish, Lexicon and diphone databases
459actually are compiled in the ARM domain.  There is however no system
460support in the ARM domain so call backs to m68k system functions are
461necessary.  With care calls to system functions can be significantly
462limited so only a few call backs needed to be written.  These are in
463@file{palm/pocore/}.  I believe CodeWarrior has better support for
464this, but in this case we rolled our own (though help from other open
465source examples was important).
466
We manage the m68k/ARM interface through PEAL, which is basically a
linker for ARM code and a calling mechanism from m68k.  PEAL deals with
globals and splitting the code into 65K chunks automatically.
470
471Flite does however have a number of large data segments, in the
472lexicon and the voice database itself.  PEAL can deal with this but it
473loads large segments by copying them into the dynamic heap, which on
most Palm devices is less than 2M.  This isn't big enough.
475
Thus we changed Flite to restrict the number of large data segments it
used (and also did some new compression on them).  The five segments: the
lts rules, the lexical entries, the voice LPC coefficients, the voice
residuals and the voice residual index are now treated as data segments
that are split into 65400 sized segments and loaded into feature
memory space, which is in the storage heap and typically much bigger.
This means we do need about 2-3 megabytes free on the device to run.
We did look into just indexing the 65400 byte segments directly but
that looked like being too much work, and we're only going to be able
to run on 16M sized Palms anyway (there aren't any 8M ARM Palms with
audio, except maybe some SmartPhones).
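
As a rough illustration of the splitting step only, the following
sketch computes how many 65400-byte pieces a large data block needs and
how long each piece is.  The names @code{SEG_SIZE}, @code{seg_count}
and @code{seg_len} are hypothetical and do not come from the Flite
sources; how the pieces are then loaded into feature memory is not
shown.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_SIZE 65400  /* piece size quoted in the text above */

/* Number of pieces needed to hold total_bytes of data.
   (Hypothetical helper, not part of Flite.) */
static size_t seg_count(size_t total_bytes)
{
    return (total_bytes + SEG_SIZE - 1) / SEG_SIZE;
}

/* Length in bytes of piece i (counting from 0) of a block of
   total_bytes; every piece is full except possibly the last.
   (Hypothetical helper, not part of Flite.) */
static size_t seg_len(size_t total_bytes, size_t i)
{
    size_t start = i * SEG_SIZE;
    assert(start < total_bytes);
    size_t rest = total_bytes - start;
    return rest < SEG_SIZE ? rest : SEG_SIZE;
}
```

For example, a 150000 byte block would need three pieces: two full ones
of 65400 bytes and a final one of 19200 bytes.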
487
Using Flite from m68k land involves getting a @code{flite_info}
structure from @code{flite_init()}.  This contains a bunch of fields
that can be set and sent to the ARM domain Flite synthesizer proper, within
which other output fields may be set and returned.  This isn't a very
general structure, but is adequate.  Note the necessary byte swapping
(for the top level fields) is done for this structure before
calling the ARM native @code{arm_flite_synth_text} and swapped back
again after returning.
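
The byte swapping mentioned above can be sketched as follows.  This is
a generic illustration, not the code in @file{palm/}: the structure
@code{demo_info} and its field names are invented stand-ins, since the
real @code{flite_info} fields are not listed here.  The swap is needed
because m68k is big-endian, and the sketch assumes the ARM side reads
the same fields in the opposite byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* HYPOTHETICAL stand-in for the info structure; the field names are
   invented for this example and are not Flite's. */
typedef struct {
    uint32_t start_pos;
    uint32_t max_samples;
} demo_info;

/* Swap every top level field in place.  Calling it once before the
   cross-domain call and once after restores the original values. */
static void demo_info_swap(demo_info *info)
{
    assert(info != NULL);
    info->start_pos = swap32(info->start_pos);
    info->max_samples = swap32(info->max_samples);
}
```

Because the swap is its own inverse, the same routine can serve both
directions of the call.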
496
497Display, playing audio, pointy-clicky event thingies are all done in
498the m68K domain.
499
500@subsection Using the PalmOS
501
502There are three basic functions that access the ARM flite
503functions: @code{flite_init()}, @code{flite_synth_text()} and
504@code{flite_end()}.
505
506@node Flite Design, APIs, Installation, Top
507@chapter Flite Design
508
509@section Background
510
511Flite was primarily developed to address one of the most common
512complaints about the Festival Speech Synthesis System.  Festival is
large and slow.  Even with the software bloat common amongst most
products, and even though that bloat has helped machines get faster, with
more memory and larger disks, Festival is still criticized for its size.
516
Although sometimes this complaint is unfair, it is valid, and although
much work was done to ensure Festival can be trimmed and run fast, it
still requires substantial resources per utterance to run.  After some
investigation to see if Festival itself could be trimmed down, it became
clear that, because there was a core set of functions that were sufficient
for synthesis, a new implementation containing only those aspects that
were necessary would be easier than trimming down Festival itself.
524
525Given that a new implementation was being considered a number of
526problems with Festival could also be addressed at the same time.
527Festival is not thread-safe, and although it runs under Windows, in
528server mode it relies on the Unix-centric view of fast forks with
529copy-on-write shared memory for servicing clients.  This is a perfectly
530safe and practical solution for Unix systems, but under Windows where
531threads are the more common feature used for servicing multiple events
532and forking is expensive, a non-thread safe program can't be used as
533efficiently.
534
535Festival is written in C++ which was a good decision at the time and
536perfectly suitable for a large program.  However what was discovered
537over the years of development is that C++ is not a portable language.
Different C++ compilers are quite different and it takes a significant
amount of work to ensure compatibility of the code base over multiple
compilers.  What makes this worse is that new versions of each compiler
are incompatible and changes are required.  At first this looked as if we
were producing bad quality code, but after 10 years it is clear that
the compilers are also still maturing.  Thus it is clear that
Festival and the Edinburgh Speech Tools will continue to require
constant support as new versions of compilers are released.
546
547A second problem with C++ is the size and efficiency of the code
548produced.  Proponents of C++ may rightly argue that Festival and the
Edinburgh Speech Tools aren't properly designed, but irrespective of
whether that is true or not, the code is much larger
and slower than it need be for what it does.  Throughout the design
there is a constant trade-off between elegance and efficiency which
unfortunately at times in Festival requires untidy solutions of
copying data out of objects, processing it and copying it back because
direct access (particularly in some signal processing routines)
is just too inefficient.
557
558Another major criticism of Festival is the use of Scheme as the
559interpreter language.  Even though it is a simple to implement language
560that is adequate for Festival's needs and can be easily included in the
561distribution, people still hate it.  Often these people do learn to use
562it and appreciate how run time configurability is very desirable and that
563new voices may be added without recompilation.  Scheme does have garbage
564collection which makes leaky programs much harder to write and as some
565of the intended audience for developing in Festival will not be hard
566core programmers a safe programming language seems very desirable.
567
568After taking into consideration all of the above it was decided to
569develop Flite as a new system written in ANSI C.  C is much more
570portable than C++ as well as offering much lower level control of the
571size of the objects and data structure it uses.
572
573Flite is not intended as a research and development platform for speech
574synthesis, Festival is and will continue to be the best platform for
575that.  Flite however is designed as a run-time engine when an
576application needs to be delivered.  It specifically addresses two
communities.  First as an engine for small devices such as PDAs and
telephones where the memory and CPU power are limited and which in some
cases do not even have a conventional operating system.
580
581The second community is for those running synthesis servers for many
582clients.  Here although large fixed databases are acceptable, the size
of memory required per utterance and the speed at which utterances can be
synthesized are crucial.
585
586However in spite of the decision to build a new synthesis engine we see
587this as being tightly coupled into the existing free software synthesis
tools of Festival and the FestVox voice building suite.  Flite offers
589a companion run-time engine.  Our intended mode of development is
590to build new voices in FestVox and debug and tune them in Festival.
591Then for deployment the FestVox format voice may be (semi-)automatically
592compiled into a form that can be used by Flite.
593
594In case some people feel that development of a small run-time
595synthesizer is not an appropriate thing to do within a University and is
596more suited to commercial development, we have a few points which they
597should be aware of that to our mind justify this work.
598
599We have long felt that research in speech and language should have an
600identifiable link to ultimate commercial use.  In providing a platform
601that can be used in consumer products that falls within the same
602framework as our research we can better understand what research issues
are actually important to the improvement of our work.
604
Considering small useful synthesizers forces a more explicit
606definition of what is necessary in a synthesizer and also how we can
607trade size, flexibility and speed with the quality of synthesized
608output.  Defining that relationship is a research issue.
609
610We are also advocates of speech technology within other research areas
611and the ability to offer support on new platforms such as PDAs and
612wearables allows for more interesting speech applications such as
613speech-to-speech translation, robots, and interactive personal digital
614assistants, that will prove new and interesting areas of research.
615Thus having a platform that others around us can more easily integrate
616into their research makes our work more satisfying.
617
618@section Key Decisions
619
620The basic architecture of Festival is good.  It is well proven.  Paul
621Taylor, Alan W. Black and Richard Caley spent many hours debating low
622level aspects of representation and structure that would both be
623adequate for current theories but also allow for future theories too.
624The heterogeneous relation graphs (HRG) are theoretically adequate,
computationally efficient and well proven.  Thus, both because HRGs have
such a background and because Flite is to be compatible with voices and
models developed in Festival, Flite uses HRGs as its basic utterance
representation structure.
629
Most of a synthesizer is in its data (lexicons, unit database etc);
the actual synthesis code is pretty small.  In Festival most of that
data exists in external files which are loaded on demand.  This is
obviously slow and memory expensive (you need both a copy of the data
on disk and in memory).  As one of the principal targets for Flite is
very small machines we wanted to allow that core data to be in ROM,
and be appropriately mapped into RAM without any explicit loading
(some OS's call this XIP -- execute in place).  This can be done by
various memory mapping functions (in Unix it's called @code{mmap}) and is the
core technique used in shared libraries (called DLLs in some parts of
the world).  Thus the data should be in a format that can be
directly accessed.  If you are going to directly access data you need
to ensure the byte layout is appropriate for the architecture you are
running on; byte order and address width become crucial if you want to
avoid any extra conversion code at access time (like byte swapping).
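
To make the memory mapping idea concrete, here is a minimal sketch
using the POSIX @code{mmap} call to obtain read-only, directly
addressable access to a file without copying it into a buffer.  The
helper names @code{map_readonly} and @code{unmap_readonly} are invented
for this example and are not Flite functions; it also assumes a POSIX
system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only.  Returns NULL on failure; on success the
   file size is stored in *len.  (Hypothetical helper, not Flite's.) */
static const void *map_readonly(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_readonly. */
static void unmap_readonly(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

The bytes seen through the returned pointer are exactly those in the
file, which is why the file's byte order and field widths must already
match the running architecture.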
645
At first it was considered that synthesis data would be converted into
binary files which could be mmap'ed into the runtime systems, but
building appropriate binary files for architectures is quite a job.
649However the C compiler does this in a standard way.  Therefore the mode
650of operation for data within Flite is to convert it to C code (actually
651C structures) and use the C compiler to generate the appropriate binary
652structures.
653
Using the C compiler is a good portable solution but as these
structures can be very big this can tax the C compiler somewhat.  Also
because this data is not going to change at run time it can all be
declared @code{const}, which means (in Unix) it will be in the text
segment and hence read only (this can be ROM on platforms which have
that distinction).  For structures to be const all their subparts
must also be const thus all relevant parts must be in the same file,
hence the unit database files can be quite big.
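
The following sketch shows what ``data as const C structures'' can look
like in practice.  The types and names (@code{demo_unit},
@code{demo_db} and so on) are hypothetical and much simpler than any
real Flite voice; the point is only that every level, from the sample
arrays up to the top level structure, is a @code{const} object
initialized at compile time in one file.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL unit database layout, for illustration only. */
typedef struct {
    const char *name;      /* unit label */
    const short *samples;  /* pointer to const sample data */
    size_t n_samples;
} demo_unit;

typedef struct {
    const demo_unit *units;
    size_t n_units;
} demo_db;

/* All subparts are const and live in the same file as the top level. */
static const short demo_samples_a[] = { 0, 12, -7, 3 };
static const short demo_samples_b[] = { 5, 5, -1 };

static const demo_unit demo_units[] = {
    { "a", demo_samples_a, sizeof demo_samples_a / sizeof demo_samples_a[0] },
    { "b", demo_samples_b, sizeof demo_samples_b / sizeof demo_samples_b[0] },
};

static const demo_db demo_voice_db = {
    demo_units, sizeof demo_units / sizeof demo_units[0]
};

/* Walk the read-only database and count all samples. */
static long demo_db_total_samples(const demo_db *db)
{
    assert(db != NULL);
    long total = 0;
    for (size_t i = 0; i < db->n_units; i++)
        total += (long)db->units[i].n_samples;
    return total;
}
```

No loading or parsing step is needed at run time: the compiler and
linker lay the data out, and the code simply follows pointers into it.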
662
663Of course, this all presumes that you have a C compiler robust enough to
664compile these files, hardware smart enough to treat flash ROM as memory
665rather than disk, or an operating system smart enough to demand-page
666executables.  Certain "popular" operating systems and compilers fail in
667at least one of these respects, and therefore we have provided the
668flexibility to use memory-mapped file I/O on voice databases, where
669available, or simply to load them all into memory.
670
671@chapter Structure
672
673The flite distribution consists of two distinct parts:
674@itemize @bullet
675@item The flite library containing the core synthesis code
676@item Voice(s) for flite.  These contain three sub-parts
677@itemize @bullet
678@item Language models:
679text processing, prosody models etc.
680@item Lexicon and letter to sound rules
681@item Unit database and voice definition
682@end itemize
683@end itemize
684
685@section cst_val
686
This is a basic, simple object which can contain ints, floats, strings
and other objects.  It also allows for lists using the Scheme/Lisp
car/cdr architecture (as that is the most efficient way to represent
arbitrary trees and lists).

The @code{cst_val} structure is carefully designed to take up only 8 bytes (or
16 on a 64-bit machine).  The multiple union structure that it can
contain is designed so there are no conflicts.  However it depends on
the fact that a pointer to a @code{cst_val} is guaranteed to lie on an even
address boundary (which is true for all architectures I know of).  Thus
the distinction between cons (i.e. list) objects and atomic
values can be determined by the odd/evenness of the least significant bit
of the first address in a @code{cst_val}.  In some circles this is considered
hacky, in others elegant.  This was done in flite to ensure that the most
common structure is 8 bytes rather than 12, which saves significantly on
memory.
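
The following is a simplified sketch of the odd/even idea, not the
real @code{cst_val} definition.  All the @code{demo_} names are
hypothetical, and the first word is stored as an explicit integer
rather than through Flite's actual union layout.  A cons cell keeps
the (even) address of its car in the first word, while an atom keeps
an odd type tag there.
@example
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; this is not the real cst_val definition. */
enum @{ DEMO_TYPE_INT = 1, DEMO_TYPE_FLOAT = 3 @};  /* atom tags are odd */

typedef struct demo_val @{
    uintptr_t head;  /* cons: address of the car (even); atom: odd tag */
    union @{
        struct demo_val *cdr;   /* used by cons cells */
        int ival;               /* used by int atoms */
        float fval;             /* used by float atoms */
    @} u;
@} demo_val;

/* A cell is a cons exactly when its first word is even. */
static int demo_is_cons(const demo_val *v)
@{
    return (v->head & 1u) == 0;
@}

static demo_val demo_make_int(int i)
@{
    demo_val v;
    v.head = DEMO_TYPE_INT;
    v.u.ival = i;
    return v;
@}

static demo_val demo_make_cons(demo_val *car, demo_val *cdr)
@{
    demo_val v;
    /* The trick only works if cell addresses are even. */
    assert(((uintptr_t)car & 1u) == 0);
    v.head = (uintptr_t)car;
    v.u.cdr = cdr;
    return v;
@}

static demo_val *demo_car(const demo_val *v)
@{
    return (demo_val *)v->head;
@}
@end example
On common platforms this cell is two machine words, which is where
the 8 (or 16) byte figure comes from; a separate type field for cons
cells would have pushed it to three.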
703
704All @code{cst_val}'s except those of type cons are reference counted.  A
705few functions generate new lists of @code{cst_val}'s which the user
706should be careful about as they need to explicitly delete them (notably
707the lexicon lookup function that returns a list of phonemes).
708Everything that is added to an utterance will be deleted (and/or
709dereferenced) when the utterance is deleted.
710
711Like Festival, user types can be added to the @code{cst_val}s.  In
712Festival this can be done on the fly but because this requires the
713updating of some list when each new type is added, this wouldn't be
714thread safe.  Thus an explicit method of defining user types is done in
715@file{src/utils/cst_val_user.c}.  This is not as neat as defining on the
716fly or using a registration function but it is thread safe and these
717user types won't change often.
718
719@node APIs, Converting FestVox Voices, Flite Design, Top
720@chapter APIs
721
Flite is a library that we expect will be embedded into other
723applications.  Included with the distribution is a small example
724executable that allows synthesis of strings of text and text files
725from the command line.
726
727You may want to look at Bard @file{http://festvox.org/bard/}, an ebook
728reader with a tight coupling to flite as a synthesizer.  This is the
729most elaborate use of the Flite API within our suite of programs.
730
731@section flite binary
732
733The example flite binary may be suitable for very simple applications.
734Unlike Festival its start up time is very short (less than 25ms on a PIII
735500MHz) making it practical (on larger machines) to call it each
736time you need to synthesize something.
737@example
738flite TEXT OUTPUTTYPE
739@end example
If @code{TEXT} contains a space it is treated as a string of text and
converted to speech; if it does not contain a space, @code{TEXT} is
treated as a file name and the contents of that file are converted to
speech.  The option @code{-t} specifies that @code{TEXT} is to be
treated as text (not a filename) and @code{-f} forces treatment as a
file.  Thus
746@example
747flite -t hello
748@end example
749will say the word "hello" while
750@example
751flite hello
752@end example
753will say the content of the file @file{hello}.  Likewise
754@example
755flite "hello world."
756@end example
757will say the words "hello world" while
758@example
759flite -f "hello world"
760@end example
761will say the contents of a file @file{hello world}.  If no argument is
762specified text is read from standard input.
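
The rule described above can be sketched in a few lines of C.  This
is only an illustration of the decision, not the code in the flite
binary; the function name is hypothetical, and letting @code{-t} win
when both flags are given is our assumption.
@example
#include <string.h>

/* Returns 1 if arg should be spoken as text, 0 if it names a file. */
static int demo_arg_is_text(const char *arg, int force_text,
                            int force_file)
@{
    if (force_text)
        return 1;                      /* -t was given */
    if (force_file)
        return 0;                      /* -f was given */
    return strchr(arg, ' ') != NULL;   /* default: a space means text */
@}
@end example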
763
The second argument @code{OUTPUTTYPE} is the name of a file the output
is written to, or if it is @code{play} then it is played to the audio
device directly.  If it is @code{none} then the audio is created but
discarded; this is used for benchmarking.  If it is @code{stream} then
the audio is streamed through a call back function (though this is not
particularly useful in the command line version).  If @code{OUTPUTTYPE}
is omitted, @code{play} is assumed.  You can also explicitly set the
output type with the @code{-o} flag.
772@example
773flite -f doc/alice -o alice.wav
774@end example
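The four cases for @code{OUTPUTTYPE} can be summarised as a small
classifier.  This is a sketch under our own naming
(@code{demo_out_kind} and @code{demo_classify_outtype} are not part
of the Flite API) and only mirrors the behaviour described above.
@example
#include <stddef.h>
#include <string.h>

typedef enum @{
    DEMO_OUT_PLAY, DEMO_OUT_NONE, DEMO_OUT_STREAM, DEMO_OUT_FILE
@} demo_out_kind;

static demo_out_kind demo_classify_outtype(const char *outtype)
@{
    if (outtype == NULL || strcmp(outtype, "play") == 0)
        return DEMO_OUT_PLAY;      /* omitted means play */
    if (strcmp(outtype, "none") == 0)
        return DEMO_OUT_NONE;      /* synthesize, then discard */
    if (strcmp(outtype, "stream") == 0)
        return DEMO_OUT_STREAM;    /* hand audio to a call back */
    return DEMO_OUT_FILE;          /* anything else is a file name */
@}
@end example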
775
776@section Voice selection
777
778All the voices in the distribution are collected into a single simple
779list in the global variable @code{flite_voice_list}.  You can select a
780voice from this list from the command line
781@example
782flite -voice awb -f doc/alice -o alice.wav
783@end example
784And list which voices are currently supported in the binary with
785@example
786flite -lv
787@end example
The voices which get linked together are those listed in
@code{VOICES} in @file{main/Makefile}.  You can change that as you
require.
791
Voices may also be dynamically loaded from files as well as built in.
The argument to the @code{-voice} option may be a pathname to a dumped
(Clustergen) voice.  This may be a Unix pathname or a URL (only the
protocols @file{http} and @file{file} are supported).  For example
795@example
796flite -voice file://cmu_us_awb.flitevox -f doc/alice -o alice.wav
797flite -voice http://festvox.org/voices/cmu_us_ksp.flitevox -f doc/alice -o alice.wav
798@end example
799Voices will be loaded once and added to @code{flite_voice_list}.
800Although these voices are often small (a few megabytes) there will
801still be some time required to read them in the first time.  The
802voices are not mapped, they are read into newly created structures.
803
804This loading function is currently only supported for Clustergen
805voices.
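
Telling a URL from a plain voice name or pathname can be done with a
simple prefix test.  The sketch below is an assumption about how such
a test might look, based only on the two protocols named above; the
function name is hypothetical and this is not Flite's own code.
@example
#include <string.h>

/* Returns 1 if name starts with one of the two supported protocols. */
static int demo_voice_name_is_url(const char *name)
@{
    return strncmp(name, "file:", 5) == 0 ||
           strncmp(name, "http:", 5) == 0;
@}
@end example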
806
807@section C example
808
809Each voice in Flite is held in a structure, a pointer to which is
810returned by the voice registration function.  In the standard
811distribution, the example diphone voice is @code{cmu_us_kal}.
812
813Here is a simple C program that uses the flite library
814@example
#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

/* Voice registration function, defined in the cmu_us_kal voice library */
cst_voice *register_cmu_us_kal(const char *voxdir);

int main(int argc, char **argv)
@{
    cst_voice *v;

    if (argc != 2)
    @{
        fprintf(stderr,"usage: flite_test FILE\n");
        exit(-1);
    @}

    flite_init();

    v = register_cmu_us_kal(NULL);

    flite_file_to_speech(argv[1],v,"play");

    return 0;
@}
836@end example
837Assuming the shell variable FLITEDIR is set to the flite directory
838the following will compile the system (with appropriate changes for
839your platform if necessary).
840@example
gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib \
    -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm
843@end example
844
845@section Public Functions
846
Although, of course, you are welcome to call lower level functions,
there are a few key functions that will satisfy most users
of flite.
850@table @code
851@item void flite_init(void);
852This must be called before any other flite function can be called.  As
853of Flite 1.1, it actually does nothing at all, but there is no guarantee
854that this will remain true.
855@item cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
856Returns a waveform (as defined in @file{include/cst_wave.h}) synthesized
857from the given text string by the given voice.
858@item float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
synthesizes all the sentences in the file @file{filename} with the given
voice.  Output (at present) can only reasonably be @code{play} or
@code{none}.  If the feature @code{file_start_position} is set to an
integer, that point is used as the start position in the file to be synthesized.
863@item float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
synthesizes the text in the string pointed to by @code{text}, with the given
voice.  @code{outtype} may be a filename where the generated waveform is
written to, or @code{play} and it will be sent to the audio device, or
@code{none} and it will be discarded.  The return value is the
number of seconds of speech generated.
869@item cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
synthesizes the given text with the given voice and returns an utterance
from it for further processing and access.
@item cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
synthesizes the given phones with the given voice and returns an utterance
from it for further processing and access.
875@item cst_voice *flite_voice_select(const char *name);
returns a pointer to the voice named @code{name}.  Will return
@code{NULL} if there is no match; if @code{name == NULL} then the
first voice in the voice list is returned.  If @code{name} is a URL
(starting with @file{file:} or @file{http:}), that file will be
accessed and the voice will be downloaded from there.
881@item float flite_ssml_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Will read the file as SSML.  Not all SSML tags are supported but many
are; unsupported ones are ignored.  Voice selection works by naming
the internal name of the voice, or the name may be a URL and the voice
will be loaded.  The audio tag is supported for loading waveform files;
again URLs are supported.
@item float flite_ssml_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Will treat the text as SSML.
888@item int flite_voice_add_lex_addenda(cst_voice *v, const cst_string *lexfile);
loads the pronunciations from @code{lexfile} into the lexicon
identified in the given voice (which will cause all other voices using
that lexicon to also get this new addenda list).  An example lexicon
file is given in @file{flite/tools/examples.lex}.  Words may be in
double quotes, and an optional part of speech tag may be given.  A colon
separates the headword/postag from the list of phonemes.  Stress
values (if used in the lexicon) must be specified.  Bad phonemes will
be complained about on standard out.
897@end table
898
899@section Streaming Synthesis
900
In 1.4 support was added for streaming synthesis.  Basically you may
provide a call back function that will be called with waveform data
as soon as it is available.  This can potentially reduce the
delay between sending text to the synthesizer and having audio
available.
906
907The support is through a call back function of type
908@example
909int audio_stream_chunk(const cst_wave *w, int start, int size,
910                       int last, cst_audio_streaming_info *asi)
911@end example
If the utterance feature @code{streaming_info} is set (which can
be set in a voice or in an utterance), the LPC or MLSA resynthesis
functions will call the provided function as buffers become available.
The LPC and MLSA waveform synthesis functions are used for diphone,
limited domain, unit selection and clustergen voices.  Note that explicit
support is required for streaming, so new waveform synthesis functions
may not have the functionality.
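
To show the shape of this interaction without depending on the Flite
headers, here is a self-contained sketch.  The @code{demo_} types
stand in for @code{cst_wave} and @code{cst_audio_streaming_info}, the
loop stands in for a resynthesis function, and treating a non-zero
return value as a request to stop is an assumption of this sketch.
@example
#include <stddef.h>

/* Hypothetical stand-ins for the real Flite types. */
typedef struct demo_wave @{
    const short *samples;
    int num_samples;
@} demo_wave;

typedef struct demo_stream_info demo_stream_info;
typedef int (*demo_stream_chunk_fn)(const demo_wave *w, int start,
                                    int size, int last,
                                    demo_stream_info *info);
struct demo_stream_info @{
    demo_stream_chunk_fn chunk;   /* the user's call back */
    void *userdata;               /* passed through untouched */
@};

/* Hand the wave to the call back in buffers of at most chunk_size
   samples (chunk_size must be positive).  Returns the number of
   calls made. */
static int demo_stream_wave(const demo_wave *w, int chunk_size,
                            demo_stream_info *info)
@{
    int start, calls = 0;
    for (start = 0; start < w->num_samples; start += chunk_size) @{
        int size = w->num_samples - start;
        int last;
        if (size > chunk_size)
            size = chunk_size;
        last = (start + size >= w->num_samples);
        calls++;
        if (info->chunk(w, start, size, last, info) != 0)
            break;   /* assumption: non-zero asks us to stop */
    @}
    return calls;
@}

/* Example call back: adds up how many samples it has been given. */
static int demo_count_samples(const demo_wave *w, int start, int size,
                              int last, demo_stream_info *info)
@{
    int *total = (int *)info->userdata;
    (void)w; (void)start; (void)last;
    *total += size;
    return 0;
@}
@end example
With a 10 sample wave and a chunk size of 4 the call back is invoked
three times, the final time with @code{last} set.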
919
920An example streaming function is provided in
921@file{src/audio/au_streaming.c} and is used by the example flite main
program when @code{stream} is given as the playing option.  (Though in
the command line program the function isn't really useful.)
924
In order to use streaming you must provide a call back function in your
926particular thread.  This is done by adding features to the voice in
927your thread.  Suppose your function was declared as
928
929@example
int example_audio_stream_chunk(const cst_wave *w, int start, int size,
                       int last, cst_audio_streaming_info *asi)
932@end example
933You can add this function as the streaming function through the statement
934@example
935     cst_audio_streaming_info *asi;
936...
937     asi = new_audio_streaming_info();
938     asi->asc = example_audio_stream_chunk;
939     feat_set(voice->features,
940             "streaming_info",
941             audio_streaming_info_val(asi));
942@end example
943You may also optionally include your own pointer to any information
944you additionally want to pass to your function.  For example
945@example
typedef struct my_callback_struct_struct @{
   cst_audiodev *fd;
   int count;
@} my_callback_struct;
my_callback_struct *mcs;
cst_audio_streaming_info *asi;
951
952...
953
954mcs = cst_alloc(my_callback_struct,1);
955mcs->fd=NULL;
956mcs->count=1;
957
958asi = new_audio_streaming_info();
959asi->asc = example_audio_stream_chunk;
960asi->userdata = mcs;
961feat_set(voice->features,
962         "streaming_info",
963         audio_streaming_info_val(asi));
964@end example
965Another example is given in @file{testsuite/by_word_main.c} which
shows a call back function that also prints the token as it is being
967synthesized.  The @file{utt} field in the
968@file{cst_audio_streaming_info} structure will be set to the current
969utterance.  Please note that the @file{item} field in the
970@file{cst_audio_streaming_info} structure is for your convenience and
is not set by anyone at all.  The previous sentence exists in the
documentation so that I can point at it when users fail to read it.
973
974@node Converting FestVox Voices, , APIs, top
975@chapter Converting FestVox Voices
976
977As of 1.2 initial scripts have been added to aid the conversion of
978FestVox voices to Flite.  In general the conversion cannot be automatic.
For example, all specific Scheme code written for a voice needs to be
hand converted to C to work in Flite; this can be a major task.
981
982Simple conversion scripts are given as examples of the stages you need
983to go through.  These are designed to work on standard (English)
984diphone sets, and simple limited domain voices.  The conversion
985technique will almost certainly fail for large unit selection voices
due to limitations in the C compiler (more discussion below).  In 1.4
we have also added support for converting clustergen voices (which
is a little easier, see section below).
989
@section Concatenative Voice Building
991
992Conversion is basically taking the description of units (clunit
993catalogue or diphone index) and constructing some C files that can be
994compiled to form a usable database.  Using the C compiler to generate
995the object files has the advantage that we do not need to worry about
996byte order, alignment and object formats as the C compiler for the
997particular target platform should be able to generate the right code.
998
Before you start, ensure you have successfully built and run your FestVox
voice in Festival.  Flite is not designed as a voice building/debugging
tool; it is just a delivery vehicle for finalized voices, so you should
first ensure you are satisfied with the quality of the Festival voice
before you start converting it for Flite.
1004
1005The following basic stages are required:
1006@itemize @bullet
@item Set up the directories and copy the conversion scripts
1008@item Build the LPC files
1009@item Build the MCEP files (for ldom/clunits)
1010@item Convert LPC (MCEP) into STS (short term signal) files
1011@item Convert the catalogue/diphone index
1012@item Compile the generated C code
1013@end itemize
1014
1015The conversion assumes the environment variable @code{FLITEDIR}
1016is set, for example
1017@example
1018   export FLITEDIR=/home/awb/projects/flite/
1019@end example
1020The basic flite conversion takes place within a FestVox voice directory.
1021Thus all of the conversion scripts expect that the standard files are
1022available.  The first task is to build some new directories and copy in
1023the build scripts.  The scripts are copied rather than linked from the
1024Flite directories as you may need to change these for your particular
1025voices.
1026@example
1027   $FLITEDIR/tools/setup_flite
1028@end example
1029This will read @file{etc/voice.defs}, which should have been created by
1030the FestVox build process (except in very old versions of FestVox).
1031
1032If you don't have a @file{etc/voice.defs} you can construct one
1033with @code{festvox/src/general/guess_voice_defs} in the Festvox
1034distribution, or generate one by hand making it look
1035like
1036@example
1037FV_INST=cmu
1038FV_LANG=us
1039FV_NAME=ked_timit
1040FV_TYPE=clunits
1041FV_VOICENAME=$FV_INST"_"$FV_LANG"_"$FV_NAME
1042FV_FULLVOICENAME=$FV_VOICENAME"_"$FV_TYPE
1043@end example
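The last two lines simply join the earlier fields with underscores.
As an illustration only (the function name is hypothetical and the
build scripts do this in shell, not C), the same composition looks
like this:
@example
#include <stdio.h>

/* Writes INST_LANG_NAME_TYPE into out; returns the length that the
   full name needs, as snprintf does. */
static int demo_full_voice_name(char *out, size_t outlen,
                                const char *inst, const char *lang,
                                const char *name, const char *type)
@{
    return snprintf(out, outlen, "%s_%s_%s_%s",
                    inst, lang, name, type);
@}
@end example
For the values shown above this gives
@code{cmu_us_ked_timit_clunits}.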
1044
The main script for building the Flite voice is @file{bin/build_flite}
1046which will eventually build sufficient C code in @file{flite/} that can
1047be compiled with the constructed @file{flite/Makefile} to give you a
1048library that can be linked into applications and also an example
1049@file{flite} binary with the constructed voice built-in.
1050
You can run all of these stages, except the final make, together by
running the build script with no arguments
1053@example
1054   ./bin/build_flite
1055@end example
1056But as things may not run smoothly, we will go through the
1057stages explicitly.
1058
The first stage is to build the LPC files.  This may have already been
done as part of the diphone building process (though probably not in
the ldom/clunit case).  In our experience it is very important that the
recordings be of similar power, as mismatched power can often cause
overflows in the resulting flite (and sometimes Festival) voices.  Thus,
for diphone voices, it is important to run the power normalization
techniques described in the FestVox document.  The Flite LPC build
process also builds a parameter file of the ranges of the LPC parameters
used in later coding of the files, so even if you have already built your
LPC files you should still do this again
1069@example
1070   ./bin/build_flite lpc
1071@end example
1072
For ldom and clunit voices (but not for diphone voices) we also
need the Mel-frequency Cepstral Coefficients.  These are assumed to
have been created and to be in @file{mcep/}, as they are necessary
for running the voice in Festival.  This stage simply constructs
information about the range of the mcep parameters.
1078@example
1079   ./bin/build_flite mcep
1080@end example
1081
The next stage is to construct the STS files.  Short Term Signals (STS)
are built for each pitch period in the database.  These are ASCII files
(one for each utterance file in the database), with LPC coefficients and
ulaw encoded residuals for each pitch period.  These are built using a
binary executable built as part of the Flite build
(@file{flite/tools/find_sts}).
1088@example
1089   ./bin/build_flite sts
1090@end example
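For readers unfamiliar with ulaw coding, the sketch below shows the
standard G.711 style compression of one 16-bit sample into 8 bits.
It is given only to illustrate what a ulaw encoded residual value is;
the function name is ours and we do not claim it matches the code in
@file{find_sts} line for line.
@example
/* Compress one 16-bit linear sample to an 8-bit mu-law code. */
static unsigned char demo_linear_to_ulaw(int sample)
@{
    const int bias = 0x84;      /* 132, added before segment search */
    const int clip = 32635;     /* largest magnitude representable */
    int sign = 0, exponent = 7, mantissa, mask;

    if (sample < 0) @{
        sign = 0x80;
        sample = -sample;
    @}
    if (sample > clip)
        sample = clip;
    sample += bias;

    /* Find the segment: position of the highest set bit. */
    for (mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    mantissa = (sample >> (exponent + 3)) & 0x0F;
    /* The code is stored with all bits inverted. */
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
@}
@end example
Silence (a sample of 0) encodes to @code{0xFF}, and the largest
positive and negative samples encode to @code{0x80} and @code{0x00}
respectively.
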
Note that the flite code expects waveform files to be in Microsoft RIFF
format and cannot deal with files in other formats.  Some earlier
versions of the Edinburgh Speech Tools used NIST as the default header
format.  This is likely to cause flite and its related programs not to
work.  So do ensure your waveform files are in RIFF format
(@code{ch_wave -info wav/*} will tell you the format).  The following
will convert all your wave files
1098@example
1099   mv wav wav.nist
1100   mkdir wav
1101   cd wav.nist
1102   for i in *.wav
1103   do
1104      ch_wave -otype riff -o ../wav/$i $i
1105   done
1106@end example
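If you want to check a file yourself, a RIFF waveform file starts
with the four bytes @code{RIFF}, a four byte length, and then the
four bytes @code{WAVE}.  The following sketch tests the first twelve
bytes of a buffer; the function name is hypothetical and this is not
the check Flite itself performs.
@example
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf begins like a RIFF/WAVE file header. */
static int demo_looks_like_riff_wave(const unsigned char *buf,
                                     size_t len)
@{
    return len >= 12 &&
           memcmp(buf, "RIFF", 4) == 0 &&
           memcmp(buf + 8, "WAVE", 4) == 0;
@}
@end example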
1107
1108The next stage is to convert the index to the required C format.  For
1109diphone voices this takes the @file{dic/*.est} index files, for
1110clunit/ldom voices it takes the @file{festival/clunit/VOICE.catalogue}
1111and @file{festival/trees/VOICE.tree} files.  This process uses a binary
1112executable built as part of the Flite build process
1113(@file{flite/tools/flite_sort}) to sort the indices into the same
1114sorting order required for flite to run.  (Using unix sort may or may
1115not give the same result due to definitions of lexicographic order so
1116we use the very same function in C that will be used in flite to ensure
1117that a consistent order is given.)
1118@example
1119   ./bin/build_flite idx
1120@end example
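The point about sort order can be seen with a small example.  Sorting
with @code{strcmp} compares raw byte values, so the result does not
depend on the locale, whereas a shell @code{sort} may order upper and
lower case differently.  The names below are ours, and we are only
assuming a byte-wise comparison for illustration;
@file{flite_sort} is the authority on the order Flite expects.
@example
#include <stdlib.h>
#include <string.h>

/* qsort comparison: plain byte-wise string order. */
static int demo_cmp_names(const void *a, const void *b)
@{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
@}

/* Sort an array of n strings in place. */
static void demo_sort_names(const char **names, size_t n)
@{
    qsort(names, n, sizeof(names[0]), demo_cmp_names);
@}
@end example
In this ordering @code{B-a} sorts before @code{a-b}, because upper
case letters have smaller byte values than lower case ones.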
1121All the necessary C files should now have been built in @file{flite/}
1122and you may compile them by
1123@example
1124   cd flite
1125   make
1126@end example
1127This should give a library and an executable called @file{flite} that
1128can run as
1129@example
1130   ./flite "Hello World"
1131@end example
assuming a general voice.  For ldom voices it will only be able to say
things in its domain.  This @file{flite} binary offers the same options
as the standard @file{flite} binary compiled in the Flite build
but with your voice rather than the distributed voices.
1136
1137Almost certainly this process will not run smoothly for you.  Building
1138voices is still a very hard thing to do and problems will probably
1139exist.
1140
1141This build process does not deal with customization for the given
1142voices.  Thus you will need to edit @file{flite/VOICE.c} to set
1143intonation ranges and duration stretch for your particular voice.
1144
For example, in our @file{cmu_us_sls_diphone} voice (a US English female
diphone voice) we had to change the default parameters from
1147@example
1148    feat_set_float(v->features,"int_f0_target_mean",110.0);
1149    feat_set_float(v->features,"int_f0_target_stddev",15.0);
1150
1151    feat_set_float(v->features,"duration_stretch",1.0);
1152@end example
1153to
1154@example
1155    feat_set_float(v->features,"int_f0_target_mean",167.0);
1156    feat_set_float(v->features,"int_f0_target_stddev",25.0);
1157
1158    feat_set_float(v->features,"duration_stretch",1.0);
1159@end example
1160
1161Note this conversion is limited.  Because it depends on the C compiler
1162to do the final conversion into binary object format (a good idea in
1163general for portability), you can easily generate files too big for the
1164C compiler to deal with.  We have spent some time investigating this
1165so the largest possible voices can be converted but it is still too
1166limited for our larger voices.  In general the limitation seems to be
1167best quantified by the number of pitch periods in the database.  After
1168about 100k pitch periods the files get too big to handle.  There are
1169probably solutions to this but we have not yet investigated them.  This
1170limitation doesn't seem to be an issue with the diphone voices as they
1171are typically much smaller than unit selection voices.
1172
1173@section Statistical Voice Building (Clustergen)
1174
1175The process of building from a clustergen (cg) voice is also
1176supported.  It is assumed the environment variable @code{FLITEDIR} is
1177set
1178@example
1179   export FLITEDIR=/home/awb/projects/flite/
1180@end example
1181After you build the clustergen voice you can convert by first setting
1182up the skeleton files in the @file{flite/} directory
1183@example
1184   $FLITEDIR/tools/setup_flite
1185@end example
Assuming @file{etc/voice.defs} properly identifies the voice the cg
templates will be copied in.
1188
1189The conversion itself is actually much faster than a clunit build
1190(there is less to actually convert).
1191@example
1192   ./bin/build_flite cg
1193@end example
This will convert the necessary models into files in the @file{flite/}
directory.  Then you can compile it with
1196@example
1197   cd flite
1198   make
1199   ./flite_cmu_us_awb "Hello world"
1200@end example
Note that the voice that is to be converted @emph{must} be a standard
clustergen voice with f0, mceps, delta mceps (optionally strengths for
mixed excitation) and voicing in its
combined coeffs files.  The method could be changed to deal with other
possibilities but it will only work for the default build method.
1206
1207The generated library @file{libflite_cmu_us_awb.a} may be linked with
1208other programs like any other flite voice.  The binary generated
@code{flite_cmu_us_awb} links in only one voice (unlike the flite binary in
the full flite distribution).
1211
A single flat file containing the cg voice can also be generated, which
can be loaded at run time into the flite binary.  You can dump this file
from the initially constructed flite binary
1215@example
1216   ./flite_cmu_us_awb -voicedump cmu_us_awb.flitevox
1217@end example
The file @file{cmu_us_awb.flitevox} may now be referenced (with pathname/URL) on
the flite command line and used by the synthesizer
1220@example
1221   ./flite -voice cmu_us_awb.flitevox "Hello World"
1222@end example
1223
1224@section Lexicon Conversion
1225
1226As of 1.3 the script for converting the CMU lexicon (as distributed as
1227part of Festival) is included.  @file{make_cmulex} will use the
1228version of CMULEX unpacked in the current directory to build a new
lexicon.  Also in 1.3 a more sophisticated compression technique is
used to reduce the lexicon size.  The lexicon is pruned, removing
those words which the letter to sound rule models get correct.  Also
the letters and phones are separately Huffman coded to produce a
smaller lexicon.
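
The pruning step can be sketched as follows.  This is an illustration
of the idea only: the entry type, the toy predictor and the function
names are all hypothetical, and the real scripts work on Festival
lexicons rather than C arrays.
@example
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: a word and its phones as one string. */
typedef struct demo_entry @{
    const char *word;
    const char *phones;
@} demo_entry;

/* Hypothetical stand-in for the letter to sound rules; returns NULL
   when it has no prediction. */
typedef const char *(*demo_lts_fn)(const char *word);

/* Toy predictor used for illustration: it only knows one word. */
static const char *demo_toy_lts(const char *word)
@{
    return strcmp(word, "cat") == 0 ? "k ae t" : NULL;
@}

/* Compact entries in place, keeping only the words the predictor
   gets wrong.  Returns the number of entries kept. */
static size_t demo_prune_lexicon(demo_entry *entries, size_t n,
                                 demo_lts_fn predict)
@{
    size_t i, kept = 0;
    for (i = 0; i < n; i++) @{
        const char *guess = predict(entries[i].word);
        if (guess == NULL || strcmp(guess, entries[i].phones) != 0)
            entries[kept++] = entries[i];
    @}
    return kept;
@}
@end example
Words that are dropped are still pronounced correctly at run time,
since the letter to sound rules reproduce them.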
1234
1235@section Language Conversion
1236
This is by far the weakest part as this is the most open ended.  There
are basic tools in the @file{flite/tools/} directory that include Scheme
code to convert various Scheme structures to C, including CART tree
conversion and Lisp list conversion.  The other major source of help
here is the existing language examples in @file{flite/lang/usenglish/}.
1242
1243Adding new language support is far from automatic, but there are core
1244scripts for setting up new Flite support for languages and lexicons.  There
1245are also scripts for converting (Festival) phoneset definitions to C
1246and converting Festival lexicons to LTS rules and compressed lexicons
1247in C.
1248
But beyond that you are sort of on your own.  The largest gap here is
text normalization.  We do not yet have a standardized model for text
normalization with well defined models for which we could write
conversion scripts.
1253
1254However here is a step by step attempt to show you what to do when
1255building support for a new language/lexicon.
1256
Suppose we need to create support for Pashto, and already have a
festival voice running, and want it now to run in flite.  Converting
the voice itself (unit selection or clustergen) is fairly robust, but
you will also need C libraries for @file{cmu_pashto_lang} and
@file{cmu_pashto_lex}.  The first stage is to create the basic
template files for these.  In the core @file{flite/} source directory
type
1264@example
1265   ./tools/make_new_lang_lex pashto
1266@end example
This will create language and lex template files in
@file{lang/cmu_pashto_lang/} and @file{lang/cmu_pashto_lex/}.
1269
Then in directory @file{lang/cmu_pashto_lang/} type
1271@example
1272    festival $FLITEDIR/tools/make_phoneset.scm
1273    ...
1274    festival> (phonesettoC "cmu_pashto" (car (load "PATHTO/cmu_pashto_transtac_phoneset.scm" t)) "pau" ".")
1275@end example
This will create @file{cmu_pashto_lang_phoneset.[ch]}.  You must then add these
explicitly to the @file{Makefile}.
1278
1279Then in @file{lang/cmu_pashto_lex/} you have to build the C version of
1280the lexicon and letter to sound rules.  The core script is in
1281@file{flite/tools/build_lex}.
1282@example
1283    mkdir lex
1284    cd lex
    cp -p $FLITEDIR/tools/build_lex .
1286@end example
Edit @file{build_lex} to give it the name of your lexicon and the
compiled lexicon from your voice.
1289@example
1290LEXNAME=cmu_pashto
1291LEXFILE=lexicon.out
1292@end example
You should (I think) remove the first line ``MNCL'' from your
@file{lexicon.out} file.  Note this @emph{must} be the compiled lexicon,
not the list of entries you compiled from, as it expects the ordering and
the syllabification.
1297@example
1298    ./build_lex setup
1299@end example
1300Build the letter to sound rules (probably again)
1301@example
1302    ./build_lex lts
1303@end example
1304Convert the compiled letter to sound rules to C.  This converts the
1305decision trees to decision graphs and runs WFST minimization of them
1306to get a more efficient set of structures.  This puts the generated C
1307files in @file{c/}.
1308@example
1309    ./build_lex lts2c
1310@end example
1311Now convert the lexical entries themselves
1312@example
1313    ./build_lex lex
1314@end example
Again the generated C files will be put in @file{c/}.
1316
Now we generate a Huffman coded compressed lexicon to reduce the
lexicon size, merging frequent letter sequences and phone sequences.
1319@example
1320    ./build_lex compresslex
1321@end example
Then copy the @file{.c} and @file{.h} files to @file{lang/cmu_pashto_lex/}
1323[something about compressed and non-compressed???]
1324
1325
1326@chapter Porting to new platforms
1327
1328byte order, unions, compiler restrictions
1329
1330@chapter Future developments
1331
1332@contents
1333
1334@bye
1335