\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename flite.info
@settitle Flite: a small, fast speech synthesis engine
@finalout
@setchapternewpage odd
@c %**end of header

@c This document was modelled on the numerous examples of texinfo
@c documentation available with GNU software, primarily the hello
@c world example, but many others too. I happily acknowledge their
@c aid in producing this document -- awb

@set EDITION 2.0
@set VERSION 2.0
@set UPDATED 18th November 2014

@ifinfo
This file documents @code{Flite}, a small, fast run-time speech
synthesis engine.

Copyright (C) 2001-2014 Carnegie Mellon University

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX, or otherwise and
print the results, provided the printed document carries copying
permission notice identical to this one except for the removal of this
paragraph (this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the authors.
@end ifinfo

@titlepage
@title Flite: a small, fast speech synthesis engine
@subtitle System documentation
@subtitle Edition @value{EDITION}, for Flite version @value{VERSION}
@subtitle @value{UPDATED}
@author by Alan W Black and Kevin A. Lenzo

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 2001-2014 Carnegie Mellon University, all rights
reserved.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation approved
by the Carnegie Mellon University.
@end titlepage

@node Top, , , (dir)

@menu
* Abstract::                   initial comments
* Copying::                    How you can copy and share the code
* Acknowledgements::           List of contributors
* Installation::               Compilation and Installation
* Flite Design::
* APIs::                       Standard functions
* Converting FestVox Voices::  building flite voices from FestVox ones

@end menu

@node Abstract, Copying, , Top
@chapter Abstract

This document provides a user manual for flite, a small, fast
run-time speech synthesis engine.

This manual is nowhere near complete.

Flite offers text to speech synthesis in a small and efficient binary.
It is designed for embedded systems like PDAs as well as large server
installations which must serve synthesis to many ports. Flite is part
of the suite of free speech synthesis tools which include Edinburgh
University's Festival Speech Synthesis System
@url{http://www.festvox.org/festival} and Carnegie
Mellon University's FestVox project @url{http://festvox.org}, which
provides tools, scripts, and documentation for building new synthetic
voices.

Flite is written in ANSI C, and is designed to be portable
to almost any platform, including very small hardware.

Flite is really just a synthesis library that can be linked into other
programs. It includes two simple voices with the distribution: an old
diphone voice and an example limited domain voice which uses the newer
unit selection techniques we have been developing. Neither of these
voices would be considered production voices, but they serve as examples;
new voices will be released as they are developed.

The latest versions, comments, new voices etc for Flite are available
from its home page which may be found at
@example
@url{http://cmuflite.org}
@end example

@node Copying, Acknowledgements, Abstract, Top
@chapter Copying

Flite is free software. It is distributed under an X11-like license.
Apart from the few exceptions noted below (which still have
similarly open licenses) the general license is
@example
                  Language Technologies Institute
                     Carnegie Mellon University
                      Copyright (c) 1999-2014
                        All Rights Reserved.

  Permission is hereby granted, free of charge, to use and distribute
  this software and its documentation without restriction, including
  without limitation the rights to use, copy, modify, merge, publish,
  distribute, sublicense, and/or sell copies of this work, and to
  permit persons to whom this work is furnished to do so, subject to
  the following conditions:
   1. The code must retain the above copyright notice, this list of
      conditions and the following disclaimer.
   2. Any modifications must be clearly marked as such.
   3. Original authors' names are not deleted.
   4. The authors' names are not used to endorse or promote products
      derived from this software without specific prior written
      permission.

  CARNEGIE MELLON UNIVERSITY AND THE CONTRIBUTORS TO THIS WORK
  DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
  ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
  SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE
  FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
  AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
  ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
  THIS SOFTWARE.
@end example

@node Acknowledgements, Installation, Copying, Top
@chapter Acknowledgements

The initial development of flite was primarily done by awb while
travelling (perhaps the name is doubly appropriate, as a substantial
amount of the coding was done over 30,000ft). During most of that
time awb was funded by the Language Technologies Institute at
Carnegie Mellon University.

Kevin A. Lenzo was involved in the design, conversion techniques and
representations for the voice distributed with flite (as well as being
the actual voice itself).

Other contributions are:
@itemize @bullet
@item Nagoya Institute of Technology
The MLSA, MLPG code comes directly from NITECH's hts engine code, though we
have done some optimizations.
@item Marcela Charfuelan (DFKI)
For the mixed-excitation techniques (but no direct code). These
originally came from NITECH but we understood the techniques from
Marcela's Open Mary Java code and implemented them in our optimized
version of MLSA.
@item David Huggins-Daines:
much of the very early clunits code, porting to multiple platforms,
substantial code tidy up and configure/autoconf guidance (up to 2001).
@item Cepstral, LLC (@url{http://cepstral.com}):
For supporting DHD to spend time (in early 2001) on flite and passing
back the important fixes and enhancements while on a project funded by
the Portuguese Foundation for Science and Technology (FCT) Praxis XXI
program specifically to produce an open source synthesizer.
@item Willie Walker <william.walker@@sun.com> and the Sun Speech Group:
lots of low level bugs (and fixes).
@item Henry Spencer:
For the regex code
@item University of Edinburgh:
for releasing Festival for free, making a companion runtime synthesizer
a practical project. Much of the design of flite relies on the
architecture decisions made in the Festival Speech Synthesis System and
the Edinburgh Speech Tools.

The duration cart tree and intonation (accent and F0) models for the
US English voice were derived from the models in the Festival
distribution, which in turn were trained from the Boston University FM
Radio Data Corpus.

@item Carnegie Mellon University
The included lexicon is derived from CMULEX and the letter to sound
rules are constructed using the Lenzo and Black techniques for
building LTS decision graphs.
@item Craig Reese: IDA/Supercomputing Research Center and Joe Campbell: Department of Defense
who wrote the ulaw conversion routines in src/speech/cst_wave_utils.c
@end itemize


@node Installation, Flite Design, Acknowledgements, Top
@chapter Installation

Flite consists simply of a set of C files. GNU configure is
used to configure the engine and will work on most
major architectures.

In general, the following should build the system
@example
tar zxvf flite-XXX.tar.gz
cd flite-XXX
./configure
make
@end example
However you will need to explicitly call GNU make
@code{gmake} if @code{make} is not GNU make on your system.

The configuration process builds a file @file{config/config} which under
some circumstances may need to be edited, e.g. to add unusual options or
to deal with cross compilation.

On Linux systems, we also support shared libraries, which are useful for
keeping space down when multiple different applications are linked to the
flite libraries. For development we strongly discourage use of shared
libraries as it is too easy to either not set them up correctly or
accidentally pick up the wrong version. But for installation they are
definitely encouraged. That is, if you are just going to make and
install, they are good; but unless you know what @var{LD_LIBRARY_PATH}
does, it may be better to use static libraries (the default) if you are
changing C code or building your own voices.
@example
./configure --enable-shared
make
@end example
This will build both shared and static versions of the libraries but
will link the executables to the @emph{shared} libraries, thus you will
need to install the libraries in a place where your dynamic linker will
find them (cf. /etc/ld.so.conf) or set @var{LD_LIBRARY_PATH}
appropriately.

@example
make install
@end example
This will install the binaries (@file{bin/flite*}), include files and
libraries in appropriate subdirectories of the defined install
directory, @file{/usr/local} by default. You can change this at configure
time with
@example
./configure --prefix=/opt
@end example

@section Windows Support

@section Windows CE Support

@emph{NOTE: as Windows CE is somewhat rare now, we do not guarantee this
still works.}

Flite has been successfully compiled by a number of different groups
under Windows CE. The system should compile under Embedded Visual
Studio but we do not have the full details.

The system as distributed does compile under the gcc @file{mingw32ce}
toolchain available from @url{http://cegcc.sourceforge.net/}. The
current version can be compiled and run under WinCE with a primitive
application called @file{flowm}. @file{flowm} is a simple application
that allows playing of typed-in text, or full text to speech on
a file. Files should be simple ASCII text files (@code{*.txt}). The
application allows the setting of the byte position to start synthesis
from.

Assuming you have @file{mingw32ce} installed you can configure as
@example
./configure --target=arm-wince
make
@end example
The resulting binary is given in @file{wince/flowm.exe}. If you copy
this onto your Windows Mobile device and run it, it should allow you
to speak typed-in text and any @file{*.txt} files you have on your
device.

The application uses @code{cmu_us_kal} as the default voice.
Although it is possible to include the clustergen voices, they may be
too slow to be really practical. An 8KHz clustergen voice with a
reduced order of 13 gives a voice that runs acceptably on an hp2755
(624MHz) but is still marginal on an AT&T Tilt (400MHz).

Building 8KHz clustergen voices is currently a bit of a hack. We take the
standard waveforms and resample them to 8KHz, then relabel the sample
rate to be 16KHz. Then build the voice as normal (as if the speaker
spoke twice as fast). You may need to tune the F0 parameters in
@file{etc/f0.params}. This seems to basically work. Then after the
waveform is synthesized (still in the ``chipmunk'' domain) we
play it back at 8KHz. This effectively means we generate half the
number of samples and the frames are really at 10ms. A second
reduction is an option on the basic @file{build_flite} command. A
second argument can specify order reduction, thus instead of the
standard 25 static parameters (plus their deltas) we can reduce this to
13 and still get acceptable results
@example
./bin/build_flite cg 13
cd flite
make
@end example
Importantly this uses less space, and takes less time to synthesize.
These @code{SPEECH_HACKS} in @file{src/cg/cst_mlsa.c} are switched on
by default when @code{UNDER_CE} is defined.

The reduced order properly extracts the statics (and stddev) and
deltas (and stddev) from the predicted parameter clusters and makes it
as if those were the sizes of parameters that were used to train
the voice.

@section PalmOS Support

@emph{NOTE: as PalmOS is somewhat rare now, we do not guarantee this
still works.}

Support for PalmOS was removed from 1.9; I no longer have any working
PalmOS devices. But this section remains for people who do, though they
may need to update something to make this work.

Starting with 1.3 we have initial support for PalmOS using the free
development tools. The compilation method assumes the target device
is running PalmOS 5.0 (or later) on an ARM processor. Following
convention in the Palm world, the app that the user interacts with is
actually an m68k application compiled with the m68k gcc cross compiler;
the resulting code is interpreted by the PalmOS 5.0 device. The core
flite code is in native ARM, and hence uses the ARM gcc cross
compiler. An interesting amount of support code is required to
get all this to work properly.

The user app is called @code{flop} (FLite on Palm) and like most apps
written by awb, is functional, but ugly. You should not let a
short-sighted Scotsman, who still thinks command line interfaces are
cool, design a graphical app. But it does work and can read typed-in
text. The @file{armflite.ro} resources are designed with the idea
that proper applications will be written using it as a library.

The @file{flop.prc} application is distributed separately so it can be used
without having to install all these tools. But if you want to do PalmOS
development, here is what you need to do to compile Flite for PalmOS and
the flop application.

There are a number of different application development environments for
Palm; here I only describe the Unix based one as this is what was
used. You will need the PalmOS SDK 5.0 from palmOne
@url{http://www.palmone.com/us/developers/}. This is
free but does require registration. Out of the lots of different
files you can get from palmOne you will eventually find
@file{palmos-sdk-5.0r3-1.noarch.rpm}; install that on your linux
machine
@example
rpm -i palmos-sdk-5.0r3-1.noarch.rpm
@end example
You will also need the various gcc based cross compilers
@url{http://prc-tools.sourceforge.net/}
@example
prc-tools-2.3-1.i386.rpm
prc-tools-arm-2.3-1.i386.rpm
prc-tools-htmldocs-2.3-1.noarch.rpm
@end example
The Palm Resource compiler
@url{http://pilrc.sourceforge.net/}
@example
pilrc-3.1-1.i386.rpm
@end example
And maybe the emulator
@url{http://www.palmos.com/dev/tools/emulator/}
@example
pose-3.5-2.i386.rpm
pose-skins-1.9-1.noarch.rpm
pose-skins-handspring-3.1H4-1.noarch.rpm
@end example
However, POSE doesn't support ARM code (@file{Simulator} does, but
that only works under Windows), so POSE is only useful for debugging the
m68k parts of the app.

Install these
@example
rpm -i prc-tools-2.3-1.i386.rpm
rpm -i prc-tools-arm-2.3-1.i386.rpm
rpm -i prc-tools-htmldocs-2.3-1.noarch.rpm
rpm -i pilrc-3.1-1.i386.rpm
rpm -i pose-3.5-2.i386.rpm
rpm -i pose-skins-1.9-1.noarch.rpm
rpm -i pose-skins-handspring-3.1H4-1.noarch.rpm
@end example
We also need the prc-tools to know which SDK is available
@example
palmdev-prep
@end example
In addition we use Greg Parker's PEAL
@url{http://www.sealiesoftware.com/peal/} ELF ARM loader. You need to
download this and compile and install it yourself, so that
@code{peal-postlink} is in your path. Greg was very helpful and even
added support for large data segments for this work (though in the end
we don't actually use them). Some peal code is in our distribution
(which is valid under his licence) but if you use a different version
of peal you may need to ensure they are matched, by updating
the peal code in @file{palm/}. We used version @file{peal-2004-12-29}.

The other palm specific function we require is @code{par}
@url{http://www.djw.org/product/palm/par/} which is part of the
@code{prc.tgz} distribution. We use @code{par} to construct resources
from raw binary files. There are other programs that can do this but
we found this one adequate. Again you must compile this and ensure
@code{par} is in your path. Note no part of @code{par} ends up
in the distributed system.

Given all of the above you should be able to compile the
Palm code and the @code{flop} application.
@example
   ./configure --target=arm-palmos
   make
@end example
The resulting application should be in @file{palm/flop/flop.prc}
which can then be installed on your Palm device
@example
   pilot-xfer -i palm/flop/flop.prc
@end example
Setting up the tools, and getting a working Linux/Palm conduit is not
particularly easy but it is possible. Although some attempt was made
to use the Simulator (PalmOS 5.0/ARM simulator) under Windows, it
never really contributed to the development. The POSE (m68k) emulator
though was used to develop the @code{flop} application itself.

@subsection Some notes on the PalmOS port

Throughout the PalmOS developer documentation they continually remind
you that a Palm device is not a full computer, it's an extension of the
desktop. But seeing devices like the Treo 600 can easily make one
forget and want the device to do real computational work. PalmOS is
designed for small light weight devices so it is easy to start hitting
the boundaries of its capabilities when trying to port larger
applications.

PalmOS5.0 still has interesting limitations: in the m68k domain,
@code{int}'s are 16 bit and using memory segments greater than 65K
requires special work. Quaint as these are, they do significantly
affect the port. At first we thought that only the key
computationally expensive parts would be in ARM (so-called
@code{armlets}) but trying to compile the whole flite code in m68k
with long/short distinctions and sub-64K code segment limitations was
just too hard.

Thus all the Flite code, USEnglish, Lexicon and diphone databases
actually are compiled in the ARM domain. There is however no system
support in the ARM domain so call backs to m68k system functions are
necessary. With care, calls to system functions can be significantly
limited so only a few call backs needed to be written. These are in
@file{palm/pocore/}. I believe CodeWarrior has better support for
this, but in this case we rolled our own (though help from other open
source examples was important).

We manage the m68k/ARM interface through PEAL, which is basically a
linker for ARM code and a calling mechanism from m68k. PEAL deals with
globals and splitting the code into 65K chunks automatically.

Flite does however have a number of large data segments, in the
lexicon and the voice database itself. PEAL can deal with this but it
loads large segments by copying them into the dynamic heap, which on
most Palm devices is less than 2M. This isn't big enough.

Thus we changed Flite to restrict the number of large data segments it
used (and also did some new compression on them). The five segments (the
lts rules, the lexical entries, the voice LPC coefficients, the voice
residuals and the voice residual index) are now treated as data segments
that are split into 65400 sized segments and loaded into feature
memory space, which is in the storage heap and typically much bigger.
This means we do need about 2-3 megabytes free on the device to run.
We did look into just indexing the 65400 byte segments directly but
that looked like being too much work, and we're only going to be able
to run on 16M sized Palms anyway (there aren't any 8M ARM Palms with
audio, except maybe some SmartPhones).

Using Flite from m68k land involves getting a @code{flite_info}
structure from @code{flite_init()}. This contains a bunch of fields
that can be set and sent to the ARM domain Flite synthesizer proper, within
which other output fields may be set and returned. This isn't a very
general structure, but is adequate. Note the necessary byte swapping
(for the top level fields) is done for this structure before
calling the ARM native @code{arm_flite_synth_text} and swapped back
again after returning.

Display, playing audio, pointy-clicky event thingies are all done in
the m68K domain.

@subsection Using the PalmOS

There are three basic functions that access the ARM flite
functions: @code{flite_init()}, @code{flite_synth_text()} and
@code{flite_end()}.
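
The splitting of a large data segment into 65400 byte pieces, described
in the notes on the port above, comes down to simple arithmetic. The
following is only a sketch of that arithmetic and is not code from the
Flite distribution: the names @code{segment_chunk}, @code{chunk_count}
and @code{chunk_at} are hypothetical, and only the 65400 figure is taken
from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Piece size quoted in the text for the PalmOS port. */
#define SEGMENT_CHUNK_BYTES ((size_t)65400)

/* Hypothetical descriptor for one piece of a larger data segment. */
typedef struct {
    size_t offset;  /* byte offset of this piece within the full segment */
    size_t length;  /* bytes in this piece; only the last one may be short */
} segment_chunk;

/* Number of pieces needed to hold total_bytes, rounding up. */
size_t chunk_count(size_t total_bytes)
{
    return (total_bytes + SEGMENT_CHUNK_BYTES - 1) / SEGMENT_CHUNK_BYTES;
}

/* Describe piece number `index` of a segment that is total_bytes long. */
segment_chunk chunk_at(size_t total_bytes, size_t index)
{
    segment_chunk c;

    assert(index < chunk_count(total_bytes));
    c.offset = index * SEGMENT_CHUNK_BYTES;
    c.length = total_bytes - c.offset;
    if (c.length > SEGMENT_CHUNK_BYTES)
        c.length = SEGMENT_CHUNK_BYTES;
    return c;
}
```

For example, under these assumptions a 200000 byte segment needs four
pieces, the last of which holds the remaining 3800 bytes.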

@node Flite Design, APIs, Installation, Top
@chapter Flite Design

@section Background

Flite was primarily developed to address one of the most common
complaints about the Festival Speech Synthesis System. Festival is
large and slow. Even with the software bloat common amongst most
products, and even though that bloat has helped machines get faster and
have more memory and larger disks, Festival is still criticized for its
size.

Although sometimes this complaint is unfair, it is valid, and although
much work was done to ensure Festival can be trimmed and run fast it
still requires substantial resources per utterance to run. After some
investigation to see if Festival itself could be trimmed down, it became
clear that, because there was a core set of functions that were
sufficient for synthesis, a new implementation containing only those
aspects that were necessary would be easier than trimming down Festival
itself.

Given that a new implementation was being considered, a number of
problems with Festival could also be addressed at the same time.
Festival is not thread-safe, and although it runs under Windows, in
server mode it relies on the Unix-centric view of fast forks with
copy-on-write shared memory for servicing clients. This is a perfectly
safe and practical solution for Unix systems, but under Windows, where
threads are the more common feature used for servicing multiple events
and forking is expensive, a non-thread safe program can't be used as
efficiently.

Festival is written in C++ which was a good decision at the time and
perfectly suitable for a large program. However what was discovered
over the years of development is that C++ is not a portable language.
Different C++ compilers are quite different and it takes a significant
amount of work to ensure compatibility of the code base over multiple
compilers. What makes this worse is that new versions of each compiler
are incompatible and changes are required. At first it looked as if we
were producing bad quality code, but after 10 years it is clear that
the compilers are also still maturing. Thus it is clear that
Festival and the Edinburgh Speech Tools will continue to require
constant support as new versions of compilers are released.

A second problem with C++ is the size and efficiency of the code
produced. Proponents of C++ may rightly argue that Festival and the
Edinburgh Speech Tools aren't properly designed, but irrespective of
whether that is true or not, it is true that the code is much larger
and slower than it need be for what it does. Throughout the design
there is a constant trade-off between elegance and efficiency, which
unfortunately at times in Festival requires untidy solutions of
copying data out of objects, processing it, and copying it back, because
direct access (particularly in some signal processing routines)
is just too inefficient.

Another major criticism of Festival is the use of Scheme as the
interpreter language. Even though it is a simple to implement language
that is adequate for Festival's needs and can be easily included in the
distribution, people still hate it. Often these people do learn to use
it and appreciate how run time configurability is very desirable and that
new voices may be added without recompilation. Scheme does have garbage
collection which makes leaky programs much harder to write, and as some
of the intended audience for developing in Festival will not be hard
core programmers a safe programming language seems very desirable.

After taking into consideration all of the above it was decided to
develop Flite as a new system written in ANSI C. C is much more
portable than C++ as well as offering much lower level control of the
size of the objects and data structures it uses.

Flite is not intended as a research and development platform for speech
synthesis; Festival is and will continue to be the best platform for
that. Flite however is designed as a run-time engine for when an
application needs to be delivered. It specifically addresses two
communities. The first is those needing an engine for small devices such
as PDAs and telephones, where the memory and CPU power are limited and
which in some cases do not even have a conventional operating system.

The second community is those running synthesis servers for many
clients. Here although large fixed databases are acceptable, the size
of memory required per utterance and the speed at which they can be
synthesized is crucial.

However in spite of the decision to build a new synthesis engine we see
this as being tightly coupled into the existing free software synthesis
tools of Festival and the FestVox voice building suite. Flite offers
a companion run-time engine. Our intended mode of development is
to build new voices in FestVox and debug and tune them in Festival.
Then for deployment the FestVox format voice may be (semi-)automatically
compiled into a form that can be used by Flite.

In case some people feel that development of a small run-time
synthesizer is not an appropriate thing to do within a University and is
more suited to commercial development, we have a few points which they
should be aware of that to our mind justify this work.

We have long felt that research in speech and language should have an
identifiable link to ultimate commercial use. In providing a platform
that can be used in consumer products and that falls within the same
framework as our research, we can better understand what research issues
are actually important to the improvement of our work.
604 605In considering small useful synthesizers it forces a more explicit 606definition of what is necessary in a synthesizer and also how we can 607trade size, flexibility and speed with the quality of synthesized 608output. Defining that relationship is a research issue. 609 610We are also advocates of speech technology within other research areas 611and the ability to offer support on new platforms such as PDAs and 612wearables allows for more interesting speech applications such as 613speech-to-speech translation, robots, and interactive personal digital 614assistants, that will prove new and interesting areas of research. 615Thus having a platform that others around us can more easily integrate 616into their research makes our work more satisfying. 617 618@section Key Decisions 619 620The basic architecture of Festival is good. It is well proven. Paul 621Taylor, Alan W. Black and Richard Caley spent many hours debating low 622level aspects of representation and structure that would both be 623adequate for current theories but also allow for future theories too. 624The heterogeneous relation graphs (HRG) are theoretically adequate, 625computationally efficient and well proven. Thus both because HRGs have 626such a background and that Flite is to be compatible with voices and 627models developed in Festival, Flite uses HRGs as its basic utterance 628representation structure. 629 630Most of a synthesizer is in its data (lexicons, unit database etc), 631the actual synthesis code is pretty small. In Festival most of that 632data exists in external files which are loaded on demand. This is 633obviously slow and memory expensive (you need both a copy on the data 634on disk and in memory). As one of the principal targets for Flite is 635very small machines we wanted to allow that core data to be in ROM, 636and be appropriately mapped into RAM without any explicit loading 637(some OS's call this XIP -- execute in place). 
This can be done by 638various memory mapping functions (in Unix its called mmap) and is the 639core technique used in shared libraries (called DLLs in some parts of 640the world). Thus the data should be in a format that it can be 641directly accessed. If you are going to directly access data you need 642to ensure the byte layout is appropriate for the architecture you are 643running on, byte order and address width become crucial if you want to 644avoid any extra conversion code at access time (like byte swapping). 645 646At first is was considered that synthesis data would be converted in 647binary files which could be mmap'ed into the runtime systems but 648building appropriate binaries files for architectures is quite a job. 649However the C compiler does this in a standard way. Therefore the mode 650of operation for data within Flite is to convert it to C code (actually 651C structures) and use the C compiler to generate the appropriate binary 652structures. 653 654Using the C compiler is a good portable solution but it as these 655structures can be very big this can tax the C compiler somewhat. Also 656because this data is not going to change at run time it can all be 657declared @code{const}. Which means (in Unix) it will be in the text 658segment and hence read only (this can be ROM on platforms which have 659that distinction). For structures to be const all their subparts 660must also be const thus all relevant parts must be in the same file, 661hence the unit databases files can be quite big. 662 663Of course, this all presumes that you have a C compiler robust enough to 664compile these files, hardware smart enough to treat flash ROM as memory 665rather than disk, or an operating system smart enough to demand-page 666executables. 
Certain "popular" operating systems and compilers fail in 667at least one of these respects, and therefore we have provided the 668flexibility to use memory-mapped file I/O on voice databases, where 669available, or simply to load them all into memory. 670 671@chapter Structure 672 673The flite distribution consists of two distinct parts: 674@itemize @bullet 675@item The flite library containing the core synthesis code 676@item Voice(s) for flite. These contain three sub-parts 677@itemize @bullet 678@item Language models: 679text processing, prosody models etc. 680@item Lexicon and letter to sound rules 681@item Unit database and voice definition 682@end itemize 683@end itemize 684 685@section cst_val 686 687This is a basic simple object which can contain ints, floats, strings 688and other objects. It also allows for lists using the Scheme/Lisp, 689car/cdr architecture (as that is the most efficient way to represent 690arbitrary trees and lists). 691 692The @code{cst_val} structure is carefully designed to take up only 8 bytes (or 69316 on a 64-bit machine). The multiple union structure that it can 694contain is designed so there are no conflicts. However it depends on 695the fact that a pointer to a @code{cst_val} is guaranteed to lie on a even 696address boundary (which is true for all architectures I know of). Thus 697the distinction between between cons (i.e. list) objects and atomic 698values can be determined by the odd/evenness of the least significant bits 699of the first address in a @code{cst_val}. In some circles this is considered 700hacky, in others elegant. This was done in flite to ensure that the most 701common structure is 8 bytes rather than 12 which saves significantly on 702memory. 703 704All @code{cst_val}'s except those of type cons are reference counted. 
A few functions generate new lists of @code{cst_val}s which the user
should be careful about as they need to be deleted explicitly (notably
the lexicon lookup function that returns a list of phonemes).
Everything that is added to an utterance will be deleted (and/or
dereferenced) when the utterance is deleted.

Like Festival, user types can be added to the @code{cst_val}s. In
Festival this can be done on the fly, but because this requires the
updating of some list when each new type is added, this wouldn't be
thread safe. Thus an explicit method of defining user types is done in
@file{src/utils/cst_val_user.c}. This is not as neat as defining on the
fly or using a registration function but it is thread safe and these
user types won't change often.

@node APIs, Converting FestVox Voices, Flite Design, Top
@chapter APIs

Flite is a library that we expect will be embedded into other
applications. Included with the distribution is a small example
executable that allows synthesis of strings of text and text files
from the command line.

You may want to look at Bard @url{http://festvox.org/bard/}, an ebook
reader with a tight coupling to flite as a synthesizer. This is the
most elaborate use of the Flite API within our suite of programs.

@section flite binary

The example flite binary may be suitable for very simple applications.
Unlike Festival its start up time is very short (less than 25ms on a PIII
500MHz) making it practical (on larger machines) to call it each
time you need to synthesize something.
@example
flite TEXT OUTPUTTYPE
@end example
If @code{TEXT} contains a space it is treated as a string of text and
converted to speech. If it does not contain a space @code{TEXT} is
treated as a file name and the contents of that file are converted to
speech.
The option @code{-t} specifies @code{TEXT} is to be treated
as text (not a filename) and @code{-f} forces treatment as a file.
Thus
@example
flite -t hello
@end example
will say the word "hello" while
@example
flite hello
@end example
will say the content of the file @file{hello}. Likewise
@example
flite "hello world."
@end example
will say the words "hello world" while
@example
flite -f "hello world"
@end example
will say the contents of a file @file{hello world}. If no argument is
specified text is read from standard input.

The second argument @code{OUTPUTTYPE} is the name of a file the output
is written to, or if it is @code{play} then it is played to the audio
device directly. If it is @code{none} then the audio is created but
discarded; this is used for benchmarking. If it is @code{stream} then
the audio is streamed through a call back function (though this is not
particularly useful in the command line version). If @code{OUTPUTTYPE}
is omitted, @code{play} is assumed. You can also explicitly set the
output type with the @code{-o} flag.
@example
flite -f doc/alice -o alice.wav
@end example

@section Voice selection

All the voices in the distribution are collected into a single simple
list in the global variable @code{flite_voice_list}. You can select a
voice from this list from the command line
@example
flite -voice awb -f doc/alice -o alice.wav
@end example
and list which voices are currently supported in the binary with
@example
flite -lv
@end example
The voices which get linked together are those listed in
@code{VOICES} in @file{main/Makefile}. You can change that as you
require.

Voices may also be dynamically loaded from files as well as built in.
The argument to the @code{-voice} option may be a pathname to a dumped
(Clustergen) voice.
This may be a Unix pathname or a URL (only protocols @file{http} and
@file{file} are supported). For example
@example
flite -voice file://cmu_us_awb.flitevox -f doc/alice -o alice.wav
flite -voice http://festvox.org/voices/cmu_us_ksp.flitevox -f doc/alice -o alice.wav
@end example
Voices will be loaded once and added to @code{flite_voice_list}.
Although these voices are often small (a few megabytes) there will
still be some time required to read them in the first time. The
voices are not mapped, they are read into newly created structures.

This loading function is currently only supported for Clustergen
voices.

@section C example

Each voice in Flite is held in a structure, a pointer to which is
returned by the voice registration function. In the standard
distribution, the example diphone voice is @code{cmu_us_kal}.

Here is a simple C program that uses the flite library
@example
#include <stdio.h>
#include <stdlib.h>
#include "flite.h"

/* Voice registration function provided by the cmu_us_kal voice library */
cst_voice *register_cmu_us_kal(const char *voxdir);

int main(int argc, char **argv)
@{
    cst_voice *v;

    if (argc != 2)
    @{
        fprintf(stderr,"usage: flite_test FILE\n");
        exit(-1);
    @}

    flite_init();

    v = register_cmu_us_kal(NULL);

    flite_file_to_speech(argv[1],v,"play");

    return 0;
@}
@end example
Assuming the shell variable FLITEDIR is set to the flite directory
the following will compile the system (with appropriate changes for
your platform if necessary).
@example
gcc -Wall -g -o flite_test flite_test.c -I$FLITEDIR/include -L$FLITEDIR/lib \
     -lflite_cmu_us_kal -lflite_usenglish -lflite_cmulex -lflite -lm
@end example

@section Public Functions

Although you are of course welcome to call lower level functions,
there are a few key functions that will satisfy most users
of flite.
@table @code
@item void flite_init(void);
This must be called before any other flite function can be called.
As of Flite 1.1, it actually does nothing at all, but there is no
guarantee that this will remain true.
@item cst_wave *flite_text_to_wave(const char *text,cst_voice *voice);
Returns a waveform (as defined in @file{include/cst_wave.h}) synthesized
from the given text string by the given voice.
@item float flite_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Synthesizes all the sentences in the file @file{filename} with the given
voice. Output (at present) can only reasonably be @code{play} or
@code{none}. If the feature @code{file_start_position} is set to an
integer, that point is used as the start position in the file to be
synthesized.
@item float flite_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Synthesizes the text in the string pointed to by @code{text}, with the
given voice. @code{outtype} may be a filename where the generated
waveform is written to, or "play" and it will be sent to the audio
device, or "none" and it will be discarded. The return value is the
number of seconds of speech generated.
@item cst_utterance *flite_synth_text(const char *text,cst_voice *voice);
Synthesizes the given text with the given voice and returns an utterance
from it for further processing and access.
@item cst_utterance *flite_synth_phones(const char *phones,cst_voice *voice);
Synthesizes the given phones with the given voice and returns an utterance
from it for further processing and access.
@item cst_voice *flite_voice_select(const char *name);
Returns a pointer to the voice named @code{name}. Will return
@code{NULL} if there is no match. If @code{name == NULL} then the
first voice in the voice list is returned. If @code{name} is a URL
(starting with @file{file:} or @file{http:}), that file will be
accessed and the voice will be downloaded from there.
@item float flite_ssml_file_to_speech(const char *filename, cst_voice *voice, const char *outtype);
Will read the file as SSML. Not all SSML tags are supported but many
are; unsupported ones are ignored. Voice selection works by naming
the internal name of the voice, or the name may be a URL and the voice
will be loaded. The audio tag is supported for loading waveform files;
again URLs are supported.
@item float flite_ssml_text_to_speech(const char *text, cst_voice *voice, const char *outtype);
Will treat the text as SSML.
@item int flite_voice_add_lex_addenda(cst_voice *v, const cst_string *lexfile);
Loads the pronunciations from @code{lexfile} into the lexicon
identified in the given voice (which will cause all other voices using
that lexicon to also get this new addenda list). An example lexicon
file is given in @file{flite/tools/examples.lex}. Words may be in
double quotes, and an optional part of speech tag may be given. A colon
separates the headword/postag from the list of phonemes. Stress
values (if used in the lexicon) must be specified. Bad phonemes will
be complained about on standard out.
@end table

@section Streaming Synthesis

In 1.4 support was added for streaming synthesis. Basically you may
provide a call back function that will be called with waveform data
immediately when it is available. This potentially can reduce the
delay between sending text to the synthesizer and having audio
available.

The support is through a call back function of type
@example
int audio_stream_chunk(const cst_wave *w, int start, int size,
                       int last, cst_audio_streaming_info *asi)
@end example
If the utterance feature @code{streaming_info} is set (which can
be set in a voice or in an utterance), the LPC or MLSA resynthesis
functions will call the provided function as buffers become available.
The LPC and MLSA waveform synthesis functions are used for diphones,
limited domain, unit selection and clustergen voices. Note explicit
support is required for streaming so new waveform synthesis functions
may not have the functionality.

An example streaming function is provided in
@file{src/audio/au_streaming.c} and is used by the example flite main
program when @code{stream} is given as the playing option. (Though in
the command line program the function isn't really useful.)

In order to use streaming you must provide a call back function in your
particular thread. This is done by adding features to the voice in
your thread. Suppose your function was declared as

@example
int example_audio_stream_chunk(const cst_wave *w, int start, int size,
                               int last, cst_audio_streaming_info *asi)
@end example
You can add this function as the streaming function through the statement
@example
   cst_audio_streaming_info *asi;
...
   asi = new_audio_streaming_info();
   asi->asc = example_audio_stream_chunk;
   feat_set(voice->features,
            "streaming_info",
            audio_streaming_info_val(asi));
@end example
You may also optionally include your own pointer to any information
you additionally want to pass to your function. For example
@example
typedef struct @{
   cst_audiodev *fd;
   int count;
@} my_callback_struct;

my_callback_struct *mcs;
cst_audio_streaming_info *asi;

...

mcs = cst_alloc(my_callback_struct,1);
mcs->fd=NULL;
mcs->count=1;

asi = new_audio_streaming_info();
asi->asc = example_audio_stream_chunk;
asi->userdata = mcs;
feat_set(voice->features,
         "streaming_info",
         audio_streaming_info_val(asi));
@end example
Another example is given in @file{testsuite/by_word_main.c} which
shows a call back function that also prints the token as it is being
synthesized. The @code{utt} field in the
@code{cst_audio_streaming_info} structure will be set to the current
utterance.
Please note that the @code{item} field in the
@code{cst_audio_streaming_info} structure is for your convenience and
is not set by anyone at all. The previous sentence exists in the
documentation so that I can point at it when users fail to read it.

@node Converting FestVox Voices, , APIs, Top
@chapter Converting FestVox Voices

As of 1.2 initial scripts have been added to aid the conversion of
FestVox voices to Flite. In general the conversion cannot be automatic.
For example all specific Scheme code written for a voice needs to be
hand converted to C to work in Flite, which can be a major task.

Simple conversion scripts are given as examples of the stages you need
to go through. These are designed to work on standard (English)
diphone sets, and simple limited domain voices. The conversion
technique will almost certainly fail for large unit selection voices
due to limitations in the C compiler (more discussion below). In 1.4
we have also added support for converting clustergen voices (which
is a little easier, see section below).

@section Concatenative Voice Building

Conversion is basically taking the description of units (clunit
catalogue or diphone index) and constructing some C files that can be
compiled to form a usable database. Using the C compiler to generate
the object files has the advantage that we do not need to worry about
byte order, alignment and object formats as the C compiler for the
particular target platform should be able to generate the right code.

Before you start ensure you have successfully built and run your FestVox
voice in Festival. Flite is not designed as a voice building/debugging
tool; it is just a delivery vehicle for finalized voices, so you should
first ensure you are satisfied with the quality of the Festival voice
before you start converting it for Flite.

The following basic stages are required:
@itemize @bullet
@item Set up the directories and copy the conversion scripts
@item Build the LPC files
@item Build the MCEP files (for ldom/clunits)
@item Convert LPC (MCEP) into STS (short term signal) files
@item Convert the catalogue/diphone index
@item Compile the generated C code
@end itemize

The conversion assumes the environment variable @code{FLITEDIR}
is set, for example
@example
   export FLITEDIR=/home/awb/projects/flite/
@end example
The basic flite conversion takes place within a FestVox voice directory.
Thus all of the conversion scripts expect that the standard files are
available. The first task is to build some new directories and copy in
the build scripts. The scripts are copied rather than linked from the
Flite directories as you may need to change these for your particular
voices.
@example
   $FLITEDIR/tools/setup_flite
@end example
This will read @file{etc/voice.defs}, which should have been created by
the FestVox build process (except in very old versions of FestVox).

If you don't have an @file{etc/voice.defs} you can construct one
with @code{festvox/src/general/guess_voice_defs} in the FestVox
distribution, or generate one by hand making it look
like
@example
FV_INST=cmu
FV_LANG=us
FV_NAME=ked_timit
FV_TYPE=clunits
FV_VOICENAME=$FV_INST"_"$FV_LANG"_"$FV_NAME
FV_FULLVOICENAME=$FV_VOICENAME"_"$FV_TYPE
@end example

The main script for building the Flite voice is @file{bin/build_flite}
which will eventually build sufficient C code in @file{flite/} that can
be compiled with the constructed @file{flite/Makefile} to give you a
library that can be linked into applications and also an example
@file{flite} binary with the constructed voice built-in.

You can run all of these stages, except the final make, together by
running the build script with no arguments
@example
   ./bin/build_flite
@end example
But as things may not run smoothly, we will go through the
stages explicitly.

The first stage is to build the LPC files. This may have already been
done as part of the diphone building process (though probably not in
the ldom/clunit case). In our experience it is very important that the
recordings be of similar power, as mismatched power can often cause
overflows in the resulting flite (and sometimes Festival) voices. Thus,
for diphone voices, it is important to run the power normalization
techniques described in the FestVox document. The Flite LPC build
process also builds a parameter file of the ranges of the LPC parameters
used in later coding of the files, so even if you have already built your
LPC files you should still do this again
@example
   ./bin/build_flite lpc
@end example

For ldom and clunit voices (but not for diphone voices) we also
need the Mel-frequency Cepstral Coefficients. These are assumed to
have been created and to be in @file{mcep/} as they are necessary
for running the voice in Festival. This stage simply constructs
information about the range of the mcep parameters.
@example
   ./bin/build_flite mcep
@end example

The next stage is to construct the STS files. Short Term Signals (STS)
are built for each pitch period in the database. These are ascii files
(one for each utterance file in the database), with LPC coefficients and
ulaw encoded residuals for each pitch period. These are built using a
binary executable built as part of the Flite build
(@file{flite/tools/find_sts}).
@example
   ./bin/build_flite sts
@end example
Note that the flite code expects waveform files to be in Microsoft RIFF
format and cannot deal with files in other formats. Some earlier
versions of the Edinburgh Speech Tools used NIST as the default header
format. This is likely to cause flite and its related programs not to
work. So do ensure your waveform files are in RIFF format
(@code{ch_wave -info wav/*} will tell you the format). The following
will convert all your wave files
@example
   mv wav wav.nist
   mkdir wav
   cd wav.nist
   for i in *.wav
   do
      ch_wave -otype riff -o ../wav/$i $i
   done
@end example

The next stage is to convert the index to the required C format. For
diphone voices this takes the @file{dic/*.est} index files, for
clunit/ldom voices it takes the @file{festival/clunit/VOICE.catalogue}
and @file{festival/trees/VOICE.tree} files. This process uses a binary
executable built as part of the Flite build process
(@file{flite/tools/flite_sort}) to sort the indices into the
sorting order required for flite to run. (Using unix sort may or may
not give the same result due to definitions of lexicographic order so
we use the very same function in C that will be used in flite to ensure
that a consistent order is given.)
@example
   ./bin/build_flite idx
@end example
All the necessary C files should now have been built in @file{flite/}
and you may compile them by
@example
   cd flite
   make
@end example
This should give a library and an executable called @file{flite} that
can be run as
@example
   ./flite "Hello World"
@end example
assuming a general voice. For ldom voices it will only be able to say
things in its domain.
This @file{flite} binary offers the same options
as the standard @file{flite} binary compiled in the Flite build
but with your voice rather than the distributed voices.

Almost certainly this process will not run smoothly for you. Building
voices is still a very hard thing to do and problems will probably
exist.

This build process does not deal with customization for the given
voices. Thus you will need to edit @file{flite/VOICE.c} to set
intonation ranges and duration stretch for your particular voice.

For example in our @file{cmu_us_sls_diphone} voice (a US English female
diphone voice) we had to change the default parameters from
@example
    feat_set_float(v->features,"int_f0_target_mean",110.0);
    feat_set_float(v->features,"int_f0_target_stddev",15.0);

    feat_set_float(v->features,"duration_stretch",1.0);
@end example
to
@example
    feat_set_float(v->features,"int_f0_target_mean",167.0);
    feat_set_float(v->features,"int_f0_target_stddev",25.0);

    feat_set_float(v->features,"duration_stretch",1.0);
@end example

Note this conversion is limited. Because it depends on the C compiler
to do the final conversion into binary object format (a good idea in
general for portability), you can easily generate files too big for the
C compiler to deal with. We have spent some time investigating this
so the largest possible voices can be converted but it is still too
limited for our larger voices. In general the limitation seems to be
best quantified by the number of pitch periods in the database. After
about 100k pitch periods the files get too big to handle. There are
probably solutions to this but we have not yet investigated them. This
limitation doesn't seem to be an issue with the diphone voices as they
are typically much smaller than unit selection voices.
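
To make the dependence on the C compiler more concrete, the sketch below
shows the general shape of a unit database written out as @code{const}
C data. It is only an illustration under stated assumptions: the type
@code{example_sts}, the arrays, the numbers in them and the helper
@code{example_total_samples} are all hypothetical names and values made
up here, and they are not the structures the conversion scripts
actually emit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the data of one pitch period.
   The structures really generated for a voice differ; this only shows
   the idea of expressing the database as const C data. */
typedef struct {
    const unsigned short *frame;    /* quantized LPC coefficients */
    int size;                       /* number of residual samples */
    const unsigned char *residual;  /* ulaw encoded residual */
} example_sts;

/* Sub-parts are const too, so the whole database can be placed in a
   read-only segment by the compiler. Values are invented. */
static const unsigned short example_frame0[] = { 32768, 41000, 30122, 33010 };
static const unsigned char example_res0[] = { 255, 127, 128, 254, 126 };
static const unsigned short example_frame1[] = { 32768, 40210, 31200, 32999 };
static const unsigned char example_res1[] = { 255, 129, 125 };

/* One entry per pitch period, all in the same file. */
static const example_sts example_db[] = {
    { example_frame0, 5, example_res0 },
    { example_frame1, 3, example_res1 },
};

static const size_t example_db_len =
    sizeof(example_db) / sizeof(example_db[0]);

/* Sum the residual lengths over all pitch periods in the table. */
int example_total_samples(void)
{
    int total = 0;
    size_t i;

    for (i = 0; i < example_db_len; i++) {
        assert(example_db[i].size > 0);  /* each period has samples */
        total += example_db[i].size;
    }
    return total;
}
```

With a layout of this kind every pitch period adds initialized arrays
to a single source file, which gives some feel for why a database of
around 100k pitch periods becomes hard for a compiler to handle.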

@section Statistical Voice Building (Clustergen)

The process of building from a clustergen (cg) voice is also
supported. It is assumed the environment variable @code{FLITEDIR} is
set
@example
   export FLITEDIR=/home/awb/projects/flite/
@end example
After you build the clustergen voice you can convert it by first setting
up the skeleton files in the @file{flite/} directory
@example
   $FLITEDIR/tools/setup_flite
@end example
Assuming @file{etc/voice.defs} properly identifies the voice, the cg
templates will be copied in.

The conversion itself is actually much faster than a clunit build
(there is less to actually convert).
@example
   ./bin/build_flite cg
@end example
This will convert the necessary models into files in the @file{flite/}
directory. Then you can compile it with
@example
   cd flite
   make
   ./flite_cmu_us_awb "Hello world"
@end example
Note that the voice that is to be converted @emph{must} be a standard
clustergen voice with f0, mceps, delta mceps (optionally strengths for
mixed excitation) and voicing in its
combined coeffs files. The method could be changed to deal with other
possibilities but it will only work for the default build method.

The generated library @file{libflite_cmu_us_awb.a} may be linked with
other programs like any other flite voice. The generated binary
@code{flite_cmu_us_awb} links in only one voice (unlike the flite binary in
the full flite distribution).

A single flat file containing the cg voice can also be generated that can
be loaded at run time into the flite binary.
You can dump this file
from the initial constructed flite binary
@example
   ./flite_cmu_us_awb -voicedump cmu_us_awb.flitevox
@end example
The file @file{cmu_us_awb.flitevox} may now be referenced (with
pathname/URL) on the flite command line and used by the synthesizer
@example
   ./flite -voice cmu_us_awb.flitevox "Hello World"
@end example

@section Lexicon Conversion

As of 1.3 the script for converting the CMU lexicon (as distributed as
part of Festival) is included. @file{make_cmulex} will use the
version of CMULEX unpacked in the current directory to build a new
lexicon. Also in 1.3 a more sophisticated compression technique is
used to reduce the lexicon size. The lexicon is pruned, removing
those words which the letter to sound rule models get correct. Also
the letters and phones are separately Huffman coded to produce a
smaller lexicon.

@section Language Conversion

This is by far the weakest part as this is the most open ended. There
are basic tools in the @file{flite/tools/} directory that include Scheme
code to convert various Scheme structures to C, including CART tree
conversion and Lisp list conversion. The other major source of help
here is the existing language examples in @file{flite/lang/usenglish/}.

Adding new language support is far from automatic, but there are core
scripts for setting up new Flite support for languages and lexicons. There
are also scripts for converting (Festival) phoneset definitions to C
and converting Festival lexicons to LTS rules and compressed lexicons
in C.

But beyond that you are sort of on your own. The largest gap here is
text normalization. We do not yet have a standardized model for text
normalization with well defined models for which we could write
conversion scripts.

However here is a step by step attempt to show you what to do when
building support for a new language/lexicon.

Suppose we need to create support for Pashto, and already have a
festival voice running, and want it now to run in flite. Converting
the voice itself (unit selection or clustergen) is fairly robust, but
you will also need C libraries for @file{cmu_pashto_lang} and
@file{cmu_pashto_lex}. The first stage is to create the basic
template files for these. In the core @file{flite/} source directory
type
@example
   ./tools/make_new_lang_lex pashto
@end example
This will create language and lex template files in
@file{lang/cmu_pashto_lang/} and @file{lang/cmu_pashto_lex/}.

Then in directory @file{lang/cmu_pashto_lang/} type
@example
    festival $FLITEDIR/tools/make_phoneset.scm
    ...
    festival> (phonesettoC "cmu_pashto" (car (load "PATHTO/cmu_pashto_transtac_phoneset.scm" t)) "pau" ".")
@end example
This will create @file{cmu_pashto_lang_phoneset.[ch]}. You must then add these
explicitly to the @file{Makefile}.

Then in @file{lang/cmu_pashto_lex/} you have to build the C version of
the lexicon and letter to sound rules. The core script is in
@file{flite/tools/build_lex}.
@example
   mkdir lex
   cd lex
   cp -p $FLITEDIR/tools/build_lex .
@end example
Edit @file{build_lex} to give it the name of your lexicon and the
compiled lexicon from your voice.
@example
LEXNAME=cmu_pashto
LEXFILE=lexicon.out
@end example
You should (I think) remove the first line ``MNCL'' from your
@file{lexicon.out} file. Note this @emph{must} be the compiled lexicon,
not the list of entries you compiled from, as it expects the ordering and
the syllabification.
@example
   ./build_lex setup
@end example
Build the letter to sound rules (probably again)
@example
   ./build_lex lts
@end example
Convert the compiled letter to sound rules to C. This converts the
decision trees to decision graphs and runs WFST minimization on them
to get a more efficient set of structures. This puts the generated C
files in @file{c/}.
@example
   ./build_lex lts2c
@end example
Now convert the lexical entries themselves
@example
   ./build_lex lex
@end example
Again the generated C files will be put in @file{c/}.

Now generate a Huffman coded compressed lexicon to reduce the
lexicon size, merging frequent letter sequences and phone sequences.
@example
   ./build_lex compresslex
@end example
Then copy the @file{.c} and @file{.h} files to @file{lang/cmu_pashto_lex/}
[something about compressed and non-compressed???]


@chapter Porting to new platforms

byte order, unions, compiler restrictions

@chapter Future developments

@contents

@bye