          Flite: a small run-time speech synthesis engine
                      version 2.1-release
           Copyright Carnegie Mellon University 1999-2017
                      All rights reserved
                      http://cmuflite.org
                 https://github.com/festvox/flite


Flite is an open source small fast run-time text to speech engine. It
is the latest addition to the suite of free software synthesis tools
including University of Edinburgh's Festival Speech Synthesis System
and Carnegie Mellon University's FestVox project, tools, scripts and
documentation for building synthetic voices. However, flite itself
does not require either of these systems to compile and run.

The core Flite library was developed by Alan W Black <awb@cs.cmu.edu>
(mostly in his so-called spare time) while employed in the Language
Technologies Institute at Carnegie Mellon University. The name
"flite", originally chosen to mean "festival-lite", is perhaps doubly
appropriate, as a substantial part of the design and coding was done
at 30,000ft while awb was travelling, where he (usually) isn't in
meetings.

The voices, lexicon and language components of flite, both their
compression techniques and their actual contents, were developed by
Kevin A. Lenzo <lenzo@cs.cmu.edu> and Alan W Black <awb@cs.cmu.edu>.

Flite is the answer to the complaint that Festival is too big, too slow,
and not portable enough.

o Flite is designed for very small devices, such as PDAs, and also
  for large server machines which need to serve lots of ports.

o Flite is not a replacement for Festival but an alternative run-time
  engine for voices developed in the FestVox framework where size and
  speed are crucial.

o Flite is written entirely in ANSI C; it contains no C++ or Scheme, and
  thus requires more care in programming and is harder to customize at
  run time.

o It is thread safe.

o Voices, lexicons and language descriptions can be compiled
  (mostly automatically for voices and lexicons) into C representations
  from their FestVox formats.

o All voices, lexicons and language model data are const and in the
  text segment (i.e. they may be put in ROM). As they are linked in
  at compile time, there is virtually no startup delay.

o Although the synthesized output is not exactly the same as that of
  the same voice in Festival, the two are effectively equivalent. That
  is, flite doesn't sound better or worse than the equivalent voice in
  Festival, just faster, smaller and scalable.

o For standard diphone voices, maximum run-time memory requirements
  are less than about twice the memory required for the generated
  waveform. For 32-bit architectures this effectively means under 1M.

o The flite program supports synthesis of individual strings or files
  (utterance by utterance), either directly to audio devices or to
  waveform files.

o The flite library offers simple functions suitable for use in specific
  applications.

Flite is distributed with a single 8KHz diphone voice (derived from the
cmu_us_kal voice), a pruned lexicon (derived from cmulex) and a set of
models for US English. Here are comparisons with Festival using
basically the same 8KHz diphone voice:

                 Flite     Festival
    core code     60K        2.6M
    USEnglish    100K         ??
    lexicon      600K         5M
    diphone      1.8M        2.1M
    runtime      <1M        16-20M

On a 500MHz PIII, a timing test of the first two chapters of
"Alice in Wonderland" (doc/alice) was done. This produces about
1300 seconds of speech. With flite it takes 19.128 seconds (about
70.6 times faster than real time); with Festival it takes 97 seconds
(13.4 times faster than real time). On the iPaq (with the 16KHz
diphones) flite synthesizes 9.79 times faster than real time.
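
As a rough illustration of the memory rule of thumb for diphone voices
above, here is a small C sketch. The helper names are hypothetical (not
part of the flite API), and it assumes 16-bit mono samples; it simply
doubles the waveform size to get an upper bound. For example, a
30-second utterance at 8KHz needs 480,000 bytes of samples, so the
bound comes to 960,000 bytes, which stays under 1M.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flite: bytes needed to hold
   `seconds` of mono audio at `sample_rate` Hz, assuming 16-bit
   (2-byte) samples. */
static size_t wave_bytes(double seconds, unsigned sample_rate)
{
    assert(seconds >= 0.0 && sample_rate > 0);
    return (size_t)(seconds * sample_rate) * 2u;
}

/* Upper bound from the rule of thumb for standard diphone voices:
   peak run-time memory is less than about twice the waveform size. */
static size_t peak_budget_bytes(double seconds, unsigned sample_rate)
{
    return 2u * wave_bytes(seconds, sample_rate);
}
```

The same arithmetic shows why the 16KHz voices need roughly twice the
headroom of the 8KHz one for the same utterance length.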

Requirements:
-------------

  o A good C compiler. Some of these files are quite large and some C
    compilers might choke on them; gcc is fine. Sun CC 3.01 has been
    tested too. Visual C++ 6.0 is known to fail on the large diphone
    database files. We recommend you use GCC under Windows Subsystem
    for Linux, Cygwin or mingw32 instead.

  o GNU Make

  o An audio device isn't required, as flite can write its output to
    a waveform file.

Supported platforms:
--------------------

We have successfully compiled and run on

  o Various Intel Linux systems (and iPaq Linux), under various versions
    of GCC (2.7.2 to 6.x)

  o Mac OS X

  o Various Android devices

  o Various openwrt devices

  o FreeBSD 3.x and 4.x

  o Solaris 5.7, and Solaris 9

  o Windows 2000/XP and later under Cygwin 1.3.5 and later

  o Windows 10 with Windows Subsystem for Linux

  o 64-bit Linux architectures

  o OSF1 V4.0 (gives an unimportant warning about sizes when compiling
    cst_val.c)

Previously we supported PalmOS and Windows CE, but these seem to be rare
nowadays so they are no longer actively supported.

Other similar platforms should just work; we have also cross-compiled
on a Linux machine for StrongARM. However, note that architectures with
a new byte order may not work directly, as some structures have careful
byte order constraints. These are portable but may require reordering
of some fields; contact us if you are moving to a new architecture.
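
Before moving to a new architecture it can help to confirm its byte
order. The C sketch below is a hypothetical helper (not part of flite)
that detects it at run time by looking at the first byte of a known
16-bit value; a big-endian result suggests the -DWORDS_BIGENDIAN=1
option described under Compilation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of flite: returns 1 on a big-endian
   host (most significant byte stored first), 0 on a little-endian one. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first[sizeof probe];

    /* Copy the object representation so its first byte can be read. */
    memcpy(first, &probe, sizeof probe);

    /* A 16-bit value is stored either 01 02 or 02 01. */
    assert(first[0] == 0x01 || first[0] == 0x02);
    return first[0] == 0x01;
}
```

Checking at run time avoids relying on compiler-specific macros, which
matters most when cross-compiling for an unfamiliar target.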

News
----

New in 2.1 (Oct 2017)

  o Improved Indic front end support (thanks to Suresh Bazaj @ Hear2Read)

  o 18 English voices (various accents)

  o 12 Indian voices (Bengali, Gujarati, Hindi, Kannada, Marathi, Panjabi,
    Tamil and Telugu), usually with bilingual (with English) support

  o Can do byteswap architectures [again] (ar9331 yun arduino, zsun etc)

  o flitecheck front-end test suite

  o Grapheme-based festvox builds give working flitevox voices

  o SAPI support for CG voices (thanks to Alok Parlikar @ Cobalt Speech and
    Language INC)

  o gcc 6.x support

  o .flitevox files (and models) are 40% of their previous size, but
    with the same quality

New in 2.0.0 (Dec 2014)

  o Indic language support (Hindi, Tamil and Telugu)

  o SSML support

  o CG voices as files accessible by file:/// and http://
    (and a set of 13 voices to load)

  o Random forest (multimodel support) improves voice quality

  o Supports different sample rates/mgc order to tune for speed

  o Kal diphone 500K smaller

  o Fixed lots of API issues

  o Thread safe (again) [after initialization]

  o Generalized tokenstreams (used in Bard Storyteller)

  o Simple Pulseaudio support

  o Improved Android support

  o Removed PalmOS support from distribution

  o Companion multilingual ebook reader Bard Storyteller
    https://github.com/festvox/bard

New in 1.4.1 (March 2010)

  o Better SSML support (actually does something)

  o Better clunit support (smaller)

  o Android support

New in 1.4 (December 2009)

  o Crude multi-voice selection support (may change)

  o 4 basic voices are included: 3 clustergen (awb, rms and slt) plus
    the kal diphone database

  o CMULEX now uses maximum onset for syllabification

  o ALSA support

  o Clustergen support (including mlpg with mixed excitation), but it is
    still slow on limited processors

  o Windows support with Visual Studio (specifically for the Olympus
    Spoken Dialog System)

  o WinCE support is redone with cegcc/mingw32ce, with an example
    TTS app, Flowm (Flite on Windows Mobile)

  o Speed-ups in feature interpretation, limiting calls to alloc

  o Speed-ups (and fixes) for converting clunits festvox voices

New in 1.3-release (October 2005)

  o Fixes to lpc residual extraction to give better quality output

  o An updated lexicon (festlex_CMU from festival-2.0.95) and better
    compression: it's about 30% of the previous size, with about
    the same accuracy

  o Fairly substantial code movements to better support PalmOS and
    multi-platform cross-compilation builds

  o A PalmOS 5.0 port with a small example talking app ("flop")

  o Runs under x86_64 Linux

New in 1.2-release (February 2003)

  o A build process for diphone and clunits/ldom voices: FestVox voices
    can (sometimes) be converted automatically

  o Various bug fixes

  o Initial support for Mac OS X (not talking to the audio device yet,
    but compiles and runs)

  o Text files can be synthesized to a single audio file

  o (Optional) shared library support (Linux)

Compilation
-----------

In general

    tar zxvf flite-2.1-current.tar.gz
    cd flite-2.1-current
    ./configure
    make
    make get_voices

where tar is GNU tar (gtar), and make is GNU make (gmake).

Or

    git clone http://github.com/festvox/flite
    cd flite
    ./configure
    make
    make get_voices

Configuration should be automatic, but may not work in all cases,
especially if you have some new compiler. You can explicitly set the
compiler in config/config and add any options you see fit.
Configure tries to guess these, but it may be unable to guess them for
cross-compilation cases. Interesting options there are:

    -DWORDS_BIGENDIAN=1          for big-endian machines (e.g. Sparc, M68x, ar9331)
    -DNO_UNION_INITIALIZATION=1  for compilers without C99 union initialization
    -DCST_AUDIO_NONE             if you don't need/want audio support

There are different sets of voices and languages you can select
between (and you can add your own sets by creating config/XXX.lv).
For example

    ./configure --with-langvox=transtac

will use the languages and voices defined in config/transtac.lv.

Usage:
------

The ./bin/flite binary contains all supported voices; you may choose
between them with the -voice flag and list the supported voices with
the -lv flag. Note that the kal (diphone) voice uses a different
technology from the others and is much less computationally expensive,
but more robotic. For each voice, additional binaries that contain
only that voice are created in ./bin/flite_FULLVOICENAME,
e.g. ./bin/flite_cmu_us_awb. You can also refer to an external
clustergen .flitevox voice via a pathname argument with -voice (note
that the pathname must contain at least one "/").

If it compiles properly, a binary will be put in bin/. Note that -g is
on by default, so it will be bigger than is actually required.

    ./bin/flite "Flite is a small fast run-time synthesis engine" flite.wav

will produce an 8KHz RIFF-headered waveform file (RIFF is Microsoft's
wave format, often called .WAV).

    ./bin/flite doc/alice

will play the text file doc/alice. If the first argument contains a
space it is treated as text, otherwise it is treated as a filename.
If a second argument is given, a waveform file is written to it; if
no second argument is given, or "play" is given, flite will attempt
to write directly to the audio device (if supported). If "none" is
given, the audio is simply thrown away (used for benchmarking).
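
The rules above for the two positional arguments can be restated as a
short C sketch. The enum and function names are hypothetical and this is
not flite's actual main(); it only encodes the documented behaviour: a
first argument containing a space is text, anything else is a filename,
and the second argument selects playback, discarding, or a waveform file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; flite's own argument handling may differ. */
enum input_kind  { INPUT_TEXT, INPUT_FILE };
enum output_kind { OUTPUT_PLAY, OUTPUT_NONE, OUTPUT_WAVEFILE };

/* First argument: text if it contains a space, otherwise a filename. */
static enum input_kind classify_input(const char *arg)
{
    assert(arg != NULL);
    return strchr(arg, ' ') != NULL ? INPUT_TEXT : INPUT_FILE;
}

/* Second argument: missing or "play" -> audio device,
   "none" -> discard the audio, anything else -> waveform file name. */
static enum output_kind classify_output(const char *arg)
{
    if (arg == NULL || strcmp(arg, "play") == 0)
        return OUTPUT_PLAY;
    if (strcmp(arg, "none") == 0)
        return OUTPUT_NONE;
    return OUTPUT_WAVEFILE;
}
```

For instance, classify_input("doc/alice") gives INPUT_FILE and
classify_output("none") gives OUTPUT_NONE, the benchmarking case.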

Explicit options are also available.

    ./bin/flite -v doc/alice none

will synthesize the file without playing the audio and give a summary
of the speed.

    ./bin/flite doc/alice alice.wav

will synthesize the whole of alice into a single file (previous
versions would only give the last utterance in the file, but
that is fixed now).

An additional set of feature setting options is available; these are
*debug* options. Voices are represented as sets of feature values (see
lang/cmu_us_kal/cmu_us_kal.c) and you can override values on the
command line. This can stop flite from working if malicious values
are set, and therefore this facility is not intended to be made
available to standard users. But these options are useful for
debugging. Some typical examples are:

Use simple concatenation of diphones without prosodic modification

    ./bin/flite --sets join_type=simple_join doc/intro

Print sentences as they are said

    ./bin/flite -pw doc/alice

Make it speak slower

    ./bin/flite --setf duration_stretch=1.5 doc/alice

Make it speak at a higher pitch

    ./bin/flite --setf int_f0_target_mean=145 doc/alice

The talking clock is an example talking clock as discussed on
http://festvox.org/ldom. It requires a single argument, HH:MM; under
Unix you can call it as

    ./bin/flite_time `date +%H:%M`

List the voices linked in directly in this build

    ./bin/flite -lv

Speak with the US male rms voice (builtin version)

    ./bin/flite -voice rms -f doc/alice

Speak with the "Scottish" male awb voice (builtin version)

    ./bin/flite -voice awb -f doc/alice

Speak with the US female slt voice

    ./bin/flite -voice slt -f doc/alice

Speak with the AEW voice, downloaded on the fly from festvox.org

    ./bin/flite -voice http://festvox.org/flite/packed/flite-2.1/voices/cmu_us_aew.flitevox -f doc/alice

Speak with the AHW voice, loaded from a local file

    ./bin/flite -voice voices/cmu_us_ahw.flitevox -f doc/alice

You can download the available voices into voices/

    ./bin/get_voices us_voices

and/or

    ./bin/get_voices indic_voices

Voice quality
-------------

So you've eagerly downloaded flite, compiled it and run it, and now you
are disappointed that it doesn't sound wonderful. Sure, it's fast and
small, but what you really hoped for was the dulcet tones of a deep
baritone voice that would make you desperately hang on every phrase it
mellifluously produces. But instead you get an 8KHz diphone voice that
sounds like it came from the last millennium.

Well, first, you are right: it is an 8KHz diphone voice from the last
millennium, and that was actually deliberate. As we developed flite we
wanted a voice that was stable and that we could directly compare with
that very same voice in Festival. Flite is an *engine*. We want to
be able to take voices built with the FestVox process and compile them
for flite; the result should be exactly the same quality (though of
course trading size for quality in flite is also an option). The
included voice is just a sample voice that was used in the testing
process.

We expect that voices will often be loaded from external files, and we
have now set up a voice repository in

    http://festvox.org/flite/flite-2.1/voices/*.flitevox

If you visit there with a browser you can hear the examples. You can
also download the .flitevox files to your machine so you don't need a
network connection every time you load a voice.

We are now actively adding to this list of available voices in English
(16) and other languages.
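
Voices can thus come from three places: linked into the binary, a local
.flitevox file, or a URL. The C sketch below is a hypothetical classifier
for a -voice argument (not flite's own code), based only on what this
document says: voices can be fetched via http:// or file:/// URLs, and a
pathname must contain at least one "/", so anything else is taken as a
builtin voice name. Under this rule, a .flitevox file in the current
directory would be given as ./cmu_us_ahw.flitevox rather than
cmu_us_ahw.flitevox.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of a -voice argument; not flite's code. */
enum voice_source { VOICE_BUILTIN, VOICE_URL, VOICE_PATH };

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static enum voice_source classify_voice_arg(const char *arg)
{
    assert(arg != NULL);
    /* Remote or file URLs, e.g. http://festvox.org/... or file:///... */
    if (starts_with(arg, "http://") || starts_with(arg, "file://"))
        return VOICE_URL;
    /* A pathname must contain at least one '/' to be treated as a file. */
    if (strchr(arg, '/') != NULL)
        return VOICE_PATH;
    /* Otherwise assume a voice linked into the binary, e.g. "slt". */
    return VOICE_BUILTIN;
}
```

The URL check comes first because every URL also contains a "/" and
would otherwise be mistaken for a local pathname.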

Bard Storyteller: https://github.com/festvox/bard
--------------------------------------------------

Bard is a companion app that reads ebooks, both displaying them and
actually reading them to you out loud using flite. Bard supports a
wide range of fonts and flite voices, and books in text, HTML and
EPUB format. Bard is used as an evaluation of flite's capabilities and
an example of a serious application using flite.