README

Microsoft Speech API 5.1 support for CMU Flite
----------------------------------------------

Copyright Cepstral, LLC, 2001 all rights reserved

David Huggins-Daines <dhd@cepstral.com>
December 6th, 2001

About the Flite SAPI port
-------------------------

Funding for this work was provided by the Instituto de Engenharia de
Sistemas e Computadores (Lisbon, Portugal).  The port itself was done
by David Huggins-Daines at Cepstral, LLC (Pittsburgh, USA).

This work remains Copyright Cepstral, LLC but is distributed under
the same free software licence as CMU Flite.

What's here
-----------

This directory contains a port of CMU Flite 1.1 and the included 8kHz
diphone voice to the Win32 platform under Visual C++, as well as an
interface library that allows Flite voices to be compiled into COM
objects which implement the TTS engine interfaces for Microsoft's
Speech API 5.1 (SAPI).

What isn't here
---------------

There isn't a pointy-clicky tool for automatically converting voices
to SAPI objects.  You are more than welcome to write one; it should be
fairly straightforward to do as a Visual Studio wizard or add-in or
whatever they call those things.

Instead, the procedure for making SAPI objects is documented below.
Feel free to ask me questions at the address above if you don't
understand some parts of it.

Some parts of the SAPI interface code are language specific, namely
the viseme and phoneme translation code, and to some extent the text
processing code.  These have been implemented for US English only,
although the relevant functions are abstracted with function pointers
so each voice is free to choose its own.  See "Language-Specific
Functions" below for more information.

Building and testing the example voice
--------------------------------------

In order to build the Flite SAPI code and example diphone voice, you
will need a copy of Visual C++ 6.0 or later, as well as the full SDK
for SAPI 5.1.  If you have the Microsoft Platform SDK, you will need
to make sure to install the Internet Explorer SDK as well, as it
contains some IDL files that are required in order to build COM
objects with the Platform SDK.

Before you build anything, you will need to set up Visual C++ to find
the SAPI header and IDL files.  To do this, select "Options" from the
"Tools" menu.  In the resulting dialog box, switch to the
"Directories" tab.  Now, add the "include" and "IDL" directories from
your SAPI SDK installation to the list.  If you performed a default
installation of the SAPI SDK on the C: drive, they will be:

  C:\Program Files\Microsoft Speech SDK 5.1\Include
  C:\Program Files\Microsoft Speech SDK 5.1\IDL

Make sure the build type is set to "Win32 - Debug" in Visual C++.  Set
the active project to "FliteCMUKalDiphone".  Select "Build
FliteCMUKalDiphone.dll" from the "Build" menu.  This will build all of
the other libraries before building the SAPI object.

The first time you use the voice, you will need to register it with
Windows.  To do this, build the "register_vox" project and execute
register_vox.exe (you can find it in the
FliteCMUKalDiphone\register_vox\Debug subdirectory, or you can simply
run it from within Visual C++).

Although, normally, Visual C++ should automatically register the
FliteCMUKalDiphone object as a COM server, in some cases you may have
to do it manually.  This can be done by running 'regsvr32
FliteCMUKalDiphone.dll' on the command-line from the build directory
(sapi\FliteCMUKalDiphone\Debug).

Now you can test the voice by running the SAPI "TTSApp" example
program, or any of the other examples included with the SAPI SDK.

How to SAPI-enable your own Flite voices
----------------------------------------

First, you'll obviously need to build your voice under Visual C++.
This is relatively straightforward.  It's probably best to build it as
a static library.  You'll need to make sure that it can find the Flite
header files as well as the ones for your language and lexicon, which
probably means adding their paths in the "Preprocessor" category of
the "C/C++" tab of the Project Settings dialog box.

One thing to be aware of is that the Microsoft linker will probably
break if you have very large voice data files as C code.  To work
around this problem, you can change "Debug Info" from "Program
Database for Edit and Continue" to "Program Database" in the "General"
category of the "C/C++" tab in the Project Settings dialog box.

Since there is no support for dynamically discovering and loading
voices in Flite, each SAPI voice links to its own instance of the
engine.  This also simplifies the distribution and installation of
voices considerably.

The common code used to implement the SAPI interface is in the
FliteTTSEngineObj library.  Your voice should create a subclass of
CFliteTTSEngineObj.  In the minimal case, you only need to provide a
constructor that sets the 'regfunc' and 'unregfunc' members of the
base class to the registration and unregistration functions defined in
your Flite voice.
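The subclassing pattern described above can be sketched without any of
the SAPI or ATL machinery.  The following is only an illustration, not
the real CFliteTTSEngineObj: EngineBase, DemoVoiceObj, the cst_voice
stand-in and the demo_register_vox/demo_unregister_vox functions are all
hypothetical names invented for this sketch.  Only the 'regfunc' and
'unregfunc' member names and their signatures follow this README.

```cpp
#include <string>

// Stand-in for Flite's cst_voice (the real type comes from the Flite headers).
struct cst_voice { std::string voxdir; };

// Hypothetical entry points playing the role of REGISTER_VOX and
// UNREGISTER_VOX from a voice library.
static cst_voice *demo_register_vox(const char *voxdir) {
    return new cst_voice{voxdir ? voxdir : ""};
}
static void demo_unregister_vox(cst_voice *vox) { delete vox; }

// Simplified stand-in for the common base class: it reaches the voice
// only through two function pointers that the subclass fills in.
class EngineBase {
public:
    EngineBase() : regfunc(nullptr), unregfunc(nullptr), voice(nullptr) {}
    virtual ~EngineBase() { unload(); }

    // Load the voice through regfunc; fails if no subclass set it.
    bool load(const char *voxdir) {
        if (!regfunc) return false;
        unload();
        voice = regfunc(voxdir);
        return voice != nullptr;
    }
    // Release the voice through unregfunc, if one is loaded.
    void unload() {
        if (voice && unregfunc) unregfunc(voice);
        voice = nullptr;
    }
    const cst_voice *current() const { return voice; }

protected:
    cst_voice *(*regfunc)(const char *voxdir);
    void (*unregfunc)(cst_voice *vox);
    cst_voice *voice;
};

// The "minimal case": a constructor that sets the two pointers.
class DemoVoiceObj : public EngineBase {
public:
    DemoVoiceObj() {
        regfunc = demo_register_vox;
        unregfunc = demo_unregister_vox;
    }
};
```

Constructing a DemoVoiceObj and calling load() with a directory goes
through demo_register_vox, while a bare EngineBase refuses to load
because nothing has set its pointers.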

To set up the voice object, create a new Visual C++ project, using the
project type "ATL COM AppWizard".  For "Server Type", choose "Dynamic
Link Library (DLL)".

Now you must create a definition for the voice object.  To do this,
switch to "ClassView" in the sidebar, right-click on the project name,
and choose "New ATL Object...".  Then, from the selection box, choose
"Simple Object".

In the "Names" tab of the next dialog box ("ATL Object Wizard
Properties"), choose whatever names you like for your class.  In the
"Attributes" tab, select "Both" in the "Threading Model" section, and
"Custom" in the "Interface" section.

Next, you need to edit the IDL file to import and use the relevant
SAPI interfaces.  You should be able to find this file in the "Source
Files" section of your new project in the "FileView" tab in the
sidebar.  If your project name was "FooVoice", the IDL file will be
called "FooVoice.idl".  In order to import the SAPI interfaces, you
should add the following line to the list of 'import' statements at
the top of the file:

  import "sapiddk.idl";

In order for Visual C++ to find this import, you will have to add the
SAPI IDL directory to the Tools->Options dialog box as detailed
above (do not add it to the Project Settings, because MIDL.EXE is
broken and will not accept it).

Just underneath the list of import statements, Visual C++ will have
created a bogus interface definition for your object, which will look
like this (the UUID and interface name will be different, of course):

        [
                object,
                uuid(51284204-38B4-48C4-B65E-4FDAF6476D13),

                helpstring("IFooVoiceObj Interface"),
                pointer_default(unique)
        ]
        interface IFooVoiceObj : IUnknown
        {
        };

You should delete this section.  Then, you should change the list of
interfaces in the 'coclass' section to use the SAPI TTS Engine
interfaces.  It should look like this (replace "FooVoiceObj" with the
name of your voice object):

        coclass FooVoiceObj
        {
                [default] interface ISpTTSEngine;
                interface ISpObjectWithToken;
        };

Now you will need to edit the source code for your voice object.  As
noted above, in the minimal case, you will simply need to include
"voxdefs.h" from your voice library, inherit from CFliteTTSEngineObj,
provide declarations for REGISTER_VOX and UNREGISTER_VOX, and
initialize the pointers to them in a constructor.

To do this, open the header file for your voice object.  It can be
found in the "Header Files" section for your project in the "FileView"
tab in the sidebar.  If your voice object name (as entered in the
"Names" tab in the "ATL Object Wizard Properties" dialog box above)
was "FooVoiceObj", then this file will be called
"FooVoiceObj.h".

First, add these declarations underneath the #include statements at
the top of the file:

  #include "FliteTTSEngineObj.h"
  #include "flite_sapi_usenglish.h"
  extern "C" {
  #include "voxdefs.h"
  cst_voice *REGISTER_VOX(const char *voxdir);
  void UNREGISTER_VOX(cst_voice *vox);
  }

You will need to either put the full path to voxdefs.h in the #include
statement, or add the directory containing your voice's source code to
the list of extra include directories in the "Preprocessor" category
of the "C/C++" tab of the Project Settings dialog box.  You may also
need to do the same for "FliteTTSEngineObj.h" (if you create your
voice within the "flite_sapi" workspace included here, you can enter
"..\FliteTTSEngineObj" here).

Next, you must adjust the inheritance list for your voice's class.
Remove the following lines marked with '-' and add the line marked
with '+':

-       public CComObjectRootEx<CComMultiThreadModel>,
        public CComCoClass<CFooVoiceObj, &CLSID_FooVoiceObj>,
-       public IFooVoiceObj
+       public CFliteTTSEngineObj

You must also change the COM interface map to contain the SAPI
interfaces, by making the following changes (as above, remove the
lines marked with '-' and add those marked with '+'):

  BEGIN_COM_MAP(CFooVoiceObj)
-       COM_INTERFACE_ENTRY(IFooVoiceObj)
+       COM_INTERFACE_ENTRY(ISpTTSEngine)
+       COM_INTERFACE_ENTRY(ISpObjectWithToken)
  END_COM_MAP()

Finally, add code to the constructor to set the 'regfunc' and
'unregfunc' members, and the language-specific functions, if you have
them, by adding the lines marked with '+':

  public:
          CFooVoiceObj() {
+               regfunc = REGISTER_VOX;
+               unregfunc = UNREGISTER_VOX;
+               phonemefunc = flite_sapi_usenglish_phoneme;
+               visemefunc = flite_sapi_usenglish_viseme;
+               featurefunc = flite_sapi_usenglish_feature;
+               pronouncefunc = flite_sapi_usenglish_pronounce;
          }

Before you build the SAPI object, you will need to add the voice
library and the Flite libraries (flite.lib, plus the libraries for the
lexicon and language model, which are cmulex.lib and usenglish.lib for
US English) to the list of extra libraries (in the "Input" section of
the "Linker" tab of the Project Settings dialog box in Visual C++).
You also need to include "winmm.lib" here as it is required by the
Flite library.  You'll also need to make sure that Visual C++ can find
the Flite libraries - you can either set their projects up as
dependencies of your SAPI object's project, or you can add a list of
relative paths to their build directories in the Project Settings.

Registering and testing your new SAPI voice object
--------------------------------------------------

Now that you've built your voice as a SAPI component, you must
register it with the system so that it can be found and used by
programs using the SAPI interface.  Predictably, this involves
twiddling bits in the Windows Registry.

Source code and a Visual C++ project are provided (in register-vox.cpp)
for a small command-line program which performs the necessary
operations for the CMU diphone voice.  To use it for another voice,
you will need to make the modifications noted by /* CHANGEME */
comments in the source code.

To test your SAPI voice, use the TTSApp program included with the SAPI
SDK.  You may find it helpful to build the debugging version of TTSApp
from source code and specify it as the executable to run when
debugging for your SAPI object's project.  This will allow you to set
breakpoints in your code and get proper backtraces and so forth.

Language-specific functions
---------------------------

The language-specific functions for US English are contained in the
files flite_sapi_usenglish.c and flite_sapi_usenglish.h.  The SAPI
object is set up to use them by initializing four function pointers
which are members of class CFliteTTSEngineObj:

  int (*phonemefunc)(cst_item *s);

This function takes a cst_item representing a single phone (usually a
member of the "Segment" relation) and returns the appropriate SAPI
phone ID for it.

  int (*visemefunc)(cst_item *s);

This function takes a cst_item representing a single phone and returns
the appropriate SAPI viseme ID for it.  The SAPI visemes are
potentially language independent, though they are expressed in the
documentation in terms of US English phonemes.  A more general
description of them is included below.

SP_VISEME_0     silence
SP_VISEME_1     low mid/front unrounded vowels (ae, ah, ax)
SP_VISEME_2     low back unrounded vowels (aa)
SP_VISEME_3     low/mid-low back rounded vowels (ao)
SP_VISEME_4     mid front unrounded vowels (eh, ey)
SP_VISEME_5     English mid rhotic vowel (er)
SP_VISEME_6     high front unrounded vowels and glides (ih, iy, y)
SP_VISEME_7     high back rounded vowels and glides (uw, w)
SP_VISEME_8     rounded-to-rounded rising diphthongs (ow)
SP_VISEME_9     unrounded-to-rounded rising diphthongs (aw)
SP_VISEME_10    rounded-to-unrounded rising diphthongs (oy)
SP_VISEME_11    unrounded-to-unrounded rising diphthongs (ay)
SP_VISEME_12    English glottal fricative (hh)
SP_VISEME_13    English retroflex approximant (r)
SP_VISEME_14    English lateral approximant (l)
SP_VISEME_15    grooved alveolar/dental fricatives (s, z)
SP_VISEME_16    palatal fricatives and affricates (sh, zh, ch, jh)
SP_VISEME_17    interdental fricatives (th, dh)
SP_VISEME_18    labiodental fricatives (f, v)
SP_VISEME_19    alveolar/dental occlusives (d, t, n)
SP_VISEME_20    velar occlusives (k, g, ng)
SP_VISEME_21    bilabial occlusives (p, b, m)

When adapting them to your phoneset, remember that the position of the
lips and teeth as viewed from the front is more important than the
place or manner of articulation /per se/.

Also, while the US English code implements this using a table lookup,
it may be more appropriate to determine them algorithmically from the
feature values used in your phoneset.
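As a rough illustration of the table-lookup approach, the sketch below
maps phone names to viseme numbers using exactly the groupings listed in
the table above.  It is not the code from flite_sapi_usenglish.c: the
function name viseme_for_phone is hypothetical, it takes a phone name
rather than a cst_item, it returns a plain integer N standing for
SP_VISEME_N, and treating unknown phones as silence is an assumption
made for this example.

```cpp
#include <map>
#include <string>

// Return N for SP_VISEME_N, using the phone groupings from the table
// above.  Phones that are not listed fall back to 0 (silence); that
// fallback is an assumption of this sketch.
int viseme_for_phone(const std::string &phone) {
    static const std::map<std::string, int> table = {
        {"ae", 1},  {"ah", 1},  {"ax", 1},
        {"aa", 2},  {"ao", 3},
        {"eh", 4},  {"ey", 4},  {"er", 5},
        {"ih", 6},  {"iy", 6},  {"y", 6},
        {"uw", 7},  {"w", 7},
        {"ow", 8},  {"aw", 9},  {"oy", 10}, {"ay", 11},
        {"hh", 12}, {"r", 13},  {"l", 14},
        {"s", 15},  {"z", 15},
        {"sh", 16}, {"zh", 16}, {"ch", 16}, {"jh", 16},
        {"th", 17}, {"dh", 17},
        {"f", 18},  {"v", 18},
        {"d", 19},  {"t", 19},  {"n", 19},
        {"k", 20},  {"g", 20},  {"ng", 20},
        {"p", 21},  {"b", 21},  {"m", 21},
    };
    const auto it = table.find(phone);
    return it == table.end() ? 0 : it->second;
}
```

A real visemefunc would first pull the phone name out of the cst_item
it is given and then return the matching SP_VISEME_N constant.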

  int (*featurefunc)(cst_item *s);

This function returns a bitmask used by SAPI to indicate placement of
stress or emphasis (see the pages on the SPEVENTENUM and SPVFEATURE
enumerations in the SAPI documentation for more information on this).
If you use "stressed" and "accented" as the feature names for these
things in your language code, you can just copy
flite_sapi_usenglish_feature().
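The bitmask idea can be sketched as follows.  This is not a copy of
flite_sapi_usenglish_feature(): DemoSyllable and demo_feature are
hypothetical names, the two boolean fields stand in for the "stressed"
and "accented" features, and the two constants mirror the bit values of
SAPI's SPVFEATURE_STRESSED and SPVFEATURE_EMPHASIS so that the example
builds without the SAPI headers.  Mapping "accented" onto the emphasis
bit is an assumption of this sketch.

```cpp
// Local copies of the SPVFEATURE bit values, so the sketch does not
// need sapi.h.
enum DemoFeature {
    DEMO_FEATURE_STRESSED = 1 << 0,
    DEMO_FEATURE_EMPHASIS = 1 << 1
};

// Hypothetical stand-in for the features a real featurefunc would read
// from a cst_item.
struct DemoSyllable {
    bool stressed;
    bool accented;
};

// Combine the two flags into one bitmask.
int demo_feature(const DemoSyllable &s) {
    int mask = 0;
    if (s.stressed) mask |= DEMO_FEATURE_STRESSED;
    if (s.accented) mask |= DEMO_FEATURE_EMPHASIS;  // assumed mapping
    return mask;
}
```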

  cst_val *(*pronouncefunc)(SPPHONEID *spids);

This function takes a zero-terminated array of SAPI phone identifiers
and converts it to a list of cst_val containing the phone names as
used by Flite as strings.  You will probably want to simply copy
flite_sapi_usenglish_pronounce(), changing the tables it uses to look
up phone names.
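The conversion amounts to walking the array until the terminating zero
and looking each identifier up in a table.  The sketch below shows that
loop in isolation.  Everything in it is hypothetical: DemoPhoneId stands
in for SPPHONEID, a std::vector of strings stands in for the cst_val
list, and the three-entry table is a placeholder, not the real SAPI
phone numbering.  Skipping unknown identifiers is also just a choice
made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for SAPI's SPPHONEID.
typedef unsigned short DemoPhoneId;

// Convert a zero-terminated array of phone IDs into phone names.
std::vector<std::string> demo_pronounce(const DemoPhoneId *ids) {
    // Placeholder table indexed by phone ID; index 0 is the terminator.
    static const char *const names[] = {nullptr, "aa", "ae", "ah"};
    const std::size_t count = sizeof(names) / sizeof(names[0]);

    std::vector<std::string> phones;
    for (; ids != nullptr && *ids != 0; ++ids) {
        const std::size_t id = static_cast<std::size_t>(*ids);
        if (id < count && names[id] != nullptr) {
            phones.push_back(names[id]);
        }
        // IDs outside the table are skipped in this sketch.
    }
    return phones;
}
```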

README_CG

Microsoft Speech API 5.1 support for CLUSTERGEN Voices
------------------------------------------------------

Copyright (2015) Cobalt Speech and Language Inc

Alok Parlikar <alok@cobaltspeech.com>
December 2015

------------------------------------------------------

To use CLUSTERGEN voices under SAPI,
1. Open the "flite_sapi.sln" solution with Visual Studio.
2. Go to Build -> Batch Build
3. Choose to build FliteCMUGenericCG in Release mode for x64 and Win32.

This will create the following files:
  flite/sapi/Win32/Release/FliteCMUGenericCG_Win32.dll
  flite/sapi/x64/Release/FliteCMUGenericCG_x64.dll

In order to use these libraries, you will need to register them
and use them with a flitevox file.

ASSUMPTION: Your voice file is in C:\xyz.flitevox

1. Register (regsvr32) the DLL file that matches your system (Win32 or x64).

2. Create registry entries for each flitevox file you would like to use.

Here is an example of registry data you need to create:
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001"; ValueType: string; ValueName: "CLSID"; ValueData: "{435A0515-F568-4A0A-B5A3-42844348602F}"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001"; ValueType: string; ValueData: "Flite Voice 01"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001"; ValueType: string; ValueName: "409"; ValueData: "Flite Voice 01"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001"; ValueType: string; ValueName: "voxdir"; ValueData: "C:\xyz.flitevox"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001\Attributes"; ValueType: string; ValueName: "Gender"; ValueData: "Male"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001\Attributes"; ValueType: string; ValueName: "Name"; ValueData: "Flite Voice 01"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001\Attributes"; ValueType: string; ValueName: "Language"; ValueData: "409;9"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001\Attributes"; ValueType: string; ValueName: "Age"; ValueData: "Adult"
"SOFTWARE\Microsoft\Speech\Voices\Tokens\Flite001\Attributes"; ValueType: string; ValueName: "Vendor"; ValueData: "Flite"
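If you have several flitevox files, writing these entries by hand gets
tedious.  One possible shortcut is to generate a .reg file per voice
and import it.  The sketch below builds that text for the example
values above.  The function names reg_escape and flite_token_reg are
hypothetical, and the HKEY_LOCAL_MACHINE root is an assumption, since
the listing above does not name a root key.  The Gender, Language, Age
and Vendor values are simply copied from the example.

```cpp
#include <string>

// Escape backslashes and double quotes, as .reg string values require.
std::string reg_escape(const std::string &s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Build the text of a .reg file for one voice token.
// ASSUMPTION: the token lives under HKEY_LOCAL_MACHINE.
std::string flite_token_reg(const std::string &token,
                            const std::string &name,
                            const std::string &clsid,
                            const std::string &voxfile) {
    const std::string key =
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Speech\\Voices\\Tokens\\" +
        token;
    std::string r = "Windows Registry Editor Version 5.00\n\n";
    r += "[" + key + "]\n";
    r += "@=\"" + reg_escape(name) + "\"\n";          // default value
    r += "\"CLSID\"=\"" + reg_escape(clsid) + "\"\n";
    r += "\"409\"=\"" + reg_escape(name) + "\"\n";
    r += "\"voxdir\"=\"" + reg_escape(voxfile) + "\"\n\n";
    r += "[" + key + "\\Attributes]\n";
    r += "\"Gender\"=\"Male\"\n";
    r += "\"Name\"=\"" + reg_escape(name) + "\"\n";
    r += "\"Language\"=\"409;9\"\n";
    r += "\"Age\"=\"Adult\"\n";
    r += "\"Vendor\"=\"Flite\"\n";
    return r;
}
```

Calling flite_token_reg("Flite001", "Flite Voice 01", the CLSID shown
above, and "C:\\xyz.flitevox") yields text that can be saved to a file
and imported; check it against the listing before relying on it.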

################################################################################
43