<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

**deepSpeech.mxnet: Rich Speech Example**
=========================================

This example, based on [Baidu's DeepSpeech2](https://arxiv.org/abs/1512.02595), helps you build Speech-To-Text (STT) models at scale using
- CNNs, fully connected networks, (Bi-) RNNs, (Bi-) LSTMs, and (Bi-) GRUs for network layers,
- batch normalization and dropout for training efficiency,
- and Warp CTC for loss calculation.

Moreover, to build your own STT model, all you need to do is edit a configuration file, not the code itself.


* * *
## **Motivation**
This example is intended to guide people who want to build practical STT models with MXNet.
With the rich functionality and convenience described above, you can build your own speech recognition models more easily than with earlier examples.


* * *
## **Environments**
- MXNet version: 0.9.5+
- GPU memory size: 2.4GB+
- Install mxboard for logging:
<pre>
<code>pip install mxboard</code>
</pre>

- [SoundFile](https://pypi.python.org/pypi/SoundFile/0.8.1) for audio preprocessing (if you encounter errors about libsndfile, follow [this tutorial](http://www.linuxfromscratch.org/blfs/view/svn/multimedia/libsndfile.html)):
<pre>
<code>pip install soundfile</code>
</pre>
- Warp CTC: Follow [these instructions](https://github.com/baidu-research/warp-ctc) to compile Baidu's Warp CTC. (Note: if you are using a V100, make sure to apply this [fix](https://github.com/baidu-research/warp-ctc/pull/118).)
- You also need to compile MXNet with Warp CTC support; follow the instructions [here](https://github.com/apache/incubator-mxnet/tree/master/example/ctc). (A quick sanity check for the build follows this list.)
- You might need to set `LD_LIBRARY_PATH` to the right path if MXNet fails to find your `libwarpctc.so`.
- **We strongly recommend that you first test a model with small networks.**
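
Before training, it can save time to verify that the interpreter you will run `main.py` with actually sees a Warp-CTC-enabled MXNet build. A minimal sketch, assuming `mxnet` is importable in that environment:

```python
import mxnet as mx

# The WarpCTC operator is only registered when MXNet was compiled
# against Baidu's warp-ctc, so its absence means the build step failed.
if not hasattr(mx.sym, 'WarpCTC'):
    raise RuntimeError('This MXNet build lacks WarpCTC support; '
                       'recompile MXNet following the ctc example.')
print('MXNet', mx.__version__, 'with WarpCTC: OK')
```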


* * *
## **How it works**
### **Preparing data**
Input data are described in a JSON file, **Libri_sample.json**, as follows.
<pre>
<code>{"duration": 2.9450625, "text": "and sharing her house which was near by", "key": "./Libri_sample/3830-12531-0030.wav"}
{"duration": 3.94, "text": "we were able to impart the information that we wanted", "key": "./Libri_sample/3830-12529-0005.wav"}</code>
</pre>
You can download the two wave files above from [this repository](https://github.com/samsungsds-rnd/deepspeech.mxnet/tree/master/Libri_sample). Put them under /path/to/yourproject/Libri_sample/.
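
Each line of the manifest is a standalone JSON object, so it can be consumed line by line. A minimal sketch of reading it and loading the referenced audio with SoundFile, assuming the two wave files above are in place:

```python
import json
import soundfile as sf

# Walk the JSON-lines manifest and load each referenced wav file.
with open('Libri_sample.json') as manifest:
    for line in manifest:
        sample = json.loads(line)
        audio, sample_rate = sf.read(sample['key'])
        # The measured length should agree with the recorded duration.
        print(sample['key'], sample_rate, round(len(audio) / sample_rate, 4),
              sample['duration'], sample['text'])
```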


### **Setting the configuration file**
**[Notice]** The included configuration file "default.cfg" describes DeepSpeech2 with slight changes. You can test the original DeepSpeech2 ("deepspeech.cfg") with a few line changes to the cfg file:
<pre><code>
[common]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
is_bi_graphemes = True
...
[arch]
...
num_rnn_layer = 7
num_hidden_rnn_list = [1760, 1760, 1760, 1760, 1760, 1760, 1760]
num_hidden_proj = 0
num_rear_fc_layers = 1
num_hidden_rear_fc_list = [1760]
act_type_rear_fc_list = ["relu"]
...
[train]
...
learning_rate = 0.0003
# constant learning rate annealing by factor
learning_rate_annealing = 1.1
optimizer = sgd
...
</code></pre>
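
The cfg files are standard INI-style files, with list-valued options written as Python/JSON literals. A minimal sketch of how such values can be parsed; the example's own loader is config_util.py, so treat this stand-in as an illustration of the format only:

```python
import json
import configparser

cfg = configparser.ConfigParser()
cfg.read('deepspeech.cfg')

# Scalar options parse directly...
num_rnn_layer = cfg.getint('arch', 'num_rnn_layer')
learning_rate = cfg.getfloat('train', 'learning_rate')
# ...while list-valued options need an explicit literal parse.
num_hidden_rnn_list = json.loads(cfg.get('arch', 'num_hidden_rnn_list'))
assert len(num_hidden_rnn_list) == num_rnn_layer
```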


* * *
## **Run the example**
### **Train**
<pre><code>cd /path/to/your/project/
mkdir checkpoints
mkdir log
python main.py --configfile default.cfg</code></pre>
Checkpoints of the model will be saved at every n-th epoch.
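
If the checkpoints are saved in MXNet's standard format (as the checkpoints/ layout suggests), they can be inspected or reloaded with the standard API. A small sketch, where the prefix and epoch are placeholders for whatever appears under your checkpoints/ directory:

```python
import mxnet as mx

# load_checkpoint returns the symbol plus the trained parameters.
sym, arg_params, aux_params = mx.model.load_checkpoint(
    'checkpoints/your_model_prefix', 20)  # hypothetical prefix and epoch
print(sym.list_arguments()[:5])
```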

### **Load**
You can (re-)train saved models by loading checkpoints (numbered starting from 0). For this, you need to modify only two lines of the file "default.cfg".
<pre><code>...
[common]
# mode can be one of the following - train, predict, load
mode = load
...
model_file = 'file name of your saved model'
...</code></pre>


### **Predict**
You can predict (or test) audio files by specifying the mode, model, and test data in the file "default.cfg".
<pre><code>...
[common]
# mode can be one of the following - train, predict, load
mode = predict
...
model_file = 'file name of the model to be tested'
...
[data]
...
test_json = 'a json file describing the test audios'
...</code></pre>
<br />
Run the following line after making all the modifications explained above.
<pre><code>python main.py --configfile default.cfg</code></pre>
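
The example computes its own accuracy metrics in stt_metric.py. If you want a quick, standalone comparison of a predicted transcript against a reference, an edit-distance-based word error rate along these lines is the usual measure (illustrative only):

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein edit distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits to turn the first i ref words into the first j hyp words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution
    return dist[len(ref)][len(hyp)] / float(len(ref))

print(word_error_rate("we were able to impart the information",
                      "we were able to import the information"))  # 1/7
```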


* * *
## **Train and test your own models**

Train and test your own models by preparing two files.
1) A new configuration file, e.g. custom.cfg, corresponding to the file 'default.cfg'. The new file should specify the items under the '[arch]' section of the original file.
2) A new implementation file, e.g. arch_custom.py, corresponding to the file 'arch_deepspeech.py'. The new file should implement two functions, prepare_data() and arch(), that build the networks described in the new configuration file. (A hypothetical skeleton follows this list.)
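
A hypothetical skeleton of such a file is sketched below; consult arch_deepspeech.py for the exact signatures and return values that main.py expects, since those are authoritative:

```python
# arch_custom.py -- illustrative skeleton only, modeled on arch_deepspeech.py
import mxnet as mx


def prepare_data(args):
    # Derive input shapes and related metadata from the parsed
    # configuration (the [arch] items of custom.cfg).
    ...


def arch(args):
    # Build and return the network symbol described by custom.cfg,
    # e.g. by stacking layers from the stt_layer_* modules.
    data = mx.sym.Variable('data')
    ...
```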

Run the following line after preparing the files.
<pre><code>python main.py --configfile custom.cfg --archfile arch_custom</code></pre>

* * *
## **Furthermore**
You can prepare the full LibriSpeech dataset by following the instructions at https://github.com/baidu-research/ba-dls-deepspeech.
**Replace Baidu's flac_to_wav.sh script with the flac_to_wav.sh in this repository to avoid a bug.**
```bash
git clone https://github.com/baidu-research/ba-dls-deepspeech
cd ba-dls-deepspeech
./download.sh
cp -f /path/to/example/flac_to_wav.sh ./
./flac_to_wav.sh
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/train-clean-100 train_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/dev-clean validation_corpus.json
python create_desc_json.py /path/to/ba-dls-deepspeech/LibriSpeech/test-clean test_corpus.json
```