<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

# Connectionist Temporal Classification

[Connectionist Temporal Classification](https://www.cs.toronto.edu/~graves/icml_2006.pdf) (CTC) is a cost function used to train Recurrent Neural Networks (RNNs) to label unsegmented input sequences in supervised learning. For example, in a speech recognition application, a typical cross-entropy loss requires the input signal to be segmented into words or sub-words. With CTC loss, however, a single unaligned label sequence per input sequence is sufficient for the network to learn both the alignment and the labeling. Baidu's warp-ctc page contains a more detailed [introduction to CTC loss](https://github.com/baidu-research/warp-ctc#introduction).

## LSTM OCR Example
In this example, we use CTC loss to train a network on Optical Character Recognition (OCR) of CAPTCHA images. The example uses the `captcha` Python package to generate a random dataset for training. Training the network requires a CTC-loss layer, and MXNet provides two options for such a layer. The OCR example is constructed as follows:

1. 80x30 CAPTCHA images containing 3 to 4 random digits are generated using the `captcha` Python library.
2. Each image is used as a data sequence, with a sequence length of 80 and a vector length of 30.
3. The output layer uses CTC loss in training and softmax in inference.
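Step 2 above amounts to treating each of the 80 pixel columns of a CAPTCHA image as one time step of the input sequence. A minimal NumPy sketch, in which a random array stands in for a real generated image:

```python
import numpy as np

# A generated CAPTCHA is an 80x30 image; as an array it is (height, width).
image = np.random.rand(30, 80)

# Treat each pixel column as one time step: 80 steps, each a 30-dim vector.
sequence = image.T
print(sequence.shape)  # (80, 30)
```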

Note: When using CTC loss, one output label is reserved as the blank label. In this example, where the digits 0 through 9 are predicted, the softmax output therefore has 11 labels: label 0 is the blank, and labels 1 through 10 correspond to digits 0 through 9, respectively.
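This label convention, together with the standard greedy CTC decoding step (collapse repeated labels, then drop blanks), can be sketched as follows; `ctc_greedy_decode` is an illustrative name, not a function from this example:

```python
BLANK = 0  # label 0 is reserved for the CTC blank

def ctc_greedy_decode(labels):
    """Collapse repeated labels, drop blanks, and map labels back to digits."""
    decoded = []
    previous = BLANK
    for label in labels:
        # Keep a label only if it is non-blank and differs from its predecessor.
        if label != BLANK and label != previous:
            decoded.append(label - 1)  # label k (1..10) encodes digit k-1
        previous = label
    return decoded

print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 0, 3]))  # [0, 1, 2]
```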

### Description of the files
The LSTM-OCR example contains the following files:
* `captcha_generator.py`: Module for generating random 3- or 4-digit CAPTCHA images for training. It also contains a script for writing sample CAPTCHA images to an output file for inference testing.
* `ctc_metrics.py`: Module for calculating prediction accuracy during training. Two accuracy measures are implemented: a simple measure, the number of correct predictions divided by the total number of predictions, and a second measure based on the sum of the Longest Common Subsequence (LCS) ratios of all predictions divided by the total number of predictions.
* `hyperparams.py`: Contains all hyperparameters for the network structure and training.
* `lstm.py`: Contains the LSTM network implementations, with options for adding mxnet-ctc or warp-ctc loss for training, as well as softmax for inference.
* `lstm_ocr_infer.py`: Script for running inference after training.
* `lstm_ocr_train.py`: Script for training with ctc or warp-ctc loss.
* `multiproc_data.py`: A module for multiprocess data generation.
* `ocr_iter.py`: A DataIter module for iterating through the training data.
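The LCS-based accuracy measure described above can be sketched in plain Python as follows; the function names are illustrative and need not match `ctc_metrics.py` exactly:

```python
def lcs_length(a, b):
    """Length of the Longest Common Subsequence of two sequences (classic DP)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def lcs_accuracy(predictions, labels):
    """Average LCS ratio (LCS length / label length) over all predictions."""
    ratios = [lcs_length(p, l) / len(l) for p, l in zip(predictions, labels)]
    return sum(ratios) / len(ratios)

print(lcs_accuracy([[1, 2, 3]], [[1, 2, 4]]))  # ~0.667
```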

## CTC-loss in MXNet
MXNet supports two CTC-loss layers in the Symbol API:

* `mxnet.symbol.contrib.ctc_loss` is implemented in MXNet and included as part of the standard package.
* `mxnet.symbol.WarpCTC` uses Baidu's warp-ctc library and requires building both the warp-ctc library and MXNet from source.

### Building MXNet with warp-ctc
In order to use the `mxnet.symbol.WarpCTC` layer, you first need to build Baidu's [warp-ctc](https://github.com/baidu-research/warp-ctc) library from source and then build MXNet from source with the warp-ctc config flags enabled.

#### Building warp-ctc
Follow the [instructions here](https://github.com/baidu-research/warp-ctc#compilation) to build warp-ctc from source. Once compiled, install the library by running the following command from the `warp-ctc/build` directory:
```
$ sudo make install
```

#### Building MXNet from source with warp-ctc integration
To build MXNet from source, follow the [instructions here](https://mxnet.apache.org/install/index.html). After choosing your system configuration, Python environment, and the "Build from Source" option, and before running `make` in step 4, enable warp-ctc integration by uncommenting the following lines in `make/config.mk` in the `incubator-mxnet` directory:
```
WARPCTC_PATH = $(HOME)/warp-ctc
MXNET_PLUGINS += plugin/warpctc/warpctc.mk
```

## Run LSTM OCR Example
Running this example requires the following prerequisites:
* The `captcha` and `opencv` Python packages are installed:
```
$ pip install captcha
$ pip install opencv-python
```
* You have access to one (or more) `ttf` font files. You can download a collection of font files from [Ubuntu's website](https://design.ubuntu.com/font/). The instructions in this section assume that a `./font/Ubuntu-M.ttf` file exists under the `example/ctc/` directory.

### Training
The training script demonstrates how to construct a network with both CTC-loss options and train it using the `mxnet.Module` API. Training is done by generating random CAPTCHA images using the font(s) provided. This example uses 80x30 CAPTCHA images that contain 3 to 4 digits each.

When training on a GPU, data generation becomes the bottleneck. To remedy this, the example generates data in multiple processes. The number of image-generation processes, as well as whether to train on CPU or GPU, can be configured using command line arguments.

To see the list of all arguments:
```
$ python lstm_ocr_train.py --help
```
From the command line, you can also select between the ctc and warp-ctc loss options. For example, the following command starts a training session on a single GPU with 4 CAPTCHA-generating processes, using ctc loss and the `font/Ubuntu-M.ttf` font file:
```
$ python lstm_ocr_train.py --gpu 1 --num_proc 4 --loss ctc font/Ubuntu-M.ttf
```

You can train with multiple fonts by instead specifying a folder that contains multiple `ttf` font files. Training saves a checkpoint after each epoch. The checkpoint prefix is 'ocr' by default, but it can be changed with the `--prefix` argument.

When testing this example, the following system configuration was used:
* p2.xlarge AWS EC2 instance (4 x CPU and 1 x K80 GPU)
* Deep Learning Amazon Machine Image (with mxnet 1.0.0)

This training example finishes after 100 epochs with ~87% accuracy. If you continue training further, the network achieves over 95% accuracy. Similar accuracy is achieved with both the ctc (`--loss ctc`) and warp-ctc (`--loss warpctc`) options. Logs of the last training epoch:

```
05:58:36,128 Epoch[99] Batch [50]	Speed: 1067.63 samples/sec	accuracy=0.877757
05:58:42,119 Epoch[99] Batch [100]	Speed: 1068.14 samples/sec	accuracy=0.859688
05:58:48,114 Epoch[99] Batch [150]	Speed: 1067.73 samples/sec	accuracy=0.870469
05:58:54,107 Epoch[99] Batch [200]	Speed: 1067.91 samples/sec	accuracy=0.864219
05:58:58,004 Epoch[99] Train-accuracy=0.877367
05:58:58,005 Epoch[99] Time cost=28.068
05:58:58,047 Saved checkpoint to "ocr-0100.params"
05:59:00,721 Epoch[99] Validation-accuracy=0.868886
```

### Inference
The inference script demonstrates how to load a network from a checkpoint, modify its final layer, and predict the label of a CAPTCHA image using the `mxnet.Module` API. You can choose the prefix as well as the epoch number of the checkpoint using command line arguments. To see the full list of arguments:
```
$ python lstm_ocr_infer.py --help
```
For example, to predict the label of the 'sample.jpg' file using the 'ocr' prefix and the checkpoint from epoch 100:
```
$ python lstm_ocr_infer.py --prefix ocr --epoch 100 sample.jpg

Digits: [0, 0, 8, 9]
```

Note: The above command expects the following files, generated by the training script, to exist in the current directory:
* ocr-symbol.json
* ocr-0100.params
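These names follow MXNet's `prefix-symbol.json` / `prefix-NNNN.params` checkpoint convention, which can be sketched as:

```python
def checkpoint_files(prefix, epoch):
    """File names produced when a checkpoint is saved for a given prefix/epoch."""
    symbol_file = "%s-symbol.json" % prefix            # network definition
    params_file = "%s-%04d.params" % (prefix, epoch)   # weights after that epoch
    return symbol_file, params_file

print(checkpoint_files("ocr", 100))  # ('ocr-symbol.json', 'ocr-0100.params')
```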

#### Generate CAPTCHA samples
CAPTCHA images can be generated using the `captcha_generator.py` script. To see the list of all arguments:
```
$ python captcha_generator.py --help
```
For example, to generate a CAPTCHA image with random digits from 'font/Ubuntu-M.ttf' and save it to the 'sample.jpg' file:
```
$ python captcha_generator.py font/Ubuntu-M.ttf sample.jpg
```