<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

Word-Level Language Modeling
===========
This example trains a multi-layer LSTM on the Sherlock Holmes language modeling benchmark.

The following techniques are adopted to improve the results:
- [LSTM for LM](https://arxiv.org/pdf/1409.2329.pdf)
- [Weight tying](https://arxiv.org/abs/1608.05859) between the word embeddings and the softmax output weights (see the sketch below)

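What follows is a minimal Gluon sketch of a tied-weight LSTM language model. It assumes a structure close to this example's, but the class name `RNNModel` and all sizes are illustrative rather than taken from `train.py`; the key point is that the decoder `Dense` layer reuses the embedding's parameters, which requires `emsize == nhid` (hence the matching `--nhid 650 --emsize 650` in the example run under Usage).

```
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn, rnn

class RNNModel(gluon.Block):
    """Multi-layer LSTM language model with optional weight tying (sketch)."""
    def __init__(self, vocab_size, emsize, nhid, nlayers,
                 dropout=0.5, tied=False, **kwargs):
        super(RNNModel, self).__init__(**kwargs)
        with self.name_scope():
            self.drop = nn.Dropout(dropout)
            self.encoder = nn.Embedding(vocab_size, emsize)
            self.rnn = rnn.LSTM(nhid, nlayers, dropout=dropout,
                                input_size=emsize)
            if tied:
                # Share the embedding matrix with the output layer; this
                # only works when emsize == nhid so the shapes agree.
                self.decoder = nn.Dense(vocab_size, in_units=nhid,
                                        params=self.encoder.params)
            else:
                self.decoder = nn.Dense(vocab_size, in_units=nhid)
        self.nhid = nhid

    def forward(self, inputs, state):
        # inputs: (bptt, batch) integer token ids, layout 'TNC' in the LSTM.
        emb = self.drop(self.encoder(inputs))
        output, state = self.rnn(emb, state)
        decoded = self.decoder(self.drop(output).reshape((-1, self.nhid)))
        return decoded, state

    def begin_state(self, *args, **kwargs):
        return self.rnn.begin_state(*args, **kwargs)
```
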
## Prerequisite
The example requires MXNet built with CUDA.

## Data
The Sherlock Holmes data is a copyright-free copy of Sherlock Holmes from [Project Gutenberg](http://www.gutenberg.org/cache/epub/1661/pg1661.txt).

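If the corpus is not already on disk, the raw text can be fetched from the URL above. The helper below is hypothetical (the `./data/sherlockholmes.txt` path and the function name are illustrative); the location actually used by training is whatever `--data` points at, and the example may expect pre-split train/valid/test files instead.

```
import os
import urllib.request

# Hypothetical helper; the URL is real, the local path is illustrative.
CORPUS_URL = "http://www.gutenberg.org/cache/epub/1661/pg1661.txt"

def download_corpus(path="./data/sherlockholmes.txt"):
    """Fetch the raw Project Gutenberg text if it is not cached locally."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        urllib.request.urlretrieve(CORPUS_URL, path)
    return path

if __name__ == "__main__":
    print("corpus saved to", download_corpus())
```
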
## Usage
An example run and its result:

```
python train.py --tied --nhid 650 --emsize 650 --dropout 0.5 # Test ppl of 44.26
```

```
usage: train.py [-h] [--data DATA] [--emsize EMSIZE] [--nhid NHID]
                [--nlayers NLAYERS] [--lr LR] [--clip CLIP] [--epochs EPOCHS]
                [--batch_size BATCH_SIZE] [--dropout DROPOUT] [--tied]
                [--bptt BPTT] [--log-interval LOG_INTERVAL] [--seed SEED]

Sherlock Holmes LSTM Language Model

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           location of the data corpus
  --emsize EMSIZE       size of word embeddings
  --nhid NHID           number of hidden units per layer
  --nlayers NLAYERS     number of layers
  --lr LR               initial learning rate
  --clip CLIP           gradient clipping by global norm
  --epochs EPOCHS       upper epoch limit
  --batch_size BATCH_SIZE
                        batch size
  --dropout DROPOUT     dropout applied to layers (0 = no dropout)
  --tied                tie the word embedding and softmax weights
  --bptt BPTT           sequence length
  --log-interval LOG_INTERVAL
                        report interval
  --seed SEED           random seed
```
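
Two of the options above interact in the training loop: `--bptt` truncates backpropagation at a fixed sequence length, and `--clip` rescales gradients by their global norm before each update. The sketch below shows the usual Gluon pattern for both; it is a toy with random data and illustrative sizes, not the actual loop in `train.py`.

```
import mxnet as mx
from mxnet import autograd, gluon
from mxnet.gluon import rnn

# Toy setup with illustrative sizes; not the actual train.py loop.
ctx = mx.cpu()
bptt, batch_size, num_hidden = 35, 4, 20
model = rnn.LSTM(num_hidden, num_layers=2)
model.initialize(mx.init.Xavier(), ctx=ctx)
trainer = gluon.Trainer(model.collect_params(), 'sgd', {'learning_rate': 1.0})
loss_fn = gluon.loss.L2Loss()

data = mx.nd.random.normal(shape=(bptt, batch_size, 10), ctx=ctx)
target = mx.nd.random.normal(shape=(bptt, batch_size, num_hidden), ctx=ctx)
state = model.begin_state(batch_size=batch_size, ctx=ctx)

# Truncated BPTT: detach the carried-over state so gradients do not
# flow past the previous --bptt-length window.
state = [s.detach() for s in state]
with autograd.record():
    output, state = model(data, state)
    loss = loss_fn(output, target)
loss.backward()

# Gradient clipping by global norm, as the --clip option suggests.
grads = [p.grad(ctx) for p in model.collect_params().values()]
gluon.utils.clip_global_norm(grads, max_norm=0.25)
trainer.step(batch_size)
```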