<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements.  See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership.  The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License.  You may obtain a copy of the License at -->

<!---   http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied.  See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->

Word Level Language Modeling
===========
This example trains a multi-layer LSTM on the Sherlock Holmes language modeling benchmark.

The following techniques have been adopted to reach state-of-the-art (SOTA) results:
- [LSTM for LM](https://arxiv.org/pdf/1409.2329.pdf)
- [Weight tying](https://arxiv.org/abs/1608.05859) between the input word embeddings and the softmax output weights

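Weight tying makes the embedding lookup and the softmax output projection share a single parameter matrix, which is why the embedding size must match the hidden size. A minimal NumPy sketch of the idea (illustrative only; the actual layer wiring lives in `model.py`):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emsize = 10, 4

# One shared parameter matrix: rows are word embeddings, and the same
# matrix (transposed) maps hidden states back to vocabulary logits.
E = rng.normal(size=(vocab, emsize))

def embed(token_ids):
    return E[token_ids]        # lookup: (batch,) -> (batch, emsize)

def output_logits(hidden):
    return hidden @ E.T        # tied projection: (batch, emsize) -> (batch, vocab)

h = embed(np.array([3, 7]))    # stand-in for the LSTM's hidden output
logits = output_logits(h)
print(logits.shape)            # (2, 10)
```

Because the output projection is `E.T`, tying only type-checks when the hidden size equals the embedding size, hence `--emsize 650 --nhid 650` in the example run below.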
## Prerequisites
This example requires MXNet built with CUDA support.

## Data
The Sherlock Holmes data is a copyright-free copy of Sherlock Holmes from [Project Gutenberg](http://www.gutenberg.org/cache/epub/1661/pg1661.txt).

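For training, the token stream is typically folded into batch-major columns and then sliced into `--bptt`-length windows for truncated backpropagation through time. A hedged NumPy sketch of one common scheme (not necessarily byte-for-byte what `data.py` implements):

```python
import numpy as np

def batchify(tokens, batch_size):
    # Trim the stream so it divides evenly, then reshape to
    # (num_steps, batch_size): each column is a contiguous sub-stream.
    nbatch = len(tokens) // batch_size
    data = np.array(tokens[: nbatch * batch_size])
    return data.reshape(batch_size, nbatch).T

def get_batch(data, i, bptt):
    # Input is a bptt-length window; the target is the same window
    # shifted one step ahead (next-word prediction).
    seq_len = min(bptt, len(data) - 1 - i)
    return data[i : i + seq_len], data[i + 1 : i + 1 + seq_len]

stream = list(range(20))
data = batchify(stream, batch_size=2)   # shape (10, 2)
x, y = get_batch(data, 0, bptt=4)
print(x.shape, y.shape)                 # (4, 2) (4, 2)
```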
## Usage
An example run and its result:

```
python train.py --tied --nhid 650 --emsize 650 --dropout 0.5        # Test ppl of 44.26
```

```
usage: train.py [-h] [--data DATA] [--emsize EMSIZE] [--nhid NHID]
                [--nlayers NLAYERS] [--lr LR] [--clip CLIP] [--epochs EPOCHS]
                [--batch_size BATCH_SIZE] [--dropout DROPOUT] [--tied]
                [--bptt BPTT] [--log-interval LOG_INTERVAL] [--seed SEED]

Sherlock Holmes LSTM Language Model

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           location of the data corpus
  --emsize EMSIZE       size of word embeddings
  --nhid NHID           number of hidden units per layer
  --nlayers NLAYERS     number of layers
  --lr LR               initial learning rate
  --clip CLIP           gradient clipping by global norm
  --epochs EPOCHS       upper epoch limit
  --batch_size BATCH_SIZE
                        batch size
  --dropout DROPOUT     dropout applied to layers (0 = no dropout)
  --tied                tie the word embedding and softmax weights
  --bptt BPTT           sequence length
  --log-interval LOG_INTERVAL
                        report interval
  --seed SEED           random seed
```
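The `--clip` option bounds the global gradient norm before each update, which keeps BPTT through long sequences from producing exploding updates. A minimal NumPy sketch of the idea (not the MXNet call `train.py` actually uses):

```python
import numpy as np

def clip_global_norm(grads, max_norm):
    # Scale every gradient array by the same factor so the global
    # L2 norm across all parameters does not exceed max_norm.
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads = [np.ones((2, 2)), np.full(3, 2.0)]   # global norm = sqrt(4 + 12) = 4
clipped, norm = clip_global_norm(grads, 1.0)
print(norm)                                  # 4.0
```

Scaling all arrays by one shared factor preserves the gradient's direction, unlike clipping each parameter independently.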