BERT
----

:download:`Download scripts </model_zoo/bert.zip>`

Reference: Devlin, Jacob, et al. "`BERT: Pre-training of deep bidirectional transformers for language understanding. <https://arxiv.org/abs/1810.04805>`_" arXiv preprint arXiv:1810.04805 (2018).

BERT Model Zoo
~~~~~~~~~~~~~~

The following pre-trained BERT models are available from the **gluonnlp.model.get_model** API:

+------------------------------------------+----------------+-----------------+
|                                          | bert_12_768_12 | bert_24_1024_16 |
+==========================================+================+=================+
| book_corpus_wiki_en_uncased              | ✓              | ✓               |
+------------------------------------------+----------------+-----------------+
| book_corpus_wiki_en_cased                | ✓              | ✓               |
+------------------------------------------+----------------+-----------------+
| openwebtext_book_corpus_wiki_en_uncased  | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| wiki_multilingual_uncased                | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| wiki_multilingual_cased                  | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| wiki_cn_cased                            | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| scibert_scivocab_uncased                 | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| scibert_scivocab_cased                   | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| scibert_basevocab_uncased                | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| scibert_basevocab_cased                  | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| biobert_v1.0_pmc_cased                   | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| biobert_v1.0_pubmed_cased                | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| biobert_v1.0_pubmed_pmc_cased            | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| biobert_v1.1_pubmed_cased                | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| clinicalbert_uncased                     | ✓              | x               |
+------------------------------------------+----------------+-----------------+
| kobert_news_wiki_ko_cased                | ✓              | x               |
+------------------------------------------+----------------+-----------------+

where **bert_12_768_12** refers to the BERT BASE model, and **bert_24_1024_16** refers to the BERT LARGE model.

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', use_classifier=False, use_decoder=False);
    tokenizer = nlp.data.BERTTokenizer(vocab, lower=True);
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512, pair=False, pad=False);
    sample = transform(['Hello world!']);
    words, valid_len, segments = mx.nd.array([sample[0]]), mx.nd.array([sample[1]]), mx.nd.array([sample[2]]);
    seq_encoding, cls_encoding = model(words, segments, valid_len);

The pretrained parameters for dataset_name 'openwebtext_book_corpus_wiki_en_uncased' were obtained by running the GluonNLP BERT pre-training script on OpenWebText.

The pretrained parameters for dataset_name 'scibert_scivocab_uncased', 'scibert_scivocab_cased', 'scibert_basevocab_uncased' and 'scibert_basevocab_cased' were obtained by converting the parameters published by "Beltagy, I., Cohan, A., & Lo, K. (2019). SciBERT: Pretrained contextualized embeddings for scientific text. arXiv preprint `arXiv:1903.10676 <https://arxiv.org/abs/1903.10676>`_."

The pretrained parameters for dataset_name 'biobert_v1.0_pmc_cased', 'biobert_v1.0_pubmed_cased', 'biobert_v1.0_pubmed_pmc_cased' and 'biobert_v1.1_pubmed_cased' were obtained by converting the parameters published by "Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: pre-trained biomedical language representation model for biomedical text mining. arXiv preprint `arXiv:1901.08746 <https://arxiv.org/abs/1901.08746>`_."

The pretrained parameters for dataset_name 'clinicalbert_uncased' were obtained by converting the parameters published by "Huang, K., Altosaar, J., & Ranganath, R. (2019). ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv preprint `arXiv:1904.05342 <https://arxiv.org/abs/1904.05342>`_."
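The domain-specific checkpoints above are loaded through the same **get_model** API by changing ``dataset_name``. For example, the following sketch loads the uncased SciBERT vocabulary and encoder, mirroring the snippet above (the input sentence is arbitrary example text):

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    # same API as above, only dataset_name changes
    model, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='scibert_scivocab_uncased', use_classifier=False, use_decoder=False);
    tokenizer = nlp.data.BERTTokenizer(vocab, lower=True);
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512, pair=False, pad=False);
    sample = transform(['Adenosine triphosphate drives many processes in living cells.']);
    words, valid_len, segments = mx.nd.array([sample[0]]), mx.nd.array([sample[1]]), mx.nd.array([sample[2]]);
    seq_encoding, cls_encoding = model(words, segments, valid_len);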
Additionally, GluonNLP supports the "`RoBERTa <https://arxiv.org/abs/1907.11692>`_" model:

+------------------------------------------+-------------------+--------------------+
|                                          | roberta_12_768_12 | roberta_24_1024_16 |
+==========================================+===================+====================+
| openwebtext_ccnews_stories_books_cased   | ✓                 | ✓                  |
+------------------------------------------+-------------------+--------------------+

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('roberta_12_768_12', dataset_name='openwebtext_ccnews_stories_books_cased', use_decoder=False);
    tokenizer = nlp.data.GPT2BPETokenizer();
    text = [vocab.bos_token] + tokenizer('Hello world!') + [vocab.eos_token];
    seq_encoding = model(mx.nd.array([vocab[text]]))

GluonNLP also supports the "`DistilBERT <https://arxiv.org/abs/1910.01108>`_" model:

+--------------------------------------+----------------------+
|                                      | distilbert_6_768_12  |
+======================================+======================+
| distil_book_corpus_wiki_en_uncased   | ✓                    |
+--------------------------------------+----------------------+

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('distilbert_6_768_12', dataset_name='distil_book_corpus_wiki_en_uncased');
    tokenizer = nlp.data.BERTTokenizer(vocab, lower=True);
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512, pair=False, pad=False);
    sample = transform(['Hello world!']);
    words, valid_len = mx.nd.array([sample[0]]), mx.nd.array([sample[1]])
    seq_encoding, cls_encoding = model(words, valid_len);

Finally, GluonNLP also supports the Korean BERT pre-trained model, "`KoBERT <https://github.com/SKTBrain/KoBERT>`_", trained on the Korean wiki dataset (`kobert_news_wiki_ko_cased`).

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='kobert_news_wiki_ko_cased', use_decoder=False, use_classifier=False)
    tok = nlp.data.get_tokenizer('bert_12_768_12', 'kobert_news_wiki_ko_cased')
    tok('안녕하세요.')

.. hint::

    The pre-training, fine-tuning and export scripts are available `here </_downloads/bert.zip>`__.
Sentence Classification
~~~~~~~~~~~~~~~~~~~~~~~

GluonNLP provides the following example script to fine-tune sentence classification with a pre-trained BERT model.

To enable mixed precision training with float16, set the `--dtype` argument to `float16`.
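Several of the tasks below (e.g. MRPC, MNLI, QQP) are sentence-pair tasks. As a rough illustration of what the fine-tuning script does internally, the sketch below prepares a sentence pair with ``BERTSentenceTransform`` (``pair=True``) and attaches a classification head with ``gluonnlp.model.BERTClassifier``. The example sentences, sequence length and number of classes are arbitrary choices for illustration; the actual script additionally handles padding, batching, the loss function and the optimizer.

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', use_classifier=False, use_decoder=False);
    tokenizer = nlp.data.BERTTokenizer(vocab, lower=True);
    # pair=True packs both sentences into one input: [CLS] sentence A [SEP] sentence B [SEP]
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=128, pair=True, pad=False);
    sample = transform(['He went to the store.', 'He bought some groceries there.']);
    words, valid_len, segments = mx.nd.array([sample[0]]), mx.nd.array([sample[1]]), mx.nd.array([sample[2]]);
    # classification head on top of the pooled [CLS] representation (2 classes, e.g. MRPC)
    classifier = nlp.model.BERTClassifier(model, num_classes=2, dropout=0.1);
    classifier.classifier.initialize();  # only the new head needs initialization
    logits = classifier(words, segments, valid_len);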
Results using `bert_12_768_12`:

.. editing URL for the following table: https://tinyurl.com/y4n8q84w

+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| Task Name       | Metrics        | Results on Dev Set | log                                                                                                              | command                                                                                                          |
+=================+================+====================+==================================================================================================================+==================================================================================================================+
| CoLA            | Matthew Corr.  | 60.32              | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_CoLA_base_mx1.6.0rc1.log>`__     | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_CoLA_base_mx1.6.0rc1.sh>`__   |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| SST-2           | Accuracy       | 93.46              | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_SST_base_mx1.6.0rc1.log>`__      | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_SST_base_mx1.6.0rc1.sh>`__    |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| MRPC            | Accuracy/F1    | 88.73/91.96        | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_MRPC_base_mx1.6.0rc1.log>`__     | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_MRPC_base_mx1.6.0rc1.sh>`__   |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| STS-B           | Pearson Corr.  | 90.34              | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_STS-B_base_mx1.6.0rc1.log>`__    | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_STS-B_base_mx1.6.0rc1.sh>`__  |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| QQP             | Accuracy       | 91                 | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_QQP_base_mx1.6.0rc1.log>`__      | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_QQP_base_mx1.6.0rc1.sh>`__    |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| MNLI            | Accuracy(m/mm) | 84.29/85.07        | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_MNLI_base_mx1.6.0rc1.log>`__     | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_MNLI_base_mx1.6.0rc1.sh>`__   |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| XNLI (Chinese)  | Accuracy       | 78.43              | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_XNLI_base_mx1.6.0rc1.log>`__     | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_XNLI-B_base_mx1.6.0rc1.sh>`__ |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
| RTE             | Accuracy       | 74                 | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_RTE_base_mx1.6.0rc1.log>`__      | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_RTE_base_mx1.6.0rc1.sh>`__    |
+-----------------+----------------+--------------------+------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+
Results using `roberta_12_768_12`:

.. editing URL for the following table: https://www.shorturl.at/cjAO7

+----------------------+--------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
| Dataset              | SST-2                                                                                                        | MNLI-M/MM                                                                                                    |
+======================+==============================================================================================================+==============================================================================================================+
| Validation Accuracy  | 95.3%                                                                                                        | 87.69%, 87.23%                                                                                               |
+----------------------+--------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
| Log                  | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/roberta/finetuned_sst.log>`__              | `log <https://raw.githubusercontent.com/dmlc/web-data/master/gluonnlp/logs/roberta/mnli_1e-5-32.log>`__     |
+----------------------+--------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
| Command              | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/roberta/finetuned_sst.sh>`__           | `command <https://raw.githubusercontent.com/dmlc/web-data/master/gluonnlp/logs/roberta/finetuned_mnli.sh>`__ |
+----------------------+--------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
.. editing URL for the following table: https://tinyurl.com/y5rrowj3

Question Answering on SQuAD
~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Dataset     | SQuAD 1.1                                                                                                                        | SQuAD 1.1                                                                                                                           | SQuAD 2.0                                                                                                                           |
+=============+==================================================================================================================================+=====================================================================================================================================+=====================================================================================================================================+
| Model       | bert_12_768_12                                                                                                                   | bert_24_1024_16                                                                                                                     | bert_24_1024_16                                                                                                                     |
+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| F1 / EM     | 88.58 / 81.26                                                                                                                    | 90.97 / 84.22                                                                                                                       | 81.27 / 78.14                                                                                                                       |
+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Log         | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_base_mx1.6.0rc1.log>`__                 | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_large_mx1.6.0rc1.log>`__                   | `log <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad2.0_large_mx1.6.0rc1.log>`__                   |
+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Command     | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_base_mx1.6.0rc1.sh>`__              | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_large_mx1.6.0rc1.sh>`__                | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad2.0_large_mx1.6.0rc1.sh>`__                |
+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| Prediction  | `predictions.json <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_base_mx1.6.0rc1.json>`__   | `predictions.json <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad1.1_large_mx1.6.0rc1.json>`__     | `predictions.json <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/finetune_squad2.0_large_mx1.6.0rc1.json>`__     |
+-------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
For all model settings above, we set learning rate = 3e-5 and optimizer = adam.

Note that the BERT model requires a significant amount of GPU memory. If you have limited GPU memory, you can use the following command to accumulate gradients and achieve the same result as with a large batch size, by setting the *accumulate* and *batch_size* arguments accordingly.

.. code-block:: console

    $ python finetune_squad.py --optimizer adam --accumulate 2 --batch_size 6 --lr 3e-5 --epochs 2 --gpu

We support multi-GPU training via Horovod:

.. code-block:: console

    $ HOROVOD_WITH_MXNET=1 HOROVOD_GPU_ALLREDUCE=NCCL pip install horovod --user --no-cache-dir
    $ horovodrun -np 8 python finetune_squad.py --bert_model bert_24_1024_16 --batch_size 4 --lr 3e-5 --epochs 2 --gpu --dtype float16 --comm_backend horovod

SQuAD 2.0
+++++++++

For SQuAD 2.0, you need to specify the parameter *version_2* and the parameter *null_score_diff_threshold*; typical threshold values are between -1.0 and -5.0. Add these arguments to the fine-tuning command above to fine-tune the BERT large model on SQuAD 2.0 and generate predictions.json.

To get the score on the dev data, you need to download the dev dataset (`dev-v2.0.json <https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json>`_) and the evaluation script (`evaluate-v2.0.py <https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/>`_). Then use the following command to get the score on the dev dataset.

.. code-block:: console

    $ python evaluate-v2.0.py dev-v2.0.json predictions.json

BERT INT8 Quantization
~~~~~~~~~~~~~~~~~~~~~~

GluonNLP provides the following example scripts to quantize fine-tuned BERT models into the int8 data type. Note that INT8 quantization needs a nightly version of `mxnet-mkl <https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/index.html>`_.
Sentence Classification
+++++++++++++++++++++++

+----------+-----------------+---------------+---------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+
| Dataset  | Model           | FP32 Accuracy | INT8 Accuracy | FP32 F1 | INT8 F1 | Command                                                                                                                   |
+==========+=================+===============+===============+=========+=========+===========================================================================================================================+
| MRPC     | bert_12_768_12  | 87.01         | 87.01         | 90.97   | 90.88   | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/calibration_MRPC_base_mx1.6.0b20200125.sh>`__  |
+----------+-----------------+---------------+---------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+
| SST-2    | bert_12_768_12  | 93.23         | 93.00         |         |         | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/calibration_SST_base_mx1.6.0b20200125.sh>`__   |
+----------+-----------------+---------------+---------------+---------+---------+---------------------------------------------------------------------------------------------------------------------------+

Question Answering
++++++++++++++++++

+------------+-----------------+---------+---------+---------+---------+-------------------------------------------------------------------------------------------------------------------------------+
| Dataset    | Model           | FP32 EM | INT8 EM | FP32 F1 | INT8 F1 | Command                                                                                                                       |
+============+=================+=========+=========+=========+=========+===============================================================================================================================+
| SQuAD 1.1  | bert_12_768_12  | 81.18   | 80.32   | 88.58   | 88.10   | `command <https://github.com/dmlc/web-data/blob/master/gluonnlp/logs/bert/calibration_squad1.1_base_mx1.6.0b20200125.sh>`__  |
+------------+-----------------+---------+---------+---------+---------+-------------------------------------------------------------------------------------------------------------------------------+

For all model settings above, we use a subset of the evaluation dataset for calibration.

Pre-training from Scratch
~~~~~~~~~~~~~~~~~~~~~~~~~

We also provide scripts for pre-training BERT with masked language modeling and next sentence prediction.

The pre-training data format expects: (1) one sentence per line — these should ideally be actual sentences, not entire paragraphs or arbitrary spans of text, because of the "next sentence prediction" task; and (2) blank lines between documents. You can find a sample pre-training text with 3 documents `here <https://github.com/dmlc/gluon-nlp/blob/master/scripts/bert/sample_text.txt>`__. You can perform sentence segmentation with an off-the-shelf NLP toolkit such as NLTK, for example as sketched below.
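A minimal sketch of producing this format with NLTK; the document strings and the output file name are placeholders for illustration:

.. code-block:: python

    import nltk
    from nltk.tokenize import sent_tokenize

    nltk.download('punkt')  # sentence splitter model used by sent_tokenize

    # two toy "documents"; in practice these would be your raw text corpora
    documents = [
        'GluonNLP provides BERT scripts. They cover pre-training and fine-tuning.',
        'This is a second document. It is separated from the first by a blank line.',
    ]

    with open('pretraining_corpus.txt', 'w', encoding='utf-8') as f:
        for doc in documents:
            for sentence in sent_tokenize(doc):
                f.write(sentence + '\n')  # one sentence per line
            f.write('\n')                 # blank line between documents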
.. hint::

    You can download the pre-processed English Wikipedia dataset `here <https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/enwiki-197b5d8d.zip>`__.

Pre-requisite
+++++++++++++

We recommend Horovod for scalable multi-GPU, multi-machine training.

To install Horovod, you need:

- `NCCL <https://developer.nvidia.com/nccl>`__, and
- `OpenMPI <https://www.open-mpi.org/software/ompi/v4.0/>`__

Then you can install Horovod via the following command:

.. code-block:: console

    $ HOROVOD_WITH_MXNET=1 HOROVOD_GPU_ALLREDUCE=NCCL pip install horovod==0.16.2 --user --no-cache-dir

Run Pre-training
++++++++++++++++

You can use the following command to run pre-training with 2 hosts, 8 GPUs each:

.. code-block:: console

    $ mpirun -np 16 -H host0_ip:8,host1_ip:8 -mca pml ob1 -mca btl ^openib \
             -mca btl_tcp_if_exclude docker0,lo --map-by ppr:4:socket \
             --mca plm_rsh_agent 'ssh -q -o StrictHostKeyChecking=no' \
             -x NCCL_MIN_NRINGS=8 -x NCCL_DEBUG=INFO -x HOROVOD_HIERARCHICAL_ALLREDUCE=1 \
             -x MXNET_SAFE_ACCUMULATION=1 --tag-output \
             python run_pretraining.py --data='folder1/*.txt,folder2/*.txt,' \
             --data_eval='dev_folder/*.txt,' --num_steps 1000000 \
             --lr 1e-4 --total_batch_size 256 --accumulate 1 --raw --comm_backend horovod

If you see an out-of-memory error, try increasing --accumulate for gradient accumulation.

When multiple hosts are present, please make sure you can ssh to these nodes without a password.

Alternatively, if Horovod is not available, you can run pre-training with the MXNet native parameter server by setting --comm_backend and --gpus.

.. code-block:: console

    $ MXNET_SAFE_ACCUMULATION=1 python run_pretraining.py --comm_backend device --gpus 0,1,2,3,4,5,6,7 ...

The BERT base model produced by the GluonNLP pre-training script (`log <https://raw.githubusercontent.com/dmlc/web-data/master/gluonnlp/logs/bert/bert_base_pretrain.log>`__), pre-trained on the books corpus and English Wikipedia dataset, achieves 83.6% on MNLI-mm, 93% on SST-2, 87.99% on MRPC and 80.99/88.60 on the SQuAD 1.1 validation set.

Custom Vocabulary
+++++++++++++++++

The pre-training script supports subword tokenization with a custom vocabulary using `sentencepiece <https://github.com/google/sentencepiece>`__.

To install sentencepiece, run:

.. code-block:: console

    $ pip install sentencepiece==0.1.82 --user

You can `train <https://github.com/google/sentencepiece/tree/v0.1.82/python#model-training>`__ a custom sentencepiece vocabulary by specifying the vocabulary size:

.. code-block:: python

    import sentencepiece as spm
    spm.SentencePieceTrainer.Train('--input=a.txt,b.txt --unk_id=0 --pad_id=3 --model_prefix=my_vocab --vocab_size=30000 --model_type=BPE')

To use the sentencepiece vocabulary for pre-training, please set --sentencepiece=my_vocab.model when using run_pretraining.py.

Export BERT for Deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~

The export.py script currently supports exporting BERT models. Supported values for the --task argument include classification, regression and question answering.

.. code-block:: console

    $ python export.py --task classification --model_parameters /path/to/saved/ckpt.params --output_dir /path/to/output/dir/ --seq_length 128

This will export the BERT model for classification to a symbol.json file, saved to the directory specified by --output_dir.
The --model_parameters argument is optional. If it is not set, the .params file saved in the output directory will contain randomly initialized parameters.
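The exported symbol and parameter files can then be loaded for inference with MXNet's standard ``SymbolBlock`` API. The sketch below is only illustrative: the file names and the input names (``data0``, ``data1``, ``data2`` for token ids, segment ids and valid length) are assumptions — check the files actually written to ``--output_dir`` and their recorded input names before using it.

.. code-block:: python

    import mxnet as mx

    # hypothetical file names -- substitute the symbol.json / .params files written to --output_dir
    sym_file = '/path/to/output/dir/model-symbol.json'
    param_file = '/path/to/output/dir/model-0000.params'

    # exported Gluon models typically name their inputs data0, data1, ...;
    # for the classification export these would correspond to token ids, segment ids and valid length
    net = mx.gluon.SymbolBlock.imports(sym_file, ['data0', 'data1', 'data2'], param_file, ctx=mx.cpu())

    words = mx.nd.ones((1, 128))      # token ids, padded to --seq_length
    segments = mx.nd.zeros((1, 128))  # segment ids
    valid_len = mx.nd.array([10])     # number of non-padding tokens
    out = net(words, segments, valid_len)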
BERT for Sentence or Token Embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The goal of this BERT embedding tool is to obtain token embeddings from BERT's pre-trained model. In this way, instead of building and fine-tuning an end-to-end NLP model, you can build your model by just utilizing the token embeddings. You can use the command line interface below:

.. code-block:: shell

    python embedding.py --sentences "GluonNLP is a toolkit that enables easy text preprocessing, datasets loading and neural models building to help you speed up your Natural Language Processing (NLP) research."
    Text: g ##lu ##on ##nl ##p is a tool ##kit that enables easy text prep ##ro ##ces ##sing , data ##set ##s loading and neural models building to help you speed up your natural language processing ( nl ##p ) research .
    Tokens embedding: [array([-0.11881411, -0.59530115,  0.627092  , ...,  0.00648153,
           -0.03886228,  0.03406909], dtype=float32), array([-0.7995638 , -0.6540758 , -0.00521846, ..., -0.42272145,
           -0.5787281 ,  0.7021201 ], dtype=float32), array([-0.7406778 , -0.80276626,  0.3931962 , ..., -0.49068323,
           -0.58128357,  0.6811132 ], dtype=float32), array([-0.43287313, -1.0018158 ,  0.79617643, ..., -0.26877284,
           -0.621779  , -0.2731115 ], dtype=float32), array([-0.8515188 , -0.74098676,  0.4427735 , ..., -0.41267148,
           -0.64225197,  0.3949393 ], dtype=float32), array([-0.86652845, -0.27746758,  0.8806506 , ..., -0.87452525,
           -0.9551989 , -0.0786318 ], dtype=float32), array([-1.0987284 , -0.36603633,  0.2826037 , ..., -0.33794224,
           -0.55210876, -0.09221527], dtype=float32), array([-0.3483025 ,  0.401534  ,  0.9361341 , ..., -0.29747447,
           -0.49559578, -0.08878893], dtype=float32), array([-0.65626   , -0.14857645,  0.29733548, ..., -0.15890433,
           -0.45487815, -0.28494897], dtype=float32), array([-0.1983894 ,  0.67196256,  0.7867421 , ..., -0.7990434 ,
            0.05860569, -0.26884627], dtype=float32), array([-0.3775159 , -0.00590206,  0.5240432 , ..., -0.26754653,
           -0.37806216,  0.23336883], dtype=float32), array([ 0.1876977 ,  0.30165672,  0.47167772, ..., -0.43823618,
           -0.42823148, -0.48873612], dtype=float32), array([-0.6576557 , -0.09822252,  0.1121515 , ..., -0.21743725,
           -0.1820574 , -0.16115054], dtype=float32)]
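The same token vectors can also be obtained programmatically with the **get_model** API shown in the BERT Model Zoo section above. A minimal sketch (the input sentence is arbitrary example text):

.. code-block:: python

    import gluonnlp as nlp; import mxnet as mx;
    model, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', use_classifier=False, use_decoder=False);
    tokenizer = nlp.data.BERTTokenizer(vocab, lower=True);
    transform = nlp.data.BERTSentenceTransform(tokenizer, max_seq_length=512, pair=False, pad=False);
    sample = transform(['GluonNLP is a toolkit that enables easy text preprocessing.']);
    words, valid_len, segments = mx.nd.array([sample[0]]), mx.nd.array([sample[1]]), mx.nd.array([sample[2]]);
    seq_encoding, cls_encoding = model(words, segments, valid_len);
    tokens = vocab.to_tokens(sample[0].tolist())   # includes the [CLS] and [SEP] markers
    token_embeddings = seq_encoding[0].asnumpy()   # one vector per entry in `tokens`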