# Compact Language Detector v3 (CLD3)

* [Model](#model)
* [Installation](#installation)
* [Bugs and Feature Requests](#bugs-and-feature-requests)
* [Announcements and Discussion](#announcements-and-discussion)
* [Credits](#credits)

### Model

CLD3 is a neural network model for language identification. This package
contains the inference code and a trained model. The inference code
extracts character ngrams from the input text and computes the fraction
of times each of them appears. For example, as shown in the figure below,
if the input text is "banana", then one of the extracted trigrams is "ana"
and the corresponding fraction is 2/4. The ngrams are hashed down to an id
within a small range, and each id is represented by a dense embedding vector
estimated during training.

The model averages the embeddings corresponding to each ngram type according
to the fractions, and the averaged embeddings are concatenated to produce
the embedding layer. The remaining components of the network are a hidden
(rectified linear) layer and a softmax layer.

To get a language prediction for the input text, we simply perform a forward
pass through the network.

![Figure](model.png "CLD3")

### Installation

CLD3 is designed to run in the Chrome browser, so it relies on code in
[Chromium](http://www.chromium.org/).
The steps for building and running the demo of the language detection model are:

- [Check out](http://www.chromium.org/developers/how-tos/get-the-code) the
  Chromium repository.
- Copy the code to `//third_party/cld_3`.
- Uncomment the `language_identifier_main` executable in `src/BUILD.gn`.
- Build and run the model using the commands:

```shell
gn gen out/Default
ninja -C out/Default third_party/cld_3/src/src:language_identifier_main
out/Default/language_identifier_main
```

### Bugs and Feature Requests

Open a [GitHub issue](https://github.com/google/cld3/issues) for this repository to file bugs and feature requests.

### Announcements and Discussion

For announcements of major updates and for general discussion, please subscribe to:
[cld3-users@googlegroups.com](https://groups.google.com/forum/#!forum/cld3-users)

### Credits

Original authors of the code in this package include (in alphabetical order):

* Alex Salcianu
* Andy Golding
* Anton Bakalov
* Chris Alberti
* Daniel Andor
* David Weiss
* Emily Pitler
* Greg Coppola
* Jason Riesa
* Kuzman Ganchev
* Michael Ringgaard
* Nan Hua
* Ryan McDonald
* Slav Petrov
* Stefan Istrate
* Terry Koo
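
### Example: ngram fractions

The ngram fraction described in the Model section can be illustrated with a short Python sketch. This is not the CLD3 inference code (which is C++ and may preprocess text differently); the function name `ngram_fractions` is hypothetical and is used only to reproduce the "banana" arithmetic, where the trigram "ana" accounts for 2 of the 4 extracted trigrams.

```python
from collections import Counter
from fractions import Fraction


def ngram_fractions(text: str, n: int) -> dict[str, Fraction]:
    """Map each character ngram of size n to its share of all such ngrams.

    Hypothetical helper for illustration; not part of the CLD3 API.
    """
    # Slide a window of width n over the text to collect every ngram.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: Fraction(count, total) for gram, count in counts.items()}


# "banana" yields the trigrams: ban, ana, nan, ana.
trigram_fractions = ngram_fractions("banana", 3)
print(trigram_fractions["ana"])  # Fraction reduces 2/4 to 1/2
```

In the model described above, these fractions would then serve as the weights for averaging the embedding vectors of the hashed ngram ids; that step is omitted here because the hash function and the embedding tables are specific to the trained model.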