# Compact Language Detector v3 (CLD3)

* [Model](#model)
* [Installation](#installation)
* [Bugs and Feature Requests](#bugs-and-feature-requests)
* [Announcements and Discussion](#announcements-and-discussion)
* [Credits](#credits)

### Model

CLD3 is a neural network model for language identification. This package
contains the inference code and a trained model. The inference code
extracts character ngrams from the input text and computes the fraction
of times each of them appears. For example, as shown in the figure below,
if the input text is "banana", then one of the extracted trigrams is "ana"
and the corresponding fraction is 2/4. The ngrams are hashed down to an id
within a small range, and each id is represented by a dense embedding vector
estimated during training.
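
The feature extraction described above can be sketched in a few lines of Python. This is only an illustration of the description in this README, not the package's inference code: the function names, the use of CRC32 as the hash, and the bucket count of 1000 are all assumptions made for the example.

```python
from collections import Counter
import zlib


def ngram_fractions(text: str, n: int) -> dict[str, float]:
    """Return each character n-gram of `text` with its relative frequency."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def hashed_id(gram: str, num_buckets: int) -> int:
    """Map an n-gram to an id in [0, num_buckets).

    CRC32 is a stand-in hash chosen for this sketch (assumption).
    """
    return zlib.crc32(gram.encode("utf-8")) % num_buckets


fractions = ngram_fractions("banana", 3)
print(fractions)  # {'ban': 0.25, 'ana': 0.5, 'nan': 0.25}

# Hypothetical bucket count; each id would index a row of an embedding table.
ids = {gram: hashed_id(gram, 1000) for gram in fractions}
```

"banana" yields four trigrams (`ban`, `ana`, `nan`, `ana`), so `ana` gets the fraction 2/4 mentioned above.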

The model averages the embeddings corresponding to each ngram type according
to the fractions, and the averaged embeddings are concatenated to produce
the embedding layer. The remaining components of the network are a hidden
(rectified linear) layer and a softmax layer.
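
A minimal sketch of how such an embedding layer could be assembled is shown below. It assumes that "ngram type" means n-gram size (unigrams, bigrams, and so on), that each size has its own embedding table, and that fractions are keyed by hashed id; the function names and the toy tables are hypothetical, not taken from the trained model.

```python
def averaged_embedding(fractions, table):
    """Weighted sum of embedding rows; the weights are n-gram fractions."""
    dim = len(table[0])
    out = [0.0] * dim
    for bucket_id, weight in fractions.items():
        for j in range(dim):
            out[j] += weight * table[bucket_id][j]
    return out


def embedding_layer(features):
    """Concatenate the averaged embeddings, one per n-gram size."""
    layer = []
    for fractions, table in features:
        layer.extend(averaged_embedding(fractions, table))
    return layer


# Toy tables (assumption): 2 buckets each, embedding sizes 2 and 3.
unigram_table = [[1.0, 0.0], [0.0, 1.0]]
bigram_table = [[2.0, 2.0, 2.0], [4.0, 0.0, 8.0]]

features = [
    ({0: 0.5, 1: 0.5}, unigram_table),  # fractions keyed by hashed id
    ({1: 1.0}, bigram_table),
]
print(embedding_layer(features))  # [0.5, 0.5, 4.0, 0.0, 8.0]
```

Because the fractions for one n-gram size sum to 1, the weighted sum is an average, and the length of the resulting layer is the sum of the per-size embedding dimensions.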

To get a language prediction for the input text, we simply perform a forward
pass through the network.
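
As a rough illustration, a forward pass through a network of this shape (one rectified linear hidden layer followed by a softmax) might look like the following. The weights, layer sizes, and language labels are made up for the example, and the function names are hypothetical; the real model's parameters ship with the package.

```python
import math


def relu_layer(x, weights, bias):
    """Dense layer followed by a rectified linear activation."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(embedding, hidden_w, hidden_b, out_w, out_b, labels):
    """Return the most probable label and its probability."""
    hidden = relu_layer(embedding, hidden_w, hidden_b)
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(out_w, out_b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy parameters (assumption): 2 inputs, 2 hidden units, 2 languages.
label, prob = predict(
    embedding=[0.5, 0.5],
    hidden_w=[[1.0, 1.0], [-1.0, -1.0]], hidden_b=[0.0, 0.0],
    out_w=[[2.0, 0.0], [0.0, 2.0]], out_b=[0.0, 0.0],
    labels=["en", "fr"],
)
print(label)  # en
```

The softmax output can be read as a probability for each language, so the prediction is simply the label with the highest probability.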

![Figure](model.png "CLD3")

### Installation

CLD3 is designed to run in the Chrome browser, so it relies on code in
[Chromium](http://www.chromium.org/).
The steps for building and running the demo of the language detection model are:

- [Check out](http://www.chromium.org/developers/how-tos/get-the-code) the
  Chromium repository.
- Copy the code to `//third_party/cld_3`.
- Uncomment the `language_identifier_main` executable in `src/BUILD.gn`.
- Build and run the model using the commands:

```shell
gn gen out/Default
ninja -C out/Default third_party/cld_3/src/src:language_identifier_main
out/Default/language_identifier_main
```

### Bugs and Feature Requests

To report a bug or request a feature, open a [GitHub issue](https://github.com/google/cld3/issues) for this repository.

### Announcements and Discussion

For announcements of major updates and for general discussion, please subscribe to the mailing list
[cld3-users@googlegroups.com](https://groups.google.com/forum/#!forum/cld3-users).

### Credits

Original authors of the code in this package include (in alphabetical order):

* Alex Salcianu
* Andy Golding
* Anton Bakalov
* Chris Alberti
* Daniel Andor
* David Weiss
* Emily Pitler
* Greg Coppola
* Jason Riesa
* Kuzman Ganchev
* Michael Ringgaard
* Nan Hua
* Ryan McDonald
* Slav Petrov
* Stefan Istrate
* Terry Koo