| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| README | 01-Jun-2018 | 3 KiB | 72 | 52 |
| czech.words | 01-Jun-2018 | 2.5 KiB | 292 | 291 |
| dutch.words | 01-Jun-2018 | 82.6 KiB | 10,001 | 10,000 |
| french.words | 01-Jun-2018 | 86.6 KiB | 10,000 | 9,999 |
| german.words | 01-Jun-2018 | 89.9 KiB | 10,001 | 10,000 |
| mkwordlist | 01-Jun-2018 | 2.4 KiB | 91 | 63 |
| norwegian.words | 01-Jun-2018 | 11 KiB | 1,262 | 1,261 |
| swedish.words | 01-Jun-2018 | 23.4 KiB | 2,486 | 2,485 |
| wordlist.inc | 01-Jun-2018 | 1,010.2 KiB | 20,294 | 20,293 |

README
This directory contains lists of common words in UTF-8. Each file is
named for a language and contains common words in that language, one
word per line.

Any lines starting with '#' are disregarded.

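A minimal sketch of reading the word files in the format described above (UTF-8, one word per line, '#' lines ignored, each file named for its language). The helper names `load_words` and `load_all` are hypothetical and not part of this source tree. Skipping blank lines is an assumption; the README does not mention them.

```python
import tempfile
from pathlib import Path


def load_words(path):
    """Read one word list: UTF-8 text, one word per line, '#' lines ignored."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.rstrip("\r\n")
            # Lines starting with '#' are disregarded, as the README states.
            # Skipping blank lines is our own assumption.
            if not word or word.startswith("#"):
                continue
            words.append(word)
    return words


def load_all(directory):
    """Map each language (the file name minus '.words') to its word list."""
    return {p.stem: load_words(p) for p in sorted(Path(directory).glob("*.words"))}


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than the real lists.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "german.words").write_text("# sample\nder\ndie\n\nund\n", encoding="utf-8")
        print(load_all(d))  # {'german': ['der', 'die', 'und']}
```

Keying the result by file stem follows the naming convention above: `german.words` becomes `"german"`. Files without the `.words` suffix, such as README, are ignored.
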
A note regarding licensing:

The code and data in this directory are licensed under the OSL 2.1 by
virtue of being in this source tree. Please write to info@aox.org if
that's a problem for you. If anyone else wants to use this algorithm,
we'll be very flexible.

The data files in this directory are based on the following sources:

1. http://wortschatz.uni-leipzig.de/html/wliste.html

   The files german.words, dutch.words and french.words are based on
   Wortschatz material, transcoded to UTF-8.

2. Eva Schlittermann via email

   The file czech.words is largely based on a list supplied by Eva
   Schlittermann. Additions are welcome.

3. These ten pages contain the 10,000 most frequent words in Norwegian
   newspapers, as counted by the University of Oslo's Tekstlab project
   (http://www.hf.uio.no/tekstlab/).

   The original web pages were deleted at some point after we fetched
   them. Archive.org has copies:

     http://web.archive.org/web/20050324200652/http://www.hf.uio.no/tekstlab/frekvensordlister/aviser.frek.html
     .../aviser.frek2.html etc
     ...
     .../aviser.frek10.html

4. ftp://ftp.spraakbanken.gu.se/pub/statistik/PAROLE/parole_most_freq_10k.tgz

   Note that swedish.words contains fewer than 25% of the words in
   parole_most_freq_10k and has been modified slightly. For any purpose
   other than this algorithm, we recommend going to the source,
   http://spraakbanken.gu.se.

   GU distributes its language data under the following license:

   # --------------------------------------------------------- #
   # ---- license ---- #
   #---------------------------------------------------------- #
   Copyright (c) 2003 Språkbanken, Göteborgs universitet

   Permission is hereby granted, free of charge, to any person obtaining a
   copy of this resource and associated documentation files (the
   "Resource"), to deal in the Resource without restriction, including
   without limitation the rights to use, copy, modify, merge, publish,
   distribute, sublicense, and/or sell copies of the Resource, and to
   permit persons to whom the Resource is furnished to do so, subject to
   the following conditions:

   The above copyright notice and this permission notice shall be included
   in all copies or substantial portions of the Resource.

   THE RESOURCE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
   RESOURCE OR THE USE OR OTHER DEALINGS IN THE RESOURCE.
   #---------------------------------------------------------- #