Name             Date         Size         #Lines  LOC
..               03-May-2022  -
README           01-Jun-2018  3 KiB        72      52
czech.words      01-Jun-2018  2.5 KiB      292     291
dutch.words      01-Jun-2018  82.6 KiB     10,001  10,000
french.words     01-Jun-2018  86.6 KiB     10,000  9,999
german.words     01-Jun-2018  89.9 KiB     10,001  10,000
mkwordlist       01-Jun-2018  2.4 KiB      91      63
norwegian.words  01-Jun-2018  11 KiB       1,262   1,261
swedish.words    01-Jun-2018  23.4 KiB     2,486   2,485
wordlist.inc     01-Jun-2018  1,010.2 KiB  20,294  20,293

README

This dictionary contains lists of common words in UTF-8. Each file is
named for a language and contains common words in that language, one
word per line.

Any lines starting with '#' are disregarded.
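
The file format described above can be read with a few lines of code.
What follows is a minimal sketch in Python, not the code used in this
source tree: the function name load_words is hypothetical, and skipping
blank lines is an assumption, since this README only says that lines
starting with '#' are disregarded.

```python
def load_words(path):
    """Return the words in a *.words file, one per line, in file order.

    Lines starting with '#' are skipped, as this README describes.
    Blank lines are skipped too; that part is an assumption.
    """
    words = []
    # The word lists are UTF-8, so decode explicitly rather than
    # relying on the platform default encoding.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):
                continue  # comment line
            word = line.strip()
            if word:
                words.append(word)
    return words
```

For example, load_words("czech.words") would return the Czech words as
a list of strings, assuming the file is in the current directory.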

A note regarding licensing:

The code and data in this directory are licensed under the OSL 2.1 by
virtue of being in this source tree. Please write to info@aox.org if
that's a problem for you. If anyone else wants to use this algorithm,
we'll be very flexible.

The data files in this directory are based on the following sources:

1. http://wortschatz.uni-leipzig.de/html/wliste.html

   The files german.words, dutch.words and french.words are based on
   Wortschatz material, transcoded to UTF-8.

2. Eva Schlittermann via email

   The file czech.words is largely based on a list supplied by Eva
   Schlittermann. Additions are welcome.

3. These ten pages contain the 10,000 most frequent words in Norwegian
   newspapers, as counted by the University of Oslo's Tekstlab project
   (http://www.hf.uio.no/tekstlab/).

   The original web pages were deleted sometime after we fetched
   them. Archive.org has copies:

   http://web.archive.org/web/20050324200652/http://www.hf.uio.no/tekstlab/frekvensordlister/aviser.frek.html
   .../aviser.frek2.html etc.
   ...
   .../aviser.frek10.html

4. ftp://ftp.spraakbanken.gu.se/pub/statistik/PAROLE/parole_most_freq_10k.tgz

   Note that swedish.words contains less than 25% of
   parole_most_freq_10k and has been modified slightly. For any
   purpose other than this algorithm, we recommend going to the
   source, http://spraakbanken.gu.se.

   GU distributes its language data under the following license:

   # --------------------------------------------------------- #
   # ---- license                                         ---- #
   #---------------------------------------------------------- #
   Copyright (c) 2003 Språkbanken, Göteborgs universitet

   Permission is hereby granted, free of charge, to any person obtaining a
   copy of this resource and associated documentation files (the
   "Resource"), to deal in the Resource without restriction, including
   without limitation the rights to use, copy, modify, merge, publish,
   distribute, sublicense, and/or sell copies of the Resource, and to
   permit persons to whom the Resource is furnished to do so, subject to
   the following conditions:

   The above copyright notice and this permission notice shall be included
   in all copies or substantial portions of the Resource.

   THE RESOURCE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
   IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
   CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
   RESOURCE OR THE USE OR OTHER DEALINGS IN THE RESOURCE.
   #---------------------------------------------------------- #