=====================================================================

                              ======
                              README
                              ======

                            WEKA 3.8.5
                           21 Dec 2020

                 Java Programs for Machine Learning

           Copyright (C) 1998-2020  University of Waikato

              web: http://www.cs.waikato.ac.nz/~ml/weka

=====================================================================


Contents:
---------

1. Using one of the graphical user interfaces in Weka

2. Weka packages and the package manager

3. The Weka data format (ARFF)

4. Using Weka from the command line

   - Classifiers
   - Association rules
   - Filters

5. Database access

6. The Experiment package

7. Weka manual

8. Source code

9. Credits

10. Submission of code and bug reports

11. Copyright


----------------------------------------------------------------------

1. Using one of the graphical user interfaces in Weka:
------------------------------------------------------

This assumes that the Weka archive that you have downloaded has been
extracted into a directory containing this README and that you haven't
used an automatic installer (e.g. the one provided for Windows).

Weka 3.7 requires Java 1.6 or higher. Depending on your platform you
may be able to just double-click on the weka.jar icon to run the
graphical user interfaces for Weka. Otherwise, from a command line
(assuming you are in the directory containing weka.jar), type

java -jar weka.jar

or if you are using Windows use

javaw -jar weka.jar

Note:
Using "-jar" overrides your CLASSPATH variable! If you need to
use classes specified in the CLASSPATH, use the following command
instead:

java -classpath $CLASSPATH:weka.jar weka.gui.GUIChooser

or if you are using Windows use

javaw -classpath "%CLASSPATH%;weka.jar" weka.gui.GUIChooser

This will start a graphical user interface (weka.gui.GUIChooser) from
which you can select various interfaces, like the SimpleCLI interface
or the more sophisticated Explorer, Experimenter, and Knowledge Flow
interfaces. SimpleCLI just acts like a simple command shell. The
Explorer is currently the main interface for data analysis using
Weka. The Experimenter can be used to compare the performance of
different learning algorithms across various datasets. The Knowledge
Flow provides a component-based alternative to the Explorer interface.

Example datasets that can be used with Weka are in the sub-directory
called "data", which should be located in the same directory as this
README file.

The Weka user interfaces provide extensive built-in help facilities
(tool tips, etc.). Documentation for the Explorer can be found in
ExplorerGuide.pdf (also in the same directory as this README).

You can also start the GUI "Main" class from within weka.jar:

java -classpath weka.jar:$CLASSPATH weka.gui.Main

or if you are using Windows use

javaw -classpath "weka.jar;%CLASSPATH%" weka.gui.Main

----------------------------------------------------------------------

2. Weka packages and the package manager:
-----------------------------------------

From Weka 3.7.2 many Weka algorithms and tools have been extracted from
the main Weka distribution and encapsulated in separate downloadable
packages. These can be obtained from the Weka project on Sourceforge
and installed manually, or Weka's built-in package manager can be used
to take care of installing and removing packages. Both a command-line
and a GUI package manager are available for browsing and installing
packages. The package manager takes care of resolving dependencies and
checking for conflicts.

The GUI package manager can be found in the "Tools" menu of the
GUIChooser. Detailed information on how to use the package management
system can be found in the Weka manual ($WEKAINSTALL/WekaManual.pdf).

----------------------------------------------------------------------

3. The Weka data format (ARFF):
-------------------------------

Datasets for WEKA should be formatted according to the ARFF
format. (However, there are several converters included in WEKA that
can convert other file formats to ARFF. The Weka Explorer will use
these automatically if it doesn't recognize a given file as an ARFF
file.) Examples of ARFF files can be found in the "data" subdirectory.
What follows is a short description of the file format. A more
complete description is available from the Weka web page.

A dataset has to start with a declaration of its name:

@relation name

followed by a list of all the attributes in the dataset (including
the class attribute). These declarations have the form

@attribute attribute_name specification

If an attribute is nominal, specification contains a list of the
possible attribute values in curly brackets:

@attribute nominal_attribute {first_value, second_value, third_value}

If an attribute is numeric, specification is replaced by the keyword
numeric (integer values are treated as real numbers in WEKA):

@attribute numeric_attribute numeric

In addition to these two types of attributes, there also exists a
string attribute type. This attribute provides the possibility to
store a comment or ID field for each of the instances in a dataset:

@attribute string_attribute string

After the attribute declarations, the actual data is introduced by a

@data

tag, which is followed by a list of all the instances. The instances
are listed in comma-separated format, with a question mark
representing a missing value.

Comments are lines starting with % and are ignored by Weka.
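
As an illustration of the format described above, the following Java
sketch embeds a tiny, made-up dataset (the relation name, attribute
names, and values are invented for this example) and reads it with
plain string handling. It is not Weka's own ARFF reader and it ignores
quoting and other details covered by the complete format description;
it only shows how the @relation, @attribute, and @data lines fit
together.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the ARFF layout; NOT Weka's parser.
final class ArffSketch {

    // A made-up dataset: one nominal, one numeric, and one string attribute.
    static final String EXAMPLE = String.join("\n",
        "% comment lines start with a percent sign",
        "@relation toy_weather",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature numeric",
        "@attribute note string",
        "@data",
        "sunny,85,hot",
        "rainy,?,wet");

    // Names of the declared attributes, in declaration order.
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.toLowerCase().startsWith("@attribute ")) {
                names.add(t.split("\\s+")[1]);
            }
        }
        return names;
    }

    // Instance rows after @data, split on commas ("?" marks a missing value).
    static List<String[]> instances(String arff) {
        List<String[]> rows = new ArrayList<>();
        boolean inData = false;
        for (String line : arff.split("\n")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) continue;
            if (t.equalsIgnoreCase("@data")) { inData = true; continue; }
            if (inData) rows.add(t.split(","));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(attributeNames(EXAMPLE));
        for (String[] row : instances(EXAMPLE)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The second data row uses a question mark for the temperature, so that
instance has a missing value for the numeric attribute.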

----------------------------------------------------------------------

4. Using Weka from the command line:
------------------------------------

If you want to use Weka from your standard command-line interface
(e.g. bash under Linux):

a) Set WEKAINSTALL to be the directory which contains this README.
b) Add $WEKAINSTALL/weka.jar to your CLASSPATH environment variable.
c) Bookmark $WEKAINSTALL/doc/packages.html in your web browser.

Alternatively you can try using the SimpleCLI user interface available
from the GUI chooser discussed above.

In the following, the names of files assume use of a unix command line
with environment variables. For other command lines (including
SimpleCLI) you should substitute the name of the directory where
weka.jar lives for $WEKAINSTALL. If your platform uses a character
other than / as the path separator, also make the appropriate
substitutions.

===========
Classifiers
===========

Try:

java weka.classifiers.trees.J48 -t $WEKAINSTALL/data/iris.arff

This prints out a decision tree classifier for the iris dataset
and ten-fold cross-validation estimates of its performance. If you
don't pass any options to the classifier, WEKA will list all the
available options. Try:

java weka.classifiers.trees.J48

The options are divided into "general" options that apply to most
classification schemes in WEKA, and scheme-specific options that only
apply to the current scheme, in this case J48. WEKA has a common
interface to all classification methods. Any class that implements a
classifier can be used in the same way as J48 is used above. WEKA
knows that a class implements a classifier if it extends the
Classifier class in weka.classifiers. Almost all classes in
weka.classifiers fall into this category. Try, for example:

java weka.classifiers.bayes.NaiveBayes -t $WEKAINSTALL/data/labor.arff

Here is a list of some of the classifiers currently implemented in
weka.classifiers:

a) Classifiers for categorical prediction:

weka.classifiers.lazy.IBk: k-nearest neighbour learner
weka.classifiers.trees.J48: C4.5 decision trees
weka.classifiers.rules.PART: rule learner
weka.classifiers.bayes.NaiveBayes: naive Bayes with/without kernels
weka.classifiers.rules.OneR: Holte's OneR
weka.classifiers.functions.SMO: support vector machines
weka.classifiers.functions.Logistic: logistic regression
weka.classifiers.meta.AdaBoostM1: AdaBoost
weka.classifiers.meta.LogitBoost: logit boost
weka.classifiers.trees.DecisionStump: decision stumps (for boosting)
etc.

b) Classifiers for numeric prediction:

weka.classifiers.functions.LinearRegression: linear regression
weka.classifiers.trees.M5P: model trees
weka.classifiers.rules.M5Rules: model rules
weka.classifiers.lazy.IBk: k-nearest neighbour learner
weka.classifiers.lazy.LWL: locally weighted learning

=================
Association rules
=================

Besides classification schemes, WEKA contains other useful tools.
Association rules, for example, can be extracted using the Apriori
algorithm. Try:

java weka.associations.Apriori -t $WEKAINSTALL/data/weather.nominal.arff

=======
Filters
=======

There are also a number of tools that allow you to manipulate a
dataset. These tools are called filters in WEKA and can be found
in weka.filters.

weka.filters.unsupervised.attribute.Discretize: discretizes numeric data
weka.filters.unsupervised.attribute.Remove: deletes/selects attributes
etc.

Try:

java weka.filters.supervised.attribute.Discretize -i \
  $WEKAINSTALL/data/iris.arff -c last

----------------------------------------------------------------------

5. Database access:
-------------------

In terms of database connectivity, you should be able to use any
database with a Java JDBC driver. When using classes that access a
database (e.g. the Explorer), you will probably want to create a
properties file that specifies which JDBC drivers to use, where to
find the database, and how to map the data types. This file should
reside in your home directory or the current directory and be called
"DatabaseUtils.props". An example is provided in weka/experiment (you
need to expand weka.jar to be able to look at this file). Note that
the settings in this example file are used unless they are overridden
by settings in the DatabaseUtils.props file in your home directory or
the current directory (in that order).
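
The layering described above can be pictured with java.util.Properties
defaults. The Java sketch below is an illustration only, not Weka's
loading code. It assumes that "in that order" means the current
directory is consulted last and therefore wins, and it uses the key
names jdbcDriver and jdbcURL with placeholder driver classes and URLs;
check the example files in weka/experiment for the settings your
database actually needs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch of layered settings; NOT Weka's own loading code.
final class PropsLayeringSketch {

    // Parse properties text on top of the given defaults (may be null).
    // Keys missing from the text fall back to the defaults.
    static Properties layer(Properties defaults, String text) {
        Properties p = new Properties(defaults);
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Placeholder contents standing in for the three possible files.
        String shipped = "jdbcDriver=org.hsqldb.jdbcDriver\n"
                       + "jdbcURL=jdbc:hsqldb:hsql://localhost/test\n";
        String home    = "jdbcDriver=org.postgresql.Driver\n"
                       + "jdbcURL=jdbc:postgresql://localhost/experiments\n";
        String current = "jdbcURL=jdbc:postgresql://dbhost/experiments\n";

        // Shipped example first, then home directory, then current directory.
        Properties effective = layer(layer(layer(null, shipped), home), current);
        System.out.println(effective.getProperty("jdbcDriver"));
        System.out.println(effective.getProperty("jdbcURL"));
    }
}
```

In this sketch the driver comes from the home-directory text, while the
URL comes from the current-directory text because that layer is applied
last.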

There are also example DatabaseUtils.props files for several common
databases available (also in weka/experiment):

* HSQLDB: DatabaseUtils.props.hsql
* MS SQL Server 2000: DatabaseUtils.props.mssqlserver
* MS SQL Server 2005 Express Edition: DatabaseUtils.props.mssqlserver2005
* MySQL: DatabaseUtils.props.mysql
* ODBC: DatabaseUtils.props.odbc
* Oracle: DatabaseUtils.props.oracle
* PostgreSQL: DatabaseUtils.props.postgresql

----------------------------------------------------------------------

6. The Experiment package:
--------------------------

There is support for running experiments that involve evaluating
classifiers on repeated randomizations of datasets, over multiple
datasets (you can do much more than this, besides). The classes for
this reside in the weka.experiment package. The basic architecture is
that a ResultProducer (which generates results on some randomization
of a dataset) sends results to a ResultListener (which is responsible
for stating whether it already has the result, and otherwise storing
results).

Example ResultListeners include:

weka.experiment.CSVResultListener: outputs results as
comma-separated-value files.
weka.experiment.InstancesResultListener: converts results into a set
of Instances.
weka.experiment.DatabaseResultListener: sends results to a database
via JDBC.

Example ResultProducers include:

weka.experiment.RandomSplitResultProducer: train/test on a % split
weka.experiment.CrossValidationResultProducer: n-fold cross-validation
weka.experiment.AveragingResultProducer: averages results from another
ResultProducer
weka.experiment.DatabaseResultProducer: acts as a cache for results,
storing them in a database.

The RandomSplitResultProducer and CrossValidationResultProducer make
use of a SplitEvaluator to obtain actual results for a particular
split. Two are provided: ClassifierSplitEvaluator (for nominal
classification) and RegressionSplitEvaluator (for numeric
prediction). Each of these uses a Classifier for actual results
generation.

So, you might have a DatabaseResultListener that is sent results from
an AveragingResultProducer, which produces averages over the n results
produced for each run of an n-fold CrossValidationResultProducer,
which in turn is doing nominal classification through a
ClassifierSplitEvaluator, which uses OneR for prediction. Whew. But
you can combine these things together to do pretty much whatever you
want. You might want to write a LearningRateResultProducer that splits
a dataset into increasing numbers of training instances.
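
The chain described above can be pictured with a small Java sketch.
The interfaces and classes below (Listener, Producer, StoringListener,
FoldProducer, AveragingProducer) are hypothetical stand-ins written
for this illustration; they are not the classes in weka.experiment and
do not reproduce their method signatures. The fold scores are made-up
numbers rather than real classifier results.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the producer -> listener chain; the types below are
// hypothetical stand-ins, NOT the classes in weka.experiment.
final class ExperimentChainSketch {

    // Receives results and can say whether a result is still needed.
    interface Listener {
        boolean isRequired(String key);
        void accept(String key, double value);
    }

    // Generates results for one run and pushes them to a listener.
    interface Producer {
        void doRun(int run, Listener listener);
    }

    // Stores everything it is sent, in arrival order.
    static final class StoringListener implements Listener {
        final List<String> keys = new ArrayList<>();
        final List<Double> values = new ArrayList<>();
        public boolean isRequired(String key) { return !keys.contains(key); }
        public void accept(String key, double value) { keys.add(key); values.add(value); }
    }

    // Pretends to evaluate n folds; fold i of a run scores (run + i) / 10.0.
    static final class FoldProducer implements Producer {
        final int folds;
        FoldProducer(int folds) { this.folds = folds; }
        public void doRun(int run, Listener listener) {
            for (int i = 0; i < folds; i++) {
                listener.accept("run" + run + "-fold" + i, (run + i) / 10.0);
            }
        }
    }

    // Wraps another producer and forwards one averaged result per run.
    static final class AveragingProducer implements Producer {
        final Producer inner;
        AveragingProducer(Producer inner) { this.inner = inner; }
        public void doRun(int run, Listener listener) {
            String key = "run" + run + "-average";
            if (!listener.isRequired(key)) return; // result already stored
            StoringListener collected = new StoringListener();
            inner.doRun(run, collected);
            double sum = 0;
            for (double v : collected.values) sum += v;
            listener.accept(key, sum / collected.values.size());
        }
    }

    public static void main(String[] args) {
        StoringListener results = new StoringListener();
        Producer chain = new AveragingProducer(new FoldProducer(3));
        chain.doRun(1, results);
        chain.doRun(1, results); // skipped: the listener already has this key
        System.out.println(results.keys + " " + results.values);
    }
}
```

The second call to doRun does no work because the listener reports
that it already holds the averaged result for that run, which mirrors
the listener's role of stating whether it already has a result.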

To run a simple experiment from the command line, try:

java weka.experiment.Experiment -r -T datasets/UCI/iris.arff \
  -D weka.experiment.InstancesResultListener \
  -P weka.experiment.RandomSplitResultProducer -- \
  -W weka.experiment.ClassifierSplitEvaluator -- \
  -W weka.classifiers.rules.OneR

(Try "java weka.experiment.Experiment -h" to find out what these
options mean.)

If you have your results as a set of instances, you can perform paired
t-tests using weka.experiment.PairedTTester (use the -h option to find
out what options it needs).

However, all this is much easier if you use the Experimenter GUI.

----------------------------------------------------------------------

7. Weka Manual:
---------------

A comprehensive manual covering Weka's graphical and command-line
user interfaces is available in $WEKAINSTALL/WekaManual.pdf.

----------------------------------------------------------------------

8. Source code:
---------------

The source code for WEKA is in $WEKAINSTALL/weka-src.jar. To expand it,
use the jar utility that's in every Java distribution (or any file
archiver that can handle ZIP files).

----------------------------------------------------------------------

9. Credits:
-----------

Refer to the web page for a list of contributors:

http://www.cs.waikato.ac.nz/~ml/weka/

----------------------------------------------------------------------

10. Submission of code and bug reports:
---------------------------------------

If you have implemented a learning scheme, filter, application,
visualization tool, etc., using the WEKA classes, and you would
like to make it available to the community, then create a Weka
"package" and submit your package's "Description.props" file
to us. We will check the package to make sure that it works
as advertised and doesn't contain any "nasties" and then
add your Description.props to the central package metadata
repository hosted on Sourceforge. Hosting downloadable
packages is the responsibility of the contributor.

The conditions for new classifiers (schemes in general) are that,
firstly, they have to be published in the proceedings of a renowned
conference (e.g., ICML) or as an article in a respected journal (e.g.,
Machine Learning) and, secondly, that they outperform other standard
schemes (e.g., J48/C4.5).

More information on contributing to Weka and how to create a Weka
package can be found in the Weka manual ($WEKAINSTALL/WekaManual.pdf).

If you find a bug, where to report it depends on the component:

1) For core Weka components, i.e. everything in the main weka.jar file
(not including packages), send a bug report to the wekalist mailing
list.

2) For packages, check the package description (either online at
Sourceforge or by using the package management system) and contact the
maintainer of the package directly.

----------------------------------------------------------------------

11. Copyright:
--------------

The core WEKA system is distributed under the GNU General Public
License. Please read the file COPYING.

Packages may be distributed under various licenses - check the
description of the package in question for license details.

----------------------------------------------------------------------


$Revision: 1.13 $