# Natural Language Toolkit: Classifiers
#
# Copyright (C) 2001-2019 NLTK Project
# Author: Edward Loper <edloper@gmail.com>
# URL: <http://nltk.org/>
# For license information, see LICENSE.TXT

"""
Classes and interfaces for labeling tokens with category labels (or
"class labels").  Typically, labels are represented with strings
(such as ``'health'`` or ``'sports'``).  Classifiers can be used to
perform a wide range of classification tasks.  For example,
classifiers can be used...

- to classify documents by topic
- to classify ambiguous words by which word sense is intended
- to classify acoustic signals by which phoneme they represent
- to classify sentences by their author

Features
========
In order to decide which category label is appropriate for a given
token, classifiers examine one or more "features" of the token.  These
features are typically chosen by hand, and indicate which aspects
of the token are relevant to the classification decision.  For
example, a document classifier might use a separate feature for each
word, recording how often that word occurred in the document.

Featuresets
===========
The features describing a token are encoded using a "featureset",
which is a dictionary that maps from "feature names" to "feature
values".  Feature names are unique strings that indicate what aspect
of the token is encoded by the feature.  Examples include
``'prevword'``, for a feature whose value is the previous word, and
``'contains-word(library)'``, for a feature that is true when a
document contains the word ``'library'``.  Feature values are typically
booleans, numbers, or strings, depending on which feature they
describe.

Featuresets are typically constructed using a "feature detector"
(also known as a "feature extractor").  A feature detector is a
function that takes a token (and sometimes information about its
context) as its input, and returns a featureset describing that token.
For example, the following feature detector converts a document
(stored as a list of words) to a featureset describing the set of
words included in the document:

    >>> # Define a feature detector function.
    >>> def document_features(document):
    ...     return {'contains-word(%s)' % w: True for w in document}

Feature detectors are typically applied to each token before it is fed
to the classifier:

    >>> # Classify each Gutenberg document.
    >>> from nltk.corpus import gutenberg
    >>> for fileid in gutenberg.fileids(): # doctest: +SKIP
    ...     doc = gutenberg.words(fileid) # doctest: +SKIP
    ...     print(fileid, classifier.classify(document_features(doc))) # doctest: +SKIP

The parameters that a feature detector expects will vary, depending on
the task and the needs of the feature detector.  For example, a
feature detector for word sense disambiguation (WSD) might take as its
input a sentence, and the index of a word that should be classified,
and return a featureset for that word.  The following feature detector
for WSD includes features describing the left and right contexts of
the target word:

    >>> def wsd_features(sentence, index):
    ...     featureset = {}
    ...     # Up to three words to the left of the target word.
    ...     for i in range(max(0, index-3), index):
    ...         featureset['left-context(%s)' % sentence[i]] = True
    ...     # Up to three words to the right, clipped to the sentence end.
    ...     for i in range(index+1, min(index+4, len(sentence))):
    ...         featureset['right-context(%s)' % sentence[i]] = True
    ...     return featureset

Training Classifiers
====================
Most classifiers are built by training them on a list of hand-labeled
examples, known as the "training set".  Training sets are represented
as lists of ``(featureset, label)`` tuples, where each featureset is a
dictionary of the kind described above.
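As a minimal sketch, the following example builds a small training set
by hand and trains a naive Bayes classifier on it.  The documents,
labels, and the ``word_features`` helper are illustrative assumptions
made up for this example, not part of NLTK:

    >>> from nltk.classify import NaiveBayesClassifier, accuracy
    >>> def word_features(document):
    ...     # Hypothetical detector: one boolean feature per word.
    ...     return {'contains-word(%s)' % w: True for w in document}
    >>> labeled_docs = [
    ...     (['goal', 'match', 'team'], 'sports'),
    ...     (['team', 'score', 'win'], 'sports'),
    ...     (['doctor', 'diet', 'sleep'], 'health'),
    ...     (['diet', 'exercise', 'doctor'], 'health'),
    ... ]
    >>> # Pair each featureset with its hand-assigned label.
    >>> train_set = [(word_features(doc), label)
    ...              for (doc, label) in labeled_docs]
    >>> classifier = NaiveBayesClassifier.train(train_set)
    >>> classifier.classify(word_features(['team', 'win']))
    'sports'
    >>> # Score the classifier on labeled examples it was not trained on.
    >>> test_set = [(word_features(['score', 'goal']), 'sports'),
    ...             (word_features(['sleep', 'exercise']), 'health')]
    >>> accuracy(classifier, test_set)
    1.0

With so few examples the score says little about real performance;
training sets are usually far larger, and accuracy is best measured on
held-out data as done here.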
"""

from nltk.classify.api import ClassifierI, MultiClassifierI
from nltk.classify.megam import config_megam, call_megam
from nltk.classify.weka import WekaClassifier, config_weka
from nltk.classify.naivebayes import NaiveBayesClassifier
from nltk.classify.positivenaivebayes import PositiveNaiveBayesClassifier
from nltk.classify.decisiontree import DecisionTreeClassifier
from nltk.classify.rte_classify import rte_classifier, rte_features, RTEFeatureExtractor
from nltk.classify.util import accuracy, apply_features, log_likelihood
from nltk.classify.scikitlearn import SklearnClassifier
from nltk.classify.maxent import (
    MaxentClassifier,
    BinaryMaxentFeatureEncoding,
    TypedMaxentFeatureEncoding,
    ConditionalExponentialClassifier,
)
from nltk.classify.senna import Senna
from nltk.classify.textcat import TextCat