# Choosing an Estimator
Estimators make up the core of the Rubix ML library and include classifiers, regressors, clusterers, anomaly detectors, and meta-estimators organized into their own namespaces. They are responsible for making predictions and are usually trained with data. Most estimators allow tuning by adjusting their user-defined hyper-parameters. Hyper-parameters are arguments to the learning algorithm that affect its behavior during training and inference. The values of the hyper-parameters can be chosen by intuition, [tuning](hyper-parameter-tuning.md), or completely at random. The defaults provided by the library are a good place to start for most problems. To instantiate a new estimator, pass the desired values of the hyper-parameters to the estimator's constructor as in the example below.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Kernels\Distance\Minkowski;

$estimator = new KNearestNeighbors(10, false, new Minkowski(2.5));
```
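
Once instantiated, a learner is trained with a dataset object and can then make predictions on unseen samples. Here is a minimal sketch using the estimator above; the samples, labels, and smaller choice of k are invented purely for illustration.

```php
use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Kernels\Distance\Minkowski;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Hypothetical training set: two continuous features per sample.
$training = new Labeled([
    [2.0, 1.5], [2.2, 1.4], [8.1, 9.0], [7.9, 8.7],
], [
    'red', 'red', 'blue', 'blue',
]);

// A smaller k than above to suit the tiny example dataset.
$estimator = new KNearestNeighbors(3, false, new Minkowski(2.5));

$estimator->train($training);

// Predict the labels of unseen samples.
$predictions = $estimator->predict(new Unlabeled([
    [2.1, 1.6],
]));
```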

## Classifiers
Classifiers are supervised learners that predict a categorical *class* label. They can be used to recognize (`cat`, `dog`, `turtle`), differentiate (`spam`, `not spam`), or describe (`running`, `walking`) the samples in a dataset based on the labels they were trained on. In addition, classifiers that implement the [Probabilistic](probabilistic.md) interface can estimate the probability of each possible class given an unclassified sample.

| Name | Flexibility | [Proba](probabilistic.md) | [Online](online.md) | [Ranks Features](ranks-features.md) | [Verbose](verbose.md) | Data Compatibility |
|---|---|---|---|---|---|---|
| [AdaBoost](classifiers/adaboost.md) | High | ● | | | ● | Depends on base learner |
| [Classification Tree](classifiers/classification-tree.md) | Medium | ● | | ● | | Categorical, Continuous |
| [Extra Tree Classifier](classifiers/extra-tree-classifier.md) | Medium | ● | | ● | | Categorical, Continuous |
| [Gaussian Naive Bayes](classifiers/gaussian-naive-bayes.md) | Medium | ● | ● | | | Continuous |
| [K-d Neighbors](classifiers/kd-neighbors.md) | Medium | ● | | | | Depends on distance kernel |
| [K Nearest Neighbors](classifiers/k-nearest-neighbors.md) | Medium | ● | ● | | | Depends on distance kernel |
| [Logistic Regression](classifiers/logistic-regression.md) | Low | ● | ● | ● | ● | Continuous |
| [Multilayer Perceptron](classifiers/multilayer-perceptron.md) | High | ● | ● | | ● | Continuous |
| [Naive Bayes](classifiers/naive-bayes.md) | Medium | ● | ● | | | Categorical |
| [Radius Neighbors](classifiers/radius-neighbors.md) | Medium | ● | | | | Depends on distance kernel |
| [Random Forest](classifiers/random-forest.md) | High | ● | | ● | | Categorical, Continuous |
| [Softmax Classifier](classifiers/softmax-classifier.md) | Low | ● | ● | | ● | Continuous |
| [SVC](classifiers/svc.md) | High | | | | | Continuous |
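
For example, a [Classification Tree](classifiers/classification-tree.md) implements [Probabilistic](probabilistic.md), so it can return an estimated probability for each class alongside its predictions. The following is only a sketch; the samples and labels are invented for illustration.

```php
use Rubix\ML\Classifiers\ClassificationTree;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Invented training set: word count and number of links in an email.
$dataset = new Labeled([
    [120, 0], [95, 1], [32, 14], [48, 9],
], [
    'not spam', 'not spam', 'spam', 'spam',
]);

$estimator = new ClassificationTree();

$estimator->train($dataset);

$unknown = new Unlabeled([
    [40, 11],
]);

// A categorical class label for each sample ...
$predictions = $estimator->predict($unknown);

// ... or a probability estimate for each possible class.
$probabilities = $estimator->proba($unknown);
```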

## Regressors
Regressors are supervised learners that predict a continuous-valued outcome such as `1.275` or `655`. They can be used to quantify a sample, such as its credit score, age, or steering wheel position in degrees. Unlike classifiers, whose range of predictions is bounded by the number of possible classes in the training set, a regressor's range is unbounded, meaning the number of possible values a regressor *could* predict is infinite.

| Name | Flexibility | [Online](online.md) | [Ranks Features](ranks-features.md) | [Verbose](verbose.md) | [Persistable](persistable.md) | Data Compatibility |
|---|---|---|---|---|---|---|
| [Adaline](regressors/adaline.md) | Low | ● | ● | ● | ● | Continuous |
| [Extra Tree Regressor](regressors/extra-tree-regressor.md) | Medium | | ● | | ● | Categorical, Continuous |
| [Gradient Boost](regressors/gradient-boost.md) | High | | ● | ● | ● | Categorical, Continuous |
| [K-d Neighbors Regressor](regressors/kd-neighbors-regressor.md) | Medium | | | | ● | Depends on distance kernel |
| [KNN Regressor](regressors/knn-regressor.md) | Medium | ● | | | ● | Depends on distance kernel |
| [MLP Regressor](regressors/mlp-regressor.md) | High | ● | | ● | ● | Continuous |
| [Radius Neighbors Regressor](regressors/radius-neighbors-regressor.md) | Medium | | | | ● | Depends on distance kernel |
| [Regression Tree](regressors/regression-tree.md) | Medium | | ● | | ● | Categorical, Continuous |
| [Ridge](regressors/ridge.md) | Low | | ● | | ● | Continuous |
| [SVR](regressors/svr.md) | High | | | | | Continuous |
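
As a quick sketch of the difference in output, a regressor such as [Ridge](regressors/ridge.md) returns a floating point value for each sample rather than a class label. The data below is invented for illustration.

```php
use Rubix\ML\Regressors\Ridge;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Invented training set: square footage and number of bedrooms,
// labeled with a sale price.
$dataset = new Labeled([
    [1100, 2], [1500, 3], [2400, 4], [3000, 5],
], [
    150000.0, 210000.0, 330000.0, 410000.0,
]);

$estimator = new Ridge();

$estimator->train($dataset);

// Each prediction is a continuous value such as 265000.0.
$predictions = $estimator->predict(new Unlabeled([
    [1900, 3],
]));
```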

## Clusterers
Clusterers are unsupervised learners that predict an integer-valued cluster number such as `0`, `1`, ..., `n`. They are similar to classifiers; however, since they lack a supervised training signal, they cannot be used to recognize or describe samples. Instead, clusterers differentiate and group samples using only the information found within the structure of the samples, without their labels.

| Name | Flexibility | [Proba](probabilistic.md) | [Online](online.md) | [Verbose](verbose.md) | [Persistable](persistable.md) | Data Compatibility |
|---|---|---|---|---|---|---|
| [DBSCAN](clusterers/dbscan.md) | High | | | | | Depends on distance kernel |
| [Fuzzy C Means](clusterers/fuzzy-c-means.md) | Low | ● | | ● | ● | Continuous |
| [Gaussian Mixture](clusterers/gaussian-mixture.md) | Medium | ● | | ● | ● | Continuous |
| [K Means](clusterers/k-means.md) | Low | ● | ● | ● | ● | Continuous |
| [Mean Shift](clusterers/mean-shift.md) | Medium | ● | | ● | ● | Continuous |
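
Because clusterers are unsupervised, they train on an unlabeled dataset and their predictions are cluster numbers rather than class labels. Below is a minimal sketch with [K Means](clusterers/k-means.md); the samples and the choice of 2 clusters are invented for illustration.

```php
use Rubix\ML\Clusterers\KMeans;
use Rubix\ML\Datasets\Unlabeled;

// Invented samples forming two rough blobs in feature space.
$dataset = new Unlabeled([
    [1.0, 1.2], [0.8, 1.1], [1.1, 0.9],
    [9.0, 9.3], [9.2, 8.8], [8.9, 9.1],
]);

// The target number of clusters is a hyper-parameter.
$estimator = new KMeans(2);

$estimator->train($dataset);

// Each prediction is an integer cluster number such as 0 or 1.
$predictions = $estimator->predict($dataset);
```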

## Anomaly Detectors
Anomaly Detectors are unsupervised learners that predict whether a sample should be classified as an anomaly. We use the value `1` to indicate an outlier and `0` for a regular sample, and the predictions can be cast to their boolean equivalents if needed. Anomaly detectors that implement the [Scoring](scoring.md) interface can also output an anomaly score that can be used to sort the samples by their degree of anomalousness.

| Name | Scope | [Scoring](scoring.md) | [Online](online.md) | [Verbose](verbose.md) | [Persistable](persistable.md) | Data Compatibility |
|---|---|---|---|---|---|---|
| [Gaussian MLE](anomaly-detectors/gaussian-mle.md) | Global | ● | ● | | ● | Continuous |
| [Isolation Forest](anomaly-detectors/isolation-forest.md) | Local | ● | | | ● | Categorical, Continuous |
| [Local Outlier Factor](anomaly-detectors/local-outlier-factor.md) | Local | ● | | | ● | Depends on distance kernel |
| [Loda](anomaly-detectors/loda.md) | Local | ● | ● | | ● | Continuous |
| [One Class SVM](anomaly-detectors/one-class-svm.md) | Global | | | | ● | Continuous |
| [Robust Z-Score](anomaly-detectors/robust-z-score.md) | Global | ● | | | ● | Continuous |
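
For example, [Robust Z-Score](anomaly-detectors/robust-z-score.md) implements [Scoring](scoring.md), so in addition to its `0`/`1` predictions it can return a score for each sample. This sketch uses invented data and the detector's default hyper-parameters.

```php
use Rubix\ML\AnomalyDetectors\RobustZScore;
use Rubix\ML\Datasets\Unlabeled;

// Invented samples - the last one is an obvious outlier.
$dataset = new Unlabeled([
    [10.1, 4.9], [9.8, 5.2], [10.3, 5.0], [10.0, 4.8], [55.0, 30.0],
]);

$estimator = new RobustZScore();

$estimator->train($dataset);

// 1 marks an outlier, 0 a regular sample.
$predictions = $estimator->predict($dataset);

// Higher scores indicate a greater degree of anomalousness.
$scores = $estimator->score($dataset);
```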

## Model Flexibility Tradeoff
A characteristic of most estimator types is the notion of *flexibility*. Flexibility can be expressed in different ways, but greater flexibility usually comes with the capacity to handle more complex tasks. The tradeoff for flexibility is increased computational complexity, reduced model interpretability, and greater susceptibility to [overfitting](cross-validation.md#overfitting). In contrast, low flexibility models tend to be easier to interpret and quicker to train but are more prone to [underfitting](cross-validation.md#underfitting). In general, we recommend choosing the simplest model that does not underfit the training data for your project.

## Meta-estimator Ensembles
Ensemble learning is when multiple estimators are used together to make the final prediction of a sample. Meta-estimator ensembles can consist of multiple variations of the same estimator or a heterogeneous mix of estimators of the same type. They generally work by the principle of averaging and can often achieve greater accuracy than a single estimator at the cost of training more models.

### Bootstrap Aggregator
Bootstrap Aggregation, or *bagging*, is an ensemble learning technique that trains a set of learners that each specialize on a unique subset of the training set known as a bootstrap set. The final prediction made by the meta-estimator is the averaged prediction returned by the ensemble. In the example below, we'll wrap a [Regression Tree](regressors/regression-tree.md) in a [Bootstrap Aggregator](bootstrap-aggregator.md) to form a *forest* of 1000 trees.

```php
use Rubix\ML\BootstrapAggregator;
use Rubix\ML\Regressors\RegressionTree;

$estimator = new BootstrapAggregator(new RegressionTree(5), 1000);
```

### Committee Machine
[Committee Machine](committee-machine.md) is a voting ensemble consisting of estimators (referred to as *experts*) with user-programmable *influences*. Each expert is trained on the same dataset and the final prediction is based on the contribution of each expert weighted by their influence.

```php
use Rubix\ML\CommitteeMachine;
use Rubix\ML\Classifiers\RandomForest;
use Rubix\ML\Classifiers\SoftmaxClassifier;
use Rubix\ML\Classifiers\AdaBoost;
use Rubix\ML\Classifiers\ClassificationTree;

$estimator = new CommitteeMachine([
    new RandomForest(),
    new SoftmaxClassifier(128),
    new AdaBoost(new ClassificationTree(5), 1.0),
], [
    3.0, 1.0, 2.0, // Influences
]);
```

## No Free Lunch Theorem
At some point you may ask yourself, "Why do we need so many different learning algorithms?" The answer lies in the [No Free Lunch Theorem](https://en.wikipedia.org/wiki/No_free_lunch_theorem), which states that, when averaged over the space of *all* possible problems, no algorithm performs any better than the next. Perhaps a more useful way of stating NFL is that certain learners perform better at certain tasks and worse at others. This is explained by the fact that all learning algorithms have some prior knowledge inherent in them, whether via the choice of hyper-parameters or the design of the algorithm itself. Another consequence of No Free Lunch is that there exists no single estimator that performs best for every problem.