\name{NEWS}
\title{News for Package \pkg{caret}}
\newcommand{\cpkg}{\href{https://CRAN.R-project.org/package=#1}{\pkg{#1}}}
\newcommand{\issue}{\href{https://github.com/topepo/caret/issues/#1}{(issue #1)}}

\section{Changes in version 6.0-90}{
  \itemize{
    \item A \code{ptype} element was added to \code{train} objects that records the training set predictor columns and their types using a zero-row slice.
    \item Updated the internal object containing the subsampling information. The old package dependencies were being used.
  }
}

\section{Changes in version 6.0-89}{
  \itemize{
    \item SMOTE subsampling is now computed via the \cpkg{themis} package \issue{1226}.
    \item For SA feature selection, a better warning is given when there are too few iterations for computing differences \issue{1247}.
    \item Better error message when packages are missing \issue{1246}.
    \item \cpkg{e1071} was promoted from Suggests to Imports \issue{1238}.
  }
}


\section{Changes in version 6.0-88}{
  \itemize{
    \item Fixed cases where the "corr" filter was not run in \code{preProcess()}.
    \item Prediction type bug for Poisson glm model was fixed \issue{1231}.
    \item Fixed a \code{preProcess()} bug related to a single PCA component \issue{1181}.
    \item Fixed random forest bugs related to \code{rfe()} \issue{1077}, \issue{1062}.
    \item RuleFit was added via the \cpkg{pre} package \issue{1218}.
    \item Bugs were fixed where MAE was not treated as a minimization metric \issue{1224}.
  }
}

\section{Changes in version 6.0-87}{
  \itemize{
    \item The \cpkg{ordinalNet} model was given an additional tuning parameter \code{modeltype}.
  }
}


\section{Changes in version 6.0-86}{
  \itemize{
    \item Small release for \code{stringsAsFactors = TRUE} in R-4.0.
  }
}

\section{Changes in version 6.0-85}{
  \itemize{
    \item Internal changes required by r-devel for new matrix class structure.
    \item Michael Mayer contributed a faster version of \code{groupKFold()}. \issue{1108}
    \item A typo in a variable name was fixed. \issue{1087}
    \item Removed some warnings related to contrasts.
    \item Temporarily moved ROC calculations back to the \cpkg{pROC} package related to \code{JackStat/ModelMetrics#30}. \issue{1105}

  }
}

\section{Changes in version 6.0-84}{
  \itemize{
    \item Another new version was related to character encodings.
  }
}

\section{Changes in version 6.0-83}{
  \itemize{
    \item A new version was requested by CRAN since en dashes were used in the documentation.
    \item A bug was fixed where, for some recipes that involve class imbalance sampling, the resampling indices were computed incorrectly \issue{1030}.
    \item \code{train} now removes duplicate models in the tuning grid. Duplicates could occur for models with discrete parameters.
  }
}

\section{Changes in version 6.0-82}{
  \itemize{
    \item Immediate and required updates related to the different behavior of \code{sample} in R >= 3.6.
    \item \code{sbf}, \code{gafs}, and \code{safs} now accept recipes as inputs. A few sections of documentation were added to the \href{https://topepo.github.io/caret/}{bookdown site}.
    \item A bug was fixed in simulated annealing feature selection where the number of variables perturbed was based on the total number of variables instead of the more appropriate number of variables in the current subset.
    \item Convenience functions \code{ggplot.gafs} and \code{ggplot.safs} were added.
    \item \code{learning_curve_dat} now has a real name.
    \item The \code{earth} and \code{ctree} models were updated to fix bugs \issue{1022} \issue{1018}.
    \item When a model has the same value for its resamples, \code{plot.resamples} and \code{ggplot.resamples} now produce an estimate and missing values for the intervals (instead of failing) \issue{1007}.
  }
}

\section{Changes in version 6.0-81}{
  \itemize{
    \item The \code{blackboost} code gained a dependency on \cpkg{partykit} due to changes in \cpkg{mboost}.
    \item The internals were updated to work with the latest version of the \cpkg{recipes} package.
    \item Jason Muhlenkamp added better error messages for misspecified tuning parameters \issue{956}.
    \item Two bugs in random forests with RFE were fixed in \issue{942}.
    \item When correlation filters are used in \code{preProcess}, constant (i.e. zero-variance) columns are first removed to avoid errors \issue{966}.
    \item A bug was fixed when \code{train} models using weights were updated \issue{935}.
    \item Benjamin Allévius added more statistics to the output of \code{thresholder} \issue{938}.
    \item A bug was fixed that occurred when \code{indexFinal} was used where the recipe that was saved was created using the entire training set \issue{928}.
  }
}


\section{Changes in version 6.0-80}{
  \itemize{
    \item Two bugs associated with \code{varImp} in \code{lm} \issue{858} and in \code{bartMachine} were fixed by \code{hadjipantelis}.
    \item SMOTE sampling now works with tibbles \issue{875}.
    \item \cpkg{rpart} is now a dependency due to new CRAN policies.
    \item Added a \code{ggplot} method for \code{resamples} that produces confidence intervals.
    \item \code{hadjipantelis} added some fixes for \code{mxnet} models \issue{887}.
  }
}


\section{Changes in version 6.0-79}{
  \itemize{
    \item \code{keras} and \code{mxnet} models have better initialization of parameters \href{https://github.com/topepo/caret/pull/765/files}{pr 765}.
    \item The range preprocessing method can scale the data to an arbitrary range. Thanks to Sergey Korop.
    \item The spatial sign transformation will now operate on all non-missing predictors. Thanks to Markus Peter Auer \issue{789}.
    \item A variety of small changes were made to work with the new version of the \cpkg{gam} package \issue{828}.
    \item The package vignette was changed to HTML.
    \item A bug was fixed for computing variable importance scores with the various PLS methods \issue{848}.
    \item Fixed a \code{drop = FALSE} bug that occurred when computing class probabilities \issue{849}.
    \item An issue with predicting probabilities with \code{multinom} and one observation was fixed \issue{827}.
    \item A bug in the threshold calculation for choosing the number of PCA components was resolved \issue{825}.
    \item Models \code{mlpML} and \code{mlpWeightDecayML} now ignore layers with zero units. For example, if the number of layers was specified to be \code{c(5, 0, 3)} a warning is issued and the architecture is converted to \code{c(5, 3)} \issue{829}.
    \item \code{svmLinearWeights2} and \code{svmLinear3} may have chosen the incorrect SVM loss function. This was found by Dirk Neumann \issue{826}.
    \item \cpkg{bnclassify} models \code{awtan} and \code{awnb} were updated since they previously used deprecated functions. All \cpkg{bnclassify} models now require version 0.3.3 of that package or greater \issue{815}.
    \item \code{confusionMatrix.default} now requires \code{data} and \code{reference} to be factors and will throw an error otherwise. Previously, the vectors were converted to factors but this resulted in too many bug reports and misuse.
    \item \code{xyplot.resample} did not pass the dots to the underlying plot function \issue{853}.
    \item A bug with model \code{xgbDART} was fixed by \code{hadjipantelis}.
  }
}

\section{Changes in version 6.0-78}{
  \itemize{
    \item A number of changes were made to the underlying model code to repair problems caused by the previous version.
      In essence, unless the modeling package was formally loaded, the model code would fail in some cases. In the vast majority of cases, \code{train} will not load the package (but will load the namespace). There are some exceptions where this is not possible, including \code{bam}, \code{earth}, \code{gam}, \code{gamLoess}, \code{gamSpline}, \code{logicBag}, \code{ORFlog}, \code{ORFpls}, \code{ORFridge}, \code{ORFsvm}, \code{plsRglm}, \code{RSimca}, \code{rrlda}, \code{spikeslab}, and others. These are noted in \code{?models} and in the model code itself. The regression tests now catch these issues.
    \item The option to control the minimum node size for models \code{ranger} and \code{Rborist} was added by \code{hadjipantelis} \issue{732}.
    \item The rule-based model \code{GFS.GCCL} was removed from the model library.
    \item A bug was fixed affecting models using the \pkg{sparsediscrim} package (i.e. \code{dda} and \code{rlda}) where the class probability values were reversed \issue{761}.
    \item The \code{keras} models now clear the session prior to each model fit to avoid problems. Also, on the last fit, the model is serialized so that it can be used between sessions. The \code{predict} code will automatically undo this encoding so that the user does not have to manually intervene.
    \item A bug in \code{twoClassSummary} was fixed that prevents failure when the class level includes "y" \issue{770}.
    \item The \code{preProcess} function can now scale variables to a range where the user can set the high and low values \issue{730}. Thanks to Sergey Korop.
    \item Erwan Le Pennec fixed some issues when \code{train} was run using some parallel processing backends (e.g. \code{doFuture} and \code{doAzureParallel}) \issue{748}.
    \item Waleed Muhanna found and fixed a bug in \code{twoClassSim} when irrelevant variables were generated \issue{744}.
    \item \code{hadjipantelis} added the DART model (aka "Dropouts meet Multiple Additive Regression Trees") with the model code \code{xgbDART} \issue{742}.
    \item Vadim Khotilovich updated \code{predict.dummyVars} to run faster with large datasets with many factors \issue{727}.
    \item \code{spatialSign} now has the option of removing missing data prior to computing the norm \issue{789}.
    \item The various \cpkg{earth} models have been updated to work with recent versions of that package, including multi-class \code{glm} models \issue{779}.
  }
}

\section{Changes in version 6.0-77}{
  \itemize{
    \item Two neural network models (containing up to three hidden layers) using \code{mxnet} were added: \code{mxnet} (optimiser: SGD) and \code{mxnetAdam} (optimiser: ADAM).
    \item A new method was added for \code{train} so that \cpkg{recipes} can be used to specify the model terms and preprocessing. Alexis Sardá provided a great deal of help converting the bootstrap optimism code to the new workflows. A new chapter was added to the package website related to recipes.
    \item The Yeo-Johnson transformation parameter estimation code was rewritten and no longer requires the \code{car} package.
    \item The leave-one-out cross-validation workflow for \code{train} has been harmonized with the other resampling methods in terms of fault tolerance and prediction trimming.
    \item \code{train} now uses different random numbers to make resamples. Previously, setting the seed prior to calling \code{train} should result in getting the same resamples. However, if \code{train} loaded or imported a namespace from another package, and that startup process used random numbers, it could lead to different random numbers being used. See \issue{452} for details. Now, \code{train} gets a separate (and more reproducible) seed that will be used to generate the resamples.
      However, this may affect random number reproducibility between this version and previous versions. Otherwise, this change should increase the reproducibility of results.
    \item Erwan Le Pennec conducted the herculean task of modifying all of the model code to call by namespace (instead of fully loading each required package). This should reduce naming conflicts \issue{701}.
    \item MAE was added as an output metric for regression tasks through \code{postResample} and \code{defaultSummary} by hadjipantelis. The function is now exposed to the users \issue{657}.
    \item More average precision/recall statistics were added to \code{multiClassSummary} \issue{697}.
    \item The package website code was updated to use version 4 of the D3 JS library and now uses \cpkg{heatmaply} to make the interactive heatmap.
    \item Added a \code{ggplot} method for lift objects (and fixed a bug in the \code{lattice} version of the code) for \issue{656}.
    \item Vadim Khotilovich made a change to speed up \code{predict.dummyVars} \issue{727}.
    \item The model code for \code{ordinalNet} was updated for recent changes to that package.
    \item \code{oblique.tree} was removed from the model library.
    \item The default grid generation for rotation forest models now provides better values of \code{K}.
    \item The parameter ranges for \code{AdaBag} and \code{AdaBoost.M1} were changed; the number of iterations in the default grids has been lowered.
    \item Switched to non-formula interface in ranger. Also, another tuning parameter was added to ranger (\code{splitrule}) that can be used to change the splitting procedure and includes extremely randomized trees. This requires version 0.8.0 of the \cpkg{ranger} package. \issue{581}
    \item A simple "null model" was added. For classification, it predicts using the most prevalent level and, for regression, fits an intercept only model.
      \issue{694}
    \item A function \code{thresholder} was added to analyze the resample results for two class problems to choose an appropriate probability cutoff a la \url{https://topepo.github.io/caret//using-your-own-model-in-train.html#Illustration5} \issue{224}.
    \item Two neural network models (containing a single hidden layer) using \code{tensorflow}/\code{keras} were added. \code{mlpKerasDecay} uses standard weight decay while \code{mlpKerasDropout} uses dropout for regularization. Both use the RMSProp optimizer and have a lot of tuning parameters. Two additional models, \code{mlpKerasDecayCost} and \code{mlpKerasDropoutCost}, are classification only and perform cost-sensitive learning. Note that these models will not run in parallel using \cpkg{caret}'s parallelism and also will not give reproducible results from run-to-run (see \url{https://github.com/rstudio/keras/issues/42}).
    \item The range for one parameter (\code{gamma}) was modified in the \code{mlpSGD} model code.
    \item A bug in classification models with all missing predictions was fixed (found by andzandz11). \issue{684}
    \item A bug that prevented preprocessing from working properly when the preprocessing transformations are related to individual columns only was fixed by Mateusz Kobos in \issue{679}.
    \item A prediction bug in \code{glm.nb} that was found by jpclemens0 was fixed \issue{688}.
    \item A bug was fixed in Self-Organizing Maps via \code{xyf} for regression models.
    \item A bug was fixed in \code{rpartCost} related to how the tuning parameter grid was processed.
    \item A bug in negative-binomial GLM models (found by jpclemens0) was fixed \issue{688}.
    \item In \code{trainControl}, if \code{repeats} is used on methods other than \code{"repeatedcv"} or \code{"adaptive_cv"}, a warning is issued. Also, for methods other than these two, a new default (\code{NA}) is given to \code{repeats} \issue{720}.
    \item \code{rfFuncs} now computes importance on the first and last model fit. \issue{723}
  }
}


\section{Changes in version 6.0-76}{
  \itemize{
    \item Monotone multi-layer perceptron neural network models from the \cpkg{monmlp} package were added \issue{489}.
    \item A new resampling function (\code{groupKFold}) was added \issue{540}.
    \item The bootstrap optimism estimate was added by Alexis Sarda \issue{544}.
    \item Bugs in \code{glm}, \code{glm.nb}, and \code{lm} variable importance methods that occur when a single variable is in the model were fixed \issue{543}.
    \item A bug in \code{filterVarImp} was fixed where the ROC curve AUC could be much less than 0.50 because the directionality of the predictor was not taken into account. This will artificially increase the importance of some non-informative predictors. However, the bug might report the AUC for an important predictor to be 0.20 instead of 0.80 \issue{565}.
    \item \code{multiClassSummary} now reports the average F score \issue{566}.
    \item The \code{RMSE} and \code{R2} are now (re)exposed to the users \issue{563}.
    \item A \cpkg{caret} bug was discovered by Jiebiao Wang where \code{glmboost}, \code{gamboost}, and \code{blackboost} models incorrectly reported the class probabilities \issue{560}.
    \item Training data weights support was added to the \code{xgbTree} model by schistyakov.
    \item Regularized logistic regression through Liblinear (\code{LiblineaR::LiblineaR}) using L1 or L2 regularization was added by hadjipantelis.
    \item A bug related to the ordering of axes labels in the heatmap plot of training results was fixed by Mateusz Dziedzic in \issue{620}.
    \item A variable importance method for model averaged neural networks was added.
    \item More logic was added so that the \code{predict} method behaves well when a variable is subtracted from a model formula \issue{574}.
    \item More documentation was added for the \code{class2ind} function (\issue{592}).
    \item Fixed the formatting of the design matrices in the \code{dummyVars} man file.
    \item A note was added to \code{?trainControl} about using custom resampling methods (\issue{584}).
    \item A bug was fixed related to SMOTE and ROSE sampling with one predictor (\issue{612}).
    \item Due to changes in the \cpkg{kohonen} package, the \code{bdk} model is no longer available and the code behind the \code{xyf} model has changed substantially (including the tuning parameters). Also, when using \code{xyf}, a check is conducted to make sure that a recent version of the \cpkg{kohonen} package is being used.
    \item Changes were made to \code{xgbTree} and \code{xgbLinear} to help with sparse matrix inputs for \issue{593}. Sparse matrices are not allowed when preprocessing or subsampling are used.
    \item Several PLS models were using the classical orthogonal scores algorithm when discriminant analysis was conducted (despite using \code{simpls}, \code{widekernelpls}, or \code{kernelpls}). Now, the PLSDA model estimation method is consistent with the method requested (\issue{610}).
    \item Added Multi-Step Adaptive MCP-Net (\code{method = "msaenet"}) for \issue{561}.
    \item The variable importance score for linear regression was modified so that missing values in the coefficients are converted to zero.
    \item In \code{train}, \code{x} is now required to have column names.

  }
}

\section{Changes in version 6.0-73}{
  \itemize{
    \item Negative binomial generalized linear models (\code{MASS:::glm.nb}) were added \issue{476}.
    \item \code{mnLogLoss} now returns a named vector (\issue{514}, bug found by Jay Qi).
    \item A bunch of method/class related bugs induced by the previous version were fixed.
  }
}

\section{Changes in version 6.0-72}{
  \itemize{
    \item The inverse hyperbolic sine transformation was added to \code{preProcess} \issue{56}.
    \item Tyler Hunt moved the ROC code from the \cpkg{pROC} package to the \cpkg{ModelMetrics} package which should make the computations more efficient \issue{482}.
    \item \code{train} does a better job of respecting the original format of the input data \issue{474}.
    \item A bug in \code{bdk} and \code{xyf} models was fixed where the appropriate number of parameter combinations are tested during random search.
    \item A bug in \code{rfe} was fixed related to neural networks found by david-machinelearning \issue{485}.
    \item Neural networks via stochastic gradient descent (\code{method = "mlpSGD"}) were adapted for classification and a variable importance calculation was added.
    \item \href{https://www.h2o.ai/}{h2o} versions of glmnet and gradient boosting machines were added with methods \code{"glmnet\_h2o"} and \code{"gbm\_h2o"}. These methods are not currently optimized. \issue{283}
    \item The fuzzy rule-based models (\code{WM}, \code{SLAVE}, \code{SBC}, \code{HYFIS}, \code{GFS.THRIFT}, \code{GFS.LT.RS}, \code{GFS.GCCL}, \code{GFS.FR.MOGUL}, \code{FS.HGD}, \code{FRBCS.W}, \code{FRBCS.CHI}, \code{FIR.DM}, \code{FH.GBML}, \code{DENFIS}, and \code{ANFIS}) were modified so that the user can pass in the predictor ranges using the \code{range.data} argument to those functions. \issue{498}
    \item A variable importance method was added for boosted generalized linear models \issue{493}.
    \item \code{preProcess} now has an option to filter out highly correlated predictors.
    \item \code{trainControl} now has additional options to modify the parameters of near-zero variance and correlation filters. See the \code{preProcOptions} argument.
    \item The \code{rotationForest} and \code{rotationForestCp} methods were revised to evaluate only \emph{feasible} values of the parameter \code{K} (the number of variable subsets). The underlying \code{rotationForest} function reduces this parameter until \code{K} divides evenly into the number of parameters.
    \item The \code{skip} option from \code{createTimeSlices} was added to \code{trainControl} \issue{491}.
    \item \code{xgb.train}'s option \code{subsample} was added to the \code{xgbTree} model \issue{464}.
  }
}

\section{Changes in version 6.0-71}{
  \itemize{
    \item Precision, recall, and F measure functions were added along with one called \code{prSummary} that is analogous to \code{twoClassSummary}. Also, \code{confusionMatrix} gains an argument called \code{mode} that dictates what output is shown.
    \item schistyakov added additional tuning parameters to the robust linear model code \issue{454}. Also for \code{rlm} and \code{lm} schistyakov added the ability to tune over the intercept/no intercept model.
    \item Generalized additive models for very large datasets (\code{bam} in \cpkg{mgcv}) were added \issue{453}.
    \item Two more linear SVM models were added from the \cpkg{LiblineaR} package with model codes \code{svmLinear3} and \code{svmLinearWeights2} (\issue{441}).
    \item The \code{tau} parameter was added to all of the least square SVM models (\issue{415}).
    \item A new data set (called \code{scat}) on animal droppings was added.
    \item A significant bug was fixed where the internals of how R creates a model matrix was ignoring \code{na.action} when the default was set to \code{na.fail} \issue{461}. This means that \code{train} will now immediately fail if there are any missing data. To use imputation, use \code{na.action = na.pass} and the imputation method of your choice in the \code{preProcess} argument.
      Also, a warning is issued if the user asks for imputation but uses the formula method and excludes missing data in \code{na.action}.

  }
}

\section{Changes in version 6.0-70}{
  \itemize{
    \item Based on a comment by Alexis Sarda, \code{method = "ctree2"} no longer fixes \code{mincriterion = 0} and now tunes over this parameter. For a fixed depth, \code{mincriterion} can further prune the tree \issue{409}.
    \item A bug in KNN imputation was fixed (found by saviola777) that occurred when a factor predictor was in the data set \issue{404}.
    \item Infrastructure changes were made so that \code{train} tries harder to respect the original class of the outcome. For example, if an ordered factor is used as the outcome with a modeling function that treats it as an unordered factor, the model still produces an ordered factor during prediction.
    \item The \code{ranger} code now allows for case weights \issue{414}.
    \item \code{twoClassSim} now has an option to compute ordered factors.
    \item High-dimensional regularized discriminant analysis, regularized linear discriminant analysis, and several variants of diagonal discriminant analysis from the \cpkg{sparsediscrim} package were added (\code{method = "hdrda"}, \code{method = "rlda"}, and \code{method = "dda"}, respectively) \issue{313}.
    \item A neural network regression model optimized by stochastic gradient descent from the \cpkg{FCNN4R} package was added. The model code is \code{mlpSGD}.
    \item Several models for ordinal outcomes were added: \code{rpartScore} (from the \cpkg{rpartScore} package), \code{ordinalNet} (\cpkg{ordinalNet}), \code{vglmAdjCat} (\cpkg{VGAM}), \code{vglmContRatio} (\cpkg{VGAM}), and \code{vglmCumulative} (\cpkg{VGAM}). Note that, for models that load \cpkg{VGAM}, there is a conflict such that the \code{predictors} class code from \cpkg{caret} is masked. To use that method, you can use \code{caret:::predictors.train()} instead of \code{predictors()}.
    \item Another high performance random forest package (\cpkg{Rborist}) was exposed through \cpkg{caret}. The model code is \code{method = "Rborist"} \issue{418}.
    \item Xavier Robin fixed a bug related to the area under the ROC curve in \issue{431}.
    \item A bug in \code{print.train} was fixed when LOO CV was used \issue{435}.
    \item With RFE, a better error message drafted by mikekaminsky is printed when the number of importance measures is off \issue{424}.
    \item Another bug was fixed in estimating the prediction time when the formula method was used \issue{420}.
    \item A linear SVM model was added that uses class weights.
    \item The linear SVM model using the \cpkg{e1071} package (\code{method = "svmLinear2"}) had the \code{gamma} parameter for the RBF kernel removed.
    \item Xavier Robin committed changes to make sure that the area under the ROC is accurately estimated \issue{431}.
  }
}


\section{Changes in version 6.0-68}{
  \itemize{
    \item \code{print.train} no longer shows the standard deviation of the resampled values unless the new option is used (\code{print.train(, showSD = TRUE)}). When shown, they are within parentheses (e.g. "4.24 (0.493)").
    \item The innards of adaptive resampling were adjusted so that the test for linear dependencies is more stringent.
    \item A bug in the bootstrap 632 estimate was found and fixed by Alexis Sarda \issue{349} \issue{353}.
    \item The \code{cforest} module's \code{oob} element was modified based on another bug found by Alexis Sarda \issue{351}.
    \item The methods for \code{bagEarth}, \code{bagEarthGCV}, \code{bagFDA}, \code{bagFDAGCV}, \code{earth}, \code{fda}, and \code{gcvEarth} models have been updated so that case-weights can be used.
    \item The \code{rda} module contained a bug found by Eric Czech \issue{369}.
    \item A bug was fixed for printing out the resampling details with LGOCV found by github user zsharpm \issue{366}.
    \item A new data set was added (\code{data(Sacramento)}) with sale prices of homes.
    \item Another adaboost algorithm (\code{method = "adaboost"} from the \cpkg{fastAdaboost} package) was added \issue{284}.
    \item Yet another boosting algorithm (\code{method = "deepboost"} from the \cpkg{deepboost} package) was added \issue{388}.
    \item Alexis Sarda made changes to the confusion matrix code for \code{train}, \code{rfe}, and \code{sbf} objects that more rationally normalizes the resampled tables \issue{355}.
    \item A bug in how \cpkg{RSNNS} perceptron models were tuned (found by github user smlek) was fixed \issue{392}.
    \item A bug in computing the bootstrap 632 estimate was fixed (found by Stu) \issue{382}.
    \item John Johnson contributed an update to \code{xgbLinear} \issue{372}.
    \item Resampled confusion matrices are not automatically computed when there are 50 or more classes due to the storage requirements (\issue{356}). However, the relevant functions have been updated to use the out-of-sample predictions instead (when the user asks for them to be returned by the function).
    \item Some changes were made to \code{predict.train} to error trap (and fix) cases when predictions are requested without referencing a \code{newdata} object \issue{347}.
    \item Github user pverspeelt identified a bug in our model code for \code{glmboost} (and \code{gamboost}) related to the \code{mstop} function modifying the model object in memory. It was fixed \issue{396}.
    \item For \issue{346}, an option to select which samples are used to fit the final model, called \code{indexFinal}, was added to \code{trainControl}.
    \item For \issue{390}, found by JanLauGe, a bug was fixed in \code{dummyVars} related to the names of the resulting data set.
    \item Models \code{rknn} and \code{rknnBel} were removed since their package is no longer on CRAN.
  }
}

\section{Changes in version 6.0-66}{
  \itemize{
    \item Model averaged naive Bayes (\code{method = "manb"}) from the \cpkg{bnclassify} package was added.
    \item \code{blackboost} was updated to work with outcomes with 3+ classes.
    \item A new model \code{rpart1SE} was added. This has no tuning parameters and resamples the internal \cpkg{rpart} procedure of pruning using the one standard error method.
    \item Another model (\code{svmRadialSigma}) tunes over the cost parameter and the RBF kernel parameter sigma. In the latter case, using \code{tuneLength} will, at most, evaluate six values of the kernel parameter. This enables a broad search over the cost parameter and a relatively narrow search over \code{sigma}.
    \item Additional model tags for "Accepts Case Weights", "Two Class Only", "Handle Missing Predictor Data", "Categorical Predictors Only", and "Binary Predictors Only" were added. In some cases, a new model element called "notes" was added to the model code.
    \item A pre-processing method called "conditionalX" was added that eliminates predictors where the conditional distribution (X|Y) for that predictor has a single value. See the \code{checkConditionalX} function for details. This is only used for classification. \issue{334}
    \item A bug in the naive Bayes prediction code was found by github user pverspeelt and was fixed. \issue{345}
    \item Josh Brady (doublej2) found and fixed an issue with \code{dummyVars} \issue{344}.
    \item A bug related to recent changes to the \cpkg{ranger} package was fixed \issue{320}.
    \item Dependencies on external software can now be checked in the model code. See \href{https://github.com/topepo/caret/blob/master/models/deprecated/pythonKnnReg.R}{\code{pythonKnnReg}} for an example. This also removes the overall package dependency on \cpkg{rPython} \issue{328}.
    \item The tuning parameter grids for \code{enpls} and \code{enpls.fs} were changed to avoid errors.
    \item A bug was fixed \issue{342} where the data used for prediction was inappropriately converted from its original class.
    \item Matt (aka washcycle) added an option to return column names to the \code{nearZeroVar} function.
    \item Homer Strong fixed \code{varImp} for \code{glmnet} models so that they return the absolute value of the regression coefficients \issue{173} \issue{190}.
    \item The basic naive Bayes method (\code{method = "nb"}) gained a tuning parameter, \code{adjust}, that adjusts the bandwidth (see \code{?density}). The parameter is ignored when \code{usekernel = FALSE}.
  }
}


\section{Changes in version 6.0-62}{
  \itemize{
    \item From the \cpkg{randomGLM} package, a model of the same name was added.
    \item From the \cpkg{monomvn} package, models for the Bayesian lasso and ridge regression were added. In the latter case, two methods were added. \code{blasso} creates predictions using the mean of the posterior distributions but sets some parameters specifically to zero based on the tuning parameter called \code{sparsity}. For example, when \code{sparsity = .5}, only coefficients where at least half the posterior estimates are nonzero are used. The other model, \code{blassoAveraged}, makes predictions across all of the realizations in the posterior distribution without coercing any coefficients to zero. This is more consistent with Bayesian model averaging, but is unlikely to produce very sparse solutions.
    \item From the \cpkg{spikeslab} package, a regression model was added that emulates the procedure used by \code{cv.spikeslab} where the tuning variable is the number of retained predictors.
    \item A bug was fixed in adaptive resampling (found by github user elephann) \issue{304}.
    \item Fixed another adaptive resampling bug flagged by github user elephann related to the latest version of the \cpkg{BradleyTerry2} package. Thanks to Heather Turner for the fix \issue{310}.
    \item Yuan (Terry) Tang added more tuning parameters to \code{xgbTree} models.
    \item Model \code{svmRadialWeights} was updated to allow for class probabilities. Previously, \cpkg{kernlab} did not change the probability estimates when weights were used.
    \item A \cpkg{ggplot2} method for \code{varImp.train} was added \issue{231}.
    \item Changes were made for the package to work with the next version of \cpkg{ggplot2} \issue{317}.
    \item Github user \code{fjeze} added new models \code{mlpML} and \code{mlpWeightDecayML} that extend the existing \cpkg{RSNNS} models to multiple layers. \code{fjeze} also added the \code{gamma} parameter to the \code{svmLinear2} model.
    \item A function for generating data for learning curves was added.
    \item The range of SVM cost values explored in random search was expanded.
  }
}

\section{Changes in version 6.0-58}{
  \itemize{
    \item A major bug was fixed (found by Harlan Harris) where pre-processing objects created from versions of the package prior to 6.0-57 can give incorrect results when run with 6.0-57 \issue{282}.
    \item \code{preProcess} can now remove predictors using zero- and near zero-variance filters via \code{method} values of \code{"zv"} and \code{"nzv"}. When used, these filters are applied to numeric predictors prior to all other pre-processing operations.
    \item \code{train} now throws an error for classification tasks where the outcome has a factor level with no observed data \issue{260}.
    \item Character outcomes passed to \code{train} are not converted to factors.
    \item A bug was found and fixed in this package's class probability code for \code{gbm} models when a single multinomial observation is predicted \issue{274}.
    \item A new option to \code{ggplot.train} was added that highlights the optimal tuning parameter setting in the cases where grid search is used (thanks to Balaji Iyengar (github: bdanalytics)).
    \item In \code{trainControl}, the argument \code{savePredictions} can now be character values (\code{"final"}, \code{"all"} or \code{"none"}). Logicals can still be used and match to \code{"all"} or \code{"none"}.
  }
}


\section{Changes in version 6.0-57}{
  \itemize{
    \item Hyperparameter optimization via random search is now available. See the new \href{http://topepo.github.io/caret/random-hyperparameter-search.html}{help page} for examples and syntax.
    \item \code{preProcess} now allows (but ignores) non-numeric predictor columns.
    \item Models for optimal weighted and stabilized nearest neighbor classifiers from the \cpkg{snn} package were added with model codes \code{snn} and \code{ownn}.
    \item Random forests using the excellent \cpkg{ranger} package were added (\code{method = "ranger"}).
    \item An additional variation of rotation forests was added (\code{rotationForest2}) that also tunes over \code{cp}. Unfortunately, the sub-model trick can't be utilized in this instance.
    \item Kernelized distance weighted discriminant analysis models from \cpkg{kerndwd} were added (\code{dwdLinear}, \code{dwdPoly}, and \code{dwdRadial}).
    \item A bug was fixed with \code{rfe} when \code{train} was used to generate a classification model but class probabilities were not (or could not be) generated \issue{234}.
    \item Can Candan added a Python model \code{sklearn.neighbors.KNeighborsRegressor} that can be accessed via \code{train} using the \cpkg{rPython} package. The Python modules \code{sklearn} and \code{pandas} are required for this to run.
    \item Jason Aizkalns fixed a bunch of typos.
    \item MarwaNabil found a bug with \code{lift} and missing values \issue{225}. This was fixed such that missing values are removed prior to the calculations (within each model).
    \item Additional options were added to \code{LPH07_1} so that two class data can also be simulated and predictors are converted to factors.
    \item The model-specific code for computing out-of-bag performance estimates was moved into the model code library \issue{230}.
    \item A variety of naive Bayes and tree augmented naive Bayes classifiers from the \cpkg{bnclassify} package were added. Variations include simple models (methods labeled as \code{"nbDiscrete"} and \code{"tan"}), models using attribute weighting (\code{"awnb"} and \code{"awtan"}), and wrappers that use search methods to optimize the network structure (\code{"nbSearch"} and \code{"tanSearch"}). In each case, the predictors and outcomes must all be factor variables; for that reason, using the non-formula interface to \code{train} (e.g. \code{train(x, y)}) is critical to preserve the factor structure of the data.
    \item A function called \code{multiClassSummary} was added to compute performance values for problems with three or more classes. It works with or without predicted class probabilities \issue{107}.
    \item \code{confusionMatrix} was modified to deal with name collisions between this package and \cpkg{RSNNS} \issue{256}.
    \item A bug in how the LVQ tune grid is filtered was fixed.
    \item A bug in \code{preProcess} for ICA and PCA was fixed.
    \item Bugs in \code{avNNet} and \code{pcaNNet} when predicting class probabilities were fixed \issue{261}.
  }
}


\section{Changes in version 6.0-52}{
  \itemize{
    \item A new model using the \cpkg{randomForest} and \cpkg{inTrees} packages called \code{rfRules} was added. A basic random forest model is used and then is decomposed into rules (of user-specified complexity).
    The \cpkg{inTrees} package is used to prune and optimize the rules. Thanks to Mirjam Jenny who suggested the workflow.
    \item Other new models (and their packages): \code{bartMachine} (\cpkg{bartMachine}), \code{rotationForest} (\cpkg{rotationForest}), \code{sdwd} (\cpkg{sdwd}), \code{loclda} (\cpkg{klaR}), \code{nnls} (\cpkg{nnls}), \code{svmLinear2} (\cpkg{e1071}), \code{rqnc} (\cpkg{rqPen}), and \code{rqlasso} (\cpkg{rqPen}).
    \item When specifying your own resampling indices, a value of \code{method = "custom"} can be used with \code{trainControl} for better printing.
    \item Tim Lucas fixed a bug in \code{avNNet} when \code{bag = TRUE}.
    \item Fixed a bug found by \code{ruggerorossi} in \code{method = "dnn"} with classification.
    \item A new option called \code{sampling} was added to \code{trainControl} that allows users to subsample their data in the case of a class imbalance. Another \href{http://topepo.github.io/caret/subsampling-for-class-imbalances.html}{help page} was added to explain the features.
    \item Class probabilities can be computed for \code{extraTrees} models now.
    \item When PCA pre-processing is conducted, the variance trace is saved in an object called \code{trace}.
    \item More error traps were added for common mistakes (e.g. bad factor levels in classification).
    \item An internal function (\code{class2ind}) that can be used to make dummy variables for a single factor vector is now documented and exported.
    \item A bug was fixed in the \code{xyplot.lift} where the reference line was incorrectly computed. Thanks to Einat Sitbon for finding this.
    \item A bug related to calculating the Box-Cox transformation found by John Johnson was fixed.
    \item github user \code{EdwinTh} developed a faster version of \code{findCorrelation} and found a bug in the original code. \code{findCorrelation} has two new arguments, one of which is called \code{exact} which defaults to use the original (fixed) function.
    Using \code{exact = FALSE} uses the faster version. The fixed version of the "exact" code is, on average, 26-fold slower than the current version (for 250x250 matrices) although the average time for matrices of this size was only 26s. The exact version yields subsets that are, on average, 2.4 percent smaller than the other versions. This difference will be more significant for smaller matrices. The faster ("approximate") version of the code is 8-fold faster than the current version.
    \item github user \code{slyuee} found a bug in the \code{gam} model fitting code.
    \item Chris Kennedy fixed a bug in the \code{bartMachine} variable importance code.
  }
}

\section{Changes in version 6.0-47}{
  \itemize{
    \item CHAID from the R-Forge package \pkg{CHAID} was added.
    \item Models \code{xgbTree} and \code{xgbLinear} from the \code{xgboost} package were added. That package is not on CRAN and can be installed from github using the \cpkg{devtools} package and \code{install_github('dmlc/xgboost',subdir='R-package')}.
    \item \code{dratewka} enabled \code{rbf} models for regression.
    \item A summary function for the multinomial likelihood called \code{mnLogLoss} was added.
    \item The total object size for \code{preProcess} objects that used bagged imputation was reduced almost 5-fold.
    \item A new option to \code{trainControl} called \code{trim} was added that, when used, reduces the model's footprint. However, features beyond simple prediction may not work.
    \item A rarely occurring bug in \code{gbm} model code was fixed (thanks to Wade Cooper).
    \item \code{splom.resamples} now respects the \code{models} argument.
    \item A new argument to \code{lift} called \code{cuts} was added to allow more control over what thresholds are used to calculate the curve.
    \item The \code{cuts} argument of \code{calibration} now accepts a vector of cut points.
    \item Jason Schadewald noticed and fixed a bug in the man page for \code{dummyVars}.
    \item Call objects were removed from the following models: \code{avNNet}, \code{bagFDA}, \code{icr}, \code{knn3}, \code{knnreg}, \code{pcaNNet}, and \code{plsda}.
    \item An argument was added to \code{createTimeSlices} to thin the number of resamples.
    \item The RFE-related functions \code{lrFuncs}, \code{lmFuncs}, and \code{gamFuncs} were updated so that \code{rfe} accepts a matrix \code{x} argument.
    \item Using the default grid generation with \code{train} and \code{glmnet}, an initial \code{glmnet} fit is created with \code{alpha = 0.50} to define the \code{lambda} values.
    \item \code{train} models for \code{"gbm"}, \code{"gam"}, \code{"gamSpline"}, and \code{"gamLoess"} now allow their respective arguments for the outcome probability distribution to be passed to the underlying function.
    \item A bug in \code{print.varImp.train} was fixed.
    \item \code{train} now returns an additional column called \code{rowIndex} that is exposed when calling the summary function during resampling.
    \item The ability to compute class probabilities was removed from the \code{rpartCost} model since they are unlikely to agree with the class predictions.
    \item \code{extractProb} no longer redundantly calls \code{extractPrediction} to generate the class predictions.
    \item A new function called \code{var_seq} was added that finds a sequence of integers that can be useful for some tuning parameters such as \code{mtry} for random forests. Model modules were updated to use the new function.
    \item \code{n.minobsinnode} was added as a tuning parameter to \code{gbm} models.
    \item For models using out-of-bag resampling, \code{train} now properly checks the \code{metric} argument against the names of the measured outcomes.
    \item Both \code{createDataPartition} and \code{createFolds} were modified to better handle cases where one or more classes have very low numbers of data points.
  }
}

\section{Changes in version 6.0-41}{
  \itemize{
    \item The license was changed to GPL (>= 2) to accommodate new code from the GA package.
    \item New feature selection functions \code{gafs} and \code{safs} were added, along with helper functions and objects. The package HTML was updated to expand more about feature selection.
    \item From the \cpkg{adabag} package, two new models were added: \code{AdaBag} and \code{AdaBoost.M1}.
    \item Weighted subspace random forests from the \cpkg{wsrf} package were added.
    \item Additional bagged FDA and MARS models (model codes \code{bagFDAGCV} and \code{bagEarthGCV}) were added that use the GCV statistic to prune the model. This leads to memory reductions during training.
    \item The model code for \code{ada} had a bug fix applied and the code was adapted to use the "sub-model trick" so it should train faster.
    \item A bug was fixed related to imputation when the formula method is used with \code{train}.
    \item The old \code{drop = FALSE} bug was fixed in \code{getTrainPerf}.
    \item A bug was fixed for custom models with no labels.
    \item A bug fix was made for bagged MARS models when predicting probabilities.
    \item In \code{train}, the argument \code{last} was being incorrectly set for the last model.
    \item Reynald Lescarbeau refactored \code{findCorrelation} to make it faster.
    \item The apparent performance values are not reported by \code{print.train} when the bootstrap 632 estimate is used.
    \item When a required package is missing, the code stops earlier with a more explicit error message.
  }
}

\section{Changes in version 6.0-37}{
  \itemize{
    \item Brenton Kenkel added ordered logistic or probit regression to \code{train} using \code{method = "polr"} from \cpkg{MASS}.
    \item \code{LPH07_1} now encodes the noise variables as binary.
    \item Both \code{rfe} and \code{sbf} get arguments for \code{indexOut} for their control functions.
    \item A reworked version of \code{\link{nearZeroVar}}, based on code from Michael Benesty, was added that uses less memory and can be used in parallel. The old version is now called \code{nzv}.
    \item The adaptive mixture discriminant model from the \cpkg{adaptDA} package was added as well as a robust mixture discriminant model from the \cpkg{robustDA} package.
    \item The multi-class discriminant model using binary predictors in the \cpkg{binda} package was added.
    \item Ensembles of partial least squares models (via the \cpkg{enpls} package) were added.
    \item A bug using \code{gbm} with Poisson data was fixed (thanks to user eriklampa).
    \item \code{sbfControl} now has a \code{multivariate} option where all the predictors are exposed to the scoring function at once.
    \item A function \code{compare_models} was added that is a simple comparison of models via \code{diff.resamples}.
    \item The row names for the \code{variables} component of \code{rfe} objects were simplified.
    \item Philipp Bergmeir found a bug that was fixed where \code{bag} would not run in parallel.
    \item \code{predictionBounds} was not implemented during resampling.
  }
}

\section{Changes in version 6.0-35}{
  \itemize{
    \item A few bug fixes to \code{preProcess} were made related to KNN imputation.
    \item The parameter labels for polynomial SVM models were fixed.
    \item The tags for \code{dnn} models were fixed.
    \item The following functions were removed from the package: \code{generateExprVal.method.trimMean}, \code{normalize.AffyBatch.normalize2Reference}, \code{normalize2Reference}, and \code{PLS}. The original code and the man files can be found at \href{https://github.com/topepo/caret/tree/master/deprecated}{https://github.com/topepo/caret/tree/master/deprecated}.
    \item A number of changes to comply with section 1.1.3.1 of "Writing R Extensions" were made.
  }
}

\section{Changes in version 6.0-34}{
\itemize{

\item For the input data \code{x} to \code{train}, we now respect the class of the input value to accommodate other data types (such as sparse matrices). There are some complications though; for pre-processing we throw a
warning if the data are not simple matrices or data frames since there is some infrastructure that does not exist for other classes (e.g. \code{complete.cases}). We also throw a warning if \code{returnData = TRUE} and it cannot be converted to a data frame. This allows sparse matrices and text corpora to be used as inputs into that function.


\item \code{plsRglm} was added.

\item From the \cpkg{frbs} package, the following rule-based models were added: \code{ANFIS}, \code{DENFIS}, \code{FH.GBML}, \code{FIR.DM}, \code{FRBCS.CHI}, \code{FRBCS.W}, \code{FS.HGD}, \code{GFS.FR.MOGAL}, \code{GFS.GCCL}, \code{GFS.LTS}, \code{GFS.THRIFT}, \code{HYFIS}, \code{SBC} and \code{WM}. Thanks to Lala Riza for suggesting these and facilitating their addition to the package.

\item From the \cpkg{kernlab} package, SVM models using string kernels were added: \code{svmBoundrangeString}, \code{svmExpoString} and \code{svmSpectrumString}.

\item A function \code{update.rfe} was added.

\item \code{cluster.resamples} was added to the namespace.

\item An option to choose the \code{metric} was added to \code{summary.resamples}.

\item \code{prcomp.resamples} now passes \code{...} to \code{prcomp}. Also the call to \code{prcomp} uses the formula method so that \code{na.action} can be used.

\item The function \code{resamples} was enhanced so that, for \code{train} and \code{rfe} models that used \code{returnResamp="all"}, it subsets the resamples to get the appropriate values and issues a warning. The function also fills in missing model names if one or more are not given.

\item Several regression simulation functions were added: \code{SLC14_1}, \code{SLC14_2}, \code{LPH07_1} and \code{LPH07_2}.

\item \code{print.train} was re-factored so that \code{format.data.frame} is now used. This should behave better when using \cpkg{knitr}.

\item The error message in \code{train.formula} was improved to provide more helpful feedback in cases where there is at least one missing value in each row of the data set.


\item \code{ggplot.train} was modified so that groups are distinguished by color and shape.

\item Options were added to \code{plot.train} and \code{ggplot.train} called \code{nameInStrip} that will print the name and value of any tuning parameters shown in panels.

\item A bug was fixed by Jia Xu within the knn imputation code used by \code{preProcess}.
}
}

\section{Changes in version 6.0-30}{
\itemize{
\item A missing piece of documentation in \code{trainControl} for adaptive models was filled in.

\item A warning was added to \code{plot.train} and \code{ggplot.train} to note that the relationship between the resampled performance measures and the tuning parameters can be deceiving when using adaptive resampling.

\item A check was added to \code{trainControl} to make sure that a value of \code{min} makes sense when using adaptive resampling.

}
}

\section{Changes in version 6.0-29}{
\itemize{
\item A man page with the list of models available via \code{train} was added back into the package.
See \code{?models}.

\item Thoralf Mildenberger found and fixed a bug in the variable importance
calculation for neural network models.

\item The output of \code{varImp} for \code{pamr} models was updated to clarify the ordering of the importance scores.

\item \code{getModelInfo} was updated to generate a more informative error message if the user looks for a model that is not in the package's model library.

\item A bug was fixed related to how seeds were set inside of \code{train}.

\item The model \code{"parRF"} (parallel random forest) was added back into the library.

\item When case weights are specified in \code{train}, the hold-out weights are exposed when computing the summary function.

\item A check was made to convert a \code{data.table} given to \code{train} to a data frame (see \url{https://stackoverflow.com/questions/23256177/r-caret-renames-column-in-data-table-after-training}).

}
}

\section{Changes in version 6.0-25}{
\itemize{
\item Changes were made that stopped execution of \code{train} if there are no rows in the data (changes suggested by Andrew Ziem).

\item Andrew Ziem also helped improve the documentation.
}
}

\section{Changes in version 6.0-24}{
\itemize{
\item Several models were updated to work with case weights.

\item A bug in \code{rfe} was found where the largest subset size had the same results as the full model. Thanks to Jose Seoane for reporting the bug.
}
}

\section{Changes in version 6.0-22}{
\itemize{
\item For some parallel processing technologies, the package now exports
more internal functions.

\item A bug was fixed in \code{rfe} that occurred when LOO CV was used.

\item Another bug was fixed that occurred for some models when
\code{tuneGrid} contained only a single model.
  }
}


\section{Changes in version 6.0-21}{
\itemize{
\item A new system for user-defined models has been added.
\item When creating the grid of tuning parameter values, the column
names no longer need to be preceded by a period. Periods can still be
used as before but are not required. This isn't guaranteed to break
backwards compatibility but it may in some cases.

\item \code{trainControl} now has a \code{method = "none"} resampling
option that bypasses model tuning and fits the model to the entire
training set. Note that if more than one model is specified an error
will occur.

  \item \code{logicForest} models were removed since the package is
  now archived.

  \item \code{CSimca} and \code{RSimca} models from the \cpkg{rrcovHD}
  package were added.

  \item Model \code{elm} from the \cpkg{elmNN}
  package was added.

  \item Models \code{rknn} and \code{rknnBel} from the \cpkg{rknn}
  package were added.

  \item Model \code{brnn} from the \cpkg{brnn}
  package was added.

  \item \code{panel.lift2} and \code{xyplot.lift} now have an argument
  called \code{values} that shows the percentages of samples found for
  the specified percentages of samples tested.

\item \code{train}, \code{rfe} and \code{sbf} should no longer throw
a warning that "executing \%dopar\% sequentially: no parallel backend registered".

\item A \code{ggplot} method for \code{train} was added.

\item Imputation via medians was added to \code{preProcess} by Zachary Mayer.

\item A small change was made to \code{rpart} models. Previously, when the
final model is determined, it would be fit by specifying the model using the
\code{cp} argument of \code{rpart.control}. This could lead to duplicated Cp
values in the final list of possible Cp values. The current version fits the
final model slightly differently.
An initial model is fit using \code{cp = 0} and
is then pruned using \code{prune.rpart} to the desired depth. This
shouldn't be different for the vast majority of data sets. Thanks to Jeff
Evans for pointing this out.

\item The method for estimating sigma for SVM and RVM models was slightly
changed to make them consistent with how \code{ksvm} and \code{rvm} do the
estimation.

\item The default behavior for \code{returnResamp} in \code{rfeControl} and
  \code{sbfControl} is now \code{returnResamp = "final"}.

\item \code{cluster} was added as a general class with a specific method
for \code{resamples} objects.

\item The refactoring of model code resulted in a number of packages being
eliminated from the depends field. Additionally, a few were moved to exports.

}
}

\section{Changes in version 5.17-07}{
\itemize{
  \item A bug in \code{spatialSign} was fixed for data frames with
  a single column.

  \item Pre-processing was not applied to the training data set
  prior to grid creation. This is now done but only for models
  that use the data when defining the grid. Thanks to Brad Buchsbaum
  for finding the bug.

  \item Some code was added to \code{rfe} to truncate the subset
  sizes in case the user over-specified them.

  \item A bug was fixed in \code{gamFuncs} for the \code{rfe}
  function.

  \item Options in \code{trainControl}, \code{rfeControl} and
  \code{sbfControl} were added so that the user can set the
  seed at each resampling iteration (most useful for parallel
  processing). Thanks to Allan Engelhardt for the recommendation.

  \item Some internal refactoring of the data was done to prepare
  for some upcoming resampling options.

  \item \code{predict.train} now has an explicit \code{na.action}
  argument defaulted to \code{na.omit}. If imputation is used in
  \code{train}, then \code{na.action = na.pass} is recommended.

  \item A bug was fixed in \code{dummyVars} that occurred when
  missing data were in \code{newdata}. The function
  \code{contr.dummy} is now deprecated and \code{contr.ltfr}
  should be used (if you are using it at all). Thanks to
  stackexchange user mchangun for finding the bug.

  \item A check is now done inside \code{dummyVars} when
  \code{levelsOnly = TRUE} to see if any predictors share common
  levels.

  \item A new option \code{fullRank} was added to \code{dummyVars}.
  When true, \code{contr.treatment} is used. Otherwise,
  \code{contr.ltfr} is used.

  \item A bug in \code{train} was fixed with \code{gbm} models
  (thanks to stackoverflow user screechOwl for finding it).

  }
}


\section{Changes in version 5.16-24}{
\itemize{
  \item The \code{protoclass} function in the \cpkg{protoclass}
  package was added. The model uses a distance matrix as input and
  the \code{train} method also uses the \cpkg{proxy} package to
  compute the distance using the Minkowski distance. The two tuning
  parameters are the neighborhood size (\code{eps}) and the Minkowski
  distance parameter (\code{p}).

  \item A bug was (hopefully) fixed that occurred when some type of
  parallel processing was used with \code{train}. The problem is
  that the \code{methods} package was not being loaded in the workers.
  While reproducible, it is unknown why this occurs and why it is
  only for some technologies and systems. The \code{methods} package
  is now a formal dependency and we coerce the workers to load it
  remotely.

  \item A bug was fixed where some calls were printed twice.

  \item For \code{rpart}, \code{C5.0} and \code{ksvm}, cost-sensitive
  versions of these models for two classes were added to \code{train}.
  The method values are \code{rpartCost}, \code{C5.0Cost} and
  \code{svmRadialWeights}.

  \item The prediction code for the \code{ksvm} models was changed. There
are some cases where the class predictions and the predicted class
probabilities disagree. This usually happens when the probabilities are
close to 0.50 (in the two class case). A \cpkg{kernlab} bug has been
filed. In the meantime, if the \code{ksvm} model uses a probability
model, the class probabilities are generated first and the predicted
class is assigned to the probability with the largest value. Thanks to
Kjell Johnson for finding that one.

\item \code{print.train} was changed so that tune parameters that are
logicals are printed well.

  }
}


\section{Changes in version 5.16-13}{
\itemize{
  \item Added a few exemptions to the logic that determines whether a model call should be scrubbed.

  \item An error trap was created to catch issues with missing importance scores in \code{rfe}.

  }
}

\section{Changes in version 5.16-03}{
\itemize{
  \item A function \code{twoClassSim} was added for benchmarking classification models.

  \item A bug was fixed in \code{predict.nullModel} related to predicted class probabilities.

  \item The version requirement for \cpkg{gbm} was updated.

  \item The function \code{getTrainPerf} was made visible.

  \item The automatic tuning grid for \code{sda} models from the \cpkg{sda} package was changed to include \code{lambda}.

  \item When \code{randomForest} is used with \code{train} and \code{tuneLength == 1}, the \code{randomForest} default value for \code{mtry} is used.

  \item Maximum uncertainty linear discriminant analysis (\code{Mlda}) and factor-based linear discriminant analysis (\code{RFlda}) from the \cpkg{HiDimDA} package were added to \code{train}.
}
}

\section{Changes in version 5.15-87}{
\itemize{

\item Added the Yeo-Johnson power transformation from the \cpkg{car}
package to the \code{preProcess} function.

\item A \code{train} bug was fixed for the \code{rrlda} model (found
by Tiago Branquinho Oliveira).

\item The \code{extraTrees} model in the \cpkg{extraTrees} package was
added.

\item The \code{kknn.train} model in the \cpkg{kknn} package was
added.

\item A bug was fixed in \code{lrFuncs} where the class threshold was
improperly set (thanks to David Meyer).

\item A bug related to newer versions of the \cpkg{gbm} package was fixed.
Another \cpkg{gbm} bug was fixed related to using non-Bernoulli distributions
with two class outcomes (thanks to Zachary Mayer).

\item The old function \code{getTrainPerf} was finally made visible.

\item Some models are created using "do.call" and may contain the
entire data set in the call object. A function to "scrub" some model call
objects was added to reduce their size.

\item The tuning process for \code{sda:::sda} models was changed to
add the \code{lambda} parameter.

}
}

\section{Changes in version 5.15-60}{
\itemize{

\item A bug in \code{predictors.earth}, discovered by Katrina Bennett,
was fixed.

\item A bug induced by version 5.15-052 for the bootstrap 632 rule was
fixed.

\item The DESCRIPTION file as of 5.15-048 should have used a
version-specific lattice dependency.

\item \code{lift} can compute gain and lift charts (and defaults to
gain).

\item The \cpkg{gbm} model was updated to handle 3 or more classes.

\item For bagged trees using \cpkg{ipred}, the code in \code{train}
defaults to \code{keepX = FALSE} to save space. Pass in
\code{keepX = TRUE} to use out-of-bag sampling for this model.

\item Changes were made to support vector machines for classification
models due to bugs with class probabilities in the latest version of
\cpkg{kernlab}. The \code{prob.model} will default to the value of
\code{classProbs} in the \code{trControl} function. If
\code{prob.model} is passed in as an argument to \code{train}, this
specification overrides the default. In other words, to avoid
generating a probability model, set either \code{classProbs = FALSE}
or \code{prob.model = FALSE}.

}
}

\section{Changes in version 5.15-052}{
\itemize{

\item Added \code{bayesglm} from the \cpkg{arm} package.

\item A few bugs were fixed in \code{bag}, thanks to Keith
Woolner. Most notably, out-of-bag estimates are now computed when the
prediction function includes a column called \code{pred}.

\item Parallel processing was implemented in \code{bag} and
\code{avNNet}, which can be turned off using an optional argument.

\item \code{train}, \code{rfe}, \code{sbf}, \code{bag} and
\code{avNNet} were given an additional argument in their respective
control files called \code{allowParallel} that defaults to
\code{TRUE}. When \code{TRUE}, the code will be executed in parallel
if a parallel backend (e.g. \cpkg{doMC}) is registered. When
\code{allowParallel = FALSE}, the parallel backend is always
ignored. The use case is when \code{rfe} or \code{sbf} calls
\code{train}. If a parallel backend with P processors is being used,
the combination of these functions will create P^2 processes. Since
some operations benefit more from parallelization than others, the
user has the ability to concentrate computing resources for specific
functions.

\item A new resampling function called \code{createTimeSlices} was
contributed by Tony Cooper that generates cross-validation indices for
time series data.

\item A few more options were added to
\code{trainControl}.
\code{initialWindow}, \code{horizon} and 846\code{fixedWindow} are applicable for when \code{method = 847"timeslice"}. Another, \code{indexOut}, is an optional list of 848resampling indices for the hold-out set. By default, these values are 849the unique set of data points not in the training set. 850 851\item A bug was fixed in multiclass \code{glmnet} models when 852generating class probabilities (thanks to Bradley Buchsbaum for 853finding it). 854} 855} 856 857 858\section{Changes in version 5.15-048}{ 859\itemize{ 860 861\item The three vignettes were removed and two things were added: a 862smaller vignette and a large collection of help pages. 863 864\item Minkoo Seo found a bug where \code{na.action} was not being properly 865set with train.formula(). 866 867\item \code{parallel.resamples} was changed to properly account for 868missing values. 869 870\item Some testing code was removed from \code{probFunction} and 871\code{predictionFunction}. 872 873\item Fixed a bug in \code{sbf} exposed by a new version of \cpkg{plyr}. 874 875\item Changed the package dependency on \cpkg{reshape} to \cpkg{reshape2}. 876 877\item To be more consistent with recent versions of \cpkg{lattice}, 878the \code{parallel.resamples} function was changed to 879\code{parallelplot.resamples}. 880 881\item Since \code{ksvm} now allows probabilities when class weights 882are used, the default behavior in \code{train} is to set 883\code{prob.model = TRUE} unless the user explicitly sets it to 884\code{FALSE}. However, I have reported a bug in \code{ksvm} that gives 885inconsistent results with class weights, so this is not advised at 886this point in time. 887 888\item Bugs were fixed in \code{predict.bagEarth} and 889\code{predict.bagFDA}. 890 891\item When using \code{rfeControl(saveDetails = TRUE)} or 892\code{sbfControl(saveDetails = TRUE)} an additional column is 893added to \code{object$pred} called \code{rowIndex}. 
This indicates the 894row from the original data that is being held-out. 895 896} 897} 898 899\section{Changes in version 5.15-045}{ 900\itemize{ 901 902\item A bug was fixed that induced \code{NA} values in SVM model predictions. 903} 904} 905 906\section{Changes in version 5.15-042}{ 907\itemize{ 908 909\item Many examples are wrapped in dontrun to speed up CRAN checking. 910 911\item The \code{scrda} methods were removed from the package (on 9126/30/12, R Core sent an email that "since we haven't got fixes for 913long standing warnings of the rda packages since more than half a year 914now, we set the package to ORPHANED.") 915 916\item \cpkg{C50} was added (model codes \code{C5.0}, \code{C5.0Tree} and 917\code{C5.0Rules}). 918 919\item Fixed a bug in \code{train} with NaiveBayes when \code{fL != 0} 920was used 921 922\item The output of \code{train} with \code{verboseIter = TRUE} was 923modified to show the resample label as well as logging when the worker 924started and stopped the task (better when using parallel processing). 925 926\item Added a long-hidden function \code{downSample} for class imbalances 927 928\item An \code{upSample} function was added for class imbalances. 929 930\item A new file, aaa.R, was added to be compiled first that tries to 931eliminate the dreaded 'no visible binding for global variable' false 932positives. Specific namespaces were used with several functions to 933avoid similar warnings. 934 935\item A bug was fixed with \code{icr.formula} that was so ridiculous, 936I now know that nobody has ever used that function. 937 938\item Fixed a bug when using \code{method = "oob"} with \code{train} 939 940\item Some exceptions were added to \code{plot.train} so that some 941tuning parameters are better labeled. 942 943\item \code{dotplot.resamples} and \code{bwplot.resamples} now order 944the models using the first metric. 
945 946\item A few of the lattice plots for the \code{resamples} class were 947changed such that when only one metric is shown: the strip is not 948shown and the x-axis label displays the metric 949 950\item When using \code{trainControl(savePredictions = TRUE)} an 951additional column is added to \code{object$pred} called 952\code{rowIndex}. This indicates the row from the original data that is 953being held-out. 954 955\item A variable importance function for \code{nnet} objects was 956created based on Gevrey, M., Dimopoulos, I., & Lek, S. (2003). Review 957and comparison of methods to study the contribution of variables in 958artificial neural network models. Ecological Modelling, 160(3), 959249–264. 960 961\item The \code{predictor} function for \code{glmnet} was updated and a 962variable importance function was also added. 963 964\item Raghu Nidagal found a bug in \code{predict.avNNet} that was 965fixed. 966 967\item \code{sensitivity} and \code{specificity} were given an 968\code{na.rm} argument. 969 970\item A first attempt at fault tolerance was added to \code{train}. If 971a model fit fails, the predictions are set to \code{NA} and a warning 972is issued (e.g. "model fit failed for Fold04: sigma=0.00392, 973C=0.25"). When \code{verboseIter = TRUE}, the warning is also printed 974to the log. Resampled performance is calculated on only the 975non-missing estimates. This can also be done during predictions, but 976must be done on a model by model basis. Fault tolerance was added for 977\cpkg{kernlab} models only at this time. 978 979\item \code{lift} was modified in two ways. First, \code{cuts} is no 980longer an argument. The function always uses cuts based on the number 981of unique probability estimates. Second, a new argument called 982\code{label} is available to use alternate names for the models 983(e.g. names that are not valid R variable names). 984 985\item A bug in \code{print.bag} was fixed. 
986 987\item Class probabilities were not being generated for sparseLDA 988models. 989 990\item Bugs were fixed in the new varImp methods for PART and RIPPER 991 992\item Started using namespaces for \code{ctree} and \code{cforest} to 993avoid conflicts between duplicate function names in the \cpkg{party} 994and \cpkg{partykit} packages 995 996\item A set of functions for RFE and logistic regression 997(\code{lrFuncs}) was added. 998 999\item A bug in \code{train} with \code{method="glmStepAIC"} was fixed 1000so that \code{direction} and other \code{stepAIC} arguments were 1001honored. 1002 1003\item A bug was fixed in \code{preProcess} where the number of ICA 1004components was not specified. (thanks to Alexander Lebedev) 1005 1006\item Another bug was fixed for oblique random forest methods in 1007\code{train}. (thanks to Alexander Lebedev) 1008 1009} 1010} 1011 1012\section{Changes in version 5.15-023}{ 1013\itemize{ 1014 1015\item The list of models that can accept factor inputs directly was 1016expanded to include the \cpkg{RWeka} models, \code{ctree}, 1017\code{cforest} and custom models. 1018 1019\item Added model \code{lda2}, which tunes by the number of functions 1020used during prediction. 1021 1022\item \code{predict.train} allows probability predictions for custom 1023models now (thanks to Peng Zhang) 1024 1025\item \code{confusionMatrix.train} was updated to use the default 1026\code{confusionMatrix} code when \code{norm = "none"} and only a 1027single hold-out was used. 1028 1029\item Added variable importance metrics for PART and RIPPER in the 1030\cpkg{RWeka} package. 1031 1032\item vignettes were moved from /inst/doc to /vignettes 1033 1034} 1035} 1036 1037 1038\section{Changes in version 5.14-023}{ 1039\itemize{ 1040 1041\item The model details in \code{?train} were changed to be more 1042readable 1043 1044\item Added two models from the \cpkg{RRF} package. 
\code{RRF} uses a 1045penalty for each predictor based on the scaled variable importance 1046scores from a prior random forest fit. \code{RRFglobal} sets a common, 1047global penalty across all predictors. 1048 1049\item Added two models from the \cpkg{KRLS} package: \code{krlsRadial} 1050and \code{krlsPoly}. Both have kernel parameters (\code{sigma} and 1051\code{degree}) and a common regularization parameter 1052\code{lambda}. The default for \code{lambda} is \code{NA}, letting the 1053\code{krls} function estimate it internally. \code{lambda} can also be 1054specified via \code{tuneGrid}. 1055 1056\item \code{twoClassSummary} was modified to wrap the call to 1057\code{pROC:::roc} in a \code{try} command. In cases where the hold-out 1058data are only from one class, this produced an error. Now it generates 1059\code{NA} values for the AUC when this occurs and a general warning is 1060issued. 1061 1062\item The underlying workflows for \code{train} were modified so that 1063missing values for performance measures would not throw an error (but 1064will issue a warning). 1065 1066} 1067} 1068 1069 1070\section{Changes in version 5.13-037}{ 1071\itemize{ 1072 1073\item Models \code{mlp}, \code{mlpWeightDecay}, \code{rbf} and 1074\code{rbfDDA} were added from \cpkg{RSNNS}. 1075 1076\item Functions \code{roc}, \code{rocPoint} and \code{aucRoc} finally 1077met their end. The cake was a lie. 1078 1079\item This NEWS file was converted over to Rd format. 1080} 1081} 1082 1083 1084\section{Changes in version 5.13-020}{ 1085\itemize{ 1086 1087\item \code{\link{lift}} was expanded into \code{\link{lift.formula}} 1088 for calculating the plot points and \code{\link{xyplot.lift}} to 1089 create the plot. 1090 1091\item The package vignettes were altered to stop loading external 1092 RData files. 1093 1094\item A few \code{match.call} changes were made to pass new R CMD 1095check tests. 
1096 1097\item \code{\link{calibration}}, \code{\link{calibration.formula}} and 1098 \code{\link{xyplot.calibration}} were created to make probability 1099 calibration plots. 1100 1101\item Model types \code{xyf} and \code{bdk} from the \cpkg{kohonen} 1102package were added. 1103 1104\item \code{\link{update.train}} was added so that tuning parameters 1105 can be manually set if the automated approach to setting their 1106 values is insufficient. 1107 1108 } 1109} 1110 1111\section{Changes in version 5.11-006}{ 1112\itemize{ 1113 1114\item When using \code{method = "pls"} in \code{\link{train}}, the 1115 \code{\link[pls]{plsr}} function used the default PLS algorithm 1116 ("kernelpls"). Now, the full orthogonal scores method is used. This 1117 results in the same model, but a more extensive set of values is 1118 calculated that enables VIP calculations (without much of a loss in 1119 computational efficiency). 1120 1121\item A check was added to \code{\link{preProcess}} to ensure valid 1122 values of \code{method} were used. 1123 1124\item A new method, \code{kernelpls}, was added. 1125 1126\item \code{residuals} and \code{summary} methods were added to 1127 \code{\link{train}} objects that pass the final model to their 1128 respective functions. 1129 1130 } 1131} 1132 1133\section{Changes in version 5.11-006}{ 1134\itemize{ 1135 1136\item Bugs were fixed that prevented hold-out predictions from being 1137 returned. 1138 1139 } 1140} 1141 1142\section{Changes in version 5.11-003}{ 1143\itemize{ 1144 1145\item A bug in \code{roc} was found when the classes were completely 1146 separable. 1147 1148\item The ROC calculations for \code{\link{twoClassSummary}} and 1149 \code{\link{filterVarImp}} were changed to use the \cpkg{pROC} 1150 package. This, and other changes, have increased efficiency. For 1151 \code{\link{filterVarImp}} on the cell segmentation data, this led to a 1152 54-fold decrease in execution time. 
For the Glass data in the 1153 \cpkg{mlbench} package, the speedup was 37-fold. Warnings were 1154 added for \code{roc}, \code{aucRoc} and 1155 \code{rocPoint} regarding their deprecation. 1156 1157\item random ferns (package \cpkg{rFerns}) were added 1158 1159\item Another sparse LDA model (from the \cpkg{penalizedLDA} package) was also added 1160 1161 } 1162} 1163 1164\section{Changes in version 5.09-002}{ 1165\itemize{ 1166 1167\item Fixed a bug which occurred when \code{\link[pls]{plsda}} models were used with class 1168 probabilities 1169 1170\item As of 8/15/11, the \code{\link[glmnet]{glmnet}} function was 1171 updated to return a character vector. Because of this, 1172 \code{\link{train}} required modification and a version requirement 1173 was put in the package description file. 1174 1175 } 1176} 1177 1178\section{Changes in version 5.09-006}{ 1179\itemize{ 1180 1181\item Shea X made a suggestion and provided code to improve the speed 1182 of prediction when sequential parameters are used for 1183 \code{\link[gbm]{gbm}} models. 1184 1185\item Andrew Ziem suggested an error check with \code{metric = "ROC"} and 1186 \code{classProbs = FALSE}. 1187 1188\item Andrew Ziem found a bug in how \code{\link{train}} obtained 1189 \code{\link[earth]{earth}} class probabilities 1190 1191 } 1192} 1193 1194\section{Changes in version 5.08-011}{ 1195\itemize{ 1196 1197\item Andrew Ziem found another small bug with parallel processing and 1198 \code{\link{train}} (functions in the caret namespace cannot be found). 1199 1200\item Ben Hoffman found a bug in \code{\link{pickSizeTolerance}} that was fixed. 1201 1202\item Jiaye Yu found (and fixed) a bug in getting predictions back from 1203 \code{\link{rfe}} 1204 1205 } 1206} 1207 1208\section{Changes in version 5.07-024}{ 1209\itemize{ 1210 1211\item Using \code{saveDetails = TRUE} in \code{\link{sbfControl}} or 1212 \code{\link{rfeControl}} will save the predictions on the hold-out 1213 sets (Jiaye Yu wins the prize for finding that one). 
1214 1215\item \code{\link{trainControl}} now has a logical to save the hold-out predictions. 1216 1217 } 1218} 1219 1220\section{Changes in version 5.07-005}{ 1221\itemize{ 1222 1223\item \code{type = "prob"} was added for \code{\link{avNNet}} prediction. 1224 1225\item A warning was added when a model from \cpkg{RWeka} is used with 1226 \code{\link{train}} and (it appears that) \cpkg{multicore} is being 1227 used for parallel processing. The session will crash, so don't do 1228 that. 1229 1230\item A bug was fixed where the extrapolation limits were being 1231 applied in \code{\link{predict.train}} but not in 1232 \code{\link{extractPrediction}}. Thanks to Antoine Stevens for 1233 finding this. 1234 1235\item Modifications were made to some of the workflow code to expose 1236 internal functions. When parallel processing was used with 1237 \cpkg{doMPI} or \cpkg{doSMP}, \cpkg{foreach} did not find some 1238 \cpkg{caret} internals (but \cpkg{doMC} did). 1239 1240 1241 } 1242} 1243 1244\section{Changes in version 5.07-001}{ 1245\itemize{ 1246 1247\item changed calls to \code{\link[pls]{predict.mvr}} since the \cpkg{pls} package now has a 1248 namespace. 1249 1250 } 1251} 1252 1253\section{Changes in version 5.06-002}{ 1254\itemize{ 1255 1256\item a beta version of custom models with \code{\link{train}} is included. The 1257 "caretTrain" vignette was updated with a new section that defines 1258 how to make custom models. 1259 1260 } 1261} 1262 1263\section{Changes in version 5.05-004}{ 1264\itemize{ 1265 1266\item laying some of the groundwork for custom models 1267 1268\item updates to get away from deprecated (mean and sd on data frames) 1269 1270\item The pre-processing in \code{\link{train}} bug of the last 1271 version was not entirely squashed. Now it is. 
1272 1273 } 1274} 1275 1276\section{Changes in version 5.04-007}{ 1277\itemize{ 1278 1279\item \code{\link{panel.lift}} was moved out of the examples in \code{?lift} and into the 1280 package along with another function, \code{\link{panel.lift2}}. 1281 1282\item \code{\link{lift}} now uses \code{\link{panel.lift2}} by default 1283 1284\item Added robust regularized linear discriminant analysis from the 1285 \cpkg{rrlda} package 1286 1287\item Added \code{evtree} from \cpkg{evtree} 1288 1289\item A weird bug was fixed that occurred when some models were run with 1290 sequential parameters that were fixed to single values (thanks to 1291 Antoine Stevens for finding this issue). 1292 1293\item Another bug was fixed where pre-processing with \code{\link{train}} could fail 1294 1295 } 1296} 1297 1298\section{Changes in version 5.03-003}{ 1299\itemize{ 1300 1301\item pre-processing in \code{\link{train}} did not occur for the final model fit 1302 1303 } 1304} 1305 1306\section{Changes in version 5.02-011}{ 1307\itemize{ 1308 1309\item A function, \code{\link{lift}}, was added to create lattice 1310objects for lift plots. 1311 1312\item Several models were added from the \cpkg{obliqueRF} package: 1313 'ORFridge' (linear combinations created using L2 regularization), 1314 'ORFpls' (using partial least squares), 'ORFsvm' (linear support 1315 vector machines), and 'ORFlog' (using logistic regression). As of 1316 now, the package only supports classification. 1317 1318\item Added regression models \code{simpls} and 1319 \code{widekernelpls}. These are new models since both 1320 \code{\link{train}} and \code{\link[pls]{plsr}} have an argument 1321 called \code{method}, so the computational algorithm could not be 1322 passed through using the three dots. 1323 1324\item Model \code{rpart} was added that uses \code{cp} as the tuning 1325 parameter. 
To make the model codes more consistent, \code{rpart} 1326 and \code{ctree} correspond to the nominal tuning parameters 1327 (\code{cp} and \code{mincriterion}, respectively) and \code{rpart2} 1328 and \code{ctree2} are the alternate versions using \code{maxdepth}. 1329 1330\item The text for \code{ctree}'s tuning parameter was changed to '1 - 1331P-Value Threshold' 1332 1333\item The argument \code{controls} was not being properly passed 1334 through in models \code{ctree} and \code{ctree2}. 1335 1336 } 1337} 1338 1339\section{Changes in version 5.01-001}{ 1340\itemize{ 1341 1342\item \code{controls} was not being set properly for \code{cforest} 1343models in \code{\link{train}} 1344 1345\item The print methods for \code{\link{train}}, \code{\link{rfe}} and 1346\code{\link{sbf}} did not recognize LOOCV 1347 1348\item \code{\link{avNNet}} sometimes failed with categorical outcomes with \code{bag = FALSE} 1349 1350\item A bug in \code{\link{preProcess}} was fixed that was triggered by matrices without 1351 dimnames (found by Allan Engelhardt) 1352 1353\item bagged MARS models with factor outcomes now work 1354 1355\item \code{cforest} was using the argument \code{control} instead of \code{controls} 1356 1357\item A few bugs for class probabilities were fixed for \code{slda}, \code{hdda}, 1358 \code{glmStepAIC}, \code{nodeHarvest}, \code{avNNet} and \code{sda} 1359 1360\item When looping over models and resamples, the \cpkg{foreach} 1361 package is now being used. Now, when using parallel processing, the 1362 \cpkg{caret} code stays the same and parallelism is invoked using 1363 one of the "do" packages (e.g. \cpkg{doMC}, \cpkg{doMPI}, etc). This 1364 affects \code{\link{train}}, \code{\link{rfe}} and 1365 \code{\link{sbf}}. Their respective man pages have been revised to 1366 illustrate this change. 
1367 1368\item The order of the results produced by \code{\link{defaultSummary}} was changed 1369 so that the ROC AUC is first 1370 1371\item A few man and C files were updated to eliminate R CMD check warnings 1372 1373\item Now that we are using foreach, the verbose option in \code{\link{trainControl}}, 1374 \code{\link{rfeControl}} and \code{\link{sbfControl}} are now defaulted to \code{FALSE} 1375 1376\item \code{\link{rfe}} now returns the variable ranks in a single data frame (previously 1377 there were data frames in lists of lists) for ease of use. This 1378 will break code from previous versions. The built-in RFE functions 1379 were also modified 1380 1381\item confusionMatrix methods for \code{\link{rfe}} and \code{\link{sbf}} were added 1382 1383\item NULL values of 'method' in \code{\link{preProcess}} are no longer allowed 1384 1385\item a model for ridge regression was added (\code{method = 'ridge'}) based on \code{\link[elasticnet]{enet}}. 1386 1387 1388 } 1389} 1390 1391\section{Changes in version 4.98}{ 1392\itemize{ 1393 1394\item A bug was fixed in a few of the bagging aggregation 1395 functions (found by Harlan Harris). 1396 1397\item Fixed a bug spotted by Richard Marchese Robinson in createFolds 1398 when the outcome was numeric. The issue is that 1399 \code{\link{createFolds}} is trying to randomize \code{n/4} numeric 1400 samples to \code{k} folds. With fewer than 40 samples, it could not 1401 always do this and would generate fewer than \code{k} folds in some 1402 cases. The change will adjust the number of groups based on 1403 \code{n} and \code{k}. For small samples sizes, it will not use 1404 stratification. For larger data sets, it will at most group the 1405 data into quartiles. 1406 1407\item A function \code{\link{confusionMatrix.train}} was added to get an average 1408 confusion matrix across resampled hold-outs when using the 1409 \code{\link{train}} function for classification. 
1410 1411\item Added another model, \code{\link{avNNet}}, that fits several neural networks 1412 via the \cpkg{nnet} package using different seeds, then averages the 1413 predictions of the networks. There is an additional bagging 1414 option. 1415 1416\item The default value of the 'var' argument of \code{\link{bag}} was changed. 1417 1418\item As requested, most options can be passed from 1419 \code{\link{train}} to \code{\link{preProcess}}. The 1420 \code{\link{trainControl}} function was re-factored and several 1421 options (e.g. \code{k}, \code{thresh}) were combined into a single 1422 list option called \code{preProcOptions}. The default is consistent 1423 with the original configuration: \code{preProcOptions = list(thresh 1424 = 0.95, ICAcomp = 3, k = 5)} 1425 1426\item Another option was added to \code{\link{preProcess}}. The \code{pcaComp} 1427 option can be used to set exactly how many components are used 1428 (as opposed to just a threshold). It defaults to \code{NULL} so that 1429 the threshold method is still used by default, but a non-null 1430 value of \code{pcaComp} over-rides \code{thresh}. 1431 1432\item When created within \code{\link{train}}, the call for \code{\link{preProcess}} is now 1433 modified to be a text string ("scrubbed") because the call could 1434 be very large. 1435 1436\item Removed two deprecated functions: \code{applyProcessing} and 1437\code{processData}. 1438 1439\item A new version of the cell segmentation data was saved and the 1440 original version was moved to the package website (see 1441 \code{\link{segmentationData}} for location). First, several 1442 discrete versions of some of the predictors (with the suffix 1443 \code{"Status"}) were removed. Second, there are several skewed 1444 predictors with minimum values of zero (that would benefit from 1445 some transformation, such as the log). 
A constant value of 1 was 1446 added to these fields: \code{AvgIntenCh2}, \code{FiberAlign2Ch3}, 1447 \code{FiberAlign2Ch4}, \code{SpotFiberCountCh4} and 1448 \code{TotalIntenCh2}. 1449 1450 1451 } 1452} 1453 1454\section{Changes in version 4.92}{ 1455\itemize{ 1456 1457\item Some tweaks were made to \code{\link{plot.train}} in an effort to get the group 1458key to look less horrid. 1459 1460\item \code{\link{train}}, \code{\link{rfe}} and \code{\link{sbf}} are 1461now able to estimate the time that these models take to predict new 1462samples. Their respective control objects have a new option, 1463\code{timingSamps}, that indicates how many of the training set samples 1464should be used for prediction (the default of zero means do not 1465estimate the prediction time). 1466 1467\item \code{\link{xyplot.resamples}} was modified. A new argument, 1468\code{what}, has values: \code{"scatter"} plots the resampled 1469performance values for two models; \code{"BlandAltman"} plots the 1470difference between two models by the average (aka an MA plot) for two 1471models; \code{"tTime"}, \code{"mTime"}, \code{"pTime"} plot the total 1472model building and tuning time (\code{"t"}) or the final model 1473building time (\code{"m"}) or the time to produce predictions 1474(\code{"p"}) against a confidence interval for the average 1475performance. 2+ models can be used. 1476 1477\item Three new model types were added to \code{\link{train}} using 1478 \code{\link[leaps]{regsubsets}} in the \cpkg{leaps} package: 1479 \code{"leapForward"}, \code{"leapBackward"} and \code{"leapSeq"}. The 1480 tuning parameter, \code{nvmax}, is the maximum number of terms in the 1481 subset. 
1482 1483\item The seed was accidentally set when \code{\link{preProcess}} used ICA (spotted 1484 by Allan Engelhardt) 1485 1486\item \code{\link{preProcess}} was always being called (even to do nothing) 1487 (found by Guozhu Wen) 1488 1489 } 1490} 1491 1492\section{Changes in version 4.91}{ 1493\itemize{ 1494 1495\item Added a few new models associated with the \cpkg{bst} package: bstTree, 1496bstLs and bstSm. 1497 1498\item Added a model denoted as \code{"M5"} that combines M5P and M5Rules from the 1499\cpkg{RWeka} package. This new model uses either of these functions 1500depending on the tuning parameter \code{"rules"}. 1501 1502 } 1503} 1504 1505\section{Changes in version 4.90}{ 1506\itemize{ 1507 1508\item Fixed a bug with \code{\link{train}} and \code{method = "penalized"}. Thanks to 1509Fedor for finding it. 1510 1511 } 1512} 1513 1514\section{Changes in version 4.89}{ 1515\itemize{ 1516 1517\item A new tuning parameter was added for \code{M5Rules} controlling smoothing. 1518 1519\item The Laplace correction value for Naive Bayes was also added as a 1520tuning parameter. 1521 1522\item \code{\link{varImp.RandomForest}} was updated to work. It now requires a recent 1523version of the \cpkg{party} package. 1524 1525 } 1526} 1527 1528\section{Changes in version 4.88}{ 1529\itemize{ 1530 1531\item A variable importance method was created for \cpkg{Cubist} models. 1532 1533 } 1534} 1535 1536\section{Changes in version 4.87}{ 1537\itemize{ 1538 1539\item Altered the earth/MARS/FDA labels to be more exact. 1540 1541\item Added cubist models from the \cpkg{Cubist} package. 1542 1543\item A new option to \code{\link{trainControl}} was added to allow 1544users to constrain the possible predicted values of the model to the 1545range seen in the training set or a user-defined range. One-sided 1546ranges are also allowed. 
1547 1548 } 1549} 1550 1551\section{Changes in version 4.85}{ 1552\itemize{ 1553 1554\item Two typos fixed in \code{\link{print.rfe}} and 1555\code{\link{print.sbf}} (thanks to Jan Lammertyn) 1556 1557 } 1558} 1559 1560\section{Changes in version 4.83}{ 1561\itemize{ 1562 1563\item \code{\link{dummyVars}} failed with formulas using \code{"."} 1564 (\code{all.vars} does not handle this well) 1565 1566\item \code{tree2} was failing for some classification models 1567 1568\item When SVM classification models are used with \code{class.weights}, the 1569option \code{prob.model} is automatically set to \code{FALSE} (otherwise, it 1570is always set to \code{TRUE}). A warning is issued that the model will 1571not be able to create class probabilities. 1572 1573\item Also for SVM classification models, there are cases when the 1574probability model generates negative class probabilities. In 1575these cases, we assign a probability of zero then coerce the 1576probabilities to sum to one. 1577 1578\item Several typos in the help pages were fixed (thanks to Andrew Ziem). 1579 1580\item Added a new model, \code{svmRadialCost}, that fits the SVM model 1581and estimates the \code{sigma} parameter for each resample (to 1582properly capture the uncertainty). 1583 1584\item \code{\link{preProcess}} has a new method called \code{"range"} that scales the predictors 1585to [0, 1] (which is approximate for new samples if the training set 1586range is narrow in comparison). 1587 1588\item A check was added to \code{\link{train}} to make sure that, when the user passes 1589a data frame to \code{\link{tuneGrid}}, the names are correct and complete. 1590 1591\item \code{\link{print.train}} prints the number of classes and levels for classification 1592models. 1593 1594 } 1595} 1596 1597\section{Changes in version 4.78}{ 1598\itemize{ 1599 1600\item Added a few bagging modules. See ?bag. 
1601 1602\item Added basic timings of the entire call to \code{\link{train}}, \code{\link{rfe}} and \code{\link{sbf}} 1603as well as the fit time of the final model. These are stored in an element 1604called "times". 1605 1606\item The data files were updated to use better compression, which added a 1607higher R version dependency. 1608 1609\item \code{\link{plot.train}} was pretty much re-written to more effectively use trellis theme 1610defaults and to allow arguments (e.g. axis labels, keys, etc) to be passed 1611in to over-ride the defaults. 1612 1613\item Bug fix for lda bagging function 1614 1615\item Bug fix for \code{\link{print.train}} when \code{preProc} is \code{NULL} 1616 1617\item \code{\link{predict.BoxCoxTrans}} would go all klablooey if there were missing 1618 values 1619 1620\item \code{\link{varImp.rpart}} was failing with some models (thanks to Maria Delgado) 1621 1622 } 1623} 1624 1625\section{Changes in version 4.77}{ 1626\itemize{ 1627 1628\item A new class was added for estimating and applying the Box-Cox 1629transformation to data called BoxCoxTrans. This is also included as an 1630option to transform predictor variables. Although the Box-Tidwell 1631transformation was invented for this purpose, the Box-Cox transformation 1632is more straightforward, less prone to numerical issues and just as 1633effective. This method was also added to \code{\link{preProcess}}. 1634 1635\item Fixed mis-labelled x axis in \code{\link{plot.train}} when a 1636transformation is applied for models with three tuning parameters. 1637 1638\item When plotting a \code{\link{train}} object with \code{method == 1639"gbm"} and multiple values of the shrinkage parameter, the ordering of 1640panels was improved. 1641 1642\item Fixed bugs for regression prediction using \code{partDSA} and 1643\code{qrf}. 1644 1645\item Another bug, reported by Jan Lammertyn, related to 1646\code{\link{extractPrediction}} with a single predictor was also 1647fixed. 
1648 1649 } 1650} 1651 1652\section{Changes in version 4.76}{ 1653\itemize{ 1654 1655\item Fixed a bug where linear SVM models were not working for classification 1656 1657 } 1658} 1659 1660\section{Changes in version 4.75}{ 1661\itemize{ 1662 1663\item \code{'gcvEarth'} which is the basic MARS model. The pruning procedure 1664 is the nominal one based on GCV; only the degree is tuned by \code{\link{train}}. 1665 1666\item \code{'qrnn'} for quantile regression neural networks from the \cpkg{qrnn} package. 1667 1668\item \code{'Boruta'} for random forests models with feature selection via the 1669 \cpkg{Boruta} package. 1670 1671 } 1672} 1673 1674\section{Changes in version 4.74}{ 1675\itemize{ 1676 1677\item Some changes to \code{\link{print.train}}: the call is not automatically 1678 printed (but can be when \code{\link{print.train}} is explicitly invoked); the 1679 "Selected" column is also not automatically printed (but can be); 1680 non-table text now respects \code{options("width")}; only significant 1681 digits are now printed when tuning parameters are kept at a 1682 constant value 1683 1684 } 1685} 1686 1687\section{Changes in version 4.73}{ 1688\itemize{ 1689 1690\item Bug fixes to \code{\link{preProcess}} related to complete.cases and a single predictor. 1691 1692\item For knn models (knn3 and knnreg), added automatic conversion of data frames 1693to matrices 1694 1695 } 1696} 1697 1698\section{Changes in version 4.72}{ 1699\itemize{ 1700 1701\item A new function for \code{\link{rfe}} with \cpkg{gam} was added. 1702 1703\item "Down-sampling" was implemented with \code{\link{bag}} so that, for 1704classification models, each class has the same number of samples 1705as the smallest class. 1706 1707\item Added a new class, \code{\link{dummyVars}}, that creates an entire set of 1708binary dummy variables (instead of the reduced, full rank set). 1709The initial code was suggested by Gabor Grothendieck on R-Help. 
The predict method is used to create dummy variables for any
data set.

\item Added \code{\link{R2}} and \code{\link{RMSE}} functions for evaluating regression models

\item \code{\link{varImp.gam}} failed to recognize objects from \cpkg{mgcv}

\item A small fix to test a logical vector in \code{\link{filterVarImp}}

\item When \code{\link{diff.resamples}} calculated the number of comparisons,
 the \code{"models"} argument was ignored.

\item \code{\link{predict.bag}} was ignoring \code{type = "prob"}

\item Minor updates to conform to R 2.13.0

 }
}

\section{Changes in version 4.70}{
\itemize{

\item Added a warning to \code{\link{train}} when class levels are not
valid R variable names.

\item Fixed a bug in the variable importance function for
\code{multinom} objects.

\item Added p-value adjustments to
\code{\link{summary.diff.resamples}}. Confidence intervals in
\code{\link{dotplot.diff.resamples}} are adjusted accordingly if the
Bonferroni adjustment is used.

\item For \code{\link{dotplot.resamples}}, no point was plotted when
the upper and/or lower interval values were NaN. Now, the point is
plotted but without the interval bars.

\item Updated \code{\link{print.rfe}} to correctly describe new
resampling methods.

 }
}

\section{Changes in version 4.69}{
\itemize{

\item Fixed a bug in \code{\link{predict.rfe}} where an error was
thrown even though the required predictors were in \code{newdata}.

\item Changed \code{\link{preProcess}} so that centering and scaling are both automatic
when PCA or ICA are requested.

 }
}

\section{Changes in version 4.68}{
\itemize{

\item Added two functions, \code{\link{checkResamples}} and
\code{\link{checkConditionalX}} that identify predictor data with
degenerate distributions when conditioned on a factor.

\item Added a high content screening data set (\code{\link{segmentedData}}) from Hill et
al. Impact of image segmentation on high-content screening data quality
for SK-BR-3 cells. BMC bioinformatics (2007) vol. 8 (1) pp. 340.

\item Fixed bugs in how \code{\link{sbf}} objects were printed (when using repeated
CV) and classification models with \cpkg{earth} and \code{classProbs = TRUE}.


 }
}

\section{Changes in version 4.67}{
\itemize{

\item Added \code{\link{predict.rfe}}

\item Added imputation using bagged regression trees to
\code{\link{preProcess}}.

\item Fixed bug in \code{\link{varImp.rfe}} that caused incorrect
results (thanks to Lawrence Mosley for the find).

 }
}

\section{Changes in version 4.65}{
\itemize{

\item Fixed a bug where \code{\link{train}} would not allow knn imputation.

\item \code{\link{filterVarImp}} and \code{roc} now check for missing values and
use complete data for each predictor (instead of case-wise
deletion across all predictors).

 }
}

\section{Changes in version 4.64}{
\itemize{

\item Fixed bug introduced in the last version with
\code{createDataPartition(..., list = FALSE)}.

\item Fixed a bug predicting class probabilities when using
\cpkg{earth}/glm models

\item Fixed a bug that occurred when \code{\link{train}} was used with
\code{ctree} or \code{tree2} methods.

\item Fixed bugs in \code{\link{rfe}} and \code{\link{sbf}} when running in
parallel; not all the resampling results were saved

 }
}

\section{Changes in version 4.63}{
\itemize{

\item A p-value from McNemar's test was added to \code{\link{confusionMatrix}}.

\item Updated \code{\link{print.train}} so that constant parameters are not
 shown in the table (but a note is written below the table
 instead). Also, the output was changed slightly to be
 more easily read (I hope)

\item Adapted \code{\link{varImp.gam}} to work with either \cpkg{mgcv} or \cpkg{gam} packages.

\item Expanded the tuning parameters for \code{lvq}.

\item Some of the examples in the Model Building vignette were changed

\item Added bootstrap 632 rule and repeated cross-validation
 to \code{\link{trainControl}}.

\item A new function, \code{\link{createMultiFolds}}, is
 used to generate indices for repeated CV.

\item The various resampling functions now have *named* lists
 as output (with prefixes "Fold" for cv and repeated cv
 and "Resample" otherwise)

\item Pre-processing has been added to \code{\link{train}} with the
 \code{\link{preProcess}} argument. This has been tested when caret
 functions are used with \code{\link{rfe}} and \code{\link{sbf}} (via
 \code{\link{caretFuncs}} and \code{\link{caretSBF}}, respectively).

\item When \code{preProcess(method = "spatialSign")} is used, centering and
 scaling are done automatically too. Also, a bug was fixed
 that stopped the transformation from being executed.

\item knn imputation was added to \code{\link{preProcess}}. The \cpkg{RANN} package
 is used to find the neighbors (the knn impute function in
 the impute library was consistently generating segmentation
 faults, so we wrote our own).

\item Changed the behavior of \code{\link{preProcess}} in situations where
 scaling is requested but there is no variation in the
 predictor. Previously, the method would fail. Now a
 warning is issued and the value of the standard
 deviation is coerced to be one (so that scaling has
 no effect).

 }
}

\section{Changes in version 4.62}{
\itemize{

\item Added \code{gam} from \cpkg{mgcv} (with smoothing splines and feature
selection) and \code{gam} from \cpkg{gam} (with basic splines and loess
smoothers). For these models, a formula is derived
from the data where "near zero variance" predictors
(see \code{\link{nearZeroVar}}) are excluded and predictors with
fewer than 10 distinct values are entered as linear
(i.e. unsmoothed) terms.

 }
}

\section{Changes in version 4.61}{
\itemize{

\item Changed \cpkg{earth} fit for classification models to use the
\code{glm} argument with a binomial family.

\item Added \code{\link{varImp.multinom}}, which is based on the absolute
values of the model coefficients

 }
}

\section{Changes in version 4.60}{
\itemize{

\item The feature selection vignette was updated slightly (again).

 }
}

\section{Changes in version 4.59}{
\itemize{

\item Updated \code{\link{rfe}} and \code{\link{sbf}} to include class probabilities
in performance calculations.

\item Also, the names of the resampling indices were harmonized
across \code{\link{train}}, \code{\link{rfe}} and \code{\link{sbf}}.

\item The feature selection vignette was updated slightly.

 }
}

\section{Changes in version 4.58}{
\itemize{

\item Added the ability to include class probabilities in
performance calculations. See \code{\link{trainControl}} and
\code{\link{twoClassSummary}}.

\item Updated and restructured the main vignette.

 }
}

\section{Changes in version 4.57}{
\itemize{

\item Internal changes related to how predictions from models are
stored and summarized. With the exception of loo, the model
performance values are calculated by the workers instead of
the main program. This should reduce i/o and lay some
groundwork for upcoming changes.

\item The default grid for \cpkg{relaxo} models was changed based on
an initial model fit.

\item \cpkg{partDSA} model predictions were modified; there were cases
where the user might request X partitions, but the model
only produced Y < X. In these cases, the partitions for
missing models were replaced with the largest model
that was fit.

\item The function \code{\link{modelLookup}} was put in the namespace and
a man file was added.

\item The names of the resample indices are automatically
reset, even if the user specified them.

 }
}

\section{Changes in version 4.56}{
\itemize{

\item Fixed a bug introduced a few versions ago where \code{\link{varImp}}
for \code{plsda} and \code{fda} objects crashed.

 }
}

\section{Changes in version 4.55}{
\itemize{

\item When computing the scale parameter for RBF kernels, the
option to automatically scale the data was changed to \code{TRUE}

 }
}

\section{Changes in version 4.54}{
\itemize{

\item Added \code{logic.bagging} in \pkg{logicFS} with \code{method = "logicBag"}

 }
}

\section{Changes in version 4.53}{
\itemize{

\item Fixed a bug in \code{\link{varImp.train}} related to nearest shrunken
centroid models.

\item Added logic regression and logic forests

 }
}

\section{Changes in version 4.51}{
\itemize{

\item Added an option to \code{\link{splom.resamples}} so that the variables in the
scatter plots are models or metrics.

 }
}

\section{Changes in version 4.50}{
\itemize{

\item Added \code{\link{dotplot.resamples}} plus acknowledgements to Hothorn et al.
(2005) and Eugster et al. (2008)

 }
}

\section{Changes in version 4.49}{
\itemize{

\item Enhanced the \code{tuneGrid} option to allow a function
to be passed in.
 }
}

\section{Changes in version 4.48}{
\itemize{

\item Added a \code{prcomp} method for the \code{resamples} class

 }
}

\section{Changes in version 4.47}{
\itemize{

\item Extended \code{\link{resamples}} to work with \code{\link{rfe}} and \code{\link{sbf}}
 }
}

\section{Changes in version 4.46}{
\itemize{

\item Cleaned up some of the man files for the resamples class
and added \code{\link{parallel.resamples}}.

\item Fixed a bug in \code{\link{diff.resamples}} where \code{...} were
not being passed to the test statistic function.

\item Added more log messages in \code{\link{train}} when running verbose.

\item Added the German credit data set.
 }
}

\section{Changes in version 4.45}{
\itemize{

\item Added a general framework for bagging models via the
\code{\link{bag}} function. Also, model type \code{"hdda"} from the
\cpkg{HDclassif} package was added.

 }
}

\section{Changes in version 4.44}{
\itemize{

\item Added \cpkg{neuralnet}, \code{quantregForest} and \code{rda}
(from \cpkg{rda}) to \code{\link{train}}.
Since there is a naming
conflict with \code{rda} from \cpkg{mda}, the \cpkg{rda} model was
given a method value of \code{"scrda"}. } }

\section{Changes in version 4.43}{
\itemize{

\item The resampling estimate of the standard deviation given
 by \code{\link{train}} since v 4.39 was wrong

\item A new field was added to \code{\link{varImp.mvr}} called
 \code{"estimate"}. In cases where the mvr model had multiple
 estimates of performance (e.g. training set, CV, etc) the user can
 now select which estimate they want to be used in the importance
 calculation (thanks to Sophie Bréand for finding this)
}
}

\section{Changes in version 4.42}{
\itemize{

\item Added \code{\link{predict.sbf}} and modified the structure of
the \code{\link{sbf}} helper functions. The \code{"score"} function
only computes the metric used to filter and the filter function does
the actual filtering. This was changed so that FDR corrections or
other operations that use all of the p-values can be computed.

\item Also, the formatting of p-values in \code{\link{print.confusionMatrix}}
was changed

\item An argument was added to \code{\link{maxDissim}}
so that the variable name is returned instead of the index.

\item Independent component analysis was added to the list of
pre-processing operations and a new model ("icr") was
added to fit a pcr-like model with the ICA components.

 }
}

\section{Changes in version 4.40}{
\itemize{

\item Added \code{hda} and cleaned up the \cpkg{caret} training vignette

 }
}

\section{Changes in version 4.39}{
\itemize{

\item Added several classes for examining the resampling results. There
are methods for estimating pair-wise differences and lattice
functions for visualization. The training vignette has a new
section describing the new features.

 }
}

\section{Changes in version 4.38}{
\itemize{

\item Added \cpkg{partDSA} and \code{stepAIC} for linear models and
generalized linear models

 }
}

\section{Changes in version 4.37}{
\itemize{

\item Fixed a new bug in how resampling results are exported

 }
}

\section{Changes in version 4.36}{
\itemize{

\item Added penalized linear models from the \cpkg{foba} package

 }
}

\section{Changes in version 4.35}{
\itemize{

\item Added \code{rocc} classification and fixed a typo.
 }
}

\section{Changes in version 4.34}{
\itemize{

\item Added two new data sets: \code{\link{dhfr}} and \code{\link{cars}}

 }
}

\section{Changes in version 4.33}{
\itemize{

\item Added GAMens (ensembles using gams)

\item Fixed a bug in \code{roc} that, for some data cases, would reverse the "positive"
class and report sensitivity as specificity and vice-versa.

 }
}

\section{Changes in version 4.32}{
\itemize{

\item Added a parallel random forest method in \code{\link{train}} using the \cpkg{foreach} package.

\item Also added penalized logistic regression using the \code{plr} function in the
\cpkg{stepPlr} package.
 }
}

\section{Changes in version 4.31}{
\itemize{

\item Added a new feature selection function, \code{\link{sbf}} (for selection by filter).

\item Fixed bug in \code{\link{rfe}} that did not affect the results, but did produce
a warning.

\item A new model function, \code{\link{nullModel}}, was added. This model fits either the
mean only model for regression or the majority class model for classification.

\item Also, ldaFuncs had a bug fixed.

\item Minor changes to Rd files

 }
}

\section{Changes in version 4.30}{
\itemize{

\item For whatever reason, there is now a function in the \cpkg{spls} package
by the name of splsda that does the same thing. A few functions
and a man page were changed to ensure backwards compatibility.

 }
}

\section{Changes in version 4.29}{
\itemize{

\item Added stepwise variable selection for \code{lda} and \code{qda} using the
\code{stepclass} function in \cpkg{klaR}

 }
}

\section{Changes in version 4.28}{
\itemize{

\item Added robust linear and quadratic discriminant analysis functions
from \cpkg{rrcov}.

\item Also added another column to the output of
\code{\link{extractProb}} and \code{\link{extractPrediction}} that
saves the name of the model object so that you can have multiple
models of the same type and tell which predictions came from which
model.

\item Changes were made to \code{plotClassProbs}: new parameters were added
and densityplots can now be produced.

 }
}

\section{Changes in version 4.27}{
\itemize{

\item Added \cpkg{nodeHarvest}

 }
}

\section{Changes in version 4.26}{
\itemize{

\item Fixed a bug in \code{\link{caretFuncs}} that led to NaN variable rankings, so
that the first k terms were always selected.

 }
}

\section{Changes in version 4.25}{
\itemize{

\item Added parallel processing functionality for \code{\link{rfe}}

 }
}

\section{Changes in version 4.24}{
\itemize{

\item Added the ability to use custom metrics with \code{\link{rfe}}

 }
}

\section{Changes in version 4.22}{
\itemize{

\item Many Rd changes to work with updated parser.

 }
}

\section{Changes in version 4.21}{
\itemize{

\item Re-saved data in more compressed format

 }
}

\section{Changes in version 4.20}{
\itemize{

\item Added \code{pcr} as a method

 }
}

\section{Changes in version 4.19}{
\itemize{

\item A weights argument was added to \code{\link{train}} for models that accept weights

\item Also, a bug was fixed for lasso regression (wrong lambda
specification) and another for prediction in naive Bayes models
with a single predictor.
 }
}

\section{Changes in version 4.18}{
\itemize{

\item Fixed bug in new \code{\link{nearZeroVar}} and updated \code{format.earth} so that it
does not automatically print the formula
 }
}

\section{Changes in version 4.17}{
\itemize{

\item Added a new version of \code{\link{nearZeroVar}} from Allan Engelhardt that is
much faster
 }
}

\section{Changes in version 4.16}{
\itemize{

\item Fixed bugs in \code{\link{extractProb}} (for glmnet) and \code{\link{filterVarImp}}.

\item For glmnet, the user can now pass in their own value of family to
\code{\link{train}} (otherwise \code{\link{train}} will set it depending on the mode of the
outcome). However, glmnet doesn't have much support for families at
this time, so you can't change links or try other distributions.
 }
}

\section{Changes in version 4.15}{
\itemize{

\item Fixed bug in \code{\link{createFolds}} when the smallest y value is more than 25\%
of the data

 }
}

\section{Changes in version 4.14}{
\itemize{

\item Fixed bug in \code{\link{print.train}}
 }
}

\section{Changes in version 4.13}{
\itemize{

\item Added vbmp from \cpkg{vbmp} package
 }
}

\section{Changes in version 4.12}{
\itemize{

\item Added additional error check to \code{\link{confusionMatrix}}

\item Fixed an absurd typo in \code{\link{print.confusionMatrix}}

 }
}

\section{Changes in version 4.11}{
\itemize{

\item Added: linear kernels for svm, rvm and Gaussian processes; \code{rlm} from \cpkg{MASS}; a knn regression model, knnreg

\item A set of functions (class "\code{\link{classDist}}") was added to compute the class
 centroids and covariance matrix for a training set for
 determining Mahalanobis distances of samples to each class
 centroid

\item Added a set of functions (\code{\link{rfe}}) for doing recursive feature selection
 (aka backwards selection). A new vignette was added for more
 details
 }
}

\section{Changes in version 4.10}{
\itemize{

\item Added \code{OneR} and \code{PART} from \cpkg{RWeka}

 }
}

\section{Changes in version 4.09}{
\itemize{

\item Fixed error in documentation for \code{confusionMatrix}. The old doc had \code{"Detection Prevalence = A/(A+B)"} and the new one has \code{"Detection Prevalence = (A+B)/(A+B+C+D)"}. The underlying code was correct.

\item Added \code{lars} (\code{fraction} and \code{step} as parameters)

 }
}

\section{Changes in version 4.08}{
\itemize{

\item Updated \code{\link{train}} and \code{bagEarth} to allow \code{earth}
for classification models

 }
}



\section{Changes in version 4.07}{
\itemize{

\item Added \cpkg{glmnet} models

 }
}

\section{Changes in version 4.06}{
\itemize{

\item Added code for sparse PLS classification.

\item Fixed a bug in prediction for \code{caTools::LogitBoost}

 }
}

\section{Changes in version 4.05}{
\itemize{

\item Updated again for more stringent R CMD check tests in R-devel 2.9

 }
}



\section{Changes in version 4.04}{
\itemize{

\item Updated for more stringent R CMD check tests in R-devel 2.9

 }
}

\section{Changes in version 4.03}{
\itemize{

\item Significant internal changes were made to how the models are
fit. Now, the function used to compute the models is passed in as a
parameter (defaulting to \code{lapply}). In this way, users can use
their own parallel processing software without new versions of
\cpkg{caret}. Examples are given in \code{\link{train}}.

\item Also, fixed a bug where the MSE (instead of RMSE) was reported
for random forest OOB resampling

\item There are more examples in \code{\link{train}}.

\item Changes to \code{confusionMatrix}, \code{sensitivity},
\code{specificity} and the predictive value functions: each was made
more generic with default and \code{table} methods;
\code{confusionMatrix} "extractor" functions for matrices and tables
were added; the pos/neg predicted value computations were changed to
incorporate prevalence; prevalence was added as an option to several
functions; detection rate and prevalence statistics were added to
\code{confusionMatrix}; and the examples were expanded in the help
files.

\item This version of caret will break compatibility with
\pkg{caretLSF} and \pkg{caretNWS}. However, these packages will not be
needed now and will be deprecated.

 }
}

\section{Changes in version 3.51}{
\itemize{

\item Updated the man files and manuals.

 }
}

\section{Changes in version 3.50}{
\itemize{

\item Added \code{qda}, \code{mda} and \code{pda}.

 }
}


\section{Changes in version 3.49}{
\itemize{

\item Fixed bug in \code{resampleHist}. Also added a check in the \code{\link{train}} functions
that traps the error for \code{glm} models with > 2 classes

 }
}


\section{Changes in version 3.48}{
\itemize{

\item Added \code{glm}s. Also, added \code{varImp.bagEarth} to the
namespace.

 }
}

\section{Changes in version 3.47}{
\itemize{

\item Added \code{sda} from the \cpkg{sda} package. There was a naming
conflict between \code{sda::sda} and \code{sparseLDA:::sda}. The
method value for \code{sparseLDA} was changed from "sda" to
"sparseLDA".

 }
}


\section{Changes in version 3.46}{
\itemize{

\item Added \code{spls} from the \cpkg{spls} package

 }
}

\section{Changes in version 3.45}{
\itemize{

\item Added caching of \cpkg{RWeka} objects so that they can be saved
to the file system and used in other sessions. (changes per Kurt
Hornik on 2008-10-05)

 }
}

\section{Changes in version 3.44}{
\itemize{

\item Added \code{sda} from the \cpkg{sparseLDA} package (not on
CRAN).

\item Also, a bug was fixed where the ellipses were not passed into a
few of the newer models (such as \code{penalized} and \code{ppr})

 }
}

\section{Changes in version 3.43}{
\itemize{

\item Added the penalized model from the \cpkg{penalized} package. In
\cpkg{caret}, it is regression only although the package allows for
classification via glm models. However, it does not allow the user to
pass the classes in (just an indicator matrix). Because of this, it
doesn't really work with the rest of the classification tools in the
package.

 }
}

\section{Changes in version 3.42}{
\itemize{

\item Added a little more formatting to \code{\link{print.train}}

 }
}

\section{Changes in version 3.41}{
\itemize{

\item For \code{gbm}, let the user over-ride the default value of the
\code{distribution} argument (brought to us by Peter Tait via R-Help).

 }
}



\section{Changes in version 3.40}{
\itemize{

\item Changed \code{predict.preProcess} so that it doesn't crash if
\code{newdata} does not have all of the variables used to originally
pre-process *unless* PCA processing was requested.

 }
}



\section{Changes in version 3.39}{
\itemize{

\item Fixed bug in \code{varImp.rpart} when the model had only primary
splits.

\item Minor changes to the Affy normalization code

\item Fixed typo in \code{predictors} man page

 }
}

\section{Changes in version 3.38}{
\itemize{

\item Added a new class called \code{predictors} that returns the
names of the predictors that were used in the final model.

\item Also added \code{ppr} from the \code{stats} package.

\item Minor update to the project web page to deal with IE issues


 }
}

\section{Changes in version 3.37}{
\itemize{

\item Added the ability of \code{\link{train}} to use custom made performance
functions so that the tuning parameters can be chosen on the basis of
things other than RMSE/R-squared and Accuracy/Kappa.

\item A new argument was added to \code{\link{trainControl}} called
 "summaryFunction" that is used to specify the function used to
 compute performance metrics. The default function preserves the
 functionality prior to this new version

\item A new argument to \code{\link{train}} is "maximize" which is a logical
 for whether the performance measure specified in the "metric"
 argument to \code{\link{train}} should be maximized or minimized.

\item The selection function specified in \code{\link{trainControl}} carries
 the maximize argument with it so that customized performance
 metrics can be used.

\item A bug was fixed in \code{confusionMatrix} (thanks to Gabor
Grothendieck)

\item Another bug was fixed related to predictions from least squares
SVMs

 }
}


\section{Changes in version 3.36}{
\itemize{

\item Added \code{superpc} from the \cpkg{superpc} package.
One note:
the \code{data} argument that is passed to \code{superpc} is saved in
the object that results from \code{superpc.train}. This is used later
in the prediction function.

 }
}

\section{Changes in version 3.35}{
\itemize{

\item Added \code{slda} from \cpkg{ipred}.

 }
}


\section{Changes in version 3.34}{
\itemize{

\item Fixed a few bugs related to the lattice plots from version 3.33.

\item Also added the ripper (aka \code{JRip}) and logistic model trees
from \cpkg{RWeka}

 }
}

\section{Changes in version 3.33}{
\itemize{

\item Added \code{xyplot.train}, \code{densityplot.train},
\code{histogram.train} and \code{stripplot.train}. These are all
functions to plot the resampling points. There is some overlap between
these functions, \code{plot.train} and
\code{resampleHist}. \code{plot.train} gives the average metrics only
while these plot all of the resampled performance
metrics. \code{resampleHist} could plot all of the points, but only
for the final optimal set of predictors.

\item To use these functions, there is a new argument in
\code{\link{trainControl}} called \code{returnResamp} which should have
one of the values "none", "final" or "all". The default is "final" to be
consistent with previous versions, but "all" should be specified to
use these new functions to their fullest.

 }
}

\section{Changes in version 3.32}{
\itemize{

\item The functions \code{\link{predict.train}} and \code{\link{predict.list}} were
added to use as alternatives to the \code{\link{extractPrediction}} and
\code{\link{extractProb}} functions.

\item Added C4.5 (aka \code{J48}) and rules-based models (M5 prime) from
\cpkg{RWeka}.

\item Also added \code{logitBoost} from the \cpkg{caTools}
package.
This package doesn't have a namespace and \cpkg{RWeka} has a
function with the same name. It was suggested to use the "::" prefix
to differentiate them (but we'll see how this works).

 }
}
