Callbacks in ensmallen are methods that are called at various states during the
optimization process, which can be used to implement and control behaviors such
as:

* Changing the learning rate.
* Printing the current objective.
* Sending a message when the optimization hits a specific state, such as a
  minimal objective.

Callbacks can be passed as an argument to the `Optimize()` function:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* callback as the last argument to the
// *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss());
```

</details>

Passing multiple callbacks works the same as passing a single callback:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* and *EarlyStopAtMinLoss* callbacks as the
// last arguments to the *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss(), EarlyStopAtMinLoss());
```

</details>

It is also possible to pass a callback instantiation, which allows access to
internal callback parameters at a later stage:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Create an instantiation of the built-in *StoreBestCoordinates* callback,
// which will store the best objective and the corresponding model parameter
// that can be accessed later.
StoreBestCoordinates<> callback;

// Pass the instantiation of the built-in *StoreBestCoordinates* callback as
// the last argument to the *Optimize()* function.
optimizer.Optimize(f, coordinates, callback);

// Print the minimum objective that is stored inside the *StoreBestCoordinates*
// callback that was passed to the *Optimize()* call.
std::cout << callback.BestObjective() << std::endl;
```

</details>

## Built-in Callbacks

### EarlyStopAtMinLoss

Stops the optimization process if the loss stops decreasing, i.e., if no
improvement has been made for a given number of epochs.

#### Constructors

 * `EarlyStopAtMinLoss()`
 * `EarlyStopAtMinLoss(`_`patience`_`)`
 * `EarlyStopAtMinLoss(`_`func`_`)`
 * `EarlyStopAtMinLoss(`_`func`_`,`_`patience`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `size_t` | **`patience`** | The number of epochs to wait after the minimum loss has been reached. | `10` |
| `std::function<double(const arma::mat&)>` | **`func`** | A callback function that returns the current loss, evaluated at the given coordinates. | |

Note that for the `func` argument above, if a
[different matrix type](#alternate-matrix-types) is desired, instead of using
the class `EarlyStopAtMinLoss`, the class `EarlyStopAtMinLossType<MatType>`
should be used (see the sketch after the examples below).

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, EarlyStopAtMinLoss());
```

Another example, using a lambda in the constructor:

```c++
// Generate random training data and labels.
arma::mat trainingData(5, 100, arma::fill::randu);
arma::Row<size_t> trainingLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));
// Generate a validation set.
arma::mat validationData(5, 100, arma::fill::randu);
arma::Row<size_t> validationLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));

// Create a LogisticRegressionFunction for both the training and validation
// data.
LogisticRegressionFunction lrfTrain(trainingData, trainingLabels);
LogisticRegressionFunction lrfValidation(validationData, validationLabels);

// Create a callback that will terminate when the validation loss starts to
// increase.
EarlyStopAtMinLoss cb(
    [&](const arma::mat& coordinates)
    {
      // You could also, e.g., print the validation loss here to watch it
      // converge.
      return lrfValidation.Evaluate(coordinates);
    });

arma::mat coordinates = lrfTrain.GetInitialPoint();
SMORMS3 smorms3;
smorms3.Optimize(lrfTrain, coordinates, cb);
```

</details>
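As noted above, `EarlyStopAtMinLossType<MatType>` can be used when the
coordinates are not `arma::mat`. A minimal sketch, assuming the objective
function is templated so that it can be evaluated with `arma::fmat`
coordinates (as the test functions shipped with ensmallen are):

```c++
RosenbrockFunction f;
arma::fmat coordinates = f.GetInitialPoint<arma::fmat>();

SMORMS3 smorms3;
// The template parameter of EarlyStopAtMinLossType must match the type of the
// coordinate matrix that is passed to Optimize().
smorms3.Optimize(f, coordinates, EarlyStopAtMinLossType<arma::fmat>());
```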
### PrintLoss

Callback that prints loss to stdout or a specified output stream.

#### Constructors

 * `PrintLoss()`
 * `PrintLoss(`_`output`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, PrintLoss());
```

</details>

### ProgressBar

Callback that prints a progress bar to stdout or a specified output stream.

#### Constructors

 * `ProgressBar()`
 * `ProgressBar(`_`width`_`)`
 * `ProgressBar(`_`width, output`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `size_t` | **`width`** | Width of the bar. | `70` |
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, ProgressBar());
```

</details>
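Both `PrintLoss` and `ProgressBar` accept an output stream in their
constructors. A small usage sketch of the documented
`ProgressBar(`_`width, output`_`)` constructor, directing a narrower bar to
`std::cerr` so that progress output stays separate from results on stdout:

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Print a 50-character-wide progress bar to std::cerr instead of stdout.
optimizer.Optimize(f, coordinates, ProgressBar(50, std::cerr));
```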
### Report

Callback that prints an optimizer report to stdout or a specified output
stream.

#### Constructors

 * `Report()`
 * `Report(`_`iterationsPercentage`_`)`
 * `Report(`_`iterationsPercentage, output`_`)`
 * `Report(`_`iterationsPercentage, output, outputMatrixSize`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`iterationsPercentage`** | The fraction of iterations between reported progress lines, in the range [0, 1]. | `0.1` |
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |
| `size_t` | **`outputMatrixSize`** | The number of values to output for the function coordinates. | `4` |

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, Report(0.1));
```

<details open>
<summary>Click to collapse/expand example output.
</summary>

```
Optimization Report
--------------------------------------------------------------------------------

Initial Coordinates:
   -1.2000   1.0000

Final coordinates:
   -1.0490   1.1070

iter      loss      loss change   |gradient|   step size   total time
0         24.2      0             233          1           4.27e-05
100       8.6       15.6          104          1           0.000215
200       5.26      3.35          48.7         1           0.000373
300       4.49      0.767         23.4         1           0.000533
400       4.31      0.181         11.3         1           0.000689
500       4.27      0.0431        5.4          1           0.000846
600       4.26      0.012         2.86         1           0.00101
700       4.25      0.00734       2.09         1           0.00117
800       4.24      0.00971       1.95         1           0.00132
900       4.22      0.0146        1.91         1           0.00148

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.13.0 (Automatically Automated Automation)
armadillo:                    9.900.1 (Nocturnal Misbehaviour)

Function:
Number of functions:          1
Coordinates rows:             2
Coordinates columns:          1

Loss:
Initial                       24.2
Final                         4.2
Change                        20

Optimizer:
Maximum iterations:           1000
Reached maximum iterations:   true
Batchsize:                    1
Iterations:                   1000
Number of epochs:             1001
Initial step size:            1
Final step size:              1
Coordinates max. norm:        233
Evaluate calls:               1000
Gradient calls:               1000
Time (in seconds):            0.00163
```

</details>

</details>

### StoreBestCoordinates

Callback that stores the model parameter after every epoch if the objective
decreased.

#### Constructors

 * `StoreBestCoordinates<`_`ModelMatType`_`>()`

The _`ModelMatType`_ template parameter refers to the matrix type of the model
parameter.

#### Attributes

The stored model parameter can be accessed via the member method
`BestCoordinates()` and the best objective via `BestObjective()`.

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

StoreBestCoordinates<arma::mat> cb;
optimizer.Optimize(f, coordinates, cb);

std::cout << "The optimized model found by AdaDelta has the "
    << "parameters " << cb.BestCoordinates();
```

</details>
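Because callbacks compose, `StoreBestCoordinates` pairs naturally with
`EarlyStopAtMinLoss`: when the optimization is stopped early, the best
parameters seen so far remain available. A short sketch combining the two
built-in callbacks:

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Stop early once the loss no longer improves, and keep the best coordinates
// found before termination.
StoreBestCoordinates<arma::mat> best;
optimizer.Optimize(f, coordinates, EarlyStopAtMinLoss(), best);

std::cout << "Best objective: " << best.BestObjective() << std::endl;
```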
## Callback States

Callbacks are called at several states during the optimization process:

* At the beginning and end of the optimization process.
* After any call to `Evaluate()` and `EvaluateConstraint()`.
* After any call to `Gradient()` and `GradientConstraint()`.
* At the start and end of an epoch.

Each callback provides optimization-relevant information that can be accessed
or modified; the individual states and their signatures are documented below.
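A custom callback only needs to implement the states it is interested in;
states that are not implemented are simply ignored. As a minimal sketch (the
class name `EvaluateLogger` is hypothetical), a callback that logs every
objective evaluation might look like this, following the `Evaluate` signature
documented below:

```c++
// Sketch of a custom callback that implements a single optimization state.
class EvaluateLogger
{
 public:
  // Called after any call to the Evaluate() function of the objective.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void Evaluate(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const double objective)
  {
    std::cout << "Current objective: " << objective << std::endl;
  }
};
```

Such a callback is passed like any built-in one, e.g.
`optimizer.Optimize(f, coordinates, EvaluateLogger());`.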
### BeginOptimization

Called at the beginning of the optimization process.

 * `BeginOptimization(`_`optimizer, function, coordinates`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |

### EndOptimization

Called at the end of the optimization process.

 * `EndOptimization(`_`optimizer, function, coordinates`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |

### Evaluate

Called after any call to `Evaluate()`.

 * `Evaluate(`_`optimizer, function, coordinates, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `double` | **`objective`** | Objective value of the current point. |

### EvaluateConstraint

Called after any call to `EvaluateConstraint()`.

 * `EvaluateConstraint(`_`optimizer, function, coordinates, constraint, constraintValue`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`constraint`** | The index of the constraint. |
| `double` | **`constraintValue`** | Constraint value of the current point. |

### Gradient

Called after any call to `Gradient()`.

 * `Gradient(`_`optimizer, function, coordinates, gradient`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `GradType` | **`gradient`** | Matrix that holds the gradient. |

### GradientConstraint

Called after any call to `GradientConstraint()`.

 * `GradientConstraint(`_`optimizer, function, coordinates, constraint, gradient`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`constraint`** | The index of the constraint. |
| `GradType` | **`gradient`** | Matrix that holds the gradient. |

### BeginEpoch

Called at the beginning of a pass over the data. The objective may be exact or
an estimate, depending on the value of `exactObjective`.

 * `BeginEpoch(`_`optimizer, function, coordinates, epoch, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`epoch`** | The index of the current epoch. |
| `double` | **`objective`** | Objective value of the current point. |

### EndEpoch

Called at the end of a pass over the data. The objective may be exact or an
estimate, depending on the value of `exactObjective`.

 * `EndEpoch(`_`optimizer, function, coordinates, epoch, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`epoch`** | The index of the current epoch. |
| `double` | **`objective`** | Objective value of the current point. |

### GenerationalStepTaken

Called after the evolution of a single generation. Intended specifically for
multi-objective optimizers.

 * `GenerationalStepTaken(`_`optimizer, function, coordinates, objectives, frontIndices`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `ObjectivesVecType` | **`objectives`** | The set of calculated objectives so far. |
| `IndicesType` | **`frontIndices`** | The indices of the members belonging to the Pareto front. |
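Callback methods may also be declared with a `bool` return type: returning
`true` requests early termination of the optimization (the custom
early-stopping example at the end of this document relies on this). A hedged
sketch, using the hypothetical name `StopAfterEpochs`, that ends the
optimization after a fixed epoch budget:

```c++
// Sketch: terminate the optimization once a fixed number of epochs has passed.
class StopAfterEpochs
{
 public:
  StopAfterEpochs(const size_t maxEpochs) : maxEpochs(maxEpochs) { }

  template<typename OptimizerType, typename FunctionType, typename MatType>
  bool EndEpoch(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t epoch,
                const double /* objective */)
  {
    // Returning true stops the optimization process.
    return epoch >= maxEpochs;
  }

 private:
  size_t maxEpochs;
};
```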
## Custom Callbacks

### Learning rate scheduling

Setting the learning rate is crucially important when training because it
controls both the speed of convergence and the ultimate performance of the
model. One of the simplest learning rate strategies is to use a fixed learning
rate throughout the training process. Choosing a small learning rate allows the
optimizer to find good solutions, but this comes at the expense of limiting the
initial speed of convergence. Changing the learning rate as more epochs pass is
a common way to overcome this tradeoff. A callback's epoch methods, in
combination with the `StepSize()` method of the optimizer, can be used to
update the learning rate.

Example code showing how to implement a custom callback to change the learning
rate is given below.

<details>
<summary>Click to collapse/expand example code.
</summary>

```c++
class ExponentialDecay
{
 public:
  // Set up the exponential decay learning rate scheduler with the
  // user-specified decay value.
  ExponentialDecay(const double decay) : decay(decay), learningRate(0) { }

  // Callback function called at the start of the optimization process.
  // In this example we will use this to save the initial learning rate.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void BeginOptimization(OptimizerType& optimizer,
                         FunctionType& /* function */,
                         MatType& /* coordinates */)
  {
    // Save the initial learning rate.
    learningRate = optimizer.StepSize();
  }

  // Callback function called at the end of a pass over the data. We are only
  // interested in the current epoch and the optimizer; we ignore the rest.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void EndEpoch(OptimizerType& optimizer,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t epoch,
                const double /* objective */)
  {
    // Update the learning rate.
    optimizer.StepSize() = learningRate * (1.0 - std::pow(decay,
        (double) epoch));
  }

  // The user-specified decay value and the stored initial learning rate.
  double decay;
  double learningRate;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row).
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly. The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the StandardSGD optimizer with the specified parameters.
  // The ens::StandardSGD type can be replaced with any ensmallen optimizer
  // that can handle differentiable functions.
  StandardSGD optimizer(0.001, 1, 0, 1e-15, true);

  // Use the StandardSGD optimizer with the specified parameters to minimize
  // the LinearRegressionFunction, and pass the *exponential decay* callback
  // from above.
  optimizer.Optimize(lrf, lrfParams, ExponentialDecay(0.01));

  // Print the trained model parameters.
  std::cout << lrfParams.t();
}
```

</details>
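Custom callbacks compose with built-in ones exactly as shown earlier; for
instance (a sketch reusing `optimizer`, `lrf`, and `lrfParams` from the example
above), the scheduler can be combined with a progress bar:

```c++
optimizer.Optimize(lrf, lrfParams, ExponentialDecay(0.01), ProgressBar());
```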
### Early stopping at minimum loss

Early stopping is a technique for controlling overfitting in machine learning
models, especially neural networks, by stopping the optimization process before
the model has trained for the maximum number of iterations.

Example code showing how to implement a custom callback that stops the
optimization when the minimum loss has been reached is given below.

<details>
<summary>Click to collapse/expand example code.
</summary>

```c++
#include <ensmallen.hpp>

// This class implements an early-stopping-at-minimum-loss callback function to
// terminate the optimization process early if the loss stops decreasing.
class EarlyStop
{
 public:
  // Set up the early stop at min loss class, which keeps track of the minimum
  // loss.
  EarlyStop() : bestObjective(std::numeric_limits<double>::max()) { }

  // Callback function called at the end of a pass over the data, which
  // provides the current objective. We are only interested in the objective
  // and ignore the rest.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  bool EndEpoch(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t /* epoch */,
                const double objective)
  {
    // Check if the given objective is lower than the previous best objective.
    if (objective < bestObjective)
    {
      // Update the stored best objective.
      bestObjective = objective;
    }
    else
    {
      // Stop the optimization process.
      return true;
    }

    // Do not stop the optimization process.
    return false;
  }

  // Locally-stored best objective.
  double bestObjective;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row).
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly. The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the L_BFGS optimizer with default parameters.
  // The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
  // handle differentiable functions.
  ens::L_BFGS lbfgs;

  // Use the L_BFGS optimizer with default parameters to minimize the
  // LinearRegressionFunction, and pass the *early stopping at minimum loss*
  // callback from above.
  lbfgs.Optimize(lrf, lrfParams, EarlyStop());

  // Print the trained model parameters.
  std::cout << lrfParams.t();
}
```

</details>

Note that we have simply passed an instantiation of `EarlyStop`; the rest is
handled inside the optimizer.

ensmallen provides a more complete and general implementation of an
[early stopping](#earlystopatminloss) at minimum loss callback function.
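The built-in version can be dropped into the same example; a sketch reusing
`lbfgs`, `lrf`, and `lrfParams` from above:

```c++
// Equivalent behavior using the built-in callback, which additionally supports
// a patience parameter and a validation-loss lambda (see EarlyStopAtMinLoss
// above).
lbfgs.Optimize(lrf, lrfParams, EarlyStopAtMinLoss());
```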