Callbacks in ensmallen are methods that are called at various states during the
optimization process, which can be used to implement and control behaviors such
as:

* Changing the learning rate.
* Printing the current objective.
* Sending a message when the optimization hits a specific state, such as a
  minimal objective.
Callbacks can be passed as an argument to the `Optimize()` function:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* callback as the last argument to the
// *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss());
```

</details>

Passing multiple callbacks is just the same as passing a single callback:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));

// Pass the built-in *PrintLoss* and *EarlyStopAtMinLoss* callbacks as the
// last arguments to the *Optimize()* function.
optimizer.Optimize(f, coordinates, PrintLoss(), EarlyStopAtMinLoss());
```

</details>

It is also possible to pass a callback instantiation, which allows the internal
callback parameters to be accessed at a later stage:
49
50<details open>
51<summary>Click to collapse/expand example code.
52</summary>
53
54```c++
55RosenbrockFunction f;
56arma::mat coordinates = f.GetInitialPoint();
57
58MomentumSGD optimizer(0.01, 32, 100000, 1e-5, true, MomentumUpdate(0.5));
59
60// Create an instantiation of the built-in *StoreBestCoordinates* callback,
61// which will store the best objective and the corresponding model parameter
62// that can be accessed later.
63StoreBestCoordinates<> callback;
64
65// Pass an instantiation of the built-in *StoreBestCoordinates* callback as the
66// last argument to the *Optimize()* function.
67optimizer.Optimize(f, coordinates, callback);
68
69// Print the minimum objective that is stored inside the *StoreBestCoordinates*
70// callback that was passed to the *Optimize()* call.
71std::cout << callback.BestObjective() << std::endl;
72```
73
74</details>

## Built-in Callbacks

### EarlyStopAtMinLoss

Stops the optimization process if the loss stops decreasing, i.e., if no
improvement has been made within the last `patience` epochs.

#### Constructors

 * `EarlyStopAtMinLoss()`
 * `EarlyStopAtMinLoss(`_`patience`_`)`
 * `EarlyStopAtMinLoss(`_`func`_`)`
 * `EarlyStopAtMinLoss(`_`func`_`,`_`patience`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `size_t` | **`patience`** | The number of epochs to wait after the minimum loss has been reached. | `10` |
| `std::function<double(const arma::mat&)>` | **`func`** | A callback to return immediate loss evaluated by the function. | |

Note that for the `func` argument above, if a
[different matrix type](#alternate-matrix-types) is desired, instead of using
the class `EarlyStopAtMinLoss`, the class `EarlyStopAtMinLossType<MatType>`
should be used.
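
For instance, a minimal sketch using 32-bit floating point coordinates (this
assumes the objective function's `Evaluate()` and `GetInitialPoint()` are
templated on the matrix type, as the ensmallen test functions are):

```c++
RosenbrockFunction f;
arma::fmat coordinates = f.GetInitialPoint<arma::fmat>();

Adam optimizer;

// Since the lambda takes an arma::fmat, EarlyStopAtMinLossType<arma::fmat>
// must be used instead of EarlyStopAtMinLoss.
optimizer.Optimize(f, coordinates,
    EarlyStopAtMinLossType<arma::fmat>(
        [&](const arma::fmat& c) { return (double) f.Evaluate(c); }));
```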

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, EarlyStopAtMinLoss());
```
Another example, using a lambda in the constructor:

```c++
// Generate random training data and labels.
arma::mat trainingData(5, 100, arma::fill::randu);
arma::Row<size_t> trainingLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));
// Generate a validation set.
arma::mat validationData(5, 100, arma::fill::randu);
arma::Row<size_t> validationLabels =
    arma::randi<arma::Row<size_t>>(100, arma::distr_param(0, 1));

// Create a LogisticRegressionFunction for both the training and validation data.
LogisticRegressionFunction lrfTrain(trainingData, trainingLabels);
LogisticRegressionFunction lrfValidation(validationData, validationLabels);

// Create a callback that will terminate when the validation loss starts to
// increase.
EarlyStopAtMinLoss cb(
    [&](const arma::mat& coordinates)
    {
      // You could also, e.g., print the validation loss here to watch it converge.
      return lrfValidation.Evaluate(coordinates);
    });

arma::mat coordinates = lrfTrain.GetInitialPoint();
SMORMS3 smorms3;
smorms3.Optimize(lrfTrain, coordinates, cb);
```

</details>

### PrintLoss

Callback that prints the loss to stdout or a specified output stream.

#### Constructors

 * `PrintLoss()`
 * `PrintLoss(`_`output`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, PrintLoss());
```

</details>
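
Since `output` accepts any `std::ostream`, the loss can also be logged to a
file; a minimal sketch (the filename is just an illustration, and `<fstream>`
must be included):

```c++
std::ofstream logFile("optimization.log");

AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Each loss value is written to optimization.log instead of stdout.
optimizer.Optimize(f, coordinates, PrintLoss(logFile));
```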

### ProgressBar

Callback that prints a progress bar to stdout or a specified output stream.

#### Constructors

 * `ProgressBar()`
 * `ProgressBar(`_`width`_`)`
 * `ProgressBar(`_`width, output`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `size_t` | **`width`** | Width of the bar. | `70` |
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |

#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, ProgressBar());
```

</details>
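
Both parameters can be set explicitly; for example, a sketch of a narrower bar
sent to `std::cerr`:

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Print a 40-character wide progress bar to the standard error stream.
optimizer.Optimize(f, coordinates, ProgressBar(40, std::cerr));
```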

### Report

Callback that prints an optimizer report to stdout or a specified output stream.

#### Constructors

 * `Report()`
 * `Report(`_`iterationsPercentage`_`)`
 * `Report(`_`iterationsPercentage, output`_`)`
 * `Report(`_`iterationsPercentage, output, outputMatrixSize`_`)`

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`iterationsPercentage`** | The percentage of iterations to report, in the range [0, 1]. | `0.1` |
| `std::ostream` | **`output`** | Ostream which receives output from this object. | `stdout` |
| `size_t` | **`outputMatrixSize`** | The number of values to output for the function coordinates. | `4` |
#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();
optimizer.Optimize(f, coordinates, Report(0.1));
```

<details open>
<summary>Click to collapse/expand example output.
</summary>

```
Optimization Report
--------------------------------------------------------------------------------

Initial Coordinates:
  -1.2000   1.0000

Final coordinates:
  -1.0490   1.1070

iter          loss          loss change   |gradient|    step size     total time
0             24.2          0             233           1             4.27e-05
100           8.6           15.6          104           1             0.000215
200           5.26          3.35          48.7          1             0.000373
300           4.49          0.767         23.4          1             0.000533
400           4.31          0.181         11.3          1             0.000689
500           4.27          0.0431        5.4           1             0.000846
600           4.26          0.012         2.86          1             0.00101
700           4.25          0.00734       2.09          1             0.00117
800           4.24          0.00971       1.95          1             0.00132
900           4.22          0.0146        1.91          1             0.00148

--------------------------------------------------------------------------------

Version:
ensmallen:                    2.13.0 (Automatically Automated Automation)
armadillo:                    9.900.1 (Nocturnal Misbehaviour)

Function:
Number of functions:          1
Coordinates rows:             2
Coordinates columns:          1

Loss:
Initial                       24.2
Final                         4.2
Change                        20

Optimizer:
Maximum iterations:           1000
Reached maximum iterations:   true
Batchsize:                    1
Iterations:                   1000
Number of epochs:             1001
Initial step size:            1
Final step size:              1
Coordinates max. norm:        233
Evaluate calls:               1000
Gradient calls:               1000
Time (in seconds):            0.00163
```

</details>

</details>

### StoreBestCoordinates

Callback that stores the model parameter after every epoch if the objective
decreased.

#### Constructors

 * `StoreBestCoordinates<`_`ModelMatType`_`>()`

The _`ModelMatType`_ template parameter refers to the matrix type of the model
parameter.

#### Attributes

The stored model parameter can be accessed via the member method
`BestCoordinates()` and the best objective via `BestObjective()`.
#### Examples:

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

StoreBestCoordinates<arma::mat> cb;
optimizer.Optimize(f, coordinates, cb);

std::cout << "The optimized model found by AdaDelta has the "
      << "parameters " << cb.BestCoordinates();
```

</details>
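
`StoreBestCoordinates` combines naturally with other callbacks; for instance, a
sketch of keeping the best parameters seen before `EarlyStopAtMinLoss`
terminates the optimization:

```c++
AdaDelta optimizer(1.0, 1, 0.99, 1e-8, 1000, 1e-9, true);

RosenbrockFunction f;
arma::mat coordinates = f.GetInitialPoint();

StoreBestCoordinates<arma::mat> cb;
optimizer.Optimize(f, coordinates, cb, EarlyStopAtMinLoss());

// Even if the loss increased again before early stopping triggered, the best
// parameters found so far are still available.
coordinates = cb.BestCoordinates();
```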

## Callback States

Callbacks are called at several states during the optimization process:

* At the beginning and end of the optimization process.
* After any call to `Evaluate()` and `EvaluateConstraint()`.
* After any call to `Gradient()` and `GradientConstraint()`.
* At the start and end of an epoch.

Each callback provides optimization-relevant information that can be accessed
or modified. In addition, a callback method may return a `bool`; returning
`true` terminates the optimization process (see the custom
[early stopping](#early-stopping-at-minimum-loss) example below).
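
A callback only needs to implement the methods it is interested in; all others
are simply ignored. As a minimal sketch of the signatures documented below, a
hypothetical callback that counts `Evaluate()` and `Gradient()` calls could
look like this:

```c++
class CountCalls
{
 public:
  // Called after any call to Evaluate().
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void Evaluate(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const double /* objective */)
  {
    ++evaluateCalls;
  }

  // Called after any call to Gradient().
  template<typename OptimizerType,
           typename FunctionType,
           typename MatType,
           typename GradType>
  void Gradient(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const GradType& /* gradient */)
  {
    ++gradientCalls;
  }

  size_t evaluateCalls = 0;
  size_t gradientCalls = 0;
};
```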

### BeginOptimization

Called at the beginning of the optimization process.

 * `BeginOptimization(`_`optimizer, function, coordinates`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |

### EndOptimization

Called at the end of the optimization process.

 * `EndOptimization(`_`optimizer, function, coordinates`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |

### Evaluate

Called after any call to `Evaluate()`.

 * `Evaluate(`_`optimizer, function, coordinates, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `double` | **`objective`** | Objective value of the current point. |

### EvaluateConstraint

Called after any call to `EvaluateConstraint()`.

 * `EvaluateConstraint(`_`optimizer, function, coordinates, constraint, constraintValue`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`constraint`** | The index of the constraint. |
| `double` | **`constraintValue`** | Constraint value of the current point. |

### Gradient

Called after any call to `Gradient()`.

 * `Gradient(`_`optimizer, function, coordinates, gradient`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `GradType` | **`gradient`** | Matrix that holds the gradient. |

### GradientConstraint

Called after any call to `GradientConstraint()`.

 * `GradientConstraint(`_`optimizer, function, coordinates, constraint, gradient`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`constraint`** | The index of the constraint. |
| `GradType` | **`gradient`** | Matrix that holds the gradient. |
### BeginEpoch

Called at the beginning of a pass over the data. The objective may be exact or
an estimate, depending on the value of `exactObjective`.

 * `BeginEpoch(`_`optimizer, function, coordinates, epoch, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`epoch`** | The index of the current epoch. |
| `double` | **`objective`** | Objective value of the current point. |

### EndEpoch

Called at the end of a pass over the data. The objective may be exact or
an estimate, depending on the value of `exactObjective`.

 * `EndEpoch(`_`optimizer, function, coordinates, epoch, objective`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `size_t` | **`epoch`** | The index of the current epoch. |
| `double` | **`objective`** | Objective value of the current point. |
### GenerationalStepTaken

Called after the evolution of a single generation. Intended specifically for
multi-objective optimizers.

 * `GenerationalStepTaken(`_`optimizer, function, coordinates, objectives, frontIndices`_`)`

#### Attributes

| **type** | **name** | **description** |
|----------|----------|-----------------|
| `OptimizerType` | **`optimizer`** | The optimizer used to update the function. |
| `FunctionType` | **`function`** | The function to be optimized. |
| `MatType` | **`coordinates`** | The current function parameter. |
| `ObjectivesVecType` | **`objectives`** | The set of calculated objectives so far. |
| `IndicesType` | **`frontIndices`** | The indices of the members belonging to the Pareto front. |
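
As a sketch, a hypothetical callback observing this state might look like the
following (the exact contents of `objectives` and `frontIndices` depend on the
optimizer in use):

```c++
class FrontWatcher
{
 public:
  // Called after each generation of a multi-objective optimizer.
  template<typename OptimizerType,
           typename FunctionType,
           typename MatType,
           typename ObjectivesVecType,
           typename IndicesType>
  void GenerationalStepTaken(OptimizerType& /* optimizer */,
                             FunctionType& /* function */,
                             MatType& /* coordinates */,
                             const ObjectivesVecType& /* objectives */,
                             const IndicesType& frontIndices)
  {
    // Report how many entries frontIndices currently holds; its exact
    // meaning depends on the optimizer.
    std::cout << "Pareto front entries: " << frontIndices.size() << std::endl;
  }
};
```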

## Custom Callbacks

### Learning rate scheduling

Setting the learning rate is crucially important when training because it
controls both the speed of convergence and the ultimate performance of the
model. One of the simplest learning rate strategies is to have a fixed learning
rate throughout the training process. Choosing a small learning rate allows the
optimizer to find good solutions, but this comes at the expense of limiting the
initial speed of convergence. To overcome this tradeoff, changing the learning
rate as more epochs have passed is commonly done in model training. The
`EndEpoch()` method in combination with the `StepSize()` method of the
optimizer can be used to update the variables.

Example code showing how to implement a custom callback to change the learning
rate is given below.

<details>
<summary>Click to collapse/expand example code.
</summary>

```c++
class ExponentialDecay
{
 public:
  // Set up the exponential decay learning rate scheduler with the user
  // specified decay value.
  ExponentialDecay(const double decay) : decay(decay), learningRate(0) { }

  // Callback function called at the start of the optimization process.
  // In this example we will use this to save the initial learning rate.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void BeginOptimization(OptimizerType& optimizer,
                         FunctionType& /* function */,
                         MatType& /* coordinates */)
  {
    // Save the initial learning rate.
    learningRate = optimizer.StepSize();
  }

  // Callback function called at the end of a pass over the data. We are only
  // interested in the current epoch and the optimizer; we ignore the rest.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  void EndEpoch(OptimizerType& optimizer,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t epoch,
                const double /* objective */)
  {
    // Update the learning rate.
    optimizer.StepSize() = learningRate * (1.0 - std::pow(decay,
        (double) epoch));
  }

  // The user-specified decay value.
  double decay;
  // The initial learning rate, saved at the start of the optimization.
  double learningRate;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row).
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the StandardSGD optimizer with specified parameters.
  // The ens::StandardSGD type can be replaced with any ensmallen optimizer
  // that can handle differentiable functions.
  StandardSGD optimizer(0.001, 1, 0, 1e-15, true);

  // Use the StandardSGD optimizer with specified parameters to minimize the
  // LinearRegressionFunction and pass the *exponential decay*
  // callback function from above.
  optimizer.Optimize(lrf, lrfParams, ExponentialDecay(0.01));

  // Print the trained model parameter.
  std::cout << lrfParams.t();
}
```

</details>

### Early stopping at minimum loss

Early stopping is a technique for controlling overfitting in machine learning
models, especially neural networks, by stopping the optimization process before
the model has trained for the maximum number of iterations.

Example code showing how to implement a custom callback to stop the
optimization when the minimum loss has been reached is given below.

<details>
<summary>Click to collapse/expand example code.
</summary>

```c++
#include <ensmallen.hpp>

// This class implements an early stopping at minimum loss callback function to
// terminate the optimization process early if the loss stops decreasing.
class EarlyStop
{
 public:
  // Set up the early stop at min loss class, which keeps track of the minimum
  // loss.
  EarlyStop() : bestObjective(std::numeric_limits<double>::max()) { }

  // Callback function called at the end of a pass over the data, which provides
  // the current objective. We are only interested in the objective and ignore
  // the rest; returning true terminates the optimization process.
  template<typename OptimizerType, typename FunctionType, typename MatType>
  bool EndEpoch(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const size_t /* epoch */,
                const double objective)
  {
    // Check if the given objective is lower than the previous best objective.
    if (objective < bestObjective)
    {
      // Update the best objective.
      bestObjective = objective;
    }
    else
    {
      // Stop the optimization process.
      return true;
    }

    // Do not stop the optimization process.
    return false;
  }

  // Locally-stored best objective.
  double bestObjective;
};

int main()
{
  // First, generate some random data, with 10000 points and 10 dimensions.
  // This data has no pattern and as such will make a model that's not very
  // useful---but the purpose here is just demonstration. :)
  //
  // For a more "real world" situation, load a dataset from file using X.load()
  // and y.load() (but make sure the matrix is column-major, so that each
  // observation/data point corresponds to a *column*, *not* a row).
  arma::mat data(10, 10000, arma::fill::randn);
  arma::rowvec responses(10000, arma::fill::randn);

  // Create a starting point for our optimization randomly.  The model has 10
  // parameters, so the shape is 10x1.
  arma::mat startingPoint(10, 1, arma::fill::randn);

  // Construct the objective function.
  LinearRegressionFunction lrf(data, responses);
  arma::mat lrfParams(startingPoint);

  // Create the L_BFGS optimizer with default parameters.
  // The ens::L_BFGS type can be replaced with any ensmallen optimizer that can
  // handle differentiable functions.
  ens::L_BFGS lbfgs;

  // Use the L_BFGS optimizer with default parameters to minimize the
  // LinearRegressionFunction and pass the *early stopping at minimum loss*
  // callback function from above.
  lbfgs.Optimize(lrf, lrfParams, EarlyStop());

  // Print the trained model parameter.
  std::cout << lrfParams.t();
}
```

</details>

Note that we have simply passed an instantiation of `EarlyStop`; the rest is
handled inside the optimizer.

ensmallen provides a more complete and general implementation of an
[early stopping](#earlystopatminloss) at minimum loss callback function.