"""
.. _tut_nn:

Train a neural network potential
================================

In this tutorial, we train a neural network (NN) potential for silicon.

"""


##########################################################################################
# We are going to fit the NN potential to a training set of energies and forces from
# compressed and stretched diamond silicon structures (the same training set used in
# :ref:`tut_kim_sw`).
# Download the training set :download:`Si_training_set.tar.gz <https://raw.githubusercontent.com/mjwen/kliff/master/examples/Si_training_set.tar.gz>`.
# (It will be downloaded automatically if it is not present.)
# The data is stored in the **extended xyz** format; see :ref:`doc.dataset` for more
# information on this format.
#
# .. warning::
#     The ``Si_training_set`` is just a toy data set for the purpose of demonstrating how
#     to use KLIFF to train potentials. It should not be used to train any potential for
#     real simulations.
#
# Let's first import the modules that will be used in this example.

from kliff import nn
from kliff.calculators import CalculatorTorch
from kliff.dataset import Dataset
from kliff.descriptors import SymmetryFunction
from kliff.loss import Loss
from kliff.models import NeuralNetwork
from kliff.utils import download_dataset

##########################################################################################
# Model
# -----
#
# For a NN model, we need to specify the descriptor that transforms atomic environment
# information into fingerprints, which the NN model uses as its input. Here, we use the
# symmetry functions proposed by Behler and coworkers.

descriptor = SymmetryFunction(
    cut_name="cos", cut_dists={"Si-Si": 5.0}, hyperparams="set51", normalize=True
)


##########################################################################################
# The ``cut_name`` and ``cut_dists`` arguments tell the descriptor what type of cutoff
# function to use and what the cutoff distances are. ``hyperparams`` specifies the set of
# hyperparameters used in the symmetry function descriptor. If you prefer, you can provide
# a dictionary of your own hyperparameters (see the sketch below). Finally, ``normalize``
# indicates that the generated fingerprints should be normalized by first subtracting the
# mean and then dividing by the standard deviation. This normalization typically makes it
# easier to optimize the NN model.
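#
# For instance, a hand-written hyperparameter dictionary might look roughly like the
# following (a minimal sketch: the key names reflect the Behler G2 and G4 symmetry
# function parameters ``eta``, ``Rs``, ``zeta``, and ``lambda``, and the values are
# illustrative placeholders rather than tuned settings; check the descriptor
# documentation for the exact format expected)::
#
#     my_hyperparams = {
#         "g2": [
#             {"eta": 0.1, "Rs": 0.0},  # a radial (G2) symmetry function
#             {"eta": 0.5, "Rs": 0.0},  # another radial function with a larger eta
#         ],
#         "g4": [
#             {"zeta": 1.0, "lambda": -1.0, "eta": 0.01},  # an angular (G4) function
#         ],
#     }
#     descriptor = SymmetryFunction(
#         cut_name="cos",
#         cut_dists={"Si-Si": 5.0},
#         hyperparams=my_hyperparams,
#         normalize=True,
#     )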
#
# We can then build the NN model on top of the descriptor.

N1 = 10
N2 = 10
model = NeuralNetwork(descriptor)
model.add_layers(
    # first hidden layer
    nn.Linear(descriptor.get_size(), N1),
    nn.Tanh(),
    # second hidden layer
    nn.Linear(N1, N2),
    nn.Tanh(),
    # output layer
    nn.Linear(N2, 1),
)
model.set_save_metadata(prefix="./kliff_saved_model", start=5, frequency=2)


##########################################################################################
# In the above code, we build a NN model with an input layer, two hidden layers, and an
# output layer. The ``descriptor`` carries the information of the input layer, so it does
# not need to be specified explicitly. For each hidden layer, we first apply a linear
# transformation using ``nn.Linear(size_in, size_out)`` (essentially carrying out
# :math:`y = xW+b`, where :math:`W` is the weight matrix of size ``size_in`` by
# ``size_out``, and :math:`b` is a vector of size ``size_out``). Then we apply the
# hyperbolic tangent activation function ``nn.Tanh()`` to the output of the Linear layer
# (i.e. :math:`y`) to add nonlinearity. We use a Linear layer for the output layer as
# well, but unlike the hidden layers, no activation function is applied here. The input
# size ``size_in`` of the first hidden layer must be the size of the descriptor, which is
# obtained using ``descriptor.get_size()``. For all other layers (hidden or output), the
# input size must equal the output size of the previous layer. The output size of the
# output layer must be 1 so that the output of the NN model gives the energy of the atom.
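#
# As a concrete illustration of what one hidden layer computes, here is a minimal sketch
# in plain NumPy with arbitrary numbers (the actual computation is handled internally by
# the model)::
#
#     import numpy as np
#
#     size_in = descriptor.get_size()  # width of the input layer (fingerprint size)
#     x = np.random.rand(1, size_in)   # fingerprint of one atom
#     W = np.random.rand(size_in, 10)  # weight matrix of the first hidden layer (N1 = 10)
#     b = np.random.rand(10)           # bias vector of the first hidden layer
#     y = np.tanh(x @ W + b)           # y = xW + b followed by the tanh activation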
#
# The ``set_save_metadata`` call specifies where to save intermediate models during the
# optimization (discussed below), at which epoch to start saving, and how often to save
# the model (with the settings above, intermediate models are saved to
# ``./kliff_saved_model`` every 2 epochs, starting from epoch 5).
#
#
# Training set and calculator
# ---------------------------
#
# The training set and the calculator are the same as explained in :ref:`tut_kim_sw`. The
# only difference is that we need to use
# :class:`~kliff.calculators.CalculatorTorch`, which is targeted for the NN model.
# Also, its ``create()`` method takes an argument ``reuse`` to specify whether to reuse
# the fingerprints generated from the descriptor, if they are present.

# training set
dataset_path = download_dataset(dataset_name="Si_training_set")
dataset_path = dataset_path.joinpath("varying_alat")
tset = Dataset(dataset_path)
configs = tset.get_configs()
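
# The number of configurations read from the training set; printing it here is only for
# illustration (this count is referred to when choosing the mini-batch size below).
print("Number of configurations in the training set:", len(configs))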

# calculator
calc = CalculatorTorch(model)
_ = calc.create(configs, reuse=False)


##########################################################################################
# Loss function
# -------------
#
# KLIFF uses a loss function to quantify the difference between the training data and
# potential predictions and uses minimization algorithms to reduce the loss as much as
# possible. In the following code snippet, we create a loss function and minimize it using
# the ``Adam`` optimizer. The Adam optimizer supports minimization using `mini-batches` of
# data; here we use ``100`` configurations in each minimization step (the training set has
# a total of 400 configurations, as can be seen above), so each epoch consists of 4
# minimization steps, and we run through the training set for ``10`` epochs. The learning
# rate ``lr`` used here is ``0.001``; typically, one needs to experiment with it to find a
# value that drives the loss down in a reasonable time.

loss = Loss(calc, residual_data={"forces_weight": 0.3})
result = loss.minimize(method="Adam", num_epochs=10, batch_size=100, lr=0.001)


##########################################################################################
# We can save the trained model to disk and load it back later if we want. We can also
# write the trained model to a KIM model so that it can be used in other simulation codes
# such as LAMMPS via the KIM API.

model.save("final_model.pkl")
loss.save_optimizer_state("optimizer_stat.pkl")

model.write_kim_model()


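##########################################################################################
# If we want to resume training later, the saved model can be loaded back, for example
# along these lines (a minimal sketch, assuming the ``load`` method mirrors ``save``;
# consult the KLIFF API documentation for the exact call)::
#
#     model.load("final_model.pkl")

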
##########################################################################################
# .. note::
#    Now we have trained an NN potential for a single species, Si. If you have multiple
#    species in your system and want to use different parameters for different species,
#    take a look at the :ref:`tut_nn_multi_spec` example.
#
152