
.. _boxcox_transformation:

Box Cox transformation
======================

| We consider :math:`X: \Omega \times \cD \rightarrow \Rset^d` a
  multivariate stochastic process of dimension :math:`d`, where
  :math:`\cD \subset \Rset^n` and :math:`\omega \in \Omega` is an event. We
  assume that the process is square integrable, i.e. in :math:`\cL^2(\Omega)`.
| We denote by :math:`X_{\vect{t}}: \Omega \rightarrow \Rset^d` the random
  variable at the vertex :math:`\vect{t} \in \cD` defined by
  :math:`X_{\vect{t}}(\omega)=X(\omega, \vect{t})`.
| If the variance of :math:`X_{\vect{t}}` depends on the vertex
  :math:`\vect{t}`, the Box Cox transformation maps the process
  :math:`X` into a process :math:`Y` such that the variance of
  :math:`Y_{\vect{t}}` is constant (at least to first order) with
  respect to :math:`\vect{t}`.
| We present here:

-  the estimation of the Box Cox transformation from a given field of
   the process :math:`X`,

-  the action of the Box Cox transformation on a field generated from
   :math:`X`.

| We denote by :math:`h: \Rset^d \rightarrow \Rset^d` the Box Cox
  transformation which maps the process :math:`X` into the process
  :math:`Y: \Omega \times \cD \rightarrow \Rset^d`, where
  :math:`Y=h(X)`, such that :math:`\Var{Y_{\vect{t}}}` does not depend on
  :math:`\vect{t}` to first order.
| We assume that :math:`X_{\vect{t}}` is a positive random variable for
  any :math:`\vect{t}`. To satisfy this constraint, it may be necessary to
  consider the shifted process :math:`X+\vect{\alpha}`.
| We illustrate some common Box Cox transformations :math:`h` in the
  scalar case (:math:`d = 1`), using the Taylor expansion of
  :math:`h: \Rset \rightarrow \Rset` at the mean point of
  :math:`X_{\vect{t}}`.
| In the multivariate case, we estimate the Box Cox transformation
  component by component and we define the multivariate Box Cox
  transformation as the aggregation of the marginal Box Cox
  transformations.
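
A minimal Python sketch of this component-by-component construction,
assuming the marginal transformation :math:`h_\lambda` of :eq:`BoxCoxModel`
(given below) with one parameter :math:`\lambda_i` per component. The names
``box_cox`` and ``box_cox_multivariate`` are hypothetical helpers for
illustration, not the OpenTURNS API:

.. code-block:: python

    import math


    def box_cox(y, lam):
        """Marginal Box Cox transformation h_lambda(y), defined for y > 0."""
        if y <= 0.0:
            raise ValueError("the Box Cox transformation requires y > 0")
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def box_cox_multivariate(x, lambdas):
        """Apply the i-th marginal transformation to the i-th component of x."""
        if len(x) != len(lambdas):
            raise ValueError("x and lambdas must both have dimension d")
        return [box_cox(xi, lam) for xi, lam in zip(x, lambdas)]


    # d = 2: lambda = 0.5 on the first component, lambda = 0 (log) on the second.
    print(box_cox_multivariate([4.0, math.e], [0.5, 0.0]))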

| **Marginal Box Cox transformation:**
| The first order Taylor expansion of :math:`h` around
  :math:`\Expect{X_{\vect{t}}}` reads:

  .. math::

      \forall \vect{t} \in \cD, h(X_{\vect{t}}) \simeq h(\Expect{X_{\vect{t}}}) + (X_{\vect{t}} - \Expect{X_{\vect{t}}})h'(\Expect{X_{\vect{t}}})

which gives, to first order:

  .. math::

      \Expect{h(X_{\vect{t}})} \simeq h(\Expect{X_{\vect{t}}})

and then:

  .. math::

      \Var{h(X_{\vect{t}})} \simeq h'(\Expect{X_{\vect{t}}})^2  \Var{X_{\vect{t}}}
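
The first order approximation of the variance can be checked numerically.
The sketch below assumes, for illustration only, a Gaussian
:math:`X_{\vect{t}}` with mean 100 and standard deviation 5 (positive with
overwhelming probability) and :math:`h(y) = \sqrt{y}`. It compares the
empirical variance of :math:`h(X_{\vect{t}})` with
:math:`h'(\Expect{X_{\vect{t}}})^2 \Var{X_{\vect{t}}} = 5^2 / (4 \times 100)`;
the function name ``first_order_check`` is hypothetical:

.. code-block:: python

    import math
    import random


    def first_order_check(mean=100.0, sd=5.0, n=20000, seed=0):
        """Compare Var[sqrt(X)] with the first order value h'(E[X])^2 Var[X]."""
        rng = random.Random(seed)
        # Assumed model: X Gaussian with the given mean and standard deviation.
        xs = [rng.gauss(mean, sd) for _ in range(n)]
        hs = [math.sqrt(x) for x in xs]
        m = sum(hs) / n
        empirical = sum((h - m) ** 2 for h in hs) / (n - 1)
        h_prime = 0.5 / math.sqrt(mean)  # derivative of sqrt at E[X]
        first_order = h_prime ** 2 * sd ** 2
        return empirical, first_order


    print(first_order_check())

The two values agree up to sampling error because the standard deviation is
small compared with the mean; the approximation degrades when
:math:`X_{\vect{t}}` spreads over a region where :math:`h` is strongly curved.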

For :math:`\Var{h(X_{\vect{t}})}` to be constant with respect to
:math:`\vect{t}` to first order, we need, for some constant :math:`k`:

.. math::
  :label: eqh

    h'(\Expect{X_{\vect{t}}}) = k \left(  \Var{X_{\vect{t}}} \right)^{-1/2}
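
To see how :eq:`eqh` determines :math:`h`, assume, as the cases below do,
that the variance is a function of the mean,
:math:`\Var{X_{\vect{t}}} = v(\Expect{X_{\vect{t}}})`, where :math:`v` is a
notation introduced here for illustration. Writing :math:`\mu` for the mean
and integrating :eq:`eqh` gives, up to an additive constant:

.. math::

    h(\mu) = k \int^{\mu} v(s)^{-1/2} \, ds

For instance, with :math:`v(s) = c \, s^{\beta}` and :math:`c > 0`, this
integral gives:

.. math::

    h(\mu) =
    \left\{
    \begin{array}{ll}
      \displaystyle \frac{k}{\sqrt{c}} \, \frac{\mu^{1 - \beta/2}}{1 - \beta/2} + \mathrm{const} & \beta \neq 2 \\
      \displaystyle \frac{k}{\sqrt{c}} \, \log \mu + \mathrm{const}                           & \beta = 2
    \end{array}
    \right.

so that :math:`\beta = 1` yields a multiple of :math:`\sqrt{\mu}`,
:math:`\beta = 2` a multiple of :math:`\log \mu`, and in general the
exponent is :math:`1 - \beta/2`.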

We now make additional assumptions on the relation between
:math:`\Expect{X_{\vect{t}}}` and :math:`\Var{X_{\vect{t}}}`:

-  If we suppose that
   :math:`\Var{X_{\vect{t}}} \propto \Expect{X_{\vect{t}}}`, then
   :eq:`eqh` leads to the function :math:`h(y) \propto \sqrt{y}` and we
   take :math:`h(y) = \sqrt{y}, y > 0`;

-  If we suppose that
   :math:`\Var{X_{\vect{t}}} \propto (\Expect{X_{\vect{t}}})^2`, then
   :eq:`eqh` leads to the function :math:`h(y) \propto \log{y}` and we
   take :math:`h(y) = \log{y}, y > 0`;

-  More generally, if we suppose that
   :math:`\Var{X_{\vect{t}}} \propto (\Expect{X_{\vect{t}}})^{\beta}`,
   then :eq:`eqh` leads to the function :math:`h_\lambda` parametrized by
   the scalar :math:`\lambda`:

   .. math::
    :label: BoxCoxModel

        h_\lambda(y) =
        \left\{
        \begin{array}{ll}
          \frac{y^\lambda-1}{\lambda} & \lambda \neq 0 \\
          \log(y)                     & \lambda = 0
        \end{array}
        \right.


   where :math:`\lambda = 1-\frac{\beta}{2}`.
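
The effect of the transformation can be illustrated on a simulated case. The
sketch below assumes, for illustration only, that
:math:`X_{\vect{t}} \sim \Gamma(4, \theta_{\vect{t}})` (shape 4, scale
:math:`\theta_{\vect{t}}`) at three vertices. Then
:math:`\Expect{X_{\vect{t}}} = 4\theta_{\vect{t}}` and
:math:`\Var{X_{\vect{t}}} = 4\theta_{\vect{t}}^2`, so :math:`\beta = 2` and
:math:`\lambda = 0`: the raw variances differ by orders of magnitude, while
the variances after :math:`h_0 = \log` are nearly equal. The helper names are
hypothetical:

.. code-block:: python

    import math
    import random


    def box_cox(y, lam):
        # h_lambda(y) for y > 0; lam = 0 gives the logarithm.
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def sample_variance(values):
        # Unbiased empirical variance.
        n = len(values)
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / (n - 1)


    # Assumed model: X_t ~ Gamma(shape=4, scale=theta_t), so Var[X_t] = (E[X_t])^2 / 4,
    # i.e. beta = 2 and lambda = 1 - beta / 2 = 0.
    rng = random.Random(1)
    raw_var, log_var = [], []
    for theta in (1.0, 10.0, 100.0):
        xs = [rng.gammavariate(4.0, theta) for _ in range(20000)]
        raw_var.append(sample_variance(xs))
        log_var.append(sample_variance([box_cox(x, 0.0) for x in xs]))

    print("raw variances:", raw_var)
    print("variances after h_0 = log:", log_var)

For this particular model the variance of :math:`\log X_{\vect{t}}` does not
depend on the scale at all (it equals the trigamma function at 4, about
0.284), so the stabilization is exact rather than only first order.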

The inverse Box Cox transformation is defined by:

.. math::
  :label: InverseBoxCoxModel

     h^{-1}_\lambda(y) =
     \left\{
     \begin{array}{ll}
       \displaystyle (\lambda y + 1)^{\frac{1}{\lambda}} & \lambda \neq 0 \\
       \displaystyle \exp(y)                          & \lambda = 0
     \end{array}
     \right.
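
A minimal Python sketch of :eq:`BoxCoxModel` and :eq:`InverseBoxCoxModel`,
with hypothetical function names. For :math:`\lambda \neq 0` the inverse is
only defined when :math:`\lambda y + 1 > 0`, which holds on the whole image
of :math:`(0, +\infty)` by :math:`h_\lambda`, since
:math:`\lambda h_\lambda(y) + 1 = y^\lambda > 0`:

.. code-block:: python

    import math


    def box_cox(y, lam):
        """h_lambda(y) of the Box Cox model, for y > 0."""
        if y <= 0.0:
            raise ValueError("the Box Cox transformation requires y > 0")
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def inverse_box_cox(z, lam):
        """h_lambda^{-1}(z); for lam != 0 it requires lam * z + 1 > 0."""
        if lam == 0.0:
            return math.exp(z)
        base = lam * z + 1.0
        if base <= 0.0:
            raise ValueError("lam * z + 1 must be positive")
        return base ** (1.0 / lam)


    # Round trip: h_lambda^{-1}(h_lambda(y)) recovers y.
    for lam in (-1.0, 0.0, 0.5, 2.0):
        print(lam, inverse_box_cox(box_cox(3.7, lam), lam))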

| **Estimation of the Box Cox transformation:**
| The parameter :math:`\lambda` is estimated from a given field of the
  process :math:`X` as follows.
| The estimator of :math:`\lambda` given below is derived under the
  assumption that :math:`h_\lambda(X_{\vect{t}}) \sim \cN(\beta , \sigma^2 )`
  at each vertex :math:`\vect{t}`. If this assumption does not hold, the
  resulting value should only be regarded as a reasonable choice, with no
  optimality guarantee.
| The parameters :math:`(\beta,\sigma,\lambda)` are then estimated by
  maximum likelihood. We denote by
  :math:`\Phi_{\beta, \sigma}` and :math:`\phi_{\beta, \sigma}`
  respectively the cumulative distribution function and the probability
  density function of the :math:`\cN(\beta , \sigma^2)`
  distribution.
| Since :math:`h_\lambda` is strictly increasing on :math:`(0, +\infty)`,
  for all vertices :math:`\vect{t}` we have:

  .. math::
    :label: cdfYt

      \forall v > 0, \, \Prob{ X_{\vect{t}} \leq v } & = \Prob{ h_\lambda(X_{\vect{t}}) \leq h_\lambda(v) } \\
      & = \Phi_{\beta, \sigma} \left(h_\lambda(v)\right)

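from which we derive, by differentiation with respect to :math:`v`, the
probability density function :math:`p` of :math:`X_{\vect{t}}` for all
vertices :math:`\vect{t}`, using :math:`h_\lambda'(v) = v^{\lambda - 1}`
(which holds for every :math:`\lambda`):

.. math::
  :label: pdfYt

    p(v) = h_\lambda'(v)\,\phi_{\beta, \sigma}\left(h_\lambda(v)\right) = v^{\lambda - 1}\phi_{\beta, \sigma}\left(h_\lambda(v)\right)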
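
The distribution of :math:`X_{\vect{t}}` implied by :eq:`cdfYt` and
:eq:`pdfYt` can be sketched with the standard library class
``statistics.NormalDist`` (Python 3.8 or later), whose ``cdf`` and ``pdf``
methods provide :math:`\Phi_{\beta, \sigma}` and :math:`\phi_{\beta, \sigma}`.
The function names ``cdf_x`` and ``pdf_x`` are hypothetical; the sketch checks
the density against a finite-difference derivative of the cumulative
distribution function:

.. code-block:: python

    import math
    from statistics import NormalDist


    def box_cox(y, lam):
        # h_lambda(y) for y > 0.
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def cdf_x(v, beta, sigma, lam):
        # P(X_t <= v) = Phi_{beta, sigma}(h_lambda(v)), for v > 0.
        return NormalDist(beta, sigma).cdf(box_cox(v, lam))


    def pdf_x(v, beta, sigma, lam):
        # p(v) = h_lambda'(v) * phi_{beta, sigma}(h_lambda(v)), with h_lambda'(v) = v**(lam - 1).
        return v ** (lam - 1.0) * NormalDist(beta, sigma).pdf(box_cox(v, lam))


    # The density should match the numerical derivative of the CDF.
    v, eps = 1.5, 1e-5
    beta, sigma, lam = 0.3, 0.4, 0.5
    finite_diff = (cdf_x(v + eps, beta, sigma, lam) - cdf_x(v - eps, beta, sigma, lam)) / (2.0 * eps)
    print(pdf_x(v, beta, sigma, lam), finite_diff)

For :math:`\lambda = 0`, ``pdf_x`` reduces to the lognormal density.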

Treating the values :math:`(x_0, \dots, x_{N-1})` of the field as
independent observations and using :eq:`pdfYt`, the likelihood of these
values with respect to the model :eq:`cdfYt` reads:

.. math::
  :label: LKH

     L(\beta,\sigma,\lambda) =
     \underbrace{ \frac{1}{(2\pi)^{N/2}}
       \times
       \frac{1}{(\sigma^2)^{N/2}}
       \times
       \exp\left[
         -\frac{1}{2\sigma^2}
         \sum_{k=0}^{N-1}
         \left(
         h_\lambda(x_k)-\beta
         \right)^2
         \right]
     }_{\Psi_\lambda(\beta, \sigma)}
     \times
     \prod_{k=0}^{N-1} x_k^{\lambda - 1}
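
A direct transcription of the logarithm of :eq:`LKH`, assuming independent
positive observations; ``log_likelihood`` is a hypothetical helper name:

.. code-block:: python

    import math


    def box_cox(y, lam):
        # h_lambda(y) for y > 0.
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def log_likelihood(beta, sigma, lam, xs):
        """log L(beta, sigma, lambda) for independent values xs, all > 0."""
        n = len(xs)
        # Sum of squared residuals of the transformed values around beta.
        ss = sum((box_cox(x, lam) - beta) ** 2 for x in xs)
        # Gaussian factor (the underbraced term), on the log scale.
        log_psi = (-0.5 * n * math.log(2.0 * math.pi)
                   - 0.5 * n * math.log(sigma ** 2)
                   - ss / (2.0 * sigma ** 2))
        # Jacobian factor prod x_k^(lambda - 1), on the log scale.
        return log_psi + (lam - 1.0) * sum(math.log(x) for x in xs)


    print(log_likelihood(0.0, 1.0, 0.0, [0.5, 1.0, 2.0]))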

For a fixed :math:`\lambda`, the factor
:math:`\prod_{k=0}^{N-1} x_k^{\lambda - 1}` does not depend on
:math:`(\beta, \sigma)`, so the likelihood is proportional to the Gaussian
likelihood of the transformed values
:math:`h_\lambda(x_0), \dots, h_\lambda(x_{N-1})`. Thus, for a given
:math:`\lambda`, the maximum likelihood estimators of
:math:`(\beta, \sigma^2)` are:

.. math::
  :label: eqBetaSigma

    \hat{\beta}(\lambda) = \frac{1}{N} \sum_{k=0}^{N-1} h_{\lambda}(x_k) \\
    \hat{\sigma}^2(\lambda) = \frac{1}{N} \sum_{k=0}^{N-1} (h_{\lambda}(x_k) - \hat{\beta}(\lambda))^2
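
For a fixed :math:`\lambda`, :eq:`eqBetaSigma` is the empirical mean and the
empirical variance (with the :math:`1/N` normalization) of the transformed
values. A short sketch, with the hypothetical helper name ``gaussian_mle``:

.. code-block:: python

    import math


    def box_cox(y, lam):
        # h_lambda(y) for y > 0.
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def gaussian_mle(xs, lam):
        """Return (beta_hat(lambda), sigma2_hat(lambda)) for the values xs > 0."""
        zs = [box_cox(x, lam) for x in xs]
        n = len(zs)
        beta_hat = sum(zs) / n
        # Maximum likelihood variance: divide by N, not N - 1.
        sigma2_hat = sum((z - beta_hat) ** 2 for z in zs) / n
        return beta_hat, sigma2_hat


    print(gaussian_mle([1.0, math.e, math.exp(2.0)], 0.0))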

| Substituting :eq:`eqBetaSigma` into :eq:`LKH` and taking the
  logarithm, we obtain:

  .. math::
    :label: lLambda

    \ell(\lambda) = \log L( \hat{\beta}(\lambda), \hat{\sigma}(\lambda),\lambda ) = C -
    \frac{N}{2}
    \log\left[\hat{\sigma}^2(\lambda)\right]
    \;+\;
    \left(\lambda - 1 \right) \sum_{k=0}^{N-1} \log(x_k)\,,

where :math:`C = -\frac{N}{2}\left(1 + \log(2\pi)\right)` does not depend on :math:`\lambda`.
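
A sketch of the profile log-likelihood :math:`\ell(\lambda)` of
:eq:`lLambda`, including the constant
:math:`C = -\frac{N}{2}\left(1 + \log(2\pi)\right)` so that it coincides with
:math:`\log L` evaluated at :math:`(\hat{\beta}(\lambda), \hat{\sigma}(\lambda))`.
The helper names are hypothetical:

.. code-block:: python

    import math


    def box_cox(y, lam):
        # h_lambda(y) for y > 0.
        if lam == 0.0:
            return math.log(y)
        return (y ** lam - 1.0) / lam


    def gaussian_mle(xs, lam):
        # beta_hat(lambda) and sigma2_hat(lambda) of the transformed values.
        zs = [box_cox(x, lam) for x in xs]
        n = len(zs)
        beta_hat = sum(zs) / n
        return beta_hat, sum((z - beta_hat) ** 2 for z in zs) / n


    def profile_log_likelihood(lam, xs):
        """ell(lambda) for values xs > 0, not all equal."""
        n = len(xs)
        _, sigma2_hat = gaussian_mle(xs, lam)
        # At sigma2_hat the exponent of the Gaussian factor equals -N/2, hence C.
        c = -0.5 * n * (1.0 + math.log(2.0 * math.pi))
        return c - 0.5 * n * math.log(sigma2_hat) + (lam - 1.0) * sum(math.log(x) for x in xs)


    print(profile_log_likelihood(0.3, [0.5, 1.2, 2.0, 3.3]))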

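The estimate :math:`\hat{\lambda}` is the value of :math:`\lambda` that maximizes :math:`\ell(\lambda)` defined in :eq:`lLambda`. In general, this one-dimensional maximization has no closed-form solution and is carried out numerically.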
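
The maximization can be sketched with a plain grid search over
:math:`\lambda \in [-2, 2]` with step 0.01. The grid bounds and step are
arbitrary choices for illustration, and this is not necessarily the optimizer
used by :class:`~openturns.BoxCoxFactory`. The sample is simulated from a
lognormal distribution, so that :math:`h_0(X) = \log X` is exactly Gaussian
and :math:`\hat{\lambda}` should be close to 0. For :math:`\lambda \neq 0`,
:math:`y^\lambda - 1` is computed as ``math.expm1(lam * math.log(y))`` to
avoid cancellation when :math:`\lambda` is small. The helper names are
hypothetical:

.. code-block:: python

    import math
    import random


    def box_cox(y, lam):
        # h_lambda(y) for y > 0; expm1 keeps (y**lam - 1) / lam accurate for small |lam|.
        if lam == 0.0:
            return math.log(y)
        return math.expm1(lam * math.log(y)) / lam


    def profile_log_likelihood(lam, xs):
        # ell(lambda) up to the constant C, which does not change the maximizer.
        n = len(xs)
        zs = [box_cox(x, lam) for x in xs]
        mean = sum(zs) / n
        sigma2_hat = sum((z - mean) ** 2 for z in zs) / n
        return -0.5 * n * math.log(sigma2_hat) + (lam - 1.0) * sum(math.log(x) for x in xs)


    def estimate_lambda(xs, lo=-2.0, hi=2.0, steps=400):
        # Evaluate ell on an evenly spaced grid (which contains lambda = 0 for the
        # default bounds) and keep the grid point with the largest value.
        grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
        return max(grid, key=lambda lam: profile_log_likelihood(lam, xs))


    # Simulated lognormal sample: log X ~ N(0, 1), so lambda = 0 is the true value.
    rng = random.Random(42)
    sample = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]
    lambda_hat = estimate_lambda(sample)
    print(lambda_hat)

A grid search is robust to several local maxima but its accuracy is limited by
the step; a one-dimensional local optimizer started from the best grid point
would refine the estimate.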

.. topic:: API:

    - See :class:`~openturns.BoxCoxTransform`
    - See :class:`~openturns.InverseBoxCoxTransform`
    - See :class:`~openturns.BoxCoxFactory`

.. topic:: Examples:

    - See :doc:`/auto_probabilistic_modeling/stochastic_processes/plot_box_cox_transform`