LogSoftmax {#dev_guide_logsoftmax}
============================

>
> [API Reference](@ref dnnl_api_logsoftmax)
>

## General

The logsoftmax primitive performs softmax along a particular axis on data with
arbitrary dimensions followed by the logarithm function. All other axes are
treated as independent (batch).

### Forward

In general form, the operation is defined by the following formulas (the
variable names follow the standard @ref dev_guide_conventions). The second
form is used because it is more numerically stable:

\f[
    \dst(\overline{ou}, c, \overline{in}) =
        \ln\left({\frac
        {
            e^{\src(\overline{ou}, c, \overline{in}) - \nu(\overline{ou}, \overline{in})}
        }
        {
            \sum\limits_{ic}
                e^{\src(\overline{ou}, ic, \overline{in}) - \nu(\overline{ou}, \overline{in})}
        }}\right) =
        \left(\src(\overline{ou}, c, \overline{in}) - \nu(\overline{ou}, \overline{in})\right)
            - \ln\left(
                    \sum\limits_{ic}
                    e^{\src(\overline{ou}, ic, \overline{in}) - \nu(\overline{ou}, \overline{in})}
                 \right),
\f]

where

- \f$c\f$ is the axis over which the logsoftmax computation is performed,
- \f$\overline{ou}\f$ is the outermost index (to the left of the logsoftmax axis),
- \f$\overline{in}\f$ is the innermost index (to the right of the logsoftmax axis), and
- \f$\nu\f$ is used to produce more accurate results and is defined as:

\f[
    \nu(\overline{ou}, \overline{in}) =
        \max\limits_{ic}
        \src(\overline{ou}, ic, \overline{in})
\f]
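
For illustration, the numerically stable (second) form above can be sketched in
plain C++ for a single \f$(\overline{ou}, \overline{in})\f$ slice. This is a
reference sketch only, not oneDNN code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Reference sketch of the numerically stable logsoftmax over one slice
// (the values along the logsoftmax axis c for fixed (ou, in) indices).
std::vector<float> logsoftmax_slice(const std::vector<float> &src) {
    // nu = max over the logsoftmax axis
    const float nu = *std::max_element(src.begin(), src.end());

    // ln(sum_ic exp(src[ic] - nu))
    float sum = 0.f;
    for (float x : src) sum += std::exp(x - nu);
    const float log_sum = std::log(sum);

    // dst[c] = (src[c] - nu) - ln(sum)
    std::vector<float> dst(src.size());
    for (size_t c = 0; c < src.size(); ++c)
        dst[c] = (src[c] - nu) - log_sum;
    return dst;
}
```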

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.

### Backward

The backward propagation computes \f$\diffsrc(ou, c, in)\f$, based on
\f$\diffdst(ou, c, in)\f$ and \f$\dst(ou, c, in)\f$.
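
The exact backward formula is not reproduced here. As a rough illustration, a
plain C++ reference sketch of the standard logsoftmax gradient,
\f$\diffsrc = \diffdst - e^{\dst} \sum_{ic} \diffdst\f$ applied per slice,
could look like the following. This is the usual chain-rule expression shown
only for illustration; it is not taken from the oneDNN sources:

```cpp
#include <cmath>
#include <vector>

// Reference sketch of the standard logsoftmax gradient for one slice:
//   diff_src[c] = diff_dst[c] - exp(dst[c]) * sum_ic diff_dst[ic]
std::vector<float> logsoftmax_backward_slice(
        const std::vector<float> &dst, const std::vector<float> &diff_dst) {
    float sum = 0.f;
    for (float d : diff_dst) sum += d;

    std::vector<float> diff_src(dst.size());
    for (size_t c = 0; c < dst.size(); ++c)
        diff_src[c] = diff_dst[c] - std::exp(dst[c]) * sum;
    return diff_src;
}
```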

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output | Execution argument index |
| ---                    | ---                      |
| \src                   | DNNL_ARG_SRC             |
| \dst                   | DNNL_ARG_DST             |
| \diffsrc               | DNNL_ARG_DIFF_SRC        |
| \diffdst               | DNNL_ARG_DIFF_DST        |
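
The snippet below is a minimal sketch of how these indices are typically used
with the C++ API: it creates a forward logsoftmax primitive for a hypothetical
8x16 f32 tensor with logsoftmax axis 1 and passes the memories under the
argument indices from the table (assuming a oneDNN version that provides the
standalone logsoftmax primitive):

```cpp
#include <unordered_map>

#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // Hypothetical 8x16 f32 tensor; logsoftmax is computed over axis 1.
    memory::desc md({8, 16}, memory::data_type::f32, memory::format_tag::ab);
    memory src_mem(md, eng), dst_mem(md, eng);

    auto op_d = logsoftmax_forward::desc(
            prop_kind::forward_inference, md, /* logsoftmax axis = */ 1);
    auto pd = logsoftmax_forward::primitive_desc(op_d, eng);
    auto prim = logsoftmax_forward(pd);

    // Map inputs and outputs to the execution argument indices from the table.
    std::unordered_map<int, memory> args
            = {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_DST, dst_mem}};
    prim.execute(strm, args);
    strm.wait();

    return 0;
}
```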

## Implementation Details

### General Notes

1. Both forward and backward propagation support in-place operations, meaning
   that `src` can be used as input and output for forward propagation, and
   `diff_dst` can be used as input and output for backward propagation. In the
   case of an in-place operation, the original data will be overwritten.

### Post-ops and Attributes

The logsoftmax primitive does not support any post-ops or attributes.

### Data Type Support

The logsoftmax primitive supports the following combinations of data types:

| Propagation        | Source / Destination
| :--                | :--
| forward / backward | bf16, f32

### Data Representation

#### Source, Destination, and Their Gradients

The logsoftmax primitive works with arbitrary data tensors. There is no special
meaning associated with any logical dimensions. However, the logsoftmax axis is
typically referred to as channels (hence in formulas we use \f$c\f$).
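
For example, for a hypothetical \f$N \times C \times H \times W\f$ activations
tensor, passing axis 1 to the operation descriptor makes the channel dimension
the logsoftmax axis (a sketch, again assuming the standalone logsoftmax
primitive of the C++ API):

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);

    // Hypothetical N x C x H x W activations; axis 1 selects the channels C.
    memory::desc md({32, 1000, 1, 1}, memory::data_type::f32,
            memory::format_tag::nchw);
    auto op_d = logsoftmax_forward::desc(
            prop_kind::forward_inference, md, /* logsoftmax axis = */ 1);
    auto pd = logsoftmax_forward::primitive_desc(op_d, eng);
    (void)pd;

    return 0;
}
```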

## Implementation Limitations

1. No primitive-specific limitations. Refer to @ref dev_guide_data_types for
   limitations related to data type support.

2. **GPU**
    - No support.

## Performance Tips

1. Use in-place operations whenever possible.

2. Currently, the logsoftmax primitive is optimized for the cases where
   the dimension of the logsoftmax axis is physically dense (see the sketch
   after this list). For instance:
   - Optimized: 2D case, tensor \f$A \times B\f$,
                logsoftmax axis 1 (B), format tag #dnnl_ab
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
                logsoftmax axis 3 (D), format tag #dnnl_abcd
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
                logsoftmax axis 1 (B), format tag #dnnl_abcd, and
                \f$C = D = 1\f$
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
                logsoftmax axis 1 (B), format tag #dnnl_acdb or #dnnl_aBcd16b, and
                \f$C \cdot D \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
                    logsoftmax axis 0 (A), format tag #dnnl_ab,
                    and \f$B \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
                    logsoftmax axis 1 (B), format tag #dnnl_ba,
                    and \f$A \ne 1\f$
   - Non-optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
                    logsoftmax axis 2 (C), format tag #dnnl_acdb,
                    and \f$D \cdot B \ne 1\f$
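
As a sketch of the first optimized and the corresponding non-optimized 2D cases
above (the dimensions are hypothetical), the difference is only in the memory
format chosen for the same logical tensor:

```cpp
#include "dnnl.hpp"

int main() {
    using namespace dnnl;

    memory::dims dims = {128, 1000}; // A x B

    // Optimized: logsoftmax axis 1 (B) is dense and innermost (#dnnl_ab).
    memory::desc dense_md(dims, memory::data_type::f32, memory::format_tag::ab);

    // Non-optimized: the same axis is strided in memory (#dnnl_ba).
    memory::desc strided_md(dims, memory::data_type::f32, memory::format_tag::ba);

    (void)dense_md;
    (void)strided_md;
    return 0;
}
```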

## Examples

| Engine  | Name                        | Comments
| :--     | :--                         | :--
| CPU/GPU | @ref logsoftmax_example_cpp | @copydetails logsoftmax_example_cpp_short