LogSoftmax {#dev_guide_logsoftmax}
============================

>
> [API Reference](@ref dnnl_api_logsoftmax)
>

## General

The logsoftmax primitive performs softmax along a particular axis on data with
arbitrary dimensions, followed by the logarithm function. All other axes are
treated as independent (batch).

### Forward

In general form, the operation is defined by the following formulas (the
variable names follow the standard @ref dev_guide_conventions). The second
form is used because it is more numerically stable:

\f[
    \dst(\overline{ou}, c, \overline{in}) =
        \ln\left({\frac
        {
            e^{\src(\overline{ou}, c, \overline{in}) - \nu(\overline{ou}, \overline{in})}
        }
        {
            \sum\limits_{ic}
                e^{\src(\overline{ou}, ic, \overline{in}) - \nu(\overline{ou}, \overline{in})}
        }}\right) =
        \left(\src(\overline{ou}, c, \overline{in}) - \nu(\overline{ou}, \overline{in})\right)
            - \ln\left(
                \sum\limits_{ic}
                e^{\src(\overline{ou}, ic, \overline{in}) - \nu(\overline{ou}, \overline{in})}
            \right),
\f]

where

- \f$c\f$ is the axis over which the logsoftmax computation is performed,
- \f$\overline{ou}\f$ is the outermost index (to the left of the logsoftmax axis),
- \f$\overline{in}\f$ is the innermost index (to the right of the logsoftmax axis), and
- \f$\nu\f$ is used to produce more accurate results and is defined as:

\f[
    \nu(\overline{ou}, \overline{in}) =
        \max\limits_{ic}
        \src(\overline{ou}, ic, \overline{in})
\f]

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.

### Backward

The backward propagation computes \f$\diffsrc(ou, c, in)\f$ based on
\f$\diffdst(ou, c, in)\f$ and \f$\dst(ou, c, in)\f$.

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.
| Primitive input/output | Execution argument index |
| ---                    | ---                      |
| \src                   | DNNL_ARG_SRC             |
| \dst                   | DNNL_ARG_DST             |
| \diffsrc               | DNNL_ARG_DIFF_SRC        |
| \diffdst               | DNNL_ARG_DIFF_DST        |

## Implementation Details

### General Notes

1. Both forward and backward propagation support in-place operations, meaning
   that `src` can be used as input and output for forward propagation, and
   `diff_dst` can be used as input and output for backward propagation. In case
   of an in-place operation, the original data will be overwritten.

### Post-ops and Attributes

The logsoftmax primitive does not support any post-ops or attributes.

### Data Type Support

The logsoftmax primitive supports the following combinations of data types:

| Propagation        | Source / Destination
| :--                | :--
| forward / backward | bf16, f32

### Data Representation

#### Source, Destination, and Their Gradients

The logsoftmax primitive works with arbitrary data tensors. There is no special
meaning associated with any logical dimensions. However, the logsoftmax axis is
typically referred to as channels (hence in formulas we use \f$c\f$).


## Implementation Limitations

1. No primitive-specific limitations. Refer to @ref dev_guide_data_types for
   limitations related to data type support.

2. **GPU**
   - No support.

## Performance Tips

1. Use in-place operations whenever possible.

2. Currently the logsoftmax primitive is optimized for the cases where
   the dimension of the logsoftmax axis is physically dense.
   For instance:
   - Optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 1 (B), format tag #dnnl_ab
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 3 (D), format tag #dnnl_abcd
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 1 (B), format tag #dnnl_abcd, and
     \f$C = D = 1\f$
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 1 (B), format tag #dnnl_acdb or #dnnl_aBcd16b, and
     \f$C \cdot D \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 0 (A), format tag #dnnl_ab,
     and \f$B \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 1 (B), format tag #dnnl_ba,
     and \f$A \ne 1\f$
   - Non-optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 2 (C), format tag #dnnl_acdb,
     and \f$D \cdot B \ne 1\f$

## Examples

| Engine  | Name                        | Comments
| :--     | :--                         | :--
| CPU/GPU | @ref logsoftmax_example_cpp | @copydetails logsoftmax_example_cpp_short
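To make the role of \f$\nu\f$ in the forward formulas concrete, here is a
standalone Python sketch (illustrative only, not part of the oneDNN API; the
function name `logsoftmax_1d` is made up for this example) that evaluates the
second, numerically stable form over a single 1D axis. The inputs are chosen
so that the naive first form, computed without subtracting \f$\nu\f$, would
overflow `exp`:

```python
import math

def logsoftmax_1d(src):
    """Numerically stable logsoftmax over a 1D sequence, following the
    second form above: (src_c - nu) - ln(sum_ic exp(src_ic - nu))."""
    nu = max(src)  # nu = max over the logsoftmax axis
    log_sum = math.log(sum(math.exp(x - nu) for x in src))
    return [(x - nu) - log_sum for x in src]

# Naive exp(1000.0) overflows a double, but the stable form is exact here.
src = [1000.0, 1001.0, 1002.0]
dst = logsoftmax_1d(src)

# Since logsoftmax = ln(softmax), exp(dst) must sum to 1.
assert abs(sum(math.exp(d) for d in dst) - 1.0) < 1e-12
print(dst)
```

The same max-subtraction trick generalizes to tensors of arbitrary rank by
applying it independently for every \f$(\overline{ou}, \overline{in})\f$
batch position.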
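The backward section above states only that \f$\diffsrc\f$ is computed from
\f$\diffdst\f$ and \f$\dst\f$ without giving a formula. Differentiating the
forward definition yields
\f$\diffsrc_c = \diffdst_c - e^{\dst_c} \sum_{ic} \diffdst_{ic}\f$; the sketch
below (illustrative Python, not the oneDNN implementation) checks this rule
against a finite-difference approximation:

```python
import math

def logsoftmax_1d(src):
    # Stable forward form: (src_c - nu) - ln(sum_ic exp(src_ic - nu)).
    nu = max(src)
    log_sum = math.log(sum(math.exp(x - nu) for x in src))
    return [(x - nu) - log_sum for x in src]

def logsoftmax_backward_1d(dst, diff_dst):
    # diff_src_c = diff_dst_c - exp(dst_c) * sum_ic diff_dst_ic,
    # obtained by differentiating the forward definition
    # (exp(dst_c) is exactly softmax_c).
    s = sum(diff_dst)
    return [g - math.exp(d) * s for d, g in zip(dst, diff_dst)]

src = [0.5, -1.0, 2.0]
diff_dst = [0.3, -0.2, 0.7]
dst = logsoftmax_1d(src)
diff_src = logsoftmax_backward_1d(dst, diff_dst)

# Finite-difference check of d(sum(diff_dst * dst)) / d(src_j).
eps = 1e-6
loss = sum(g * d for g, d in zip(diff_dst, dst))
for j in range(len(src)):
    bumped = list(src)
    bumped[j] += eps
    loss_b = sum(g * d for g, d in zip(diff_dst, logsoftmax_1d(bumped)))
    assert abs((loss_b - loss) / eps - diff_src[j]) < 1e-4
```

A useful sanity property follows from \f$\sum_c e^{\dst_c} = 1\f$: the
components of \f$\diffsrc\f$ along the logsoftmax axis always sum to zero.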