Softmax {#dev_guide_softmax}
============================

>
> [API Reference](@ref dnnl_api_softmax)
>

## General

The softmax primitive performs softmax along a particular axis on data with
arbitrary dimensions. All other axes are treated as independent (batch).

### Forward

In general form, the operation is defined by the following formulas (the
variable names follow the standard @ref dev_guide_conventions):

\f[
    \dst(\overline{ou}, c, \overline{in}) =
        \frac
        {e^{\src(\overline{ou}, c, \overline{in}) - \nu(\overline{ou}, \overline{in})}}
        {
            \sum\limits_{ic}
                e^{\src(\overline{ou}, ic, \overline{in}) - \nu(\overline{ou}, \overline{in})}
        },
\f]

where

- \f$c\f$ is the axis over which the softmax is computed,
- \f$\overline{ou}\f$ is the outermost index (to the left of the softmax axis),
- \f$\overline{in}\f$ is the innermost index (to the right of the softmax
  axis), and
- \f$\nu\f$ is used to produce more accurate results and is defined as:

\f[
    \nu(\overline{ou}, \overline{in}) =
        \max\limits_{ic}
        \src(\overline{ou}, ic, \overline{in}).
\f]

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.

### Backward

The backward propagation computes \f$\diffsrc(ou, c, in)\f$ based on
\f$\diffdst(ou, c, in)\f$ and \f$\dst(ou, c, in)\f$.

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output | Execution argument index |
| ---                    | ---                      |
| \src                   | DNNL_ARG_SRC             |
| \dst                   | DNNL_ARG_DST             |
| \diffsrc               | DNNL_ARG_DIFF_SRC        |
| \diffdst               | DNNL_ARG_DIFF_DST        |

## Implementation Details

### General Notes

1. Both forward and backward propagation support in-place operations, meaning
   that `src` can be used as input and output for forward propagation, and
   `diff_dst` can be used as input and output for backward propagation. In
   case of an in-place operation, the original data will be overwritten.

### Post-ops and Attributes

The softmax primitive does not support any post-ops or attributes.

### Data Type Support

The softmax primitive supports the following combinations of data types:

| Propagation        | Source / Destination
| :--                | :--
| forward / backward | bf16, f32
| forward            | f16

### Data Representation

#### Source, Destination, and Their Gradients

The softmax primitive works with arbitrary data tensors. There is no special
meaning associated with any logical dimensions. However, the softmax axis is
typically referred to as channels (hence the \f$c\f$ in the formulas).

## Implementation Limitations

1. There are no primitive-specific limitations. Refer to
   @ref dev_guide_data_types for limitations related to data type support.

## Performance Tips

1. Use in-place operations whenever possible.

2. Currently the softmax primitive is optimized for the cases where
   the dimension of the softmax axis is physically dense.
   For instance:

   - Optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 1 (B), format tag #dnnl_ab
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 3 (D), format tag #dnnl_abcd
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 1 (B), format tag #dnnl_abcd, and
     \f$C = D = 1\f$
   - Optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 1 (B), format tag #dnnl_acdb or #dnnl_aBcd16b, and
     \f$C \cdot D \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 0 (A), format tag #dnnl_ab,
     and \f$B \ne 1\f$
   - Non-optimized: 2D case, tensor \f$A \times B\f$,
     softmax axis 1 (B), format tag #dnnl_ba,
     and \f$A \ne 1\f$
   - Non-optimized: 4D case, tensor \f$A \times B \times C \times D\f$,
     softmax axis 2 (C), format tag #dnnl_acdb,
     and \f$D \cdot B \ne 1\f$

## Examples

| Engine  | Name                     | Comments
| :--     | :--                      | :--
| CPU/GPU | @ref softmax_example_cpp | @copydetails softmax_example_cpp_short
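To make the forward formula concrete, the following is a minimal,
self-contained reference sketch in plain C++. It does not use the oneDNN API;
`softmax_ref` and the outer/channels/inner layout are illustrative names only.
It mirrors the formula above: the tensor is viewed as
\f$\overline{ou} \times c \times \overline{in}\f$, and the per-slice maximum
\f$\nu\f$ is subtracted before exponentiation for numerical accuracy.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference softmax over the middle (channel) axis of a tensor viewed as
// outer x channels x inner, in row-major layout. Subtracting the per-slice
// maximum (the \nu term in the formula) keeps the exponentials in a safe
// range and does not change the result.
std::vector<float> softmax_ref(const std::vector<float> &src,
        std::size_t outer, std::size_t channels, std::size_t inner) {
    std::vector<float> dst(src.size());
    for (std::size_t ou = 0; ou < outer; ++ou)
        for (std::size_t in = 0; in < inner; ++in) {
            // Flat index of element (ou, c, in).
            auto at = [&](std::size_t c) {
                return (ou * channels + c) * inner + in;
            };
            // nu(ou, in) = max over the softmax axis.
            float nu = src[at(0)];
            for (std::size_t c = 1; c < channels; ++c)
                nu = std::max(nu, src[at(c)]);
            // Denominator: sum of shifted exponentials over the axis.
            float denom = 0.f;
            for (std::size_t c = 0; c < channels; ++c)
                denom += std::exp(src[at(c)] - nu);
            for (std::size_t c = 0; c < channels; ++c)
                dst[at(c)] = std::exp(src[at(c)] - nu) / denom;
        }
    return dst;
}
```

Each (outer, inner) slice of the result sums to 1, and the max subtraction
means even large inputs (e.g. values around 1000) do not overflow `exp`.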