Eltwise {#dev_guide_eltwise}
============================

>
> [API Reference](@ref dnnl_api_eltwise)
>

## General

### Forward

The eltwise primitive applies an operation to every element of the tensor (the
variable names follow the standard @ref dev_guide_conventions):

\f[
    \dst_{i_1, \ldots, i_k} = Operation\left(\src_{i_1, \ldots, i_k}\right).
\f]

For notational convenience, in the formulas below we denote individual elements
of the \src, \dst, \diffsrc, and \diffdst tensors by s, d, ds, and dd,
respectively.

The following operations are supported:

| Operation | oneDNN algorithm kind | Forward formula | Backward formula (from src) | Backward formula (from dst) |
| :-- | :-- | :-- | :-- | :-- |
| abs | #dnnl_eltwise_abs | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases} \f$ | -- |
| bounded_relu | #dnnl_eltwise_bounded_relu | \f$ d = \begin{cases} \alpha & \text{if}\ s > \alpha \geq 0 \\ s & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{otherwise}\ \end{cases} \f$ | -- |
| clip | #dnnl_eltwise_clip | \f$ d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise}\ \end{cases} \f$ | -- |
| clip_v2 | #dnnl_eltwise_clip_v2 <br> #dnnl_eltwise_clip_v2_use_dst_for_bwd | \f$ d = \begin{cases} \beta & \text{if}\ s \geq \beta \geq \alpha \\ s & \text{if}\ \alpha < s < \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < s < \beta \\ 0 & \text{otherwise}\ \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < d < \beta \\ 0 & \text{otherwise}\ \end{cases} \f$ |
| elu | #dnnl_eltwise_elu <br> #dnnl_eltwise_elu_use_dst_for_bwd | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}. See\ (2). \f$ |
| exp | #dnnl_eltwise_exp <br> #dnnl_eltwise_exp_use_dst_for_bwd | \f$ d = e^s \f$ | \f$ ds = dd \cdot e^s \f$ | \f$ ds = dd \cdot d \f$ |
| gelu_erf | #dnnl_eltwise_gelu_erf | \f$ d = 0.5 s (1 + \operatorname{erf}[\frac{s}{\sqrt{2}}])\f$ | \f$ ds = dd \cdot \left(0.5 + 0.5 \, \operatorname{erf}\left({\frac{s}{\sqrt{2}}}\right) + \frac{s}{\sqrt{2\pi}}e^{-0.5s^{2}}\right) \f$ | -- |
| gelu_tanh | #dnnl_eltwise_gelu_tanh | \f$ d = 0.5 s (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)])\f$ | \f$ See\ (1). \f$ | -- |
| hardswish | #dnnl_eltwise_hardswish | \f$ d = \begin{cases} s & \text{if}\ s > 3 \\ s \cdot \frac{s + 3}{6} & \text{if}\ -3 < s \leq 3 \\ 0 & \text{otherwise} \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ s > 3 \\ dd \cdot \frac{2s + 3}{6} & \text{if}\ -3 < s \leq 3 \\ 0 & \text{otherwise} \end{cases} \f$ | -- |
| linear | #dnnl_eltwise_linear | \f$ d = \alpha s + \beta \f$ | \f$ ds = \alpha \cdot dd \f$ | -- |
| log | #dnnl_eltwise_log | \f$ d = \log_{e}{s} \f$ | \f$ ds = \frac{dd}{s} \f$ | -- |
| logistic | #dnnl_eltwise_logistic <br> #dnnl_eltwise_logistic_use_dst_for_bwd | \f$ d = \frac{1}{1+e^{-s}} \f$ | \f$ ds = \frac{dd}{1+e^{-s}} \cdot (1 - \frac{1}{1+e^{-s}}) \f$ | \f$ ds = dd \cdot d \cdot (1 - d) \f$ |
| logsigmoid | #dnnl_eltwise_logsigmoid | \f$ d = -\log_{e}(1+e^{-s}) \f$ | \f$ ds = \frac{dd}{1 + e^{s}} \f$ | -- |
| mish | #dnnl_eltwise_mish | \f$ d = s \cdot \tanh{(\log_{e}(1+e^s))} \f$ | \f$ ds = dd \cdot \frac{e^{s} \cdot \omega}{\delta^{2}}. See\ (3). \f$ | -- |
| pow | #dnnl_eltwise_pow | \f$ d = \alpha s^{\beta} \f$ | \f$ ds = dd \cdot \alpha \beta s^{\beta - 1} \f$ | -- |
| relu | #dnnl_eltwise_relu <br> #dnnl_eltwise_relu_use_dst_for_bwd | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}. See\ (2). \f$ |
| round | #dnnl_eltwise_round | \f$ d = round(s) \f$ | -- | -- |
| soft_relu | #dnnl_eltwise_soft_relu | \f$ d = \log_{e}(1+e^s) \f$ | \f$ ds = \frac{dd}{1 + e^{-s}} \f$ | -- |
| sqrt | #dnnl_eltwise_sqrt <br> #dnnl_eltwise_sqrt_use_dst_for_bwd | \f$ d = \sqrt{s} \f$ | \f$ ds = \frac{dd}{2\sqrt{s}} \f$ | \f$ ds = \frac{dd}{2d} \f$ |
| square | #dnnl_eltwise_square | \f$ d = s^2 \f$ | \f$ ds = dd \cdot 2 s \f$ | -- |
| swish | #dnnl_eltwise_swish | \f$ d = \frac{s}{1+e^{-\alpha s}} \f$ | \f$ ds = \frac{dd}{1 + e^{-\alpha s}}(1 + \alpha s (1 - \frac{1}{1 + e^{-\alpha s}})) \f$ | -- |
| tanh | #dnnl_eltwise_tanh <br> #dnnl_eltwise_tanh_use_dst_for_bwd | \f$ d = \tanh{s} \f$ | \f$ ds = dd \cdot (1 - \tanh^2{s}) \f$ | \f$ ds = dd \cdot (1 - d^2) \f$ |

\f$ (1)\ ds = dd \cdot 0.5 (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) \cdot (1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \cdot (1 - \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) ) \f$

\f$ (2)\ \text{Operation is supported only for } \alpha \geq 0. \f$

\f$ (3)\ \text{where } \omega = e^{3s} + 4 \cdot e^{2s} + e^{s} \cdot (4 \cdot s + 6) + 4 \cdot (s + 1) \text{ and } \delta = e^{2s} + 2 \cdot e^{s} + 2. \f$

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.
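The paired src- and dst-based backward formulas above can be checked with a
small scalar sketch (the helper names below are hypothetical and are not part
of the oneDNN API). For relu with \f$\alpha \geq 0\f$ the sign of \f$d\f$
matches the sign of \f$s\f$, so the two backward variants agree:

```cpp
#include <cassert>
#include <cmath>

// Scalar reference for the relu rows of the table above.
double relu_fwd(double s, double alpha) {
    return s > 0 ? s : alpha * s;
}

double relu_bwd_from_src(double dd, double s, double alpha) {
    return s > 0 ? dd : alpha * dd;
}

// dst-based variant: valid for alpha >= 0, where d > 0 iff s > 0;
// see note (2) above.
double relu_bwd_from_dst(double dd, double d, double alpha) {
    return d > 0 ? dd : alpha * dd;
}
```

For \f$\alpha < 0\f$ the dst-based reconstruction would be ambiguous (a
positive \f$d\f$ could come from a negative \f$s\f$), which is why note (2)
restricts the dst-based formula to \f$\alpha \geq 0\f$.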
### Backward

The backward propagation computes \diffsrc based on the \diffdst and \src
tensors. However, some operations support computing \diffsrc from the \dst
memory produced during the forward propagation. Refer to the table above for
the list of operations supporting destination as input memory and for the
corresponding formulas.

#### Exceptions

The eltwise primitive with the round algorithm does not support backward
propagation.

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output | Execution argument index |
| --- | --- |
| \src | DNNL_ARG_SRC |
| \dst | DNNL_ARG_DST |
| \diffsrc | DNNL_ARG_DIFF_SRC |
| \diffdst | DNNL_ARG_DIFF_DST |
| \f$\text{binary post-op}\f$ | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |

## Implementation Details

### General Notes

1. All eltwise primitives have a common initialization function (e.g.,
   dnnl::eltwise_forward::desc::desc()) which takes both parameters
   \f$\alpha\f$ and \f$\beta\f$. These parameters are ignored if they are
   unused by a given operation.

2. The memory format and data type for \src and \dst are assumed to be the
   same, and in the API they are typically referred to as `data` (e.g., see
   `data_desc` in dnnl::eltwise_forward::desc::desc()). The same holds for
   \diffsrc and \diffdst; the corresponding memory descriptors are referred
   to as `diff_data_desc`.

3. Both forward and backward propagation support in-place operations, meaning
   that \src can be used as input and output for forward propagation, and
   \diffdst can be used as input and output for backward propagation. In case
   of an in-place operation, the original data will be overwritten.
   Note, however, that some algorithms for backward propagation require the
   original \src; hence, the corresponding forward propagation should not be
   performed in-place for those algorithms. Algorithms that use \dst for
   backward propagation can safely be done in-place.

4. For some operations it might be beneficial to compute backward propagation
   based on \f$\dst(\overline{s})\f$, rather than on \f$\src(\overline{s})\f$,
   for improved performance.

5. For logsigmoid, the original formula \f$ d = \log_{e}(\frac{1}{1+e^{-s}})\f$
   is replaced by \f$ d = -soft\_relu(-s)\f$ for numerical stability.

@note For operations supporting destination memory as input, \dst can be
used instead of \src when backward propagation is computed. This enables
several performance optimizations (see the tips below).

### Data Type Support

The eltwise primitive supports the following combinations of data types:

| Propagation | Source / Destination | Intermediate data type
| :-- | :-- | :--
| forward / backward | f32, bf16 | f32
| forward | f16 | f16
| forward | s32 / s8 / u8 | f32

@warning
    There might be hardware- and/or implementation-specific restrictions.
    Check the [Implementation Limitations](@ref dg_eltwise_impl_limits)
    section below.

Here the intermediate data type means that the incoming values are first
converted to the intermediate data type, the operation is then applied, and
finally the result is converted to the output data type.

### Data Representation

The eltwise primitive works with arbitrary data tensors. There is no special
meaning associated with any of the logical dimensions.
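The intermediate data type behavior described under Data Type Support can be
sketched for the s8 configuration as follows. This is a minimal scalar
illustration, assuming round-to-nearest and saturation on the downconversion;
the helper names are hypothetical and not part of the oneDNN API:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// f32 -> s8 conversion with rounding and saturation to [-128, 127].
int8_t saturate_to_s8(float v) {
    float r = std::nearbyint(v);      // round to nearest (default mode: even)
    if (r < -128.f) return -128;
    if (r > 127.f) return 127;
    return static_cast<int8_t>(r);
}

// s8 eltwise with f32 intermediate type: convert up, apply, convert back.
int8_t eltwise_linear_s8(int8_t s, float alpha, float beta) {
    float x = static_cast<float>(s);  // s8 -> f32 (intermediate data type)
    float d = alpha * x + beta;       // the linear operation, computed in f32
    return saturate_to_s8(d);         // f32 -> s8 with saturation
}
```

Note how a result outside the representable s8 range (e.g., \f$2 \cdot 100\f$)
saturates to 127 rather than wrapping around.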
### Post-Ops and Attributes

| Propagation | Type | Operation | Description | Restrictions |
| :-- | :-- | :-- | :-- | :-- |
| Forward | Post-op | [Binary](@ref dnnl::post_ops::append_binary) | Applies a @ref dnnl_api_binary operation to the result | General binary post-op restrictions |

@anchor dg_eltwise_impl_limits
## Implementation Limitations

1. Refer to @ref dev_guide_data_types for limitations related to data types
   support.

## Performance Tips

1. For backward propagation, use the same memory format for \src, \diffdst,
   and \diffsrc (the formats of \diffdst and \diffsrc are always the same
   because of the API). Different formats are functionally supported but lead
   to highly suboptimal performance.

2. Use in-place operations whenever possible (see the caveats in General
   Notes).

3. As mentioned above, for all operations supporting destination memory as
   input, one can use the \dst tensor instead of \src. This enables the
   following potential optimizations for training:

   - Such operations can be safely done in-place.

   - Moreover, such operations can be fused as a
     [post-op](@ref dev_guide_attributes) with the previous operation, if that
     operation does not require its \dst to compute the backward propagation
     (e.g., if the convolution operation satisfies these conditions).

## Example

[Eltwise Primitive Example](@ref eltwise_example_cpp)

@copydetails eltwise_example_cpp_short
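As a scalar illustration of the dst-based backward computation recommended in
the performance tips above (hypothetical helper names, not the oneDNN API):
for tanh, \f$ ds = dd \cdot (1 - d^2) \f$ computed from the saved \dst matches
the src-based formula, so \src need not be kept around for the backward pass:

```cpp
#include <cassert>
#include <cmath>

// tanh backward computed from src (see the formula table in General above).
double tanh_bwd_from_src(double dd, double s) {
    double t = std::tanh(s);
    return dd * (1 - t * t);
}

// Same quantity computed from the forward result d = tanh(s);
// src does not need to be stored between the passes.
double tanh_bwd_from_dst(double dd, double d) {
    return dd * (1 - d * d);
}
```

This is what makes the in-place and fusion optimizations of tip 3 possible:
once the backward pass only reads \dst, the forward \src buffer can be reused.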