Eltwise {#dev_guide_eltwise}
============================

>
> [API Reference](@ref dnnl_api_eltwise)
>

## General

### Forward

The eltwise primitive applies an operation to every element of the tensor (the
variable names follow the standard @ref dev_guide_conventions):

\f[
    \dst_{i_1, \ldots, i_k} = Operation\left(\src_{i_1, \ldots, i_k}\right).
\f]

For notational convenience, in the formulas below we denote an individual
element of the \src, \dst, \diffsrc, and \diffdst tensors by s, d, ds, and dd,
respectively.

The following operations are supported:

| Operation    | oneDNN algorithm kind                                              | Forward formula                                                                                                                                             | Backward formula (from src)                                                                                                                  | Backward formula (from dst)                                                                                            |
| :--          | :--                                                                | :--                                                                                                                                                         | :--                                                                                                                                          | :--                                                                                                                    |
| abs          | #dnnl_eltwise_abs                                                  | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ -s & \text{if}\ s \leq 0 \end{cases} \f$                                                                      | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ -dd & \text{if}\ s < 0 \\ 0 & \text{if}\ s = 0 \end{cases} \f$                               | --                                                                                                                     |
| bounded_relu | #dnnl_eltwise_bounded_relu                                         | \f$ d = \begin{cases} \alpha & \text{if}\ s > \alpha \geq 0 \\ s & \text{if}\ 0 < s \leq \alpha \\ 0 & \text{if}\ s \leq 0 \end{cases} \f$                  | \f$ ds = \begin{cases} dd & \text{if}\ 0 < s \leq \alpha, \\ 0 & \text{otherwise}\ \end{cases} \f$                                           | --                                                                                                                     |
| clip         | #dnnl_eltwise_clip                                                 | \f$ d = \begin{cases} \beta & \text{if}\ s > \beta \geq \alpha \\ s & \text{if}\ \alpha < s \leq \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < s \leq \beta \\ 0 & \text{otherwise}\ \end{cases} \f$                                        | --                                                                                                                     |
| clip_v2      | #dnnl_eltwise_clip_v2 <br> #dnnl_eltwise_clip_v2_use_dst_for_bwd   | \f$ d = \begin{cases} \beta & \text{if}\ s \geq \beta \geq \alpha \\ s & \text{if}\ \alpha < s < \beta \\ \alpha & \text{if}\ s \leq \alpha \end{cases} \f$ | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < s < \beta \\ 0 & \text{otherwise}\ \end{cases} \f$                                           | \f$ ds = \begin{cases} dd & \text{if}\ \alpha < d < \beta \\ 0 & \text{otherwise}\ \end{cases} \f$                     |
| elu          | #dnnl_eltwise_elu <br> #dnnl_eltwise_elu_use_dst_for_bwd           | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha (e^s - 1) & \text{if}\ s \leq 0 \end{cases} \f$                                                        | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ dd \cdot \alpha e^s & \text{if}\ s \leq 0 \end{cases} \f$                                    | \f$ ds = \begin{cases} dd & \text{if}\ d > 0 \\ dd \cdot (d + \alpha) & \text{if}\ d \leq 0 \end{cases}. See\ (2). \f$ |
| exp          | #dnnl_eltwise_exp <br> #dnnl_eltwise_exp_use_dst_for_bwd           | \f$ d = e^s \f$                                                                                                                                             | \f$ ds = dd \cdot e^s \f$                                                                                                                    | \f$ ds = dd \cdot d \f$                                                                                                |
| gelu_erf     | #dnnl_eltwise_gelu_erf                                             | \f$ d = 0.5 s (1 + \operatorname{erf}[\frac{s}{\sqrt{2}}])\f$                                                                                                     | \f$ ds = dd \cdot \left(0.5 + 0.5 \, \operatorname{erf}\left({\frac{s}{\sqrt{2}}}\right) + \frac{s}{\sqrt{2\pi}}e^{-0.5s^{2}}\right) \f$           | --                                                                                                                     |
| gelu_tanh    | #dnnl_eltwise_gelu_tanh                                            | \f$ d = 0.5 s (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)])\f$                                                                                       | \f$ See\ (1). \f$                                                                                                                            | --                                                                                                                     |
| hardswish    | #dnnl_eltwise_hardswish                                            | \f$ d = \begin{cases} s & \text{if}\ s > 3 \\ s \cdot \frac{s + 3}{6} & \text{if}\ -3 < s \leq 3 \\ 0 & \text{otherwise} \end{cases} \f$                    | \f$ ds = \begin{cases} dd & \text{if}\ s > 3 \\ dd \cdot \frac{2s + 3}{6} & \text{if}\ -3 < s \leq 3 \\ 0 & \text{otherwise} \end{cases} \f$ | --                                                                                                                     |
| linear       | #dnnl_eltwise_linear                                               | \f$ d = \alpha s + \beta \f$                                                                                                                                | \f$ ds = \alpha \cdot dd \f$                                                                                                                 | --                                                                                                                     |
| log          | #dnnl_eltwise_log                                                  | \f$ d = \log_{e}{s} \f$                                                                                                                                     | \f$ ds = \frac{dd}{s} \f$                                                                                                                    | --                                                                                                                     |
| logistic     | #dnnl_eltwise_logistic <br> #dnnl_eltwise_logistic_use_dst_for_bwd | \f$ d = \frac{1}{1+e^{-s}} \f$                                                                                                                              | \f$ ds = \frac{dd}{1+e^{-s}} \cdot (1 - \frac{1}{1+e^{-s}}) \f$                                                                              | \f$ ds = dd \cdot d \cdot (1 - d) \f$                                                                                  |
| logsigmoid   | #dnnl_eltwise_logsigmoid                                           | \f$ d = -\log_{e}(1+e^{-s}) \f$                                                                                                                             | \f$ ds = \frac{dd}{1 + e^{s}} \f$                                                                                                            | --                                                                                                                     |
| mish         | #dnnl_eltwise_mish                                                 | \f$ d = s \cdot \tanh{(\log_{e}(1+e^s))} \f$                                                                                                                | \f$ ds = dd \cdot \frac{e^{s} \cdot \omega}{\delta^{2}}. See\ (3). \f$                                                                       | --                                                                                                                     |
| pow          | #dnnl_eltwise_pow                                                  | \f$ d = \alpha s^{\beta} \f$                                                                                                                                | \f$ ds = dd \cdot \alpha \beta s^{\beta - 1} \f$                                                                                             | --                                                                                                                     |
| relu         | #dnnl_eltwise_relu <br> #dnnl_eltwise_relu_use_dst_for_bwd         | \f$ d = \begin{cases} s & \text{if}\ s > 0 \\ \alpha s & \text{if}\ s \leq 0 \end{cases} \f$                                                                | \f$ ds = \begin{cases} dd & \text{if}\ s > 0 \\ \alpha \cdot dd & \text{if}\ s \leq 0 \end{cases} \f$                                        | \f$ ds = \begin{cases} dd & \text{if}\ d > 0 \\ \alpha \cdot dd & \text{if}\ d \leq 0 \end{cases}. See\ (2). \f$       |
| round        | #dnnl_eltwise_round                                                | \f$ d = round(s) \f$                                                                                                                                        | --                                                                                                                                           | --                                                                                                                     |
| soft_relu    | #dnnl_eltwise_soft_relu                                            | \f$ d = \log_{e}(1+e^s) \f$                                                                                                                                 | \f$ ds = \frac{dd}{1 + e^{-s}} \f$                                                                                                           | --                                                                                                                     |
| sqrt         | #dnnl_eltwise_sqrt <br> #dnnl_eltwise_sqrt_use_dst_for_bwd         | \f$ d = \sqrt{s} \f$                                                                                                                                        | \f$ ds = \frac{dd}{2\sqrt{s}} \f$                                                                                                            | \f$ ds = \frac{dd}{2d} \f$                                                                                             |
| square       | #dnnl_eltwise_square                                               | \f$ d = s^2 \f$                                                                                                                                             | \f$ ds = dd \cdot 2 s \f$                                                                                                                    | --                                                                                                                     |
| swish        | #dnnl_eltwise_swish                                                | \f$ d = \frac{s}{1+e^{-\alpha s}} \f$                                                                                                                       | \f$ ds = \frac{dd}{1 + e^{-\alpha s}}(1 + \alpha s (1 - \frac{1}{1 + e^{-\alpha s}})) \f$                                                    | --                                                                                                                     |
| tanh         | #dnnl_eltwise_tanh <br> #dnnl_eltwise_tanh_use_dst_for_bwd         | \f$ d = \tanh{s} \f$                                                                                                                                        | \f$ ds = dd \cdot (1 - \tanh^2{s}) \f$                                                                                                       | \f$ ds = dd \cdot (1 - d^2) \f$                                                                                        |

\f$ (1)\ ds = dd \cdot 0.5 (1 + \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) \cdot (1 + \sqrt{\frac{2}{\pi}} (s + 0.134145 s^3) \cdot (1 -  \tanh[\sqrt{\frac{2}{\pi}} (s + 0.044715 s^3)]) ) \f$

\f$ (2)\ \text{Operation is supported only for } \alpha \geq 0. \f$

\f$ (3)\ \text{where } \omega = e^{3s} + 4 \cdot e^{2s} + e^{s} \cdot (4 \cdot s + 6) + 4 \cdot (s + 1) \text{ and } \delta = e^{2s} + 2 \cdot e^{s} + 2. \f$
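
Formula (1) can be sanity-checked numerically. The following is a minimal,
standalone C++ sketch (it does not use the oneDNN API): it evaluates the
gelu_tanh forward formula from the table and formula (1) in double precision,
and provides a central finite difference of the forward formula for comparison.
The function names and the step size `h` are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Forward gelu_tanh as given in the table:
// d = 0.5 * s * (1 + tanh(sqrt(2 / pi) * (s + 0.044715 * s^3))).
double gelu_tanh_fwd(double s) {
    const double k = std::sqrt(2.0 / std::acos(-1.0)); // sqrt(2 / pi)
    return 0.5 * s * (1.0 + std::tanh(k * (s + 0.044715 * s * s * s)));
}

// Backward gelu_tanh as given in formula (1).
double gelu_tanh_bwd(double s, double dd) {
    const double k = std::sqrt(2.0 / std::acos(-1.0)); // sqrt(2 / pi)
    const double t = std::tanh(k * (s + 0.044715 * s * s * s));
    return dd * 0.5 * (1.0 + t)
            * (1.0 + k * (s + 0.134145 * s * s * s) * (1.0 - t));
}

// Central finite difference of the forward formula (hypothetical helper,
// used only to cross-check the analytic derivative).
double gelu_tanh_fd(double s, double h = 1e-5) {
    assert(h > 0.0);
    return (gelu_tanh_fwd(s + h) - gelu_tanh_fwd(s - h)) / (2.0 * h);
}
```

With `dd = 1`, `gelu_tanh_bwd(s, 1.0)` and `gelu_tanh_fd(s)` should agree to
within the finite-difference error for moderate values of `s`.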

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.

### Backward

The backward propagation computes \diffsrc based on \diffdst and \src tensors.
However, some operations support a computation using \dst memory produced during
the forward propagation. Refer to the table above for a list of operations
supporting destination as input memory and the corresponding formulas.
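
To illustrate why the two backward variants are interchangeable, here is a
small standalone C++ sketch (not oneDNN code) that transcribes the elu and
logistic rows of the table. The function names are hypothetical. For elu, the
branch on `d > 0` matches the branch on `s > 0` only when
\f$\alpha \geq 0\f$, which is the restriction stated in (2).

```cpp
#include <cassert>
#include <cmath>

// elu, transcribed from the table.
float elu_fwd(float s, float alpha) {
    return s > 0.f ? s : alpha * (std::exp(s) - 1.f);
}
float elu_bwd_from_src(float dd, float s, float alpha) {
    return s > 0.f ? dd : dd * alpha * std::exp(s);
}
float elu_bwd_from_dst(float dd, float d, float alpha) {
    assert(alpha >= 0.f); // see (2): sign of d must match sign of s
    // For s <= 0, d = alpha * (e^s - 1), hence alpha * e^s = d + alpha.
    return d > 0.f ? dd : dd * (d + alpha);
}

// logistic, transcribed from the table.
float logistic_fwd(float s) {
    return 1.f / (1.f + std::exp(-s));
}
float logistic_bwd_from_src(float dd, float s) {
    const float v = logistic_fwd(s);
    return dd * v * (1.f - v);
}
float logistic_bwd_from_dst(float dd, float d) {
    return dd * d * (1.f - d);
}
```

Feeding `elu_fwd(s, alpha)` into `elu_bwd_from_dst` should give the same value
as `elu_bwd_from_src` on `s`, up to floating-point rounding.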

#### Exceptions
The eltwise primitive with algorithm round does not support backward
propagation.

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output      | Execution argument index                                                  |
| ---                         | ---                                                                       |
| \src                        | DNNL_ARG_SRC                                                              |
| \dst                        | DNNL_ARG_DST                                                              |
| \diffsrc                    | DNNL_ARG_DIFF_SRC                                                         |
| \diffdst                    | DNNL_ARG_DIFF_DST                                                         |
| \f$\text{binary post-op}\f$ | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |

## Implementation Details

### General Notes

1. All eltwise primitives have a common initialization function (e.g.,
   dnnl::eltwise_forward::desc::desc()) that takes both parameters
   \f$\alpha\f$ and \f$\beta\f$. A parameter is ignored if the selected
   algorithm does not use it.

2. The memory format and data type for \src and \dst are assumed to be the
   same, and in the API are typically referred to as `data` (e.g., see
   `data_desc` in dnnl::eltwise_forward::desc::desc()). The same holds for
   \diffsrc and \diffdst. The corresponding memory descriptors are referred
   to as `diff_data_desc`.

3. Both forward and backward propagation support in-place operations, meaning
   that \src can be used as input and output for forward propagation, and
   \diffdst can be used as input and output for backward propagation. In case of
   an in-place operation, the original data will be overwritten. Note, however,
   that some algorithms for backward propagation require the original \src,
   hence the corresponding forward propagation should not be performed in-place
   for those algorithms. Algorithms that use \dst for backward propagation can
   be safely done in-place.

4. For some operations, it might be beneficial to compute backward
   propagation based on \f$\dst(\overline{s})\f$, rather than on
   \f$\src(\overline{s})\f$, for improved performance.

5. For logsigmoid, the original formula
   \f$ d = \log_{e}(\frac{1}{1+e^{-s}})\f$ was replaced by
   \f$ d = -soft\_relu(-s)\f$ for numerical stability.

@note For operations supporting destination memory as input, \dst can be
used instead of \src when backward propagation is computed. This enables
several performance optimizations (see the tips below).
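
The numerical-stability remark about logsigmoid can be illustrated with a
standalone C++ sketch (not oneDNN code). The function names are hypothetical,
and the overflow-safe formulation of soft_relu used here,
\f$\max(x, 0) + \log_{e}(1 + e^{-|x|})\f$, is an assumption of this sketch
rather than a statement about the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Overflow-safe soft_relu: log(1 + e^x) = max(x, 0) + log1p(e^{-|x|}).
// This particular formulation is an assumption of the sketch.
double soft_relu(double x) {
    return std::fmax(x, 0.0) + std::log1p(std::exp(-std::fabs(x)));
}

// Direct transcription of d = log(1 / (1 + e^{-s})).
double logsigmoid_naive(double s) {
    return std::log(1.0 / (1.0 + std::exp(-s)));
}

// Rewritten form d = -soft_relu(-s).
double logsigmoid_stable(double s) {
    return -soft_relu(-s);
}

// For a large negative input, e^{-s} overflows in the naive form, so the
// result degenerates to -infinity, while the rewritten form stays finite.
void logsigmoid_demo() {
    assert(std::isinf(logsigmoid_naive(-800.0)));
    assert(std::fabs(logsigmoid_stable(-800.0) + 800.0) < 1e-9);
}
```

For moderate inputs both forms agree up to rounding; they differ only where
the exponential overflows.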

### Data Type Support

The eltwise primitive supports the following combinations of data types:

| Propagation        | Source / Destination | Intermediate data type
| :--                | :--                  | :--
| forward / backward | f32, bf16            | f32
| forward            | f16                  | f16
| forward            | s32 / s8 / u8        | f32

@warning
    There might be hardware- and/or implementation-specific restrictions.
    Check the [Implementation Limitations](@ref dg_eltwise_impl_limits) section
    below.

Here the intermediate data type means that the values coming in are first
converted to the intermediate data type, then the operation is applied, and
finally the result is converted to the output data type.
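
The three steps above can be sketched for s8 data with an f32 intermediate
type. This is a standalone C++ illustration using the linear algorithm
(\f$ d = \alpha s + \beta \f$), not oneDNN code. The function name is
hypothetical, and the rounding and saturation applied when converting back to
s8 are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// linear (d = alpha * s + beta) on s8 data with an f32 intermediate type.
// Assumption: the final conversion rounds to the nearest integer (current
// floating-point rounding mode) and saturates to the s8 range.
std::int8_t linear_s8(std::int8_t s, float alpha, float beta) {
    assert(std::isfinite(alpha) && std::isfinite(beta));
    const float x = static_cast<float>(s); // 1. convert s8 -> f32
    const float y = alpha * x + beta;      // 2. apply the operation in f32
    const float r = std::nearbyint(y);     // 3. round ...
    const float c = std::min(127.f, std::max(-128.f, r)); // ... saturate ...
    return static_cast<std::int8_t>(c);    // ... and convert f32 -> s8
}
```

Under these assumptions, `linear_s8(100, 2.f, 0.f)` computes 200 in f32 and
saturates to 127 on the way back to s8.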

### Data Representation

The eltwise primitive works with arbitrary data tensors. There is no special
meaning associated with any logical dimensions.
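
As a minimal reference sketch of this property (standalone C++, not oneDNN
code), the function below applies an operation to every element of a tensor
stored in a flat buffer. The function name is hypothetical, and the sketch
assumes a dense buffer without padding. The dimensions are used only to compute
the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Applies op to every element of a dense tensor stored in a flat buffer.
// dims contributes only the total number of elements; no dimension is
// treated specially.
std::vector<float> eltwise_forward_ref(const std::vector<float> &src,
        const std::vector<std::size_t> &dims,
        const std::function<float(float)> &op) {
    std::size_t nelems = 1;
    for (std::size_t d : dims)
        nelems *= d;
    assert(nelems == src.size()); // dense buffer, no padding (assumption)

    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = op(src[i]);
    return dst;
}
```

Interpreting the same six-element buffer as a 2x3 tensor or as a 1D tensor of
length 6 gives identical output values.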

### Post-ops and Attributes

| Propagation | Type    | Operation                                    | Description                                            | Restrictions                        |
| :--         | :--     | :--                                          | :--                                                    | :--                                 |
| Forward     | Post-op | [Binary](@ref dnnl::post_ops::append_binary) | Applies a @ref dnnl_api_binary operation to the result | General binary post-op restrictions |

@anchor dg_eltwise_impl_limits
## Implementation Limitations

1. Refer to @ref dev_guide_data_types for
   limitations related to data type support.

## Performance Tips

1. For backward propagation, use the same memory format for \src, \diffdst,
   and \diffsrc (the formats of \diffdst and \diffsrc are always the
   same because of the API). Different formats are functionally supported but
   lead to highly suboptimal performance.

2. Use in-place operations whenever possible (see caveats in General Notes).

3. As mentioned above, for all operations supporting destination memory as
   input, one can use the \dst tensor instead of \src. This enables the
   following potential optimizations for training:

    - Such operations can be safely done in-place.

    - Moreover, such operations can be fused as a
      [post-op](@ref dev_guide_attributes) with the previous operation if that
      operation does not require its \dst to compute the backward
      propagation (e.g., if the convolution operation satisfies these
      conditions).
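
The in-place tip can be illustrated for relu with a standalone C++ sketch (not
oneDNN code; the function names are hypothetical). The forward pass overwrites
\src with \dst, and the backward pass overwrites \diffdst with \diffsrc while
reading only \dst, following the "from dst" formula in the operations table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place relu forward: buf holds src on entry and dst on return.
void relu_fwd_inplace(std::vector<float> &buf, float alpha) {
    for (float &v : buf)
        v = v > 0.f ? v : alpha * v;
}

// In-place relu backward from dst: diff holds diff_dst on entry and
// diff_src on return. The original src is not needed.
void relu_bwd_from_dst_inplace(std::vector<float> &diff,
        const std::vector<float> &dst, float alpha) {
    assert(alpha >= 0.f); // required for the dst-based variant, see (2)
    assert(diff.size() == dst.size());
    for (std::size_t i = 0; i < diff.size(); ++i)
        diff[i] = dst[i] > 0.f ? diff[i] : alpha * diff[i];
}
```

Because the backward pass reads only the overwritten buffer, no copy of the
original \src has to be kept between the forward and backward passes.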

## Examples

| Engine  | Name                     | Comments
| :--     | :--                      | :--
| CPU/GPU | @ref eltwise_example_cpp | @copydetails eltwise_example_cpp_short