Reduction {#dev_guide_reduction}
============================
>
> [API Reference](@ref dnnl_api_reduction)
>

## General

The reduction primitive performs a reduction operation on arbitrary data. Each
element in the destination is the result of applying the specified reduction
algorithm along one or more source tensor dimensions:

\f[
    \dst(f) = \mathop{reduce\_op}\limits_{r}\src(r),
\f]

where \f$reduce\_op\f$ can be max, min, sum, mul, mean, Lp-norm, or
Lp-norm-power-p, \f$f\f$ is an index in an idle dimension, and \f$r\f$ is an
index in a reduction dimension.

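As an illustration of the formula above, the following is a minimal reference
sketch in plain C++ for a dense, row-major 2D tensor. It does not use the
oneDNN API; the names `ReduceOp` and `reduce_2d` are hypothetical and exist
only for this example. With `axis == 1`, the call
`reduce_2d(src, 2, 3, 1, ReduceOp::Sum)` reduces each row of a 2x3 tensor to a
single value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical enum for this sketch; not a oneDNN identifier.
enum class ReduceOp { Max, Min, Sum, Mul };

// Reduces a row-major rows x cols tensor along `axis`.
// axis == 0: r runs over rows, f over columns (logical result 1 x cols).
// axis == 1: r runs over columns, f over rows (logical result rows x 1).
std::vector<float> reduce_2d(const std::vector<float> &src, std::size_t rows,
        std::size_t cols, int axis, ReduceOp op) {
    assert(rows > 0 && cols > 0 && src.size() == rows * cols);
    assert(axis == 0 || axis == 1);

    const std::size_t idle = (axis == 0) ? cols : rows; // range of f
    const std::size_t red = (axis == 0) ? rows : cols; // range of r

    std::vector<float> dst(idle);
    for (std::size_t f = 0; f < idle; ++f) {
        // Source element at reduction index r for the fixed idle index f.
        auto at = [&](std::size_t r) {
            return (axis == 0) ? src[r * cols + f] : src[f * cols + r];
        };
        float acc = at(0);
        for (std::size_t r = 1; r < red; ++r) {
            const float v = at(r);
            switch (op) {
                case ReduceOp::Max: acc = std::max(acc, v); break;
                case ReduceOp::Min: acc = std::min(acc, v); break;
                case ReduceOp::Sum: acc += v; break;
                case ReduceOp::Mul: acc *= v; break;
            }
        }
        dst[f] = acc;
    }
    return dst;
}
```
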
Mean:

\f[
    \dst(f) = \frac{\sum\limits_{r}\src(r)} {R},
\f]

where \f$R\f$ is the size of the reduction dimension (the product of the sizes
when several dimensions are reduced).

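As a small worked example (the values are made up for illustration), take a
2x3 source tensor with rows \f$(1, 2, 3)\f$ and \f$(4, 5, 6)\f$ and reduce the
second dimension, so that \f$R = 3\f$ and the destination has shape 2x1:

\f[
    \dst(0) = \frac{1 + 2 + 3}{3} = 2, \qquad
    \dst(1) = \frac{4 + 5 + 6}{3} = 5.
\f]
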
Lp-norm:

\f[
    \dst(f) = \root p \of {\mathop{eps\_op}(\sum\limits_{r}|\src(r)|^p, eps)},
\f]

where \f$eps\_op\f$ can be max or sum.

Lp-norm-power-p:

\f[
    \dst(f) = \mathop{eps\_op}(\sum\limits_{r}|\src(r)|^p, eps),
\f]

where \f$eps\_op\f$ can be max or sum.

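The two Lp formulas above can be sketched in plain C++ for a single slice of
reduced values. This is only an illustration of the arithmetic, not the
library implementation; `EpsOp`, `lp_norm_power_p`, and `lp_norm` are
hypothetical names, and the sketch assumes \f$p > 0\f$. For example, for the
values \f$(3, -4)\f$ with \f$p = 2\f$ and \f$eps = 0\f$, the power-p result is
25 and the norm is 5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical enum for this sketch; not a oneDNN identifier.
enum class EpsOp { Max, Sum };

// Lp-norm-power-p: eps_op(sum_r |src(r)|^p, eps).
double lp_norm_power_p(
        const std::vector<double> &src, double p, double eps, EpsOp op) {
    assert(p > 0.0); // assumption of this sketch
    double acc = 0.0;
    for (double v : src)
        acc += std::pow(std::abs(v), p);
    // eps_op either clamps the accumulated value from below or adds eps.
    return (op == EpsOp::Max) ? std::max(acc, eps) : acc + eps;
}

// Lp-norm: the p-th root of the Lp-norm-power-p value.
double lp_norm(const std::vector<double> &src, double p, double eps, EpsOp op) {
    return std::pow(lp_norm_power_p(src, p, eps, op), 1.0 / p);
}
```
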
### Notes
 * The reduction primitive requires the source and destination tensors to have
   the same number of dimensions.
 * Reduction dimensions have size 1 in the destination tensor.
 * The reduction primitive has no notion of forward or backward
   propagation.

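The shape rule from the notes above can be sketched as a small helper that
derives the destination dimensions from the source dimensions. The helper name
`reduction_dst_dims` and the way reduced axes are passed are assumptions made
for this example; they are not part of the oneDNN API. For a source of shape
2x3x4x5 reduced over axes 1 and 3, the helper yields 2x1x4x1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns destination dimensions for a reduction over
// the given axes of a tensor with dimensions `src_dims`.
std::vector<std::int64_t> reduction_dst_dims(
        const std::vector<std::int64_t> &src_dims,
        const std::vector<std::size_t> &reduced_axes) {
    // Same number of dimensions as the source.
    std::vector<std::int64_t> dst_dims = src_dims;
    for (std::size_t axis : reduced_axes) {
        assert(axis < src_dims.size());
        dst_dims[axis] = 1; // a reduced dimension has size 1 in the destination
    }
    return dst_dims;
}
```
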
## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output      | Execution argument index |
| ---                         | ---                      |
| \src                        | DNNL_ARG_SRC             |
| \dst                        | DNNL_ARG_DST             |
| \f$\text{binary post-op}\f$ | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |

## Implementation Details

### General Notes
 * The \dst memory format can be specified either explicitly or as
   #dnnl::memory::format_tag::any (recommended), in which case the primitive
   will derive the most appropriate memory format based on the format of the
   source tensor.

### Post-ops and Attributes

The following attributes are supported:

| Type    | Operation                                      | Description                                                                    | Restrictions                        |
| :--     | :--                                            | :--                                                                            | :--                                 |
| Post-op | [Sum](@ref dnnl::post_ops::append_sum)         | Adds the operation result to the destination tensor instead of overwriting it. |                                     |
| Post-op | [Eltwise](@ref dnnl::post_ops::append_eltwise) | Applies an @ref dnnl_api_eltwise operation to the result.                      |                                     |
| Post-op | [Binary](@ref dnnl::post_ops::append_binary)   | Applies a @ref dnnl_api_binary operation to the result.                        | General binary post-op restrictions |

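To make the effect of the post-ops in the table above concrete, the sketch
below applies a chain of post-ops to one reduced value in plain C++. It only
models the semantics and does not call oneDNN. The names `PostOp` and
`apply_post_ops` are hypothetical, and the sketch assumes a sum post-op with
scale 1, ReLU with zero negative slope as the eltwise operation, and addition
as the binary operation. Post-ops are applied in the order in which they
appear in the chain.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical enum for this sketch; not a oneDNN identifier.
enum class PostOp { Sum, Relu, BinaryAdd };

// result:      value produced by the reduction for one destination element
// prev_dst:    value already stored in the destination (used by Sum)
// binary_src1: element of the extra tensor supplied to the binary post-op
float apply_post_ops(float result, float prev_dst, float binary_src1,
        const std::vector<PostOp> &chain) {
    float acc = result;
    for (PostOp op : chain) {
        switch (op) {
            // Sum: accumulate into the destination instead of overwriting it.
            case PostOp::Sum: acc += prev_dst; break;
            // Eltwise (assumed ReLU): clamp negative values to zero.
            case PostOp::Relu: acc = std::max(acc, 0.0f); break;
            // Binary (assumed add): combine with the extra source tensor.
            case PostOp::BinaryAdd: acc += binary_src1; break;
        }
    }
    return acc;
}
```
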
### Data Types Support

The source and destination tensors may have `f32`, `bf16`, or `int8` data types.
See the @ref dev_guide_data_types page for more details.

### Data Representation

#### Sources, Destination

The reduction primitive works with arbitrary data tensors. There is no special
meaning associated with any of the dimensions of a tensor.

## Implementation Limitations

1. Refer to @ref dev_guide_data_types for limitations related to data types
   support.

## Performance Tips

1. Whenever possible, avoid specifying different memory formats for source
   and destination tensors.

## Examples

| Engine  | Name                       | Comments
| :--     | :--                        | :--
| CPU/GPU | @ref reduction_example_cpp | @copydetails reduction_example_cpp_short
