Shuffle {#dev_guide_shuffle}
============================

>
> [API Reference](@ref dnnl_api_shuffle)
>

## General

The shuffle primitive shuffles data along the shuffle axis (denoted here as
\f$C\f$) with the group parameter \f$G\f$. Namely, the shuffle axis is viewed
as a 2D tensor of size \f$(\frac{C}{G} \times G)\f$ that is transposed to
\f$(G \times \frac{C}{G})\f$. Variable names follow the standard
@ref dev_guide_conventions.
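The transpose view can be made concrete by listing which source channel ends up at each destination position. The sketch below is illustrative only. The function name `shuffle_channel_order` is hypothetical and is not part of the oneDNN API. It assumes that \f$C\f$ is divisible by \f$G\f$.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not a oneDNN API): returns the source channel that
// lands at each destination position after the shuffle.
std::vector<int> shuffle_channel_order(int C, int G) {
    assert(G > 0 && C % G == 0 && "C must be divisible by G");
    const int rows = C / G;
    // Source view: channel k sits at row k / G, column k % G of a
    // (C/G x G) row-major matrix. Transposing gives a (G x C/G) matrix;
    // reading it row by row yields the destination channel order.
    std::vector<int> order;
    order.reserve(C);
    for (int v = 0; v < G; ++v)          // rows of the transposed matrix
        for (int u = 0; u < rows; ++u)   // columns of the transposed matrix
            order.push_back(u * G + v);  // element (u, v) of the source view
    return order;
}
```

For example, `shuffle_channel_order(6, 2)` views channels `0..5` as the rows `{0, 1}`, `{2, 3}`, `{4, 5}` and returns `{0, 2, 4, 1, 3, 5}`.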

The formal definition is shown below:

### Forward

\f[
    \dst(\overline{ou}, c, \overline{in}) =
    \src(\overline{ou}, c', \overline{in})
\f]

where

- the \f$c\f$ dimension is called the shuffle axis,
- \f$G\f$ is a `group_size`,
- \f$\overline{ou}\f$ are the outermost indices (to the left of the shuffle axis),
- \f$\overline{in}\f$ are the innermost indices (to the right of the shuffle axis), and
- \f$c'\f$ and \f$c\f$ relate to each other as defined by the system:

\f[
    \begin{cases}
        c  &= u + v\frac{C}{G}, \\
        c' &= uG + v,
    \end{cases}
\f]

Here, \f$0 \leq u < \frac{C}{G}\f$ and \f$0 \leq v < G\f$.
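The forward formula can be checked with a naive reference loop. The sketch below is a minimal illustration and not the library's implementation. It makes three assumptions. First, the tensor is dense and laid out as `[OU][C][IN]`, where `OU` is the product of the dimensions to the left of the shuffle axis and `IN` is the product of those to the right. Second, \f$C\f$ is divisible by \f$G\f$. Third, the function name `shuffle_forward_ref` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference (not a oneDNN API): dst(ou, c, in) = src(ou, c', in)
// with c = u + v * (C/G) and c' = u * G + v, on a dense [OU][C][IN] buffer.
std::vector<float> shuffle_forward_ref(const std::vector<float> &src,
        int OU, int C, int IN, int G) {
    assert(G > 0 && C % G == 0 && "C must be divisible by G");
    assert(src.size() == static_cast<std::size_t>(OU) * C * IN);
    const int CG = C / G;
    std::vector<float> dst(src.size());
    for (int ou = 0; ou < OU; ++ou)
        for (int u = 0; u < CG; ++u)          // 0 <= u < C/G
            for (int v = 0; v < G; ++v) {     // 0 <= v < G
                const int c = u + v * CG;     // destination channel
                const int c_src = u * G + v;  // c' in the formula
                for (int in = 0; in < IN; ++in)
                    dst[(static_cast<std::size_t>(ou) * C + c) * IN + in]
                            = src[(static_cast<std::size_t>(ou) * C + c_src)
                                    * IN + in];
            }
    return dst;
}
```

Iterating over \f$(u, v)\f$ rather than over \f$c\f$ follows the system above directly. Each pair \f$(u, v)\f$ defines exactly one destination channel and one source channel.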

#### Difference Between Forward Training and Forward Inference

There is no difference between the #dnnl_forward_training
and #dnnl_forward_inference propagation kinds.

### Backward

The backward propagation computes
\f$\diffsrc(\overline{ou}, c, \overline{in})\f$,
based on
\f$\diffdst(\overline{ou}, c, \overline{in})\f$.

Essentially, backward propagation is the same as forward propagation with
\f$G\f$ replaced by \f$\frac{C}{G}\f$. Undoing the transpose of a
\f$(\frac{C}{G} \times G)\f$ matrix means transposing the resulting
\f$(G \times \frac{C}{G})\f$ matrix, which is a shuffle with group
\f$\frac{C}{G}\f$.
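The claim that backward is forward with \f$G\f$ replaced by \f$\frac{C}{G}\f$ can be checked on a single channel vector. The sketch below makes two assumptions: a 1D buffer that holds only the shuffle axis, and \f$C\f$ divisible by \f$G\f$. The names `shuffle_channels` and `shuffle_channels_backward` are hypothetical and are not oneDNN functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: forward mapping y(c) = x(c') on the shuffle axis only,
// with c = u + v * (C/G) and c' = u * G + v.
std::vector<float> shuffle_channels(const std::vector<float> &x, int G) {
    const int C = static_cast<int>(x.size());
    assert(G > 0 && C % G == 0 && "C must be divisible by G");
    const int CG = C / G;
    std::vector<float> y(C);
    for (int u = 0; u < CG; ++u)
        for (int v = 0; v < G; ++v)
            y[u + v * CG] = x[u * G + v];
    return y;
}

// Hypothetical backward: diff_src(c') = diff_dst(c) is the inverse
// permutation, obtained here by reusing the forward mapping with group C/G.
std::vector<float> shuffle_channels_backward(
        const std::vector<float> &diff_dst, int G) {
    const int C = static_cast<int>(diff_dst.size());
    assert(G > 0 && C % G == 0 && "C must be divisible by G");
    return shuffle_channels(diff_dst, C / G);
}
```

Applying the backward mapping to the output of the forward mapping with the same \f$G\f$ restores the original channel order, which is what the gradient of a pure permutation requires.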

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output | Execution argument index |
| ---                    | ---                      |
| \src                   | DNNL_ARG_SRC             |
| \dst                   | DNNL_ARG_DST             |
| \diffsrc               | DNNL_ARG_DIFF_SRC        |
| \diffdst               | DNNL_ARG_DIFF_DST        |

## Implementation Details

### General Notes

1. The memory format and data type for `src` and `dst` are assumed to be the
   same, and in the API they are typically referred to as `data` (e.g., see
   `data_desc` in dnnl::shuffle_forward::desc::desc()). The same holds for
   `diff_src` and `diff_dst`, whose corresponding memory descriptors are
   referred to as `diff_data_desc`.

## Data Types

The shuffle primitive supports the following combinations of data types:

| Propagation        | Source / Destination
| :--                | :--
| forward / backward | f32, bf16
| forward            | s32, s8, u8

@warning
    There might be hardware- and/or implementation-specific restrictions.
    Check the [Implementation Limitations](@ref dg_shuffle_impl_limits) section
    below.

## Data Layouts

The shuffle primitive works with arbitrary data tensors. There is no special
meaning associated with any logical dimensions. However, the shuffle axis is
typically referred to as channels (hence the formulas use \f$c\f$).

Shuffle operations typically appear in CNN topologies. Hence, in the library the
shuffle primitive is optimized for the corresponding memory formats:

| Spatial | Logical tensor | Shuffle Axis | Implementations optimized for memory formats                       |
| :--     | :--            | :--          | :--                                                                |
| 2D      | NCHW           | 1 (C)        | #dnnl_nchw (#dnnl_abcd), #dnnl_nhwc (#dnnl_acdb), *optimized^*     |
| 3D      | NCDHW          | 1 (C)        | #dnnl_ncdhw (#dnnl_abcde), #dnnl_ndhwc (#dnnl_acdeb), *optimized^* |

Here *optimized^* means the format that
[comes out](@ref memory_format_propagation_cpp)
of any preceding compute-intensive primitive.
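The logical operation does not depend on the memory format. Only the mapping from logical indices \f$(n, c, h, w)\f$ to buffer offsets changes. The sketch below shows this for the two plain 2D formats in the table, #dnnl_nchw and #dnnl_nhwc. It is illustrative only and does not reflect how the library's optimized kernels work. The names `off_nchw`, `off_nhwc`, and `shuffle_2d` are hypothetical, and \f$C\f$ is assumed divisible by \f$G\f$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical offset functions for logical element (n, c, h, w).
std::size_t off_nchw(int n, int c, int h, int w, int C, int H, int W) {
    return ((static_cast<std::size_t>(n) * C + c) * H + h) * W + w;
}
std::size_t off_nhwc(int n, int c, int h, int w, int C, int H, int W) {
    return ((static_cast<std::size_t>(n) * H + h) * W + w) * C + c;
}

// Hypothetical shuffle along axis 1 (C) for a buffer in either format.
// The logical mapping dst(c) = src(c') is identical; only `off` changes.
template <typename OffsetFn>
std::vector<float> shuffle_2d(const std::vector<float> &src, int N, int C,
        int H, int W, int G, OffsetFn off) {
    assert(G > 0 && C % G == 0 && "C must be divisible by G");
    assert(src.size() == static_cast<std::size_t>(N) * C * H * W);
    const int CG = C / G;
    std::vector<float> dst(src.size());
    for (int n = 0; n < N; ++n)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
                for (int u = 0; u < CG; ++u)
                    for (int v = 0; v < G; ++v)
                        dst[off(n, u + v * CG, h, w, C, H, W)]
                                = src[off(n, u * G + v, h, w, C, H, W)];
    return dst;
}
```

In nchw, each channel being moved is a contiguous \f$H \times W\f$ plane. In nhwc, the \f$C\f$ values of one pixel are contiguous, so the shuffle permutes elements within each pixel's channel block.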

### Post-ops and Attributes

The shuffle primitive does not support any post-ops or attributes.


@anchor dg_shuffle_impl_limits
## Implementation Limitations

1. Refer to @ref dev_guide_data_types for limitations related to data type
   support.


## Performance Tips

N/A

## Examples

| Engine  | Name                     | Comments
| :--     | :--                      | :--
| CPU/GPU | @ref shuffle_example_cpp | @copydetails shuffle_example_cpp_short
132