Inner Product {#dev_guide_inner_product}
========================================

>
> [API Reference](@ref dnnl_api_inner_product)
>

## General

The inner product primitive (sometimes called fully connected) treats each
activation in the minibatch as a vector and computes its product with a 2D
weights tensor, producing a 2D output tensor.

### Forward

More precisely, let \src, \weights, \bias and \dst be \f$N \times IC\f$,
\f$OC \times IC\f$, \f$OC\f$, and \f$N \times OC\f$ tensors, respectively
(variable names follow the standard @ref dev_guide_conventions). Then:

\f[\dst(n, oc) = \bias(oc) + \sum_{ic=0}^{IC-1} \src(n, ic) \cdot \weights(oc, ic)\f]

In cases where the \src and \weights tensors have spatial dimensions, they are
flattened to 2D. For example, if they are 4D
\f$N \times IC' \times IH \times IW\f$ and
\f$OC \times IC' \times KH \times KW\f$ tensors, then the formula above is
applied with \f$IC = IC' \cdot IH \cdot IW\f$. In such cases, the \src and
\weights tensors must have equal spatial dimensions (e.g. \f$KH = IH\f$ and
\f$KW = IW\f$ for 4D tensors).
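
As a concrete illustration, the following sketch creates this forward flavor
with 4D \src and \weights using the C++ API. The sizes (\f$N = 32\f$,
\f$IC' = 256\f$, \f$IH = IW = 7\f$, \f$OC = 1000\f$) are hypothetical, chosen
only for the example, and the calls assume the oneDNN v2.x `dnnl.hpp`
interface:

~~~cpp
#include "dnnl.hpp"
using namespace dnnl;

engine eng(engine::kind::cpu, 0);

// Hypothetical sizes: N = 32, IC' = 256, IH = IW = 7, OC = 1000.
// The weights spatial dimensions must equal the src spatial dimensions
// (KH = IH, KW = IW); the library flattens IC = IC' * IH * IW.
memory::desc src_md({32, 256, 7, 7}, memory::data_type::f32,
        memory::format_tag::nchw);
memory::desc weights_md({1000, 256, 7, 7}, memory::data_type::f32,
        memory::format_tag::oihw);
memory::desc bias_md({1000}, memory::data_type::f32, memory::format_tag::a);
memory::desc dst_md({32, 1000}, memory::data_type::f32,
        memory::format_tag::nc);

auto ip_fwd_d = inner_product_forward::desc(prop_kind::forward_inference,
        src_md, weights_md, bias_md, dst_md);
auto ip_fwd_pd = inner_product_forward::primitive_desc(ip_fwd_d, eng);
~~~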

#### Difference Between Forward Training and Forward Inference

There is no difference between the @ref dnnl::prop_kind::forward_training
and @ref dnnl::prop_kind::forward_inference propagation kinds.

### Backward

The backward propagation computes \diffsrc based on \diffdst and
\weights.

The weights update computes \diffweights and \diffbias based on
\diffdst and \src.

@note The *optimized* memory formats of \src and \weights might be
different on forward propagation, backward propagation, and weights
update.
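
As a sketch of the backward data flow (again assuming the v2.x C++ API and
reusing `src_md`, `weights_md`, `dst_md`, `eng`, and `ip_fwd_pd` from the
forward example above, since the diff tensors share those shapes):

~~~cpp
// Sketch only: diff_src and diff_dst reuse the shapes of src and dst.
auto ip_bwd_d = inner_product_backward_data::desc(
        src_md /* diff_src */, weights_md, dst_md /* diff_dst */);

// The forward primitive descriptor is passed as a hint so the library can
// choose memory formats consistent with the forward pass.
auto ip_bwd_pd = inner_product_backward_data::primitive_desc(
        ip_bwd_d, eng, ip_fwd_pd);
~~~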

## Execution Arguments

When executed, the inputs and outputs should be mapped to an execution
argument index as specified by the following table.

| Primitive input/output      | Execution argument index                                                  |
| ---                         | ---                                                                       |
| \src                        | DNNL_ARG_SRC                                                              |
| \weights                    | DNNL_ARG_WEIGHTS                                                          |
| \bias                       | DNNL_ARG_BIAS                                                             |
| \dst                        | DNNL_ARG_DST                                                              |
| \diffsrc                    | DNNL_ARG_DIFF_SRC                                                         |
| \diffweights                | DNNL_ARG_DIFF_WEIGHTS                                                     |
| \diffbias                   | DNNL_ARG_DIFF_BIAS                                                        |
| \diffdst                    | DNNL_ARG_DIFF_DST                                                         |
| \f$\text{binary post-op}\f$ | DNNL_ARG_ATTR_MULTIPLE_POST_OP(binary_post_op_position) \| DNNL_ARG_SRC_1 |
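
For example, a minimal sketch of the execution call (v2.x C++ API, reusing
`eng` and `ip_fwd_pd` from the forward example above) maps each memory object
to its argument index from the table:

~~~cpp
// Allocate memory objects with the exact formats the primitive expects.
memory src_mem(ip_fwd_pd.src_desc(), eng);
memory weights_mem(ip_fwd_pd.weights_desc(), eng);
memory bias_mem(ip_fwd_pd.bias_desc(), eng);
memory dst_mem(ip_fwd_pd.dst_desc(), eng);

stream s(eng);
inner_product_forward(ip_fwd_pd).execute(s,
        {{DNNL_ARG_SRC, src_mem}, {DNNL_ARG_WEIGHTS, weights_mem},
                {DNNL_ARG_BIAS, bias_mem}, {DNNL_ARG_DST, dst_mem}});
s.wait();
~~~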


## Implementation Details

### General Notes

N/A.

### Data Types

The inner product primitive supports the following combinations of data types
for source, destination, weights, and bias:

| Propagation        | Source    | Weights   | Destination      | Bias             |
| :--                | :--       | :--       | :--              | :--              |
| forward / backward | f32       | f32       | f32              | f32              |
| forward            | f16       | f16       | f16              | f16              |
| forward            | u8, s8    | s8        | u8, s8, s32, f32 | u8, s8, s32, f32 |
| forward            | bf16      | bf16      | f32, bf16        | f32, bf16        |
| backward           | f32, bf16 | bf16      | bf16             |                  |
| weights update     | bf16      | f32, bf16 | bf16             | f32, bf16        |
### Data Representation

Like other CNN primitives, the inner product primitive expects the following
tensors:

| Spatial | Source                                      | Destination      | Weights
| :--     | :--                                         | :--              | :--
| 1D      | \f$N \times C \times W\f$                   | \f$N \times C\f$ | \f$OC \times IC \times KW\f$
| 2D      | \f$N \times C \times H \times W\f$          | \f$N \times C\f$ | \f$OC \times IC \times KH \times KW\f$
| 3D      | \f$N \times C \times D \times H \times W\f$ | \f$N \times C\f$ | \f$OC \times IC \times KD \times KH \times KW\f$

The memory format of data and weights memory objects is critical for inner
product primitive performance. In the oneDNN programming model, the inner
product primitive is one of the few primitives that support the placeholder
format #dnnl::memory::format_tag::any (shortened to `any` from
now on) and can define data and weights memory object formats based on the
primitive parameters. When using `any`, it is necessary to first create an
inner product primitive descriptor and then query it for the actual data and
weights memory object formats.
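
A minimal sketch of this query flow (v2.x C++ API, with the same hypothetical
sizes as in the forward example above):

~~~cpp
// Request `any` so the library is free to pick optimized formats.
memory::desc src_any_md({32, 256, 7, 7}, memory::data_type::f32,
        memory::format_tag::any);
memory::desc weights_any_md({1000, 256, 7, 7}, memory::data_type::f32,
        memory::format_tag::any);
memory::desc dst_any_md({32, 1000}, memory::data_type::f32,
        memory::format_tag::any);

auto any_d = inner_product_forward::desc(prop_kind::forward_inference,
        src_any_md, weights_any_md, dst_any_md);
auto any_pd = inner_product_forward::primitive_desc(any_d, eng);

// Query the chosen (possibly blocked) formats; user data may need to be
// reordered into these descriptors before execution.
memory::desc chosen_src_md = any_pd.src_desc();
memory::desc chosen_weights_md = any_pd.weights_desc();
~~~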

The table below shows the **plain** memory format combinations for which the
inner product primitive is optimized. For the destination tensor (which is
always \f$N \times C\f$) the memory format is always
#dnnl::memory::format_tag::nc (#dnnl::memory::format_tag::ab).

| Spatial | Source / Weights logical tensor | Implementation optimized for memory formats
| :--     | :--                             | :--
| 0D      | NC / OI                         | #dnnl_nc (#dnnl_ab) / #dnnl_oi (#dnnl_ab)
| 0D      | NC / OI                         | #dnnl_nc (#dnnl_ab) / #dnnl_io (#dnnl_ba)
| 1D      | NCW / OIW                       | #dnnl_ncw (#dnnl_abc) / #dnnl_oiw (#dnnl_abc)
| 1D      | NCW / OIW                       | #dnnl_nwc (#dnnl_acb) / #dnnl_wio (#dnnl_cba)
| 2D      | NCHW / OIHW                     | #dnnl_nchw (#dnnl_abcd) / #dnnl_oihw (#dnnl_abcd)
| 2D      | NCHW / OIHW                     | #dnnl_nhwc (#dnnl_acdb) / #dnnl_hwio (#dnnl_cdba)
| 3D      | NCDHW / OIDHW                   | #dnnl_ncdhw (#dnnl_abcde) / #dnnl_oidhw (#dnnl_abcde)
| 3D      | NCDHW / OIDHW                   | #dnnl_ndhwc (#dnnl_acdeb) / #dnnl_dhwio (#dnnl_cdeba)

### Post-ops and Attributes

Post-ops and attributes enable you to modify the behavior of the inner product
primitive by chaining certain operations after the inner product operation.
The following attributes and post-ops are supported by inner product
primitives:

| Propagation | Type      | Operation                                                    | Description                                                                   | Restrictions                        |
| :--         | :--       | :--                                                          | :--                                                                           | :--                                 |
| forward     | attribute | [Output scale](@ref dnnl::primitive_attr::set_output_scales) | Scales the result of inner product by given scale factor(s)                   | int8 inner products only            |
| forward     | post-op   | [Eltwise](@ref dnnl::post_ops::append_eltwise)               | Applies an @ref dnnl_api_eltwise operation to the result                      |                                     |
| forward     | post-op   | [Sum](@ref dnnl::post_ops::append_sum)                       | Adds the operation result to the destination tensor instead of overwriting it |                                     |
| forward     | post-op   | [Binary](@ref dnnl::post_ops::append_binary)                 | Applies a @ref dnnl_api_binary operation to the result                        | General binary post-op restrictions |
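
For instance, a sketch of fusing a ReLU post-op (v2.x C++ API, reusing the
memory descriptors and `eng` from the forward example above):

~~~cpp
// Build the post-op chain: apply ReLU to the inner product result.
post_ops po;
po.append_eltwise(1.f /* scale */, algorithm::eltwise_relu,
        0.f /* alpha */, 0.f /* beta */);

primitive_attr attr;
attr.set_post_ops(po);

// Pass the attributes when creating the primitive descriptor.
auto relu_ip_d = inner_product_forward::desc(prop_kind::forward_inference,
        src_md, weights_md, bias_md, dst_md);
auto relu_ip_pd
        = inner_product_forward::primitive_desc(relu_ip_d, attr, eng);
~~~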

## Implementation Limitations

1. Check @ref dev_guide_data_types.


## Performance Tips

- Use #dnnl::memory::format_tag::any for source, weights,
  and destination memory format tags when creating an inner product primitive
  to allow the library to choose the most appropriate memory format.

## Examples

| Engine  | Name                           | Comments
| :--     | :--                            | :--
| CPU/GPU | @ref inner_product_example_cpp | @copydetails inner_product_example_cpp_short
