1# Introduction
2
3This document discusses the current state of filtering in libyuv. An emphasis on maximum performance while avoiding memory exceptions, and minimal amount of code/complexity.  See future work at end.
4
5# LibYuv Filter Subsampling
6
7There are 2 challenges with subsampling
8
91. centering of samples, which involves clamping on edges
102. clipping a source region
11
12Centering depends on scale factor and filter mode.
13
14# Down Sampling
15
16If scaling down, the stepping rate is always src_width / dst_width.
17
18    dx = src_width / dst_width;
19
20e.g. If scaling from 1280x720 to 640x360, the step thru the source will be 2.0, stepping over 2 pixels of source for each pixel of destination.
21
22Centering, depends on filter mode.
23
24*Point* downsampling takes the middle pixel.
25
26    x = dx >> 1;
27
28For odd scale factors (e.g. 3x down) this is exactly the middle.  For even scale factors, this rounds up and takes the pixel to the right of center.  e.g. scale of 4x down will take pixel 2.
29
30**Bilinear** filter, uses the 2x2 pixels in the middle.
31
32    x = dx / 2 - 0.5;
33
34For odd scale factors (e.g. 3x down) this is exactly the middle, and point sampling is used.
35For even scale factors, this evenly filters the middle 2x2 pixels.  e.g. 4x down will filter pixels 1,2 at 50% in both directions.
36
37**Box** filter averages the entire box so sampling starts at 0.
38
39    x = 0;
40
41For a scale factor of 2x down, this is equivalent to bilinear.
42
43# Up Sampling
44
45**Point** upsampling use stepping rate of src_width / dst_width and a starting coordinate of 0.
46
47    x = 0;
48    dx = src_width / dst_width;
49
50e.g. If scaling from 640x360 to 1280x720 the step thru the source will be 0.0, stepping half a pixel of source for each pixel of destination. Each pixel is replicated by the scale factor.
51
52**Bilinear** filter stretches such that the first pixel of source maps to the first pixel of destination, and the last pixel of source maps to the last pixel of destination.
53
54    x = 0;
55    dx = (src_width - 1) / (dst_width - 1);
56
57This method is not technically correct, and will likely change in the future.
58
59* It is inconsistent with the bilinear down sampler.  The same method could be used for down sampling, and then it would be more reversible, but that would prevent specialized 2x down sampling.
60* Although centered, the image is slightly magnified.
61* The filtering was changed in early 2013 - previously it used:
62
63        x = 0;
64        dx = (src_width - 1) / (dst_width - 1);
65
66Which is the correct scale factor, but shifted the image left, and extruded the last pixel.  The reason for the change was to remove the extruding code from the low level row functions, allowing 3 functions to sshare the same row functions - ARGBScale, I420Scale, and ARGBInterpolate.  Then the one function was ported to many cpu variations: SSE2, SSSE3, AVX2, Neon and 'Any' version for any number of pixels and alignment.  The function is also specialized for 0,25,50,75%.
67
68The above goes still has the potential to read the last pixel 100% and last pixel + 1 0%, which may cause a memory exception.  So the left pixel goes to a fraction less than the last pixel, but filters in the minimum amount of it, and the maximum of the last pixel.
69
70    dx = FixedDiv((src_width << 16) - 0x00010001, (dst << 16) - 0x00010000);
71
72**Box** filter for upsampling switches over to Bilinear.
73
74# Scale snippet:
75
76    #define CENTERSTART(dx, s) (dx < 0) ? -((-dx >> 1) + s) : ((dx >> 1) + s)
77    #define FIXEDDIV1(src, dst) FixedDiv((src << 16) - 0x00010001, \
78                                         (dst << 16) - 0x00010000);
79
80    // Compute slope values for stepping.
81    void ScaleSlope(int src_width, int src_height,
82                    int dst_width, int dst_height,
83                    FilterMode filtering,
84                    int* x, int* y, int* dx, int* dy) {
85      assert(x != NULL);
86      assert(y != NULL);
87      assert(dx != NULL);
88      assert(dy != NULL);
89      assert(src_width != 0);
90      assert(src_height != 0);
91      assert(dst_width > 0);
92      assert(dst_height > 0);
93      if (filtering == kFilterBox) {
94        // Scale step for point sampling duplicates all pixels equally.
95        *dx = FixedDiv(Abs(src_width), dst_width);
96        *dy = FixedDiv(src_height, dst_height);
97        *x = 0;
98        *y = 0;
99      } else if (filtering == kFilterBilinear) {
100        // Scale step for bilinear sampling renders last pixel once for upsample.
101        if (dst_width <= Abs(src_width)) {
102          *dx = FixedDiv(Abs(src_width), dst_width);
103          *x = CENTERSTART(*dx, -32768);
104        } else if (dst_width > 1) {
105          *dx = FIXEDDIV1(Abs(src_width), dst_width);
106          *x = 0;
107        }
108        if (dst_height <= src_height) {
109          *dy = FixedDiv(src_height,  dst_height);
110          *y = CENTERSTART(*dy, -32768);  // 32768 = -0.5 to center bilinear.
111        } else if (dst_height > 1) {
112          *dy = FIXEDDIV1(src_height, dst_height);
113          *y = 0;
114        }
115      } else if (filtering == kFilterLinear) {
116        // Scale step for bilinear sampling renders last pixel once for upsample.
117        if (dst_width <= Abs(src_width)) {
118          *dx = FixedDiv(Abs(src_width), dst_width);
119          *x = CENTERSTART(*dx, -32768);
120        } else if (dst_width > 1) {
121          *dx = FIXEDDIV1(Abs(src_width), dst_width);
122          *x = 0;
123        }
124        *dy = FixedDiv(src_height, dst_height);
125        *y = *dy >> 1;
126      } else {
127        // Scale step for point sampling duplicates all pixels equally.
128        *dx = FixedDiv(Abs(src_width), dst_width);
129        *dy = FixedDiv(src_height, dst_height);
130        *x = CENTERSTART(*dx, 0);
131        *y = CENTERSTART(*dy, 0);
132      }
133      // Negative src_width means horizontally mirror.
134      if (src_width < 0) {
135        *x += (dst_width - 1) * *dx;
136        *dx = -*dx;
137        src_width = -src_width;
138      }
139    }
140
141# Future Work
142
143Point sampling should ideally be the same as bilinear, but pixel by pixel, round to nearest neighbor.  But as is, it is reversible and exactly matches ffmpeg at all scale factors, both up and down.  The scale factor is
144
145    dx = src_width / dst_width;
146
147The step value is centered for down sample:
148
149    x = dx / 2;
150
151Or starts at 0 for upsample.
152
153    x = 0;
154
155Bilinear filtering is currently correct for down sampling, but not for upsampling.
156Upsampling is stretching the first and last pixel of source, to the first and last pixel of destination.
157
158    dx = (src_width - 1) / (dst_width - 1);<br>
159    x = 0;
160
161It should be stretching such that the first pixel is centered in the middle of the scale factor, to match the pixel that would be sampled for down sampling by the same amount.  And same on last pixel.
162
163    dx = src_width / dst_width;<br>
164    x = dx / 2 - 0.5;
165
166This would start at -0.5 and go to last pixel + 0.5, sampling 50% from last pixel + 1.
167Then clamping would be needed.  On GPUs there are numerous ways to clamp.
168
1691. Clamp the coordinate to the edge of the texture, duplicating the first and last pixel.
1702. Blend with a constant color, such as transparent black.  Typically best for fonts.
1713. Mirror the UV coordinate, which is similar to clamping.  Good for continuous tone images.
1724. Wrap the coordinate, for texture tiling.
1735. Allow the coordinate to index beyond the image, which may be the correct data if sampling a subimage.
1746. Extrapolate the edge based on the previous pixel.  pixel -0.5 is computed from slope of pixel 0 and 1.
175
176Some of these are computational, even for a GPU, which is one reason textures are sometimes limited to power of 2 sizes.
177We do care about the clipping case, where allowing coordinates to become negative and index pixels before the image is the correct data.  But normally for simple scaling, we want to clamp to the edge pixel.  For example, if bilinear scaling from 3x3 to 30x30, we’d essentially want 10 pixels of each of the original 3 pixels.  But we want the original pixels to land in the middle of each 10 pixels, at offsets 5, 15 and 25.  There would be filtering between 5 and 15 between the original pixels 0 and 1.  And filtering between 15 and 25 from original pixels 1 and 2.  The first 5 pixels are clamped to pixel 0 and the last 5 pixels are clamped to pixel 2.
178The easiest way to implement this is copy the original 3 pixels to a buffer, and duplicate the first and last pixels.  0,1,2 becomes 0, 0,1,2, 2.  Then implement a filtering without clamping.  We call this source extruding.  Its only necessary on up sampling, since down sampler will always have valid surrounding pixels.
179Extruding is practical when the image is already copied to a temporary buffer.   It could be done to the original image, as long as the original memory is restored, but valgrind and/or memory protection would disallow this, so it requires a memcpy to a temporary buffer, which may hurt performance.  The memcpy has a performance advantage, from a cache point of view, that can actually make this technique faster, depending on hardware characteristics.
180Vertical extrusion can be done with a memcpy of the first/last row, or clamping a pointer.
181
182
183The other way to implement clamping is handle the edges with a memset.  e.g. Read first source pixel and memset the first 5 pixels.  Filter pixels 0,1,2 to 5 to 25.  Read last pixel and memset the last 5 pixels.  Blur is implemented with this method like this, which has 3 loops per row - left, middle and right.
184
185Box filter is only used for 2x down sample or more.  Its based on integer sized boxes.  Technically it should be filtered edges, but thats substantially slower (roughly 100x), and at that point you may as well do a cubic filter which is more correct.
186
187Box filter currently sums rows into a row buffer.  It does this with
188
189Mirroring will use the same slope as normal, but with a negative.
190The starting coordinate needs to consider the scale factor and filter.  e.g. box filter of 30x30 to 3x3 with mirroring would use -10 for step, but x = 20.  width (30) - dx.
191
192Step needs to be accurate, so it uses an integer divide.  This is as much as 5% of the profile.  An approximated divide is substantially faster, but the inaccuracy causes stepping beyond the original image boundaries.  3 general solutions:
193
1941. copy image to buffer with padding.  allows for small errors in stepping.
1952. hash the divide, so common values are quickly found.
1963. change api so caller provides the slope.
197