<div style="font-size:3em; text-align:center;"> Algorithm Description </div>

# Abstract
This document describes technical aspects of the coding tools included in
the associated codec. It is not a specification of the associated codec;
instead, it summarizes the key features of the coding tools for new
developers. This document should be updated when significant new normative
changes have been integrated into the associated codec.

# Table of Contents

[Abbreviations](#Abbreviations)

[Algorithm description](#Algorithm-Description)

- [Block Partitioning](#Block-Partitioning)
  - [Coding block partition](#Coding-block-partition)
  - [Transform block partition](#Transform-block-partition)
- [Intra Prediction](#Intra-Prediction)
  - [Directional intra prediction modes](#Directional-intra-prediction-modes)
  - [Non-directional intra prediction modes](#Non-directional-intra-prediction-modes)
  - [Recursive filtering modes](#Recursive-filtering-modes)
  - [Chroma from Luma mode](#Chroma-from-Luma-mode)
- [Inter Prediction](#Inter-Prediction)
  - [Motion vector prediction](#Motion-vector-prediction)
  - [Motion vector coding](#Motion-vector-coding)
  - [Interpolation filter for motion compensation](#Interpolation-filter-for-motion-compensation)
  - [Warped motion compensation](#Warped-motion-compensation)
  - [Overlapped block motion compensation](#Overlapped-block-motion-compensation)
  - [Reference frames](#Reference-frames)
  - [Compound Prediction](#Compound-Prediction)
- [Transform](#Transform)
- [Quantization](#Quantization)
- [Entropy Coding](#Entropy-Coding)
- [Loop filtering and post-processing](#Loop-filtering-and-post-processing)
  - [Deblocking](#Deblocking)
  - [Constrained directional enhancement filter](#Constrained-directional-enhancement-filter)
  - [Loop Restoration filter](#Loop-Restoration-filter)
  - [Frame super-resolution](#Frame-super-resolution)
  - [Film grain synthesis](#Film-grain-synthesis)
- [Screen content coding](#Screen-content-coding)
  - [Intra block copy](#Intra-block-copy)
  - [Palette mode](#Palette-mode)

[References](#References)

# Abbreviations

CfL: Chroma from Luma\
IntraBC: Intra block copy\
LCU: Largest coding unit\
OBMC: Overlapped Block Motion Compensation\
CDEF: Constrained Directional Enhancement Filter

# Algorithm Description

## Block Partitioning

### Coding block partition

The largest coding unit (LCU) applied in this codec is 128×128. In
addition to the no-split mode `PARTITION_NONE`, the partition tree supports 9
different partitioning patterns, as shown in the figure below.

<figure class="image"> <center><img src="img\partition_codingblock.svg"
alt="Partition" width="360" /> <figcaption>Figure 1: Supported coding block
partitions</figcaption> </figure>

According to the number of sub-partitions, the 9 partition modes are summarized
as follows:

1. Four partitions: `PARTITION_SPLIT`, `PARTITION_VERT_4`, `PARTITION_HORZ_4`
2. Three partitions (T-shape): `PARTITION_HORZ_A`, `PARTITION_HORZ_B`,
   `PARTITION_VERT_A`, `PARTITION_VERT_B`
3. Two partitions: `PARTITION_HORZ`, `PARTITION_VERT`

Among all 9 partitioning patterns, only `PARTITION_SPLIT` supports recursive
partitioning, i.e., its sub-partitions can be further split; the other
partitioning modes cannot be split any further. In particular,
`PARTITION_VERT_4` and `PARTITION_HORZ_4` are not used for 8×8 and 128×128
blocks, and the T-shape partitions are not used for 8×8 blocks either.
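To make the partition rules concrete, the following Python sketch expands one
partition mode into sub-block rectangles and encodes the size restrictions
stated above. It is illustrative only, not the normative decoder logic, and the
T-shape geometries are omitted for brevity:

```python
# Illustrative sketch of the coding-block partition rules described above;
# block sizes and mode names follow the text, everything else is simplified.

def split_block(x, y, w, h, mode):
    """Return the list of sub-blocks (x, y, w, h) produced by one partition mode."""
    if mode == "PARTITION_NONE":
        return [(x, y, w, h)]
    if mode == "PARTITION_HORZ":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "PARTITION_VERT":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "PARTITION_SPLIT":  # the only mode whose sub-blocks may split again
        return [(x, y, w // 2, h // 2), (x + w // 2, y, w // 2, h // 2),
                (x, y + h // 2, w // 2, h // 2),
                (x + w // 2, y + h // 2, w // 2, h // 2)]
    if mode == "PARTITION_HORZ_4":
        return [(x, y + i * h // 4, w, h // 4) for i in range(4)]
    if mode == "PARTITION_VERT_4":
        return [(x + i * w // 4, y, w // 4, h) for i in range(4)]
    raise ValueError("T-shape modes omitted for brevity")

def mode_allowed(w, h, mode):
    """Size restrictions from the text: no 4-way stripe partitions at 8x8 or
    128x128, and no T-shape partitions at 8x8."""
    size = max(w, h)
    if mode in ("PARTITION_HORZ_4", "PARTITION_VERT_4") and size in (8, 128):
        return False
    if mode.endswith(("_A", "_B")) and size == 8:  # T-shape modes
        return False
    return True

print(split_block(0, 0, 128, 128, "PARTITION_SPLIT"))  # four 64x64 sub-blocks
print(mode_allowed(8, 8, "PARTITION_HORZ_4"))          # False
```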
### Transform block partition

For both intra and inter coded blocks, the coding block can be further
partitioned into multiple transform units with a partitioning depth of up to 2
levels. The mapping from the transform size of the current depth to the
transform size of the next depth is shown in Table 1 below.

<figure class="image"> <center><figcaption>Table 1: Transform partition size
setting</figcaption> <img src="img\tx_partition.svg" alt="Partition" width="220"
/> </figure>

Furthermore, for intra coded blocks, the transform partition is done in such a
way that all the transform blocks have the same size, and the transform blocks
are coded in raster scan order. An example of the transform block partitioning
for an intra coded block is shown in Figure 2.

<figure class="image"> <center><img src="img\intra_tx_partition.svg"
alt="Partition" width="600" /> <figcaption>Figure 2: Example of transform
partitioning for an intra coded block</figcaption> </figure>

For inter coded blocks, the transform unit partitioning can be done in a
recursive manner with a partitioning depth of up to 2 levels. The transform
partitioning supports 1:1 (square), 1:2/2:1, and 1:4/4:1 transform unit sizes
ranging from 4×4 to 64×64. If the coding block is smaller than or equal to
64×64, the transform block partitioning can only be applied to the luma
component; for chroma blocks, the transform block size is identical to the
coding block size. Otherwise, if the coding block width or height is greater
than 64, both the luma and chroma coding blocks are implicitly split into
multiples of min(W, 64)×min(H, 64) and min(W, 32)×min(H, 32) transform blocks,
respectively.

<figure class="image"> <center><img src="img\inter_tx_partition.svg"
alt="Partition" width="400" /> <figcaption>Figure 3: Example of transform
partitioning for an inter coded block</figcaption> </figure>

## Intra Prediction

### Directional intra prediction modes

Directional intra prediction modes are applied in intra prediction to model
local textures that follow a given direction pattern. Directional intra
prediction modes are represented by a nominal mode and an angle delta. The
nominal modes are a set of 8 intra prediction angles similar to those used in
VP9. The angle delta index ranges from −3 to +3, and a zero angle delta
indicates a nominal mode. The prediction angle is given by the nominal intra
angle plus the angle delta. In total, there are 56 directional intra prediction
modes, as shown in the figure below, where solid arrows indicate nominal
directional intra prediction modes and dotted arrows represent non-zero angle
deltas.

<figure class="image"> <center><img src="img\intra_directional.svg"
alt="Directional intra" width="300" /> <figcaption>Figure 4: Directional intra
prediction modes</figcaption> </figure>

The nominal mode index and the angle delta index are signalled separately, with
the nominal mode index signalled before the associated angle delta index. It is
noted that for small block sizes, where the coding gain from extending intra
prediction angles may saturate, only the nominal modes are used and the angle
delta index is not signalled.
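As a concrete illustration, the sketch below enumerates the 56 prediction
angles. The nominal angle list and the 3-degree step per angle-delta index are
assumptions (AV1-style values), since the text above only states the delta
index range:

```python
# Hypothetical sketch: derive the 56 directional prediction angles from
# 8 nominal angles and an angle delta index in [-3, +3]. The nominal angle
# list and the 3-degree step are assumptions, not taken from the text above.

NOMINAL_ANGLES = [45, 67, 90, 113, 135, 157, 180, 203]  # degrees (assumed)
ANGLE_STEP = 3  # degrees per angle-delta index (assumed)

def prediction_angle(nominal_idx, angle_delta):
    """Prediction angle = nominal angle + angle delta * step."""
    assert -3 <= angle_delta <= 3
    return NOMINAL_ANGLES[nominal_idx] + angle_delta * ANGLE_STEP

all_angles = sorted(prediction_angle(m, d)
                    for m in range(8) for d in range(-3, 4))
print(len(all_angles))  # 56 directional modes in total
```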
### Non-directional intra prediction modes

In addition to the directional intra prediction modes, four non-directional
intra modes which model smooth textures are also included: `SMOOTH_V`,
`SMOOTH_H`, `SMOOTH` and the `PAETH predictor`.

In `SMOOTH_V`, `SMOOTH_H` and `SMOOTH` modes, the prediction values are
generated using quadratic interpolation along the vertical direction, the
horizontal direction, or the average of the two, respectively. The samples used
in the quadratic interpolation include reconstructed samples from the top and
left neighboring blocks, and samples from the right and bottom boundaries,
which are approximated by the top and left reconstructed samples.

In `PAETH predictor` mode, the prediction for each sample is chosen as the one
of the top (T), left (L) and top-left (TL) reference samples whose value is
closest to the Paeth predictor value, i.e., T + L − TL. The samples used in the
`PAETH predictor` are illustrated in the figure below.

<figure class="image"> <center><img src="img\intra_paeth.svg" alt="Paeth
predictor" width="300" /> <figcaption>Figure 5: Paeth predictor</figcaption>
</figure>
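The per-sample selection rule can be written directly from the definition
above. Here is a minimal sketch (illustrative, not the normative decoder code;
tie-breaking order is unspecified in the text):

```python
# Minimal sketch of the Paeth predictor selection rule described above.

def paeth_predict(top, left, top_left):
    """Pick the reference (T, L or TL) whose value is closest to T + L - TL."""
    base = top + left - top_left
    candidates = (top, left, top_left)
    # Return the candidate with the smallest absolute distance to the base value.
    return min(candidates, key=lambda ref: abs(ref - base))

print(paeth_predict(120, 130, 125))  # base is 125, so TL (125) is selected
```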
### Recursive filtering modes

Five filtering intra modes are defined, and each mode specifies a set of eight
7-tap filters. Given the selected filtering mode index (0~4), the current block
is divided into 4×2 sub-blocks. For each 4×2 sub-block, every sample is
predicted by 7-tap interpolation using the 7 top and left neighboring samples
as inputs. Different filters are applied to samples located at different
coordinates within a 4×2 sub-block. The prediction process can be done
recursively in units of 4×2 sub-blocks, which means that prediction samples
generated for one 4×2 sub-block can be used to predict another 4×2 sub-block.

<figure class="image"> <center><img src="img\intra_recursive.svg"
alt="Recursive filtering" width="300" /> <figcaption>Figure 6: Recursive
filtering modes</figcaption> </figure>

### Chroma from Luma mode

Chroma from Luma (CfL) is a chroma intra prediction mode which models chroma
samples as a linear function of co-located reconstructed luma samples. To align
the resolution between luma and chroma samples for the different chroma
sampling formats, e.g., 4:2:0 and 4:2:2, the reconstructed luma pixels may need
to be sub-sampled before being used in CfL mode. In addition, the DC component
is removed to form the AC contribution. In CfL mode, the model parameters which
specify the linear function between the two color components are optimized by
the encoder and signalled in the bitstream.

<figure class="image"> <center><img src="img\intra_cfl.svg" alt="CfL
prediction" width="700" /> <figcaption>Figure 7: CfL prediction</figcaption>
</figure>

## Inter Prediction

### Motion vector prediction

Motion vectors are predicted from neighboring blocks, which can be either
spatial neighboring blocks or temporal neighboring blocks located in a
reference frame. A set of MV predictors is identified by checking all these
blocks and is utilized to encode the motion vector information.

**Spatial motion vector prediction**

There are two sets of spatial neighboring blocks that can be utilized for
finding spatial MV predictors: the adjacent spatial neighbors, which are the
direct top and left neighbors of the current block, and the outer spatial
neighbors, which are close to but not directly adjacent to the current block.
The two sets of spatial neighboring blocks are illustrated in the example shown
in Figure 8.

<figure class="image"> <center><img src="img\inter_spatial_mvp.svg"
alt="Spatial MV predictors" width="350" /><figcaption>Figure 8: Spatial
neighboring blocks used for motion vector prediction</figcaption></figure>

For each set of spatial neighbors, the top row is checked from left to right
and then the left column is checked from top to bottom. For the adjacent
spatial neighbors, an additional top-right block is also checked after the left
column neighboring blocks. For the non-adjacent spatial neighbors, the
top-left block located at position (−1, −1) is checked first, then the top row
and left column are checked in a similar manner as for the adjacent neighbors.
The adjacent neighbors are checked first, the temporal MV predictor described
in the next subsection is checked second, and after that the non-adjacent
spatial neighboring blocks are checked.

For compound prediction, which utilizes a pair of reference frames, the
non-adjacent spatial neighbors are not used for deriving the MV predictor.

**Temporal motion vector prediction**

In addition to spatial neighboring blocks, an MV predictor can also be derived
using co-located blocks of reference pictures, namely the temporal MV
predictor. To generate the temporal MV predictor, the MVs of reference frames
are first stored together with the reference indices associated with each
reference frame. Then, for each 8×8 block of the current frame, the MVs of a
reference frame which pass through that 8×8 block are identified and stored
together with the reference frame index in a temporal MV buffer. In the example
shown in Figure 9, the MV of reference frame 1 (R1), pointing from R1 to a
reference frame of R1, is identified as MVref; it passes through an 8×8 block
(shaded in blue dots) of the current frame, and this MVref is stored in the
temporal MV buffer associated with this 8×8 block.

<figure class="image"> <center><img src="img\inter_motion_field.svg"
alt="Motion field estimation" width="800" /><figcaption>Figure 9: Motion field
estimation by linear projection</figcaption></figure>

Finally, given a set of pre-defined block coordinates, the associated MVs
stored in the temporal MV buffer are identified and projected accordingly to
derive a temporal MV predictor which points from the current block to its
reference frame, e.g., MV0 in Figure 9. In Figure 10, the pre-defined block
positions for deriving the temporal MV predictors of a 16×16 block are shown;
up to 7 blocks are checked to find valid temporal MV predictors.

<figure class="image"> <center><img src="img\inter_tmvp_positions.svg"
alt="TMVP block positions" width="300" /><figcaption>Figure 10: Block positions
for deriving temporal MV predictors</figcaption></figure>

The temporal MV predictors are checked after the nearest spatial MV predictors
but before the non-adjacent spatial MV predictors.

All the spatial and temporal MV candidates are put together in a pool, with
each predictor assigned a weight that is determined during the scanning of the
spatial and temporal neighboring blocks. Based on the associated weights, the
candidates are sorted and ranked, and up to four candidates are used to form
the MV predictor list.
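To illustrate the linear projection step, the following sketch scales a stored
motion vector by the ratio of frame distances. The plain floating-point,
distance-ratio scaling is an assumption for illustration; the codec itself uses
fixed-point arithmetic, which is omitted here:

```python
# Illustrative sketch of deriving a temporal MV predictor by linear projection.
# A stored motion vector MVref covering an 8x8 block is rescaled by the ratio
# of frame distances. Floating-point math is used for clarity only.

def project_mv(mv_ref, dist_ref, dist_cur):
    """Scale mv_ref (defined over temporal distance dist_ref) to the temporal
    distance dist_cur between the current frame and its reference."""
    scale = dist_cur / dist_ref
    return (mv_ref[0] * scale, mv_ref[1] * scale)

# Example: MVref spans 4 frames; the current block's reference is 2 frames
# away, so the projected predictor MV0 is half of MVref.
mv_ref = (16.0, -8.0)
print(project_mv(mv_ref, dist_ref=4, dist_cur=2))  # -> (8.0, -4.0)
```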
### Motion vector coding

<mark>[Ed.: to be added]</mark>

### Interpolation filter for motion compensation

<mark>[Ed.: to be added]</mark>

### Warped motion compensation

**Global warped motion**

The global motion information is signalled for each inter frame, and it
includes the global motion type and the motion parameters. The global motion
types and the number of associated parameters are listed in the following
table.

| Global motion type     | Number of parameters |
|:----------------------:|:--------------------:|
| Identity (zero motion) | 0                    |
| Translation            | 2                    |
| Rotzoom                | 4                    |
| General affine         | 6                    |

For an inter coded block, after the reference frame index is transmitted, if
the motion of the current block is indicated as global motion, the global
motion type and the associated parameters of the given reference are used for
the current block.

**Local warped motion**

For an inter coded block, local warped motion is allowed when the following
conditions are all satisfied:

* The current block uses single prediction
* The width or height is greater than or equal to 8 samples
* At least one of the immediate neighbors uses the same reference frame as the
  current block

If local warped motion is used for the current block, the affine parameters are
not signalled; instead, they are estimated by minimizing the mean squared
distance between the reference projection and the modeled projection based on
the motion vectors of the current block and its immediate neighbors, as
sketched below. To estimate the parameters of the local warped motion, a
projection sample pair consisting of the center pixel of a neighboring block
and its corresponding pixel in the reference frame is collected whenever the
neighboring block uses the same reference frame as the current block. After
that, 3 extra samples are created by shifting the center position by a quarter
sample in one or two dimensions, and these samples are also considered as
projection sample pairs to ensure the stability of the model parameter
estimation process.
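The following numpy sketch shows the least-squares idea under simplifying
assumptions: a plain affine model fit in floating point. The actual codec uses
an integerized solver and parameter clamping, which are omitted here:

```python
# Hedged sketch of local warped motion parameter estimation: fit an affine
# model [a b; c d] plus translation (e, f) to projection sample pairs by
# least squares. Floating-point numpy is an assumption for illustration only.
import numpy as np

def fit_affine(src_pts, dst_pts):
    """src_pts: Nx2 positions in the current frame; dst_pts: Nx2 matched
    positions in the reference frame. Solves dst = M @ [x, y, 1]."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                # N x 3 design matrix
    # One least-squares solve covering both output coordinates.
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return params.T                           # 2x3 affine matrix [a b e; c d f]

# Example: center pixels of neighboring blocks and their reference matches
# (here a pure translation of (+1, +0.5), which the fit recovers exactly).
src = [(8, 8), (24, 8), (8, 24), (24, 24)]
dst = [(9.0, 8.5), (25.0, 8.5), (9.0, 24.5), (25.0, 24.5)]
print(fit_affine(src, dst))
```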
### Overlapped block motion compensation

For an inter coded block, overlapped block motion compensation (OBMC) is
allowed when the following conditions are all satisfied:

* The current block uses single prediction
* The width or height is greater than or equal to 8 samples
* At least one of the neighboring blocks is an inter coded block

When OBMC is applied to the current block, firstly, the initial inter
prediction samples are generated using the assigned motion vector of the
current block; then the inter predicted samples for the current block and the
inter predicted samples based on the motion vectors from the above and left
blocks are blended to generate the final prediction samples. The maximum number
of neighboring motion vectors is limited based on the size of the current
block, and up to 4 motion vectors from each of the upper and left blocks can be
involved in the OBMC process of the current block.

One example of the processing order of neighboring blocks is shown in the
following picture, wherein the values marked in each block indicate the
processing order of the motion vectors of the current block and the neighboring
blocks. To be specific, the motion vector of the current block is first applied
to generate the inter prediction samples p0(x,y). Then the motion vector of
block 1 is applied to generate the prediction samples p1(x,y). After that, the
prediction samples in the overlapping area between block 0 and block 1 are
computed as a weighted average of p0(x,y) and p1(x,y); the sketch below
illustrates this blending step. The overlapping area of block 1 and block 0 is
marked in grey in the following picture. The motion vectors of blocks 2, 3 and
4 are further applied and blended in the same way.

<figure class="image"> <center><img src="img\inter_obmc.svg" alt="OBMC
neighboring blocks" width="300" /><figcaption>Figure 11: Neighboring blocks for
the OBMC process</figcaption></figure>
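As an illustration of the blending step, the sketch below mixes two predictions
over an overlap region with a simple linear ramp. The ramp weights are a
stand-in assumption, not the codec's actual blending masks:

```python
# Illustrative OBMC-style blending of two predictions over an overlap region.
# A linear ramp is used as the blending weight; the real codec uses predefined
# smooth masks, so the ramp here is purely an assumption for illustration.
import numpy as np

def blend_top_overlap(p0, p1, overlap_rows):
    """Blend prediction p1 (from the above block's MV) into p0 over the first
    `overlap_rows` rows, with p1's weight fading out with distance."""
    out = p0.astype(float).copy()
    for r in range(overlap_rows):
        w1 = (overlap_rows - r) / (2.0 * overlap_rows)  # p1 weight: 0.5 -> ~0
        out[r, :] = (1.0 - w1) * p0[r, :] + w1 * p1[r, :]
    return np.rint(out).astype(p0.dtype)

p0 = np.full((8, 8), 100, dtype=np.uint8)   # prediction from current block's MV
p1 = np.full((8, 8), 140, dtype=np.uint8)   # prediction from the above block's MV
print(blend_top_overlap(p0, p1, overlap_rows=4)[:, 0])
# -> [120 115 110 105 100 100 100 100]: p1's influence fades with distance
```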
### Reference frames

<mark>[Ed.: to be added]</mark>

### Compound Prediction

<mark>[Ed.: to be added]</mark>

**Compound wedge prediction**

<mark>[Ed.: to be added]</mark>

**Difference-modulated masked prediction**

<mark>[Ed.: to be added]</mark>

**Frame distance-based compound prediction**

<mark>[Ed.: to be added]</mark>

**Compound inter-intra prediction**

<mark>[Ed.: to be added]</mark>

## Transform

The separable 2-D transform process is applied to prediction residuals. For the
forward transform, a 1-D vertical transform is performed first on each column
of the input residual block, then a horizontal transform is performed on each
row of the vertical transform output. For the backward transform, a 1-D
horizontal transform is performed first on each row of the input de-quantized
coefficient block, then a vertical transform is performed on each column of the
horizontal transform output. The primary 1-D transforms include four different
types: a) 4-point, 8-point, 16-point, 32-point and 64-point DCT-2; b) 4-point,
8-point and 16-point asymmetric DSTs (DST-4, DST-7) and c) their flipped
versions; d) 4-point, 8-point, 16-point and 32-point identity transforms. When
the transform size is 4-point, ADST refers to DST-7; otherwise, when the
transform size is greater than 4-point, ADST refers to DST-4.

<figure class="image"> <center><figcaption>Table 2: Transform basis functions
(DCT-2, DST-4 and DST-7) for N-point input.</figcaption> <img src=
"img\tx_basis.svg" alt="Transform basis" width="450" /> </figure>

For the luma component, each transform block can select one pair of horizontal
and vertical transforms from a pre-defined set of transform type candidates,
and the selection is explicitly signalled in the bitstream. However, the
selection is not signalled when max(width, height) is 64. When the maximum of
the transform block width and height is greater than or equal to 32, the set of
transform type candidates depends on the prediction mode, as described in
Table 3. Otherwise, when the maximum of the transform block width and height is
smaller than 32, the set of transform type candidates depends on the prediction
mode, as described in Table 4.

<figure class="image"> <center><figcaption>Table 3: Transform type candidates
for the luma component when max(width, height) is greater than or equal to 32.
</figcaption> <img src="img\tx_cands_large.svg" alt="Transform candidates"
width="370" /> </figure>

<figure class="image"> <center><figcaption>Table 4: Transform type candidates
for the luma component when max(width, height) is smaller than 32.
</figcaption> <img src="img\tx_cands_small.svg" alt="Transform candidates"
width="440" /> </figure>

The sets of transform type candidates (namely transform sets) are defined in
Table 5.

<figure class="image"> <center><figcaption>Table 5: Definition of transform
sets. </figcaption> <img src="img\tx_set.svg" alt="Transform sets" width="450"
/> </figure>

For the chroma components, the transform type selection is done in an implicit
way. For intra prediction residuals, the transform type is selected according
to the intra prediction mode, as specified in Table 6. For inter prediction
residuals, the transform type is selected according to the transform type
selection of the co-located luma block. Therefore, for the chroma components,
there is no transform type signalling in the bitstream.

<figure class="image"> <center><figcaption>Table 6: Transform type selection
for chroma component intra prediction residuals.</figcaption> <img src=
"img\tx_chroma.svg" alt="Chroma transform selection" width="500" /> </figure>

The computational cost of large (e.g., 64-point) transforms is further reduced
by zeroing out all the coefficients except for the following two cases:

1. the top-left 32×32 quadrant for 64×64/64×32/32×64 DCT_DCT hybrid transforms;
2. the left 32×16 area for 64×16 and the top 16×32 area for 16×64 DCT_DCT
   hybrid transforms.

Both the DCT-2 and the ADST (DST-4, DST-7) are implemented using a butterfly
structure [1], which includes multiple stages of butterfly operations. The
butterfly operations within each stage can be calculated in parallel, and
different stages are cascaded in sequential order.
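To make the column-then-row ordering described at the start of this section
concrete, here is a small numpy sketch of the separable 2-D transform using an
orthonormal DCT-2 matrix. The matrix follows the standard DCT-2 definition, and
floating-point math is an illustrative assumption; the codec itself uses
integerized butterfly implementations:

```python
# Sketch of the separable 2-D transform: the forward pass applies a 1-D
# vertical transform to each column, then a horizontal transform to each row;
# the backward pass reverses the order. Floating-point DCT-2 is used for
# illustration only.
import numpy as np

def dct2_matrix(n):
    """Standard orthonormal DCT-2 basis: rows are the basis functions."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    t = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    t[0, :] /= np.sqrt(2.0)
    return t

def forward_2d(residual):
    h, w = residual.shape
    tv, th = dct2_matrix(h), dct2_matrix(w)
    col_transformed = tv @ residual          # vertical pass over each column
    return col_transformed @ th.T            # horizontal pass over each row

def backward_2d(coeffs):
    h, w = coeffs.shape
    tv, th = dct2_matrix(h), dct2_matrix(w)
    row_restored = coeffs @ th               # horizontal (inverse) pass first
    return tv.T @ row_restored               # then the vertical (inverse) pass

x = np.arange(16, dtype=float).reshape(4, 4)
print(np.allclose(backward_2d(forward_2d(x)), x))  # True: perfect inversion
```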
## Quantization

Quantization of transform coefficients may apply different quantization step
sizes for DC and AC transform coefficients, and different quantization step
sizes for luma and chroma transform coefficients. To specify the quantization
step size, a _**base_q_idx**_ syntax element is first signalled in the frame
header; this is an 8-bit fixed-length code specifying the quantization step
size for luma AC coefficients. The valid range of _**base_q_idx**_ is [0, 255].

After that, the delta value relative to base_q_idx for luma DC coefficients,
indicated as DeltaQYDc, is further signalled. Furthermore, if there is more
than one color plane, a flag _**diff_uv_delta**_ is signalled to indicate
whether the Cb and Cr color components apply different quantization index
values. If _**diff_uv_delta**_ is signalled as 0, then only the delta values
relative to base_q_idx for the chroma DC coefficients (indicated as DeltaQUDc)
and AC coefficients (indicated as DeltaQUAc) are signalled. Otherwise, the
delta values relative to base_q_idx for both the Cb and Cr DC coefficients
(indicated as DeltaQUDc and DeltaQVDc) and AC coefficients (indicated as
DeltaQUAc and DeltaQVAc) are signalled.

The decoded DeltaQYDc, DeltaQUAc, DeltaQUDc, DeltaQVAc and DeltaQVDc are added
to _base_q_idx_ to derive the quantization indices. These quantization indices
are then mapped to quantization step sizes according to two tables. For DC
coefficients, the mapping from quantization index to quantization step size for
8-bit, 10-bit and 12-bit internal bit depth is specified by the lookup table
Dc_Qlookup[3][256], and the corresponding mapping for AC coefficients is
specified by the lookup table Ac_Qlookup[3][256].

<figure class="image"> <center><img src="img\quant_dc.svg" alt="quant_dc"
width="800" /><figcaption>Figure 12: Quantization step size of DC coefficients
for different internal bit depths</figcaption></figure>

<figure class="image"> <center><img src="img\quant_ac.svg" alt="quant_ac"
width="800" /><figcaption>Figure 13: Quantization step size of AC coefficients
for different internal bit depths</figcaption></figure>

Given the quantization step size, indicated as _Q_<sub>step</sub>, the input
quantized coefficient is de-quantized using the following formula:

_F_ = sign * ( (_f_ * _Q_<sub>step</sub>) % 0xFFFFFF ) / _deNorm_

where _f_ is the input quantized coefficient, _F_ is the output de-quantized
coefficient, and _deNorm_ is a constant value derived from the transform block
area size, as indicated by the following table:

| _deNorm_ | Tx block area size         |
|----------|:---------------------------|
| 1        | Fewer than 512 samples     |
| 2        | 512 or 1024 samples        |
| 4        | Greater than 1024 samples  |

When the quantization index is 0, quantization is performed using a
quantization step size equal to 1, i.e., lossless coding mode.

## Entropy Coding

**Entropy coding engine**

<mark>[Ed.: to be added]</mark>

**Coefficient coding**

For each transform unit, coefficient coding starts with coding a skip sign,
which is followed by the signalling of the primary transform kernel type and
the end-of-block (EOB) position when the transform coding is not skipped. After
that, the coefficient values are coded in a multiple-level-map manner plus sign
values. The level maps are coded as three level planes, namely the lower-level,
middle-level and higher-level planes, and the sign is coded as another separate
plane. The lower-level, middle-level and higher-level planes correspond to
different ranges of coefficient magnitudes: the lower-level plane corresponds
to the range 0–2, the middle-level plane takes care of the range 3–14, and the
higher-level plane covers the range of 15 and above (see the sketch at the end
of this subsection).

The three level planes are coded as follows. After the EOB position is coded,
the lower-level and middle-level planes are coded together in backward scan
order, where the scan order refers to a zig-zag scan applied on the entire
transform unit. Then the sign plane and the higher-level plane are coded
together in forward scan order. After that, the remainder (coefficient level
minus 14) is entropy coded using an Exp-Golomb code.

The context model applied to the lower-level plane depends on the primary
transform directions, including bi-directional, horizontal and vertical, as
well as the transform size, and up to five (frequency-domain) neighboring
coefficients are used to derive the context. The middle-level plane uses a
similar context model, but the number of context neighbor coefficients is
reduced from 5 to 2. The higher-level plane is coded with an Exp-Golomb code
without using a context model. For the sign plane, except for the DC sign,
which is coded using the DC signs from the neighboring transform units, the
sign values of other coefficients are coded directly without using a context
model.
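The following sketch decomposes coefficient magnitudes into the three level
planes plus a remainder, matching the ranges given above. It illustrates only
the plane structure, not the actual context-coded bitstream syntax:

```python
# Illustrative decomposition of a coefficient level into the three level
# planes described above (lower: 0-2, middle: 3-14, higher/remainder: 15+).
# This shows only the plane structure, not the context-coded syntax.

def decompose_level(level):
    """Split a non-negative coefficient magnitude into plane contributions."""
    lower = min(level, 2)                    # lower-level plane: 0..2
    middle = min(max(level - 2, 0), 12)      # middle-level plane: up to 14 total
    remainder = max(level - 14, 0)           # remainder, coded with Exp-Golomb
    return lower, middle, remainder

for level in (0, 2, 7, 14, 30):
    lower, middle, rem = decompose_level(level)
    assert lower + middle + rem == level
    print(level, "->", (lower, middle, rem))
```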
## Loop filtering and post-processing

### Deblocking

There are four methods for picking the deblocking filter level, which are
listed below:

* LPF_PICK_FROM_FULL_IMAGE: search the full image with different values
* LPF_PICK_FROM_Q: estimate the filter level based on the quantizer and frame
  type
* LPF_PICK_FROM_SUBIMAGE: estimate the level from a portion of the image
* LPF_PICK_MINIMAL_LPF: set the filter level to 0 and disable deblocking

When estimating the filter level from the full image or a sub-image, the search
starts from the previous frame's filter level and ends when the filter step is
less than or equal to zero. In addition to the filter level, there are other
parameters which control the deblocking filter, such as the sharpness level,
mode deltas, and reference deltas.

Deblocking is performed at the 128×128 super block level, and the vertical and
horizontal edges are filtered in turn. For a 128×128 super block, the
vertical/horizontal edges aligned with each 8×8 block are filtered first. If a
4×4 transform is used, the internal edges aligned with the 4×4 blocks are
further filtered. The filter length is switchable among 4-tap, 6-tap, 8-tap,
14-tap, and 0-tap (no filtering). The locations of the filter taps are
identified based on the number of filter taps in order to compute the filter
mask. When finally performing the filtering, outer taps are added if there is
high edge variance.

### Constrained directional enhancement filter

**Edge Direction Estimation**\
In CDEF, the edge direction search is performed at the 8×8 block level. There
are eight edge directions in total, as illustrated in Figure 14.

<figure class="image"> <center><img src="img\edge_direction.svg"
alt="Edge direction" width="700" /> <figcaption>Figure 14: Line number
k for pixels following direction d=0:7 in an 8x8 block.</figcaption> </figure>

The optimal edge direction d_opt is found by maximizing the following
term [3]:

<figure class="image"> <center><img src="img\equ_edge_direction.svg"
alt="Equation edge direction" width="250" /> </figure>
<!-- $$d_{opt}=\max_{d} s_d$$
$$s_d = \sum_{k}\frac{1}{N_{d,k}}(\sum_{p\in P_{d,k}}x_p)^2,$$ -->

where x_p is the value of pixel p, P_{d,k} is the set of pixels in line k
following direction d, and N_{d,k} is the cardinality of P_{d,k}.

**Directional filter**\
CDEF consists of two sets of filter taps: the primary taps and the secondary
taps. The primary taps work along the edge direction (as shown in Figure 15),
while the secondary taps are oriented 45 degrees off the edge direction
(as shown in Figure 16).

<figure class="image"> <center><img src="img\primary_tap.svg"
alt="Primary tap" width="700" /> <figcaption>Figure 15: Primary filter
taps following the edge direction. For even strengths a = 2 and b = 4, for
odd strengths a = 3 and b = 3. The filtered pixel is shown in the
highlighted center.</figcaption> </figure>

<figure class="image"> <center><img src="img\secondary_tap.svg"
alt="Secondary tap" width="700" /> <figcaption>Figure 16: Secondary
filter taps. The filtered pixel is shown in the highlighted center.
</figcaption> </figure>
CDEF can be described by the following equation:

<figure class="image"> <center><img src="img\equ_dir_search.svg"
alt="Equation direction search" width="720" /> </figure>

<!-- $$y(i,j)=x(i,j)+round(\sum_{m,n}w^{(p)}_{d,m,n}f(x(m,n)-x(i,j),S^{(p)},
D)+\sum_{m,n}w^{(s)}_{d,m,n}f(x(m,n)-x(i,j),S^{(s)},D)),$$ -->

where x(i,j) and y(i,j) are the input and output reconstructed values of CDEF.
The superscript p denotes the primary taps, s denotes the secondary taps, and w
is the corresponding tap weight. f(d,S,D) is a non-linear filtering function,
where S denotes the filter strength and D is a damping parameter. For 8-bit
content, S^(p) ranges from 0 to 15, and S^(s) can be 0, 1, 2, or 4. D ranges
from 3 to 6 for luma, and from 2 to 4 for chroma.

**Non-linear filter**\
CDEF uses a non-linear filtering function to prevent excessive blurring when
applied across an edge. This is achieved by ignoring pixels that are too
different from the current pixel to be filtered. When the difference d between
the current pixel and its neighboring pixel is within a threshold,
f(d,S,D) = d; otherwise f(d,S,D) = 0. Specifically, the strength S determines
the maximum difference allowed, and the damping D determines the point at which
the filter tap is ignored.
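A minimal sketch of this behavior is shown below, using a constraint
formulation in the spirit of the CDEF paper [3]; the codec's bit-exact form may
differ, so treat this as illustrative only:

```python
# Sketch of a CDEF-style non-linear constraint f(d, S, D): small differences
# pass through unchanged, large differences are suppressed to zero, and the
# damping D controls how quickly the response tapers. Illustrative only; the
# codec's bit-exact formulation may differ.

def constraint(d, strength, damping):
    """f(d, S, D): pass small differences, ignore large ones."""
    if strength == 0:
        return 0
    # floor(log2(strength)) == strength.bit_length() - 1 for strength >= 1;
    # the shift is clamped to be non-negative for this illustration.
    shift = max(0, damping - (strength.bit_length() - 1))
    magnitude = min(abs(d), max(0, strength - (abs(d) >> shift)))
    return magnitude if d >= 0 else -magnitude

for d in (1, 4, 12, 40):
    print(d, constraint(d, strength=4, damping=3))
# -> 1 passes through, 4 is tapered, 12 and 40 are ignored (0)
```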
### Loop Restoration filter

**Separable symmetric Wiener filter**

Let F be a w × w set of 2-D filter taps around the pixel to be filtered,
denoted as a w² × 1 column vector. Compared with the traditional Wiener filter,
the separable symmetric Wiener filter has the following three constraints in
order to save signalling bits and reduce complexity [4]:

1) The w × w filter window is separated into horizontal and vertical w-tap
convolutions.

2) The horizontal and vertical filters are constrained to be symmetric.

3) The sums of the horizontal and vertical filter coefficients are assumed to
be 1.

As a result, F can be written as F = column_vectorize[ab^T], subject to
a(i) = a(w − 1 − i), b(i) = b(w − 1 − i), for i = 0, …, r − 1 (with
r = (w − 1)/2), and sum(a(i)) = sum(b(i)) = 1, where a is the vertical filter
and b is the horizontal filter. The derivation of the filters a and b starts
from an initial guess of the horizontal and vertical filters, optimizing one of
the two while holding the other fixed. In the implementation w = 7; thus, 3
taps need to be sent for each of the filters a and b. When signalling the
filter coefficients, 4, 5 and 6 bits are used for the first three filter taps,
and the remaining ones are obtained from the normalization and symmetry
constraints. In total, 30 bits are transmitted for the vertical and horizontal
filters.

**Dual self-guided filter**

The dual self-guided filter is designed to first obtain two coarse restorations
X1 and X2 of the degraded frame X, and the final restoration Xr is obtained as
a combination of the degraded samples and the differences between the degraded
samples and the coarse restorations [4]:

<figure class="image"> <center><img src="img\equ_dual_self_guided.svg"
alt="Equation dual self guided filter" width="300" /> </figure>
<!-- $$X_r = X + \alpha (X_1 - X) + \beta (X_2 - X)$$ -->

At the encoder side, alpha and beta are computed using:

<figure class="image"> <center><img src="img\equ_dual_self_para.svg"
alt="Equation dual self guided filter parameter" width="220" /> </figure>
<!-- $${\alpha, \beta}^T = (A^T A) ^{-1} A^T b,$$ -->

where A = {X1 − X, X2 − X}, b = Y − X, and Y is the original source.

X1 and X2 are obtained using guided filtering, where the filtering is
controlled by a radius r and a noise parameter e: a higher r implies a higher
spatial variance and a higher e implies a higher range variance [4]. X1 and X2
can thus be described by {r1, e1} and {r2, e2}, respectively.

The encoder sends a 6-tuple {r1, e1, r2, e2, alpha, beta} to the decoder. In
the implementation, {r1, e1, r2, e2} uses a 3-bit codebook, and alpha and beta
use 7 bits each due to their much higher precision, resulting in a total of 17
bits. r is always less than or equal to 3 [4].

Guided filtering can be described by a local linear model:

<figure class="image"> <center><img src="img\equ_guided_filter.svg"
alt="Equation guided filter" width="155" /> </figure>
<!-- $$y=Fx+G,$$ -->

where x and y are the input and output samples, and F and G are determined by
the statistics in the neighborhood of the pixel to be filtered. It is called
self-guided filtering when the guidance image is the same as the degraded
image [4].

The following are the three steps for deriving F and G in self-guided
filtering, as sketched in the example after this list:

1) Compute the mean u and variance d of the pixels in a (2r + 1) × (2r + 1)
window around the pixel to be filtered.

2) For each pixel, compute f = d / (d + e) and g = (1 − f)u.

3) Compute F and G for each pixel as the averages of the f and g values in a
3 × 3 window around the pixel.
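Here is a numpy sketch of the three steps above using simple box filters.
Floating-point math and edge-replication padding are illustrative assumptions;
the codec works in fixed point with its own boundary handling:

```python
# Sketch of the three self-guided filtering steps above using box filters.
# Floating-point numpy and edge-replication padding are assumptions for
# illustration only.
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, with edge-replication padding."""
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (2 * radius + 1) ** 2

def self_guided(x, r, e):
    x = x.astype(float)
    u = box_mean(x, r)                      # step 1: local mean
    d = box_mean(x * x, r) - u * u          # step 1: local variance
    f = d / (d + e)                         # step 2
    g = (1.0 - f) * u                       # step 2
    F = box_mean(f, 1)                      # step 3: 3x3 averages of f and g
    G = box_mean(g, 1)
    return F * x + G                        # local linear model y = Fx + G

x = np.random.default_rng(0).integers(0, 255, (16, 16))
x1 = self_guided(x, r=1, e=50.0)            # first coarse restoration X1
x2 = self_guided(x, r=3, e=400.0)           # second, smoother restoration X2
print(x1.shape, x2.shape)
```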
### Frame super-resolution

In order to improve the perceptual quality of decoded pictures, a
super-resolution process is applied at low bit-rates [5]. First, at the encoder
side, the source video is downscaled as a non-normative procedure. Second, the
downscaled video is encoded, followed by the deblocking and CDEF processes.
Third, a linear upscaling process is applied as a normative procedure to bring
the encoded video back to its original spatial resolution. Lastly, loop
restoration is applied to recover part of the lost high-frequency detail. The
last two steps together are called the super-resolving process [5].
Accordingly, at the decoder side, the decoding, deblocking and CDEF processes
are applied at the lower spatial resolution, and then the frames go through the
super-resolving process. In order to reduce overhead in line buffers with
respect to hardware implementation, the upscaling and downscaling processes are
applied in the horizontal dimension only.

### Film grain synthesis

At the encoder side, film grain is removed from the input video as a denoising
process. Then, the structure and intensity of the input video are analyzed by a
Canny edge detector, and smooth areas are used to estimate the strength of the
film grain. Once the strength is estimated, the denoised video and the film
grain parameters are sent to the decoder side. Those parameters are used to
synthesize the grain and add it back to the decoded video, producing the final
output video.

In order to reconstruct the film grain, the following parameters are sent to
the decoder side: the lag value, the autoregressive coefficients, values for
the precomputed look-up table index of the chroma components, and a set of
points for a piece-wise linear scaling function [6]. Those parameters are
signalled as quantized integers, including 64 bytes for the scaling function
and 74 bytes for the autoregressive coefficients. Once the parameters are
received, an autoregressive process is applied in raster scan order to generate
one 64×64 luma and two 32×32 chroma film grain templates [6]. Those templates
are used to generate the grain for the remaining part of the picture.

## Screen content coding

To improve the coding performance for screen content, the associated video
codec incorporates several coding tools; for example, intra block copy
(IntraBC) is employed to handle the repeated patterns in a screen picture, and
palette mode is used to handle screen blocks containing a limited number of
different colors.

### Intra block copy

Intra Block Copy (IntraBC) [2] is a coding tool similar to inter-picture
prediction. The main difference is that in IntraBC, the predictor block is
formed from the reconstructed samples (before application of in-loop filtering)
of the current picture. Therefore, IntraBC can be considered as "motion
compensation" within the current picture.

A block vector (BV) is coded to specify the location of the predictor block.
The BV precision is integer. The BV is signalled in the bitstream, since the
decoder needs it to locate the predictor. For the current block, a flag
indicating whether the current block is coded in IntraBC mode is first
transmitted in the bitstream. Then, if the current block is in IntraBC mode,
the BV difference, diff, is obtained by subtracting the reference BV from the
current BV, and diff is classified into four types according to the diff values
of its horizontal and vertical components. The type information is transmitted
in the bitstream, after which the diff values of the two components may be
signalled based on the type information.

IntraBC is very effective for screen content coding, but it also brings
difficulties to hardware design. To facilitate the hardware design, the
following modifications are adopted:

1) When IntraBC is allowed, the loop filters are disabled; these are the
deblocking filter, the CDEF (Constrained Directional Enhancement Filter), and
loop restoration. By doing this, the picture buffer of reconstructed samples
can be shared between IntraBC and inter prediction.

2) To facilitate parallel decoding, the prediction cannot exceed certain
restricted areas. For a super block, if the coordinate of its top-left position
is (x0, y0), the prediction at position (x, y) can be accessed by IntraBC if
y < y0 and x < x0 + 2 * (y0 − y).

3) To allow for the hardware write-back delay, the immediately reconstructed
areas cannot be accessed by IntraBC prediction. The restricted immediately
reconstructed area can be 1 ∼ n super blocks. So, on top of modification 2, if
the coordinate of a super block's top-left position is (x0, y0), the prediction
at position (x, y) can be accessed by IntraBC if y < y0 and
x < x0 + 2 * (y0 − y) − D, where D denotes the restricted immediately
reconstructed area. When D is one super block, the prediction area is shown in
the figure below; the availability rule is also sketched in code after the
figure.

<figure class="image"> <center><img src="img\SCC_IntraBC.svg" alt="Intra block
copy" width="600" /> <figcaption>Figure 17: The prediction area for IntraBC
mode in one super block prediction</figcaption> </figure>
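The restriction in modifications 2 and 3 can be expressed as a simple
predicate. The sketch below assumes D is given in samples (e.g., 128 for one
128×128 super block), which is an interpretation for illustration:

```python
# Sketch of the IntraBC prediction-area rule from modifications 2 and 3.
# (x0, y0) is the top-left of the current super block; (x, y) is the sample
# the predictor wants to read. D is assumed to be expressed in samples here
# (e.g., 128 for one 128x128 super block), which is an illustrative reading.

def intrabc_sample_allowed(x, y, x0, y0, d=128):
    """A sample is accessible if it lies in a row above the current super
    block row and far enough to the left, accounting for the write-back
    delay term D."""
    return y < y0 and x < x0 + 2 * (y0 - y) - d

# Samples just above the current super block are blocked by the delay term...
print(intrabc_sample_allowed(x=100, y=100, x0=0, y0=128))   # False
# ...while samples further up and to the left become available again.
print(intrabc_sample_allowed(x=0, y=0, x0=256, y0=128))     # True
```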
### Palette mode

<mark>[Ed.: to be added]</mark>

# References

[1] J. Han, Y. Xu and D. Mukherjee, "A butterfly structured design of the
hybrid transform coding scheme," 2013 Picture Coding Symposium (PCS), San Jose,
CA, 2013, pp. 17-20.\
[2] J. Li, H. Su, A. Converse, B. Li, R. Zhou, B. Lin, J. Xu, Y. Lu, and R.
Xiong, "Intra Block Copy for Screen Content in the Emerging AV1 Video Codec,"
2018 Data Compression Conference, Snowbird, Utah, USA.\
[3] S. Midtskogen and J.M. Valin, "The AV1 constrained directional enhancement
filter (CDEF)," 2018 IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 1193-1197. IEEE, 2018.\
[4] D. Mukherjee, S. Li, Y. Chen, A. Anis, S. Parker, and J. Bankoski, "A
switchable loop-restoration with side-information framework for the emerging
AV1 video codec," 2017 IEEE International Conference on Image Processing
(ICIP), pp. 265-269. IEEE, 2017.\
[5] Y. Chen, D. Mukherjee, J. Han, A. Grange, Y. Xu, Z. Liu, ... and
C.-H. Chiang, "An overview of core coding tools in the AV1 video codec," 2018
Picture Coding Symposium (PCS), pp. 41-45. IEEE, 2018.\
[6] A. Norkin and N. Birkbeck, "Film grain synthesis for AV1 video codec,"
2018 Data Compression Conference, pp. 3-12. IEEE, 2018.