Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 neon implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking, pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPUs in the ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
   - mc_avg, mc_w_avg, mc_mask
   - mc_put/mc_prep 8tap/bilin
   - mc_warp_8x8
   - mc_w_mask
   - mc_blend
   - wiener
   - SGR
   - loopfilter
   - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32bits systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase
   The impact is important on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crashes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepths: 8, 10 and 12bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64bits processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors