Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, adding notably 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 neon implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking, pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPUs in the ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less
   RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
   - mc_avg, mc_w_avg, mc_mask
   - mc_put/mc_prep 8tap/bilin
   - mc_warp_8x8
   - mc_w_mask
   - mc_blend
   - wiener
   - SGR
   - loopfilter
   - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference
   decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for
   MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
-----------------------------

 - Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepth, 8, 10 and 12bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64bits processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors