Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of film grain (4:2:0/4:2:2/4:4:4, 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking and pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add an xxh3 muxer to the dav1d application


Changes for 0.8.1 'Eurasian hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm (up to 28% speedup) and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue with the ipred_z AVX2 functions on some specific Haswell CPUs
 - Improvements to the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements to the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements to the ARM64 msac, cdef and looprestoration optimizations
 - Improvements to the AVX2 optimizations for cdef_filter
 - Improvements to the C versions of itxfm and cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve speed by reducing L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32bits systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, nearing completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64bits processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors
237