1 /* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h
2                                      no warranty implied; use at your own risk
3 
4    Do this:
5       #define STB_IMAGE_IMPLEMENTATION
6    before you include this file in *one* C or C++ file to create the implementation.
7 
8    // i.e. it should look like this:
9    #include ...
10    #include ...
11    #include ...
12    #define STB_IMAGE_IMPLEMENTATION
13    #include "stb_image.h"
14 
15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
17 
18 
19    QUICK NOTES:
20       Primarily of interest to game developers and other people who can
21           avoid problematic images and only need the trivial interface
22 
23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24       PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26       TGA (not sure what subset, if a subset)
27       BMP non-1bpp, non-RLE
28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29 
30       GIF (*comp always reports as 4-channel)
31       HDR (radiance rgbE format)
32       PIC (Softimage PIC)
33       PNM (PPM and PGM binary only)
34 
35       Animated GIF still needs a proper API, but here's one way to do it:
36           http://gist.github.com/urraka/685d9a6340b26b830d49
37 
38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39       - decode from arbitrary I/O callbacks
40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41 
42    Full documentation under "DOCUMENTATION" below.
43 
44 
45    Revision 2.00 release notes:
46 
47       - Progressive JPEG is now supported.
48 
49       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50 
51       - x86 platforms now make use of SSE2 SIMD instructions for
52         JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53         This work was done by Fabian "ryg" Giesen. SSE2 is used by
54         default, but NEON must be enabled explicitly; see docs.
55 
56         With other JPEG optimizations included in this version, we see
57         2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58         on a JPEG on an ARM machine, relative to previous versions of this
59         library. The same results will not obtain for all JPGs and for all
60         x86/ARM machines. (Note that progressive JPEGs are significantly
61         slower to decode than regular JPEGs.) This doesn't mean that this
62         is the fastest JPEG decoder in the land; rather, it brings it
63         closer to parity with standard libraries. If you want the fastest
64         decode, look elsewhere. (See "Philosophy" section of docs below.)
65 
66         See final bullet items below for more info on SIMD.
67 
68       - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69         the memory allocator. Unlike other STBI libraries, these macros don't
70         support a context parameter, so if you need to pass a context in to
71         the allocator, you'll have to store it in a global or a thread-local
72         variable.
73 
74       - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75         STBI_NO_LINEAR.
76             STBI_NO_HDR:     suppress implementation of .hdr reader format
77             STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
78 
79       - You can suppress implementation of any of the decoders to reduce
80         your code footprint by #defining one or more of the following
81         symbols before creating the implementation.
82 
83             STBI_NO_JPEG
84             STBI_NO_PNG
85             STBI_NO_BMP
86             STBI_NO_PSD
87             STBI_NO_TGA
88             STBI_NO_GIF
89             STBI_NO_HDR
90             STBI_NO_PIC
91             STBI_NO_PNM   (.ppm and .pgm)
92 
93       - You can request *only* certain decoders and suppress all other ones
94         (this will be more forward-compatible, as addition of new decoders
95         doesn't require you to disable them explicitly):
96 
97             STBI_ONLY_JPEG
98             STBI_ONLY_PNG
99             STBI_ONLY_BMP
100             STBI_ONLY_PSD
101             STBI_ONLY_TGA
102             STBI_ONLY_GIF
103             STBI_ONLY_HDR
104             STBI_ONLY_PIC
105             STBI_ONLY_PNM   (.ppm and .pgm)
106 
107          Note that you can define multiples of these, and you will get all
108          of them ("only x" and "only y" is interpreted to mean "only x&y").
109 
110        - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111          want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112 
113       - Compilation of all SIMD code can be suppressed with
114             #define STBI_NO_SIMD
115         It should not be necessary to disable SIMD unless you have issues
116         compiling (e.g. using an x86 compiler which doesn't support SSE
117         intrinsics or that doesn't support the method used to detect
118         SSE2 support at run-time), and even those can be reported as
119         bugs so I can refine the built-in compile-time checking to be
120         smarter.
121 
122       - The old STBI_SIMD system which allowed installing a user-defined
123         IDCT etc. has been removed. If you need this, don't upgrade. My
124         assumption is that almost nobody was doing this, and those who
125         were will find the built-in SIMD more satisfactory anyway.
126 
127       - RGB values computed for JPEG images are slightly different from
128         previous versions of stb_image. (This is due to using less
129         integer precision in SIMD.) The C code has been adjusted so
130         that the same RGB values will be computed regardless of whether
131         SIMD support is available, so your app should always produce
132         consistent results. But these results are slightly different from
133         previous versions. (Specifically, about 3% of available YCbCr values
134         will compute different RGB results from pre-1.49 versions by +-1;
135         most of the deviating values are one smaller in the G channel.)
136 
137       - If you must produce consistent results with previous versions of
138         stb_image, #define STBI_JPEG_OLD and you will get the same results
139         you used to; however, you will not get the SIMD speedups for
140         the YCbCr-to-RGB conversion step (although you should still see
141         significant JPEG speedup from the other changes).
142 
143         Please note that STBI_JPEG_OLD is a temporary feature; it will be
144         removed in future versions of the library. It is only intended for
145         near-term back-compatibility use.
146 
147 
148    Latest revision history:
149       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
150       2.07  (2015-09-13) partial animated GIF support
151                          limited 16-bit PSD support
152                          minor bugs, code cleanup, and compiler warnings
153       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
154       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
155       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
156       2.03  (2015-04-12) additional corruption checking
157                          stbi_set_flip_vertically_on_load
158                          fix NEON support; fix mingw support
159       2.02  (2015-01-19) fix incorrect assert, fix warning
160       2.01  (2015-01-17) fix various warnings
161       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
162       2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
163                          progressive JPEG
164                          PGM/PPM support
165                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
166                          STBI_NO_*, STBI_ONLY_*
167                          GIF bugfix
168       1.48  (2014-12-14) fix incorrectly-named assert()
169       1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
170                          optimize PNG
171                          fix bug in interlaced PNG with user-specified channel count
172 
173    See end of file for full revision history.
174 
175 
176  ============================    Contributors    =========================
177 
178  Image formats                                Bug fixes & warning fixes
179     Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
180     Nicolas Schulz (hdr, psd)                    Christpher Lloyd
181     Jonathan Dummer (tga)                        Dave Moore
182     Jean-Marc Lienher (gif)                      Won Chun
183     Tom Seddon (pic)                             the Horde3D community
184     Thatcher Ulrich (psd)                        Janez Zemva
185     Ken Miller (pgm, ppm)                        Jonathan Blow
186     urraka@github (animated gif)                 Laurent Gomila
187                                                  Aruelien Pocheville
188                                                  Ryamond Barbiero
189                                                  David Woo
190  Extensions, features                            Martin Golini
191     Jetro Lauha (stbi_info)                      Roy Eltham
192     Martin "SpartanJ" Golini (stbi_info)         Luke Graham
193     James "moose2000" Brown (iPhone PNG)         Thomas Ruf
194     Ben "Disch" Wenger (io callbacks)            John Bartholomew
195     Omar Cornut (1/2/4-bit PNG)                  Ken Hamada
196     Nicolas Guillemot (vertical flip)            Cort Stratton
197     Richard Mitton (16-bit PSD)                  Blazej Dariusz Roszkowski
198                                                  Thibault Reuille
199                                                  Paul Du Bois
200                                                  Guillaume George
201                                                  Jerry Jansson
202                                                  Hayaki Saito
203                                                  Johan Duparc
204                                                  Ronny Chevalier
205  Optimizations & bugfixes                        Michal Cichon
206     Fabian "ryg" Giesen                          Tero Hanninen
207     Arseny Kapoulkine                            Sergio Gonzalez
208                                                  Cass Everitt
209                                                  Engin Manap
210   If your name should be here but                Martins Mozeiko
211   isn't, let Sean know.                          Joseph Thomson
212                                                  Phil Jordan
213                                                  Nathan Reed
214                                                  Michaelangel007@github
215                                                  Nick Verigakis
216 
217 LICENSE
218 
219 This software is in the public domain. Where that dedication is not
220 recognized, you are granted a perpetual, irrevocable license to copy,
221 distribute, and modify this file as you see fit.
222 
223 */
224 
225 #ifndef STBI_INCLUDE_STB_IMAGE_H
226 #define STBI_INCLUDE_STB_IMAGE_H
227 
228 // DOCUMENTATION
229 //
230 // Limitations:
231 //    - no 16-bit-per-channel PNG
232 //    - no 12-bit-per-channel JPEG
233 //    - no JPEGs with arithmetic coding
234 //    - no 1-bit BMP
235 //    - GIF always returns *comp=4
236 //
237 // Basic usage (see HDR discussion below for HDR usage):
238 //    int x,y,n;
239 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
240 //    // ... process data if not NULL ...
241 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
242 //    // ... replace '0' with '1'..'4' to force that many components per pixel
243 //    // ... but 'n' will always be the number that it would have been if you said 0
244 //    stbi_image_free(data)
245 //
246 // Standard parameters:
247 //    int *x       -- outputs image width in pixels
248 //    int *y       -- outputs image height in pixels
249 //    int *comp    -- outputs # of image components in image file
250 //    int req_comp -- if non-zero, # of image components requested in result
251 //
252 // The return value from an image loader is an 'unsigned char *' which points
253 // to the pixel data, or NULL on an allocation failure or if the image is
254 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
255 // with each pixel consisting of N interleaved 8-bit components; the first
256 // pixel pointed to is top-left-most in the image. There is no padding between
257 // image scanlines or between pixels, regardless of format. The number of
258 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
259 // If req_comp is non-zero, *comp has the number of components that _would_
260 // have been output otherwise. E.g. if you set req_comp to 4, you will always
261 // get RGBA output, but you can check *comp to see if it's trivially opaque
262 // because e.g. there were only 3 channels in the source image.
263 //
264 // An output image with N components has the following components interleaved
265 // in this order in each pixel:
266 //
267 //     N=#comp     components
268 //       1           grey
269 //       2           grey, alpha
270 //       3           red, green, blue
271 //       4           red, green, blue, alpha
272 //
273 // If image loading fails for any reason, the return value will be NULL,
274 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
275 // can be queried for an extremely brief, end-user unfriendly explanation
276 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
277 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
278 // more user-friendly ones.
279 //
280 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
281 //
282 // ===========================================================================
283 //
284 // Philosophy
285 //
286 // stb libraries are designed with the following priorities:
287 //
288 //    1. easy to use
289 //    2. easy to maintain
290 //    3. good performance
291 //
292 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
293 // and for best performance I may provide less-easy-to-use APIs that give higher
294 // performance, in addition to the easy to use ones. Nevertheless, it's important
295 // to keep in mind that from the standpoint of you, a client of this library,
296 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
297 //
298 // Some secondary priorities arise directly from the first two, some of which
299 // make more explicit reasons why performance can't be emphasized.
300 //
301 //    - Portable ("ease of use")
302 //    - Small footprint ("easy to maintain")
303 //    - No dependencies ("ease of use")
304 //
305 // ===========================================================================
306 //
307 // I/O callbacks
308 //
309 // I/O callbacks allow you to read from arbitrary sources, like packaged
310 // files or some other source. Data read from callbacks are processed
311 // through a small internal buffer (currently 128 bytes) to try to reduce
312 // overhead.
313 //
314 // The three functions you must define are "read" (reads some bytes of data),
315 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
316 //
317 // ===========================================================================
318 //
319 // SIMD support
320 //
321 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
322 // supported by the compiler. For ARM Neon support, you must explicitly
323 // request it.
324 //
325 // (The old do-it-yourself SIMD API is no longer supported in the current
326 // code.)
327 //
328 // On x86, SSE2 will automatically be used when available based on a run-time
329 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
330 // the typical path is to have separate builds for NEON and non-NEON devices
331 // (at least this is true for iOS and Android). Therefore, the NEON support is
332 // toggled by a build flag: define STBI_NEON to get NEON loops.
333 //
334 // The output of the JPEG decoder is slightly different from versions where
335 // SIMD support was introduced (that is, for versions before 1.49). The
336 // difference is only +-1 in the 8-bit RGB channels, and only on a small
337 // fraction of pixels. You can force the pre-1.49 behavior by defining
338 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
339 // and hence cost some performance.
340 //
341 // If for some reason you do not want to use any of SIMD code, or if
342 // you have issues compiling it, you can disable it entirely by
343 // defining STBI_NO_SIMD.
344 //
345 // ===========================================================================
346 //
347 // HDR image support   (disable by defining STBI_NO_HDR)
348 //
349 // stb_image now supports loading HDR images in general, and currently
350 // the Radiance .HDR file format, although the support is provided
351 // generically. You can still load any file through the existing interface;
352 // if you attempt to load an HDR file, it will be automatically remapped to
353 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
354 // both of these constants can be reconfigured through this interface:
355 //
356 //     stbi_hdr_to_ldr_gamma(2.2f);
357 //     stbi_hdr_to_ldr_scale(1.0f);
358 //
359 // (note, do not use _inverse_ constants; stbi_image will invert them
360 // appropriately).
361 //
362 // Additionally, there is a new, parallel interface for loading files as
363 // (linear) floats to preserve the full dynamic range:
364 //
365 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
366 //
367 // If you load LDR images through this interface, those images will
368 // be promoted to floating point values, run through the inverse of
369 // constants corresponding to the above:
370 //
371 //     stbi_ldr_to_hdr_scale(1.0f);
372 //     stbi_ldr_to_hdr_gamma(2.2f);
373 //
374 // Finally, given a filename (or an open file or memory block--see header
375 // file for details) containing image data, you can query for the "most
376 // appropriate" interface to use (that is, whether the image is HDR or
377 // not), using:
378 //
379 //     stbi_is_hdr(char *filename);
380 //
381 // ===========================================================================
382 //
383 // iPhone PNG support:
384 //
385 // By default we convert iphone-formatted PNGs back to RGB, even though
386 // they are internally encoded differently. You can disable this conversion
387 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
388 // you will always just get the native iphone "format" through (which
389 // is BGR stored in RGB).
390 //
391 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
392 // pixel to remove any premultiplied alpha *only* if the image file explicitly
393 // says there's premultiplied data (currently only happens in iPhone images,
394 // and only if iPhone convert-to-rgb processing is on).
395 //
396 
397 
398 #ifndef STBI_NO_STDIO
399 #include <stdio.h>
400 #endif // STBI_NO_STDIO
401 
402 #define STBI_VERSION 1
403 
404 enum
405 {
406    STBI_default = 0, // only used for req_comp
407 
408    STBI_grey       = 1,
409    STBI_grey_alpha = 2,
410    STBI_rgb        = 3,
411    STBI_rgb_alpha  = 4
412 };
413 
414 typedef unsigned char stbi_uc;
415 
416 #ifdef __cplusplus
417 extern "C" {
418 #endif
419 
420 #ifdef STB_IMAGE_STATIC
421 #define STBIDEF static
422 #else
423 #define STBIDEF extern
424 #endif
425 
426 //////////////////////////////////////////////////////////////////////////////
427 //
428 // PRIMARY API - works on images of any type
429 //
430 
431 //
432 // load image by filename, open file, or memory buffer
433 //
434 
435 typedef struct
436 {
437    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
438    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
439    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
440 } stbi_io_callbacks;
441 
442 STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
443 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
444 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
445 
446 #ifndef STBI_NO_STDIO
447 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
448 // for stbi_load_from_file, file pointer is left pointing immediately after image
449 #endif
450 
451 #ifndef STBI_NO_LINEAR
452    STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
453    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
454    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
455 
456    #ifndef STBI_NO_STDIO
457    STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
458    #endif
459 #endif
460 
461 #ifndef STBI_NO_HDR
462    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
463    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
464 #endif
465 
466 #ifndef STBI_NO_LINEAR
467    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
468    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
469 #endif // STBI_NO_HDR
470 
471 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
472 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
473 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
474 #ifndef STBI_NO_STDIO
475 STBIDEF int      stbi_is_hdr          (char const *filename);
476 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
477 #endif // STBI_NO_STDIO
478 
479 
480 // get a VERY brief reason for failure
481 // NOT THREADSAFE
482 STBIDEF const char *stbi_failure_reason  (void);
483 
484 // free the loaded image -- this is just free()
485 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
486 
487 // get image dimensions & components without fully decoding
488 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
489 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
490 
491 #ifndef STBI_NO_STDIO
492 STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
493 STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
494 
495 #endif
496 
497 
498 
499 // for image formats that explicitly notate that they have premultiplied alpha,
500 // we just return the colors as stored in the file. set this flag to force
501 // unpremultiplication. results are undefined if the unpremultiply overflow.
502 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
503 
504 // indicate whether we should process iphone images back to canonical format,
505 // or just pass them through "as-is"
506 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
507 
508 // flip the image vertically, so the first pixel in the output array is the bottom left
509 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
510 
511 // ZLIB client - used by PNG, available for other purposes
512 
513 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
514 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
515 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
516 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
517 
518 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
519 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
520 
521 
522 #ifdef __cplusplus
523 }
524 #endif
525 
526 //
527 //
528 ////   end header file   /////////////////////////////////////////////////////
529 #endif // STBI_INCLUDE_STB_IMAGE_H
530 
531 #ifdef STB_IMAGE_IMPLEMENTATION
532 
533 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
534   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
535   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
536   || defined(STBI_ONLY_ZLIB)
537    #ifndef STBI_ONLY_JPEG
538    #define STBI_NO_JPEG
539    #endif
540    #ifndef STBI_ONLY_PNG
541    #define STBI_NO_PNG
542    #endif
543    #ifndef STBI_ONLY_BMP
544    #define STBI_NO_BMP
545    #endif
546    #ifndef STBI_ONLY_PSD
547    #define STBI_NO_PSD
548    #endif
549    #ifndef STBI_ONLY_TGA
550    #define STBI_NO_TGA
551    #endif
552    #ifndef STBI_ONLY_GIF
553    #define STBI_NO_GIF
554    #endif
555    #ifndef STBI_ONLY_HDR
556    #define STBI_NO_HDR
557    #endif
558    #ifndef STBI_ONLY_PIC
559    #define STBI_NO_PIC
560    #endif
561    #ifndef STBI_ONLY_PNM
562    #define STBI_NO_PNM
563    #endif
564 #endif
565 
566 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
567 #define STBI_NO_ZLIB
568 #endif
569 
570 
571 #include <stdarg.h>
572 #include <stddef.h> // ptrdiff_t on osx
573 #include <stdlib.h>
574 #include <string.h>
575 
576 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
577 #include <math.h>  // ldexp
578 #endif
579 
580 #ifndef STBI_NO_STDIO
581 #include <stdio.h>
582 #endif
583 
584 #ifndef STBI_ASSERT
585 #include <assert.h>
586 #define STBI_ASSERT(x) assert(x)
587 #endif
588 
589 
590 #ifndef _MSC_VER
591    #ifdef __cplusplus
592    #define stbi_inline inline
593    #else
594    #define stbi_inline
595    #endif
596 #else
597    #define stbi_inline __forceinline
598 #endif
599 
600 
601 #ifdef _MSC_VER
602 typedef unsigned short stbi__uint16;
603 typedef   signed short stbi__int16;
604 typedef unsigned int   stbi__uint32;
605 typedef   signed int   stbi__int32;
606 #else
607 #include <stdint.h>
608 typedef uint16_t stbi__uint16;
609 typedef int16_t  stbi__int16;
610 typedef uint32_t stbi__uint32;
611 typedef int32_t  stbi__int32;
612 #endif
613 
614 // should produce compiler error if size is wrong
615 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
616 
617 #ifdef _MSC_VER
618 #define STBI_NOTUSED(v)  (void)(v)
619 #else
620 #define STBI_NOTUSED(v)  (void)sizeof(v)
621 #endif
622 
623 #ifdef _MSC_VER
624 #define STBI_HAS_LROTL
625 #endif
626 
627 #ifdef STBI_HAS_LROTL
628    #define stbi_lrot(x,y)  _lrotl(x,y)
629 #else
630    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
631 #endif
632 
633 #if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
634 // ok
635 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
636 // ok
637 #else
638 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
639 #endif
640 
641 #ifndef STBI_MALLOC
642 #define STBI_MALLOC(sz)    malloc(sz)
643 #define STBI_REALLOC(p,sz) realloc(p,sz)
644 #define STBI_FREE(p)       free(p)
645 #endif
646 
647 // x86/x64 detection
648 #if defined(__x86_64__) || defined(_M_X64)
649 #define STBI__X64_TARGET
650 #elif defined(__i386) || defined(_M_IX86)
651 #define STBI__X86_TARGET
652 #endif
653 
654 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
655 // NOTE: not clear do we actually need this for the 64-bit path?
656 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
657 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
658 // this is just broken and gcc are jerks for not fixing it properly
659 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
660 #define STBI_NO_SIMD
661 #endif
662 
663 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
664 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
665 //
666 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
667 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
668 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
669 // simultaneously enabling "-mstackrealign".
670 //
671 // See https://github.com/nothings/stb/issues/81 for more information.
672 //
673 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
674 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
675 #define STBI_NO_SIMD
676 #endif
677 
678 #if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
679 #define STBI_SSE2
680 #include <emmintrin.h>
681 
682 #ifdef _MSC_VER
683 
684 #if _MSC_VER >= 1400  // not VC6
685 #include <intrin.h> // __cpuid
stbi__cpuid3(void)686 static int stbi__cpuid3(void)
687 {
688    int info[4];
689    __cpuid(info,1);
690    return info[3];
691 }
692 #else
stbi__cpuid3(void)693 static int stbi__cpuid3(void)
694 {
695    int res;
696    __asm {
697       mov  eax,1
698       cpuid
699       mov  res,edx
700    }
701    return res;
702 }
703 #endif
704 
705 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
706 
stbi__sse2_available()707 static int stbi__sse2_available()
708 {
709    int info3 = stbi__cpuid3();
710    return ((info3 >> 26) & 1) != 0;
711 }
712 #else // assume GCC-style if not VC++
713 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
714 
stbi__sse2_available()715 static int stbi__sse2_available()
716 {
717 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
718    // GCC 4.8+ has a nice way to do this
719    return __builtin_cpu_supports("sse2");
720 #else
721    // portable way to do this, preferably without using GCC inline ASM?
722    // just bail for now.
723    return 0;
724 #endif
725 }
726 #endif
727 #endif
728 
729 // ARM NEON
730 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
731 #undef STBI_NEON
732 #endif
733 
734 #ifdef STBI_NEON
735 #include <arm_neon.h>
736 // assume GCC or Clang on ARM targets
737 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
738 #endif
739 
740 #ifndef STBI_SIMD_ALIGN
741 #define STBI_SIMD_ALIGN(type, name) type name
742 #endif
743 
744 ///////////////////////////////////////////////
745 //
746 //  stbi__context struct and start_xxx functions
747 
748 // stbi__context structure is our basic context used by all images, so it
749 // contains all the IO context, plus some basic image information
750 typedef struct
751 {
752    stbi__uint32 img_x, img_y;
753    int img_n, img_out_n;
754 
755    stbi_io_callbacks io;
756    void *io_user_data;
757 
758    int read_from_callbacks;
759    int buflen;
760    stbi_uc buffer_start[128];
761 
762    stbi_uc *img_buffer, *img_buffer_end;
763    stbi_uc *img_buffer_original, *img_buffer_original_end;
764 } stbi__context;
765 
766 
767 static void stbi__refill_buffer(stbi__context *s);
768 
769 // initialize a memory-decode context
stbi__start_mem(stbi__context * s,stbi_uc const * buffer,int len)770 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
771 {
772    s->io.read = NULL;
773    s->read_from_callbacks = 0;
774    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
775    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
776 }
777 
778 // initialize a callback-based context
stbi__start_callbacks(stbi__context * s,stbi_io_callbacks * c,void * user)779 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
780 {
781    s->io = *c;
782    s->io_user_data = user;
783    s->buflen = sizeof(s->buffer_start);
784    s->read_from_callbacks = 1;
785    s->img_buffer_original = s->buffer_start;
786    stbi__refill_buffer(s);
787    s->img_buffer_original_end = s->img_buffer_end;
788 }
789 
790 #ifndef STBI_NO_STDIO
791 
stbi__stdio_read(void * user,char * data,int size)792 static int stbi__stdio_read(void *user, char *data, int size)
793 {
794    return (int) fread(data,1,size,(FILE*) user);
795 }
796 
stbi__stdio_skip(void * user,int n)797 static void stbi__stdio_skip(void *user, int n)
798 {
799    fseek((FILE*) user, n, SEEK_CUR);
800 }
801 
stbi__stdio_eof(void * user)802 static int stbi__stdio_eof(void *user)
803 {
804    return feof((FILE*) user);
805 }
806 
807 static stbi_io_callbacks stbi__stdio_callbacks =
808 {
809    stbi__stdio_read,
810    stbi__stdio_skip,
811    stbi__stdio_eof,
812 };
813 
stbi__start_file(stbi__context * s,FILE * f)814 static void stbi__start_file(stbi__context *s, FILE *f)
815 {
816    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
817 }
818 
819 //static void stop_file(stbi__context *s) { }
820 
821 #endif // !STBI_NO_STDIO
822 
stbi__rewind(stbi__context * s)823 static void stbi__rewind(stbi__context *s)
824 {
825    // conceptually rewind SHOULD rewind to the beginning of the stream,
826    // but we just rewind to the beginning of the initial buffer, because
827    // we only use it after doing 'test', which only ever looks at at most 92 bytes
828    s->img_buffer = s->img_buffer_original;
829    s->img_buffer_end = s->img_buffer_original_end;
830 }
831 
832 #ifndef STBI_NO_JPEG
833 static int      stbi__jpeg_test(stbi__context *s);
834 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
835 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
836 #endif
837 
838 #ifndef STBI_NO_PNG
839 static int      stbi__png_test(stbi__context *s);
840 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
841 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
842 #endif
843 
844 #ifndef STBI_NO_BMP
845 static int      stbi__bmp_test(stbi__context *s);
846 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
847 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
848 #endif
849 
850 #ifndef STBI_NO_TGA
851 static int      stbi__tga_test(stbi__context *s);
852 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
853 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
854 #endif
855 
856 #ifndef STBI_NO_PSD
857 static int      stbi__psd_test(stbi__context *s);
858 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
859 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
860 #endif
861 
862 #ifndef STBI_NO_HDR
863 static int      stbi__hdr_test(stbi__context *s);
864 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
865 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
866 #endif
867 
868 #ifndef STBI_NO_PIC
869 static int      stbi__pic_test(stbi__context *s);
870 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
871 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
872 #endif
873 
874 #ifndef STBI_NO_GIF
875 static int      stbi__gif_test(stbi__context *s);
876 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
877 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
878 #endif
879 
880 #ifndef STBI_NO_PNM
881 static int      stbi__pnm_test(stbi__context *s);
882 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
883 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
884 #endif
885 
886 // this is not threadsafe
887 static const char *stbi__g_failure_reason;
888 
stbi_failure_reason(void)889 STBIDEF const char *stbi_failure_reason(void)
890 {
891    return stbi__g_failure_reason;
892 }
893 
stbi__err(const char * str)894 static int stbi__err(const char *str)
895 {
896    stbi__g_failure_reason = str;
897    return 0;
898 }
899 
stbi__malloc(size_t size)900 static void *stbi__malloc(size_t size)
901 {
902     return STBI_MALLOC(size);
903 }
904 
905 // stbi__err - error
906 // stbi__errpf - error returning pointer to float
907 // stbi__errpuc - error returning pointer to unsigned char
908 
909 #ifdef STBI_NO_FAILURE_STRINGS
910    #define stbi__err(x,y)  0
911 #elif defined(STBI_FAILURE_USERMSG)
912    #define stbi__err(x,y)  stbi__err(y)
913 #else
914    #define stbi__err(x,y)  stbi__err(x)
915 #endif
916 
917 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
918 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
919 
stbi_image_free(void * retval_from_stbi_load)920 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
921 {
922    STBI_FREE(retval_from_stbi_load);
923 }
924 
925 #ifndef STBI_NO_LINEAR
926 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
927 #endif
928 
929 #ifndef STBI_NO_HDR
930 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
931 #endif
932 
933 static int stbi__vertically_flip_on_load = 0;
934 
stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)935 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
936 {
937     stbi__vertically_flip_on_load = flag_true_if_should_flip;
938 }
939 
stbi__load_main(stbi__context * s,int * x,int * y,int * comp,int req_comp)940 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
941 {
942    #ifndef STBI_NO_JPEG
943    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
944    #endif
945    #ifndef STBI_NO_PNG
946    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
947    #endif
948    #ifndef STBI_NO_BMP
949    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
950    #endif
951    #ifndef STBI_NO_GIF
952    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
953    #endif
954    #ifndef STBI_NO_PSD
955    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
956    #endif
957    #ifndef STBI_NO_PIC
958    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
959    #endif
960    #ifndef STBI_NO_PNM
961    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
962    #endif
963 
964    #ifndef STBI_NO_HDR
965    if (stbi__hdr_test(s)) {
966       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
967       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
968    }
969    #endif
970 
971    #ifndef STBI_NO_TGA
972    // test tga last because it's a crappy test!
973    if (stbi__tga_test(s))
974       return stbi__tga_load(s,x,y,comp,req_comp);
975    #endif
976 
977    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
978 }
979 
stbi__load_flip(stbi__context * s,int * x,int * y,int * comp,int req_comp)980 static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
981 {
982    unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
983 
984    if (stbi__vertically_flip_on_load && result != NULL) {
985       int w = *x, h = *y;
986       int depth = req_comp ? req_comp : *comp;
987       int row,col,z;
988       stbi_uc temp;
989 
990       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
991       for (row = 0; row < (h>>1); row++) {
992          for (col = 0; col < w; col++) {
993             for (z = 0; z < depth; z++) {
994                temp = result[(row * w + col) * depth + z];
995                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
996                result[((h - row - 1) * w + col) * depth + z] = temp;
997             }
998          }
999       }
1000    }
1001 
1002    return result;
1003 }
1004 
1005 #ifndef STBI_NO_HDR
stbi__float_postprocess(float * result,int * x,int * y,int * comp,int req_comp)1006 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1007 {
1008    if (stbi__vertically_flip_on_load && result != NULL) {
1009       int w = *x, h = *y;
1010       int depth = req_comp ? req_comp : *comp;
1011       int row,col,z;
1012       float temp;
1013 
1014       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1015       for (row = 0; row < (h>>1); row++) {
1016          for (col = 0; col < w; col++) {
1017             for (z = 0; z < depth; z++) {
1018                temp = result[(row * w + col) * depth + z];
1019                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1020                result[((h - row - 1) * w + col) * depth + z] = temp;
1021             }
1022          }
1023       }
1024    }
1025 }
1026 #endif
1027 
1028 #ifndef STBI_NO_STDIO
1029 
stbi__fopen(char const * filename,char const * mode)1030 static FILE *stbi__fopen(char const *filename, char const *mode)
1031 {
1032    FILE *f;
1033 #if defined(_MSC_VER) && _MSC_VER >= 1400
1034    if (0 != fopen_s(&f, filename, mode))
1035       f=0;
1036 #else
1037    f = fopen(filename, mode);
1038 #endif
1039    return f;
1040 }
1041 
1042 
stbi_load(char const * filename,int * x,int * y,int * comp,int req_comp)1043 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1044 {
1045    FILE *f = stbi__fopen(filename, "rb");
1046    unsigned char *result;
1047    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1048    result = stbi_load_from_file(f,x,y,comp,req_comp);
1049    fclose(f);
1050    return result;
1051 }
1052 
stbi_load_from_file(FILE * f,int * x,int * y,int * comp,int req_comp)1053 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1054 {
1055    unsigned char *result;
1056    stbi__context s;
1057    stbi__start_file(&s,f);
1058    result = stbi__load_flip(&s,x,y,comp,req_comp);
1059    if (result) {
1060       // need to 'unget' all the characters in the IO buffer
1061       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1062    }
1063    return result;
1064 }
1065 #endif //!STBI_NO_STDIO
1066 
stbi_load_from_memory(stbi_uc const * buffer,int len,int * x,int * y,int * comp,int req_comp)1067 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1068 {
1069    stbi__context s;
1070    stbi__start_mem(&s,buffer,len);
1071    return stbi__load_flip(&s,x,y,comp,req_comp);
1072 }
1073 
stbi_load_from_callbacks(stbi_io_callbacks const * clbk,void * user,int * x,int * y,int * comp,int req_comp)1074 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1075 {
1076    stbi__context s;
1077    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1078    return stbi__load_flip(&s,x,y,comp,req_comp);
1079 }
1080 
1081 #ifndef STBI_NO_LINEAR
stbi__loadf_main(stbi__context * s,int * x,int * y,int * comp,int req_comp)1082 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1083 {
1084    unsigned char *data;
1085    #ifndef STBI_NO_HDR
1086    if (stbi__hdr_test(s)) {
1087       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1088       if (hdr_data)
1089          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1090       return hdr_data;
1091    }
1092    #endif
1093    data = stbi__load_flip(s, x, y, comp, req_comp);
1094    if (data)
1095       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1096    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1097 }
1098 
stbi_loadf_from_memory(stbi_uc const * buffer,int len,int * x,int * y,int * comp,int req_comp)1099 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1100 {
1101    stbi__context s;
1102    stbi__start_mem(&s,buffer,len);
1103    return stbi__loadf_main(&s,x,y,comp,req_comp);
1104 }
1105 
stbi_loadf_from_callbacks(stbi_io_callbacks const * clbk,void * user,int * x,int * y,int * comp,int req_comp)1106 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1107 {
1108    stbi__context s;
1109    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1110    return stbi__loadf_main(&s,x,y,comp,req_comp);
1111 }
1112 
1113 #ifndef STBI_NO_STDIO
stbi_loadf(char const * filename,int * x,int * y,int * comp,int req_comp)1114 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1115 {
1116    float *result;
1117    FILE *f = stbi__fopen(filename, "rb");
1118    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1119    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1120    fclose(f);
1121    return result;
1122 }
1123 
stbi_loadf_from_file(FILE * f,int * x,int * y,int * comp,int req_comp)1124 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1125 {
1126    stbi__context s;
1127    stbi__start_file(&s,f);
1128    return stbi__loadf_main(&s,x,y,comp,req_comp);
1129 }
1130 #endif // !STBI_NO_STDIO
1131 
1132 #endif // !STBI_NO_LINEAR
1133 
1134 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1135 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1136 // reports false!
1137 
stbi_is_hdr_from_memory(stbi_uc const * buffer,int len)1138 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1139 {
1140    #ifndef STBI_NO_HDR
1141    stbi__context s;
1142    stbi__start_mem(&s,buffer,len);
1143    return stbi__hdr_test(&s);
1144    #else
1145    STBI_NOTUSED(buffer);
1146    STBI_NOTUSED(len);
1147    return 0;
1148    #endif
1149 }
1150 
1151 #ifndef STBI_NO_STDIO
stbi_is_hdr(char const * filename)1152 STBIDEF int      stbi_is_hdr          (char const *filename)
1153 {
1154    FILE *f = stbi__fopen(filename, "rb");
1155    int result=0;
1156    if (f) {
1157       result = stbi_is_hdr_from_file(f);
1158       fclose(f);
1159    }
1160    return result;
1161 }
1162 
stbi_is_hdr_from_file(FILE * f)1163 STBIDEF int      stbi_is_hdr_from_file(FILE *f)
1164 {
1165    #ifndef STBI_NO_HDR
1166    stbi__context s;
1167    stbi__start_file(&s,f);
1168    return stbi__hdr_test(&s);
1169    #else
1170    STBI_NOTUSED(f);
1171    return 0;
1172    #endif
1173 }
1174 #endif // !STBI_NO_STDIO
1175 
stbi_is_hdr_from_callbacks(stbi_io_callbacks const * clbk,void * user)1176 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1177 {
1178    #ifndef STBI_NO_HDR
1179    stbi__context s;
1180    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1181    return stbi__hdr_test(&s);
1182    #else
1183    STBI_NOTUSED(clbk);
1184    STBI_NOTUSED(user);
1185    return 0;
1186    #endif
1187 }
1188 
1189 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1190 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1191 
1192 #ifndef STBI_NO_LINEAR
stbi_ldr_to_hdr_gamma(float gamma)1193 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
stbi_ldr_to_hdr_scale(float scale)1194 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1195 #endif
1196 
stbi_hdr_to_ldr_gamma(float gamma)1197 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
stbi_hdr_to_ldr_scale(float scale)1198 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1199 
1200 
1201 //////////////////////////////////////////////////////////////////////////////
1202 //
1203 // Common code used by all image loaders
1204 //
1205 
1206 enum
1207 {
1208    STBI__SCAN_load=0,
1209    STBI__SCAN_type,
1210    STBI__SCAN_header
1211 };
1212 
stbi__refill_buffer(stbi__context * s)1213 static void stbi__refill_buffer(stbi__context *s)
1214 {
1215    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1216    if (n == 0) {
1217       // at end of file, treat same as if from memory, but need to handle case
1218       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1219       s->read_from_callbacks = 0;
1220       s->img_buffer = s->buffer_start;
1221       s->img_buffer_end = s->buffer_start+1;
1222       *s->img_buffer = 0;
1223    } else {
1224       s->img_buffer = s->buffer_start;
1225       s->img_buffer_end = s->buffer_start + n;
1226    }
1227 }
1228 
stbi__get8(stbi__context * s)1229 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1230 {
1231    if (s->img_buffer < s->img_buffer_end)
1232       return *s->img_buffer++;
1233    if (s->read_from_callbacks) {
1234       stbi__refill_buffer(s);
1235       return *s->img_buffer++;
1236    }
1237    return 0;
1238 }
1239 
stbi__at_eof(stbi__context * s)1240 stbi_inline static int stbi__at_eof(stbi__context *s)
1241 {
1242    if (s->io.read) {
1243       if (!(s->io.eof)(s->io_user_data)) return 0;
1244       // if feof() is true, check if buffer = end
1245       // special case: we've only got the special 0 character at the end
1246       if (s->read_from_callbacks == 0) return 1;
1247    }
1248 
1249    return s->img_buffer >= s->img_buffer_end;
1250 }
1251 
stbi__skip(stbi__context * s,int n)1252 static void stbi__skip(stbi__context *s, int n)
1253 {
1254    if (n < 0) {
1255       s->img_buffer = s->img_buffer_end;
1256       return;
1257    }
1258    if (s->io.read) {
1259       int blen = (int) (s->img_buffer_end - s->img_buffer);
1260       if (blen < n) {
1261          s->img_buffer = s->img_buffer_end;
1262          (s->io.skip)(s->io_user_data, n - blen);
1263          return;
1264       }
1265    }
1266    s->img_buffer += n;
1267 }
1268 
stbi__getn(stbi__context * s,stbi_uc * buffer,int n)1269 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1270 {
1271    if (s->io.read) {
1272       int blen = (int) (s->img_buffer_end - s->img_buffer);
1273       if (blen < n) {
1274          int res, count;
1275 
1276          memcpy(buffer, s->img_buffer, blen);
1277 
1278          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1279          res = (count == (n-blen));
1280          s->img_buffer = s->img_buffer_end;
1281          return res;
1282       }
1283    }
1284 
1285    if (s->img_buffer+n <= s->img_buffer_end) {
1286       memcpy(buffer, s->img_buffer, n);
1287       s->img_buffer += n;
1288       return 1;
1289    } else
1290       return 0;
1291 }
1292 
stbi__get16be(stbi__context * s)1293 static int stbi__get16be(stbi__context *s)
1294 {
1295    int z = stbi__get8(s);
1296    return (z << 8) + stbi__get8(s);
1297 }
1298 
stbi__get32be(stbi__context * s)1299 static stbi__uint32 stbi__get32be(stbi__context *s)
1300 {
1301    stbi__uint32 z = stbi__get16be(s);
1302    return (z << 16) + stbi__get16be(s);
1303 }
1304 
1305 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1306 // nothing
1307 #else
stbi__get16le(stbi__context * s)1308 static int stbi__get16le(stbi__context *s)
1309 {
1310    int z = stbi__get8(s);
1311    return z + (stbi__get8(s) << 8);
1312 }
1313 #endif
1314 
1315 #ifndef STBI_NO_BMP
stbi__get32le(stbi__context * s)1316 static stbi__uint32 stbi__get32le(stbi__context *s)
1317 {
1318    stbi__uint32 z = stbi__get16le(s);
1319    return z + (stbi__get16le(s) << 16);
1320 }
1321 #endif
1322 
1323 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1324 
1325 
1326 //////////////////////////////////////////////////////////////////////////////
1327 //
1328 //  generic converter from built-in img_n to req_comp
1329 //    individual types do this automatically as much as possible (e.g. jpeg
1330 //    does all cases internally since it needs to colorspace convert anyway,
1331 //    and it never has alpha, so very few cases ). png can automatically
1332 //    interleave an alpha=255 channel, but falls back to this for other cases
1333 //
1334 //  assume data buffer is malloced, so malloc a new one and free that one
1335 //  only failure mode is malloc failing
1336 
stbi__compute_y(int r,int g,int b)1337 static stbi_uc stbi__compute_y(int r, int g, int b)
1338 {
1339    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1340 }
1341 
stbi__convert_format(unsigned char * data,int img_n,int req_comp,unsigned int x,unsigned int y)1342 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1343 {
1344    int i,j;
1345    unsigned char *good;
1346 
1347    if (req_comp == img_n) return data;
1348    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1349 
1350    good = (unsigned char *) stbi__malloc(req_comp * x * y);
1351    if (good == NULL) {
1352       STBI_FREE(data);
1353       return stbi__errpuc("outofmem", "Out of memory");
1354    }
1355 
1356    for (j=0; j < (int) y; ++j) {
1357       unsigned char *src  = data + j * x * img_n   ;
1358       unsigned char *dest = good + j * x * req_comp;
1359 
1360       #define COMBO(a,b)  ((a)*8+(b))
1361       #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1362       // convert source image with img_n components to one with req_comp components;
1363       // avoid switch per pixel, so use switch per scanline and massive macros
1364       switch (COMBO(img_n, req_comp)) {
1365          CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1366          CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1367          CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1368          CASE(2,1) dest[0]=src[0]; break;
1369          CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1370          CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1371          CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1372          CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1373          CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1374          CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1375          CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1376          CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1377          default: STBI_ASSERT(0);
1378       }
1379       #undef CASE
1380    }
1381 
1382    STBI_FREE(data);
1383    return good;
1384 }
1385 
1386 #ifndef STBI_NO_LINEAR
stbi__ldr_to_hdr(stbi_uc * data,int x,int y,int comp)1387 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1388 {
1389    int i,k,n;
1390    float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1391    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1392    // compute number of non-alpha components
1393    if (comp & 1) n = comp; else n = comp-1;
1394    for (i=0; i < x*y; ++i) {
1395       for (k=0; k < n; ++k) {
1396          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1397       }
1398       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1399    }
1400    STBI_FREE(data);
1401    return output;
1402 }
1403 #endif
1404 
1405 #ifndef STBI_NO_HDR
1406 #define stbi__float2int(x)   ((int) (x))
stbi__hdr_to_ldr(float * data,int x,int y,int comp)1407 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1408 {
1409    int i,k,n;
1410    stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1411    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1412    // compute number of non-alpha components
1413    if (comp & 1) n = comp; else n = comp-1;
1414    for (i=0; i < x*y; ++i) {
1415       for (k=0; k < n; ++k) {
1416          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1417          if (z < 0) z = 0;
1418          if (z > 255) z = 255;
1419          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1420       }
1421       if (k < comp) {
1422          float z = data[i*comp+k] * 255 + 0.5f;
1423          if (z < 0) z = 0;
1424          if (z > 255) z = 255;
1425          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1426       }
1427    }
1428    STBI_FREE(data);
1429    return output;
1430 }
1431 #endif
1432 
1433 //////////////////////////////////////////////////////////////////////////////
1434 //
1435 //  "baseline" JPEG/JFIF decoder
1436 //
1437 //    simple implementation
1438 //      - doesn't support delayed output of y-dimension
1439 //      - simple interface (only one output format: 8-bit interleaved RGB)
1440 //      - doesn't try to recover corrupt jpegs
1441 //      - doesn't allow partial loading, loading multiple at once
1442 //      - still fast on x86 (copying globals into locals doesn't help x86)
1443 //      - allocates lots of intermediate memory (full size of all components)
1444 //        - non-interleaved case requires this anyway
1445 //        - allows good upsampling (see next)
1446 //    high-quality
1447 //      - upsampled channels are bilinearly interpolated, even across blocks
1448 //      - quality integer IDCT derived from IJG's 'slow'
1449 //    performance
1450 //      - fast huffman; reasonable integer IDCT
1451 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1452 //      - uses a lot of intermediate memory, could cache poorly
1453 
1454 #ifndef STBI_NO_JPEG
1455 
1456 // huffman decoding acceleration
1457 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1458 
1459 typedef struct
1460 {
1461    stbi_uc  fast[1 << FAST_BITS];
1462    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1463    stbi__uint16 code[256];
1464    stbi_uc  values[256];
1465    stbi_uc  size[257];
1466    unsigned int maxcode[18];
1467    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1468 } stbi__huffman;
1469 
1470 typedef struct
1471 {
1472    stbi__context *s;
1473    stbi__huffman huff_dc[4];
1474    stbi__huffman huff_ac[4];
1475    stbi_uc dequant[4][64];
1476    stbi__int16 fast_ac[4][1 << FAST_BITS];
1477 
1478 // sizes for components, interleaved MCUs
1479    int img_h_max, img_v_max;
1480    int img_mcu_x, img_mcu_y;
1481    int img_mcu_w, img_mcu_h;
1482 
1483 // definition of jpeg image component
1484    struct
1485    {
1486       int id;
1487       int h,v;
1488       int tq;
1489       int hd,ha;
1490       int dc_pred;
1491 
1492       int x,y,w2,h2;
1493       stbi_uc *data;
1494       void *raw_data, *raw_coeff;
1495       stbi_uc *linebuf;
1496       short   *coeff;   // progressive only
1497       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1498    } img_comp[4];
1499 
1500    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1501    int            code_bits;   // number of valid bits
1502    unsigned char  marker;      // marker seen while filling entropy buffer
1503    int            nomore;      // flag if we saw a marker so must stop
1504 
1505    int            progressive;
1506    int            spec_start;
1507    int            spec_end;
1508    int            succ_high;
1509    int            succ_low;
1510    int            eob_run;
1511 
1512    int scan_n, order[4];
1513    int restart_interval, todo;
1514 
1515 // kernels
1516    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1517    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1518    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1519 } stbi__jpeg;
1520 
stbi__build_huffman(stbi__huffman * h,int * count)1521 static int stbi__build_huffman(stbi__huffman *h, int *count)
1522 {
1523    int i,j,k=0,code;
1524    // build size list for each symbol (from JPEG spec)
1525    for (i=0; i < 16; ++i)
1526       for (j=0; j < count[i]; ++j)
1527          h->size[k++] = (stbi_uc) (i+1);
1528    h->size[k] = 0;
1529 
1530    // compute actual symbols (from jpeg spec)
1531    code = 0;
1532    k = 0;
1533    for(j=1; j <= 16; ++j) {
1534       // compute delta to add to code to compute symbol id
1535       h->delta[j] = k - code;
1536       if (h->size[k] == j) {
1537          while (h->size[k] == j)
1538             h->code[k++] = (stbi__uint16) (code++);
1539          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1540       }
1541       // compute largest code + 1 for this size, preshifted as needed later
1542       h->maxcode[j] = code << (16-j);
1543       code <<= 1;
1544    }
1545    h->maxcode[j] = 0xffffffff;
1546 
1547    // build non-spec acceleration table; 255 is flag for not-accelerated
1548    memset(h->fast, 255, 1 << FAST_BITS);
1549    for (i=0; i < k; ++i) {
1550       int s = h->size[i];
1551       if (s <= FAST_BITS) {
1552          int c = h->code[i] << (FAST_BITS-s);
1553          int m = 1 << (FAST_BITS-s);
1554          for (j=0; j < m; ++j) {
1555             h->fast[c+j] = (stbi_uc) i;
1556          }
1557       }
1558    }
1559    return 1;
1560 }
1561 
1562 // build a table that decodes both magnitude and value of small ACs in
1563 // one go.
stbi__build_fast_ac(stbi__int16 * fast_ac,stbi__huffman * h)1564 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1565 {
1566    int i;
1567    for (i=0; i < (1 << FAST_BITS); ++i) {
1568       stbi_uc fast = h->fast[i];
1569       fast_ac[i] = 0;
1570       if (fast < 255) {
1571          int rs = h->values[fast];
1572          int run = (rs >> 4) & 15;
1573          int magbits = rs & 15;
1574          int len = h->size[fast];
1575 
1576          if (magbits && len + magbits <= FAST_BITS) {
1577             // magnitude code followed by receive_extend code
1578             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1579             int m = 1 << (magbits - 1);
1580             if (k < m) k += (-1 << magbits) + 1;
1581             // if the result is small enough, we can fit it in fast_ac table
1582             if (k >= -128 && k <= 127)
1583                fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1584          }
1585       }
1586    }
1587 }
1588 
stbi__grow_buffer_unsafe(stbi__jpeg * j)1589 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1590 {
1591    do {
1592       int b = j->nomore ? 0 : stbi__get8(j->s);
1593       if (b == 0xff) {
1594          int c = stbi__get8(j->s);
1595          if (c != 0) {
1596             j->marker = (unsigned char) c;
1597             j->nomore = 1;
1598             return;
1599          }
1600       }
1601       j->code_buffer |= b << (24 - j->code_bits);
1602       j->code_bits += 8;
1603    } while (j->code_bits <= 24);
1604 }
1605 
1606 // (1 << n) - 1
1607 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1608 
1609 // decode a jpeg huffman value from the bitstream
stbi__jpeg_huff_decode(stbi__jpeg * j,stbi__huffman * h)1610 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1611 {
1612    unsigned int temp;
1613    int c,k;
1614 
1615    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1616 
1617    // look at the top FAST_BITS and determine what symbol ID it is,
1618    // if the code is <= FAST_BITS
1619    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1620    k = h->fast[c];
1621    if (k < 255) {
1622       int s = h->size[k];
1623       if (s > j->code_bits)
1624          return -1;
1625       j->code_buffer <<= s;
1626       j->code_bits -= s;
1627       return h->values[k];
1628    }
1629 
1630    // naive test is to shift the code_buffer down so k bits are
1631    // valid, then test against maxcode. To speed this up, we've
1632    // preshifted maxcode left so that it has (16-k) 0s at the
1633    // end; in other words, regardless of the number of bits, it
1634    // wants to be compared against something shifted to have 16;
1635    // that way we don't need to shift inside the loop.
1636    temp = j->code_buffer >> 16;
1637    for (k=FAST_BITS+1 ; ; ++k)
1638       if (temp < h->maxcode[k])
1639          break;
1640    if (k == 17) {
1641       // error! code not found
1642       j->code_bits -= 16;
1643       return -1;
1644    }
1645 
1646    if (k > j->code_bits)
1647       return -1;
1648 
1649    // convert the huffman code to the symbol id
1650    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1651    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1652 
1653    // convert the id to a symbol
1654    j->code_bits -= k;
1655    j->code_buffer <<= k;
1656    return h->values[c];
1657 }
1658 
1659 // bias[n] = (-1<<n) + 1
1660 static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1661 
1662 // combined JPEG 'receive' and JPEG 'extend', since baseline
1663 // always extends everything it receives.
stbi__extend_receive(stbi__jpeg * j,int n)1664 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1665 {
1666    unsigned int k;
1667    int sgn;
1668    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1669 
1670    sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1671    k = stbi_lrot(j->code_buffer, n);
1672    STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1673    j->code_buffer = k & ~stbi__bmask[n];
1674    k &= stbi__bmask[n];
1675    j->code_bits -= n;
1676    return k + (stbi__jbias[n] & ~sgn);
1677 }
1678 
1679 // get some unsigned bits
stbi__jpeg_get_bits(stbi__jpeg * j,int n)1680 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1681 {
1682    unsigned int k;
1683    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1684    k = stbi_lrot(j->code_buffer, n);
1685    j->code_buffer = k & ~stbi__bmask[n];
1686    k &= stbi__bmask[n];
1687    j->code_bits -= n;
1688    return k;
1689 }
1690 
stbi__jpeg_get_bit(stbi__jpeg * j)1691 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1692 {
1693    unsigned int k;
1694    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1695    k = j->code_buffer;
1696    j->code_buffer <<= 1;
1697    --j->code_bits;
1698    return k & 0x80000000;
1699 }
1700 
1701 // given a value that's at position X in the zigzag stream,
1702 // where does it appear in the 8x8 matrix coded as row-major?
1703 static stbi_uc stbi__jpeg_dezigzag[64+15] =
1704 {
1705     0,  1,  8, 16,  9,  2,  3, 10,
1706    17, 24, 32, 25, 18, 11,  4,  5,
1707    12, 19, 26, 33, 40, 48, 41, 34,
1708    27, 20, 13,  6,  7, 14, 21, 28,
1709    35, 42, 49, 56, 57, 50, 43, 36,
1710    29, 22, 15, 23, 30, 37, 44, 51,
1711    58, 59, 52, 45, 38, 31, 39, 46,
1712    53, 60, 61, 54, 47, 55, 62, 63,
1713    // let corrupt input sample past end
1714    63, 63, 63, 63, 63, 63, 63, 63,
1715    63, 63, 63, 63, 63, 63, 63
1716 };
1717 
1718 // decode one 64-entry block--
stbi__jpeg_decode_block(stbi__jpeg * j,short data[64],stbi__huffman * hdc,stbi__huffman * hac,stbi__int16 * fac,int b,stbi_uc * dequant)1719 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1720 {
1721    int diff,dc,k;
1722    int t;
1723 
1724    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1725    t = stbi__jpeg_huff_decode(j, hdc);
1726    if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1727 
1728    // 0 all the ac values now so we can do it 32-bits at a time
1729    memset(data,0,64*sizeof(data[0]));
1730 
1731    diff = t ? stbi__extend_receive(j, t) : 0;
1732    dc = j->img_comp[b].dc_pred + diff;
1733    j->img_comp[b].dc_pred = dc;
1734    data[0] = (short) (dc * dequant[0]);
1735 
1736    // decode AC components, see JPEG spec
1737    k = 1;
1738    do {
1739       unsigned int zig;
1740       int c,r,s;
1741       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1742       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1743       r = fac[c];
1744       if (r) { // fast-AC path
1745          k += (r >> 4) & 15; // run
1746          s = r & 15; // combined length
1747          j->code_buffer <<= s;
1748          j->code_bits -= s;
1749          // decode into unzigzag'd location
1750          zig = stbi__jpeg_dezigzag[k++];
1751          data[zig] = (short) ((r >> 8) * dequant[zig]);
1752       } else {
1753          int rs = stbi__jpeg_huff_decode(j, hac);
1754          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1755          s = rs & 15;
1756          r = rs >> 4;
1757          if (s == 0) {
1758             if (rs != 0xf0) break; // end block
1759             k += 16;
1760          } else {
1761             k += r;
1762             // decode into unzigzag'd location
1763             zig = stbi__jpeg_dezigzag[k++];
1764             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1765          }
1766       }
1767    } while (k < 64);
1768    return 1;
1769 }
1770 
stbi__jpeg_decode_block_prog_dc(stbi__jpeg * j,short data[64],stbi__huffman * hdc,int b)1771 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1772 {
1773    int diff,dc;
1774    int t;
1775    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1776 
1777    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1778 
1779    if (j->succ_high == 0) {
1780       // first scan for DC coefficient, must be first
1781       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
1782       t = stbi__jpeg_huff_decode(j, hdc);
1783       diff = t ? stbi__extend_receive(j, t) : 0;
1784 
1785       dc = j->img_comp[b].dc_pred + diff;
1786       j->img_comp[b].dc_pred = dc;
1787       data[0] = (short) (dc << j->succ_low);
1788    } else {
1789       // refinement scan for DC coefficient
1790       if (stbi__jpeg_get_bit(j))
1791          data[0] += (short) (1 << j->succ_low);
1792    }
1793    return 1;
1794 }
1795 
1796 // @OPTIMIZE: store non-zigzagged during the decode passes,
1797 // and only de-zigzag when dequantizing
stbi__jpeg_decode_block_prog_ac(stbi__jpeg * j,short data[64],stbi__huffman * hac,stbi__int16 * fac)1798 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1799 {
1800    int k;
1801    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1802 
1803    if (j->succ_high == 0) {
1804       int shift = j->succ_low;
1805 
1806       if (j->eob_run) {
1807          --j->eob_run;
1808          return 1;
1809       }
1810 
1811       k = j->spec_start;
1812       do {
1813          unsigned int zig;
1814          int c,r,s;
1815          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1816          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1817          r = fac[c];
1818          if (r) { // fast-AC path
1819             k += (r >> 4) & 15; // run
1820             s = r & 15; // combined length
1821             j->code_buffer <<= s;
1822             j->code_bits -= s;
1823             zig = stbi__jpeg_dezigzag[k++];
1824             data[zig] = (short) ((r >> 8) << shift);
1825          } else {
1826             int rs = stbi__jpeg_huff_decode(j, hac);
1827             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1828             s = rs & 15;
1829             r = rs >> 4;
1830             if (s == 0) {
1831                if (r < 15) {
1832                   j->eob_run = (1 << r);
1833                   if (r)
1834                      j->eob_run += stbi__jpeg_get_bits(j, r);
1835                   --j->eob_run;
1836                   break;
1837                }
1838                k += 16;
1839             } else {
1840                k += r;
1841                zig = stbi__jpeg_dezigzag[k++];
1842                data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1843             }
1844          }
1845       } while (k <= j->spec_end);
1846    } else {
1847       // refinement scan for these AC coefficients
1848 
1849       short bit = (short) (1 << j->succ_low);
1850 
1851       if (j->eob_run) {
1852          --j->eob_run;
1853          for (k = j->spec_start; k <= j->spec_end; ++k) {
1854             short *p = &data[stbi__jpeg_dezigzag[k]];
1855             if (*p != 0)
1856                if (stbi__jpeg_get_bit(j))
1857                   if ((*p & bit)==0) {
1858                      if (*p > 0)
1859                         *p += bit;
1860                      else
1861                         *p -= bit;
1862                   }
1863          }
1864       } else {
1865          k = j->spec_start;
1866          do {
1867             int r,s;
1868             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1869             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1870             s = rs & 15;
1871             r = rs >> 4;
1872             if (s == 0) {
1873                if (r < 15) {
1874                   j->eob_run = (1 << r) - 1;
1875                   if (r)
1876                      j->eob_run += stbi__jpeg_get_bits(j, r);
1877                   r = 64; // force end of block
1878                } else {
1879                   // r=15 s=0 should write 16 0s, so we just do
1880                   // a run of 15 0s and then write s (which is 0),
1881                   // so we don't have to do anything special here
1882                }
1883             } else {
1884                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1885                // sign bit
1886                if (stbi__jpeg_get_bit(j))
1887                   s = bit;
1888                else
1889                   s = -bit;
1890             }
1891 
1892             // advance by r
1893             while (k <= j->spec_end) {
1894                short *p = &data[stbi__jpeg_dezigzag[k++]];
1895                if (*p != 0) {
1896                   if (stbi__jpeg_get_bit(j))
1897                      if ((*p & bit)==0) {
1898                         if (*p > 0)
1899                            *p += bit;
1900                         else
1901                            *p -= bit;
1902                      }
1903                } else {
1904                   if (r == 0) {
1905                      *p = (short) s;
1906                      break;
1907                   }
1908                   --r;
1909                }
1910             }
1911          } while (k <= j->spec_end);
1912       }
1913    }
1914    return 1;
1915 }
1916 
1917 // take a -128..127 value and stbi__clamp it and convert to 0..255
stbi__clamp(int x)1918 stbi_inline static stbi_uc stbi__clamp(int x)
1919 {
1920    // trick to use a single test to catch both cases
1921    if ((unsigned int) x > 255) {
1922       if (x < 0) return 0;
1923       if (x > 255) return 255;
1924    }
1925    return (stbi_uc) x;
1926 }
1927 
1928 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
1929 #define stbi__fsh(x)  ((x) << 12)
1930 
1931 // derived from jidctint -- DCT_ISLOW
1932 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1933    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1934    p2 = s2;                                    \
1935    p3 = s6;                                    \
1936    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
1937    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
1938    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
1939    p2 = s0;                                    \
1940    p3 = s4;                                    \
1941    t0 = stbi__fsh(p2+p3);                      \
1942    t1 = stbi__fsh(p2-p3);                      \
1943    x0 = t0+t3;                                 \
1944    x3 = t0-t3;                                 \
1945    x1 = t1+t2;                                 \
1946    x2 = t1-t2;                                 \
1947    t0 = s7;                                    \
1948    t1 = s5;                                    \
1949    t2 = s3;                                    \
1950    t3 = s1;                                    \
1951    p3 = t0+t2;                                 \
1952    p4 = t1+t3;                                 \
1953    p1 = t0+t3;                                 \
1954    p2 = t1+t2;                                 \
1955    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
1956    t0 = t0*stbi__f2f( 0.298631336f);           \
1957    t1 = t1*stbi__f2f( 2.053119869f);           \
1958    t2 = t2*stbi__f2f( 3.072711026f);           \
1959    t3 = t3*stbi__f2f( 1.501321110f);           \
1960    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
1961    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
1962    p3 = p3*stbi__f2f(-1.961570560f);           \
1963    p4 = p4*stbi__f2f(-0.390180644f);           \
1964    t3 += p1+p4;                                \
1965    t2 += p2+p3;                                \
1966    t1 += p2+p4;                                \
1967    t0 += p1+p3;
1968 
stbi__idct_block(stbi_uc * out,int out_stride,short data[64])1969 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1970 {
1971    int i,val[64],*v=val;
1972    stbi_uc *o;
1973    short *d = data;
1974 
1975    // columns
1976    for (i=0; i < 8; ++i,++d, ++v) {
1977       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1978       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1979            && d[40]==0 && d[48]==0 && d[56]==0) {
1980          //    no shortcut                 0     seconds
1981          //    (1|2|3|4|5|6|7)==0          0     seconds
1982          //    all separate               -0.047 seconds
1983          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
1984          int dcterm = d[0] << 2;
1985          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
1986       } else {
1987          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
1988          // constants scaled things up by 1<<12; let's bring them back
1989          // down, but keep 2 extra bits of precision
1990          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
1991          v[ 0] = (x0+t3) >> 10;
1992          v[56] = (x0-t3) >> 10;
1993          v[ 8] = (x1+t2) >> 10;
1994          v[48] = (x1-t2) >> 10;
1995          v[16] = (x2+t1) >> 10;
1996          v[40] = (x2-t1) >> 10;
1997          v[24] = (x3+t0) >> 10;
1998          v[32] = (x3-t0) >> 10;
1999       }
2000    }
2001 
2002    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2003       // no fast case since the first 1D IDCT spread components out
2004       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2005       // constants scaled things up by 1<<12, plus we had 1<<2 from first
2006       // loop, plus horizontal and vertical each scale by sqrt(8) so together
2007       // we've got an extra 1<<3, so 1<<17 total we need to remove.
2008       // so we want to round that, which means adding 0.5 * 1<<17,
2009       // aka 65536. Also, we'll end up with -128 to 127 that we want
2010       // to encode as 0..255 by adding 128, so we'll add that before the shift
2011       x0 += 65536 + (128<<17);
2012       x1 += 65536 + (128<<17);
2013       x2 += 65536 + (128<<17);
2014       x3 += 65536 + (128<<17);
2015       // tried computing the shifts into temps, or'ing the temps to see
2016       // if any were out of range, but that was slower
2017       o[0] = stbi__clamp((x0+t3) >> 17);
2018       o[7] = stbi__clamp((x0-t3) >> 17);
2019       o[1] = stbi__clamp((x1+t2) >> 17);
2020       o[6] = stbi__clamp((x1-t2) >> 17);
2021       o[2] = stbi__clamp((x2+t1) >> 17);
2022       o[5] = stbi__clamp((x2-t1) >> 17);
2023       o[3] = stbi__clamp((x3+t0) >> 17);
2024       o[4] = stbi__clamp((x3-t0) >> 17);
2025    }
2026 }
2027 
2028 #ifdef STBI_SSE2
2029 // sse2 integer IDCT. not the fastest possible implementation but it
2030 // produces bit-identical results to the generic C version so it's
2031 // fully "transparent".
stbi__idct_simd(stbi_uc * out,int out_stride,short data[64])2032 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2033 {
2034    // This is constructed to match our regular (generic) integer IDCT exactly.
2035    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2036    __m128i tmp;
2037 
2038    // dot product constant: even elems=x, odd elems=y
2039    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2040 
2041    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2042    // out(1) = c1[even]*x + c1[odd]*y
2043    #define dct_rot(out0,out1, x,y,c0,c1) \
2044       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2045       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2046       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2047       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2048       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2049       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2050 
2051    // out = in << 12  (in 16-bit, out 32-bit)
2052    #define dct_widen(out, in) \
2053       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2054       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2055 
2056    // wide add
2057    #define dct_wadd(out, a, b) \
2058       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2059       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2060 
2061    // wide sub
2062    #define dct_wsub(out, a, b) \
2063       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2064       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2065 
2066    // butterfly a/b, add bias, then shift by "s" and pack
2067    #define dct_bfly32o(out0, out1, a,b,bias,s) \
2068       { \
2069          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2070          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2071          dct_wadd(sum, abiased, b); \
2072          dct_wsub(dif, abiased, b); \
2073          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2074          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2075       }
2076 
2077    // 8-bit interleave step (for transposes)
2078    #define dct_interleave8(a, b) \
2079       tmp = a; \
2080       a = _mm_unpacklo_epi8(a, b); \
2081       b = _mm_unpackhi_epi8(tmp, b)
2082 
2083    // 16-bit interleave step (for transposes)
2084    #define dct_interleave16(a, b) \
2085       tmp = a; \
2086       a = _mm_unpacklo_epi16(a, b); \
2087       b = _mm_unpackhi_epi16(tmp, b)
2088 
2089    #define dct_pass(bias,shift) \
2090       { \
2091          /* even part */ \
2092          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2093          __m128i sum04 = _mm_add_epi16(row0, row4); \
2094          __m128i dif04 = _mm_sub_epi16(row0, row4); \
2095          dct_widen(t0e, sum04); \
2096          dct_widen(t1e, dif04); \
2097          dct_wadd(x0, t0e, t3e); \
2098          dct_wsub(x3, t0e, t3e); \
2099          dct_wadd(x1, t1e, t2e); \
2100          dct_wsub(x2, t1e, t2e); \
2101          /* odd part */ \
2102          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2103          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2104          __m128i sum17 = _mm_add_epi16(row1, row7); \
2105          __m128i sum35 = _mm_add_epi16(row3, row5); \
2106          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2107          dct_wadd(x4, y0o, y4o); \
2108          dct_wadd(x5, y1o, y5o); \
2109          dct_wadd(x6, y2o, y5o); \
2110          dct_wadd(x7, y3o, y4o); \
2111          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2112          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2113          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2114          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2115       }
2116 
2117    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2118    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2119    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2120    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2121    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2122    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2123    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2124    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2125 
2126    // rounding biases in column/row passes, see stbi__idct_block for explanation.
2127    __m128i bias_0 = _mm_set1_epi32(512);
2128    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2129 
2130    // load
2131    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2132    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2133    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2134    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2135    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2136    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2137    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2138    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2139 
2140    // column pass
2141    dct_pass(bias_0, 10);
2142 
2143    {
2144       // 16bit 8x8 transpose pass 1
2145       dct_interleave16(row0, row4);
2146       dct_interleave16(row1, row5);
2147       dct_interleave16(row2, row6);
2148       dct_interleave16(row3, row7);
2149 
2150       // transpose pass 2
2151       dct_interleave16(row0, row2);
2152       dct_interleave16(row1, row3);
2153       dct_interleave16(row4, row6);
2154       dct_interleave16(row5, row7);
2155 
2156       // transpose pass 3
2157       dct_interleave16(row0, row1);
2158       dct_interleave16(row2, row3);
2159       dct_interleave16(row4, row5);
2160       dct_interleave16(row6, row7);
2161    }
2162 
2163    // row pass
2164    dct_pass(bias_1, 17);
2165 
2166    {
2167       // pack
2168       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2169       __m128i p1 = _mm_packus_epi16(row2, row3);
2170       __m128i p2 = _mm_packus_epi16(row4, row5);
2171       __m128i p3 = _mm_packus_epi16(row6, row7);
2172 
2173       // 8bit 8x8 transpose pass 1
2174       dct_interleave8(p0, p2); // a0e0a1e1...
2175       dct_interleave8(p1, p3); // c0g0c1g1...
2176 
2177       // transpose pass 2
2178       dct_interleave8(p0, p1); // a0c0e0g0...
2179       dct_interleave8(p2, p3); // b0d0f0h0...
2180 
2181       // transpose pass 3
2182       dct_interleave8(p0, p2); // a0b0c0d0...
2183       dct_interleave8(p1, p3); // a4b4c4d4...
2184 
2185       // store
2186       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2187       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2188       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2189       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2190       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2191       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2192       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2193       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2194    }
2195 
2196 #undef dct_const
2197 #undef dct_rot
2198 #undef dct_widen
2199 #undef dct_wadd
2200 #undef dct_wsub
2201 #undef dct_bfly32o
2202 #undef dct_interleave8
2203 #undef dct_interleave16
2204 #undef dct_pass
2205 }
2206 
2207 #endif // STBI_SSE2
2208 
2209 #ifdef STBI_NEON
2210 
2211 // NEON integer IDCT. should produce bit-identical
2212 // results to the generic C version.
stbi__idct_simd(stbi_uc * out,int out_stride,short data[64])2213 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2214 {
2215    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2216 
2217    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2218    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2219    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2220    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2221    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2222    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2223    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2224    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2225    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2226    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2227    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2228    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2229 
2230 #define dct_long_mul(out, inq, coeff) \
2231    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2232    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2233 
2234 #define dct_long_mac(out, acc, inq, coeff) \
2235    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2236    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2237 
2238 #define dct_widen(out, inq) \
2239    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2240    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2241 
2242 // wide add
2243 #define dct_wadd(out, a, b) \
2244    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2245    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2246 
2247 // wide sub
2248 #define dct_wsub(out, a, b) \
2249    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2250    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2251 
2252 // butterfly a/b, then shift using "shiftop" by "s" and pack
2253 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2254    { \
2255       dct_wadd(sum, a, b); \
2256       dct_wsub(dif, a, b); \
2257       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2258       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2259    }
2260 
2261 #define dct_pass(shiftop, shift) \
2262    { \
2263       /* even part */ \
2264       int16x8_t sum26 = vaddq_s16(row2, row6); \
2265       dct_long_mul(p1e, sum26, rot0_0); \
2266       dct_long_mac(t2e, p1e, row6, rot0_1); \
2267       dct_long_mac(t3e, p1e, row2, rot0_2); \
2268       int16x8_t sum04 = vaddq_s16(row0, row4); \
2269       int16x8_t dif04 = vsubq_s16(row0, row4); \
2270       dct_widen(t0e, sum04); \
2271       dct_widen(t1e, dif04); \
2272       dct_wadd(x0, t0e, t3e); \
2273       dct_wsub(x3, t0e, t3e); \
2274       dct_wadd(x1, t1e, t2e); \
2275       dct_wsub(x2, t1e, t2e); \
2276       /* odd part */ \
2277       int16x8_t sum15 = vaddq_s16(row1, row5); \
2278       int16x8_t sum17 = vaddq_s16(row1, row7); \
2279       int16x8_t sum35 = vaddq_s16(row3, row5); \
2280       int16x8_t sum37 = vaddq_s16(row3, row7); \
2281       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2282       dct_long_mul(p5o, sumodd, rot1_0); \
2283       dct_long_mac(p1o, p5o, sum17, rot1_1); \
2284       dct_long_mac(p2o, p5o, sum35, rot1_2); \
2285       dct_long_mul(p3o, sum37, rot2_0); \
2286       dct_long_mul(p4o, sum15, rot2_1); \
2287       dct_wadd(sump13o, p1o, p3o); \
2288       dct_wadd(sump24o, p2o, p4o); \
2289       dct_wadd(sump23o, p2o, p3o); \
2290       dct_wadd(sump14o, p1o, p4o); \
2291       dct_long_mac(x4, sump13o, row7, rot3_0); \
2292       dct_long_mac(x5, sump24o, row5, rot3_1); \
2293       dct_long_mac(x6, sump23o, row3, rot3_2); \
2294       dct_long_mac(x7, sump14o, row1, rot3_3); \
2295       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2296       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2297       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2298       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2299    }
2300 
2301    // load
2302    row0 = vld1q_s16(data + 0*8);
2303    row1 = vld1q_s16(data + 1*8);
2304    row2 = vld1q_s16(data + 2*8);
2305    row3 = vld1q_s16(data + 3*8);
2306    row4 = vld1q_s16(data + 4*8);
2307    row5 = vld1q_s16(data + 5*8);
2308    row6 = vld1q_s16(data + 6*8);
2309    row7 = vld1q_s16(data + 7*8);
2310 
2311    // add DC bias
2312    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2313 
2314    // column pass
2315    dct_pass(vrshrn_n_s32, 10);
2316 
2317    // 16bit 8x8 transpose
2318    {
2319 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2320 // whether compilers actually get this is another story, sadly.
2321 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2322 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2323 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2324 
2325       // pass 1
2326       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2327       dct_trn16(row2, row3);
2328       dct_trn16(row4, row5);
2329       dct_trn16(row6, row7);
2330 
2331       // pass 2
2332       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2333       dct_trn32(row1, row3);
2334       dct_trn32(row4, row6);
2335       dct_trn32(row5, row7);
2336 
2337       // pass 3
2338       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2339       dct_trn64(row1, row5);
2340       dct_trn64(row2, row6);
2341       dct_trn64(row3, row7);
2342 
2343 #undef dct_trn16
2344 #undef dct_trn32
2345 #undef dct_trn64
2346    }
2347 
2348    // row pass
2349    // vrshrn_n_s32 only supports shifts up to 16, we need
2350    // 17. so do a non-rounding shift of 16 first then follow
2351    // up with a rounding shift by 1.
2352    dct_pass(vshrn_n_s32, 16);
2353 
2354    {
2355       // pack and round
2356       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2357       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2358       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2359       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2360       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2361       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2362       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2363       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2364 
2365       // again, these can translate into one instruction, but often don't.
2366 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2367 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2368 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2369 
2370       // sadly can't use interleaved stores here since we only write
2371       // 8 bytes to each scan line!
2372 
2373       // 8x8 8-bit transpose pass 1
2374       dct_trn8_8(p0, p1);
2375       dct_trn8_8(p2, p3);
2376       dct_trn8_8(p4, p5);
2377       dct_trn8_8(p6, p7);
2378 
2379       // pass 2
2380       dct_trn8_16(p0, p2);
2381       dct_trn8_16(p1, p3);
2382       dct_trn8_16(p4, p6);
2383       dct_trn8_16(p5, p7);
2384 
2385       // pass 3
2386       dct_trn8_32(p0, p4);
2387       dct_trn8_32(p1, p5);
2388       dct_trn8_32(p2, p6);
2389       dct_trn8_32(p3, p7);
2390 
2391       // store
2392       vst1_u8(out, p0); out += out_stride;
2393       vst1_u8(out, p1); out += out_stride;
2394       vst1_u8(out, p2); out += out_stride;
2395       vst1_u8(out, p3); out += out_stride;
2396       vst1_u8(out, p4); out += out_stride;
2397       vst1_u8(out, p5); out += out_stride;
2398       vst1_u8(out, p6); out += out_stride;
2399       vst1_u8(out, p7);
2400 
2401 #undef dct_trn8_8
2402 #undef dct_trn8_16
2403 #undef dct_trn8_32
2404    }
2405 
2406 #undef dct_long_mul
2407 #undef dct_long_mac
2408 #undef dct_widen
2409 #undef dct_wadd
2410 #undef dct_wsub
2411 #undef dct_bfly32o
2412 #undef dct_pass
2413 }
2414 
2415 #endif // STBI_NEON
2416 
2417 #define STBI__MARKER_none  0xff
2418 // if there's a pending marker from the entropy stream, return that
2419 // otherwise, fetch from the stream and get a marker. if there's no
2420 // marker, return 0xff, which is never a valid marker value
stbi__get_marker(stbi__jpeg * j)2421 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2422 {
2423    stbi_uc x;
2424    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2425    x = stbi__get8(j->s);
2426    if (x != 0xff) return STBI__MARKER_none;
2427    while (x == 0xff)
2428       x = stbi__get8(j->s);
2429    return x;
2430 }
2431 
2432 // in each scan, we'll have scan_n components, and the order
2433 // of the components is specified by order[]
2434 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2435 
2436 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2437 // the dc prediction
stbi__jpeg_reset(stbi__jpeg * j)2438 static void stbi__jpeg_reset(stbi__jpeg *j)
2439 {
2440    j->code_bits = 0;
2441    j->code_buffer = 0;
2442    j->nomore = 0;
2443    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2444    j->marker = STBI__MARKER_none;
2445    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2446    j->eob_run = 0;
2447    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2448    // since we don't even allow 1<<30 pixels
2449 }
2450 
stbi__parse_entropy_coded_data(stbi__jpeg * z)2451 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2452 {
2453    stbi__jpeg_reset(z);
2454    if (!z->progressive) {
2455       if (z->scan_n == 1) {
2456          int i,j;
2457          STBI_SIMD_ALIGN(short, data[64]);
2458          int n = z->order[0];
2459          // non-interleaved data, we just need to process one block at a time,
2460          // in trivial scanline order
2461          // number of blocks to do just depends on how many actual "pixels" this
2462          // component has, independent of interleaved MCU blocking and such
2463          int w = (z->img_comp[n].x+7) >> 3;
2464          int h = (z->img_comp[n].y+7) >> 3;
2465          for (j=0; j < h; ++j) {
2466             for (i=0; i < w; ++i) {
2467                int ha = z->img_comp[n].ha;
2468                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2469                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2470                // every data block is an MCU, so countdown the restart interval
2471                if (--z->todo <= 0) {
2472                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2473                   // if it's NOT a restart, then just bail, so we get corrupt data
2474                   // rather than no data
2475                   if (!STBI__RESTART(z->marker)) return 1;
2476                   stbi__jpeg_reset(z);
2477                }
2478             }
2479          }
2480          return 1;
2481       } else { // interleaved
2482          int i,j,k,x,y;
2483          STBI_SIMD_ALIGN(short, data[64]);
2484          for (j=0; j < z->img_mcu_y; ++j) {
2485             for (i=0; i < z->img_mcu_x; ++i) {
2486                // scan an interleaved mcu... process scan_n components in order
2487                for (k=0; k < z->scan_n; ++k) {
2488                   int n = z->order[k];
2489                   // scan out an mcu's worth of this component; that's just determined
2490                   // by the basic H and V specified for the component
2491                   for (y=0; y < z->img_comp[n].v; ++y) {
2492                      for (x=0; x < z->img_comp[n].h; ++x) {
2493                         int x2 = (i*z->img_comp[n].h + x)*8;
2494                         int y2 = (j*z->img_comp[n].v + y)*8;
2495                         int ha = z->img_comp[n].ha;
2496                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2497                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2498                      }
2499                   }
2500                }
2501                // after all interleaved components, that's an interleaved MCU,
2502                // so now count down the restart interval
2503                if (--z->todo <= 0) {
2504                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2505                   if (!STBI__RESTART(z->marker)) return 1;
2506                   stbi__jpeg_reset(z);
2507                }
2508             }
2509          }
2510          return 1;
2511       }
2512    } else {
2513       if (z->scan_n == 1) {
2514          int i,j;
2515          int n = z->order[0];
2516          // non-interleaved data, we just need to process one block at a time,
2517          // in trivial scanline order
2518          // number of blocks to do just depends on how many actual "pixels" this
2519          // component has, independent of interleaved MCU blocking and such
2520          int w = (z->img_comp[n].x+7) >> 3;
2521          int h = (z->img_comp[n].y+7) >> 3;
2522          for (j=0; j < h; ++j) {
2523             for (i=0; i < w; ++i) {
2524                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2525                if (z->spec_start == 0) {
2526                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2527                      return 0;
2528                } else {
2529                   int ha = z->img_comp[n].ha;
2530                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2531                      return 0;
2532                }
2533                // every data block is an MCU, so countdown the restart interval
2534                if (--z->todo <= 0) {
2535                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2536                   if (!STBI__RESTART(z->marker)) return 1;
2537                   stbi__jpeg_reset(z);
2538                }
2539             }
2540          }
2541          return 1;
2542       } else { // interleaved
2543          int i,j,k,x,y;
2544          for (j=0; j < z->img_mcu_y; ++j) {
2545             for (i=0; i < z->img_mcu_x; ++i) {
2546                // scan an interleaved mcu... process scan_n components in order
2547                for (k=0; k < z->scan_n; ++k) {
2548                   int n = z->order[k];
2549                   // scan out an mcu's worth of this component; that's just determined
2550                   // by the basic H and V specified for the component
2551                   for (y=0; y < z->img_comp[n].v; ++y) {
2552                      for (x=0; x < z->img_comp[n].h; ++x) {
2553                         int x2 = (i*z->img_comp[n].h + x);
2554                         int y2 = (j*z->img_comp[n].v + y);
2555                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2556                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2557                            return 0;
2558                      }
2559                   }
2560                }
2561                // after all interleaved components, that's an interleaved MCU,
2562                // so now count down the restart interval
2563                if (--z->todo <= 0) {
2564                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2565                   if (!STBI__RESTART(z->marker)) return 1;
2566                   stbi__jpeg_reset(z);
2567                }
2568             }
2569          }
2570          return 1;
2571       }
2572    }
2573 }
2574 
stbi__jpeg_dequantize(short * data,stbi_uc * dequant)2575 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2576 {
2577    int i;
2578    for (i=0; i < 64; ++i)
2579       data[i] *= dequant[i];
2580 }
2581 
stbi__jpeg_finish(stbi__jpeg * z)2582 static void stbi__jpeg_finish(stbi__jpeg *z)
2583 {
2584    if (z->progressive) {
2585       // dequantize and idct the data
2586       int i,j,n;
2587       for (n=0; n < z->s->img_n; ++n) {
2588          int w = (z->img_comp[n].x+7) >> 3;
2589          int h = (z->img_comp[n].y+7) >> 3;
2590          for (j=0; j < h; ++j) {
2591             for (i=0; i < w; ++i) {
2592                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2593                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2594                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2595             }
2596          }
2597       }
2598    }
2599 }
2600 
stbi__process_marker(stbi__jpeg * z,int m)2601 static int stbi__process_marker(stbi__jpeg *z, int m)
2602 {
2603    int L;
2604    switch (m) {
2605       case STBI__MARKER_none: // no marker found
2606          return stbi__err("expected marker","Corrupt JPEG");
2607 
2608       case 0xDD: // DRI - specify restart interval
2609          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2610          z->restart_interval = stbi__get16be(z->s);
2611          return 1;
2612 
2613       case 0xDB: // DQT - define quantization table
2614          L = stbi__get16be(z->s)-2;
2615          while (L > 0) {
2616             int q = stbi__get8(z->s);
2617             int p = q >> 4;
2618             int t = q & 15,i;
2619             if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2620             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2621             for (i=0; i < 64; ++i)
2622                z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2623             L -= 65;
2624          }
2625          return L==0;
2626 
2627       case 0xC4: // DHT - define huffman table
2628          L = stbi__get16be(z->s)-2;
2629          while (L > 0) {
2630             stbi_uc *v;
2631             int sizes[16],i,n=0;
2632             int q = stbi__get8(z->s);
2633             int tc = q >> 4;
2634             int th = q & 15;
2635             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2636             for (i=0; i < 16; ++i) {
2637                sizes[i] = stbi__get8(z->s);
2638                n += sizes[i];
2639             }
2640             L -= 17;
2641             if (tc == 0) {
2642                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2643                v = z->huff_dc[th].values;
2644             } else {
2645                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2646                v = z->huff_ac[th].values;
2647             }
2648             for (i=0; i < n; ++i)
2649                v[i] = stbi__get8(z->s);
2650             if (tc != 0)
2651                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2652             L -= n;
2653          }
2654          return L==0;
2655    }
2656    // check for comment block or APP blocks
2657    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2658       stbi__skip(z->s, stbi__get16be(z->s)-2);
2659       return 1;
2660    }
2661    return 0;
2662 }
2663 
2664 // after we see SOS
stbi__process_scan_header(stbi__jpeg * z)2665 static int stbi__process_scan_header(stbi__jpeg *z)
2666 {
2667    int i;
2668    int Ls = stbi__get16be(z->s);
2669    z->scan_n = stbi__get8(z->s);
2670    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2671    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2672    for (i=0; i < z->scan_n; ++i) {
2673       int id = stbi__get8(z->s), which;
2674       int q = stbi__get8(z->s);
2675       for (which = 0; which < z->s->img_n; ++which)
2676          if (z->img_comp[which].id == id)
2677             break;
2678       if (which == z->s->img_n) return 0; // no match
2679       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2680       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2681       z->order[i] = which;
2682    }
2683 
2684    {
2685       int aa;
2686       z->spec_start = stbi__get8(z->s);
2687       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
2688       aa = stbi__get8(z->s);
2689       z->succ_high = (aa >> 4);
2690       z->succ_low  = (aa & 15);
2691       if (z->progressive) {
2692          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2693             return stbi__err("bad SOS", "Corrupt JPEG");
2694       } else {
2695          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2696          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2697          z->spec_end = 63;
2698       }
2699    }
2700 
2701    return 1;
2702 }
2703 
stbi__process_frame_header(stbi__jpeg * z,int scan)2704 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2705 {
2706    stbi__context *s = z->s;
2707    int Lf,p,i,q, h_max=1,v_max=1,c;
2708    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2709    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2710    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2711    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2712    c = stbi__get8(s);
2713    if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
2714    s->img_n = c;
2715    for (i=0; i < c; ++i) {
2716       z->img_comp[i].data = NULL;
2717       z->img_comp[i].linebuf = NULL;
2718    }
2719 
2720    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2721 
2722    for (i=0; i < s->img_n; ++i) {
2723       z->img_comp[i].id = stbi__get8(s);
2724       if (z->img_comp[i].id != i+1)   // JFIF requires
2725          if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
2726             return stbi__err("bad component ID","Corrupt JPEG");
2727       q = stbi__get8(s);
2728       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2729       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2730       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2731    }
2732 
2733    if (scan != STBI__SCAN_load) return 1;
2734 
2735    if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2736 
2737    for (i=0; i < s->img_n; ++i) {
2738       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2739       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2740    }
2741 
2742    // compute interleaved mcu info
2743    z->img_h_max = h_max;
2744    z->img_v_max = v_max;
2745    z->img_mcu_w = h_max * 8;
2746    z->img_mcu_h = v_max * 8;
2747    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2748    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2749 
2750    for (i=0; i < s->img_n; ++i) {
2751       // number of effective pixels (e.g. for non-interleaved MCU)
2752       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2753       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2754       // to simplify generation, we'll allocate enough memory to decode
2755       // the bogus oversized data from using interleaved MCUs and their
2756       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2757       // discard the extra data until colorspace conversion
2758       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2759       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2760       z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2761 
2762       if (z->img_comp[i].raw_data == NULL) {
2763          for(--i; i >= 0; --i) {
2764             STBI_FREE(z->img_comp[i].raw_data);
2765             z->img_comp[i].raw_data = NULL;
2766          }
2767          return stbi__err("outofmem", "Out of memory");
2768       }
2769       // align blocks for idct using mmx/sse
2770       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2771       z->img_comp[i].linebuf = NULL;
2772       if (z->progressive) {
2773          z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2774          z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2775          z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2776          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2777       } else {
2778          z->img_comp[i].coeff = 0;
2779          z->img_comp[i].raw_coeff = 0;
2780       }
2781    }
2782 
2783    return 1;
2784 }
2785 
2786 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2787 #define stbi__DNL(x)         ((x) == 0xdc)
2788 #define stbi__SOI(x)         ((x) == 0xd8)
2789 #define stbi__EOI(x)         ((x) == 0xd9)
2790 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2791 #define stbi__SOS(x)         ((x) == 0xda)
2792 
2793 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
2794 
stbi__decode_jpeg_header(stbi__jpeg * z,int scan)2795 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2796 {
2797    int m;
2798    z->marker = STBI__MARKER_none; // initialize cached marker to empty
2799    m = stbi__get_marker(z);
2800    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2801    if (scan == STBI__SCAN_type) return 1;
2802    m = stbi__get_marker(z);
2803    while (!stbi__SOF(m)) {
2804       if (!stbi__process_marker(z,m)) return 0;
2805       m = stbi__get_marker(z);
2806       while (m == STBI__MARKER_none) {
2807          // some files have extra padding after their blocks, so ok, we'll scan
2808          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2809          m = stbi__get_marker(z);
2810       }
2811    }
2812    z->progressive = stbi__SOF_progressive(m);
2813    if (!stbi__process_frame_header(z, scan)) return 0;
2814    return 1;
2815 }
2816 
2817 // decode image to YCbCr format
stbi__decode_jpeg_image(stbi__jpeg * j)2818 static int stbi__decode_jpeg_image(stbi__jpeg *j)
2819 {
2820    int m;
2821    for (m = 0; m < 4; m++) {
2822       j->img_comp[m].raw_data = NULL;
2823       j->img_comp[m].raw_coeff = NULL;
2824    }
2825    j->restart_interval = 0;
2826    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2827    m = stbi__get_marker(j);
2828    while (!stbi__EOI(m)) {
2829       if (stbi__SOS(m)) {
2830          if (!stbi__process_scan_header(j)) return 0;
2831          if (!stbi__parse_entropy_coded_data(j)) return 0;
2832          if (j->marker == STBI__MARKER_none ) {
2833             // handle 0s at the end of image data from IP Kamera 9060
2834             while (!stbi__at_eof(j->s)) {
2835                int x = stbi__get8(j->s);
2836                if (x == 255) {
2837                   j->marker = stbi__get8(j->s);
2838                   break;
2839                } else if (x != 0) {
2840                   return stbi__err("junk before marker", "Corrupt JPEG");
2841                }
2842             }
2843             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2844          }
2845       } else {
2846          if (!stbi__process_marker(j, m)) return 0;
2847       }
2848       m = stbi__get_marker(j);
2849    }
2850    if (j->progressive)
2851       stbi__jpeg_finish(j);
2852    return 1;
2853 }
2854 
2855 // static jfif-centered resampling (across block boundaries)
2856 
2857 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2858                                     int w, int hs);
2859 
2860 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2861 
resample_row_1(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)2862 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2863 {
2864    STBI_NOTUSED(out);
2865    STBI_NOTUSED(in_far);
2866    STBI_NOTUSED(w);
2867    STBI_NOTUSED(hs);
2868    return in_near;
2869 }
2870 
stbi__resample_row_v_2(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)2871 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2872 {
2873    // need to generate two samples vertically for every one in input
2874    int i;
2875    STBI_NOTUSED(hs);
2876    for (i=0; i < w; ++i)
2877       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
2878    return out;
2879 }
2880 
stbi__resample_row_h_2(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)2881 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2882 {
2883    // need to generate two samples horizontally for every one in input
2884    int i;
2885    stbi_uc *input = in_near;
2886 
2887    if (w == 1) {
2888       // if only one sample, can't do any interpolation
2889       out[0] = out[1] = input[0];
2890       return out;
2891    }
2892 
2893    out[0] = input[0];
2894    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2895    for (i=1; i < w-1; ++i) {
2896       int n = 3*input[i]+2;
2897       out[i*2+0] = stbi__div4(n+input[i-1]);
2898       out[i*2+1] = stbi__div4(n+input[i+1]);
2899    }
2900    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2901    out[i*2+1] = input[w-1];
2902 
2903    STBI_NOTUSED(in_far);
2904    STBI_NOTUSED(hs);
2905 
2906    return out;
2907 }
2908 
2909 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
2910 
stbi__resample_row_hv_2(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)2911 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2912 {
2913    // need to generate 2x2 samples for every one in input
2914    int i,t0,t1;
2915    if (w == 1) {
2916       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2917       return out;
2918    }
2919 
2920    t1 = 3*in_near[0] + in_far[0];
2921    out[0] = stbi__div4(t1+2);
2922    for (i=1; i < w; ++i) {
2923       t0 = t1;
2924       t1 = 3*in_near[i]+in_far[i];
2925       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2926       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
2927    }
2928    out[w*2-1] = stbi__div4(t1+2);
2929 
2930    STBI_NOTUSED(hs);
2931 
2932    return out;
2933 }
2934 
2935 #if defined(STBI_SSE2) || defined(STBI_NEON)
stbi__resample_row_hv_2_simd(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)2936 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2937 {
2938    // need to generate 2x2 samples for every one in input
2939    int i=0,t0,t1;
2940 
2941    if (w == 1) {
2942       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2943       return out;
2944    }
2945 
2946    t1 = 3*in_near[0] + in_far[0];
2947    // process groups of 8 pixels for as long as we can.
2948    // note we can't handle the last pixel in a row in this loop
2949    // because we need to handle the filter boundary conditions.
2950    for (; i < ((w-1) & ~7); i += 8) {
2951 #if defined(STBI_SSE2)
2952       // load and perform the vertical filtering pass
2953       // this uses 3*x + y = 4*x + (y - x)
2954       __m128i zero  = _mm_setzero_si128();
2955       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
2956       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
2957       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
2958       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
2959       __m128i diff  = _mm_sub_epi16(farw, nearw);
2960       __m128i nears = _mm_slli_epi16(nearw, 2);
2961       __m128i curr  = _mm_add_epi16(nears, diff); // current row
2962 
2963       // horizontal filter works the same based on shifted vers of current
2964       // row. "prev" is current row shifted right by 1 pixel; we need to
2965       // insert the previous pixel value (from t1).
2966       // "next" is current row shifted left by 1 pixel, with first pixel
2967       // of next block of 8 pixels added in.
2968       __m128i prv0 = _mm_slli_si128(curr, 2);
2969       __m128i nxt0 = _mm_srli_si128(curr, 2);
2970       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
2971       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
2972 
2973       // horizontal filter, polyphase implementation since it's convenient:
2974       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
2975       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
2976       // note the shared term.
2977       __m128i bias  = _mm_set1_epi16(8);
2978       __m128i curs = _mm_slli_epi16(curr, 2);
2979       __m128i prvd = _mm_sub_epi16(prev, curr);
2980       __m128i nxtd = _mm_sub_epi16(next, curr);
2981       __m128i curb = _mm_add_epi16(curs, bias);
2982       __m128i even = _mm_add_epi16(prvd, curb);
2983       __m128i odd  = _mm_add_epi16(nxtd, curb);
2984 
2985       // interleave even and odd pixels, then undo scaling.
2986       __m128i int0 = _mm_unpacklo_epi16(even, odd);
2987       __m128i int1 = _mm_unpackhi_epi16(even, odd);
2988       __m128i de0  = _mm_srli_epi16(int0, 4);
2989       __m128i de1  = _mm_srli_epi16(int1, 4);
2990 
2991       // pack and write output
2992       __m128i outv = _mm_packus_epi16(de0, de1);
2993       _mm_storeu_si128((__m128i *) (out + i*2), outv);
2994 #elif defined(STBI_NEON)
2995       // load and perform the vertical filtering pass
2996       // this uses 3*x + y = 4*x + (y - x)
2997       uint8x8_t farb  = vld1_u8(in_far + i);
2998       uint8x8_t nearb = vld1_u8(in_near + i);
2999       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3000       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3001       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3002 
3003       // horizontal filter works the same based on shifted vers of current
3004       // row. "prev" is current row shifted right by 1 pixel; we need to
3005       // insert the previous pixel value (from t1).
3006       // "next" is current row shifted left by 1 pixel, with first pixel
3007       // of next block of 8 pixels added in.
3008       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3009       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3010       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3011       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3012 
3013       // horizontal filter, polyphase implementation since it's convenient:
3014       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3015       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3016       // note the shared term.
3017       int16x8_t curs = vshlq_n_s16(curr, 2);
3018       int16x8_t prvd = vsubq_s16(prev, curr);
3019       int16x8_t nxtd = vsubq_s16(next, curr);
3020       int16x8_t even = vaddq_s16(curs, prvd);
3021       int16x8_t odd  = vaddq_s16(curs, nxtd);
3022 
3023       // undo scaling and round, then store with even/odd phases interleaved
3024       uint8x8x2_t o;
3025       o.val[0] = vqrshrun_n_s16(even, 4);
3026       o.val[1] = vqrshrun_n_s16(odd,  4);
3027       vst2_u8(out + i*2, o);
3028 #endif
3029 
3030       // "previous" value for next iter
3031       t1 = 3*in_near[i+7] + in_far[i+7];
3032    }
3033 
3034    t0 = t1;
3035    t1 = 3*in_near[i] + in_far[i];
3036    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3037 
3038    for (++i; i < w; ++i) {
3039       t0 = t1;
3040       t1 = 3*in_near[i]+in_far[i];
3041       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3042       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3043    }
3044    out[w*2-1] = stbi__div4(t1+2);
3045 
3046    STBI_NOTUSED(hs);
3047 
3048    return out;
3049 }
3050 #endif
3051 
stbi__resample_row_generic(stbi_uc * out,stbi_uc * in_near,stbi_uc * in_far,int w,int hs)3052 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3053 {
3054    // resample with nearest-neighbor
3055    int i,j;
3056    STBI_NOTUSED(in_far);
3057    for (i=0; i < w; ++i)
3058       for (j=0; j < hs; ++j)
3059          out[i*hs+j] = in_near[i];
3060    return out;
3061 }
3062 
3063 #ifdef STBI_JPEG_OLD
3064 // this is the same YCbCr-to-RGB calculation that stb_image has used
3065 // historically before the algorithm changes in 1.49
3066 #define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
stbi__YCbCr_to_RGB_row(stbi_uc * out,const stbi_uc * y,const stbi_uc * pcb,const stbi_uc * pcr,int count,int step)3067 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3068 {
3069    int i;
3070    for (i=0; i < count; ++i) {
3071       int y_fixed = (y[i] << 16) + 32768; // rounding
3072       int r,g,b;
3073       int cr = pcr[i] - 128;
3074       int cb = pcb[i] - 128;
3075       r = y_fixed + cr*float2fixed(1.40200f);
3076       g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3077       b = y_fixed                            + cb*float2fixed(1.77200f);
3078       r >>= 16;
3079       g >>= 16;
3080       b >>= 16;
3081       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3082       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3083       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3084       out[0] = (stbi_uc)r;
3085       out[1] = (stbi_uc)g;
3086       out[2] = (stbi_uc)b;
3087       out[3] = 255;
3088       out += step;
3089    }
3090 }
3091 #else
3092 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3093 // to make sure the code produces the same results in both SIMD and scalar
3094 #define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
stbi__YCbCr_to_RGB_row(stbi_uc * out,const stbi_uc * y,const stbi_uc * pcb,const stbi_uc * pcr,int count,int step)3095 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3096 {
3097    int i;
3098    for (i=0; i < count; ++i) {
3099       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3100       int r,g,b;
3101       int cr = pcr[i] - 128;
3102       int cb = pcb[i] - 128;
3103       r = y_fixed +  cr* float2fixed(1.40200f);
3104       g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3105       b = y_fixed                               +   cb* float2fixed(1.77200f);
3106       r >>= 20;
3107       g >>= 20;
3108       b >>= 20;
3109       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3110       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3111       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3112       out[0] = (stbi_uc)r;
3113       out[1] = (stbi_uc)g;
3114       out[2] = (stbi_uc)b;
3115       out[3] = 255;
3116       out += step;
3117    }
3118 }
3119 #endif
3120 
3121 #if defined(STBI_SSE2) || defined(STBI_NEON)
stbi__YCbCr_to_RGB_simd(stbi_uc * out,stbi_uc const * y,stbi_uc const * pcb,stbi_uc const * pcr,int count,int step)3122 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3123 {
3124    int i = 0;
3125 
3126 #ifdef STBI_SSE2
3127    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3128    // it's useful in practice (you wouldn't use it for textures, for example).
3129    // so just accelerate step == 4 case.
3130    if (step == 4) {
3131       // this is a fairly straightforward implementation and not super-optimized.
3132       __m128i signflip  = _mm_set1_epi8(-0x80);
3133       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3134       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3135       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3136       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3137       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3138       __m128i xw = _mm_set1_epi16(255); // alpha channel
3139 
3140       for (; i+7 < count; i += 8) {
3141          // load
3142          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3143          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3144          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3145          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3146          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3147 
3148          // unpack to short (and left-shift cr, cb by 8)
3149          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3150          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3151          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3152 
3153          // color transform
3154          __m128i yws = _mm_srli_epi16(yw, 4);
3155          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3156          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3157          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3158          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3159          __m128i rws = _mm_add_epi16(cr0, yws);
3160          __m128i gwt = _mm_add_epi16(cb0, yws);
3161          __m128i bws = _mm_add_epi16(yws, cb1);
3162          __m128i gws = _mm_add_epi16(gwt, cr1);
3163 
3164          // descale
3165          __m128i rw = _mm_srai_epi16(rws, 4);
3166          __m128i bw = _mm_srai_epi16(bws, 4);
3167          __m128i gw = _mm_srai_epi16(gws, 4);
3168 
3169          // back to byte, set up for transpose
3170          __m128i brb = _mm_packus_epi16(rw, bw);
3171          __m128i gxb = _mm_packus_epi16(gw, xw);
3172 
3173          // transpose to interleave channels
3174          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3175          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3176          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3177          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3178 
3179          // store
3180          _mm_storeu_si128((__m128i *) (out + 0), o0);
3181          _mm_storeu_si128((__m128i *) (out + 16), o1);
3182          out += 32;
3183       }
3184    }
3185 #endif
3186 
3187 #ifdef STBI_NEON
3188    // in this version, step=3 support would be easy to add. but is there demand?
3189    if (step == 4) {
3190       // this is a fairly straightforward implementation and not super-optimized.
3191       uint8x8_t signflip = vdup_n_u8(0x80);
3192       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3193       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3194       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3195       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3196 
3197       for (; i+7 < count; i += 8) {
3198          // load
3199          uint8x8_t y_bytes  = vld1_u8(y + i);
3200          uint8x8_t cr_bytes = vld1_u8(pcr + i);
3201          uint8x8_t cb_bytes = vld1_u8(pcb + i);
3202          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3203          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3204 
3205          // expand to s16
3206          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3207          int16x8_t crw = vshll_n_s8(cr_biased, 7);
3208          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3209 
3210          // color transform
3211          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3212          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3213          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3214          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3215          int16x8_t rws = vaddq_s16(yws, cr0);
3216          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3217          int16x8_t bws = vaddq_s16(yws, cb1);
3218 
3219          // undo scaling, round, convert to byte
3220          uint8x8x4_t o;
3221          o.val[0] = vqrshrun_n_s16(rws, 4);
3222          o.val[1] = vqrshrun_n_s16(gws, 4);
3223          o.val[2] = vqrshrun_n_s16(bws, 4);
3224          o.val[3] = vdup_n_u8(255);
3225 
3226          // store, interleaving r/g/b/a
3227          vst4_u8(out, o);
3228          out += 8*4;
3229       }
3230    }
3231 #endif
3232 
3233    for (; i < count; ++i) {
3234       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3235       int r,g,b;
3236       int cr = pcr[i] - 128;
3237       int cb = pcb[i] - 128;
3238       r = y_fixed + cr* float2fixed(1.40200f);
3239       g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3240       b = y_fixed                             +   cb* float2fixed(1.77200f);
3241       r >>= 20;
3242       g >>= 20;
3243       b >>= 20;
3244       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3245       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3246       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3247       out[0] = (stbi_uc)r;
3248       out[1] = (stbi_uc)g;
3249       out[2] = (stbi_uc)b;
3250       out[3] = 255;
3251       out += step;
3252    }
3253 }
3254 #endif
3255 
3256 // set up the kernels
stbi__setup_jpeg(stbi__jpeg * j)3257 static void stbi__setup_jpeg(stbi__jpeg *j)
3258 {
3259    j->idct_block_kernel = stbi__idct_block;
3260    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3261    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3262 
3263 #ifdef STBI_SSE2
3264    if (stbi__sse2_available()) {
3265       j->idct_block_kernel = stbi__idct_simd;
3266       #ifndef STBI_JPEG_OLD
3267       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3268       #endif
3269       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3270    }
3271 #endif
3272 
3273 #ifdef STBI_NEON
3274    j->idct_block_kernel = stbi__idct_simd;
3275    #ifndef STBI_JPEG_OLD
3276    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3277    #endif
3278    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3279 #endif
3280 }
3281 
3282 // clean up the temporary component buffers
stbi__cleanup_jpeg(stbi__jpeg * j)3283 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3284 {
3285    int i;
3286    for (i=0; i < j->s->img_n; ++i) {
3287       if (j->img_comp[i].raw_data) {
3288          STBI_FREE(j->img_comp[i].raw_data);
3289          j->img_comp[i].raw_data = NULL;
3290          j->img_comp[i].data = NULL;
3291       }
3292       if (j->img_comp[i].raw_coeff) {
3293          STBI_FREE(j->img_comp[i].raw_coeff);
3294          j->img_comp[i].raw_coeff = 0;
3295          j->img_comp[i].coeff = 0;
3296       }
3297       if (j->img_comp[i].linebuf) {
3298          STBI_FREE(j->img_comp[i].linebuf);
3299          j->img_comp[i].linebuf = NULL;
3300       }
3301    }
3302 }
3303 
3304 typedef struct
3305 {
3306    resample_row_func resample;
3307    stbi_uc *line0,*line1;
3308    int hs,vs;   // expansion factor in each axis
3309    int w_lores; // horizontal pixels pre-expansion
3310    int ystep;   // how far through vertical expansion we are
3311    int ypos;    // which pre-expansion row we're on
3312 } stbi__resample;
3313 
load_jpeg_image(stbi__jpeg * z,int * out_x,int * out_y,int * comp,int req_comp)3314 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3315 {
3316    int n, decode_n;
3317    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3318 
3319    // validate req_comp
3320    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3321 
3322    // load a jpeg image from whichever source, but leave in YCbCr format
3323    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3324 
3325    // determine actual number of components to generate
3326    n = req_comp ? req_comp : z->s->img_n;
3327 
3328    if (z->s->img_n == 3 && n < 3)
3329       decode_n = 1;
3330    else
3331       decode_n = z->s->img_n;
3332 
3333    // resample and color-convert
3334    {
3335       int k;
3336       unsigned int i,j;
3337       stbi_uc *output;
3338       stbi_uc *coutput[4];
3339 
3340       stbi__resample res_comp[4];
3341 
3342       for (k=0; k < decode_n; ++k) {
3343          stbi__resample *r = &res_comp[k];
3344 
3345          // allocate line buffer big enough for upsampling off the edges
3346          // with upsample factor of 4
3347          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3348          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3349 
3350          r->hs      = z->img_h_max / z->img_comp[k].h;
3351          r->vs      = z->img_v_max / z->img_comp[k].v;
3352          r->ystep   = r->vs >> 1;
3353          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3354          r->ypos    = 0;
3355          r->line0   = r->line1 = z->img_comp[k].data;
3356 
3357          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3358          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3359          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3360          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3361          else                               r->resample = stbi__resample_row_generic;
3362       }
3363 
3364       // can't error after this so, this is safe
3365       output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
3366       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3367 
3368       // now go ahead and resample
3369       for (j=0; j < z->s->img_y; ++j) {
3370          stbi_uc *out = output + n * z->s->img_x * j;
3371          for (k=0; k < decode_n; ++k) {
3372             stbi__resample *r = &res_comp[k];
3373             int y_bot = r->ystep >= (r->vs >> 1);
3374             coutput[k] = r->resample(z->img_comp[k].linebuf,
3375                                      y_bot ? r->line1 : r->line0,
3376                                      y_bot ? r->line0 : r->line1,
3377                                      r->w_lores, r->hs);
3378             if (++r->ystep >= r->vs) {
3379                r->ystep = 0;
3380                r->line0 = r->line1;
3381                if (++r->ypos < z->img_comp[k].y)
3382                   r->line1 += z->img_comp[k].w2;
3383             }
3384          }
3385          if (n >= 3) {
3386             stbi_uc *y = coutput[0];
3387             if (z->s->img_n == 3) {
3388                z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3389             } else
3390                for (i=0; i < z->s->img_x; ++i) {
3391                   out[0] = out[1] = out[2] = y[i];
3392                   out[3] = 255; // not used if n==3
3393                   out += n;
3394                }
3395          } else {
3396             stbi_uc *y = coutput[0];
3397             if (n == 1)
3398                for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
3399             else
3400                for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
3401          }
3402       }
3403       stbi__cleanup_jpeg(z);
3404       *out_x = z->s->img_x;
3405       *out_y = z->s->img_y;
3406       if (comp) *comp  = z->s->img_n; // report original components, not output
3407       return output;
3408    }
3409 }
3410 
stbi__jpeg_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)3411 static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
3412 {
3413    stbi__jpeg j;
3414    j.s = s;
3415    stbi__setup_jpeg(&j);
3416    return load_jpeg_image(&j, x,y,comp,req_comp);
3417 }
3418 
stbi__jpeg_test(stbi__context * s)3419 static int stbi__jpeg_test(stbi__context *s)
3420 {
3421    int r;
3422    stbi__jpeg j;
3423    j.s = s;
3424    stbi__setup_jpeg(&j);
3425    r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3426    stbi__rewind(s);
3427    return r;
3428 }
3429 
stbi__jpeg_info_raw(stbi__jpeg * j,int * x,int * y,int * comp)3430 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3431 {
3432    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3433       stbi__rewind( j->s );
3434       return 0;
3435    }
3436    if (x) *x = j->s->img_x;
3437    if (y) *y = j->s->img_y;
3438    if (comp) *comp = j->s->img_n;
3439    return 1;
3440 }
3441 
stbi__jpeg_info(stbi__context * s,int * x,int * y,int * comp)3442 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
3443 {
3444    stbi__jpeg j;
3445    j.s = s;
3446    return stbi__jpeg_info_raw(&j, x, y, comp);
3447 }
3448 #endif
3449 
3450 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
3451 //    simple implementation
3452 //      - all input must be provided in an upfront buffer
3453 //      - all output is written to a single output buffer (can malloc/realloc)
3454 //    performance
3455 //      - fast huffman
3456 
3457 #ifndef STBI_NO_ZLIB
3458 
3459 // fast-way is faster to check than jpeg huffman, but slow way is slower
3460 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
3461 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
3462 
3463 // zlib-style huffman encoding
3464 // (jpegs packs from left, zlib from right, so can't share code)
3465 typedef struct
3466 {
3467    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3468    stbi__uint16 firstcode[16];
3469    int maxcode[17];
3470    stbi__uint16 firstsymbol[16];
3471    stbi_uc  size[288];
3472    stbi__uint16 value[288];
3473 } stbi__zhuffman;
3474 
stbi__bitreverse16(int n)3475 stbi_inline static int stbi__bitreverse16(int n)
3476 {
3477   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
3478   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
3479   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
3480   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
3481   return n;
3482 }
3483 
stbi__bit_reverse(int v,int bits)3484 stbi_inline static int stbi__bit_reverse(int v, int bits)
3485 {
3486    STBI_ASSERT(bits <= 16);
3487    // to bit reverse n bits, reverse 16 and shift
3488    // e.g. 11 bits, bit reverse and shift away 5
3489    return stbi__bitreverse16(v) >> (16-bits);
3490 }
3491 
stbi__zbuild_huffman(stbi__zhuffman * z,stbi_uc * sizelist,int num)3492 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
3493 {
3494    int i,k=0;
3495    int code, next_code[16], sizes[17];
3496 
3497    // DEFLATE spec for generating codes
3498    memset(sizes, 0, sizeof(sizes));
3499    memset(z->fast, 0, sizeof(z->fast));
3500    for (i=0; i < num; ++i)
3501       ++sizes[sizelist[i]];
3502    sizes[0] = 0;
3503    for (i=1; i < 16; ++i)
3504       if (sizes[i] > (1 << i))
3505          return stbi__err("bad sizes", "Corrupt PNG");
3506    code = 0;
3507    for (i=1; i < 16; ++i) {
3508       next_code[i] = code;
3509       z->firstcode[i] = (stbi__uint16) code;
3510       z->firstsymbol[i] = (stbi__uint16) k;
3511       code = (code + sizes[i]);
3512       if (sizes[i])
3513          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
3514       z->maxcode[i] = code << (16-i); // preshift for inner loop
3515       code <<= 1;
3516       k += sizes[i];
3517    }
3518    z->maxcode[16] = 0x10000; // sentinel
3519    for (i=0; i < num; ++i) {
3520       int s = sizelist[i];
3521       if (s) {
3522          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3523          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
3524          z->size [c] = (stbi_uc     ) s;
3525          z->value[c] = (stbi__uint16) i;
3526          if (s <= STBI__ZFAST_BITS) {
3527             int j = stbi__bit_reverse(next_code[s],s);
3528             while (j < (1 << STBI__ZFAST_BITS)) {
3529                z->fast[j] = fastv;
3530                j += (1 << s);
3531             }
3532          }
3533          ++next_code[s];
3534       }
3535    }
3536    return 1;
3537 }
3538 
3539 // zlib-from-memory implementation for PNG reading
3540 //    because PNG allows splitting the zlib stream arbitrarily,
3541 //    and it's annoying structurally to have PNG call ZLIB call PNG,
3542 //    we require PNG read all the IDATs and combine them into a single
3543 //    memory buffer
3544 
3545 typedef struct
3546 {
3547    stbi_uc *zbuffer, *zbuffer_end;
3548    int num_bits;
3549    stbi__uint32 code_buffer;
3550 
3551    char *zout;
3552    char *zout_start;
3553    char *zout_end;
3554    int   z_expandable;
3555 
3556    stbi__zhuffman z_length, z_distance;
3557 } stbi__zbuf;
3558 
stbi__zget8(stbi__zbuf * z)3559 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
3560 {
3561    if (z->zbuffer >= z->zbuffer_end) return 0;
3562    return *z->zbuffer++;
3563 }
3564 
stbi__fill_bits(stbi__zbuf * z)3565 static void stbi__fill_bits(stbi__zbuf *z)
3566 {
3567    do {
3568       STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3569       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
3570       z->num_bits += 8;
3571    } while (z->num_bits <= 24);
3572 }
3573 
stbi__zreceive(stbi__zbuf * z,int n)3574 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
3575 {
3576    unsigned int k;
3577    if (z->num_bits < n) stbi__fill_bits(z);
3578    k = z->code_buffer & ((1 << n) - 1);
3579    z->code_buffer >>= n;
3580    z->num_bits -= n;
3581    return k;
3582 }
3583 
stbi__zhuffman_decode_slowpath(stbi__zbuf * a,stbi__zhuffman * z)3584 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
3585 {
3586    int b,s,k;
3587    // not resolved by fast table, so compute it the slow way
3588    // use jpeg approach, which requires MSbits at top
3589    k = stbi__bit_reverse(a->code_buffer, 16);
3590    for (s=STBI__ZFAST_BITS+1; ; ++s)
3591       if (k < z->maxcode[s])
3592          break;
3593    if (s == 16) return -1; // invalid code!
3594    // code size is s, so:
3595    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
3596    STBI_ASSERT(z->size[b] == s);
3597    a->code_buffer >>= s;
3598    a->num_bits -= s;
3599    return z->value[b];
3600 }
3601 
stbi__zhuffman_decode(stbi__zbuf * a,stbi__zhuffman * z)3602 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
3603 {
3604    int b,s;
3605    if (a->num_bits < 16) stbi__fill_bits(a);
3606    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3607    if (b) {
3608       s = b >> 9;
3609       a->code_buffer >>= s;
3610       a->num_bits -= s;
3611       return b & 511;
3612    }
3613    return stbi__zhuffman_decode_slowpath(a, z);
3614 }
3615 
stbi__zexpand(stbi__zbuf * z,char * zout,int n)3616 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
3617 {
3618    char *q;
3619    int cur, limit;
3620    z->zout = zout;
3621    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
3622    cur   = (int) (z->zout     - z->zout_start);
3623    limit = (int) (z->zout_end - z->zout_start);
3624    while (cur + n > limit)
3625       limit *= 2;
3626    q = (char *) STBI_REALLOC(z->zout_start, limit);
3627    if (q == NULL) return stbi__err("outofmem", "Out of memory");
3628    z->zout_start = q;
3629    z->zout       = q + cur;
3630    z->zout_end   = q + limit;
3631    return 1;
3632 }
3633 
3634 static int stbi__zlength_base[31] = {
3635    3,4,5,6,7,8,9,10,11,13,
3636    15,17,19,23,27,31,35,43,51,59,
3637    67,83,99,115,131,163,195,227,258,0,0 };
3638 
3639 static int stbi__zlength_extra[31]=
3640 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
3641 
3642 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
3643 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
3644 
3645 static int stbi__zdist_extra[32] =
3646 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
3647 
stbi__parse_huffman_block(stbi__zbuf * a)3648 static int stbi__parse_huffman_block(stbi__zbuf *a)
3649 {
3650    char *zout = a->zout;
3651    for(;;) {
3652       int z = stbi__zhuffman_decode(a, &a->z_length);
3653       if (z < 256) {
3654          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
3655          if (zout >= a->zout_end) {
3656             if (!stbi__zexpand(a, zout, 1)) return 0;
3657             zout = a->zout;
3658          }
3659          *zout++ = (char) z;
3660       } else {
3661          stbi_uc *p;
3662          int len,dist;
3663          if (z == 256) {
3664             a->zout = zout;
3665             return 1;
3666          }
3667          z -= 257;
3668          len = stbi__zlength_base[z];
3669          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
3670          z = stbi__zhuffman_decode(a, &a->z_distance);
3671          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
3672          dist = stbi__zdist_base[z];
3673          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
3674          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
3675          if (zout + len > a->zout_end) {
3676             if (!stbi__zexpand(a, zout, len)) return 0;
3677             zout = a->zout;
3678          }
3679          p = (stbi_uc *) (zout - dist);
3680          if (dist == 1) { // run of one byte; common in images.
3681             stbi_uc v = *p;
3682             if (len) { do *zout++ = v; while (--len); }
3683          } else {
3684             if (len) { do *zout++ = *p++; while (--len); }
3685          }
3686       }
3687    }
3688 }
3689 
stbi__compute_huffman_codes(stbi__zbuf * a)3690 static int stbi__compute_huffman_codes(stbi__zbuf *a)
3691 {
3692    static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
3693    stbi__zhuffman z_codelength;
3694    stbi_uc lencodes[286+32+137];//padding for maximum single op
3695    stbi_uc codelength_sizes[19];
3696    int i,n;
3697 
3698    int hlit  = stbi__zreceive(a,5) + 257;
3699    int hdist = stbi__zreceive(a,5) + 1;
3700    int hclen = stbi__zreceive(a,4) + 4;
3701 
3702    memset(codelength_sizes, 0, sizeof(codelength_sizes));
3703    for (i=0; i < hclen; ++i) {
3704       int s = stbi__zreceive(a,3);
3705       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
3706    }
3707    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
3708 
3709    n = 0;
3710    while (n < hlit + hdist) {
3711       int c = stbi__zhuffman_decode(a, &z_codelength);
3712       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
3713       if (c < 16)
3714          lencodes[n++] = (stbi_uc) c;
3715       else if (c == 16) {
3716          c = stbi__zreceive(a,2)+3;
3717          memset(lencodes+n, lencodes[n-1], c);
3718          n += c;
3719       } else if (c == 17) {
3720          c = stbi__zreceive(a,3)+3;
3721          memset(lencodes+n, 0, c);
3722          n += c;
3723       } else {
3724          STBI_ASSERT(c == 18);
3725          c = stbi__zreceive(a,7)+11;
3726          memset(lencodes+n, 0, c);
3727          n += c;
3728       }
3729    }
3730    if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
3731    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
3732    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
3733    return 1;
3734 }
3735 
stbi__parse_uncomperssed_block(stbi__zbuf * a)3736 static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
3737 {
3738    stbi_uc header[4];
3739    int len,nlen,k;
3740    if (a->num_bits & 7)
3741       stbi__zreceive(a, a->num_bits & 7); // discard
3742    // drain the bit-packed data into header
3743    k = 0;
3744    while (a->num_bits > 0) {
3745       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
3746       a->code_buffer >>= 8;
3747       a->num_bits -= 8;
3748    }
3749    STBI_ASSERT(a->num_bits == 0);
3750    // now fill header the normal way
3751    while (k < 4)
3752       header[k++] = stbi__zget8(a);
3753    len  = header[1] * 256 + header[0];
3754    nlen = header[3] * 256 + header[2];
3755    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
3756    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
3757    if (a->zout + len > a->zout_end)
3758       if (!stbi__zexpand(a, a->zout, len)) return 0;
3759    memcpy(a->zout, a->zbuffer, len);
3760    a->zbuffer += len;
3761    a->zout += len;
3762    return 1;
3763 }
3764 
stbi__parse_zlib_header(stbi__zbuf * a)3765 static int stbi__parse_zlib_header(stbi__zbuf *a)
3766 {
3767    int cmf   = stbi__zget8(a);
3768    int cm    = cmf & 15;
3769    /* int cinfo = cmf >> 4; */
3770    int flg   = stbi__zget8(a);
3771    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
3772    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
3773    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
3774    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
3775    return 1;
3776 }
3777 
3778 // @TODO: should statically initialize these for optimal thread safety
3779 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
stbi__init_zdefaults(void)3780 static void stbi__init_zdefaults(void)
3781 {
3782    int i;   // use <= to match clearly with spec
3783    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
3784    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
3785    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
3786    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
3787 
3788    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
3789 }
3790 
stbi__parse_zlib(stbi__zbuf * a,int parse_header)3791 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
3792 {
3793    int final, type;
3794    if (parse_header)
3795       if (!stbi__parse_zlib_header(a)) return 0;
3796    a->num_bits = 0;
3797    a->code_buffer = 0;
3798    do {
3799       final = stbi__zreceive(a,1);
3800       type = stbi__zreceive(a,2);
3801       if (type == 0) {
3802          if (!stbi__parse_uncomperssed_block(a)) return 0;
3803       } else if (type == 3) {
3804          return 0;
3805       } else {
3806          if (type == 1) {
3807             // use fixed code lengths
3808             if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
3809             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
3810             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
3811          } else {
3812             if (!stbi__compute_huffman_codes(a)) return 0;
3813          }
3814          if (!stbi__parse_huffman_block(a)) return 0;
3815       }
3816    } while (!final);
3817    return 1;
3818 }
3819 
stbi__do_zlib(stbi__zbuf * a,char * obuf,int olen,int exp,int parse_header)3820 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
3821 {
3822    a->zout_start = obuf;
3823    a->zout       = obuf;
3824    a->zout_end   = obuf + olen;
3825    a->z_expandable = exp;
3826 
3827    return stbi__parse_zlib(a, parse_header);
3828 }
3829 
stbi_zlib_decode_malloc_guesssize(const char * buffer,int len,int initial_size,int * outlen)3830 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
3831 {
3832    stbi__zbuf a;
3833    char *p = (char *) stbi__malloc(initial_size);
3834    if (p == NULL) return NULL;
3835    a.zbuffer = (stbi_uc *) buffer;
3836    a.zbuffer_end = (stbi_uc *) buffer + len;
3837    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
3838       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3839       return a.zout_start;
3840    } else {
3841       STBI_FREE(a.zout_start);
3842       return NULL;
3843    }
3844 }
3845 
stbi_zlib_decode_malloc(char const * buffer,int len,int * outlen)3846 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
3847 {
3848    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
3849 }
3850 
stbi_zlib_decode_malloc_guesssize_headerflag(const char * buffer,int len,int initial_size,int * outlen,int parse_header)3851 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
3852 {
3853    stbi__zbuf a;
3854    char *p = (char *) stbi__malloc(initial_size);
3855    if (p == NULL) return NULL;
3856    a.zbuffer = (stbi_uc *) buffer;
3857    a.zbuffer_end = (stbi_uc *) buffer + len;
3858    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
3859       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3860       return a.zout_start;
3861    } else {
3862       STBI_FREE(a.zout_start);
3863       return NULL;
3864    }
3865 }
3866 
stbi_zlib_decode_buffer(char * obuffer,int olen,char const * ibuffer,int ilen)3867 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
3868 {
3869    stbi__zbuf a;
3870    a.zbuffer = (stbi_uc *) ibuffer;
3871    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3872    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
3873       return (int) (a.zout - a.zout_start);
3874    else
3875       return -1;
3876 }
3877 
stbi_zlib_decode_noheader_malloc(char const * buffer,int len,int * outlen)3878 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
3879 {
3880    stbi__zbuf a;
3881    char *p = (char *) stbi__malloc(16384);
3882    if (p == NULL) return NULL;
3883    a.zbuffer = (stbi_uc *) buffer;
3884    a.zbuffer_end = (stbi_uc *) buffer+len;
3885    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
3886       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3887       return a.zout_start;
3888    } else {
3889       STBI_FREE(a.zout_start);
3890       return NULL;
3891    }
3892 }
3893 
stbi_zlib_decode_noheader_buffer(char * obuffer,int olen,const char * ibuffer,int ilen)3894 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3895 {
3896    stbi__zbuf a;
3897    a.zbuffer = (stbi_uc *) ibuffer;
3898    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3899    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3900       return (int) (a.zout - a.zout_start);
3901    else
3902       return -1;
3903 }
3904 #endif
3905 
3906 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
3907 //    simple implementation
3908 //      - only 8-bit samples
3909 //      - no CRC checking
3910 //      - allocates lots of intermediate memory
3911 //        - avoids problem of streaming data between subsystems
3912 //        - avoids explicit window management
3913 //    performance
3914 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3915 
3916 #ifndef STBI_NO_PNG
3917 typedef struct
3918 {
3919    stbi__uint32 length;
3920    stbi__uint32 type;
3921 } stbi__pngchunk;
3922 
stbi__get_chunk_header(stbi__context * s)3923 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
3924 {
3925    stbi__pngchunk c;
3926    c.length = stbi__get32be(s);
3927    c.type   = stbi__get32be(s);
3928    return c;
3929 }
3930 
stbi__check_png_header(stbi__context * s)3931 static int stbi__check_png_header(stbi__context *s)
3932 {
3933    static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
3934    int i;
3935    for (i=0; i < 8; ++i)
3936       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
3937    return 1;
3938 }
3939 
3940 typedef struct
3941 {
3942    stbi__context *s;
3943    stbi_uc *idata, *expanded, *out;
3944 } stbi__png;
3945 
3946 
3947 enum {
3948    STBI__F_none=0,
3949    STBI__F_sub=1,
3950    STBI__F_up=2,
3951    STBI__F_avg=3,
3952    STBI__F_paeth=4,
3953    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
3954    STBI__F_avg_first,
3955    STBI__F_paeth_first
3956 };
3957 
3958 static stbi_uc first_row_filter[5] =
3959 {
3960    STBI__F_none,
3961    STBI__F_sub,
3962    STBI__F_none,
3963    STBI__F_avg_first,
3964    STBI__F_paeth_first
3965 };
3966 
stbi__paeth(int a,int b,int c)3967 static int stbi__paeth(int a, int b, int c)
3968 {
3969    int p = a + b - c;
3970    int pa = abs(p-a);
3971    int pb = abs(p-b);
3972    int pc = abs(p-c);
3973    if (pa <= pb && pa <= pc) return a;
3974    if (pb <= pc) return b;
3975    return c;
3976 }
3977 
3978 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
3979 
3980 // create the png data from post-deflated data
stbi__create_png_image_raw(stbi__png * a,stbi_uc * raw,stbi__uint32 raw_len,int out_n,stbi__uint32 x,stbi__uint32 y,int depth,int color)3981 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
3982 {
3983    stbi__context *s = a->s;
3984    stbi__uint32 i,j,stride = x*out_n;
3985    stbi__uint32 img_len, img_width_bytes;
3986    int k;
3987    int img_n = s->img_n; // copy it into a local for later
3988 
3989    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
3990    a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
3991    if (!a->out) return stbi__err("outofmem", "Out of memory");
3992 
3993    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
3994    img_len = (img_width_bytes + 1) * y;
3995    if (s->img_x == x && s->img_y == y) {
3996       if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
3997    } else { // interlaced:
3998       if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
3999    }
4000 
4001    for (j=0; j < y; ++j) {
4002       stbi_uc *cur = a->out + stride*j;
4003       stbi_uc *prior = cur - stride;
4004       int filter = *raw++;
4005       int filter_bytes = img_n;
4006       int width = x;
4007       if (filter > 4)
4008          return stbi__err("invalid filter","Corrupt PNG");
4009 
4010       if (depth < 8) {
4011          STBI_ASSERT(img_width_bytes <= x);
4012          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4013          filter_bytes = 1;
4014          width = img_width_bytes;
4015       }
4016 
4017       // if first row, use special filter that doesn't sample previous row
4018       if (j == 0) filter = first_row_filter[filter];
4019 
4020       // handle first byte explicitly
4021       for (k=0; k < filter_bytes; ++k) {
4022          switch (filter) {
4023             case STBI__F_none       : cur[k] = raw[k]; break;
4024             case STBI__F_sub        : cur[k] = raw[k]; break;
4025             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4026             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4027             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4028             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4029             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4030          }
4031       }
4032 
4033       if (depth == 8) {
4034          if (img_n != out_n)
4035             cur[img_n] = 255; // first pixel
4036          raw += img_n;
4037          cur += out_n;
4038          prior += out_n;
4039       } else {
4040          raw += 1;
4041          cur += 1;
4042          prior += 1;
4043       }
4044 
4045       // this is a little gross, so that we don't switch per-pixel or per-component
4046       if (depth < 8 || img_n == out_n) {
4047          int nk = (width - 1)*img_n;
4048          #define CASE(f) \
4049              case f:     \
4050                 for (k=0; k < nk; ++k)
4051          switch (filter) {
4052             // "none" filter turns into a memcpy here; make that explicit.
4053             case STBI__F_none:         memcpy(cur, raw, nk); break;
4054             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
4055             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4056             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
4057             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
4058             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
4059             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
4060          }
4061          #undef CASE
4062          raw += nk;
4063       } else {
4064          STBI_ASSERT(img_n+1 == out_n);
4065          #define CASE(f) \
4066              case f:     \
4067                 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
4068                    for (k=0; k < img_n; ++k)
4069          switch (filter) {
4070             CASE(STBI__F_none)         cur[k] = raw[k]; break;
4071             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
4072             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4073             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
4074             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
4075             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
4076             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
4077          }
4078          #undef CASE
4079       }
4080    }
4081 
4082    // we make a separate pass to expand bits to pixels; for performance,
4083    // this could run two scanlines behind the above code, so it won't
4084    // interfere with filtering but will still be in the cache.
4085    if (depth < 8) {
4086       for (j=0; j < y; ++j) {
4087          stbi_uc *cur = a->out + stride*j;
4088          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4089          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4090          // png guarantee byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4091          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4092 
4093          // note that the final byte might overshoot and write more data than desired.
4094          // we can allocate enough data that this never writes out of memory, but it
4095          // could also overwrite the next scanline. can it overwrite non-empty data
4096          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4097          // so we need to explicitly clamp the final ones
4098 
4099          if (depth == 4) {
4100             for (k=x*img_n; k >= 2; k-=2, ++in) {
4101                *cur++ = scale * ((*in >> 4)       );
4102                *cur++ = scale * ((*in     ) & 0x0f);
4103             }
4104             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4105          } else if (depth == 2) {
4106             for (k=x*img_n; k >= 4; k-=4, ++in) {
4107                *cur++ = scale * ((*in >> 6)       );
4108                *cur++ = scale * ((*in >> 4) & 0x03);
4109                *cur++ = scale * ((*in >> 2) & 0x03);
4110                *cur++ = scale * ((*in     ) & 0x03);
4111             }
4112             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4113             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4114             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4115          } else if (depth == 1) {
4116             for (k=x*img_n; k >= 8; k-=8, ++in) {
4117                *cur++ = scale * ((*in >> 7)       );
4118                *cur++ = scale * ((*in >> 6) & 0x01);
4119                *cur++ = scale * ((*in >> 5) & 0x01);
4120                *cur++ = scale * ((*in >> 4) & 0x01);
4121                *cur++ = scale * ((*in >> 3) & 0x01);
4122                *cur++ = scale * ((*in >> 2) & 0x01);
4123                *cur++ = scale * ((*in >> 1) & 0x01);
4124                *cur++ = scale * ((*in     ) & 0x01);
4125             }
4126             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4127             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4128             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4129             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4130             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4131             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4132             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4133          }
4134          if (img_n != out_n) {
4135             int q;
4136             // insert alpha = 255
4137             cur = a->out + stride*j;
4138             if (img_n == 1) {
4139                for (q=x-1; q >= 0; --q) {
4140                   cur[q*2+1] = 255;
4141                   cur[q*2+0] = cur[q];
4142                }
4143             } else {
4144                STBI_ASSERT(img_n == 3);
4145                for (q=x-1; q >= 0; --q) {
4146                   cur[q*4+3] = 255;
4147                   cur[q*4+2] = cur[q*3+2];
4148                   cur[q*4+1] = cur[q*3+1];
4149                   cur[q*4+0] = cur[q*3+0];
4150                }
4151             }
4152          }
4153       }
4154    }
4155 
4156    return 1;
4157 }
4158 
stbi__create_png_image(stbi__png * a,stbi_uc * image_data,stbi__uint32 image_data_len,int out_n,int depth,int color,int interlaced)4159 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4160 {
4161    stbi_uc *final;
4162    int p;
4163    if (!interlaced)
4164       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4165 
4166    // de-interlacing
4167    final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
4168    for (p=0; p < 7; ++p) {
4169       int xorig[] = { 0,4,0,2,0,1,0 };
4170       int yorig[] = { 0,0,4,0,2,0,1 };
4171       int xspc[]  = { 8,8,4,4,2,2,1 };
4172       int yspc[]  = { 8,8,8,4,4,2,2 };
4173       int i,j,x,y;
4174       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4175       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4176       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4177       if (x && y) {
4178          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4179          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4180             STBI_FREE(final);
4181             return 0;
4182          }
4183          for (j=0; j < y; ++j) {
4184             for (i=0; i < x; ++i) {
4185                int out_y = j*yspc[p]+yorig[p];
4186                int out_x = i*xspc[p]+xorig[p];
4187                memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4188                       a->out + (j*x+i)*out_n, out_n);
4189             }
4190          }
4191          STBI_FREE(a->out);
4192          image_data += img_len;
4193          image_data_len -= img_len;
4194       }
4195    }
4196    a->out = final;
4197 
4198    return 1;
4199 }
4200 
stbi__compute_transparency(stbi__png * z,stbi_uc tc[3],int out_n)4201 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4202 {
4203    stbi__context *s = z->s;
4204    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4205    stbi_uc *p = z->out;
4206 
4207    // compute color-based transparency, assuming we've
4208    // already got 255 as the alpha value in the output
4209    STBI_ASSERT(out_n == 2 || out_n == 4);
4210 
4211    if (out_n == 2) {
4212       for (i=0; i < pixel_count; ++i) {
4213          p[1] = (p[0] == tc[0] ? 0 : 255);
4214          p += 2;
4215       }
4216    } else {
4217       for (i=0; i < pixel_count; ++i) {
4218          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4219             p[3] = 0;
4220          p += 4;
4221       }
4222    }
4223    return 1;
4224 }
4225 
stbi__expand_png_palette(stbi__png * a,stbi_uc * palette,int len,int pal_img_n)4226 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4227 {
4228    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4229    stbi_uc *p, *temp_out, *orig = a->out;
4230 
4231    p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4232    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4233 
4234    // between here and free(out) below, exiting would leak
4235    temp_out = p;
4236 
4237    if (pal_img_n == 3) {
4238       for (i=0; i < pixel_count; ++i) {
4239          int n = orig[i]*4;
4240          p[0] = palette[n  ];
4241          p[1] = palette[n+1];
4242          p[2] = palette[n+2];
4243          p += 3;
4244       }
4245    } else {
4246       for (i=0; i < pixel_count; ++i) {
4247          int n = orig[i]*4;
4248          p[0] = palette[n  ];
4249          p[1] = palette[n+1];
4250          p[2] = palette[n+2];
4251          p[3] = palette[n+3];
4252          p += 4;
4253       }
4254    }
4255    STBI_FREE(a->out);
4256    a->out = temp_out;
4257 
4258    STBI_NOTUSED(len);
4259 
4260    return 1;
4261 }
4262 
4263 static int stbi__unpremultiply_on_load = 0;
4264 static int stbi__de_iphone_flag = 0;
4265 
stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)4266 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4267 {
4268    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4269 }
4270 
stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)4271 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4272 {
4273    stbi__de_iphone_flag = flag_true_if_should_convert;
4274 }
4275 
stbi__de_iphone(stbi__png * z)4276 static void stbi__de_iphone(stbi__png *z)
4277 {
4278    stbi__context *s = z->s;
4279    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4280    stbi_uc *p = z->out;
4281 
4282    if (s->img_out_n == 3) {  // convert bgr to rgb
4283       for (i=0; i < pixel_count; ++i) {
4284          stbi_uc t = p[0];
4285          p[0] = p[2];
4286          p[2] = t;
4287          p += 3;
4288       }
4289    } else {
4290       STBI_ASSERT(s->img_out_n == 4);
4291       if (stbi__unpremultiply_on_load) {
4292          // convert bgr to rgb and unpremultiply
4293          for (i=0; i < pixel_count; ++i) {
4294             stbi_uc a = p[3];
4295             stbi_uc t = p[0];
4296             if (a) {
4297                p[0] = p[2] * 255 / a;
4298                p[1] = p[1] * 255 / a;
4299                p[2] =  t   * 255 / a;
4300             } else {
4301                p[0] = p[2];
4302                p[2] = t;
4303             }
4304             p += 4;
4305          }
4306       } else {
4307          // convert bgr to rgb
4308          for (i=0; i < pixel_count; ++i) {
4309             stbi_uc t = p[0];
4310             p[0] = p[2];
4311             p[2] = t;
4312             p += 4;
4313          }
4314       }
4315    }
4316 }
4317 
4318 #define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4319 
stbi__parse_png_file(stbi__png * z,int scan,int req_comp)4320 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4321 {
4322    stbi_uc palette[1024], pal_img_n=0;
4323    stbi_uc has_trans=0, tc[3];
4324    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4325    int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4326    stbi__context *s = z->s;
4327 
4328    z->expanded = NULL;
4329    z->idata = NULL;
4330    z->out = NULL;
4331 
4332    if (!stbi__check_png_header(s)) return 0;
4333 
4334    if (scan == STBI__SCAN_type) return 1;
4335 
4336    for (;;) {
4337       stbi__pngchunk c = stbi__get_chunk_header(s);
4338       switch (c.type) {
4339          case STBI__PNG_TYPE('C','g','B','I'):
4340             is_iphone = 1;
4341             stbi__skip(s, c.length);
4342             break;
4343          case STBI__PNG_TYPE('I','H','D','R'): {
4344             int comp,filter;
4345             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4346             first = 0;
4347             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4348             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4349             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4350             depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4351             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4352             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4353             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4354             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4355             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4356             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4357             if (!pal_img_n) {
4358                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4359                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4360                if (scan == STBI__SCAN_header) return 1;
4361             } else {
4362                // if paletted, then pal_n is our final components, and
4363                // img_n is # components to decompress/filter.
4364                s->img_n = 1;
4365                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4366                // if SCAN_header, have to scan to see if we have a tRNS
4367             }
4368             break;
4369          }
4370 
4371          case STBI__PNG_TYPE('P','L','T','E'):  {
4372             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4373             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4374             pal_len = c.length / 3;
4375             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4376             for (i=0; i < pal_len; ++i) {
4377                palette[i*4+0] = stbi__get8(s);
4378                palette[i*4+1] = stbi__get8(s);
4379                palette[i*4+2] = stbi__get8(s);
4380                palette[i*4+3] = 255;
4381             }
4382             break;
4383          }
4384 
4385          case STBI__PNG_TYPE('t','R','N','S'): {
4386             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4387             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4388             if (pal_img_n) {
4389                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4390                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4391                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4392                pal_img_n = 4;
4393                for (i=0; i < c.length; ++i)
4394                   palette[i*4+3] = stbi__get8(s);
4395             } else {
4396                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4397                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4398                has_trans = 1;
4399                for (k=0; k < s->img_n; ++k)
4400                   tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4401             }
4402             break;
4403          }
4404 
4405          case STBI__PNG_TYPE('I','D','A','T'): {
4406             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4407             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4408             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4409             if ((int)(ioff + c.length) < (int)ioff) return 0;
4410             if (ioff + c.length > idata_limit) {
4411                stbi_uc *p;
4412                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4413                while (ioff + c.length > idata_limit)
4414                   idata_limit *= 2;
4415                p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4416                z->idata = p;
4417             }
4418             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4419             ioff += c.length;
4420             break;
4421          }
4422 
4423          case STBI__PNG_TYPE('I','E','N','D'): {
4424             stbi__uint32 raw_len, bpl;
4425             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4426             if (scan != STBI__SCAN_load) return 1;
4427             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4428             // initial guess for decoded data size to avoid unnecessary reallocs
4429             bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4430             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4431             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4432             if (z->expanded == NULL) return 0; // zlib should set error
4433             STBI_FREE(z->idata); z->idata = NULL;
4434             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4435                s->img_out_n = s->img_n+1;
4436             else
4437                s->img_out_n = s->img_n;
4438             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4439             if (has_trans)
4440                if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4441             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4442                stbi__de_iphone(z);
4443             if (pal_img_n) {
4444                // pal_img_n == 3 or 4
4445                s->img_n = pal_img_n; // record the actual colors we had
4446                s->img_out_n = pal_img_n;
4447                if (req_comp >= 3) s->img_out_n = req_comp;
4448                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4449                   return 0;
4450             }
4451             STBI_FREE(z->expanded); z->expanded = NULL;
4452             return 1;
4453          }
4454 
4455          default:
4456             // if critical, fail
4457             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4458             if ((c.type & (1 << 29)) == 0) {
4459                #ifndef STBI_NO_FAILURE_STRINGS
4460                // not threadsafe
4461                static char invalid_chunk[] = "XXXX PNG chunk not known";
4462                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4463                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4464                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
4465                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
4466                #endif
4467                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4468             }
4469             stbi__skip(s, c.length);
4470             break;
4471       }
4472       // end of PNG chunk, read and skip CRC
4473       stbi__get32be(s);
4474    }
4475 }
4476 
stbi__do_png(stbi__png * p,int * x,int * y,int * n,int req_comp)4477 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4478 {
4479    unsigned char *result=NULL;
4480    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4481    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4482       result = p->out;
4483       p->out = NULL;
4484       if (req_comp && req_comp != p->s->img_out_n) {
4485          result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4486          p->s->img_out_n = req_comp;
4487          if (result == NULL) return result;
4488       }
4489       *x = p->s->img_x;
4490       *y = p->s->img_y;
4491       if (n) *n = p->s->img_out_n;
4492    }
4493    STBI_FREE(p->out);      p->out      = NULL;
4494    STBI_FREE(p->expanded); p->expanded = NULL;
4495    STBI_FREE(p->idata);    p->idata    = NULL;
4496 
4497    return result;
4498 }
4499 
stbi__png_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)4500 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4501 {
4502    stbi__png p;
4503    p.s = s;
4504    return stbi__do_png(&p, x,y,comp,req_comp);
4505 }
4506 
stbi__png_test(stbi__context * s)4507 static int stbi__png_test(stbi__context *s)
4508 {
4509    int r;
4510    r = stbi__check_png_header(s);
4511    stbi__rewind(s);
4512    return r;
4513 }
4514 
stbi__png_info_raw(stbi__png * p,int * x,int * y,int * comp)4515 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4516 {
4517    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4518       stbi__rewind( p->s );
4519       return 0;
4520    }
4521    if (x) *x = p->s->img_x;
4522    if (y) *y = p->s->img_y;
4523    if (comp) *comp = p->s->img_n;
4524    return 1;
4525 }
4526 
stbi__png_info(stbi__context * s,int * x,int * y,int * comp)4527 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4528 {
4529    stbi__png p;
4530    p.s = s;
4531    return stbi__png_info_raw(&p, x, y, comp);
4532 }
4533 #endif
4534 
4535 // Microsoft/Windows BMP image
4536 
4537 #ifndef STBI_NO_BMP
stbi__bmp_test_raw(stbi__context * s)4538 static int stbi__bmp_test_raw(stbi__context *s)
4539 {
4540    int r;
4541    int sz;
4542    if (stbi__get8(s) != 'B') return 0;
4543    if (stbi__get8(s) != 'M') return 0;
4544    stbi__get32le(s); // discard filesize
4545    stbi__get16le(s); // discard reserved
4546    stbi__get16le(s); // discard reserved
4547    stbi__get32le(s); // discard data offset
4548    sz = stbi__get32le(s);
4549    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4550    return r;
4551 }
4552 
stbi__bmp_test(stbi__context * s)4553 static int stbi__bmp_test(stbi__context *s)
4554 {
4555    int r = stbi__bmp_test_raw(s);
4556    stbi__rewind(s);
4557    return r;
4558 }
4559 
4560 
4561 // returns 0..31 for the highest set bit
stbi__high_bit(unsigned int z)4562 static int stbi__high_bit(unsigned int z)
4563 {
4564    int n=0;
4565    if (z == 0) return -1;
4566    if (z >= 0x10000) n += 16, z >>= 16;
4567    if (z >= 0x00100) n +=  8, z >>=  8;
4568    if (z >= 0x00010) n +=  4, z >>=  4;
4569    if (z >= 0x00004) n +=  2, z >>=  2;
4570    if (z >= 0x00002) n +=  1, z >>=  1;
4571    return n;
4572 }
4573 
stbi__bitcount(unsigned int a)4574 static int stbi__bitcount(unsigned int a)
4575 {
4576    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
4577    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
4578    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4579    a = (a + (a >> 8)); // max 16 per 8 bits
4580    a = (a + (a >> 16)); // max 32 per 8 bits
4581    return a & 0xff;
4582 }
4583 
stbi__shiftsigned(int v,int shift,int bits)4584 static int stbi__shiftsigned(int v, int shift, int bits)
4585 {
4586    int result;
4587    int z=0;
4588 
4589    if (shift < 0) v <<= -shift;
4590    else v >>= shift;
4591    result = v;
4592 
4593    z = bits;
4594    while (z < 8) {
4595       result += v >> z;
4596       z += bits;
4597    }
4598    return result;
4599 }
4600 
stbi__bmp_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)4601 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4602 {
4603    stbi_uc *out;
4604    unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255;
4605    stbi_uc pal[256][4];
4606    int psize=0,i,j,compress=0,width;
4607    int bpp, flip_vertically, pad, target, offset, hsz;
4608    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4609    stbi__get32le(s); // discard filesize
4610    stbi__get16le(s); // discard reserved
4611    stbi__get16le(s); // discard reserved
4612    offset = stbi__get32le(s);
4613    hsz = stbi__get32le(s);
4614    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4615    if (hsz == 12) {
4616       s->img_x = stbi__get16le(s);
4617       s->img_y = stbi__get16le(s);
4618    } else {
4619       s->img_x = stbi__get32le(s);
4620       s->img_y = stbi__get32le(s);
4621    }
4622    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4623    bpp = stbi__get16le(s);
4624    if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4625    flip_vertically = ((int) s->img_y) > 0;
4626    s->img_y = abs((int) s->img_y);
4627    if (hsz == 12) {
4628       if (bpp < 24)
4629          psize = (offset - 14 - 24) / 3;
4630    } else {
4631       compress = stbi__get32le(s);
4632       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
4633       stbi__get32le(s); // discard sizeof
4634       stbi__get32le(s); // discard hres
4635       stbi__get32le(s); // discard vres
4636       stbi__get32le(s); // discard colorsused
4637       stbi__get32le(s); // discard max important
4638       if (hsz == 40 || hsz == 56) {
4639          if (hsz == 56) {
4640             stbi__get32le(s);
4641             stbi__get32le(s);
4642             stbi__get32le(s);
4643             stbi__get32le(s);
4644          }
4645          if (bpp == 16 || bpp == 32) {
4646             mr = mg = mb = 0;
4647             if (compress == 0) {
4648                if (bpp == 32) {
4649                   mr = 0xffu << 16;
4650                   mg = 0xffu <<  8;
4651                   mb = 0xffu <<  0;
4652                   ma = 0xffu << 24;
4653                   all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
4654                } else {
4655                   mr = 31u << 10;
4656                   mg = 31u <<  5;
4657                   mb = 31u <<  0;
4658                }
4659             } else if (compress == 3) {
4660                mr = stbi__get32le(s);
4661                mg = stbi__get32le(s);
4662                mb = stbi__get32le(s);
4663                // not documented, but generated by photoshop and handled by mspaint
4664                if (mr == mg && mg == mb) {
4665                   // ?!?!?
4666                   return stbi__errpuc("bad BMP", "bad BMP");
4667                }
4668             } else
4669                return stbi__errpuc("bad BMP", "bad BMP");
4670          }
4671       } else {
4672          STBI_ASSERT(hsz == 108 || hsz == 124);
4673          mr = stbi__get32le(s);
4674          mg = stbi__get32le(s);
4675          mb = stbi__get32le(s);
4676          ma = stbi__get32le(s);
4677          stbi__get32le(s); // discard color space
4678          for (i=0; i < 12; ++i)
4679             stbi__get32le(s); // discard color space parameters
4680          if (hsz == 124) {
4681             stbi__get32le(s); // discard rendering intent
4682             stbi__get32le(s); // discard offset of profile data
4683             stbi__get32le(s); // discard size of profile data
4684             stbi__get32le(s); // discard reserved
4685          }
4686       }
4687       if (bpp < 16)
4688          psize = (offset - 14 - hsz) >> 2;
4689    }
4690    s->img_n = ma ? 4 : 3;
4691    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
4692       target = req_comp;
4693    else
4694       target = s->img_n; // if they want monochrome, we'll post-convert
4695    out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
4696    if (!out) return stbi__errpuc("outofmem", "Out of memory");
4697    if (bpp < 16) {
4698       int z=0;
4699       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
4700       for (i=0; i < psize; ++i) {
4701          pal[i][2] = stbi__get8(s);
4702          pal[i][1] = stbi__get8(s);
4703          pal[i][0] = stbi__get8(s);
4704          if (hsz != 12) stbi__get8(s);
4705          pal[i][3] = 255;
4706       }
4707       stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
4708       if (bpp == 4) width = (s->img_x + 1) >> 1;
4709       else if (bpp == 8) width = s->img_x;
4710       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
4711       pad = (-width)&3;
4712       for (j=0; j < (int) s->img_y; ++j) {
4713          for (i=0; i < (int) s->img_x; i += 2) {
4714             int v=stbi__get8(s),v2=0;
4715             if (bpp == 4) {
4716                v2 = v & 15;
4717                v >>= 4;
4718             }
4719             out[z++] = pal[v][0];
4720             out[z++] = pal[v][1];
4721             out[z++] = pal[v][2];
4722             if (target == 4) out[z++] = 255;
4723             if (i+1 == (int) s->img_x) break;
4724             v = (bpp == 8) ? stbi__get8(s) : v2;
4725             out[z++] = pal[v][0];
4726             out[z++] = pal[v][1];
4727             out[z++] = pal[v][2];
4728             if (target == 4) out[z++] = 255;
4729          }
4730          stbi__skip(s, pad);
4731       }
4732    } else {
4733       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
4734       int z = 0;
4735       int easy=0;
4736       stbi__skip(s, offset - 14 - hsz);
4737       if (bpp == 24) width = 3 * s->img_x;
4738       else if (bpp == 16) width = 2*s->img_x;
4739       else /* bpp = 32 and pad = 0 */ width=0;
4740       pad = (-width) & 3;
4741       if (bpp == 24) {
4742          easy = 1;
4743       } else if (bpp == 32) {
4744          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
4745             easy = 2;
4746       }
4747       if (!easy) {
4748          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
4749          // right shift amt to put high bit in position #7
4750          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
4751          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
4752          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
4753          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
4754       }
4755       for (j=0; j < (int) s->img_y; ++j) {
4756          if (easy) {
4757             for (i=0; i < (int) s->img_x; ++i) {
4758                unsigned char a;
4759                out[z+2] = stbi__get8(s);
4760                out[z+1] = stbi__get8(s);
4761                out[z+0] = stbi__get8(s);
4762                z += 3;
4763                a = (easy == 2 ? stbi__get8(s) : 255);
4764                all_a |= a;
4765                if (target == 4) out[z++] = a;
4766             }
4767          } else {
4768             for (i=0; i < (int) s->img_x; ++i) {
4769                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
4770                int a;
4771                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
4772                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
4773                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
4774                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
4775                all_a |= a;
4776                if (target == 4) out[z++] = STBI__BYTECAST(a);
4777             }
4778          }
4779          stbi__skip(s, pad);
4780       }
4781    }
4782 
4783    // if alpha channel is all 0s, replace with all 255s
4784    if (target == 4 && all_a == 0)
4785       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
4786          out[i] = 255;
4787 
4788    if (flip_vertically) {
4789       stbi_uc t;
4790       for (j=0; j < (int) s->img_y>>1; ++j) {
4791          stbi_uc *p1 = out +      j     *s->img_x*target;
4792          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
4793          for (i=0; i < (int) s->img_x*target; ++i) {
4794             t = p1[i], p1[i] = p2[i], p2[i] = t;
4795          }
4796       }
4797    }
4798 
4799    if (req_comp && req_comp != target) {
4800       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4801       if (out == NULL) return out; // stbi__convert_format frees input on failure
4802    }
4803 
4804    *x = s->img_x;
4805    *y = s->img_y;
4806    if (comp) *comp = s->img_n;
4807    return out;
4808 }
4809 #endif
4810 
4811 // Targa Truevision - TGA
4812 // by Jonathan Dummer
4813 #ifndef STBI_NO_TGA
stbi__tga_info(stbi__context * s,int * x,int * y,int * comp)4814 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
4815 {
4816     int tga_w, tga_h, tga_comp;
4817     int sz;
4818     stbi__get8(s);                   // discard Offset
4819     sz = stbi__get8(s);              // color type
4820     if( sz > 1 ) {
4821         stbi__rewind(s);
4822         return 0;      // only RGB or indexed allowed
4823     }
4824     sz = stbi__get8(s);              // image type
4825     // only RGB or grey allowed, +/- RLE
4826     if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) return 0;
4827     stbi__skip(s,9);
4828     tga_w = stbi__get16le(s);
4829     if( tga_w < 1 ) {
4830         stbi__rewind(s);
4831         return 0;   // test width
4832     }
4833     tga_h = stbi__get16le(s);
4834     if( tga_h < 1 ) {
4835         stbi__rewind(s);
4836         return 0;   // test height
4837     }
4838     sz = stbi__get8(s);               // bits per pixel
4839     // only RGB or RGBA or grey allowed
4840     if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
4841         stbi__rewind(s);
4842         return 0;
4843     }
4844     tga_comp = sz;
4845     if (x) *x = tga_w;
4846     if (y) *y = tga_h;
4847     if (comp) *comp = tga_comp / 8;
4848     return 1;                   // seems to have passed everything
4849 }
4850 
stbi__tga_test(stbi__context * s)4851 static int stbi__tga_test(stbi__context *s)
4852 {
4853    int res;
4854    int sz;
4855    stbi__get8(s);      //   discard Offset
4856    sz = stbi__get8(s);   //   color type
4857    if ( sz > 1 ) return 0;   //   only RGB or indexed allowed
4858    sz = stbi__get8(s);   //   image type
4859    if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) return 0;   //   only RGB or grey allowed, +/- RLE
4860    stbi__get16be(s);      //   discard palette start
4861    stbi__get16be(s);      //   discard palette length
4862    stbi__get8(s);         //   discard bits per palette color entry
4863    stbi__get16be(s);      //   discard x origin
4864    stbi__get16be(s);      //   discard y origin
4865    if ( stbi__get16be(s) < 1 ) return 0;      //   test width
4866    if ( stbi__get16be(s) < 1 ) return 0;      //   test height
4867    sz = stbi__get8(s);   //   bits per pixel
4868    if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) )
4869       res = 0;
4870    else
4871       res = 1;
4872    stbi__rewind(s);
4873    return res;
4874 }
4875 
stbi__tga_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)4876 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4877 {
4878    //   read in the TGA header stuff
4879    int tga_offset = stbi__get8(s);
4880    int tga_indexed = stbi__get8(s);
4881    int tga_image_type = stbi__get8(s);
4882    int tga_is_RLE = 0;
4883    int tga_palette_start = stbi__get16le(s);
4884    int tga_palette_len = stbi__get16le(s);
4885    int tga_palette_bits = stbi__get8(s);
4886    int tga_x_origin = stbi__get16le(s);
4887    int tga_y_origin = stbi__get16le(s);
4888    int tga_width = stbi__get16le(s);
4889    int tga_height = stbi__get16le(s);
4890    int tga_bits_per_pixel = stbi__get8(s);
4891    int tga_comp = tga_bits_per_pixel / 8;
4892    int tga_inverted = stbi__get8(s);
4893    //   image data
4894    unsigned char *tga_data;
4895    unsigned char *tga_palette = NULL;
4896    int i, j;
4897    unsigned char raw_data[4];
4898    int RLE_count = 0;
4899    int RLE_repeating = 0;
4900    int read_next_pixel = 1;
4901 
4902    //   do a tiny bit of processing
4903    if ( tga_image_type >= 8 )
4904    {
4905       tga_image_type -= 8;
4906       tga_is_RLE = 1;
4907    }
4908    /* int tga_alpha_bits = tga_inverted & 15; */
4909    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
4910 
4911    //   error check
4912    if ( //(tga_indexed) ||
4913       (tga_width < 1) || (tga_height < 1) ||
4914       (tga_image_type < 1) || (tga_image_type > 3) ||
4915       ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
4916       (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
4917       )
4918    {
4919       return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
4920    }
4921 
4922    //   If I'm paletted, then I'll use the number of bits from the palette
4923    if ( tga_indexed )
4924    {
4925       tga_comp = tga_palette_bits / 8;
4926    }
4927 
4928    //   tga info
4929    *x = tga_width;
4930    *y = tga_height;
4931    if (comp) *comp = tga_comp;
4932 
4933    tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
4934    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
4935 
4936    // skip to the data's starting position (offset usually = 0)
4937    stbi__skip(s, tga_offset );
4938 
4939    if ( !tga_indexed && !tga_is_RLE) {
4940       for (i=0; i < tga_height; ++i) {
4941          int row = tga_inverted ? tga_height -i - 1 : i;
4942          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
4943          stbi__getn(s, tga_row, tga_width * tga_comp);
4944       }
4945    } else  {
4946       //   do I need to load a palette?
4947       if ( tga_indexed)
4948       {
4949          //   any data to skip? (offset usually = 0)
4950          stbi__skip(s, tga_palette_start );
4951          //   load the palette
4952          tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
4953          if (!tga_palette) {
4954             STBI_FREE(tga_data);
4955             return stbi__errpuc("outofmem", "Out of memory");
4956          }
4957          if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
4958             STBI_FREE(tga_data);
4959             STBI_FREE(tga_palette);
4960             return stbi__errpuc("bad palette", "Corrupt TGA");
4961          }
4962       }
4963       //   load the data
4964       for (i=0; i < tga_width * tga_height; ++i)
4965       {
4966          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
4967          if ( tga_is_RLE )
4968          {
4969             if ( RLE_count == 0 )
4970             {
4971                //   yep, get the next byte as a RLE command
4972                int RLE_cmd = stbi__get8(s);
4973                RLE_count = 1 + (RLE_cmd & 127);
4974                RLE_repeating = RLE_cmd >> 7;
4975                read_next_pixel = 1;
4976             } else if ( !RLE_repeating )
4977             {
4978                read_next_pixel = 1;
4979             }
4980          } else
4981          {
4982             read_next_pixel = 1;
4983          }
4984          //   OK, if I need to read a pixel, do it now
4985          if ( read_next_pixel )
4986          {
4987             //   load however much data we did have
4988             if ( tga_indexed )
4989             {
4990                //   read in 1 byte, then perform the lookup
4991                int pal_idx = stbi__get8(s);
4992                if ( pal_idx >= tga_palette_len )
4993                {
4994                   //   invalid index
4995                   pal_idx = 0;
4996                }
4997                pal_idx *= tga_bits_per_pixel / 8;
4998                for (j = 0; j*8 < tga_bits_per_pixel; ++j)
4999                {
5000                   raw_data[j] = tga_palette[pal_idx+j];
5001                }
5002             } else
5003             {
5004                //   read in the data raw
5005                for (j = 0; j*8 < tga_bits_per_pixel; ++j)
5006                {
5007                   raw_data[j] = stbi__get8(s);
5008                }
5009             }
5010             //   clear the reading flag for the next pixel
5011             read_next_pixel = 0;
5012          } // end of reading a pixel
5013 
5014          // copy data
5015          for (j = 0; j < tga_comp; ++j)
5016            tga_data[i*tga_comp+j] = raw_data[j];
5017 
5018          //   in case we're in RLE mode, keep counting down
5019          --RLE_count;
5020       }
5021       //   do I need to invert the image?
5022       if ( tga_inverted )
5023       {
5024          for (j = 0; j*2 < tga_height; ++j)
5025          {
5026             int index1 = j * tga_width * tga_comp;
5027             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5028             for (i = tga_width * tga_comp; i > 0; --i)
5029             {
5030                unsigned char temp = tga_data[index1];
5031                tga_data[index1] = tga_data[index2];
5032                tga_data[index2] = temp;
5033                ++index1;
5034                ++index2;
5035             }
5036          }
5037       }
5038       //   clear my palette, if I had one
5039       if ( tga_palette != NULL )
5040       {
5041          STBI_FREE( tga_palette );
5042       }
5043    }
5044 
5045    // swap RGB
5046    if (tga_comp >= 3)
5047    {
5048       unsigned char* tga_pixel = tga_data;
5049       for (i=0; i < tga_width * tga_height; ++i)
5050       {
5051          unsigned char temp = tga_pixel[0];
5052          tga_pixel[0] = tga_pixel[2];
5053          tga_pixel[2] = temp;
5054          tga_pixel += tga_comp;
5055       }
5056    }
5057 
5058    // convert to target component count
5059    if (req_comp && req_comp != tga_comp)
5060       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5061 
5062    //   the things I do to get rid of an error message, and yet keep
5063    //   Microsoft's C compilers happy... [8^(
5064    tga_palette_start = tga_palette_len = tga_palette_bits =
5065          tga_x_origin = tga_y_origin = 0;
5066    //   OK, done
5067    return tga_data;
5068 }
5069 #endif
5070 
5071 // *************************************************************************************************
5072 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5073 
5074 #ifndef STBI_NO_PSD
stbi__psd_test(stbi__context * s)5075 static int stbi__psd_test(stbi__context *s)
5076 {
5077    int r = (stbi__get32be(s) == 0x38425053);
5078    stbi__rewind(s);
5079    return r;
5080 }
5081 
stbi__psd_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)5082 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5083 {
5084    int   pixelCount;
5085    int channelCount, compression;
5086    int channel, i, count, len;
5087    int bitdepth;
5088    int w,h;
5089    stbi_uc *out;
5090 
5091    // Check identifier
5092    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
5093       return stbi__errpuc("not PSD", "Corrupt PSD image");
5094 
5095    // Check file type version.
5096    if (stbi__get16be(s) != 1)
5097       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5098 
5099    // Skip 6 reserved bytes.
5100    stbi__skip(s, 6 );
5101 
5102    // Read the number of channels (R, G, B, A, etc).
5103    channelCount = stbi__get16be(s);
5104    if (channelCount < 0 || channelCount > 16)
5105       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5106 
5107    // Read the rows and columns of the image.
5108    h = stbi__get32be(s);
5109    w = stbi__get32be(s);
5110 
5111    // Make sure the depth is 8 bits.
5112    bitdepth = stbi__get16be(s);
5113    if (bitdepth != 8 && bitdepth != 16)
5114       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5115 
5116    // Make sure the color mode is RGB.
5117    // Valid options are:
5118    //   0: Bitmap
5119    //   1: Grayscale
5120    //   2: Indexed color
5121    //   3: RGB color
5122    //   4: CMYK color
5123    //   7: Multichannel
5124    //   8: Duotone
5125    //   9: Lab color
5126    if (stbi__get16be(s) != 3)
5127       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5128 
5129    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
5130    stbi__skip(s,stbi__get32be(s) );
5131 
5132    // Skip the image resources.  (resolution, pen tool paths, etc)
5133    stbi__skip(s, stbi__get32be(s) );
5134 
5135    // Skip the reserved data.
5136    stbi__skip(s, stbi__get32be(s) );
5137 
5138    // Find out if the data is compressed.
5139    // Known values:
5140    //   0: no compression
5141    //   1: RLE compressed
5142    compression = stbi__get16be(s);
5143    if (compression > 1)
5144       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5145 
5146    // Create the destination image.
5147    out = (stbi_uc *) stbi__malloc(4 * w*h);
5148    if (!out) return stbi__errpuc("outofmem", "Out of memory");
5149    pixelCount = w*h;
5150 
5151    // Initialize the data to zero.
5152    //memset( out, 0, pixelCount * 4 );
5153 
5154    // Finally, the image data.
5155    if (compression) {
5156       // RLE as used by .PSD and .TIFF
5157       // Loop until you get the number of unpacked bytes you are expecting:
5158       //     Read the next source byte into n.
5159       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5160       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5161       //     Else if n is 128, noop.
5162       // Endloop
5163 
5164       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
5165       // which we're going to just skip.
5166       stbi__skip(s, h * channelCount * 2 );
5167 
5168       // Read the RLE data by channel.
5169       for (channel = 0; channel < 4; channel++) {
5170          stbi_uc *p;
5171 
5172          p = out+channel;
5173          if (channel >= channelCount) {
5174             // Fill this channel with default data.
5175             for (i = 0; i < pixelCount; i++, p += 4)
5176                *p = (channel == 3 ? 255 : 0);
5177          } else {
5178             // Read the RLE data.
5179             count = 0;
5180             while (count < pixelCount) {
5181                len = stbi__get8(s);
5182                if (len == 128) {
5183                   // No-op.
5184                } else if (len < 128) {
5185                   // Copy next len+1 bytes literally.
5186                   len++;
5187                   count += len;
5188                   while (len) {
5189                      *p = stbi__get8(s);
5190                      p += 4;
5191                      len--;
5192                   }
5193                } else if (len > 128) {
5194                   stbi_uc   val;
5195                   // Next -len+1 bytes in the dest are replicated from next source byte.
5196                   // (Interpret len as a negative 8-bit int.)
5197                   len ^= 0x0FF;
5198                   len += 2;
5199                   val = stbi__get8(s);
5200                   count += len;
5201                   while (len) {
5202                      *p = val;
5203                      p += 4;
5204                      len--;
5205                   }
5206                }
5207             }
5208          }
5209       }
5210 
5211    } else {
5212       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
5213       // where each channel consists of an 8-bit value for each pixel in the image.
5214 
5215       // Read the data by channel.
5216       for (channel = 0; channel < 4; channel++) {
5217          stbi_uc *p;
5218 
5219          p = out + channel;
5220          if (channel >= channelCount) {
5221             // Fill this channel with default data.
5222             stbi_uc val = channel == 3 ? 255 : 0;
5223             for (i = 0; i < pixelCount; i++, p += 4)
5224                *p = val;
5225          } else {
5226             // Read the data.
5227             if (bitdepth == 16) {
5228                for (i = 0; i < pixelCount; i++, p += 4)
5229                   *p = (stbi_uc) (stbi__get16be(s) >> 8);
5230             } else {
5231                for (i = 0; i < pixelCount; i++, p += 4)
5232                   *p = stbi__get8(s);
5233             }
5234          }
5235       }
5236    }
5237 
5238    if (req_comp && req_comp != 4) {
5239       out = stbi__convert_format(out, 4, req_comp, w, h);
5240       if (out == NULL) return out; // stbi__convert_format frees input on failure
5241    }
5242 
5243    if (comp) *comp = 4;
5244    *y = h;
5245    *x = w;
5246 
5247    return out;
5248 }
5249 #endif
5250 
5251 // *************************************************************************************************
5252 // Softimage PIC loader
5253 // by Tom Seddon
5254 //
5255 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5256 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5257 
5258 #ifndef STBI_NO_PIC
stbi__pic_is4(stbi__context * s,const char * str)5259 static int stbi__pic_is4(stbi__context *s,const char *str)
5260 {
5261    int i;
5262    for (i=0; i<4; ++i)
5263       if (stbi__get8(s) != (stbi_uc)str[i])
5264          return 0;
5265 
5266    return 1;
5267 }
5268 
stbi__pic_test_core(stbi__context * s)5269 static int stbi__pic_test_core(stbi__context *s)
5270 {
5271    int i;
5272 
5273    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5274       return 0;
5275 
5276    for(i=0;i<84;++i)
5277       stbi__get8(s);
5278 
5279    if (!stbi__pic_is4(s,"PICT"))
5280       return 0;
5281 
5282    return 1;
5283 }
5284 
5285 typedef struct
5286 {
5287    stbi_uc size,type,channel;
5288 } stbi__pic_packet;
5289 
stbi__readval(stbi__context * s,int channel,stbi_uc * dest)5290 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5291 {
5292    int mask=0x80, i;
5293 
5294    for (i=0; i<4; ++i, mask>>=1) {
5295       if (channel & mask) {
5296          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5297          dest[i]=stbi__get8(s);
5298       }
5299    }
5300 
5301    return dest;
5302 }
5303 
stbi__copyval(int channel,stbi_uc * dest,const stbi_uc * src)5304 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5305 {
5306    int mask=0x80,i;
5307 
5308    for (i=0;i<4; ++i, mask>>=1)
5309       if (channel&mask)
5310          dest[i]=src[i];
5311 }
5312 
stbi__pic_load_core(stbi__context * s,int width,int height,int * comp,stbi_uc * result)5313 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5314 {
5315    int act_comp=0,num_packets=0,y,chained;
5316    stbi__pic_packet packets[10];
5317 
5318    // this will (should...) cater for even some bizarre stuff like having data
5319     // for the same channel in multiple packets.
5320    do {
5321       stbi__pic_packet *packet;
5322 
5323       if (num_packets==sizeof(packets)/sizeof(packets[0]))
5324          return stbi__errpuc("bad format","too many packets");
5325 
5326       packet = &packets[num_packets++];
5327 
5328       chained = stbi__get8(s);
5329       packet->size    = stbi__get8(s);
5330       packet->type    = stbi__get8(s);
5331       packet->channel = stbi__get8(s);
5332 
5333       act_comp |= packet->channel;
5334 
5335       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
5336       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
5337    } while (chained);
5338 
5339    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5340 
5341    for(y=0; y<height; ++y) {
5342       int packet_idx;
5343 
5344       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5345          stbi__pic_packet *packet = &packets[packet_idx];
5346          stbi_uc *dest = result+y*width*4;
5347 
5348          switch (packet->type) {
5349             default:
5350                return stbi__errpuc("bad format","packet has bad compression type");
5351 
5352             case 0: {//uncompressed
5353                int x;
5354 
5355                for(x=0;x<width;++x, dest+=4)
5356                   if (!stbi__readval(s,packet->channel,dest))
5357                      return 0;
5358                break;
5359             }
5360 
5361             case 1://Pure RLE
5362                {
5363                   int left=width, i;
5364 
5365                   while (left>0) {
5366                      stbi_uc count,value[4];
5367 
5368                      count=stbi__get8(s);
5369                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
5370 
5371                      if (count > left)
5372                         count = (stbi_uc) left;
5373 
5374                      if (!stbi__readval(s,packet->channel,value))  return 0;
5375 
5376                      for(i=0; i<count; ++i,dest+=4)
5377                         stbi__copyval(packet->channel,dest,value);
5378                      left -= count;
5379                   }
5380                }
5381                break;
5382 
5383             case 2: {//Mixed RLE
5384                int left=width;
5385                while (left>0) {
5386                   int count = stbi__get8(s), i;
5387                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
5388 
5389                   if (count >= 128) { // Repeated
5390                      stbi_uc value[4];
5391 
5392                      if (count==128)
5393                         count = stbi__get16be(s);
5394                      else
5395                         count -= 127;
5396                      if (count > left)
5397                         return stbi__errpuc("bad file","scanline overrun");
5398 
5399                      if (!stbi__readval(s,packet->channel,value))
5400                         return 0;
5401 
5402                      for(i=0;i<count;++i, dest += 4)
5403                         stbi__copyval(packet->channel,dest,value);
5404                   } else { // Raw
5405                      ++count;
5406                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
5407 
5408                      for(i=0;i<count;++i, dest+=4)
5409                         if (!stbi__readval(s,packet->channel,dest))
5410                            return 0;
5411                   }
5412                   left-=count;
5413                }
5414                break;
5415             }
5416          }
5417       }
5418    }
5419 
5420    return result;
5421 }
5422 
stbi__pic_load(stbi__context * s,int * px,int * py,int * comp,int req_comp)5423 static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
5424 {
5425    stbi_uc *result;
5426    int i, x,y;
5427 
5428    for (i=0; i<92; ++i)
5429       stbi__get8(s);
5430 
5431    x = stbi__get16be(s);
5432    y = stbi__get16be(s);
5433    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
5434    if ((1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
5435 
5436    stbi__get32be(s); //skip `ratio'
5437    stbi__get16be(s); //skip `fields'
5438    stbi__get16be(s); //skip `pad'
5439 
5440    // intermediate buffer is RGBA
5441    result = (stbi_uc *) stbi__malloc(x*y*4);
5442    memset(result, 0xff, x*y*4);
5443 
5444    if (!stbi__pic_load_core(s,x,y,comp, result)) {
5445       STBI_FREE(result);
5446       result=0;
5447    }
5448    *px = x;
5449    *py = y;
5450    if (req_comp == 0) req_comp = *comp;
5451    result=stbi__convert_format(result,4,req_comp,x,y);
5452 
5453    return result;
5454 }
5455 
stbi__pic_test(stbi__context * s)5456 static int stbi__pic_test(stbi__context *s)
5457 {
5458    int r = stbi__pic_test_core(s);
5459    stbi__rewind(s);
5460    return r;
5461 }
5462 #endif
5463 
5464 // *************************************************************************************************
5465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5466 
5467 #ifndef STBI_NO_GIF
5468 typedef struct
5469 {
5470    stbi__int16 prefix;
5471    stbi_uc first;
5472    stbi_uc suffix;
5473 } stbi__gif_lzw;
5474 
5475 typedef struct
5476 {
5477    int w,h;
5478    stbi_uc *out, *old_out;             // output buffer (always 4 components)
5479    int flags, bgindex, ratio, transparent, eflags, delay;
5480    stbi_uc  pal[256][4];
5481    stbi_uc lpal[256][4];
5482    stbi__gif_lzw codes[4096];
5483    stbi_uc *color_table;
5484    int parse, step;
5485    int lflags;
5486    int start_x, start_y;
5487    int max_x, max_y;
5488    int cur_x, cur_y;
5489    int line_size;
5490 } stbi__gif;
5491 
stbi__gif_test_raw(stbi__context * s)5492 static int stbi__gif_test_raw(stbi__context *s)
5493 {
5494    int sz;
5495    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5496    sz = stbi__get8(s);
5497    if (sz != '9' && sz != '7') return 0;
5498    if (stbi__get8(s) != 'a') return 0;
5499    return 1;
5500 }
5501 
stbi__gif_test(stbi__context * s)5502 static int stbi__gif_test(stbi__context *s)
5503 {
5504    int r = stbi__gif_test_raw(s);
5505    stbi__rewind(s);
5506    return r;
5507 }
5508 
stbi__gif_parse_colortable(stbi__context * s,stbi_uc pal[256][4],int num_entries,int transp)5509 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5510 {
5511    int i;
5512    for (i=0; i < num_entries; ++i) {
5513       pal[i][2] = stbi__get8(s);
5514       pal[i][1] = stbi__get8(s);
5515       pal[i][0] = stbi__get8(s);
5516       pal[i][3] = transp == i ? 0 : 255;
5517    }
5518 }
5519 
stbi__gif_header(stbi__context * s,stbi__gif * g,int * comp,int is_info)5520 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5521 {
5522    stbi_uc version;
5523    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5524       return stbi__err("not GIF", "Corrupt GIF");
5525 
5526    version = stbi__get8(s);
5527    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
5528    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
5529 
5530    stbi__g_failure_reason = "";
5531    g->w = stbi__get16le(s);
5532    g->h = stbi__get16le(s);
5533    g->flags = stbi__get8(s);
5534    g->bgindex = stbi__get8(s);
5535    g->ratio = stbi__get8(s);
5536    g->transparent = -1;
5537 
5538    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
5539 
5540    if (is_info) return 1;
5541 
5542    if (g->flags & 0x80)
5543       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5544 
5545    return 1;
5546 }
5547 
stbi__gif_info_raw(stbi__context * s,int * x,int * y,int * comp)5548 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5549 {
5550    stbi__gif g;
5551    if (!stbi__gif_header(s, &g, comp, 1)) {
5552       stbi__rewind( s );
5553       return 0;
5554    }
5555    if (x) *x = g.w;
5556    if (y) *y = g.h;
5557    return 1;
5558 }
5559 
stbi__out_gif_code(stbi__gif * g,stbi__uint16 code)5560 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
5561 {
5562    stbi_uc *p, *c;
5563 
5564    // recurse to decode the prefixes, since the linked-list is backwards,
5565    // and working backwards through an interleaved image would be nasty
5566    if (g->codes[code].prefix >= 0)
5567       stbi__out_gif_code(g, g->codes[code].prefix);
5568 
5569    if (g->cur_y >= g->max_y) return;
5570 
5571    p = &g->out[g->cur_x + g->cur_y];
5572    c = &g->color_table[g->codes[code].suffix * 4];
5573 
5574    if (c[3] >= 128) {
5575       p[0] = c[2];
5576       p[1] = c[1];
5577       p[2] = c[0];
5578       p[3] = c[3];
5579    }
5580    g->cur_x += 4;
5581 
5582    if (g->cur_x >= g->max_x) {
5583       g->cur_x = g->start_x;
5584       g->cur_y += g->step;
5585 
5586       while (g->cur_y >= g->max_y && g->parse > 0) {
5587          g->step = (1 << g->parse) * g->line_size;
5588          g->cur_y = g->start_y + (g->step >> 1);
5589          --g->parse;
5590       }
5591    }
5592 }
5593 
stbi__process_gif_raster(stbi__context * s,stbi__gif * g)5594 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5595 {
5596    stbi_uc lzw_cs;
5597    stbi__int32 len, init_code;
5598    stbi__uint32 first;
5599    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5600    stbi__gif_lzw *p;
5601 
5602    lzw_cs = stbi__get8(s);
5603    if (lzw_cs > 12) return NULL;
5604    clear = 1 << lzw_cs;
5605    first = 1;
5606    codesize = lzw_cs + 1;
5607    codemask = (1 << codesize) - 1;
5608    bits = 0;
5609    valid_bits = 0;
5610    for (init_code = 0; init_code < clear; init_code++) {
5611       g->codes[init_code].prefix = -1;
5612       g->codes[init_code].first = (stbi_uc) init_code;
5613       g->codes[init_code].suffix = (stbi_uc) init_code;
5614    }
5615 
5616    // support no starting clear code
5617    avail = clear+2;
5618    oldcode = -1;
5619 
5620    len = 0;
5621    for(;;) {
5622       if (valid_bits < codesize) {
5623          if (len == 0) {
5624             len = stbi__get8(s); // start new block
5625             if (len == 0)
5626                return g->out;
5627          }
5628          --len;
5629          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5630          valid_bits += 8;
5631       } else {
5632          stbi__int32 code = bits & codemask;
5633          bits >>= codesize;
5634          valid_bits -= codesize;
5635          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5636          if (code == clear) {  // clear code
5637             codesize = lzw_cs + 1;
5638             codemask = (1 << codesize) - 1;
5639             avail = clear + 2;
5640             oldcode = -1;
5641             first = 0;
5642          } else if (code == clear + 1) { // end of stream code
5643             stbi__skip(s, len);
5644             while ((len = stbi__get8(s)) > 0)
5645                stbi__skip(s,len);
5646             return g->out;
5647          } else if (code <= avail) {
5648             if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5649 
5650             if (oldcode >= 0) {
5651                p = &g->codes[avail++];
5652                if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
5653                p->prefix = (stbi__int16) oldcode;
5654                p->first = g->codes[oldcode].first;
5655                p->suffix = (code == avail) ? p->first : g->codes[code].first;
5656             } else if (code == avail)
5657                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5658 
5659             stbi__out_gif_code(g, (stbi__uint16) code);
5660 
5661             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5662                codesize++;
5663                codemask = (1 << codesize) - 1;
5664             }
5665 
5666             oldcode = code;
5667          } else {
5668             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5669          }
5670       }
5671    }
5672 }
5673 
stbi__fill_gif_background(stbi__gif * g,int x0,int y0,int x1,int y1)5674 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
5675 {
5676    int x, y;
5677    stbi_uc *c = g->pal[g->bgindex];
5678    for (y = y0; y < y1; y += 4 * g->w) {
5679       for (x = x0; x < x1; x += 4) {
5680          stbi_uc *p  = &g->out[y + x];
5681          p[0] = c[2];
5682          p[1] = c[1];
5683          p[2] = c[0];
5684          p[3] = 0;
5685       }
5686    }
5687 }
5688 
5689 // this function is designed to support animated gifs, although stb_image doesn't support it
stbi__gif_load_next(stbi__context * s,stbi__gif * g,int * comp,int req_comp)5690 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5691 {
5692    int i;
5693    stbi_uc *prev_out = 0;
5694 
5695    if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
5696       return 0; // stbi__g_failure_reason set by stbi__gif_header
5697 
5698    prev_out = g->out;
5699    g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5700    if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5701 
5702    switch ((g->eflags & 0x1C) >> 2) {
5703       case 0: // unspecified (also always used on 1st frame)
5704          stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
5705          break;
5706       case 1: // do not dispose
5707          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5708          g->old_out = prev_out;
5709          break;
5710       case 2: // dispose to background
5711          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5712          stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
5713          break;
5714       case 3: // dispose to previous
5715          if (g->old_out) {
5716             for (i = g->start_y; i < g->max_y; i += 4 * g->w)
5717                memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
5718          }
5719          break;
5720    }
5721 
5722    for (;;) {
5723       switch (stbi__get8(s)) {
5724          case 0x2C: /* Image Descriptor */
5725          {
5726             int prev_trans = -1;
5727             stbi__int32 x, y, w, h;
5728             stbi_uc *o;
5729 
5730             x = stbi__get16le(s);
5731             y = stbi__get16le(s);
5732             w = stbi__get16le(s);
5733             h = stbi__get16le(s);
5734             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5735                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5736 
5737             g->line_size = g->w * 4;
5738             g->start_x = x * 4;
5739             g->start_y = y * g->line_size;
5740             g->max_x   = g->start_x + w * 4;
5741             g->max_y   = g->start_y + h * g->line_size;
5742             g->cur_x   = g->start_x;
5743             g->cur_y   = g->start_y;
5744 
5745             g->lflags = stbi__get8(s);
5746 
5747             if (g->lflags & 0x40) {
5748                g->step = 8 * g->line_size; // first interlaced spacing
5749                g->parse = 3;
5750             } else {
5751                g->step = g->line_size;
5752                g->parse = 0;
5753             }
5754 
5755             if (g->lflags & 0x80) {
5756                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5757                g->color_table = (stbi_uc *) g->lpal;
5758             } else if (g->flags & 0x80) {
5759                if (g->transparent >= 0 && (g->eflags & 0x01)) {
5760                   prev_trans = g->pal[g->transparent][3];
5761                   g->pal[g->transparent][3] = 0;
5762                }
5763                g->color_table = (stbi_uc *) g->pal;
5764             } else
5765                return stbi__errpuc("missing color table", "Corrupt GIF");
5766 
5767             o = stbi__process_gif_raster(s, g);
5768             if (o == NULL) return NULL;
5769 
5770             if (prev_trans != -1)
5771                g->pal[g->transparent][3] = (stbi_uc) prev_trans;
5772 
5773             return o;
5774          }
5775 
5776          case 0x21: // Comment Extension.
5777          {
5778             int len;
5779             if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
5780                len = stbi__get8(s);
5781                if (len == 4) {
5782                   g->eflags = stbi__get8(s);
5783                   g->delay = stbi__get16le(s);
5784                   g->transparent = stbi__get8(s);
5785                } else {
5786                   stbi__skip(s, len);
5787                   break;
5788                }
5789             }
5790             while ((len = stbi__get8(s)) != 0)
5791                stbi__skip(s, len);
5792             break;
5793          }
5794 
5795          case 0x3B: // gif stream termination code
5796             return (stbi_uc *) s; // using '1' causes warning on some compilers
5797 
5798          default:
5799             return stbi__errpuc("unknown code", "Corrupt GIF");
5800       }
5801    }
5802 
5803    STBI_NOTUSED(req_comp);
5804 }
5805 
stbi__gif_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)5806 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5807 {
5808    stbi_uc *u = 0;
5809    stbi__gif g;
5810    memset(&g, 0, sizeof(g));
5811 
5812    u = stbi__gif_load_next(s, &g, comp, req_comp);
5813    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
5814    if (u) {
5815       *x = g.w;
5816       *y = g.h;
5817       if (req_comp && req_comp != 4)
5818          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
5819    }
5820    else if (g.out)
5821       STBI_FREE(g.out);
5822 
5823    return u;
5824 }
5825 
stbi__gif_info(stbi__context * s,int * x,int * y,int * comp)5826 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
5827 {
5828    return stbi__gif_info_raw(s,x,y,comp);
5829 }
5830 #endif
5831 
5832 // *************************************************************************************************
5833 // Radiance RGBE HDR loader
5834 // originally by Nicolas Schulz
5835 #ifndef STBI_NO_HDR
stbi__hdr_test_core(stbi__context * s)5836 static int stbi__hdr_test_core(stbi__context *s)
5837 {
5838    const char *signature = "#?RADIANCE\n";
5839    int i;
5840    for (i=0; signature[i]; ++i)
5841       if (stbi__get8(s) != signature[i])
5842          return 0;
5843    return 1;
5844 }
5845 
stbi__hdr_test(stbi__context * s)5846 static int stbi__hdr_test(stbi__context* s)
5847 {
5848    int r = stbi__hdr_test_core(s);
5849    stbi__rewind(s);
5850    return r;
5851 }
5852 
5853 #define STBI__HDR_BUFLEN  1024
stbi__hdr_gettoken(stbi__context * z,char * buffer)5854 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
5855 {
5856    int len=0;
5857    char c = '\0';
5858 
5859    c = (char) stbi__get8(z);
5860 
5861    while (!stbi__at_eof(z) && c != '\n') {
5862       buffer[len++] = c;
5863       if (len == STBI__HDR_BUFLEN-1) {
5864          // flush to end of line
5865          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
5866             ;
5867          break;
5868       }
5869       c = (char) stbi__get8(z);
5870    }
5871 
5872    buffer[len] = 0;
5873    return buffer;
5874 }
5875 
stbi__hdr_convert(float * output,stbi_uc * input,int req_comp)5876 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
5877 {
5878    if ( input[3] != 0 ) {
5879       float f1;
5880       // Exponent
5881       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
5882       if (req_comp <= 2)
5883          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
5884       else {
5885          output[0] = input[0] * f1;
5886          output[1] = input[1] * f1;
5887          output[2] = input[2] * f1;
5888       }
5889       if (req_comp == 2) output[1] = 1;
5890       if (req_comp == 4) output[3] = 1;
5891    } else {
5892       switch (req_comp) {
5893          case 4: output[3] = 1; /* fallthrough */
5894          case 3: output[0] = output[1] = output[2] = 0;
5895                  break;
5896          case 2: output[1] = 1; /* fallthrough */
5897          case 1: output[0] = 0;
5898                  break;
5899       }
5900    }
5901 }
5902 
stbi__hdr_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)5903 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5904 {
5905    char buffer[STBI__HDR_BUFLEN];
5906    char *token;
5907    int valid = 0;
5908    int width, height;
5909    stbi_uc *scanline;
5910    float *hdr_data;
5911    int len;
5912    unsigned char count, value;
5913    int i, j, k, c1,c2, z;
5914 
5915 
5916    // Check identifier
5917    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
5918       return stbi__errpf("not HDR", "Corrupt HDR image");
5919 
5920    // Parse header
5921    for(;;) {
5922       token = stbi__hdr_gettoken(s,buffer);
5923       if (token[0] == 0) break;
5924       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
5925    }
5926 
5927    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
5928 
5929    // Parse width and height
5930    // can't use sscanf() if we're not using stdio!
5931    token = stbi__hdr_gettoken(s,buffer);
5932    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5933    token += 3;
5934    height = (int) strtol(token, &token, 10);
5935    while (*token == ' ') ++token;
5936    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
5937    token += 3;
5938    width = (int) strtol(token, NULL, 10);
5939 
5940    *x = width;
5941    *y = height;
5942 
5943    if (comp) *comp = 3;
5944    if (req_comp == 0) req_comp = 3;
5945 
5946    // Read data
5947    hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
5948 
5949    // Load image data
5950    // image data is stored as some number of sca
5951    if ( width < 8 || width >= 32768) {
5952       // Read flat data
5953       for (j=0; j < height; ++j) {
5954          for (i=0; i < width; ++i) {
5955             stbi_uc rgbe[4];
5956            main_decode_loop:
5957             stbi__getn(s, rgbe, 4);
5958             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
5959          }
5960       }
5961    } else {
5962       // Read RLE-encoded data
5963       scanline = NULL;
5964 
5965       for (j = 0; j < height; ++j) {
5966          c1 = stbi__get8(s);
5967          c2 = stbi__get8(s);
5968          len = stbi__get8(s);
5969          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
5970             // not run-length encoded, so we have to actually use THIS data as a decoded
5971             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
5972             stbi_uc rgbe[4];
5973             rgbe[0] = (stbi_uc) c1;
5974             rgbe[1] = (stbi_uc) c2;
5975             rgbe[2] = (stbi_uc) len;
5976             rgbe[3] = (stbi_uc) stbi__get8(s);
5977             stbi__hdr_convert(hdr_data, rgbe, req_comp);
5978             i = 1;
5979             j = 0;
5980             STBI_FREE(scanline);
5981             goto main_decode_loop; // yes, this makes no sense
5982          }
5983          len <<= 8;
5984          len |= stbi__get8(s);
5985          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
5986          if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
5987 
5988          for (k = 0; k < 4; ++k) {
5989             i = 0;
5990             while (i < width) {
5991                count = stbi__get8(s);
5992                if (count > 128) {
5993                   // Run
5994                   value = stbi__get8(s);
5995                   count -= 128;
5996                   for (z = 0; z < count; ++z)
5997                      scanline[i++ * 4 + k] = value;
5998                } else {
5999                   // Dump
6000                   for (z = 0; z < count; ++z)
6001                      scanline[i++ * 4 + k] = stbi__get8(s);
6002                }
6003             }
6004          }
6005          for (i=0; i < width; ++i)
6006             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6007       }
6008       STBI_FREE(scanline);
6009    }
6010 
6011    return hdr_data;
6012 }
6013 
stbi__hdr_info(stbi__context * s,int * x,int * y,int * comp)6014 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6015 {
6016    char buffer[STBI__HDR_BUFLEN];
6017    char *token;
6018    int valid = 0;
6019 
6020    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
6021        stbi__rewind( s );
6022        return 0;
6023    }
6024 
6025    for(;;) {
6026       token = stbi__hdr_gettoken(s,buffer);
6027       if (token[0] == 0) break;
6028       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6029    }
6030 
6031    if (!valid) {
6032        stbi__rewind( s );
6033        return 0;
6034    }
6035    token = stbi__hdr_gettoken(s,buffer);
6036    if (strncmp(token, "-Y ", 3)) {
6037        stbi__rewind( s );
6038        return 0;
6039    }
6040    token += 3;
6041    *y = (int) strtol(token, &token, 10);
6042    while (*token == ' ') ++token;
6043    if (strncmp(token, "+X ", 3)) {
6044        stbi__rewind( s );
6045        return 0;
6046    }
6047    token += 3;
6048    *x = (int) strtol(token, NULL, 10);
6049    *comp = 3;
6050    return 1;
6051 }
6052 #endif // STBI_NO_HDR
6053 
6054 #ifndef STBI_NO_BMP
stbi__bmp_info(stbi__context * s,int * x,int * y,int * comp)6055 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6056 {
6057    int hsz;
6058    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
6059        stbi__rewind( s );
6060        return 0;
6061    }
6062    stbi__skip(s,12);
6063    hsz = stbi__get32le(s);
6064    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
6065        stbi__rewind( s );
6066        return 0;
6067    }
6068    if (hsz == 12) {
6069       *x = stbi__get16le(s);
6070       *y = stbi__get16le(s);
6071    } else {
6072       *x = stbi__get32le(s);
6073       *y = stbi__get32le(s);
6074    }
6075    if (stbi__get16le(s) != 1) {
6076        stbi__rewind( s );
6077        return 0;
6078    }
6079    *comp = stbi__get16le(s) / 8;
6080    return 1;
6081 }
6082 #endif
6083 
6084 #ifndef STBI_NO_PSD
stbi__psd_info(stbi__context * s,int * x,int * y,int * comp)6085 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
6086 {
6087    int channelCount;
6088    if (stbi__get32be(s) != 0x38425053) {
6089        stbi__rewind( s );
6090        return 0;
6091    }
6092    if (stbi__get16be(s) != 1) {
6093        stbi__rewind( s );
6094        return 0;
6095    }
6096    stbi__skip(s, 6);
6097    channelCount = stbi__get16be(s);
6098    if (channelCount < 0 || channelCount > 16) {
6099        stbi__rewind( s );
6100        return 0;
6101    }
6102    *y = stbi__get32be(s);
6103    *x = stbi__get32be(s);
6104    if (stbi__get16be(s) != 8) {
6105        stbi__rewind( s );
6106        return 0;
6107    }
6108    if (stbi__get16be(s) != 3) {
6109        stbi__rewind( s );
6110        return 0;
6111    }
6112    *comp = 4;
6113    return 1;
6114 }
6115 #endif
6116 
6117 #ifndef STBI_NO_PIC
stbi__pic_info(stbi__context * s,int * x,int * y,int * comp)6118 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
6119 {
6120    int act_comp=0,num_packets=0,chained;
6121    stbi__pic_packet packets[10];
6122 
6123    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
6124       stbi__rewind(s);
6125       return 0;
6126    }
6127 
6128    stbi__skip(s, 88);
6129 
6130    *x = stbi__get16be(s);
6131    *y = stbi__get16be(s);
6132    if (stbi__at_eof(s)) {
6133       stbi__rewind( s);
6134       return 0;
6135    }
6136    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
6137       stbi__rewind( s );
6138       return 0;
6139    }
6140 
6141    stbi__skip(s, 8);
6142 
6143    do {
6144       stbi__pic_packet *packet;
6145 
6146       if (num_packets==sizeof(packets)/sizeof(packets[0]))
6147          return 0;
6148 
6149       packet = &packets[num_packets++];
6150       chained = stbi__get8(s);
6151       packet->size    = stbi__get8(s);
6152       packet->type    = stbi__get8(s);
6153       packet->channel = stbi__get8(s);
6154       act_comp |= packet->channel;
6155 
6156       if (stbi__at_eof(s)) {
6157           stbi__rewind( s );
6158           return 0;
6159       }
6160       if (packet->size != 8) {
6161           stbi__rewind( s );
6162           return 0;
6163       }
6164    } while (chained);
6165 
6166    *comp = (act_comp & 0x10 ? 4 : 3);
6167 
6168    return 1;
6169 }
6170 #endif
6171 
6172 // *************************************************************************************************
6173 // Portable Gray Map and Portable Pixel Map loader
6174 // by Ken Miller
6175 //
6176 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
6177 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
6178 //
6179 // Known limitations:
6180 //    Does not support comments in the header section
6181 //    Does not support ASCII image data (formats P2 and P3)
6182 //    Does not support 16-bit-per-channel
6183 
6184 #ifndef STBI_NO_PNM
6185 
stbi__pnm_test(stbi__context * s)6186 static int      stbi__pnm_test(stbi__context *s)
6187 {
6188    char p, t;
6189    p = (char) stbi__get8(s);
6190    t = (char) stbi__get8(s);
6191    if (p != 'P' || (t != '5' && t != '6')) {
6192        stbi__rewind( s );
6193        return 0;
6194    }
6195    return 1;
6196 }
6197 
stbi__pnm_load(stbi__context * s,int * x,int * y,int * comp,int req_comp)6198 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6199 {
6200    stbi_uc *out;
6201    if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6202       return 0;
6203    *x = s->img_x;
6204    *y = s->img_y;
6205    *comp = s->img_n;
6206 
6207    out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
6208    if (!out) return stbi__errpuc("outofmem", "Out of memory");
6209    stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6210 
6211    if (req_comp && req_comp != s->img_n) {
6212       out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6213       if (out == NULL) return out; // stbi__convert_format frees input on failure
6214    }
6215    return out;
6216 }
6217 
stbi__pnm_isspace(char c)6218 static int      stbi__pnm_isspace(char c)
6219 {
6220    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6221 }
6222 
stbi__pnm_skip_whitespace(stbi__context * s,char * c)6223 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6224 {
6225    while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6226       *c = (char) stbi__get8(s);
6227 }
6228 
stbi__pnm_isdigit(char c)6229 static int      stbi__pnm_isdigit(char c)
6230 {
6231    return c >= '0' && c <= '9';
6232 }
6233 
stbi__pnm_getinteger(stbi__context * s,char * c)6234 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
6235 {
6236    int value = 0;
6237 
6238    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6239       value = value*10 + (*c - '0');
6240       *c = (char) stbi__get8(s);
6241    }
6242 
6243    return value;
6244 }
6245 
stbi__pnm_info(stbi__context * s,int * x,int * y,int * comp)6246 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
6247 {
6248    int maxv;
6249    char c, p, t;
6250 
6251    stbi__rewind( s );
6252 
6253    // Get identifier
6254    p = (char) stbi__get8(s);
6255    t = (char) stbi__get8(s);
6256    if (p != 'P' || (t != '5' && t != '6')) {
6257        stbi__rewind( s );
6258        return 0;
6259    }
6260 
6261    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
6262 
6263    c = (char) stbi__get8(s);
6264    stbi__pnm_skip_whitespace(s, &c);
6265 
6266    *x = stbi__pnm_getinteger(s, &c); // read width
6267    stbi__pnm_skip_whitespace(s, &c);
6268 
6269    *y = stbi__pnm_getinteger(s, &c); // read height
6270    stbi__pnm_skip_whitespace(s, &c);
6271 
6272    maxv = stbi__pnm_getinteger(s, &c);  // read max value
6273 
6274    if (maxv > 255)
6275       return stbi__err("max value > 255", "PPM image not 8-bit");
6276    else
6277       return 1;
6278 }
6279 #endif
6280 
stbi__info_main(stbi__context * s,int * x,int * y,int * comp)6281 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
6282 {
6283    #ifndef STBI_NO_JPEG
6284    if (stbi__jpeg_info(s, x, y, comp)) return 1;
6285    #endif
6286 
6287    #ifndef STBI_NO_PNG
6288    if (stbi__png_info(s, x, y, comp))  return 1;
6289    #endif
6290 
6291    #ifndef STBI_NO_GIF
6292    if (stbi__gif_info(s, x, y, comp))  return 1;
6293    #endif
6294 
6295    #ifndef STBI_NO_BMP
6296    if (stbi__bmp_info(s, x, y, comp))  return 1;
6297    #endif
6298 
6299    #ifndef STBI_NO_PSD
6300    if (stbi__psd_info(s, x, y, comp))  return 1;
6301    #endif
6302 
6303    #ifndef STBI_NO_PIC
6304    if (stbi__pic_info(s, x, y, comp))  return 1;
6305    #endif
6306 
6307    #ifndef STBI_NO_PNM
6308    if (stbi__pnm_info(s, x, y, comp))  return 1;
6309    #endif
6310 
6311    #ifndef STBI_NO_HDR
6312    if (stbi__hdr_info(s, x, y, comp))  return 1;
6313    #endif
6314 
6315    // test tga last because it's a crappy test!
6316    #ifndef STBI_NO_TGA
6317    if (stbi__tga_info(s, x, y, comp))
6318        return 1;
6319    #endif
6320    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
6321 }
6322 
6323 #ifndef STBI_NO_STDIO
stbi_info(char const * filename,int * x,int * y,int * comp)6324 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
6325 {
6326     FILE *f = stbi__fopen(filename, "rb");
6327     int result;
6328     if (!f) return stbi__err("can't fopen", "Unable to open file");
6329     result = stbi_info_from_file(f, x, y, comp);
6330     fclose(f);
6331     return result;
6332 }
6333 
stbi_info_from_file(FILE * f,int * x,int * y,int * comp)6334 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
6335 {
6336    int r;
6337    stbi__context s;
6338    long pos = ftell(f);
6339    stbi__start_file(&s, f);
6340    r = stbi__info_main(&s,x,y,comp);
6341    fseek(f,pos,SEEK_SET);
6342    return r;
6343 }
6344 #endif // !STBI_NO_STDIO
6345 
stbi_info_from_memory(stbi_uc const * buffer,int len,int * x,int * y,int * comp)6346 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
6347 {
6348    stbi__context s;
6349    stbi__start_mem(&s,buffer,len);
6350    return stbi__info_main(&s,x,y,comp);
6351 }
6352 
stbi_info_from_callbacks(stbi_io_callbacks const * c,void * user,int * x,int * y,int * comp)6353 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
6354 {
6355    stbi__context s;
6356    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
6357    return stbi__info_main(&s,x,y,comp);
6358 }
6359 
6360 #endif // STB_IMAGE_IMPLEMENTATION
6361 
6362 /*
6363    revision history:
6364       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
6365       2.07  (2015-09-13) fix compiler warnings
6366                          partial animated GIF support
6367                          limited 16-bit PSD support
6368                          #ifdef unused functions
6369                          bug with < 92 byte PIC,PNM,HDR,TGA
6370       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
6371       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
6372       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
6373       2.03  (2015-04-12) extra corruption checking (mmozeiko)
6374                          stbi_set_flip_vertically_on_load (nguillemot)
6375                          fix NEON support; fix mingw support
6376       2.02  (2015-01-19) fix incorrect assert, fix warning
6377       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
6378       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
6379       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
6380                          progressive JPEG (stb)
6381                          PGM/PPM support (Ken Miller)
6382                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
6383                          GIF bugfix -- seemingly never worked
6384                          STBI_NO_*, STBI_ONLY_*
6385       1.48  (2014-12-14) fix incorrectly-named assert()
6386       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
6387                          optimize PNG (ryg)
6388                          fix bug in interlaced PNG with user-specified channel count (stb)
6389       1.46  (2014-08-26)
6390               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
6391       1.45  (2014-08-16)
6392               fix MSVC-ARM internal compiler error by wrapping malloc
6393       1.44  (2014-08-07)
6394               various warning fixes from Ronny Chevalier
6395       1.43  (2014-07-15)
6396               fix MSVC-only compiler problem in code changed in 1.42
6397       1.42  (2014-07-09)
6398               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
6399               fixes to stbi__cleanup_jpeg path
6400               added STBI_ASSERT to avoid requiring assert.h
6401       1.41  (2014-06-25)
6402               fix search&replace from 1.36 that messed up comments/error messages
6403       1.40  (2014-06-22)
6404               fix gcc struct-initialization warning
6405       1.39  (2014-06-15)
6406               fix to TGA optimization when req_comp != number of components in TGA;
6407               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
6408               add support for BMP version 5 (more ignored fields)
6409       1.38  (2014-06-06)
6410               suppress MSVC warnings on integer casts truncating values
6411               fix accidental rename of 'skip' field of I/O
6412       1.37  (2014-06-04)
6413               remove duplicate typedef
6414       1.36  (2014-06-03)
6415               convert to header file single-file library
6416               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
6417       1.35  (2014-05-27)
6418               various warnings
6419               fix broken STBI_SIMD path
6420               fix bug where stbi_load_from_file no longer left file pointer in correct place
6421               fix broken non-easy path for 32-bit BMP (possibly never used)
6422               TGA optimization by Arseny Kapoulkine
6423       1.34  (unknown)
6424               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
6425       1.33  (2011-07-14)
6426               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
6427       1.32  (2011-07-13)
6428               support for "info" function for all supported filetypes (SpartanJ)
6429       1.31  (2011-06-20)
6430               a few more leak fixes, bug in PNG handling (SpartanJ)
6431       1.30  (2011-06-11)
6432               added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
6433               removed deprecated format-specific test/load functions
6434               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
6435               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
6436               fix inefficiency in decoding 32-bit BMP (David Woo)
6437       1.29  (2010-08-16)
6438               various warning fixes from Aurelien Pocheville
6439       1.28  (2010-08-01)
6440               fix bug in GIF palette transparency (SpartanJ)
6441       1.27  (2010-08-01)
6442               cast-to-stbi_uc to fix warnings
6443       1.26  (2010-07-24)
6444               fix bug in file buffering for PNG reported by SpartanJ
6445       1.25  (2010-07-17)
6446               refix trans_data warning (Won Chun)
6447       1.24  (2010-07-12)
6448               perf improvements reading from files on platforms with lock-heavy fgetc()
6449               minor perf improvements for jpeg
6450               deprecated type-specific functions so we'll get feedback if they're needed
6451               attempt to fix trans_data warning (Won Chun)
6452       1.23    fixed bug in iPhone support
6453       1.22  (2010-07-10)
6454               removed image *writing* support
6455               stbi_info support from Jetro Lauha
6456               GIF support from Jean-Marc Lienher
6457               iPhone PNG-extensions from James Brown
6458               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
6459       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
6460       1.20    added support for Softimage PIC, by Tom Seddon
6461       1.19    bug in interlaced PNG corruption check (found by ryg)
6462       1.18  (2008-08-02)
6463               fix a threading bug (local mutable static)
6464       1.17    support interlaced PNG
6465       1.16    major bugfix - stbi__convert_format converted one too many pixels
6466       1.15    initialize some fields for thread safety
6467       1.14    fix threadsafe conversion bug
6468               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
6469       1.13    threadsafe
6470       1.12    const qualifiers in the API
6471       1.11    Support installable IDCT, colorspace conversion routines
6472       1.10    Fixes for 64-bit (don't use "unsigned long")
6473               optimized upsampling by Fabian "ryg" Giesen
6474       1.09    Fix format-conversion for PSD code (bad global variables!)
6475       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
6476       1.07    attempt to fix C++ warning/errors again
6477       1.06    attempt to fix C++ warning/errors again
6478       1.05    fix TGA loading to return correct *comp and use good luminance calc
6479       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
6480       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
6481       1.02    support for (subset of) HDR files, float interface for preferred access to them
6482       1.01    fix bug: possible bug in handling right-side up bmps... not sure
6483               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
6484       1.00    interface to zlib that skips zlib header
6485       0.99    correct handling of alpha in palette
6486       0.98    TGA loader by lonesock; dynamically add loaders (untested)
6487       0.97    jpeg errors on too large a file; also catch another malloc failure
6488       0.96    fix detection of invalid v value - particleman@mollyrocket forum
6489       0.95    during header scan, seek to markers in case of padding
6490       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
6491       0.93    handle jpegtran output; verbose errors
6492       0.92    read 4,8,16,24,32-bit BMP files of several formats
6493       0.91    output 24-bit Windows 3.0 BMP files
6494       0.90    fix a few more warnings; bump version number to approach 1.0
6495       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
6496       0.60    fix compiling as c++
6497       0.59    fix warnings: merge Dave Moore's -Wall fixes
6498       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
6499       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
6500       0.56    fix bug: zlib uncompressed mode len vs. nlen
6501       0.55    fix bug: restart_interval not initialized to 0
6502       0.54    allow NULL for 'int *comp'
6503       0.53    fix bug in png 3->4; speedup png decoding
6504       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
6505       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
6506               on 'test' only check type, not whether we support this variant
6507       0.50  (2006-11-19)
6508               first released version
6509 */
6510