1 /* stb_image - v2.05 - public domain image loader - http://nothings.org/stb_image.h
2                                      no warranty implied; use at your own risk
3 
4    Do this:
5       #define STB_IMAGE_IMPLEMENTATION
6    before you include this file in *one* C or C++ file to create the implementation.
7 
8    // i.e. it should look like this:
9    #include ...
10    #include ...
11    #include ...
12    #define STB_IMAGE_IMPLEMENTATION
13    #include "stb_image.h"
14 
15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
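
   As a rough illustration of the overrides above (a sketch, not the only
   way to do it): every demo_* name and the g_demo_stats global below are
   hypothetical and exist only for this example. It defines all three
   allocator macros together and forwards STBI_ASSERT to assert(). The
   macros take no context argument, so the bookkeeping lives in a global.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical bookkeeping for this sketch; none of these names come from stb_image.
typedef struct {
   size_t malloc_calls;
   size_t free_calls;
} demo_alloc_stats;

// The macros take no context argument, so the state lives in a global.
demo_alloc_stats g_demo_stats = { 0, 0 };

void *demo_malloc(size_t sz)
{
   g_demo_stats.malloc_calls++;
   return malloc(sz);
}

void *demo_realloc(void *p, size_t sz)
{
   return realloc(p, sz);
}

void demo_free(void *p)
{
   if (p != NULL) g_demo_stats.free_calls++;
   free(p);
}

// Optional: route the library's assertions somewhere other than assert.h.
// Here it still forwards to assert() so the sketch stays self-contained.
#define STBI_ASSERT(x)       assert(x)

// All three allocator macros must be defined together.
#define STBI_MALLOC(sz)      demo_malloc(sz)
#define STBI_REALLOC(p,sz)   demo_realloc(p,sz)
#define STBI_FREE(p)         demo_free(p)

// In a real build the next two lines would follow in exactly one source file:
//    #define STB_IMAGE_IMPLEMENTATION
//    #include "stb_image.h"
```

   Wrapping the standard allocator like this is only one option; the same
   three macros could just as well point at an arena or a pool.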
17 
18 
19    QUICK NOTES:
20       Primarily of interest to game developers and other people who can
21           avoid problematic images and only need the trivial interface
22 
23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24       PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26       TGA (not sure what subset, if a subset)
27       BMP non-1bpp, non-RLE
28       PSD (composited view only, no extra channels)
29 
30       GIF (*comp always reports as 4-channel)
31       HDR (radiance rgbE format)
32       PIC (Softimage PIC)
33       PNM (PPM and PGM binary only)
34 
35       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
36       - decode from arbitrary I/O callbacks
37       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
38 
39    Full documentation under "DOCUMENTATION" below.
40 
41 
42    Revision 2.00 release notes:
43 
44       - Progressive JPEG is now supported.
45 
46       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
47 
48       - x86 platforms now make use of SSE2 SIMD instructions for
49         JPEG decoding, and ARM platforms can use NEON SIMD if requested.
50         This work was done by Fabian "ryg" Giesen. SSE2 is used by
51         default, but NEON must be enabled explicitly; see docs.
52 
53         With other JPEG optimizations included in this version, we see
54         2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
55         on a JPEG on an ARM machine, relative to previous versions of this
56         library. The same results will not obtain for all JPGs and for all
57         x86/ARM machines. (Note that progressive JPEGs are significantly
58         slower to decode than regular JPEGs.) This doesn't mean that this
59         is the fastest JPEG decoder in the land; rather, it brings it
60         closer to parity with standard libraries. If you want the fastest
61         decode, look elsewhere. (See "Philosophy" section of docs below.)
62 
63         See final bullet items below for more info on SIMD.
64 
      - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
        the memory allocator. Unlike other stb libraries, these macros don't
        support a context parameter, so if you need to pass a context in to
        the allocator, you'll have to store it in a global or a thread-local
        variable.
70 
71       - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
72         STBI_NO_LINEAR.
73             STBI_NO_HDR:     suppress implementation of .hdr reader format
74             STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
75 
76       - You can suppress implementation of any of the decoders to reduce
77         your code footprint by #defining one or more of the following
78         symbols before creating the implementation.
79 
80             STBI_NO_JPEG
81             STBI_NO_PNG
82             STBI_NO_BMP
83             STBI_NO_PSD
84             STBI_NO_TGA
85             STBI_NO_GIF
86             STBI_NO_HDR
87             STBI_NO_PIC
88             STBI_NO_PNM   (.ppm and .pgm)
89 
90       - You can request *only* certain decoders and suppress all other ones
91         (this will be more forward-compatible, as addition of new decoders
92         doesn't require you to disable them explicitly):
93 
94             STBI_ONLY_JPEG
95             STBI_ONLY_PNG
96             STBI_ONLY_BMP
97             STBI_ONLY_PSD
98             STBI_ONLY_TGA
99             STBI_ONLY_GIF
100             STBI_ONLY_HDR
101             STBI_ONLY_PIC
102             STBI_ONLY_PNM   (.ppm and .pgm)
103 
        Note that you can define multiples of these, and you will get all
        of them ("only x" and "only y" is interpreted to mean "only x&y").

      - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
        want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
109 
110       - Compilation of all SIMD code can be suppressed with
111             #define STBI_NO_SIMD
112         It should not be necessary to disable SIMD unless you have issues
113         compiling (e.g. using an x86 compiler which doesn't support SSE
114         intrinsics or that doesn't support the method used to detect
115         SSE2 support at run-time), and even those can be reported as
116         bugs so I can refine the built-in compile-time checking to be
117         smarter.
118 
119       - The old STBI_SIMD system which allowed installing a user-defined
120         IDCT etc. has been removed. If you need this, don't upgrade. My
121         assumption is that almost nobody was doing this, and those who
122         were will find the built-in SIMD more satisfactory anyway.
123 
124       - RGB values computed for JPEG images are slightly different from
125         previous versions of stb_image. (This is due to using less
126         integer precision in SIMD.) The C code has been adjusted so
127         that the same RGB values will be computed regardless of whether
128         SIMD support is available, so your app should always produce
129         consistent results. But these results are slightly different from
130         previous versions. (Specifically, about 3% of available YCbCr values
131         will compute different RGB results from pre-1.49 versions by +-1;
132         most of the deviating values are one smaller in the G channel.)
133 
134       - If you must produce consistent results with previous versions of
135         stb_image, #define STBI_JPEG_OLD and you will get the same results
136         you used to; however, you will not get the SIMD speedups for
137         the YCbCr-to-RGB conversion step (although you should still see
138         significant JPEG speedup from the other changes).
139 
140         Please note that STBI_JPEG_OLD is a temporary feature; it will be
141         removed in future versions of the library. It is only intended for
142         near-term back-compatibility use.
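
   The STBI_ONLY_* selection described in the notes above can be sketched
   as follows. This is a reduced copy of the preprocessor logic that
   appears later in this file, cut down to three decoders for brevity.
   The demo_*_enabled probes are hypothetical and exist only so that the
   outcome can be inspected.

```c
// Ask for only the JPEG and PNG decoders.
#define STBI_ONLY_JPEG
#define STBI_ONLY_PNG

// Reduced copy of the selection logic, covering three decoders for brevity:
// if any STBI_ONLY_* symbol is defined, every decoder that was not requested
// gets its STBI_NO_* symbol.
#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
#endif

// Hypothetical probes (not part of stb_image) that report what survived.
int demo_jpeg_enabled(void)
{
#ifdef STBI_NO_JPEG
   return 0;
#else
   return 1;
#endif
}

int demo_png_enabled(void)
{
#ifdef STBI_NO_PNG
   return 0;
#else
   return 1;
#endif
}

int demo_bmp_enabled(void)
{
#ifdef STBI_NO_BMP
   return 0;
#else
   return 1;
#endif
}
```

   With both symbols defined, JPEG and PNG stay enabled and BMP is
   suppressed, which matches the "only x&y" reading given above.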
143 
144 
145    Latest revision history:
146       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
147       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
148       2.03  (2015-04-12) additional corruption checking
149                          stbi_set_flip_vertically_on_load
150                          fix NEON support; fix mingw support
151       2.02  (2015-01-19) fix incorrect assert, fix warning
152       2.01  (2015-01-17) fix various warnings
153       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
154       2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
155                          progressive JPEG
156                          PGM/PPM support
157                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
158                          STBI_NO_*, STBI_ONLY_*
159                          GIF bugfix
160       1.48  (2014-12-14) fix incorrectly-named assert()
161       1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
162                          optimize PNG
163                          fix bug in interlaced PNG with user-specified channel count
164 
165    See end of file for full revision history.
166 
167 
168  ============================    Contributors    =========================
169 
170  Image formats                                Bug fixes & warning fixes
171     Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
172     Nicolas Schulz (hdr, psd)                    Christpher Lloyd
173     Jonathan Dummer (tga)                        Dave Moore
174     Jean-Marc Lienher (gif)                      Won Chun
175     Tom Seddon (pic)                             the Horde3D community
176     Thatcher Ulrich (psd)                        Janez Zemva
177     Ken Miller (pgm, ppm)                        Jonathan Blow
178                                                  Laurent Gomila
179                                                  Aruelien Pocheville
180  Extensions, features                            Ryamond Barbiero
181     Jetro Lauha (stbi_info)                      David Woo
182     Martin "SpartanJ" Golini (stbi_info)         Martin Golini
183     James "moose2000" Brown (iPhone PNG)         Roy Eltham
184     Ben "Disch" Wenger (io callbacks)            Luke Graham
185     Omar Cornut (1/2/4-bit PNG)                  Thomas Ruf
186     Nicolas Guillemot (vertical flip)            John Bartholomew
187                                                  Ken Hamada
188  Optimizations & bugfixes                        Cort Stratton
189     Fabian "ryg" Giesen                          Blazej Dariusz Roszkowski
190     Arseny Kapoulkine                            Thibault Reuille
191                                                  Paul Du Bois
192                                                  Guillaume George
193   If your name should be here but                Jerry Jansson
194   isn't, let Sean know.                          Hayaki Saito
195                                                  Johan Duparc
196                                                  Ronny Chevalier
197                                                  Michal Cichon
198                                                  Tero Hanninen
199                                                  Sergio Gonzalez
200                                                  Cass Everitt
201                                                  Engin Manap
202                                                  Martins Mozeiko
203                                                  Joseph Thomson
204                                                  Phil Jordan
205 
206 License:
207    This software is in the public domain. Where that dedication is not
208    recognized, you are granted a perpetual, irrevocable license to copy
209    and modify this file however you want.
210 
211 */
212 
213 #ifndef STBI_INCLUDE_STB_IMAGE_H
214 #define STBI_INCLUDE_STB_IMAGE_H
215 
216 // DOCUMENTATION
217 //
218 // Limitations:
219 //    - no 16-bit-per-channel PNG
220 //    - no 12-bit-per-channel JPEG
221 //    - no JPEGs with arithmetic coding
222 //    - no 1-bit BMP
223 //    - GIF always returns *comp=4
224 //
225 // Basic usage (see HDR discussion below for HDR usage):
226 //    int x,y,n;
227 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
228 //    // ... process data if not NULL ...
229 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
230 //    // ... replace '0' with '1'..'4' to force that many components per pixel
231 //    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
233 //
234 // Standard parameters:
235 //    int *x       -- outputs image width in pixels
236 //    int *y       -- outputs image height in pixels
237 //    int *comp    -- outputs # of image components in image file
238 //    int req_comp -- if non-zero, # of image components requested in result
239 //
240 // The return value from an image loader is an 'unsigned char *' which points
241 // to the pixel data, or NULL on an allocation failure or if the image is
242 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
243 // with each pixel consisting of N interleaved 8-bit components; the first
244 // pixel pointed to is top-left-most in the image. There is no padding between
245 // image scanlines or between pixels, regardless of format. The number of
246 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
247 // If req_comp is non-zero, *comp has the number of components that _would_
248 // have been output otherwise. E.g. if you set req_comp to 4, you will always
249 // get RGBA output, but you can check *comp to see if it's trivially opaque
250 // because e.g. there were only 3 channels in the source image.
251 //
252 // An output image with N components has the following components interleaved
253 // in this order in each pixel:
254 //
255 //     N=#comp     components
256 //       1           grey
257 //       2           grey, alpha
258 //       3           red, green, blue
259 //       4           red, green, blue, alpha
260 //
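/* Illustration, not part of the library:
   The layout described above (top-left pixel first, no padding, N
   interleaved 8-bit components per pixel) reduces to simple index
   arithmetic. The two demo_* helpers below are hypothetical; they only
   spell out that arithmetic and the position of the alpha component
   from the table.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of stb_image: pointer to the first component
// of pixel (px,py) in a w-by-h image holding n interleaved 8-bit components.
const unsigned char *demo_pixel_at(const unsigned char *data, int w, int h, int n, int px, int py)
{
   (void) h;   // only used by the bounds check below
   assert(data != NULL && n >= 1 && n <= 4);
   assert(px >= 0 && px < w && py >= 0 && py < h);
   // No padding between scanlines or pixels, so the offset is a plain product.
   return data + ((size_t) py * (size_t) w + (size_t) px) * (size_t) n;
}

// Alpha is the last component when n is 2 or 4; images with n of 1 or 3
// carry no alpha, which this sketch treats as fully opaque.
unsigned char demo_pixel_alpha(const unsigned char *data, int w, int h, int n, int px, int py)
{
   const unsigned char *p = demo_pixel_at(data, w, h, n, px, py);
   return (n == 2 || n == 4) ? p[n - 1] : 255;
}
```

   Here n is the number of components actually present in the buffer,
   that is, req_comp when it was non-zero and *comp otherwise.
*/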
261 // If image loading fails for any reason, the return value will be NULL,
262 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
263 // can be queried for an extremely brief, end-user unfriendly explanation
264 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
265 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
266 // more user-friendly ones.
267 //
268 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
269 //
270 // ===========================================================================
271 //
272 // Philosophy
273 //
274 // stb libraries are designed with the following priorities:
275 //
276 //    1. easy to use
277 //    2. easy to maintain
278 //    3. good performance
279 //
280 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
281 // and for best performance I may provide less-easy-to-use APIs that give higher
282 // performance, in addition to the easy to use ones. Nevertheless, it's important
283 // to keep in mind that from the standpoint of you, a client of this library,
284 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
285 //
// Some secondary priorities arise directly from the first two; some of these
// make it more explicit why performance can't be emphasized.
288 //
289 //    - Portable ("ease of use")
290 //    - Small footprint ("easy to maintain")
291 //    - No dependencies ("ease of use")
292 //
293 // ===========================================================================
294 //
295 // I/O callbacks
296 //
297 // I/O callbacks allow you to read from arbitrary sources, like packaged
298 // files or some other source. Data read from callbacks are processed
299 // through a small internal buffer (currently 128 bytes) to try to reduce
300 // overhead.
301 //
// The three functions you must define are "read" (reads some bytes of data),
// "skip" (skips some bytes of data), and "eof" (reports if the stream is at the end).
304 //
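/* Illustration, not part of the library:
   A minimal sketch of the three callbacks, reading from a block of
   memory. The demo_mem_stream type and the demo_mem_* functions are
   hypothetical names; their signatures follow the read/skip/eof members
   of stbi_io_callbacks declared further down. Clamping out-of-range
   skips is a choice made by this sketch.

```c
#include <assert.h>
#include <string.h>

// Hypothetical user data: a read cursor over a block of memory.
typedef struct {
   const unsigned char *data;
   int size;
   int pos;
} demo_mem_stream;

// "read": copy up to 'size' bytes into 'data', return how many were copied.
int demo_mem_read(void *user, char *data, int size)
{
   demo_mem_stream *m = (demo_mem_stream *) user;
   int remaining = m->size - m->pos;
   assert(data != NULL);
   if (size > remaining) size = remaining;
   if (size <= 0) return 0;
   memcpy(data, m->data + m->pos, (size_t) size);
   m->pos += size;
   return size;
}

// "skip": move the cursor by n bytes; a negative n moves it backwards.
void demo_mem_skip(void *user, int n)
{
   demo_mem_stream *m = (demo_mem_stream *) user;
   int pos = m->pos + n;
   if (pos < 0) pos = 0;
   if (pos > m->size) pos = m->size;
   m->pos = pos;
}

// "eof": nonzero once the cursor has reached the end of the data.
int demo_mem_eof(void *user)
{
   demo_mem_stream *m = (demo_mem_stream *) user;
   return m->pos >= m->size;
}

// With the real header included, the wiring would look like:
//    stbi_io_callbacks cb = { demo_mem_read, demo_mem_skip, demo_mem_eof };
//    unsigned char *pixels = stbi_load_from_callbacks(&cb, &stream, &x, &y, &n, 0);
```

   For plain memory blocks stbi_load_from_memory is simpler; callbacks
   earn their keep when the data comes from an archive or a network
   stream.
*/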
305 // ===========================================================================
306 //
307 // SIMD support
308 //
309 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
310 // supported by the compiler. For ARM Neon support, you must explicitly
311 // request it.
312 //
313 // (The old do-it-yourself SIMD API is no longer supported in the current
314 // code.)
315 //
316 // On x86, SSE2 will automatically be used when available based on a run-time
317 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
318 // the typical path is to have separate builds for NEON and non-NEON devices
319 // (at least this is true for iOS and Android). Therefore, the NEON support is
320 // toggled by a build flag: define STBI_NEON to get NEON loops.
321 //
// The output of the JPEG decoder is slightly different from versions before
// SIMD support was introduced (that is, versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
328 //
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
332 //
333 // ===========================================================================
334 //
335 // HDR image support   (disable by defining STBI_NO_HDR)
336 //
// stb_image supports loading HDR images. The support is written generically,
// but the only HDR format currently implemented is the Radiance .HDR file
// format. You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
343 //
344 //     stbi_hdr_to_ldr_gamma(2.2f);
345 //     stbi_hdr_to_ldr_scale(1.0f);
346 //
// (note, do not use _inverse_ constants; stb_image will invert them
// appropriately).
349 //
350 // Additionally, there is a new, parallel interface for loading files as
351 // (linear) floats to preserve the full dynamic range:
352 //
353 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
354 //
355 // If you load LDR images through this interface, those images will
356 // be promoted to floating point values, run through the inverse of
357 // constants corresponding to the above:
358 //
359 //     stbi_ldr_to_hdr_scale(1.0f);
360 //     stbi_ldr_to_hdr_gamma(2.2f);
361 //
362 // Finally, given a filename (or an open file or memory block--see header
363 // file for details) containing image data, you can query for the "most
364 // appropriate" interface to use (that is, whether the image is HDR or
365 // not), using:
366 //
367 //     stbi_is_hdr(char *filename);
368 //
369 // ===========================================================================
370 //
371 // iPhone PNG support:
372 //
// By default we convert iPhone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always just get the native iPhone "format" through (which
// is BGR stored in RGB).
378 //
379 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
380 // pixel to remove any premultiplied alpha *only* if the image file explicitly
381 // says there's premultiplied data (currently only happens in iPhone images,
382 // and only if iPhone convert-to-rgb processing is on).
383 //
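/* Illustration, not part of the library:
   The "divide per pixel" mentioned above can be sketched like this. The
   helper name is hypothetical, and the exact arithmetic used by the
   decoder is not shown in this part of the file, so treat the rounding
   and the clamp as assumptions of this sketch rather than the library's
   behavior.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of stb_image. 'px' points at one RGBA pixel
// whose colour components were stored multiplied by alpha.
void demo_unpremultiply_rgba(unsigned char *px)
{
   unsigned a, i;
   assert(px != NULL);
   a = px[3];
   if (a == 0) return;   // nothing to recover from a fully transparent pixel
   for (i = 0; i < 3; ++i) {
      unsigned v = (px[i] * 255u) / a;                 // the per-pixel divide
      px[i] = (unsigned char) (v > 255u ? 255u : v);   // clamp: a choice made by this sketch
   }
}
```

   A colour component larger than its alpha cannot come from valid
   premultiplied data; the sketch clamps that case to 255.
*/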
384 
385 
386 #ifndef STBI_NO_STDIO
387 #include <stdio.h>
388 #endif // STBI_NO_STDIO
389 
390 #define STBI_VERSION 1
391 
392 enum
393 {
394    STBI_default = 0, // only used for req_comp
395 
396    STBI_grey       = 1,
397    STBI_grey_alpha = 2,
398    STBI_rgb        = 3,
399    STBI_rgb_alpha  = 4
400 };
401 
402 typedef unsigned char stbi_uc;
403 
404 #ifdef __cplusplus
405 extern "C" {
406 #endif
407 
408 #ifdef STB_IMAGE_STATIC
409 #define STBIDEF static
410 #else
411 #define STBIDEF extern
412 #endif
413 
414 //////////////////////////////////////////////////////////////////////////////
415 //
416 // PRIMARY API - works on images of any type
417 //
418 
419 //
420 // load image by filename, open file, or memory buffer
421 //
422 
423 typedef struct
424 {
425    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
426    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
427    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
428 } stbi_io_callbacks;
429 
430 STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
431 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
432 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
433 
434 #ifndef STBI_NO_STDIO
435 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
436 // for stbi_load_from_file, file pointer is left pointing immediately after image
437 #endif
438 
439 #ifndef STBI_NO_LINEAR
440    STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
441    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
442    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
443 
444    #ifndef STBI_NO_STDIO
445    STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
446    #endif
447 #endif
448 
449 #ifndef STBI_NO_HDR
450    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
451    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
452 #endif
453 
454 #ifndef STBI_NO_LINEAR
455    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
456    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
#endif // STBI_NO_LINEAR
458 
459 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
460 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
461 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
462 #ifndef STBI_NO_STDIO
463 STBIDEF int      stbi_is_hdr          (char const *filename);
464 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
465 #endif // STBI_NO_STDIO
466 
467 
468 // get a VERY brief reason for failure
469 // NOT THREADSAFE
470 STBIDEF const char *stbi_failure_reason  (void);
471 
472 // free the loaded image -- this is just free()
473 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
474 
475 // get image dimensions & components without fully decoding
476 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
477 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
478 
479 #ifndef STBI_NO_STDIO
480 STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
481 STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
482 
483 #endif
484 
485 
486 
// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
490 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
491 
492 // indicate whether we should process iphone images back to canonical format,
493 // or just pass them through "as-is"
494 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
495 
496 // flip the image vertically, so the first pixel in the output array is the bottom left
497 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
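
/* Illustration, not part of the library:
   What a vertical flip does to the output array can be sketched as an
   in-place swap of scanlines. The function below is a hypothetical
   stand-in written for this example; it is not the library's own flip
   code, which is not shown in this part of the file.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: swap scanlines so that the bottom row comes first.
// 'data' holds h scanlines of w pixels with n 8-bit components each.
void demo_flip_vertically(unsigned char *data, int w, int h, int n)
{
   size_t stride, i;
   int row;
   assert(data != NULL && w > 0 && h > 0 && n > 0);
   stride = (size_t) w * (size_t) n;
   for (row = 0; row < h / 2; ++row) {
      unsigned char *top    = data + (size_t) row * stride;
      unsigned char *bottom = data + (size_t) (h - 1 - row) * stride;
      for (i = 0; i < stride; ++i) {
         unsigned char t = top[i];
         top[i] = bottom[i];
         bottom[i] = t;
      }
   }
}
```

   With an odd height the middle scanline stays where it is.
*/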
498 
499 // ZLIB client - used by PNG, available for other purposes
500 
501 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
502 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
503 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
504 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
505 
506 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
507 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
508 
509 #ifndef STBI_NO_DDS
510 #include "stbi_DDS.h"
511 #endif
512 
513 #ifndef STBI_NO_PVR
514 #include "stbi_pvr.h"
515 #endif
516 
517 #ifndef STBI_NO_PKM
518 #include "stbi_pkm.h"
519 #endif
520 
521 #ifndef STBI_NO_EXT
522 #include "stbi_ext.h"
523 #endif
524 
525 #ifdef __cplusplus
526 }
527 #endif
528 
529 //
530 //
531 ////   end header file   /////////////////////////////////////////////////////
532 #endif // STBI_INCLUDE_STB_IMAGE_H
533 
534 #ifdef STB_IMAGE_IMPLEMENTATION
535 
536 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
537   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
538   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
539   || defined(STBI_ONLY_ZLIB)
540    #ifndef STBI_ONLY_JPEG
541    #define STBI_NO_JPEG
542    #endif
543    #ifndef STBI_ONLY_PNG
544    #define STBI_NO_PNG
545    #endif
546    #ifndef STBI_ONLY_BMP
547    #define STBI_NO_BMP
548    #endif
549    #ifndef STBI_ONLY_PSD
550    #define STBI_NO_PSD
551    #endif
552    #ifndef STBI_ONLY_TGA
553    #define STBI_NO_TGA
554    #endif
555    #ifndef STBI_ONLY_GIF
556    #define STBI_NO_GIF
557    #endif
558    #ifndef STBI_ONLY_HDR
559    #define STBI_NO_HDR
560    #endif
561    #ifndef STBI_ONLY_PIC
562    #define STBI_NO_PIC
563    #endif
564    #ifndef STBI_ONLY_PNM
565    #define STBI_NO_PNM
566    #endif
567 #endif
568 
569 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
570 #define STBI_NO_ZLIB
571 #endif
572 
573 
574 #include <stdarg.h>
575 #include <stddef.h> // ptrdiff_t on osx
576 #include <stdlib.h>
577 #include <string.h>
578 
579 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
580 #include <math.h>  // ldexp
581 #endif
582 
583 #ifndef STBI_NO_STDIO
584 #include <stdio.h>
585 #endif
586 
587 #ifndef STBI_ASSERT
588 #include <assert.h>
589 #define STBI_ASSERT(x) assert(x)
590 #endif
591 
592 
593 #ifndef _MSC_VER
594    #ifdef __cplusplus
595    #define stbi_inline inline
596    #else
597    #define stbi_inline
598    #endif
599 #else
600    #define stbi_inline __forceinline
601 #endif
602 
603 
604 #ifdef _MSC_VER
605 typedef unsigned short stbi__uint16;
606 typedef   signed short stbi__int16;
607 typedef unsigned int   stbi__uint32;
608 typedef   signed int   stbi__int32;
609 #else
610 #include <stdint.h>
611 typedef uint16_t stbi__uint16;
612 typedef int16_t  stbi__int16;
613 typedef uint32_t stbi__uint32;
614 typedef int32_t  stbi__int32;
615 #endif
616 
617 // should produce compiler error if size is wrong
618 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
619 
620 #ifdef _MSC_VER
621 #define STBI_NOTUSED(v)  (void)(v)
622 #else
623 #define STBI_NOTUSED(v)  (void)sizeof(v)
624 #endif
625 
626 #ifdef _MSC_VER
627 #define STBI_HAS_LROTL
628 #endif
629 
630 #ifdef STBI_HAS_LROTL
631    #define stbi_lrot(x,y)  _lrotl(x,y)
632 #else
633    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
634 #endif
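
/* Illustration, not part of the library:
   The fallback stbi_lrot above is a 32-bit rotate-left built from two
   shifts. Written as a function (demo_rotl32 is a hypothetical name),
   the same idea looks like the sketch below. The one difference is the
   explicit handling of a zero count: with y equal to 0 the macro form
   would shift right by 32, which C leaves undefined for a 32-bit
   operand.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: rotate a 32-bit value left by y bits.
uint32_t demo_rotl32(uint32_t x, unsigned y)
{
   y &= 31u;               // keep the count in 0..31
   if (y == 0) return x;   // avoids the undefined shift by 32
   return (x << y) | (x >> (32u - y));
}
```

   Callers that always pass a count between 1 and 31 get the same result
   from either form.
*/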
635 
636 #if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
637 // ok
638 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
639 // ok
640 #else
641 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
642 #endif
643 
644 #ifndef STBI_MALLOC
645 #define STBI_MALLOC(sz)    malloc(sz)
646 #define STBI_REALLOC(p,sz) realloc(p,sz)
647 #define STBI_FREE(p)       free(p)
648 #endif
649 
650 // x86/x64 detection
651 #if defined(__x86_64__) || defined(_M_X64)
652 #define STBI__X64_TARGET
653 #elif defined(__i386) || defined(_M_IX86)
654 #define STBI__X86_TARGET
655 #endif
656 
657 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// NOTE: it is not clear whether we actually need this for the 64-bit path.
659 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
660 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
661 // this is just broken and gcc are jerks for not fixing it properly
662 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
663 #define STBI_NO_SIMD
664 #endif
665 
666 #if defined(__MINGW32__) && !defined(__x86_64__) && !defined(STBI_NO_SIMD)
667 #define STBI_MINGW_ENABLE_SSE2
668 #define STBI_FORCE_STACK_ALIGN __attribute__((force_align_arg_pointer))
669 #else
670 #define STBI_FORCE_STACK_ALIGN
671 #endif
672 
673 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
674 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
675 //
676 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
677 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
678 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
679 // simultaneously enabling "-mstackrealign".
680 //
681 // See https://github.com/nothings/stb/issues/81 for more information.
682 //
683 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
684 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
685 #define STBI_NO_SIMD
686 #endif
687 
688 #if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET)
689 #define STBI_SSE2
690 #include <emmintrin.h>
691 
692 #ifdef _MSC_VER
693 
694 #if _MSC_VER >= 1400  // not VC6
695 #include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);   // CPUID leaf 1: feature flags
   return info[3];    // EDX
}
#else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
#endif

#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name

static int stbi__sse2_available(void)
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;   // EDX bit 26 is the SSE2 feature flag
}
#else // assume GCC-style if not VC++
#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))

static int stbi__sse2_available(void)
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // portable way to do this, preferably without using GCC inline ASM?
   // just bail for now.
   return 0;
#endif
}
736 #endif
737 #endif
738 
739 // ARM NEON
740 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
741 #undef STBI_NEON
742 #endif
743 
744 #ifdef STBI_NEON
745 #include <arm_neon.h>
746 // assume GCC or Clang on ARM targets
747 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
748 #endif
749 
750 #ifndef STBI_SIMD_ALIGN
751 #define STBI_SIMD_ALIGN(type, name) type name
752 #endif
753 
754 ///////////////////////////////////////////////
755 //
756 //  stbi__context struct and start_xxx functions
757 
758 // stbi__context structure is our basic context used by all images, so it
759 // contains all the IO context, plus some basic image information
760 typedef struct
761 {
762    stbi__uint32 img_x, img_y;
763    int img_n, img_out_n;
764 
765    stbi_io_callbacks io;
766    void *io_user_data;
767 
768    int read_from_callbacks;
769    int buflen;
770    stbi_uc buffer_start[128];
771 
772    stbi_uc *img_buffer, *img_buffer_end;
773    stbi_uc *img_buffer_original;
774 } stbi__context;
775 
776 
777 static void stbi__refill_buffer(stbi__context *s);
778 
779 // initialize a memory-decode context
stbi__start_mem(stbi__context * s,stbi_uc const * buffer,int len)780 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
781 {
782    s->io.read = NULL;
783    s->read_from_callbacks = 0;
784    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
785    s->img_buffer_end = (stbi_uc *) buffer+len;
786 }
787 
788 // initialize a callback-based context
789 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
790 {
791    s->io = *c;
792    s->io_user_data = user;
793    s->buflen = sizeof(s->buffer_start);
794    s->read_from_callbacks = 1;
795    s->img_buffer_original = s->buffer_start;
796    stbi__refill_buffer(s);
797 }
798 
799 #ifndef STBI_NO_STDIO
800 
801 static int stbi__stdio_read(void *user, char *data, int size)
802 {
803    return (int) fread(data,1,size,(FILE*) user);
804 }
805 
806 static void stbi__stdio_skip(void *user, int n)
807 {
808    fseek((FILE*) user, n, SEEK_CUR);
809 }
810 
811 static int stbi__stdio_eof(void *user)
812 {
813    return feof((FILE*) user);
814 }
815 
816 static stbi_io_callbacks stbi__stdio_callbacks =
817 {
818    stbi__stdio_read,
819    stbi__stdio_skip,
820    stbi__stdio_eof,
821 };
822 
823 static void stbi__start_file(stbi__context *s, FILE *f)
824 {
825    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
826 }
827 
828 //static void stop_file(stbi__context *s) { }
829 
830 #endif // !STBI_NO_STDIO
831 
832 static void stbi__rewind(stbi__context *s)
833 {
834    // conceptually rewind SHOULD rewind to the beginning of the stream,
835    // but we just rewind to the beginning of the initial buffer, because
836    // we only use it after doing 'test', which never reads more than 92 bytes
837    s->img_buffer = s->img_buffer_original;
838 }
839 
840 #ifndef STBI_NO_JPEG
841 static int      stbi__jpeg_test(stbi__context *s);
842 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
843 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
844 #endif
845 
846 #ifndef STBI_NO_PNG
847 static int      stbi__png_test(stbi__context *s);
848 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
849 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
850 #endif
851 
852 #ifndef STBI_NO_BMP
853 static int      stbi__bmp_test(stbi__context *s);
854 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
855 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
856 #endif
857 
858 #ifndef STBI_NO_TGA
859 static int      stbi__tga_test(stbi__context *s);
860 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
861 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
862 #endif
863 
864 #ifndef STBI_NO_PSD
865 static int      stbi__psd_test(stbi__context *s);
866 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
867 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
868 #endif
869 
870 #ifndef STBI_NO_HDR
871 static int      stbi__hdr_test(stbi__context *s);
872 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
873 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
874 #endif
875 
876 #ifndef STBI_NO_PIC
877 static int      stbi__pic_test(stbi__context *s);
878 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
879 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
880 #endif
881 
882 #ifndef STBI_NO_GIF
883 static int      stbi__gif_test(stbi__context *s);
884 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
885 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
886 #endif
887 
888 #ifndef STBI_NO_PNM
889 static int      stbi__pnm_test(stbi__context *s);
890 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
891 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
892 #endif
893 
900 #ifndef STBI_NO_DDS
901 #include "stbi_DDS.h"
902 static int      stbi__dds_test(stbi__context *s);
903 static stbi_uc *stbi__dds_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
904 #endif
905 
906 #ifndef STBI_NO_PVR
907 #include "stbi_pvr.h"
908 static int      stbi__pvr_test(stbi__context *s);
909 static stbi_uc *stbi__pvr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
910 #endif
911 
912 #ifndef STBI_NO_PKM
913 #include "stbi_pkm.h"
914 static int      stbi__pkm_test(stbi__context *s);
915 static stbi_uc *stbi__pkm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
916 #endif
917 
918 // this is not threadsafe
919 static const char *stbi__g_failure_reason;
920 
921 STBIDEF const char *stbi_failure_reason(void)
922 {
923    return stbi__g_failure_reason;
924 }
925 
926 static int stbi__err(const char *str)
927 {
928    stbi__g_failure_reason = str;
929    return 0;
930 }
931 
932 static void *stbi__malloc(size_t size)
933 {
934     return STBI_MALLOC(size);
935 }
936 
937 // stbi__err - error
938 // stbi__errpf - error returning pointer to float
939 // stbi__errpuc - error returning pointer to unsigned char
940 
941 #ifdef STBI_NO_FAILURE_STRINGS
942    #define stbi__err(x,y)  0
943 #elif defined(STBI_FAILURE_USERMSG)
944    #define stbi__err(x,y)  stbi__err(y)
945 #else
946    #define stbi__err(x,y)  stbi__err(x)
947 #endif
948 
949 #define stbi__errpf(x,y)   ((float *) (stbi__err(x,y)?NULL:NULL))
950 #define stbi__errpuc(x,y)  ((unsigned char *) (stbi__err(x,y)?NULL:NULL))
951 
952 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
953 {
954    STBI_FREE(retval_from_stbi_load);
955 }
956 
957 #ifndef STBI_NO_LINEAR
958 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
959 #endif
960 
961 #ifndef STBI_NO_HDR
962 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
963 #endif
964 
965 static int stbi__vertically_flip_on_load = 0;
966 
967 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
968 {
969     stbi__vertically_flip_on_load = flag_true_if_should_flip;
970 }
971 
972 static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
973 {
974    #ifndef STBI_NO_JPEG
975    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
976    #endif
977    #ifndef STBI_NO_PNG
978    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
979    #endif
980    #ifndef STBI_NO_BMP
981    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
982    #endif
983    #ifndef STBI_NO_GIF
984    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
985    #endif
986    #ifndef STBI_NO_PSD
987    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
988    #endif
989    #ifndef STBI_NO_PIC
990    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
991    #endif
992    #ifndef STBI_NO_PNM
993    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
994    #endif
995    #ifndef STBI_NO_DDS
996    if (stbi__dds_test(s))  return stbi__dds_load(s,x,y,comp,req_comp);
997    #endif
998    #ifndef STBI_NO_PVR
999    if (stbi__pvr_test(s))  return stbi__pvr_load(s,x,y,comp,req_comp);
1000    #endif
1001    #ifndef STBI_NO_PKM
1002    if (stbi__pkm_test(s))  return stbi__pkm_load(s,x,y,comp,req_comp);
1003    #endif
1004    #ifndef STBI_NO_HDR
1005    if (stbi__hdr_test(s)) {
1006       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
1007       return hdr ? stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp) : NULL; // hdr is NULL if the load failed
1008    }
1009    #endif
1010 
1011    #ifndef STBI_NO_TGA
1012    // test tga last because it's a crappy test!
1013    if (stbi__tga_test(s))
1014       return stbi__tga_load(s,x,y,comp,req_comp);
1015    #endif
1016 
1017    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1018 }
1019 
1020 static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1021 {
1022    unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
1023 
1024    if (stbi__vertically_flip_on_load && result != NULL) {
1025       int w = *x, h = *y;
1026       int depth = req_comp ? req_comp : *comp;
1027       int row,col,z;
1028       stbi_uc temp;
1029 
1030       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1031       for (row = 0; row < (h>>1); row++) {
1032          for (col = 0; col < w; col++) {
1033             for (z = 0; z < depth; z++) {
1034                temp = result[(row * w + col) * depth + z];
1035                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1036                result[((h - row - 1) * w + col) * depth + z] = temp;
1037             }
1038          }
1039       }
1040    }
1041 
1042    return result;
1043 }
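// Illustrative sketch, not part of the original loader: one possible way to
// act on the @OPTIMIZE note in stbi__load_flip is to swap whole rows through
// a small scratch buffer with memcpy instead of exchanging single bytes.
// The helper name stbi__example_flip_rows and the 256-byte chunk size are
// assumptions made for this example; nothing in the library calls it.
#include <string.h>

static void stbi__example_flip_rows(unsigned char *pixels, int w, int h, int depth)
{
   unsigned char temp[256];
   size_t stride = (size_t) w * (size_t) depth;  // bytes per row
   int row;

   for (row = 0; row < (h >> 1); ++row) {
      unsigned char *top = pixels + (size_t) row * stride;
      unsigned char *bot = pixels + (size_t) (h - row - 1) * stride;
      size_t left = stride;
      // exchange the two rows chunk by chunk; the middle row of an
      // odd-height image is never visited and stays in place
      while (left > 0) {
         size_t n = left < sizeof(temp) ? left : sizeof(temp);
         memcpy(temp, top, n);
         memcpy(top, bot, n);
         memcpy(bot, temp, n);
         top += n;
         bot += n;
         left -= n;
      }
   }
}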
1044 
1045 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1046 {
1047    if (stbi__vertically_flip_on_load && result != NULL) {
1048       int w = *x, h = *y;
1049       int depth = req_comp ? req_comp : *comp;
1050       int row,col,z;
1051       float temp;
1052 
1053       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1054       for (row = 0; row < (h>>1); row++) {
1055          for (col = 0; col < w; col++) {
1056             for (z = 0; z < depth; z++) {
1057                temp = result[(row * w + col) * depth + z];
1058                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1059                result[((h - row - 1) * w + col) * depth + z] = temp;
1060             }
1061          }
1062       }
1063    }
1064 }
1065 
1066 
1067 #ifndef STBI_NO_STDIO
1068 
1069 static FILE *stbi__fopen(char const *filename, char const *mode)
1070 {
1071    FILE *f;
1072 #if defined(_MSC_VER) && _MSC_VER >= 1400
1073    if (0 != fopen_s(&f, filename, mode))
1074       f=0;
1075 #else
1076    f = fopen(filename, mode);
1077 #endif
1078    return f;
1079 }
1080 
1081 
1082 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1083 {
1084    FILE *f = stbi__fopen(filename, "rb");
1085    unsigned char *result;
1086    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1087    result = stbi_load_from_file(f,x,y,comp,req_comp);
1088    fclose(f);
1089    return result;
1090 }
1091 
1092 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1093 {
1094    unsigned char *result;
1095    stbi__context s;
1096    stbi__start_file(&s,f);
1097    result = stbi__load_flip(&s,x,y,comp,req_comp);
1098    if (result) {
1099       // need to 'unget' all the characters in the IO buffer
1100       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1101    }
1102    return result;
1103 }
1104 #endif //!STBI_NO_STDIO
1105 
1106 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1107 {
1108    stbi__context s;
1109    stbi__start_mem(&s,buffer,len);
1110    return stbi__load_flip(&s,x,y,comp,req_comp);
1111 }
1112 
1113 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1114 {
1115    stbi__context s;
1116    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1117    return stbi__load_flip(&s,x,y,comp,req_comp);
1118 }
1119 
1120 #ifndef STBI_NO_LINEAR
1121 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1122 {
1123    unsigned char *data;
1124    #ifndef STBI_NO_HDR
1125    if (stbi__hdr_test(s)) {
1126       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1127       if (hdr_data)
1128          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1129       return hdr_data;
1130    }
1131    #endif
1132    data = stbi__load_flip(s, x, y, comp, req_comp);
1133    if (data)
1134       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1135    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1136 }
1137 
1138 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1139 {
1140    stbi__context s;
1141    stbi__start_mem(&s,buffer,len);
1142    return stbi__loadf_main(&s,x,y,comp,req_comp);
1143 }
1144 
1145 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1146 {
1147    stbi__context s;
1148    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1149    return stbi__loadf_main(&s,x,y,comp,req_comp);
1150 }
1151 
1152 #ifndef STBI_NO_STDIO
1153 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1154 {
1155    float *result;
1156    FILE *f = stbi__fopen(filename, "rb");
1157    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1158    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1159    fclose(f);
1160    return result;
1161 }
1162 
1163 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1164 {
1165    stbi__context s;
1166    stbi__start_file(&s,f);
1167    return stbi__loadf_main(&s,x,y,comp,req_comp);
1168 }
1169 #endif // !STBI_NO_STDIO
1170 
1171 #endif // !STBI_NO_LINEAR
1172 
1173 // the stbi_is_hdr* functions are defined whether or not STBI_NO_HDR is
1174 // defined, for API simplicity; if STBI_NO_HDR is defined, they always
1175 // report false!
1176 
1177 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1178 {
1179    #ifndef STBI_NO_HDR
1180    stbi__context s;
1181    stbi__start_mem(&s,buffer,len);
1182    return stbi__hdr_test(&s);
1183    #else
1184    STBI_NOTUSED(buffer);
1185    STBI_NOTUSED(len);
1186    return 0;
1187    #endif
1188 }
1189 
1190 #ifndef STBI_NO_STDIO
1191 STBIDEF int      stbi_is_hdr          (char const *filename)
1192 {
1193    FILE *f = stbi__fopen(filename, "rb");
1194    int result=0;
1195    if (f) {
1196       result = stbi_is_hdr_from_file(f);
1197       fclose(f);
1198    }
1199    return result;
1200 }
1201 
1202 STBIDEF int      stbi_is_hdr_from_file(FILE *f)
1203 {
1204    #ifndef STBI_NO_HDR
1205    stbi__context s;
1206    stbi__start_file(&s,f);
1207    return stbi__hdr_test(&s);
1208    #else
1209    return 0;
1210    #endif
1211 }
1212 #endif // !STBI_NO_STDIO
1213 
1214 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1215 {
1216    #ifndef STBI_NO_HDR
1217    stbi__context s;
1218    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1219    return stbi__hdr_test(&s);
1220    #else
1221    return 0;
1222    #endif
1223 }
1224 
1225 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1226 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1227 
1228 #ifndef STBI_NO_LINEAR
1229 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1230 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1231 #endif
1232 
1233 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1234 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1235 
1236 
1237 //////////////////////////////////////////////////////////////////////////////
1238 //
1239 // Common code used by all image loaders
1240 //
1241 
1242 enum
1243 {
1244    STBI__SCAN_load=0,
1245    STBI__SCAN_type,
1246    STBI__SCAN_header
1247 };
1248 
1249 static void stbi__refill_buffer(stbi__context *s)
1250 {
1251    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1252    if (n == 0) {
1253       // at end of file, treat same as if from memory, but need to handle case
1254       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1255       s->read_from_callbacks = 0;
1256       s->img_buffer = s->buffer_start;
1257       s->img_buffer_end = s->buffer_start+1;
1258       *s->img_buffer = 0;
1259    } else {
1260       s->img_buffer = s->buffer_start;
1261       s->img_buffer_end = s->buffer_start + n;
1262    }
1263 }
1264 
1265 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1266 {
1267    if (s->img_buffer < s->img_buffer_end)
1268       return *s->img_buffer++;
1269    if (s->read_from_callbacks) {
1270       stbi__refill_buffer(s);
1271       return *s->img_buffer++;
1272    }
1273    return 0;
1274 }
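// Illustrative sketch, not part of the original loader: a stripped-down
// version of the refill-on-demand pattern used by stbi__get8 and
// stbi__refill_buffer. All stbi__example_* names are hypothetical, the
// 4-byte buffer is deliberately tiny so refills are easy to observe, and
// the end-of-data handling is simplified (it just returns 0 rather than
// switching the context over to a one-byte zero buffer).
#include <string.h>

typedef struct
{
   const unsigned char *data;  // backing bytes the read callback serves
   int size, pos;
} stbi__example_source;

// same shape as the 'read' member of stbi_io_callbacks: copy up to 'size'
// bytes into 'out' and return how many were actually copied
static int stbi__example_read(void *user, char *out, int size)
{
   stbi__example_source *src = (stbi__example_source *) user;
   int n = src->size - src->pos;
   if (n > size) n = size;
   memcpy(out, src->data + src->pos, (size_t) n);
   src->pos += n;
   return n;
}

typedef struct
{
   int (*read)(void *user, char *out, int size);
   void *user;
   unsigned char buf[4];
   unsigned char *cur, *end;
   int refills;                // number of times the callback was invoked
} stbi__example_reader;

static void stbi__example_reader_init(stbi__example_reader *r, int (*read)(void *, char *, int), void *user)
{
   r->read = read;
   r->user = user;
   r->cur = r->end = r->buf;   // start empty so the first get triggers a refill
   r->refills = 0;
}

static int stbi__example_get8(stbi__example_reader *r)
{
   if (r->cur == r->end) {
      int n = r->read(r->user, (char *) r->buf, (int) sizeof(r->buf));
      ++r->refills;
      if (n <= 0) return 0;    // past the end of the data: report 0
      r->cur = r->buf;
      r->end = r->buf + n;
   }
   return *r->cur++;
}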
1275 
1276 stbi_inline static int stbi__at_eof(stbi__context *s)
1277 {
1278    if (s->io.read) {
1279       if (!(s->io.eof)(s->io_user_data)) return 0;
1280       // if feof() is true, check if buffer = end
1281       // special case: we've only got the special 0 character at the end
1282       if (s->read_from_callbacks == 0) return 1;
1283    }
1284 
1285    return s->img_buffer >= s->img_buffer_end;
1286 }
1287 
1288 static void stbi__skip(stbi__context *s, int n)
1289 {
1290    if (n < 0) {
1291       s->img_buffer = s->img_buffer_end;
1292       return;
1293    }
1294    if (s->io.read) {
1295       int blen = (int) (s->img_buffer_end - s->img_buffer);
1296       if (blen < n) {
1297          s->img_buffer = s->img_buffer_end;
1298          (s->io.skip)(s->io_user_data, n - blen);
1299          return;
1300       }
1301    }
1302    s->img_buffer += n;
1303 }
1304 
1305 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1306 {
1307    if (s->io.read) {
1308       int blen = (int) (s->img_buffer_end - s->img_buffer);
1309       if (blen < n) {
1310          int res, count;
1311 
1312          memcpy(buffer, s->img_buffer, blen);
1313 
1314          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1315          res = (count == (n-blen));
1316          s->img_buffer = s->img_buffer_end;
1317          return res;
1318       }
1319    }
1320 
1321    if (s->img_buffer+n <= s->img_buffer_end) {
1322       memcpy(buffer, s->img_buffer, n);
1323       s->img_buffer += n;
1324       return 1;
1325    } else
1326       return 0;
1327 }
1328 
1329 static int stbi__get16be(stbi__context *s)
1330 {
1331    int z = stbi__get8(s);
1332    return (z << 8) + stbi__get8(s);
1333 }
1334 
1335 static stbi__uint32 stbi__get32be(stbi__context *s)
1336 {
1337    stbi__uint32 z = stbi__get16be(s);
1338    return (z << 16) + stbi__get16be(s);
1339 }
1340 
1341 static int stbi__get16le(stbi__context *s)
1342 {
1343    int z = stbi__get8(s);
1344    return z + (stbi__get8(s) << 8);
1345 }
1346 
1347 static stbi__uint32 stbi__get32le(stbi__context *s)
1348 {
1349    stbi__uint32 z = stbi__get16le(s);
1350    return z + ((stbi__uint32) stbi__get16le(s) << 16); // cast first so the shift cannot overflow a signed int
1351 }
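// Illustrative sketch, not part of the original loader: the stbi__get16*
// and stbi__get32* helpers above assemble integers one byte at a time, so
// the result depends only on the byte order of the file and not on the
// host CPU. The two hypothetical helpers below show the same idea on a
// plain byte array; unsigned long is used because it is at least 32 bits.
static unsigned long stbi__example_read32be(const unsigned char *p)
{
   // first byte is the most significant
   return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16) |
          ((unsigned long) p[2] <<  8) |  (unsigned long) p[3];
}

static unsigned long stbi__example_read32le(const unsigned char *p)
{
   // first byte is the least significant
   return  (unsigned long) p[0]        | ((unsigned long) p[1] <<  8) |
          ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}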
1352 
1353 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1354 
1355 
1356 //////////////////////////////////////////////////////////////////////////////
1357 //
1358 //  generic converter from built-in img_n to req_comp
1359 //    individual types do this automatically as much as possible (e.g. jpeg
1360 //    does all cases internally since it needs to colorspace convert anyway,
1361 //    and it never has alpha, so very few cases). png can automatically
1362 //    interleave an alpha=255 channel, but falls back to this for other cases
1363 //
1364 //  assumes the data buffer was malloced: mallocs a new buffer and frees the old one
1365 //  only failure mode is malloc failing
1366 
1367 static stbi_uc stbi__compute_y(int r, int g, int b)
1368 {
1369    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1370 }
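// Illustrative sketch, not part of the original loader: the weights 77, 150
// and 29 used by stbi__compute_y sum to 256, so the ">> 8" divides by the
// total weight and a gray input (r == g == b) maps to itself. They appear to
// be 8-bit fixed-point approximations of the common 0.299/0.587/0.114 luma
// weights; that reading is an observation about the constants, not a
// statement from the original author. stbi__example_luma is a hypothetical
// copy of the formula that returns int so the values are easy to inspect.
static int stbi__example_luma(int r, int g, int b)
{
   return ((r * 77) + (g * 150) + (b * 29)) >> 8;
}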
1371 
1372 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1373 {
1374    int i,j;
1375    unsigned char *good;
1376 
1377    if (req_comp == img_n) return data;
1378    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1379 
1380    good = (unsigned char *) stbi__malloc(req_comp * x * y);
1381    if (good == NULL) {
1382       STBI_FREE(data);
1383       return stbi__errpuc("outofmem", "Out of memory");
1384    }
1385 
1386    for (j=0; j < (int) y; ++j) {
1387       unsigned char *src  = data + j * x * img_n   ;
1388       unsigned char *dest = good + j * x * req_comp;
1389 
1390       #define COMBO(a,b)  ((a)*8+(b))
1391       #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1392       // convert source image with img_n components to one with req_comp components;
1393       // avoid switch per pixel, so use switch per scanline and massive macros
1394       switch (COMBO(img_n, req_comp)) {
1395          CASE(1,2) dest[0]=src[0], dest[1]=255;
1396          break;
1397          CASE(1,3) dest[0]=dest[1]=dest[2]=src[0];
1398          break;
1399          CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255;
1400          break;
1401          CASE(2,1) dest[0]=src[0];
1402          break;
1403          CASE(2,3) dest[0]=dest[1]=dest[2]=src[0];
1404          break;
1405          CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1];
1406          break;
1407          CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255;
1408          break;
1409          CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]);
1410          break;
1411          CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255;
1412          break;
1413          CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]);
1414          break;
1415          CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3];
1416          break;
1417          CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2];
1418          break;
1419          default: STBI_ASSERT(0);
1420       }
1421       #undef CASE
1422    }
1423 
1424    STBI_FREE(data);
1425    return good;
1426 }
1427 
1428 #ifndef STBI_NO_LINEAR
1429 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1430 {
1431    int i,k,n;
1432    float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1433    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1434    // compute number of non-alpha components
1435    if (comp & 1) n = comp; else n = comp-1;
1436    for (i=0; i < x*y; ++i) {
1437       for (k=0; k < n; ++k) {
1438          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1439       }
1440       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1441    }
1442    STBI_FREE(data);
1443    return output;
1444 }
1445 #endif
1446 
1447 #ifndef STBI_NO_HDR
1448 #define stbi__float2int(x)   ((int) (x))
1449 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1450 {
1451    int i,k,n;
1452    stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1453    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1454    // compute number of non-alpha components
1455    if (comp & 1) n = comp; else n = comp-1;
1456    for (i=0; i < x*y; ++i) {
1457       for (k=0; k < n; ++k) {
1458          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1459          if (z < 0) z = 0;
1460          if (z > 255) z = 255;
1461          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1462       }
1463       if (k < comp) {
1464          float z = data[i*comp+k] * 255 + 0.5f;
1465          if (z < 0) z = 0;
1466          if (z > 255) z = 255;
1467          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1468       }
1469    }
1470    STBI_FREE(data);
1471    return output;
1472 }
1473 #endif
1474 
1475 //////////////////////////////////////////////////////////////////////////////
1476 //
1477 //  "baseline" JPEG/JFIF decoder
1478 //
1479 //    simple implementation
1480 //      - doesn't support delayed output of y-dimension
1481 //      - simple interface (only one output format: 8-bit interleaved RGB)
1482 //      - doesn't try to recover corrupt jpegs
1483 //      - doesn't allow partial loading, loading multiple at once
1484 //      - still fast on x86 (copying globals into locals doesn't help x86)
1485 //      - allocates lots of intermediate memory (full size of all components)
1486 //        - non-interleaved case requires this anyway
1487 //        - allows good upsampling (see next)
1488 //    high-quality
1489 //      - upsampled channels are bilinearly interpolated, even across blocks
1490 //      - quality integer IDCT derived from IJG's 'slow'
1491 //    performance
1492 //      - fast huffman; reasonable integer IDCT
1493 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1494 //      - uses a lot of intermediate memory, could cache poorly
1495 
1496 #ifndef STBI_NO_JPEG
1497 
1498 // huffman decoding acceleration
1499 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1500 
1501 typedef struct
1502 {
1503    stbi_uc  fast[1 << FAST_BITS];
1504    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1505    stbi__uint16 code[256];
1506    stbi_uc  values[256];
1507    stbi_uc  size[257];
1508    unsigned int maxcode[18];
1509    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1510 } stbi__huffman;
1511 
1512 typedef struct
1513 {
1514    stbi__context *s;
1515    stbi__huffman huff_dc[4];
1516    stbi__huffman huff_ac[4];
1517    stbi_uc dequant[4][64];
1518    stbi__int16 fast_ac[4][1 << FAST_BITS];
1519 
1520 // sizes for components, interleaved MCUs
1521    int img_h_max, img_v_max;
1522    int img_mcu_x, img_mcu_y;
1523    int img_mcu_w, img_mcu_h;
1524 
1525 // definition of jpeg image component
1526    struct
1527    {
1528       int id;
1529       int h,v;
1530       int tq;
1531       int hd,ha;
1532       int dc_pred;
1533 
1534       int x,y,w2,h2;
1535       stbi_uc *data;
1536       void *raw_data, *raw_coeff;
1537       stbi_uc *linebuf;
1538       short   *coeff;   // progressive only
1539       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1540    } img_comp[4];
1541 
1542    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1543    int            code_bits;   // number of valid bits
1544    unsigned char  marker;      // marker seen while filling entropy buffer
1545    int            nomore;      // flag if we saw a marker so must stop
1546 
1547    int            progressive;
1548    int            spec_start;
1549    int            spec_end;
1550    int            succ_high;
1551    int            succ_low;
1552    int            eob_run;
1553 
1554    int scan_n, order[4];
1555    int restart_interval, todo;
1556 
1557 // kernels
1558    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1559    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1560    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1561 } stbi__jpeg;
1562 
1563 static int stbi__build_huffman(stbi__huffman *h, int *count)
1564 {
1565    int i,j,k=0,code;
1566    // build size list for each symbol (from JPEG spec)
1567    for (i=0; i < 16; ++i)
1568       for (j=0; j < count[i]; ++j)
1569          h->size[k++] = (stbi_uc) (i+1);
1570    h->size[k] = 0;
1571 
1572    // compute actual symbols (from jpeg spec)
1573    code = 0;
1574    k = 0;
1575    for(j=1; j <= 16; ++j) {
1576       // compute delta to add to code to compute symbol id
1577       h->delta[j] = k - code;
1578       if (h->size[k] == j) {
1579          while (h->size[k] == j)
1580             h->code[k++] = (stbi__uint16) (code++);
1581          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1582       }
1583       // compute largest code + 1 for this size, preshifted as needed later
1584       h->maxcode[j] = code << (16-j);
1585       code <<= 1;
1586    }
1587    h->maxcode[j] = 0xffffffff;
1588 
1589    // build non-spec acceleration table; 255 is flag for not-accelerated
1590    memset(h->fast, 255, 1 << FAST_BITS);
1591    for (i=0; i < k; ++i) {
1592       int s = h->size[i];
1593       if (s <= FAST_BITS) {
1594          int c = h->code[i] << (FAST_BITS-s);
1595          int m = 1 << (FAST_BITS-s);
1596          for (j=0; j < m; ++j) {
1597             h->fast[c+j] = (stbi_uc) i;
1598          }
1599       }
1600    }
1601    return 1;
1602 }
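// Illustrative sketch, not part of the original decoder: the code-assignment
// loop in stbi__build_huffman follows the canonical Huffman scheme, where
// codes of each bit length are consecutive integers and the running code is
// doubled when moving to the next length. The hypothetical helper below
// shows only that step (no maxcode, delta or fast tables); count[i] is
// assumed to hold the number of symbols whose code is i+1 bits long.
// It returns the number of symbols, or -1 if the lengths cannot form a
// valid prefix code or describe more than 256 symbols.
static int stbi__example_canonical_codes(const int count[16], unsigned short code_out[256], unsigned char size_out[256])
{
   int i, j, k = 0, code = 0;
   for (i = 0; i < 16; ++i) {
      for (j = 0; j < count[i]; ++j) {
         // a code of length i+1 must fit in i+1 bits
         if (k >= 256 || code >= (1 << (i + 1))) return -1;
         size_out[k] = (unsigned char) (i + 1);
         code_out[k] = (unsigned short) code;
         ++k;
         ++code;
      }
      code <<= 1;  // next length: append a zero bit to the running code
   }
   return k;
}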
1603 
1604 // build a table that decodes both magnitude and value of small ACs in
1605 // one go.
1606 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1607 {
1608    int i;
1609    for (i=0; i < (1 << FAST_BITS); ++i) {
1610       stbi_uc fast = h->fast[i];
1611       fast_ac[i] = 0;
1612       if (fast < 255) {
1613          int rs = h->values[fast];
1614          int run = (rs >> 4) & 15;
1615          int magbits = rs & 15;
1616          int len = h->size[fast];
1617 
1618          if (magbits && len + magbits <= FAST_BITS) {
1619             // magnitude code followed by receive_extend code
1620             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1621             int m = 1 << (magbits - 1);
1622             if (k < m) k += (-1 << magbits) + 1;
1623             // if the result is small enough, we can fit it in fast_ac table
1624             if (k >= -128 && k <= 127)
1625                fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1626          }
1627       }
1628    }
1629 }

static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
{
   do {
      int b = j->nomore ? 0 : stbi__get8(j->s);
      if (b == 0xff) {
         int c = stbi__get8(j->s);
         if (c != 0) {
            j->marker = (unsigned char) c;
            j->nomore = 1;
            return;
         }
      }
      j->code_buffer |= b << (24 - j->code_bits);
      j->code_bits += 8;
   } while (j->code_bits <= 24);
}
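
/* Illustrative sketch, not part of stb_image: the refill above drops the
   0x00 byte that JPEG stuffs after every 0xff data byte, and stops at a
   real marker (0xff followed by a nonzero byte). The standalone version
   below reads from a memory buffer instead of stbi__get8. The type
   sketch_reader and the functions sketch_next / sketch_fill are
   hypothetical names used only here.

```c
#include <assert.h>

typedef struct {
   const unsigned char *p, *end; // input cursor and end of input
   unsigned int bits;            // bit buffer, next bit in the MSB
   int nbits;                    // number of valid bits in 'bits'
   int marker;                   // 0 if none seen yet, else the marker byte
} sketch_reader;

// next input byte, or 0 once the input is exhausted
static int sketch_next(sketch_reader *r)
{
   return r->p < r->end ? *r->p++ : 0;
}

// top up the bit buffer until it holds more than 24 bits
static void sketch_fill(sketch_reader *r)
{
   while (r->nbits <= 24) {
      int b = r->marker ? 0 : sketch_next(r); // feed zeros after a marker
      if (b == 0xff) {
         int c = sketch_next(r);
         if (c != 0) {     // 0xff, nonzero: a marker, not data
            r->marker = c;
            return;
         }
         // 0xff, 0x00: the 0x00 is stuffing, 0xff is the data byte
      }
      r->bits |= (unsigned int) b << (24 - r->nbits);
      r->nbits += 8;
   }
}
```

   For the input 12 ff 00 34 ff d9 the buffer ends up holding 0x12ff34
   in its top 24 bits and the marker field is set to 0xd9.
*/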

// (1 << n) - 1
static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};

// decode a jpeg huffman value from the bitstream
stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
{
   unsigned int temp;
   int c,k;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code is <= FAST_BITS
   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   k = h->fast[c];
   if (k < 255) {
      int s = h->size[k];
      if (s > j->code_bits)
         return -1;
      j->code_buffer <<= s;
      j->code_bits -= s;
      return h->values[k];
   }

   // naive test is to shift the code_buffer down so k bits are
   // valid, then test against maxcode. To speed this up, we've
   // preshifted maxcode left so that it has (16-k) 0s at the
   // end; in other words, regardless of the number of bits, it
   // wants to be compared against something shifted to have 16;
   // that way we don't need to shift inside the loop.
   temp = j->code_buffer >> 16;
   for (k=FAST_BITS+1 ; ; ++k)
      if (temp < h->maxcode[k])
         break;
   if (k == 17) {
      // error! code not found
      j->code_bits -= 16;
      return -1;
   }

   if (k > j->code_bits)
      return -1;

   // convert the huffman code to the symbol id
   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);

   // convert the id to a symbol
   j->code_bits -= k;
   j->code_buffer <<= k;
   return h->values[c];
}

// bias[n] = (-1<<n) + 1
static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   k = stbi_lrot(j->code_buffer, n);
   STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}
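
/* Illustrative sketch, not part of stb_image: the function above applies
   the bias table without a branch. The plain, branching form of the JPEG
   'extend' step may be easier to follow. The name sketch_extend is
   hypothetical, and the sketch assumes 1 <= n <= 15.

```c
#include <assert.h>

// v is the n-bit value just read from the stream.
// If its top bit is set, the coefficient is positive and equals v.
// Otherwise it is negative and equals v - (2^n - 1), which is the same
// as adding the bias (-1<<n) + 1 stored in the table above.
static int sketch_extend(int v, int n)
{
   int half = 1 << (n - 1);
   return v >= half ? v : v - ((1 << n) - 1);
}
```

   For n = 2 the four bit patterns 00, 01, 10, 11 map to -3, -2, 2, 3.
*/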

// get some unsigned bits
stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
{
   unsigned int k;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k;
}

stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
{
   unsigned int k;
   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   k = j->code_buffer;
   j->code_buffer <<= 1;
   --j->code_bits;
   return k & 0x80000000;
}
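
/* Illustrative sketch, not part of stb_image: stbi__jpeg_get_bits pulls
   the top n bits out of an MSB-first buffer with one rotate and two
   masks. The standalone version below shows the same trick. The names
   sketch_rotl and sketch_take_bits are hypothetical, and the sketch
   assumes 1 <= n <= 16.

```c
#include <assert.h>
#include <stdint.h>

// rotate a 32-bit value left by n bits, 1 <= n <= 31
static uint32_t sketch_rotl(uint32_t x, int n)
{
   return (x << n) | (x >> (32 - n));
}

// remove and return the top n bits of *buf
static uint32_t sketch_take_bits(uint32_t *buf, int n)
{
   uint32_t mask = ((uint32_t) 1 << n) - 1;
   uint32_t k = sketch_rotl(*buf, n); // wanted bits are now the low n bits
   *buf = k & ~mask;                  // the rest is the shifted buffer
   return k & mask;
}
```

   Taking 4 bits twice from 0xA5000000 yields 0xA and then 0x5.
*/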

// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // let corrupt input sample past end
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};
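
/* Illustrative sketch, not part of stb_image: the first 64 entries of
   the table above can be generated by walking the 15 anti-diagonals of
   the 8x8 block and alternating direction. The name
   sketch_build_dezigzag is hypothetical; the 15 padding entries are not
   generated.

```c
#include <assert.h>

// table[n] = row-major index (row*8 + col) of the n-th zigzag position
static void sketch_build_dezigzag(unsigned char table[64])
{
   int s, n = 0;
   for (s = 0; s < 15; ++s) {        // s = row + col, one anti-diagonal
      int lo = s > 7 ? s - 7 : 0;    // smallest valid row on this diagonal
      int hi = s < 7 ? s : 7;        // largest valid row on this diagonal
      int row;
      if (s & 1) {
         // odd diagonals run from top-right to bottom-left
         for (row = lo; row <= hi; ++row)
            table[n++] = (unsigned char) (row*8 + (s - row));
      } else {
         // even diagonals run from bottom-left to top-right
         for (row = hi; row >= lo; --row)
            table[n++] = (unsigned char) (row*8 + (s - row));
      }
   }
}
```
*/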

// decode one 64-entry block
static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
{
   int diff,dc,k;
   int t;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   t = stbi__jpeg_huff_decode(j, hdc);
   if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");

   // 0 all the ac values now so we can do it 32-bits at a time
   memset(data,0,64*sizeof(data[0]));

   diff = t ? stbi__extend_receive(j, t) : 0;
   dc = j->img_comp[b].dc_pred + diff;
   j->img_comp[b].dc_pred = dc;
   data[0] = (short) (dc * dequant[0]);

   // decode AC components, see JPEG spec
   k = 1;
   do {
      unsigned int zig;
      int c,r,s;
      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
      r = fac[c];
      if (r) { // fast-AC path
         k += (r >> 4) & 15; // run
         s = r & 15; // combined length
         j->code_buffer <<= s;
         j->code_bits -= s;
         // decode into unzigzag'd location
         zig = stbi__jpeg_dezigzag[k++];
         data[zig] = (short) ((r >> 8) * dequant[zig]);
      } else {
         int rs = stbi__jpeg_huff_decode(j, hac);
         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
         s = rs & 15;
         r = rs >> 4;
         if (s == 0) {
            if (rs != 0xf0) break; // end block
            k += 16;
         } else {
            k += r;
            // decode into unzigzag'd location
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
         }
      }
   } while (k < 64);
   return 1;
}

static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
{
   int diff,dc;
   int t;
   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   if (j->succ_high == 0) {
      // first scan for DC coefficient, must be first
      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
      t = stbi__jpeg_huff_decode(j, hdc);
      // a failed decode returns -1, which must not reach stbi__extend_receive
      if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
      diff = t ? stbi__extend_receive(j, t) : 0;

      dc = j->img_comp[b].dc_pred + diff;
      j->img_comp[b].dc_pred = dc;
      data[0] = (short) (dc << j->succ_low);
   } else {
      // refinement scan for DC coefficient
      if (stbi__jpeg_get_bit(j))
         data[0] += (short) (1 << j->succ_low);
   }
   return 1;
}

// @OPTIMIZE: store non-zigzagged during the decode passes,
// and only de-zigzag when dequantizing
static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
{
   int k;
   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->succ_high == 0) {
      int shift = j->succ_low;

      if (j->eob_run) {
         --j->eob_run;
         return 1;
      }

      k = j->spec_start;
      do {
         unsigned int zig;
         int c,r,s;
         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
         r = fac[c];
         if (r) { // fast-AC path
            k += (r >> 4) & 15; // run
            s = r & 15; // combined length
            j->code_buffer <<= s;
            j->code_bits -= s;
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) ((r >> 8) << shift);
         } else {
            int rs = stbi__jpeg_huff_decode(j, hac);
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r);
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  --j->eob_run;
                  break;
               }
               k += 16;
            } else {
               k += r;
               zig = stbi__jpeg_dezigzag[k++];
               data[zig] = (short) (stbi__extend_receive(j,s) << shift);
            }
         }
      } while (k <= j->spec_end);
   } else {
      // refinement scan for these AC coefficients

      short bit = (short) (1 << j->succ_low);

      if (j->eob_run) {
         --j->eob_run;
         for (k = j->spec_start; k <= j->spec_end; ++k) {
            short *p = &data[stbi__jpeg_dezigzag[k]];
            if (*p != 0)
               if (stbi__jpeg_get_bit(j))
                  if ((*p & bit)==0) {
                     if (*p > 0)
                        *p += bit;
                     else
                        *p -= bit;
                  }
         }
      } else {
         k = j->spec_start;
         do {
            int r,s;
            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r) - 1;
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  r = 64; // force end of block
               } else {
                  // r=15 s=0 should write 16 0s, so we just do
                  // a run of 15 0s and then write s (which is 0),
                  // so we don't have to do anything special here
               }
            } else {
               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
               // sign bit
               if (stbi__jpeg_get_bit(j))
                  s = bit;
               else
                  s = -bit;
            }

            // advance by r
            while (k <= j->spec_end) {
               short *p = &data[stbi__jpeg_dezigzag[k++]];
               if (*p != 0) {
                  if (stbi__jpeg_get_bit(j))
                     if ((*p & bit)==0) {
                        if (*p > 0)
                           *p += bit;
                        else
                           *p -= bit;
                     }
               } else {
                  if (r == 0) {
                     *p = (short) s;
                     break;
                  }
                  --r;
               }
            }
         } while (k <= j->spec_end);
      }
   }
   return 1;
}

// clamp an int to the range 0..255 and return it as a byte
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
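
/* Illustrative sketch, not part of stb_image: the single test in
   stbi__clamp works because converting a negative int to unsigned gives
   a very large value, so one comparison catches both out-of-range cases.
   The name sketch_clamp is hypothetical.

```c
#include <assert.h>

static unsigned char sketch_clamp(int x)
{
   // in-range values (0..255) fail this test and skip both branches
   if ((unsigned int) x > 255)
      return x < 0 ? 0 : 255;
   return (unsigned char) x;
}
```

   The common in-range case costs one compare; the second compare only
   runs for values that actually need clamping.
*/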

#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
#define stbi__fsh(x)  ((x) << 12)

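/* Illustrative sketch, not part of stb_image: the two macros above put
   values into fixed point with 12 fractional bits. The standalone
   version below shows a multiply by such a constant followed by a
   rounding shift back to an integer. SKETCH_F2F, SKETCH_FSH and
   sketch_scale are hypothetical names; SKETCH_FSH multiplies instead of
   shifting so that negative inputs stay well defined, and sketch_scale
   assumes a non-negative product.

```c
#include <assert.h>

#define SKETCH_F2F(x)  ((int) ((x) * 4096 + 0.5))  // real constant -> fixed point
#define SKETCH_FSH(x)  ((x) * 4096)                // integer -> fixed point

// multiply an integer sample by a fixed-point coefficient, then round
// to nearest by adding half of 1<<12 before shifting the scale away
static int sketch_scale(int sample, int fixed_coeff)
{
   return (sample * fixed_coeff + 2048) >> 12;
}
```

   For example, 0.5 becomes 2048, and scaling 3 by that constant gives 2
   because 1.5 rounds up.
*/
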
// derived from jidctint -- DCT_ISLOW
#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   p2 = s2;                                    \
   p3 = s6;                                    \
   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   p2 = s0;                                    \
   p3 = s4;                                    \
   t0 = stbi__fsh(p2+p3);                      \
   t1 = stbi__fsh(p2-p3);                      \
   x0 = t0+t3;                                 \
   x3 = t0-t3;                                 \
   x1 = t1+t2;                                 \
   x2 = t1-t2;                                 \
   t0 = s7;                                    \
   t1 = s5;                                    \
   t2 = s3;                                    \
   t3 = s1;                                    \
   p3 = t0+t2;                                 \
   p4 = t1+t3;                                 \
   p1 = t0+t3;                                 \
   p2 = t1+t2;                                 \
   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   t0 = t0*stbi__f2f( 0.298631336f);           \
   t1 = t1*stbi__f2f( 2.053119869f);           \
   t2 = t2*stbi__f2f( 3.072711026f);           \
   t3 = t3*stbi__f2f( 1.501321110f);           \
   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   p3 = p3*stbi__f2f(-1.961570560f);           \
   p4 = p4*stbi__f2f(-0.390180644f);           \
   t3 += p1+p4;                                \
   t2 += p2+p3;                                \
   t1 += p2+p4;                                \
   t0 += p1+p3;

static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0] << 2;
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from first
      // loop, plus horizontal and vertical each scale by sqrt(8) so together
      // we've got an extra 1<<3, so 1<<17 total we need to remove.
      // so we want to round that, which means adding 0.5 * 1<<17,
      // aka 65536. Also, we'll end up with -128 to 127 that we want
      // to encode as 0..255 by adding 128, so we'll add that before the shift
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   }

   // load
   row0 = vld1q_s16(data + 0*8);
   row1 = vld1q_s16(data + 1*8);
   row2 = vld1q_s16(data + 2*8);
   row3 = vld1q_s16(data + 3*8);
   row4 = vld1q_s16(data + 4*8);
   row5 = vld1q_s16(data + 5*8);
   row6 = vld1q_s16(data + 6*8);
   row7 = vld1q_s16(data + 7*8);

   // add DC bias
   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));

   // column pass
   dct_pass(vrshrn_n_s32, 10);

   // 16bit 8x8 transpose
   {
// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
// whether compilers actually get this is another story, sadly.
#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }

      // pass 1
      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
      dct_trn16(row2, row3);
      dct_trn16(row4, row5);
      dct_trn16(row6, row7);

      // pass 2
      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
      dct_trn32(row1, row3);
      dct_trn32(row4, row6);
      dct_trn32(row5, row7);

      // pass 3
      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
      dct_trn64(row1, row5);
      dct_trn64(row2, row6);
      dct_trn64(row3, row7);

#undef dct_trn16
#undef dct_trn32
#undef dct_trn64
   }

   // row pass
   // vrshrn_n_s32 only supports shifts up to 16, we need
   // 17. so do a non-rounding shift of 16 first then follow
   // up with a rounding shift by 1.
   dct_pass(vshrn_n_s32, 16);

   {
      // pack and round
      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);

      // again, these can translate into one instruction, but often don't.
#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }

      // sadly can't use interleaved stores here since we only write
      // 8 bytes to each scan line!

      // 8x8 8-bit transpose pass 1
      dct_trn8_8(p0, p1);
      dct_trn8_8(p2, p3);
      dct_trn8_8(p4, p5);
      dct_trn8_8(p6, p7);

      // pass 2
      dct_trn8_16(p0, p2);
      dct_trn8_16(p1, p3);
      dct_trn8_16(p4, p6);
      dct_trn8_16(p5, p7);

      // pass 3
      dct_trn8_32(p0, p4);
      dct_trn8_32(p1, p5);
      dct_trn8_32(p2, p6);
      dct_trn8_32(p3, p7);

      // store
      vst1_u8(out, p0); out += out_stride;
      vst1_u8(out, p1); out += out_stride;
2436       vst1_u8(out, p2); out += out_stride;
2437       vst1_u8(out, p3); out += out_stride;
2438       vst1_u8(out, p4); out += out_stride;
2439       vst1_u8(out, p5); out += out_stride;
2440       vst1_u8(out, p6); out += out_stride;
2441       vst1_u8(out, p7);
2442 
2443 #undef dct_trn8_8
2444 #undef dct_trn8_16
2445 #undef dct_trn8_32
2446    }
2447 
2448 #undef dct_long_mul
2449 #undef dct_long_mac
2450 #undef dct_widen
2451 #undef dct_wadd
2452 #undef dct_wsub
2453 #undef dct_bfly32o
2454 #undef dct_pass
2455 }
2456 
2457 #endif // STBI_NEON
2458 
2459 #define STBI__MARKER_none  0xff
2460 // if there's a pending marker from the entropy stream, return that
2461 // otherwise, fetch from the stream and get a marker. if there's no
2462 // marker, return 0xff, which is never a valid marker value
2463 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2464 {
2465    stbi_uc x;
2466    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2467    x = stbi__get8(j->s);
2468    if (x != 0xff) return STBI__MARKER_none;
2469    while (x == 0xff)
2470       x = stbi__get8(j->s);
2471    return x;
2472 }
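/* illustrative sketch, not part of stb_image: the marker scan above, redone
   over a plain byte buffer. sketch_next_marker and SKETCH_MARKER_NONE are
   hypothetical names, and the explicit end-of-buffer checks are an assumption
   of this sketch (the real code reads through stbi__get8 instead).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MARKER_NONE 0xff

// reads one marker from buf starting at *pos and advances *pos.
// returns the marker byte, or SKETCH_MARKER_NONE if the next byte is
// not 0xff or the buffer ends before a marker byte is found.
static unsigned char sketch_next_marker(const unsigned char *buf, size_t len, size_t *pos)
{
   unsigned char x;
   assert(buf != NULL && pos != NULL);
   if (*pos >= len) return SKETCH_MARKER_NONE;
   x = buf[(*pos)++];
   if (x != 0xff) return SKETCH_MARKER_NONE;
   // skip any run of 0xff fill bytes; the first non-0xff byte is the marker
   while (x == 0xff) {
      if (*pos >= len) return SKETCH_MARKER_NONE;
      x = buf[(*pos)++];
   }
   return x;
}
```

   e.g. on the bytes ff ff d8 00 the first call skips the fill byte and
   returns 0xd8; the next call sees 00 and returns SKETCH_MARKER_NONE.
*/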
2473 
2474 // in each scan, we'll have scan_n components, and the order
2475 // of the components is specified by order[]
2476 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2477 
2478 // after a restart interval, reset the entropy decoder and
2479 // the dc prediction
2480 static void stbi__jpeg_reset(stbi__jpeg *j)
2481 {
2482    j->code_bits = 0;
2483    j->code_buffer = 0;
2484    j->nomore = 0;
2485    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2486    j->marker = STBI__MARKER_none;
2487    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2488    j->eob_run = 0;
2489    // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
2490    // since we don't even allow 1<<30 pixels
2491 }
2492 
2493 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2494 {
2495    stbi__jpeg_reset(z);
2496    if (!z->progressive) {
2497       if (z->scan_n == 1) {
2498          int i,j;
2499          STBI_SIMD_ALIGN(short, data[64]);
2500          int n = z->order[0];
2501          // non-interleaved data, we just need to process one block at a time,
2502          // in trivial scanline order
2503          // number of blocks to do just depends on how many actual "pixels" this
2504          // component has, independent of interleaved MCU blocking and such
2505          int w = (z->img_comp[n].x+7) >> 3;
2506          int h = (z->img_comp[n].y+7) >> 3;
2507          for (j=0; j < h; ++j) {
2508             for (i=0; i < w; ++i) {
2509                int ha = z->img_comp[n].ha;
2510                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2511                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2512                // every data block is an MCU, so count down the restart interval
2513                if (--z->todo <= 0) {
2514                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2515                   // if it's NOT a restart, then just bail, so we get corrupt data
2516                   // rather than no data
2517                   if (!STBI__RESTART(z->marker)) return 1;
2518                   stbi__jpeg_reset(z);
2519                }
2520             }
2521          }
2522          return 1;
2523       } else { // interleaved
2524          int i,j,k,x,y;
2525          STBI_SIMD_ALIGN(short, data[64]);
2526          for (j=0; j < z->img_mcu_y; ++j) {
2527             for (i=0; i < z->img_mcu_x; ++i) {
2528                // scan an interleaved mcu... process scan_n components in order
2529                for (k=0; k < z->scan_n; ++k) {
2530                   int n = z->order[k];
2531                   // scan out an mcu's worth of this component; that's just determined
2532                   // by the basic H and V specified for the component
2533                   for (y=0; y < z->img_comp[n].v; ++y) {
2534                      for (x=0; x < z->img_comp[n].h; ++x) {
2535                         int x2 = (i*z->img_comp[n].h + x)*8;
2536                         int y2 = (j*z->img_comp[n].v + y)*8;
2537                         int ha = z->img_comp[n].ha;
2538                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2539                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2540                      }
2541                   }
2542                }
2543                // after all interleaved components, that's an interleaved MCU,
2544                // so now count down the restart interval
2545                if (--z->todo <= 0) {
2546                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2547                   if (!STBI__RESTART(z->marker)) return 1;
2548                   stbi__jpeg_reset(z);
2549                }
2550             }
2551          }
2552          return 1;
2553       }
2554    } else {
2555       if (z->scan_n == 1) {
2556          int i,j;
2557          int n = z->order[0];
2558          // non-interleaved data, we just need to process one block at a time,
2559          // in trivial scanline order
2560          // number of blocks to do just depends on how many actual "pixels" this
2561          // component has, independent of interleaved MCU blocking and such
2562          int w = (z->img_comp[n].x+7) >> 3;
2563          int h = (z->img_comp[n].y+7) >> 3;
2564          for (j=0; j < h; ++j) {
2565             for (i=0; i < w; ++i) {
2566                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2567                if (z->spec_start == 0) {
2568                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2569                      return 0;
2570                } else {
2571                   int ha = z->img_comp[n].ha;
2572                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2573                      return 0;
2574                }
2575                // every data block is an MCU, so count down the restart interval
2576                if (--z->todo <= 0) {
2577                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2578                   if (!STBI__RESTART(z->marker)) return 1;
2579                   stbi__jpeg_reset(z);
2580                }
2581             }
2582          }
2583          return 1;
2584       } else { // interleaved
2585          int i,j,k,x,y;
2586          for (j=0; j < z->img_mcu_y; ++j) {
2587             for (i=0; i < z->img_mcu_x; ++i) {
2588                // scan an interleaved mcu... process scan_n components in order
2589                for (k=0; k < z->scan_n; ++k) {
2590                   int n = z->order[k];
2591                   // scan out an mcu's worth of this component; that's just determined
2592                   // by the basic H and V specified for the component
2593                   for (y=0; y < z->img_comp[n].v; ++y) {
2594                      for (x=0; x < z->img_comp[n].h; ++x) {
2595                         int x2 = (i*z->img_comp[n].h + x);
2596                         int y2 = (j*z->img_comp[n].v + y);
2597                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2598                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2599                            return 0;
2600                      }
2601                   }
2602                }
2603                // after all interleaved components, that's an interleaved MCU,
2604                // so now count down the restart interval
2605                if (--z->todo <= 0) {
2606                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2607                   if (!STBI__RESTART(z->marker)) return 1;
2608                   stbi__jpeg_reset(z);
2609                }
2610             }
2611          }
2612          return 1;
2613       }
2614    }
2615 }
2616 
2617 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2618 {
2619    int i;
2620    for (i=0; i < 64; ++i)
2621       data[i] *= dequant[i];
2622 }
2623 
2624 static void stbi__jpeg_finish(stbi__jpeg *z)
2625 {
2626    if (z->progressive) {
2627       // dequantize and idct the data
2628       int i,j,n;
2629       for (n=0; n < z->s->img_n; ++n) {
2630          int w = (z->img_comp[n].x+7) >> 3;
2631          int h = (z->img_comp[n].y+7) >> 3;
2632          for (j=0; j < h; ++j) {
2633             for (i=0; i < w; ++i) {
2634                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2635                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2636                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2637             }
2638          }
2639       }
2640    }
2641 }
2642 
2643 static int stbi__process_marker(stbi__jpeg *z, int m)
2644 {
2645    int L;
2646    switch (m) {
2647       case STBI__MARKER_none: // no marker found
2648          return stbi__err("expected marker","Corrupt JPEG");
2649 
2650       case 0xDD: // DRI - specify restart interval
2651          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2652          z->restart_interval = stbi__get16be(z->s);
2653          return 1;
2654 
2655       case 0xDB: // DQT - define quantization table
2656          L = stbi__get16be(z->s)-2;
2657          while (L > 0) {
2658             int q = stbi__get8(z->s);
2659             int p = q >> 4;
2660             int t = q & 15,i;
2661             if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2662             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2663             for (i=0; i < 64; ++i)
2664                z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2665             L -= 65;
2666          }
2667          return L==0;
2668 
2669       case 0xC4: // DHT - define huffman table
2670          L = stbi__get16be(z->s)-2;
2671          while (L > 0) {
2672             stbi_uc *v;
2673             int sizes[16],i,n=0;
2674             int q = stbi__get8(z->s);
2675             int tc = q >> 4;
2676             int th = q & 15;
2677             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2678             for (i=0; i < 16; ++i) {
2679                sizes[i] = stbi__get8(z->s);
2680                n += sizes[i];
2681             }
2682             L -= 17;
2683             if (tc == 0) {
2684                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2685                v = z->huff_dc[th].values;
2686             } else {
2687                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2688                v = z->huff_ac[th].values;
2689             }
2690             for (i=0; i < n; ++i)
2691                v[i] = stbi__get8(z->s);
2692             if (tc != 0)
2693                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2694             L -= n;
2695          }
2696          return L==0;
2697    }
2698    // check for comment block or APP blocks
2699    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2700       stbi__skip(z->s, stbi__get16be(z->s)-2);
2701       return 1;
2702    }
2703    return 0;
2704 }
2705 
2706 // after we see SOS
2707 static int stbi__process_scan_header(stbi__jpeg *z)
2708 {
2709    int i;
2710    int Ls = stbi__get16be(z->s);
2711    z->scan_n = stbi__get8(z->s);
2712    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2713    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2714    for (i=0; i < z->scan_n; ++i) {
2715       int id = stbi__get8(z->s), which;
2716       int q = stbi__get8(z->s);
2717       for (which = 0; which < z->s->img_n; ++which)
2718          if (z->img_comp[which].id == id)
2719             break;
2720       if (which == z->s->img_n) return 0; // no match
2721       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2722       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2723       z->order[i] = which;
2724    }
2725 
2726    {
2727       int aa;
2728       z->spec_start = stbi__get8(z->s);
2729       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
2730       aa = stbi__get8(z->s);
2731       z->succ_high = (aa >> 4);
2732       z->succ_low  = (aa & 15);
2733       if (z->progressive) {
2734          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2735             return stbi__err("bad SOS", "Corrupt JPEG");
2736       } else {
2737          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2738          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2739          z->spec_end = 63;
2740       }
2741    }
2742 
2743    return 1;
2744 }
2745 
2746 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2747 {
2748    stbi__context *s = z->s;
2749    int Lf,p,i,q, h_max=1,v_max=1,c;
2750    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2751    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2752    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2753    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2754    c = stbi__get8(s);
2755    if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
2756    s->img_n = c;
2757    for (i=0; i < c; ++i) {
2758       z->img_comp[i].data = NULL;
2759       z->img_comp[i].linebuf = NULL;
2760    }
2761 
2762    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2763 
2764    for (i=0; i < s->img_n; ++i) {
2765       z->img_comp[i].id = stbi__get8(s);
2766       if (z->img_comp[i].id != i+1)   // JFIF requires
2767          if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
2768             return stbi__err("bad component ID","Corrupt JPEG");
2769       q = stbi__get8(s);
2770       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2771       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2772       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2773    }
2774 
2775    if (scan != STBI__SCAN_load) return 1;
2776 
2777    if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2778 
2779    for (i=0; i < s->img_n; ++i) {
2780       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2781       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2782    }
2783 
2784    // compute interleaved mcu info
2785    z->img_h_max = h_max;
2786    z->img_v_max = v_max;
2787    z->img_mcu_w = h_max * 8;
2788    z->img_mcu_h = v_max * 8;
2789    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2790    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2791 
2792    for (i=0; i < s->img_n; ++i) {
2793       // number of effective pixels (e.g. for non-interleaved MCU)
2794       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2795       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2796       // to simplify generation, we'll allocate enough memory to decode
2797       // the bogus oversized data from using interleaved MCUs and their
2798       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2799       // discard the extra data until colorspace conversion
2800       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2801       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2802       z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2803 
2804       if (z->img_comp[i].raw_data == NULL) {
2805          for(--i; i >= 0; --i) {
2806             STBI_FREE(z->img_comp[i].raw_data);
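2807             z->img_comp[i].raw_data = NULL; z->img_comp[i].data = NULL; // clear raw_data too, so a later cleanup pass cannot free it again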
2808          }
2809          return stbi__err("outofmem", "Out of memory");
2810       }
2811       // align blocks for idct using mmx/sse
2812       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2813       z->img_comp[i].linebuf = NULL;
2814       if (z->progressive) {
2815          z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2816          z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2817          z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2818          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2819       } else {
2820          z->img_comp[i].coeff = 0;
2821          z->img_comp[i].raw_coeff = 0;
2822       }
2823    }
2824 
2825    return 1;
2826 }
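/* illustrative sketch, not part of stb_image: the MCU and plane-size
   arithmetic from the frame header, pulled out so the "image of width 33"
   case mentioned above can be checked by hand. sketch_comp and
   sketch_mcu_layout are hypothetical names; the fields only mirror the
   img_comp[] members used above and are not the real struct layout.

```c
#include <assert.h>

typedef struct { int h, v; int x, y; int w2, h2; } sketch_comp;

// fills in x, y, w2, h2 for each of the n components from their h, v
// sampling factors, and reports how many MCUs cover the image.
static void sketch_mcu_layout(int img_x, int img_y, sketch_comp *c, int n, int *mcu_x, int *mcu_y)
{
   int i, h_max = 1, v_max = 1;
   assert(img_x > 0 && img_y > 0 && n > 0);
   for (i = 0; i < n; ++i) {
      if (c[i].h > h_max) h_max = c[i].h;
      if (c[i].v > v_max) v_max = c[i].v;
   }
   // an interleaved MCU is h_max*8 by v_max*8 pixels; round the count up
   *mcu_x = (img_x + h_max*8 - 1) / (h_max*8);
   *mcu_y = (img_y + v_max*8 - 1) / (v_max*8);
   for (i = 0; i < n; ++i) {
      // effective samples actually covered by the image
      c[i].x  = (img_x * c[i].h + h_max-1) / h_max;
      c[i].y  = (img_y * c[i].v + v_max-1) / v_max;
      // padded plane size: whole MCUs, including the overhang
      c[i].w2 = *mcu_x * c[i].h * 8;
      c[i].h2 = *mcu_y * c[i].v * 8;
   }
}
```

   for a 33x20 image with sampling factors 2x2, 1x1, 1x1 this gives 3x2 MCUs;
   the first plane is padded to 48x32 while only 33x20 samples are real, and
   the other two planes are padded to 24x16 with 17x10 real samples.
*/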
2827 
2828 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2829 #define stbi__DNL(x)         ((x) == 0xdc)
2830 #define stbi__SOI(x)         ((x) == 0xd8)
2831 #define stbi__EOI(x)         ((x) == 0xd9)
2832 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2833 #define stbi__SOS(x)         ((x) == 0xda)
2834 
2835 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
2836 
2837 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2838 {
2839    int m;
2840    z->marker = STBI__MARKER_none; // initialize cached marker to empty
2841    m = stbi__get_marker(z);
2842    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2843    if (scan == STBI__SCAN_type) return 1;
2844    m = stbi__get_marker(z);
2845    while (!stbi__SOF(m)) {
2846       if (!stbi__process_marker(z,m)) return 0;
2847       m = stbi__get_marker(z);
2848       while (m == STBI__MARKER_none) {
2849          // some files have extra padding after their blocks, so ok, we'll scan
2850          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2851          m = stbi__get_marker(z);
2852       }
2853    }
2854    z->progressive = stbi__SOF_progressive(m);
2855    if (!stbi__process_frame_header(z, scan)) return 0;
2856    return 1;
2857 }
2858 
2859 // decode image to YCbCr format
2860 static int stbi__decode_jpeg_image(stbi__jpeg *j)
2861 {
2862    int m;
2863    for (m = 0; m < 4; m++) {
2864       j->img_comp[m].raw_data = NULL;
2865       j->img_comp[m].raw_coeff = NULL;
2866    }
2867    j->restart_interval = 0;
2868    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2869    m = stbi__get_marker(j);
2870    while (!stbi__EOI(m)) {
2871       if (stbi__SOS(m)) {
2872          if (!stbi__process_scan_header(j)) return 0;
2873          if (!stbi__parse_entropy_coded_data(j)) return 0;
2874          if (j->marker == STBI__MARKER_none ) {
2875             // handle 0s at the end of image data from IP Kamera 9060
2876             while (!stbi__at_eof(j->s)) {
2877                int x = stbi__get8(j->s);
2878                if (x == 255) {
2879                   j->marker = stbi__get8(j->s);
2880                   break;
2881                } else if (x != 0) {
2882                   return stbi__err("junk before marker", "Corrupt JPEG");
2883                }
2884             }
2885             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2886          }
2887       } else {
2888          if (!stbi__process_marker(j, m)) return 0;
2889       }
2890       m = stbi__get_marker(j);
2891    }
2892    if (j->progressive)
2893       stbi__jpeg_finish(j);
2894    return 1;
2895 }
2896 
2897 // static jfif-centered resampling (across block boundaries)
2898 
2899 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2900                                     int w, int hs);
2901 
2902 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2903 
2904 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2905 {
2906    STBI_NOTUSED(out);
2907    STBI_NOTUSED(in_far);
2908    STBI_NOTUSED(w);
2909    STBI_NOTUSED(hs);
2910    return in_near;
2911 }
2912 
2913 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2914 {
2915    // need to generate two samples vertically for every one in input
2916    int i;
2917    STBI_NOTUSED(hs);
2918    for (i=0; i < w; ++i)
2919       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
2920    return out;
2921 }
2922 
2923 static stbi_uc* stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2924 {
2925    // need to generate two samples horizontally for every one in input
2926    int i;
2927    stbi_uc *input = in_near;
2928 
2929    if (w == 1) {
2930       // if only one sample, can't do any interpolation
2931       out[0] = out[1] = input[0];
2932       return out;
2933    }
2934 
2935    out[0] = input[0];
2936    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2937    for (i=1; i < w-1; ++i) {
2938       int n = 3*input[i]+2;
2939       out[i*2+0] = stbi__div4(n+input[i-1]);
2940       out[i*2+1] = stbi__div4(n+input[i+1]);
2941    }
2942    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2943    out[i*2+1] = input[w-1];
2944 
2945    STBI_NOTUSED(in_far);
2946    STBI_NOTUSED(hs);
2947 
2948    return out;
2949 }
2950 
2951 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
2952 
2953 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2954 {
2955    // need to generate 2x2 samples for every one in input
2956    int i,t0,t1;
2957    if (w == 1) {
2958       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2959       return out;
2960    }
2961 
2962    t1 = 3*in_near[0] + in_far[0];
2963    out[0] = stbi__div4(t1+2);
2964    for (i=1; i < w; ++i) {
2965       t0 = t1;
2966       t1 = 3*in_near[i]+in_far[i];
2967       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2968       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
2969    }
2970    out[w*2-1] = stbi__div4(t1+2);
2971 
2972    STBI_NOTUSED(hs);
2973 
2974    return out;
2975 }
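/* illustrative sketch, not part of stb_image: the interior samples of the
   2x2 filter above are computed in two steps (3*near+far vertically, then
   3*t0+t1 horizontally). expanding that product gives a single 9/3/3/1
   weighting over four input samples. sketch_hv2_sample is a hypothetical
   helper written only to make those weights explicit.

```c
#include <assert.h>

// a_near, a_far: the closer column, in the near and far input rows
// b_near, b_far: the farther column, in the near and far input rows
// equals stbi__div16(3*t0 + t1 + 8) with t0 = 3*a_near + a_far and
// t1 = 3*b_near + b_far.
static unsigned char sketch_hv2_sample(int a_near, int a_far, int b_near, int b_far)
{
   assert(a_near >= 0 && a_near <= 255 && a_far >= 0 && a_far <= 255);
   assert(b_near >= 0 && b_near <= 255 && b_far >= 0 && b_far <= 255);
   return (unsigned char) ((9*a_near + 3*a_far + 3*b_near + b_far + 8) >> 4);
}
```

   the weights sum to 16, so a flat region passes through unchanged, and the
   +8 rounds to nearest before the shift by 4.
*/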
2976 
2977 #if defined(STBI_SSE2) || defined(STBI_NEON)
2978 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2979 {
2980    // need to generate 2x2 samples for every one in input
2981    int i=0,t0,t1;
2982 
2983    if (w == 1) {
2984       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2985       return out;
2986    }
2987 
2988    t1 = 3*in_near[0] + in_far[0];
2989    // process groups of 8 pixels for as long as we can.
2990    // note we can't handle the last pixel in a row in this loop
2991    // because we need to handle the filter boundary conditions.
2992    for (; i < ((w-1) & ~7); i += 8) {
2993 #if defined(STBI_SSE2)
2994       // load and perform the vertical filtering pass
2995       // this uses 3*x + y = 4*x + (y - x)
2996       __m128i zero  = _mm_setzero_si128();
2997       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
2998       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
2999       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
3000       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3001       __m128i diff  = _mm_sub_epi16(farw, nearw);
3002       __m128i nears = _mm_slli_epi16(nearw, 2);
3003       __m128i curr  = _mm_add_epi16(nears, diff); // current row
3004 
3005       // horizontal filter works the same based on shifted vers of current
3006       // row. "prev" is current row shifted right by 1 pixel; we need to
3007       // insert the previous pixel value (from t1).
3008       // "next" is current row shifted left by 1 pixel, with first pixel
3009       // of next block of 8 pixels added in.
3010       __m128i prv0 = _mm_slli_si128(curr, 2);
3011       __m128i nxt0 = _mm_srli_si128(curr, 2);
3012       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3013       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
3014 
3015       // horizontal filter, polyphase implementation since it's convenient:
3016       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3017       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3018       // note the shared term.
3019       __m128i bias  = _mm_set1_epi16(8);
3020       __m128i curs = _mm_slli_epi16(curr, 2);
3021       __m128i prvd = _mm_sub_epi16(prev, curr);
3022       __m128i nxtd = _mm_sub_epi16(next, curr);
3023       __m128i curb = _mm_add_epi16(curs, bias);
3024       __m128i even = _mm_add_epi16(prvd, curb);
3025       __m128i odd  = _mm_add_epi16(nxtd, curb);
3026 
3027       // interleave even and odd pixels, then undo scaling.
3028       __m128i int0 = _mm_unpacklo_epi16(even, odd);
3029       __m128i int1 = _mm_unpackhi_epi16(even, odd);
3030       __m128i de0  = _mm_srli_epi16(int0, 4);
3031       __m128i de1  = _mm_srli_epi16(int1, 4);
3032 
3033       // pack and write output
3034       __m128i outv = _mm_packus_epi16(de0, de1);
3035       _mm_storeu_si128((__m128i *) (out + i*2), outv);
3036 #elif defined(STBI_NEON)
3037       // load and perform the vertical filtering pass
3038       // this uses 3*x + y = 4*x + (y - x)
3039       uint8x8_t farb  = vld1_u8(in_far + i);
3040       uint8x8_t nearb = vld1_u8(in_near + i);
3041       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3042       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3043       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3044 
3045       // horizontal filter works the same based on shifted vers of current
3046       // row. "prev" is current row shifted right by 1 pixel; we need to
3047       // insert the previous pixel value (from t1).
3048       // "next" is current row shifted left by 1 pixel, with first pixel
3049       // of next block of 8 pixels added in.
3050       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3051       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3052       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3053       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3054 
3055       // horizontal filter, polyphase implementation since it's convenient:
3056       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3057       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3058       // note the shared term.
3059       int16x8_t curs = vshlq_n_s16(curr, 2);
3060       int16x8_t prvd = vsubq_s16(prev, curr);
3061       int16x8_t nxtd = vsubq_s16(next, curr);
3062       int16x8_t even = vaddq_s16(curs, prvd);
3063       int16x8_t odd  = vaddq_s16(curs, nxtd);
3064 
3065       // undo scaling and round, then store with even/odd phases interleaved
3066       uint8x8x2_t o;
3067       o.val[0] = vqrshrun_n_s16(even, 4);
3068       o.val[1] = vqrshrun_n_s16(odd,  4);
3069       vst2_u8(out + i*2, o);
3070 #endif
3071 
3072       // "previous" value for next iter
3073       t1 = 3*in_near[i+7] + in_far[i+7];
3074    }
3075 
3076    t0 = t1;
3077    t1 = 3*in_near[i] + in_far[i];
3078    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3079 
3080    for (++i; i < w; ++i) {
3081       t0 = t1;
3082       t1 = 3*in_near[i]+in_far[i];
3083       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3084       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3085    }
3086    out[w*2-1] = stbi__div4(t1+2);
3087 
3088    STBI_NOTUSED(hs);
3089 
3090    return out;
3091 }
3092 #endif
3093 
3094 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3095 {
3096    // resample with nearest-neighbor
3097    int i,j;
3098    STBI_NOTUSED(in_far);
3099    for (i=0; i < w; ++i)
3100       for (j=0; j < hs; ++j)
3101          out[i*hs+j] = in_near[i];
3102    return out;
3103 }
3104 
3105 #ifdef STBI_JPEG_OLD
3106 // this is the same YCbCr-to-RGB calculation that stb_image has used
3107 // historically before the algorithm changes in 1.49
3108 #define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
3109 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3110 {
3111    int i;
3112    for (i=0; i < count; ++i) {
3113       int y_fixed = (y[i] << 16) + 32768; // rounding
3114       int r,g,b;
3115       int cr = pcr[i] - 128;
3116       int cb = pcb[i] - 128;
3117       r = y_fixed + cr*float2fixed(1.40200f);
3118       g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3119       b = y_fixed                            + cb*float2fixed(1.77200f);
3120       r >>= 16;
3121       g >>= 16;
3122       b >>= 16;
3123       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3124       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3125       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3126       out[0] = (stbi_uc)r;
3127       out[1] = (stbi_uc)g;
3128       out[2] = (stbi_uc)b;
3129       out[3] = 255;
3130       out += step;
3131    }
3132 }
3133 #else
3134 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3135 // to make sure the code produces the same results in both SIMD and scalar
3136 #define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
stbi__YCbCr_to_RGB_row(stbi_uc * out,const stbi_uc * y,const stbi_uc * pcb,const stbi_uc * pcr,int count,int step)3137 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3138 {
3139    int i;
3140    for (i=0; i < count; ++i) {
3141       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3142       int r,g,b;
3143       int cr = pcr[i] - 128;
3144       int cb = pcb[i] - 128;
3145       r = y_fixed +  cr* float2fixed(1.40200f);
3146       g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3147       b = y_fixed                               +   cb* float2fixed(1.77200f);
3148       r >>= 20;
3149       g >>= 20;
3150       b >>= 20;
3151       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3152       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3153       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3154       out[0] = (stbi_uc)r;
3155       out[1] = (stbi_uc)g;
3156       out[2] = (stbi_uc)b;
3157       out[3] = 255;
3158       out += step;
3159    }
3160 }
3161 #endif
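/* Illustrative sketch, not part of stb_image: a single-pixel version of the
   reduced-precision conversion above, using 12-bit coefficients pre-shifted
   into 20-bit fixed point. The names SKETCH_FIXED, sketch_clamp255 and
   sketch_ycbcr_to_rgb are hypothetical. As a simplifying assumption this
   sketch leaves out the 0xffff0000 mask that the real code applies to the cb
   term of green to match the SIMD path, so green may differ by one from the
   library for some inputs.

```c
#include <assert.h>

// 12-bit coefficient, shifted left by 8 so products are 20-bit fixed point
#define SKETCH_FIXED(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)

static int sketch_clamp255(int v) { return v > 255 ? 255 : v; }

// Hypothetical helper: convert one YCbCr triple (each 0..255) to RGB.
static void sketch_ycbcr_to_rgb(unsigned char rgb[3], int y, int cb, int cr)
{
   int y_fixed = (y << 20) + (1 << 19);   // luma in 20-bit fixed point, plus 0.5
   int r, g, b;
   assert(y >= 0 && y <= 255 && cb >= 0 && cb <= 255 && cr >= 0 && cr <= 255);
   cr -= 128;                             // chroma is stored with a +128 bias
   cb -= 128;
   r = y_fixed + cr * SKETCH_FIXED(1.40200f);
   g = y_fixed - cr * SKETCH_FIXED(0.71414f) - cb * SKETCH_FIXED(0.34414f);
   b = y_fixed + cb * SKETCH_FIXED(1.77200f);
   // clamp negatives before shifting so no negative value is ever shifted
   rgb[0] = (unsigned char) (r < 0 ? 0 : sketch_clamp255(r >> 20));
   rgb[1] = (unsigned char) (g < 0 ? 0 : sketch_clamp255(g >> 20));
   rgb[2] = (unsigned char) (b < 0 ? 0 : sketch_clamp255(b >> 20));
}
```

   Neutral chroma (cb = cr = 128) leaves every channel equal to y, which is a
   quick sanity check for any variant of this routine.
*/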

#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // so just accelerate step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // can't error after this, so this is safe
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp  = z->s->img_n; // report original components, not output
      return output;
   }
}

static unsigned char * STBI_FORCE_STACK_ALIGN stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   return load_jpeg_image(&j, x,y,comp,req_comp);
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__jpeg j;
   j.s = s;
   return stbi__jpeg_info_raw(&j, x, y, comp);
}
#endif

// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs from left, zlib from right, so can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
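/* Illustrative sketch, not part of stb_image: a loop-based bit reversal that
   should agree with the swap-network version above for values that fit in
   `bits` bits. It is a different, slower technique, shown only as a reference
   for cross-checking; the name sketch_reverse_bits is hypothetical.

```c
#include <assert.h>

// Hypothetical helper: reverse the low `bits` bits of v, one bit at a time.
static int sketch_reverse_bits(int v, int bits)
{
   int r = 0, i;
   assert(bits >= 1 && bits <= 16);
   for (i = 0; i < bits; ++i) {
      r = (r << 1) | (v & 1);   // move the lowest remaining bit to the top of r
      v >>= 1;
   }
   return r;
}
```

   For example, the 3-bit code 110 reverses to 011, which is the order in
   which a DEFLATE stream delivers the bits of a Huffman code.
*/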

static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      if (sizes[i] > (1 << i))
         return stbi__err("bad sizes", "Corrupt PNG");
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int k2 = stbi__bit_reverse(next_code[s],s);
            while (k2 < (1 << STBI__ZFAST_BITS)) {
               z->fast[k2] = fastv;
               k2 += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
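/* Illustrative sketch, not part of stb_image: the canonical code assignment
   that the function above builds on, written the way the DEFLATE
   specification (RFC 1951, section 3.2.2) describes it. Shorter codes come
   first, and codes of equal length are consecutive in symbol order. The names
   SKETCH_MAX_BITS and sketch_assign_codes are hypothetical, and this sketch
   does no oversubscription checking.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_BITS 15

// Hypothetical helper: codes[i] receives the canonical code for symbol i,
// whose bit length is lengths[i] (0 means the symbol is unused).
static void sketch_assign_codes(const unsigned char *lengths, int num, int *codes)
{
   int count[SKETCH_MAX_BITS+1], next_code[SKETCH_MAX_BITS+1];
   int i, bits, code = 0;
   memset(count, 0, sizeof(count));
   for (i = 0; i < num; ++i) {
      assert(lengths[i] <= SKETCH_MAX_BITS);
      ++count[lengths[i]];
   }
   count[0] = 0;                               // unused symbols get no code
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + count[bits-1]) << 1;      // first code of this length
      next_code[bits] = code;
   }
   for (i = 0; i < num; ++i)
      codes[i] = lengths[i] ? next_code[lengths[i]]++ : 0;
}
```

   With the lengths (3,3,3,3,3,2,4,4) used as the example in RFC 1951, the
   resulting codes are 010, 011, 100, 101, 110, 00, 1110 and 1111.
*/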

// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      // cast before shifting: a byte >= 0x80 shifted left by 24 would overflow a signed int
      z->code_buffer |= (stbi__uint32) stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
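/* Illustrative sketch, not part of stb_image: a stand-alone LSB-first bit
   reader in the style of the fill/receive pair above. New bytes are OR-ed in
   above the bits that are still unread, and values are taken from the low
   end. The type sketch_bitreader and both function names are hypothetical;
   reading past the end yields zero bits, mirroring stbi__zget8.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
   const unsigned char *p, *end;
   unsigned int bits;   // unread bits; bit 0 is the next bit of the stream
   int count;           // number of valid bits in `bits`
} sketch_bitreader;

static void sketch_br_init(sketch_bitreader *br, const unsigned char *data, size_t len)
{
   br->p = data;
   br->end = data + len;
   br->bits = 0;
   br->count = 0;
}

// Hypothetical helper: read n bits (1..16), least significant bit first.
static unsigned int sketch_br_read(sketch_bitreader *br, int n)
{
   unsigned int v;
   assert(n >= 1 && n <= 16);
   while (br->count < n) {
      unsigned int byte = (br->p < br->end) ? *br->p++ : 0; // zero-fill past the end
      br->bits |= byte << br->count;
      br->count += 8;
   }
   v = br->bits & ((1u << n) - 1);
   br->bits >>= n;
   br->count -= n;
   return v;
}
```

   Reading 3 bits and then 5 bits from the byte 0xB5 (binary 10110101) gives
   5 (101) and then 22 (10110).
*/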

static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}

static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC(z->zout_start, limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}

static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
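/* Illustrative sketch, not part of stb_image: how a base/extra table pair
   like the ones above turns a literal/length symbol into a match length. The
   symbol selects a base length and a count of extra bits, and the extra bits
   read from the stream are added to the base. The tables are restated here
   with only the 29 valid entries; all sketch_ names are hypothetical.

```c
#include <assert.h>

static const int sketch_len_base[29] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258 };

static const int sketch_len_extra[29] =
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0 };

// Hypothetical helper: number of extra bits that follow a length symbol,
// or -1 if the symbol (257..285 are valid) is not a length symbol.
static int sketch_length_extra_bits(int symbol)
{
   int idx = symbol - 257;
   if (idx < 0 || idx >= 29) return -1;
   return sketch_len_extra[idx];
}

// Hypothetical helper: match length for a length symbol plus its extra bits,
// or -1 if the symbol is not a length symbol.
static int sketch_match_length(int symbol, unsigned int extra_value)
{
   int idx = symbol - 257;
   if (idx < 0 || idx >= 29) return -1;
   assert(extra_value < (1u << sketch_len_extra[idx]));
   return sketch_len_base[idx] + (int) extra_value;
}
```

   Symbol 265 has base 11 and one extra bit, so it encodes a length of 11 or
   12; symbol 285 has no extra bits and always means 258.
*/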

static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            if (len) { do *zout++ = v; while (--len); }
         } else {
            if (len) { do *zout++ = *p++; while (--len); }
         }
      }
   }
}

static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < hlit + hdist) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else if (c == 16) {
         c = stbi__zreceive(a,2)+3;
         // "repeat previous length" needs a previous entry; otherwise lencodes[-1] is read
         if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
         memset(lencodes+n, lencodes[n-1], c);
         n += c;
      } else if (c == 17) {
         c = stbi__zreceive(a,3)+3;
         memset(lencodes+n, 0, c);
         n += c;
      } else {
         STBI_ASSERT(c == 18);
         c = stbi__zreceive(a,7)+11;
         memset(lencodes+n, 0, c);
         n += c;
      }
   }
   if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}

static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}
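/* Illustrative sketch, not part of stb_image: the LEN/NLEN consistency check
   for a stored (uncompressed) DEFLATE block, isolated from the buffer
   handling above. Both fields are little-endian 16-bit values and NLEN must
   be the one's complement of LEN. The name sketch_stored_block_len is
   hypothetical.

```c
#include <assert.h>

// Hypothetical helper: returns the stored block length from a 4-byte
// LEN/NLEN header, or -1 if the two fields are not complements.
static int sketch_stored_block_len(const unsigned char header[4])
{
   int len  = header[1] * 256 + header[0];
   int nlen = header[3] * 256 + header[2];
   return (nlen == (len ^ 0xffff)) ? len : -1;
}
```

   A header of 05 00 FA FF describes a 5-byte block; changing any one of the
   four bytes makes the check fail.
*/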

static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
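/* Illustrative sketch, not part of stb_image: the same three header checks as
   a pure function of the two header bytes, which makes them easy to try on
   known values. The name sketch_zlib_header_ok is hypothetical.

```c
#include <assert.h>

// Hypothetical helper: returns 1 if (cmf, flg) is a zlib header that a PNG
// decoder can accept, 0 otherwise.
static int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf * 256 + flg) % 31 != 0) return 0; // FCHECK: 16-bit value must be a multiple of 31
   if (flg & 32) return 0;                    // FDICT set: preset dictionary, not allowed in PNG
   if ((cmf & 15) != 8) return 0;             // CM must be 8 (DEFLATE)
   return 1;
}
```

   The common headers 78 01, 78 9C and 78 DA all pass; 78 BB satisfies the
   multiple-of-31 rule but is rejected because its preset-dictionary bit is
   set.
*/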

// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}

static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncomperssed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}
3871 
stbi_zlib_decode_malloc_guesssize(const char * buffer,int len,int initial_size,int * outlen)3872 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
3873 {
3874    stbi__zbuf a;
3875    char *p = (char *) stbi__malloc(initial_size);
3876    if (p == NULL) return NULL;
3877    a.zbuffer = (stbi_uc *) buffer;
3878    a.zbuffer_end = (stbi_uc *) buffer + len;
3879    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
3880       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3881       return a.zout_start;
3882    } else {
3883       STBI_FREE(a.zout_start);
3884       return NULL;
3885    }
3886 }
3887 
stbi_zlib_decode_malloc(char const * buffer,int len,int * outlen)3888 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
3889 {
3890    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
3891 }
3892 
stbi_zlib_decode_malloc_guesssize_headerflag(const char * buffer,int len,int initial_size,int * outlen,int parse_header)3893 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
3894 {
3895    stbi__zbuf a;
3896    char *p = (char *) stbi__malloc(initial_size);
3897    if (p == NULL) return NULL;
3898    a.zbuffer = (stbi_uc *) buffer;
3899    a.zbuffer_end = (stbi_uc *) buffer + len;
3900    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
3901       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3902       return a.zout_start;
3903    } else {
3904       STBI_FREE(a.zout_start);
3905       return NULL;
3906    }
3907 }
3908 
3909 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
3910 {
3911    stbi__zbuf a;
3912    a.zbuffer = (stbi_uc *) ibuffer;
3913    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3914    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
3915       return (int) (a.zout - a.zout_start);
3916    else
3917       return -1;
3918 }
3919 
3920 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
3921 {
3922    stbi__zbuf a;
3923    char *p = (char *) stbi__malloc(16384);
3924    if (p == NULL) return NULL;
3925    a.zbuffer = (stbi_uc *) buffer;
3926    a.zbuffer_end = (stbi_uc *) buffer+len;
3927    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
3928       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3929       return a.zout_start;
3930    } else {
3931       STBI_FREE(a.zout_start);
3932       return NULL;
3933    }
3934 }
3935 
3936 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3937 {
3938    stbi__zbuf a;
3939    a.zbuffer = (stbi_uc *) ibuffer;
3940    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3941    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3942       return (int) (a.zout - a.zout_start);
3943    else
3944       return -1;
3945 }
3946 #endif
3947 
3948 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
3949 //    simple implementation
3950 //      - only 8-bit samples
3951 //      - no CRC checking
3952 //      - allocates lots of intermediate memory
3953 //        - avoids problem of streaming data between subsystems
3954 //        - avoids explicit window management
3955 //    performance
3956 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3957 
3958 #ifndef STBI_NO_PNG
3959 typedef struct
3960 {
3961    stbi__uint32 length;
3962    stbi__uint32 type;
3963 } stbi__pngchunk;
3964 
3965 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
3966 {
3967    stbi__pngchunk c;
3968    c.length = stbi__get32be(s);
3969    c.type   = stbi__get32be(s);
3970    return c;
3971 }
3972 
3973 static int stbi__check_png_header(stbi__context *s)
3974 {
3975    static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
3976    int i;
3977    for (i=0; i < 8; ++i)
3978       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
3979    return 1;
3980 }
3981 
3982 typedef struct
3983 {
3984    stbi__context *s;
3985    stbi_uc *idata, *expanded, *out;
3986 } stbi__png;
3987 
3988 
3989 enum {
3990    STBI__F_none=0,
3991    STBI__F_sub=1,
3992    STBI__F_up=2,
3993    STBI__F_avg=3,
3994    STBI__F_paeth=4,
3995    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
3996    STBI__F_avg_first,
3997    STBI__F_paeth_first
3998 };
3999 
4000 static stbi_uc first_row_filter[5] =
4001 {
4002    STBI__F_none,
4003    STBI__F_sub,
4004    STBI__F_none,
4005    STBI__F_avg_first,
4006    STBI__F_paeth_first
4007 };
4008 
4009 static int stbi__paeth(int a, int b, int c)
4010 {
4011    int p = a + b - c;
4012    int pa = abs(p-a);
4013    int pb = abs(p-b);
4014    int pc = abs(p-c);
4015    if (pa <= pb && pa <= pc) return a;
4016    if (pb <= pc) return b;
4017    return c;
4018 }
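// Illustration only, not part of the library: a minimal standalone sketch of
// how the Paeth predictor above is used to undo PNG filter type 4. The names
// paeth_predict and paeth_unfilter are hypothetical; the predictor body
// mirrors stbi__paeth, and the byte wrap-around stands in for STBI__BYTECAST.

```c
#include <stdlib.h>  /* abs */

/* Hypothetical copy of the predictor: a = left, b = up, c = upper-left. */
static int paeth_predict(int a, int b, int c)
{
   int p  = a + b - c;      /* initial estimate */
   int pa = abs(p - a);     /* distance of the estimate to each neighbor */
   int pb = abs(p - b);
   int pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;   /* ties favor left, then up */
   if (pb <= pc) return b;
   return c;
}

/* Reconstruct one byte: filtered value plus prediction, modulo 256. */
static unsigned char paeth_unfilter(unsigned char raw, int left, int up, int upleft)
{
   return (unsigned char) ((raw + paeth_predict(left, up, upleft)) & 255);
}
```

// For the first pixel of a row there is no left or upper-left neighbor, so the
// prediction reduces to the byte above, e.g. paeth_predict(0, 50, 0) gives 50.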
4019 
4020 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4021 
4022 // create the png data from post-deflated data
4023 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4024 {
4025    stbi__context *s = a->s;
4026    stbi__uint32 i,j,stride = x*out_n;
4027    stbi__uint32 img_len, img_width_bytes;
4028    int k;
4029    int img_n = s->img_n; // copy it into a local for later
4030 
4031    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4032    a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
4033    if (!a->out) return stbi__err("outofmem", "Out of memory");
4034 
4035    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4036    img_len = (img_width_bytes + 1) * y;
4037    if (s->img_x == x && s->img_y == y) {
4038       if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
4039    } else { // interlaced:
4040       if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4041    }
4042 
4043    for (j=0; j < y; ++j) {
4044       stbi_uc *cur = a->out + stride*j;
4045       stbi_uc *prior = cur - stride;
4046       int filter = *raw++;
4047       int filter_bytes = img_n;
4048       int width = x;
4049       if (filter > 4)
4050          return stbi__err("invalid filter","Corrupt PNG");
4051 
4052       if (depth < 8) {
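4053          if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG"); // reachable from malformed input, so fail instead of asserting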
4054          cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4055          filter_bytes = 1;
4056          width = img_width_bytes;
4057       }
4058 
4059       // if first row, use special filter that doesn't sample previous row
4060       if (j == 0) filter = first_row_filter[filter];
4061 
4062       // handle first byte explicitly
4063       for (k=0; k < filter_bytes; ++k) {
4064          switch (filter) {
4065             case STBI__F_none       : cur[k] = raw[k]; break;
4066             case STBI__F_sub        : cur[k] = raw[k]; break;
4067             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4068             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4069             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4070             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4071             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4072          }
4073       }
4074 
4075       if (depth == 8) {
4076          if (img_n != out_n)
4077             cur[img_n] = 255; // first pixel
4078          raw += img_n;
4079          cur += out_n;
4080          prior += out_n;
4081       } else {
4082          raw += 1;
4083          cur += 1;
4084          prior += 1;
4085       }
4086 
4087       // this is a little gross, so that we don't switch per-pixel or per-component
4088       if (depth < 8 || img_n == out_n) {
4089          int nk = (width - 1)*img_n;
4090          #define CASE(f) \
4091              case f:     \
4092                 for (k=0; k < nk; ++k)
4093          switch (filter) {
4094             // "none" filter turns into a memcpy here; make that explicit.
4095             case STBI__F_none:         memcpy(cur, raw, nk);
4096              break;
4097             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]);
4098              break;
4099             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4100              break;
4101             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1));
4102              break;
4103             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes]));
4104              break;
4105             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1));
4106              break;
4107             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0));
4108              break;
4109          }
4110          #undef CASE
4111          raw += nk;
4112       } else {
4113          STBI_ASSERT(img_n+1 == out_n);
4114          #define CASE(f) \
4115              case f:     \
4116                 for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
4117                    for (k=0; k < img_n; ++k)
4118          switch (filter) {
4119             CASE(STBI__F_none)         cur[k] = raw[k];
4120             break;
4121             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]);
4122             break;
4123             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4124             break;
4125             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1));
4126             break;
4127             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n]));
4128             break;
4129             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1));
4130             break;
4131             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0));
4132             break;
4133          }
4134          #undef CASE
4135       }
4136    }
4137 
4138    // we make a separate pass to expand bits to pixels; for performance,
4139    // this could run two scanlines behind the above code, so it won't
4140    // interfere with filtering but will still be in the cache.
4141    if (depth < 8) {
4142       for (j=0; j < y; ++j) {
4143          stbi_uc *cur = a->out + stride*j;
4144          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4145          // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4146          // PNG guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4147          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4148 
4149          // note that the final byte might overshoot and write more data than desired.
4150          // we can allocate enough data that this never writes out of memory, but it
4151          // could also overwrite the next scanline. can it overwrite non-empty data
4152          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4153          // so we need to explicitly clamp the final ones
4154 
4155          if (depth == 4) {
4156             for (k=x*img_n; k >= 2; k-=2, ++in) {
4157                *cur++ = scale * ((*in >> 4)       );
4158                *cur++ = scale * ((*in     ) & 0x0f);
4159             }
4160             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4161          } else if (depth == 2) {
4162             for (k=x*img_n; k >= 4; k-=4, ++in) {
4163                *cur++ = scale * ((*in >> 6)       );
4164                *cur++ = scale * ((*in >> 4) & 0x03);
4165                *cur++ = scale * ((*in >> 2) & 0x03);
4166                *cur++ = scale * ((*in     ) & 0x03);
4167             }
4168             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4169             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4170             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4171          } else if (depth == 1) {
4172             for (k=x*img_n; k >= 8; k-=8, ++in) {
4173                *cur++ = scale * ((*in >> 7)       );
4174                *cur++ = scale * ((*in >> 6) & 0x01);
4175                *cur++ = scale * ((*in >> 5) & 0x01);
4176                *cur++ = scale * ((*in >> 4) & 0x01);
4177                *cur++ = scale * ((*in >> 3) & 0x01);
4178                *cur++ = scale * ((*in >> 2) & 0x01);
4179                *cur++ = scale * ((*in >> 1) & 0x01);
4180                *cur++ = scale * ((*in     ) & 0x01);
4181             }
4182             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4183             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4184             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4185             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4186             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4187             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4188             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4189          }
4190          if (img_n != out_n) {
4191             // insert alpha = 255
4192             stbi_uc *cur2 = a->out + stride*j;
4193             int i2;
4194             if (img_n == 1) {
4195                for (i2=x-1; i2 >= 0; --i2) {
4196                   cur2[i2*2+1] = 255;
4197                   cur2[i2*2+0] = cur2[i2];
4198                }
4199             } else {
4200                STBI_ASSERT(img_n == 3);
4201                for (i2=x-1; i2 >= 0; --i2) {
4202                   cur2[i2*4+3] = 255;
4203                   cur2[i2*4+2] = cur2[i2*3+2];
4204                   cur2[i2*4+1] = cur2[i2*3+1];
4205                   cur2[i2*4+0] = cur2[i2*3+0];
4206                }
4207             }
4208          }
4209       }
4210    }
4211 
4212    return 1;
4213 }
4214 
4215 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4216 {
4217    stbi_uc *final;
4218    int p;
4219    if (!interlaced)
4220       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4221 
4222    // de-interlacing
4223    final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n); if (final == NULL) return stbi__err("outofmem", "Out of memory");
4224    for (p=0; p < 7; ++p) {
4225       int xorig[] = { 0,4,0,2,0,1,0 };
4226       int yorig[] = { 0,0,4,0,2,0,1 };
4227       int xspc[]  = { 8,8,4,4,2,2,1 };
4228       int yspc[]  = { 8,8,8,4,4,2,2 };
4229       int i,j,x,y;
4230       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4231       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4232       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4233       if (x && y) {
4234          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4235          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4236             STBI_FREE(final);
4237             return 0;
4238          }
4239          for (j=0; j < y; ++j) {
4240             for (i=0; i < x; ++i) {
4241                int out_y = j*yspc[p]+yorig[p];
4242                int out_x = i*xspc[p]+xorig[p];
4243                memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4244                       a->out + (j*x+i)*out_n, out_n);
4245             }
4246          }
4247          STBI_FREE(a->out);
4248          image_data += img_len;
4249          image_data_len -= img_len;
4250       }
4251    }
4252    a->out = final;
4253 
4254    return 1;
4255 }
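// Illustration only, not part of the library: a minimal sketch of the per-pass
// size computation used by the de-interlacing loop above. The helper names
// adam7_pass_size and adam7_total_pixels are hypothetical; the four tables are
// copied from stbi__create_png_image. Summing the pass sizes should give back
// img_x * img_y, since every pixel belongs to exactly one pass.

```c
/* Hypothetical helper: width and height of Adam7 pass `pass` (0..6). */
static void adam7_pass_size(int img_x, int img_y, int pass, int *px, int *py)
{
   static const int xorig[7] = { 0,4,0,2,0,1,0 };  /* first column of the pass */
   static const int yorig[7] = { 0,0,4,0,2,0,1 };  /* first row of the pass    */
   static const int xspc[7]  = { 8,8,4,4,2,2,1 };  /* column spacing           */
   static const int yspc[7]  = { 8,8,8,4,4,2,2 };  /* row spacing              */
   /* ceil((img - orig) / spc); the numerator is never negative here because
      orig <= spc - 1 for every pass */
   *px = (img_x - xorig[pass] + xspc[pass] - 1) / xspc[pass];
   *py = (img_y - yorig[pass] + yspc[pass] - 1) / yspc[pass];
}

/* Hypothetical helper: total pixel count over all seven passes. */
static int adam7_total_pixels(int img_x, int img_y)
{
   int pass, total = 0;
   for (pass = 0; pass < 7; ++pass) {
      int px, py;
      adam7_pass_size(img_x, img_y, pass, &px, &py);
      total += px * py;   /* a pass with px == 0 or py == 0 is empty */
   }
   return total;
}
```

// Small images leave some passes empty, which is why the loop above skips a
// pass unless both x and y are non-zero.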
4256 
4257 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4258 {
4259    stbi__context *s = z->s;
4260    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4261    stbi_uc *p = z->out;
4262 
4263    // compute color-based transparency, assuming we've
4264    // already got 255 as the alpha value in the output
4265    STBI_ASSERT(out_n == 2 || out_n == 4);
4266 
4267    if (out_n == 2) {
4268       for (i=0; i < pixel_count; ++i) {
4269          p[1] = (p[0] == tc[0] ? 0 : 255);
4270          p += 2;
4271       }
4272    } else {
4273       for (i=0; i < pixel_count; ++i) {
4274          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4275             p[3] = 0;
4276          p += 4;
4277       }
4278    }
4279    return 1;
4280 }
4281 
4282 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4283 {
4284    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4285    stbi_uc *p, *temp_out, *orig = a->out;
4286 
4287    p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4288    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4289 
4290    // between here and free(out) below, exiting would leak
4291    temp_out = p;
4292 
4293    if (pal_img_n == 3) {
4294       for (i=0; i < pixel_count; ++i) {
4295          int n = orig[i]*4;
4296          p[0] = palette[n  ];
4297          p[1] = palette[n+1];
4298          p[2] = palette[n+2];
4299          p += 3;
4300       }
4301    } else {
4302       for (i=0; i < pixel_count; ++i) {
4303          int n = orig[i]*4;
4304          p[0] = palette[n  ];
4305          p[1] = palette[n+1];
4306          p[2] = palette[n+2];
4307          p[3] = palette[n+3];
4308          p += 4;
4309       }
4310    }
4311    STBI_FREE(a->out);
4312    a->out = temp_out;
4313 
4314    STBI_NOTUSED(len);
4315 
4316    return 1;
4317 }
4318 
4319 static int stbi__unpremultiply_on_load = 0;
4320 static int stbi__de_iphone_flag = 0;
4321 
4322 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4323 {
4324    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4325 }
4326 
4327 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4328 {
4329    stbi__de_iphone_flag = flag_true_if_should_convert;
4330 }
4331 
4332 static void stbi__de_iphone(stbi__png *z)
4333 {
4334    stbi__context *s = z->s;
4335    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4336    stbi_uc *p = z->out;
4337 
4338    if (s->img_out_n == 3) {  // convert bgr to rgb
4339       for (i=0; i < pixel_count; ++i) {
4340          stbi_uc t = p[0];
4341          p[0] = p[2];
4342          p[2] = t;
4343          p += 3;
4344       }
4345    } else {
4346       STBI_ASSERT(s->img_out_n == 4);
4347       if (stbi__unpremultiply_on_load) {
4348          // convert bgr to rgb and unpremultiply
4349          for (i=0; i < pixel_count; ++i) {
4350             stbi_uc a = p[3];
4351             stbi_uc t = p[0];
4352             if (a) {
4353                p[0] = p[2] * 255 / a;
4354                p[1] = p[1] * 255 / a;
4355                p[2] =  t   * 255 / a;
4356             } else {
4357                p[0] = p[2];
4358                p[2] = t;
4359             }
4360             p += 4;
4361          }
4362       } else {
4363          // convert bgr to rgb
4364          for (i=0; i < pixel_count; ++i) {
4365             stbi_uc t = p[0];
4366             p[0] = p[2];
4367             p[2] = t;
4368             p += 4;
4369          }
4370       }
4371    }
4372 }
4373 
4374 #define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
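// Illustration only, not part of the library: a minimal sketch of how a
// four-character chunk name packs into one 32-bit value, and why testing
// bit 29 distinguishes ancillary from critical chunks. The names png_type and
// png_chunk_is_ancillary are hypothetical; the packing follows
// STBI__PNG_TYPE, written here with unsigned arithmetic.

```c
/* Hypothetical equivalent of STBI__PNG_TYPE: first character in the top byte. */
static unsigned int png_type(unsigned char a, unsigned char b, unsigned char c, unsigned char d)
{
   return ((unsigned int) a << 24) | ((unsigned int) b << 16)
        | ((unsigned int) c <<  8) |  (unsigned int) d;
}

/* Bit 5 of the first type byte is the ASCII lowercase bit. After packing it
   sits at bit 29. Lowercase first letter means the chunk is ancillary and may
   be skipped; uppercase means critical. */
static int png_chunk_is_ancillary(unsigned int type)
{
   return (type & (1u << 29)) != 0;
}
```

// With this packing, "IHDR" becomes 0x49484452, matching the value that a
// big-endian 32-bit read of the chunk type field produces.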
4375 
4376 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4377 {
4378    stbi_uc palette[1024], pal_img_n=0;
4379    stbi_uc has_trans=0, tc[3];
4380    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4381    int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
4382    stbi__context *s = z->s;
4383 
4384    z->expanded = NULL;
4385    z->idata = NULL;
4386    z->out = NULL;
4387 
4388    if (!stbi__check_png_header(s)) return 0;
4389 
4390    if (scan == STBI__SCAN_type) return 1;
4391 
4392    for (;;) {
4393       stbi__pngchunk c = stbi__get_chunk_header(s);
4394       switch (c.type) {
4395          case STBI__PNG_TYPE('C','g','B','I'):
4396             is_iphone = 1;
4397             stbi__skip(s, c.length);
4398             break;
4399          case STBI__PNG_TYPE('I','H','D','R'): {
4400             int comp,filter;
4401             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4402             first = 0;
4403             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4404             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4405             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4406             depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
4407             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4408             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4409             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4410             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4411             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4412             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4413             if (!pal_img_n) {
4414                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4415                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4416                if (scan == STBI__SCAN_header) return 1;
4417             } else {
4418                // if paletted, then pal_n is our final components, and
4419                // img_n is # components to decompress/filter.
4420                s->img_n = 1;
4421                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4422                // if SCAN_header, have to scan to see if we have a tRNS
4423             }
4424             break;
4425          }
4426 
4427          case STBI__PNG_TYPE('P','L','T','E'):  {
4428             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4429             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4430             pal_len = c.length / 3;
4431             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4432             for (i=0; i < pal_len; ++i) {
4433                palette[i*4+0] = stbi__get8(s);
4434                palette[i*4+1] = stbi__get8(s);
4435                palette[i*4+2] = stbi__get8(s);
4436                palette[i*4+3] = 255;
4437             }
4438             break;
4439          }
4440 
4441          case STBI__PNG_TYPE('t','R','N','S'): {
4442             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4443             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4444             if (pal_img_n) {
4445                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4446                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4447                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4448                pal_img_n = 4;
4449                for (i=0; i < c.length; ++i)
4450                   palette[i*4+3] = stbi__get8(s);
4451             } else {
4452                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4453                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4454                has_trans = 1;
4455                for (k=0; k < s->img_n; ++k)
4456                   tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
4457             }
4458             break;
4459          }
4460 
4461          case STBI__PNG_TYPE('I','D','A','T'): {
4462             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4463             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4464             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4465             if ((int)(ioff + c.length) < (int)ioff) return 0;
4466             if (ioff + c.length > idata_limit) {
4467                stbi_uc *p;
4468                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4469                while (ioff + c.length > idata_limit)
4470                   idata_limit *= 2;
4471                p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4472                z->idata = p;
4473             }
4474             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4475             ioff += c.length;
4476             break;
4477          }
4478 
4479          case STBI__PNG_TYPE('I','E','N','D'): {
4480             stbi__uint32 raw_len, bpl;
4481             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4482             if (scan != STBI__SCAN_load) return 1;
4483             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4484             // initial guess for decoded data size to avoid unnecessary reallocs
4485             bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
4486             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4487             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4488             if (z->expanded == NULL) return 0; // zlib should set error
4489             STBI_FREE(z->idata); z->idata = NULL;
4490             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4491                s->img_out_n = s->img_n+1;
4492             else
4493                s->img_out_n = s->img_n;
4494             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
4495             if (has_trans)
4496                if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4497             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4498                stbi__de_iphone(z);
4499             if (pal_img_n) {
4500                // pal_img_n == 3 or 4
4501                s->img_n = pal_img_n; // record the actual colors we had
4502                s->img_out_n = pal_img_n;
4503                if (req_comp >= 3) s->img_out_n = req_comp;
4504                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4505                   return 0;
4506             }
4507             STBI_FREE(z->expanded); z->expanded = NULL;
4508             return 1;
4509          }
4510 
4511          default:
4512             // if critical, fail
4513             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4514             if ((c.type & (1 << 29)) == 0) {
4515                #ifndef STBI_NO_FAILURE_STRINGS
4516                // not threadsafe
4517                static char invalid_chunk[] = "XXXX PNG chunk not known";
4518                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4519                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4520                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
4521                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
4522                #endif
4523                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4524             }
4525             stbi__skip(s, c.length);
4526             break;
4527       }
4528       // end of PNG chunk, read and skip CRC
4529       stbi__get32be(s);
4530    }
4531 }
4532 
4533 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4534 {
4535    unsigned char *result=NULL;
4536    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4537    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4538       result = p->out;
4539       p->out = NULL;
4540       if (req_comp && req_comp != p->s->img_out_n) {
4541          result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4542          p->s->img_out_n = req_comp;
4543          if (result == NULL) return result;
4544       }
4545       *x = p->s->img_x;
4546       *y = p->s->img_y;
4547       if (n) *n = p->s->img_out_n;
4548    }
4549    STBI_FREE(p->out);      p->out      = NULL;
4550    STBI_FREE(p->expanded); p->expanded = NULL;
4551    STBI_FREE(p->idata);    p->idata    = NULL;
4552 
4553    return result;
4554 }
4555 
4556 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4557 {
4558    stbi__png p;
4559    p.s = s;
4560    return stbi__do_png(&p, x,y,comp,req_comp);
4561 }
4562 
4563 static int stbi__png_test(stbi__context *s)
4564 {
4565    int r;
4566    r = stbi__check_png_header(s);
4567    stbi__rewind(s);
4568    return r;
4569 }
4570 
4571 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4572 {
4573    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4574       stbi__rewind( p->s );
4575       return 0;
4576    }
4577    if (x) *x = p->s->img_x;
4578    if (y) *y = p->s->img_y;
4579    if (comp) *comp = p->s->img_n;
4580    return 1;
4581 }
4582 
4583 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4584 {
4585    stbi__png p;
4586    p.s = s;
4587    return stbi__png_info_raw(&p, x, y, comp);
4588 }
4589 #endif
4590 
4591 // Microsoft/Windows BMP image
4592 
4593 #ifndef STBI_NO_BMP
4594 static int stbi__bmp_test_raw(stbi__context *s)
4595 {
4596    int r;
4597    int sz;
4598    if (stbi__get8(s) != 'B') return 0;
4599    if (stbi__get8(s) != 'M') return 0;
4600    stbi__get32le(s); // discard filesize
4601    stbi__get16le(s); // discard reserved
4602    stbi__get16le(s); // discard reserved
4603    stbi__get32le(s); // discard data offset
4604    sz = stbi__get32le(s);
4605    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4606    return r;
4607 }
4608 
4609 static int stbi__bmp_test(stbi__context *s)
4610 {
4611    int r = stbi__bmp_test_raw(s);
4612    stbi__rewind(s);
4613    return r;
4614 }
4615 
4616 
4617 // returns 0..31 for the highest set bit, or -1 if z is 0
4618 static int stbi__high_bit(unsigned int z)
4619 {
4620    int n=0;
4621    if (z == 0) return -1;
4622    if (z >= 0x10000) n += 16, z >>= 16;
4623    if (z >= 0x00100) n +=  8, z >>=  8;
4624    if (z >= 0x00010) n +=  4, z >>=  4;
4625    if (z >= 0x00004) n +=  2, z >>=  2;
4626    if (z >= 0x00002) n +=  1, z >>=  1;
4627    return n;
4628 }
4629 
4630 static int stbi__bitcount(unsigned int a)
4631 {
4632    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
4633    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
4634    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4635    a = (a + (a >> 8)); // max 16 per 8 bits
4636    a = (a + (a >> 16)); // max 32 per 8 bits
4637    return a & 0xff;
4638 }
4639 
4640 static int stbi__shiftsigned(int v, int shift, int bits)
4641 {
4642    int result;
4643    int z=0;
4644 
4645    if (shift < 0) v <<= -shift;
4646    else v >>= shift;
4647    result = v;
4648 
4649    z = bits;
4650    while (z < 8) {
4651       result += v >> z;
4652       z += bits;
4653    }
4654    return result;
4655 }
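/*
   Illustrative sketch, not part of stb_image: how the three helpers above
   combine to turn an arbitrary channel bit mask into an 8-bit value, which
   is what the masked path of the BMP loader does per pixel. All example_*
   names are hypothetical stand-ins written with plain loops so the snippet
   compiles on its own. The sketch assumes masks of at most 8 set,
   contiguous bits.

```c
#include <assert.h>

// Index (0..31) of the highest set bit, or -1 if z is zero.
static int example_high_bit(unsigned int z)
{
   int n = 0;
   if (z == 0) return -1;
   while (z > 1) { z >>= 1; ++n; }
   return n;
}

// Number of set bits in a.
static int example_bitcount(unsigned int a)
{
   int n = 0;
   while (a) { n += (int) (a & 1u); a >>= 1; }
   return n;
}

// Move the masked field so its top bit sits at bit 7, then replicate the
// field downward until all 8 output bits are filled.
static int example_expand(unsigned int v, int shift, int bits)
{
   unsigned int result;
   int z;
   assert(bits > 0 && bits <= 8);
   if (shift < 0) v <<= -shift; else v >>= shift;
   result = v;
   for (z = bits; z < 8; z += bits)
      result += v >> z;
   return (int) (result & 0xff);
}

// Extract one channel of pixel, selected by mask, as a value in 0..255.
static int example_channel(unsigned int pixel, unsigned int mask)
{
   return example_expand(pixel & mask,
                         example_high_bit(mask) - 7,
                         example_bitcount(mask));
}
```

   With the common 5-6-5 layout (masks 0xF800, 0x07E0, 0x001F) a fully set
   5-bit or 6-bit field expands to 255 rather than 248 or 252, because the
   high bits are copied into the otherwise empty low bits.
*/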

static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   unsigned int mr=0,mg=0,mb=0,ma=0, fake_a=0;
   stbi_uc pal[256][4];
   int psize=0,i,j,compress=0,width;
   int bpp, flip_vertically, pad, target, offset, hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   offset = stbi__get32le(s);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   if (hsz == 12) {
      s->img_x = stbi__get16le(s);
      s->img_y = stbi__get16le(s);
   } else {
      s->img_x = stbi__get32le(s);
      s->img_y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   bpp = stbi__get16le(s);
   if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
   flip_vertically = ((int) s->img_y) > 0;
   s->img_y = abs((int) s->img_y);
   if (hsz == 12) {
      if (bpp < 24)
         psize = (offset - 14 - 24) / 3;
   } else {
      compress = stbi__get32le(s);
      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
      stbi__get32le(s); // discard sizeof
      stbi__get32le(s); // discard hres
      stbi__get32le(s); // discard vres
      stbi__get32le(s); // discard colorsused
      stbi__get32le(s); // discard max important
      if (hsz == 40 || hsz == 56) {
         if (hsz == 56) {
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
         }
         if (bpp == 16 || bpp == 32) {
            mr = mg = mb = 0;
            if (compress == 0) {
               if (bpp == 32) {
                  mr = 0xffu << 16;
                  mg = 0xffu <<  8;
                  mb = 0xffu <<  0;
                  ma = 0xffu << 24;
                  fake_a = 1; // @TODO: check for cases like alpha value is all 0 and switch it to 255
                  STBI_NOTUSED(fake_a);
               } else {
                  mr = 31u << 10;
                  mg = 31u <<  5;
                  mb = 31u <<  0;
               }
            } else if (compress == 3) {
               mr = stbi__get32le(s);
               mg = stbi__get32le(s);
               mb = stbi__get32le(s);
               // not documented, but generated by photoshop and handled by mspaint
               if (mr == mg && mg == mb) {
                  // ?!?!?
                  return stbi__errpuc("bad BMP", "bad BMP");
               }
            } else
               return stbi__errpuc("bad BMP", "bad BMP");
         }
      } else {
         STBI_ASSERT(hsz == 108 || hsz == 124);
         mr = stbi__get32le(s);
         mg = stbi__get32le(s);
         mb = stbi__get32le(s);
         ma = stbi__get32le(s);
         stbi__get32le(s); // discard color space
         for (i=0; i < 12; ++i)
            stbi__get32le(s); // discard color space parameters
         if (hsz == 124) {
            stbi__get32le(s); // discard rendering intent
            stbi__get32le(s); // discard offset of profile data
            stbi__get32le(s); // discard size of profile data
            stbi__get32le(s); // discard reserved
         }
      }
      if (bpp < 16)
         psize = (offset - 14 - hsz) >> 2;
   }
   s->img_n = ma ? 4 : 3;
   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
      target = req_comp;
   else
      target = s->img_n; // if they want monochrome, we'll post-convert
   out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (bpp < 16) {
      int z=0;
      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
      for (i=0; i < psize; ++i) {
         pal[i][2] = stbi__get8(s);
         pal[i][1] = stbi__get8(s);
         pal[i][0] = stbi__get8(s);
         if (hsz != 12) stbi__get8(s);
         pal[i][3] = 255;
      }
      stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
      if (bpp == 4) width = (s->img_x + 1) >> 1;
      else if (bpp == 8) width = s->img_x;
      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
      pad = (-width)&3;
      for (j=0; j < (int) s->img_y; ++j) {
         for (i=0; i < (int) s->img_x; i += 2) {
            int v=stbi__get8(s),v2=0;
            if (bpp == 4) {
               v2 = v & 15;
               v >>= 4;
            }
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
            if (i+1 == (int) s->img_x) break;
            v = (bpp == 8) ? stbi__get8(s) : v2;
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
         }
         stbi__skip(s, pad);
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, offset - 14 - hsz);
      if (bpp == 24) width = 3 * s->img_x;
      else if (bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (bpp == 24) {
         easy = 1;
      } else if (bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               if (target == 4) out[z++] = a;
            }
         } else {
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
               int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }
   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i], p1[i] = p2[i], p2[i] = t;
         }
      }
   }

   if (req_comp && req_comp != target) {
      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n;
   return out;
}
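/*
   Illustrative sketch, not part of stb_image: the row padding rule used in
   the loader above. BMP rows are stored padded to a multiple of 4 bytes,
   and the loader skips the padding with pad = (-width) & 3, where width is
   the row size in bytes. The helper names example_row_bytes and
   example_row_pad are hypothetical.

```c
#include <assert.h>

// Bytes of pixel data in one row, before padding, for a packed bit depth
// such as 4, 8, 16, 24 or 32 bits per pixel.
static int example_row_bytes(int width_pixels, int bpp)
{
   return (width_pixels * bpp + 7) / 8;
}

// Bytes of padding that follow a row of row_bytes bytes so that the next
// row starts on a 4-byte boundary. Equivalent to (-row_bytes) & 3.
static int example_row_pad(int row_bytes)
{
   assert(row_bytes >= 0);
   return (4 - (row_bytes & 3)) & 3;
}
```

   For example, a 5-pixel-wide 24-bit row holds 15 data bytes and is
   followed by 1 padding byte, while a 4-pixel-wide 24-bit row (12 bytes)
   needs none.
*/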
#endif

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp;
    int sz;
    stbi__get8(s);                   // discard Offset
    sz = stbi__get8(s);              // color type
    if( sz > 1 ) {
        stbi__rewind(s);
        return 0;      // only RGB or indexed allowed
    }
    sz = stbi__get8(s);              // image type
    // only RGB or grey allowed, +/- RLE
    if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
        stbi__rewind(s);
        return 0;
    }
    stbi__skip(s,9);
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    sz = stbi__get8(s);               // bits per pixel
    // only RGB or RGBA or grey allowed
    if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
        stbi__rewind(s);
        return 0;
    }
    tga_comp = sz;
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp / 8;
    return 1;                   // seems to have passed everything
}

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz;
   stbi__get8(s);      //   discard Offset
   sz = stbi__get8(s);   //   color type
   if ( sz > 1 ) goto errorEnd;   //   only RGB or indexed allowed
   sz = stbi__get8(s);   //   image type
   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) goto errorEnd;   //   only RGB or grey allowed, +/- RLE
   stbi__get16be(s);      //   discard palette start
   stbi__get16be(s);      //   discard palette length
   stbi__get8(s);         //   discard bits per palette color entry
   stbi__get16be(s);      //   discard x origin
   stbi__get16be(s);      //   discard y origin
   if ( stbi__get16be(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16be(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (sz != 8) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;

   res = 1; // if we got this far, everything's good and we can return 1 instead of 0

errorEnd:
   stbi__rewind(s);   //   always rewind, on failure as well as on success
   return res;
}

static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp = tga_bits_per_pixel / 8;
   int tga_inverted = stbi__get8(s);
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4];
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;

   //   do a tiny bit of processing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   /* int tga_alpha_bits = tga_inverted & 15; */
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   error check
   if ( //(tga_indexed) ||
      (tga_width < 1) || (tga_height < 1) ||
      (tga_image_type < 1) || (tga_image_type > 3) ||
      ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
      (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
      )
   {
      return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
   }

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed )
   {
      tga_comp = tga_palette_bits / 8;
   }

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip to the data's starting position (offset usually = 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE) {
      for (i=0; i < tga_height; ++i) {
         int y2 = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + y2*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
            STBI_FREE(tga_data);
            STBI_FREE(tga_palette);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to get a RLE chunk?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               //   read in 1 byte, then perform the lookup
               int pal_idx = stbi__get8(s);
               if ( pal_idx >= tga_palette_len )
               {
                  //   invalid index
                  pal_idx = 0;
               }
               //   each palette entry is tga_comp bytes wide (tga_palette_bits / 8)
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j)
               {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else
            {
               //   read in the data raw
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
           tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap RGB
   if (tga_comp >= 3)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of an error message, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   //   OK, done
   return tga_data;
}
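/*
   Illustrative sketch, not part of stb_image: the TGA run-length scheme
   handled by the loop above, written as a stand-alone decoder over memory
   buffers. Each packet starts with a command byte; the low 7 bits hold
   (count - 1) and the high bit selects a run packet (one pixel repeated)
   or a raw packet (count literal pixels). The names example_tga_rle_decode
   and example_tga_rle_demo are hypothetical. Unlike the loader, this
   sketch checks the input length and clamps a packet that would write
   past the requested pixel count.

```c
#include <stddef.h>
#include <string.h>

// Decode packets from src into dst until pixel_count pixels are written.
// Returns pixel_count on success, or 0 if src ends too early.
static size_t example_tga_rle_decode(const unsigned char *src, size_t src_len,
                                     unsigned char *dst, size_t pixel_count,
                                     size_t bytes_per_pixel)
{
   size_t in = 0, written = 0, k;
   while (written < pixel_count) {
      unsigned char cmd;
      size_t count;
      if (in >= src_len) return 0;
      cmd = src[in++];
      count = 1 + (size_t) (cmd & 127);
      if (count > pixel_count - written) count = pixel_count - written;
      if (cmd & 128) {
         // run packet: a single pixel follows and is repeated count times
         if (src_len - in < bytes_per_pixel) return 0;
         for (k = 0; k < count; ++k)
            memcpy(dst + (written + k) * bytes_per_pixel, src + in, bytes_per_pixel);
         in += bytes_per_pixel;
      } else {
         // raw packet: count literal pixels follow
         if (src_len - in < count * bytes_per_pixel) return 0;
         memcpy(dst + written * bytes_per_pixel, src + in, count * bytes_per_pixel);
         in += count * bytes_per_pixel;
      }
      written += count;
   }
   return written;
}

// Usage sketch: a run of three 7s followed by the raw pixels 1 and 2,
// one byte per pixel. Only the first src_len input bytes are made visible
// to the decoder. Returns 1 when the output is exactly 7,7,7,1,2.
static int example_tga_rle_demo(size_t src_len)
{
   static const unsigned char packets[] = { 0x82, 7, 0x01, 1, 2 };
   static const unsigned char expect[]  = { 7, 7, 7, 1, 2 };
   unsigned char out[5];
   if (example_tga_rle_decode(packets, src_len, out, 5, 1) != 5) return 0;
   return memcmp(out, expect, sizeof expect) == 0;
}
```
*/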
#endif

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}

static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   int   pixelCount;
   int channelCount, compression;
   int channel, i, count, len;
   int w,h;
   stbi_uc *out;

   // Check identifier
   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
      return stbi__errpuc("not PSD", "Corrupt PSD image");

   // Check file type version.
   if (stbi__get16be(s) != 1)
      return stbi__errpuc("wrong version", "Unsupported version of PSD image");

   // Skip 6 reserved bytes.
   stbi__skip(s, 6 );

   // Read the number of channels (R, G, B, A, etc).
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16)
      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");

   // Read the rows and columns of the image.
   h = stbi__get32be(s);
   w = stbi__get32be(s);

   // Make sure the depth is 8 bits.
   if (stbi__get16be(s) != 8)
      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 bit");

   // Make sure the color mode is RGB.
   // Valid options are:
   //   0: Bitmap
   //   1: Grayscale
   //   2: Indexed color
   //   3: RGB color
   //   4: CMYK color
   //   7: Multichannel
   //   8: Duotone
   //   9: Lab color
   if (stbi__get16be(s) != 3)
      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");

   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   stbi__skip(s,stbi__get32be(s) );

   // Skip the image resources.  (resolution, pen tool paths, etc)
   stbi__skip(s, stbi__get32be(s) );

   // Skip the reserved data.
   stbi__skip(s, stbi__get32be(s) );

   // Find out if the data is compressed.
   // Known values:
   //   0: no compression
   //   1: RLE compressed
   compression = stbi__get16be(s);
   if (compression > 1)
      return stbi__errpuc("bad compression", "PSD has an unknown compression format");

   // Create the destination image.
   out = (stbi_uc *) stbi__malloc(channelCount * w*h);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   pixelCount = w*h;

   // Initialize the data to zero.
   //memset( out, 0, pixelCount * 4 );

   // Finally, the image data.
   if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n.
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is 128, noop.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
      // which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );

      // Read the RLE data by channel.
      for (channel = 0; channel < channelCount; channel++) {
         stbi_uc *p;

         p = out+channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++) *p = (channel == 3 ? 255 : 0), p += channelCount;
         } else {
            // Read the RLE data.
            count = 0;
            while (count < pixelCount) {
               len = stbi__get8(s);
               if (len == 128) {
                  // No-op.
               } else if (len < 128) {
                  // Copy next len+1 bytes literally.
                  len++;
                  count += len;
                  while (len) {
                     *p = stbi__get8(s);
                     p += channelCount;
                     len--;
                  }
               } else if (len > 128) {
                  stbi_uc   val;
                  // Next -len+1 bytes in the dest are replicated from next source byte.
                  // (Interpret len as a negative 8-bit int.)
                  len ^= 0x0FF;
                  len += 2;
                  val = stbi__get8(s);
                  count += len;
                  while (len) {
                     *p = val;
                     p += channelCount;
                     len--;
                  }
               }
            }
         }
      }

   } else {
      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < channelCount; channel++) {
         stbi_uc *p;

         p = out + channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++) *p = channel == 3 ? 255 : 0, p += channelCount;
         } else {
            // Read the data.
            for (i = 0; i < pixelCount; i++)
               *p = stbi__get8(s), p += channelCount;
         }
      }
   }

   if (req_comp && req_comp != channelCount) {
      out = stbi__convert_format(out, channelCount, req_comp, w, h);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   if (comp) *comp = channelCount;
   *y = h;
   *x = w;

   return out;
}
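/*
   Illustrative sketch, not part of stb_image: the RLE scheme described in
   the comment inside stbi__psd_load (often called PackBits), written as a
   stand-alone decoder over memory buffers. A control byte n in 0..127
   copies the next n+1 bytes literally, n in 129..255 (that is, -127..-1 as
   a signed byte) repeats the next byte 257-n times, and 128 is a no-op.
   The names example_packbits_decode and example_packbits_demo are
   hypothetical. Unlike the loader, this sketch rejects input that is
   truncated or that would overrun the output buffer.

```c
#include <stddef.h>
#include <string.h>

// Decode src until exactly dst_len bytes have been produced.
// Returns dst_len on success, or 0 on truncated or overlong input.
static size_t example_packbits_decode(const unsigned char *src, size_t src_len,
                                      unsigned char *dst, size_t dst_len)
{
   size_t in = 0, out = 0, count, k;
   while (out < dst_len) {
      unsigned int n;
      if (in >= src_len) return 0;
      n = src[in++];
      if (n == 128) continue;                 // no-op
      if (n < 128) {                          // literal run of n+1 bytes
         count = (size_t) n + 1;
         if (count > dst_len - out || count > src_len - in) return 0;
         for (k = 0; k < count; ++k) dst[out++] = src[in++];
      } else {                                // repeat next byte 257-n times
         count = (size_t) (257 - n);
         if (count > dst_len - out || in >= src_len) return 0;
         for (k = 0; k < count; ++k) dst[out++] = src[in];
         ++in;
      }
   }
   return out;
}

// Usage sketch: literal "abc", then 'z' repeated 3 times, a no-op, and a
// final literal 'q'. Only the first src_len input bytes are made visible
// to the decoder. Returns 1 when the output is exactly "abczzzq".
static int example_packbits_demo(size_t src_len)
{
   static const unsigned char packed[] =
      { 0x02, 'a', 'b', 'c', 0xFE, 'z', 0x80, 0x00, 'q' };
   unsigned char out[7];
   if (example_packbits_decode(packed, src_len, out, sizeof out) != sizeof out)
      return 0;
   return memcmp(out, "abczzzq", 7) == 0;
}
```

   The expression 257-n matches what the loader computes with
   len ^= 0x0FF; len += 2; for a control byte above 128.
*/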
#endif

// *************************************************************************************************
// Softimage PIC loader
// by Tom Seddon
//
// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/

#ifndef STBI_NO_PIC
static int stbi__pic_is4(stbi__context *s,const char *str)
{
   int i;
   for (i=0; i<4; ++i)
      if (stbi__get8(s) != (stbi_uc)str[i])
         return 0;

   return 1;
}

static int stbi__pic_test_core(stbi__context *s)
{
   int i;

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
      return 0;

   for(i=0;i<84;++i)
      stbi__get8(s);

   if (!stbi__pic_is4(s,"PICT"))
      return 0;

   return 1;
}

typedef struct
{
   stbi_uc size,type,channel;
} stbi__pic_packet;

static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
{
   int mask=0x80, i;

   for (i=0; i<4; ++i, mask>>=1) {
      if (channel & mask) {
         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
         dest[i]=stbi__get8(s);
      }
   }

   return dest;
}

static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
{
   int mask=0x80,i;

   for (i=0;i<4; ++i, mask>>=1)
      if (channel&mask)
         dest[i]=src[i];
}
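/*
   Illustrative sketch, not part of stb_image: how the channel mask used by
   stbi__readval and stbi__copyval selects components. Bit 0x80 selects
   component 0 (red), 0x40 green, 0x20 blue and 0x10 alpha, so a packet
   only touches the components named in its mask. The example_* names are
   hypothetical.

```c
// Copy the components selected by channel from src to dst; other
// components of dst are left untouched.
static void example_copy_channels(int channel, unsigned char dst[4],
                                  const unsigned char src[4])
{
   int mask = 0x80, i;
   for (i = 0; i < 4; ++i, mask >>= 1)
      if (channel & mask)
         dst[i] = src[i];
}

// Number of bytes a packet with this channel mask stores per pixel.
static int example_channel_bytes(int channel)
{
   int mask = 0x80, i, n = 0;
   for (i = 0; i < 4; ++i, mask >>= 1)
      if (channel & mask)
         ++n;
   return n;
}

// Usage sketch: start from an opaque white RGBA pixel (the PIC loader
// pre-fills its intermediate buffer with 0xff) and overlay the selected
// components of the sample pixel 1,2,3,4. Result is packed as 0xRRGGBBAA.
static unsigned long example_overlay(int channel)
{
   static const unsigned char src[4] = { 1, 2, 3, 4 };
   unsigned char dst[4] = { 255, 255, 255, 255 };
   example_copy_channels(channel, dst, src);
   return ((unsigned long) dst[0] << 24) | ((unsigned long) dst[1] << 16) |
          ((unsigned long) dst[2] <<  8) |  (unsigned long) dst[3];
}
```

   A file with one RGB packet (mask 0xE0) and one alpha packet (mask 0x10)
   therefore fills all four components in two passes over each scanline.
*/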

static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
{
   int act_comp=0,num_packets=0,y,chained;
   stbi__pic_packet packets[10];

   // this will (should...) cater for even some bizarre stuff like having data
   // for the same channel in multiple packets.
   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0]))
         return stbi__errpuc("bad format","too many packets");

      packet = &packets[num_packets++];

      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);

      act_comp |= packet->channel;

      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?

   for(y=0; y<height; ++y) {
      int packet_idx;

      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
         stbi__pic_packet *packet = &packets[packet_idx];
         stbi_uc *dest = result+y*width*4;

         switch (packet->type) {
            default:
               return stbi__errpuc("bad format","packet has bad compression type");

            case 0: {//uncompressed
               int x;

               for(x=0;x<width;++x, dest+=4)
                  if (!stbi__readval(s,packet->channel,dest))
                     return 0;
               break;
            }

            case 1://Pure RLE
               {
                  int left=width, i;

                  while (left>0) {
                     stbi_uc count,value[4];

                     count=stbi__get8(s);
                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");

                     if (count > left)
                        count = (stbi_uc) left;

                     if (!stbi__readval(s,packet->channel,value))  return 0;

                     for(i=0; i<count; ++i,dest+=4)
                        stbi__copyval(packet->channel,dest,value);
                     left -= count;
                  }
               }
               break;

            case 2: {//Mixed RLE
               int left=width;
               while (left>0) {
                  int count = stbi__get8(s), i;
                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");

                  if (count >= 128) { // Repeated
                     stbi_uc value[4];

                     if (count==128)
                        count = stbi__get16be(s);
                     else
                        count -= 127;
                     if (count > left)
                        return stbi__errpuc("bad file","scanline overrun");

                     if (!stbi__readval(s,packet->channel,value))
                        return 0;

                     for(i=0;i<count;++i, dest += 4)
                        stbi__copyval(packet->channel,dest,value);
                  } else { // Raw
                     ++count;
                     if (count>left) return stbi__errpuc("bad file","scanline overrun");

                     for(i=0;i<count;++i, dest+=4)
                        if (!stbi__readval(s,packet->channel,dest))
                           return 0;
                  }
                  left-=count;
               }
               break;
            }
         }
      }
   }

   return result;
}

static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
{
5464    stbi_uc *result;
5465    int i, x,y;
5466 
5467    for (i=0; i<92; ++i)
5468       stbi__get8(s);
5469 
5470    x = stbi__get16be(s);
5471    y = stbi__get16be(s);
5472    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
5473    if ((1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
5474 
5475    stbi__get32be(s); //skip `ratio'
5476    stbi__get16be(s); //skip `fields'
5477    stbi__get16be(s); //skip `pad'
5478 
5479    // intermediate buffer is RGBA
5480    result = (stbi_uc *) stbi__malloc(x*y*4);
5481    memset(result, 0xff, x*y*4);
5482 
5483    if (!stbi__pic_load_core(s,x,y,comp, result)) {
5484       STBI_FREE(result);
5485       result=0;
5486    }
5487    *px = x;
5488    *py = y;
5489    if (req_comp == 0) req_comp = *comp;
5490    result=stbi__convert_format(result,4,req_comp,x,y);
5491 
5492    return result;
5493 }

static int stbi__pic_test(stbi__context *s)
{
   int r = stbi__pic_test_core(s);
   stbi__rewind(s);
   return r;
}
#endif

// *************************************************************************************************
// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb

#ifndef STBI_NO_GIF
typedef struct
{
   stbi__int16 prefix;
   stbi_uc first;
   stbi_uc suffix;
} stbi__gif_lzw;

typedef struct
{
   int w,h;
   stbi_uc *out;                 // output buffer (always 4 components)
   int flags, bgindex, ratio, transparent, eflags;
   stbi_uc  pal[256][4];
   stbi_uc lpal[256][4];
   stbi__gif_lzw codes[4096];
   stbi_uc *color_table;
   int parse, step;
   int lflags;
   int start_x, start_y;
   int max_x, max_y;
   int cur_x, cur_y;
   int line_size;
} stbi__gif;

static int stbi__gif_test_raw(stbi__context *s)
{
   int sz;
   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   sz = stbi__get8(s);
   if (sz != '9' && sz != '7') return 0;
   if (stbi__get8(s) != 'a') return 0;
   return 1;
}

static int stbi__gif_test(stbi__context *s)
{
   int r = stbi__gif_test_raw(s);
   stbi__rewind(s);
   return r;
}

static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
{
   int i;
   for (i=0; i < num_entries; ++i) {
      pal[i][2] = stbi__get8(s);
      pal[i][1] = stbi__get8(s);
      pal[i][0] = stbi__get8(s);
      pal[i][3] = transp == i ? 0 : 255;
   }
}

static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
{
   stbi_uc version;
   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
      return stbi__err("not GIF", "Corrupt GIF");

   version = stbi__get8(s);
   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");

   stbi__g_failure_reason = "";
   g->w = stbi__get16le(s);
   g->h = stbi__get16le(s);
   g->flags = stbi__get8(s);
   g->bgindex = stbi__get8(s);
   g->ratio = stbi__get8(s);
   g->transparent = -1;

   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments

   if (is_info) return 1;

   if (g->flags & 0x80)
      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);

   return 1;
}

static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__gif g;
   if (!stbi__gif_header(s, &g, comp, 1)) {
      stbi__rewind( s );
      return 0;
   }
   if (x) *x = g.w;
   if (y) *y = g.h;
   return 1;
}

static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
{
   stbi_uc *p, *c;

   // recurse to decode the prefixes, since the linked-list is backwards,
   // and working backwards through an interleaved image would be nasty
   if (g->codes[code].prefix >= 0)
      stbi__out_gif_code(g, g->codes[code].prefix);

   if (g->cur_y >= g->max_y) return;

   p = &g->out[g->cur_x + g->cur_y];
   c = &g->color_table[g->codes[code].suffix * 4];

   if (c[3] >= 128) {
      p[0] = c[2];
      p[1] = c[1];
      p[2] = c[0];
      p[3] = c[3];
   }
   g->cur_x += 4;

   if (g->cur_x >= g->max_x) {
      g->cur_x = g->start_x;
      g->cur_y += g->step;

      while (g->cur_y >= g->max_y && g->parse > 0) {
         g->step = (1 << g->parse) * g->line_size;
         g->cur_y = g->start_y + (g->step >> 1);
         --g->parse;
      }
   }
}
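
/*
   Illustrative sketch, not part of stb_image: the row order produced by the
   interlace stepping at the end of stbi__out_gif_code, expressed in whole
   rows instead of byte offsets. The helper name gif_interlace_order is
   hypothetical; step and parse mirror the g->step and g->parse fields, with
   the same starting values that the image descriptor code assigns for
   interlaced frames (spacing 8, three further passes).

```c
// Fill order[0..height-1] with destination row indices, in the order the
// rows of an interlaced image arrive in the data stream.
static void gif_interlace_order(int height, int *order)
{
   int step = 8, parse = 3, y = 0, n = 0;
   while (n < height) {
      // when a pass runs off the bottom, start the next pass:
      // new spacing is 1 << parse and the pass begins at half that spacing
      while (y >= height && parse > 0) {
         step = 1 << parse;
         y = step >> 1;
         --parse;
      }
      if (y >= height) break;
      order[n++] = y;
      y += step;
   }
}
```

   For an 8-row image this yields rows 0, 4, 2, 6, 1, 3, 5, 7: one pass at
   spacing 8 from row 0, one at spacing 8 from row 4, one at spacing 4 from
   row 2, and one at spacing 2 from row 1.
*/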

static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
{
   stbi_uc lzw_cs;
   stbi__int32 len, code;
   stbi__uint32 first;
   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   stbi__gif_lzw *p;

   lzw_cs = stbi__get8(s);
   if (lzw_cs > 12) return NULL;
   clear = 1 << lzw_cs;
   first = 1;
   codesize = lzw_cs + 1;
   codemask = (1 << codesize) - 1;
   bits = 0;
   valid_bits = 0;
   for (code = 0; code < clear; code++) {
      g->codes[code].prefix = -1;
      g->codes[code].first = (stbi_uc) code;
      g->codes[code].suffix = (stbi_uc) code;
   }

   // support no starting clear code
   avail = clear+2;
   oldcode = -1;

   len = 0;
   for(;;) {
      if (valid_bits < codesize) {
         if (len == 0) {
            len = stbi__get8(s); // start new block
            if (len == 0)
               return g->out;
         }
         --len;
         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
         valid_bits += 8;
      } else {
         stbi__int32 code2 = bits & codemask;
         bits >>= codesize;
         valid_bits -= codesize;
         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
         if (code2 == clear) {  // clear code
            codesize = lzw_cs + 1;
            codemask = (1 << codesize) - 1;
            avail = clear + 2;
            oldcode = -1;
            first = 0;
         } else if (code2 == clear + 1) { // end of stream code
            stbi__skip(s, len);
            while ((len = stbi__get8(s)) > 0)
               stbi__skip(s,len);
            return g->out;
         } else if (code2 <= avail) {
            if (first) return stbi__errpuc("no clear code", "Corrupt GIF");

            if (oldcode >= 0) {
               p = &g->codes[avail++];
               if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
               p->prefix = (stbi__int16) oldcode;
               p->first = g->codes[oldcode].first;
               p->suffix = (code2 == avail) ? p->first : g->codes[code2].first;
            } else if (code2 == avail)
               return stbi__errpuc("illegal code in raster", "Corrupt GIF");

            stbi__out_gif_code(g, (stbi__uint16) code2);

            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
               codesize++;
               codemask = (1 << codesize) - 1;
            }

            oldcode = code2;
         } else {
            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
         }
      }
   }
}

static void stbi__fill_gif_background(stbi__gif *g)
{
   int i;
   stbi_uc *c = g->pal[g->bgindex];
   // @OPTIMIZE: write a dword at a time
   for (i = 0; i < g->w * g->h * 4; i += 4) {
      stbi_uc *p  = &g->out[i];
      p[0] = c[2];
      p[1] = c[1];
      p[2] = c[0];
      p[3] = c[3];
   }
}

// this function is designed to support animated gifs, although stb_image doesn't support it
static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
{
   int i;
   stbi_uc *old_out = 0;

   if (g->out == 0) {
      if (!stbi__gif_header(s, g, comp,0))     return 0; // stbi__g_failure_reason set by stbi__gif_header
      g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
      if (g->out == 0)                      return stbi__errpuc("outofmem", "Out of memory");
      stbi__fill_gif_background(g);
   } else {
      // animated-gif-only path
      if (((g->eflags & 0x1C) >> 2) == 3) {
         old_out = g->out;
         g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
         if (g->out == 0)                   return stbi__errpuc("outofmem", "Out of memory");
         memcpy(g->out, old_out, g->w*g->h*4);
      }
   }

   for (;;) {
      switch (stbi__get8(s)) {
         case 0x2C: /* Image Descriptor */
         {
            stbi__int32 x, y, w, h;
            stbi_uc *o;

            x = stbi__get16le(s);
            y = stbi__get16le(s);
            w = stbi__get16le(s);
            h = stbi__get16le(s);
            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");

            g->line_size = g->w * 4;
            g->start_x = x * 4;
            g->start_y = y * g->line_size;
            g->max_x   = g->start_x + w * 4;
            g->max_y   = g->start_y + h * g->line_size;
            g->cur_x   = g->start_x;
            g->cur_y   = g->start_y;

            g->lflags = stbi__get8(s);

            if (g->lflags & 0x40) {
               g->step = 8 * g->line_size; // first interlaced spacing
               g->parse = 3;
            } else {
               g->step = g->line_size;
               g->parse = 0;
            }

            if (g->lflags & 0x80) {
               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
               g->color_table = (stbi_uc *) g->lpal;
            } else if (g->flags & 0x80) {
               for (i=0; i < 256; ++i)  // @OPTIMIZE: reset only the previous transparent
                  g->pal[i][3] = 255;
               if (g->transparent >= 0 && (g->eflags & 0x01))
                  g->pal[g->transparent][3] = 0;
               g->color_table = (stbi_uc *) g->pal;
            } else
               return stbi__errpuc("missing color table", "Corrupt GIF");

            o = stbi__process_gif_raster(s, g);
            if (o == NULL) return NULL;

            if (req_comp && req_comp != 4)
               o = stbi__convert_format(o, 4, req_comp, g->w, g->h);
            return o;
         }

         case 0x21: // Extension Introducer (graphic control, comment, etc.)
         {
            int len;
            if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
               len = stbi__get8(s);
               if (len == 4) {
                  g->eflags = stbi__get8(s);
                  stbi__get16le(s); // delay
                  g->transparent = stbi__get8(s);
               } else {
                  stbi__skip(s, len);
                  break;
               }
            }
            while ((len = stbi__get8(s)) != 0)
               stbi__skip(s, len);
            break;
         }

         case 0x3B: // gif stream termination code
            return (stbi_uc *) s; // using '1' causes warning on some compilers

         default:
            return stbi__errpuc("unknown code", "Corrupt GIF");
      }
   }
}

static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *u = 0;
   stbi__gif g;
   memset(&g, 0, sizeof(g));

   u = stbi__gif_load_next(s, &g, comp, req_comp);
   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   if (u) {
      *x = g.w;
      *y = g.h;
   }

   return u;
}

static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
{
   return stbi__gif_info_raw(s,x,y,comp);
}
#endif

// *************************************************************************************************
// Radiance RGBE HDR loader
// originally by Nicolas Schulz
#ifndef STBI_NO_HDR
static int stbi__hdr_test_core(stbi__context *s)
{
   const char *signature = "#?RADIANCE\n";
   int i;
   for (i=0; signature[i]; ++i)
      if (stbi__get8(s) != signature[i])
         return 0;
   return 1;
}

static int stbi__hdr_test(stbi__context* s)
{
   int r = stbi__hdr_test_core(s);
   stbi__rewind(s);
   return r;
}

#define STBI__HDR_BUFLEN  1024
static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
{
   int len=0;
   char c = '\0';

   c = (char) stbi__get8(z);

   while (!stbi__at_eof(z) && c != '\n') {
      buffer[len++] = c;
      if (len == STBI__HDR_BUFLEN-1) {
         // flush to end of line
         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
            ;
         break;
      }
      c = (char) stbi__get8(z);
   }

   buffer[len] = 0;
   return buffer;
}

static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
{
   if ( input[3] != 0 ) {
      float f1;
      // Exponent
      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
      if (req_comp <= 2)
         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
      else {
         output[0] = input[0] * f1;
         output[1] = input[1] * f1;
         output[2] = input[2] * f1;
      }
      if (req_comp == 2) output[1] = 1;
      if (req_comp == 4) output[3] = 1;
   } else {
      switch (req_comp) {
         case 4: output[3] = 1; /* fallthrough */
         case 3: output[0] = output[1] = output[2] = 0;
                 break;
         case 2: output[1] = 1; /* fallthrough */
         case 1: output[0] = 0;
                 break;
      }
   }
}
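
/*
   Illustrative sketch, not part of stb_image: the three-component branch of
   stbi__hdr_convert as a standalone function, to show the RGBE arithmetic in
   isolation. The name rgbe_to_float is hypothetical. Each pixel stores three
   8-bit mantissas and one shared exponent byte; the scale factor is
   2^(E - 128 - 8), where 128 is the exponent bias and the extra 8 treats the
   mantissa as a fraction of 256.

```c
#include <math.h>

// Convert one RGBE pixel to three linear float components.
static void rgbe_to_float(const unsigned char rgbe[4], float rgb[3])
{
   if (rgbe[3] != 0) {
      // shared scale factor 2^(E - 136)
      float scale = (float) ldexp(1.0, (int) rgbe[3] - (128 + 8));
      rgb[0] = rgbe[0] * scale;
      rgb[1] = rgbe[1] * scale;
      rgb[2] = rgbe[2] * scale;
   } else {
      // an exponent byte of zero encodes black
      rgb[0] = rgb[1] = rgb[2] = 0.0f;
   }
}
```

   With the bytes (128, 64, 32, 129) the scale is 2^-7, so the result is
   (1.0, 0.5, 0.25).
*/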

static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   char buffer[STBI__HDR_BUFLEN];
   char *token;
   int valid = 0;
   int width, height;
   stbi_uc *scanline;
   float *hdr_data;
   int len;
   unsigned char count, value;
   int i, j, k, c1,c2, z;


   // Check identifier
   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
      return stbi__errpf("not HDR", "Corrupt HDR image");

   // Parse header
   for(;;) {
      token = stbi__hdr_gettoken(s,buffer);
      if (token[0] == 0) break;
      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   }

   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");

   // Parse width and height
   // can't use sscanf() if we're not using stdio!
   token = stbi__hdr_gettoken(s,buffer);
   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   token += 3;
   height = (int) strtol(token, &token, 10);
   while (*token == ' ') ++token;
   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   token += 3;
   width = (int) strtol(token, NULL, 10);

   *x = width;
   *y = height;

   if (comp) *comp = 3;
   if (req_comp == 0) req_comp = 3;

   // Read data
   hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
   if (hdr_data == NULL) return stbi__errpf("outofmem", "Out of memory");

   // Load image data
   // image data is stored as some number of scanlines
   if ( width < 8 || width >= 32768) {
      // Read flat data
      for (j=0; j < height; ++j) {
         for (i=0; i < width; ++i) {
            stbi_uc rgbe[4];
           main_decode_loop:
            stbi__getn(s, rgbe, 4);
            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
         }
      }
   } else {
      // Read RLE-encoded data
      scanline = NULL;

      for (j = 0; j < height; ++j) {
         c1 = stbi__get8(s);
         c2 = stbi__get8(s);
         len = stbi__get8(s);
         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
            // not run-length encoded, so we have to actually use THIS data as a decoded
            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
            stbi_uc rgbe[4];
            rgbe[0] = (stbi_uc) c1;
            rgbe[1] = (stbi_uc) c2;
            rgbe[2] = (stbi_uc) len;
            rgbe[3] = (stbi_uc) stbi__get8(s);
            stbi__hdr_convert(hdr_data, rgbe, req_comp);
            i = 1;
            j = 0;
            STBI_FREE(scanline);
            goto main_decode_loop; // yes, this makes no sense
         }
         len <<= 8;
         len |= stbi__get8(s);
         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
         if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4);
         if (scanline == NULL) { STBI_FREE(hdr_data); return stbi__errpf("outofmem", "Out of memory"); }

         for (k = 0; k < 4; ++k) {
            i = 0;
            while (i < width) {
               count = stbi__get8(s);
               if (count > 128) {
                  // Run
                  value = stbi__get8(s);
                  count -= 128;
                  // a run must not extend past the end of the scanline buffer
                  if (count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
                  for (z = 0; z < count; ++z)
                     scanline[i++ * 4 + k] = value;
               } else {
                  // Dump
                  // a zero count makes no progress (and loops forever at EOF); an oversized one overruns the buffer
                  if (count == 0 || count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
                  for (z = 0; z < count; ++z)
                     scanline[i++ * 4 + k] = stbi__get8(s);
               }
            }
         }
         for (i=0; i < width; ++i)
            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
      }
      STBI_FREE(scanline);
   }

   return hdr_data;
}

static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
{
   char buffer[STBI__HDR_BUFLEN];
   char *token;
   int valid = 0;

   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
       stbi__rewind( s );
       return 0;
   }

   for(;;) {
      token = stbi__hdr_gettoken(s,buffer);
      if (token[0] == 0) break;
      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   }

   if (!valid) {
       stbi__rewind( s );
       return 0;
   }
   token = stbi__hdr_gettoken(s,buffer);
   if (strncmp(token, "-Y ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *y = (int) strtol(token, &token, 10);
   while (*token == ' ') ++token;
   if (strncmp(token, "+X ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *x = (int) strtol(token, NULL, 10);
   *comp = 3;
   return 1;
}
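
/*
   Illustrative sketch, not part of stb_image: the resolution-line parse that
   stbi__hdr_load and stbi__hdr_info both perform, pulled out into one helper.
   The name hdr_parse_resolution is hypothetical. As in the loader, only the
   "-Y <height> +X <width>" layout is accepted, and strtol is used so that no
   stdio function is needed.

```c
#include <stdlib.h>
#include <string.h>

// Parse a line of the form "-Y <height> +X <width>".
// Returns 1 and fills width/height on success, 0 for any other layout.
static int hdr_parse_resolution(const char *line, int *width, int *height)
{
   char *end;
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   *height = (int) strtol(line + 3, &end, 10);   // end is left just past the digits
   while (*end == ' ') ++end;                    // tolerate extra spaces before +X
   if (strncmp(end, "+X ", 3) != 0) return 0;
   *width = (int) strtol(end + 3, NULL, 10);
   return 1;
}
```

   For the line "-Y 768 +X 1024" this reports a width of 1024 and a height of
   768. Like the loader, the sketch does not reject zero or negative sizes;
   a caller that needs that would have to check the results itself.
*/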
#endif // STBI_NO_HDR

#ifndef STBI_NO_BMP
static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
{
   int hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s,12);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
       stbi__rewind( s );
       return 0;
   }
   if (hsz == 12) {
      *x = stbi__get16le(s);
      *y = stbi__get16le(s);
   } else {
      *x = stbi__get32le(s);
      *y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   *comp = stbi__get16le(s) / 8;
   return 1;
}
#endif

#ifndef STBI_NO_PSD
static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
{
   int channelCount;
   if (stbi__get32be(s) != 0x38425053) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s, 6);
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16) {
       stbi__rewind( s );
       return 0;
   }
   *y = stbi__get32be(s);
   *x = stbi__get32be(s);
   if (stbi__get16be(s) != 8) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 3) {
       stbi__rewind( s );
       return 0;
   }
   *comp = 4;
   return 1;
}
#endif

#ifndef STBI_NO_PIC
static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
{
   int act_comp=0,num_packets=0,chained;
   stbi__pic_packet packets[10];

   stbi__skip(s, 92);

   *x = stbi__get16be(s);
   *y = stbi__get16be(s);
   // every failure path rewinds so the next format probe in stbi__info_main starts at the beginning
   if (stbi__at_eof(s)) {
       stbi__rewind( s );
       return 0;
   }
   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
       stbi__rewind( s );
       return 0;
   }

   stbi__skip(s, 8);

   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
          stbi__rewind( s );
          return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
#endif

// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support comments in the header section
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;
   *x = s->img_x;
   *y = s->img_y;
   *comp = s->img_n;

   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
      *c = (char) stbi__get8(s);
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif
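
/*
   Illustrative sketch, not part of stb_image: the binary PGM/PPM header
   layout that stbi__pnm_info walks through, parsed from a plain memory
   buffer. The names pnm_read_int and pnm_parse_header are hypothetical. The
   sketch keeps the limitations listed above (no header comments, P5/P6 only,
   max value at most 255) and additionally rejects zero dimensions, which the
   loader itself does not do. It assumes the header numbers are small enough
   not to overflow an int.

```c
#include <stddef.h>

static int pnm_is_space(unsigned char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Skip whitespace, then read a decimal integer starting at pos.
// Returns the position of the first byte after the digits.
static size_t pnm_read_int(const unsigned char *buf, size_t len, size_t pos, int *value)
{
   while (pos < len && pnm_is_space(buf[pos])) ++pos;
   *value = 0;
   while (pos < len && buf[pos] >= '0' && buf[pos] <= '9')
      *value = *value * 10 + (buf[pos++] - '0');
   return pos;
}

// Parse "P5" or "P6", width, height and max value.
// On success returns 1 and sets data_offset to the first byte of pixel data.
static int pnm_parse_header(const unsigned char *buf, size_t len,
                            int *w, int *h, int *comp, size_t *data_offset)
{
   int maxv;
   size_t pos = 2;
   if (len < 3 || buf[0] != 'P' || (buf[1] != '5' && buf[1] != '6')) return 0;
   *comp = (buf[1] == '6') ? 3 : 1;   // P5 is 1-component PGM, P6 is 3-component PPM
   pos = pnm_read_int(buf, len, pos, w);
   pos = pnm_read_int(buf, len, pos, h);
   pos = pnm_read_int(buf, len, pos, &maxv);
   if (*w <= 0 || *h <= 0 || maxv <= 0 || maxv > 255) return 0;
   if (pos >= len) return 0;          // a single whitespace byte must follow the max value
   *data_offset = pos + 1;
   return 1;
}
```

   For a buffer beginning "P6\n2 3\n255\n" the sketch reports a 2x3 image with
   3 components and pixel data starting at byte offset 11.
*/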

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

// add in my DDS loading support
#ifndef STBI_NO_DDS
#include "stbi_DDS_c.h"
#endif

// add in my pvr loading support
#ifndef STBI_NO_PVR
#include "stbi_pvr_c.h"
#endif

// add in my pkm ( ETC1 ) loading support
#ifndef STBI_NO_PKM
#include "stbi_pkm_c.h"
#endif

#ifndef STBI_NO_EXT
#include "stbi_ext_c.h"
#endif

#endif // STB_IMAGE_IMPLEMENTATION
6390 
6391 /*
6392    revision history:
6393       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
6394       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
6395       2.03  (2015-04-12) extra corruption checking (mmozeiko)
6396                          stbi_set_flip_vertically_on_load (nguillemot)
6397                          fix NEON support; fix mingw support
6398       2.02  (2015-01-19) fix incorrect assert, fix warning
6399       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
6400       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
6401       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
6402                          progressive JPEG (stb)
6403                          PGM/PPM support (Ken Miller)
6404                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
6405                          GIF bugfix -- seemingly never worked
6406                          STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/