1 /* stb_image - v2.12 - public domain image loader - http://nothings.org/stb_image.h
2                                      no warranty implied; use at your own risk
3 
4    Do this:
5       #define STB_IMAGE_IMPLEMENTATION
6    before you include this file in *one* C or C++ file to create the implementation.
7 
8    // i.e. it should look like this:
9    #include ...
10    #include ...
11    #include ...
12    #define STB_IMAGE_IMPLEMENTATION
13    #include "stb_image.h"
14 
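   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
   You can also #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using
   malloc, realloc, and free. If you define any of them, define all three
   (STBI_REALLOC_SIZED may stand in for STBI_REALLOC).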
17 
18 
19    QUICK NOTES:
20       Primarily of interest to game developers and other people who can
21           avoid problematic images and only need the trivial interface
22 
23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24       PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26       TGA (not sure what subset, if a subset)
27       BMP non-1bpp, non-RLE
28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29 
30       GIF (*comp always reports as 4-channel)
31       HDR (radiance rgbE format)
32       PIC (Softimage PIC)
33       PNM (PPM and PGM binary only)
34 
35       Animated GIF still needs a proper API, but here's one way to do it:
36           http://gist.github.com/urraka/685d9a6340b26b830d49
37 
38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39       - decode from arbitrary I/O callbacks
40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41 
42    Full documentation under "DOCUMENTATION" below.
43 
44 
45    Revision 2.00 release notes:
46 
47       - Progressive JPEG is now supported.
48 
49       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50 
51       - x86 platforms now make use of SSE2 SIMD instructions for
52         JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53         This work was done by Fabian "ryg" Giesen. SSE2 is used by
54         default, but NEON must be enabled explicitly; see docs.
55 
56         With other JPEG optimizations included in this version, we see
57         2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58         on a JPEG on an ARM machine, relative to previous versions of this
59         library. The same results will not obtain for all JPGs and for all
60         x86/ARM machines. (Note that progressive JPEGs are significantly
61         slower to decode than regular JPEGs.) This doesn't mean that this
62         is the fastest JPEG decoder in the land; rather, it brings it
63         closer to parity with standard libraries. If you want the fastest
64         decode, look elsewhere. (See "Philosophy" section of docs below.)
65 
66         See final bullet items below for more info on SIMD.
67 
68       - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69         the memory allocator. Unlike other STBI libraries, these macros don't
70         support a context parameter, so if you need to pass a context in to
71         the allocator, you'll have to store it in a global or a thread-local
72         variable.
73 
74       - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75         STBI_NO_LINEAR.
76             STBI_NO_HDR:     suppress implementation of .hdr reader format
77             STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
78 
79       - You can suppress implementation of any of the decoders to reduce
80         your code footprint by #defining one or more of the following
81         symbols before creating the implementation.
82 
83             STBI_NO_JPEG
84             STBI_NO_PNG
85             STBI_NO_BMP
86             STBI_NO_PSD
87             STBI_NO_TGA
88             STBI_NO_GIF
89             STBI_NO_HDR
90             STBI_NO_PIC
91             STBI_NO_PNM   (.ppm and .pgm)
92 
93       - You can request *only* certain decoders and suppress all other ones
94         (this will be more forward-compatible, as addition of new decoders
95         doesn't require you to disable them explicitly):
96 
97             STBI_ONLY_JPEG
98             STBI_ONLY_PNG
99             STBI_ONLY_BMP
100             STBI_ONLY_PSD
101             STBI_ONLY_TGA
102             STBI_ONLY_GIF
103             STBI_ONLY_HDR
104             STBI_ONLY_PIC
105             STBI_ONLY_PNM   (.ppm and .pgm)
106 
107          Note that you can define multiples of these, and you will get all
108          of them ("only x" and "only y" is interpreted to mean "only x&y").
109 
110        - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111          want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112 
113       - Compilation of all SIMD code can be suppressed with
114             #define STBI_NO_SIMD
115         It should not be necessary to disable SIMD unless you have issues
116         compiling (e.g. using an x86 compiler which doesn't support SSE
117         intrinsics or that doesn't support the method used to detect
118         SSE2 support at run-time), and even those can be reported as
119         bugs so I can refine the built-in compile-time checking to be
120         smarter.
121 
122       - The old STBI_SIMD system which allowed installing a user-defined
123         IDCT etc. has been removed. If you need this, don't upgrade. My
124         assumption is that almost nobody was doing this, and those who
125         were will find the built-in SIMD more satisfactory anyway.
126 
127       - RGB values computed for JPEG images are slightly different from
128         previous versions of stb_image. (This is due to using less
129         integer precision in SIMD.) The C code has been adjusted so
130         that the same RGB values will be computed regardless of whether
131         SIMD support is available, so your app should always produce
132         consistent results. But these results are slightly different from
133         previous versions. (Specifically, about 3% of available YCbCr values
134         will compute different RGB results from pre-1.49 versions by +-1;
135         most of the deviating values are one smaller in the G channel.)
136 
137       - If you must produce consistent results with previous versions of
138         stb_image, #define STBI_JPEG_OLD and you will get the same results
139         you used to; however, you will not get the SIMD speedups for
140         the YCbCr-to-RGB conversion step (although you should still see
141         significant JPEG speedup from the other changes).
142 
143         Please note that STBI_JPEG_OLD is a temporary feature; it will be
144         removed in future versions of the library. It is only intended for
145         near-term back-compatibility use.
146 
147 
148    Latest revision history:
149       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
      2.11  (2016-04-02) 16-bit PNGs; enable SSE2 in non-gcc x64
151                          RGB-format JPEG; remove white matting in PSD;
152                          allocate large structures on the stack;
153                          correct channel count for PNG & BMP
154       2.10  (2016-01-22) avoid warning introduced in 2.09
155       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
156       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
157       2.07  (2015-09-13) partial animated GIF support
158                          limited 16-bit PSD support
159                          minor bugs, code cleanup, and compiler warnings
160       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
161       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
162       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
163       2.03  (2015-04-12) additional corruption checking
164                          stbi_set_flip_vertically_on_load
165                          fix NEON support; fix mingw support
166       2.02  (2015-01-19) fix incorrect assert, fix warning
167       2.01  (2015-01-17) fix various warnings
168       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
169       2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
170                          progressive JPEG
171                          PGM/PPM support
172                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
173                          STBI_NO_*, STBI_ONLY_*
174                          GIF bugfix
175 
176    See end of file for full revision history.
177 
178 
179  ============================    Contributors    =========================
180 
181  Image formats                          Extensions, features
182     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
183     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
184     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
185     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
186     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
187     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
188     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
189     urraka@github (animated gif)           Junggon Kim (PNM comments)
190                                            Daniel Gibson (16-bit TGA)
191 
192  Optimizations & bugfixes
193     Fabian "ryg" Giesen
194     Arseny Kapoulkine
195 
196  Bug & warning fixes
197     Marc LeBlanc            David Woo          Guillaume George   Martins Mozeiko
198     Christpher Lloyd        Martin Golini      Jerry Jansson      Joseph Thomson
199     Dave Moore              Roy Eltham         Hayaki Saito       Phil Jordan
200     Won Chun                Luke Graham        Johan Duparc       Nathan Reed
201     the Horde3D community   Thomas Ruf         Ronny Chevalier    Nick Verigakis
202     Janez Zemva             John Bartholomew   Michal Cichon      svdijk@github
203     Jonathan Blow           Ken Hamada         Tero Hanninen      Baldur Karlsson
204     Laurent Gomila          Cort Stratton      Sergio Gonzalez    romigrou@github
205     Aruelien Pocheville     Thibault Reuille   Cass Everitt       Matthew Gregan
206     Ryamond Barbiero        Paul Du Bois       Engin Manap        snagar@github
207     Michaelangel007@github  Oriol Ferrer Mesia socks-the-fox
208     Blazej Dariusz Roszkowski
209 
210 
211 LICENSE
212 
213 This software is dual-licensed to the public domain and under the following
214 license: you are granted a perpetual, irrevocable license to copy, modify,
215 publish, and distribute this file as you see fit.
216 
217 */
218 
219 #ifndef STBI_INCLUDE_STB_IMAGE_H
220 #define STBI_INCLUDE_STB_IMAGE_H
221 
222 // DOCUMENTATION
223 //
224 // Limitations:
225 //    - no 16-bit-per-channel PNG
226 //    - no 12-bit-per-channel JPEG
227 //    - no JPEGs with arithmetic coding
228 //    - no 1-bit BMP
229 //    - GIF always returns *comp=4
230 //
231 // Basic usage (see HDR discussion below for HDR usage):
//    int x,y,n;
//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
//    // ... process data if not NULL ...
//    // ... x = width, y = height, n = # 8-bit components per pixel ...
//    // ... replace '0' with '1'..'4' to force that many components per pixel
//    // ... but 'n' will always be the number that it would have been if you said 0
//    stbi_image_free(data);
239 //
240 // Standard parameters:
241 //    int *x       -- outputs image width in pixels
242 //    int *y       -- outputs image height in pixels
243 //    int *comp    -- outputs # of image components in image file
244 //    int req_comp -- if non-zero, # of image components requested in result
245 //
246 // The return value from an image loader is an 'unsigned char *' which points
247 // to the pixel data, or NULL on an allocation failure or if the image is
248 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
249 // with each pixel consisting of N interleaved 8-bit components; the first
250 // pixel pointed to is top-left-most in the image. There is no padding between
251 // image scanlines or between pixels, regardless of format. The number of
252 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
253 // If req_comp is non-zero, *comp has the number of components that _would_
254 // have been output otherwise. E.g. if you set req_comp to 4, you will always
255 // get RGBA output, but you can check *comp to see if it's trivially opaque
256 // because e.g. there were only 3 channels in the source image.
257 //
258 // An output image with N components has the following components interleaved
259 // in this order in each pixel:
260 //
261 //     N=#comp     components
262 //       1           grey
263 //       2           grey, alpha
264 //       3           red, green, blue
265 //       4           red, green, blue, alpha
266 //
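/*
   The layout described above (scanlines top to bottom, N interleaved 8-bit
   components per pixel, no padding) implies a simple index calculation. The
   following is a minimal sketch of that arithmetic, not code from stb_image
   itself; demo_pixel_offset and demo_alpha_at are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

// Byte offset of component c of pixel (px, py) in a tightly packed image
// that is w pixels wide with n interleaved 8-bit components per pixel.
static size_t demo_pixel_offset(int w, int n, int px, int py, int c)
{
   return ((size_t) py * (size_t) w + (size_t) px) * (size_t) n + (size_t) c;
}

// Read the alpha of a pixel from RGBA (n == 4) data; alpha is component 3.
static unsigned char demo_alpha_at(const unsigned char *data, int w, int px, int py)
{
   return data[demo_pixel_offset(w, 4, px, py, 3)];
}
```

   With data loaded using req_comp == 4, demo_alpha_at(data, x, px, py) would
   read the alpha of one pixel, provided 0 <= px < x and 0 <= py < y.
*/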
267 // If image loading fails for any reason, the return value will be NULL,
268 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
269 // can be queried for an extremely brief, end-user unfriendly explanation
270 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
271 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
272 // more user-friendly ones.
273 //
274 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
275 //
276 // ===========================================================================
277 //
278 // Philosophy
279 //
280 // stb libraries are designed with the following priorities:
281 //
282 //    1. easy to use
283 //    2. easy to maintain
284 //    3. good performance
285 //
286 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
287 // and for best performance I may provide less-easy-to-use APIs that give higher
288 // performance, in addition to the easy to use ones. Nevertheless, it's important
289 // to keep in mind that from the standpoint of you, a client of this library,
290 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
291 //
292 // Some secondary priorities arise directly from the first two, some of which
293 // make more explicit reasons why performance can't be emphasized.
294 //
295 //    - Portable ("ease of use")
296 //    - Small footprint ("easy to maintain")
297 //    - No dependencies ("ease of use")
298 //
299 // ===========================================================================
300 //
301 // I/O callbacks
302 //
303 // I/O callbacks allow you to read from arbitrary sources, like packaged
304 // files or some other source. Data read from callbacks are processed
305 // through a small internal buffer (currently 128 bytes) to try to reduce
306 // overhead.
307 //
308 // The three functions you must define are "read" (reads some bytes of data),
309 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
310 //
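/*
   As an illustration of the three callbacks, here is a sketch that serves
   bytes from a memory range. It is an assumption-laden example, not part of
   the library: demo_io_callbacks is a local mirror of the callback struct
   declared later in this header (so the sketch compiles on its own), and
   demo_mem_stream and the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <string.h>

// Local mirror of the three-callback struct, for a standalone build.
typedef struct {
   int  (*read)(void *user, char *data, int size);
   void (*skip)(void *user, int n);
   int  (*eof) (void *user);
} demo_io_callbacks;

// Hypothetical user data: a byte range plus a read cursor.
typedef struct {
   const unsigned char *bytes;
   int len;
   int pos;
} demo_mem_stream;

// Copy up to 'size' bytes into 'data'; return how many were actually copied.
static int demo_read(void *user, char *data, int size)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int avail = s->len - s->pos;
   if (size > avail) size = avail;
   if (size <= 0) return 0;
   memcpy(data, s->bytes + s->pos, (size_t) size);
   s->pos += size;
   return size;
}

// Move the cursor forward by n bytes, or backward if n is negative.
// The result is clamped to the valid range of the stream.
static void demo_skip(void *user, int n)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   int target = s->pos + n;
   if (target < 0) target = 0;
   if (target > s->len) target = s->len;
   s->pos = target;
}

// Nonzero once every byte has been consumed.
static int demo_eof(void *user)
{
   demo_mem_stream *s = (demo_mem_stream *) user;
   return s->pos >= s->len;
}

static const demo_io_callbacks demo_mem_callbacks = { demo_read, demo_skip, demo_eof };
```

   With the real header, the same three functions would be stored in a
   stbi_io_callbacks and passed, together with a pointer to the stream
   state, to stbi_load_from_callbacks.
*/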
311 // ===========================================================================
312 //
313 // SIMD support
314 //
315 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
316 // supported by the compiler. For ARM Neon support, you must explicitly
317 // request it.
318 //
319 // (The old do-it-yourself SIMD API is no longer supported in the current
320 // code.)
321 //
322 // On x86, SSE2 will automatically be used when available based on a run-time
323 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
324 // the typical path is to have separate builds for NEON and non-NEON devices
325 // (at least this is true for iOS and Android). Therefore, the NEON support is
326 // toggled by a build flag: define STBI_NEON to get NEON loops.
327 //
// The output of the JPEG decoder is slightly different from versions before
// SIMD support was introduced (that is, versions before 1.49). The
// difference is only +-1 in the 8-bit RGB channels, and only on a small
// fraction of pixels. You can force the pre-1.49 behavior by defining
// STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
// and hence cost some performance.
//
// If for some reason you do not want to use any of the SIMD code, or if
// you have issues compiling it, you can disable it entirely by
// defining STBI_NO_SIMD.
338 //
339 // ===========================================================================
340 //
341 // HDR image support   (disable by defining STBI_NO_HDR)
342 //
// stb_image supports loading HDR images. The only HDR format currently
// implemented is Radiance .HDR, but the support is written generically.
// You can still load any file through the existing interface;
// if you attempt to load an HDR file, it will be automatically remapped to
// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
// both of these constants can be reconfigured through this interface:
//
//     stbi_hdr_to_ldr_gamma(2.2f);
//     stbi_hdr_to_ldr_scale(1.0f);
//
// (Note: do not pass _inverse_ constants; stb_image will invert them
// appropriately.)
355 //
356 // Additionally, there is a new, parallel interface for loading files as
357 // (linear) floats to preserve the full dynamic range:
358 //
359 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
360 //
361 // If you load LDR images through this interface, those images will
362 // be promoted to floating point values, run through the inverse of
363 // constants corresponding to the above:
364 //
365 //     stbi_ldr_to_hdr_scale(1.0f);
366 //     stbi_ldr_to_hdr_gamma(2.2f);
367 //
368 // Finally, given a filename (or an open file or memory block--see header
369 // file for details) containing image data, you can query for the "most
370 // appropriate" interface to use (that is, whether the image is HDR or
371 // not), using:
372 //
//     stbi_is_hdr(char const *filename);
374 //
375 // ===========================================================================
376 //
377 // iPhone PNG support:
378 //
// By default we convert iPhone-formatted PNGs back to RGB, even though
// they are internally encoded differently. You can disable this conversion
// by calling stbi_convert_iphone_png_to_rgb(0), in which case
// you will always get the native iPhone "format" passed through (which
// is BGR stored in RGB).
384 //
385 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
386 // pixel to remove any premultiplied alpha *only* if the image file explicitly
387 // says there's premultiplied data (currently only happens in iPhone images,
388 // and only if iPhone convert-to-rgb processing is on).
389 //
390 
391 
392 #ifndef STBI_NO_STDIO
393 #include <stdio.h>
394 #endif // STBI_NO_STDIO
395 
396 #define STBI_VERSION 1
397 
398 enum
399 {
400    STBI_default = 0, // only used for req_comp
401 
402    STBI_grey       = 1,
403    STBI_grey_alpha = 2,
404    STBI_rgb        = 3,
405    STBI_rgb_alpha  = 4
406 };
407 
408 typedef unsigned char stbi_uc;
409 
410 #ifdef __cplusplus
411 extern "C" {
412 #endif
413 
414 #ifdef STB_IMAGE_STATIC
415 #define STBIDEF static
416 #else
417 #define STBIDEF extern
418 #endif
419 
420 //////////////////////////////////////////////////////////////////////////////
421 //
422 // PRIMARY API - works on images of any type
423 //
424 
425 //
426 // load image by filename, open file, or memory buffer
427 //
428 
429 typedef struct
430 {
431    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
432    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
433    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
434 } stbi_io_callbacks;
435 
436 STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
437 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
438 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
439 
440 #ifndef STBI_NO_STDIO
441 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
442 // for stbi_load_from_file, file pointer is left pointing immediately after image
443 #endif
444 
445 #ifndef STBI_NO_LINEAR
446    STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
447    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
448    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
449 
450    #ifndef STBI_NO_STDIO
451    STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
452    #endif
453 #endif
454 
455 #ifndef STBI_NO_HDR
456    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
457    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
458 #endif // STBI_NO_HDR
459 
460 #ifndef STBI_NO_LINEAR
461    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
462    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
463 #endif // STBI_NO_LINEAR
464 
465 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
466 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
467 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
468 #ifndef STBI_NO_STDIO
469 STBIDEF int      stbi_is_hdr          (char const *filename);
470 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
471 #endif // STBI_NO_STDIO
472 
473 
474 // get a VERY brief reason for failure
475 // NOT THREADSAFE
476 STBIDEF const char *stbi_failure_reason  (void);
477 
// free the loaded image -- this is just free(), or STBI_FREE if you defined it
479 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
480 
481 // get image dimensions & components without fully decoding
482 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
483 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
484 
485 #ifndef STBI_NO_STDIO
486 STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
487 STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
488 
489 #endif
490 
491 
492 
// for image formats that explicitly notate that they have premultiplied alpha,
// we just return the colors as stored in the file. set this flag to force
// unpremultiplication. results are undefined if the unpremultiply overflows.
496 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
497 
498 // indicate whether we should process iphone images back to canonical format,
499 // or just pass them through "as-is"
500 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
501 
502 // flip the image vertically, so the first pixel in the output array is the bottom left
503 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
504 
505 // ZLIB client - used by PNG, available for other purposes
506 
507 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
508 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
509 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
510 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
511 
512 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
513 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
514 
515 
516 #ifdef __cplusplus
517 }
518 #endif
519 
520 //
521 //
522 ////   end header file   /////////////////////////////////////////////////////
523 #endif // STBI_INCLUDE_STB_IMAGE_H
524 
525 #ifdef STB_IMAGE_IMPLEMENTATION
526 
527 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
528   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
529   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
530   || defined(STBI_ONLY_ZLIB)
531    #ifndef STBI_ONLY_JPEG
532    #define STBI_NO_JPEG
533    #endif
534    #ifndef STBI_ONLY_PNG
535    #define STBI_NO_PNG
536    #endif
537    #ifndef STBI_ONLY_BMP
538    #define STBI_NO_BMP
539    #endif
540    #ifndef STBI_ONLY_PSD
541    #define STBI_NO_PSD
542    #endif
543    #ifndef STBI_ONLY_TGA
544    #define STBI_NO_TGA
545    #endif
546    #ifndef STBI_ONLY_GIF
547    #define STBI_NO_GIF
548    #endif
549    #ifndef STBI_ONLY_HDR
550    #define STBI_NO_HDR
551    #endif
552    #ifndef STBI_ONLY_PIC
553    #define STBI_NO_PIC
554    #endif
555    #ifndef STBI_ONLY_PNM
556    #define STBI_NO_PNM
557    #endif
558 #endif
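
/*
   The STBI_ONLY_* handling above can be tried in isolation. The sketch below
   repeats the same preprocessor pattern for just three formats and reports
   which decoders remain enabled. The DEMO_* flags and demo_enabled_decoders
   are hypothetical names used only for this illustration.

```c
#include <assert.h>

// Client side: request only two decoders.
#define STBI_ONLY_PNG
#define STBI_ONLY_JPEG

// Same shape as the header's logic, cut down to three formats for brevity.
#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_BMP
   #define STBI_NO_BMP
   #endif
#endif

// Hypothetical bit flags, used only to report which decoders survived.
enum { DEMO_JPEG = 1, DEMO_PNG = 2, DEMO_BMP = 4 };

static int demo_enabled_decoders(void)
{
   int mask = 0;
#ifndef STBI_NO_JPEG
   mask |= DEMO_JPEG;
#endif
#ifndef STBI_NO_PNG
   mask |= DEMO_PNG;
#endif
#ifndef STBI_NO_BMP
   mask |= DEMO_BMP;
#endif
   return mask;
}
```

   Defining two STBI_ONLY_* symbols leaves exactly those two decoders enabled,
   which matches the "only x&y" reading given in the release notes.
*/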
559 
560 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
561 #define STBI_NO_ZLIB
562 #endif
563 
564 
565 #include <stdarg.h>
566 #include <stddef.h> // ptrdiff_t on osx
567 #include <stdlib.h>
568 #include <string.h>
569 
570 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
571 #include <math.h>  // ldexp
572 #endif
573 
574 #ifndef STBI_NO_STDIO
575 #include <stdio.h>
576 #endif
577 
578 #ifndef STBI_ASSERT
579 #include <assert.h>
580 #define STBI_ASSERT(x) assert(x)
581 #endif
582 
583 
584 #ifndef _MSC_VER
585    #ifdef __cplusplus
586    #define stbi_inline inline
587    #else
588    #define stbi_inline
589    #endif
590 #else
591    #define stbi_inline __forceinline
592 #endif
593 
594 
595 #ifdef _MSC_VER
596 typedef unsigned short stbi__uint16;
597 typedef   signed short stbi__int16;
598 typedef unsigned int   stbi__uint32;
599 typedef   signed int   stbi__int32;
600 #else
601 #include <stdint.h>
602 typedef uint16_t stbi__uint16;
603 typedef int16_t  stbi__int16;
604 typedef uint32_t stbi__uint32;
605 typedef int32_t  stbi__int32;
606 #endif
607 
608 // should produce compiler error if size is wrong
609 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
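
/*
   The typedef above is a compile-time check: a false condition yields an
   array of size -1, which the compiler rejects. A small sketch of the same
   trick wrapped in a macro follows; DEMO_STATIC_CHECK and the typedef names
   are hypothetical, and the sketch assumes a platform where unsigned int is
   4 bytes.

```c
#include <assert.h>

// Fails to compile when the condition is false, because the array size
// becomes -1. The typedef name shows up in the compiler's error message.
#define DEMO_STATIC_CHECK(name, cond) typedef unsigned char name[(cond) ? 1 : -1]

DEMO_STATIC_CHECK(demo_uint_is_4_bytes, sizeof(unsigned int) == 4);
DEMO_STATIC_CHECK(demo_uchar_is_1_byte, sizeof(unsigned char) == 1);

static int demo_checks_passed(void)
{
   // Reaching run time at all means every check above held.
   return sizeof(demo_uint_is_4_bytes) == 1 && sizeof(demo_uchar_is_1_byte) == 1;
}
```
*/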
610 
611 #ifdef _MSC_VER
612 #define STBI_NOTUSED(v)  (void)(v)
613 #else
614 #define STBI_NOTUSED(v)  (void)sizeof(v)
615 #endif
616 
617 #ifdef _MSC_VER
618 #define STBI_HAS_LROTL
619 #endif
620 
621 #ifdef STBI_HAS_LROTL
622    #define stbi_lrot(x,y)  _lrotl(x,y)
623 #else
624    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
625 #endif
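
/*
   A note on the fallback macro above: it shifts right by (32 - y), so a
   rotate count of 0 would shift by the full width of the type, which C
   leaves undefined. Whether the library ever passes 0 is not visible in
   this part of the file. The following sketch shows one common way to write
   a rotate that is defined for every count; demo_rotl32 is a hypothetical
   name and the code assumes a 32-bit unsigned int.

```c
#include <assert.h>

// Rotate a 32-bit value left by y bits. Masking keeps both shift counts in
// the range 0..31, so y == 0 never produces a shift by 32.
static unsigned int demo_rotl32(unsigned int x, unsigned int y)
{
   y &= 31u;
   return (x << y) | (x >> ((32u - y) & 31u));
}
```
*/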
626 
627 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
628 // ok
629 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
630 // ok
631 #else
632 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
633 #endif
634 
635 #ifndef STBI_MALLOC
636 #define STBI_MALLOC(sz)           malloc(sz)
637 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
638 #define STBI_FREE(p)              free(p)
639 #endif
640 
641 #ifndef STBI_REALLOC_SIZED
642 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
643 #endif
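
/*
   The allocator macros take no context argument, so, as the release notes
   say, any context has to live in a global or thread-local variable. Below
   is a minimal sketch of that arrangement using a global counter. The
   demo_* names are hypothetical; only the three STBI_* macro names come
   from this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical allocator context; stb_image has no such type.
typedef struct {
   size_t live_blocks;   // blocks allocated and not yet freed
   size_t total_calls;   // number of malloc/realloc calls seen
} demo_alloc_ctx;

// The macros cannot carry a context pointer, so the context is a global.
static demo_alloc_ctx demo_ctx = { 0, 0 };

static void *demo_malloc(size_t sz)
{
   void *p = malloc(sz);
   demo_ctx.total_calls++;
   if (p) demo_ctx.live_blocks++;
   return p;
}

static void *demo_realloc(void *p, size_t newsz)
{
   void *q = realloc(p, newsz);
   demo_ctx.total_calls++;
   if (q && !p) demo_ctx.live_blocks++;   // realloc(NULL, n) acts like malloc
   return q;
}

static void demo_free(void *p)
{
   if (p) demo_ctx.live_blocks--;
   free(p);
}

// Define all three before including the header that creates the implementation.
#define STBI_MALLOC(sz)        demo_malloc(sz)
#define STBI_REALLOC(p,newsz)  demo_realloc(p,newsz)
#define STBI_FREE(p)           demo_free(p)

// Stand-in for library code that allocates through the macros.
// Returns the number of blocks still live afterwards.
static size_t demo_roundtrip(void)
{
   void *p = STBI_MALLOC(16);
   void *q = p ? STBI_REALLOC(p, 64) : NULL;
   STBI_FREE(q ? q : p);
   return demo_ctx.live_blocks;
}
```

   In a multithreaded program the global would need to be thread-local or
   protected by a lock; the sketch leaves that out.
*/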
644 
645 // x86/x64 detection
646 #if defined(__x86_64__) || defined(_M_X64)
647 #define STBI__X64_TARGET
648 #elif defined(__i386) || defined(_M_IX86)
649 #define STBI__X86_TARGET
650 #endif
651 
652 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
// NOTE: it is not clear whether we actually need this for the 64-bit path.
// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
// but compiling with -msse2 allows the compiler to use SSE2 everywhere;
// this is just broken and gcc are jerks for not fixing it properly
// (see http://www.virtualdub.org/blog/pivot/entry.php?id=363).
658 #define STBI_NO_SIMD
659 #endif
660 
661 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
662 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
663 //
664 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
665 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
666 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
667 // simultaneously enabling "-mstackrealign".
668 //
669 // See https://github.com/nothings/stb/issues/81 for more information.
670 //
671 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
672 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
673 #define STBI_NO_SIMD
674 #endif
675 
676 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
677 #define STBI_SSE2
678 #include <emmintrin.h>
679 
680 #ifdef _MSC_VER
681 
682 #if _MSC_VER >= 1400  // not VC6
683 #include <intrin.h> // __cpuid
static int stbi__cpuid3(void)
{
   int info[4];
   __cpuid(info,1);
   return info[3];   // EDX feature flags from CPUID leaf 1
}
690 #else
static int stbi__cpuid3(void)
{
   int res;
   __asm {
      mov  eax,1
      cpuid
      mov  res,edx
   }
   return res;
}
701 #endif
702 
703 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
704 
static int stbi__sse2_available(void)
{
   int info3 = stbi__cpuid3();
   return ((info3 >> 26) & 1) != 0;   // EDX bit 26 is the SSE2 flag
}
710 #else // assume GCC-style if not VC++
711 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
712 
static int stbi__sse2_available(void)
{
#if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
   // GCC 4.8+ has a nice way to do this
   return __builtin_cpu_supports("sse2");
#else
   // is there a portable way to do this, preferably without GCC inline ASM?
   // just bail for now.
   return 0;
#endif
}
724 #endif
725 #endif
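
/*
   Both MSVC paths above come down to testing one bit of the EDX value
   returned by CPUID leaf 1. As a sketch, the bit test can be separated from
   the CPUID call so that it can be checked on any machine; demo_has_sse2 is
   a hypothetical name and takes the EDX value as a plain argument.

```c
#include <assert.h>

// CPUID leaf 1 reports feature flags in EDX; bit 26 is the SSE2 flag
// (bit 25 is the older SSE flag and does not imply SSE2).
static int demo_has_sse2(unsigned int cpuid1_edx)
{
   return ((cpuid1_edx >> 26) & 1u) != 0;
}
```
*/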
726 
727 // ARM NEON
728 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
729 #undef STBI_NEON
730 #endif
731 
732 #ifdef STBI_NEON
733 #include <arm_neon.h>
734 // assume GCC or Clang on ARM targets
735 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
736 #endif
737 
738 #ifndef STBI_SIMD_ALIGN
739 #define STBI_SIMD_ALIGN(type, name) type name
740 #endif
741 
742 ///////////////////////////////////////////////
743 //
744 //  stbi__context struct and start_xxx functions
745 
746 // stbi__context structure is our basic context used by all images, so it
747 // contains all the IO context, plus some basic image information
748 typedef struct
749 {
750    stbi__uint32 img_x, img_y;
751    int img_n, img_out_n;
752 
753    stbi_io_callbacks io;
754    void *io_user_data;
755 
756    int read_from_callbacks;
757    int buflen;
758    stbi_uc buffer_start[128];
759 
760    stbi_uc *img_buffer, *img_buffer_end;
761    stbi_uc *img_buffer_original, *img_buffer_original_end;
762 } stbi__context;
763 
764 
765 static void stbi__refill_buffer(stbi__context *s);
766 
767 // initialize a memory-decode context
static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
{
   s->io.read = NULL;
   s->read_from_callbacks = 0;
   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
}
775 
776 // initialize a callback-based context
static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
778 {
779    s->io = *c;
780    s->io_user_data = user;
781    s->buflen = sizeof(s->buffer_start);
782    s->read_from_callbacks = 1;
783    s->img_buffer_original = s->buffer_start;
784    stbi__refill_buffer(s);
785    s->img_buffer_original_end = s->img_buffer_end;
786 }
787 
788 #ifndef STBI_NO_STDIO
789 
static int stbi__stdio_read(void *user, char *data, int size)
791 {
792    return (int) fread(data,1,size,(FILE*) user);
793 }
794 
static void stbi__stdio_skip(void *user, int n)
796 {
797    fseek((FILE*) user, n, SEEK_CUR);
798 }
799 
static int stbi__stdio_eof(void *user)
801 {
802    return feof((FILE*) user);
803 }
804 
805 static stbi_io_callbacks stbi__stdio_callbacks =
806 {
807    stbi__stdio_read,
808    stbi__stdio_skip,
809    stbi__stdio_eof,
810 };
811 
static void stbi__start_file(stbi__context *s, FILE *f)
813 {
814    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
815 }
816 
817 //static void stop_file(stbi__context *s) { }
818 
819 #endif // !STBI_NO_STDIO
820 
static void stbi__rewind(stbi__context *s)
822 {
823    // conceptually rewind SHOULD rewind to the beginning of the stream,
824    // but we just rewind to the beginning of the initial buffer, because
825    // we only use it after doing 'test', which only ever looks at at most 92 bytes
826    s->img_buffer = s->img_buffer_original;
827    s->img_buffer_end = s->img_buffer_original_end;
828 }
829 
830 #ifndef STBI_NO_JPEG
831 static int      stbi__jpeg_test(stbi__context *s);
832 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
833 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
834 #endif
835 
836 #ifndef STBI_NO_PNG
837 static int      stbi__png_test(stbi__context *s);
838 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
839 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
840 #endif
841 
842 #ifndef STBI_NO_BMP
843 static int      stbi__bmp_test(stbi__context *s);
844 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
845 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
846 #endif
847 
848 #ifndef STBI_NO_TGA
849 static int      stbi__tga_test(stbi__context *s);
850 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
851 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
852 #endif
853 
854 #ifndef STBI_NO_PSD
855 static int      stbi__psd_test(stbi__context *s);
856 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
857 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
858 #endif
859 
860 #ifndef STBI_NO_HDR
861 static int      stbi__hdr_test(stbi__context *s);
862 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
863 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
864 #endif
865 
866 #ifndef STBI_NO_PIC
867 static int      stbi__pic_test(stbi__context *s);
868 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
869 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
870 #endif
871 
872 #ifndef STBI_NO_GIF
873 static int      stbi__gif_test(stbi__context *s);
874 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
875 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
876 #endif
877 
878 #ifndef STBI_NO_PNM
879 static int      stbi__pnm_test(stbi__context *s);
880 static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
881 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
882 #endif
883 
884 // this is not threadsafe
885 static const char *stbi__g_failure_reason;
886 
STBIDEF const char *stbi_failure_reason(void)
888 {
889    return stbi__g_failure_reason;
890 }
891 
static int stbi__err(const char *str)
893 {
894    stbi__g_failure_reason = str;
895    return 0;
896 }
897 
static void *stbi__malloc(size_t size)
899 {
900     return STBI_MALLOC(size);
901 }
902 
903 // stbi__err - error
904 // stbi__errpf - error returning pointer to float
905 // stbi__errpuc - error returning pointer to unsigned char
906 
907 #ifdef STBI_NO_FAILURE_STRINGS
908    #define stbi__err(x,y)  0
909 #elif defined(STBI_FAILURE_USERMSG)
910    #define stbi__err(x,y)  stbi__err(y)
911 #else
912    #define stbi__err(x,y)  stbi__err(x)
913 #endif
914 
915 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
916 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
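/* Illustrative sketch, not part of the library: the error helpers above follow
   a "record a reason, return a falsy value" pattern, so a loader can bail out
   with a single return statement. The stand-alone example below mirrors that
   pattern under assumed names; demo_failure_reason, demo_err, demo_errp and
   demo_load are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical names throughout; only the pattern mirrors the code above.
static const char *demo_failure_reason = NULL;

// Record the reason and return 0 so callers can write "return demo_err(...)".
static int demo_err(const char *reason)
{
   demo_failure_reason = reason;
   return 0;
}

// Pointer-returning variant: calls demo_err for its side effect, yields NULL.
#define demo_errp(reason)  ((unsigned char *)(size_t) (demo_err(reason) ? NULL : NULL))

// Toy loader: accepts only buffers that start with the two bytes "OK".
static unsigned char *demo_load(const unsigned char *buf, int len)
{
   static unsigned char pixel[1] = { 255 };
   assert(buf != NULL);
   if (len < 2) return demo_errp("too short");
   if (buf[0] != 'O' || buf[1] != 'K') return demo_errp("bad magic");
   return pixel;
}
```

   A caller would check the return value for NULL and then read
   demo_failure_reason. As with the global above, a single shared reason
   pointer like this is not threadsafe.
*/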
917 
STBIDEF void stbi_image_free(void *retval_from_stbi_load)
919 {
920    STBI_FREE(retval_from_stbi_load);
921 }
922 
923 #ifndef STBI_NO_LINEAR
924 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
925 #endif
926 
927 #ifndef STBI_NO_HDR
928 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
929 #endif
930 
931 static int stbi__vertically_flip_on_load = 0;
932 
STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
934 {
935     stbi__vertically_flip_on_load = flag_true_if_should_flip;
936 }
937 
static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
939 {
940    #ifndef STBI_NO_JPEG
941    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
942    #endif
943    #ifndef STBI_NO_PNG
944    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
945    #endif
946    #ifndef STBI_NO_BMP
947    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
948    #endif
949    #ifndef STBI_NO_GIF
950    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
951    #endif
952    #ifndef STBI_NO_PSD
953    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
954    #endif
955    #ifndef STBI_NO_PIC
956    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
957    #endif
958    #ifndef STBI_NO_PNM
959    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
960    #endif
961 
962    #ifndef STBI_NO_HDR
963    if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
      if (hdr == NULL) return NULL; // load failed: *x, *y, *comp may be unset, so don't convert
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
966    }
967    #endif
968 
969    #ifndef STBI_NO_TGA
970    // test tga last because it's a crappy test!
971    if (stbi__tga_test(s))
972       return stbi__tga_load(s,x,y,comp,req_comp);
973    #endif
974 
975    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
976 }
977 
static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
979 {
980    unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
981 
982    if (stbi__vertically_flip_on_load && result != NULL) {
983       int w = *x, h = *y;
984       int depth = req_comp ? req_comp : *comp;
985       int row,col,z;
986       stbi_uc temp;
987 
988       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
989       for (row = 0; row < (h>>1); row++) {
990          for (col = 0; col < w; col++) {
991             for (z = 0; z < depth; z++) {
992                temp = result[(row * w + col) * depth + z];
993                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
994                result[((h - row - 1) * w + col) * depth + z] = temp;
995             }
996          }
997       }
998    }
999 
1000    return result;
1001 }
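/* Illustrative sketch, not part of the library: the @OPTIMIZE note above
   suggests swapping more than one byte at a time. One way to do that, assuming
   tightly packed rows, is to exchange whole rows through a small scratch buffer
   with memcpy. The helper name demo_flip_rows and the 256-byte scratch size are
   assumptions made for this example only.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: flips an image in place, top row <-> bottom row,
// swapping rows through a small fixed-size scratch buffer.
static void demo_flip_rows(unsigned char *pixels, int w, int h, int bytes_per_pixel)
{
   unsigned char scratch[256];
   size_t row_bytes;
   int row;

   assert(pixels != NULL && w > 0 && h > 0 && bytes_per_pixel > 0);
   row_bytes = (size_t) w * (size_t) bytes_per_pixel;

   for (row = 0; row < (h >> 1); row++) {
      unsigned char *top    = pixels + (size_t) row * row_bytes;
      unsigned char *bottom = pixels + (size_t) (h - row - 1) * row_bytes;
      size_t left = row_bytes;
      // swap the two rows chunk by chunk so the scratch buffer stays small
      while (left > 0) {
         size_t n = left < sizeof(scratch) ? left : sizeof(scratch);
         memcpy(scratch, top, n);
         memcpy(top, bottom, n);
         memcpy(bottom, scratch, n);
         top += n; bottom += n; left -= n;
      }
   }
}
```

   For an odd height the middle row is left untouched, matching the (h>>1)
   loop bound used above.
*/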
1002 
1003 #ifndef STBI_NO_HDR
static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1005 {
1006    if (stbi__vertically_flip_on_load && result != NULL) {
1007       int w = *x, h = *y;
1008       int depth = req_comp ? req_comp : *comp;
1009       int row,col,z;
1010       float temp;
1011 
1012       // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1013       for (row = 0; row < (h>>1); row++) {
1014          for (col = 0; col < w; col++) {
1015             for (z = 0; z < depth; z++) {
1016                temp = result[(row * w + col) * depth + z];
1017                result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1018                result[((h - row - 1) * w + col) * depth + z] = temp;
1019             }
1020          }
1021       }
1022    }
1023 }
1024 #endif
1025 
1026 #ifndef STBI_NO_STDIO
1027 
static FILE *stbi__fopen(char const *filename, char const *mode)
1029 {
1030    FILE *f;
1031 #if defined(_MSC_VER) && _MSC_VER >= 1400
1032    if (0 != fopen_s(&f, filename, mode))
1033       f=0;
1034 #else
1035    f = fopen(filename, mode);
1036 #endif
1037    return f;
1038 }
1039 
1040 
STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1042 {
1043    FILE *f = stbi__fopen(filename, "rb");
1044    unsigned char *result;
1045    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1046    result = stbi_load_from_file(f,x,y,comp,req_comp);
1047    fclose(f);
1048    return result;
1049 }
1050 
STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1052 {
1053    unsigned char *result;
1054    stbi__context s;
1055    stbi__start_file(&s,f);
1056    result = stbi__load_flip(&s,x,y,comp,req_comp);
1057    if (result) {
1058       // need to 'unget' all the characters in the IO buffer
1059       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1060    }
1061    return result;
1062 }
1063 #endif //!STBI_NO_STDIO
1064 
STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1066 {
1067    stbi__context s;
1068    stbi__start_mem(&s,buffer,len);
1069    return stbi__load_flip(&s,x,y,comp,req_comp);
1070 }
1071 
STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1073 {
1074    stbi__context s;
1075    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1076    return stbi__load_flip(&s,x,y,comp,req_comp);
1077 }
1078 
1079 #ifndef STBI_NO_LINEAR
static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1081 {
1082    unsigned char *data;
1083    #ifndef STBI_NO_HDR
1084    if (stbi__hdr_test(s)) {
1085       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1086       if (hdr_data)
1087          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1088       return hdr_data;
1089    }
1090    #endif
1091    data = stbi__load_flip(s, x, y, comp, req_comp);
1092    if (data)
1093       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1094    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1095 }
1096 
STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1098 {
1099    stbi__context s;
1100    stbi__start_mem(&s,buffer,len);
1101    return stbi__loadf_main(&s,x,y,comp,req_comp);
1102 }
1103 
STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1105 {
1106    stbi__context s;
1107    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1108    return stbi__loadf_main(&s,x,y,comp,req_comp);
1109 }
1110 
1111 #ifndef STBI_NO_STDIO
STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1113 {
1114    float *result;
1115    FILE *f = stbi__fopen(filename, "rb");
1116    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1117    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1118    fclose(f);
1119    return result;
1120 }
1121 
STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1123 {
1124    stbi__context s;
1125    stbi__start_file(&s,f);
1126    return stbi__loadf_main(&s,x,y,comp,req_comp);
1127 }
1128 #endif // !STBI_NO_STDIO
1129 
1130 #endif // !STBI_NO_LINEAR
1131 
// these is-hdr-or-not functions are defined independent of whether
// STBI_NO_LINEAR or STBI_NO_HDR is defined, for API simplicity; if
// STBI_NO_HDR is defined, they always report false!
1135 
STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1137 {
1138    #ifndef STBI_NO_HDR
1139    stbi__context s;
1140    stbi__start_mem(&s,buffer,len);
1141    return stbi__hdr_test(&s);
1142    #else
1143    STBI_NOTUSED(buffer);
1144    STBI_NOTUSED(len);
1145    return 0;
1146    #endif
1147 }
1148 
1149 #ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename)
1151 {
1152    FILE *f = stbi__fopen(filename, "rb");
1153    int result=0;
1154    if (f) {
1155       result = stbi_is_hdr_from_file(f);
1156       fclose(f);
1157    }
1158    return result;
1159 }
1160 
STBIDEF int      stbi_is_hdr_from_file(FILE *f)
1162 {
1163    #ifndef STBI_NO_HDR
1164    stbi__context s;
1165    stbi__start_file(&s,f);
1166    return stbi__hdr_test(&s);
1167    #else
1168    STBI_NOTUSED(f);
1169    return 0;
1170    #endif
1171 }
1172 #endif // !STBI_NO_STDIO
1173 
STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1175 {
1176    #ifndef STBI_NO_HDR
1177    stbi__context s;
1178    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1179    return stbi__hdr_test(&s);
1180    #else
1181    STBI_NOTUSED(clbk);
1182    STBI_NOTUSED(user);
1183    return 0;
1184    #endif
1185 }
1186 
1187 #ifndef STBI_NO_LINEAR
1188 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1189 
STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1192 #endif
1193 
1194 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1195 
STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1198 
1199 
1200 //////////////////////////////////////////////////////////////////////////////
1201 //
1202 // Common code used by all image loaders
1203 //
1204 
1205 enum
1206 {
1207    STBI__SCAN_load=0,
1208    STBI__SCAN_type,
1209    STBI__SCAN_header
1210 };
1211 
static void stbi__refill_buffer(stbi__context *s)
1213 {
1214    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1215    if (n == 0) {
1216       // at end of file, treat same as if from memory, but need to handle case
1217       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1218       s->read_from_callbacks = 0;
1219       s->img_buffer = s->buffer_start;
1220       s->img_buffer_end = s->buffer_start+1;
1221       *s->img_buffer = 0;
1222    } else {
1223       s->img_buffer = s->buffer_start;
1224       s->img_buffer_end = s->buffer_start + n;
1225    }
1226 }
1227 
stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1229 {
1230    if (s->img_buffer < s->img_buffer_end)
1231       return *s->img_buffer++;
1232    if (s->read_from_callbacks) {
1233       stbi__refill_buffer(s);
1234       return *s->img_buffer++;
1235    }
1236    return 0;
1237 }
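/* Illustrative sketch, not part of the library: the refill/get8 pair above
   reads through a small buffer that is topped up from a read callback, and
   hands out zero bytes once the stream is exhausted. The stand-alone example
   below reproduces that idea with a deliberately tiny 4-byte buffer so refills
   are easy to observe. All demo_* names and the buffer size are assumptions
   made for this illustration.

```c
#include <assert.h>
#include <string.h>

// Hypothetical source: a byte array consumed through a read callback.
typedef struct { const unsigned char *data; int len, pos; } demo_source;

static int demo_read(void *user, char *out, int size)
{
   demo_source *src = (demo_source *) user;
   int n = src->len - src->pos;
   if (n > size) n = size;
   memcpy(out, src->data + src->pos, (size_t) n);
   src->pos += n;
   return n;
}

// Hypothetical reader with a deliberately tiny buffer to force refills.
typedef struct {
   int (*read)(void *user, char *out, int size);
   void *user;
   int reading;                 // 1 while the callback may still have data
   unsigned char buf[4];
   unsigned char *cur, *end;
} demo_reader;

static void demo_refill(demo_reader *r)
{
   int n = r->read(r->user, (char *) r->buf, (int) sizeof(r->buf));
   r->cur = r->buf;
   if (n == 0) {
      // end of stream: stop calling back and leave one zero byte to hand out
      r->reading = 0;
      r->buf[0] = 0;
      r->end = r->buf + 1;
   } else {
      r->end = r->buf + n;
   }
}

static void demo_start(demo_reader *r, int (*read)(void *, char *, int), void *user)
{
   assert(r != NULL && read != NULL);
   r->read = read;
   r->user = user;
   r->reading = 1;
   demo_refill(r);
}

static unsigned char demo_get8(demo_reader *r)
{
   if (r->cur < r->end) return *r->cur++;
   if (r->reading) { demo_refill(r); return *r->cur++; }
   return 0;   // past the end: keep returning zeros
}
```

   Returning zeros past the end keeps the byte-level readers free of error
   checks; a caller that needs to know about truncation would test for end of
   stream separately.
*/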
1238 
stbi_inline static int stbi__at_eof(stbi__context *s)
1240 {
1241    if (s->io.read) {
1242       if (!(s->io.eof)(s->io_user_data)) return 0;
1243       // if feof() is true, check if buffer = end
1244       // special case: we've only got the special 0 character at the end
1245       if (s->read_from_callbacks == 0) return 1;
1246    }
1247 
1248    return s->img_buffer >= s->img_buffer_end;
1249 }
1250 
static void stbi__skip(stbi__context *s, int n)
1252 {
1253    if (n < 0) {
1254       s->img_buffer = s->img_buffer_end;
1255       return;
1256    }
1257    if (s->io.read) {
1258       int blen = (int) (s->img_buffer_end - s->img_buffer);
1259       if (blen < n) {
1260          s->img_buffer = s->img_buffer_end;
1261          (s->io.skip)(s->io_user_data, n - blen);
1262          return;
1263       }
1264    }
1265    s->img_buffer += n;
1266 }
1267 
static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1269 {
1270    if (s->io.read) {
1271       int blen = (int) (s->img_buffer_end - s->img_buffer);
1272       if (blen < n) {
1273          int res, count;
1274 
1275          memcpy(buffer, s->img_buffer, blen);
1276 
1277          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1278          res = (count == (n-blen));
1279          s->img_buffer = s->img_buffer_end;
1280          return res;
1281       }
1282    }
1283 
1284    if (s->img_buffer+n <= s->img_buffer_end) {
1285       memcpy(buffer, s->img_buffer, n);
1286       s->img_buffer += n;
1287       return 1;
1288    } else
1289       return 0;
1290 }
1291 
static int stbi__get16be(stbi__context *s)
1293 {
1294    int z = stbi__get8(s);
1295    return (z << 8) + stbi__get8(s);
1296 }
1297 
static stbi__uint32 stbi__get32be(stbi__context *s)
1299 {
1300    stbi__uint32 z = stbi__get16be(s);
1301    return (z << 16) + stbi__get16be(s);
1302 }
1303 
1304 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1305 // nothing
1306 #else
static int stbi__get16le(stbi__context *s)
1308 {
1309    int z = stbi__get8(s);
1310    return z + (stbi__get8(s) << 8);
1311 }
1312 #endif
1313 
1314 #ifndef STBI_NO_BMP
static stbi__uint32 stbi__get32le(stbi__context *s)
{
   stbi__uint32 z = stbi__get16le(s);
   // cast before shifting: a 16-bit value shifted left by 16 can overflow a signed int
   return z + ((stbi__uint32) stbi__get16le(s) << 16);
}
1320 #endif
1321 
1322 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1323 
1324 
1325 //////////////////////////////////////////////////////////////////////////////
1326 //
1327 //  generic converter from built-in img_n to req_comp
1328 //    individual types do this automatically as much as possible (e.g. jpeg
1329 //    does all cases internally since it needs to colorspace convert anyway,
1330 //    and it never has alpha, so very few cases ). png can automatically
1331 //    interleave an alpha=255 channel, but falls back to this for other cases
1332 //
1333 //  assume data buffer is malloced, so malloc a new one and free that one
1334 //  only failure mode is malloc failing
1335 
static stbi_uc stbi__compute_y(int r, int g, int b)
1337 {
1338    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1339 }
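/* Illustrative sketch, not part of the library: the luma formula above is
   8-bit fixed point. The weights 77, 150 and 29 sum to 256, so the final
   shift by 8 divides by the total weight; as fractions they are roughly
   0.30, 0.59 and 0.11. The stand-alone copy below uses the assumed name
   demo_luma and adds an input range check that the original does not have.

```c
#include <assert.h>

// Hypothetical stand-alone version of the 8-bit fixed-point luma formula:
// the weights 77, 150 and 29 sum to 256, so shifting right by 8 divides
// by the total weight and white stays white.
static unsigned char demo_luma(int r, int g, int b)
{
   assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
   return (unsigned char) ((r * 77 + g * 150 + b * 29) >> 8);
}
```

   Because the shift truncates, pure red (255,0,0) maps to 76 rather than 77,
   while white (255,255,255) maps to exactly 255.
*/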
1340 
static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1342 {
1343    int i,j;
1344    unsigned char *good;
1345 
1346    if (req_comp == img_n) return data;
1347    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1348 
1349    good = (unsigned char *) stbi__malloc(req_comp * x * y);
1350    if (good == NULL) {
1351       STBI_FREE(data);
1352       return stbi__errpuc("outofmem", "Out of memory");
1353    }
1354 
1355    for (j=0; j < (int) y; ++j) {
1356       unsigned char *src  = data + j * x * img_n   ;
1357       unsigned char *dest = good + j * x * req_comp;
1358 
1359       #define COMBO(a,b)  ((a)*8+(b))
1360       #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1361       // convert source image with img_n components to one with req_comp components;
1362       // avoid switch per pixel, so use switch per scanline and massive macros
1363       switch (COMBO(img_n, req_comp)) {
1364          CASE(1,2) dest[0]=src[0], dest[1]=255; break;
1365          CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1366          CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
1367          CASE(2,1) dest[0]=src[0]; break;
1368          CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
1369          CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
1370          CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
1371          CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1372          CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
1373          CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
1374          CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
1375          CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
1376          default: STBI_ASSERT(0);
1377       }
1378       #undef CASE
1379    }
1380 
1381    STBI_FREE(data);
1382    return good;
1383 }
1384 
1385 #ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1387 {
1388    int i,k,n;
1389    float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
1390    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1391    // compute number of non-alpha components
1392    if (comp & 1) n = comp; else n = comp-1;
1393    for (i=0; i < x*y; ++i) {
1394       for (k=0; k < n; ++k) {
1395          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1396       }
1397       if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1398    }
1399    STBI_FREE(data);
1400    return output;
1401 }
1402 #endif
1403 
1404 #ifndef STBI_NO_HDR
1405 #define stbi__float2int(x)   ((int) (x))
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1407 {
1408    int i,k,n;
1409    stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
1410    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1411    // compute number of non-alpha components
1412    if (comp & 1) n = comp; else n = comp-1;
1413    for (i=0; i < x*y; ++i) {
1414       for (k=0; k < n; ++k) {
1415          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1416          if (z < 0) z = 0;
1417          if (z > 255) z = 255;
1418          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1419       }
1420       if (k < comp) {
1421          float z = data[i*comp+k] * 255 + 0.5f;
1422          if (z < 0) z = 0;
1423          if (z > 255) z = 255;
1424          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1425       }
1426    }
1427    STBI_FREE(data);
1428    return output;
1429 }
1430 #endif
1431 
1432 //////////////////////////////////////////////////////////////////////////////
1433 //
1434 //  "baseline" JPEG/JFIF decoder
1435 //
1436 //    simple implementation
1437 //      - doesn't support delayed output of y-dimension
1438 //      - simple interface (only one output format: 8-bit interleaved RGB)
1439 //      - doesn't try to recover corrupt jpegs
1440 //      - doesn't allow partial loading, loading multiple at once
1441 //      - still fast on x86 (copying globals into locals doesn't help x86)
1442 //      - allocates lots of intermediate memory (full size of all components)
1443 //        - non-interleaved case requires this anyway
1444 //        - allows good upsampling (see next)
1445 //    high-quality
1446 //      - upsampled channels are bilinearly interpolated, even across blocks
1447 //      - quality integer IDCT derived from IJG's 'slow'
1448 //    performance
1449 //      - fast huffman; reasonable integer IDCT
1450 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1451 //      - uses a lot of intermediate memory, could cache poorly
1452 
1453 #ifndef STBI_NO_JPEG
1454 
1455 // huffman decoding acceleration
1456 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1457 
1458 typedef struct
1459 {
1460    stbi_uc  fast[1 << FAST_BITS];
1461    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1462    stbi__uint16 code[256];
1463    stbi_uc  values[256];
1464    stbi_uc  size[257];
1465    unsigned int maxcode[18];
1466    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1467 } stbi__huffman;
1468 
1469 typedef struct
1470 {
1471    stbi__context *s;
1472    stbi__huffman huff_dc[4];
1473    stbi__huffman huff_ac[4];
1474    stbi_uc dequant[4][64];
1475    stbi__int16 fast_ac[4][1 << FAST_BITS];
1476 
1477 // sizes for components, interleaved MCUs
1478    int img_h_max, img_v_max;
1479    int img_mcu_x, img_mcu_y;
1480    int img_mcu_w, img_mcu_h;
1481 
1482 // definition of jpeg image component
1483    struct
1484    {
1485       int id;
1486       int h,v;
1487       int tq;
1488       int hd,ha;
1489       int dc_pred;
1490 
1491       int x,y,w2,h2;
1492       stbi_uc *data;
1493       void *raw_data, *raw_coeff;
1494       stbi_uc *linebuf;
1495       short   *coeff;   // progressive only
1496       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1497    } img_comp[4];
1498 
1499    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1500    int            code_bits;   // number of valid bits
1501    unsigned char  marker;      // marker seen while filling entropy buffer
1502    int            nomore;      // flag if we saw a marker so must stop
1503 
1504    int            progressive;
1505    int            spec_start;
1506    int            spec_end;
1507    int            succ_high;
1508    int            succ_low;
1509    int            eob_run;
1510    int            rgb;
1511 
1512    int scan_n, order[4];
1513    int restart_interval, todo;
1514 
1515 // kernels
1516    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1517    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1518    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1519 } stbi__jpeg;
1520 
static int stbi__build_huffman(stbi__huffman *h, int *count)
1522 {
1523    int i,j,k=0,code;
1524    // build size list for each symbol (from JPEG spec)
1525    for (i=0; i < 16; ++i)
1526       for (j=0; j < count[i]; ++j)
1527          h->size[k++] = (stbi_uc) (i+1);
1528    h->size[k] = 0;
1529 
1530    // compute actual symbols (from jpeg spec)
1531    code = 0;
1532    k = 0;
1533    for(j=1; j <= 16; ++j) {
1534       // compute delta to add to code to compute symbol id
1535       h->delta[j] = k - code;
1536       if (h->size[k] == j) {
1537          while (h->size[k] == j)
1538             h->code[k++] = (stbi__uint16) (code++);
1539          if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1540       }
1541       // compute largest code + 1 for this size, preshifted as needed later
1542       h->maxcode[j] = code << (16-j);
1543       code <<= 1;
1544    }
1545    h->maxcode[j] = 0xffffffff;
1546 
1547    // build non-spec acceleration table; 255 is flag for not-accelerated
1548    memset(h->fast, 255, 1 << FAST_BITS);
1549    for (i=0; i < k; ++i) {
1550       int s = h->size[i];
1551       if (s <= FAST_BITS) {
1552          int c = h->code[i] << (FAST_BITS-s);
1553          int m = 1 << (FAST_BITS-s);
1554          for (j=0; j < m; ++j) {
1555             h->fast[c+j] = (stbi_uc) i;
1556          }
1557       }
1558    }
1559    return 1;
1560 }
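/* Illustrative sketch, not part of the library: the first half of the function
   above assigns canonical Huffman codes. Symbols are listed in order of
   increasing code length, codes within one length are consecutive integers,
   and the running code is shifted left by one bit when moving to the next
   length. The stand-alone example below shows only that step, without the
   maxcode/delta/fast tables. The name demo_assign_codes and its -1 error
   return are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: count[i] is the number of codes of length i+1 bits
// (i = 0..15). Fills size[] and code[] in symbol order and returns the
// number of symbols, or -1 if the lengths cannot form a prefix code.
static int demo_assign_codes(const int count[16], unsigned char size[256], unsigned short code[256])
{
   int i, j, k = 0, next = 0;
   assert(count != NULL && size != NULL && code != NULL);
   for (i = 0; i < 16; ++i) {
      for (j = 0; j < count[i]; ++j) {
         if (k == 256) return -1;              // more symbols than the tables hold
         size[k] = (unsigned char) (i + 1);
         code[k] = (unsigned short) next++;
         ++k;
      }
      if (next > (1 << (i + 1))) return -1;    // ran out of codes of this length
      next <<= 1;                              // append a zero bit for the next length
   }
   return k;
}
```

   With two 2-bit codes and three 3-bit codes, the assigned codes are 00, 01,
   100, 101 and 110. Three 1-bit codes are rejected, since only two exist.
*/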
1561 
1562 // build a table that decodes both magnitude and value of small ACs in
1563 // one go.
static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1565 {
1566    int i;
1567    for (i=0; i < (1 << FAST_BITS); ++i) {
1568       stbi_uc fast = h->fast[i];
1569       fast_ac[i] = 0;
1570       if (fast < 255) {
1571          int rs = h->values[fast];
1572          int run = (rs >> 4) & 15;
1573          int magbits = rs & 15;
1574          int len = h->size[fast];
1575 
1576          if (magbits && len + magbits <= FAST_BITS) {
1577             // magnitude code followed by receive_extend code
1578             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1579             int m = 1 << (magbits - 1);
            if (k < m) k += (~0U << magbits) + 1; // unsigned shift: left-shifting -1 is undefined behavior
1581             // if the result is small enough, we can fit it in fast_ac table
1582             if (k >= -128 && k <= 127)
1583                fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1584          }
1585       }
1586    }
1587 }
1588 
static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1590 {
1591    do {
1592       int b = j->nomore ? 0 : stbi__get8(j->s);
1593       if (b == 0xff) {
1594          int c = stbi__get8(j->s);
1595          if (c != 0) {
1596             j->marker = (unsigned char) c;
1597             j->nomore = 1;
1598             return;
1599          }
1600       }
1601       j->code_buffer |= b << (24 - j->code_bits);
1602       j->code_bits += 8;
1603    } while (j->code_bits <= 24);
1604 }

// (1 << n) - 1
static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};

// decode a jpeg huffman value from the bitstream
stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
{
   unsigned int temp;
   int c,k;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code is <= FAST_BITS
   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   k = h->fast[c];
   if (k < 255) {
      int s = h->size[k];
      if (s > j->code_bits)
         return -1;
      j->code_buffer <<= s;
      j->code_bits -= s;
      return h->values[k];
   }

   // naive test is to shift the code_buffer down so k bits are
   // valid, then test against maxcode. To speed this up, we've
   // preshifted maxcode left so that it has (16-k) 0s at the
   // end; in other words, regardless of the number of bits, it
   // wants to be compared against something shifted to have 16;
   // that way we don't need to shift inside the loop.
   temp = j->code_buffer >> 16;
   for (k=FAST_BITS+1 ; ; ++k)
      if (temp < h->maxcode[k])
         break;
   if (k == 17) {
      // error! code not found
      j->code_bits -= 16;
      return -1;
   }

   if (k > j->code_bits)
      return -1;

   // convert the huffman code to the symbol id
   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);

   // convert the id to a symbol
   j->code_bits -= k;
   j->code_buffer <<= k;
   return h->values[c];
}

// bias[n] = (-1<<n) + 1
static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   k = stbi_lrot(j->code_buffer, n);
   STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}
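/*
   Illustrative sketch, not part of stb_image: stbi__extend_receive above
   fuses reading n bits with the JPEG EXTEND step. The hypothetical helper
   below (demo_jpeg_extend is a name invented for this example) shows only
   the EXTEND half on an already-extracted value, assuming 1 <= n <= 15: a
   leading 1 bit means the value is positive and is kept as is, a leading 0
   bit means it is negative and gets the bias (-1<<n) + 1 added.

```c
#include <assert.h>

// bits holds the n raw magnitude bits read from the stream (1 <= n <= 15).
static int demo_jpeg_extend(unsigned int bits, int n)
{
   int v = (int) bits;
   int half = 1 << (n - 1);          // smallest code with the top bit set
   if (v >= half) return v;          // top bit set: positive, keep as is
   return v - ((1 << n) - 1);        // top bit clear: add (-1<<n) + 1
}
```

   For n == 3 this maps the raw codes 0..3 to -7..-4 and leaves 4..7 alone.
*/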

// get some unsigned bits
stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
{
   unsigned int k;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k;
}

stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
{
   unsigned int k;
   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   k = j->code_buffer;
   j->code_buffer <<= 1;
   --j->code_bits;
   return k & 0x80000000;
}

// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // let corrupt input sample past end
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};
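/*
   Illustrative sketch, not part of stb_image: the first 64 entries of the
   table above can be reproduced by walking the 8x8 block along its
   anti-diagonals, alternating direction. demo_build_dezigzag is a
   hypothetical name used only here; the library itself just stores the
   precomputed table.

```c
#include <assert.h>

// Fill out[i] with the row-major index (row*8 + col) of the coefficient
// that sits at position i of the zigzag stream.
static void demo_build_dezigzag(unsigned char out[64])
{
   int row = 0, col = 0, i;
   for (i = 0; i < 64; ++i) {
      out[i] = (unsigned char) (row * 8 + col);
      if ((row + col) % 2 == 0) {        // even diagonal: moving up-right
         if (col == 7)      ++row;       // hit right edge, step down
         else if (row == 0) ++col;       // hit top edge, step right
         else { --row; ++col; }
      } else {                           // odd diagonal: moving down-left
         if (row == 7)      ++col;       // hit bottom edge, step right
         else if (col == 0) ++row;       // hit left edge, step down
         else { ++row; --col; }
      }
   }
}
```
*/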

// decode one 64-entry block (DC coefficient first, then the ACs)
static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
{
   int diff,dc,k;
   int t;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
   t = stbi__jpeg_huff_decode(j, hdc);
   if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");

   // 0 all the ac values now so we can do it 32-bits at a time
   memset(data,0,64*sizeof(data[0]));

   diff = t ? stbi__extend_receive(j, t) : 0;
   dc = j->img_comp[b].dc_pred + diff;
   j->img_comp[b].dc_pred = dc;
   data[0] = (short) (dc * dequant[0]);

   // decode AC components, see JPEG spec
   k = 1;
   do {
      unsigned int zig;
      int c,r,s;
      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
      r = fac[c];
      if (r) { // fast-AC path
         k += (r >> 4) & 15; // run
         s = r & 15; // combined length
         j->code_buffer <<= s;
         j->code_bits -= s;
         // decode into unzigzag'd location
         zig = stbi__jpeg_dezigzag[k++];
         data[zig] = (short) ((r >> 8) * dequant[zig]);
      } else {
         int rs = stbi__jpeg_huff_decode(j, hac);
         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
         s = rs & 15;
         r = rs >> 4;
         if (s == 0) {
            if (rs != 0xf0) break; // end block
            k += 16;
         } else {
            k += r;
            // decode into unzigzag'd location
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
         }
      }
   } while (k < 64);
   return 1;
}

static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
{
   int diff,dc;
   int t;
   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   if (j->succ_high == 0) {
      // first scan for DC coefficient, must be first
      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
      t = stbi__jpeg_huff_decode(j, hdc);
      // a failed decode returns -1, which must not reach stbi__extend_receive
      if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
      diff = t ? stbi__extend_receive(j, t) : 0;

      dc = j->img_comp[b].dc_pred + diff;
      j->img_comp[b].dc_pred = dc;
      // multiply instead of shifting: dc may be negative
      data[0] = (short) (dc * (1 << j->succ_low));
   } else {
      // refinement scan for DC coefficient
      if (stbi__jpeg_get_bit(j))
         data[0] += (short) (1 << j->succ_low);
   }
   return 1;
}

// @OPTIMIZE: store non-zigzagged during the decode passes,
// and only de-zigzag when dequantizing
static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
{
   int k;
   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->succ_high == 0) {
      int shift = j->succ_low;

      if (j->eob_run) {
         --j->eob_run;
         return 1;
      }

      k = j->spec_start;
      do {
         unsigned int zig;
         int c,r,s;
         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
         r = fac[c];
         if (r) { // fast-AC path
            k += (r >> 4) & 15; // run
            s = r & 15; // combined length
            j->code_buffer <<= s;
            j->code_bits -= s;
            zig = stbi__jpeg_dezigzag[k++];
            // multiply instead of shifting: the value may be negative
            data[zig] = (short) ((r >> 8) * (1 << shift));
         } else {
            int rs = stbi__jpeg_huff_decode(j, hac);
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r);
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  --j->eob_run;
                  break;
               }
               k += 16;
            } else {
               k += r;
               zig = stbi__jpeg_dezigzag[k++];
               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
            }
         }
      } while (k <= j->spec_end);
   } else {
      // refinement scan for these AC coefficients

      short bit = (short) (1 << j->succ_low);

      if (j->eob_run) {
         --j->eob_run;
         for (k = j->spec_start; k <= j->spec_end; ++k) {
            short *p = &data[stbi__jpeg_dezigzag[k]];
            if (*p != 0)
               if (stbi__jpeg_get_bit(j))
                  if ((*p & bit)==0) {
                     if (*p > 0)
                        *p += bit;
                     else
                        *p -= bit;
                  }
         }
      } else {
         k = j->spec_start;
         do {
            int r,s;
            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r) - 1;
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  r = 64; // force end of block
               } else {
                  // r=15 s=0 should write 16 0s, so we just do
                  // a run of 15 0s and then write s (which is 0),
                  // so we don't have to do anything special here
               }
            } else {
               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
               // sign bit
               if (stbi__jpeg_get_bit(j))
                  s = bit;
               else
                  s = -bit;
            }

            // advance by r
            while (k <= j->spec_end) {
               short *p = &data[stbi__jpeg_dezigzag[k++]];
               if (*p != 0) {
                  if (stbi__jpeg_get_bit(j))
                     if ((*p & bit)==0) {
                        if (*p > 0)
                           *p += bit;
                        else
                           *p -= bit;
                     }
               } else {
                  if (r == 0) {
                     *p = (short) s;
                     break;
                  }
                  --r;
               }
            }
         } while (k <= j->spec_end);
      }
   }
   return 1;
}

// clamp an int to the range 0..255 and return it as a stbi_uc
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
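/*
   Illustrative sketch, not part of stb_image: the single-test trick above
   works because converting a negative int to unsigned int yields a value far
   above 255, so one unsigned comparison detects both x < 0 and x > 255. The
   hypothetical demo_clamp_u8 below is a standalone restatement using plain
   unsigned char in place of stbi_uc.

```c
#include <assert.h>

static unsigned char demo_clamp_u8(int x)
{
   // common case (0..255) costs one comparison
   if ((unsigned int) x > 255)
      return (unsigned char) (x < 0 ? 0 : 255);
   return (unsigned char) x;
}
```
*/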

#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
#define stbi__fsh(x)  ((x) << 12)

// derived from jidctint -- DCT_ISLOW
#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   p2 = s2;                                    \
   p3 = s6;                                    \
   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   p2 = s0;                                    \
   p3 = s4;                                    \
   t0 = stbi__fsh(p2+p3);                      \
   t1 = stbi__fsh(p2-p3);                      \
   x0 = t0+t3;                                 \
   x3 = t0-t3;                                 \
   x1 = t1+t2;                                 \
   x2 = t1-t2;                                 \
   t0 = s7;                                    \
   t1 = s5;                                    \
   t2 = s3;                                    \
   t3 = s1;                                    \
   p3 = t0+t2;                                 \
   p4 = t1+t3;                                 \
   p1 = t0+t3;                                 \
   p2 = t1+t2;                                 \
   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   t0 = t0*stbi__f2f( 0.298631336f);           \
   t1 = t1*stbi__f2f( 2.053119869f);           \
   t2 = t2*stbi__f2f( 3.072711026f);           \
   t3 = t3*stbi__f2f( 1.501321110f);           \
   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   p3 = p3*stbi__f2f(-1.961570560f);           \
   p4 = p4*stbi__f2f(-0.390180644f);           \
   t3 += p1+p4;                                \
   t2 += p2+p3;                                \
   t1 += p2+p4;                                \
   t0 += p1+p3;

static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0]*4; // multiply, not <<2: d[0] may be negative
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from first
      // loop, plus horizontal and vertical each scale by sqrt(8) so together
      // we've got an extra 1<<3, so 1<<17 total we need to remove.
      // so we want to round that, which means adding 0.5 * 1<<17,
      // aka 65536. Also, we'll end up with -128 to 127 that we want
      // to encode as 0..255 by adding 128, so we'll add that before the shift
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}
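/*
   Illustrative sketch, not part of stb_image: the row pass above folds three
   steps into one add and one shift, namely rounding to nearest (+65536, half
   of 1<<17), recentring -128..127 on 0..255 (+128<<17), and removing the
   1<<17 fixed-point scale (>>17). demo_descale_row is a hypothetical name,
   and the sketch assumes inputs for which the biased sum stays non-negative;
   the real code passes the result through stbi__clamp afterwards.

```c
#include <assert.h>

// x is a sample scaled by 1<<17, nominally in the range -128..127.
static int demo_descale_row(int x)
{
   return (x + 65536 + (128 << 17)) >> 17;
}
```

   A scaled value of exactly half a step (65536) rounds up to 129, while
   65535 still rounds down to 128.
*/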

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   }

   // load
2302    row0 = vld1q_s16(data + 0*8);
2303    row1 = vld1q_s16(data + 1*8);
2304    row2 = vld1q_s16(data + 2*8);
2305    row3 = vld1q_s16(data + 3*8);
2306    row4 = vld1q_s16(data + 4*8);
2307    row5 = vld1q_s16(data + 5*8);
2308    row6 = vld1q_s16(data + 6*8);
2309    row7 = vld1q_s16(data + 7*8);
2310 
2311    // add DC bias
2312    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2313 
2314    // column pass
2315    dct_pass(vrshrn_n_s32, 10);
2316 
2317    // 16bit 8x8 transpose
2318    {
2319 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2320 // whether compilers actually get this is another story, sadly.
2321 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2322 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2323 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2324 
2325       // pass 1
2326       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2327       dct_trn16(row2, row3);
2328       dct_trn16(row4, row5);
2329       dct_trn16(row6, row7);
2330 
2331       // pass 2
2332       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2333       dct_trn32(row1, row3);
2334       dct_trn32(row4, row6);
2335       dct_trn32(row5, row7);
2336 
2337       // pass 3
2338       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2339       dct_trn64(row1, row5);
2340       dct_trn64(row2, row6);
2341       dct_trn64(row3, row7);
2342 
2343 #undef dct_trn16
2344 #undef dct_trn32
2345 #undef dct_trn64
2346    }
2347 
2348    // row pass
2349    // vrshrn_n_s32 only supports shifts up to 16, we need
2350    // 17. so do a non-rounding shift of 16 first then follow
2351    // up with a rounding shift by 1.
2352    dct_pass(vshrn_n_s32, 16);
2353 
2354    {
2355       // pack and round
2356       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2357       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2358       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2359       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2360       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2361       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2362       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2363       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2364 
2365       // again, these can translate into one instruction, but often don't.
2366 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2367 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2368 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2369 
2370       // sadly can't use interleaved stores here since we only write
2371       // 8 bytes to each scan line!
2372 
2373       // 8x8 8-bit transpose pass 1
2374       dct_trn8_8(p0, p1);
2375       dct_trn8_8(p2, p3);
2376       dct_trn8_8(p4, p5);
2377       dct_trn8_8(p6, p7);
2378 
2379       // pass 2
2380       dct_trn8_16(p0, p2);
2381       dct_trn8_16(p1, p3);
2382       dct_trn8_16(p4, p6);
2383       dct_trn8_16(p5, p7);
2384 
2385       // pass 3
2386       dct_trn8_32(p0, p4);
2387       dct_trn8_32(p1, p5);
2388       dct_trn8_32(p2, p6);
2389       dct_trn8_32(p3, p7);
2390 
2391       // store
2392       vst1_u8(out, p0); out += out_stride;
2393       vst1_u8(out, p1); out += out_stride;
2394       vst1_u8(out, p2); out += out_stride;
2395       vst1_u8(out, p3); out += out_stride;
2396       vst1_u8(out, p4); out += out_stride;
2397       vst1_u8(out, p5); out += out_stride;
2398       vst1_u8(out, p6); out += out_stride;
2399       vst1_u8(out, p7);
2400 
2401 #undef dct_trn8_8
2402 #undef dct_trn8_16
2403 #undef dct_trn8_32
2404    }
2405 
2406 #undef dct_long_mul
2407 #undef dct_long_mac
2408 #undef dct_widen
2409 #undef dct_wadd
2410 #undef dct_wsub
2411 #undef dct_bfly32o
2412 #undef dct_pass
2413 }
2414 
2415 #endif // STBI_NEON
2416 
#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that;
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s);
   return x;
}
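
/* Illustrative sketch, not part of stb_image: the marker scan in
   stbi__get_marker() can be exercised on a plain memory buffer. The names
   sketch_reader, sketch_get8 and sketch_get_marker are hypothetical, and
   returning 0 at the end of the buffer is an assumption made for this sketch.
   The cached-marker case (j->marker) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MARKER_NONE 0xff

// hypothetical in-memory byte cursor standing in for the real I/O context
typedef struct {
   const unsigned char *p;
   const unsigned char *end;
} sketch_reader;

// returns the next byte, or 0 once the buffer is exhausted (assumption)
static unsigned char sketch_get8(sketch_reader *r)
{
   assert(r != NULL && r->p <= r->end);
   return (r->p < r->end) ? *r->p++ : 0;
}

// a marker is one or more 0xff fill bytes followed by a non-0xff byte;
// anything that does not start with 0xff is reported as "no marker"
static unsigned char sketch_get_marker(sketch_reader *r)
{
   unsigned char x = sketch_get8(r);
   if (x != 0xff) return SKETCH_MARKER_NONE;
   while (x == 0xff)
      x = sketch_get8(r);
   return x;
}
```

   For the bytes ff ff d8 12 ff da, successive calls return 0xd8, then
   SKETCH_MARKER_NONE (0x12 does not start a marker), then 0xda.
*/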
2431 
// in each scan, we'll have scan_n components, and the order
// of the components is specified by order[]
#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)

// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}
2450 
static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2452 {
2453    stbi__jpeg_reset(z);
2454    if (!z->progressive) {
2455       if (z->scan_n == 1) {
2456          int i,j;
2457          STBI_SIMD_ALIGN(short, data[64]);
2458          int n = z->order[0];
2459          // non-interleaved data, we just need to process one block at a time,
2460          // in trivial scanline order
2461          // number of blocks to do just depends on how many actual "pixels" this
2462          // component has, independent of interleaved MCU blocking and such
2463          int w = (z->img_comp[n].x+7) >> 3;
2464          int h = (z->img_comp[n].y+7) >> 3;
2465          for (j=0; j < h; ++j) {
2466             for (i=0; i < w; ++i) {
2467                int ha = z->img_comp[n].ha;
2468                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2469                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
               // every data block is an MCU, so count down the restart interval
2471                if (--z->todo <= 0) {
2472                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2473                   // if it's NOT a restart, then just bail, so we get corrupt data
2474                   // rather than no data
2475                   if (!STBI__RESTART(z->marker)) return 1;
2476                   stbi__jpeg_reset(z);
2477                }
2478             }
2479          }
2480          return 1;
2481       } else { // interleaved
2482          int i,j,k,x,y;
2483          STBI_SIMD_ALIGN(short, data[64]);
2484          for (j=0; j < z->img_mcu_y; ++j) {
2485             for (i=0; i < z->img_mcu_x; ++i) {
2486                // scan an interleaved mcu... process scan_n components in order
2487                for (k=0; k < z->scan_n; ++k) {
2488                   int n = z->order[k];
2489                   // scan out an mcu's worth of this component; that's just determined
2490                   // by the basic H and V specified for the component
2491                   for (y=0; y < z->img_comp[n].v; ++y) {
2492                      for (x=0; x < z->img_comp[n].h; ++x) {
2493                         int x2 = (i*z->img_comp[n].h + x)*8;
2494                         int y2 = (j*z->img_comp[n].v + y)*8;
2495                         int ha = z->img_comp[n].ha;
2496                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2497                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2498                      }
2499                   }
2500                }
2501                // after all interleaved components, that's an interleaved MCU,
2502                // so now count down the restart interval
2503                if (--z->todo <= 0) {
2504                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2505                   if (!STBI__RESTART(z->marker)) return 1;
2506                   stbi__jpeg_reset(z);
2507                }
2508             }
2509          }
2510          return 1;
2511       }
2512    } else {
2513       if (z->scan_n == 1) {
2514          int i,j;
2515          int n = z->order[0];
2516          // non-interleaved data, we just need to process one block at a time,
2517          // in trivial scanline order
2518          // number of blocks to do just depends on how many actual "pixels" this
2519          // component has, independent of interleaved MCU blocking and such
2520          int w = (z->img_comp[n].x+7) >> 3;
2521          int h = (z->img_comp[n].y+7) >> 3;
2522          for (j=0; j < h; ++j) {
2523             for (i=0; i < w; ++i) {
2524                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2525                if (z->spec_start == 0) {
2526                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2527                      return 0;
2528                } else {
2529                   int ha = z->img_comp[n].ha;
2530                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2531                      return 0;
2532                }
               // every data block is an MCU, so count down the restart interval
2534                if (--z->todo <= 0) {
2535                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2536                   if (!STBI__RESTART(z->marker)) return 1;
2537                   stbi__jpeg_reset(z);
2538                }
2539             }
2540          }
2541          return 1;
2542       } else { // interleaved
2543          int i,j,k,x,y;
2544          for (j=0; j < z->img_mcu_y; ++j) {
2545             for (i=0; i < z->img_mcu_x; ++i) {
2546                // scan an interleaved mcu... process scan_n components in order
2547                for (k=0; k < z->scan_n; ++k) {
2548                   int n = z->order[k];
2549                   // scan out an mcu's worth of this component; that's just determined
2550                   // by the basic H and V specified for the component
2551                   for (y=0; y < z->img_comp[n].v; ++y) {
2552                      for (x=0; x < z->img_comp[n].h; ++x) {
2553                         int x2 = (i*z->img_comp[n].h + x);
2554                         int y2 = (j*z->img_comp[n].v + y);
2555                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2556                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2557                            return 0;
2558                      }
2559                   }
2560                }
2561                // after all interleaved components, that's an interleaved MCU,
2562                // so now count down the restart interval
2563                if (--z->todo <= 0) {
2564                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2565                   if (!STBI__RESTART(z->marker)) return 1;
2566                   stbi__jpeg_reset(z);
2567                }
2568             }
2569          }
2570          return 1;
2571       }
2572    }
2573 }
2574 
static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}
2581 
static void stbi__jpeg_finish(stbi__jpeg *z)
2583 {
2584    if (z->progressive) {
2585       // dequantize and idct the data
2586       int i,j,n;
2587       for (n=0; n < z->s->img_n; ++n) {
2588          int w = (z->img_comp[n].x+7) >> 3;
2589          int h = (z->img_comp[n].y+7) >> 3;
2590          for (j=0; j < h; ++j) {
2591             for (i=0; i < w; ++i) {
2592                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2593                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2594                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2595             }
2596          }
2597       }
2598    }
2599 }
2600 
static int stbi__process_marker(stbi__jpeg *z, int m)
2602 {
2603    int L;
2604    switch (m) {
2605       case STBI__MARKER_none: // no marker found
2606          return stbi__err("expected marker","Corrupt JPEG");
2607 
2608       case 0xDD: // DRI - specify restart interval
2609          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2610          z->restart_interval = stbi__get16be(z->s);
2611          return 1;
2612 
2613       case 0xDB: // DQT - define quantization table
2614          L = stbi__get16be(z->s)-2;
2615          while (L > 0) {
2616             int q = stbi__get8(z->s);
2617             int p = q >> 4;
2618             int t = q & 15,i;
2619             if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2620             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2621             for (i=0; i < 64; ++i)
2622                z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2623             L -= 65;
2624          }
2625          return L==0;
2626 
2627       case 0xC4: // DHT - define huffman table
2628          L = stbi__get16be(z->s)-2;
2629          while (L > 0) {
2630             stbi_uc *v;
2631             int sizes[16],i,n=0;
2632             int q = stbi__get8(z->s);
2633             int tc = q >> 4;
2634             int th = q & 15;
2635             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2636             for (i=0; i < 16; ++i) {
2637                sizes[i] = stbi__get8(z->s);
2638                n += sizes[i];
2639             }
2640             L -= 17;
2641             if (tc == 0) {
2642                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2643                v = z->huff_dc[th].values;
2644             } else {
2645                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2646                v = z->huff_ac[th].values;
2647             }
2648             for (i=0; i < n; ++i)
2649                v[i] = stbi__get8(z->s);
2650             if (tc != 0)
2651                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2652             L -= n;
2653          }
2654          return L==0;
2655    }
2656    // check for comment block or APP blocks
2657    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2658       stbi__skip(z->s, stbi__get16be(z->s)-2);
2659       return 1;
2660    }
2661    return 0;
2662 }
2663 
2664 // after we see SOS
static int stbi__process_scan_header(stbi__jpeg *z)
2666 {
2667    int i;
2668    int Ls = stbi__get16be(z->s);
2669    z->scan_n = stbi__get8(z->s);
2670    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2671    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2672    for (i=0; i < z->scan_n; ++i) {
2673       int id = stbi__get8(z->s), which;
2674       int q = stbi__get8(z->s);
2675       for (which = 0; which < z->s->img_n; ++which)
2676          if (z->img_comp[which].id == id)
2677             break;
2678       if (which == z->s->img_n) return 0; // no match
2679       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2680       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2681       z->order[i] = which;
2682    }
2683 
2684    {
2685       int aa;
2686       z->spec_start = stbi__get8(z->s);
2687       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
2688       aa = stbi__get8(z->s);
2689       z->succ_high = (aa >> 4);
2690       z->succ_low  = (aa & 15);
2691       if (z->progressive) {
2692          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2693             return stbi__err("bad SOS", "Corrupt JPEG");
2694       } else {
2695          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2696          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2697          z->spec_end = 63;
2698       }
2699    }
2700 
2701    return 1;
2702 }
2703 
static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2705 {
2706    stbi__context *s = z->s;
2707    int Lf,p,i,q, h_max=1,v_max=1,c;
2708    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2709    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2710    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2711    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2712    c = stbi__get8(s);
2713    if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
2714    s->img_n = c;
2715    for (i=0; i < c; ++i) {
2716       z->img_comp[i].data = NULL;
2717       z->img_comp[i].linebuf = NULL;
2718    }
2719 
2720    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
2721 
2722    z->rgb = 0;
2723    for (i=0; i < s->img_n; ++i) {
2724       static unsigned char rgb[3] = { 'R', 'G', 'B' };
2725       z->img_comp[i].id = stbi__get8(s);
2726       if (z->img_comp[i].id != i+1)   // JFIF requires
2727          if (z->img_comp[i].id != i) {  // some version of jpegtran outputs non-JFIF-compliant files!
            // some encoders output 'R','G','B' as component IDs (see http://fileformats.archiveteam.org/wiki/JPEG#Color_format)
2729             if (z->img_comp[i].id != rgb[i])
2730                return stbi__err("bad component ID","Corrupt JPEG");
2731             ++z->rgb;
2732          }
2733       q = stbi__get8(s);
2734       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
2735       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
2736       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
2737    }
2738 
2739    if (scan != STBI__SCAN_load) return 1;
2740 
2741    if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2742 
2743    for (i=0; i < s->img_n; ++i) {
2744       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
2745       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
2746    }
2747 
2748    // compute interleaved mcu info
2749    z->img_h_max = h_max;
2750    z->img_v_max = v_max;
2751    z->img_mcu_w = h_max * 8;
2752    z->img_mcu_h = v_max * 8;
2753    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
2754    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
2755 
2756    for (i=0; i < s->img_n; ++i) {
2757       // number of effective pixels (e.g. for non-interleaved MCU)
2758       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
2759       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
2760       // to simplify generation, we'll allocate enough memory to decode
2761       // the bogus oversized data from using interleaved MCUs and their
2762       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
2763       // discard the extra data until colorspace conversion
2764       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
2765       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
2766       z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2767 
2768       if (z->img_comp[i].raw_data == NULL) {
2769          for(--i; i >= 0; --i) {
2770             STBI_FREE(z->img_comp[i].raw_data);
2771             z->img_comp[i].raw_data = NULL;
2772          }
2773          return stbi__err("outofmem", "Out of memory");
2774       }
2775       // align blocks for idct using mmx/sse
2776       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
2777       z->img_comp[i].linebuf = NULL;
2778       if (z->progressive) {
2779          z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2780          z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2781          z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2782          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2783       } else {
2784          z->img_comp[i].coeff = 0;
2785          z->img_comp[i].raw_coeff = 0;
2786       }
2787    }
2788 
2789    return 1;
2790 }
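
/* Illustrative sketch, not part of stb_image: the size computations from
   stbi__process_frame_header(), pulled into a standalone function so the
   "image of width 33" case mentioned in the comment there can be checked by
   hand. sketch_comp and sketch_mcu_layout are hypothetical names; only the
   formulas are taken from the function.

```c
#include <assert.h>

// hypothetical per-component record: h,v are the sampling factors (inputs),
// the remaining fields are filled in by sketch_mcu_layout
typedef struct {
   int h, v;     // sampling factors from the frame header
   int x, y;     // effective size in pixels
   int w2, h2;   // padded size covering whole interleaved MCUs
} sketch_comp;

static void sketch_mcu_layout(int img_x, int img_y, sketch_comp *c, int n, int *mcu_x, int *mcu_y)
{
   int i, h_max = 1, v_max = 1;
   assert(img_x > 0 && img_y > 0 && n > 0);
   for (i = 0; i < n; ++i) {
      if (c[i].h > h_max) h_max = c[i].h;
      if (c[i].v > v_max) v_max = c[i].v;
   }
   // one interleaved MCU covers (h_max*8) x (v_max*8) pixels; round up
   *mcu_x = (img_x + h_max*8 - 1) / (h_max*8);
   *mcu_y = (img_y + v_max*8 - 1) / (v_max*8);
   for (i = 0; i < n; ++i) {
      c[i].x  = (img_x * c[i].h + h_max-1) / h_max;
      c[i].y  = (img_y * c[i].v + v_max-1) / v_max;
      c[i].w2 = *mcu_x * c[i].h * 8;
      c[i].h2 = *mcu_y * c[i].v * 8;
   }
}
```

   For a 33x17 image with 2x2 luma and 1x1 chroma sampling this gives 3x2
   MCUs. Luma has x=33, y=17 but is padded to w2=48, h2=32; each chroma plane
   has x=17, y=9 and is padded to w2=24, h2=16.
*/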
2791 
2792 // use comparisons since in some cases we handle more than one case (e.g. SOF)
2793 #define stbi__DNL(x)         ((x) == 0xdc)
2794 #define stbi__SOI(x)         ((x) == 0xd8)
2795 #define stbi__EOI(x)         ((x) == 0xd9)
2796 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
2797 #define stbi__SOS(x)         ((x) == 0xda)
2798 
2799 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
2800 
static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
2802 {
2803    int m;
2804    z->marker = STBI__MARKER_none; // initialize cached marker to empty
2805    m = stbi__get_marker(z);
2806    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2807    if (scan == STBI__SCAN_type) return 1;
2808    m = stbi__get_marker(z);
2809    while (!stbi__SOF(m)) {
2810       if (!stbi__process_marker(z,m)) return 0;
2811       m = stbi__get_marker(z);
2812       while (m == STBI__MARKER_none) {
2813          // some files have extra padding after their blocks, so ok, we'll scan
2814          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
2815          m = stbi__get_marker(z);
2816       }
2817    }
2818    z->progressive = stbi__SOF_progressive(m);
2819    if (!stbi__process_frame_header(z, scan)) return 0;
2820    return 1;
2821 }
2822 
2823 // decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
2825 {
2826    int m;
2827    for (m = 0; m < 4; m++) {
2828       j->img_comp[m].raw_data = NULL;
2829       j->img_comp[m].raw_coeff = NULL;
2830    }
2831    j->restart_interval = 0;
2832    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
2833    m = stbi__get_marker(j);
2834    while (!stbi__EOI(m)) {
2835       if (stbi__SOS(m)) {
2836          if (!stbi__process_scan_header(j)) return 0;
2837          if (!stbi__parse_entropy_coded_data(j)) return 0;
2838          if (j->marker == STBI__MARKER_none ) {
2839             // handle 0s at the end of image data from IP Kamera 9060
2840             while (!stbi__at_eof(j->s)) {
2841                int x = stbi__get8(j->s);
2842                if (x == 255) {
2843                   j->marker = stbi__get8(j->s);
2844                   break;
2845                } else if (x != 0) {
2846                   return stbi__err("junk before marker", "Corrupt JPEG");
2847                }
2848             }
2849             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
2850          }
2851       } else {
2852          if (!stbi__process_marker(j, m)) return 0;
2853       }
2854       m = stbi__get_marker(j);
2855    }
2856    if (j->progressive)
2857       stbi__jpeg_finish(j);
2858    return 1;
2859 }
2860 
2861 // static jfif-centered resampling (across block boundaries)
2862 
2863 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
2864                                     int w, int hs);
2865 
2866 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
2867 
static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}
2876 
static stbi_uc *stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}
2886 
static stbi_uc *stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2888 {
2889    // need to generate two samples horizontally for every one in input
2890    int i;
2891    stbi_uc *input = in_near;
2892 
2893    if (w == 1) {
2894       // if only one sample, can't do any interpolation
2895       out[0] = out[1] = input[0];
2896       return out;
2897    }
2898 
2899    out[0] = input[0];
2900    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
2901    for (i=1; i < w-1; ++i) {
2902       int n = 3*input[i]+2;
2903       out[i*2+0] = stbi__div4(n+input[i-1]);
2904       out[i*2+1] = stbi__div4(n+input[i+1]);
2905    }
2906    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
2907    out[i*2+1] = input[w-1];
2908 
2909    STBI_NOTUSED(in_far);
2910    STBI_NOTUSED(hs);
2911 
2912    return out;
2913 }
2914 
2915 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
2916 
static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2918 {
2919    // need to generate 2x2 samples for every one in input
2920    int i,t0,t1;
2921    if (w == 1) {
2922       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2923       return out;
2924    }
2925 
2926    t1 = 3*in_near[0] + in_far[0];
2927    out[0] = stbi__div4(t1+2);
2928    for (i=1; i < w; ++i) {
2929       t0 = t1;
2930       t1 = 3*in_near[i]+in_far[i];
2931       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
2932       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
2933    }
2934    out[w*2-1] = stbi__div4(t1+2);
2935 
2936    STBI_NOTUSED(hs);
2937 
2938    return out;
2939 }
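
/* Illustrative sketch, not part of stb_image: a standalone copy of the 2x2
   upsampling arithmetic in stbi__resample_row_hv_2(), using plain
   unsigned char so it can be run on a tiny row. sketch_upsample_hv2 is a
   hypothetical name. Each input column is first blended vertically as
   3*near + far, then neighbouring columns are blended 3:1 horizontally and
   the combined scale of 16 is removed with rounding.

```c
#include <assert.h>

// out must have room for 2*w bytes; in_near and in_far hold w bytes each
static void sketch_upsample_hv2(unsigned char *out, const unsigned char *in_near, const unsigned char *in_far, int w)
{
   int i, t0, t1;
   assert(w >= 1);
   if (w == 1) {
      // a single column: vertical blend only
      out[0] = out[1] = (unsigned char) ((3*in_near[0] + in_far[0] + 2) >> 2);
      return;
   }
   t1 = 3*in_near[0] + in_far[0];
   out[0] = (unsigned char) ((t1 + 2) >> 2);           // left edge
   for (i = 1; i < w; ++i) {
      t0 = t1;                                         // previous column
      t1 = 3*in_near[i] + in_far[i];                   // current column
      out[i*2-1] = (unsigned char) ((3*t0 + t1 + 8) >> 4);
      out[i*2  ] = (unsigned char) ((3*t1 + t0 + 8) >> 4);
   }
   out[w*2-1] = (unsigned char) ((t1 + 2) >> 2);       // right edge
}
```

   With both input rows equal to {0, 16}, the output is {0, 4, 12, 16}: the
   two edge samples are passed through and the interior samples sit a quarter
   and three quarters of the way between the inputs.
*/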
2940 
2941 #if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
2943 {
2944    // need to generate 2x2 samples for every one in input
2945    int i=0,t0,t1;
2946 
2947    if (w == 1) {
2948       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
2949       return out;
2950    }
2951 
2952    t1 = 3*in_near[0] + in_far[0];
2953    // process groups of 8 pixels for as long as we can.
2954    // note we can't handle the last pixel in a row in this loop
2955    // because we need to handle the filter boundary conditions.
2956    for (; i < ((w-1) & ~7); i += 8) {
2957 #if defined(STBI_SSE2)
2958       // load and perform the vertical filtering pass
2959       // this uses 3*x + y = 4*x + (y - x)
2960       __m128i zero  = _mm_setzero_si128();
2961       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
2962       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
2963       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
2964       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
2965       __m128i diff  = _mm_sub_epi16(farw, nearw);
2966       __m128i nears = _mm_slli_epi16(nearw, 2);
2967       __m128i curr  = _mm_add_epi16(nears, diff); // current row
2968 
2969       // horizontal filter works the same based on shifted vers of current
2970       // row. "prev" is current row shifted right by 1 pixel; we need to
2971       // insert the previous pixel value (from t1).
2972       // "next" is current row shifted left by 1 pixel, with first pixel
2973       // of next block of 8 pixels added in.
2974       __m128i prv0 = _mm_slli_si128(curr, 2);
2975       __m128i nxt0 = _mm_srli_si128(curr, 2);
2976       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
2977       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
2978 
2979       // horizontal filter, polyphase implementation since it's convenient:
2980       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
2981       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
2982       // note the shared term.
2983       __m128i bias  = _mm_set1_epi16(8);
2984       __m128i curs = _mm_slli_epi16(curr, 2);
2985       __m128i prvd = _mm_sub_epi16(prev, curr);
2986       __m128i nxtd = _mm_sub_epi16(next, curr);
2987       __m128i curb = _mm_add_epi16(curs, bias);
2988       __m128i even = _mm_add_epi16(prvd, curb);
2989       __m128i odd  = _mm_add_epi16(nxtd, curb);
2990 
2991       // interleave even and odd pixels, then undo scaling.
2992       __m128i int0 = _mm_unpacklo_epi16(even, odd);
2993       __m128i int1 = _mm_unpackhi_epi16(even, odd);
2994       __m128i de0  = _mm_srli_epi16(int0, 4);
2995       __m128i de1  = _mm_srli_epi16(int1, 4);
2996 
2997       // pack and write output
2998       __m128i outv = _mm_packus_epi16(de0, de1);
2999       _mm_storeu_si128((__m128i *) (out + i*2), outv);
3000 #elif defined(STBI_NEON)
3001       // load and perform the vertical filtering pass
3002       // this uses 3*x + y = 4*x + (y - x)
3003       uint8x8_t farb  = vld1_u8(in_far + i);
3004       uint8x8_t nearb = vld1_u8(in_near + i);
3005       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3006       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3007       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3008 
3009       // horizontal filter works the same based on shifted vers of current
3010       // row. "prev" is current row shifted right by 1 pixel; we need to
3011       // insert the previous pixel value (from t1).
3012       // "next" is current row shifted left by 1 pixel, with first pixel
3013       // of next block of 8 pixels added in.
3014       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3015       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3016       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3017       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3018 
3019       // horizontal filter, polyphase implementation since it's convenient:
3020       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3021       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3022       // note the shared term.
3023       int16x8_t curs = vshlq_n_s16(curr, 2);
3024       int16x8_t prvd = vsubq_s16(prev, curr);
3025       int16x8_t nxtd = vsubq_s16(next, curr);
3026       int16x8_t even = vaddq_s16(curs, prvd);
3027       int16x8_t odd  = vaddq_s16(curs, nxtd);
3028 
3029       // undo scaling and round, then store with even/odd phases interleaved
3030       uint8x8x2_t o;
3031       o.val[0] = vqrshrun_n_s16(even, 4);
3032       o.val[1] = vqrshrun_n_s16(odd,  4);
3033       vst2_u8(out + i*2, o);
3034 #endif
3035 
3036       // "previous" value for next iter
3037       t1 = 3*in_near[i+7] + in_far[i+7];
3038    }
3039 
3040    t0 = t1;
3041    t1 = 3*in_near[i] + in_far[i];
3042    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3043 
3044    for (++i; i < w; ++i) {
3045       t0 = t1;
3046       t1 = 3*in_near[i]+in_far[i];
3047       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3048       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3049    }
3050    out[w*2-1] = stbi__div4(t1+2);
3051 
3052    STBI_NOTUSED(hs);
3053 
3054    return out;
3055 }
3056 #endif
3057 
static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}
3068 
3069 #ifdef STBI_JPEG_OLD
3070 // this is the same YCbCr-to-RGB calculation that stb_image has used
3071 // historically before the algorithm changes in 1.49
3072 #define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3074 {
3075    int i;
3076    for (i=0; i < count; ++i) {
3077       int y_fixed = (y[i] << 16) + 32768; // rounding
3078       int r,g,b;
3079       int cr = pcr[i] - 128;
3080       int cb = pcb[i] - 128;
3081       r = y_fixed + cr*float2fixed(1.40200f);
3082       g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3083       b = y_fixed                            + cb*float2fixed(1.77200f);
3084       r >>= 16;
3085       g >>= 16;
3086       b >>= 16;
3087       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3088       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3089       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3090       out[0] = (stbi_uc)r;
3091       out[1] = (stbi_uc)g;
3092       out[2] = (stbi_uc)b;
3093       out[3] = 255;
3094       out += step;
3095    }
3096 }
3097 #else
3098 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3099 // to make sure the code produces the same results in both SIMD and scalar
3100 #define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
stbi__YCbCr_to_RGB_row(stbi_uc * out,const stbi_uc * y,const stbi_uc * pcb,const stbi_uc * pcr,int count,int step)3101 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3102 {
3103    int i;
3104    for (i=0; i < count; ++i) {
3105       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3106       int r,g,b;
3107       int cr = pcr[i] - 128;
3108       int cb = pcb[i] - 128;
3109       r = y_fixed +  cr* float2fixed(1.40200f);
3110       g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3111       b = y_fixed                               +   cb* float2fixed(1.77200f);
3112       r >>= 20;
3113       g >>= 20;
3114       b >>= 20;
3115       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3116       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3117       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3118       out[0] = (stbi_uc)r;
3119       out[1] = (stbi_uc)g;
3120       out[2] = (stbi_uc)b;
3121       out[3] = 255;
3122       out += step;
3123    }
3124 }
3125 #endif
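
/* Illustrative sketch, not part of stb_image: the reduced-precision path
   above can be tried on a single pixel in isolation. The sketch_ names and
   the use of <stdint.h> types are assumptions made for this example. Like
   the code above, it assumes a 32-bit int and an arithmetic right shift of
   negative values; the mask ~0xffff is meant to match 0xffff0000 under
   those assumptions.

```c
#include <assert.h>
#include <stdint.h>

// Coefficient rounded to 12 fractional bits, then pre-shifted to 20 bits.
#define SKETCH_F2F(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)

// Clamp an integer to the 0..255 range of an 8-bit channel.
static uint8_t sketch_clamp(int v)
{
   return (uint8_t) (v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Convert one YCbCr sample triple to RGB using 20-bit fixed point.
static void sketch_ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
   int y_fixed, crc, cbc, r, g, b;
   assert(rgb != 0);
   y_fixed = ((int) y << 20) + (1 << 19);   // luma plus 0.5 for rounding
   crc = (int) cr - 128;                    // center chroma around zero
   cbc = (int) cb - 128;
   r = y_fixed + crc *  SKETCH_F2F(1.40200f);
   g = y_fixed + crc * -SKETCH_F2F(0.71414f)
               + ((cbc * -SKETCH_F2F(0.34414f)) & ~0xffff);
   b = y_fixed + cbc *  SKETCH_F2F(1.77200f);
   rgb[0] = sketch_clamp(r >> 20);
   rgb[1] = sketch_clamp(g >> 20);
   rgb[2] = sketch_clamp(b >> 20);
}
```

   With neutral chroma (cb == cr == 128) both chroma terms vanish, so the
   output is a gray level equal to y; extreme chroma values exercise the
   clamp.
*/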

#if defined(STBI_SSE2) || defined(STBI_NEON)
static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
{
   int i = 0;

#ifdef STBI_SSE2
   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
   // it's useful in practice (you wouldn't use it for textures, for example).
   // so just accelerate step == 4 case.
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      __m128i signflip  = _mm_set1_epi8(-0x80);
      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
      __m128i xw = _mm_set1_epi16(255); // alpha channel

      for (; i+7 < count; i += 8) {
         // load
         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128

         // unpack to short (and left-shift cr, cb by 8)
         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);

         // color transform
         __m128i yws = _mm_srli_epi16(yw, 4);
         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
         __m128i rws = _mm_add_epi16(cr0, yws);
         __m128i gwt = _mm_add_epi16(cb0, yws);
         __m128i bws = _mm_add_epi16(yws, cb1);
         __m128i gws = _mm_add_epi16(gwt, cr1);

         // descale
         __m128i rw = _mm_srai_epi16(rws, 4);
         __m128i bw = _mm_srai_epi16(bws, 4);
         __m128i gw = _mm_srai_epi16(gws, 4);

         // back to byte, set up for transpose
         __m128i brb = _mm_packus_epi16(rw, bw);
         __m128i gxb = _mm_packus_epi16(gw, xw);

         // transpose to interleave channels
         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
         __m128i o1 = _mm_unpackhi_epi16(t0, t1);

         // store
         _mm_storeu_si128((__m128i *) (out + 0), o0);
         _mm_storeu_si128((__m128i *) (out + 16), o1);
         out += 32;
      }
   }
#endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // can't error after this, so this is safe
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               if (z->rgb == 3) {
                  for (i=0; i < z->s->img_x; ++i) {
                     out[0] = y[i];
                     out[1] = coutput[1][i];
                     out[2] = coutput[2][i];
                     out[3] = 255;
                     out += n;
                  }
               } else {
                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
               }
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp  = z->s->img_n; // report original components, not output
      return output;
   }
}

static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char* result;
   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
   // the decoder state is heap-allocated, so the allocation can fail
   if (!j) return stbi__errpuc("outofmem", "Out of memory");
   j->s = s;
   stbi__setup_jpeg(j);
   result = load_jpeg_image(j, x,y,comp,req_comp);
   STBI_FREE(j);
   return result;
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   int result;
   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
   // the decoder state is heap-allocated, so the allocation can fail
   if (!j) return stbi__err("outofmem", "Out of memory");
   j->s = s;
   result = stbi__jpeg_info_raw(j, x, y, comp);
   STBI_FREE(j);
   return result;
}
#endif

// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs from left, zlib from right, so can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
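
/* Illustrative sketch, not part of stb_image: the "reverse 16 and shift"
   trick described above, restated as a standalone pair of functions so it
   can be tried on small values. The sketch_ names are assumptions made for
   this example; the swap sequence mirrors the one used above.

```c
#include <assert.h>

// Reverse the low 16 bits by swapping ever larger groups:
// adjacent bits, then pairs, then nibbles, then bytes.
static int sketch_bitreverse16(int n)
{
   n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
   n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
   n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
   n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
   return n;
}

// Reverse only the low `bits` bits: reverse all 16, then drop the
// (16 - bits) low-order positions that the unused high bits moved into.
static int sketch_bit_reverse(int v, int bits)
{
   assert(bits >= 1 && bits <= 16);
   return sketch_bitreverse16(v) >> (16 - bits);
}
```

   For instance, the 3-bit value 110 (6) reverses to 011 (3), and the 11-bit
   value 1 reverses to 1 << 10, matching the "shift away 5" remark above.
*/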

static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      if (sizes[i] > (1 << i))
         return stbi__err("bad sizes", "Corrupt PNG");
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int j = stbi__bit_reverse(next_code[s],s);
            while (j < (1 << STBI__ZFAST_BITS)) {
               z->fast[j] = fastv;
               j += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
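
/* Illustrative sketch, not part of stb_image: the "DEFLATE spec for
   generating codes" step referenced above assigns canonical Huffman codes
   from code lengths alone. The helper below follows the procedure in
   RFC 1951 section 3.2.2 and leaves out the fast table and the
   firstcode/firstsymbol bookkeeping. The sketch_ names are assumptions made
   for this example.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_BITS 15

// Assign canonical codes to `num` symbols given their code lengths
// (0 = symbol unused). Returns 1 on success, 0 if a length is out of
// range or the lengths are over-subscribed.
static int sketch_assign_codes(const unsigned char *lengths, int num,
                               unsigned short *codes)
{
   int bl_count[SKETCH_MAX_BITS + 1];
   int next_code[SKETCH_MAX_BITS + 1];
   int i, bits, code = 0;

   assert(lengths != 0 && codes != 0 && num >= 0);

   // step 1: count how many symbols use each code length
   memset(bl_count, 0, sizeof(bl_count));
   for (i = 0; i < num; ++i) {
      if (lengths[i] > SKETCH_MAX_BITS) return 0;
      ++bl_count[lengths[i]];
   }
   bl_count[0] = 0;

   // step 2: find the smallest code value for each length
   for (bits = 1; bits <= SKETCH_MAX_BITS; ++bits) {
      code = (code + bl_count[bits - 1]) << 1;
      next_code[bits] = code;
      // the last code of this length must still fit in `bits` bits
      if (bl_count[bits] && code + bl_count[bits] - 1 >= (1 << bits)) return 0;
   }

   // step 3: hand out consecutive codes in symbol order
   for (i = 0; i < num; ++i) {
      int len = lengths[i];
      codes[i] = len ? (unsigned short) next_code[len]++ : 0;
   }
   return 1;
}
```

   With the lengths from the RFC's own example, (3,3,3,3,3,2,4,4) for
   symbols A..H, this yields the codes 010, 011, 100, 101, 110, 00, 1110,
   1111.
*/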

// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}

static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}

static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit, old_limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = old_limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
   STBI_NOTUSED(old_limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}

static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};

static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            if (len) { do *zout++ = v; while (--len); }
         } else {
            if (len) { do *zout++ = *p++; while (--len); }
         }
      }
   }
}

static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;
   int ntot  = hlit + hdist;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < ntot) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else {
         stbi_uc fill = 0;
         if (c == 16) {
            // repeats the previous length, so a previous length must exist
            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
            c = stbi__zreceive(a,2)+3;
            fill = lencodes[n-1];
         } else if (c == 17) {
            c = stbi__zreceive(a,3)+3;
         } else {
            STBI_ASSERT(c == 18);
            c = stbi__zreceive(a,7)+11;
         }
         // a run must not extend past the declared number of code lengths
         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
         memset(lencodes+n, fill, c);
         n += c;
      }
   }
   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}

static int stbi__parse_uncompressed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}

static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
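
/* Illustrative sketch, not part of stb_image: the three header checks above
   applied to a bare CMF/FLG byte pair, so common header values can be
   tested without a stbi__zbuf. The function name is an assumption made for
   this example.

```c
// Returns 1 if the two header bytes would pass the checks above, else 0.
static int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf * 256 + flg) % 31 != 0) return 0; // FCHECK: 16-bit value is a multiple of 31
   if (flg & 32) return 0;                    // FDICT set: preset dictionary, rejected
   if ((cmf & 15) != 8) return 0;             // CM must be 8 (DEFLATE)
   return 1;
}
```

   The common pair 0x78 0x9C passes: 0x789C is 30876, which is 31 * 996.
   The pair 0x78 0xBB also satisfies the multiple-of-31 rule but has the
   preset dictionary bit set, so it is rejected.
*/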

// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}

static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncompressed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}
3889 
3890 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
3891 {
3892    stbi__zbuf a;
3893    a.zbuffer = (stbi_uc *) ibuffer;
3894    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3895    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
3896       return (int) (a.zout - a.zout_start);
3897    else
3898       return -1;
3899 }
3900 
3901 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
3902 {
3903    stbi__zbuf a;
3904    char *p = (char *) stbi__malloc(16384);
3905    if (p == NULL) return NULL;
3906    a.zbuffer = (stbi_uc *) buffer;
3907    a.zbuffer_end = (stbi_uc *) buffer+len;
3908    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
3909       if (outlen) *outlen = (int) (a.zout - a.zout_start);
3910       return a.zout_start;
3911    } else {
3912       STBI_FREE(a.zout_start);
3913       return NULL;
3914    }
3915 }
3916 
3917 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
3918 {
3919    stbi__zbuf a;
3920    a.zbuffer = (stbi_uc *) ibuffer;
3921    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
3922    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
3923       return (int) (a.zout - a.zout_start);
3924    else
3925       return -1;
3926 }
3927 #endif
3928 
3929 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
3930 //    simple implementation
3931 //      - only 8-bit samples
3932 //      - no CRC checking
3933 //      - allocates lots of intermediate memory
3934 //        - avoids problem of streaming data between subsystems
3935 //        - avoids explicit window management
3936 //    performance
3937 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
3938 
3939 #ifndef STBI_NO_PNG
3940 typedef struct
3941 {
3942    stbi__uint32 length;
3943    stbi__uint32 type;
3944 } stbi__pngchunk;
3945 
3946 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
3947 {
3948    stbi__pngchunk c;
3949    c.length = stbi__get32be(s);
3950    c.type   = stbi__get32be(s);
3951    return c;
3952 }
3953 
3954 static int stbi__check_png_header(stbi__context *s)
3955 {
3956    static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
3957    int i;
3958    for (i=0; i < 8; ++i)
3959       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
3960    return 1;
3961 }
3962 
3963 typedef struct
3964 {
3965    stbi__context *s;
3966    stbi_uc *idata, *expanded, *out;
3967    int depth;
3968 } stbi__png;
3969 
3970 
3971 enum {
3972    STBI__F_none=0,
3973    STBI__F_sub=1,
3974    STBI__F_up=2,
3975    STBI__F_avg=3,
3976    STBI__F_paeth=4,
3977    // synthetic filters used for first scanline to avoid needing a dummy row of 0s
3978    STBI__F_avg_first,
3979    STBI__F_paeth_first
3980 };
3981 
3982 static stbi_uc first_row_filter[5] =
3983 {
3984    STBI__F_none,
3985    STBI__F_sub,
3986    STBI__F_none,
3987    STBI__F_avg_first,
3988    STBI__F_paeth_first
3989 };
3990 
3991 static int stbi__paeth(int a, int b, int c)
3992 {
3993    int p = a + b - c;
3994    int pa = abs(p-a);
3995    int pb = abs(p-b);
3996    int pc = abs(p-c);
3997    if (pa <= pb && pa <= pc) return a;
3998    if (pb <= pc) return b;
3999    return c;
4000 }
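/*
   Usage sketch (not part of the library): how a Paeth-filtered row can be
   reconstructed byte by byte with the predictor above. The names
   paeth_sketch and unfilter_paeth_row are hypothetical and exist only for
   illustration; the decoder itself handles the first pixel separately and
   uses macro-generated loops instead of this per-byte branch.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical copy of the predictor: picks whichever of a (left), b (up),
// c (up-left) is closest to a + b - c, preferring a, then b, on ties.
static int paeth_sketch(int a, int b, int c)
{
   int p  = a + b - c;
   int pa = abs(p - a);
   int pb = abs(p - b);
   int pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

// Hypothetical helper: undo the Paeth filter for one row of n bytes with
// bpp bytes per pixel. prior is the already-reconstructed row above.
static void unfilter_paeth_row(unsigned char *cur, const unsigned char *raw,
                               const unsigned char *prior, int n, int bpp)
{
   int k;
   assert(bpp > 0);
   for (k = 0; k < n; ++k) {
      int left   = k >= bpp ? cur[k - bpp]   : 0;  // bytes left of the first pixel count as 0
      int up     = prior[k];
      int upleft = k >= bpp ? prior[k - bpp] : 0;
      cur[k] = (unsigned char) ((raw[k] + paeth_sketch(left, up, upleft)) & 255);
   }
}
```

   With left and up-left both 0 the predictor reduces to the byte above,
   which is why the first-pixel case in the decoder calls
   stbi__paeth(0,prior[k],0).
*/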
4001 
4002 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
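/*
   Usage sketch (not part of the library): the table above is indexed by bit
   depth and holds the multiplier that stretches a 1/2/4-bit grayscale sample
   to the full 0..255 range. sketch_scale_gray is a hypothetical name used
   only for this illustration.

```c
#include <assert.h>

// Hypothetical helper: scale a grayscale sample of the given bit depth
// (1, 2, 4 or 8) to 8 bits using the same multipliers as the table above.
static unsigned char sketch_scale_gray(unsigned char v, int depth)
{
   static const unsigned char scale[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
   assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
   return (unsigned char) (v * scale[depth]);
}
```

   For example, the largest 2-bit value 3 maps to 3 * 0x55 = 255, and the
   largest 4-bit value 15 maps to 15 * 0x11 = 255.
*/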
4003 
4004 // create the png data from post-deflated data
4005 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4006 {
4007    int bytes = (depth == 16? 2 : 1);
4008    stbi__context *s = a->s;
4009    stbi__uint32 i,j,stride = x*out_n*bytes;
4010    stbi__uint32 img_len, img_width_bytes;
4011    int k;
4012    int img_n = s->img_n; // copy it into a local for later
4013 
4014    int output_bytes = out_n*bytes;
4015    int filter_bytes = img_n*bytes;
4016    int width = x;
4017 
4018    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4019    a->out = (stbi_uc *) stbi__malloc(x * y * output_bytes); // extra bytes to write off the end into
4020    if (!a->out) return stbi__err("outofmem", "Out of memory");
4021 
4022    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4023    img_len = (img_width_bytes + 1) * y;
4024    if (s->img_x == x && s->img_y == y) {
4025       if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
4026    } else { // interlaced:
4027       if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4028    }
4029 
4030    for (j=0; j < y; ++j) {
4031       stbi_uc *cur = a->out + stride*j;
4032       stbi_uc *prior = cur - stride;
4033       int filter = *raw++;
4034 
4035       if (filter > 4)
4036          return stbi__err("invalid filter","Corrupt PNG");
4037 
4038       if (depth < 8) {
4039          STBI_ASSERT(img_width_bytes <= x);
4040          cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes, so we can decode in place
4041          filter_bytes = 1;
4042          width = img_width_bytes;
4043       }
4044 
4045       // if first row, use special filter that doesn't sample previous row
4046       if (j == 0) filter = first_row_filter[filter];
4047 
4048       // handle first byte explicitly
4049       for (k=0; k < filter_bytes; ++k) {
4050          switch (filter) {
4051             case STBI__F_none       : cur[k] = raw[k]; break;
4052             case STBI__F_sub        : cur[k] = raw[k]; break;
4053             case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4054             case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4055             case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4056             case STBI__F_avg_first  : cur[k] = raw[k]; break;
4057             case STBI__F_paeth_first: cur[k] = raw[k]; break;
4058          }
4059       }
4060 
4061       if (depth == 8) {
4062          if (img_n != out_n)
4063             cur[img_n] = 255; // first pixel
4064          raw += img_n;
4065          cur += out_n;
4066          prior += out_n;
4067       } else if (depth == 16) {
4068          if (img_n != out_n) {
4069             cur[filter_bytes]   = 255; // first pixel top byte
4070             cur[filter_bytes+1] = 255; // first pixel bottom byte
4071          }
4072          raw += filter_bytes;
4073          cur += output_bytes;
4074          prior += output_bytes;
4075       } else {
4076          raw += 1;
4077          cur += 1;
4078          prior += 1;
4079       }
4080 
4081       // this is a little gross, so that we don't switch per-pixel or per-component
4082       if (depth < 8 || img_n == out_n) {
4083          int nk = (width - 1)*filter_bytes;
4084          #define CASE(f) \
4085              case f:     \
4086                 for (k=0; k < nk; ++k)
4087          switch (filter) {
4088             // "none" filter turns into a memcpy here; make that explicit.
4089             case STBI__F_none:         memcpy(cur, raw, nk); break;
4090             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
4091             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4092             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
4093             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
4094             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
4095             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
4096          }
4097          #undef CASE
4098          raw += nk;
4099       } else {
4100          STBI_ASSERT(img_n+1 == out_n);
4101          #define CASE(f) \
4102              case f:     \
4103                 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4104                    for (k=0; k < filter_bytes; ++k)
4105          switch (filter) {
4106             CASE(STBI__F_none)         cur[k] = raw[k]; break;
4107             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); break;
4108             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4109             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); break;
4110             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); break;
4111             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); break;
4112             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); break;
4113          }
4114          #undef CASE
4115 
4116          // the loop above sets the high byte of the pixels' alpha, but for
4117          // 16 bit png files we also need the low byte set. we'll do that here.
4118          if (depth == 16) {
4119             cur = a->out + stride*j; // start at the beginning of the row again
4120             for (i=0; i < x; ++i,cur+=output_bytes) {
4121                cur[filter_bytes+1] = 255;
4122             }
4123          }
4124       }
4125    }
4126 
4127    // we make a separate pass to expand bits to pixels; for performance,
4128    // this could run two scanlines behind the above code, so it won't
4129    // interfere with filtering but will still be in the cache.
4130    if (depth < 8) {
4131       for (j=0; j < y; ++j) {
4132          stbi_uc *cur = a->out + stride*j;
4133          stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
4134          // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4135          // PNG guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4136          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4137 
4138          // note that the final byte might overshoot and write more data than desired.
4139          // we can allocate enough data that this never writes out of memory, but it
4140          // could also overwrite the next scanline. can it overwrite non-empty data
4141          // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4142          // so we need to explicitly clamp the final ones
4143 
4144          if (depth == 4) {
4145             for (k=x*img_n; k >= 2; k-=2, ++in) {
4146                *cur++ = scale * ((*in >> 4)       );
4147                *cur++ = scale * ((*in     ) & 0x0f);
4148             }
4149             if (k > 0) *cur++ = scale * ((*in >> 4)       );
4150          } else if (depth == 2) {
4151             for (k=x*img_n; k >= 4; k-=4, ++in) {
4152                *cur++ = scale * ((*in >> 6)       );
4153                *cur++ = scale * ((*in >> 4) & 0x03);
4154                *cur++ = scale * ((*in >> 2) & 0x03);
4155                *cur++ = scale * ((*in     ) & 0x03);
4156             }
4157             if (k > 0) *cur++ = scale * ((*in >> 6)       );
4158             if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4159             if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4160          } else if (depth == 1) {
4161             for (k=x*img_n; k >= 8; k-=8, ++in) {
4162                *cur++ = scale * ((*in >> 7)       );
4163                *cur++ = scale * ((*in >> 6) & 0x01);
4164                *cur++ = scale * ((*in >> 5) & 0x01);
4165                *cur++ = scale * ((*in >> 4) & 0x01);
4166                *cur++ = scale * ((*in >> 3) & 0x01);
4167                *cur++ = scale * ((*in >> 2) & 0x01);
4168                *cur++ = scale * ((*in >> 1) & 0x01);
4169                *cur++ = scale * ((*in     ) & 0x01);
4170             }
4171             if (k > 0) *cur++ = scale * ((*in >> 7)       );
4172             if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4173             if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4174             if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4175             if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4176             if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4177             if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4178          }
4179          if (img_n != out_n) {
4180             int q;
4181             // insert alpha = 255
4182             cur = a->out + stride*j;
4183             if (img_n == 1) {
4184                for (q=x-1; q >= 0; --q) {
4185                   cur[q*2+1] = 255;
4186                   cur[q*2+0] = cur[q];
4187                }
4188             } else {
4189                STBI_ASSERT(img_n == 3);
4190                for (q=x-1; q >= 0; --q) {
4191                   cur[q*4+3] = 255;
4192                   cur[q*4+2] = cur[q*3+2];
4193                   cur[q*4+1] = cur[q*3+1];
4194                   cur[q*4+0] = cur[q*3+0];
4195                }
4196             }
4197          }
4198       }
4199    } else if (depth == 16) {
4200       // force the image data from big-endian to platform-native.
4201       // this is done in a separate pass due to the decoding relying
4202       // on the data being untouched, but could probably be done
4203       // per-line during decode if care is taken.
4204       stbi_uc *cur = a->out;
4205       stbi__uint16 *cur16 = (stbi__uint16*)cur;
4206 
4207       for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
4208          *cur16 = (cur[0] << 8) | cur[1];
4209       }
4210    }
4211 
4212    return 1;
4213 }
4214 
4215 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4216 {
4217    stbi_uc *final;
4218    int p;
4219    if (!interlaced)
4220       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4221 
4222    // de-interlacing
4223    final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n); if (!final) return stbi__err("outofmem", "Out of memory");
4224    for (p=0; p < 7; ++p) {
4225       int xorig[] = { 0,4,0,2,0,1,0 };
4226       int yorig[] = { 0,0,4,0,2,0,1 };
4227       int xspc[]  = { 8,8,4,4,2,2,1 };
4228       int yspc[]  = { 8,8,8,4,4,2,2 };
4229       int i,j,x,y;
4230       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4231       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4232       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4233       if (x && y) {
4234          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4235          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4236             STBI_FREE(final);
4237             return 0;
4238          }
4239          for (j=0; j < y; ++j) {
4240             for (i=0; i < x; ++i) {
4241                int out_y = j*yspc[p]+yorig[p];
4242                int out_x = i*xspc[p]+xorig[p];
4243                memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
4244                       a->out + (j*x+i)*out_n, out_n);
4245             }
4246          }
4247          STBI_FREE(a->out);
4248          image_data += img_len;
4249          image_data_len -= img_len;
4250       }
4251    }
4252    a->out = final;
4253 
4254    return 1;
4255 }
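/*
   Usage sketch (not part of the library): the per-pass width and height
   computed in the de-interlacing loop above, pulled out into a standalone
   function. adam7_pass_size and adam7_total_pixels are hypothetical names;
   the origin and spacing tables are the same ones the loop uses.

```c
#include <assert.h>

// Hypothetical helper: size of Adam7 pass p (0..6) for an image of
// img_x by img_y pixels. Either dimension may come out as 0, in which
// case the pass carries no data at all.
static void adam7_pass_size(int img_x, int img_y, int p, int *pass_x, int *pass_y)
{
   static const int xorig[7] = { 0,4,0,2,0,1,0 };
   static const int yorig[7] = { 0,0,4,0,2,0,1 };
   static const int xspc[7]  = { 8,8,4,4,2,2,1 };
   static const int yspc[7]  = { 8,8,8,4,4,2,2 };
   assert(p >= 0 && p < 7);
   assert(img_x > 0 && img_y > 0);
   *pass_x = (img_x - xorig[p] + xspc[p] - 1) / xspc[p];
   *pass_y = (img_y - yorig[p] + yspc[p] - 1) / yspc[p];
}

// Hypothetical helper: total pixels over all seven passes. Because the
// passes partition the image, this should equal img_x * img_y.
static int adam7_total_pixels(int img_x, int img_y)
{
   int p, total = 0;
   for (p = 0; p < 7; ++p) {
      int x, y;
      adam7_pass_size(img_x, img_y, p, &x, &y);
      total += x * y;
   }
   return total;
}
```

   An 8x8 image splits into passes of 1x1, 1x1, 2x1, 2x2, 4x2, 4x4 and 8x4
   pixels, which is 64 pixels in total.
*/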
4256 
4257 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4258 {
4259    stbi__context *s = z->s;
4260    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4261    stbi_uc *p = z->out;
4262 
4263    // compute color-based transparency, assuming we've
4264    // already got 255 as the alpha value in the output
4265    STBI_ASSERT(out_n == 2 || out_n == 4);
4266 
4267    if (out_n == 2) {
4268       for (i=0; i < pixel_count; ++i) {
4269          p[1] = (p[0] == tc[0] ? 0 : 255);
4270          p += 2;
4271       }
4272    } else {
4273       for (i=0; i < pixel_count; ++i) {
4274          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4275             p[3] = 0;
4276          p += 4;
4277       }
4278    }
4279    return 1;
4280 }
4281 
4282 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4283 {
4284    stbi__context *s = z->s;
4285    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4286    stbi__uint16 *p = (stbi__uint16*) z->out;
4287 
4288    // compute color-based transparency, assuming we've
4289    // already got 65535 as the alpha value in the output
4290    STBI_ASSERT(out_n == 2 || out_n == 4);
4291 
4292    if (out_n == 2) {
4293       for (i = 0; i < pixel_count; ++i) {
4294          p[1] = (p[0] == tc[0] ? 0 : 65535);
4295          p += 2;
4296       }
4297    } else {
4298       for (i = 0; i < pixel_count; ++i) {
4299          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4300             p[3] = 0;
4301          p += 4;
4302       }
4303    }
4304    return 1;
4305 }
4306 
4307 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4308 {
4309    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4310    stbi_uc *p, *temp_out, *orig = a->out;
4311 
4312    p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
4313    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4314 
4315    // between here and free(out) below, exiting would leak
4316    temp_out = p;
4317 
4318    if (pal_img_n == 3) {
4319       for (i=0; i < pixel_count; ++i) {
4320          int n = orig[i]*4;
4321          p[0] = palette[n  ];
4322          p[1] = palette[n+1];
4323          p[2] = palette[n+2];
4324          p += 3;
4325       }
4326    } else {
4327       for (i=0; i < pixel_count; ++i) {
4328          int n = orig[i]*4;
4329          p[0] = palette[n  ];
4330          p[1] = palette[n+1];
4331          p[2] = palette[n+2];
4332          p[3] = palette[n+3];
4333          p += 4;
4334       }
4335    }
4336    STBI_FREE(a->out);
4337    a->out = temp_out;
4338 
4339    STBI_NOTUSED(len);
4340 
4341    return 1;
4342 }
4343 
4344 static int stbi__reduce_png(stbi__png *p)
4345 {
4346    int i;
4347    int img_len = p->s->img_x * p->s->img_y * p->s->img_out_n;
4348    stbi_uc *reduced;
4349    stbi__uint16 *orig = (stbi__uint16*)p->out;
4350 
4351    if (p->depth != 16) return 1; // don't need to do anything if not 16-bit data
4352 
4353    reduced = (stbi_uc *)stbi__malloc(img_len);
4354    if (reduced == NULL) return stbi__err("outofmem", "Out of memory");
4355 
4356    for (i = 0; i < img_len; ++i) reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top byte of each 16-bit sample is a decent approx of 16->8 bit scaling
4357 
4358    p->out = reduced;
4359    STBI_FREE(orig);
4360 
4361    return 1;
4362 }
4363 
4364 static int stbi__unpremultiply_on_load = 0;
4365 static int stbi__de_iphone_flag = 0;
4366 
4367 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4368 {
4369    stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4370 }
4371 
4372 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4373 {
4374    stbi__de_iphone_flag = flag_true_if_should_convert;
4375 }
4376 
4377 static void stbi__de_iphone(stbi__png *z)
4378 {
4379    stbi__context *s = z->s;
4380    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4381    stbi_uc *p = z->out;
4382 
4383    if (s->img_out_n == 3) {  // convert bgr to rgb
4384       for (i=0; i < pixel_count; ++i) {
4385          stbi_uc t = p[0];
4386          p[0] = p[2];
4387          p[2] = t;
4388          p += 3;
4389       }
4390    } else {
4391       STBI_ASSERT(s->img_out_n == 4);
4392       if (stbi__unpremultiply_on_load) {
4393          // convert bgr to rgb and unpremultiply
4394          for (i=0; i < pixel_count; ++i) {
4395             stbi_uc a = p[3];
4396             stbi_uc t = p[0];
4397             if (a) {
4398                p[0] = p[2] * 255 / a;
4399                p[1] = p[1] * 255 / a;
4400                p[2] =  t   * 255 / a;
4401             } else {
4402                p[0] = p[2];
4403                p[2] = t;
4404             }
4405             p += 4;
4406          }
4407       } else {
4408          // convert bgr to rgb
4409          for (i=0; i < pixel_count; ++i) {
4410             stbi_uc t = p[0];
4411             p[0] = p[2];
4412             p[2] = t;
4413             p += 4;
4414          }
4415       }
4416    }
4417 }
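/*
   Usage sketch (not part of the library): the per-channel arithmetic used
   by the unpremultiply branch above, isolated for clarity.
   sketch_unpremultiply is a hypothetical name. As in the loop above, a
   zero alpha leaves the channel untouched to avoid dividing by zero.

```c
#include <assert.h>

// Hypothetical helper: undo premultiplication of one 8-bit channel c by
// alpha a. Well-formed premultiplied data has c <= a, so the quotient
// fits in 8 bits; for other input the narrowing cast simply truncates.
static unsigned char sketch_unpremultiply(unsigned char c, unsigned char a)
{
   if (a == 0) return c;
   return (unsigned char) (c * 255 / a);
}
```

   The integer division truncates, so a channel of 64 at alpha 128 comes
   back as 127 rather than 128.
*/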
4418 
4419 #define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
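/*
   Usage sketch (not part of the library): how a four-character chunk name
   packs into a 32-bit big-endian value, and how bit 29 (bit 5 of the first
   character, i.e. its lowercase bit) separates critical from ancillary
   chunks. SKETCH_PNG_TYPE and sketch_chunk_is_critical are hypothetical
   names; the parser below performs the same test inline on c.type.

```c
#include <assert.h>

// Hypothetical mirror of STBI__PNG_TYPE, using unsigned arithmetic.
#define SKETCH_PNG_TYPE(a,b,c,d) \
   (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))

// Hypothetical helper: a chunk is critical when its first letter is
// uppercase, which means bit 29 of the packed type is clear.
static int sketch_chunk_is_critical(unsigned type)
{
   return (type & (1u << 29)) == 0;
}
```

   IHDR, PLTE, IDAT and IEND all start with an uppercase letter and are
   critical; a chunk such as tEXt starts with a lowercase letter and can
   be skipped by a decoder that does not understand it.
*/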
4420 
4421 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4422 {
4423    stbi_uc palette[1024], pal_img_n=0;
4424    stbi_uc has_trans=0, tc[3];
4425    stbi__uint16 tc16[3];
4426    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4427    int first=1,k,interlace=0, color=0, is_iphone=0;
4428    stbi__context *s = z->s;
4429 
4430    z->expanded = NULL;
4431    z->idata = NULL;
4432    z->out = NULL;
4433 
4434    if (!stbi__check_png_header(s)) return 0;
4435 
4436    if (scan == STBI__SCAN_type) return 1;
4437 
4438    for (;;) {
4439       stbi__pngchunk c = stbi__get_chunk_header(s);
4440       switch (c.type) {
4441          case STBI__PNG_TYPE('C','g','B','I'):
4442             is_iphone = 1;
4443             stbi__skip(s, c.length);
4444             break;
4445          case STBI__PNG_TYPE('I','H','D','R'): {
4446             int comp,filter;
4447             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4448             first = 0;
4449             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4450             s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4451             s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4452             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
4453             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
4454             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
4455             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4456             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
4457             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
4458             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4459             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4460             if (!pal_img_n) {
4461                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4462                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4463                if (scan == STBI__SCAN_header) return 1;
4464             } else {
4465                // if paletted, then pal_img_n is our final components, and
4466                // img_n is # components to decompress/filter.
4467                s->img_n = 1;
4468                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4469                // if SCAN_header, have to scan to see if we have a tRNS
4470             }
4471             break;
4472          }
4473 
4474          case STBI__PNG_TYPE('P','L','T','E'):  {
4475             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4476             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4477             pal_len = c.length / 3;
4478             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4479             for (i=0; i < pal_len; ++i) {
4480                palette[i*4+0] = stbi__get8(s);
4481                palette[i*4+1] = stbi__get8(s);
4482                palette[i*4+2] = stbi__get8(s);
4483                palette[i*4+3] = 255;
4484             }
4485             break;
4486          }
4487 
4488          case STBI__PNG_TYPE('t','R','N','S'): {
4489             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4490             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4491             if (pal_img_n) {
4492                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4493                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4494                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4495                pal_img_n = 4;
4496                for (i=0; i < c.length; ++i)
4497                   palette[i*4+3] = stbi__get8(s);
4498             } else {
4499                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4500                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4501                has_trans = 1;
4502                if (z->depth == 16) {
4503                   for (k = 0; k < s->img_n; ++k) tc16[k] = stbi__get16be(s); // copy the values as-is
4504                } else {
4505                   for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // scale sub-8-bit key values up to the 0..255 range
4506                }
4507             }
4508             break;
4509          }
4510 
4511          case STBI__PNG_TYPE('I','D','A','T'): {
4512             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4513             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4514             if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4515             if ((int)(ioff + c.length) < (int)ioff) return 0;
4516             if (ioff + c.length > idata_limit) {
4517                stbi__uint32 idata_limit_old = idata_limit;
4518                stbi_uc *p;
4519                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4520                while (ioff + c.length > idata_limit)
4521                   idata_limit *= 2;
4522                STBI_NOTUSED(idata_limit_old);
4523                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4524                z->idata = p;
4525             }
4526             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4527             ioff += c.length;
4528             break;
4529          }
4530 
4531          case STBI__PNG_TYPE('I','E','N','D'): {
4532             stbi__uint32 raw_len, bpl;
4533             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4534             if (scan != STBI__SCAN_load) return 1;
4535             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4536             // initial guess for decoded data size to avoid unnecessary reallocs
4537             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
4538             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4539             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4540             if (z->expanded == NULL) return 0; // zlib should set error
4541             STBI_FREE(z->idata); z->idata = NULL;
4542             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4543                s->img_out_n = s->img_n+1;
4544             else
4545                s->img_out_n = s->img_n;
4546             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
4547             if (has_trans) {
4548                if (z->depth == 16) {
4549                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
4550                } else {
4551                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4552                }
4553             }
4554             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4555                stbi__de_iphone(z);
4556             if (pal_img_n) {
4557                // pal_img_n == 3 or 4
4558                s->img_n = pal_img_n; // record the actual colors we had
4559                s->img_out_n = pal_img_n;
4560                if (req_comp >= 3) s->img_out_n = req_comp;
4561                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4562                   return 0;
4563             }
4564             STBI_FREE(z->expanded); z->expanded = NULL;
4565             return 1;
4566          }
4567 
4568          default:
4569             // if critical, fail
4570             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4571             if ((c.type & (1 << 29)) == 0) {
4572                #ifndef STBI_NO_FAILURE_STRINGS
4573                // not threadsafe
4574                static char invalid_chunk[] = "XXXX PNG chunk not known";
4575                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4576                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4577                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
4578                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
4579                #endif
4580                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4581             }
4582             stbi__skip(s, c.length);
4583             break;
4584       }
4585       // end of PNG chunk, read and skip CRC
4586       stbi__get32be(s);
4587    }
4588 }
4589 
4590 static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
4591 {
4592    unsigned char *result=NULL;
4593    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4594    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4595       if (p->depth == 16) {
4596          if (!stbi__reduce_png(p)) {
4597             return result;
4598          }
4599       }
4600       result = p->out;
4601       p->out = NULL;
4602       if (req_comp && req_comp != p->s->img_out_n) {
4603          result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4604          p->s->img_out_n = req_comp;
4605          if (result == NULL) return result;
4606       }
4607       *x = p->s->img_x;
4608       *y = p->s->img_y;
4609       if (n) *n = p->s->img_n;
4610    }
4611    STBI_FREE(p->out);      p->out      = NULL;
4612    STBI_FREE(p->expanded); p->expanded = NULL;
4613    STBI_FREE(p->idata);    p->idata    = NULL;
4614 
4615    return result;
4616 }
4617 
4618 static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4619 {
4620    stbi__png p;
4621    p.s = s;
4622    return stbi__do_png(&p, x,y,comp,req_comp);
4623 }
4624 
4625 static int stbi__png_test(stbi__context *s)
4626 {
4627    int r;
4628    r = stbi__check_png_header(s);
4629    stbi__rewind(s);
4630    return r;
4631 }
4632 
4633 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4634 {
4635    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4636       stbi__rewind( p->s );
4637       return 0;
4638    }
4639    if (x) *x = p->s->img_x;
4640    if (y) *y = p->s->img_y;
4641    if (comp) *comp = p->s->img_n;
4642    return 1;
4643 }
4644 
4645 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4646 {
4647    stbi__png p;
4648    p.s = s;
4649    return stbi__png_info_raw(&p, x, y, comp);
4650 }
4651 #endif
4652 
4653 // Microsoft/Windows BMP image
4654 
4655 #ifndef STBI_NO_BMP
4656 static int stbi__bmp_test_raw(stbi__context *s)
4657 {
4658    int r;
4659    int sz;
4660    if (stbi__get8(s) != 'B') return 0;
4661    if (stbi__get8(s) != 'M') return 0;
4662    stbi__get32le(s); // discard filesize
4663    stbi__get16le(s); // discard reserved
4664    stbi__get16le(s); // discard reserved
4665    stbi__get32le(s); // discard data offset
4666    sz = stbi__get32le(s);
4667    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4668    return r;
4669 }
4670 
4671 static int stbi__bmp_test(stbi__context *s)
4672 {
4673    int r = stbi__bmp_test_raw(s);
4674    stbi__rewind(s);
4675    return r;
4676 }
4677 
4678 
4679 // returns 0..31 for the highest set bit, or -1 if z is 0
4680 static int stbi__high_bit(unsigned int z)
4681 {
4682    int n=0;
4683    if (z == 0) return -1;
4684    if (z >= 0x10000) n += 16, z >>= 16;
4685    if (z >= 0x00100) n +=  8, z >>=  8;
4686    if (z >= 0x00010) n +=  4, z >>=  4;
4687    if (z >= 0x00004) n +=  2, z >>=  2;
4688    if (z >= 0x00002) n +=  1, z >>=  1;
4689    return n;
4690 }
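/* Illustration only, not part of stb_image: the function above finds the
   highest set bit by halving the search range at each step (16, 8, 4, 2, 1).
   The sketch below restates that search next to a plain shift loop so the two
   can be compared. It assumes unsigned int is 32 bits wide; all demo_* names
   are hypothetical.

```c
#include <assert.h>

// Binary-search form: test the upper half of the remaining range each step.
static int demo_high_bit(unsigned int z)
{
   int n = 0;
   if (z == 0) return -1;
   if (z >= 0x10000) { n += 16; z >>= 16; }
   if (z >= 0x00100) { n +=  8; z >>=  8; }
   if (z >= 0x00010) { n +=  4; z >>=  4; }
   if (z >= 0x00004) { n +=  2; z >>=  2; }
   if (z >= 0x00002) { n +=  1; }
   return n;
}

// Reference form: shift right until nothing is left.
static int demo_high_bit_loop(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }
   return n;
}

// Checks both forms against each other for every single-bit value.
static void demo_high_bit_selfcheck(void)
{
   int i;
   assert(demo_high_bit(0) == -1);
   assert(demo_high_bit(0xf800u) == 15);   // top bit of an RGB565 red mask
   for (i = 0; i < 32; ++i)
      assert(demo_high_bit(1u << i) == demo_high_bit_loop(1u << i));
}
```

   The BMP loader subtracts 7 from this index to get the shift that moves a
   mask's top bit to bit 7 of an 8-bit channel. */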
4691 
4692 static int stbi__bitcount(unsigned int a)
4693 {
4694    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
4695    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
4696    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4697    a = (a + (a >> 8)); // max 16 per 8 bits
4698    a = (a + (a >> 16)); // max 32 per 8 bits
4699    return a & 0xff;
4700 }
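/* Illustration only, not part of stb_image: the function above counts set bits
   by summing neighbouring bit fields in parallel (1-bit pairs, then 2-bit
   pairs, then nibbles, then bytes). The sketch below repeats that technique
   beside a simple loop for comparison. It assumes 32-bit input values; all
   demo_* names are hypothetical.

```c
#include <assert.h>

// Parallel form: each line doubles the width of the partial sums.
static int demo_bitcount(unsigned int a)
{
   a = (a & 0x55555555u) + ((a >> 1) & 0x55555555u); // 2-bit sums, each 0..2
   a = (a & 0x33333333u) + ((a >> 2) & 0x33333333u); // 4-bit sums, each 0..4
   a = (a + (a >> 4)) & 0x0f0f0f0fu;                 // byte sums, each 0..8
   a = a + (a >> 8);                                 // fold bytes together
   a = a + (a >> 16);
   return (int)(a & 0xff);
}

// Reference form: clear the lowest set bit until none remain.
static int demo_bitcount_loop(unsigned int a)
{
   int n = 0;
   while (a) { a &= a - 1; ++n; }
   return n;
}

// Compares the two forms on a few BMP-style channel masks.
static void demo_bitcount_selfcheck(void)
{
   assert(demo_bitcount(0x0000f800u) == 5);   // RGB565 red mask
   assert(demo_bitcount(0x000007e0u) == 6);   // RGB565 green mask
   assert(demo_bitcount(0xffffffffu) == demo_bitcount_loop(0xffffffffu));
}
```

   In the BMP loader this count is the width of a channel's bit field, which
   decides how many times the bits must be replicated to fill 8 bits. */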
4701 
4702 static int stbi__shiftsigned(int v, int shift, int bits)
4703 {
4704    int result;
4705    int z=0;
4706 
4707    if (shift < 0) v <<= -shift;
4708    else v >>= shift;
4709    result = v;
4710 
4711    z = bits;
4712    while (z < 8) {
4713       result += v >> z;
4714       z += bits;
4715    }
4716    return result;
4717 }
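/* Illustration only, not part of stb_image: taken together, the three helpers
   above turn a masked bit field into an 8-bit channel. The field's top bit is
   moved to bit 7, then the field is replicated into the lower bits so that an
   all-ones field maps to 255. The sketch below combines the steps in one
   function. It uses unsigned arithmetic and loop-based helpers instead of the
   originals, and assumes the mask is contiguous; all demo_* names are
   hypothetical.

```c
#include <assert.h>

// Index of the highest set bit, or -1 for zero.
static int demo_top_bit(unsigned int m)
{
   int n = -1;
   while (m) { ++n; m >>= 1; }
   return n;
}

// Number of set bits.
static int demo_popcount(unsigned int m)
{
   int n = 0;
   while (m) { m &= m - 1; ++n; }
   return n;
}

// Expands the field selected by mask to 8 bits.
static int demo_mask_to_8bit(unsigned int pixel, unsigned int mask)
{
   int shift, bits, z;
   unsigned int v, result;
   assert(mask != 0);
   shift = demo_top_bit(mask) - 7;   // shift that puts the top bit at bit 7
   bits  = demo_popcount(mask);      // width of the field
   v = pixel & mask;
   if (shift < 0) v <<= -shift; else v >>= shift;
   result = v;
   for (z = bits; z < 8; z += bits)  // replicate the field into the low bits
      result += v >> z;
   return (int)result;
}
```

   For an RGB565 pixel of 0xFFFF, the masks 0xF800, 0x07E0 and 0x001F each
   yield 255 under this sketch, and a red field of 16 (pixel 0x8000) yields
   132. */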
4718 
4719 typedef struct
4720 {
4721    int bpp, offset, hsz;
4722    unsigned int mr,mg,mb,ma, all_a;
4723 } stbi__bmp_data;
4724 
4725 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
4726 {
4727    int hsz;
4728    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4729    stbi__get32le(s); // discard filesize
4730    stbi__get16le(s); // discard reserved
4731    stbi__get16le(s); // discard reserved
4732    info->offset = stbi__get32le(s);
4733    info->hsz = hsz = stbi__get32le(s);
4734    info->mr = info->mg = info->mb = info->ma = 0;
4735 
4736    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4737    if (hsz == 12) {
4738       s->img_x = stbi__get16le(s);
4739       s->img_y = stbi__get16le(s);
4740    } else {
4741       s->img_x = stbi__get32le(s);
4742       s->img_y = stbi__get32le(s);
4743    }
4744    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4745    info->bpp = stbi__get16le(s);
4746    if (info->bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4747    if (hsz != 12) {
4748       int compress = stbi__get32le(s);
4749       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
4750       stbi__get32le(s); // discard sizeof
4751       stbi__get32le(s); // discard hres
4752       stbi__get32le(s); // discard vres
4753       stbi__get32le(s); // discard colorsused
4754       stbi__get32le(s); // discard max important
4755       if (hsz == 40 || hsz == 56) {
4756          if (hsz == 56) {
4757             stbi__get32le(s);
4758             stbi__get32le(s);
4759             stbi__get32le(s);
4760             stbi__get32le(s);
4761          }
4762          if (info->bpp == 16 || info->bpp == 32) {
4763             if (compress == 0) {
4764                if (info->bpp == 32) {
4765                   info->mr = 0xffu << 16;
4766                   info->mg = 0xffu <<  8;
4767                   info->mb = 0xffu <<  0;
4768                   info->ma = 0xffu << 24;
4769                   info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
4770                } else {
4771                   info->mr = 31u << 10;
4772                   info->mg = 31u <<  5;
4773                   info->mb = 31u <<  0;
4774                }
4775             } else if (compress == 3) {
4776                info->mr = stbi__get32le(s);
4777                info->mg = stbi__get32le(s);
4778                info->mb = stbi__get32le(s);
4779                // not documented, but generated by photoshop and handled by mspaint
4780                if (info->mr == info->mg && info->mg == info->mb) {
4781                   // ?!?!?
4782                   return stbi__errpuc("bad BMP", "bad BMP");
4783                }
4784             } else
4785                return stbi__errpuc("bad BMP", "bad BMP");
4786          }
4787       } else {
4788          int i;
4789          if (hsz != 108 && hsz != 124)
4790             return stbi__errpuc("bad BMP", "bad BMP");
4791          info->mr = stbi__get32le(s);
4792          info->mg = stbi__get32le(s);
4793          info->mb = stbi__get32le(s);
4794          info->ma = stbi__get32le(s);
4795          stbi__get32le(s); // discard color space
4796          for (i=0; i < 12; ++i)
4797             stbi__get32le(s); // discard color space parameters
4798          if (hsz == 124) {
4799             stbi__get32le(s); // discard rendering intent
4800             stbi__get32le(s); // discard offset of profile data
4801             stbi__get32le(s); // discard size of profile data
4802             stbi__get32le(s); // discard reserved
4803          }
4804       }
4805    }
4806    return (void *) 1;
4807 }
4808 
4809 
4810 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
4811 {
4812    stbi_uc *out;
4813    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
4814    stbi_uc pal[256][4];
4815    int psize=0,i,j,width;
4816    int flip_vertically, pad, target;
4817    stbi__bmp_data info;
4818 
4819    info.all_a = 255;
4820    if (stbi__bmp_parse_header(s, &info) == NULL)
4821       return NULL; // error code already set
4822 
4823    flip_vertically = ((int) s->img_y) > 0;
4824    s->img_y = abs((int) s->img_y);
4825 
4826    mr = info.mr;
4827    mg = info.mg;
4828    mb = info.mb;
4829    ma = info.ma;
4830    all_a = info.all_a;
4831 
4832    if (info.hsz == 12) {
4833       if (info.bpp < 24)
4834          psize = (info.offset - 14 - 24) / 3;
4835    } else {
4836       if (info.bpp < 16)
4837          psize = (info.offset - 14 - info.hsz) >> 2;
4838    }
4839 
4840    s->img_n = ma ? 4 : 3;
4841    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
4842       target = req_comp;
4843    else
4844       target = s->img_n; // if they want monochrome, we'll post-convert
4845 
4846    out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
4847    if (!out) return stbi__errpuc("outofmem", "Out of memory");
4848    if (info.bpp < 16) {
4849       int z=0;
4850       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
4851       for (i=0; i < psize; ++i) {
4852          pal[i][2] = stbi__get8(s);
4853          pal[i][1] = stbi__get8(s);
4854          pal[i][0] = stbi__get8(s);
4855          if (info.hsz != 12) stbi__get8(s);
4856          pal[i][3] = 255;
4857       }
4858       stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
4859       if (info.bpp == 4) width = (s->img_x + 1) >> 1;
4860       else if (info.bpp == 8) width = s->img_x;
4861       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
4862       pad = (-width)&3;
4863       for (j=0; j < (int) s->img_y; ++j) {
4864          for (i=0; i < (int) s->img_x; i += 2) {
4865             int v=stbi__get8(s),v2=0;
4866             if (info.bpp == 4) {
4867                v2 = v & 15;
4868                v >>= 4;
4869             }
4870             out[z++] = pal[v][0];
4871             out[z++] = pal[v][1];
4872             out[z++] = pal[v][2];
4873             if (target == 4) out[z++] = 255;
4874             if (i+1 == (int) s->img_x) break;
4875             v = (info.bpp == 8) ? stbi__get8(s) : v2;
4876             out[z++] = pal[v][0];
4877             out[z++] = pal[v][1];
4878             out[z++] = pal[v][2];
4879             if (target == 4) out[z++] = 255;
4880          }
4881          stbi__skip(s, pad);
4882       }
4883    } else {
4884       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
4885       int z = 0;
4886       int easy=0;
4887       stbi__skip(s, info.offset - 14 - info.hsz);
4888       if (info.bpp == 24) width = 3 * s->img_x;
4889       else if (info.bpp == 16) width = 2*s->img_x;
4890       else /* bpp = 32 and pad = 0 */ width=0;
4891       pad = (-width) & 3;
4892       if (info.bpp == 24) {
4893          easy = 1;
4894       } else if (info.bpp == 32) {
4895          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
4896             easy = 2;
4897       }
4898       if (!easy) {
4899          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
4900          // right shift amt to put high bit in position #7
4901          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
4902          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
4903          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
4904          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
4905       }
4906       for (j=0; j < (int) s->img_y; ++j) {
4907          if (easy) {
4908             for (i=0; i < (int) s->img_x; ++i) {
4909                unsigned char a;
4910                out[z+2] = stbi__get8(s);
4911                out[z+1] = stbi__get8(s);
4912                out[z+0] = stbi__get8(s);
4913                z += 3;
4914                a = (easy == 2 ? stbi__get8(s) : 255);
4915                all_a |= a;
4916                if (target == 4) out[z++] = a;
4917             }
4918          } else {
4919             int bpp = info.bpp;
4920             for (i=0; i < (int) s->img_x; ++i) {
4921                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
4922                int a;
4923                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
4924                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
4925                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
4926                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
4927                all_a |= a;
4928                if (target == 4) out[z++] = STBI__BYTECAST(a);
4929             }
4930          }
4931          stbi__skip(s, pad);
4932       }
4933    }
4934 
4935    // if alpha channel is all 0s, replace with all 255s
4936    if (target == 4 && all_a == 0)
4937       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
4938          out[i] = 255;
4939 
4940    if (flip_vertically) {
4941       stbi_uc t;
4942       for (j=0; j < (int) s->img_y>>1; ++j) {
4943          stbi_uc *p1 = out +      j     *s->img_x*target;
4944          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
4945          for (i=0; i < (int) s->img_x*target; ++i) {
4946             t = p1[i], p1[i] = p2[i], p2[i] = t;
4947          }
4948       }
4949    }
4950 
4951    if (req_comp && req_comp != target) {
4952       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4953       if (out == NULL) return out; // stbi__convert_format frees input on failure
4954    }
4955 
4956    *x = s->img_x;
4957    *y = s->img_y;
4958    if (comp) *comp = s->img_n;
4959    return out;
4960 }
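/* Illustration only, not part of stb_image: the loader above computes
   pad = (-width) & 3 because each BMP row is stored padded to a multiple of
   4 bytes, and the padding bytes are skipped after every row. The sketch below
   shows the same quantity in unsigned arithmetic, next to an equivalent
   remainder form. All demo_* names are hypothetical.

```c
#include <assert.h>

// Bytes of padding needed to round row_bytes up to a multiple of 4.
// Unsigned arithmetic keeps the negation well defined.
static unsigned int demo_bmp_row_pad(unsigned int row_bytes)
{
   return (0u - row_bytes) & 3u;
}

// The same quantity written with remainders, for comparison.
static unsigned int demo_bmp_row_pad_mod(unsigned int row_bytes)
{
   return (4u - row_bytes % 4u) % 4u;
}

// Checks that both forms agree for small row sizes.
static void demo_bmp_row_pad_selfcheck(void)
{
   unsigned int w;
   for (w = 0; w < 64; ++w)
      assert(demo_bmp_row_pad(w) == demo_bmp_row_pad_mod(w));
}
```

   For example, a 24-bit row of 5 pixels occupies 15 bytes, so one padding byte
   follows it; a 32-bit row never needs padding. */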
4961 #endif
4962 
4963 // Targa Truevision - TGA
4964 // by Jonathan Dummer
4965 #ifndef STBI_NO_TGA
4966 // returns the component count (STBI_grey, STBI_grey_alpha, STBI_rgb or STBI_rgb_alpha), 0 on error
4967 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
4968 {
4969    // only RGB or RGBA (incl. 16bit) or grey allowed
4970    if(is_rgb16) *is_rgb16 = 0;
4971    switch(bits_per_pixel) {
4972       case 8:  return STBI_grey;
4973       case 16: if(is_grey) return STBI_grey_alpha;
4974             // else: fall-through
4975       case 15: if(is_rgb16) *is_rgb16 = 1;
4976             return STBI_rgb;
4977       case 24: // fall-through
4978       case 32: return bits_per_pixel/8;
4979       default: return 0;
4980    }
4981 }
4982 
4983 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
4984 {
4985     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
4986     int sz, tga_colormap_type;
4987     stbi__get8(s);                   // discard Offset
4988     tga_colormap_type = stbi__get8(s); // colormap type
4989     if( tga_colormap_type > 1 ) {
4990         stbi__rewind(s);
4991         return 0;      // only RGB or indexed allowed
4992     }
4993     tga_image_type = stbi__get8(s); // image type
4994     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
4995         if (tga_image_type != 1 && tga_image_type != 9) {
4996             stbi__rewind(s);
4997             return 0;
4998         }
4999         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5000         sz = stbi__get8(s);    //   check bits per palette color entry
5001         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
5002             stbi__rewind(s);
5003             return 0;
5004         }
5005         stbi__skip(s,4);       // skip image x and y origin
5006         tga_colormap_bpp = sz;
5007     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5008         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
5009             stbi__rewind(s);
5010             return 0; // only RGB or grey allowed, +/- RLE
5011         }
5012         stbi__skip(s,9); // skip colormap specification and image x/y origin
5013         tga_colormap_bpp = 0;
5014     }
5015     tga_w = stbi__get16le(s);
5016     if( tga_w < 1 ) {
5017         stbi__rewind(s);
5018         return 0;   // test width
5019     }
5020     tga_h = stbi__get16le(s);
5021     if( tga_h < 1 ) {
5022         stbi__rewind(s);
5023         return 0;   // test height
5024     }
5025     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5026     stbi__get8(s); // ignore alpha bits
5027     if (tga_colormap_bpp != 0) {
5028         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5029             // when using a colormap, tga_bits_per_pixel is the size of the indexes
5030             // I don't think anything but 8 or 16bit indexes makes sense
5031             stbi__rewind(s);
5032             return 0;
5033         }
5034         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5035     } else {
5036         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5037     }
5038     if(!tga_comp) {
5039       stbi__rewind(s);
5040       return 0;
5041     }
5042     if (x) *x = tga_w;
5043     if (y) *y = tga_h;
5044     if (comp) *comp = tga_comp;
5045     return 1;                   // seems to have passed everything
5046 }
5047 
5048 static int stbi__tga_test(stbi__context *s)
5049 {
5050    int res = 0;
5051    int sz, tga_color_type;
5052    stbi__get8(s);      //   discard Offset
5053    tga_color_type = stbi__get8(s);   //   color type
5054    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
5055    sz = stbi__get8(s);   //   image type
5056    if ( tga_color_type == 1 ) { // colormapped (paletted) image
5057       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5058       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5059       sz = stbi__get8(s);    //   check bits per palette color entry
5060       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5061       stbi__skip(s,4);       // skip image x and y origin
5062    } else { // "normal" image w/o colormap
5063       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
5064       stbi__skip(s,9); // skip colormap specification and image x/y origin
5065    }
5066    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
5067    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
5068    sz = stbi__get8(s);   //   bits per pixel
5069    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
5070    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5071 
5072    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5073 
5074 errorEnd:
5075    stbi__rewind(s);
5076    return res;
5077 }
5078 
5079 // read 16bit value and convert to 24bit RGB
5080 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5081 {
5082    stbi__uint16 px = stbi__get16le(s);
5083    stbi__uint16 fiveBitMask = 31;
5084    // we have 3 channels with 5bits each
5085    int r = (px >> 10) & fiveBitMask;
5086    int g = (px >> 5) & fiveBitMask;
5087    int b = px & fiveBitMask;
5088    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5089    out[0] = (r * 255)/31;
5090    out[1] = (g * 255)/31;
5091    out[2] = (b * 255)/31;
5092 
5093    // some people claim that the most significant bit might be used for alpha
5094    // (possibly if an alpha-bit is set in the "image descriptor byte")
5095    // but that only made 16bit test images completely translucent..
5096    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5097 }
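/* Illustration only, not part of stb_image: the function above splits a 16-bit
   pixel into three 5-bit channels and rescales each from 0..31 to 0..255 with
   (c * 255) / 31. The sketch below performs the same conversion on a value
   already in memory instead of reading it from a stbi__context. The name
   demo_rgb555_to_rgb888 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Converts one 16-bit pixel (bit 15 ignored, then 5 bits each of R, G, B)
// to three 8-bit channels written in RGB order.
static void demo_rgb555_to_rgb888(unsigned int px, unsigned char *out)
{
   unsigned int r = (px >> 10) & 31;
   unsigned int g = (px >>  5) & 31;
   unsigned int b =  px        & 31;
   assert(out != NULL);
   out[0] = (unsigned char)((r * 255) / 31);
   out[1] = (unsigned char)((g * 255) / 31);
   out[2] = (unsigned char)((b * 255) / 31);
}
```

   Because the division truncates, a mid-range channel value of 16 maps to 131,
   while 0 and 31 map exactly to 0 and 255. */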
5098 
5099 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5100 {
5101    //   read in the TGA header stuff
5102    int tga_offset = stbi__get8(s);
5103    int tga_indexed = stbi__get8(s);
5104    int tga_image_type = stbi__get8(s);
5105    int tga_is_RLE = 0;
5106    int tga_palette_start = stbi__get16le(s);
5107    int tga_palette_len = stbi__get16le(s);
5108    int tga_palette_bits = stbi__get8(s);
5109    int tga_x_origin = stbi__get16le(s);
5110    int tga_y_origin = stbi__get16le(s);
5111    int tga_width = stbi__get16le(s);
5112    int tga_height = stbi__get16le(s);
5113    int tga_bits_per_pixel = stbi__get8(s);
5114    int tga_comp, tga_rgb16=0;
5115    int tga_inverted = stbi__get8(s);
5116    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5117    //   image data
5118    unsigned char *tga_data;
5119    unsigned char *tga_palette = NULL;
5120    int i, j;
5121    unsigned char raw_data[4];
5122    int RLE_count = 0;
5123    int RLE_repeating = 0;
5124    int read_next_pixel = 1;
5125 
5126    //   do a tiny bit of processing
5127    if ( tga_image_type >= 8 )
5128    {
5129       tga_image_type -= 8;
5130       tga_is_RLE = 1;
5131    }
5132    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5133 
5134    //   If I'm paletted, then I'll use the number of bits from the palette
5135    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5136    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5137 
5138    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5139       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5140 
5141    //   tga info
5142    *x = tga_width;
5143    *y = tga_height;
5144    if (comp) *comp = tga_comp;
5145 
5146    tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp );
5147    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5148 
5149    // skip to the data's starting position (offset usually = 0)
5150    stbi__skip(s, tga_offset );
5151 
5152    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
5153       for (i=0; i < tga_height; ++i) {
5154          int row = tga_inverted ? tga_height -i - 1 : i;
5155          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5156          stbi__getn(s, tga_row, tga_width * tga_comp);
5157       }
5158    } else  {
5159       //   do I need to load a palette?
5160       if ( tga_indexed)
5161       {
5162          //   any data to skip? (offset usually = 0)
5163          stbi__skip(s, tga_palette_start );
5164          //   load the palette
5165          tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_comp );
5166          if (!tga_palette) {
5167             STBI_FREE(tga_data);
5168             return stbi__errpuc("outofmem", "Out of memory");
5169          }
5170          if (tga_rgb16) {
5171             stbi_uc *pal_entry = tga_palette;
5172             STBI_ASSERT(tga_comp == STBI_rgb);
5173             for (i=0; i < tga_palette_len; ++i) {
5174                stbi__tga_read_rgb16(s, pal_entry);
5175                pal_entry += tga_comp;
5176             }
5177          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5178                STBI_FREE(tga_data);
5179                STBI_FREE(tga_palette);
5180                return stbi__errpuc("bad palette", "Corrupt TGA");
5181          }
5182       }
5183       //   load the data
5184       for (i=0; i < tga_width * tga_height; ++i)
5185       {
5186          //   if I'm in RLE mode, do I need to get an RLE chunk?
5187          if ( tga_is_RLE )
5188          {
5189             if ( RLE_count == 0 )
5190             {
5191                //   yep, get the next byte as a RLE command
5192                int RLE_cmd = stbi__get8(s);
5193                RLE_count = 1 + (RLE_cmd & 127);
5194                RLE_repeating = RLE_cmd >> 7;
5195                read_next_pixel = 1;
5196             } else if ( !RLE_repeating )
5197             {
5198                read_next_pixel = 1;
5199             }
5200          } else
5201          {
5202             read_next_pixel = 1;
5203          }
5204          //   OK, if I need to read a pixel, do it now
5205          if ( read_next_pixel )
5206          {
5207             //   load however much data we did have
5208             if ( tga_indexed )
5209             {
5210                // read in index, then perform the lookup
5211                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5212                if ( pal_idx >= tga_palette_len ) {
5213                   // invalid index
5214                   pal_idx = 0;
5215                }
5216                pal_idx *= tga_comp;
5217                for (j = 0; j < tga_comp; ++j) {
5218                   raw_data[j] = tga_palette[pal_idx+j];
5219                }
5220             } else if(tga_rgb16) {
5221                STBI_ASSERT(tga_comp == STBI_rgb);
5222                stbi__tga_read_rgb16(s, raw_data);
5223             } else {
5224                //   read in the data raw
5225                for (j = 0; j < tga_comp; ++j) {
5226                   raw_data[j] = stbi__get8(s);
5227                }
5228             }
5229             //   clear the reading flag for the next pixel
5230             read_next_pixel = 0;
5231          } // end of reading a pixel
5232 
5233          // copy data
5234          for (j = 0; j < tga_comp; ++j)
5235            tga_data[i*tga_comp+j] = raw_data[j];
5236 
5237          //   in case we're in RLE mode, keep counting down
5238          --RLE_count;
5239       }
5240       //   do I need to invert the image?
5241       if ( tga_inverted )
5242       {
5243          for (j = 0; j*2 < tga_height; ++j)
5244          {
5245             int index1 = j * tga_width * tga_comp;
5246             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5247             for (i = tga_width * tga_comp; i > 0; --i)
5248             {
5249                unsigned char temp = tga_data[index1];
5250                tga_data[index1] = tga_data[index2];
5251                tga_data[index2] = temp;
5252                ++index1;
5253                ++index2;
5254             }
5255          }
5256       }
5257       //   clear my palette, if I had one
5258       if ( tga_palette != NULL )
5259       {
5260          STBI_FREE( tga_palette );
5261       }
5262    }
5263 
5264    // swap RGB - if the source data was RGB16, it already is in the right order
5265    if (tga_comp >= 3 && !tga_rgb16)
5266    {
5267       unsigned char* tga_pixel = tga_data;
5268       for (i=0; i < tga_width * tga_height; ++i)
5269       {
5270          unsigned char temp = tga_pixel[0];
5271          tga_pixel[0] = tga_pixel[2];
5272          tga_pixel[2] = temp;
5273          tga_pixel += tga_comp;
5274       }
5275    }
5276 
5277    // convert to target component count
5278    if (req_comp && req_comp != tga_comp)
5279       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5280 
5281    //   the things I do to get rid of an error message, and yet keep
5282    //   Microsoft's C compilers happy... [8^(
5283    tga_palette_start = tga_palette_len = tga_palette_bits =
5284          tga_x_origin = tga_y_origin = 0;
5285    //   OK, done
5286    return tga_data;
5287 }
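/* Illustration only, not part of stb_image: the RLE path in the loader above
   reads a command byte whose low 7 bits plus one give a pixel count and whose
   high bit says whether one pixel is repeated or the pixels follow literally.
   The sketch below decodes the same packet layout from one memory buffer into
   another. This is a different, buffer-based formulation from the loader's
   per-pixel loop, and the name demo_tga_rle_decode is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Decodes TGA-style RLE packets from src into dst (pixel_count pixels of bpp
// bytes each). Returns the number of pixels written, or 0 if src is truncated.
static size_t demo_tga_rle_decode(const unsigned char *src, size_t src_len,
                                  unsigned char *dst, size_t pixel_count, size_t bpp)
{
   size_t in = 0, out = 0;
   assert(bpp > 0);
   while (out < pixel_count) {
      unsigned char cmd;
      size_t count, k;
      if (in >= src_len) return 0;
      cmd = src[in++];
      count = 1 + (size_t)(cmd & 127);
      if (count > pixel_count - out) count = pixel_count - out; // clamp overshoot
      if (cmd & 128) {
         // run packet: one pixel value repeated count times
         if (src_len - in < bpp) return 0;
         for (k = 0; k < count; ++k)
            memcpy(dst + (out + k) * bpp, src + in, bpp);
         in += bpp;
      } else {
         // raw packet: count literal pixels
         if (src_len - in < count * bpp) return 0;
         memcpy(dst + out * bpp, src + in, count * bpp);
         in += count * bpp;
      }
      out += count;
   }
   return out;
}
```

   For instance, the bytes 0x82 0xAA 0x01 0x10 0x20 with one byte per pixel
   decode to AA AA AA 10 20: a run of three followed by two literal pixels. */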
5288 #endif
5289 
5290 // *************************************************************************************************
5291 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5292 
5293 #ifndef STBI_NO_PSD
5294 static int stbi__psd_test(stbi__context *s)
5295 {
5296    int r = (stbi__get32be(s) == 0x38425053);
5297    stbi__rewind(s);
5298    return r;
5299 }
5300 
5301 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
5302 {
5303    int   pixelCount;
5304    int channelCount, compression;
5305    int channel, i, count, len;
5306    int bitdepth;
5307    int w,h;
5308    stbi_uc *out;
5309 
5310    // Check identifier
5311    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
5312       return stbi__errpuc("not PSD", "Corrupt PSD image");
5313 
5314    // Check file type version.
5315    if (stbi__get16be(s) != 1)
5316       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5317 
5318    // Skip 6 reserved bytes.
5319    stbi__skip(s, 6 );
5320 
5321    // Read the number of channels (R, G, B, A, etc).
5322    channelCount = stbi__get16be(s);
5323    if (channelCount < 0 || channelCount > 16)
5324       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5325 
5326    // Read the rows and columns of the image.
5327    h = stbi__get32be(s);
5328    w = stbi__get32be(s);
5329 
5330    // Make sure the depth is 8 or 16 bits.
5331    bitdepth = stbi__get16be(s);
5332    if (bitdepth != 8 && bitdepth != 16)
5333       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5334 
5335    // Make sure the color mode is RGB.
5336    // Valid options are:
5337    //   0: Bitmap
5338    //   1: Grayscale
5339    //   2: Indexed color
5340    //   3: RGB color
5341    //   4: CMYK color
5342    //   7: Multichannel
5343    //   8: Duotone
5344    //   9: Lab color
5345    if (stbi__get16be(s) != 3)
5346       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5347 
5348    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
5349    stbi__skip(s,stbi__get32be(s) );
5350 
5351    // Skip the image resources.  (resolution, pen tool paths, etc)
5352    stbi__skip(s, stbi__get32be(s) );
5353 
5354    // Skip the reserved data.
5355    stbi__skip(s, stbi__get32be(s) );
5356 
5357    // Find out if the data is compressed.
5358    // Known values:
5359    //   0: no compression
5360    //   1: RLE compressed
5361    compression = stbi__get16be(s);
5362    if (compression > 1)
5363       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5364 
5365    // Create the destination image.
5366    out = (stbi_uc *) stbi__malloc(4 * w*h);
5367    if (!out) return stbi__errpuc("outofmem", "Out of memory");
5368    pixelCount = w*h;
5369 
5370    // Initialize the data to zero.
5371    //memset( out, 0, pixelCount * 4 );
5372 
5373    // Finally, the image data.
5374    if (compression) {
5375       // RLE as used by .PSD and .TIFF
5376       // Loop until you get the number of unpacked bytes you are expecting:
5377       //     Read the next source byte into n.
5378       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5379       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5380       //     Else if n is -128 (byte value 128), noop.
5381       // Endloop
5382 
5383       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
5384       // which we're going to just skip.
5385       stbi__skip(s, h * channelCount * 2 );
5386 
5387       // Read the RLE data by channel.
5388       for (channel = 0; channel < 4; channel++) {
5389          stbi_uc *p;
5390 
5391          p = out+channel;
5392          if (channel >= channelCount) {
5393             // Fill this channel with default data.
5394             for (i = 0; i < pixelCount; i++, p += 4)
5395                *p = (channel == 3 ? 255 : 0);
5396          } else {
5397             // Read the RLE data.
5398             count = 0;
5399             while (count < pixelCount) {
5400                len = stbi__get8(s);
5401                if (len == 128) {
5402                   // No-op.
5403                } else if (len < 128) {
5404                   // Copy next len+1 bytes literally.
5405                   len++;
                       if (len > pixelCount - count) { STBI_FREE(out); return stbi__errpuc("corrupt", "bad RLE data"); }
5406                   count += len;
5407                   while (len) {
5408                      *p = stbi__get8(s);
5409                      p += 4;
5410                      len--;
5411                   }
5412                } else if (len > 128) {
5413                   stbi_uc   val;
5414                   // Next -len+1 bytes in the dest are replicated from next source byte.
5415                   // (Interpret len as a negative 8-bit int.)
5416                   len ^= 0x0FF;
5417                   len += 2;
                       if (len > pixelCount - count) { STBI_FREE(out); return stbi__errpuc("corrupt", "bad RLE data"); }
5418                   val = stbi__get8(s);
5419                   count += len;
5420                   while (len) {
5421                      *p = val;
5422                      p += 4;
5423                      len--;
5424                   }
5425                }
5426             }
5427          }
5428       }
5429 
5430    } else {
5431       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
5432       // where each channel consists of an 8-bit value for each pixel in the image.
5433 
5434       // Read the data by channel.
5435       for (channel = 0; channel < 4; channel++) {
5436          stbi_uc *p;
5437 
5438          p = out + channel;
5439          if (channel >= channelCount) {
5440             // Fill this channel with default data.
5441             stbi_uc val = channel == 3 ? 255 : 0;
5442             for (i = 0; i < pixelCount; i++, p += 4)
5443                *p = val;
5444          } else {
5445             // Read the data.
5446             if (bitdepth == 16) {
5447                for (i = 0; i < pixelCount; i++, p += 4)
5448                   *p = (stbi_uc) (stbi__get16be(s) >> 8);
5449             } else {
5450                for (i = 0; i < pixelCount; i++, p += 4)
5451                   *p = stbi__get8(s);
5452             }
5453          }
5454       }
5455    }
5456 
5457    if (channelCount >= 4) {
5458       for (i=0; i < w*h; ++i) {
5459          unsigned char *pixel = out + 4*i;
5460          if (pixel[3] != 0 && pixel[3] != 255) {
5461             // remove weird white matte from PSD
5462             float a = pixel[3] / 255.0f;
5463             float ra = 1.0f / a;
5464             float inv_a = 255.0f * (1 - ra);
5465             pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
5466             pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
5467             pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
5468          }
5469       }
5470    }
5471 
5472    if (req_comp && req_comp != 4) {
5473       out = stbi__convert_format(out, 4, req_comp, w, h);
5474       if (out == NULL) return out; // stbi__convert_format frees input on failure
5475    }
5476 
5477    if (comp) *comp = 4;
5478    *y = h;
5479    *x = w;
5480 
5481    return out;
5482 }
5483 #endif
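/* Illustrative sketch, not part of stb_image: the PackBits-style RLE scheme described
   in the PSD loader above, written as a standalone decoder over memory buffers. The
   name packbits_decode and its signature are assumptions made for this example. It
   rejects input that is truncated or that would overflow the destination buffer.

```c
#include <assert.h>
#include <stddef.h>

// Decodes PackBits-style RLE from src into dst (hypothetical helper).
// Returns the number of bytes written, or -1 if the input is truncated
// or a run would not fit in dst.
static long packbits_decode(const unsigned char *src, size_t src_len,
                            unsigned char *dst, size_t dst_len)
{
   size_t in = 0, out = 0;
   assert(src && dst);
   while (in < src_len && out < dst_len) {
      unsigned int n = src[in++];
      if (n == 128) {
         continue;                       // no-op marker
      } else if (n < 128) {
         size_t run = n + 1;             // copy the next n+1 bytes literally
         if (run > src_len - in || run > dst_len - out) return -1;
         while (run--) dst[out++] = src[in++];
      } else {
         unsigned char val;
         size_t run = 257 - n;           // same as -(signed char)n + 1
         if (in >= src_len || run > dst_len - out) return -1;
         val = src[in++];
         while (run--) dst[out++] = val;
      }
   }
   return (long) out;
}
```

   For instance, the bytes 02 'a' 'b' 'c' FE 'x' describe a literal run of three
   bytes followed by 'x' repeated 257-254 = 3 times.
*/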
5484 
5485 // *************************************************************************************************
5486 // Softimage PIC loader
5487 // by Tom Seddon
5488 //
5489 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5490 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5491 
5492 #ifndef STBI_NO_PIC
5493 static int stbi__pic_is4(stbi__context *s,const char *str)
5494 {
5495    int i;
5496    for (i=0; i<4; ++i)
5497       if (stbi__get8(s) != (stbi_uc)str[i])
5498          return 0;
5499 
5500    return 1;
5501 }
5502 
5503 static int stbi__pic_test_core(stbi__context *s)
5504 {
5505    int i;
5506 
5507    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5508       return 0;
5509 
5510    for(i=0;i<84;++i)
5511       stbi__get8(s);
5512 
5513    if (!stbi__pic_is4(s,"PICT"))
5514       return 0;
5515 
5516    return 1;
5517 }
5518 
5519 typedef struct
5520 {
5521    stbi_uc size,type,channel;
5522 } stbi__pic_packet;
5523 
5524 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5525 {
5526    int mask=0x80, i;
5527 
5528    for (i=0; i<4; ++i, mask>>=1) {
5529       if (channel & mask) {
5530          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5531          dest[i]=stbi__get8(s);
5532       }
5533    }
5534 
5535    return dest;
5536 }
5537 
5538 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5539 {
5540    int mask=0x80,i;
5541 
5542    for (i=0;i<4; ++i, mask>>=1)
5543       if (channel&mask)
5544          dest[i]=src[i];
5545 }
5546 
5547 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5548 {
5549    int act_comp=0,num_packets=0,y,chained;
5550    stbi__pic_packet packets[10];
5551 
5552    // this will (should...) cater for even some bizarre stuff like having data
5553    // for the same channel in multiple packets.
5554    do {
5555       stbi__pic_packet *packet;
5556 
5557       if (num_packets==sizeof(packets)/sizeof(packets[0]))
5558          return stbi__errpuc("bad format","too many packets");
5559 
5560       packet = &packets[num_packets++];
5561 
5562       chained = stbi__get8(s);
5563       packet->size    = stbi__get8(s);
5564       packet->type    = stbi__get8(s);
5565       packet->channel = stbi__get8(s);
5566 
5567       act_comp |= packet->channel;
5568 
5569       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
5570       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
5571    } while (chained);
5572 
5573    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5574 
5575    for(y=0; y<height; ++y) {
5576       int packet_idx;
5577 
5578       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5579          stbi__pic_packet *packet = &packets[packet_idx];
5580          stbi_uc *dest = result+y*width*4;
5581 
5582          switch (packet->type) {
5583             default:
5584                return stbi__errpuc("bad format","packet has bad compression type");
5585 
5586             case 0: {//uncompressed
5587                int x;
5588 
5589                for(x=0;x<width;++x, dest+=4)
5590                   if (!stbi__readval(s,packet->channel,dest))
5591                      return 0;
5592                break;
5593             }
5594 
5595             case 1://Pure RLE
5596                {
5597                   int left=width, i;
5598 
5599                   while (left>0) {
5600                      stbi_uc count,value[4];
5601 
5602                      count=stbi__get8(s);
5603                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
5604 
5605                      if (count > left)
5606                         count = (stbi_uc) left;
5607 
5608                      if (!stbi__readval(s,packet->channel,value))  return 0;
5609 
5610                      for(i=0; i<count; ++i,dest+=4)
5611                         stbi__copyval(packet->channel,dest,value);
5612                      left -= count;
5613                   }
5614                }
5615                break;
5616 
5617             case 2: {//Mixed RLE
5618                int left=width;
5619                while (left>0) {
5620                   int count = stbi__get8(s), i;
5621                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
5622 
5623                   if (count >= 128) { // Repeated
5624                      stbi_uc value[4];
5625 
5626                      if (count==128)
5627                         count = stbi__get16be(s);
5628                      else
5629                         count -= 127;
5630                      if (count > left)
5631                         return stbi__errpuc("bad file","scanline overrun");
5632 
5633                      if (!stbi__readval(s,packet->channel,value))
5634                         return 0;
5635 
5636                      for(i=0;i<count;++i, dest += 4)
5637                         stbi__copyval(packet->channel,dest,value);
5638                   } else { // Raw
5639                      ++count;
5640                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
5641 
5642                      for(i=0;i<count;++i, dest+=4)
5643                         if (!stbi__readval(s,packet->channel,dest))
5644                            return 0;
5645                   }
5646                   left-=count;
5647                }
5648                break;
5649             }
5650          }
5651       }
5652    }
5653 
5654    return result;
5655 }
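/* Illustrative sketch, not part of stb_image: the "mixed RLE" packet type handled in
   case 2 above, reduced to a single channel and a flat memory buffer. The function
   name and signature are assumptions for this example. A count byte of 128 or more
   introduces a repeated value (128 itself means a 16-bit big-endian count follows,
   otherwise the run length is count-127); a smaller count byte introduces count+1
   raw values.

```c
#include <assert.h>
#include <stddef.h>

// Decodes one scanline of single-channel mixed-RLE data (hypothetical helper).
// Returns bytes consumed from src, or -1 on truncated input or scanline overrun.
static long pic_mixed_rle_row(const unsigned char *src, size_t src_len,
                              unsigned char *row, int width)
{
   size_t in = 0;
   int left = width;
   assert(src && row);
   while (left > 0) {
      int count, i;
      if (in >= src_len) return -1;
      count = src[in++];
      if (count >= 128) {                    // repeated value
         unsigned char value;
         if (count == 128) {                 // long run: 16-bit count follows
            if (src_len - in < 2) return -1;
            count = (src[in] << 8) | src[in+1];
            in += 2;
         } else {
            count -= 127;                    // 129..255 -> 2..128
         }
         if (count > left || in >= src_len) return -1;
         value = src[in++];
         for (i = 0; i < count; ++i) *row++ = value;
      } else {                               // raw run of count+1 values
         ++count;
         if (count > left || (size_t) count > src_len - in) return -1;
         for (i = 0; i < count; ++i) *row++ = src[in++];
      }
      left -= count;
   }
   return (long) in;
}
```

   The real loader does the same walk but reads and writes up to four channels per
   pixel, selected by the packet's channel mask.
*/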
5656 
5657 static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
5658 {
5659    stbi_uc *result;
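5660    int i, x,y, internal_comp;
        if (!comp) comp = &internal_comp; // comp may be NULL; stbi__pic_load_core writes through it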
5661 
5662    for (i=0; i<92; ++i)
5663       stbi__get8(s);
5664 
5665    x = stbi__get16be(s);
5666    y = stbi__get16be(s);
5667    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
5668    if (x != 0 && (1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");
5669 
5670    stbi__get32be(s); //skip `ratio'
5671    stbi__get16be(s); //skip `fields'
5672    stbi__get16be(s); //skip `pad'
5673 
5674    // intermediate buffer is RGBA
5675    result = (stbi_uc *) stbi__malloc(x*y*4);
        if (!result) return stbi__errpuc("outofmem", "Out of memory");
5676    memset(result, 0xff, x*y*4);
5677 
5678    if (!stbi__pic_load_core(s,x,y,comp, result)) {
5679       STBI_FREE(result);
5680       return 0; // failure reason already set; don't pass NULL on to stbi__convert_format
5681    }
5682    *px = x;
5683    *py = y;
5684    if (req_comp == 0) req_comp = *comp;
5685    result=stbi__convert_format(result,4,req_comp,x,y);
5686 
5687    return result;
5688 }
5689 
5690 static int stbi__pic_test(stbi__context *s)
5691 {
5692    int r = stbi__pic_test_core(s);
5693    stbi__rewind(s);
5694    return r;
5695 }
5696 #endif
5697 
5698 // *************************************************************************************************
5699 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
5700 
5701 #ifndef STBI_NO_GIF
5702 typedef struct
5703 {
5704    stbi__int16 prefix;
5705    stbi_uc first;
5706    stbi_uc suffix;
5707 } stbi__gif_lzw;
5708 
5709 typedef struct
5710 {
5711    int w,h;
5712    stbi_uc *out, *old_out;             // output buffer (always 4 components)
5713    int flags, bgindex, ratio, transparent, eflags, delay;
5714    stbi_uc  pal[256][4];
5715    stbi_uc lpal[256][4];
5716    stbi__gif_lzw codes[4096];
5717    stbi_uc *color_table;
5718    int parse, step;
5719    int lflags;
5720    int start_x, start_y;
5721    int max_x, max_y;
5722    int cur_x, cur_y;
5723    int line_size;
5724 } stbi__gif;
5725 
5726 static int stbi__gif_test_raw(stbi__context *s)
5727 {
5728    int sz;
5729    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
5730    sz = stbi__get8(s);
5731    if (sz != '9' && sz != '7') return 0;
5732    if (stbi__get8(s) != 'a') return 0;
5733    return 1;
5734 }
5735 
5736 static int stbi__gif_test(stbi__context *s)
5737 {
5738    int r = stbi__gif_test_raw(s);
5739    stbi__rewind(s);
5740    return r;
5741 }
5742 
5743 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
5744 {
5745    int i;
5746    for (i=0; i < num_entries; ++i) {
5747       pal[i][2] = stbi__get8(s);
5748       pal[i][1] = stbi__get8(s);
5749       pal[i][0] = stbi__get8(s);
5750       pal[i][3] = transp == i ? 0 : 255;
5751    }
5752 }
5753 
5754 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
5755 {
5756    stbi_uc version;
5757    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
5758       return stbi__err("not GIF", "Corrupt GIF");
5759 
5760    version = stbi__get8(s);
5761    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
5762    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
5763 
5764    stbi__g_failure_reason = "";
5765    g->w = stbi__get16le(s);
5766    g->h = stbi__get16le(s);
5767    g->flags = stbi__get8(s);
5768    g->bgindex = stbi__get8(s);
5769    g->ratio = stbi__get8(s);
5770    g->transparent = -1;
5771 
5772    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
5773 
5774    if (is_info) return 1;
5775 
5776    if (g->flags & 0x80)
5777       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
5778 
5779    return 1;
5780 }
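/* Illustrative sketch, not part of stb_image: the fields stbi__gif_header reads,
   parsed from a memory buffer instead of an stbi__context. The struct and function
   names are assumptions for this example. It shows the 6-byte signature check, the
   little-endian 16-bit dimensions, and how the global color table size follows from
   the low three bits of the flags byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
   int width, height;
   int has_global_table;   // bit 7 of the packed flags byte
   int table_entries;      // 2 << (flags & 7); meaningful only if has_global_table
   int bgindex;
} gif_screen_info;         // hypothetical struct, not an stb_image type

// Returns 1 on success, 0 if buf is too short or the signature is wrong.
static int gif_parse_screen(const unsigned char *buf, size_t len, gif_screen_info *out)
{
   int flags;
   assert(buf && out);
   if (len < 13) return 0;   // 6 signature bytes + 7 descriptor bytes
   if (memcmp(buf, "GIF87a", 6) != 0 && memcmp(buf, "GIF89a", 6) != 0) return 0;
   out->width  = buf[6] | (buf[7] << 8);   // 16-bit little-endian
   out->height = buf[8] | (buf[9] << 8);
   flags = buf[10];
   out->bgindex = buf[11];                 // buf[12] is the aspect ratio byte
   out->has_global_table = (flags & 0x80) != 0;
   out->table_entries = 2 << (flags & 7);
   return 1;
}
```

   A flags byte of 0xF7 therefore announces a global table of 2 << 7 = 256 entries.
*/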
5781 
5782 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
5783 {
5784    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
        if (!g) return stbi__err("outofmem", "Out of memory");
5785    if (!stbi__gif_header(s, g, comp, 1)) {
5786       STBI_FREE(g);
5787       stbi__rewind( s );
5788       return 0;
5789    }
5790    if (x) *x = g->w;
5791    if (y) *y = g->h;
5792    STBI_FREE(g);
5793    return 1;
5794 }
5795 
5796 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
5797 {
5798    stbi_uc *p, *c;
5799 
5800    // recurse to decode the prefixes, since the linked-list is backwards,
5801    // and working backwards through an interleaved image would be nasty
5802    if (g->codes[code].prefix >= 0)
5803       stbi__out_gif_code(g, g->codes[code].prefix);
5804 
5805    if (g->cur_y >= g->max_y) return;
5806 
5807    p = &g->out[g->cur_x + g->cur_y];
5808    c = &g->color_table[g->codes[code].suffix * 4];
5809 
5810    if (c[3] >= 128) {
5811       p[0] = c[2];
5812       p[1] = c[1];
5813       p[2] = c[0];
5814       p[3] = c[3];
5815    }
5816    g->cur_x += 4;
5817 
5818    if (g->cur_x >= g->max_x) {
5819       g->cur_x = g->start_x;
5820       g->cur_y += g->step;
5821 
5822       while (g->cur_y >= g->max_y && g->parse > 0) {
5823          g->step = (1 << g->parse) * g->line_size;
5824          g->cur_y = g->start_y + (g->step >> 1);
5825          --g->parse;
5826       }
5827    }
5828 }
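/* Illustrative sketch, not part of stb_image: the row order that the step/parse
   bookkeeping in stbi__out_gif_code walks through for an interlaced frame. The
   function name is an assumption for this example. It works in row indices rather
   than byte offsets: the four passes start at rows 0, 4, 2, 1 and advance by
   8, 8, 4, 2 rows respectively.

```c
#include <assert.h>

// Writes the storage order of rows for an interlaced frame of the given height
// into order[] (room for height entries required); returns the rows written.
static int gif_interlace_order(int height, int *order)
{
   static const int start[4] = { 0, 4, 2, 1 };
   static const int step[4]  = { 8, 8, 4, 2 };
   int pass, y, n = 0;
   assert(order);
   for (pass = 0; pass < 4; ++pass)
      for (y = start[pass]; y < height; y += step[pass])
         order[n++] = y;
   return n;
}
```

   For an 8-row frame this gives rows 0, 4, 2, 6, 1, 3, 5, 7.
*/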
5829 
5830 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
5831 {
5832    stbi_uc lzw_cs;
5833    stbi__int32 len, init_code;
5834    stbi__uint32 first;
5835    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
5836    stbi__gif_lzw *p;
5837 
5838    lzw_cs = stbi__get8(s);
5839    if (lzw_cs > 12) return NULL;
5840    clear = 1 << lzw_cs;
5841    first = 1;
5842    codesize = lzw_cs + 1;
5843    codemask = (1 << codesize) - 1;
5844    bits = 0;
5845    valid_bits = 0;
5846    for (init_code = 0; init_code < clear; init_code++) {
5847       g->codes[init_code].prefix = -1;
5848       g->codes[init_code].first = (stbi_uc) init_code;
5849       g->codes[init_code].suffix = (stbi_uc) init_code;
5850    }
5851 
5852    // support no starting clear code
5853    avail = clear+2;
5854    oldcode = -1;
5855 
5856    len = 0;
5857    for(;;) {
5858       if (valid_bits < codesize) {
5859          if (len == 0) {
5860             len = stbi__get8(s); // start new block
5861             if (len == 0)
5862                return g->out;
5863          }
5864          --len;
5865          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
5866          valid_bits += 8;
5867       } else {
5868          stbi__int32 code = bits & codemask;
5869          bits >>= codesize;
5870          valid_bits -= codesize;
5871          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
5872          if (code == clear) {  // clear code
5873             codesize = lzw_cs + 1;
5874             codemask = (1 << codesize) - 1;
5875             avail = clear + 2;
5876             oldcode = -1;
5877             first = 0;
5878          } else if (code == clear + 1) { // end of stream code
5879             stbi__skip(s, len);
5880             while ((len = stbi__get8(s)) > 0)
5881                stbi__skip(s,len);
5882             return g->out;
5883          } else if (code <= avail) {
5884             if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
5885 
5886             if (oldcode >= 0) {
5887                p = &g->codes[avail++];
5888                if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
5889                p->prefix = (stbi__int16) oldcode;
5890                p->first = g->codes[oldcode].first;
5891                p->suffix = (code == avail) ? p->first : g->codes[code].first;
5892             } else if (code == avail)
5893                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5894 
5895             stbi__out_gif_code(g, (stbi__uint16) code);
5896 
5897             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
5898                codesize++;
5899                codemask = (1 << codesize) - 1;
5900             }
5901 
5902             oldcode = code;
5903          } else {
5904             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
5905          }
5906       }
5907    }
5908 }
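/* Illustrative sketch, not part of stb_image: the bit-accumulator technique that
   stbi__process_gif_raster uses to pull variable-width LZW codes out of the byte
   stream, least significant bit first. The bit_reader type and read_code function
   are assumptions for this example, and the input is a flat buffer with the GIF
   sub-block framing already removed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
   const unsigned char *data;
   size_t len, pos;
   unsigned int bits;   // accumulator; new bytes enter above the valid bits
   int valid_bits;
} bit_reader;           // hypothetical helper type

// Reads the next codesize-bit code (1..12 bits); returns -1 when input runs out.
static int read_code(bit_reader *r, int codesize)
{
   int code;
   assert(r && codesize >= 1 && codesize <= 12);
   while (r->valid_bits < codesize) {
      if (r->pos >= r->len) return -1;
      r->bits |= (unsigned int) r->data[r->pos++] << r->valid_bits;
      r->valid_bits += 8;
   }
   code = (int) (r->bits & ((1u << codesize) - 1));
   r->bits >>= codesize;
   r->valid_bits -= codesize;
   return code;
}
```

   The 3-bit codes 4, 1, 6 pack into the value 4 | (1 << 3) | (6 << 6) = 0x18C,
   stored as the bytes 8C 01, and reading three 3-bit codes recovers them in order.
*/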
5909 
5910 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
5911 {
5912    int x, y;
5913    stbi_uc *c = g->pal[g->bgindex];
5914    for (y = y0; y < y1; y += 4 * g->w) {
5915       for (x = x0; x < x1; x += 4) {
5916          stbi_uc *p  = &g->out[y + x];
5917          p[0] = c[2];
5918          p[1] = c[1];
5919          p[2] = c[0];
5920          p[3] = 0;
5921       }
5922    }
5923 }
5924 
5925 // this function is designed to support animated gifs, although stb_image doesn't support it
5926 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
5927 {
5928    int i;
5929    stbi_uc *prev_out = 0;
5930 
5931    if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
5932       return 0; // stbi__g_failure_reason set by stbi__gif_header
5933 
5934    prev_out = g->out;
5935    g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
5936    if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
5937 
5938    switch ((g->eflags & 0x1C) >> 2) {
5939       case 0: // unspecified (also always used on 1st frame)
5940          stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
5941          break;
5942       case 1: // do not dispose
5943          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5944          g->old_out = prev_out;
5945          break;
5946       case 2: // dispose to background
5947          if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
5948          stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
5949          break;
5950       case 3: // dispose to previous
5951          if (g->old_out) {
5952             for (i = g->start_y; i < g->max_y; i += 4 * g->w)
5953                memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
5954          }
5955          break;
5956    }
5957 
5958    for (;;) {
5959       switch (stbi__get8(s)) {
5960          case 0x2C: /* Image Descriptor */
5961          {
5962             int prev_trans = -1;
5963             stbi__int32 x, y, w, h;
5964             stbi_uc *o;
5965 
5966             x = stbi__get16le(s);
5967             y = stbi__get16le(s);
5968             w = stbi__get16le(s);
5969             h = stbi__get16le(s);
5970             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
5971                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
5972 
5973             g->line_size = g->w * 4;
5974             g->start_x = x * 4;
5975             g->start_y = y * g->line_size;
5976             g->max_x   = g->start_x + w * 4;
5977             g->max_y   = g->start_y + h * g->line_size;
5978             g->cur_x   = g->start_x;
5979             g->cur_y   = g->start_y;
5980 
5981             g->lflags = stbi__get8(s);
5982 
5983             if (g->lflags & 0x40) {
5984                g->step = 8 * g->line_size; // first interlaced spacing
5985                g->parse = 3;
5986             } else {
5987                g->step = g->line_size;
5988                g->parse = 0;
5989             }
5990 
5991             if (g->lflags & 0x80) {
5992                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
5993                g->color_table = (stbi_uc *) g->lpal;
5994             } else if (g->flags & 0x80) {
5995                if (g->transparent >= 0 && (g->eflags & 0x01)) {
5996                   prev_trans = g->pal[g->transparent][3];
5997                   g->pal[g->transparent][3] = 0;
5998                }
5999                g->color_table = (stbi_uc *) g->pal;
6000             } else
6001                return stbi__errpuc("missing color table", "Corrupt GIF");
6002 
6003             o = stbi__process_gif_raster(s, g);
6004             if (o == NULL) return NULL;
6005 
6006             if (prev_trans != -1)
6007                g->pal[g->transparent][3] = (stbi_uc) prev_trans;
6008 
6009             return o;
6010          }
6011 
6012          case 0x21: // Extension Introducer (Graphic Control, Comment, Application, ...)
6013          {
6014             int len;
6015             if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
6016                len = stbi__get8(s);
6017                if (len == 4) {
6018                   g->eflags = stbi__get8(s);
6019                   g->delay = stbi__get16le(s);
6020                   g->transparent = stbi__get8(s);
6021                } else {
6022                   stbi__skip(s, len);
6023                   break;
6024                }
6025             }
6026             while ((len = stbi__get8(s)) != 0)
6027                stbi__skip(s, len);
6028             break;
6029          }
6030 
6031          case 0x3B: // gif stream termination code
6032             return (stbi_uc *) s; // using '1' causes warning on some compilers
6033 
6034          default:
6035             return stbi__errpuc("unknown code", "Corrupt GIF");
6036       }
6037    }
6038 
6039    STBI_NOTUSED(req_comp);
6040 }
6041 
6042 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6043 {
6044    stbi_uc *u = 0;
6045    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
        if (!g) return stbi__errpuc("outofmem", "Out of memory");
6046    memset(g, 0, sizeof(*g));
6047 
6048    u = stbi__gif_load_next(s, g, comp, req_comp);
6049    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6050    if (u) {
6051       *x = g->w;
6052       *y = g->h;
6053       if (req_comp && req_comp != 4)
6054          u = stbi__convert_format(u, 4, req_comp, g->w, g->h);
6055    }
6056    else if (g->out)
6057       STBI_FREE(g->out);
6058    STBI_FREE(g);
6059    return u;
6060 }
6061 
6062 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
6063 {
6064    return stbi__gif_info_raw(s,x,y,comp);
6065 }
6066 #endif
6067 
6068 // *************************************************************************************************
6069 // Radiance RGBE HDR loader
6070 // originally by Nicolas Schulz
6071 #ifndef STBI_NO_HDR
6072 static int stbi__hdr_test_core(stbi__context *s)
6073 {
6074    const char *signature = "#?RADIANCE\n";
6075    int i;
6076    for (i=0; signature[i]; ++i)
6077       if (stbi__get8(s) != signature[i])
6078          return 0;
6079    return 1;
6080 }
6081 
6082 static int stbi__hdr_test(stbi__context* s)
6083 {
6084    int r = stbi__hdr_test_core(s);
6085    stbi__rewind(s);
6086    return r;
6087 }
6088 
6089 #define STBI__HDR_BUFLEN  1024
6090 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
6091 {
6092    int len=0;
6093    char c = '\0';
6094 
6095    c = (char) stbi__get8(z);
6096 
6097    while (!stbi__at_eof(z) && c != '\n') {
6098       buffer[len++] = c;
6099       if (len == STBI__HDR_BUFLEN-1) {
6100          // flush to end of line
6101          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
6102             ;
6103          break;
6104       }
6105       c = (char) stbi__get8(z);
6106    }
6107 
6108    buffer[len] = 0;
6109    return buffer;
6110 }
6111 
6112 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
6113 {
6114    if ( input[3] != 0 ) {
6115       float f1;
6116       // Exponent
6117       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
6118       if (req_comp <= 2)
6119          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
6120       else {
6121          output[0] = input[0] * f1;
6122          output[1] = input[1] * f1;
6123          output[2] = input[2] * f1;
6124       }
6125       if (req_comp == 2) output[1] = 1;
6126       if (req_comp == 4) output[3] = 1;
6127    } else {
6128       switch (req_comp) {
6129          case 4: output[3] = 1; /* fallthrough */
6130          case 3: output[0] = output[1] = output[2] = 0;
6131                  break;
6132          case 2: output[1] = 1; /* fallthrough */
6133          case 1: output[0] = 0;
6134                  break;
6135       }
6136    }
6137 }
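/* Illustrative sketch, not part of stb_image: the RGBE-to-float conversion done by
   stbi__hdr_convert for the three-component case. The function names are assumptions
   for this example, and a small multiplication loop stands in for ldexp so the sketch
   needs no math library. Each mantissa byte is scaled by 2^(E - 128 - 8): 128 is the
   exponent bias and 8 accounts for the 8-bit mantissa.

```c
#include <assert.h>

// Returns 2^e as a float by repeated multiplication (stand-in for ldexp).
static float pow2_int(int e)
{
   float f = 1.0f;
   while (e > 0) { f *= 2.0f; --e; }
   while (e < 0) { f *= 0.5f; ++e; }
   return f;
}

// Converts one RGBE pixel to linear float RGB; an exponent byte of 0 means black.
static void rgbe_to_float(const unsigned char rgbe[4], float rgb[3])
{
   assert(rgbe && rgb);
   if (rgbe[3] != 0) {
      float scale = pow2_int((int) rgbe[3] - (128 + 8));
      rgb[0] = rgbe[0] * scale;
      rgb[1] = rgbe[1] * scale;
      rgb[2] = rgbe[2] * scale;
   } else {
      rgb[0] = rgb[1] = rgb[2] = 0.0f;
   }
}
```

   With an exponent byte of 129 the scale is 2^-7, so the mantissas 128, 64, 32 map
   to 1.0, 0.5 and 0.25.
*/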
6138 
6139 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
6140 {
6141    char buffer[STBI__HDR_BUFLEN];
6142    char *token;
6143    int valid = 0;
6144    int width, height;
6145    stbi_uc *scanline;
6146    float *hdr_data;
6147    int len;
6148    unsigned char count, value;
6149    int i, j, k, c1,c2, z;
6150 
6151 
6152    // Check identifier
6153    if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
6154       return stbi__errpf("not HDR", "Corrupt HDR image");
6155 
6156    // Parse header
6157    for(;;) {
6158       token = stbi__hdr_gettoken(s,buffer);
6159       if (token[0] == 0) break;
6160       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6161    }
6162 
6163    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
6164 
6165    // Parse width and height
6166    // can't use sscanf() if we're not using stdio!
6167    token = stbi__hdr_gettoken(s,buffer);
6168    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6169    token += 3;
6170    height = (int) strtol(token, &token, 10);
6171    while (*token == ' ') ++token;
6172    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6173    token += 3;
6174    width = (int) strtol(token, NULL, 10);
6175 
6176    *x = width;
6177    *y = height;
6178 
6179    if (comp) *comp = 3;
6180    if (req_comp == 0) req_comp = 3;
6181 
6182    // Read data
6183    hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
        if (!hdr_data) return stbi__errpf("outofmem", "Out of memory");
6184 
6185    // Load image data
6186    // image data is stored as some number of scanlines
6187    if ( width < 8 || width >= 32768) {
6188       // Read flat data
6189       for (j=0; j < height; ++j) {
6190          for (i=0; i < width; ++i) {
6191             stbi_uc rgbe[4];
6192            main_decode_loop:
6193             stbi__getn(s, rgbe, 4);
6194             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
6195          }
6196       }
6197    } else {
6198       // Read RLE-encoded data
6199       scanline = NULL;
6200 
6201       for (j = 0; j < height; ++j) {
6202          c1 = stbi__get8(s);
6203          c2 = stbi__get8(s);
6204          len = stbi__get8(s);
6205          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
6206             // not run-length encoded, so we have to actually use THIS data as a decoded
6207             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
6208             stbi_uc rgbe[4];
6209             rgbe[0] = (stbi_uc) c1;
6210             rgbe[1] = (stbi_uc) c2;
6211             rgbe[2] = (stbi_uc) len;
6212             rgbe[3] = (stbi_uc) stbi__get8(s);
6213             stbi__hdr_convert(hdr_data, rgbe, req_comp);
6214             i = 1;
6215             j = 0;
6216             STBI_FREE(scanline);
6217             goto main_decode_loop; // yes, this makes no sense
6218          }
6219          len <<= 8;
6220          len |= stbi__get8(s);
6221          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
6222          if (scanline == NULL) {
                 scanline = (stbi_uc *) stbi__malloc(width * 4);
                 if (!scanline) { STBI_FREE(hdr_data); return stbi__errpf("outofmem", "Out of memory"); }
              }
6223 
6224          for (k = 0; k < 4; ++k) {
6225             i = 0;
6226             while (i < width) {
6227                count = stbi__get8(s);
6228                if (count > 128) {
6229                   // Run
6230                   value = stbi__get8(s);
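6231                   count -= 128;
                       if (count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }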
6232                   for (z = 0; z < count; ++z)
6233                      scanline[i++ * 4 + k] = value;
6234                } else {
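6235                   // Dump
                       if (count == 0 || count > width - i) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }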
6236                   for (z = 0; z < count; ++z)
6237                      scanline[i++ * 4 + k] = stbi__get8(s);
6238                }
6239             }
6240          }
6241          for (i=0; i < width; ++i)
6242             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6243       }
6244       STBI_FREE(scanline);
6245    }
6246 
6247    return hdr_data;
6248 }
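/* Illustrative sketch, not part of stb_image: parsing the resolution line the way
   stbi__hdr_load does, with strncmp and strtol instead of sscanf. The function name
   is an assumption for this example. Only the "-Y <height> +X <width>" layout is
   accepted, and this sketch additionally rejects non-positive dimensions, which is
   a choice made here rather than something the loader above does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Parses "-Y <height> +X <width>" (hypothetical helper).
// Returns 1 and fills width/height on success, 0 for any other layout.
static int hdr_parse_resolution(const char *line, int *width, int *height)
{
   char *end;
   long h, w;
   assert(line && width && height);
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   h = strtol(line + 3, &end, 10);
   while (*end == ' ') ++end;
   if (strncmp(end, "+X ", 3) != 0) return 0;
   w = strtol(end + 3, NULL, 10);
   if (h <= 0 || w <= 0 || h > 0x7fffffffL || w > 0x7fffffffL) return 0;
   *height = (int) h;
   *width  = (int) w;
   return 1;
}
```

   Other orientations, such as "+X 1024 -Y 768", are reported as unsupported.
*/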
6249 
6250 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6251 {
6252    char buffer[STBI__HDR_BUFLEN];
6253    char *token;
6254    int valid = 0;
6255 
6256    if (stbi__hdr_test(s) == 0) {
6257        stbi__rewind( s );
6258        return 0;
6259    }
6260 
6261    for(;;) {
6262       token = stbi__hdr_gettoken(s,buffer);
6263       if (token[0] == 0) break;
6264       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6265    }
6266 
6267    if (!valid) {
6268        stbi__rewind( s );
6269        return 0;
6270    }
6271    token = stbi__hdr_gettoken(s,buffer);
6272    if (strncmp(token, "-Y ", 3)) {
6273        stbi__rewind( s );
6274        return 0;
6275    }
6276    token += 3;
6277    *y = (int) strtol(token, &token, 10);
6278    while (*token == ' ') ++token;
6279    if (strncmp(token, "+X ", 3)) {
6280        stbi__rewind( s );
6281        return 0;
6282    }
6283    token += 3;
6284    *x = (int) strtol(token, NULL, 10);
6285    *comp = 3;
6286    return 1;
6287 }
6288 #endif // STBI_NO_HDR
6289 
6290 #ifndef STBI_NO_BMP
6291 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6292 {
6293    void *p;
6294    stbi__bmp_data info;
6295 
6296    info.all_a = 255;
6297    p = stbi__bmp_parse_header(s, &info);
6298    stbi__rewind( s );
6299    if (p == NULL)
6300       return 0;
6301    *x = s->img_x;
6302    *y = s->img_y;
6303    *comp = info.ma ? 4 : 3;
6304    return 1;
6305 }
6306 #endif
6307 
#ifndef STBI_NO_PSD
static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
{
   int channelCount, depth;
   if (stbi__get32be(s) != 0x38425053) {   // signature "8BPS"
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 1) {            // file version
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s, 6);                       // reserved bytes
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16) {
       stbi__rewind( s );
       return 0;
   }
   *y = stbi__get32be(s);
   *x = stbi__get32be(s);
   depth = stbi__get16be(s);
   if (depth != 8 && depth != 16) {        // 8 and 16 bits per channel are both supported (see QUICK NOTES)
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 3) {            // color mode must be RGB
       stbi__rewind( s );
       return 0;
   }
   *comp = 4;
   return 1;
}
#endif

#ifndef STBI_NO_PIC
static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
{
   int act_comp=0,num_packets=0,chained;
   stbi__pic_packet packets[10];

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
      stbi__rewind(s);
      return 0;
   }

   stbi__skip(s, 88);

   *x = stbi__get16be(s);
   *y = stbi__get16be(s);
   if (stbi__at_eof(s)) {
      stbi__rewind( s);
      return 0;
   }
   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
      stbi__rewind( s );
      return 0;
   }

   stbi__skip(s, 8);

   do {
      stbi__pic_packet *packet;

      // too many packets: rewind like every other failure path,
      // so the next format test starts from the beginning of the stream
      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
         stbi__rewind( s );
         return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
#endif

// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel
//
// Comments ('#' to end of line) in the header section are skipped (since 2.09).

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;
   *x = s->img_x;
   *y = s->img_y;
   *comp = s->img_n;

   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   for (;;) {
      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
         *c = (char) stbi__get8(s);

      if (stbi__at_eof(s) || *c != '#')
         break;

      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
         *c = (char) stbi__get8(s);
   }
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
#endif

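/*
   The PNM header parsing above can be tried outside the library with the
   standalone sketch below. It is not part of stb_image: the names pnm_header,
   pnm_skip, pnm_read_int and pnm_parse_header, and the (1 << 24) cap on
   dimensions, are assumptions made for illustration. Unlike stbi__pnm_info it
   reads from a plain memory buffer, rejects missing or zero dimensions, and
   reports where the raster data starts.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical result type for this sketch; stb_image has no such struct.
typedef struct {
   int width, height, comp, maxval;
   size_t data_offset;   // index of the first raster byte
} pnm_header;

static int pnm_is_space(int c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

// Skip whitespace and '#' comments (which run to end of line); returns the new position.
static size_t pnm_skip(const unsigned char *buf, size_t len, size_t pos)
{
   for (;;) {
      while (pos < len && pnm_is_space(buf[pos])) ++pos;
      if (pos >= len || buf[pos] != '#') return pos;
      while (pos < len && buf[pos] != '\n' && buf[pos] != '\r') ++pos;
   }
}

// Read a decimal integer; returns -1 if no digit is present or the value exceeds limit.
static int pnm_read_int(const unsigned char *buf, size_t len, size_t *pos, int limit)
{
   int value = 0, digits = 0;
   while (*pos < len && buf[*pos] >= '0' && buf[*pos] <= '9') {
      value = value*10 + (buf[*pos] - '0');
      if (value > limit) return -1;   // checked every digit, so value never overflows
      ++*pos;
      ++digits;
   }
   return digits ? value : -1;
}

// Returns 1 and fills *h for a binary 8-bit PGM (P5) or PPM (P6) header, 0 otherwise.
static int pnm_parse_header(const unsigned char *buf, size_t len, pnm_header *h)
{
   size_t pos = 2;
   assert(buf != NULL && h != NULL);
   if (len < 3 || buf[0] != 'P' || (buf[1] != '5' && buf[1] != '6')) return 0;
   h->comp = (buf[1] == '6') ? 3 : 1;

   pos = pnm_skip(buf, len, pos);
   h->width = pnm_read_int(buf, len, &pos, 1 << 24);
   pos = pnm_skip(buf, len, pos);
   h->height = pnm_read_int(buf, len, &pos, 1 << 24);
   pos = pnm_skip(buf, len, pos);
   h->maxval = pnm_read_int(buf, len, &pos, 65535);

   if (h->width <= 0 || h->height <= 0) return 0;
   if (h->maxval <= 0 || h->maxval > 255) return 0;   // 16-bit-per-channel is not handled

   // exactly one whitespace byte separates the max value from the raster
   if (pos >= len || !pnm_is_space(buf[pos])) return 0;
   h->data_offset = pos + 1;
   return 1;
}
```

   For a buffer beginning "P6\n# note\n3 2\n255\n", pnm_parse_header would
   report a 3x2, 3-component image whose raster starts right after the final
   newline.
*/
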
static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
      2.11  (2016-04-02) allocate large structures on the stack
                         remove white matting for transparent PSD
                         fix reported channel count for PNG & BMP
                         re-enable SSE2 in non-gcc 64-bit
                         support RGB-formatted JPEG
                         read 16-bit PNGs (only as 8-bit)
      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
      2.09  (2016-01-16) allow comments in PNM files
                         16-bit-per-pixel TGA (not bit-per-component)
                         info() for TGA could break due to .hdr handling
                         info() for BMP now shares code instead of sloppy parse
                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
                         code cleanup
      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
      2.07  (2015-09-13) fix compiler warnings
                         partial animated GIF support
                         limited 16-bpc PSD support
                         #ifdef unused functions
                         bug with < 92 byte PIC,PNM,HDR,TGA
      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
      2.03  (2015-04-12) extra corruption checking (mmozeiko)
                         stbi_set_flip_vertically_on_load (nguillemot)
                         fix NEON support; fix mingw support
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez Žemva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18  (2008-08-02)
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50  (2006-11-19)
              first released version
*/