1 /* stb_image - v2.02 - public domain image loader - http://nothings.org/stb_image.h
2                                      no warranty implied; use at your own risk
3 
4    Do this:
5       #define STB_IMAGE_IMPLEMENTATION
6    before you include this file in *one* C or C++ file to create the implementation.
7 
8    // i.e. it should look like this:
9    #include ...
10    #include ...
11    #include ...
12    #define STB_IMAGE_IMPLEMENTATION
13    #include "stb_image.h"
14 
15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16    You can also #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc, realloc, and free.
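The three allocator hooks must be defined together (the implementation rejects a partial set with #error). They also take no context parameter, so any allocator state has to live in a global or thread-local variable. The sketch below is a minimal illustration under those constraints. The helpers `counting_malloc`, `counting_realloc`, and `counting_free` are hypothetical and simply track how many blocks are live. In real use, the three `#define`s go before the `#define STB_IMAGE_IMPLEMENTATION` / `#include "stb_image.h"` pair.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical global counter: the STBI_* hooks take no context
// parameter, so allocator state has to live in a global (or TLS).
static long g_live_allocs = 0;

static void *counting_malloc(size_t sz)
{
   void *p = malloc(sz);
   if (p) ++g_live_allocs;
   return p;
}

static void *counting_realloc(void *p, size_t sz)
{
   void *q = realloc(p, sz);
   if (p == NULL && q != NULL) ++g_live_allocs; // realloc(NULL, n) acts like malloc
   return q;
}

static void counting_free(void *p)
{
   if (p) {
      assert(g_live_allocs > 0); // catches a free without a matching allocation
      --g_live_allocs;
   }
   free(p);
}

// Define all three, or none: defining only some is a compile error.
#define STBI_MALLOC(sz)    counting_malloc(sz)
#define STBI_REALLOC(p,sz) counting_realloc(p,sz)
#define STBI_FREE(p)       counting_free(p)
```

After a load and a matching stbi_image_free, g_live_allocs should be back to zero. If it is not, something leaked.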
17 
18 
19    QUICK NOTES:
20       Primarily of interest to game developers and other people who can
21           avoid problematic images and only need the trivial interface
22 
23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24       PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25 
26       TGA (not sure what subset, if a subset)
27       BMP non-1bpp, non-RLE
28       PSD (composited view only, no extra channels)
29 
30       GIF (*comp always reports as 4-channel)
31       HDR (radiance rgbE format)
32       PIC (Softimage PIC)
33       PNM (PPM and PGM binary only)
34 
35       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
36       - decode from arbitrary I/O callbacks
37       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
38 
39    Full documentation under "DOCUMENTATION" below.
40 
41 
42    Revision 2.00 release notes:
43 
44       - Progressive JPEG is now supported.
45 
46       - PPM and PGM binary formats are now supported, thanks to Ken Miller.
47 
48       - x86 platforms now make use of SSE2 SIMD instructions for
49         JPEG decoding, and ARM platforms can use NEON SIMD if requested.
50         This work was done by Fabian "ryg" Giesen. SSE2 is used by
51         default, but NEON must be enabled explicitly; see docs.
52 
53         With other JPEG optimizations included in this version, we see
54         a 2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
55         on a JPEG on an ARM machine, relative to previous versions of this
56         library. These results will not hold for all JPEGs and for all
57         x86/ARM machines. (Note that progressive JPEGs are significantly
58         slower to decode than regular JPEGs.) This doesn't mean that this
59         is the fastest JPEG decoder in the land; rather, it brings it
60         closer to parity with standard libraries. If you want the fastest
61         decode, look elsewhere. (See "Philosophy" section of docs below.)
62 
63         See final bullet items below for more info on SIMD.
64 
65       - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
66         the memory allocator. Unlike other stb libraries, these macros don't
67         support a context parameter, so if you need to pass a context in to
68         the allocator, you'll have to store it in a global or a thread-local
69         variable.
70 
71       - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
72         STBI_NO_LINEAR.
73             STBI_NO_HDR:     suppress implementation of .hdr reader format
74             STBI_NO_LINEAR:  suppress high-dynamic-range light-linear float API
75 
76       - You can suppress implementation of any of the decoders to reduce
77         your code footprint by #defining one or more of the following
78         symbols before creating the implementation.
79 
80             STBI_NO_JPEG
81             STBI_NO_PNG
82             STBI_NO_BMP
83             STBI_NO_PSD
84             STBI_NO_TGA
85             STBI_NO_GIF
86             STBI_NO_HDR
87             STBI_NO_PIC
88             STBI_NO_PNM   (.ppm and .pgm)
89 
90       - You can request *only* certain decoders and suppress all other ones
91         (this will be more forward-compatible, as addition of new decoders
92         doesn't require you to disable them explicitly):
93 
94             STBI_ONLY_JPEG
95             STBI_ONLY_PNG
96             STBI_ONLY_BMP
97             STBI_ONLY_PSD
98             STBI_ONLY_TGA
99             STBI_ONLY_GIF
100             STBI_ONLY_HDR
101             STBI_ONLY_PIC
102             STBI_ONLY_PNM   (.ppm and .pgm)
103 
104         Note that you can define multiples of these, and you will get all
105         of them (defining "only x" and "only y" means "only x&y").
106 
107       - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
108         want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB.
109 
110       - Compilation of all SIMD code can be suppressed with
111             #define STBI_NO_SIMD
112         It should not be necessary to disable SIMD unless you have issues
113         compiling (e.g. using an x86 compiler which doesn't support SSE
114         intrinsics or that doesn't support the method used to detect
115         SSE2 support at run-time), and even those can be reported as
116         bugs so I can refine the built-in compile-time checking to be
117         smarter.
118 
119       - The old STBI_SIMD system which allowed installing a user-defined
120         IDCT etc. has been removed. If you need this, don't upgrade. My
121         assumption is that almost nobody was doing this, and those who
122         were will find the built-in SIMD more satisfactory anyway.
123 
124       - RGB values computed for JPEG images are slightly different from
125         previous versions of stb_image. (This is due to using less
126         integer precision in SIMD.) The C code has been adjusted so
127         that the same RGB values will be computed regardless of whether
128         SIMD support is available, so your app should always produce
129         consistent results. But these results are slightly different from
130         previous versions. (Specifically, about 3% of available YCbCr values
131         will compute different RGB results from pre-1.49 versions by +-1;
132         most of the deviating values are one smaller in the G channel.)
133 
134       - If you must produce consistent results with previous versions of
135         stb_image, #define STBI_JPEG_OLD and you will get the same results
136         you used to; however, you will not get the SIMD speedups for
137         the YCbCr-to-RGB conversion step (although you should still see
138         significant JPEG speedup from the other changes).
139 
140         Please note that STBI_JPEG_OLD is a temporary feature; it will be
141         removed in future versions of the library. It is only intended for
142         near-term back-compatibility use.
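These switches are resolved entirely in the preprocessor. Once any STBI_ONLY_* symbol is defined, each format that lacks its own STBI_ONLY_* receives the matching STBI_NO_*. The zlib decoder is then dropped along with PNG unless STBI_SUPPORT_ZLIB is defined. The sketch below shows a trimmed version of that logic for three formats. The HAVE_* enums are hypothetical and exist only so the outcome can be checked.

```c
#include <assert.h>

// Example configuration: request only the PNG and GIF decoders.
#define STBI_ONLY_PNG
#define STBI_ONLY_GIF

// Trimmed to three formats; the full list in this file also covers BMP,
// PSD, TGA, HDR, PIC and PNM, and also triggers on STBI_ONLY_ZLIB.
#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_GIF)
   #ifndef STBI_ONLY_JPEG
   #define STBI_NO_JPEG
   #endif
   #ifndef STBI_ONLY_PNG
   #define STBI_NO_PNG
   #endif
   #ifndef STBI_ONLY_GIF
   #define STBI_NO_GIF
   #endif
#endif

// zlib is compiled out with PNG unless STBI_SUPPORT_ZLIB asks to keep it.
#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
#define STBI_NO_ZLIB
#endif

// Hypothetical flags recording which decoders remain enabled.
#ifdef STBI_NO_JPEG
enum { HAVE_JPEG = 0 };
#else
enum { HAVE_JPEG = 1 };
#endif
#ifdef STBI_NO_PNG
enum { HAVE_PNG = 0 };
#else
enum { HAVE_PNG = 1 };
#endif
#ifdef STBI_NO_GIF
enum { HAVE_GIF = 0 };
#else
enum { HAVE_GIF = 1 };
#endif
#ifdef STBI_NO_ZLIB
enum { HAVE_ZLIB = 0 };
#else
enum { HAVE_ZLIB = 1 };
#endif
```

With STBI_ONLY_PNG and STBI_ONLY_GIF defined, JPEG is disabled, while PNG, GIF, and zlib stay available.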
143 
144 
145    Latest revision history:
146       2.02  (2015-01-19) fix incorrect assert, fix warning
147       2.01  (2015-01-17) fix various warnings
148       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
149       2.00  (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD
150                          progressive JPEG
151                          PGM/PPM support
152                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
153                          STBI_NO_*, STBI_ONLY_*
154                          GIF bugfix
155       1.48  (2014-12-14) fix incorrectly-named assert()
156       1.47  (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted)
157                          optimize PNG
158                          fix bug in interlaced PNG with user-specified channel count
159       1.46  (2014-08-26) fix broken tRNS chunk in non-paletted PNG
160       1.45  (2014-08-16) workaround MSVC-ARM internal compiler error by wrapping malloc
161 
162    See end of file for full revision history.
163 
164 
165  ============================    Contributors    =========================
166 
167  Image formats                                Bug fixes & warning fixes
168     Sean Barrett (jpeg, png, bmp)                Marc LeBlanc
169     Nicolas Schulz (hdr, psd)                    Christpher Lloyd
170     Jonathan Dummer (tga)                        Dave Moore
171     Jean-Marc Lienher (gif)                      Won Chun
172     Tom Seddon (pic)                             the Horde3D community
173     Thatcher Ulrich (psd)                        Janez Zemva
174     Ken Miller (pgm, ppm)                        Jonathan Blow
175                                                  Laurent Gomila
176                                                  Aruelien Pocheville
177  Extensions, features                            Ryamond Barbiero
178     Jetro Lauha (stbi_info)                      David Woo
179     Martin "SpartanJ" Golini (stbi_info)         Martin Golini
180     James "moose2000" Brown (iPhone PNG)         Roy Eltham
181     Ben "Disch" Wenger (io callbacks)            Luke Graham
182     Omar Cornut (1/2/4-bit PNG)                  Thomas Ruf
183                                                  John Bartholomew
184                                                  Ken Hamada
185  Optimizations & bugfixes                        Cort Stratton
186     Fabian "ryg" Giesen                          Blazej Dariusz Roszkowski
187     Arseny Kapoulkine                            Thibault Reuille
188                                                  Paul Du Bois
189                                                  Guillaume George
190   If your name should be here but                Jerry Jansson
191   isn't, let Sean know.                          Hayaki Saito
192                                                  Johan Duparc
193                                                  Ronny Chevalier
194                                                  Michal Cichon
195                                                  Tero Hanninen
196                                                  Sergio Gonzalez
197                                                  Cass Everitt
198                                                  Engin Manap
199 
200 License:
201    This software is in the public domain. Where that dedication is not
202    recognized, you are granted a perpetual, irrevocable license to copy
203    and modify this file however you want.
204 
205 */
206 
207 #ifndef STBI_INCLUDE_STB_IMAGE_H
208 #define STBI_INCLUDE_STB_IMAGE_H
209 
210 // DOCUMENTATION
211 //
212 // Limitations:
213 //    - no 16-bit-per-channel PNG
214 //    - no 12-bit-per-channel JPEG
215 //    - no JPEGs with arithmetic coding
216 //    - no 1-bit BMP
217 //    - GIF always returns *comp=4
218 //
219 // Basic usage (see HDR discussion below for HDR usage):
220 //    int x,y,n;
221 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
222 //    // ... process data if not NULL ...
223 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
224 //    // ... replace '0' with '1'..'4' to force that many components per pixel
225 //    // ... but 'n' will always be the number that it would have been if you said 0
226 //    stbi_image_free(data);
227 //
228 // Standard parameters:
229 //    int *x       -- outputs image width in pixels
230 //    int *y       -- outputs image height in pixels
231 //    int *comp    -- outputs # of image components in image file
232 //    int req_comp -- if non-zero, # of image components requested in result
233 //
234 // The return value from an image loader is an 'unsigned char *' which points
235 // to the pixel data, or NULL on an allocation failure or if the image is
236 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
237 // with each pixel consisting of N interleaved 8-bit components; the first
238 // pixel pointed to is top-left-most in the image. There is no padding between
239 // image scanlines or between pixels, regardless of format. The number of
240 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
241 // If req_comp is non-zero, *comp has the number of components that _would_
242 // have been output otherwise. E.g. if you set req_comp to 4, you will always
243 // get RGBA output, but you can check *comp to see if it's trivially opaque
244 // because e.g. there were only 3 channels in the source image.
245 //
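/*
When req_comp asks for more components than the file stores, the loader has to synthesize the missing ones. The sketch below covers the 3-to-4 case. It assumes the added alpha is 255 (fully opaque), which is why a reported *comp of 3 lets you treat a 4-channel result as trivially opaque. `expand_rgb_to_rgba` is a hypothetical helper, not part of the stb_image API.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: convert tightly packed RGB pixels (3 bytes each)
// into RGBA (4 bytes each), filling alpha with 255 so the result is opaque.
// 'dst' must hold 4 * pixel_count bytes.
static void expand_rgb_to_rgba(const unsigned char *src, unsigned char *dst,
                               size_t pixel_count)
{
   size_t i;
   assert(src != NULL && dst != NULL);
   for (i = 0; i < pixel_count; ++i) {
      dst[4*i + 0] = src[3*i + 0]; // red
      dst[4*i + 1] = src[3*i + 1]; // green
      dst[4*i + 2] = src[3*i + 2]; // blue
      dst[4*i + 3] = 255;          // synthesized alpha: opaque
   }
}
```
*/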
246 // An output image with N components has the following components interleaved
247 // in this order in each pixel:
248 //
249 //     N=#comp     components
250 //       1           grey
251 //       2           grey, alpha
252 //       3           red, green, blue
253 //       4           red, green, blue, alpha
254 //
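/*
Scanlines are stored top to bottom with no padding. As a result, component c of the pixel at column px, row py sits at byte offset ((py * width) + px) * N + c. With N=2, for example, c=1 is the alpha channel according to the table above. The accessor below is a small sketch of this formula; its name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical accessor for an image returned as *y scanlines of *x pixels,
// N interleaved 8-bit components per pixel, top-left first, no padding.
static unsigned char get_component(const unsigned char *data, int width,
                                   int n, int px, int py, int c)
{
   assert(px >= 0 && px < width && py >= 0 && c >= 0 && c < n);
   return data[((size_t) py * (size_t) width + (size_t) px) * (size_t) n + (size_t) c];
}
```
*/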
255 // If image loading fails for any reason, the return value will be NULL,
256 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
257 // can be queried for an extremely brief, end-user unfriendly explanation
258 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
259 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
260 // more user-friendly ones.
261 //
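/*
Below is a minimal sketch of how a single brief failure string with compile-time switches like these could be wired up. It uses a global slot plus a macro that decides which text gets compiled in. A single shared slot like this is not thread-safe, which matches the NOT THREADSAFE note on stbi_failure_reason. All names here (`fail_with`, `MY_FAIL`, `MY_NO_FAILURE_STRINGS`, `MY_FAILURE_USERMSG`) are hypothetical stand-ins, not the library's internals.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical single global slot for the last failure reason. One shared
// slot like this is inherently not thread-safe.
static const char *g_failure_reason = NULL;

// Record a reason and return 0 so callers can propagate failure directly.
static int fail_with(const char *reason)
{
   g_failure_reason = reason;
   return 0;
}

// Compile-time choice of which text is kept; strings that are not selected
// never appear in the expansion, so they are not compiled in.
#if defined(MY_NO_FAILURE_STRINGS)
   #define MY_FAIL(terse, friendly) fail_with(NULL)
#elif defined(MY_FAILURE_USERMSG)
   #define MY_FAIL(terse, friendly) fail_with(friendly)
#else
   #define MY_FAIL(terse, friendly) fail_with(terse)
#endif

static const char *last_failure_reason(void)
{
   return g_failure_reason;
}
```

An int-returning decode step would write `return MY_FAIL("bad png", "Corrupt PNG");`, and the caller would read last_failure_reason() after seeing the 0 result.
*/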
262 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
263 //
264 // ===========================================================================
265 //
266 // Philosophy
267 //
268 // stb libraries are designed with the following priorities:
269 //
270 //    1. easy to use
271 //    2. easy to maintain
272 //    3. good performance
273 //
274 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
275 // and for best performance I may provide less-easy-to-use APIs that give higher
276 // performance, in addition to the easy to use ones. Nevertheless, it's important
277 // to keep in mind that from the standpoint of you, a client of this library,
278 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
279 //
280 // Some secondary priorities arise directly from the first two, and some of them
281 // make it clearer why performance can't be the top priority.
282 //
283 //    - Portable ("ease of use")
284 //    - Small footprint ("easy to maintain")
285 //    - No dependencies ("ease of use")
286 //
287 // ===========================================================================
288 //
289 // I/O callbacks
290 //
291 // I/O callbacks allow you to read from arbitrary sources, like packaged
292 // files or some other source. Data read from callbacks are processed
293 // through a small internal buffer (currently 128 bytes) to try to reduce
294 // overhead.
295 //
296 // The three functions you must define are "read" (reads some bytes of data),
297 // "skip" (skips some bytes of data), and "eof" (reports if the stream is at the end).
298 //
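/*
As an illustration, here is a minimal set of callbacks that serve bytes from a memory block. The sketch redeclares a local struct with the same three function-pointer signatures as stbi_io_callbacks so it compiles on its own. `io_callbacks`, `mem_stream`, and the `mem_*` functions are hypothetical. `skip` accepts a negative count, which "ungets" bytes, matching the comment on the skip member of stbi_io_callbacks.

```c
#include <assert.h>
#include <string.h>

// Same shape as stbi_io_callbacks, redeclared so this sketch is self-contained.
typedef struct
{
   int  (*read) (void *user, char *data, int size);
   void (*skip) (void *user, int n);
   int  (*eof)  (void *user);
} io_callbacks;

// Hypothetical user state: a read cursor over a memory block.
typedef struct
{
   const char *data;
   int size;
   int pos;
} mem_stream;

static int mem_read(void *user, char *data, int size)
{
   mem_stream *s = (mem_stream *) user;
   int left = s->size - s->pos;
   int n = size < left ? size : left; // short read near the end of the data
   if (n <= 0)
      return 0;
   memcpy(data, s->data + s->pos, (size_t) n);
   s->pos += n;
   return n; // number of bytes actually read
}

static void mem_skip(void *user, int n)
{
   mem_stream *s = (mem_stream *) user;
   s->pos += n; // negative n "ungets" bytes
   if (s->pos < 0) s->pos = 0;
   if (s->pos > s->size) s->pos = s->size;
}

static int mem_eof(void *user)
{
   mem_stream *s = (mem_stream *) user;
   return s->pos >= s->size;
}

static const io_callbacks mem_callbacks = { mem_read, mem_skip, mem_eof };
```

With the real header, you would fill a stbi_io_callbacks with mem_read, mem_skip, and mem_eof, then pass a mem_stream pointer as the user argument of stbi_load_from_callbacks. For plain memory, stbi_load_from_memory already does this without callbacks.
*/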
299 // ===========================================================================
300 //
301 // SIMD support
302 //
303 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
304 // supported by the compiler. For ARM Neon support, you must explicitly
305 // request it.
306 //
307 // (The old do-it-yourself SIMD API is no longer supported in the current
308 // code.)
309 //
310 // On x86, SSE2 will automatically be used when available based on a run-time
311 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
312 // the typical path is to have separate builds for NEON and non-NEON devices
313 // (at least this is true for iOS and Android). Therefore, the NEON support is
314 // toggled by a build flag: define STBI_NEON to get NEON loops.
315 //
316 // The output of the JPEG decoder is slightly different from versions before
317 // SIMD support was introduced (that is, versions before 1.49). The
318 // difference is only +-1 in the 8-bit RGB channels, and only on a small
319 // fraction of pixels. You can force the pre-1.49 behavior by defining
320 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
321 // and hence cost some performance.
322 //
323 // If for some reason you do not want to use any of the SIMD code, or if
324 // you have issues compiling it, you can disable it entirely by
325 // defining STBI_NO_SIMD.
326 //
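/*
Below is a sketch of the run-time dispatch pattern described above, under stated assumptions. The kernel is a toy 8-lane 16-bit add, not one of the library's JPEG kernels. The SSE2 variant is compiled only when the compiler advertises __SSE2__. The caller supplies the run-time detection result, for example from bit 26 of EDX returned by CPUID leaf 1, which is the bit tested by stbi__sse2_available in the MSVC path below. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

typedef void (*add8_fn)(const int16_t *a, const int16_t *b, int16_t *out);

// Generic C fallback: out[i] = a[i] + b[i] for 8 lanes.
static void add8_c(const int16_t *a, const int16_t *b, int16_t *out)
{
   int i;
   for (i = 0; i < 8; ++i)
      out[i] = (int16_t) (a[i] + b[i]);
}

#ifdef __SSE2__
// SSE2 version: one 128-bit register holds all 8 lanes.
static void add8_sse2(const int16_t *a, const int16_t *b, int16_t *out)
{
   __m128i va = _mm_loadu_si128((const __m128i *) a);
   __m128i vb = _mm_loadu_si128((const __m128i *) b);
   _mm_storeu_si128((__m128i *) out, _mm_add_epi16(va, vb));
}
#endif

// CPUID leaf 1 reports SSE2 support in bit 26 of EDX.
static int edx_reports_sse2(unsigned int edx)
{
   return ((edx >> 26) & 1u) != 0;
}

// Pick the SIMD kernel only if it was compiled in AND the CPU supports it.
static add8_fn select_add8(int cpu_has_sse2)
{
#ifdef __SSE2__
   if (cpu_has_sse2)
      return add8_sse2;
#else
   (void) cpu_has_sse2;
#endif
   return add8_c;
}
```

GCC and Clang define __SSE2__ by default on x86-64, so both paths are built there. Elsewhere, select_add8 always returns the C version, and both paths must produce identical results.
*/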
327 // ===========================================================================
328 //
329 // HDR image support   (disable by defining STBI_NO_HDR)
330 //
331 // stb_image now supports loading HDR images in general; currently only the
332 // Radiance .HDR file format is implemented, although the support is provided
333 // generically. You can still load any file through the existing interface;
334 // if you attempt to load an HDR file, it will be automatically remapped to
335 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
336 // both of these constants can be reconfigured through this interface:
337 //
338 //     stbi_hdr_to_ldr_gamma(2.2f);
339 //     stbi_hdr_to_ldr_scale(1.0f);
340 //
341 // (Note: do not pass _inverse_ constants; stb_image will invert them
342 // appropriately).
343 //
344 // Additionally, there is a new, parallel interface for loading files as
345 // (linear) floats to preserve the full dynamic range:
346 //
347 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
348 //
349 // If you load LDR images through this interface, those images will
350 // be promoted to floating point values, run through the inverse of
351 // constants corresponding to the above:
352 //
353 //     stbi_ldr_to_hdr_scale(1.0f);
354 //     stbi_ldr_to_hdr_gamma(2.2f);
355 //
356 // Finally, given a filename (or an open file or memory block--see header
357 // file for details) containing image data, you can query for the "most
358 // appropriate" interface to use (that is, whether the image is HDR or
359 // not), using:
360 //
361 //     stbi_is_hdr(char const *filename);
362 //
363 // ===========================================================================
364 //
365 // iPhone PNG support:
366 //
367 // By default we convert iPhone-formatted PNGs back to RGB, even though
368 // they are internally encoded differently. You can disable this conversion
369 // by calling stbi_convert_iphone_png_to_rgb(0), in which case
370 // you will always just get the native iPhone "format" through (which
371 // is BGR stored in RGB).
372 //
373 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
374 // pixel to remove any premultiplied alpha *only* if the image file explicitly
375 // says there's premultiplied data (currently only happens in iPhone images,
376 // and only if iPhone convert-to-rgb processing is on).
377 //
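/*
Per pixel, the two fixes described above amount to two steps. First, swap the first and third bytes (BGR stored in RGB). Second, if requested, divide each color channel by alpha to remove premultiplication. The sketch below assumes 4-byte pixels stored as premultiplied B, G, R, A, and its helper name is hypothetical. The library leaves the result undefined on overflow, according to the note on stbi_set_unpremultiply_on_load. This sketch instead clamps to 255 and leaves pixels with zero alpha unchanged.

```c
#include <assert.h>

// Hypothetical helper: in-place BGR->RGB swap on a 4-byte pixel, then optional
// unpremultiply (color = color * 255 / alpha), clamped to 255.
static void fix_iphone_pixel(unsigned char p[4], int unpremultiply)
{
   unsigned char t;
   assert(p != 0);
   t = p[0];
   p[0] = p[2];   // red was stored in byte 2
   p[2] = t;      // blue was stored in byte 0
   if (unpremultiply && p[3] != 0) {
      int c;
      for (c = 0; c < 3; ++c) {
         int v = p[c] * 255 / p[3];
         p[c] = (unsigned char) (v > 255 ? 255 : v);
      }
   }
}
```
*/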
378 
379 
380 #ifndef STBI_NO_STDIO
381 #include <stdio.h>
382 #endif // STBI_NO_STDIO
383 
384 #define STBI_VERSION 1
385 
386 enum
387 {
388    STBI_default = 0, // only used for req_comp
389 
390    STBI_grey       = 1,
391    STBI_grey_alpha = 2,
392    STBI_rgb        = 3,
393    STBI_rgb_alpha  = 4
394 };
395 
396 typedef unsigned char stbi_uc;
397 
398 #ifdef __cplusplus
399 extern "C" {
400 #endif
401 
402 #ifdef STB_IMAGE_STATIC
403 #define STBIDEF static
404 #else
405 #define STBIDEF extern
406 #endif
407 
408 //////////////////////////////////////////////////////////////////////////////
409 //
410 // PRIMARY API - works on images of any type
411 //
412 
413 //
414 // load image by filename, open file, or memory buffer
415 //
416 
417 typedef struct
418 {
419    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
420    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
421    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
422 } stbi_io_callbacks;
423 
424 STBIDEF stbi_uc *stbi_load               (char              const *filename,           int *x, int *y, int *comp, int req_comp);
425 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *comp, int req_comp);
426 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *comp, int req_comp);
427 
428 #ifndef STBI_NO_STDIO
429 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f,                  int *x, int *y, int *comp, int req_comp);
430 // for stbi_load_from_file, file pointer is left pointing immediately after image
431 #endif
432 
433 #ifndef STBI_NO_LINEAR
434    STBIDEF float *stbi_loadf                 (char const *filename,           int *x, int *y, int *comp, int req_comp);
435    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp);
436    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp);
437 
438    #ifndef STBI_NO_STDIO
439    STBIDEF float *stbi_loadf_from_file  (FILE *f,                int *x, int *y, int *comp, int req_comp);
440    #endif
441 #endif
442 
443 #ifndef STBI_NO_HDR
444    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
445    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
446 #endif
447 
448 #ifndef STBI_NO_LINEAR
449    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
450    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
451 #endif // STBI_NO_LINEAR
452 
453 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
454 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
455 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
456 #ifndef STBI_NO_STDIO
457 STBIDEF int      stbi_is_hdr          (char const *filename);
458 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
459 #endif // STBI_NO_STDIO
460 
461 
462 // get a VERY brief reason for failure
463 // NOT THREADSAFE
464 STBIDEF const char *stbi_failure_reason  (void);
465 
466 // free the loaded image -- this is just free() (or STBI_FREE, if defined)
467 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
468 
469 // get image dimensions & components without fully decoding
470 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
471 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
472 
473 #ifndef STBI_NO_STDIO
474 STBIDEF int      stbi_info            (char const *filename,     int *x, int *y, int *comp);
475 STBIDEF int      stbi_info_from_file  (FILE *f,                  int *x, int *y, int *comp);
476 
477 #endif
478 
479 
480 
481 // for image formats that explicitly notate that they have premultiplied alpha,
482 // we just return the colors as stored in the file. set this flag to force
483 // unpremultiplication. results are undefined if the unpremultiply overflows.
484 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
485 
486 // indicate whether we should process iPhone images back to canonical format,
487 // or just pass them through "as-is"
488 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
489 
490 
491 // ZLIB client - used by PNG, available for other purposes
492 
493 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
494 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
495 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
496 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
497 
498 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
499 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
500 
501 
502 #ifdef __cplusplus
503 }
504 #endif
505 
506 //
507 //
508 ////   end header file   /////////////////////////////////////////////////////
509 #endif // STBI_INCLUDE_STB_IMAGE_H
510 
511 #ifdef STB_IMAGE_IMPLEMENTATION
512 
513 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
514   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
515   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
516   || defined(STBI_ONLY_ZLIB)
517    #ifndef STBI_ONLY_JPEG
518    #define STBI_NO_JPEG
519    #endif
520    #ifndef STBI_ONLY_PNG
521    #define STBI_NO_PNG
522    #endif
523    #ifndef STBI_ONLY_BMP
524    #define STBI_NO_BMP
525    #endif
526    #ifndef STBI_ONLY_PSD
527    #define STBI_NO_PSD
528    #endif
529    #ifndef STBI_ONLY_TGA
530    #define STBI_NO_TGA
531    #endif
532    #ifndef STBI_ONLY_GIF
533    #define STBI_NO_GIF
534    #endif
535    #ifndef STBI_ONLY_HDR
536    #define STBI_NO_HDR
537    #endif
538    #ifndef STBI_ONLY_PIC
539    #define STBI_NO_PIC
540    #endif
541    #ifndef STBI_ONLY_PNM
542    #define STBI_NO_PNM
543    #endif
544 #endif
545 
546 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
547 #define STBI_NO_ZLIB
548 #endif
549 
550 
551 #include <stdarg.h>
552 #include <stddef.h> // ptrdiff_t on osx
553 #include <stdlib.h>
554 #include <string.h>
555 
556 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
557 #include <math.h>  // ldexp
558 #endif
559 
560 #ifndef STBI_NO_STDIO
561 #include <stdio.h>
562 #endif
563 
564 #ifndef STBI_ASSERT
565 #include <assert.h>
566 #define STBI_ASSERT(x) assert(x)
567 #endif
568 
569 
570 #ifndef _MSC_VER
571    #ifdef __cplusplus
572    #define stbi_inline inline
573    #else
574    #define stbi_inline
575    #endif
576 #else
577    #define stbi_inline __forceinline
578 #endif
579 
580 
581 #ifdef _MSC_VER
582 typedef unsigned short stbi__uint16;
583 typedef   signed short stbi__int16;
584 typedef unsigned int   stbi__uint32;
585 typedef   signed int   stbi__int32;
586 #else
587 #include <stdint.h>
588 typedef uint16_t stbi__uint16;
589 typedef int16_t  stbi__int16;
590 typedef uint32_t stbi__uint32;
591 typedef int32_t  stbi__int32;
592 #endif
593 
594 // should produce compiler error if size is wrong
595 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
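/*
The validate_uint32 typedef above is a pre-C11 compile-time assertion: a false condition produces an array of size -1, which the compiler must reject. The same trick can be packaged as a reusable macro, as sketched below. `MY_STATIC_ASSERT` is a hypothetical name, and C11 code could use _Static_assert instead.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical pre-C11 compile-time assertion: if 'cond' is false the array
// size becomes -1, which the compiler must reject.
#define MY_STATIC_ASSERT(cond, name) \
   typedef unsigned char static_assert_##name[(cond) ? 1 : -1]

MY_STATIC_ASSERT(sizeof(uint32_t) == 4, uint32_is_4_bytes);
MY_STATIC_ASSERT(sizeof(uint16_t) == 2, uint16_is_2_bytes);
```
*/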
596 
597 #ifdef _MSC_VER
598 #define STBI_NOTUSED(v)  (void)(v)
599 #else
600 #define STBI_NOTUSED(v)  (void)sizeof(v)
601 #endif
602 
603 #ifdef _MSC_VER
604 #define STBI_HAS_LROTL
605 #endif
606 
607 #ifdef STBI_HAS_LROTL
608    #define stbi_lrot(x,y)  _lrotl(x,y)
609 #else
610    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (32 - (y))))
611 #endif
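/*
The portable stbi_lrot fallback computes x >> (32 - y). In C this is undefined behavior when y is 0, because it shifts by the full width of a 32-bit operand, so the macro is only safe for counts 1..31. The sketch below shows a rotate that is defined for any count, assuming 32-bit unsigned operands. `rotl32` is a hypothetical name. GCC and Clang typically recognize this pattern and emit a single rotate instruction.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical rotate-left that is defined for any count: masking keeps both
// shift amounts in 0..31, and a count of 0 yields (x << 0) | (x >> 0) == x.
static uint32_t rotl32(uint32_t x, unsigned int y)
{
   y &= 31u;
   return (x << y) | (x >> ((32u - y) & 31u));
}
```
*/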
612 
613 #if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
614 // ok
615 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
616 // ok
617 #else
618 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
619 #endif
620 
621 #ifndef STBI_MALLOC
622 #define STBI_MALLOC(sz)    malloc(sz)
623 #define STBI_REALLOC(p,sz) realloc(p,sz)
624 #define STBI_FREE(p)       free(p)
625 #endif
626 
627 #if defined(__GNUC__) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
628 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
629 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
630 // this is just broken and gcc are jerks for not fixing it properly
631 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
632 #define STBI_NO_SIMD
633 #endif
634 
635 #if !defined(STBI_NO_SIMD) && (defined(__x86_64__) || defined(_M_X64) || defined(__i386) || defined(_M_IX86))
636 #define STBI_SSE2
637 #include <emmintrin.h>
638 
639 #ifdef _MSC_VER
640 
641 #if _MSC_VER >= 1400  // not VC6
642 #include <intrin.h> // __cpuid
643 static int stbi__cpuid3(void)
644 {
645    int info[4];
646    __cpuid(info,1);
647    return info[3];
648 }
649 #else
650 static int stbi__cpuid3(void)
651 {
652    int res;
653    __asm {
654       mov  eax,1
655       cpuid
656       mov  res,edx
657    }
658    return res;
659 }
660 #endif
661 
662 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
663 
664 static int stbi__sse2_available(void)
665 {
666    int info3 = stbi__cpuid3();
667    return ((info3 >> 26) & 1) != 0;
668 }
669 #else // assume GCC-style if not VC++
670 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
671 
672 static int stbi__sse2_available(void)
673 {
674 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
675    // GCC 4.8+ has a nice way to do this
676    return __builtin_cpu_supports("sse2");
677 #else
678    // portable way to do this, preferably without using GCC inline ASM?
679    // just bail for now.
680    return 0;
681 #endif
682 }
683 #endif
684 #endif
685 
686 // ARM NEON
687 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
688 #undef STBI_NEON
689 #endif
690 
691 #ifdef STBI_NEON
692 #include <arm_neon.h>
693 // assume GCC or Clang on ARM targets
694 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
695 #endif
696 
697 #ifndef STBI_SIMD_ALIGN
698 #define STBI_SIMD_ALIGN(type, name) type name
699 #endif
700 
701 ///////////////////////////////////////////////
702 //
703 //  stbi__context struct and start_xxx functions
704 
705 // stbi__context structure is our basic context used by all images, so it
706 // contains all the IO context, plus some basic image information
707 typedef struct
708 {
709    stbi__uint32 img_x, img_y;
710    int img_n, img_out_n;
711 
712    stbi_io_callbacks io;
713    void *io_user_data;
714 
715    int read_from_callbacks;
716    int buflen;
717    stbi_uc buffer_start[128];
718 
719    stbi_uc *img_buffer, *img_buffer_end;
720    stbi_uc *img_buffer_original;
721 } stbi__context;
722 
723 
724 static void stbi__refill_buffer(stbi__context *s);
725 
726 // initialize a memory-decode context
727 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
728 {
729    s->io.read = NULL;
730    s->read_from_callbacks = 0;
731    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
732    s->img_buffer_end = (stbi_uc *) buffer+len;
733 }
734 
735 // initialize a callback-based context
736 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
737 {
738    s->io = *c;
739    s->io_user_data = user;
740    s->buflen = sizeof(s->buffer_start);
741    s->read_from_callbacks = 1;
742    s->img_buffer_original = s->buffer_start;
743    stbi__refill_buffer(s);
744 }
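/*
With callbacks, input is staged through the 128-byte buffer_start array set up above. The sketch below layers a byte reader on such a buffer and refills it through the read callback whenever it runs dry. `buffered_reader`, the `br_*` functions, and `counter_source` are hypothetical. Returning 0 after the source is exhausted is an assumption of this sketch rather than a description of stbi__refill_buffer.

```c
#include <assert.h>

#define BR_BUFSIZE 128

// Hypothetical reader state: a read callback plus a small staging buffer.
typedef struct
{
   int (*read)(void *user, char *data, int size);
   void *user;
   unsigned char buf[BR_BUFSIZE];
   unsigned char *cur, *end;   // unread bytes are [cur, end)
} buffered_reader;

static void br_refill(buffered_reader *r)
{
   int n = r->read(r->user, (char *) r->buf, BR_BUFSIZE);
   r->cur = r->buf;
   r->end = r->buf + (n > 0 ? n : 0);
}

static void br_init(buffered_reader *r, int (*read_fn)(void *, char *, int), void *user)
{
   r->read = read_fn;
   r->user = user;
   br_refill(r);
}

// Return the next byte, refilling on demand; 0 once the source is exhausted.
static unsigned char br_get8(buffered_reader *r)
{
   if (r->cur == r->end)
      br_refill(r);
   if (r->cur == r->end)
      return 0;
   return *r->cur++;
}

// Demo source: yields the bytes 0, 1, 2, ... until 'remaining' reaches 0.
typedef struct { int next; int remaining; } counter_source;

static int counter_read(void *user, char *data, int size)
{
   counter_source *s = (counter_source *) user;
   int n = 0;
   while (n < size && s->remaining > 0) {
      data[n++] = (char) (s->next++ & 0xFF);
      --s->remaining;
   }
   return n;
}
```

Reading 300 bytes through this reader triggers three callback reads (128, 128, and 44 bytes), so most bytes cost only a pointer increment.
*/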
745 
746 #ifndef STBI_NO_STDIO
747 
748 static int stbi__stdio_read(void *user, char *data, int size)
749 {
750    return (int) fread(data,1,size,(FILE*) user);
751 }
752 
753 static void stbi__stdio_skip(void *user, int n)
754 {
755    fseek((FILE*) user, n, SEEK_CUR);
756 }
757 
758 static int stbi__stdio_eof(void *user)
759 {
760    return feof((FILE*) user);
761 }
762 
763 static stbi_io_callbacks stbi__stdio_callbacks =
764 {
765    stbi__stdio_read,
766    stbi__stdio_skip,
767    stbi__stdio_eof,
768 };
769 
770 static void stbi__start_file(stbi__context *s, FILE *f)
771 {
772    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
773 }
774 
775 //static void stop_file(stbi__context *s) { }
776 
777 #endif // !STBI_NO_STDIO
778 
779 static void stbi__rewind(stbi__context *s)
780 {
781    // conceptually rewind SHOULD rewind to the beginning of the stream,
782    // but we just rewind to the beginning of the initial buffer, because
783    // we only use it after doing 'test', which never looks at more than 92 bytes
784    s->img_buffer = s->img_buffer_original;
785 }
786 
#ifndef STBI_NO_JPEG
static int      stbi__jpeg_test(stbi__context *s);
static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNG
static int      stbi__png_test(stbi__context *s);
static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_BMP
static int      stbi__bmp_test(stbi__context *s);
static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_TGA
static int      stbi__tga_test(stbi__context *s);
static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PSD
static int      stbi__psd_test(stbi__context *s);
static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_HDR
static int      stbi__hdr_test(stbi__context *s);
static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PIC
static int      stbi__pic_test(stbi__context *s);
static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_GIF
static int      stbi__gif_test(stbi__context *s);
static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
#endif

#ifndef STBI_NO_PNM
static int      stbi__pnm_test(stbi__context *s);
static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
#endif

// this is not threadsafe
static const char *stbi__g_failure_reason;

STBIDEF const char *stbi_failure_reason(void)
{
   return stbi__g_failure_reason;
}

static int stbi__err(const char *str)
{
   stbi__g_failure_reason = str;
   return 0;
}

static void *stbi__malloc(size_t size)
{
    return STBI_MALLOC(size);
}

// stbi__err - error
// stbi__errpf - error returning pointer to float
// stbi__errpuc - error returning pointer to unsigned char

#ifdef STBI_NO_FAILURE_STRINGS
   #define stbi__err(x,y)  0
#elif defined(STBI_FAILURE_USERMSG)
   #define stbi__err(x,y)  stbi__err(y)
#else
   #define stbi__err(x,y)  stbi__err(x)
#endif

#define stbi__errpf(x,y)   ((float *) (stbi__err(x,y)?NULL:NULL))
#define stbi__errpuc(x,y)  ((unsigned char *) (stbi__err(x,y)?NULL:NULL))

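/* example (a sketch, not part of stb_image): every error path above is a
   one-liner that records a message in a single global and then yields 0 or a
   typed NULL. the (stbi__err(x,y)?NULL:NULL) ternary exists only so the call runs
   for its side effect while the whole expression still has pointer type. the
   demo_* names below are hypothetical stand-ins for stbi__g_failure_reason,
   stbi__err and stbi__errpuc.

```c
#include <stddef.h>

static const char *demo_failure_reason;  // hypothetical; one global, so not threadsafe

static int demo_err(const char *str)
{
   demo_failure_reason = str;
   return 0;
}

// run demo_err for its side effect, then produce a NULL of the right type
#define demo_errpuc(msg)  ((unsigned char *) (demo_err(msg)?NULL:NULL))

// hypothetical loader: succeeds when ok is nonzero
static unsigned char *demo_load(int ok)
{
   static unsigned char pixel = 42;
   if (!ok) return demo_errpuc("unknown image type");
   return &pixel;
}
```

   a caller checks for NULL first and only then reads the reason. stbi_failure_reason()
   is meant to be used the same way after a failed stbi_load.
*/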
STBIDEF void stbi_image_free(void *retval_from_stbi_load)
{
   STBI_FREE(retval_from_stbi_load);
}

#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
#endif

#ifndef STBI_NO_HDR
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
#endif

static unsigned char *stbi_load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNG
   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_BMP
   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_GIF
   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PSD
   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PIC
   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp);
   #endif
   #ifndef STBI_NO_PNM
   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp);
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s)) {
      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp);
      if (hdr == NULL) return NULL; // load failed: don't convert NULL or read unset *x, *y
      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
   }
   #endif

   #ifndef STBI_NO_TGA
   // test tga last because it's a crappy test!
   if (stbi__tga_test(s))
      return stbi__tga_load(s,x,y,comp,req_comp);
   #endif

   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
}

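/* example (a sketch, not part of stb_image): each stbi__xxx_test above peeks at
   the start of the stream, and stbi__rewind then puts the cursor back. the real
   tests can check more than a signature, but the first few bytes already tell
   most formats apart. sniff_format below is a hypothetical helper built on the
   well-known signatures. TGA has no signature, which is why the loader above
   tries it last.

```c
#include <string.h>

// hypothetical helper: name of the format whose signature starts p, or "unknown"
static const char *sniff_format(const unsigned char *p, int len)
{
   static const unsigned char png_sig[8] = { 0x89,'P','N','G','\r','\n',0x1a,'\n' };
   if (len >= 2  && p[0] == 0xff && p[1] == 0xd8)                return "jpeg"; // SOI marker
   if (len >= 8  && memcmp(p, png_sig, 8) == 0)                  return "png";
   if (len >= 2  && p[0] == 'B' && p[1] == 'M')                  return "bmp";
   if (len >= 4  && memcmp(p, "GIF8", 4) == 0)                   return "gif";
   if (len >= 4  && memcmp(p, "8BPS", 4) == 0)                   return "psd";
   if (len >= 2  && p[0] == 'P' && (p[1] == '5' || p[1] == '6')) return "pnm"; // binary PGM/PPM
   if (len >= 10 && memcmp(p, "#?RADIANCE", 10) == 0)            return "hdr";
   return "unknown"; // includes TGA, which has no signature
}
```
*/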
#ifndef STBI_NO_STDIO

static FILE *stbi__fopen(char const *filename, char const *mode)
{
   FILE *f;
#if defined(_MSC_VER) && _MSC_VER >= 1400
   if (0 != fopen_s(&f, filename, mode))
      f=0;
#else
   f = fopen(filename, mode);
#endif
   return f;
}


STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   FILE *f = stbi__fopen(filename, "rb");
   unsigned char *result;
   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
   result = stbi_load_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *result;
   stbi__context s;
   stbi__start_file(&s,f);
   result = stbi_load_main(&s,x,y,comp,req_comp);
   if (result) {
      // 'unget' the bytes still unread in the IO buffer, so the FILE position
      // matches what the decoder actually consumed
      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
   }
   return result;
}
#endif //!STBI_NO_STDIO

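/* example (a sketch, not part of stb_image): stbi_load_from_file reads through a
   buffer, so after a successful load the FILE position is past bytes the decoder
   never used. the fseek above steps back by (img_buffer_end - img_buffer), which
   leaves the stream just after the image, for instance when more data follows it
   in the same file. read_prefix_then_unget and make_temp_file are hypothetical
   helpers that reproduce the same bookkeeping; tmpfile() is used only to get a
   seekable stream.

```c
#include <stdio.h>

// hypothetical: read up to 16 bytes, pretend 'used' of them were consumed,
// seek back over the rest, and return the resulting file position
static long read_prefix_then_unget(FILE *f, int used)
{
   unsigned char buffer[16];
   int got = (int) fread(buffer, 1, sizeof(buffer), f);
   if (used > got) used = got;
   fseek(f, - (long) (got - used), SEEK_CUR);  // 'unget' the unconsumed bytes
   return ftell(f);
}

// hypothetical: a temporary file holding n zero bytes, positioned at the start
static FILE *make_temp_file(int n)
{
   FILE *f = tmpfile();
   int i;
   if (!f) return NULL;
   for (i = 0; i < n; ++i) fputc(0, f);
   rewind(f);
   return f;
}
```

   with a 40-byte file, consuming 5 of the first 16 buffered bytes leaves the
   position at 5. a second call that consumes all 16 bytes then moves it to 21.
*/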
STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi_load_main(&s,x,y,comp,req_comp);
}

STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi_load_main(&s,x,y,comp,req_comp);
}

#ifndef STBI_NO_LINEAR
static float *stbi_loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   unsigned char *data;
   #ifndef STBI_NO_HDR
   if (stbi__hdr_test(s))
      return stbi__hdr_load(s,x,y,comp,req_comp);
   #endif
   data = stbi_load_main(s, x, y, comp, req_comp);
   if (data)
      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
}

STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi_loadf_main(&s,x,y,comp,req_comp);
}

STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi_loadf_main(&s,x,y,comp,req_comp);
}

#ifndef STBI_NO_STDIO
STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
{
   float *result;
   FILE *f = stbi__fopen(filename, "rb");
   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
   fclose(f);
   return result;
}

STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
{
   stbi__context s;
   stbi__start_file(&s,f);
   return stbi_loadf_main(&s,x,y,comp,req_comp);
}
#endif // !STBI_NO_STDIO

#endif // !STBI_NO_LINEAR

// these is-hdr-or-not functions are defined regardless of STBI_NO_LINEAR,
// for API simplicity; if STBI_NO_HDR is defined, they always report false!

STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(buffer);
   STBI_NOTUSED(len);
   return 0;
   #endif
}

#ifndef STBI_NO_STDIO
STBIDEF int      stbi_is_hdr          (char const *filename)
{
   FILE *f = stbi__fopen(filename, "rb");
   int result=0;
   if (f) {
      result = stbi_is_hdr_from_file(f);
      fclose(f);
   }
   return result;
}

STBIDEF int      stbi_is_hdr_from_file(FILE *f)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_file(&s,f);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(f);
   return 0;
   #endif
}
#endif // !STBI_NO_STDIO

STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
{
   #ifndef STBI_NO_HDR
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
   return stbi__hdr_test(&s);
   #else
   STBI_NOTUSED(clbk);
   STBI_NOTUSED(user);
   return 0;
   #endif
}

static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;

#ifndef STBI_NO_LINEAR
STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
#endif

STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }


//////////////////////////////////////////////////////////////////////////////
//
// Common code used by all image loaders
//

enum
{
   STBI__SCAN_load=0,
   STBI__SCAN_type,
   STBI__SCAN_header
};

static void stbi__refill_buffer(stbi__context *s)
{
   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
   if (n == 0) {
      // at end of file, treat same as if from memory, but need to handle case
      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
      s->read_from_callbacks = 0;
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start+1;
      *s->img_buffer = 0;
   } else {
      s->img_buffer = s->buffer_start;
      s->img_buffer_end = s->buffer_start + n;
   }
}

stbi_inline static stbi_uc stbi__get8(stbi__context *s)
{
   if (s->img_buffer < s->img_buffer_end)
      return *s->img_buffer++;
   if (s->read_from_callbacks) {
      stbi__refill_buffer(s);
      return *s->img_buffer++;
   }
   return 0;
}

stbi_inline static int stbi__at_eof(stbi__context *s)
{
   if (s->io.read) {
      if (!(s->io.eof)(s->io_user_data)) return 0;
      // if feof() is true, check if buffer = end
      // special case: we've only got the special 0 character at the end
      if (s->read_from_callbacks == 0) return 1;
   }

   return s->img_buffer >= s->img_buffer_end;
}

static void stbi__skip(stbi__context *s, int n)
{
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         s->img_buffer = s->img_buffer_end;
         (s->io.skip)(s->io_user_data, n - blen);
         return;
      }
   }
   s->img_buffer += n;
}

static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
{
   if (s->io.read) {
      int blen = (int) (s->img_buffer_end - s->img_buffer);
      if (blen < n) {
         int res, count;

         memcpy(buffer, s->img_buffer, blen);

         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
         res = (count == (n-blen));
         s->img_buffer = s->img_buffer_end;
         return res;
      }
   }

   if (s->img_buffer+n <= s->img_buffer_end) {
      memcpy(buffer, s->img_buffer, n);
      s->img_buffer += n;
      return 1;
   } else
      return 0;
}

static int stbi__get16be(stbi__context *s)
{
   int z = stbi__get8(s);
   return (z << 8) + stbi__get8(s);
}

static stbi__uint32 stbi__get32be(stbi__context *s)
{
   stbi__uint32 z = stbi__get16be(s);
   return (z << 16) + stbi__get16be(s);
}

static int stbi__get16le(stbi__context *s)
{
   int z = stbi__get8(s);
   return z + (stbi__get8(s) << 8);
}

static stbi__uint32 stbi__get32le(stbi__context *s)
{
   stbi__uint32 z = stbi__get16le(s);
   // shift as unsigned: a high half >= 0x8000 shifted left by 16 would overflow int
   return z + ((stbi__uint32) stbi__get16le(s) << 16);
}

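/* example (a sketch, not part of stb_image): the readers above assemble
   multi-byte integers one byte at a time. big-endian puts the first byte in the
   high bits, and little-endian puts it in the low bits. read_be32 and read_le32
   are hypothetical helpers over a plain byte array. they do their shifts on an
   unsigned 32-bit type, so a byte shifted left by 24 cannot overflow a signed int.

```c
#include <stdint.h>

// hypothetical: first byte is the most significant
static uint32_t read_be32(const unsigned char *p)
{
   return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) | ((uint32_t) p[2] << 8) | p[3];
}

// hypothetical: first byte is the least significant
static uint32_t read_le32(const unsigned char *p)
{
   return p[0] | ((uint32_t) p[1] << 8) | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}
```

   for the bytes 12 34 56 78 these give 0x12345678 and 0x78563412. PNG chunk
   lengths are big-endian (hence stbi__get32be), and BMP header fields are
   little-endian.
*/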
#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings


//////////////////////////////////////////////////////////////////////////////
//
//  generic converter from built-in img_n to req_comp
//    individual types do this automatically as much as possible (e.g. jpeg
//    does all cases internally since it needs to colorspace convert anyway,
//    and it never has alpha, so very few cases). png can automatically
//    interleave an alpha=255 channel, but falls back to this for other cases
//
//  assume data buffer is malloced, so malloc a new one and free that one;
//  the only failure mode is malloc failing

static stbi_uc stbi__compute_y(int r, int g, int b)
{
   // fixed-point weights: 77 + 150 + 29 == 256, so >> 8 divides by 256
   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
}

static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
{
   int i,j;
   unsigned char *good;

   if (req_comp == img_n) return data;
   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);

   good = (unsigned char *) stbi__malloc(req_comp * x * y);
   if (good == NULL) {
      STBI_FREE(data);
      return stbi__errpuc("outofmem", "Out of memory");
   }

   for (j=0; j < (int) y; ++j) {
      unsigned char *src  = data + j * x * img_n   ;
      unsigned char *dest = good + j * x * req_comp;

      #define COMBO(a,b)  ((a)*8+(b))
      #define CASE(a,b)   case COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
      // convert source image with img_n components to one with req_comp components;
      // avoid switch per pixel, so use switch per scanline and massive macros
      switch (COMBO(img_n, req_comp)) {
         CASE(1,2) dest[0]=src[0], dest[1]=255; break;
         CASE(1,3) dest[0]=dest[1]=dest[2]=src[0]; break;
         CASE(1,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; break;
         CASE(2,1) dest[0]=src[0]; break;
         CASE(2,3) dest[0]=dest[1]=dest[2]=src[0]; break;
         CASE(2,4) dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; break;
         CASE(3,4) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; break;
         CASE(3,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
         CASE(3,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; break;
         CASE(4,1) dest[0]=stbi__compute_y(src[0],src[1],src[2]); break;
         CASE(4,2) dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; break;
         CASE(4,3) dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; break;
         default: STBI_ASSERT(0);
      }
      #undef CASE
      #undef COMBO
   }

   STBI_FREE(data);
   return good;
}

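/* example (a sketch, not part of stb_image): stbi__compute_y is a fixed-point
   luma. the weights 77/256, 150/256 and 29/256 are close to the BT.601 weights
   0.299, 0.587 and 0.114. since 77 + 150 + 29 = 256, the >> 8 divides by 256
   exactly, so white stays 255. luma and rgba_to_ya below are hypothetical helpers
   that repeat that arithmetic and the CASE(4,2) conversion (RGBA to grey+alpha).

```c
// hypothetical copy of the integer luma formula
static unsigned char luma(int r, int g, int b)
{
   return (unsigned char) (((r*77) + (g*150) + (b*29)) >> 8);
}

// hypothetical: convert n RGBA pixels to grey+alpha pairs, like CASE(4,2)
static void rgba_to_ya(const unsigned char *src, unsigned char *dest, int n)
{
   int i;
   for (i = 0; i < n; ++i, src += 4, dest += 2) {
      dest[0] = luma(src[0], src[1], src[2]);
      dest[1] = src[3];   // alpha is copied through unchanged
   }
}
```

   pure red, green and blue map to 76, 149 and 28, so green dominates perceived
   brightness, as it does in the floating-point formula.
*/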
#ifndef STBI_NO_LINEAR
static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
{
   int i,k,n;
   float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
      }
      if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
   }
   STBI_FREE(data);
   return output;
}
#endif

#ifndef STBI_NO_HDR
#define stbi__float2int(x)   ((int) (x))
static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
{
   int i,k,n;
   stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
   // compute number of non-alpha components
   if (comp & 1) n = comp; else n = comp-1;
   for (i=0; i < x*y; ++i) {
      for (k=0; k < n; ++k) {
         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
      if (k < comp) {
         float z = data[i*comp+k] * 255 + 0.5f;
         if (z < 0) z = 0;
         if (z > 255) z = 255;
         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
      }
   }
   STBI_FREE(data);
   return output;
}
#endif

//////////////////////////////////////////////////////////////////////////////
//
//  "baseline" JPEG/JFIF decoder
//
//    simple implementation
//      - doesn't support delayed output of y-dimension
//      - simple interface (only one output format: 8-bit interleaved RGB)
//      - doesn't try to recover corrupt jpegs
//      - doesn't allow partial loading, loading multiple at once
//      - still fast on x86 (copying globals into locals doesn't help x86)
//      - allocates lots of intermediate memory (full size of all components)
//        - non-interleaved case requires this anyway
//        - allows good upsampling (see next)
//    high-quality
//      - upsampled channels are bilinearly interpolated, even across blocks
//      - quality integer IDCT derived from IJG's 'slow'
//    performance
//      - fast huffman; reasonable integer IDCT
//      - some SIMD kernels for common paths on targets with SSE2/NEON
//      - uses a lot of intermediate memory, could cache poorly

#ifndef STBI_NO_JPEG

// huffman decoding acceleration
#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache

typedef struct
{
   stbi_uc  fast[1 << FAST_BITS];
   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
   stbi__uint16 code[256];
   stbi_uc  values[256];
   stbi_uc  size[257];
   unsigned int maxcode[18];
   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
} stbi__huffman;

typedef struct
{
   stbi__context *s;
   stbi__huffman huff_dc[4];
   stbi__huffman huff_ac[4];
   stbi_uc dequant[4][64];
   stbi__int16 fast_ac[4][1 << FAST_BITS];

// sizes for components, interleaved MCUs
   int img_h_max, img_v_max;
   int img_mcu_x, img_mcu_y;
   int img_mcu_w, img_mcu_h;

// definition of jpeg image component
   struct
   {
      int id;
      int h,v;
      int tq;
      int hd,ha;
      int dc_pred;

      int x,y,w2,h2;
      stbi_uc *data;
      void *raw_data, *raw_coeff;
      stbi_uc *linebuf;
      short   *coeff;   // progressive only
      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
   } img_comp[4];

   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
   int            code_bits;   // number of valid bits
   unsigned char  marker;      // marker seen while filling entropy buffer
   int            nomore;      // flag if we saw a marker so must stop

   int            progressive;
   int            spec_start;
   int            spec_end;
   int            succ_high;
   int            succ_low;
   int            eob_run;

   int scan_n, order[4];
   int restart_interval, todo;

// kernels
   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
} stbi__jpeg;

static int stbi__build_huffman(stbi__huffman *h, int *count)
{
   int i,j,k=0,code;
   // build size list for each symbol (from JPEG spec)
   for (i=0; i < 16; ++i)
      for (j=0; j < count[i]; ++j)
         h->size[k++] = (stbi_uc) (i+1);
   h->size[k] = 0;

   // compute actual symbols (from jpeg spec)
   code = 0;
   k = 0;
   for(j=1; j <= 16; ++j) {
      // compute delta to add to code to compute symbol id
      h->delta[j] = k - code;
      if (h->size[k] == j) {
         while (h->size[k] == j)
            h->code[k++] = (stbi__uint16) (code++);
         if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
      }
      // compute largest code + 1 for this size, preshifted as needed later
      h->maxcode[j] = code << (16-j);
      code <<= 1;
   }
   h->maxcode[j] = 0xffffffff;

   // build non-spec acceleration table; 255 is flag for not-accelerated
   memset(h->fast, 255, 1 << FAST_BITS);
   for (i=0; i < k; ++i) {
      int s = h->size[i];
      if (s <= FAST_BITS) {
         int c = h->code[i] << (FAST_BITS-s);
         int m = 1 << (FAST_BITS-s);
         for (j=0; j < m; ++j) {
            h->fast[c+j] = (stbi_uc) i;
         }
      }
   }
   return 1;
}

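/* example (a sketch, not part of stb_image): stbi__build_huffman assigns
   canonical codes as the JPEG spec describes. symbols are taken in order of code
   length, each code is the previous one plus one, and the running code is doubled
   whenever the length grows by a bit. the function then fills the fast lookup
   table by expanding every short code into all the FAST_BITS-bit prefixes that
   begin with it. demo_build_codes and DEMO_FAST_BITS are hypothetical names; the
   sketch uses a tiny 3-bit table so the result is easy to check by hand.

```c
#include <string.h>

#define DEMO_FAST_BITS 3   // hypothetical, much smaller than stb_image's FAST_BITS

// hypothetical: count[i] = number of codes of length i+1. fills code[]/size[]
// per symbol id and fast[] (255 = not a short code); returns the symbol count,
// or -1 if the lengths are impossible
static int demo_build_codes(const int count[16], unsigned short code[256],
                            unsigned char size[256], unsigned char fast[1 << DEMO_FAST_BITS])
{
   int i, j, k = 0, c = 0;
   for (i = 0; i < 16; ++i)
      for (j = 0; j < count[i]; ++j) {
         if (k >= 256) return -1;    // too many symbols
         size[k++] = (unsigned char) (i + 1);
      }
   k = 0;
   for (j = 1; j <= 16; ++j) {
      for (i = 0; i < count[j-1]; ++i)
         code[k++] = (unsigned short) c++;
      if (c > (1 << j)) return -1;   // more codes than fit in j bits
      c <<= 1;                       // next length: append a 0 bit
   }
   memset(fast, 255, 1 << DEMO_FAST_BITS);
   for (i = 0; i < k; ++i) {
      int s = size[i];
      if (s <= DEMO_FAST_BITS) {
         int first = code[i] << (DEMO_FAST_BITS - s);  // code followed by zeros
         int m = 1 << (DEMO_FAST_BITS - s);            // prefixes it covers
         for (j = 0; j < m; ++j) fast[first + j] = (unsigned char) i;
      }
   }
   return k;
}
```

   with two 2-bit codes and two 3-bit codes, the codes come out as 00, 01, 100 and
   101. peeking at the next three bits, for example 101, gives fast[5] = symbol 3
   in a single lookup. codes longer than the fast table fall through to the slower
   maxcode comparison in stbi__jpeg_huff_decode.
*/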
// build a table that decodes both magnitude and value of small ACs in
// one go.
static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
{
   int i;
   for (i=0; i < (1 << FAST_BITS); ++i) {
      stbi_uc fast = h->fast[i];
      fast_ac[i] = 0;
      if (fast < 255) {
         int rs = h->values[fast];
         int run = (rs >> 4) & 15;
         int magbits = rs & 15;
         int len = h->size[fast];

         if (magbits && len + magbits <= FAST_BITS) {
            // magnitude code followed by receive_extend code
            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
            int m = 1 << (magbits - 1);
            if (k < m) k -= (1 << magbits) - 1; // same as k += (-1 << magbits) + 1, without shifting a negative
            // if the result is small enough, we can fit it in fast_ac table
            if (k >= -128 && k <= 127)
               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits)); // multiply: k may be negative
         }
      }
   }
}

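/* example (a sketch, not part of stb_image): each fast_ac entry packs three
   fields into 16 bits. the decoded coefficient is in the high byte, the zero run
   is in bits 4..7, and the total number of bits consumed (huffman code plus
   magnitude bits, at most FAST_BITS) is in bits 0..3. a zero entry means there is
   no fast path. pack_fast_ac and the unpack_* helpers are hypothetical names.
   unpacking the value assumes that >> on a negative number is an arithmetic
   shift, which is implementation-defined in C but is what common compilers do;
   stbi__jpeg_decode_block's r >> 8 relies on the same behavior.

```c
// hypothetical: value in -128..127, run in 0..15, nbits in 1..15
static short pack_fast_ac(int value, int run, int nbits)
{
   return (short) ((value * 256) + (run * 16) + nbits);
}

static int unpack_value(short r) { return r >> 8; }         // arithmetic shift assumed
static int unpack_run(short r)   { return (r >> 4) & 15; }
static int unpack_nbits(short r) { return r & 15; }
```
*/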
static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
{
   do {
      // unsigned, so that b << 24 below can't overflow a signed int
      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
      if (b == 0xff) {
         // 0xff 0x00 is a stuffed 0xff data byte; 0xff followed by anything else is a marker
         int c = stbi__get8(j->s);
         if (c != 0) {
            j->marker = (unsigned char) c;
            j->nomore = 1;
            return;
         }
      }
      j->code_buffer |= b << (24 - j->code_bits);
      j->code_bits += 8;
   } while (j->code_bits <= 24);
}

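/* example (a sketch, not part of stb_image): inside entropy-coded data, a literal
   0xff byte is always followed by a stuffed 0x00. so 0xff 0x00 decodes to a
   single 0xff, while 0xff followed by any other byte is a marker that ends the
   data. stbi__grow_buffer_unsafe records such a marker in j->marker and feeds
   zeros from then on. unstuff below is a hypothetical helper that does the same
   byte-level work on a plain array.

```c
// hypothetical: copy entropy-coded bytes from src to dst, dropping stuffed 0x00s.
// stops at the first marker and stores its code in *marker (0 if none).
// returns the number of data bytes written to dst
static int unstuff(const unsigned char *src, int len, unsigned char *dst, int *marker)
{
   int i = 0, n = 0;
   *marker = 0;
   while (i < len) {
      unsigned char b = src[i++];
      if (b == 0xff) {
         if (i >= len) break;        // truncated input
         if (src[i] != 0) {          // a real marker, e.g. 0xd9 = EOI
            *marker = src[i];
            break;
         }
         ++i;                        // skip the stuffed zero
      }
      dst[n++] = b;
   }
   return n;
}
```
*/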
// (1 << n) - 1
static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};

// decode a jpeg huffman value from the bitstream
stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
{
   unsigned int temp;
   int c,k;

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   // look at the top FAST_BITS and determine what symbol ID it is,
   // if the code is at most FAST_BITS long
   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
   k = h->fast[c];
   if (k < 255) {
      int s = h->size[k];
      if (s > j->code_bits)
         return -1;
      j->code_buffer <<= s;
      j->code_bits -= s;
      return h->values[k];
   }

   // naive test is to shift the code_buffer down so k bits are
   // valid, then test against maxcode. To speed this up, we've
   // preshifted maxcode left so that it has (16-k) 0s at the
   // end; in other words, regardless of the number of bits, it
   // is compared against the same top 16 bits of code_buffer (temp);
   // that way we don't need to shift inside the loop.
   temp = j->code_buffer >> 16;
   for (k=FAST_BITS+1 ; ; ++k)
      if (temp < h->maxcode[k])
         break;
   if (k == 17) {
      // error! code not found
      j->code_bits -= 16;
      return -1;
   }

   if (k > j->code_bits)
      return -1;

   // convert the huffman code to the symbol id
   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);

   // convert the id to a symbol
   j->code_bits -= k;
   j->code_buffer <<= k;
   return h->values[c];
}

// bias[n] = (-1<<n) + 1
static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};

// combined JPEG 'receive' and JPEG 'extend', since baseline
// always extends everything it receives.
stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
{
   unsigned int k;
   int sgn;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);

   sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k + (stbi__jbias[n] & ~sgn);
}

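/* example (a sketch, not part of stb_image): in JPEG, a coefficient in magnitude
   category n is sent as n raw bits. if the leading bit is 1 the value is positive
   and stored as-is; if it is 0 the value is negative and equals v - (2^n - 1).
   stbi__extend_receive does this without a branch. sgn is all ones when the top
   buffer bit is 1, so stbi__jbias[n], which equals -(2^n - 1), is added only when
   the leading bit is 0. jpeg_extend below is a hypothetical, branching version of
   the same mapping.

```c
// hypothetical: map n raw bits 'v' (0 <= v < 2^n, 1 <= n <= 15) to a signed coefficient
static int jpeg_extend(int v, int n)
{
   if (v < (1 << (n - 1)))      // leading bit is 0: negative value
      v -= (1 << n) - 1;
   return v;
}
```

   so the category-3 bit patterns 000..011 stand for -7..-4, and 100..111 stand for 4..7.
*/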
// get some unsigned bits
stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
{
   unsigned int k;
   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
   k = stbi_lrot(j->code_buffer, n);
   j->code_buffer = k & ~stbi__bmask[n];
   k &= stbi__bmask[n];
   j->code_bits -= n;
   return k;
}

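/* example (a sketch, not part of stb_image): the bit buffer keeps its valid bits
   left-aligned (MSB first), so rotating left by n brings the next n bits down to
   the bottom. masking with stbi__bmask[n] extracts them, and masking them off
   leaves the remaining bits left-aligned for the next read. rotl32 stands in for
   stbi_lrot, which is assumed here to be a 32-bit rotate-left (that is how it is
   used above). take_bits is a hypothetical name.

```c
#include <stdint.h>

// hypothetical stand-in for stbi_lrot; valid for 1 <= n <= 31
static uint32_t rotl32(uint32_t x, int n)
{
   return (x << n) | (x >> (32 - n));
}

// hypothetical: take the top n bits (1 <= n <= 16) of a left-aligned bit buffer
static unsigned take_bits(uint32_t *buffer, int *nbits, int n)
{
   uint32_t mask = ((uint32_t) 1 << n) - 1;   // same values as stbi__bmask[n]
   uint32_t k = rotl32(*buffer, n);
   *buffer = k & ~mask;                        // remaining bits, still left-aligned
   *nbits -= n;
   return (unsigned) (k & mask);
}
```
*/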
stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
{
   unsigned int k;
   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
   k = j->code_buffer;
   j->code_buffer <<= 1;
   --j->code_bits;
   return k & 0x80000000;
}

// given a value that's at position X in the zigzag stream,
// where does it appear in the 8x8 matrix coded as row-major?
static stbi_uc stbi__jpeg_dezigzag[64+15] =
{
    0,  1,  8, 16,  9,  2,  3, 10,
   17, 24, 32, 25, 18, 11,  4,  5,
   12, 19, 26, 33, 40, 48, 41, 34,
   27, 20, 13,  6,  7, 14, 21, 28,
   35, 42, 49, 56, 57, 50, 43, 36,
   29, 22, 15, 23, 30, 37, 44, 51,
   58, 59, 52, 45, 38, 31, 39, 46,
   53, 60, 61, 54, 47, 55, 62, 63,
   // let corrupt input sample past end
   63, 63, 63, 63, 63, 63, 63, 63,
   63, 63, 63, 63, 63, 63, 63
};

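/* example (a sketch, not part of stb_image): the first 64 entries of
   stbi__jpeg_dezigzag are the standard JPEG zigzag scan. the 8x8 block is walked
   along its anti-diagonals (row + col constant) in alternating directions, and
   each step records the row-major index row*8 + col. build_dezigzag below is a
   hypothetical helper that rebuilds the table this way. the 15 extra 63s in the
   real table exist only so that corrupt input running past coefficient 63 still
   reads a valid index.

```c
// hypothetical: zz[i] = row-major position of the i-th coefficient in zigzag order
static void build_dezigzag(unsigned char zz[64])
{
   int s, i = 0;
   for (s = 0; s < 15; ++s) {                 // s = row + col of the anti-diagonal
      int lo = s < 8 ? 0 : s - 7;             // smallest valid row on this diagonal
      int hi = s < 8 ? s : 7;                 // largest valid row
      int r;
      if (s & 1)                              // odd diagonals run top-right to bottom-left
         for (r = lo; r <= hi; ++r) zz[i++] = (unsigned char) (r * 8 + (s - r));
      else                                    // even diagonals run bottom-left to top-right
         for (r = hi; r >= lo; --r) zz[i++] = (unsigned char) (r * 8 + (s - r));
   }
}
```
*/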
1600 // decode one 64-entry block--
stbi__jpeg_decode_block(stbi__jpeg * j,short data[64],stbi__huffman * hdc,stbi__huffman * hac,stbi__int16 * fac,int b,stbi_uc * dequant)1601 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1602 {
1603    int diff,dc,k;
1604    int t;
1605 
1606    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1607    t = stbi__jpeg_huff_decode(j, hdc);
1608    if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1609 
1610    // 0 all the ac values now so we can do it 32-bits at a time
1611    memset(data,0,64*sizeof(data[0]));
1612 
1613    diff = t ? stbi__extend_receive(j, t) : 0;
1614    dc = j->img_comp[b].dc_pred + diff;
1615    j->img_comp[b].dc_pred = dc;
1616    data[0] = (short) (dc * dequant[0]);
1617 
1618    // decode AC components, see JPEG spec
1619    k = 1;
1620    do {
1621       unsigned int zig;
1622       int c,r,s;
1623       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1624       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1625       r = fac[c];
1626       if (r) { // fast-AC path
1627          k += (r >> 4) & 15; // run
1628          s = r & 15; // combined length
1629          j->code_buffer <<= s;
1630          j->code_bits -= s;
1631          // decode into unzigzag'd location
1632          zig = stbi__jpeg_dezigzag[k++];
1633          data[zig] = (short) ((r >> 8) * dequant[zig]);
1634       } else {
1635          int rs = stbi__jpeg_huff_decode(j, hac);
1636          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1637          s = rs & 15;
1638          r = rs >> 4;
1639          if (s == 0) {
1640             if (rs != 0xf0) break; // end block
1641             k += 16;
1642          } else {
1643             k += r;
1644             // decode into unzigzag'd location
1645             zig = stbi__jpeg_dezigzag[k++];
1646             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1647          }
1648       }
1649    } while (k < 64);
1650    return 1;
1651 }
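/* Sketch of the DC step above, assuming stbi__extend_receive (defined
   elsewhere in this file) reads t bits and applies the standard EXTEND
   rule of ITU-T T.81, figure F.12: a t-bit value whose top bit is 0
   encodes a negative number. jpeg_extend and decode_dc are hypothetical
   names, and bit reading is left out so the arithmetic can be checked on
   its own.

```c
#include <assert.h>

// EXTEND: map a t-bit value v (0 <= v < 2^t) to a signed difference.
static int jpeg_extend(int v, int t)
{
   assert(t >= 1 && t <= 15);
   if (v < (1 << (t - 1)))       // top bit clear: negative value
      return v - (1 << t) + 1;
   return v;
}

// Hypothetical mirror of the DC handling in stbi__jpeg_decode_block:
// add the decoded difference to the per-component predictor, then
// dequantize. t == 0 means "no difference bits".
static int decode_dc(int *dc_pred, int t, int bits, int dequant0)
{
   int diff = t ? jpeg_extend(bits, t) : 0;
   *dc_pred += diff;
   return *dc_pred * dequant0;
}
```
*/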

static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
{
   int diff,dc;
   int t;
   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);

   if (j->succ_high == 0) {
      // first scan for DC coefficient, must be first
      memset(data,0,64*sizeof(data[0])); // zero all the AC values now
      t = stbi__jpeg_huff_decode(j, hdc);
      if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
      diff = t ? stbi__extend_receive(j, t) : 0;

      dc = j->img_comp[b].dc_pred + diff;
      j->img_comp[b].dc_pred = dc;
      data[0] = (short) (dc << j->succ_low);
   } else {
      // refinement scan for DC coefficient
      if (stbi__jpeg_get_bit(j))
         data[0] += (short) (1 << j->succ_low);
   }
   return 1;
}
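/* Sketch of DC successive approximation as handled by
   stbi__jpeg_decode_block_prog_dc: the first scan stores the coefficient
   already shifted left by succ_low, and each refinement scan adds one
   lower bit at the new succ_low position. The helpers are hypothetical;
   they multiply instead of shifting so negative values stay well defined
   in C. The example rebuilds -5 from a first scan at Al = 2 (which sends
   -5 >> 2 = -2) followed by two refinement bits.

```c
#include <assert.h>

// First scan: the value arrives with its low 'al' bits dropped and is
// stored pre-shifted, like "dc << j->succ_low" above.
static short dc_first_scan(int dc_high, int al)
{
   assert(al >= 0 && al < 15);
   return (short) (dc_high * (1 << al));
}

// Refinement scan: one more bit, at position 'al', is added in.
static short dc_refine(short coef, int bit, int al)
{
   assert(al >= 0 && al < 15);
   if (bit) coef = (short) (coef + (1 << al));
   return coef;
}
```
*/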

// @OPTIMIZE: store non-zigzagged during the decode passes,
// and only de-zigzag when dequantizing
static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
{
   int k;
   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");

   if (j->succ_high == 0) {
      int shift = j->succ_low;

      if (j->eob_run) {
         --j->eob_run;
         return 1;
      }

      k = j->spec_start;
      do {
         unsigned int zig;
         int c,r,s;
         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
         r = fac[c];
         if (r) { // fast-AC path
            k += (r >> 4) & 15; // run
            s = r & 15; // combined length
            j->code_buffer <<= s;
            j->code_bits -= s;
            zig = stbi__jpeg_dezigzag[k++];
            data[zig] = (short) ((r >> 8) << shift);
         } else {
            int rs = stbi__jpeg_huff_decode(j, hac);
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r);
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  --j->eob_run;
                  break;
               }
               k += 16;
            } else {
               k += r;
               zig = stbi__jpeg_dezigzag[k++];
               data[zig] = (short) (stbi__extend_receive(j,s) << shift);
            }
         }
      } while (k <= j->spec_end);
   } else {
      // refinement scan for these AC coefficients

      short bit = (short) (1 << j->succ_low);

      if (j->eob_run) {
         --j->eob_run;
         for (k = j->spec_start; k <= j->spec_end; ++k) {
            short *p = &data[stbi__jpeg_dezigzag[k]];
            if (*p != 0)
               if (stbi__jpeg_get_bit(j))
                  if ((*p & bit)==0) {
                     if (*p > 0)
                        *p += bit;
                     else
                        *p -= bit;
                  }
         }
      } else {
         k = j->spec_start;
         do {
            int r,s;
            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
            s = rs & 15;
            r = rs >> 4;
            if (s == 0) {
               if (r < 15) {
                  j->eob_run = (1 << r) - 1;
                  if (r)
                     j->eob_run += stbi__jpeg_get_bits(j, r);
                  r = 64; // force end of block
               } else
                  r = 16; // r=15 is the code for 16 0s
            } else {
               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
               // sign bit
               if (stbi__jpeg_get_bit(j))
                  s = bit;
               else
                  s = -bit;
            }

            // advance by r
            while (k <= j->spec_end) {
               short *p = &data[stbi__jpeg_dezigzag[k]];
               if (*p != 0) {
                  if (stbi__jpeg_get_bit(j))
                     if ((*p & bit)==0) {
                        if (*p > 0)
                           *p += bit;
                        else
                           *p -= bit;
                     }
                  ++k;
               } else {
                  if (r == 0) {
                     if (s)
                        data[stbi__jpeg_dezigzag[k++]] = (short) s;
                     break;
                  }
                  --r;
                  ++k;
               }
            }
         } while (k <= j->spec_end);
      }
   }
   return 1;
}
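/* Sketch of two rules from the progressive AC decoder above. An EOB-run
   symbol (size 0, run nibble r < 15) is followed by r extra bits, and the
   run covers (1 << r) + extra blocks including the current one, which is
   why the code stores one less. During refinement, a correction bit of 1
   on an already-nonzero coefficient grows its magnitude by 'bit', away
   from zero. eob_run_length and refine_nonzero are hypothetical names.

```c
#include <assert.h>

// Hypothetical helper: blocks covered by an EOB run, counting the current one.
static int eob_run_length(int r, int extra)
{
   assert(r >= 0 && r < 15);
   assert(extra >= 0 && extra < (1 << r));
   return (1 << r) + extra;
}

// Hypothetical helper mirroring the refinement of a nonzero coefficient.
static short refine_nonzero(short coef, int correction, short bit)
{
   if (coef != 0 && correction && (coef & bit) == 0)
      coef = (short) (coef > 0 ? coef + bit : coef - bit);
   return coef;
}
```
*/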

// clamp an int to the range 0..255 and convert to stbi_uc
stbi_inline static stbi_uc stbi__clamp(int x)
{
   // trick to use a single test to catch both cases
   if ((unsigned int) x > 255) {
      if (x < 0) return 0;
      if (x > 255) return 255;
   }
   return (stbi_uc) x;
}
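/* Sketch of the single-test clamp: casting a negative int to unsigned
   yields a value above 255, so one comparison routes both out-of-range
   directions to the slow path and in-range values skip it. clamp_u8 is a
   standalone copy for illustration, not an stb_image function.

```c
#include <assert.h>
#include <limits.h>

static unsigned char clamp_u8(int x)
{
   if ((unsigned int) x > 255u)   // true for x < 0 and for x > 255
      return x < 0 ? 0 : 255;
   return (unsigned char) x;
}
```
*/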

#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))  // real constant -> 12-bit fixed point
#define stbi__fsh(x)  ((x) << 12)                     // integer -> 12-bit fixed point
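/* Sketch of what stbi__f2f and stbi__fsh do: the first turns a real IDCT
   constant into 12-bit fixed point (scaled by 4096), the second scales an
   integer by the same factor, so both kinds of term can be summed in one
   int. The copies f2f/fsh and the helper fixed_mul are hypothetical. The
   + 0.5 rounds positive constants to nearest, but the (int) cast rounds
   negative ones toward zero: -1.847759065 becomes -7567, not -7568.

```c
#include <assert.h>

// Copies of stbi__f2f / stbi__fsh: 12-bit fixed point, scale 4096.
#define f2f(x)  ((int) (((x) * 4096 + 0.5)))
#define fsh(x)  ((x) << 12)

// Hypothetical helper: sample * c in fixed point, rounded back to an int.
static int fixed_mul(int sample, float c)
{
   assert(sample >= 0 && c >= 0.0f);   // keeps the final >> well defined
   return (sample * f2f(c) + 2048) >> 12;
}
```
*/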

// derived from jidctint -- DCT_ISLOW
#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
   p2 = s2;                                    \
   p3 = s6;                                    \
   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
   p2 = s0;                                    \
   p3 = s4;                                    \
   t0 = stbi__fsh(p2+p3);                      \
   t1 = stbi__fsh(p2-p3);                      \
   x0 = t0+t3;                                 \
   x3 = t0-t3;                                 \
   x1 = t1+t2;                                 \
   x2 = t1-t2;                                 \
   t0 = s7;                                    \
   t1 = s5;                                    \
   t2 = s3;                                    \
   t3 = s1;                                    \
   p3 = t0+t2;                                 \
   p4 = t1+t3;                                 \
   p1 = t0+t3;                                 \
   p2 = t1+t2;                                 \
   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
   t0 = t0*stbi__f2f( 0.298631336f);           \
   t1 = t1*stbi__f2f( 2.053119869f);           \
   t2 = t2*stbi__f2f( 3.072711026f);           \
   t3 = t3*stbi__f2f( 1.501321110f);           \
   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
   p3 = p3*stbi__f2f(-1.961570560f);           \
   p4 = p4*stbi__f2f(-0.390180644f);           \
   t3 += p1+p4;                                \
   t2 += p2+p3;                                \
   t1 += p2+p4;                                \
   t0 += p1+p3;

static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
{
   int i,val[64],*v=val;
   stbi_uc *o;
   short *d = data;

   // columns
   for (i=0; i < 8; ++i,++d, ++v) {
      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
           && d[40]==0 && d[48]==0 && d[56]==0) {
         //    no shortcut                 0     seconds
         //    (1|2|3|4|5|6|7)==0          0     seconds
         //    all separate               -0.047 seconds
         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
         int dcterm = d[0] << 2;
         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
      } else {
         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
         // constants scaled things up by 1<<12; let's bring them back
         // down, but keep 2 extra bits of precision
         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
         v[ 0] = (x0+t3) >> 10;
         v[56] = (x0-t3) >> 10;
         v[ 8] = (x1+t2) >> 10;
         v[48] = (x1-t2) >> 10;
         v[16] = (x2+t1) >> 10;
         v[40] = (x2-t1) >> 10;
         v[24] = (x3+t0) >> 10;
         v[32] = (x3-t0) >> 10;
      }
   }

   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
      // no fast case since the first 1D IDCT spread components out
      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
      // constants scaled things up by 1<<12, plus we had 1<<2 from first
      // loop, plus horizontal and vertical each scale by sqrt(8) so together
      // we've got an extra 1<<3, so 1<<17 total we need to remove.
      // so we want to round that, which means adding 0.5 * 1<<17,
      // aka 65536. Also, we'll end up with -128 to 127 that we want
      // to encode as 0..255 by adding 128, so we'll add that before the shift
      x0 += 65536 + (128<<17);
      x1 += 65536 + (128<<17);
      x2 += 65536 + (128<<17);
      x3 += 65536 + (128<<17);
      // tried computing the shifts into temps, or'ing the temps to see
      // if any were out of range, but that was slower
      o[0] = stbi__clamp((x0+t3) >> 17);
      o[7] = stbi__clamp((x0-t3) >> 17);
      o[1] = stbi__clamp((x1+t2) >> 17);
      o[6] = stbi__clamp((x1-t2) >> 17);
      o[2] = stbi__clamp((x2+t1) >> 17);
      o[5] = stbi__clamp((x2-t1) >> 17);
      o[3] = stbi__clamp((x3+t0) >> 17);
      o[4] = stbi__clamp((x3-t0) >> 17);
   }
}
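/* Sketch of a floating-point reference for checking stbi__idct_block: it
   evaluates the 2-D inverse DCT directly from its definition in ITU-T T.81
   (A.3.3), then applies the +128 level shift, rounding and clamping.
   idct_reference and cos16 are hypothetical; cos16 uses a small table of
   cos(k*pi/16) so the sketch needs no libm. For a DC-only block with
   value d, both this and the integer code above give 128 + d/8 (rounded)
   at every pixel; the example checks d = 80 and d = -1024.

```c
#include <assert.h>

// cos(m * pi / 16) for any m >= 0, from a table
static double cos16(int m)
{
   static const double c[8] = {
      1.0,                 0.98078528040323043, 0.92387953251128674,
      0.83146961230254524, 0.70710678118654752, 0.55557023301960218,
      0.38268343236508977, 0.19509032201612826
   };
   m %= 32;                        // period is 2 pi
   if (m > 16) m = 32 - m;         // cos(2 pi - a) == cos(a)
   if (m > 8) return -c[16 - m];   // cos(pi - a) == -cos(a)
   return m == 8 ? 0.0 : c[m];
}

// in[v*8+u] is the coefficient for vertical frequency v and horizontal
// frequency u (row-major, as in stb_image); out is row-major pixels.
static void idct_reference(unsigned char out[64], const short in[64])
{
   int x, y, u, v;
   for (y = 0; y < 8; ++y) {
      for (x = 0; x < 8; ++x) {
         double s = 0.0;
         for (v = 0; v < 8; ++v) {
            for (u = 0; u < 8; ++u) {
               double cu = u ? 1.0 : cos16(4);   // C(0) = 1/sqrt(2)
               double cv = v ? 1.0 : cos16(4);
               s += cu * cv * in[v * 8 + u]
                    * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            }
         }
         s = s / 4.0 + 128.0;                  // 1/4 scale, then level shift
         if (s < 0.0) s = 0.0;
         if (s > 255.0) s = 255.0;
         out[y * 8 + x] = (unsigned char) (s + 0.5);   // round to nearest
      }
   }
}
```

   The quadruple loop is O(n^4) per block, so this is only meant for tests.
*/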

#ifdef STBI_SSE2
// sse2 integer IDCT. not the fastest possible implementation but it
// produces bit-identical results to the generic C version so it's
// fully "transparent".
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   // This is constructed to match our regular (generic) integer IDCT exactly.
   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
   __m128i tmp;

   // dot product constant: even elems=x, odd elems=y
   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))

   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
   // out(1) = c1[even]*x + c1[odd]*y
   #define dct_rot(out0,out1, x,y,c0,c1) \
      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)

   // out = in << 12  (in 16-bit, out 32-bit)
   #define dct_widen(out, in) \
      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)

   // wide add
   #define dct_wadd(out, a, b) \
      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)

   // wide sub
   #define dct_wsub(out, a, b) \
      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)

   // butterfly a/b, add bias, then shift by "s" and pack
   #define dct_bfly32o(out0, out1, a,b,bias,s) \
      { \
         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
         dct_wadd(sum, abiased, b); \
         dct_wsub(dif, abiased, b); \
         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
      }

   // 8-bit interleave step (for transposes)
   #define dct_interleave8(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi8(a, b); \
      b = _mm_unpackhi_epi8(tmp, b)

   // 16-bit interleave step (for transposes)
   #define dct_interleave16(a, b) \
      tmp = a; \
      a = _mm_unpacklo_epi16(a, b); \
      b = _mm_unpackhi_epi16(tmp, b)

   #define dct_pass(bias,shift) \
      { \
         /* even part */ \
         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
         __m128i sum04 = _mm_add_epi16(row0, row4); \
         __m128i dif04 = _mm_sub_epi16(row0, row4); \
         dct_widen(t0e, sum04); \
         dct_widen(t1e, dif04); \
         dct_wadd(x0, t0e, t3e); \
         dct_wsub(x3, t0e, t3e); \
         dct_wadd(x1, t1e, t2e); \
         dct_wsub(x2, t1e, t2e); \
         /* odd part */ \
         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
         __m128i sum17 = _mm_add_epi16(row1, row7); \
         __m128i sum35 = _mm_add_epi16(row3, row5); \
         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
         dct_wadd(x4, y0o, y4o); \
         dct_wadd(x5, y1o, y5o); \
         dct_wadd(x6, y2o, y5o); \
         dct_wadd(x7, y3o, y4o); \
         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
      }

   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));

   // rounding biases in column/row passes, see stbi__idct_block for explanation.
   __m128i bias_0 = _mm_set1_epi32(512);
   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));

   // load
   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
   row7 = _mm_load_si128((const __m128i *) (data + 7*8));

   // column pass
   dct_pass(bias_0, 10);

   {
      // 16bit 8x8 transpose pass 1
      dct_interleave16(row0, row4);
      dct_interleave16(row1, row5);
      dct_interleave16(row2, row6);
      dct_interleave16(row3, row7);

      // transpose pass 2
      dct_interleave16(row0, row2);
      dct_interleave16(row1, row3);
      dct_interleave16(row4, row6);
      dct_interleave16(row5, row7);

      // transpose pass 3
      dct_interleave16(row0, row1);
      dct_interleave16(row2, row3);
      dct_interleave16(row4, row5);
      dct_interleave16(row6, row7);
   }

   // row pass
   dct_pass(bias_1, 17);

   {
      // pack
      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
      __m128i p1 = _mm_packus_epi16(row2, row3);
      __m128i p2 = _mm_packus_epi16(row4, row5);
      __m128i p3 = _mm_packus_epi16(row6, row7);

      // 8bit 8x8 transpose pass 1
      dct_interleave8(p0, p2); // a0e0a1e1...
      dct_interleave8(p1, p3); // c0g0c1g1...

      // transpose pass 2
      dct_interleave8(p0, p1); // a0c0e0g0...
      dct_interleave8(p2, p3); // b0d0f0h0...

      // transpose pass 3
      dct_interleave8(p0, p2); // a0b0c0d0...
      dct_interleave8(p1, p3); // a4b4c4d4...

      // store
      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
   }

#undef dct_const
#undef dct_rot
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_interleave8
#undef dct_interleave16
#undef dct_pass
}

#endif // STBI_SSE2
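/* Sketch: a scalar model of the 16-bit transpose in the SSE2
   stbi__idct_simd. interleave16 mimics what dct_interleave16 does with
   _mm_unpacklo_epi16/_mm_unpackhi_epi16, and applying the same three
   passes with the same register pairings turns rows into columns. The
   vec8 type and function names are hypothetical; only the lane shuffles
   are modeled, not the SIMD arithmetic.

```c
#include <assert.h>

typedef struct { short v[8]; } vec8;   // one 128-bit register of 8 shorts

// a = unpacklo(a, b), b = unpackhi(old a, b)
static void interleave16(vec8 *a, vec8 *b)
{
   vec8 lo, hi;
   int i;
   for (i = 0; i < 4; ++i) {
      lo.v[2 * i]     = a->v[i];       // low halves, alternating a/b
      lo.v[2 * i + 1] = b->v[i];
      hi.v[2 * i]     = a->v[i + 4];   // high halves, alternating a/b
      hi.v[2 * i + 1] = b->v[i + 4];
   }
   *a = lo;
   *b = hi;
}

static void transpose8x8(vec8 r[8])
{
   // pass 1
   interleave16(&r[0], &r[4]); interleave16(&r[1], &r[5]);
   interleave16(&r[2], &r[6]); interleave16(&r[3], &r[7]);
   // pass 2
   interleave16(&r[0], &r[2]); interleave16(&r[1], &r[3]);
   interleave16(&r[4], &r[6]); interleave16(&r[5], &r[7]);
   // pass 3
   interleave16(&r[0], &r[1]); interleave16(&r[2], &r[3]);
   interleave16(&r[4], &r[5]); interleave16(&r[6], &r[7]);
}
```
*/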

#ifdef STBI_NEON

// NEON integer IDCT. should produce bit-identical
// results to the generic C version.
static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
{
   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;

   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));

#define dct_long_mul(out, inq, coeff) \
   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)

#define dct_long_mac(out, acc, inq, coeff) \
   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)

#define dct_widen(out, inq) \
   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)

// wide add
#define dct_wadd(out, a, b) \
   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)

// wide sub
#define dct_wsub(out, a, b) \
   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)

// butterfly a/b, then shift using "shiftop" by "s" and pack
#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
   { \
      dct_wadd(sum, a, b); \
      dct_wsub(dif, a, b); \
      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
   }

#define dct_pass(shiftop, shift) \
   { \
      /* even part */ \
      int16x8_t sum26 = vaddq_s16(row2, row6); \
      dct_long_mul(p1e, sum26, rot0_0); \
      dct_long_mac(t2e, p1e, row6, rot0_1); \
      dct_long_mac(t3e, p1e, row2, rot0_2); \
      int16x8_t sum04 = vaddq_s16(row0, row4); \
      int16x8_t dif04 = vsubq_s16(row0, row4); \
      dct_widen(t0e, sum04); \
      dct_widen(t1e, dif04); \
      dct_wadd(x0, t0e, t3e); \
      dct_wsub(x3, t0e, t3e); \
      dct_wadd(x1, t1e, t2e); \
      dct_wsub(x2, t1e, t2e); \
      /* odd part */ \
      int16x8_t sum15 = vaddq_s16(row1, row5); \
      int16x8_t sum17 = vaddq_s16(row1, row7); \
      int16x8_t sum35 = vaddq_s16(row3, row5); \
      int16x8_t sum37 = vaddq_s16(row3, row7); \
      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
      dct_long_mul(p5o, sumodd, rot1_0); \
      dct_long_mac(p1o, p5o, sum17, rot1_1); \
      dct_long_mac(p2o, p5o, sum35, rot1_2); \
      dct_long_mul(p3o, sum37, rot2_0); \
      dct_long_mul(p4o, sum15, rot2_1); \
      dct_wadd(sump13o, p1o, p3o); \
      dct_wadd(sump24o, p2o, p4o); \
      dct_wadd(sump23o, p2o, p3o); \
      dct_wadd(sump14o, p1o, p4o); \
      dct_long_mac(x4, sump13o, row7, rot3_0); \
      dct_long_mac(x5, sump24o, row5, rot3_1); \
      dct_long_mac(x6, sump23o, row3, rot3_2); \
      dct_long_mac(x7, sump14o, row1, rot3_3); \
      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
   }

   // load
   row0 = vld1q_s16(data + 0*8);
   row1 = vld1q_s16(data + 1*8);
   row2 = vld1q_s16(data + 2*8);
   row3 = vld1q_s16(data + 3*8);
   row4 = vld1q_s16(data + 4*8);
   row5 = vld1q_s16(data + 5*8);
   row6 = vld1q_s16(data + 6*8);
   row7 = vld1q_s16(data + 7*8);

   // add DC bias: 1024 in the DC term becomes the +128 level shift (1024/8)
   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));

   // column pass
   dct_pass(vrshrn_n_s32, 10);

   // 16bit 8x8 transpose
   {
// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
// whether compilers actually get this is another story, sadly.
#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }

      // pass 1
      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
      dct_trn16(row2, row3);
      dct_trn16(row4, row5);
      dct_trn16(row6, row7);

      // pass 2
      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
      dct_trn32(row1, row3);
      dct_trn32(row4, row6);
      dct_trn32(row5, row7);

      // pass 3
      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
      dct_trn64(row1, row5);
      dct_trn64(row2, row6);
      dct_trn64(row3, row7);

#undef dct_trn16
#undef dct_trn32
#undef dct_trn64
   }

   // row pass
   // vrshrn_n_s32 only supports shifts up to 16, but we need 17,
   // so do a non-rounding shift by 16 first, then follow up
   // with a rounding shift by 1.
   dct_pass(vshrn_n_s32, 16);

   {
      // pack and round
      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);

      // again, these can translate into one instruction, but often don't.
#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }

      // sadly can't use interleaved stores here since we only write
      // 8 bytes to each scan line!

      // 8x8 8-bit transpose pass 1
      dct_trn8_8(p0, p1);
      dct_trn8_8(p2, p3);
      dct_trn8_8(p4, p5);
      dct_trn8_8(p6, p7);

      // pass 2
      dct_trn8_16(p0, p2);
      dct_trn8_16(p1, p3);
      dct_trn8_16(p4, p6);
      dct_trn8_16(p5, p7);

      // pass 3
      dct_trn8_32(p0, p4);
      dct_trn8_32(p1, p5);
      dct_trn8_32(p2, p6);
      dct_trn8_32(p3, p7);

      // store
      vst1_u8(out, p0); out += out_stride;
      vst1_u8(out, p1); out += out_stride;
      vst1_u8(out, p2); out += out_stride;
      vst1_u8(out, p3); out += out_stride;
      vst1_u8(out, p4); out += out_stride;
      vst1_u8(out, p5); out += out_stride;
      vst1_u8(out, p6); out += out_stride;
      vst1_u8(out, p7);

#undef dct_trn8_8
#undef dct_trn8_16
#undef dct_trn8_32
   }

#undef dct_long_mul
#undef dct_long_mac
#undef dct_widen
#undef dct_wadd
#undef dct_wsub
#undef dct_bfly32o
#undef dct_pass
}

#endif // STBI_NEON
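/* Sketch checking the NEON row-pass trick: a truncating shift by 16
   followed by a rounding shift by 1 equals a single rounding shift by 17,
   because floor((floor(x / 2^16) + 1) / 2) == floor((x + 2^16) / 2^17).
   The scalar functions are hypothetical models of vshrn_n_s32 and
   vqrshrun_n_s16 that ignore the intermediate 16-bit narrowing. The +128
   offset is absent on purpose, since the NEON path folds it into the DC
   term (1024 / 8 = 128) before the column pass. Like stb_image, this
   assumes >> on negative ints is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

// Two-step version: truncate by 16, then round by 1 and saturate to 0..255.
static int two_step_shift(int32_t x)
{
   int32_t y = x >> 16;           // like vshrn_n_s32(x, 16)
   int32_t z = (y + 1) >> 1;      // like the rounding in vqrshrun_n_s16(y, 1)
   return z < 0 ? 0 : z > 255 ? 255 : z;
}

// One-step version, as in the generic C IDCT: add half an LSB, shift by 17.
static int one_step_shift(int32_t x)
{
   int32_t z = (x + 65536) >> 17;
   return z < 0 ? 0 : z > 255 ? 255 : z;
}
```
*/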

#define STBI__MARKER_none  0xff
// if there's a pending marker from the entropy stream, return that
// otherwise, fetch from the stream and get a marker. if there's no
// marker, return 0xff, which is never a valid marker value
static stbi_uc stbi__get_marker(stbi__jpeg *j)
{
   stbi_uc x;
   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
   x = stbi__get8(j->s);
   if (x != 0xff) return STBI__MARKER_none;
   while (x == 0xff)
      x = stbi__get8(j->s);
   return x;
}
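/* Sketch of the marker scan in stbi__get_marker, on a plain byte array: a
   marker is 0xFF followed by a code byte, and extra 0xFF fill bytes may
   come first. next_marker and MARKER_NONE are hypothetical names, the
   pending-marker shortcut is left out, and reading past the end yields 0
   as a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER_NONE 0xff   // same sentinel idea as STBI__MARKER_none

static unsigned char next_marker(const unsigned char *buf, size_t len, size_t *pos)
{
   unsigned char x = *pos < len ? buf[(*pos)++] : 0;
   if (x != 0xff) return MARKER_NONE;   // not at a marker
   while (x == 0xff)                    // skip fill bytes
      x = *pos < len ? buf[(*pos)++] : 0;
   return x;
}
```
*/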

// in each scan, we'll have scan_n components, and the order
// of the components is specified by order[]
#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)

// after a restart interval, reset the entropy decoder and
// the dc prediction
static void stbi__jpeg_reset(stbi__jpeg *j)
{
   j->code_bits = 0;
   j->code_buffer = 0;
   j->nomore = 0;
   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
   j->marker = STBI__MARKER_none;
   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
   j->eob_run = 0;
   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
   // since we don't even allow 1<<30 pixels
}
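/* Sketch of the restart-marker bookkeeping: STBI__RESTART accepts the
   codes 0xD0..0xD7 (RST0..RST7), which an encoder emits in a repeating
   cycle, one between each pair of consecutive restart intervals. As the
   code above shows, stb_image checks only the range, not the cycle order.
   All helper names here are hypothetical.

```c
#include <assert.h>

// Same range test as STBI__RESTART.
static int is_restart(int m) { return m >= 0xd0 && m <= 0xd7; }

// Hypothetical helper: code of the n-th restart marker (n counted from 0).
static int nth_restart_marker(int n)
{
   assert(n >= 0);
   return 0xd0 + (n & 7);
}

// Hypothetical helper: RSTn markers in a scan of mcu_count MCUs with the
// given restart interval (0 = none): one after each interval but the last.
static int restart_marker_count(int restart_interval, int mcu_count)
{
   if (restart_interval <= 0 || mcu_count <= 0) return 0;
   return (mcu_count - 1) / restart_interval;
}
```
*/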

static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
{
   stbi__jpeg_reset(z);
   if (!z->progressive) {
      if (z->scan_n == 1) {
         int i,j;
         STBI_SIMD_ALIGN(short, data[64]);
         int n = z->order[0];
         // non-interleaved data, we just need to process one block at a time,
         // in trivial scanline order
         // number of blocks to do just depends on how many actual "pixels" this
         // component has, independent of interleaved MCU blocking and such
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               int ha = z->img_comp[n].ha;
               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
               // every data block is an MCU, so count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  // if it's NOT a restart, then just bail, so we get corrupt data
                  // rather than no data
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      } else { // interleaved
         int i,j,k,x,y;
         STBI_SIMD_ALIGN(short, data[64]);
         for (j=0; j < z->img_mcu_y; ++j) {
            for (i=0; i < z->img_mcu_x; ++i) {
               // scan an interleaved mcu... process scan_n components in order
               for (k=0; k < z->scan_n; ++k) {
                  int n = z->order[k];
                  // scan out an mcu's worth of this component; that's just determined
                  // by the basic H and V specified for the component
                  for (y=0; y < z->img_comp[n].v; ++y) {
                     for (x=0; x < z->img_comp[n].h; ++x) {
                        int x2 = (i*z->img_comp[n].h + x)*8;
                        int y2 = (j*z->img_comp[n].v + y)*8;
                        int ha = z->img_comp[n].ha;
                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
                     }
                  }
               }
               // after all interleaved components, that's an interleaved MCU,
               // so now count down the restart interval
               if (--z->todo <= 0) {
                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
                  if (!STBI__RESTART(z->marker)) return 1;
                  stbi__jpeg_reset(z);
               }
            }
         }
         return 1;
      }
2394    } else {
2395       if (z->scan_n == 1) {
2396          int i,j;
2397          int n = z->order[0];
2398          // non-interleaved data, we just need to process one block at a time,
2399          // in trivial scanline order
2400          // number of blocks to do just depends on how many actual "pixels" this
2401          // component has, independent of interleaved MCU blocking and such
2402          int w = (z->img_comp[n].x+7) >> 3;
2403          int h = (z->img_comp[n].y+7) >> 3;
2404          for (j=0; j < h; ++j) {
2405             for (i=0; i < w; ++i) {
2406                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2407                if (z->spec_start == 0) {
2408                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2409                      return 0;
2410                } else {
2411                   int ha = z->img_comp[n].ha;
2412                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2413                      return 0;
2414                }
2415                // every data block is an MCU, so countdown the restart interval
2416                if (--z->todo <= 0) {
2417                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2418                   if (!STBI__RESTART(z->marker)) return 1;
2419                   stbi__jpeg_reset(z);
2420                }
2421             }
2422          }
2423          return 1;
2424       } else { // interleaved
2425          int i,j,k,x,y;
2426          for (j=0; j < z->img_mcu_y; ++j) {
2427             for (i=0; i < z->img_mcu_x; ++i) {
2428                // scan an interleaved mcu... process scan_n components in order
2429                for (k=0; k < z->scan_n; ++k) {
2430                   int n = z->order[k];
2431                   // scan out an mcu's worth of this component; that's just determined
2432                   // by the basic H and V specified for the component
2433                   for (y=0; y < z->img_comp[n].v; ++y) {
2434                      for (x=0; x < z->img_comp[n].h; ++x) {
2435                         int x2 = (i*z->img_comp[n].h + x);
2436                         int y2 = (j*z->img_comp[n].v + y);
2437                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2438                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2439                            return 0;
2440                      }
2441                   }
2442                }
2443                // after all interleaved components, that's an interleaved MCU,
2444                // so now count down the restart interval
2445                if (--z->todo <= 0) {
2446                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2447                   if (!STBI__RESTART(z->marker)) return 1;
2448                   stbi__jpeg_reset(z);
2449                }
2450             }
2451          }
2452          return 1;
2453       }
2454    }
2455 }
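The two scan layouts in stbi__parse_entropy_coded_data visit different numbers of 8x8 blocks. A single-component scan (`z->scan_n == 1`) covers only ceil(x/8) * ceil(y/8) blocks of that component, while an interleaved scan walks every MCU and therefore covers img_mcu_x*H * img_mcu_y*V blocks, padding included. The sketch below, assuming component sizes computed as in stbi__process_frame_header, just counts both; the `demo_` names are hypothetical and not part of stb_image.

```c
/* Hypothetical per-component description; not stb_image's struct. */
typedef struct {
   int comp_x, comp_y;  /* effective pixels of the component */
   int h, v;            /* sampling factors H and V */
} demo_comp;

/* Blocks visited by a non-interleaved (single-component) scan. */
static int demo_blocks_noninterleaved(demo_comp c)
{
   int w = (c.comp_x + 7) >> 3;   /* ceil(comp_x / 8) */
   int h = (c.comp_y + 7) >> 3;   /* ceil(comp_y / 8) */
   return w * h;
}

/* Blocks visited by an interleaved scan: every MCU contributes H*V blocks. */
static int demo_blocks_interleaved(demo_comp c, int mcu_x, int mcu_y)
{
   return mcu_x * c.h * mcu_y * c.v;
}
```

For a 33x17 image with 2x2 luma sampling (16x16 MCUs, so 3x2 MCUs), luma is 15 blocks in a non-interleaved scan but 24 in an interleaved one; the extra 9 are the padding that the oversized w2/h2 buffers make room for. Chroma (17x9 effective pixels) is 6 blocks either way.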

static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
{
   int i;
   for (i=0; i < 64; ++i)
      data[i] *= dequant[i];
}
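stbi__jpeg_dequantize can multiply element by element only because the DQT case in stbi__process_marker stores each table in natural (row-major) order, writing the i-th byte read to `dequant[t][stbi__jpeg_dezigzag[i]]`. The sketch below regenerates the standard JPEG zigzag order by walking the anti-diagonals of the 8x8 block instead of hardcoding a table; its output should correspond to the first 64 entries of stbi__jpeg_dezigzag, and `demo_build_zigzag` is a hypothetical name.

```c
/* Fill zz[k] with the natural (row-major) index of the k-th coefficient in
   JPEG zigzag order, by walking the 15 anti-diagonals of the 8x8 block. */
static void demo_build_zigzag(int zz[64])
{
   int k = 0, s;
   for (s = 0; s < 15; ++s) {            /* s = row + col on this diagonal */
      int lo = s < 8 ? 0 : s - 7;        /* smallest valid row */
      int hi = s < 8 ? s : 7;            /* largest valid row */
      int r;
      if (s & 1) {
         for (r = lo; r <= hi; ++r) zz[k++] = r * 8 + (s - r);  /* odd: down-left */
      } else {
         for (r = hi; r >= lo; --r) zz[k++] = r * 8 + (s - r);  /* even: up-right */
      }
   }
}
```

A table read in zigzag order is then de-zigzagged with `q[zz[k]] = byte_k`, after which dequantization is a plain elementwise product.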

static void stbi__jpeg_finish(stbi__jpeg *z)
{
   if (z->progressive) {
      // dequantize and idct the data
      int i,j,n;
      for (n=0; n < z->s->img_n; ++n) {
         int w = (z->img_comp[n].x+7) >> 3;
         int h = (z->img_comp[n].y+7) >> 3;
         for (j=0; j < h; ++j) {
            for (i=0; i < w; ++i) {
               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
            }
         }
      }
   }
}

static int stbi__process_marker(stbi__jpeg *z, int m)
{
   int L;
   switch (m) {
      case STBI__MARKER_none: // no marker found
         return stbi__err("expected marker","Corrupt JPEG");

      case 0xDD: // DRI - specify restart interval
         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
         z->restart_interval = stbi__get16be(z->s);
         return 1;

      case 0xDB: // DQT - define quantization table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            int q = stbi__get8(z->s);
            int p = q >> 4;
            int t = q & 15,i;
            if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
            for (i=0; i < 64; ++i)
               z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
            L -= 65;
         }
         return L==0;

      case 0xC4: // DHT - define huffman table
         L = stbi__get16be(z->s)-2;
         while (L > 0) {
            stbi_uc *v;
            int sizes[16],i,n=0;
            int q = stbi__get8(z->s);
            int tc = q >> 4;
            int th = q & 15;
            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
            for (i=0; i < 16; ++i) {
               sizes[i] = stbi__get8(z->s);
               n += sizes[i];
            }
            L -= 17;
            if (tc == 0) {
               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
               v = z->huff_dc[th].values;
            } else {
               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
               v = z->huff_ac[th].values;
            }
            for (i=0; i < n; ++i)
               v[i] = stbi__get8(z->s);
            if (tc != 0)
               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
            L -= n;
         }
         return L==0;
   }
   // check for comment block or APP blocks
   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
      stbi__skip(z->s, stbi__get16be(z->s)-2);
      return 1;
   }
   return 0;
}
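The DHT case reads 16 code-length counts (the BITS list of the JPEG spec) followed by the symbol values, and hands the counts to stbi__build_huffman, defined elsewhere in this file. The sketch below shows only the canonical code assignment those counts imply (codes of each length are consecutive, and moving to the next length appends a 0 bit), with the same "too many codes for this length" rejection; it does not build stb_image's lookup tables, and `demo_canonical_codes` is a hypothetical name.

```c
/* Assign canonical Huffman codes from JPEG BITS counts (counts[i] = number of
   codes of length i+1). Writes code and length for each symbol slot in order
   and returns the number of codes, or -1 if the counts overflow the code space. */
static int demo_canonical_codes(const int counts[16], unsigned codes[256], int lens[256])
{
   unsigned code = 0;
   int len, i, k = 0;
   for (len = 1; len <= 16; ++len) {
      for (i = 0; i < counts[len - 1]; ++i) {
         if (k >= 256) return -1;          /* more symbols than a table can hold */
         codes[k] = code++;
         lens[k] = len;
         ++k;
      }
      if (code > (1u << len)) return -1;   /* more codes than fit in len bits */
      code <<= 1;                          /* next length: append a 0 bit */
   }
   return k;
}
```

With the standard luminance DC counts {0,1,5,1,1,1,1,1,1,0,...}, this yields 12 codes: `00`, then `010` through `110`, then `1110`, and so on up to the 9-bit code `111111110`.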

// after we see SOS
static int stbi__process_scan_header(stbi__jpeg *z)
{
   int i;
   int Ls = stbi__get16be(z->s);
   z->scan_n = stbi__get8(z->s);
   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
   for (i=0; i < z->scan_n; ++i) {
      int id = stbi__get8(z->s), which;
      int q = stbi__get8(z->s);
      for (which = 0; which < z->s->img_n; ++which)
         if (z->img_comp[which].id == id)
            break;
      if (which == z->s->img_n) return 0; // no match
      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
      z->order[i] = which;
   }

   {
      int aa;
      z->spec_start = stbi__get8(z->s);
      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
      aa = stbi__get8(z->s);
      z->succ_high = (aa >> 4);
      z->succ_low  = (aa & 15);
      if (z->progressive) {
         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
            return stbi__err("bad SOS", "Corrupt JPEG");
      } else {
         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
         z->spec_end = 63;
      }
   }

   return 1;
}
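After the per-component selectors, stbi__process_scan_header reads three bytes: Ss (spectral start), Se (spectral end) and one byte packing Ah in the high nibble and Al in the low nibble. The sketch below applies the same length rule (Ls == 6 + 2*Ns) and the same baseline/progressive range checks to an SOS segment held in memory; it skips the component-ID lookup, and the `demo_sos` struct and `demo_parse_sos` function are hypothetical.

```c
#include <stddef.h>

/* Hypothetical summary of an SOS segment; not stb_image's stbi__jpeg. */
typedef struct {
   int ns;        /* number of components in the scan */
   int ss, se;    /* spectral selection start/end */
   int ah, al;    /* successive approximation high/low bit positions */
} demo_sos;

/* Parse an SOS segment p[0..len-1] starting at its 2-byte length Ls.
   Returns 1 on success, 0 if the segment is malformed. */
static int demo_parse_sos(const unsigned char *p, size_t len, int progressive, demo_sos *out)
{
   int ls, ns;
   if (len < 3) return 0;
   ls = (p[0] << 8) | p[1];
   ns = p[2];
   if (ns < 1 || ns > 4 || ls != 6 + 2 * ns || len < (size_t) ls) return 0;
   p += 3 + 2 * ns;              /* skip Ls, Ns and the (id, tables) pairs */
   out->ns = ns;
   out->ss = p[0];
   out->se = p[1];
   out->ah = p[2] >> 4;
   out->al = p[2] & 15;
   if (progressive) {
      if (out->ss > 63 || out->se > 63 || out->ss > out->se || out->ah > 13 || out->al > 13)
         return 0;
   } else {
      if (out->ss != 0 || out->ah != 0 || out->al != 0) return 0;
      out->se = 63;              /* baseline: force the full spectral range */
   }
   return 1;
}
```

A baseline 3-component scan header is the 12 bytes `00 0C 03 01 00 02 11 03 11 00 3F 00`; a progressive DC-first scan with Al = 1 has Ss = Se = 0 and fails the baseline checks.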

static int stbi__process_frame_header(stbi__jpeg *z, int scan)
{
   stbi__context *s = z->s;
   int Lf,p,i,q, h_max=1,v_max=1,c;
   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
   c = stbi__get8(s);
   if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG");    // JFIF requires
   s->img_n = c;
   for (i=0; i < c; ++i) {
      z->img_comp[i].data = NULL;
      z->img_comp[i].linebuf = NULL;
   }

   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");

   for (i=0; i < s->img_n; ++i) {
      z->img_comp[i].id = stbi__get8(s);
      if (z->img_comp[i].id != i+1)   // JFIF requires
         if (z->img_comp[i].id != i)  // some version of jpegtran outputs non-JFIF-compliant files!
            return stbi__err("bad component ID","Corrupt JPEG");
      q = stbi__get8(s);
      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
   }

   if (scan != STBI__SCAN_load) return 1;

   if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");

   for (i=0; i < s->img_n; ++i) {
      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
   }

   // compute interleaved mcu info
   z->img_h_max = h_max;
   z->img_v_max = v_max;
   z->img_mcu_w = h_max * 8;
   z->img_mcu_h = v_max * 8;
   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;

   for (i=0; i < s->img_n; ++i) {
      // number of effective pixels (e.g. for non-interleaved MCU)
      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
      // to simplify generation, we'll allocate enough memory to decode
      // the bogus oversized data from using interleaved MCUs and their
      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
      // discard the extra data until colorspace conversion
      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
      z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);

      if (z->img_comp[i].raw_data == NULL) {
         for(--i; i >= 0; --i) {
            STBI_FREE(z->img_comp[i].raw_data);
            z->img_comp[i].data = NULL;
         }
         return stbi__err("outofmem", "Out of memory");
      }
      // align blocks for idct using mmx/sse
      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
      z->img_comp[i].linebuf = NULL;
      if (z->progressive) {
         z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
         z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
         z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
      } else {
         z->img_comp[i].coeff = 0;
         z->img_comp[i].raw_coeff = 0;
      }
   }

   return 1;
}
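Two arithmetic idioms in stbi__process_frame_header are easy to misread. The "too large" guard `(1 << 30) / img_x / img_n < img_y` is true exactly when img_x * img_n * img_y exceeds 2^30, but it is evaluated with divisions so the product is never formed and cannot overflow. The 16-byte alignment over-allocates by 15 bytes and rounds the address up with `(addr + 15) & ~15`, which always stays inside the allocation. The sketch restates both with hypothetical `demo_` names, using `uintptr_t` rather than the `size_t` cast in the original.

```c
#include <stdint.h>

/* Same guard as stbi__process_frame_header: nonzero iff x * n * y > 2^30.
   floor(floor(2^30 / x) / n) == floor(2^30 / (x*n)), and for integer y,
   floor(a) < y holds exactly when a < y. */
static int demo_too_large(int x, int n, int y)
{
   return (1 << 30) / x / n < y;
}

/* Round an address up to the next multiple of 16 (unchanged if aligned). */
static uintptr_t demo_align16(uintptr_t addr)
{
   return (addr + 15) & ~(uintptr_t) 15;
}
```

A 1920x1080 RGB image passes easily; 100000x4000 with 3 components (1.2 billion samples) is rejected, and 1024x1048576 grayscale (exactly 2^30 samples) is still accepted.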

// use comparisons since in some cases we handle more than one case (e.g. SOF)
#define stbi__DNL(x)         ((x) == 0xdc)
#define stbi__SOI(x)         ((x) == 0xd8)
#define stbi__EOI(x)         ((x) == 0xd9)
#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
#define stbi__SOS(x)         ((x) == 0xda)

#define stbi__SOF_progressive(x)   ((x) == 0xc2)

static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
{
   int m;
   z->marker = STBI__MARKER_none; // initialize cached marker to empty
   m = stbi__get_marker(z);
   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
   if (scan == STBI__SCAN_type) return 1;
   m = stbi__get_marker(z);
   while (!stbi__SOF(m)) {
      if (!stbi__process_marker(z,m)) return 0;
      m = stbi__get_marker(z);
      while (m == STBI__MARKER_none) {
         // some files have extra padding after their blocks, so ok, we'll scan
         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
         m = stbi__get_marker(z);
      }
   }
   z->progressive = stbi__SOF_progressive(m);
   if (!stbi__process_frame_header(z, scan)) return 0;
   return 1;
}
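stbi__decode_jpeg_header requires SOI first, then processes marker segments until one of the three frame markers accepted by stbi__SOF (0xC0, 0xC1, 0xC2), and treats only 0xC2 as progressive. The sketch below performs the same walk over an in-memory buffer. It assumes every marker before the frame carries a 16-bit big-endian length that counts its own two bytes (true of DQT, DHT, DRI, APPn and COM), and unlike the real loop it does not tolerate fill bytes or padding between segments; `demo_find_sof` is a hypothetical name.

```c
#include <stddef.h>

/* Walk JPEG marker segments in buf until SOF0/SOF1/SOF2.
   Returns the SOF marker code (0xC0..0xC2) or -1 on malformed input,
   and sets *progressive to 1 only for SOF2. */
static int demo_find_sof(const unsigned char *buf, size_t len, int *progressive)
{
   size_t pos = 2;
   if (len < 2 || buf[0] != 0xFF || buf[1] != 0xD8) return -1;  /* no SOI */
   while (pos + 4 <= len) {
      int m;
      size_t seglen;
      if (buf[pos] != 0xFF) return -1;                 /* expected a marker */
      m = buf[pos + 1];
      if (m == 0xC0 || m == 0xC1 || m == 0xC2) {
         *progressive = (m == 0xC2);
         return m;
      }
      seglen = ((size_t) buf[pos + 2] << 8) | buf[pos + 3];  /* includes its own 2 bytes */
      if (seglen < 2) return -1;
      pos += 2 + seglen;                               /* skip marker + segment */
   }
   return -1;                                          /* ran out of data before SOF */
}
```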

// decode image to YCbCr format
static int stbi__decode_jpeg_image(stbi__jpeg *j)
{
   int m;
   j->restart_interval = 0;
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
   m = stbi__get_marker(j);
   while (!stbi__EOI(m)) {
      if (stbi__SOS(m)) {
         if (!stbi__process_scan_header(j)) return 0;
         if (!stbi__parse_entropy_coded_data(j)) return 0;
         if (j->marker == STBI__MARKER_none ) {
            // handle 0s at the end of image data from IP Kamera 9060
            while (!stbi__at_eof(j->s)) {
               int x = stbi__get8(j->s);
               if (x == 255) {
                  j->marker = stbi__get8(j->s);
                  break;
               } else if (x != 0) {
                  return stbi__err("junk before marker", "Corrupt JPEG");
               }
            }
            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
         }
      } else {
         if (!stbi__process_marker(j, m)) return 0;
      }
      m = stbi__get_marker(j);
   }
   if (j->progressive)
      stbi__jpeg_finish(j);
   return 1;
}
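When the entropy decoder stops without having cached a marker, stbi__decode_jpeg_image skips zero bytes (the IP Kamera 9060 workaround) up to the next 0xFF, takes the byte after it as the marker, and rejects any other non-zero byte as junk. The sketch below restates that loop over a memory buffer. One difference is deliberate: at end of data it returns -1 directly, whereas the original falls through and lets stbi__get_marker fail later. `demo_skip_zero_padding` is a hypothetical name.

```c
#include <stddef.h>

/* Skip zero padding starting at buf[*pos]; on 0xFF, return the next byte
   as the marker code. Returns -1 on non-zero junk or end of data. */
static int demo_skip_zero_padding(const unsigned char *buf, size_t len, size_t *pos)
{
   while (*pos < len) {
      unsigned char x = buf[(*pos)++];
      if (x == 0xFF) {
         if (*pos >= len) return -1;   /* truncated: no marker byte follows */
         return buf[(*pos)++];
      } else if (x != 0) {
         return -1;                    /* "junk before marker" */
      }
   }
   return -1;                          /* end of data without a marker */
}
```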

// static jfif-centered resampling (across block boundaries)

typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
                                    int w, int hs);

#define stbi__div4(x) ((stbi_uc) ((x) >> 2))

static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   STBI_NOTUSED(out);
   STBI_NOTUSED(in_far);
   STBI_NOTUSED(w);
   STBI_NOTUSED(hs);
   return in_near;
}

static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples vertically for every one in input
   int i;
   STBI_NOTUSED(hs);
   for (i=0; i < w; ++i)
      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
   return out;
}

static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate two samples horizontally for every one in input
   int i;
   stbi_uc *input = in_near;

   if (w == 1) {
      // if only one sample, can't do any interpolation
      out[0] = out[1] = input[0];
      return out;
   }

   out[0] = input[0];
   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
   for (i=1; i < w-1; ++i) {
      int n = 3*input[i]+2;
      out[i*2+0] = stbi__div4(n+input[i-1]);
      out[i*2+1] = stbi__div4(n+input[i+1]);
   }
   out[i*2+0] = stbi__div4(input[w-1]*3 + input[w-2] + 2);
   out[i*2+1] = input[w-1];

   STBI_NOTUSED(in_far);
   STBI_NOTUSED(hs);

   return out;
}
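stbi__resample_row_h_2 doubles a chroma row horizontally with a centered triangle filter: each input pixel produces two outputs that weight the pixel itself by 3/4 and its nearer neighbor by 1/4, with +2 for rounding, while the two outermost outputs copy the edge pixels. The standalone sketch below applies those weights symmetrically at both ends of the row; `demo_upsample_h2` is a hypothetical name and the function does not depend on stb_image.

```c
/* Double a row of w samples horizontally; out must hold 2*w bytes.
   Each output weights its source pixel 3/4 and the nearer neighbor 1/4,
   rounded; the first and last outputs copy the edge pixels. */
static void demo_upsample_h2(unsigned char *out, const unsigned char *in, int w)
{
   int i;
   if (w == 1) { out[0] = out[1] = in[0]; return; }
   for (i = 0; i < w; ++i) {
      int left  = (i == 0)     ? in[i] : (3 * in[i] + in[i - 1] + 2) >> 2;
      int right = (i == w - 1) ? in[i] : (3 * in[i] + in[i + 1] + 2) >> 2;
      out[2 * i]     = (unsigned char) left;
      out[2 * i + 1] = (unsigned char) right;
   }
}
```

For the row {0, 100} this gives {0, 25, 75, 100}. With both input rows equal, stbi__resample_row_hv_2 reduces to exactly this filter, because (4*(3p+q) + 8) >> 4 equals (3p + q + 2) >> 2.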

#define stbi__div16(x) ((stbi_uc) ((x) >> 4))

static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i,t0,t1;
   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   out[0] = stbi__div4(t1+2);
   for (i=1; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}

#if defined(STBI_SSE2) || defined(STBI_NEON)
static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // need to generate 2x2 samples for every one in input
   int i=0,t0,t1;

   if (w == 1) {
      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
      return out;
   }

   t1 = 3*in_near[0] + in_far[0];
   // process groups of 8 pixels for as long as we can.
   // note we can't handle the last pixel in a row in this loop
   // because we need to handle the filter boundary conditions.
   for (; i < ((w-1) & ~7); i += 8) {
#if defined(STBI_SSE2)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      __m128i zero  = _mm_setzero_si128();
      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
      __m128i diff  = _mm_sub_epi16(farw, nearw);
      __m128i nears = _mm_slli_epi16(nearw, 2);
      __m128i curr  = _mm_add_epi16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      __m128i prv0 = _mm_slli_si128(curr, 2);
      __m128i nxt0 = _mm_srli_si128(curr, 2);
      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      __m128i bias  = _mm_set1_epi16(8);
      __m128i curs = _mm_slli_epi16(curr, 2);
      __m128i prvd = _mm_sub_epi16(prev, curr);
      __m128i nxtd = _mm_sub_epi16(next, curr);
      __m128i curb = _mm_add_epi16(curs, bias);
      __m128i even = _mm_add_epi16(prvd, curb);
      __m128i odd  = _mm_add_epi16(nxtd, curb);

      // interleave even and odd pixels, then undo scaling.
      __m128i int0 = _mm_unpacklo_epi16(even, odd);
      __m128i int1 = _mm_unpackhi_epi16(even, odd);
      __m128i de0  = _mm_srli_epi16(int0, 4);
      __m128i de1  = _mm_srli_epi16(int1, 4);

      // pack and write output
      __m128i outv = _mm_packus_epi16(de0, de1);
      _mm_storeu_si128((__m128i *) (out + i*2), outv);
#elif defined(STBI_NEON)
      // load and perform the vertical filtering pass
      // this uses 3*x + y = 4*x + (y - x)
      uint8x8_t farb  = vld1_u8(in_far + i);
      uint8x8_t nearb = vld1_u8(in_near + i);
      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
      int16x8_t curr  = vaddq_s16(nears, diff); // current row

      // horizontal filter works the same based on shifted versions of current
      // row. "prev" is current row shifted right by 1 pixel; we need to
      // insert the previous pixel value (from t1).
      // "next" is current row shifted left by 1 pixel, with first pixel
      // of next block of 8 pixels added in.
      int16x8_t prv0 = vextq_s16(curr, curr, 7);
      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);

      // horizontal filter, polyphase implementation since it's convenient:
      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
      // note the shared term.
      int16x8_t curs = vshlq_n_s16(curr, 2);
      int16x8_t prvd = vsubq_s16(prev, curr);
      int16x8_t nxtd = vsubq_s16(next, curr);
      int16x8_t even = vaddq_s16(curs, prvd);
      int16x8_t odd  = vaddq_s16(curs, nxtd);

      // undo scaling and round, then store with even/odd phases interleaved
      uint8x8x2_t o;
      o.val[0] = vqrshrun_n_s16(even, 4);
      o.val[1] = vqrshrun_n_s16(odd,  4);
      vst2_u8(out + i*2, o);
#endif

      // "previous" value for next iter
      t1 = 3*in_near[i+7] + in_far[i+7];
   }

   t0 = t1;
   t1 = 3*in_near[i] + in_far[i];
   out[i*2] = stbi__div16(3*t1 + t0 + 8);

   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3*in_near[i]+in_far[i];
      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
   }
   out[w*2-1] = stbi__div4(t1+2);

   STBI_NOTUSED(hs);

   return out;
}
#endif
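The SIMD loop in stbi__resample_row_hv_2_simd computes the same values as the scalar stbi__resample_row_hv_2, just eight input pixels at a time: it forms the vertical sums with 3*x + y = 4*x + (y - x), builds "prev" and "next" by shifting the vector one lane and filling the vacated lane from t1 or from the first pixel of the next group, and then evaluates even and odd output phases with the shared 4*cur + 8 term. The portable sketch below emulates that lane structure with plain arrays of eight shorts and checks it against a scalar reference; both `demo_` functions are hypothetical and use no intrinsics.

```c
/* Scalar reference: the same arithmetic as stbi__resample_row_hv_2.
   out must hold 2*w bytes. */
static void demo_hv2_ref(unsigned char *out, const unsigned char *in_near,
                         const unsigned char *in_far, int w)
{
   int i, t0, t1;
   if (w == 1) {
      out[0] = out[1] = (unsigned char) ((3 * in_near[0] + in_far[0] + 2) >> 2);
      return;
   }
   t1 = 3 * in_near[0] + in_far[0];
   out[0] = (unsigned char) ((t1 + 2) >> 2);
   for (i = 1; i < w; ++i) {
      t0 = t1;
      t1 = 3 * in_near[i] + in_far[i];
      out[i * 2 - 1] = (unsigned char) ((3 * t0 + t1 + 8) >> 4);
      out[i * 2]     = (unsigned char) ((3 * t1 + t0 + 8) >> 4);
   }
   out[w * 2 - 1] = (unsigned char) ((t1 + 2) >> 2);
}

/* Lane-by-lane emulation of the 8-wide loop: short[8] arrays stand in for
   vector registers; the tail reuses the scalar recurrence. */
static void demo_hv2_lanes(unsigned char *out, const unsigned char *in_near,
                           const unsigned char *in_far, int w)
{
   int i = 0, k, t0, t1;
   if (w == 1) { demo_hv2_ref(out, in_near, in_far, w); return; }
   t1 = 3 * in_near[0] + in_far[0];
   for (; i < ((w - 1) & ~7); i += 8) {
      short curr[8], prev[8], next[8];
      for (k = 0; k < 8; ++k)    /* vertical pass: 4*near + (far - near) */
         curr[k] = (short) (4 * in_near[i + k] + (in_far[i + k] - in_near[i + k]));
      for (k = 0; k < 8; ++k) {  /* shift one lane, fill the vacated end */
         prev[k] = (k == 0) ? (short) t1 : curr[k - 1];
         next[k] = (k == 7) ? (short) (3 * in_near[i + 8] + in_far[i + 8]) : curr[k + 1];
      }
      for (k = 0; k < 8; ++k) {  /* polyphase with shared term 4*cur + 8 */
         int curb = 4 * curr[k] + 8;
         out[(i + k) * 2]     = (unsigned char) ((curb + prev[k] - curr[k]) >> 4);
         out[(i + k) * 2 + 1] = (unsigned char) ((curb + next[k] - curr[k]) >> 4);
      }
      t1 = 3 * in_near[i + 7] + in_far[i + 7];   /* "previous" value for next group */
   }
   t0 = t1;
   t1 = 3 * in_near[i] + in_far[i];
   out[i * 2] = (unsigned char) ((3 * t1 + t0 + 8) >> 4);
   for (++i; i < w; ++i) {
      t0 = t1;
      t1 = 3 * in_near[i] + in_far[i];
      out[i * 2 - 1] = (unsigned char) ((3 * t0 + t1 + 8) >> 4);
      out[i * 2]     = (unsigned char) ((3 * t1 + t0 + 8) >> 4);
   }
   out[w * 2 - 1] = (unsigned char) ((t1 + 2) >> 2);
}
```

The loop bound `(w-1) & ~7` keeps the read of `in_near[i+8]` in range and leaves at least the last pixel to the scalar tail, which handles the right-edge boundary.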

static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
{
   // resample with nearest-neighbor
   int i,j;
   STBI_NOTUSED(in_far);
   for (i=0; i < w; ++i)
      for (j=0; j < hs; ++j)
         out[i*hs+j] = in_near[i];
   return out;
}

#ifdef STBI_JPEG_OLD
// this is the same YCbCr-to-RGB calculation that stb_image has used
// historically before the algorithm changes in 1.49
#define float2fixed(x)  ((int) ((x) * 65536 + 0.5))
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 16) + 32768; // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr*float2fixed(1.40200f);
      g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
      b = y_fixed                            + cb*float2fixed(1.77200f);
      r >>= 16;
      g >>= 16;
      b >>= 16;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#else
// this is a reduced-precision calculation of YCbCr-to-RGB introduced
// to make sure the code produces the same results in both SIMD and scalar code paths
#define float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
{
   int i;
   for (i=0; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed +  cr* float2fixed(1.40200f);
      g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                               +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
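In the non-STBI_JPEG_OLD scalar path, each conversion coefficient is rounded to 12 fractional bits and shifted left by 8, so its products with cr and cb share a 2^20 scale with `y[i] << 20`; adding `1 << 19` before the final `>> 20` rounds to nearest. The sketch below reproduces that arithmetic for a single pixel. It uses hypothetical `demo_` names, omits the `& 0xffff0000` mask stb_image applies to the Cb term of green (so its green value can differ from stb's by one in rare cases), and, like stb_image, assumes `>>` on a negative int is an arithmetic shift.

```c
/* 12-bit coefficients shifted into bits 8..19, like the non-STBI_JPEG_OLD
   float2fixed macro. */
#define DEMO_FIX(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)

static unsigned char demo_clamp255(int v)
{
   return (unsigned char) (v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one YCbCr sample to RGB in 20-bit fixed point. */
static void demo_ycbcr_to_rgb(int y, int cb, int cr, unsigned char rgb[3])
{
   int yf = (y << 20) + (1 << 19);   /* Y * 2^20, plus 0.5 at that scale for rounding */
   cr -= 128;
   cb -= 128;
   rgb[0] = demo_clamp255((yf + cr * DEMO_FIX(1.40200f)) >> 20);
   /* no & 0xffff0000 mask on the Cb term here, unlike stbi__YCbCr_to_RGB_row */
   rgb[1] = demo_clamp255((yf - cr * DEMO_FIX(0.71414f) - cb * DEMO_FIX(0.34414f)) >> 20);
   rgb[2] = demo_clamp255((yf + cb * DEMO_FIX(1.77200f)) >> 20);
}
```

Neutral chroma (Cb = Cr = 128) returns Y unchanged in all three channels, and because each 12-bit coefficient is off by at most 0.5/4096, the result stays within one step of a floating-point conversion over the whole input range.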
2998 
2999 #if defined(STBI_SSE2) || defined(STBI_NEON)
stbi__YCbCr_to_RGB_simd(stbi_uc * out,stbi_uc const * y,stbi_uc const * pcb,stbi_uc const * pcr,int count,int step)3000 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3001 {
3002    int i = 0;
3003 
3004 #ifdef STBI_SSE2
3005    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3006    // it's useful in practice (you wouldn't use it for textures, for example).
3007    // so just accelerate step == 4 case.
3008    if (step == 4) {
3009       // this is a fairly straightforward implementation and not super-optimized.
3010       __m128i signflip  = _mm_set1_epi8(-0x80);
3011       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3012       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3013       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3014       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3015       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3016       __m128i xw = _mm_set1_epi16(255); // alpha channel
3017 
3018       for (; i+7 < count; i += 8) {
3019          // load
3020          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3021          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3022          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3023          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3024          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3025 
3026          // unpack to short (and left-shift cr, cb by 8)
3027          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3028          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3029          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3030 
3031          // color transform
3032          __m128i yws = _mm_srli_epi16(yw, 4);
3033          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3034          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3035          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3036          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3037          __m128i rws = _mm_add_epi16(cr0, yws);
3038          __m128i gwt = _mm_add_epi16(cb0, yws);
3039          __m128i bws = _mm_add_epi16(yws, cb1);
3040          __m128i gws = _mm_add_epi16(gwt, cr1);
3041 
3042          // descale
3043          __m128i rw = _mm_srai_epi16(rws, 4);
3044          __m128i bw = _mm_srai_epi16(bws, 4);
3045          __m128i gw = _mm_srai_epi16(gws, 4);
3046 
3047          // back to byte, set up for transpose
3048          __m128i brb = _mm_packus_epi16(rw, bw);
3049          __m128i gxb = _mm_packus_epi16(gw, xw);
3050 
3051          // transpose to interleave channels
3052          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3053          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3054          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3055          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3056 
3057          // store
3058          _mm_storeu_si128((__m128i *) (out + 0), o0);
3059          _mm_storeu_si128((__m128i *) (out + 16), o1);
3060          out += 32;
3061       }
3062    }
3063 #endif

#ifdef STBI_NEON
   // in this version, step=3 support would be easy to add. but is there demand?
   if (step == 4) {
      // this is a fairly straightforward implementation and not super-optimized.
      uint8x8_t signflip = vdup_n_u8(0x80);
      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));

      for (; i+7 < count; i += 8) {
         // load
         uint8x8_t y_bytes  = vld1_u8(y + i);
         uint8x8_t cr_bytes = vld1_u8(pcr + i);
         uint8x8_t cb_bytes = vld1_u8(pcb + i);
         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));

         // expand to s16
         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
         int16x8_t crw = vshll_n_s8(cr_biased, 7);
         int16x8_t cbw = vshll_n_s8(cb_biased, 7);

         // color transform
         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
         int16x8_t rws = vaddq_s16(yws, cr0);
         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
         int16x8_t bws = vaddq_s16(yws, cb1);

         // undo scaling, round, convert to byte
         uint8x8x4_t o;
         o.val[0] = vqrshrun_n_s16(rws, 4);
         o.val[1] = vqrshrun_n_s16(gws, 4);
         o.val[2] = vqrshrun_n_s16(bws, 4);
         o.val[3] = vdup_n_u8(255);

         // store, interleaving r/g/b/a
         vst4_u8(out, o);
         out += 8*4;
      }
   }
#endif
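/* Illustrative sketch, not part of stb_image: a scalar model of one lane of the
   NEON loop above, written to make its fixed-point scaling explicit. Y is widened
   with a left shift of 4 (4 fractional bits), the biased chroma with a shift of 7,
   and the constants carry 12 fractional bits. vqdmulhq_s16 returns the saturated
   high half of the doubled product, (2*a*b) >> 16, so a chroma term becomes
   (2 * (c << 7) * k) >> 16 == (c * k) >> 8: again 4 fractional bits, ready to add
   to Y. vqrshrun_n_s16(x, 4) then rounds, shifts right by 4, and saturates to
   [0,255]. The sketch_* names are hypothetical, and right shifts of negative
   values are assumed to be arithmetic, as on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

// One lane of vqdmulhq_s16: saturating doubling multiply, high half.
static int16_t sketch_qdmulh(int16_t a, int16_t b)
{
   if (a == INT16_MIN && b == INT16_MIN) return INT16_MAX;  // the only case that saturates
   return (int16_t) ((2 * (int32_t) a * (int32_t) b) >> 16);
}

// One lane of vqrshrun_n_s16(x, 4): round, shift right by 4, clamp to [0,255].
static uint8_t sketch_qrshrun4(int32_t x)
{
   int32_t v = (x + 8) >> 4;
   if (v < 0) v = 0;
   if (v > 255) v = 255;
   return (uint8_t) v;
}

// Red channel for one pixel, following the NEON loop step by step.
static uint8_t sketch_neon_red(uint8_t y, uint8_t cr)
{
   int16_t yws = (int16_t) (y << 4);                      // vshll_n_u8(y_bytes, 4)
   int16_t crw = (int16_t) ((cr - 128) * 128);            // bias, then vshll_n_s8(cr_biased, 7)
   int16_t k   = (int16_t) (1.40200f * 4096.0f + 0.5f);   // cr_const0
   int32_t rws = yws + sketch_qdmulh(crw, k);             // vaddq_s16; stays within 16 bits here
   return sketch_qrshrun4(rws);
}
```

For Y=100, Cr=200 the model gives 201, in line with 100 + 72 * 1.402 = 200.9
rounded to the nearest integer.
*/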

   for (; i < count; ++i) {
      int y_fixed = (y[i] << 20) + (1<<19); // rounding
      int r,g,b;
      int cr = pcr[i] - 128;
      int cb = pcb[i] - 128;
      r = y_fixed + cr* float2fixed(1.40200f);
      g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
      b = y_fixed                             +   cb* float2fixed(1.77200f);
      r >>= 20;
      g >>= 20;
      b >>= 20;
      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
      out[0] = (stbi_uc)r;
      out[1] = (stbi_uc)g;
      out[2] = (stbi_uc)b;
      out[3] = 255;
      out += step;
   }
}
#endif
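/* Illustrative sketch, not part of stb_image: the scalar tail loop above, pulled
   out for a single pixel. Y is placed in 20 fractional bits with 0.5 added for
   rounding, chroma is centered on zero, and each product with a fixed-point
   constant is summed before shifting back down by 20. The sketch assumes that
   float2fixed(x) scales by 4096 and shifts left by 8 (20 fractional bits in
   total, as the >> 20 above implies); SKETCH_FIX is a hypothetical stand-in for
   it. It omits the & 0xffff0000 mask the loop applies to the green Cb term, so
   its green value may occasionally differ from the loop's by one.

```c
#include <assert.h>

// Assumed to match float2fixed: 12 fractional bits from the multiply, 8 more from the shift.
#define SKETCH_FIX(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)

// One unsigned compare catches both v < 0 and v > 255, as in the loop above.
static unsigned char sketch_clamp_u8(int v)
{
   if ((unsigned) v > 255) return (unsigned char) (v < 0 ? 0 : 255);
   return (unsigned char) v;
}

// Converts one JFIF YCbCr triple to RGB with 20-bit fixed point.
static void sketch_ycbcr_to_rgb(unsigned char y, unsigned char cb, unsigned char cr,
                                unsigned char rgb[3])
{
   int y_fixed = (y << 20) + (1 << 19);   // Y with 20 fractional bits, plus 0.5 for rounding
   int crs = cr - 128;                    // center chroma on zero
   int cbs = cb - 128;
   int r = y_fixed + crs * SKETCH_FIX(1.40200f);
   int g = y_fixed - crs * SKETCH_FIX(0.71414f) - cbs * SKETCH_FIX(0.34414f);
   int b = y_fixed + cbs * SKETCH_FIX(1.77200f);
   rgb[0] = sketch_clamp_u8(r >> 20);     // assumes arithmetic right shift of negatives
   rgb[1] = sketch_clamp_u8(g >> 20);
   rgb[2] = sketch_clamp_u8(b >> 20);
}
```

Neutral chroma (Cb = Cr = 128) leaves the pixel gray at its Y value; strong
chroma pushes channels past the 8-bit range, which the clamp absorbs.
*/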

// set up the kernels
static void stbi__setup_jpeg(stbi__jpeg *j)
{
   j->idct_block_kernel = stbi__idct_block;
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;

#ifdef STBI_SSE2
   if (stbi__sse2_available()) {
      j->idct_block_kernel = stbi__idct_simd;
      #ifndef STBI_JPEG_OLD
      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
      #endif
      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
   }
#endif

#ifdef STBI_NEON
   j->idct_block_kernel = stbi__idct_simd;
   #ifndef STBI_JPEG_OLD
   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
   #endif
   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
#endif
}
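/* Illustrative sketch, not part of stb_image: stbi__setup_jpeg installs the
   portable kernels first and overrides them only when a SIMD version is compiled
   in and, for SSE2, detected at run time. The sketch shows that function-pointer
   pattern in isolation. simd_available stands in for a probe such as
   stbi__sse2_available; the decoder struct and both kernels are hypothetical.

```c
#include <assert.h>

typedef int (*sketch_sum_kernel)(const unsigned char *p, int n);

// Portable reference kernel.
static int sketch_sum_scalar(const unsigned char *p, int n)
{
   int s = 0, i;
   for (i = 0; i < n; ++i) s += p[i];
   return s;
}

// Stand-in for a vectorized kernel: four bytes per step, then the tail.
static int sketch_sum_unrolled(const unsigned char *p, int n)
{
   int s = 0, i = 0;
   for (; i + 3 < n; i += 4) s += p[i] + p[i+1] + p[i+2] + p[i+3];
   for (; i < n; ++i) s += p[i];
   return s;
}

typedef struct { sketch_sum_kernel sum; } sketch_decoder;

static void sketch_setup(sketch_decoder *d, int simd_available)
{
   d->sum = sketch_sum_scalar;                        // always-valid default
   if (simd_available) d->sum = sketch_sum_unrolled;  // override only when usable
}
```

Because every kernel variant must produce identical output, the choice made at
setup affects only speed, and callers never test the CPU themselves.
*/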

// clean up the temporary component buffers
static void stbi__cleanup_jpeg(stbi__jpeg *j)
{
   int i;
   for (i=0; i < j->s->img_n; ++i) {
      if (j->img_comp[i].raw_data) {
         STBI_FREE(j->img_comp[i].raw_data);
         j->img_comp[i].raw_data = NULL;
         j->img_comp[i].data = NULL;
      }
      if (j->img_comp[i].raw_coeff) {
         STBI_FREE(j->img_comp[i].raw_coeff);
         j->img_comp[i].raw_coeff = 0;
         j->img_comp[i].coeff = 0;
      }
      if (j->img_comp[i].linebuf) {
         STBI_FREE(j->img_comp[i].linebuf);
         j->img_comp[i].linebuf = NULL;
      }
   }
}

typedef struct
{
   resample_row_func resample;
   stbi_uc *line0,*line1;
   int hs,vs;   // expansion factor in each axis
   int w_lores; // horizontal pixels pre-expansion
   int ystep;   // how far through vertical expansion we are
   int ypos;    // which pre-expansion row we're on
} stbi__resample;

static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
{
   int n, decode_n;
   z->s->img_n = 0; // make stbi__cleanup_jpeg safe

   // validate req_comp
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");

   // load a jpeg image from whichever source, but leave in YCbCr format
   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }

   // determine actual number of components to generate
   n = req_comp ? req_comp : z->s->img_n;

   if (z->s->img_n == 3 && n < 3)
      decode_n = 1;
   else
      decode_n = z->s->img_n;

   // resample and color-convert
   {
      int k;
      unsigned int i,j;
      stbi_uc *output;
      stbi_uc *coutput[4];

      stbi__resample res_comp[4];

      for (k=0; k < decode_n; ++k) {
         stbi__resample *r = &res_comp[k];

         // allocate line buffer big enough for upsampling off the edges
         // with upsample factor of 4
         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

         r->hs      = z->img_h_max / z->img_comp[k].h;
         r->vs      = z->img_v_max / z->img_comp[k].v;
         r->ystep   = r->vs >> 1;
         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
         r->ypos    = 0;
         r->line0   = r->line1 = z->img_comp[k].data;

         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
         else                               r->resample = stbi__resample_row_generic;
      }

      // can't error after this, so this is safe
      output = (stbi_uc *) stbi__malloc(n * z->s->img_x * z->s->img_y + 1);
      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }

      // now go ahead and resample
      for (j=0; j < z->s->img_y; ++j) {
         stbi_uc *out = output + n * z->s->img_x * j;
         for (k=0; k < decode_n; ++k) {
            stbi__resample *r = &res_comp[k];
            int y_bot = r->ystep >= (r->vs >> 1);
            coutput[k] = r->resample(z->img_comp[k].linebuf,
                                     y_bot ? r->line1 : r->line0,
                                     y_bot ? r->line0 : r->line1,
                                     r->w_lores, r->hs);
            if (++r->ystep >= r->vs) {
               r->ystep = 0;
               r->line0 = r->line1;
               if (++r->ypos < z->img_comp[k].y)
                  r->line1 += z->img_comp[k].w2;
            }
         }
         if (n >= 3) {
            stbi_uc *y = coutput[0];
            if (z->s->img_n == 3) {
               z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
            } else
               for (i=0; i < z->s->img_x; ++i) {
                  out[0] = out[1] = out[2] = y[i];
                  out[3] = 255; // not used if n==3
                  out += n;
               }
         } else {
            stbi_uc *y = coutput[0];
            if (n == 1)
               for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
            else
               for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
         }
      }
      stbi__cleanup_jpeg(z);
      *out_x = z->s->img_x;
      *out_y = z->s->img_y;
      if (comp) *comp  = z->s->img_n; // report original components, not output
      return output;
   }
}
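/* Illustrative sketch, not part of stb_image: the row bookkeeping of the
   resampling loop in load_jpeg_image, replayed for one component with row
   indices in place of the line0/line1 pointers. For each output row it records
   which source row is passed as the first ("near") and which as the second
   ("far") input row to the resampler. With vs == 2, ystep starts at 1, so the
   first and last output rows use the same source row for both inputs, and the
   rows in between pair a source row with the neighbour on the side the output
   row is closer to. sketch_row_schedule is a hypothetical helper.

```c
#include <assert.h>

// Replays the ystep/ypos/line0/line1 updates for one component with vertical factor vs.
// near_row[j] and far_row[j] receive the source rows passed as the resampler's
// first and second input rows for output row j.
static void sketch_row_schedule(int vs, int src_rows, int out_rows,
                                int near_row[], int far_row[])
{
   int ystep = vs >> 1, ypos = 0;
   int line0 = 0, line1 = 0;              // row indices standing in for the pointers
   int j;
   for (j = 0; j < out_rows; ++j) {
      int y_bot = ystep >= (vs >> 1);
      near_row[j] = y_bot ? line1 : line0;
      far_row[j]  = y_bot ? line0 : line1;
      if (++ystep >= vs) {                // finished this source row's share of output rows
         ystep = 0;
         line0 = line1;
         if (++ypos < src_rows) ++line1;  // stop advancing at the last source row
      }
   }
}
```

For two source rows and four output rows the (near, far) pairs come out as
(0,0), (0,1), (1,0), (1,1); with vs == 1 the near row simply equals the output row.
*/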

static unsigned char *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   return load_jpeg_image(&j, x,y,comp,req_comp);
}

static int stbi__jpeg_test(stbi__context *s)
{
   int r;
   stbi__jpeg j;
   j.s = s;
   stbi__setup_jpeg(&j);
   r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
   stbi__rewind(s);
   return r;
}

static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
{
   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
      stbi__rewind( j->s );
      return 0;
   }
   if (x) *x = j->s->img_x;
   if (y) *y = j->s->img_y;
   if (comp) *comp = j->s->img_n;
   return 1;
}

static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__jpeg j;
   j.s = s;
   return stbi__jpeg_info_raw(&j, x, y, comp);
}
#endif

// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
//    simple implementation
//      - all input must be provided in an upfront buffer
//      - all output is written to a single output buffer (can malloc/realloc)
//    performance
//      - fast huffman

#ifndef STBI_NO_ZLIB

// fast-way is faster to check than jpeg huffman, but slow way is slower
#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)

// zlib-style huffman encoding
// (jpeg packs from left, zlib from right, so we can't share code)
typedef struct
{
   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
   stbi__uint16 firstcode[16];
   int maxcode[17];
   stbi__uint16 firstsymbol[16];
   stbi_uc  size[288];
   stbi__uint16 value[288];
} stbi__zhuffman;

stbi_inline static int stbi__bitreverse16(int n)
{
  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
  return n;
}

stbi_inline static int stbi__bit_reverse(int v, int bits)
{
   STBI_ASSERT(bits <= 16);
   // to bit reverse n bits, reverse 16 and shift
   // e.g. 11 bits, bit reverse and shift away 5
   return stbi__bitreverse16(v) >> (16-bits);
}
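/* Illustrative sketch, not part of stb_image: DEFLATE packs Huffman codes
   starting with their most significant bit, while the bit buffer is consumed
   least significant bit first, so the decoder has to reverse codes. The swap
   technique above exchanges adjacent bits, then bit pairs, nibbles, and bytes,
   and a final shift drops the unused high bits. The sketch cross-checks that
   technique against a naive bit-by-bit loop; the names are hypothetical.

```c
#include <assert.h>

// Swap-halves reversal of a 16-bit value, as in stbi__bitreverse16.
static int sketch_bitreverse16(int n)
{
   n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);   // swap adjacent bits
   n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);   // swap bit pairs
   n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);   // swap nibbles
   n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);   // swap bytes
   return n;
}

// Reverse the low `bits` bits of v (bits <= 16).
static int sketch_bit_reverse(int v, int bits)
{
   return sketch_bitreverse16(v) >> (16 - bits);
}

// Reference version: peel bits off one at a time.
static int sketch_bit_reverse_naive(int v, int bits)
{
   int r = 0, i;
   for (i = 0; i < bits; ++i) {
      r = (r << 1) | (v & 1);
      v >>= 1;
   }
   return r;
}
```

For example, the 3-bit code 001 reverses to 100, and the 7-bit value 0x2B
(0101011) reverses to 0x6A (1101010).
*/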

static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
{
   int i,k=0;
   int code, next_code[16], sizes[17];

   // DEFLATE spec for generating codes
   memset(sizes, 0, sizeof(sizes));
   memset(z->fast, 0, sizeof(z->fast));
   for (i=0; i < num; ++i)
      ++sizes[sizelist[i]];
   sizes[0] = 0;
   for (i=1; i < 16; ++i)
      STBI_ASSERT(sizes[i] <= (1 << i));
   code = 0;
   for (i=1; i < 16; ++i) {
      next_code[i] = code;
      z->firstcode[i] = (stbi__uint16) code;
      z->firstsymbol[i] = (stbi__uint16) k;
      code = (code + sizes[i]);
      if (sizes[i])
         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
      z->maxcode[i] = code << (16-i); // preshift for inner loop
      code <<= 1;
      k += sizes[i];
   }
   z->maxcode[16] = 0x10000; // sentinel
   for (i=0; i < num; ++i) {
      int s = sizelist[i];
      if (s) {
         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
         z->size [c] = (stbi_uc     ) s;
         z->value[c] = (stbi__uint16) i;
         if (s <= STBI__ZFAST_BITS) {
            int k = stbi__bit_reverse(next_code[s],s);
            while (k < (1 << STBI__ZFAST_BITS)) {
               z->fast[k] = fastv;
               k += (1 << s);
            }
         }
         ++next_code[s];
      }
   }
   return 1;
}
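/* Illustrative sketch, not part of stb_image: canonical Huffman code assignment
   as described in RFC 1951, section 3.2.2, which is the bookkeeping
   stbi__zbuild_huffman performs with sizes[] and next_code[]. Codes of each
   length are consecutive integers, and the first code of length L follows from
   the counts of all shorter lengths. The over-subscription test is the same
   condition the code-1 >= (1 << i) check above rejects. sketch_canonical_codes
   is a hypothetical name.

```c
#include <assert.h>

#define SKETCH_MAXBITS 15

// Assigns canonical codes from code lengths (0 = unused, otherwise 1..15).
// codes[i] receives the MSB-first code value, or -1 for unused symbols.
// Returns 0 if the lengths over-subscribe the code space.
static int sketch_canonical_codes(const unsigned char *lengths, int num, int *codes)
{
   int bl_count[SKETCH_MAXBITS + 1] = {0};
   int next_code[SKETCH_MAXBITS + 1];
   int code = 0, bits, i;
   for (i = 0; i < num; ++i) ++bl_count[lengths[i]];
   bl_count[0] = 0;
   for (bits = 1; bits <= SKETCH_MAXBITS; ++bits) {
      code = (code + bl_count[bits - 1]) << 1;          // first code of this length
      next_code[bits] = code;
      if (bl_count[bits] && code + bl_count[bits] > (1 << bits)) return 0;
   }
   for (i = 0; i < num; ++i)
      codes[i] = lengths[i] ? next_code[lengths[i]]++ : -1;
   return 1;
}
```

With the RFC's example lengths (3,3,3,3,3,2,4,4) for symbols A to H this yields
F = 00, A = 010 through E = 110, G = 1110, and H = 1111. The real builder then
stores bit-reversed copies of the short codes in the fast table, because the
stream is read LSB first.
*/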

// zlib-from-memory implementation for PNG reading
//    because PNG allows splitting the zlib stream arbitrarily,
//    and it's annoying structurally to have PNG call ZLIB call PNG,
//    we require PNG to read all the IDATs and combine them into a single
//    memory buffer

typedef struct
{
   stbi_uc *zbuffer, *zbuffer_end;
   int num_bits;
   stbi__uint32 code_buffer;

   char *zout;
   char *zout_start;
   char *zout_end;
   int   z_expandable;

   stbi__zhuffman z_length, z_distance;
} stbi__zbuf;

stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
{
   if (z->zbuffer >= z->zbuffer_end) return 0;
   return *z->zbuffer++;
}

static void stbi__fill_bits(stbi__zbuf *z)
{
   do {
      STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
      z->code_buffer |= stbi__zget8(z) << z->num_bits;
      z->num_bits += 8;
   } while (z->num_bits <= 24);
}

stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
{
   unsigned int k;
   if (z->num_bits < n) stbi__fill_bits(z);
   k = z->code_buffer & ((1 << n) - 1);
   z->code_buffer >>= n;
   z->num_bits -= n;
   return k;
}
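/* Illustrative sketch, not part of stb_image: DEFLATE stores its data elements
   least significant bit first. stbi__zbuf keeps a 32-bit code_buffer in which
   each new byte is ORed in above the num_bits already held, and stbi__zreceive
   takes bits from the bottom. The sketch is a simplified reader in the same
   style that refills one byte at a time, rather than topping up past 24 bits as
   stbi__fill_bits does. Like stbi__zget8, it supplies zero bytes past the end of
   the input. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
   const uint8_t *p, *end;   // remaining input
   uint32_t buf;             // buffered bits, next bit to read in bit 0
   int nbits;                // number of valid bits in buf
} sketch_bits;

// Reads n bits (n <= 24), LSB first.
static unsigned sketch_receive(sketch_bits *b, int n)
{
   unsigned k;
   while (b->nbits < n) {
      uint32_t byte = (b->p < b->end) ? *b->p++ : 0;   // zero padding past the end
      b->buf |= byte << b->nbits;                       // new bits go above the old ones
      b->nbits += 8;
   }
   k = b->buf & ((1u << n) - 1);
   b->buf >>= n;
   b->nbits -= n;
   return k;
}
```

With the single input byte 0x8D, reading 1, 2, and 5 bits yields 1, 2, and 17,
which is how a block header with BFINAL = 1 and BTYPE = 2 (dynamic codes) would
appear to stbi__parse_zlib.
*/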

static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s,k;
   // not resolved by fast table, so compute it the slow way
   // use jpeg approach, which requires MSbits at top
   k = stbi__bit_reverse(a->code_buffer, 16);
   for (s=STBI__ZFAST_BITS+1; ; ++s)
      if (k < z->maxcode[s])
         break;
   if (s == 16) return -1; // invalid code!
   // code size is s, so:
   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
   STBI_ASSERT(z->size[b] == s);
   a->code_buffer >>= s;
   a->num_bits -= s;
   return z->value[b];
}

stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
{
   int b,s;
   if (a->num_bits < 16) stbi__fill_bits(a);
   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
   if (b) {
      s = b >> 9;
      a->code_buffer >>= s;
      a->num_bits -= s;
      return b & 511;
   }
   return stbi__zhuffman_decode_slowpath(a, z);
}
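/* Illustrative sketch, not part of stb_image: stbi__zbuild_huffman stores each
   code of up to STBI__ZFAST_BITS bits in every fast-table slot whose low bits
   equal the bit-reversed code, packing the entry as (length << 9) | symbol.
   Because a code length is never zero, an entry of zero can safely mean "not in
   the fast table", which is what the if (b) test above relies on. The sketch
   mirrors that layout; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FAST_BITS 9

// Stores (len << 9) | symbol in every slot whose low `len` bits equal
// reversed_code, i.e. every 9-bit lookahead that starts with this code.
static void sketch_fill_fast(uint16_t *fast, int reversed_code, int len, int symbol)
{
   int k;
   for (k = reversed_code; k < (1 << SKETCH_FAST_BITS); k += (1 << len))
      fast[k] = (uint16_t) ((len << 9) | symbol);
}

// Looks up the low 9 bits of the bit buffer; -1 means "use the slow path".
static int sketch_fast_lookup(const uint16_t *fast, uint32_t code_buffer, int *len)
{
   uint16_t e = fast[code_buffer & ((1u << SKETCH_FAST_BITS) - 1)];
   if (!e) return -1;
   *len = e >> 9;
   return e & 511;
}
```

In the test, the 2-bit code 01 is entered as its reversal 10 (value 2), so any
buffer whose two lowest bits are 1 then 0 (reading upward) resolves in one lookup.
*/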

static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
{
   char *q;
   int cur, limit;
   z->zout = zout;
   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
   cur   = (int) (z->zout     - z->zout_start);
   limit = (int) (z->zout_end - z->zout_start);
   while (cur + n > limit)
      limit *= 2;
   q = (char *) STBI_REALLOC(z->zout_start, limit);
   if (q == NULL) return stbi__err("outofmem", "Out of memory");
   z->zout_start = q;
   z->zout       = q + cur;
   z->zout_end   = q + limit;
   return 1;
}
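/* Illustrative sketch, not part of stb_image: stbi__zexpand grows the output
   buffer geometrically, doubling the capacity until the pending write fits and
   then reallocating once. Doubling keeps the total copying cost linear in the
   final output size. The sketch computes only the new capacity. Its guard for a
   zero starting limit is an addition of the sketch: doubling zero never makes
   progress, which would matter if a caller supplied an initial size of 0.
   sketch_grown_limit is a hypothetical name.

```c
#include <assert.h>

// Capacity reached by doubling `limit` until cur + n bytes fit.
static long sketch_grown_limit(long cur, long n, long limit)
{
   if (limit < 1) limit = 1;   // doubling 0 would loop forever
   while (cur + n > limit)
      limit *= 2;
   return limit;
}
```

A full 16384-byte buffer that needs one more byte grows to 32768; a write that
already fits leaves the capacity unchanged.
*/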

static int stbi__zlength_base[31] = {
   3,4,5,6,7,8,9,10,11,13,
   15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258,0,0 };

static int stbi__zlength_extra[31]=
{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };

static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};

static int stbi__zdist_extra[32] =
{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
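/* Illustrative sketch, not part of stb_image: length symbols 257 to 285 map to
   a base length plus a number of extra bits read from the stream (RFC 1951,
   section 3.2.5). The tables above encode exactly that, padded with zeros. The
   sketch repeats the lookup with its own copies of the 29 valid entries and
   rejects symbols 286 and 287, which must not appear in compressed data. The
   names are hypothetical.

```c
#include <assert.h>

// Base lengths and extra-bit counts for length symbols 257..285 (RFC 1951, 3.2.5).
static const int sketch_len_base[29] = {
   3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,
   67,83,99,115,131,163,195,227,258 };
static const int sketch_len_extra[29] = {
   0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0 };

// Match length for `symbol` given the value of its extra bits, or -1 if the
// symbol is not a valid length code.
static int sketch_match_length(int symbol, unsigned extra_bits)
{
   int z = symbol - 257;
   if (z < 0 || z >= 29) return -1;
   return sketch_len_base[z] + (int) (extra_bits & ((1u << sketch_len_extra[z]) - 1));
}
```

For example, symbol 265 has base 11 and one extra bit, so it encodes length 11
or 12; symbol 285 always means 258.
*/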

static int stbi__parse_huffman_block(stbi__zbuf *a)
{
   char *zout = a->zout;
   for(;;) {
      int z = stbi__zhuffman_decode(a, &a->z_length);
      if (z < 256) {
         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
         if (zout >= a->zout_end) {
            if (!stbi__zexpand(a, zout, 1)) return 0;
            zout = a->zout;
         }
         *zout++ = (char) z;
      } else {
         stbi_uc *p;
         int len,dist;
         if (z == 256) {
            a->zout = zout;
            return 1;
         }
         if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // length codes 286 and 287 must not appear in compressed data
         z -= 257;
         len = stbi__zlength_base[z];
         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
         z = stbi__zhuffman_decode(a, &a->z_distance);
         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // distance codes 30 and 31 must not appear either
         dist = stbi__zdist_base[z];
         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
         if (zout + len > a->zout_end) {
            if (!stbi__zexpand(a, zout, len)) return 0;
            zout = a->zout;
         }
         p = (stbi_uc *) (zout - dist);
         if (dist == 1) { // run of one byte; common in images.
            stbi_uc v = *p;
            do *zout++ = v; while (--len);
         } else {
            do *zout++ = *p++; while (--len);
         }
      }
   }
}
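/* Illustrative sketch, not part of stb_image: a length/distance pair copies len
   bytes starting dist bytes back in the output. When dist < len the source
   overlaps the bytes being written, so the copy has to run forward one byte at a
   time, which makes a short pattern repeat; this is why the loop above copies
   byte by byte and treats dist == 1 as a run instead of calling memcpy.
   sketch_lz77_copy is a hypothetical helper that also rejects references
   reaching before the start of the output or past its capacity.

```c
#include <assert.h>
#include <string.h>

// Appends a back-reference to out[0..pos) and returns the new length, or -1
// if the reference is invalid or would overflow the buffer of size cap.
static int sketch_lz77_copy(unsigned char *out, int pos, int cap, int dist, int len)
{
   const unsigned char *p;
   if (dist < 1 || dist > pos || len < 1 || pos + len > cap) return -1;
   p = out + pos - dist;
   while (len--) out[pos++] = *p++;   // forward copy: may read bytes written just now
   return pos;
}
```

Copying length 6 from distance 3 after "abc" produces "abcabcabc"; distance 1
turns a single byte into a run.
*/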

static int stbi__compute_huffman_codes(stbi__zbuf *a)
{
   static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
   stbi__zhuffman z_codelength;
   stbi_uc lencodes[286+32+137];//padding for maximum single op
   stbi_uc codelength_sizes[19];
   int i,n;

   int hlit  = stbi__zreceive(a,5) + 257;
   int hdist = stbi__zreceive(a,5) + 1;
   int hclen = stbi__zreceive(a,4) + 4;

   memset(codelength_sizes, 0, sizeof(codelength_sizes));
   for (i=0; i < hclen; ++i) {
      int s = stbi__zreceive(a,3);
      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
   }
   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;

   n = 0;
   while (n < hlit + hdist) {
      int c = stbi__zhuffman_decode(a, &z_codelength);
      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
      if (c < 16)
         lencodes[n++] = (stbi_uc) c;
      else {
         stbi_uc fill = 0;
         if (c == 16) {
            c = stbi__zreceive(a,2)+3;
            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG"); // nothing to repeat yet
            fill = lencodes[n-1];
         } else if (c == 17) {
            c = stbi__zreceive(a,3)+3;
         } else {
            STBI_ASSERT(c == 18);
            c = stbi__zreceive(a,7)+11;
         }
         if (hlit + hdist - n < c) return stbi__err("bad codelengths", "Corrupt PNG"); // run exceeds the declared code count
         memset(lencodes+n, fill, c);
         n += c;
      }
   }
   if (n != hlit+hdist) return stbi__err("bad codelengths","Corrupt PNG");
   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
   return 1;
}
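/* Illustrative sketch, not part of stb_image: dynamic blocks transmit their code
   lengths with a small run-length scheme (RFC 1951, section 3.2.7). Symbols 0-15
   are literal lengths, 16 repeats the previous length 3-6 times, 17 emits 3-10
   zeros, and 18 emits 11-138 zeros. As a simplification, the sketch takes symbols
   and the values of their extra bits as already-decoded arrays, where the real
   decoder reads them from the bit stream. It rejects a leading 16 and any run
   that overshoots the expected count. The name is hypothetical.

```c
#include <assert.h>

// Expands code-length symbols into `out` (capacity cap). extra[i] holds the
// value of the extra bits for syms[i] (2 bits for 16, 3 for 17, 7 for 18).
// Returns the number of lengths produced, or -1 on malformed input.
static int sketch_expand_lengths(const int *syms, const int *extra, int nsyms,
                                 unsigned char *out, int cap)
{
   int i, n = 0;
   for (i = 0; i < nsyms; ++i) {
      int c = syms[i], run, k;
      unsigned char fill = 0;
      if (c >= 0 && c < 16) {                 // literal code length
         if (n >= cap) return -1;
         out[n++] = (unsigned char) c;
         continue;
      }
      if (c == 16) {                          // repeat previous length 3..6 times
         if (n == 0) return -1;
         fill = out[n - 1];
         run = 3 + extra[i];
      } else if (c == 17) {                   // 3..10 zeros
         run = 3 + extra[i];
      } else if (c == 18) {                   // 11..138 zeros
         run = 11 + extra[i];
      } else {
         return -1;
      }
      if (cap - n < run) return -1;           // run would exceed the expected count
      for (k = 0; k < run; ++k) out[n++] = fill;
   }
   return n;
}
```

The symbols 8, 16 (extra 1), 17 (extra 0), 18 (extra 5) expand to five 8s
followed by 3 + 16 zeros, 24 lengths in all.
*/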

static int stbi__parse_uncomperssed_block(stbi__zbuf *a)
{
   stbi_uc header[4];
   int len,nlen,k;
   if (a->num_bits & 7)
      stbi__zreceive(a, a->num_bits & 7); // discard
   // drain the bit-packed data into header
   k = 0;
   while (a->num_bits > 0) {
      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
      a->code_buffer >>= 8;
      a->num_bits -= 8;
   }
   STBI_ASSERT(a->num_bits == 0);
   // now fill header the normal way
   while (k < 4)
      header[k++] = stbi__zget8(a);
   len  = header[1] * 256 + header[0];
   nlen = header[3] * 256 + header[2];
   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
   if (a->zout + len > a->zout_end)
      if (!stbi__zexpand(a, a->zout, len)) return 0;
   memcpy(a->zout, a->zbuffer, len);
   a->zbuffer += len;
   a->zout += len;
   return 1;
}
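/* Illustrative sketch, not part of stb_image: a stored (type 0) block begins at
   the next byte boundary with LEN and NLEN, two little-endian 16-bit fields in
   which NLEN must be the one's complement of LEN (RFC 1951, section 3.2.4). The
   function above assembles those four bytes, partly from whole bytes still held
   in code_buffer, and then applies this check. sketch_stored_len is a
   hypothetical helper working on the four header bytes directly.

```c
#include <assert.h>

// Returns LEN from a stored-block header, or -1 if NLEN != ~LEN (16-bit).
static int sketch_stored_len(const unsigned char h[4])
{
   int len  = h[1] * 256 + h[0];   // little-endian LEN
   int nlen = h[3] * 256 + h[2];   // little-endian NLEN
   return nlen == (len ^ 0xffff) ? len : -1;
}
```

The header bytes 05 00 FA FF announce 5 stored bytes; the raw payload then
follows and is copied with a single memcpy.
*/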

static int stbi__parse_zlib_header(stbi__zbuf *a)
{
   int cmf   = stbi__zget8(a);
   int cm    = cmf & 15;
   /* int cinfo = cmf >> 4; */
   int flg   = stbi__zget8(a);
   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
   return 1;
}
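/* Illustrative sketch, not part of stb_image: the two-byte zlib header (RFC 1950,
   section 2.2) packs the compression method CM and window size CINFO into CMF,
   and the check bits plus the FDICT flag into FLG; CMF * 256 + FLG must be a
   multiple of 31. The sketch repeats the three checks above and also computes
   the window size that the comment mentions but the decoder does not need,
   since it buffers all output. The names are hypothetical.

```c
#include <assert.h>

// Returns 1 if stbi__parse_zlib_header would accept this CMF/FLG pair.
static int sketch_zlib_header_ok(unsigned char cmf, unsigned char flg)
{
   if ((cmf * 256 + flg) % 31 != 0) return 0;   // FCHECK
   if (flg & 0x20) return 0;                    // FDICT: preset dictionary
   if ((cmf & 15) != 8) return 0;               // CM: 8 means deflate
   return 1;
}

// LZ77 window size from CINFO: 1 << (CINFO + 8).
static long sketch_zlib_window_size(unsigned char cmf)
{
   return 1L << (8 + (cmf >> 4));
}
```

The common headers 78 9C and 78 01 pass; 78 BB satisfies the checksum but sets
FDICT, so it is rejected just as the decoder above rejects it.
*/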

// @TODO: should statically initialize these for optimal thread safety
static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
static void stbi__init_zdefaults(void)
{
   int i;   // use <= to match clearly with spec
   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;

   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
}
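/* Illustrative sketch, not part of stb_image: type 1 blocks use the fixed code
   lengths of RFC 1951, section 3.2.6, which stbi__init_zdefaults fills in:
   8 bits for symbols 0-143, 9 for 144-255, 7 for 256-279, 8 for 280-287, and
   5 for all 32 distance codes. Both sets form complete prefix codes, which the
   Kraft sum confirms: the sum of 2^-len over all codes is exactly 1. The sketch
   scales that sum by 2^15 to stay in integers; the names are hypothetical.

```c
#include <assert.h>

// Fills the fixed literal/length and distance code lengths.
static void sketch_fixed_lengths(unsigned char lit[288], unsigned char dist[32])
{
   int i;
   for (i = 0; i <= 143; ++i) lit[i] = 8;
   for (   ; i <= 255; ++i) lit[i] = 9;
   for (   ; i <= 279; ++i) lit[i] = 7;
   for (   ; i <= 287; ++i) lit[i] = 8;
   for (i = 0; i < 32; ++i) dist[i] = 5;
}

// Kraft sum scaled by 2^15 (lengths 1..15); equals 1 << 15 for a complete code.
static long sketch_kraft_scaled(const unsigned char *lengths, int num)
{
   long sum = 0;
   int i;
   for (i = 0; i < num; ++i)
      if (lengths[i]) sum += 1L << (15 - lengths[i]);
   return sum;
}
```

For the literal/length table the scaled terms are 144 * 128 + 112 * 64 +
24 * 256 + 8 * 128 = 32768, exactly 2^15.
*/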

static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
{
   int final, type;
   if (parse_header)
      if (!stbi__parse_zlib_header(a)) return 0;
   a->num_bits = 0;
   a->code_buffer = 0;
   do {
      final = stbi__zreceive(a,1);
      type = stbi__zreceive(a,2);
      if (type == 0) {
         if (!stbi__parse_uncomperssed_block(a)) return 0;
      } else if (type == 3) {
         return 0;
      } else {
         if (type == 1) {
            // use fixed code lengths
            if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , 288)) return 0;
            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
         } else {
            if (!stbi__compute_huffman_codes(a)) return 0;
         }
         if (!stbi__parse_huffman_block(a)) return 0;
      }
   } while (!final);
   return 1;
}

static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
{
   a->zout_start = obuf;
   a->zout       = obuf;
   a->zout_end   = obuf + olen;
   a->z_expandable = exp;

   return stbi__parse_zlib(a, parse_header);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
{
   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
}

STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(initial_size);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer + len;
   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}

STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
{
   stbi__zbuf a;
   char *p = (char *) stbi__malloc(16384);
   if (p == NULL) return NULL;
   a.zbuffer = (stbi_uc *) buffer;
   a.zbuffer_end = (stbi_uc *) buffer+len;
   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
      if (outlen) *outlen = (int) (a.zout - a.zout_start);
      return a.zout_start;
   } else {
      STBI_FREE(a.zout_start);
      return NULL;
   }
}

STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
{
   stbi__zbuf a;
   a.zbuffer = (stbi_uc *) ibuffer;
   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
      return (int) (a.zout - a.zout_start);
   else
      return -1;
}
#endif

// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
//    simple implementation
//      - 8-bit samples and below (no 16-bit)
//      - no CRC checking
//      - allocates lots of intermediate memory
//        - avoids problem of streaming data between subsystems
//        - avoids explicit window management
//    performance
//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding

#ifndef STBI_NO_PNG
typedef struct
{
   stbi__uint32 length;
   stbi__uint32 type;
} stbi__pngchunk;

static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
{
   stbi__pngchunk c;
   c.length = stbi__get32be(s);
   c.type   = stbi__get32be(s);
   return c;
}

static int stbi__check_png_header(stbi__context *s)
{
   static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
   int i;
   for (i=0; i < 8; ++i)
      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
   return 1;
}
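/* Illustrative sketch, not part of stb_image: a PNG file opens with a fixed
   8-byte signature, followed by chunks whose headers are a big-endian 32-bit
   length and a 4-byte type code. stbi__get_chunk_header reads both with
   stbi__get32be, so a chunk type can be compared as a single 32-bit number. The
   sketch works on an in-memory buffer rather than an stbi__context; the names
   are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char sketch_png_sig[8] = { 137,80,78,71,13,10,26,10 };

// True if buf starts with the PNG signature.
static int sketch_is_png(const unsigned char *buf, size_t len)
{
   return len >= 8 && memcmp(buf, sketch_png_sig, 8) == 0;
}

// Big-endian 32-bit read, most significant byte first.
static unsigned long sketch_get32be(const unsigned char *p)
{
   return ((unsigned long) p[0] << 24) | ((unsigned long) p[1] << 16)
        | ((unsigned long) p[2] << 8)  |  (unsigned long) p[3];
}
```

In a typical file the signature is followed by the IHDR chunk header, length 13
and type 0x49484452 ("IHDR").
*/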

typedef struct
{
   stbi__context *s;
   stbi_uc *idata, *expanded, *out;
} stbi__png;


enum {
   STBI__F_none=0,
   STBI__F_sub=1,
   STBI__F_up=2,
   STBI__F_avg=3,
   STBI__F_paeth=4,
   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static stbi_uc first_row_filter[5] =
{
   STBI__F_none,
   STBI__F_sub,
   STBI__F_none,
   STBI__F_avg_first,
   STBI__F_paeth_first
};

static int stbi__paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p-a);
   int pb = abs(p-b);
   int pc = abs(p-c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}
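/* Illustrative sketch, not part of stb_image: each PNG scanline starts with a
   filter-type byte, and reconstruction adds a predictor built from the byte to
   the left (a), above (b), and upper-left (c), where "left" means bpp bytes back,
   one whole pixel. The decoder below unrolls this per filter and per component;
   the sketch is a plain per-byte version under the same definitions that treats
   a missing previous row as zeros, which is what the STBI__F_avg_first and
   STBI__F_paeth_first variants encode. It also shows why the first-row mapping
   works: with b = c = 0, Paeth reduces to a (the Sub filter), and on a row's
   first pixel, with a = c = 0, it reduces to b. The names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

// Paeth predictor, same tie-breaking order as stbi__paeth.
static int sketch_paeth(int a, int b, int c)
{
   int p = a + b - c;
   int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

// Reconstructs one row of len bytes from raw into cur. prior is the previous
// reconstructed row, or NULL for the first row. Returns 0 for an unknown filter.
static int sketch_unfilter_row(int filter, const unsigned char *raw, const unsigned char *prior,
                               unsigned char *cur, int len, int bpp)
{
   int k;
   for (k = 0; k < len; ++k) {
      int a = k >= bpp ? cur[k - bpp] : 0;                // left
      int b = prior ? prior[k] : 0;                       // above
      int c = (prior && k >= bpp) ? prior[k - bpp] : 0;   // upper-left
      int pred;
      switch (filter) {
         case 0: pred = 0; break;                         // None
         case 1: pred = a; break;                         // Sub
         case 2: pred = b; break;                         // Up
         case 3: pred = (a + b) >> 1; break;              // Average
         case 4: pred = sketch_paeth(a, b, c); break;     // Paeth
         default: return 0;
      }
      cur[k] = (unsigned char) (raw[k] + pred);           // arithmetic is modulo 256
   }
   return 1;
}
```

With the Sub filter and one byte per pixel, the raw row 10 5 5 5 reconstructs
to 10 15 20 25.
*/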

static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
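/* Illustrative sketch, not part of stb_image: for bit depths 1, 2, 4, and 8,
   stbi__depth_scale_table holds the factor that maps the largest sample value
   (2^depth - 1) onto 255. Since 255 = 1 * 255 = 3 * 85 = 15 * 17, the scaling is
   exact and spaces the intermediate values evenly. The sketch applies the same
   factors; presumably this is how low-bit-depth samples are widened to 8 bits
   later in the decoder, but that code lies outside this excerpt. The names are
   hypothetical.

```c
#include <assert.h>

// Same factors as stbi__depth_scale_table, indexed by bit depth.
static const unsigned char sketch_depth_scale[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };

// Widens a depth-bit sample v (0 <= v < 2^depth) to the 0..255 range.
static unsigned char sketch_scale_sample(unsigned v, int depth)
{
   return (unsigned char) (v * sketch_depth_scale[depth]);
}
```

A 2-bit sample therefore takes the values 0, 85, 170, and 255.
*/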

// create the png data from the decompressed (inflated) data
static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
{
   stbi__context *s = a->s;
   stbi__uint32 i,j,stride = x*out_n;
   stbi__uint32 img_len, img_width_bytes;
   int k;
   int img_n = s->img_n; // copy it into a local for later

   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
   a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into
   if (!a->out) return stbi__err("outofmem", "Out of memory");

   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
   img_len = (img_width_bytes + 1) * y;
   if (s->img_x == x && s->img_y == y) {
      if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
   } else { // interlaced:
      if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
   }
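/* Illustrative sketch, not part of stb_image: the size checks above rest on two
   small formulas. An unfiltered scanline holds img_n * x samples of depth bits,
   rounded up to whole bytes (img_width_bytes), and in the decompressed stream
   each row is preceded by one filter-type byte, giving img_len. When x and y
   differ from the full image size (the interlaced case), only a lower bound is
   checked, presumably because the buffer also holds the remaining passes. The
   helper names are hypothetical.

```c
#include <assert.h>

// Bytes per unfiltered scanline, rounded up to whole bytes.
static unsigned long sketch_row_bytes(unsigned long x, int img_n, int depth)
{
   return ((unsigned long) img_n * x * (unsigned long) depth + 7) >> 3;
}

// Decompressed size of a non-interlaced image: one filter byte per row.
static unsigned long sketch_raw_len(unsigned long x, unsigned long y, int img_n, int depth)
{
   return (sketch_row_bytes(x, img_n, depth) + 1) * y;
}
```

A 10-pixel-wide 1-bit grayscale row needs 2 bytes plus its filter byte, so four
such rows take 12 bytes; a 3x2 RGB image at 8 bits takes (9 + 1) * 2 = 20.
*/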

   for (j=0; j < y; ++j) {
      stbi_uc *cur = a->out + stride*j;
      stbi_uc *prior;
      int filter = *raw++;
      int filter_bytes = img_n;
      int width = x;
      if (filter > 4)
         return stbi__err("invalid filter","Corrupt PNG");

      if (depth < 8) {
         STBI_ASSERT(img_width_bytes <= x);
         cur += x*out_n - img_width_bytes; // store output to the rightmost img_width_bytes bytes, so we can decode in place
         filter_bytes = 1;
         width = img_width_bytes;
      }
      prior = cur - stride; // compute after adjusting cur, so prior sits at the same offset in the previous row

      // if first row, use special filter that doesn't sample previous row
      if (j == 0) filter = first_row_filter[filter];

      // handle first byte explicitly
      for (k=0; k < filter_bytes; ++k) {
         switch (filter) {
            case STBI__F_none       : cur[k] = raw[k]; break;
            case STBI__F_sub        : cur[k] = raw[k]; break;
            case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
            case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
            case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
            case STBI__F_avg_first  : cur[k] = raw[k]; break;
            case STBI__F_paeth_first: cur[k] = raw[k]; break;
         }
      }

      if (depth == 8) {
         if (img_n != out_n)
            cur[img_n] = 255; // first pixel
         raw += img_n;
         cur += out_n;
         prior += out_n;
      } else {
         raw += 1;
         cur += 1;
         prior += 1;
      }

      // this is a little gross, so that we don't switch per-pixel or per-component
      if (depth < 8 || img_n == out_n) {
         int nk = (width - 1)*img_n;
         #define CASE(f) \
             case f:     \
                for (k=0; k < nk; ++k)
         switch (filter) {
            // "none" filter turns into a memcpy here; make that explicit.
            case STBI__F_none:         memcpy(cur, raw, nk); break;
            CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break;
            CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
            CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break;
            CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break;
            CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break;
            CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break;
         }
         #undef CASE
         raw += nk;
      } else {
         STBI_ASSERT(img_n+1 == out_n);
         #define CASE(f) \
             case f:     \
                for (i=x-1; i >= 1; --i, cur[img_n]=255,raw+=img_n,cur+=out_n,prior+=out_n) \
                   for (k=0; k < img_n; ++k)
3946          switch (filter) {
3947             CASE(STBI__F_none)         cur[k] = raw[k]; break;
3948             CASE(STBI__F_sub)          cur[k] = STBI__BYTECAST(raw[k] + cur[k-out_n]); break;
3949             CASE(STBI__F_up)           cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
3950             CASE(STBI__F_avg)          cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-out_n])>>1)); break;
3951             CASE(STBI__F_paeth)        cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],prior[k],prior[k-out_n])); break;
3952             CASE(STBI__F_avg_first)    cur[k] = STBI__BYTECAST(raw[k] + (cur[k-out_n] >> 1)); break;
3953             CASE(STBI__F_paeth_first)  cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-out_n],0,0)); break;
3954          }
3955          #undef CASE
3956       }
3957    }
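   /*
      Sketch (not part of stb_image): the loop above reverses PNG's five
      per-scanline filters, with extra bookkeeping for out_n and the first
      pixel. The standalone version below reconstructs one scanline of bytes
      in place, assuming `bpp` is the byte distance to the "left" neighbour
      (filter_bytes above) and treating a missing previous row as zeros,
      which is what first_row_filter achieves. paeth_predict and
      unfilter_scanline are hypothetical names; the predictor follows the
      PNG specification.
```c
#include <assert.h>
#include <stdlib.h>

// Paeth predictor from the PNG specification: choose whichever of left (a),
// above (b) and upper-left (c) is closest to a + b - c, preferring a, then b.
static int paeth_predict(int a, int b, int c)
{
   int p  = a + b - c;
   int pa = abs(p - a);
   int pb = abs(p - b);
   int pc = abs(p - c);
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc) return b;
   return c;
}

// Undo filter type `filter` (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth) on one
// scanline of `len` bytes, in place. `prior` is the previous reconstructed
// scanline, or NULL for the first row. Returns 0 for an invalid filter type.
static int unfilter_scanline(unsigned char *row, const unsigned char *prior,
                             int len, int bpp, int filter)
{
   int k;
   assert(bpp >= 1);
   if (filter < 0 || filter > 4) return 0;
   for (k = 0; k < len; ++k) {
      int left = k >= bpp ? row[k - bpp] : 0;          // already reconstructed
      int up   = prior ? prior[k] : 0;
      int ul   = (prior && k >= bpp) ? prior[k - bpp] : 0;
      int pred;
      switch (filter) {
         case 0:  pred = 0; break;                           // None
         case 1:  pred = left; break;                        // Sub
         case 2:  pred = up; break;                          // Up
         case 3:  pred = (left + up) >> 1; break;            // Average
         default: pred = paeth_predict(left, up, ul); break; // Paeth
      }
      row[k] = (unsigned char) (row[k] + pred);        // arithmetic is modulo 256
   }
   return 1;
}
```
      Decoding left to right in place works because Sub, Average and Paeth
      only read bytes to the left that have already been reconstructed.
   */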

   // we make a separate pass to expand bits to pixels; for performance,
   // this could run two scanlines behind the above code, so it won't
   // interfere with filtering but will still be in the cache.
   if (depth < 8) {
      for (j=0; j < y; ++j) {
         stbi_uc *cur = a->out + stride*j;
         stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
         // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
         // PNG scanlines are byte-aligned, so if the width is not a multiple of 8/4/2 the last input byte has unused trailing bits; the tail code below stops at exactly x*img_n samples
         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range

         // note that the final byte might overshoot and write more data than desired.
         // we can allocate enough data that this never writes out of memory, but it
         // could also overwrite the next scanline. can it overwrite non-empty data
         // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
         // so we need to explicitly clamp the final ones

         if (depth == 4) {
            for (k=x*img_n; k >= 2; k-=2, ++in) {
               *cur++ = scale * ((*in >> 4)       );
               *cur++ = scale * ((*in     ) & 0x0f);
            }
            if (k > 0) *cur++ = scale * ((*in >> 4)       );
         } else if (depth == 2) {
            for (k=x*img_n; k >= 4; k-=4, ++in) {
               *cur++ = scale * ((*in >> 6)       );
               *cur++ = scale * ((*in >> 4) & 0x03);
               *cur++ = scale * ((*in >> 2) & 0x03);
               *cur++ = scale * ((*in     ) & 0x03);
            }
            if (k > 0) *cur++ = scale * ((*in >> 6)       );
            if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
            if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
         } else if (depth == 1) {
            for (k=x*img_n; k >= 8; k-=8, ++in) {
               *cur++ = scale * ((*in >> 7)       );
               *cur++ = scale * ((*in >> 6) & 0x01);
               *cur++ = scale * ((*in >> 5) & 0x01);
               *cur++ = scale * ((*in >> 4) & 0x01);
               *cur++ = scale * ((*in >> 3) & 0x01);
               *cur++ = scale * ((*in >> 2) & 0x01);
               *cur++ = scale * ((*in >> 1) & 0x01);
               *cur++ = scale * ((*in     ) & 0x01);
            }
            if (k > 0) *cur++ = scale * ((*in >> 7)       );
            if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
            if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
            if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
            if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
            if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
            if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
         }
         if (img_n != out_n) {
            // insert alpha = 255
            stbi_uc *cur = a->out + stride*j;
            int i;
            if (img_n == 1) {
               for (i=x-1; i >= 0; --i) {
                  cur[i*2+1] = 255;
                  cur[i*2+0] = cur[i];
               }
            } else {
               STBI_ASSERT(img_n == 3);
               for (i=x-1; i >= 0; --i) {
                  cur[i*4+3] = 255;
                  cur[i*4+2] = cur[i*3+2];
                  cur[i*4+1] = cur[i*3+1];
                  cur[i*4+0] = cur[i*3+0];
               }
            }
         }
      }
   }

   return 1;
}
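/*
   Sketch (not part of stb_image): the expansion pass in
   stbi__create_png_image_raw pulls 1/2/4-bit samples out of each packed
   byte, most significant bits first, and scales grayscale values by a
   depth-dependent factor. The helper below (hypothetical name
   unpack_samples) does the same extraction for a known sample count. It
   hard-codes 0xff, 0x55 and 0x11, the factors that map the largest 1-, 2-
   and 4-bit values (1, 3, 15) to 255; stbi__depth_scale_table is defined
   outside this excerpt and is assumed to hold the same values.
```c
#include <assert.h>

// Expand `count` samples of `depth` bits (1, 2 or 4), packed most significant
// bits first, into one byte per sample. With scale_gray set, multiply by the
// factor that maps the largest sample value to 255.
static void unpack_samples(const unsigned char *in, unsigned char *out,
                           int count, int depth, int scale_gray)
{
   int i, per_byte;
   unsigned mask, scale = 1;
   assert(depth == 1 || depth == 2 || depth == 4);
   per_byte = 8 / depth;                         // samples per input byte
   mask = (1u << depth) - 1;
   if (scale_gray)
      scale = depth == 1 ? 0xff : depth == 2 ? 0x55 : 0x11;
   for (i = 0; i < count; ++i) {
      int shift = 8 - depth * (i % per_byte + 1);  // leftmost sample first
      unsigned v = (in[i / per_byte] >> shift) & mask;
      out[i] = (unsigned char) (v * scale);
   }
}
```
   Because it is driven by the sample count rather than the byte count, it
   never writes more than `count` output bytes, which is the overshoot the
   clamped tail code above guards against.
*/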

static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
{
   stbi_uc *final;
   int p;
   if (!interlaced)
      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);

   // de-interlacing
   final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n);
   if (!final) return stbi__err("outofmem", "Out of memory");
   for (p=0; p < 7; ++p) {
      int xorig[] = { 0,4,0,2,0,1,0 };
      int yorig[] = { 0,0,4,0,2,0,1 };
      int xspc[]  = { 8,8,4,4,2,2,1 };
      int yspc[]  = { 8,8,8,4,4,2,2 };
      int i,j,x,y;
      // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
      if (x && y) {
         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
            STBI_FREE(final);
            return 0;
         }
         for (j=0; j < y; ++j) {
            for (i=0; i < x; ++i) {
               int out_y = j*yspc[p]+yorig[p];
               int out_x = i*xspc[p]+xorig[p];
               memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n,
                      a->out + (j*x+i)*out_n, out_n);
            }
         }
         STBI_FREE(a->out);
         image_data += img_len;
         image_data_len -= img_len;
      }
   }
   a->out = final;

   return 1;
}
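/*
   Sketch (not part of stb_image): the seven Adam7 passes used above, each a
   sub-image that starts at (xorig, yorig) and takes every xspc-th column of
   every yspc-th row. adam7_pass_size and adam7_pass_bytes are hypothetical
   helpers computing the same x, y and img_len values as the loop; a pass
   with zero width or height is skipped and consumes no bytes.
```c
#include <assert.h>

static const int adam7_xorig[7] = { 0,4,0,2,0,1,0 };
static const int adam7_yorig[7] = { 0,0,4,0,2,0,1 };
static const int adam7_xspc[7]  = { 8,8,4,4,2,2,1 };
static const int adam7_yspc[7]  = { 8,8,8,4,4,2,2 };

// Width and height of reduced image p (0..6) of a w x h image; either may be 0.
static void adam7_pass_size(int p, int w, int h, int *pw, int *ph)
{
   assert(p >= 0 && p < 7 && w > 0 && h > 0);
   *pw = (w - adam7_xorig[p] + adam7_xspc[p] - 1) / adam7_xspc[p];
   *ph = (h - adam7_yorig[p] + adam7_yspc[p] - 1) / adam7_yspc[p];
}

// Bytes pass p occupies in the inflated stream: per row, one filter-type
// byte plus the packed samples (the img_len computation above).
static unsigned adam7_pass_bytes(int p, int w, int h, int channels, int depth)
{
   int pw, ph;
   adam7_pass_size(p, w, h, &pw, &ph);
   if (pw == 0 || ph == 0) return 0;             // empty pass: not stored
   return (unsigned) ((((channels * pw * depth) + 7) >> 3) + 1) * ph;
}
```
   The seven passes partition every 8x8 tile, so the pass sizes of any
   image always add up to its width times height.
*/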

static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
{
   stbi__context *s = z->s;
   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   stbi_uc *p = z->out;

   // compute color-based transparency, assuming we've
   // already got 255 as the alpha value in the output
   STBI_ASSERT(out_n == 2 || out_n == 4);

   if (out_n == 2) {
      for (i=0; i < pixel_count; ++i) {
         p[1] = (p[0] == tc[0] ? 0 : 255);
         p += 2;
      }
   } else {
      for (i=0; i < pixel_count; ++i) {
         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
            p[3] = 0;
         p += 4;
      }
   }
   return 1;
}

static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
{
   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
   stbi_uc *p, *temp_out, *orig = a->out;

   p = (stbi_uc *) stbi__malloc(pixel_count * pal_img_n);
   if (p == NULL) return stbi__err("outofmem", "Out of memory");

   // between here and free(out) below, exiting would leak
   temp_out = p;

   if (pal_img_n == 3) {
      for (i=0; i < pixel_count; ++i) {
         int n = orig[i]*4;
         p[0] = palette[n  ];
         p[1] = palette[n+1];
         p[2] = palette[n+2];
         p += 3;
      }
   } else {
      for (i=0; i < pixel_count; ++i) {
         int n = orig[i]*4;
         p[0] = palette[n  ];
         p[1] = palette[n+1];
         p[2] = palette[n+2];
         p[3] = palette[n+3];
         p += 4;
      }
   }
   STBI_FREE(a->out);
   a->out = temp_out;

   STBI_NOTUSED(len);

   return 1;
}
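/*
   Sketch (not part of stb_image): palette expansion as done by
   stbi__expand_png_palette. As in stbi__parse_png_file, the palette is
   stored as 4 bytes per entry (R, G, B, A, with A set to 255 unless a tRNS
   chunk overrides it), so the lookup is index times 4 and the RGB and RGBA
   cases read the same table. expand_palette is a hypothetical name.
```c
#include <assert.h>
#include <stddef.h>

// Expand 8-bit palette indices to RGB (out_n == 3) or RGBA (out_n == 4).
// `palette` holds 256 entries of 4 bytes each: R, G, B, A.
static void expand_palette(const unsigned char *indices, size_t pixel_count,
                           const unsigned char *palette,
                           unsigned char *out, int out_n)
{
   size_t i;
   int c;
   assert(out_n == 3 || out_n == 4);
   for (i = 0; i < pixel_count; ++i) {
      const unsigned char *entry = palette + indices[i] * 4;
      for (c = 0; c < out_n; ++c)                // copy RGB, plus A for RGBA
         *out++ = entry[c];
   }
}
```
   Neither this sketch nor the function above checks indices against the
   number of PLTE entries (pal_len); an out-of-range index reads whatever
   the unused table slots contain.
*/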

static int stbi__unpremultiply_on_load = 0;
static int stbi__de_iphone_flag = 0;

STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
{
   stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
}

STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
{
   stbi__de_iphone_flag = flag_true_if_should_convert;
}

static void stbi__de_iphone(stbi__png *z)
{
   stbi__context *s = z->s;
   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
   stbi_uc *p = z->out;

   if (s->img_out_n == 3) {  // convert bgr to rgb
      for (i=0; i < pixel_count; ++i) {
         stbi_uc t = p[0];
         p[0] = p[2];
         p[2] = t;
         p += 3;
      }
   } else {
      STBI_ASSERT(s->img_out_n == 4);
      if (stbi__unpremultiply_on_load) {
         // convert bgr to rgb and unpremultiply
         for (i=0; i < pixel_count; ++i) {
            stbi_uc a = p[3];
            stbi_uc t = p[0];
            if (a) {
               p[0] = p[2] * 255 / a;
               p[1] = p[1] * 255 / a;
               p[2] =  t   * 255 / a;
            } else {
               p[0] = p[2];
               p[2] = t;
            }
            p += 4;
         }
      } else {
         // convert bgr to rgb
         for (i=0; i < pixel_count; ++i) {
            stbi_uc t = p[0];
            p[0] = p[2];
            p[2] = t;
            p += 4;
         }
      }
   }
}
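/*
   Sketch (not part of stb_image): what stbi__de_iphone does for 4-channel
   output. Files carrying the Apple CgBI chunk store blue before red, and
   the colour may be premultiplied by alpha; the conversion swaps B and R
   and, when requested, divides by alpha. bgra_to_rgba is a hypothetical
   name.
```c
#include <assert.h>
#include <stddef.h>

// Swap B and R in each 4-byte pixel; if `unpremultiply` is set and alpha is
// non-zero, also scale colour back up by 255 / alpha.
static void bgra_to_rgba(unsigned char *p, size_t pixel_count, int unpremultiply)
{
   size_t i;
   assert(p != NULL || pixel_count == 0);
   for (i = 0; i < pixel_count; ++i, p += 4) {
      unsigned char b = p[0], g = p[1], r = p[2], a = p[3];
      if (unpremultiply && a) {
         p[0] = (unsigned char) (r * 255 / a);
         p[1] = (unsigned char) (g * 255 / a);
         p[2] = (unsigned char) (b * 255 / a);
      } else {                                    // plain swap, G and A unchanged
         p[0] = r;
         p[2] = b;
      }
   }
}
```
   Like the original, the division is not clamped, so a malformed pixel
   whose colour exceeds its alpha wraps around instead of saturating at 255.
*/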

#define STBI__PNG_TYPE(a,b,c,d)  (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
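/*
   Sketch (not part of stb_image): STBI__PNG_TYPE packs a four-letter chunk
   name big-endian into one integer, so chunk types can be used as case
   labels. The default case of stbi__parse_png_file tests bit 29, which is
   bit 5 of the first letter: lowercase (bit set) marks an ancillary chunk
   that may be skipped, uppercase marks a critical one that must be
   understood. PNG_FOURCC and png_chunk_is_critical are hypothetical names;
   the uint32_t casts are a choice of this sketch that keeps the shifts well
   defined for any byte value.
```c
#include <assert.h>
#include <stdint.h>

// Four chunk-name bytes, first byte in the most significant position.
#define PNG_FOURCC(a,b,c,d) \
   (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

// Critical chunks have an uppercase first letter, i.e. bit 29 clear.
static int png_chunk_is_critical(uint32_t type)
{
   return (type & (UINT32_C(1) << 29)) == 0;
}
```
   'CgBI' starts with an uppercase letter, so it counts as critical, which
   is why the parser has to recognise it explicitly instead of skipping it.
*/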

static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
{
   stbi_uc palette[1024], pal_img_n=0;
   stbi_uc has_trans=0, tc[3];
   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
   int first=1,k,interlace=0, color=0, depth=0, is_iphone=0;
   stbi__context *s = z->s;

   z->expanded = NULL;
   z->idata = NULL;
   z->out = NULL;

   if (!stbi__check_png_header(s)) return 0;

   if (scan == STBI__SCAN_type) return 1;

   for (;;) {
      stbi__pngchunk c = stbi__get_chunk_header(s);
      switch (c.type) {
         case STBI__PNG_TYPE('C','g','B','I'):
            is_iphone = 1;
            stbi__skip(s, c.length);
            break;
         case STBI__PNG_TYPE('I','H','D','R'): {
            int comp,filter;
            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
            first = 0;
            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
            s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
            s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
            depth = stbi__get8(s);  if (depth != 1 && depth != 2 && depth != 4 && depth != 8)  return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
            if (!pal_img_n) {
               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
               if (scan == STBI__SCAN_header) return 1;
            } else {
               // if paletted, then pal_img_n is our final components, and
               // img_n is # components to decompress/filter.
               s->img_n = 1;
               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
               // if SCAN_header, have to scan to see if we have a tRNS
            }
            break;
         }

         case STBI__PNG_TYPE('P','L','T','E'):  {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
            pal_len = c.length / 3;
            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
            for (i=0; i < pal_len; ++i) {
               palette[i*4+0] = stbi__get8(s);
               palette[i*4+1] = stbi__get8(s);
               palette[i*4+2] = stbi__get8(s);
               palette[i*4+3] = 255;
            }
            break;
         }

         case STBI__PNG_TYPE('t','R','N','S'): {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
            if (pal_img_n) {
               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
               pal_img_n = 4;
               for (i=0; i < c.length; ++i)
                  palette[i*4+3] = stbi__get8(s);
            } else {
               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
               has_trans = 1;
               for (k=0; k < s->img_n; ++k)
                  tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger
            }
            break;
         }

         case STBI__PNG_TYPE('I','D','A','T'): {
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
            if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
            if (ioff + c.length > idata_limit) {
               stbi_uc *p;
               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
               while (ioff + c.length > idata_limit)
                  idata_limit *= 2;
               p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
               z->idata = p;
            }
            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
            ioff += c.length;
            break;
         }

         case STBI__PNG_TYPE('I','E','N','D'): {
            stbi__uint32 raw_len, bpl;
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if (scan != STBI__SCAN_load) return 1;
            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
            // initial guess for decoded data size to avoid unnecessary reallocs
            bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component
            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
            if (z->expanded == NULL) return 0; // zlib should set error
            STBI_FREE(z->idata); z->idata = NULL;
            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
               s->img_out_n = s->img_n+1;
            else
               s->img_out_n = s->img_n;
            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0;
            if (has_trans)
               if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
               stbi__de_iphone(z);
            if (pal_img_n) {
               // pal_img_n == 3 or 4
               s->img_n = pal_img_n; // record the actual colors we had
               s->img_out_n = pal_img_n;
               if (req_comp >= 3) s->img_out_n = req_comp;
               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
                  return 0;
            }
            STBI_FREE(z->expanded); z->expanded = NULL;
            return 1;
         }

         default:
            // if critical, fail
            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
            if ((c.type & (1 << 29)) == 0) {
               #ifndef STBI_NO_FAILURE_STRINGS
               // not threadsafe
               static char invalid_chunk[] = "XXXX PNG chunk not known";
               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
               #endif
               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
            }
            stbi__skip(s, c.length);
            break;
      }
      // end of PNG chunk, read and skip CRC
      stbi__get32be(s);
   }
}
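/*
   Sketch (not part of stb_image): the IHDR colour-type logic above, as a
   hypothetical helper png_ihdr_channels that returns the number of samples
   per pixel fed to the unfilter stage (a palette index counts as one), or
   0 where this decoder rejects the header. In the colour type, the value-2
   bit means colour (3 samples instead of 1), the value-4 bit adds an alpha
   sample, and type 3 is palette-indexed.
```c
#include <assert.h>

// Samples per pixel before palette expansion, or 0 if rejected.
static int png_ihdr_channels(int color, int depth)
{
   if (depth != 1 && depth != 2 && depth != 4 && depth != 8) return 0;
   if (color < 0 || color > 6) return 0;
   if (color == 3) return 1;                     // palette index
   if (color & 1) return 0;                      // types 1 and 5 are invalid
   return (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
}
```
   Like the checks above, this does not enforce the PNG specification's
   per-colour-type bit-depth rules; it only rejects bit depths other than
   1, 2, 4 and 8 and invalid colour types.
*/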

static unsigned char *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp)
{
   unsigned char *result=NULL;
   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
      result = p->out;
      p->out = NULL;
      if (req_comp && req_comp != p->s->img_out_n) {
         result = stbi__convert_format(result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
         p->s->img_out_n = req_comp;
         if (result == NULL) return result;
      }
      *x = p->s->img_x;
      *y = p->s->img_y;
      if (n) *n = p->s->img_out_n;
   }
   STBI_FREE(p->out);      p->out      = NULL;
   STBI_FREE(p->expanded); p->expanded = NULL;
   STBI_FREE(p->idata);    p->idata    = NULL;

   return result;
}

static unsigned char *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi__png p;
   p.s = s;
   return stbi__do_png(&p, x,y,comp,req_comp);
}

static int stbi__png_test(stbi__context *s)
{
   int r;
   r = stbi__check_png_header(s);
   stbi__rewind(s);
   return r;
}

static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
{
   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
      stbi__rewind( p->s );
      return 0;
   }
   if (x) *x = p->s->img_x;
   if (y) *y = p->s->img_y;
   if (comp) *comp = p->s->img_n;
   return 1;
}

static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__png p;
   p.s = s;
   return stbi__png_info_raw(&p, x, y, comp);
}
#endif

// Microsoft/Windows BMP image

#ifndef STBI_NO_BMP
static int stbi__bmp_test_raw(stbi__context *s)
{
   int r;
   int sz;
   if (stbi__get8(s) != 'B') return 0;
   if (stbi__get8(s) != 'M') return 0;
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   stbi__get32le(s); // discard data offset
   sz = stbi__get32le(s);
   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
   return r;
}

static int stbi__bmp_test(stbi__context *s)
{
   int r = stbi__bmp_test_raw(s);
   stbi__rewind(s);
   return r;
}
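/*
   Sketch (not part of stb_image): the stbi__bmp_test_raw check applied to
   an in-memory buffer. The 14-byte file header is "BM", a 32-bit file
   size, two 16-bit reserved fields and the 32-bit pixel-data offset; the
   little-endian DIB header size that follows identifies the header variant
   (12 for BITMAPCOREHEADER, 40 for BITMAPINFOHEADER, 108 and 124 for the V4
   and V5 headers, plus the less common 56-byte variant). read_le32 and
   looks_like_bmp are hypothetical names.
```c
#include <assert.h>
#include <stddef.h>

// Little-endian 32-bit read from a byte buffer.
static unsigned long read_le32(const unsigned char *p)
{
   return (unsigned long) p[0] | ((unsigned long) p[1] << 8) |
          ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

// 1 if `buf` starts with "BM" and a DIB header size this loader accepts.
static int looks_like_bmp(const unsigned char *buf, size_t len)
{
   unsigned long hsz;
   assert(buf != NULL || len == 0);
   if (len < 18 || buf[0] != 'B' || buf[1] != 'M') return 0;
   hsz = read_le32(buf + 14);                    // 2 + 4 + 2 + 2 + 4 bytes in
   return hsz == 12 || hsz == 40 || hsz == 56 || hsz == 108 || hsz == 124;
}
```
*/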


// returns 0..31 for the highest set bit, or -1 if z is 0
static int stbi__high_bit(unsigned int z)
{
   int n=0;
   if (z == 0) return -1;
   if (z >= 0x10000) n += 16, z >>= 16;
   if (z >= 0x00100) n +=  8, z >>=  8;
   if (z >= 0x00010) n +=  4, z >>=  4;
   if (z >= 0x00004) n +=  2, z >>=  2;
   if (z >= 0x00002) n +=  1, z >>=  1;
   return n;
}

static int stbi__bitcount(unsigned int a)
{
   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
   a = (a + (a >> 8)); // max 16 per 8 bits
   a = (a + (a >> 16)); // max 32 per 8 bits
   return a & 0xff;
}

static int stbi__shiftsigned(int v, int shift, int bits)
{
   int result;
   int z=0;

   if (shift < 0) v <<= -shift;
   else v >>= shift;
   result = v;

   z = bits;
   while (z < 8) {
      result += v >> z;
      z += bits;
   }
   return result;
}
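/*
   Sketch (not part of stb_image): how stbi__bmp_load turns a bit-field
   channel into 0..255. stbi__high_bit gives the shift that moves the
   mask's top bit to bit 7, stbi__bitcount gives the field width, and
   stbi__shiftsigned fills the remaining low bits by repeating the value
   (bit replication), so a full 5-bit field becomes 255 rather than 248.
   highest_bit, count_bits and extract_channel are hypothetical, deliberately
   simple stand-ins that assume a contiguous mask.
```c
#include <assert.h>

// Position of the highest set bit, or -1 when z is 0.
static int highest_bit(unsigned int z)
{
   int n = -1;
   while (z) { ++n; z >>= 1; }
   return n;
}

// Number of set bits.
static int count_bits(unsigned int z)
{
   int n = 0;
   while (z) { n += (int) (z & 1u); z >>= 1; }
   return n;
}

// Scale the channel selected by `mask` to 0..255.
static int extract_channel(unsigned int pixel, unsigned int mask)
{
   int shift, bits, z, result;
   unsigned int v;
   if (mask == 0) return 0;                      // no such channel
   shift = highest_bit(mask) - 7;                // top bit of the field -> bit 7
   bits  = count_bits(mask);
   v = pixel & mask;
   v = shift < 0 ? v << -shift : v >> shift;
   result = (int) v;
   for (z = bits; z < 8; z += bits)              // replicate into the low bits
      result += (int) (v >> z);
   return result;
}
```
   For the red field of a 5-5-5 pixel (mask 0x7C00), the field value 10000
   becomes 128 + 4 = 132, i.e. 10000100 in binary.
*/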

static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   unsigned int mr=0,mg=0,mb=0,ma=0, fake_a=0;
   stbi_uc pal[256][4];
   int psize=0,i,j,compress=0,width;
   int bpp, flip_vertically, pad, target, offset, hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
   stbi__get32le(s); // discard filesize
   stbi__get16le(s); // discard reserved
   stbi__get16le(s); // discard reserved
   offset = stbi__get32le(s);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
   if (hsz == 12) {
      s->img_x = stbi__get16le(s);
      s->img_y = stbi__get16le(s);
   } else {
      s->img_x = stbi__get32le(s);
      s->img_y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
   bpp = stbi__get16le(s);
   if (bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
   flip_vertically = ((int) s->img_y) > 0;
   s->img_y = abs((int) s->img_y);
   if (hsz == 12) {
      if (bpp < 24)
         psize = (offset - 14 - 24) / 3;
   } else {
      compress = stbi__get32le(s);
      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
      stbi__get32le(s); // discard sizeof
      stbi__get32le(s); // discard hres
      stbi__get32le(s); // discard vres
      stbi__get32le(s); // discard colorsused
      stbi__get32le(s); // discard max important
      if (hsz == 40 || hsz == 56) {
         if (hsz == 56) {
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
            stbi__get32le(s);
         }
         if (bpp == 16 || bpp == 32) {
            mr = mg = mb = 0;
            if (compress == 0) {
               if (bpp == 32) {
                  mr = 0xffu << 16;
                  mg = 0xffu <<  8;
                  mb = 0xffu <<  0;
                  ma = 0xffu << 24;
                  fake_a = 1; // @TODO: check for cases like alpha value is all 0 and switch it to 255
                  STBI_NOTUSED(fake_a);
               } else {
                  mr = 31u << 10;
                  mg = 31u <<  5;
                  mb = 31u <<  0;
               }
            } else if (compress == 3) {
               mr = stbi__get32le(s);
               mg = stbi__get32le(s);
               mb = stbi__get32le(s);
               // not documented, but generated by photoshop and handled by mspaint
               if (mr == mg && mg == mb) {
                  // ?!?!?
                  return stbi__errpuc("bad BMP", "bad BMP");
               }
            } else
               return stbi__errpuc("bad BMP", "bad BMP");
         }
      } else {
         STBI_ASSERT(hsz == 108 || hsz == 124);
         mr = stbi__get32le(s);
         mg = stbi__get32le(s);
         mb = stbi__get32le(s);
         ma = stbi__get32le(s);
         stbi__get32le(s); // discard color space
         for (i=0; i < 12; ++i)
            stbi__get32le(s); // discard color space parameters
         if (hsz == 124) {
            stbi__get32le(s); // discard rendering intent
            stbi__get32le(s); // discard offset of profile data
            stbi__get32le(s); // discard size of profile data
            stbi__get32le(s); // discard reserved
         }
      }
      if (bpp < 16)
         psize = (offset - 14 - hsz) >> 2;
   }
   s->img_n = ma ? 4 : 3;
   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
      target = req_comp;
   else
      target = s->img_n; // if they want monochrome, we'll post-convert
   out = (stbi_uc *) stbi__malloc(target * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   if (bpp < 16) {
      int z=0;
      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
      for (i=0; i < psize; ++i) {
         pal[i][2] = stbi__get8(s);
         pal[i][1] = stbi__get8(s);
         pal[i][0] = stbi__get8(s);
         if (hsz != 12) stbi__get8(s);
         pal[i][3] = 255;
      }
      stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4));
      if (bpp == 4) width = (s->img_x + 1) >> 1;
      else if (bpp == 8) width = s->img_x;
      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
      pad = (-width)&3;
      for (j=0; j < (int) s->img_y; ++j) {
         for (i=0; i < (int) s->img_x; i += 2) {
            int v=stbi__get8(s),v2=0;
            if (bpp == 4) {
               v2 = v & 15;
               v >>= 4;
            }
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
            if (i+1 == (int) s->img_x) break;
            v = (bpp == 8) ? stbi__get8(s) : v2;
            out[z++] = pal[v][0];
            out[z++] = pal[v][1];
            out[z++] = pal[v][2];
            if (target == 4) out[z++] = 255;
         }
         stbi__skip(s, pad);
      }
   } else {
      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
      int z = 0;
      int easy=0;
      stbi__skip(s, offset - 14 - hsz);
      if (bpp == 24) width = 3 * s->img_x;
      else if (bpp == 16) width = 2*s->img_x;
      else /* bpp = 32 and pad = 0 */ width=0;
      pad = (-width) & 3;
      if (bpp == 24) {
         easy = 1;
      } else if (bpp == 32) {
         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
            easy = 2;
      }
      if (!easy) {
         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
         // right shift amt to put high bit in position #7
         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
      }
      for (j=0; j < (int) s->img_y; ++j) {
         if (easy) {
            for (i=0; i < (int) s->img_x; ++i) {
               unsigned char a;
               out[z+2] = stbi__get8(s);
               out[z+1] = stbi__get8(s);
               out[z+0] = stbi__get8(s);
               z += 3;
               a = (easy == 2 ? stbi__get8(s) : 255);
               if (target == 4) out[z++] = a;
            }
         } else {
            for (i=0; i < (int) s->img_x; ++i) {
               stbi__uint32 v = (stbi__uint32) (bpp == 16 ? stbi__get16le(s) : stbi__get32le(s));
               int a;
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
               if (target == 4) out[z++] = STBI__BYTECAST(a);
            }
         }
         stbi__skip(s, pad);
      }
   }
   if (flip_vertically) {
      stbi_uc t;
      for (j=0; j < (int) s->img_y>>1; ++j) {
         stbi_uc *p1 = out +      j     *s->img_x*target;
         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
         for (i=0; i < (int) s->img_x*target; ++i) {
            t = p1[i], p1[i] = p2[i], p2[i] = t;
         }
      }
   }
4667 
4668    if (req_comp && req_comp != target) {
4669       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
4670       if (out == NULL) return out; // stbi__convert_format frees input on failure
4671    }
4672 
4673    *x = s->img_x;
4674    *y = s->img_y;
4675    if (comp) *comp = s->img_n;
4676    return out;
4677 }
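/* Example (illustrative sketch, not part of stb_image): BMP rows are stored
   padded to a multiple of 4 bytes, which is why the loader skips
   pad = (-width) & 3 bytes after each row, with width measured in bytes.
   The helper below, bmp_row_padding (a hypothetical name), applies the same
   arithmetic for any bit depth, assuming sub-byte depths such as 4 bpp pack
   pixels into bytes with the last partial byte rounded up, as the 4-bit path
   above does with (img_x + 1) >> 1.

```c
#include <assert.h>

// Hypothetical helper: number of padding bytes that follow one BMP row.
static int bmp_row_padding(int width_px, int bits_per_pixel)
{
   int row_bytes = (width_px * bits_per_pixel + 7) / 8; // round a partial byte up
   return (-row_bytes) & 3;                             // bytes needed to reach a multiple of 4
}
```

   For instance, a 24-bit row one pixel wide holds 3 bytes of color followed
   by one byte of padding.
*/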
#endif
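/* Example (illustrative sketch, not part of stb_image): for 16- and 32-bit
   BMPs outside the two "easy" layouts, the loader locates each channel with
   stbi__high_bit and stbi__bitcount and widens it to 8 bits with
   stbi__shiftsigned. The standalone version below uses hypothetical helpers
   (high_bit, bit_count, extract_channel), assumes every mask is one
   contiguous run of bits, and widens narrow fields by repeating their bit
   pattern; it illustrates the idea and is not guaranteed to match
   stbi__shiftsigned bit for bit.

```c
#include <assert.h>
#include <stdint.h>

// Index of the highest set bit in mask, or -1 if mask is zero.
static int high_bit(uint32_t mask)
{
   int n = -1;
   while (mask) { ++n; mask >>= 1; }
   return n;
}

// Number of set bits in mask.
static int bit_count(uint32_t mask)
{
   int n = 0;
   while (mask) { n += (int) (mask & 1u); mask >>= 1; }
   return n;
}

// Pull out the field selected by a contiguous mask and widen it to 8 bits.
static uint8_t extract_channel(uint32_t pixel, uint32_t mask)
{
   int count = bit_count(mask);
   int low, filled = 0;
   uint32_t v, out = 0;
   if (count == 0) return 0;                             // channel absent
   low = high_bit(mask) - count + 1;                     // lowest bit of the field
   v = (pixel & mask) >> low;                            // field value in [0, 2^count)
   if (count >= 8) return (uint8_t) (v >> (count - 8));  // keep the top 8 bits
   while (filled < 8) {                                  // repeat the pattern: 10110 -> 10110101
      out = (out << count) | v;
      filled += count;
   }
   return (uint8_t) (out >> (filled - 8));
}
```

   With a 5-6-5 mask set (0xF800, 0x07E0, 0x001F), a full-intensity 5-bit red
   field widens to 255 rather than 248, so white stays white.
*/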

// Targa Truevision - TGA
// by Jonathan Dummer
#ifndef STBI_NO_TGA
static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
{
    int tga_w, tga_h, tga_comp;
    int sz;
    stbi__get8(s);                   // discard ID length (offset)
    sz = stbi__get8(s);              // color map type
    if( sz > 1 ) {
        stbi__rewind(s);
        return 0;      // only "no palette" (0) or "palette" (1) allowed
    }
    sz = stbi__get8(s);              // image type
    // only indexed, RGB or grey allowed, +/- RLE
    if ((sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11)) {
        stbi__rewind(s);
        return 0;
    }
    stbi__skip(s,9);                 // skip palette spec (5 bytes) and x/y origin (4 bytes)
    tga_w = stbi__get16le(s);
    if( tga_w < 1 ) {
        stbi__rewind(s);
        return 0;   // test width
    }
    tga_h = stbi__get16le(s);
    if( tga_h < 1 ) {
        stbi__rewind(s);
        return 0;   // test height
    }
    sz = stbi__get8(s);               // bits per pixel
    // only RGB or RGBA or grey allowed
    if ((sz != 8) && (sz != 16) && (sz != 24) && (sz != 32)) {
        stbi__rewind(s);
        return 0;
    }
    tga_comp = sz;
    if (x) *x = tga_w;
    if (y) *y = tga_h;
    if (comp) *comp = tga_comp / 8;
    return 1;                   // seems to have passed everything
}

static int stbi__tga_test(stbi__context *s)
{
   int res = 0;
   int sz;
   stbi__get8(s);      //   discard ID length (offset)
   sz = stbi__get8(s);   //   color map type
   if ( sz > 1 ) goto errorEnd;   //   only "no palette" (0) or "palette" (1) allowed
   sz = stbi__get8(s);   //   image type
   if ( (sz != 1) && (sz != 2) && (sz != 3) && (sz != 9) && (sz != 10) && (sz != 11) ) goto errorEnd;   //   only indexed, RGB or grey allowed, +/- RLE
   stbi__get16le(s);      //   discard palette start
   stbi__get16le(s);      //   discard palette length
   stbi__get8(s);         //   discard bits per palette color entry
   stbi__get16le(s);      //   discard x origin
   stbi__get16le(s);      //   discard y origin
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
   sz = stbi__get8(s);   //   bits per pixel
   if ( (sz == 8) || (sz == 16) || (sz == 24) || (sz == 32) )
      res = 1;
errorEnd:
   stbi__rewind(s);      //   always rewind, so the next format test starts at byte 0
   return res;
}
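/* Example (illustrative sketch, not part of stb_image): stbi__tga_info,
   stbi__tga_test and stbi__tga_load all walk the same 18-byte TGA header,
   whose multi-byte fields are little-endian. The struct and functions below
   (tga_header, parse_tga_header, tga_is_rle, tga_bottom_up) are hypothetical
   names that decode that header from a memory buffer, assuming the layout
   the loaders read: ID length, color map type, image type, a 5-byte color
   map spec, x/y origin, width, height, bits per pixel and a descriptor byte.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical decoded form of the 18-byte TGA header.
typedef struct {
   uint8_t  id_length;       // byte 0: bytes of image ID to skip (the loaders' "offset")
   uint8_t  color_map_type;  // byte 1: 0 = no palette, 1 = palette present
   uint8_t  image_type;      // byte 2: 1/2/3 = indexed/RGB/grey, +8 when RLE
   uint16_t cmap_first;      // bytes 3-4: first palette entry index
   uint16_t cmap_length;     // bytes 5-6: number of palette entries
   uint8_t  cmap_bits;       // byte 7: bits per palette entry
   uint16_t x_origin;        // bytes 8-9
   uint16_t y_origin;        // bytes 10-11
   uint16_t width;           // bytes 12-13
   uint16_t height;          // bytes 14-15
   uint8_t  bits_per_pixel;  // byte 16
   uint8_t  descriptor;      // byte 17: bits 0-3 alpha depth, bit 5 set = rows top to bottom
} tga_header;

static uint16_t read_le16(const uint8_t *p)
{
   return (uint16_t) (p[0] | (p[1] << 8));
}

static tga_header parse_tga_header(const uint8_t h[18])
{
   tga_header t;
   t.id_length      = h[0];
   t.color_map_type = h[1];
   t.image_type     = h[2];
   t.cmap_first     = read_le16(h + 3);
   t.cmap_length    = read_le16(h + 5);
   t.cmap_bits      = h[7];
   t.x_origin       = read_le16(h + 8);
   t.y_origin       = read_le16(h + 10);
   t.width          = read_le16(h + 12);
   t.height         = read_le16(h + 14);
   t.bits_per_pixel = h[16];
   t.descriptor     = h[17];
   return t;
}

// Image types 9-11 are the RLE variants of 1-3.
static int tga_is_rle(const tga_header *t)    { return t->image_type >= 8; }
// Rows are stored bottom to top unless descriptor bit 5 is set.
static int tga_bottom_up(const tga_header *t) { return !((t->descriptor >> 5) & 1); }
```

   The descriptor test mirrors tga_inverted = 1 - ((tga_inverted >> 5) & 1)
   in stbi__tga_load: a clear bit 5 means the decoded rows must be flipped.
*/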

static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   //   read in the TGA header stuff
   int tga_offset = stbi__get8(s);
   int tga_indexed = stbi__get8(s);
   int tga_image_type = stbi__get8(s);
   int tga_is_RLE = 0;
   int tga_palette_start = stbi__get16le(s);
   int tga_palette_len = stbi__get16le(s);
   int tga_palette_bits = stbi__get8(s);
   int tga_x_origin = stbi__get16le(s);
   int tga_y_origin = stbi__get16le(s);
   int tga_width = stbi__get16le(s);
   int tga_height = stbi__get16le(s);
   int tga_bits_per_pixel = stbi__get8(s);
   int tga_comp = tga_bits_per_pixel / 8;
   int tga_inverted = stbi__get8(s);
   //   image data
   unsigned char *tga_data;
   unsigned char *tga_palette = NULL;
   int i, j;
   unsigned char raw_data[4];
   int RLE_count = 0;
   int RLE_repeating = 0;
   int read_next_pixel = 1;

   //   do a tiny bit of processing
   if ( tga_image_type >= 8 )
   {
      tga_image_type -= 8;
      tga_is_RLE = 1;
   }
   /* int tga_alpha_bits = tga_inverted & 15; */
   tga_inverted = 1 - ((tga_inverted >> 5) & 1);

   //   error check
   if ( //(tga_indexed) ||
      (tga_width < 1) || (tga_height < 1) ||
      (tga_image_type < 1) || (tga_image_type > 3) ||
      ((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16) &&
      (tga_bits_per_pixel != 24) && (tga_bits_per_pixel != 32))
      )
   {
      return NULL; // we don't report this as a bad TGA because we don't even know if it's TGA
   }

   //   If I'm paletted, then I'll use the number of bits from the palette
   if ( tga_indexed )
   {
      tga_comp = tga_palette_bits / 8;
      //   each palette entry must fit in raw_data[4], and there must be at least one entry
      if ( (tga_comp < 1) || (tga_comp > 4) || (tga_palette_len < 1) )
         return stbi__errpuc("bad palette", "Corrupt TGA");
   }

   //   tga info
   *x = tga_width;
   *y = tga_height;
   if (comp) *comp = tga_comp;

   tga_data = (unsigned char*)stbi__malloc( tga_width * tga_height * tga_comp );
   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");

   // skip to the data's starting position (offset usually = 0)
   stbi__skip(s, tga_offset );

   if ( !tga_indexed && !tga_is_RLE) {
      for (i=0; i < tga_height; ++i) {
         int y = tga_inverted ? tga_height -i - 1 : i;
         stbi_uc *tga_row = tga_data + y*tga_width*tga_comp;
         stbi__getn(s, tga_row, tga_width * tga_comp);
      }
   } else  {
      //   do I need to load a palette?
      if ( tga_indexed)
      {
         //   any data to skip? (offset usually = 0)
         stbi__skip(s, tga_palette_start );
         //   load the palette
         tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 );
         if (!tga_palette) {
            STBI_FREE(tga_data);
            return stbi__errpuc("outofmem", "Out of memory");
         }
         if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) {
            STBI_FREE(tga_data);
            STBI_FREE(tga_palette);
            return stbi__errpuc("bad palette", "Corrupt TGA");
         }
      }
      //   load the data
      for (i=0; i < tga_width * tga_height; ++i)
      {
         //   if I'm in RLE mode, do I need to get a new RLE packet header?
         if ( tga_is_RLE )
         {
            if ( RLE_count == 0 )
            {
               //   yep, get the next byte as a RLE command
               int RLE_cmd = stbi__get8(s);
               RLE_count = 1 + (RLE_cmd & 127);
               RLE_repeating = RLE_cmd >> 7;
               read_next_pixel = 1;
            } else if ( !RLE_repeating )
            {
               read_next_pixel = 1;
            }
         } else
         {
            read_next_pixel = 1;
         }
         //   OK, if I need to read a pixel, do it now
         if ( read_next_pixel )
         {
            //   load however much data we did have
            if ( tga_indexed )
            {
               //   read in 1 byte, then perform the lookup
               int pal_idx = stbi__get8(s);
               if ( pal_idx >= tga_palette_len )
               {
                  //   invalid index
                  pal_idx = 0;
               }
               //   palette entries are tga_comp (= tga_palette_bits / 8) bytes wide
               pal_idx *= tga_comp;
               for (j = 0; j < tga_comp; ++j)
               {
                  raw_data[j] = tga_palette[pal_idx+j];
               }
            } else
            {
               //   read in the data raw
               for (j = 0; j*8 < tga_bits_per_pixel; ++j)
               {
                  raw_data[j] = stbi__get8(s);
               }
            }
            //   clear the reading flag for the next pixel
            read_next_pixel = 0;
         } // end of reading a pixel

         // copy data
         for (j = 0; j < tga_comp; ++j)
            tga_data[i*tga_comp+j] = raw_data[j];

         //   in case we're in RLE mode, keep counting down
         --RLE_count;
      }
      //   do I need to invert the image?
      if ( tga_inverted )
      {
         for (j = 0; j*2 < tga_height; ++j)
         {
            int index1 = j * tga_width * tga_comp;
            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
            for (i = tga_width * tga_comp; i > 0; --i)
            {
               unsigned char temp = tga_data[index1];
               tga_data[index1] = tga_data[index2];
               tga_data[index2] = temp;
               ++index1;
               ++index2;
            }
         }
      }
      //   clear my palette, if I had one
      if ( tga_palette != NULL )
      {
         STBI_FREE( tga_palette );
      }
   }

   // swap BGR to RGB
   if (tga_comp >= 3)
   {
      unsigned char* tga_pixel = tga_data;
      for (i=0; i < tga_width * tga_height; ++i)
      {
         unsigned char temp = tga_pixel[0];
         tga_pixel[0] = tga_pixel[2];
         tga_pixel[2] = temp;
         tga_pixel += tga_comp;
      }
   }

   // convert to target component count
   if (req_comp && req_comp != tga_comp)
      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);

   //   the things I do to get rid of a compiler warning, and yet keep
   //   Microsoft's C compilers happy... [8^(
   tga_palette_start = tga_palette_len = tga_palette_bits =
         tga_x_origin = tga_y_origin = 0;
   //   OK, done
   return tga_data;
}
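/* Example (illustrative sketch, not part of stb_image): the RLE branch of
   stbi__tga_load reads a packet header byte whose low 7 bits plus one give a
   pixel count; if the high bit is set, one pixel value follows and is
   repeated, otherwise that many literal pixels follow. tga_rle_decode below
   is a hypothetical memory-to-memory version of that loop, assuming bpp is 1
   to 4 bytes and that a packet running past the last pixel is simply cut off.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: expand TGA RLE packets from src into npixels pixels of
// bpp bytes each. Returns the number of src bytes consumed, or 0 if src ends early.
static size_t tga_rle_decode(const unsigned char *src, size_t src_len,
                             unsigned char *dst, size_t npixels, size_t bpp)
{
   size_t in = 0, done = 0;
   while (done < npixels) {
      size_t count, k;
      int repeat;
      if (in >= src_len) return 0;
      count  = (size_t) (src[in] & 127) + 1;              // low 7 bits + 1 = pixel count
      repeat = src[in] >> 7;                              // high bit: run-length packet
      ++in;
      if (count > npixels - done) count = npixels - done; // clip at the image end
      if (repeat) {
         if (src_len - in < bpp) return 0;
         for (k = 0; k < count; ++k)                      // one pixel value, copied count times
            memcpy(dst + (done + k) * bpp, src + in, bpp);
         in += bpp;
      } else {
         if ((src_len - in) / bpp < count) return 0;
         memcpy(dst + done * bpp, src + in, count * bpp); // count literal pixels
         in += count * bpp;
      }
      done += count;
   }
   return in;
}
```

   Like the loader, it treats the image as one run of width*height pixels, so
   a packet may span the end of one scanline and the start of the next.
*/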
#endif

// *************************************************************************************************
// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB

#ifndef STBI_NO_PSD
static int stbi__psd_test(stbi__context *s)
{
   int r = (stbi__get32be(s) == 0x38425053);
   stbi__rewind(s);
   return r;
}

static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   int   pixelCount;
   int channelCount, compression;
   int channel, i, count, len;
   int w,h;
   stbi_uc *out;

   // Check identifier
   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
      return stbi__errpuc("not PSD", "Corrupt PSD image");

   // Check file type version.
   if (stbi__get16be(s) != 1)
      return stbi__errpuc("wrong version", "Unsupported version of PSD image");

   // Skip 6 reserved bytes.
   stbi__skip(s, 6 );

   // Read the number of channels (R, G, B, A, etc).
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16)
      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");

   // Read the rows and columns of the image.
   h = stbi__get32be(s);
   w = stbi__get32be(s);

   // Make sure the depth is 8 bits.
   if (stbi__get16be(s) != 8)
      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 bit");

   // Make sure the color mode is RGB.
   // Valid options are:
   //   0: Bitmap
   //   1: Grayscale
   //   2: Indexed color
   //   3: RGB color
   //   4: CMYK color
   //   7: Multichannel
   //   8: Duotone
   //   9: Lab color
   if (stbi__get16be(s) != 3)
      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");

   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
   stbi__skip(s,stbi__get32be(s) );

   // Skip the image resources.  (resolution, pen tool paths, etc)
   stbi__skip(s, stbi__get32be(s) );

   // Skip the layer and mask information; only the composited image below is decoded.
   stbi__skip(s, stbi__get32be(s) );

   // Find out if the data is compressed.
   // Known values:
   //   0: no compression
   //   1: RLE compressed
   compression = stbi__get16be(s);
   if (compression > 1)
      return stbi__errpuc("bad compression", "PSD has an unknown compression format");

   // Create the destination image.
   out = (stbi_uc *) stbi__malloc(4 * w*h);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   pixelCount = w*h;

   // Initialize the data to zero.
   //memset( out, 0, pixelCount * 4 );

   // Finally, the image data.
   if (compression) {
      // RLE as used by .PSD and .TIFF
      // Loop until you get the number of unpacked bytes you are expecting:
      //     Read the next source byte into n.
      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
      //     Else if n is -128 (byte value 128), noop.
      // Endloop

      // The RLE-compressed data is preceded by a 2-byte byte count for each row
      // of each channel, which we're going to just skip.
      stbi__skip(s, h * channelCount * 2 );

      // Read the RLE data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out+channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++) *p = (channel == 3 ? 255 : 0), p += 4;
         } else {
            // Read the RLE data.
            count = 0;
            while (count < pixelCount) {
               len = stbi__get8(s);
               if (len == 128) {
                  // No-op.
               } else if (len < 128) {
                  // Copy next len+1 bytes literally.
                  len++;
                  if (len > pixelCount - count) {   // run would overflow this channel
                     STBI_FREE(out);
                     return stbi__errpuc("bad RLE data", "Corrupt PSD");
                  }
                  count += len;
                  while (len) {
                     *p = stbi__get8(s);
                     p += 4;
                     len--;
                  }
               } else if (len > 128) {
                  stbi_uc   val;
                  // Next -len+1 bytes in the dest are replicated from next source byte.
                  // (Interpret len as a negative 8-bit int.)
                  len ^= 0x0FF;
                  len += 2;
                  if (len > pixelCount - count) {   // run would overflow this channel
                     STBI_FREE(out);
                     return stbi__errpuc("bad RLE data", "Corrupt PSD");
                  }
                  val = stbi__get8(s);
                  count += len;
                  while (len) {
                     *p = val;
                     p += 4;
                     len--;
                  }
               }
            }
         }
      }

   } else {
      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
      // where each channel consists of an 8-bit value for each pixel in the image.

      // Read the data by channel.
      for (channel = 0; channel < 4; channel++) {
         stbi_uc *p;

         p = out + channel;
         if (channel >= channelCount) {
            // Fill this channel with default data.
            for (i = 0; i < pixelCount; i++) *p = channel == 3 ? 255 : 0, p += 4;
         } else {
            // Read the data.
            for (i = 0; i < pixelCount; i++)
               *p = stbi__get8(s), p += 4;
         }
      }
   }

   if (req_comp && req_comp != 4) {
      out = stbi__convert_format(out, 4, req_comp, w, h);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }

   if (comp) *comp = channelCount;
   *y = h;
   *x = w;

   return out;
}
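/* Example (illustrative sketch, not part of stb_image): the compressed path
   of stbi__psd_load decodes PackBits, the RLE scheme its comments describe.
   packbits_decode below is a hypothetical helper that applies the same rules
   to one contiguous buffer, treating each header byte b as unsigned the way
   the loader does: 0..127 copies b+1 literal bytes, 129..255 repeats the next
   byte 257-b times (-n+1 for the signed value n), and 128 is a no-op. It
   rejects runs that would write past dst_len.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: decode PackBits into exactly dst_len bytes.
// Returns 1 on success, 0 if src ends early or a run would overflow dst.
static int packbits_decode(const unsigned char *src, size_t src_len,
                           unsigned char *dst, size_t dst_len)
{
   size_t in = 0, out = 0;
   while (out < dst_len) {
      int b;
      if (in >= src_len) return 0;
      b = src[in++];
      if (b == 128) continue;                      // -128: no-op
      if (b < 128) {                               // copy the next b+1 bytes literally
         size_t len = (size_t) b + 1;
         if (len > dst_len - out || len > src_len - in) return 0;
         while (len--) dst[out++] = src[in++];
      } else {                                     // repeat the next byte 257-b times
         size_t len = (size_t) (257 - b);
         if (len > dst_len - out || in >= src_len) return 0;
         while (len--) dst[out++] = src[in];
         ++in;
      }
   }
   return 1;
}
```

   The loader performs the same steps per channel, but writes every fourth
   byte of its RGBA buffer instead of a contiguous destination.
*/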
#endif
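/* Example (illustrative sketch, not part of stb_image): PSD image data is
   planar, storing every pixel's first channel, then every pixel's second
   channel, and so on, while stb_image returns interleaved RGBA. The
   hypothetical planar_to_rgba below performs that reshuffle for
   uncompressed planes already in memory, filling missing channels with 0
   and missing alpha with 255 as stbi__psd_load does.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: interleave nchan planes of npixels bytes into RGBA.
static void planar_to_rgba(const unsigned char *planes, int nchan,
                           size_t npixels, unsigned char *rgba)
{
   int c;
   size_t i;
   for (c = 0; c < 4; ++c) {
      unsigned char *p = rgba + c;                 // channel c of pixel 0
      if (c >= nchan) {                            // channel not in the file: default value
         for (i = 0; i < npixels; ++i, p += 4) *p = (unsigned char) (c == 3 ? 255 : 0);
      } else {
         const unsigned char *plane = planes + (size_t) c * npixels;
         for (i = 0; i < npixels; ++i, p += 4) *p = plane[i];
      }
   }
}
```

   The c >= nchan test is what stops a three-channel image from reading a
   fourth plane that is not in the file; planes past the fourth are ignored.
*/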

// *************************************************************************************************
// Softimage PIC loader
// by Tom Seddon
//
// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/

#ifndef STBI_NO_PIC
static int stbi__pic_is4(stbi__context *s,const char *str)
{
   int i;
   for (i=0; i<4; ++i)
      if (stbi__get8(s) != (stbi_uc)str[i])
         return 0;

   return 1;
}

static int stbi__pic_test_core(stbi__context *s)
{
   int i;

   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
      return 0;

   for(i=0;i<84;++i)
      stbi__get8(s);

   if (!stbi__pic_is4(s,"PICT"))
      return 0;

   return 1;
}

typedef struct
{
   stbi_uc size,type,channel;
} stbi__pic_packet;

static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
{
   int mask=0x80, i;

   for (i=0; i<4; ++i, mask>>=1) {
      if (channel & mask) {
         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
         dest[i]=stbi__get8(s);
      }
   }

   return dest;
}

static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
{
   int mask=0x80,i;

   for (i=0;i<4; ++i, mask>>=1)
      if (channel&mask)
         dest[i]=src[i];
}
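/* Example (illustrative sketch, not part of stb_image): each PIC packet
   carries a channel mask, and stbi__readval and stbi__copyval touch only the
   RGBA slots whose bits are set, testing 0x80, 0x40, 0x20 and 0x10 in that
   order. copy_masked and pic_components below are hypothetical helpers that
   restate that logic, assuming the bit order red, green, blue, alpha implied
   by the loop.

```c
#include <assert.h>

// Hypothetical helper mirroring stbi__copyval: copy only the masked channels.
static void copy_masked(int channel, unsigned char dest[4], const unsigned char src[4])
{
   int mask = 0x80, i;
   for (i = 0; i < 4; ++i, mask >>= 1)   // 0x80 = R, 0x40 = G, 0x20 = B, 0x10 = A
      if (channel & mask)
         dest[i] = src[i];
}

// Output components implied by the union of all packet masks: 4 if any packet has alpha.
static int pic_components(int all_channels)
{
   return (all_channels & 0x10) ? 4 : 3;
}
```

   Because a packet writes only its own channels, a file can split a pixel
   across an RGB packet and an alpha packet; stbi__pic_load pre-fills its
   buffer with 0xff, so channels no packet covers stay at 255.
*/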

static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
{
   int act_comp=0,num_packets=0,y,chained;
   stbi__pic_packet packets[10];

   // this will (should...) cater for even some bizarre stuff like having data
   // for the same channel in multiple packets.
   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0]))
         return stbi__errpuc("bad format","too many packets");

      packet = &packets[num_packets++];

      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);

      act_comp |= packet->channel;

      if (stbi__at_eof(s))    return stbi__errpuc("bad file","file too short (reading packets)");
      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?

   for(y=0; y<height; ++y) {
      int packet_idx;

      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
         stbi__pic_packet *packet = &packets[packet_idx];
         stbi_uc *dest = result+y*width*4;

         switch (packet->type) {
            default:
               return stbi__errpuc("bad format","packet has bad compression type");

            case 0: {//uncompressed
               int x;

               for(x=0;x<width;++x, dest+=4)
                  if (!stbi__readval(s,packet->channel,dest))
                     return 0;
               break;
            }

            case 1://Pure RLE
               {
                  int left=width, i;

                  while (left>0) {
                     stbi_uc count,value[4];

                     count=stbi__get8(s);
                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");

                     if (count > left)
                        count = (stbi_uc) left;

                     if (!stbi__readval(s,packet->channel,value))  return 0;

                     for(i=0; i<count; ++i,dest+=4)
                        stbi__copyval(packet->channel,dest,value);
                     left -= count;
                  }
               }
               break;

            case 2: {//Mixed RLE
               int left=width;
               while (left>0) {
                  int count = stbi__get8(s), i;
                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");

                  if (count >= 128) { // Repeated
                     stbi_uc value[4];

                     if (count==128)
                        count = stbi__get16be(s);
                     else
                        count -= 127;
                     if (count > left)
                        return stbi__errpuc("bad file","scanline overrun");

                     if (!stbi__readval(s,packet->channel,value))
                        return 0;

                     for(i=0;i<count;++i, dest += 4)
                        stbi__copyval(packet->channel,dest,value);
                  } else { // Raw
                     ++count;
                     if (count>left) return stbi__errpuc("bad file","scanline overrun");

                     for(i=0;i<count;++i, dest+=4)
                        if (!stbi__readval(s,packet->channel,dest))
                           return 0;
                  }
                  left-=count;
               }
               break;
            }
         }
      }
   }

   return result;
}
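/* Example (illustrative sketch, not part of stb_image): in the "mixed RLE"
   packet type handled by stbi__pic_load_core, a count byte below 128 means
   count+1 raw values follow, 129..255 means one value repeated count-127
   times, and exactly 128 means a 16-bit big-endian repeat count follows.
   pic_mixed_count below is a hypothetical helper that only decodes that
   header, assuming the caller has already fetched the two bytes that may
   follow it.

```c
#include <assert.h>

// Hypothetical helper: interpret one mixed-RLE count byte. next_hi/next_lo are
// the two following bytes, used only when count == 128. Sets *repeat and *n and
// returns how many of those extra bytes were consumed (0 or 2).
static int pic_mixed_count(int count, int next_hi, int next_lo, int *repeat, int *n)
{
   if (count >= 128) {                    // run of one repeated value
      *repeat = 1;
      if (count == 128) {                 // long run: 16-bit big-endian length
         *n = (next_hi << 8) | next_lo;
         return 2;
      }
      *n = count - 127;                   // short run: 129..255 -> 2..128
      return 0;
   }
   *repeat = 0;                           // raw: count+1 literal values follow
   *n = count + 1;
   return 0;
}
```

   The loader then rejects any length that exceeds the pixels left in the
   scanline as a scanline overrun.
*/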

static stbi_uc *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp)
{
   stbi_uc *result;
   int i, x,y, internal_comp;

   if (!comp) comp = &internal_comp;   // stbi__pic_load_core always writes *comp

   for (i=0; i<92; ++i)
      stbi__get8(s);

   x = stbi__get16be(s);
   y = stbi__get16be(s);
   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
   if (x == 0 || y == 0) return stbi__errpuc("bad file","PIC has zero width or height");
   if ((1 << 28) / x < y) return stbi__errpuc("too large", "Image too large to decode");

   stbi__get32be(s); //skip `ratio'
   stbi__get16be(s); //skip `fields'
   stbi__get16be(s); //skip `pad'

   // intermediate buffer is RGBA
   result = (stbi_uc *) stbi__malloc(x*y*4);
   if (!result) return stbi__errpuc("outofmem", "Out of memory");
   memset(result, 0xff, x*y*4);

   if (!stbi__pic_load_core(s,x,y,comp, result)) {
      STBI_FREE(result);
      return 0;   // failure reason already set by stbi__pic_load_core
   }
   *px = x;
   *py = y;
   if (req_comp == 0) req_comp = *comp;
   result=stbi__convert_format(result,4,req_comp,x,y);

   return result;
}
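/* Example (illustrative sketch, not part of stb_image): stbi__pic_load caps
   the image with (1 << 28) / x < y before allocating x*y*4 bytes, a test
   that divides by x and so only works for a nonzero width. size_fits_int
   below is a hypothetical, more general guard that checks w*h*comp fits in
   an int before the multiplication happens, assuming int is the size type,
   as in the loaders here.

```c
#include <assert.h>
#include <limits.h>

// Hypothetical helper: nonzero if w*h*comp can be computed in an int without overflow.
static int size_fits_int(int w, int h, int comp)
{
   if (w <= 0 || h <= 0 || comp <= 0) return 0;  // also rules out division by zero below
   if (w > INT_MAX / h) return 0;                // w*h would overflow
   return (w * h) <= INT_MAX / comp;             // (w*h)*comp would overflow otherwise
}
```

   The loader's bound is tighter on purpose: capping x*y near 2^28 pixels
   keeps the RGBA buffer size x*y*4 at or below 2^30 bytes.
*/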

static int stbi__pic_test(stbi__context *s)
{
   int r = stbi__pic_test_core(s);
   stbi__rewind(s);
   return r;
}
#endif

// *************************************************************************************************
// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb

#ifndef STBI_NO_GIF
typedef struct
{
   stbi__int16 prefix;
   stbi_uc first;
   stbi_uc suffix;
} stbi__gif_lzw;

typedef struct
{
   int w,h;
   stbi_uc *out;                 // output buffer (always 4 components)
   int flags, bgindex, ratio, transparent, eflags;
   stbi_uc  pal[256][4];
   stbi_uc lpal[256][4];
   stbi__gif_lzw codes[4096];
   stbi_uc *color_table;
   int parse, step;
   int lflags;
   int start_x, start_y;
   int max_x, max_y;
   int cur_x, cur_y;
   int line_size;
} stbi__gif;

static int stbi__gif_test_raw(stbi__context *s)
{
   int sz;
   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
   sz = stbi__get8(s);
   if (sz != '9' && sz != '7') return 0;
   if (stbi__get8(s) != 'a') return 0;
   return 1;
}

static int stbi__gif_test(stbi__context *s)
{
   int r = stbi__gif_test_raw(s);
   stbi__rewind(s);
   return r;
}

static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
{
   int i;
   for (i=0; i < num_entries; ++i) {
      pal[i][2] = stbi__get8(s);
      pal[i][1] = stbi__get8(s);
      pal[i][0] = stbi__get8(s);
      pal[i][3] = transp == i ? 0 : 255;
   }
}

static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
{
   stbi_uc version;
   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
      return stbi__err("not GIF", "Corrupt GIF");

   version = stbi__get8(s);
   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");

   stbi__g_failure_reason = "";
   g->w = stbi__get16le(s);
   g->h = stbi__get16le(s);
   g->flags = stbi__get8(s);
   g->bgindex = stbi__get8(s);
   g->ratio = stbi__get8(s);
   g->transparent = -1;

   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments

   if (is_info) return 1;

   if (g->flags & 0x80)
      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);

   return 1;
}
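/* Example (illustrative sketch, not part of stb_image): stbi__gif_header
   reads the global color table only when bit 7 of the packed flags byte is
   set, and sizes it as 2 << (flags & 7), i.e. 2^(N+1) entries for the 3-bit
   field N. gif_global_table_entries below is a hypothetical helper restating
   that rule; each entry then occupies 3 bytes (red, green, blue) in the file.

```c
#include <assert.h>

// Hypothetical helper: number of global color table entries implied by the flags byte.
static int gif_global_table_entries(int flags)
{
   if (!(flags & 0x80)) return 0;   // bit 7 clear: no global color table
   return 2 << (flags & 7);         // 2^(N+1) entries, N = low 3 bits
}
```
*/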

static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
{
   stbi__gif g;
   if (!stbi__gif_header(s, &g, comp, 1)) {
      stbi__rewind( s );
      return 0;
   }
   if (x) *x = g.w;
   if (y) *y = g.h;
   return 1;
}

static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
{
   stbi_uc *p, *c;

   // recurse to decode the prefixes, since the linked-list is backwards,
   // and working backwards through an interlaced image would be nasty
   if (g->codes[code].prefix >= 0)
      stbi__out_gif_code(g, g->codes[code].prefix);

   if (g->cur_y >= g->max_y) return;

   p = &g->out[g->cur_x + g->cur_y];
   c = &g->color_table[g->codes[code].suffix * 4];

   if (c[3] >= 128) {
      p[0] = c[2];
      p[1] = c[1];
      p[2] = c[0];
      p[3] = c[3];
   }
   g->cur_x += 4;

   if (g->cur_x >= g->max_x) {
      g->cur_x = g->start_x;
      g->cur_y += g->step;

      while (g->cur_y >= g->max_y && g->parse > 0) {
         g->step = (1 << g->parse) * g->line_size;
         g->cur_y = g->start_y + (g->step >> 1);
         --g->parse;
      }
   }
}
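/* Example (illustrative sketch, not part of stb_image): interlaced GIFs
   deliver rows in four passes: every 8th row starting at 0, every 8th from 4,
   every 4th from 2, then every 2nd from 1. stbi__out_gif_code appears to
   track this with g->parse and g->step; there cur_y is a byte offset, so its
   steps are multiples of line_size. gif_interlace_order below is a
   hypothetical helper that lists the plain row indices in pass order.

```c
#include <assert.h>

// Hypothetical helper: write the interlaced row order into rows[] (which must
// hold height entries) and return how many rows were written.
static int gif_interlace_order(int height, int rows[])
{
   static const int start[4] = { 0, 4, 2, 1 };
   static const int step[4]  = { 8, 8, 4, 2 };
   int pass, y, n = 0;
   for (pass = 0; pass < 4; ++pass)
      for (y = start[pass]; y < height; y += step[pass])
         rows[n++] = y;
   return n;
}
```

   For a 10-row image the order is 0, 8, 4, 2, 6, 1, 3, 5, 7, 9.
*/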

static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
{
   stbi_uc lzw_cs;
   stbi__int32 len, code;
   stbi__uint32 first;
   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
   stbi__gif_lzw *p;

   lzw_cs = stbi__get8(s);
   clear = 1 << lzw_cs;
   first = 1;
   codesize = lzw_cs + 1;
   codemask = (1 << codesize) - 1;
   bits = 0;
   valid_bits = 0;
   for (code = 0; code < clear; code++) {
      g->codes[code].prefix = -1;
      g->codes[code].first = (stbi_uc) code;
      g->codes[code].suffix = (stbi_uc) code;
   }

   // support no starting clear code
   avail = clear+2;
   oldcode = -1;

   len = 0;
   for(;;) {
      if (valid_bits < codesize) {
         if (len == 0) {
            len = stbi__get8(s); // start new block
            if (len == 0)
               return g->out;
         }
         --len;
         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
         valid_bits += 8;
      } else {
         stbi__int32 code = bits & codemask;
         bits >>= codesize;
         valid_bits -= codesize;
         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
         if (code == clear) {  // clear code
            codesize = lzw_cs + 1;
            codemask = (1 << codesize) - 1;
            avail = clear + 2;
            oldcode = -1;
            first = 0;
         } else if (code == clear + 1) { // end of stream code
            stbi__skip(s, len);
            while ((len = stbi__get8(s)) > 0)
               stbi__skip(s,len);
            return g->out;
         } else if (code <= avail) {
            if (first) return stbi__errpuc("no clear code", "Corrupt GIF");

            if (oldcode >= 0) {
               p = &g->codes[avail++];
               if (avail > 4096)        return stbi__errpuc("too many codes", "Corrupt GIF");
               p->prefix = (stbi__int16) oldcode;
               p->first = g->codes[oldcode].first;
               p->suffix = (code == avail) ? p->first : g->codes[code].first;
            } else if (code == avail)
               return stbi__errpuc("illegal code in raster", "Corrupt GIF");

            stbi__out_gif_code(g, (stbi__uint16) code);

            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
               codesize++;
               codemask = (1 << codesize) - 1;
            }

            oldcode = code;
         } else {
            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
         }
      }
   }
}
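// Illustrative sketch (not part of stb_image; the helper name is hypothetical):
// GIF packs variable-width LZW codes least-significant-bit first. The accumulator
// in stbi__process_gif_raster ORs each new byte in above the bits it already holds
// and takes codes from the low end. gif_read_codes() isolates just that step. It
// reads one flat buffer, so unlike the real decoder it ignores the sub-block length
// framing and never grows codesize.
#if 0
```c
#include <stddef.h>

/* Split buf into codesize-bit codes, LSB first. Returns the number of codes
   stored in out (at most max); trailing bits that do not make a whole code
   are discarded. */
static int gif_read_codes(const unsigned char *buf, size_t n, int codesize,
                          int *out, int max)
{
   unsigned int bits = 0;   /* bit accumulator; new bytes enter above old bits */
   int valid_bits = 0;      /* number of meaningful bits in 'bits' */
   int codemask = (1 << codesize) - 1;
   int count = 0;
   size_t pos = 0;

   while (count < max) {
      if (valid_bits < codesize) {
         if (pos == n) break;                      /* out of input */
         bits |= (unsigned int) buf[pos++] << valid_bits;
         valid_bits += 8;
      } else {
         out[count++] = (int) (bits & codemask);   /* low bits form the code */
         bits >>= codesize;
         valid_bits -= codesize;
      }
   }
   return count;
}
```
#endif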

static void stbi__fill_gif_background(stbi__gif *g)
{
   int i;
   stbi_uc *c = g->pal[g->bgindex];
   // @OPTIMIZE: write a dword at a time
   for (i = 0; i < g->w * g->h * 4; i += 4) {
      stbi_uc *p  = &g->out[i];
      p[0] = c[2];
      p[1] = c[1];
      p[2] = c[0];
      p[3] = c[3];
   }
}

// this function is designed to support animated gifs, although stb_image doesn't support it
static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
{
   int i;
   stbi_uc *old_out = 0;

   if (g->out == 0) {
      if (!stbi__gif_header(s, g, comp,0))     return 0; // stbi__g_failure_reason set by stbi__gif_header
      g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
      if (g->out == 0)                      return stbi__errpuc("outofmem", "Out of memory");
      stbi__fill_gif_background(g);
   } else {
      // animated-gif-only path
      if (((g->eflags & 0x1C) >> 2) == 3) {
         old_out = g->out;
         g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h);
         if (g->out == 0)                   return stbi__errpuc("outofmem", "Out of memory");
         memcpy(g->out, old_out, g->w*g->h*4);
      }
   }

   for (;;) {
      switch (stbi__get8(s)) {
         case 0x2C: /* Image Descriptor */
         {
            stbi__int32 x, y, w, h;
            stbi_uc *o;

            x = stbi__get16le(s);
            y = stbi__get16le(s);
            w = stbi__get16le(s);
            h = stbi__get16le(s);
            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");

            g->line_size = g->w * 4;
            g->start_x = x * 4;
            g->start_y = y * g->line_size;
            g->max_x   = g->start_x + w * 4;
            g->max_y   = g->start_y + h * g->line_size;
            g->cur_x   = g->start_x;
            g->cur_y   = g->start_y;

            g->lflags = stbi__get8(s);

            if (g->lflags & 0x40) {
               g->step = 8 * g->line_size; // first interlaced spacing
               g->parse = 3;
            } else {
               g->step = g->line_size;
               g->parse = 0;
            }

            if (g->lflags & 0x80) {
               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
               g->color_table = (stbi_uc *) g->lpal;
            } else if (g->flags & 0x80) {
               for (i=0; i < 256; ++i)  // @OPTIMIZE: reset only the previous transparent
                  g->pal[i][3] = 255;
               if (g->transparent >= 0 && (g->eflags & 0x01))
                  g->pal[g->transparent][3] = 0;
               g->color_table = (stbi_uc *) g->pal;
            } else
               return stbi__errpuc("missing color table", "Corrupt GIF");

            o = stbi__process_gif_raster(s, g);
            if (o == NULL) return NULL;

            if (req_comp && req_comp != 4)
               o = stbi__convert_format(o, 4, req_comp, g->w, g->h);
            return o;
         }

         case 0x21: // Extension Introducer (Graphic Control, Comment, Application, ...)
         {
            int len;
            if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
               len = stbi__get8(s);
               if (len == 4) {
                  g->eflags = stbi__get8(s);
                  stbi__get16le(s); // delay
                  g->transparent = stbi__get8(s);
               } else {
                  stbi__skip(s, len);
                  break;
               }
            }
            while ((len = stbi__get8(s)) != 0)
               stbi__skip(s, len);
            break;
         }

         case 0x3B: // gif stream termination code
            return (stbi_uc *) s; // using '1' causes warning on some compilers

         default:
            return stbi__errpuc("unknown code", "Corrupt GIF");
      }
   }
}
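// Illustrative sketch (not part of stb_image; the helper name is hypothetical):
// extension data, like image data, is a chain of sub-blocks, each a length byte
// followed by that many bytes, ending with a zero-length block. The stream code
// above skips them with "while ((len = stbi__get8(s)) != 0) stbi__skip(s, len);".
// When decoding from a memory buffer, the equivalent loop must bounds-check
// explicitly, as gif_skip_subblocks() does here.
#if 0
```c
#include <stddef.h>

/* pos is the offset of the first length byte of a sub-block chain. Returns
   the offset just past the 0x00 terminator, or 0 if the chain runs off the
   end of the buffer (a valid result is always >= 1). */
static size_t gif_skip_subblocks(const unsigned char *buf, size_t n, size_t pos)
{
   while (pos < n) {
      size_t len = buf[pos++];   /* each sub-block starts with its length */
      if (len == 0)
         return pos;             /* zero-length block terminates the chain */
      if (len > n - pos)
         return 0;               /* truncated sub-block */
      pos += len;
   }
   return 0;                     /* no terminator found */
}
```
#endif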

static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *u = 0;
   stbi__gif g;
   memset(&g, 0, sizeof(g));

   u = stbi__gif_load_next(s, &g, comp, req_comp);
   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
   if (u) {
      *x = g.w;
      *y = g.h;
   }

   return u;
}

static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
{
   return stbi__gif_info_raw(s,x,y,comp);
}
#endif

// *************************************************************************************************
// Radiance RGBE HDR loader
// originally by Nicolas Schulz
#ifndef STBI_NO_HDR
static int stbi__hdr_test_core(stbi__context *s)
{
   const char *signature = "#?RADIANCE\n";
   int i;
   for (i=0; signature[i]; ++i)
      if (stbi__get8(s) != signature[i])
         return 0;
   return 1;
}

static int stbi__hdr_test(stbi__context* s)
{
   int r = stbi__hdr_test_core(s);
   stbi__rewind(s);
   return r;
}

#define STBI__HDR_BUFLEN  1024
static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
{
   int len=0;
   char c = '\0';

   c = (char) stbi__get8(z);

   while (!stbi__at_eof(z) && c != '\n') {
      buffer[len++] = c;
      if (len == STBI__HDR_BUFLEN-1) {
         // flush to end of line
         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
            ;
         break;
      }
      c = (char) stbi__get8(z);
   }

   buffer[len] = 0;
   return buffer;
}

static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
{
   if ( input[3] != 0 ) {
      float f1;
      // Exponent
      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
      if (req_comp <= 2)
         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
      else {
         output[0] = input[0] * f1;
         output[1] = input[1] * f1;
         output[2] = input[2] * f1;
      }
      if (req_comp == 2) output[1] = 1;
      if (req_comp == 4) output[3] = 1;
   } else {
      switch (req_comp) {
         case 4: output[3] = 1; /* fallthrough */
         case 3: output[0] = output[1] = output[2] = 0;
                 break;
         case 2: output[1] = 1; /* fallthrough */
         case 1: output[0] = 0;
                 break;
      }
   }
}
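// Illustrative sketch (not part of stb_image; the helper name is hypothetical):
// the RGB case of stbi__hdr_convert on its own. Each 8-bit mantissa m shares the
// exponent byte e, and the linear value is m * 2^(e - 128 - 8), i.e.
// (m / 256) * 2^(e - 128). An exponent byte of 0 encodes black.
#if 0
```c
#include <math.h>

/* Decode one Radiance RGBE pixel to linear RGB (the req_comp == 3 path). */
static void rgbe_to_rgb(const unsigned char rgbe[4], float rgb[3])
{
   if (rgbe[3] != 0) {
      /* shared scale factor 2^(e - 136), same as stb's ldexp call */
      float f = (float) ldexp(1.0, rgbe[3] - (128 + 8));
      rgb[0] = rgbe[0] * f;
      rgb[1] = rgbe[1] * f;
      rgb[2] = rgbe[2] * f;
   } else {
      rgb[0] = rgb[1] = rgb[2] = 0.0f;   /* zero exponent: black */
   }
}
```
#endif
// For example, the bytes {128, 64, 0, 129} give a scale of 2^-7, so the pixel
// decodes to (1.0, 0.5, 0.0).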

static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   char buffer[STBI__HDR_BUFLEN];
   char *token;
   int valid = 0;
   int width, height;
   stbi_uc *scanline;
   float *hdr_data;
   int len;
   unsigned char count, value;
   int i, j, k, c1,c2, z;


   // Check identifier
   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0)
      return stbi__errpf("not HDR", "Corrupt HDR image");

   // Parse header
   for(;;) {
      token = stbi__hdr_gettoken(s,buffer);
      if (token[0] == 0) break;
      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   }

   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");

   // Parse width and height
   // can't use sscanf() if we're not using stdio!
   token = stbi__hdr_gettoken(s,buffer);
   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   token += 3;
   height = (int) strtol(token, &token, 10);
   while (*token == ' ') ++token;
   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
   token += 3;
   width = (int) strtol(token, NULL, 10);

   *x = width;
   *y = height;

   if (comp) *comp = 3;
   if (req_comp == 0) req_comp = 3;

   // Read data
   hdr_data = (float *) stbi__malloc(height * width * req_comp * sizeof(float));
   if (!hdr_data) return stbi__errpf("outofmem", "Out of memory");

   // Load image data
   // image data is stored as a sequence of scanlines, either flat or run-length encoded
   if ( width < 8 || width >= 32768) {
      // Read flat data
      for (j=0; j < height; ++j) {
         for (i=0; i < width; ++i) {
            stbi_uc rgbe[4];
           main_decode_loop:
            stbi__getn(s, rgbe, 4);
            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
         }
      }
   } else {
      // Read RLE-encoded data
      scanline = NULL;

      for (j = 0; j < height; ++j) {
         c1 = stbi__get8(s);
         c2 = stbi__get8(s);
         len = stbi__get8(s);
         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
            // not run-length encoded, so we have to actually use THIS data as a decoded
            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
            stbi_uc rgbe[4];
            rgbe[0] = (stbi_uc) c1;
            rgbe[1] = (stbi_uc) c2;
            rgbe[2] = (stbi_uc) len;
            rgbe[3] = (stbi_uc) stbi__get8(s);
            stbi__hdr_convert(hdr_data, rgbe, req_comp);
            i = 1;
            j = 0;
            STBI_FREE(scanline);
            goto main_decode_loop; // yes, this makes no sense
         }
         len <<= 8;
         len |= stbi__get8(s);
         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
         if (scanline == NULL) {
            scanline = (stbi_uc *) stbi__malloc(width * 4);
            if (!scanline) { STBI_FREE(hdr_data); return stbi__errpf("outofmem", "Out of memory"); }
         }

         for (k = 0; k < 4; ++k) {
            i = 0;
            while (i < width) {
               int nleft = width - i; // pixels still to fill in this channel
               count = stbi__get8(s);
               if (count > 128) {
                  // Run
                  value = stbi__get8(s);
                  count -= 128;
                  if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
                  for (z = 0; z < count; ++z)
                     scanline[i++ * 4 + k] = value;
               } else {
                  // Dump (a zero count would never advance i, e.g. at EOF)
                  if (count == 0 || count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
                  for (z = 0; z < count; ++z)
                     scanline[i++ * 4 + k] = stbi__get8(s);
               }
            }
         }
         for (i=0; i < width; ++i)
            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
      }
      STBI_FREE(scanline);
   }

   return hdr_data;
}
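// Illustrative sketch (not part of stb_image; the helper name is hypothetical):
// in new-style RLE scanlines each of the four RGBE channels is coded separately.
// A count byte above 128 is a run of (count - 128) copies of the next byte, and
// any other count is a literal "dump" of that many bytes. hdr_rle_channel()
// decodes one channel from memory into every stride-th byte of dst, and rejects
// runs or dumps that would overflow the scanline.
#if 0
```c
#include <stddef.h>

/* Decode width values of one channel into dst[0], dst[stride], ...
   Returns the number of src bytes consumed, or 0 on corrupt or truncated data. */
static size_t hdr_rle_channel(const unsigned char *src, size_t n,
                              unsigned char *dst, int width, int stride)
{
   size_t pos = 0;
   int i = 0;
   while (i < width) {
      int count, z;
      if (pos >= n) return 0;                        /* truncated input */
      count = src[pos++];
      if (count > 128) {                             /* run */
         count -= 128;
         if (count > width - i || pos >= n) return 0;
         for (z = 0; z < count; ++z)
            dst[(i++) * stride] = src[pos];
         ++pos;
      } else {                                       /* dump */
         if (count == 0 || count > width - i || (size_t) count > n - pos) return 0;
         for (z = 0; z < count; ++z)
            dst[(i++) * stride] = src[pos++];
      }
   }
   return pos;
}
```
#endif
// With stride 4 and dst pointing at channel k of an RGBE scanline, this fills the
// same interleaved layout that stbi__hdr_load builds with scanline[i * 4 + k].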

static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
{
   char buffer[STBI__HDR_BUFLEN];
   char *token;
   int valid = 0;

   if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) {
       stbi__rewind( s );
       return 0;
   }

   for(;;) {
      token = stbi__hdr_gettoken(s,buffer);
      if (token[0] == 0) break;
      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
   }

   if (!valid) {
       stbi__rewind( s );
       return 0;
   }
   token = stbi__hdr_gettoken(s,buffer);
   if (strncmp(token, "-Y ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *y = (int) strtol(token, &token, 10);
   while (*token == ' ') ++token;
   if (strncmp(token, "+X ", 3)) {
       stbi__rewind( s );
       return 0;
   }
   token += 3;
   *x = (int) strtol(token, NULL, 10);
   *comp = 3;
   return 1;
}
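// Illustrative sketch (not part of stb_image; the helper name is hypothetical):
// the resolution-line parsing shared by stbi__hdr_load and stbi__hdr_info, pulled
// out as a standalone function. Only the standard "-Y <height> +X <width>"
// orientation is accepted; other valid Radiance orientations such as
// "+Y ... +X ..." are rejected, as in the code above.
#if 0
```c
#include <stdlib.h>
#include <string.h>

/* Parse a line such as "-Y 480 +X 640". Returns 1 and sets *w and *h on
   success, 0 for any other layout. Like stb, it does not validate digits. */
static int hdr_parse_resolution(const char *line, int *w, int *h)
{
   char *end;
   if (strncmp(line, "-Y ", 3) != 0) return 0;
   *h = (int) strtol(line + 3, &end, 10);   /* height comes first */
   while (*end == ' ') ++end;
   if (strncmp(end, "+X ", 3) != 0) return 0;
   *w = (int) strtol(end + 3, NULL, 10);
   return 1;
}
```
#endif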
#endif // STBI_NO_HDR

#ifndef STBI_NO_BMP
static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
{
   int hsz;
   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s,12);
   hsz = stbi__get32le(s);
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) {
       stbi__rewind( s );
       return 0;
   }
   if (hsz == 12) {
      *x = stbi__get16le(s);
      *y = stbi__get16le(s);
   } else {
      *x = stbi__get32le(s);
      *y = stbi__get32le(s);
   }
   if (stbi__get16le(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   *comp = stbi__get16le(s) / 8;
   return 1;
}
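// Illustrative sketch (not part of stb_image; helper names are hypothetical): the
// same header walk as stbi__bmp_info, applied to an in-memory buffer. It checks the
// "BM" magic, skips the rest of the 14-byte file header, reads the DIB header size
// at offset 14, reads 16-bit dimensions for the 12-byte OS/2 header or 32-bit
// dimensions otherwise, then reads planes and bits per pixel.
#if 0
```c
#include <stddef.h>

static unsigned rd16le(const unsigned char *p) { return p[0] | (p[1] << 8); }
static unsigned long rd32le(const unsigned char *p)
{
   return p[0] | (p[1] << 8) | ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

/* Returns 1 and sets width, height and bytes per pixel on success. */
static int bmp_peek(const unsigned char *b, size_t n, int *w, int *h, int *comp)
{
   unsigned long hsz;
   size_t p;
   if (n < 18 || b[0] != 'B' || b[1] != 'M') return 0;
   hsz = rd32le(b + 14);                     /* DIB header size */
   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return 0;
   p = 18;
   if (hsz == 12) {                          /* OS/2 BITMAPCOREHEADER: 16-bit dims */
      if (n < p + 8) return 0;
      *w = (int) rd16le(b + p);
      *h = (int) rd16le(b + p + 2);
      p += 4;
   } else {                                  /* Windows headers: 32-bit dims */
      if (n < p + 12) return 0;
      *w = (int) rd32le(b + p);
      *h = (int) rd32le(b + p + 4);
      p += 8;
   }
   if (rd16le(b + p) != 1) return 0;         /* planes must be 1 */
   *comp = (int) (rd16le(b + p + 2) / 8);    /* bits per pixel -> bytes */
   return 1;
}
```
#endif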
#endif

#ifndef STBI_NO_PSD
static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
{
   int channelCount;
   if (stbi__get32be(s) != 0x38425053) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 1) {
       stbi__rewind( s );
       return 0;
   }
   stbi__skip(s, 6);
   channelCount = stbi__get16be(s);
   if (channelCount < 0 || channelCount > 16) {
       stbi__rewind( s );
       return 0;
   }
   *y = stbi__get32be(s);
   *x = stbi__get32be(s);
   if (stbi__get16be(s) != 8) {
       stbi__rewind( s );
       return 0;
   }
   if (stbi__get16be(s) != 3) {
       stbi__rewind( s );
       return 0;
   }
   *comp = 4;
   return 1;
}
#endif

#ifndef STBI_NO_PIC
static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
{
   int act_comp=0,num_packets=0,chained;
   stbi__pic_packet packets[10];

   stbi__skip(s, 92);

   *x = stbi__get16be(s);
   *y = stbi__get16be(s);
   if (stbi__at_eof(s)) {
       stbi__rewind( s );
       return 0;
   }
   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
       stbi__rewind( s );
       return 0;
   }

   stbi__skip(s, 8);

   do {
      stbi__pic_packet *packet;

      if (num_packets==sizeof(packets)/sizeof(packets[0])) {
          stbi__rewind( s );
          return 0;
      }

      packet = &packets[num_packets++];
      chained = stbi__get8(s);
      packet->size    = stbi__get8(s);
      packet->type    = stbi__get8(s);
      packet->channel = stbi__get8(s);
      act_comp |= packet->channel;

      if (stbi__at_eof(s)) {
          stbi__rewind( s );
          return 0;
      }
      if (packet->size != 8) {
          stbi__rewind( s );
          return 0;
      }
   } while (chained);

   *comp = (act_comp & 0x10 ? 4 : 3);

   return 1;
}
#endif

// *************************************************************************************************
// Portable Gray Map and Portable Pixel Map loader
// by Ken Miller
//
// PGM: http://netpbm.sourceforge.net/doc/pgm.html
// PPM: http://netpbm.sourceforge.net/doc/ppm.html
//
// Known limitations:
//    Does not support comments in the header section
//    Does not support ASCII image data (formats P2 and P3)
//    Does not support 16-bit-per-channel

#ifndef STBI_NO_PNM

static int      stbi__pnm_test(stbi__context *s)
{
   char p, t;
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }
   return 1;
}

static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp)
{
   stbi_uc *out;
   if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
      return 0;
   *x = s->img_x;
   *y = s->img_y;
   if (comp) *comp = s->img_n; // comp may be NULL (allowed since 0.54)

   out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y);
   if (!out) return stbi__errpuc("outofmem", "Out of memory");
   stbi__getn(s, out, s->img_n * s->img_x * s->img_y);

   if (req_comp && req_comp != s->img_n) {
      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
      if (out == NULL) return out; // stbi__convert_format frees input on failure
   }
   return out;
}

static int      stbi__pnm_isspace(char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
{
   while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
      *c = (char) stbi__get8(s);
}

static int      stbi__pnm_isdigit(char c)
{
   return c >= '0' && c <= '9';
}

static int      stbi__pnm_getinteger(stbi__context *s, char *c)
{
   int value = 0;

   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
      value = value*10 + (*c - '0');
      *c = (char) stbi__get8(s);
   }

   return value;
}

static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
{
   int maxv;
   char c, p, t;

   stbi__rewind( s );

   // Get identifier
   p = (char) stbi__get8(s);
   t = (char) stbi__get8(s);
   if (p != 'P' || (t != '5' && t != '6')) {
       stbi__rewind( s );
       return 0;
   }

   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm

   c = (char) stbi__get8(s);
   stbi__pnm_skip_whitespace(s, &c);

   *x = stbi__pnm_getinteger(s, &c); // read width
   stbi__pnm_skip_whitespace(s, &c);

   *y = stbi__pnm_getinteger(s, &c); // read height
   stbi__pnm_skip_whitespace(s, &c);

   maxv = stbi__pnm_getinteger(s, &c);  // read max value

   if (maxv > 255)
      return stbi__err("max value > 255", "PPM image not 8-bit");
   else
      return 1;
}
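// Illustrative sketch (not part of stb_image; helper names are hypothetical): the
// header grammar that stbi__pnm_info reads is the magic "P5" or "P6", then width,
// height and maxval separated by whitespace, then exactly one whitespace byte
// before the pixel data. Like stbi__pnm_info it does not handle '#' comments.
// Unlike stb, it requires at least one digit per field and checks the final
// whitespace byte, so it is slightly stricter.
#if 0
```c
#include <stddef.h>

static int pnm_space(unsigned char c)
{
   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
}

/* Returns the offset of the first pixel byte, or 0 on failure. */
static size_t pnm_header(const unsigned char *b, size_t n,
                         int *w, int *h, int *comp, int *maxv)
{
   int *fields[3];
   size_t pos = 2;
   int k;
   if (n < 2 || b[0] != 'P' || (b[1] != '5' && b[1] != '6')) return 0;
   *comp = (b[1] == '6') ? 3 : 1;        /* P6 = RGB .ppm, P5 = gray .pgm */
   fields[0] = w; fields[1] = h; fields[2] = maxv;
   for (k = 0; k < 3; ++k) {
      int value = 0, digits = 0;
      while (pos < n && pnm_space(b[pos])) ++pos;   /* skip separators */
      while (pos < n && b[pos] >= '0' && b[pos] <= '9') {
         value = value * 10 + (b[pos++] - '0');
         ++digits;
      }
      if (digits == 0) return 0;
      *fields[k] = value;
   }
   if (pos >= n || !pnm_space(b[pos])) return 0;  /* single byte before data */
   return pos + 1;
}
```
#endif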
#endif

static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
{
   #ifndef STBI_NO_JPEG
   if (stbi__jpeg_info(s, x, y, comp)) return 1;
   #endif

   #ifndef STBI_NO_PNG
   if (stbi__png_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_GIF
   if (stbi__gif_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_BMP
   if (stbi__bmp_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PSD
   if (stbi__psd_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PIC
   if (stbi__pic_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_PNM
   if (stbi__pnm_info(s, x, y, comp))  return 1;
   #endif

   #ifndef STBI_NO_HDR
   if (stbi__hdr_info(s, x, y, comp))  return 1;
   #endif

   // test tga last because it's a crappy test!
   #ifndef STBI_NO_TGA
   if (stbi__tga_info(s, x, y, comp))
       return 1;
   #endif
   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
}
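// Illustrative sketch (not part of stb_image; all names are hypothetical): the
// probe-and-rewind pattern behind stbi__info_main. Each probe may consume input,
// so the cursor is reset to the start before the next probe runs. That is why
// every *_info function above rewinds before returning 0. These probes only
// compare magic numbers, while the real info functions also parse dimensions.
#if 0
```c
#include <stddef.h>

typedef struct { const unsigned char *data; size_t len, pos; } cursor;

/* Like stbi__get8: returns 0 once the input is exhausted. */
static int cur_get8(cursor *c) { return c->pos < c->len ? c->data[c->pos++] : 0; }

static int expect(cursor *c, const char *magic)
{
   while (*magic)
      if (cur_get8(c) != (unsigned char) *magic++) return 0;
   return 1;
}

static int probe_bmp(cursor *c) { return expect(c, "BM"); }
static int probe_psd(cursor *c) { return expect(c, "8BPS"); }
static int probe_hdr(cursor *c) { return expect(c, "#?RADIANCE\n"); }

/* Returns a short format name, or NULL for "unknown image type". */
static const char *identify(const unsigned char *data, size_t len)
{
   static const struct { const char *name; int (*probe)(cursor *); } probes[] = {
      { "bmp", probe_bmp }, { "psd", probe_psd }, { "hdr", probe_hdr },
   };
   cursor c;
   size_t i;
   c.data = data;
   c.len = len;
   for (i = 0; i < sizeof probes / sizeof probes[0]; ++i) {
      c.pos = 0;                            /* rewind before every probe */
      if (probes[i].probe(&c)) return probes[i].name;
   }
   return NULL;
}
```
#endif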

#ifndef STBI_NO_STDIO
STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
{
    FILE *f = stbi__fopen(filename, "rb");
    int result;
    if (!f) return stbi__err("can't fopen", "Unable to open file");
    result = stbi_info_from_file(f, x, y, comp);
    fclose(f);
    return result;
}

STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
{
   int r;
   stbi__context s;
   long pos = ftell(f);
   stbi__start_file(&s, f);
   r = stbi__info_main(&s,x,y,comp);
   fseek(f,pos,SEEK_SET);
   return r;
}
#endif // !STBI_NO_STDIO

STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_mem(&s,buffer,len);
   return stbi__info_main(&s,x,y,comp);
}

STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
{
   stbi__context s;
   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
   return stbi__info_main(&s,x,y,comp);
}

#endif // STB_IMAGE_IMPLEMENTATION

/*
   revision history:
      2.02  (2015-01-19) fix incorrect assert, fix warning
      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
                         progressive JPEG (stb)
                         PGM/PPM support (Ken Miller)
                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
                         GIF bugfix -- seemingly never worked
                         STBI_NO_*, STBI_ONLY_*
      1.48  (2014-12-14) fix incorrectly-named assert()
      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
                         optimize PNG (ryg)
                         fix bug in interlaced PNG with user-specified channel count (stb)
      1.46  (2014-08-26)
              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
      1.45  (2014-08-16)
              fix MSVC-ARM internal compiler error by wrapping malloc
      1.44  (2014-08-07)
              various warning fixes from Ronny Chevalier
      1.43  (2014-07-15)
              fix MSVC-only compiler problem in code changed in 1.42
      1.42  (2014-07-09)
              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
              fixes to stbi__cleanup_jpeg path
              added STBI_ASSERT to avoid requiring assert.h
      1.41  (2014-06-25)
              fix search&replace from 1.36 that messed up comments/error messages
      1.40  (2014-06-22)
              fix gcc struct-initialization warning
      1.39  (2014-06-15)
              fix to TGA optimization when req_comp != number of components in TGA;
              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
              add support for BMP version 5 (more ignored fields)
      1.38  (2014-06-06)
              suppress MSVC warnings on integer casts truncating values
              fix accidental rename of 'skip' field of I/O
      1.37  (2014-06-04)
              remove duplicate typedef
      1.36  (2014-06-03)
              convert to header file single-file library
              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
      1.35  (2014-05-27)
              various warnings
              fix broken STBI_SIMD path
              fix bug where stbi_load_from_file no longer left file pointer in correct place
              fix broken non-easy path for 32-bit BMP (possibly never used)
              TGA optimization by Arseny Kapoulkine
      1.34  (unknown)
              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
      1.33  (2011-07-14)
              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
      1.32  (2011-07-13)
              support for "info" function for all supported filetypes (SpartanJ)
      1.31  (2011-06-20)
              a few more leak fixes, bug in PNG handling (SpartanJ)
      1.30  (2011-06-11)
              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
              removed deprecated format-specific test/load functions
              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
              fix inefficiency in decoding 32-bit BMP (David Woo)
      1.29  (2010-08-16)
              various warning fixes from Aurelien Pocheville
      1.28  (2010-08-01)
              fix bug in GIF palette transparency (SpartanJ)
      1.27  (2010-08-01)
              cast-to-stbi_uc to fix warnings
      1.26  (2010-07-24)
              fix bug in file buffering for PNG reported by SpartanJ
      1.25  (2010-07-17)
              refix trans_data warning (Won Chun)
      1.24  (2010-07-12)
              perf improvements reading from files on platforms with lock-heavy fgetc()
              minor perf improvements for jpeg
              deprecated type-specific functions so we'll get feedback if they're needed
              attempt to fix trans_data warning (Won Chun)
      1.23    fixed bug in iPhone support
      1.22  (2010-07-10)
              removed image *writing* support
              stbi_info support from Jetro Lauha
              GIF support from Jean-Marc Lienher
              iPhone PNG-extensions from James Brown
              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
      1.20    added support for Softimage PIC, by Tom Seddon
      1.19    bug in interlaced PNG corruption check (found by ryg)
      1.18 2008-08-02
              fix a threading bug (local mutable static)
      1.17    support interlaced PNG
      1.16    major bugfix - stbi__convert_format converted one too many pixels
      1.15    initialize some fields for thread safety
      1.14    fix threadsafe conversion bug
              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
      1.13    threadsafe
      1.12    const qualifiers in the API
      1.11    Support installable IDCT, colorspace conversion routines
      1.10    Fixes for 64-bit (don't use "unsigned long")
              optimized upsampling by Fabian "ryg" Giesen
      1.09    Fix format-conversion for PSD code (bad global variables!)
      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
      1.07    attempt to fix C++ warning/errors again
      1.06    attempt to fix C++ warning/errors again
      1.05    fix TGA loading to return correct *comp and use good luminance calc
      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
      1.02    support for (subset of) HDR files, float interface for preferred access to them
      1.01    fix bug: possible bug in handling right-side up bmps... not sure
              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
      1.00    interface to zlib that skips zlib header
      0.99    correct handling of alpha in palette
      0.98    TGA loader by lonesock; dynamically add loaders (untested)
      0.97    jpeg errors on too large a file; also catch another malloc failure
      0.96    fix detection of invalid v value - particleman@mollyrocket forum
      0.95    during header scan, seek to markers in case of padding
      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
      0.93    handle jpegtran output; verbose errors
      0.92    read 4,8,16,24,32-bit BMP files of several formats
      0.91    output 24-bit Windows 3.0 BMP files
      0.90    fix a few more warnings; bump version number to approach 1.0
      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
      0.60    fix compiling as c++
      0.59    fix warnings: merge Dave Moore's -Wall fixes
      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
      0.56    fix bug: zlib uncompressed mode len vs. nlen
      0.55    fix bug: restart_interval not initialized to 0
      0.54    allow NULL for 'int *comp'
      0.53    fix bug in png 3->4; speedup png decoding
      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
              on 'test' only check type, not whether we support this variant
      0.50    first released version
*/