| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| .. | 03-May-2022 | - | | |
| anti-aliasing/ | 24-Feb-2020 | - | 2,959 | 2,032 |
| auto-box/ | 24-Feb-2020 | - | 148 | 120 |
| blurs/ | 24-Feb-2020 | - | 8,381 | 6,700 |
| border/ | 24-Feb-2020 | - | 1,883 | 1,573 |
| cel/ | 24-Feb-2020 | - | 774 | 619 |
| crt/ | 24-Feb-2020 | - | 31,614 | 21,841 |
| cubic/ | 24-Feb-2020 | - | 217 | 172 |
| ddt/ | 24-Feb-2020 | - | 693 | 564 |
| deblur/shaders/ | 24-Feb-2020 | - | 144 | 113 |
| denoisers/ | 24-Feb-2020 | - | 917 | 795 |
| dithering/ | 24-Feb-2020 | - | 1,633 | 1,301 |
| eagle/ | 24-Feb-2020 | - | 188 | 163 |
| gpu/ | 24-Feb-2020 | - | 923 | 755 |
| handheld/ | 24-Feb-2020 | - | 8,911 | 7,268 |
| hqx/ | 24-Feb-2020 | - | 533 | 447 |
| include/ | 24-Feb-2020 | - | 3,497 | 1,901 |
| interpolation/ | 24-Feb-2020 | - | 1,002 | 788 |
| linear/ | 24-Feb-2020 | - | 76 | 60 |
| misc/ | 24-Feb-2020 | - | 2,692 | 2,237 |
| motion-interpolation/ | 24-Feb-2020 | - | 275 | 231 |
| motionblur/ | 24-Feb-2020 | - | 589 | 480 |
| nedi/ | 24-Feb-2020 | - | 1,326 | 1,177 |
| nes_raw_palette/ | 24-Feb-2020 | - | 1,666 | 1,381 |
| nnedi3/ | 24-Feb-2020 | - | 5,014 | 4,535 |
| ntsc/ | 24-Feb-2020 | - | 2,998 | 2,477 |
| omniscale/ | 24-Feb-2020 | - | 510 | 416 |
| pal/ | 24-Feb-2020 | - | 955 | 799 |
| presets/ | 24-Feb-2020 | - | 3,162 | 2,733 |
| procedural/ | 24-Feb-2020 | - | 22,508 | 19,221 |
| quad/shaders/ | 24-Feb-2020 | - | 246 | 201 |
| reshade/ | 24-Feb-2020 | - | 1,565 | 1,317 |
| sabr/ | 24-Feb-2020 | - | 937 | 780 |
| scalefx/ | 24-Feb-2020 | - | 1,788 | 1,291 |
| scalehq/ | 24-Feb-2020 | - | 223 | 188 |
| scalenx/ | 24-Feb-2020 | - | 722 | 578 |
| scanlines/shaders/ | 24-Feb-2020 | - | 261 | 213 |
| sharpen/ | 24-Feb-2020 | - | 1,042 | 835 |
| spec/ | 24-Feb-2020 | - | 780 | 591 |
| test/ | 24-Feb-2020 | - | 278 | 223 |
| vhs/ | 24-Feb-2020 | - | 919 | 725 |
| warp/shaders/ | 24-Feb-2020 | - | 300 | 231 |
| windowed/ | 24-Feb-2020 | - | 311 | 248 |
| xbr/ | 24-Feb-2020 | - | 8,550 | 6,866 |
| xbrz/ | 24-Feb-2020 | - | 2,108 | 1,812 |
| xsal/ | 24-Feb-2020 | - | 454 | 363 |
| xsoft/ | 24-Feb-2020 | - | 306 | 250 |
| Makefile | 03-May-2022 | 336 | 15 | 11 |
| README.md | 24-Feb-2020 | 35.3 KiB | 780 | 591 |
| bilinear.slangp | 24-Feb-2020 | 57 | 5 | 3 |
| configure | 24-Feb-2020 | 38 | 4 | 1 |
| nearest.slangp | 24-Feb-2020 | 58 | 5 | 3 |
| stock.slang | 24-Feb-2020 | 665 | 35 | 29 |

README.md

# Vulkan GLSL RetroArch shader system

This document is a draft of RetroArch's new GPU shader system.
It will outline the features in the new shader subsystem and describe details for how it will work in practice.

In addition, this document will contain various musings on why certain design choices are made and which compromises have been made to arrive at the conclusion. This is mostly for discussion and deliberation while the new system is under development.

## Introduction

### Target shader languages
 - Vulkan
 - GL 2.x (legacy desktop)
 - GL 3.x+ (modern desktop)
 - GLES2 (legacy mobile)
 - GLES3 (modern mobile)
 - (HLSL, potentially)
 - (Metal, potentially)

RetroArch is still expected to run on GLES2 and GL2 systems.
GL2 is mostly not relevant any longer, but GLES2 is certainly a very relevant platform still and having GLES2 compatibility makes GL2 very easy.
We therefore want to avoid speccing out a design which deliberately ruins GLES2 compatibility.

However, we also do not want to artificially limit ourselves to shader features which are only available in GLES2.
There are many shader builtins for example which only work in GLES3/GL3 and we should not hold back support in these cases.
When we want to consider GLES2 compat we should not spec out high level features which do not make much sense in the context of GLES2.

### Why a new spec?

The current shader subsystem in RetroArch is quite mature with a large body of shaders written for it.
While it has served us well, it is not forward-compatible.

Writing high-level shaders that work "everywhere" is currently very challenging.
There was no good ready-made solution for this.
Up until now, we have relied on nVidia Cg to serve as a basic foundation for shaders, but Cg has been discontinued for years and is closed source.
This is very problematic since Cg is not a forward compatible platform.
It has many warts which are heavily tied in to legacy APIs and systems.
For this reason, we cannot use Cg for newer APIs such as Vulkan and potentially D3D12 and Metal.

Cg cross compilation to GLSL is barely working and it is horribly unmaintainable with several unfixable issues.
The output is so horribly mangled and unoptimized that it is clearly not the approach we should be taking.
We also cannot do the Cg transform at runtime on mobile due to the lack of an open source Cg runtime, so there's that as well.

Another alternative is to write straight-up GLSL, but this too has some severe problems.
All the different GL versions and GLSL variants are different enough that it becomes painful to write portable GLSL code that works without modification.
Examples include:

 - varying/attribute vs in/out (legacy vs modern)
 - precision qualifiers (GLSL vs ESSL)
 - texture2D vs texture (legacy vs modern)
 - Lack of standard support for #include to reduce copy-pasta

The problem really is that GLSL shaders are dependent on the runtime GL version, which makes it very annoying and hard to test all shader variants.

We do not want to litter every shader with heaps of #ifdefs everywhere to combat this problem.
We also want to avoid having to write pseudo-GLSL with some text based replacement behind the scenes.

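To make the complaint concrete, here is a rough sketch of what a single plain-GLSL fragment shader tends to look like once it has to cover legacy and modern dialects by hand. The macro names `COMPAT_VARYING` and `COMPAT_TEXTURE` and the uniform name `Texture` are made up for this example, and it assumes the host application prepends a suitable `#version` line before compiling.

```glsl
// ESSL needs a default float precision; desktop GLSL does not.
#ifdef GL_ES
precision mediump float;
#endif

// Modern dialects use in/out and texture(),
// legacy dialects use varying, gl_FragColor and texture2D().
#if __VERSION__ >= 130
#define COMPAT_VARYING in
#define COMPAT_TEXTURE texture
out vec4 FragColor;
#else
#define COMPAT_VARYING varying
#define COMPAT_TEXTURE texture2D
#define FragColor gl_FragColor
#endif

uniform sampler2D Texture;
COMPAT_VARYING vec2 vTexCoord;

void main()
{
   FragColor = COMPAT_TEXTURE(Texture, vTexCoord);
}
```

Every shader would need to carry this kind of preamble, and every variant would have to be tested on a matching GL runtime.
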
#### Vulkan GLSL as the portable solution

Fortunately, there is now a forward looking and promising solution to our problems.
Vulkan GLSL is a GLSL dialect designed for Vulkan and the SPIR-V intermediate representation.
The good part is that we can use whatever GLSL version we want when writing shaders, as it is decoupled from the GL runtime.

At runtime, we can have a vendor-neutral mature compiler,
[glslang](https://github.com/KhronosGroup/glslang), which compiles our Vulkan GLSL to SPIR-V.
Using [SPIRV-Cross](https://github.com/KhronosGroup/SPIRV-Cross), we can then do reflection on the SPIR-V binary to deduce our filter chain layout.
We can also disassemble back to our desired GLSL dialect in the GL backend based on which GL version we're running,
which effectively means we can completely sidestep all our current problems with a pure GLSL based shading system.

Another upside of this is that we no longer have to deal with vendor-specific quirks in the GLSL frontend.
A common problem when people write for nVidia is that people mistakenly use float2/float3/float4 types from Cg/HLSL, which are supported
as an extension in their GLSL frontend.

##### Why not SPIR-V directly?

This was considered, but there are several convenience problems with having a shading spec around pure SPIR-V.
The first problem is metadata. In GLSL, we can quite easily extend with custom #pragmas or similar, but there is no trivial way to do this in SPIR-V
outside writing custom tools to emit special metadata as debug information or similar with OpSource.

We could also have this metadata outside in a separate file, but juggling more files means more churn, which we should try to avoid.
The other problem is convenience. If RetroArch only accepts SPIR-V, we would need an explicit build step outside RetroArch first before we could
test a shader. This gets very annoying during shader development,
so it is clear that we need to support GLSL anyways, making SPIR-V support kinda redundant.

The main argument for supporting SPIR-V would be to allow new shading languages to be used. This is a reasonable thing to consider, which is why
the goal is to not design ourselves into a corner where it's only Vulkan GLSL that can possibly work down the line. We are open to the idea that
new shading languages that target SPIR-V will emerge.

### Warts in old shader system

While the old shader system is functional it has some severe warts which have accumulated over time.
In hindsight, some of the early design decisions were misguided and need to be properly fixed.

#### Forced POT with padding

This is arguably the largest wart of them all. This design decision originated in a misguided effort to combat FP precision issues with texture sampling. The idea at the time was to avoid cases where nearest neighbor sampling at texel edges would cause artifacts. This is a typical case when textures are scaled with non-integer factors. However, the problem to begin with is naive nearest neighbor and non-integer scaling factors, and not FP precision. It was pure luck that POT tended to give better results with broken shaders, but we should not make this mistake again. POT padding has some severe issues which are not just cleanliness related either.

Technically, GLES2 doesn't require non-POT support, but in practice, all GPUs support this.

##### No proper UV wrapping
Since the texture "ends" at UV coords < 1.0, we cannot properly
use sampler wrapping modes. We can only fake `CLAMP_TO_BORDER` by padding with black color, but this wrapping mode is not available by default in GLES2 and even GLES3!
`CLAMP_TO_BORDER` isn't necessarily what we want either. `CLAMP_TO_EDGE` is usually a far more sane default.

##### Extra arguments for actual width vs. texture width

With normalized coordinates we need to think in terms of both the real resolution (e.g. 320x240) and the POT padded resolution (e.g. 512x512). This complicates things massively and
we were passing an insane amount of attributes and varyings to deal with this because the ratios between the two needn't be the same for two different textures.

#### Arbitrary limits
The way the old shader system deals with limits is quite naive.
There is a hard limit of 8 when referencing other passes and older frames.
There is no reason why we should have arbitrary limits like these.
Part of the reason is C, where dealing with dynamic memory is more painful than it should be, so it was easier to take the lazy way out.

#### Tacked on format handling

In more complex shaders we need to consider more than just the plain `RGBA8_UNORM` format.
The old shader system tacked on these things after the fact by adding booleans for SRGB and FP support, but this obviously doesn't scale.
This point does get problematic since GLES2 has terrible support for render target formats, but we should allow complex shaders to use complex RT formats
and rather just allow some shader presets to drop GLES2 compat.

#### PASS vs PASSPREV

Ugly. We do not need two ways to access previous passes; the actual solution is to have aliases for passes instead and access by name.

#### Inconsistencies in parameter passing

MVP matrices are passed in with weird conventions in the Cg spec, and its casing is weird.
The source texture is passed with magic TEXUNIT0 semantic while other textures are passed via uniform struct members, etc.
This is the result of tacking on feature support slowly over time without proper forethought.

## High level Overview

The RetroArch shader format outlines a filter chain/graph, a series of shader passes which operate on previously generated data to produce a final result.
The goal is for every individual pass to access information from *all* previous shader passes, even across frames, easily.

 - The filter chain specifies a number of shader passes to be executed one after the other.
 - Each pass renders a full-screen quad to a texture of a certain resolution and format.
 - The resolution can be dependent on external information.
 - All filter chains begin at an input texture, which is created by a libretro core or similar.
 - All filter chains terminate by rendering to the "backbuffer".

The backbuffer is somewhat special since the resolution of it cannot be controlled by the shader.
It also cannot be fed back into the filter chain later
because the frontend (here RetroArch) will render UI elements and such on top of the final pass output.

Let's first look at what we mean by filter chains and how far we can expand this idea.

### Simplest filter chain

The simplest filter chain we can specify is a single pass.

```
(Input) -> [ Shader Pass #0 ] -> (Backbuffer)
```

In this case there are no offscreen render targets necessary since our input is rendered directly to screen.

### Multiple passes

A trivial extension is to keep our straight line view of the world where each pass looks at the previous output.

```
(Input) -> [ Shader Pass #0 ] -> (Framebuffer) -> [ Shader Pass #1 ] -> (Backbuffer)
```

Framebuffer here might have a different resolution than both Input and Backbuffer.
A very common scenario for this is separable filters where we first scale horizontally, then vertically.

### Multiple passes and multiple inputs

There is no reason why we should restrict ourselves to a straight-line view.

```
     /------------------------------------------------\
    /                                                  v
(Input) -> [ Shader Pass #0 ] -> (Framebuffer #0) -> [ Shader Pass #1 ] -> (Backbuffer)
```

In this scenario, we have two inputs to shader pass #1, both the original, untouched input as well as the result of a pass in-between.
All the inputs to a pass can have different resolutions.
We have a way to query the resolution of individual textures to allow highly controlled sampling.

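As a sketch of what shader pass #1 in the diagram could look like, the shader below samples both the untouched input and the output of pass #0. It uses the builtin names `Original` and `Source` that are defined in the specification part of this document. The assumption that pass #0 is some kind of blur, and the `Amount` parameter, are hypothetical and only serve the illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
   float Amount; // hypothetical user parameter
};

#pragma parameter Amount "Sharpening amount" 0.5 0.0 2.0 0.1

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Original; // input of the filter chain
layout(binding = 2) uniform sampler2D Source;   // output of pass #0 (assumed blurred)
void main()
{
   vec4 sharp   = texture(Original, vTexCoord);
   vec4 blurred = texture(Source, vTexCoord);
   // Unsharp mask: push the original away from its blurred version.
   FragColor = sharp + Amount * (sharp - blurred);
}
```
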
We are now at a point where we can express an arbitrarily complex filter graph, but we can do better.
For certain effects, time (or rather, results from earlier frames) can be an important factor.

### Multiple passes, multiple inputs, with history

We now extend our filter graph, where we also have access to information from earlier frames. Note that this is still a causal filter system.

```
Frame N:        (Input     N, Input N - 1, Input N - 2) -> [ Shader Pass #0 ] -> (Framebuffer     N, Framebuffer N - 1, Input N - 3) -> [ Shader Pass #1 ] -> (Backbuffer)
Frame N - 1:    (Input N - 1, Input N - 2, Input N - 3) -> [ Shader Pass #0 ] -> (Framebuffer N - 1, Framebuffer N - 2, Input N - 4) -> [ Shader Pass #1 ] -> (Backbuffer)
Frame N - 2:    (Input N - 2, Input N - 3, Input N - 4) -> [ Shader Pass #0 ] -> (Framebuffer N - 2, Framebuffer N - 3, Input N - 5) -> [ Shader Pass #1 ] -> (Backbuffer)
```

For framebuffers we can read the previous frame's framebuffer. We don't really need more than one frame of history since we have a feedback effect in place.
Just like IIR filters, the "response" of such a feedback in the filter graph gives us essentially "infinite" history back in time,
although it is mostly useful for long-lasting blurs and ghosting effects. Supporting more than one frame of feedback would also be extremely memory intensive since framebuffers tend to be
much higher resolution than their input counterparts. One frame is also a nice "clean" limit. Once we go beyond just 1, the floodgate opens to arbitrary numbers, which we would want to avoid.
It is also possible to fake as many feedback frames of history as we want anyway,
since we can copy a feedback frame to a separate pass, which effectively creates a "shift register" of feedback framebuffers in memory.

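A minimal sketch of such a feedback effect is a one-pole IIR blend between the current frame and the previous output of the same pass. It relies on the `#pragma name` alias mechanism described later in this document, so `GhostingFeedback` refers to this pass's own output from the previous frame. The pass name `Ghosting` and the `Persistence` parameter are hypothetical, and the pass is assumed not to be the final one, since the backbuffer cannot be fed back.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
   float Persistence; // hypothetical user parameter
};

#pragma name Ghosting
#pragma parameter Persistence "Ghosting persistence" 0.5 0.0 0.95 0.05

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Source;
layout(binding = 2) uniform sampler2D GhostingFeedback; // this pass, previous frame
void main()
{
   vec4 current  = texture(Source, vTexCoord);
   vec4 previous = texture(GhostingFeedback, vTexCoord);
   // out[N] = (1 - p) * in[N] + p * out[N - 1]
   FragColor = mix(current, previous, Persistence);
}
```

Because each output already contains a fraction of all earlier outputs, one frame of feedback is enough to get a response that decays over many frames.
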
Input textures can have an arbitrary number of frames as history (just limited by memory).
They cannot feed back since the filter chain cannot render into them, so the history is effectively a finite response (FIR).

During the very first frames, any frame with index N < 0 reads as transparent black (all values 0).

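In contrast to feedback, input history behaves like a FIR filter. The sketch below averages the current input with the two previous input frames, using the `Original` and `OriginalHistory#` names from the builtin texture variables described later. The equal weights are an arbitrary choice for illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Original;         // input, frame N
layout(binding = 2) uniform sampler2D OriginalHistory1; // input, frame N - 1
layout(binding = 3) uniform sampler2D OriginalHistory2; // input, frame N - 2
void main()
{
   vec4 f0 = texture(Original, vTexCoord);
   vec4 f1 = texture(OriginalHistory1, vTexCoord);
   vec4 f2 = texture(OriginalHistory2, vTexCoord);
   // 3-tap FIR over time with equal weights.
   FragColor = (f0 + f1 + f2) / 3.0;
}
```
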
### No POT padding

No texture in the filter chain is padded at any time. It is possible for resolutions in the filter chain to vary over time which is common with certain emulated systems.
In these scenarios, the textures and framebuffers are simply resized appropriately.
Older frames still keep their old resolution in the brief moment that the resolution is changing.

It is very important that shaders do not blindly sample with nearest filter with any scale factor. If naive nearest neighbor sampling is to be used, shaders must make sure that
the filter chain is configured with integer scaling factors so that ambiguous texel-edge sampling is avoided.

### Deduce shader inputs by reflection

We want to have as much useful information in the shader source as possible. We want to avoid having to explicitly write out metadata in shaders wherever we can.
The biggest hurdle to overcome is how we describe our pipeline layout. The pipeline layout contains information about how we access resources such as uniforms and textures.
The main types of inputs in this shader system are:

 - Texture samplers (sampler2D)
 - Look-up textures for static input data
 - Uniform data describing dimensions of textures
 - Uniform ancillary data for render target dimensions, backbuffer target dimensions, frame count, etc
 - Uniform user-defined parameters
 - Uniform MVP for vertex shader

#### Deduction by name

There are two main approaches to deduce what a sampler2D uniform wants to sample from.
The first way is to explicitly state somewhere else what that particular sampler needs, e.g.

```
uniform sampler2D geeWhatAmI;

// Metadata somewhere else
SAMPLER geeWhatAmI = Input[-2]; // Input frame from 2 frames ago
```

The other approach is to have built-in identifiers which correspond to certain textures.

```
// Source here being defined as the texture from previous framebuffer pass or the input texture if this is the first pass in the chain.
uniform sampler2D Source;
```

In SPIR-V, we can use `OpName` to describe these names, so we do not require the original Vulkan GLSL source to perform this reflection.
We use this approach throughout the specification. An identifier is mapped to an internal meaning (semantic). The shader backend looks at these semantics and constructs
a filter chain based on all shaders in the chain.

Identifiers can also have user defined meaning, either as an alias to existing identifiers or mapping to user defined parameters.

### Combining vertex and fragment into a single shader file

One strength of Cg is its ability to contain multiple shader stages in the same .cg file.
This is very convenient since we always want to consider vertex and fragment together.
This is especially needed when trying to mix and match shaders in a GUI window for example.
We don't want to require users to load first a vertex shader, then fragment manually.

GLSL however does not support this out of the box. This means we need to define a light-weight system for preprocessing
one GLSL source file into multiple stages.

#### Should we make vertex optional?

In most cases, the vertex shader will remain the same.
This leaves us with the option to provide a "default" vertex stage if the shader stage is not defined.

### #include support

With complex filter chains there is a lot of opportunity to reuse code.
We therefore want light support for the #include directive.

### User parameter support

Since we already have a "preprocessor" of sorts, we can also trivially extend this idea with user parameters.
In the shader source we can specify which uniform inputs are user controlled, GUI visible name, their effective range, etc.

### Lookup textures

A handy feature to have is reading from lookup textures.
We can specify that some sampler inputs are loaded from a PNG file on disk as a plain RGBA8 texture.

#### Do we want to support complex reinterpretation?

There could be valid use cases for supporting other formats than plain `RGBA8_UNORM`.
`SRGB` and `UINT` might be valid cases as well and maybe even 2x16-bit, 1x32-bit integer formats.

#### Lookup buffers

Do we want to support lookup buffers as UBOs as well?
This wouldn't be doable in GLES2, but it could be useful as a more modern feature.
If the LUT is small enough, we could realize it via plain old uniforms as well perhaps.

This particular feature could be very interesting for generic polyphase lookup banks with different LUT files for different filters.

## Vulkan GLSL specification

This part of the spec considers how Vulkan GLSL shaders are written. The frontend uses the glslang frontend to compile GLSL sources.
This ensures that we do not end up with vendor-specific extensions.
The #version string should be as recent as possible, e.g. `#version 450` or `#version 310 es`.
It is recommended to use 310 es since it allows mediump which can help on mobile.
Note that after the Vulkan GLSL is turned into SPIR-V, the original #version string does not matter anymore.
Also note that SPIR-V cannot be generated from legacy shader versions such as #version 100 (ES 2.0) or #version 120 (GL 2.1).

The frontend will use reflection on the resulting SPIR-V file in order to deduce what each element in the UBO or what each texture means.
The main types of data passed to shaders are read-only and can be classified as:

 - `uniform sampler2D`: This is used for input textures, framebuffer results and lookup-textures.
 - `uniform Block { };`: This is used for any constant data which is passed to the shader.
 - `layout(push_constant) uniform Push {} name;`: This is used for any push constant data which is passed to the shader.

### Resource usage rules

Certain rules must be adhered to in order to make it easier for the frontend to dynamically set up bindings to resources.

 - All resources must be using descriptor set #0, or don't use layout(set = #N) at all.
 - layout(binding = #N) must be declared for all UBOs and sampler2Ds.
 - All resources must use different bindings.
 - There can be only one UBO.
 - There can be only one push constant block.
 - It is possible to have one regular UBO and one push constant UBO.
 - If a UBO is used in both vertex and fragment, their binding number must match.
 - If a UBO is used in both vertex and fragment, members with the same name must have the same offset/binary interface.
   This problem is easily avoided by having the same UBO visible to both vertex and fragment as "common" code.
 - If a push constant block is used in both vertex and fragment, members with the same name must have the same offset/binary interface.
 - sampler2D cannot be used in vertex, although the size parameters of samplers can be used in vertex.
 - Other resource types such as SSBOs, images, atomic counters, etc, etc, are not allowed.
 - Every member of the UBOs and push constant blocks as well as every texture must be meaningful
   to the frontend in some way, or an error is generated.

### Initial preprocess of slang files

The very first line of a `.slang` file must contain a `#version` statement.

The first process which takes place is dealing with `#include` statements.
A slang file is preprocessed by scanning through the slang and resolving all `#include` statements.
The include process does not consider any preprocessor defines or conditional expressions.
The include path must always be relative, and it will be relative to the file path of the current file.
Nested includes are allowed, but includes in a cycle are undefined as preprocessor guards are not considered.

E.g.:
```
#include "common.inc"
```

After includes have been resolved, the frontend scans through all lines of the shader and considers `#pragma` statements.
These pragmas build up ancillary reflection information and otherwise meaningful metadata.

#### `#pragma stage`
This pragma controls which parts of a `.slang` file are visible to certain shader stages.
Currently, two variants of this pragma are supported:

 - `#pragma stage vertex`
 - `#pragma stage fragment`

If no `#pragma stage` has been encountered yet, lines of code in a shader belong to all shader stages.
If a `#pragma stage` statement has been encountered, that stage is considered active, and the following lines of shader code will only be used when building source for that particular shader stage. A new `#pragma stage` can override which stage is active.

#### `#pragma name`
This pragma lets a shader set its identifier. This identifier can be used to create simple aliases for other passes.

E.g.:
```
#pragma name HorizontalPass
```

#### `#pragma format`
This pragma controls the format of the framebuffer which this shader will render to.
The default render target format is `R8G8B8A8_UNORM`.

Supported render target formats are listed below. From a portability perspective,
please be aware that GLES2 has abysmal render target format support,
and GLES3/GL3 may have restricted floating point render target support.

If rendering to uint/int formats, make sure your fragment shader output target is uint/int.

##### 8-bit
 - `R8_UNORM`
 - `R8_UINT`
 - `R8_SINT`
 - `R8G8_UNORM`
 - `R8G8_UINT`
 - `R8G8_SINT`
 - `R8G8B8A8_UNORM`
 - `R8G8B8A8_UINT`
 - `R8G8B8A8_SINT`
 - `R8G8B8A8_SRGB`

##### 10-bit
 - `A2B10G10R10_UNORM_PACK32`
 - `A2B10G10R10_UINT_PACK32`

##### 16-bit
 - `R16_UINT`
 - `R16_SINT`
 - `R16_SFLOAT`
 - `R16G16_UINT`
 - `R16G16_SINT`
 - `R16G16_SFLOAT`
 - `R16G16B16A16_UINT`
 - `R16G16B16A16_SINT`
 - `R16G16B16A16_SFLOAT`

##### 32-bit
 - `R32_UINT`
 - `R32_SINT`
 - `R32_SFLOAT`
 - `R32G32_UINT`
 - `R32G32_SINT`
 - `R32G32_SFLOAT`
 - `R32G32B32A32_UINT`
 - `R32G32B32A32_SINT`
 - `R32G32B32A32_SFLOAT`

E.g.:
```
#pragma format R16_SFLOAT
```
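
To illustrate the note about integer formats, the following sketch renders to an `R32_UINT` target and therefore declares a `uint` fragment output. It packs the sampled color into 32 bits with the standard GLSL builtin `packUnorm4x8`. How a later pass would read such a target is not covered here, so only the write side is shown.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

#pragma format R32_UINT

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
// The output type must match the integer render target format.
layout(location = 0) out uint FragColor;
layout(binding = 1) uniform sampler2D Source;
void main()
{
   // Pack four normalized channels into one 32-bit unsigned integer.
   FragColor = packUnorm4x8(texture(Source, vTexCoord));
}
```
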
#### `#pragma parameter`

Shader parameters allow shaders to take user-defined inputs as uniform values.
This makes shaders more configurable.

The format is:
```
#pragma parameter IDENTIFIER "DESCRIPTION" INITIAL MINIMUM MAXIMUM [STEP]
```
The step parameter is optional.
INITIAL, MINIMUM and MAXIMUM are floating point values.
IDENTIFIER is the meaningful string which is the name of the uniform which will be used in a UBO or push constant block.
DESCRIPTION is a string which is a human-readable representation of IDENTIFIER.

E.g.:
```
layout(push_constant) uniform Push {
   float DummyVariable;
} registers;
#pragma parameter DummyVariable "This is a dummy variable" 1.0 0.2 2.0 0.1
```

### I/O interface variables

The slang shader spec specifies two vertex inputs and one fragment output.
Varyings between vertex and fragment shaders are user-defined.

#### Vertex inputs
Two attributes are provided and must be present in a shader.
It is only the layout(location = #N) which is actually significant.
The particular names of input and output variables are ignored, but should be consistent for readability.

##### `layout(location = 0) in vec4 Position;`
This attribute is a 2D position in the form `vec4(x, y, 0.0, 1.0);`.
Shaders should not try to extract meaning from the x, y.
`gl_Position` must be assigned as:

```
gl_Position = MVP * Position;
```
##### `layout(location = 1) in vec2 TexCoord;`
The texture coordinate is semantically such that (0.0, 0.0) is top-left and (1.0, 1.0) is bottom-right.
If TexCoord is passed to a varying unmodified, the interpolated varying will be `uv = 0.5 / OutputSize` when rendering the upper left pixel as expected and `uv = 1.0 - 0.5 / OutputSize` when rendering the bottom-right pixel.

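One consequence of this convention is that `vTexCoord * OutputSize.xy` lands on pixel centers (0.5, 1.5, ...), so `floor()` of it gives the integer index of the output pixel being rendered. The sketch below uses that to draw a one-pixel checkerboard over the image. The dimming factor is an arbitrary choice for illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

layout(push_constant) uniform Push
{
   vec4 OutputSize;
} registers;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord; // passed through unmodified
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Source;
void main()
{
   // Upper-left pixel: uv * size = 0.5, so floor() gives 0.
   // Bottom-right pixel: uv * size = size - 0.5, so floor() gives size - 1.
   vec2 pixel = floor(vTexCoord * registers.OutputSize.xy);
   float checker = mod(pixel.x + pixel.y, 2.0);
   FragColor = texture(Source, vTexCoord) * mix(0.8, 1.0, checker);
}
```
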
#### Vertex/Fragment interface
Vertex outputs and fragment inputs link by location, and not name.

E.g.:
```
// Vertex
layout(location = 0) out vec4 varying;
// Fragment
layout(location = 0) in vec4 some_other_name;
```
will still link fine, although using the same names is encouraged for readability.

#### Fragment outputs

##### `layout(location = 0) out vec4 FragColor;`
Fragment shaders must have a single output to location = 0.
Multiple render targets are not allowed. The type of the output depends on the render target format.
int/uint type must be used if UINT/INT render target formats are used, otherwise float type.

### Builtin variables

#### Builtin texture variables
Input textures get their meaning from their names.

 - Original: This accesses the input of the filter chain, accessible from any pass.
 - Source: This accesses the input from the previous shader pass, or Original if accessed in the first pass of the filter chain.
 - OriginalHistory#: This accesses the input # frames back in time.
   There is no limit on #, except larger numbers will consume more VRAM.
   OriginalHistory0 is an alias for Original, OriginalHistory1 is the previous frame and so on.
 - PassOutput#: This accesses the output from pass # in this frame.
   PassOutput# must be causal, it is an error to access PassOutputN in pass M if N >= M.
   PassOutput# will typically be aliased to a more readable value.
 - PassFeedback#: This accesses PassOutput# from the previous frame.
   Any pass can read the feedback of any pass, since it is causal.
   PassFeedback# will typically be aliased to a more readable value.
 - User#: This accesses look-up textures.
   However, the direct use of User# is discouraged; look-up textures should always be accessed via aliases.

#### Builtin texture size uniform variables

If a member of a UBO or a push constant block is called ???Size# where ???# is the name of a texture variable,
that member must be a vec4, which will receive these values:
 - X: Horizontal size of texture
 - Y: Vertical size of texture
 - Z: 1.0 / (Horizontal size of texture)
 - W: 1.0 / (Vertical size of texture)

It is valid to use a size variable without declaring the texture itself. This is useful for vertex shading.
It is valid (although probably not useful) for a variable to be present in both a push constant block and a UBO block at the same time.

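A typical use of the Z and W components is stepping exactly one texel in either direction. As a small sketch, the shader below applies a 3-tap horizontal blur to `Source` using `SourceSize.z` as the texel width. The 1-2-1 weights are an arbitrary choice for illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

layout(push_constant) uniform Push
{
   vec4 SourceSize; // (width, height, 1.0 / width, 1.0 / height)
} registers;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Source;
void main()
{
   // One texel to the right in normalized coordinates.
   vec2 dx = vec2(registers.SourceSize.z, 0.0);
   vec4 left   = texture(Source, vTexCoord - dx);
   vec4 center = texture(Source, vTexCoord);
   vec4 right  = texture(Source, vTexCoord + dx);
   FragColor = 0.25 * left + 0.5 * center + 0.25 * right;
}
```
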
#### Builtin uniform variables

Other than uniforms related to textures, there are other special uniforms available.
These builtin variables may be part of a UBO block and/or a push constant block.

 - MVP: mat4 model view projection matrix.
 - OutputSize: a vec4(x, y, 1.0 / x, 1.0 / y) variable describing the render target size (x, y) for this pass.
 - FinalViewportSize: a vec4(x, y, 1.0 / x, 1.0 / y) variable describing the render target size for the final pass.
   Accessible from any pass.
 - FrameCount: a uint variable taking a value which increases by one every frame.
   This value could be pre-wrapped by modulo if specified in preset.
   This is useful for creating time-dependent effects.

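As a sketch of a time-dependent effect, the shader below dims every other output row and swaps which rows are dimmed each frame by adding `FrameCount` to the row index. The dimming factor of 0.75 is an arbitrary value chosen for illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

layout(push_constant) uniform Push
{
   vec4 OutputSize;
   uint FrameCount;
} registers;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Source;
void main()
{
   // Integer index of the output row being rendered.
   uint row = uint(vTexCoord.y * registers.OutputSize.y);
   // The parity flips every frame, so the dimmed rows alternate over time.
   bool dim = ((row + registers.FrameCount) & 1u) == 1u;
   FragColor = texture(Source, vTexCoord) * (dim ? 0.75 : 1.0);
}
```
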
#### Aliases
Aliases can give meaning to arbitrary names in a slang file.
This is mostly relevant for LUT textures, shader parameters and accessing other passes by name.

If a shader pass has a `#pragma name NAME` associated with it, meaning is given to the following identifiers:
 - NAME is a sampler2D.
 - NAMESize is a vec4 size uniform associated with NAME.
 - NAMEFeedback is a sampler2D for the previous frame.
 - NAMEFeedbackSize is a vec4 size uniform associated with NAMEFeedback.

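For example, assuming an earlier pass in the chain declares `#pragma name HorizontalPass`, a later pass can sample that pass by name and query its size through `HorizontalPassSize`. The pass name `VerticalPass` and the 1-2-1 weights below are hypothetical and only serve the illustration.

```glsl
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
};

layout(push_constant) uniform Push
{
   vec4 HorizontalPassSize; // size uniform of the aliased pass
} registers;

#pragma name VerticalPass

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
// Output of the earlier pass named HorizontalPass, accessed by alias.
layout(binding = 1) uniform sampler2D HorizontalPass;
void main()
{
   // One texel down in the aliased pass.
   vec2 dy = vec2(0.0, registers.HorizontalPassSize.w);
   vec4 up     = texture(HorizontalPass, vTexCoord - dy);
   vec4 center = texture(HorizontalPass, vTexCoord);
   vec4 down   = texture(HorizontalPass, vTexCoord + dy);
   FragColor = 0.25 * up + 0.5 * center + 0.25 * down;
}
```
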
#### Example slang shader

```
#version 450
// 450 or 310 es are recommended

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
   vec4 SourceSize; // Not used here, but doesn't hurt
   float ColorMod;
};

#pragma name StockShader
#pragma format R8G8B8A8_UNORM
#pragma parameter ColorMod "Color intensity" 1.0 0.1 2.0 0.1

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(binding = 1) uniform sampler2D Source;
void main()
{
   FragColor = texture(Source, vTexCoord) * ColorMod;
}
```

### Push constants vs uniform blocks
Push constants are fast-access uniform data which on some GPUs will improve performance over plain UBOs.
It is encouraged to use push constant data as much as possible.

```
layout(push_constant) uniform Push
{
   vec4 SourceSize;
   vec4 FinalViewportSize;
} registers;
```

However, be aware that there is a limit to how large a push constant block can be.
Vulkan guarantees a minimum push constant size of 128 bytes, which equals 8 vec4s.
It is an error to use more than 128 bytes.
If you're running out of space, you can move the MVP to a UBO instead, which frees up 64 bytes.
Always prioritize push constants for data used in fragment shaders, as there are many more fragment threads than vertex threads.
Also note that, like UBOs, the push constant space is shared across the vertex and fragment stages.

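One way to keep track of the 128-byte budget is to annotate each member with its byte range. The sketch below assumes the default packing for push constant blocks (std430-style: vec4 aligned to 16 bytes, scalars to 4 bytes). `ColorMod` is a hypothetical shader parameter included only to show a scalar member.

```
layout(push_constant) uniform Push
{
   vec4 SourceSize;        // bytes  0..15
   vec4 OutputSize;        // bytes 16..31
   vec4 FinalViewportSize; // bytes 32..47
   uint FrameCount;        // bytes 48..51
   float ColorMod;         // bytes 52..55 (hypothetical parameter)
} registers;               // 56 of the 128 available bytes are used
```

Adding a `mat4 MVP` to this block would consume another 64 bytes and leave very little headroom, which is why moving it to a UBO is suggested above.
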
If you need more than 8 vec4s, you can spill uniforms over to plain UBOs,
but more than 8 vec4s should be quite rare in practice.

E.g.:

```
layout(binding = 0, std140) uniform UBO
{
   mat4 MVP; // Only used in vertex
   vec4 SpilledUniform;
} global;

layout(push_constant) uniform Push
{
   vec4 SourceSize;
   vec4 BlurPassSize;
   // ...
} registers;
```

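When blocks are given instance names as above, members are read through those names. The following is a sketch under assumptions: `OutputSize` stands in for the spilled uniform, and the scale-dependent blend is an invented effect used only to show both access paths.

```
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;        // Only used in vertex
   vec4 OutputSize; // Spilled to the UBO in this example
} global;

layout(push_constant) uniform Push
{
   vec4 SourceSize;
} registers;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = global.MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
void main()
{
   // Push constant members are read through `registers`.
   vec2 texel = registers.SourceSize.zw;
   vec4 center = texture(Source, vTexCoord);
   vec4 left = texture(Source, vTexCoord - vec2(texel.x, 0.0));
   vec4 right = texture(Source, vTexCoord + vec2(texel.x, 0.0));

   // UBO members are read through `global`.
   // Horizontal scale factor of this pass: output width / source width.
   float scale = global.OutputSize.x * registers.SourceSize.z;
   float amount = 0.5 * clamp(scale - 1.0, 0.0, 1.0);
   FragColor = mix(center, 0.5 * (left + right), amount);
}
```
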
### Samplers
Which samplers are used for textures is specified by the preset format.
The sampler remains constant throughout the frame; there is currently no way to select samplers on a frame-by-frame basis.
This is mostly to make it possible to use the spec in GLES2, as GLES2 has no concept of separate samplers and images.

### sRGB
The input to the filter chain will not be of an sRGB format.
This is due to many reasons, the main one being that it is very difficult for the frontend to get "free" passthrough of sRGB. It is possible to have a first pass which linearizes the input to a proper sRGB render target. In this way, custom gammas can be used as well.

Likewise, the final pass will not render to an sRGB backbuffer, for similar reasons.

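A linearizing first pass could look roughly like the sketch below. It assumes `R8G8B8A8_SRGB` is among the formats accepted by `#pragma format`, and it uses a plain power curve with a hypothetical `InputGamma` parameter rather than the exact piecewise sRGB function.

```
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
   float InputGamma;
} global;

// Render into an sRGB target, so later passes sample linear values.
#pragma format R8G8B8A8_SRGB
#pragma parameter InputGamma "Input gamma" 2.2 1.0 3.0 0.05

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = global.MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
void main()
{
   // The input is not an sRGB format, so it is decoded manually here.
   vec3 encoded = texture(Source, vTexCoord).rgb;
   vec3 linear = pow(encoded, vec3(global.InputGamma));
   FragColor = vec4(linear, 1.0);
}
```
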
### Caveats

#### Frag Coord
TexCoord also replaces `gl_FragCoord`. Do not use `gl_FragCoord` as it doesn't consider the viewports correctly.
If you need `gl_FragCoord`, use `vTexCoord * OutputSize.xy` instead.

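For instance, a per-row effect that would normally read `gl_FragCoord.y` can be written as in the sketch below. The darkening of every other row and the instance names `global` and `registers` are assumptions made for illustration.

```
#version 450

layout(push_constant) uniform Push
{
   vec4 OutputSize;
} registers;

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
} global;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = global.MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
void main()
{
   // Pixel position inside this pass's render target; stands in for gl_FragCoord.xy.
   vec2 fragCoord = vTexCoord * registers.OutputSize.xy;

   // Hypothetical effect: darken every other output row.
   float dim = (mod(floor(fragCoord.y), 2.0) < 1.0) ? 1.0 : 0.7;
   FragColor = vec4(texture(Source, vTexCoord).rgb * dim, 1.0);
}
```
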
#### Derivatives
Be careful with derivatives of vTexCoord. The screen might have been rotated by the vertex shader, which will also rotate the derivatives, especially in the final pass which hits the backbuffer.
However, derivatives are fortunately never really needed, since w = 1 (we render flat 2D quads),
which means derivatives of varyings are constant. You can do some trivial replacements which will be faster and more robust.

```
// Substitutions, not assignments: replace the call on the left with the expression on the right.
dFdx(vTexCoord) = vec2(OutputSize.z, 0.0);
dFdy(vTexCoord) = vec2(0.0, OutputSize.w);
fwidth(vTexCoord) = max(OutputSize.z, OutputSize.w); // Scalar bound; the per-component value is OutputSize.zw
```
To avoid issues with rotation or unexpected derivatives in case derivatives are really needed,
off-screen passes will not have rotation, and dFdx and dFdy will behave as expected.

#### Correctly sampling textures
A common mistake made by shaders is that they aren't careful enough about sampling textures correctly.
There are three major cases to consider:

##### Bilinear sampling
If bilinear is used, it is always safe to sample a texture.

##### Nearest, with integer scale
If OutputSize / InputSize is an integer,
the interpolated vTexCoord will always fall safely inside the texel, so no special precautions have to be taken.
For very particular shaders which rely on nearest neighbor sampling, rendering at integer scale to a framebuffer and upscaling that
with a more stable upscaling filter such as bicubic is usually a great choice.

##### Nearest, with non-integer scale
Sometimes, it is necessary to upscale images which have an arbitrary size to the backbuffer.
Bilinear is not always good enough here, so we must deal with a complicated case.

If we interpolate vTexCoord over a frame with non-integer scale, it is possible that we end up exactly between two texels.
Nearest neighbor will have to find a texel which is nearest, but there is no clear "nearest" texel. In this scenario, we end up having lots of failure cases which are typically observed as weird glitches in the image which change based on the resolution.

To correctly sample nearest textures with non-integer scale, we must pre-quantize our texture coordinates.
Here's a snippet which lets us safely sample a nearest filtered texture and emulate bilinear filtering.

```
   vec2 uv = vTexCoord * global.SourceSize.xy - 0.5; // Shift by 0.5 since the texel sampling points are in the texel center.
   vec2 a = fract(uv);
   vec2 tex = (floor(uv) + 0.5) * global.SourceSize.zw; // Build a sampling point which is in the center of the texel.

   // Sample the bilinear footprint.
   vec4 t0 = textureLodOffset(Source, tex, 0.0, ivec2(0, 0));
   vec4 t1 = textureLodOffset(Source, tex, 0.0, ivec2(1, 0));
   vec4 t2 = textureLodOffset(Source, tex, 0.0, ivec2(0, 1));
   vec4 t3 = textureLodOffset(Source, tex, 0.0, ivec2(1, 1));

   // Bilinear filter.
   vec4 result = mix(mix(t0, t1, a.x), mix(t2, t3, a.x), a.y);
```

The concept of splitting up the integer texel along with the fractional texel helps us
do arbitrary non-integer scaling safely.
The uv variable could also be passed pre-computed from vertex to avoid the extra computation in fragment.

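Passing the pre-computed coordinate from the vertex stage might look like the sketch below. The varying name `vUV` is an assumption. Since the coordinate is an affine function of TexCoord and w = 1, interpolating it per vertex should give the same value as computing it per fragment.

```
#version 450

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP;
   vec4 SourceSize;
} global;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vUV; // Texel-space coordinate, shifted by half a texel
void main()
{
   gl_Position = global.MVP * Position;
   vUV = TexCoord * global.SourceSize.xy - 0.5;
}

#pragma stage fragment
layout(location = 0) in vec2 vUV;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
void main()
{
   vec2 a = fract(vUV);
   // Sampling point in the center of the top-left texel of the footprint.
   vec2 tex = (floor(vUV) + 0.5) * global.SourceSize.zw;

   // Sample the bilinear footprint.
   vec4 t0 = textureLodOffset(Source, tex, 0.0, ivec2(0, 0));
   vec4 t1 = textureLodOffset(Source, tex, 0.0, ivec2(1, 0));
   vec4 t2 = textureLodOffset(Source, tex, 0.0, ivec2(0, 1));
   vec4 t3 = textureLodOffset(Source, tex, 0.0, ivec2(1, 1));

   // Bilinear filter.
   FragColor = mix(mix(t0, t1, a.x), mix(t2, t3, a.x), a.y);
}
```
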
### Preset format (.slangp)

The preset format is essentially unchanged from the old .cgp and .glslp, except the new preset format is called .slangp.

## Porting guide from legacy Cg spec

### Common functions
 - mul(mat, vec) -> mat * vec
 - lerp() -> mix()
 - ddx() -> dFdx()
 - ddy() -> dFdy()
 - tex2D() -> texture()
 - frac() -> fract()

### Types

 - floatN -> vecN
 - boolN -> bvecN
 - intN -> ivecN
 - uintN -> uvecN
 - float4x4 -> mat4

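To show the function and type mappings together, here is a small helper translated line by line. The Cg original in the comment and the names `blend_right`, `decal` and `size` are made up for this example.

```
// Cg original (illustrative):
//    float4 blend_right(sampler2D decal, float2 uv, float2 size)
//    {
//       float2 f  = frac(uv * size);
//       float4 c0 = tex2D(decal, uv);
//       float4 c1 = tex2D(decal, uv + float2(1.0 / size.x, 0.0));
//       return lerp(c0, c1, f.x);
//    }

// GLSL translation: floatN -> vecN, frac -> fract, tex2D -> texture, lerp -> mix.
vec4 blend_right(sampler2D decal, vec2 uv, vec2 size)
{
   vec2 f  = fract(uv * size);
   vec4 c0 = texture(decal, uv);
   vec4 c1 = texture(decal, uv + vec2(1.0 / size.x, 0.0));
   return mix(c0, c1, f.x);
}
```
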
### Builtin uniforms and misc

 - modelViewProj -> MVP
 - IN.video\_size -> SourceSize.xy
 - IN.texture\_size -> SourceSize.xy (no POT shenanigans, so they are the same)
 - IN.output\_size -> OutputSize.xy
 - IN.frame\_count -> FrameCount (uint instead of float)
 - \*.tex\_coord -> TexCoord (no POT shenanigans, so they are all the same)
 - \*.lut\_tex\_coord -> TexCoord
 - ORIG -> `Original`
 - PASS# -> PassOutput#
 - PASSPREV# -> No direct analog, PassOutput(CurrentPass - #), but prefer aliases

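A sketch of how a few of these mappings might look once ported. The Cg lines in the comments are illustrative, and the 60-frame fade between `Original` and `Source` is an invented effect.

```
#version 450

layout(push_constant) uniform Push
{
   vec4 SourceSize; // Replaces IN.video_size and IN.texture_size
   uint FrameCount; // Replaces IN.frame_count (now a uint)
} registers;

layout(set = 0, binding = 0, std140) uniform UBO
{
   mat4 MVP; // Replaces modelViewProj
} global;

#pragma stage vertex
layout(location = 0) in vec4 Position;
layout(location = 1) in vec2 TexCoord;
layout(location = 0) out vec2 vTexCoord;
void main()
{
   gl_Position = global.MVP * Position;
   vTexCoord = TexCoord;
}

#pragma stage fragment
layout(location = 0) in vec2 vTexCoord;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
layout(set = 0, binding = 2) uniform sampler2D Original; // ORIG in the Cg spec
void main()
{
   // Cg: float2 texel = 1.0 / IN.texture_size;
   vec2 texel = registers.SourceSize.zw;

   // Cg: float fade = fmod(IN.frame_count, 60.0) / 60.0;
   float fade = float(registers.FrameCount % 60u) / 60.0;

   vec4 original = texture(Original, vTexCoord);
   vec4 shifted = texture(Source, vTexCoord + texel);
   FragColor = mix(original, shifted, fade);
}
```
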
### Cg semantics

 - POSITION -> gl\_Position
 - float2 texcoord : TEXCOORD0 -> layout(location = 1) in vec2 TexCoord;
 - float4 varying : TEXCOORD# -> layout(location = #) out vec4 varying;
 - uniform float4x4 modelViewProj -> uniform UBO { mat4 MVP; };

Output structs should be flattened into separate varyings.

E.g. instead of
```
struct VertexData
{
   float4 pos : POSITION;
   float4 tex0 : TEXCOORD0;
   float4 tex1 : TEXCOORD1;
};

void main_vertex(out VertexData vout)
{
   vout.pos = ...;
   vout.tex0 = ...;
   vout.tex1 = ...;
}

void main_fragment(in VertexData vout)
{
   ...
}
```

do this

```
#pragma stage vertex
layout(location = 0) out vec4 tex0;
layout(location = 1) out vec4 tex1;
void main()
{
   gl_Position = ...;
   tex0 = ...;
   tex1 = ...;
}

#pragma stage fragment
layout(location = 0) in vec4 tex0;
layout(location = 1) in vec4 tex1;
void main()
{
}
```

Instead of returning a float4 from main\_fragment, have an output in fragment:

```
layout(location = 0) out vec4 FragColor;
```

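Put together, a ported fragment stage might look like the sketch below. The Cg function in the comment is an illustrative original, and the sampler binding is an assumption. The fragment stage pairs with a vertex stage that writes `tex0` at location 0, as in the flattened example above.

```
// Cg original (illustrative):
//    float4 main_fragment(in VertexData vin, uniform sampler2D decal : TEXUNIT0) : COLOR
//    {
//       return tex2D(decal, vin.tex0.xy);
//    }

#pragma stage fragment
layout(location = 0) in vec4 tex0;
layout(location = 0) out vec4 FragColor;
layout(set = 0, binding = 1) uniform sampler2D Source;
void main()
{
   // Write to the output variable instead of returning a value.
   FragColor = texture(Source, tex0.xy);
}
```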