1TGSI
2====
3
4TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5for describing shaders. Since Gallium is inherently shaderful, shaders are
6an important part of the API. TGSI is the only intermediate representation
7used by all drivers.
8
9Basics
10------
11
12All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13floating-point four-component vectors. An opcode may have up to one
14destination register, known as *dst*, and between zero and three source
15registers, called *src0* through *src2*, or simply *src* if there is only
16one.
17
18Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19components as integers. Other instructions permit using registers as
20two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22When an instruction has a scalar result, the result is usually copied into
23each of the components of *dst*. When this happens, the result is said to be
24*replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
26Source Modifiers
27^^^^^^^^^^^^^^^^
28
29TGSI supports 32-bit negate and absolute value modifiers on floating-point
30inputs, and 32-bit integer negates on some drivers.  The negate applies after
31absolute value if both are present.
32
33The type of an input can be found by ``tgsi_opcode_infer_src_type()``, and
34TGSI_OPCODE_MOV and the second and third operands of TGSI_OPCODE_UCMP (which
35return TGSI_TYPE_UNTYPED) are also considered floats for the purpose of source
36modifiers.
37
38
39Other Modifiers
40^^^^^^^^^^^^^^^
41
42The saturate modifier clamps 32-bit destination stores to [0.0, 1.0].
43
44For arithmetic instruction having a precise modifier certain optimizations
45which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
46optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
47MAD instruction.
48
49Instruction Set
50---------------
51
52Core ISA
53^^^^^^^^^^^^^^^^^^^^^^^^^
54
55These opcodes are guaranteed to be available regardless of the driver being
56used.
57
58.. opcode:: ARL - Address Register Load
59
60.. math::
61
62  dst.x = (int) \lfloor src.x\rfloor
63
64  dst.y = (int) \lfloor src.y\rfloor
65
66  dst.z = (int) \lfloor src.z\rfloor
67
68  dst.w = (int) \lfloor src.w\rfloor
69
70
71.. opcode:: MOV - Move
72
73.. math::
74
75  dst.x = src.x
76
77  dst.y = src.y
78
79  dst.z = src.z
80
81  dst.w = src.w
82
83
84.. opcode:: LIT - Light Coefficients
85
86.. math::
87
88  dst.x &= 1 \\
89  dst.y &= max(src.x, 0) \\
90  dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\
91  dst.w &= 1
92
93
94.. opcode:: RCP - Reciprocal
95
96This instruction replicates its result.
97
98.. math::
99
100  dst = \frac{1}{src.x}
101
102
103.. opcode:: RSQ - Reciprocal Square Root
104
105This instruction replicates its result. The results are undefined for src <= 0.
106
107.. math::
108
109  dst = \frac{1}{\sqrt{src.x}}
110
111
112.. opcode:: SQRT - Square Root
113
114This instruction replicates its result. The results are undefined for src < 0.
115
116.. math::
117
118  dst = {\sqrt{src.x}}
119
120
121.. opcode:: EXP - Approximate Exponential Base 2
122
123.. math::
124
125  dst.x &= 2^{\lfloor src.x\rfloor} \\
126  dst.y &= src.x - \lfloor src.x\rfloor \\
127  dst.z &= 2^{src.x} \\
128  dst.w &= 1
129
130
131.. opcode:: LOG - Approximate Logarithm Base 2
132
133.. math::
134
135  dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
136  dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
137  dst.z &= \log_2{|src.x|} \\
138  dst.w &= 1
139
140
141.. opcode:: MUL - Multiply
142
143.. math::
144
145  dst.x = src0.x \times src1.x
146
147  dst.y = src0.y \times src1.y
148
149  dst.z = src0.z \times src1.z
150
151  dst.w = src0.w \times src1.w
152
153
154.. opcode:: ADD - Add
155
156.. math::
157
158  dst.x = src0.x + src1.x
159
160  dst.y = src0.y + src1.y
161
162  dst.z = src0.z + src1.z
163
164  dst.w = src0.w + src1.w
165
166
167.. opcode:: DP3 - 3-component Dot Product
168
169This instruction replicates its result.
170
171.. math::
172
173  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
174
175
176.. opcode:: DP4 - 4-component Dot Product
177
178This instruction replicates its result.
179
180.. math::
181
182  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
183
184
185.. opcode:: DST - Distance Vector
186
187.. math::
188
189  dst.x &= 1\\
190  dst.y &= src0.y \times src1.y\\
191  dst.z &= src0.z\\
192  dst.w &= src1.w
193
194
195.. opcode:: MIN - Minimum
196
197.. math::
198
199  dst.x = min(src0.x, src1.x)
200
201  dst.y = min(src0.y, src1.y)
202
203  dst.z = min(src0.z, src1.z)
204
205  dst.w = min(src0.w, src1.w)
206
207
208.. opcode:: MAX - Maximum
209
210.. math::
211
212  dst.x = max(src0.x, src1.x)
213
214  dst.y = max(src0.y, src1.y)
215
216  dst.z = max(src0.z, src1.z)
217
218  dst.w = max(src0.w, src1.w)
219
220
221.. opcode:: SLT - Set On Less Than
222
223.. math::
224
225  dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
226
227  dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
228
229  dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
230
231  dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
232
233
234.. opcode:: SGE - Set On Greater Equal Than
235
236.. math::
237
238  dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
239
240  dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
241
242  dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
243
244  dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
245
246
247.. opcode:: MAD - Multiply And Add
248
249Perform a * b + c. The implementation is free to decide whether there is an
250intermediate rounding step or not.
251
252.. math::
253
254  dst.x = src0.x \times src1.x + src2.x
255
256  dst.y = src0.y \times src1.y + src2.y
257
258  dst.z = src0.z \times src1.z + src2.z
259
260  dst.w = src0.w \times src1.w + src2.w
261
262
263.. opcode:: LRP - Linear Interpolate
264
265.. math::
266
267  dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
268
269  dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
270
271  dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
272
273  dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
274
275
276.. opcode:: FMA - Fused Multiply-Add
277
278Perform a * b + c with no intermediate rounding step.
279
280.. math::
281
282  dst.x = src0.x \times src1.x + src2.x
283
284  dst.y = src0.y \times src1.y + src2.y
285
286  dst.z = src0.z \times src1.z + src2.z
287
288  dst.w = src0.w \times src1.w + src2.w
289
290
291.. opcode:: FRC - Fraction
292
293.. math::
294
295  dst.x = src.x - \lfloor src.x\rfloor
296
297  dst.y = src.y - \lfloor src.y\rfloor
298
299  dst.z = src.z - \lfloor src.z\rfloor
300
301  dst.w = src.w - \lfloor src.w\rfloor
302
303
304.. opcode:: FLR - Floor
305
306.. math::
307
308  dst.x = \lfloor src.x\rfloor
309
310  dst.y = \lfloor src.y\rfloor
311
312  dst.z = \lfloor src.z\rfloor
313
314  dst.w = \lfloor src.w\rfloor
315
316
317.. opcode:: ROUND - Round
318
319.. math::
320
321  dst.x = round(src.x)
322
323  dst.y = round(src.y)
324
325  dst.z = round(src.z)
326
327  dst.w = round(src.w)
328
329
330.. opcode:: EX2 - Exponential Base 2
331
332This instruction replicates its result.
333
334.. math::
335
336  dst = 2^{src.x}
337
338
339.. opcode:: LG2 - Logarithm Base 2
340
341This instruction replicates its result.
342
343.. math::
344
345  dst = \log_2{src.x}
346
347
348.. opcode:: POW - Power
349
350This instruction replicates its result.
351
352.. math::
353
354  dst = src0.x^{src1.x}
355
356
357.. opcode:: LDEXP - Multiply Number by Integral Power of 2
358
359src1 is an integer.
360
361.. math::
362
363  dst.x = src0.x * 2^{src1.x}
364  dst.y = src0.y * 2^{src1.y}
365  dst.z = src0.z * 2^{src1.z}
366  dst.w = src0.w * 2^{src1.w}
367
368
369.. opcode:: COS - Cosine
370
371This instruction replicates its result.
372
373.. math::
374
375  dst = \cos{src.x}
376
377
378.. opcode:: DDX, DDX_FINE - Derivative Relative To X
379
380The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
381advertised. When it is, the fine version guarantees one derivative per row
382while DDX is allowed to be the same for the entire 2x2 quad.
383
384.. math::
385
386  dst.x = partialx(src.x)
387
388  dst.y = partialx(src.y)
389
390  dst.z = partialx(src.z)
391
392  dst.w = partialx(src.w)
393
394
395.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
396
397The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
398advertised. When it is, the fine version guarantees one derivative per column
399while DDY is allowed to be the same for the entire 2x2 quad.
400
401.. math::
402
403  dst.x = partialy(src.x)
404
405  dst.y = partialy(src.y)
406
407  dst.z = partialy(src.z)
408
409  dst.w = partialy(src.w)
410
411
412.. opcode:: PK2H - Pack Two 16-bit Floats
413
414This instruction replicates its result.
415
416.. math::
417
418  dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16
419
420
421.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
422
423This instruction replicates its result.
424
425.. math::
426
427  dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16
428
429
430.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
431
432This instruction replicates its result.
433
434.. math::
435
436  dst = f32\_to\_snorm8(src.x) |
437        (f32\_to\_snorm8(src.y) << 8) |
438        (f32\_to\_snorm8(src.z) << 16) |
439        (f32\_to\_snorm8(src.w) << 24)
440
441
442.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
443
444This instruction replicates its result.
445
446.. math::
447
448  dst = f32\_to\_unorm8(src.x) |
449        (f32\_to\_unorm8(src.y) << 8) |
450        (f32\_to\_unorm8(src.z) << 16) |
451        (f32\_to\_unorm8(src.w) << 24)
452
453
454.. opcode:: SEQ - Set On Equal
455
456.. math::
457
458  dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
459
460  dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
461
462  dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
463
464  dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
465
466
467.. opcode:: SGT - Set On Greater Than
468
469.. math::
470
471  dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
472
473  dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
474
475  dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
476
477  dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
478
479
480.. opcode:: SIN - Sine
481
482This instruction replicates its result.
483
484.. math::
485
486  dst = \sin{src.x}
487
488
489.. opcode:: SLE - Set On Less Equal Than
490
491.. math::
492
493  dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
494
495  dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
496
497  dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
498
499  dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
500
501
502.. opcode:: SNE - Set On Not Equal
503
504.. math::
505
506  dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
507
508  dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
509
510  dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
511
512  dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
513
514
515.. opcode:: TEX - Texture Lookup
516
517  for array textures src0.y contains the slice for 1D,
518  and src0.z contain the slice for 2D.
519
520  for shadow textures with no arrays (and not cube map),
521  src0.z contains the reference value.
522
523  for shadow textures with arrays, src0.z contains
524  the reference value for 1D arrays, and src0.w contains
525  the reference value for 2D arrays and cube maps.
526
527  for cube map array shadow textures, the reference value
528  cannot be passed in src0.w, and TEX2 must be used instead.
529
530.. math::
531
532  coord = src0
533
534  shadow_ref = src0.z or src0.w (optional)
535
536  unit = src1
537
538  dst = texture\_sample(unit, coord, shadow_ref)
539
540
541.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
542
543  this is the same as TEX, but uses another reg to encode the
544  reference value.
545
546.. math::
547
548  coord = src0
549
550  shadow_ref = src1.x
551
552  unit = src2
553
554  dst = texture\_sample(unit, coord, shadow_ref)
555
556
557
558
559.. opcode:: TXD - Texture Lookup with Derivatives
560
561.. math::
562
563  coord = src0
564
565  ddx = src1
566
567  ddy = src2
568
569  unit = src3
570
571  dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
572
573
574.. opcode:: TXP - Projective Texture Lookup
575
576.. math::
577
578  coord.x = src0.x / src0.w
579
580  coord.y = src0.y / src0.w
581
582  coord.z = src0.z / src0.w
583
584  coord.w = src0.w
585
586  unit = src1
587
588  dst = texture\_sample(unit, coord)
589
590
591.. opcode:: UP2H - Unpack Two 16-Bit Floats
592
593.. math::
594
595  dst.x = f16\_to\_f32(src0.x \& 0xffff)
596
597  dst.y = f16\_to\_f32(src0.x >> 16)
598
599  dst.z = f16\_to\_f32(src0.x \& 0xffff)
600
601  dst.w = f16\_to\_f32(src0.x >> 16)
602
603.. note::
604
605   Considered for removal.
606
607.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
608
609  TBD
610
611.. note::
612
613   Considered for removal.
614
615.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
616
617  TBD
618
619.. note::
620
621   Considered for removal.
622
623.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
624
625  TBD
626
627.. note::
628
629   Considered for removal.
630
631
632.. opcode:: ARR - Address Register Load With Round
633
634.. math::
635
636  dst.x = (int) round(src.x)
637
638  dst.y = (int) round(src.y)
639
640  dst.z = (int) round(src.z)
641
642  dst.w = (int) round(src.w)
643
644
645.. opcode:: SSG - Set Sign
646
647.. math::
648
649  dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
650
651  dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
652
653  dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
654
655  dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
656
657
658.. opcode:: CMP - Compare
659
660.. math::
661
662  dst.x = (src0.x < 0) ? src1.x : src2.x
663
664  dst.y = (src0.y < 0) ? src1.y : src2.y
665
666  dst.z = (src0.z < 0) ? src1.z : src2.z
667
668  dst.w = (src0.w < 0) ? src1.w : src2.w
669
670
671.. opcode:: KILL_IF - Conditional Discard
672
673  Conditional discard.  Allowed in fragment shaders only.
674
675.. math::
676
677  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
678    discard
679  endif
680
681
682.. opcode:: KILL - Discard
683
684  Unconditional discard.  Allowed in fragment shaders only.
685
686
687.. opcode:: DEMOTE - Demote Invocation to a Helper
688
689  This demotes the current invocation to a helper, but continues
690  execution (while KILL may or may not terminate the
691  invocation). After this runs, all the usual helper invocation rules
692  apply about discarding buffer and render target writes. This is
693  useful for having accurate derivatives in the other invocations
694  which have not been demoted.
695
696  Allowed in fragment shaders only.
697
698
699.. opcode:: READ_HELPER - Reads Invocation Helper Status
700
701  This is identical to ``TGSI_SEMANTIC_HELPER_INVOCATION``, except
702  this will read the current value, which might change as a result of
703  a ``DEMOTE`` instruction.
704
705  Allowed in fragment shaders only.
706
707
708.. opcode:: TXB - Texture Lookup With Bias
709
710  for cube map array textures and shadow cube maps, the bias value
711  cannot be passed in src0.w, and TXB2 must be used instead.
712
713  if the target is a shadow texture, the reference value is always
714  in src.z (this prevents shadow 3d and shadow 2d arrays from
715  using this instruction, but this is not needed).
716
717.. math::
718
719  coord.x = src0.x
720
721  coord.y = src0.y
722
723  coord.z = src0.z
724
725  coord.w = none
726
727  bias = src0.w
728
729  unit = src1
730
731  dst = texture\_sample(unit, coord, bias)
732
733
734.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
735
736  this is the same as TXB, but uses another reg to encode the
737  LOD bias value for cube map arrays and shadow cube maps.
738  Presumably shadow 2d arrays and shadow 3d targets could use
739  this encoding too, but this is not legal.
740
741  if the target is a shadow cube map array, the reference value is in
742  src1.y.
743
744.. math::
745
746  coord = src0
747
748  bias = src1.x
749
750  unit = src2
751
752  dst = texture\_sample(unit, coord, bias)
753
754
755.. opcode:: DIV - Divide
756
757.. math::
758
759  dst.x = \frac{src0.x}{src1.x}
760
761  dst.y = \frac{src0.y}{src1.y}
762
763  dst.z = \frac{src0.z}{src1.z}
764
765  dst.w = \frac{src0.w}{src1.w}
766
767
768.. opcode:: DP2 - 2-component Dot Product
769
770This instruction replicates its result.
771
772.. math::
773
774  dst = src0.x \times src1.x + src0.y \times src1.y
775
776
777.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
778
779  This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
780  pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
781  There is no way to override those two in shaders.
782
783.. math::
784
785  coord.x = src0.x
786
787  coord.y = src0.y
788
789  coord.z = src0.z
790
791  coord.w = none
792
793  lod = 0
794
795  unit = src1
796
797  dst = texture\_sample(unit, coord, lod)
798
799
800.. opcode:: TXL - Texture Lookup With explicit LOD
801
802  for cube map array textures, the explicit LOD value
803  cannot be passed in src0.w, and TXL2 must be used instead.
804
805  if the target is a shadow texture, the reference value is always
806  in src.z (this prevents shadow 3d / 2d array / cube targets from
807  using this instruction, but this is not needed).
808
809.. math::
810
811  coord.x = src0.x
812
813  coord.y = src0.y
814
815  coord.z = src0.z
816
817  coord.w = none
818
819  lod = src0.w
820
821  unit = src1
822
823  dst = texture\_sample(unit, coord, lod)
824
825
826.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
827
828  this is the same as TXL, but uses another reg to encode the
829  explicit LOD value.
830  Presumably shadow 3d / 2d array / cube targets could use
831  this encoding too, but this is not legal.
832
833  if the target is a shadow cube map array, the reference value is in
834  src1.y.
835
836.. math::
837
838  coord = src0
839
840  lod = src1.x
841
842  unit = src2
843
844  dst = texture\_sample(unit, coord, lod)
845
846
847Compute ISA
848^^^^^^^^^^^^^^^^^^^^^^^^
849
850These opcodes are primarily provided for special-use computational shaders.
851Support for these opcodes indicated by a special pipe capability bit (TBD).
852
853XXX doesn't look like most of the opcodes really belong here.
854
855.. opcode:: CEIL - Ceiling
856
857.. math::
858
859  dst.x = \lceil src.x\rceil
860
861  dst.y = \lceil src.y\rceil
862
863  dst.z = \lceil src.z\rceil
864
865  dst.w = \lceil src.w\rceil
866
867
868.. opcode:: TRUNC - Truncate
869
870.. math::
871
872  dst.x = trunc(src.x)
873
874  dst.y = trunc(src.y)
875
876  dst.z = trunc(src.z)
877
878  dst.w = trunc(src.w)
879
880
881.. opcode:: MOD - Modulus
882
883.. math::
884
885  dst.x = src0.x \bmod src1.x
886
887  dst.y = src0.y \bmod src1.y
888
889  dst.z = src0.z \bmod src1.z
890
891  dst.w = src0.w \bmod src1.w
892
893
894.. opcode:: UARL - Integer Address Register Load
895
896  Moves the contents of the source register, assumed to be an integer, into the
897  destination register, which is assumed to be an address (ADDR) register.
898
899
900.. opcode:: TXF - Texel Fetch
901
902  As per NV_gpu_shader4, extract a single texel from a specified texture
903  image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
904  SHADOW.  src 0 is a
905  four-component signed integer vector used to identify the single texel
906  accessed. 3 components + level.  If the texture is multisampled, then
907  the fourth component indicates the sample, not the mipmap level.
908  Just like texture instructions, an optional
909  offset vector is provided, which is subject to various driver restrictions
910  (regarding range, source of offsets). This instruction ignores the sampler
911  state.
912
913  TXF(uint_vec coord, int_vec offset).
914
915
916.. opcode:: TXQ - Texture Size Query
917
918  As per NV_gpu_program4, retrieve the dimensions of the texture depending on
919  the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
920  depth), 1D array (width, layers), 2D array (width, height, layers).
921  Also return the number of accessible levels (last_level - first_level + 1)
922  in W.
923
924  For components which don't return a resource dimension, their value
925  is undefined.
926
927.. math::
928
929  lod = src0.x
930
931  dst.x = texture\_width(unit, lod)
932
933  dst.y = texture\_height(unit, lod)
934
935  dst.z = texture\_depth(unit, lod)
936
937  dst.w = texture\_levels(unit)
938
939
940.. opcode:: TXQS - Texture Samples Query
941
942  This retrieves the number of samples in the texture, and stores it
943  into the x component as an unsigned integer. The other components are
944  undefined.  If the texture is not multisampled, this function returns
945  (1, undef, undef, undef).
946
947.. math::
948
949  dst.x = texture\_samples(unit)
950
951
952.. opcode:: TG4 - Texture Gather
953
954  As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
955  filtering operation and packs them into a single register.  Only works with
956  2D, 2D array, cubemaps, and cubemaps arrays.  For 2D textures, only the
957  addressing modes of the sampler and the top level of any mip pyramid are
958  used. Set W to zero.  It behaves like the TEX instruction, but a filtered
959  sample is not generated. The four samples that contribute to filtering are
960  placed into xyzw in clockwise order, starting with the (u,v) texture
961  coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
962  where the magnitude of the deltas are half a texel.
963
964  PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
965  depth compares, single component selection, and a non-constant offset. It
966  doesn't allow support for the GL independent offset to get i0,j0. This would
967  require another CAP is hw can do it natively. For now we lower that before
968  TGSI.
969
970  PIPE_CAP_TGSI_TG4_COMPONENT_IN_SWIZZLE changes the encoding so that component
971  is stored in the sampler source swizzle x.
972
973.. math::
974
975   coord = src0
976
977   (without TGSI_TG4_COMPONENT_IN_SWIZZLE)
978   component = src1
979
980   dst = texture\_gather4 (unit, coord, component)
981
982   (with TGSI_TG4_COMPONENT_IN_SWIZZLE)
983   dst = texture\_gather4 (unit, coord)
984   component is encoded in sampler swizzle.
985
986(with SM5 - cube array shadow)
987
988.. math::
989
990   coord = src0
991
992   compare = src1
993
994   dst = texture\_gather (uint, coord, compare)
995
996.. opcode:: LODQ - level of detail query
997
998   Compute the LOD information that the texture pipe would use to access the
999   texture. The Y component contains the computed LOD lambda_prime. The X
1000   component contains the LOD that will be accessed, based on min/max LODs
1001   and mipmap filters.
1002
1003.. math::
1004
1005   coord = src0
1006
1007   dst.xy = lodq(uint, coord);
1008
1009.. opcode:: CLOCK - retrieve the current shader time
1010
1011   Invoking this instruction multiple times in the same shader should
1012   cause monotonically increasing values to be returned. The values
1013   are implicitly 64-bit, so if fewer than 64 bits of precision are
1014   available, to provide expected wraparound semantics, the value
1015   should be shifted up so that the most significant bit of the time
1016   is the most significant bit of the 64-bit value.
1017
1018.. math::
1019
1020   dst.xy = clock()
1021
1022
1023Integer ISA
1024^^^^^^^^^^^^^^^^^^^^^^^^
1025These opcodes are used for integer operations.
1026Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1027
1028
1029.. opcode:: I2F - Signed Integer To Float
1030
1031   Rounding is unspecified (round to nearest even suggested).
1032
1033.. math::
1034
1035  dst.x = (float) src.x
1036
1037  dst.y = (float) src.y
1038
1039  dst.z = (float) src.z
1040
1041  dst.w = (float) src.w
1042
1043
1044.. opcode:: U2F - Unsigned Integer To Float
1045
1046   Rounding is unspecified (round to nearest even suggested).
1047
1048.. math::
1049
1050  dst.x = (float) src.x
1051
1052  dst.y = (float) src.y
1053
1054  dst.z = (float) src.z
1055
1056  dst.w = (float) src.w
1057
1058
1059.. opcode:: F2I - Float to Signed Integer
1060
1061   Rounding is towards zero (truncate).
1062   Values outside signed range (including NaNs) produce undefined results.
1063
1064.. math::
1065
1066  dst.x = (int) src.x
1067
1068  dst.y = (int) src.y
1069
1070  dst.z = (int) src.z
1071
1072  dst.w = (int) src.w
1073
1074
1075.. opcode:: F2U - Float to Unsigned Integer
1076
1077   Rounding is towards zero (truncate).
1078   Values outside unsigned range (including NaNs) produce undefined results.
1079
1080.. math::
1081
1082  dst.x = (unsigned) src.x
1083
1084  dst.y = (unsigned) src.y
1085
1086  dst.z = (unsigned) src.z
1087
1088  dst.w = (unsigned) src.w
1089
1090
1091.. opcode:: UADD - Integer Add
1092
1093   This instruction works the same for signed and unsigned integers.
1094   The low 32bit of the result is returned.
1095
1096.. math::
1097
1098  dst.x = src0.x + src1.x
1099
1100  dst.y = src0.y + src1.y
1101
1102  dst.z = src0.z + src1.z
1103
1104  dst.w = src0.w + src1.w
1105
1106
1107.. opcode:: UMAD - Integer Multiply And Add
1108
1109   This instruction works the same for signed and unsigned integers.
1110   The multiplication returns the low 32bit (as does the result itself).
1111
1112.. math::
1113
1114  dst.x = src0.x \times src1.x + src2.x
1115
1116  dst.y = src0.y \times src1.y + src2.y
1117
1118  dst.z = src0.z \times src1.z + src2.z
1119
1120  dst.w = src0.w \times src1.w + src2.w
1121
1122
1123.. opcode:: UMUL - Integer Multiply
1124
1125   This instruction works the same for signed and unsigned integers.
1126   The low 32bit of the result is returned.
1127
1128.. math::
1129
1130  dst.x = src0.x \times src1.x
1131
1132  dst.y = src0.y \times src1.y
1133
1134  dst.z = src0.z \times src1.z
1135
1136  dst.w = src0.w \times src1.w
1137
1138
1139.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1140
1141   The high 32bits of the multiplication of 2 signed integers are returned.
1142
1143.. math::
1144
1145  dst.x = (src0.x \times src1.x) >> 32
1146
1147  dst.y = (src0.y \times src1.y) >> 32
1148
1149  dst.z = (src0.z \times src1.z) >> 32
1150
1151  dst.w = (src0.w \times src1.w) >> 32
1152
1153
1154.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1155
1156   The high 32bits of the multiplication of 2 unsigned integers are returned.
1157
1158.. math::
1159
1160  dst.x = (src0.x \times src1.x) >> 32
1161
1162  dst.y = (src0.y \times src1.y) >> 32
1163
1164  dst.z = (src0.z \times src1.z) >> 32
1165
1166  dst.w = (src0.w \times src1.w) >> 32
1167
1168
1169.. opcode:: IDIV - Signed Integer Division
1170
1171   TBD: behavior for division by zero.
1172
1173.. math::
1174
1175  dst.x = \frac{src0.x}{src1.x}
1176
1177  dst.y = \frac{src0.y}{src1.y}
1178
1179  dst.z = \frac{src0.z}{src1.z}
1180
1181  dst.w = \frac{src0.w}{src1.w}
1182
1183
1184.. opcode:: UDIV - Unsigned Integer Division
1185
1186   For division by zero, 0xffffffff is returned.
1187
1188.. math::
1189
1190  dst.x = \frac{src0.x}{src1.x}
1191
1192  dst.y = \frac{src0.y}{src1.y}
1193
1194  dst.z = \frac{src0.z}{src1.z}
1195
1196  dst.w = \frac{src0.w}{src1.w}
1197
1198
1199.. opcode:: UMOD - Unsigned Integer Remainder
1200
1201   If second arg is zero, 0xffffffff is returned.
1202
1203.. math::
1204
1205  dst.x = src0.x \bmod src1.x
1206
1207  dst.y = src0.y \bmod src1.y
1208
1209  dst.z = src0.z \bmod src1.z
1210
1211  dst.w = src0.w \bmod src1.w
1212
1213
1214.. opcode:: NOT - Bitwise Not
1215
1216.. math::
1217
1218  dst.x = \sim src.x
1219
1220  dst.y = \sim src.y
1221
1222  dst.z = \sim src.z
1223
1224  dst.w = \sim src.w
1225
1226
1227.. opcode:: AND - Bitwise And
1228
1229.. math::
1230
1231  dst.x = src0.x \& src1.x
1232
1233  dst.y = src0.y \& src1.y
1234
1235  dst.z = src0.z \& src1.z
1236
1237  dst.w = src0.w \& src1.w
1238
1239
1240.. opcode:: OR - Bitwise Or
1241
1242.. math::
1243
1244  dst.x = src0.x | src1.x
1245
1246  dst.y = src0.y | src1.y
1247
1248  dst.z = src0.z | src1.z
1249
1250  dst.w = src0.w | src1.w
1251
1252
1253.. opcode:: XOR - Bitwise Xor
1254
1255.. math::
1256
1257  dst.x = src0.x \oplus src1.x
1258
1259  dst.y = src0.y \oplus src1.y
1260
1261  dst.z = src0.z \oplus src1.z
1262
1263  dst.w = src0.w \oplus src1.w
1264
1265
1266.. opcode:: IMAX - Maximum of Signed Integers
1267
1268.. math::
1269
1270  dst.x = max(src0.x, src1.x)
1271
1272  dst.y = max(src0.y, src1.y)
1273
1274  dst.z = max(src0.z, src1.z)
1275
1276  dst.w = max(src0.w, src1.w)
1277
1278
1279.. opcode:: UMAX - Maximum of Unsigned Integers
1280
1281.. math::
1282
1283  dst.x = max(src0.x, src1.x)
1284
1285  dst.y = max(src0.y, src1.y)
1286
1287  dst.z = max(src0.z, src1.z)
1288
1289  dst.w = max(src0.w, src1.w)
1290
1291
1292.. opcode:: IMIN - Minimum of Signed Integers
1293
1294.. math::
1295
1296  dst.x = min(src0.x, src1.x)
1297
1298  dst.y = min(src0.y, src1.y)
1299
1300  dst.z = min(src0.z, src1.z)
1301
1302  dst.w = min(src0.w, src1.w)
1303
1304
1305.. opcode:: UMIN - Minimum of Unsigned Integers
1306
1307.. math::
1308
1309  dst.x = min(src0.x, src1.x)
1310
1311  dst.y = min(src0.y, src1.y)
1312
1313  dst.z = min(src0.z, src1.z)
1314
1315  dst.w = min(src0.w, src1.w)
1316
1317
1318.. opcode:: SHL - Shift Left
1319
1320   The shift count is masked with 0x1f before the shift is applied.
1321
1322.. math::
1323
1324  dst.x = src0.x << (0x1f \& src1.x)
1325
1326  dst.y = src0.y << (0x1f \& src1.y)
1327
1328  dst.z = src0.z << (0x1f \& src1.z)
1329
1330  dst.w = src0.w << (0x1f \& src1.w)
1331
1332
1333.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1334
1335   The shift count is masked with 0x1f before the shift is applied.
1336
1337.. math::
1338
1339  dst.x = src0.x >> (0x1f \& src1.x)
1340
1341  dst.y = src0.y >> (0x1f \& src1.y)
1342
1343  dst.z = src0.z >> (0x1f \& src1.z)
1344
1345  dst.w = src0.w >> (0x1f \& src1.w)
1346
1347
1348.. opcode:: USHR - Logical Shift Right
1349
1350   The shift count is masked with 0x1f before the shift is applied.
1351
1352.. math::
1353
1354  dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
1355
1356  dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
1357
1358  dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
1359
1360  dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
1361
1362
1363.. opcode:: UCMP - Integer Conditional Move
1364
1365.. math::
1366
1367  dst.x = src0.x ? src1.x : src2.x
1368
1369  dst.y = src0.y ? src1.y : src2.y
1370
1371  dst.z = src0.z ? src1.z : src2.z
1372
1373  dst.w = src0.w ? src1.w : src2.w
1374
1375
1376
1377.. opcode:: ISSG - Integer Set Sign
1378
1379.. math::
1380
1381  dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1382
1383  dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1384
1385  dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1386
1387  dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1388
1389
1390
1391.. opcode:: FSLT - Float Set On Less Than (ordered)
1392
1393   Same comparison as SLT but returns integer instead of 1.0/0.0 float
1394
1395.. math::
1396
1397  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1398
1399  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1400
1401  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1402
1403  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1404
1405
1406.. opcode:: ISLT - Signed Integer Set On Less Than
1407
1408.. math::
1409
1410  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1411
1412  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1413
1414  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1415
1416  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1417
1418
1419.. opcode:: USLT - Unsigned Integer Set On Less Than
1420
1421.. math::
1422
1423  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1424
1425  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1426
1427  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1428
1429  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1430
1431
1432.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1433
1434   Same comparison as SGE but returns integer instead of 1.0/0.0 float
1435
1436.. math::
1437
1438  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1439
1440  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1441
1442  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1443
1444  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1445
1446
1447.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1448
1449.. math::
1450
1451  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1452
1453  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1454
1455  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1456
1457  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1458
1459
1460.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1461
1462.. math::
1463
1464  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1465
1466  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1467
1468  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1469
1470  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1471
1472
1473.. opcode:: FSEQ - Float Set On Equal (ordered)
1474
1475   Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1476
1477.. math::
1478
1479  dst.x = (src0.x == src1.x) ? \sim 0 : 0
1480
1481  dst.y = (src0.y == src1.y) ? \sim 0 : 0
1482
1483  dst.z = (src0.z == src1.z) ? \sim 0 : 0
1484
1485  dst.w = (src0.w == src1.w) ? \sim 0 : 0
1486
1487
1488.. opcode:: USEQ - Integer Set On Equal
1489
1490.. math::
1491
1492  dst.x = (src0.x == src1.x) ? \sim 0 : 0
1493
1494  dst.y = (src0.y == src1.y) ? \sim 0 : 0
1495
1496  dst.z = (src0.z == src1.z) ? \sim 0 : 0
1497
1498  dst.w = (src0.w == src1.w) ? \sim 0 : 0
1499
1500
1501.. opcode:: FSNE - Float Set On Not Equal (unordered)
1502
1503   Same comparison as SNE but returns integer instead of 1.0/0.0 float
1504
1505.. math::
1506
1507  dst.x = (src0.x != src1.x) ? \sim 0 : 0
1508
1509  dst.y = (src0.y != src1.y) ? \sim 0 : 0
1510
1511  dst.z = (src0.z != src1.z) ? \sim 0 : 0
1512
1513  dst.w = (src0.w != src1.w) ? \sim 0 : 0
1514
1515
1516.. opcode:: USNE - Integer Set On Not Equal
1517
1518.. math::
1519
1520  dst.x = (src0.x != src1.x) ? \sim 0 : 0
1521
1522  dst.y = (src0.y != src1.y) ? \sim 0 : 0
1523
1524  dst.z = (src0.z != src1.z) ? \sim 0 : 0
1525
1526  dst.w = (src0.w != src1.w) ? \sim 0 : 0
1527
1528
1529.. opcode:: INEG - Integer Negate
1530
1531  Two's complement.
1532
1533.. math::
1534
1535  dst.x = -src.x
1536
1537  dst.y = -src.y
1538
1539  dst.z = -src.z
1540
1541  dst.w = -src.w
1542
1543
1544.. opcode:: IABS - Integer Absolute Value
1545
1546.. math::
1547
1548  dst.x = |src.x|
1549
1550  dst.y = |src.y|
1551
1552  dst.z = |src.z|
1553
1554  dst.w = |src.w|
1555
1556Bitwise ISA
1557^^^^^^^^^^^
1558These opcodes are used for bit-level manipulation of integers.
1559
1560.. opcode:: IBFE - Signed Bitfield Extract
1561
1562  Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1563  sign-extends them if the high bit of the extracted window is set.
1564
1565  Pseudocode::
1566
1567    def ibfe(value, offset, bits):
1568      if offset < 0 or bits < 0 or offset + bits > 32:
1569        return undefined
1570      if bits == 0: return 0
1571      # Note: >> sign-extends
1572      return (value << (32 - offset - bits)) >> (32 - bits)
1573
1574.. opcode:: UBFE - Unsigned Bitfield Extract
1575
1576  Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1577  any sign-extension.
1578
1579  Pseudocode::
1580
1581    def ubfe(value, offset, bits):
1582      if offset < 0 or bits < 0 or offset + bits > 32:
1583        return undefined
1584      if bits == 0: return 0
1585      # Note: >> does not sign-extend
1586      return (value << (32 - offset - bits)) >> (32 - bits)
1587
1588.. opcode:: BFI - Bitfield Insert
1589
1590  Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1591  of 'insert'.
1592
1593  Pseudocode::
1594
1595    def bfi(base, insert, offset, bits):
1596      if offset < 0 or bits < 0 or offset + bits > 32:
1597        return undefined
1598      # << defined such that mask == ~0 when bits == 32, offset == 0
1599      mask = ((1 << bits) - 1) << offset
1600      return ((insert << offset) & mask) | (base & ~mask)
1601
1602.. opcode:: BREV - Bitfield Reverse
1603
1604  See SM5 instruction BFREV. Reverses the bits of the argument.
1605
1606.. opcode:: POPC - Population Count
1607
1608  See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1609
1610.. opcode:: LSB - Index of lowest set bit
1611
1612  See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1613  bit of the argument. Returns -1 if none are set.
1614
1615.. opcode:: IMSB - Index of highest non-sign bit
1616
1617  See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1618  non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1619  highest 1 bit for positive numbers). Returns -1 if all bits are the same
1620  (i.e. for inputs 0 and -1).
1621
1622.. opcode:: UMSB - Index of highest set bit
1623
1624  See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1625  set bit of the argument. Returns -1 if none are set.
1626
1627Geometry ISA
1628^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1629
1630These opcodes are only supported in geometry shaders; they have no meaning
1631in any other type of shader.
1632
1633.. opcode:: EMIT - Emit
1634
1635  Generate a new vertex for the current primitive into the specified vertex
1636  stream using the values in the output registers.
1637
1638
1639.. opcode:: ENDPRIM - End Primitive
1640
1641  Complete the current primitive in the specified vertex stream (consisting of
1642  the emitted vertices), and start a new one.
1643
1644
1645GLSL ISA
1646^^^^^^^^^^
1647
1648These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1649opcodes is determined by a special capability bit, ``GLSL``.
1650Some require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1651
1652.. opcode:: CAL - Subroutine Call
1653
1654  push(pc)
1655  pc = target
1656
1657
1658.. opcode:: RET - Subroutine Call Return
1659
1660  pc = pop()
1661
1662
1663.. opcode:: CONT - Continue
1664
1665  Unconditionally moves the point of execution to the instruction after the
1666  last bgnloop. The instruction must appear within a bgnloop/endloop.
1667
1668.. note::
1669
1670   Support for CONT is determined by a special capability bit,
1671   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1672
1673
1674.. opcode:: BGNLOOP - Begin a Loop
1675
1676  Start a loop. Must have a matching endloop.
1677
1678
1679.. opcode:: BGNSUB - Begin Subroutine
1680
1681  Starts definition of a subroutine. Must have a matching endsub.
1682
1683
1684.. opcode:: ENDLOOP - End a Loop
1685
1686  End a loop started with bgnloop.
1687
1688
1689.. opcode:: ENDSUB - End Subroutine
1690
1691  Ends definition of a subroutine.
1692
1693
1694.. opcode:: NOP - No Operation
1695
1696  Do nothing.
1697
1698
1699.. opcode:: BRK - Break
1700
1701  Unconditionally moves the point of execution to the instruction after the
1702  next endloop or endswitch. The instruction must appear within a loop/endloop
1703  or switch/endswitch.
1704
1705
1706.. opcode:: IF - Float If
1707
1708  Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
1709
1710    src0.x != 0.0
1711
1712  where src0.x is interpreted as a floating point register.
1713
1714
1715.. opcode:: UIF - Bitwise If
1716
1717  Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1718
1719    src0.x != 0
1720
1721  where src0.x is interpreted as an integer register.
1722
1723
1724.. opcode:: ELSE - Else
1725
1726  Starts an else block, after an IF or UIF statement.
1727
1728
1729.. opcode:: ENDIF - End If
1730
1731  Ends an IF or UIF block.
1732
1733
1734.. opcode:: SWITCH - Switch
1735
1736   Starts a C-style switch expression. The switch consists of one or multiple
1737   CASE statements, and at most one DEFAULT statement. Execution of a statement
1738   ends when a BRK is hit, but just like in C falling through to other cases
1739   without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1740   just as last statement, and fallthrough is allowed into/from it.
1741   CASE src arguments are evaluated at bit level against the SWITCH src argument.
1742
1743   Example::
1744
1745     SWITCH src[0].x
1746     CASE src[0].x
1747     (some instructions here)
1748     (optional BRK here)
1749     DEFAULT
1750     (some instructions here)
1751     (optional BRK here)
1752     CASE src[0].x
1753     (some instructions here)
1754     (optional BRK here)
1755     ENDSWITCH
1756
1757
1758.. opcode:: CASE - Switch case
1759
1760   This represents a switch case label. The src arg must be an integer immediate.
1761
1762
1763.. opcode:: DEFAULT - Switch default
1764
1765   This represents the default case in the switch, which is taken if no other
1766   case matches.
1767
1768
1769.. opcode:: ENDSWITCH - End of switch
1770
1771   Ends a switch expression.
1772
1773
1774Interpolation ISA
1775^^^^^^^^^^^^^^^^^
1776
1777The interpolation instructions allow an input to be interpolated in a
1778different way than its declaration. This corresponds to the GLSL 4.00
1779interpolateAt* functions. The first argument of each of these must come from
1780``TGSI_FILE_INPUT``.
1781
1782.. opcode:: INTERP_CENTROID - Interpolate at the centroid
1783
1784   Interpolates the varying specified by src0 at the centroid
1785
1786.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1787
1788   Interpolates the varying specified by src0 at the sample id specified by
1789   src1.x (interpreted as an integer)
1790
1791.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1792
1793   Interpolates the varying specified by src0 at the offset src1.xy from the
1794   pixel center (interpreted as floats)
1795
1796
1797.. _doubleopcodes:
1798
1799Double ISA
1800^^^^^^^^^^^^^^^
1801
1802The double-precision opcodes reinterpret four-component vectors into
1803two-component vectors with doubled precision in each component.
1804
1805.. opcode:: DABS - Absolute
1806
1807.. math::
1808
1809  dst.xy = |src0.xy|
1810
1811  dst.zw = |src0.zw|
1812
1813.. opcode:: DADD - Add
1814
1815.. math::
1816
1817  dst.xy = src0.xy + src1.xy
1818
1819  dst.zw = src0.zw + src1.zw
1820
1821.. opcode:: DSEQ - Set on Equal
1822
1823.. math::
1824
1825  dst.x = src0.xy == src1.xy ? \sim 0 : 0
1826
1827  dst.z = src0.zw == src1.zw ? \sim 0 : 0
1828
1829.. opcode:: DSNE - Set on Not Equal
1830
1831.. math::
1832
1833  dst.x = src0.xy != src1.xy ? \sim 0 : 0
1834
1835  dst.z = src0.zw != src1.zw ? \sim 0 : 0
1836
1837.. opcode:: DSLT - Set on Less than
1838
1839.. math::
1840
1841  dst.x = src0.xy < src1.xy ? \sim 0 : 0
1842
1843  dst.z = src0.zw < src1.zw ? \sim 0 : 0
1844
1845.. opcode:: DSGE - Set on Greater equal
1846
1847.. math::
1848
1849  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1850
1851  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1852
1853.. opcode:: DFRAC - Fraction
1854
1855.. math::
1856
1857  dst.xy = src.xy - \lfloor src.xy\rfloor
1858
1859  dst.zw = src.zw - \lfloor src.zw\rfloor
1860
1861.. opcode:: DTRUNC - Truncate
1862
1863.. math::
1864
1865  dst.xy = trunc(src.xy)
1866
1867  dst.zw = trunc(src.zw)
1868
1869.. opcode:: DCEIL - Ceiling
1870
1871.. math::
1872
1873  dst.xy = \lceil src.xy\rceil
1874
1875  dst.zw = \lceil src.zw\rceil
1876
1877.. opcode:: DFLR - Floor
1878
1879.. math::
1880
1881  dst.xy = \lfloor src.xy\rfloor
1882
1883  dst.zw = \lfloor src.zw\rfloor
1884
1885.. opcode:: DROUND - Fraction
1886
1887.. math::
1888
1889  dst.xy = round(src.xy)
1890
1891  dst.zw = round(src.zw)
1892
1893.. opcode:: DSSG - Set Sign
1894
1895.. math::
1896
1897  dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1898
1899  dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1900
1901.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1902
1903Like the ``frexp()`` routine in many math libraries, this opcode stores the
1904exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1905:math:`dst1 \times 2^{dst0} = src` . The results are replicated across
1906channels.
1907
1908.. math::
1909
1910  dst0.xy = dst.zw = frac(src.xy)
1911
1912  dst1 = frac(src.xy)
1913
1914
1915.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1916
1917This opcode is the inverse of :opcode:`DFRACEXP`. The second
1918source is an integer.
1919
1920.. math::
1921
1922  dst.xy = src0.xy \times 2^{src1.x}
1923
1924  dst.zw = src0.zw \times 2^{src1.z}
1925
1926.. opcode:: DMIN - Minimum
1927
1928.. math::
1929
1930  dst.xy = min(src0.xy, src1.xy)
1931
1932  dst.zw = min(src0.zw, src1.zw)
1933
1934.. opcode:: DMAX - Maximum
1935
1936.. math::
1937
1938  dst.xy = max(src0.xy, src1.xy)
1939
1940  dst.zw = max(src0.zw, src1.zw)
1941
1942.. opcode:: DMUL - Multiply
1943
1944.. math::
1945
1946  dst.xy = src0.xy \times src1.xy
1947
1948  dst.zw = src0.zw \times src1.zw
1949
1950
1951.. opcode:: DMAD - Multiply And Add
1952
1953.. math::
1954
1955  dst.xy = src0.xy \times src1.xy + src2.xy
1956
1957  dst.zw = src0.zw \times src1.zw + src2.zw
1958
1959
1960.. opcode:: DFMA - Fused Multiply-Add
1961
1962Perform a * b + c with no intermediate rounding step.
1963
1964.. math::
1965
1966  dst.xy = src0.xy \times src1.xy + src2.xy
1967
1968  dst.zw = src0.zw \times src1.zw + src2.zw
1969
1970
1971.. opcode:: DDIV - Divide
1972
1973.. math::
1974
1975  dst.xy = \frac{src0.xy}{src1.xy}
1976
1977  dst.zw = \frac{src0.zw}{src1.zw}
1978
1979
1980.. opcode:: DRCP - Reciprocal
1981
1982.. math::
1983
1984   dst.xy = \frac{1}{src.xy}
1985
1986   dst.zw = \frac{1}{src.zw}
1987
1988.. opcode:: DSQRT - Square Root
1989
1990.. math::
1991
1992   dst.xy = \sqrt{src.xy}
1993
1994   dst.zw = \sqrt{src.zw}
1995
1996.. opcode:: DRSQ - Reciprocal Square Root
1997
1998.. math::
1999
2000   dst.xy = \frac{1}{\sqrt{src.xy}}
2001
2002   dst.zw = \frac{1}{\sqrt{src.zw}}
2003
2004.. opcode:: F2D - Float to Double
2005
2006.. math::
2007
2008   dst.xy = double(src0.x)
2009
2010   dst.zw = double(src0.y)
2011
2012.. opcode:: D2F - Double to Float
2013
2014.. math::
2015
2016   dst.x = float(src0.xy)
2017
2018   dst.y = float(src0.zw)
2019
2020.. opcode:: I2D - Int to Double
2021
2022.. math::
2023
2024   dst.xy = double(src0.x)
2025
2026   dst.zw = double(src0.y)
2027
2028.. opcode:: D2I - Double to Int
2029
2030.. math::
2031
2032   dst.x = int(src0.xy)
2033
2034   dst.y = int(src0.zw)
2035
2036.. opcode:: U2D - Unsigned Int to Double
2037
2038.. math::
2039
2040   dst.xy = double(src0.x)
2041
2042   dst.zw = double(src0.y)
2043
2044.. opcode:: D2U - Double to Unsigned Int
2045
2046.. math::
2047
2048   dst.x = unsigned(src0.xy)
2049
2050   dst.y = unsigned(src0.zw)
2051
205264-bit Integer ISA
2053^^^^^^^^^^^^^^^^^^
2054
2055The 64-bit integer opcodes reinterpret four-component vectors into
2056two-component vectors with 64-bits in each component.
2057
2058.. opcode:: I64ABS - 64-bit Integer Absolute Value
2059
2060.. math::
2061
2062  dst.xy = |src0.xy|
2063
2064  dst.zw = |src0.zw|
2065
2066.. opcode:: I64NEG - 64-bit Integer Negate
2067
2068  Two's complement.
2069
2070.. math::
2071
2072  dst.xy = -src.xy
2073
2074  dst.zw = -src.zw
2075
2076.. opcode:: I64SSG - 64-bit Integer Set Sign
2077
2078.. math::
2079
2080  dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2081
2082  dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2083
2084.. opcode:: U64ADD - 64-bit Integer Add
2085
2086.. math::
2087
2088  dst.xy = src0.xy + src1.xy
2089
2090  dst.zw = src0.zw + src1.zw
2091
2092.. opcode:: U64MUL - 64-bit Integer Multiply
2093
2094.. math::
2095
2096  dst.xy = src0.xy * src1.xy
2097
2098  dst.zw = src0.zw * src1.zw
2099
2100.. opcode:: U64SEQ - 64-bit Integer Set on Equal
2101
2102.. math::
2103
2104  dst.x = src0.xy == src1.xy ? \sim 0 : 0
2105
2106  dst.z = src0.zw == src1.zw ? \sim 0 : 0
2107
2108.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2109
2110.. math::
2111
2112  dst.x = src0.xy != src1.xy ? \sim 0 : 0
2113
2114  dst.z = src0.zw != src1.zw ? \sim 0 : 0
2115
2116.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2117
2118.. math::
2119
2120  dst.x = src0.xy < src1.xy ? \sim 0 : 0
2121
2122  dst.z = src0.zw < src1.zw ? \sim 0 : 0
2123
2124.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2125
2126.. math::
2127
2128  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2129
2130  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2131
2132.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2133
2134.. math::
2135
2136  dst.x = src0.xy < src1.xy ? \sim 0 : 0
2137
2138  dst.z = src0.zw < src1.zw ? \sim 0 : 0
2139
2140.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2141
2142.. math::
2143
2144  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2145
2146  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2147
2148.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2149
2150.. math::
2151
2152  dst.xy = min(src0.xy, src1.xy)
2153
2154  dst.zw = min(src0.zw, src1.zw)
2155
2156.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2157
2158.. math::
2159
2160  dst.xy = min(src0.xy, src1.xy)
2161
2162  dst.zw = min(src0.zw, src1.zw)
2163
2164.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2165
2166.. math::
2167
2168  dst.xy = max(src0.xy, src1.xy)
2169
2170  dst.zw = max(src0.zw, src1.zw)
2171
2172.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2173
2174.. math::
2175
2176  dst.xy = max(src0.xy, src1.xy)
2177
2178  dst.zw = max(src0.zw, src1.zw)
2179
2180.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2181
2182   The shift count is masked with 0x3f before the shift is applied.
2183
2184.. math::
2185
2186  dst.xy = src0.xy << (0x3f \& src1.x)
2187
2188  dst.zw = src0.zw << (0x3f \& src1.y)
2189
2190.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2191
2192   The shift count is masked with 0x3f before the shift is applied.
2193
2194.. math::
2195
2196  dst.xy = src0.xy >> (0x3f \& src1.x)
2197
2198  dst.zw = src0.zw >> (0x3f \& src1.y)
2199
2200.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2201
2202   The shift count is masked with 0x3f before the shift is applied.
2203
2204.. math::
2205
2206  dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
2207
2208  dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
2209
2210.. opcode:: I64DIV - 64-bit Signed Integer Division
2211
2212.. math::
2213
2214  dst.xy = \frac{src0.xy}{src1.xy}
2215
2216  dst.zw = \frac{src0.zw}{src1.zw}
2217
2218.. opcode:: U64DIV - 64-bit Unsigned Integer Division
2219
2220.. math::
2221
2222  dst.xy = \frac{src0.xy}{src1.xy}
2223
2224  dst.zw = \frac{src0.zw}{src1.zw}
2225
2226.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2227
2228.. math::
2229
2230  dst.xy = src0.xy \bmod src1.xy
2231
2232  dst.zw = src0.zw \bmod src1.zw
2233
2234.. opcode:: I64MOD - 64-bit Signed Integer Remainder
2235
2236.. math::
2237
2238  dst.xy = src0.xy \bmod src1.xy
2239
2240  dst.zw = src0.zw \bmod src1.zw
2241
2242.. opcode:: F2U64 - Float to 64-bit Unsigned Int
2243
2244.. math::
2245
2246   dst.xy = (uint64_t) src0.x
2247
2248   dst.zw = (uint64_t) src0.y
2249
2250.. opcode:: F2I64 - Float to 64-bit Int
2251
2252.. math::
2253
2254   dst.xy = (int64_t) src0.x
2255
2256   dst.zw = (int64_t) src0.y
2257
2258.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2259
2260   This is a zero extension.
2261
2262.. math::
2263
2264   dst.xy = (int64_t) src0.x
2265
2266   dst.zw = (int64_t) src0.y
2267
2268.. opcode:: I2I64 - Signed Integer to 64-bit Integer
2269
2270   This is a sign extension.
2271
2272.. math::
2273
2274   dst.xy = (int64_t) src0.x
2275
2276   dst.zw = (int64_t) src0.y
2277
2278.. opcode:: D2U64 - Double to 64-bit Unsigned Int
2279
2280.. math::
2281
2282   dst.xy = (uint64_t) src0.xy
2283
2284   dst.zw = (uint64_t) src0.zw
2285
2286.. opcode:: D2I64 - Double to 64-bit Int
2287
2288.. math::
2289
2290   dst.xy = (int64_t) src0.xy
2291
2292   dst.zw = (int64_t) src0.zw
2293
2294.. opcode:: U642F - 64-bit unsigned integer to float
2295
2296.. math::
2297
2298   dst.x = (float) src0.xy
2299
2300   dst.y = (float) src0.zw
2301
2302.. opcode:: I642F - 64-bit Int to Float
2303
2304.. math::
2305
2306   dst.x = (float) src0.xy
2307
2308   dst.y = (float) src0.zw
2309
2310.. opcode:: U642D - 64-bit unsigned integer to double
2311
2312.. math::
2313
2314   dst.xy = (double) src0.xy
2315
2316   dst.zw = (double) src0.zw
2317
2318.. opcode:: I642D - 64-bit Int to double
2319
2320.. math::
2321
2322   dst.xy = (double) src0.xy
2323
2324   dst.zw = (double) src0.zw
2325
2326.. _samplingopcodes:
2327
2328Resource Sampling Opcodes
2329^^^^^^^^^^^^^^^^^^^^^^^^^
2330
2331Those opcodes follow very closely semantics of the respective Direct3D
2332instructions. If in doubt double check Direct3D documentation.
2333Note that the swizzle on SVIEW (src1) determines texel swizzling
2334after lookup.
2335
2336.. opcode:: SAMPLE
2337
2338  Using provided address, sample data from the specified texture using the
2339  filtering mode identified by the given sampler. The source data may come from
2340  any resource type other than buffers.
2341
2342  Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2343
2344  Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2345
2346.. opcode:: SAMPLE_I
2347
2348  Simplified alternative to the SAMPLE instruction.  Using the provided
2349  integer address, SAMPLE_I fetches data from the specified sampler view
2350  without any filtering.  The source data may come from any resource type
2351  other than CUBE.
2352
2353  Syntax: ``SAMPLE_I dst, address, sampler_view``
2354
2355  Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2356
2357  The 'address' is specified as unsigned integers. If the 'address' is out of
2358  range [0...(# texels - 1)] the result of the fetch is always 0 in all
2359  components.  As such the instruction doesn't honor address wrap modes, in
2360  cases where that behavior is desirable 'SAMPLE' instruction should be used.
2361  address.w always provides an unsigned integer mipmap level. If the value is
2362  out of the range then the instruction always returns 0 in all components.
2363  address.yz are ignored for buffers and 1d textures.  address.z is ignored
2364  for 1d texture arrays and 2d textures.
2365
2366  For 1D texture arrays address.y provides the array index (also as unsigned
2367  integer). If the value is out of the range of available array indices
2368  [0... (array size - 1)] then the opcode always returns 0 in all components.
2369  For 2D texture arrays address.z provides the array index, otherwise it
2370  exhibits the same behavior as in the case for 1D texture arrays.  The exact
2371  semantics of the source address are presented in the table below:
2372
2373  +---------------------------+----+-----+-----+---------+
2374  | resource type             | X  |  Y  |  Z  |    W    |
2375  +===========================+====+=====+=====+=========+
2376  | ``PIPE_BUFFER``           | x  |     |     | ignored |
2377  +---------------------------+----+-----+-----+---------+
2378  | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
2379  +---------------------------+----+-----+-----+---------+
2380  | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
2381  +---------------------------+----+-----+-----+---------+
2382  | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
2383  +---------------------------+----+-----+-----+---------+
2384  | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
2385  +---------------------------+----+-----+-----+---------+
2386  | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
2387  +---------------------------+----+-----+-----+---------+
2388  | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
2389  +---------------------------+----+-----+-----+---------+
2390  | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
2391  +---------------------------+----+-----+-----+---------+
2392
2393  Where 'mpl' is a mipmap level and 'idx' is the array index.
2394
2395.. opcode:: SAMPLE_I_MS
2396
2397  Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2398
2399  Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2400
2401.. opcode:: SAMPLE_B
2402
2403  Just like the SAMPLE instruction with the exception that an additional bias
2404  is applied to the level of detail computed as part of the instruction
2405  execution.
2406
2407  Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2408
2409  Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2410
2411.. opcode:: SAMPLE_C
2412
2413  Similar to the SAMPLE instruction but it performs a comparison filter. The
2414  operands to SAMPLE_C are identical to SAMPLE, except that there is an
2415  additional float32 operand, reference value, which must be a register with
2416  single-component, or a scalar literal.  SAMPLE_C makes the hardware use the
2417  current samplers compare_func (in pipe_sampler_state) to compare reference
2418  value against the red component value for the surce resource at each texel
2419  that the currently configured texture filter covers based on the provided
2420  coordinates.
2421
2422  Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2423
2424  Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2425
2426.. opcode:: SAMPLE_C_LZ
2427
2428  Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2429  for level-zero.
2430
2431  Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2432
2433  Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2434
2435
2436.. opcode:: SAMPLE_D
2437
2438  SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2439  the source address in the x direction and the y direction are provided by
2440  extra parameters.
2441
2442  Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2443
2444  Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2445
2446.. opcode:: SAMPLE_L
2447
2448  SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2449  directly as a scalar value, representing no anisotropy.
2450
2451  Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2452
2453  Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2454
2455.. opcode:: GATHER4
2456
2457  Gathers the four texels to be used in a bi-linear filtering operation and
2458  packs them into a single register.  Only works with 2D, 2D array, cubemaps,
2459  and cubemaps arrays.  For 2D textures, only the addressing modes of the
2460  sampler and the top level of any mip pyramid are used. Set W to zero.  It
2461  behaves like the SAMPLE instruction, but a filtered sample is not
2462  generated. The four samples that contribute to filtering are placed into
2463  xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
2464  delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2465  magnitude of the deltas are half a texel.
2466
2467
2468.. opcode:: SVIEWINFO
2469
2470  Query the dimensions of a given sampler view.  dst receives width, height,
2471  depth or array size and number of mipmap levels as int4. The dst can have a
2472  writemask which will specify what info is the caller interested in.
2473
2474  Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2475
2476  Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2477
2478  src_mip_level is an unsigned integer scalar. If it's out of range then
2479  returns 0 for width, height and depth/array size but the total number of
2480  mipmap is still returned correctly for the given sampler view.  The returned
2481  width, height and depth values are for the mipmap level selected by the
2482  src_mip_level and are in the number of texels.  For 1d texture array width
2483  is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2484  still in dst.w.  In contrast to d3d10 resinfo, there's no way in the tgsi
2485  instruction encoding to specify the return type (float/rcpfloat/uint), hence
2486  always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2487  resinfo allowing swizzling dst values is ignored (due to the interaction
2488  with rcpfloat modifier which requires some swizzle handling in the state
2489  tracker anyway).
2490
2491.. opcode:: SAMPLE_POS
2492
2493  Query the position of a sample in the given resource or render target
2494  when per-sample fragment shading is in effect.
2495
2496  Syntax: ``SAMPLE_POS dst, source, sample_index``
2497
2498  dst receives float4 (x, y, undef, undef) indicated where the sample is
2499  located. Sample locations are in the range [0, 1] where 0.5 is the center
2500  of the fragment.
2501
2502  source is either a sampler view (to indicate a shader resource) or temp
2503  register (to indicate the render target).  The source register may have
2504  an optional swizzle to apply to the returned result
2505
2506  sample_index is an integer scalar indicating which sample position is to
2507  be queried.
2508
2509  If per-sample shading is not in effect or the source resource or render
2510  target is not multisampled, the result is (0.5, 0.5, undef, undef).
2511
2512  NOTE: no driver has implemented this opcode yet (and no gallium frontend
2513  emits it).  This information is subject to change.
2514
2515.. opcode:: SAMPLE_INFO
2516
2517  Query the number of samples in a multisampled resource or render target.
2518
2519  Syntax: ``SAMPLE_INFO dst, source``
2520
2521  dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2522  resource or the render target.
2523
2524  source is either a sampler view (to indicate a shader resource) or temp
2525  register (to indicate the render target).  The source register may have
2526  an optional swizzle to apply to the returned result
2527
2528  If per-sample shading is not in effect or the source resource or render
2529  target is not multisampled, the result is (1, 0, 0, 0).
2530
2531  NOTE: no driver has implemented this opcode yet (and no gallium frontend
2532  emits it).  This information is subject to change.
2533
2534.. opcode:: LOD - level of detail
2535
2536   Same syntax as the SAMPLE opcode but instead of performing an actual
2537   texture lookup/filter, return the computed LOD information that the
2538   texture pipe would use to access the texture. The Y component contains
2539   the computed LOD lambda_prime. The X component contains the LOD that will
2540   be accessed, based on min/max lod's and mipmap filters.
2541   The Z and W components are set to 0.
2542
2543   Syntax: ``LOD dst, address, sampler_view, sampler``
2544
2545
2546.. _resourceopcodes:
2547
2548Resource Access Opcodes
2549^^^^^^^^^^^^^^^^^^^^^^^
2550
2551For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2552
2553.. opcode:: LOAD - Fetch data from a shader buffer or image
2554
2555               Syntax: ``LOAD dst, resource, address``
2556
2557               Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2558
2559               Using the provided integer address, LOAD fetches data
2560               from the specified buffer or texture without any
2561               filtering.
2562
2563               The 'address' is specified as a vector of unsigned
2564               integers.  If the 'address' is out of range the result
2565               is unspecified.
2566
2567               Only the first mipmap level of a resource can be read
2568               from using this instruction.
2569
2570               For 1D or 2D texture arrays, the array index is
2571               provided as an unsigned integer in address.y or
2572               address.z, respectively.  address.yz are ignored for
2573               buffers and 1D textures.  address.z is ignored for 1D
2574               texture arrays and 2D textures.  address.w is always
2575               ignored.
2576
2577               A swizzle suffix may be added to the resource argument
2578               this will cause the resource data to be swizzled accordingly.
2579
2580.. opcode:: STORE - Write data to a shader resource
2581
2582               Syntax: ``STORE resource, address, src``
2583
2584               Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2585
2586               Using the provided integer address, STORE writes data
2587               to the specified buffer or texture.
2588
2589               The 'address' is specified as a vector of unsigned
2590               integers.  If the 'address' is out of range the result
2591               is unspecified.
2592
2593               Only the first mipmap level of a resource can be
2594               written to using this instruction.
2595
2596               For 1D or 2D texture arrays, the array index is
2597               provided as an unsigned integer in address.y or
2598               address.z, respectively.  address.yz are ignored for
2599               buffers and 1D textures.  address.z is ignored for 1D
2600               texture arrays and 2D textures.  address.w is always
2601               ignored.
2602
2603.. opcode:: RESQ - Query information about a resource
2604
2605  Syntax: ``RESQ dst, resource``
2606
2607  Example: ``RESQ TEMP[0], BUFFER[0]``
2608
2609  Returns information about the buffer or image resource. For buffer
2610  resources, the size (in bytes) is returned in the x component. For
2611  image resources, .xyz will contain the width/height/layers of the
2612  image, while .w will contain the number of samples for multi-sampled
2613  images.
2614
2615.. opcode:: FBFETCH - Load data from framebuffer
2616
2617  Syntax: ``FBFETCH dst, output``
2618
2619  Example: ``FBFETCH TEMP[0], OUT[0]``
2620
2621  This is only valid on ``COLOR`` semantic outputs. Returns the color
2622  of the current position in the framebuffer from before this fragment
2623  shader invocation. May return the same value from multiple calls for
2624  a particular output within a single invocation. Note that result may
2625  be undefined if a fragment is drawn multiple times without a blend
2626  barrier in between.
2627
2628
2629.. _bindlessopcodes:
2630
2631Bindless Opcodes
2632^^^^^^^^^^^^^^^^
2633
2634These opcodes are for working with bindless sampler or image handles and
2635require PIPE_CAP_BINDLESS_TEXTURE.
2636
2637.. opcode:: IMG2HND - Get a bindless handle for a image
2638
2639  Syntax: ``IMG2HND dst, image``
2640
2641  Example: ``IMG2HND TEMP[0], IMAGE[0]``
2642
2643  Sets 'dst' to a bindless handle for 'image'.
2644
2645.. opcode:: SAMP2HND - Get a bindless handle for a sampler
2646
2647  Syntax: ``SAMP2HND dst, sampler``
2648
2649  Example: ``SAMP2HND TEMP[0], SAMP[0]``
2650
2651  Sets 'dst' to a bindless handle for 'sampler'.
2652
2653
2654.. _threadsyncopcodes:
2655
2656Inter-thread synchronization opcodes
2657^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2658
2659These opcodes are intended for communication between threads running
2660within the same compute grid.  For now they're only valid in compute
2661programs.
2662
2663.. opcode:: BARRIER - Thread group barrier
2664
2665  ``BARRIER``
2666
2667  This opcode suspends the execution of the current thread until all
2668  the remaining threads in the working group reach the same point of
2669  the program.  Results are unspecified if any of the remaining
2670  threads terminates or never reaches an executed BARRIER instruction.
2671
2672.. opcode:: MEMBAR - Memory barrier
2673
2674  ``MEMBAR type``
2675
2676  This opcode waits for the completion of all memory accesses based on
2677  the type passed in. The type is an immediate bitfield with the following
2678  meaning:
2679
2680  Bit 0: Shader storage buffers
2681  Bit 1: Atomic buffers
2682  Bit 2: Images
2683  Bit 3: Shared memory
2684  Bit 4: Thread group
2685
2686  These may be passed in in any combination. An implementation is free to not
2687  distinguish between these as it sees fit. However these map to all the
2688  possibilities made available by GLSL.
2689
2690.. _atomopcodes:
2691
2692Atomic opcodes
2693^^^^^^^^^^^^^^
2694
2695These opcodes provide atomic variants of some common arithmetic and
2696logical operations.  In this context atomicity means that another
2697concurrent memory access operation that affects the same memory
2698location is guaranteed to be performed strictly before or after the
2699entire execution of the atomic operation. The resource may be a BUFFER,
2700IMAGE, HWATOMIC, or MEMORY.  In the case of an image, the offset works
2701the same as for ``LOAD`` and ``STORE``, specified above. For atomic
2702counters, the offset is an immediate index to the base hw atomic
2703counter for this operation.
2704These atomic operations may only be used with 32-bit integer image formats.
2705
2706.. opcode:: ATOMUADD - Atomic integer addition
2707
2708  Syntax: ``ATOMUADD dst, resource, offset, src``
2709
2710  Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2711
2712  The following operation is performed atomically:
2713
2714.. math::
2715
2716  dst_x = resource[offset]
2717
2718  resource[offset] = dst_x + src_x
2719
2720
2721.. opcode:: ATOMFADD - Atomic floating point addition
2722
2723  Syntax: ``ATOMFADD dst, resource, offset, src``
2724
2725  Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2726
2727  The following operation is performed atomically:
2728
2729.. math::
2730
2731  dst_x = resource[offset]
2732
2733  resource[offset] = dst_x + src_x
2734
2735
2736.. opcode:: ATOMXCHG - Atomic exchange
2737
2738  Syntax: ``ATOMXCHG dst, resource, offset, src``
2739
2740  Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2741
2742  The following operation is performed atomically:
2743
2744.. math::
2745
2746  dst_x = resource[offset]
2747
2748  resource[offset] = src_x
2749
2750
2751.. opcode:: ATOMCAS - Atomic compare-and-exchange
2752
2753  Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2754
2755  Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2756
2757  The following operation is performed atomically:
2758
2759.. math::
2760
2761  dst_x = resource[offset]
2762
2763  resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2764
2765
2766.. opcode:: ATOMAND - Atomic bitwise And
2767
2768  Syntax: ``ATOMAND dst, resource, offset, src``
2769
2770  Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2771
2772  The following operation is performed atomically:
2773
2774.. math::
2775
2776  dst_x = resource[offset]
2777
2778  resource[offset] = dst_x \& src_x
2779
2780
2781.. opcode:: ATOMOR - Atomic bitwise Or
2782
2783  Syntax: ``ATOMOR dst, resource, offset, src``
2784
2785  Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2786
2787  The following operation is performed atomically:
2788
2789.. math::
2790
2791  dst_x = resource[offset]
2792
2793  resource[offset] = dst_x | src_x
2794
2795
2796.. opcode:: ATOMXOR - Atomic bitwise Xor
2797
2798  Syntax: ``ATOMXOR dst, resource, offset, src``
2799
2800  Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2801
2802  The following operation is performed atomically:
2803
2804.. math::
2805
2806  dst_x = resource[offset]
2807
2808  resource[offset] = dst_x \oplus src_x
2809
2810
2811.. opcode:: ATOMUMIN - Atomic unsigned minimum
2812
2813  Syntax: ``ATOMUMIN dst, resource, offset, src``
2814
2815  Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2816
2817  The following operation is performed atomically:
2818
2819.. math::
2820
2821  dst_x = resource[offset]
2822
2823  resource[offset] = (dst_x < src_x ? dst_x : src_x)
2824
2825
2826.. opcode:: ATOMUMAX - Atomic unsigned maximum
2827
2828  Syntax: ``ATOMUMAX dst, resource, offset, src``
2829
2830  Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2831
2832  The following operation is performed atomically:
2833
2834.. math::
2835
2836  dst_x = resource[offset]
2837
2838  resource[offset] = (dst_x > src_x ? dst_x : src_x)
2839
2840
2841.. opcode:: ATOMIMIN - Atomic signed minimum
2842
2843  Syntax: ``ATOMIMIN dst, resource, offset, src``
2844
2845  Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2846
2847  The following operation is performed atomically:
2848
2849.. math::
2850
2851  dst_x = resource[offset]
2852
2853  resource[offset] = (dst_x < src_x ? dst_x : src_x)
2854
2855
2856.. opcode:: ATOMIMAX - Atomic signed maximum
2857
2858  Syntax: ``ATOMIMAX dst, resource, offset, src``
2859
2860  Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2861
2862  The following operation is performed atomically:
2863
2864.. math::
2865
2866  dst_x = resource[offset]
2867
2868  resource[offset] = (dst_x > src_x ? dst_x : src_x)
2869
2870
2871.. opcode:: ATOMINC_WRAP - Atomic increment + wrap around
2872
2873  Syntax: ``ATOMINC_WRAP dst, resource, offset, src``
2874
2875  Example: ``ATOMINC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2876
2877  The following operation is performed atomically:
2878
2879.. math::
2880
2881  dst_x = resource[offset] + 1
2882
2883  resource[offset] = dst_x <= src_x ? dst_x : 0
2884
2885
2886.. opcode:: ATOMDEC_WRAP - Atomic decrement + wrap around
2887
2888  Syntax: ``ATOMDEC_WRAP dst, resource, offset, src``
2889
2890  Example: ``ATOMDEC_WRAP TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2891
2892  The following operation is performed atomically:
2893
2894.. math::
2895
2896  dst_x = resource[offset]
2897
2898  resource[offset] = (dst_x > 0 && dst_x < src_x) ? dst_x - 1 : 0
2899
2900
2901.. _interlaneopcodes:
2902
2903Inter-lane opcodes
2904^^^^^^^^^^^^^^^^^^
2905
2906These opcodes reduce the given value across the shader invocations
2907running in the current SIMD group. Every thread in the subgroup will receive
2908the same result. The BALLOT operations accept a single-channel argument that
2909is treated as a boolean and produce a 64-bit value.
2910
2911.. opcode:: VOTE_ANY - Value is set in any of the active invocations
2912
2913  Syntax: ``VOTE_ANY dst, value``
2914
2915  Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2916
2917
2918.. opcode:: VOTE_ALL - Value is set in all of the active invocations
2919
2920  Syntax: ``VOTE_ALL dst, value``
2921
2922  Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2923
2924
2925.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2926
2927  Syntax: ``VOTE_EQ dst, value``
2928
2929  Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2930
2931
2932.. opcode:: BALLOT - Lanemask of whether the value is set in each active
2933            invocation
2934
2935  Syntax: ``BALLOT dst, value``
2936
2937  Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2938
2939  When the argument is a constant true, this produces a bitmask of active
2940  invocations. In fragment shaders, this can include helper invocations
2941  (invocations whose outputs and writes to memory are discarded, but which
2942  are used to compute derivatives).
2943
2944
2945.. opcode:: READ_FIRST - Broadcast the value from the first active
2946            invocation to all active lanes
2947
2948  Syntax: ``READ_FIRST dst, value``
2949
2950  Example: ``READ_FIRST TEMP[0], TEMP[1]``
2951
2952
2953.. opcode:: READ_INVOC - Retrieve the value from the given invocation
2954            (need not be uniform)
2955
2956  Syntax: ``READ_INVOC dst, value, invocation``
2957
2958  Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2959
2960  invocation.x controls the invocation number to read from for all channels.
2961  The invocation number must be the same across all active invocations in a
2962  sub-group; otherwise, the results are undefined.
2963
2964
2965Explanation of symbols used
2966------------------------------
2967
2968
2969Functions
2970^^^^^^^^^^^^^^
2971
2972
2973  :math:`|x|`       Absolute value of `x`.
2974
2975  :math:`\lceil x \rceil` Ceiling of `x`.
2976
2977  clamp(x,y,z)      Clamp x between y and z.
2978                    (x < y) ? y : (x > z) ? z : x
2979
2980  :math:`\lfloor x\rfloor` Floor of `x`.
2981
2982  :math:`\log_2{x}` Logarithm of `x`, base 2.
2983
2984  max(x,y)          Maximum of x and y.
2985                    (x > y) ? x : y
2986
2987  min(x,y)          Minimum of x and y.
2988                    (x < y) ? x : y
2989
2990  partialx(x)       Derivative of x relative to fragment's X.
2991
2992  partialy(x)       Derivative of x relative to fragment's Y.
2993
2994  pop()             Pop from stack.
2995
2996  :math:`x^y`       `x` to the power `y`.
2997
2998  push(x)           Push x on stack.
2999
3000  round(x)          Round x.
3001
3002  trunc(x)          Truncate x, i.e. drop the fraction bits.
3003
3004
3005Keywords
3006^^^^^^^^^^^^^
3007
3008
3009  discard           Discard fragment.
3010
3011  pc                Program counter.
3012
3013  target            Label of target instruction.
3014
3015
3016Other tokens
3017---------------
3018
3019
3020Declaration
3021^^^^^^^^^^^
3022
3023
3024Declares a register that is will be referenced as an operand in Instruction
3025tokens.
3026
3027File field contains register file that is being declared and is one
3028of TGSI_FILE.
3029
3030UsageMask field specifies which of the register components can be accessed
3031and is one of TGSI_WRITEMASK.
3032
3033The Local flag specifies that a given value isn't intended for
3034subroutine parameter passing and, as a result, the implementation
3035isn't required to give any guarantees of it being preserved across
3036subroutine boundaries.  As it's merely a compiler hint, the
3037implementation is free to ignore it.
3038
3039If Dimension flag is set to 1, a Declaration Dimension token follows.
3040
3041If Semantic flag is set to 1, a Declaration Semantic token follows.
3042
3043If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3044
3045If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3046
3047If Array flag is set to 1, a Declaration Array token follows.
3048
3049Array Declaration
3050^^^^^^^^^^^^^^^^^^^^^^^^
3051
3052Declarations can optional have an ArrayID attribute which can be referred by
3053indirect addressing operands. An ArrayID of zero is reserved and treated as
3054if no ArrayID is specified.
3055
3056If an indirect addressing operand refers to a specific declaration by using
3057an ArrayID only the registers in this declaration are guaranteed to be
3058accessed, accessing any register outside this declaration results in undefined
3059behavior. Note that for compatibility the effective index is zero-based and
3060not relative to the specified declaration
3061
3062If no ArrayID is specified with an indirect addressing operand the whole
3063register file might be accessed by this operand. This is strongly discouraged
3064and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3065This is only legal for TEMP and CONST register files.
3066
3067Declaration Semantic
3068^^^^^^^^^^^^^^^^^^^^^^^^
3069
3070Vertex and fragment shader input and output registers may be labeled
3071with semantic information consisting of a name and index.
3072
3073Follows Declaration token if Semantic bit is set.
3074
3075Since its purpose is to link a shader with other stages of the pipeline,
3076it is valid to follow only those Declaration tokens that declare a register
3077either in INPUT or OUTPUT file.
3078
3079SemanticName field contains the semantic name of the register being declared.
3080There is no default value.
3081
3082SemanticIndex is an optional subscript that can be used to distinguish
3083different register declarations with the same semantic name. The default value
3084is 0.
3085
3086The meanings of the individual semantic names are explained in the following
3087sections.
3088
3089TGSI_SEMANTIC_POSITION
3090""""""""""""""""""""""
3091
3092For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3093output register which contains the homogeneous vertex position in the clip
3094space coordinate system.  After clipping, the X, Y and Z components of the
3095vertex will be divided by the W value to get normalized device coordinates.
3096
3097For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3098fragment shader input (or system value, depending on which one is
3099supported by the driver) contains the fragment's window position.  The X
3100component starts at zero and always increases from left to right.
3101The Y component starts at zero and always increases but Y=0 may either
3102indicate the top of the window or the bottom depending on the fragment
3103coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3104The Z coordinate ranges from 0 to 1 to represent depth from the front
3105to the back of the Z buffer.  The W component contains the interpolated
3106reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3107but unlike d3d10 which interpolates the same 1/w but then gives back
3108the reciprocal of the interpolated value).
3109
3110Fragment shaders may also declare an output register with
3111TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
3112the fragment shader to change the fragment's Z position.
3113
3114
3115
3116TGSI_SEMANTIC_COLOR
3117"""""""""""""""""""
3118
3119For vertex shader outputs or fragment shader inputs/outputs, this
3120label indicates that the register contains an R,G,B,A color.
3121
3122Several shader inputs/outputs may contain colors so the semantic index
3123is used to distinguish them.  For example, color[0] may be the diffuse
3124color while color[1] may be the specular color.
3125
3126This label is needed so that the flat/smooth shading can be applied
3127to the right interpolants during rasterization.
3128
3129
3130
3131TGSI_SEMANTIC_BCOLOR
3132""""""""""""""""""""
3133
3134Back-facing colors are only used for back-facing polygons, and are only valid
3135in vertex shader outputs. After rasterization, all polygons are front-facing
3136and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3137so all BCOLORs effectively become regular COLORs in the fragment shader.
3138
3139
3140TGSI_SEMANTIC_FOG
3141"""""""""""""""""
3142
3143Vertex shader inputs and outputs and fragment shader inputs may be
3144labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3145a fog coordinate.  Typically, the fragment shader will use the fog coordinate
3146to compute a fog blend factor which is used to blend the normal fragment color
3147with a constant fog color.  But fog coord really is just an ordinary vec4
3148register like regular semantics.
3149
3150
3151TGSI_SEMANTIC_PSIZE
3152"""""""""""""""""""
3153
3154Vertex shader input and output registers may be labeled with
3155TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3156in the form (S, 0, 0, 1).  The point size controls the width or diameter
3157of points for rasterization.  This label cannot be used in fragment
3158shaders.
3159
3160When using this semantic, be sure to set the appropriate state in the
3161:ref:`rasterizer` first.
3162
3163
3164TGSI_SEMANTIC_TEXCOORD
3165""""""""""""""""""""""
3166
3167Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3168
3169Vertex shader outputs and fragment shader inputs may be labeled with
3170this semantic to make them replaceable by sprite coordinates via the
3171sprite_coord_enable state in the :ref:`rasterizer`.
3172The semantic index permitted with this semantic is limited to <= 7.
3173
3174If the driver does not support TEXCOORD, sprite coordinate replacement
3175applies to inputs with the GENERIC semantic instead.
3176
3177The intended use case for this semantic is gl_TexCoord.
3178
3179
3180TGSI_SEMANTIC_PCOORD
3181""""""""""""""""""""
3182
3183Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3184
3185Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3186that the register contains sprite coordinates in the form (x, y, 0, 1), if
3187the current primitive is a point and point sprites are enabled. Otherwise,
3188the contents of the register are undefined.
3189
3190The intended use case for this semantic is gl_PointCoord.
3191
3192
3193TGSI_SEMANTIC_GENERIC
3194"""""""""""""""""""""
3195
3196All vertex/fragment shader inputs/outputs not labeled with any other
3197semantic label can be considered to be generic attributes.  Typical
3198uses of generic inputs/outputs are texcoords and user-defined values.
3199
3200
3201TGSI_SEMANTIC_NORMAL
3202""""""""""""""""""""
3203
3204Indicates that a vertex shader input is a normal vector.  This is
3205typically only used for legacy graphics APIs.
3206
3207
3208TGSI_SEMANTIC_FACE
3209""""""""""""""""""
3210
3211This label applies to fragment shader inputs (or system values,
3212depending on which one is supported by the driver) and indicates that
3213the register contains front/back-face information.
3214
3215If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3216where F will be positive when the fragment belongs to a front-facing polygon,
3217and negative when the fragment belongs to a back-facing polygon.
3218
3219If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3220where F is 0xffffffff when the fragment belongs to a front-facing polygon and
32210 when the fragment belongs to a back-facing polygon.
3222
3223
3224TGSI_SEMANTIC_EDGEFLAG
3225""""""""""""""""""""""
3226
3227For vertex shaders, this sematic label indicates that an input or
3228output is a boolean edge flag.  The register layout is [F, x, x, x]
3229where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
3230simply copies the edge flag input to the edgeflag output.
3231
3232Edge flags are used to control which lines or points are actually
3233drawn when the polygon mode converts triangles/quads/polygons into
3234points or lines.
3235
3236
3237TGSI_SEMANTIC_STENCIL
3238"""""""""""""""""""""
3239
3240For fragment shaders, this semantic label indicates that an output
3241is a writable stencil reference value. Only the Y component is writable.
3242This allows the fragment shader to change the fragments stencilref value.
3243
3244
3245TGSI_SEMANTIC_VIEWPORT_INDEX
3246""""""""""""""""""""""""""""
3247
3248For geometry shaders, this semantic label indicates that an output
3249contains the index of the viewport (and scissor) to use.
3250This is an integer value, and only the X component is used.
3251
3252If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3253supported, then this semantic label can also be used in vertex or
3254tessellation evaluation shaders, respectively. Only the value written in the
3255last vertex processing stage is used.
3256
3257
3258TGSI_SEMANTIC_LAYER
3259"""""""""""""""""""
3260
3261For geometry shaders, this semantic label indicates that an output
3262contains the layer value to use for the color and depth/stencil surfaces.
3263This is an integer value, and only the X component is used.
3264(Also known as rendertarget array index.)
3265
3266If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3267supported, then this semantic label can also be used in vertex or
3268tessellation evaluation shaders, respectively. Only the value written in the
3269last vertex processing stage is used.
3270
3271
3272TGSI_SEMANTIC_CLIPDIST
3273""""""""""""""""""""""
3274
3275Note this covers clipping and culling distances.
3276
3277When components of vertex elements are identified this way, these
3278values are each assumed to be a float32 signed distance to a plane.
3279
3280For clip distances:
3281Primitive setup only invokes rasterization on pixels for which
3282the interpolated plane distances are >= 0.
3283
3284For cull distances:
3285Primitives will be completely discarded if the plane distance
3286for all of the vertices in the primitive are < 0.
3287If a vertex has a cull distance of NaN, that vertex counts as "out"
3288(as if its < 0);
3289
3290Multiple clip/cull planes can be implemented simultaneously, by
3291annotating multiple components of one or more vertex elements with
3292the above specified semantic.
3293The limits on both clip and cull distances are bound
3294by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3295the maximum number of components that can be used to hold the
3296distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3297which specifies the maximum number of registers which can be
3298annotated with those semantics.
3299The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3300are used to divide up the 2 x vec4 space between clipping and culling.
3301
3302TGSI_SEMANTIC_SAMPLEID
3303""""""""""""""""""""""
3304
3305For fragment shaders, this semantic label indicates that a system value
3306contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3307Only the X component is used.  If per-sample shading is not enabled,
3308the result is (0, undef, undef, undef).
3309
3310Note that if the fragment shader uses this system value, the fragment
3311shader is automatically executed at per sample frequency.
3312
3313TGSI_SEMANTIC_SAMPLEPOS
3314"""""""""""""""""""""""
3315
3316For fragment shaders, this semantic label indicates that a system
3317value contains the current sample's position as float4(x, y, undef, undef)
3318in the render target (i.e.  gl_SamplePosition) when per-fragment shading
3319is in effect.  Position values are in the range [0, 1] where 0.5 is
3320the center of the fragment.
3321
3322Note that if the fragment shader uses this system value, the fragment
3323shader is automatically executed at per sample frequency.
3324
3325TGSI_SEMANTIC_SAMPLEMASK
3326""""""""""""""""""""""""
3327
3328For fragment shaders, this semantic label can be applied to either a
3329shader system value input or output.
3330
3331For a system value, the sample mask indicates the set of samples covered by
3332the current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
3333
3334For an output, the sample mask is used to disable further sample processing.
3335
3336For both, the register type is uint[4] but only the X component is used
3337(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3338to 32x MSAA is supported).
3339
3340TGSI_SEMANTIC_INVOCATIONID
3341""""""""""""""""""""""""""
3342
3343For geometry shaders, this semantic label indicates that a system value
3344contains the current invocation id (i.e. gl_InvocationID).
3345This is an integer value, and only the X component is used.
3346
3347TGSI_SEMANTIC_INSTANCEID
3348""""""""""""""""""""""""
3349
3350For vertex shaders, this semantic label indicates that a system value contains
3351the current instance id (i.e. gl_InstanceID). It does not include the base
3352instance. This is an integer value, and only the X component is used.
3353
3354TGSI_SEMANTIC_VERTEXID
3355""""""""""""""""""""""
3356
3357For vertex shaders, this semantic label indicates that a system value contains
3358the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3359base vertex. This is an integer value, and only the X component is used.
3360
3361TGSI_SEMANTIC_VERTEXID_NOBASE
3362"""""""""""""""""""""""""""""""
3363
3364For vertex shaders, this semantic label indicates that a system value contains
3365the current vertex id without including the base vertex (this corresponds to
3366d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3367== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3368is used.
3369
3370TGSI_SEMANTIC_BASEVERTEX
3371""""""""""""""""""""""""
3372
3373For vertex shaders, this semantic label indicates that a system value contains
3374the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3375this contains the first (or start) value instead.
3376This is an integer value, and only the X component is used.
3377
3378TGSI_SEMANTIC_PRIMID
3379""""""""""""""""""""
3380
3381For geometry and fragment shaders, this semantic label indicates the value
3382contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3383and only the X component is used.
3384FIXME: This right now can be either a ordinary input or a system value...
3385
3386
3387TGSI_SEMANTIC_PATCH
3388"""""""""""""""""""
3389
3390For tessellation evaluation/control shaders, this semantic label indicates a
3391generic per-patch attribute. Such semantics will not implicitly be per-vertex
3392arrays.
3393
3394TGSI_SEMANTIC_TESSCOORD
3395"""""""""""""""""""""""
3396
3397For tessellation evaluation shaders, this semantic label indicates the
3398coordinates of the vertex being processed. This is available in XYZ; W is
3399undefined.
3400
3401TGSI_SEMANTIC_TESSOUTER
3402"""""""""""""""""""""""
3403
3404For tessellation evaluation/control shaders, this semantic label indicates the
3405outer tessellation levels of the patch. Isoline tessellation will only have XY
3406defined, triangle will have XYZ and quads will have XYZW defined. This
3407corresponds to gl_TessLevelOuter.
3408
3409TGSI_SEMANTIC_TESSINNER
3410"""""""""""""""""""""""
3411
3412For tessellation evaluation/control shaders, this semantic label indicates the
3413inner tessellation levels of the patch. The X value is only defined for
3414triangle tessellation, while quads will have XY defined. This is entirely
3415undefined for isoline tessellation.
3416
3417TGSI_SEMANTIC_VERTICESIN
3418""""""""""""""""""""""""
3419
3420For tessellation evaluation/control shaders, this semantic label indicates the
3421number of vertices provided in the input patch. Only the X value is defined.
3422
3423TGSI_SEMANTIC_HELPER_INVOCATION
3424"""""""""""""""""""""""""""""""
3425
3426For fragment shaders, this semantic indicates whether the current
3427invocation is covered or not. Helper invocations are created in order
3428to properly compute derivatives, however it may be desirable to skip
3429some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3430
3431TGSI_SEMANTIC_BASEINSTANCE
3432""""""""""""""""""""""""""
3433
3434For vertex shaders, the base instance argument supplied for this
3435draw. This is an integer value, and only the X component is used.
3436
3437TGSI_SEMANTIC_DRAWID
3438""""""""""""""""""""
3439
3440For vertex shaders, the zero-based index of the current draw in a
3441``glMultiDraw*`` invocation. This is an integer value, and only the X
3442component is used.
3443
3444
3445TGSI_SEMANTIC_WORK_DIM
3446""""""""""""""""""""""
3447
3448For compute shaders started via OpenCL this retrieves the work_dim
3449parameter to the clEnqueueNDRangeKernel call with which the shader
3450was started.
3451
3452
3453TGSI_SEMANTIC_GRID_SIZE
3454"""""""""""""""""""""""
3455
3456For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3457of a grid of thread blocks.
3458
3459
3460TGSI_SEMANTIC_BLOCK_ID
3461""""""""""""""""""""""
3462
3463For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3464current block inside of the grid.
3465
3466
3467TGSI_SEMANTIC_BLOCK_SIZE
3468""""""""""""""""""""""""
3469
3470For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3471of a block in threads.
3472
3473
3474TGSI_SEMANTIC_THREAD_ID
3475"""""""""""""""""""""""
3476
3477For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3478current thread inside of the block.
3479
3480
3481TGSI_SEMANTIC_SUBGROUP_SIZE
3482"""""""""""""""""""""""""""
3483
3484This semantic indicates the subgroup size for the current invocation. This is
3485an integer of at most 64, as it indicates the width of lanemasks. It does not
3486depend on the number of invocations that are active.
3487
3488
3489TGSI_SEMANTIC_SUBGROUP_INVOCATION
3490"""""""""""""""""""""""""""""""""
3491
3492The index of the current invocation within its subgroup.
3493
3494
3495TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3496""""""""""""""""""""""""""""""
3497
3498A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3499``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3500
3501
3502TGSI_SEMANTIC_SUBGROUP_GE_MASK
3503""""""""""""""""""""""""""""""
3504
3505A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3506``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3507in arbitrary precision arithmetic.
3508
3509
3510TGSI_SEMANTIC_SUBGROUP_GT_MASK
3511""""""""""""""""""""""""""""""
3512
3513A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3514``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3515in arbitrary precision arithmetic.
3516
3517
3518TGSI_SEMANTIC_SUBGROUP_LE_MASK
3519""""""""""""""""""""""""""""""
3520
3521A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3522``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3523
3524
3525TGSI_SEMANTIC_SUBGROUP_LT_MASK
3526""""""""""""""""""""""""""""""
3527
3528A bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3529``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3530
3531
3532TGSI_SEMANTIC_VIEWPORT_MASK
3533"""""""""""""""""""""""""""
3534
3535A bit mask of viewports to broadcast the current primitive to. See
3536GL_NV_viewport_array2 for more details.
3537
3538
3539TGSI_SEMANTIC_TESS_DEFAULT_OUTER_LEVEL
3540""""""""""""""""""""""""""""""""""""""
3541
3542A system value equal to the default_outer_level array set via set_tess_level.
3543
3544
3545TGSI_SEMANTIC_TESS_DEFAULT_INNER_LEVEL
3546""""""""""""""""""""""""""""""""""""""
3547
3548A system value equal to the default_inner_level array set via set_tess_level.
3549
3550
3551Declaration Interpolate
3552^^^^^^^^^^^^^^^^^^^^^^^
3553
3554This token is only valid for fragment shader INPUT declarations.
3555
3556The Interpolate field specifes the way input is being interpolated by
3557the rasteriser and is one of TGSI_INTERPOLATE_*.
3558
3559The Location field specifies the location inside the pixel that the
3560interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3561when per-sample shading is enabled, the implementation may choose to
3562interpolate at the sample irrespective of the Location field.
3563
3564
3565Declaration Sampler View
3566^^^^^^^^^^^^^^^^^^^^^^^^
3567
3568Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3569
3570DCL SVIEW[#], resource, type(s)
3571
3572Declares a shader input sampler view and assigns it to a SVIEW[#]
3573register.
3574
3575resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
3576
3577type must be 1 or 4 entries (if specifying on a per-component
3578level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3579
3580For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3581which take an explicit SVIEW[#] source register), there may be optionally
3582SVIEW[#] declarations.  In this case, the SVIEW index is implied by the
3583SAMP index, and there must be a corresponding SVIEW[#] declaration for
3584each SAMP[#] declaration.  Drivers are free to ignore this if they wish.
3585But note in particular that some drivers need to know the sampler type
3586(float/int/unsigned) in order to generate the correct code, so cases
3587where integer textures are sampled, SVIEW[#] declarations should be
3588used.
3589
3590NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3591in the same shader.
3592
3593Declaration Resource
3594^^^^^^^^^^^^^^^^^^^^
3595
3596Follows Declaration token if file is TGSI_FILE_RESOURCE.
3597
3598DCL RES[#], resource [, WR] [, RAW]
3599
3600Declares a shader input resource and assigns it to a RES[#]
3601register.
3602
3603resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
36042DArray.
3605
3606If the RAW keyword is not specified, the texture data will be
3607subject to conversion, swizzling and scaling as required to yield
3608the specified data type from the physical data format of the bound
3609resource.
3610
3611If the RAW keyword is specified, no channel conversion will be
3612performed: the values read for each of the channels (X,Y,Z,W) will
3613correspond to consecutive words in the same order and format
3614they're found in memory.  No element-to-address conversion will be
3615performed either: the value of the provided X coordinate will be
3616interpreted in byte units instead of texel units.  The result of
3617accessing a misaligned address is undefined.
3618
3619Usage of the STORE opcode is only allowed if the WR (writable) flag
3620is set.
3621
3622Hardware Atomic Register File
3623^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3624
3625Hardware atomics are declared as a 2D array with an optional array id.
3626
3627The first member of the dimension is the buffer resource the atomic
3628is located in.
3629The second member is a range into the buffer resource, either for
3630one or multiple counters. If this is an array, the declaration will have
3631an unique array id.
3632
3633Each counter is 4 bytes in size, and index and ranges are in counters not bytes.
3634DCL HWATOMIC[0][0]
3635DCL HWATOMIC[0][1]
3636
3637This declares two atomics, one at the start of the buffer and one in the
3638second 4 bytes.
3639
3640DCL HWATOMIC[0][0]
3641DCL HWATOMIC[1][0]
3642DCL HWATOMIC[1][1..3], ARRAY(1)
3643
3644This declares 5 atomics, one in buffer 0 at 0,
3645one in buffer 1 at 0, and an array of 3 atomics in
3646the buffer 1, starting at 1.
3647
3648Properties
3649^^^^^^^^^^^^^^^^^^^^^^^^
3650
3651Properties are general directives that apply to the whole TGSI program.
3652
3653FS_COORD_ORIGIN
3654"""""""""""""""
3655
3656Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3657The default value is UPPER_LEFT.
3658
3659If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3660increase downward and rightward.
3661If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3662increase upward and rightward.
3663
3664OpenGL defaults to LOWER_LEFT, and is configurable with the
3665GL_ARB_fragment_coord_conventions extension.
3666
3667DirectX 9/10 use UPPER_LEFT.
3668
3669FS_COORD_PIXEL_CENTER
3670"""""""""""""""""""""
3671
3672Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3673The default value is HALF_INTEGER.
3674
3675If HALF_INTEGER, the fractionary part of the position will be 0.5
3676If INTEGER, the fractionary part of the position will be 0.0
3677
3678Note that this does not affect the set of fragments generated by
3679rasterization, which is instead controlled by half_pixel_center in the
3680rasterizer.
3681
3682OpenGL defaults to HALF_INTEGER, and is configurable with the
3683GL_ARB_fragment_coord_conventions extension.
3684
3685DirectX 9 uses INTEGER.
3686DirectX 10 uses HALF_INTEGER.
3687
3688FS_COLOR0_WRITES_ALL_CBUFS
3689""""""""""""""""""""""""""
3690Specifies that writes to the fragment shader color 0 are replicated to all
3691bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3692fragData is directed to a single color buffer, but fragColor is broadcast.
3693
3694VS_PROHIBIT_UCPS
3695""""""""""""""""""""""""""
3696If this property is set on the program bound to the shader stage before the
3697fragment shader, user clip planes should have no effect (be disabled) even if
3698that shader does not write to any clip distance outputs and the rasterizer's
3699clip_plane_enable is non-zero.
3700This property is only supported by drivers that also support shader clip
3701distance outputs.
3702This is useful for APIs that don't have UCPs and where clip distances written
3703by a shader cannot be disabled.
3704
3705GS_INVOCATIONS
3706""""""""""""""
3707
3708Specifies the number of times a geometry shader should be executed for each
3709input primitive. Each invocation will have a different
3710TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3711be 1.
3712
3713VS_WINDOW_SPACE_POSITION
3714""""""""""""""""""""""""""
3715If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3716is assumed to contain window space coordinates.
3717Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3718directly taken from the 4-th component of the shader output.
3719Naturally, clipping is not performed on window coordinates either.
3720The effect of this property is undefined if a geometry or tessellation shader
3721are in use.
3722
3723TCS_VERTICES_OUT
3724""""""""""""""""
3725
3726The number of vertices written by the tessellation control shader. This
3727effectively defines the patch input size of the tessellation evaluation shader
3728as well.
3729
3730TES_PRIM_MODE
3731"""""""""""""
3732
3733This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
3734``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
3735separate isolines settings, the regular lines is assumed to mean isolines.)
3736
3737TES_SPACING
3738"""""""""""
3739
3740This sets the spacing mode of the tessellation generator, one of
3741``PIPE_TESS_SPACING_*``.
3742
3743TES_VERTEX_ORDER_CW
3744"""""""""""""""""""
3745
3746This sets the vertex order to be clockwise if the value is 1, or
3747counter-clockwise if set to 0.
3748
3749TES_POINT_MODE
3750""""""""""""""
3751
3752If set to a non-zero value, this turns on point mode for the tessellator,
3753which means that points will be generated instead of primitives.
3754
3755NUM_CLIPDIST_ENABLED
3756""""""""""""""""""""
3757
3758How many clip distance scalar outputs are enabled.
3759
3760NUM_CULLDIST_ENABLED
3761""""""""""""""""""""
3762
3763How many cull distance scalar outputs are enabled.
3764
3765FS_EARLY_DEPTH_STENCIL
3766""""""""""""""""""""""
3767
3768Whether depth test, stencil test, and occlusion query should run before
3769the fragment shader (regardless of fragment shader side effects). Corresponds
3770to GLSL early_fragment_tests.
3771
3772NEXT_SHADER
3773"""""""""""
3774
3775Which shader stage will MOST LIKELY follow after this shader when the shader
3776is bound. This is only a hint to the driver and doesn't have to be precise.
3777Only set for VS and TES.
3778
3779CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3780"""""""""""""""""""""""""""""""""""""
3781
3782Threads per block in each dimension, if known at compile time. If the block size
3783is known all three should be at least 1. If it is unknown they should all be set
3784to 0 or not set.
3785
3786MUL_ZERO_WINS
3787"""""""""""""
3788
3789The MUL TGSI operation (FP32 multiplication) will return 0 if either
3790of the operands are equal to 0. That means that 0 * Inf = 0. This
3791should be set the same way for an entire pipeline. Note that this
3792applies not only to the literal MUL TGSI opcode, but all FP32
3793multiplications implied by other operations, such as MAD, FMA, DP2,
3794DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3795mismatch between shaders, then it is unspecified whether this behavior
3796will be enabled.
3797
3798FS_POST_DEPTH_COVERAGE
3799""""""""""""""""""""""
3800
3801When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3802that have failed the depth/stencil tests. This is only valid when
3803FS_EARLY_DEPTH_STENCIL is also specified.
3804
3805LAYER_VIEWPORT_RELATIVE
3806"""""""""""""""""""""""
3807
3808When enabled, the TGSI_SEMATNIC_LAYER output value is relative to the
3809current viewport. This is especially useful in conjunction with
3810TGSI_SEMANTIC_VIEWPORT_MASK.
3811
3812
3813Texture Sampling and Texture Formats
3814------------------------------------
3815
3816This table shows how texture image components are returned as (x,y,z,w) tuples
3817by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3818:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3819well.
3820
3821+--------------------+--------------+--------------------+--------------+
3822| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
3823+====================+==============+====================+==============+
3824| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
3825+--------------------+--------------+--------------------+--------------+
3826| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
3827+--------------------+--------------+--------------------+--------------+
3828| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
3829+--------------------+--------------+--------------------+--------------+
3830| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
3831+--------------------+--------------+--------------------+--------------+
3832| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
3833+--------------------+--------------+--------------------+--------------+
3834| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
3835+--------------------+--------------+--------------------+--------------+
3836| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
3837+--------------------+--------------+--------------------+--------------+
3838| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
3839+--------------------+--------------+--------------------+--------------+
3840| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
3841|                    |              | [#envmap-bumpmap]_ |              |
3842+--------------------+--------------+--------------------+--------------+
3843| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
3844|                    |              | [#depth-tex-mode]_ |              |
3845+--------------------+--------------+--------------------+--------------+
3846| S                  | (s, s, s, s) | unknown            | unknown      |
3847+--------------------+--------------+--------------------+--------------+
3848
3849.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
3850.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3851   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
3852