#  HLMS: High Level Material System {#hlms}

This component allows you to manage shader variations of a specific shader template.
It is a different take on Uber shader management: instead of using plain
@c \#ifdefs it uses a custom, more powerful preprocessor language.

Additionally it allows you to define a set of abstract properties that are then used to
configure the shader generation.

Currently the only material implementation based on the HLMS is Physically Based Shading (PBS).
It does not read the classical Materials and therefore does not respect
the settings for fog, diffuse_color etc.

@attention This documentation was originally written for %Ogre 2.1, so not all details apply to the actual HLMS backport.

@tableofcontents

#  The three components {#components}

![](hlms_components.svg)

1.  Scripts. To set the material properties (e.g. type of Hlms to use:
    PBS, Toon shading, GUI; what textures, diffuse colour,
    roughness, etc). **You currently have to do this from C++.** Everybody
    will be using this part.

2.  Shader template. The Hlms takes a couple of hand-written glsl/hlsl
    files as a template and then adapts them to fit the needs on the
    fly (e.g. if the mesh doesn’t contain a skeleton, the bit of code
    pertaining to skeletal animation is stripped from the
    vertex shader). The Hlms provides a simple preprocessor to deal with
    this entirely from within the template, but you’re not forced to
    use it. The preprocessor keywords are covered in the sections
    below. Advanced users will probably
    want to modify these files (or write some of their own) to fit their
    custom needs.

3.  C++ classes implementation. The C++ takes care of picking the shader
    templates and manipulating them before compiling; and most
    importantly it feeds the shaders with uniform/constant data and sets
    the textures that are in use. It is extremely flexible,
    powerful, efficient and scalable, but it’s harder to use than good
    ol’ Materials because those used to be data-driven: there are no
    AutoParamsSource here. Want the view matrix? You better grab it from
    the camera when the scene pass is about to start, and then pass it
    yourself to the shader. This is very powerful, because in D3D11/GL3+
    you can set the uniform buffer with the view matrix just once
    for the entire frame, and thus have multiple uniform buffers sorted
    by update frequency. Very advanced users will be working with
    this part.

@note Material scripts in Ogre 1.x do not yet support
the HLMS - you must use the C++ API, e.g. Ogre::PbsMaterial.

Based on your skillset and needs, you can pick which parts you
want to work with. Most users will just use the scripts to define
materials, advanced users will change the template, and very advanced
users who need something entirely different will change all three.

For example the PBS material has its own C++ implementation and its own set of shader templates.
The Toon Shading has its own C++ implementation and set of shaders.

It is theoretically possible to implement both Toon & PBS in the same
C++ module, but that would be crazy, hard to maintain and not very
modular.

#  Material parameters are stored in “Blocks” {#data}

You could be thinking the reason I came up with Macroblocks and
Blendblocks is to fit with D3D11's grand scheme of things while being
compatible with OpenGL. But that’s a half truth and an awesome side
effect. I’ve been developing the Hlms using OpenGL this whole time.

An OpenGL fan will tell you that grouping these states together in a
single call, like D3D11 did, barely reduces API overhead in practice (as
long as you keep sorting by state), and they’re right about that.

However, there are big advantages to using blocks:

1.  Many materials in practice share the same Macro- &
    Blendblock parameters. In an age where we want many 3D primitives
    with the same shader but slightly different parameters like texture,
    colour, or roughness (which amounts to a different material), having
    these settings repeated per material wastes a lot of memory space…
    and a lot of bandwidth (and wastes cache space). Ogre 2.0 is
    bandwidth bound, so having all materials share the same pointer to
    the same Macroblock can potentially save a lot of bandwidth, and be
    friendlier to the cache at the same time. This stays true whether we
    use D3D11, D3D12, OpenGL, GL ES 2, or Mantle.

2.  Sorting by Macroblock is a lot easier (and faster) than sorting by
    its individual parameters: when preparing the hash used for sorting,
    it’s much easier to just do (every frame, per object)
    `hash |= (macroblock->getId() << bits) & mask` than to do:
    `hash |= m->depth_check | m->depthWrite << 1 | m->depthBias << 2 | m->depth_slope_bias << 3 | m->cullMode << 18 | ... ;`
    The latter also needs a lot more bits than we can afford. Ogre
    2.0 imposes a limit on the number of live Macroblocks you can have
    at the same time, as we would otherwise run out of hashing space (by
    the way, D3D11 has its own limit). It operates around the idea that
    most setting combinations won’t be used in practice.

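The sort-key argument above can be sketched in a few lines. This is only an illustration: the bit layout, the constant names and the `makeSortKey` helper are assumptions made for the example, not Ogre's actual hash layout.

```cpp
#include <cstdint>

// Hypothetical layout of a 32-bit sort key (illustrative, not Ogre's real layout).
constexpr std::uint32_t kMacroblockBits  = 10u; // room for 1024 live macroblocks
constexpr std::uint32_t kMacroblockShift = 22u; // macroblock id occupies the top bits
constexpr std::uint32_t kMacroblockMask =
    ( ( 1u << kMacroblockBits ) - 1u ) << kMacroblockShift;

struct Macroblock
{
    std::uint32_t id; // small unique index, assigned when the block is created
    bool depthCheck;  // the individual settings stay in the block,
    bool depthWrite;  // they never need to be packed into the key
};

// Packs the macroblock id into the key. `lowBits` stands for whatever else
// the key sorts by (an assumption of this sketch).
inline std::uint32_t makeSortKey( const Macroblock &mb, std::uint32_t lowBits )
{
    std::uint32_t hash = lowBits & ~kMacroblockMask;
    hash |= ( mb.id << kMacroblockShift ) & kMacroblockMask;
    return hash;
}

// Recovers the macroblock id from a key built by makeSortKey.
inline std::uint32_t macroblockIdFromKey( std::uint32_t key )
{
    return ( key & kMacroblockMask ) >> kMacroblockShift;
}
```

With this layout the limit on live Macroblocks follows directly from `kMacroblockBits`: an id that does not fit in the reserved bits would be truncated by the mask.
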
Of course it’s not perfect, it can’t fit every use case. We inherit the
same problems D3D11 has. If a particular rendering technique relies on
regularly changing a property that lives in a Macroblock (e.g.
alternating the depth comparison function between less & greater with every
draw call, or gradually incrementing the depth bias on each draw call),
you’ll end up redundantly changing a lot of other states (culling mode,
polygon mode, depth check & write flags, depth bias) alongside it. This
is rare. We’re aiming for the general use case.

These problems make me wonder if D3D11 made the right choice of using
blocks from an API perspective, since I’m not used to driver
development. However from an engine perspective, blocks make sense.

## Datablocks {#toc52}

We’re introducing the concept of Datablocks.
A Datablock is a “material” from the user’s perspective.
It holds data (i.e. material properties) that will be
passed directly to the shaders.

![](hlms_blocks.svg)

The diagram shows a typical layout of a datablock.
Samplerblocks do not live inside the base Ogre::HlmsDatablock, but rather in its
derived implementation. This is because some implementations may not
need textures at all, and the number of samplerblocks is unknown. Some
implementations may want one samplerblock per texture, whereas others
may just need one.

@note Macroblocks and Blendblocks are not available in 1.x - use Ogre::Pass::setDepthCheckEnabled etc. as usual to change the respective properties.

# Hlms templates {#toc69}

The Hlms will parse the template files from the template folder
according to the following rules:

1.  The files with the names "VertexShader_vs", "PixelShader_ps",
    "GeometryShader_gs", "HullShader_hs", "DomainShader_ds" will be
    fully parsed and compiled into the shader. If an implementation only
    provides "VertexShader_vs.glslt" and "PixelShader_ps.glslt", only the
    vertex and pixel shaders for OpenGL will be created. There will be
    no geometry or tessellation shaders.

2.  The files that contain the string "_piece_vs" in their filenames
    will be parsed only for collecting pieces (more on pieces later).
    Likewise, the words "_piece_ps", "_piece_gs", "_piece_hs",
    "_piece_ds" correspond to the pieces for their respective
    shader stages. Note that you can concatenate, thus
    "MyUtilities_piece_vs_piece_ps.glslt" will be collected both in
    the vertex and pixel shader stages.

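The piece-file naming rule can be illustrated with a small stand-alone function. It is a sketch of the rule as stated above, not the Hlms implementation; the `ShaderStage` enum and `pieceStagesFor` are names invented for this example.

```cpp
#include <string>
#include <vector>

// Illustrative stage enum; the order matches the suffix table below.
enum ShaderStage { VertexStage, PixelStage, GeometryStage, HullStage, DomainStage };

// Returns the stages that would collect pieces from `filename`, based purely
// on the "_piece_xx" substrings found in the name.
inline std::vector<ShaderStage> pieceStagesFor( const std::string &filename )
{
    static const char *suffixes[] = { "_piece_vs", "_piece_ps", "_piece_gs",
                                      "_piece_hs", "_piece_ds" };
    std::vector<ShaderStage> stages;
    for( int i = 0; i < 5; ++i )
    {
        if( filename.find( suffixes[i] ) != std::string::npos )
            stages.push_back( static_cast<ShaderStage>( i ) );
    }
    return stages;
}
```

For "MyUtilities_piece_vs_piece_ps.glslt" this returns the vertex and pixel stages, while a full template such as "VertexShader_vs.glslt" yields an empty list because it is not a piece file.
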
The Hlms takes a template file (i.e. a file written in GLSL or HLSL) and
spits out valid shader code. Templates can take advantage of the Hlms'
preprocessor, which is a simple yet powerful macro-like preprocessor
that helps with writing the required code.

##  The Hlms preprocessor {#preproc}

The preprocessor was written with speed and simplicity in mind. It does
not implement an AST or anything fancy. This is very important to
keep in mind while writing templates because there will be cases when using
the preprocessor may feel counter-intuitive or frustrating.

For example
```cpp
  \@property( IncludeLighting )

  /* code here */

  @end
```

is analogous to
```cpp
  #if IncludeLighting != 0

  /* code here */

  #endif
```

However you can't evaluate IncludeLighting to anything other than zero
and non-zero, i.e. you can't check whether IncludeLighting == 2 with the
Hlms preprocessor. A simple workaround is to define, from C++, the
variable “IncludeLightingEquals2” and check whether it's non-zero.
Another solution is to use the GLSL/HLSL preprocessor itself instead of
Hlms'. However, the advantage of Hlms is that you can see its generated
output in a file for inspection, whereas you can't see the GLSL/HLSL
after the macro preprocessor without vendor-specific tools. Plus, in the
case of GLSL, you'll depend on the driver implementation having a good
macro preprocessor.

##  Preprocessor syntax {#syntax}

A preprocessor command always starts with \@ followed by the command name, and often
by arguments inside parentheses. Note that the preprocessor is always
case-sensitive. The following keywords are recognized:

-   \@property

-   \@foreach

-   \@counter

-   \@value

-   \@set add sub mul div mod min max

-   \@piece

-   \@insertpiece

-   \@pset padd psub pmul pdiv pmod pmin pmax

###  \@property( expression )

Checks whether the expression is true; if so, the text
inside the block is printed. Must be finalized with \@end. The expression
is case-sensitive. When a variable hasn't been declared, it evaluates
to false.

The logical operators && || ! are valid.

Examples:
```cpp
  \@property( hlms_skeleton )

  //Skeleton animation code here

  @end

  \@property( hlms_skeleton && !hlms_normal )

  //Print this code if it has skeleton animation but no normals

  @end

  \@property( hlms_normal || hlms_tangent )

  //Print this code if it has normals or tangents

  @end

  \@property( hlms_normal && (!hlms_skeleton || hlms_tangent) )

  //Print this code if it has normals and either no skeleton or tangents

  @end
```

It is very similar to \#if hlms_skeleton != 0 \#endif; however there is
no equivalent \#else or \#elif syntax. As a simple workaround you can
do:
```cpp
  \@property( hlms_skeleton )

  //Skeleton animation code here

  @end \@property( !hlms_skeleton )

  //Non-Skeleton code here

  @end
```

Newlines are not necessary. The following is perfectly valid:
```
  diffuse = surfaceDiffuse \@property( hasLights )* lightDiffuse@end ;
```

Which will print:
```
  hasLights != 0                              hasLights == 0
  diffuse = surfaceDiffuse * lightDiffuse;   diffuse = surfaceDiffuse ;
```

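To make the evaluation rules concrete, here is a minimal stand-alone sketch of an evaluator for such expressions. It is not the Hlms parser: the `PropertyExpr` class and `evalProperty` helper are invented for this example, and it assumes C-like precedence (`!` binds tighter than `&&`, which binds tighter than `||`). Undeclared variables evaluate to false and any non-zero value counts as true, as described above.

```cpp
#include <cctype>
#include <map>
#include <string>

// Tiny recursive-descent evaluator for expressions such as
// "hlms_normal && (!hlms_skeleton || hlms_tangent)". Illustrative only.
class PropertyExpr
{
public:
    PropertyExpr( const std::string &text, const std::map<std::string, int> &props ) :
        mText( text ), mPos( 0 ), mProps( props ) {}

    bool evaluate() { return parseOr(); }

private:
    void skipSpaces()
    {
        while( mPos < mText.size() && std::isspace( static_cast<unsigned char>( mText[mPos] ) ) )
            ++mPos;
    }
    // Consumes `token` if it comes next (after optional spaces).
    bool consume( const std::string &token )
    {
        skipSpaces();
        if( mText.compare( mPos, token.size(), token ) != 0 )
            return false;
        mPos += token.size();
        return true;
    }
    bool parseOr()
    {
        bool result = parseAnd();
        while( consume( "||" ) )
        {
            const bool rhs = parseAnd(); // always parse the right-hand side
            result = result || rhs;
        }
        return result;
    }
    bool parseAnd()
    {
        bool result = parseFactor();
        while( consume( "&&" ) )
        {
            const bool rhs = parseFactor();
            result = result && rhs;
        }
        return result;
    }
    bool parseFactor()
    {
        if( consume( "!" ) )
            return !parseFactor();
        if( consume( "(" ) )
        {
            const bool inner = parseOr();
            consume( ")" );
            return inner;
        }
        std::string name;
        while( mPos < mText.size() &&
               ( std::isalnum( static_cast<unsigned char>( mText[mPos] ) ) || mText[mPos] == '_' ) )
            name += mText[mPos++];
        // Undeclared variables evaluate to false; declared ones are true when non-zero.
        const std::map<std::string, int>::const_iterator it = mProps.find( name );
        return it != mProps.end() && it->second != 0;
    }

    std::string mText;
    std::string::size_type mPos;
    const std::map<std::string, int> &mProps;
};

inline bool evalProperty( const std::string &expr, const std::map<std::string, int> &props )
{
    return PropertyExpr( expr, props ).evaluate();
}
```

Note that the evaluator only ever distinguishes zero from non-zero, which is why a check like IncludeLighting == 2 has to be expressed through a separate variable.
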
###  \@foreach( count, scopedVar, \[start\] )

Loop that prints the text inside the block. The text is repeated count -
start times. Must be finalized with \@end.

-   scopedVar is a variable that can be used to print the current
    iteration of the loop while inside the block. i.e. “\@scopedVar” will
    be converted into a number in the range \[start; count)

-   count The number of times to repeat the loop (if start = 0). Count
    can read variables.

-   start Optional. Allows starting from a value other than 0. Start
    can read variables.

Newlines are very important, as they will be printed with the loop.

Examples:

|  Expression    |         Output |
|----------------|----------------|
|  \@foreach( 4, n ) <br>&emsp; \@n\@end  | <br>0<br>1<br>2<br>3|
|  \@foreach( 4, n ) \@n\@end             |   0 1 2 3 |
|  \@foreach( 4, n )<br>&emsp;\@n<br>\@end        |  <br>0<br><br>1<br><br>2<br><br>3<br> |
|  \@foreach( 4, n, 2 ) \@n\@end  |            2 3 |
| \@pset( myStartVar, 1 )<br>\@pset( myCountVar, 3 )<br>\@foreach( myCountVar, n, myStartVar )<br>&emsp;\@n\@end         |          1<br>2 |
|  \@foreach( 2, n )<br>&emsp;\@insertpiece( pieceName\@n )\@end | \@insertpiece( pieceName0 )<br>        \@insertpiece( pieceName1 ) |

> **Attention \#1!**
>
>  Don't use the common letter i for the loop counter. It will conflict with other keywords.
>
>  i.e. “\@foreach( 1, i )\@insertpiece( pieceName )\@end” will print “0nsertpiece( pieceName )” which is probably not what you intended.
>
>  **Attention \#2!**
>
>  foreach is parsed after property math (pset, padd, etc). That means that driving each iteration through a combination of properties and padd functions will not work as you would expect.
>
>  i.e. The following code will not work:
>
> ```cpp
>    @pset( myVar, 1 )
>
>    @foreach( 2, n )
>
>    //Code
>
>    @psub( myVar, 1 ) //Decrement myVar on each loop
>
>    \@property( myVar )
>
>    //Code that shouldn't be printed in the last iteration
>
>    @end
>
>    @end
>```
>
> Because psub will be evaluated before expanding the foreach.

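The loop behaves like plain text substitution, which is also why the letter i is a poor choice of loop variable. The following sketch models that behaviour; `replaceAll` and `expandForeach` are hypothetical helpers written for this example and assume the expansion is a naive search-and-replace of “\@scopedVar”.

```cpp
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
inline std::string replaceAll( std::string text, const std::string &from, const std::string &to )
{
    std::string::size_type pos = 0;
    while( ( pos = text.find( from, pos ) ) != std::string::npos )
    {
        text.replace( pos, from.size(), to );
        pos += to.size();
    }
    return text;
}

// Models the body expansion of a foreach block: the body is repeated for each
// value in [start; count), substituting "@<scopedVar>" with the current value.
inline std::string expandForeach( const std::string &body, const std::string &scopedVar,
                                  int count, int start = 0 )
{
    std::string out;
    for( int value = start; value < count; ++value )
        out += replaceAll( body, "@" + scopedVar, std::to_string( value ) );
    return out;
}
```

Calling `expandForeach( "@insertpiece( pieceName )", "i", 1 )` reproduces the pitfall from Attention \#1: the “\@i” at the start of “\@insertpiece” is replaced by the loop value.
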
###  \@counter( variable )

Prints the current value of variable and increments it by 1. If the
variable hasn't been declared yet, it is initialized to 0.

Examples:
```
  Expression          Output

  @counter( myVar )   0

  @counter( myVar )   1

  @counter( myVar )   2
```

### \@value( variable )

Prints the current value of variable without incrementing it. If the
variable hasn't been declared, prints 0.
```cpp
  Expression          Output

  @value( myVar )     0

  @value( myVar )     0

  @counter( myVar )   0

  @value( myVar )     1

  @value( myVar )     1
```

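The difference between the two keywords is that of a post-increment versus a plain read. A minimal model, with a `PropertyTable` struct invented for this example:

```cpp
#include <map>
#include <string>

// Illustrative stand-in for the preprocessor's variable storage.
struct PropertyTable
{
    std::map<std::string, int> values;

    // Models value( name ): returns the current value, 0 if undeclared.
    int value( const std::string &name ) const
    {
        const std::map<std::string, int>::const_iterator it = values.find( name );
        return it == values.end() ? 0 : it->second;
    }

    // Models counter( name ): returns the current value, then increments it.
    // operator[] value-initializes a missing entry to 0.
    int counter( const std::string &name )
    {
        return values[name]++;
    }
};
```
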
### \@set add sub mul div mod min max

Sets a variable to a given value, adds, subtracts, multiplies, divides,
calculates modulus, or the minimum/maximum of a variable and a constant,
or two variables. This family of functions gets evaluated after
foreach(s) have been expanded and pieces have been inserted. It doesn't
print its value.

Arguments can be in the form \@add( a, b ) meaning a += b; or in the form
\@add( a, b, c ) meaning a = b + c.

Useful in combination with \@counter and \@value.

|  Expression     |        Output |  Math |
|-----------------|---------------|-------|
|  \@set( myVar, 1 ) <br> \@value( myVar ) |       1     |   myVar = 1 |
|  \@add( myVar, 5 )<br> \@value( myVar )   |    6  |      myVar = 1 + 5|
|  \@div( myVar, 2 ) <br> \@value( myVar ) |     3   |     myVar = 6 / 2|
|  \@mul( myVar, myVar )<br> \@value( myVar ) |  9   |     myVar = 3 * 3|
|  \@mod( myVar, 5 ) <br> \@value( myVar )    |  4   |     myVar = 9 % 5|
|  \@add( myVar, 1, 1 ) <br> \@value( myVar ) |  2  |       myVar = 1 + 1|

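The two argument forms can be modelled as a pair of overloads. This is a sketch with invented names (`MathOp`, `applyOp`, `mathOp`); it takes already-resolved integer operands, whereas the real keywords also accept variable names as operands.

```cpp
#include <algorithm>
#include <map>
#include <string>

typedef std::map<std::string, int> Properties;

enum MathOp { OpAdd, OpSub, OpMul, OpDiv, OpMod, OpMin, OpMax };

inline int applyOp( MathOp op, int lhs, int rhs )
{
    switch( op )
    {
    case OpAdd: return lhs + rhs;
    case OpSub: return lhs - rhs;
    case OpMul: return lhs * rhs;
    case OpDiv: return lhs / rhs; // integer division; rhs must be non-zero
    case OpMod: return lhs % rhs; // rhs must be non-zero
    case OpMin: return std::min( lhs, rhs );
    case OpMax: return std::max( lhs, rhs );
    }
    return lhs;
}

// Two-argument form: op( a, b ) meaning a = a op b.
inline void mathOp( Properties &props, MathOp op, const std::string &dst, int b )
{
    props[dst] = applyOp( op, props[dst], b );
}

// Three-argument form: op( a, b, c ) meaning a = b op c.
inline void mathOp( Properties &props, MathOp op, const std::string &dst, int b, int c )
{
    props[dst] = applyOp( op, b, c );
}
```

Replaying the rows of the table above through `mathOp` yields 6, 3, 9, 4 and finally 2 for myVar.
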
###  \@piece( nameOfPiece )

Saves all the text inside the block as a named piece. If a
piece with the given name already exists, a compiler error will be
thrown. The text that was inside the block won't be printed. Useful
in combination with \@insertpiece. Pieces can also be defined from C++ or
[*collected*](#toc69) from piece template files.

Example:
```cpp
  Expression                        Output

  @piece( VertexTransform )

  outPos = worldViewProj * inPos

  @end
```

###  \@insertpiece( nameOfPiece )

Prints a block of text that was previously saved with \@piece (or from
C++). If no piece with such a name exists, prints nothing.

Example:
```
  Expression                                                     Output

  @piece( VertexTransform )outPos = worldViewProj * inPos@end   void main()

  void main()                                                    {

  {                                                              outPos = worldViewProj * inPos

  @insertpiece( VertexTransform )                                }

  @insertpiece( InexistentPiece )

  }
```

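The insertion step can be sketched as a lookup into a name-to-text map. The `PiecesMap` typedef and `insertPieces` function below are written for this example (they are not the Hlms classes), and the map is assumed to have been filled beforehand, whether from piece blocks or from C++.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> PiecesMap;

// Replaces each "@insertpiece( name )" with the stored piece text,
// or with nothing when no piece with that name exists.
inline std::string insertPieces( const std::string &text, const PiecesMap &pieces )
{
    const std::string keyword = "@insertpiece(";
    std::string out;
    std::string::size_type pos = 0;
    while( true )
    {
        const std::string::size_type start = text.find( keyword, pos );
        if( start == std::string::npos )
            break;
        const std::string::size_type close = text.find( ')', start );
        if( close == std::string::npos )
            break; // malformed input: leave the remainder untouched

        out.append( text, pos, start - pos ); // text before the keyword

        // Extract the piece name and trim surrounding spaces.
        std::string name = text.substr( start + keyword.size(), close - start - keyword.size() );
        const std::string::size_type first = name.find_first_not_of( ' ' );
        const std::string::size_type last = name.find_last_not_of( ' ' );
        name = first == std::string::npos ? std::string() : name.substr( first, last - first + 1 );

        const PiecesMap::const_iterator it = pieces.find( name );
        if( it != pieces.end() )
            out += it->second;
        pos = close + 1;
    }
    out.append( text, pos, std::string::npos ); // remaining text
    return out;
}
```
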
###  \@pset padd psub pmul pdiv pmod pmin pmax

Analogous to [*the family of math functions without the 'p'
prefix*](#toc304). The difference is that the math is evaluated before
anything else. There is not much use for these functions, except perhaps
for quickly testing whether a given flag/variable is being properly set
from C++ without having to recompile.

i.e. If you suspect hlms_normal is never being set, try \@pset(
hlms_normal, 1 )

One important detail worth mentioning is that variables retain their
values across shader stages. First the vertex shader template is parsed,
then the pixel shader one. If 'myVal' is 0 and the vertex shader
contains \@counter( myVal ); when the pixel shader is parsed \@value(
myVal ) will return 1, not 0.

If you need to reset these variables across shader stages, you can use
\@pset( myVal, 0 ), which is guaranteed to reset your variable to 0 before
anything else happens, even if the pset is stored in a piece file.

#  Creation of shaders {#shaders}

There are two components that need to be evaluated, since either may
affect the shader itself and require it to be recompiled:

1.  The Datablock/Material. Does it have Normal maps? Then include code
    to sample the normal map and affect the lighting calculations. Does
    it have a diffuse map? If not, avoid sampling the diffuse map and
    multiplying it against the diffuse colour, etc.

2.  The Mesh. Is it skeletally animated? Then include skeletal
    animation code. How many blend weights? Modify the skeletal
    animation code appropriately. It doesn't have tangents? Then skip the
    normal map defined in the material. And so on.

When calling Ogre::SceneManager::_renderScene, what happens is that
Ogre::ShaderManager::getGpuProgram will get called and this function evaluates both
the mesh and datablock compatibility.

If they're compatible, all the variables (aka properties) and pieces are
generated and cached in a structure (mShaderCache) with a hash key
to this cache entry. If a different pair of datablock-mesh ends up
having the same properties and pieces, they will get the same hash (and
share the same shader).

The following graph summarizes the process:

![](hlms_hash.svg)

Later on during rendering, at the start of each render pass, a similar
process is done, which ends up generating a “[*pass hash*](#toc567)”
instead of a renderable hash. Pass data stores settings like number of
shadow casting lights, number of lights per type (directional, point,
spot).

While iterating each renderable for render, the hash key is read from
the Renderable and merged with the pass' hash. With the merged hash, the
shader is retrieved from a cache. If it's not in the cache, the shader
will be generated and compiled by merging the cached data (pieces and
variables) from the Renderable and the Pass. The following graph
illustrates the process:

![](hlms_caching.svg)

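The lookup by merged hash can be sketched as follows. This is a simplified model, not the Ogre implementation: `ShaderCache` and `CompiledShader` are invented for this example, the two hashes are merged by using the pair as the cache key, and the "compile" step is a placeholder that only records which hashes produced the entry.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Placeholder for a compiled program.
struct CompiledShader
{
    std::string source;
};

class ShaderCache
{
public:
    ShaderCache() : mCompileCount( 0 ) {}

    // Returns the shader for this renderable/pass combination, generating it
    // only on a cache miss.
    const CompiledShader &get( std::uint32_t renderableHash, std::uint32_t passHash )
    {
        const Key key( renderableHash, passHash );
        Cache::iterator it = mCache.find( key );
        if( it == mCache.end() )
        {
            // Cache miss: a real implementation would merge the cached pieces and
            // properties of both sides, run the template through the preprocessor
            // and compile the result.
            CompiledShader shader;
            shader.source = "// renderable " + std::to_string( renderableHash ) +
                            ", pass " + std::to_string( passHash );
            ++mCompileCount;
            it = mCache.insert( std::make_pair( key, shader ) ).first;
        }
        return it->second;
    }

    int compileCount() const { return mCompileCount; }

private:
    typedef std::pair<std::uint32_t, std::uint32_t> Key;
    typedef std::map<Key, CompiledShader> Cache;

    Cache mCache;
    int mCompileCount;
};
```

Two renderables with the same renderable hash share a cache entry within a pass, while a pass with different settings (for example a different number of lights) produces a new entry.
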
#  C++ interaction with shader templates {#cpp}

Note: This section is relevant to those seeking to write their own Hlms
implementation.

C++ can use Ogre::HlmsMaterialBase::getPropertyMap().setProperty( "key", value ) to set “key” to the given
value. This value can be read by \@property, \@foreach,
\@add/sub/mul/div/mod, \@counter, \@value and \@padd/psub/pmul/pdiv/pmod.

To create pieces (or read them) you need to pass your custom
Hlms::PiecesMap to Hlms::addRenderableCache.

The recommended place to do this is in Hlms::calculateHashForPreCreate
and Hlms::calculateHashForPreCaster. Both are virtual. The former gets
called right before adding the set of properties, pieces and hash to the
cache, while the latter happens right before adding the similar set for
the shadow caster pass.

In those two functions you get the chance to call setProperty to set
your own variables and add your own pieces.

Another option is to override Hlms::calculateHashFor, which gives you
more control, but you'll have to do some of the work the base class does.

For some particularly complex features, the Hlms preprocessor may not be
enough, or the feature may be too difficult or just impossible to
implement with it; in that case you can generate the string from C++ and
send it as a piece. The template shader can insert it using \@insertpiece.

The function Hlms::createShaderCacheEntry is mainly responsible for
generating the shaders and parsing the template through the Hlms
preprocessor. If you override it, you can ignore pieces and properties;
basically override the entire Hlms system and provide the source for the
shaders yourself.

##  Common conventions

Properties starting with the 'hlms_' prefix are common to all or most Hlms
implementations. i.e. 'hlms_skeleton' is set to 1 when a skeleton is
present and hardware skinning should be performed.

Save properties' IdStrings (hashed strings) into constants as a performance
optimization. Ideally the compiler should perform the constant
propagation itself and this shouldn't be needed, but this often isn't the case.

For mobile, avoid mat4 and do the math yourself. As for 4x3 matrices
(i.e. skinning), perform the math manually as many GLES2 drivers have
issues compiling valid glsl code.

Properties in underscore\_case are set from C++; properties in
camelCase are set from the template.

Properties and pieces starting with 'custom\_' are for user
customizations of the template.

TBD

##  Disabling a stage

By default if a template isn't present, the shader stage won't be
created. e.g. if there is no GeometryShader\_gs.glsl file, no geometry
shader will be created. However there are times where you want to use a
template but only use this stage in particular scenarios (e.g. toggled
by a material parameter, disable it for shadow mapping, etc.). In this
case, set the property hlms_disable\_stage to non-zero from within the
template (e.g. using \@set). The value of this property is reset to 0
for every stage.

Note that even when disabled, the Hlms template will be fully parsed and
dumped to disk; and any modification you perform to the Hlms properties
will be carried over to the next stages. Setting hlms_disable\_stage is
not an early out or an abort.

#  Customization {#customization}

In many cases, users may want to slightly customize the shaders to
achieve a particular look, implement a specific feature, or solve a
unique problem; without having to rewrite the whole implementation.

Maximum flexibility can be obtained by directly modifying the original source
code. However this isn't modular, making it difficult to merge when the
original source code has changed. Most of the customizations don't
require such an intrusive approach.

Note: For performance reasons, the listener interface does not allow you
to add customizations that work per Renderable, as that loop is
performance sensitive. The only listener callback that works inside
Hlms::fillBuffersFor is hlmsTypeChanged which only gets evaluated when
the previous Renderable used a different Hlms implementation; which is
rare, and since we sort the RenderQueue, it often branch predicts well.

There are different levels in which an Hlms implementation can be
customized:

1.  Using a library, see [*Hlms Initialization*](#toc574). Pass a set
    of piece files in a folder by pushing the folder to ArchiveVec. The
    files in that folder will be parsed first, in order (archiveVec\[0\]
    then archiveVec\[1\], … archiveVec\[N-1\]); which will let you
    define your own pieces to insert code into the default template (see
    the table at the end). You can also use clever tricks to
    avoid dealing with C++ code at all even if there are no 'custom\_'
    pieces for it. For example, you can write the following code to
    override the BRDF declarations and provide a custom BRDF:
```cpp
  //Disable all known BRDFs that the implementation may enable

  @pset( BRDF_CookTorrance, 0 )

  @pset( BRDF_Default, 0 )

  @piece( DeclareBRDF )

  // Your BRDF code declaration here

  @end
```

2.  Via listener, through HlmsListener. This allows you to have access
    to the buffer pass to fill extra information; or bind extra buffers
    to the shader.

3.  Override HlmsPbs. Useful for overriding only specific parts, or
    adding new functionality that requires storing extra information in
    a datablock (e.g. derive from HlmsPbsDatablock to add more variables,
    and then override HlmsPbs::createDatablockImpl to create these
    custom datablocks).

4.  Directly modify HlmsPbs, HlmsPbsDatablock and the template.

| Variable | Description |
|----------|-------------|
| custom_passBuffer            |  Piece where users can add extra information for the pass buffer (only useful if the user is using HlmsListener or an overridden HlmsPbs).|
| custom_VStoPS                |  Piece where users can add more interpolants for passing data from the vertex to the pixel shader.|
| custom_vs_attributes         |  Custom vertex shader attributes in the Vertex Shader (i.e. a special texcoord, etc).|
| custom_vs_uniformDeclaration |  Data declaration (textures, texture buffers, uniform buffers) in the Vertex Shader.|
| custom_vs_preExecution       |  Executed before Ogre's code from the Vertex Shader.|
| custom_vs_posExecution       |  Executed after all code from the Vertex Shader has been performed.|
| custom_ps_uniformDeclaration |  Same as custom_vs_uniformDeclaration, but for the Pixel Shader.|
| custom_ps_preExecution       |  Executed before Ogre's code from the Pixel Shader.|
| custom_ps_posMaterialLoad    |  Executed right after loading material data; and before anything else. May not get executed if there is no relevant material data (i.e. doesn't have normals or QTangents for lighting calculation).|
| custom_ps_preLights          |  Executed right before any light (i.e. to perform your own ambient / global illumination pass). All relevant texture data should be loaded by now.|
| custom_ps_posExecution       |  Executed after all code from the Pixel Shader has been performed.|