# HLMS: High Level Material System {#hlms}

This component allows you to manage shader variations of a specific shader template.
It is a different take on Uber shader management: instead of using plain
@c \#ifdefs it uses a custom, more powerful preprocessor language.

Additionally it allows you to define a set of abstract properties that are then used to
configure the shader generation.

Currently the only implementation based on the HLMS is the Physically Based Shading (PBS) material.
It does not read the classical Materials and therefore does not respect
the settings for fog, diffuse_color etc.

@attention This documentation was originally written for %Ogre 2.1, so not all details apply to the actual HLMS backport.

@tableofcontents

# The three components {#components}

![](hlms_components.svg)

1. Scripts. To set the material properties (e.g. type of Hlms to use:
   PBS, Toon shading, GUI; what textures, diffuse colour,
   roughness, etc). **You currently have to do this from C++.** Everybody
   will be using this part.

2. Shader template. The Hlms takes a couple of hand-written GLSL/HLSL
   files as templates and adapts them on the fly to fit the needs at
   hand (e.g. if the mesh doesn't contain a skeleton, the bit of code
   pertaining to skeletal animation is stripped from the
   vertex shader). The Hlms provides a simple preprocessor to deal with
   this entirely from within the template, but you're not forced to
   use it. The preprocessor and its keywords are described
   [below](#preproc). Advanced users will probably
   want to modify these files (or write some of their own) to fit their
   custom needs.

3. C++ classes implementation. The C++ takes care of picking the shader
   templates and manipulating them before compiling; and most
   importantly it feeds the shaders with uniform/constant data and sets
   the textures that are in use. It is extremely flexible,
   powerful, efficient and scalable, but it's harder to use than good
   ol' Materials because those used to be data-driven: there are no
   AutoParamsSource here. Want the view matrix? You better grab it from
   the camera when the scene pass is about to start, and then pass it
   yourself to the shader. This is very powerful, because in D3D11/GL3+
   you can set the uniform buffer with the view matrix just once
   for the entire frame, and thus have multiple uniform buffers sorted
   by update frequency. Very advanced users will be messing with
   this part.

@note Material scripts in Ogre 1.x do not yet support
the HLMS - you must use the C++ API, e.g. Ogre::PbsMaterial.

Based on your skillset and needs, you can pick which parts you
want to mess with. Most users will just use the scripts to define
materials, advanced users will change the template, and very advanced
users who need something entirely different will change all three.

For example the PBS material has its own C++ implementation and its own set of shader templates.
The Toon Shading has its own C++ implementation and set of shaders.

It is theoretically possible to implement both Toon & PBS in the same
C++ module, but that would be crazy, hard to maintain and not very
modular.

# Material parameters are stored in “Blocks” {#data}

You could be thinking the reason I came up with these blocks is to fit with
D3D11's grand scheme of things while being compatible with OpenGL. But
that's a half truth and an awesome side effect. I've been developing the
Hlms using OpenGL this whole time.

An OpenGL fan will tell you that grouping these together in a single call
like D3D11 did barely reduces API overhead in practice (as long as you
keep sorting by state), and they're right about that.

However, there are big advantages to using blocks:

1. Many materials in practice share the same Macro- &
   Blendblock parameters. In an age where we want many 3D primitives
   with the same shader but slightly different parameters like texture,
   colour, or roughness (which amounts to a different material), having
   these settings repeated per material wastes a lot of memory space…
   and a lot of bandwidth (and wastes cache space). Ogre 2.0 is
   bandwidth bound, so having all materials share the same pointer to
   the same Macroblock can potentially save a lot of bandwidth, and be
   friendlier to the cache at the same time. This stays true whether we
   use D3D11, D3D12, OpenGL, GL ES 2, or Mantle.

2. Sorting by Macroblock is a lot easier (and faster) than sorting by
   its individual parameters: when preparing the hash used for sorting,
   it's much easier to just do (every frame, per object)
   `hash |= (macroblock->getId() << bits) & mask` than to do:
   `hash |= m->depth_check | m->depthWrite << 1 | m->depthBias << 2 | m->depth_slope_bias << 3 | m->cullMode << 18 | ... ;`
   The latter would also need a lot more bits than we can afford. Ogre
   2.0 imposes a limit on the amount of live Macroblocks you can have
   at the same time, as we would otherwise run out of hashing space (by the way, D3D11
   has its own limit). It operates around the idea that most setting
   combinations won't be used in practice.

Of course it's not perfect; it can't fit every use case. We inherit the
same problems D3D11 has. If a particular rendering technique relies on
regularly changing a property that lives in a Macroblock (e.g.
alternating the depth comparison function between less & greater with every
draw call, or gradually incrementing the depth bias on each draw call),
you'll end up redundantly changing a lot of other states (culling mode,
polygon mode, depth check & write flags, depth bias) alongside it. This
is rare. We're aiming for the general use case.
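
The sorting argument above can be illustrated with a small C++ sketch. This is not Ogre's actual render queue code: `MacroblockSketch` and `packMacroblockId` are hypothetical names, and the bit positions are chosen arbitrarily. It only shows that a single small id can be packed into a fixed slice of the sort key, where packing every individual state would need far more bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for an immutable macroblock; only its id matters here.
struct MacroblockSketch {
    std::uint16_t id;
};

// Writes the lowest `numBits` bits of the macroblock id into `hash`,
// starting at bit position `shift`. Bits outside that slice are preserved.
// Assumes numBits < 64 and shift + numBits <= 64.
inline std::uint64_t packMacroblockId(std::uint64_t hash, const MacroblockSketch& mb,
                                      unsigned shift, unsigned numBits) {
    const std::uint64_t mask = ((std::uint64_t{1} << numBits) - 1u) << shift;
    return (hash & ~mask) | ((std::uint64_t{mb.id} << shift) & mask);
}
```

Because the slice has a fixed width, the number of distinct ids that fit is bounded, which is one way to see why a limit on live Macroblocks follows from this design.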

These problems make me wonder if D3D11 made the right choice of using
blocks from an API perspective, since I'm not used to driver
development. However, from an engine perspective, blocks make sense.

## Datablocks {#toc52}

We're introducing the concept of Datablocks.
A Datablock is a “material” from the user's perspective.
It holds data (e.g. material properties) that will be
passed directly to the shaders.

![](hlms_blocks.svg)

The diagram shows a typical layout of a datablock.
Samplerblocks do not live inside the base Ogre::HlmsDatablock, but rather in its
derived implementation. This is because some implementations may not
need textures at all, and the number of samplerblocks is unknown. Some
implementations may want one samplerblock per texture, whereas others
may just need one.

@note Macroblocks and Blendblocks are not available in 1.x - use Ogre::Pass::setDepthCheckEnabled etc. as usual to change the respective properties.

# Hlms templates {#toc69}

The Hlms will parse the template files from the template folder
according to the following rules:

1. The files with the names "VertexShader_vs", "PixelShader_ps",
   "GeometryShader_gs", "HullShader_hs", "DomainShader_ds" will be
   fully parsed and compiled into the shader. If an implementation only
   provides "VertexShader_vs.glslt" and "PixelShader_ps.glslt", only the
   vertex and pixel shaders for OpenGL will be created. There will be
   no geometry or tessellation shaders.

2. The files that contain the string "_piece_vs" in their filenames
   will be parsed only for collecting pieces (more on pieces later).
   Likewise, the strings "_piece_ps", "_piece_gs", "_piece_hs" and
   "_piece_ds" correspond to the pieces for their respective
   shader stages. Note that you can concatenate them, thus
   "MyUtilities_piece_vs_piece_ps.glslt" will be collected both in
   the vertex and pixel shader stages.
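
The piece file naming rule can be sketched in C++ as follows. This only mirrors the rule as stated above; it is not taken from the Ogre sources, and `pieceStagesFor` is a hypothetical helper name.

```cpp
#include <array>
#include <string>
#include <vector>

// Returns the shader stages ("vs", "ps", "gs", "hs", "ds") for which a
// template file would be collected as a piece file, judged purely by the
// "_piece_<stage>" substrings in its filename.
inline std::vector<std::string> pieceStagesFor(const std::string& filename) {
    static const std::array<const char*, 5> stages = {"vs", "ps", "gs", "hs", "ds"};
    std::vector<std::string> result;
    for (const char* stage : stages) {
        if (filename.find(std::string("_piece_") + stage) != std::string::npos)
            result.push_back(stage);
    }
    return result;
}
```

With this sketch, "MyUtilities_piece_vs_piece_ps.glslt" maps to the vertex and pixel stages, while "VertexShader_vs.glslt" maps to no stage because it is a full template rather than a piece file.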

The Hlms takes a template file (i.e. a file written in GLSL or HLSL) and
spits out valid shader code. Templates can take advantage of the Hlms'
preprocessor, which is a simple yet powerful macro-like preprocessor
that helps with writing the required code.

## The Hlms preprocessor {#preproc}

The preprocessor was written with speed and simplicity in mind. It does
not implement an AST or anything fancy. This is very important to
keep in mind when writing templates, because there will be cases where using
the preprocessor may feel counter-intuitive or frustrating.

For example
```cpp
    \@property( IncludeLighting )

    /* code here */

    @end
```

is analogous to
```cpp
    #if IncludeLighting != 0

    /* code here */

    #endif
```

However you can't evaluate IncludeLighting to anything other than zero
and non-zero, i.e. you can't check whether IncludeLighting == 2 with the
Hlms preprocessor. A simple workaround is to define, from C++, the
variable “IncludeLightingEquals2” and check whether it's non-zero.
Another solution is to use the GLSL/HLSL preprocessor itself instead of
Hlms'. However, the advantage of Hlms is that you can see its generated
output in a file for inspection, whereas you can't see the GLSL/HLSL
after the macro preprocessor without vendor-specific tools. Plus, in the
case of GLSL, you'll depend on the driver implementation having a good
macro preprocessor.

## Preprocessor syntax {#syntax}

A preprocessor command always starts with \@ followed by the command name, often
with arguments inside parentheses. Note that the preprocessor is always
case-sensitive. The following keywords are recognized:

- \@property

- \@foreach

- \@counter

- \@value

- \@set add sub mul div mod min max

- \@piece

- \@insertpiece

- \@pset padd psub pmul pdiv pmod pmin pmax

### \@property( expression )

Checks whether the variables in the expression are true; if so, the text
inside the block is printed. Must be finalized with \@end. The expression
is case-sensitive. When a variable hasn't been declared, it evaluates
to false.

The logical operators && || ! are valid.

Examples:
```cpp
    \@property( hlms_skeleton )

    //Skeleton animation code here

    @end

    \@property( hlms_skeleton && !hlms_normal )

    //Print this code if it has skeleton animation but no normals

    @end

    \@property( hlms_normal || hlms_tangent )

    //Print this code if it has normals or tangents

    @end

    \@property( hlms_normal && (!hlms_skeleton || hlms_tangent) )

    //Print this code if it has normals and either no skeleton or tangents

    @end
```

It is very similar to \#if hlms_skeleton != 0 \#endif; however there is
no equivalent \#else or \#elif syntax. As a simple workaround you can
do:
```cpp
    \@property( hlms_skeleton )

    //Skeleton animation code here

    @end \@property( !hlms_skeleton )

    //Non-Skeleton code here

    @end
```

Newlines are not necessary. The following is perfectly valid:
```
    diffuse = surfaceDiffuse \@property( hasLights )* lightDiffuse@end ;
```

Which will print:
```
    hasLights != 0                              hasLights == 0
    diffuse = surfaceDiffuse * lightDiffuse;    diffuse = surfaceDiffuse ;
```

### \@foreach( count, scopedVar, \[start\] )

Loop that prints the text inside the block. The text is repeated count -
start times. Must be finalized with \@end.

- count The number of times to repeat the loop (if start = 0). Count
  can read variables.

- scopedVar is a variable that can be used to print the current
  iteration of the loop while inside the block, i.e. “\@scopedVar” will
  be converted into a number in the range \[start; count)

- start Optional. Allows starting from a value other than 0. Start
  can read variables.

Newlines are very important, as they will be printed with the loop.

Examples:
| Expression | Output |
|----------------|----------------|
| \@foreach( 4, n ) <br>  \@n\@end | <br>0<br>1<br>2<br>3 |
| \@foreach( 4, n ) \@n\@end | 0 1 2 3 |
| \@foreach( 4, n )<br> \@n<br>\@end | <br>0<br><br>1<br><br>2<br><br>3<br> |
| \@foreach( 4, n, 2 ) \@n\@end | 2 3 |
| \@pset( myStartVar, 1 )<br>\@pset( myCountVar, 3 )<br>\@foreach( myCountVar, n, myStartVar )<br> \@n\@end | 1<br>2 |
| \@foreach( 2, n )<br> \@insertpiece( pieceName\@n )\@end | \@insertpiece( pieceName0 )<br> \@insertpiece( pieceName1 ) |

> **Attention \#1!**
>
> Don't use the common letter i for the loop counter. It will conflict with other keywords.
>
> e.g. “\@foreach( 1, i )\@insertpiece( pieceName )\@end” will print “0nsertpiece( pieceName )”, which is probably not what you intended.
>
> **Attention \#2!**
>
> foreach is parsed after property math (pset, padd, etc). That means that driving each iteration through a combination of properties and padd functions will not work as you would expect.
>
> e.g. the following code will not work:
>
> ```cpp
> @pset( myVar, 1 )
>
> @foreach( 2, n )
>
> //Code
>
> @psub( myVar, 1 ) //Decrement myVar on each loop
>
> \@property( myVar )
>
> //Code that shouldn't be printed in the last iteration
>
> @end
>
> @end
> ```
>
> Because psub will be evaluated before expanding the foreach.
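
The textual nature of the loop expansion, including the pitfall from Attention \#1, can be reproduced with a minimal C++ sketch. This is a simplified model and not the Hlms parser: `expandForeach` is a hypothetical function that receives the already extracted block body and does a plain search-and-replace of “\@scopedVar” for each iteration.

```cpp
#include <string>

// Repeats `body` for n in [start; count), replacing every occurrence of
// "@<scopedVar>" with the current iteration number. The replacement is purely
// textual, so any text that happens to start with "@<scopedVar>" is hit too.
inline std::string expandForeach(int count, const std::string& scopedVar,
                                 const std::string& body, int start = 0) {
    const std::string token = "@" + scopedVar;
    std::string out;
    for (int n = start; n < count; ++n) {
        std::string copy = body;
        const std::string value = std::to_string(n);
        std::size_t pos = 0;
        while ((pos = copy.find(token, pos)) != std::string::npos) {
            copy.replace(pos, token.size(), value);
            pos += value.size();  // continue searching after the inserted number
        }
        out += copy;
    }
    return out;
}
```

Calling `expandForeach( 1, "i", "@insertpiece( pieceName )" )` in this model yields “0nsertpiece( pieceName )”, because “\@i” matches the beginning of “\@insertpiece”.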

### \@counter( variable )

Prints the current value of variable and increments it by 1. If the
variable hasn't been declared yet, it is initialized to 0.

Examples:
```
    Expression            Output

    @counter( myVar )     0
    @counter( myVar )     1
    @counter( myVar )     2
```

### \@value( variable )

Prints the current value of variable without incrementing it. If the
variable hasn't been declared, prints 0.
```cpp
    Expression            Output

    @value( myVar )       0
    @value( myVar )       0
    @counter( myVar )     0
    @value( myVar )       1
    @value( myVar )       1
```

### \@set add sub mul div mod min max

Sets a variable to a given value, adds, subtracts, multiplies, divides,
calculates the modulus, or takes the minimum/maximum of a variable and a constant,
or of two variables. This family of functions gets evaluated after
foreach(s) have been expanded and pieces have been inserted. It doesn't
print its value.

Arguments can be in the form \@add( a, b ) meaning a += b; or in the form
\@add( a, b, c ) meaning a = b + c

Useful in combination with \@counter and \@value

| Expression | Output | Math |
|-----------------|---------------|-------|
| \@set( myVar, 1 ) <br> \@value( myVar ) | 1 | myVar = 1 |
| \@add( myVar, 5 )<br> \@value( myVar ) | 6 | myVar = 1 + 5 |
| \@div( myVar, 2 ) <br> \@value( myVar ) | 3 | myVar = 6 / 2 |
| \@mul( myVar, myVar )<br> \@value( myVar ) | 9 | myVar = 3 * 3 |
| \@mod( myVar, 5 ) <br> \@value( myVar ) | 4 | myVar = 9 % 5 |
| \@add( myVar, 1, 1 ) <br> \@value( myVar ) | 2 | myVar = 1 + 1 |

### \@piece( nameOfPiece )

Saves all the text inside the block as a named piece. If a
piece with the given name already exists, a compiler error will be
thrown. The text that was inside the block won't be printed. Useful
in combination with \@insertpiece. Pieces can also be defined from C++ or
[*collected*](#toc69) from piece template files.

Example:
```cpp
    Expression                        Output

    @piece( VertexTransform )

    outPos = worldViewProj * inPos

    @end
```

### \@insertpiece( nameOfPiece )

Prints a block of text that was previously saved with piece (or from
C++). If no piece with such a name exists, prints nothing.

Example:
```
    Expression                                                     Output

    @piece( VertexTransform )outPos = worldViewProj * inPos@end
    void main()                                                    void main()
    {                                                              {
    @insertpiece( VertexTransform )                                outPos = worldViewProj * inPos
    @insertpiece( InexistentPiece )
    }                                                              }
```

### \@pset padd psub pmul pdiv pmod pmin pmax

Analogous to [*the family of math functions without the 'p'
prefix*](#toc304). The difference is that the math is evaluated before
anything else. There is not much use for these functions, except perhaps
for quickly testing whether a given flag/variable is being properly set
from C++ without having to recompile.

e.g. if you suspect hlms_normal is never being set, try \@pset(
hlms_normal, 1 )

One important thing worth mentioning is that variables retain their
values across shader stages. First the vertex shader template is parsed,
then the pixel shader one. If 'myVal' is 0 and the vertex shader
contains \@counter( myVal ), then when the pixel shader is parsed \@value(
myVal ) will return 1, not 0.

If you need to reset these variables across shader stages, you can use
\@pset( myVal, 0 ), which is guaranteed to reset your variable to 0 before
anything else happens, even if the pset is stored in a piece file.

# Creation of shaders {#shaders}

Two components need to be evaluated, since either may affect the
shader itself and require it to be recompiled:

1. The Datablock/Material. Does it have normal maps? Then include code
   to sample the normal map and affect the lighting calculations. Does
   it have a diffuse map? If not, avoid sampling the diffuse map and
   multiplying it against the diffuse colour, etc.

2. The Mesh. Is it skeletally animated? Then include skeletal
   animation code. How many blend weights? Modify the skeletal
   animation code appropriately. It doesn't have tangents? Then skip the
   normal map defined in the material. And so on.

When calling Ogre::SceneManager::_renderScene,
Ogre::ShaderManager::getGpuProgram gets called, and this function evaluates both
the mesh and datablock compatibility.

If they're compatible, all the variables (aka properties) and pieces are
generated and cached in a structure (mShaderCache) with a hash key
to this cache entry. If a different datablock-mesh pair ends up
having the same properties and pieces, they will get the same hash (and
share the same shader).

The following graph summarizes the process:

![](hlms_hash.svg)

Later on during rendering, at the start of each render pass, a similar
process is done, which ends up generating a “[*pass hash*](#toc567)”
instead of a renderable hash. Pass data stores settings like the number of
shadow casting lights and the number of lights per type (directional, point,
spot).

While iterating each renderable for rendering, the hash key is read from
the Renderable and merged with the pass' hash. With the merged hash, the
shader is retrieved from a cache. If it's not in the cache, the shader
will be generated and compiled by merging the cached data (pieces and
variables) from the Renderable and the Pass. The following graph
illustrates the process:

![](hlms_caching.svg)

# C++ interaction with shader templates {#cpp}

Note: This section is relevant to those seeking to write their own Hlms
implementation.
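
For implementers, the lookup by merged renderable and pass hash described under “Creation of shaders” can be pictured with the following C++ sketch. It is an illustration under assumptions, not the structure Ogre uses internally: `ShaderCacheSketch`, its `Generator` callback and the way the two 32-bit hashes are merged into one 64-bit key are all hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Caches generated shader source by the merged (renderable, pass) hash and
// only invokes the generator when the merged key has not been seen before.
class ShaderCacheSketch {
public:
    using Generator = std::function<std::string(std::uint32_t, std::uint32_t)>;

    explicit ShaderCacheSketch(Generator generate) : mGenerate(std::move(generate)) {}

    const std::string& get(std::uint32_t renderableHash, std::uint32_t passHash) {
        // Hypothetical merge: renderable hash in the high bits, pass hash in the low bits.
        const std::uint64_t key = (std::uint64_t{renderableHash} << 32) | passHash;
        auto it = mCache.find(key);
        if (it == mCache.end()) {
            ++mCompileCount;  // stands in for the expensive generate-and-compile step
            it = mCache.emplace(key, mGenerate(renderableHash, passHash)).first;
        }
        return it->second;
    }

    int compileCount() const { return mCompileCount; }

private:
    Generator mGenerate;
    std::unordered_map<std::uint64_t, std::string> mCache;
    int mCompileCount = 0;
};
```

Two renderables that share the same properties and pieces would pass the same renderable hash, so within a given pass they hit the same cache entry and the generator runs only once.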

C++ can use Ogre::HlmsMaterialBase::getPropertyMap().setProperty( "key", value ) to set “key” to the given
value. This value can be read by \@property, \@foreach,
\@add/sub/mul/div/mod, \@counter, \@value and \@padd/psub/pmul/pdiv/pmod.

To create pieces (or read them) you need to pass your custom
Hlms::PiecesMap to Hlms::addRenderableCache.

The recommended place to do this is in Hlms::calculateHashForPreCreate
and Hlms::calculateHashForPreCaster. Both are virtual. The former gets
called right before adding the set of properties, pieces and hash to the
cache, while the latter happens right before adding the similar set for
the shadow caster pass.

In those two functions you get the chance to call setProperty to set
your own variables and add your own pieces.

Another option is to overload Hlms::calculateHashFor, which gives you
more control, but you'll have to do some of the work the base class does.

For some particularly complex features, the Hlms preprocessor may not be
enough, or the feature may be too difficult or just impossible to implement with it. In that case you can
generate the string from C++ and send it as a piece. The template shader
can insert it using \@insertpiece.

The function Hlms::createShaderCacheEntry is the main function responsible for
generating the shaders and parsing the template through the Hlms
preprocessor. If you overload it, you can ignore pieces and properties;
basically override the entire Hlms system and provide the source for the
shaders yourself.

## Common conventions

Properties starting with the 'hlms_' prefix are common to all or most Hlms
implementations, e.g. 'hlms_skeleton' is set to 1 when a skeleton is
present and hardware skinning should be performed.

Save properties' IdStrings (hashed strings) into constants as a performance
optimization. Ideally the compiler should detect the constant
propagation and this shouldn't be needed, but this often isn't the case.

For mobile, avoid mat4 and do the math yourself. As for 4x3 matrices
(i.e. skinning), perform the math manually, as many GLES2 drivers have
issues compiling valid GLSL code.

Properties in underscore\_case are set from C++; properties in
camelCase are set from the template.

Properties and pieces starting with 'custom\_' are for user
customizations of the template.

TBD

## Disabling a stage

By default if a template isn't present, the shader stage won't be
created, e.g. if there is no GeometryShader\_gs.glsl file, no geometry
shader will be created. However there are times where you want to use a
template but only use this stage in particular scenarios (e.g. toggled
by a material parameter, disabled for shadow mapping, etc.). In this
case, set the property hlms_disable\_stage to non-zero from within the
template (i.e. using \@set). The value of this property is reset to 0
for every stage.

Note that even when disabled, the Hlms template will be fully parsed and
dumped to disk; and any modification you perform to the Hlms properties
will be carried over to the next stages. Setting hlms_disable\_stage is
not an early out or an abort.

# Customization {#customization}

In many cases, users may want to slightly customize the shaders to
achieve a particular look, implement a specific feature, or solve a
unique problem, without having to rewrite the whole implementation.

Maximum flexibility can be obtained by directly modifying the original source
code. However this isn't modular, making it difficult to merge when the
original source code has changed. Most of the customizations don't
require such an intrusive approach.

Note: For performance reasons, the listener interface does not allow you
to add customizations that work per Renderable, as that loop is
performance sensitive. The only listener callback that works inside
Hlms::fillBuffersFor is hlmsTypeChanged, which only gets evaluated when
the previous Renderable used a different Hlms implementation; this is
rare, and since we sort the RenderQueue, it often branch predicts well.

There are different levels at which an Hlms implementation can be
customized:

1. Using a library, see [*Hlms Initialization*](#toc574). Pass a set
   of piece files in a folder by pushing the folder to ArchiveVec. The
   files in that folder will be parsed first, in order (archiveVec\[0\]
   then archiveVec\[1\], … archiveVec\[N-1\]), which will let you
   define your own pieces to insert code into the default template (see
   the table at the end). You can also use clever tricks to
   avoid dealing with C++ code at all, even if there are no 'custom\_'
   pieces for it. For example, you can write the following code to
   override the BRDF declarations and provide a custom BRDF:
```cpp
    //Disable all known BRDFs that the implementation may enable

    @pset( BRDF_CookTorrance, 0 )

    @pset( BRDF_Default, 0 )

    @piece( DeclareBRDF )

    // Your BRDF code declaration here

    @end
```

2. Via listener, through HlmsListener. This allows you to have access
   to the pass buffer to fill in extra information, or to bind extra buffers
   to the shader.

3. Overload HlmsPbs. Useful for overriding only specific parts, or
   adding new functionality that requires storing extra information in
   a datablock (e.g. overload HlmsPbsDatablock to add more variables,
   and then overload HlmsPbs::createDatablockImpl to create these
   custom datablocks).

4. Directly modify HlmsPbs, HlmsPbsDatablock and the template.

| Piece | Description |
|----------|-------------|
| custom_passBuffer | Piece where users can add extra information for the pass buffer (only useful if the user is using HlmsListener or has overloaded HlmsPbs). |
| custom_VStoPS | Piece where users can add more interpolants for passing data from the vertex to the pixel shader. |
| custom_vs_attributes | Custom vertex shader attributes in the Vertex Shader (e.g. a special texcoord, etc). |
| custom_vs_uniformDeclaration | Data declaration (textures, texture buffers, uniform buffers) in the Vertex Shader. |
| custom_vs_preExecution | Executed before Ogre's code from the Vertex Shader. |
| custom_vs_posExecution | Executed after all code from the Vertex Shader has been performed. |
| custom_ps_uniformDeclaration | Same as custom_vs_uniformDeclaration, but for the Pixel Shader. |
| custom_ps_preExecution | Executed before Ogre's code from the Pixel Shader. |
| custom_ps_posMaterialLoad | Executed right after loading material data and before anything else. May not get executed if there is no relevant material data (e.g. the mesh doesn't have normals or QTangents for lighting calculation). |
| custom_ps_preLights | Executed right before any light (e.g. to perform your own ambient / global illumination pass). All relevant texture data should be loaded by now. |
| custom_ps_posExecution | Executed after all code from the Pixel Shader has been performed. |