<!-- markdownlint-disable MD041 -->
<!-- Copyright 2015-2021 LunarG, Inc. -->
[![Khronos Vulkan][1]][2]

[1]: https://vulkan.lunarg.com/img/Vulkan_100px_Dec16.png "https://www.khronos.org/vulkan/"
[2]: https://www.khronos.org/vulkan/

# GPU-Assisted Validation

GPU-Assisted Validation is implemented in the SPIR-V Tools optimizer and the `VK_LAYER_KHRONOS_validation` layer.
This document covers the design of the layer portion of the implementation.

GPU-Assisted Validation can easily be enabled and configured using the [Vulkan Configurator](https://vulkan.lunarg.com/doc/sdk/latest/windows/vkconfig.html) included with the Vulkan SDK, or you can enable and configure it manually by following the directions below.

## Basic Operation

GPU-Assisted Validation works by instrumenting shader code to perform run-time checking in shaders and
report any error conditions to the layer.
The layer then reports the errors to the user via the same reporting mechanisms used by the rest of the validation system.

The layer instruments the shaders by passing the shader's SPIR-V bytecode to the SPIR-V optimizer component and
instructing the optimizer to perform an instrumentation pass that adds the additional instructions needed for the run-time checking.
The layer then passes the resulting modified SPIR-V bytecode to the driver as part of the process of creating a ShaderModule.

The layer also allocates a buffer that describes the length of all descriptor arrays and the write state of each element of each array.
It does this only if the VK_EXT_descriptor_indexing extension is enabled.

The layer also allocates a buffer that describes all addresses retrieved from vkGetBufferDeviceAddressEXT and the sizes of the corresponding buffers.
It does this only if the VK_EXT_buffer_device_address extension is enabled.

As the shader executes, the instrumented shader code performs the run-time checks.
If a check detects an error condition, the instrumentation code writes an error record into the GPU's device memory.
This record is small, on the order of a dozen 32-bit words.
Since multiple shader stages and multiple invocations of a shader can all detect errors, the instrumentation code
writes error records into consecutive memory locations as long as there is space available in the pre-allocated block of device memory.

The layer inspects this device memory block after completion of a queue submission.
If the GPU wrote an error record to this memory block,
the layer analyzes the record and constructs a validation error message,
which is then reported in the same manner as other validation messages.
If the shader was compiled with debug information (source code and SPIR-V instruction mapping to source code lines), the layer
also provides the line of shader source code that provoked the error as part of the validation error message.

## GPU-Assisted Validation Checks

The initial release (Jan 2019) of GPU-Assisted Validation includes checking for out-of-bounds descriptor array indexing
for image/texel descriptor types.

The second release (Apr 2019) adds validation for out-of-bounds descriptor array indexing and use of unwritten descriptors when the
VK_EXT_descriptor_indexing extension is enabled. Also added (June 2019) was validation for buffer descriptors.

A third update (Aug 2019) adds validation of building a top-level acceleration structure for ray tracing when the
VK_NV_ray_tracing extension is enabled.

(August 2019) Add bounds checking for pointers retrieved from vkGetBufferDeviceAddressEXT.

(December 2020) Add bounds checking for reads and writes to uniform buffers, storage buffers, uniform texel buffers, and storage texel buffers.

### Out-of-Bounds (OOB) Descriptor Array Indexing

Checking for correct indexing of descriptor arrays is sometimes referred to as "bind-less validation".
It is called "bind-less" because a binding in a descriptor set may contain an array of like descriptors,
and unless there is a constant or compile-time indication of which descriptor in the array is selected,
the descriptor binding status is considered to be ambiguous, leaving the actual binding to be determined at run-time.

As an example, a fragment shader program may use a variable to index an array of combined image samplers.
Such a line might look like:

```glsl
uFragColor = light * texture(tex[tex_ind], texcoord.xy);
```

The array of combined image samplers is `tex` and has 6 samplers in the array.
The complete validation error message issued when `tex_ind` indexes past the array is:

```terminal
ERROR : VALIDATION - Message Id Number: 0 | Message Id Name: UNASSIGNED-Image descriptor index out of bounds
        Index of 6 used to index descriptor array of length 6. Command buffer (CubeDrawCommandBuf)(0xbc24b0).
        Pipeline (0x45). Shader Module (0x43). Shader Instruction Index = 108. Stage = Fragment.
        Fragment coord (x,y) = (419.5, 254.5). Shader validation error occurred in file:
        /home/user/src/Vulkan-ValidationLayers/external/Vulkan-Tools/cube/cube.frag at line 45.
45:    uFragColor = light * texture(tex[tex_ind], texcoord.xy);
```

The VK_EXT_descriptor_indexing extension allows a shader to declare a descriptor array without specifying its size:

```glsl
layout(set = 0, binding = 1) uniform sampler2D tex[];
```

In this case, the layer needs to tell the optimization code how big the descriptor array is so the code can determine what is out of
bounds and what is not.
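The check that the instrumentation adds for this case reduces to a simple predicate: an index is valid only when it is strictly less than the array length the layer supplies. A minimal C sketch of that predicate follows; the function name is illustrative only, and the real check is emitted as SPIR-V by the optimizer pass rather than written in C:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: returns true when 'index' may safely select an
 * element of a descriptor array with 'length' elements. */
static bool descriptor_index_in_bounds(uint32_t index, uint32_t length)
{
    return index < length;
}
```

For the six-element `tex` array above, an index of 6 fails this predicate, producing the error message shown.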

The extension also allows descriptor set bindings to be partially bound, meaning that as long as the shader doesn't use certain
array elements, those elements are not required to have been written.
The instrumentation code needs to know which elements of a descriptor array have been written, so that it can tell if one is used
that has not been written.

Note that currently, VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT validation is not working and all accesses are reported as valid.

### Buffer Device Address Checking

The vkGetBufferDeviceAddressEXT routine can be used to get a GPU address that a shader can use to directly address a particular buffer.
GPU-Assisted Validation code keeps track of all such addresses, along with the size of the associated buffer, and creates an input buffer listing all such address/size pairs.
Shader code is instrumented to validate buffer_reference addresses and report any reads or writes that do not fall within the listed address/size regions.

## GPU-Assisted Validation Options

Here are the options related to activating GPU-Assisted Validation:

1. Enable GPU-Assisted Validation - GPU-Assisted Validation is off by default and must be enabled.

   GPU-Assisted Validation is disabled by default because the shader instrumentation may introduce significant
   shader performance degradation and additional resource consumption.
   GPU-Assisted Validation requires additional resources such as device memory and descriptors.
   It is desirable for the user to opt in to this feature because of these requirements.
   In addition, there are several limitations that may adversely affect application behavior,
   as described later in this document.

2. Reserve a Descriptor Set Binding Slot - Modifies the value of the `VkPhysicalDeviceLimits::maxBoundDescriptorSets`
   property to return a value one less than the actual device's value to "reserve" a descriptor set binding slot for use by GPU validation.

   This option is likely only of interest to applications that dynamically adjust their descriptor set bindings to adjust for
   the limits of the device.

### Enabling and Specifying Options with a Configuration File

The existing layer configuration file mechanism can be used to enable GPU-Assisted Validation.
This mechanism is described on the
[LunarXchange website](https://vulkan.lunarg.com/doc/sdk/latest/windows/layer_configuration.html),
in the "Layers Overview and Configuration" document.

To turn on GPU validation, add the following to your layer settings file, which is often
named `vk_layer_settings.txt`:

```code
khronos_validation.enables = VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT
```

To turn on GPU validation and request to reserve a binding slot:

```code
khronos_validation.enables = VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT,VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT
```

Some platforms do not support configuration of the validation layers with this configuration file.
Programs running on these platforms must then use the programmatic interface.

### Enabling and Specifying Options with the Programmatic Interface

The `VK_EXT_validation_features` extension can be used to enable GPU-Assisted Validation at CreateInstance time.

Here is sample code illustrating how to enable it:

```C
VkValidationFeatureEnableEXT enables[] = {VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_EXT};
VkValidationFeaturesEXT features = {};
features.sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT;
features.enabledValidationFeatureCount = 1;
features.pEnabledValidationFeatures = enables;

VkInstanceCreateInfo info = {};
info.pNext = &features;
```

Use the `VK_VALIDATION_FEATURE_ENABLE_GPU_ASSISTED_RESERVE_BINDING_SLOT_EXT` enum to reserve a binding slot.

## GPU-Assisted Validation Limitations

There are several limitations that may impede the operation of GPU-Assisted Validation:

### Vulkan 1.1

Vulkan 1.1 or later is required because the GPU instrumentation code uses SPIR-V 1.3 features.
Vulkan 1.1 is required to ensure that SPIR-V 1.3 is available.

### Descriptor Types

The current implementation works with image, texel, and buffer descriptor types.
A complete list appears later in this document.

### Descriptor Set Binding Limit

This is probably the most important limitation and is related to the
`VkPhysicalDeviceLimits::maxBoundDescriptorSets` device limit.

When applications use all the available descriptor set binding slots,
GPU-Assisted Validation cannot be performed because it needs a descriptor set to
locate the memory for writing the error report record.

This problem is most likely to occur on devices, often mobile, that support only the
minimum required value for `VkPhysicalDeviceLimits::maxBoundDescriptorSets`, which is 4.
Some applications may be written to use 4 slots since this is the highest value that
is guaranteed by the specification.
When such an application using 4 slots runs on a device with only 4 slots,
GPU-Assisted Validation cannot be performed.

In this implementation, this condition is detected and gracefully recovered from by
building the graphics pipeline with non-instrumented shaders instead of instrumented ones.
An error message is also displayed informing the user of the condition.

Applications don't have many options in this situation, and it is anticipated that
changing the application to free a slot would be difficult.

### Device Memory

GPU-Assisted Validation does allocate device memory for the error report buffers, and if
descriptor indexing is enabled, for the input buffer of descriptor sizes and write state.
This can lead to a greater chance of memory exhaustion, especially in cases where
the application is trying to use all of the available memory.
The extra memory allocations are also not visible to the application, making it
impossible for the application to account for them.

Note that if descriptor indexing is enabled, the input buffer size will be equal to
(1 + (number_of_sets * 2) + (binding_count * 2) + descriptor_count) words of memory, where
binding_count is the binding number of the largest binding in the set.
This means that sparsely populated sets and sets with a very large binding will cause
the input buffer to be much larger than it could be with more densely packed binding numbers.
As a best practice, when using GPU-Assisted Validation with descriptor indexing enabled,
make sure descriptor bindings are densely packed.

If GPU-Assisted Validation device memory allocations fail, the device could become
unstable because some previously-built pipelines may contain instrumented shaders.
This condition is nearly impossible to recover from, so the layer
prints an error message and refrains from any further allocations or instrumentations.
Even so, there is a reasonable chance that execution continues without incident,
especially if the instrumented shaders never attempt to write an error record.
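The input-buffer sizing formula quoted above can be written out as a small helper. This is a sketch for illustration only (the function name is hypothetical, not part of the layer):

```c
#include <stdint.h>

/* Sketch of the descriptor-indexing input buffer sizing formula:
 * 1 + (number_of_sets * 2) + (binding_count * 2) + descriptor_count words,
 * where binding_count is the binding number of the largest binding in the set. */
static uint32_t di_input_buffer_words(uint32_t number_of_sets,
                                      uint32_t largest_binding_number,
                                      uint32_t descriptor_count)
{
    return 1 + (number_of_sets * 2) + (largest_binding_number * 2) + descriptor_count;
}
```

For example, a set whose only binding is number 1000 with a single descriptor needs 2004 words, while the same descriptor at binding 0 needs only 4, which is why densely packed binding numbers are recommended.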

### Descriptors

This is roughly the same problem as the device memory problem mentioned above,
but for descriptors.
Any failure to allocate a descriptor set means that the instrumented shader code
won't have a place to write error records, resulting in unpredictable device
behavior.

### Other Device Limits

This implementation uses additional resources that may count against the following limits,
and possibly others:

* `maxMemoryAllocationCount`
* `maxBoundDescriptorSets`
* `maxPerStageDescriptorStorageBuffers`
* `maxPerStageResources`
* `maxDescriptorSetStorageBuffers`
* `maxFragmentCombinedOutputResources`

The implementation does not take steps to avoid exceeding these limits
and does not update the tracking performed by other validation functions.

### A Note About the `VK_EXT_buffer_device_address` Extension

The recently introduced `VK_EXT_buffer_device_address` extension can be used
to implement GPU-Assisted Validation without some of the limitations described above.
This approach would use the extension to obtain a GPU device pointer to a storage
buffer and make it available to the shader via a specialization constant.
This technique removes the need to create descriptors, use a descriptor set slot,
modify pipeline layouts, etc., and would relax some of the limitations listed above.

This alternate implementation is under consideration.

## GPU-Assisted Validation Internal Design

This section may be of interest to readers who want to know how GPU-Assisted Validation is implemented.
It isn't necessarily required for using the feature.

### General

In general, the implementation does the following:

* For each draw, dispatch, and trace rays call, allocate a buffer with enough device memory to hold a single debug output record written by the
  instrumented shader code.
  If descriptor indexing is enabled, calculate the amount of memory needed to describe the descriptor array sizes and
  write states and allocate device memory and a buffer for input to the instrumented shader.
  The Vulkan Memory Allocator is used to handle this efficiently.

  There is probably little advantage in providing a larger output buffer in order to obtain more debug records.
  It is likely, especially for fragment shaders, that multiple errors occurring near each other have the same root cause.

  A block is allocated on a per-draw basis to make it possible to associate a shader debug error record with
  a draw within a command buffer.
  This is done partly to give the user more information in the error report, namely the command buffer handle/name and the draw within that command buffer.
  An alternative design allocates this block on a per-device or per-queue basis and should work.
  However, it is then not possible to identify the command buffer that caused the error if multiple command buffers
  are submitted at once.
* For each draw, dispatch, and trace rays call, allocate a descriptor set and update it to point to the block of device memory just allocated.
  If descriptor indexing is enabled, also update the descriptor set to point to the allocated input buffer.
  Fill the DI input buffer with the size and write state information for each descriptor array.
  A descriptor set manager handles this efficiently.
  If the buffer device address extension is enabled, allocate an input buffer to hold the address/size pairs for all addresses retrieved from vkGetBufferDeviceAddressEXT.
  Also make an additional call down the chain to create a bind descriptor set command to bind our descriptor set at the desired index.
  This has the effect of binding the device memory block belonging to this draw so that the GPU instrumentation
  writes into this buffer when the draw is executed.
  The end result is that each draw call has its own buffer containing GPU instrumentation error
  records, if any occurred while executing that draw.
* Determine the descriptor set binding index that is eventually used to bind the descriptor set just allocated and updated.
  Usually, it is `VkPhysicalDeviceLimits::maxBoundDescriptorSets` minus one.
  For devices that have a very high or no limit on this bound, pick an index that isn't too high, but is above most other device
  maxima, such as 32.
* When creating a ShaderModule, pass the SPIR-V bytecode to the SPIR-V optimizer to perform the instrumentation pass.
  Pass the desired descriptor set binding index to the optimizer via a parameter so that the instrumented
  code knows which descriptor to use for writing error report data to the memory block.
  If descriptor indexing is enabled, turn on OOB and write state checking in the instrumentation pass.
  If the buffer_device_address extension is enabled, apply a pass to add instrumentation checking for out-of-bounds buffer references.
  Use the instrumented bytecode to create the ShaderModule.
* For all pipeline layouts, add our descriptor set to the layout, at the binding index determined earlier.
  Fill any gaps with empty descriptor sets.

  If the incoming layout already has a descriptor set placed at our desired index, the layer must not add its
  descriptor set to the layout, replacing the one in the incoming layout.
  Instead, the layer leaves the layout alone and later replaces the instrumented shaders with
  non-instrumented ones when the pipeline layout is later used to create a graphics pipeline.
  The layer issues an error message to report this condition.
* When creating a GraphicsPipeline, ComputePipeline, or RayTracingPipeline, check to see if the pipeline is using the debug binding index.
  If it is, replace the instrumented shaders in the pipeline with non-instrumented ones.
* Before calling QueueSubmit, if descriptor indexing is enabled, check to see if there were any unwritten descriptors that were declared
  update-after-bind.
  If there were, update the write state of those elements.
* After calling QueueSubmit, perform a wait on the queue to allow the queue to finish executing.
  Then map and examine the device memory block for each draw or trace rays command that was submitted.
  If any debug record is found, generate a validation error message for each record found.

The above describes only the high-level details of GPU-Assisted Validation operation.
More detail is found in the discussion of the individual hooked functions below.

### Initialization

When the validation layer loads, it examines the user options from both the layer settings file and the
`VK_EXT_validation_features` extension.
Note that it also processes the subsumed `VK_EXT_validation_flags` extension for simple backwards compatibility.
From these options, the layer sets instance-scope flags in the validation layer tracking data to indicate if
GPU-Assisted Validation has been requested, along with any other associated options.

### "Calling Down the Chain"

Much of the GPU-Assisted Validation implementation involves making "application-level" Vulkan API
calls outside of the application's API usage to create resources and perform its required operations
inside of the validation layer.
These calls are not routed up through the top of the loader/layer/driver call stack via the loader.
Instead, they are simply dispatched via the containing layer's dispatch table.

These calls therefore don't pass through any validation checks that occur before the GPU validation checks are run.
This doesn't present any particular problem, but it does raise some issues:

* The additional API calls are not fully validated.

  This implies that this additional code may never be checked for validation errors.
  To address this, the code can "just" be written carefully so that it is "valid" Vulkan,
  which is hard to do.

  Alternatively, this code can be checked by loading a Khronos validation layer with
  GPU validation enabled on top of "normal" standard validation in the
  layer stack, which effectively validates the API usage of this code.
  This sort of checking is performed by layer developers to check that the additional
  Vulkan usage is valid.

  This validation can be accomplished by:

  * Building the validation layer with a hack to force GPU-Assisted Validation to be enabled (don't use the exposed mechanisms because you probably don't want it enabled twice).
  * Renaming this layer binary to something else, like "khronos_validation2", to keep it apart from the
    "normal" Khronos validation.
  * Creating a new JSON file with the new layer name.
  * Setting up the layer stack so that the "khronos_validation2" layer is on top of or before the actual Khronos
    validation layer.
  * Then running tests and checking for validation errors pointing to API usage in the "khronos_validation2" layer.

  This should only need to be done after making any major changes to the implementation.

  Another approach involves capturing an application trace with `vktrace` and then playing
  it back with `vkreplay`.

* The additional API calls are not state-tracked.

  This means that things like device memory allocations and descriptor allocations are not
  tracked and do not show up in any of the bookkeeping performed by the validation layers.
  For example, any device memory allocation performed by GPU-Assisted Validation won't be
  counted towards the maximum number of allocations allowed by a device.
  This could lead to an early allocation failure that is not accompanied by a validation error.

  This shortcoming is left unaddressed in this implementation because it is anticipated that
  a later implementation of GPU-Assisted Validation using the `VK_EXT_buffer_device_address`
  extension will have less need to allocate these
  tracked resources, and it therefore becomes less of an issue.

### Code Structure and Relationship to the Core Validation Layer

The GPU-Assisted Validation code is largely contained in one
[file](https://github.com/KhronosGroup/Vulkan-ValidationLayers/blob/master/layers/gpu_validation.cpp), with "hooks" in
the other validation code that call functions in this file.
These hooks in the validation code look something like this:

```C
if (GetEnables(dev_data)->gpu_validation) {
    GpuPreCallRecordDestroyPipeline(dev_data, pipeline_state);
}
```

The GPU-Assisted Validation code is linked into the shared library for the Khronos and core validation layers.

#### Review of Khronos Validation Code Structure

Each function for a Vulkan API command intercepted in the Khronos validation layer is usually split up
into several decomposed functions in order to organize the implementation.
These functions take the form of:

* PreCallValidate<foo>: Perform validation steps before calling down the chain
* PostCallValidate<foo>: Perform validation steps after calling down the chain
* PreCallRecord<foo>: Perform state recording before calling down the chain
* PostCallRecord<foo>: Perform state recording after calling down the chain

The GPU-Assisted Validation functions follow this pattern not by hooking into the top-level validation API shim, but
by hooking one of these decomposed functions.

The design of each hooked function follows:

#### GpuPreCallRecordCreateDevice

* Modify the `VkPhysicalDeviceFeatures` to turn on two additional physical device features:
  * `fragmentStoresAndAtomics`
  * `vertexPipelineStoresAndAtomics`

#### GpuPostCallRecordCreateDevice

* Determine and record (save in device state) the desired descriptor set binding index
* Initialize the Vulkan Memory Allocator
  * Determine the error record block size based on the maximum size of the error record and the alignment limits of the device
* Initialize the descriptor set manager
* Make a descriptor set layout to describe our descriptor set
* Make a descriptor set layout to describe a "dummy" descriptor set that contains no descriptors
  * This is used to "pad" pipeline layouts to fill any gaps between the used bind indices and our bind index
* Record these objects in the per-device state

#### GpuPreCallRecordDestroyDevice

* Destroy the descriptor set layouts created in CreateDevice
* Clean up the descriptor set manager
* Clean up the Vulkan Memory Allocator (VMA)
* Clean up device state

#### GpuAllocateValidationResources

* For each Draw, Dispatch, or TraceRays call:
  * Get a descriptor set from the descriptor set manager
  * Get an output buffer and associated memory from VMA
  * If descriptor indexing is enabled, get an input buffer and fill it with descriptor array information
  * If buffer device address is enabled, get an input buffer and fill it with address/size pairs for addresses retrieved from vkGetBufferDeviceAddressEXT
  * Update (write) the descriptor set with the memory info
  * Check to see if the layout for the pipeline just bound is using our selected bind index
  * If there is no conflict, add an additional command to the command buffer to bind our descriptor set at our selected index
* Record the above objects in the per-CB state

Note that the Draw and Dispatch calls include vkCmdDraw,
vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect, vkCmdDispatch, vkCmdDispatchIndirect, and vkCmdTraceRaysNV.

#### GpuPreCallRecordFreeCommandBuffers

* For each command buffer:
  * Destroy the VMA buffer(s), releasing the memory
  * Give the descriptor sets back to the descriptor set manager
  * Clean up CB state

#### GpuOverrideDispatchCreateShaderModule

This function is called from PreCallRecordCreateShaderModule.
This routine sets up to call the SPIR-V optimizer to run the "BindlessCheckPass", replacing the original SPIR-V with the instrumented SPIR-V,
which is then used in the call down the chain to CreateShaderModule.

This function generates a "unique shader ID" that is passed to the SPIR-V optimizer,
which the instrumented code puts in the debug error record to identify the shader.
This ID is returned by this function so it can be recorded in the shader module at PostCallRecord time.
It would have been convenient to use the shader module handle returned from the driver as this shader ID.
But the shader needs to be instrumented before creating the shader module, and therefore the handle is not available to use
as this ID to pass to the optimizer.
Therefore, the layer keeps a "counter" in per-device state that is incremented each time a shader is instrumented
to generate unique IDs.
This unique ID is given to the SPIR-V optimizer and is stored in the shader module state tracker after the shader module is created, which creates the necessary association between the ID and the shader module.

The process of instrumenting the SPIR-V also includes passing the selected descriptor set binding index
to the SPIR-V optimizer, which the instrumented
code uses to locate the memory block used to write the debug error record.
An instrumented shader is now "hard-wired" to write error records via the descriptor set at that binding
if it detects an error.
This implies that the instrumented shaders should only be allowed to run when the correct bindings are in place.

The original SPIR-V bytecode is kept in the shader module tracking data.
This is important because the layer may need to replace the instrumented shader with the original shader if, for example,
there is a binding index conflict.
The application cannot destroy the shader module until it has used the shader module to create the pipeline.
This ensures that the original SPIR-V bytecode is available if we need it to replace the instrumented shader.

#### GpuOverrideDispatchCreatePipelineLayout

This function is called through PreCallRecordCreatePipelineLayout.

* Check for a descriptor set binding index conflict.
  * If there is one, issue an error message and leave the pipeline layout unmodified
  * If there is no conflict, for each pipeline layout:
    * Create a new pipeline layout
    * Copy the original descriptor set layouts into the new pipeline layout
    * Pad the new pipeline layout with dummy descriptor set layouts up to but not including the last one
    * Add our descriptor set layout as the last one in the new pipeline layout
* Create the pipeline layouts by calling down the chain with the original or modified create info

#### GpuPreCallQueueSubmit

* For each primary and secondary command buffer in the submission:
  * Call a helper function to see if there are any update-after-bind descriptors whose write state may need to be updated,
    and if so, map the input buffer and update the state.

#### GpuPostCallQueueSubmit

* Submit a command buffer containing a memory barrier to make GPU writes available to the host domain.
* Call QueueWaitIdle.
* For each primary and secondary command buffer in the submission:
  * Call a helper function to process the instrumentation debug buffers (described later)

#### GpuPreCallValidateCmdWaitEvents

* Report an error about a possible deadlock if CmdWaitEvents is recorded with VK_PIPELINE_STAGE_HOST_BIT set.

#### GpuPreCallRecordCreateGraphicsPipelines

* Examine the pipelines to see if any use the debug descriptor set binding index
* For those that do:
  * Create non-instrumented shader modules from the saved original SPIR-V
  * Modify the CreateInfo data to use these non-instrumented shaders.
    * This prevents instrumented shaders from using the application's descriptor set.

#### GpuPostCallRecordCreateGraphicsPipelines

* For every shader in the pipeline:
  * Destroy the shader module created in GpuPreCallRecordCreateGraphicsPipelines, if any
    * These are found in the CreateInfo used to create the pipeline and not in the shader_module
  * Create a shader tracking record that saves:
    * shader module handle
    * unique shader id
    * graphics pipeline handle
    * shader bytecode, if it contains debug info

This tracker is used to attach the shader bytecode to the shader in case it is needed
later to get the shader source code debug info.

The current shader module tracker in the validation code stores the bytecode,
but this tracker has the same life cycle as the shader module itself.
It is possible for the application to destroy the shader module after
creating the graphics pipeline and before submitting work that uses the shader,
making the shader bytecode unavailable if needed for later analysis.
Therefore, the bytecode must be saved at this opportunity.

This tracker exists as long as the graphics pipeline exists,
so the graphics pipeline handle is also stored in this tracker so that it can
be looked up when the graphics pipeline is destroyed.
At that point, it is safe to free the bytecode since the pipeline is never used again.

#### GpuPreCallRecordDestroyPipeline

* Find the shader tracker(s) with the graphics pipeline handle and free the tracker, along with any bytecode stored in it.

### Shader Instrumentation Scope

The shader instrumentation process performed by the SPIR-V optimizer applies descriptor index bounds checking
to descriptors of the following types:

    VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
    VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
    VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
    VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER
    VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER
    VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
    VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
    VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
    VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC

Instrumentation is applied to the following SPIR-V operations:

    OpImageSampleImplicitLod
    OpImageSampleExplicitLod
    OpImageSampleDrefImplicitLod
    OpImageSampleDrefExplicitLod
    OpImageSampleProjImplicitLod
    OpImageSampleProjExplicitLod
    OpImageSampleProjDrefImplicitLod
    OpImageSampleProjDrefExplicitLod
    OpImageGather
    OpImageDrefGather
    OpImageQueryLod
    OpImageSparseSampleImplicitLod
    OpImageSparseSampleExplicitLod
    OpImageSparseSampleDrefImplicitLod
    OpImageSparseSampleDrefExplicitLod
    OpImageSparseSampleProjImplicitLod
    OpImageSparseSampleProjExplicitLod
    OpImageSparseSampleProjDrefImplicitLod
    OpImageSparseSampleProjDrefExplicitLod
    OpImageSparseGather
    OpImageSparseDrefGather
    OpImageFetch
    OpImageRead
    OpImageQueryFormat
    OpImageQueryOrder
    OpImageQuerySizeLod
    OpImageQuerySize
    OpImageQueryLevels
    OpImageQuerySamples
    OpImageSparseFetch
    OpImageSparseRead
    OpImageWrite

Also instrumented are OpLoad and OpStore with an AccessChain into a base of OpVariable with
either Uniform or StorageBuffer storage class and a type which is either a
struct decorated with Block, or a runtime or statically-sized array of such
a struct.

### Shader Instrumentation Error Record Format

The instrumented shader code generates "error records" in a specific format.

This description includes the support for future GPU-Assisted Validation features
such as checking for uninitialized descriptors in the partially-bound scenario.
These items are not used in the current implementation for descriptor array
bounds checking, but are provided here to complete the description of the
error record format.

The format of this buffer is as follows:

```C
struct DebugOutputBuffer_t
{
   uint DataWrittenLength;
   uint Data[];
};
```

`DataWrittenLength` is the number of uint32_t words that the shaders have attempted to write.
It should be initialized to 0.

The `Data` array holds the uint32_t words written by the shaders of the pipeline to record bindless validation errors.
All elements of `Data` should be initialized to 0.
Note that the `Data` array has runtime length.
The shader queries the length of the `Data` array to make sure that it does not write past the end of `Data`.
The shader only writes complete records.
The layer uses the length of `Data` to control the number of records written by the shaders.

`DataWrittenLength` is atomically updated by the shaders so that shaders do not overwrite each other's data.
The shader takes the value it gets from the atomic update.
If that value plus the record length is greater than the length of `Data`, the shader does not write the record.

Given this protocol, the value in `DataWrittenLength` is not very meaningful if it is greater than the length of `Data`.
However, the format of the written records plus the fact that `Data` is initialized to 0 should be enough to determine
the records that were written.
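The reservation protocol above can be modeled in a few lines of plain C. This is an illustrative sketch, not the layer's actual code: on the GPU the counter update is an atomic add emitted by the instrumentation pass, and the names and the fixed `DATA_LEN` here are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define DATA_LEN 16u /* length of Data[] in words (hypothetical) */

typedef struct {
    unsigned DataWrittenLength; /* words attempted; initialized to 0 */
    unsigned Data[DATA_LEN];    /* error records; initialized to 0 */
} DebugOutputBuffer;

/* Try to append a record; returns 1 if it fit, 0 if it was dropped. */
static int write_record(DebugOutputBuffer *buf,
                        const unsigned *record, unsigned rec_len)
{
    /* Reserve rec_len words (a plain add here; OpAtomicIAdd on the GPU). */
    unsigned start = buf->DataWrittenLength;
    buf->DataWrittenLength += rec_len;

    /* Only write if the whole record fits; partial records are never written. */
    if (start + rec_len > DATA_LEN)
        return 0;
    memcpy(&buf->Data[start], record, rec_len * sizeof(unsigned));
    return 1;
}
```

Note that a dropped record still advances `DataWrittenLength`, which is why the counter can legitimately exceed the length of `Data`.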
### Record Format

The format of an output record is the following:

    Word 0: Record size
    Word 1: Shader ID
    Word 2: Instruction Index
    Word 3: Stage
    <Stage-Specific Words>
    <Validation-Specific Words>

The Record Size is the number of words in this record, including the Record Size itself.

The Shader ID is a handle that was provided by the layer when the shader was instrumented.

The Instruction Index is the index of the instruction within the original function at which the error occurred.
For bindless, this will be the instruction that consumes the descriptor in question,
or the instruction that consumes the OpSampledImage that consumes the descriptor.

The Stage is the integer value used in SPIR-V for each of the Execution Models:

| Stage         | Value |
|---------------|:-----:|
|Vertex         |0      |
|TessCtrl       |1      |
|TessEval       |2      |
|Geometry       |3      |
|Fragment       |4      |
|Compute        |5      |
|Task           |5267   |
|Mesh           |5268   |
|RayGenerationNV|5313   |
|IntersectionNV |5314   |
|AnyHitNV       |5315   |
|ClosestHitNV   |5316   |
|MissNV         |5317   |
|CallableNV     |5318   |

### Stage Specific Words

These are words that identify which "instance" of the shader the validation error occurred in.
Here are the words for each stage:

| Stage         | Word 0         | Word 1         | Word 2         |
|---------------|----------------|----------------|----------------|
|Vertex         |VertexID        |InstanceID      |unused          |
|TessCtrl       |InvocationID    |PrimitiveID     |unused          |
|TessEval       |PrimitiveID     |TessCoord.u     |TessCoord.v     |
|Geometry       |PrimitiveID     |InvocationID    |unused          |
|Fragment       |FragCoord.x     |FragCoord.y     |unused          |
|Compute        |GlobalInvocID.x |GlobalInvocID.y |GlobalInvocID.z |
|Task           |GlobalInvocID.x |GlobalInvocID.y |GlobalInvocID.z |
|Mesh           |GlobalInvocID.x |GlobalInvocID.y |GlobalInvocID.z |
|RayGenerationNV|LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |
|IntersectionNV |LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |
|AnyHitNV       |LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |
|ClosestHitNV   |LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |
|MissNV         |LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |
|CallableNV     |LaunchIdNV.x    |LaunchIdNV.y    |LaunchIdNV.z    |

"unused" means not relevant, but still present.

### Validation-Specific Words

These are words that are specific to the validation being done.
For bindless validation, they are variable.

The first word is the Error Code.
For the *OutOfBounds errors, two words follow: Word 0: Descriptor Index, Word 1: Descriptor Array Length.

For the *Uninitialized errors, one word follows: Word 0: Descriptor Index.

| Error                  | Word 0              | Word 1                |
|------------------------|---------------------|-----------------------|
|IndexOutOfBounds        |Descriptor Index     |Descriptor Array Length|
|DescriptorUninitialized |Descriptor Index     |unused                 |
|BufferDeviceAddrOOB     |Out of Bounds Address|unused                 |

So the words written for an image descriptor bounds error in a fragment shader are:

    Word 0: Record size (9)
    Word 1: Shader ID
    Word 2: Instruction Index
    Word 3: Stage (4: Fragment)
    Word 4: FragCoord.x
    Word 5: FragCoord.y
    Word 6: Error (0: ImageIndexOutOfBounds)
    Word 7: Descriptor Index
    Word 8: Descriptor Array Length

If another error is encountered, that record is written starting at Word 9, provided the whole record will not overflow Data.
If it would overflow, no words are written.

The validation layer can continue to read valid records until it sees a Record Size of 0 or the end of Data is reached.

#### Programmatic interface

The programmatic interface for the above informal description is codified in the
[SPIRV-Tools](https://github.com/KhronosGroup/SPIRV-Tools) repository in file
[`instrument.hpp`](https://github.com/KhronosGroup/SPIRV-Tools/blob/master/include/spirv-tools/instrument.hpp).
It consists largely of integer constant definitions for the codes and values mentioned above and
offsets into the record for locating each item.

## GPU-Assisted Validation Error Report

This is a fairly simple process of mapping the debug report buffer associated with
each draw in the command buffer that was just submitted and looking to see if the GPU instrumentation
code wrote anything.
Each draw in the command buffer should have a corresponding result buffer in the command buffer's list of result buffers.
The report-generating code loops through the result buffers, maps each of them, checks for errors, and unmaps them.
The layer clears the buffer to zeros when it is allocated and after processing any
buffer that was written to.
The instrumented shader code expects these buffers to be cleared to zeros before it
writes to them.

The layer then prepares a "common" validation error message containing:

* command buffer handle - This is easily obtained because the layer is looping over the command
  buffers just submitted.
* draw number - The layer keeps track of how many draws it has processed for a given command buffer.
* pipeline handle - The shader tracker discussed earlier contains this handle.
* shader module handle - The "Shader ID" (Word 1 in the record) is used to look up
  the shader tracker, which is then used to obtain the shader module and pipeline handles.
* instruction index - This is the SPIR-V instruction index where the invalid array access occurred.
  It is not that useful by itself, since the user would have to use it to locate a SPIR-V instruction
  in a SPIR-V disassembly and somehow relate it back to the shader source code.
  But it could still be useful to some, and it is easy to report.
  The user can build the shader with debug information to get source-level information.

For all objects, the layer also looks up the objects in the Debug Utils object name map in
case the application used that extension to name any objects.
If a name exists for that object, it is included in the error message.

The layer then adds an error message text obtained from decoding the stage-specific and
validation-specific data as described earlier.

This completes the error report when there is no source-level debug information in the shader.
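The scan over a mapped result buffer can be sketched in plain C. This is an illustrative model under the record format described earlier (the function name is invented here); the real layer takes the word offsets from SPIRV-Tools' `instrument.hpp`.

```c
#include <assert.h>

/* Visit each complete record in a mapped result buffer and count them.
 * Word 0 of each record is its size; a size of 0 marks the zero-filled tail. */
static unsigned walk_records(const unsigned *data, unsigned data_len)
{
    unsigned count = 0;
    unsigned i = 0;
    while (i < data_len && data[i] != 0) {
        unsigned rec_size = data[i];
        if (i + rec_size > data_len)   /* shaders never write partial records */
            break;
        /* data[i + 1] is the Shader ID, data[i + 2] the Instruction Index,
           data[i + 3] the Stage; a real report would decode these here. */
        ++count;
        i += rec_size;
    }
    return count;
}
```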
### Source-Level Debug Information

This is one of the more complicated and code-heavy parts of the GPU-Assisted Validation feature,
and all it really does is display source-level information when the shader is compiled
with debugging info (`-g` option in the case of `glslangValidator`).

The process breaks down into two steps:

#### OpLine Processing

The SPIR-V generator (e.g., glslangValidator) places an OpLine SPIR-V instruction in the
shader program ahead of the code generated for each source code statement.
The OpLine instruction contains the filename id (for an OpString),
the source code line number, and the source code column number.
It is possible to have two source code statements on the same line in the source file,
which explains the need for the column number.

The layer scans the SPIR-V looking for the last OpLine instruction that appears before the instruction
at the instruction index obtained from the debug report.
This OpLine then contains the correct filename id, line number, and column number of the
statement causing the error.
The filename itself is obtained by scanning the SPIR-V again for an OpString instruction that
matches the id from the OpLine.
This OpString contains the text string representing the filename.
This information is added to the validation error message.

For online compilation, when there is no "file", only the line number information is reported.

#### OpSource Processing

SPIR-V built with source-level debug info also contains OpSource instructions that
have a string containing the source code, delimited by newlines.
Due to possible pre-processing, the layer cannot simply use the source file line number
from the OpLine to index into this set of source code lines.
Instead, the correct source code line is found by first locating the "#line" directive in the
source that specifies a line number closest to and less than the source line number reported
by the OpLine located in the previous step.
The correct "#line" directive must also match its filename, if specified,
with the filename from the OpLine.

Then the difference between the "#line" line number and the OpLine line number is added
to the line on which the "#line" was found to locate the actual line of source, which is
then added to the validation error message.

For example, if the OpLine line number is 15, and there is a "#line 10" on line 40
in the OpSource source, then line 45 in the OpSource contains the correct source line.

### Shader Instrumentation Input Record Format for Descriptor Indexing

Although the descriptor indexing (DI) input buffer is a linear array of unsigned integers, conceptually there are arrays within the linear array.

Word 1 starts an array (denoted sets_to_sizes) that is number_of_sets long, with an index that indicates the start of that set's entries in the sizes array.

After the sets_to_sizes array comes the sizes array, which contains the array size (or 1 if the descriptor is not an array) of each descriptor in the set. Bindings with no descriptor are filled in with zeros.

After the sizes array comes the sets_to_bindings array, which for each descriptor set indexes into the bindings_to_written array. Word 0 contains the index that is the start of the sets_to_bindings array.

After the sets_to_bindings array comes the bindings_to_written array, which for each binding in the set indexes to the start of that binding's entries in the written array.

Lastly comes the written array, which indicates whether a given binding / array element has been written.
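The indirection chain just described can be written out in plain C. This is an illustrative sketch (the function names are invented here, and bounds checking is omitted); the write-state lookup chains three levels of indirection starting from Word 0, and the size lookup starts from the sets_to_sizes array at Word 1.

```c
#include <assert.h>

/* Nonzero if (set s, binding b, element i) was written. */
static unsigned di_written(const unsigned *in, unsigned s, unsigned b, unsigned i)
{
    unsigned bindings_to_written = in[in[0] + s]; /* this set's bindings start */
    unsigned written_start = in[bindings_to_written + b];
    return in[written_start + i];
}

/* Array length of (set s, binding b); 1 for non-array descriptors. */
static unsigned di_array_size(const unsigned *in, unsigned s, unsigned b)
{
    return in[in[s + 1] + b];
}
```

Running these lookups against the worked example in the next section is a useful way to check the layout by hand.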
Example:
```
Assume Descriptor Set 0 looks like:      And Descriptor Set 1 looks like:
  Binding                                  Binding
    0    Array[3]                            2    Array[4]
    1    Non Array                           3    Array[5]
    3    Array[2]

Here is what the input buffer should look like:

  Word | Value | Meaning
 ------+-------+------------------------------------------------
    0  |  11   | sets_to_bindings starts at 11
    1  |   3   | set 0 sizes start at 3        (sets_to_sizes)
    2  |   7   | set 1 sizes start at 7
    3  |   3   | S0B0                          (sizes)
    4  |   1   | S0B1
    5  |   0   | S0B2
    6  |   2   | S0B3
    7  |   0   | S1B0
    8  |   0   | S1B1
    9  |   4   | S1B2
   10  |   5   | S1B3
   11  |  13   | set 0 bindings start at 13    (sets_to_bindings)
   12  |  17   | set 1 bindings start at 17
   13  |  21   | S0B0                          (bindings_to_written)
   14  |  24   | S0B1
   15  |   0   | S0B2
   16  |  25   | S0B3
   17  |   0   | S1B0
   18  |   0   | S1B1
   19  |  27   | S1B2
   20  |  31   | S1B3
   21  |   1   | S0B0I0 was written            (written)
   22  |   1   | S0B0I1 was written
   23  |   1   | S0B0I2 was written
   24  |   1   | S0B1 was written
   25  |   1   | S0B3I0 was written
   26  |   1   | S0B3I1 was written
   27  |   0   | S1B2I0 was not written
   28  |   1   | S1B2I1 was written
   29  |   1   | S1B2I2 was written
   30  |   1   | S1B2I3 was written
   31  |   1   | S1B3I0 was written
   32  |   1   | S1B3I1 was written
   33  |   1   | S1B3I2 was written
   34  |   1   | S1B3I3 was written
   35  |   1   | S1B3I4 was written
```
Alternatively, you could describe the array size and write state data as:
(set = s, binding = b, index = i) is not initialized if
```
Input[ i + Input[ b + Input[ s + Input[0] ] ] ] == 0
```
and the array's size = Input[ Input[ s + 1 ] + b ]

### Shader Instrumentation Input Record Format for buffer device address
The input buffer for buffer_reference accesses consists of all addresses retrieved from vkGetBufferDeviceAddressEXT and the sizes of the corresponding buffers.
The addresses should be sorted in ascending order.
```
Word 0: Index of start of buffer sizes (X+2)
Word 1: 0x0000000000000000
Word 2: Device address of first buffer
  .
  .
Word X: Device address of last buffer
Word X+1: 0xffffffffffffffff
Word X+2: 0 (size of pretend buffer at Word 1)
Word X+3: Size of first buffer
  .
  .
Word Y: Size of last buffer
Word Y+1: 0 (size of pretend buffer at Word X+1)
```

### Acceleration Structure Building Validation

Increasing performance of graphics hardware has made ray tracing a viable option for interactive rendering. The VK_NV_ray_tracing extension adds
ray tracing support to Vulkan. With this extension, applications create and build VkAccelerationStructureNV objects for their scene geometry,
which allows implementations to manage the scene geometry as it is traversed during a ray tracing query.

There are two types of acceleration structures: top level acceleration structures and bottom level acceleration structures. Bottom level acceleration
structures are for an array of geometries, and top level acceleration structures are for an array of instances of bottom level structures.

The acceleration structure building validation feature of the GPU validation layer validates that the bottom level acceleration structure references
found in the instance data used when building top level acceleration structures are valid.

#### Implementation

Because the instance data buffer used in vkCmdBuildAccelerationStructureNV could be a device local buffer and because commands are executed sometime
in the future, validating the instance buffer must take place on the GPU. To accomplish this, the GPU validation layer tracks the known valid handles
of bottom level acceleration structures at the time a command buffer is recorded and inserts an additional compute shader dispatch before commands
which build top level acceleration structures to inspect and validate the instance buffer used.
The compute shader iterates over the instance buffer
and replaces unrecognized bottom level acceleration structure handles with a prebuilt valid bottom level acceleration structure handle. Upon queue
submission and completion of the command buffer, the reported failures are read from a storage buffer written to by the compute shader and
reported to the application.

To help visualize, a command buffer that would originally have been recorded as:

```cpp
vkBeginCommandBuffer(...)

... other commands ...

vkCmdBuildAccelerationStructureNV(...) // build top level

... other commands ...

vkEndCommandBuffer(...)
```

would actually be recorded as:

```cpp
vkBeginCommandBuffer(...)

... other commands ...

vkCmdPipelineBarrier(...) // ensure writes to instance buffer have completed

vkCmdDispatch(...) // launch validation compute shader

vkCmdPipelineBarrier(...) // ensure validation compute shader writes have completed

vkCmdBuildAccelerationStructureNV(...) // build top level using modified instance buffer

... other commands ...

vkEndCommandBuffer(...)
```

## GPU-Assisted Buffer Access Validation

When GPU-Assisted Validation is active, either the descriptor indexing input buffer
(if descriptor indexing is enabled) or an input buffer of the same format without array
sizes is used to inform instrumented shaders of the size of each of the buffers the shader
may access. If the shader accesses a buffer beyond the declared length of the buffer, the
instrumentation will return an error to the validation layer. This checking applies to
all uniform and storage buffers. If a buffer access is found to be out of bounds, it will
not be performed. Instead, writes will be skipped, and reads will return 0.
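The skip/return-0 behavior can be modeled in a few lines of plain C. The names here are illustrative; the real checks are emitted into the shader by the SPIR-V instrumentation pass, which additionally writes an error record when a check fails.

```c
#include <assert.h>

/* An out-of-bounds read yields 0 instead of being performed. */
static unsigned guarded_read(const unsigned *buf, unsigned len_words, unsigned idx)
{
    if (idx >= len_words)
        return 0;
    return buf[idx];
}

/* An out-of-bounds write is skipped instead of being performed. */
static void guarded_write(unsigned *buf, unsigned len_words,
                          unsigned idx, unsigned value)
{
    if (idx >= len_words)
        return;
    buf[idx] = value;
}
```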
Note that this validation can be disabled by setting "khronos_validation.gpuav_buffer_oob = false"
in a vk_layer_settings.txt file. Note also that if a robust buffer access extension is enabled,
this buffer access checking is disabled, since such accesses become valid.

## GPU-Assisted Validation Testing

Validation Layer Tests (VLTs) exist for GPU-Assisted Validation.
They cannot be run with the "mock ICD" in headless CI environments because they need to
actually execute shaders.
But they are still useful to run on real devices to check for regressions.

There isn't anything else remarkable or different about these tests.
They activate GPU-Assisted Validation via the programmatic
interface as described earlier.

The tests exercise the extraction of source code information when the shader
is built with debug info.