#
7fa043ea |
| 26-Apr-2024 |
Dave Airlie <airlied@redhat.com> |
drm/i915: fix build with missing debugfs includes
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:213:9: error: call to undeclared function 'debugfs_create_file_unsafe'
drm/i915: fix build with missing debugfs includes
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:213:9: error: call to undeclared function 'debugfs_create_file_unsafe'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] return debugfs_create_file_unsafe(name, mode, parent, value, ^ /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:213:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct dentry *' [-Wint-conversion] return debugfs_create_file_unsafe(name, mode, parent, value, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:222:9: error: call to undeclared function 'debugfs_create_file_unsafe'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] return debugfs_create_file_unsafe(name, mode, parent, value, ^ /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:222:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct dentry *' [-Wint-conversion] return debugfs_create_file_unsafe(name, mode, parent, value, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Building with clang gave me a bunch of similiar fails to the above.
Fixes: 33d5ae6cacf4 ("drm/print: drop include debugfs.h and include where needed") Signed-off-by: Dave Airlie <airlied@redhat.com>
show more ...
|
#
48ba4a6d |
| 20-Mar-2024 |
Lucas De Marchi <lucas.demarchi@intel.com> |
drm/i915: Update IP_VER(12, 50)
With no platform using graphics/media IP_VER(12, 50), replace the checks throughout the code with IP_VER(12, 55) so the code makes sense by itself with no additional
drm/i915: Update IP_VER(12, 50)
With no platform using graphics/media IP_VER(12, 50), replace the checks throughout the code with IP_VER(12, 55) so the code makes sense by itself with no additional explanation of previous baggage.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
show more ...
|
#
3c0fa9f4 |
| 02-Feb-2024 |
Ville Syrjälä <ville.syrjala@linux.intel.com> |
drm/i915: Use struct resource for memory region IO as well
mem->region is a struct resource, but mem->io_start and mem->io_size are not for whatever reason. Let's unify this and convert the io stuff
drm/i915: Use struct resource for memory region IO as well
mem->region is a struct resource, but mem->io_start and mem->io_size are not for whatever reason. Let's unify this and convert the io stuff into a struct resource as well. Should make life a little less annoying when you don't have juggle between two different approaches all the time.
Mostly done using cocci (with manual tweaks at all the places where we mutate io_size by hand): @@ struct intel_memory_region *M; expression START, SIZE; @@ - M->io_start = START; - M->io_size = SIZE; + M->io = DEFINE_RES_MEM(START, SIZE);
@@ struct intel_memory_region *M; @@ - M->io_start + M->io.start
@@ struct intel_memory_region M; @@ - M.io_start + M.io.start
@@ expression M; @@ - M->io_size + resource_size(&M->io)
@@ expression M; @@ - M.io_size + resource_size(&M.io)
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Tested-by: Paz Zcharya <pazz@chromium.org> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202224340.30647-2-ville.syrjala@linux.intel.com
show more ...
|
#
f8e9325f |
| 10-Nov-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: update in-source bug filing URLs
The bug filing documentation has been moved from the gitlab wiki to gitlab pages at https://drm.pages.freedesktop.org/intel-docs/.
Cc: Joonas Lahtinen <jo
drm/i915: update in-source bug filing URLs
The bug filing documentation has been moved from the gitlab wiki to gitlab pages at https://drm.pages.freedesktop.org/intel-docs/.
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231110114807.3455739-2-jani.nikula@intel.com
show more ...
|
#
d5818410 |
| 31-Oct-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: move gpu error sysfs to i915_gpu_error.c
Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt conditional compilation, as i915_gpu_error.c is only built with DRM_I915_CAP
drm/i915: move gpu error sysfs to i915_gpu_error.c
Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt conditional compilation, as i915_gpu_error.c is only built with DRM_I915_CAPTURE_ERROR=y.
With this, we can also make i915_first_error_state() static.
Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-3-jani.nikula@intel.com
show more ...
|
#
4fca5198 |
| 31-Oct-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: move gpu error debugfs to i915_gpu_error.c
Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt conditional compilation, as i915_gpu_error.c is only built with DRM_I915_C
drm/i915: move gpu error debugfs to i915_gpu_error.c
Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt conditional compilation, as i915_gpu_error.c is only built with DRM_I915_CAPTURE_ERROR=y.
Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-2-jani.nikula@intel.com
show more ...
|
#
2efb81e5 |
| 31-Oct-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: make some error capture functions static
Not needed outside of i915_gpu_error.c.
Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: ht
drm/i915: make some error capture functions static
Not needed outside of i915_gpu_error.c.
Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-1-jani.nikula@intel.com
show more ...
|
#
7a61a6aa |
| 24-Oct-2023 |
Jouni Högander <jouni.hogander@intel.com> |
drm/i915/display: Dump also display parameters
GPU error dump contained all module parameters. If we are moving display parameters to intel_display_params.[ch] they are not dumped into GPU error dum
drm/i915/display: Dump also display parameters
GPU error dump contained all module parameters. If we are moving display parameters to intel_display_params.[ch] they are not dumped into GPU error dump. This patch is adding moved display parameters back to GPU error dump. Display parameters are also included in i915_capabilities
v2: Add parameters to i915_capabilities as well
Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Luca Coelho <luciano.coelho@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231024124109.384973-3-jouni.hogander@intel.com
show more ...
|
#
37d62359 |
| 28-Sep-2023 |
Nirmoy Das <nirmoy.das@intel.com> |
drm/i915/mtl: Skip MCR ops for ring fault register
On MTL GEN12_RING_FAULT_REG is not replicated so don't do mcr based operation for this register.
v2: use MEDIA_VER() instead of GRAPHICS_VER()(Mat
drm/i915/mtl: Skip MCR ops for ring fault register
On MTL GEN12_RING_FAULT_REG is not replicated so don't do mcr based operation for this register.
v2: use MEDIA_VER() instead of GRAPHICS_VER()(Matt). v3: s/"MEDIA_VER(i915) == 13"/"MEDIA_VER(i915) >= 13"(Matt) improve comment. v4: improve the comment further(Andi)
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230928130015.6758-4-nirmoy.das@intel.com
show more ...
|
#
b94c165e |
| 19-Sep-2023 |
Clint Taylor <clinton.a.taylor@intel.com> |
drm/i915/xe2lpd: Register DE_RRMR has been removed
Do not read DE_RRMR register after display version 20. This register contains display state information during GFX state dumps.
Bspec: 69456 Cc: A
drm/i915/xe2lpd: Register DE_RRMR has been removed
Do not read DE_RRMR register after display version 20. This register contains display state information during GFX state dumps.
Bspec: 69456 Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230919192128.2045154-9-lucas.demarchi@intel.com
show more ...
|
#
8874288c |
| 13-Sep-2023 |
Jouni Högander <jouni.hogander@intel.com> |
drm/i915: Remove runtime suspended boolean from intel_runtime_pm struct
It's not necessary to carry separate suspended status information in intel_runtime_pm struct as this information is already in
drm/i915: Remove runtime suspended boolean from intel_runtime_pm struct
It's not necessary to carry separate suspended status information in intel_runtime_pm struct as this information is already in underlying device structure. Remove it and use pm_runtime_suspended() to obtain suspended status information when needed.
Cc: Jani Nikula <jani.nikula@intel.com> Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230913100430.3433969-1-jouni.hogander@intel.com
show more ...
|
#
4ae7eb92 |
| 27-Jun-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: separate display info printing from the rest
Add new function intel_display_device_info_print() and print the display device info there instead of intel_device_info_print(). This also fixe
drm/i915: separate display info printing from the rest
Add new function intel_display_device_info_print() and print the display device info there instead of intel_device_info_print(). This also fixes the display runtime info printing to use the actual runtime info instead of the static defaults.
Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/30d4f93c58839bc9312b43423cd43bc0ef655a35.1687878757.git.jani.nikula@intel.com
show more ...
|
#
36dd2a6e |
| 17-Jun-2023 |
Sumitra Sharma <sumitraartsy@gmail.com> |
drm/i915: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchroniza
drm/i915: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots.
kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, can take pagefaults in a local kmap region and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch.
The mapping is kept thread local in the function “i915_vma_coredump_create” in i915_gpu_error.c
Therefore, replace kmap() with kmap_local_page().
Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sumitra Sharma <sumitraartsy@gmail.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230617180420.GA410966@sumitra.com [tursulin: Removed blank line within tags. Fixup commit text.]
show more ...
|
#
f8a101ff |
| 21-Jun-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
i915: convert i915_gpu_error to use a folio_batch
Remove one of the last remaining users of pagevec.
Link: https://lkml.kernel.org/r/20230621164557.3510324-9-willy@infradead.org Signed-off-by: Matt
i915: convert i915_gpu_error to use a folio_batch
Remove one of the last remaining users of pagevec.
Link: https://lkml.kernel.org/r/20230621164557.3510324-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
6197cff3 |
| 18-Apr-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Dump error capture to kernel log
This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is e
drm/i915: Dump error capture to kernel log
This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is especially useful for things like retrieving GuC logs as GuC operation can't be tracked by adding printk or ftrace entries.
v2: Add CONFIG_DRM_I915_DEBUG_GEM wrapper (review feedback by Rodrigo).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230418181744.3251240-2-John.C.Harrison@Intel.com
show more ...
|
#
9275277d |
| 09-May-2023 |
Fei Yang <fei.yang@intel.com> |
drm/i915: use pat_index instead of cache_level
Currently the KMD is using enum i915_cache_level to set caching policy for buffer objects. This is flaky because the PAT index which really controls th
drm/i915: use pat_index instead of cache_level
Currently the KMD is using enum i915_cache_level to set caching policy for buffer objects. This is flaky because the PAT index which really controls the caching behavior in PTE has far more levels than what's defined in the enum. In addition, the PAT index is platform dependent, having to translate between i915_cache_level and PAT index is not reliable, and makes the code more complicated.
From UMD's perspective there is also a necessity to set caching policy for performance fine tuning. It's much easier for the UMD to directly use PAT index because the behavior of each PAT index is clearly defined in Bspec. Having the abstracted i915_cache_level sitting in between would only cause more ambiguity. PAT is expected to work much like MOCS already works today, and by design userspace is expected to select the index that exactly matches the desired behavior described in the hardware specification.
For these reasons this patch replaces i915_cache_level with PAT index. Also note, the cache_level is not completely removed yet, because the KMD still has the need of creating buffer objects with simple cache settings such as cached, uncached, or writethrough. For kernel objects, cache_level is used for simplicity and backward compatibility. For Pre-gen12 platforms PAT can have 1:1 mapping to i915_cache_level, so these two are interchangeable. see the use of LEGACY_CACHELEVEL.
One consequence of this change is that gen8_pte_encode is no longer working for gen12 platforms due to the fact that gen12 platforms has different PAT definitions. In the meantime the mtl_pte_encode introduced specfically for MTL becomes generic for all gen12 platforms. This patch renames the MTL PTE encode function into gen12_pte_encode and apply it to all gen12. Even though this change looks unrelated, but separating them would temporarily break gen12 PTE encoding, thus squash them in one patch.
Special note: this patch changes the way caching behavior is controlled in the sense that some objects are left to be managed by userspace. For such objects we need to be careful not to change the userspace settings.There are kerneldoc and comments added around obj->cache_coherent, cache_dirty, and how to bypass the checkings by i915_gem_object_has_cache_level. For full understanding, these changes need to be looked at together with the two follow-up patches, one disables the {set|get}_caching ioctl's and the other adds set_pat extension to the GEM_CREATE uAPI.
Bspec: 63019
Cc: Chris Wilson <chris.p.wilson@linux.intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
show more ...
|
#
88629fee |
| 02-May-2023 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915/error: fix i915_capture_error_state() kernel-doc
drivers/gpu/drm/i915/i915_gpu_error.c:2174: warning: Function parameter or member 'dump_flags' not described in 'i915_capture_error_state'
drm/i915/error: fix i915_capture_error_state() kernel-doc
drivers/gpu/drm/i915/i915_gpu_error.c:2174: warning: Function parameter or member 'dump_flags' not described in 'i915_capture_error_state'
Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20524292b002800975d82d23b5bd47da878f1733.1683041799.git.jani.nikula@intel.com
show more ...
|
#
e4730ae4 |
| 28-Apr-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915/guc: Fix error capture for virtual engines
GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT]
drm/i915/guc: Fix error capture for virtual engines
GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D [drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388] [IGT] gem_exec_balancer: exiting, ret=0
The test causes a hang on both engines of a virtual engine context. The engine instance zero hang gets a valid error capture but the non-instance-zero hang does not.
Fix that by scanning through the list of pending register captures when a hang notification for a virtual engine is received. That way, the hang can be assigned to the correct physical engine prior to starting the error capture process. So later on, when the error capture handler tries to find the engine register list, it looks for one on the correct engine.
Also, sneak in a missing blank line before a comment in the node search code.
v2: Fix null pointer deref on non-GuC platforms.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com
show more ...
|
#
c8a76df6 |
| 11-Mar-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Include timeline seqno in error capture
The seqno value actually written out to memory is no longer in the regular HWSP. Instead, it is now in its own private timeline buffer. Thus, it is
drm/i915: Include timeline seqno in error capture
The seqno value actually written out to memory is no longer in the regular HWSP. Instead, it is now in its own private timeline buffer. Thus, it is no longer visible in an error capture. So, explicitly read the value and include that in the capture.
v2: %d -> %u (Alan)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-4-John.C.Harrison@Intel.com
show more ...
|
#
e7696d65 |
| 27-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Allow error capture of a pending request
A hang situation has been observed where the only requests on the context were either completed or not yet started according to the breaadcrumbs. H
drm/i915: Allow error capture of a pending request
A hang situation has been observed where the only requests on the context were either completed or not yet started according to the breaadcrumbs. However, the register state claimed a batch was (maybe) in progress. So, allow capture of the pending request on the grounds that this might be better than nothing.
v2: Reword 'not started' warning message (Tvrtko)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-6-John.C.Harrison@Intel.com
show more ...
|
#
e8a3319c |
| 27-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Allow error capture without a request
There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notific
drm/i915: Allow error capture without a request
There was a report of error captures occurring without any hung context being indicated despite the capture being initiated by a 'hung context notification' from GuC. The problem was not reproducible. However, it is possible to happen if the context in question has no active requests. For example, if the hang was in the context switch itself then the breadcrumb write would have occurred and the KMD would see an idle context.
In the interests of attempting to provide as much information as possible about a hang, it seems wise to include the engine info regardless of whether a request was found or not. As opposed to just prentending there was no hang at all.
So update the error capture code to always record engine information if a context is given. Which means updating record_context() to take a context instead of a request (which it only ever used to find the context anyway). And split the request agnostic parts of intel_engine_coredump_add_request() out into a seaprate function.
v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null pointer. v3: Tidy up request locking code flow (Tvrtko) v4: Pull in improved info message from next patch and fix up potential leak of GuC register state (Daniele)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> (v2) Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-5-John.C.Harrison@Intel.com
show more ...
|
#
a4be3dca |
| 27-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Fix up locking around dumping requests lists
The debugfs dump of requests was confused about what state requires the execlist lock versus the GuC lock. There was also a bunch of duplicated
drm/i915: Fix up locking around dumping requests lists
The debugfs dump of requests was confused about what state requires the execlist lock versus the GuC lock. There was also a bunch of duplicated messy code between it and the error capture code.
So refactor the hung request search into a re-usable function. And reduce the span of the execlist state lock to only the execlist specific code paths. In order to do that, also move the report of hold count (which is an execlist only concept) from the top level dump function to the lower level execlist specific function. Also, move the execlist specific code into the execlist source file.
v2: Rename some functions and move to more appropriate files (Daniele). v3: Rename new execlist dump function (Daniele)
Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-4-John.C.Harrison@Intel.com
show more ...
|
#
3700e353 |
| 27-Jan-2023 |
John Harrison <John.C.Harrison@Intel.com> |
drm/i915: Fix request ref counting during error capture & debugfs dump
When GuC support was added to error capture, the reference counting around the request object was broken. Fix it up.
The conte
drm/i915: Fix request ref counting during error capture & debugfs dump
When GuC support was added to error capture, the reference counting around the request object was broken. Fix it up.
The context based search manages the spinlocking around the search internally. So it needs to grab the reference count internally as well. The execlist only request based search relies on external locking, so it needs an external reference count but within the spinlock not outside it.
The only other caller of the context based search is the code for dumping engine state to debugfs. That code wasn't previously getting an explicit reference at all as it does everything while holding the execlist specific spinlock. So, that needs updaing as well as that spinlock doesn't help when using GuC submission. Rather than trying to conditionally get/put depending on submission model, just change it to always do the get/put.
v2: Explicitly document adding an extra blank line in some dense code (Andy Shevchenko). Fix multiple potential null pointer derefs in case of no request found (some spotted by Tvrtko, but there was more!). Also fix a leaked request in case of !started and another in __guc_reset_context now that intel_context_find_active_request is actually reference counting the returned request. v3: Add a _get suffix to intel_context_find_active_request now that it grabs a reference (Daniele). v4: Split the intel_guc_find_hung_context change to a separate patch and rename intel_context_find_active_request_get to intel_context_get_active_request (Tvrtko). v5: s/locking/reference counting/ in commit message (Tvrtko)
Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC") Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset") Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Andrzej Hajda <andrzej.hajda@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Michael Cheng <michael.cheng@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com
show more ...
|
#
801543b2 |
| 09-Nov-2022 |
Jani Nikula <jani.nikula@intel.com> |
drm/i915: stop including i915_irq.h from i915_trace.h
Turns out many of the files that need i915_reg.h get it implicitly via {display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h ->
drm/i915: stop including i915_irq.h from i915_trace.h
Turns out many of the files that need i915_reg.h get it implicitly via {display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h -> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h, makes sense to drop it, but that requires adding quite a few new includes all over the place.
Prefer including i915_reg.h where needed instead of adding another implicit include, because eventually we'll want to split up i915_reg.h and only include the specific registers at each place.
Also some places actually needed i915_irq.h too.
Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
show more ...
|
#
ab1b2d40 |
| 14-Oct-2022 |
Matt Roper <matthew.d.roper@intel.com> |
drm/i915/xehp: Check for faults on primary GAM
On Xe_HP the fault registers are now in a multicast register range. However as part of the GAM these registers follow special rules and we need only re
drm/i915/xehp: Check for faults on primary GAM
On Xe_HP the fault registers are now in a multicast register range. However as part of the GAM these registers follow special rules and we need only read from the "primary" GAM's instance to get the information we need. So a single intel_gt_mcr_read_any() (which will automatically steer to the primary GAM) is sufficient; we don't need to loop over each instance of the MCR register.
v2: - Update more instances of fault registers. (Bala)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-7-matthew.d.roper@intel.com
show more ...
|