History log of /linux/drivers/gpu/drm/i915/i915_gpu_error.c (Results 1 – 25 of 453)
Revision Date Author Comments
# 7fa043ea 26-Apr-2024 Dave Airlie <airlied@redhat.com>

drm/i915: fix build with missing debugfs includes

/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:213:9: error: call to undeclared function 'debugfs_create_file_unsafe'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
return debugfs_create_file_unsafe(name, mode, parent, value,
^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:213:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct dentry *' [-Wint-conversion]
return debugfs_create_file_unsafe(name, mode, parent, value,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:222:9: error: call to undeclared function 'debugfs_create_file_unsafe'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
return debugfs_create_file_unsafe(name, mode, parent, value,
^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/i915/i915_debugfs_params.c:222:9: error: incompatible integer to pointer conversion returning 'int' from a function with result type 'struct dentry *' [-Wint-conversion]
return debugfs_create_file_unsafe(name, mode, parent, value,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Building with clang gave me a bunch of similar failures to the above.

Fixes: 33d5ae6cacf4 ("drm/print: drop include debugfs.h and include where needed")
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 48ba4a6d 20-Mar-2024 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: Update IP_VER(12, 50)

With no platform using graphics/media IP_VER(12, 50), replace the
checks throughout the code with IP_VER(12, 55) so the code makes sense
by itself with no additional explanation of previous baggage.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>


# 3c0fa9f4 02-Feb-2024 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Use struct resource for memory region IO as well

mem->region is a struct resource, but mem->io_start and
mem->io_size are not for whatever reason. Let's unify this
and convert the io stuff into a struct resource as well.
Should make life a little less annoying when you don't have to
juggle between two different approaches all the time.

Mostly done using cocci (with manual tweaks at all the
places where we mutate io_size by hand):
@@
struct intel_memory_region *M;
expression START, SIZE;
@@
- M->io_start = START;
- M->io_size = SIZE;
+ M->io = DEFINE_RES_MEM(START, SIZE);

@@
struct intel_memory_region *M;
@@
- M->io_start
+ M->io.start

@@
struct intel_memory_region M;
@@
- M.io_start
+ M.io.start

@@
expression M;
@@
- M->io_size
+ resource_size(&M->io)

@@
expression M;
@@
- M.io_size
+ resource_size(&M.io)

Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Tested-by: Paz Zcharya <pazz@chromium.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202224340.30647-2-ville.syrjala@linux.intel.com


# f8e9325f 10-Nov-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: update in-source bug filing URLs

The bug filing documentation has been moved from the gitlab wiki to
gitlab pages at https://drm.pages.freedesktop.org/intel-docs/.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110114807.3455739-2-jani.nikula@intel.com


# d5818410 31-Oct-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: move gpu error sysfs to i915_gpu_error.c

Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt
conditional compilation, as i915_gpu_error.c is only built with
DRM_I915_CAPTURE_ERROR=y.

With this, we can also make i915_first_error_state() static.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-3-jani.nikula@intel.com


# 4fca5198 31-Oct-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: move gpu error debugfs to i915_gpu_error.c

Hide gpu error specifics in i915_gpu_error.c. This is also cleaner wrt
conditional compilation, as i915_gpu_error.c is only built with
DRM_I915_CAPTURE_ERROR=y.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-2-jani.nikula@intel.com


# 2efb81e5 31-Oct-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: make some error capture functions static

Not needed outside of i915_gpu_error.c.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231031124502.1772160-1-jani.nikula@intel.com


# 7a61a6aa 24-Oct-2023 Jouni Högander <jouni.hogander@intel.com>

drm/i915/display: Dump also display parameters

GPU error dump contained all module parameters. If we move the
display parameters to intel_display_params.[ch], they are no longer
dumped into the GPU error dump. This patch adds the moved display
parameters back to the GPU error dump. Display parameters are also
included in i915_capabilities.

v2: Add parameters to i915_capabilities as well

Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231024124109.384973-3-jouni.hogander@intel.com


# 37d62359 28-Sep-2023 Nirmoy Das <nirmoy.das@intel.com>

drm/i915/mtl: Skip MCR ops for ring fault register

On MTL GEN12_RING_FAULT_REG is not replicated so don't
do mcr based operation for this register.

v2: use MEDIA_VER() instead of GRAPHICS_VER()(Matt).
v3: s/"MEDIA_VER(i915) == 13"/"MEDIA_VER(i915) >= 13"(Matt)
improve comment.
v4: improve the comment further(Andi)

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230928130015.6758-4-nirmoy.das@intel.com


# b94c165e 19-Sep-2023 Clint Taylor <clinton.a.taylor@intel.com>

drm/i915/xe2lpd: Register DE_RRMR has been removed

Do not read DE_RRMR register after display version 20. This register
contains display state information during GFX state dumps.

Bspec: 69456
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230919192128.2045154-9-lucas.demarchi@intel.com


# 8874288c 13-Sep-2023 Jouni Högander <jouni.hogander@intel.com>

drm/i915: Remove runtime suspended boolean from intel_runtime_pm struct

It's not necessary to carry separate suspended status information in
intel_runtime_pm struct as this information is already in underlying device
structure. Remove it and use pm_runtime_suspended() to obtain suspended
status information when needed.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>

Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230913100430.3433969-1-jouni.hogander@intel.com


# 4ae7eb92 27-Jun-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: separate display info printing from the rest

Add new function intel_display_device_info_print() and print the display
device info there instead of intel_device_info_print(). This also fixes
the display runtime info printing to use the actual runtime info instead
of the static defaults.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/30d4f93c58839bc9312b43423cd43bc0ef655a35.1687878757.git.jani.nikula@intel.com


# 36dd2a6e 17-Jun-2023 Sumitra Sharma <sumitraartsy@gmail.com>

drm/i915: Replace kmap() with kmap_local_page()

kmap() has been deprecated in favor of the kmap_local_page()
due to high cost, restricted mapping space, the overhead of a
global lock for synchronization, and making the process sleep
in the absence of free slots.

kmap_local_page() is faster than kmap() and offers thread-local
and CPU-local mappings, can take pagefaults in a local kmap region
and preserves preemption by saving the mappings of outgoing tasks
and restoring those of the incoming one during a context switch.

The mapping is kept thread local in the function
“i915_vma_coredump_create” in i915_gpu_error.c.

Therefore, replace kmap() with kmap_local_page().

Suggested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sumitra Sharma <sumitraartsy@gmail.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230617180420.GA410966@sumitra.com
[tursulin: Removed blank line within tags. Fixup commit text.]


# f8a101ff 21-Jun-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

i915: convert i915_gpu_error to use a folio_batch

Remove one of the last remaining users of pagevec.

Link: https://lkml.kernel.org/r/20230621164557.3510324-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6197cff3 18-Apr-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Dump error capture to kernel log

This is useful for getting debug information out in certain
situations, such as failing kernel selftests and CI runs that don't
log error captures. It is especially useful for things like retrieving
GuC logs as GuC operation can't be tracked by adding printk or ftrace
entries.

v2: Add CONFIG_DRM_I915_DEBUG_GEM wrapper (review feedback by Rodrigo).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230418181744.3251240-2-John.C.Harrison@Intel.com


# 9275277d 09-May-2023 Fei Yang <fei.yang@intel.com>

drm/i915: use pat_index instead of cache_level

Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.

From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity. PAT is expected to work much like MOCS already works today,
and by design userspace is expected to select the index that exactly
matches the desired behavior described in the hardware specification.

For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For kernel objects, cache_level is used
for simplicity and backward compatibility. For pre-gen12 platforms PAT can
have 1:1 mapping to i915_cache_level, so these two are interchangeable. See
the use of LEGACY_CACHELEVEL.

One consequence of this change is that gen8_pte_encode no longer works
for gen12 platforms, because gen12 platforms have different PAT
definitions. Meanwhile, mtl_pte_encode, introduced specifically for
MTL, becomes generic for all gen12 platforms. This patch renames the MTL
PTE encode function to gen12_pte_encode and applies it to all gen12. Even
though this change looks unrelated, separating them would temporarily
break gen12 PTE encoding, so they are squashed into one patch.

Special note: this patch changes the way caching behavior is controlled in
the sense that some objects are left to be managed by userspace. For such
objects we need to be careful not to change the userspace settings. There
are kerneldoc and comments added around obj->cache_coherent, cache_dirty,
and how to bypass the checks done by i915_gem_object_has_cache_level. For
full understanding, these changes need to be looked at together with the
two follow-up patches, one disables the {set|get}_caching ioctl's and the
other adds set_pat extension to the GEM_CREATE uAPI.

Bspec: 63019

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com


# 88629fee 02-May-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915/error: fix i915_capture_error_state() kernel-doc

drivers/gpu/drm/i915/i915_gpu_error.c:2174: warning: Function parameter or member 'dump_flags' not described in 'i915_capture_error_state'

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20524292b002800975d82d23b5bd47da878f1733.1683041799.git.jani.nikula@intel.com


# e4730ae4 28-Apr-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915/guc: Fix error capture for virtual engines

GuC based register dumps in error capture logs were basically broken
for virtual engines. This can be seen in igt@gem_exec_balancer@hang:
[IGT] gem_exec_balancer: starting subtest hang
[drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388]
[drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D
[drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388]
[IGT] gem_exec_balancer: exiting, ret=0

The test causes a hang on both engines of a virtual engine context.
The engine instance zero hang gets a valid error capture but the
non-instance-zero hang does not.

Fix that by scanning through the list of pending register captures
when a hang notification for a virtual engine is received. That way,
the hang can be assigned to the correct physical engine prior to
starting the error capture process. So later on, when the error capture
handler tries to find the engine register list, it looks for one on
the correct engine.

Also, sneak in a missing blank line before a comment in the node
search code.

v2: Fix null pointer deref on non-GuC platforms.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com


# c8a76df6 11-Mar-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Include timeline seqno in error capture

The seqno value actually written out to memory is no longer in the
regular HWSP. Instead, it is now in its own private timeline buffer.
Thus, it is no longer visible in an error capture. So, explicitly read
the value and include that in the capture.

v2: %d -> %u (Alan)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230311063714.570389-4-John.C.Harrison@Intel.com


# e7696d65 27-Jan-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Allow error capture of a pending request

A hang situation has been observed where the only requests on the
context were either completed or not yet started according to the
breadcrumbs. However, the register state claimed a batch was (maybe)
in progress. So, allow capture of the pending request on the grounds
that this might be better than nothing.

v2: Reword 'not started' warning message (Tvrtko)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-6-John.C.Harrison@Intel.com


# e8a3319c 27-Jan-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Allow error capture without a request

There was a report of error captures occurring without any hung
context being indicated despite the capture being initiated by a 'hung
context notification' from GuC. The problem was not reproducible.
However, it is possible to happen if the context in question has no
active requests. For example, if the hang was in the context switch
itself then the breadcrumb write would have occurred and the KMD would
see an idle context.

In the interests of attempting to provide as much information as
possible about a hang, it seems wise to include the engine info
regardless of whether a request was found or not, rather than just
pretending there was no hang at all.

So update the error capture code to always record engine information
if a context is given. This means updating record_context() to take a
context instead of a request (which it only ever used to find the
context anyway), and splitting the request-agnostic parts of
intel_engine_coredump_add_request() out into a separate function.

v2: Remove a duplicate 'if' statement (Umesh) and fix a put of a null
pointer.
v3: Tidy up request locking code flow (Tvrtko)
v4: Pull in improved info message from next patch and fix up potential
leak of GuC register state (Daniele)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> (v2)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-5-John.C.Harrison@Intel.com


# a4be3dca 27-Jan-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Fix up locking around dumping requests lists

The debugfs dump of requests was confused about what state requires
the execlist lock versus the GuC lock. There was also a bunch of
duplicated messy code between it and the error capture code.

So refactor the hung request search into a re-usable function. And
reduce the span of the execlist state lock to only the execlist
specific code paths. In order to do that, also move the report of hold
count (which is an execlist only concept) from the top level dump
function to the lower level execlist specific function. Also, move the
execlist specific code into the execlist source file.

v2: Rename some functions and move to more appropriate files (Daniele).
v3: Rename new execlist dump function (Daniele)

Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-4-John.C.Harrison@Intel.com


# 3700e353 27-Jan-2023 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Fix request ref counting during error capture & debugfs dump

When GuC support was added to error capture, the reference counting
around the request object was broken. Fix it up.

The context based search manages the spinlocking around the search
internally. So it needs to grab the reference count internally as
well. The execlist only request based search relies on external
locking, so it needs an external reference count but within the
spinlock not outside it.

The only other caller of the context based search is the code for
dumping engine state to debugfs. That code wasn't previously getting
an explicit reference at all as it does everything while holding the
execlist specific spinlock. So, that needs updating as well, as that
spinlock doesn't help when using GuC submission. Rather than trying to
conditionally get/put depending on submission model, just change it to
always do the get/put.

v2: Explicitly document adding an extra blank line in some dense code
(Andy Shevchenko). Fix multiple potential null pointer derefs in case
of no request found (some spotted by Tvrtko, but there was more!).
Also fix a leaked request in case of !started and another in
__guc_reset_context now that intel_context_find_active_request is
actually reference counting the returned request.
v3: Add a _get suffix to intel_context_find_active_request now that it
grabs a reference (Daniele).
v4: Split the intel_guc_find_hung_context change to a separate patch
and rename intel_context_find_active_request_get to
intel_context_get_active_request (Tvrtko).
v5: s/locking/reference counting/ in commit message (Tvrtko)

Fixes: dc0dad365c5e ("drm/i915/guc: Fix for error capture after full GPU reset with GuC")
Fixes: 573ba126aef3 ("drm/i915/guc: Capture error state on context reset")
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Michael Cheng <michael.cheng@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127002842.3169194-3-John.C.Harrison@Intel.com


# 801543b2 09-Nov-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: stop including i915_irq.h from i915_trace.h

Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.

Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.

Also some places actually needed i915_irq.h too.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com


# ab1b2d40 14-Oct-2022 Matt Roper <matthew.d.roper@intel.com>

drm/i915/xehp: Check for faults on primary GAM

On Xe_HP the fault registers are now in a multicast register range.
However as part of the GAM these registers follow special rules and we
need only read from the "primary" GAM's instance to get the information
we need. So a single intel_gt_mcr_read_any() (which will automatically
steer to the primary GAM) is sufficient; we don't need to loop over each
instance of the MCR register.

v2:
- Update more instances of fault registers. (Bala)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014230239.1023689-7-matthew.d.roper@intel.com
