History log of /linux/drivers/accel/habanalabs/ (Results 251 – 275 of 279)
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
75b6984e10-Jan-2023 Ofir Bitton <obitton@habana.ai>

habanalabs: optimize command submission completion timestamp

Completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
i

habanalabs: optimize command submission completion timestamp

Completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
is not accurate. Hence, we will take the timestamp in the interrupt
handler itself while propagating it to the release function.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

9a7d530a16-Jan-2023 Ofir Bitton <obitton@habana.ai>

habanalabs: refactor user interrupt type

In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.

Signed-off-by: Ofir Bitton <ob

habanalabs: refactor user interrupt type

In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

12d3ea0116-Jan-2023 Dani Liberman <dliberman@habana.ai>

habanalabs/gaudi2: fix emda range registers razwi handling

Handling edma razwi is different than all other engines since edma
uses sft routers. For hbw transactions sft router contain separate
inter

habanalabs/gaudi2: fix emda range registers razwi handling

Handling edma razwi is different than all other engines since edma
uses sft routers. For hbw transactions sft router contain separate
interface for each edma and for lbw there is common interface for
both edma engines of the same dcore.

To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since edma qman doesn't
reports axi error response.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

a6685b5711-Jan-2023 Koby Elbaz <kelbaz@habana.ai>

habanalabs: block soft-reset on an unusable device

A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-res

habanalabs: block soft-reset on an unusable device

A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset),
and reset upon device release.

A hard-reset is the only way that an unusable device can change its
status. All other reset procedures can't put the device in a reset
procedure, which might ultimately cause the device to change its
status, unintentionally, to become operational again.

Such a scenario has recently occurred, when a user requested
a hard-reset while another heavy user workload was ongoing (reset
request is queued).
Since the workload couldn't finish within reset's timeout limits, the
reset has failed and set a device status malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, hence followed by an additional hard-reset that changed the
ASICs status back to be operational.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

4364795210-Jan-2023 Dani Liberman <dliberman@habana.ai>

habanalabs/gaudi2: print page fault axi transaction id

AXI transaction id holds information about the initiator which caused
the page fault. In the future it will be translated automatically by
driv

habanalabs/gaudi2: print page fault axi transaction id

AXI transaction id holds information about the initiator which caused
the page fault. In the future it will be translated automatically by
driver to an initiator name.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

c89d19f711-Jan-2023 Dani Liberman <dliberman@habana.ai>

habanalabe/gaudi2: add cfg base when displaying razwi addresses

Captured addresses of low b/w razwi information contains only the
offset from the cfg base. To make it more user readable, add the cfg

habanalabe/gaudi2: add cfg base when displaying razwi addresses

Captured addresses of low b/w razwi information contains only the
offset from the cfg base. To make it more user readable, add the cfg
base to it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

c21f9f3405-Jan-2023 Dani Liberman <dliberman@habana.ai>

habanalabs/gaudi2: read mmio razwi information

In gaudi2 there night be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same rou

habanalabs/gaudi2: read mmio razwi information

In gaudi2 there night be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same router for high b/w and low b/w.

Fixed it by reading the information also from low b/w routers.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

ac5af99010-Jan-2023 farah kassabri <fkassabri@habana.ai>

habanalabs: fix bug in timestamps registration code

Protect re-using the same timestamp buffer record before actually
adding it to the to interrupt wait list.
Mark ts buff offset as in use in the sp

habanalabs: fix bug in timestamps registration code

Protect re-using the same timestamp buffer record before actually
adding it to the to interrupt wait list.
Mark ts buff offset as in use in the spinlock protection area of the
interrupt wait list to avoid getting in the re-use section in
ts_buff_get_kernel_ts_record before adding the node to the list.
this scenario might happen when multiple threads are racing on
same offset and one thread could set data in the ts buff in
ts_buff_get_kernel_ts_record then the other thread takes over
and get to ts_buff_get_kernel_ts_record and we will try
to re-use the same ts buff offset then we will try to
delete a non existing node from the list.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

1693fef908-Jan-2023 farah kassabri <fkassabri@habana.ai>

habanalabs: bugs fixes in timestamps buff alloc

use argument instead of fixed GFP value for allocation
in Timestamps buffers alloc function.
change data type of size to size_t.

Fixes: 9158bf69e74f

habanalabs: bugs fixes in timestamps buff alloc

use argument instead of fixed GFP value for allocation
in Timestamps buffers alloc function.
change data type of size to size_t.

Fixes: 9158bf69e74f ("habanalabs: Timestamps buffers registration")
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

72848de003-Jan-2023 farah kassabri <fkassabri@habana.ai>

habanalabs: check pad and reserved fields in ioctls

Make sure all reserved/pad fields in uapi input structures
are set to 0.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Ga

habanalabs: check pad and reserved fields in ioctls

Make sure all reserved/pad fields in uapi input structures
are set to 0.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

0970380a10-Jan-2023 XU pengfei <xupengfei@nfschina.com>

habanalabs: remove unnecessary (void*) conversions

data is a void * type and does not require a cast.

Signed-off-by: XU pengfei <xupengfei@nfschina.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org

habanalabs: remove unnecessary (void*) conversions

data is a void * type and does not require a cast.

Signed-off-by: XU pengfei <xupengfei@nfschina.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

7d352a8110-Jan-2023 Gustavo A. R. Silva <gustavoars@kernel.org>

habanalabs: Replace zero-length arrays with flexible-array members

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length

habanalabs: Replace zero-length arrays with flexible-array members

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
arrays in a couple of structures with flex-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1]
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2]
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

2a0a839b29-Dec-2022 Moti Haimovski <mhaimovski@habana.ai>

habanalabs: extend fatal messages to contain PCI info

This commit attaches the PCI device address to driver fatal messages
in order to ease debugging in multi-device setups.

Signed-off-by: Moti Hai

habanalabs: extend fatal messages to contain PCI info

This commit attaches the PCI device address to driver fatal messages
in order to ease debugging in multi-device setups.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

eaca606e03-Jan-2023 Dani Liberman <dliberman@habana.ai>

habanalabs/gaudi2: remove use of razwi info received from f/w

Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible to check if razwi happened a

habanalabs/gaudi2: remove use of razwi info received from f/w

Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible to check if razwi happened and to
collect razwi data.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

54fcb38430-Nov-2022 Ohad Sharabi <osharabi@habana.ai>

habanalabs: trace LBW reads/writes

Add traces to LBW reads/writes.
This may be handy when debugging configuration failure or events when
tracking configuration flow.

Signed-off-by: Ohad Sharabi <os

habanalabs: trace LBW reads/writes

Add traces to LBW reads/writes.
This may be handy when debugging configuration failure or events when
tracking configuration flow.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

200f3cf004-Jan-2023 Carmit Carmel <ccarmel@habana.ai>

habanalabs/gaudi2: fix log for sob value overflow/underflow

The value in SM_SEI_CAUSE includes the SOB index and not the SOB group
index.
Remove usage of log_mask in sm_sei_cause structure as it was

habanalabs/gaudi2: fix log for sob value overflow/underflow

The value in SM_SEI_CAUSE includes the SOB index and not the SOB group
index.
Remove usage of log_mask in sm_sei_cause structure as it was never
used.

Signed-off-by: Carmit Carmel <ccarmel@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

ab509d8102-Jan-2023 Ohad Sharabi <osharabi@habana.ai>

habanalabs: add set engines masks ASIC function

This function shall be used whenever components enable/binning masks
should be updated.

Usage is in one of the below cases:
- update user (or default

habanalabs: add set engines masks ASIC function

This function shall be used whenever components enable/binning masks
should be updated.

Usage is in one of the below cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

571d1a7223-Dec-2022 Koby Elbaz <kelbaz@habana.ai>

habanalabs: protect access to dynamic mem 'user_mappings'

When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from
a dynamically allocated memory - 'user_mappings'.
Since freeing/allocating

habanalabs: protect access to dynamic mem 'user_mappings'

When HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from
a dynamically allocated memory - 'user_mappings'.
Since freeing/allocating it happens in runtime (upon a page fault),
it not unlikely to access it even before being initially allocated
(i.e., accessing a NULL pointer).

The solution is to simply mark the spot when the err info has been
collected, and that way to know whether err info (either page fault
or RAZWI) is available to be read.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

c7d7b9ac07-Jan-2023 Tom Rix <trix@redhat.com>

habanalabs: remove redundant memset

From reviewing the code, the line
memset(kdata, 0, usize);
is not needed because kdata is either zeroed by
kdata = kzalloc(asize, GFP_KERNEL);
when allocated

habanalabs: remove redundant memset

From reviewing the code, the line
memset(kdata, 0, usize);
is not needed because kdata is either zeroed by
kdata = kzalloc(asize, GFP_KERNEL);
when allocated at runtime or by
char stack_kdata[128] = {0};
at compile time.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

78baccbd25-Dec-2022 Koby Elbaz <kelbaz@habana.ai>

habanalabs: refactor razwi/page-fault information structures

This refactor makes the code clearer and the new variables' names
better describe their roles.

Signed-off-by: Koby Elbaz <kelbaz@habana.

habanalabs: refactor razwi/page-fault information structures

This refactor makes the code clearer and the new variables' names
better describe their roles.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

6cfb001321-Dec-2022 Koby Elbaz <kelbaz@habana.ai>

habanalabs/gaudi2: avoid reconfiguring the same PB registers

It appears that, within the sync manager security configuration,
we reconfigure PB registers over and over without any need to do that.

habanalabs/gaudi2: avoid reconfiguring the same PB registers

It appears that, within the sync manager security configuration,
we reconfigure PB registers over and over without any need to do that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

4083697a25-Dec-2022 Ofir Bitton <obitton@habana.ai>

habanalabs/gaudi: allow device acquire while in debug mode

During device acquire, the driver is using a QMAN for clearing some
registers. In order to avoid internal races, the driver verifies
the de

habanalabs/gaudi: allow device acquire while in debug mode

During device acquire, the driver is using a QMAN for clearing some
registers. In order to avoid internal races, the driver verifies
the device is idle before submitting the register clear job.

This check introduces an issue, as debug mode will cause the device
to be non-idle which will lead to device acquire failure.

In order to overcome this issue we can entirely remove the idle
check as the driver is using the QMAN only when there is no active
context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

e1e8e74722-Dec-2022 Oded Gabbay <ogabbay@kernel.org>

habanalabs: move some prints to debug level

When entering an IOCTL, the driver prints a message in case device is
not operational. This message should be printed in debug level as
it can spam the ke

habanalabs: move some prints to debug level

When entering an IOCTL, the driver prints a message in case device is
not operational. This message should be printed in debug level as
it can spam the kernel log and it is not an error.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

show more ...

139dad0421-Dec-2022 Oded Gabbay <ogabbay@kernel.org>

habanalabs: update f/w files

Update common firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

bcace6f021-Dec-2022 Oded Gabbay <ogabbay@kernel.org>

habanalabs/gaudi2: update f/w files

Update gaudi2 firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>

1...<<1112