75b6984e | 10-Jan-2023 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: optimize command submission completion timestamp
The completion timestamp is taken during the actual command submission release. As the release happens in a work queue, the timestamp taken there does not reflect when the submission actually completed. Hence, we take the timestamp in the interrupt handler itself and propagate it to the release function.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
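A minimal user-space sketch of the idea, assuming a hypothetical `cs_sketch` structure and hypothetical handler names (not the driver's real API): the clock is sampled once in the interrupt path, and the deferred release only reads that stored value instead of sampling the clock again.

```c
#include <stdint.h>

/* Hypothetical, simplified command submission; only the timestamp flow
 * described in the commit is modeled here. */
struct cs_sketch {
	uint64_t completion_ts;	/* sampled in the interrupt handler */
	uint64_t reported_ts;	/* value reported by the deferred release */
	int completed;
};

/* Interrupt path: sample the clock as close to the completion as possible.
 * In the driver, the release work would be queued after this point. */
static void cs_irq_handler(struct cs_sketch *cs, uint64_t now)
{
	cs->completion_ts = now;
	cs->completed = 1;
}

/* Work-queue path: runs later, so 'now' may lag the real completion.
 * Report the timestamp captured in the IRQ instead of sampling again. */
static void cs_release_work(struct cs_sketch *cs, uint64_t now)
{
	(void)now;	/* intentionally unused */
	if (cs->completed)
		cs->reported_ts = cs->completion_ts;
}
```

For example, if the interrupt fires at t=1000 and the work item only runs at t=1750, the release still reports 1000.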
|
9a7d530a | 16-Jan-2023 |
Ofir Bitton <obitton@habana.ai> |
habanalabs: refactor user interrupt type
In order to support more user interrupt types in the future, we enumerate the user interrupt type instead of using a boolean.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
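A short sketch of the boolean-to-enum refactor, assuming hypothetical names (`user_interrupt_type`, `USER_INTERRUPT_CQ`, `USER_INTERRUPT_DECODER`); the driver's actual identifiers may differ. It shows why an enumeration extends more cleanly than a flag.

```c
/* Before: a boolean can distinguish only two kinds of user interrupt, e.g.
 *     struct user_interrupt { bool is_decoder; unsigned int interrupt_id; };
 * After: an enumeration leaves room for more types. All names here are
 * hypothetical, not the driver's actual identifiers. */
enum user_interrupt_type {
	USER_INTERRUPT_CQ = 0,
	USER_INTERRUPT_DECODER,
	/* future interrupt types are appended here */
};

struct user_interrupt {
	enum user_interrupt_type type;
	unsigned int interrupt_id;
};

/* A switch over the enum grows with new types, unlike an if/else on a bool. */
static const char *user_interrupt_type_name(const struct user_interrupt *intr)
{
	switch (intr->type) {
	case USER_INTERRUPT_CQ:
		return "cq";
	case USER_INTERRUPT_DECODER:
		return "decoder";
	}
	return "unknown";
}
```

With GCC's -Wswitch (enabled by -Wall), adding a new enumerator without handling it in such a switch produces a warning, which a boolean flag cannot offer.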
|
12d3ea01 | 16-Jan-2023 |
Dani Liberman <dliberman@habana.ai> |
habanalabs/gaudi2: fix emda range registers razwi handling
Handling edma razwi differs from all other engines, since edma uses sft routers. For hbw transactions, the sft router contains a separate interface for each edma, while for lbw there is a common interface for both edma engines of the same dcore.
To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since edma qman doesn't report an axi error response.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
a6685b57 | 11-Jan-2023 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: block soft-reset on an unusable device
A device status of malfunction indicates that the device can't be used. In such a case we do not support certain reset types, e.g., all kinds of soft-resets (compute reset, inference soft-reset), and reset upon device release.
A hard-reset is the only way that an unusable device can change its status. All other reset procedures must not be run on such a device, as they might ultimately cause it to change its status, unintentionally, back to operational.
Such a scenario recently occurred when a user requested a hard-reset while another heavy user workload was ongoing (the reset request was queued). Since the workload couldn't finish within the reset's timeout limits, the reset failed and set the device status to malfunction. Eventually, when the user released the FD, an unsuccessful soft-reset occurred, followed by an additional hard-reset that changed the ASIC's status back to operational.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
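The gating rule described above can be sketched as a single predicate. This is a user-space illustration with hypothetical enum and function names, not the driver's reset code: on a malfunctioning device only a hard reset is accepted.

```c
#include <stdbool.h>

/* Hypothetical device states and reset kinds, for illustration only. */
enum dev_status { DEV_OPERATIONAL, DEV_MALFUNCTION };

enum reset_kind {
	RESET_HARD,
	RESET_COMPUTE,		/* a soft-reset */
	RESET_INFERENCE_SOFT,	/* a soft-reset */
	RESET_UPON_DEV_RELEASE,
};

/*
 * Only a hard reset may run on an unusable (malfunctioning) device. Every
 * other reset kind is rejected, so it cannot move the device back to an
 * operational status by accident.
 */
static bool reset_allowed(enum dev_status status, enum reset_kind kind)
{
	if (status == DEV_MALFUNCTION)
		return kind == RESET_HARD;
	return true;
}
```

In the scenario from the commit message, the soft-reset triggered by releasing the FD would be refused here, so no follow-up hard-reset could silently restore the operational status.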
|
43647952 | 10-Jan-2023 |
Dani Liberman <dliberman@habana.ai> |
habanalabs/gaudi2: print page fault axi transaction id
The AXI transaction id holds information about the initiator that caused the page fault. In the future, the driver will automatically translate it to an initiator name.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
c89d19f7 | 11-Jan-2023 |
Dani Liberman <dliberman@habana.ai> |
habanalabe/gaudi2: add cfg base when displaying razwi addresses
Captured addresses of low b/w razwi information contain only the offset from the cfg base. To make them more user readable, add the cfg base to them.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
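The address fix-up amounts to one addition. In this sketch `EXAMPLE_CFG_BASE` is a made-up value and `razwi_lbw_display_addr()` a hypothetical helper; the real cfg base is ASIC-specific and defined by the driver.

```c
#include <stdint.h>

/* Hypothetical base value for illustration; the real cfg base is
 * ASIC-specific and defined by the driver. */
#define EXAMPLE_CFG_BASE 0x1000000000ULL

/* LBW razwi capture registers hold only an offset from the cfg base, so the
 * address shown to the user is base + offset. */
static uint64_t razwi_lbw_display_addr(uint32_t captured_offset)
{
	return EXAMPLE_CFG_BASE + captured_offset;
}
```

For instance, a captured offset of 0x4A0 would be printed as 0x10000004A0 under this assumed base.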
|
c21f9f34 | 05-Jan-2023 |
Dani Liberman <dliberman@habana.ai> |
habanalabs/gaudi2: read mmio razwi information
In gaudi2 there might be different routers for low b/w and high b/w transactions. However, the code that collects razwi information used the same router for both high b/w and low b/w.
Fix it by also reading the information from the low b/w routers.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
ac5af990 | 10-Jan-2023 |
farah kassabri <fkassabri@habana.ai> |
habanalabs: fix bug in timestamps registration code
Protect against re-using the same timestamp buffer record before it is actually added to the interrupt wait list. Mark the ts buff offset as in use inside the spinlock-protected area of the interrupt wait list, to avoid entering the re-use section in ts_buff_get_kernel_ts_record before the node is added to the list. This scenario might happen when multiple threads race on the same offset: one thread sets data in the ts buff in ts_buff_get_kernel_ts_record, then another thread takes over, reaches ts_buff_get_kernel_ts_record and tries to re-use the same ts buff offset, and then tries to delete a non-existing node from the list.
Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
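A minimal user-space sketch of the locking rule, assuming a pthread mutex in place of the driver's spinlock and a hypothetical singly linked wait list. `ts_record_register()` and the other names are illustrative, not the driver's code. The point is that the in-use check, the re-use unlink, and the insertion happen in one critical section.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified timestamp-buffer record and wait list. */
struct ts_record {
	bool in_use;			/* offset already registered */
	struct ts_record *next;		/* link in the wait list */
};

struct wait_list {
	pthread_mutex_t lock;		/* stands in for the driver's spinlock */
	struct ts_record *head;
};

/* Remove rec from the list; returns false if it was not found. */
static bool wait_list_unlink(struct wait_list *wl, struct ts_record *rec)
{
	struct ts_record **pp;

	for (pp = &wl->head; *pp; pp = &(*pp)->next) {
		if (*pp == rec) {
			*pp = rec->next;
			rec->next = NULL;
			return true;
		}
	}
	return false;
}

/*
 * Because in_use is set and the node is linked under the same lock,
 * "in_use" always implies "on the list", so the re-use path never tries
 * to delete a node that was not added yet.
 * Returns true if an existing registration was re-used.
 */
static bool ts_record_register(struct wait_list *wl, struct ts_record *rec)
{
	bool reused = false;

	pthread_mutex_lock(&wl->lock);
	if (rec->in_use)
		reused = wait_list_unlink(wl, rec);	/* node is guaranteed present */
	rec->in_use = true;
	rec->next = wl->head;
	wl->head = rec;
	pthread_mutex_unlock(&wl->lock);
	return reused;
}
```

If the in_use flag were instead set outside the lock, a second thread could see it set and call the unlink before the first thread had linked the node. That is the non-existing-node deletion the commit describes.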
|
1693fef9 | 08-Jan-2023 |
farah kassabri <fkassabri@habana.ai> |
habanalabs: bugs fixes in timestamps buff alloc
Use the argument instead of a fixed GFP value for allocation in the timestamps buffers alloc function. Change the data type of size to size_t.
Fixes: 9158bf69e74f ("habanalabs: Timestamps buffers registration") Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
72848de0 | 03-Jan-2023 |
farah kassabri <fkassabri@habana.ai> |
habanalabs: check pad and reserved fields in ioctls
Make sure all reserved/pad fields in uapi input structures are set to 0.
Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
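A sketch of the zero-check for reserved and pad fields, using a hypothetical uapi-style structure (`example_ioctl_in`) rather than any real habanalabs struct. Requiring zeros is the usual kernel convention so that such fields can later be given a meaning without old user space accidentally passing garbage in them.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uapi-style input structure with pad/reserved fields. */
struct example_ioctl_in {
	uint64_t addr;
	uint32_t flags;
	uint32_t pad;
	uint64_t reserved[2];
};

/* Reject the request unless every pad/reserved field is zero. */
static int validate_ioctl_in(const struct example_ioctl_in *in)
{
	size_t i;

	if (in->pad)
		return -EINVAL;
	for (i = 0; i < sizeof(in->reserved) / sizeof(in->reserved[0]); i++)
		if (in->reserved[i])
			return -EINVAL;
	return 0;
}
```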
|
0970380a | 10-Jan-2023 |
XU pengfei <xupengfei@nfschina.com> |
habanalabs: remove unnecessary (void*) conversions
data is a void * type and does not require a cast.
Signed-off-by: XU pengfei <xupengfei@nfschina.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
7d352a81 | 10-Jan-2023 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
habanalabs: Replace zero-length arrays with flexible-array members
Zero-length arrays are deprecated[1] and we are moving towards adopting C99 flexible-array members instead. So, replace zero-length arrays in a couple of structures with flex-array members.
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy() and help us make progress towards globally enabling -fstrict-flex-arrays=3 [2].
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays [1] Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [2] Link: https://github.com/KSPP/linux/issues/78 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
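A sketch of the conversion on a hypothetical structure (`example_blob`), not one of the structures touched by the patch. It shows the declaration change and how the allocation size must account for the trailing array explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Before (deprecated zero-length array):
 *     struct example_blob { uint32_t count; uint32_t data[0]; };
 * After (C99 flexible-array member): */
struct example_blob {
	uint32_t count;
	uint32_t data[];	/* must be the last member */
};

/* Allocate a blob with room for n elements. sizeof(struct example_blob)
 * does not include the flexible array, so the element storage is added
 * explicitly. This sketch does not check the size computation for overflow.
 * Returns NULL on allocation failure. */
static struct example_blob *blob_alloc(uint32_t n)
{
	struct example_blob *b = calloc(1, sizeof(*b) + (size_t)n * sizeof(b->data[0]));

	if (b)
		b->count = n;
	return b;
}
```

In kernel code, the struct_size() helper from <linux/overflow.h> is the usual way to compute such an allocation size with overflow checking.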
|
2a0a839b | 29-Dec-2022 |
Moti Haimovski <mhaimovski@habana.ai> |
habanalabs: extend fatal messages to contain PCI info
This commit attaches the PCI device address to driver fatal messages in order to ease debugging in multi-device setups.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
eaca606e | 03-Jan-2023 |
Dani Liberman <dliberman@habana.ai> |
habanalabs/gaudi2: remove use of razwi info received from f/w
Because the f/w does not update razwi info when sending events, remove the use of it. The driver is responsible for checking whether a razwi happened and for collecting the razwi data.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
54fcb384 | 30-Nov-2022 |
Ohad Sharabi <osharabi@habana.ai> |
habanalabs: trace LBW reads/writes
Add traces to LBW reads/writes. This may be handy when debugging configuration failures or events, or when tracking the configuration flow.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
200f3cf0 | 04-Jan-2023 |
Carmit Carmel <ccarmel@habana.ai> |
habanalabs/gaudi2: fix log for sob value overflow/underflow
The value in SM_SEI_CAUSE contains the SOB index, not the SOB group index. Remove the usage of log_mask in the sm_sei_cause structure, as it was never used.
Signed-off-by: Carmit Carmel <ccarmel@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
ab509d81 | 02-Jan-2023 |
Ohad Sharabi <osharabi@habana.ai> |
habanalabs: add set engines masks ASIC function
This function shall be used whenever the components' enable/binning masks should be updated.
It is used in one of the following cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)
Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
571d1a72 | 23-Dec-2022 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: protect access to dynamic mem 'user_mappings'
When the HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user from dynamically allocated memory - 'user_mappings'. Since it is freed/allocated at runtime (upon a page fault), it is not unlikely that it will be accessed even before it is initially allocated (i.e., a NULL pointer is accessed).
The solution is to simply mark the point when the err info has been collected, and thereby know whether err info (either page fault or RAZWI) is available to be read.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
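A user-space sketch of the "mark when collected" approach. The structure and `copy_user_mappings()` helper are hypothetical, and `memcpy` stands in for copy_to_user. The reader checks the flag before touching the dynamically allocated buffer, so a query that arrives before any page fault never dereferences NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error-info holder; the real driver structures differ. */
struct page_fault_info {
	bool info_collected;		/* set only once user_mappings is valid */
	size_t num_of_user_mappings;
	const int *user_mappings;	/* allocated at runtime upon a page fault */
};

/*
 * Copy out up to max entries. If no page fault was recorded yet, report
 * zero entries instead of dereferencing a possibly NULL pointer.
 * Returns the number of entries copied.
 */
static size_t copy_user_mappings(const struct page_fault_info *pf, int *out, size_t max)
{
	size_t n;

	if (!pf->info_collected)
		return 0;
	n = pf->num_of_user_mappings < max ? pf->num_of_user_mappings : max;
	memcpy(out, pf->user_mappings, n * sizeof(*out));
	return n;
}
```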
|
c7d7b9ac | 07-Jan-2023 |
Tom Rix <trix@redhat.com> |
habanalabs: remove redundant memset
From reviewing the code, the line memset(kdata, 0, usize); is not needed because kdata is either zeroed by kdata = kzalloc(asize, GFP_KERNEL); when allocated at runtime or by char stack_kdata[128] = {0}; at compile time.
Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
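A user-space sketch of the allocation pattern the commit refers to, with `calloc()` standing in for kzalloc() and a hypothetical `prepare_kdata()` helper (passing `user_data == NULL` models a command with no input to copy). Whichever buffer is chosen already starts zeroed, which is why the extra memset is redundant.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the described pattern.
 * Returns the number of nonzero bytes after the copied input (expected 0),
 * or -1 on bad sizes or allocation failure.
 */
static int prepare_kdata(const void *user_data, size_t usize, size_t asize)
{
	char stack_kdata[128] = {0};	/* zeroed when declared */
	char *kdata = stack_kdata;
	size_t copied = user_data ? usize : 0;
	size_t i;
	int nonzero = 0;

	if (usize > asize)
		return -1;
	if (asize > sizeof(stack_kdata)) {
		kdata = calloc(1, asize);	/* user-space stand-in for kzalloc() */
		if (!kdata)
			return -1;
	}

	/* No memset(kdata, 0, usize) is needed: both buffers start zeroed. */
	if (user_data)
		memcpy(kdata, user_data, usize);

	for (i = copied; i < asize; i++)
		nonzero += kdata[i] != 0;

	if (kdata != stack_kdata)
		free(kdata);
	return nonzero;
}
```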
|
78baccbd | 25-Dec-2022 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs: refactor razwi/page-fault information structures
This refactor makes the code clearer and the new variables' names better describe their roles.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
6cfb0013 | 21-Dec-2022 |
Koby Elbaz <kelbaz@habana.ai> |
habanalabs/gaudi2: avoid reconfiguring the same PB registers
It appears that, within the sync manager security configuration, we reconfigure PB registers over and over without any need to do that.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
4083697a | 25-Dec-2022 |
Ofir Bitton <obitton@habana.ai> |
habanalabs/gaudi: allow device acquire while in debug mode
During device acquire, the driver uses a QMAN to clear some registers. In order to avoid internal races, the driver verifies that the device is idle before submitting the register clear job.
This check introduces an issue, as debug mode causes the device to be non-idle, which leads to a device acquire failure.
To overcome this issue, we can remove the idle check entirely, as the driver uses the QMAN only when there is no active context.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
e1e8e747 | 22-Dec-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs: move some prints to debug level
When entering an IOCTL, the driver prints a message if the device is not operational. This message should be printed at debug level, as it can spam the kernel log and it is not an error.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
|
139dad04 | 21-Dec-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs: update f/w files
Update common firmware files with the latest version. There is no functional change.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> |
bcace6f0 | 21-Dec-2022 |
Oded Gabbay <ogabbay@kernel.org> |
habanalabs/gaudi2: update f/w files
Update gaudi2 firmware files with the latest version. There is no functional change.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org> |