fa58b594 | 12-Feb-2024 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: modify pci health check
Today we read PCI VENDOR-ID in order to make sure PCI link is healthy. Apparently the VENDOR-ID might be stored on host and hence, when we read it we might
accel/habanalabs: modify pci health check
Today we read PCI VENDOR-ID in order to make sure PCI link is healthy. Apparently the VENDOR-ID might be stored on host and hence, when we read it we might not access the PCI bus. In order to make sure PCI health check is reliable, we will start checking the DEVICE-ID instead.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
c5170683 | 30-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: keep explicit size of reserved memory for FW
The reserved memory for FW is currently saved in an ASIC property in units of MB, just like the value that comes from FW. Except the fa
accel/habanalabs: keep explicit size of reserved memory for FW
The reserved memory for FW is currently saved in an ASIC property in units of MB, just like the value that comes from FW. Except the fact that it is not clear from the property's name, it means also that a calculation to actual size is required everywhere that it is used. Modify the property to hold the size in bytes.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
db45bbdd | 29-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: handle reserved memory request when working with full FW
Currently the reserved memory request from FW is handled when running with preboot only, but this request is relevant also
accel/habanalabs: handle reserved memory request when working with full FW
Currently the reserved memory request from FW is handled when running with preboot only, but this request is relevant also when running with full FW. Modify to always handle this reservation request.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
5b6658eb | 06-Feb-2024 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs/hwmon: rate limit errors user can generate
Fetching sensor data can fail due to various reasons. In order not to pollute the kernel log, those error prints must be rate limited.
Si
accel/habanalabs/hwmon: rate limit errors user can generate
Fetching sensor data can fail due to various reasons. In order not to pollute the kernel log, those error prints must be rate limited.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
3bf6ef98 | 05-Feb-2024 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs/gaudi2: drain event lacks rd/wr indication
Due to a H/W issue, AXI drain event does not include a read/write indication, hence we remove this print.
Signed-off-by: Ofir Bitton <obi
accel/habanalabs/gaudi2: drain event lacks rd/wr indication
Due to a H/W issue, AXI drain event does not include a read/write indication, hence we remove this print.
Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
fd8d2fa0 | 05-Feb-2024 |
Dani Liberman <dliberman@habana.ai> |
accel/habanalabs: fix error print
The unmasking is for event and it can be other event than RAZWI.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> R
accel/habanalabs: fix error print
The unmasking is for event and it can be other event than RAZWI.
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
c8c062e9 | 31-Jan-2024 |
Tal Risin <trisin@habana.ai> |
accel/habanalabs: initialize maybe-uninitialized variables
Prevent static analysis warning.
Signed-off-by: Tal Risin <trisin@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ca
accel/habanalabs: initialize maybe-uninitialized variables
Prevent static analysis warning.
Signed-off-by: Tal Risin <trisin@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
0b105a2a | 16-Jan-2024 |
Avri Kehat <akehat@habana.ai> |
accel/habanalabs: fix debugfs files permissions
debugfs files are created with permissions that don't align with the access requirements.
Signed-off-by: Avri Kehat <akehat@habana.ai> Reviewed-by: O
accel/habanalabs: fix debugfs files permissions
debugfs files are created with permissions that don't align with the access requirements.
Signed-off-by: Avri Kehat <akehat@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
e855869b | 25-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: fix glbl error cause handling
The glbl error cause handling has a wrong assumption that all error bits are consecutive. Fix the handling to check all relevant error bits per ASIC.
accel/habanalabs: fix glbl error cause handling
The glbl error cause handling has a wrong assumption that all error bits are consecutive. Fix the handling to check all relevant error bits per ASIC.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
c1e89ae4 | 18-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: check extended errors according to PCIe addr_dec interrupt info
The FW interrupt info for a PCIe addr_dec event is set correctly, so check for either global errors or razwi
accel/habanalabs/gaudi2: check extended errors according to PCIe addr_dec interrupt info
The FW interrupt info for a PCIe addr_dec event is set correctly, so check for either global errors or razwi according to the indications there.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
7159813c | 18-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: modify print for skip loading linux FW to debug log
Skip loading a linux FW image into the device with the current supported ASICs is done for test purposes only. Moreover, for fut
accel/habanalabs: modify print for skip loading linux FW to debug log
Skip loading a linux FW image into the device with the current supported ASICs is done for test purposes only. Moreover, for future supported ASICs it is possible that there won't be a need to load such an image. The print in such a case is therefore not needed in most cases, so replace the used dev_info() with dev_dbg().
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
c14e5cd3 | 15-Jan-2024 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs: remove hop size from asic properties
The hop size related properties is a MMU properties and not asic properties. As for PMMU and HMMU we could have different sizes.
Signed-off-by
accel/habanalabs: remove hop size from asic properties
The hop size related properties is a MMU properties and not asic properties. As for PMMU and HMMU we could have different sizes.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
9e263c50 | 20-Jan-2024 |
Erick Archer <erick.archer@gmx.com> |
accel/habanalabs: use kcalloc() instead of kzalloc()
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multipli
accel/habanalabs: use kcalloc() instead of kzalloc()
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors.
So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function.
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1] Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Erick Archer <erick.archer@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
5ae8b6b7 | 06-Jan-2024 |
Colin Ian King <colin.i.king@intel.com> |
accel/habanalabs/goya: remove redundant assignment to pointer 'input'
The pointer input is assigned a value that is not read, it is being re-assigned again later with the same value. Resolve this by
accel/habanalabs/goya: remove redundant assignment to pointer 'input'
The pointer input is assigned a value that is not read, it is being re-assigned again later with the same value. Resolve this by moving the declaration to input into the if block.
Cleans up clang scan build warning: warning: Value stored to 'input' during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
01f8cd0f | 02-Jan-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: fail memory memset when failing to copy QM packet to device
gaudi2_memset_memory_chunk_using_edma_qm() calls the access_dev_mem() ASIC function, but ignores its return value
accel/habanalabs/gaudi2: fail memory memset when failing to copy QM packet to device
gaudi2_memset_memory_chunk_using_edma_qm() calls the access_dev_mem() ASIC function, but ignores its return value. Add this missing check.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
731d320e | 01-Jan-2024 |
Dani Liberman <dliberman@habana.ai> |
accel/habanalabs: remove call to deprecated function
In newer kernel versions, irq_set_affinity_hint() is deprecated. Instead, use the newer version which is irq_set_affinity_and_hint().
Signed-off
accel/habanalabs: remove call to deprecated function
In newer kernel versions, irq_set_affinity_hint() is deprecated. Instead, use the newer version which is irq_set_affinity_and_hint().
Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
8a5be2b6 | 28-Dec-2023 |
Malkoot Khan <engr.mkhan1990@gmail.com> |
accel/habanalabs: Remove unnecessary braces from if statement
The coding style in the Linux kernel prefers not to use braces for single-statement if conditions. This patch removes the unnecessary br
accel/habanalabs: Remove unnecessary braces from if statement
The coding style in the Linux kernel prefers not to use braces for single-statement if conditions. This patch removes the unnecessary braces from an if statement in the file drivers/accel/habanalabs/common/command_submission.c, which also resolves a coding style warning.
Signed-off-by: Malkoot Khan <engr.mkhan1990@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
f728c17f | 02-Nov-2023 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs/gaudi2: move HMMU page tables to device memory
Currently the HMMU page tables reside in the host memory, which will cause host access from the device for every page walk. This can a
accel/habanalabs/gaudi2: move HMMU page tables to device memory
Currently the HMMU page tables reside in the host memory, which will cause host access from the device for every page walk. This can affect PCIe bandwidth in certain scenarios.
To prevent that problem, HMMU page tables will be moved to the device memory so the miss transaction will read the hops from there instead of going to the host.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
246d8b6c | 24-Dec-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: abort device reset for consecutive heartbeat failures
The mechanism of aborting device reset for consecutive fatal errors is currently only for fatal errors that are reported by FW
accel/habanalabs: abort device reset for consecutive heartbeat failures
The mechanism of aborting device reset for consecutive fatal errors is currently only for fatal errors that are reported by FW. A non-responsive FW and consecutive heartbeat failures is also considered fatal, so add them as well to this mechanism to avoid recurring device reset in such a case.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
d0df8a35 | 14-Dec-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: fix DRAM BAR base address calculation
When the DRAM region size in the BAR is not a power of 2, calculating the corresponding BAR base address should be done using the offset from
accel/habanalabs: fix DRAM BAR base address calculation
When the DRAM region size in the BAR is not a power of 2, calculating the corresponding BAR base address should be done using the offset from the DRAM start address, and not using directly the DRAM address.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
8c075401 | 11-Dec-2023 |
Koby Elbaz <kelbaz@habana.ai> |
accel/habanalabs: increase HL_MAX_STR to 64 bytes to avoid warnings
Fix a warning of a buffer overflow: ‘snprintf’ output between 38 and 47 bytes into a destination of size 32
Signed-off-by: Koby E
accel/habanalabs: increase HL_MAX_STR to 64 bytes to avoid warnings
Fix a warning of a buffer overflow: ‘snprintf’ output between 38 and 47 bytes into a destination of size 32
Signed-off-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
a9f07790 | 08-Dec-2023 |
Xingyuan Mo <hdthky0@gmail.com> |
accel/habanalabs: fix information leak in sec_attest_info()
This function may copy the pad0 field of struct hl_info_sec_attest to user mode which has not been initialized, resulting in leakage of ke
accel/habanalabs: fix information leak in sec_attest_info()
This function may copy the pad0 field of struct hl_info_sec_attest to user mode which has not been initialized, resulting in leakage of kernel heap data to user mode. To prevent this, use kzalloc() to allocate and zero out the buffer, which can also eliminate other uninitialized holes, if any.
Fixes: 0c88760f8f5e ("habanalabs/gaudi2: add secured attestation info uapi") Signed-off-by: Xingyuan Mo <hdthky0@gmail.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
bc5f15ab | 29-Nov-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: avoid overriding existing undefined opcode data
Part of the undefined opcode data is updated in gaudi2_handle_qman_err_generic() and some in handle_lower_qman_data_on_err().
accel/habanalabs/gaudi2: avoid overriding existing undefined opcode data
Part of the undefined opcode data is updated in gaudi2_handle_qman_err_generic() and some in handle_lower_qman_data_on_err(). However, the 'write_enable' flag is checked only in gaudi2_handle_qman_err_generic(), and information of more than a single error can be mixed there.
Moreover, handle_lower_qman_data_on_err() is called only for the lower QMAN, so for an error in the upper QMAN there is only a partial info.
Move all the data update to be done in a single place, protected by the 'write_enable' flag. As mainly the lower QMAN's info is interesting, avoid saving the partial info for the upper QMAN.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|
aa5cea38 | 03-May-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: add parent_device sysfs attribute
The device debugfs directory was modified to be named as the device-name. This name is the parent device name, i.e. either the PCI address in case
accel/habanalabs: add parent_device sysfs attribute
The device debugfs directory was modified to be named as the device-name. This name is the parent device name, i.e. either the PCI address in case of an ASIC, or the simulator device name in case of a simulator.
This change makes it more difficult for a user to access the debugfs directory for a specific accel device, because he can't just use the accel minor id, but he needs to do more device-dependent operations to get the device name.
To make it easier to get this name, add a 'parent_device' sysfs attribute that the user can read using the minor id before accessing debugfs.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
show more ...
|