Revision tags: v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1 |
|
#
0c6de0c9 |
| 28-Jun-2021 |
Menglong Dong <dong.menglong@zte.com.cn> |
net: tipc: fix FB_MTU eat two pages
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory allocation fails, which can avoid unnecessary sending failures.
The value of FB_MTU now is
net: tipc: fix FB_MTU eat two pages
FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory allocation fails, which can avoid unnecessary sending failures.
The value of FB_MTU now is 3744, and the data size will be:
(3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \ SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))
which is larger than one page(4096), and two pages will be allocated.
To avoid it, replace '3744' with a calculation:
(PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size, and it's not necessary to make alignment for buf_size in tipc_buf_acquire(). So, just remove it.
Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
#
ec78e318 |
| 16-Oct-2020 |
Hoang Huu Le <hoang.h.le@dektech.com.au> |
tipc: fix incorrect setting window for bcast link
In commit 16ad3f4022bb ("tipc: introduce variable window congestion control"), we applied the algorithm to select window size from minimum window to
tipc: fix incorrect setting window for bcast link
In commit 16ad3f4022bb ("tipc: introduce variable window congestion control"), we applied the algorithm to select window size from minimum window to the configured maximum window for unicast link, and, besides we chose to keep the window size for broadcast link unchanged and equal (i.e fix window 50)
However, when setting maximum window variable via command, the window variable was re-initialized to unexpect value (i.e 32).
We fix this by updating the fix window for broadcast as we stated.
Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
75cee397 |
| 16-Oct-2020 |
Hoang Huu Le <hoang.h.le@dektech.com.au> |
tipc: re-configure queue limit for broadcast link
The queue limit of the broadcast link is being calculated base on initial MTU. However, when MTU value changed (e.g manual changing MTU on NIC devic
tipc: re-configure queue limit for broadcast link
The queue limit of the broadcast link is being calculated base on initial MTU. However, when MTU value changed (e.g manual changing MTU on NIC device, MTU negotiation etc.,) we do not re-calculate queue limit. This gives throughput does not reflect with the change.
So fix it by calling the function to re-calculate queue limit of the broadcast link.
Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2 |
|
#
cad2929d |
| 17-Jun-2020 |
Hoang Huu Le <hoang.h.le@dektech.com.au> |
tipc: update a binding service via broadcast
Currently, updating binding table (add service binding to name table/withdraw a service binding) is being sent over replicast. However, if we are scaling
tipc: update a binding service via broadcast
Currently, updating binding table (add service binding to name table/withdraw a service binding) is being sent over replicast. However, if we are scaling up clusters to > 100 nodes/containers this method is less affection because of looping through nodes in a cluster one by one.
It is worth to use broadcast to update a binding service. This way, the binding table can be updated on all peer nodes in one shot.
Broadcast is used when all peer nodes, as indicated by a new capability flag TIPC_NAMED_BCAST, support reception of this message type.
Four problems need to be considered when introducing this feature. 1) When establishing a link to a new peer node we still update this by a unicast 'bulk' update. This may lead to race conditions, where a later broadcast publication/withdrawal bypass the 'bulk', resulting in disordered publications, or even that a withdrawal may arrive before the corresponding publication. We solve this by adding an 'is_last_bulk' bit in the last bulk messages so that it can be distinguished from all other messages. Only when this message has arrived do we open up for reception of broadcast publications/withdrawals.
2) When a first legacy node is added to the cluster all distribution will switch over to use the legacy 'replicast' method, while the opposite happens when the last legacy node leaves the cluster. This entails another risk of message disordering that has to be handled. We solve this by adding a sequence number to the broadcast/replicast messages, so that disordering can be discovered and corrected. Note however that we don't need to consider potential message loss or duplication at this protocol level.
3) Bulk messages don't contain any sequence numbers, and will always arrive in order. Hence we must exempt those from the sequence number control and deliver them unconditionally. We solve this by adding a new 'is_bulk' bit in those messages so that they can be recognized.
4) Legacy messages, which don't contain any new bits or sequence numbers, but neither can arrive out of order, also need to be exempt from the initial synchronization and sequence number check, and delivered unconditionally. Therefore, we add another 'is_not_legacy' bit to all new messages so that those can be distinguished from legacy messages and the latter delivered directly.
v1->v2: - fix warning issue reported by kbuild test robot <lkp@intel.com> - add santiy check to drop the publication message with a sequence number that is lower than the agreed synch point
Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.8-rc1, v5.7 |
|
#
03b6fefd |
| 26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add support for broadcast rcv stats dumping
This commit enables dumping the statistics of a broadcast-receiver link like the traditional 'broadcast-link' one (which is for broadcast- sender).
tipc: add support for broadcast rcv stats dumping
This commit enables dumping the statistics of a broadcast-receiver link like the traditional 'broadcast-link' one (which is for broadcast- sender). The link dumping can be triggered via netlink (e.g. the iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the indicator.
The name of a broadcast-receiver link of a specific peer will be in the format: 'broadcast-link:<peer-id>'.
For example:
Link <broadcast-link:1001002> Window:50 packets RX packets:7841 fragments:2408/440 bundles:0/0 TX packets:0 fragments:0/0 bundles:0/0 RX naks:0 defs:124 dups:0 TX naks:21 acks:0 retrans:0 Congestion link:0 Send queue max:0 avg:0
In addition, the broadcast-receiver link statistics can be reset in the usual way via netlink by specifying that link name in command.
Note: the 'tipc_link_name_ext()' is removed because the link name can now be retrieved simply via the 'l->name'.
Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a91d55d1 |
| 26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: enable broadcast retrans via unicast
In some environment, broadcast traffic is suppressed at high rate (i.e. a kind of bandwidth limit setting). When it is applied, TIPC broadcast can still ru
tipc: enable broadcast retrans via unicast
In some environment, broadcast traffic is suppressed at high rate (i.e. a kind of bandwidth limit setting). When it is applied, TIPC broadcast can still run successfully. However, when it comes to a high load, some packets will be dropped first and TIPC tries to retransmit them but the packet retransmission is intentionally broadcast too, so making things worse and not helpful at all.
This commit enables the broadcast retransmission via unicast which only retransmits packets to the specific peer that has really reported a gap i.e. not broadcasting to all nodes in the cluster, so will prevent from being suppressed, and also reduce some overheads on the other peers due to duplicates, finally improve the overall TIPC broadcast performance.
Note: the functionality can be turned on/off via the sysctl file:
echo 1 > /proc/sys/net/tipc/bc_retruni echo 0 > /proc/sys/net/tipc/bc_retruni
Default is '0', i.e. the broadcast retransmission still works as usual.
Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
d7626b5a |
| 26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce Gap ACK blocks for broadcast link
As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap ACK blocks"), we apply the same mechanism for the broadcast link as w
tipc: introduce Gap ACK blocks for broadcast link
As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap ACK blocks"), we apply the same mechanism for the broadcast link as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist of two parts built for both the broadcast and unicast types:
31 16 15 0 +-------------+-------------+-------------+-------------+ | bgack_cnt | ugack_cnt | len | +-------------+-------------+-------------+-------------+ - | gap | ack | | +-------------+-------------+-------------+-------------+ > bc gacks : : : | +-------------+-------------+-------------+-------------+ - | gap | ack | | +-------------+-------------+-------------+-------------+ > uc gacks : : : | +-------------+-------------+-------------+-------------+ -
which is "automatically" backward-compatible.
We also increase the max number of Gap ACK blocks to 128, allowing upto 64 blocks per type (total buffer size = 516 bytes).
Besides, the 'tipc_link_advance_transmq()' function is refactored which is applicable for both the unicast and broadcast cases now, so some old functions can be removed and the code is optimized.
With the patch, TIPC broadcast is more robust regardless of packet loss or disorder, latency, ... in the underlying network. Its performance is boost up significantly. For example, experiment with a 5% packet loss rate results:
$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 0m 42.46s user 0m 1.16s sys 0m 17.67s
Without the patch:
$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 8m 27.94s user 0m 0.55s sys 0m 2.38s
Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2 |
|
#
dca4a17d |
| 10-Dec-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: fix potential hanging after b/rcast changing
In commit c55c8edafa91 ("tipc: smooth change between replicast and broadcast"), we allow instant switching between replicast and broadcast by sendi
tipc: fix potential hanging after b/rcast changing
In commit c55c8edafa91 ("tipc: smooth change between replicast and broadcast"), we allow instant switching between replicast and broadcast by sending a dummy 'SYN' packet on the last used link to synchronize packets on the links. The 'SYN' message is an object of link congestion also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent back to the socket... However, in that commit, we simply use the same socket 'cong_link_cnt' counter for both the 'SYN' & normal payload message sending. Therefore, if both the replicast & broadcast links are congested, the counter will be not updated correctly but overwritten by the latter congestion. Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is reduced one by one and eventually overflowed. Consequently, further activities on the socket will only wait for the false congestion signal to disappear but never been met.
Because sending the 'SYN' message is vital for the mechanism, it should be done anyway. This commit fixes the issue by marking the message with an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a link congestion, there is no need to touch the socket 'cong_link_cnt' either. In addition, in the event of any error (e.g. -ENOBUFS), we will purge the entire payload message queue and make a return immediately.
Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
16ad3f40 |
| 09-Dec-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering both 'slow start', 'c
tipc: introduce variable window congestion control
We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering both 'slow start', 'congestion avoidance', and 'fast recovery' modes.
- We introduce hard lower and upper window limits per link, still different and configurable per bearer type.
- We introduce a 'slow start theshold' variable, initially set to the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow start mode, and then let is grow rapidly (+1 per rceived ACK) until it reaches the slow start threshold and enters congestion avoidance mode.
- In congestion avoidance mode we increment the congestion window for each window-size number of acked packets, up to a possible maximum equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery mode, by setting the both the slow start threshold to and the congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved since the previous timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to the sent to the peer, so that this can discover the stale situation.
This change does in reality have effect only on unicast ethernet transport, as we have seen that there is no room whatsoever for increasing the window max size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal.
This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.5-rc1, v5.4 |
|
#
ba5f6a86 |
| 21-Nov-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: update replicast capability for broadcast send link
When setting up a cluster with non-replicast/replicast capability supported. This capability will be disabled for broadcast send link in ord
tipc: update replicast capability for broadcast send link
When setting up a cluster with non-replicast/replicast capability supported. This capability will be disabled for broadcast send link in order to be backwards compatible.
However, when these non-support nodes left and be removed out the cluster. We don't update this capability on broadcast send link. Then, some of features that based on this capability will also disabling as unexpected.
In this commit, we make sure the broadcast send link capabilities will be re-calculated as soon as a node removed/rejoined a cluster.
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.4-rc8, v5.4-rc7 |
|
#
fc1b6d6d |
| 08-Nov-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce TIPC encryption & authentication
This commit offers an option to encrypt and authenticate all messaging, including the neighbor discovery messages. The currently most advanced algori
tipc: introduce TIPC encryption & authentication
This commit offers an option to encrypt and authenticate all messaging, including the neighbor discovery messages. The currently most advanced algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All encryption/decryption is done at the bearer layer, just before leaving or after entering TIPC.
Supported features: - Encryption & authentication of all TIPC messages (header + data); - Two symmetric-key modes: Cluster and Per-node; - Automatic key switching; - Key-expired revoking (sequence number wrapped); - Lock-free encryption/decryption (RCU); - Asynchronous crypto, Intel AES-NI supported; - Multiple cipher transforms; - Logs & statistics;
Two key modes: - Cluster key mode: One single key is used for both TX & RX in all nodes in the cluster. - Per-node key mode: Each nodes in the cluster has one specific TX key. For RX, a node requires its peers' TX key to be able to decrypt the messages from those peers.
Key setting from user-space is performed via netlink by a user program (e.g. the iproute2 'tipc' tool).
Internal key state machine:
Attach Align(RX) +-+ +-+ | V | V +---------+ Attach +---------+ | IDLE |---------------->| PENDING |(user = 0) +---------+ +---------+ A A Switch| A | | | | | | Free(switch/revoked) | | (Free)| +----------------------+ | |Timeout | (TX) | | |(RX) | | | | | | v | +---------+ Switch +---------+ | PASSIVE |<----------------| ACTIVE | +---------+ (RX) +---------+ (user = 1) (user >= 1)
The number of TFMs is 10 by default and can be changed via the procfs 'net/tipc/max_tfms'. At this moment, as for simplicity, this file is also used to print the crypto statistics at runtime:
echo 0xfff1 > /proc/sys/net/tipc/max_tfms
The patch defines a new TIPC version (v7) for the encryption message (- backward compatibility as well). The message is basically encapsulated as follows:
+----------------------------------------------------------+ | TIPCv7 encryption | Original TIPCv2 | Authentication | | header | packet (encrypted) | Tag | +----------------------------------------------------------+
The throughput is about ~40% for small messages (compared with non- encryption) and ~9% for large messages. With the support from hardware crypto i.e. the Intel AES-NI CPU instructions, the throughput increases upto ~85% for small messages and ~55% for large messages.
By default, the new feature is inactive (i.e. no encryption) until user sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO" in the kernel configuration to enable/disable the new code when needed.
MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc
Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5 |
|
#
e654f9f5 |
| 15-Aug-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: clean up skb list lock handling on send path
The policy for handling the skb list locks on the send and receive paths is simple.
- On the send path we never need to grab the lock on the 'xmit
tipc: clean up skb list lock handling on send path
The policy for handling the skb list locks on the send and receive paths is simple.
- On the send path we never need to grab the lock on the 'xmitq' list when the destination is an exernal node.
- On the receive path we always need to grab the lock on the 'inputq' list, irrespective of source node.
However, when transmitting node local messages those will eventually end up on the receive path of a local socket, meaning that the argument 'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the function tipc_sk_rcv(). This has been handled by always initializing the spinlock of the 'xmitq' list at message creation, just in case it may end up on the receive path later, and despite knowing that the lock in most cases never will be used.
This approach is inaccurate and confusing, and has also concealed the fact that the stated 'no lock grabbing' policy for the send path is violated in some cases.
We now clean up this by never initializing the lock at message creation, instead doing this at the moment we find that the message actually will enter the receive path. At the same time we fix the four locations where we incorrectly access the spinlock on the send/error path.
This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock is initialised") which has now become redundant.
CC: Eric Dumazet <edumazet@google.com> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.3-rc4 |
|
#
6c9081a3 |
| 07-Aug-2019 |
John Rutherford <john.rutherford@dektech.com.au> |
tipc: add loopback device tracking
Since node internal messages are passed directly to the socket, it is not possible to observe those messages via tcpdump or wireshark.
We now remedy this by makin
tipc: add loopback device tracking
Since node internal messages are passed directly to the socket, it is not possible to observe those messages via tcpdump or wireshark.
We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible.
The feature is enabled if network taps are active for the loopback device. pcap filtering restrictions require the messages to be presented to the receiving side of the loopback device.
v3 - Function dev_nit_active used to check for network taps. - Procedure netif_rx_ni used to send cloned messages to loopback device.
Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7 |
|
#
a7dc51ad |
| 25-Jun-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: rename function msg_get_wrapped() to msg_inner_hdr()
We rename the inline function msg_get_wrapped() to the more comprehensible msg_inner_hdr().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.c
tipc: rename function msg_get_wrapped() to msg_inner_hdr()
We rename the inline function msg_get_wrapped() to the more comprehensible msg_inner_hdr().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4 |
|
#
e1279ff7 |
| 03-Apr-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: add NULL pointer check
skb somehow dequeued out of inputq before processing, it causes to NULL pointer and kernel crashed.
Add checking skb valid before using.
Fixes: c55c8edafa9 ("tipc: smo
tipc: add NULL pointer check
skb somehow dequeued out of inputq before processing, it causes to NULL pointer and kernel crashed.
Add checking skb valid before using.
Fixes: c55c8edafa9 ("tipc: smooth change between replicast and broadcast") Reported-by: Tuong Lien Tong <tuong.t.lien@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.1-rc3 |
|
#
6da88a82 |
| 25-Mar-2019 |
Wei Yongjun <weiyongjun1@huawei.com> |
tipc: fix return value check in tipc_mcast_send_sync()
Fix the return value check which testing the wrong variable in tipc_mcast_send_sync().
Fixes: c55c8edafa91 ("tipc: smooth change between repli
tipc: fix return value check in tipc_mcast_send_sync()
Fix the return value check which testing the wrong variable in tipc_mcast_send_sync().
Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.1-rc2 |
|
#
08e046c8 |
| 21-Mar-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: fix a null pointer deref
In commit c55c8edafa91 ("tipc: smooth change between replicast and broadcast") we introduced new method to eliminate the risk of message reordering that happen in betw
tipc: fix a null pointer deref
In commit c55c8edafa91 ("tipc: smooth change between replicast and broadcast") we introduced new method to eliminate the risk of message reordering that happen in between different nodes. Unfortunately, we forgot checking at receiving side to ignore intra node.
We fix this by checking and returning if arrived message from intra node.
syzbot report:
================================================================== kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 7820 Comm: syz-executor418 Not tainted 5.0.0+ #61 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:tipc_mcast_filter_msg+0x21b/0x13d0 net/tipc/bcast.c:782 Code: 45 c0 0f 84 39 06 00 00 48 89 5d 98 e8 ce ab a5 fa 49 8d bc 24 c8 00 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 9a 0e 00 00 49 8b 9c 24 c8 00 00 00 48 be 00 00 RSP: 0018:ffff8880959defc8 EFLAGS: 00010202 RAX: 0000000000000019 RBX: ffff888081258a48 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: ffffffff86cab862 RDI: 00000000000000c8 RBP: ffff8880959df030 R08: ffff8880813d0200 R09: ffffed1015d05bc8 R10: ffffed1015d05bc7 R11: ffff8880ae82de3b R12: 0000000000000000 R13: 000000000000002c R14: 0000000000000000 R15: ffff888081258a48 FS: 000000000106a880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001cc0 CR3: 0000000094a20000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tipc_sk_filter_rcv+0x182d/0x34f0 net/tipc/socket.c:2168 tipc_sk_enqueue net/tipc/socket.c:2254 [inline] tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305 tipc_sk_mcast_rcv+0x724/0x1020 net/tipc/socket.c:1209 tipc_mcast_xmit+0x7fe/0x1200 net/tipc/bcast.c:410 tipc_sendmcast+0xb36/0xfc0 net/tipc/socket.c:820 __tipc_sendmsg+0x10df/0x18d0 net/tipc/socket.c:1358 tipc_sendmsg+0x53/0x80 net/tipc/socket.c:1291 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:661 ___sys_sendmsg+0x806/0x930 net/socket.c:2260 __sys_sendmsg+0x105/0x1d0 net/socket.c:2298 __do_sys_sendmsg net/socket.c:2307 [inline] __se_sys_sendmsg net/socket.c:2305 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2305 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4401c9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffd887fa9d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401c9 RDX: 0000000000000000 RSI: 0000000020002140 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a50 R13: 0000000000401ae0 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace ba79875754e1708f ]---
Reported-by: syzbot+be4bdf2cc3e85e952c50@syzkaller.appspotmail.com Fixes: c55c8eda ("tipc: smooth change between replicast and broadcast") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c55c8eda |
| 19-Mar-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: smooth change between replicast and broadcast
Currently, a multicast stream may start out using replicast, because there are few destinations, and then it should ideally switch to L2/broadcast
tipc: smooth change between replicast and broadcast
Currently, a multicast stream may start out using replicast, because there are few destinations, and then it should ideally switch to L2/broadcast IGMP/multicast when the number of destinations grows beyond a certain limit. The opposite should happen when the number decreases below the limit.
To eliminate the risk of message reordering caused by method change, a sending socket must stick to a previously selected method until it enters an idle period of 5 seconds. Means there is a 5 seconds pause in the traffic from the sender socket.
If the sender never makes such a pause, the method will never change, and transmission may become very inefficient as the cluster grows.
With this commit, we allow such a switch between replicast and broadcast without any need for a traffic pause.
Solution is to send a dummy message with only the header, also with the SYN bit set, via broadcast or replicast. For the data message, the SYN bit is set and sending via replicast or broadcast (inverse method with dummy).
Then, at receiving side any messages follow first SYN bit message (data or dummy message), they will be held in deferred queue until another pair (dummy or data message) arrived in other link.
v2: reverse christmas tree declaration
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
02ec6caf |
| 19-Mar-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: support broadcast/replicast configurable for bc-link
Currently, a multicast stream uses either broadcast or replicast as transmission method, based on the ratio between number of actual destin
tipc: support broadcast/replicast configurable for bc-link
Currently, a multicast stream uses either broadcast or replicast as transmission method, based on the ratio between number of actual destinations nodes and cluster size.
However, when an L2 interface (e.g., VXLAN) provides pseudo broadcast support, this becomes very inefficient, as it blindly replicates multicast packets to all cluster/subnet nodes, irrespective of whether they host actual target sockets or not.
The TIPC multicast algorithm is able to distinguish real destination nodes from other nodes, and hence provides a smarter and more efficient method for transferring multicast messages than pseudo broadcast can do.
Because of this, we now make it possible for users to force the broadcast link to permanently switch to using replicast, irrespective of which capabilities the bearer provides, or pretend to provide. Conversely, we also make it possible to force the broadcast link to always use true broadcast. While maybe less useful in deployed systems, this may at least be useful for testing the broadcast algorithm in small clusters.
We retain the current AUTOSELECT ability, i.e., to let the broadcast link automatically select which algorithm to use, and to switch back and forth between broadcast and replicast as the ratio between destination node number and cluster size changes. This remains the default method.
Furthermore, we make it possible to configure the threshold ratio for such switches. The default ratio is now set to 10%, down from 25% in the earlier implementation.
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3 |
|
#
9cc1bf39 |
| 03-Sep-2018 |
Zhenbo Gao <zhenbo.gao@windriver.com> |
tipc: correct spelling errors for struct tipc_bc_base's comment
Trivial fix for two spelling mistakes.
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com> Reviewed-by: Ying Xue <ying.xue@windriver
tipc: correct spelling errors for struct tipc_bc_base's comment
Trivial fix for two spelling mistakes.
Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7 |
|
#
a0732548 |
| 27-Jul-2018 |
Jia-Ju Bai <baijiaju1990@gmail.com> |
net: tipc: bcast: Replace GFP_ATOMIC with GFP_KERNEL in tipc_bcast_init()
tipc_bcast_init() is never called in atomic context. It calls kzalloc() with GFP_ATOMIC, which is not necessary. GFP_ATOMIC
net: tipc: bcast: Replace GFP_ATOMIC with GFP_KERNEL in tipc_bcast_init()
tipc_bcast_init() is never called in atomic context. It calls kzalloc() with GFP_ATOMIC, which is not necessary. GFP_ATOMIC can be replaced with GFP_KERNEL.
This is found by a static analysis tool named DCNS written by myself.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5 |
|
#
c9efb15a |
| 05-Mar-2018 |
Gustavo A. R. Silva <garsilva@embeddedor.com> |
tipc: bcast: use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustav
tipc: bcast: use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2 |
|
#
4c94cc2d |
| 30-Nov-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: fall back to smaller MTU if allocation of local send skb fails
When sending node local messages the code is using an 'mtu' of 66060 bytes to avoid unnecessary fragmentation. During situations
tipc: fall back to smaller MTU if allocation of local send skb fails
When sending node local messages the code is using an 'mtu' of 66060 bytes to avoid unnecessary fragmentation. During situations of low memory tipc_msg_build() may sometimes fail to allocate such large buffers, resulting in unnecessary send failures. This can easily be remedied by falling back to a smaller MTU, and then reassemble the buffer chain as if the message were arriving from a remote node.
At the same time, we change the initial MTU setting of the broadcast link to a lower value, so that large messages always are fragmented into smaller buffers even when we run in single node mode. Apart from obtaining the same advantage as for the 'fallback' solution above, this turns out to give a significant performance improvement. This can probably be explained with the __pskb_copy() operation performed on the buffer for each recipient during reception. We found the optimal value for this, considering the most relevant skb pool, to be 3744 bytes.
Acked-by: Ying Xue <ying.xue@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5 |
|
#
a80ae530 |
| 13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: improve destination linked list
We often see a need for a linked list of destination identities, sometimes containing a port number, sometimes a node identity, and sometimes both. The currentl
tipc: improve destination linked list
We often see a need for a linked list of destination identities, sometimes containing a port number, sometimes a node identity, and sometimes both. The currently defined struct u32_list is not generic enough to cover all cases, so we extend it to contain two u32 integers and rename it to struct tipc_dest_list.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.14-rc4 |
|
#
3382605f |
| 07-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: correct initialization of skb list
We change the initialization of the skb transmit buffer queues in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also initialize their spinlocks. T
tipc: correct initialization of skb list
We change the initialization of the skb transmit buffer queues in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also initialize their spinlocks. This is needed because we may, during error conditions, need to call skb_queue_purge() on those queues further down the stack.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|