#
dc489f86 |
| 14-Feb-2024 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, ei
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay would race against the creation of new group memberships, either from the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of br->mdb_list, the notification of their existence to switchdev event subscribers is deferred until a later point in time. So if a replay list was generated during a time that overlapped with such a window, it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally considered to be one single event. On destruction of the bridge, only a single membership deletion event was therefore sent. As a consequence of this, drivers which reference count memberships (at least DSA), would be left with orphan groups in their hardware database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events may still be pending on the deferred queue, they will already have been removed from br->mdb_list, so no duplicates can be generated in that scenario.
To a user this meant that old group memberships, from a bridge in which a port was previously attached, could be reanimated (in hardware) when the port joined a new bridge, without the new bridge's knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ > ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0 root@infix-06-0b-00:~$ mvls atu ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a DEV:0 Marvell 88E6393X 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the port (x3) is notified of the host's membership twice: once via the original event and once via a replay. Since only a single delete notification is sent, the count remains at 1 when the bridge is destroyed.
Then add the same port (or another port belonging to the same hardware domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ > ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be flooded, according to the policy of br1. But instead the old memberships are still active in the hardware database, causing the switch to only forward traffic to those groups towards the CPU (port 0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay list.
This prevents new memberships from showing up while we are generating the replay list. But it leaves the scenario in which a deferred event was already generated, but not delivered, before we grabbed the lock. Therefore:
2. Make sure that no deferred version of a replay event is already enqueued to the switchdev deferred queue, before adding it to the replay list, when replaying additions.
Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a7672871 |
| 08-Aug-2023 |
Yue Haibing <yuehaibing@huawei.com> |
net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") removed the implementation but leave d
net: switchdev: Remove unused declaration switchdev_port_fwd_mark_set()
Commit 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices") removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://lore.kernel.org/r/20230808145955.2176-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f85b1c7d |
| 01-Aug-2023 |
Yue Haibing <yuehaibing@huawei.com> |
net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()
Commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") leave this unused.
Signed-off-by: Yue Haibing <yu
net: switchdev: Remove unused typedef switchdev_obj_dump_cb_t()
Commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") leave this unused.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230801144209.27512-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
f2e2857b |
| 19-Jul-2023 |
Petr Machata <petrm@nvidia.com> |
net: switchdev: Add a helper to replay objects on a bridge port
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on t
net: switchdev: Add a helper to replay objects on a bridge port
When a front panel joins a bridge via another netdevice (typically a LAG), the driver needs to learn about the objects configured on the bridge port. When the bridge port is offloaded by the driver for the first time, this can be achieved by passing a notifier to switchdev_bridge_port_offload(). The notifier is then invoked for the individual objects (such as VLANs) configured on the bridge, and can look for the interesting ones.
Calling switchdev_bridge_port_offload() when the second port joins the bridge lower is unnecessary, but the replay is still needed. To that end, add a new function, switchdev_bridge_port_replay(), which does only the replay part of the _offload() function in exactly the same way as that function.
Cc: Jiri Pirko <jiri@resnulli.us> Cc: Ivan Vecera <ivecera@redhat.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: bridge@lists.linux-foundation.org Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
27fabd02 |
| 08-Nov-2022 |
Hans J. Schultz <netdev@kapio-technology.com> |
bridge: switchdev: Allow device drivers to install locked FDB entries
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize
bridge: switchdev: Allow device drivers to install locked FDB entries
When the bridge is offloaded to hardware, FDB entries are learned and aged-out by the hardware. Some device drivers synchronize the hardware and software FDBs by generating switchdev events towards the bridge.
When a port is locked, the hardware must not learn autonomously, as otherwise any host will blindly gain authorization. Instead, the hardware should generate events regarding hosts that are trying to gain authorization and their MAC addresses should be notified by the device driver as locked FDB entries towards the bridge driver.
Allow device drivers to notify the bridge driver about such entries by extending the 'switchdev_notifier_fdb_info' structure with the 'locked' bit. The bit can only be set by device drivers and not by the bridge driver.
Prevent a locked entry from being installed if MAB is not enabled on the bridge port.
If an entry already exists in the bridge driver, reject the locked entry if the current entry does not have the "locked" flag set or if it points to a different port. The same semantics are implemented in the software data path.
Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
3eb4a4c3 |
| 28-Jun-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: add reminder near struct switchdev_notifier_fdb_info
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and initializes it member by member. As such, newly added fields
net: switchdev: add reminder near struct switchdev_notifier_fdb_info
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and initializes it member by member. As such, newly added fields which are not initialized by br_switchdev_fdb_notify() will contain junk bytes from the stack.
Other uses of struct switchdev_notifier_fdb_info have a struct initializer which should put zeroes in the uninitialized fields.
Add a reminder above the structure for future developers. Recently discussed during review.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-2-schultz.hans+netdev@gmail.com/#24877698 Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-3-schultz.hans+netdev@gmail.com/#24912269 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220628100831.2899434-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7ae9147f |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Notify switchdev drivers of MST state changes
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, sin
net: bridge: mst: Notify switchdev drivers of MST state changes
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
6284c723 |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings.
S
net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
87c167bb |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: bridge: mst: Notify switchdev drivers of MST mode changes
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any requi
net: bridge: mst: Notify switchdev drivers of MST mode changes
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
ec638740 |
| 23-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device
When the switchdev_handle_fdb_event_to_device() event replication helper was created, my original thought was that FDB eve
net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device
When the switchdev_handle_fdb_event_to_device() event replication helper was created, my original thought was that FDB events on LAG interfaces should most likely be special-cased, not just replicated towards all switchdev ports beneath that LAG. So this replication helper currently does not recurse through switchdev lower interfaces of LAG bridge ports, but rather calls the lag_mod_cb() if that was provided.
No switchdev driver uses this helper for FDB events on LAG interfaces yet, so that was an assumption which was yet to be tested. It is certainly usable for that purpose, as my RFC series shows:
https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
however this approach is slightly convoluted because:
- the switchdev driver gets a "dev" that isn't its own net device, but rather the LAG net device. It must call switchdev_lower_dev_find(dev) in order to get a handle of any of its own net devices (the ones that pass check_cb).
- in order for FDB entries on LAG ports to be correctly refcounted per the number of switchdev ports beneath that LAG, we haven't escaped the need to iterate through the LAG's lower interfaces. Except that is now the responsibility of the switchdev driver, because the replication helper just stopped half-way.
So, even though yes, FDB events on LAG bridge ports must be special-cased, in the end it's simpler to let switchdev_handle_fdb_* just iterate through the LAG port's switchdev lowers, and let the switchdev driver figure out that those physical ports are under a LAG.
The switchdev_handle_fdb_event_to_device() helper takes a "foreign_dev_check" callback so it can figure out whether @dev can autonomously forward to @foreign_dev. DSA fills this method properly: if the LAG is offloaded by another port in the same tree as @dev, then it isn't foreign. If it is a software LAG, it is foreign - forwarding happens in software.
Whether an interface is foreign or not decides whether the replication helper will go through the LAG's switchdev lowers or not. Since the lan966x doesn't properly fill this out, FDB events on software LAG uppers will get called. By changing lan966x_foreign_dev_check(), we can suppress them.
Whereas DSA will now start receiving FDB events for its offloaded LAG uppers, so we need to return -EOPNOTSUPP, since we currently don't do the right thing for them.
Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
c4076cdd |
| 15-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces
The switchdev_handle_port_obj_add() helper is good for replicating a port object on the lower interfaces of @dev,
net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces
The switchdev_handle_port_obj_add() helper is good for replicating a port object on the lower interfaces of @dev, if that object was emitted on a bridge, or on a bridge port that is a LAG.
However, drivers that use this helper limit themselves to a box from which they can no longer intercept port objects notified on neighbor ports ("foreign interfaces").
One such driver is DSA, where software bridging with foreign interfaces such as standalone NICs or Wi-Fi APs is an important use case. There, a VLAN installed on a neighbor bridge port roughly corresponds to a forwarding VLAN installed on the DSA switch's CPU port.
To support this use case while also making use of the benefits of the switchdev_handle_* replication helper for port objects, introduce a new variant of these functions that crawls through the neighbor ports of @dev, in search of potentially compatible switchdev ports that are interested in the event.
The strategy is identical to switchdev_handle_fdb_event_to_device(): if @dev wasn't a switchdev interface, then go one step upper, and recursively call this function on the bridge that this port belongs to. At the next recursion step, __switchdev_handle_port_obj_add() will iterate through the bridge's lower interfaces. Among those, some will be switchdev interfaces, and one will be the original @dev that we came from. To prevent infinite recursion, we must suppress reentry into the original @dev, and just call the @add_cb for the switchdev_interfaces.
It looks like this:
br0 / | \ / | \ / | \ swp0 swp1 eth0
1. __switchdev_handle_port_obj_add(eth0) -> check_cb(eth0) returns false -> eth0 has no lower interfaces -> eth0's bridge is br0 -> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb)) finds br0
2. __switchdev_handle_port_obj_add(br0) -> check_cb(br0) returns false -> netdev_for_each_lower_dev -> check_cb(swp0) returns true, so we don't skip this interface
3. __switchdev_handle_port_obj_add(swp0) -> check_cb(swp0) returns true, so we call add_cb(swp0)
(back to netdev_for_each_lower_dev from 2) -> check_cb(swp1) returns true, so we don't skip this interface
4. __switchdev_handle_port_obj_add(swp1) -> check_cb(swp1) returns true, so we call add_cb(swp1)
(back to netdev_for_each_lower_dev from 2) -> check_cb(eth0) returns false, so we skip this interface to avoid infinite recursion
Note: eth0 could have been a LAG, and we don't want to suppress the recursion through its lowers if those exist, so when check_cb() returns false, we still call switchdev_lower_dev_find() to estimate whether there's anything worth a recursion beneath that LAG. Using check_cb() and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures out whether the lowers of the LAG are switchdev, but also whether they actively offload the LAG or not (whether the LAG is "foreign" to the switchdev interface or not).
The port_obj_info->orig_dev is preserved across recursive calls, so switchdev drivers still know on which device was this notification originally emitted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
8d23a54f |
| 15-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: differentiate new VLANs from changed ones
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
net: bridge: switchdev: differentiate new VLANs from changed ones
br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
- a struct net_bridge_vlan got created - an existing struct net_bridge_vlan was modified
This makes it impossible for switchdev drivers to properly balance PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to happen, we must provide a way for drivers to distinguish between a VLAN with changed flags and a new one.
Annotate struct switchdev_obj_port_vlan with a "bool changed" that distinguishes the 2 cases above.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
716a30a9 |
| 26-Oct-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device
To reduce code churn, the same patch makes multiple changes, since they all touch the same lines:
1. The implementations for these two
net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device
To reduce code churn, the same patch makes multiple changes, since they all touch the same lines:
1. The implementations for these two are identical, just with different function pointers. Reduce duplications and name the function pointers "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.
2. Drop the "const" attribute from "orig_dev". If the driver needs to check whether orig_dev belongs to itself and then call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it can't, because call_switchdev_notifiers takes a non-const struct net_device *.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
957e2235 |
| 03-Aug-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let driver
net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge
With the introduction of explicit offloading API in switchdev in commit 2f5dc00f7a3e ("net: bridge: switchdev: let drivers inform which bridge ports are offloaded"), we started having Ethernet switch drivers calling directly into a function exported by net/bridge/br_switchdev.c, which is a function exported by the bridge driver.
This means that drivers that did not have an explicit dependency on the bridge before, like cpsw and am65-cpsw, now do - otherwise it is not possible to call a symbol exported by a driver that can be built as module unless you are a module too.
There was an attempt to solve the dependency issue in the form of commit b0e81817629a ("net: build all switchdev drivers as modules when the bridge is a module"). Grygorii Strashko, however, says about it:
| In my opinion, the problem is a bit bigger here than just fixing the | build :( | | In case, of ^cpsw the switchdev mode is kinda optional and in many | cases (especially for testing purposes, NFS) the multi-mac mode is | still preferable mode. | | There were no such tight dependency between switchdev drivers and | bridge core before and switchdev serviced as independent, notification | based layer between them, so ^cpsw still can be "Y" and bridge can be | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE | will need to be set as "Y", or we will have to update drivers to | support build with BRIDGE=n and maintain separate builds for | networking vs non-networking testing. But is this enough? Wouldn't | it cause 'chain reaction' required to add more and more "Y" options | (like CONFIG_VLAN_8021Q)? | | PS. Just to be sure we on the same page - ARM builds will be forced | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our | automation testing will just fail with omap2plus_defconfig.
In the light of this, it would be desirable for some configurations to avoid dependencies between switchdev drivers and the bridge, and have the switchdev mode as completely optional within the driver.
Arnd Bergmann also tried to write a patch which better expressed the build time dependency for Ethernet switch drivers where the switchdev support is optional, like cpsw/am65-cpsw, and this made the drivers follow the bridge (compile as module if the bridge is a module) only if the optional switchdev support in the driver was enabled in the first place: https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
but this still did not solve the fact that cpsw and am65-cpsw now must be built as modules when the bridge is a module - it just expressed correctly that optional dependency. But the new behavior is an apparent regression from Grygorii's perspective.
So to support the use case where the Ethernet driver is built-in, NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we need a framework that can handle the possible absence of the bridge from the running system, i.e. runtime bloatware as opposed to build-time bloatware.
Luckily we already have this framework, since switchdev has been using it extensively. Events from the bridge side are transmitted to the driver side using notifier chains - this was originally done so that unrelated drivers could snoop for events emitted by the bridge towards ports that are implemented by other drivers (think of a switch driver with LAG offload that listens for switchdev events on a bonding/team interface that it offloads).
There are also events which are transmitted from the driver side to the bridge side, which again are modeled using notifiers. SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with notifying the bridge that a MAC address has been dynamically learned. So there is a precedent we can use for modeling the new framework.
The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work that the bridge needs to do when a port becomes offloaded is blocking in its nature: replay VLANs, MDBs etc. The calling context is indeed blocking (we are under rtnl_mutex), but the existing switchdev notification chain that the bridge is subscribed to is only the atomic one. So we need to subscribe the bridge to the blocking switchdev notification chain too.
This patch: - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload unchanged - moves the implementation of switchdev_bridge_port_{,un}offload from the bridge module into the switchdev module. - makes everybody that is subscribed to the switchdev blocking notifier chain "hear" offload & unoffload events - makes the bridge driver subscribe and handle those events - moves the bridge driver's handling of those events into 2 new functions called br_switchdev_port_{,un}offload. These functions contain in fact the core of the logic that was previously in switchdev_bridge_port_{,un}offload, just that now we go through an extra indirection layer to reach them.
Unlike all the other switchdev notification structures, the structure used to carry the bridge port information, struct switchdev_notifier_brport_info, does not contain a "bool handled". This is because in the current usage pattern, we always know that a switchdev bridge port offloading event will be handled by the bridge, because the switchdev_bridge_port_offload() call was initiated by a NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event couldn't have happened.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
94111dfc |
| 20-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: remove stray semicolon in switchdev_handle_fdb_del_to_device shim
With the semicolon at the end, the compiler sees the shim function as a declaration and not as a definition, and war
net: switchdev: remove stray semicolon in switchdev_handle_fdb_del_to_device shim
With the semicolon at the end, the compiler sees the shim function as a declaration and not as a definition, and warns:
'switchdev_handle_fdb_del_to_device' declared 'static' but never defined
Reported-by: kernel test robot <lkp@intel.com> Fixes: 8ca07176ab00 ("net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
8ca07176 |
| 19-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
Currently DSA has an issue with FDB entries pointing towards the bridge in the presence of br_fdb_replay() being calle
net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
Currently DSA has an issue with FDB entries pointing towards the bridge in the presence of br_fdb_replay() being called at port join and leave time.
In particular, each bridge port will ask for a replay for the FDB entries pointing towards the bridge when it joins, and for another replay when it leaves.
This means that for example, a bridge with 4 switch ports will notify DSA 4 times of the bridge MAC address.
But if the MAC address of the bridge changes during the normal runtime of the system, the bridge notifies switchdev [ once ] of the deletion of the old MAC address as a local FDB towards the bridge, and of the insertion [ again once ] of the new MAC address as a local FDB.
This is a problem, because DSA keeps the old MAC address as a host FDB entry with refcount 4 (4 ports asked for it using br_fdb_replay). So the old MAC address will not be deleted. Additionally, the new MAC address will only be installed with refcount 1, and when the first switch port leaves the bridge (leaving 3 others as still members), it will delete with it the new MAC address of the bridge from the local FDB entries kept by DSA (because the br_fdb_replay call on deletion will bring the entry's refcount from 1 to 0).
So the problem, really, is that the number of br_fdb_replay() calls is not matched with the refcount that a host FDB is offloaded to DSA during normal runtime.
An elegant way to solve the problem would be to make the switchdev notification emitted by br_fdb_change_mac_address() result in a host FDB kept by DSA which has a refcount exactly equal to the number of ports under that bridge. Then, no matter how many DSA ports join or leave that bridge, the host FDB entry will always be deleted when there are exactly zero remaining DSA switch ports members of the bridge.
To implement the proposed solution, we remember that the switchdev objects and port attributes have some helpers provided by switchdev, which can be optionally called by drivers: switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set. These helpers: - fan out a switchdev object/attribute emitted for the bridge towards all the lower interfaces that pass the check_cb(). - fan out a switchdev object/attribute emitted for a bridge port that is a LAG towards all the lower interfaces that pass the check_cb().
In other words, this is the model we need for the FDB events too: something that will keep an FDB entry emitted towards a physical port as it is, but translate an FDB entry emitted towards the bridge into N FDB entries, one per physical port.
Of course, there are many differences between fanning out a switchdev object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards a LAG should be treated specially, because FDB entries are unicast, we can't just install the same address towards 3 destinations. It is imaginable that drivers might want to treat this case specifically, so create some methods for this case and do not recurse into the LAG lower ports, just the bridge ports.
DSA also listens for FDB entries on "foreign" interfaces, aka interfaces bridged with us which are not part of our hardware domain: think an Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA installs host FDB entries. However, there we have the same problem (those host FDB entries are installed with a refcount of only 1) and an even bigger one which we did not have with FDB entries towards the bridge:
br_fdb_replay() is currently not called for FDB entries on foreign interfaces, just for the physical port and for the bridge itself.
So when DSA sniffs an address learned by the software bridge towards a foreign interface like an e1000 port, and then that e1000 leaves the bridge, DSA remains with the dangling host FDB address. That will be fixed separately by replaying all FDB entries and not just the ones towards the port and the bridge.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c6451cda |
| 19-Jul-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: introduce helper for checking dynamically learned FDB entries
It is a bit difficult to understand what DSA checks when it tries to avoid installing dynamically learned addresses on f
net: switchdev: introduce helper for checking dynamically learned FDB entries
It is a bit difficult to understand what DSA checks when it tries to avoid installing dynamically learned addresses on foreign interfaces as local host addresses, so create a generic switchdev helper that can be reused and is generally more readable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
69bfac96 |
| 27-Jun-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: add a context void pointer to struct switchdev_notifier_info
In the case where the driver asks for a replay of a certain type of event (port object or attribute) for a bridge port th
net: switchdev: add a context void pointer to struct switchdev_notifier_info
In the case where the driver asks for a replay of a certain type of event (port object or attribute) for a bridge port that is a LAG, it may do so because this port has just joined the LAG.
But there might already be other switchdev ports in that LAG, and it is preferable that those preexisting switchdev ports do not act upon the replayed event.
The solution is to add a context to switchdev events, which is NULL most of the time (when the bridge layer initiates the call) but which can be set to a value controlled by the switchdev driver when a replay is requested. The driver can then check the context to figure out if all ports within the LAG should act upon the switchdev event, or just the ones that match the context.
We have to modify all switchdev_handle_* helper functions as well as the prototypes in the drivers that use these helpers too, because these helpers hide the underlying struct switchdev_notifier_info from us and there is no way to retrieve the context otherwise.
The context structure will be populated and used in later patches.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
2c4eca3e |
| 14-Apr-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: switchdev: include local flag in FDB notifications
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussi
net: bridge: switchdev: include local flag in FDB notifications
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/
the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses.
Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice.
Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4f2673b3 |
| 22-Mar-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: add helper to replay port and host-joined mdb entries
I have a system with DSA ports, and udhcpcd is configured to bring interfaces up as soon as they are created.
I create a bridge as
net: bridge: add helper to replay port and host-joined mdb entries
I have a system with DSA ports, and udhcpcd is configured to bring interfaces up as soon as they are created.
I create a bridge as follows:
ip link add br0 type bridge
As soon as I create the bridge and udhcpcd brings it up, I also have avahi which automatically starts sending IPv6 packets to advertise some local services, and because of that, the br0 bridge joins the following IPv6 groups due to the code path detailed below:
33:33:ff:6d:c1:9c vid 0 33:33:00:00:00:6a vid 0 33:33:00:00:00:fb vid 0
br_dev_xmit -> br_multicast_rcv -> br_ip6_multicast_add_group -> __br_multicast_add_group -> br_multicast_host_join -> br_mdb_notify
This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host hooked up, and switchdev will attempt to offload the host joined groups to an empty list of ports. Of course nobody offloads them.
Then when we add a port to br0:
ip link set swp0 master br0
the bridge doesn't replay the host-joined MDB entries from br_add_if, and eventually the host joined addresses expire, and a switchdev notification for deleting it is emitted, but surprise, the original addition was already completely missed.
The strategy to address this problem is to replay the MDB entries (both the port ones and the host joined ones) when the new port joins the bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can be populated and only then attached to a bridge that you offload). However there are 2 possibilities: the addresses can be 'pushed' by the bridge into the port, or the port can 'pull' them from the bridge.
Considering that in the general case, the new port can be really late to the party, and there may have been many other switchdev ports that already received the initial notification, we would like to avoid delivering duplicate events to them, since they might misbehave. And currently, the bridge calls the entire switchdev notifier chain, whereas for replaying it should just call the notifier block of the new guy. But the bridge doesn't know what is the new guy's notifier block, it just knows where the switchdev notifier chain is. So for simplification, we make this a driver-initiated pull for now, and the notifier block is passed as an argument.
To emulate the calling context for mdb objects (deferred and put on the blocking notifier chain), we must iterate under RCU protection through the bridge's mdb entries, queue them, and only call them once we're out of the RCU read-side critical section.
There was some opportunity for reuse between br_mdb_switchdev_host_port, br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev mdb object is created, so a helper was created.
Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c513efa2 |
| 16-Feb-2021 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
switchdev: mrp: Extend ring_role_mrp and in_role_mrp
Add the member sw_backup to the structures switchdev_obj_ring_role_mrp and switchdev_obj_in_role_mrp. In this way the SW can call the driver in 2
switchdev: mrp: Extend ring_role_mrp and in_role_mrp
Add the member sw_backup to the structures switchdev_obj_ring_role_mrp and switchdev_obj_in_role_mrp. In this way the SW can call the driver in 2 ways, once when sw_backup is set to false, meaning that the driver should implement this completely in HW. And if that is not supported the SW will call again but with sw_backup set to true, meaning that the HW should help or allow the SW to run the protocol.
For example when role is MRM, if the HW can't detect when it stops receiving MRP Test frames but it can trap these frames to CPU, then it needs to return -EOPNOTSUPP when sw_backup is false and return 0 when sw_backup is true.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
405be6b4 |
| 16-Feb-2021 |
Horatiu Vultur <horatiu.vultur@microchip.com> |
switchdev: mrp: Remove CONFIG_BRIDGE_MRP
Remove #IS_ENABLED(CONFIG_BRIDGE_MRP) from switchdev.h. This will simplify the code implements MRP callbacks and will be similar with the vlan filtering.
Si
switchdev: mrp: Remove CONFIG_BRIDGE_MRP
Remove #IS_ENABLED(CONFIG_BRIDGE_MRP) from switchdev.h. This will simplify the code implements MRP callbacks and will be similar with the vlan filtering.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
419dfaed |
| 15-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: fix switchdev_port_attr_set stub when CONFIG_SWITCHDEV=n
The switchdev_port_attr_set function prototype was updated only for the case where CONFIG_SWITCHDEV=y|m, leaving a prototype mis
net: bridge: fix switchdev_port_attr_set stub when CONFIG_SWITCHDEV=n
The switchdev_port_attr_set function prototype was updated only for the case where CONFIG_SWITCHDEV=y|m, leaving a prototype mismatch with the stub definition for the disabled case. This results in a build error, so update that function too.
Fixes: dcbdf1350e33 ("net: bridge: propagate extack through switchdev_port_attr_set") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
dcbdf135 |
| 13-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: bridge: propagate extack through switchdev_port_attr_set
The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR
net: bridge: propagate extack through switchdev_port_attr_set
The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
e18f4c18 |
| 12-Feb-2021 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes
This switchdev attribute offers a counterproductive API for a driver writer, because although br_switchdev_set_port_flag ge
net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes
This switchdev attribute offers a counterproductive API for a driver writer, because although br_switchdev_set_port_flag gets passed a "flags" and a "mask", those are passed piecemeal to the driver, so while the PRE_BRIDGE_FLAGS listener knows what changed because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final value. But certain drivers can offload only certain combinations of settings, like for example they cannot change unicast flooding independently of multicast flooding - they must be both on or both off. The way the information is passed to switchdev makes drivers not expressive enough, and unable to reject this request ahead of time, in the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during the deferred BRIDGE_FLAGS attribute, where the rejection is currently ignored.
This patch also changes drivers to make use of the "mask" field for edge detection when possible.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|