History log of /netbsd/sys/net/if_mpls.c (Results 1 – 25 of 41)
Revision Date Author Comments
# b6400998 03-Sep-2022 thorpej <thorpej@NetBSD.org>

Machete-waving to fix mpls rump build after pktqueue changes.


# d10e2ccb 03-Sep-2022 thorpej <thorpej@NetBSD.org>

Garbage-collect the remaining vestiges of netisr.


# 7258b55e 03-Sep-2022 thorpej <thorpej@NetBSD.org>

Convert MPLS from a legacy netisr to pktqueue.


# c47000ff 29-Jul-2022 skrll <skrll@NetBSD.org>

No need to wrap the call to if_detach with splnet / splx as if_detach
raises spl as required.


# bbdda93b 16-Jun-2021 riastradh <riastradh@NetBSD.org>

if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so

if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.

show more ...


# 3a4e0c47 29-Jan-2020 thorpej <thorpej@NetBSD.org>

Adopt <net/if_stats.h>.


# c3dd4857 27-Apr-2019 pgoyette <pgoyette@NetBSD.org>

A few more empty-string --> NULL in required-modules lists


# 8517c9d1 26-Jun-2018 msaitoh <msaitoh@NetBSD.org>

Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug th

Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug that the direction is misunderstand on some
environment by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.

show more ...


# fd44ad84 19-Jan-2018 maxv <maxv@NetBSD.org>

Several changes:

* Declare TRIM_LABEL as a function.

* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.

* In mpls_label_

Several changes:

* Declare TRIM_LABEL as a function.

* In mpls_unlabel_inet, copy the label locally. It's not incorrect to
keep a pointer on the mbuf, but it's bug-friendly.

* In mpls_label_inetX, fix the length check. Meanwhile add an XXX: we
just want to make sure that m_copydata won't fail, but if we were
guaranteed that m has M_PKTHDR set, we could simply check the length
against m->m_pkthdr.len.

show more ...


# b63c71a0 09-Dec-2017 maxv <maxv@NetBSD.org>

Kick MPLS packets earlier.


# 9370b4fc 08-Dec-2017 maxv <maxv@NetBSD.org>

Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always fre

Style, and fix several bugs:
- ip4_check(), mpls_unlabel_inet() and mpls_unlabel_inet6() perform
pullups, so we need to pass the updated pointers back
- in mpls_lse() the route is not always freed
Looks a little better now.

show more ...


# 7aca579b 23-Oct-2017 msaitoh <msaitoh@NetBSD.org>

If if_attach() failed in the attach function, free resources and return.


# 683bfd37 12-Dec-2016 ozaki-r <ozaki-r@NetBSD.org>

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
--------

We protect the routing table with a rwock and protect

Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
--------

We protect the routing table with a rwock and protect
rtcaches with another rwlock. Each rtentry is protected
from being freed or updated via reference counting and psref.

Global rwlocks
--------------

There are two rwlocks; one for the routing table (rt_lock) and
the other for rtcaches (rtcache_lock). rtcache_lock covers
all existing rtcaches; there may have room for optimizations
(future work).

The locking order is rtcache_lock first and rt_lock is next.

rtentry references
------------------

References to an rtentry is managed with reference counting
and psref. Either of the two mechanisms is used depending on
where a rtentry is obtained. Reference counting is used when
we obtain a rtentry from the routing table directly via
rtalloc1 and rtrequest{,1} while psref is used when we obtain
a rtentry from a rtcache via rtcache_* APIs. In both cases,
a caller can sleep/block with holding an obtained rtentry.

The reasons why we use two different mechanisms are (i) only
using reference counting hurts the performance due to atomic
instructions (rtcache case) (ii) ease of implementation;
applying psref to APIs such rtaloc1 and rtrequest{,1} requires
additional works (adding a local variable and an argument).

We will finally migrate to use only psref but we can do it
when we have a lockless routing table alternative.

Reference counting for rtentry
------------------------------

rt_refcnt now doesn't count permanent references such as for
rt_timers and rtcaches, instead it is used only for temporal
references when obtaining a rtentry via rtalloc1 and rtrequest{,1}.
We can do so because destroying a rtentry always involves
removing references of rt_timers and rtcaches to the rtentry
and we don't need to track such references. This also makes
it easy to wait for readers to release references on deleting
or updating a rtentry, i.e., we can simply wait until the
reference counter is 0 or 1. (If there are permanent references
the counter can be arbitrary.)

rt_ref increments a reference counter of a rtentry and rt_unref
decrements it. rt_ref is called inside APIs (rtalloc1 and
rtrequest{,1} so users don't need to care about it while
users must call rt_unref to an obtained rtentry after using it.

rtfree is removed and we use rt_unref and rt_free instead.
rt_unref now just decrements the counter of a given rtentry
and rt_free just tries to destroy a given rtentry.

See the next section for destructions of rtentries by rt_free.

Destructions of rtentries
-------------------------

We destroy a rtentry only when we call rtrequst{,1}(RTM_DELETE);
the original implementation can destroy in any rtfree where it's
the last reference. If we use reference counting or psref, it's
easy to understand if the place that a rtentry is destroyed is
fixed.

rt_free waits for references to a given rtentry to be released
before actually destroying the rtentry. rt_free uses a condition
variable (cv_wait) (and psref_target_destroy for psref) to wait.

Unfortunately rtrequst{,1}(RTM_DELETE) can be called in softint
that we cannot use cv_wait. In that case, we have to defer the
destruction to a workqueue.

rtentry#rt_cv, rtentry#rt_psref and global variables
(see rt_free_global) are added to conduct the procedure.

Updates of rtentries
--------------------

One difficulty to use refcnt/psref instead of rwlock for rtentry
is updates of rtentries. We need an additional mechanism to
prevent readers from seeing inconsistency of a rtentry being
updated.

We introduce RTF_UPDATING flag to rtentries that are updating.
While the flag is set to a rtentry, users cannot acquire the
rtentry. By doing so, we avoid users to see inconsistent
rtentries.

There are two options when a user tries to acquire a rtentry
with the RTF_UPDATING flag; if a user runs in softint context
the user fails to acquire a rtentry (NULL is returned).
Otherwise a user waits until the update completes by waiting
on cv.

The procedure of a updater is simpler to destruction of
a rtentry. Wait on cv (and psref) and after all readers left,
proceed with the update.

Global variables (see rt_update_global) are added to conduct
the procedure.

Currently we apply the mechanism to only RTM_CHANGE in
rtsock.c. We would have to apply other codes. See
"Known issues" section.

psref for rtentry
-----------------

When we obtain a rtentry from a rtcache via rtcache_* APIs,
psref is used to reference to the rtentry.

rtcache_ref acquires a reference to a rtentry with psref
and rtcache_unref releases the reference after using it.
rtcache_ref is called inside rtcache_* APIs and users don't
need to take care of it while users must call rtcache_unref
to release the reference.

struct psref and int bound that is needed for psref is
embedded into struct route. By doing so we don't need to
add local variables and additional argument to APIs.

However this adds another constraint to psref other than
reference counting one's; holding a reference of an rtentry
via a rtcache is allowed by just one caller at the same time.
So we must not acquire a rtentry via a rtcache twice and
avoid a recursive use of a rtcache. And also a rtcache must
be arranged to be used by a LWP/softint at the same time
somehow. For IP forwarding case, we have per-CPU rtcaches
used in softint so the constraint is guaranteed. For a h
rtcache of a PCB case, the constraint is guaranteed by the
solock of each PCB. Any other cases (pf, ipf, stf and ipsec)
are currently guaranteed by only the existence of the global
locks (softnet_lock and/or KERNEL_LOCK). If we've found the
cases that we cannot guarantee the constraint, we would need
to introduce other rtcache APIs that use simple reference
counting.

psref of rtcache is created with IPL_SOFTNET and so rtcache
shouldn't used at an IPL higher than IPL_SOFTNET.

Note that rtcache_free is used to invalidate a given rtcache.
We don't need another care by my change; just keep them as
they are.

Performance impact
------------------

When NET_MPSAFE is disabled the performance drop is 3% while
when it's enabled the drop is increased to 11%. The difference
comes from that currently we don't take any global locks and
don't use psref if NET_MPSAFE is disabled.

We can optimize the performance of the case of NET_MPSAFE
on by reducing lookups of rtcache that uses psref;
currently we do two lookups but we should be able to trim
one of two. This is a future work.

Known issues
------------

There are two known issues to be solved; one is that
a caller of rtrequest(RTM_ADD) may change rtentry (see rtinit).
We need to prevent new references during the update. Or
we may be able to remove the code (perhaps, need more
investigations).

The other is rtredirect that updates a rtentry. We need
to apply our update mechanism, however it's not easy because
rtredirect is called in softint and we cannot apply our
mechanism simply. One solution is to defer rtredirect to
a workqueue but it requires some code restructuring.

show more ...


# 8253fb55 03-Oct-2016 ozaki-r <ozaki-r@NetBSD.org>

Fix race condition on ifqueue used by traditional netisr

If a underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutin

Fix race condition on ifqueue used by traditional netisr

If a underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

There is a shared data between a Layer 2 routine and a Layer 3 routine,
that is ifqueue and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@

show more ...


# 4a3aea06 07-Aug-2016 christos <christos@NetBSD.org>

modularize some more drivers and merge the module glue


# 575a7585 07-Jul-2016 msaitoh <msaitoh@NetBSD.org>

KNF. Remove extra spaces. No functional change.


# 0ee43273 20-Jun-2016 knakahara <knakahara@NetBSD.org>

fix: kern/51259


# c3596435 20-Jun-2016 knakahara <knakahara@NetBSD.org>

apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).


# 9d07c695 10-Jun-2016 ozaki-r <ozaki-r@NetBSD.org>

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
pac

Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.

show more ...


# 9b969dc0 28-Apr-2016 ozaki-r <ozaki-r@NetBSD.org>

Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentri

Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (os psrefing) rtentries.

show more ...


# ce4e075c 26-Apr-2016 ozaki-r <ozaki-r@NetBSD.org>

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 address

Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.

show more ...


# b8256fd8 09-Feb-2016 ozaki-r <ozaki-r@NetBSD.org>

Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this

Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utlizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does rest packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!

show more ...


# 082aeae6 24-Aug-2015 pooka <pooka@NetBSD.org>

sprinkle _KERNEL_OPT


# 4bf9ea45 20-Aug-2015 christos <christos@NetBSD.org>

include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.


# 3442f7f8 04-Jun-2015 ozaki-r <ozaki-r@NetBSD.org>

Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output rout

Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ehter_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.

show more ...


12