#
0f9e9ec2 |
| 13-May-2024 |
jsg <jsg@openbsd.org> |
remove prototypes with no matching function ok mpi@
|
#
7ac7a880 |
| 12-May-2024 |
jsg <jsg@openbsd.org> |
sync_ifp and ticket_pabuf don't exist, remove externs
|
#
19e99d06 |
| 22-Apr-2024 |
bluhm <bluhm@openbsd.org> |
Show pf fragment reassembly counters.
Framgent count and statistics are stored in struct pf_status. From there pfctl(8) and systat(1) collect and show them. Note that pfctl -s info needs the -v sw
Show pf fragment reassembly counters.
Framgent count and statistics are stored in struct pf_status. From there pfctl(8) and systat(1) collect and show them. Note that pfctl -s info needs the -v switch to show fragments. As fragment reassembly has its own mutex, also grab this in pf ipctl(2) and sysctl(2) code.
input claudio@; OK henning@
show more ...
|
#
034f31ce |
| 01-Jan-2024 |
bluhm <bluhm@openbsd.org> |
Protect link between pf and inp with mutex.
Introduce global mutex to protect the pointers between pf state key and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do not need exclusive ne
Protect link between pf and inp with mutex.
Introduce global mutex to protect the pointers between pf state key and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do not need exclusive netlock anymore. Use a bunch of read once unlocked access to reduce performance impact.
OK sashan@
show more ...
|
#
46650f23 |
| 10-Oct-2023 |
bluhm <bluhm@openbsd.org> |
Remove dead code in pf_pull_hdr().
pf_pull_hdr() allows to pass an action pointer parameter as output value. This is never used, all callers pass a NULL argument. Remove ACTION_SET() entirely.
Th
Remove dead code in pf_pull_hdr().
pf_pull_hdr() allows to pass an action pointer parameter as output value. This is never used, all callers pass a NULL argument. Remove ACTION_SET() entirely.
The logic (fragoff >= len) in pf_pull_hdr() does not work since revision 1.4. Before it was used to drop short TCP or UDP fragments that contained only part of the header. Current code in pf_pull_hdr() drops the packets anyway, so always set reason PFRES_FRAG.
OK kn@ sashan@
show more ...
|
#
cc90b7e6 |
| 06-Jul-2023 |
dlg <dlg@openbsd.org> |
big update to pfsync to try and clean up locking in particular.
moving pf forward has been a real struggle, and pfsync has been a constant source of pain. we have been papering over the problems for
big update to pfsync to try and clean up locking in particular.
moving pf forward has been a real struggle, and pfsync has been a constant source of pain. we have been papering over the problems for a while now, but it reached the point that it needed a fundamental restructure, which is what this diff is.
the big headliner changes in this diff are:
- pfsync specific locks
this is the whole reason for this diff.
rather than rely on NET_LOCK or KERNEL_LOCK or whatever, pfsync now has it's own locks to protect it's internal data structures. this is important because pfsync runs a bunch of timeouts and tasks to push pfsync packets out on the wire, or when it's handling requests generated by incoming pfsync packets, both of which happen outside pf itself running. having pfsync specific locks around pfsync data structures makes the mutations of these data structures a lot more explicit and auditable.
- partitioning
to enable future parallelisation of the network stack, this rewrite includes support for pfsync to partition states into different "slices". these slices run independently, ie, the states collected by one slice are serialised into a separate packet to the states collected and serialised by another slice.
states are mapped to pfsync slices based on the pf state hash, which is the same hash that the rest of the network stack and multiq hardware uses.
- no more pfsync called from netisr
pfsync used to be called from netisr to try and bundle packets, but now that there's multiple pfsync slices this doesnt make sense. instead it uses tasks in softnet tqs.
- improved bulk transfer handling
there's shiny new state machines around both the bulk transmit and receive handling. pfsync used to do horrible things to carp demotion counters, but now it is very predictable and returns the counters back where they started.
- better tdb handling
the tdb handling was pretty hairy, but hrvoje has kicked this around a lot with ipsec and sasyncd and we've found and fixed a bunch of issues as a result of that testing.
- mpsafe pf state purges
this was committed previously, but because the locks pfsync relied on weren't clear this just caused a ton of bugs. as part of this diff it's now reliable, and moves a big chunk of work out from under KERNEL_LOCK, which in turn improves the responsiveness and throughput of a firewall even if you're not using pfsync.
there's a bunch of other little changes along the way, but the above are the big ones.
hrvoje has done performance testing with this diff and notes a big improvement when pfsync is not in use. performance when pfsync is enabled is about the same, but im hoping the slices means we can scale along with pf as it improves.
lots (months) of testing by me and hrvoje on pfsync boxes tests and ok sashan@ deraadt@ says this is a good time to put it in
show more ...
|
#
d2364f60 |
| 04-Jul-2023 |
sashan <sashan@openbsd.org> |
The recent change to DIOCGETRULE allows applications which periodically read rules from pf(4) to consume all kernel memory. The bug has been discovered and root caused by florian@. In this particular
The recent change to DIOCGETRULE allows applications which periodically read rules from pf(4) to consume all kernel memory. The bug has been discovered and root caused by florian@. In this particular case it was snmpd(8) what ate all kernel memory.
This commit introduces DIOCXEND to pf(4) so applications such as snmpd(8) and systat(1) to close ticket/transaction when they are done with fetching the rules. This change also updates snmpd(8) and systat(1) to use newly introduced DIOCXEND ioctl(2).
OK claudio@, deraadt@, kn@
show more ...
|
#
d66d11b4 |
| 26-May-2023 |
kn <kn@openbsd.org> |
Remove net lock from DIOC{S,G}ETLIMIT
Grab the pf lock for pf_pool_limits[] in pfsync such that all access is covered by the pf lock; document accordingly.
Hard memory pool limits don't need the n
Remove net lock from DIOC{S,G}ETLIMIT
Grab the pf lock for pf_pool_limits[] in pfsync such that all access is covered by the pf lock; document accordingly.
Hard memory pool limits don't need the net lock for protection, pool(9)s have their own internal lock and the pf lock fully covers limit values.
(pf_pool_limits[] access in DIOCXCOMMIT remains under pf *and net* lock until the rest in there gets pulled out of the net lock.)
OK sashan
show more ...
|
#
072583c9 |
| 28-Apr-2023 |
sashan <sashan@openbsd.org> |
This change speeds up DIOCGETRULE ioctl(2) which pfctl(8) uses to retrieve rules from kernel. The current implementation requires like O((n^2)/2) operation to read the complete rule set, because each
This change speeds up DIOCGETRULE ioctl(2) which pfctl(8) uses to retrieve rules from kernel. The current implementation requires like O((n^2)/2) operation to read the complete rule set, because each DIOCGETRULE operation must iterate over previous n rules to find (n + 1)-th rule to read.
To address the issue diff introduces a pf_trans structure to keep pointer to next rule to read, thus reading process does not need to iterate from beginning of rule set to reach the next rule. All transactions opened by process get closed either when process is done (reads all rules) or when /dev/pf device is closed.
the diff also comes with lots of improvements from dlg@ and kn@
OK dlg@, kn@
show more ...
|
#
1fdb608f |
| 07-Feb-2023 |
sashan <sashan@openbsd.org> |
internal representation of icmp type/code in pfctl(8)/pf(4) does not fit into u_int8_t. Issue has been noticed and kindly reported by amalinin _at_ bh0.amt.ru via bugs@.
OK bluhm@
|
#
acafb14e |
| 06-Jan-2023 |
sashan <sashan@openbsd.org> |
PF_ANCHOR_STACK_MAX is insufficient protection against stack overflow. On amd64 stack overflows for anchor rule with depth ~30. The tricky thing is the 'safe' depth varies depending on kind of packet
PF_ANCHOR_STACK_MAX is insufficient protection against stack overflow. On amd64 stack overflows for anchor rule with depth ~30. The tricky thing is the 'safe' depth varies depending on kind of packet processed by pf_match_rule(). For example for local outbound TCP packet stack overflows when recursion if pf_match_rule() reaches depth 24.
Instead of lowering PF_ANCHOR_STACK_MAX to 20 and hoping it will be enough on all platforms and for all packets I'd like to stop calling pf_match_rule() recursively. This commit brings back pf_anchor_stackframe array we used to have back in 2017. It also revives patrick@'s idea to pre-allocate stack frame arrays from per-cpu.
OK kn@
show more ...
|
#
e9311d0b |
| 04-Jan-2023 |
dlg <dlg@openbsd.org> |
move the pf_state_tree_id type from pfvar.h to pfvar_priv.h.
the pf_state_tree_id type is private to the kernel.
while here, move it from being an RB tree to an RBT tree. this saves about 12k in pf
move the pf_state_tree_id type from pfvar.h to pfvar_priv.h.
the pf_state_tree_id type is private to the kernel.
while here, move it from being an RB tree to an RBT tree. this saves about 12k in pf.o on amd64.
ok sashan@
show more ...
|
#
211190b8 |
| 04-Jan-2023 |
dlg <dlg@openbsd.org> |
move the pf_state_tree rb tree type from pfvar.h to pfvar_priv.h
the pf_state_tree types are kernel private, and are not used by userland. make build agrees with me.
while here, move the pf_state_t
move the pf_state_tree rb tree type from pfvar.h to pfvar_priv.h
the pf_state_tree types are kernel private, and are not used by userland. make build agrees with me.
while here, move the pf_state_tree from the RB macros to the RBT functions. this shaves about 13k off pf.o on amd64.
ok sashan@
show more ...
|
#
ad6271c5 |
| 22-Dec-2022 |
dlg <dlg@openbsd.org> |
use stoeplitz to generate a hash/flowid for state keys.
the hash will be used to partition work in pf and pfsync in the future, and right now it is used as the first comparison in the rb tree state
use stoeplitz to generate a hash/flowid for state keys.
the hash will be used to partition work in pf and pfsync in the future, and right now it is used as the first comparison in the rb tree state lookup.
using stoeplitz means that pf will hash traffic the same way that hardware using a stoeplitz key will hash incoming traffic on rings. stoeplitz is also used by the tcp stack to generate a flow id, which is used to pick which transmit ring is used on nics with multiple queues too. using the same algorithm throughout the stack encourages affinity of packets to rings and softnet threads the whole way through.
using the hash as the first comparison in the state rb tree comparison should encourage faster traversal of the state tree by having all the address/port bits summarised into the single hash value. however, tests by hrvoje popovski don't show performance changing. on the plus side, if this change is free from a performance point of view then it makes the future steps more straightforward.
discussed at length at h2k22 tested by sashan@ and hrvoje popovski ok tb@ sashan@ claudio@ jmatthew@
show more ...
|
#
35751c74 |
| 21-Dec-2022 |
dlg <dlg@openbsd.org> |
prefix pf_state_key and pf_state_item struct bits to make them more unique.
this makes searching for the struct members easier, which in turn makes tweaking code around them a lot easier too. sk_ref
prefix pf_state_key and pf_state_item struct bits to make them more unique.
this makes searching for the struct members easier, which in turn makes tweaking code around them a lot easier too. sk_refcnt in particular would have been a lot nicer to fiddle with than just refcnt because pf_state structs also have a refcnt, which is annoying.
tweaks and ok sashan@ reads ok kn@
show more ...
|
#
d956567b |
| 19-Dec-2022 |
dlg <dlg@openbsd.org> |
move pf_state_item and pf_state_key structs from pfvar.h to pfvar_priv.h.
both of these are kernel private data structures and do not need to be visible to userland. moving them to pfvar_priv.h make
move pf_state_item and pf_state_key structs from pfvar.h to pfvar_priv.h.
both of these are kernel private data structures and do not need to be visible to userland. moving them to pfvar_priv.h makes this explicit, and makes it leass scary to tweak them in the future.
ok deraadt@ kn@ sashan@
show more ...
|
#
437c6c3f |
| 16-Dec-2022 |
dlg <dlg@openbsd.org> |
always keep pf_state_keys attached to pf_states.
pf_state structures don't contain ip addresses, protocols, ports, etc. that information is stored in a pf_state_key struct, which is used to wire a s
always keep pf_state_keys attached to pf_states.
pf_state structures don't contain ip addresses, protocols, ports, etc. that information is stored in a pf_state_key struct, which is used to wire a state into the state table. when things like pfsync or the pf state ioctls want to export information about a state, particularly the addresses on it, they needs the pf_state_key struct to read from.
before this diff the code assumed that when a state was removed from the state tables it could throw the pf_state_key structs away as part of that removal. this code changes it so once pf_state_insert succeeds, a pf_state will keep its references to the pf_state_key structs until the pf_state struct itself is being destroyed.
this allows anything that holds a reference to a pf_state to also look at the pf_state_key structs because they're now effectively an immutable part of the pf_state struct.
this is by far the simplest and most straightforward fix for pfsync crashing on pf_state_key dereferences we've come up with so far. it has been made possible by the addition of reference counts to pf_state and pf_state_key structs, which allows us to properly account for this adjusted lifecycle for pf_state_keys on pf_state structs.
sashan@ and i have been kicking this diff around for a couple of weeks now. ok sashan@ jmatthew@
show more ...
|
#
af09f97b |
| 25-Nov-2022 |
bluhm <bluhm@openbsd.org> |
revert pf.c r1.1152 again: move pf_purge out from under the kernel lock
Using systqmp for pf_purge creates a deadlock between pf_purge() and ixgbe_stop() and possibly other drivers. On systqmp pf(4
revert pf.c r1.1152 again: move pf_purge out from under the kernel lock
Using systqmp for pf_purge creates a deadlock between pf_purge() and ixgbe_stop() and possibly other drivers. On systqmp pf(4) needs netlock which the interface ioctl(2) is holding. ix(4) waits in sched_barrier() which is also scheduled on the systqmp task queue.
Removing the netlock from pf_purge() as a quick fix caused other problems.
backout suggested by deraadt@
show more ...
|
#
a6949b20 |
| 11-Nov-2022 |
dlg <dlg@openbsd.org> |
try pf.c r1.1143 again: move pf_purge out from under the kernel lock
this also avoids holding NET_LOCK too long.
the main change is done by running the purge tasks in systqmp instead of systq. the
try pf.c r1.1143 again: move pf_purge out from under the kernel lock
this also avoids holding NET_LOCK too long.
the main change is done by running the purge tasks in systqmp instead of systq. the pf state list was recently reworked so iteration over the state can be done without blocking insertions.
however, scanning a lot of states can still take a lot of time, so this also makes the state list scanner yield if it has spent too much time running.
the other purge tasks for source nodes, rules, and fragments have been moved to their own timeout/task pair to simplify the time accounting.
in my environment, before this change pf purges often took 10 to 50ms. the softclock thread runs next to it often took a similar amount of time, presumably because they ended up spinning waiting for each other. after this change the pf_purges are more like 6 to 12ms, and dont block softclock. most of the variability in the runs now seems to come from contention on the net lock.
tested by me sthen@ chris@ ok sashan@ kn@ claudio@
the diff was backed out because it made things a bit more racey, but sashan@ has squashed those races this week. let's try it again.
show more ...
|
#
4103306e |
| 11-Nov-2022 |
dlg <dlg@openbsd.org> |
rewrite the pf_state_peer_ntoh and pf_state_peer_hton macros as functions.
i can read this code as functions, but it takes too much effort as macros.
|
#
bd25c300 |
| 11-Nov-2022 |
dlg <dlg@openbsd.org> |
move struct pf_state from pfvar.h to pfvar_priv.h.
we (sashan) are going to add a mutex to the pf_state struct, but a mutex is a kernel data structure that changes shape depending on things like whe
move struct pf_state from pfvar.h to pfvar_priv.h.
we (sashan) are going to add a mutex to the pf_state struct, but a mutex is a kernel data structure that changes shape depending on things like whether MULTIPROCESSOR is enabled, and should therefore not be visible to userland. when we added a mutex to pf_state, compiling pfctl failed because it doesn't know what a mutex is and it can't know which version of it the current kernel is running with.
moving struct pf_state to pfvar_priv.h makes it clear it is a private kernel only data structure, and avoids this leak into userland.
tested by me and make build ok sashan@
show more ...
|
#
e0fbf0e2 |
| 10-Nov-2022 |
sashan <sashan@openbsd.org> |
revert pf_state mtx commit, because it breaks tree. pfctl does not build
OK dlg@
|
#
02c56bfa |
| 10-Nov-2022 |
sashan <sashan@openbsd.org> |
Add a mutex to pf_state structure. Mutex retain a consistency of structure members without using a global state lock. The first member which uses protection by mutex is key[] array. more will follow.
Add a mutex to pf_state structure. Mutex retain a consistency of structure members without using a global state lock. The first member which uses protection by mutex is key[] array. more will follow.
OK dlg@
show more ...
|
#
a21b78ca |
| 09-Nov-2022 |
sashan <sashan@openbsd.org> |
simplify expiration of 'once' rules. let packet to mark 'once' rule as expired. The rule will be removed by pfctl(8) when rules are updated.
OK kn@
|
#
8cdf5d8c |
| 07-Nov-2022 |
dlg <dlg@openbsd.org> |
revert "move pf_purge out from under the kernel lock".
hrvoje popovski showed me pfsync blowing up with this. im backing it out quickly in case something else at the hackathon makes it harder to do
revert "move pf_purge out from under the kernel lock".
hrvoje popovski showed me pfsync blowing up with this. im backing it out quickly in case something else at the hackathon makes it harder to do later.
kn@ agrees
show more ...
|