#
058bd4d2 |
| 11-Mar-2012 |
Joe Perches <joe@perches.com> |
net: Convert printks to pr_<level>
Use a more current kernel messaging style.
Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding func
net: Convert printks to pr_<level>
Use a more current kernel messaging style.
Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names.
Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ".
The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages.
Text size is trivially reduced. (x86-32 allyesconfig)
$ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
87fb4b7b |
| 13-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: more accurate skb truesize
skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored.
Considering that skb_shared_info is larger than sk_buf
net: more accurate skb truesize
skb truesize currently accounts for sk_buff struct and part of skb head. kmalloc() roundings are also ignored.
Considering that skb_shared_info is larger than sk_buff, its time to take it into account for better memory accounting.
This patch introduces SKB_TRUESIZE(X) macro to centralize various assumptions into a single place.
At skb alloc phase, we put skb_shared_info struct at the exact end of skb head, to allow a better use of memory (lowering number of reallocations), since kmalloc() gives us power-of-two memory blocks.
Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before.
Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per socket or global memory limits that were previously not reached. But its a necessary step for a more accurate memory accounting.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <ak@linux.intel.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
415b3334 |
| 22-Jul-2011 |
David S. Miller <davem@davemloft.net> |
icmp: Fix regression in nexthop resolution during replies.
icmp_route_lookup() uses the wrong flow parameters if the reverse session route lookup isn't used.
So do not commit to the re-decoded flow
icmp: Fix regression in nexthop resolution during replies.
icmp_route_lookup() uses the wrong flow parameters if the reverse session route lookup isn't used.
So do not commit to the re-decoded flow until we actually make a final decision to use a real route saved in 'rt2'.
Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
a48eff12 |
| 18-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Pass explicit destination address to rt_bind_peer().
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c319b4d7 |
| 13-May-2011 |
Vasiliy Kulikov <segoon@openwall.com> |
net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any s
net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below).
Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/
A new ping socket is created with
socket(PF_INET, SOCK_DGRAM, PROT_ICMP)
Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored.
Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets.
ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed.
ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order).
socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons.
The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.).
Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping
For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/
Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public.
All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch.
PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code.
PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf().
PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore.
RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2).
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
9f6abb5f |
| 09-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: icmp: Eliminate remaining uses of rt->rt_src
On input packets, rt->rt_src always equals ip_hdr(skb)->saddr
Anything that mangles or otherwise changes the IP header must relookup the route fou
ipv4: icmp: Eliminate remaining uses of rt->rt_src
On input packets, rt->rt_src always equals ip_hdr(skb)->saddr
Anything that mangles or otherwise changes the IP header must relookup the route found at skb_rtable(). Therefore this invariant must always hold true.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f5fca608 |
| 09-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Pass flow key down into ip_append_*().
This way rt->rt_dst accesses are unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77968b78 |
| 09-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Pass flow keys down into datagram packet building engine.
This way ip_output.c no longer needs rt->rt_{src,dst}.
We already have these keys sitting, ready and waiting, on the stack or in a so
ipv4: Pass flow keys down into datagram packet building engine.
This way ip_output.c no longer needs rt->rt_{src,dst}.
We already have these keys sitting, ready and waiting, on the stack or in a socket structure.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
f6d8bd05 |
| 21-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: add RCU protection to inet->opt
We lack proper synchronization to manipulate inet->opt ip_options
Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of i
inet: add RCU protection to inet->opt
We lack proper synchronization to manipulate inet->opt ip_options
Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt.
Another thread can change inet->opt pointer and free old one under us.
Use RCU to protect inet->opt (changed to inet->inet_opt).
Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying.
We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b71d1d42 |
| 22-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.du
inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
25985edc |
| 31-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
9cce96df |
| 12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put fl4_* macros to struct flowi4 and use them again.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d6ec938 |
| 12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Use flowi4 in public route lookup interfaces.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6281dcc9 |
| 12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Make flowi ports AF dependent.
Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_*
This will let us to create AF optimal flow instances.
It wil
net: Make flowi ports AF dependent.
Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_*
This will let us to create AF optimal flow instances.
It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
1d28f42c |
| 12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant include
net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common.
This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5e2b61f7 |
| 05-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Remove flowi from struct rtable.
The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark.
The rest is unnecessary bloat, which amounts to nearly 50 by
ipv4: Remove flowi from struct rtable.
The only necessary parts are the src/dst addresses, the interface indexes, the TOS, and the mark.
The rest is unnecessary bloat, which amounts to nearly 50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
b23dd4fe |
| 02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make output route lookup return rtable directly.
Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
452edd59 |
| 02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
xfrm: Return dst directly from xfrm_lookup()
Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f6d460cf |
| 01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make icmp route lookup code a bit clearer.
The route lookup code in icmp_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups.
Pull the
ipv4: Make icmp route lookup code a bit clearer.
The route lookup code in icmp_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups.
Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
92d86829 |
| 04-Feb-2011 |
David S. Miller <davem@davemloft.net> |
inetpeer: Move ICMP rate limiting state into inet_peer entries.
Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries.
If an inet_pe
inetpeer: Move ICMP rate limiting state into inet_peer entries.
Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries.
If an inet_peer cannot be bound (the reason is memory allocation failure or similar), the policy is to allow.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5811662b |
| 12-Nov-2010 |
Changli Gao <xiaosuo@gmail.com> |
net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Mil
net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
7d98ffd8 |
| 05-Nov-2010 |
Ulrich Weber <uweber@astaro.com> |
xfrm: update flowi saddr in icmp_send if unset
otherwise xfrm_lookup will fail to find correct policy
Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft
xfrm: update flowi saddr in icmp_send if unset
otherwise xfrm_lookup will fail to find correct policy
Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c7537967 |
| 12-Nov-2010 |
David S. Miller <davem@davemloft.net> |
ipv4: Make rt->fl.iif tests lest obscure.
When we test rt->fl.iif against zero, we're seeing if it's an output or an input route.
Make that explicit with some helper functions.
Signed-off-by: Davi
ipv4: Make rt->fl.iif tests lest obscure.
When we test rt->fl.iif against zero, we're seeing if it's an output or an input route.
Make that explicit with some helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
2244d07b |
| 17-Aug-2010 |
Oliver Hartkopp <socketcan@hartkopp.net> |
net: simplify flags for tx timestamping
This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data.
The access of the different union elements at several places
net: simplify flags for tx timestamping
This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data.
The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try().
http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
4bc2f18b |
| 09-Jul-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net/ipv4: EXPORT_SYMBOL cleanups
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. M
net/ipv4: EXPORT_SYMBOL cleanups
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|