path: root/net/ipv4
Age   Commit message   Author   Files   Lines
2006-09-21[CRYPTO] users: Use crypto_comp and crypto_has_*Herbert Xu1-12/+13
This patch converts all users to use the new crypto_comp type and the crypto_has_* functions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-21[IPSEC]: Use HMAC template and hash interfaceHerbert Xu2-27/+45
This patch converts IPsec to use the new HMAC template. The names of existing simple digest algorithms may still be used to refer to their HMAC composites. The same structure can be used by other MACs such as AES-XCBC-MAC. This patch also switches from the digest interface to hash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-21[IPSEC] ESP: Use block ciphers where applicableHerbert Xu2-20/+30
This patch converts IPSec/ESP to use the new block cipher type where applicable. Similar to the HMAC conversion, existing algorithm names have been kept for compatibility. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-09-19[IPV4] fib_trie: missing ntohl() when calling fib_semantic_match()Al Viro1-4/+4
fib_trie.c::check_leaf() passes host-endian where fib_semantic_match() expects (and stores into) net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
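The byte-order mix-up described above can be illustrated in user space. This is a minimal sketch, not the kernel code: `result_sketch`, `store_prefix_net()` and `report_match()` are hypothetical names, and only `htonl()` from `<arpa/inet.h>` is a real API. It shows the general rule that a caller holding a host-endian key has to convert it at the boundary when the callee expects (and stores) a net-endian value.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical result record that keeps its prefix in network byte order. */
struct result_sketch {
    uint32_t prefix_net;
};

/* Hypothetical callee: expects a network-order value and stores it as-is. */
static void store_prefix_net(struct result_sketch *res, uint32_t prefix_net)
{
    res->prefix_net = prefix_net;
}

/* Hypothetical caller: holds the key in host order, so it converts the
 * value before handing it over instead of passing it through unchanged. */
static void report_match(struct result_sketch *res, uint32_t key_host)
{
    store_prefix_net(res, htonl(key_host));
}
```

On a big-endian machine the conversion is a no-op, which is why this class of bug tends to show up only on little-endian hosts.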
2006-09-17[TCP] tcp-lp: bug fix for oops in 2.6.18-rc6Wong Hoi Sing Edison1-14/+21
Sorry, the patch submitted yesterday still contained a small bug. This version has been tested for hours with BT connections, and the oops is now difficult to reproduce. Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-17[IPVS]: remove the debug option go ip_vs_ftpSimon Horman1-14/+6
This patch makes the debugging behaviour of this code more consistent with the rest of IPVS. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-17[IPVS]: Make sure ip_vs_ftp ports are validSimon Horman1-0/+6
I'm not entirely sure what happens in the case of an invalid port; at best it'll be silently ignored. This patch ignores them a little more verbosely. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-17[IPVS]: auto-help for ip_vs_ftpSimon Horman1-0/+1
Fill in a help message for the ports option to ip_vs_ftp Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-17[TCP]: Turn ABC off.Stephen Hemminger1-1/+1
Turn Appropriate Byte Count off by default because it unfairly penalizes applications that do small writes. Add better documentation to describe what it is so users will understand why they might want to turn it on. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-31[IPV4]: Fix SNMPv2 "ipFragFails" counter errorWei Dong1-0/+1
When I tested the "ipFragFails" statistic on Linux kernel 2.6.17.7, I found that this counter was not incremented correctly. The criterion is RFC 2011:

ipFragFails OBJECT-TYPE
    SYNTAX Counter32
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION "The number of IP datagrams that have been discarded because they needed to be fragmented at this entity but could not be, e.g., because their Don't Fragment flag was set."
    ::= { ip 18 }

When I send a big IP packet that needs to be fragmented, with the DF bit set to 1, to a router, the router just sends an ICMP_FRAG_NEEDED error message but does not increment this counter (in the function ip_fragment).
Signed-off-by: Wei Dong <weid@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
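The counting rule from the RFC 2011 text quoted above can be sketched as a small decision function. This is an illustration under stated assumptions, not the kernel's ip_fragment(): `ip_mib_sketch`, `frag_verdict` and `frag_decision()` are hypothetical names, and the real code path also builds and sends the ICMP error.

```c
/* Hypothetical outcome of the "does this datagram fit?" check. */
enum frag_verdict {
    SEND_WHOLE,        /* fits in the MTU, no fragmentation needed */
    SEND_FRAGMENTS,    /* too big, DF clear: fragment and send */
    DROP_FRAG_NEEDED   /* too big, DF set: drop, report ICMP_FRAG_NEEDED */
};

/* Hypothetical stand-in for the per-protocol MIB counters. */
struct ip_mib_sketch {
    unsigned long frag_fails;  /* ipFragFails */
};

static enum frag_verdict frag_decision(struct ip_mib_sketch *mib,
                                       unsigned int len, unsigned int mtu,
                                       int df_set)
{
    if (len <= mtu)
        return SEND_WHOLE;

    if (df_set) {
        /* The datagram needed fragmenting but could not be: this is
         * exactly the case ipFragFails is defined to count. */
        mib->frag_fails++;
        return DROP_FRAG_NEEDED;
    }

    return SEND_FRAGMENTS;
}
```

The point of the sketch is only that the counter increment belongs on the same branch that produces the ICMP_FRAG_NEEDED error.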
2006-08-29[TCP]: Two RFC3465 Appropriate Byte Count fixes.Daikichi Osuga2-3/+8
1) fix slow start after retransmit timeout 2) fix case of L=2*SMSS acked bytes comparison Signed-off-by: Daikichi Osuga <osugad@s1.nttdocomo.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-22[TCP]: Limit window scaling if window is clamped.Stephen Hemminger1-0/+1
This small change allows for easy per-route workarounds for broken hosts or middleboxes that are not compliant with TCP standards for window scaling. Rather than requiring window scaling to be turned off globally, this patch allows reducing or disabling window scaling if a window clamp is present. Example: Mark Lord reported a problem with the 2.6.17 kernel being unable to access http://www.everymac.com
# ip route add 216.145.246.23/32 via 10.8.0.1 window 65535
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
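How a window clamp can reduce or disable scaling can be sketched as follows. This is a simplified model, assuming the scale factor is chosen by halving the largest possible receive space until it fits in the 16-bit window field; `pick_rcv_wscale()` and its parameters are hypothetical names, and the kernel's real selection logic has more inputs.

```c
#include <stdint.h>

/* The window scale option allows a shift count of at most 14. */
#define MAX_WSCALE 14

/*
 * space:        largest receive window the socket could ever use, in bytes
 * window_clamp: per-route clamp in bytes, 0 meaning "no clamp"
 *
 * Returns the shift count to advertise in the window scale option.
 */
static int pick_rcv_wscale(uint32_t space, uint32_t window_clamp)
{
    int wscale = 0;

    /* With a clamp present there is no point in scaling for more
     * space than the clamp will ever allow. */
    if (window_clamp != 0 && window_clamp < space)
        space = window_clamp;

    while (space > 65535 && wscale < MAX_WSCALE) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

With a 4 MiB receive space and no clamp this yields a shift of 7; with the `window 65535` clamp from the example route it yields 0, so scaling is effectively disabled for that destination only.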
2006-08-22[NETFILTER]: arp_tables: fix table locking in arpt_do_tablePatrick McHardy1-1/+2
table->private might change because of ruleset changes, don't use it without holding the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17[NETFILTER]: ip_tables: fix table locking in ipt_do_tablePatrick McHardy1-1/+2
table->private might change because of ruleset changes, don't use it without holding the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17[NETFILTER]: ctnetlink: fix deadlock in table dumpingPatrick McHardy1-10/+7
ip_conntrack_put must not be called while holding ip_conntrack_lock since destroy_conntrack takes it again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17[IPV4]: severe locking bug in fib_semantics.cAlexey Kuznetsov1-6/+6
Found in 2.4 by Yixin Pan <yxpan@hotmail.com>.
> When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock)
> is used in fib_release_info() instead of write_lock_bh(&fib_info_lock).
> Is the following case possible: a BH interrupts fib_release_info() while
> holding the write lock, and calls ip_check_fib_default() which calls
> read_lock(&fib_info_lock), and spin forever.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17[MCAST]: Fix filter leak on device removal.David L Stevens1-13/+19
This fixes source filter leakage when a device is removed and a process leaves the group thereafter. This also includes corresponding fixes for IPv6 multicast source filters on device removal. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-17[IPV4]: Possible leak of multicast source filter structureMichal Ruzicka1-3/+3
There is a leak of a socket's multicast source filter list structure on closing a socket with a multicast source filter set on an interface that does not exist any more. Signed-off-by: Michal Ruzicka <michal.ruzicka@comstar.cz> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13[INET]: Use pskb_trim_unique when trimming paged unique skbsHerbert Xu1-2/+2
The IPv4/IPv6 datagram output path was using skb_trim to trim paged packets because they know that the packet has not been cloned yet (since the packet hasn't been given to anything else in the system). This broke because skb_trim no longer allows paged packets to be trimmed. Paged packets must be given to one of the pskb_trim functions instead. This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6 datagram output path scenario and replaces the corresponding skb_trim calls with it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13[NETFILTER]: ulog: fix panic on SMP kernelsMark Huang1-0/+5
Fix kernel panic on various SMP machines. The culprit is a null ub->skb in ulog_send(). If ulog_timer() has already been scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the queue on another CPU by calling ulog_send() right before it exits, there will be no skbuff when ulog_timer() acquires the lock and calls ulog_send(). Cancelling the timer in ulog_send() doesn't help because it has already been scheduled and is running on the first CPU. Similar problem exists in ebt_ulog.c and nfnetlink_log.c. Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13[NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init pathPatrick McHardy2-15/+45
None of {arp,ip,ip6}_tables cleans up after itself when something goes wrong during initialization. Noticed by Rennie deGraaf <degraaf@cpsc.ucalgary.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13[NETFILTER]: xt_hashlimit: fix limit off-by-onePatrick McHardy1-7/+4
Hashlimit doesn't account for the first packet, which is inconsistent with the limit match. Reported by ryan.castellucci@gmail.com, netfilter bugzilla #500. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13[TCP]: Fix botched memory leak fix to tcpprobe_read().David S. Miller1-1/+2
Somehow I clobbered James's original fix and only my subsequent compiler warning change went in for that changeset. Get the real fix in there. Noticed by Jesper Juhl. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07[TCP]: SNMPv2 tcpOutSegs counter errorWei Yongjun1-3/+9
Do not count retransmitted segments. Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07[IPV4]: Limit rt cache size properly.Kirill Korotaev1-1/+1
From: Kirill Korotaev <dev@sw.ru> During OpenVZ stress testing we found that UDP traffic with random source addresses can cause excessive growth of the rt hash, eventually leading to OOM and kernel panics. It was found that for a 4GB i686 system (having 1048576 total pages and 225280 normal zone pages) the kernel allocates the following route hash: syslog: IP route cache hash table entries: 262144 (order: 8, 1048576 bytes) => ip_rt_max_size = 4194304 entries, i.e. the max rt size is 4194304 * 256 bytes = 1 GB of RAM > normal_zone. The attached patch removes the HASH_HIGHMEM flag from the alloc_large_system_hash() call. Signed-off-by: David S. Miller <davem@davemloft.net>
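The arithmetic in the message above can be checked with a few lines of C. This is a worked example only: the helper names are hypothetical, the factor of 16 entries per hash bucket is inferred from the quoted numbers (4194304 / 262144), and the 4096-byte page size is an assumption about the i686 system described.

```c
#include <stdint.h>

/* Maximum number of cached routes for a given hash table size. */
static uint64_t rt_cache_max_entries(uint64_t hash_entries,
                                     unsigned int entries_per_bucket)
{
    return hash_entries * entries_per_bucket;
}

/* Worst-case memory used when the cache is full. */
static uint64_t rt_cache_max_bytes(uint64_t max_entries,
                                   unsigned int entry_size)
{
    return max_entries * entry_size;
}

/* Size of a memory zone given its page count. */
static uint64_t zone_bytes(uint64_t pages, unsigned int page_size)
{
    return pages * page_size;
}
```

Plugging in the quoted values, 262144 buckets give 4194304 entries, which at 256 bytes each is 1073741824 bytes (1 GiB). The normal zone of 225280 pages is 922746880 bytes, so a full cache alone would exceed it.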
2006-08-04[TCP]: Fixes IW > 2 cases when TCP is application limitedIlpo Järvinen1-1/+2
Whenever a transfer is application limited, we are allowed at least initial window worth of data per window unless cwnd is previously less than that. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[NET]: Fix more per-cpu typosAlexey Dobriyan1-1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patchCatherine Zhang1-2/+7
From: Catherine Zhang <cxzhang@watson.ibm.com> This patch implements a cleaner fix for the memory leak problem of the original unix datagram getpeersec patch. Instead of creating a security context each time a unix datagram is sent, we only create the security context when the receiver requests it. This new design requires modification of the current unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely, secid_to_secctx and release_secctx. The former retrieves the security context and the latter releases it. A hook is required for releasing the security context because it is up to the security module to decide how that's done. In the case of Selinux, it's a simple kfree operation. Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[IPV6]: SNMPv2 "ipv6IfStatsOutFragCreates" counter errorWei Dong1-3/+4
When I tested the "ipv6IfStatsOutFragCreates" statistic on Linux kernel 2.6.17.7, I found that it was not incremented correctly. The criterion is RFC 2465:

ipv6IfStatsOutFragCreates OBJECT-TYPE
    SYNTAX Counter32
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION "The number of output datagram fragments that have been generated as a result of fragmentation at this output interface."
    ::= { ipv6IfStatsEntry 15 }

I think there are two issues in the Linux kernel.

1st: RFC 2465 specifies that the counter is "The number of output datagram fragments...". I think it is better to increment this counter after a fragment has been output successfully. It should not be incremented when a fragment is created but fails to be output.

2nd: Suppose we send a big ICMP/ICMPv6 echo request to a host and receive an ICMP/ICMPv6 echo reply consisting of several fragments. In the Linux kernel, fragmentation first occurs in the ICMP layer (maybe "transport layer" is the better term), but this is not the "real" fragmentation; it only does some "pre-fragmentation": allocating space for data, forming a frag_list, etc. The "real" fragmentation happens in the IP layer: setting the offset, the MF flag and so on. So I think that in the "fast path" of ip_fragment/ip6_fragment, if we send a fragment that was "pre-fragmented" by an upper layer, we should also increment "ipv6IfStatsOutFragCreates".
Signed-off-by: Wei Dong <weid@nanjing-fnst.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[NETFILTER]: xt_hashlimit/xt_string: missing string validationPatrick McHardy1-0/+3
The hashlimit table name and the textsearch algorithm need to be terminated, the textsearch pattern length must not exceed the maximum size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[NETFILTER]: SIP helper: expect RTP streams in both directionsPatrick McHardy1-1/+1
Since we don't know in which direction the first packet will arrive, we need to create one expectation for each direction, which is currently prevented by max_expected being set to 1. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[TCP]: Process linger2 timeout consistently.David S. Miller1-1/+2
Based upon guidance from Alexey Kuznetsov. When linger2 is active, we check to see if the fin_wait2 timeout is longer than the timewait. If it is, we schedule the keepalive timer for the difference between the timewait timeout and the fin_wait2 timeout. When this orphan socket is seen by tcp_keepalive_timer() it will try to transform this fin_wait2 socket into a fin_wait2 mini-socket, again if linger2 is active. Not all paths were setting this initial keepalive timer correctly. The tcp input path was doing it correctly, but tcp_close() wasn't, potentially making the socket linger longer than it really needs to. Signed-off-by: David S. Miller <davem@davemloft.net>
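The timer decision described above can be sketched as a pure function. This is a simplified model, not tcp_close() itself: `fin_wait2_plan` and `plan_fin_wait2()` are hypothetical names, timeouts are plain integers in arbitrary units, and the "otherwise" branch (going straight to the mini-socket with the FIN_WAIT2 timeout) is an assumption about what happens when the timeout is not longer than the timewait length.

```c
/* What to do with an orphaned FIN_WAIT2 socket. */
enum fin_wait2_step {
    ARM_KEEPALIVE_TIMER,  /* keep the full socket for a while first */
    ENTER_TIME_WAIT       /* switch to the lightweight mini-socket now */
};

struct fin_wait2_plan {
    enum fin_wait2_step step;
    long timeout;         /* same units as the inputs */
};

static struct fin_wait2_plan plan_fin_wait2(long fin_timeout, long timewait_len)
{
    struct fin_wait2_plan plan;

    if (fin_timeout > timewait_len) {
        /* Keep the full socket only for the part of the FIN_WAIT2
         * timeout that the timewait period will not cover. */
        plan.step = ARM_KEEPALIVE_TIMER;
        plan.timeout = fin_timeout - timewait_len;
    } else {
        plan.step = ENTER_TIME_WAIT;
        plan.timeout = fin_timeout;
    }
    return plan;
}
```

For example, with a 90-unit FIN_WAIT2 timeout and a 60-unit timewait length, the keepalive timer would be armed for the 30-unit difference. The fix in the commit is about making every path, including tcp_close(), apply the same rule.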
2006-08-02[NET]: Core net changes to generate neteventsTom Tucker1-0/+8
Generate netevents for:
- neighbour changes
- routing redirects
- pmtu changes
Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02[TCP]: SNMPv2 tcpAttemptFails counter errorWei Yongjun2-3/+3
Per RFC 2012, tcpAttemptFails is defined as follows:

tcpAttemptFails OBJECT-TYPE
    SYNTAX Counter32
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION "The number of times TCP connections have made a direct transition to the CLOSED state from either the SYN-SENT state or the SYN-RCVD state, plus the number of times TCP connections have made a direct transition to the LISTEN state from the SYN-RCVD state."
    ::= { tcp 7 }

Looking into RFC 793, I found that these state changes should occur under the following conditions:

1. SYN-SENT -> CLOSED
   a) Received ACK,RST segment in SYN-SENT state.
2. SYN-RCVD -> CLOSED
   b) Received SYN segment in SYN-RCVD state (came from LISTEN).
   c) Received RST segment in SYN-RCVD state (came from SYN-SENT).
   d) Received SYN segment in SYN-RCVD state (came from SYN-SENT).
3. SYN-RCVD -> LISTEN
   e) Received RST segment in SYN-RCVD state (came from LISTEN).

In my test, those direct state transitions were not counted in tcpAttemptFails.
Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
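The RFC 2012 definition quoted above boils down to three state transitions. A minimal sketch of that predicate, with hypothetical enum and function names (the kernel uses its own TCP state constants and counts the event at the point of the transition):

```c
/* Subset of TCP states needed for the tcpAttemptFails definition. */
enum tcp_state_sketch {
    ST_CLOSED,
    ST_LISTEN,
    ST_SYN_SENT,
    ST_SYN_RCVD,
    ST_ESTABLISHED
};

/* Returns 1 if a direct transition from -> to should increment
 * tcpAttemptFails according to the RFC 2012 description, 0 otherwise. */
static int counts_as_attempt_fail(enum tcp_state_sketch from,
                                  enum tcp_state_sketch to)
{
    if (to == ST_CLOSED)
        return from == ST_SYN_SENT || from == ST_SYN_RCVD;
    if (to == ST_LISTEN)
        return from == ST_SYN_RCVD;
    return 0;
}
```

An established connection that is later closed does not match any of the three cases, so it is not an attempt failure.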
2006-08-02[TCP]: fix memory leak in net/ipv4/tcp_probe.c::tcpprobe_read()James Morris1-1/+1
Based upon a patch by Jesper Juhl. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-25[IPV4/IPV6]: Setting 0 for unused port field in RAW IP recvmsg().Tetsuo Handa1-0/+1
From: Tetsuo Handa <from-linux-kernel@i-love.sakura.ne.jp> The recvmsg() for a raw socket seems to return a random u16 value from kernel stack memory, since the port field is not initialized. But I'm not sure this patch is correct. Does a raw socket return any information stored in the port field? [ BSD defines RAW IP recvmsg to return a sin_port value of zero. This is described in Stevens' TCP/IP Illustrated Volume 2 on page 1055, which discusses the BSD rip_input() implementation. ] Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
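The idea of the fix, filling every field of the returned address so that no stale stack contents leak out, can be shown with the user-space `struct sockaddr_in`. This is a sketch only: `fill_raw_source()` is a hypothetical helper, not the kernel's recvmsg path.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in the source address reported for a raw IP datagram.
 * saddr_net is the sender's IPv4 address in network byte order. */
static void fill_raw_source(struct sockaddr_in *sin, uint32_t saddr_net)
{
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = saddr_net;
    /* Raw IP has no ports; report zero rather than whatever
     * happened to be in this memory before. */
    sin->sin_port = 0;
    memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
}
```

Setting each field explicitly, instead of relying on the structure having been zeroed somewhere earlier, makes the "unused field is zero" guarantee local and easy to audit.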
2006-07-25[IPV4] ipmr: ip multicast route bug fix.Alexey Kuznetsov1-6/+13
IP multicast route code was reusing an skb, which causes use-after-free and double free. From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Note, it is a real skb_clone(), not alloc_skb(). The enqueued skb contains the whole half-prepared netlink message plus room for the rest. It could also be skb_copy(), if we want to be purist about mangling cloned data, but the original copy is really not going to be used. Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[IPV4]: Clear the whole IPCB, this clears also IPCB(skb)->flags.Guillaume Chazarain1-1/+1
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[NETFILTER]: SNMP NAT: fix byteorder confusionPatrick McHardy1-2/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[NETFILTER]: conntrack: fix SYSCTL=n compileAdrian Bunk1-2/+2
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[NETFILTER]: H.323 helper: fix possible NULL-ptr dereferencePatrick McHardy1-1/+1
An RCF message containing a timeout results in a NULL-ptr dereference if no RRQ has been seen before. Noticed by the "SATURN tool", reported by Thomas Dillig <tdillig@stanford.edu> and Isil Dillig <isil@stanford.edu>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21[IPV4]: Fix nexthop realm dumping for multipath routesPatrick McHardy1-4/+8
Routing realms exist per nexthop, but are only returned to userspace for the first nexthop. This is due to the fact that iproute2 only allows to set the realm for the first nexthop and the kernel refuses multipath routes where only a single realm is present. Dump all realms for multipath routes to enable iproute to correctly display them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21[NET]: Conversions from kmalloc+memset to k(z|c)alloc.Panagiotis Issaris15-47/+22
Signed-off-by: Panagiotis Issaris <takis@issaris.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21[IPV4]: Get rid of redundant IPCB->opts initialisationHerbert Xu5-6/+0
Now that we always zero the IPCB->opts in ip_rcv, it is no longer necessary to do so before calling netif_rx for tunneled packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-14[IPV4]: Clear skb cb on IP inputStephen Hemminger1-0/+3
The skb cb field is not cleared when data arrives at IP through loopback (and possibly other devices), so the field needs to be cleared before it confuses the route code. This was seen when running netem over loopback, but there are probably other device cases. Maybe this should go into stable? Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-12[IPV4]: Fix error handling for fib_insert_node callHerbert Xu1-1/+1
The error handling around fib_insert_node was broken because we always zeroed the error before checking it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
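The bug pattern described above, resetting the error variable between the call and the check, can be sketched in a few lines. The names here (`node_sketch`, `insert_node_sketch()`, `insert_checked()`) are hypothetical stand-ins and do not reproduce the fib_trie code; the sketch only shows the ordering that keeps the error visible.

```c
#include <errno.h>
#include <stddef.h>

struct node_sketch {
    int key;
};

/* Hypothetical insert helper: reports failure through *err. */
static struct node_sketch *insert_node_sketch(struct node_sketch *slot,
                                              int key, int *err)
{
    if (slot == NULL) {
        *err = -ENOMEM;
        return NULL;
    }
    slot->key = key;
    return slot;
}

static int insert_checked(struct node_sketch *slot, int key)
{
    int err = 0;  /* reset BEFORE the call ... */

    insert_node_sketch(slot, key, &err);

    /* ... and never between the call and this check, otherwise the
     * failure reported by the helper is silently discarded. */
    if (err)
        return err;
    return 0;
}
```

With the reset placed after the call, `insert_checked(NULL, ...)` would return 0 and the caller would continue as if the insert had succeeded.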
2006-07-12[IPCOMP]: Fix truesize after decompressionHerbert Xu1-1/+2
The truesize check has uncovered the fact that we forgot to update truesize after pskb_expand_head. Unfortunately pskb_expand_head can't update it for us because it's used in all sorts of different contexts, some of which would not allow truesize to be updated by itself. So the solution for now is to simply update it in IPComp. This patch also changes skb_put to __skb_put since we've just expanded tailroom by exactly that amount so we know it's there (but gcc does not). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-12[TCP] tcp_highspeed: Fix AI updates.Xiaoliang (David) Wei1-4/+9
I think there is still a problem with the AIMD parameter update in the HighSpeed TCP code. Lines 125~138 of the code (net/ipv4/tcp_highspeed.c):

    /* Update AIMD parameters */
    if (tp->snd_cwnd > hstcp_aimd_vals[ca->ai].cwnd) {
        while (tp->snd_cwnd > hstcp_aimd_vals[ca->ai].cwnd &&
               ca->ai < HSTCP_AIMD_MAX - 1)
            ca->ai++;
    } else if (tp->snd_cwnd < hstcp_aimd_vals[ca->ai].cwnd) {
        while (tp->snd_cwnd > hstcp_aimd_vals[ca->ai].cwnd &&
               ca->ai > 0)
            ca->ai--;

In fact, the second part (decreasing ca->ai) never decreases it, since the while loop's inequality is in the reverse direction. This leads to unfairness with multiple flows (once a flow happens to enjoy a higher ca->ai, it keeps enjoying it even if its cwnd decreases).

Here is a tentative fix (I also added a comment, trying to keep the change clear):
Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
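The fix itself is not shown in this listing, so the following is only one possible correction of the index update, written as a stand-alone sketch. The table `aimd_vals` holds a few illustrative thresholds rather than the full kernel table, and `update_ai()` is a hypothetical helper. The invariant it aims for is aimd_vals[ai-1].cwnd < snd_cwnd <= aimd_vals[ai].cwnd, within the limits of the table.

```c
#include <stddef.h>

struct aimd_val_sketch {
    unsigned int cwnd;  /* upper cwnd bound for this table row */
};

/* Illustrative thresholds only; the real table is much longer. */
static const struct aimd_val_sketch aimd_vals[] = {
    { 38 }, { 118 }, { 221 }, { 347 },
};

#define AIMD_MAX (sizeof(aimd_vals) / sizeof(aimd_vals[0]))

/* Move the table index up or down so that it matches snd_cwnd. */
static unsigned int update_ai(unsigned int ai, unsigned int snd_cwnd)
{
    if (snd_cwnd > aimd_vals[ai].cwnd) {
        /* cwnd grew past this row: walk up, stopping at the last row. */
        while (snd_cwnd > aimd_vals[ai].cwnd && ai < AIMD_MAX - 1)
            ai++;
    } else {
        /* cwnd shrank: walk down while the previous row still covers it.
         * Note the comparison is against row ai - 1 and uses <=. */
        while (ai > 0 && snd_cwnd <= aimd_vals[ai - 1].cwnd)
            ai--;
    }
    return ai;
}
```

With these thresholds, a flow at index 3 whose cwnd drops to 100 moves down to index 1, which is the behaviour the original loop could never produce.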
2006-07-10[TCP]: Remove TCP CompoundDavid S. Miller3-459/+0
This reverts: f890f921040fef6a35e39d15b729af1fd1a35f29 The inclusion of TCP Compound needs to be reverted at this time because it is not 100% certain that this code conforms to the requirements of Developer's Certificate of Origin 1.1 paragraph (b). Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-10[IPV4] inetpeer: Get rid of volatile from peer_totalHerbert Xu1-1/+1
The variable peer_total is protected by a lock. The volatile marker makes no sense. This shaves off 20 bytes on i386. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>