Age | Commit message (Collapse) | Author | Files | Lines |
|
[ Upstream commit c8965932a2e3b70197ec02c6741c29460279e2a8 ]
The function ip6_tnl_validate assumes that the rtnl
attribute IFLA_IPTUN_PROTO always be filled . If this
attribute is not filled by the userspace application
kernel get crashed with NULL pointer dereference. This
patch fixes the potential kernel crash when
IFLA_IPTUN_PROTO is missing .
Signed-off-by: Susant Sahani <susant@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 418a31561d594a2b636c1e2fa94ecd9e1245abb1 ]
If conntrack defragments incoming ipv6 frags it stores largest original
frag size in ip6cb and sets ->local_df.
We must thus first test the largest original frag size vs. mtu, and not
vice versa.
Without this patch PKTTOOBIG is still generated in ip6_fragment() later
in the stack, but
1) IPSTATS_MIB_INTOOBIGERRORS won't increment
2) packet did (needlessly) traverse netfilter postrouting hook.
Fixes: fe6cc55f3a9 ("net: ip, ipv6: handle gso skbs in forwarding path")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1c2658545816088477e91860c3a645053719cb54 ]
When the ipv6 fib changes during a table dump, the walk is
restarted and the number of nodes dumped are skipped. But the existing
code doesn't advance to the next node after a node is skipped. This can
cause the dump to loop or produce lots of duplicates when the fib
is modified during the dump.
This change advances the walk to the next node if the current node is
skipped after a restart.
Signed-off-by: Kumar Sundararajan <kumar@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 54d63f787b652755e66eb4dd8892ee6d3f5197fc ]
It's possible to remove the FB tunnel with the command 'ip link del ip6gre0' but
this is unsafe, the module always supposes that this device exists. For example,
ip6gre_tunnel_lookup() may use it unconditionally.
Let's add a rtnl handler for dellink, which will never remove the FB tunnel (we
let ip6gre_destroy_tunnels() do the job).
Introduced by commit c12b395a4664 ("gre: Support GRE over IPv6").
CC: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 30f78d8ebf7f514801e71b88a10c948275168518 ]
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c58dd2dd443c26d856a168db108a0cd11c285bf3 upstream.
All xtables variants suffer from the defect that the copy_to_user()
to copy the counters to user memory may fail after the table has
already been exchanged and thus exposed. Return an error at this
point will result in freeing the already exposed table. Any
subsequent packet processing will result in a kernel panic.
We can't copy the counters before exposing the new tables as we
want provide the counter state after the old table has been
unhooked. Therefore convert this into a silent error.
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 43a43b6040165f7b40b5b489fe61a4cb7f8c4980 ]
After commit c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify
processing to workqueue") some counters are now updated in process context
and thus need to disable bh before doing so, otherwise deadlocks can
happen on 32-bit archs. Fabio Estevam noticed this while while mounting
a NFS volume on an ARM board.
As a compensation for missing this I looked after the other *_STATS_BH
and found three other calls which need updating:
1) icmp6_send: ip6_fragment -> icmpv6_send -> icmp6_send (error handling)
2) ip6_push_pending_frames: rawv6_sendmsg -> rawv6_push_pending_frames -> ...
(only in case of icmp protocol with raw sockets in error handling)
3) ping6_v6_sendmsg (error handling)
Fixes: c15b1ccadb323ea ("ipv6: move DAD and addrconf_verify processing to workqueue")
Reported-by: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f518338b16038beeb73e155e60d0f70beb9379f4 ]
Commit 812e44dd1829 ("ip6mr: advertise new mfc entries via rtnl") reuses the
function ip6mr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.
Libraries like libnl will wait forever for NLMSG_DONE.
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e367c2d03dba4c9bcafad24688fadb79dd95b218 ]
In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as
transport),the ipsec header need to be added in the first fragment, so the mtu
will decrease to reserve space for it, then the second fragment come, the mtu
should be turn back, as the commit 0c1833797a5a6ec23ea9261d979aa18078720b74
said. however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use
*mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway
equal with the first fragment's. and cannot turn back.
when I test through ping6 -c1 -s5000 $ip (mtu=1280):
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)
which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)
so delete the min() when change back the mtu.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Fixes: 75a493e60ac4bb ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ecab67015ef6e3f3635551dcc9971cf363cc1cd5 ]
tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
age needs to be added to the condition.
Age calculation in ipv6_create_tempaddr is different from the one
in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
This can cause age in ipv6_create_tempaddr to be less than the one
in addrconf_verify and therefore unnecessary temporary address to
be generated.
Use age calculation as in addrconf_modify to avoid this.
Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c88507fbad8055297c1d1e21e599f46960cbee39 ]
DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d2d273ffabd315eecefce21a4391d44b6e156b73 ]
Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).
This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
header.
The issue detected and the fix verified by running MS HCK Offload LSO test on
top of QEMU Windows guests, as this test sends IPv6 packets with
IPPROTO_DSTOPTS.
Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit accfe0e356327da5bd53da8852b93fc22de9b5fc ]
The commit 9195bb8e381d81d5a315f911904cdf0cfcc919b8 ("ipv6: improve
ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr().
When a target is specified like IPPROTO_ICMPV6 ipv6_find_hdr()
returns -ENOENT when it's found, not the header as expected.
A part of IPVS is broken and possible also nft_exthdr_eval().
When target is -1 which it is most cases, it works.
This patch exits the do while loop if the specific header is found
so the nexthdr could be returned as expected.
Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
CC:Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 ]
Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.
This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.
Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fe6cc55f3a9a053482a76f5a6b2257cee51b4663 upstream.
Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.
Given:
Host <mtu1500> R1 <mtu1200> R2
Host sends tcp stream which is routed via R1 and R2. R1 performs GRO.
In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.
When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.
This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.
For ipv6, we send out pkt too big error for gso if the individual
segments are too big.
For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.
Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.
However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there. I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.
Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.
This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small. Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway. Also its not like this could not be improved later
once the dust settles.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ No relevant upstream commit. ]
This problem was fixed upstream by commit 1e9f3d6f1c40 ("ip6tnl: fix use after
free of fb_tnl_dev").
The upstream patch depends on upstream commit 0bd8762824e7 ("ip6tnl: add x-netns
support"), which was not backported into 3.10 branch.
First, explain the problem: when the ip6_tunnel module is unloaded,
ip6_tunnel_cleanup() is called.
rmmod ip6_tunnel
=> ip6_tunnel_cleanup()
=> rtnl_link_unregister()
=> __rtnl_kill_links()
=> for_each_netdev(net, dev) {
if (dev->rtnl_link_ops == ops)
ops->dellink(dev, &list_kill);
}
At this point, the FB device is deleted (and all ip6tnl tunnels).
=> unregister_pernet_device()
=> unregister_pernet_operations()
=> ops_exit_list()
=> ip6_tnl_exit_net()
=> ip6_tnl_destroy_tunnels()
=> t = rtnl_dereference(ip6n->tnls_wc[0]);
unregister_netdevice_queue(t->dev, &list);
We delete the FB device a second time here!
The previous fix removes these lines, which fix this double free. But the patch
introduces a memory leak when a netns is destroyed, because the FB device is
never deleted. By adding an rtnl ops which delete all ip6tnl device excepting
the FB device, we can keep this exlicit removal in ip6_tnl_destroy_tunnels().
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Steven Rostedt <srostedt@redhat.com>
Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ No relevant upstream commit. ]
This reverts commit 22c3ec552c29cf4bd4a75566088950fe57d860c4.
This patch is not the right fix, it introduces a memory leak when a netns is
destroyed (the FB device is never deleted).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Steven Rostedt <srostedt@redhat.com>
Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ No relevant upstream commit. ]
This problem was fixed upstream by commit 9434266f2c64 ("sit: fix use after free
of fb_tunnel_dev").
The upstream patch depends on upstream commit 5e6700b3bf98 ("sit: add support of
x-netns"), which was not backported into 3.10 branch.
First, explain the problem: when the sit module is unloaded, sit_cleanup() is
called.
rmmod sit
=> sit_cleanup()
=> rtnl_link_unregister()
=> __rtnl_kill_links()
=> for_each_netdev(net, dev) {
if (dev->rtnl_link_ops == ops)
ops->dellink(dev, &list_kill);
}
At this point, the FB device is deleted (and all sit tunnels).
=> unregister_pernet_device()
=> unregister_pernet_operations()
=> ops_exit_list()
=> sit_exit_net()
=> sit_destroy_tunnels()
In this function, no tunnel is found.
=> unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
We delete the FB device a second time here!
Because we cannot simply remove the second deletion (sit_exit_net() must remove
the FB device when a netns is deleted), we add an rtnl ops which delete all sit
device excepting the FB device and thus we can keep the explicit deletion in
sit_exit_net().
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reported-by: Steven Rostedt <srostedt@redhat.com>
Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ]
I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
unreferenced object 0xffff88008cba4a40 (size 1696):
comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
hex dump (first 32 bytes):
0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2.....
02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
[<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
[<ffffffff812702cf>] sk_clone_lock+0x14/0x283
[<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
[<ffffffff8129a893>] netlink_broadcast+0x14/0x16
[<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
[<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
[<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
[<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
[<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
[<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
[<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
[<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
[<ffffffff8127aa77>] process_backlog+0xee/0x1c5
[<ffffffff8127c949>] net_rx_action+0xa7/0x200
[<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
But there are many more, resulting in the machine going OOM after some
days.
From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():
void tcp_v4_early_demux(struct sk_buff *skb)
{
/* ... */
iph = ip_hdr(skb);
th = tcp_hdr(skb);
if (th->doff < sizeof(struct tcphdr) / 4)
return;
sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
iph->saddr, th->source,
iph->daddr, ntohs(th->dest),
skb->skb_iif);
if (sk) {
skb->sk = sk;
where the socket is assigned unconditionally to skb->sk, also bumping
the refcnt on it. This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target. This then results
in the leak I see.
The very same issue seems to be with IPv6, but haven't tested.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ]
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.
This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.
Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.
Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 24f5b855e17df7e355eacd6c4a12cc4d6a6c9ff0 ]
ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT.
but the prefix routes which did get installed by hand locally can have an
expiration, and no any flag combination which can ensure a potential from
does never expire, so we should always set the new created dst's from.
This also fixes the new created dst is always expired since the ort, which
is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a3300ef4bbb1f1e33ff0400e1e6cf7733d988f4f ]
Brett Ciphery reported that new ipv6 addresses failed to get installed
because the addrconf generated dsts where counted against the dst gc
limit. We don't need to count those routes like we currently don't count
administratively added routes.
Because the max_addresses check enforces a limit on unbounded address
generation first in case someone plays with router advertisments, we
are still safe here.
Reported-by: Brett Ciphery <brett.ciphery@windriver.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7150aede5dd241539686e17d9592f5ebd28a2cda ]
The behaviour of blackhole and prohibit routes has been corrected by setting
the input and output pointers of the dst variable appropriately. For
blackhole routes, they are set to dst_discard and to ip6_pkt_discard and
ip6_pkt_discard_out respectively for prohibit routes.
ipv6: ip6_pkt_prohibit(_out) should not depend on
CONFIG_IPV6_MULTIPLE_TABLES
We need ip6_pkt_prohibit(_out) available without
CONFIG_IPV6_MULTIPLE_TABLES
Signed-off-by: Kamala R <kamala@aristanetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The upstream commit bb8140947a24 ("ip6tnl: allow to use rtnl ops on fb tunnel")
(backported into linux-3.10.y) left a bug which was fixed upstream by commit
1e9f3d6f1c40 ("ip6tnl: fix use after free of fb_tnl_dev").
The problem is a bit different in linux-3.10.y, because there is no x-netns
support (upstream commit 0bd8762824e7 ("ip6tnl: add x-netns support")).
When ip6_tunnel.ko is unloaded, FB device is deleted by rtnl_link_unregister()
and then we try to delete it again in ip6_tnl_destroy_tunnels().
This patch removes the second deletion.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7f88c6b23afbd31545c676dea77ba9593a1a14bf ]
IPv6 stats are 64 bits and thus are protected with a seqlock. By not
disabling bottom-half we could deadlock here if we don't disable bh and
a softirq reentrantly updates the same mib.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6aafeef03b9d9ecf255f3a80ed85ee070260e1ae ]
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:
<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
and on HOSTB you do:
ping6 HOSTA -s2000 (MTU is 1500)
Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>
As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.
Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9037c3579a277f3a23ba476664629fda8c35f7c4 ]
If reassembled packet would fit into outdev MTU, it is not fragmented
according the original frag size and it is send as single big packet.
The second case is if skb is gso. In that case fragmentation does not happen
according to the original frag size.
This patch fixes these.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1fa4c710b6fe7b0aac9907240291b6fe6aafc3b8 ]
Offenders don't have port numbers, so set it to 0.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
functions
[ Upstream commit 85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 ]
Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.
As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.
This broke traceroute and such.
Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bceaa90240b6019ed73b49965eac7d167610be69 ]
Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.
If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.
Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f8c31c8f80dd882f7eb49276989a4078d33d67a7 ]
Fixes a suspicious rcu derference warning.
Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit f104a567e673f382b09542a8dc3500aa689957b4 ]
As the rfc 4191 said, the Router Preference and Lifetime values in a
::/0 Route Information Option should override the preference and lifetime
values in the Router Advertisement header. But when the kernel deals with
a ::/0 Route Information Option, the rt6_get_route_info() always return
NULL, that means that overriding will not happen, because those default
routers were added without flag RTF_ROUTEINFO in rt6_add_dflt_router().
In order to deal with that condition, we should call rt6_get_dflt_router
when the prefix length is 0.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0e033e04c2678dbbe74a46b23fffb7bb918c288e ]
Commit 1e2bd517c108816220f262d7954b697af03b5f9c ("udp6: Fix udp
fragmentation for tunnel traffic.") changed the calculation if
there is enough space to include a fragment header in the skb from a
skb->mac_header dervived one to skb_headroom. Because we already peeled
off the skb to transport_header this is wrong. Change this back to check
if we have enough room before the mac_header.
This fixes a panic Saran Neti reported. He used the tbf scheduler which
skb_gso_segments the skb. The offsets get negative and we panic in memcpy
because the skb was erroneously not expanded at the head.
Reported-by: Saran Neti <Saran.Neti@telus.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit e3bc10bd95d7fcc3f2ac690c6ff22833ea6781d6 ]
On receiving a packet too big icmp error we check if our current cached
dst_entry in the socket is still valid. This validation check did not
care about the expiration of the (cached) route.
The error path I traced down:
The socket receives a packet too big mtu notification. It still has a
valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
setting RTF_EXPIRE and updates the dst.expiration value (which could
fail because of not up-to-date expiration values, see previous patch).
In some seldom cases we race with a) the ip6_fib gc or b) another routing
lookup which would result in a recreation of the cached rt6_info from its
parent non-cached rt6_info. While copying the rt6_info we reinitialize the
metrics store by copying it over from the parent thus invalidating the
just installed pmtu update (both dsts use the same key to the inetpeer
storage). The dst_entry with the just invalidated metrics data would
just get its RTF_EXPIRES flag cleared and would continue to stay valid
for the socket.
We should have not issued the pmtu update on the already expired dst_entry
in the first placed. By checking the expiration on the dst entry and
doing a relookup in case it is out of date we close the race because
we would install a new rt6_info into the fib before we issue the pmtu
update, thus closing this race.
Not reliably updating the dst.expire value was fixed by the patch "ipv6:
reset dst.expires value when clearing expire flag".
Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
Reported-by: Valentijn Sessink <valentyn@blub.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Tested-by: Valentijn Sessink <valentyn@blub.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c2f17e827b419918c856131f592df9521e1a38e3 ]
Routes need to be probed asynchronous otherwise the call stack gets
exhausted when the kernel attemps to deliver another skb inline, like
e.g. xt_TEE does, and we probe at the same time.
We update neigh->updated still at once, otherwise we would send to
many probes.
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 550bab42f83308c9d6ab04a980cc4333cef1c8fa ]
Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.
The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ This is a simplified -stable version of a set of upstream commits. ]
This is a replacement patch only for stable which does fix the problems
handled by the following two commits in -net:
"ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9)
"ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b)
Three frames are written on a corked udp socket for which the output
netdevice has UFO enabled. If the first and third frame are smaller than
the mtu and the second one is bigger, we enqueue the second frame with
skb_append_datato_frags without initializing the gso fields. This leads
to the third frame appended regulary and thus constructing an invalid skb.
This fixes the problem by always using skb_append_datato_frags as soon
as the first frag got enqueued to the skb without marking the packet
as SKB_GSO_UDP.
The problem with only two frames for ipv6 was fixed by "ipv6: udp
packets following an UFO enqueued packet need also be handled by UFO"
(2811ebac2521ceac84f2bdae402455baa6a7fb47).
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 80ad1d61e72d626e30ebe8529a0455e660ca4693 ]
commit 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU /
hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.
We should instead use inet_twsk_put()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bb8140947a247b9aa15652cc24dc555ebb0b64b0 ]
rtnl ops where introduced by c075b13098b3 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824e7 ("ip6tnl: add x-netns support")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 205983c43700ac3a81e7625273a3fa83cd2759b5 ]
rtnl ops where introduced by ba3e3f50a0e5 ("sit: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.
Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because the fallback tunnel is added to the queue
in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 5e6700b3bf98 ("sit: add support of x-netns")).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9260d3e1013701aa814d10c8fc6a9f92bd17d643 ]
It is possible for the timer handlers to run after the call to
ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
handler function in order to do proper cleanup when the refcnt
reaches 0. Otherwise, the refcnt can reach zero without the
inet6_dev being destroyed and we end up leaking a reference to
the net_device and see messages like the following,
unregister_netdevice: waiting for eth0 to become free. Usage count = 1
Tested on linux-3.4.43.
Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3da812d860755925da890e8c713f2d2e2d7b1bae ]
gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
so initialize max_headroom to zero. Otherwise the
if (encap_limit >= 0) {
max_headroom += 8;
mtu -= 8;
}
increments an uninitialized variable before max_headroom was reset.
Found with coverity: 728539
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7df37ff33dc122f7bd0614d707939fe84322d264 ]
When a router is doing DNAT for 6to4/6rd packets the latest
anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
6to4 and 6rd") will drop them because the IPv6 address embedded does
not match the IPv4 destination. This patch will allow them to pass by
testing if we have an address that matches on 6to4/6rd interface. I
have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
Also, log the dropped packets (with rate limit).
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2811ebac2521ceac84f2bdae402455baa6a7fb47 ]
In the following scenario the socket is corked:
If the first UDP packet is larger then the mtu we try to append it to the
write queue via ip6_ufo_append_data. A following packet, which is smaller
than the mtu would be appended to the already queued up gso-skb via
plain ip6_append_data. This causes random memory corruptions.
In ip6_ufo_append_data we also have to be careful to not queue up the
same skb multiple times. So setup the gso frame only when no first skb
is available.
This also fixes a shortcoming where we add the current packet's length to
cork->length but return early because of a packet > mtu with dontfrag set
(instead of sutracting it again).
Found with trinity.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 703133de331a7a7df47f31fb9de51dc6f68a9de8 ]
If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.
For example, if IPsec (tunnel mode) has to encrypt large skbs
that have local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identificator.
If one of these IP fragments would get lost or reordered, then
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to a packet loss
or data corruption.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 0d2ede929f61783aebfb9228e4d32a0546ee4d23 ]
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.
Introduced by c075b13098b3 (ip6tnl: advertise tunnel param via rtnl).
Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit ae7b4e1f213aa659aedf9c6ecad0bf5f0476e1e2 ]
When the kernel is compiled with CONFIG_IPV6_SUBTREES, and we return
with an error in fn = fib6_add_1(), then error codes are encoded into
the return pointer e.g. ERR_PTR(-ENOENT). In such an error case, we
write the error code into err and jump to out, hence enter the if(err)
condition. Now, if CONFIG_IPV6_SUBTREES is enabled, we check for:
if (pn != fn && pn->leaf == rt)
...
if (pn != fn && !pn->leaf && !(pn->fn_flags & RTN_RTINFO))
...
Since pn is NULL and fn is f.e. ERR_PTR(-ENOENT), then pn != fn
evaluates to true and causes a NULL-pointer dereference on further
checks on pn. Fix it, by setting both NULL in error case, so that
pn != fn already evaluates to false and no further dereference
takes place.
This was first correctly implemented in 4a287eba2 ("IPv6 routing,
NLM_F_* flag support: REPLACE and EXCL flags support, warn about
missing CREATE flag"), but the bug got later on introduced by
188c517a0 ("ipv6: return errno pointers consistently for fib6_add_1()").
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Lin Ming <mlin@ss.pku.edu.cn>
Cc: Matti Vaittinen <matti.vaittinen@nsn.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Matti Vaittinen <matti.vaittinen@nsn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8112b1fe071be01a28a774ed55909e6f4b29712d ]
In rfc4942 and rfc2460 I cannot find anything which would implicate to
drop packets which have only padding in tlv.
Current behaviour breaks TAHI Test v6LC.1.2.6.
Problem was intruduced in:
9b905fe6843 "ipv6/exthdrs: strict Pad1 and PadN check"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 3a1c756590633c0e86df606e5c618c190926a0df ]
In tcp_v6_do_rcv() code, when processing pkt options, we soley work
on our skb clone opt_skb that we've created earlier before entering
tcp_rcv_established() on our way. However, only in condition ...
if (np->rxopt.bits.rxtclass)
np->rcv_tclass = ipv6_get_dsfield(ipv6_hdr(skb));
... we work on skb itself. As we extract every other information out
of opt_skb in ipv6_pktoptions path, this seems wrong, since skb can
already be released by tcp_rcv_established() earlier on. When we try
to access it in ipv6_hdr(), we will dereference freed skb.
[ Bug added by commit 4c507d2897bd9b ("net: implement IP_RECVTOS for
IP_PKTOPTIONS") ]
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|