summaryrefslogtreecommitdiff
path: root/net/netlink
AgeCommit message (Collapse)AuthorFilesLines
2014-11-14fix misuses of f_count() in ppp and netlinkAl Viro1-1/+1
commit 24dff96a37a2ca319e75a74d3929b2de22447ca6 upstream. we used to check for "nobody else could start doing anything with that opened file" by checking that refcount was 2 or less - one for descriptor table and one we'd acquired in fget() on the way to wherever we are. That was race-prone (somebody else might have had a reference to descriptor table and do fget() just as we'd been checking) and it had become flat-out incorrect back when we switched to fget_light() on those codepaths - unlike fget(), it doesn't grab an extra reference unless the descriptor table is shared. The same change allowed a race-free check, though - we are safe exactly when refcount is less than 2. It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH, netlink hadn't grown that check until 3.9 and ppp used to live in drivers/net, not drivers/net/ppp until 3.1. The bug existed well before that, though, and the same fix used to apply in old location of file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28netlink: Fix handling of error from netlink_dump().Ben Pfaff1-2/+2
[ Upstream commit ac30ef832e6af0505b6f0251a6659adcfa74975e ] netlink_dump() returns a negative errno value on error. Until now, netlink_recvmsg() directly recorded that negative value in sk->sk_err, but that's wrong since sk_err takes positive errno values. (This manifests as userspace receiving a positive return value from the recv() system call, falsely indicating success.) This bug was introduced in the commit that started checking the netlink_dump() return value, commit b44d211 (netlink: handle errors from netlink_dump()). Multithreaded Netlink dumps are one way to trigger this behavior in practice, as described in the commit message for the userspace workaround posted here: http://openvswitch.org/pipermail/dev/2014-June/042339.html This commit also fixes the same bug in netlink_poll(), introduced in commit cd1df525d (netlink: add flow control for memory mapped I/O). Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Only check file credentials for implicit destinationsEric W. Biederman1-1/+6
[ Upstream commit 2d7a85f4b06e9c27ff629f07a524c48074f07f81 ] It was possible to get a setuid root or setcap executable to write to it's stdout or stderr (which has been set made a netlink socket) and inadvertently reconfigure the networking stack. To prevent this we check that both the creator of the socket and the currentl applications has permission to reconfigure the network stack. Unfortunately this breaks Zebra which always uses sendto/sendmsg and creates it's socket without any privileges. To keep Zebra working don't bother checking if the creator of the socket has privilege when a destination address is specified. Instead rely exclusively on the privileges of the sender of the socket. Note from Andy: This is exactly Eric's code except for some comment clarifications and formatting fixes. Neither I nor, I think, anyone else is thrilled with this approach, but I'm hesitant to wait on a better fix since 3.15 is almost here. Note to stable maintainers: This is a mess. An earlier series of patches in 3.15 fix a rather serious security issue (CVE-2014-0181), but they did so in a way that breaks Zebra. The offending series includes: commit aa4cf9452f469f16cea8c96283b641b4576d4a7b Author: Eric W. Biederman <ebiederm@xmission.com> Date: Wed Apr 23 14:28:03 2014 -0700 net: Add variants of capable for use on netlink messages If a given kernel version is missing that series of fixes, it's probably worth backporting it and this patch. if that series is present, then this fix is critical if you care about Zebra. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Use netlink_ns_capable to verify the permisions of netlink messagesEric W. Biederman1-1/+1
[ Upstream commit 90f62cf30a78721641e08737bda787552428061e ] It is possible by passing a netlink socket to a more privileged executable and then to fool that executable into writing to the socket data that happens to be valid netlink message to do something that privileged executable did not intend to do. To keep this from happening replace bare capable and ns_capable calls with netlink_capable, netlink_net_calls and netlink_ns_capable calls. Which act the same as the previous calls except they verify that the opener of the socket had the desired permissions as well. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26net: Add variants of capable for use on netlink messagesEric W. Biederman1-0/+65
[ Upstream commit aa4cf9452f469f16cea8c96283b641b4576d4a7b ] netlink_net_capable - The common case use, for operations that are safe on a network namespace netlink_capable - For operations that are only known to be safe for the global root netlink_ns_capable - The general case of capable used to handle special cases __netlink_ns_capable - Same as netlink_ns_capable except taking a netlink_skb_parms instead of the skbuff of a netlink message. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-26netlink: Rename netlink_capable netlink_allowedEric W. Biederman1-5/+5
[ Upstream commit 5187cd055b6e81fc6526109456f8b20623148d5f ] netlink_capable is a static internal function in af_netlink.c and we have better uses for the name netlink_capable. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-08net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa1-2/+0
[ Upstream commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c ] This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14genl: Hold reference on correct module while netlink-dump.Pravin B Shelar1-9/+11
[ Upstream commit 33c6b1f6b154894321f5734e50c66621e9134e7e ] netlink dump operations take module as parameter to hold reference for entire netlink dump duration. Currently it holds ref only on genl module which is not correct when we use ops registered to genl from another module. Following patch adds module pointer to genl_ops so that netlink can hold ref count on it. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-14genl: Fix genl dumpit() locking.Pravin B Shelar1-5/+46
[ Upstream commit 9b96309c5b0b9e466773c07a5bc8b7b68fcf010a ] In case of genl-family with parallel ops off, dumpif() callback is expected to run under genl_lock, But commit def3117493eafd9df (genl: Allow concurrent genl callbacks.) changed this behaviour where only first dumpit() op was called under genl-lock. For subsequent dump, only nlk->cb_lock was taken. Following patch fixes it by defining locked dumpit() and done() callback which takes care of genl-locking. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> CC: Jesse Gross <jesse@nicira.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20Revert "genetlink: fix family dump race"Greg Kroah-Hartman1-7/+0
This reverts commit aab4f8d490ef8c184d854d5f630438c10406765c, commit 58ad436fcf49810aa006016107f494c9ac9013db upstream, as it causes problems. Cc: Johannes Berg <johannes.berg@intel.com> Cc: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-20genetlink: fix family dump raceJohannes Berg1-0/+7
commit 58ad436fcf49810aa006016107f494c9ac9013db upstream. When dumping generic netlink families, only the first dump call is locked with genl_lock(), which protects the list of families, and thus subsequent calls can access the data without locking, racing against family addition/removal. This can cause a crash. Fix it - the locking needs to be conditional because the first time around it's already locked. A similar bug was reported to me on an old kernel (3.4.47) but the exact scenario that happened there is no longer possible, on those kernels the first round wasn't locked either. Looking at the current code I found the race described above, which had also existed on the old kernel. Reported-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11genetlink: release cb_lock before requesting additional moduleStanislaw Gruszka1-0/+2
[ Upstream commit c74f2b2678f40b80265dd53556f1f778c8e1823f ] Requesting external module with cb_lock taken can result in the deadlock like showed below: [ 2458.111347] Showing all locks held in the system: [ 2458.111347] 1 lock held by NetworkManager/582: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162bc79>] genl_rcv+0x19/0x40 [ 2458.111347] 1 lock held by modprobe/603: [ 2458.111347] #0: (cb_lock){++++++}, at: [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.579457] SysRq : Show Blocked State [ 2461.580103] task PC stack pid father [ 2461.580103] NetworkManager D ffff880034b84500 4040 582 1 0x00000080 [ 2461.580103] ffff8800197ff720 0000000000000046 00000000001d5340 ffff8800197fffd8 [ 2461.580103] ffff8800197fffd8 00000000001d5340 ffff880019631700 7fffffffffffffff [ 2461.580103] ffff8800197ff880 ffff8800197ff878 ffff880019631700 ffff880019631700 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81731ad1>] schedule_timeout+0x1c1/0x360 [ 2461.580103] [<ffffffff810e69eb>] ? mark_held_locks+0xbb/0x140 [ 2461.580103] [<ffffffff817377ac>] ? _raw_spin_unlock_irq+0x2c/0x50 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff81736398>] wait_for_completion_killable+0xe8/0x170 [ 2461.580103] [<ffffffff810b7fa0>] ? wake_up_state+0x20/0x20 [ 2461.580103] [<ffffffff81095825>] call_usermodehelper_exec+0x1a5/0x210 [ 2461.580103] [<ffffffff817362ed>] ? wait_for_completion_killable+0x3d/0x170 [ 2461.580103] [<ffffffff81095cc3>] __request_module+0x1b3/0x370 [ 2461.580103] [<ffffffff810e6b6d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 2461.580103] [<ffffffff8162c5c9>] ctrl_getfamily+0x159/0x190 [ 2461.580103] [<ffffffff8162d8a4>] genl_family_rcv_msg+0x1f4/0x2e0 [ 2461.580103] [<ffffffff8162d990>] ? genl_family_rcv_msg+0x2e0/0x2e0 [ 2461.580103] [<ffffffff8162da1e>] genl_rcv_msg+0x8e/0xd0 [ 2461.580103] [<ffffffff8162b729>] netlink_rcv_skb+0xa9/0xc0 [ 2461.580103] [<ffffffff8162bc88>] genl_rcv+0x28/0x40 [ 2461.580103] [<ffffffff8162ad6d>] netlink_unicast+0xdd/0x190 [ 2461.580103] [<ffffffff8162b149>] netlink_sendmsg+0x329/0x750 [ 2461.580103] [<ffffffff815db849>] sock_sendmsg+0x99/0xd0 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e96e8>] ? lock_release_non_nested+0x308/0x350 [ 2461.580103] [<ffffffff815dbc6e>] ___sys_sendmsg+0x39e/0x3b0 [ 2461.580103] [<ffffffff810565af>] ? kvm_clock_read+0x2f/0x50 [ 2461.580103] [<ffffffff810218b9>] ? sched_clock+0x9/0x10 [ 2461.580103] [<ffffffff810bb2bd>] ? sched_clock_local+0x1d/0x80 [ 2461.580103] [<ffffffff810bb448>] ? sched_clock_cpu+0xa8/0x100 [ 2461.580103] [<ffffffff810e33ad>] ? trace_hardirqs_off+0xd/0x10 [ 2461.580103] [<ffffffff810bb58f>] ? local_clock+0x5f/0x70 [ 2461.580103] [<ffffffff810e3f7f>] ? lock_release_holdtime.part.28+0xf/0x1a0 [ 2461.580103] [<ffffffff8120fec9>] ? fget_light+0xf9/0x510 [ 2461.580103] [<ffffffff8120fe0c>] ? fget_light+0x3c/0x510 [ 2461.580103] [<ffffffff815dd1d2>] __sys_sendmsg+0x42/0x80 [ 2461.580103] [<ffffffff815dd222>] SyS_sendmsg+0x12/0x20 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] modprobe D ffff88000f2c8000 4632 603 602 0x00000080 [ 2461.580103] ffff88000f04fba8 0000000000000046 00000000001d5340 ffff88000f04ffd8 [ 2461.580103] ffff88000f04ffd8 00000000001d5340 ffff8800377d4500 ffff8800377d4500 [ 2461.580103] ffffffff81d0b260 ffffffff81d0b268 ffffffff00000000 ffffffff81d0b2b0 [ 2461.580103] Call Trace: [ 2461.580103] [<ffffffff817355f9>] schedule+0x29/0x70 [ 2461.580103] [<ffffffff81736d4d>] rwsem_down_write_failed+0xed/0x1a0 [ 2461.580103] [<ffffffff810bb200>] ? update_cpu_load_active+0x10/0xb0 [ 2461.580103] [<ffffffff8137b473>] call_rwsem_down_write_failed+0x13/0x20 [ 2461.580103] [<ffffffff8173492d>] ? down_write+0x9d/0xb2 [ 2461.580103] [<ffffffff8162baa5>] ? genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162baa5>] genl_lock_all+0x15/0x30 [ 2461.580103] [<ffffffff8162cbb3>] genl_register_family+0x53/0x1f0 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffff8162d650>] genl_register_family_with_ops+0x20/0x80 [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa017fe84>] nl80211_init+0x24/0xf0 [cfg80211] [ 2461.580103] [<ffffffffa01dc000>] ? 0xffffffffa01dbfff [ 2461.580103] [<ffffffffa01dc043>] cfg80211_init+0x43/0xdb [cfg80211] [ 2461.580103] [<ffffffff810020fa>] do_one_initcall+0xfa/0x1b0 [ 2461.580103] [<ffffffff8105cb93>] ? set_memory_nx+0x43/0x50 [ 2461.580103] [<ffffffff810f75af>] load_module+0x1c6f/0x27f0 [ 2461.580103] [<ffffffff810f2c90>] ? store_uevent+0x40/0x40 [ 2461.580103] [<ffffffff810f82c6>] SyS_finit_module+0x86/0xb0 [ 2461.580103] [<ffffffff81741ad9>] system_call_fastpath+0x16/0x1b [ 2461.580103] Sched Debug Version: v0.10, 3.11.0-0.rc1.git4.1.fc20.x86_64 #1 Problem start to happen after adding net-pf-16-proto-16-family-nl80211 alias name to cfg80211 module by below commit (though that commit itself is perfectly fine): commit fb4e156886ce6e8309e912d8b370d192330d19d3 Author: Marcel Holtmann <marcel@holtmann.org> Date: Sun Apr 28 16:22:06 2013 -0700 nl80211: Add generic netlink module alias for cfg80211/nl80211 Reported-and-tested-by: Jeff Layton <jlayton@redhat.com> Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Reviewed-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-11netlink: fix error propagation in netlink_mmap()Patrick McHardy1-1/+1
Return the error if something went wrong instead of unconditionally returning 0. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-04net: fix sk_buff head without data areaPablo Neira1-1/+1
Eric Dumazet spotted that we have to check skb->head instead of skb->data as skb->head points to the beginning of the data area of the skbuff. Similarly, we have to initialize the skb->head pointer, not skb->data in __alloc_skb_head. After this fix, netlink crashes in the release path of the sk_buff, so let's fix that as well. This bug was introduced in (0ebd0ac net: add function to allocate sk_buff head without data area). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01netlink: kconfig: move mmap i/o into netlink kconfigDaniel Borkmann1-0/+9
Currently, in menuconfig, Netlink's new mmaped IO is the very first entry under the ``Networking support'' item and comes even before ``Networking options'': [ ] Netlink: mmaped IO Networking options ---> ... Lets move this into ``Networking options'' under netlink's Kconfig, since this might be more appropriate. Introduced by commit ccdfcc398 (``netlink: mmaped netlink: ring setup''). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-01netlink: Fix skb ref counting.Pravin B Shelar1-1/+0
Commit f9c2288837ba072b21dba955f04a4c97eaa77b1e (netlink: implement memory mapped recvmsg) increamented skb->users ref count twice for a dump op which does not look right. Following patch fixes that. CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-26genetlink: fix possible memory leak in genl_family_rcv_msg()Wei Yongjun1-2/+3
'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr && family->parallel_ops, thus should be freed before leaving from the error handling cases, otherwise it will cause memory leak. Introduced by commit def3117493eafd9dfa1f809d861e0031b2cc8a07 (genl: Allow concurrent genl callbacks.) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-25genl: Allow concurrent genl callbacks.Pravin B Shelar1-38/+76
All genl callbacks are serialized by genl-mutex. This can become bottleneck in multi threaded case. Following patch adds an parameter to genl_family so that a particular family can get concurrent netlink callback without genl_lock held. New rw-sem is used to protect genl callback from genl family unregister. in case of parallel_ops genl-family read-lock is taken for callbacks and write lock is taken for register or unregistration for any family. In case of locked genl family semaphore and gel-mutex is locked for any openration. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-24netlink: fix compilation after memory mapped patchesNicolas Dichtel1-4/+4
Depending of the kernel configuration (CONFIG_UIDGID_STRICT_TYPE_CHECKS), we can get the following errors: net/netlink/af_netlink.c: In function ‘netlink_queue_mmaped_skb’: net/netlink/af_netlink.c:663:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:664:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ net/netlink/af_netlink.c: In function ‘netlink_ring_set_copied’: net/netlink/af_netlink.c:693:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kuid_t’ net/netlink/af_netlink.c:694:14: error: incompatible types when assigning to type ‘__u32’ from type ‘kgid_t’ We must use the helpers to get the uid and gid, and also take care of user_ns. Fix suggested by Eric W. Biederman <ebiederm@xmission.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-23netlink: Fix build with mmap disabled.David S. Miller1-0/+7
net/netlink/diag.c: In function 'sk_diag_put_rings_cfg': net/netlink/diag.c:28:17: error: 'struct netlink_sock' has no member named 'pg_vec_lock' net/netlink/diag.c:29:29: error: 'struct netlink_sock' has no member named 'rx_ring' net/netlink/diag.c:31:30: error: 'struct netlink_sock' has no member named 'tx_ring' net/netlink/diag.c:33:19: error: 'struct netlink_sock' has no member named 'pg_vec_lock' Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-23netlink: fix typo in net/netlink/af_netlink.cStephen Rothwell1-1/+1
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add RX/TX-ring support to netlink diagPatrick McHardy1-0/+32
Based on AF_PACKET. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add flow control for memory mapped I/OPatrick McHardy1-27/+61
Add flow control for memory mapped RX. Since user-space usually doesn't invoke recvmsg() when using memory mapped I/O, flow control is performed in netlink_poll(). Dumps are allowed to continue if at least half of the ring frames are unused. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: implement memory mapped recvmsg()Patrick McHardy1-3/+141
Add support for mmap'ed recvmsg(). To allow the kernel to construct messages into the mapped area, a dataless skb is allocated and the data pointer is set to point into the ring frame. This means frames will be delivered to userspace in order of allocation instead of order of transmission. This usually doesn't matter since the order is either not determinable by userspace or message creation/transmission is serialized. The only case where this can have a visible difference is nfnetlink_queue. Userspace can't assume mmap'ed messages have ordered IDs anymore and needs to check this if using batched verdicts. For non-mapped sockets, nothing changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: implement memory mapped sendmsg()Patrick McHardy1-6/+129
Add support for mmap'ed sendmsg() to netlink. Since the kernel validates received messages before processing them, the code makes sure userspace can't modify the message contents after invoking sendmsg(). To do that only a single mapping of the TX ring is allowed to exist and the socket must not be shared. If either of these two conditions does not hold, it falls back to copying. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add mmap'ed netlink helper functionsPatrick McHardy1-2/+183
Add helper functions for looking up mmap'ed frame headers, reading and writing their status, allocating skbs with mmap'ed data areas and a poll function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: mmaped netlink: ring setupPatrick McHardy2-2/+286
Add support for mmap'ed RX and TX ring setup and teardown based on the af_packet.c code. The following patches will use this to add the real mmap'ed receive and transmit functionality. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add netlink_skb_set_owner_r()Patrick McHardy1-3/+17
For mmap'ed I/O a netlink specific skb destructor needs to be invoked after the final kfree_skb() to clean up state. This doesn't work currently since the skb's ownership is transfered to the receiving socket using skb_set_owner_r(), which orphans the skb, thereby invoking the destructor prematurely. Since netlink doesn't account skbs to the originating socket, there's no need to orphan the skb. Add a netlink specific skb_set_owner_r() variant that does not orphan the skb and use a netlink specific destructor to call sock_rfree(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: don't orphan skb in netlink_trim()Patrick McHardy1-1/+1
Netlink doesn't account skbs to the sending socket, so the there's no need to orphan the skb before trimming it. Removing the skb_orphan() call is required for mmap'ed netlink, which uses a netlink specific skb destructor that must not be invoked before the final freeing of the skb. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: rename ssk to sk in struct netlink_skb_paramsPatrick McHardy1-1/+1
Memory mapped netlink needs to store the receiving userspace socket when sending from the kernel to userspace. Rename 'ssk' to 'sk' to avoid confusion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19netlink: add symbolic value for congested statePatrick McHardy1-7/+11
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28net-next: replace obsolete NLMSG_* with type safe nlmsg_*Hong zhi guo1-2/+2
Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21netlink: Diag core and basic socket info dumping (v2)Andrey Vagin3-0/+201
The netlink_diag can be built as a module, just like it's done in unix sockets. The core dumping message carries the basic info about netlink sockets: family, type and protocol, portis, dst_group, dst_portid, state. Groups can be received as an optional parameter NETLINK_DIAG_GROUPS. Netlink sockets cab be filtered by protocols. The socket inode number and cookie is reserved for future per-socket info retrieving. The per-protocol filtering is also reserved for future by requiring the sdiag_protocol to be zero. The file /proc/net/netlink doesn't provide enough information for dumping netlink sockets. It doesn't provide dst_group, dst_portid, groups above 32. v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant. Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21net: prepare netlink code for netlink diagAndrey Vagin2-54/+67
Move a few declarations in a header. Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20genetlink: trigger BUG_ON if a group name is too longMasatake YAMATO1-0/+1
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-19/+11
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-22new helper: file_inode(file)Al Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-18net: proc: change proc_net_remove to remove_proc_entryGao feng1-1/+1
proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18net: proc: change proc_net_fops_create to proc_createGao feng1-1/+1
Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-09netlink: Use FIELD_SIZEOF() in netlink_proto_init().YOSHIFUJI Hideaki / 吉藤英明1-2/+1
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-17netlink: validate addr_len on bindHannes Frederic Sowa1-0/+3
Otherwise an out of bounds read could happen. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-17netlink: change presentation of portid in procfs to unsignedHannes Frederic Sowa1-1/+1
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Allow userns root to control llc, netfilter, netlink, packet, and xfrmEric W. Biederman1-1/+1
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Allow creation of af_key sockets. Allow creation of llc sockets. Allow creation of af_packet sockets. Allow sending xfrm netlink control messages. Allow binding to netlink multicast groups. Allow sending to netlink multicast groups. Allow adding and dropping netlink multicast groups. Allow sending to all netlink multicast groups and port ids. Allow reading the netfilter SO_IP_SET socket option. Allow sending netfilter netlink messages. Allow setting and getting ip_vs netfilter socket options. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-18netlink: use kfree_rcu() in netlink_release()Eric Dumazet1-4/+15
On some suspend/resume operations involving wimax device, we have noticed some intermittent memory corruptions in netlink code. Stéphane Marchesin tracked this corruption in netlink_update_listeners() and suggested a patch. It appears netlink_release() should use kfree_rcu() instead of kfree() for the listeners structure as it may be used by other cpus using RCU protection. netlink_release() must set to NULL the listeners pointer when it is about to be freed. Also have to protect netlink_update_listeners() and netlink_has_listeners() if listeners is NULL. Add a nl_deref_protected() lockdep helper to properly document which locks protects us. Reported-by: Jonathan Kliegman <kliegs@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stéphane Marchesin <marcheu@google.com> Cc: Sam Leffler <sleffler@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07netlink: add reference of module in netlink_dump_startGao feng1-8/+21
I get a panic when I use ss -a and rmmod inet_diag at the same time. It's because netlink_dump uses inet_diag_dump which belongs to module inet_diag. I search the codes and find many modules have the same problem. We need to add a reference to the module which the cb->dump belongs to. Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo. Change From v3: change netlink_dump_start to inline,suggestion from Pablo and Eric. Change From v2: delete netlink_dump_done,and call module_put in netlink_dump and netlink_sock_destruct. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman2-107/+107
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08netlink: hide struct module parameter in netlink_kernel_createPablo Neira Ayuso2-7/+4
This patch defines netlink_kernel_create as a wrapper function of __netlink_kernel_create to hide the struct module *me parameter (which seems to be THIS_MODULE in all existing netlink subsystems). Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08netlink: kill netlink_set_nonrootPablo Neira Ayuso2-17/+14
Replace netlink_set_nonroot by one new field `flags' in struct netlink_kernel_cfg that is passed to netlink_kernel_create. This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since now the flags field in nl_table is generic (so we can add more flags if needed in the future). Also adjust all callers in the net-next tree to use these flags instead of netlink_set_nonroot. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie.Eric W. Biederman1-1/+1
Passing uids and gids on NETLINK_CB from a process in one user namespace to a process in another user namespace can result in the wrong uid or gid being presented to userspace. Avoid that problem by passing kuids and kgids instead. - define struct scm_creds for use in scm_cookie and netlink_skb_parms that holds uid and gid information in kuid_t and kgid_t. - Modify scm_set_cred to fill out scm_creds by heand instead of using cred_to_ucred to fill out struct ucred. This conversion ensures userspace does not get incorrect uid or gid values to look at. - Modify scm_recv to convert from struct scm_creds to struct ucred before copying credential values to userspace. - Modify __scm_send to populate struct scm_creds on in the scm_cookie, instead of just copying struct ucred from userspace. - Modify netlink_sendmsg to copy scm_creds instead of struct ucred into the NETLINK_CB. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>