path: root/net/sunrpc/xprtsock.c
Age | Commit message | Author | Files | Lines
2014-03-12 | SUNRPC: close a rare race in xs_tcp_setup_socket. | NeilBrown | 1 | -4/+9
commit 93dc41bdc5c853916610576c6b48a1704959c70d upstream. We have one report of a crash in xs_tcp_setup_socket. The call path to the crash is: xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested. The 'sock' passed to that last function is NULL. The only way I can see this happening is a concurrent call to xs_close: xs_close -> xs_reset_transport -> sock_release -> inet_release. inet_release sets sock->sk = NULL; inet_stream_connect then calls lock_sock(sock->sk), which gets NULL. All calls to xs_close are protected by XPRT_LOCKED, as are most activations of the workqueue which runs xs_tcp_setup_socket. The exception is xs_tcp_schedule_linger_timeout. So presumably the timeout queued by the latter fires exactly when some other code runs xs_close(). To protect against this we can move the cancel_delayed_work_sync() call from xs_destroy() to xs_close(). As xs_close is never called from the worker scheduled on ->connect_worker, this can never deadlock. Signed-off-by: NeilBrown <neilb@suse.de> [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
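The ordering this fix enforces (cancel the delayed connect work before the socket is released) can be illustrated with a deliberately simplified, single-threaded C model. Everything named `model_*` is hypothetical: the boolean `connect_work_pending` stands in for a queued delayed work item and `model_cancel_work_sync()` for cancel_delayed_work_sync(). The model does not reproduce real concurrency; it only shows the invariant that once close has returned, the worker can no longer observe a released socket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct rpc_xprt or struct socket. */
struct model_sock {
    int connected;
};

struct model_xprt {
    struct model_sock *sock;    /* NULL once the socket has been released */
    bool connect_work_pending;  /* stands in for a queued delayed work item */
};

/* Stand-in for the connect worker: it dereferences xprt->sock. */
static int model_connect_worker(struct model_xprt *xprt)
{
    xprt->connect_work_pending = false;
    if (xprt->sock == NULL)
        return -1;              /* in the kernel this was a NULL dereference */
    xprt->sock->connected = 1;
    return 0;
}

/* Stand-in for cancel_delayed_work_sync(): afterwards the work cannot run. */
static void model_cancel_work_sync(struct model_xprt *xprt)
{
    xprt->connect_work_pending = false;
}

/* Close path with the fix applied: cancel first, then release the socket. */
static void model_close(struct model_xprt *xprt)
{
    model_cancel_work_sync(xprt);
    xprt->sock = NULL;          /* analogous to inet_release() clearing sock->sk */
}

/* Close path without the fix: release only, leaving the work queued. */
static void model_close_unfixed(struct model_xprt *xprt)
{
    xprt->sock = NULL;
}

/* The linger timer fires: run the worker only if it is still queued. */
static int model_timer_fires(struct model_xprt *xprt)
{
    if (!xprt->connect_work_pending)
        return 0;               /* cancelled: nothing runs */
    return model_connect_worker(xprt);
}
```

With the unfixed close, a timer that fires afterwards reaches the worker with a NULL socket; with the fixed close it finds nothing to run.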
2014-03-05 | SUNRPC: Fix races in xs_nospace() | Trond Myklebust | 1 | -1/+5
commit 06ea0bfe6e6043cb56a78935a19f6f8ebc636226 upstream. When a send failure occurs due to the socket being out of buffer space, we call xs_nospace() in order to have the RPC task wait until the socket has drained enough to make it worthwhile trying again. The current patch fixes a race in which the socket is drained before we get round to setting up the machinery in xs_nospace(), and which is reported to cause hangs. Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown Fixes: a9a6b52ee1ba (SUNRPC: Don't start the retransmission timer...) Reported-by: Neil Brown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
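The commit message does not spell out how the race is closed. A common way to close this kind of lost-wakeup window is to arm the wait first and then re-run the "is there space now?" check; the sketch below assumes that approach. It is a single-threaded userspace model, and all `model_*` names and fields are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical socket model; not the kernel's struct sock. */
struct model_sock {
    int wspace;         /* free send-buffer space */
    int min_wspace;     /* space needed before a waiter is worth waking */
    bool waiter_armed;  /* an RPC task is registered to be woken */
    int wakeups;        /* wakeups delivered so far */
};

/* Write-space callback: wake the waiter if one is armed and space is free. */
static void model_write_space(struct model_sock *sk)
{
    if (sk->waiter_armed && sk->wspace >= sk->min_wspace) {
        sk->waiter_armed = false;
        sk->wakeups++;
    }
}

/* Called when a send fails for lack of buffer space. Arming the waiter and
 * then re-running the write-space check closes the window in which the
 * socket drained before the waiter was armed. */
static void model_nospace(struct model_sock *sk)
{
    sk->waiter_armed = true;
    model_write_space(sk);      /* "race breaker" re-check */
}
```

Without the re-check in `model_nospace()`, a socket that drained before the waiter was armed would never deliver a wakeup, which matches the hang described above.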
2013-11-29 | SUNRPC: Fix a data corruption issue when retransmitting RPC calls | Trond Myklebust | 1 | -7/+21
commit a6b31d18b02ff9d7915c5898c9b5ca41a798cd73 upstream. The following scenario can cause silent data corruption when doing NFS writes. It has mainly been observed when doing database writes using O_DIRECT.
1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed, and so the RPC client times out.
3) The client issues a second sendpage of the page data as part of an RPC call retransmission.
4) The reply to the first transmission arrives from the server _before_ the client hardware has emptied the TCP socket send buffer.
5) After processing the reply, the RPC state machine rules that the call is done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the pages to store something else (e.g. a new write).
7) The client NIC drains the TCP socket send buffer. Since the page data has now changed, it reads a corrupted version of the initial RPC call, and puts it on the wire.
This patch fixes the problem in the following manner: The ordering guarantees of TCP ensure that when the server sends a reply, then we know that the _first_ transmission has completed. Using zero-copy in that situation is therefore safe. If a timeout occurs, we then send the retransmission using sendmsg() (i.e. no zero-copy). We then know that the socket contains a full copy of the data, and so it will retransmit a faithful reproduction even if the RPC call completes, and the application reuses the O_DIRECT buffer in the meantime. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
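A toy model of the rule this fix adopts: reference the caller's page on the first transmission, snapshot it on a retransmission. The `model_*` names and the 8-byte "page" are hypothetical, and the model ignores TCP ordering, which is what makes the zero-copy first send safe in the real code.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_BYTES 8

/* Hypothetical send-buffer entry: either a reference to the caller's page
 * (zero-copy, like sendpage()) or a private snapshot (like sendmsg()). */
struct model_sendbuf {
    const char *ref;
    char copy[MODEL_PAGE_BYTES];
    bool zero_copy;
};

/* Queue page data: zero-copy for the first transmission, copy on retransmit. */
static void model_queue(struct model_sendbuf *sb, const char *page, bool retransmit)
{
    if (!retransmit) {
        sb->ref = page;
        sb->zero_copy = true;
    } else {
        memcpy(sb->copy, page, MODEL_PAGE_BYTES);
        sb->ref = NULL;
        sb->zero_copy = false;
    }
}

/* What the NIC reads when it finally drains this entry onto the wire. */
static const char *model_drain(const struct model_sendbuf *sb)
{
    return sb->zero_copy ? sb->ref : sb->copy;
}
```

If the application reuses its buffer before the drain, a copied entry still yields the original bytes, while a zero-copy entry yields the new contents: that is step 7 of the scenario above.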
2013-09-09 | Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 1 | -1/+12
Pull NFS client updates from Trond Myklebust: "Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as lease loss due to a network partition, where doing so may result in data corruption. Add a kernel parameter to control choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other NFS clients from being able to manipulate our lease and file locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients.
- Fix the broken NFSv4 security auto-negotiation between client and server.
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete.
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support" * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits) NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set NFSv4: Allow security autonegotiation for submounts NFSv4: Disallow security negotiation for lookups when 'sec=' is specified NFSv4: Fix security auto-negotiation NFS: Clean up nfs_parse_security_flavors() NFS: Clean up the auth flavour array mess NFSv4.1 Use MDS auth flavor for data server connection NFS: Don't check lock owner compatability unless file is locked (part 2) NFS: Don't check lock owner compatibility in writes unless file is locked nfs4: Map NFS4ERR_WRONG_CRED to EPERM nfs4.1: Add SP4_MACH_CRED write and commit support nfs4.1: Add SP4_MACH_CRED stateid support nfs4.1: Add SP4_MACH_CRED secinfo support nfs4.1: Add SP4_MACH_CRED cleanup support nfs4.1: Add state protection handler nfs4.1: Minimal SP4_MACH_CRED implementation SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid SUNRPC: Add an identifier for struct rpc_clnt SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints ...
2013-09-04 | SUNRPC: Add tracepoints to help debug socket connection issues | Trond Myklebust | 1 | -1/+12
Add client side debugging to help trace socket connection/disconnection and unexpected state change issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-24 | net: add sk_stream_is_writeable() helper | Eric Dumazet | 1 | -1/+1
Several call sites use the following hardcoded condition: sk_stream_wspace(sk) >= sk_stream_min_wspace(sk). Let's use a helper, because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
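For reference, a userspace model of the condition being wrapped. It assumes the definitions of sk_stream_wspace() and sk_stream_min_wspace() in include/net/sock.h of this era (send-buffer size minus queued bytes, and half the queued bytes, respectively); check your own tree. The `model_*` struct and functions are stand-ins, not kernel code.

```c
#include <stdbool.h>

/* Minimal stand-in holding the two struct sock fields involved. */
struct model_sock {
    int sk_sndbuf;       /* send buffer size in bytes */
    int sk_wmem_queued;  /* bytes queued for transmission */
};

/* Free space left in the send buffer (assumed sk_stream_wspace()). */
static int model_stream_wspace(const struct model_sock *sk)
{
    return sk->sk_sndbuf - sk->sk_wmem_queued;
}

/* Minimum useful free space: half of what is queued (assumed sk_stream_min_wspace()). */
static int model_stream_min_wspace(const struct model_sock *sk)
{
    return sk->sk_wmem_queued >> 1;
}

/* The helper: one place to change when the writeability rule changes. */
static bool model_stream_is_writeable(const struct model_sock *sk)
{
    return model_stream_wspace(sk) >= model_stream_min_wspace(sk);
}
```

For a 100-byte buffer with 60 bytes queued, the free space (40) exceeds half the queue (30), so the socket counts as writeable; with 70 queued it does not (30 < 35).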
2013-07-11 | Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 1 | -1/+0
Pull nfsd changes from Bruce Fields: "Changes this time include: - 4.1 enabled on the server by default: the last 4.1-specific issues I know of are fixed, so we're not going to find the rest of the bugs without more exposure. - Experimental support for NFSv4.2 MAC Labeling (to allow running selinux over NFS), from Dave Quigley. - Fixes for some delicate cache/upcall races that could cause rare server hangs; thanks to Neil Brown and Bodo Stroesser for extreme debugging persistence. - Fixes for some bugs found at the recent NFS bakeathon, mostly v4 and v4.1-specific, but also a generic bug handling fragmented rpc calls" * 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits) nfsd4: support minorversion 1 by default nfsd4: allow destroy_session over destroyed session svcrpc: fix failures to handle -1 uid's sunrpc: Don't schedule an upcall on a replaced cache entry. net/sunrpc: xpt_auth_cache should be ignored when expired. sunrpc/cache: ensure items removed from cache do not have pending upcalls. sunrpc/cache: use cache_fresh_unlocked consistently and correctly. sunrpc/cache: remove races with queuing an upcall. nfsd4: return delegation immediately if lease fails nfsd4: do not throw away 4.1 lock state on last unlock nfsd4: delegation-based open reclaims should bypass permissions svcrpc: don't error out on small tcp fragment svcrpc: fix handling of too-short rpc's nfsd4: minor read_buf cleanup nfsd4: fix decoding of compounds across page boundaries nfsd4: clean up nfs4_open_delegation NFSD: Don't give out read delegations on creates nfsd4: allow client to send no cb_sec flavors nfsd4: fail attempts to request gss on the backchannel nfsd4: implement minimal SP4_MACH_CRED ...
2013-06-13 | net: Convert uses of typedef ctl_table to struct ctl_table | Joe Perches | 1 | -2/+2
Reduce the uses of this unnecessary typedef. Done via perl script:
    $ git grep --name-only -w ctl_table net | \
      xargs perl -p -i -e '\
      sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
      s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'
Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-15 | sunrpc: server back channel needs no rpcbind method | J. Bruce Fields | 1 | -1/+0
XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp() (using xprt_set_bound()), and is never cleared, so ->rpcbind() will never need to be called. Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 | SUNRPC: attempt AF_LOCAL connect on setup | J. Bruce Fields | 1 | -0/+3
In the gss-proxy case, setup time is when I know I'll have the right namespace for the connect. In other cases, it might be useful to get any connection errors earlier--though actually in practice it doesn't make any difference for rpcbind. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 | Merge Trond's nfs-for-next | J. Bruce Fields | 1 | -5/+9
Merging Trond's nfs-for-next branch, mainly to get b7993cebb841b0da7a33e9d5ce301a9fd3209165 "SUNRPC: Allow rpc_create() to request that TCP slots be unlimited", which a small piece of the gss-proxy work depends on.
2013-04-14 | SUNRPC: Allow rpc_create() to request that TCP slots be unlimited | Trond Myklebust | 1 | -1/+5
This is mainly for use by NFSv4.1, where the session negotiation ultimately wants to decide how many RPC slots we can fill. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 | SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks | Trond Myklebust | 1 | -4/+4
In the case of a SOFTCONN rpc task, we really want to ensure that it reports errors like ENETUNREACH back to the caller. Currently, only some of these errors are being reported back (connect errors are not), and they are being converted by the RPC layer into EIO. Reported-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
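A hypothetical sketch of the reporting policy described: for a SOFTCONN task, connection errors such as ENETUNREACH are returned to the caller unchanged instead of being collapsed into EIO. The function name, the exact list of errors, and the choice of -EAGAIN ("retry") for non-SOFTCONN tasks are assumptions for illustration, not logic taken from the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: map a failed connect to the status an RPC task reports. */
static int model_connect_status(int err, bool softconn)
{
    switch (err) {
    case -ENETUNREACH:
    case -EHOSTUNREACH:
    case -ECONNREFUSED:
    case -ECONNRESET:
    case -ETIMEDOUT:
        if (softconn)
            return err;     /* SOFTCONN: surface the real cause */
        return -EAGAIN;     /* assumed policy: otherwise keep retrying */
    default:
        return -EIO;        /* anything unexpected is still an I/O error */
    }
}
```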
2013-03-09 | sunrpc: don't attempt to cancel unitialized work | J. Bruce Fields | 1 | -5/+10
As of dc107402ae06286a9ed33c32daf3f35514a7cb8d "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the AF_LOCAL case, resulting in warnings like:
    WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0()
    Hardware name: Bochs
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
    Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
    Pid: 4816, comm: nfsd Tainted: G W 3.8.0-rc2-00049-gdc10740 #801
    Call Trace:
    [<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
    [<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
    [<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
    [<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
    [<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
    [<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
    [<ffffffff81057ebb>] del_timer+0x2b/0x80
    [<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
    [<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
    [<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
    [<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
    [<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
    [<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
    [<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
    [<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
    [<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
    [<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
    [<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
    [<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
    ...
Acked-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-28 | Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux | Linus Torvalds | 1 | -8/+27
Pull nfsd changes from J Bruce Fields: "Miscellaneous bugfixes, plus: - An overhaul of the DRC cache by Jeff Layton. The main effect is just to make it larger. This decreases the chances of intermittent errors especially in the UDP case. But we'll need to watch for any reports of performance regressions. - Containerized nfsd: with some limitations, we now support per-container nfs-service, thanks to extensive work from Stanislav Kinsbursky over the last year." Some notes about conflicts, since there were *two* non-data semantic conflicts here: - idr_remove_all() had been added by a memory leak fix, but has since become deprecated since idr_destroy() does it for us now. - xs_local_connect() had been added by this branch to make AF_LOCAL connections be synchronous, but in the meantime Trond had changed the calling convention in order to avoid a RCU dereference. There were a couple of more obvious actual source-level conflicts due to the hlist traversal changes and one just due to code changes next to each other, but those were trivial. 
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits) SUNRPC: make AF_LOCAL connect synchronous nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum svcrpc: fix rpc server shutdown races svcrpc: make svc_age_temp_xprts enqueue under sv_lock lockd: nlmclnt_reclaim(): avoid stack overflow nfsd: enable NFSv4 state in containers nfsd: disable usermode helper client tracker in container nfsd: use proper net while reading "exports" file nfsd: containerize NFSd filesystem nfsd: fix comments on nfsd_cache_lookup SUNRPC: move cache_detail->cache_request callback call to cache_read() SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function SUNRPC: rework cache upcall logic SUNRPC: introduce cache_detail->cache_request callback NFS: simplify and clean cache library NFS: use SUNRPC cache creation and destruction helper for DNS cache nfsd4: free_stid can be static nfsd: keep a checksum of the first 256 bytes of request sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer sunrpc: fix comment in struct xdr_buf definition ...
2013-02-28 | SUNRPC: make AF_LOCAL connect synchronous | J. Bruce Fields | 1 | -8/+27
It doesn't appear that anyone actually needs to connect asynchronously. Also, using a workqueue for the connect means we lose the namespace information from the original process. This is a problem since there's no way to explicitly pass in a filesystem namespace for resolution of an AF_LOCAL address. Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 | sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h | Jeff Layton | 1 | -0/+1
These routines are used by server and client code, so having them in a separate header would be best. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-01 | SUNRPC: Pass pointers to struct rpc_xprt to the congestion window | Trond Myklebust | 1 | -3/+3
Avoid access to task->tk_xprt Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 | SUNRPC: Fix an RCU dereference in xs_local_rpcbind | Trond Myklebust | 1 | -1/+3
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 | SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback | Trond Myklebust | 1 | -2/+2
Avoid another RCU dereference by passing the pointer to struct rpc_xprt from the caller. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 | SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference() | Trond Myklebust | 1 | -1/+1
tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is defined as an __rcu variable. Replace dereferences of tk_xprt with non-rcu dereferences where it is safe to do so. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 | SUNRPC: variable 'svsk' is unused in function bc_send_request | Trond Myklebust | 1 | -2/+0
Silence a compile time warning. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 | SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket | Trond Myklebust | 1 | -0/+4
Silence the unnecessary warning "unhandled error (111) connecting to..." and convert it to a dprintk for debugging purposes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 | SUNRPC: remove BUG_ON from bc_malloc | Weston Andros Adamson | 1 | -2/+4
Replace BUG_ON() with WARN_ON_ONCE() and NULL return - the caller will handle this like a memory allocation failure. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
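The pattern is "warn once, fail softly". A userspace sketch follows; `model_warn_on_once()` only imitates the behaviour of the kernel's WARN_ON_ONCE() (report the first hit, return the condition, never halt), and the allocator, its size limit, and all `model_*` names are hypothetical rather than the real bc_malloc() checks.

```c
#include <stdio.h>
#include <stdlib.h>

/* Imitation of WARN_ON_ONCE(): report only the first time the condition is
 * true, return the condition, and never halt the program. */
static int model_warn_on_once(int cond, int *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = 1;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

static int model_bc_malloc_warned;  /* one flag per check site */

/* Hypothetical bc_malloc()-like allocator with a fixed size limit. An
 * oversized request warns once and returns NULL, which callers already
 * handle as an ordinary allocation failure, instead of crashing. */
static void *model_bc_malloc(size_t size, size_t limit)
{
    if (model_warn_on_once(size > limit, &model_bc_malloc_warned,
                           "size > limit in model_bc_malloc"))
        return NULL;
    return malloc(size);
}
```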
2012-11-04 | SUNRPC: remove BUG_ONs from *_reclassify_socket* | Weston Andros Adamson | 1 | -3/+4
Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when sanity checking socket ownership (lock). The bind call will fail if the socket was unsuccessfully reclassified. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-24 | SUNRPC: Get rid of the xs_error_report socket callback | Trond Myklebust | 1 | -25/+0
Chris Perl reports that we're seeing races between the wakeup call in xs_error_report and the connect attempts. Basically, Chris has shown that in certain circumstances, the call to xs_error_report causes the rpc_task that is responsible for reconnecting to wake up early, thus triggering a disconnect and retry. Since the sk->sk_error_report() calls in the socket layer are always followed by a tcp_done() in the cases where we care about waking up the rpc_tasks, just let the state_change callbacks take responsibility for those wake ups. Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 | SUNRPC: Prevent races in xs_abort_connection() | Trond Myklebust | 1 | -5/+8
The call to xprt_disconnect_done() that is triggered by a successful connection reset will trigger another automatic wakeup of all tasks on the xprt->pending rpc_wait_queue. In particular it will cause an early wake up of the task that called xprt_connect(). All we really want to do here is clear all the socket-specific state flags, so we split that functionality out of xs_sock_mark_closed() into a helper that can be called by xs_abort_connection() Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 | Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..." | Trond Myklebust | 1 | -1/+1
This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e. Now that we clear the connected flag when entering TCP_CLOSE_WAIT, the deadlock described in this commit is no longer possible. Instead, the resulting call to xs_tcp_shutdown() can interfere with pending reconnection attempts. Reported-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 | SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT | Trond Myklebust | 1 | -0/+1
This is needed to ensure that we call xprt_connect() upon the next call to call_connect(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Tested-by: Chris Perl <chris.perl@gmail.com>
2012-09-28 | SUNRPC: Get rid of the redundant xprt->shutdown bit field | Trond Myklebust | 1 | -18/+0
It is only set after everyone has dereferenced the transport, and serves no useful purpose: setting it is racy, so all the socket code, etc still needs to be able to cope with the cases where they miss reading it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25 | SUNRPC: Set alloc_slot for backchannel tcp ops | Bryan Schumaker | 1 | -0/+1
f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport regression) introduced the "alloc_slot" function for xprt operations, but never created one for the backchannel operations. This patch fixes a null pointer dereference when mounting NFS over v4.1.
    Call Trace:
    [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
    [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
    [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
    [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
    [<ffffffff81073589>] process_one_work+0x139/0x500
    [<ffffffff81070e70>] ? alloc_worker+0x70/0x70
    [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
    [<ffffffff81073d1e>] worker_thread+0x15e/0x460
    [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
    [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
    [<ffffffff81079603>] kthread+0x93/0xa0
    [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
    [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
    [<ffffffff81465d00>] ? gs_change+0x13/0x13
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-19 | SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT | Trond Myklebust | 1 | -5/+16
Instead of doing a shutdown() call, we need to do an actual close(). Ditto if/when the server is sending us junk RPC headers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@vger.kernel.org
2012-09-07 | SUNRPC: Fix a UDP transport regression | Trond Myklebust | 1 | -0/+3
Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing hangs in the case of NFS over UDP mounts. Since neither the UDP nor the RDMA transport mechanism uses dynamic slot allocation, we can skip grabbing the socket lock for those transports. Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA case. Note that the NFSv4.1 back channel assigns the slot directly through rpc_run_bc_task, so we can ignore that case. Reported-by: Dick Streefland <dick.streefland@altium.nl> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.1]
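A per-transport operations table is the usual C way to express "switch between the TCP and UDP/RDMA case". The sketch below is a hypothetical model: the struct, the `locked` flag standing in for XPRT_LOCKED, and the `model_*` function names are illustrative, not the kernel's definitions.

```c
#include <stdbool.h>

struct model_xprt;

/* Hypothetical per-transport operations table. */
struct model_xprt_ops {
    int (*alloc_slot)(struct model_xprt *xprt);
};

struct model_xprt {
    const struct model_xprt_ops *ops;
    bool locked;      /* stands in for the XPRT_LOCKED bit */
    int lock_taken;   /* times the lock was acquired, for illustration */
};

/* TCP-style: slots are allocated dynamically, so serialize behind the lock. */
static int model_lock_and_alloc_slot(struct model_xprt *xprt)
{
    if (xprt->locked)
        return -1;            /* caller must wait for the lock holder */
    xprt->locked = true;
    xprt->lock_taken++;
    /* ... allocate a slot here ... */
    xprt->locked = false;
    return 0;
}

/* UDP/RDMA-style: fixed slot table, so no need to take the lock. */
static int model_alloc_slot(struct model_xprt *xprt)
{
    (void)xprt;
    /* ... take a slot from the fixed table ... */
    return 0;
}

static const struct model_xprt_ops model_tcp_ops = {
    .alloc_slot = model_lock_and_alloc_slot,
};
static const struct model_xprt_ops model_udp_ops = {
    .alloc_slot = model_alloc_slot,
};
```

A transport whose table leaves `.alloc_slot` unset would call through a NULL function pointer, which is the kind of failure addressed by the 2012-09-25 backchannel commit in this log.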
2012-07-31 | Merge branch 'akpm' (Andrew's patch-bomb) | Linus Torvalds | 1 | -0/+43
Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...
2012-07-31 | Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds | 1 | -0/+10
Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO
2012-07-31 | nfs: enable swap on NFS | Mel Gorman | 1 | -0/+43
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
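The note "Restore PF_MEMALLOC task flags in all cases" points at a common idiom: remember whether the flag was already set, set it for the critical section, then restore only that bit on every exit path rather than blindly clearing it. A hypothetical userspace sketch; the flag value, the global standing in for current->flags, and the `model_*` names are made up for illustration.

```c
#define MODEL_PF_MEMALLOC 0x1UL  /* hypothetical bit value */

static unsigned long model_task_flags;  /* stands in for current->flags */

/* Put the bits in `mask` back to their values in `orig`; leave others alone. */
static void model_restore_flags(unsigned long *flags, unsigned long orig,
                                unsigned long mask)
{
    *flags &= ~mask;
    *flags |= orig & mask;
}

/* Hypothetical reconnect: runs with the flag set, restores it on every path. */
static int model_reconnect(int fail)
{
    unsigned long pflags = model_task_flags;
    int ret = 0;

    model_task_flags |= MODEL_PF_MEMALLOC;
    if (fail) {
        ret = -1;               /* the error path still restores the flag */
        goto out;
    }
    /* ... allocate the socket and connect ... */
out:
    model_restore_flags(&model_task_flags, pflags, MODEL_PF_MEMALLOC);
    return ret;
}
```

Restoring instead of clearing matters when the caller was already running with the flag set: a plain clear would silently drop it for the rest of the caller's work.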
2012-07-30 | nfs: skip commit in releasepage if we're freeing memory for fs-related reasons | Jeff Layton | 1 | -0/+10
We've had some reports of a deadlock where rpciod ends up with a stack trace like this:
    PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
    #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
    #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
    #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
    #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
    #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
    #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
    #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
    #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
    #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
    #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock.
Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
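The fix pairs a per-task marker with a check in the page-release path. Below is a single-threaded userspace sketch: `MODEL_PF_FSTRANS`, the global standing in for current->flags, and the `model_*` functions are hypothetical, and "reclaim" is simulated by calling the release hook directly from the allocation step.

```c
#include <stdbool.h>

#define MODEL_PF_FSTRANS 0x2UL   /* hypothetical bit value */

static unsigned long model_current_flags;  /* stands in for current->flags */
static int model_commits;                  /* blocking commits started */

/* releasepage-like hook: returns true if the page could be released. A page
 * holding unstable writes would need a COMMIT to the server first. */
static bool model_release_page(bool needs_commit)
{
    if (!needs_commit)
        return true;
    if (model_current_flags & MODEL_PF_FSTRANS)
        return false;       /* inside socket setup: don't block on the server */
    model_commits++;        /* would wait for the server's reply */
    return true;
}

/* Worker-side socket setup: mark the task before allocating memory. */
static bool model_setup_socket(bool reclaim_hits_unstable_page)
{
    bool released;

    model_current_flags |= MODEL_PF_FSTRANS;
    /* allocation may enter reclaim, which calls the release hook */
    released = model_release_page(reclaim_hits_unstable_page);
    model_current_flags &= ~MODEL_PF_FSTRANS;
    return released;
}
```

Inside socket setup the hook declines the page instead of issuing a commit that could only complete over the very socket being created; outside it, reclaim behaves as before.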
2012-07-05 | sunrpc: Don't do a dst_confirm() on an input routes. | David S. Miller | 1 | -3/+0
xs_udp_data_ready() is operating on received packets, and tries to do a dst_confirm() on the dst attached to the SKB. This isn't right: dst confirmation is for output routes, not input routes. It's for resetting the timers on the nexthop neighbour entry for the route, indicating that we've got good evidence that we've successfully reached it. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-26 | sunrpc: skip portmap calls on sessions backchannel | J. Bruce Fields | 1 | -0/+1
There's obviously no point to doing portmap calls over the sessions backchannel. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-11 | SUNRPC: Fix a few sparse warnings | Trond Myklebust | 1 | -5/+5
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment (different address spaces) - svc_partial_recvfrom now takes a struct kvec, so the variable save_iovbase needs to be an ordinary (void *) Make a bunch of variables in net/sunrpc/xprtsock.c static Fix a couple of "warning: symbol 'foo' was not declared. Should it be static?" reports. Fix a couple of conflicting function declarations. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-16 | SUNRPC: add sending,pending queue and max slot to xprt stats | Andy Adamson | 1 | -6/+17
With static RPC slots, the xprt backlog queue stats were useful in showing when the transport (TCP) was starved by lack of RPC slots. The new dynamic RPC slot code, commit d9ba131d8f58c0d2ff5029e7002ab43f913b36f9, always provides an RPC slot and so only uses the xprt backlog queue when the tcp_max_slot_table_entries value has been hit or when an allocation error occurs. All requests are now placed on the xprt sending or pending queue which need to be monitored for debugging. The max_slot stat shows the maximum number of dynamic RPC slots reached which is useful when debugging performance issues. Add the new fields at the end of the mountstats xprt stanza so that mountstats outputs the previous correct values and ignores the new fields. Bump NFS_IOSTATS_VERS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
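A hypothetical sketch of the allocation policy described above: reuse a free slot if there is one, otherwise allocate a new slot until a cap (a stand-in for the tcp_max_slot_table_entries limit) is reached, and only then, or on allocation failure, fall back to the backlog. It does not model growth with the TCP window, and the `model_*` names are illustrative.

```c
#include <stdlib.h>

struct model_slot {
    struct model_slot *next;
};

struct model_xprt {
    struct model_slot *free_list;  /* slots returned by completed requests */
    unsigned int num_reqs;         /* slots allocated so far */
    unsigned int max_reqs;         /* cap on dynamically allocated slots */
    unsigned int backlogged;       /* callers sent to the backlog queue */
};

static struct model_slot *model_alloc_slot(struct model_xprt *xprt)
{
    struct model_slot *slot = xprt->free_list;

    if (slot) {                          /* reuse a freed slot first */
        xprt->free_list = slot->next;
        return slot;
    }
    if (xprt->num_reqs < xprt->max_reqs) {
        slot = malloc(sizeof(*slot));
        if (slot) {
            xprt->num_reqs++;
            return slot;
        }
    }
    xprt->backlogged++;                  /* cap reached or allocation failed */
    return NULL;
}

static void model_free_slot(struct model_xprt *xprt, struct model_slot *slot)
{
    slot->next = xprt->free_list;
    xprt->free_list = slot;
}
```

Under this policy the backlog counter only moves once the cap is hit, which is why, as the commit explains, the sending and pending queues become the more useful things to monitor.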
2011-11-22 | SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared | Trond Myklebust | 1 | -2/+1
By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail to find evidence of socket congestion, we are making the RPC engine believe that the message was incorrectly sent and so it disconnects the socket instead of just retrying. The bug appears to have been introduced by commit 5e3771ce2d6a69e10fcc870cdf226d121d868491 (SUNRPC: Ensure that xs_nospace return values are propagated). Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 2.6.30] Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
2011-11-10 | SUNRPC: destroy freshly allocated transport in case of sockaddr init error | Stanislav Kinsbursky | 1 | -1/+3
Otherwise we will leak xprt structure and struct net reference. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-17 | SUNRPC: Support dynamic slot allocation for TCP connections | Trond Myklebust | 1 | -7/+42
Allow the number of available slots to grow with the TCP window size. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-17 | SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot | Trond Myklebust | 1 | -0/+2
This throttles the allocation of new slots when the socket is busy reconnecting and/or is out of buffer space. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 | SUNRPC: sunrpc should not explicitly depend on NFS config options | Trond Myklebust | 1 | -3/+3
Change explicit references to CONFIG_NFS_V4_1 to implicit ones. Get rid of the unnecessary defines in backchannel_rqst.c and bc_svc.c: the Makefile takes care of those dependencies. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 | SUNRPC: Support for RPC over AF_LOCAL transports | Chuck Lever | 1 | -3/+392
TI-RPC introduces the capability of performing RPC over AF_LOCAL sockets. It uses this mainly for registering and unregistering local RPC services securely with the local rpcbind, but we could also conceivably use it as a generic upcall mechanism. This patch provides a client-side only implementation for the moment. We might also consider a server-side implementation to provide AF_LOCAL access to NLM (for statd downcalls, and such like). Autobinding is not supported on kernel AF_LOCAL transports at this time. Kernel ULPs must specify the pathname of the remote endpoint when an AF_LOCAL transport is created. rpcbind supports registering services available via AF_LOCAL, so the kernel could handle it with some adjustment to ->rpcbind and ->set_port. But we don't need this feature for doing upcalls via well-known named sockets. This has not been tested with ULPs that move a substantial amount of data. Thus, I can't attest to how robust the write_space and congestion management logic is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
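Since kernel ULPs must name the remote AF_LOCAL endpoint by pathname, here is a userspace sketch of building such an address. The `model_*` function is hypothetical, and the example path /var/run/rpcbind.sock is an assumption about where a local rpcbind listens; adjust it for your system.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fill in a pathname AF_LOCAL address. Returns the length to pass to
 * connect(), or -1 if the path is empty or does not fit in sun_path. */
static int model_make_local_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len == 0 || len >= sizeof(addr->sun_path))
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_LOCAL;
    memcpy(addr->sun_path, path, len + 1);   /* include the terminating NUL */
    return (int)(offsetof(struct sockaddr_un, sun_path) + len + 1);
}
```

A userspace client would then call socket(AF_LOCAL, SOCK_STREAM, 0) and pass the address and returned length to connect().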
2011-05-27 | SUNRPC: Rename xs_encode_tcp_fragment_header() | Chuck Lever | 1 | -12/+12
Clean up: Use a more generic name for xs_encode_tcp_fragment_header(); it's appropriate to use for all stream transport types. We're about to add a new stream transport. Also, move it to a place where it is more easily shared amongst the various send_request methods. And finally, replace the "htonl" macro invocation with its modern equivalent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
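The header in question is the 4-byte record marker that RPC over stream transports places in front of each message (the record marking standard of RFC 5531). A userspace sketch follows; kernel code would do the byte swap with cpu_to_be32() rather than htonl(), which is presumably the "modern equivalent" mentioned above. The `model_*` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl() */

/* Record marking: the top bit flags the last fragment, the low 31 bits hold
 * the fragment length, and the whole word is sent big-endian. */
#define MODEL_LAST_FRAGMENT      0x80000000u
#define MODEL_FRAGMENT_SIZE_MASK 0x7fffffffu

/* Encode the marker for a single-fragment record of msg_len bytes. */
static void model_encode_record_marker(unsigned char out[4], uint32_t msg_len)
{
    uint32_t marker = htonl(MODEL_LAST_FRAGMENT |
                            (msg_len & MODEL_FRAGMENT_SIZE_MASK));

    memcpy(out, &marker, sizeof(marker));
}
```

For a 0x1234-byte message the marker bytes on the wire are 80 00 12 34.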
2011-05-27 | SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback... | Trond Myklebust | 1 | -3/+13
The TCP connection state code depends on the state_change() callback being called when the SYN_SENT state is set. However the networking layer doesn't actually call us back in that case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
2011-03-31 | Fix common misspellings | Lucas De Marchi | 1 | -2/+2
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>