Age | Commit message (Collapse) | Author | Files | Lines |
|
Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (122 commits)
sunrpc: drop BKL around wrap and unwrap
NFSv4: Make sure unlock is really an unlock when cancelling a lock
NLM: fix source address of callback to client
SUNRPC client: add interface for binding to a local address
SUNRPC server: record the destination address of a request
SUNRPC: cleanup transport creation argument passing
NFSv4: Make the NFS state model work with the nosharedcache mount option
NFS: Error when mounting the same filesystem with different options
NFS: Add the mount option "nosharecache"
NFS: Add support for mounting NFSv4 file systems with string options
NFS: Add final pieces to support in-kernel mount option parsing
NFS: Introduce generic mount client API
NFS: Add enums and match tables for mount option parsing
NFS: Improve debugging output in NFS in-kernel mount client
NFS: Clean up in-kernel NFS mount
NFS: Remake nfsroot_mount as a permanent part of NFS client
SUNRPC: Add a convenient default for the hostname when calling rpc_create()
SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
SUNRPC: Rename rpcb_getport_external routine
SUNRPC: Allow rpcbind requests to be interrupted by a signal.
...
|
|
When nfsd was transitioned to use splice instead of sendfile() for data
transfers, a line setting the page index was lost. Restore it, so that
nfsd is functional when that path is used.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the output actor doesn't transfer the full amount of data, we will
increment ppos too much. Two related bugs in there:
- We need to break out and return actor() retval if it is shorted than
what we spliced into the pipe.
- Adjust ppos only according to actor() return.
Also fix loop problem in generic_file_splice_read(), it should not keep
going when data has already been transferred.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Revalidate read/write permissions for splice(2) and vmslice(2), in case
security policy has changed since the files were opened.
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
sysfs binary attributes
Well, first of all, I don't want to change so many files either.
What I do:
Adding a new parameter "struct bin_attribute *" in the
.read/.write methods for the sysfs binary attributes.
In fact, only the four lines change in fs/sysfs/bin.c and
include/linux/sysfs.h do the real work.
But I have to update all the files that use binary attributes
to make them compatible with the new .read and .write methods.
I'm not sure if I missed any. :(
Why I do this:
For a sysfs attribute, we can get a pointer pointing to the
struct attribute in the .show/.store method,
while we can't do this for the binary attributes.
I don't know why this is different, but this does make it not
so handy to use the binary attributes as the regular ones.
So I think this patch is reasonable. :)
Who benefits from it:
The patch that exposes ACPI tables in sysfs
requires such an improvement.
All the table binary attributes share the same .read method.
Parameter "struct bin_attribute *" is used to get
the table signature and instance number which are used to
distinguish different ACPI table binary attributes.
Without this parameter, we need to offer different .read methods
for different ACPI table binary attributes.
This is impossible as there are various ACPI tables on different
platforms, and we don't know what they are until they are loaded.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This patch makes dentries and inodes for sysfs directories
reclaimable.
* sysfs_notify() is modified to walk sysfs_dirent tree instead of
dentry tree.
* sysfs_update_file() and sysfs_chmod_file() use sysfs_get_dentry() to
grab the victim dentry.
* sysfs_rename_dir() and sysfs_move_dir() grab all dentries using
sysfs_get_dentry() on startup.
* Dentries for all shadowed directories are pinned in memory to serve
as lookup start point.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Some sysfs operations require dentry and inode. sysfs_get_dentry()
looks up and gets dentry for the specified sysfs_dirent. It finds the
first ancestor with dentry attached and starts looking up dentries
from there.
Looking up from the nearest ancestor is necessary to support shadowed
directories because we can't reliably lookup dentry for one of the
shadows. Dentries for each shadow will be pinned in memory such that
they can serve as the starting point for dentry lookup.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
After add/remove path restructuring, the only user of
sysfs_drop_dentry() is sysfs_addrm_finish(). Move sysfs_drop_dentry()
to dir.c and make it static.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
The original add/remove code had the following problems.
* parent's timestamps are updated on dentry instantiation. this is
incorrect with reclaimable files.
* updating parent's timestamps isn't synchronized.
* parent nlink update assumes the inode is accessible which won't be
true once directory dentries are made reclaimable.
This patch restructures add/remove paths to resolve the above
problems. Add/removal are done in the following steps.
1. sysfs_addrm_start() : acquire locks including sysfs_mutex and other
resources.
2-a. sysfs_add_one() : add new sd. linking the new sd into the
children list is caller's responsibility.
2-b. sysfs_remove_one() : remove a sd. unlinking the sd from the
children list is caller's responsibility.
3. sysfs_addrm_finish() : release all resources and clean up.
Steps 2-a and/or 2-b can be repeated multiple times.
Parent's inode is looked up during sysfs_addrm_start(). If available
(always at the moment), it's pinned and nlink is updated as sd's are
added and removed. Timestamps are updated during finish if any sd has
been added or removed. If parent's inode is not available during
start, sysfs_mutex ensures that parent inode is not created till
add/remove is complete.
All the complexity is contained inside the helper functions.
Especially, dentry/inode handling is properly hidden from the rest of
sysfs which now mostly operate on sysfs_dirents. As an added bonus,
codes which use these helpers to add and remove sysfs_dirents are now
more structured and simpler.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
As kobj sysfs dentries and inodes are gonna be made reclaimable,
i_mutex can't be used to protect sysfs_dirent tree. Use sysfs_mutex
globally instead. As the whole tree is protected with sysfs_mutex,
there is no reason to keep sysfs_rename_sem. Drop it.
While at it, add docbook comments to functions which require
sysfs_mutex locking.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Replace sysfs_lock and kobj_sysfs_assoc_lock with sysfs_assoc_lock.
sysfs_lock was originally to be used to protect sysfs_dirent tree but
mutex seems better choice, so there is no reason to keep sysfs_lock
separate. Merge the two spinlocks into one.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
As kobj sysfs dentries and inodes are gonna be made reclaimable,
dentry can't be used as naming token for sysfs file/directory, replace
kobj->dentry with kobj->sd. The only external interface change is
shadow directory handling. All other changes are contained in kobj
and sysfs.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Implement sysfs_find_dirent() and sysfs_get_dirent().
sysfs_dirent_exist() is replaced by sysfs_find_dirent(). These will
be used to make directory entries reclamiable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Implement SYSFS_FLAG_REMOVED flag which currently is used only to
improve sanity check in sysfs_deactivate(). The flag will be used to
make directory entries reclamiable.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Rename sysfs_dirent->s_type to s_flags, pack type into lower eight
bits and reserve the rest for flags. sysfs_type() can used to access
the type. All existing sd->s_type accesses are converted to use
sysfs_type(). While at it, type test is changed to equality test
instead of bit-and test where appropriate.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs_drop_dentry() used to go through sd->s_dentry and
sd->s_parent->s_dentry to access the inodes. This is incorrect
because inode can be cached without dentry.
This patch makes sysfs_drop_dentry() access inodes using ilookup() on
sd->s_ino. This is both correct and simpler.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Fix oops on x86_64 caused by the dereference of dir in
sysfs_drop_dentry() made before checking if dir is not NULL
(cf. http://marc.info/?l=linux-kernel&m=118151626704924&w=2).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Make sysfs_dirent use singly linked list for its tree structure.
sysfs_link_sibling() and sysfs_unlink_sibling() functions are added to
handle simpler cases. It adds some complexity and cpu cycle overhead
but reduced memory footprint is worthwhile on big machines.
This change reduces the sizeof sysfs_dirent from 104 to 88 on 64bit
and from 60 to 52 on 32bit.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Make sysfs_dirent->s_active an atomic_t instead of rwsem. This
reduces the size of sysfs_dirent from 136 to 104 on 64bit and from 76
to 60 on 32bit with lock debugging turned off. With lock debugging
turned on the reduction is much larger.
s_active starts at zero and each active reference increments s_active.
Putting a reference decrements s_active. Deactivation subtracts
SD_DEACTIVATED_BIAS which is currently INT_MIN and assumed to be small
enough to make s_active negative. If s_active is negative,
sysfs_get() no longer grants new references. Deactivation succeeds
immediately if there is no active user; otherwise, it waits using a
completion for the last put.
Due to the removal of lockdep tricks, this change makes things less
trickier in release_sysfs_dirent(). As all the complexity is
contained in three s_active functions, I think it's more readable this
way.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
These functions are about to receive more complexity and doesn't
really need to be inlined in the first place. Move them from
fs/sysfs/sysfs.h to fs/sysfs/dir.c.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
The root sysfs_dirent didn't point to the root dentry fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
After dentry is reclaimed, sysfs always used to allocate new dentry
and inode if the file is accessed again. This causes problem with
operations which only pin the inode. For example, if inotify watch is
added to a sysfs file and the dentry for the file is reclaimed, the
next update event creates new dentry and new inode making the inotify
watch miss all the events from there on.
This patch fixes it by using iget_locked() instead of new_inode().
sysfs_new_inode() is renamed to sysfs_get_inode() and inode is
initialized iff the inode is newly allocated. sysfs_instantiate() is
responsible for unlocking new inodes.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Reorganize/clean up sysfs_new_inode() and sysfs_create().
* sysfs_init_inode() is separated out from sysfs_new_inode() and is
responsible for basic initialization.
* sysfs_instantiate() replaces the last step of sysfs_create() and is
responsible for dentry instantitaion.
* type-specific initialization is moved out to the callers.
* mode is specified only once when creating a sysfs_dirent.
* spurious list_del_init(&sd->s_sibling) dropped from create_dir()
This change is to
* prepare for inode allocation fix.
* separate alloc and init code for synchronization update.
* make dentry/inode initialization more flexible for later changes.
This patch doesn't introduce visible behavior change.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Parent reference wasn't properly transferred during rename and move.
Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs_alloc_ino() isn't used out side of fs/sysfs/dir.c. Make it
static.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.
This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This patch reimplements sysfs_drop_dentry() such that remove_dir() can
use it to drop dentry instead of using a separate mechanism. With
this change, making directories reclaimable is much easier.
This patch used to contain fixes for two race conditions around
sd->s_dentry but that part has been separated out and included into
mainline early as commit 6aa054aadfea613a437ad0b15d38eca2b963fc0a and
dd14cbc994709a1c5a64ed3621f583c49a27e521.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Consolidate sd <-> dentry association into sysfs_attach_dentry() and
call it after dentry and inode are properly set up. This is in
preparation of sysfs_drop_dentry() updates.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute files. All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs: implement sysfs_dirent active reference and immediate disconnect
Opening a sysfs node references its associated kobject, so userland
can arbitrarily prolong lifetime of a kobject which complicates
lifetime rules in drivers. This patch implements active reference and
makes the association between kobject and sysfs immediately breakable.
Now each sysfs_dirent has two reference counts - s_count and s_active.
s_count is a regular reference count which guarantees that the
containing sysfs_dirent is accessible. As long as s_count reference
is held, all sysfs internal fields in sysfs_dirent are accessible
including s_parent and s_name.
The newly added s_active is active reference count. This is acquired
by invoking sysfs_get_active() and it's the caller's responsibility to
ensure sysfs_dirent itself is accessible (should be holding s_count
one way or the other). Dereferencing sysfs_dirent to access objects
out of sysfs proper requires active reference. This includes access
to the associated kobjects, attributes and ops.
The active references can be drained and denied by calling
sysfs_deactivate(). All active sysfs_dirents must be deactivated
after deletion but before the default reference is dropped. This
enables immediate disconnect of sysfs nodes. Once a sysfs_dirent is
deleted, it won't access any entity external to sysfs proper.
Because attr/bin_attr ops access both the node itself and its parent
for kobject, they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grabbing both
references. Parent's is acquired first and released last.
Unlike other operations, mmapped area lingers on after mmap() is
finished and the module implement implementing it and kobj need to
stay referenced till all the mapped pages are gone. This is
accomplished by holding one set of active references to the bin_attr
and its parent if there have been any mmap during lifetime of an
openfile. The references are dropped when the openfile is released.
This change makes sysfs lifetime rules independent from both kobject's
and module's. It not only fixes several race conditions caused by
sysfs not holding onto the proper module when referencing kobject, but
also helps fixing and simplifying lifetime management in driver model
and drivers by taking sysfs out of the equation.
Please read the following message for more info.
http://article.gmane.org/gmane.linux.kernel/510293
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Implement bin_buffer which contains a mutex and pointer to PAGE_SIZE
buffer to properly synchronize accesses to per-openfile buffer and
prepare for immediate-kobj-disconnect.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs symlink is implemented by referencing dentry and kobject from
sysfs_dirent - symlink entry references kobject, dentry is used to
walk the tree. This complicates object lifetimes rules and is
dangerous - for example, there is no way to tell to which module the
target of a symlink belongs and referencing that kobject can make it
linger after the module is gone.
This patch reimplements symlink using only sysfs_dirent tree. sd for
a symlink points and holds reference to the target sysfs_dirent and
all walking is done using sysfs_dirent tree. Simpler and safer.
Please read the following message for more info.
http://article.gmane.org/gmane.linux.kernel/510293
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
kobj->dentry can go away anytime unless the user controls when the
associated sysfs node is deleted. This patch implements
kobj_sysfs_assoc_lock which protects kobj->dentry. This will be used
to maintain kobj based API when converting sysfs to use sysfs_dirent
tree instead of dentry/kobject.
Note that this lock belongs to kobject/driver-model not sysfs. Once
sysfs is converted to not use kobject in its interface, this can be
removed from sysfs.
This is in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Make sd->s_element a union of sysfs_elem_{dir|symlink|attr|bin_attr}
and rename it to s_elem. This is to achieve...
* some level of type checking : changing symlink to point to
sysfs_dirent instead of kobject is much safer and less painful now.
* easier / standardized dereferencing
* allow sysfs_elem_* to contain more than one entry
Where possible, pointer is obtained by directly deferencing from sd
instead of going through other entities. This reduces dependencies to
dentry, inode and kobject. to_attr() and to_bin_attr() are unused now
and removed.
This is in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Add s_name to sysfs_dirent. This is to further reduce dependency to
the associated dentry. Name is copied for directories and symlinks
but not for attributes.
Where possible, name dereferences are converted to use sd->s_name.
sysfs_symlink->link_name and sysfs_get_name() are unused now and
removed.
This change allows symlink to be implemented using sysfs_dirent tree
proper, which is the last remaining dentry-dependent sysfs walk.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Add sysfs_dirent->s_parent. With this patch, each sd points to and
holds a reference to its parent. This allows walking sysfs tree
without referencing sd->s_dentry which can go away anytime if the user
doesn't control when it's deleted.
sd->s_parent is initialized and parent is referenced in
sysfs_attach_dirent(). Reference to parent is released when the sd is
released, so as long as reference to a sd is held, s_parent can be
followed.
dentry walk in sysfs_readdir() is convereted to s_parent walk.
This will be used to reimplement symlink such that it uses only
sysfs_dirent tree.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Currently there are four functions to create sysfs_dirent -
__sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and
sysfs_make_dirent(). Other than sysfs_make_dirent(), no function has
two users if calls to implement other functions are excluded.
This patch consolidates sysfs_dirent creation functions into the
following two.
* sysfs_new_dirent() : allocate and initialize
* sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or
associate with dentry
This simplifies interface and gives callers more flexibility. This is
in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Error handling in sysfs_rename_dir() was broken.
* When lookup_one_len() fails, 0 is returned.
* If parent inode check fails, returns with inode mutex and rename
rwsem held.
This patch fixes the above bugs and flattens error handling such that
it's more readable and easier to modify.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Flatten cleanup paths in sysfs_add_link() and create_dir() to improve
readability and ease further changes to these functions. This is in
preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Error handling in fs/sysfs/bin.c:write() was wrong because size_t
count is used to receive return value from flush_write() which is
negative on failure.
This patch updates write() such that int variable is used instead.
read() is updated the same way for consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Make sysfs_put() ignore NULL sd instead of oopsing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
sysfs used simple incrementing allocator which is not guaranteed to be
unique. This patch makes sysfs use ida to give each sd a unique and
packed inode number.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
There is no reason this function should be inlined and soon to follow
sysfs object reference simplification will make it heavier. Move it
to dir.c.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Implement debugfs_rename() to allow renaming files/directories in debugfs.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
I ran into a curious issue when a lock is being canceled. The
cancellation results in a lock request to the vfs layer instead of an
unlock request. This is particularly insidious when the process that
owns the lock is exiting. In that case, sometimes the erroneous lock is
applied AFTER the process has entered zombie state, preventing the lock
from ever being released. Eventually other processes block on the lock
causing a slow degredation of the system. In the 2.6.16 kernel this was
investigated on, the problem is compounded by the fact that the cl_sem
is held while blocking on the vfs lock, which results in most processes
accessing the nfs file system in question hanging.
In more detail, here is how the situation occurs:
first _nfs4_do_setlk():
static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *fl, int reclaim)
...
ret = nfs4_wait_for_completion_rpc_task(task);
if (ret == 0) {
...
} else
data->cancelled = 1;
then nfs4_lock_release():
static void nfs4_lock_release(void *calldata)
...
if (data->cancelled != 0) {
struct rpc_task *task;
task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
data->arg.lock_seqid);
The problem is the same file_lock that was passed in to _nfs4_do_setlk()
gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is
still F_RDLCK or FWRLCK, not F_UNLCK. At some point, when cancelling the
lock, the type needs to be changed to F_UNLCK. It seemed easiest to do
that in nfs4_do_unlck(), but it could be done in nfs4_lock_release().
The concern I had with doing it there was if something still needed the
original file_lock, though it turns out the original file_lock still
needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the
original file_lock to pass to the vfs layer, and a copy of the original
file_lock for the RPC request.
It seems like the simplest solution is to force all situations where
nfs4_do_unlck() is being used to result in an unlock, so with that in
mind, I made the following change:
Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Use the destination address of the original NLM request as the
source address in callbacks to the client.
Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Consider the case where the user has mounted the remote filesystem
server:/foo on the two local directories /bar and /baz using the
nosharedcache mount option. The files /bar/file and /baz/file are
represented by different inodes in the local namespace, but refer to the
same file /foo/file on the server.
Consider the case where a process opens both /bar/file and /baz/file, then
closes /bar/file: because the nfs4_state is not shared between /bar/file
and /baz/file, the kernel will see that the nfs4_state for /bar/file is no
longer referenced, so it will send off a CLOSE rpc call. Unless the
open_owners differ, then that CLOSE call will invalidate the open state on
/baz/file too.
Conclusion: we cannot share open state owners between two different
non-shared mount instances of the same filesystem.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should
return EBUSY if the filesystem is already mounted on a superblock that
has set conflicting mount options.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Prior to David Howell's mount changes in 2.6.18, users who mounted
different directories which happened to be from the same filesystem on the
server would get different super blocks, and hence could choose different
mount options. As long as there were no hard linked files that crossed from
one subtree to another, this was quite safe.
Post the changes, if the two directories are on the same filesystem (have
the same 'fsid'), they will share the same super block, and hence the same
mount options.
Add a flag to allow users to elect not to share the NFS super block with
another mount point, even if the fsids are the same. This will allow
users to set different mount options for the two different super blocks, as
was previously possible. It is still up to the user to ensure that there
are no cache coherency issues when doing this, however the default
behaviour will be to share super blocks whenever two paths result in
the same fsid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|