Age | Commit message (Collapse) | Author | Files | Lines |
|
Clean up.
For consistency, rewrite the IPv4 check to match the same style as the
new IPv6 check. Note that ipv4_is_loopback() is somewhat broader in
its interpretation of what is a loopback address than simply
"127.0.0.1".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Commit b85e4676 added the nlm_privileged_requester() helper to check
whether an RPC request was sent from a local privileged caller. It
recognizes IPv4 privileged callers (from "127.0.0.1"), and IPv6
privileged callers (from "::1").
However, IPV6_ADDR_LOOPBACK is not set for the mapped IPv4 loopback
address (::ffff:7f00:0001), so the test breaks when the kernel's RPC
service is IPv6-enabled but user space is calling via the IPv4
loopback address. This is actually the most common case for IPv6-
enabled RPC services on Linux.
Rewrite the IPv6 check to handle the mapped IPv4 loopback address as
well as a normal IPv6 loopback address.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: nsm_addr_in() is no longer used, and nsm_addr() is used only in
fs/lockd/mon.c, so move it there.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: The include/linux/lockd/sm_inter.h header is nearly empty
now. Remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: nsm_find() now has only one caller, and that caller
unconditionally sets the @create argument. Thus the @create
argument is no longer needed.
Since nsm_find() now has a more specific purpose, pick a more
appropriate name for it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Introduce a new API to fs/lockd/mon.c that allows nlm_host_rebooted()
to lookup up nsm_handles via the contents of an nlm_reboot struct.
The new function is equivalent to calling nsm_find() with @create set
to zero, but it takes a struct nlm_reboot instead of separate
arguments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The NLM XDR decoders for the NLMPROC_SM_NOTIFY procedure should treat
their "priv" argument truly as an opaque, as defined by the protocol,
and let the upper layers figure out what is in it.
This will make it easier to modify the contents and interpretation of
the "priv" argument, and keep knowledge about what's in "priv" local
to fs/lockd/mon.c.
For now, the NLM and NSM implementations should behave exactly as they
did before.
The formation of the address of the rebooted host in
nlm_host_rebooted() may look a little strange, but it is the inverse
of how nsm_init_private() forms the private cookie. Plus, it's
going away soon anyway.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Pass the nlm_reboot data structure directly from the NLMPROC_SM_NOTIFY
XDR decoders to nlm_host_rebooted(). This eliminates some packing and
unpacking of the NLMPROC_SM_NOTIFY results, and prepares for passing
these results, including the "priv" cookie, directly to a lookup
routine in fs/lockd/mon.c.
This patch changes code organization but should not cause any
behavioral change.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Introduce a new data type, used by both the in-kernel NLM and NSM
implementations, that is used to manage the opaque "priv" argument
for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls.
Construct the "priv" cookie when the nsm_handle is created.
The nsm_init_private() function may look a little strange, but it is
roughly equivalent to how the XDR encoder formed the "priv" argument.
It's going to go away soon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The nsm_find() function sets up fresh nsm_handle entries. This is
where we will store the "priv" cookie used to lookup nsm_handles during
reboot recovery. The cookie will be constructed when nsm_find()
creates a new nsm_handle.
As much as possible, I would like to keep everything that handles a
"priv" cookie in fs/lockd/mon.c so that all the smarts are in one
source file. That organization should make it pretty simple to see how
all this works.
To me, it makes more sense than the current arrangement to keep
nsm_find() with nsm_monitor() and nsm_unmonitor().
So, start reorganizing by moving nsm_find() into fs/lockd/mon.c. The
nsm_release() function comes along too, since it shares the nsm_lock
global variable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: Move the RPC program and procedure numbers for NSM into the
one source file that needs them: fs/lockd/mon.c.
And, as with NLM, NFS, and rpcbind calls, use NSMPROC_FOO instead of
SM_FOO for NSM procedure numbers.
Finally, make a couple of comments more precise: what is referred to
here as SM_NOTIFY is really the NLM (lockd) NLMPROC_SM_NOTIFY downcall,
not NSMPROC_NOTIFY.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: NSM's XDR data structures are used only in fs/lockd/mon.c,
so move them there.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up.
Make the nlm_host argument "const," and move the public declaration to
lockd.h. Add a documenting comment.
Bruce observed that nsm_unmonitor()'s only caller doesn't care about
its return code, so make nsm_unmonitor() return void.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The nsm_handle's reference count is bumped in nlm_lookup_host(). It
should be decremented in nlm_destroy_host() to make it easier to see
the balance of these two operations.
Move the nsm_release() call to fs/lockd/host.c.
The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared.
The nlm_destroy_host() function is never called for the same nlm_host
twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is
called.
All references to the nlm_host are gone before it is freed. We can
skip making h_nsmhandle NULL just before the nlm_host is deallocated.
It's also likely we can remove the h_nsmhandle NULL check in
nlmsvc_is_client() as well, but we can do that later when rearchitect-
ing the nlm_host cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up.
Make the nlm_host argument "const," and move the public declaration to
lockd.h with other NSM public function (nsm_release, eg) and global
variable declarations.
Add a documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls
is a string that contains the hostname or IP address of the remote peer
to be notified when this host has rebooted. The sm-notify command uses
this identifier to contact the peer when we reboot, so it must be
either a well-qualified DNS hostname or a presentation format IP
address string.
When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM
provides a presentation format IP address in the "mon_name" argument.
Otherwise, the "caller_name" argument from NLM requests is used,
which is usually just the DNS hostname of the peer.
To support IPv6 addresses for the mon_name argument, we use the
nsm_handle's address eye-catcher, which already contains an appropriate
presentation format address string. Using the eye-catcher string
obviates the need to use a large buffer on the stack to form the
presentation address string for the upcall.
This patch also addresses a subtle bug.
An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the
same peer are required to use the same value for the "mon_name"
argument. Otherwise, rpc.statd's NSMPROC_UNMON processing cannot
locate the database entry for that peer and remove it.
If the setting of nsm_use_hostnames is changed between the time the
kernel sends an NSMPROC_MON request and the time it sends the
NSMPROC_UNMON request for the same peer, the "mon_name" argument for
these two requests may not be the same. This is because the value of
"mon_name" is currently chosen at the moment the call is made based on
the setting of nsm_use_hostnames
To ensure both requests pass identical contents in the "mon_name"
argument, we now select which string to use for the argument in the
nsm_monitor() function. A pointer to this string is saved in the
nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall.
NB: There are other potential problems, such as how nlm_host_rebooted()
might behave if nsm_use_hostnames were changed while hosts are still
being monitored. This patch does not attempt to address those
problems.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: I'm about to add another "char *" field to the nsm_handle
structure. The sm_name field uses an older style of declaring a
"char *" field. If I match that style for the new field, checkpatch.pl
will complain.
So, fix the sm_name field to use the new style.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Scope ID support is needed since the kernel's NSM implementation is
about to use these displayed addresses as a mon_name in some cases.
When nsm_use_hostnames is zero, without scope ID support NSM will fail
to handle peers that contact us via a link-local address. Link-local
addresses do not work without an interface ID, which is stored in the
sockaddr's sin6_scope_id field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
The h_name field in struct nlm_host is a just copy of
h_nsmhandle->sm_name. Likewise, the contents of the h_addrbuf field
should be identical to the sm_addrbuf field.
The h_srcaddrbuf field is used only in one place for debugging. We can
live without this until we get %pI formatting for printk().
Currently these buffers are 48 bytes, but we need to support scope IDs
in IPv6 presentation addresses, which means making the buffers even
larger. Instead, let's find ways to eliminate them to save space.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Clean up: I'm about to add another "char *" field to the nlm_host
structure. The h_name field, for example, uses an older style of
declaring a "char *" field. If I match that style for the new field,
checkpatch.pl will complain.
So, fix pointer fields to use the new style.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:
(serv->sv_nrthreads + 3) * 20
Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.
Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.
This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
Descriptions taken from mountd code (in nfs-utils/utils/mountd/cache.c).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm snapshot: extend exception store functions
dm snapshot: split out exception store implementations
dm snapshot: rename struct exception_store
dm snapshot: separate out exception store interface
dm mpath: move trigger_event to system workqueue
dm: add name and uuid to sysfs
dm table: rework reference counting
dm: support barriers on simple devices
dm request: extend target interface
dm request: add caches
dm ioctl: allow dm_copy_name_and_uuid to return only one field
dm log: ensure log bitmap fits on log device
dm log: move region_size validation
dm log: avoid reinitialising io_req on every operation
dm: consolidate target deregistration error handling
dm raid1: fix error count
dm log: fix dm_io_client leak on error paths
dm snapshot: change yield to msleep
dm table: drop reference at unbind
|
|
Implement barrier support for single device DM devices
This patch implements barrier support in DM for the common case of dm linear
just remapping a single underlying device. In this case we can safely
pass the barrier through because there can be no reordering between
devices.
NB. Any DM device might cease to support barriers if it gets
reconfigured so code must continue to allow for a possible
-EOPNOTSUPP on every barrier bio submitted. - agk
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
This patch adds the following target interfaces for request-based dm.
map_rq : for mapping a request
rq_end_io : for finishing a request
busy : for avoiding performance regression from bio-based dm.
Target can tell dm core not to map requests now, and
that may help requests in the block layer queue to be
bigger by I/O merging.
In bio-based dm, this behavior is done by device
drivers managing the block layer queue.
But in request-based dm, dm core has to do that
since dm core manages the block layer queue.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
Change dm_unregister_target to return void and use BUG() for error
reporting.
dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.
This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.
This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
|
* 'for-next' of git://git.o-hand.com/linux-mfd: (30 commits)
mfd: Fix section mismatch in da903x
mfd: move drivers/i2c/chips/menelaus.c to drivers/mfd
mfd: move drivers/i2c/chips/tps65010.c to drivers/mfd
mfd: dm355evm msp430 driver
mfd: Add missing break from wm3850-core
mfd: Add WM8351 support
mfd: Support configurable numbers of DCDCs and ISINKs on WM8350
mfd: Handle missing WM8350 platform data
mfd: Add WM8352 support
mfd: Use irq_to_desc in twl4030 code
power_supply: Add Dialog DA9030 battery charger driver
mfd: Dialog DA9030 battery charger MFD driver
mfd: Register WM8400 codec device
mfd: Pass driver_data onto child devices
mfd: Fix twl4030-core.c build error
mfd: twl4030 regulator bug fixes
mfd: twl4030: create some regulator devices
mfd: twl4030: cleanup symbols and OMAP dependency
mfd: twl4030: simplified child creation code
power_supply: Add battery health reporting for WM8350
...
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
module: convert to stop_machine_create/destroy.
stop_machine: introduce stop_machine_create/destroy.
parisc: fix module loading failure of large kernel modules
module: fix module loading failure of large kernel modules for parisc
module: fix warning of unused function when !CONFIG_PROC_FS
kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
remove CONFIG_KMOD
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
swiotlb: add missing __init annotations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (22 commits)
HID: fix error condition propagation in hid-sony driver
HID: fix reference count leak hidraw
HID: add proper support for pensketch 12x9 tablet
HID: don't allow DealExtreme usb-radio be handled by usb hid driver
HID: fix default Kconfig setting for TopSpeed driver
HID: driver for TopSeed Cyberlink quirky remote
HID: make boot protocol drivers depend on EMBEDDED
HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER
HID: hiddev cleanup -- handle all error conditions properly
HID: force feedback driver for GreenAsia 0x12 PID
HID: switch specialized drivers from "default y" to !EMBEDDED
HID: set proper dev.parent in hidraw
HID: add dynids facility
HID: use GFP_KERNEL in hid_alloc_buffers
HID: usbhid, use usb_endpoint_xfer_int
HID: move usbhid flags to usbhid.h
HID: add n-trig digitizer support
HID: add phys and name ioctls to hidraw
HID: struct device - replace bus_id with dev_name(), dev_set_name()
HID: automatically call usbhid_set_leds in usbhid driver
...
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits)
GFS2: Use DEFINE_SPINLOCK
GFS2: Fix use-after-free bug on umount (try #2)
Revert "GFS2: Fix use-after-free bug on umount"
GFS2: Streamline alloc calculations for writes
GFS2: Send useful information with uevent messages
GFS2: Fix use-after-free bug on umount
GFS2: Remove ancient, unused code
GFS2: Move four functions from super.c
GFS2: Fix bug in gfs2_lock_fs_check_clean()
GFS2: Send some sensible sysfs stuff
GFS2: Kill two daemons with one patch
GFS2: Move gfs2_recoverd into recovery.c
GFS2: Fix "truncate in progress" hang
GFS2: Clean up & move gfs2_quotad
GFS2: Add more detail to debugfs glock dumps
GFS2: Banish struct gfs2_rgrpd_host
GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
GFS2: Move rg_igeneration into struct gfs2_rgrpd
GFS2: Banish struct gfs2_dinode_host
GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
...
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
qlge: Fix sparse warnings for tx ring indexes.
qlge: Fix sparse warning regarding rx buffer queues.
qlge: Fix sparse endian warning in ql_hw_csum_setup().
qlge: Fix sparse endian warning for inbound packet control block flags.
qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
myri10ge: print MAC and serial number on probe failure
pkt_sched: cls_u32: Fix locking in u32_change()
iucv: fix cpu hotplug
af_iucv: Free iucv path/socket in path_pending callback
af_iucv: avoid left over IUCV connections from failing connects
af_iucv: New error return codes for connect()
net/ehea: bitops work on unsigned longs
Revert "net: Fix for initial link state in 2.6.28"
tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
tcp: don't mask EOF and socket errors on nonblocking splice receive
dccp: Integrate the TFRC library with DCCP
dccp: Clean up ccid.c after integration of CCID plugins
dccp: Lockless integration of CCID congestion-control plugins
qeth: get rid of extra argument after printk to dev_* conversion
qeth: No large send using EDDP for HiperSockets.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix on resume, now preserves user policy min/max.
[CPUFREQ] Add Celeron Core support to p4-clockmod.
[CPUFREQ] add to speedstep-lib additional fsb values for core processors
[CPUFREQ] Disable sysfs ui for p4-clockmod.
[CPUFREQ] p4-clockmod: reduce noise
[CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits)
ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.
ocfs2: use min_t in ocfs2_quota_read()
ocfs2: remove unneeded lvb casts
ocfs2: Add xattr support checking in init_security
ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle
ocfs2: calculate and reserve credits for xattr value in mknod
ocfs2/xattr: fix credits calculation during index create
ocfs2/xattr: Always updating ctime during xattr set.
ocfs2/xattr: Remove extend_trans call and add its credits from the beginning
ocfs2/dlm: Fix race during lockres mastery
ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list
ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating
ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler()
ocfs2/dlm: Fix a race between migrate request and exit domain
ocfs2: One more hamming code optimization.
ocfs2: Another hamming code optimization.
ocfs2: Don't hand-code xor in ocfs2_hamming_encode().
ocfs2: Enable metadata checksums.
ocfs2: Validate superblock with checksum and ecc.
ocfs2: Checksum and ECC for directory blocks.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
inotify: fix type errors in interfaces
fix breakage in reiserfs_new_inode()
fix the treatment of jfs special inodes
vfs: remove duplicate code in get_fs_type()
add a vfs_fsync helper
sys_execve and sys_uselib do not call into fsnotify
zero i_uid/i_gid on inode allocation
inode->i_op is never NULL
ntfs: don't NULL i_op
isofs check for NULL ->i_op in root directory is dead code
affs: do not zero ->i_op
kill suid bit only for regular files
vfs: lseek(fd, 0, SEEK_CUR) race condition
|
|
An XFS workload showed up a bug in the lockless pagecache patch. Basically it
would go into an "infinite" loop, although it would sometimes be able to break
out of the loop! The reason is a missing compiler barrier in the "increment
reference count unless it was zero" case of the lockless pagecache protocol in
the gang lookup functions.
This would cause the compiler to use a cached value of struct page pointer to
retry the operation with, rather than reload it. So the page might have been
removed from pagecache and freed (refcount==0) but the lookup would not correctly
notice the page is no longer in pagecache, and keep attempting to increment the
refcount and failing, until the page gets reallocated for something else. This
isn't a data corruption because the condition will be detected if the page has
been reallocated. However it can result in a lockup.
Linus points out that ACCESS_ONCE is also required in that pointer load, even
if it's absence is not causing a bug on our particular build. The most general
way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
Assembly of find_get_pages,
before:
.L220:
movq (%rbx), %rax #* ivtmp.1162, tmp82
movq (%rax), %rdi #, prephitmp.1149
.L218:
testb $1, %dil #, prephitmp.1149
jne .L217 #,
testq %rdi, %rdi # prephitmp.1149
je .L203 #,
cmpq $-1, %rdi #, prephitmp.1149
je .L217 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L218 #,
after:
.L212:
movq (%rbx), %rax #* ivtmp.1109, tmp81
movq (%rax), %rdi #, ret
testb $1, %dil #, ret
jne .L211 #,
testq %rdi, %rdi # ret
je .L197 #,
cmpq $-1, %rdi #, ret
je .L211 #,
movl 8(%rdi), %esi # <variable>._count.counter, c
testl %esi, %esi # c
je .L212 #,
(notice the obvious infinite loop in the first example, if page->count remains 0)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.
For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
currently 'u32', it should be '__s32' . That is Robert's suggestion, and
is consistent with the other declarations of watch descriptors in the
kernel source, in particular, the inotify_event structure in
include/linux/inotify.h:
struct inotify_event {
__s32 wd; /* watch descriptor */
__u32 mask; /* watch mask */
__u32 cookie; /* cookie to synchronize two events */
__u32 len; /* length (including nulls) of name */
char name[0]; /* stub for possible name */
};
The patch makes the changes needed for inotify_rm_watch().
Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Robert Love <rlove@google.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Fsync currently has a fdatawrite/fdatawait pair around the method call,
and a mutex_lock/unlock of the inode mutex. All callers of fsync have
to duplicate this, but we have a few and most of them don't quite get
it right. This patch adds a new vfs_fsync that takes care of this.
It's a little more complicated as usual as ->fsync might get a NULL file
pointer and just a dentry from nfsd, but otherwise gets afile and we
want to take the mapping and file operations from it when it is there.
Notes on the fsync callers:
- ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
lower file
- coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
file, and returning 0 when ->fsync was missing
- shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor
taking i_mutex. Now given that shared memory doesn't have disk
backing not doing anything in fsync seems fine and I left it out of
the vfs_fsync conversion for now, but in that case we might just
not pass it through to the lower file at all but just call the no-op
simple_sync_file directly.
[and now actually export vfs_fsync]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Filesystems often to do compute intensive operation on some
metadata. If this operation is repeated many times, it can be very
expensive. It would be much nicer if the operation could be performed
once before a buffer goes to disk.
This adds triggers to jbd2 buffer heads. Just before writing a metadata
buffer to the journal, jbd2 will optionally call a commit trigger associated
with the buffer. If the journal is aborted, an abort trigger will be
called on any dirty buffers as they are dropped from pending
transactions.
ocfs2 will use this feature.
Initially I tried to come up with a more generic trigger that could be
used for non-buffer-related events like transaction completion. It
doesn't tie nicely, because the information a buffer trigger needs
(specific to a journal_head) isn't the same as what a transaction
trigger needs (specific to a tranaction_t or perhaps journal_t). So I
implemented a buffer set, with the understanding that
journal/transaction wide triggers should be implemented separately.
There is only one trigger set allowed per buffer. I can't think of any
reason to attach more than one set. Contrast this with a journal or
transaction in which multiple places may want to watch the entire
transaction separately.
The trigger sets are considered static allocation from the jbd2
perspective. ocfs2 will just have one trigger set per block type,
setting the same set on every bh of the same type.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
These are default functions for creating and destroying quota structures
and they should be used from filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Unexport header files dqblk_v[12].h since except for quota format ID they
don't contain information userspace should be interested in. Move ID
definitions to quota.h.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Add this so that file systems using JBD2 can safely allocate unused b_state
bits.
In this case, we add it so that Ocfs2 can define a single bit for tracking
the validation state of a buffer.
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
OCFS2 needs to scan all active dquots once in a while and sync quota
information among cluster nodes. Provide a helper function for it so
that it does not have to reimplement internally a list which VFS
already has. Moreover this function is probably going to be useful
for other clustered filesystems if they decide to use VFS quotas.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
recovery
OCFS2 needs to peek whether quota structure is already in memory so
that it can avoid expensive cluster locking in that case. Similarly
when freeing dquots, it checks whether it is the last quota structure
user or not. Finally, it needs to get reference to dquot structure for
specified id and quota type when recovering quota file after crash.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Increase reported version number of quota support since quota core has changed
significantly. Also remove __DQUOT_NUM_VERSION__ since nobody uses it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Quota in a clustered environment needs to synchronize quota information
among cluster nodes. This means we have to occasionally update some
information in dquot from disk / network. On the other hand we have to
be careful not to overwrite changes administrator did via SETQUOTA.
So indicate in dquot->dq_flags which entries have been set by SETQUOTA
and quota format can clear these flags when it properly propagated
the changes.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
For clustered filesystems, it can happen that space / inode usage goes
negative temporarily (because some node is allocating another node
is freeing and they are not completely in sync). So let quota code
allow this and change qsize_t so a signed type so that we don't
underflow the variables.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Coming quota support for OCFS2 is going to need quite a bit
of additional per-sb quota information. Moreover having fs.h
include all the types needed for this structure would be a
pain in the a**. So remove the union from mem_dqinfo and add
a private pointer for filesystem's use.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
There is going to be a new version of quota format having 64-bit
quota limits and a new quota format for OCFS2. They are both
going to use the same tree structure as VFSv0 quota format. So
split out tree handling into a separate file and make size of
leaf blocks, amount of space usable in each block (needed for
checksumming) and structures contained in them configurable
so that the code can be shared.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
|
Since these include files are used only by implementation of quota formats,
there's no need to have them in include/linux/.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|