Age | Commit message | Author | Files | Lines
2014-12-18 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 114 | -16095/+3551
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware-assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-12-18 | KVM: PPC: E500: Compile fix in this_cpu_write | Alexander Graf | 1 | -1/+1
Commit 69111bac42f5 ("powerpc: Replace __get_cpu_var uses") introduced compile breakage to the e500 target by introducing invalid automatically created C syntax. Fix up the breakage and make the code compile again. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-18 | mnt: Fix a memory stomp in umount | Eric W. Biederman | 1 | -0/+2
While reviewing the code of umount_tree I realized that when we append to a preexisting unmounted list we do not change pprev of the former first item in the list. Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on the former first item of the list will stomp unmounted.first leaving it set to some random mount point which we are likely to free soon. This isn't likely to hit, but if it does I don't know how anyone could track it down. [ This happened because we don't have all the same operations for hlist's as we do for normal doubly-linked lists. In particular, list_splice() is easy on our standard doubly-linked lists, while hlist_splice() doesn't exist and needs both start/end entries of the hlist. And commit 38129a13e6e7 incorrectly open-coded that missing hlist_splice(). We should think about making these kinds of "mindless" conversions easier to get right by adding the missing hlist helpers - Linus ] Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
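The pprev bookkeeping that the open-coded splice missed can be reproduced with a small Python model. This is a sketch, not kernel code: `Ref` stands in for a C `struct hlist_node **`, and `splice_front()` and `demo()` are hypothetical helpers that mimic appending a collected chain to the front of the unmounted list. With `fixed=False`, the former first entry keeps a pprev that still points at `unmounted.first`, so a later hlist_del_init() on it overwrites the list head.

```python
class Ref:
    """Stands in for a C 'struct hlist_node **': names one pointer field."""

    def __init__(self, obj, field):
        self.obj, self.field = obj, field

    def set(self, value):
        setattr(self.obj, self.field, value)


class HlistHead:
    def __init__(self):
        self.first = None


class HlistNode:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.pprev = None  # Ref to the pointer that currently points at us


def hlist_add_head(n, h):
    n.next = h.first
    if h.first is not None:
        h.first.pprev = Ref(n, "next")
    h.first = n
    n.pprev = Ref(h, "first")


def hlist_del_init(n):
    if n.pprev is None:  # already unhashed
        return
    n.pprev.set(n.next)  # unlink: *pprev = next
    if n.next is not None:
        n.next.pprev = n.pprev
    n.next = n.pprev = None


def splice_front(first, last, dst, fixed=True):
    """Hypothetical helper: move the chain first..last to the front of dst."""
    last.next = dst.first
    if fixed and last.next is not None:
        last.next.pprev = Ref(last, "next")  # the update the fix adds
    dst.first = first
    first.pprev = Ref(dst, "first")


def demo(fixed):
    unmounted, tmp = HlistHead(), HlistHead()
    a, b = HlistNode("a"), HlistNode("b")
    hlist_add_head(a, unmounted)  # entry already on the unmounted list
    hlist_add_head(b, tmp)        # newly collected entry
    splice_front(tmp.first, b, unmounted, fixed)
    hlist_del_init(a)             # later removal of the former first entry
    return unmounted.first


print(demo(fixed=True).name)  # b
print(demo(fixed=False))      # None: unmounted.first was stomped
```

With a single entry after the splice point the stomped head becomes None; in the kernel it would instead be left pointing at whatever mount followed the removed one.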
2014-12-18 | Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into HEAD | Paolo Bonzini | 26 | -1093/+870
Patch queue for ppc - 2014-12-18 Highlights this time around: - Removal of HV support for 970. It became a maintenance burden and received practically no testing. POWER8 with HV is available now, so just grab one of those boxes if PR isn't enough for you. - Some bug fixes and performance improvements - Tracepoints for book3s_hv
2014-12-18 | KVM: move APIC types to arch/x86/ | Paolo Bonzini | 3 | -27/+27
They are not used anymore by IA64, move them away. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-17 | Ceph: remove left-over reject file | Linus Torvalds | 1 | -10/+0
Neither Sage nor I noticed that Zheng Yan had mistakenly committed fs/ceph/super.h.rej as part of commit 31c542a199d7 ("ceph: add inline data to pagecache"). Remove it. Requested-by: Yan, Zheng <ukernel@gmail.com> Cc: Sage Weil <sweil@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds | 29 | -180/+992
Pull ceph updates from Sage Weil: "The big item here is support for inline data for CephFS and for message signatures from Zheng. There are also several bug fixes, including interrupted flock request handling, 0-length xattrs, mksnap, cached readdir results, and a message version compat field. Finally there are several cleanups from Ilya, Dan, and Markus. Note that there is another series coming soon that fixes some bugs in the RBD 'lingering' requests, but it isn't quite ready yet" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) ceph: fix setting empty extended attribute ceph: fix mksnap crash ceph: do_sync is never initialized libceph: fixup includes in pagelist.h ceph: support inline data feature ceph: flush inline version ceph: convert inline data to normal data before data write ceph: sync read inline data ceph: fetch inline data when getting Fcr cap refs ceph: use getattr request to fetch inline data ceph: add inline data to pagecache ceph: parse inline data in MClientReply and MClientCaps libceph: specify position of extent operation libceph: add CREATE osd operation support libceph: add SETXATTR/CMPXATTR osd operations support rbd: don't treat CEPH_OSD_OP_DELETE as extent op ceph: remove unused stringification macros libceph: require cephx message signature by default ceph: introduce global empty snap context ceph: message versioning fixes ...
2014-12-17 | KVM: PPC: Book3S: Enable in-kernel XICS emulation by default | Anton Blanchard | 1 | -0/+1
The in-kernel XICS emulation is faster than doing it all in QEMU and it has got a lot of testing, so enable it by default. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace | Linus Torvalds | 11 | -55/+374
Pull user namespace related fixes from Eric Biederman: "As these are bug fixes almost all of these changes are marked for backporting to stable. The first change (implicitly adding MNT_NODEV on remount) addresses a regression that was created when security issues with unprivileged remount were closed. I go on to update the remount test to make it easy to detect if this issue reoccurs. Then there are a handful of mount and umount related fixes. Then half of the changes deal with a recently discovered design bug in the permission checks of gid_map. Unix since the beginning has allowed setting group permissions on files to less than the user and other permissions (aka ---rwx---rwx). As the unix permission checks stop as soon as a group matches, and setgroups allows setting groups that can not later be dropped, the result is a situation where it is possible to legitimately use a group to assign fewer privileges to a process. This means dropping a group can increase a process's privileges. The fix I have adopted is that gid_map is now no longer writable without privilege unless the new file /proc/self/setgroups has been set to permanently disable setgroups. The bulk of applications that use user namespaces, even those that use them without privilege, remain unaffected by this change. Unfortunately this fix breaks a couple of user space applications that were relying on the problematic behavior (one of which was tools/selftests/mount/unprivileged-remount-test.c). To hopefully prevent needing a regression fix on top of my security fix I rounded up folks who work with the container implementations most likely to be affected and encouraged them to test the changes. > So far nothing broke on my libvirt-lxc test bed. :-) > Tested with openSUSE 13.2 and libvirt 1.2.9. > Tested-by: Richard Weinberger <richard@nod.at> > Tested on Fedora20 with libvirt 1.2.11, works fine.
> Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Unbreak the unprivileged remount tests userns; Correct the comment in map_write userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks mnt: Clear mnt_expire during pivot_root mnt: Carefully set CL_UNPRIVILEGED in clone_mnt mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. umount: Do not allow unmounting rootfs. umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
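The privilege problem behind the gid_map change comes from the classic Unix permission check choosing exactly one class (owner, group, or other) and never falling through to the next. The Python sketch below models only the read bit; `may_read()` and the uid/gid values are hypothetical, and a real kernel also consults capabilities and ACLs. It shows that, for a file with mode rwx---rwx, a process that drops its supplementary group gains read access.

```python
import stat

RESTRICTED_GID = 1000  # hypothetical group used to take access away


def may_read(mode, file_uid, file_gid, uid, groups):
    """Classic Unix check: the first matching class decides, with no fallback."""
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)
    if file_gid in groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


mode = 0o707  # rwx---rwx: the group gets less than owner and other
print(may_read(mode, 0, RESTRICTED_GID, 2000, {RESTRICTED_GID}))  # False
print(may_read(mode, 0, RESTRICTED_GID, 2000, set()))             # True
```

This is why an unprivileged process must not be able to shed groups with setgroups() inside a new user namespace. Under the new rules it first writes "deny" to /proc/self/setgroups, and only then may it write gid_map without privilege.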
2014-12-17 | mmu_gather: fix over-eager tlb_flush_mmu_free() calling | Linus Torvalds | 1 | -3/+3
Dave Hansen reports that commit fb7332a9fedf ("mmu_gather: move minimal range calculations into generic code") caused a performance problem: "tlb_finish_mmu() goes up about 9x in the profiles (~0.4%->3.6%) and tlb_flush_mmu_free() takes about 3.1% of CPU time with the patch applied, but does not show up at all on the commit before" and the reason is that Will moved the test for whether we need to flush from tlb_flush_mmu() into tlb_flush_mmu_tlbonly(). But that meant that tlb_flush_mmu_free() basically lost that check. Move it back into tlb_flush_mmu() where it belongs, so that it covers both tlb_flush_mmu_tlbonly() _and_ tlb_flush_mmu_free(). Reported-and-tested-by: Dave Hansen <dave@sr71.net> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
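The whole fix is about where the "anything gathered?" test sits. The following Python model is a sketch: it keeps only the `end` field and counts calls, and the `_old` functions are hypothetical reconstructions of the ordering described above, not the actual kernel source. Flushing repeatedly with nothing gathered shows that the old layout still ran the free step every time.

```python
class MmuGather:
    """Tiny stand-in for struct mmu_gather: end == 0 means nothing gathered."""

    def __init__(self):
        self.end = 0
        self.tlb_flushes = 0
        self.free_calls = 0


def tlb_flush_mmu_free(tlb):
    tlb.free_calls += 1  # in the kernel this walks and frees batched pages


def tlb_flush_mmu_tlbonly_old(tlb):
    if not tlb.end:  # before the fix the check lived here ...
        return
    tlb.tlb_flushes += 1
    tlb.end = 0


def tlb_flush_mmu_old(tlb):
    tlb_flush_mmu_tlbonly_old(tlb)
    tlb_flush_mmu_free(tlb)  # ... so this ran unconditionally


def tlb_flush_mmu_tlbonly(tlb):
    tlb.tlb_flushes += 1
    tlb.end = 0  # reset the gathered range


def tlb_flush_mmu(tlb):
    if not tlb.end:  # the fix: one check guarding both steps
        return
    tlb_flush_mmu_tlbonly(tlb)
    tlb_flush_mmu_free(tlb)


old, new = MmuGather(), MmuGather()
for _ in range(1000):  # flushes with an empty range
    tlb_flush_mmu_old(old)
    tlb_flush_mmu(new)
print(old.free_calls, new.free_calls)  # 1000 0
```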
2014-12-17 | x86: mm: fix VM_FAULT_RETRY handling | Linus Torvalds | 1 | -1/+1
My commit 26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling") had a really stupid typo: the FAULT_FLAG_USER bit is in the 'flags' variable, not the 'fault' variable. Duh. The one silver lining in this is that Dave finding this at least confirms that trinity actually triggers this special path easily, in a way normal use does not. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-17 | Merge tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio | Linus Torvalds | 4 | -3/+19
Pull VFIO updates from Alex Williamson: - s390 support (Frank Blaschka) - Enable iommu-type1 for ARM SMMU (Will Deacon) * tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio: drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU vfio: make vfio run on s390
2014-12-17 | Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux | Linus Torvalds | 3 | -2/+58
Pull virtio updates from Rusty Russell: "A balloon enhancement, and a minor race-on-module-unload theoretical bug which doesn't merit cc: stable. All the exciting stuff went via MST this cycle" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: virtio_balloon: free some memory from balloon on OOM virtio_balloon: return the amount of freed memory from leak_balloon() virtio_blk: fix race at module removal virtio: Fix comment typo 'CONFIG_S_FAILED'
2014-12-17 | Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux | Linus Torvalds | 34 | -858/+2787
Pull thermal management update from Zhang Rui: "Summary: - of-thermal extension to allow drivers to register and use its functionality in a better way, without exploiting thermal core. From Lukasz Majewski. - Fix a bug in intel_soc_dts_thermal driver which calls a sleep function in an interrupt handler. From Maurice Petallo. - add a thermal UAPI header file for exporting the thermal generic netlink information to user-space. From Florian Fainelli. - First round of refactoring in Exynos driver. Bartlomiej and Lukasz are attempting to make it lean and easier to understand. - New thermal driver for Rockchip (rk3288), with support for DT thermal. From Caesar Wang. - New thermal driver for Nvidia, Tegra124 SOCTHERM driver, with support for DT thermal. From Mikko Perttunen. - New cooling device, based on common clock framework. From Eduardo Valentin. - a couple of small fixes in thermal core framework. From Srinivas Pandruvada, Javi Merino, Luis Henriques. - Dropping Armada A375-Z1 SoC thermal support, as the chip is not on the market and the Armada folks decided to drop its support.
- a couple of small fixes and cleanups in int340x thermal driver" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (58 commits) thermal: provide an UAPI header file Thermal/int340x: Clear the error value of the last acpi_bus_get_device() call thermal/powerclamp: add id for braswell cpu thermal: Intel SoC DTS: Don't do thermal zone update inside spin_lock Thermal: fix platform_no_drv_owner.cocci warnings Thermal/int340x: avoid unnecessary pointer casting thermal: int3403: Delete a check before thermal_zone_device_unregister() thermal/int3400: export uuids thermal: of: Extend current of-thermal.c code to allow setting emulated temp thermal: of: Extend of-thermal to export table of trip points thermal: of: Rename struct __thermal_trip to struct thermal_trip thermal: of: Extend of-thermal.c to provide check if trip point is valid thermal: of: Extend of-thermal.c to provide number of trip points thermal: Fix error path in thermal_init() thermal: lock the thermal zone when switching governors thermal: core: ignore invalid trip temperature thermal: armada: Remove support for A375-Z1 SoC thermal: rockchip: add driver for thermal dt-bindings: document Rockchip thermal thermal: exynos: remove exynos_tmu_data.h include ...
2014-12-17 | Merge tag 'pwm/for-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm | Linus Torvalds | 7 | -5/+646
Pull pwm updates from Thierry Reding: "There are two new drivers, one for the BCM2835 (Raspberry Pi) and one used in conjunction with the LCD controller on various Atmel SoCs. The Samsung PWM driver can now be built for 64-bit ARM (Exynos7). A couple of fixes have been applied to the FTM PWM driver and system sleep support was added" * tag 'pwm/for-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: pwm: atmel-hlcdc: add at91sam9x5 and sama5d3 errata handling pwm: ftm: Add Power Management support for FTM PWM pwm: ftm: Add regmap rbtree type cache support pwm: ftm: Correctly track usage count pwm: samsung: Allow Samsung PWM driver to be enabled on Exynos7 pwm: add DT bindings documentation for atmel-hlcdc-pwm driver pwm: add support for atmel-hlcdc-pwm device pwm: Add BCM2835 PWM driver
2014-12-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input | Linus Torvalds | 77 | -922/+4949
Pull input subsystem updates from Dmitry Torokhov: "Two new drivers for Elan hardware (for I2C touchpad and touchscreen found in several Chromebooks and other devices), a driver for Goodix touch panel, and small fixes to Cypress I2C trackpad and other input drivers. Also we switched to use __maybe_unused instead of gating suspend/resume code with #ifdef guards to get better compile coverage" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (27 commits) Input: gpio_keys - fix warning regarding uninitialized 'button' variable Input: add support for Elan eKTH I2C touchscreens Input: gpio_keys - fix warning regarding uninitialized 'irq' variable Input: cyapa - use 'error' for error codes Input: cyapa - fix resuming the device Input: gpio_keys - add device tree support for interrupt only keys Input: amikbd - allocate temporary keymap buffer on the stack Input: amikbd - fix build if !CONFIG_HW_CONSOLE Input: lm8323 - missing error check in lm8323_set_disable() Input: initialize device counter variables with -1 Input: initialize input_no to -1 to avoid subtraction Input: i8042 - do not try to load on Intel NUC D54250WYK Input: atkbd - correct MSC_SCAN events for force_release keys Input: cyapa - switch to using managed resources Input: lifebook - use "static inline" instead of "inline" in lifebook.h Input: touchscreen - use __maybe_unused instead of ifdef around suspend/resume Input: mouse - use __maybe_unused instead of ifdef around suspend/resume Input: misc - use __maybe_unused instead of ifdef around suspend/resume Input: cap11xx - support for irq-active-high option Input: cap11xx - add support for various cap11xx devices ...
2014-12-17 | Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd | Linus Torvalds | 42 | -472/+2414
Pull MTD updates from Brian Norris: "Summary: - Add device tree support for DoC3 - SPI NOR: Refactoring, for better layering between spi-nor.c and its driver users (e.g., m25p80.c) New flash device support Support 6-byte ID strings - NAND: New NAND driver for Allwinner SoC's (sunxi) GPMI NAND: add support for raw (no ECC) access, for testing purposes Add ATO manufacturer ID A few odd driver fixes - MTD tests: Allow testers to compensate for OOB bitflips in oobtest Fix a torturetest regression - nandsim: Support longer ID byte strings And more" * tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd: (63 commits) mtd: tests: abort torturetest on erase errors mtd: physmap_of: fix potential NULL dereference mtd: spi-nor: allow NULL as chip name and try to auto detect it mtd: nand: gpmi: add raw oob access functions mtd: nand: gpmi: add proper raw access support mtd: nand: gpmi: add gpmi_copy_bits function mtd: spi-nor: factor out write_enable() for erase commands mtd: spi-nor: add support for s25fl128s mtd: spi-nor: remove the jedec_id/ext_id mtd: spi-nor: add id/id_len for flash_info{} mtd: nand: correct the comment of function nand_block_isreserved() jffs2: Drop bogus if in comment mtd: atmel_nand: replace memcpy32_toio/memcpy32_fromio with memcpy mtd: cafe_nand: drop duplicate .write_page implementation mtd: m25p80: Add support for serial flash Spansion S25FL132K MTD: m25p80: fix inconsistency in m25p_ids compared to spi_nor_ids mtd: spi-nor: improve wait-till-ready timeout loop mtd: delete unnecessary checks before two function calls mtd: nand: omap: Fix NAND enumeration on 3430 LDP mtd: nand: add ATO manufacturer info ...
2014-12-17 | Merge tag 'microblaze-3.19-rc1' of git://git.monstr.eu/linux-2.6-microblaze | Linus Torvalds | 3 | -5/+48
Pull Microblaze fix from Michal Simek: "Fix mmap for cache coherent memory" * tag 'microblaze-3.19-rc1' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Fix mmap for cache coherent memory
2014-12-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security | Linus Torvalds | 3 | -8/+9
Pull security subsystem fixes from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: KEYS: remove a bogus NULL check ima: Fix build failure on powerpc when TCG_IBMVTPM dependencies are not met KEYS: Fix stale key registration at error path
2014-12-17 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse | Linus Torvalds | 6 | -524/+359
Pull fuse update from Miklos Szeredi: "The first part makes sure we don't hold up umount with pending async requests. In addition to being a cleanup, this is a small behavioral change (for the better) and unlikely to break anything. The second part prepares for a cleanup of the fuse device I/O code by adding a helper for simple request submission, with some savings in line numbers already realized" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: use file_inode() in fuse_file_fallocate() fuse: introduce fuse_simple_request() helper fuse: reduce max out args fuse: hold inode instead of path after release fuse: flush requests on umount fuse: don't wake up reserved req in fuse_conn_kill()
2014-12-17 | ceph: fix setting empty extended attribute | Yan, Zheng | 1 | -2/+5
Make sure 'value' is not NULL; otherwise __ceph_setxattr will remove the extended attribute. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 | ceph: fix mksnap crash | Yan, Zheng | 1 | -1/+3
The mksnap reply only contains 'target', not 'dentry', so it's wrong to use req->r_reply_info.head->is_dentry to detect a traceless reply. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 | ceph: do_sync is never initialized | Dan Carpenter | 1 | -1/+1
Probably this code was syncing a lot more often than intended because the do_sync variable wasn't set to zero. Cc: stable@vger.kernel.org # v3.11+ Fixes: c62988ec0910 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | libceph: fixup includes in pagelist.h | Ilya Dryomov | 1 | -1/+3
pagelist.h needs to include linux/types.h and asm/byteorder.h and not rely on other headers pulling yet another set of headers. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | ceph: support inline data feature | Yan, Zheng | 1 | -1/+2
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: flush inline version | Yan, Zheng | 3 | -4/+23
After converting inline data to normal data, the client needs to flush the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit makes cap messages (sent to the MDS) contain inline_version and inline_data. The client always converts inline data to normal data before a data write, so the inline data length part is always zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: convert inline data to normal data before data write | Yan, Zheng | 3 | -3/+161
Before any data write, convert inline data to normal data and set i_inline_version to CEPH_INLINE_NONE. The OSD request that saves inline data to the object contains 3 operations (CMPXATTR, WRITE and SETXATTR). It compares an xattr named 'inline_version' to prevent old data from overwriting newer data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
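The three-operation request behaves like a compare-and-swap on the object. The Python model below is a sketch under stated assumptions: `RadosObject`, `run_ops()` and `uninline()` are hypothetical, the ops are applied all-or-nothing, and the comparison is assumed to let the write through only when the stored inline_version is older than the client's. Under those assumptions a stale client cannot overwrite data that a newer version already wrote.

```python
class RadosObject:
    """Hypothetical stand-in for an OSD object: data bytes plus xattrs."""

    def __init__(self):
        self.data = b""
        self.xattrs = {}


class CompareFailed(Exception):
    pass


def run_ops(obj, ops):
    """Apply ops all-or-nothing: stage on copies, commit only if all succeed."""
    data, xattrs = obj.data, dict(obj.xattrs)
    for op, *args in ops:
        if op == "CMPXATTR":
            name, version = args
            stored = xattrs.get(name)
            if stored is not None and stored >= version:  # assumed rule
                raise CompareFailed(name)
        elif op == "WRITE":
            buf = args[0]  # simplified: always written at offset 0
            data = buf + data[len(buf):]
        elif op == "SETXATTR":
            name, value = args
            xattrs[name] = value
    obj.data, obj.xattrs = data, xattrs


def uninline(obj, inline_data, inline_version):
    ops = [
        ("CMPXATTR", "inline_version", inline_version),
        ("WRITE", inline_data),
        ("SETXATTR", "inline_version", inline_version),
    ]
    try:
        run_ops(obj, ops)
        return True
    except CompareFailed:
        return False


obj = RadosObject()
print(uninline(obj, b"newer", 7))  # True: no version stored yet
print(uninline(obj, b"old", 5))    # False: 5 is older than the stored 7
print(obj.data)                    # b'newer'
```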
2014-12-17 | ceph: sync read inline data | Yan, Zheng | 2 | -13/+116
We can't use getattr to fetch inline data while holding the Fr cap, because it can cause a deadlock. If we need to sync read inline data, drop cap refs first, then use getattr to fetch inline data. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: fetch inline data when getting Fcr cap refs | Yan, Zheng | 3 | -18/+63
We can't use getattr to fetch inline data after getting Fcr caps, because it can cause a deadlock. The solution is to try bringing inline data into the page cache when not holding any cap, and hope the inline data page is still there after getting the Fcr caps. If the page is still there, pin it in the page cache for later IO. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: use getattr request to fetch inline data | Yan, Zheng | 4 | -10/+34
Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr reply contains inline data, it will be copied to the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: add inline data to pagecache | Yan, Zheng | 6 | -1/+85
Request replies and cap messages can contain inline data. Add inline data to the page cache if the Fc cap is held. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph: parse inline data in MClientReply and MClientCaps | Yan, Zheng | 3 | -11/+36
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | libceph: specify position of extent operation | Yan, Zheng | 4 | -20/+19
Allow specifying the position of the extent operation in a multi-operation OSD request. This is required for CephFS to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | libceph: add CREATE osd operation support | Yan, Zheng | 1 | -20/+22
Add CEPH_OSD_OP_CREATE support. Also change libceph to not treat CEPH_OSD_OP_DELETE as an extent op and add an assert to that end. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | libceph: add SETXATTR/CMPXATTR osd operations support | Yan, Zheng | 2 | -0/+57
Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | rbd: don't treat CEPH_OSD_OP_DELETE as extent op | Ilya Dryomov | 1 | -2/+6
CEPH_OSD_OP_DELETE is not an extent op, stop treating it as such. This sneaked in with discard patches - it's one of the three osd ops (the other two are CEPH_OSD_OP_TRUNCATE and CEPH_OSD_OP_ZERO) that discard is implemented with. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
2014-12-17 | ceph: remove unused stringification macros | Ilya Dryomov | 1 | -3/+0
These were used to report git versions a long time ago. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | libceph: require cephx message signature by default | Yan, Zheng | 2 | -0/+14
Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | ceph: introduce global empty snap context | Yan, Zheng | 3 | -3/+35
The current snapshot code does not properly handle moving an inode from one empty snap realm to another empty snap realm. After changing the inode's snap realm, some dirty pages' snap context may not be equal to the inode's i_head_snap. This can trigger the BUG() in ceph_put_wrbuffer_cap_refs(). The fix is to introduce a global empty snap context for all empty snap realms. This avoids triggering the BUG() for filesystems with no snapshots. Fixes: http://tracker.ceph.com/issues/9928 Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
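Why a single shared empty context helps can be shown with object identity. In this Python sketch, `SnapContext`, `SnapRealm`, `EMPTY_SNAPC` and `contexts_match()` are hypothetical simplifications. The model assumes, as the description implies, that the check in ceph_put_wrbuffer_cap_refs() compares context pointers rather than contents. Two empty realms with private contexts compare unequal even though both are empty, while a shared global context matches.

```python
class SnapContext:
    def __init__(self, seq, snaps):
        self.seq, self.snaps = seq, tuple(snaps)


EMPTY_SNAPC = SnapContext(0, ())  # one context shared by every empty realm


class SnapRealm:
    def __init__(self, snaps, share_empty):
        if not snaps and share_empty:
            self.cached_context = EMPTY_SNAPC
        else:
            self.cached_context = SnapContext(max(snaps, default=0), snaps)


def contexts_match(share_empty):
    realm_a = SnapRealm([], share_empty)
    realm_b = SnapRealm([], share_empty)
    page_snapc = realm_a.cached_context    # page dirtied while in realm A
    i_head_snapc = realm_b.cached_context  # inode then moved to realm B
    return page_snapc is i_head_snapc      # identity, like a C pointer compare


print(contexts_match(share_empty=False))  # False: would hit the BUG()
print(contexts_match(share_empty=True))   # True
```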
2014-12-17 | ceph: message versioning fixes | John Spray | 1 | -2/+5
There were two places where we were assigning version in host byte order instead of network byte order. Also, in MSG_CLIENT_SESSION we weren't setting compat_version in the header to reflect continued compatibility with older MDSs. Fixes: http://tracker.ceph.com/issues/9945 Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 | libceph: update ceph_msg_header structure | John Spray | 1 | -1/+2
2 bytes of what was reserved space is now used by userspace for the compat_version field. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
2014-12-17 | libceph: message signature support | Yan, Zheng | 8 | -4/+162
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | libceph: store session key in cephx authorizer | Yan, Zheng | 2 | -7/+12
The session key is required when calculating message signatures. Save the session key in the authorizer; this avoids looking up the ticket handler for each message. Signed-off-by: Yan, Zheng <zyan@redhat.com>
2014-12-17 | ceph, rbd: delete unnecessary checks before two function calls | SF Markus Elfring | 4 | -14/+7
The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [idryomov@redhat.com: squashed rbd.c hunk, changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | ceph: introduce a new inode flag indicating if cached dentries are ordered | Yan, Zheng | 3 | -19/+55
After creating/deleting/renaming a file, offsets of sibling dentries may change, so we cannot use cached dentries to satisfy readdir. But we can still use the cached dentries to conclude -ENOENT for lookup. This patch introduces a new inode flag indicating if child dentries are ordered. The flag is set at the same time as marking a directory complete. After creating/deleting/renaming a file, we clear the flag on the directory inode. This prevents ceph_readdir() from using cached dentries to satisfy the readdir syscall. Signed-off-by: Yan, Zheng <zyan@redhat.com>
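The split between "complete" and "ordered" comes down to two booleans. This Python sketch uses a hypothetical `CachedDir` class rather than the real inode flags. Readdir may be served from cache only while both flags are set, whereas a negative lookup needs only completeness.

```python
class CachedDir:
    """Hypothetical model of a directory's dentry-cache state."""

    def __init__(self):
        self.complete = False  # every child dentry is cached
        self.ordered = False   # cached offsets still match readdir order
        self.names = set()

    def mark_complete(self, names):
        self.names = set(names)
        self.complete = True
        self.ordered = True  # set together with "complete"

    def create(self, name):  # likewise for unlink and rename
        self.names.add(name)
        self.ordered = False  # sibling offsets may have shifted

    def readdir_from_cache(self):
        return self.complete and self.ordered

    def lookup(self, name):
        if name in self.names:
            return "hit"
        return "-ENOENT" if self.complete else "ask the MDS"


d = CachedDir()
d.mark_complete(["a", "b"])
d.create("c")
print(d.readdir_from_cache())  # False
print(d.lookup("zzz"))         # -ENOENT
```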
2014-12-17 | libceph: nuke ceph_kvfree() | Ilya Dryomov | 5 | -14/+4
Use kvfree() from linux/mm.h instead, which is identical. Also fix the ceph_buffer comment: we will allocate with kmalloc() up to 32k - the value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an implementation detail so don't mention it at all. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2014-12-17 | ceph: fix file lock interruption | Yan, Zheng | 4 | -12/+67
When a lock operation is interrupted, the current code sends an unlock request to the MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is to use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt a blocked file lock request. These requests do not drop locks that have already been acquired; they only interrupt blocked file lock requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
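The difference between undoing with an unlock and sending an interrupt request shows up when the owner already holds a lock on the range it is waiting for, for example a read lock it is trying to upgrade to a write lock. In this Python sketch `LockTable` is a hypothetical model of the MDS-side state, not the wire protocol, and `interrupt()` stands in for the CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests.

```python
class LockTable:
    """Hypothetical model: per-owner held lock and pending request."""

    def __init__(self):
        self.held = {}     # owner -> "read" or "write"
        self.waiting = {}  # owner -> requested mode, still blocked

    def request(self, owner, mode):
        self.waiting[owner] = mode  # conflicts elsewhere, so it blocks

    def unlock(self, owner):  # old undo path: releases everything
        self.held.pop(owner, None)
        self.waiting.pop(owner, None)

    def interrupt(self, owner):  # new path: cancel only the blocked request
        self.waiting.pop(owner, None)


def interrupted_upgrade(use_intr):
    table = LockTable()
    table.held["proc"] = "read"     # already acquired
    table.request("proc", "write")  # upgrade blocks, then a signal arrives
    if use_intr:
        table.interrupt("proc")
    else:
        table.unlock("proc")
    return table.held.get("proc")


print(interrupted_upgrade(use_intr=False))  # None: the read lock was lost
print(interrupted_upgrade(use_intr=True))   # read
```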
2014-12-17 | KVM: PPC: Book3S HV: Improve H_CONFER implementation | Sam Bobroff | 4 | -2/+74
Currently the H_CONFER hcall is implemented in kernel virtual mode, meaning that whenever a guest thread does an H_CONFER, all the threads in that virtual core have to exit the guest. This is bad for performance because it interrupts the other threads even if they are doing useful work. The H_CONFER hcall is called by a guest VCPU when it is spinning on a spinlock and it detects that the spinlock is held by a guest VCPU that is currently not running on a physical CPU. The idea is to give this VCPU's time slice to the holder VCPU so that it can make progress towards releasing the lock. To avoid having the other threads exit the guest unnecessarily, we add a real-mode implementation of H_CONFER that checks whether the other threads are doing anything. If all the other threads are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER), it returns H_TOO_HARD which causes a guest exit and allows the H_CONFER to be handled in virtual mode. Otherwise it spins for a short time (up to 10 microseconds) to give other threads the chance to observe that this thread is trying to confer. The spin loop also terminates when any thread exits the guest or when all other threads are idle or trying to confer. If the timeout is reached, the H_CONFER returns H_SUCCESS. In this case the guest VCPU will recheck the spinlock word and most likely call H_CONFER again. This also improves the implementation of the H_CONFER virtual mode handler. If the VCPU is part of a virtual core (vcore) which is runnable, there will be a 'runner' VCPU which has taken responsibility for running the vcore. In this case we yield to the runner VCPU rather than the target VCPU. We also introduce a check on the target VCPU's yield count: if it differs from the yield count passed to H_CONFER, the target VCPU has run since H_CONFER was called and may have already released the lock. This check is required by PAPR. 
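The real-mode decision procedure can be written as a short loop. This Python sketch rests on several assumptions: time advances in whole microseconds; `sample(t)` is a hypothetical callback reporting the other threads' states and whether any thread has exited the guest; the return codes are symbolic strings; and returning H_SUCCESS when another thread exits is itself an assumption, since the description only says the loop terminates. `should_yield()` models the PAPR yield-count check done in virtual mode.

```python
H_SUCCESS = "H_SUCCESS"    # symbolic stand-ins for the PAPR return codes
H_TOO_HARD = "H_TOO_HARD"
IDLE = {"ceded", "conferring"}


def rm_h_confer(sample, limit_us=10):
    """sample(t) -> (states of the other threads, some_thread_exited)."""
    for t in range(limit_us + 1):
        states, exited = sample(t)
        if exited:
            return H_SUCCESS  # assumption: let the pending exit happen
        if all(s in IDLE for s in states):
            return H_TOO_HARD  # nobody to disturb: exit, handle in virtual mode
    return H_SUCCESS  # timed out: guest rechecks the lock, may confer again


def should_yield(target_yield_count, yield_count_arg):
    """A changed yield count means the target ran since H_CONFER was made."""
    return target_yield_count == yield_count_arg


def always_busy(t):
    return ["busy", "busy", "ceded"], False


def idle_from_3us(t):
    last = "busy" if t < 3 else "ceded"
    return ["ceded", "conferring", last], False


print(rm_h_confer(always_busy))    # H_SUCCESS
print(rm_h_confer(idle_from_3us))  # H_TOO_HARD
```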
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-12-17 | KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register | Paul Mackerras | 4 | -2/+9
There are two ways in which a guest instruction can be obtained from the guest in the guest exit code in book3s_hv_rmhandlers.S. If the exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal instruction), the offending instruction is in the HEIR register (Hypervisor Emulation Instruction Register). If the exit was caused by a load or store to an emulated MMIO device, we load the instruction from the guest by turning data relocation on and loading the instruction with an lwz instruction. Unfortunately, in the case where the guest has opposite endianness to the host, these two methods give results of different endianness, but both get put into vcpu->arch.last_inst. The HEIR value has been loaded using guest endianness, whereas the lwz will load the instruction using host endianness. The rest of the code that uses vcpu->arch.last_inst assumes it was loaded using host endianness. To fix this, we define a new vcpu field to store the HEIR value. Then, in kvmppc_handle_exit_hv(), we transfer the value from this new field to vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness differ. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
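The fix-up step amounts to a conditional 32-bit byte swap when the value is copied into vcpu->arch.last_inst. Here is a minimal Python sketch of it. `heir_to_last_inst()` is a hypothetical name, and endianness is passed in as booleans rather than read from the vcpu.

```python
def bswap32(value):
    """Reverse the four bytes of a 32-bit word."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def heir_to_last_inst(heir_value, guest_big_endian, host_big_endian):
    """HEIR holds the instruction in guest byte order; last_inst wants host order."""
    if guest_big_endian != host_big_endian:
        return bswap32(heir_value)
    return heir_value


# 0x7C0802A6 is "mflr r0"; a little-endian guest on a big-endian host
print(hex(heir_to_last_inst(0x7C0802A6, guest_big_endian=False, host_big_endian=True)))
# 0xa602087c
```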
2014-12-17 | KVM: PPC: Book3S HV: Remove code for PPC970 processors | Paul Mackerras | 13 | -955/+70
This removes the code that was added to enable HV KVM to work on PPC970 processors. The PPC970 is an old CPU that doesn't support virtualizing guest memory. Removing PPC970 support also lets us remove the code for allocating and managing contiguous real-mode areas, the code for the !kvm->arch.using_mmu_notifiers case, the code for pinning pages of guest memory when first accessed and keeping track of which pages have been pinned, and the code for handling H_ENTER hypercalls in virtual mode. Book3S HV KVM is now supported only on POWER7 and POWER8 processors. The KVM_CAP_PPC_RMA capability now always returns 0. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>