summaryrefslogtreecommitdiff
path: root/arch/x86/xen
AgeCommit message (Collapse)AuthorFilesLines
2012-10-12Merge tag 'stable/for-linus-3.7-rc0-tag' of ↵Linus Torvalds2-1/+58
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen fixes from Konrad Rzeszutek Wilk: "This has four bug-fixes and one tiny feature that I forgot to put initially in my tree due to oversight. The feature is for kdump kernels to speed up the /proc/vmcore reading. There is a ram_is_pfn helper function that the different platforms can register for. We are now doing that. The bug-fixes cover some embarrassing struct pv_cpu_ops variables being set to NULL on Xen (but not baremetal). We had a similar issue in the past with {write|read}_msr_safe and this fills the three missing ones. The other bug-fix is to make the console output (hvc) be capable of dealing with misbehaving backends and not fall flat on its face. Lastly, a quirk for older XenBus implementations that came with an ancient v3.4 hypervisor (so RHEL5 based) - reading of certain non-existent attributes just hangs the guest during bootup - so we take precaution of not doing that on such older installations. Feature: - Register a pfn_is_ram helper to speed up reading of /proc/vmcore. Bug-fixes: - Three pvops call for Xen were undefined causing BUG_ONs. - Add a quirk so that the shutdown watches (used by kdump) are not used with older Xen (3.4). - Fix ungraceful state transition for the HVC console." * tag 'stable/for-linus-3.7-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pv-on-hvm kexec: add quirk for Xen 3.4 and shutdown watches. xen/bootup: allow {read|write}_cr8 pvops call. xen/bootup: allow read_tscp call for Xen PV guests. xen pv-on-hvm: add pfn_is_ram helper for kdump xen/hvc: handle backend CLOSED without CLOSING
2012-10-12xen/bootup: allow {read|write}_cr8 pvops call.Konrad Rzeszutek Wilk1-1/+15
We actually do not do anything about it. Just return a default value of zero and if the kernel tries to write anything but 0 we BUG_ON. This fixes the case when an user tries to suspend the machine and it blows up in save_processor_state b/c 'read_cr8' is set to NULL and we get: kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100! invalid opcode: 0000 [#1] SMP Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270 .. snip.. Call Trace: [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150 [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5 CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-12xen/bootup: allow read_tscp call for Xen PV guests.Konrad Rzeszutek Wilk1-0/+2
The hypervisor will trap it. However without this patch, we would crash as the .read_tscp is set to NULL. This patch fixes it and sets it to the native_read_tscp call. CC: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-09mm: kill vma flag VM_RESERVED and mm->reserved_vm counterKonstantin Khlebnikov1-2/+1
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: | effect | alternative flags -+------------------------+--------------------------------------------- 1| account as reserved_vm | VM_IO 2| skip in core dump | VM_IO, VM_DONTDUMP 3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-07Merge tag 'stable/for-linus-3.7-arm-tag' of ↵Linus Torvalds3-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...
2012-10-04xen pv-on-hvm: add pfn_is_ram helper for kdumpOlaf Hering1-0/+41
Register pfn_is_ram helper speed up reading /proc/vmcore in the kdump kernel. See commit message of 997c136f518c ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") for details. It makes use of a new hvmop HVMOP_get_mem_type which was introduced in xen 4.2 (23298:26413986e6e0) and backported to 4.1.1. The new function is currently only enabled for reading /proc/vmcore. Later it will be used also for the kexec kernel. Since that requires more changes in the generic kernel make it static for the time being. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-02Merge tag 'stable/for-linus-3.7-x86-tag' of ↵Linus Torvalds10-59/+378
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: - When hotplugging PCI devices in a PV guest we can allocate Xen-SWIOTLB later. - Cleanup Xen SWIOTLB. - Support pages out grants from HVM domains in the backends. - Support wild cards in xen-pciback.hide=(BDF) arguments. - Update grant status updates with upstream hypervisor. - Boot PV guests with more than 128GB. - Cleanup Xen MMU code/add comments. - Obtain XENVERS using a preferred method. - Lay out generic changes to support Xen ARM. - Allow privcmd ioctl for HVM (used to do only PV). - Do v2 of mmap_batch for privcmd ioctls. - If hypervisor saves the LED keyboard light - we will now instruct the kernel about its state. Fixes: - More fixes to Xen PCI backend for various calls/FLR/etc. - With more than 4GB in a 64-bit PV guest disable native SWIOTLB. - Fix up smatch warnings. - Fix up various return values in privmcmd and mm." * tag 'stable/for-linus-3.7-x86-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (48 commits) xen/pciback: Restore the PCI config space after an FLR. xen-pciback: properly clean up after calling pcistub_device_find() xen/vga: add the xen EFI video mode support xen/x86: retrieve keyboard shift status flags from hypervisor. xen/gndev: Xen backend support for paged out grant targets V4. xen-pciback: support wild cards in slot specifications xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. xen/arm: compile and run xenbus xen: resynchronise grant table status codes with upstream xen/privcmd: return -EFAULT on error xen/privcmd: Fix mmap batch ioctl error status copy back. xen/privcmd: add PRIVCMD_MMAPBATCH_V2 ioctl xen/mm: return more precise error from xen_remap_domain_range() xen/mmu: If the revector fails, don't attempt to revector anything else. ...
2012-10-01Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds1-11/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/platform changes from Ingo Molnar: "This cleans up some Xen-induced pagetable init code uglies, by generalizing new platform callbacks and state: x86_init.paging.*" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Document x86_init.paging.pagetable_init() x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() x86: Move paging_init() call to x86_init.paging.pagetable_init() x86: Rename pagetable_setup_start() to pagetable_init() x86: Remove base argument from x86_init.paging.pagetable_setup_start
2012-10-01Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/asm changes from Ingo Molnar: "The one change that stands out is the alternatives patching change that prevents us from ever patching back instructions from SMP to UP: this simplifies things and speeds up CPU hotplug. Other than that it's smaller fixes, cleanups and improvements." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Unspaghettize do_trap() x86_64: Work around old GAS bug x86: Use REP BSF unconditionally x86: Prefer TZCNT over BFS x86/64: Adjust types of temporaries used by ffs()/fls()/fls64() x86: Drop unnecessary kernel_eflags variable on 64-bit x86/smp: Don't ever patch back to UP if we unplug cpus
2012-09-26Merge branch 'xenarm-for-linus' of ↵Konrad Rzeszutek Wilk3-1/+2
git://xenbits.xen.org/people/sstabellini/linux-pvhvm into stable/for-linus-3.7 * 'xenarm-for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls arm: initial Xen support Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-24xen/vga: add the xen EFI video mode supportJan Beulich1-0/+7
In order to add xen EFI frambebuffer video support, it is required to add xen-efi's new video type (XEN_VGATYPE_EFI_LFB) case and handle it in the function xen_init_vga and set the video type to VIDEO_TYPE_EFI to enable efi video mode. The original patch from which this was broken out from: http://marc.info/?i=4E099AA6020000780004A4C6@nat28.tlf.novell.com Signed-off-by: Jan Beulich <JBeulich@novell.com> Signed-off-by: Tang Liang <liang.tang@oracle.com> [v2: The original author is Jan Beulich and Liang Tang ported it to upstream] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-24xen/x86: retrieve keyboard shift status flags from hypervisor.Konrad Rzeszutek Wilk1-0/+8
The xen c/s 25873 allows the hypervisor to retrieve the NUMLOCK flag. With this patch, the Linux kernel can get the state according to the data in the BIOS. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-24xen/boot: Disable NUMA for PV guests.Konrad Rzeszutek Wilk1-0/+4
The hypervisor is in charge of allocating the proper "NUMA" memory and dealing with the CPU scheduler to keep them bound to the proper NUMA node. The PV guests (and PVHVM) have no inkling of where they run and do not need to know that right now. In the future we will need to inject NUMA configuration data (if a guest spans two or more NUMA nodes) so that the kernel can make the right choices. But those patches are not yet present. In the meantime, disable the NUMA capability in the PV guest, which also fixes a bootup issue. Andre says: "we see Dom0 crashes due to the kernel detecting the NUMA topology not by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA). This will detect the actual NUMA config of the physical machine, but will crash about the mismatch with Dom0's virtual memory. Variation of the theme: Dom0 sees what it's not supposed to see. This happens with the said config option enabled and on a machine where this scanning is still enabled (K8 and Fam10h, not Bulldozer class) We have this dump then: NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10 Scanning NUMA topology in Northbridge 24 Number of physical nodes 4 Node 0 MemBase 0000000000000000 Limit 0000000040000000 Node 1 MemBase 0000000040000000 Limit 0000000138000000 Node 2 MemBase 0000000138000000 Limit 00000001f8000000 Node 3 MemBase 00000001f8000000 Limit 0000000238000000 Initmem setup node 0 0000000000000000-0000000040000000 NODE_DATA [000000003ffd9000 - 000000003fffffff] Initmem setup node 1 0000000040000000-0000000138000000 NODE_DATA [0000000137fd9000 - 0000000137ffffff] Initmem setup node 2 0000000138000000-00000001f8000000 NODE_DATA [00000001f095e000 - 00000001f0984fff] Initmem setup node 3 00000001f8000000-0000000238000000 Cannot find 159744 bytes in node 3 BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96 Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96 .. snip.. [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178 [<ffffffff81d23348>] sparse_init+0xe4/0x25a [<ffffffff81d16840>] paging_init+0x13/0x22 [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b [<ffffffff81683954>] ? printk+0x3c/0x3e [<ffffffff81d01a38>] start_kernel+0xe5/0x468 [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1 [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36 [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c " so we just disable NUMA scanning by setting numa_off=1. CC: stable@vger.kernel.org Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com> Acked-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-22Merge branch 'stable/late-swiotlb.v3.3' into stable/for-linus-3.7Konrad Rzeszutek Wilk1-5/+42
* stable/late-swiotlb.v3.3: xen/swiotlb: Fix compile warnings when using plain integer instead of NULL pointer. xen/swiotlb: Remove functions not needed anymore. xen/pcifront: Use Xen-SWIOTLB when initting if required. xen/swiotlb: For early initialization, return zero on success. xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late when PV PCI is used. xen/swiotlb: Move the error strings to its own function. xen/swiotlb: Move the nr_tbl determination in its own function. swiotlb: add the late swiotlb initialization function with iotlb memory xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB. xen/swiotlb: Simplify the logic. Conflicts: arch/x86/xen/pci-swiotlb-xen.c Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-19xen/boot: Disable BIOS SMP MP table search.Konrad Rzeszutek Wilk1-0/+4
As the initial domain we are able to search/map certain regions of memory to harvest configuration data. For all low-level we use ACPI tables - for interrupts we use exclusively ACPI _PRT (so DSDT) and MADT for INT_SRC_OVR. The SMP MP table is not used at all. As a matter of fact we do not even support machines that only have SMP MP but no ACPI tables. Lets follow how Moorestown does it and just disable searching for BIOS SMP tables. This also fixes an issue on HP Proliant BL680c G5 and DL380 G6: 9f->100 for 1:1 PTE Freeing 9f-100 pfn range: 97 pages freed 1-1 mapping on 9f->100 .. snip.. e820: BIOS-provided physical RAM map: Xen: [mem 0x0000000000000000-0x000000000009efff] usable Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable .. snip.. Scan for SMP in [mem 0x00000000-0x000003ff] Scan for SMP in [mem 0x0009fc00-0x0009ffff] Scan for SMP in [mem 0x000f0000-0x000fffff] found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0] (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0 (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e() BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71 . snip.. Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5 .. snip.. Call Trace: [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248 [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81ad32ac>] early_ioremap+0x13/0x15 [<ffffffff81acc140>] get_mpc_size+0x2f/0x67 [<ffffffff81acc284>] smp_scan_config+0x10c/0x136 [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81abca7f>] start_kernel+0x90/0x390 [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136 [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. which is that ioremap would end up mapping 0xff using _PAGE_IOMAP (which is what early_ioremap sticks as a flag) - which meant we would get MFN 0xFF (pte ff461, which is OK), and then it would also map 0x100 (b/c ioremap tries to get page aligned request, and it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page) as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP bypasses the P2M lookup we would happily set the PTE to 1000461. Xen would deny the request since we do not have access to the Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example 0x80140. CC: stable@vger.kernel.org Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665 Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-17xen/swiotlb: Fix compile warnings when using plain integer instead of NULL ↵Konrad Rzeszutek Wilk1-2/+2
pointer. arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-17xen/swiotlb: Use the swiotlb_late_init_with_tbl to init Xen-SWIOTLB late ↵Konrad Rzeszutek Wilk1-2/+22
when PV PCI is used. With this patch we provide the functionality to initialize the Xen-SWIOTLB late in the bootup cycle - specifically for Xen PCI-frontend. We still will work if the user had supplied 'iommu=soft' on the Linux command line. Note: We cannot depend on after_bootmem to automatically determine whether this is early or not. This is because when PCI IOMMUs are initialized it is after after_bootmem but before a lot of "other" subsystems are initialized. CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> [v1: Fix smatch warnings] [v2: Added check for xen_swiotlb] [v3: Rebased with new xen-swiotlb changes] [v4: squashed xen/swiotlb: Depending on after_bootmem is not correct in] Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-12xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini1-16/+11
If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-12Merge branch 'stable/128gb.v5.1' into stable/for-linus-3.7Konrad Rzeszutek Wilk5-47/+254
* stable/128gb.v5.1: xen/mmu: If the revector fails, don't attempt to revector anything else. xen/p2m: When revectoring deal with holes in the P2M array. xen/mmu: Release just the MFN list, not MFN list and part of pagetables. xen/mmu: Remove from __ka space PMD entries for pagetables. xen/mmu: Copy and revector the P2M tree. xen/p2m: Add logic to revector a P2M tree to use __va leafs. xen/mmu: Recycle the Xen provided L4, L3, and L2 pages xen/mmu: For 64-bit do not call xen_map_identity_early xen/mmu: use copy_page instead of memcpy. xen/mmu: Provide comments describing the _ka and _va aliasing issue xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything. Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." and "xen/x86: Use memblock_reserve for sensitive areas." xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain. xen/x86: Use memblock_reserve for sensitive areas. xen/p2m: Fix the comment describing the P2M tree. Conflicts: arch/x86/xen/mmu.c The pagetable_init is the old xen_pagetable_setup_done and xen_pagetable_setup_start rolled in one. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-12Merge branch 'x86/platform' of ↵Konrad Rzeszutek Wilk5-84/+225
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into stable/for-linus-3.7 * 'x86/platform' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (9690 commits) x86: Document x86_init.paging.pagetable_init() x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done() x86: Move paging_init() call to x86_init.paging.pagetable_init() x86: Rename pagetable_setup_start() to pagetable_init() x86: Remove base argument from x86_init.paging.pagetable_setup_start Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync ...
2012-09-12x86: xen: Cleanup and remove x86_init.paging.pagetable_setup_done()Attilio Rao1-9/+4
At this stage x86_init.paging.pagetable_setup_done is only used in the XEN case. Move its content in the x86_init.paging.pagetable_init setup function and remove the now unused x86_init.paging.pagetable_setup_done remaining infrastructure. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-5-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Move paging_init() call to x86_init.paging.pagetable_init()Attilio Rao1-0/+1
Move the paging_init() call to the platform specific pagetable_init() function, so we can get rid of the extra pagetable_setup_done() function pointer. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Acked-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-4-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Rename pagetable_setup_start() to pagetable_init()Attilio Rao1-2/+2
In preparation for unifying the pagetable_setup_start() and pagetable_setup_done() setup functions, rename appropriately all the infrastructure related to pagetable_setup_start(). Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedd-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-3-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-12x86: Remove base argument from x86_init.paging.pagetable_setup_startAttilio Rao1-1/+1
We either use swapper_pg_dir or the argument is unused. Preparatory patch to simplify platform pagetable setup further. Signed-off-by: Attilio Rao <attilio.rao@citrix.com> Ackedb-by: <konrad.wilk@oracle.com> Cc: <Ian.Campbell@citrix.com> Cc: <Stefano.Stabellini@eu.citrix.com> Cc: <xen-devel@lists.xensource.com> Link: http://lkml.kernel.org/r/1345580561-8506-2-git-send-email-attilio.rao@citrix.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-09-05xen/mm: return more precise error from xen_remap_domain_range()David Vrabel1-2/+2
Callers of xen_remap_domain_range() need to know if the remap failed because frame is currently paged out. So they can retry the remap later on. Return -ENOENT in this case. This assumes that the error codes returned by Xen are a subset of those used by the kernel. It is unclear if this is defined as part of the hypercall ABI. Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-05xen: fix logical error in tlb flushingAlex Shi1-1/+1
While TLB_FLUSH_ALL gets passed as 'end' argument to flush_tlb_others(), the Xen code was made to check its 'start' parameter. That may give a incorrect op.cmd to MMUEXT_INVLPG_MULTI instead of MMUEXT_TLB_FLUSH_MULTI. Then it causes some page can not be flushed from TLB. This patch fixed this issue. Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Tested-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-09-05Merge commit '4cb38750d49010ae72e718d46605ac9ba5a851b4' into ↵Konrad Rzeszutek Wilk3-10/+6
stable/for-linus-3.6 * commit '4cb38750d49010ae72e718d46605ac9ba5a851b4': (6849 commits) bcma: fix invalid PMU chip control masks [libata] pata_cmd64x: whitespace cleanup libata-acpi: fix up for acpi_pm_device_sleep_state API sata_dwc_460ex: device tree may specify dma_channel ahci, trivial: fixed coding style issues related to braces ahci_platform: add hibernation callbacks libata-eh.c: local functions should not be exposed globally libata-transport.c: local functions should not be exposed globally sata_dwc_460ex: support hardreset ata: use module_pci_driver drivers/ata/pata_pcmcia.c: adjust suspicious bit operation pata_imx: Convert to clk_prepare_enable/clk_disable_unprepare ahci: Enable SB600 64bit DMA on MSI K9AGM2 (MS-7327) v2 [libata] Prevent interface errors with Seagate FreeAgent GoFlex drivers/acpi/glue: revert accidental license-related 6b66d95895c bits libata-acpi: add missing inlines in libata.h i2c-omap: Add support for I2C_M_STOP message flag i2c: Fall back to emulated SMBus if the operation isn't supported natively i2c: Add SCCB support i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converter ...
2012-09-05xen/p2m: Fix one-off error in checking the P2M tree directory.Konrad Rzeszutek Wilk1-1/+1
We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES inclusive) when trying to figure out whether we can re-use some of the P2M middle leafs. Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512 we would try to use the 512th entry. Fortunately for us the p2m_top_index has a check for this: BUG_ON(pfn >= MAX_P2M_PFN); which we hit and saw this: (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1.2-OVM x86_64 debug=n Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff819cadeb>] (XEN) RFLAGS: 0000000000000212 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81db5000 rbx: ffffffff81db4000 rcx: 0000000000000000 (XEN) rdx: 0000000000480211 rsi: 0000000000000000 rdi: ffffffff81db4000 (XEN) rbp: ffffffff81793db8 rsp: ffffffff81793d38 r8: 0000000008000000 (XEN) r9: 4000000000000000 r10: 0000000000000000 r11: ffffffff81db7000 (XEN) r12: 0000000000000ff8 r13: ffffffff81df1ff8 r14: ffffffff81db6000 (XEN) r15: 0000000000000ff8 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 0000000661795000 cr2: 0000000000000000 Fixes-Oracle-Bug: 14570662 CC: stable@vger.kernel.org # only for v3.5 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: If the revector fails, don't attempt to revector anything else.Konrad Rzeszutek Wilk1-1/+3
If the P2M revectoring would fail, we would try to continue on by cleaning the PMD for L1 (PTE) page-tables. The xen_cleanhighmap is greedy and erases the PMD on both boundaries. Since the P2M array can share the PMD, we would wipe out part of the __ka that is still used in the P2M tree to point to P2M leafs. This fixes it by bypassing the revectoring and continuing on. If the revector fails, a nice WARN is printed so we can still troubleshoot this. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/p2m: When revectoring deal with holes in the P2M array.Konrad Rzeszutek Wilk1-3/+11
When we free the PFNs and then subsequently populate them back during bootup: Freeing 20000-20200 pfn range: 512 pages freed 1-1 mapping on 20000->20200 Freeing 40000-40200 pfn range: 512 pages freed 1-1 mapping on 40000->40200 Freeing bad80-badf4 pfn range: 116 pages freed 1-1 mapping on bad80->badf4 Freeing badf6-bae7f pfn range: 137 pages freed 1-1 mapping on badf6->bae7f Freeing bb000-100000 pfn range: 282624 pages freed 1-1 mapping on bb000->100000 Released 283999 pages of unused memory Set 283999 page(s) to 1-1 mapping Populating 1acb8a-1f20e9 pfn range: 283999 pages added We end up having the P2M array (that is the one that was grafted on the P2M tree) filled with IDENTITY_FRAME or INVALID_P2M_ENTRY) entries. The patch titled "xen/p2m: Reuse existing P2M leafs if they are filled with 1:1 PFNs or INVALID." recycles said slots and replaces the P2M tree leaf's with &mfn_list[xx] with p2m_identity or p2m_missing. And re-uses the P2M array sections for other P2M tree leaf's. For the above mentioned bootup excerpt, the PFNs at 0x20000->0x20200 are going to be IDENTITY based: P2M[0][256][0] -> P2M[0][257][0] get turned in IDENTITY_FRAME. We can re-use that and replace P2M[0][256] to point to p2m_identity. The "old" page (the grafted P2M array provided by Xen) that was at P2M[0][256] gets put somewhere else. Specifically at P2M[6][358], b/c when we populate back: Populating 1acb8a-1f20e9 pfn range: 283999 pages added we fill P2M[6][358][0] (and P2M[6][358], P2M[6][359], ...) with the new MFNs. That is all OK, except when we revector we assume that the PFN count would be the same in the grafted P2M array and in the newly allocated. Since that is no longer the case, as we have holes in the P2M that point to p2m_missing or p2m_identity we have to take that into account. [v2: Check for overflow] [v3: Move within the __va check] [v4: Fix the computation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Release just the MFN list, not MFN list and part of pagetables.Konrad Rzeszutek Wilk1-1/+1
We call memblock_reserve for [start of mfn list] -> [PMD aligned end of mfn list] instead of <start of mfn list> -> <page aligned end of mfn list]. This has the disastrous effect that if at bootup the end of mfn_list is not PMD aligned we end up returning to memblock parts of the region past the mfn_list array. And those parts are the PTE tables with the disastrous effect of seeing this at bootup: Write protecting the kernel read-only data: 10240k Freeing unused kernel memory: 1860k freed Freeing unused kernel memory: 200k freed (XEN) mm.c:2429:d0 Bad type (saw 1400000000000002 != exp 7000000000000000) for mfn 116a80 (pfn 14e26) ... (XEN) mm.c:908:d0 Error getting mfn 116a83 (pfn 14e2a) from L1 entry 8000000116a83067 for l1e_owner=0, pg_owner=0 (XEN) mm.c:908:d0 Error getting mfn 4040 (pfn 5555555555555555) from L1 entry 0000000004040601 for l1e_owner=0, pg_owner=0 .. and so on. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Remove from __ka space PMD entries for pagetables.Konrad Rzeszutek Wilk1-0/+19
Please first read the description in "xen/mmu: Copy and revector the P2M tree." At this stage, the __ka address space (which is what the old P2M tree was using) is partially disassembled. The cleanup_highmap has removed the PMD entries from 0-16MB and anything past _brk_end up to the max_pfn_mapped (which is the end of the ramdisk). The xen_remove_p2m_tree and code around has ripped out the __ka for the old P2M array. Here we continue on doing it to where the Xen page-tables were. It is safe to do it, as the page-tables are addressed using __va. For good measure we delete anything that is within MODULES_VADDR and up to the end of the PMD. At this point the __ka only contains PMD entries for the start of the kernel up to __brk. [v1: Per Stefano's suggestion wrapped the MODULES_VADDR in debug] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Copy and revector the P2M tree.Konrad Rzeszutek Wilk1-0/+57
Please first read the description in "xen/p2m: Add logic to revector a P2M tree to use __va leafs" patch. The 'xen_revector_p2m_tree()' function allocates a new P2M tree copies the contents of the old one in it, and returns the new one. At this stage, the __ka address space (which is what the old P2M tree was using) is partially disassembled. The cleanup_highmap has removed the PMD entries from 0-16MB and anything past _brk_end up to the max_pfn_mapped (which is the end of the ramdisk). We have revectored the P2M tree (and the one for save/restore as well) to use new shiny __va address to new MFNs. The xen_start_info has been taken care of already in 'xen_setup_kernel_pagetable()' and xen_start_info->shared_info in 'xen_setup_shared_info()', so we are free to roam and delete PMD entries - which is exactly what we are going to do. We rip out the __ka for the old P2M array. [v1: Fix smatch warnings] [v2: memset was doing 0 instead of 0xff] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/p2m: Add logic to revector a P2M tree to use __va leafs.Konrad Rzeszutek Wilk2-0/+71
During bootup Xen supplies us with a P2M array. It sticks it right after the ramdisk, as can be seen with a 128GB PV guest: (certain parts removed for clarity): xc_dom_build_image: called xc_dom_alloc_segment: kernel : 0xffffffff81000000 -> 0xffffffff81e43000 (pfn 0x1000 + 0xe43 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000 xc_dom_alloc_segment: ramdisk : 0xffffffff81e43000 -> 0xffffffff925c7000 (pfn 0x1e43 + 0x10784 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000 xc_dom_alloc_segment: phys2mach : 0xffffffff925c7000 -> 0xffffffffa25c7000 (pfn 0x125c7 + 0x10000 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000 xc_dom_alloc_page : start info : 0xffffffffa25c7000 (pfn 0x225c7) xc_dom_alloc_page : xenstore : 0xffffffffa25c8000 (pfn 0x225c8) xc_dom_alloc_page : console : 0xffffffffa25c9000 (pfn 0x225c9) nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s) nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s) nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s) xc_dom_alloc_segment: page tables : 0xffffffffa25ca000 -> 0xffffffffa26e1000 (pfn 0x225ca + 0x117 pages) xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000 xc_dom_alloc_page : boot stack : 0xffffffffa26e1000 (pfn 0x226e1) xc_dom_build_image : virt_alloc_end : 0xffffffffa26e2000 xc_dom_build_image : virt_pgtab_end : 0xffffffffa2800000 So the physical memory and virtual (using __START_KERNEL_map addresses) layout looks as so: phys __ka /------------\ /-------------------\ | 0 | empty | 0xffffffff80000000| | .. | | .. | | 16MB | <= kernel starts | 0xffffffff81000000| | .. | | | | 30MB | <= kernel ends => | 0xffffffff81e43000| | .. | & ramdisk starts | .. | | 293MB | <= ramdisk ends=> | 0xffffffff925c7000| | .. | & P2M starts | .. | | .. | | .. | | 549MB | <= P2M ends => | 0xffffffffa25c7000| | .. | start_info | 0xffffffffa25c7000| | .. | xenstore | 0xffffffffa25c8000| | .. | cosole | 0xffffffffa25c9000| | 549MB | <= page tables => | 0xffffffffa25ca000| | .. | | | | 550MB | <= PGT end => | 0xffffffffa26e1000| | .. | boot stack | | \------------/ \-------------------/ As can be seen, the ramdisk, P2M and pagetables are taking a bit of __ka addresses space. Which is a problem since the MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits right in there! This results during bootup with the inability to load modules, with this error: ------------[ cut here ]------------ WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370() Call Trace: [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0 [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20 [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370 [<ffffffff81130c4d>] map_vm_area+0x2d/0x50 [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c6186>] ? load_module+0x66/0x19c0 [<ffffffff8105cadc>] module_alloc+0x5c/0x60 [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80 [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80 [<ffffffff810c70c3>] load_module+0xfa3/0x19c0 [<ffffffff812491f6>] ? security_file_permission+0x86/0x90 [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220 [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b ---[ end trace fd8f7704fdea0291 ]--- vmalloc: allocation failure, allocated 16384 of 20480 bytes modprobe: page allocation failure: order:0, mode:0xd2 Since the __va and __ka are 1:1 up to MODULES_VADDR and cleanup_highmap rids __ka of the ramdisk mapping, what we want to do is similar - get rid of the P2M in the __ka address space. There are two ways of fixing this: 1) All P2M lookups instead of using the __ka address would use the __va address. This means we can safely erase from __ka space the PMD pointers that point to the PFNs for P2M array and be OK. 2). Allocate a new array, copy the existing P2M into it, revector the P2M tree to use that, and return the old P2M to the memory allocate. This has the advantage that it sets the stage for using XEN_ELF_NOTE_INIT_P2M feature. That feature allows us to set the exact virtual address space we want for the P2M - and allows us to boot as initial domain on large machines. So we pick option 2). This patch only lays the groundwork in the P2M code. The patch that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree." Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Recycle the Xen provided L4, L3, and L2 pagesKonrad Rzeszutek Wilk1-7/+35
As we are not using them. We end up only using the L1 pagetables and grafting those to our page-tables. [v1: Per Stefano's suggestion squashed two commits] [v2: Per Stefano's suggestion simplified loop] [v3: Fix smatch warnings] [v4: Add more comments] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: For 64-bit do not call xen_map_identity_earlyKonrad Rzeszutek Wilk1-6/+5
B/c we do not need it. During the startup the Xen provides us with all the initial memory mapped that we need to function. The initial memory mapped is up to the bootstack, which means we can reference using __ka up to 4.f): (from xen/interface/xen.h): 4. This the order of bootstrap elements in the initial virtual region: a. relocated kernel image b. initial ram disk [mod_start, mod_len] c. list of allocated page frames [mfn_list, nr_pages] d. start_info_t structure [register ESI (x86)] e. bootstrap page tables [pt_base, CR3 (x86)] f. bootstrap stack [register ESP (x86)] (initial ram disk may be ommitted). [v1: More comments in git commit] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: use copy_page instead of memcpy.Konrad Rzeszutek Wilk1-7/+6
After all, this is what it is there for. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: Provide comments describing the _ka and _va aliasing issueKonrad Rzeszutek Wilk1-0/+17
Which is that the level2_kernel_pgt (__ka virtual addresses) and level2_ident_pgt (__va virtual address) contain the same PMD entries. So if you modify a PTE in __ka, it will be reflected in __va (and vice-versa). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything.Konrad Rzeszutek Wilk3-13/+4
We don't need to return the new PGD - as we do not use it. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23Revert "xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain." ↵Konrad Rzeszutek Wilk3-65/+27
and "xen/x86: Use memblock_reserve for sensitive areas." This reverts commit 806c312e50f122c47913145cf884f53dd09d9199 and commit 59b294403e9814e7c1154043567f0d71bac7a511. And also documents setup.c and why we want to do it that way, which is that we tried to make the the memblock_reserve more selective so that it would be clear what region is reserved. Sadly we ran in the problem wherein on a 64-bit hypervisor with a 32-bit initial domain, the pt_base has the cr3 value which is not neccessarily where the pagetable starts! As Jan put it: " Actually, the adjustment turns out to be correct: The page tables for a 32-on-64 dom0 get allocated in the order "first L1", "first L2", "first L3", so the offset to the page table base is indeed 2. When reading xen/include/public/xen.h's comment very strictly, this is not a violation (since there nothing is said that the first thing in the page table space is pointed to by pt_base; I admit that this seems to be implied though, namely do I think that it is implied that the page table space is the range [pt_base, pt_base + nt_pt_frames), whereas that range here indeed is [pt_base - 2, pt_base - 2 + nt_pt_frames), which - without a priori knowledge - the kernel would have difficulty to figure out)." - so lets just fall back to the easy way and reserve the whole region. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/swiotlb: Fix compile warnings when using plain integer instead of NULL ↵Konrad Rzeszutek Wilk1-2/+2
pointer. arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer arch/x86/xen/pci-swiotlb-xen.c:96:1: warning: Using plain integer as NULL pointer Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen: allow privcmd for HVM guestsStefano Stabellini1-0/+3
This patch removes the "return -ENOSYS" for auto_translated_physmap guests from privcmd_mmap, thus it allows ARM guests to issue privcmd mmap calls. However privcmd mmap calls are still going to fail for HVM and hybrid guests on x86 because the xen_remap_domain_mfn_range implementation is currently PV only. Changes in v2: - better commit message; - return -EINVAL from xen_remap_domain_mfn_range if auto_translated_physmap. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23xen/setup: Fix one-off error when adding for-balloon PFNs to the P2M.Konrad Rzeszutek Wilk1-1/+8
When we are finished with return PFNs to the hypervisor, then populate it back, and also mark the E820 MMIO and E820 gaps as IDENTITY_FRAMEs, we then call P2M to set areas that can be used for ballooning. We were off by one, and ended up over-writting a P2M entry that most likely was an IDENTITY_FRAME. For example: 1-1 mapping on 40000->40200 1-1 mapping on bc558->bc5ac 1-1 mapping on bc5b4->bc8c5 1-1 mapping on bc8c6->bcb7c 1-1 mapping on bcd00->100000 Released 614 pages of unused memory Set 277889 page(s) to 1-1 mapping Populating 40200-40466 pfn range: 614 pages added => here we set from 40466 up to bc559 P2M tree to be INVALID_P2M_ENTRY. We should have done it up to bc558. The end result is that if anybody is trying to construct a PTE for PFN bc558 they end up with ~PAGE_PRESENT. CC: stable@vger.kernel.org Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-23x86/smp: Don't ever patch back to UP if we unplug cpusRusty Russell1-4/+2
We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-08-21xen/apic/xenbus/swiotlb/pcifront/grant/tmem: Make functions or variables static.Konrad Rzeszutek Wilk4-1/+6
There is no need for those functions/variables to be visible. Make them static and also fix the compile warnings of this sort: drivers/xen/<some file>.c: warning: symbol '<blah>' was not declared. Should it be static? Some of them just require including the header file that declares the functions. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/swiotlb: With more than 4GB on 64-bit, disable the native SWIOTLB.Konrad Rzeszutek Wilk1-0/+14
If a PV guest is booted the native SWIOTLB should not be turned on. It does not help us (we don't have any PCI devices) and it eats 64MB of good memory. In the case of PV guests with PCI devices we need the Xen-SWIOTLB one. [v1: Rewrite comment per Stefano's suggestion] Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/swiotlb: Simplify the logic.Konrad Rzeszutek Wilk1-4/+5
Its pretty easy: 1). We only check to see if we need Xen SWIOTLB for PV guests. 2). If swiotlb=force or iommu=soft is set, then Xen SWIOTLB will be enabled. 3). If it is an initial domain, then Xen SWIOTLB will be enabled. 4). Native SWIOTLB must be disabled for PV guests. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/x86: Workaround 64-bit hypervisor and 32-bit initial domain.Konrad Rzeszutek Wilk1-9/+21
If a 64-bit hypervisor is booted with a 32-bit initial domain, the hypervisor deals with the initial domain as "compat" and does some extra adjustments (like pagetables are 4 bytes instead of 8). It also adjusts the xen_start_info->pt_base incorrectly. When booted with a 32-bit hypervisor (32-bit initial domain): .. (XEN) Start info: cf831000->cf83147c (XEN) Page tables: cf832000->cf8b5000 .. [ 0.000000] PT: cf832000 (f832000) [ 0.000000] Reserving PT: f832000->f8b5000 And with a 64-bit hypervisor: (XEN) Start info: 00000000cf831000->00000000cf8314b4 (XEN) Page tables: 00000000cf832000->00000000cf8b6000 [ 0.000000] PT: cf834000 (f834000) [ 0.000000] Reserving PT: f834000->f8b8000 To deal with this, we keep keep track of the highest physical address we have reserved via memblock_reserve. If that address does not overlap with pt_base, we have a gap which we reserve. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/x86: Use memblock_reserve for sensitive areas.Konrad Rzeszutek Wilk3-9/+53
instead of a big memblock_reserve. This way we can be more selective in freeing regions (and it also makes it easier to understand where is what). [v1: Move the auto_translate_physmap to proper line] [v2: Per Stefano suggestion add more comments] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-21xen/p2m: Fix the comment describing the P2M tree.Konrad Rzeszutek Wilk1-7/+7
It mixed up the p2m_mid_missing with p2m_missing. Also remove some extra spaces. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>