Age | Commit message (Collapse) | Author | Files | Lines |
|
commit e6023367d779060fddc9a52d1f474085b2b36298 upstream.
When choosing a random address, the current implementation does not take into
account the reversed space for .bss and .brk sections. Thus the relocated kernel
may overlap other components in memory. Here is an example of the overlap from a
x86_64 kernel in qemu (the ranges of physical addresses are presented):
Physical Address
0x0fe00000 --+--------------------+ <-- randomized base
/ | relocated kernel |
vmlinux.bin | (from vmlinux.bin) |
0x1336d000 (an ELF file) +--------------------+--
\ | | \
0x1376d870 --+--------------------+ |
| relocs table | |
0x13c1c2a8 +--------------------+ .bss and .brk
| | |
0x13ce6000 +--------------------+ |
| | /
0x13f77000 | initrd |--
| |
0x13fef374 +--------------------+
The initrd image will then be overwritten by the memset during early
initialization:
[ 1.655204] Unpacking initramfs...
[ 1.662831] Initramfs unpacking failed: junk in compressed archive
This patch prevents the above situation by requiring a larger space when looking
for a random kernel base, so that existing logic can effectively avoids the
overlap.
[kees: switched to perl to avoid hex translation pain in mawk vs gawk]
[kees: calculated overlap without relocs table]
Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Junjie Mao <eternal.n08@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c0a717f23dccdb6e3b03471bc846fdc636f2b353 upstream.
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.
This way we actually do find it on the APs instead of having to go
through the initrd each time.
Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf475 ("x86, microcode, AMD: Fix early ucode loading")
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4750a0d112cbfcc744929f1530ffe3193436766c upstream.
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.
Because the ramdisk is exactly there:
[ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]
and we fault at 0x35e04304.
And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.
So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!
Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init
Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c52 ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 81f49a8fd7088cfcb588d182eeede862c0e3303e upstream.
is_compat_task() is the wrong check for audit arch; the check should
be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not
AUDIT_ARCH_I386.
CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has
no visible effect.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b47dcbdc5161d3d5756f430191e2840d9b855492 upstream.
If the TSC is unusable or disabled, then this patch fixes:
- Confusion while trying to clear old APIC interrupts.
- Division by zero and incorrect programming of the TSC deadline
timer.
This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC. The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline
This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 849f5d894383d25c49132437aa289c9a9c98d5df upstream.
Add Braswell PCI ID to list of supported ID's for the IOSF driver.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1411017231-20807-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit aece118e487a744eafcdd0c77fe32b55ee2092a1 upstream.
Intel processors which don't report cache information via cpuid(2)
or cpuid(4) need quirk code in the legacy_cache_size callback to
report this data. For Intel that callback is is intel_size_cache().
This patch enables calling of cpu_detect_cache_sizes() inside of
init_intel() and hence the calling of the legacy_cache callback in
intel_size_cache(). Adding this call will ensure that PIII Tualatin
currently in intel_size_cache() and Quark SoC X1000 being added to
intel_size_cache() in this patch will report their respective cache
sizes.
This model of calling cpu_detect_cache_sizes() is consistent with
AMD/Via/Cirix/Transmeta and Centaur.
Also added is a string to idenitfy the Quark as Quark SoC X1000
giving better and more descriptive output via /proc/cpuinfo
Adding cpu_detect_cache_sizes to init_intel() will enable calling
of intel_size_cache() on Intel processors which currently no code
can reach. Therefore this patch will also re-enable reporting
of PIII Tualatin cache size information as well as add
Quark SoC X1000 support.
Comment text and cache flow logic suggested by Thomas Gleixner
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 04725ad59474d24553d526fa774179ecd2922342 upstream.
Introduce PCI IDs macro for the list of supported product:
BayTrail & Quark X1000.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-5-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90916e048c1e0c1d379577e43ab9b8e331490cfb upstream.
Add PCI device ID, i.e. that of the Host Bridge,
for IOSF MBI driver.
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-4-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ef1def800e907edd28ddb1a5c64bae6b8749cdd upstream.
Added all the MBI units below and their associated read/write
opcodes:
- Host Bridge Arbiter
- Host Bridge
- Remote Management Unit
- Memory Manager & eSRAM
- SoC Unit
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-3-git-send-email-david.e.box@linux.intel.com
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6b8f0c8780c71d78624f736d7849645b64cc88b7 upstream.
Currently drivers that run on non-IOSF systems (Core/Xeon) can't use the IOSF
driver on SOC's without selecting it which forces an unnecessary and limiting
dependency. Provides dummy functions to allow these modules to conditionally
use the driver on IOSF equipped platforms without impacting their ability to
compile and load on non-IOSF platforms. Build default m to ensure availability
on x86 SOC's.
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Link: http://lkml.kernel.org/r/1399668248-24199-2-git-send-email-david.e.box@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Chang Rebecca Swee Fun <rebecca.swee.fun.chang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a642fc305053cc1c6e47e4f4df327895747ab485 upstream.
On systems with invvpid instruction support (corresponding bit in
IA32_VMX_EPT_VPID_CAP MSR is set) guest invocation of invvpid
causes vm exit, which is currently not handled and results in
propagation of unknown exit to userspace.
Fix this by installing an invvpid vm exit handler.
This is CVE-2014-3646.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 234f3ce485d54017f15cf5e0699cff4100121601 upstream.
Before changing rip (during jmp, call, ret, etc.) the target should be asserted
to be canonical one, as real CPUs do. During sysret, both target rsp and rip
should be canonical. If any of these values is noncanonical, a #GP exception
should occur. The exception to this rule are syscall and sysenter instructions
in which the assigned rip is checked during the assignment to the relevant
MSRs.
This patch fixes the emulator to behave as real CPUs do for near branches.
Far branches are handled by the next patch.
This fixes CVE-2014-3647.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 05c83ec9b73c8124555b706f6af777b10adf0862 upstream.
Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.
This patch fixes KVM behavior.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream.
KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application. Let's not kill the guest: WARN
and inject #UD instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 854e8bb1aa06c578c2c9145fa6bfe3680ef63b23 upstream.
Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).
Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.
Some references from Intel and AMD manuals:
According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."
According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."
This patch fixes CVE-2014-3610.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2febc839133280d5a5e8e1179c94ea674489dae2 upstream.
There's a race condition in the PIT emulation code in KVM. In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization. If the race condition occurs at the wrong time this
can crash the host kernel.
This fixes CVE-2014-3611.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8b3c3104c3f4f706e99365c3e0d2aa61b95f969f upstream.
The previous patch blocked invalid writes directly when the MSR
is written. As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.
Signed-off-by: Andrew Honig <ahonig@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d1cd1210834649ce1ca6bafe5ac25d2f40331343 upstream.
pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long <<
PAGE_SHIFT" will overflow for PFNs above 4GB.
Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V,
with 5GB memory assigned, can't load the netvsc driver successfully and
hence the synthetic network device can't work (we can use the kernel parameter
mem=3000M to work around the issue).
Cast pte_pfn() to phys_addr_t before shifting.
Fixes: "commit d76565344512: x86, mm: Create slow_virt_to_phys()"
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: linux-mm@kvack.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: jasowang@redhat.com
Cc: dave.hansen@intel.com
Cc: riel@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414580017-27444-1-git-send-email-decui@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 653bc77af60911ead1f423e588f54fc2547c4957 upstream.
Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
reads out of bounds, causing the NT fix to be unreliable. But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.
Excerpt from the crash:
[ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296
2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)
That read is deterministically above the top of the stack. I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.
Fixes: 8c7aa698baca ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <rusty@ozlabs.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c7aa698baca5e8f1ba9edb68081f1e7a1abf455 upstream.
The NT flag doesn't do anything in long mode other than causing IRET
to #GP. Oddly, CPL3 code can still set NT using popf.
Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.
If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble. For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen? Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault. That
segfault cannot be handled, because signal delivery fails, too.
This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.
To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it. As a result,
it seems to have very little effect on SYSENTER performance on my
machine.
There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.
Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
I haven't touched anything on 32-bit kernels.
The syscall mask change comes from a variant of this patch by Anish
Bhatt.
Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program. This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 66463db4fc5605d51c7bb81d009d5bf30a783a2c upstream.
save_xstate_sig()->drop_init_fpu() doesn't look right. setup_rt_frame()
can fail after that, in this case the next setup_rt_frame() triggered
by SIGSEGV won't save fpu simply because the old state was lost. This
obviously mean that fpu won't be restored after sys_rt_sigreturn() from
SIGSEGV handler.
Shift drop_init_fpu() into !failed branch in handle_signal().
Test-case (needs -O2):
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <pthread.h>
#include <assert.h>
volatile double D;
void test(double d)
{
int pid = getpid();
for (D = d; D == d; ) {
/* sys_tkill(pid, SIGHUP); asm to avoid save/reload
* fp regs around "C" call */
asm ("" : : "a"(200), "D"(pid), "S"(1));
asm ("syscall" : : : "ax");
}
printf("ERR!!\n");
}
void sigh(int sig)
{
}
char altstack[4096 * 10] __attribute__((aligned(4096)));
void *tfunc(void *arg)
{
for (;;) {
mprotect(altstack, sizeof(altstack), PROT_READ);
mprotect(altstack, sizeof(altstack), PROT_READ|PROT_WRITE);
}
}
int main(void)
{
stack_t st = {
.ss_sp = altstack,
.ss_size = sizeof(altstack),
.ss_flags = SS_ONSTACK,
};
struct sigaction sa = {
.sa_handler = sigh,
};
pthread_t pt;
sigaction(SIGSEGV, &sa, NULL);
sigaltstack(&st, NULL);
sa.sa_flags = SA_ONSTACK;
sigaction(SIGHUP, &sa, NULL);
pthread_create(&pt, NULL, tfunc, NULL);
test(123.456);
return 0;
}
Reported-by: Bean Anderson <bean@azulsystems.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175713.GA21646@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit df24fb859a4e200d9324e2974229fbb7adf00aef upstream.
Add preempt_disable() + preempt_enable() around math_state_restore() in
__restore_xstate_sig(). Otherwise __switch_to() after __thread_fpu_begin()
can overwrite fpu->state we are going to restore.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175717.GA21649@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0e6d3112a4e95d55cf6dca88f298d5f4b8f29bd1 upstream.
It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled. However all its syscalls
will fail, even _exit(). This usually causes it to segfault.
Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee1b5b165c0a2f04d2107e634e51f05d0eb107de upstream.
Quark x1000 advertises PGE via the standard CPUID method
PGE bits exist in Quark X1000's PTEs. In order to flush
an individual PTE it is necessary to reload CR3 irrespective
of the PTE.PGE bit.
See Quark Core_DevMan_001.pdf section 6.4.11
This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to
crash and burn on this platform.
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee3d1570b58677885b4552bce8217fda7b226a68 upstream.
vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.
If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.
We can prevent the following case:
vcpu (CPU 0) | thread (CPU 1)
--------------------------------------------+--------------------------
1 vm exit |
2 srcu_read_unlock(&kvm->srcu) |
3 decide to cache something based on |
old memslots |
4 | change memslots
| (increments generation)
5 | synchronize_srcu(&kvm->srcu);
6 retrieve generation # from new memslots |
7 tag cache with new memslot generation |
8 srcu_read_unlock(&kvm->srcu) |
... |
<action based on cache occurs even |
though the caching decision was based |
on the old memslots> |
... |
<action *continues* to occur until next |
memslot generation change, which may |
be never> |
|
By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).
Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots. It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.
To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE. This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited. Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 56f17dd3fbc44adcdbc3340fe3988ddb833a47a7 upstream.
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:
(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.
(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.
(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.
This patch fixes the issue by using the memslot generation number
to validate the mmio cache.
Signed-off-by: David Matlack <dmatlack@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 24223657806a0ebd0ae5c9caaf7b021091889cf2 upstream.
CPUs which should support the RAPL counters according to
Family/Model/Stepping may still issue #GP when attempting to access
the RAPL MSRs. This may happen when Linux is running under KVM and
we are passing-through host F/M/S data, for example. Use rdmsrl_safe
to first access the RAPL_POWER_UNIT MSR; if this fails, do not
attempt to use this PMU.
Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com
Cc: zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: ak@linux.intel.com
Cc: linux-kernel@vger.kernel.org
[ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[backport by whissi]
Cc: Thomas D <whissi@whissi.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream.
The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath
Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.
However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.
This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.
Yasuaki also reported that this can happen on real hardware:
https://lkml.org/lkml/2014/7/22/1018
His case is here:
==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockes is numbered as
follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119
Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
It means that last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.
When hot-removing socket#2 and #3, each core of sockets is
numbered as follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
remains having 0x3fff80000001fffc0000000.
After that, when hot-adding socket#2 and #3, each core of
sockets is numbered as follows:
| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119
Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0cacbfbeb5077b63d5d3cf6df88b14ac12ad584b upstream.
The KASLR location-choosing logic needs to avoid the setup_data
list memory areas as well. Without this, it would be possible to
have the ASLR position stomp on the memory, ultimately causing
the boot to fail.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140911161931.GA12001@www.outflux.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3eddc69ffeba092d288c386646bfa5ec0fce25fd upstream.
3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the
bottom line of screen.
Bisected, the first bad commit is below:
commit 86dfc6f339886559d80ee0d4bd20fe5ee90450f0
Author: Lv Zheng <lv.zheng@intel.com>
Date: Fri Apr 4 12:38:57 2014 +0800
ACPICA: Tables: Fix table checksums verification before installation.
I did some debugging by enabling both serial and efi earlyprintk, below is
some debug dmesg, seems early_ioremap fails in scroll up function due to
no free slot, see below dmesg output:
WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4()
__early_ioremap(ed00c800, 00000c80) not found slot
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204
Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013
Call Trace:
dump_stack+0x4e/0x7a
warn_slowpath_common+0x75/0x8e
? __early_ioremap+0x90/0x1c4
warn_slowpath_fmt+0x47/0x49
__early_ioremap+0x90/0x1c4
? sprintf+0x46/0x48
early_ioremap+0x13/0x15
early_efi_map+0x24/0x26
early_efi_scroll_up+0x6d/0xc0
early_efi_write+0x1b0/0x214
call_console_drivers.constprop.21+0x73/0x7e
console_unlock+0x151/0x3b2
? vprintk_emit+0x49f/0x532
vprintk_emit+0x521/0x532
? console_unlock+0x383/0x3b2
printk+0x4f/0x51
acpi_os_vprintf+0x2b/0x2d
acpi_os_printf+0x43/0x45
acpi_info+0x5c/0x63
? __acpi_map_table+0x13/0x18
? acpi_os_map_iomem+0x21/0x147
acpi_tb_print_table_header+0x177/0x186
acpi_tb_install_table_with_override+0x4b/0x62
acpi_tb_install_standard_table+0xd9/0x215
? early_ioremap+0x13/0x15
? __acpi_map_table+0x13/0x18
acpi_tb_parse_root_table+0x16e/0x1b4
acpi_initialize_tables+0x57/0x59
acpi_table_init+0x50/0xce
acpi_boot_table_init+0x1e/0x85
setup_arch+0x9b7/0xcc4
start_kernel+0x94/0x42d
? early_idt_handlers+0x120/0x120
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf3/0x100
Quote reply from Lv.zheng about the early ioremap slot usage in this case:
"""
In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer.
In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table().
We now need 2 mapping entries:
1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table.
2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it.
When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used.
The current 4 slots setting of early_ioremap() seems to be too small for such a use case.
"""
Thus increase the slot to 8 in this patch to fix this issue.
boot-time mappings become 512 page with this patch.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0b5a50635fc916cf46e3de0b819a61fc3f17e7ee upstream.
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded
modules exceeds 512 MiB, then loading modules fails with a warning
(and hence a vmalloc allocation failure) because the PTEs for the
newly-allocated vmalloc address space are not zero.
WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
vmap_page_range_noflush+0x2a1/0x360()
This is caused by xen_setup_kernel_pagetables() copying
level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present
entries.
Without KASLR, the normal kernel image size only covers the first half
of level2_kernel_pgt and module space starts after that.
L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel
[256..511]->module
[511]->level2_fixmap_pgt[ 0..505]->module
This allows 512 MiB of of module vmalloc space to be used before
having to use the corrupted level2_fixmap_pgt entries.
With KASLR enabled, the kernel image uses the full PUD range of 1G and
module space starts in the level2_fixmap_pgt. So basically:
L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
[511]->level2_fixmap_pgt[0..505]->module
And now no module vmalloc space can be used without using the corrupt
level2_fixmap_pgt entries.
Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
and setting level1_fixmap_pgt as read-only.
A number of comments were also using the the wrong L3 offset for
level2_kernel_pgt. These have been corrected.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8d5999df35314607c38fbd6bdd709e25c3a4eeab upstream.
If the timer irqs are resumed during device resume it is possible in
certain circumstances for the resume to hang early on, before device
interrupts are resumed. For an Ubuntu 14.04 PVHVM guest this would
occur in ~0.5% of resume attempts.
It is not entirely clear what is occuring the point of the hang but I
think a task necessary for the resume calls schedule_timeout(),
waiting for a timer interrupt (which never arrives). This failure may
require specific tasks to be running on the other VCPUs to trigger
(processes are not frozen during a suspend/resume if PREEMPT is
disabled).
Add IRQF_EARLY_RESUME to the timer interrupts so they are resumed in
syscore_resume().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7d951f3ccb0308c95bf76d5eef9886dea35a7013 upstream.
Commit b7dd0e350e0b (x86/xen: safely map and unmap grant frames when
in atomic context) causes PVH guests to crash in
arch_gnttab_map_shared() when they attempted to map the pages for the
grant table.
This use of a PV-specific function during the PVH grant table setup is
non-obvious and not needed. The standard vmap() function does the
right thing.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Tested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7b2a583afb4ab894f78bc0f8bd136e96b6499a7e upstream.
Without CONFIG_RELOCATABLE the early boot code will decompress the
kernel to LOAD_PHYSICAL_ADDR. While this may have been fine in the BIOS
days, that isn't going to fly with UEFI since parts of the firmware
code/data may be located at LOAD_PHYSICAL_ADDR.
Straying outside of the bounds of the regions we've explicitly requested
from the firmware will cause all sorts of trouble. Bruno reports that
his machine resets while trying to decompress the kernel image.
We already go to great pains to ensure the kernel is loaded into a
suitably aligned buffer, it's just that the address isn't necessarily
LOAD_PHYSICAL_ADDR, because we can't guarantee that address isn't in-use
by the firmware.
Explicitly enforce CONFIG_RELOCATABLE for the EFI boot stub, so that we
can load the kernel at any address with the correct alignment.
Reported-by: Bruno Prémont <bonbons@linux-vserver.org>
Tested-by: Bruno Prémont <bonbons@linux-vserver.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 53b884ac3745353de220d92ef792515c3ae692f0 upstream.
This commit in Linux 3.6:
commit c767a54ba0657e52e6edaa97cbe0b0a8bf1c1655
Author: Joe Perches <joe@perches.com>
Date: Mon May 21 19:50:07 2012 -0700
x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
caused warn_bad_vsyscall to output garbage in the middle of the
line. Revert the bad part of it.
The printk in question isn't actually bare; the level is "%s".
The bug this fixes is purely cosmetic; backports are optional.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit cbace46a9710a480cae51e4611697df5de41713e upstream.
Commit 30919b0bf356 ("x86: avoid low BIOS area when allocating address
space") moved the test for resource allocations that fall within the first
1MB of address space from the PCI-specific path to a generic path, such
that all resource allocations will avoid this area. However, this breaks
ISA cards which need to allocate a memory region within the first 1MB. An
example is the i82365 PCMCIA controller and derivatives like the Ricoh
RF5C296/396 which map part of the PCMCIA socket memory address space into
the first 1MB of system memory address space. They do not work anymore as
no usable memory region exists due to this change:
Intel ISA PCIC probe: Ricoh RF5C296/396 ISA-to-PCMCIA at port 0x3e0 ofs 0x00, 2 sockets
host opts [0]: none
host opts [1]: none
ISA irqs (scanned) = 3,4,5,9,10 status change on irq 10
pcmcia_socket pcmcia_socket1: pccard: PCMCIA card inserted into slot 1
pcmcia_socket pcmcia_socket0: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
pcmcia_socket pcmcia_socket0: cs: IO port probe 0xa00-0xaff: clean.
pcmcia_socket pcmcia_socket0: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
pcmcia_socket pcmcia_socket0: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
pcmcia_socket pcmcia_socket0: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
pcmcia_socket pcmcia_socket0: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
pcmcia_socket pcmcia_socket0: cs: memory probe 0x0d0000-0x0dffff: clean.
pcmcia_socket pcmcia_socket0: cs: memory probe 0x0e0000-0x0effff: clean.
pcmcia_socket pcmcia_socket0: cs: memory probe 0x60000000-0x60ffffff: clean.
pcmcia_socket pcmcia_socket0: cs: memory probe 0xa0000000-0xa0ffffff: clean.
pcmcia_socket pcmcia_socket1: cs: IO port probe 0xc00-0xcff: excluding 0xcf8-0xcff
pcmcia_socket pcmcia_socket1: cs: IO port probe 0xa00-0xaff: clean.
pcmcia_socket pcmcia_socket1: cs: IO port probe 0x100-0x3ff: excluding 0x170-0x177 0x1f0-0x1f7 0x2f8-0x2ff 0x370-0x37f 0x3c0-0x3e7 0x3f0-0x3ff
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0a0000-0x0affff: excluding 0xa0000-0xaffff
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0b0000-0x0bffff: excluding 0xb0000-0xbffff
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0c0000-0x0cffff: excluding 0xc0000-0xcbfff
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0d0000-0x0dffff: clean.
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0e0000-0x0effff: clean.
pcmcia_socket pcmcia_socket1: cs: memory probe 0x60000000-0x60ffffff: clean.
pcmcia_socket pcmcia_socket1: cs: memory probe 0xa0000000-0xa0ffffff: clean.
pcmcia_socket pcmcia_socket1: cs: memory probe 0x0cc000-0x0effff: excluding 0xe0000-0xeffff
pcmcia_socket pcmcia_socket1: cs: unable to map card memory!
If filtering out the first 1MB is reverted, everything works as expected.
Tested-by: Robert Resch <fli4l@robert.reschpara.de>
Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0d234daf7e0a3290a3a20c8087eefbd6335a5bd4 upstream.
This reverts commit 682367c494869008eb89ef733f196e99415ae862,
which causes 32-bit SMP Windows 7 guests to panic.
SeaBIOS has a limit on the number of MTRRs that it can handle,
and this patch exceeded the limit. Better revert it.
Thanks to Nadav Amit for debugging the cause.
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 56cc2406d68c0f09505c389e276f27a99f495cbd upstream.
After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info
if L1 asks us to), "Acknowledge interrupt on exit" behavior can be
emulated. To do so, KVM will ask the APIC for the interrupt vector if
during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv,
kvm_get_apic_interrupt would return -1 and give the following WARNING:
Call Trace:
[<ffffffff81493563>] dump_stack+0x49/0x5e
[<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
[<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
[<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
[<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
[<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
[<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
[<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]
To fix this, we cannot rely on the processor's virtual interrupt delivery,
because "acknowledge interrupt on exit" must only update the virtual
ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR)
but it should not deliver the interrupt through the IDT. Thus, KVM has
to deliver the interrupt "by hand", similar to the treatment of EOI in
commit fc57ac2c9ca8 (KVM: lapic: sync highest ISR to hardware apic on
EOI, 2014-05-14).
The patch modifies kvm_cpu_get_interrupt to always acknowledge an
interrupt; there are only two callers, and the other is not affected
because it is never reached with kvm_apic_vid_enabled() == true. Then it
modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition
to the registers.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Tested-by: Liu, RongrongX <rongrongx.liu@intel.com>
Tested-by: Felipe Reyes <freyes@suse.com>
Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9e8919ae793f4edfaa29694a70f71a515ae9942a upstream.
Return unhandlable error on inter-privilege level ret instruction. This is
since the current emulation does not check the privilege level correctly when
loading the CS, and does not pop RSP/SS as needed.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8762e5092828c4dc0f49da5a47a644c670df77f3 upstream.
init_espfix_ap() is currently off by one level when informing hypervisor
that allocated pages will be used for ministacks' page tables.
The most immediate effect of this on a PV guest is that if
'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
will refuse to use it for a page table (which it shouldn't be anyway). This will
result in warnings by both Xen and Linux.
More importantly, a subsequent write to that page (again, by a PV guest) is
likely to result in fatal page fault.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea9f9274bf4337ba7cbab241c780487651642d63 upstream.
Remove xen_enable_nmi() to fix a 64-bit guest crash when registering
the NMI callback on Xen 3.1 and earlier.
It's not needed since the NMI callback is set by a set_trap_table
hypercall (in xen_load_idt() or xen_write_idt_entry()).
It's also broken since it only set the current VCPU's callback.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 upstream.
This moves the espfix64 logic into native_iret. To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).
[ hpa: this is a nonzero cost on native, but probably not enough to
measure. Xen needs to fix this in their own code, probably doing
something equivalent to espfix64. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 34273f41d57ee8d854dcd2a1d754cbb546cb548f upstream.
Embedded systems, which may be very memory-size-sensitive, are
extremely unlikely to ever encounter any 16-bit software, so make it
a CONFIG_EXPERT option to turn off support for any 16-bit software
whatsoever.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 197725de65477bc8509b41388157c1a2283542bb upstream.
Make espfix64 a hidden Kconfig option. This fixes the x86-64 UML
build which had broken due to the non-existence of init_espfix_bsp()
in UML: since UML uses its own Kconfig, this option does not appear in
the UML build.
This also makes it possible to make support for 16-bit segments a
configuration option, for the people who want to minimize the size of
the kernel.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Richard Weinberger <richard@nod.at>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 20b68535cd27183ebd3651ff313afb2b97dac941 upstream.
Header guard is #ifndef, not #ifdef...
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e1fe9ed8d2a4937510d0d60e20705035c2609aea upstream.
Sparse warns that the percpu variables aren't declared before they are
defined. Rather than hacking around it, move espfix definitions into
a proper header file.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3891a04aafd668686239349ea58f3314ea2af86b upstream.
The IRET instruction, when returning to a 16-bit segment, only
restores the bottom 16 bits of the user space stack pointer. This
causes some 16-bit software to break, but it also leaks kernel state
to user space. We have a software workaround for that ("espfix") for
the 32-bit kernel, but it relies on a nonzero stack segment base which
is not available in 64-bit mode.
In checkin:
b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
the logic that 16-bit support is crippled on 64-bit kernels anyway (no
V86 support), but it turns out that people are doing stuff like
running old Win16 binaries under Wine and expect it to work.
This works around this by creating percpu "ministacks", each of which
is mapped 2^16 times 64K apart. When we detect that the return SS is
on the LDT, we copy the IRET frame to the ministack and use the
relevant alias to return to userspace. The ministacks are mapped
readonly, so if IRET faults we promote #GP to #DF which is an IST
vector and thus has its own stack; we then do the fixup in the #DF
handler.
(Making #GP an IST exception would make the msr_safe functions unsafe
in NMI/MC context, and quite possibly have other effects.)
Special thanks to:
- Andy Lutomirski, for the suggestion of using very small stack slots
and copy (as opposed to map) the IRET frame there, and for the
suggestion to mark them readonly and let the fault promote to #DF.
- Konrad Wilk for paravirt fixup and testing.
- Borislav Petkov for testing help and useful comments.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Lutomriski <amluto@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dirk Hohndel <dirk@hohndel.org>
Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
Cc: comex <comexk@gmail.com>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <stable@vger.kernel.org> # consider after upstream merge
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream.
This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in
preparation of merging in the proper fix (espfix64).
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c7fb93ec51d462ec3540a729ba446663c26a0505 upstream.
The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.
Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).
Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|