Age | Commit message (Collapse) | Author | Files | Lines |
|
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms. This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.
show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.
* Archs which didn't print debug info now do.
alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
um, xtensa
* Already prints debug info. Replaced with show_regs_print_info().
The printed information is superset of what used to be there.
arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86
* s390 is special in that it used to print arch-specific information
along with generic debug info. Heiko and Martin think that the
arch-specific extra isn't worth keeping s390 specfic implementation.
Converted to use the generic version.
Note that now all archs print the debug info before actual register
dumps.
An example BUG() dump follows.
kernel BUG at /work/os/work/kernel/workqueue.c:4841!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6
RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246
RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
Call Trace:
[<ffffffff81000312>] do_one_initcall+0x122/0x170
[<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
[<ffffffff81c47760>] ? rest_init+0x140/0x140
[<ffffffff81c4776e>] kernel_init+0xe/0xf0
[<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
[<ffffffff81c47760>] ? rest_init+0x140/0x140
...
v2: Typo fix in x86-32.
v3: CPU number dropped from show_regs_print_info() as
dump_stack_print_info() has been updated to print it. s390
specific implementation dropped as requested by s390 maintainers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
x86 and ia64 can acquire extra hardware identification information
from DMI and print it along with task dumps; however, the usage isn't
consistent.
* x86 show_regs() collects vendor, product and board strings and print
them out with PID, comm and utsname. Some of the information is
printed again later in the same dump.
* warn_slowpath_common() explicitly accesses the DMI board and prints
it out with "Hardware name:" label. This applies to both x86 and
ia64 but is irrelevant on all other archs.
* ia64 doesn't show DMI information on other non-WARN dumps.
This patch introduces arch-specific hardware description used by
dump_stack(). It can be set by calling dump_stack_set_arch_desc()
during boot and, if exists, printed out in a separate line with
"Hardware name:" label.
dmi_set_dump_stack_arch_desc() is added which sets arch-specific
description from DMI data. It uses dmi_ids_string[] which is set from
dmi_present() used for DMI debug message. It is superset of the
information x86 show_regs() is using. The function is called from x86
and ia64 boot code right after dmi_scan_machine().
This makes the explicit DMI handling in warn_slowpath_common()
unnecessary. Removed.
show_regs() isn't yet converted to use generic debug information
printing and this patch doesn't remove the duplicate DMI handling in
x86 show_regs(). The next patch will unify show_regs() handling and
remove the duplication.
An example WARN dump follows.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
[<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a0c3>] init_workqueues+0x35/0x505
...
v2: Use the same string as the debug message from dmi_present() which
also contains BIOS information. Move hardware name into its own
line as warn_slowpath_common() did. This change was suggested by
Bjorn Helgaas.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Both dump_stack() and show_stack() are currently implemented by each
architecture. show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack(). On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.
The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().
There's no reason to require archs to implement two separate but mostly
identical functions. It leads to unnecessary subtle information.
This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin. Blackfin's
dump_stack() does something wonky that I don't understand.
Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information. This is used
in blackfin.
This patch brings the following behavior changes.
* On some archs, an extra level in backtrace for show_stack() could be
printed. This is because the top frame was determined in
dump_stack() on those archs while generic dump_stack() can't do that
reliably. It can be compensated by inlining dump_stack() but not
sure whether that'd be necessary.
* Most archs didn't use to print debug info on dump_stack(). They do
now.
An example WARN dump follows.
WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
Hardware name: empty
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
Call Trace:
[<ffffffff81c614dc>] dump_stack+0x19/0x1b
[<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
[<ffffffff8234a071>] init_workqueues+0x35/0x505
...
v2: CPU number added to the generic debug info as requested by s390
folks and dropped the s390 specific dump_stack(). This loses %ksp
from the debug message which the maintainers think isn't important
enough to keep the s390-specific dump_stack() implementation.
dump_stack_print_info() is moved to kernel/printk.c from
lib/dump_stack.c. Because linkage is per objecct file,
dump_stack_print_info() living in the same lib file as generic
dump_stack() means that archs which implement custom dump_stack()
- at this point, only blackfin - can't use dump_stack_print_info()
as that will bring in the generic version of dump_stack() too. v1
The v1 patch broke build on blackfin due to this issue. The build
breakage was reported by Fengguang Wu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael J Wysocki:
- ARM big.LITTLE cpufreq driver from Viresh Kumar.
- exynos5440 cpufreq driver from Amit Daniel Kachhap.
- cpufreq core cleanup and code consolidation from Viresh Kumar and
Stratos Karafotis.
- cpufreq scalability improvement from Nathan Zimmer.
- AMD "frequency sensitivity feedback" powersave bias for the ondemand
cpufreq governor from Jacob Shin.
- cpuidle code consolidation and cleanups from Daniel Lezcano.
- ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano.
- ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv
Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto.
- ACPI core updates related to hotplug from Toshi Kani, Paul Bolle,
Yasuaki Ishimatsu, and Rafael J Wysocki.
- Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from
Rafael J Wysocki and Andy Shevchenko.
* tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (192 commits)
cpufreq: Revert incorrect commit 5800043
cpufreq: MAINTAINERS: Add co-maintainer
cpuidle: add maintainer entry
ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points
ARM: s3c64xx: cpuidle: use init/exit common routine
cpufreq: pxa2xx: initialize variables
ACPI: video: correct acpi_video_bus_add error processing
SH: cpuidle: use init/exit common routine
ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
ACPI: Fix wrong parameter passed to memblock_reserve
cpuidle: fix comment format
pnp: use %*phC to dump small buffers
isapnp: remove debug leftovers
ARM: imx: cpuidle: use init/exit common routine
ARM: davinci: cpuidle: use init/exit common routine
ARM: kirkwood: cpuidle: use init/exit common routine
ARM: calxeda: cpuidle: use init/exit common routine
ARM: tegra: cpuidle: use init/exit common routine for tegra3
ARM: tegra: cpuidle: use init/exit common routine for tegra2
ARM: OMAP4: cpuidle: use init/exit common routine
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
"This is a pretty large, multi-arch series unifying and generalizing
the various disjunct pieces of idle routines that architectures have
historically copied from each other and have grown in random, wildly
inconsistent and sometimes buggy directions:
101 files changed, 455 insertions(+), 1328 deletions(-)
this went through a number of review and test iterations before it was
committed, it was tested on various architectures, was exposed to
linux-next for quite some time - nevertheless it might cause problems
on architectures that don't read the mailing lists and don't regularly
test linux-next.
This cat herding excercise was motivated by the -rt kernel, and was
brought to you by Thomas "the Whip" Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
idle: Remove GENERIC_IDLE_LOOP config switch
um: Use generic idle loop
ia64: Make sure interrupts enabled when we "safe_halt()"
sparc: Use generic idle loop
idle: Remove unused ARCH_HAS_DEFAULT_IDLE
bfin: Fix typo in arch_cpu_idle()
xtensa: Use generic idle loop
x86: Use generic idle loop
unicore: Use generic idle loop
tile: Use generic idle loop
tile: Enter idle with preemption disabled
sh: Use generic idle loop
score: Use generic idle loop
s390: Use generic idle loop
powerpc: Use generic idle loop
parisc: Use generic idle loop
openrisc: Use generic idle loop
mn10300: Use generic idle loop
mips: Use generic idle loop
microblaze: Use generic idle loop
...
|
|
Resolve conflicts for Ingo.
Conflicts:
drivers/firmware/Kconfig
drivers/firmware/efivars.c
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
|
|
Fix build with CONFIG_PCI unset by linking KVM_CAP_IOMMU to
device assignment config option. It has no purpose otherwise.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
When booting on a large memory system, the kernel spends considerable
time in memmap_init_zone() setting up memory zones. Analysis shows
significant time spent in __early_pfn_to_nid().
The routine memmap_init_zone() checks each PFN to verify the nid is
valid. __early_pfn_to_nid() sequentially scans the list of pfn ranges
to find the right range and returns the nid. This does not scale well.
On a 4 TB (single rack) system there are 308 memory ranges to scan. The
higher the PFN the more time spent sequentially spinning through memory
ranges.
Since memmap_init_zone() increments pfn, it will almost always be
looking for the same range as the previous pfn, so check that range
first. If it is in the same range, return that nid. If not, scan the
list as before.
A 4 TB (single rack) UV1 system takes 512 seconds to get through the
zone code. This performance optimization reduces the time by 189
seconds, a 36% improvement.
A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
a 112.9 second (53%) reduction.
[akpm@linux-foundation.org: make the statics __meminitdata]
[akpm@linux-foundation.org: fix comment formatting]
[akpm@linux-foundation.org: fix ia64, per yinghai]
[akpm@linux-foundation.org: add missing semicolon, per Tony]
Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.
This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.
In addition, later patches mix huge page and regular page backing for
the vmemmap. For this, they need to call vmemmap_populate_basepages()
on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these
are not necessarily multiples of the 'struct page' size and so this unit
is too coarse.
Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit abf09bed3cce ("s390/mm: implement software dirty bits")
introduced another difference in the pte layout vs. the pmd layout on
s390, thoroughly breaking the s390 support for hugetlbfs. This requires
replacing some more pte_xxx functions in mm/hugetlbfs.c with a
huge_pte_xxx version.
This patch introduces those huge_pte_xxx functions and their generic
implementation in asm-generic/hugetlb.h, which will now be included on
all architectures supporting hugetlbfs apart from s390. This change
will be a no-op for those architectures.
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts]
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use common help functions to free reserved pages.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On large systems with a lot of memory, walking all RAM to determine page
types may take a half second or even more.
In non-blockable contexts, the page allocator will emit a page allocation
failure warning unless __GFP_NOWARN is specified. In such contexts, irqs
are typically disabled and such a lengthy delay may even result in NMI
watchdog timeouts.
To fix this, suppress the page walk in such contexts when printing the
page allocation failure warning.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Don't use create_proc_read_entry() as that is deprecated, but rather use
proc_create_data() and seq_file instead.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tony Luck <tony.luck@intel.com>
cc: Fenghua Yu <fenghua.yu@intel.com>
cc: linux-ia64@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver update from Greg Kroah-Hartman:
"Here's the big tty/serial driver merge request for 3.10-rc1
Once again, Jiri has a number of TTY driver fixes and cleanups, and
Peter Hurley came through with a bunch of ldisc fixes that resolve a
number of reported issues. There are some other serial driver
cleanups as well.
All of these have been in the linux-next tree for a while"
* tag 'tty-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (117 commits)
tty/serial/sirf: fix MODULE_DEVICE_TABLE
serial: mxs: drop superfluous {get|put}_device
serial: mxs: fix buffer overflow
ARM: PL011: add support for extended FIFO-size of PL011-r1p5
serial_core.c: add put_device() after device_find_child()
tty: Fix unsafe bit ops in tty_throttle_safe/unthrottle_safe
serial: sccnxp: Replace pdata.init/exit with regulator API
serial: sccnxp: Do not override device name
TTY: pty, fix compilation warning
TTY: rocket, fix compilation warning
TTY: ircomm: fix DTR being raised on hang up
TTY: synclinkmp: fix DTR being raised on hang up
TTY: synclink_gt: fix DTR being raised on hang up
TTY: synclink: fix DTR being raised on hang up
serial: 8250_dw: Fix the stub for dw8250_probe_acpi()
serial: 8250_dw: Convert to devm_ioremap()
serial: 8250_dw: Set port capabilities based on CPR register
serial: 8250_dw: Let ACPI code extract the DMA client info
serial: 8250_dw: Support clk framework also with ACPI
serial: 8250_dw: Enable runtime PM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"PCI changes for the v3.10 merge window:
PCI device hotplug
- Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
- Make acpiphp builtin only, not modular (Jiang Liu)
- Add acpiphp mutual exclusion (Jiang Liu)
Power management
- Skip "PME enabled/disabled" messages when not supported (Rafael
Wysocki)
- Fix fallback to PCI_D0 (Rafael Wysocki)
Miscellaneous
- Factor quirk_io_region (Yinghai Lu)
- Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
- Clean up EISA resource initialization and logging (Bjorn Helgaas)
- Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
- MIPS: Initialize of_node before scanning bus (Gabor Juhos)
- Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor
Juhos)
- Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
- Fix aer_inject return values (Prarit Bhargava)
- Remove PME/ACPI dependency (Andrew Murray)
- Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)"
* tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits)
vfio-pci: Use cached MSI/MSI-X capabilities
vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Remove "extern" from function declarations
PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
PCI: Use msix_table_size() directly, drop multi_msix_capable()
PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
PCI: Drop is_64bit_address() and is_mask_bit_support() macros
PCI: Drop msi_data_reg() macro
PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
PCI: Clean up MSI/MSI-X capability #defines
PCI: Use cached MSI-X cap while enabling MSI-X
PCI: Use cached MSI cap while enabling MSI interrupts
PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
PCI: Use u8, not int, for PM capability offset
[SCSI] megaraid_sas: Use correct #define for MSI-X capability
PCI: Remove "extern" from function declarations
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 fixes from Tony Luck:
"Bundle of miscellaneous ia64 fixes for 3.10 merge window."
* tag 'please-pull-misc-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
Add size restriction to the kdump documentation
Fix example error_injection_tool
Fix build error for numa_clear_node() under IA64
Fix initialization of CMCI/CMCP interrupts
Change "select DMAR" to "select INTEL_IOMMU"
Wrong asm register contraints in the kvm implementation
Wrong asm register contraints in the futex implementation
Remove cast for kmalloc return value
Fix kexec oops when iosapic was removed
iosapic: fix a minor typo in comments
Add WB/UC check for early_ioremap
Fix broken fsys_getppid()
tiocx: check retval from bus_register()
|
|
We hope to at some point deprecate KVM legacy device assignment in
favor of VFIO-based assignment. Towards that end, allow legacy
device assignment to be deconfigured.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
|
|
* pm-cpufreq: (57 commits)
cpufreq: MAINTAINERS: Add co-maintainer
cpufreq: pxa2xx: initialize variables
ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
cpufreq: cpu0: Put cpu parent node after using it
cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates
cpufreq: ARM big LITTLE: put DT nodes after using them
cpufreq: Don't call __cpufreq_governor() for drivers without target()
cpufreq: exynos5440: Protect OPP search calls with RCU lock
cpufreq: dbx500: Round to closest available freq
cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
cpufreq / intel_pstate: Optimize intel_pstate_set_policy
cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver
arm: exynos: Enable OPP library support for exynos5440
cpufreq: exynos: Remove error return even if no soc is found
cpufreq: exynos: Add cpufreq driver for exynos5440
cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor
cpufreq: ondemand: allow custom powersave_bias_target handler to be registered
cpufreq: convert cpufreq_driver to using RCU
cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq
cpufreq: sparc: move cpufreq driver to drivers/cpufreq
...
Conflicts:
MAINTAINERS (with commit a8e39c3 from pm-cpuidle)
drivers/cpufreq/cpufreq_governor.h (with commit beb0ff3)
|
|
We changed a few things in non-ia64 code paths. This patch blindly applies
the changes to the ia64 code as well, hoping it proves useful in case anyone
revives the ia64 kvm code.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c
The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.
The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.
An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.
Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.
Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.
Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the compile error with kvm_vm_ioctl_irq_line.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
All archs are converted over. Remove the config switch and the
fallback code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In commit d166991234347215dc23fc9dc15a63a83a1a54e1
idle: Implement generic idle function
Thomas Gleixner cleaned up many things but perturbed some
fragile code that was keeping ia64 alive. So we started
seeing:
WARNING: at kernel/cpu/idle.c:94 cpu_idle_loop+0x360/0x380()
and other unpleasantness like system hangs during boot.
We really shouldn't ever halt with interrupts disabled.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: magnus.damm@gmail.com
Link: http://lkml.kernel.org/r/516d9a0c26048eae9c@agluck-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
GENERIC_GPIO has been made equivalent to GPIOLIB in architecture code
and all driver code has been switch to depend on GPIOLIB. It is thus
safe to have GENERIC_GPIO removed.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
|
|
We want the fixes here.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Implement pcibios_{add|remove}_bus() hooks for IA64 platforms.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
|
|
This patch moves cpufreq driver of IA64 architecture to drivers/cpufreq.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A nasty bug in fs/namespace.c caught by Andrey + a couple of less
serious unpleasantness - ecryptfs misc device playing hopeless games
with try_module_get() and palinfo procfs support being... not quite
correctly done, to be polite."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
mnt: release locks on error path in do_loopback
palinfo fixes
procfs: add proc_remove_subtree()
ecryptfs: close rmmod race
|
|
* check for proc_mkdir() failures
* use remove_proc_subtree()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
* check for proc_mkdir() failures
* fix buffer overrun - sizeof(format string) is *not* enough to
hold sprintf() result.
* use proc_remove_subtree(); life's much easier with it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20130321215234.406851909@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move it to a common place. Preparatory patch for implementing
set/clear for the idle need_resched poll implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130321215233.446034505@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
numa_clear_node() function is not implemented under IA64,
it will be called in unmap_cpu_on_node() in mm/memory_hotplug.c.
This cause build error under IA64, this patch adds numa_clear_node()
in IA64 to fix this problem.
[Added __cpuinit notation to numa_clear_node() to keep linker happy -Tony]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Back 2010 during a revamp of the irq code some initializations
were moved from ia64_mca_init() to ia64_mca_late_init() in
commit c75f2aa13f5b268aba369b5dc566088b5194377c
Cannot use register_percpu_irq() from ia64_mca_init()
But this was hideously wrong. First of all these initializations
are now down far too late. Specifically after all the other cpus
have been brought up and initialized their own CMC vectors from
smp_callin(). Also ia64_mca_late_init() may be called from any cpu
so the line:
ia64_mca_cmc_vector_setup(); /* Setup vector on BSP */
is generally not executed on the BSP, and so the CMC vector isn't
setup at all on that processor.
Make use of the arch_early_irq_init() hook to get this code executed
at just the right moment: not too early, not too late.
Reported-by: Fred Hartnett <fred.hartnett@hp.com>
Tested-by: Fred Hartnett <fred.hartnett@hp.com>
Cc: stable@kernel.org # v2.6.37+
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
policy->cpus contains all online cpus that have single shared clock line. And
their frequencies are always updated together.
Many SMP system's cpufreq drivers take care of this in individual drivers but
the best place for this code is in cpufreq core.
This patch modifies cpufreq_notify_transition() to notify frequency change for
all cpus in policy->cpus and hence updates all users of this API.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
We need the fixes here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Conflicts:
net/mac80211/sta_info.c
net/wireless/core.h
Two minor conflicts in wireless. Overlapping additions of extern
declarations in net/wireless/core.h and a bug fix overlapping with
the addition of a boolean parameter to __ieee80211_key_free().
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, when a socket receives something on the error queue it only wakes up
the socket on select if it is in the "read" list, that is the socket has
something to read. It is useful also to wake the socket if it is in the error
list, which would enable software to wait on error queue packets without waking
up for regular data on the socket. The main use case is for receiving
timestamped transmit packets which return the timestamp to the socket via the
error queue. This enables an application to select on the socket for the error
queue only instead of for the regular traffic.
-v2-
* Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
* Modified every socket poll function that checks error queue
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 3e7fc708eb41 ("ia64 idle: delete pm_idle") in 3.9-rc1 didn't
finish the job, leaving an un-initialized reference to (*idle)().
[ Haven't seen a crash from this - but seems like we are just being
lucky that "idle" is zero so it does get initialized before we jump to
randomland - Len ]
Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
Merge reason:
From: Alexander Graf <agraf@suse.de>
"Just recently this really important patch got pulled into Linus' tree for 3.9:
commit 1674400aaee5b466c595a8fc310488263ce888c7
Author: Anton Blanchard <anton <at> samba.org>
Date: Tue Mar 12 01:51:51 2013 +0000
Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.
Could you please merge kvm/next against linus/master, so that I can base my trees against that?"
* upstream/master: (653 commits)
PCI: Use ROM images from firmware only if no other ROM source available
sparc: remove unused "config BITS"
sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
inet: limit length of fragment queue hash table bucket lists
qeth: Fix scatter-gather regression
qeth: Fix invalid router settings handling
qeth: delay feature trace
sgy-cts1000: Remove __dev* attributes
KVM: x86: fix deadlock in clock-in-progress request handling
KVM: allow host header to be included even for !CONFIG_KVM
hwmon: (lm75) Fix tcn75 prefix
hwmon: (lm75.h) Update header inclusion
MAINTAINERS: Remove Mark M. Hoffman
xfs: ensure we capture IO errors correctly
xfs: fix xfs_iomap_eof_prealloc_initial_size type
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
|
Commit d3f138106b ("iommu: Rename the DMAR and INTR_REMAP config
options") changed all references to DMAR in Kconfig files to INTEL_IOMMU
(and, likewise, changed the references to CONFIG_DMAR everywhere else
to CONFIG_INTEL_IOMMU). That commit missed one "select DMAR" statement
in ia64's Kconfig file. Change that one too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/kvm/vtlb.c.
I observed this on Kernel 3.2.35 but it is also true on the most
recent Kernel 3.9-rc1.
File arch/ia64/kvm/vtlb.c:
u64 guest_vhpt_lookup(u64 iha, u64 *pte)
{
u64 ret;
struct thash_data *data;
data = __vtr_lookup(current_vcpu, iha, D_TLB);
if (data != NULL)
thash_vhpt_insert(current_vcpu, data->page_flags,
data->itir, iha, D_TLB);
asm volatile (
"rsm psr.ic|psr.i;;"
"srlz.d;;"
"ld8.s r9=[%1];;"
"tnat.nz p6,p7=r9;;"
"(p6) mov %0=1;"
"(p6) mov r9=r0;"
"(p7) extr.u r9=r9,0,53;;"
"(p7) mov %0=r0;"
"(p7) st8 [%2]=r9;;"
"ssm psr.ic;;"
"srlz.d;;"
"ssm psr.i;;"
"srlz.d;;"
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
return ret;
}
The list of output registers is
: "=r"(ret) : "r"(iha), "r"(pte):"memory");
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are iha, pte on the example.
If the predicate p7 is true, the 8th assembly instruction
"(p7) mov %0=r0;"
is the first one which writes to a register which is maintained by the
register constraints; it sets %0. %0 means the first register operand;
it is ret here.
This instruction might overwrite the %2 register (pte) which is needed
by the next instruction:
"(p7) st8 [%2]=r9;;"
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The attached patch fixes the register operand constraints in
arch/ia64/kvm/vtlb.c.
The register constraints should be
: "=&r"(ret) : "r"(iha), "r"(pte):"memory");
The & means that GCC must not use any of the input registers to place
this output register in.
This is Debian bug#702639
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702639).
The patch is applicable on Kernel 3.9-rc1, 3.2.35 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The Linux Kernel contains some inline assembly source code which has
wrong asm register constraints in arch/ia64/include/asm/futex.h.
I observed this on Kernel 3.2.23 but it is also true on the most
recent Kernel 3.9-rc1.
File arch/ia64/include/asm/futex.h:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
{
register unsigned long r8 __asm ("r8");
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov %0=r0 \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "=r" (r8), "=r" (prev)
: "r" (uaddr), "r" (newval),
"rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}
The list of output registers is
: "=r" (r8), "=r" (prev)
The constraint "=r" means that the GCC has to maintain that these vars
are in registers and contain valid info when the program flow leaves
the assembly block (output registers).
But "=r" also means that GCC can put them in registers that are used
as input registers. Input registers are uaddr, newval, oldval on the
example.
The second assembly instruction
" mov %0=r0 \n"
is the first one which writes to a register; it sets %0 to 0. %0 means
the first register operand; it is r8 here. (The r0 is read-only and
always 0 on the Itanium; it can be used if an immediate zero value is
needed.)
This instruction might overwrite one of the other registers which are
still needed.
Whether it really happens depends on how GCC decides what registers it
uses and how it optimizes the code.
The objdump utility can give us disassembly.
The futex_atomic_cmpxchg_inatomic() function is inline, so we have to
look for a module that uses the funtion. This is the
cmpxchg_futex_value_locked() function in
kernel/futex.c:
static int cmpxchg_futex_value_locked(u32 *curval, u32 __user *uaddr,
u32 uval, u32 newval)
{
int ret;
pagefault_disable();
ret = futex_atomic_cmpxchg_inatomic(curval, uaddr, uval, newval);
pagefault_enable();
return ret;
}
Now the disassembly. At first from the Kernel package 3.2.23 which has
been compiled with GCC 4.4, remeber this Kernel seemed to work:
objdump -d linux-3.2.23/debian/build/build_ia64_none_mckinley/kernel/futex.o
0000000000000230 <cmpxchg_futex_value_locked>:
230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;;
236: 80 40 0d 00 42 00 adds r8=40,r3
23c: 00 00 04 00 nop.i 0x0;;
240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];;
246: 90 08 28 00 42 00 adds r9=1,r10
24c: 00 00 04 00 nop.i 0x0;;
250: 09 00 00 00 01 00 [MMI] nop.m 0x0
256: 00 48 20 20 23 00 st4 [r8]=r9
25c: 00 00 04 00 nop.i 0x0;;
260: 08 10 80 06 00 21 [MMI] adds r2=32,r3
266: 00 00 00 02 00 00 nop.m 0x0
26c: 02 08 f1 52 extr.u r16=r33,0,61
270: 05 40 88 00 08 e0 [MLX] addp4 r8=r34,r0
276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;;
27c: f1 f7 ff 65
280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2]
286: 00 00 00 02 00 c0 nop.m 0x0
28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;;
290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14
296: 00 00 00 02 00 40 nop.m 0x0
29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33
2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
2d0: 10 00 84 40 90 11 [MIB] st4 [r32]=r33
2d6: 00 00 00 02 00 00 nop.i 0x0
2dc: 20 00 00 40 br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14
2e6: 00 00 00 02 00 00 nop.m 0x0
2ec: 00 00 04 00 nop.i 0x0;;
2f0: 0b 58 20 1a 19 21 [MMI] adds r11=3208,r13;;
2f6: 20 01 2c 20 20 00 ld4 r18=[r11]
2fc: 00 00 04 00 nop.i 0x0;;
300: 0b 88 fc 25 3f 23 [MMI] adds r17=-1,r18;;
306: 00 88 2c 20 23 00 st4 [r11]=r17
30c: 00 00 04 00 nop.i 0x0;;
310: 11 00 00 00 01 00 [MIB] nop.m 0x0
316: 00 00 00 02 00 80 nop.i 0x0
31c: 08 00 84 00 br.ret.sptk.many b0;;
The lines
2b0: 0a 00 00 00 22 00 [MMI] mf;;
2b6: 80 00 00 00 42 00 mov r8=r0
2bc: 00 00 04 00 nop.i 0x0
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
2c6: 10 1a 85 22 20 00 cmpxchg4.acq r33=[r33],r35,ar.ccv
2cc: 00 00 04 00 nop.i 0x0;;
are the instructions of the assembly block.
The line
2b6: 80 00 00 00 42 00 mov r8=r0
sets the r8 register to 0 and after that
2c0: 0b 00 20 40 2a 04 [MMI] mov.m ar.ccv=r8;;
prepares the 'oldvalue' for the cmpxchg but it takes it from r8. This
is wrong.
What happened here is what I explained above: An input register is
overwritten which is still needed.
The register operand constraints in futex.h are wrong.
(The problem doesn't occur when the Kernel is compiled with GCC 4.6.)
The attached patch fixes the register operand constraints in futex.h.
The code after patching of it:
static inline int
futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
u32 oldval, u32 newval)
{
if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
return -EFAULT;
{
register unsigned long r8 __asm ("r8") = 0;
unsigned long prev;
__asm__ __volatile__(
" mf;; \n"
" mov ar.ccv=%4;; \n"
"[1:] cmpxchg4.acq %1=[%2],%3,ar.ccv \n"
" .xdata4 \"__ex_table\", 1b-., 2f-. \n"
"[2:]"
: "+r" (r8), "=&r" (prev)
: "r" (uaddr), "r" (newval),
"rO" ((long) (unsigned) oldval)
: "memory");
*uval = prev;
return r8;
}
}
I also initialized the 'r8' var with the C programming language.
The _asm qualifier on the definition of the 'r8' var forces GCC to use
the r8 processor register for it.
I don't believe that we should use inline assembly for zeroing out a
local variable.
The constraint is
"+r" (r8)
what means that it is both an input register and an output register.
Note that the page fault handler will modify the r8 register which
will be the return value of the function.
The real fix is
"=&r" (prev)
The & means that GCC must not use any of the input registers to place
this output register in.
Patched the Kernel 3.2.23 and compiled it with GCC4.4:
0000000000000230 <cmpxchg_futex_value_locked>:
230: 0b 18 80 1b 18 21 [MMI] adds r3=3168,r13;;
236: 80 40 0d 00 42 00 adds r8=40,r3
23c: 00 00 04 00 nop.i 0x0;;
240: 0b 50 00 10 10 10 [MMI] ld4 r10=[r8];;
246: 90 08 28 00 42 00 adds r9=1,r10
24c: 00 00 04 00 nop.i 0x0;;
250: 09 00 00 00 01 00 [MMI] nop.m 0x0
256: 00 48 20 20 23 00 st4 [r8]=r9
25c: 00 00 04 00 nop.i 0x0;;
260: 08 10 80 06 00 21 [MMI] adds r2=32,r3
266: 20 12 01 10 40 00 addp4 r34=r34,r0
26c: 02 08 f1 52 extr.u r16=r33,0,61
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
276: ff ff 0f 00 00 e0 movl r15=0xfffffffbfff;;
27c: f1 f7 ff 65
280: 09 70 00 04 18 10 [MMI] ld8 r14=[r2]
286: 00 00 00 02 00 c0 nop.m 0x0
28c: f0 80 1c d0 cmp.ltu p6,p7=r15,r16;;
290: 08 40 fc 1d 09 3b [MMI] cmp.eq p8,p9=-1,r14
296: 00 00 00 02 00 40 nop.m 0x0
29c: e1 08 2d d0 cmp.ltu p10,p11=r14,r33
2a0: 56 01 10 00 40 10 [BBB] (p10) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2a6: 02 08 00 80 21 03 (p08) br.cond.dpnt.few 2b0
<cmpxchg_futex_value_locked+0x80>
2ac: 40 00 00 41 (p06) br.cond.spnt.few 2e0
<cmpxchg_futex_value_locked+0xb0>
2b0: 0b 00 00 00 22 00 [MMI] mf;;
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
2bc: 00 00 04 00 nop.i 0x0;;
2c0: 09 58 8c 42 11 10 [MMI] cmpxchg4.acq r11=[r33],r35,ar.ccv
2c6: 00 00 00 02 00 00 nop.m 0x0
2cc: 00 00 04 00 nop.i 0x0;;
2d0: 10 00 2c 40 90 11 [MIB] st4 [r32]=r11
2d6: 00 00 00 02 00 00 nop.i 0x0
2dc: 20 00 00 40 br.few 2f0
<cmpxchg_futex_value_locked+0xc0>
2e0: 09 40 c8 f9 ff 27 [MMI] mov r8=-14
2e6: 00 00 00 02 00 00 nop.m 0x0
2ec: 00 00 04 00 nop.i 0x0;;
2f0: 0b 88 20 1a 19 21 [MMI] adds r17=3208,r13;;
2f6: 30 01 44 20 20 00 ld4 r19=[r17]
2fc: 00 00 04 00 nop.i 0x0;;
300: 0b 90 fc 27 3f 23 [MMI] adds r18=-1,r19;;
306: 00 90 44 20 23 00 st4 [r17]=r18
30c: 00 00 04 00 nop.i 0x0;;
310: 11 00 00 00 01 00 [MIB] nop.m 0x0
316: 00 00 00 02 00 80 nop.i 0x0
31c: 08 00 84 00 br.ret.sptk.many b0;;
Much better.
There is a
270: 05 40 00 00 00 e1 [MLX] mov r8=r0
which was generated by C code r8 = 0. Below
2b6: 00 10 81 54 08 00 mov.m ar.ccv=r34
what means that oldval is no longer overwritten.
This is Debian bug#702641
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702641).
The patch is applicable on Kernel 3.9-rc1, 3.2.23 and many other versions.
Signed-off-by: Stephan Schreiber <info@fs-driver.org>
Cc: stable@vger.kernel.org
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
remove cast for kmalloc return value.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Iosapic hotplug was supported in IA64 code, but will lead to kexec oops
when iosapic was removed. here is the code logic:
iosapic_remove
iosapic_free
memset(&iosapic_lists[index], 0, sizeof(iosapic_lists[0]))
iosapic_lists[index].addr was set to 0;
and then kexec a new kernel
kexec_disable_iosapic
iosapic_write(rte->iosapic,..)
__iosapic_write(iosapic->addr, reg, val);
addr was set to 0 when iosapic_remove, and oops happened
The call trace is:
Starting new kernel
kexec[11336]: Oops 8804682956800 [1]
Modules linked in: raw(N) ipv6(N) acpi_cpufreq(N) binfmt_misc(N) fuse(N) nls_iso
8859_1(N) loop(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N) mca_ereport(N) s
csi_ereport(N) nic_ereport(N) pcie_ereport(N) err_transport(N) nvlist(PN) dm_mod
(N) tpm_tis(N) tpm(N) ppdev(N) tpm_bios(N) serio_raw(N) i2c_i801(N) iTCO_wdt(N)
i2c_core(N) iTCO_vendor_support(N) sg(N) ioatdma(N) igb(N) mptctl(N) dca(N) parp
ort_pc(N) parport(N) container(N) button(N) usbhid(N) hid(N) uhci_hcd(N) ehci_hc
d(N) usbcore(N) sd_mod(N) crc_t10dif(N) ext3(N) mbcache(N) jbd(N) fan(N) process
or(N) ide_pci_generic(N) ide_core(N) ata_piix(N) libata(N) mptsas(N) mptscsih(N)
mptbase(N) scsi_transport_sas(N) scsi_mod(N) thermal(N) thermal_sys(N) hwmon(N)
Supported: Yes, External
Pid: 11336, CPU 0, comm: kexec
psr : 0000101009522030 ifs : 8000000000000791 ip : [<a00000010004c160>] Tain
ted: P N (2.6.32.12_RAS_V1R3C00B011)
ip is at kexec_disable_iosapic+0x120/0x1e0
unat: 0000000000000000 pfs : 0000000000000791 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 65519aa6a555a659
ldrs: 0000000000000000 ccv : 00000000ea3cf51e fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a00000010004c150 b6 : a000000100012620 b7 : a00000010000cda0
f6 : 000000000000000000000 f7 : 1003e0000000002000000
f8 : 1003e0000000050000003 f9 : 1003e0000028fb97183cd
f10 : 1003ee9f380df3c548b67 f11 : 1003e00000000000000cc
r1 : a0000001016cf660 r2 : 0000000000000000 r3 : 0000000000000000
r8 : 0000001009526030 r9 : a000000100012620 r10 : e00000010053f600
r11 : c0000000fec34040 r12 : e00000078f76fd30 r13 : e00000078f760000
r14 : 0000000000000000 r15 : 0000000000000000 r16 : 0000000000000000
r17 : 0000000000000000 r18 : 0000000000007fff r19 : 0000000000000000
r20 : 0000000000000000 r21 : e00000010053f590 r22 : a000000100cf0000
r23 : 0000000000000036 r24 : e0000007002f8a84 r25 : 0000000000000022
r26 : e0000007002f8a88 r27 : 0000000000000020 r28 : 0000000000000002
r29 : a0000001012c8c60 r30 : 0000000000000000 r31 : 0000000000322e49
Call Trace:
[<a000000100018ca0>] show_stack+0x80/0xa0
sp=e00000078f76f8f0 bsp=e00000078f761380
[<a000000100019300>] show_regs+0x640/0x920
sp=e00000078f76fac0 bsp=e00000078f761328
[<a00000010002a130>] die+0x190/0x2e0
sp=e00000078f76fad0 bsp=e00000078f7612e8
[<a000000100922fa0>] ia64_do_page_fault+0x840/0xb20
sp=e00000078f76fad0 bsp=e00000078f761288
[<a00000010000d5c0>] ia64_native_leave_kernel+0x0/0x270
sp=e00000078f76fb60 bsp=e00000078f761288
[<a00000010004c160>] kexec_disable_iosapic+0x120/0x1e0
sp=e00000078f76fd30 bsp=e00000078f761200
[<a000000100016970>] machine_shutdown+0x110/0x140
sp=e00000078f76fd30 bsp=e00000078f7611c8
[<a000000100133530>] kernel_kexec+0xd0/0x120
sp=e00000078f76fd30 bsp=e00000078f7611a0
[<a0000001000eca40>] sys_reboot+0x480/0x4e0
sp=e00000078f76fd30 bsp=e00000078f761128
[<a00000010000d420>] ia64_ret_from_syscall+0x0/0x20
sp=e00000078f76fe30 bsp=e00000078f761120
Kernel panic - not syncing: Fatal exception
With Tony and Toshi's advice, the patch removes the "rte" from rte_list
when the iosapic was removed.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
describeinterrupts -> describe interrupts
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
On ia64 system, the function early_ioremap returned an uncached memory
reference without checking whether this was consistent with existing
mappings. This causes efi error and the kernel failed during boot. Add a
check to test whether memory has EFI_MEMORY_WB set. Use the function
kern_mem_attribute() in early_iomap() function to provide appropriate
cacheable or uncacheable mapped address.
See the document Documentation/ia64/aliasing.txt for more details.
Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|