Age | Commit message (Collapse) | Author | Files | Lines |
|
commit fa19ac4b92bc2b5024af3e868f41f81fa738567a upstream.
Fix UE event being reported as HW_EVENT_ERR_CORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8030122a9ccf939186f8db96c318dbb99b5463f6 upstream.
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8a3f075d6c9b3612b4a5fb2af8db82b38b20caf0 upstream.
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ab0543de6ff0877474f57a5aafbb51a61e88676f upstream.
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
|
|
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
|
|
If you do
echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec
the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.
WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8).
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
__list_add+0xac/0xc0
__internal_add_timer+0xab/0x130
internal_add_timer+0x17/0x40
mod_timer_pinned+0xca/0x170
intel_pstate_timer_func+0x28a/0x380
call_timer_fn+0x36/0x100
run_timer_softirq+0x1ff/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
kernel BUG at kernel/timer.c:1084!
invalid opcode: 0000 [#1] SMP
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
run_timer_softirq+0x245/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
RIP cascade+0x93/0xa0
WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Workqueue: edac-poller edac_mc_workq_function [edac_core]
Call Trace:
dump_stack+0x45/0x56
warn_slowpath_common+0x7d/0xa0
warn_slowpath_null+0x1a/0x20
__queue_delayed_work+0xed/0x1a0
queue_delayed_work_on+0x27/0x50
edac_mc_workq_function+0x72/0xa0 [edac_core]
process_one_work+0x17b/0x460
worker_thread+0x11b/0x400
kthread+0xd2/0xf0
ret_from_fork+0x7c/0xb0
This patch adds a range check in the edac_mc_poll_msec code to check for 0.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
- SCI reporting for other error types not only correctable ones
- GHES cleanups
- Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which,
if done properly in the BIOS, makes a corresponding EDAC module
redundant
- PCIe AER tracepoint severity levels fix
- Error path correction for the mce device init
- MCE timer fix
- Add more flexibility to the error injection (EINJ) debugfs interface
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, mce: Fix mce_start_timer semantics
ACPI, APEI, GHES: Cleanup ghes memory error handling
ACPI, APEI: Cleanup alignment-aware accesses
ACPI, APEI, GHES: Do not report only correctable errors with SCI
ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface
ACPI, eMCA: Combine eMCA/EDAC event reporting priority
EDAC, sb_edac: Modify H/W event reporting policy
EDAC: Add an edac_report parameter to EDAC
PCI, AER: Fix severity usage in aer trace event
x86, mce: Call put_device on device_register failure
|
|
Pull EDAC updates from Borislav Petkov:
- mpc85xx PCIe error interrupt support
- misc small enhancements/fixes all over the place.
* tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Don't try to cancel workqueue when it's never setup
e752x_edac: Fix pci_dev usage count
sb_edac: Mark get_mci_for_node_id as static
EDAC: Mark edac_create_debug_nodes as static
amd64_edac: Remove "amd64" prefix from static functions
amd64_edac: Simplify code around decode_bus_error
amd64_edac: Mark amd64_decode_bus_error as static
EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro
amd64_edac: Fix condition to verify max channels allowed for F15 M30h
edac/85xx: Add PCIe error interrupt edac support
|
|
We only setup a workqueue for edac devices that use the polling
method. We still try to cancel the workqueue if an edac_device
uses the irq method though. This causes a warning from debug
objects when we remove an edac device:
WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0()
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28
Modules linked in: krait_edac(-)
CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69
(unwind_backtrace+0x0/0x144)
(show_stack+0x20/0x24)
(dump_stack+0x74/0xb4)
(warn_slowpath_common+0x78/0x9c)
(warn_slowpath_fmt+0x40/0x48)
(debug_print_object+0x98/0xc0)
(debug_object_assert_init+0xdc/0xec)
(del_timer+0x24/0x7c)
(try_to_grab_pending+0xc0/0x1b0)
(cancel_delayed_work+0x2c/0xa0)
(edac_device_workq_teardown+0x1c/0x38)
(edac_device_del_device+0xb8/0xe4)
(krait_edac_remove+0x50/0x70 [krait_edac])
(platform_drv_remove+0x24/0x28)
(__device_release_driver+0x68/0xc0)
(driver_detach+0xc4/0xc8)
(bus_remove_driver+0xac/0x114)
(driver_unregister+0x38/0x58)
(platform_driver_unregister+0x1c/0x20)
(krait_edac_driver_exit+0x14/0x1c [krait_edac])
(SyS_delete_module+0x178/0x2b4)
Fix it by skipping the workqueue teardown for such devices.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS updates from Borislav Petkov:
* Add the functionality to override error reporting agents as some
machines are sporting a new extended error logging capability which, if
done properly in the BIOS, makes a corresponding EDAC module redundant,
from Gong Chen.
* PCIe AER tracepoint severity levels fix, from Rui Wang.
* Error path correction for the mce device init, from Levente Kurusa.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This patch marks the function get_mci_for_node_id() as static because it
is not used outside of sb_edac.c.
Thus, it also eliminates the following warning:
drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.
Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
No need for the namespace tagging there. Cleanup setup_pci_device while
at it.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Drop wrapper function and prefixes.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This patch marks the function amd64_decode_bus_error() as static because
it is not used outside of amd64_edac.c.
It also eliminates the following warning:
drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Newer Intel platforms support more than one method to report H/W event.
On this kind of platform, H/W event report can adopt new method and
traditional EDAC method should be disabled. Moreover, if EDAC event
report method is set to *force*, it means event must be reported via
EDAC interface. IOW, it overrides the default event report policy.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit and error messages ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Currently, there is no other bus that has something like this macro for
their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com
[ Boris: swap commit message with better one. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The value returned from 'f15_m30h_determine_channel' will
always be 0x3 max. The condition
(channel > 4 || channel < 0)
works as hardware never returns a value of 4, but
it leads to static checker analysis errors like
http://marc.info/?l=linux-edac&m=138607615131951&w=2.
Fix that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain
[ Boris: massage commit message a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fix this:
In file included from drivers/edac/sb_edac.c:27:0:
drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized]
printk(level "EDAC " prefix ": " fmt, ##arg)
^
drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here
u64 ch_addr, offset, limit, prv = 0;
Limit can be initialized to 0. The only way limit wouldn't be
initialized is if there are no DIMMs present (which would be a bug of
course) and it'd fail on the next test.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and
p5020. The mpc85xx uses the legacy interrupt report mechanism - the
error interrupts are reported directly to mpic. While the p3041/
p4080/p5020 attaches the most of error interrupts to interrupt zero. And
report error interrupts to mpic via interrupt 0.
This patch can handle both of them.
Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com>
Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Dave Jiang <dave.jiang@gmail.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC driver updates from Mauro Carvalho Chehab:
- sb_edac: add support for Ivy Bridge support
- cell_edac: add a missing of_node_put() call
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
cell_edac: fix missing of_node_put
sb_edac: add support for Ivy Bridge
sb_edac: avoid decoding the same error multiple times
sb_edac: rename mci_bind_devs()
sb_edac: enable multiple PCI id tables to be used
sb_edac: rework sad_pkg
sb_edac: allow different interleave lists
sb_edac: allow different dram_rule arrays
sb_edac: isolate TOHM retrieval
sb_edac: rename pci_br
sb_edac: isolate TOLM retrieval
sb_edac: make RANK_CFG_A value part of sbridge_info
|
|
Pull EDAC updates from Borislav Petkov:
"Following up on last week's discussion, here's my part of the EDAC
pile, highlights in the signed tag.
The last two patches have a date from just now because I've just
applied them to the tree after Johannes sent them to me earlier. I
decided to forward them now because they're trivial.
There's a third one for MPC85xx which adds PCIe error interrupt
support but since it is not so trivial and hasn't seen any linux-next
time, I'm deferring it to 3.14
EDAC update highlights:
- Support for Calxeda ECX-2000 memory controller, from Robert Richter
- Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob
Herring and Robert Richter
- New maintainer for Freescale's MPC85xx EDAC driver: Johannes
Thumshirn"
* tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac/85xx: Remove mpc85xx_pci_err_remove
EDAC: Add edac-mpc85xx driver to MAINTAINERS
edac, highbank: Moving error injection to sysfs for edac
edac, highbank: Add MAINTAINERS entry
edac: Unify reporting of device info for device, mc and pci
edac, highbank: Improve and unify naming
edac, highbank: Add Calxeda ECX-2000 support
ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi
edac, highbank: Fix interrupt setup of mem and l2 controller
|
|
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the
compiler warning which can be seen when building the driver either
statically or as a module.
Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com>
Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Decrease device_node refcount np after task completion.
Signed-off-by: Libo Chen <libo.chen@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's
wiser to modify sb_edac to support both instead of creating another
driver.
[m.chehab@samsung.com: Fix CodingStyle]
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
Whenever the extended error reporting is active, multiple MCEs will be
generated for the same event, which will lead to multiple repeated
errors to be reported. So check ADDRV and only decode the error if the
MCE address is valid.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy
Bridge.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation for Ivy Bridge support
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is preparation of Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
Ivy Bridge has more than one, so rename pci_br to pci_br0
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation for the Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This is in preparation of Ivy Bridge support.
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.
- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"
* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
|
|
Always have the error injection i/f available, even if there is no
debugfs or EDAC_DEBUG enabled. We need this for testing production
kernels and environments.
Thus, the entry moves from:
/sys/kernel/debug/edac/mc0/inject_ctrl
to:
/sys/devices/system/edac/mc/mc0/inject_ctrl
No other changes of the interface.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
|
|
Log messages slightly differ between edac subsystems. Unifying it.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
|
|
Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac
modules for module, controller and device. Reported values for
Highbank in dmesg are now:
EDAC MC0: Giving out device to module hb_mc_edac controller
calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT)
EDAC DEVICE0: Giving out device to module hb_l2_edac controller
calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT)
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
|
|
Implement edac support for Calxeda ECX-2000.
The ECX-2000 memory controller is similar to Highbank but has
different register bases for error and interrupt registers. There is
an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification
and initialization of the ECX-2000 and its base addresses.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
|
|
Register and enable interrupts after the edac registration. Otherwise
incomming ecc error interrupts lead to crashes during device setup.
Fixing this in drivers for mc and l2.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: stable <stable@vger.kernel.org> # 3.6+
Signed-off-by: Robert Richter <rric@kernel.org>
|
|
In latest UEFI spec(by now it's 2.4) there are some new
fields for memory error reporting. Add these new fields for
ghes_edac interface.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.
Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
GENMASK is used to create a contiguous bitmask([hi:lo]). It is
implemented twice in current kernel. One is in EDAC driver, the other
is in SiS/XGI FB driver. Move it to a more generic place for other
usage.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Winischhofer <thomas@winischhofer.net>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Powerpc is a mess of implicit includes by prom.h. Add the necessary
explicit includes to drivers in preparation of prom.h cleanup.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Grant Likely <grant.likely@linaro.org>
|