Commit Graph

21 Commits

Author SHA1 Message Date
Patrick Daly 142c367110 edac: cortex_arm64_edac: Use dbe irq only
Add a Kconfig option to use the double-bit error interrupt as the only
source for checking for parity errors.

The following alternative sources would be disabled:
1) correctable single-bit parity errors through pmu counters.
2) polling the cpumerrsr and l2merrsr registers.
3) checking for parity errors in the do_bad/bad_mode handlers.

Change-Id: I2a861a12d788e3239c81bca247a08b94d88ebf34
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2015-01-27 12:07:58 -08:00
Patrick Daly 2aae1b9976 edac: cortex_arm64_edac: Print out L2ECTLR register
The L2ECTLR register describes whether an interrupt was generated from
the L2 memory or arrived through the AXI bus.

Change-Id: Ic9416e5572743029c72f5b92b2bde978e2e7cd04
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2015-01-19 19:45:58 -08:00
Rohit Vaswani bca69616ea edac: arm64: Reconfigure pmu and enable the irq after hotplug
Once a CPU is hotplugged, the pmu state is lost. Reconfigure the
PMU after hotplug and enable the single bit error interrupt.

Change-Id: I03c4af04f1b0271c3ba23297cb5182488947cd45
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-12-23 17:00:49 -08:00
Patrick Daly 3db5e3d816 edac: cortex_arm64: Remove misleading edac error warning
If bad_mode() or do_bad() is called, the arm edac driver checks for an
error. Ensure that warning messages are only printed out if there is
actually an error.

Additionally, fix an issue where the warning for a single bit error could
be printed for a double bit error.

Change-Id: I6133cb298fb9e660a220434761427d6ea6adb2ba
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2014-12-12 13:09:09 -08:00
Rohit Vaswani 06e0a05148 edac: arm64: Check the fatal bit and mark the error as Double-bit
On a double-bit error, the PMU counters may increment and
trigger the PPI edac pmu overflow interrupt before the SPI for the
double bit interrupt triggers.
Check the FATAL bit of the cache error syndrome registers and mark
the error as double bit if it is set.

Change-Id: I5f992dac877b3d8936f11483d29b4e82fc70b0bf
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-11-11 19:59:16 -08:00
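
The FATAL-bit check described in 06e0a05148 can be sketched in user-space C as follows. This is only an illustration, not the driver's code: the bit positions (valid at bit 31, fatal at bit 63) are an assumption based on the Cortex-A53/A57 error syndrome register layout, and the names `edac_err_kind` and `classify_merrsr` are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions in CPUMERRSR_EL1 / L2MERRSR_EL1 (not taken from the commit). */
#define MERRSR_VALID (1ULL << 31)
#define MERRSR_FATAL (1ULL << 63)

/* Hypothetical classification used only for this sketch. */
enum edac_err_kind { ERR_NONE, ERR_SINGLE_BIT, ERR_DOUBLE_BIT };

/*
 * Classify a syndrome register value. If the fatal bit is set, the error
 * is reported as double-bit even when the caller was entered from the
 * single-bit (PMU overflow) path.
 */
enum edac_err_kind classify_merrsr(uint64_t merrsr)
{
    if (!(merrsr & MERRSR_VALID))
        return ERR_NONE;            /* nothing recorded */
    if (merrsr & MERRSR_FATAL)
        return ERR_DOUBLE_BIT;      /* uncorrectable */
    return ERR_SINGLE_BIT;          /* corrected */
}
```

With this ordering, a PMU overflow handler that runs before the double-bit SPI arrives would still label the event as double-bit.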
Patrick Daly ab503cecbe edac: cortex_arm64: Print esr register information
Cortex A53 & A57 processors store information about the type of parity
error in esr_el1 for certain types of errors. Print out this information
in the interrupt handler.

Print the cpumerrsr, l2merrsr, and esr register in all cases.

Change-Id: I508560740ee4f6937b5137cb33ad21216f86649f
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2014-11-11 15:32:40 -08:00
Rohit Vaswani 15c8581540 edac: cortex_arm64: Poll to check for cache errors
By design, the Cortex A53/A57 processors are incapable of
generating interrupts or PMU events when a single-bit
error is observed in the L2 caches.
Hence, we need to poll the L2MERRSR register periodically to check
for single-bit errors. We need to do this for the L2 on both clusters.

Change-Id: I76a440b820f23c9667a5596cf550ff7725ec1cf5
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-10-09 16:18:18 -07:00
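
One polling pass of the kind 15c8581540 describes might look like the sketch below. It is a user-space simulation under stated assumptions: `regs[]` is a hypothetical snapshot of each cluster's L2MERRSR value rather than a real system-register read, the valid bit is assumed to be bit 31, and clearing the entry after reporting stands in for whatever the driver does to re-arm detection.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed position of the valid bit in L2MERRSR (not taken from the commit). */
#define L2MERRSR_VALID (1ULL << 31)

/*
 * Run one polling pass over per-cluster L2MERRSR snapshots.
 * Returns a bitmask with bit i set if cluster i had a recorded error.
 * Handles at most 32 clusters, the width of the returned mask.
 */
unsigned int poll_l2_once(uint64_t *regs, size_t nclusters)
{
    unsigned int hit = 0;

    for (size_t i = 0; i < nclusters && i < 32; i++) {
        if (regs[i] & L2MERRSR_VALID) {
            hit |= 1u << i;
            regs[i] = 0;    /* stand-in for clearing the register */
        }
    }
    return hit;
}
```

In a kernel driver such a pass would be scheduled periodically and would have to run on a CPU in each cluster; the sketch leaves both of those out.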
Rohit Vaswani 281331c1e5 edac: cortex_arm64: Conditionally register cti-pmu workaround cpu notifier
The cti-pmu workaround cpu notifier enables the workaround on CPU online.
Check the DT property to ensure that this is required before enabling it.

Change-Id: Id92702fcdc98b99970cad0df46c6832faff78491
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-09-29 11:32:21 -07:00
Rohit Vaswani 1f8440c4ae edac: arm64: Check for ECC errors on panic
Check for ECC errors on panic on all processors.

Change-Id: I2a68644afb2730a69aca35abb1f10899a11514dd
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-09-23 13:24:02 -07:00
Rohit Vaswani a37acc6669 edac: cortex_arm64: Use device tree property for CTI PMU Workaround
The CTI PMU workaround is enabled by default. Use a device tree property
to decide if the workaround needs to be applied or not.

Change-Id: Iaa847b7309d204d41c0ca53984964f3b238e0427
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-09-19 16:01:24 -07:00
Rohit Vaswani 8a138da0fc edac: arm64: Enable cti pmu workaround on CPUs onlined post-boot
The CTI PMU workaround for the EDAC interrupt is enabled at probe
only on the CPUs that are online during boot-up. Some of the CPUs
can be onlined at a later point from userspace.
Ensure that the workaround is enabled on those CPUs as well.

Change-Id: I551b228d3df3f7ed7d935a55aec6474339d569a6
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-08-04 13:09:11 -07:00
Linux Build Service Account 221bd54d73 Merge "edac: arm64: Add 0x as a prefix for printing hexadecimal values" 2014-06-27 08:06:15 -07:00
Rohit Vaswani 29e85853ce edac: arm64: Add 0x as a prefix for printing hexadecimal values
Cosmetic change to add 0x as a prefix while printing hexadecimal
values so that it makes the messages unambiguous.

Change-Id: I2d638708833da194f0066abd37ffc157c52c4182
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-06-26 12:30:54 -07:00
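
The ambiguity that 29e85853ce removes is easy to show: without a prefix, a printed value such as `10` could be decimal ten or hexadecimal sixteen. Below is a small user-space sketch of prefixed formatting. `format_reg` is a hypothetical helper, and the kernel's own print calls are not reproduced here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "name = 0x<hex>" into buf. Returns the snprintf result, i.e. the
 * number of characters that would have been written without truncation.
 */
int format_reg(char *buf, size_t len, const char *name, uint64_t val)
{
    return snprintf(buf, len, "%s = 0x%" PRIx64, name, val);
}
```

Writing the `0x` literally (rather than using the `#` flag) keeps the prefix present even when the value is zero.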
Rohit Vaswani d7296e990b edac: arm64: Create a commandline option to disable panic on CE
Currently, whether to panic on a correctable error is controlled only
through a defconfig option. Add a commandline parameter for easy toggling of
this option. Also, to make these correctable errors stand out
in the logs, add a WARN when panic_on_ce is disabled.

Change-Id: Ie3c61cad2489b680f51d466c8243ba23ccf5375c
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-06-24 16:07:20 -07:00
Linux Build Service Account e58ae82f35 Merge "EDAC: arm64: Add option to panic on correctable errors" 2014-06-19 12:11:27 -07:00
Rohit Vaswani 1b1d3e9fbb arm64: edac: Add workaround for using cti to trigger pmu irq
In MSM8994-V1, for unknown reasons the pmu percpu interrupt was
incorrectly connected to the corresponding CPU in the other cluster.
To work around this problem, it was decided that the cti could be used
to trigger the percpu pmu interrupt to the right cpu which actually
caused the event and can handle it.
This patch implements this workaround.

Change-Id: I732ad77ed2529a54b85c51b3fadcc53d93d70279
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-06-17 13:57:48 -07:00
Rohit Vaswani d545dec326 edac: arm64: Add support for detecting Single Bit Errors
Single bit errors are detected using the PMU's memory event.
The counter overflow interrupt triggers a read of the
relevant CPU registers which help in reporting the single
bit errors.

Change-Id: I29cc3c952c1e0f1c05120b23cf30775583dcd67c
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2014-06-17 13:57:48 -07:00
Stepan Moskovchenko 15c4e1608b EDAC: arm64: Add option to panic on correctable errors
Allow the Cortex A53/A57 EDAC driver to be configured to
panic the kernel if a correctable error (CE) is detected.

Change-Id: Id7bf66ed36a348eb321d8fd457efe097c003ebcd
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
2014-06-16 19:03:31 -07:00
Stepan Moskovchenko 4d3099b3b5 EDAC: arm64: Fix CPU instance error reporting
The 'instance number' argument passed to the EDAC framework
error reporting functions needs to correspond to the ID of
the CPU which reported the error, rather than the ID of the
cache set/way where the error occurred.

Change-Id: Ice5039abe83756ef16726cc34b62b0b3c88e1e16
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
2014-06-10 19:40:32 -07:00
Stepan Moskovchenko 12f030ed39 EDAC: arm64: Fix CPUMERRSR_EL1 truncation
CPUMERRSR_EL1 is a 64-bit register, so we must use a 64-bit
data type to hold its value.

Change-Id: I6bb7e41d84f417eb39e62cffef0807098205d268
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
2014-06-03 19:20:18 -07:00
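
The truncation fixed in 12f030ed39 can be illustrated with a short user-space sketch. The sample value is made up for the example and is not a real register dump; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical CPUMERRSR_EL1 value with bits set in both halves. */
#define SAMPLE_CPUMERRSR 0x8000010280000abcULL

/* Buggy pattern: a 32-bit holder silently drops bits [63:32]. */
uint32_t store_truncated(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Fixed pattern: a 64-bit holder keeps the whole register. */
uint64_t store_full(uint64_t reg)
{
    return reg;
}

/* Extract bits [63:32] of a 64-bit value. */
uint32_t upper_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

Any field that lives in the upper 32 bits of the register is lost by the first function, which is why the holder must be 64 bits wide.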
Rohit Vaswani 793f12dfd4 EDAC: Add support for Cortex A53 / A57 CPUs
Add a driver for handling the L1/L2 cache error reporting
features found on the ARM Cortex A53 / A57 processors.

Change-Id: I03e11ada791265aa998aab7031a8e274a193d8f9
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
2014-05-15 19:05:01 -07:00