Add a Kconfig option to use the double-bit error interrupt as the only
source for checking for parity errors.
The following alternative sources are disabled when the option is set:
1) correctable single-bit parity errors reported through PMU counters.
2) polling of the cpumerrsr and l2merrsr registers.
3) checks for parity errors in the do_bad/bad_mode handlers.
Change-Id: I2a861a12d788e3239c81bca247a08b94d88ebf34
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
The L2ECTLR register describes whether an interrupt was generated from
the L2 memory or arrived through the AXI bus.
Change-Id: Ic9416e5572743029c72f5b92b2bde978e2e7cd04
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
When a CPU is hotplugged, its PMU state is lost. Reconfigure the
PMU after hotplug and enable the single-bit error interrupt.
Change-Id: I03c4af04f1b0271c3ba23297cb5182488947cd45
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
If bad_mode() or do_bad() is called, the ARM EDAC driver checks for an
error. Ensure that warning messages are printed only if there is
actually an error.
Additionally, fix an issue where the warning for a single-bit error could
be printed for a double-bit error.
Change-Id: I6133cb298fb9e660a220434761427d6ea6adb2ba
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
On a double-bit error, the PMU counters may increment and
trigger the EDAC PMU overflow interrupt (a PPI) before the SPI for the
double-bit error triggers.
Check the FATAL bit of the cache error syndrome registers and mark
the error as double-bit if it is set.
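The check described above can be sketched as follows. This is a minimal
user-space illustration, not the driver code: the helper name
classify_merrsr() and the enum are hypothetical, and the bit positions
(Valid at bit 31, Fatal at bit 63) are assumptions based on the
CPUMERRSR_EL1/L2MERRSR_EL1 layout in the Cortex-A53/A57 TRMs and should
be verified against the TRM for the core in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; verify against the TRM for your core. */
#define MERRSR_VALID (UINT64_C(1) << 31)
#define MERRSR_FATAL (UINT64_C(1) << 63)

/* Hypothetical classification used only for this sketch. */
enum edac_err_kind { ERR_NONE, ERR_SINGLE_BIT, ERR_DOUBLE_BIT };

/*
 * Classify a syndrome register value read in the PMU overflow handler.
 * A valid entry with the FATAL bit set is treated as a double-bit
 * error even though the single-bit (PMU) path observed it first.
 */
static enum edac_err_kind classify_merrsr(uint64_t merrsr)
{
	if (!(merrsr & MERRSR_VALID))
		return ERR_NONE;

	return (merrsr & MERRSR_FATAL) ? ERR_DOUBLE_BIT : ERR_SINGLE_BIT;
}

/* Example: the same valid entry with and without the FATAL bit. */
void classify_merrsr_example(void)
{
	assert(classify_merrsr(MERRSR_VALID) == ERR_SINGLE_BIT);
	assert(classify_merrsr(MERRSR_VALID | MERRSR_FATAL) == ERR_DOUBLE_BIT);
}
```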
Change-Id: I5f992dac877b3d8936f11483d29b4e82fc70b0bf
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Cortex A53 & A57 processors store information about the type of parity
error in esr_el1 for certain types of errors. Print out this information
in the interrupt handler.
Print the cpumerrsr, l2merrsr, and esr registers in all cases.
Change-Id: I508560740ee4f6937b5137cb33ad21216f86649f
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
By design, the Cortex A53/A57 processors are incapable of
generating interrupts or PMU events when a single-bit
error is observed in the L2 caches.
Hence, we need to poll the L2MERRSR register periodically to check
for single-bit errors. We need to do this for the L2 on both clusters.
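A rough sketch of one polling pass over both clusters is shown below.
It is a user-space model under stated assumptions, not the driver
implementation: struct l2_cluster, poll_l2_once() and
poll_all_clusters() are hypothetical names, the register is modelled as
a plain struct field, the Valid/Fatal bit positions (31 and 63) are
assumed from the TRM layout, and clearing the entry by writing zero is
likewise an assumption. In the driver, something like a timer or
delayed work item would run such a pass periodically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; verify against the TRM for your core. */
#define L2MERRSR_VALID (UINT64_C(1) << 31)
#define L2MERRSR_FATAL (UINT64_C(1) << 63)

/* Hypothetical stand-in for one cluster's L2 error state. */
struct l2_cluster {
	uint64_t l2merrsr;     /* models the per-cluster L2MERRSR value */
	unsigned int ce_count; /* single-bit errors seen by polling */
};

/* Returns 1 if a single-bit error was found and recorded, else 0. */
static int poll_l2_once(struct l2_cluster *c)
{
	uint64_t val;

	assert(c != NULL);
	val = c->l2merrsr;

	/* Nothing recorded, or a fatal entry left for the interrupt path. */
	if (!(val & L2MERRSR_VALID) || (val & L2MERRSR_FATAL))
		return 0;

	c->ce_count++;
	c->l2merrsr = 0; /* assumed: clear the entry so the next error is seen */
	return 1;
}

/* One polling pass over every cluster; returns the number of new errors. */
static unsigned int poll_all_clusters(struct l2_cluster *clusters, size_t n)
{
	unsigned int found = 0;
	size_t i;

	for (i = 0; i < n; i++)
		found += (unsigned int)poll_l2_once(&clusters[i]);

	return found;
}
```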
Change-Id: I76a440b820f23c9667a5596cf550ff7725ec1cf5
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
The cti-pmu workaround cpu notifier enables the workaround on CPU online.
Check the DT property to ensure that this is required before enabling it.
Change-Id: Id92702fcdc98b99970cad0df46c6832faff78491
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Check for ECC errors on all processors on panic.
Change-Id: I2a68644afb2730a69aca35abb1f10899a11514dd
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
The CTI PMU workaround is enabled by default. Use a device tree property
to decide if the workaround needs to be applied or not.
Change-Id: Iaa847b7309d204d41c0ca53984964f3b238e0427
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
The CTI PMU workaround for the EDAC interrupt is enabled at probe
only on the CPUs that are online during boot-up. Some of the CPUs
can be onlined at a later point from userspace.
Ensure that the workaround is enabled on those CPUs as well.
Change-Id: I551b228d3df3f7ed7d935a55aec6474339d569a6
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Cosmetic change to add 0x as a prefix when printing hexadecimal
values so that the messages are unambiguous.
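To illustrate the ambiguity: a register value of sixteen printed with a
bare hex format appears as "10", which a reader can mistake for decimal
ten. The sketch below uses standard C snprintf() in user space rather
than the kernel's print functions; format_reg() is a hypothetical helper
and the register name in the example is only a label.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "name = 0x<hex>" into buf.
 * Returns the number of characters that snprintf() reports.
 */
static int format_reg(char *buf, size_t len, const char *name, uint64_t val)
{
	assert(buf != NULL && name != NULL);

	/* The explicit "0x" marks the value as hexadecimal. */
	return snprintf(buf, len, "%s = 0x%" PRIx64, name, val);
}
```

With this helper, format_reg(buf, sizeof(buf), "CPUMERRSR", 0x10) writes
"CPUMERRSR = 0x10" instead of the ambiguous "CPUMERRSR = 10".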
Change-Id: I2d638708833da194f0066abd37ffc157c52c4182
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Currently, whether to panic on a correctable error is controlled
through a defconfig. Add a command-line parameter for easy toggling of
this option. Also, to make these correctable errors stand out
in the logs, add a WARN when panic_on_ce is disabled.
Change-Id: Ie3c61cad2489b680f51d466c8243ba23ccf5375c
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
In MSM8994-V1, for unknown reasons the PMU percpu interrupt was
incorrectly connected to the corresponding CPU in the other cluster.
To work around this problem, the CTI is used
to trigger the percpu PMU interrupt on the CPU that actually
caused the event and can handle it.
This patch implements this workaround.
Change-Id: I732ad77ed2529a54b85c51b3fadcc53d93d70279
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Single-bit errors are detected using the PMU's memory event.
The counter overflow interrupt triggers a read of the
relevant CPU registers, which help in reporting the single-bit
errors.
Change-Id: I29cc3c952c1e0f1c05120b23cf30775583dcd67c
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Allow the Cortex A53/A57 EDAC driver to be configured to
panic the kernel if a correctable error (CE) is detected.
Change-Id: Id7bf66ed36a348eb321d8fd457efe097c003ebcd
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
The 'instance number' argument passed to the EDAC framework
error reporting functions needs to correspond to the ID of
the CPU which reported the error, rather than the ID of the
cache set/way where the error occurred.
Change-Id: Ice5039abe83756ef16726cc34b62b0b3c88e1e16
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
CPUMERRSR_EL1 is a 64-bit register, so we must use a 64-bit
data type to hold its value.
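A small user-space sketch of what goes wrong with a narrower type: any
field in the upper 32 bits is silently dropped. The helper names are
hypothetical, and the Fatal bit position (bit 63) is an assumption based
on the TRM register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the Fatal bit; verify against the TRM. */
#define CPUMERRSR_FATAL_SHIFT 63

static bool fatal_bit_set(uint64_t merrsr)
{
	return (merrsr >> CPUMERRSR_FATAL_SHIFT) & 1;
}

/* Models storing the register value in a 32-bit variable. */
static uint64_t through_u32(uint64_t raw)
{
	uint32_t narrow = (uint32_t)raw; /* upper 32 bits are lost here */

	return narrow;
}

void cpumerrsr_width_example(void)
{
	/* Example value with bits 63 and 31 set. */
	uint64_t raw = (UINT64_C(1) << 63) | (UINT64_C(1) << 31);

	assert(fatal_bit_set(raw));               /* 64-bit: bit 63 visible */
	assert(!fatal_bit_set(through_u32(raw))); /* 32-bit: bit 63 dropped */
}
```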
Change-Id: I6bb7e41d84f417eb39e62cffef0807098205d268
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Add a driver for handling the L1/L2 cache error reporting
features found on the ARM Cortex A53 / A57 processors.
Change-Id: I03e11ada791265aa998aab7031a8e274a193d8f9
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Signed-off-by: Stepan Moskovchenko <stepanm@codeaurora.org>