Commit Graph

11137 Commits

Author SHA1 Message Date
Linus Torvalds af180b81a3 Merge branch 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm bugfixes from Gleb Natapov:
 "There is one more fix for MIPS KVM ABI here, MIPS and PPC build
  breakage fixes and a couple of PPC bug fixes"

* 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
  kvm/ppc/booke: Hold srcu lock when calling gfn functions
  kvm/ppc/booke64: Disable e6500 support
  kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
  mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
  kvm: Add definition of KVM_REG_MIPS
  KVM: add kvm_para_available to asm-generic/kvm_para.h
2013-06-11 11:16:43 -07:00
Scott Wood 7c11c0ccc7 kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
EE is hard-disabled on entry to kvmppc_handle_exit(), so call
hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
is unset.

Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
and sometimes host kernel hangs.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:11:00 +03:00
Scott Wood f1e89028f0 kvm/ppc/booke: Hold srcu lock when calling gfn functions
KVM core expects arch code to acquire the srcu lock when calling
gfn_to_memslot and similar functions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:59 +03:00
Scott Wood 2b6398fcf2 kvm/ppc/booke64: Disable e6500 support
The previous patch made 64-bit booke KVM build again, but Altivec
support is still not complete, and we can't prevent the guest from
turning on Altivec (which can corrupt host state until state
save/restore is implemented).  Disable e6500 on KVM until this is
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:56 +03:00
Mihai Caraman 4edd1ae91b kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
Interrupt numbers defined for Book3E follows IVORs definition. Align
BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
rule which also fixes the build breakage.
IVORs 32 and 33 are shared so reflect this in the interrupts naming.

This fixes a build break for 64-bit booke KVM.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-11 11:10:49 +03:00
Michael Ellerman b11ae95100 powerpc: Partial revert of "Context switch more PMU related SPRs"
In commit 59affcd I added context switching of more PMU SPRs, because
they are potentially exposed to userspace on Power8. However despite me
being a smart arse in the commit message it's actually not correct. In
particular it interacts badly with a global perf record.

We will have to do something more complicated, but that will have to
wait for 3.11.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:35 +10:00
Michael Ellerman 6772faa1ba powerpc/perf: Fix deadlock caused by calling printk() in PMU exception
In commit bc09c21 "Fix finding overflowed PMC in interrupt" we added
a printk() to the PMU exception handler. Unfortunately that is not safe.

The problem is that the PMU exception may run even when interrupts are
soft disabled, aka NMI context. We do this so that we can profile parts
of the kernel that have interrupts soft-disabled.

But by calling printk() from the exception handler, we can potentially
deadlock in the printk code on logbuf_lock, eg:

  [c00000038ba575c0] c000000000081928 .vprintk_emit+0xa8/0x540
  [c00000038ba576a0] c0000000007bcde8 .printk+0x48/0x58
  [c00000038ba57710] c000000000076504 .perf_event_interrupt+0x2d4/0x490
  [c00000038ba57810] c00000000001f6f8 .performance_monitor_exception+0x48/0x60
  [c00000038ba57880] c0000000000032cc performance_monitor_common+0x14c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000007b25d4 ._raw_spin_lock_irq
  +0x64/0xc0
  [c00000038ba57bf0] c00000000007ed90 .devkmsg_read+0xd0/0x5a0
  [c00000038ba57d00] c0000000001c2934 .vfs_read+0xc4/0x1e0
  [c00000038ba57d90] c0000000001c2cd8 .SyS_read+0x58/0xd0
  [c00000038ba57e30] c000000000009d54 syscall_exit+0x0/0x98
  --- Exception: c01 (System Call) at 00001fffffbf6f7c
  SP (3ffff6d4de10) is in userspace

Fix it by making sure we only call printk() when we are not in NMI
context.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:32 +10:00
Michael Neuling 82a9f16adc powerpc/hw_breakpoints: Add DABRX cpu feature to fix 32-bit regression
When introducing support for DABRX in 4474ef0, we broke older 32-bit CPUs
that don't have that register.

Some CPUs have a DABR but not DABRX.  Configuration are:
- No 32bit CPUs have DABRX but some have DABR.
- POWER4+ and below have the DABR but no DABRX.
- 970 and POWER5 and above have DABR and DABRX.
- POWER8 has DAWR, hence no DABRX.

This introduces CPU_FTR_DABRX and sets it on appropriate CPUs.  We use
the top 64 bits for CPU FTR bits since only 64 bit CPUs have this.

Processors that don't have the DABRX will still work as they will fall
back to software filtering these breakpoints via perf_exclude_event().

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: "Gorelik, Jacob (335F)" <jacob.gorelik@jpl.nasa.gov>
cc: stable@vger.kernel.org (v3.9 only)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:29 +10:00
Michael Neuling fb0fce3e55 powerpc/power8: Update denormalization handler
POWER8 can take a denormalisation exception on any VSX registers.

This does the extra 32 VSX registers we don't currently handle.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:26 +10:00
Michael Neuling d7c67fb1cf powerpc/pseries: Simplify denormalization handler
The following simplifies the denorm code by using macros to generate the long
stream of almost identical instructions.

This patch results in no changes to the output binary, but removes a lot of
lines of code.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:22 +10:00
Michael Neuling 6a60f9e7d8 powerpc/power8: Fix oprofile and perf
In 2ac6f42 powerpc/cputable: Fix oprofile_cpu_type on power8
we broke all power8 hw events.

This reverts this change and uses oprofile_type instead. Perf now works
on POWER8 again and oprofile will revert to using timers on POWER8.

Kudos to mpe this fix.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:19 +10:00
Gavin Shan b8b3de224f powerpc/eeh: Don't check RTAS token to get PE addr
RTAS token "ibm,get-config-addr-info" or ibm,get-config-addr-info2"
are used to retrieve the PE address according to PCI address, which
made up of domain/bus/slot/function. If we don't have those 2 tokens,
the domain/bus/slot/function would be used as the address for EEH
RTAS operations. Some older f/w might not have those 2 tokens and
that blocks the EEH functionality to be initialized. It was introduced
by commit e2af155c ("powerpc/eeh: pseries platform EEH initialization").

The patch skips the check on those 2 tokens so we can bring up EEH
functionality successfully. And domain/bus/slot/function will be
used as address for EEH RTAS operations.

Cc: <stable@vger.kernel.org> # v3.4+
Reported-by: Robert Knight <knight@princeton.edu>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Tested-by: Robert Knight <knight@princeton.edu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:16 +10:00
Kevin Hao c5df457ffe powerpc/pci: Check the bus address instead of resource address in pcibios_fixup_resources
If a BAR has the value of 0, we would assume that it is unset yet and
then mark the resource as unset and would reassign it later. But after
commit 6c5705fe (powerpc/PCI: get rid of device resource fixups)
the pcibios_fixup_resources is invoked after the bus address was
translated to linux resource. So the value of res->start is resource
address. And since the resource and bus address may be different, we
should translate it to the bus address before doing the check.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:13 +10:00
Will Schmidt badec11b64 powerpc/cputable: Fix typo on P7+ cputable entry
Fix a typo in setting COMMON_USER2_POWER7 bits to .cpu_user_features2
cpu specs table.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 09:30:03 +10:00
Michael Ellerman 58a032c3b1 powerpc/perf: Add missing SIER support
Commit 8f61aa3 "Add support for SIER" missed updates to siar_valid()
and perf_get_data_addr().

In both cases we need to check the SIER instead of mmcra.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:29 +10:00
Michael Ellerman cbda6aa10b powerpc/perf: Revert to original NO_SIPR logic
This is a revert and then some of commit 860aad7 "Add regs_no_sipr()".
This workaround was only needed on early chip versions.

As before NO_SIPR becomes a static flag of the PMU struct.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:29 +10:00
Kevin Hao 858957ab1e powerpc/pci: Remove the unused variables in pci_process_bridge_OF_ranges
The codes which ever used these two variables have gone. Throw away
them too.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Kevin Hao 2798389604 powerpc/pci: Remove the stale comments of pci_process_bridge_OF_ranges
These comments already don't apply to the current code. So just remove
them.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Srivatsa S. Bhat f274ef8747 powerpc/pseries: Always enable CONFIG_HOTPLUG_CPU on PSERIES SMP
Adam Lackorzynski reported the following build failure on
!CONFIG_HOTPLUG_CPU configuration:

  CC      arch/powerpc/kernel/rtas.o
arch/powerpc/kernel/rtas.c: In function ‘rtas_cpu_state_change_mask’:
arch/powerpc/kernel/rtas.c:843:4: error: implicit declaration of function ‘cpu_down’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/kernel/rtas.o] Error 1
make: *** [arch/powerpc/kernel] Error 2

The build fails because cpu_down() is defined only under CONFIG_HOTPLUG_CPU.

Looking further, the mobility code in pseries is one of the call-sites which
uses rtas_ibm_suspend_me(), which in turn calls rtas_cpu_state_change_mask().
And the mobility code is unconditionally compiled-in (it does not fall under
any Kconfig option). And commit 120496ac (powerpc: Bring all threads online
prior to migration/hibernation) which introduced this build regression is
critical for the proper functioning of the migration code. So it appears
that the only solution to this problem is to enable CONFIG_HOTPLUG_CPU if
SMP is enabled on PPC_PSERIES platforms. So make that change in the Kconfig.

Reported-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Paul Mackerras 8e44ddc3f3 powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation
This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X.  H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state.  H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor.  Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.

These hcalls are not currently used by Linux guests, but may be in
future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Priyanka Jain f7b3367774 powerpc/32bit:Store temporary result in r0 instead of r8
Commit a9c4e541ea
"powerpc/kprobe: Complete kprobe and migrate exception frame"
introduced a regression:

While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.

Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.

As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.

So, store temporary results in r0 instead of r8 to prevent r8 from getting
modified as subsequent code is dependent on its value.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
CC: <stable@vger.kernel.org> [v3.7+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Aneesh Kumar K.V 0608d69246 powerpc/mm: Always invalidate tlb on hpte invalidate and update
If a hash bucket gets full, we "evict" a more/less random entry from it.
When we do that we don't invalidate the TLB (hpte_remove) because we assume
the old translation is still technically "valid". This implies that when
we are invalidating or updating pte, even if HPTE entry is not valid
we should do a tlb invalidate.

This was a regression introduced by b1022fbd29

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Michael Neuling 280a5ba22c powerpc/pseries: Improve stream generation comments in copypage/user
No code changes, just documenting what's happening a little better.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Michael Neuling a515348fc6 powerpc/pseries: Kill all prefetch streams on context switch
On context switch, we should have no prefetch streams leak from one
userspace process to another.  This frees up prefetch resources for the
next process.

Based on patch from Milton Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
Nishanth Aravamudan 2ac6f427ad powerpc/cputable: Fix oprofile_cpu_type on power8
Maynard informed me that neither the oprofile kernel module nor oprofile
userspace has been updated to support that "legacy" oprofile module
interface for power8, which is indicated by "ppc64/power8." This results
in no samples. The solution is to default to the "timer" type, instead.
The raw entry also should be updated, as "ppc64/ibm-compat-v1" indicates
to oprofile userspace to use "compatibility events" which are obsolete
in ISA 2.07.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
chenhui zhao e242114aff powerpc/mpic: Fix irq distribution problem when MPIC_SINGLE_DEST_CPU
For the mpic with a flag MPIC_SINGLE_DEST_CPU, only one bit should be
set in interrupt destination registers.

The code is applicable to 64-bit platforms as well as 32-bit.

Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:24 +10:00
Michael Neuling 2b3f8e87cf powerpc/tm: Fix userspace stack corruption on signal delivery for active transactions
When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback.  The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Tested with 64 and 32 bit signals

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling b75c100ef2 powerpc/tm: Move TM abort cause codes to uapi
These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling 6ce6c629fd powerpc/tm: Abort on emulation and alignment faults
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Michael Neuling 35f7097fce powerpc/tm: Make room for hypervisor in abort cause codes
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes.  Unfortunately we don't respect this currently.

Below fixes this to move our cause codes to below this region.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Linus Torvalds d4c2456a18 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
 "Some more P8 related bits, a bunch of fixes for our P7+/P8 HW crypto
  drivers, some added workarounds for those radeons that don't do proper
  64-bit MSIs and a couple of other trivialities by myself."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Make 32-bit MSI quirk work on systems lacking firmware support
  powerpc/powernv: Build a zImage.epapr
  powerpc: Make radeon 32-bit MSI quirk work on powernv
  powerpc: Context switch more PMU related SPRs
  powerpc/powernv: Fix condition for when to invalidate the TCE cache
  powerpc/pci: Fix bogus message at boot about empty memory resources
  powerpc: Fix TLB cleanup at boot on POWER8
  drivers/crypto/nx: Fixes for multiple races and issues
2013-05-24 08:57:38 -07:00
Brian King f1dd153121 powerpc/pseries: Make 32-bit MSI quirk work on systems lacking firmware support
Recent commit e61133dda4 added support
for a new firmware feature to force an adapter to use 32 bit MSIs.
However, this firmware is not available for all systems. The hack below
allows devices needing 32 bit MSIs to work on these systems as well.
It is careful to only enable this on Gen2 slots, which should limit
this to configurations where this hack is needed and tested to work.

[Small change to factor out the hack into a separate function -- BenH]

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:16:54 +10:00
Benjamin Herrenschmidt 83920c498e powerpc/powernv: Build a zImage.epapr
The zImage.epapr wrapper allows to use zImages when booting via a flat
device-tree which can be used on powernv.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:46 +10:00
Benjamin Herrenschmidt b72c1f6514 powerpc: Make radeon 32-bit MSI quirk work on powernv
This moves the quirk itself to pci_64.c as to get built on all ppc64
platforms (the only ones with a pci_dn), factors the two implementations
of get_pdn() into a single pci_get_dn() and use the quirk to do 32-bit
MSIs on IODA based powernv platforms.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:45 +10:00
Michael Ellerman 59affcd3e4 powerpc: Context switch more PMU related SPRs
In commit 9353374 "Context switch the new EBB SPRs" we added support for
context switching some new EBB SPRs. However despite four of us signing
off on that patch we missed some. To be fair these are not actually new
SPRs, but they are now potentially user accessible so need to be context
switched.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:45 +10:00
Benjamin Herrenschmidt 605e44d6e6 powerpc/powernv: Fix condition for when to invalidate the TCE cache
We use two flags, one to indicate an invalidation is needed after
creating a new entry and one to indicate an invalidation is needed
after removing an entry. However we were testing the wrong flag
in the remove case.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:44 +10:00
Benjamin Herrenschmidt bee7dd9c5f powerpc/pci: Fix bogus message at boot about empty memory resources
The message is only meant to be displayed if resource 0 is empty,
but was displayed if any is.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:44 +10:00
Benjamin Herrenschmidt 8fc1f5d7ef powerpc: Fix TLB cleanup at boot on POWER8
The TLB has 512 congruence classes (2048 entries 4 way set associative)
while P7 had 128

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:44 +10:00
Linus Torvalds b64194068b USB fixes for 3.10-rc2
Here are a number of tiny USB bugfixes / new device ids for 3.10-rc2
 
 The majority of these are USB gadget fixes, but they are all small.
 Other than that, some USB host controller fixes, and USB serial driver
 fixes for problems reported with them.
 
 Also hopefully a fixed up USB_OTG Kconfig dependancy, that one seems to
 be almost impossible to get right for all of the different platforms
 these days.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlGeP/gACgkQMUfUDdst+ykXygCfeIL5/HMREhUSHQgb6IiIDR81
 QHAAn0zJfRAyWUz4AtbrRsLSVfk3Q9vm
 =DQIq
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are a number of tiny USB bugfixes / new device ids for 3.10-rc2

  The majority of these are USB gadget fixes, but they are all small.
  Other than that, some USB host controller fixes, and USB serial driver
  fixes for problems reported with them.

  Also hopefully a fixed up USB_OTG Kconfig dependancy, that one seems
  to be almost impossible to get right for all of the different
  platforms these days."

* tag 'usb-3.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (56 commits)
  USB: cxacru: potential underflow in cxacru_cm_get_array()
  USB: ftdi_sio: Add support for Newport CONEX motor drivers
  USB: option: add device IDs for Dell 5804 (Novatel E371) WWAN card
  usb: ohci: fix goto wrong tag in err case
  usb: isp1760-if: fix memleak when platform_get_resource fail
  usb: ehci-s5p: fix memleak when fallback to pdata
  USB: serial: clean up chars_in_buffer
  USB: ti_usb_3410_5052: fix chars_in_buffer overhead
  USB: io_ti: fix chars_in_buffer overhead
  USB: ftdi_sio: fix chars_in_buffer overhead
  USB: ftdi_sio: clean up get_modem_status
  USB: serial: add generic wait_until_sent implementation
  USB: serial: add wait_until_sent operation
  USB: set device dma_mask without reference to global data
  USB: Blacklisted Cinterion's PLxx WWAN Interface
  usb: option: Add Telewell TW-LTE 4G
  USB: EHCI: remove bogus #error
  USB: reset resume quirk needed by a hub
  USB: usb-stor: realtek_cr: Fix compile error
  usb, chipidea: fix link error when USB_EHCI_HCD is a module
  ...
2013-05-23 09:23:32 -07:00
Alan Stern 98f541c6e3 USB: remove remaining instances of USB_SUSPEND
Commit 84ebc10294 (USB: remove
CONFIG_USB_SUSPEND option) failed to remove all of the usages of
USB_SUSPEND throughout the kernel.  This patch (as1677) removes the
remaining instances of that symbol.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-15 13:44:44 -04:00
Linus Torvalds a2c7a54fcc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
 "This is mostly bug fixes (some of them regressions, some of them I
  deemed worth merging now) along with some patches from Li Zhong
  hooking up the new context tracking stuff (for the new full NO_HZ)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
  powerpc: Set show_unhandled_signals to 1 by default
  powerpc/perf: Fix setting of "to" addresses for BHRB
  powerpc/pmu: Fix order of interpreting BHRB target entries
  powerpc/perf: Move BHRB code into CONFIG_PPC64 region
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: Exit user context on notify resume
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Syscall hooks for context tracking subsystem
  powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
  powerpc: Fix irq_set_affinity() return values
  powerpc: Provide __bswapdi2
  powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
  powerpc/powernv: Detect OPAL v3 API version
  powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
  powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
  powerpc: Bring all threads online prior to migration/hibernation
  powerpc/rtas_flash: Fix validate_flash buffer overflow issue
  powerpc/kexec: Fix kexec when using VMX optimised memcpy
  powerpc: Fix build errors STRICT_MM_TYPECHECKS
  ...
2013-05-14 07:43:11 -07:00
Benjamin Herrenschmidt e34166ad63 powerpc: Set show_unhandled_signals to 1 by default
Just like other architectures

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 18:01:04 +10:00
Michael Neuling 691231846c powerpc/perf: Fix setting of "to" addresses for BHRB
Currently we only set the "to" address in the branch stack when the CPU
explicitly gives us a value.  Unfortunately it only does this for XL form
branches (eg blr, bctr, bctar) and not I and B form branches (eg b, bc).

Fortunately if we read the instruction from memory we can extract the offset of
a branch and calculate the target address.

This adds a function power_pmu_bhrb_to() to calculate the target/to address of
the corresponding I and B form branches.  It handles branches in both user and
kernel spaces.  It also plumbs this into the perf brhb reading code.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:22 +10:00
Michael Neuling 506e70d132 powerpc/pmu: Fix order of interpreting BHRB target entries
The current Branch History Rolling Buffer (BHRB) code misinterprets the order
of entries in the hardware buffer.  It assumes that a branch target address
will be read _after_ its corresponding branch.  In reality the branch target
comes before (lower mfbhrb entry) it's corresponding branch.

This is a rewrite of the code to take this into account.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:22 +10:00
Michael Neuling d52f2dc40b powerpc/perf: Move BHRB code into CONFIG_PPC64 region
The new Branch History Rolling buffer (BHRB) code is only useful on 64bit
processors, so move it into the #ifdef CONFIG_PPC64 region.

This avoids code bloat on 32bit systems.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:21 +10:00
Li Zhong a1797b2fd2 powerpc: select HAVE_CONTEXT_TRACKING for pSeries
Start context tracking support from pSeries.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:21 +10:00
Li Zhong 5d1c574511 powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:20 +10:00
Li Zhong 106ed886ab powerpc: Exit user context on notify resume
This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:20 +10:00
Li Zhong ba12eedee3 powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7
  context_tracking: Move exception handling to generic code
6c1e0256fa
  context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Li Zhong 22ecbe8dce powerpc: Syscall hooks for context tracking subsystem
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b9

TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
better for it to be in the same 16 bits with others in the group, so in the
asm code, andi. with this group could work.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Scott Wood 6cecf76b47 powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
MSR_DE is not cleared on entry to the kernel, and we don't clear it
explicitly outside of debug code.  If we have MSR_DE set in
prime_debug_regs(), and the new thread has events enabled in DBCR0
(e.g.  ICMP is set in thread->dbsr0, even though it was cleared in the
real DBCR0 when the thread got scheduled out), we'll end up taking a
debug exception in the kernel when DBCR0 is loaded.  DSRR0 will not
point to an exception vector, and the kernel ends up hanging at
kernel_dbg_exc.  Fix this by always clearing MSR_DE when we load new
debug state.

Another observed source of kernel_dbg_exc hangs is with the branch
taken event.  If this event is active, but we take a non-debug trap
(e.g. a TLB miss or an asynchronous interrupt) before the next branch.
We end up taking a branch-taken debug exception on the initial branch
instruction of the exception vector, but because the debug exception is
DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even
checking the DSRR0 address.  Fix this by checking for DBSR_BT as well
as DBSR_IC, which is what 32-bit does and what the comments suggest was
intended in the 64-bit code as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Alexander Gordeev dcb615aef9 powerpc: Fix irq_set_affinity() return values
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:18 +10:00
David Woodhouse ca9d7aea59 powerpc: Provide __bswapdi2
Some versions of GCC apparently expect this to be provided by libgcc.

Updates from Mikey to fix 32 bit version and adding "r" to registers.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:17 +10:00
Benjamin Herrenschmidt b2b48584df powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
The current code fails to handle kexec on OPALv2. This fixes it
and adds code to improve the situation on OPALv3 where we can
query the CPU status from the firmware and decide what to do
based on that.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 15:12:31 +10:00
Benjamin Herrenschmidt 75b93da43a powerpc/powernv: Detect OPAL v3 API version
Future firmwares will support that new version

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 15:10:02 +10:00
Li Zhong af945cf4bf powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
Saw this warning again, and this time from the ret_from_fork path.

It seems we could clear the back chain earlier in copy_thread(), which
could cover both path, and also fix potential lockdep usage in
schedule_tail(), or exception occurred before we clear the back chain.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:35 +10:00
Michael Ellerman b80ec3dc8e powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
We are getting build errors with CONFIG_PROC_FS=n:

arch/powerpc/kernel/rtas_flash.c
   In function 'rtas_flash_init':
  745:33: error: unused variable 'f' [-Werror=unused-variable]

But rtas_flash.c should not be built when CONFIG_PROC_FS=n, beacause all
it does is provide a /proc interface to the RTAS flash routines.

CONFIG_RTAS_FLASH already depends on CONFIG_RTAS_PROC, to indicate that
it depends on the RTAS proc support, but CONFIG_RTAS_PROC does not
depend on CONFIG_PROC_FS. So fix that.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:32 +10:00
Robert Jennings 120496ac2d powerpc: Bring all threads online prior to migration/hibernation
This patch brings online all threads which are present but not online
prior to migration/hibernation.  After migration/hibernation those
threads are taken back offline.

During migration/hibernation all online CPUs must call H_JOIN, this is
required by the hypervisor.  Without this patch, threads that are offline
(H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
deadlocked (all threads either JOIN'd or CEDE'd).

Cc: <stable@kernel.org>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:29 +10:00
Vasant Hegde a94a14720e powerpc/rtas_flash: Fix validate_flash buffer overflow issue
ibm,validate-flash-image RTAS call output buffer contains 150 - 200
bytes of data on latest system. Presently we have output
buffer size as 64 bytes and we use sprintf to copy data from
RTAS buffer to local buffer. This causes kernel oops (see below
call trace).

This patch increases local buffer size to 256 and also uses
snprintf instead of sprintf to copy data from RTAS buffer.

Kernel call trace :
-------------------
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod
Supported: Yes
NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000
REGS: c0000001b91779b0 TRAP: 0400   Tainted: G            X  (3.0.13-0.27-ppc64)
MSR: 8000000040009032 <EE,ME,IR,DR>  CR: 44022488  XER: 20000018
TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36
GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b
GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031
GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4
GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90
GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000
GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000
GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031
GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130
NIP [4520323031333130] 0x4520323031333130
LR [4520323031333130] 0x4520323031333130
Call Trace:
[c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:26 +10:00
Anton Blanchard 79c66ce8f6 powerpc/kexec: Fix kexec when using VMX optimised memcpy
commit b3f271e86e (powerpc: POWER7 optimised memcpy using VMX and
enhanced prefetch) uses VMX when it is safe to do so (ie not in
interrupt). It also looks at the task struct to decide if we have to
save the current tasks' VMX state.

kexec calls memcpy() at a point where the task struct may have been
overwritten by the new kexec segments. If it has been overwritten
then when memcpy -> enable_altivec looks up current->thread.regs->msr
we get a cryptic oops or lockup.

I also notice we aren't initialising thread_info->cpu, which means
smp_processor_id is broken. Fix that too.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:23 +10:00
Aneesh Kumar K.V 83d5e64b7e powerpc: Fix build errors STRICT_MM_TYPECHECKS
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:20 +10:00
Aneesh Kumar K.V 613e60a661 powerpc/mm: Use the correct mask value when looking at pgtable address
Our pgtable are 2*sizeof(pte_t)*PTRS_PER_PTE which is PTE_FRAG_SIZE.
Instead of depending on frag size, mask with PMD_MASKED_BITS.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:17 +10:00
Linus Torvalds c4cc75c332 Merge git://git.infradead.org/users/eparis/audit
Pull audit changes from Eric Paris:
 "Al used to send pull requests every couple of years but he told me to
  just start pushing them to you directly.

  Our touching outside of core audit code is pretty straight forward.  A
  couple of interface changes which hit net/.  A simple argument bug
  calling audit functions in namei.c and the removal of some assembly
  branch prediction code on ppc"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: fix message spacing printing auid
  Revert "audit: move kaudit thread start from auditd registration to kaudit init"
  audit: vfs: fix audit_inode call in O_CREAT case of do_last
  audit: Make testing for a valid loginuid explicit.
  audit: fix event coverage of AUDIT_ANOM_LINK
  audit: use spin_lock in audit_receive_msg to process tty logging
  audit: do not needlessly take a lock in tty_audit_exit
  audit: do not needlessly take a spinlock in copy_signal
  audit: add an option to control logging of passwords with pam_tty_audit
  audit: use spin_lock_irqsave/restore in audit tty code
  helper for some session id stuff
  audit: use a consistent audit helper to log lsm information
  audit: push loginuid and sessionid processing down
  audit: stop pushing loginid, uid, sessionid as arguments
  audit: remove the old depricated kernel interface
  audit: make validity checking generic
  audit: allow checking the type of audit message in the user filter
  audit: fix build break when AUDIT_DEBUG == 2
  audit: remove duplicate export of audit_enabled
  Audit: do not print error when LSMs disabled
  ...
2013-05-11 14:29:11 -07:00
Linus Torvalds 3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00
Scott Wood 4a3b8d0b83 powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling
lockdep.c has this:
        /*
         * So we're supposed to get called after you mask local IRQs,
         * but for some reason the hardware doesn't quite think you did
         * a proper job.
         */
	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
		return;

Since irqs_disabled() is based on soft_enabled(), that (not just the
hard EE bit) needs to be 0 before we call trace_hardirqs_off.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-05-10 17:01:09 +10:00
Benjamin Herrenschmidt 73ed148aea powerpc/powernv: Improve kexec reliability
We add a machine_shutdown hook that frees the OPAL interrupts
(so they get masked at the source and don't fire while kexec'ing)
and which triggers an IODA reset on all the PCIe host bridges
which will have the effect of blocking all DMAs and subsequent
PCIs interrupts.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-10 16:59:18 +10:00
Al Viro 91c2e0bcae unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-09 13:46:38 -04:00
Linus Torvalds 5647ac0ad4 Removal of GENERIC_GPIO for v3.10
GENERIC_GPIO now synonymous with GPIOLIB. There are no longer any valid
 cases for enableing GENERIC_GPIO without GPIOLIB, even though it is
 possible to do so which has been causing confusion and breakage. This
 branch does the work to completely eliminate GENERIC_GPIO.
 
 However, it is not trivial to just create a branch to remove it. Over
 the course of the v3.9 cycle more code referencing GENERIC_GPIO has been
 added to linux-next that conflicts with this branch. The following must
 be done to resolve the conflicts when merging this branch into mainline:
 
 * "git grep CONFIG_GENERIC_GPIO" should return 0 hits. Matches should be
   replaced with CONFIG_GPIOLIB
 * "git grep '\bGENERIC_GPIO\b'" should return 1 hit in the Chinese
   documentation.
 * Selectors of GENERIC_GPIO should be turned into selectors of GPIOLIB
 * definitions of the option in architecture Kconfig code should be deleted.
 
 Stephen has 3 merge fixup patches[1] that do the above. They are currently
 applicable on mainline as of May 2nd.
 
 [1] http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg428056.html
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRifUnAAoJEEFnBt12D9kBs2YP/0U6+ia+xYvkVaJc28PDVIzn
 OReZNcJOYU8D5voxz0voaRD0EdcPwjbMu9Kp9aXMHlk4VxevF+8jCc/us0bIjtO1
 VcB5VmSCIhMhxdnBlum11Mk7Vr5MCweyl9NBsypnPt8cl4obMBZHf2yzoodFktNb
 wtyYlOb6FALtc6iDbOO6dG3w9F7FAOLvskUFzdv89m8mupTsBu9jw9NqFDbJHOex
 rxq0Sdd+kWF/nkJVcV5Y6jIdletRlhpipefMJ9diexreHvwqh+c4kJEYZaXgB5+m
 ha95cPbReK1d+RqzM3A8d4irzSVSmq4k7ijI6QkFOr48+AH7XsgKv5so885LKzMN
 IIXg2Phm9i0H8+ecEvhcc4oIYBHJiEKK54Y0qUD9dqbFoDGPTCSqMHdSSMbpAY+J
 bIIXlVzj1En3PPNUJLPt8q8Qz6WxCT9mDST3QSGYnD4o90HT+1R9j92RxGL6McOq
 rUOyJDwmzFvpBvKK4raGdOU435M+ps2NPKKNIRaIGQPPY9rM1kN4YqvhXukEsC9L
 3a3+3cQLh7iKxBHncxeQsJfethP1CPkJnzvF9r+ZZLf2rcPH4pbQIE2uO0XnX/nd
 5/DKi0nGgAJ//GMMzdo3RiOA5zGFjIZ/KMvfhQldpP6qFJRhqdGi6FPlAcwr1z1n
 YnCByPwwlvfC4LTXFOGL
 =xodc
 -----END PGP SIGNATURE-----

Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux

Pull removal of GENERIC_GPIO from Grant Likely:
 "GENERIC_GPIO now synonymous with GPIOLIB.  There are no longer any
  valid cases for enableing GENERIC_GPIO without GPIOLIB, even though it
  is possible to do so which has been causing confusion and breakage.
  This branch does the work to completely eliminate GENERIC_GPIO."

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux:
  gpio: update gpio Chinese documentation
  Remove GENERIC_GPIO config option
  Convert selectors of GENERIC_GPIO to GPIOLIB
  blackfin: force use of gpiolib
  m68k: coldfire: use gpiolib
  mips: pnx833x: remove requirement for GENERIC_GPIO
  openrisc: default GENERIC_GPIO to false
  avr32: default GENERIC_GPIO to false
  xtensa: remove explicit selection of GENERIC_GPIO
  sh: replace CONFIG_GENERIC_GPIO by CONFIG_GPIOLIB
  powerpc: remove redundant GENERIC_GPIO selection
  unicore32: default GENERIC_GPIO to false
  unicore32: remove unneeded select GENERIC_GPIO
  arm: plat-orion: use GPIO driver on CONFIG_GPIOLIB
  arm: remove redundant GENERIC_GPIO selection
  mips: alchemy: require gpiolib
  mips: txx9: change GENERIC_GPIO to GPIOLIB
  mips: loongson: use GPIO driver on CONFIG_GPIOLIB
  mips: remove redundant GENERIC_GPIO select
2013-05-09 09:59:16 -07:00
Benjamin Herrenschmidt 1de1455f33 powerpc/powernv: Properly drop characters if console is closed
If the firmware returns an error such as "closed" (or hardware
error), we should drop characters.

Currently we only do that when a firmware compatible with OPAL v2
APIs is detected, in the code that calls opal_console_write_buffer_space(),
which didn't exist with OPAL v1 (or didn't work).

However, when enabling early debug consoles, the flag indicating
that v2 is supported isn't set yet, causing us, in case of errors
or closed console, to spin forever.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-08 14:14:26 +10:00
Alistair Popple 30650239ad powerpc: Add an in memory udbg console
This patch adds a new udbg early debug console which utilises
statically defined input and output buffers stored within the kernel
BSS. It is primarily designed to assist with bring up of new hardware
which may not have a working console but which has a method of
reading/writing kernel memory.

This version incorporates comments made by Ben H (thanks!).

Changes from v1:
	- Add memory barriers.
	- Ensure updating of read/write positions is atomic.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-08 06:36:49 +10:00
Benjamin Herrenschmidt 5737789c83 powerpc: Make hard_irq_disable() do the right thing vs. irq tracing
If hard_irq_disable() is called while interrupts are already soft-disabled
(which is the most common case) all is already well.

However you can (and in some cases want) to call it while everything is
enabled (to make sure you don't get a lazy even, for example before entry
into KVM guests) and in this case we need to inform the irq tracer that
the irqs are going off.

We have to change the inline into a macro to avoid an include circular
dependency hell hole.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-07 17:13:57 +10:00
Benjamin Herrenschmidt d5dae72130 powerpc/topology: Fix spurr attribute permission
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 15:02:40 +10:00
Benjamin Herrenschmidt 3fd47f063b powerpc/pci: Support per-aperture memory offset
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference betwen CPU memory addresses and PCI MMIO addresses.

This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.

This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 13:40:40 +10:00
Benjamin Herrenschmidt 342d6666f7 powerpc/cell/iommu: Improve error message for missing node
Some devices don't have a correct node ID and thus can't be
attached to an iommu.

The message displayed by the iommu code isn't very useful if
you don't have a device-tree node as it tries to print the
device-tree path but not the struct device name.

Improve this by printing the device name as well.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 12:03:49 +10:00
Benjamin Herrenschmidt 96cf3f66de powerpc/cell/spufs: Fix status attribute permission
We are registering the attribute with permission 0644 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 12:02:05 +10:00
Benjamin Herrenschmidt 5fe0c1f2f0 irqdomain: Allow quiet failure mode
Some interrupt controllers refuse to map interrupts marked as
"protected" by firwmare. Since we try to map everyting in the
device-tree on some platforms, we end up with a lot of nasty
WARN's in the boot log for what is a normal situation on those
machines.

This defines a specific return code (-EPERM) from the host map()
callback which cause irqdomain to fail silently.

MPIC is updated to return this when hitting a protected source
printing only a single line message for diagnostic purposes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 11:37:43 +10:00
Benjamin Herrenschmidt f3d40c2532 powerpc/pnv: Fix "compatible" property for P8 PHB
The property should be "ibm,power8-pciex", not "ibm,p8-pciex". The latter
was changed in FW because it was inconsistent with the rest of the nodes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:42 +10:00
Benjamin Herrenschmidt a0b8e76fac powerpc/pci: Don't add bogus empty resources to PHBs
When converting to use the new pci_add_resource_offset() we didn't
properly account for empty resources (0 flags) and add those bogons
to the PHBs. The result is some annoying messages in the log.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:41 +10:00
Benjamin Herrenschmidt 4ea9008b75 powerpc/powerpnv: Properly handle failure starting CPUs
If OPAL returns an error, propagate it upward rather than spinning
seconds waiting for a CPU that will never show up

Signed-off-by: Benjamin Herrenschmidt  <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:41 +10:00
Nishanth Aravamudan 748645bf06 powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:40 +10:00
Nishanth Aravamudan 6c1bf48275 powerpc/cputable: Advertise ISEL support on appropriate embedded processors
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:39 +10:00
Nishanth Aravamudan 4a1ae4f3c5 powerpc/cputable: Advertise DSCR support on P7/P7+
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:39 +10:00
Nishanth Aravamudan cbbc6f1b14 powerpc/cputable: Reserve bits in HWCAP2 for new features
Also, make HTM's presence dependent on the .config option.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:38 +10:00
Kleber Sacilotto de Souza d82fb31abc powerpc/pseries: Perform proper max_bus_speed detection
On pseries machines the detection for max_bus_speed should be done
through an OpenFirmware property. This patch adds a function to perform
this detection and a hook to perform dynamic adding of the function only
for pseries. This is done by overwriting the weak
pcibios_root_bridge_prepare function which is called by
pci_create_root_bus().

From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:38 +10:00
Brian King e61133dda4 powerpc/pseries: Force 32 bit MSIs for devices that require it
The following patch implements a new PAPR change which allows
the OS to force the use of 32 bit MSIs, regardless of what
the PCI capabilities indicate. This is required for some
devices that advertise support for 64 bit MSIs but don't
actually support them.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:37 +10:00
Michael Neuling c2fd22df89 powerpc/tm: Fix null pointer deference in flush_hash_page
Make sure that current->thread.reg exists before we deference it in
flush_hash_page.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: John J Miller <millerjo@us.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:36 +10:00
Jeremy Kerr c4463b3738 powerpc/powernv: Defer OPAL exception handler registration
Currently, the OPAL exception vectors are registered before the feature
fixups are processed. This means that the now-firmware-owned vectors
will likely be overwritten by the kernel.

This change moves the exception registration code to an early initcall,
rather than at machine_init time.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:36 +10:00
Anton Blanchard 73d2fb758e powerpc: Emulate non privileged DSCR read and write
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.

Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.

A simple test was created to verify the fix:

http://ozlabs.org/~anton/junkcode/user_dscr_test.c

Without the patch we get a SIGILL and it passes with the patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:35 +10:00
Linus Torvalds 01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Linus Torvalds 173192958d MMC highlights for 3.10:
Core:
  - Introduce MMC_CAP2_NO_PRESCAN_POWERUP to allow skipping mmc_power_up()
    at boot/initialization time if it's already happened, for performance
    (faster boot time) reasons.
  - Fix a bit width test failure that resulted in old eMMC cards being put
    into 1-bit mode when 4-bit mode was available.
  - Expose fwrev/hwrev for MMCv4 parts.
  - Improve card removal logic in the case where the card's removed slowly;
    we were missing card removal events if the card retained contact with
    the slot pads for long enough to reply to a CMD13 while being removed.
 
 Drivers:
  - davinci_mmc: Support using PIO instead of DMA.
  - dw_mmc: Add support for Exynos4412.
  - mxcmmc: DT support, use slot-gpio API.
  - mxs-mmc: Add broken-cd/cd-inverted/non-removable DT property support.
  - sdhci-sirf: New sdhci-pltfm driver for CSR SiRF SoCs:
     SiRFprimaII: unicore ARM Cortex-A9
     SiRFatlas6: unicore ARM Cortex-A9
     SiRFmarco: dual core ARM Cortex-A9 SMP
  - sdhci-tegra: Add support for Tegra114 platforms, use mmc_of_parse().
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRfr+qAAoJEHNBYZ7TNxYMG90QALW6ArdTxkfNKNybCx/s3ttm
 hyP2BB9eXF/FbZ9bGHGuv9NXV/4YZ4yr4c3yV4paOMW3s0z/+IhqajLfCgPFPCGK
 MOk5GAStN/+znBjLy+5V/17EJuVpE+WVwUqDg3RafB8NGQtaClymwPg6R+SMWwL9
 uiRcCVgmgzlk9l+4Z1p9wHUeQlhZQ/Okzj4CjkOpWfgsUpo0PcWQRosd+CHHirWz
 iWnhFBM1cP4Nj59ashYSedu797s29pKGd31JcuWWQE228iykGfcCJWzMr9YMiU6x
 LwESeuTST8b9HblI7WYnFvpVvD9qTuBkkKMIKMNb4E34xBf8+H2zUh1hK7iEnDXt
 W0F1xy3WsxpJ1yF0BY9SqqnNeLpJBOenHlTZfGlf5r+kHWYUUQvvqw/ZdmWStRgv
 mSoqXSmz/oM3rn1KjMritJ9oyr0EH8kHDMAsoiCF0EuGipmenFTdJJiwbh0JZJxK
 bjqPUROtSK8TdPvTw1BPRX5YtJLzsG4HdsI3C8W05BTbCL9iIE1NaWP27nikPVfq
 glt9EilzgR252Wc649OrB7JAHBZjlVFIndNzjziGdxNzH6m1AK05qxTL5hBvba99
 plccrsVMDGWLSmsGVwz7Eo+TLflRCdtrE+7O4NrlydZzp2t24HZrMc9rN8GDWQQE
 dP91YUdpDYrTsUBKiKGQ
 =Pr7P
 -----END PGP SIGNATURE-----

Merge tag 'mmc-updates-for-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC update from Chris Ball:
 "MMC highlights for 3.10:

  Core:
   - Introduce MMC_CAP2_NO_PRESCAN_POWERUP to allow skipping
     mmc_power_up() at boot/initialization time if it's already
     happened, for performance (faster boot time) reasons.
   - Fix a bit width test failure that resulted in old eMMC cards being
     put into 1-bit mode when 4-bit mode was available.
   - Expose fwrev/hwrev for MMCv4 parts.
   - Improve card removal logic in the case where the card's removed
     slowly; we were missing card removal events if the card retained
     contact with the slot pads for long enough to reply to a CMD13
     while being removed.

  Drivers:
   - davinci_mmc: Support using PIO instead of DMA.
   - dw_mmc: Add support for Exynos4412.
   - mxcmmc: DT support, use slot-gpio API.
   - mxs-mmc: Add broken-cd/cd-inverted/non-removable DT property
     support.
   - sdhci-sirf: New sdhci-pltfm driver for CSR SiRF SoCs:
       SiRFprimaII: unicore ARM Cortex-A9
       SiRFatlas6: unicore ARM Cortex-A9
       SiRFmarco: dual core ARM Cortex-A9 SMP
   - sdhci-tegra: Add support for Tegra114 platforms, use
     mmc_of_parse()"

* tag 'mmc-updates-for-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (66 commits)
  mmc: sdhci-tegra: fix MODULE_DEVICE_TABLE
  mmc: core: fix init controller performance regression, updated patch
  mmc: mxcmmc: enable DMA support on mpc512x
  mmc: mxcmmc: constify mxcmci_devtype
  mmc: mxcmmc: use slot-gpio API for write-protect detection
  mmc: mxcmmc: add mpc512x SDHC support
  mmc: mxcmmc: fix race conditions for host->req and host->data access
  mmc: mxcmmc: DT support
  mmc: dw_mmc: let device core setup the default pin configuration
  mmc: mxs-mmc: add broken-cd property
  mmc: mxs-mmc: add non-removable property
  mmc: mxs-mmc: add cd-inverted property
  mmc: core: call pm_runtime_put_noidle in pm_runtime_get_sync failed case
  mmc: mxcmmc: Fix bug when card is present during boot
  mmc: core: fix performance regression initializing MMC host controllers
  Revert "mmc: core: wait while adding MMC host to ensure root mounts successfully"
  mmc: atmel-mci: pio hang on block errors
  mmc: core: Fix bit width test failing on old eMMC cards
  mmc: dw_mmc: Use pr_info instead of printk
  mmc: dw_mmc: Check return value of regulator_enable
  ...
2013-05-04 13:45:17 -07:00
Linus Torvalds 5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Paul Mackerras 5975a2e095 KVM: PPC: Book3S: Add API for in-kernel XICS emulation
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it.  The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source.  The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:36 +02:00
Wei Yongjun d133b40f2c kvm/ppc/mpic: fix missing unlock in set_base_addr()
Add the missing unlock before return from function set_base_addr()
when disables the mapping.

Introduced by commit 5df554ad5b
(kvm/ppc/mpic: in-kernel MPIC emulation)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:35 +02:00
Scott Wood ed840ee9c8 kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
These functions do an srcu_dereference without acquiring the srcu lock
themselves.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:35 +02:00
Scott Wood 1d6f6b7339 kvm/ppc/mpic: remove users
This is an unused (no pun intended) leftover from when this code did
reference counting.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:34 +02:00
Scott Wood 398d87836e kvm/ppc/mpic: fix mmio region lists when multiple guests used
Keeping a linked list of statically defined objects doesn't work
very well when we have multiple guests. :-P

Switch to an array of constant objects.  This fixes a hang when
multiple guests are used.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: remove struct list_head from mem_reg]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:33 +02:00
Linus Torvalds 20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Aneesh Kumar K.V 54d5999d98 powerpc: Fix build error for book3e
We moved the definition of shift_to_mmu_psize and mmu_psize_to_shift
out of hugetlbpage.c in patch "powerpc: New hugepage directory format".

These functions are not related to hugetlbpage and we want to use them
outside hugetlbpage.c We missed a definition for book3e when we moved
these functions. Add similar functions to mmu-book3e.h

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:37:37 +10:00
Michael Ellerman 9353374b8e powerpc: Context switch the new EBB SPRs
This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:37:36 +10:00
Michael Neuling 1ddf499e1a powerpc: Turn on the EBB H/FSCR bits
This turns Event Based Branching (EBB) on in the Hypervisor Facility Status and
Control Register (HFSCR) and Facility Status and Control Register (FSCR).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:36:55 +10:00
Michael Ellerman 1de2bd4e0c powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
We are getting low on cpu feature bits. So rather than add a separate bit for
every new Power8 feature, add a bit for arch 2.07 server catagory and use that
instead.

Hijack the value we had for BCTAR, but swap the value with CFAR so that all the
ARCH defines are together.

Note we don't touch CPU_FTR_TM, because it is conditionally enabled if
the kernel is built with TM support.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:16 +10:00
Anshuman Khandual 53b56ca019 powerpc: Setup BHRB instructions facility in HFSCR for POWER8
Make BHRB instructions available in problem and privileged states.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:15 +10:00
Bharat Bhushan fc2a6cfe05 powerpc: Fix interrupt range check on debug exception
We do not want to take single step and branch-taken debug exception
in kernel exception code. But the address range check was not covering
all kernel exception handlers address range.

With this patch we defined the interrupt_end label which defines the
end on kernel exception code. So now we check interrupt_base to
interrupt_end range for not handling debug exception in kernel
exception entry.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:31:01 +10:00
David Howells 4c23782091 ppc: Clean up scanlog
Clean up the pseries scanlog driver's use of procfs:

 (1) Don't need to save the proc_dir_entry pointer as we have the filename to
     remove with.

 (2) Save the scan log buffer pointer in a static variable (there is only one
     of it) and don't save it in the PDE (which doesn't have a destructor).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: Paul Mackerras <paulus@samba.org>
cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:45 -04:00
David Howells e8eeded3c5 ppc: Clean up rtas_flash driver somewhat
Clean up some of the problems with the rtas_flash driver:

 (1) It shouldn't fiddle with the internals of the procfs filesystem (altering
     pde->count).

 (2) If pid namespaces are in effect, then you can get multiple inodes
     connected to a single pde, thereby rendering the pde->count > 2 test
     useless.

 (3) The pde->count fudging doesn't work for forked, dup'd or cloned file
     descriptors, so add static mutexes and use them to wrap access to the
     driver through read, write and release methods.

 (4) The driver can only handle one device, so allocate most of the data
     previously attached to the pde->data as static variables instead (though
     allocate the validation data buffer with kmalloc).

 (5) We don't need to save the pde pointers as long as we have the filenames
     available for removal.

 (6) Don't try to multiplex what the update file read method does based on the
     filename.  Instead provide separate file ops and split the function.

Whilst we're at it, tabulate the procfile information and loop through it when
creating or destroying them rather than manually coding each one.

[Folded fixes from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: Paul Mackerras <paulus@samba.org>
cc: Anton Blanchard <anton@samba.org>
cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:45 -04:00
David Howells 271a15eabe proc: Supply PDE attribute setting accessor functions
Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:18 -04:00
Linus Torvalds 73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Linus Torvalds 08d7676083 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull compat cleanup from Al Viro:
 "Mostly about syscall wrappers this time; there will be another pile
  with patches in the same general area from various people, but I'd
  rather push those after both that and vfs.git pile are in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  syscalls.h: slightly reduce the jungles of macros
  get rid of union semop in sys_semctl(2) arguments
  make do_mremap() static
  sparc: no need to sign-extend in sync_file_range() wrapper
  ppc compat wrappers for add_key(2) and request_key(2) are pointless
  x86: trim sys_ia32.h
  x86: sys32_kill and sys32_mprotect are pointless
  get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
  merge compat sys_ipc instances
  consolidate compat lookup_dcookie()
  convert vmsplice to COMPAT_SYSCALL_DEFINE
  switch getrusage() to COMPAT_SYSCALL_DEFINE
  switch epoll_pwait to COMPAT_SYSCALL_DEFINE
  convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
  switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
  make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
  make HAVE_SYSCALL_WRAPPERS unconditional
  consolidate cond_syscall and SYSCALL_ALIAS declarations
  teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
  get rid of duplicate logics in __SC_....[1-6] definitions
2013-05-01 07:21:43 -07:00
Linus Torvalds 5f56886521 Merge branch 'akpm' (incoming from Andrew)
Merge third batch of fixes from Andrew Morton:
 "Most of the rest.  I still have two large patchsets against AIO and
  IPC, but they're a bit stuck behind other trees and I'm about to
  vanish for six days.

   - random fixlets
   - inotify
   - more of the MM queue
   - show_stack() cleanups
   - DMI update
   - kthread/workqueue things
   - compat cleanups
   - epoll udpates
   - binfmt updates
   - nilfs2
   - hfs
   - hfsplus
   - ptrace
   - kmod
   - coredump
   - kexec
   - rbtree
   - pids
   - pidns
   - pps
   - semaphore tweaks
   - some w1 patches
   - relay updates
   - core Kconfig changes
   - sysrq tweaks"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  Documentation/sysrq: fix inconstistent help message of sysrq key
  ethernet/emac/sysrq: fix inconstistent help message of sysrq key
  sparc/sysrq: fix inconstistent help message of sysrq key
  powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
  ARM/etm/sysrq: fix inconstistent help message of sysrq key
  power/sysrq: fix inconstistent help message of sysrq key
  kgdb/sysrq: fix inconstistent help message of sysrq key
  lib/decompress.c: fix initconst
  notifier-error-inject: fix module names in Kconfig
  kernel/sys.c: make prctl(PR_SET_MM) generally available
  UAPI: remove empty Kbuild files
  menuconfig: print more info for symbol without prompts
  init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
  kconfig menu: move Virtualization drivers near other virtualization options
  Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
  relay: use macro PAGE_ALIGN instead of FIX_SIZE
  kernel/relay.c: move FIX_SIZE macro into relay.c
  kernel/relay.c: remove unused function argument actor
  drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
  drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
  ...
2013-04-30 17:37:43 -07:00
zhangwei(Jovi) 90a102e59a powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
Currently help message of /proc/sysrq-trigger highlight its
upper-case characters, like below:

      SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
      memory-full-oom-kill(F) kill-all-tasks(I) ...

this would confuse user trigger sysrq by upper-case character, which is
inconsistent with the real lower-case character registed key.

This inconsistent help message will also lead more confused when
26 upper-case letters put into use in future.

This patch fix powerpc xmon sysrq key: "xmon(x)"

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:10 -07:00
Tejun Heo a43cb95d54 dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms.  This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.

show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.

* Archs which didn't print debug info now do.

  alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
  metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
  um, xtensa

* Already prints debug info.  Replaced with show_regs_print_info().
  The printed information is superset of what used to be there.

  arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

* s390 is special in that it used to print arch-specific information
  along with generic debug info.  Heiko and Martin think that the
  arch-specific extra isn't worth keeping s390 specfic implementation.
  Converted to use the generic version.

Note that now all archs print the debug info before actual register
dumps.

An example BUG() dump follows.

 kernel BUG at /work/os/work/kernel/workqueue.c:4841!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
 RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
 RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
 RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
  0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
  ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
 Call Trace:
  [<ffffffff81000312>] do_one_initcall+0x122/0x170
  [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  [<ffffffff81c4776e>] kernel_init+0xe/0xf0
  [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  ...

v2: Typo fix in x86-32.

v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it.  s390
    specific implementation dropped as requested by s390 maintainers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo 196779b9b4 dump_stack: consolidate dump_stack() implementations and unify their behaviors
Both dump_stack() and show_stack() are currently implemented by each
architecture.  show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack().  On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.

The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().

There's no reason to require archs to implement two separate but mostly
identical functions.  It leads to unnecessary subtle information.

This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin.  Blackfin's
dump_stack() does something wonky that I don't understand.

Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information.  This is used
in blackfin.

This patch brings the following behavior changes.

* On some archs, an extra level in backtrace for show_stack() could be
  printed.  This is because the top frame was determined in
  dump_stack() on those archs while generic dump_stack() can't do that
  reliably.  It can be compensated by inlining dump_stack() but not
  sure whether that'd be necessary.

* Most archs didn't use to print debug info on dump_stack().  They do
  now.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Hardware name: empty
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a071>] init_workqueues+0x35/0x505
  ...

v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack().  This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c.  Because linkage is per objecct file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too.  v1
    The v1 patch broke build on blackfin due to this issue.  The build
    breakage was reported by Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Linus Torvalds 3ed1c478ef Power management and ACPI updates for 3.10-rc1
- ARM big.LITTLE cpufreq driver from Viresh Kumar.
 
 - exynos5440 cpufreq driver from Amit Daniel Kachhap.
 
 - cpufreq core cleanup and code consolidation from Viresh Kumar and
   Stratos Karafotis.
 
 - cpufreq scalability improvement from Nathan Zimmer.
 
 - AMD "frequency sensitivity feedback" powersave bias for the ondemand
   cpufreq governor from Jacob Shin.
 
 - cpuidle code consolidation and cleanups from Daniel Lezcano.
 
 - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano.
 
 - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim,
   Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto.
 
 - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle,
   Yasuaki Ishimatsu, and Rafael J. Wysocki.
 
 - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements
   from Rafael J. Wysocki and Andy Shevchenko.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJRf8M8AAoJEKhOf7ml8uNsud4P/3cabXP5lDipzibRrpOiONse
 puuvIdhtNdMRMc3t1oSDjNH/w/JA51Gc+ICGFAORiyVmqxBc85mpT6J5ibqV7hNd
 pCqbKJceoB5PajHZSx22e4wG9O7YN1k3r80p38IfFzA+Ct0KNSuE0ixMEfHKYjiq
 p5pXswk6TY3gtBReH9agrafHqDtXw4IMTE0asMuJ+BorPW7vQeiNlrkuA+0qmDuu
 26O0Pm2TVkx1ryfTjdM9zSZ9X2G4JuM8rm1/VFZWQJTExwlv3bA2Za1nvQNJlJ99
 6JZ0JXfAehcEW2Ye0sqsZ8HSEabDVHM29QvvOszJ5RpBXERiOCHOkhvFleCoTpn0
 Xq0rtXPrLMH1G28Ej+cxmsAjfzOLV2Byg30CAoI/GCLuQ+xh+VMCpuNYQuld25CG
 9rtYd0fWESeYsAebhDcX0E3xyzJtbrHtOb9PyGwNkbAJ8YQfhVSMCOPi2SX2wa+Q
 qXLXi2VaHvjBSUKcAv5BmM+Ya57Be+88D0LxbgXbUeOnYefUK1ljldKDDshkMjgG
 P4LPdm4JpoB5ncXSOO1Dz9w9QnNcFexSUySd/TtKLNMha1vEHV8ISzNPYY+9IdXf
 XN0VZbFnUDzdj+Fwna7zyFb1cGihDYJKAtpXvRd8Y6RGUxKx9uGLAFJZw/xZB/cR
 KZKuML5O8MgJuef37F38
 =H/se
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael J Wysocki:

 - ARM big.LITTLE cpufreq driver from Viresh Kumar.

 - exynos5440 cpufreq driver from Amit Daniel Kachhap.

 - cpufreq core cleanup and code consolidation from Viresh Kumar and
   Stratos Karafotis.

 - cpufreq scalability improvement from Nathan Zimmer.

 - AMD "frequency sensitivity feedback" powersave bias for the ondemand
   cpufreq governor from Jacob Shin.

 - cpuidle code consolidation and cleanups from Daniel Lezcano.

 - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano.

 - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv
   Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto.

 - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle,
   Yasuaki Ishimatsu, and Rafael J Wysocki.

 - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from
   Rafael J Wysocki and Andy Shevchenko.

* tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (192 commits)
  cpufreq: Revert incorrect commit 5800043
  cpufreq: MAINTAINERS: Add co-maintainer
  cpuidle: add maintainer entry
  ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points
  ARM: s3c64xx: cpuidle: use init/exit common routine
  cpufreq: pxa2xx: initialize variables
  ACPI: video: correct acpi_video_bus_add error processing
  SH: cpuidle: use init/exit common routine
  ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
  ACPI: Fix wrong parameter passed to memblock_reserve
  cpuidle: fix comment format
  pnp: use %*phC to dump small buffers
  isapnp: remove debug leftovers
  ARM: imx: cpuidle: use init/exit common routine
  ARM: davinci: cpuidle: use init/exit common routine
  ARM: kirkwood: cpuidle: use init/exit common routine
  ARM: calxeda: cpuidle: use init/exit common routine
  ARM: tegra: cpuidle: use init/exit common routine for tegra3
  ARM: tegra: cpuidle: use init/exit common routine for tegra2
  ARM: OMAP4: cpuidle: use init/exit common routine
  ...
2013-04-30 15:21:02 -07:00
Linus Torvalds 5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
Linus Torvalds 8700c95adb Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
 "This is a pretty large, multi-arch series unifying and generalizing
  the various disjunct pieces of idle routines that architectures have
  historically copied from each other and have grown in random, wildly
  inconsistent and sometimes buggy directions:

   101 files changed, 455 insertions(+), 1328 deletions(-)

  this went through a number of review and test iterations before it was
  committed, it was tested on various architectures, was exposed to
  linux-next for quite some time - nevertheless it might cause problems
  on architectures that don't read the mailing lists and don't regularly
  test linux-next.

  This cat herding excercise was motivated by the -rt kernel, and was
  brought to you by Thomas "the Whip" Gleixner."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  idle: Remove GENERIC_IDLE_LOOP config switch
  um: Use generic idle loop
  ia64: Make sure interrupts enabled when we "safe_halt()"
  sparc: Use generic idle loop
  idle: Remove unused ARCH_HAS_DEFAULT_IDLE
  bfin: Fix typo in arch_cpu_idle()
  xtensa: Use generic idle loop
  x86: Use generic idle loop
  unicore: Use generic idle loop
  tile: Use generic idle loop
  tile: Enter idle with preemption disabled
  sh: Use generic idle loop
  score: Use generic idle loop
  s390: Use generic idle loop
  powerpc: Use generic idle loop
  parisc: Use generic idle loop
  openrisc: Use generic idle loop
  mn10300: Use generic idle loop
  mips: Use generic idle loop
  microblaze: Use generic idle loop
  ...
2013-04-30 07:50:17 -07:00
Linus Torvalds e0972916e8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Features:

   - Add "uretprobes" - an optimization to uprobes, like kretprobes are
     an optimization to kprobes.  "perf probe -x file sym%return" now
     works like kretprobes.  By Oleg Nesterov.

   - Introduce per core aggregation in 'perf stat', from Stephane
     Eranian.

   - Add memory profiling via PEBS, from Stephane Eranian.

   - Event group view for 'annotate' in --stdio, --tui and --gtk, from
     Namhyung Kim.

   - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.

   - Add Ivy Bridge-EP uncore support, by Zheng Yan

   - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.

   - Add perf test entries for checking breakpoint overflow signal
     handler issues, from Jiri Olsa.

   - Add perf test entry for for checking number of EXIT events, from
     Namhyung Kim.

   - Add perf test entries for checking --cpu in record and stat, from
     Jiri Olsa.

   - Introduce perf stat --repeat forever, from Frederik Deweerdt.

   - Add --no-demangle to report/top, from Namhyung Kim.

   - PowerPC fixes plus a couple of cleanups/optimizations in uprobes
     and trace_uprobes, by Oleg Nesterov.

  Various fixes and refactorings:

   - Fix dependency of the python binding wrt libtraceevent, from
     Naohiro Aota.

   - Simplify some perf_evlist methods and to allow 'stat' to share code
     with 'record' and 'trace', by Arnaldo Carvalho de Melo.

   - Remove dead code in related to libtraceevent integration, from
     Namhyung Kim.

   - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
     sched lat' back working, by Arnaldo Carvalho de Melo

   - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
     de Melo.

   - Kill a bunch of die() calls, from Namhyung Kim.

   - Fix build on non-glibc systems due to libio.h absence, from Cody P
     Schafer.

   - Remove some perf_session and tracing dead code, from David Ahern.

   - Honor parallel jobs, fix from Borislav Petkov

   - Introduce tools/lib/lk library, initially just removing duplication
     among tools/perf and tools/vm.  from Borislav Petkov

  ... and many more I missed to list, see the shortlog and git log for
  more details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
  perf/x86/intel/P4: Robistify P4 PMU types
  perf/x86/amd: Fix AMD NB and L2I "uncore" support
  perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
  perf/x86: Check all MSRs before passing hw check
  perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
  perf/x86/intel: Add Ivy Bridge-EP uncore support
  perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
  perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
  uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
  uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
  uprobes/tracing: Change create_trace_uprobe() to support uretprobes
  uprobes/tracing: Make seq_printf() code uretprobe-friendly
  uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
  uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
  uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
  uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
  uprobes/tracing: Generalize struct uprobe_trace_entry_head
  uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
  uprobes/tracing: Kill the pointless seq_print_ip_sym() call
  uprobes/tracing: Kill the pointless task_pt_regs() calls
  ...
2013-04-30 07:41:01 -07:00
Aneesh Kumar K.V 1f6aaaccb1 powerpc: Update tlbie/tlbiel as per ISA doc
Encode the actual page correctly in tlbie/tlbiel. This make sure we handle
multiple page size segment correctly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:29 +10:00
Aneesh Kumar K.V 3dc4feca4b powerpc: Print page size info during boot
This gives hint about different base and actual page size combination
supported by the platform.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:25 +10:00
Aneesh Kumar K.V d8139ebf85 powerpc: print both base and actual page size on hash failure
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:22 +10:00
Aneesh Kumar K.V 7e74c3921a powerpc: Fix hpte_decode to use the correct decoding for page sizes
As per ISA doc, we encode base and actual page size in the LP bits of
PTE. The number of bit used to encode the page sizes depend on actual
page size.  ISA doc lists this as

   PTE LP     actual page size
rrrr rrrz 	>=8KB
rrrr rrzz	>=16KB
rrrr rzzz 	>=32KB
rrrr zzzz 	>=64KB
rrrz zzzz 	>=128KB
rrzz zzzz 	>=256KB
rzzz zzzz	>=512KB
zzzz zzzz 	>=1MB

ISA doc also says
"The values of the “z” bits used to specify each size, along with all possible
values of “r” bits in the LP field, must result in LP values distinct from
other LP values for other sizes."

based on the above update hpte_decode to use the correct decoding for LP bits.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:18 +10:00
Aneesh Kumar K.V b1022fbd29 powerpc: Decode the pte-lp-encoding bits correctly.
We look at both the segment base page size and actual page size and store
the pte-lp-encodings in an array per base page size.

We also update all relevant functions to take actual page size argument
so that we can use the correct PTE LP encoding in HPTE. This should also
get the basic Multiple Page Size per Segment (MPSS) support. This is needed
to enable THP on ppc64.

[Fixed PR KVM build --BenH]

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:14 +10:00
Aneesh Kumar K.V 74f227b228 powerpc: Use encode avpn where we need only avpn values
In all these cases we are doing something similar to

HPTE_V_COMPARE(hpte_v, want_v) which ignores the HPTE_V_LARGE bit

With MPSS support we would need actual page size to set HPTE_V_LARGE
bit and that won't be available in most of these cases. Since we are ignoring
HPTE_V_LARGE bit, use the  avpn value instead. There should not be any change
in behaviour after this patch.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:11 +10:00
Aneesh Kumar K.V 5c1f6ee9a3 powerpc: Reduce PTE table memory wastage
We allocate one page for the last level of linux page table. With THP and
large page size of 16MB, that would mean we are wasting large part
of that page. To map 16MB area, we only need a PTE space of 2K with 64K
page size. This patch reduce the space wastage by sharing the page
allocated for the last level of linux page table with multiple pmd
entries. We call these smaller chunks PTE page fragments and allocated
page, PTE page.

In order to support systems which doesn't have 64K HPTE support, we also
add another 2K to PTE page fragment. The second half of the PTE fragments
is used for storing slot and secondary bit information of an HPTE. With this
we now have a 4K PTE fragment.

We use a simple approach to share the PTE page. On allocation, we bump the
PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
request. This should help in the node locality of the PTE page fragment,
assuming that the immediate pte alloc request will mostly come from the
same NUMA node. We don't try to reuse the freed PTE page fragment. Hence
we could be waisting some space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:07 +10:00
Aneesh Kumar K.V d614bb0412 powerpc: Move the pte free routines from common header
Acked-by: Paul Mackerras <paulus@samba.org>

This patch moves the common code to 32/64 bit headers and also duplicate
4K_PAGES and 64K_PAGES section. We will later change the 64 bit 64K_PAGES
version to support smaller PTE fragments. The patch doesn't introduce
any functional changes.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:04 +10:00
Aneesh Kumar K.V 419df06eea powerpc: Reduce the PTE_INDEX_SIZE
This make one PMD cover 16MB range. That helps in easier implementation of THP
on power. THP core code make use of one pmd entry to track the hugepage and
the range mapped by a single pmd entry should be equal to the hugepage size
supported by the hardware.

This also switch PGD to cover 16GB. That is needed so that we can simplify the
hugetlb page walking code so that we have same pte format for explicit hugepage
and THP hugepage.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:00 +10:00
Aneesh Kumar K.V e2b3d202d1 powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation.
With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level.
That means with 32 bit process we cannot allocate normal pages at
all, because we cover the entire address space with one pgd entry. Fix this
by switching to a new page table format for hugepages. With the new page table
format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead
we encode the PTE information directly at the directory level. This forces 16MB
hugepage at PMD level. This will also make the page take walk much simpler later
when we add the THP support.

With the new table format we have 4 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(3) leaf pte for huge page, bottom two bits != 00
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:56 +10:00
Aneesh Kumar K.V cf9427b85e powerpc: New hugepage directory format
Change the hugepage directory format so that we can have leaf ptes directly
at page directory avoiding the allocation of hugepage directory.

With the new table format we have 3 cases for pgds and pmds:
(1) invalid (all zeroes)
(2) pointer to next table, as normal; bottom 6 bits == 0
(4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table

Instead of storing shift value in hugepd pointer we use mmu_psize_def index
so that we can fit all the supported hugepage size in 4 bits

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:53 +10:00
Aneesh Kumar K.V 0e5f35d0e4 powerpc: Don't truncate pgd_index wrongly
With PGD_INDEX_SIZE set to 12 the existing macro doesn't work. Fix it to
use PTRS_PER_PGD

The idea originally was to have one more bit in the result of
pgd_index() than PGD_INDEX_SIZE, so that if one had an address
corresponding to the last PGD entry, and then incremented that address
by PGD_SIZE, and took pgd_index() of that, you wouldn't end up with
zero.  The commit that introduced that dates back to 2002, and the
code that was sensitive to that edge case has long since been
refactored (several times), so there is no need for it these days.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:49 +10:00
Aneesh Kumar K.V cc3665a60a powerpc: Don't hard code the size of pte page
USE PTRS_PER_PTE to indicate the size of pte page. To support THP,
later patches will be changing PTRS_PER_PTE value.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:46 +10:00
Aneesh Kumar K.V ce54152f42 powerpc: Save DAR and DSISR in pt_regs on MCE
We were not saving DAR and DSISR on MCE. Save then and also print the values
along with exception details in xmon.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:42 +10:00
Aneesh Kumar K.V 4b8f63d92e powerpc: Use signed formatting when printing error
PAPR defines these errors as negative values. So print them accordingly
for easy debugging.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:39 +10:00
Nathan Fontenot 601abdc3b4 powerpc/pseries: Correct builds break when CONFIG_SMP not defined
Correct build failure for powerpc/pseries builds with CONFIG_SMP not defined.

The function cpu_sibling_mask has no meaning (or definition) when CONFIG_SMP
is not defined. Additionally, the updating of NUMA affinity for a CPU in a UP
system doesn't really make sense.

This patch ifdef's out the code making the affinity updates for PRRN events to
fix the following build break.

arch/powerpc/mm/numa.c: In function ‘stage_topology_update’:
arch/powerpc/mm/numa.c:1535: error: implicit declaration of function ‘cpu_sibling_mask’
arch/powerpc/mm/numa.c:1535: warning: passing argument 3 of ‘cpumask_or’ makes pointer from integer without a cast
make[1]: *** [arch/powerpc/mm/numa.o] Error 1

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:35 +10:00
Kevin Hao 177c19237b powerpc/booke: Remove obsolete macro FINISH_EXCEPTION
This is stale and not used by anyone now.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:32 +10:00
Vasant Hegde fb4696c395 powerpc/rtas_flash: Fix bad memory access
We use kmem_cache_alloc() to allocate memory to hold the new firmware
which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to
set memory to NULL. But these constructor is called only for newly
allocated slabs.

If we run below command multiple time without rebooting, allocator may
allocate memory from the area which was free'd by kmem_cache_free and
it will not call constructor. In this situation we may hit kernel oops.

dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096

oops message:
-------------
[ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries
[ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod
[ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58
[ 1602.399823] REGS: c0000003b9937930 TRAP: 0300   Not tainted  (3.9.0-rc4-0.27-ppc64)
[ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 22000428  XER: 20000000
[ 1602.399841] SOFTE: 1
[ 1602.399844] CFAR: c000000000005f24
[ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000
[ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3
GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08
GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080
GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58
GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20
GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff
GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010
GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808
GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7
[ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash]
[ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash]
[ 1602.399934] Call Trace:
[ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable)
[ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0
[ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238
[ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8
[ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0
[ 1602.399973] Instruction dump:
[ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000
[ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000
[ 1602.400012] ---[ end trace b4136d115dc31dac ]---
[ 1602.402178]
[ 1602.402185] Sending IPI to other CPUs
[ 1602.403329] IPI complete

This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to
allocate memory, which makes sure memory is set to 0 before using.
Also removes rtas_block_ctor(), which is no longer required.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:28 +10:00
Stephen Rothwell 9f3a90e89d powerpc: Fix build failure after merge of the cgroup tree
After merging the cgroup tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]

Caused by commit 30c05350c3 ("powerpc/pseries: Use stop machine to
update cpu maps") from the powerpc tree interacting with (probably)
commit ff794dea52 ("cpuset: remove include of cgroup.h from cpuset.h")
from the cgroup tree.  Removing includes from header files is fraught
with danger ...

The former should have added an include of linux/slab.h to
arch/powerpc/mm/numa.c.

I have added the following merge fix patch for today (but it should be
applied to the powerpc tree ASAP).

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 29 Apr 2013 14:01:44 +1000
Subject: [PATCH] powerpc: numa.c: using kzalloc/kfree requires including
 slab.h

fixes these build errors:

arch/powerpc/mm/numa.c: In function 'arch_update_cpu_topology':
arch/powerpc/mm/numa.c:1465:2: error: implicit declaration of function 'kzalloc' [-Werror=implicit-function-declaration]
arch/powerpc/mm/numa.c:1465:10: error: assignment makes pointer from integer without a cast [-Werror]
arch/powerpc/mm/numa.c:1497:2: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:24 +10:00
Michael Neuling d5bbe6596a powerpc: Fix usage of setup_pci_atmu()
Linux next is currently failing to compile mpc85xx_defconfig with:
  arch/powerpc/sysdev/fsl_pci.c:944:2: error: too many arguments to function 'setup_pci_atmu'

This is caused by (from Kumar's next branch):
  commit 34642bbb3d
  Author: Kumar Gala <galak@kernel.crashing.org>
  powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller

Which changed definition of setup_pci_atmu() but didn't update one of
the callers.  Below fixes this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:15:12 +10:00
Linus Torvalds 56847d857c Merge branch 'akpm' (incoming from Andrew)
Merge second batch of fixes from Andrew Morton:

 - various misc bits

 - some printk updates

 - a new "SRAM" driver.

 - MAINTAINERS updates

 - the backlight driver queue

 - checkpatch updates

 - a few init/ changes

 - a huge number of drivers/rtc changes

 - fatfs updates

 - some lib/idr.c work

 - some renaming of the random driver interfaces

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (285 commits)
  net: rename random32 to prandom
  net/core: remove duplicate statements by do-while loop
  net/core: rename random32() to prandom_u32()
  net/netfilter: rename random32() to prandom_u32()
  net/sched: rename random32() to prandom_u32()
  net/sunrpc: rename random32() to prandom_u32()
  scsi: rename random32() to prandom_u32()
  lguest: rename random32() to prandom_u32()
  uwb: rename random32() to prandom_u32()
  video/uvesafb: rename random32() to prandom_u32()
  mmc: rename random32() to prandom_u32()
  drbd: rename random32() to prandom_u32()
  kernel/: rename random32() to prandom_u32()
  mm/: rename random32() to prandom_u32()
  lib/: rename random32() to prandom_u32()
  x86: rename random32() to prandom_u32()
  x86: pageattr-test: remove srandom32 call
  uuid: use prandom_bytes()
  raid6test: use prandom_bytes()
  sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic()
  ...
2013-04-29 19:47:50 -07:00
Linus Torvalds 191a712090 Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Fixes and a lot of cleanups.  Locking cleanup is finally complete.
   cgroup_mutex is no longer exposed to individual controlelrs which
   used to cause nasty deadlock issues.  Li fixed and cleaned up quite a
   bit including long standing ones like racy cgroup_path().

 - device cgroup now supports proper hierarchy thanks to Aristeu.

 - perf_event cgroup now supports proper hierarchy.

 - A new mount option "__DEVEL__sane_behavior" is added.  As indicated
   by the name, this option is to be used for development only at this
   point and generates a warning message when used.  Unfortunately,
   cgroup interface currently has too many brekages and inconsistencies
   to implement a consistent and unified hierarchy on top.  The new flag
   is used to collect the behavior changes which are necessary to
   implement consistent unified hierarchy.  It's likely that this flag
   won't be used verbatim when it becomes ready but will be enabled
   implicitly along with unified hierarchy.

   The option currently disables some of broken behaviors in cgroup core
   and also .use_hierarchy switch in memcg (will be routed through -mm),
   which can be used to make very unusual hierarchy where nesting is
   partially honored.  It will also be used to implement hierarchy
   support for blk-throttle which would be impossible otherwise without
   introducing a full separate set of control knobs.

   This is essentially versioning of interface which isn't very nice but
   at this point I can't see any other options which would allow keeping
   the interface the same while moving towards hierarchy behavior which
   is at least somewhat sane.  The planned unified hierarchy is likely
   to require some level of adaptation from userland anyway, so I think
   it'd be best to take the chance and update the interface such that
   it's supportable in the long term.

   Maintaining the existing interface does complicate cgroup core but
   shouldn't put too much strain on individual controllers and I think
   it'd be manageable for the foreseeable future.  Maybe we'll be able
   to drop it in a decade.

Fix up conflicts (including a semantic one adding a new #include to ppc
that was uncovered by header the file changes) as per Tejun.

* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
  cpuset: fix compile warning when CONFIG_SMP=n
  cpuset: fix cpu hotplug vs rebuild_sched_domains() race
  cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
  cgroup: restore the call to eventfd->poll()
  cgroup: fix use-after-free when umounting cgroupfs
  cgroup: fix broken file xattrs
  devcg: remove parent_cgroup.
  memcg: force use_hierarchy if sane_behavior
  cgroup: remove cgrp->top_cgroup
  cgroup: introduce sane_behavior mount option
  move cgroupfs_root to include/linux/cgroup.h
  cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
  cgroup: make cgroup_path() not print double slashes
  Revert "cgroup: remove bind() method from cgroup_subsys."
  perf: make perf_event cgroup hierarchical
  cgroup: implement cgroup_is_descendant()
  cgroup: make sure parent won't be destroyed before its children
  cgroup: remove bind() method from cgroup_subsys.
  devcg: remove broken_hierarchy tag
  cgroup: remove cgroup_lock_is_held()
  ...
2013-04-29 19:14:20 -07:00
Thomas Gleixner d0380e6c3c early_printk: consolidate random copies of identical code
The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

[akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
[paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Benjamin Herrenschmidt bc23100a0d Merge remote-tracking branch 'kumar/next' into next
From Kumar Gala:
<<
Add support for T4 and B4 SoC families from Freescale, e6500 altivec
support, some various board fixes and other minor cleanups.
>>
2013-04-30 11:10:09 +10:00
Benjamin Herrenschmidt 28bf41a1fe Merge remote-tracking branch 'agust/next' into next
From Anatolij Gustschin:
<<
There are some changes for mpc5121 generic platform code
to support mpc5125 SoC and DTS files for ac14xx and
MPC5125-TWR boards.
>>
2013-04-30 11:08:45 +10:00
Michel Lespinasse fba2369e6c mm: use vm_unmapped_area() on powerpc architecture
Update the powerpc slice_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 11:05:17 +10:00
Michel Lespinasse 34d07177b8 mm: remove free_area_cache use in powerpc architecture
As all other architectures have been converted to use vm_unmapped_area(),
we are about to retire the free_area_cache.

This change simply removes the use of that cache in
slice_get_unmapped_area(), which will most certainly have a
performance cost. Next one will convert that function to use the
vm_unmapped_area() infrastructure and regain the performance.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 11:05:10 +10:00
David Rientjes 4edd7ceff0 mm, hotplug: avoid compiling memory hotremove functions when disabled
__remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE.  PowerPC
pseries will return -EOPNOTSUPP if unsupported.

Adding an #ifdef causes several other functions it depends on to also
become unnecessary, which saves in .text when disabled (it's disabled in
most defconfigs besides powerpc, including x86).  remove_memory_block()
becomes static since it is not referenced outside of
drivers/base/memory.c.

Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled
and disabled.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:37 -07:00
Cody P Schafer f9d531b8dc powerpc/mm/numa: use setup_nr_node_ids() instead of opencoding.
[sfr@canb.auug.org.au: add missing semicolon]
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:36 -07:00
Johannes Weiner 0aad818b2d sparse-vmemmap: specify vmemmap population range in bytes
The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.

This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.

In addition, later patches mix huge page and regular page backing for
the vmemmap.  For this, they need to call vmemmap_populate_basepages()
on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
are not necessarily multiples of the 'struct page' size and so this unit
is too coarse.

Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:35 -07:00
Gerald Schaefer 106c992a5e mm/hugetlb: add more arch-defined huge_pte functions
Commit abf09bed3c ("s390/mm: implement software dirty bits")
introduced another difference in the pte layout vs.  the pmd layout on
s390, thoroughly breaking the s390 support for hugetlbfs.  This requires
replacing some more pte_xxx functions in mm/hugetlbfs.c with a
huge_pte_xxx version.

This patch introduces those huge_pte_xxx functions and their generic
implementation in asm-generic/hugetlb.h, which will now be included on
all architectures supporting hugetlbfs apart from s390.  This change
will be a no-op for those architectures.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>	[for !s390 parts]
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Jiang Liu 7db6f78cfe mm/PPC: use free_highmem_page() to free highmem pages into buddy system
Use helper function free_highmem_page() to free highmem pages into
the buddy system.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:32 -07:00
Jiang Liu 5d585e5c48 mm/ppc: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:30 -07:00
Kevin Hao 9e2ecdbba3 powerpc/fsl-booke: add the reg prop for pci bridge device node for T4/B4
The reg property in the pci bridge device node is used to bind this
device node to the pci bridge device. Then all the pci devices under
this bridge could use the interrupt maps defined in this device node
to do the irq translation. So if this property is missed, the pci
traditional irq mechanism will not work.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-29 14:47:25 -05:00
Kevin Hao 04aa99cd0d powerpc/fsl-pci: don't unmap the PCI SoC controller registers in setup_pci_atmu
In patch 34642bbb (powerpc/fsl-pci: Keep PCI SoC controller registers in
pci_controller) we choose to keep the map of the PCI SoC controller
registers. But we missed to delete the unmap in setup_pci_atmu
function. This will cause the following call trace once we access
the PCI SoC controller registers later.

Unable to handle kernel paging request for data at address 0x8000080080040f14
Faulting instruction address: 0xc00000000002ea58
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=24 T4240 QDS
Modules linked in:
NIP: c00000000002ea58 LR: c00000000002eaf4 CTR: c00000000002eac0
REGS: c00000017e10b4a0 TRAP: 0300   Not tainted  (3.9.0-rc1-00052-gfa3529f-dirty)
MSR: 0000000080029000 <CE,EE,ME>  CR: 28adbe22  XER: 00000000
SOFTE: 0
DEAR: 8000080080040f14, ESR: 0000000000000000
TASK = c00000017e100000[1] 'swapper/0' THREAD: c00000017e108000 CPU: 2
GPR00: 0000000000000000 c00000017e10b720 c0000000009928d8 c00000017e578e00
GPR04: 0000000000000000 000000000000000c 0000000000000001 c00000017e10bb40
GPR08: 0000000000000000 8000080080040000 0000000000000000 0000000000000016
GPR12: 0000000088adbe22 c00000000fffa800 c000000000001ba0 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c0000000008a5b70
GPR24: c0000000008af938 c0000000009a28d8 c0000000009bb5dc c00000017e10bb40
GPR28: c00000017e32a400 c00000017e10bc00 c00000017e32a400 c00000017e578e00
NIP [c00000000002ea58] .fsl_pcie_check_link+0x88/0xf0
LR [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0
Call Trace:
[c00000017e10b720] [c00000017e10b7a0] 0xc00000017e10b7a0 (unreliable)
[c00000017e10ba30] [c00000000002eaf4] .fsl_indirect_read_config+0x34/0xb0
[c00000017e10bad0] [c00000000033aa08] .pci_bus_read_config_byte+0x88/0xd0
[c00000017e10bb90] [c00000000088d708] .pci_apply_final_quirks+0x9c/0x18c
[c00000017e10bc40] [c0000000000013dc] .do_one_initcall+0x5c/0x1f0
[c00000017e10bcf0] [c00000000086ebac] .kernel_init_freeable+0x180/0x26c
[c00000017e10bdb0] [c000000000001bbc] .kernel_init+0x1c/0x460
[c00000017e10be30] [c000000000000880] .ret_from_kernel_thread+0x64/0xe4
Instruction dump:
38210310 2b800015 4fdde842 7c600026 5463fffe e8010010 7c0803a6 4e800020
60000000 60000000 e92301d0 7c0004ac <80690f14> 0c030000 4c00012c 38210310
---[ end trace 7a8fe0cbccb7d992 ]---

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Roy Zang <tie-fei.zang@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-29 14:39:51 -05:00
Zhicheng Fan 9aa171fbc0 powerpc/dts: Fix the dts for p1025rdb 36bit
Fix the following errors:

Error: p1025rdb.dtsi:326.2-3 label or path, 'qe', not found
Error: p1021si-post.dtsi:242.2-3 label or path, 'qe', not found
FATAL ERROR: Syntax error parsing input tree

Signed-off-by: Zhicheng Fan <B32736@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2013-04-29 14:39:31 -05:00
Linus Torvalds ec25e246b9 USB patches for 3.10-rc1
Here's the big USB pull request for 3.10-rc1.
 
 Lots of USB patches here, the majority being USB gadget changes and
 USB-serial driver cleanups, the rest being ARM build fixes / cleanups,
 and individual driver updates.  We also finally got some chipidea fixes,
 which have been delayed for a number of kernel releases, as the
 maintainer has now reappeared.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlF+md4ACgkQMUfUDdst+ymkSgCfZWIiCtiX/li0yJqSiRB4yYJx
 Ex0AoNemOOf6ywvSOHPbILTbJ1G+c/PX
 =JmvB
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB patches from Greg Kroah-Hartman:
 "Here's the big USB pull request for 3.10-rc1.

  Lots of USB patches here, the majority being USB gadget changes and
  USB-serial driver cleanups, the rest being ARM build fixes / cleanups,
  and individual driver updates.  We also finally got some chipidea
  fixes, which have been delayed for a number of kernel releases, as the
  maintainer has now reappeared.

  All of these have been in linux-next for a while"

* tag 'usb-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (568 commits)
  USB: ehci-msm: USB_MSM_OTG needs USB_PHY
  USB: OHCI: avoid conflicting platform drivers
  USB: OMAP: ISP1301 needs USB_PHY
  USB: lpc32xx: ISP1301 needs USB_PHY
  USB: ftdi_sio: enable two UART ports on ST Microconnect Lite
  usb: phy: tegra: don't call into tegra-ehci directly
  usb: phy: phy core cannot yet be a module
  USB: Fix initconst in ehci driver
  usb-storage: CY7C68300A chips do not support Cypress ATACB
  USB: serial: option: Added support Olivetti Olicard 145
  USB: ftdi_sio: correct ST Micro Connect Lite PIDs
  ARM: mxs_defconfig: add CONFIG_USB_PHY
  ARM: imx_v6_v7_defconfig: add CONFIG_USB_PHY
  usb: phy: remove exported function from __init section
  usb: gadget: zero: put function instances on unbind
  usb: gadget: f_sourcesink.c: correct a copy-paste misnomer
  usb: gadget: cdc2: fix error return code in cdc_do_config()
  usb: gadget: multi: fix error return code in rndis_do_config()
  usb: gadget: f_obex: fix error return code in obex_bind()
  USB: storage: convert to use module_usb_driver()
  ...
2013-04-29 12:19:23 -07:00
Rafael J. Wysocki 885f925eef Merge branch 'pm-cpufreq'
* pm-cpufreq: (57 commits)
  cpufreq: MAINTAINERS: Add co-maintainer
  cpufreq: pxa2xx: initialize variables
  ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y
  cpufreq: cpu0: Put cpu parent node after using it
  cpufreq: ARM big LITTLE: Adapt to latest cpufreq updates
  cpufreq: ARM big LITTLE: put DT nodes after using them
  cpufreq: Don't call __cpufreq_governor() for drivers without target()
  cpufreq: exynos5440: Protect OPP search calls with RCU lock
  cpufreq: dbx500: Round to closest available freq
  cpufreq: Call __cpufreq_governor() with correct policy->cpus mask
  cpufreq / intel_pstate: Optimize intel_pstate_set_policy
  cpufreq: OMAP: instantiate omap-cpufreq as a platform_driver
  arm: exynos: Enable OPP library support for exynos5440
  cpufreq: exynos: Remove error return even if no soc is found
  cpufreq: exynos: Add cpufreq driver for exynos5440
  cpufreq: AMD "frequency sensitivity feedback" powersave bias for ondemand governor
  cpufreq: ondemand: allow custom powersave_bias_target handler to be registered
  cpufreq: convert cpufreq_driver to using RCU
  cpufreq: powerpc/platforms/cell: move cpufreq driver to drivers/cpufreq
  cpufreq: sparc: move cpufreq driver to drivers/cpufreq
  ...

Conflicts:
	MAINTAINERS (with commit a8e39c3 from pm-cpuidle)
	drivers/cpufreq/cpufreq_governor.h (with commit beb0ff3)
2013-04-28 02:10:46 +02:00
Rafael J. Wysocki e4f5a3adc4 Merge branch 'pm-cpuidle'
* pm-cpuidle: (51 commits)
  cpuidle: add maintainer entry
  ARM: s3c64xx: cpuidle: use init/exit common routine
  SH: cpuidle: use init/exit common routine
  cpuidle: fix comment format
  ARM: imx: cpuidle: use init/exit common routine
  ARM: davinci: cpuidle: use init/exit common routine
  ARM: kirkwood: cpuidle: use init/exit common routine
  ARM: calxeda: cpuidle: use init/exit common routine
  ARM: tegra: cpuidle: use init/exit common routine for tegra3
  ARM: tegra: cpuidle: use init/exit common routine for tegra2
  ARM: OMAP4: cpuidle: use init/exit common routine
  ARM: shmobile: cpuidle: use init/exit common routine
  ARM: tegra: cpuidle: use init/exit common routine
  ARM: OMAP3: cpuidle: use init/exit common routine
  ARM: at91: cpuidle: use init/exit common routine
  ARM: ux500: cpuidle: use init/exit common routine
  cpuidle: make a single register function for all
  ARM: ux500: cpuidle: replace for_each_online_cpu by for_each_possible_cpu
  cpuidle: remove en_core_tk_irqen flag
  ARM: OMAP3: remove cpuidle_wrap_enter
  ...
2013-04-28 01:54:49 +02:00
Paul Mackerras 8b78645c93 KVM: PPC: Book3S: Facilities to save/restore XICS presentation ctrler state
This adds the ability for userspace to save and restore the state
of the XICS interrupt presentation controllers (ICPs) via the
KVM_GET/SET_ONE_REG interface.  Since there is one ICP per vcpu, we
simply define a new 64-bit register in the ONE_REG space for the ICP
state.  The state includes the CPU priority setting, the pending IPI
priority, and the priority and source number of any pending external
interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:34 +02:00
Paul Mackerras d19bd86204 KVM: PPC: Book3S: Add support for ibm,int-on/off RTAS calls
This adds support for the ibm,int-on and ibm,int-off RTAS calls to the
in-kernel XICS emulation and corrects the handling of the saved
priority by the ibm,set-xive RTAS call.  With this, ibm,int-off sets
the specified interrupt's priority in its saved_priority field and
sets the priority to 0xff (the least favoured value).  ibm,int-on
restores the saved_priority to the priority field, and ibm,set-xive
sets both the priority and the saved_priority to the specified
priority value.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:33 +02:00
Paul Mackerras 4619ac88b7 KVM: PPC: Book3S HV: Improve real-mode handling of external interrupts
This streamlines our handling of external interrupts that come in
while we're in the guest.  First, when waking up a hardware thread
that was napping, we split off the "napping due to H_CEDE" case
earlier, and use the code that handles an external interrupt (0x500)
in the guest to handle that too.  Secondly, the code that handles
those external interrupts now checks if any other thread is exiting
to the host before bouncing an external interrupt to the guest, and
also checks that there is actually an external interrupt pending for
the guest before setting the LPCR MER bit (mediated external request).

This also makes sure that we clear the "ceded" flag when we handle a
wakeup from cede in real mode, and fixes a potential infinite loop
in kvmppc_run_vcpu() which can occur if we ever end up with the ceded
flag set but MSR[EE] off.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt e7d26f285b KVM: PPC: Book3S HV: Add support for real mode ICP in XICS emulation
This adds an implementation of the XICS hypercalls in real mode for HV
KVM, which allows us to avoid exiting the guest MMU context on all
threads for a variety of operations such as fetching a pending
interrupt, EOI of messages, IPIs, etc.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:32 +02:00
Benjamin Herrenschmidt 54695c3088 KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:31 +02:00
Benjamin Herrenschmidt bc5ad3f370 KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller
This adds in-kernel emulation of the XICS (eXternal Interrupt
Controller Specification) interrupt controller specified by PAPR, for
both HV and PR KVM guests.

The XICS emulation supports up to 1048560 interrupt sources.
Interrupt source numbers below 16 are reserved; 0 is used to mean no
interrupt and 2 is used for IPIs.  Internally these are represented in
blocks of 1024, called ICS (interrupt controller source) entities, but
that is not visible to userspace.

Each vcpu gets one ICP (interrupt controller presentation) entity,
used to store the per-vcpu state such as vcpu priority, pending
interrupt state, IPI request, etc.

This does not include any API or any way to connect vcpus to their
ICP state; that will be added in later patches.

This is based on an initial implementation by Michael Ellerman
<michael@ellerman.id.au> reworked by Benjamin Herrenschmidt and
Paul Mackerras.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix typo, add dependency on !KVM_MPIC]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:30 +02:00
Michael Ellerman 8e591cb720 KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls
For pseries machine emulation, in order to move the interrupt
controller code to the kernel, we need to intercept some RTAS
calls in the kernel itself.  This adds an infrastructure to allow
in-kernel handlers to be registered for RTAS services by name.
A new ioctl, KVM_PPC_RTAS_DEFINE_TOKEN, then allows userspace to
associate token values with those service names.  Then, when the
guest requests an RTAS service with one of those token values, it
will be handled by the relevant in-kernel handler rather than being
passed up to userspace as at present.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix warning]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:29 +02:00
Scott Wood 91194919a6 kvm/ppc/mpic: Eliminate mmio_mapped
We no longer need to keep track of this now that MPIC destruction
always happens either during VM destruction (after MMIO has been
destroyed) or during a failed creation (before the fd has been exposed
to userspace, and thus before the MMIO region could have been
registered).

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:28 +02:00
Scott Wood 07f0a7bdec kvm: destroy emulated devices on VM exit
The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:28 +02:00
Alexander Graf 447a03c02a KVM: PPC: MPIC: Restrict to e500 platforms
The code as is doesn't make any sense on non-e500 platforms. Restrict it
there, so that people don't get wrong ideas on what would actually work.

This patch should get reverted as soon as it's possible to either run e500
guests on non-e500 hosts or the MPIC emulation gains support for non-e500
modes.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:26 +02:00
Alexander Graf 5efdb4be59 KVM: PPC: MPIC: Add support for KVM_IRQ_LINE
Now that all pieces are in place for reusing generic irq infrastructure,
we can copy x86's implementation of KVM_IRQ_LINE irq injection and simply
reuse it for PPC, as it will work there just as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:25 +02:00
Alexander Graf de9ba2f363 KVM: PPC: Support irq routing and irqfd for in-kernel MPIC
Now that all the irq routing and irqfd pieces are generic, we can expose
real irqchip support to all of KVM's internal helpers.

This allows us to use irqfd with the in-kernel MPIC.

Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:25 +02:00
Scott Wood eb1e4f43e0 kvm/ppc/mpic: add KVM_CAP_IRQ_MPIC
Enabling this capability connects the vcpu to the designated in-kernel
MPIC.  Using explicit connections between vcpus and irqchips allows
for flexibility, but the main benefit at the moment is that it
simplifies the code -- KVM doesn't need vm-global state to remember
which MPIC object is associated with this vm, and it doesn't need to
care about ordering between irqchip creation and vcpu creation.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub functions for kvmppc_mpic_{dis,}connect_vcpu]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:24 +02:00
Scott Wood 5df554ad5b kvm/ppc/mpic: in-kernel MPIC emulation
Hook the MPIC code up to the KVM interfaces, add locking, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:23 +02:00
Scott Wood f0f5c481a9 kvm/ppc/mpic: adapt to kernel style and environment
Remove braces that Linux style doesn't permit, remove space after
'*' that Lindent added, keep error/debug strings contiguous, etc.

Substitute type names, debug prints, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:22 +02:00
Scott Wood 6dd830a09a kvm/ppc/mpic: remove some obviously unneeded code
Remove some parts of the code that are obviously QEMU or Raven specific
before fixing style issues, to reduce the style issues that need to be
fixed.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:22 +02:00
Scott Wood b823f98f89 kvm/ppc/mpic: import hw/openpic.c from QEMU
This is QEMU's hw/openpic.c from commit
abd8d4a4d6dfea7ddea72f095f993e1de941614e ("Update version for
1.4.0-rc0"), run through Lindent with no other changes to ease merging
future changes between Linux and QEMU.  Remaining style issues
(including those introduced by Lindent) will be fixed in a later patch.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:20 +02:00
Paul Mackerras c35635efdc KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest.  This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration.  In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host.  Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary.  Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain.  This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page.  Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:13 +02:00
Paul Mackerras a1b4a0f606 KVM: PPC: Book3S HV: Make HPT reading code notice R/C bit changes
At present, the code that determines whether a HPT entry has changed,
and thus needs to be sent to userspace when it is copying the HPT,
doesn't consider a hardware update to the reference and change bits
(R and C) in the HPT entries to constitute a change that needs to
be sent to userspace.  This adds code to check for changes in R and C
when we are scanning the HPT to find changed entries, and adds code
to set the changed flag for the HPTE when we update the R and C bits
in the guest view of the HPTE.

Since we now need to set the HPTE changed flag in book3s_64_mmu_hv.c
as well as book3s_hv_rm_mmu.c, we move the note_hpte_modification()
function into kvm_book3s_64.h.

Current Linux guest kernels don't use the hardware updates of R and C
in the HPT, so this change won't affect them.  Linux (or other) kernels
might in future want to use the R and C bits and have them correctly
transferred across when a guest is migrated, so it is better to correct
this deficiency.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:12 +02:00
Mihai Caraman d9ce6041b3 KVM: PPC: e500: Add e6500 core to Kconfig description
Add e6500 core to Kconfig description.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:11 +02:00
Mihai Caraman ea17a971c2 KVM: PPC: e500mc: Enable e6500 cores
Extend processor compatibility names to e6500 cores.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:10 +02:00
Mihai Caraman 5b21501045 KVM: PPC: e500: Remove E.PT and E.HV.LRAT categories from VCPUs
Embedded.Page Table (E.PT) category is not supported yet in e6500 kernel.
Configure TLBnCFG to remove E.PT and E.HV.LRAT categories from VCPUs.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:09 +02:00
Mihai Caraman 9a6061d7fd KVM: PPC: e500: Add support for EPTCFG register
EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
in the presence of MAV 2.0. Emulate it now.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:08 +02:00
Mihai Caraman 307d9008ed KVM: PPC: e500: Add support for TLBnPS registers
Add support for TLBnPS registers available in MMU Architecture Version
(MAV) 2.0.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:07 +02:00
Mihai Caraman 8893a188b1 KVM: PPC: e500: Move vcpu's MMU configuration to dedicated functions
Vcpu's MMU default configuration and geometry update logic was buried in
a chunk of code. Move them to dedicated functions to add more clarity.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:07 +02:00
Mihai Caraman a85d2aa23e KVM: PPC: e500: Expose MMU registers via ONE_REG
MMU registers were exposed to user-space using sregs interface. Add them
to ONE_REG interface using kvmppc_get_one_reg/kvmppc_set_one_reg delegation
mechanism.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:06 +02:00
Mihai Caraman 35b299e279 KVM: PPC: Book3E: Refactor ONE_REG ioctl implementation
Refactor Book3E ONE_REG ioctl implementation to use kvmppc_get_one_reg/
kvmppc_set_one_reg delegation interface introduced by Book3S. This is
necessary for MMU SPRs which are platform specifics.

Get rid of useless case braces in the process.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:05 +02:00
Bharat Bhushan 9b4f530807 booke: exit to user space if emulator request
This allows the exit to user space if emulator request by returning
EMULATE_EXIT_USER. This will be used in subsequent patches in list

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:04 +02:00
Bharat Bhushan 0f47f9b517 KVM: extend EMULATE_EXIT_USER to support different exit reasons
Currently the instruction emulator code returns EMULATE_EXIT_USER
and common code initializes the "run->exit_reason = .." and
"vcpu->arch.hcall_needed = .." with one fixed reason.
But there can be different reasons when emulator need to exit
to user space. To support that the "run->exit_reason = .."
and "vcpu->arch.hcall_needed = .." initialization is moved a
level up to emulator.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:03 +02:00
Bharat Bhushan c402a3f457 Rename EMULATE_DO_PAPR to EMULATE_EXIT_USER
Instruction emulation return EMULATE_DO_PAPR when it requires
exit to userspace on book3s. Similar return is required
for booke. EMULATE_DO_PAPR reads out to be confusing so it is
renamed to EMULATE_EXIT_USER.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:03 +02:00
Bharat Bhushan 092d62ee93 KVM: PPC: debug stub interface parameter defined
This patch defines the interface parameter for KVM_SET_GUEST_DEBUG
ioctl support. Follow up patches will use this for setting up
hardware breakpoints, watchpoints and software breakpoints.

Also kvm_arch_vcpu_ioctl_set_guest_debug() is brought one level below.
This is because I am not sure what is required for book3s. So this ioctl
behaviour will not change for book3s.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:02 +02:00
Bharat Bhushan adccf65ca4 KVM: PPC: cache flush for kernel managed pages
Kernel can only access pages which maps as memory.
So flush only the valid kernel pages.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:01 +02:00
Anshuman Khandual 3925f46bb5 powerpc/perf: Enable branch stack sampling framework
Provides basic enablement for perf branch stack sampling framework on
POWER8 processor based platforms. Adds new BHRB related elements into
cpu_hw_event structure to represent current BHRB config, BHRB filter
configuration, manage context and to hold output BHRB buffer during
PMU interrupt before passing to the user space. This also enables
processing of BHRB data and converts them into generic perf branch
stack data format.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:13:02 +10:00
Anshuman Khandual b1113557fb powerpc/perf: Define BHRB generic functions, data and flags for POWER8
This patch populates BHRB specific data for power_pmu structure. It
also implements POWER8 specific BHRB filter and configuration functions.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:13:01 +10:00
Anshuman Khandual 5afc9b52a7 powerpc/perf: Add new BHRB related generic functions, data and flags
This patch adds couple of generic functions to power_pmu structure
which would configure the BHRB and it's filters. It also adds
representation of the number of BHRB entries present on the PMU.
A new PMU flag PPMU_BHRB would indicate presence of BHRB feature.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:12 +10:00
Anshuman Khandual 73760931dc powerpc/perf: Add basic assembly code to read BHRB entries on POWER8
This patch adds the basic assembly code to read BHRB buffer. BHRB entries
are valid only after a PMU interrupt has happened (when MMCR0[PMAO]=1)
and BHRB has been freezed. BHRB read should not be attempted when it is
still enabled (MMCR0[PMAE]=1) and getting updated, as this can produce
non-deterministic results.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:11 +10:00
Anshuman Khandual 95213959ae powerpc/perf: Add new BHRB related instructions for POWER8
This patch adds new POWER8 instruction encoding for reading
and clearing Branch History Rolling Buffer entries. The new
instruction 'mfbhrbe' (move from branch history rolling buffer
entry) is used to read BHRB buffer entries and instruction
'clrbhrb' (clear branch history rolling buffer) is used to
clear the entire buffer. The instruction 'clrbhrb' has straight
forward encoding. But the instruction encoding format for
reading the BHRB entries is like 'mfbhrbe RT, BHRBE' where it
takes two arguments, i.e the index for the BHRB buffer entry to
read and a general purpose register to put the value which was
read from the buffer entry.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:11 +10:00
Michael Ellerman e05b9b9e5c powerpc/perf: Power8 PMU support
This patch adds support for the power8 PMU to perf.

Work is ongoing to add generic cache events.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:10 +10:00
Michael Ellerman 8f61aa325f powerpc/perf: Add support for SIER
On power8 we have a new SIER (Sampled Instruction Event Register), which
captures information about instructions when we have random sampling
enabled.

Add support for loading the SIER into pt_regs, overloading regs->dar.
Also set the new NO_SIPR flag in regs->result if we don't have SIPR.

Update regs_sihv/sipr() to look for SIPR/SIHV in SIER.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:10 +10:00
Michael Ellerman 860aad71fc powerpc/perf: Add regs_no_sipr()
On power8 the presence or absence of SIPR depends on settings at runtime,
so convert to using a dynamic flag for NO_SIPR. Existing backends that
set NO_SIPR unconditionally set the dynamic flag obviously.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:09 +10:00
Michael Ellerman 33904054b4 powerpc/perf: Add an accessor for regs->result
Add an accessor for regs->result so we can use it to store more flags in
future.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:08 +10:00
Michael Ellerman 5682c46026 powerpc/perf: Convert mmcra_sipr/sihv() to regs_sipr/sihv()
On power8 the SIPR and SIHV are not in MMCRA, so convert the routines
to take regs and change the names accordingly.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:08 +10:00
Michael Ellerman 7a7868326d powerpc/perf: Add an explict flag indicating presence of SLOT field
In perf_ip_adjust() we potentially use the MMCRA[SLOT] field to adjust
the reported IP of a sampled instruction.

Currently the logic is written so that if the backend does NOT have
the PPMU_ALT_SIPR flag set then we assume MMCRA[SLOT] exists.

However on power8 we do not want to set ALT_SIPR (it's in a third
location), and we also do not have MMCRA[SLOT].

So add a new flag which only indicates whether MMCRA[SLOT] exists.

Naively we'd set it on everything except power6/7, because they set
ALT_SIPR, and we've reversed the polarity of the flag. But it's more
complicated than that.

mpc7450 is 32-bit, and uses its own version of perf_ip_adjust()
which doesn't use MMCRA[SLOT], so it doesn't need the new flag set and
the behaviour is unchanged.

PPC970 (and I assume power4) don't have MMCRA[SLOT], so shouldn't have
the new flag set. This is a behaviour change on those cpus, though we
were probably getting lucky and the bits in question were 0.

power5 and power5+ set the new flag, behaviour unchanged.

power6 & power7 do not set the new flag, behaviour unchanged.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:07 +10:00
Michael Ellerman 240686c136 powerpc: Initialise PMU related regs on Power8
For both HV and guest kernels, intialise PMU regs to something sane.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:06 +10:00
Gavin Shan 959c9bdd58 powerpc/powernv: Fix invalid IOMMU table
Ben found the root cause. Commit 37f02195be
("powerpc/pci: fix PCI-e devices rescan issue on powerpc platform")
overwrites the IOMMU table of PCI device while enabling PCI device.
The patch intends to fix the IOMMU table after that point.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:10:00 +10:00
Gavin Shan 373f565741 powerpc/powernv: Build DMA space for PE on PHB3
The patch intends to build 32-bits DMA space for individual PEs on
PHB3. The TVE# is recognized by the combo of PE# and fixed bits
from DMA address, which is zero for 32-bits DMA space.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:10:00 +10:00
Gavin Shan 4cce95508b powerpc/powernv: TCE invalidation for PHB3
The TCE should be invalidated while it's created or free'd. The
approach to do that for IODA1 and IODA2 compliant PHBs are different.
So the patch differentiate them with different functions called to
do that for IODA1 and IODA2 compliant PHBs. It's notable that the
PCI address is used to invalidate the corresponding TCE on IODA2
compliant PHB3.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:09:59 +10:00
Gavin Shan 137436c9a6 powerpc/powernv: Patch MSI EOI handler on P8
The EOI handler of MSI/MSI-X interrupts for P8 (PHB3) need additional
steps to handle the P/Q bits in IVE before EOIing the corresponding
interrupt. The patch changes the EOI handler to cover that. we have
individual IRQ chip in each PHB instance. During the MSI IRQ setup
time, the IRQ chip is copied over from the original one for that IRQ,
and the EOI handler is patched with the one that will handle the P/Q
bits (As Ben suggested).

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:09:59 +10:00
Gavin Shan a486bdb0e9 powerpc/powernv: Add option CONFIG_POWERNV_MSI
As Michael Ellerman suggested, to add CONFIG_POWERNV_MSI for PowerNV
platform. That's similar to CONFIG_PSERIES_MSI for pSeries platform.
For now, we don't make it dependent on CONFIG_EEH since it's not ready
to enable that yet.

Apart from that, we also enable CONFIG_PPC_MSI_BITMAP on selecting
CONFIG_POWERNV_MSI.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:09:58 +10:00
Gavin Shan aa0c033f99 powerpc/powernv: Supports PHB3
The patch intends to initialize PHB3 during system boot stage. The
flag "PNV_PHB_MODEL_PHB3" is introduced to differentiate IODA2
compatible PHB3 from other types of PHBs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:27 +10:00
Paul Mackerras a485c70989 powerpc: Fix "attempt to move .org backwards" error
Building a 64-bit powerpc kernel with PR KVM enabled currently gives
this error:

  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1

This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into
33 instructions, but we only have space for 32 at the decrementer
interrupt vector (from 0x900 to 0x980).

In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we
currently have two instances of the HMT_MEDIUM macro, which has the
effect of setting the SMT thread priority to medium.  One is the
first instruction, and is overwritten by a no-op on processors where
we save the PPR (processor priority register), that is, POWER7 or
later.  The other is after we have saved the PPR.

In order to reduce the code at 0x900 by one instruction, we omit the
first HMT_MEDIUM.  On processors without SMT this will have no effect
since HMT_MEDIUM is a no-op there.  On POWER5 and RS64 machines this
will mean that the first few instructions take a little longer in the
case where a decrementer interrupt occurs when the hardware thread is
running at low SMT priority.  On POWER6 and later machines, the
hardware automatically boosts the thread priority when a decrementer
interrupt is taken if the thread priority was below medium, so this
change won't make any difference.

The alternative would be to branch out of line after saving the CFAR.
However, that would incur an extra overhead on all processors, whereas
the approach adopted here only adds overhead on older threaded processors.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:27 +10:00
Nathan Fontenot e04fa61214 powerpc/pseries: Add /proc interface to control topology updates
There are instances in which we do not want topology updates to occur.
In order to allow this a /proc interface (/proc/powerpc/topology_updates)
is introduced so that topology updates can be enabled and disabled.

This patch also adds a prrn_is_enabled() call so that PRRN events are
handled in the kernel only if topology updating is enabled.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:26 +10:00
Nathan Fontenot 1b1218d32e powerpc/pseries: Enable PRRN handling
The Linux kernel and platform firmware negotiate their mutual support
of the PRRN option via the ibm,client-architecture-support interface.
This patch simply sets the appropriate fields in the client architecture
vector to indicate Linux support for PRRN and will allow the firmware to
report PRRN events via the RTAS event-scan mechanism.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:26 +10:00
Jesse Larrew b7abef045f powerpc/pseries: RE-enable Virtual Processor Home Node updating
The new PRRN firmware feature provides a more convenient and event-driven
interface than VPHN for notifying Linux of changes to the NUMA affinity of
platform resources. However, for practical reasons, it may not be feasible
for some customers to update to the latest firmware. For these customers,
the VPHN feature supported on previous firmware versions may still be the
best option.

The VPHN feature was previously disabled due to races with the load
balancing code when accessing the NUMA cpu maps, but the new stop_machine()
approach protects the NUMA cpu maps from these concurrent accesses. It
should be safe to re-enable this feature now.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:25 +10:00
Jesse Larrew 176bbf1424 powerpc/pseries: Update NUMA VDSO information when updating CPU maps
The following patch adds vdso_getcpu_init(), which stores the NUMA node for
a cpu in SPRG3:

Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") adds
vdso_getcpu_init(), which stores the NUMA node for a cpu in SPRG3.

This patch ensures that this information is also updated when the NUMA
affinity of a cpu changes.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:24 +10:00
Nathan Fontenot 30c05350c3 powerpc/pseries: Use stop machine to update cpu maps
The new PRRN firmware feature allows CPU and memory resources to be
transparently reassigned across NUMA boundaries. When this happens, the
kernel must update the node maps to reflect the new affinity information.

Although the NUMA maps can be protected by locking primitives during the
update itself, this is insufficient to prevent concurrent accesses to these
structures. Since cpumask_of_node() hands out a pointer to these
structures, they can still be modified outside of the lock. Furthermore,
tracking down each usage of these pointers and adding locks would be quite
invasive and difficult to maintain.

The approach used is to make a list of affected cpus and call stop_machine
to have the update routine run on each of the affected cpus allowing them
to update themselves. Each cpu finds itself in the list of cpus and makes
the appropriate updates. We need to have each cpu do this for themselves to
handle calls to vdso_getcpu_init() added in a subsequent patch.

Situations like these are best handled using stop_machine(). Since the NUMA
affinity updates are exceptionally rare events, this approach has the
benefit of not adding any overhead while accessing the NUMA maps during
normal operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:24 +10:00
Jesse Larrew 5d88aa85c0 powerpc/pseries: Update CPU maps when device tree is updated
Platform events such as partition migration or the new PRRN firmware
feature can cause the NUMA characteristics of a CPU to change, and these
changes will be reflected in the device tree nodes for the affected
CPUs.

This patch registers a handler for Open Firmware device tree updates
and reconfigures the CPU and node maps whenever the associativity
changes. Currently, this is accomplished by marking the affected CPUs in
the cpu_associativity_changes_mask and allowing
arch_update_cpu_topology() to retrieve the new associativity information
using hcall_vphn().

Protecting the NUMA cpu maps from concurrent access during an update
operation will be addressed in a subsequent patch in this series.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:23 +10:00
Nathan Fontenot 8002b0c54b powerpc/pseries: Update numa.c to use updated firmware_has_feature()
Update the numa code to use the updated firmware_has_feature() when checking
for type 1 affinity.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:22 +10:00
Nathan Fontenot f0ff7eb483 powerpc/pseries: Update firmware_has_feature() to check architecture vector 5 bits
The firmware_has_feature() function makes it easy to check for supported
features of the hypervisor. This patch extends the capability of
firmware_has_feature() to include checking for specified bits
in vector 5 of the architecture vector as reported in the device tree.

As part of this the #defines used for the architecture vector are re-defined
such that each option has the index into vector 5 and the feature bit encoded
into it. This makes checking for architecture bits when initiating data
for firmware_has_feature much easier.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:22 +10:00
Nathan Fontenot 43c0ea6053 powerpc/pseries: Use ARRAY_SIZE to iterate over firmware_features_table array
When iterating over the entries in firmware_features_table we only need
to go over the actual number of entries in the array instead of declaring
it to be bigger and checking to make sure there is a valid entry in every
slot.

This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the
array looping with the use of ARRAY_SIZE().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:21 +10:00
Nathan Fontenot 530b5e1475 powerpc/pseries: Move architecture vector definitions to prom.h
As part of handling of PRRN events we need to check vector 5 of the
architecture vector bits reported in the device tree to ensure PRRN event
handling is enabled. To do this firmware_has_feature() is updated (in a
subsequent patch) to make this check vector 5 bits. To avoid having to
re-define bits in the architecture vector the bit definitions are moved
to prom.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:21 +10:00
Jesse Larrew 49c68a8518 powerpc/pseries: Add PRRN RTAS event handler
A PRRN event is signaled via the RTAS event-scan mechanism, which
returns a Hot Plug Event message "fixed part" indicating "Platform
Resource Reassignment". In response to the Hot Plug Event message,
we must call ibm,update-nodes to determine which resources were
reassigned and then ibm,update-properties to obtain the new affinity
information about those resources.

The PRRN event-scan RTAS message contains only the "fixed part" with
the "Type" field set to the value 160 and no Extended Event Log. The
four-byte Extended Event Log Length field is re-purposed (since no
Extended Event Log message is included) to pass the "scope" parameter
that causes the ibm,update-nodes to return the nodes affected by the
specific resource reassignment.

This patch adds a handler for RTAS events. The function
pseries_devicetree_update() (from mobility.c) is used to make the
ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
(handled by a subsequent patch) will require significant processing,
so pseries_devicetree_update() is called from an asynchronous workqueue
to allow event processing to continue.

PRRN RTAS events on pseries systems are rare events that have to be
initiated from the HMC console for the system by an IBM tech. This allows
us to assume that these events are widely spaced. Additionally, all work
on the queue is flushed before handling any new work to ensure we only have
one event in flight being handled at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:20 +10:00
Nathan Fontenot 2e9b7b02a3 powerpc/pseries: Correct buffer parsing in update_dt_node()
Correct parsing of the buffer returned from ibm,update-properties. The first
element is a length and the path to the property which is slightly different
from the list of properties in the buffer so we need to specifically
handle this.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:19 +10:00
Nathan Fontenot 762ec15707 powerpc/pseries: Expose pseries devicetree_update()
Newer firmware on Power systems can transparently reassign platform resources
(CPU and Memory) in use. For instance, if a processor or memory unit is
predicted to fail, the platform may transparently move the processing to an
equivalent unused processor or the memory state to an equivalent unused
memory unit. However, reassigning resources across NUMA boundaries may alter
the performance of the partition. When such reassignment is necessary, the
Platform Resource Reassignment Notification (PRRN) option provides a
mechanism to inform the Linux kernel of changes to the NUMA affinity of
its platform resources.

When rtasd receives a PRRN event, it needs to make a series of RTAS
calls (ibm,update-nodes and ibm,update-properties) to retrieve the
updated device tree information. These calls are already handled in the
pseries_devicetree_update() routine used in partition migration.

This patch exposes pseries_devicetree_update() to make it accessible
to other pseries routines, this patch also updates pseries_devicetree_update()
to take a 32-bit scope parameter. The scope value, which was previously hard
coded to 1 for partition migration, is used for the RTAS calls
ibm,update-nodes/properties to update the device tree.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:19 +10:00
Michael Neuling 3e96ca7f00 powerpc: Fix hardware IRQs with MMU on exceptions when HV=0
POWER8 allows us to take interrupts with the MMU on.  This gives us a
second set of vectors offset at 0x4000.

Unfortunately when coping these vectors we missed checking for MSR HV
for hardware interrupts (0x500).  This results in us trying to use
HSRR0/1 when HV=0, rather than SRR0/1 on HW IRQs

The below fixes this to check CPU_FTR_HVMODE when patching the code at
0x4500.

Also we remove the check for CPU_FTR_ARCH_206 since relocation on IRQs
are only available in arch 2.07 and beyond.

Thanks to benh for helping find this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:18 +10:00
Michael Neuling 8c2a381734 powerpc/power8: Fix secondary CPUs hanging on boot for HV=0
In __restore_cpu_power8 we determine if we are HV and if not, we return
before setting HV only resources.

Unfortunately we forgot to restore the link register from r11 before
returning.

This will happen on boot and with secondary CPUs not coming online.

This adds the missing link register restore.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:17 +10:00
Michael Neuling 29ce3c5073 powerpc: Add isync to copy_and_flush
In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush.  After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.

Unfortunately there's no isync between the copy of this code and the
jump to it.  Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.

We've seen this on real machines and it's results in no console output
after:
  calling quiesce...
  returning from prom_init

The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:17 +10:00
Michael Neuling 2171364d1a powerpc: Add HWCAP2 aux entry
We are currently out of free bits in AT_HWCAP. With POWER8, we have
several hardware features that we need to advertise.

Tested on POWER and x86.

Signed-off-by: Michael Neuling <michael@neuling.org>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:16 +10:00
Benjamin Herrenschmidt 6263fb3bd7 powerpc/powernv: Fix missing Kconfig dependency for MSIs
We need PPC_MSI_BITMAP support

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 15:15:33 +10:00
Benjamin Herrenschmidt 234d15def9 Merge remote-tracking branch 'origin/master' into next
Merge upstream to get the audit fixes
2013-04-24 14:43:36 +10:00
Michael Ellerman 6747e83235 powerpc/spufs: Initialise inode->i_ino in spufs_new_inode()
In commit 85fe402 (fs: do not assign default i_ino in new_inode), the
initialisation of i_ino was removed from new_inode() and pushed down
into the callers. However spufs_new_inode() was not updated.

This exhibits as no files appearing in /spu, because all our dirents
have a zero inode, which readdir() seems to dislike.

Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:32 +10:00
Vasant Hegde fd2d37c854 powerpc/rtas_flash: New return code to indicate FW entitlement expiry
Add new return code to rtas_flash to indicate firmware entitlement
expiry. Strictly we don't need this update. But to keep it in sync
with PAPR, this was added.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:31 +10:00
Vasant Hegde f51005d7a4 powerpc/rtas_flash: Update return token comments
Add proper comment to ibm,validate-flash-image RTAS call
update result tokens.

Note: Only comment section is modified, no code change.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:31 +10:00
Michael Ellerman 51e0eaf98d powerpc/cell: Only iterate over online nodes in cbe_init_pm_irq()
None of the cell platforms support CPU hotplug, so we should iterate
only over online nodes when setting PMU interrupts.

This also fixes a warning during boot when NODES_SHIFT is large enough:

WARNING: at /scratch/michael/src/kmk/linus/kernel/irq/irqdomain.c:766
...
NIP [c0000000000db278] .irq_linear_revmap+0x30/0x58
LR [c0000000000dc2a0] .irq_create_mapping+0x38/0x1a8
Call Trace:
[c0000003fc9c3af0] [c0000000000dc2a0] .irq_create_mapping+0x38/0x1a8 (unreliable)
[c0000003fc9c3b80] [c000000000655c1c] .__machine_initcall_cell_cbe_init_pm_irq+0x84/0x158
[c0000003fc9c3c20] [c00000000000afb4] .do_one_initcall+0x5c/0x1e0
[c0000003fc9c3cd0] [c000000000644580] .kernel_init_freeable+0x238/0x328
[c0000003fc9c3db0] [c00000000000b784] .kernel_init+0x1c/0x120
[c0000003fc9c3e30] [c000000000009fb8] .ret_from_kernel_thread+0x64/0xac

This is caused by us overflowing our linear revmap because we're
requesting too many interrupts.

Reported-by: Dennis Schridde <devurandom@gmx.net>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:30 +10:00
Benjamin Herrenschmidt 973e2abd15 Merge remote-tracking branch 'mpe/master' into next
Merge the tentative powerpc-next branch maintained by Michael
while I was on vacation
2013-04-24 14:11:20 +10:00
Daniel Lezcano 554c06ba3e cpuidle: remove en_core_tk_irqen flag
The en_core_tk_irqen flag is set in all the cpuidle driver which
means it is not necessary to specify this flag.

Remove the flag and the code related to it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:22 +02:00
Chen Gang 5676005acf powerpc/pseries/lparcfg: Fix possible overflow are more than 1026
need set '\0' for 'local_buffer'.

SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of
rtas_data_buf may truncated in memcpy.

if contents are really truncated.
  the splpar_strlen is more than 1026. the next while loop checking will
  not find the end of buffer. that will cause memory access violation.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:39:35 +10:00
Oleg Nesterov 28d170abad ptrace/powerpc: Don't flush_ptrace_hw_breakpoint() on fork()
arch_dup_task_struct() does flush_ptrace_hw_breakpoint(src), this
destroys the parent's breakpoints for no reason. We should clear
child->thread.ptrace_bps[] copied by dup_task_struct() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:05:06 +10:00
Adhemerval Zanella fcb41a2030 powerpc: Add VDSO version of time
On 04/18/2013 07:38 PM, Anton Blanchard wrote:
> Since you are only reading one long you shouldn't need to check the
> update count and loop, you will always see a consistent value. The
> system call version of time() just does an unprotected load for example.

Fixed.

> With the above change and with Michael's comments covered (decent
> changelog entry and Signed-off-by):
>
> Acked-by: Anton Blanchard <anton@samba.org>

Thanks for the review, below the updated patch:

From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>

This patch implement the time syscall as vDSO. The performance speedups
are:

Baseline PPC32: 380 nsec
Baseline PPC64: 350 nsec
vdso PPC32:      20 nsec
vsdo PPC64:      20 nsec

Tested on 64 bit build with both 32 bit and 64 bit userland.

Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:05:05 +10:00
David S. Miller 6e0895c2ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	drivers/net/ethernet/intel/igb/igb_main.c
	drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
	include/net/scm.h
	net/batman-adv/routing.c
	net/ipv4/tcp_input.c

The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.

The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.

An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.

Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.

Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.

Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-22 20:32:51 -04:00
Ingo Molnar 73e21ce28d Merge branch 'perf/urgent' into perf/core
Conflicts:
	arch/x86/kernel/cpu/perf_event_intel.c

Merge in the latest fixes before applying new patches, resolve the conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-21 10:57:33 +02:00
Li Zhong 016af59f0f powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
This patch fixes the following oops, which could be trigged by build the kernel
with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.

hpte_insert() might return -1, indicating that the bucket (primary here)
is full. We are not necessarily reporting a BUG in this case. Instead, we could
try repeatedly (try secondary, remove and try again) until we find a slot.

[  543.075675] ------------[ cut here ]------------
[  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[  543.075741] Modules linked in: binfmt_misc ehea
[  543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594
[  543.075771] REGS: c0000000a90832c0 TRAP: 0700   Not tainted  (3.8.0-next-20130222)
[  543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 22224482  XER: 00000000
[  543.075816] SOFTE: 0
[  543.075823] CFAR: c00000000004c200
[  543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1
GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff
GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594
GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854
GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001
GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000
GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001
GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950
GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230
[  543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8
[  543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8
[  543.076025] Call Trace:
[  543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
[  543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc
[  543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c
[  543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4
[  543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8
[  543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0
[  543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4
[  543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30
[  543.076155] Instruction dump:
[  543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008
[  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000
[  543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 16:00:00 +10:00
Li Zhong b170bd3de6 powerpc: Split the code trying to insert hpte repeatedly as an helper function
Move the logic trying to insert hpte in __hash_page_huge() to an helper
function, so it could also be used by others.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:59 +10:00
Li Zhong 2c3c0693d9 powerpc: Move the setting of rflags out of loop in __hash_page_huge
It seems that new_pte and rflags don't get changed in the repeating loop, so
move their assignment out of the loop.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:59 +10:00
Brian King c2e1d84523 powerpc: Set default VGA device
Add a PCI quirk for VGA devices on Power to set the default VGA device.
Ensures a default VGA is always set if a graphics adapter is present,
even if firmware did not initialize it. If more than one graphics
adapter is present, ensure the one initialized by firmware is set
as the default VGA device. This ensures that X autoconfiguration
will work.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:58 +10:00
Yuanquan Chen 37f02195be powerpc/pci: fix PCI-e devices rescan issue on powerpc platform
Powerpc initializes the DMA and IRQ information in pci_scan_child_bus()->
pcibios_fixup_bus()->pcibios_setup_bus_devices(). But for the devices
which are hotpluged, bus->is added has been set for the first scan of the
PCI-e bus, so the initialization code won't be called. Then the hotpluged
devices' driver will fail to load.

For example :
The PCI-e device 0001:03:00.0 is the Intel PCI-e e1000e network card, remove
it from the system:

    # echo 1 > /sys/bus/pci/devices/0001\:03\:00.0/remove
    # e1000e 0001:03:00.0 eth0: removed PHC

Rescan it from it's bus:

    # echo 1 > /sys/bus/pci/devices/0001\:02\:00.0/rescan
    ...
    e1000e 0001:03:00.0: Disabling ASPM L0s L1
    e1000e 0001:03:00.0: No usable DMA configuration, aborting
    e1000e: probe of 0001:03:00.0 failed with error -5

So we move the DMA & IRQ initialization code from pcibios_setup_devices() and
construct a new function pcibios_enable_device. We call this function in
pcibios_enable_device, which will be called by PCI-e rescan code. At the
meanwhile, we avoid the the impact on cardbus. I also validate this patch with
silicon's PCIe-sata which encounters the IRQ issue.

Signed-off-by: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:57 +10:00
Stephen Rothwell 55671f3cc2 powerpc: fix annotation of fake_numa_create_new_node()
This function has always been marked as __cpuinit, but is only called
from functions marked as __init and references an __initdata variable.
So change its annotation to __init.

Fixes this build warning:

WARNING: arch/powerpc/mm/built-in.o(.cpuinit.text+0x86): Section mismatch in reference from the function .fake_numa_create_new_node() to the variable .init.data:cmdline
The function __cpuinit .fake_numa_create_new_node() references
a variable __initdata cmdline.
If cmdline is only used by .fake_numa_create_new_node then
annotate cmdline with a matching annotation.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:56 +10:00
Vaidyanathan Srinivasan 7122beeee7 powerpc: fix numa distance for form0 device tree
The following commit breaks numa distance setup for old powerpc
systems that use form0 encoding in device tree.

commit 41eab6f88f
powerpc/numa: Use form 1 affinity to setup node distance

Device tree node /rtas/ibm,associativity-reference-points would
index into /cpus/PowerPCxxxx/ibm,associativity based on form0 or
form1 encoding detected by ibm,architecture-vec-5 property.

All modern systems use form1 and current kernel code is correct.
However, on older systems with form0 encoding, the numa distance
will get hard coded as LOCAL_DISTANCE for all nodes.  This causes
task scheduling anomaly since scheduler will skip building numa
level domain (topmost domain with all cpus) if all numa distances
are same.  (value of 'level' in sched_init_numa() will remain 0)

Prior to the above commit:
((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Restoring compatible behavior with this patch for old powerpc systems
with device tree where numa distance are encoded as form0.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:56 +10:00
Michael Neuling 517b731477 powerpc/ptrace: Add DAWR debug feature info for userspace
This adds new debug feature information so that the DAWR can be
identified by userspace tools like GDB.

Unfortunately the DAWR doesn't sit nicely into the current description
that ptrace provides to userspace via struct ppc_debug_info.  It doesn't
allow for specifying that only some ranges are possible or even the end
alignment constraints (DAWR only allows 512 byte wide ranges which can't
cross a 512 byte boundary).

After talking to Edjunior Machado (GDB ppc developer), it was decided
this was the best approach.  Just mark it as debug feature DAWR and
tools like GDB can internally decide the constraints.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:55 +10:00
Ian Munsie a6a058e52a powerpc: Add accounting for Doorbell interrupts
This patch adds a new line to /proc/interrupts to account for the
doorbell interrupts that each hardware thread has received. The total
interrupt count in /proc/stat will now also include doorbells.

 # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 16:        551       1267        281        175      XICS Level     IPI
LOC:       2037       1503       1688       1625   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions
DBL:         42        550         20         91   Doorbell interrupts

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:55 +10:00
Nishanth Aravamudan 61435690a9 powerpc/pseries: close DDW race between functions of adapter
Given a PCI device with multiple functions in a DDW capable slot, the
following situation can be encountered: When the first function sets a
64-bit DMA mask, enable_ddw() will be called and we can fail to properly
configure DDW (the most common reason being the new DMA window's size is
not large enough to map all of an LPAR's memory). With the recent
changes to DDW, we remove the base window in order to determine if the
new window is of sufficient size to cover an LPAR's memory. We correctly
replace the base window if we find that not to be the case. However,
once we go through and re-configured 32-bit DMA via the IOMMU, the next
function of the adapter will go through the same process. And since DDW
is a characteristic of the slot itself, we are most likely going to fail
again. But to determine we are going to fail the second slot, we again
remove the base window -- but that is now in-use by the first
function/driver, which might be issuing I/O already.

To close this window, keep a list of all the failed struct device_nodes
that have failed to configure DDW. If the current device_node is in that
list, just fail out immediately and fall back to 32-bit DMA without
doing any DDW manipulation.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:04:00 +10:00
Gavin Shan fb1b55d654 powerpc/powernv: Use MSI bitmap to manage IRQs
As Michael Ellerman mentioned, arch/powerpc/sysdev/msi_bitmap.c
already implemented bitmap to manage (alloc/free) MSI interrupts.
The patch intends to use that mechanism to manage MSI interrupts
for PowerNV platform.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:59 +10:00
Michael Neuling 2a3563b023 powerpc: Setup in HFSCR for POWER8
Setup the HFSCR (Hypervisor Facility Status and Control Register) for POWER8
when running HV=1.  The HFSCR is the same as the FSCR except it's for
hypervisors.  It controls the available of various facilities in OS and
userspace levels.  It also indicates the cause of a hypervisor facility
unavailable interrupt (although we are not using this here).

This patch sets the facilities Linux knows about incase the firmware doesn't.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:59 +10:00
Michael Neuling 04b418c97f powerpc: Add HFSCR SPR definitions
Add SPR number and bit definitions for the HFSCR (Hypervisor Facility Status
and Control Register).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:58 +10:00
Alexey Kardashevskiy ee4a391661 powerpc: fixing ptrace_get_reg to return an error
Currently ptrace_get_reg returns error as a value
what make impossible to tell whether it is a correct value or error code.

The patch adds a parameter which points to the real return data and
returns an error code.

As get_user_msr() never fails and it is used in multiple places so it has not
been changed by this patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:57 +10:00