Commit Graph

104421 Commits

Author SHA1 Message Date
Al Viro 37b00bc929 ppc32: fix copy_from_user()
commit 224264657b8b228f949b42346e09ed8c90136a8e upstream.

should clear on access_ok() failures.  Also remove the useless
range truncation logics.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:03 +01:00
Al Viro 601d2912d5 sparc32: fix copy_from_user()
commit 917400cecb4b52b5cde5417348322bb9c8272fa6 upstream.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:03 +01:00
Al Viro 88e42bd1bf mn10300: copy_from_user() should zero on access_ok() failure...
commit ae7cc577ec2a4a6151c9e928fd1f595d953ecef1 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:03 +01:00
Guenter Roeck 3c5d8c4f2d openrisc: fix the fix of copy_from_user()
commit 8e4b72054f554967827e18be1de0e8122e6efc04 upstream.

Since commit acb2505d0119 ("openrisc: fix copy_from_user()"),
copy_from_user() returns the number of bytes requested, not the
number of bytes not copied.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Fixes: acb2505d0119 ("openrisc: fix copy_from_user()")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:02 +01:00
Al Viro 298c0771fb openrisc: fix copy_from_user()
commit acb2505d0119033a80c85ac8d02dccae41271667 upstream.

... that should zero on faults.  Also remove the <censored> helpful
logics wrt range truncation copied from ppc32.  Where it had ever
been needed only in case of copy_from_user() *and* had not been merged
into the mainline until a month after the need had disappeared.
A decade before openrisc went into mainline, I might add...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:02 +01:00
Al Viro fc1e56cd13 parisc: fix copy_from_user()
commit aace880feea38875fbc919761b77e5732a3659ef upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:02 +01:00
Al Viro d01c959ab2 metag: copy_from_user() should zero the destination on access_ok() failure
commit 8ae95ed4ae5fc7c3391ed668b2014c9e2079533b upstream.

Acked-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:02 +01:00
Al Viro 351a44facc alpha: fix copy_from_user()
commit 2561d309dfd1555e781484af757ed0115035ddb3 upstream.

it should clear the destination even when access_ok() fails.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:02 +01:00
Al Viro c946f64eb2 mips: copy_from_user() must zero the destination on access_ok() failure
commit e69d700535ac43a18032b3c399c69bf4639e89a2 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:01 +01:00
Al Viro 0ea4b0e2cc hexagon: fix strncpy_from_user() error return
commit f35c1e0671728d1c9abc405d05ef548b5fcb2fc4 upstream.

It's -EFAULT, not -1 (and contrary to the comment in there,
__strnlen_user() can return 0 - on faults).

Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:01 +01:00
Al Viro a6c16ec5a2 sh: fix copy_from_user()
commit 6e050503a150b2126620c1a1e9b3a368fcd51eac upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:01 +01:00
Al Viro a0c422b88e score: fix copy_from_user() and friends
commit b615e3c74621e06cd97f86373ca90d43d6d998aa upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:00 +01:00
Al Viro 668a740df2 blackfin: fix copy_from_user()
commit 8f035983dd826d7e04f67b28acf8e2f08c347e41 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:00 +01:00
Al Viro b5f6c58b90 cris: buggered copy_from_user/copy_to_user/clear_user
commit eb47e0293baaa3044022059f1fa9ff474bfe35cb upstream.

* copy_from_user() on access_ok() failure ought to zero the destination
* none of those primitives should skip the access_ok() check in case of
small constant size.

Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:00 +01:00
Al Viro 233d4b179c frv: fix clear_user()
commit 3b8767a8f00cc6538ba6b1cf0f88502e2fd2eb90 upstream.

It should check access_ok().  Otherwise a bunch of places turn into
trivially exploitable rootholes.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:33:00 +01:00
Vineet Gupta df375880c8 ARC: uaccess: get_user to zero out dest in cause of fault
commit 05d9d0b96e53c52a113fd783c0c97c830c8dc7af upstream.

Al reported potential issue with ARC get_user() as it wasn't clearing
out destination pointer in case of fault due to bad address etc.

Verified using following

| {
|  	u32 bogus1 = 0xdeadbeef;
|	u64 bogus2 = 0xdead;
|	int rc1, rc2;
|
|  	pr_info("Orig values %x %llx\n", bogus1, bogus2);
|	rc1 = get_user(bogus1, (u32 __user *)0x40000000);
|	rc2 = get_user(bogus2, (u64 __user *)0x50000000);
|	pr_info("access %d %d, new values %x %llx\n",
|		rc1, rc2, bogus1, bogus2);
| }

| [ARCLinux]# insmod /mnt/kernel-module/qtn.ko
| Orig values deadbeef dead
| access -14 -14, new values 0 0

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:59 +01:00
Al Viro ed98892e36 s390: get_user() should zero on failure
commit fd2d2b191fe75825c4c7a6f12f3fef35aaed7dd7 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:59 +01:00
Al Viro 60f0190e11 score: fix __get_user/get_user
commit c2f18fa4cbb3ad92e033a24efa27583978ce9600 upstream.

* should zero on any failure
* __get_user() should use __copy_from_user(), not copy_from_user()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:59 +01:00
Al Viro 643d0a2d2c sh64: failing __get_user() should zero
commit c6852389228df9fb3067f94f3b651de2a7921b36 upstream.

It could be done in exception-handling bits in __get_user_b() et.al.,
but the surgery involved would take more knowledge of sh64 details
than I have or _want_ to have.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:58 +01:00
Al Viro 322dab0da0 m32r: fix __get_user()
commit c90a3bc5061d57e7931a9b7ad14784e1a0ed497d upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:58 +01:00
Al Viro 7d84a5d51f mn10300: failing __get_user() and get_user() should zero
commit 43403eabf558d2800b429cd886e996fd555aa542 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:58 +01:00
Al Viro 22e232d64f microblaze: fix copy_from_user()
commit d0cf385160c12abd109746cad1f13e3b3e8b50b8 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[wt: s/might_fault/might_sleep]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:58 +01:00
Al Viro 90f2278f09 microblaze: fix __get_user()
commit e98b9e37ae04562d52c96f46b3cf4c2e80222dc1 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:57 +01:00
John David Anglin 61f2d84aa4 parisc: Ensure consistent state when switching to kernel stack at syscall entry
commit 6ed518328d0189e0fdf1bb7c73290d546143ea66 upstream.

We have one critical section in the syscall entry path in which we switch from
the userspace stack to kernel stack. In the event of an external interrupt, the
interrupt code distinguishes between those two states by analyzing the value of
sr7. If sr7 is zero, it uses the kernel stack. Therefore it's important, that
the value of sr7 is in sync with the currently enabled stack.

This patch now disables interrupts while executing the critical section.  This
prevents the interrupt handler to possibly see an inconsistent state which in
the worst case can lead to crashes.

Interestingly, in the syscall exit path interrupts were already disabled in the
critical section which switches back to the userspace stack.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:57 +01:00
Dan Carpenter cba31cb938 avr32: off by one in at32_init_pio()
commit 55f1cf83d5cf885c75267269729805852039c834 upstream.

The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be
>=.

Fixes: 5f97f7f940 ('[PATCH] avr32 architecture')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:57 +01:00
Guenter Roeck da189d04e7 avr32: fix 'undefined reference to `___copy_from_user'
commit 65c0044ca8d7c7bbccae37f0ff2972f0210e9f41 upstream.

avr32 builds fail with:

arch/avr32/kernel/built-in.o: In function `arch_ptrace':
(.text+0x650): undefined reference to `___copy_from_user'
arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined
reference to `___copy_from_user'
kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax':
(.text+0x5dd8): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin':
sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `ptrace_has_cap':
ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user'
kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to
`___copy_from_user' follow

Fixes: 8630c32275ba ("avr32: fix copy_from_user()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Havard Skinnemoen <hskinnemoen@gmail.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:57 +01:00
Al Viro 91be3ab41c avr32: fix copy_from_user()
commit 8630c32275bac2de6ffb8aea9d9b11663e7ad28e upstream.

really ugly, but apparently avr32 compilers turns access_ok() into
something so bad that they want it in assembler.  Left that way,
zeroing added in inline wrapper.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:56 +01:00
Pan Xinhui ac78e19eba powerpc/nvram: Fix an incorrect partition merge
commit 11b7e154b132232535befe51c55db048069c8461 upstream.

When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.

Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .

Fixes: fa2b4e54d4 ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:56 +01:00
Paul Mackerras 5daad2cce3 powerpc/64: Fix incorrect return value from __copy_tofrom_user
commit 1a34439e5a0b2235e43f96816dbb15ee1154f656 upstream.

Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should.  Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy.  In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.

It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16.  Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3.  To fix it, this moves the label 179 to
the point where we add 16 to R3.  I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).

This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:56 +01:00
Gavin Shan 6d13a7b0e1 powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data()
commit 5adaf8629b193f185ca5a1665b9e777a0579f518 upstream.

This fixes the warnings reported from sparse:

  pci.c:312:33: warning: restricted __be64 degrades to integer
  pci.c:313:33: warning: restricted __be64 degrades to integer

Fixes: cee72d5bb4 ("powerpc/powernv: Display diag data on p7ioc EEH errors")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:56 +01:00
Anton Blanchard 546731aaea powerpc/vdso64: Use double word compare on pointers
commit 5045ea37377ce8cca6890d32b127ad6770e6dce5 upstream.

__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.

A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:

  printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
  printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));

And we get:

  0
  -1

The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.

I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.

Fixes: a7f290dad3 ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:55 +01:00
Paul Mackerras b0a4c16739 powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
commit f077aaf0754bcba0fffdbd925bc12f09cd1e38aa upstream.

In commit c60ac5693c ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET.  That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).

This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it.  The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca.  If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space.  If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).

The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere.  Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.

Fixes: c60ac5693c ("powerpc: Update kernel VSID range")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:55 +01:00
Marcin Nowakowski 4787d839ac MIPS: ptrace: Fix regs_return_value for kernel context
commit 74f1077b5b783e7bf4fa3007cefdc8dbd6c07518 upstream.

Currently regs_return_value always negates reg[2] if it determines
the syscall has failed, but when called in kernel context this check is
invalid and may result in returning a wrong value.

This fixes errors reported by CONFIG_KPROBES_SANITY_TEST

Fixes: d7e7528bcd ("Audit: push audit success and retcode into arch ptrace.h")
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14381/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:55 +01:00
Paul Burton bc9f83ea7f MIPS: Malta: Fix IOCU disable switch read for MIPS64
commit 305723ab439e14debc1d339aa04e835d488b8253 upstream.

Malta boards used with CPU emulators feature a switch to disable use of
an IOCU. Software has to check this switch & ignore any present IOCU if
the switch is closed. The read used to do this was unsafe for 64 bit
kernels, as it simply casted the address 0xbf403000 to a pointer &
dereferenced it. Whilst in a 32 bit kernel this would access kseg1, in a
64 bit kernel this attempts to access xuseg & results in an address
error exception.

Fix by accessing a correctly formed ckseg1 address generated using the
CKSEG1ADDR macro.

Whilst modifying this code, define the name of the register and the bit
we care about within it, which indicates whether PCI DMA is routed to
the IOCU or straight to DRAM. The code previously checked that bit 0 was
also set, but the least significant 7 bits of the CONFIG_GEN0 register
contain the value of the MReqInfo signal provided to the IOCU OCP bus,
so singling out bit 0 makes little sense & that part of the check is
dropped.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b6d92b4a6b ("MIPS: Add option to disable software I/O coherency.")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14187/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:55 +01:00
Will Deacon 83099f397a arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
commit 3a402a709500c5a3faca2111668c33d96555e35a upstream.

When TIF_SINGLESTEP is set for a task, the single-step state machine is
enabled and we must take care not to reset it to the active-not-pending
state if it is already in the active-pending state.

Unfortunately, that's exactly what user_enable_single_step does, by
unconditionally setting the SS bit in the SPSR for the current task.
This causes failures in the GDB testsuite, where GDB ends up missing
expected step traps if the instruction being stepped generates another
trap, e.g. PTRACE_EVENT_FORK from an SVC instruction.

This patch fixes the problem by preserving the current state of the
stepping state machine when TIF_SINGLESTEP is set on the current thread.

Cc: <stable@vger.kernel.org>
Reported-by: Yao Qi <yao.qi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:54 +01:00
Will Deacon d65df5171a arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
commit 872c63fbf9e153146b07f0cece4da0d70b283eeb upstream.

smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.

Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.

This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().

Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:54 +01:00
James Hogan 1fd5c7b654 arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
commit 3146bc64d12377a74dbda12b96ea32da3774ae07 upstream.

AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT
for the VDSO address.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: f668cd1673 ("arm64: ELF definitions")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:54 +01:00
Mark Rutland e5471defd7 arm64: avoid returning from bad_mode
commit 7d9e8f71b989230bc613d121ca38507d34ada849 upstream.

Generally, taking an unexpected exception should be a fatal event, and
bad_mode is intended to cater for this. However, it should be possible
to contain unexpected synchronous exceptions from EL0 without bringing
the kernel down, by sending a SIGILL to the task.

We tried to apply this approach in commit 9955ac47f4 ("arm64:
don't kill the kernel on a bad esr from el0"), by sending a signal for
any bad_mode call resulting from an EL0 exception.

However, this also applies to other unexpected exceptions, such as
SError and FIQ. The entry paths for these exceptions branch to bad_mode
without configuring the link register, and have no kernel_exit. Thus, if
we take one of these exceptions from EL0, bad_mode will eventually
return to the original user link register value.

This patch fixes this by introducing a new bad_el0_sync handler to cater
for the recoverable case, and restoring bad_mode to its original state,
whereby it calls panic() and never returns. The recoverable case
branches to bad_el0_sync with a bl, and returns to userspace via the
usual ret_to_user mechanism.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 9955ac47f4 ("arm64: don't kill the kernel on a bad esr from el0")
Reported-by: Mark Salter <msalter@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:54 +01:00
Russell King eee1bdb521 ARM: sa1111: fix pcmcia suspend/resume
commit 06dfe5cc0cc684e735cb0232fdb756d30780b05d upstream.

SA1111 PCMCIA was broken when PCMCIA switched to using dev_pm_ops for
the PCMCIA socket class.  PCMCIA used to handle suspend/resume via the
socket hosting device, which happened at normal device suspend/resume
time.

However, the referenced commit changed this: much of the resume now
happens much earlier, in the noirq resume handler of dev_pm_ops.

However, on SA1111, the PCMCIA device is not accessible as the SA1111
has not been resumed at _noirq time.  It's slightly worse than that,
because the SA1111 has already been put to sleep at _noirq time, so
suspend doesn't work properly.

Fix this by converting the core SA1111 code to use dev_pm_ops as well,
and performing its own suspend/resume at noirq time.

This fixes these errors in the kernel log:

pcmcia_socket pcmcia_socket0: time out after reset
pcmcia_socket pcmcia_socket1: time out after reset

and the resulting lack of PCMCIA cards after a S2RAM cycle.

Fixes: d7646f7632 ("pcmcia: use dev_pm_ops for class pcmcia_socket_class")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:53 +01:00
Russell King 5b4918cca9 ARM: sa1100: clear reset status prior to reboot
commit da60626e7d02a4f385cae80e450afc8b07035368 upstream.

Clear the current reset status prior to rebooting the platform.  This
adds the bit missing from 04fef228fb ("[ARM] pxa: introduce
reset_status and clear_reset_status for driver's usage").

Fixes: 04fef228fb ("[ARM] pxa: introduce reset_status and clear_reset_status for driver's usage")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:53 +01:00
Srinivas Ramana 1774ca8155 ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
commit 117e5e9c4cfcb7628f08de074fbfefec1bb678b7 upstream.

If the bootloader uses the long descriptor format and jumps to
kernel decompressor code, TTBCR may not be in a right state.
Before enabling the MMU, it is required to clear the TTBCR.PD0
field to use TTBR0 for translation table walks.

The commit dbece45894 ("ARM: 7501/1: decompressor:
reset ttbcr for VMSA ARMv7 cores") does the reset of TTBCR.N, but
doesn't consider all the bits for the size of TTBCR.N.

Clear TTBCR.PD0 field and reset all the three bits of TTBCR.N to
indicate the use of TTBR0 and the correct base address width.

Fixes: dbece45894 ("ARM: 7501/1: decompressor: reset ttbcr for VMSA ARMv7 cores")
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:53 +01:00
Robin Murphy 88654a154a ARM: 8616/1: dt: Respect property size when parsing CPUs
commit ba6dea4f7cedb4b1c17e36f4087675d817c2e24b upstream.

Whilst MPIDR values themselves are less than 32 bits, it is still
perfectly valid for a DT to have #address-cells > 1 in the CPUs node,
resulting in the "reg" property having leading zero cell(s). In that
situation, the big-endian nature of the data conspires with the current
behaviour of only reading the first cell to cause the kernel to think
all CPUs have ID 0, and become resoundingly unhappy as a consequence.

Take the full property length into account when parsing CPUs so as to
be correct under any circumstances.

Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:53 +01:00
Michael S. Tsirkin 69c373d8f7 x86/um: reuse asm-generic/barrier.h
commit 577f183acc88645eae116326cc2203dc88ea730c upstream.

On x86/um CONFIG_SMP is never defined.  As a result, several macros
match the asm-generic variant exactly. Drop the local definitions and
pull in asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Richard Weinberger <richard@nod.at>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:52 +01:00
H.J. Lu 186c5f347b x86/build: Build compressed x86 kernels as PIE
commit 6d92bc9d483aa1751755a66fee8fb39dffb088c0 upstream.

The 32-bit x86 assembler in binutils 2.26 will generate R_386_GOT32X
relocation to get the symbol address in PIC.  When the compressed x86
kernel isn't built as PIC, the linker optimizes R_386_GOT32X relocations
to their fixed symbol addresses.  However, when the compressed x86
kernel is loaded at a different address, it leads to the following
load failure:

  Failed to allocate space for phdrs

during the decompression stage.

If the compressed x86 kernel is relocatable at run-time, it should be
compiled with -fPIE, instead of -fPIC, if possible and should be built as
Position Independent Executable (PIE) so that linker won't optimize
R_386_GOT32X relocation to its fixed symbol address.

Older linkers generate R_386_32 relocations against locally defined
symbols, _bss, _ebss, _got and _egot, in PIE.  It isn't wrong, just less
optimal than R_386_RELATIVE.  But the x86 kernel fails to properly handle
R_386_32 relocations when relocating the kernel.  To generate
R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
hidden in both 32-bit and 64-bit x86 kernels.

To build a 64-bit compressed x86 kernel as PIE, we need to disable the
relocation overflow check to avoid relocation overflow errors. We do
this with a new linker command-line option, -z noreloc-overflow, which
got added recently:

 commit 4c10bbaa0912742322f10d9d5bb630ba4e15dfa7
 Author: H.J. Lu <hjl.tools@gmail.com>
 Date:   Tue Mar 15 11:07:06 2016 -0700

    Add -z noreloc-overflow option to x86-64 ld

    Add -z noreloc-overflow command-line option to the x86-64 ELF linker to
    disable relocation overflow check.  This can be used to avoid relocation
    overflow check if there will be no dynamic relocation overflow at
    run-time.

The 64-bit compressed x86 kernel is built as PIE only if the linker supports
-z noreloc-overflow.  So far 64-bit relocatable compressed x86 kernel
boots fine even when it is built as a normal executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Edited the changelog and comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:52 +01:00
Steven Rostedt 6523fa8c34 x86/paravirt: Do not trace _paravirt_ident_*() functions
commit 15301a570754c7af60335d094dd2d1808b0641a5 upstream.

Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:

  _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()

It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.

This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.

In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue.  But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:

 mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
 mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
 mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
 NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
 Modules linked in: e1000e
 CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
 task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
 RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
 RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
 RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
 RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
 R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
 R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
 FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
 Call Trace:
   _raw_spin_lock+0x27/0x30
   handle_pte_fault+0x13db/0x16b0
   handle_mm_fault+0x312/0x670
   __do_page_fault+0x1b1/0x4e0
   do_page_fault+0x22/0x30
   page_fault+0x28/0x30
   __vfs_read+0x28/0xe0
   vfs_read+0x86/0x130
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1e/0xa8
 Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b

Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:51 +01:00
Jiri Kosina 1eae225e43 x86/mm/pat, /dev/mem: Remove superfluous error message
commit 39380b80d72723282f0ea1d1bbf2294eae45013e upstream.

Currently it's possible for broken (or malicious) userspace to flood a
kernel log indefinitely with messages a-la

	Program dmidecode tried to access /dev/mem between f0000->100000

because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
dumps this information each and every time devmem_is_allowed() fails.

Reportedly userspace that is able to trigger contignuous flow of these
messages exists.

It would be possible to rate limit this message, but that'd have a
questionable value; the administrator wouldn't get information about all
the failing accessess, so then the information would be both superfluous
and incomplete at the same time :)

Returning EPERM (which is what is actually happening) is enough indication
for userspace what has happened; no need to log this particular error as
some sort of special condition.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:51 +01:00
Wanpeng Li 928a27752e x86/apic: Do not init irq remapping if ioapic is disabled
commit 2e63ad4bd5dd583871e6602f9d398b9322d358d9 upstream.

native_smp_prepare_cpus
  -> default_setup_apic_routing
    -> enable_IR_x2apic
      -> irq_remapping_prepare
        -> intel_prepare_irq_remapping
          -> intel_setup_irq_remapping

So IR table is setup even if "noapic" boot parameter is added. As a result we
crash later when the interrupt affinity is set due to a half initialized
remapping infrastructure.

Prevent remap initialization when IOAPIC is disabled.

Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joerg Roedel <joro@8bytes.org>
Link: http://lkml.kernel.org/r/1471954039-3942-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:51 +01:00
Sebastian Andrzej Siewior b591901eb8 x86/mm: Disable preemption during CR3 read+write
commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream.

There's a subtle preemption race on UP kernels:

Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:

 -> mmput()
 -> exit_mmap()
 -> tlb_finish_mmu()
 -> tlb_flush_mmu()
 -> tlb_flush_mmu_tlbonly()
 -> tlb_flush()
 -> flush_tlb_mm_range()
 -> __flush_tlb_up()
 -> __flush_tlb()
 ->  __native_flush_tlb()

At this point current->mm is NULL but current->active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:51 +01:00
Andy Lutomirski 213060d93a x86/traps: Ignore high word of regs->cs in early_idt_handler_common
This is a backport of:
commit fc0e81b2bea0ebceb71889b61d2240856141c9ee upstream

On the 80486 DX, it seems that some exceptions may leave garbage in
the high bits of CS.  This causes sporadic failures in which
early_fixup_exception() refuses to fix up an exception.

As far as I can tell, this has been buggy for a long time, but the
problem seems to have been exacerbated by commits:

  1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4")
  e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines")

This appears to have broken for as long as we've had early
exception handling.

[ This backport should apply to kernels from 3.4 - 4.5. ]

Fixes: 4c5023a3fa ("x86-32: Handle exception table entries during early boot")
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:50 +01:00
Juergen Gross 506750b71b x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
commit 1cf38741308c64d08553602b3374fb39224eeb5a upstream.

xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper
bound of the loop setting non-kernel-image entries to zero should not
exceed the size of level2_kernel_pgt.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 23:32:50 +01:00
Jan Beulich 0679df0fff x86/mm/xen: Suppress hugetlbfs in PV guests
commit 103f6112f253017d7062cd74d17f4a514ed4485c upstream.

Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[<ffffffff811c333b>]  [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
   [<ffffffff81167b3d>] evict+0xbd/0x1b0
   [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
   [<ffffffff81165b0e>] dput+0x1fe/0x220
   [<ffffffff81150535>] __fput+0x155/0x200
   [<ffffffff81079fc0>] task_work_run+0x60/0xa0
   [<ffffffff81063510>] do_exit+0x160/0x400
   [<ffffffff810637eb>] do_group_exit+0x3b/0xa0
   [<ffffffff8106e8bd>] get_signal+0x1ed/0x470
   [<ffffffff8100f854>] do_signal+0x14/0x110
   [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
   [<ffffffff814178a5>] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Ignacio Alvarado 01c2288236 KVM: Disable irq while unregistering user notifier
commit 1650b4ebc99da4c137bfbfc531be4a2405f951dd upstream.

Function user_notifier_unregister should be called only once for each
registered user notifier.

Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.

Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Fixes: 18863bdd60 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Paolo Bonzini 014db7f668 KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
commit 7301d6abaea926d685832f7e1f0c37dd206b01f4 upstream.

Reported by syzkaller:

    [ INFO: suspicious RCU usage. ]
    4.9.0-rc4+ #47 Not tainted
    -------------------------------
    ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!

    stack backtrace:
    CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
     0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
     ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
     [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
     [<     inline     >] __kvm_memslots include/linux/kvm_host.h:534
     [<     inline     >] kvm_memslots include/linux/kvm_host.h:541
     [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
     [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: fda4e2e85589191b123d31cdc21fd33ee70f50fd
Cc: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Ido Yariv a3e49e607d KVM: x86: fix wbinvd_dirty_mask use-after-free
commit bd768e146624cbec7122ed15dead8daa137d909d upstream.

vcpu->arch.wbinvd_dirty_mask may still be used after freeing it,
corrupting memory. For example, the following call trace may set a bit
in an already freed cpu mask:
    kvm_arch_vcpu_load
    vcpu_load
    vmx_free_vcpu_nested
    vmx_free_vcpu
    kvm_arch_vcpu_free

Fix this by deferring freeing of wbinvd_dirty_mask.

Signed-off-by: Ido Yariv <ido@wizery.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
James Hogan 010a6cc4a6 KVM: MIPS: Make ERET handle ERL before EXL
commit ede5f3e7b54a4347be4d8525269eae50902bd7cd upstream.

The ERET instruction to return from exception is used for returning from
exception level (Status.EXL) and error level (Status.ERL). If both bits
are set however we should be returning from ERL first, as ERL can
interrupt EXL, for example when an NMI is taken. KVM however checks EXL
first.

Fix the order of the checks to match the pseudocode in the instruction
set manual.

Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Radim Krčmář a0753a8c4a KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
commit dccbfcf52cebb8963246eba5b177b77f26b34da0 upstream.

If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This will incorrectly apply modifications intended for vmcs01 to vmcs02
and L2 can use it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while using msr bitmap that assumes enabled.

Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS.  An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.

Fixes: 8d14695f95 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
James Hogan 4c7055f88a KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
commit 91e4f1b6073dd680d86cdb7e42d7cccca9db39d8 upstream.

When a guest TLB entry is replaced by TLBWI or TLBWR, we only invalidate
TLB entries on the local CPU. This doesn't work correctly on an SMP host
when the guest is migrated to a different physical CPU, as it could pick
up stale TLB mappings from the last time the vCPU ran on that physical
CPU.

Therefore invalidate both user and kernel host ASIDs on other CPUs,
which will cause new ASIDs to be generated when it next runs on those
CPUs.

We're careful only to do this if the TLB entry was already valid, and
only for the kernel ASID where the virtual address it mapped is outside
of the guest user address range.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Cc: Jiri Slaby <jslaby@suse.cz>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
James Hogan 1d3b3d3176 KVM: MIPS: Precalculate MMIO load resume PC
commit e1e575f6b026734be3b1f075e780e91ab08ca541 upstream.

The advancing of the PC when completing an MMIO load is done before
re-entering the guest, i.e. before restoring the guest ASID. However if
the load is in a branch delay slot it may need to access guest code to
read the prior branch instruction. This isn't safe in TLB mapped code at
the moment, nor in the future when we'll access unmapped guest segments
using direct user accessors too, as it could read the branch from host
user memory instead.

Therefore calculate the resume PC in advance while we're still in the
right context and save it in the new vcpu->arch.io_pc (replacing the no
longer needed vcpu->arch.pending_load_cause), and restore it on MMIO
completion.

Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x: 5f508c43a764: MIPS: KVM: Fix unused variable build warning
Cc: <stable@vger.kernel.org> # 3.10.x-3.16.x
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Nicholas Mc Guire 63b0fa1706 MIPS: KVM: Fix unused variable build warning
commit 5f508c43a7648baa892528922402f1e13f258bd4 upstream.

As kvm_mips_complete_mmio_load() did not yet modify PC at this point
as James Hogans <james.hogan@imgtec.com> explained the curr_pc variable
and the comments along with it can be dropped.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Link: http://lkml.org/lkml/2015/5/8/422
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/9993/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[james.hogan@imgtec.com: Backport to 3.10..3.16]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-06 09:04:07 +01:00
Patrick Tjin a848c65fb7 Merge branch android-msm-bullhead-3.10-nyc-mr2 into android-msm-bullhead-3.10 2017-01-26 12:02:30 -08:00
Josh Gao 5bb169b717 ARM64: Wire up getrandom.
Bug: http://b/29621447
Change-Id: I2e7623ae13318b91589d17d779391c4baa292421
2017-01-25 19:07:30 -08:00
Russell King 1e6ffcb122 BACKPORT: ARM: wire up getrandom syscall
Clean cherry pick of eb6452537b280652eee66801ec97cc369e27e5d8.

Add the new getrandom syscall for ARM.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Bug: http://b/29621447
Change-Id: I6d50b57f3a61fbf9102c69103b9a5b7ebf239860
(cherry picked from commit eb6452537b280652eee66801ec97cc369e27e5d8)
2017-01-25 19:07:07 -08:00
Theodore Ts'o f4387cc432 BACKPORT: random: introduce getrandom(2) system call
Almost clean cherry pick of c6e9d6f38894798696f23c8084ca7edbf16ee895,
includes change made by merge 0891ad829d2a0501053703df66029e843e3b8365.

The getrandom(2) system call was requested by the LibreSSL Portable
developers.  It is analoguous to the getentropy(2) system call in
OpenBSD.

The rationale of this system call is to provide resiliance against
file descriptor exhaustion attacks, where the attacker consumes all
available file descriptors, forcing the use of the fallback code where
/dev/[u]random is not available.  Since the fallback code is often not
well-tested, it is better to eliminate this potential failure mode
entirely.

The other feature provided by this new system call is the ability to
request randomness from the /dev/urandom entropy pool, but to block
until at least 128 bits of entropy has been accumulated in the
/dev/urandom entropy pool.  Historically, the emphasis in the
/dev/urandom development has been to ensure that urandom pool is
initialized as quickly as possible after system boot, and preferably
before the init scripts start execution.

This is because changing /dev/urandom reads to block represents an
interface change that could potentially break userspace which is not
acceptable.  In practice, on most x86 desktop and server systems, in
general the entropy pool can be initialized before it is needed (and
in modern kernels, we will printk a warning message if not).  However,
on an embedded system, this may not be the case.  And so with this new
interface, we can provide the functionality of blocking until the
urandom pool has been initialized.  Any userspace program which uses
this new functionality must take care to assure that if it is used
during the boot process, that it will not cause the init scripts or
other portions of the system startup to hang indefinitely.

SYNOPSIS
	#include <linux/random.h>

	int getrandom(void *buf, size_t buflen, unsigned int flags);

DESCRIPTION
	The system call getrandom() fills the buffer pointed to by buf
	with up to buflen random bytes which can be used to seed user
	space random number generators (i.e., DRBG's) or for other
	cryptographic uses.  It should not be used for Monte Carlo
	simulations or other programs/algorithms which are doing
	probabilistic sampling.

	If the GRND_RANDOM flags bit is set, then draw from the
	/dev/random pool instead of the /dev/urandom pool.  The
	/dev/random pool is limited based on the entropy that can be
	obtained from environmental noise, so if there is insufficient
	entropy, the requested number of bytes may not be returned.
	If there is no entropy available at all, getrandom(2) will
	either block, or return an error with errno set to EAGAIN if
	the GRND_NONBLOCK bit is set in flags.

	If the GRND_RANDOM bit is not set, then the /dev/urandom pool
	will be used.  Unlike using read(2) to fetch data from
	/dev/urandom, if the urandom pool has not been sufficiently
	initialized, getrandom(2) will block (or return -1 with the
	errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).

	The getentropy(2) system call in OpenBSD can be emulated using
	the following function:

            int getentropy(void *buf, size_t buflen)
            {
                    int     ret;

                    if (buflen > 256)
                            goto failure;
                    ret = getrandom(buf, buflen, 0);
                    if (ret < 0)
                            return ret;
                    if (ret == buflen)
                            return 0;
            failure:
                    errno = EIO;
                    return -1;
            }

RETURN VALUE
       On success, the number of bytes that was filled in the buf is
       returned.  This may not be all the bytes requested by the
       caller via buflen if insufficient entropy was present in the
       /dev/random pool, or if the system call was interrupted by a
       signal.

       On error, -1 is returned, and errno is set appropriately.

ERRORS
	EINVAL		An invalid flag was passed to getrandom(2)

	EFAULT		buf is outside the accessible address space.

	EAGAIN		The requested entropy was not available, and
			getentropy(2) would have blocked if the
			GRND_NONBLOCK flag was not set.

	EINTR		While blocked waiting for entropy, the call was
			interrupted by a signal handler; see the description
			of how interrupted read(2) calls on "slow" devices
			are handled with and without the SA_RESTART flag
			in the signal(7) man page.

NOTES
	For small requests (buflen <= 256) getrandom(2) will not
	return EINTR when reading from the urandom pool once the
	entropy pool has been initialized, and it will return all of
	the bytes that have been requested.  This is the recommended
	way to use getrandom(2), and is designed for compatibility
	with OpenBSD's getentropy() system call.

	However, if you are using GRND_RANDOM, then getrandom(2) may
	block until the entropy accounting determines that sufficient
	environmental noise has been gathered such that getrandom(2)
	will be operating as a NRBG instead of a DRBG for those people
	who are working in the NIST SP 800-90 regime.  Since it may
	block for a long time, these guarantees do *not* apply.  The
	user may want to interrupt a hanging process using a signal,
	so blocking until all of the requested bytes are returned
	would be unfriendly.

	For this reason, the user of getrandom(2) MUST always check
	the return value, in case it returns some error, or if fewer
	bytes than requested was returned.  In the case of
	!GRND_RANDOM and small request, the latter should never
	happen, but the careful userspace code (and all crypto code
	should be careful) should check for this anyway!

	Finally, unless you are doing long-term key generation (and
	perhaps not even then), you probably shouldn't be using
	GRND_RANDOM.  The cryptographic algorithms used for
	/dev/urandom are quite conservative, and so should be
	sufficient for all purposes.  The disadvantage of GRND_RANDOM
	is that it can block, and the increased complexity required to
	deal with partially fulfilled getrandom(2) requests.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Zach Brown <zab@zabbo.net>

Bug: http://b/29621447
Change-Id: I189ba74070dd6d918b0fdf83ff30bb74ec0f7556
(cherry picked from commit 4af712e8df998475736f3e2727701bd31e3751a9)
2017-01-25 19:06:23 -08:00
Dave Weinstein 19c833e083 ANDROID: lib: vsprintf: whitelist stack traces
Use the %pP functionality to explicitly allow kernel
pointers to be logged for stack traces

BUG: 30368199
Change-Id: I495915465565293e9e4da5aa28fbd1d14538d99b
Signed-off-by: Dave Weinstein <olorin@google.com>
2017-01-20 13:05:53 -08:00
Patrick Tjin 415ccacc9e Merge branch 'android-msm-bullhead-3.10-nyc-mr2' into android-msm-bullhead-3.10
March 2017.1

Bug: 34128678
2017-01-18 15:25:56 -08:00
Lorenzo Colitti f97a38bc9e bullhead: config: enable CONFIG_CRYPTO_SHA512 and savedefconfig
Bug: 34114242
Change-Id: I53a121503bc9c3b43f004a612efa92068ea3fc4f
2017-01-16 17:09:25 +09:00
Max Bires fcb795b0fe arm64: bullhead_defconfig: Unsetting DEVPORT from bullhead configurations.
/dev/port is not used as shown by previous SELinux policy changes to
blacklist it. It is being unset to reduce kernel size and attack
surface, mirroring the same chagnes done to DEVMEM and DEVKMEM.

Change-Id: I0a9e26fa3f234fd94a6d0521bc7edc7d5a84f4f6
Signed-off-by: Max Bires <jbires@google.com>
Bug: 33301618
2017-01-06 18:37:39 +00:00
Patrick Tjin 488f3bdc24 Merge branch android-msm-bullhead-3.10-nyc-mr2 into android-msm-bullhead-3.10 2016-12-13 19:07:41 -08:00
John Dias 2431031b32 arm64/configs: enable RCU_BOOST
Enable RCU boost to avoid risk of priority-inversion
and memory leaks when readers are preempted.

Bug: 32633926
Change-Id: I6de9b83c7f83646955e229a94078ae49c5962bc2
Signed-off-by: John Dias <joaodias@google.com>
2016-12-09 22:25:01 +00:00
Patrick Tjin 8d8b94b6b0 arm64/configs: bullhead: add CONFIG_QUOTA
Bug: 28032718
Change-Id: Idcdb529881df5c30a0de0db6f0865bc554bf5f71
2016-11-15 22:52:44 +00:00
Martijn Coenen 02111bc358 arm64/configs: bullhead: enable hwbinder domain.
Change-Id: I990b10784b02d23c92b56c577cf1ba81cfac6f78
Signed-off-by: Martijn Coenen <maco@android.com>
2016-11-02 13:31:22 +01:00
Patrick Tjin 01872d075e Merge branch android-msm-bullhead-3.10-security-next into android-msm-bullhead-3.10
December 2016.1
2016-10-21 15:59:23 -07:00
James Hogan fea24c07f6 MIPS: KVM: Check for pfn noslot case
commit ba913e4f72fc9cfd03dad968dfb110eb49211d80 upstream.

When mapping a page into the guest we error check using is_error_pfn(),
however this doesn't detect a value of KVM_PFN_NOSLOT, indicating an
error HVA for the page. This can only happen on MIPS right now due to
unusual memslot management (e.g. being moved / removed / resized), or
with an Enhanced Virtual Memory (EVA) configuration where the default
KVM_HVA_ERR_* and kvm_is_error_hva() definitions are unsuitable (fixed
in a later patch). This case will be treated as a pfn of zero, mapping
the first page of physical memory into the guest.

It would appear the MIPS KVM port wasn't updated prior to being merged
(in v3.10) to take commit 81c52c56e2 ("KVM: do not treat noslot pfn as
a error pfn") into account (merged v3.8), which converted a bunch of
is_error_pfn() calls to is_error_noslot_pfn(). Switch to using
is_error_noslot_pfn() instead to catch this case properly.

Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: Backport to v3.16.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-10-20 00:46:31 +02:00
Willy Tarreau baa032f71a Revert "powerpc/tm: Always reclaim in start_thread() for exec() class syscalls"
This reverts commit 8110080dc5.

Guenter noticed that this breaks PPC build when CONFIG_PPC_TRANSACTIONAL_MEM
is set, because this patch was not for 3.10.

Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-10-20 00:46:30 +02:00
Mark Rutland 5667cac21f UPSTREAM: arm64: make sys_call_table const
As with x86, mark the sys_call_table const such that it will be placed
in the .rodata section. This will cause attempts to modify the table
(accidental or deliberate) to fail when strict page permissions are in
place. In the absence of strict page permissions, there should be no
functional change.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Bug: 31660652

Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
(cherry picked from commit c623b33b4e9599c6ac5076f7db7369eb9869aa04)
Change-Id: I6b39438617578e5714543c8987d9567e7a64a174
2016-10-17 23:14:25 -07:00
Wei Wang 478382f187 arm64/configs: bullhead: remove kernel logger
Bug: 31941628
Change-Id: I9193319a38fea170844a2adb66f93e24e862f9d9
2016-10-04 16:33:42 -07:00
Patrick Tjin 8be927c77c arm64/configs: bullhead: disable extra SCSI configs
Bug: 30951599
Change-Id: Ie76dced13ceaba8bd30574e6d16fcf802322cc6e
Signed-off-by: Patrick Tjin <pattjin@google.com>
2016-09-17 08:20:28 +00:00
James Hogan 256dc4c87a metag: Fix __cmpxchg_u32 asm constraint for CMP
commit 6154c187b97ee7513046bb4eb317a89f738f13ef upstream.

The LNKGET based atomic sequence in __cmpxchg_u32 has slightly incorrect
constraints for the return value which under certain circumstances can
allow an address unit register to be used as the first operand of a CMP
instruction. This isn't a valid instruction however as the encodings
only allow a data unit to be specified. This would result in an
assembler error like the following:

  Error: failed to assemble instruction: "CMP A0.2,D0Ar6"

Fix by changing the constraint from "=&da" (assigned, early clobbered,
data or address unit register) to "=&d" (data unit register only).

The constraint for the second operand, "bd" (an op2 register where op1
is a data unit register and the instruction supports O2R) is already
correct assuming the first operand is a data unit register.

Other cases of CMP in inline asm have had their constraints checked, and
appear to all be fine.

Fixes: 6006c0d8ce ("metag: Atomics, locks and bitops")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.9.x-
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:38 +02:00
David Howells f7f15c543b KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
commit 20f06ed9f61a185c6dabd662c310bed6189470df upstream.

MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl.  The latter will work in a lot of cases, thereby hiding
the issue.

Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13832/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:34 +02:00
Andy Lutomirski d5a5d0bcb3 x86/mm: Improve switch_mm() barrier comments
commit 4eaffdd5a5fe6ff9f95e1ab4de1ac904d5e0fa8b upstream.

My previous comments were still a bit confusing and there was a
typo. Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:33 +02:00
Vineet Gupta 46d597e212 ARC: use ASL assembler mnemonic
commit a6416f57ce57fb390b6ee30b12c01c29032a26af upstream.

ARCompact and ARCv2 only have ASL, while binutils used to support LSL as
a alias mnemonic.

Newer binutils (upstream) don't want to do that so replace it.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:31 +02:00
Alexey Brodkin 6b03918b1d arc: unwind: warn only once if DW2_UNWIND is disabled
commit 9bd54517ee86cb164c734f72ea95aeba4804f10b upstream.

If CONFIG_ARC_DW2_UNWIND is disabled every time arc_unwind_core()
gets called following message gets printed in debug console:
----------------->8---------------
CONFIG_ARC_DW2_UNWIND needs to be enabled
----------------->8---------------

That message makes sense if user indeed wants to see a backtrace or
get nice function call-graphs in perf but what if user disabled
unwinder for the purpose? Why pollute his debug console?

So instead we'll warn user about possibly missing feature once and
let him decide if that was what he or she really wanted.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:30 +02:00
Jan Willeke 884c6beca3 s390/seccomp: fix error return for filtered system calls
commit dc295880c6752076f8b94ba3885d0bfff09e3e82 upstream.

The syscall_set_return_value function of s390 negates the error argument
before storing the value to the return register gpr2. This is incorrect,
the seccomp code already passes the negative error value.
Store the unmodified error value to gpr2.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:30 +02:00
Borislav Petkov 583e3ee15b x86/amd_nb: Fix boot crash on non-AMD systems
commit 1ead852dd88779eda12cb09cc894a03d9abfe1ec upstream.

Fix boot crash that triggers if this driver is built into a kernel and
run on non-AMD systems.

AMD northbridges users call amd_cache_northbridges() and it returns
a negative value to signal that we weren't able to cache/detect any
northbridges on the system.

At least, it should do so as all its callers expect it to do so. But it
does return a negative value only when kmalloc() fails.

Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
users like amd64_edac, for example, which relies on it to know whether
it should load or not, gets loaded on systems like Intel Xeons where it
shouldn't.

Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:23 +02:00
Masami Hiramatsu b92e992e59 kprobes/x86: Clear TF bit in fault on single-stepping
commit dcfc47248d3f7d28df6f531e6426b933de94370d upstream.

Fix kprobe_fault_handler() to clear the TF (trap flag) bit of
the flags register in the case of a fault fixup on single-stepping.

If we put a kprobe on the instruction which caused a
page fault (e.g. actual mov instructions in copy_user_*),
that fault happens on the single-stepping buffer. In this
case, kprobes resets running instance so that the CPU can
retry execution on the original ip address.

However, current code forgets to reset the TF bit. Since this
fault happens with TF bit set for enabling single-stepping,
when it retries, it causes a debug exception and kprobes
can not handle it because it already reset itself.

On the most of x86-64 platform, it can be easily reproduced
by using kprobe tracer. E.g.

  # cd /sys/kernel/debug/tracing
  # echo p copy_user_enhanced_fast_string+5 > kprobe_events
  # echo 1 > events/kprobes/enable

And you'll see a kernel panic on do_debug(), since the debug
trap is not handled by kprobes.

To fix this problem, we just need to clear the TF bit when
resetting running kprobe.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: systemtap@sourceware.org
Cc: stable@vger.kernel.org # All the way back to ancient kernels
Link: http://lkml.kernel.org/r/20160611140648.25885.37482.stgit@devbox
[ Updated the comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:22 +02:00
H. Peter Anvin 970e17c2f8 x86, build: copy ldlinux.c32 to image.iso
commit 9c77679cadb118c0aa99e6f88533d91765a131ba upstream.

For newer versions of Syslinux, we need ldlinux.c32 in addition to
isolinux.bin to reside on the boot disk, so if the latter is found,
copy it, too, to the isoimage tree.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Stable Tree <stable@vger.kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-27 11:40:22 +02:00
Helge Deller fcf55035f9 parisc: Fix pagefault crash in unaligned __get_user() call
commit 8b78f260887df532da529f225c49195d18fef36b upstream.

One of the debian buildd servers had this crash in the syslog without
any other information:

 Unaligned handler failed, ret = -2
 clock_adjtime (pid 22578): Unaligned data reference (code 28)
 CPU: 1 PID: 22578 Comm: clock_adjtime Tainted: G  E  4.5.0-2-parisc64-smp #1 Debian 4.5.4-1
 task: 000000007d9960f8 ti: 00000001bde7c000 task.ti: 00000001bde7c000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
 PSW: 00001000000001001111100000001111 Tainted: G            E
 r00-03  000000ff0804f80f 00000001bde7c2b0 00000000402d2be8 00000001bde7c2b0
 r04-07  00000000409e1fd0 00000000fa6f7fff 00000001bde7c148 00000000fa6f7fff
 r08-11  0000000000000000 00000000ffffffff 00000000fac9bb7b 000000000002b4d4
 r12-15  000000000015241c 000000000015242c 000000000000002d 00000000fac9bb7b
 r16-19  0000000000028800 0000000000000001 0000000000000070 00000001bde7c218
 r20-23  0000000000000000 00000001bde7c210 0000000000000002 0000000000000000
 r24-27  0000000000000000 0000000000000000 00000001bde7c148 00000000409e1fd0
 r28-31  0000000000000001 00000001bde7c320 00000001bde7c350 00000001bde7c218
 sr00-03  0000000001200000 0000000001200000 0000000000000000 0000000001200000
 sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402d2e84 00000000402d2e88
  IIR: 0ca0d089    ISR: 0000000001200000  IOR: 00000000fa6f7fff
  CPU:        1   CR30: 00000001bde7c000 CR31: ffffffffffffffff
  ORIG_R28: 00000002369fe628
  IAOQ[0]: compat_get_timex+0x2dc/0x3c0
  IAOQ[1]: compat_get_timex+0x2e0/0x3c0
  RP(r2): compat_get_timex+0x40/0x3c0
 Backtrace:
  [<00000000402d4608>] compat_SyS_clock_adjtime+0x40/0xc0
  [<0000000040205024>] syscall_exit+0x0/0x14

This means the userspace program clock_adjtime called the clock_adjtime()
syscall and then crashed inside the compat_get_timex() function.
Syscalls should never crash programs, but instead return EFAULT.

The IIR register contains the executed instruction, which disassebles
into "ldw 0(sr3,r5),r9".
This load-word instruction is part of __get_user() which tried to read the word
at %r5/IOR (0xfa6f7fff). This means the unaligned handler jumped in.  The
unaligned handler is able to emulate all ldw instructions, but it fails if it
fails to read the source e.g. because of page fault.

The following program reproduces the problem:

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/mman.h>

int main(void) {
        /* allocate 8k */
        char *ptr = mmap(NULL, 2*4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        /* free second half (upper 4k) and make it invalid. */
        munmap(ptr+4096, 4096);
        /* syscall where first int is unaligned and clobbers into invalid memory region */
        /* syscall should return EFAULT */
        return syscall(__NR_clock_adjtime, 0, ptr+4095);
}

To fix this issue we simply need to check if the faulting instruction address
is in the exception fixup table when the unaligned handler failed. If it
is, call the fixup routine instead of crashing.

While looking at the unaligned handler I found another issue as well: The
target register should not be modified if the handler was unsuccessful.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-22 07:31:40 +02:00
Dave Weinstein d948109df1 arm: oabi compat: add missing access checks
commit 7de249964f5578e67b99699c5f0b405738d820a2 upstream.

Add access checks to sys_oabi_epoll_wait() and sys_oabi_semtimedop().
This fixes CVE-2016-3857, a local privilege escalation under
CONFIG_OABI_COMPAT.

Cc: stable@vger.kernel.org
Reported-by: Chiachih Wu <wuchiachih@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dave Weinstein <olorin@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-22 07:31:40 +02:00
Russell King 951b391d26 ARM: fix PTRACE_SETVFPREGS on SMP systems
commit e2dfb4b880146bfd4b6aa8e138c0205407cebbaf upstream.

PTRACE_SETVFPREGS fails to properly mark the VFP register set to be
reloaded, because it undoes one of the effects of vfp_flush_hwstate().

Specifically vfp_flush_hwstate() sets thread->vfpstate.hard.cpu to
an invalid CPU number, but vfp_set() overwrites this with the original
CPU number, thereby rendering the hardware state as apparently "valid",
even though the software state is more recent.

Fix this by reverting the previous change.

Cc: <stable@vger.kernel.org>
Fixes: 8130b9d7b9 ("ARM: 7308/1: vfp: flush thread hwstate before copying ptrace registers")
Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Simon Marchi <simon.marchi@ericsson.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-22 07:31:39 +02:00
Paolo Bonzini 42bc57b6a7 KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
commit d14bdb553f9196169f003058ae1cdabe514470e6 upstream.

MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
time, and the next KVM_RUN oopses:

   general protection fault: 0000 [#1] SMP
   CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
   Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
   [...]
   Call Trace:
    [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
    [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
    [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
    [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
    [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
   RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
    RSP <ffff88005836bd50>

Testcase (beautified/reduced from syzkaller output):

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[8];

    int main()
    {
        struct kvm_debugregs dr = { 0 };

        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);

        memcpy(&dr,
               "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
               "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
               "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
               "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
               48);
        r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
        r[6] = ioctl(r[4], KVM_RUN, 0);
    }

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-22 07:31:39 +02:00
Cyril Bur 8110080dc5 powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
commit 8e96a87c5431c256feb65bcfc5aec92d9f7839b6 upstream.

Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.

Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.

Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.

This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()

  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700   Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 00000000  XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000]           (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b

  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G      D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700   Tainted: G      D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]>  CR: 28002828  XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020

This fixes CVE-2016-5828.

Fixes: bc2a9408fa ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:45 +02:00
Gavin Shan cc80e5c914 powerpc/pseries: Fix PCI config address for DDW
commit 8a934efe94347eee843aeea65bdec8077a79e259 upstream.

In commit 8445a87f7092 "powerpc/iommu: Remove the dependency on EEH
struct in DDW mechanism", the PE address was replaced with the PCI
config address in order to remove dependency on EEH. According to PAPR
spec, firmware (pHyp or QEMU) should accept "xxBBSSxx" format PCI config
address, not "xxxxBBSS" provided by the patch. Note that "BB" is PCI bus
number and "SS" is the combination of slot and function number.

This fixes the PCI address passed to DDW RTAS calls.

Fixes: 8445a87f7092 ("powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism")
Cc: stable@vger.kernel.org # v3.4+
Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:45 +02:00
Guilherme G. Piccoli b738ed81a8 powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
commit 8445a87f7092bc8336ea1305be9306f26b846d93 upstream.

Commit 39baadbf36 ("powerpc/eeh: Remove eeh information from pci_dn")
changed the pci_dn struct by removing its EEH-related members.
As part of this clean-up, DDW mechanism was modified to read the device
configuration address from eeh_dev struct.

As a consequence, now if we disable EEH mechanism on kernel command-line
for example, the DDW mechanism will fail, generating a kernel oops by
dereferencing a NULL pointer (which turns to be the eeh_dev pointer).

This patch just changes the configuration address calculation on DDW
functions to a manual calculation based on pci_dn members instead of
using eeh_dev-based address.

No functional changes were made. This was tested on pSeries, both
in PHyp and qemu guest.

Fixes: 39baadbf36 ("powerpc/eeh: Remove eeh information from pci_dn")
Cc: stable@vger.kernel.org # v3.4+
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:45 +02:00
Russell Currey 628cb1780b powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
commit 871e178e0f2c4fa788f694721a10b4758d494ce1 upstream.

In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the
spec states that values of 9900-9905 can be returned, indicating that
software should delay for 10^x (where x is the last digit, i.e. 990x)
milliseconds and attempt the call again. Currently, the kernel doesn't
know about this, and respecting it fixes some PCI failures when the
hypervisor is busy.

The delay is capped at 0.2 seconds.

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:45 +02:00
Thomas Huth 0d330892ab powerpc: Use privileged SPR number for MMCR2
commit 8dd75ccb571f3c92c48014b3dabd3d51a115ab41 upstream.

We are already using the privileged versions of MMCR0, MMCR1
and MMCRA in the kernel, so for MMCR2, we should better use
the privileged versions, too, to be consistent.

Fixes: 240686c136 ("powerpc: Initialise PMU related regs on Power8")
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:44 +02:00
Thomas Huth aeadb93a16 powerpc: Fix definition of SIAR and SDAR registers
commit d23fac2b27d94aeb7b65536a50d32bfdc21fe01e upstream.

The SIAR and SDAR registers are available twice, one time as SPRs
780 / 781 (unprivileged, but read-only), and one time as the SPRs
796 / 797 (privileged, but read and write). The Linux kernel code
currently uses the unprivileged  SPRs - while this is OK for reading,
writing to that register of course does not work.
Since the KVM code tries to write to this register, too (see the mtspr
in book3s_hv_rmhandlers.S), the contents of this register sometimes get
lost for the guests, e.g. during migration of a VM.
To fix this issue, simply switch to the privileged SPR numbers instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:44 +02:00
Hari Bathini be925e7ad2 powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel
commit 8ed8ab40047a570fdd8043a40c104a57248dd3fd upstream.

Some of the interrupt vectors on 64-bit POWER server processors are only
32 bytes long (8 instructions), which is not enough for the full
first-level interrupt handler. For these we need to branch to an
out-of-line (OOL) handler. But when we are running a relocatable kernel,
interrupt vectors till __end_interrupts marker are copied down to real
address 0x100. So, branching to labels (ie. OOL handlers) outside this
section must be handled differently (see LOAD_HANDLER()), considering
relocatable kernel, which would need at least 4 instructions.

However, branching from interrupt vector means that we corrupt the
CFAR (come-from address register) on POWER7 and later processors as
mentioned in commit 1707dd16. So, EXCEPTION_PROLOG_0 (6 instructions)
that contains the part up to the point where the CFAR is saved in the
PACA should be part of the short interrupt vectors before we branch out
to OOL handlers.

But as mentioned already, there are interrupt vectors on 64-bit POWER
server processors that are only 32 bytes long (like vectors 0x4f00,
0x4f20, etc.), which cannot accomodate the above two cases at the same
time owing to space constraint. Currently, in these interrupt vectors,
we simply branch out to OOL handlers, without using LOAD_HANDLER(),
which leaves us vulnerable when running a relocatable kernel (eg. kdump
case). While this has been the case for sometime now and kdump is used
widely, we were fortunate not to see any problems so far, for three
reasons:

  1. In almost all cases, production kernel (relocatable) is used for
     kdump as well, which would mean that crashed kernel's OOL handler
     would be at the same place where we end up branching to, from short
     interrupt vector of kdump kernel.
  2. Also, OOL handler was unlikely the reason for crash in almost all
     the kdump scenarios, which meant we had a sane OOL handler from
     crashed kernel that we branched to.
  3. On most 64-bit POWER server processors, page size is large enough
     that marking interrupt vector code as executable (see commit
     429d2e83) leads to marking OOL handler code from crashed kernel,
     that sits right below interrupt vector code from kdump kernel, as
     executable as well.

Let us fix this by moving the __end_interrupts marker down past OOL
handlers to make sure that we also copy OOL handlers to real address
0x100 when running a relocatable kernel.

This fix has been tested successfully in kdump scenario, on an LPAR with
4K page size by using different default/production kernel and kdump
kernel.

Also tested by manually corrupting the OOL handlers in the first kernel
and then kdump'ing, and then causing the OOL handlers to fire - mpe.

Fixes: c1fb6816fb ("powerpc: Add relocation on exception vector handlers")
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:44 +02:00
James Hogan 9338ba336a MIPS: KVM: Fix modular KVM under QEMU
commit 797179bc4fe06c89e47a9f36f886f68640b423f8 upstream.

Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
get a TLB refill exception in it when KVM is built as a module.

This was observed to happen with the host MIPS kernel running under
QEMU, due to a not entirely transparent optimisation in the QEMU TLB
handling where TLB entries replaced with TLBWR are copied to a separate
part of the TLB array. Code in those pages continue to be executable,
but those mappings persist only until the next ASID switch, even if they
are marked global.

An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
switching to the guest exception base. Subsequent TLB mapped kernel
instructions just prior to switching to the guest trigger a TLB refill
exception, which enters the guest exception handlers without updating
EPC. This appears as a guest triggered TLB refill on a host kernel
mapped (host KSeg2) address, which is not handled correctly as user
(guest) mode accesses to kernel (host) segments always generate address
error exceptions.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.10.x-
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[james.hogan@imgtec.com: backported for stable 3.14]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:41 +02:00
Ralf Baechle ce7222fcf1 MIPS: Fix 64k page support for 32 bit kernels.
commit d7de413475f443957a0c1d256e405d19b3a2cb22 upstream.

TASK_SIZE was defined as 0x7fff8000UL which for 64k pages is not a
multiple of the page size.  Somewhere further down the math fails
such that executing an ELF binary fails.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Tested-by: Joshua Henderson <joshua.henderson@microchip.com>
Cc: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:41 +02:00
Matthias Schiffer 13f004c0ac MIPS: ath79: make bootconsole wait for both THRE and TEMT
commit f5b556c94c8490d42fea79d7b4ae0ecbc291e69d upstream.

This makes the ath79 bootconsole behave the same way as the generic 8250
bootconsole.

Also waiting for TEMT (transmit buffer is empty) instead of just THRE
(transmit buffer is not full) ensures that all characters have been
transmitted before the real serial driver starts reconfiguring the serial
controller (which would sometimes result in garbage being transmitted.)
This change does not cause a visible performance loss.

In addition, this seems to fix a hang observed in certain configurations on
many AR7xxx/AR9xxx SoCs during autoconfig of the real serial driver.

A more complete follow-up patch will disable 8250 autoconfig for ath79
altogether (the serial controller is detected as a 16550A, which is not
fully compatible with the ath79 serial, and the autoconfig may lead to
undefined behavior on ath79.)

Cc: <stable@vger.kernel.org>
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:41 +02:00
James Hogan f8141132c5 MIPS: Fix siginfo.h to use strict posix types
commit 5daebc477da4dfeb31ae193d83084def58fd2697 upstream.

Commit 85efde6f4e ("make exported headers use strict posix types")
changed the asm-generic siginfo.h to use the __kernel_* types, and
commit 3a471cbc08 ("remove __KERNEL_STRICT_NAMES") make the internal
types accessible only to the kernel, but the MIPS implementation hasn't
been updated to match.

Switch to proper types now so that the exported asm/siginfo.h won't
produce quite so many compiler errors when included alone by a user
program.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Christopher Ferris <cferris@google.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 2.6.30-
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/12477/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:40 +02:00
Paul Burton 2c767daeba MIPS: math-emu: Fix jalr emulation when rd == $0
commit ab4a92e66741b35ca12f8497896bafbe579c28a1 upstream.

When emulating a jalr instruction with rd == $0, the code in
isBranchInstr was incorrectly writing to GPR $0 which should actually
always remain zeroed. This would lead to any further instructions
emulated which use $0 operating on a bogus value until the task is next
context switched, at which point the value of $0 in the task context
would be restored to the correct zero by a store in SAVE_SOME. Fix this
by not writing to rd if it is $0.

Fixes: 102cedc32a ("MIPS: microMIPS: Floating point support.")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Maciej W. Rozycki <macro@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13160/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:40 +02:00
James Hogan 5928125348 MIPS: KVM: Propagate kseg0/mapped tlb fault errors
commit 9b731bcfdec4c159ad2e4312e25d69221709b96a upstream.

Propagate errors from kvm_mips_handle_kseg0_tlb_fault() and
kvm_mips_handle_mapped_seg_tlb_fault(), usually triggering an internal
error since they normally indicate the guest accessed bad physical
memory or the commpage in an unexpected way.

Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:40 +02:00
James Hogan 8cee00e93c MIPS: KVM: Fix gfn range check in kseg0 tlb faults
commit 0741f52d1b980dbeb290afe67d88fc2928edd8ab upstream.

Two consecutive gfns are loaded into host TLB, so ensure the range check
isn't off by one if guest_pmap_npages is odd.

Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:39 +02:00
James Hogan 2336de8d71 MIPS: KVM: Add missing gfn range check
commit 8985d50382359e5bf118fdbefc859d0dbf6cebc7 upstream.

kvm_mips_handle_mapped_seg_tlb_fault() calculates the guest frame number
based on the guest TLB EntryLo values, however it is not range checked
to ensure it lies within the guest_pmap. If the physical memory the
guest refers to is out of range then dump the guest TLB and emit an
internal error.

Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:39 +02:00
James Hogan 828e4e161f MIPS: KVM: Fix mapped fault broken commpage handling
commit c604cffa93478f8888bec62b23d6073dad03d43a upstream.

kvm_mips_handle_mapped_seg_tlb_fault() appears to map the guest page at
virtual address 0 to PFN 0 if the guest has created its own mapping
there. The intention is unclear, but it may have been an attempt to
protect the zero page from being mapped to anything but the comm page in
code paths you wouldn't expect from genuine commpage accesses (guest
kernel mode cache instructions on that address, hitting trapping
instructions when executing from that address with a coincidental TLB
eviction during the KVM handling, and guest user mode accesses to that
address).

Fix this to check for mappings exactly at KVM_GUEST_COMMPAGE_ADDR (it
may not be at address 0 since commit 42aa12e74e91 ("MIPS: KVM: Move
commpage so 0x0 is unmapped")), and set the corresponding EntryLo to be
interpreted as 0 (invalid).

Fixes: 858dd5d457 ("KVM/MIPS32: MMU/TLB operations for the Guest.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[james.hogan@imgtec.com: Backport to v3.10.y - v3.15.y]
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:39 +02:00
Andy Lutomirski b1b4becf71 x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
commit 71b3c126e61177eb693423f2e18a1914205b165e upstream.

When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.

CVE-2016-2069

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ luis: backported to 3.16:
  - dropped N/A comment in flush_tlb_mm_range()
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ciwillia@brocade.com: backported to 3.10: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:36 +02:00
Andrey Ryabinin 55e1f395ce perf/x86: Fix undefined shift on 32-bit kernels
commit 6d6f2833bfbf296101f9f085e10488aef2601ba5 upstream.

Jim reported:

	UBSAN: Undefined behaviour in arch/x86/events/intel/core.c:3708:12
	shift exponent 35 is too large for 32-bit type 'long unsigned int'

The use of 'unsigned long' type obviously is not correct here, make it
'unsigned long long' instead.

Reported-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Imre Palik <imrep@amazon.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: 2c33645d366d ("perf/x86: Honor the architectural performance monitoring version")
Link: http://lkml.kernel.org/r/1462974711-10037-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Christopher <kevinc@vmware.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:32 +02:00
Palik, Imre ad0fd1a5f8 perf/x86: Honor the architectural performance monitoring version
commit 2c33645d366d13b969d936b68b9f4875b1fdddea upstream.

Architectural performance monitoring, version 1, doesn't support fixed counters.

Currently, even if a hypervisor advertises support for architectural
performance monitoring version 1, perf may still try to use the fixed
counters, as the constraints are set up based on the CPU model.

This patch ensures that perf honors the architectural performance monitoring
version returned by CPUID, and it only uses the fixed counters for version 2
and above.

(Some of the ideas in this patch came from Peter Zijlstra.)

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1433767609-1039-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[wt: FIXED_EVENT_FLAGS was X86_RAW_EVENT_MASK in 3.10]
Cc: Kevin Christopher <kevinc@vmware.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 23:22:32 +02:00
Andi Kleen 2d2bec8f2f x86, asmlinkage, apm: Make APM data structure used from assembler visible
commit 54c2f3fdb941204cad136024c7b854b7ad112ab6 upstream.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-12-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-08-21 17:26:12 +02:00
Marek Szyprowski 6a40b4a821 arm64: dma-mapping: always clear allocated buffers
[ Upstream commit 6829e274a623187c24f7cfc0e3d35f25d087fcc5 ]

Buffers allocated by dma_alloc_coherent() are always zeroed on Alpha,
ARM (32bit), MIPS, PowerPC, x86/x86_64 and probably other architectures.
It turned out that some drivers rely on this 'feature'. Allocated buffer
might be also exposed to userspace with dma_mmap() call, so clearing it
is desired from security point of view to avoid exposing random memory
to userspace. This patch unifies dma_alloc_coherent() behavior on ARM64
architecture with other implementations by unconditionally zeroing
allocated buffer.

Bug: 29795245
CRs-Fixed: 1041735
Change-Id: I74bf024e0f603ca8c0b05430dc2ee154d579cfb2
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Git-commit: a142e9641dcbead2c8845c949ad518acac96ed28
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[lmark@codeaurora.org: resolve merge conflicts]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-08-19 23:39:16 +00:00
Suzuki K. Poulose e9b2df99bb UPSTREAM: arm64: perf: reject groups spanning multiple HW PMUs
The perf core implicitly rejects events spanning multiple HW PMUs, as in
these cases the event->ctx will differ. However this validation is
performed after pmu::event_init() is called in perf_init_event(), and
thus pmu::event_init() may be called with a group leader from a
different HW PMU.

The ARM64 PMU driver does not take this fact into account, and when
validating groups assumes that it can call to_arm_pmu(event->pmu) for
any HW event. When the event in question is from another HW PMU this is
wrong, and results in dereferencing garbage.

This patch updates the ARM64 PMU driver to first test for and reject
events from other PMUs, moving the to_arm_pmu and related logic after
this test. Fixes a crash triggered by perf_fuzzer on Linux-4.0-rc2, with
a CCI PMU present:

Bad mode in Synchronous Abort handler detected, code 0x86000006 -- IABT (current EL)
CPU: 0 PID: 1371 Comm: perf_fuzzer Not tainted 3.19.0+ #249
Hardware name: V2F-1XV7 Cortex-A53x2 SMM (DT)
task: ffffffc07c73a280 ti: ffffffc07b0a0000 task.ti: ffffffc07b0a0000
PC is at 0x0
LR is at validate_event+0x90/0xa8
pc : [<0000000000000000>] lr : [<ffffffc000090228>] pstate: 00000145
sp : ffffffc07b0a3ba0

[<          (null)>]           (null)
[<ffffffc0000907d8>] armpmu_event_init+0x174/0x3cc
[<ffffffc00015d870>] perf_try_init_event+0x34/0x70
[<ffffffc000164094>] perf_init_event+0xe0/0x10c
[<ffffffc000164348>] perf_event_alloc+0x288/0x358
[<ffffffc000164c5c>] SyS_perf_event_open+0x464/0x98c
Code: bad PC value

Also cleans up the code to use the arm_pmu only when we know
that we are dealing with an arm pmu event.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Ziljstra (Intel) <peterz@infradead.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

(cherry picked from commit 8fff105e13041e49b82f92eef034f363a6b1c071)

Bug: 29508816
Change-Id: I6fa1860d561fbcdf88101eea319815eb4b8e3e29
2016-08-19 22:32:43 +00:00
Lorenzo Colitti d50137a63a bullhead: Add IPv6 rpfilter support.
Bug: 9580643
Bug: 30298058
Change-Id: Iaef6b863029da32db6284b340fd2ff162cf3e7ec
2016-08-02 06:26:19 +00:00
Patrick Tjin 4015feb8d9 Revert "arm64: Allow cache maintenance operations to trigger write faults"
This reverts commit 3fbe6bc28a.

Bug: 27265969
Change-Id: I4f855ba1f640247aad172b9ef4160a4d298370d4
Signed-off-by: Patrick Tjin <pattjin@google.com>
2016-07-29 14:28:08 -07:00
Zhiting Zhu 518ddfb9b0 Add branch instruction count
Simpleperf only shows branch misses count. Without branch
instruction count, branch miss ratio could not be calculated.

BUG: 29426791

Change-Id: Id1a371746e476507c5257481fd9c25a546048cb3
2016-07-15 19:10:23 +00:00
Rohit Kulkarni 43154bfee9 ARM: dts: msm: increase secure heap size for resolution change
Increase CMA secure heap size to support scenario of dynamic
resolution switching to 1080p resolution when older resolution
buffers may not be freed yet.

Bug: 29577623
Signed-off-by: Rohit Kulkarni <rkulkarn@codeaurora.org>
2016-06-28 11:09:25 -07:00
Mekala Natarajan 9ec5de0995 bullhead_defconfig: enable SECURITY_PERF_EVENTS_RESTRICT
Bug: 29119870
Change-Id: If2de39c04ebc542479da547196cd292de972cb03
Signed-off-by: Mekala Natarajan <mnatarajan@google.com>
2016-06-20 18:52:45 +00:00
Veena Sambasivan 1b490b29c7 perf: arm64: fix RCU usage on pmu resume from low-power
Commit ( perf: arm64: implement CPU_PM notifier)
added code in the arm perf infrastructure that allows the kernel to
save/restore perf counters whenever the CPU enters a low-power
state. The kernel saves/restores the counters for each active event
through the armpmu_{stop/start} ARM pmu API, so that the low-power state
enter/exit cycle is emulated through pmu start/stop operations for each
event in use.

However, calling armpmu_start() for each active event on power up
executes code that requires RCU locking (perf_event_update_userpage())
to be functional, so, given that the core may call the CPU_PM notifiers
while running the idle thread in an quiescent RCU state this is not
allowed as detected through the following splat when kernel is run with
CONFIG_PROVE_LOCKING enabled:

[   49.293286]
[   49.294761] ===============================
[   49.298895] [ INFO: suspicious RCU usage. ]
[   49.303031] 4.6.0-rc3+ #421 Not tainted
[   49.306821] -------------------------------
[   49.310956] include/linux/rcupdate.h:872 rcu_read_lock() used
illegally while idle!
[   49.318530]
[   49.318530] other info that might help us debug this:
[   49.318530]
[   49.326451]
[   49.326451] RCU used illegally from idle CPU!
[   49.326451] rcu_scheduler_active = 1, debug_locks = 0
[   49.337209] RCU used illegally from extended quiescent state!
[   49.342892] 2 locks held by swapper/2/0:
[   49.346768]  #0:  (cpu_pm_notifier_lock){......}, at:
[<ffffff8008163c28>] cpu_pm_exit+0x18/0x80
[   49.355492]  #1:  (rcu_read_lock){......}, at: [<ffffff800816dc38>]
perf_event_update_userpage+0x0/0x260

This patch wraps the armpmu_start() call (that indirectly calls
perf_event_update_userpage()) on CPU_PM notifier power state exit (or
failed entry) within the RCU_NONIDLE() macro so that the RCU subsystem
is made aware the calling cpu is not idle from an RCU perspective for
the armpmu_start() call duration, therefore fixing the issue.

Bug: 28086229
Fixes: da4e4f18afe0 ("drivers/perf: arm_pmu: implement CPU_PM notifier")
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reported-by: James Morse <james.morse@arm.com>
Suggested-by: Kevin Hilman <khilman@baylibre.com>
Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Git-commit: cbcc72e037b8a3eb1fad3c1ae22021df21c97a51
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[schikk@codeaurora.org: ported the change for 3.10 kernel]
Change-Id: I8376beb62a597b03806744df3aba26bd6deee6c2
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>

Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:34:54 -07:00
Veena Sambasivan 6cfbde2707 perf: arm64: implement CPU_PM notifier
When a CPU is suspended (either through suspend-to-RAM or CPUidle),
its PMU registers content can be lost, which means that counters
registers values that were initialized on power down entry have to be
reprogrammed on power-up to make sure the counters set-up is preserved
(ie on power-up registers take the reset values on Cold or Warm reset,
which can be architecturally UNKNOWN).

To guarantee seamless profiling conditions across a core power down
this patch adds a CPU PM notifier to ARM pmus, that upon CPU PM
entry/exit from low-power states saves/restores the pmu registers
set-up (by using the ARM perf API), so that the power-down/up cycle does
not affect the perf behaviour (apart from a black-out period between
power-up/down CPU PM notifications that is unavoidable).

Bug: 28086229
Cc: Will Deacon <will.deacon@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Git-commit: da4e4f18afe0f3729d68f3785c5802f786d36e34
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[schikk@codeaurora.org: ported the change for 3.10 kernel]
Change-Id: I249afadd85f96ec89989fa015090c484748abb17
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>

Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:33:41 -07:00
Veena Sambasivan 0075aae104 Revert "Perf: arm64: support hotplug and power collapse"
This reverts commit 19d3bf8ee8 .
("Perf: arm64: support hotplug and power collapse")

This change is being reverted so that it can be replaced
by equivalent functionality from upstream.

Bug: 28086229
Change-Id: Iffd577bb7d7070791749a79aabf3a07eaa202532
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:33:29 -07:00
Veena Sambasivan 4d6b3d3aba Revert "Perf: arm64: fix disable of pmu irq during hotplug"
This reverts commit fd64280eb5 .
("Perf: arm64: fix disable of pmu irq during hotplug")

This change is being reverted so that it can be replaced
by equivalent functionality from upstream.

Bug: 28086229
Change-Id: I94fb8218698d0bdda1c33b81ce16a0c0d3d326dd
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:33:25 -07:00
Veena Sambasivan 1f5ed55bee Revert "Perf: arm64: restore registers after reset"
This reverts commit d42a3dae16 .
("Perf: arm64: restore registers after reset")

This change is being reverted so that it can be replaced
by equivalent functionality from upstream.

Bug: 28086229
Change-Id: Ie217c0a655a961057badcd00732d96891d507e71
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:33:20 -07:00
Veena Sambasivan b6cc008e6d Revert "Perf: arm64: stop counters when going into hotplug"
This reverts commit ff90fe97c9 .
("Perf: arm64: stop counters when going into hotplug")

This change is being reverted so that it can be replaced
by equivalent functionality from upstream.

Bug: 28086229
Change-Id: Ib542f86b83617b8ea5e582f41e164cececb0e60f
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-06-13 15:33:15 -07:00
Patrick Tjin dd32500c4f Merge branch 'android-msm-bullhead-3.10-security-next' into android-msm-bullhead-3.10
Merge security-next into master @ 92459d1 for August 2016.1
2016-06-10 10:40:09 -07:00
Praveen Chavan e0736438b0 ARM: dts: msm: reduce secure heap sizes for support up to 1080p
Reduce CMA and QSEE secure heap size to support 1080p playback;
which was originally carved out for 4K.

Bug: 28920141
Signed-off-by: Praveen Chavan <pchavan@codeaurora.org>
2016-06-08 23:08:04 +00:00
Tony Lindgren d248f68a93 ARM: OMAP3: Fix booting with thumb2 kernel
commit d8a50941c91a68da202aaa96a3dacd471ea9c693 upstream.

We get a NULL pointer dereference on omap3 for thumb2 compiled kernels:

Internal error: Oops: 80000005 [#1] SMP THUMB2
...
[<c046497b>] (_raw_spin_unlock_irqrestore) from [<c0024375>]
(omap3_enter_idle_bm+0xc5/0x178)
[<c0024375>] (omap3_enter_idle_bm) from [<c0374e63>]
(cpuidle_enter_state+0x77/0x27c)
[<c0374e63>] (cpuidle_enter_state) from [<c00627f1>]
(cpu_startup_entry+0x155/0x23c)
[<c00627f1>] (cpu_startup_entry) from [<c06b9a47>]
(start_kernel+0x32f/0x338)
[<c06b9a47>] (start_kernel) from [<8000807f>] (0x8000807f)

The power management related assembly on omaps needs to interact with
ARM mode bootrom code, so we need to keep most of the related assembly
in ARM mode.

Turns out this error is because of missing ENDPROC for assembly code
as suggested by Stephen Boyd <sboyd@codeaurora.org>. Let's fix the
problem by adding ENDPROC in two places to sleep34xx.S.

Let's also remove the now duplicate custom code for mode switching.
This has been unnecessary since commit 6ebbf2ce437b ("ARM: convert
all "mov.* pc, reg" to "bx reg" for ARMv6+").

And let's also remove the comments about local variables, they are
now just confusing after the ENDPROC.

The reason why ENDPROC makes a difference is it sets .type and then
the compiler knows what to do with the thumb bit as explained at:

https://wiki.ubuntu.com/ARM/Thumb2PortingHowto

Reported-by: Kevin Hilman <khilman@kernel.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:54 +02:00
Anton Blanchard 73dd3ac10b powerpc: scan_features() updates incorrect bits for REAL_LE
commit 6997e57d693b07289694239e52a10d2f02c3a46f upstream.

The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
feature value, meaning all the remaining elements initialise the wrong
values.

This means instead of checking for byte 5, bit 0, we check for byte 0,
bit 0, and then we incorrectly set the CPU feature bit as well as MMU
feature bit 1 and CPU user feature bits 0 and 2 (5).

Checking byte 0 bit 0 (IBM numbering), means we're looking at the
"Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
In practice that bit is set on all platforms which have the property.

This means we set CPU_FTR_REAL_LE always. In practice that seems not to
matter because all the modern cpus which have this property also
implement REAL_LE, and we've never needed to disable it.

We're also incorrectly setting MMU feature bit 1, which is:

  #define MMU_FTR_TYPE_8xx		0x00000002

Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
code, which can't run on the same cpus as scan_features(). So this also
doesn't matter in practice.

Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
is not currently used, and bit 0 is:

  #define PPC_FEATURE_PPC_LE		0x00000001

Which says the CPU supports the old style "PPC Little Endian" mode.
Again this should be harmless in practice as no 64-bit CPUs implement
that mode.

Fix the code by adding the missing initialisation of the MMU feature.

Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
would be unsafe to start using it as old kernels incorrectly set it.

Fixes: 44ae3ab335 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: Flesh out changelog, add comment reserving 0x4]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:53 +02:00
Sascha Hauer 7b640feea9 ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
commit 5616f36713ea77f57ae908bf2fef641364403c9f upstream.

The secondary CPU starts up in ARM mode. When the kernel is compiled in
thumb2 mode we have to explicitly compile the secondary startup
trampoline in ARM mode, otherwise the CPU will go to Nirvana.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:52 +02:00
Pali Rohr 26898db604 ARM: OMAP3: Add cpuidle parameters table for omap3430
commit 98f42221501353067251fbf11e732707dbb68ce3 upstream.

Based on CPU type choose generic omap3 or omap3430 specific cpuidle
parameters. Parameters for omap3430 were measured on Nokia N900 device and
added by commit 5a1b1d3a9e ("OMAP3: RX-51: Pass cpu idle parameters")
which were later removed by commit 231900afba ("ARM: OMAP3: cpuidle -
remove rx51 cpuidle parameters table") due to huge code complexity.

This patch brings cpuidle parameters for omap3430 devices again, but uses
simple condition based on CPU type.

Fixes: 231900afba ("ARM: OMAP3: cpuidle - remove rx51 cpuidle
parameters table")
Signed-off-by: Pali Rohr <pali.rohar@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:52 +02:00
Helge Deller 247ed0d395 parisc: Fix kernel crash with reversed copy_from_user()
commit ef72f3110d8b19f4c098a0bff7ed7d11945e70c6 upstream.

The kernel module testcase (lib/test_user_copy.c) exhibited a kernel
crash on parisc if the parameters for copy_from_user were reversed
("illegal reversed copy_to_user" testcase).

Fix this potential crash by checking the fault handler if the faulting
address is in the exception table.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:48 +02:00
Helge Deller ac6a8eb7a7 parisc: Avoid function pointers for kernel exception routines
commit e3893027a300927049efc1572f852201eb785142 upstream.

We want to avoid the kernel module loader to create function pointers
for the kernel fixup routines of get_user() and put_user(). Changing
the external reference from function type to int type fixes this.

This unbreaks exception handling for get_user() and put_user() when
called from a kernel module.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:48 +02:00
Andi Kleen 750fc132a8 perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere
commit e17dc65328057c00db7e1bfea249c8771a78b30b upstream.

Jiri reported some time ago that some entries in the PEBS data source table
in perf do not agree with the SDM. We investigated and the bits
changed for Sandy Bridge, but the SDM was not updated.

perf already implements the bits correctly for Sandy Bridge
and later. This patch patches it up for Nehalem and Westmere.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1456871124-15985-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:48 +02:00
Max Filippov c716151e12 xtensa: clear all DBREAKC registers on start
commit 7de7ac785ae18a2cdc78d7560f48e3213d9ea0ab upstream.

There are XCHAL_NUM_DBREAK registers, clear them all.
This also fixes cryptic assembler error message with binutils 2.25 when
XCHAL_NUM_DBREAK is 0:

  as: out of memory allocating 18446744073709551575 bytes after a total
  of 495616 bytes

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:47 +02:00
Max Filippov fd5924d8b3 xtensa: ISS: don't hang if stdin EOF is reached
commit 362014c8d9d51d504c167c44ac280169457732be upstream.

Simulator stdin may be connected to a file, when its end is reached
kernel hangs in infinite loop inside rs_poll, because simc_poll always
signals that descriptor 0 is readable and simc_read always returns 0.
Check simc_read return value and exit loop if it's not positive. Also
don't rewind polling timer if it's zero.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:46 +02:00
Andy Lutomirski 721485b62f x86/iopl: Fix iopl capability check on Xen PV
commit c29016cf41fe9fa994a5ecca607cf5f1cd98801e upstream.

iopl(3) is supposed to work if iopl is already 3, even if
unprivileged.  This didn't work right on Xen PV.  Fix it.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8ce12013e6e4c0a44a97e316be4a6faff31bd5ea.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
H. Peter Anvin a47831b0d8 x86, processor-flags: Fix the datatypes and add bit number defines
commit d1fbefcb3aa608599a3c9e4582cbeeb6ba6c8939 upstream.

The control registers are unsigned long (32 bits on i386, 64 bits on
x86-64), and so make that manifest in the data type for the various
constants.  Add defines with a _BIT suffix which defines the bit
number, as opposed to the bit mask.

This should resolve some issues with ~bitmask that Linus discovered.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-cwckhbrib2aux1qbteaebij0@git.kernel.org
[wt: backported to 3.10 only to keep next patch clean]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
H. Peter Anvin f85cb76155 x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
commit afcbf13fa6d53d8a97eafaca1dcb344331d2ce0c upstream.

Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE to match the SDM.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Link: http://lkml.kernel.org/n/tip-buq1evi5dpykxx7ak6amaam0@git.kernel.org
[wt: backported to 3.10 only to keep next patch clean]

Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
Radim Krčmář a6821c1796 KVM: i8254: change PIT discard tick policy
commit 7dd0fdff145c5be7146d0ac06732ae3613412ac1 upstream.

Discard policy uses ack_notifiers to prevent injection of PIT interrupts
before EOI from the last one.

This patch changes the policy to always try to deliver the interrupt,
which makes a difference when its vector is in ISR.
Old implementation would drop the interrupt, but proposed one injects to
IRR, like real hardware would.

The old policy breaks legacy NMI watchdogs, where PIT is used through
virtual wire (LVT0): PIT never sends an interrupt before receiving EOI,
thus a guest deadlock with disabled interrupts will stop NMIs.

Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt
through IOAPIC.  (KVM's PIT is deeply rotten and luckily not used much
in modern systems.)

Even though there is a chance of regressions, I think we can fix the
LVT0 NMI bug without introducing a new tick policy.

Cc: <stable@vger.kernel.org>
Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:44 +02:00
Kamal Mostafa c9950bcb91 x86/iopl/64: Properly context-switch IOPL on Xen PV
commit b7a584598aea7ca73140cb87b40319944dd3393f upstream.

From: Andy Lutomirski <luto@kernel.org>

On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL.  We need to context
switch it manually.

I'm doing this without going through paravirt because this is
specific to Xen PV.  After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.

Fixes XSA-171.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.19-stable: no X86_FEATURE_XENPV so just call
  xen_pv_domain() directly ]
Acked-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2016-06-07 10:42:43 +02:00
Jeff Vander Stoep 7dcbe1f355 enable fstack-protector-strong
Enable protection against stack corruption. (most) Functions with the
possibility of stack corruption  are protected with a canary.

With this change, the bullhead kernel grows from 10636247 to 10878698
an increase of 2.3%. Performance loss is considered minimal. Security
against stack overflow is greatly improved. [1]

This is the improved version of CONFIG_CC_STACKPROTECTOR=y which is
enabled in shamu's kernel.

[1] https://lwn.net/Articles/584225

Bug: 28837708
Change-Id: I41e45451793d917a633160df093b73b81a9360e5
2016-05-26 06:53:54 +00:00
Veena Sambasivan 32ddad53bc msm: perf: Do not allocate new hw_event if event is duplicate.
During a perf_event_enable, kernel/events/core.c calls pmu->add() which
is platform implementation(arch/arm/kernel/perf_event.c). Due to the
duplicate constraints, arch/arm/mach-msm/perf_event_msm_krait_l2.c
drivers marks the event as OFF but returns TRUE to perf_event.c which
goes ahead and allocates the hw_event and enables it.
Since event is marked OFF, kernel events core will try to enable this event
again during next perf_event_enable. Which results in same event enabled
on multiple hw_events. But during the perf_release, event struct is freed
and only one hw_event is released. This results in dereferencing the
invalid pointer and hence the crash.
Fix this by returning error in case of constraint event duplicate. Hence
avoiding the same event programmed on multiple hw event counters.

bug: 28172137
Change-Id: Ia3360be027dfe87ac753191ffe7e0bc947e72455
Signed-off-by: Arun KS <arunks@codeaurora.org>
Signed-off-by: Veena Sambasivan <veenas@codeaurora.org>
2016-05-25 00:31:10 +00:00
Thierry Strudel 11afac3e2c Revert "arm64: Introduce execute-only page access permissions"
This reverts commit f72129c220.

Bug: 28557020
Change-Id: Ia63cce8e2b1c847a323527ab1021ef72f4f708db
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2016-05-19 19:03:22 +00:00
Yueyao (Nathan) Zhu 7872047076 bullhead_defconfig: disable CONFIG_PFT as it is unsupported
Disable the drivers/platform/msm/pft.c driver as it is unsupported by
Qualcomm, and opens us up to a wide range of potential attack surfaces
that has not been audited by anyone.

Qualcomm recommends that it be disabled, as it hooks into SELinux in
some "interesting" ways, and the userspace portion of the code is not
even part of the image.

Bug: 28588434
Change-Id: I932ff9f8e9bc3aff01585f210514f52958a508b4
Author: Greg Kroah-Hartman <gregkh@google.com>
2016-05-16 22:50:42 +00:00
Thierry Strudel 394aa6b217 bullhead_defconfig: disable NF_TARGET_REJECT_SKERR
Disable forcing socket error when rejecting with icmp* for now
due to frequent kernel crash.

Bug: 28424847
Change-Id: I3926b2b160fbafe8597d0442b297bd1accf70b4b
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2016-05-11 19:22:39 -07:00
Pavel Labath cc48d16f9e Revert "arch: arm64: disable hardware breakpoints"
The original commit says the breakpoints were disabled because
some chips failed to boot up with breakpoints enabled. This is not
true for bullhead devices, which appear to function well with them
enabled.

This reverts commit 5acefa10b8.

Bug: 28111681
Change-Id: Icc76ae43b6e1fd3d7ae6921afd293487eff0ff2b
Signed-off-by: Pavel Labath <labath@google.com>
2016-04-15 22:02:32 +00:00
Thierry Strudel 3469bb98e3 Revert "bullhead_defconfig: enable option for iotop to work"
This reverts commit 00b2d9c3f5.
2016-04-07 12:53:50 -07:00
Daniel Rosenberg d1605d73dd bullhead: Enable sdcardfs
Bug: 27794037
Signed-off-by: Daniel Rosenberg <drosen@google.com>
2016-03-29 17:59:49 -07:00
Kiran Gunda 6444e3ad62 msm: msm_bus: remove the buspm module from kernel
Remove the buspm module from msm_bus since it adds
no functionality to the bus bandwidth aggregation
driver. It is a loadable module used for profiling
purposes.

Bug: 26354602
Change-Id: I7d70a22f73a0c396a3d8e330d3207871546cbfe3
Signed-off-by: Yuan Lin <yualin@google.com>
2016-03-24 21:35:38 -07:00
Siqi Lin c33c20fb8e Revert "arm64/dts: bullhead: enable pstore ecc"
This reverts commit cf2903fd9d.

Bug: 26587668
Change-Id: Ie8b8b0e8bd5b2ff29a81af0703333266ab688df5
2016-03-23 20:37:29 +00:00
Tim Murray e4870a69c2 Revert "Disable Android kernel LMK, enable mem cgroups."
This reverts commit 3f7e5ad6b2.

bug 27804052
bug 27381069
bug 27799851

Change-Id: I0e5d2a56f976f45cc5cd6e623af4a0feae198006
2016-03-22 20:39:44 -07:00
dcashman 7905cfc8f8 BACKPORT: FROMLIST: mm: ASLR: use get_random_long()
(cherry picked from commit https://lkml.org/lkml/2016/2/4/833)

Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in case
of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Bug: 26963541
Change-Id: I78b86a343da2e48ffcfc494deaa79e3f3745d700
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
2016-03-16 23:05:34 +00:00
Marcelo Tosatti 73c1d981f0 KVM: x86: move steal time initialization to vcpu entry time
commit 7cae2bedcbd4680b155999655e49c27b9cf020fa upstream.

As reported at https://bugs.launchpad.net/qemu/+bug/1494350,
it is possible to have vcpu->arch.st.last_steal initialized
from a thread other than vcpu thread, say the iothread, via
KVM_SET_MSRS.

Which can cause an overflow later (when subtracting from vcpu threads
sched_info.run_delay).

To avoid that, move steal time accumulation to vcpu entry time,
before copying steal time data to guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:41:36 -07:00
Andreas Schwab b4f8295ef4 powerpc: Fix dedotify for binutils >= 2.26
commit f15838e9cac8f78f0cc506529bb9d3b9fa589c1f upstream.

Since binutils 2.26 BFD is doing suffix merging on STRTAB sections.  But
dedotify modifies the symbol names in place, which can also modify
unrelated symbols with a name that matches a suffix of a dotted name.  To
remove the leading dot of a symbol name we can just increment the pointer
into the STRTAB section instead.

Backport to all stables to avoid breakage when people update their
binutils - mpe.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:41:36 -07:00
Radim Krčmář 618fedd40d KVM: VMX: disable PEBS before a guest entry
commit 7099e2e1f4d9051f31bbfa5803adf954bb5d76ef upstream.

Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
would crash if you decided to run a host command that uses PEBS, like
  perf record -e 'cpu/mem-stores/pp' -a

This happens because KVM is using VMX MSR switching to disable PEBS, but
SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
isn't safe:
  When software needs to reconfigure PEBS facilities, it should allow a
  quiescent period between stopping the prior event counting and setting
  up a new PEBS event. The quiescent period is to allow any latent
  residual PEBS records to complete its capture at their previously
  specified buffer address (provided by IA32_DS_AREA).

There might not be a quiescent period after the MSR switch, so a CPU
ends up using host's MSR_IA32_DS_AREA to access an area in guest's
memory.  (Or MSR switching is just buggy on some models.)

The guest can learn something about the host this way:
If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
in #PF where we leak host's MSR_IA32_DS_AREA through CR2.

After that, a malicious guest can map and configure memory where
MSR_IA32_DS_AREA is pointing and can therefore get an output from
host's tracing.

This is not a critical leak as the host must initiate with PEBS tracing
and I have not been able to get a record from more than one instruction
before vmentry in vmx_vcpu_run() (that place has most registers already
overwritten with guest's).

We could disable PEBS just few instructions before vmentry, but
disabling it earlier shouldn't affect host tracing too much.
We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
optimization isn't worth its code, IMO.

(If you are implementing PEBS for guests, be sure to handle the case
 where both host and guest enable PEBS, because this patch doesn't.)

Fixes: 26a4f3c08d ("perf/x86: disable PEBS on a guest entry.")
Reported-by: Jiří Olša <jolsa@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-16 08:41:35 -07:00
Thierry Strudel 00b2d9c3f5 bullhead_defconfig: enable option for iotop to work
Change-Id: Ia345a08d1735f995a0a270ddf451bf45f50d61b8
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2016-03-14 10:25:57 -07:00
Martijn Coenen 3f7e5ad6b2 Disable Android kernel LMK, enable mem cgroups.
Android userspace lmkd was disabled because it was suspected
to be the cause of performance issues. In the end we didn't
find any reason for userspace lmkd to be causing these issues,
so re-enable userspace lmkd by disabling the kernel one.

Change-Id: I7a06ce683d5d3975c0f50626a429acfc9312ea25
2016-03-10 13:49:04 +01:00
Todd E Brandt 137d2a237d PM / sleep / x86: Fix crash on graph trace through x86 suspend
commit 92f9e179a702a6adbc11e2fedc76ecd6ffc9e3f7 upstream.

Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.

To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:31:54 -08:00
Andy Lutomirski 4e24fd5466 x86/entry/compat: Add missing CLAC to entry_INT80_32
commit 3d44d51bd339766f0178f0cf2e8d048b4a4872aa upstream.

This doesn't seem to fix a regression -- I don't think the CLAC was
ever there.

I double-checked in a debugger: entries through the int80 gate do
not automatically clear AC.

Stable maintainers: I can provide a backport to 4.3 and earlier if
needed.  This needs to be backported all the way to 3.10.

Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 63bcff2a30 ("x86, smap: Add STAC and CLAC instructions to control user space access")
Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[ kamal: backport to 3.10 through 3.19-stable: file rename; context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:31:54 -08:00
Patrick Tjin 652c94ce54 arm64/config: bullhead: add CONFIG_IP_MULTICAST and savedefconfig
Bug: 19173869
Signed-off-by: Patrick Tjin <pattjin@google.com>
2016-03-09 13:13:25 -08:00
Dmitry V. Levin 39e88dd4da sparc64: fix incorrect sign extension in sys_sparc64_personality
commit 525fd5a94e1be0776fa652df5c687697db508c91 upstream.

The value returned by sys_personality has type "long int".
It is saved to a variable of type "int", which is not a problem
yet because the type of task_struct->pesonality is "unsigned int".
The problem is the sign extension from "int" to "long int"
that happens on return from sys_sparc64_personality.

For example, a userspace call personality((unsigned) -EINVAL) will
result to any subsequent personality call, including absolutely
harmless read-only personality(0xffffffff) call, failing with
errno set to EINVAL.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:23 -08:00
Vegard Nossum 9cbb43b99b uml: flush stdout before forking
commit 0754fb298f2f2719f0393491d010d46cfb25d043 upstream.

I was seeing some really weird behaviour where piping UML's output
somewhere would cause output to get duplicated:

  $ ./vmlinux | head -n 40
  Checking that ptrace can change system call numbers...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Checking syscall emulation patch for ptrace...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Checking advanced syscall emulation patch for ptrace...Core dump limits :
          soft - 0
          hard - NONE
  OK
  Core dump limits :
          soft - 0
          hard - NONE

This is because these tests do a fork() which duplicates the non-empty
stdout buffer, then glibc flushes the duplicated buffer as each child
exits.

A simple workaround is to flush before forking.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:23 -08:00
Ard Biesheuvel 396a61bef1 s390: fix normalization bug in exception table sorting
commit bcb7825a77f41c7dd91da6f7ac10b928156a322e upstream.

The normalization pass in the sorting routine of the relative exception
table serves two purposes:
- it ensures that the address fields of the exception table entries are
  fully ordered, so that no ambiguities arise between entries with
  identical instruction offsets (i.e., when two instructions that are
  exactly 8 bytes apart each have an exception table entry associated with
  them)
- it ensures that the offsets of both the instruction and the fixup fields
  of each entry are relative to their final location after sorting.

Commit eb608fb366 ("s390/exceptions: switch to relative exception table
entries") ported the relative exception table format from x86, but modified
the sorting routine to only normalize the instruction offset field and not
the fixup offset field. The result is that the fixup offset of each entry
will be relative to the original location of the entry before sorting,
likely leading to crashes when those entries are dereferenced.

Fixes: eb608fb366 ("s390/exceptions: switch to relative exception table entries")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:22 -08:00
Vineet Gupta 677eea664a ARC: dw2 unwind: Remove falllback linear search thru FDE entries
commit 2e22502c080f27afeab5e6f11e618fb7bc7aea53 upstream.

Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"

| perf record -g -c 15000 -e cycles /sbin/hackbench
|
| INFO: rcu_preempt self-detected stall on CPU
| 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
| Task dump for CPU 1:

in-kernel dwarf unwinder has a fast binary lookup and a fallback linear
search (which iterates thru each of ~11K entries) thus takes 2 orders of
magnitude longer (~3 million cycles vs. 2000). Routines written in hand
assembler lack dwarf info (as we don't support assembler CFI pseudo-ops
yet) fail the unwinder binary lookup, hit linear search, failing
nevertheless in the end.

However the linear search is pointless as binary lookup tables are created
from it in first place. It is impossible to have binary lookup fail while
succeed the linear search. It is pure waste of cycles thus removed by
this patch.

This manifested as RCU stalls / NMI watchdog splat when running
hackbench under perf with callgraph profiling. The triggering condition
was perf counter overflowing in routine lacking dwarf info (like memset)
leading to patheic 3 million cycle unwinder slow path and by the time it
returned new interrupts were already pending (Timer, IPI) and taken
rightaway. The original memset didn't make forward progress, system kept
accruing more interrupts and more unwinder delayes in a vicious feedback
loop, ultimately triggering the NMI diagnostic.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:21 -08:00
James Hogan 23a9a7dc6f MIPS: KVM: Uninit VCPU in vcpu_create error path
commit 585bb8f9a5e592f2ce7abbe5ed3112d5438d2754 upstream.

If either of the memory allocations in kvm_arch_vcpu_create() fail, the
vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.

Fixes: 669e846e6c ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:20 -08:00
James Hogan 7b0fc45113 MIPS: KVM: Fix CACHE immediate offset sign extension
commit c5c2a3b998f1ff5a586f9d37e154070b8d550d17 upstream.

The immediate field of the CACHE instruction is signed, so ensure that
it gets sign extended by casting it to an int16_t rather than just
masking the low 16 bits.

Fixes: e685c689f3 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:20 -08:00
James Hogan 5525dd65cd MIPS: KVM: Fix ASID restoration logic
commit 002374f371bd02df864cce1fe85d90dc5b292837 upstream.

ASID restoration on guest resume should determine the guest execution
mode based on the guest Status register rather than bit 30 of the guest
PC.

Fix the two places in locore.S that do this, loading the guest status
from the cop0 area. Note, this assembly is specific to the trap &
emulate implementation of KVM, so it doesn't need to check the
supervisor bit as that mode is not implemented in the guest.

Fixes: b680f70fc1 ("KVM/MIPS32: Entry point for trampolining to...")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:20 -08:00
Ingo Molnar 9ee0d9ad93 efi: Disable interrupts around EFI calls, not in the epilog/prolog calls
commit 23a0d4e8fa6d3a1d7fb819f79bcc0a3739c30ba9 upstream.

Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.

Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.

Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ luis: backported to 3.10: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:06:19 -08:00
Andy Lutomirski 4800af9122 x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
commit 425be5679fd292a3c36cb1fe423086708a99f11a upstream.

The early_idt_handlers asm code generates an array of entry
points spaced nine bytes apart.  It's not really clear from that
code or from the places that reference it what's going on, and
the code only works in the first place because GAS never
generates two-byte JMP instructions when jumping to global
labels.

Clean up the code to generate the correct array stride (member size)
explicitly. This should be considerably more robust against
screw-ups, as GAS will warn if a .fill directive has a negative
count.  Using '. =' to advance would have been even more robust
(it would generate an actual error if it tried to move
backwards), but it would pad with nulls, confusing anyone who
tries to disassemble the code.  The new scheme should be much
clearer to future readers.

While we're at it, improve the comments and rename the array and
common code.

Binutils may start relaxing jumps to non-weak labels.  If so,
this change will fix our build, and we may need to backport this
change.

Before, on x86_64:

  0000000000000000 <early_idt_handlers>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 00 00 00 00          jmpq   9 <early_idt_handlers+0x9>
                          5: R_X86_64_PC32        early_idt_handler-0x4
  ...
    48:   66 90                   xchg   %ax,%ax
    4a:   6a 08                   pushq  $0x8
    4c:   e9 00 00 00 00          jmpq   51 <early_idt_handlers+0x51>
                          4d: R_X86_64_PC32       early_idt_handler-0x4
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   e9 00 00 00 00          jmpq   120 <early_idt_handler>
                          11c: R_X86_64_PC32      early_idt_handler-0x4

After:

  0000000000000000 <early_idt_handler_array>:
     0:   6a 00                   pushq  $0x0
     2:   6a 00                   pushq  $0x0
     4:   e9 14 01 00 00          jmpq   11d <early_idt_handler_common>
  ...
    48:   6a 08                   pushq  $0x8
    4a:   e9 d1 00 00 00          jmpq   120 <early_idt_handler_common>
    4f:   cc                      int3
    50:   cc                      int3
  ...
   117:   6a 00                   pushq  $0x0
   119:   6a 1f                   pushq  $0x1f
   11b:   eb 03                   jmp    120 <early_idt_handler_common>
   11d:   cc                      int3
   11e:   cc                      int3
   11f:   cc                      int3

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Binutils <binutils@sourceware.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ac027962af343b0c599cbfcf50b945ad2ef3d7a8.1432336324.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:49 -08:00
Sudip Mukherjee ac6cef6cfa m32r: fix m32104ut_defconfig build fail
commit 601f1db653217f205ffa5fb33514b4e1711e56d1 upstream.

The build of m32104ut_defconfig for m32r arch was failing for long long
time with the error:

  ERROR: "memory_start" [fs/udf/udf.ko] undefined!
  ERROR: "memory_end" [fs/udf/udf.ko] undefined!
  ERROR: "memory_end" [drivers/scsi/sg.ko] undefined!
  ERROR: "memory_start" [drivers/scsi/sg.ko] undefined!
  ERROR: "memory_end" [drivers/i2c/i2c-dev.ko] undefined!
  ERROR: "memory_start" [drivers/i2c/i2c-dev.ko] undefined!

As done in other architectures export the symbols to fix the error.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:49 -08:00
Linus Walleij fa4aa48526 ARM: 8517/1: ICST: avoid arithmetic overflow in icst_hz()
commit 5070fb14a0154f075c8b418e5bc58a620ae85a45 upstream.

When trying to set the ICST 307 clock to 25174000 Hz I ran into
this arithmetic error: the icst_hz_to_vco() correctly figure out
DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of
25174000 Hz out of the VCO. (I replicated the icst_hz() function
in a spreadsheet to verify this.)

However, when I called icst_hz() on these VCO settings it would
instead return 4122709 Hz. This causes an error in the common
clock driver for ICST as the common clock framework will call
.round_rate() on the clock which will utilize icst_hz_to_vco()
followed by icst_hz() suggesting the erroneous frequency, and
then the clock gets set to this.

The error did not manifest in the old clock framework since
this high frequency was only used by the CLCD, which calls
clk_set_rate() without first calling clk_round_rate() and since
the old clock framework would not call clk_round_rate() before
setting the frequency, the correct values propagated into
the VCO.

After some experimenting I figured out that it was due to a simple
arithmetic overflow: the divisor for 24Mhz reference frequency
as reference becomes 24000000*2*(99+8)=0x132212400 and the "1"
in bit 32 overflows and is lost.

But introducing an explicit 64-by-32 bit do_div() and casting
the divisor into (u64) we get the right frequency back, and the
right frequency gets set.

Tested on the ARM Versatile.

Cc: linux-clk@vger.kernel.org
Cc: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:48 -08:00
Linus Walleij 57cd1f0a20 ARM: 8519/1: ICST: try other dividends than 1
commit e972c37459c813190461dabfeaac228e00aae259 upstream.

Since the dawn of time the ICST code has only supported divide
by one or hang in an eternal loop. Luckily we were always dividing
by one because the reference frequency for the systems using
the ICSTs is 24MHz and the [min,max] values for the PLL input
if [10,320] MHz for ICST307 and [6,200] for ICST525, so the loop
will always terminate immediately without assigning any divisor
for the reference frequency.

But for the code to make sense, let's insert the missing i++

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 11:57:48 -08:00
Martijn Coenen 5195cfd1ff Revert "bullhead_defconfig: disable Android LMK, enable mem cgroups."
This reverts commit 0dbad758a5.
2016-02-25 09:49:52 +00:00
Nick Desaulniers ae7e70513d Silences WLAN, PCIe, and CPU suspend state kernel messages
Bug: 27134656
Change-Id: I681ec2171472514489365ca4bfc4ef16d9b344fe
2016-02-23 14:12:49 -08:00
Will Deacon f4ae17cea6 UPSTREAM: arm64: cpu hotplug: ensure we mask out CPU_TASKS_FROZEN in notifiers
(cherry picked from commit e56d82a116176f7af9d642b560abbbd3a2b68013)

We have a couple of CPU hotplug notifiers for resetting the CPU debug
state to a sane value when a CPU comes online.

This patch ensures that we mask out CPU_TASKS_FROZEN so that we don't
miss any online events occuring due to suspend/resume.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

Signed-off-by: Pavel Labath <labath@google.com>
Bug: 27189927
Change-Id: I72549149b9bf1f0d05cb17a1db98f9a342c580c0
2016-02-22 21:49:24 +00:00
Helge Deller 4fdefe95e6 parisc: Fix __ARCH_SI_PREAMBLE_SIZE
commit e60fc5aa608eb38b47ba4ee058f306f739eb70a0 upstream.

On a 64bit kernel build the compiler aligns the _sifields union in the
struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
compensates for this alignment and thus fixes the wait testcase of the
strace package.

The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that
_sigchld.si_stime variable is missed to be copied and thus after a
copy_siginfo() will have uninitialized values.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19 14:22:38 -08:00
Helge Deller fedff89dca parisc: Fix syscall restarts
commit 71a71fb5374a23be36a91981b5614590b9e722c3 upstream.

On parisc syscalls which are interrupted by signals sometimes failed to
restart and instead returned -ENOSYS which in the worst case lead to
userspace crashes.
A similiar problem existed on MIPS and was fixed by commit e967ef02
("MIPS: Fix restart of indirect syscalls").

On parisc the current syscall restart code assumes that all syscall
callers load the syscall number in the delay slot of the ble
instruction. That's how it is e.g. done in the unistd.h header file:
	ble 0x100(%sr2, %r0)
	ldi #syscall_nr, %r20
Because of that assumption the current code never restored %r20 before
returning to userspace.

This assumption is at least not true for code which uses the glibc
syscall() function, which instead uses this syntax:
	ble 0x100(%sr2, %r0)
	copy regX, %r20
where regX depend on how the compiler optimizes the code and register
usage.

This patch fixes this problem by adding code to analyze how the syscall
number is loaded in the delay branch and - if needed - copy the syscall
number to regX prior returning to userspace for the syscall restart.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19 14:22:38 -08:00
Helge Deller d2bb787af4 parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h
commit dcbf0d299c00ed4f82ea8d6e359ad88a5182f9b8 upstream.

Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed
API which was never integrated into the generic Linux kernel code.

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19 14:22:38 -08:00
Dmitry V. Levin 516932b705 sh64: fix __NR_fgetxattr
commit 2d33fa1059da4c8e816627a688d950b613ec0474 upstream.

According to arch/sh/kernel/syscalls_64.S and common sense, __NR_fgetxattr
has to be defined to 259, but it doesn't.  Instead, it's defined to 269,
which is of course used by another syscall, __NR_sched_setaffinity in this
case.

This bug was found by strace test suite.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-19 14:22:38 -08:00
Srinivas Girigowda 1a7e9c2eb0 bullhead: Add CNSS platform kernel configs
Separate the WLAN CNSS platform driver according to
the bus interface PCIe and SDIO. Add separate Kernel
Config file for the WLAN CNSS platform Driver and its
necessary module which has support for the CNSS
connectivity Subsystem.

Add Kernel Config flag to refactoring the CNSS platform Driver:

CONFIG_CNSS Kernel Config add support to CNSS Core
driver compilation and export Generic GPL wrappers.

CONFIG_CNSS_SDIO Kernel Config add support to CNSS
Platform Driver compilation for SDIO based WiFi Devices
and export platform driver API's based on the SDIO bus.

CONFIG_CNSS_PCI Kernel Config add support to CNSS
Platform Driver compilation for PCIe based WiFi Devices
and export platform driver API's based on the PCIe bus.

CRs-Fixed: 939171
Signed-off-by: Subhani Shaik <subhanis@codeaurora.org>
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
2016-02-19 21:58:22 +00:00
Srinivas Girigowda a9f143423b Net: CNSS: refactoring CNSS platform Driver
Separate the WLAN CNSS platform driver according to
the bus interface PCIe and SDIO. Add separate Kernel
Config file for the WLAN CNSS platform Driver and its
necessary module which has support for the CNSS
connectivity Subsystem.

Add Kernel Config flag to refactoring the CNSS platform Driver:

CONFIG_CNSS Kernel Config add support to CNSS Core
driver compilation and export Generic GPL wrappers.

CONFIG_CNSS_SDIO Kernel Config add support to CNSS
Platform Driver compilation for SDIO based WiFi Devices
and export platform driver API's based on the SDIO bus.

CONFIG_CNSS_PCI Kernel Config add support to CNSS
Platform Driver compilation for PCIe based WiFi Devices
and export platform driver API's based on the PCIe bus.

CRs-fixed: 939171
Change-Id: I8cce5bbc87e6742179a7967ccba295f55c444b28
Signed-off-by: Kai Liu <kaliu@codeaurora.org>
Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org>
Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org>
Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>
2016-02-19 21:56:31 +00:00
Ben Fennema 2eb0448f7a bullhead_defconfig: switch back to nanohub
Power regression resolved.

Bug: 27122067
Bug: 27057077

Change-Id: Ia7f2946eb6bc9041c4492e2be1f8da38b3ad08ef
Signed-off-by: Ben Fennema <fennema@google.com>
2016-02-17 13:44:58 -08:00
Lorenzo Colitti 094123c645 bullhead: enable CONFIG_INET_DIAG_DESTROY
BUG=26976388
Change-Id: Iea88117ab33087c3910c1894b30b013a617c1d11
2016-02-16 17:34:16 +09:00
Ben Fennema 680d605413 bullhead_defconfig: revert back to chinook contexthub
revert back to chinook while tracking down power regression

Bug: 27122067
Change-Id: I87e77b26869eba7522faa5414f7bbb447297d579
Signed-off-by: Ben Fennema <fennema@google.com>
2016-02-10 14:42:54 -08:00
Martijn Coenen 0dbad758a5 bullhead_defconfig: disable Android LMK, enable mem cgroups.
To start using user-space low memory killer.

Change-Id: Ie60e20739b1cc6ba51bc01b24c3c963f3a30db2c
2016-02-10 01:39:31 +00:00
Ben Fennema dd4b8a08e5 nanohub: add nanohub kernel driver
Adds nanohub kernel driver files as well as device tree configuration to
use nanohub instead of contexthub (spich)

Change-Id: I089dd5c3a08968a002e965e8d1ed260b93e32ce3
Signed-off-by: Ben Fennema <fennema@google.com>
2016-02-05 12:51:01 -08:00
Guenter Roeck 5d5ee1d4fd mn10300: Select CONFIG_HAVE_UID16 to fix build failure
commit c86576ea114a9a881cf7328dc7181052070ca311 upstream.

mn10300 builds fail with

fs/stat.c: In function 'cp_old_stat':
fs/stat.c:163:2: error: 'old_uid_t' undeclared

ipc/util.c: In function 'ipc64_perm_to_ipc_perm':
ipc/util.c:540:2: error: 'old_uid_t' undeclared

Select CONFIG_HAVE_UID16 and remove local definition of CONFIG_UID16
to fix the problem.

Fixes: fbc416ff8618 ("arm64: fix building without CONFIG_UID16")
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:37 -08:00
Andrew Morton 156057c612 openrisc: fix CONFIG_UID16 setting
commit 04ea1e91f85615318ea91ce8ab50cb6a01ee4005 upstream.

openrisc-allnoconfig:

  kernel/uid16.c: In function 'SYSC_setgroups16':
  kernel/uid16.c:184:2: error: implicit declaration of function 'groups_alloc'
  kernel/uid16.c:184:13: warning: assignment makes pointer from integer without a cast

openrisc shouldn't be setting CONFIG_UID16 when CONFIG_MULTIUSER=n.

Fixes: 2813893f8b197a1 ("kernel: conditionally support non-root users, groups and capabilities")
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:36 -08:00
Will Deacon c8f487a49a arm64: mm: ensure that the zero page is visible to the page table walker
commit 32d6397805d00573ce1fa55f408ce2bca15b0ad3 upstream.

In paging_init, we allocate the zero page, memset it to zero and then
point TTBR0 to it in order to avoid speculative fetches through the
identity mapping.

In order to guarantee that the freshly zeroed page is indeed visible to
the page table walker, we need to execute a dsb instruction prior to
writing the TTBR.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:36 -08:00
John Blackwood c2db3a421b arm64: Clear out any singlestep state on a ptrace detach operation
commit 5db4fd8c52810bd9740c1240ebf89223b171aa70 upstream.

Make sure to clear out any ptrace singlestep state when a ptrace(2)
PTRACE_DETACH call is made on arm64 systems.

Otherwise, the previously ptraced task will die off with a SIGTRAP
signal if the debugger just previously singlestepped the ptraced task.

Signed-off-by: John Blackwood <john.blackwood@ccur.com>
[will: added comment to justify why this is in the arch code]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:36 -08:00
Boqun Feng 5bb9a369bd powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
commit 81d7a3294de7e9828310bbf986a67246b13fa01e upstream.

According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.

So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")

This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:35 -08:00
Boqun Feng 5ac5ac96bc powerpc: Make value-returning atomics fully ordered
commit 49e9cf3f0c04bf76ffa59242254110309554861d upstream.

According to memory-barriers.txt:

> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...

Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:

https://lkml.org/lkml/2015/10/14/970

To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.

This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.

Fixes: b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:35 -08:00
Michael Neuling 5d64942934 powerpc/tm: Block signal return setting invalid MSR state
commit d2b9d2a5ad5ef04ff978c9923d19730cb05efd55 upstream.

Currently we allow both the MSR T and S bits to be set by userspace on
a signal return.  Unfortunately this is a reserved configuration and
will cause a TM Bad Thing exception if attempted (via rfid).

This patch checks for this case in both the 32 and 64 bit signals
code.  If both T and S are set, we mark the context as invalid.

Found using a syscall fuzzer.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:35 -08:00
H.J. Lu 6ec8f1c4d6 x86/boot: Double BOOT_HEAP_SIZE to 64KB
commit 8c31902cffc4d716450be549c66a67a8a3dd479c upstream.

When decompressing kernel image during x86 bootup, malloc memory
for ELF program headers may run out of heap space, which leads
to system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.

Tested with 32-bit kernel which failed to boot without this patch.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Mario Kleiner a919f20b06 x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
commit 2f0c0b2d96b1205efb14347009748d786c2d9ba5 upstream.

Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Paul Mackerras d59f772b71 KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
commit c20875a3e638e4a03e099b343ec798edd1af5cc6 upstream.

Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Ouyang Zhaowei (Charles) c55958f9a8 x86/xen: don't reset vcpu_info on a cancelled suspend
commit 6a1f513776b78c994045287073e55bae44ed9f8c upstream.

On a cancelled suspend the vcpu_info location does not change (it's
still in the per-cpu area registered by xen_vcpu_setup()).  So do not
call xen_hvm_init_shared_info() which would make the kernel think its
back in the shared info.  With the wrong vcpu_info, events cannot be
received and the domain will hang after a cancelled suspend.

Signed-off-by: Charles Ouyang <ouyangzhaowei@huawei.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
Dmitry V. Levin c511958e1c x86/signal: Fix restart_syscall number for x32 tasks
commit 22eab1108781eff09961ae7001704f7bd8fb1dce upstream.

When restarting a syscall with regs->ax == -ERESTART_RESTARTBLOCK,
regs->ax is assigned to a restart_syscall number.  For x32 tasks, this
syscall number must have __X32_SYSCALL_BIT set, otherwise it will be
an x86_64 syscall number instead of a valid x32 syscall number. This
issue has been there since the introduction of x32.

Reported-by: strace/tests/restart_syscall.test
Reported-and-tested-by: Elvira Khabirova <lineprinter0@gmail.com>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Elvira Khabirova <lineprinter0@gmail.com>
Link: http://lkml.kernel.org/r/20151130215436.GA25996@altlinux.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-28 21:49:28 -08:00
dcashman 4bb0fa3db3 bullhead_defconfig: update mmap_rnd_bits to max value.
Bug: 26648393
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: Idd9038fcbe6e346e7f0262281c31238a9ddc884b
2016-01-21 03:04:23 +00:00
dcashman 3c9da67447 FROMLIST: x86: mm: support ARCH_MMAP_RND_BITS.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/339)

x86: arch_mmap_rnd() uses hard-coded values, 8 for 32-bit and 28 for
64-bit, to generate the random offset for the mmap base address.
This value represents a compromise between increased ASLR
effectiveness and avoiding address-space fragmentation. Replace it
with a Kconfig option, which is sensibly bounded, so that platform
developers may choose where to place this compromise. Keep default
values as new minimums.

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I4f0cb7159406166fbc9f8938ba8d3e24b72eeefb
2016-01-14 11:49:22 -08:00
dcashman fbb45643a9 BACKPORT: FROMLIST: arm64: mm: support ARCH_MMAP_RND_BITS.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/340)

arm64: arch_mmap_rnd() uses STACK_RND_MASK to generate the
random offset for the mmap base address.  This value represents a
compromise between increased ASLR effectiveness and avoiding
address-space fragmentation. Replace it with a Kconfig option, which
is sensibly bounded, so that platform developers may choose where to
place this compromise. Keep default values as new minimums.

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I7caf105b838cfc3ab55f275e1a061eb2b77c9a2a
2016-01-14 11:49:15 -08:00
dcashman 3bcb15580a FROMLIST: arm: mm: support ARCH_MMAP_RND_BITS.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/341)

arm: arch_mmap_rnd() uses a hard-code value of 8 to generate the
random offset for the mmap base address.  This value represents a
compromise between increased ASLR effectiveness and avoiding
address-space fragmentation. Replace it with a Kconfig option, which
is sensibly bounded, so that platform developers may choose where to
place this compromise. Keep 8 as the minimum acceptable value.

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I89c23a8737c981116a67381c241fdd5556e2b043
2016-01-14 11:48:21 -08:00
dcashman f6abe1ab29 FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/337)

ASLR  only uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures. This value was chosen to
prevent a poorly chosen value from dividing the address space in such
a way as to prevent large allocations. This may not be an issue on all
platforms. Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I66ac01c6f4f2c8dcfc84d1f1e99490b8385b3ed4
2016-01-14 11:48:20 -08:00
Tim Murray 9a2bf03bd6 bullhead: set CONFIG_HZ to 300
1000Hz is still too aggressive from a power consumption point of
view. Set to 300Hz for now.

bug 26444620

Change-Id: I3d1ca38d518e5705b653487bfa95085873247f1c
2016-01-14 11:43:31 -08:00
Tim Murray 2c51b35da0 bullhead: disable ALLOC_IN_4K_CHUNKS.
Disable ALLOC_IN_4K_CHUNKS to try and keep unmovable pages
centralized. This should help avoid setting all pageblocks to unmovable.

bug 25392275

Change-Id: I95e3dd9aec9a772d85b54cf51764d31bf4d3be63
2016-01-14 11:43:31 -08:00
soyoung.min f8ed00e0bf ARM64: dts: bullhead: add bluish lcd gamma table
Bug: 26110238

Change-Id: Ifc1277a55b06f13d7b2474a3eafdcb27a98d87d4
Signed-off-by: soyoung.min <soyoung.min@lge.com>
2016-01-14 11:43:28 -08:00
Thierry Strudel a62a250325 ARM64: dts: bullhead: haptic: slightly increase vmax
Bug: 26299440
Change-Id: Ie4f39423c03c88d7e035895d06f3f7ce7a103c35
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2016-01-12 11:34:55 -08:00
Thierry Strudel b54545429a ARM64: dts: bullhead: haptic: use sine wave shape
Bug: 26299440
Change-Id: I32d978bf9b39a1f628643ad929057d6f194477bb
2016-01-12 11:34:54 -08:00
Tim Murray b86ce971d3 bullhead_defconfig: set CONFIG_HZ to 1000
Increase CONFIG_HZ in order to improve UI responsiveness, especially
when there is significant background CPU work.

bug 25347271

Change-Id: If2d77233ea333d1be8facb8c35f8241aa9e2fd6e
2016-01-12 11:34:54 -08:00
Ajay Dudani bb57c6eb80 msm: kgsl: Fix direct references to HZ
Make the various timeout values HZ agnostic by using the proper
macros and values instead.

Change-Id: I6b75b3f7795e6670220b1eec3df9a03b75b8c8f9
Signed-off-by: Suman Tatiraju <sumant@codeaurora.org>
Signed-off-by: Ajay Dudani <adudani@codeaurora.org>
2016-01-05 14:29:50 -08:00
Sami Tolvanen 198ce7d738 arch: arm64: bullhead_defconfig: enable CONFIG_DM_VERITY_FEC
Bug: 21893453
Change-Id: I52ab1457be57e72f4cbbbb6668d4dd9995e06717
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
2016-01-04 11:53:21 +00:00
Thierry Strudel 51c9d8c709 Revert "android/lowmemorykiller: Ignore tasks with freed mm"
This reverts commit 05d5ad4d0a.

Change-Id: Ib5770b5b38f123116322507646ae9bce6c3c186a
2015-12-29 13:25:56 -08:00
Jongrak Kwon cf2903fd9d arm64/dts: bullhead: enable pstore ecc
Bug: 25742233

Change-Id: I9a8e1fdbadda358862ceac2cc51cbfb33c3ffbc2
Signed-off-by: Jongrak Kwon <jongrak.kwon@lge.com>
2015-12-14 13:58:56 -08:00
Vineeta Srivastava 529f523acd Revert "bullhead: Enable SCHED_HRTICK"
This reverts commit 0482650b09.
2015-12-10 13:02:07 -08:00
Robin Murphy b2a5b59647 arm64: Fix compat register mappings
commit 5accd17d0eb523350c9ef754d655e379c9bb93b3 upstream.

For reasons not entirely apparent, but now enshrined in history, the
architectural mapping of AArch32 banked registers to AArch64 registers
actually orders SP_<mode> and LR_<mode> backwards compared to the
intuitive r13/r14 order, for all modes except FIQ.

Fix the compat_<reg>_<mode> macros accordingly, in the hope of avoiding
subtle bugs with KVM and AArch32 guests.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:10 -05:00
Andrew Cooper 8fa88fa850 x86/cpu: Fix SMAP check in PVOPS environments
commit 581b7f158fe0383b492acd1ce3fb4e99d4e57808 upstream.

There appears to be no formal statement of what pv_irq_ops.save_fl() is
supposed to return precisely.  Native returns the full flags, while lguest and
Xen only return the Interrupt Flag, and both have comments by the
implementations stating that only the Interrupt Flag is looked at.  This may
have been true when initially implemented, but no longer is.

To make matters worse, the Xen PVOP leaves the upper bits undefined, making
the BUG_ON() undefined behaviour.  Experimentally, this now trips for 32bit PV
guests on Broadwell hardware.  The BUG_ON() is consistent for an individual
build, but not consistent for all builds.  It has also been a sitting timebomb
since SMAP support was introduced.

Use native_save_fl() instead, which will obtain an accurate view of the AC
flag.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: <lguest@lists.ozlabs.org>
Cc: Xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1433323874-6927-1-git-send-email-andrew.cooper3@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Borislav Petkov 8f14777b58 x86/cpu: Call verify_cpu() after having entered long mode too
commit 04633df0c43d710e5f696b06539c100898678235 upstream.

When we get loaded by a 64-bit bootloader, kernel entry point is
startup_64 in head_64.S. We don't trust any and all bootloaders because
some will fiddle with CPU configuration so we go ahead and massage each
CPU into sanity again.

For example, some dell BIOSes have this XD disable feature which set
IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
for other OSes but Linux sure doesn't need it.

A similar thing is present in the Surface 3 firmware - see
https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
only on the BSP:

  # rdmsr -a 0x1a0
  400850089
  850089
  850089
  850089

I know, right?!

There's not even an off switch in there.

So fix all those cases by sanitizing the 64-bit entry point too. For
that, make verify_cpu() callable in 64-bit mode also.

Requested-and-debugged-by: "H. Peter Anvin" <hpa@zytor.com>
Reported-and-tested-by: Bastien Nocera <bugzilla@hadess.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Krzysztof Mazur 308b1b0433 x86/setup: Fix low identity map for >= 2GB kernel range
commit 68accac392d859d24adcf1be3a90e41f978bd54c upstream.

The commit f5f3497cad8c extended the low identity mapping. However, if
the kernel uses more than 2 GB (VMSPLIT_2G_OPT or VMSPLIT_1G memory
split), the normal memory mapping is overwritten by the low identity
mapping causing a crash. To avoid overwritting, limit the low identity
map to cover only memory before kernel range (PAGE_OFFSET).

Fixes: f5f3497cad8c "x86/setup: Extend low identity map to cover whole kernel range
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/r/1446815916-22105-1-git-send-email-krzysiek@podlesie.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Paolo Bonzini fa4fbf7138 x86/setup: Extend low identity map to cover whole kernel range
commit f5f3497cad8c8416a74b9aaceb127908755d020a upstream.

On 32-bit systems, the initial_page_table is reused by
efi_call_phys_prolog as an identity map to call
SetVirtualAddressMap.  efi_call_phys_prolog takes care of
converting the current CPU's GDT to a physical address too.

For PAE kernels the identity mapping is achieved by aliasing the
first PDPE for the kernel memory mapping into the first PDPE
of initial_page_table.  This makes the EFI stub's trick "just work".

However, for non-PAE kernels there is no guarantee that the identity
mapping in the initial_page_table extends as far as the GDT; in this
case, accesses to the GDT will cause a page fault (which quickly becomes
a triple fault).  Fix this by copying the kernel mappings from
swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
identity mapping.

For some reason, this is only reproducible with QEMU's dynamic translation
mode, and not for example with KVM.  However, even under KVM one can clearly
see that the page table is bogus:

    $ qemu-system-i386 -pflash OVMF.fd -M q35 vmlinuz0 -s -S -daemonize
    $ gdb
    (gdb) target remote localhost:1234
    (gdb) hb *0x02858f6f
    Hardware assisted breakpoint 1 at 0x2858f6f
    (gdb) c
    Continuing.

    Breakpoint 1, 0x02858f6f in ?? ()
    (gdb) monitor info registers
    ...
    GDT=     0724e000 000000ff
    IDT=     fffbb000 000007ff
    CR0=0005003b CR2=ff896000 CR3=032b7000 CR4=00000690
    ...

The page directory is sane:

    (gdb) x/4wx 0x32b7000
    0x32b7000:	0x03398063	0x03399063	0x0339a063	0x0339b063
    (gdb) x/4wx 0x3398000
    0x3398000:	0x00000163	0x00001163	0x00002163	0x00003163
    (gdb) x/4wx 0x3399000
    0x3399000:	0x00400003	0x00401003	0x00402003	0x00403003

but our particular page directory entry is empty:

    (gdb) x/1wx 0x32b7000 + (0x724e000 >> 22) * 4
    0x32b7070:	0x00000000

[ It appears that you can skate past this issue if you don't receive
  any interrupts while the bogus GDT pointer is loaded, or if you avoid
  reloading the segment registers in general.

  Andy Lutomirski provides some additional insight:

   "AFAICT it's entirely permissible for the GDTR and/or LDT
    descriptor to point to unmapped memory.  Any attempt to use them
    (segment loads, interrupts, IRET, etc) will try to access that memory
    as if the access came from CPL 0 and, if the access fails, will
    generate a valid page fault with CR2 pointing into the GDT or
    LDT."

  Up until commit 23a0d4e8fa6d ("efi: Disable interrupts around EFI
  calls, not in the epilog/prolog calls") interrupts were disabled
  around the prolog and epilog calls, and the functional GDT was
  re-installed before interrupts were re-enabled.

  Which explains why no one has hit this issue until now. ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Laszlo Ersek <lersek@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Updated changelog. ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Florian Fainelli 8d4b965243 ARM: orion: Fix DSA platform device after mvmdio conversion
commit d836ace65ee98d7079bc3c5afdbcc0e27dca20a3 upstream.

DSA expects the host_dev pointer to be the device structure associated
with the MDIO bus controller driver. First commit breaking that was
c3a07134e6 ("mv643xx_eth: convert to use the Marvell Orion MDIO
driver"), and then, it got completely under the radar for a while.

Reported-by: Frans van de Wiel <fvdw@fvdw.eu>
Fixes: c3a07134e6 ("mv643xx_eth: convert to use the Marvell Orion MDIO driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:08 -05:00
Marek Szyprowski 1aac1dc988 ARM: 8427/1: dma-mapping: add support for offset parameter in dma_mmap()
commit 7e31210349e9e03a9a4dff31ab5f2bc83e8e84f5 upstream.

IOMMU-based dma_mmap() implementation lacked proper support for offset
parameter used in mmap call (it always assumed that mapping starts from
offset zero). This patch adds support for offset parameter to IOMMU-based
implementation.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:07 -05:00
Marek Szyprowski 98cc6d31fa ARM: 8426/1: dma-mapping: add missing range check in dma_mmap()
commit 371f0f085f629fc0f66695f572373ca4445a67ad upstream.

dma_mmap() function in IOMMU-based dma-mapping implementation lacked
a check for valid range of mmap parameters (offset and buffer size), what
might have caused access beyond the allocated buffer. This patch fixes
this issue.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-09 13:40:07 -05:00
Tim Murray 0482650b09 bullhead: Enable SCHED_HRTICK
SCHED_HRTICK seems to improve system responsiveness under load
significantly. Enable it for now while keeping CONFIG_HZ at 100.

Bug: 25745871
Change-Id: Ieb57610a3d513b727debee012301e9523bd4ec3c
2015-11-19 10:48:25 -08:00
Nicolas Pitre 676a883ce5 ARM64: add IPI tracepoints
The strings used to list IPIs in /proc/interrupts are reused for tracing
purposes.

While at it, the code is slightly cleaned up so the ipi_types array
indices are no longer offset by IPI_RESCHEDULE whose value is 0 anyway.

Link: http://lkml.kernel.org/p/1406318733-26754-5-git-send-email-nicolas.pitre@linaro.org

Change-Id: Ie161065c84f210510ca1591a33425c256d4625f3
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-17 14:59:39 -08:00
Greg Kroah-Hartman 9754f73ba0 xen: fix backport of previous kexec patch
Fixes the backport of 0b34a166f291d255755be46e43ed5497cdd194f2 upstream

Commit 0b34a166f291d255755be46e43ed5497cdd194f2 "x86/xen: Support
kexec/kdump in HVM guests by doing a soft reset" has been added to the
4.2-stable tree" needed to correct the CONFIG variable, as
CONFIG_KEXEC_CORE only showed up in 4.3.

Reported-by: David Vrabel <david.vrabel@citrix.com>
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09 10:12:59 -08:00
Will Deacon d0ecd78fd2 Revert "ARM64: unwind: Fix PC calculation"
commit 9702970c7bd3e2d6fecb642a190269131d4ac16c upstream.

This reverts commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63.

With this patch applied, we were the only architecture making this sort
of adjustment to the PC calculation in the unwinder. This causes
problems for ftrace, where the PC values are matched against the
contents of the stack frames in the callchain and fail to match any
records after the address adjustment.

Whilst there has been some effort to change ftrace to workaround this,
those patches are not yet ready for mainline and, since we're the odd
architecture in this regard, let's just step in line with other
architectures (like arch/arm/) for now.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09 10:12:58 -08:00
Vasant Hegde f61004603b powerpc/rtas: Validate rtas.entry before calling enter_rtas()
commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream.

Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Fixes: 55190f8878 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-11-09 10:12:58 -08:00
Dave Kleikamp 29e5b42437 crypto: sparc - initialize blkcipher.ivsize
commit a66d7f724a96d6fd279bfbd2ee488def6b081bea upstream.

Some of the crypto algorithms write to the initialization vector,
but no space has been allocated for it. This clobbers adjacent memory.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27 09:44:49 +09:00
Geert Uytterhoeven 31e29b7bfc m68k/uaccess: Fix asm constraints for userspace access
commit 631d8b674f5f8235e9cb7e628b0fe9e5200e3158 upstream.

When compiling a MMU kernel with CPU_HAS_ADDRESS_SPACES=n (e.g. "MMU=y
allnoconfig": "echo CONFIG_MMU=y > allno.config && make KCONFIG_ALLCONFIG=1
allnoconfig"), we use plain "move" instead of "moves", and I got:

  CC      arch/m68k/lib/uaccess.o
{standard input}: Assembler messages:
{standard input}:47: Error: operands mismatch -- statement `move.b %a0,(%a1)' ignored

This happens because plain "move" doesn't support byte transfers between
memory and address registers, while "moves" does.

Fix the asm constraints for __generic_copy_from_user(),
__generic_copy_to_user(), and __clear_user() to only use data registers
when accessing userspace.

Also, relax the asm constraints for 16-bit userspace accesses in
__put_user() and __get_user(), as both "move" and "moves" do support
such transfers between memory and address registers.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-27 09:44:49 +09:00
Andi Kleen f7c7bb9f06 x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
commit ff47ab4ff3cddfa7bc1b25b990e24abe2ae474ff upstream.

The 64bit __copy_{from,to}_user_inatomic always called
copy_from_user_generic, but skipped the special optimizations for 1/2/4/8
byte accesses.

This especially hurts the futex call, which accesses the 4 byte futex
user value with a complicated fast string operation in a function call,
instead of a single movl.

Use __copy_{from,to}_user for _inatomic instead to get the same
optimizations. The only problem was the might_fault() in those functions.
So move that to new wrapper and call __copy_{f,t}_user_nocheck()
from *_inatomic directly.

32bit already did this correctly by duplicating the code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:53 -07:00
Andreas Schwab 46409b7e4b m68k: Define asmlinkage_protect
commit 8474ba74193d302e8340dddd1e16c85cc4b98caf upstream.

Make sure the compiler does not modify arguments of syscall functions.
This can happen if the compiler generates a tailcall to another
function.  For example, without asmlinkage_protect sys_openat is compiled
into this function:

sys_openat:
	clr.l %d0
	move.w 18(%sp),%d0
	move.l %d0,16(%sp)
	jbra do_sys_open

Note how the fourth argument is modified in place, modifying the register
%d4 that gets restored from this stack slot when the function returns to
user-space.  The caller may expect the register to be unmodified across
system calls.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:53 -07:00
Mark Salyzyn 339cb27fc5 arm64: readahead: fault retry breaks mmap file read random detection
commit 569ba74a7ba69f46ce2950bf085b37fea2408385 upstream.

This is the arm64 portion of commit 45cac65b0f ("readahead: fault
retry breaks mmap file read random detection"), which was absent from
the initial port and has since gone unnoticed. The original commit says:

> .fault now can retry.  The retry can break state machine of .fault.  In
> filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
> try, since the page is in page cache now, ra->mmap_miss is decreased.  And
> these are done in one fault, so we can't detect random mmap file access.
>
> Add a new flag to indicate .fault is tried once.  In the second try, skip
> ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

With this change, Mark reports that:

> Random read improves by 250%, sequential read improves by 40%, and
> random write by 400% to an eMMC device with dm crypto wrapped around it.

Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Riley Andrews <riandrews@android.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:52 -07:00
Paul Mackerras 892e053cd7 powerpc/MSI: Fix race condition in tearing down MSI interrupts
commit e297c939b745e420ef0b9dc989cb87bda617b399 upstream.

This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts.  The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.

The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using).  CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.

To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping().  Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.

The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d7 ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.

Fixes: 05af7bd2d7 ("[POWERPC] MPIC U3/U4 MSI backend")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:52 -07:00
James Hogan 1335a48ac2 MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
commit 53960059d56ecef67d4ddd546731623641a3d2d1 upstream.

If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
will cause DMA memory allocated for devices with a 32..63-bit
coherent_dma_mask to fall back to using __GFP_DMA, even though there may
only be 32-bits of physical address available anyway.

Correct that case to compare against a mask the size of phys_addr_t
instead of always using a 64-bit mask.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Fixes: a2e715a86c ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9610/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:52 -07:00
Vitaly Kuznetsov 919845cb33 x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
commit 0b34a166f291d255755be46e43ed5497cdd194f2 upstream.

Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:

  - Bound event channels.
  - Registered vcpu_info.
  - PIRQ/emuirq mappings.
  - shared_info frame after XENMAPSPACE_shared_info operation.
  - Active grant mappings.

Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.

Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.'  which gives a clue to what the problem is and
eliminates false expectations.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:50 -07:00
Stephen Smalley 4f9d53595c x86/mm: Set NX on gap between __ex_table and rodata
commit ab76f7b4ab2397ffdd2f1eb07c55697d19991d10 upstream.

Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables.  Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

  Before:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

  After:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Dirk Müller fb7eff9cfd Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
commit d2922422c48df93f3edff7d872ee4f3191fefb08 upstream.

The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).

The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug.  This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.

Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
David Woodhouse 9870892fc2 x86/platform: Fix Geode LX timekeeping in the generic x86 build
commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream.

In 2007, commit 07190a08ee ("Mark TSC on GeodeLX reliable")
bypassed verification of the TSC on Geode LX. However, this code
(now in the check_system_tsc_reliable() function in
arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
set.

OpenWRT has recently started building its generic Geode target
for Geode GX, not LX, to include support for additional
platforms. This broke the timekeeping on LX-based devices,
because the TSC wasn't marked as reliable:
https://dev.openwrt.org/ticket/20531

By adding a runtime check on is_geode_lx(), we can also include
the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
fixing the problem.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marcelo Tosatti <marcelo@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Shaohua Li ffebdff74f x86/apic: Serialize LVTT and TSC_DEADLINE writes
commit 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 upstream.

The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
guaranteed that the write to LVTT has reached the APIC before the
TSC_DEADLINE MSR is written. In such a case the write to the MSR is
ignored and as a consequence the local timer interrupt never fires.

The SDM decribes this issue for xAPIC and x2APIC modes. The
serialization methods recommended by the SDM differ.

xAPIC:
 "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
  2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
  3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
  4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."

x2APIC:
 "To allow for efficient access to the APIC registers in x2APIC mode,
  the serializing semantics of WRMSR are relaxed when writing to the
  APIC registers. Thus, system software should not use 'WRMSR to APIC
  registers in x2APIC mode' as a serializing instruction. Read and write
  accesses to the APIC registers will occur in program order. A WRMSR to
  an APIC register may complete before all preceding stores are globally
  visible; software can prevent this by inserting a serializing
  instruction, an SFENCE, or an MFENCE before the WRMSR."

The xAPIC method is to just wait for the memory mapped write to hit
the LVTT by checking whether the MSR write has reached the hardware.
There is no reason why a proper MFENCE after the memory mapped write would
not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
xAPIC case as well.

Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
support.

[ tglx: Massaged the changelog ]

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: <Kernel-team@fb.com>
Cc: <lenb@kernel.org>
Cc: <fenghua.yu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Ard Biesheuvel a18006390c ARM: 8429/1: disable GCC SRA optimization
commit a077224fd35b2f7fbc93f14cf67074fc792fbac2 upstream.

While working on the 32-bit ARM port of UEFI, I noticed a strange
corruption in the kernel log. The following snprintf() statement
(in drivers/firmware/efi/efi.c:efi_md_typeattr_format())

	snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",

was producing the following output in the log:

	|    |   |   |   |    |WB|WT|WC|UC]
	|    |   |   |   |    |WB|WT|WC|UC]
	|    |   |   |   |    |WB|WT|WC|UC]
	|RUN|   |   |   |    |WB|WT|WC|UC]*
	|RUN|   |   |   |    |WB|WT|WC|UC]*
	|    |   |   |   |    |WB|WT|WC|UC]
	|RUN|   |   |   |    |WB|WT|WC|UC]*
	|    |   |   |   |    |WB|WT|WC|UC]
	|RUN|   |   |   |    |   |   |   |UC]
	|RUN|   |   |   |    |   |   |   |UC]

As it turns out, this is caused by incorrect code being emitted for
the string() function in lib/vsprintf.c. The following code

	if (!(spec.flags & LEFT)) {
		while (len < spec.field_width--) {
			if (buf < end)
				*buf = ' ';
			++buf;
		}
	}
	for (i = 0; i < len; ++i) {
		if (buf < end)
			*buf = *s;
		++buf; ++s;
	}
	while (len < spec.field_width--) {
		if (buf < end)
			*buf = ' ';
		++buf;
	}

when called with len == 0, triggers an issue in the GCC SRA optimization
pass (Scalar Replacement of Aggregates), which handles promotion of signed
struct members incorrectly. This is a known but as yet unresolved issue.
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
case, it is causing the second while loop to be executed erroneously a
single time, causing the additional space characters to be printed.

So disable the optimization by passing -fno-ipa-sra.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-22 14:37:49 -07:00
Patrick Tjin 89ef8f4ebd arm/configs: bullhead: remove SysV IPC from kernel
System V IPCs are not compliant with Android's application lifecycle
because allocated resources are not freeable by the low memory killer.
This lead to global kernel resource leakage.

For example, there is no way to automatically release a SysV
semaphore allocated in the kernel when:
- a buggy or malicious process exits
- a non-buggy and non-malicious process crashes or is explicitly
  killed.

Killing processes automatically to make room for new ones is an
important part of Android's application lifecycle implementation.
This means that, even assuming only non-buggy and non-malicious
code, it is very likely that over time, the kernel global tables
used to implement SysV IPCs will fill up.

Bug: 24551430
Bug: 22300191

Signed-off-by: Patrick Tjin <pattjin@google.com>
Change-Id: I3ee30e0c3526114771479338817c725a1933fa59
2015-10-15 21:26:53 +00:00
Thierry Strudel 82d382b58d ARM64: dts: enable brake pattern in haptic
Change-Id: I998e63c93341d266c0bd4b8b09d7bcda92a2fd42
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2015-10-08 09:25:43 -07:00
Ajay Dudani e4c0ffe0e8 Revert "defconfig: bullhead: Disable CONFIG_BUS_AUTO_SUSPEND"
This reverts commit 0151fc7c0a.

Signed-off-by: Ajay Dudani <adudani@codeaurora.org>
2015-10-02 16:22:44 +00:00
daekwang.yoo c6644fd6eb bullhead_defconfig: enable ION alloc 4KB chunks
ion: Support an option to allocate buffers in 4KB chunks

For performance profiling, we want to create worst case
scenarios where system is quite fragmented and cannot
allocate buffers in chunks more than 4KB. Enable an
option which forcibly allocates in 4KB chunks only.

Change-Id: I8d9415940c89642f36ef942060f132d7ed200fec
Signed-off-by: daekwang.yoo <daekwang.yoo@lge.com>
2015-10-01 16:07:06 +00:00
Alexei Starovoitov 9f6191daa5 x86: bpf_jit: fix compilation of large bpf programs
commit 3f7352bf21f8fd7ba3e2fcef9488756f188e12be upstream.

x86 has variable length encoding. x86 JIT compiler is trying
to pick the shortest encoding for given bpf instruction.
While doing so the jump targets are changing, so JIT is doing
multiple passes over the program. Typical program needs 3 passes.
Some very short programs converge with 2 passes. Large programs
may need 4 or 5. But specially crafted bpf programs may hit the
pass limit and if the program converges on the last iteration
the JIT compiler will be producing an image full of 'int 3' insns.
Fix this corner case by doing final iteration over bpf program.

Fixes: 0a14842f5a ("net: filter: Just In Time compiler for x86-64")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:34 +02:00
Helge Deller 706ad8dcb5 parisc: Filter out spurious interrupts in PA-RISC irq handler
commit b1b4e435e4ef7de77f07bf2a42c8380b960c2d44 upstream.

When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.

So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).

This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.

This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware.  But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.

For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash

Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:32 +02:00
Minfei Huang 55b9029eab x86/mm: Initialize pmd_idx in page_table_range_init_count()
commit 9962eea9e55f797f05f20ba6448929cab2a9f018 upstream.

The variable pmd_idx is not initialized for the first iteration of the
for loop.

Assign the proper value which indexes the start address.

Fixes: 719272c45b 'x86, mm: only call early_ioremap_page_table_range_init() once'
Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Cc: tony.luck@intel.com
Cc: wangnan0@huawei.com
Cc: david.vrabel@citrix.com
Reviewed-by: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1436703522-29552-1-git-send-email-mhuang@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:31 +02:00
Thomas Huth 2ba90c0e06 powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers
commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream.

The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
 [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition - known as "fast" sensors.

The EPOW sensor is defined to be "fast" in sPAPR - mpe.

Fixes: 587f83e8dd ("powerpc/pseries: Use rtas_get_sensor in RAS code")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:30 +02:00
Michael Ellerman 36e5789bc7 powerpc/mm: Fix pte_pagesize_index() crash on 4K w/64K hash
commit 74b5037baa2011a2799e2c43adde7d171b072f9e upstream.

The powerpc kernel can be built to have either a 4K PAGE_SIZE or a 64K
PAGE_SIZE.

However when built with a 4K PAGE_SIZE there is an additional config
option which can be enabled, PPC_HAS_HASH_64K, which means the kernel
also knows how to hash a 64K page even though the base PAGE_SIZE is 4K.

This is used in one obscure configuration, to support 64K pages for SPU
local store on the Cell processor when the rest of the kernel is using
4K pages.

In this configuration, pte_pagesize_index() is defined to just pass
through its arguments to get_slice_psize(). However pte_pagesize_index()
is called for both user and kernel addresses, whereas get_slice_psize()
only knows how to handle user addresses.

This has been broken forever, however until recently it happened to
work. That was because in get_slice_psize() the large kernel address
would cause the right shift of the slice mask to return zero.

However in commit 7aa0727f33 ("powerpc/mm: Increase the slice range to
64TB"), the get_slice_psize() code was changed so that instead of a
right shift we do an array lookup based on the address. When passed a
kernel address this means we index way off the end of the slice array
and return random junk.

That is only fatal if we happen to hit something non-zero, but when we
do return a non-zero value we confuse the MMU code and eventually cause
a check stop.

This fix is ugly, but simple. When we're called for a kernel address we
return 4K, which is always correct in this configuration, otherwise we
use the slice mask.

Fixes: 7aa0727f33 ("powerpc/mm: Increase the slice range to 64TB")
Reported-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:30 +02:00
Will Deacon 8a31f0de7f arm64: head.S: initialise mdcr_el2 in el2_setup
commit d10bcd473301888f957ec4b6b12aa3621be78d59 upstream.

When entering the kernel at EL2, we fail to initialise the MDCR_EL2
register which controls debug access and PMU capabilities at EL1.

This patch ensures that the register is initialised so that all traps
are disabled and all the PMU counters are available to the host. When a
guest is scheduled, KVM takes care to configure trapping appropriately.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:29 +02:00
Will Deacon a507adf4f0 arm64: compat: fix vfp save/restore across signal handlers in big-endian
commit bdec97a855ef1e239f130f7a11584721c9a1bf04 upstream.

When saving/restoring the VFP registers from a compat (AArch32)
signal frame, we rely on the compat registers forming a prefix of the
native register file and therefore make use of copy_{to,from}_user to
transfer between the native fpsimd_state and the compat_vfp_sigframe.

Unfortunately, this doesn't work so well in a big-endian environment.
Our fpsimd save/restore code operates directly on 128-bit quantities
(Q registers) whereas the compat_vfp_sigframe represents the registers
as an array of 64-bit (D) registers. The architecture packs the compat D
registers into the Q registers, with the least significant bytes holding
the lower register. Consequently, we need to swap the 64-bit halves when
converting between these two representations on a big-endian machine.

This patch replaces the __copy_{to,from}_user invocations in our
compat VFP signal handling code with explicit __put_user loops that
operate on 64-bit values and swap them accordingly.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-01 12:07:29 +02:00