Commit Graph

16812 Commits

Author SHA1 Message Date
Peter Zijlstra 1f6661e256 perf: Fix fasync handling on inherited events
commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream.

Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.

Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.

Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho deMelo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13 09:07:59 -07:00
Joonwoo Park b715e2a8ca sched: look for least busy and fallback CPU only when it's needed
Function best_small_task_cpu() has bias on mostly idle CPUs and shallow
cstate CPUs.  Thus chance of needing to find the least busy or the least
power cost fallback CPU is quite rare typically.  At present, however,
the function finds those two CPUs always unnecessarily for most of time.

Optimize the function by amending it to look for the least busy CPU and
the least power cost fallback CPU only when those are in need.  This change
is solely for optimization and doesn't make functional changes.

CRs-fixed: 849655
Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-09-11 00:33:58 -07:00
Patrick Daly 8a397a7081 Revert "rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks"
This reverts commit c0f4dfd4f9.

The above change allows the CPU to enter idle with RCU callbacks pending
for the durations specified in rcu_idle_gp_delay or rcu_idle_lazy_gp_delay.
This was causing a throughput decrease in synchronize_rcu(), as mentioned
by the original author.
These decreases were determined to be too large, as sub- 1 jiffy
synchronize_rcu() latency in cpu idle scenarios was desired.

Reverting this change change in order to start the force_quiescent_state()
state machine upon entering idle, rather than on the scheduler tick()

CRs-Fixed: 887035
Change-Id: I61865131dad751ec0fa14653b2b08b8bada4d061
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2015-09-11 00:33:57 -07:00
Tim Murray 068a452ae3 sched: Return fallback CPU when no CPUs are available.
Given certain combinations of hotplug and cpusets, it is possible
to have a cpumask with no valid CPUs, including the CPU where the
task is currently running.

Fixes a regression caused by 93e62b823, because we will run the
body of the loop with an invalid CPU index instead of falling
through to return fallback_cpu.

bug 23748688

Change-Id: I621e4f6ab12f9fe932dfd3cd12d705d5617836bc
2015-09-03 22:56:18 +00:00
Jongrak Kwon 9b1420afd1 panic: resume console on panic
Device rebooted by kernel panic but no panic message if
console was suspended. This patch enables console if it's
suspended to get the panic messages.

Bug: 23629167

Change-Id: I61220b7de74f51677f3c0c8117bd8468ca9abcd3
Signed-off-by: Jongrak Kwon <jongrak.kwon@lge.com>
2015-09-02 23:13:30 -07:00
Riley Andrews 8882445f98 sched: fair: Change the synchronous wakeup logic in hmp.
If doing a synchronous wakeup keep the woken task on the same
core as the waker, ignoring task history (i.e. power) on task placement.

Change-Id: I911702ef2ed9e3705d19591da1f0e33d2198c1df
2015-09-01 22:35:14 +00:00
Joonwoo Park a6bd7ae282 sched: inline function scale_load_to_cpu()
Inline relatively small and frequently used function scale_load_to_cpu().

CRs-fixed: 849655
Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-08-24 17:45:31 -07:00
Joonwoo Park 93e62b823f sched: iterate search CPUs starting from prev_cpu for optimization
Function best_small_task_cpu() looks for a mostly idle CPU and returns it
as the best CPU for a given small task.  At present, however, it cannot
break the CPU search loop when the function found a mostly idle CPU but
continues to iterate CPU search loop because the function needs to find
and return the given task's previous CPU as the best CPU to avoid
unnecessary task migration when the previous CPU is mostly idle.

Optimize the function best_small_task_cpu() to iterate search CPUs
starting from the given task's CPU so it can break the loop as soon as
mostly idle CPU found.  This optimization saves few hundreds ns spent by
the function and doesn't make any functional change.

CRs-fixed: 849655
Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-08-24 17:45:23 -07:00
Syed Rameez Mustafa da6603f1ae sched: Optimize the select_best_cpu() "for" loop
select_best_cpu() is agnostic of the hardware topology. This means that
certain functions such as task_will_fit() and skip_cpu() are run
unnecessarily for every CPU in a cluster whereas they need to run only
once per cluster. Reduce the execution time of select_best_cpu() by
ensuring these functions run only once per cluster. The frequency domain
mask is used to identify CPUs that fall in the same cluster.

CRs-fixed: 849655
Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: added continue after clearing search_cpus.
 fixed indentations with space.  fixed skip_cpu() to return true when rq ==
 task_rq.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-08-24 17:45:18 -07:00
Syed Rameez Mustafa 6f717a4bf8 sched: Optimize select_best_cpu() to reduce execution time
select_best_cpu() is a crucial wakeup routine that determines the
time taken by the scheduler to wake up a task. Optimize this routine
to get higher performance. The following changes have been made as
part of the optimization listed in order of how they built on top of
one another:

* Several routines called by select_best_cpu() recalculate task load
  and CPU load even though these are already known quantities. For
  example mostly_idle_cpu_sync() calculates CPU load; task_will_fit()
  calculates task load before spill_threshold_crossed() recalculates
  both. Remove these redundant calculations by moving the task load
  and CPU load computations to the select_best_cpu() 'for' loop and
  passing to any functions that need the information.

* Rewrite best_small_task_cpu() to avoid the existing two pass
  approach. The two pass approach was only in place to find the
  minimum power cluster for small task placement. This information
  can easily be established by looking at runqueue capacities. The
  cluster with not the highest capacity constitutes the minimum power
  cluster. A special CPU mask is called the mpc_mask required to safeguard
  against undue side effects on SMP systems. Also terminate the function
  early if the previous CPU is found to be mostly_idle.

* Reorganize code to ensure that no unnecessary computations or
  variable assignments are done. For example there is no need to
  compute CPU load if that information does not end up getting used
  in any iteration of the 'for' loop.

* The tick logic for EA migrations unnecessarily checks for the power
  of all CPUs only for skip_cpu() to throw away the result later.
  Ensure that for EA we only check CPUs within the same cluster
  and avoid running select_best_cpu() whenever possible.

CRs-fixed: 849655
Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask.
 added a comment about prerequisite of lower_power_cpu_available().
 s/struct rq * rq/struct rq *rq/.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-08-24 17:45:09 -07:00
Joonwoo Park dde9a3644f sched: avoid CPUs with high irq activity for non-small tasks
The irq-aware scheduler is to achieve better performance by avoiding task
placement to the CPUs which have high irq activity.  However current
scheduler places tasks to the CPUs which are loaded by irq activity
preferably as opposed to what it is meant to be when the task is non-small.
This is suboptimal for both power and performance.
Fix task placement algorithm to avoid CPUs with significant irq activities.

Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-08-24 17:45:04 -07:00
Naveen Ramaraj e4ba1e2775 sched: Do not track cpu power for tasks if arch does not support it
This fixes the undefined references to acct_update_power() when
building for ARCH=um introduced by commit: c0a74c5d

Bug: 21631098
Change-Id: Id8565f02e141395a1ffb269de09a062680552090
Signed-off-by: Naveen Ramaraj <nramaraj@codeaurora.org>
2015-08-22 00:02:28 -07:00
Peter Zijlstra 4adac624b9 hrtimer: Allow concurrent hrtimer_start() for self restarting timers
Because we drop cpu_base->lock around calling hrtimer::function, it is
possible for hrtimer_start() to come in between and enqueue the timer.

If hrtimer::function then returns HRTIMER_RESTART we'll hit the BUG_ON
because HRTIMER_STATE_ENQUEUED will be set.

Since the above is a perfectly valid scenario, remove the BUG_ON and
make the enqueue_hrtimer() call conditional on the timer not being
enqueued already.

NOTE: in that concurrent scenario its entirely common for both sites
to want to modify the hrtimer, since hrtimers don't provide
serialization themselves be sure to provide some such that the
hrtimer::function and the hrtimer_start() caller don't both try and
fudge the expiration state at the same time.

To that effect, add a WARN when someone tries to forward an already
enqueued timer, the most common way to change the expiry of self
restarting timers. Ideally we'd put the WARN in everything modifying
the expiry but most of that is inlines and we don't need the bloat.

Change-Id: I819cca2a35430886b77ac8d43a6fbdcfd8b2b238
Fixes: 2d44ae4d71 ("hrtimer: clean up cpu->base locking tricks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20150415113105.GT5029@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

(cherry picked from commit 5de2755c8c8b3a6b8414870e2c284914a2b42e4d)
Signed-off-by: Thierry Strudel <tstrudel@google.com>
2015-08-21 23:27:45 -07:00
Amanieu d'Antras a6bb935312 signal: fix information leak in copy_siginfo_from_user32
commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream.

This function can leak kernel stack data when the user siginfo_t has a
positive si_code value.  The top 16 bits of si_code descibe which fields
in the siginfo_t union are active, but they are treated inconsistently
between copy_siginfo_from_user32, copy_siginfo_to_user32 and
copy_siginfo_to_user.

copy_siginfo_from_user32 is called from rt_sigqueueinfo and
rt_tgsigqueueinfo in which the user has full control overthe top 16 bits
of si_code.

This fixes the following information leaks:
x86:   8 bytes leaked when sending a signal from a 32-bit process to
       itself. This leak grows to 16 bytes if the process uses x32.
       (si_code = __SI_CHLD)
x86:   100 bytes leaked when sending a signal from a 32-bit process to
       a 64-bit process. (si_code = -1)
sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
       64-bit process. (si_code = any)

parsic and s390 have similar bugs, but they are not vulnerable because
rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
to a different process.  These bugs are also fixed for consistency.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16 20:51:42 -07:00
Amanieu d'Antras 16a49557bc signal: fix information leak in copy_siginfo_to_user
commit 26135022f85105ad725cda103fa069e29e83bd16 upstream.

This function may copy the si_addr_lsb, si_lower and si_upper fields to
user mode when they haven't been initialized, which can leak kernel
stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals.  This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-16 20:51:42 -07:00
Riley Andrews e1798921cb mutex: Add a delay into the SPIN_ON_OWNER wait loop.
On arm systems the spin on owner optimization can intermittently cause a
lockup that's usually as long as the waiting thread's cpu timeslice. The
repeated mutex aquisitions + atomics in a single spinning thread can
completely lock out the owner from releasing the kernel mutex. The
owner needs to acquire a spinlock on the relase path and this spinlock
can share a monitor with the other locks and atomics on the waiter path.
Rate limit the waiter so that the thread releasing the mutex never
is starved.

Bug 23036902

Change-Id: Ie1b64275a0c6141f94faaf3e63fcbf9b5438140c
2015-08-11 23:00:58 +00:00
Thomas Gleixner dcc2305a48 genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD
commit 75a06189fc508a2acf470b0b12710362ffb2c4b1 upstream.

The resend mechanism happily calls the interrupt handler of interrupts
which are marked IRQ_NESTED_THREAD from softirq context. This can
result in crashes because the interrupt handler is not the proper way
to invoke the device handlers. They must be invoked via
handle_nested_irq.

Prevent the resend even if the interrupt has no valid parent irq
set. Its better to have a lost interrupt than a crashing machine.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-10 12:20:30 -07:00
Ruchi Kandoi 3056ec9456 wakeup_reason: use vsnprintf instead of snsprintf for vargs.
Bug: 22368519
Change-Id: I9dd0262111d50079a7df371716508ea66e272337
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
2015-08-06 05:40:30 +00:00
Steven Rostedt (Red Hat) f1bb130708 tracing: Have branch tracer use recursive field of task struct
commit 6224beb12e190ff11f3c7d4bf50cb2922878f600 upstream.

Fengguang Wu's tests triggered a bug in the branch tracer's start up
test when CONFIG_DEBUG_PREEMPT set. This was because that config
adds some debug logic in the per cpu field, which calls back into
the branch tracer.

The branch tracer has its own recursive checks, but uses a per cpu
variable to implement it. If retrieving the per cpu variable calls
back into the branch tracer, you can see how things will break.

Instead of using a per cpu variable, use the trace_recursion field
of the current task struct. Simply set a bit when entering the
branch tracing and clear it when leaving. If the bit is set on
entry, just don't do the tracing.

There's also the case with lockdep, as the local_irq_save() called
before the recursion can also trigger code that can call back into
the function. Changing that to a raw_local_irq_save() will protect
that as well.

This prevents the recursion and the inevitable crash that follows.

Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:45 -07:00
Steven Rostedt (Red Hat) 3f6ba7f88d tracing/filter: Do not allow infix to exceed end of string
commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 upstream.

While debugging a WARN_ON() for filtering, I found that it is possible
for the filter string to be referenced after its end. With the filter:

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

The filter_parse() function can call infix_get_op() which calls
infix_advance() that updates the infix filter pointers for the cnt
and tail without checking if the filter is already at the end, which
will put the cnt to zero and the tail beyond the end. The loop then calls
infix_next() that has

	ps->infix.cnt--;
	return ps->infix.string[ps->infix.tail++];

The cnt will now be below zero, and the tail that is returned is
already passed the end of the filter string. So far the allocation
of the filter string usually has some buffer that is zeroed out, but
if the filter string is of the exact size of the allocated buffer
there's no guarantee that the charater after the nul terminating
character will be zero.

Luckily, only root can write to the filter.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:45 -07:00
Steven Rostedt (Red Hat) c8b9a1fbd1 tracing/filter: Do not WARN on operand count going below zero
commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 upstream.

When testing the fix for the trace filter, I could not come up with
a scenario where the operand count goes below zero, so I added a
WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case
that it can happen (although the filter would be wrong).

 # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter

That is, a single operation without any operands will hit the path
where the WARN_ON_ONCE() can trigger. Although this is harmless,
and the filter is reported as a error. But instead of spitting out
a warning to the kernel dmesg, just fail nicely and report it via
the proper channels.

Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-08-03 09:29:45 -07:00
Joonwoo Park 5ed5e4f869 sched: fix incorrect prev_runnable_sum accounting with long ISR run
At present, when IRQ handler spans multiple scheduler windows, HMP
scheduler resets the IRQ CPU's prev_runnable_sum with its current max
capacity under the assumption that there is no other possible contribution
to the CPU's prev_runnable_sum.  This isn't correct as another CPU can
migrate tasks to the IRQ CPU.

Furthermore such incorrectness can trigger BUG_ON() if the migrated task's
prev_window is larger than migrating CPU's current capacity in following
scenario.

1. ISR on the power efficient CPU has been running for multiple windows.
2. A task which has prev_window higher than IRQ CPU's current capacity
   migrated to the IRQ CPU.
3. Servicing IRQ is done and the IRQ CPU resets its prev_runnable_rum =
   CPU's current capacity.
4. Before window rollover, the task on the IRQ CPU migrates to other CPU
   and fixes up source and destnation CPUs' busy time.
5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers as p->ravg.prev_window
   is larger than src_rq->prev_runnable_sum.

Fix such incorrectness by preserving prev_runnable_sum when ISR spans
multiple scheduler windows.  There is no need to reset it.

CRs-fixed: 828055
Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-07-21 16:55:35 -07:00
Micha Kalfon 7e20affe77 prctl: make PR_SET_TIMERSLACK_PID pid namespace aware
Make PR_SET_TIMERSLACK_PID consider pid namespace and resolve the
target pid in the caller's namespace. Otherwise, calls from pid
namespace other than init would fail or affect the wrong task.

Change-Id: I1da15196abc4096536713ce03714e99d2e63820a
Signed-off-by: Micha Kalfon <micha@cellrox.com>
Acked-by: Oren Laadan <orenl@cellrox.com>
2015-07-21 16:06:29 -07:00
Micha Kalfon 2f325b1f46 prctl: fix misplaced PR_SET_TIMERSLACK_PID case
The case clause for the PR_SET_TIMERSLACK_PID option was placed inside
the an internal switch statement for PR_MCE_KILL (see commits 37a591d4
and 8ae872f1) . This commit moves it to the right place.

Change-Id: I63251669d7e2f2aa843d1b0900e7df61518c3dea
Signed-off-by: Micha Kalfon <micha@cellrox.com>
Acked-by: Oren Laadan <orenl@cellrox.com>
2015-07-21 16:06:29 -07:00
Andy Lutomirski 7ae89c27e2 seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled.  Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.

To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.

For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters.  Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.

This will be a slight slowdown on some arches.  The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.

This is a prerequisite for making two-phase seccomp work cleanly.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
2015-07-06 17:16:22 -07:00
Guenter Roeck 13586040a7 seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock
Current upstream kernel hangs with mips and powerpc targets in
uniprocessor mode if SECCOMP is configured.

Bisect points to commit dbd952127d11 ("seccomp: introduce writer locking").
Turns out that code such as
	BUG_ON(!spin_is_locked(&list_lock));
can not be used in uniprocessor mode because spin_is_locked() always
returns false in this configuration, and that assert_spin_locked()
exists for that very purpose and must be used instead.

Fixes: dbd952127d11 ("seccomp: introduce writer locking")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
2015-07-06 17:16:21 -07:00
Kees Cook 6108b31b1d seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.

Changes across threads to the filter pointer includes a barrier.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:21 -07:00
Kees Cook 4ffce22eef seccomp: allow mode setting across threads
This changes the mode setting helper to allow threads to change the
seccomp mode from another thread. We must maintain barriers to keep
TIF_SECCOMP synchronized with the rest of the seccomp state.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c
2015-07-06 17:16:21 -07:00
Kees Cook bdac154942 seccomp: introduce writer locking
Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic.  However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:21 -07:00
Kees Cook 25b84f016a seccomp: split filter prep from check and apply
In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.

For extreme defensiveness, I've added a BUG_ON check for the calculated
size of the buffer allocation in case BPF_MAXINSN ever changes, which
shouldn't ever happen. The compiler should actually optimize out this
check since the test above it makes it impossible.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c
2015-07-06 17:16:20 -07:00
Kees Cook 8334b709bf sched: move no_new_privs into new atomic flags
Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:20 -07:00
Kees Cook c7ff43e528 seccomp: add "seccomp" syscall
This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:20 -07:00
Kees Cook c13c8c32dc seccomp: split mode setting routines
Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:20 -07:00
Kees Cook c14cdadc57 seccomp: extract check/assign mode helpers
To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:20 -07:00
Kees Cook 82e69a633c seccomp: create internal mode-setting function
In preparation for having other callers of the seccomp mode setting
logic, split the prctl entry point away from the core logic that performs
seccomp mode setting.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2015-07-06 17:16:19 -07:00
Eric Paris 4802a181c2 syscall_get_arch: remove useless function arguments
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
2015-07-06 17:16:19 -07:00
Rashika Kheria d9de37f398 kernel: Mark function as static in kernel/seccomp.c
Mark function as static in kernel/seccomp.c because it is not used
outside this file.

This eliminates the following warning in kernel/seccomp.c:
kernel/seccomp.c:296:6: warning: no previous prototype for ?seccomp_attach_user_filter? [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Drewry <wad@chromium.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2015-07-06 17:16:19 -07:00
Mark Grondona a20cc70d1e __ptrace_may_access() should not deny sub-threads
commit 73af963f9f3036dffed55c3a2898598186db1045 upstream.

__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current, this can can lead to surprising results.

For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable.  setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable).  After that get_dumpable() fails.

(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)

Change __ptrace_may_access() to use same_thread_group() instead of "task
== current".  Any security check is pointless when the tasks share the
same ->mm.

Signed-off-by: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-07-03 19:48:08 -07:00
Steven Rostedt 63dec31183 tracing: Have filter check for balanced ops
commit 2cf30dc180cea808077f003c5116388183e54f9e upstream.

When the following filter is used it causes a warning to trigger:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
 Modules linked in: bnep lockd grace bluetooth  ...
 CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
  0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
  0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
  ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
 Call Trace:
  [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
  [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
  [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
  [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81159065>] replace_preds+0x3c5/0x990
  [<ffffffff811596b2>] create_filter+0x82/0xb0
  [<ffffffff81159944>] apply_event_filter+0xd4/0x180
  [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
  [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
  [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
  [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
  [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
  [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
  [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
 ---[ end trace e11028bd95818dcd ]---

Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.

This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression

And give no kernel warning.

Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[ luis: backported to 3.16:
  - unconditionally decrement cnt as the OP_NOT logic was introduced only
    by e12c09cf3087 ("tracing: Add NOT to filtering logic") ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-29 12:08:34 -07:00
Wang Long 61a5c6bf34 ring-buffer-benchmark: Fix the wrong sched_priority of producer
commit 108029323910c5dd1ef8fa2d10da1ce5fbce6e12 upstream.

The producer should be used producer_fifo as its sched_priority,
so correct it.

Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com

Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-22 16:55:53 -07:00
Subbaraman Narayanamurthy b9b31537c1 kthread: Fix the race condition when kthread is parked
While stressing the CPU hotplug path, sometimes we hit a problem
as shown below.

[57056.416774] ------------[ cut here ]------------
[57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8
[57056.489245] Code: e594a000 eb085236 e15a0000 0a000000 (e7f001f2)
[57056.489259] ------------[ cut here ]------------
[57056.492840] kernel BUG at kernel/kernel/smpboot.c:134!
[57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[57056.519055] Modules linked in: wlan(O) mhi(O)
[57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: G        W  O 3.10.0-g3677c61-00008-g180c060 #1
[57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000
[57056.537991] PC is at smpboot_thread_fn+0x124/0x218
[57056.542750] LR is at smpboot_thread_fn+0x11c/0x218
[57056.547528] pc : [<c01931e8>]    lr : [<c01931e0>]    psr: 200f0013
[57056.547528] sp : f0e79f30  ip : 00000000  fp : 00000000
[57056.558983] r10: 00000001  r9 : 00000000  r8 : f0e78000
[57056.564192] r7 : 00000001  r6 : c1195758  r5 : f0e78000  r4 : f0e5fd00
[57056.570701] r3 : 00000001  r2 : f0e79f20  r1 : 00000000  r0 : 00000000

This issue was always seen in the context of "ksoftirqd". It seems to
be happening because of a potential race condition in __kthread_parkme
where just after completing the parked completion, before the
ksoftirqd task has been scheduled again, it can go into running state.

Fix this by waiting for the task state to parked after waiting the
parked completion.

CRs-Fixed: 659674
Change-Id: If3f0e9b706eeb5d30d5a32f84378d35bb03fe794
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
2015-06-19 15:54:09 -07:00
Riley Andrews 77464f4ec7 cpuset: Make cpusets restore on hotplug
This deliberately changes the behavior of the per-cpuset
cpus file to not be effected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that that cpu was part of the cpuset, that
cpu will be restored to the cpuset. The cpus files still
have to be hierachical, but the ranges no longer have to be out of
the currently online cpus, just the physically present cpus.

Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924
2015-06-19 13:59:27 -07:00
Riley Andrews 934373d935 cpuset: Add allow_attach hook for cpusets on android.
Change-Id: Ic1b61b2bbb7ce74c9e9422b5e22ee9078251de21
2015-06-19 13:59:20 -07:00
Patrick Tjin 8635fd59ea partial-resume: add missing mutex unlock
Change-Id: I0adc7ad8771b3c84b2f1f6e951e42463693df46c
Signed-off-by: Patrick Tjin <pattjin@google.com>
2015-06-19 10:25:15 -07:00
Ruchi Kandoi c0a74c5df7 sched: cpufreq: Adds a field cpu_power in the task_struct
cpu_power has been added to keep track of amount of power each task is
consuming. cpu_power is updated whenever stime and utime are updated for
a task. power is computed by taking into account the frequency at which
the current core was running and the current for cpu actively
running at hat frequency.

Bug: 21498425
Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
2015-06-18 23:36:19 -07:00
Nishanth Menon 614477b4d9 PM / QoS: Add debugfs support to view the list of constraints
PM QoS requests are notoriously hard to debug and made even
more so due to their highly dynamic nature. Having visibility
into the internal data representation per constraint allows
us to have much better appreciation of potential issues or
bad usage by drivers in the system.

So introduce for all classes of PM QoS, an entry in
/sys/kernel/debug/pm_qos that shall show all the current
requests as well as the snapshot of the value these requests
boil down to. For example:
==> /sys/kernel/debug/pm_qos/cpu_dma_latency <==
1: 4444: Active
2: 2000000000: Default
3: 2000000000: Default
4: 2000000000: Default
Type=Minimum, Value=4444, Requests: active=1 / total=4

==> /sys/kernel/debug/pm_qos/memory_bandwidth <==
Empty!

...

The actual value listed will have their meaning based
on the QoS it is on, the 'Type' indicates what logic
it would use to collate the information - Minimum,
Maximum, or Sum. Value is the collation of all requests.
This interface also compares the values with the defaults
for the QoS class and marks the ones that are
currently active.

Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-18 12:54:39 -07:00
Dmitry Shmidt d669b095d7 irq: pm: Remove unused variable
Change-Id: Ie4311b554628af878cd80fd0abc03b2be294f0bf
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2015-06-16 15:30:41 -07:00
Ruchi Kandoi c955dd12c0 suspend: Return error when pending wakeup source is found.
If a wakeup source is found to be pending in the last stage of suspend
after syscore suspend then the device doesn't suspend but the error is
not propogated which causes an error in the accounting for the number
of suspend aborts and successful suspends.

Bug: 18179405
Change-Id: Ib63b4ead755127eaf03e3b303aab3c782ad02ed1
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
2015-06-16 15:30:41 -07:00
Dmitry Shmidt ea3f235f71 PM: Reduce waiting for wakeup reasons to 100 ms
In 80% cases there is no need to wait, and in case
of timeout we continue to resume.

Change-Id: I6ae44e0ef6f7aa497f57fcd5f6e6bc83dc781852
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2015-06-16 15:30:41 -07:00
Iliyan Malchev 9bcc844f9e PM: wakeup_reasons: disable wakeup-reason deduction by default
Introduce a config item, CONFIG_DEDUCE_WAKEUP_REASONS, disabled by default.
Make CONFIG_PARTUALRESUME select it.

Change-Id: I7d831ff0a9dfe0a504824f4bc65ba55c4d92546b
Signed-off-by: Iliyan Malchev <malchev@google.com>
2015-06-16 15:30:40 -07:00
Ruchi Kandoi 28819bfd65 power: Avoids bogus error messages for the suspend aborts.
Avoids printing bogus error message "tasks refusing to freeze", in cases
where pending wakeup source caused the suspend abort.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Change-Id: I913ad290f501b31cd536d039834c8d24c6f16928
2015-06-16 15:30:40 -07:00
Iliyan Malchev f4e4de4826 PM: wakeup_reasons: fix race condition
log_possible_wakeup_reason() and stop_logging_wakeup_reasons() can race, as the
latter can be called from process context, and both can run on separate cores.

Change-Id: I306441d0be46dd4fe58c55cdc162f9d61a28c27d
Signed-off-by: Iliyan Malchev <malchev@google.com>
2015-06-16 15:30:40 -07:00
Dmitry Shmidt d520963582 Power: Report total suspend times from boot in suspend_since_boot
This node exports five values separated by space.
    From left to right:
    1. Amount of suspend/resume cycles
    2. Amount of suspend abort cycles
    3. Total time spent in suspend/resume process
    4. Total time in suspend abort process
    5. Total time spent sleep in suspend state

Change-Id: I5d71af80516dd451eceba28228bfbaff9681f7ab
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2015-06-16 15:30:40 -07:00
jinqian 8b051c3819 Power: Report suspend times from last_suspend_time
This node epxorts two values separated by space.
From left to right:
1. time spent in suspend/resume process
2. time spent sleep in suspend state

Change-Id: I2cb9a9408a5fd12166aaec11b935a0fd6a408c63
2015-06-16 15:30:39 -07:00
Iliyan Malchev 953bd19e2c PM: extend suspend_again mechanism to use partialresume
The old platform suspend_again callback overrides drivers' votes, such that if
it implemented and returns false, then we do not call the partialresume
handlers.  When it doesn't exists or returns true, then we also query the
registered drivers for consensus.

When a device resumes from suspend, the suspend/resume code invokes
partialresume to check to see if the set of wakeup interrupts all have matching
handlers. If this is not the case, the PM subsystem can continue to resume as
before.  If all of the wakeup sources have matching handlers, then those are
invoked in turn (and can block), and if all of them reach consensus that the
reason for the wakeup can be ignored, they say so to the PM subsystem, which
goes right back into suspend.

Signed-off-by: Iliyan Malchev <malchev@google.com>
Change-Id: Iaeb9ed78c4b5fb815c6e9c701233e703f481f962
2015-06-16 15:30:39 -07:00
Iliyan Malchev f2d448b992 power: add partial-resume framework
Partial resume refers to the concept of not waking up userspace when the kernel
comes out of suspend for certain types of events that we wish to discard.  An
example is a network packet that can be disacarded in the kernel, or spurious
wakeup event that we wish to ignore.  Partial resume allows drivers to register
callbacks, one one hand, and provides hooks into the PM's suspend/resume
mechanism, on the other.

When a device resumes from suspend, the core suspend/resume code invokes
partialresume to check to see if the set of wakeup interrupts all have
matching handlers. If this is not the case, the PM subsystem can continue to
resume as before.  If all of the wakeup sources have matching handlers, then
those are invoked in turn (and can block), and if all of them reach consensus
that the reason for the wakeup can be ignored, they say so to the PM subsystem,
which goes right back into suspend.  This latter support is implemented in a
separate change.

Signed-off-by: Iliyan Malchev <malchev@google.com>
Change-Id: Id50940bb22a550b413412264508d259f7121d442
2015-06-16 15:30:39 -07:00
Iliyan Malchev 573c7a8e9e PM: wakeup_reason: add functions to query and clear wakeup reasons
The query results are valid until the next PM_SUSPEND_PREPARE.

Change-Id: I6bc2bd47c830262319576a001d39ac9a994916cf
Signed-off-by: Iliyan Malchev <malchev@google.com>
2015-06-16 15:30:39 -07:00
Lorenzo Colitti 9a215ba404 Put device-specific code behind #ifndef CONFIG_UML.
This is required to run net_test, which builds for ARCH=um.

Change-Id: I7041756c057b913a09554b66cd817b4abaa712a6
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2015-06-16 15:30:38 -07:00
Dmitry Shmidt 3803d603fe power: Add check_wakeup_reason() to verify wakeup source irq
Wakeup reason is set before driver resume handlers are called.
It is cleared before driver suspend handlers are called, on
PM_SUSPEND_PREPARE.

Change-Id: I04218c9b0c115a7877e8029c73e6679ff82e0aa4
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2015-06-16 15:30:38 -07:00
Ruchi Kandoi 07be015b30 power: Adds functionality to log the last suspend abort reason.
Extends the last_resume_reason to log suspend abort reason. The abort
reasons will have "Abort:" appended at the start to distinguish itself
from the resume reason.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Change-Id: I3207f1844e3d87c706dfc298fb10e1c648814c5f
2015-06-16 15:30:38 -07:00
Theodore Ts'o 90c11314a2 sched: add bit_wait_io for 3.18 ext4 backport
Excerpted from commit 743162013: "sched: Remove proliferation of
wait_on_bit() action functions"

Change-Id: I56153d55a9af9f2911ed6ffb15d36ad89d45cd55
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
2015-06-15 15:09:46 -07:00
Ramakrishnan Ganesh 8c9b8b6409 Revert "sched: Use only partial wait time as task demand"
This reverts commit 2a7c208b28d7558a5e296cec662c691590b0652a.

Change-Id: Ie02645574fd93f4149aa6a3f355c71eb412c9bac
Signed-off-by: Ramakrishnan Ganesh <ramakris@codeaurora.org>
2015-05-28 12:17:00 -07:00
Steve Muckle 6ed14be2c5 sched: disable IRQs in update_min_max_capacity
IRQs must be disabled while locking runqueues since an
interrupt may cause a runqueue lock to be acquired.

CRs-fixed: 828598
Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2015-05-28 12:16:59 -07:00
Syed Rameez Mustafa a77f89168a sched: Use only partial wait time as task demand
The scheduler currently either considers a tasks entire wait time as
task demand or completely ignores wait time based on the tunable
sched_account_wait_time. Both approaches have their limitations,
however. The former artificially boosts tasks demand when it may not
actually be justified. With the latter, the scheduler runs the risk
of never being able to recognize true load (consider two CPU hogs on
a single little CPU). To achieve a compromise between these two
extremes, change the load tracking algorithm to only consider part of
a tasks wait time as its demand. The portion of wait time accounted
as demand is determined by each tasks percent load, i.e. a task that
waits for 10ms and has 60 % task load, only 6 ms of the wait will
contribute to task demand. This approach is more fair as the scheduler
now tries to determine how much of its wait time would a task actually
have been using the CPU if it had been executing. It ensures that tasks
with high demand continue to see most of the benefits of accounting
wait time as busy time, however, lower demand tasks don't experience a
disproportionately high boost to demand triggering unjustified big CPU
usage. Note that this new approach is only applicable to wait time
being considered as task demand and not wait time considered as CPU
busy time.

To achieve the above effect, ensure that anytime a task is waiting, its
runtime in every relevant window segment is appropriately adjusted using
its pct load.

Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-05-28 12:16:58 -07:00
Olav Haugan cb3b575990 sched/fair: Add irq load awareness to the tick CPU selection logic
IRQ load is not taken into account when determining whether a task
should be migrated to a different CPU.  A task that runs for a long time
could get stuck on CPU with high IRQ load causing degraded performance.

Add irq load awareness to the tick CPU selection logic.

CRs-fixed: 809119
Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2015-05-28 12:16:46 -07:00
choongryeol.lee c9b85f93b5 ARM64: bullhead: Add initial LGE Crash Handler
Change-Id: Ida38a2fafea95017dae95fbad72b69c709bd3b2c
Signed-off-by: kyuyoung.lee <kyuyoung.lee@lge.com>
Signed-off-by: choongryeol.lee <choongryeol.lee@lge.com>
2015-05-20 21:28:36 +00:00
Christoph Hellwig 5a4d93f39c revert "softirq: Add support for triggering softirq work on softirqs"
commit fc21c0cff2f425891b28ff6fb6b03b325c977428 upstream.

This commit was incomplete in that code to remove items from the per-cpu
lists was missing and never acquired a user in the 5 years it has been in
the tree.  We're going to implement what it seems to try to archive in a
simpler way, and this code is in the way of doing so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhuix.pan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-17 09:51:33 -07:00
Calvin Owens 61ea92b948 ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.

While debugging an issue with excessive softirq usage, I encountered the
following note in commit 3e339b5dae ("softirq: Use hotplug thread
infrastructure"):

    [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]

...but despite this note, the patch still calls RCU with IRQs disabled.

This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.

Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:27 +02:00
Oleg Nesterov ddb56eac0e ptrace: fix race between ptrace_resume() and wait_task_stopped()
commit b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed upstream.

ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
tracee->exit_code and then wake_up_state() changes tracee->state.  If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.

This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.

Test-case:

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <sys/ptrace.h>
	#include <pthread.h>
	#include <assert.h>

	int pid;

	void *waiter(void *arg)
	{
		int stat;

		for (;;) {
			assert(pid == wait(&stat));
			assert(WIFSTOPPED(stat));
			if (WSTOPSIG(stat) == SIGHUP)
				continue;

			assert(WSTOPSIG(stat) == SIGCONT);
			printf("ERR! extra/wrong report:%x\n", stat);
		}
	}

	int main(void)
	{
		pthread_t thread;

		pid = fork();
		if (!pid) {
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
			for (;;)
				kill(getpid(), SIGHUP);
		}

		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);

		for (;;)
			ptrace(PTRACE_CONT, pid, 0, SIGCONT);

		return 0;
	}

Note for stable: the bug is very old, but without 9899d11f65 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:24 +02:00
Steven Rostedt faf8db2e22 ring-buffer: Replace this_cpu_*() with __this_cpu_*()
commit 80a9b64e2c156b6523e7a01f2ba6e5d86e722814 upstream.

It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.

   101.356868: preempt_count_add <-ring_buffer_lock_reserve
   101.356870: preempt_count_sub <-ring_buffer_lock_reserve

The ring_buffer_lock_reserve has recursion protection that requires
accessing a per cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is suppose to prevent
recursion like this.

The generic version of this_cpu_read() and write() are:

 #define this_cpu_generic_read(pcp)					\
 ({	typeof(pcp) ret__;						\
	preempt_disable();						\
	ret__ = *this_cpu_ptr(&(pcp));					\
	preempt_enable();						\
	ret__;								\
 })

 #define this_cpu_generic_to_op(pcp, val, op)				\
 do {									\
	unsigned long flags;						\
	raw_local_irq_save(flags);					\
	*__this_cpu_ptr(&(pcp)) op val;					\
	raw_local_irq_restore(flags);					\
 } while (0)

Which is unacceptable for locations that know they are within preempt
disabled or interrupt disabled locations.

Paul McKenney stated that __this_cpu_() versions produce much better code on
other architectures than this_cpu_() does, if we know that the call is done in
a preempt disabled location.

I also changed the recursive_unlock() to use two local variables instead
of accessing the per_cpu variable twice.

Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home

Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-06 21:56:21 +02:00
Syed Rameez Mustafa d191fa671d sched: Update max_capacity when an entire cluster is hotplugged
When an entire cluster is hotplugged, the scheduler's notion of
max_capacity can get outdated. This introduces the following
inefficiencies in behavior:

* task_will_fit() does not return true on all tasks. Consequently
  all big tasks go through fallback CPU selection logic skipping
  C-state and power checks in select_best_cpu().

* During boost, migration_needed() return true unnecessarily
  causing an avoidable rerun of select_best_cpu().

* An unnecessary kick is sent to all little CPUs when boost is set.

* An opportunity for early bailout from nohz_kick_needed() is lost.

Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback
which indicates the last CPU in a cluster being hotplugged out. Also
modify update_min_max_capacity() to only iterate through online CPUs
instead of possible CPUs. While we can't guarantee the integrity of
the cpu_online_mask in the notifier callback, the scheduler will fix
up all state soon after any changes to the online mask.

The change does have one side effect; early termination from the
notifier callback when min_max_freq or max_possible_freq remain
unchanged is no longer possible. This is because when the last CPU
in a cluster is hot removed, only max_capacity is updated without
affecting min_max_freq or max_possible_freq. Therefore, when the
first CPU in the same cluster gets hot added at a later point
max_capacity must once again be recomputed despite there being no
change in min_max_freq or max_possible_freq.

Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-05-01 19:40:50 -07:00
Ian Maund faa9cc5339 This is the 3.10.73 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVFBE+AAoJEDjbvchgkmk+oTkP/j2ipSvgXghFEipZbOJUQkqC
 fa8elfoF7riTKpKOuDtDU2WI1ttCGYs5gmTNpd4KaEt23eJOQgVqIpV8GhAkW5Af
 NVyGhjF3dXNqpBkxnyuIkk5OLrNKGRNS2xpz1U254iGObYrK+tr62IzGPxEcPAhX
 Y+58xPVSjLtNdTJW3YLT3DohUbnbHG6Br9geI1IHtlxg1oDiTxtnX2FmOFzzDpP5
 qu8gnPIekg/+1EE46nEiq0C59AwC3aCzNxwlYe1Kd41SY3LUFF1eZMzmOnnwyI5K
 3FslAzT6x/sOmGJFTYrKjFA4GKsW67xHVkB/hp/Mu768RqxiQCxV4kgmPsAFLbXb
 D5qbNwr3i0iQ/9AaD7h8HJkxC/KHmszMux00L/mgZ3SGdGMEIBxHg+oP8+nP8V6C
 WfXKSWA94dpdRyULEfWdnKnUnp2860C7kt7ASTkOl8rIgU8HgaRqeu+U/KPM2ovD
 ZJtXPVB5UXCRuVAhZwbvvrLOY8UMZTnv2auAaeLYG8YptcvGeN5Z398/8qdV/z7c
 A9kOsgebs74X+lR3rbVgSDPQaq2AEiuIvtX77SfmrWXBXGmc99i9+PikuFggRprz
 cJm5bCM9DaHu/3b77X9Fwl7vnpReB0zPHiwTdH/p7OPMf5m1uQt7SqegC6btLPHs
 iYgjLd4oW+6uiV/2X1Vx
 =L+mC
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.73' into LA.BF64.1.2.9

This merge brings us up to date with upstream kernel.org tag v3.10.73.
As part of the conflict resolution, changes introduced by commit 72684eae7
("arm64: Fix up /proc/cpuinfo") have been intentionally dropped, as they
conflict with Android changes msm-3.10 kernel to solve the problems
in a different way. Since userspace readers of this file may depend on
the existing msm-3.10 implementation, it's left as-is for now. The
commit may later be introduced if it is found to not impact userspaces
paired with this kernel.

* commit 'v3.10.73' (264 commits):
  Linux 3.10.73
  target: Allow Write Exclusive non-reservation holders to READ
  target: Allow AllRegistrants to re-RESERVE existing reservation
  target: Fix R_HOLDER bit usage for AllRegistrants
  target/pscsi: Fix NULL pointer dereference in get_device_type
  iscsi-target: Avoid early conn_logout_comp for iser connections
  target: Fix reference leak in target_get_sess_cmd() error path
  ARM: at91: pm: fix at91rm9200 standby
  ipvs: rerouting to local clients is not needed anymore
  ipvs: add missing ip_vs_pe_put in sync code
  powerpc/smp: Wait until secondaries are active & online
  x86/vdso: Fix the build on GCC5
  x86/fpu: Drop_fpu() should not assume that tsk equals current
  x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()
  crypto: aesni - fix memory usage in GCM decryption
  libsas: Fix Kernel Crash in smp_execute_task
  xen-pciback: limit guest control of command register
  nilfs2: fix deadlock of segment constructor during recovery
  regulator: core: Fix enable GPIO reference counting
  regulator: Only enable disabled regulators on resume
  ALSA: hda - Treat stereo-to-mono mix properly
  ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic
  ALSA: hda - Set single_adc_amp flag for CS420x codecs
  ALSA: hda - Don't access stereo amps for mono channel widgets
  ALSA: hda - Fix built-in mic on Compaq Presario CQ60
  ALSA: control: Add sanity checks for user ctl id name string
  spi: pl022: Fix race in giveback() leading to driver lock-up
  tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send
  workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
  can: add missing initialisations in CAN related skbuffs
  Change email address for 8250_pci
  virtio_console: init work unconditionally
  fuse: notify: don't move pages
  fuse: set stolen page uptodate
  drm/radeon: drop setting UPLL to sleep mode
  drm/radeon: do a posting read in rs600_set_irq
  drm/radeon: do a posting read in si_set_irq
  drm/radeon: do a posting read in r600_set_irq
  drm/radeon: do a posting read in r100_set_irq
  drm/radeon: do a posting read in evergreen_set_irq
  drm/radeon: fix DRM_IOCTL_RADEON_CS oops
  tcp: make connect() mem charging friendly
  net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour
  tcp: fix tcp fin memory accounting
  Revert "net: cx82310_eth: use common match macro"
  rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
  caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
  inet_diag: fix possible overflow in inet_diag_dump_one_icsk()
  rds: avoid potential stack overflow
  net: sysctl_net_core: check SNDBUF and RCVBUF for min length
  sparc64: Fix several bugs in memmove().
  sparc: Touch NMI watchdog when walking cpus and calling printk
  sparc: perf: Make counting mode actually work
  sparc: perf: Remove redundant perf_pmu_{en|dis}able calls
  sparc: semtimedop() unreachable due to comparison error
  sparc32: destroy_context() and switch_mm() needs to disable interrupts.
  Linux 3.10.72
  ath5k: fix spontaneus AR5312 freezes
  ACPI / video: Load the module even if ACPI is disabled
  drm/radeon: fix 1 RB harvest config setup for TN/RL
  Drivers: hv: vmbus: incorrect device name is printed when child device is unregistered
  HID: fixup the conflicting keyboard mappings quirk
  HID: input: fix confusion on conflicting mappings
  staging: comedi: cb_pcidas64: fix incorrect AI range code handling
  dm snapshot: fix a possible invalid memory access on unload
  dm: fix a race condition in dm_get_md
  dm io: reject unsupported DISCARD requests with EOPNOTSUPP
  dm mirror: do not degrade the mirror on discard error
  staging: comedi: comedi_compat32.c: fix COMEDI_CMD copy back
  clk: sunxi: Support factor clocks with N factor starting not from 0
  fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
  nilfs2: fix potential memory overrun on inode
  IB/qib: Do not write EEPROM
  sg: fix read() error reporting
  ALSA: hda - Add pin configs for ASUS mobo with IDT 92HD73XX codec
  ALSA: pcm: Don't leave PREPARED state after draining
  tty: fix up atime/mtime mess, take four
  sunrpc: fix braino in ->poll()
  procfs: fix race between symlink removals and traversals
  debugfs: leave freeing a symlink body until inode eviction
  autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
  USB: serial: fix potential use-after-free after failed probe
  TTY: fix tty_wait_until_sent on 64-bit machines
  USB: serial: fix infinite wait_until_sent timeout
  net: irda: fix wait_until_sent poll timeout
  xhci: fix reporting of 0-sized URBs in control endpoint
  xhci: Allocate correct amount of scratchpad buffers
  usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards
  USB: usbfs: don't leak kernel data in siginfo
  USB: serial: cp210x: Adding Seletek device id's
  KVM: MIPS: Fix trace event to save PC directly
  KVM: emulate: fix CMPXCHG8B on 32-bit hosts
  Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref.
  Btrfs: fix data loss in the fast fsync path
  btrfs: fix lost return value due to variable shadowing
  iio: imu: adis16400: Fix sign extension
  x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization
  PM / QoS: remove duplicate call to pm_qos_update_target
  target: Check for LBA + sectors wrap-around in sbc_parse_cdb
  mm/memory.c: actually remap enough memory
  mm/compaction: fix wrong order check in compact_finished()
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  mm/hugetlb: add migration entry check in __unmap_hugepage_range
  team: don't traverse port list using rcu in team_set_mac_address
  udp: only allow UFO for packets from SOCK_DGRAM sockets
  usb: plusb: Add support for National Instruments host-to-host cable
  macvtap: make sure neighbour code can push ethernet header
  net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg
  team: fix possible null pointer dereference in team_handle_frame
  net: reject creation of netdev names with colons
  ematch: Fix auto-loading of ematch modules.
  net: phy: Fix verification of EEE support in phy_init_eee
  ipv4: ip_check_defrag should not assume that skb_network_offset is zero
  ipv4: ip_check_defrag should correctly check return value of skb_copy_bits
  gen_stats.c: Duplicate xstats buffer for later use
  rtnetlink: call ->dellink on failure when ->newlink exists
  ipv6: fix ipv6_cow_metrics for non DST_HOST case
  rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY
  Linux 3.10.71
  libceph: fix double __remove_osd() problem
  libceph: change from BUG to WARN for __remove_osd() asserts
  libceph: assert both regular and lingering lists in __remove_osd()
  MIPS: Export FP functions used by lose_fpu(1) for KVM
  x86, mm/ASLR: Fix stack randomization on 64-bit systems
  blk-throttle: check stats_cpu before reading it from sysfs
  jffs2: fix handling of corrupted summary length
  md/raid1: fix read balance when a drive is write-mostly.
  md/raid5: Fix livelock when array is both resyncing and degraded.
  metag: Fix KSTK_EIP() and KSTK_ESP() macros
  gpio: tps65912: fix wrong container_of arguments
  arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian
  hx4700: regulator: declare full constraints
  KVM: x86: update masterclock values on TSC writes
  KVM: MIPS: Don't leak FPU/DSP to guest
  ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE
  ntp: Fixup adjtimex freq validation on 32-bit systems
  kdb: fix incorrect counts in KDB summary command output
  ARM: pxa: add regulator_has_full_constraints to poodle board file
  ARM: pxa: add regulator_has_full_constraints to corgi board file
  vt: provide notifications on selection changes
  usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN
  USB: fix use-after-free bug in usb_hcd_unlink_urb()
  USB: cp210x: add ID for RUGGEDCOM USB Serial Console
  tty: Prevent untrappable signals from malicious program
  axonram: Fix bug in direct_access
  cfq-iosched: fix incorrect filing of rt async cfqq
  cfq-iosched: handle failure of cfq group allocation
  iscsi-target: Drop problematic active_ts_list usage
  NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args
  Added Little Endian support to vtpm module
  tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send
  tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma
  tpm_tis: verify interrupt during init
  ARM: 8284/1: sa1100: clear RCSR_SMR on resume
  tracing: Fix unmapping loop in tracing_mark_write
  MIPS: KVM: Deliver guest interrupts after local_irq_disable()
  nfs: don't call blocking operations while !TASK_RUNNING
  mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles
  power_supply: 88pm860x: Fix leaked power supply on probe fail
  ALSA: hdspm - Constrain periods to 2 on older cards
  ALSA: off by one bug in snd_riptide_joystick_probe()
  lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb
  cpufreq: speedstep-smi: enable interrupts when waiting
  PCI: Fix infinite loop with ROM image of size 0
  PCI: Generate uppercase hex for modalias var in uevent
  HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events
  iwlwifi: mvm: always use mac color zero
  iwlwifi: mvm: fix failure path when power_update fails in add_interface
  iwlwifi: mvm: validate tid and sta_id in ba_notif
  iwlwifi: pcie: disable the SCD_BASE_ADDR when we resume from WoWLAN
  fsnotify: fix handling of renames in audit
  xfs: set superblock buffer type correctly
  xfs: inode unlink does not set AGI buffer type
  xfs: ensure buffer types are set correctly
  Bluetooth: ath3k: workaround the compatibility issue with xHCI controller
  Linux 3.10.70
  rbd: drop an unsafe assertion
  media/rc: Send sync space information on the lirc device
  net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param
  ppp: deflate: never return len larger than output buffer
  ipv4: tcp: get rid of ugly unicast_sock
  tcp: ipv4: initialize unicast_sock sk_pacing_rate
  bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify
  ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
  ping: Fix race in free in receive path
  udp_diag: Fix socket skipping within chain
  ipv4: try to cache dst_entries which would cause a redirect
  net: sctp: fix slab corruption from use after free on INIT collisions
  netxen: fix netxen_nic_poll() logic
  ipv6: stop sending PTB packets for MTU < 1280
  net: rps: fix cpu unplug
  ip: zero sockaddr returned on error queue
  Linux 3.10.69
  crypto: crc32c - add missing crypto module alias
  x86,kvm,vmx: Preserve CR4 across VM entry
  kvm: vmx: handle invvpid vm exit gracefully
  smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
  ALSA: ak411x: Fix stall in work callback
  ASoC: sgtl5000: add delay before first I2C access
  ASoC: atmel_ssc_dai: fix start event for I2S mode
  lib/checksum.c: fix build for generic csum_tcpudp_nofold
  ext4: prevent bugon on race between write/fcntl
  arm64: Fix up /proc/cpuinfo
  nilfs2: fix deadlock of segment constructor over I_SYNC flag
  lib/checksum.c: fix carry in csum_tcpudp_nofold
  mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range
  MIPS: Fix kernel lockup or crash after CPU offline/online
  MIPS: IRQ: Fix disable_irq on CPU IRQs
  PCI: Add NEC variants to Stratus ftServer PCIe DMI check
  gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low
  gpio: sysfs: fix memory leak in gpiod_export_link
  Linux 3.10.68
  target: Drop arbitrary maximum I/O size limit
  iser-target: Fix implicit termination of connections
  iser-target: Handle ADDR_CHANGE event for listener cm_id
  iser-target: Fix connected_handler + teardown flow race
  iser-target: Parallelize CM connection establishment
  iser-target: Fix flush + disconnect completion handling
  iscsi,iser-target: Initiate termination only once
  vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion
  tcm_loop: Fix wrong I_T nexus association
  vhost-scsi: Take configfs group dependency during VHOST_SCSI_SET_ENDPOINT
  ib_isert: Add max_send_sge=2 minimum for control PDU responses
  IB/isert: Adjust CQ size to HW limits
  workqueue: fix subtle pool management issue which can stall whole worker_pool
  gpio: squelch a compiler warning
  efi-pstore: Make efi-pstore return a unique id
  pstore/ram: avoid atomic accesses for ioremapped regions
  pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz
  pstore: skip zero size persistent ram buffer in traverse
  pstore: clarify clearing of _read_cnt in ramoops_context
  pstore: d_alloc_name() doesn't return an ERR_PTR
  pstore: Fail to unlink if a driver has not defined pstore_erase
  ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
  ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
  ARM: DMA: ensure that old section mappings are flushed from the TLB
  ARM: 7931/1: Correct virt_addr_valid
  ARM: fix asm/memory.h build error
  ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
  ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
  ARM: lpae: fix definition of PTE_HWTABLE_PTRS
  ARM: fix type of PHYS_PFN_OFFSET to unsigned long
  ARM: LPAE: use phys_addr_t in alloc_init_pud()
  ARM: LPAE: use signed arithmetic for mask definitions
  ARM: mm: correct pte_same behaviour for LPAE.
  ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables
  drivers: net: cpsw: discard dual emac default vlan configuration
  regulator: core: fix race condition in regulator_put()
  spi/pxa2xx: Clear cur_chip pointer before starting next message
  dm cache: fix missing ERR_PTR returns and handling
  dm thin: don't allow messages to be sent to a pool target in READ_ONLY or FAIL mode
  nl80211: fix per-station group key get/del and memory leak
  NFSv4.1: Fix an Oops in nfs41_walk_client_list
  nfs: fix dio deadlock when O_DIRECT flag is flipped
  Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)
  ALSA: seq-dummy: remove deadlock-causing events on close
  powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
  can: kvaser_usb: Fix state handling upon BUS_ERROR events
  can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
  can: kvaser_usb: Send correct context to URB completion
  can: kvaser_usb: Do not sleep in atomic context
  ASoC: wm8960: Fix capture sample rate from 11250 to 11025
  spi: dw-mid: fix FIFO size

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-05-01 13:49:45 -07:00
Matt Wagantall 9216d4ae8c Revert "workqueue: fix subtle pool management issue which can stall whole worker_pool"
This reverts commit 6515b297d5fcc5fd0f80ed04382a5f6b2b15685a.

Drop this change in preparation for picking up its upstream linux-stable
version, in which merge conflicts have been differently resolved. This
will allow for a cleaner merge of linux-stable v3.10.73.

Change-Id: I6cc115983a00f0153914a9b8fa1378c8fbae07f8
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2015-05-01 13:44:26 -07:00
Ian Maund 807a44d01a This is the 3.10.67 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ
 /4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc
 trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4
 6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/
 SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b
 BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu
 U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT
 2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK
 HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y
 V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX
 osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97
 2mHiXNa+J4CLUNQ+sRmw
 =HDBo
 -----END PGP SIGNATURE-----

Merge commit 'v3.10.67' into LA.BF64.1.2.9

This merge brings us up to date with upstream kernel.org tag v3.10.67.
It also contains changes to allow forbidden warnings introduced in
the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy
and handle errors'. Once upstream has corrected these warnings, the
changes to scripts/gcc-wrapper.py, in this commit, can be reverted.

* 'v3.10.67' (915 commits):
  Linux 3.10.67
  md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.
  ext4: fix warning in ext4_da_update_reserve_space()
  quota: provide interface for readding allocated space into reserved space
  crypto: add missing crypto module aliases
  crypto: include crypto- module prefix in template
  crypto: prefix module autoloading with "crypto-"
  drbd: merge_bvec_fn: properly remap bvm->bi_bdev
  Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
  ipvs: uninitialized data with IP_VS_IPV6
  KEYS: close race between key lookup and freeing
  sata_dwc_460ex: fix resource leak on error path
  x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs
  x86, tls: Interpret an all-zero struct user_desc as "no segment"
  x86, tls, ldt: Stop checking lm in LDT_empty
  x86/tsc: Change Fast TSC calibration failed from error to info
  x86, hyperv: Mark the Hyper-V clocksource as being continuous
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
  can: dev: fix crtlmode_supported check
  bus: mvebu-mbus: fix support of MBus window 13
  ARM: dts: imx25: Fix PWM "per" clocks
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  dm cache: share cache-metadata object across inactive and active DM tables
  ipr: wait for aborted command responses
  drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES
  scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore
  ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210
  libata: prevent HSM state change race between ISR and PIO
  pinctrl: Fix two deadlocks
  gpio: sysfs: fix gpio device-attribute leak
  gpio: sysfs: fix gpio-chip device-attribute leak
  Linux 3.10.66
  s390/3215: fix tty output containing tabs
  s390/3215: fix hanging console issue
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  netfilter: ipset: small potential read beyond the end of buffer
  mmc: sdhci: Fix sleep in atomic after inserting SD card
  LOCKD: Fix a race when initialising nlmsvc_timeout
  x86, um: actually mark system call tables readonly
  um: Skip futex_atomic_cmpxchg_inatomic() test
  decompress_bunzip2: off by one in get_next_block()
  ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
  ARM: omap5/dra7xx: Fix frequency typos
  ARM: clk-imx6q: fix video divider for rev T0 1.0
  ARM: imx6q: drop unnecessary semicolon
  ARM: dts: imx25: Fix the SPI1 clocks
  Input: I8042 - add Acer Aspire 7738 to the nomux list
  Input: i8042 - reset keyboard to fix Elantech touchpad detection
  can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels
  can: kvaser_usb: Reset all URB tx contexts upon channel close
  can: kvaser_usb: Don't free packets when tight on URBs
  USB: keyspan: fix null-deref at probe
  USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices
  USB: cp210x: fix ID for production CEL MeshConnect USB Stick
  usb: dwc3: gadget: Stop TRB preparation after limit is reached
  usb: dwc3: gadget: Fix TRB preparation during SG
  OHCI: add a quirk for ULi M5237 blocking on reset
  gpiolib: of: Correct error handling in of_get_named_gpiod_flags
  NFSv4.1: Fix client id trunking on Linux
  ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing
  vfio-pci: Fix the check on pci device type in vfio_pci_probe()
  uvcvideo: Fix destruction order in uvc_delete()
  smiapp: Take mutex during PLL update in sensor initialisation
  af9005: fix kernel panic on init if compiled without IR
  smiapp-pll: Correct clock debug prints
  video/logo: prevent use of logos after they have been freed
  storvsc: ring buffer failures may result in I/O freeze
  iscsi-target: Fail connection on short sendmsg writes
  hp_accel: Add support for HP ZBook 15
  cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers
  ARC: [nsimosci] move peripherals to match model to FPGA
  drm/i915: Force the CS stall for invalidate flushes
  drm/i915: Invalidate media caches on gen7
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: check the right ring in radeon_evict_flags()
  drm/vmwgfx: Fix fence event code
  enic: fix rx skb checksum
  alx: fix alx_poll()
  tcp: Do not apply TSO segment limit to non-TSO packets
  tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts
  netlink: Don't reorder loads/stores before marking mmap netlink frame as available
  netlink: Always copy on mmap TX.
  Linux 3.10.65
  mm: Don't count the stack guard page towards RLIMIT_STACK
  mm: propagate error from stack expansion even for guard page
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  perf session: Do not fail on processing out of order event
  perf: Fix events installation during moving group
  perf/x86/intel/uncore: Make sure only uncore events are collected
  Btrfs: don't delay inode ref updates during log replay
  ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP
  scripts/kernel-doc: don't eat struct members with __aligned
  nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
  nfsd4: fix xdr4 inclusion of escaped char
  fs: nfsd: Fix signedness bug in compare_blob
  serial: samsung: wait for transfer completion before clock disable
  writeback: fix a subtle race condition in I_DIRTY clearing
  cdc-acm: memory leak in error case
  genhd: check for int overflow in disk_expand_part_tbl()
  USB: cdc-acm: check for valid interfaces
  ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs
  ALSA: hda - using uninitialized data
  ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC
  driver core: Fix unbalanced device reference in drivers_probe
  x86, vdso: Use asm volatile in __getcpu
  x86_64, vdso: Fix the vdso address randomization algorithm
  HID: Add a new id 0x501a for Genius MousePen i608X
  HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard
  HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
  HID: i2c-hid: prevent buffer overflow in early IRQ
  HID: i2c-hid: fix race condition reading reports
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  UBI: Fix double free after do_sync_erase()
  UBI: Fix invalid vfree()
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
  PCI: Restore detection of read-only BARs
  ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
  ASoC: max98090: Fix ill-defined sidetone route
  ASoC: sigmadsp: Refuse to load firmware files with a non-supported version
  ath5k: fix hardware queue index assignment
  swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single
  can: peak_usb: fix memset() usage
  can: peak_usb: fix cleanup sequence order in case of error during init
  ath9k: fix BE/BK queue order
  ath9k_hw: fix hardware queue allocation
  ocfs2: fix journal commit deadlock
  Linux 3.10.64
  Btrfs: fix fs corruption on transaction abort if device supports discard
  Btrfs: do not move em to modified list when unpinning
  eCryptfs: Remove buggy and unnecessary write in file name decode routine
  eCryptfs: Force RO mount when encrypted view is enabled
  udf: Verify symlink size before loading it
  exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
  ncpfs: return proper error from NCP_IOC_SETROOT ioctl
  crypto: af_alg - fix backlog handling
  userns: Unbreak the unprivileged remount tests
  userns: Allow setting gid_maps without privilege when setgroups is disabled
  userns: Add a knob to disable setgroups on a per user namespace basis
  userns: Rename id_map_mutex to userns_state_mutex
  userns: Only allow the creator of the userns unprivileged mappings
  userns: Check euid no fsuid when establishing an unprivileged uid mapping
  userns: Don't allow unprivileged creation of gid mappings
  userns: Don't allow setgroups until a gid mapping has been setablished
  userns: Document what the invariant required for safe unprivileged mappings.
  groups: Consolidate the setgroups permission checks
  umount: Disallow unprivileged mount force
  mnt: Update unprivileged remount test
  mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
  mac80211: free management frame keys when removing station
  mac80211: fix multicast LED blinking and counter
  KEYS: Fix stale key registration at error path
  isofs: Fix unchecked printing of ER records
  x86/tls: Don't validate lm in set_thread_area() after all
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  nfs41: fix nfs4_proc_layoutget error handling
  megaraid_sas: corrected return of wait_event from abort frame path
  mmc: block: add newline to sysfs display of force_ro
  mfd: tc6393xb: Fail ohci suspend if full state restore is required
  md/bitmap: always wait for writes on unplug.
  x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
  x86_64, switch_to(): Load TLS descriptors before switching DS and ES
  x86/tls: Disallow unusual TLS segments
  x86/tls: Validate TLS entries to protect espfix
  isofs: Fix infinite looping over CE entries
  Linux 3.10.63
  ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery
  powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
  ARM: sched_clock: Load cycle count after epoch stabilizes
  igb: bring link up when PHY is powered up
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
  nEPT: Nested INVEPT
  net: sctp: use MAX_HEADER for headroom reserve in output path
  net: mvneta: fix Tx interrupt delay
  rtnetlink: release net refcnt on error in do_setlink()
  net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
  tg3: fix ring init when there are more TX than RX channels
  ipv6: gre: fix wrong skb->protocol in WCCP
  sata_fsl: fix error handling of irq_of_parse_and_map
  ahci: disable MSI on SAMSUNG 0xa800 SSD
  AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller
  media: smiapp: Only some selection targets are settable
  drm/i915: Unlock panel even when LVDS is disabled
  drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6
  i2c: davinci: generate STP always when NACK is received
  i2c: omap: fix i207 errata handling
  i2c: omap: fix NACK and Arbitration Lost irq handling
  xen-netfront: Remove BUGs on paged skb data which crosses a page boundary
  mm: fix swapoff hang after page migration and fork
  mm: frontswap: invalidate expired data on a dup-store failure
  Linux 3.10.62
  nfsd: Fix ACL null pointer deref
  powerpc/powernv: Honor the generic "no_64bit_msi" flag
  bnx2fc: do not add shared skbs to the fcoe_rx_list
  nfsd4: fix leak of inode reference on delegation failure
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  rt2x00: do not align payload on modern H/W
  can: dev: avoid calling kfree_skb() from interrupt context
  spi: dw: Fix dynamic speed change.
  iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly
  target: Don't call TFO->write_pending if data_length == 0
  srp-target: Retry when QP creation fails with ENOMEM
  Input: xpad - use proper endpoint type
  ARM: 8222/1: mvebu: enable strex backoff delay
  ARM: 8216/1: xscale: correct auxiliary register in suspend/resume
  ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices
  can: esd_usb2: fix memory leak on disconnect
  USB: xhci: don't start a halted endpoint before its new dequeue is set
  usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000
  usb: serial: ftdi_sio: add PIDs for Matrix Orbital products
  USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick
  USB: keyspan: fix tty line-status reporting
  USB: keyspan: fix overrun-error reporting
  USB: ssu100: fix overrun-error reporting
  iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask
  powerpc/pseries: Fix endiannes issue in RTAS call from xmon
  powerpc/pseries: Honor the generic "no_64bit_msi" flag
  of/base: Fix PowerPC address parsing hack
  ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use
  ASoC: sgtl5000: Fix SMALL_POP bit definition
  PCI/MSI: Add device flag indicating that 64-bit MSIs don't work
  ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg
  pptp: fix stack info leak in pptp_getname()
  qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem
  ieee802154: fix error handling in ieee802154fake_probe()
  ipv4: Fix incorrect error code when adding an unreachable route
  inetdevice: fixed signed integer overflow
  sparc64: Fix constraints on swab helpers.
  uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
  x86, mm: Set NX across entire PMD at boot
  x86: Require exact match for 'noxsave' command line option
  x86_64, traps: Rework bad_iret
  x86_64, traps: Stop using IST for #SS
  x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
  MIPS: Loongson: Make platform serial setup always built-in.
  MIPS: oprofile: Fix backtrace on 64-bit kernel
  Linux 3.10.61
  mm: memcg: handle non-error OOM situations more gracefully
  mm: memcg: do not trap chargers with full callstack on OOM
  mm: memcg: rework and document OOM waiting and wakeup
  mm: memcg: enable memcg OOM killer only for user faults
  x86: finish user fault error path with fatal signal
  arch: mm: pass userspace fault flag to generic fault handler
  arch: mm: do not invoke OOM killer on kernel fault OOM
  arch: mm: remove obsolete init OOM protection
  mm: invoke oom-killer from remaining unconverted page fault handlers
  net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks
  net: sctp: fix panic on duplicate ASCONF chunks
  net: sctp: fix remote memory pressure from excessive queueing
  KVM: x86: Don't report guest userspace emulation error to userspace
  SCSI: hpsa: fix a race in cmd_free/scsi_done
  net/mlx4_en: Fix BlueFlame race
  ARM: Correct BUG() assembly to ensure it is endian-agnostic
  perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
  mei: bus: fix possible boundaries violation
  perf: Handle compat ioctl
  MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches
  dell-wmi: Fix access out of memory
  ARM: probes: fix instruction fetch order with <asm/opcodes.h>
  br: fix use of ->rx_handler_data in code executed on non-rx_handler path
  netfilter: nf_nat: fix oops on netns removal
  netfilter: xt_bpf: add mising opaque struct sk_filter definition
  netfilter: nf_log: release skbuff on nlmsg put failure
  netfilter: nfnetlink_log: fix maximum packet length logged to userspace
  netfilter: nf_log: account for size of NLMSG_DONE attribute
  ipc: always handle a new value of auto_msgmni
  clocksource: Remove "weak" from clocksource_default_clock() declaration
  kgdb: Remove "weak" from kgdb_arch_pc() declaration
  media: ttusb-dec: buffer overflow in ioctl
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  nfs: Fix use of uninitialized variable in nfs_getattr()
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  Input: alps - allow up to 2 invalid packets without resetting device
  Input: alps - ignore potential bare packets when device is out of sync
  dm raid: ensure superblock's size matches device's logical block size
  dm btree: fix a recursion depth bug in btree walking code
  block: Fix computation of merged request priority
  parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls
  scsi: only re-lock door after EH on devices that were reset
  nfs: fix pnfs direct write memory leak
  firewire: cdev: prevent kernel stack leaking into ioctl arguments
  arm64: __clear_user: handle exceptions on strb
  ARM: 8198/1: make kuser helpers depend on MMU
  drm/radeon: add missing crtc unlock when setting up the MC
  mac80211: fix use-after-free in defragmentation
  macvtap: Fix csum_start when VLAN tags are present
  iwlwifi: configure the LTR
  libceph: do not crash on large auth tickets
  xtensa: re-wire umount syscall to sys_oldumount
  ALSA: usb-audio: Fix memory leak in FTU quirk
  ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks
  ahci: Add Device IDs for Intel Sunrise Point PCH
  audit: keep inode pinned
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks
  sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().
  sparc64: Fix crashes in schizo_pcierr_intr_other().
  sunvdc: don't call VD_OP_GET_VTOC
  vio: fix reuse of vio_dring slot
  sunvdc: limit each sg segment to a page
  sunvdc: compute vdisk geometry from capacity
  sunvdc: add cdrom and v1.1 protocol support
  net: sctp: fix memory leak in auth key management
  net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet
  gre6: Move the setting of dev->iflink into the ndo_init functions.
  ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
  Linux 3.10.60
  libceph: ceph-msgr workqueue needs a resque worker
  Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
  of: Fix overflow bug in string property parsing functions
  sysfs: driver core: Fix glue dir race condition by gdp_mutex
  i2c: at91: don't account as iowait
  acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
  rbd: Fix error recovery in rbd_obj_read_sync()
  drm/radeon: remove invalid pci id
  usb: gadget: udc: core: fix kernel oops with soft-connect
  usb: gadget: function: acm: make f_acm pass USB20CV Chapter9
  usb: dwc3: gadget: fix set_halt() bug with pending transfers
  crypto: algif - avoid excessive use of socket buffer in skcipher
  mm: Remove false WARN_ON from pagecache_isize_extended()
  x86, apic: Handle a bad TSC more gracefully
  posix-timers: Fix stack info leak in timer_create()
  mac80211: fix typo in starting baserate for rts_cts_rate_idx
  PM / Sleep: fix recovery during resuming from hibernation
  tty: Fix high cpu load if tty is unreleaseable
  quota: Properly return errors from dquot_writeback_dquots()
  ext3: Don't check quota format when there are no quota files
  nfsd4: fix crash on unknown operation number
  cpc925_edac: Report UE events properly
  e7xxx_edac: Report CE events properly
  i3200_edac: Report CE events properly
  i82860_edac: Report CE events properly
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
  usb: Do not allow usb_alloc_streams on unconfigured devices
  USB: opticon: fix non-atomic allocation in write path
  usb-storage: handle a skipped data phase
  spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM
  spi: pl022: Fix incorrect dma_unmap_sg
  usb: dwc3: gadget: Properly initialize LINK TRB
  wireless: rt2x00: add new rt2800usb device
  USB: option: add Haier CE81B CDMA modem
  usb: option: add support for Telit LE910
  USB: cdc-acm: only raise DTR on transitions from B0
  USB: cdc-acm: add device id for GW Instek AFG-2225
  usb: serial: ftdi_sio: add "bricked" FTDI device PID
  usb: serial: ftdi_sio: add Awinda Station and Dongle products
  USB: serial: cp210x: add Silicon Labs 358x VID and PID
  serial: Fix divide-by-zero fault in uart_get_divisor()
  staging:iio:ade7758: Remove "raw" from channel name
  staging:iio:ade7758: Fix check if channels are enabled in prenable
  staging:iio:ade7758: Fix NULL pointer deref when enabling buffer
  staging:iio:ad5933: Drop "raw" from channel names
  staging:iio:ad5933: Fix NULL pointer deref when enabling buffer
  OOM, PM: OOM killed task shouldn't escape PM suspend
  freezer: Do not freeze tasks killed by OOM killer
  ext4: fix oops when loading block bitmap failed
  cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy
  ext4: fix overflow when updating superblock backups after resize
  ext4: check s_chksum_driver when looking for bg csum presence
  ext4: fix reservation overflow in ext4_da_write_begin
  ext4: add ext4_iget_normal() which is to be used for dir tree lookups
  ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT
  ext4: don't check quota format when there are no quota files
  ext4: check EA value offset when loading
  jbd2: free bh when descriptor block checksum fails
  MIPS: tlbex: Properly fix HUGE TLB Refill exception handler
  target: Fix APTPL metadata handling for dynamic MappedLUNs
  target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
  qla_target: don't delete changed nacls
  ARC: Update order of registers in KGDB to match GDB 7.5
  ARC: [nsimosci] Allow "headless" models to boot
  KVM: x86: Emulator fixes for eip canonical checks on near branches
  KVM: x86: Fix wrong masking on relative jump/call
  kvm: x86: don't kill guest on unknown exit reason
  KVM: x86: Check non-canonical addresses upon WRMSR
  KVM: x86: Improve thread safety in pit
  KVM: x86: Prevent host from panicking on shared MSR writes.
  kvm: fix excessive pages un-pinning in kvm_iommu_map error path.
  media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register
  media: ds3000: fix LNB supply voltage on Tevii S480 on initialization
  media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop
  media: v4l2-common: fix overflow in v4l_bound_align_image()
  drm/nouveau/bios: memset dcb struct to zero before parsing
  drm/tilcdc: Fix the error path in tilcdc_load()
  drm/ast: Fix HW cursor image
  Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
  Input: i8042 - add noloop quirk for Asus X750LN
  framebuffer: fix border color
  modules, lock around setting of MODULE_STATE_UNFORMED
  dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
  block: fix alignment_offset math that assumes io_min is a power-of-2
  drbd: compute the end before rb_insert_augmented()
  dm bufio: update last_accessed when relinking a buffer
  virtio_pci: fix virtio spec compliance on restore
  selinux: fix inode security list corruption
  pstore: Fix duplicate {console,ftrace}-efi entries
  mfd: rtsx_pcr: Fix MSI enable error handling
  mnt: Prevent pivot_root from creating a loop in the mount tree
  UBI: add missing kmem_cache_free() in process_pool_aeb error path
  random: add and use memzero_explicit() for clearing data
  crypto: more robust crypto_memneq
  fix misuses of f_count() in ppp and netlink
  kill wbuf_queued/wbuf_dwork_lock
  ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
  evm: check xattr value length and type in evm_inode_setxattr()
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  x86_64, entry: Fix out of bounds read on sysenter
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86: Reject x32 executables if x32 ABI not supported
  vfs: fix data corruption when blocksize < pagesize for mmaped data
  UBIFS: fix free log space calculation
  UBIFS: fix a race condition
  UBIFS: remove mst_mutex
  fs: Fix theoretical division by 0 in super_cache_scan().
  fs: make cont_expand_zero interruptible
  mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response
  libata-sff: Fix controllers with no ctl port
  pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller
  Revert "percpu: free percpu allocation info for uniprocessor system"
  lockd: Try to reconnect if statd has moved
  drivers/net: macvtap and tun depend on INET
  ipv4: dst_entry leak in ip_send_unicast_reply()
  ax88179_178a: fix bonding failure
  ipv4: fix nexthop attlen check in fib_nh_match
  tracing/syscalls: Ignore numbers outside NR_syscalls' range
  Linux 3.10.59
  ecryptfs: avoid to access NULL pointer when write metadata in xattr
  ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks
  ALSA: usb-audio: Add support for Steinberg UR22 USB interface
  ALSA: emu10k1: Fix deadlock in synth voice lookup
  ALSA: pcm: use the same dma mmap codepath both for arm and arm64
  arm64: compat: fix compat types affecting struct compat_elf_prpsinfo
  spi: dw-mid: terminate ongoing transfers at exit
  kernel: add support for gcc 5
  fanotify: enable close-on-exec on events' fd when requested in fanotify_init()
  mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set
  Bluetooth: Fix issue with USB suspend in btusb driver
  Bluetooth: Fix HCI H5 corrupted ack value
  rt2800: correct BBP1_TX_POWER_CTRL mask
  PCI: Generate uppercase hex for modalias interface class
  PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size
  iwlwifi: Add missing PCI IDs for the 7260 series
  NFSv4.1: Fix an NFSv4.1 state renewal regression
  NFSv4: fix open/lock state recovery error handling
  NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
  lzo: check for length overrun in variable length encoding.
  Revert "lzo: properly check for overruns"
  Documentation: lzo: document part of the encoding
  m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()
  Drivers: hv: vmbus: Fix a bug in vmbus_open()
  Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
  Drivers: hv: vmbus: Cleanup vmbus_post_msg()
  firmware_class: make sure fw requests contain a name
  qla2xxx: Use correct offset to req-q-out for reserve calculation
  mptfusion: enable no_write_same for vmware scsi disks
  be2iscsi: check ip buffer before copying
  regmap: fix NULL pointer dereference in _regmap_write/read
  regmap: debugfs: fix possbile NULL pointer dereference
  spi: dw-mid: check that DMA was inited before exit
  spi: dw-mid: respect 8 bit mode
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  KVM: s390: unintended fallthrough for external call
  kvm: x86: fix stale mmio cache bug
  fs: Add a missing permission check to do_umount
  Btrfs: fix race in WAIT_SYNC ioctl
  Btrfs: fix build_backref_tree issue with multiple shared blocks
  Btrfs: try not to ENOSPC on log replay
  Linux 3.10.58
  USB: cp210x: add support for Seluxit USB dongle
  USB: serial: cp210x: added Ketra N1 wireless interface support
  USB: Add device quirk for ASUS T100 Base Station keyboard
  ipv6: reallocate addrconf router for ipv6 address when lo device up
  tcp: fixing TLP's FIN recovery
  sctp: handle association restarts when the socket is closed.
  ip6_gre: fix flowi6_proto value in xmit path
  hyperv: Fix a bug in netvsc_start_xmit()
  tg3: Allow for recieve of full-size 8021AD frames
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  l2tp: fix race while getting PMTU on PPP pseudo-wire
  openvswitch: fix panic with multiple vlan headers
  packet: handle too big packets for PACKET_V3
  tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced()
  sit: Fix ipip6_tunnel_lookup device matching criteria
  myri10ge: check for DMA mapping errors
  Linux 3.10.57
  cpufreq: ondemand: Change the calculation of target frequency
  cpufreq: Fix wrong time unit conversion
  nl80211: clear skb cb before passing to netlink
  drbd: fix regression 'out of mem, failed to invoke fence-peer helper'
  jiffies: Fix timeval conversion to jiffies
  md/raid5: disable 'DISCARD' by default due to safety concerns.
  media: vb2: fix VBI/poll regression
  mm: numa: Do not mark PTEs pte_numa when splitting huge pages
  mm, thp: move invariant bug check out of loop in __split_huge_page_map
  ring-buffer: Fix infinite spin in reading buffer
  init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
  perf: fix perf bug in fork()
  udf: Avoid infinite loop when processing indirect ICBs
  Linux 3.10.56
  vm_is_stack: use for_each_thread() rather then buggy while_each_thread()
  oom_kill: add rcu_read_lock() into find_lock_task_mm()
  oom_kill: has_intersects_mems_allowed() needs rcu_read_lock()
  oom_kill: change oom_kill.c to use for_each_thread()
  introduce for_each_thread() to replace the buggy while_each_thread()
  kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code
  arm: multi_v7_defconfig: Enable Zynq UART driver
  ext2: Fix fs corruption in ext2_get_xip_mem()
  serial: 8250_dma: check the result of TX buffer mapping
  ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace
  netfilter: nf_conntrack: avoid large timeout for mid-stream pickup
  PM / sleep: Use valid_state() for platform-dependent sleep states only
  PM / sleep: Add state field to pm_states[] entries
  ipvs: fix ipv6 hook registration for local replies
  ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding
  ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack
  md/raid1: fix_read_error should act on all non-faulty devices.
  media: cx18: fix kernel oops with tda8290 tuner
  Fix nasty 32-bit overflow bug in buffer i/o code.
  perf kmem: Make it work again on non NUMA machines
  perf: Fix a race condition in perf_remove_from_context()
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds
  powerpc/perf: Fix ABIv2 kernel backtraces
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  ocfs2/dlm: do not get resource spinlock if lockres is new
  nilfs2: fix data loss with mmap()
  fs/notify: don't show f_handle if exportfs_encode_inode_fh failed
  fsnotify/fdinfo: use named constants instead of hardcoded values
  kcmp: fix standard comparison bug
  Revert "mac80211: disable uAPSD if all ACs are under ACM"
  usb: dwc3: core: fix ordering for PHY suspend
  usb: dwc3: core: fix order of PM runtime calls
  usb: host: xhci: fix compliance mode workaround
  genhd: fix leftover might_sleep() in blk_free_devt()
  lockd: fix rpcbind crash on lockd startup failure
  rtlwifi: rtl8192cu: Add new ID
  percpu: perform tlb flush after pcpu_map_pages() failure
  percpu: fix pcpu_alloc_pages() failure path
  percpu: free percpu allocation info for uniprocessor system
  ata_piix: Add Device IDs for Intel 9 Series PCH
  Input: i8042 - add nomux quirk for Avatar AVIU-145A6
  Input: i8042 - add Fujitsu U574 to no_timeout dmi table
  Input: atkbd - do not try 'deactivate' keyboard on any LG laptops
  Input: elantech - fix detection of touchpad on ASUS s301l
  Input: synaptics - add support for ForcePads
  Input: serport - add compat handling for SPIOCSTYPE ioctl
  dm crypt: fix access beyond the end of allocated space
  block: Fix dev_t minor allocation lifetime
  workqueue: apply __WQ_ORDERED to create_singlethread_workqueue()
  Revert "iwlwifi: dvm: don't enable CTS to self"
  SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
  NFC: microread: Potential overflows in microread_target_discovered()
  iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid
  iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure
  Target/iser: Don't put isert_conn inside disconnected handler
  Target/iser: Get isert_conn reference once got to connected_handler
  iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name
  iio:magnetometer: bugfix magnetometers gain values
  iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment
  iio: st_sensors: Fix indio_dev->trig assignment
  iio: meter: ade7758: Fix indio_dev->trig assignment
  iio: inv_mpu6050: Fix indio_dev->trig assignment
  iio: gyro: itg3200: Fix indio_dev->trig assignment
  iio:trigger: modify return value for iio_trigger_get
  CIFS: Fix SMB2 readdir error handling
  CIFS: Fix directory rename error
  ASoC: davinci-mcasp: Correct rx format unit configuration
  shmem: fix nlink for rename overwrite directory
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
  KVM: x86: handle idiv overflow at kvm_write_tsc
  regmap: Fix handling of volatile registers for format_write() chips
  ACPICA: Update to GPIO region handler interface.
  MIPS: mcount: Adjust stack pointer for static trace in MIPS32
  MIPS: ZBOOT: add missing <linux/string.h> include
  ARM: 8165/1: alignment: don't break misaligned NEON load/store
  ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel
  ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs
  ARM: 8128/1: abort: don't clear the exclusive monitors
  NFSv4: Fix another bug in the close/open_downgrade code
  NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
  usb:hub set hub->change_bits when over-current happens
  usb: dwc3: omap: fix ordering for runtime pm calls
  USB: EHCI: unlink QHs even after the controller has stopped
  USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters
  USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter
  USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter
  storage: Add single-LUN quirk for Jaz USB Adapter
  usb: hub: take hub->hdev reference when processing from eventlist
  xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices
  xhci: Fix null pointer dereference if xhci initialization fails
  USB: zte_ev: fix removed PIDs
  USB: ftdi_sio: add support for NOVITUS Bono E thermal printer
  USB: sierra: add 1199:68AA device ID
  USB: sierra: avoid CDC class functions on "68A3" devices
  USB: zte_ev: remove duplicate Qualcom PID
  USB: zte_ev: remove duplicate Gobi PID
  Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev"
  USB: option: add VIA Telecom CDS7 chipset device id
  USB: option: reduce interrupt-urb logging verbosity
  USB: serial: fix potential heap buffer overflow
  USB: sisusb: add device id for Magic Control USB video
  USB: serial: fix potential stack buffer overflow
  USB: serial: pl2303: add device id for ztek device
  xtensa: fix a6 and a7 handling in fast_syscall_xtensa
  xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
  xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
  xtensa: fix address checks in dma_{alloc,free}_coherent
  xtensa: replace IOCTL code definitions with constants
  drm/radeon: add connector quirk for fujitsu board
  drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle
  drm/ast: AST2000 cannot be detected correctly
  drm/i915: Wait for vblank before enabling the TV encoder
  drm/i915: Remove bogus __init annotation from DMI callbacks
  HID: logitech-dj: prevent false errors to be shown
  HID: magicmouse: sanity check report size in raw_event() callback
  HID: picolcd: sanity check report size in raw_event() callback
  cfq-iosched: Fix wrong children_weight calculation
  ALSA: pcm: fix fifo_size frame calculation
  ALSA: hda - Fix invalid pin powermap without jack detection
  ALSA: hda - Fix COEF setups for ALC1150 codec
  ALSA: core: fix buffer overflow in snd_info_get_line()
  arm64: ptrace: fix compat hardware watchpoint reporting
  trace: Fix epoll hang when we race with new entries
  i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer.
  i2c: at91: add bound checking on SMBus block length bytes
  arm64: flush TLS registers during exec
  ibmveth: Fix endian issues with rx_no_buffer statistic
  ahci: add pcid for Marvel 0x9182 controller
  ahci: Add Device IDs for Intel 9 Series PCH
  pata_scc: propagate return value of scc_wait_after_reset
  drm/i915: read HEAD register back in init_ring_common() to enforce ordering
  drm/radeon: load the lm63 driver for an lm64 thermal chip.
  drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan().
  drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan().
  drm/tilcdc: fix double kfree
  drm/tilcdc: fix release order on exit
  drm/tilcdc: panel: fix leak when unloading the module
  drm/tilcdc: tfp410: fix dangling sysfs connector node
  drm/tilcdc: slave: fix dangling sysfs connector node
  drm/tilcdc: panel: fix dangling sysfs connector node
  carl9170: fix sending URBs with wrong type when using full-speed
  Linux 3.10.55
  libceph: gracefully handle large reply messages from the mon
  libceph: rename ceph_msg::front_max to front_alloc_len
  tpm: Provide a generic means to override the chip returned timeouts
  vfs: fix bad hashing of dentries
  dcache.c: get rid of pointless macros
  IB/srp: Fix deadlock between host removal and multipathd
  blkcg: don't call into policy draining if root_blkg is already gone
  mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
  mtd/ftl: fix the double free of the buffers allocated in build_maps()
  CIFS: Fix wrong restart readdir for SMB1
  CIFS: Fix wrong filename length for SMB2
  CIFS: Fix wrong directory attributes after rename
  CIFS: Possible null ptr deref in SMB2_tcon
  CIFS: Fix async reading on reconnects
  CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
  libceph: do not hard code max auth ticket len
  libceph: add process_one_ticket() helper
  libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
  md/raid1,raid10: always abort recover on write error.
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF
  xfs: quotacheck leaves dquot buffers without verifiers
  RDMA/iwcm: Use a default listen backlog if needed
  md/raid10: Fix memory leak when raid10 reshape completes.
  md/raid10: fix memory leak when reshaping a RAID10.
  md/raid6: avoid data corruption during recovery of double-degraded RAID6
  Bluetooth: Avoid use of session socket after the session gets freed
  Bluetooth: never linger on process exit
  mnt: Add tests for unprivileged remount cases that have found to be faulty
  mnt: Change the default remount atime from relatime to the existing value
  mnt: Correct permission checks in do_remount
  mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
  mnt: Only change user settable mount flags in remount
  ring-buffer: Up rb_iter_peek() loop count to 3
  ring-buffer: Always reset iterator to reader page
  ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
  ACPI: Run fixed event device notifications in process context
  ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
  bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
  ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
  ASoC: max98090: Fix missing free_irq
  ASoC: samsung: Correct I2S DAI suspend/resume ops
  ASoC: wm_adsp: Add missing MODULE_LICENSE
  ASoC: pcm: fix dpcm_path_put in dpcm runtime update
  openrisc: Rework signal handling
  MIPS: Fix accessing to per-cpu data when flushing the cache
  MIPS: OCTEON: make get_system_type() thread-safe
  MIPS: asm: thread_info: Add _TIF_SECCOMP flag
  MIPS: Cleanup flags in syscall flags handlers.
  MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
  MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
  MIPS: tlbex: Fix a missing statement for HUGETLB
  MIPS: Prevent user from setting FCSR cause bits
  MIPS: GIC: Prevent array overrun
  drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
  Drivers: scsi: storvsc: Implement a eh_timed_out handler
  powerpc/pseries: Failure on removing device node
  powerpc/mm: Use read barrier when creating real_pte
  powerpc/mm/numa: Fix break placement
  regulator: arizona-ldo1: remove bypass functionality
  mfd: omap-usb-host: Fix improper mask use.
  kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
  CAPABILITIES: remove undefined caps from all processes
  tpm: missing tpm_chip_put in tpm_get_random()
  firmware: Do not use WARN_ON(!spin_is_locked())
  spi: omap2-mcspi: Configure hardware when slave driver changes mode
  spi: orion: fix incorrect handling of cell-index DT property
  iommu/amd: Fix cleanup_domain for mass device removal
  media: media-device: Remove duplicated memset() in media_enum_entities()
  media: au0828: Only alt setting logic when needed
  media: xc4000: Fix get_frequency()
  media: xc5000: Fix get_frequency()
  Linux 3.10.54
  USB: fix build error with CONFIG_PM_RUNTIME disabled
  NFSv4: Fix problems with close in the presence of a delegation
  NFSv3: Fix another acl regression
  svcrdma: Select NFSv4.1 backchannel transport based on forward channel
  NFSD: Decrease nfsd_users in nfsd_startup_generic fail
  usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1
  USB: whiteheat: Added bounds checking for bulk command response
  USB: ftdi_sio: Added PID for new ekey device
  USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID
  ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled
  usb: xhci: amd chipset also needs short TX quirk
  xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL
  Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
  jbd2: fix infinite loop when recovering corrupt journal blocks
  mei: nfc: fix memory leak in error path
  mei: reset client state on queued connect request
  Btrfs: fix csum tree corruption, duplicate and outdated checksums
  hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86_64/vsyscall: Fix warn_bad_vsyscall log output
  x86: don't exclude low BIOS area when allocating address space for non-PCI cards
  drm/radeon: add additional SI pci ids
  ext4: fix BUG_ON in mb_free_blocks()
  kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)
  Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"
  KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
  KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
  KVM: x86: Inter-privilege level ret emulation is not implemeneted
  crypto: ux500 - make interrupt mode plausible
  serial: core: Preserve termios c_cflag for console resume
  ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
  drivers/i2c/busses: use correct type for dma_map/unmap
  hwmon: (dme1737) Prevent overflow problem when writing large limits
  hwmon: (ads1015) Fix out-of-bounds array access
  hwmon: (lm85) Fix various errors on attribute writes
  hwmon: (ads1015) Fix off-by-one for valid channel index checking
  hwmon: (gpio-fan) Prevent overflow problem when writing large limits
  hwmon: (lm78) Fix overflow problems seen when writing large temperature limits
  hwmon: (sis5595) Prevent overflow problem when writing large limits
  drm: omapdrm: fix compiler errors
  ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case.
  mei: start disconnect request timer consistently
  ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co
  ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed
  ALSA: virtuoso: add Xonar Essence STX II support
  ALSA: hda - fix an external mic jack problem on a HP machine
  USB: Fix persist resume of some SS USB devices
  USB: ehci-pci: USB host controller support for Intel Quark X1000
  USB: serial: ftdi_sio: Add support for new Xsens devices
  USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
  USB: OHCI: don't lose track of EDs when a controller dies
  isofs: Fix unbounded recursion when processing relocated directories
  HID: fix a couple of off-by-ones
  HID: logitech: perform bounds checking on device_id early enough
  stable_kernel_rules: Add pointer to netdev-FAQ for network patches
  Linux 3.10.53
  arch/sparc/math-emu/math_32.c: drop stray break operator
  sparc64: ldc_connect() should not return EINVAL when handshake is in progress.
  sunsab: Fix detection of BREAK on sunsab serial console
  bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000
  sparc64: Guard against flushing openfirmware mappings.
  sparc64: Do not insert non-valid PTEs into the TSB hash table.
  sparc64: Add membar to Niagara2 memcpy code.
  sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
  sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
  sparc64: Fix top-level fault handling bugs.
  sparc64: Handle 32-bit tasks properly in compute_effective_address().
  sparc64: Make itc_sync_lock raw
  sparc64: Fix argument sign extension for compat_sys_futex().
  sctp: fix possible seqlock seadlock in sctp_packet_transmit()
  iovec: make sure the caller actually wants anything in memcpy_fromiovecend
  net: Correctly set segment mac_len in skb_segment().
  macvlan: Initialize vlan_features to turn on offload support.
  net: sctp: inherit auth_capable on INIT collisions
  tcp: Fix integer-overflow in TCP vegas
  tcp: Fix integer-overflows in TCP veno
  net: sendmsg: fix NULL pointer dereference
  ip: make IP identifiers less predictable
  inetpeer: get rid of ip_id_count
  bnx2x: fix crash during TSO tunneling
  Linux 3.10.52
  x86/espfix/xen: Fix allocation of pages for paravirt page tables
  lib/btree.c: fix leak of whole btree nodes
  net/l2tp: don't fall back on UDP [get|set]sockopt
  net: mvneta: replace Tx timer with a real interrupt
  net: mvneta: add missing bit descriptions for interrupt masks and causes
  net: mvneta: do not schedule in mvneta_tx_timeout
  net: mvneta: use per_cpu stats to fix an SMP lock up
  net: mvneta: increase the 64-bit rx/tx stats out of the hot path
  Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan"
  staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
  x86_64/entry/xen: Do not invoke espfix64 on Xen
  x86, espfix: Make it possible to disable 16-bit support
  x86, espfix: Make espfix64 a Kconfig option, fix UML
  x86, espfix: Fix broken header guard
  x86, espfix: Move espfix definitions into a separate header file
  x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack
  Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"
  timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
  printk: rename printk_sched to printk_deferred
  iio: buffer: Fix demux table creation
  staging: vt6655: Fix disassociated messages every 10 seconds
  mm, thp: do not allow thp faults to avoid cpuset restrictions
  scsi: handle flush errors properly
  rapidio/tsi721_dma: fix failure to obtain transaction descriptor
  cfg80211: fix mic_failure tracing
  ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
  crypto: af_alg - properly label AF_ALG socket
  Linux 3.10.51
  core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
  x86/efi: Include a .bss section within the PE/COFF headers
  s390/ptrace: fix PSW mask check
  Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
  mm: hugetlb: fix copy_hugetlb_page_range()
  x86_32, entry: Store badsys error code in %eax
  hwmon: (smsc47m192) Fix temperature limit and vrm write operations
  parisc: Remove SA_RESTORER define
  coredump: fix the setting of PF_DUMPCORE
  Input: fix defuzzing logic
  slab_common: fix the check for duplicate slab names
  slab_common: Do not check for duplicate slab names
  tracing: Fix wraparound problems in "uptime" trace clock
  blkcg: don't call into policy draining if root_blkg is already gone
  ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode)
  libata: introduce ata_host->n_tags to avoid oops on SAS controllers
  libata: support the ata host which implements a queue depth less than 32
  block: don't assume last put of shared tags is for the host
  block: provide compat ioctl for BLKZEROOUT
  media: tda10071: force modulation to QPSK on DVB-S
  media: hdpvr: fix two audio bugs
  Linux 3.10.50
  ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
  sched: Fix possible divide by zero in avg_atom() calculation
  locking/mutex: Disable optimistic spinning on some architectures
  PM / sleep: Fix request_firmware() error at resume
  dm cache metadata: do not allow the data block size to change
  dm thin metadata: do not allow the data block size to change
  alarmtimer: Fix bug where relative alarm timers were treated as absolute
  drm/radeon: avoid leaking edid data
  drm/qxl: return IRQ_NONE if it was not our irq
  drm/radeon: set default bl level to something reasonable
  irqchip: gic: Fix core ID calculation when topology is read from DT
  irqchip: gic: Add support for cortex a7 compatible string
  ring-buffer: Fix polling on trace_pipe
  mwifiex: fix Tx timeout issue
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  ipv4: fix buffer overflow in ip_options_compile()
  dns_resolver: Null-terminate the right string
  dns_resolver: assure that dns_query() result is null-terminated
  sunvnet: clean up objects created in vnet_new() on vnet_exit()
  net: pppoe: use correct channel MTU when using Multilink PPP
  net: sctp: fix information leaks in ulpevent layer
  tipc: clear 'next'-pointer of message fragments before reassembly
  be2net: set EQ DB clear-intr bit in be_open()
  netlink: Fix handling of error from netlink_dump().
  net: mvneta: Fix big endian issue in mvneta_txq_desc_csum()
  net: mvneta: fix operation in 10 Mbit/s mode
  appletalk: Fix socket referencing in skb
  tcp: fix false undo corner cases
  igmp: fix the problem when mc leave group
  net: qmi_wwan: add two Sierra Wireless/Netgear devices
  net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2
  ipv4: icmp: Fix pMTU handling for rare case
  tcp: Fix divide by zero when pushing during tcp-repair
  bnx2x: fix possible panic under memory stress
  net: fix sparse warning in sk_dst_set()
  ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
  ipv4: fix dst race in sk_dst_get()
  8021q: fix a potential memory leak
  net: sctp: check proc_dointvec result in proc_sctp_do_auth
  tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
  ip_tunnel: fix ip_tunnel_lookup
  shmem: fix splicing from a hole while it's punched
  shmem: fix faulting into a hole, not taking i_mutex
  shmem: fix faulting into a hole while it's punched
  iwlwifi: dvm: don't enable CTS to self
  igb: do a reset on SR-IOV re-init if device is down
  hwmon: (adt7470) Fix writes to temperature limit registers
  hwmon: (da9052) Don't use dash in the name attribute
  hwmon: (da9055) Don't use dash in the name attribute
  tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
  tracing: Fix graph tracer with stack tracer on other archs
  fuse: handle large user and group ID
  Bluetooth: Ignore H5 non-link packets in non-active state
  Drivers: hv: util: Fix a bug in the KVP code
  media: gspca_pac7302: Add new usb-id for Genius i-Look 317
  usb: Check if port status is equal to RxDetect

Signed-off-by: Ian Maund <imaund@codeaurora.org>
2015-05-01 13:34:57 -07:00
Al Viro 6637ecd306 move d_rcu from overlapping d_child to overlapping d_alias
commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ben Hutchings <ben@decadent.org.uk>
[hujianyang: Backported to 3.10 refer to the work of Ben Hutchings in 3.2:
 - Apply name changes in all the different places we use d_alias and d_child
 - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29 10:34:00 +02:00
Peter Hurley 391f1c610a console: Fix console name size mismatch
commit 30a22c215a0007603ffc08021f2e8b64018517dd upstream.

commit 6ae9200f2c ("enlarge console.name") increased the storage
for the console name to 16 bytes, but not the corresponding
struct console_cmdline::name storage. Console names longer than
8 bytes cause read beyond end-of-string and failure to match
console; I'm not sure if there are other unexpected consequences.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-19 10:10:51 +02:00
Peter Zijlstra a49a0c95f4 perf: Fix irq_work 'tail' recursion
commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream.

Vince reported a watchdog lockup like:

	[<ffffffff8115e114>] perf_tp_event+0xc4/0x210
	[<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160
	[<ffffffff810b7f10>] lock_release+0x130/0x260
	[<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40
	[<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80
	[<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0
	[<ffffffff811f71ce>] send_sigio+0xae/0x100
	[<ffffffff811f72b7>] kill_fasync+0x97/0xf0
	[<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0
	[<ffffffff8115d103>] perf_pending_event+0x33/0x60
	[<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80
	[<ffffffff8114e448>] irq_work_run+0x18/0x40
	[<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0
	[<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-13 14:02:12 +02:00
Linux Build Service Account bb5760c08d Merge "sched: fix race conditions where HMP tunables change" 2015-04-07 01:00:42 -07:00
Linux Build Service Account 07ecb8c824 Merge "sched: check HMP scheduler tunables validity" 2015-04-07 01:00:41 -07:00
Linux Build Service Account 93adc6a54e Merge "exit: Add PANIC_ON_RECURSIVE_FAULT Kconfig option" 2015-04-07 01:00:29 -07:00
Joonwoo Park dfc7b64857 sched: fix race conditions where HMP tunables change
When multiple threads race to update HMP scheduler tunables, at present,
the tunables which require big/small task count fix-up can be updated
without fix-up and it can trigger BUG_ON().
This happens because sched_hmp_proc_update_handler() acquires rq locks and
does fix-up only when number of big/small tasks affecting tunables are
updated even though the function sched_hmp_proc_update_handler() calls
set_hmp_defaults() which re-calculates all sysctl input data at that
point.  Consequently a thread that is trying to update a tunable which does
not affect big/small task count can call set_hmp_defaults() and update
big/small task count affecting tunable without fix-up if there is another
thread and it just set fix-up needed sysctl value.

Example of problem scenario :
thread 0                               thread 1
Set sched_small_task – needs fix up.
                                       Set sched_init_task_load – no fix
                                       up needed.
proc_dointvec_minmax() completed
which means sysctl_sched_small_task has
new value.
                                       Call set_hmp_defaults() without
                                       lock/fixup. set_hmp_defaults() still
                                       updates sched_small_tasks with new
                                       sysctl_sched_small_task value by
                                       thread 0.

Fix such issue by embracing proc update handler with already existing
policy mutex.

CRs-fixed: 812443
Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-04-06 18:34:59 -07:00
Joonwoo Park eeb8ce07ec sched: check HMP scheduler tunables validity
Check tunables validity to take valid values only.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-04-06 18:34:31 -07:00
Joonwoo Park 93b96a6fc9 timer: reduce cache bouncing of deferral timer wheel
Commit a40f752 (timer: make deferrable cpu unbound timers really not
bound to a cpu) made any non-idle CPUs in the system can service expired
deferral timer entries.  This means deferral timer entries can be
serviced as early as possible after it expires but in many cases it's
suboptimal since switching the CPU of timer wheel incurs a cache
bouncing/synchronization cost.

Reduce cache bouncing by servicing the deferrable timer wheel with timer
tick CPU as it doesn't bounce as often but still guarantees to run timer
softirq when there is any CPU that isn't idle.

CRs-fixed: 815184
Change-Id: I6d51390832538ad90eccf459e90bff0b25aad09f
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-04-04 01:52:51 -07:00
Matt Wagantall 1b5958bc7c exit: Add PANIC_ON_RECURSIVE_FAULT Kconfig option
If a recursive fault is detected during do_exit(), tasks are left
to sit and wait in an un-interruptible sleep until the system
reboots (typically manually). Add Kconfig option to change this
behaviour and force a panic.

This is particularly important if a critical system task encounters
a recursive fault (ex. a kworker). Otherwise, the system may be
unusable, but since the scheduler is still running system watchdogs
may continue to be pet.

Change-Id: Ifc26fc79d6066f05a3b2c4d27f78bf4f8d2bd640
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2015-03-31 23:54:50 -07:00
Tejun Heo d8bee0e3ab workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream.

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Rabin Vincent <rabin.vincent@axis.com>
Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
Tested-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26 15:00:58 +01:00
Syed Rameez Mustafa 9ce460cacd sched: Ensure attempting load balance when HMP active balance flags are set
find_busiest_group() can end up returning a NULL group due to load based
checks even though there are tasks that can be migrated to higher capacity
CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible
(LBF_EA_ACTIVE_BALANCE). To get best power and performance ensure that load
balance does attempt to pull tasks when HMP_ACTIVE_BALANCE flag is set.
Since sched boost also falls under the same category club it into the same
generic condition.

Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-03-20 11:13:24 -07:00
Michael Scott 8dbaea2b3d PM / QoS: remove duplicate call to pm_qos_update_target
In 3.10.y backport patch 1dba303727,
the logic to call pm_qos_update_target was moved to __pm_qos_update_request.
However, the original code was left in function pm_qos_update_request.

Currently, if pm_qos_update_request is called where new_value !=
req->node.prio then pm_qos_update_target will be called twice in a row.
Once in pm_qos_update_request and then again in the following call to
_pm_qos_update_request.

Removing the left over code from pm_qos_update_request stops this second
call to pm_qos_update_target where the work of removing / re-adding the
new_value in the constraints list would be duplicated.

Signed-off-by: Michael Scott <michael.scott@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-18 13:22:28 +01:00
John Stultz 72e2d60926 ntp: Fixup adjtimex freq validation on 32-bit systems
commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream.

Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values)

Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.

ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.

See bugs:

  https://bugzilla.kernel.org/show_bug.cgi?id=92481
  https://bugzilla.redhat.com/show_bug.cgi?id=1188074

This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: George Joseph <george.joseph@fairview5.com>
Tested-by: George Joseph <george.joseph@fairview5.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:52 -08:00
Jay Lan ab92b84e8d kdb: fix incorrect counts in KDB summary command output
commit 146755923262037fc4c54abc28c04b1103f3cc51 upstream.

The output of KDB 'summary' command should report MemTotal, MemFree
and Buffers output in kB. Current codes report in unit of pages.

A define of K(x) as
is defined in the code, but not used.

This patch would apply the define to convert the values to kB.
Please include me on Cc on replies. I do not subscribe to linux-kernel.

Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:52 -08:00
Vikram Mulukutla 0d998434ce tracing: Fix unmapping loop in tracing_mark_write
commit 7215853e985a4bef1a6c14e00e89dfec84f1e457 upstream.

Commit 6edb2a8a38 introduced
an array map_pages that contains the addresses returned by
kmap_atomic. However, when unmapping those pages, map_pages[0]
is unmapped before map_pages[1], breaking the nesting requirement
as specified in the documentation for kmap_atomic/kunmap_atomic.

This was caught by the highmem debug code present in kunmap_atomic.
Fix the loop to do the unmapping properly.

Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.org

Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Lime Yang <limey@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06 14:40:50 -08:00
Srivatsa Vaddagiri cbf9e59b96 sched: Add cgroup-based criteria for upmigration
It may be desirable to discourage upmigration of tasks belonging to
some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
discourages upmigration of tasks of a cgroup. Tasks of the cgroup are
allowed to upmigrate only under overcommitted scenario.

Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-03-05 15:50:56 -08:00
Joonwoo Park ed177c3203 sched: actively migrate big tasks on power CPU to idle performance CPU
When performance CPU runs idle or newly idle load balancer to pull a
task on power efficient CPU, the load balancer always fails and enters
idle mode if the big task on the power efficient CPU is running.  This is
suboptimal when the running task on the power efficient CPU doesn't fit
on the power efficient CPU as it's quite possible that the big task will
sustain on the power efficient CPU until it's preempted while there is
a performance CPU sitting idle.

Revise load balancer algorithm to actively migrate big tasks on power
efficient CPU to performance CPU when performance CPU runs idle or newly
idle load balancer.

Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-04 11:30:27 -08:00
Linux Build Service Account a234225810 Merge "sched: avoid running idle_balance() on behalf of wrong CPU" 2015-03-03 18:27:22 -08:00
Linux Build Service Account ded4c87aba Merge "sched: Avoid pulling big tasks to the little cluster during load balance" 2015-03-03 18:27:21 -08:00
Joonwoo Park d577d4a23c sched: avoid running idle_balance() on behalf of wrong CPU
With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of most
power efficient idle CPU among the CPUs in its sched domain level under the
condition that the substitute idle CPU should be limited to a CPU which has
the same capacity with original idle CPU.
It is found that at present idle_balance() spans all the CPUs in its sched
domain and run idle balancer on behalf of any CPU within the domain which
could be all the CPUs in the system which consequently makes idle balancer
on a performance CPU always runs on behalf of a power efficient idle CPU.
This would cause for idle performance CPUs to fail to pull tasks from power
efficient CPUs always when there is only an online performance CPU.

Limit search CPUs to cache sharing CPUs with original idle CPU to ensure to
run idle balancre on behalf of more power efficient CPU but still has the
same capacity with original CPU to fix such issue.

Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-03-01 08:41:23 -08:00
Syed Rameez Mustafa 70f7aae2e8 sched: Avoid pulling big tasks to the little cluster during load balance
When a lower capacity CPU attempts to pull work from a higher capacity CPU,
during load balance, it does not distinguish between tasks that will fit
or not fit on the destination CPU. This causes suboptimal load balancing
decisions whereby big tasks end up on the lower capacity CPUs and little
tasks remain on higher capacity CPUs. Avoid this behavior, by first
restricting search to only include tasks that fit on the destination CPU.
If such a task cannot be found, remove this restriction so that any task
can be pulled over to the destination CPU. This behavior is not applicable
during sched_boost, however, as none of the tasks will fit on a lower
capacity CPU.

Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-02-28 16:50:45 -08:00
Syed Rameez Mustafa 7b0a1a7e87 sched: Avoid pulling all tasks from a CPU during load balance
When running load balance, the destination CPU checks the number
of running tasks on the busiest CPU without holding the busiest
CPUs runqueue lock. This opens the load balancer to a race whereby
a third CPU running load balance at the same time; having found the
same busiest group and queue, may have already pulled one of the
waiting tasks from the busiest CPU. Under scenarios where the source
CPU is running the idle task and only a single task remains waiting on
the busiest runqueue (nr_running = 1), the destination CPU will end
up pulling the only enqueued task from that CPU, leaving the destination
CPU with nothing left to run. Fix this race, by reconfirming nr_running
for the busiest CPU, after its runqueue lock has been obtained.

Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-02-28 16:50:37 -08:00
Srivatsa Vaddagiri 8f9ba192b6 sched: Keep track of average nr_big_tasks
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.

Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-26 10:37:26 -08:00
Srivatsa Vaddagiri 3f116af70d sched: Fix bug in average nr_running and nr_iowait calculation
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.

* sched_update_nr_prod() needs to consider that nr_running count can
  change by more than 1 when CFS_BANDWIDTH feature is used

* sched_get_nr_running_avg() needs to sum up nr_iowait count across
  all cpus, rather than just one

* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
  as a result of which it could use curr_time which is behind a cpu's
  'last_time' value. That would lead to erroneous calculation of
  average nr_running or nr_iowait.

While at it, fix also a bug in BUG_ON() check in
sched_update_nr_prod() function and remove unnecessary nr_running
argument to sched_update_nr_prod() function.

Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-26 10:33:52 -08:00
Olav Haugan f0d9067045 sched/fair: Respect wake to idle over sync wakeup
Sync wakeup currently takes precedence over wake to idle flag. A sync
wakeup causes a task to be placed on a non-idle CPU because we expect
this CPU to become idle very shortly. However, even though the sync flag
is set there is no guarantee that the task will go to sleep right away
As a consequence performance suffers.

Fix this by preferring an idle CPU over a potential busy cpu when both
wake to idle and sync wakeup are set.

Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2015-02-24 18:35:45 -08:00
bait_dispatcher_monitor_system 1cac96e2e4 Merge "sched: fix rounding error on scaled execution time calculation" 2015-02-19 13:45:45 -08:00
Joonwoo Park 3a037e913e sched: fix rounding error on scaled execution time calculation
It's found that the scaled execution time can be less than its actual time
due to rounding errors.  The HMP scheduler accumulates scaled execution
time of tasks to determine if tasks are in need of up-migration.  But the
rounding error prevents the HMP scheduler from accumulating 100% load which
prevents us from ever reaching an up-migrate of 100%.
Fix rounding error by rounding quotient up.

CRs-fixed: 759041
Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-18 19:54:12 -08:00
Patrick Daly a5ad036f58 msm: rtb: Fix buffer corruption issue
Consider the case of a nentries==8 and 3 cpus.
Numbers in parenthesis are the equivalent location in the circular buffer.
CPU:   Index0:  Index1: Index2: Index3:
0      0        3       6       9(1)
1      1        4       7       10(2)
2      2        5       8(0)

The current design is only appropriate for the case where
nentries % nrcpus == 0.

Fix this issue by incrementing the index by (nentries % nrcpus)
each time circular buffer wraps around.

CPU:   Index0:  Index1: Index2:
0      0        3       6+2==8(0)
1      1        4       7+2==9(1)
2      2        5       8+2==10(2)

Change-Id: I4f96eb4c971cc18357e145dabcf4272e466dcda2
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2015-02-18 13:02:11 -08:00
Tejun Heo ede8177f2e workqueue: fix subtle pool management issue which can stall whole worker_pool
A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

Change-Id: If0ca1b86c06ee17b7fbb670b0e70b421a256b2b7
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: stable@vger.kernel.org
CRs-fixed: 783758
Git-commit: 29187a9eeaf362d8422e62e17a22a6e115277a49
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[joonwoop@codeaurora.org: fixed conflicts in comments and manage_workers()
 to keep maybe_destroy_workers()]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-14 00:42:27 -08:00
Thomas Gleixner d1217e81bc hrtimer: Prevent stale expiry time in hrtimer_interrupt()
hrtimer_interrupt() has the following subtle issue:

hrtimer_interrupt()
  lock(cpu_base);
  expires_next = KTIME_MAX;

  expire_timers(CLOCK_MONOTONIC);
  expires = get_next_timer(CLOCK_MONOTONIC);
  if (expires < expires_next)
    expires_next = expires;
  expire_timers(CLOCK_REALTIME);
    unlock(cpu_base);
    wakeup()
    hrtimer_start(CLOCK_MONOTONIC, newtimer);
    lock(cpu_base();
  expires = get_next_timer(CLOCK_REALTIME);
  if (expires < expires_next)
    expires_next = expires;
So because we already evaluated the next expiring timer of
CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
earlier than the overall next expiry time in hrtimer_interrupt().

To solve this, remove the caching of the next expiry value from
hrtimer_interrupt() and reevaluate all active clock bases for the next
expiry value. To avoid another code duplication, create a shared
evaluation function and use it for hrtimer_get_next_event(),
hrtimer_force_reprogram() and hrtimer_interrupt().

There is another subtlety in this mechanism:

While hrtimer_interrupt() is running, we want to avoid to touch the
hardware device because we will reprogram it anyway at the end of
hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
via the HRTIMER_RESTART mechanism, because we drop out when the
callback on that CPU is running. But that fails, if a new timer gets
enqueued like in the example above.

This has another implication: While hrtimer_interrupt() is running we
refuse remote enqueueing of timers - see hrtimer_interrupt() and
hrtimer_check_target().

hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
to KTIME_MAX, but that fails if a new timer gets queued.

Prevent both the hardware access and the remote enqueue
explicitely. We can loosen the restriction on the remote enqueue now
due to reevaluation of the next expiry value, but that needs a
seperate patch.

Folded in a fix from Vignesh Radhakrishnan.

Change-Id: I803322cc29a294eab73fa2046e9f3a2e5f66755e
Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vigneshr@codeaurora.org
Cc: john.stultz@linaro.org
Cc: viresh.kumar@linaro.org
Cc: fweisbec@gmail.com
Cc: cl@linux.com
Cc: stuart.w.hayes@gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Patch-mainline : linux-arm-kernel @ 01/23/15, 03:21
[vigneshr@codeaurora.org : Changes to the file kernel/time/hrtimer.c
is made to the file kernel/hrtimer.c]
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
2015-02-13 04:20:48 -08:00
Lai Jiangshan 677616e3ec smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
commit 4bee96860a65c3a62d332edac331b3cf936ba3ad upstream.

The following race exists in the smpboot percpu threads management:

CPU0	      	   	     CPU1
cpu_up(2)
  get_online_cpus();
  smpboot_create_threads(2);
			     smpboot_register_percpu_thread();
			     for_each_online_cpu();
			       __smpboot_create_thread();
  __cpu_up(2);

This results in a missing per cpu thread for the newly onlined cpu2 and
in a NULL pointer dereference on a consecutive offline of that cpu.

Proctect smpboot_register_percpu_thread() with get_online_cpus() to
prevent that.

[ tglx: Massaged changelog and removed the change in
        smpboot_unregister_percpu_thread() because that's an
        optimization and therefor not stable material. ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-11 14:48:17 +08:00
Biswajit Paul e758417e7c kernel: Restrict permissions of /proc/iomem.
The permissions of /proc/iomem currently are -r--r--r--. Everyone can
see its content. As iomem contains information about the physical memory
content of the device, restrict the information only to root.

Change-Id: If0be35c3fac5274151bea87b738a48e6ec0ae891
CRs-Fixed: 786116
Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>
Signed-off-by: Avijit Kanti Das <avijitnsec@codeaurora.org>
2015-02-09 16:17:30 -08:00
Tejun Heo 85be16bad7 workqueue: fix subtle pool management issue which can stall whole worker_pool
commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Eric Sandeen <sandeen@sandeen.net>
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner <david@fromorbit.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-02-05 22:35:40 -08:00
Vignesh Radhakrishnan e07ac5d19c kmemleak : Make module scanning optional using config
Currently kmemleak scans module memory as provided
in the area list. This takes up lot of time with
irq's and preemption disabled. Provide a compile
time configurable config to enable this functionality.

Change-Id: I5117705e7e6726acdf492e7f87c0703bc1f28da0
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
2015-02-04 18:42:37 +05:30
Linux Build Service Account 651997c82a Merge "sched: Remove sched_wake_to_idle for HMP scheduler" 2015-02-03 15:23:53 -08:00
Linux Build Service Account df92d19819 Merge "sched: Support CFS_BANDWIDTH feature in HMP scheduler" 2015-02-03 15:23:51 -08:00
Linux Build Service Account 54dd18b6a1 Merge "sched: Consolidate hmp stats into their own struct" 2015-02-03 15:23:50 -08:00
Linux Build Service Account d00c2b9d82 Merge "sched: Add userspace interface to set PF_WAKE_UP_IDLE" 2015-02-03 15:23:50 -08:00
Sasha Levin 2497402c9a time: adjtimex: Validate the ADJ_FREQUENCY values
commit 5e5aeb4367b450a28f447f6d5ab57d8f2ab16a5f upstream.

Verify that the frequency value from userspace is valid and makes sense.

Unverified values can cause overflows later on.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redunent cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 17:40:56 -08:00
Sasha Levin 7336dcc213 time: settimeofday: Validate the values of tv from user
commit 6ada1fc0e1c4775de0e043e1bd3ae9d065491aa5 upstream.

An unvalidated user input is multiplied by a constant, which can result in
an undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: include trivial milisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-29 17:40:56 -08:00
Srivatsa Vaddagiri 930bca74d2 sched: Remove sched_wake_to_idle for HMP scheduler
sched_wake_to_idle tunable is obsoleted by sched_prefer_idle tunable
in HMP scheduler. Remove the same when CONFIG_SCHED_HMP is defined

Change-Id: I7bcf12cc3c50df5ef09261f097711c9f29ec63a4
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:22:12 +05:30
Srivatsa Vaddagiri 2385d33016 sched: Support CFS_BANDWIDTH feature in HMP scheduler
CFS_BANDWIDTH feature is not currently well-supported by HMP
scheduler. Issues encountered include a kernel panic when
rq->nr_big_tasks count becomes negative. This patch fixes HMP
scheduler code to better handle CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.

Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:19 +05:30
Srivatsa Vaddagiri bbef4c5e1b sched: Consolidate hmp stats into their own struct
Key hmp stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_average) are currently maintained per-cpu in
'struct rq'. Merge those stats in their own structure (struct
hmp_sched_stats) and modify impacted functions to deal with the newly
introduced structure. This cleanup is required for a subsequent patch
which fixes various issues with use of CFS_BANDWIDTH feature in HMP
scheduler.

Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:14 +05:30
Linux Build Service Account 3ff6a5a197 Merge "tracing: power: Add trace events for core control" 2015-01-27 06:53:12 -08:00
Srivatsa Vaddagiri 7e767d3e45 sched: Add userspace interface to set PF_WAKE_UP_IDLE
sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0
so that most tasks wake up to non-idle cpus under mostly_idle
threshold and have specialized tasks override this behavior through
other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It
lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available
idle cpu independent of sched_prefer_idle flag setting. Currently
only kernel-space API exists to set PF_WAKE_UP_IDLE flag for a task.
This patch adds a user-space API (in /proc filesystem) to set
PF_WAKE_UP_IDLE flag for a given task. /proc/[pid]/sched_wake_up_idle
file can be written to set or clear PF_WAKE_UP_IDLE flag for a given
task.

Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-27 19:39:57 +05:30
Junjie Wu 6c68b1215d tracing: power: Add trace events for core control
Add trace events for core control module.

Change-Id: I36da5381709f81ef1ba82025cd9cf8610edef3fc
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
2015-01-22 17:31:16 -08:00
Jiri Olsa d525563b50 perf: Fix events installation during moving group
commit 9fc81d87420d0d3fd62d5e5529972c0ad9eab9cc upstream.

We allow PMU driver to change the cpu on which the event
should be installed to. This happened in patch:

  e2d37cd213 ("perf: Allow the PMU driver to choose the CPU on which to install events")

This patch also forces all the group members to follow
the currently opened events cpu if the group happened
to be moved.

This and the change of event->cpu in perf_install_in_context()
function introduced in:

  0cda4c0231 ("perf: Introduce perf_pmu_migrate_context()")

forces group members to change their event->cpu,
if the currently-opened-event's PMU changed the cpu
and there is a group move.

Above behaviour causes problem for breakpoint events,
which uses event->cpu to touch cpu specific data for
breakpoints accounting. By changing event->cpu, some
breakpoints slots were wrongly accounted for given
cpu.

Vinces's perf fuzzer hit this issue and caused following
WARN on my setup:

   WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
   Can't find any breakpoint slot
   [...]

This patch changes the group moving code to keep the event's
original cpu.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16 06:59:03 -08:00
Linux Build Service Account 4c5b2873a2 Merge "sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT" 2015-01-14 17:12:07 -08:00
Linux Build Service Account a2ada2d4c8 Merge "sched: continue to search less power efficient cpu for load balancer" 2015-01-14 17:12:06 -08:00
Linux Build Service Account 81921f5681 Merge "cpu_pm: Add level to the cluster pm notification" 2015-01-13 12:24:49 -08:00
Joonwoo Park 66bd788705 sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT
Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling.  This option
can give us better debugging and verification capability.

Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Joonwoo Park 2c7cc326ed sched: continue to search less power efficient cpu for load balancer
When choosing a CPU to do power-aware active balance from the load
balancer currently selects the first eligible CPU it finds, even if
there is another eligible CPU which is higher-power. This can lead to
suboptimal load balancing behavior and extra migrations. Power and
performance will be impacted.

Achieve better power and performance by continuing to search the least
power efficient cpu as long as the cpu's load average is higher than or
equal to the busiest cpu found by far.

CRs-fixed: 777341
Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Oleg Nesterov 12279351a4 exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream.

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:17 -08:00
Eric W. Biederman afb5285388 userns: Allow setting gid_maps without privilege when setgroups is disabled
commit 66d2f338ee4c449396b6f99f5e75cd18eb6df272 upstream.

Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.

This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed.  Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:17 -08:00
Eric W. Biederman 1c587ee50e userns: Add a knob to disable setgroups on a per user namespace basis
commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream.

- Expose the knob to user space through a proc file /proc/<pid>/setgroups

  A value of "deny" means the setgroups system call is disabled in the
  current processes user namespace and can not be enabled in the
  future in this user namespace.

  A value of "allow" means the segtoups system call is enabled.

- Descendant user namespaces inherit the value of setgroups from
  their parents.

- A proc file is used (instead of a sysctl) as sysctls currently do
  not allow checking the permissions at open time.

- Writing to the proc file is restricted to before the gid_map
  for the user namespace is set.

  This ensures that disabling setgroups at a user namespace
  level will never remove the ability to call setgroups
  from a process that already has that ability.

  A process may opt in to the setgroups disable for itself by
  creating, entering and configuring a user namespace or by calling
  setns on an existing user namespace with setgroups disabled.
  Processes without privileges already can not call setgroups so this
  is a noop.  Prodcess with privilege become processes without
  privilege when entering a user namespace and as with any other path
  to dropping privilege they would not have the ability to call
  setgroups.  So this remains within the bounds of what is possible
  without a knob to disable setgroups permanently in a user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman 6eaab78729 userns: Rename id_map_mutex to userns_state_mutex
commit f0d62aec931e4ae3333c797d346dc4f188f454ba upstream.

Generalize id_map_mutex so it can be used for more state of a user namespace.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman a1821391e5 userns: Only allow the creator of the userns unprivileged mappings
commit f95d7918bd1e724675de4940039f2865e5eec5fe upstream.

If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.

Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.

Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.

This is part of the fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman f028f2d732 userns: Check euid no fsuid when establishing an unprivileged uid mapping
commit 80dd00a23784b384ccea049bfb3f259d3f973b9d upstream.

setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid.  Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.

I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman ba0922adbd userns: Don't allow unprivileged creation of gid mappings
commit be7c6dba2332cef0677fbabb606e279ae76652c3 upstream.

As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.

For a small class of applications this change breaks userspace
and removes useful functionality.  This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c

Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups.  Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.

For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.

This is part of a fix for CVE-2014-8989.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman fc9b65e3d7 userns: Don't allow setgroups until a gid mapping has been setablished
commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream.

setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.

The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established.  Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.

This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.

Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman b8a0441b54 userns: Document what the invariant required for safe unprivileged mappings.
commit 0542f17bf2c1f2430d368f44c8fcf2f82ec9e53e upstream.

The rule is simple.  Don't allow anything that wouldn't be allowed
without unprivileged mappings.

It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.

This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.

The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it.  So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated.  Violated expectations about the
behavior of the OS is a long way to say a security issue.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Eric W. Biederman 4accc8c8e2 groups: Consolidate the setgroups permission checks
commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream.

Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged.  Add a common function so that
they may all share the same permission checking code.

This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.

A user namespace security fix will update this new helper, shortly.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-08 09:58:16 -08:00
Murali Nalajala 16323e6eef cpu_pm: Add level to the cluster pm notification
Cluster pm notifications without level information increases difficulty
and complexity for the registered drivers to figure out when the last
coherency level is going into power collapse.

Send notifications with level information that allows the registered
drivers to easily determine the cluster level that is going in/out of
power collapse.

There is an issue with this implementation. GIC driver saves and
restores the distributed registers as part of cluster notifications. On
newer platforms there are multiple cluster levels are defined (e.g l2,
cci etc). These cluster level notofications can happen independently.
On MSM platforms GIC is still active while the cluster sleeps in idle,
causing the GIC state to be overwritten with an incorrect previous state
of the interrupts. This leads to a system hang. Do not save and restore
on any L2 and higher cache coherency level sleep entry and exit.

Change-Id: I31918d6383f19e80fe3b064cfaf0b55e16b97eb6
Signed-off-by: Archana Sathyakumar <asathyak@codeaurora.org>
Signed-off-by: Murali Nalajala <mnalajal@codeaurora.org>
2015-01-07 22:31:58 -08:00
Syed Rameez Mustafa 13e853e988 sched: Update cur_freq for offline CPUs in notifier callback
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot removed CPU's cur_freq information
can get stale if there is a frequency change while that CPU is offline.
When the offline CPU is hotplugged back in, all subsequent load
calculations are based off the stale information until another frequency
change occurs and the corresponding set of notifications are sent out.
Avoid this incorrect load tracking by updating the cur_freq for all
CPUs in the same frequency domain.

Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-01-05 15:36:33 -08:00
Linux Build Service Account 05e6c29de4 Merge "sched: add preference for prev_cpu in HMP task placement" 2014-12-29 17:31:49 -08:00
Linux Build Service Account 380cadc7f3 Merge "sched: Per-cpu prefer_idle flag" 2014-12-29 17:31:47 -08:00
Linux Build Service Account 307e71816e Merge "sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()" 2014-12-29 17:31:47 -08:00
Olav Haugan 30d383d45b sched: Fix overflow in max possible capacity calculation
The max possible capacity calculation might overflow given large enough
max possible frequency and capacity. Fix potential for overflow.

Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-26 11:25:32 -08:00
Steve Muckle cecf6c46cd sched: add preference for prev_cpu in HMP task placement
At present the HMP task placement algorithm scans CPUs in numerical
order and if two identical options are found, the first one
encountered is chosen, even if it is different from the task's
previous CPU.

Add a bias towards the task's previous CPU in such situations. Any
time two or more CPUs are considered equivalent (load, C-state, power
cost), if one of them is the task's previous CPU, bias towards that
CPU. The algorithm is otherwise unchanged.

CRs-Fixed: 772033
Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-23 15:54:35 -08:00
Linux Build Service Account 5da5828a23 Merge "sched: Add sysctl to enable power aware scheduling" 2014-12-23 13:21:41 -08:00
Linux Build Service Account 4ef567e8bc Merge "sched: Ensure no active EA migration occurs when EA is disabled" 2014-12-23 05:58:15 -08:00
Srivatsa Vaddagiri 599bfc7503 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to same for all
cpus in a cluster. It thus provides convenient means to disable
packing in one cluster while allowing packing in another cluster.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:43 +05:30
Srivatsa Vaddagiri 92ba1d55f3 sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()
sysctl_sched_prefer_idle controls selection of idle cpus for waking
tasks. In some cases, waking to idle cpus help performance while in
other cases it hurts (as tasks incur latency associated with C-state
wakeup). Its ideal if scheduler can adapt prefer_idle behavior based
on the task that is waking up, but that's hard for scheduler to
figure by itself. PF_WAKE_UP_IDLE hint can be provided by external
module/driver in such case to guide scheduler in preferring an idle
cpu for select tasks irrespective of sysctl_sched_prefer_idle flag.

This patch enhances select_best_cpu() to consider PF_WAKE_UP_IDLE
hint. Wakeup posted from any task that has PF_WAKE_UP_IDLE set is a
hint for scheduler to prefer idle cpu for waking tasks. Similarly
scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on
idle cpu when they wakeup.

CRs-Fixed: 773101
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:27 +05:30
Olav Haugan 7e13b27b8b sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:37:33 -08:00
Olav Haugan 294b88dc67 sched: Ensure no active EA migration occurs when EA is disabled
There exists a flag called "sched_enable_power_aware" that is not honored
everywhere. Fix this.

Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:23:55 -08:00
Vignesh Radhakrishnan 230faea50d irq_work: register irq_work_cpu_notify in early init
Currently irq_work_cpu_notify is registered using
device_initcall().

In cases where CPU is hotplugged early (example
would be thermal engine hotplugging CPU), there
are chances where irq_work_cpu_notifier has not
even registered, but CPU is already hotplugged out.
irq_work uses CPU_DYING notifier to clear out the
pending irq_work. But since the cpu notifier is not
even registered that early, pending irq_work
items are never run since this pending list is
percpu.

One specific scenario where this is impacting
the system is, rcu framework using irq_work
to wakeup and complete cleanup operations. In this
scenario we notice that RCU operations needs cleanup
on the hotplugged CPU.

Fix this by registering irq_work_cpu_notify
in early init.

CRs-Fixed: 768180
Change-Id: Ibe7f5c77097de7a342eeb1e8d597fb2f72185ecf
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
2014-12-22 14:30:12 +05:30
Joonwoo Park fc994a4b9e sched: take account of irq preemption when calculating irqload delta
If irq raises while sched_irqload() is calculating irqload delta,
sched_account_irqtime() can update rq's irqload_ts which can be greater
than the jiffies stored in sched_irqload()'s context so delta can be
negative.  This negative delta means there was recent irq occurence.
So remove improper BUG_ON().

CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 16:56:50 -08:00
Joonwoo Park 2cec55a2e2 sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when there is a task which was
considered as non-big task due to its nice > upmigrate_min_nice and later
upmigrate_min_nice is changed to higher value so the task becomes big task.
In this case runqueue still has nr_big_tasks = 0 incorrectly with current
implementation.  Consequently next scheduler tick sees a big task to
schedule and try to decrease nr_big_tasks which is already 0.

Introduce sched_upmigrate_min_nice which is updated atomically and re-count
the number of big and small tasks to fix BUG_ON() triggering.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 11:05:22 -08:00
Olav Haugan 4beca1fd4d sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allow task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick
path. Extend the usage of the tunable to take into account the load
balancer (lb) path also.

In addition ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:49 -08:00
Olav Haugan a7bc092692 sched: Avoid migrating tasks to little cores due to EA
If during the check whether migration is needed we find that there is a
lower power CPU available we commence to find a new CPU for this task.
However, by the time we search for a new CPU the lower power CPU might
no longer be available. We should abort the attempt to migrate a task in
this case.

CRs-Fixed: 764788
Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 2c320f2ffa sched: Add temperature to cpu_load trace point
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.

CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 0c0d18bb15 sched: Only do EA migration when CPU throttling is imminent
We do not want to migrate tasks unnecessary to avoid cache hit and other
migration latencies that could affect the performance of the system. Add
a check to only try EA migration when CPU frequency throttling is
imminent.

CRs-Fixed: 764788
Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:34:55 -08:00
Srivatsa Vaddagiri 32c6ac7c62 sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when it goes idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as more favorable target for running a task,
leading to task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by sysctl_sched_min_runtime
variable).

Note that currently sysctl_sched_min_runtime setting is considered
only in scheduler_tick()->migration_needed() path and not in
idle_balance() path. In other words, a task could be migrated to
another cpu which did a idle_balance(). This limitation should not
affect high-frequency migrations seen typically (when a single
high-demand task runs on high-performance cpu).

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-13 06:34:55 -08:00
Steve Muckle 1bfb9a0dd3 sched: treat sync waker CPUs with 1 task as idle
When a CPU with one task performs a sync wakeup, its
one task is expected to sleep immediately so this CPU
should be treated as idle for the purposes of CPU selection
for the waking task.

This is only done when idle CPUs are the preferred targets
for non-small task wakeups. When prefer_idle is 0, the
CPU is left as non-idle in the selection logic so it is still
a preferred candidate for the sync wakeup.

Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:58 -08:00
Syed Rameez Mustafa b3c5c54d72 sched: extend sched_task_load tracepoint to indicate prefer_idle
Prefer idle determines whether the scheduler prefers an idle CPU
over a busy CPU or not to wake up a task on. Knowing the correct
value of this tunable is essential in understanding placement
decisions made in select_best_cpu().

Change-Id: I955d7577061abccb65d01f560e1911d9db70298a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-10 23:53:57 -08:00
Steve Muckle 84370f934b sched: extend sched_task_load tracepoint to indicate sync wakeup
Sync wakeups provide a hint to the scheduler about upcoming task
activity. Knowing which wakeups are sync wakeups from logs will
assist in workload analysis.

Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:56 -08:00
Steve Muckle ee9ddb5f3c sched: add sync wakeup recognition in select_best_cpu
If a wakeup is a sync wakeup, we need to discount the currently
running task's load from the waker's CPU as we calculate the best
CPU for the waking task to land on.

Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:55 -08:00
Srivatsa Vaddagiri 6e778f0cdc sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-10 23:53:54 -08:00
Steve Muckle 75d1c94217 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the scehd_cpu_high_irqload
setting easily, so make it a runtime tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:53 -08:00
Steve Muckle 00acd0448b sched: trace: extend sched_cpu_load to print irqload
The irqload is used in determining whether CPUs are mostly idle
so it is useful to know this value while viewing scheduler traces.

Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:51 -08:00
Steve Muckle 51f0d7663b sched: avoid CPUs with high irq activity
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.

Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:47 -08:00
Steve Muckle a14e01109a sched: refresh sched_clock() after acquiring rq lock in irq path
The wallclock time passed to sched_account_irqtime() may be stale
after we wait to acquire the runqueue lock. This could cause problems
in update_task_ravg because a different CPU may have advanced
this CPU's window_start based on a more up-to-date wallclock value,
triggering a BUG_ON(window_start > wallclock).

Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:46 -08:00
Steve Muckle 5fdc1d3aaa sched: track soft/hard irqload per-RQ with decaying avg
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.

Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):

irqload per window        load value asymptote      # windows to > 10
2ms			  8			    n/a
3ms			  12			    7
4ms			  16			    4
5ms			  20			    3

Of course irqload will not be constant in each window, these are just
given as simple examples.

Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:45 -08:00
Steve Muckle c5c90f6099 sched: do not set window until sched_clock is fully initialized
The system initially uses a jiffy-based sched clock. When the platform
registers a new timer for sched_clock, sched_clock can jump backwards.
Once sched_clock_postinit() runs it should be safe to rely on it.

Also sched_clock_cpu() relies on completion of sched_clock_init()
and until that happens sched_clock_cpu() returns zero. This is used
in the irq accounting path which window-based stats relies upon.
So do not set window_start until sched_clock_cpu() is working.

Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:44 -08:00
Andy Lutomirski 0061b518b1 uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME
commit 82975bc6a6df743b9a01810fb32cb65d0ec5d60b upstream.

x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but
not on non-paranoid returns.  I suspect that this is a mistake and that
the code only works because int3 is paranoid.

Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround
for the x86 bug.  With that bug fixed, we can remove _TIF_NOTIFY_RESUME
from the uprobes code.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-12-06 15:05:46 -08:00
Linux Build Service Account fea8806a70 Merge "sched: Fix inaccurate accounting for real-time task" 2014-12-06 14:38:08 -08:00
Linux Build Service Account cd2d717655 Merge "Revert "sched: update_rq_clock() must skip ONE update"" 2014-12-06 14:38:07 -08:00
Linux Build Service Account b4229d736e Merge "sched: Make RT tasks eligible for boost" 2014-12-05 00:05:48 -08:00
Syed Rameez Mustafa fce95c9a12 sched: Make RT tasks eligible for boost
During sched boost RT tasks currently end up going to the lowest
power cluster. This can be a performance bottleneck especially if
the frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.

Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-03 19:50:25 -08:00
Linux Build Service Account a5f4e12c8d Merge "sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster" 2014-12-03 16:31:51 -08:00
Linux Build Service Account 6be10b2d68 Merge "sched: Packing support until a frequency threshold" 2014-12-03 16:31:32 -08:00
Linux Build Service Account 0899fc9f81 Merge "irq: smp_affinity: Initialize work struct only once" 2014-12-03 16:31:31 -08:00
Srivatsa Vaddagiri 66b5ce9db0 sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster
When higher power (performance) cluster has only one online cpu, we
currently let an idle cpu in lower power cluster pull a running task
from performance cluster via active balance. Active balance for
power-aware reasons is supposed to be restricted to balance within
cluster, the check for which is not correctly implemented.

Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:28:14 +05:30
Srivatsa Vaddagiri 8fd5aa3bf2 sched: Fix inaccurate accounting for real-time task
It is possible that rq->clock_task was not updated in put_prev_task()
in which case we can potentially overcharge a real-time task for time
it did not run. This is because clock_task could be stale and not
represent the exact time real-time task started running.

Fix this by forcing update of rq->clock_task when real-time task
starts running.

Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:09:11 +05:30
Srivatsa Vaddagiri 21357f54c1 Revert "sched: update_rq_clock() must skip ONE update"
This reverts commit ab2ff007fe
as it was found to cause some performance regressions

Change-Id: Idd71fb04c77f5c9b0bc6bccc66b94ab5a7368471
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 14:46:32 +05:30
Srivatsa Vaddagiri 57da62614c sched: Packing support until a frequency threshold
Add another dimension for task packing based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in cluster as long as
cluster frequency is less than set threshold.

Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 11:48:30 +05:30
Linux Build Service Account 72de2d6cb8 Merge "sched: update_rq_clock() must skip ONE update" 2014-11-30 16:19:16 -08:00
Linux Build Service Account b4b0ebc5f9 Merge "sched: tighten up jiffy to sched_clock mapping" 2014-11-29 17:17:43 -08:00
Praveen Chidambaram ee7ee8aa74 irq: smp_affinity: Initialize work struct only once
The work function to handle the irq affinity change is currently being
set from setup_affinity, which is called whenever the affinity changes.
Initialize the work function only once when the irq desc object's
defaults are set up.

CRs-Fixed: 756463
Change-Id: I66732f8c01cba166c41ce89c329d313eeaea8a7d
Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
2014-11-25 11:33:31 -07:00
Srivatsa Vaddagiri ab2ff007fe sched: update_rq_clock() must skip ONE update
Prevent large wakeup latencies from being accounted to the wrong task.

Change-Id: Ie9932acb8a733989441ff2dd51c50a2626cfe5c5
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
CRs-Fixed: 755576
Patch-mainline: http://permalink.gmane.org/gmane.linux.kernel/1677324
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-25 12:28:35 +05:30
Linux Build Service Account 6383a10831 Merge "perf: Add queued work to remove orphaned child events" 2014-11-22 08:32:07 -08:00
Linux Build Service Account e999189413 Merge "perf: Set owner pointer for kernel events" 2014-11-22 08:32:07 -08:00
Jiri Olsa 6cd67d5a13 perf: Add queued work to remove orphaned child events
In cases when the  owner task exits before the workload and the
workload made some forks, all the events stay in until the last
workload process exits. Thats' because each child event holds
parent reference.

We want to release all children events once the parent is gone,
because at that time there's no process to read them anyway, so
they're just eating resources.

This removal  races with process exit, which removes all events
and fork, which clone events.  To be clear of those two, adding
work queue to remove orphaned child for context in case such
event is detected.

Using delayed work queue (with delay == 1), because we queue this
work under perf scheduler callbacks. Normal work queue tries to wake
up the queue process, which deadlocks on rq->lock in this place.

Also preventing clones from abandoned parent event.

Change-Id: I20b674d9b56910828444e29a9c0756daac1b4680
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406896382-18404-4-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: fadfe7be6e50de7f03913833b33c56cd8fb66bac
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[sheetals@codeaurora.org: fixed merge conflicts]
Signed-off-by: Sheetal Sahasrabudhe <sheetals@codeaurora.org>
2014-11-21 14:35:38 -05:00
Jiri Olsa 18bebb752b perf: Set owner pointer for kernel events
Adding fake EVENT_OWNER_KERNEL owner pointer value for kernel perf
events, so we could distinguish it from user events, which needs
special care in following patch.

Change-Id: I975186151644af709d7fdfc13f1ce9d2ebd4c83b
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1406896382-18404-3-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: f86977620ee4635f26befcf436700493a38ce002
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[sheetals@codeaurora.org: fixed merge conflicts]
Signed-off-by: Sheetal Sahasrabudhe <sheetals@codeaurora.org>
2014-11-21 14:35:16 -05:00
Pawel Moll 858879737b perf: Handle compat ioctl
commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream.

When running a 32-bit userspace on a 64-bit kernel (eg. i386
application on x86_64 kernel or 32-bit arm userspace on arm64
kernel) some of the perf ioctls must be treated with special
care, as they have a pointer size encoded in the command.

For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded
as 0x80042407, but 64-bit kernel will expect 0x80082407. In
result the ioctl will fail returning -ENOTTY.

This patch solves the problem by adding code fixing up the
size as compat_ioctl file operation.

Reported-by: Drew Richardson <drew.richardson@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:55 -08:00
Miklos Szeredi bd501a2eb2 audit: keep inode pinned
commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream.

Audit rules disappear when an inode they watch is evicted from the cache.
This is likely not what we want.

The guilty commit is "fsnotify: allow marks to not pin inodes in core",
which didn't take into account that audit_tree adds watches with a zero
mask.

Adding any mask should fix this.

Fixes: 90b1e7a578 ("fsnotify: allow marks to not pin inodes in core")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-21 09:22:52 -08:00
Linux Build Service Account aa285b577f Merge "sched: per-cpu mostly_idle threshold" 2014-11-20 15:36:30 -08:00
Linux Build Service Account 62b1d26801 Merge "sched: Add API to set task's initial task load" 2014-11-20 15:36:29 -08:00
Steve Muckle f17fe85baf sched: tighten up jiffy to sched_clock mapping
The tick code already tracks exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy
to sched_clock mapping that aligns windows between a caller
of sched_set_window and the scheduler itself.

Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-19 15:06:33 -08:00
Ram Chandrasekar ecd44b989f printk: Make the console flush configurable in hotplug path
Make console flush configurable in hot plug code path.

The thread which initiates the hot plug can get scheduled
out, while trying to acquire the console lock,
thus increasing the hot plug latency. This option
allows to selectively disable the console flush and
in turn reduce the hot plug latency.

Change-Id: I0e6cd50a7e2ab14cab987815e47352b6b71f187a
Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>
2014-11-18 19:16:25 -07:00
Patrick Daly 5a52a6e15e printk: Save interrupt flags when enabling __log_oops_buf
oops_printk_start() may be called with interrupts disabled; save and
restore the interrupt flags properly.

Found after a spinlock was acquired recursively in the following backtrace:

\raw_spin_lock_irq
\run_timer_softirq
...(skipped)
\el1_irq
()
\printk\oops_printk_start
\panic\oops_enter
\traps\die
\fault\__do_kernel_fault.part.5
\fault\do_page_fault
\fault\do_translation_fault
\fault\do_mem_abort
()
\get_next_timer_interrupt
\tick-sched\__tick_nohz_idle_enter
\tick-sched\tick_nohz_idle_enter
\idle\cpu_startup_entry
\init/main\rest_init
\init/main\start_kernel

CRs-fixed: 754837
Change-Id: Ib9b5079b0177833d40b14ddf0f0458be5676509a
Signed-off-by: Patrick Daly <pdaly@codeaurora.org>
2014-11-17 12:30:52 -08:00
Linux Build Service Account b3491934ec Merge "idle: Implement a per-cpu idle-polling mode" 2014-11-15 20:27:53 -08:00
Linux Build Service Account 2492f77873 Merge "idle: exit the cpu_idle_poll loop if cpu_idle_force_poll is cleared" 2014-11-15 20:27:53 -08:00
Linux Build Service Account 0866af874e Merge "idle: Add a memory barrier after setting cpu_idle_force_poll" 2014-11-15 20:27:52 -08:00
Mathias Krause f9b6264a0f posix-timers: Fix stack info leak in timer_create()
commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream.

If userland creates a timer without specifying a sigevent info, we'll
create one ourself, using a stack local variable. Particularly will we
use the timer ID as sival_int. But as sigev_value is a union containing
a pointer and an int, that assignment will only partially initialize
sigev_value on systems where the size of a pointer is bigger than the
size of an int. On such systems we'll copy the uninitialized stack bytes
from the timer_create() call to userland when the timer actually fires
and we're going to deliver the signal.

Initialize sigev_value with 0 to plug the stack info leak.

Found in the PaX patch, written by the PaX Team.

Fixes: 5a9fa73072 ("posix-timers: kill ->it_sigev_signo and...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14 08:48:00 -08:00