android_kernel_lge_bullhead

Commit Graph

Author	SHA1	Message	Date
Peter Zijlstra	1f6661e256	perf: Fix fasync handling on inherited events commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream. Vince reported that the fasync signal stuff doesn't work proper for inherited events. So fix that. Installing fasync allocates memory and sets filp->f_flags \|= FASYNC, which upon the demise of the file descriptor ensures the allocation is freed and state is updated. Now for perf, we can have the events stick around for a while after the original FD is dead because of references from child events. So we cannot copy the fasync pointer around. We can however consistently use the parent's fasync, as that will be updated. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho deMelo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-09-13 09:07:59 -07:00
Joonwoo Park	b715e2a8ca	sched: look for least busy and fallback CPU only when it's needed Function best_small_task_cpu() has bias on mostly idle CPUs and shallow cstate CPUs. Thus chance of needing to find the least busy or the least power cost fallback CPU is quite rare typically. At present, however, the function finds those two CPUs always unnecessarily for most of time. Optimize the function by amending it to look for the least busy CPU and the least power cost fallback CPU only when those are in need. This change is solely for optimization and doesn't make functional changes. CRs-fixed: 849655 Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-09-11 00:33:58 -07:00
Patrick Daly	8a397a7081	Revert "rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks" This reverts commit `c0f4dfd4f9`. The above change allows the CPU to enter idle with RCU callbacks pending for the durations specified in rcu_idle_gp_delay or rcu_idle_lazy_gp_delay. This was causing a throughput decrease in synchronize_rcu(), as mentioned by the original author. These decreases were determined to be too large, as sub- 1 jiffy synchronize_rcu() latency in cpu idle scenarios was desired. Reverting this change change in order to start the force_quiescent_state() state machine upon entering idle, rather than on the scheduler tick() CRs-Fixed: 887035 Change-Id: I61865131dad751ec0fa14653b2b08b8bada4d061 Signed-off-by: Patrick Daly <pdaly@codeaurora.org>	2015-09-11 00:33:57 -07:00
Tim Murray	068a452ae3	sched: Return fallback CPU when no CPUs are available. Given certain combinations of hotplug and cpusets, it is possible to have a cpumask with no valid CPUs, including the CPU where the task is currently running. Fixes a regression caused by `93e62b823`, because we will run the body of the loop with an invalid CPU index instead of falling through to return fallback_cpu. bug 23748688 Change-Id: I621e4f6ab12f9fe932dfd3cd12d705d5617836bc	2015-09-03 22:56:18 +00:00
Jongrak Kwon	9b1420afd1	panic: resume console on panic Device rebooted by kernel panic but no panic message if console was suspended. This patch enables console if it's suspended to get the panic messages. Bug: 23629167 Change-Id: I61220b7de74f51677f3c0c8117bd8468ca9abcd3 Signed-off-by: Jongrak Kwon <jongrak.kwon@lge.com>	2015-09-02 23:13:30 -07:00
Riley Andrews	8882445f98	sched: fair: Change the synchronous wakeup logic in hmp. If doing a synchronous wakeup keep the woken task on the same core as the waker, ignoring task history (i.e. power) on task placement. Change-Id: I911702ef2ed9e3705d19591da1f0e33d2198c1df	2015-09-01 22:35:14 +00:00
Joonwoo Park	a6bd7ae282	sched: inline function scale_load_to_cpu() Inline relatively small and frequently used function scale_load_to_cpu(). CRs-fixed: 849655 Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:31 -07:00
Joonwoo Park	93e62b823f	sched: iterate search CPUs starting from prev_cpu for optimization Function best_small_task_cpu() looks for a mostly idle CPU and returns it as the best CPU for a given small task. At present, however, it cannot break the CPU search loop when the function found a mostly idle CPU but continues to iterate CPU search loop because the function needs to find and return the given task's previous CPU as the best CPU to avoid unnecessary task migration when the previous CPU is mostly idle. Optimize the function best_small_task_cpu() to iterate search CPUs starting from the given task's CPU so it can break the loop as soon as mostly idle CPU found. This optimization saves few hundreds ns spent by the function and doesn't make any functional change. CRs-fixed: 849655 Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:23 -07:00
Syed Rameez Mustafa	da6603f1ae	sched: Optimize the select_best_cpu() "for" loop select_best_cpu() is agnostic of the hardware topology. This means that certain functions such as task_will_fit() and skip_cpu() are run unnecessarily for every CPU in a cluster whereas they need to run only once per cluster. Reduce the execution time of select_best_cpu() by ensuring these functions run only once per cluster. The frequency domain mask is used to identify CPUs that fall in the same cluster. CRs-fixed: 849655 Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: added continue after clearing search_cpus. fixed indentations with space. fixed skip_cpu() to return true when rq == task_rq.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:18 -07:00
Syed Rameez Mustafa	6f717a4bf8	sched: Optimize select_best_cpu() to reduce execution time select_best_cpu() is a crucial wakeup routine that determines the time taken by the scheduler to wake up a task. Optimize this routine to get higher performance. The following changes have been made as part of the optimization listed in order of how they built on top of one another: * Several routines called by select_best_cpu() recalculate task load and CPU load even though these are already known quantities. For example mostly_idle_cpu_sync() calculates CPU load; task_will_fit() calculates task load before spill_threshold_crossed() recalculates both. Remove these redundant calculations by moving the task load and CPU load computations to the select_best_cpu() 'for' loop and passing to any functions that need the information. * Rewrite best_small_task_cpu() to avoid the existing two pass approach. The two pass approach was only in place to find the minimum power cluster for small task placement. This information can easily be established by looking at runqueue capacities. The cluster with not the highest capacity constitutes the minimum power cluster. A special CPU mask is called the mpc_mask required to safeguard against undue side effects on SMP systems. Also terminate the function early if the previous CPU is found to be mostly_idle. * Reorganize code to ensure that no unnecessary computations or variable assignments are done. For example there is no need to compute CPU load if that information does not end up getting used in any iteration of the 'for' loop. * The tick logic for EA migrations unnecessarily checks for the power of all CPUs only for skip_cpu() to throw away the result later. Ensure that for EA we only check CPUs within the same cluster and avoid running select_best_cpu() whenever possible. CRs-fixed: 849655 Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask. added a comment about prerequisite of lower_power_cpu_available(). s/struct rq * rq/struct rq *rq/.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:09 -07:00
Joonwoo Park	dde9a3644f	sched: avoid CPUs with high irq activity for non-small tasks The irq-aware scheduler is to achieve better performance by avoiding task placement to the CPUs which have high irq activity. However current scheduler places tasks to the CPUs which are loaded by irq activity preferably as opposed to what it is meant to be when the task is non-small. This is suboptimal for both power and performance. Fix task placement algorithm to avoid CPUs with significant irq activities. Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:04 -07:00
Naveen Ramaraj	e4ba1e2775	sched: Do not track cpu power for tasks if arch does not support it This fixes the undefined references to acct_update_power() when building for ARCH=um introduced by commit: `c0a74c5d` Bug: 21631098 Change-Id: Id8565f02e141395a1ffb269de09a062680552090 Signed-off-by: Naveen Ramaraj <nramaraj@codeaurora.org>	2015-08-22 00:02:28 -07:00
Peter Zijlstra	4adac624b9	hrtimer: Allow concurrent hrtimer_start() for self restarting timers Because we drop cpu_base->lock around calling hrtimer::function, it is possible for hrtimer_start() to come in between and enqueue the timer. If hrtimer::function then returns HRTIMER_RESTART we'll hit the BUG_ON because HRTIMER_STATE_ENQUEUED will be set. Since the above is a perfectly valid scenario, remove the BUG_ON and make the enqueue_hrtimer() call conditional on the timer not being enqueued already. NOTE: in that concurrent scenario its entirely common for both sites to want to modify the hrtimer, since hrtimers don't provide serialization themselves be sure to provide some such that the hrtimer::function and the hrtimer_start() caller don't both try and fudge the expiration state at the same time. To that effect, add a WARN when someone tries to forward an already enqueued timer, the most common way to change the expiry of self restarting timers. Ideally we'd put the WARN in everything modifying the expiry but most of that is inlines and we don't need the bloat. Change-Id: I819cca2a35430886b77ac8d43a6fbdcfd8b2b238 Fixes: `2d44ae4d71` ("hrtimer: clean up cpu->base locking tricks") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ben Segall <bsegall@google.com> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Paul Turner <pjt@google.com> Link: http://lkml.kernel.org/r/20150415113105.GT5029@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> (cherry picked from commit 5de2755c8c8b3a6b8414870e2c284914a2b42e4d) Signed-off-by: Thierry Strudel <tstrudel@google.com>	2015-08-21 23:27:45 -07:00
Amanieu d'Antras	a6bb935312	signal: fix information leak in copy_siginfo_from_user32 commit 3c00cb5e68dc719f2fc73a33b1b230aadfcb1309 upstream. This function can leak kernel stack data when the user siginfo_t has a positive si_code value. The top 16 bits of si_code descibe which fields in the siginfo_t union are active, but they are treated inconsistently between copy_siginfo_from_user32, copy_siginfo_to_user32 and copy_siginfo_to_user. copy_siginfo_from_user32 is called from rt_sigqueueinfo and rt_tgsigqueueinfo in which the user has full control overthe top 16 bits of si_code. This fixes the following information leaks: x86: 8 bytes leaked when sending a signal from a 32-bit process to itself. This leak grows to 16 bytes if the process uses x32. (si_code = __SI_CHLD) x86: 100 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = -1) sparc: 4 bytes leaked when sending a signal from a 32-bit process to a 64-bit process. (si_code = any) parsic and s390 have similar bugs, but they are not vulnerable because rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code to a different process. These bugs are also fixed for consistency. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-16 20:51:42 -07:00
Amanieu d'Antras	16a49557bc	signal: fix information leak in copy_siginfo_to_user commit 26135022f85105ad725cda103fa069e29e83bd16 upstream. This function may copy the si_addr_lsb, si_lower and si_upper fields to user mode when they haven't been initialized, which can leak kernel stack data to user mode. Just checking the value of si_code is insufficient because the same si_code value is shared between multiple signals. This is solved by checking the value of si_signo in addition to si_code. Signed-off-by: Amanieu d'Antras <amanieu@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-16 20:51:42 -07:00
Riley Andrews	e1798921cb	mutex: Add a delay into the SPIN_ON_OWNER wait loop. On arm systems the spin on owner optimization can intermittently cause a lockup that's usually as long as the waiting thread's cpu timeslice. The repeated mutex aquisitions + atomics in a single spinning thread can completely lock out the owner from releasing the kernel mutex. The owner needs to acquire a spinlock on the relase path and this spinlock can share a monitor with the other locks and atomics on the waiter path. Rate limit the waiter so that the thread releasing the mutex never is starved. Bug 23036902 Change-Id: Ie1b64275a0c6141f94faaf3e63fcbf9b5438140c	2015-08-11 23:00:58 +00:00
Thomas Gleixner	dcc2305a48	genirq: Prevent resend to interrupts marked IRQ_NESTED_THREAD commit 75a06189fc508a2acf470b0b12710362ffb2c4b1 upstream. The resend mechanism happily calls the interrupt handler of interrupts which are marked IRQ_NESTED_THREAD from softirq context. This can result in crashes because the interrupt handler is not the proper way to invoke the device handlers. They must be invoked via handle_nested_irq. Prevent the resend even if the interrupt has no valid parent irq set. Its better to have a lost interrupt than a crashing machine. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-10 12:20:30 -07:00
Ruchi Kandoi	3056ec9456	wakeup_reason: use vsnprintf instead of snsprintf for vargs. Bug: 22368519 Change-Id: I9dd0262111d50079a7df371716508ea66e272337 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>	2015-08-06 05:40:30 +00:00
Steven Rostedt (Red Hat)	f1bb130708	tracing: Have branch tracer use recursive field of task struct commit 6224beb12e190ff11f3c7d4bf50cb2922878f600 upstream. Fengguang Wu's tests triggered a bug in the branch tracer's start up test when CONFIG_DEBUG_PREEMPT set. This was because that config adds some debug logic in the per cpu field, which calls back into the branch tracer. The branch tracer has its own recursive checks, but uses a per cpu variable to implement it. If retrieving the per cpu variable calls back into the branch tracer, you can see how things will break. Instead of using a per cpu variable, use the trace_recursion field of the current task struct. Simply set a bit when entering the branch tracing and clear it when leaving. If the bit is set on entry, just don't do the tracing. There's also the case with lockdep, as the local_irq_save() called before the recursion can also trigger code that can call back into the function. Changing that to a raw_local_irq_save() will protect that as well. This prevents the recursion and the inevitable crash that follows. Link: http://lkml.kernel.org/r/20150630141803.GA28071@wfg-t540p.sh.intel.com Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-03 09:29:45 -07:00
Steven Rostedt (Red Hat)	3f6ba7f88d	tracing/filter: Do not allow infix to exceed end of string commit 6b88f44e161b9ee2a803e5b2b1fbcf4e20e8b980 upstream. While debugging a WARN_ON() for filtering, I found that it is possible for the filter string to be referenced after its end. With the filter: # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter The filter_parse() function can call infix_get_op() which calls infix_advance() that updates the infix filter pointers for the cnt and tail without checking if the filter is already at the end, which will put the cnt to zero and the tail beyond the end. The loop then calls infix_next() that has ps->infix.cnt--; return ps->infix.string[ps->infix.tail++]; The cnt will now be below zero, and the tail that is returned is already passed the end of the filter string. So far the allocation of the filter string usually has some buffer that is zeroed out, but if the filter string is of the exact size of the allocated buffer there's no guarantee that the charater after the nul terminating character will be zero. Luckily, only root can write to the filter. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-03 09:29:45 -07:00
Steven Rostedt (Red Hat)	c8b9a1fbd1	tracing/filter: Do not WARN on operand count going below zero commit b4875bbe7e68f139bd3383828ae8e994a0df6d28 upstream. When testing the fix for the trace filter, I could not come up with a scenario where the operand count goes below zero, so I added a WARN_ON_ONCE(cnt < 0) to the logic. But there is legitimate case that it can happen (although the filter would be wrong). # echo '>' > /sys/kernel/debug/events/ext4/ext4_truncate_exit/filter That is, a single operation without any operands will hit the path where the WARN_ON_ONCE() can trigger. Although this is harmless, and the filter is reported as a error. But instead of spitting out a warning to the kernel dmesg, just fail nicely and report it via the proper channels. Link: http://lkml.kernel.org/r/558C6082.90608@oracle.com Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-08-03 09:29:45 -07:00
Joonwoo Park	5ed5e4f869	sched: fix incorrect prev_runnable_sum accounting with long ISR run At present, when IRQ handler spans multiple scheduler windows, HMP scheduler resets the IRQ CPU's prev_runnable_sum with its current max capacity under the assumption that there is no other possible contribution to the CPU's prev_runnable_sum. This isn't correct as another CPU can migrate tasks to the IRQ CPU. Furthermore such incorrectness can trigger BUG_ON() if the migrated task's prev_window is larger than migrating CPU's current capacity in following scenario. 1. ISR on the power efficient CPU has been running for multiple windows. 2. A task which has prev_window higher than IRQ CPU's current capacity migrated to the IRQ CPU. 3. Servicing IRQ is done and the IRQ CPU resets its prev_runnable_rum = CPU's current capacity. 4. Before window rollover, the task on the IRQ CPU migrates to other CPU and fixes up source and destnation CPUs' busy time. 5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers as p->ravg.prev_window is larger than src_rq->prev_runnable_sum. Fix such incorrectness by preserving prev_runnable_sum when ISR spans multiple scheduler windows. There is no need to reset it. CRs-fixed: 828055 Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-07-21 16:55:35 -07:00
Micha Kalfon	7e20affe77	prctl: make PR_SET_TIMERSLACK_PID pid namespace aware Make PR_SET_TIMERSLACK_PID consider pid namespace and resolve the target pid in the caller's namespace. Otherwise, calls from pid namespace other than init would fail or affect the wrong task. Change-Id: I1da15196abc4096536713ce03714e99d2e63820a Signed-off-by: Micha Kalfon <micha@cellrox.com> Acked-by: Oren Laadan <orenl@cellrox.com>	2015-07-21 16:06:29 -07:00
Micha Kalfon	2f325b1f46	prctl: fix misplaced PR_SET_TIMERSLACK_PID case The case clause for the PR_SET_TIMERSLACK_PID option was placed inside the an internal switch statement for PR_MCE_KILL (see commits 37a591d4 and 8ae872f1) . This commit moves it to the right place. Change-Id: I63251669d7e2f2aa843d1b0900e7df61518c3dea Signed-off-by: Micha Kalfon <micha@cellrox.com> Acked-by: Oren Laadan <orenl@cellrox.com>	2015-07-21 16:06:29 -07:00
Andy Lutomirski	7ae89c27e2	seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing The secure_computing function took a syscall number parameter, but it only paid any attention to that parameter if seccomp mode 1 was enabled. Rather than coming up with a kludge to get the parameter to work in mode 2, just remove the parameter. To avoid churn in arches that don't have seccomp filters (and may not even support syscall_get_nr right now), this leaves the parameter in secure_computing_strict, which is now a real function. For ARM, this is a bit ugly due to the fact that ARM conditionally supports seccomp filters. Fixing that would probably only be a couple of lines of code, but it should be coordinated with the audit maintainers. This will be a slight slowdown on some arches. The right fix is to pass in all of seccomp_data instead of trying to make just the syscall nr part be fast. This is a prerequisite for making two-phase seccomp work cleanly. Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>	2015-07-06 17:16:22 -07:00
Guenter Roeck	13586040a7	seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock Current upstream kernel hangs with mips and powerpc targets in uniprocessor mode if SECCOMP is configured. Bisect points to commit dbd952127d11 ("seccomp: introduce writer locking"). Turns out that code such as BUG_ON(!spin_is_locked(&list_lock)); can not be used in uniprocessor mode because spin_is_locked() always returns false in this configuration, and that assert_spin_locked() exists for that very purpose and must be used instead. Fixes: dbd952127d11 ("seccomp: introduce writer locking") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kees Cook <keescook@chromium.org>	2015-07-06 17:16:21 -07:00
Kees Cook	6108b31b1d	seccomp: implement SECCOMP_FILTER_FLAG_TSYNC Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point. This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter) an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied. The race conditions against another thread are: - requesting TSYNC (already handled by sighand lock) - performing a clone (already handled by sighand lock) - changing its filter (already handled by sighand lock) - calling exec (handled by cred_guard_mutex) The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer includes a barrier. Based on patches by Will Drewry. Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:21 -07:00
Kees Cook	4ffce22eef	seccomp: allow mode setting across threads This changes the mode setting helper to allow threads to change the seccomp mode from another thread. We must maintain barriers to keep TIF_SECCOMP synchronized with the rest of the seccomp state. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Conflicts: kernel/seccomp.c	2015-07-06 17:16:21 -07:00
Kees Cook	bdac154942	seccomp: introduce writer locking Normally, task_struct.seccomp.filter is only ever read or modified by the task that owns it (current). This property aids in fast access during system call filtering as read access is lockless. Updating the pointer from another task, however, opens up race conditions. To allow cross-thread filter pointer updates, writes to the seccomp fields are now protected by the sighand spinlock (which is shared by all threads in the thread group). Read access remains lockless because pointer updates themselves are atomic. However, writes (or cloning) often entail additional checking (like maximum instruction counts) which require locking to perform safely. In the case of cloning threads, the child is invisible to the system until it enters the task list. To make sure a child can't be cloned from a thread and left in a prior state, seccomp duplication is additionally moved under the sighand lock. Then parent and child are certain have the same seccomp state when they exit the lock. Based on patches by Will Drewry and David Drysdale. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:21 -07:00
Kees Cook	25b84f016a	seccomp: split filter prep from check and apply In preparation for adding seccomp locking, move filter creation away from where it is checked and applied. This will allow for locking where no memory allocation is happening. The validation, filter attachment, and seccomp mode setting can all happen under the future locks. For extreme defensiveness, I've added a BUG_ON check for the calculated size of the buffer allocation in case BPF_MAXINSN ever changes, which shouldn't ever happen. The compiler should actually optimize out this check since the test above it makes it impossible. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Conflicts: kernel/seccomp.c	2015-07-06 17:16:20 -07:00
Kees Cook	8334b709bf	sched: move no_new_privs into new atomic flags Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:20 -07:00
Kees Cook	c7ff43e528	seccomp: add "seccomp" syscall This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:20 -07:00
Kees Cook	c13c8c32dc	seccomp: split mode setting routines Separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:20 -07:00
Kees Cook	c14cdadc57	seccomp: extract check/assign mode helpers To support splitting mode 1 from mode 2, extract the mode checking and assignment logic into common functions. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:20 -07:00
Kees Cook	82e69a633c	seccomp: create internal mode-setting function In preparation for having other callers of the seccomp mode setting logic, split the prctl entry point away from the core logic that performs seccomp mode setting. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2015-07-06 17:16:19 -07:00
Eric Paris	4802a181c2	syscall_get_arch: remove useless function arguments Every caller of syscall_get_arch() uses current for the task and no implementors of the function need args. So just get rid of both of those things. Admittedly, since these are inline functions we aren't wasting stack space, but it just makes the prototypes better. Signed-off-by: Eric Paris <eparis@redhat.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mips@linux-mips.org Cc: linux390@de.ibm.com Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org	2015-07-06 17:16:19 -07:00
Rashika Kheria	d9de37f398	kernel: Mark function as static in kernel/seccomp.c Mark function as static in kernel/seccomp.c because it is not used outside this file. This eliminates the following warning in kernel/seccomp.c: kernel/seccomp.c:296:6: warning: no previous prototype for ?seccomp_attach_user_filter? [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Will Drewry <wad@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>	2015-07-06 17:16:19 -07:00
Mark Grondona	a20cc70d1e	__ptrace_may_access() should not deny sub-threads commit 73af963f9f3036dffed55c3a2898598186db1045 upstream. __ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task != current, this can can lead to surprising results. For example, a sub-thread can't readlink("/proc/self/exe") if the executable is not readable. setup_new_exec()->would_dump() notices that inode_permission(MAY_READ) fails and then it does set_dumpable(suid_dumpable). After that get_dumpable() fails. (It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we could add PTRACE_MODE_NODUMPABLE) Change __ptrace_may_access() to use same_thread_group() instead of "task == current". Any security check is pointless when the tasks share the same ->mm. Signed-off-by: Mark Grondona <mgrondona@llnl.gov> Signed-off-by: Ben Woodard <woodard@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-07-03 19:48:08 -07:00
Steven Rostedt	63dec31183	tracing: Have filter check for balanced ops commit 2cf30dc180cea808077f003c5116388183e54f9e upstream. When the following filter is used it causes a warning to trigger: # cd /sys/kernel/debug/tracing # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter -bash: echo: write error: Invalid argument # cat events/ext4/ext4_truncate_exit/filter ((dev==1)blocks==2) ^ parse_error: No error ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990() Modules linked in: bnep lockd grace bluetooth ... CPU: 3 PID: 1223 Comm: bash Tainted: G W 4.1.0-rc3-test+ #450 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0 0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea Call Trace: [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0 [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80 [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81159065>] replace_preds+0x3c5/0x990 [<ffffffff811596b2>] create_filter+0x82/0xb0 [<ffffffff81159944>] apply_event_filter+0xd4/0x180 [<ffffffff81152bbf>] event_filter_write+0x8f/0x120 [<ffffffff811db2a8>] __vfs_write+0x28/0xe0 [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0 [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0 [<ffffffff811dc408>] vfs_write+0xb8/0x1b0 [<ffffffff811dc72f>] SyS_write+0x4f/0xb0 [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a ---[ end trace e11028bd95818dcd ]--- Worse yet, reading the error message (the filter again) it says that there was no error, when there clearly was. The issue is that the code that checks the input does not check for balanced ops. That is, having an op between a closed parenthesis and the next token. This would only cause a warning, and fail out before doing any real harm, but it should still not caues a warning, and the error reported should work: # cd /sys/kernel/debug/tracing # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter -bash: echo: write error: Invalid argument # cat events/ext4/ext4_truncate_exit/filter ((dev==1)blocks==2) ^ parse_error: Meaningless filter expression And give no kernel warning. Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [ luis: backported to 3.16: - unconditionally decrement cnt as the OP_NOT logic was introduced only by e12c09cf3087 ("tracing: Add NOT to filtering logic") ] Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-06-29 12:08:34 -07:00
Wang Long	61a5c6bf34	ring-buffer-benchmark: Fix the wrong sched_priority of producer commit 108029323910c5dd1ef8fa2d10da1ce5fbce6e12 upstream. The producer should be used producer_fifo as its sched_priority, so correct it. Link: http://lkml.kernel.org/r/1433923957-67842-1-git-send-email-long.wanglong@huawei.com Signed-off-by: Wang Long <long.wanglong@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-06-22 16:55:53 -07:00
Subbaraman Narayanamurthy	b9b31537c1	kthread: Fix the race condition when kthread is parked While stressing the CPU hotplug path, sometimes we hit a problem as shown below. [57056.416774] ------------[ cut here ]------------ [57056.489232] ksoftirqd/1 (14): undefined instruction: pc=c01931e8 [57056.489245] Code: e594a000 eb085236 e15a0000 0a000000 (e7f001f2) [57056.489259] ------------[ cut here ]------------ [57056.492840] kernel BUG at kernel/kernel/smpboot.c:134! [57056.513236] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM [57056.519055] Modules linked in: wlan(O) mhi(O) [57056.523394] CPU: 0 PID: 14 Comm: ksoftirqd/1 Tainted: G W O 3.10.0-g3677c61-00008-g180c060 #1 [57056.532595] task: f0c8b000 ti: f0e78000 task.ti: f0e78000 [57056.537991] PC is at smpboot_thread_fn+0x124/0x218 [57056.542750] LR is at smpboot_thread_fn+0x11c/0x218 [57056.547528] pc : [<c01931e8>] lr : [<c01931e0>] psr: 200f0013 [57056.547528] sp : f0e79f30 ip : 00000000 fp : 00000000 [57056.558983] r10: 00000001 r9 : 00000000 r8 : f0e78000 [57056.564192] r7 : 00000001 r6 : c1195758 r5 : f0e78000 r4 : f0e5fd00 [57056.570701] r3 : 00000001 r2 : f0e79f20 r1 : 00000000 r0 : 00000000 This issue was always seen in the context of "ksoftirqd". It seems to be happening because of a potential race condition in __kthread_parkme where just after completing the parked completion, before the ksoftirqd task has been scheduled again, it can go into running state. Fix this by waiting for the task state to parked after waiting the parked completion. CRs-Fixed: 659674 Change-Id: If3f0e9b706eeb5d30d5a32f84378d35bb03fe794 Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>	2015-06-19 15:54:09 -07:00
Riley Andrews	77464f4ec7	cpuset: Make cpusets restore on hotplug This deliberately changes the behavior of the per-cpuset cpus file to not be effected by hotplug. When a cpu is offlined, it will be removed from the cpuset/cpus file. When a cpu is onlined, if the cpuset originally requested that that cpu was part of the cpuset, that cpu will be restored to the cpuset. The cpus files still have to be hierachical, but the ranges no longer have to be out of the currently online cpus, just the physically present cpus. Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924	2015-06-19 13:59:27 -07:00
Riley Andrews	934373d935	cpuset: Add allow_attach hook for cpusets on android. Change-Id: Ic1b61b2bbb7ce74c9e9422b5e22ee9078251de21	2015-06-19 13:59:20 -07:00
Patrick Tjin	8635fd59ea	partial-resume: add missing mutex unlock Change-Id: I0adc7ad8771b3c84b2f1f6e951e42463693df46c Signed-off-by: Patrick Tjin <pattjin@google.com>	2015-06-19 10:25:15 -07:00
Ruchi Kandoi	c0a74c5df7	sched: cpufreq: Adds a field cpu_power in the task_struct cpu_power has been added to keep track of amount of power each task is consuming. cpu_power is updated whenever stime and utime are updated for a task. power is computed by taking into account the frequency at which the current core was running and the current for cpu actively running at hat frequency. Bug: 21498425 Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>	2015-06-18 23:36:19 -07:00
Nishanth Menon	614477b4d9	PM / QoS: Add debugfs support to view the list of constraints PM QoS requests are notoriously hard to debug and made even more so due to their highly dynamic nature. Having visibility into the internal data representation per constraint allows us to have much better appreciation of potential issues or bad usage by drivers in the system. So introduce for all classes of PM QoS, an entry in /sys/kernel/debug/pm_qos that shall show all the current requests as well as the snapshot of the value these requests boil down to. For example: ==> /sys/kernel/debug/pm_qos/cpu_dma_latency <== 1: 4444: Active 2: 2000000000: Default 3: 2000000000: Default 4: 2000000000: Default Type=Minimum, Value=4444, Requests: active=1 / total=4 ==> /sys/kernel/debug/pm_qos/memory_bandwidth <== Empty! ... The actual value listed will have their meaning based on the QoS it is on, the 'Type' indicates what logic it would use to collate the information - Minimum, Maximum, or Sum. Value is the collation of all requests. This interface also compares the values with the defaults for the QoS class and marks the ones that are currently active. Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-18 12:54:39 -07:00
Dmitry Shmidt	d669b095d7	irq: pm: Remove unused variable Change-Id: Ie4311b554628af878cd80fd0abc03b2be294f0bf Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2015-06-16 15:30:41 -07:00
Ruchi Kandoi	c955dd12c0	suspend: Return error when pending wakeup source is found. If a wakeup source is found to be pending in the last stage of suspend after syscore suspend then the device doesn't suspend but the error is not propogated which causes an error in the accounting for the number of suspend aborts and successful suspends. Bug: 18179405 Change-Id: Ib63b4ead755127eaf03e3b303aab3c782ad02ed1 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>	2015-06-16 15:30:41 -07:00
Dmitry Shmidt	ea3f235f71	PM: Reduce waiting for wakeup reasons to 100 ms In 80% cases there is no need to wait, and in case of timeout we continue to resume. Change-Id: I6ae44e0ef6f7aa497f57fcd5f6e6bc83dc781852 Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2015-06-16 15:30:41 -07:00
Iliyan Malchev	9bcc844f9e	PM: wakeup_reasons: disable wakeup-reason deduction by default Introduce a config item, CONFIG_DEDUCE_WAKEUP_REASONS, disabled by default. Make CONFIG_PARTUALRESUME select it. Change-Id: I7d831ff0a9dfe0a504824f4bc65ba55c4d92546b Signed-off-by: Iliyan Malchev <malchev@google.com>	2015-06-16 15:30:40 -07:00
Ruchi Kandoi	28819bfd65	power: Avoids bogus error messages for the suspend aborts. Avoids printing bogus error message "tasks refusing to freeze", in cases where pending wakeup source caused the suspend abort. Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> Change-Id: I913ad290f501b31cd536d039834c8d24c6f16928	2015-06-16 15:30:40 -07:00
Iliyan Malchev	f4e4de4826	PM: wakeup_reasons: fix race condition log_possible_wakeup_reason() and stop_logging_wakeup_reasons() can race, as the latter can be called from process context, and both can run on separate cores. Change-Id: I306441d0be46dd4fe58c55cdc162f9d61a28c27d Signed-off-by: Iliyan Malchev <malchev@google.com>	2015-06-16 15:30:40 -07:00
Dmitry Shmidt	d520963582	Power: Report total suspend times from boot in suspend_since_boot This node exports five values separated by space. From left to right: 1. Amount of suspend/resume cycles 2. Amount of suspend abort cycles 3. Total time spent in suspend/resume process 4. Total time in suspend abort process 5. Total time spent sleep in suspend state Change-Id: I5d71af80516dd451eceba28228bfbaff9681f7ab Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2015-06-16 15:30:40 -07:00
jinqian	8b051c3819	Power: Report suspend times from last_suspend_time This node epxorts two values separated by space. From left to right: 1. time spent in suspend/resume process 2. time spent sleep in suspend state Change-Id: I2cb9a9408a5fd12166aaec11b935a0fd6a408c63	2015-06-16 15:30:39 -07:00
Iliyan Malchev	953bd19e2c	PM: extend suspend_again mechanism to use partialresume The old platform suspend_again callback overrides drivers' votes, such that if it implemented and returns false, then we do not call the partialresume handlers. When it doesn't exists or returns true, then we also query the registered drivers for consensus. When a device resumes from suspend, the suspend/resume code invokes partialresume to check to see if the set of wakeup interrupts all have matching handlers. If this is not the case, the PM subsystem can continue to resume as before. If all of the wakeup sources have matching handlers, then those are invoked in turn (and can block), and if all of them reach consensus that the reason for the wakeup can be ignored, they say so to the PM subsystem, which goes right back into suspend. Signed-off-by: Iliyan Malchev <malchev@google.com> Change-Id: Iaeb9ed78c4b5fb815c6e9c701233e703f481f962	2015-06-16 15:30:39 -07:00
Iliyan Malchev	f2d448b992	power: add partial-resume framework Partial resume refers to the concept of not waking up userspace when the kernel comes out of suspend for certain types of events that we wish to discard. An example is a network packet that can be disacarded in the kernel, or spurious wakeup event that we wish to ignore. Partial resume allows drivers to register callbacks, one one hand, and provides hooks into the PM's suspend/resume mechanism, on the other. When a device resumes from suspend, the core suspend/resume code invokes partialresume to check to see if the set of wakeup interrupts all have matching handlers. If this is not the case, the PM subsystem can continue to resume as before. If all of the wakeup sources have matching handlers, then those are invoked in turn (and can block), and if all of them reach consensus that the reason for the wakeup can be ignored, they say so to the PM subsystem, which goes right back into suspend. This latter support is implemented in a separate change. Signed-off-by: Iliyan Malchev <malchev@google.com> Change-Id: Id50940bb22a550b413412264508d259f7121d442	2015-06-16 15:30:39 -07:00
Iliyan Malchev	573c7a8e9e	PM: wakeup_reason: add functions to query and clear wakeup reasons The query results are valid until the next PM_SUSPEND_PREPARE. Change-Id: I6bc2bd47c830262319576a001d39ac9a994916cf Signed-off-by: Iliyan Malchev <malchev@google.com>	2015-06-16 15:30:39 -07:00
Lorenzo Colitti	9a215ba404	Put device-specific code behind #ifndef CONFIG_UML. This is required to run net_test, which builds for ARCH=um. Change-Id: I7041756c057b913a09554b66cd817b4abaa712a6 Signed-off-by: Lorenzo Colitti <lorenzo@google.com>	2015-06-16 15:30:38 -07:00
Dmitry Shmidt	3803d603fe	power: Add check_wakeup_reason() to verify wakeup source irq Wakeup reason is set before driver resume handlers are called. It is cleared before driver suspend handlers are called, on PM_SUSPEND_PREPARE. Change-Id: I04218c9b0c115a7877e8029c73e6679ff82e0aa4 Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>	2015-06-16 15:30:38 -07:00
Ruchi Kandoi	07be015b30	power: Adds functionality to log the last suspend abort reason. Extends the last_resume_reason to log suspend abort reason. The abort reasons will have "Abort:" appended at the start to distinguish itself from the resume reason. Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> Change-Id: I3207f1844e3d87c706dfc298fb10e1c648814c5f	2015-06-16 15:30:38 -07:00
Theodore Ts'o	90c11314a2	sched: add bit_wait_io for 3.18 ext4 backport Excerpted from commit 743162013: "sched: Remove proliferation of wait_on_bit() action functions" Change-Id: I56153d55a9af9f2911ed6ffb15d36ad89d45cd55 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@google.com>	2015-06-15 15:09:46 -07:00
Ramakrishnan Ganesh	8c9b8b6409	Revert "sched: Use only partial wait time as task demand" This reverts commit 2a7c208b28d7558a5e296cec662c691590b0652a. Change-Id: Ie02645574fd93f4149aa6a3f355c71eb412c9bac Signed-off-by: Ramakrishnan Ganesh <ramakris@codeaurora.org>	2015-05-28 12:17:00 -07:00
Steve Muckle	6ed14be2c5	sched: disable IRQs in update_min_max_capacity IRQs must be disabled while locking runqueues since an interrupt may cause a runqueue lock to be acquired. CRs-fixed: 828598 Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2015-05-28 12:16:59 -07:00
Syed Rameez Mustafa	a77f89168a	sched: Use only partial wait time as task demand The scheduler currently either considers a tasks entire wait time as task demand or completely ignores wait time based on the tunable sched_account_wait_time. Both approaches have their limitations, however. The former artificially boosts tasks demand when it may not actually be justified. With the latter, the scheduler runs the risk of never being able to recognize true load (consider two CPU hogs on a single little CPU). To achieve a compromise between these two extremes, change the load tracking algorithm to only consider part of a tasks wait time as its demand. The portion of wait time accounted as demand is determined by each tasks percent load, i.e. a task that waits for 10ms and has 60 % task load, only 6 ms of the wait will contribute to task demand. This approach is more fair as the scheduler now tries to determine how much of its wait time would a task actually have been using the CPU if it had been executing. It ensures that tasks with high demand continue to see most of the benefits of accounting wait time as busy time, however, lower demand tasks don't experience a disproportionately high boost to demand triggering unjustified big CPU usage. Note that this new approach is only applicable to wait time being considered as task demand and not wait time considered as CPU busy time. To achieve the above effect, ensure that anytime a task is waiting, its runtime in every relevant window segment is appropriately adjusted using its pct load. Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-05-28 12:16:58 -07:00
Olav Haugan	cb3b575990	sched/fair: Add irq load awareness to the tick CPU selection logic IRQ load is not taken into account when determining whether a task should be migrated to a different CPU. A task that runs for a long time could get stuck on CPU with high IRQ load causing degraded performance. Add irq load awareness to the tick CPU selection logic. CRs-fixed: 809119 Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2015-05-28 12:16:46 -07:00
choongryeol.lee	c9b85f93b5	ARM64: bullhead: Add initial LGE Crash Handler Change-Id: Ida38a2fafea95017dae95fbad72b69c709bd3b2c Signed-off-by: kyuyoung.lee <kyuyoung.lee@lge.com> Signed-off-by: choongryeol.lee <choongryeol.lee@lge.com>	2015-05-20 21:28:36 +00:00
Christoph Hellwig	5a4d93f39c	revert "softirq: Add support for triggering softirq work on softirqs" commit fc21c0cff2f425891b28ff6fb6b03b325c977428 upstream. This commit was incomplete in that code to remove items from the per-cpu lists was missing and never acquired a user in the 5 years it has been in the tree. We're going to implement what it seems to try to archive in a simpler way, and this code is in the way of doing so. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhuix.pan@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-17 09:51:33 -07:00
Calvin Owens	61ea92b948	ksoftirqd: Enable IRQs and call cond_resched() before poking RCU commit 28423ad283d5348793b0c45cc9b1af058e776fd6 upstream. While debugging an issue with excessive softirq usage, I encountered the following note in commit `3e339b5dae` ("softirq: Use hotplug thread infrastructure"): [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ] ...but despite this note, the patch still calls RCU with IRQs disabled. This seemingly innocuous change caused a significant regression in softirq CPU usage on the sending side of a large TCP transfer (~1 GB/s): when introducing 0.01% packet loss, the softirq usage would jump to around 25%, spiking as high as 50%. Before the change, the usage would never exceed 5%. Moving the call to rcu_note_context_switch() after the cond_sched() call, as it was originally before the hotplug patch, completely eliminated this problem. Signed-off-by: Calvin Owens <calvinowens@fb.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-06 21:56:27 +02:00
Oleg Nesterov	ddb56eac0e	ptrace: fix race between ptrace_resume() and wait_task_stopped() commit b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed upstream. ptrace_resume() is called when the tracee is still __TASK_TRACED. We set tracee->exit_code and then wake_up_state() changes tracee->state. If the tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T) wrongly looks like another report from tracee. This confuses debugger, and since wait_task_stopped() clears ->exit_code the tracee can miss a signal. Test-case: #include <stdio.h> #include <unistd.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <pthread.h> #include <assert.h> int pid; void waiter(void arg) { int stat; for (;;) { assert(pid == wait(&stat)); assert(WIFSTOPPED(stat)); if (WSTOPSIG(stat) == SIGHUP) continue; assert(WSTOPSIG(stat) == SIGCONT); printf("ERR! extra/wrong report:%x\n", stat); } } int main(void) { pthread_t thread; pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); for (;;) kill(getpid(), SIGHUP); } assert(pthread_create(&thread, NULL, waiter, NULL) == 0); for (;;) ptrace(PTRACE_CONT, pid, 0, SIGCONT); return 0; } Note for stable: the bug is very old, but without `9899d11f65` "ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix should use lock_task_sighand(child). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Pavel Labath <labath@google.com> Tested-by: Pavel Labath <labath@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-06 21:56:24 +02:00
Steven Rostedt	faf8db2e22	ring-buffer: Replace this_cpu_() with __this_cpu_() commit 80a9b64e2c156b6523e7a01f2ba6e5d86e722814 upstream. It has come to my attention that this_cpu_read/write are horrible on architectures other than x86. Worse yet, they actually disable preemption or interrupts! This caused some unexpected tracing results on ARM. 101.356868: preempt_count_add <-ring_buffer_lock_reserve 101.356870: preempt_count_sub <-ring_buffer_lock_reserve The ring_buffer_lock_reserve has recursion protection that requires accessing a per cpu variable. But since preempt_disable() is traced, it too got traced while accessing the variable that is suppose to prevent recursion like this. The generic version of this_cpu_read() and write() are: #define this_cpu_generic_read(pcp) \ ({ typeof(pcp) ret__; \ preempt_disable(); \ ret__ = this_cpu_ptr(&(pcp)); \ preempt_enable(); \ ret__; \ }) #define this_cpu_generic_to_op(pcp, val, op) \ do { \ unsigned long flags; \ raw_local_irq_save(flags); \ __this_cpu_ptr(&(pcp)) op val; \ raw_local_irq_restore(flags); \ } while (0) Which is unacceptable for locations that know they are within preempt disabled or interrupt disabled locations. Paul McKenney stated that __this_cpu_() versions produce much better code on other architectures than this_cpu_() does, if we know that the call is done in a preempt disabled location. I also changed the recursive_unlock() to use two local variables instead of accessing the per_cpu variable twice. Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home Acked-by: Christoph Lameter <cl@linux.com> Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-05-06 21:56:21 +02:00
Syed Rameez Mustafa	d191fa671d	sched: Update max_capacity when an entire cluster is hotplugged When an entire cluster is hotplugged, the scheduler's notion of max_capacity can get outdated. This introduces the following inefficiencies in behavior: * task_will_fit() does not return true on all tasks. Consequently all big tasks go through fallback CPU selection logic skipping C-state and power checks in select_best_cpu(). * During boost, migration_needed() return true unnecessarily causing an avoidable rerun of select_best_cpu(). * An unnecessary kick is sent to all little CPUs when boost is set. * An opportunity for early bailout from nohz_kick_needed() is lost. Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback which indicates the last CPU in a cluster being hotplugged out. Also modify update_min_max_capacity() to only iterate through online CPUs instead of possible CPUs. While we can't guarantee the integrity of the cpu_online_mask in the notifier callback, the scheduler will fix up all state soon after any changes to the online mask. The change does have one side effect; early termination from the notifier callback when min_max_freq or max_possible_freq remain unchanged is no longer possible. This is because when the last CPU in a cluster is hot removed, only max_capacity is updated without affecting min_max_freq or max_possible_freq. Therefore, when the first CPU in the same cluster gets hot added at a later point max_capacity must once again be recomputed despite there being no change in min_max_freq or max_possible_freq. Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-05-01 19:40:50 -07:00
Ian Maund	faa9cc5339	This is the 3.10.73 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVFBE+AAoJEDjbvchgkmk+oTkP/j2ipSvgXghFEipZbOJUQkqC fa8elfoF7riTKpKOuDtDU2WI1ttCGYs5gmTNpd4KaEt23eJOQgVqIpV8GhAkW5Af NVyGhjF3dXNqpBkxnyuIkk5OLrNKGRNS2xpz1U254iGObYrK+tr62IzGPxEcPAhX Y+58xPVSjLtNdTJW3YLT3DohUbnbHG6Br9geI1IHtlxg1oDiTxtnX2FmOFzzDpP5 qu8gnPIekg/+1EE46nEiq0C59AwC3aCzNxwlYe1Kd41SY3LUFF1eZMzmOnnwyI5K 3FslAzT6x/sOmGJFTYrKjFA4GKsW67xHVkB/hp/Mu768RqxiQCxV4kgmPsAFLbXb D5qbNwr3i0iQ/9AaD7h8HJkxC/KHmszMux00L/mgZ3SGdGMEIBxHg+oP8+nP8V6C WfXKSWA94dpdRyULEfWdnKnUnp2860C7kt7ASTkOl8rIgU8HgaRqeu+U/KPM2ovD ZJtXPVB5UXCRuVAhZwbvvrLOY8UMZTnv2auAaeLYG8YptcvGeN5Z398/8qdV/z7c A9kOsgebs74X+lR3rbVgSDPQaq2AEiuIvtX77SfmrWXBXGmc99i9+PikuFggRprz cJm5bCM9DaHu/3b77X9Fwl7vnpReB0zPHiwTdH/p7OPMf5m1uQt7SqegC6btLPHs iYgjLd4oW+6uiV/2X1Vx =L+mC -----END PGP SIGNATURE----- Merge commit 'v3.10.73' into LA.BF64.1.2.9 This merge brings us up to date with upstream kernel.org tag v3.10.73. As part of the conflict resolution, changes introduced by commit `72684eae7` ("arm64: Fix up /proc/cpuinfo") have been intentionally dropped, as they conflict with Android changes msm-3.10 kernel to solve the problems in a different way. Since userspace readers of this file may depend on the existing msm-3.10 implementation, it's left as-is for now. The commit may later be introduced if it is found to not impact userspaces paired with this kernel. * commit 'v3.10.73' (264 commits): Linux 3.10.73 target: Allow Write Exclusive non-reservation holders to READ target: Allow AllRegistrants to re-RESERVE existing reservation target: Fix R_HOLDER bit usage for AllRegistrants target/pscsi: Fix NULL pointer dereference in get_device_type iscsi-target: Avoid early conn_logout_comp for iser connections target: Fix reference leak in target_get_sess_cmd() error path ARM: at91: pm: fix at91rm9200 standby ipvs: rerouting to local clients is not needed anymore ipvs: add missing ip_vs_pe_put in sync code powerpc/smp: Wait until secondaries are active & online x86/vdso: Fix the build on GCC5 x86/fpu: Drop_fpu() should not assume that tsk equals current x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig() crypto: aesni - fix memory usage in GCM decryption libsas: Fix Kernel Crash in smp_execute_task xen-pciback: limit guest control of command register nilfs2: fix deadlock of segment constructor during recovery regulator: core: Fix enable GPIO reference counting regulator: Only enable disabled regulators on resume ALSA: hda - Treat stereo-to-mono mix properly ALSA: hda - Add workaround for MacBook Air 5,2 built-in mic ALSA: hda - Set single_adc_amp flag for CS420x codecs ALSA: hda - Don't access stereo amps for mono channel widgets ALSA: hda - Fix built-in mic on Compaq Presario CQ60 ALSA: control: Add sanity checks for user ctl id name string spi: pl022: Fix race in giveback() leading to driver lock-up tpm/ibmvtpm: Additional LE support for tpm_ibmvtpm_send workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE can: add missing initialisations in CAN related skbuffs Change email address for 8250_pci virtio_console: init work unconditionally fuse: notify: don't move pages fuse: set stolen page uptodate drm/radeon: drop setting UPLL to sleep mode drm/radeon: do a posting read in rs600_set_irq drm/radeon: do a posting read in si_set_irq drm/radeon: do a posting read in r600_set_irq drm/radeon: do a posting read in r100_set_irq drm/radeon: do a posting read in evergreen_set_irq drm/radeon: fix DRM_IOCTL_RADEON_CS oops tcp: make connect() mem charging friendly net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour tcp: fix tcp fin memory accounting Revert "net: cx82310_eth: use common match macro" rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() caif: fix MSG_OOB test in caif_seqpkt_recvmsg() inet_diag: fix possible overflow in inet_diag_dump_one_icsk() rds: avoid potential stack overflow net: sysctl_net_core: check SNDBUF and RCVBUF for min length sparc64: Fix several bugs in memmove(). sparc: Touch NMI watchdog when walking cpus and calling printk sparc: perf: Make counting mode actually work sparc: perf: Remove redundant perf_pmu_{en\|dis}able calls sparc: semtimedop() unreachable due to comparison error sparc32: destroy_context() and switch_mm() needs to disable interrupts. Linux 3.10.72 ath5k: fix spontaneus AR5312 freezes ACPI / video: Load the module even if ACPI is disabled drm/radeon: fix 1 RB harvest config setup for TN/RL Drivers: hv: vmbus: incorrect device name is printed when child device is unregistered HID: fixup the conflicting keyboard mappings quirk HID: input: fix confusion on conflicting mappings staging: comedi: cb_pcidas64: fix incorrect AI range code handling dm snapshot: fix a possible invalid memory access on unload dm: fix a race condition in dm_get_md dm io: reject unsupported DISCARD requests with EOPNOTSUPP dm mirror: do not degrade the mirror on discard error staging: comedi: comedi_compat32.c: fix COMEDI_CMD copy back clk: sunxi: Support factor clocks with N factor starting not from 0 fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit. nilfs2: fix potential memory overrun on inode IB/qib: Do not write EEPROM sg: fix read() error reporting ALSA: hda - Add pin configs for ASUS mobo with IDT 92HD73XX codec ALSA: pcm: Don't leave PREPARED state after draining tty: fix up atime/mtime mess, take four sunrpc: fix braino in ->poll() procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation USB: serial: fix potential use-after-free after failed probe TTY: fix tty_wait_until_sent on 64-bit machines USB: serial: fix infinite wait_until_sent timeout net: irda: fix wait_until_sent poll timeout xhci: fix reporting of 0-sized URBs in control endpoint xhci: Allocate correct amount of scratchpad buffers usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards USB: usbfs: don't leak kernel data in siginfo USB: serial: cp210x: Adding Seletek device id's KVM: MIPS: Fix trace event to save PC directly KVM: emulate: fix CMPXCHG8B on 32-bit hosts Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. Btrfs: fix data loss in the fast fsync path btrfs: fix lost return value due to variable shadowing iio: imu: adis16400: Fix sign extension x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization PM / QoS: remove duplicate call to pm_qos_update_target target: Check for LBA + sectors wrap-around in sbc_parse_cdb mm/memory.c: actually remap enough memory mm/compaction: fix wrong order check in compact_finished() mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() mm/hugetlb: add migration entry check in __unmap_hugepage_range team: don't traverse port list using rcu in team_set_mac_address udp: only allow UFO for packets from SOCK_DGRAM sockets usb: plusb: Add support for National Instruments host-to-host cable macvtap: make sure neighbour code can push ethernet header net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg team: fix possible null pointer dereference in team_handle_frame net: reject creation of netdev names with colons ematch: Fix auto-loading of ematch modules. net: phy: Fix verification of EEE support in phy_init_eee ipv4: ip_check_defrag should not assume that skb_network_offset is zero ipv4: ip_check_defrag should correctly check return value of skb_copy_bits gen_stats.c: Duplicate xstats buffer for later use rtnetlink: call ->dellink on failure when ->newlink exists ipv6: fix ipv6_cow_metrics for non DST_HOST case rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY Linux 3.10.71 libceph: fix double __remove_osd() problem libceph: change from BUG to WARN for __remove_osd() asserts libceph: assert both regular and lingering lists in __remove_osd() MIPS: Export FP functions used by lose_fpu(1) for KVM x86, mm/ASLR: Fix stack randomization on 64-bit systems blk-throttle: check stats_cpu before reading it from sysfs jffs2: fix handling of corrupted summary length md/raid1: fix read balance when a drive is write-mostly. md/raid5: Fix livelock when array is both resyncing and degraded. metag: Fix KSTK_EIP() and KSTK_ESP() macros gpio: tps65912: fix wrong container_of arguments arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian hx4700: regulator: declare full constraints KVM: x86: update masterclock values on TSC writes KVM: MIPS: Don't leak FPU/DSP to guest ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE ntp: Fixup adjtimex freq validation on 32-bit systems kdb: fix incorrect counts in KDB summary command output ARM: pxa: add regulator_has_full_constraints to poodle board file ARM: pxa: add regulator_has_full_constraints to corgi board file vt: provide notifications on selection changes usb: core: buffer: smallest buffer should start at ARCH_DMA_MINALIGN USB: fix use-after-free bug in usb_hcd_unlink_urb() USB: cp210x: add ID for RUGGEDCOM USB Serial Console tty: Prevent untrappable signals from malicious program axonram: Fix bug in direct_access cfq-iosched: fix incorrect filing of rt async cfqq cfq-iosched: handle failure of cfq group allocation iscsi-target: Drop problematic active_ts_list usage NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args Added Little Endian support to vtpm module tpm/tpm_i2c_stm_st33: Fix potential bug in tpm_stm_i2c_send tpm: Fix NULL return in tpm_ibmvtpm_get_desired_dma tpm_tis: verify interrupt during init ARM: 8284/1: sa1100: clear RCSR_SMR on resume tracing: Fix unmapping loop in tracing_mark_write MIPS: KVM: Deliver guest interrupts after local_irq_disable() nfs: don't call blocking operations while !TASK_RUNNING mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles power_supply: 88pm860x: Fix leaked power supply on probe fail ALSA: hdspm - Constrain periods to 2 on older cards ALSA: off by one bug in snd_riptide_joystick_probe() lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb cpufreq: speedstep-smi: enable interrupts when waiting PCI: Fix infinite loop with ROM image of size 0 PCI: Generate uppercase hex for modalias var in uevent HID: i2c-hid: Limit reads to wMaxInputLength bytes for input events iwlwifi: mvm: always use mac color zero iwlwifi: mvm: fix failure path when power_update fails in add_interface iwlwifi: mvm: validate tid and sta_id in ba_notif iwlwifi: pcie: disable the SCD_BASE_ADDR when we resume from WoWLAN fsnotify: fix handling of renames in audit xfs: set superblock buffer type correctly xfs: inode unlink does not set AGI buffer type xfs: ensure buffer types are set correctly Bluetooth: ath3k: workaround the compatibility issue with xHCI controller Linux 3.10.70 rbd: drop an unsafe assertion media/rc: Send sync space information on the lirc device net: sctp: fix passing wrong parameter header to param_type2af in sctp_process_param ppp: deflate: never return len larger than output buffer ipv4: tcp: get rid of ugly unicast_sock tcp: ipv4: initialize unicast_sock sk_pacing_rate bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too ping: Fix race in free in receive path udp_diag: Fix socket skipping within chain ipv4: try to cache dst_entries which would cause a redirect net: sctp: fix slab corruption from use after free on INIT collisions netxen: fix netxen_nic_poll() logic ipv6: stop sending PTB packets for MTU < 1280 net: rps: fix cpu unplug ip: zero sockaddr returned on error queue Linux 3.10.69 crypto: crc32c - add missing crypto module alias x86,kvm,vmx: Preserve CR4 across VM entry kvm: vmx: handle invvpid vm exit gracefully smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread() ALSA: ak411x: Fix stall in work callback ASoC: sgtl5000: add delay before first I2C access ASoC: atmel_ssc_dai: fix start event for I2S mode lib/checksum.c: fix build for generic csum_tcpudp_nofold ext4: prevent bugon on race between write/fcntl arm64: Fix up /proc/cpuinfo nilfs2: fix deadlock of segment constructor over I_SYNC flag lib/checksum.c: fix carry in csum_tcpudp_nofold mm: pagewalk: call pte_hole() for VM_PFNMAP during walk_page_range MIPS: Fix kernel lockup or crash after CPU offline/online MIPS: IRQ: Fix disable_irq on CPU IRQs PCI: Add NEC variants to Stratus ftServer PCIe DMI check gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low gpio: sysfs: fix memory leak in gpiod_export_link Linux 3.10.68 target: Drop arbitrary maximum I/O size limit iser-target: Fix implicit termination of connections iser-target: Handle ADDR_CHANGE event for listener cm_id iser-target: Fix connected_handler + teardown flow race iser-target: Parallelize CM connection establishment iser-target: Fix flush + disconnect completion handling iscsi,iser-target: Initiate termination only once vhost-scsi: Add missing virtio-scsi -> TCM attribute conversion tcm_loop: Fix wrong I_T nexus association vhost-scsi: Take configfs group dependency during VHOST_SCSI_SET_ENDPOINT ib_isert: Add max_send_sge=2 minimum for control PDU responses IB/isert: Adjust CQ size to HW limits workqueue: fix subtle pool management issue which can stall whole worker_pool gpio: squelch a compiler warning efi-pstore: Make efi-pstore return a unique id pstore/ram: avoid atomic accesses for ioremapped regions pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz pstore: skip zero size persistent ram buffer in traverse pstore: clarify clearing of _read_cnt in ramoops_context pstore: d_alloc_name() doesn't return an ERR_PTR pstore: Fail to unlink if a driver has not defined pstore_erase ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear ARM: DMA: ensure that old section mappings are flushed from the TLB ARM: 7931/1: Correct virt_addr_valid ARM: fix asm/memory.h build error ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg(). ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h ARM: lpae: fix definition of PTE_HWTABLE_PTRS ARM: fix type of PHYS_PFN_OFFSET to unsigned long ARM: LPAE: use phys_addr_t in alloc_init_pud() ARM: LPAE: use signed arithmetic for mask definitions ARM: mm: correct pte_same behaviour for LPAE. ARM: 7829/1: Add ".text.unlikely" and ".text.hot" to arm unwind tables drivers: net: cpsw: discard dual emac default vlan configuration regulator: core: fix race condition in regulator_put() spi/pxa2xx: Clear cur_chip pointer before starting next message dm cache: fix missing ERR_PTR returns and handling dm thin: don't allow messages to be sent to a pool target in READ_ONLY or FAIL mode nl80211: fix per-station group key get/del and memory leak NFSv4.1: Fix an Oops in nfs41_walk_client_list nfs: fix dio deadlock when O_DIRECT flag is flipped Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857) ALSA: seq-dummy: remove deadlock-causing events on close powerpc/xmon: Fix another endiannes issue in RTAS call from xmon can: kvaser_usb: Fix state handling upon BUS_ERROR events can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT can: kvaser_usb: Send correct context to URB completion can: kvaser_usb: Do not sleep in atomic context ASoC: wm8960: Fix capture sample rate from 11250 to 11025 spi: dw-mid: fix FIFO size Signed-off-by: Ian Maund <imaund@codeaurora.org>	2015-05-01 13:49:45 -07:00
Matt Wagantall	9216d4ae8c	Revert "workqueue: fix subtle pool management issue which can stall whole worker_pool" This reverts commit 6515b297d5fcc5fd0f80ed04382a5f6b2b15685a. Drop this change in preparation for picking up its upstream linux-stable version, in which merge conflicts have been differently resolved. This will allow for a cleaner merge of linux-stable v3.10.73. Change-Id: I6cc115983a00f0153914a9b8fa1378c8fbae07f8 Signed-off-by: Matt Wagantall <mattw@codeaurora.org>	2015-05-01 13:44:26 -07:00
Ian Maund	807a44d01a	This is the 3.10.67 stable release -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUyuGRAAoJEDjbvchgkmk+7EwQALYPOeh+AManQFB1MQvFuOgZ /4ulpjhGXw/RPTKHMeyHo8vRfUhMOx8UPF62uql+g1l9b/Zt2bs6qXu4QcxRRsQc trSTUpi+U14y1hkgqOVOcFYP2ZaTjNEBQgLJ4eGn46CliLqme+rfoyRYm2GXzcR4 6cbSAr3mufdFIpi9/8Dn62Gv0aws5lIv3qkHJXznyuux3tisPT5y6Ux2KJoivPn/ SqADtRpwo+7lTjl15fE++9AqNsGMorV6toT2OO/7nXP+824psInKLmREAT2qC99b BG61vcYdxOuHtzmwrvCf1jSRjxhvZT0j2xhBr/vCKcxy08AT0vDv68zrV1r6TIuu U7/CKXtFBY95cjfnkTLJuswBSuIA/+sQHV6DaddH0V8fcZ6rQMLrblQ9ZcFFFkmT 2SG6lmlXqZvcEKYGMnL/Dcow1rkRhB5stiGgTkYxjiRSRpzAHISRJ/GGpsT+rRqK HpBs5p9JshvRl7RWKwAu+DNGaEK1X/WYxc4/jw6dZFWX7lEWSMIPlr9zXgZCZ39y V6lV1VVlT9/CSs1swKHUyhHHehlFsnIlQ6Fkiycr/KkuqBLs92Hyb7WhpVa819yX osXdxSm6J54skiOLKYpBWHpnY09Tc+p28VEfMpErTExgp2oE8F34K7kdhoQPQb97 2mHiXNa+J4CLUNQ+sRmw =HDBo -----END PGP SIGNATURE----- Merge commit 'v3.10.67' into LA.BF64.1.2.9 This merge brings us up to date with upstream kernel.org tag v3.10.67. It also contains changes to allow forbidden warnings introduced in the commit 'core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors'. Once upstream has corrected these warnings, the changes to scripts/gcc-wrapper.py, in this commit, can be reverted. * 'v3.10.67' (915 commits): Linux 3.10.67 md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants. ext4: fix warning in ext4_da_update_reserve_space() quota: provide interface for readding allocated space into reserved space crypto: add missing crypto module aliases crypto: include crypto- module prefix in template crypto: prefix module autoloading with "crypto-" drbd: merge_bvec_fn: properly remap bvm->bi_bdev Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single" ipvs: uninitialized data with IP_VS_IPV6 KEYS: close race between key lookup and freeing sata_dwc_460ex: fix resource leak on error path x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs x86, tls: Interpret an all-zero struct user_desc as "no segment" x86, tls, ldt: Stop checking lm in LDT_empty x86/tsc: Change Fast TSC calibration failed from error to info x86, hyperv: Mark the Hyper-V clocksource as being continuous clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write can: dev: fix crtlmode_supported check bus: mvebu-mbus: fix support of MBus window 13 ARM: dts: imx25: Fix PWM "per" clocks time: adjtimex: Validate the ADJ_FREQUENCY values time: settimeofday: Validate the values of tv from user dm cache: share cache-metadata object across inactive and active DM tables ipr: wait for aborted command responses drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES scripts/recordmcount.pl: There is no -m32 gcc option on Super-H anymore ALSA: usb-audio: Add mic volume fix quirk for Logitech Webcam C210 libata: prevent HSM state change race between ISR and PIO pinctrl: Fix two deadlocks gpio: sysfs: fix gpio device-attribute leak gpio: sysfs: fix gpio-chip device-attribute leak Linux 3.10.66 s390/3215: fix tty output containing tabs s390/3215: fix hanging console issue fsnotify: next_i is freed during fsnotify_unmount_inodes. netfilter: ipset: small potential read beyond the end of buffer mmc: sdhci: Fix sleep in atomic after inserting SD card LOCKD: Fix a race when initialising nlmsvc_timeout x86, um: actually mark system call tables readonly um: Skip futex_atomic_cmpxchg_inatomic() test decompress_bunzip2: off by one in get_next_block() ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances ARM: omap5/dra7xx: Fix frequency typos ARM: clk-imx6q: fix video divider for rev T0 1.0 ARM: imx6q: drop unnecessary semicolon ARM: dts: imx25: Fix the SPI1 clocks Input: I8042 - add Acer Aspire 7738 to the nomux list Input: i8042 - reset keyboard to fix Elantech touchpad detection can: kvaser_usb: Don't send a RESET_CHIP for non-existing channels can: kvaser_usb: Reset all URB tx contexts upon channel close can: kvaser_usb: Don't free packets when tight on URBs USB: keyspan: fix null-deref at probe USB: cp210x: add IDs for CEL USB sticks and MeshWorks devices USB: cp210x: fix ID for production CEL MeshConnect USB Stick usb: dwc3: gadget: Stop TRB preparation after limit is reached usb: dwc3: gadget: Fix TRB preparation during SG OHCI: add a quirk for ULi M5237 blocking on reset gpiolib: of: Correct error handling in of_get_named_gpiod_flags NFSv4.1: Fix client id trunking on Linux ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing vfio-pci: Fix the check on pci device type in vfio_pci_probe() uvcvideo: Fix destruction order in uvc_delete() smiapp: Take mutex during PLL update in sensor initialisation af9005: fix kernel panic on init if compiled without IR smiapp-pll: Correct clock debug prints video/logo: prevent use of logos after they have been freed storvsc: ring buffer failures may result in I/O freeze iscsi-target: Fail connection on short sendmsg writes hp_accel: Add support for HP ZBook 15 cfg80211: Fix 160 MHz channels with 80+80 and 160 MHz drivers ARC: [nsimosci] move peripherals to match model to FPGA drm/i915: Force the CS stall for invalidate flushes drm/i915: Invalidate media caches on gen7 drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw drm/radeon: check the right ring in radeon_evict_flags() drm/vmwgfx: Fix fence event code enic: fix rx skb checksum alx: fix alx_poll() tcp: Do not apply TSO segment limit to non-TSO packets tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts netlink: Don't reorder loads/stores before marking mmap netlink frame as available netlink: Always copy on mmap TX. Linux 3.10.65 mm: Don't count the stack guard page towards RLIMIT_STACK mm: propagate error from stack expansion even for guard page mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed perf session: Do not fail on processing out of order event perf: Fix events installation during moving group perf/x86/intel/uncore: Make sure only uncore events are collected Btrfs: don't delay inode ref updates during log replay ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP scripts/kernel-doc: don't eat struct members with __aligned nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races nfsd4: fix xdr4 inclusion of escaped char fs: nfsd: Fix signedness bug in compare_blob serial: samsung: wait for transfer completion before clock disable writeback: fix a subtle race condition in I_DIRTY clearing cdc-acm: memory leak in error case genhd: check for int overflow in disk_expand_part_tbl() USB: cdc-acm: check for valid interfaces ALSA: hda - Fix wrong gpio_dir & gpio_mask hint setups for IDT/STAC codecs ALSA: hda - using uninitialized data ALSA: usb-audio: extend KEF X300A FU 10 tweak to Arcam rPAC driver core: Fix unbalanced device reference in drivers_probe x86, vdso: Use asm volatile in __getcpu x86_64, vdso: Fix the vdso address randomization algorithm HID: Add a new id 0x501a for Genius MousePen i608X HID: add battery quirk for USB_DEVICE_ID_APPLE_ALU_WIRELESS_2011_ISO keyboard HID: roccat: potential out of bounds in pyra_sysfs_write_settings() HID: i2c-hid: prevent buffer overflow in early IRQ HID: i2c-hid: fix race condition reading reports iommu/vt-d: Fix an off-by-one bug in __domain_mapping() UBI: Fix double free after do_sync_erase() UBI: Fix invalid vfree() pstore-ram: Allow optional mapping with pgprot_noncached pstore-ram: Fix hangs by using write-combine mappings PCI: Restore detection of read-only BARs ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap ASoC: max98090: Fix ill-defined sidetone route ASoC: sigmadsp: Refuse to load firmware files with a non-supported version ath5k: fix hardware queue index assignment swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single can: peak_usb: fix memset() usage can: peak_usb: fix cleanup sequence order in case of error during init ath9k: fix BE/BK queue order ath9k_hw: fix hardware queue allocation ocfs2: fix journal commit deadlock Linux 3.10.64 Btrfs: fix fs corruption on transaction abort if device supports discard Btrfs: do not move em to modified list when unpinning eCryptfs: Remove buggy and unnecessary write in file name decode routine eCryptfs: Force RO mount when encrypted view is enabled udf: Verify symlink size before loading it exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting ncpfs: return proper error from NCP_IOC_SETROOT ioctl crypto: af_alg - fix backlog handling userns: Unbreak the unprivileged remount tests userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount mac80211: free management frame keys when removing station mac80211: fix multicast LED blinking and counter KEYS: Fix stale key registration at error path isofs: Fix unchecked printing of ER records x86/tls: Don't validate lm in set_thread_area() after all dm space map metadata: fix sm_bootstrap_get_nr_blocks() dm bufio: fix memleak when using a dm_buffer's inline bio nfs41: fix nfs4_proc_layoutget error handling megaraid_sas: corrected return of wait_event from abort frame path mmc: block: add newline to sysfs display of force_ro mfd: tc6393xb: Fail ohci suspend if full state restore is required md/bitmap: always wait for writes on unplug. x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit x86_64, switch_to(): Load TLS descriptors before switching DS and ES x86/tls: Disallow unusual TLS segments x86/tls: Validate TLS entries to protect espfix isofs: Fix infinite looping over CE entries Linux 3.10.63 ALSA: usb-audio: Don't resubmit pending URBs at MIDI error recovery powerpc: 32 bit getcpu VDSO function uses 64 bit instructions ARM: sched_clock: Load cycle count after epoch stabilizes igb: bring link up when PHY is powered up ext2: Fix oops in ext2_get_block() called from ext2_quota_write() nEPT: Nested INVEPT net: sctp: use MAX_HEADER for headroom reserve in output path net: mvneta: fix Tx interrupt delay rtnetlink: release net refcnt on error in do_setlink() net/mlx4_core: Limit count field to 24 bits in qp_alloc_res tg3: fix ring init when there are more TX than RX channels ipv6: gre: fix wrong skb->protocol in WCCP sata_fsl: fix error handling of irq_of_parse_and_map ahci: disable MSI on SAMSUNG 0xa800 SSD AHCI: Add DeviceIDs for Sunrise Point-LP SATA controller media: smiapp: Only some selection targets are settable drm/i915: Unlock panel even when LVDS is disabled drm/radeon: kernel panic in drm_calc_vbltimestamp_from_scanoutpos with 3.18.0-rc6 i2c: davinci: generate STP always when NACK is received i2c: omap: fix i207 errata handling i2c: omap: fix NACK and Arbitration Lost irq handling xen-netfront: Remove BUGs on paged skb data which crosses a page boundary mm: fix swapoff hang after page migration and fork mm: frontswap: invalidate expired data on a dup-store failure Linux 3.10.62 nfsd: Fix ACL null pointer deref powerpc/powernv: Honor the generic "no_64bit_msi" flag bnx2fc: do not add shared skbs to the fcoe_rx_list nfsd4: fix leak of inode reference on delegation failure nfsd: Fix slot wake up race in the nfsv4.1 callback code rt2x00: do not align payload on modern H/W can: dev: avoid calling kfree_skb() from interrupt context spi: dw: Fix dynamic speed change. iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly target: Don't call TFO->write_pending if data_length == 0 srp-target: Retry when QP creation fails with ENOMEM Input: xpad - use proper endpoint type ARM: 8222/1: mvebu: enable strex backoff delay ARM: 8216/1: xscale: correct auxiliary register in suspend/resume ALSA: usb-audio: Add ctrl message delay quirk for Marantz/Denon devices can: esd_usb2: fix memory leak on disconnect USB: xhci: don't start a halted endpoint before its new dequeue is set usb-quirks: Add reset-resume quirk for MS Wireless Laser Mouse 6000 usb: serial: ftdi_sio: add PIDs for Matrix Orbital products USB: serial: cp210x: add IDs for CEL MeshConnect USB Stick USB: keyspan: fix tty line-status reporting USB: keyspan: fix overrun-error reporting USB: ssu100: fix overrun-error reporting iio: Fix IIO_EVENT_CODE_EXTRACT_DIR bit mask powerpc/pseries: Fix endiannes issue in RTAS call from xmon powerpc/pseries: Honor the generic "no_64bit_msi" flag of/base: Fix PowerPC address parsing hack ASoC: wm_adsp: Avoid attempt to free buffers that might still be in use ASoC: sgtl5000: Fix SMALL_POP bit definition PCI/MSI: Add device flag indicating that 64-bit MSIs don't work ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg pptp: fix stack info leak in pptp_getname() qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem ieee802154: fix error handling in ieee802154fake_probe() ipv4: Fix incorrect error code when adding an unreachable route inetdevice: fixed signed integer overflow sparc64: Fix constraints on swab helpers. uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME x86, mm: Set NX across entire PMD at boot x86: Require exact match for 'noxsave' command line option x86_64, traps: Rework bad_iret x86_64, traps: Stop using IST for #SS x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C MIPS: Loongson: Make platform serial setup always built-in. MIPS: oprofile: Fix backtrace on 64-bit kernel Linux 3.10.61 mm: memcg: handle non-error OOM situations more gracefully mm: memcg: do not trap chargers with full callstack on OOM mm: memcg: rework and document OOM waiting and wakeup mm: memcg: enable memcg OOM killer only for user faults x86: finish user fault error path with fatal signal arch: mm: pass userspace fault flag to generic fault handler arch: mm: do not invoke OOM killer on kernel fault OOM arch: mm: remove obsolete init OOM protection mm: invoke oom-killer from remaining unconverted page fault handlers net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks net: sctp: fix panic on duplicate ASCONF chunks net: sctp: fix remote memory pressure from excessive queueing KVM: x86: Don't report guest userspace emulation error to userspace SCSI: hpsa: fix a race in cmd_free/scsi_done net/mlx4_en: Fix BlueFlame race ARM: Correct BUG() assembly to ensure it is endian-agnostic perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge mei: bus: fix possible boundaries violation perf: Handle compat ioctl MIPS: Fix forgotten preempt_enable() when CPU has inclusive pcaches dell-wmi: Fix access out of memory ARM: probes: fix instruction fetch order with <asm/opcodes.h> br: fix use of ->rx_handler_data in code executed on non-rx_handler path netfilter: nf_nat: fix oops on netns removal netfilter: xt_bpf: add mising opaque struct sk_filter definition netfilter: nf_log: release skbuff on nlmsg put failure netfilter: nfnetlink_log: fix maximum packet length logged to userspace netfilter: nf_log: account for size of NLMSG_DONE attribute ipc: always handle a new value of auto_msgmni clocksource: Remove "weak" from clocksource_default_clock() declaration kgdb: Remove "weak" from kgdb_arch_pc() declaration media: ttusb-dec: buffer overflow in ioctl NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return nfs: Fix use of uninitialized variable in nfs_getattr() NFS: Don't try to reclaim delegation open state if recovery failed NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired Input: alps - allow up to 2 invalid packets without resetting device Input: alps - ignore potential bare packets when device is out of sync dm raid: ensure superblock's size matches device's logical block size dm btree: fix a recursion depth bug in btree walking code block: Fix computation of merged request priority parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls scsi: only re-lock door after EH on devices that were reset nfs: fix pnfs direct write memory leak firewire: cdev: prevent kernel stack leaking into ioctl arguments arm64: __clear_user: handle exceptions on strb ARM: 8198/1: make kuser helpers depend on MMU drm/radeon: add missing crtc unlock when setting up the MC mac80211: fix use-after-free in defragmentation macvtap: Fix csum_start when VLAN tags are present iwlwifi: configure the LTR libceph: do not crash on large auth tickets xtensa: re-wire umount syscall to sys_oldumount ALSA: usb-audio: Fix memory leak in FTU quirk ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks ahci: Add Device IDs for Intel Sunrise Point PCH audit: keep inode pinned x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks sparc64: Do irq_{enter,exit}() around generic_smp_call_function*(). sparc64: Fix crashes in schizo_pcierr_intr_other(). sunvdc: don't call VD_OP_GET_VTOC vio: fix reuse of vio_dring slot sunvdc: limit each sg segment to a page sunvdc: compute vdisk geometry from capacity sunvdc: add cdrom and v1.1 protocol support net: sctp: fix memory leak in auth key management net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet gre6: Move the setting of dev->iflink into the ndo_init functions. ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function. Linux 3.10.60 libceph: ceph-msgr workqueue needs a resque worker Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup of: Fix overflow bug in string property parsing functions sysfs: driver core: Fix glue dir race condition by gdp_mutex i2c: at91: don't account as iowait acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80 rbd: Fix error recovery in rbd_obj_read_sync() drm/radeon: remove invalid pci id usb: gadget: udc: core: fix kernel oops with soft-connect usb: gadget: function: acm: make f_acm pass USB20CV Chapter9 usb: dwc3: gadget: fix set_halt() bug with pending transfers crypto: algif - avoid excessive use of socket buffer in skcipher mm: Remove false WARN_ON from pagecache_isize_extended() x86, apic: Handle a bad TSC more gracefully posix-timers: Fix stack info leak in timer_create() mac80211: fix typo in starting baserate for rts_cts_rate_idx PM / Sleep: fix recovery during resuming from hibernation tty: Fix high cpu load if tty is unreleaseable quota: Properly return errors from dquot_writeback_dquots() ext3: Don't check quota format when there are no quota files nfsd4: fix crash on unknown operation number cpc925_edac: Report UE events properly e7xxx_edac: Report CE events properly i3200_edac: Report CE events properly i82860_edac: Report CE events properly scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND lib/bitmap.c: fix undefined shift in __bitmap_shift_{left\|right}() cgroup/kmemleak: add kmemleak_free() for cgroup deallocations. usb: Do not allow usb_alloc_streams on unconfigured devices USB: opticon: fix non-atomic allocation in write path usb-storage: handle a skipped data phase spi: pxa2xx: toggle clocks on suspend if not disabled by runtime PM spi: pl022: Fix incorrect dma_unmap_sg usb: dwc3: gadget: Properly initialize LINK TRB wireless: rt2x00: add new rt2800usb device USB: option: add Haier CE81B CDMA modem usb: option: add support for Telit LE910 USB: cdc-acm: only raise DTR on transitions from B0 USB: cdc-acm: add device id for GW Instek AFG-2225 usb: serial: ftdi_sio: add "bricked" FTDI device PID usb: serial: ftdi_sio: add Awinda Station and Dongle products USB: serial: cp210x: add Silicon Labs 358x VID and PID serial: Fix divide-by-zero fault in uart_get_divisor() staging:iio:ade7758: Remove "raw" from channel name staging:iio:ade7758: Fix check if channels are enabled in prenable staging:iio:ade7758: Fix NULL pointer deref when enabling buffer staging:iio:ad5933: Drop "raw" from channel names staging:iio:ad5933: Fix NULL pointer deref when enabling buffer OOM, PM: OOM killed task shouldn't escape PM suspend freezer: Do not freeze tasks killed by OOM killer ext4: fix oops when loading block bitmap failed cpufreq: intel_pstate: Fix setting max_perf_pct in performance policy ext4: fix overflow when updating superblock backups after resize ext4: check s_chksum_driver when looking for bg csum presence ext4: fix reservation overflow in ext4_da_write_begin ext4: add ext4_iget_normal() which is to be used for dir tree lookups ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT ext4: don't check quota format when there are no quota files ext4: check EA value offset when loading jbd2: free bh when descriptor block checksum fails MIPS: tlbex: Properly fix HUGE TLB Refill exception handler target: Fix APTPL metadata handling for dynamic MappedLUNs target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE qla_target: don't delete changed nacls ARC: Update order of registers in KGDB to match GDB 7.5 ARC: [nsimosci] Allow "headless" models to boot KVM: x86: Emulator fixes for eip canonical checks on near branches KVM: x86: Fix wrong masking on relative jump/call kvm: x86: don't kill guest on unknown exit reason KVM: x86: Check non-canonical addresses upon WRMSR KVM: x86: Improve thread safety in pit KVM: x86: Prevent host from panicking on shared MSR writes. kvm: fix excessive pages un-pinning in kvm_iommu_map error path. media: tda7432: Fix setting TDA7432_MUTE bit for TDA7432_RF register media: ds3000: fix LNB supply voltage on Tevii S480 on initialization media: em28xx-v4l: give back all active video buffers to the vb2 core properly on streaming stop media: v4l2-common: fix overflow in v4l_bound_align_image() drm/nouveau/bios: memset dcb struct to zero before parsing drm/tilcdc: Fix the error path in tilcdc_load() drm/ast: Fix HW cursor image Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544 Input: i8042 - add noloop quirk for Asus X750LN framebuffer: fix border color modules, lock around setting of MODULE_STATE_UNFORMED dm log userspace: fix memory leak in dm_ulog_tfr_init failure path block: fix alignment_offset math that assumes io_min is a power-of-2 drbd: compute the end before rb_insert_augmented() dm bufio: update last_accessed when relinking a buffer virtio_pci: fix virtio spec compliance on restore selinux: fix inode security list corruption pstore: Fix duplicate {console,ftrace}-efi entries mfd: rtsx_pcr: Fix MSI enable error handling mnt: Prevent pivot_root from creating a loop in the mount tree UBI: add missing kmem_cache_free() in process_pool_aeb error path random: add and use memzero_explicit() for clearing data crypto: more robust crypto_memneq fix misuses of f_count() in ppp and netlink kill wbuf_queued/wbuf_dwork_lock ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode evm: check xattr value length and type in evm_inode_setxattr() x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE x86_64, entry: Fix out of bounds read on sysenter x86_64, entry: Filter RFLAGS.NT on entry from userspace x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal() x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable() x86: Reject x32 executables if x32 ABI not supported vfs: fix data corruption when blocksize < pagesize for mmaped data UBIFS: fix free log space calculation UBIFS: fix a race condition UBIFS: remove mst_mutex fs: Fix theoretical division by 0 in super_cache_scan(). fs: make cont_expand_zero interruptible mmc: rtsx_pci_sdmmc: fix incorrect last byte in R2 response libata-sff: Fix controllers with no ctl port pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE Controller Revert "percpu: free percpu allocation info for uniprocessor system" lockd: Try to reconnect if statd has moved drivers/net: macvtap and tun depend on INET ipv4: dst_entry leak in ip_send_unicast_reply() ax88179_178a: fix bonding failure ipv4: fix nexthop attlen check in fib_nh_match tracing/syscalls: Ignore numbers outside NR_syscalls' range Linux 3.10.59 ecryptfs: avoid to access NULL pointer when write metadata in xattr ARM: at91/PMC: don't forget to write PMC_PCDR register to disable clocks ALSA: usb-audio: Add support for Steinberg UR22 USB interface ALSA: emu10k1: Fix deadlock in synth voice lookup ALSA: pcm: use the same dma mmap codepath both for arm and arm64 arm64: compat: fix compat types affecting struct compat_elf_prpsinfo spi: dw-mid: terminate ongoing transfers at exit kernel: add support for gcc 5 fanotify: enable close-on-exec on events' fd when requested in fanotify_init() mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set Bluetooth: Fix issue with USB suspend in btusb driver Bluetooth: Fix HCI H5 corrupted ack value rt2800: correct BBP1_TX_POWER_CTRL mask PCI: Generate uppercase hex for modalias interface class PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size iwlwifi: Add missing PCI IDs for the 7260 series NFSv4.1: Fix an NFSv4.1 state renewal regression NFSv4: fix open/lock state recovery error handling NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails lzo: check for length overrun in variable length encoding. Revert "lzo: properly check for overruns" Documentation: lzo: document part of the encoding m68k: Disable/restore interrupts in hwreg_present()/hwreg_write() Drivers: hv: vmbus: Fix a bug in vmbus_open() Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl() Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl() Drivers: hv: vmbus: Cleanup vmbus_post_msg() firmware_class: make sure fw requests contain a name qla2xxx: Use correct offset to req-q-out for reserve calculation mptfusion: enable no_write_same for vmware scsi disks be2iscsi: check ip buffer before copying regmap: fix NULL pointer dereference in _regmap_write/read regmap: debugfs: fix possbile NULL pointer dereference spi: dw-mid: check that DMA was inited before exit spi: dw-mid: respect 8 bit mode x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead kvm: don't take vcpu mutex for obviously invalid vcpu ioctls KVM: s390: unintended fallthrough for external call kvm: x86: fix stale mmio cache bug fs: Add a missing permission check to do_umount Btrfs: fix race in WAIT_SYNC ioctl Btrfs: fix build_backref_tree issue with multiple shared blocks Btrfs: try not to ENOSPC on log replay Linux 3.10.58 USB: cp210x: add support for Seluxit USB dongle USB: serial: cp210x: added Ketra N1 wireless interface support USB: Add device quirk for ASUS T100 Base Station keyboard ipv6: reallocate addrconf router for ipv6 address when lo device up tcp: fixing TLP's FIN recovery sctp: handle association restarts when the socket is closed. ip6_gre: fix flowi6_proto value in xmit path hyperv: Fix a bug in netvsc_start_xmit() tg3: Allow for recieve of full-size 8021AD frames tg3: Work around HW/FW limitations with vlan encapsulated frames l2tp: fix race while getting PMTU on PPP pseudo-wire openvswitch: fix panic with multiple vlan headers packet: handle too big packets for PACKET_V3 tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() sit: Fix ipip6_tunnel_lookup device matching criteria myri10ge: check for DMA mapping errors Linux 3.10.57 cpufreq: ondemand: Change the calculation of target frequency cpufreq: Fix wrong time unit conversion nl80211: clear skb cb before passing to netlink drbd: fix regression 'out of mem, failed to invoke fence-peer helper' jiffies: Fix timeval conversion to jiffies md/raid5: disable 'DISCARD' by default due to safety concerns. media: vb2: fix VBI/poll regression mm: numa: Do not mark PTEs pte_numa when splitting huge pages mm, thp: move invariant bug check out of loop in __split_huge_page_map ring-buffer: Fix infinite spin in reading buffer init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu perf: fix perf bug in fork() udf: Avoid infinite loop when processing indirect ICBs Linux 3.10.56 vm_is_stack: use for_each_thread() rather then buggy while_each_thread() oom_kill: add rcu_read_lock() into find_lock_task_mm() oom_kill: has_intersects_mems_allowed() needs rcu_read_lock() oom_kill: change oom_kill.c to use for_each_thread() introduce for_each_thread() to replace the buggy while_each_thread() kernel/fork.c:copy_process(): unify CLONE_THREAD-or-thread_group_leader code arm: multi_v7_defconfig: Enable Zynq UART driver ext2: Fix fs corruption in ext2_get_xip_mem() serial: 8250_dma: check the result of TX buffer mapping ARM: 7748/1: oabi: handle faults when loading swi instruction from userspace netfilter: nf_conntrack: avoid large timeout for mid-stream pickup PM / sleep: Use valid_state() for platform-dependent sleep states only PM / sleep: Add state field to pm_states[] entries ipvs: fix ipv6 hook registration for local replies ipvs: Maintain all DSCP and ECN bits for ipv6 tun forwarding ipvs: avoid netns exit crash on ip_vs_conn_drop_conntrack md/raid1: fix_read_error should act on all non-faulty devices. media: cx18: fix kernel oops with tda8290 tuner Fix nasty 32-bit overflow bug in buffer i/o code. perf kmem: Make it work again on non NUMA machines perf: Fix a race condition in perf_remove_from_context() alarmtimer: Lock k_itimer during timer callback alarmtimer: Do not signal SIGEV_NONE timers parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds powerpc/perf: Fix ABIv2 kernel backtraces sched: Fix unreleased llc_shared_mask bit during CPU hotplug ocfs2/dlm: do not get resource spinlock if lockres is new nilfs2: fix data loss with mmap() fs/notify: don't show f_handle if exportfs_encode_inode_fh failed fsnotify/fdinfo: use named constants instead of hardcoded values kcmp: fix standard comparison bug Revert "mac80211: disable uAPSD if all ACs are under ACM" usb: dwc3: core: fix ordering for PHY suspend usb: dwc3: core: fix order of PM runtime calls usb: host: xhci: fix compliance mode workaround genhd: fix leftover might_sleep() in blk_free_devt() lockd: fix rpcbind crash on lockd startup failure rtlwifi: rtl8192cu: Add new ID percpu: perform tlb flush after pcpu_map_pages() failure percpu: fix pcpu_alloc_pages() failure path percpu: free percpu allocation info for uniprocessor system ata_piix: Add Device IDs for Intel 9 Series PCH Input: i8042 - add nomux quirk for Avatar AVIU-145A6 Input: i8042 - add Fujitsu U574 to no_timeout dmi table Input: atkbd - do not try 'deactivate' keyboard on any LG laptops Input: elantech - fix detection of touchpad on ASUS s301l Input: synaptics - add support for ForcePads Input: serport - add compat handling for SPIOCSTYPE ioctl dm crypt: fix access beyond the end of allocated space block: Fix dev_t minor allocation lifetime workqueue: apply __WQ_ORDERED to create_singlethread_workqueue() Revert "iwlwifi: dvm: don't enable CTS to self" SCSI: libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu NFC: microread: Potential overflows in microread_target_discovered() iscsi-target: Fix memory corruption in iscsit_logout_post_handler_diffcid iscsi-target: avoid NULL pointer in iscsi_copy_param_list failure Target/iser: Don't put isert_conn inside disconnected handler Target/iser: Get isert_conn reference once got to connected_handler iio:inkern: fix overwritten -EPROBE_DEFER in of_iio_channel_get_by_name iio:magnetometer: bugfix magnetometers gain values iio: adc: ad_sigma_delta: Fix indio_dev->trig assignment iio: st_sensors: Fix indio_dev->trig assignment iio: meter: ade7758: Fix indio_dev->trig assignment iio: inv_mpu6050: Fix indio_dev->trig assignment iio: gyro: itg3200: Fix indio_dev->trig assignment iio:trigger: modify return value for iio_trigger_get CIFS: Fix SMB2 readdir error handling CIFS: Fix directory rename error ASoC: davinci-mcasp: Correct rx format unit configuration shmem: fix nlink for rename overwrite directory x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8 KVM: x86: handle idiv overflow at kvm_write_tsc regmap: Fix handling of volatile registers for format_write() chips ACPICA: Update to GPIO region handler interface. MIPS: mcount: Adjust stack pointer for static trace in MIPS32 MIPS: ZBOOT: add missing <linux/string.h> include ARM: 8165/1: alignment: don't break misaligned NEON load/store ARM: 7897/1: kexec: Use the right ISA for relocate_new_kernel ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs ARM: 8128/1: abort: don't clear the exclusive monitors NFSv4: Fix another bug in the close/open_downgrade code NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists() usb:hub set hub->change_bits when over-current happens usb: dwc3: omap: fix ordering for runtime pm calls USB: EHCI: unlink QHs even after the controller has stopped USB: storage: Add quirks for Entrega/Xircom USB to SCSI converters USB: storage: Add quirk for Ariston Technologies iConnect USB to SCSI adapter USB: storage: Add quirk for Adaptec USBConnect 2000 USB-to-SCSI Adapter storage: Add single-LUN quirk for Jaz USB Adapter usb: hub: take hub->hdev reference when processing from eventlist xhci: fix oops when xhci resumes from hibernate with hw lpm capable devices xhci: Fix null pointer dereference if xhci initialization fails USB: zte_ev: fix removed PIDs USB: ftdi_sio: add support for NOVITUS Bono E thermal printer USB: sierra: add 1199:68AA device ID USB: sierra: avoid CDC class functions on "68A3" devices USB: zte_ev: remove duplicate Qualcom PID USB: zte_ev: remove duplicate Gobi PID Revert "USB: option,zte_ev: move most ZTE CDMA devices to zte_ev" USB: option: add VIA Telecom CDS7 chipset device id USB: option: reduce interrupt-urb logging verbosity USB: serial: fix potential heap buffer overflow USB: sisusb: add device id for Magic Control USB video USB: serial: fix potential stack buffer overflow USB: serial: pl2303: add device id for ztek device xtensa: fix a6 and a7 handling in fast_syscall_xtensa xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS xtensa: fix address checks in dma_{alloc,free}_coherent xtensa: replace IOCTL code definitions with constants drm/radeon: add connector quirk for fujitsu board drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle drm/ast: AST2000 cannot be detected correctly drm/i915: Wait for vblank before enabling the TV encoder drm/i915: Remove bogus __init annotation from DMI callbacks HID: logitech-dj: prevent false errors to be shown HID: magicmouse: sanity check report size in raw_event() callback HID: picolcd: sanity check report size in raw_event() callback cfq-iosched: Fix wrong children_weight calculation ALSA: pcm: fix fifo_size frame calculation ALSA: hda - Fix invalid pin powermap without jack detection ALSA: hda - Fix COEF setups for ALC1150 codec ALSA: core: fix buffer overflow in snd_info_get_line() arm64: ptrace: fix compat hardware watchpoint reporting trace: Fix epoll hang when we race with new entries i2c: at91: Fix a race condition during signal handling in at91_do_twi_xfer. i2c: at91: add bound checking on SMBus block length bytes arm64: flush TLS registers during exec ibmveth: Fix endian issues with rx_no_buffer statistic ahci: add pcid for Marvel 0x9182 controller ahci: Add Device IDs for Intel 9 Series PCH pata_scc: propagate return value of scc_wait_after_reset drm/i915: read HEAD register back in init_ring_common() to enforce ordering drm/radeon: load the lm63 driver for an lm64 thermal chip. drm/ttm: Choose a pool to shrink correctly in ttm_dma_pool_shrink_scan(). drm/ttm: Fix possible division by 0 in ttm_dma_pool_shrink_scan(). drm/tilcdc: fix double kfree drm/tilcdc: fix release order on exit drm/tilcdc: panel: fix leak when unloading the module drm/tilcdc: tfp410: fix dangling sysfs connector node drm/tilcdc: slave: fix dangling sysfs connector node drm/tilcdc: panel: fix dangling sysfs connector node carl9170: fix sending URBs with wrong type when using full-speed Linux 3.10.55 libceph: gracefully handle large reply messages from the mon libceph: rename ceph_msg::front_max to front_alloc_len tpm: Provide a generic means to override the chip returned timeouts vfs: fix bad hashing of dentries dcache.c: get rid of pointless macros IB/srp: Fix deadlock between host removal and multipathd blkcg: don't call into policy draining if root_blkg is already gone mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc() mtd/ftl: fix the double free of the buffers allocated in build_maps() CIFS: Fix wrong restart readdir for SMB1 CIFS: Fix wrong filename length for SMB2 CIFS: Fix wrong directory attributes after rename CIFS: Possible null ptr deref in SMB2_tcon CIFS: Fix async reading on reconnects CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2 libceph: do not hard code max auth ticket len libceph: add process_one_ticket() helper libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly md/raid1,raid10: always abort recover on write error. xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't zero partial page cache pages during O_DIRECT writes xfs: don't dirty buffers beyond EOF xfs: quotacheck leaves dquot buffers without verifiers RDMA/iwcm: Use a default listen backlog if needed md/raid10: Fix memory leak when raid10 reshape completes. md/raid10: fix memory leak when reshaping a RAID10. md/raid6: avoid data corruption during recovery of double-degraded RAID6 Bluetooth: Avoid use of session socket after the session gets freed Bluetooth: never linger on process exit mnt: Add tests for unprivileged remount cases that have found to be faulty mnt: Change the default remount atime from relatime to the existing value mnt: Correct permission checks in do_remount mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount mnt: Only change user settable mount flags in remount ring-buffer: Up rb_iter_peek() loop count to 3 ring-buffer: Always reset iterator to reader page ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock ACPI: Run fixed event device notifications in process context ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE ASoC: max98090: Fix missing free_irq ASoC: samsung: Correct I2S DAI suspend/resume ops ASoC: wm_adsp: Add missing MODULE_LICENSE ASoC: pcm: fix dpcm_path_put in dpcm runtime update openrisc: Rework signal handling MIPS: Fix accessing to per-cpu data when flushing the cache MIPS: OCTEON: make get_system_type() thread-safe MIPS: asm: thread_info: Add _TIF_SECCOMP flag MIPS: Cleanup flags in syscall flags handlers. MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade() MIPS: tlbex: Fix a missing statement for HUGETLB MIPS: Prevent user from setting FCSR cause bits MIPS: GIC: Prevent array overrun drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure Drivers: scsi: storvsc: Implement a eh_timed_out handler powerpc/pseries: Failure on removing device node powerpc/mm: Use read barrier when creating real_pte powerpc/mm/numa: Fix break placement regulator: arizona-ldo1: remove bypass functionality mfd: omap-usb-host: Fix improper mask use. kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path CAPABILITIES: remove undefined caps from all processes tpm: missing tpm_chip_put in tpm_get_random() firmware: Do not use WARN_ON(!spin_is_locked()) spi: omap2-mcspi: Configure hardware when slave driver changes mode spi: orion: fix incorrect handling of cell-index DT property iommu/amd: Fix cleanup_domain for mass device removal media: media-device: Remove duplicated memset() in media_enum_entities() media: au0828: Only alt setting logic when needed media: xc4000: Fix get_frequency() media: xc5000: Fix get_frequency() Linux 3.10.54 USB: fix build error with CONFIG_PM_RUNTIME disabled NFSv4: Fix problems with close in the presence of a delegation NFSv3: Fix another acl regression svcrdma: Select NFSv4.1 backchannel transport based on forward channel NFSD: Decrease nfsd_users in nfsd_startup_generic fail usb: hub: Prevent hub autosuspend if usbcore.autosuspend is -1 USB: whiteheat: Added bounds checking for bulk command response USB: ftdi_sio: Added PID for new ekey device USB: ftdi_sio: add Basic Micro ATOM Nano USB2Serial PID ARM: OMAP2+: hwmod: Rearm wake-up interrupts for DT when MUSB is idled usb: xhci: amd chipset also needs short TX quirk xhci: Treat not finding the event_seg on COMP_STOP the same as COMP_STOP_INVAL Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt jbd2: fix infinite loop when recovering corrupt journal blocks mei: nfc: fix memory leak in error path mei: reset client state on queued connect request Btrfs: fix csum tree corruption, duplicate and outdated checksums hpsa: fix bad -ENOMEM return value in hpsa_big_passthru_ioctl x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub x86_64/vsyscall: Fix warn_bad_vsyscall log output x86: don't exclude low BIOS area when allocating address space for non-PCI cards drm/radeon: add additional SI pci ids ext4: fix BUG_ON in mb_free_blocks() kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601) Revert "KVM: x86: Increase the number of fixed MTRR regs to 10" KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table KVM: x86: Inter-privilege level ret emulation is not implemeneted crypto: ux500 - make interrupt mode plausible serial: core: Preserve termios c_cflag for console resume ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct drivers/i2c/busses: use correct type for dma_map/unmap hwmon: (dme1737) Prevent overflow problem when writing large limits hwmon: (ads1015) Fix out-of-bounds array access hwmon: (lm85) Fix various errors on attribute writes hwmon: (ads1015) Fix off-by-one for valid channel index checking hwmon: (gpio-fan) Prevent overflow problem when writing large limits hwmon: (lm78) Fix overflow problems seen when writing large temperature limits hwmon: (sis5595) Prevent overflow problem when writing large limits drm: omapdrm: fix compiler errors ARM: OMAP3: Fix choice of omap3_restore_es function in OMAP34XX rev3.1.2 case. mei: start disconnect request timer consistently ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co ALSA: hda/ca0132 - Don't try loading firmware at resume when already failed ALSA: virtuoso: add Xonar Essence STX II support ALSA: hda - fix an external mic jack problem on a HP machine USB: Fix persist resume of some SS USB devices USB: ehci-pci: USB host controller support for Intel Quark X1000 USB: serial: ftdi_sio: Add support for new Xsens devices USB: serial: ftdi_sio: Annotate the current Xsens PID assignments USB: OHCI: don't lose track of EDs when a controller dies isofs: Fix unbounded recursion when processing relocated directories HID: fix a couple of off-by-ones HID: logitech: perform bounds checking on device_id early enough stable_kernel_rules: Add pointer to netdev-FAQ for network patches Linux 3.10.53 arch/sparc/math-emu/math_32.c: drop stray break operator sparc64: ldc_connect() should not return EINVAL when handshake is in progress. sunsab: Fix detection of BREAK on sunsab serial console bbc-i2c: Fix BBC I2C envctrl on SunBlade 2000 sparc64: Guard against flushing openfirmware mappings. sparc64: Do not insert non-valid PTEs into the TSB hash table. sparc64: Add membar to Niagara2 memcpy code. sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus. sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses. sparc64: Fix top-level fault handling bugs. sparc64: Handle 32-bit tasks properly in compute_effective_address(). sparc64: Make itc_sync_lock raw sparc64: Fix argument sign extension for compat_sys_futex(). sctp: fix possible seqlock seadlock in sctp_packet_transmit() iovec: make sure the caller actually wants anything in memcpy_fromiovecend net: Correctly set segment mac_len in skb_segment(). macvlan: Initialize vlan_features to turn on offload support. net: sctp: inherit auth_capable on INIT collisions tcp: Fix integer-overflow in TCP vegas tcp: Fix integer-overflows in TCP veno net: sendmsg: fix NULL pointer dereference ip: make IP identifiers less predictable inetpeer: get rid of ip_id_count bnx2x: fix crash during TSO tunneling Linux 3.10.52 x86/espfix/xen: Fix allocation of pages for paravirt page tables lib/btree.c: fix leak of whole btree nodes net/l2tp: don't fall back on UDP [get\|set]sockopt net: mvneta: replace Tx timer with a real interrupt net: mvneta: add missing bit descriptions for interrupt masks and causes net: mvneta: do not schedule in mvneta_tx_timeout net: mvneta: use per_cpu stats to fix an SMP lock up net: mvneta: increase the 64-bit rx/tx stats out of the hot path Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" staging: vt6655: Fix Warning on boot handle_irq_event_percpu. x86_64/entry/xen: Do not invoke espfix64 on Xen x86, espfix: Make it possible to disable 16-bit support x86, espfix: Make espfix64 a Kconfig option, fix UML x86, espfix: Fix broken header guard x86, espfix: Move espfix definitions into a separate header file x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks printk: rename printk_sched to printk_deferred iio: buffer: Fix demux table creation staging: vt6655: Fix disassociated messages every 10 seconds mm, thp: do not allow thp faults to avoid cpuset restrictions scsi: handle flush errors properly rapidio/tsi721_dma: fix failure to obtain transaction descriptor cfg80211: fix mic_failure tracing ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout crypto: af_alg - properly label AF_ALG socket Linux 3.10.51 core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors x86/efi: Include a .bss section within the PE/COFF headers s390/ptrace: fix PSW mask check Fix gcc-4.9.0 miscompilation of load_balance() in scheduler mm: hugetlb: fix copy_hugetlb_page_range() x86_32, entry: Store badsys error code in %eax hwmon: (smsc47m192) Fix temperature limit and vrm write operations parisc: Remove SA_RESTORER define coredump: fix the setting of PF_DUMPCORE Input: fix defuzzing logic slab_common: fix the check for duplicate slab names slab_common: Do not check for duplicate slab names tracing: Fix wraparound problems in "uptime" trace clock blkcg: don't call into policy draining if root_blkg is already gone ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode) libata: introduce ata_host->n_tags to avoid oops on SAS controllers libata: support the ata host which implements a queue depth less than 32 block: don't assume last put of shared tags is for the host block: provide compat ioctl for BLKZEROOUT media: tda10071: force modulation to QPSK on DVB-S media: hdpvr: fix two audio bugs Linux 3.10.50 ARC: Implement ptrace(PTRACE_GET_THREAD_AREA) sched: Fix possible divide by zero in avg_atom() calculation locking/mutex: Disable optimistic spinning on some architectures PM / sleep: Fix request_firmware() error at resume dm cache metadata: do not allow the data block size to change dm thin metadata: do not allow the data block size to change alarmtimer: Fix bug where relative alarm timers were treated as absolute drm/radeon: avoid leaking edid data drm/qxl: return IRQ_NONE if it was not our irq drm/radeon: set default bl level to something reasonable irqchip: gic: Fix core ID calculation when topology is read from DT irqchip: gic: Add support for cortex a7 compatible string ring-buffer: Fix polling on trace_pipe mwifiex: fix Tx timeout issue perf/x86/intel: ignore CondChgd bit to avoid false NMI handling ipv4: fix buffer overflow in ip_options_compile() dns_resolver: Null-terminate the right string dns_resolver: assure that dns_query() result is null-terminated sunvnet: clean up objects created in vnet_new() on vnet_exit() net: pppoe: use correct channel MTU when using Multilink PPP net: sctp: fix information leaks in ulpevent layer tipc: clear 'next'-pointer of message fragments before reassembly be2net: set EQ DB clear-intr bit in be_open() netlink: Fix handling of error from netlink_dump(). net: mvneta: Fix big endian issue in mvneta_txq_desc_csum() net: mvneta: fix operation in 10 Mbit/s mode appletalk: Fix socket referencing in skb tcp: fix false undo corner cases igmp: fix the problem when mc leave group net: qmi_wwan: add two Sierra Wireless/Netgear devices net: qmi_wwan: Add ID for Telewell TW-LTE 4G v2 ipv4: icmp: Fix pMTU handling for rare case tcp: Fix divide by zero when pushing during tcp-repair bnx2x: fix possible panic under memory stress net: fix sparse warning in sk_dst_set() ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix ipv4: fix dst race in sk_dst_get() 8021q: fix a potential memory leak net: sctp: check proc_dointvec result in proc_sctp_do_auth tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb ip_tunnel: fix ip_tunnel_lookup shmem: fix splicing from a hole while it's punched shmem: fix faulting into a hole, not taking i_mutex shmem: fix faulting into a hole while it's punched iwlwifi: dvm: don't enable CTS to self igb: do a reset on SR-IOV re-init if device is down hwmon: (adt7470) Fix writes to temperature limit registers hwmon: (da9052) Don't use dash in the name attribute hwmon: (da9055) Don't use dash in the name attribute tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs tracing: Fix graph tracer with stack tracer on other archs fuse: handle large user and group ID Bluetooth: Ignore H5 non-link packets in non-active state Drivers: hv: util: Fix a bug in the KVP code media: gspca_pac7302: Add new usb-id for Genius i-Look 317 usb: Check if port status is equal to RxDetect Signed-off-by: Ian Maund <imaund@codeaurora.org>	2015-05-01 13:34:57 -07:00
Al Viro	6637ecd306	move d_rcu from overlapping d_child to overlapping d_alias commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Ben Hutchings <ben@decadent.org.uk> [hujianyang: Backported to 3.10 refer to the work of Ben Hutchings in 3.2: - Apply name changes in all the different places we use d_alias and d_child - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-29 10:34:00 +02:00
Peter Hurley	391f1c610a	console: Fix console name size mismatch commit 30a22c215a0007603ffc08021f2e8b64018517dd upstream. commit `6ae9200f2c` ("enlarge console.name") increased the storage for the console name to 16 bytes, but not the corresponding struct console_cmdline::name storage. Console names longer than 8 bytes cause read beyond end-of-string and failure to match console; I'm not sure if there are other unexpected consequences. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-19 10:10:51 +02:00
Peter Zijlstra	a49a0c95f4	perf: Fix irq_work 'tail' recursion commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream. Vince reported a watchdog lockup like: [<ffffffff8115e114>] perf_tp_event+0xc4/0x210 [<ffffffff810b4f8a>] perf_trace_lock+0x12a/0x160 [<ffffffff810b7f10>] lock_release+0x130/0x260 [<ffffffff816c7474>] _raw_spin_unlock_irqrestore+0x24/0x40 [<ffffffff8107bb4d>] do_send_sig_info+0x5d/0x80 [<ffffffff811f69df>] send_sigio_to_task+0x12f/0x1a0 [<ffffffff811f71ce>] send_sigio+0xae/0x100 [<ffffffff811f72b7>] kill_fasync+0x97/0xf0 [<ffffffff8115d0b4>] perf_event_wakeup+0xd4/0xf0 [<ffffffff8115d103>] perf_pending_event+0x33/0x60 [<ffffffff8114e3fc>] irq_work_run_list+0x4c/0x80 [<ffffffff8114e448>] irq_work_run+0x18/0x40 [<ffffffff810196af>] smp_trace_irq_work_interrupt+0x3f/0xc0 [<ffffffff816c99bd>] trace_irq_work_interrupt+0x6d/0x80 Which is caused by an irq_work generating new irq_work and therefore not allowing forward progress. This happens because processing the perf irq_work triggers another perf event (tracepoint stuff) which in turn generates an irq_work ad infinitum. Avoid this by raising the recursion counter in the irq_work -- which effectively disables all software events (including tracepoints) from actually triggering again. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-13 14:02:12 +02:00
Linux Build Service Account	bb5760c08d	Merge "sched: fix race conditions where HMP tunables change"	2015-04-07 01:00:42 -07:00
Linux Build Service Account	07ecb8c824	Merge "sched: check HMP scheduler tunables validity"	2015-04-07 01:00:41 -07:00
Linux Build Service Account	93adc6a54e	Merge "exit: Add PANIC_ON_RECURSIVE_FAULT Kconfig option"	2015-04-07 01:00:29 -07:00
Joonwoo Park	dfc7b64857	sched: fix race conditions where HMP tunables change When multiple threads race to update HMP scheduler tunables, at present, the tunables which require big/small task count fix-up can be updated without fix-up and it can trigger BUG_ON(). This happens because sched_hmp_proc_update_handler() acquires rq locks and does fix-up only when number of big/small tasks affecting tunables are updated even though the function sched_hmp_proc_update_handler() calls set_hmp_defaults() which re-calculates all sysctl input data at that point. Consequently a thread that is trying to update a tunable which does not affect big/small task count can call set_hmp_defaults() and update big/small task count affecting tunable without fix-up if there is another thread and it just set fix-up needed sysctl value. Example of problem scenario : thread 0 thread 1 Set sched_small_task – needs fix up. Set sched_init_task_load – no fix up needed. proc_dointvec_minmax() completed which means sysctl_sched_small_task has new value. Call set_hmp_defaults() without lock/fixup. set_hmp_defaults() still updates sched_small_tasks with new sysctl_sched_small_task value by thread 0. Fix such issue by embracing proc update handler with already existing policy mutex. CRs-fixed: 812443 Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-04-06 18:34:59 -07:00
Joonwoo Park	eeb8ce07ec	sched: check HMP scheduler tunables validity Check tunables validity to take valid values only. CRs-fixed: 812443 Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-04-06 18:34:31 -07:00
Joonwoo Park	93b96a6fc9	timer: reduce cache bouncing of deferral timer wheel Commit `a40f752` (timer: make deferrable cpu unbound timers really not bound to a cpu) made any non-idle CPUs in the system can service expired deferral timer entries. This means deferral timer entries can be serviced as early as possible after it expires but in many cases it's suboptimal since switching the CPU of timer wheel incurs a cache bouncing/synchronization cost. Reduce cache bouncing by servicing the deferrable timer wheel with timer tick CPU as it doesn't bounce as often but still guarantees to run timer softirq when there is any CPU that isn't idle. CRs-fixed: 815184 Change-Id: I6d51390832538ad90eccf459e90bff0b25aad09f Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-04-04 01:52:51 -07:00
Matt Wagantall	1b5958bc7c	exit: Add PANIC_ON_RECURSIVE_FAULT Kconfig option If a recursive fault is detected during do_exit(), tasks are left to sit and wait in an un-interruptible sleep until the system reboots (typically manually). Add Kconfig option to change this behaviour and force a panic. This is particularly important if a critical system task encounters a recursive fault (ex. a kworker). Otherwise, the system may be unusable, but since the scheduler is still running system watchdogs may continue to be pet. Change-Id: Ifc26fc79d6066f05a3b2c4d27f78bf4f8d2bd640 Signed-off-by: Matt Wagantall <mattw@codeaurora.org>	2015-03-31 23:54:50 -07:00
Tejun Heo	d8bee0e3ab	workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 upstream. cancel[_delayed]_work_sync() are implemented using __cancel_work_timer() which grabs the PENDING bit using try_to_grab_pending() and then flushes the work item with PENDING set to prevent the on-going execution of the work item from requeueing itself. try_to_grab_pending() can always grab PENDING bit without blocking except when someone else is doing the above flushing during cancelation. In that case, try_to_grab_pending() returns -ENOENT. In this case, __cancel_work_timer() currently invokes flush_work(). The assumption is that the completion of the work item is what the other canceling task would be waiting for too and thus waiting for the same condition and retrying should allow forward progress without excessive busy looping Unfortunately, this doesn't work if preemption is disabled or the latter task has real time priority. Let's say task A just got woken up from flush_work() by the completion of the target work item. If, before task A starts executing, task B gets scheduled and invokes __cancel_work_timer() on the same work item, its try_to_grab_pending() will return -ENOENT as the work item is still being canceled by task A and flush_work() will also immediately return false as the work item is no longer executing. This puts task B in a busy loop possibly preventing task A from executing and clearing the canceling state on the work item leading to a hang. task A task B worker executing work __cancel_work_timer() try_to_grab_pending() set work CANCELING flush_work() block for work completion completion, wakes up A __cancel_work_timer() while (forever) { try_to_grab_pending() -ENOENT as work is being canceled flush_work() false as work is no longer executing } This patch removes the possible hang by updating __cancel_work_timer() to explicitly wait for clearing of CANCELING rather than invoking flush_work() after try_to_grab_pending() fails with -ENOENT. Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com v3: bit_waitqueue() can't be used for work items defined in vmalloc area. Switched to custom wake function which matches the target work item and exclusive wait and wakeup. v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if the target bit waitqueue has wait_bit_queue's on it. Use DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu Vizoso. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rabin Vincent <rabin.vincent@axis.com> Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com> Tested-by: Jesper Nilsson <jesper.nilsson@axis.com> Tested-by: Rabin Vincent <rabin.vincent@axis.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-26 15:00:58 +01:00
Syed Rameez Mustafa	9ce460cacd	sched: Ensure attempting load balance when HMP active balance flags are set find_busiest_group() can end up returning a NULL group due to load based checks even though there are tasks that can be migrated to higher capacity CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible (LBF_EA_ACTIVE_BALANCE). To get best power and performance ensure that load balance does attempt to pull tasks when HMP_ACTIVE_BALANCE flag is set. Since sched boost also falls under the same category club it into the same generic condition. Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-03-20 11:13:24 -07:00
Michael Scott	8dbaea2b3d	PM / QoS: remove duplicate call to pm_qos_update_target In 3.10.y backport patch `1dba303727`, the logic to call pm_qos_update_target was moved to __pm_qos_update_request. However, the original code was left in function pm_qos_update_request. Currently, if pm_qos_update_request is called where new_value != req->node.prio then pm_qos_update_target will be called twice in a row. Once in pm_qos_update_request and then again in the following call to _pm_qos_update_request. Removing the left over code from pm_qos_update_request stops this second call to pm_qos_update_target where the work of removing / re-adding the new_value in the constraints list would be duplicated. Signed-off-by: Michael Scott <michael.scott@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-18 13:22:28 +01:00
John Stultz	72e2d60926	ntp: Fixup adjtimex freq validation on 32-bit systems commit 29183a70b0b828500816bd794b3fe192fce89f73 upstream. Additional validation of adjtimex freq values to avoid potential multiplication overflows were added in commit 5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values) Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more effected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger then the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: George Joseph <george.joseph@fairview5.com> Tested-by: George Joseph <george.joseph@fairview5.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org [ Prettified the changelog and the comments a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-06 14:40:52 -08:00
Jay Lan	ab92b84e8d	kdb: fix incorrect counts in KDB summary command output commit 146755923262037fc4c54abc28c04b1103f3cc51 upstream. The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by: Jay Lan <jlan@sgi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-06 14:40:52 -08:00
Vikram Mulukutla	0d998434ce	tracing: Fix unmapping loop in tracing_mark_write commit 7215853e985a4bef1a6c14e00e89dfec84f1e457 upstream. Commit `6edb2a8a38` introduced an array map_pages that contains the addresses returned by kmap_atomic. However, when unmapping those pages, map_pages[0] is unmapped before map_pages[1], breaking the nesting requirement as specified in the documentation for kmap_atomic/kunmap_atomic. This was caught by the highmem debug code present in kunmap_atomic. Fix the loop to do the unmapping properly. Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.org Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Reported-by: Lime Yang <limey@codeaurora.org> Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-03-06 14:40:50 -08:00
Srivatsa Vaddagiri	cbf9e59b96	sched: Add cgroup-based criteria for upmigration It may be desirable to discourage upmigration of tasks belonging to some cgroups. Add a per-cgroup flag (upmigrate_discourage) that discourages upmigration of tasks of a cgroup. Tasks of the cgroup are allowed to upmigrate only under overcommitted scenario. Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-03-05 15:50:56 -08:00
Joonwoo Park	ed177c3203	sched: actively migrate big tasks on power CPU to idle performance CPU When performance CPU runs idle or newly idle load balancer to pull a task on power efficient CPU, the load balancer always fails and enters idle mode if the big task on the power efficient CPU is running. This is suboptimal when the running task on the power efficient CPU doesn't fit on the power efficient CPU as it's quite possible that the big task will sustain on the power efficient CPU until it's preempted while there is a performance CPU sitting idle. Revise load balancer algorithm to actively migrate big tasks on power efficient CPU to performance CPU when performance CPU runs idle or newly idle load balancer. Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-03-04 11:30:27 -08:00
Linux Build Service Account	a234225810	Merge "sched: avoid running idle_balance() on behalf of wrong CPU"	2015-03-03 18:27:22 -08:00
Linux Build Service Account	ded4c87aba	Merge "sched: Avoid pulling big tasks to the little cluster during load balance"	2015-03-03 18:27:21 -08:00
Joonwoo Park	d577d4a23c	sched: avoid running idle_balance() on behalf of wrong CPU With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of most power efficient idle CPU among the CPUs in its sched domain level under the condition that the substitute idle CPU should be limited to a CPU which has the same capacity with original idle CPU. It is found that at present idle_balance() spans all the CPUs in its sched domain and run idle balancer on behalf of any CPU within the domain which could be all the CPUs in the system which consequently makes idle balancer on a performance CPU always runs on behalf of a power efficient idle CPU. This would cause for idle performance CPUs to fail to pull tasks from power efficient CPUs always when there is only an online performance CPU. Limit search CPUs to cache sharing CPUs with original idle CPU to ensure to run idle balancre on behalf of more power efficient CPU but still has the same capacity with original CPU to fix such issue. Change-Id: I0575290c24f28db011d9353915186e64df7e57fe Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-03-01 08:41:23 -08:00
Syed Rameez Mustafa	70f7aae2e8	sched: Avoid pulling big tasks to the little cluster during load balance When a lower capacity CPU attempts to pull work from a higher capacity CPU, during load balance, it does not distinguish between tasks that will fit or not fit on the destination CPU. This causes suboptimal load balancing decisions whereby big tasks end up on the lower capacity CPUs and little tasks remain on higher capacity CPUs. Avoid this behavior, by first restricting search to only include tasks that fit on the destination CPU. If such a task cannot be found, remove this restriction so that any task can be pulled over to the destination CPU. This behavior is not applicable during sched_boost, however, as none of the tasks will fit on a lower capacity CPU. Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-02-28 16:50:45 -08:00
Syed Rameez Mustafa	7b0a1a7e87	sched: Avoid pulling all tasks from a CPU during load balance When running load balance, the destination CPU checks the number of running tasks on the busiest CPU without holding the busiest CPUs runqueue lock. This opens the load balancer to a race whereby a third CPU running load balance at the same time; having found the same busiest group and queue, may have already pulled one of the waiting tasks from the busiest CPU. Under scenarios where the source CPU is running the idle task and only a single task remains waiting on the busiest runqueue (nr_running = 1), the destination CPU will end up pulling the only enqueued task from that CPU, leaving the destination CPU with nothing left to run. Fix this race, by reconfirming nr_running for the busiest CPU, after its runqueue lock has been obtained. Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-02-28 16:50:37 -08:00
Srivatsa Vaddagiri	8f9ba192b6	sched: Keep track of average nr_big_tasks Extend sched_get_nr_running_avg() API to return average nr_big_tasks, in addition to average nr_running and average nr_io_wait tasks. Also add a new trace point to record values returned by sched_get_nr_running_avg() API. Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-02-26 10:37:26 -08:00
Srivatsa Vaddagiri	3f116af70d	sched: Fix bug in average nr_running and nr_iowait calculation sched_get_nr_running_avg() returns average nr_running and nr_iowait task count since it was last invoked. Fix several bugs in their calculation. * sched_update_nr_prod() needs to consider that nr_running count can change by more than 1 when CFS_BANDWIDTH feature is used * sched_get_nr_running_avg() needs to sum up nr_iowait count across all cpus, rather than just one * sched_get_nr_running_avg() could race with sched_update_nr_prod(), as a result of which it could use curr_time which is behind a cpu's 'last_time' value. That would lead to erroneous calculation of average nr_running or nr_iowait. While at it, fix also a bug in BUG_ON() check in sched_update_nr_prod() function and remove unnecessary nr_running argument to sched_update_nr_prod() function. Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-02-26 10:33:52 -08:00
Olav Haugan	f0d9067045	sched/fair: Respect wake to idle over sync wakeup Sync wakeup currently takes precedence over wake to idle flag. A sync wakeup causes a task to be placed on a non-idle CPU because we expect this CPU to become idle very shortly. However, even though the sync flag is set there is no guarantee that the task will go to sleep right away As a consequence performance suffers. Fix this by preferring an idle CPU over a potential busy cpu when both wake to idle and sync wakeup are set. Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d CRs-fixed: 794424 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2015-02-24 18:35:45 -08:00
bait_dispatcher_monitor_system	1cac96e2e4	Merge "sched: fix rounding error on scaled execution time calculation"	2015-02-19 13:45:45 -08:00
Joonwoo Park	3a037e913e	sched: fix rounding error on scaled execution time calculation It's found that the scaled execution time can be less than its actual time due to rounding errors. The HMP scheduler accumulates scaled execution time of tasks to determine if tasks are in need of up-migration. But the rounding error prevents the HMP scheduler from accumulating 100% load which prevents us from ever reaching an up-migrate of 100%. Fix rounding error by rounding quotient up. CRs-fixed: 759041 Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-02-18 19:54:12 -08:00
Patrick Daly	a5ad036f58	msm: rtb: Fix buffer corruption issue Consider the case of a nentries==8 and 3 cpus. Numbers in parenthesis are the equivalent location in the circular buffer. CPU: Index0: Index1: Index2: Index3: 0 0 3 6 9(1) 1 1 4 7 10(2) 2 2 5 8(0) The current design is only appropriate for the case where nentries % nrcpus == 0. Fix this issue by incrementing the index by (nentries % nrcpus) each time circular buffer wraps around. CPU: Index0: Index1: Index2: 0 0 3 6+2==8(0) 1 1 4 7+2==9(1) 2 2 5 8+2==10(2) Change-Id: I4f96eb4c971cc18357e145dabcf4272e466dcda2 Signed-off-by: Patrick Daly <pdaly@codeaurora.org>	2015-02-18 13:02:11 -08:00
Tejun Heo	ede8177f2e	workqueue: fix subtle pool management issue which can stall whole worker_pool A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role to create more workers and summon the rescuers if creating workers doesn't succeed in timely manner before proceeding to execute work items. This manager role is implemented in manage_workers(), which indicates whether the worker may proceed to work item execution with its return value. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, others should proceed to work item execution. Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions. pending work items && !nr_running && !nr_idle The first and third conditions are protected by pool->lock and thus won't change while holding pool->lock; however, nr_running can change asynchronously as other workers block and resume and while it's likely to be zero, as someone woke this worker up in the first place, some other workers could have become runnable inbetween making it non-zero. If this happens, manage_worker() could return false even with zero nr_idle making the worker, the last idle one, proceed to execute work items. If then all workers of the pool end up blocking on a resource which can only be released by a work item which is pending on that pool, the whole pool can deadlock as there's no one to create more workers or summon the rescuers. This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager, which ensures that the last worker doesn't start executing work items. We can leave the early exit condition alone and just ignore the return value but the only reason it was put there is because the manage_workers() used to perform both creations and destructions of workers and thus the function may be invoked while the pool is trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, the only case this early exit condition is triggered is rare race conditions rendering it pointless. Tested with simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch. Change-Id: If0ca1b86c06ee17b7fbb670b0e70b421a256b2b7 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Eric Sandeen <sandeen@sandeen.net> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net Cc: Dave Chinner <david@fromorbit.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: stable@vger.kernel.org CRs-fixed: 783758 Git-commit: 29187a9eeaf362d8422e62e17a22a6e115277a49 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [joonwoop@codeaurora.org: fixed conflicts in comments and manage_workers() to keep maybe_destroy_workers()] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-02-14 00:42:27 -08:00
Thomas Gleixner	d1217e81bc	hrtimer: Prevent stale expiry time in hrtimer_interrupt() hrtimer_interrupt() has the following subtle issue: hrtimer_interrupt() lock(cpu_base); expires_next = KTIME_MAX; expire_timers(CLOCK_MONOTONIC); expires = get_next_timer(CLOCK_MONOTONIC); if (expires < expires_next) expires_next = expires; expire_timers(CLOCK_REALTIME); unlock(cpu_base); wakeup() hrtimer_start(CLOCK_MONOTONIC, newtimer); lock(cpu_base(); expires = get_next_timer(CLOCK_REALTIME); if (expires < expires_next) expires_next = expires; So because we already evaluated the next expiring timer of CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be earlier than the overall next expiry time in hrtimer_interrupt(). To solve this, remove the caching of the next expiry value from hrtimer_interrupt() and reevaluate all active clock bases for the next expiry value. To avoid another code duplication, create a shared evaluation function and use it for hrtimer_get_next_event(), hrtimer_force_reprogram() and hrtimer_interrupt(). There is another subtlety in this mechanism: While hrtimer_interrupt() is running, we want to avoid to touch the hardware device because we will reprogram it anyway at the end of hrtimer_interrupt(). This works nicely for hrtimers which get rearmed via the HRTIMER_RESTART mechanism, because we drop out when the callback on that CPU is running. But that fails, if a new timer gets enqueued like in the example above. This has another implication: While hrtimer_interrupt() is running we refuse remote enqueueing of timers - see hrtimer_interrupt() and hrtimer_check_target(). hrtimer_interrupt() tries to prevent this by setting cpu_base->expires to KTIME_MAX, but that fails if a new timer gets queued. Prevent both the hardware access and the remote enqueue explicitely. We can loosen the restriction on the remote enqueue now due to reevaluation of the next expiry value, but that needs a seperate patch. Folded in a fix from Vignesh Radhakrishnan. Change-Id: I803322cc29a294eab73fa2046e9f3a2e5f66755e Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vigneshr@codeaurora.org Cc: john.stultz@linaro.org Cc: viresh.kumar@linaro.org Cc: fweisbec@gmail.com Cc: cl@linux.com Cc: stuart.w.hayes@gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Patch-mainline : linux-arm-kernel @ 01/23/15, 03:21 [vigneshr@codeaurora.org : Changes to the file kernel/time/hrtimer.c is made to the file kernel/hrtimer.c] Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>	2015-02-13 04:20:48 -08:00
Lai Jiangshan	677616e3ec	smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread() commit 4bee96860a65c3a62d332edac331b3cf936ba3ad upstream. The following race exists in the smpboot percpu threads management: CPU0 CPU1 cpu_up(2) get_online_cpus(); smpboot_create_threads(2); smpboot_register_percpu_thread(); for_each_online_cpu(); __smpboot_create_thread(); __cpu_up(2); This results in a missing per cpu thread for the newly onlined cpu2 and in a NULL pointer dereference on a consecutive offline of that cpu. Proctect smpboot_register_percpu_thread() with get_online_cpus() to prevent that. [ tglx: Massaged changelog and removed the change in smpboot_unregister_percpu_thread() because that's an optimization and therefor not stable material. ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-11 14:48:17 +08:00
Biswajit Paul	e758417e7c	kernel: Restrict permissions of /proc/iomem. The permissions of /proc/iomem currently are -r--r--r--. Everyone can see its content. As iomem contains information about the physical memory content of the device, restrict the information only to root. Change-Id: If0be35c3fac5274151bea87b738a48e6ec0ae891 CRs-Fixed: 786116 Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org> Signed-off-by: Avijit Kanti Das <avijitnsec@codeaurora.org>	2015-02-09 16:17:30 -08:00
Tejun Heo	85be16bad7	workqueue: fix subtle pool management issue which can stall whole worker_pool commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream. A worker_pool's forward progress is guaranteed by the fact that the last idle worker assumes the manager role to create more workers and summon the rescuers if creating workers doesn't succeed in timely manner before proceeding to execute work items. This manager role is implemented in manage_workers(), which indicates whether the worker may proceed to work item execution with its return value. This is necessary because multiple workers may contend for the manager role, and, if there already is a manager, others should proceed to work item execution. Unfortunately, the function also indicates that the worker may proceed to work item execution if need_to_create_worker() is false at the head of the function. need_to_create_worker() tests the following conditions. pending work items && !nr_running && !nr_idle The first and third conditions are protected by pool->lock and thus won't change while holding pool->lock; however, nr_running can change asynchronously as other workers block and resume and while it's likely to be zero, as someone woke this worker up in the first place, some other workers could have become runnable inbetween making it non-zero. If this happens, manage_worker() could return false even with zero nr_idle making the worker, the last idle one, proceed to execute work items. If then all workers of the pool end up blocking on a resource which can only be released by a work item which is pending on that pool, the whole pool can deadlock as there's no one to create more workers or summon the rescuers. This patch fixes the problem by removing the early exit condition from maybe_create_worker() and making manage_workers() return false iff there's already another manager, which ensures that the last worker doesn't start executing work items. We can leave the early exit condition alone and just ignore the return value but the only reason it was put there is because the manage_workers() used to perform both creations and destructions of workers and thus the function may be invoked while the pool is trying to reduce the number of workers. Now that manage_workers() is called only when more workers are needed, the only case this early exit condition is triggered is rare race conditions rendering it pointless. Tested with simulated workload and modified workqueue code which trigger the pool deadlock reliably without this patch. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Eric Sandeen <sandeen@sandeen.net> Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net Cc: Dave Chinner <david@fromorbit.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-02-05 22:35:40 -08:00
Vignesh Radhakrishnan	e07ac5d19c	kmemleak : Make module scanning optional using config Currently kmemleak scans module memory as provided in the area list. This takes up lot of time with irq's and preemption disabled. Provide a compile time configurable config to enable this functionality. Change-Id: I5117705e7e6726acdf492e7f87c0703bc1f28da0 Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>	2015-02-04 18:42:37 +05:30
Linux Build Service Account	651997c82a	Merge "sched: Remove sched_wake_to_idle for HMP scheduler"	2015-02-03 15:23:53 -08:00
Linux Build Service Account	df92d19819	Merge "sched: Support CFS_BANDWIDTH feature in HMP scheduler"	2015-02-03 15:23:51 -08:00
Linux Build Service Account	54dd18b6a1	Merge "sched: Consolidate hmp stats into their own struct"	2015-02-03 15:23:50 -08:00
Linux Build Service Account	d00c2b9d82	Merge "sched: Add userspace interface to set PF_WAKE_UP_IDLE"	2015-02-03 15:23:50 -08:00
Sasha Levin	2497402c9a	time: adjtimex: Validate the ADJ_FREQUENCY values commit 5e5aeb4367b450a28f447f6d5ab57d8f2ab16a5f upstream. Verify that the frequency value from userspace is valid and makes sense. Unverified values can cause overflows later on. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [jstultz: Fix up bug for negative values and drop redunent cap check] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-29 17:40:56 -08:00
Sasha Levin	7336dcc213	time: settimeofday: Validate the values of tv from user commit 6ada1fc0e1c4775de0e043e1bd3ae9d065491aa5 upstream. An unvalidated user input is multiplied by a constant, which can result in an undefined behaviour for large values. While this is validated later, we should avoid triggering undefined behaviour. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> [jstultz: include trivial milisecond->microsecond correction noticed by Andy] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-29 17:40:56 -08:00
Srivatsa Vaddagiri	930bca74d2	sched: Remove sched_wake_to_idle for HMP scheduler sched_wake_to_idle tunable is obsoleted by sched_prefer_idle tunable in HMP scheduler. Remove the same when CONFIG_SCHED_HMP is defined Change-Id: I7bcf12cc3c50df5ef09261f097711c9f29ec63a4 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-28 14:22:12 +05:30
Srivatsa Vaddagiri	2385d33016	sched: Support CFS_BANDWIDTH feature in HMP scheduler CFS_BANDWIDTH feature is not currently well-supported by HMP scheduler. Issues encountered include a kernel panic when rq->nr_big_tasks count becomes negative. This patch fixes HMP scheduler code to better handle CFS_BANDWIDTH feature. The most prominent change introduced is maintenance of HMP stats (nr_big_tasks, nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in addition to being maintained in each 'struct rq'. This allows HMP stats to be updated easily when a group is throttled on a cpu. Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-28 14:13:19 +05:30
Srivatsa Vaddagiri	bbef4c5e1b	sched: Consolidate hmp stats into their own struct Key hmp stats (nr_big_tasks, nr_small_tasks and cumulative_runnable_average) are currently maintained per-cpu in 'struct rq'. Merge those stats in their own structure (struct hmp_sched_stats) and modify impacted functions to deal with the newly introduced structure. This cleanup is required for a subsequent patch which fixes various issues with use of CFS_BANDWIDTH feature in HMP scheduler. Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-28 14:13:14 +05:30
Linux Build Service Account	3ff6a5a197	Merge "tracing: power: Add trace events for core control"	2015-01-27 06:53:12 -08:00
Srivatsa Vaddagiri	7e767d3e45	sched: Add userspace interface to set PF_WAKE_UP_IDLE sched_prefer_idle flag controls whether tasks can be woken to any available idle cpu. It may be desirable to set sched_prefer_idle to 0 so that most tasks wake up to non-idle cpus under mostly_idle threshold and have specialized tasks override this behavior through other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available idle cpu independent of sched_prefer_idle flag setting. Currently only kernel-space API exists to set PF_WAKE_UP_IDLE flag for a task. This patch adds a user-space API (in /proc filesystem) to set PF_WAKE_UP_IDLE flag for a given task. /proc/[pid]/sched_wake_up_idle file can be written to set or clear PF_WAKE_UP_IDLE flag for a given task. Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-27 19:39:57 +05:30
Junjie Wu	6c68b1215d	tracing: power: Add trace events for core control Add trace events for core control module. Change-Id: I36da5381709f81ef1ba82025cd9cf8610edef3fc Signed-off-by: Junjie Wu <junjiew@codeaurora.org>	2015-01-22 17:31:16 -08:00
Jiri Olsa	d525563b50	perf: Fix events installation during moving group commit 9fc81d87420d0d3fd62d5e5529972c0ad9eab9cc upstream. We allow PMU driver to change the cpu on which the event should be installed to. This happened in patch: `e2d37cd213` ("perf: Allow the PMU driver to choose the CPU on which to install events") This patch also forces all the group members to follow the currently opened events cpu if the group happened to be moved. This and the change of event->cpu in perf_install_in_context() function introduced in: `0cda4c0231` ("perf: Introduce perf_pmu_migrate_context()") forces group members to change their event->cpu, if the currently-opened-event's PMU changed the cpu and there is a group move. Above behaviour causes problem for breakpoint events, which uses event->cpu to touch cpu specific data for breakpoints accounting. By changing event->cpu, some breakpoints slots were wrongly accounted for given cpu. Vinces's perf fuzzer hit this issue and caused following WARN on my setup: WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150() Can't find any breakpoint slot [...] This patch changes the group moving code to keep the event's original cpu. Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Cc: Yan, Zheng <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-16 06:59:03 -08:00
Linux Build Service Account	4c5b2873a2	Merge "sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT"	2015-01-14 17:12:07 -08:00
Linux Build Service Account	a2ada2d4c8	Merge "sched: continue to search less power efficient cpu for load balancer"	2015-01-14 17:12:06 -08:00
Linux Build Service Account	81921f5681	Merge "cpu_pm: Add level to the cluster pm notification"	2015-01-13 12:24:49 -08:00
Joonwoo Park	66bd788705	sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform migration due to EA without checking frequency throttling. This option can give us better debugging and verification capability. Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-01-08 11:22:33 -08:00
Joonwoo Park	2c7cc326ed	sched: continue to search less power efficient cpu for load balancer When choosing a CPU to do power-aware active balance from the load balancer currently selects the first eligible CPU it finds, even if there is another eligible CPU which is higher-power. This can lead to suboptimal load balancing behavior and extra migrations. Power and performance will be impacted. Achieve better power and performance by continuing to search the least power efficient cpu as long as the cpu's load average is higher than or equal to the busiest cpu found by far. CRs-fixed: 777341 Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-01-08 11:22:33 -08:00
Oleg Nesterov	12279351a4	exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream. alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:17 -08:00
Eric W. Biederman	afb5285388	userns: Allow setting gid_maps without privilege when setgroups is disabled commit 66d2f338ee4c449396b6f99f5e75cd18eb6df272 upstream. Now that setgroups can be disabled and not reenabled, setting gid_map without privielge can now be enabled when setgroups is disabled. This restores most of the functionality that was lost when unprivileged setting of gid_map was removed. Applications that use this functionality will need to check to see if they use setgroups or init_groups, and if they don't they can be fixed by simply disabling setgroups before writing to gid_map. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:17 -08:00
Eric W. Biederman	1c587ee50e	userns: Add a knob to disable setgroups on a per user namespace basis commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream. - Expose the knob to user space through a proc file /proc/<pid>/setgroups A value of "deny" means the setgroups system call is disabled in the current processes user namespace and can not be enabled in the future in this user namespace. A value of "allow" means the segtoups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already can not call setgroups so this is a noop. Prodcess with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	6eaab78729	userns: Rename id_map_mutex to userns_state_mutex commit f0d62aec931e4ae3333c797d346dc4f188f454ba upstream. Generalize id_map_mutex so it can be used for more state of a user namespace. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	a1821391e5	userns: Only allow the creator of the userns unprivileged mappings commit f95d7918bd1e724675de4940039f2865e5eec5fe upstream. If you did not create the user namespace and are allowed to write to uid_map or gid_map you should already have the necessary privilege in the parent user namespace to establish any mapping you want so this will not affect userspace in practice. Limiting unprivileged uid mapping establishment to the creator of the user namespace makes it easier to verify all credentials obtained with the uid mapping can be obtained without the uid mapping without privilege. Limiting unprivileged gid mapping establishment (which is temporarily absent) to the creator of the user namespace also ensures that the combination of uid and gid can already be obtained without privilege. This is part of the fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	f028f2d732	userns: Check euid no fsuid when establishing an unprivileged uid mapping commit 80dd00a23784b384ccea049bfb3f259d3f973b9d upstream. setresuid allows the euid to be set to any of uid, euid, suid, and fsuid. Therefor it is safe to allow an unprivileged user to map their euid and use CAP_SETUID privileged with exactly that uid, as no new credentials can be obtained. I can not find a combination of existing system calls that allows setting uid, euid, suid, and fsuid from the fsuid making the previous use of fsuid for allowing unprivileged mappings a bug. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	ba0922adbd	userns: Don't allow unprivileged creation of gid mappings commit be7c6dba2332cef0677fbabb606e279ae76652c3 upstream. As any gid mapping will allow and must allow for backwards compatibility dropping groups don't allow any gid mappings to be established without CAP_SETGID in the parent user namespace. For a small class of applications this change breaks userspace and removes useful functionality. This small class of applications includes tools/testing/selftests/mount/unprivilged-remount-test.c Most of the removed functionality will be added back with the addition of a one way knob to disable setgroups. Once setgroups is disabled setting the gid_map becomes as safe as setting the uid_map. For more common applications that set the uid_map and the gid_map with privilege this change will have no affect. This is part of a fix for CVE-2014-8989. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	fc9b65e3d7	userns: Don't allow setgroups until a gid mapping has been setablished commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream. setgroups is unique in not needing a valid mapping before it can be called, in the case of setgroups(0, NULL) which drops all supplemental groups. The design of the user namespace assumes that CAP_SETGID can not actually be used until a gid mapping is established. Therefore add a helper function to see if the user namespace gid mapping has been established and call that function in the setgroups permission check. This is part of the fix for CVE-2014-8989, being able to drop groups without privilege using user namespaces. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	b8a0441b54	userns: Document what the invariant required for safe unprivileged mappings. commit 0542f17bf2c1f2430d368f44c8fcf2f82ec9e53e upstream. The rule is simple. Don't allow anything that wouldn't be allowed without unprivileged mappings. It was previously overlooked that establishing gid mappings would allow dropping groups and potentially gaining permission to files and directories that had lesser permissions for a specific group than for all other users. This is the rule needed to fix CVE-2014-8989 and prevent any other security issues with new_idmap_permitted. The reason for this rule is that the unix permission model is old and there are programs out there somewhere that take advantage of every little corner of it. So allowing a uid or gid mapping to be established without privielge that would allow anything that would not be allowed without that mapping will result in expectations from some code somewhere being violated. Violated expectations about the behavior of the OS is a long way to say a security issue. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Eric W. Biederman	4accc8c8e2	groups: Consolidate the setgroups permission checks commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream. Today there are 3 instances of setgroups and due to an oversight their permission checking has diverged. Add a common function so that they may all share the same permission checking code. This corrects the current oversight in the current permission checks and adds a helper to avoid this in the future. A user namespace security fix will update this new helper, shortly. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-01-08 09:58:16 -08:00
Murali Nalajala	16323e6eef	cpu_pm: Add level to the cluster pm notification Cluster pm notifications without level information increases difficulty and complexity for the registered drivers to figure out when the last coherency level is going into power collapse. Send notifications with level information that allows the registered drivers to easily determine the cluster level that is going in/out of power collapse. There is an issue with this implementation. GIC driver saves and restores the distributed registers as part of cluster notifications. On newer platforms there are multiple cluster levels are defined (e.g l2, cci etc). These cluster level notofications can happen independently. On MSM platforms GIC is still active while the cluster sleeps in idle, causing the GIC state to be overwritten with an incorrect previous state of the interrupts. This leads to a system hang. Do not save and restore on any L2 and higher cache coherency level sleep entry and exit. Change-Id: I31918d6383f19e80fe3b064cfaf0b55e16b97eb6 Signed-off-by: Archana Sathyakumar <asathyak@codeaurora.org> Signed-off-by: Murali Nalajala <mnalajal@codeaurora.org>	2015-01-07 22:31:58 -08:00
Syed Rameez Mustafa	13e853e988	sched: Update cur_freq for offline CPUs in notifier callback cpufreq governor does not send frequency change notifications for offline CPUs. This means that a hot removed CPU's cur_freq information can get stale if there is a frequency change while that CPU is offline. When the offline CPU is hotplugged back in, all subsequent load calculations are based off the stale information until another frequency change occurs and the corresponding set of notifications are sent out. Avoid this incorrect load tracking by updating the cur_freq for all CPUs in the same frequency domain. Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-01-05 15:36:33 -08:00
Linux Build Service Account	05e6c29de4	Merge "sched: add preference for prev_cpu in HMP task placement"	2014-12-29 17:31:49 -08:00
Linux Build Service Account	380cadc7f3	Merge "sched: Per-cpu prefer_idle flag"	2014-12-29 17:31:47 -08:00
Linux Build Service Account	307e71816e	Merge "sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()"	2014-12-29 17:31:47 -08:00
Olav Haugan	30d383d45b	sched: Fix overflow in max possible capacity calculation The max possible capacity calculation might overflow given large enough max possible frequency and capacity. Fix potential for overflow. Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-26 11:25:32 -08:00
Steve Muckle	cecf6c46cd	sched: add preference for prev_cpu in HMP task placement At present the HMP task placement algorithm scans CPUs in numerical order and if two identical options are found, the first one encountered is chosen, even if it is different from the task's previous CPU. Add a bias towards the task's previous CPU in such situations. Any time two or more CPUs are considered equivalent (load, C-state, power cost), if one of them is the task's previous CPU, bias towards that CPU. The algorithm is otherwise unchanged. CRs-Fixed: 772033 Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-23 15:54:35 -08:00
Linux Build Service Account	5da5828a23	Merge "sched: Add sysctl to enable power aware scheduling"	2014-12-23 13:21:41 -08:00
Linux Build Service Account	4ef567e8bc	Merge "sched: Ensure no active EA migration occurs when EA is disabled"	2014-12-23 05:58:15 -08:00
Srivatsa Vaddagiri	599bfc7503	sched: Per-cpu prefer_idle flag Remove the global sysctl_sched_prefer_idle flag and replace it with a per-cpu prefer_idle flag. The per-cpu flag is expected to same for all cpus in a cluster. It thus provides convenient means to disable packing in one cluster while allowing packing in another cluster. Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-23 09:52:43 +05:30
Srivatsa Vaddagiri	92ba1d55f3	sched: Consider PF_WAKE_UP_IDLE in select_best_cpu() sysctl_sched_prefer_idle controls selection of idle cpus for waking tasks. In some cases, waking to idle cpus help performance while in other cases it hurts (as tasks incur latency associated with C-state wakeup). Its ideal if scheduler can adapt prefer_idle behavior based on the task that is waking up, but that's hard for scheduler to figure by itself. PF_WAKE_UP_IDLE hint can be provided by external module/driver in such case to guide scheduler in preferring an idle cpu for select tasks irrespective of sysctl_sched_prefer_idle flag. This patch enhances select_best_cpu() to consider PF_WAKE_UP_IDLE hint. Wakeup posted from any task that has PF_WAKE_UP_IDLE set is a hint for scheduler to prefer idle cpu for waking tasks. Similarly scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on idle cpu when they wakeup. CRs-Fixed: 773101 Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-23 09:52:27 +05:30
Olav Haugan	7e13b27b8b	sched: Add sysctl to enable power aware scheduling Add sysctl to enable energy awareness at runtime. This is useful for performance/power tuning/measurements and debugging. In addition this will match up with the Documentation/scheduler/sched-hmp.txt documentation. Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-22 14:37:33 -08:00
Olav Haugan	294b88dc67	sched: Ensure no active EA migration occurs when EA is disabled There exists a flag called "sched_enable_power_aware" that is not honored everywhere. Fix this. Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-22 14:23:55 -08:00
Vignesh Radhakrishnan	230faea50d	irq_work: register irq_work_cpu_notify in early init Currently irq_work_cpu_notify is registered using device_initcall(). In cases where CPU is hotplugged early (example would be thermal engine hotplugging CPU), there are chances where irq_work_cpu_notifier has not even registered, but CPU is already hotplugged out. irq_work uses CPU_DYING notifier to clear out the pending irq_work. But since the cpu notifier is not even registered that early, pending irq_work items are never run since this pending list is percpu. One specific scenario where this is impacting the system is, rcu framework using irq_work to wakeup and complete cleanup operations. In this scenario we notice that RCU operations needs cleanup on the hotplugged CPU. Fix this by registering irq_work_cpu_notify in early init. CRs-Fixed: 768180 Change-Id: Ibe7f5c77097de7a342eeb1e8d597fb2f72185ecf Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>	2014-12-22 14:30:12 +05:30
Joonwoo Park	fc994a4b9e	sched: take account of irq preemption when calculating irqload delta If irq raises while sched_irqload() is calculating irqload delta, sched_account_irqtime() can update rq's irqload_ts which can be greater than the jiffies stored in sched_irqload()'s context so delta can be negative. This negative delta means there was recent irq occurence. So remove improper BUG_ON(). CRs-fixed: 771894 Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2014-12-16 16:56:50 -08:00
Joonwoo Park	2cec55a2e2	sched: Prevent race conditions where upmigrate_min_nice changes When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger BUG_ON(rq->nr_big_tasks < 0). This happens when there is a task which was considered as non-big task due to its nice > upmigrate_min_nice and later upmigrate_min_nice is changed to higher value so the task becomes big task. In this case runqueue still has nr_big_tasks = 0 incorrectly with current implementation. Consequently next scheduler tick sees a big task to schedule and try to decrease nr_big_tasks which is already 0. Introduce sched_upmigrate_min_nice which is updated atomically and re-count the number of big and small tasks to fix BUG_ON() triggering. Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2014-12-16 11:05:22 -08:00
Olav Haugan	4beca1fd4d	sched: Avoid frequent task migration due to EA in lb A new tunable exists that allow task migration to be throttled when the scheduler tries to do task migrations due to Energy Awareness (EA). This tunable is only taken into account when migrations occur in the tick path. Extend the usage of the tunable to take into account the load balancer (lb) path also. In addition ensure that the start of task execution on a CPU is updated correctly. If a task is preempted but still runnable on the same CPU the start of execution should not be updated. Only update the start of execution when a task wakes up after sleep or moves to a new CPU. Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-13 06:43:49 -08:00
Olav Haugan	a7bc092692	sched: Avoid migrating tasks to little cores due to EA If during the check whether migration is needed we find that there is a lower power CPU available we commence to find a new CPU for this task. However, by the time we search for a new CPU the lower power CPU might no longer be available. We should abort the attempt to migrate a task in this case. CRs-Fixed: 764788 Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-13 06:43:48 -08:00
Olav Haugan	2c320f2ffa	sched: Add temperature to cpu_load trace point Add the current CPU temperature to the sched_cpu_load trace point. This will allow us to track the CPU temperature. CRs-Fixed: 764788 Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-13 06:43:48 -08:00
Olav Haugan	0c0d18bb15	sched: Only do EA migration when CPU throttling is imminent We do not want to migrate tasks unnecessary to avoid cache hit and other migration latencies that could affect the performance of the system. Add a check to only try EA migration when CPU frequency throttling is imminent. CRs-Fixed: 764788 Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-13 06:34:55 -08:00
Srivatsa Vaddagiri	32c6ac7c62	sched: Avoid frequent migration of running task Power values for cpus can drop quite considerably when it goes idle. As a result, the best choice for running a single task in a cluster can vary quite rapidly. As the task keeps hopping cpus, other cpus go idle and start being seen as more favorable target for running a task, leading to task migrating almost every scheduler tick! Prevent this by keeping track of when a task started running on a cpu and allowing task migration in tick path (migration_needed()) on account of energy efficiency reasons only if the task has run sufficiently long (as determined by sysctl_sched_min_runtime variable). Note that currently sysctl_sched_min_runtime setting is considered only in scheduler_tick()->migration_needed() path and not in idle_balance() path. In other words, a task could be migrated to another cpu which did a idle_balance(). This limitation should not affect high-frequency migrations seen typically (when a single high-demand task runs on high-performance cpu). CRs-Fixed: 756570 Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-13 06:34:55 -08:00
Steve Muckle	1bfb9a0dd3	sched: treat sync waker CPUs with 1 task as idle When a CPU with one task performs a sync wakeup, its one task is expected to sleep immediately so this CPU should be treated as idle for the purposes of CPU selection for the waking task. This is only done when idle CPUs are the preferred targets for non-small task wakeups. When prefer_idle is 0, the CPU is left as non-idle in the selection logic so it is still a preferred candidate for the sync wakeup. Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:58 -08:00
Syed Rameez Mustafa	b3c5c54d72	sched: extend sched_task_load tracepoint to indicate prefer_idle Prefer idle determines whether the scheduler prefers an idle CPU over a busy CPU or not to wake up a task on. Knowing the correct value of this tunable is essential in understanding placement decisions made in select_best_cpu(). Change-Id: I955d7577061abccb65d01f560e1911d9db70298a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-12-10 23:53:57 -08:00
Steve Muckle	84370f934b	sched: extend sched_task_load tracepoint to indicate sync wakeup Sync wakeups provide a hint to the scheduler about upcoming task activity. Knowing which wakeups are sync wakeups from logs will assist in workload analysis. Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:56 -08:00
Steve Muckle	ee9ddb5f3c	sched: add sync wakeup recognition in select_best_cpu If a wakeup is a sync wakeup, we need to discount the currently running task's load from the waker's CPU as we calculate the best CPU for the waking task to land on. Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:55 -08:00
Srivatsa Vaddagiri	6e778f0cdc	sched: Provide knob to prefer mostly_idle over idle cpus sysctl_sched_prefer_idle lets the scheduler bias selection of idle cpus over mostly idle cpus for tasks. This knob could be useful to control balance between power and performance. Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-10 23:53:54 -08:00
Steve Muckle	75d1c94217	sched: make sched_cpu_high_irqload a runtime tunable It may be desirable to be able to alter the scehd_cpu_high_irqload setting easily, so make it a runtime tunable value. Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:53 -08:00
Steve Muckle	00acd0448b	sched: trace: extend sched_cpu_load to print irqload The irqload is used in determining whether CPUs are mostly idle so it is useful to know this value while viewing scheduler traces. Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:51 -08:00
Steve Muckle	51f0d7663b	sched: avoid CPUs with high irq activity CPUs with significant IRQ activity will not be able to serve tasks quickly. Avoid them if possible by disqualifying such CPUs from being recognized as mostly idle. Change-Id: I2c09272a4f259f0283b272455147d288fce11982 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:47 -08:00
Steve Muckle	a14e01109a	sched: refresh sched_clock() after acquiring rq lock in irq path The wallclock time passed to sched_account_irqtime() may be stale after we wait to acquire the runqueue lock. This could cause problems in update_task_ravg because a different CPU may have advanced this CPU's window_start based on a more up-to-date wallclock value, triggering a BUG_ON(window_start > wallclock). Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 19:50:46 -08:00
Steve Muckle	5fdc1d3aaa	sched: track soft/hard irqload per-RQ with decaying avg The scheduler currently ignores irq activity when deciding which CPUs to place tasks on. If a CPU is getting hammered with IRQ activity but has no tasks it will look attractive to the scheduler as it will not be in a low power mode. Track irqload with a decaying average. This quantity can be used in the task placement logic to avoid CPUs which are under high irqload. The decay factor is 3/4. Note that with this algorithm the tracked irqload quantity will be higher than the actual irq time observed in any single window. Some sample outcomes with steady irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is used as a threshold in a subsequent patch): irqload per window load value asymptote # windows to > 10 2ms 8 n/a 3ms 12 7 4ms 16 4 5ms 20 3 Of course irqload will not be constant in each window, these are just given as simple examples. Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 19:50:45 -08:00
Steve Muckle	c5c90f6099	sched: do not set window until sched_clock is fully initialized The system initially uses a jiffy-based sched clock. When the platform registers a new timer for sched_clock, sched_clock can jump backwards. Once sched_clock_postinit() runs it should be safe to rely on it. Also sched_clock_cpu() relies on completion of sched_clock_init() and until that happens sched_clock_cpu() returns zero. This is used in the irq accounting path which window-based stats relies upon. So do not set window_start until sched_clock_cpu() is working. Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 19:50:44 -08:00
Andy Lutomirski	0061b518b1	uprobes, x86: Fix _TIF_UPROBE vs _TIF_NOTIFY_RESUME commit 82975bc6a6df743b9a01810fb32cb65d0ec5d60b upstream. x86 call do_notify_resume on paranoid returns if TIF_UPROBE is set but not on non-paranoid returns. I suspect that this is a mistake and that the code only works because int3 is paranoid. Setting _TIF_NOTIFY_RESUME in the uprobe code was probably a workaround for the x86 bug. With that bug fixed, we can remove _TIF_NOTIFY_RESUME from the uprobes code. Reported-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-12-06 15:05:46 -08:00
Linux Build Service Account	fea8806a70	Merge "sched: Fix inaccurate accounting for real-time task"	2014-12-06 14:38:08 -08:00
Linux Build Service Account	cd2d717655	Merge "Revert "sched: update_rq_clock() must skip ONE update""	2014-12-06 14:38:07 -08:00
Linux Build Service Account	b4229d736e	Merge "sched: Make RT tasks eligible for boost"	2014-12-05 00:05:48 -08:00
Syed Rameez Mustafa	fce95c9a12	sched: Make RT tasks eligible for boost During sched boost RT tasks currently end up going to the lowest power cluster. This can be a performance bottleneck especially if the frequency and IPC differences between clusters are high. Furthermore, when RT tasks go over to the little cluster during boost, the load balancer keeps attempting to pull work over to the big cluster. This results in pre-emption of the executing RT task causing more delays. Finally, containing more work on a single cluster during boost might help save some power if the little cluster can then enter deeper low power modes. Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-12-03 19:50:25 -08:00
Linux Build Service Account	a5f4e12c8d	Merge "sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster"	2014-12-03 16:31:51 -08:00
Linux Build Service Account	6be10b2d68	Merge "sched: Packing support until a frequency threshold"	2014-12-03 16:31:32 -08:00
Linux Build Service Account	0899fc9f81	Merge "irq: smp_affinity: Initialize work struct only once"	2014-12-03 16:31:31 -08:00
Srivatsa Vaddagiri	66b5ce9db0	sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster When higher power (performance) cluster has only one online cpu, we currently let an idle cpu in lower power cluster pull a running task from performance cluster via active balance. Active balance for power-aware reasons is supposed to be restricted to balance within cluster, the check for which is not correctly implemented. Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-02 15:28:14 +05:30
Srivatsa Vaddagiri	8fd5aa3bf2	sched: Fix inaccurate accounting for real-time task It is possible that rq->clock_task was not updated in put_prev_task() in which case we can potentially overcharge a real-time task for time it did not run. This is because clock_task could be stale and not represent the exact time real-time task started running. Fix this by forcing update of rq->clock_task when real-time task starts running. Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-02 15:09:11 +05:30
Srivatsa Vaddagiri	21357f54c1	Revert "sched: update_rq_clock() must skip ONE update" This reverts commit `ab2ff007fe` as it was found to cause some performance regressions Change-Id: Idd71fb04c77f5c9b0bc6bccc66b94ab5a7368471 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-02 14:46:32 +05:30
Srivatsa Vaddagiri	57da62614c	sched: Packing support until a frequency threshold Add another dimension for task packing based on frequency. This patch adds a per-cpu tunable, rq->mostly_idle_freq, which when set will result in tasks being packed on a single cpu in cluster as long as cluster frequency is less than set threshold. Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-02 11:48:30 +05:30
Linux Build Service Account	72de2d6cb8	Merge "sched: update_rq_clock() must skip ONE update"	2014-11-30 16:19:16 -08:00
Linux Build Service Account	b4b0ebc5f9	Merge "sched: tighten up jiffy to sched_clock mapping"	2014-11-29 17:17:43 -08:00
Praveen Chidambaram	ee7ee8aa74	irq: smp_affinity: Initialize work struct only once The work function to handle the irq affinity change is currently being set from setup_affinity, which is called whenever the affinity changes. Initialize the work function only once when the irq desc object's defaults are set up. CRs-Fixed: 756463 Change-Id: I66732f8c01cba166c41ce89c329d313eeaea8a7d Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>	2014-11-25 11:33:31 -07:00
Srivatsa Vaddagiri	ab2ff007fe	sched: update_rq_clock() must skip ONE update Prevent large wakeup latencies from being accounted to the wrong task. Change-Id: Ie9932acb8a733989441ff2dd51c50a2626cfe5c5 Cc: <stable@vger.kernel.org> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> CRs-Fixed: 755576 Patch-mainline: http://permalink.gmane.org/gmane.linux.kernel/1677324 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-11-25 12:28:35 +05:30
Linux Build Service Account	6383a10831	Merge "perf: Add queued work to remove orphaned child events"	2014-11-22 08:32:07 -08:00
Linux Build Service Account	e999189413	Merge "perf: Set owner pointer for kernel events"	2014-11-22 08:32:07 -08:00
Jiri Olsa	6cd67d5a13	perf: Add queued work to remove orphaned child events In cases when the owner task exits before the workload and the workload made some forks, all the events stay in until the last workload process exits. Thats' because each child event holds parent reference. We want to release all children events once the parent is gone, because at that time there's no process to read them anyway, so they're just eating resources. This removal races with process exit, which removes all events and fork, which clone events. To be clear of those two, adding work queue to remove orphaned child for context in case such event is detected. Using delayed work queue (with delay == 1), because we queue this work under perf scheduler callbacks. Normal work queue tries to wake up the queue process, which deadlocks on rq->lock in this place. Also preventing clones from abandoned parent event. Change-Id: I20b674d9b56910828444e29a9c0756daac1b4680 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1406896382-18404-4-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Git-commit: fadfe7be6e50de7f03913833b33c56cd8fb66bac Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [sheetals@codeaurora.org: fixed merge conflicts] Signed-off-by: Sheetal Sahasrabudhe <sheetals@codeaurora.org>	2014-11-21 14:35:38 -05:00
Jiri Olsa	18bebb752b	perf: Set owner pointer for kernel events Adding fake EVENT_OWNER_KERNEL owner pointer value for kernel perf events, so we could distinguish it from user events, which needs special care in following patch. Change-Id: I975186151644af709d7fdfc13f1ce9d2ebd4c83b Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1406896382-18404-3-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Git-commit: f86977620ee4635f26befcf436700493a38ce002 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [sheetals@codeaurora.org: fixed merge conflicts] Signed-off-by: Sheetal Sahasrabudhe <sheetals@codeaurora.org>	2014-11-21 14:35:16 -05:00
Pawel Moll	858879737b	perf: Handle compat ioctl commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream. When running a 32-bit userspace on a 64-bit kernel (eg. i386 application on x86_64 kernel or 32-bit arm userspace on arm64 kernel) some of the perf ioctls must be treated with special care, as they have a pointer size encoded in the command. For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded as 0x80042407, but 64-bit kernel will expect 0x80082407. In result the ioctl will fail returning -ENOTTY. This patch solves the problem by adding code fixing up the size as compat_ioctl file operation. Reported-by: Drew Richardson <drew.richardson@arm.com> Signed-off-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: David Ahern <daahern@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-11-21 09:22:55 -08:00
Miklos Szeredi	bd501a2eb2	audit: keep inode pinned commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream. Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. Fixes: `90b1e7a578` ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-11-21 09:22:52 -08:00
Linux Build Service Account	aa285b577f	Merge "sched: per-cpu mostly_idle threshold"	2014-11-20 15:36:30 -08:00
Linux Build Service Account	62b1d26801	Merge "sched: Add API to set task's initial task load"	2014-11-20 15:36:29 -08:00
Steve Muckle	f17fe85baf	sched: tighten up jiffy to sched_clock mapping The tick code already tracks exact time a tick is expected to arrive. This can be used to eliminate slack in the jiffy to sched_clock mapping that aligns windows between a caller of sched_set_window and the scheduler itself. Change-Id: I9d47466658d01e6857d7457405459436d504a2ca Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-11-19 15:06:33 -08:00
Ram Chandrasekar	ecd44b989f	printk: Make the console flush configurable in hotplug path Make console flush configurable in hot plug code path. The thread which initiates the hot plug can get scheduled out, while trying to acquire the console lock, thus increasing the hot plug latency. This option allows to selectively disable the console flush and in turn reduce the hot plug latency. Change-Id: I0e6cd50a7e2ab14cab987815e47352b6b71f187a Signed-off-by: Ram Chandrasekar <rkumbako@codeaurora.org>	2014-11-18 19:16:25 -07:00
Patrick Daly	5a52a6e15e	printk: Save interrupt flags when enabling __log_oops_buf oops_printk_start() may be called with interrupts disabled; save and restore the interrupt flags properly. Found after a spinlock was acquired recursively in the following backtrace: \raw_spin_lock_irq \run_timer_softirq ...(skipped) \el1_irq () \printk\oops_printk_start \panic\oops_enter \traps\die \fault\__do_kernel_fault.part.5 \fault\do_page_fault \fault\do_translation_fault \fault\do_mem_abort () \get_next_timer_interrupt \tick-sched\__tick_nohz_idle_enter \tick-sched\tick_nohz_idle_enter \idle\cpu_startup_entry \init/main\rest_init \init/main\start_kernel CRs-fixed: 754837 Change-Id: Ib9b5079b0177833d40b14ddf0f0458be5676509a Signed-off-by: Patrick Daly <pdaly@codeaurora.org>	2014-11-17 12:30:52 -08:00
Linux Build Service Account	b3491934ec	Merge "idle: Implement a per-cpu idle-polling mode"	2014-11-15 20:27:53 -08:00
Linux Build Service Account	2492f77873	Merge "idle: exit the cpu_idle_poll loop if cpu_idle_force_poll is cleared"	2014-11-15 20:27:53 -08:00
Linux Build Service Account	0866af874e	Merge "idle: Add a memory barrier after setting cpu_idle_force_poll"	2014-11-15 20:27:52 -08:00
Mathias Krause	f9b6264a0f	posix-timers: Fix stack info leak in timer_create() commit 6891c4509c792209c44ced55a60f13954cb50ef4 upstream. If userland creates a timer without specifying a sigevent info, we'll create one ourself, using a stack local variable. Particularly will we use the timer ID as sival_int. But as sigev_value is a union containing a pointer and an int, that assignment will only partially initialize sigev_value on systems where the size of a pointer is bigger than the size of an int. On such systems we'll copy the uninitialized stack bytes from the timer_create() call to userland when the timer actually fires and we're going to deliver the signal. Initialize sigev_value with 0 to plug the stack info leak. Found in the PaX patch, written by the PaX Team. Fixes: `5a9fa73072` ("posix-timers: kill ->it_sigev_signo and...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Link: http://lkml.kernel.org/r/1412456799-32339-1-git-send-email-minipli@googlemail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2014-11-14 08:48:00 -08:00

... 2 3 4 5 6 ...

16812 Commits