Commit Graph

710 Commits

Author SHA1 Message Date
Linux Build Service Account 72de2d6cb8 Merge "sched: update_rq_clock() must skip ONE update" 2014-11-30 16:19:16 -08:00
Linux Build Service Account b4b0ebc5f9 Merge "sched: tighten up jiffy to sched_clock mapping" 2014-11-29 17:17:43 -08:00
Srivatsa Vaddagiri ab2ff007fe sched: update_rq_clock() must skip ONE update
Prevent large wakeup latencies from being accounted to the wrong task.

Change-Id: Ie9932acb8a733989441ff2dd51c50a2626cfe5c5
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
CRs-Fixed: 755576
Patch-mainline: http://permalink.gmane.org/gmane.linux.kernel/1677324
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-25 12:28:35 +05:30
Linux Build Service Account aa285b577f Merge "sched: per-cpu mostly_idle threshold" 2014-11-20 15:36:30 -08:00
Linux Build Service Account 62b1d26801 Merge "sched: Add API to set task's initial task load" 2014-11-20 15:36:29 -08:00
Steve Muckle f17fe85baf sched: tighten up jiffy to sched_clock mapping
The tick code already tracks the exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy
to sched_clock mapping that aligns windows between a caller
of sched_set_window() and the scheduler itself.

Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-19 15:06:33 -08:00
Syed Rameez Mustafa a40d3ce56e sched: Avoid unnecessary load balance when tasks don't fit on dst_cpu
When considering pulling over a task that does not fit on the
destination CPU, make sure that the busiest group has exceeded its
capacity. While the change is applicable to all groups, the biggest
impact will be on migrating big tasks to little CPUs. This should
only happen when the big cluster is no longer capable of balancing
load within the cluster. This change should have no impact on single
cluster systems.

Change-Id: I6d1ef0e0d878460530f036921ce4a4a9c1e1394b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-13 12:24:31 -08:00
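As a rough illustration of the check described above, here is a minimal C sketch; the struct and function names are made up for illustration and are not the kernel's actual symbols:

#include <stdbool.h>

/* Illustrative stand-in for the busiest group's statistics. */
struct sg_stats {
	unsigned long group_load;     /* aggregate load of the busiest group */
	unsigned long group_capacity; /* capacity of the busiest group */
};

/*
 * A task that does not fit on dst_cpu is only worth pulling when the
 * busiest group is genuinely over capacity, e.g. a big cluster that
 * can no longer balance load within itself.
 */
static bool worth_pulling_misfit_task(const struct sg_stats *busiest,
				      bool task_fits_on_dst)
{
	if (task_fits_on_dst)
		return true;	/* normal balancing rules apply */
	return busiest->group_load > busiest->group_capacity;
}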
Steve Muckle e3d8a00dab sched: print sched_cpu_load tracepoint for all CPUs
When select_best_cpu() is called because a task is on a suboptimal
CPU, certain CPUs are skipped because moving the task there would
not make things any better. For debugging purposes, though, it is
useful to always see the state of all CPUs.

Change-Id: I76965663c1feef5c4cfab9909e477b0dcf67272d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-10 19:22:51 -08:00
Srivatsa Vaddagiri ed7d7749e9 sched: per-cpu mostly_idle threshold
The sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help
pack tasks on cpus to some extent. In some cases, it may be desirable
to have different packing limits for different cpus. For example, pack
to a higher limit on high-performance cpus compared to power-efficient
cpus.

This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior be controlled in a
fine-grained manner.

Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-06 15:27:00 +05:30
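A minimal C sketch of the resulting per-cpu check, assuming illustrative names (the actual per-rq fields may differ):

#include <stdbool.h>

#define NR_CPUS 8

/* Illustrative per-cpu packing limits replacing the old global knobs. */
struct mostly_idle_tunables {
	unsigned int load;    /* mostly_idle load threshold, in percent */
	unsigned int nr_run;  /* max runnable tasks to still allow packing */
};

static struct mostly_idle_tunables tunables[NR_CPUS];

/* A cpu accepts packed tasks only while under its own per-cpu limits. */
static bool cpu_mostly_idle(int cpu, unsigned int load_pct,
			    unsigned int nr_running)
{
	return load_pct <= tunables[cpu].load &&
	       nr_running <= tunables[cpu].nr_run;
}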
Srivatsa Vaddagiri f0e281597c sched: Add API to set task's initial task load
Add a per-task attribute, init_load_pct, that is used to seed the
initial task load of newly created children. This helps important
applications launch their child tasks on cpus with the highest
capacity.

Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-05 14:26:59 +05:30
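A hedged sketch of how such an attribute could feed fork-time load initialization; MAX_TASK_LOAD and the struct layout are assumptions for illustration:

#define MAX_TASK_LOAD 100000UL	/* illustrative full-scale task load */

/* Slimmed-down stand-in for the relevant task fields. */
struct task {
	unsigned int init_load_pct;	/* initial load for children, in percent */
	unsigned long demand;		/* windowed load estimate */
};

/*
 * At fork, seed the child's demand from the parent's init_load_pct so
 * that important applications can have their children start out
 * "heavy" and be placed on high-capacity cpus immediately.
 */
static void init_new_task_load(struct task *child, const struct task *parent)
{
	child->init_load_pct = parent->init_load_pct;	/* inherited */
	child->demand = MAX_TASK_LOAD * parent->init_load_pct / 100;
}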
Syed Rameez Mustafa 297c4ccce8 sched: use C-states in non-small task wakeup placement logic
Currently when a non-small task wakes up, the task placement logic
first tries to find the least loaded CPU before breaking any ties
via the power cost of running the task on those CPUs. When the power
cost is also the same, however, the scheduler just selects the first
CPU it comes across. Use C-states to further break ties when the power
cost is the same for multiple CPUs. The scheduler will now pick a
CPU in the shallowest C-state.

Change-Id: Ie1401b305fa02758a2f7b30cfca1afe64459fc2b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-04 14:11:24 -08:00
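A minimal C sketch of the tie-breaking order described above (load, then power cost, then C-state); all names are illustrative:

/* Illustrative per-cpu state consulted during wakeup placement. */
struct cpu_stat {
	unsigned long load;	/* current load */
	unsigned long power;	/* power cost of running the task here */
	int cstate;		/* 0 = active, higher = deeper sleep */
};

static int pick_wakeup_cpu(const struct cpu_stat cpu[], int nr)
{
	int best = 0;

	for (int i = 1; i < nr; i++) {
		if (cpu[i].load != cpu[best].load) {
			if (cpu[i].load < cpu[best].load)
				best = i;	/* least loaded wins */
		} else if (cpu[i].power != cpu[best].power) {
			if (cpu[i].power < cpu[best].power)
				best = i;	/* cheaper power cost wins */
		} else if (cpu[i].cstate < cpu[best].cstate) {
			best = i;		/* shallowest C-state wins */
		}
	}
	return best;
}

This mirrors the preference for the shallowest C-state stated in the commit, rather than taking the first candidate encountered.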
Linux Build Service Account c44483c313 Merge "sched: Provide an easy method to log context switch latencies" 2014-10-24 22:51:02 -07:00
Linux Build Service Account 0f1e07cdf9 Merge "sched: take rq lock prior to saving idle task's mark_start" 2014-10-24 16:09:25 -07:00
Syed Rameez Mustafa a2006e83ba sched: Provide an easy method to log context switch latencies
Allow logging of various sections of the context switch path in order
to derive the worst-case latencies associated with them. This is
required for scheduler profiling.

Change-Id: I3a5009cb3088cc7ace2cd3130d4d7b24e957bada
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-23 17:38:22 -07:00
Linux Build Service Account 27c06362c4 Merge "sched: update governor notification logic" 2014-10-22 15:58:33 -07:00
Steve Muckle eed96dfa2a sched: take rq lock prior to saving idle task's mark_start
When the idle task is being re-initialized during hotplug, its
mark_start value must be retained. The runqueue lock must be
held when reading this value, though, to serialize with
other CPUs that could update the idle task's window-based
statistics.

CRs-Fixed: 743991
Change-Id: I1bca092d9ebc32a808cea2b9fe890cd24dc868cd
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-10-22 15:12:41 -07:00
Srivatsa Vaddagiri f3386c7cfb sched: update governor notification logic
Make the criteria for notifying the governor per-cpu. The governor is
notified of any large change in a cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.

Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-15 14:57:18 -07:00
Srivatsa Vaddagiri f99927a703 sched: window-stats: Retain idle thread's mark_start
init_idle() is called on a cpu's idle thread once at bootup and
subsequently every time the cpu is hot-added. Since init_idle() calls
__sched_fork(), we end up blowing away the idle thread's
ravg.mark_start value. As a result we will fail to accurately maintain
the cpu's curr/prev_runnable_sum counters. The example below
illustrates such a failure:

CS = curr_runnable_sum, PS = prev_runnable_sum

t0 -> New window starts for CPU2
<after some_task_activity> CS = X, PS = Y
t1 -> <cpu2 is hot-removed. The idle task starts running on cpu2>
      At this time, cpu2_idle_thread.ravg.mark_start = t1

t0+W -> One window elapses. CPU2 is still hot-removed. We
	defer swapping CS and PS until some future task event occurs

t2 -> CPU2 hot-added.  _cpu_up()->idle_thread_get()->init_idle()
	->__sched_fork() results in cpu2_idle_thread.ravg.mark_start = 0

t3 -> Some task wakes on cpu2. Since mark_start = 0, we don't swap CS
	and PS => which is a BUG!

Fix this by retaining the idle task's original mark_start value during
the init_idle() call.

Change-Id: I4ac9bfe3a58fb5da8a6c7bc378c79d9930d17942
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-13 16:11:20 -07:00
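A minimal C sketch of the fix described above: save mark_start around the __sched_fork() wipe (types and helpers here are illustrative stand-ins):

#include <stdint.h>
#include <string.h>

struct ravg { uint64_t mark_start; };
struct task { struct ravg ravg; };

/* Stand-in for __sched_fork(), which zeroes all window statistics. */
static void __sched_fork(struct task *idle)
{
	memset(&idle->ravg, 0, sizeof(idle->ravg));
}

static void init_idle(struct task *idle)
{
	uint64_t mark_start = idle->ravg.mark_start;	/* retain */

	__sched_fork(idle);
	idle->ravg.mark_start = mark_start;		/* restore */
}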
Linux Build Service Account 53d2a04a26 Merge "sched: Stop task migration to busy CPUs due to power active balance" 2014-10-12 08:39:38 -07:00
Olav Haugan 72bbd4b7cb sched: Add checks for frequency change
We need to check for a frequency change when a task is migrated due to
an affinity change and during active balance.

Change-Id: I96676db04d34b5b91edd83431c236a1c28166985
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-10-09 15:37:22 -07:00
Srivatsa Vaddagiri 19b3f3f871 sched: Use absolute scale for notifying governor
Make the tunables used for deciding the need for notification operate
on an absolute scale. The earlier scale (in percent terms relative to
cur_freq) does not work well across the available range of
frequencies. For example, a 100% tunable value would work well for the
lower range of frequencies but not for the higher range. Having the
tunables on an absolute scale makes tuning more realistic.

Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:56 -07:00
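A hedged sketch of a threshold check on an absolute scale; the tunable names and kHz units below are assumptions for illustration:

#include <stdbool.h>

/* Illustrative absolute-scale tunables (kHz), not percent of cur_freq. */
static unsigned int sched_freq_inc_notify_khz = 400000;
static unsigned int sched_freq_dec_notify_khz = 400000;

static bool should_notify_governor(unsigned int cur_khz,
				   unsigned int required_khz)
{
	if (required_khz > cur_khz)
		return required_khz - cur_khz >= sched_freq_inc_notify_khz;
	return cur_khz - required_khz >= sched_freq_dec_notify_khz;
}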
Srivatsa Vaddagiri 2568673dd6 sched: window-stats: Enhance cpu busy time accounting
rq->curr/prev_runnable_sum counters represent cpu demand from various
tasks that have run on a cpu. Any task that runs on a cpu will have a
representation in rq->curr_runnable_sum: its partial_demand value
will be included in rq->curr_runnable_sum. Since partial_demand is
derived from historical load samples for a task, rq->curr_runnable_sum
could represent "inflated/unrealistic" cpu usage. As an example, let's
say that a task with a partial_demand of 10ms runs for only 1ms on a
cpu. What is included in rq->curr_runnable_sum is 10ms (and not the
actual execution time of 1ms). This leads to cpu busy time being
over-reported, causing frequency to stay higher than necessary.

This patch fixes the cpu busy accounting scheme to strictly represent
actual usage. It also provides for conditional fixup of busy time upon
migration and upon heavy-task wakeup.

CRs-Fixed: 691443
Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:51 -07:00
Srivatsa Vaddagiri dababc266f sched: window-stats: ftrace event improvements
Add two new ftrace events:

* trace_sched_freq_alert, to log notifications sent to the governor
  requesting a change in frequency.
* trace_sched_get_busy, to log cpu busy time information returned by
  the scheduler.

Extend existing ftrace events as follows:

* sched_update_task_ravg() event to log the irqtime parameter
* sched_migration_update_sum() to log the thread id being migrated
  (and thus responsible for updates of the curr_runnable_sum and
  prev_runnable_sum counters)

Change-Id: Ia68ce0953a2d21d319a1db7f916c51ff6a91557c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:47:29 -07:00
Srivatsa Vaddagiri 86df733742 sched: improve logic for alerting governor
Currently we send notifications to the governor without taking note
of cpus that are synchronized with regard to their frequency. As a
result, the scheduler could send pointless notifications (notification
spam!).

Avoid this by considering synchronized cpus and alerting the governor
only when the highest demand of any cpu within the cluster far exceeds
or falls behind the current frequency.

Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:46:18 -07:00
Syed Rameez Mustafa 0b013c8593 sched: Stop task migration to busy CPUs due to power active balance
Power active balance should only be invoked when the destination CPU
is calling load balance with either a CPU_IDLE or a CPU_NEWLY_IDLE
environment. We do not want to push tasks towards busy CPUs, even if
they are a more power-efficient place to run that task. Doing so can
cause higher scheduling latencies due to the resulting load imbalance.

Change-Id: I8e0f242338887d189e2fc17acfb63586e7c40839
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-02 17:37:09 -07:00
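The guard this describes reduces to a small predicate; a sketch with illustrative names:

#include <stdbool.h>

enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

/*
 * Power active balance may only push a task when the destination cpu
 * is idle or newly idle; a busy destination is never a valid target,
 * however power efficient it may be.
 */
static bool power_active_balance_allowed(enum cpu_idle_type env_idle)
{
	return env_idle == CPU_IDLE || env_idle == CPU_NEWLY_IDLE;
}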
Srivatsa Vaddagiri 6cb3d32976 sched: window-stats: Fix accounting bug in legacy mode
The TASK_UPDATE event currently does not result in an increment of
rq->curr_runnable_sum in legacy mode, which is wrong. As a result,
cpu busy time reported under legacy mode could be incorrect.

Change-Id: Ifa76c735a0ead23062c1a64faf97e7b801b66bf9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 802a513d90 sched: window-stats: Note legacy mode in fork() and exit()
In legacy mode, mark_task_starting() should avoid adding a (new) task's
(initial) demand to rq->curr_runnable_sum and rq->prev_runnable_sum.
Similarly, exit() should avoid removing an (exiting) task's demand from
rq->curr_runnable_sum and rq->prev_runnable_sum (as those counters
don't include tasks' demand and partial_demand values in legacy mode).

Change-Id: I26820b1ac5885a9d681d363ec53d6866a2ea2e6f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 718293c53c sched: Fix reference to stale task_struct in try_to_wake_up()
try_to_wake_up() currently drops p->pi_lock and later checks for the
need to notify the cpufreq governor on task migrations or wakeups.
However, the woken task could exit between the time p->pi_lock is
released and the time the test for notification is run. As a result,
the test could refer to an exited task, and task_notify_on_migrate(p)
could thus lead to an invalid memory reference.

Fix this by running the test for notification with the task's pi_lock
held.

Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Syed Rameez Mustafa 37c0e84719 sched: Remove hack to enable/disable HMP scheduling extensions
The current method of turning HMP scheduling extensions on or off
based on the number of CPUs is inappropriate, as there may be SoCs with
4 or fewer cores that require the use of these extensions. Remove this
hack, as HMP extensions will now be enabled/disabled via command line
options.

Change-Id: Id44b53c2c3b3c3b83e1911a834e2c824f3958135
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-11 09:18:03 -07:00
Linux Build Service Account 7a62303fd9 Merge "sched: add check for cpu idleness when using C-state information" 2014-09-11 01:52:32 -07:00
Linux Build Service Account e81a3dc7f7 Merge "sched: extend sched_task_load tracepoint to indicate small tasks" 2014-09-11 01:52:31 -07:00
Linux Build Service Account cef2bfadb1 Merge "sched: Add C-state tracking to the sched_cpu_load trace event" 2014-09-09 04:48:06 -07:00
Linux Build Service Account 0dbd5f1b7b Merge "sched: window-stats: add a new AVG policy" 2014-09-09 04:47:32 -07:00
Linux Build Service Account 672d3eb95f Merge "sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks" 2014-09-09 00:57:10 -07:00
Srivatsa Vaddagiri 9e37153f17 sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks
A couple of bugs exist with the incorrect use of cpu_online_mask in
the pre/post_big_small_task() functions, leading to potentially
incorrect computation of load_scale_factor/capacity/nr_big/small_tasks.

pre/post_big_small_task_count_change() use cpu_online_mask in an
unreliable manner. While local_irq_disable() in
pre_big_small_task_count_change() ensures a cpu won't go away in
cpu_online_mask, nothing prevents a cpu from coming online
concurrently. As a result, the cpu_online_mask used in
pre_big_small_task_count_change() can be inconsistent with that used
in post_big_small_task_count_change(), which can lead to an attempt to
unlock an rq->lock that was never taken.

Secondly, when either max_possible_freq or min_max_freq is changing,
it needs to trigger recomputation of load_scale_factor and capacity
for *all* cpus, even if some are offline. Otherwise, an offline cpu
could later come online with an incorrect load_scale_factor/capacity.

While it should be sufficient to scan online cpus when updating their
nr_big/small_tasks in post_big_small_task_count_change(),
unfortunately it is pretty hard to provide a stable cpu_online_mask
when it is called from cpufreq_notifier_policy(). The cpufreq
framework can trigger a CPUFREQ_NOTIFY notification in multiple
contexts, some in cpu-hotplug paths, which makes it pretty hard to
guess whether get_online_cpus() can be taken without causing
deadlocks. To work around the insufficient information we have about
the hotplug-safety of the context in which CPUFREQ_NOTIFY is issued,
have post_big_small_task_count_change() traverse all possible cpus
when updating nr_big/small_task_count.

CRs-Fixed: 717134
Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-08 17:18:24 -07:00
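A minimal sketch of the workaround: walk all possible cpus rather than the unstable online mask (array stand-ins replace the kernel's cpumasks):

#define NR_CPUS 8

/* Illustrative stand-ins for cpumask state and per-cpu counters. */
static int cpu_is_possible[NR_CPUS] = { 1, 1, 1, 1, 1, 1, 1, 1 };
static unsigned int nr_big_tasks[NR_CPUS];

static void post_big_small_task_count_change(void)
{
	/*
	 * Traverse possible, not online, cpus: CPUFREQ_NOTIFY may arrive
	 * from hotplug paths where the online mask is unstable.
	 */
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_is_possible[cpu])
			continue;
		nr_big_tasks[cpu] = 0;	/* recompute from that cpu's runqueue (elided) */
	}
}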
Syed Rameez Mustafa 04953f4035 sched: add check for cpu idleness when using C-state information
Task enqueue on a CPU occurs prior to that CPU exiting an idle state.
For the duration between enqueue and idle exit, the CPU's C-state
information can no longer be relied on for further task placement,
since already enqueued/waiting tasks are not taken into account. The
small task placement algorithm implicitly assumes that a non-zero
C-state implies an idle CPU. Since this assumption is incorrect for
the duration described above, make the cpu_idle() check explicit. This
problem can lead to task packing beyond the mostly_idle threshold.

Change-Id: Idb5be85705d6b15f187d011ea2196e1bfe31dbf2
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 15:25:11 -07:00
Syed Rameez Mustafa 444e5dee14 sched: extend sched_task_load tracepoint to indicate small tasks
While debugging, it is always useful to know whether a task is small
or not, as that determines the scheduling algorithm being used. Have
the sched_task_load tracepoint indicate this information rather than
requiring manual calculations for every task placement.

Change-Id: Ibf390095f05c7da80df1ebfe00f4c5af66c97d12
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 14:40:58 -07:00
Syed Rameez Mustafa e85e73f1d7 sched: Add C-state tracking to the sched_cpu_load trace event
C-state information is used by the scheduler for small task placement
decisions. Track this information in the sched_cpu_load trace event.
Also add the trace event in best_small_task_cpu(). This will help
better understand small task placement decisions.

Change-Id: Ife5f05bba59f85c968fab999bd13b9fb6b1c184e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:29:57 -07:00
Syed Rameez Mustafa bf3e6c0e55 sched: window-stats: add a new AVG policy
The current WINDOW_STATS_AVG policy is actually a misnomer, since it
uses the maximum of the runtime in the recent window and the
average of the past ravg_hist_size windows. Add a policy that uses
only the average and call it the WINDOW_STATS_AVG policy. Rename all
the other policies to make them shorter and unambiguous.

Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:07:41 -07:00
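A sketch of the policy distinction in C; the history layout and policy names below are illustrative, with hist[0] taken as the most recent window:

#define RAVG_HIST_SIZE 5

enum window_stats_policy {
	WINDOW_STATS_RECENT,		/* most recent window only */
	WINDOW_STATS_MAX,		/* max over recent history */
	WINDOW_STATS_MAX_RECENT_AVG,	/* max(recent, avg): the old "AVG" */
	WINDOW_STATS_AVG,		/* the new policy: average only */
};

static unsigned long task_demand(const unsigned long hist[RAVG_HIST_SIZE],
				 enum window_stats_policy policy)
{
	unsigned long max = 0, sum = 0, avg;

	for (int i = 0; i < RAVG_HIST_SIZE; i++) {
		sum += hist[i];
		if (hist[i] > max)
			max = hist[i];
	}
	avg = sum / RAVG_HIST_SIZE;

	switch (policy) {
	case WINDOW_STATS_RECENT:
		return hist[0];
	case WINDOW_STATS_MAX:
		return max;
	case WINDOW_STATS_MAX_RECENT_AVG:
		return avg > hist[0] ? avg : hist[0];
	case WINDOW_STATS_AVG:
		return avg;
	}
	return hist[0];
}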
Linux Build Service Account c5bc590f13 Merge "sched: Fix compile error" 2014-09-07 08:21:53 -07:00
Srivatsa Vaddagiri 594ce07f48 sched: Fix compile error
sched_get_busy(), sched_set_io_is_busy() and sched_set_window() need
to be defined only when CONFIG_SCHED_FREQ_INPUT is defined; otherwise
we get a compilation error due to duplicate definitions of those
routines.

Change-Id: Ifd5c9b6675b78d04c2f7ef0e24efeae70f7ce19b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-04 12:14:38 +05:30
Syed Rameez Mustafa e4600ab9eb sched: update ld_moved for active balance from the load balancer
ld_moved is currently left set to 0 when the load balancer calls upon
active balance. This behavior is incorrect, as it prevents the
termination of load balance for parent sched domains. Currently the
feature is used quite frequently for power active balance and sched
boost. This means that while sched boost is in effect we could run
into a scenario where a more power-efficient, newly idle big CPU first
triggers active migration from a less power-efficient busy big CPU. It
then continues to load balance at the cluster level, causing active
migration of a task running on a little CPU. Consequently the more
power-efficient big CPU ends up with two tasks whereas the less
power-efficient big CPU may become idle. Fix this problem by updating
ld_moved when active migration has been requested.

Change-Id: I52e84eafb77249fd9378ebe531abe2d694178537
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 20:01:44 -07:00
Syed Rameez Mustafa 5f5ecf01d3 sched: actively migrate tasks to idle big CPUs during sched boost
The sched boost feature is currently tick driven, i.e. task placement
decisions only take place at a tick (or wakeup). The load balancer
does not have any knowledge of boost being in effect. Tasks that are
woken up on a little CPU when all big CPUs are busy will continue
executing there at least until the next tick even if one of the big
CPUs becomes idle. Reduce this latency by adding support for detecting
whether boost is in effect or not in the load balancer.  If boost is
in effect any big CPU running idle balance will trigger active
migration from a little CPU with the highest task load.

Change-Id: Ib2828809efa0f9857f5009b29931f63b276a59f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:42:15 -07:00
Syed Rameez Mustafa d3990aabb5 sched: always do idle balance with a NEWLY_IDLE idle environment
With the introduction of energy-aware scheduling, if idle_balance() is
called on behalf of a different CPU which is idle, CPU_IDLE is
used in the environment for load_balance(). This, however, introduces
subtle differences in load calculations and policies in the load
balancer. For example, there are restrictions on which CPU is
permitted to do load balancing during !CPU_NEWLY_IDLE (see
update_sg_lb_stats), and find_busiest_group() uses different criteria
to detect the presence of a busy group. There are other differences as
well. Revert to using the NEWLY_IDLE environment irrespective of
whether idle_balance() is called for the newly idle CPU or on behalf
of an already idle CPU. This will ensure that task movement logic
while doing idle balance remains unaffected.

Change-Id: I388b0ad9a38ca550667895c8ed19628f3d25ce1a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:41 -07:00
Syed Rameez Mustafa 9c37494817 sched: fix bail condition in bail_inter_cluster_balance()
Following commit efcad25cbfb (revert "sched: Influence cpu_power based
on max_freq and efficiency"), all CPUs in the system have the same
cpu_power and consequently the same group capacity. Therefore, the
check in bail_inter_cluster_balance() can no longer be used to
distinguish a higher performance cluster from one with lower
performance. The check is currently broken and always returns true for
every load balancing attempt. Fix this by using runqueue capacity
instead, which can still be used as a good measure of cluster
capabilities.

Also, the logic for distinguishing between idle environments and using
a different sched group capacity in update_sd_pick_busiest() is
redundant. sgs->group_capacity would now always be equal to the number
of CPUs in the group. Use sgs->group_capacity directly in conditional
checks in that function.

Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:40 -07:00
Srivatsa Vaddagiri 1b36dc118d sched: Initialize env->loop variable to 0
The load_balance() function does not explicitly initialize the
env->loop variable to 0. As a result, there is a vague possibility of
move_tasks() hitting a very long (unnecessary) loop when it is unable
to move tasks from src_cpu. This can lead to unpleasant results like a
watchdog bark. Fix this by explicitly initializing the env->loop
variable to 0 (in both load_balance() and
active_load_balance_cpu_stop()).

Change-Id: I36b84c91a9753870fa16ef9c9339db7b706527be
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-25 16:07:57 +05:30
Linux Build Service Account a4e6dcf42b Merge "sched: window-stats: use policy_mutex in sched_set_window()" 2014-08-24 20:01:43 -07:00
Linux Build Service Account 355e55afc6 Merge "sched: window-stats: Avoid taking all cpu's rq->lock for long" 2014-08-24 20:01:43 -07:00
Linux Build Service Account d7bca8f374 Merge "sched: window_stats: Add "disable" mode support" 2014-08-24 20:01:41 -07:00
Linux Build Service Account bf7b729348 Merge "sched: window-stats: Fix exit race" 2014-08-24 20:01:41 -07:00
Linux Build Service Account e621b2a191 Merge "sched: window-stats: code cleanup" 2014-08-24 20:01:40 -07:00
Linux Build Service Account 5b1594145e Merge "sched: window-stats: legacy mode" 2014-08-24 20:01:39 -07:00
Linux Build Service Account 755da8b25c Merge "sched: window-stats: Code cleanup" 2014-08-24 20:01:38 -07:00
Linux Build Service Account 273f377789 Merge "sched: window-stats: Code cleanup" 2014-08-24 20:01:37 -07:00
Linux Build Service Account 0e3780151f Merge "sched: window-stats: Code cleanup" 2014-08-24 20:01:37 -07:00
Linux Build Service Account 7b3f011d4e Merge "sched: window-stats: Remove unused prev_window variable" 2014-08-24 20:01:36 -07:00
Srivatsa Vaddagiri 4f93bebd20 sched: window-stats: use policy_mutex in sched_set_window()
Several configuration variable changes will result in
reset_all_window_stats() being called. All of them, except
sched_set_window(), are serialized via policy_mutex. Take
policy_mutex in sched_set_window() as well to serialize use of the
reset_all_window_stats() function.

Change-Id: Iada7ff8ac85caa1517e2adcf6394c5b050e3968a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:45:17 -07:00
Srivatsa Vaddagiri e4ff6c07c5 sched: window-stats: Avoid taking all cpu's rq->lock for long
reset_all_window_stats() walks the task list with all cpus' rq->lock
held, which can cause spinlock timeouts if the task list is huge (and
hence lead to a spinlock bug report). Avoid this by walking the task
list without the cpus' rq->lock held.

Change-Id: Id09afd8b730fa32c76cd3bff5da7c0cd7aeb8dfb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:45:16 -07:00
Srivatsa Vaddagiri b432b691fa sched: window_stats: Add "disable" mode support
"disabled" mode (sched_disable_window_stats = 1) disables all
window-stats related activity. This is useful when changing key
configuration variables associated with the window-stats feature (like
the policy or window size).

Change-Id: I9e55c9eb7f7e3b1b646079c3aa338db6259a9cfe
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:45:15 -07:00
Srivatsa Vaddagiri e81a1d6ece sched: window-stats: Fix exit race
Exiting tasks are removed from the tasklist and hence at some point
will become invisible to the do_each_thread/for_each_thread task
iterators. This breaks the functionality of reset_all_windows_stats(),
which *has* to reset stats for *all* tasks.

This patch causes exiting tasks' stats to be reset *before* they are
removed from the tasklist. The DONT_ACCOUNT bit in an exiting task's
ravg.flags is also marked so that their remaining execution time is
not accounted in cpu busy time counters (rq->curr/prev_runnable_sum).
reset_all_windows_stats() is thus guaranteed to return with all tasks'
stats reset to 0.

Change-Id: I5f101156a4f958c1b3f31eb0db8cd06e621b75e9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:45:11 -07:00
Srivatsa Vaddagiri f90ea88fa6 sched: window-stats: code cleanup
Provide a wrapper function to reset a task's window statistics. This
will be reused by a subsequent patch.

Change-Id: Ied7d32325854088c91285d8fee55d5a5e8a954b3
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:50 -07:00
Srivatsa Vaddagiri 85ed6be992 sched: window-stats: legacy mode
Support legacy mode, which results in the governor seeing busy time
that is close to what it would have seen via the existing APIs, i.e.
get_cpu_idle_time_us(), get_cpu_iowait_time_us() and
get_cpu_idle_time_jiffy(). In particular, legacy mode means that only
task execution time is counted in rq->curr_runnable_sum and
rq->prev_runnable_sum. Also, task migration does not result in
adjustment of those counters.

Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:14 -07:00
Srivatsa Vaddagiri da60007442 sched: window-stats: Code cleanup
Collapse duplicated comments about keeping a few of the sysctl knobs
initialized to the same value as their non-sysctl copies.

Change-Id: Idc8261d86b9f36e5f2f2ab845213bae268ae9028
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:13 -07:00
Srivatsa Vaddagiri dafe791457 sched: window-stats: Code cleanup
Remove code duplication associated with updates of various
window-stats related sysctl tunables.

Change-Id: I64e29ac065172464ba371a03758937999c42a71f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:12 -07:00
Srivatsa Vaddagiri 25d5c94d24 sched: window-stats: Code cleanup
add_task_demand() and the 'long_sleep' calculation in it are not
strictly required. rq_freq_margin() checks for the need to change
frequency, which removes the need for the long_sleep calculation. Once
that is removed, the need for add_task_demand() vanishes.

Change-Id: I936540c06072eb8238fc18754aba88789ee3c9f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:12 -07:00
Srivatsa Vaddagiri e1ea811d7a sched: window-stats: Remove unused prev_window variable
Remove unused prev_window variable in 'struct ravg'

Change-Id: I22ec040bae6fa5810f9f8771aa1cb873a2183746
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-22 14:43:11 -07:00
Ian Maund 6440f462f9 Merge upstream tag 'v3.10.49' into msm-3.10
* commit 'v3.10.49': (529 commits)
  Linux 3.10.49
  ACPI / battery: Retry to get battery information if failed during probing
  x86, ioremap: Speed up check for RAM pages
  Score: Modify the Makefile of Score, remove -mlong-calls for compiling
  Score: The commit is for compiling successfully.
  Score: Implement the function csum_ipv6_magic
  score: normalize global variables exported by vmlinux.lds
  rtmutex: Plug slow unlock race
  rtmutex: Handle deadlock detection smarter
  rtmutex: Detect changes in the pi lock chain
  rtmutex: Fix deadlock detector for real
  ring-buffer: Check if buffer exists before polling
  drm/radeon: stop poisoning the GART TLB
  drm/radeon: fix typo in golden register setup on evergreen
  ext4: disable synchronous transaction batching if max_batch_time==0
  ext4: clarify error count warning messages
  ext4: fix unjournalled bg descriptor while initializing inode bitmap
  dm io: fix a race condition in the wake up code for sync_io
  Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code
  clk: spear3xx: Use proper control register offset
  ...

In addition to bringing in upstream commits, this merge also makes minor
changes to maintain compatibility with upstream:

The definition of list_next_entry in qcrypto.c and ipa_dp.c has been
removed, as upstream has moved the definition to list.h. The implementation
of list_next_entry was identical between the two.

irq.c, for both the arm and arm64 architectures, has had its calls to
__irq_set_affinity_locked updated to reflect changes to the API upstream.

Finally, as we have removed the sleep_length member variable of the
tick_sched struct, all changes made by upstream commit ec804bd do not
apply to our tree and have been removed from this merge. Only
kernel/time/tick-sched.c is impacted.

Change-Id: I63b7e0c1354812921c94804e1f3b33d1ad6ee3f1
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-08-20 13:23:09 -07:00
Peter Zijlstra c5ac12693f arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 4e857c58efeb99393cba5a5d0d8ec7117183137c
[joonwoop@codeaurora.org: fixed trivial merge conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-08-15 11:45:28 -07:00
Peter Zijlstra 7d9e69c77f arch: Prepare for smp_mb__{before,after}_atomic()
Since the smp_mb__{before,after}*() ops are fundamentally dependent on
how an arch can implement atomics it doesn't make sense to have 3
variants of them. They must all be the same.

Furthermore, the 3 variants suggest they're only valid for those 3
atomic ops, while we have many more where they could be applied.

So move away from
smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}() and reduce the
interface to just the two: smp_mb__{before,after}_atomic().

This patch prepares the way by introducing default implementations in
asm-generic/barrier.h that default to a full barrier and providing
__deprecated inlines for the previous 6 barriers if they're not
provided by the arch.

This should allow for a mostly painless transition (lots of deprecated
warns in the interim).

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-wr59327qdyi9mbzn6x937s4e@git.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Chen, Gong" <gong.chen@linux.intel.com>
Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: febdbfe8a91ce0d11939d4940b592eb0dba8d663
[joonwoop@codeaurora.org: fixed trivial merge conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-08-15 11:45:27 -07:00
Steve Muckle 0a0adbb0b1 sched: disable frequency notifications by default
The frequency notifications from the scheduler do not currently respect
synchronous topologies. If demand on CPU 0 is driving frequency high and
CPU 1 is in the same frequency domain, and demand on CPU 1 is low,
frequency notifiers will be continuously sent by CPU 1 in an attempt to
have its frequency lowered.

Until the notifiers are fixed, disable them by default. They can still
be re-enabled at runtime.

Change-Id: Ic8a927af2236d8fe83b4f4a633b20a8ddcfba359
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-08-12 11:15:34 -07:00
Steve Muckle 7ebd479ae3 sched: fix misalignment between requested and actual windows
When set_window_start() is first executed, sched_clock() has not yet
stabilized. Refresh the sched_init_jiffy and sched_clock_at_init_jiffy
values until it is known that sched_clock() has stabilized - this will
be the case by the time a client calls the sched_set_window() API.

Change-Id: Icd057707ff44c3b240e5e7e96891b23c95733daa
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-08-12 11:15:33 -07:00
Syed Rameez Mustafa efcad24cbf Revert "sched: Influence cpu_power based on max_freq and efficiency"
This reverts commit 0951ec0ff1 ("sched:
Influence cpu_power based on max_freq and efficiency") so that all
cpus are seen to have the same 'cpu_power' from a load balance
perspective. Without this revert, some cpus will be seen to have more
'cpu_power' than others, causing tasks to incur wait time despite the
availability of idle cpus. This happens because a cpu with low
'cpu_power' can fail to see an imbalance with another cpu having
higher 'cpu_power' and thus can go idle without pulling any work.

Change-Id: Iccb34319c527d5b45f29c2d12d2ebc7acdd9d07e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-08-12 11:15:33 -07:00
Olav Haugan df91ad278c sched: Make RAVG_HIST_SIZE tunable
Make RAVG_HIST_SIZE available from /proc/sys/kernel/sched_ravg_hist_size
to allow tuning of the size of the history used in the computation of
task demand.

CRs-fixed: 706138
Change-Id: Id54c1e4b6e974a62d787070a0af1b4e8ce3b4be6
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-08-12 11:15:20 -07:00
Srivatsa Vaddagiri 7b76c244c2 sched: Fix possibility of "stuck" reserved flag
check_for_migration() could mark a thread for migration (in
rq->push_task) and invoke active_load_balance_cpu_stop(). However,
that thread could get migrated to another cpu by the time
active_load_balance_cpu_stop() runs, which could then fail to clear
the reserved flag for a cpu and drop the task_struct reference when
the cpu has only one task (the stopper thread running
active_load_balance_cpu_stop()). This would leave the cpu's reserved
bit stuck, which prevents it from being used effectively.

Fix this by having active_load_balance_cpu_stop() always drop the
reserved bit.

Change-Id: I2464a46b4ddb52376a95518bcc95dd9768e891f9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:31 -07:00
Srivatsa Vaddagiri 0699a566d3 sched: initialize env->flags variable to 0
The env->flags and env->new_dst_cpu fields are not initialized in the
load_balance() function. As a result, load_balance() could wrongly see
the LBF_SOME_PINNED flag set and access a (bogus) new_dst_cpu's
runqueue, leading to an invalid memory reference. Fix this by
initializing the env->flags field to 0. While we are at it, fix a
similar issue in the active_load_balance_cpu_stop() function, although
with an uninitialized env->flags variable there is currently no harm
in that function.

Change-Id: Ied470b0abd65bf2ecfa33fa991ba554a5393f649
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:31 -07:00
Srivatsa Vaddagiri 098d8371ad sched: window-stats: 64-bit type for curr/prev_runnable_sum
Expand rq->curr_runnable_sum and rq->prev_runnable_sum to be 64-bit
counters as otherwise they can easily overflow when a cpu has many
tasks.

Change-Id: I68ab2658ac6a3174ddb395888ecd6bf70ca70473
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:31 -07:00
Srivatsa Vaddagiri 5e8f14fbbc sched: window-stats: Allow acct_wait_time to be tuned
Add a sysctl interface to tune the sched_acct_wait_time variable at runtime.

Change-Id: I38339cdb388a507019e429709a7c28e80b5b3585
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:30 -07:00
Srivatsa Vaddagiri 4da7e167b3 sched: window-stats: Account interrupt handling time as busy time
Account cycles spent by an idle cpu handling interrupts (irq or
softirq) towards its busy time.

Change-Id: I84cc084ced67502e1cfa7037594f29ed2305b2b1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:30 -07:00
Srivatsa Vaddagiri a3c1ecd80a sched: window-stats: Account idle time as busy time
Provide a knob to consider idle time as busy time when a cpu becomes
idle as a result of an io_schedule() call. This will let the governor
parameter 'io_is_busy' be appropriately honored.

Change-Id: Id9fb4fe448e8e4909696aa8a3be5a165ad7529d3
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:30 -07:00
Srivatsa Vaddagiri c55cc8b64a sched: window-stats: Account wait time
Extend the window-based task load accounting mechanism to include
wait time as part of task demand. A subsequent patch will make this
feature configurable at runtime.

Change-Id: I8e79337c30a19921d5c5527a79ac0133b385f8a9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:29 -07:00
Srivatsa Vaddagiri dca58d0666 sched: window-stats: update task demand on tick
A task can execute on a cpu for a long time without being preempted
or migrated. In such a case, its demand can become outdated for a long
time. Prevent that from happening by updating the demand of the
currently running task during the scheduler tick.

Change-Id: I321917b4590635c0a612560e3a1baf1e6921e792
CRs-Fixed: 698662
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:29 -07:00
Srivatsa Vaddagiri fa15bd9937 sched: Fix herding issue
check_for_migration() could run concurrently on multiple cpus,
resulting in multiple tasks wanting to migrate to the same cpu. This
could cause cpus to be underutilized and lead to increased scheduling
latencies for tasks. Fix this by serializing select_best_cpu() calls
from cpus running the check_for_migration() check and marking selected
cpus as reserved, so that subsequent calls to select_best_cpu() from
check_for_migration() will skip reserved cpus.

Change-Id: I73a22cacab32dee3c14267a98b700f572aa3900c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:29 -07:00
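A minimal sketch of the reservation scheme described above; the bitmap stands in for the kernel's per-rq reserved flag, and all names are illustrative:

#include <stdbool.h>

#define NR_CPUS 8

static bool cpu_reserved[NR_CPUS];	/* illustrative reservation flags */

/*
 * A caller that picks a migration target marks it reserved, so
 * concurrent callers skip it instead of herding tasks onto one cpu.
 * The flag is cleared once the migration completes (or is dropped).
 */
static int select_best_cpu_sketch(const unsigned long load[NR_CPUS])
{
	int best = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu_reserved[cpu])
			continue;	/* another cpu already targets it */
		if (best < 0 || load[cpu] < load[best])
			best = cpu;
	}
	if (best >= 0)
		cpu_reserved[best] = true;
	return best;
}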
Srivatsa Vaddagiri 02cc889604 sched: window-stats: print window size in /proc/sched_debug
Printing the window size in /proc/sched_debug provides useful
information for debugging scheduler issues.

Change-Id: Ia12ab2cb544f41a61c8a1d87bf821b85a19e09fd
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:29 -07:00
Srivatsa Vaddagiri 1f12e6698c sched: Extend ftrace event to record boost and reason code
Add a new ftrace event to record changes to the boost setting. Also
extend the sched_task_load() ftrace event to record the boost setting
and the reason code passed to select_best_cpu(). This will be useful
for debugging purposes.

Change-Id: Idac72f86d954472abe9f88a8db184343b7730287
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:28 -07:00
Srivatsa Vaddagiri 232b0fe6f4 sched: Avoid needless migration
Restrict check_for_migration() to operate on fair_sched class tasks
only.

Also, check_for_migration() can result in a call to select_best_cpu()
to look for a better cpu for the task currently running on a cpu.
However, select_best_cpu() can end up suggesting a cpu that is not
necessarily better than the cpu the task is currently running on. This
would result in unnecessary migration. Prevent that from happening.

Change-Id: I391cdda0d7285671d5f79aa2da12eaaa6cae42d7
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-12 10:51:28 -07:00
John Stultz 3984bb13c8 printk: rename printk_sched to printk_deferred
commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream.

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07 14:30:26 -07:00
Srivatsa Vaddagiri c6d6e960df sched: Drop active balance request upon cpu going offline
A cpu could mark its currently running task to be migrated to another
cpu (via rq->push_task/rq->push_cpu) and could go offline before
active load balance handles the request. In such a case, clear the
active load balance request.

Change-Id: Ia3e668e34edbeb91d8559c1abb4cbffa25b1830b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:59 +05:30
Srivatsa Vaddagiri ea5020bcd2 sched: trigger immediate migration of tasks upon boost
Currently, turning on boost does not immediately trigger migration of
tasks from lower capacity cpus. Tasks could incur a migration latency
of up to one timer tick (when check_for_migration() is run).

Fix this by triggering a migration check on cpus with lower capacity
as soon as boost is turned on for the first time.

Change-Id: I244649f9cb6608862d87631325967b887b7f4b7e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:59 +05:30
Srivatsa Vaddagiri 6cd6b83b50 sched: Extend boost benefit for small and low-prio tasks
Allow small and low-priority tasks to benefit from boost, which is
expected to last for a short duration. Any task that wishes to run
during that short period is allowed the boost benefit.

Change-Id: I02979a0c5feeba0f1256b7ee3d73f6b283fcfafa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:59 +05:30
Srivatsa Vaddagiri a57fe9b6df sched: window-stats: Handle policy change properly
sched_window_stat_policy influences task demand and thus various
statistics maintained per-cpu, like curr_runnable_sum. Changing the
policy non-atomically would lead to improper accounting. For example,
when a task is enqueued on a cpu's runqueue, the demand added to
rq->cumulative_runnable_avg could be based on the AVG policy, and when
it is dequeued the demand removed could be based on MAX, leading to
erroneous accounting.

This change makes the policy change "atomic", i.e. all cpus' rq->lock
are held and all tasks' window stats are reset before the policy is
changed.

Change-Id: I6a3e4fb7bc299dfc5c367693b5717a1ef518c32d
CRs-Fixed: 687409
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:59 +05:30
Srivatsa Vaddagiri 7b5f42a8e1 sched: window-stats: Reset all window stats
Currently, a few of the window statistics for tasks are not reset when
the window size is changing. Fix this to completely reset all window
statistics for tasks and cpus. Move the reset code to a function,
which can be reused by a subsequent patch that resets the same
statistics upon policy change.

Change-Id: Ic626260245b89007c4d70b9a07ebd577e217f283
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:58 +05:30
Srivatsa Vaddagiri 578851abaa sched: window-stats: Additional error checking in sched_set_window()
Check for an invalid window size passed as an argument to
sched_set_window(). Also move up the local_irq_disable() call to avoid
the thread being preempted during the calculation of window_start and
its comparison against sched_clock(). Use the right macro to evaluate
whether the window_start argument is ahead in time or not.

Change-Id: Idc0d3ab17ede08471ae63b72a2d55e7f84868fd6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-06 15:36:58 +05:30
Srivatsa Vaddagiri f2a21ce199 sched: window-stats: Fix incorrect calculation of partial_demand
When using MAX_POLICY, partial_demand is calculated incorrectly as 0.
Fix this by picking the maximum of the previous 4 windows and the most
recent sample.

Change-Id: I27850a510746a63b5382c84761920fc021b876c5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-28 12:51:07 -07:00
Srivatsa Vaddagiri 5c351b809f sched: window-stats: Fix potential wrong use of rq
The 'rq' reference to the cpu where a waking task last ran can be
potentially incorrect, leading to incorrect accounting. This happens
when task_cpu() changes between points A & B in try_to_wake_up(),
listed below:

try_to_wake_up()
{

cpu = src_cpu = task_cpu(p);
rq = cpu_rq(src_cpu);		-> Point A

..

while (p->on_cpu)
	cpu_relax();

smp_rmb();

raw_spin_lock(&rq->lock);	-> Point B

Fix this by initializing the 'rq' variable after the task has slept
(its on_cpu field becomes 0).

Also avoid adding task demand to its old cpu's runqueue
(prev_runnable_sum) in case it has gone offline.

Change-Id: I9e5d3beeca01796d944137b5416805b983a6e06e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-28 12:51:07 -07:00
Mateusz Guzik 4aba6e3634 sched: Fix possible divide by zero in avg_atom() calculation
commit b0ab99e7736af88b8ac1b7ae50ea287fffa2badc upstream.

proc_sched_show_task() does:

  if (nr_switches)
	do_div(avg_atom, nr_switches);

nr_switches is unsigned long and do_div truncates it to 32 bits, which
means it can test non-zero on e.g. x86-64 and be truncated to zero for
division.

Fix the problem by using div64_ul() instead.

As a side effect calculations of avg_atom for big nr_switches are now correct.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-28 08:00:07 -07:00
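A small userspace demo of the truncation at issue (do_div() takes a 32-bit divisor, so a value that tests non-zero as 64 bits can divide as zero):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t nr_switches = 1ULL << 32;		/* non-zero as 64-bit */
	uint32_t truncated = (uint32_t)nr_switches;	/* zero as 32-bit */

	/* div64_ul() avoids this by keeping the full 64-bit divisor. */
	printf("nr_switches=%llu truncated=%u\n",
	       (unsigned long long)nr_switches, truncated);
	return 0;
}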
Linux Build Service Account 0c445a72ec Merge "sched: set initial task load to just above a small task" 2014-07-26 12:41:28 -07:00
Steve Muckle 1eaec37bfd sched: set initial task load to just above a small task
To maximize power savings, set the initial load of newly created
tasks to just above a small task. Setting it below the small
task threshold would cause new tasks to be packed, which is
very likely too aggressive.

Change-Id: Idace26cc0252e31a5472c73534d2f5277a1e3fa4
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-25 10:55:56 -07:00
Olav Haugan 47c59a6b72 sched/fair: Check whether any CPUs are available
There is a possibility that there are no allowed CPUs online when we try
to select the best cpu for a small task. Add a check to ensure we don't
continue if there are no CPUs available.

CRs-fixed: 692505
Change-Id: Iff955fb0d0b07e758a893539f7bc8ea8aa09d9c4
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-07-25 08:29:24 -07:00
Steve Muckle 1aa9b6992a sched: fixes for compilation without CONFIG_SCHED_HMP
These fixes are necessary to compile without CONFIG_SCHED_HMP
enabled.

Change-Id: Iabbde3c22a81288242ed3a44fdfdb2a16db8b072
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-22 16:08:02 -07:00
Steve Muckle 483fa0ade3 sched: enable hmp, power aware scheduling for targets with > 4 CPUs
Enabling and disabling HMP/power-aware scheduling is meant to be done
via kernel command line options. Until that is fully supported,
however, take advantage of the fact that current targets with more
than 4 CPUs will need these features.

Change-Id: I4916805881d58eeb54747e4b972816ffc96caae7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-22 16:08:01 -07:00
Srivatsa Vaddagiri fefafa08b7 sched: remove sysctl control for HMP and power-aware task placement
There is no real need to control HMP and power-aware task placement at
runtime after the kernel has booted. Boot-time control should be
sufficient. Not allowing runtime (sysctl) support simplifies the
code quite a bit.

Also rename sysctl_sched_enable_hmp_task_placement to be shorter.

Change-Id: I60cae51a173c6f73b79cbf90c50ddd41a27604aa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 16:07:58 -07:00
Srivatsa Vaddagiri 87df0beb43 sched: support legacy mode better
It should be possible to bypass all HMP scheduler changes at runtime
by setting sysctl_sched_enable_hmp_task_placement and
sysctl_sched_enable_power_aware to 0.  Fix various code paths to honor
this requirement.

Change-Id: I74254e68582b3f9f1b84661baf7dae14f981c025
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 16:07:05 -07:00
Srivatsa Vaddagiri 1ef9206d1c sched: code cleanup
Avoid the long if() block of code in set_task_cpu(). Move that code to
its own function

Change-Id: Ia80a99867ff9c23a614635e366777759abaccee4
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 16:05:48 -07:00
Srivatsa Vaddagiri cf0d1f54be sched: Add BUG_ON when task_cpu() is incorrect
It would be fatal if task_cpu() information for a task did not
accurately represent the cpu on which it is running. All sorts of
weird issues can arise if that were to happen! Add a BUG_ON() in the
context switch path to detect such cases.

Change-Id: I4eb2c96c850e2247e22f773bbb6eedb8ccafa49c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:05 -07:00
Srivatsa Vaddagiri cea49887c4 sched: avoid active migration of tasks not in TASK_RUNNING state
Avoid wasting effort in migrating tasks that are about to sleep.

Change-Id: Icf9520b1c8fa48d3e071cb9fa1c5526b3b36ff16
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:05 -07:00
Srivatsa Vaddagiri 955b16f3ce sched: fix up task load during migration
Fix the hack that sets a task's on_rq to 0 during task migration. The
task's load is temporarily added back to its runqueue so that
update_task_ravg() can fix up the task's load when its demand is
changing. The task's load is removed immediately afterwards.

Temporarily setting p->on_rq to 0 introduces a race condition with
try_to_wake_up(). Another task (task A) may be attempting to wake
up the migrating task (task B). As long as task A sees task B's
p->on_rq as 1, the wake up will not continue. Changing p->on_rq to
0, then back to 1, allows task A to continue "waking" task B, at
which point we have both try_to_wake_up and the migration code
attempting to set the cpu of task B at the same time.

CRs-Fixed: 695071
Change-Id: I525745f144da4ffeba1d539890b4d46720ec3ef1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:05 -07:00
Prasad Sodagudi 6075c8a6f7 sched: avoid pushing tasks to an offline CPU
Currently active_load_balance_cpu_stop() is run by the cpu stopper and
pushes running tasks off the busiest CPU onto an idle target CPU. But
there is no check of whether the target cpu is offline before pushing
the tasks. With the introduction of active migration in the scheduler
tick path (see check_for_migration()) there have been instances of
attempts to migrate tasks to offline CPUs.

Add a check of whether the target cpu is online to prevent scheduling
on offline CPUs.

Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2014-07-22 14:23:04 -07:00
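The added check amounts to a one-line guard; a sketch with a stubbed-out cpu_online():

#include <stdbool.h>

/* Stub standing in for the kernel's cpu_online(cpu) test. */
static bool cpu_online(int cpu)
{
	return cpu >= 0 && cpu < 8;	/* illustrative */
}

/* Bail out before pushing tasks if the target cpu went offline. */
static bool can_push_to(int target_cpu)
{
	return cpu_online(target_cpu);
}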
Syed Rameez Mustafa 8f7e5b8ee8 sched: Add a per rq max_possible_capacity for use in power calculations
In the absence of a power driver providing real power values, the scheduler
currently defaults to using the capacity of a CPU as a measure of power.
This, however, is not a good measure since the capacity of a CPU can change
due to thermal conditions and/or other hardware restrictions. These
frequency restrictions have no effect on the power efficiency of those
CPUs. Introduce the max possible capacity of a CPU to track an absolute
measure of capacity, which translates into a good absolute measure of power
efficiency. Max possible capacity takes the max possible frequency of CPUs
into account instead of the max frequency.

Change-Id: Ia970b853e43a90eb8cc6fd990b5c47fca7e50db8
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:04 -07:00
Syed Rameez Mustafa 16fa06671f sched: Disable interrupts when holding the rq lock in sched_get_busy()
Interrupts can end up waking processes on the same cpu as the one for
which sched_get_busy() is called. Since sched_get_busy() takes the rq
lock this can result in a deadlock as the same rq lock is required to
enqueue the waking up task. Fix the deadlock by disabling interrupts
when taking the rq lock.

Change-Id: I46e14a14789c2fb0ead42363fbaaa0a303a5818f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:03 -07:00
Srivatsa Vaddagiri 9361844015 sched: Make wallclock more accurate
update_task_ravg() in the context switch path uses a wallclock value
that is captured before running put_prev_task() and pick_next_task(),
both of which can take some time. It is better to update the wallclock
after those routines, leading to more accurate accounting.

Change-Id: I882b1f0e8eddd2cc17d42ca2ab8f7a2841b8d89a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:03 -07:00
Syed Rameez Mustafa cd0c28ddcc sched: Resolve some 64 bit compilation issues
On 64-bit architectures a pointer is no longer the same size as an
int. Therefore any place that does a conversion from int to a pointer
type gives a compilation error. Resolve these by first casting to
long, which is guaranteed to be the same size as a pointer.

Change-Id: I518ac3c562bd3f85893f91ad6dbcd2f0c7bf081b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:03 -07:00
Syed Rameez Mustafa 3d972d3af1 sched: Make task and CPU load calculations safe from truncation
Load calculations have been modified to accept and return 64-bit
values. Fix up all the places where we make such calculations to store
the result in 64-bit variables. This is necessary to avoid issues
caused by truncation of values.

While at it, rename scale_task_load() to scale_load_to_cpu(). This is
because the API is used to scale the load of both individual tasks and
the cumulative load of CPUs, so the old name was a misnomer. Also
clean up power_cost() to use max_task_load().

Change-Id: I51e683e1592a5ea3c4e4b2b06d7a7339a49cce9c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:03 -07:00
Syed Rameez Mustafa 8eebaa1826 sched/fair: Introduce C-state aware task placement for small tasks
Small tasks execute for small durations. This means that the power
cost of taking CPUs out of a low power mode outweighs any performance
advantage of using an idle core or any power advantage of using the
most power-efficient CPU. Introduce C-state aware task placement for
small tasks. This requires a two-pass approach where we first
determine the most power-efficient CPU and establish a band of CPUs
offering a similar power cost for the task. The order of preference
then is as follows:

1) Any mostly idle CPU in active C-state in the same power band.
2) A CPU with the shallowest C-state in the same power band.
3) A CPU with the least load in the same power band.
4) Lowest power CPU in a higher power band.

The patch also modifies the definition of a small task. Small tasks
are now determined relative to the minimum capacity CPUs in the system
and not the task's CPU.

Change-Id: Ia09840a5972881cad7ba7bea8fe34c45f909725e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:02 -07:00
Srivatsa Vaddagiri 34e72241a9 sched: Make the scheduler aware of C-state for cpus
C-state represents a power-state of a cpu. A cpu could have one or
more C-states associated with it. C-state transitions are based on
various factors (expected sleep time for example). "Deeper" C-states
imply longer wakeup latencies.

The scheduler needs to know the wakeup latency associated with the
various C-states. Having this information allows the scheduler to make
better decisions during task placement. For example:

- Prefer an idle cpu that is in the shallowest C-state
- Avoid waking up small tasks on an idle cpu unless it is in the
  shallowest C-state

This patch introduces APIs in the scheduler that can be used by the
architecture-specific power-management driver to inform the scheduler
about C-states for cpus.
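
The exact API is not quoted in this log; a plausible shape, with
assumed names and fields, might be:

    /* Called by the platform idle driver whenever a cpu changes
     * C-state; the scheduler stashes the state and its wakeup
     * latency on the runqueue for use in placement decisions. */
    void sched_set_cpu_cstate(int cpu, int cstate,
                              int wakeup_latency_us)
    {
        struct rq *rq = cpu_rq(cpu);

        rq->cstate = cstate;                    /* assumed field */
        rq->wakeup_latency = wakeup_latency_us; /* assumed field */
    }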

Change-Id: I39c5ae6dbace4f8bd96e88f75cd2d72620436dd1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:02 -07:00
Syed Rameez Mustafa 65eab4a6f5 sched/fair: Introduce scheduler boost for low latency workloads
Certain low latency bursty workloads require immediate use of the
highest capacity CPUs in HMP systems. Existing load tracking
mechanisms may be unable to respond to the sudden surge in system
load within the latency requirements. Introduce the scheduler boost
feature for such workloads. While boost is in effect the scheduler
bypasses regular load based task placement and prefers the highest
capacity CPUs in the system for all non-small fair sched class tasks.
Provide both a kernel and userspace API for software that may have
a priori knowledge about the system workload.
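
A hedged sketch of the two entry points (the names and sysctl path
are assumptions based on the other tunables in this series):

    /* In-kernel API: non-zero enables boost, zero disables it. */
    int sched_set_boost(int enable);

    /* Userspace could then flip the same state through a sysctl,
     * e.g. /proc/sys/kernel/sched_boost, before launching a known
     * bursty workload, and clear it once the burst is done. */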

Change-Id: I783f585d1f8c97219e629d9c54f712318821922f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:02 -07:00
Srivatsa Vaddagiri a6fa50d177 sched: Move call to trace_sched_cpu_load()
select_best_cpu() invokes trace_sched_cpu_load() for all online cpus
in a loop before it enters the loop for core selection. Moving the
invocation of trace_sched_cpu_load() into the inner core-selection
loop is potentially more efficient.

Change-Id: Iae1c58b26632edf9ec5f5da905c31356eb95c925
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:01 -07:00
Srivatsa Vaddagiri 73bf6e4e2d sched: fair: Reset balance_interval before sending NOHZ kick
balance_interval needs to be reset for any cpu being kicked. Otherwise
the cpu can end up ignoring the kick (i.e. not doing load balance for
itself).

Also bypass the check for the existence of idle cpus in tickless state
for !CONFIG_SCHED_HMP to allow for more aggressive load balance.

Change-Id: I52365ee7c2997ec09bd93c4e9ae0293a954e39a8
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:01 -07:00
Srivatsa Vaddagiri 5b97613e54 sched: Avoid active migration of small tasks
We currently check the need to migrate the currently running task in
scheduler_tick(). Skip that check for small tasks, as it's not worth
the effort.

Change-Id: Ic205cc6452f42fde6be6b85c3bf06a8542a73eba
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:01 -07:00
Srivatsa Vaddagiri e87706488c sched: Account for cpu's current frequency when calculating its power cost
When estimating the cost of running a task on a given cpu, the cost
of the cpu at its current frequency needs to override the cost at the
frequency demanded by the task whenever cur_freq exceeds the task's
required frequency. This is because placing a task on a cpu can only
result in an increase of the cpu's frequency, never a decrease.
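
A small self-contained illustration in ordinary C with made-up
numbers: the frequency used for costing is the higher of the cpu's
current frequency and the task's demanded frequency.

    #include <stdio.h>

    /* Toy power table: made-up cost at two frequencies. */
    static unsigned int power_at(unsigned int khz)
    {
        return khz >= 1200000 ? 500 : 200;   /* mW, illustrative */
    }

    int main(void)
    {
        unsigned int cur_freq = 1200000;   /* cpu already at 1.2 GHz */
        unsigned int task_freq = 600000;   /* task only needs 600 MHz */

        /* Placing the task cannot lower the frequency, so cost at
         * the higher of the two. */
        unsigned int eff = cur_freq > task_freq ? cur_freq : task_freq;

        printf("cost = %u mW\n", power_at(eff));   /* 500, not 200 */
        return 0;
    }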

Change-Id: I021a3bbaf179bf1ec2c7f4556870256936797eb9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:01 -07:00
Srivatsa Vaddagiri 8e4ffa0d07 sched: make sched_set_window() return failure when PELT is in use
Window-based load tracking is a prerequisite for the scheduler to
feed cpu load information to the governor. When PELT is in use,
return failure when the governor attempts to set the window size.
This lets the governor fall back to other APIs for retrieving cpu
load statistics.
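
A hedged sketch of the early-out (the config guard and return value
are assumptions):

    int sched_set_window(u64 window_start, unsigned int window_size)
    {
    #ifndef CONFIG_SCHED_FREQ_INPUT     /* assumed guard */
        /* PELT in use: no window statistics to offer. */
        return -EINVAL;
    #else
        /* ... align window_start and record window_size ... */
        return 0;
    #endif
    }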

Change-Id: I0e11188594c1a54b3b7ff55447d30bfed1a01115
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:01 -07:00
Srivatsa Vaddagiri 7d8d1bd095 sched: debug: Print additional information in /proc/sched_debug
Provide information in /proc/sched_debug on min_capacity, max_capacity
and whether PELT or window-based task load statistics are in use.

Change-Id: Ie4e9450652f4c83110dda75be3ead8aa5bb355c3
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:00 -07:00
Srivatsa Vaddagiri df5fd251ed sched: Move around code
Move a chunk of code up so it is defined earlier. This helps a
subsequent patch that needs update_min_max_capacity().

Change-Id: I9403c7b4dcc74ba4ef1034327241c81df97b01ea
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:00 -07:00
Srivatsa Vaddagiri af9a2812eb sched: Update capacity of all online cpus when min_max_freq changes
During bootup, it's possible for min_max_freq to change as frequency
information for additional clusters is processed. That needs to
trigger recalculation of capacity/load_scale_factor for all (online)
cpus, as they strongly depend on the min_max_freq variable. Not doing
so would leave some cpus with their capacity/load_scale_factor
computed wrongly.

Change-Id: Iea5a0a517a2d71be24c2c71cdd805c0733ce37f8
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:23:00 -07:00
Srivatsa Vaddagiri 2baa89cd4c sched: update task statistics when CPU frequency changes
A CPU may have its frequency changed by a different CPU. Because of
this, it is not guaranteed that we will update task statistics at
approximately the same time that the frequency change occurs. To
guard against accruing time to a task at the wrong frequency, update
the task's window-based statistics if the CPU it is running on
changes frequency.
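
A hedged sketch of how the hook could look as a cpufreq transition
notifier (the update_task_ravg() argument list and cur_freq field are
assumptions):

    static int cpufreq_notifier_trans(struct notifier_block *nb,
                                      unsigned long val, void *data)
    {
        struct cpufreq_freqs *freq = data;
        struct rq *rq = cpu_rq(freq->cpu);

        /* Fold time accrued so far into the window stats at the old
         * frequency before recording the new one. */
        if (val == CPUFREQ_POSTCHANGE) {
            update_task_ravg(rq->curr, rq, sched_clock());
            rq->cur_freq = freq->new;
        }
        return NOTIFY_OK;
    }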

Change-Id: I333c3f8aa82676bd2831797b55fd7af9c4225555
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:23:00 -07:00
Srivatsa Vaddagiri 11262c23d6 sched: Add new trace events
Add trace events for update_task_ravg(), update_history(), and
set_task_cpu(). These tracepoints are useful for monitoring the
per-task and per-runqueue demand statistics.

Change-Id: Ibec9f945074ff31d1fc1a76ae37c40c8fea8cda9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:59 -07:00
Steve Muckle fd3c9f6c53 sched: do not balance on exec if SCHED_HMP
Rebalancing at exec time will currently undo any beneficial placement
that has been done during fork time, since select_best_cpu() will not
discount the currently running task.

For now just skip re-evaluating task placement at exec.

Change-Id: I1e5e0fcc329b7b53c338c8c73795ebd5e85a118b
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-22 14:22:59 -07:00
Srivatsa Vaddagiri 00faa770e7 sched: Use historical load for freq governor input
Historical load maintained per task can be used to influence cpu
frequency more effectively. For example, when a heavy demand task
wakes up after a prolonged sleep, we could use the historical load
information to alert the cpufreq governor about the need to raise cpu
frequency. This patch changes CPU busy statistics to be an
aggregation of historical task demand. A task's historical load (as
defined by sysctl_sched_window_stats_policy) is also added to the
cpu's busy statistics (rq->curr_runnable_sum) whenever it executes on
a cpu.

Change-Id: I2b66136f138b147ba19083b9b044c4feb20d9b57
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:59 -07:00
Srivatsa Vaddagiri 0352c87d18 sched: window-stats: apply scaling to full elapsed windows
In the event that one or more full windows have elapsed when updating
a task's window-based stats, the runtime of those windows needs to be
scaled based on the CPU frequency. This scaling is currently missing,
causing full windows to be accounted as having elapsed at maximum
frequency, erroneously inflating task demand.
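
A self-contained arithmetic sketch in ordinary C with made-up
numbers: runtime in an elapsed window is scaled by cur_freq/max_freq,
so time spent at half speed counts as half the demand.

    #include <stdio.h>

    int main(void)
    {
        unsigned long long window_ns = 10000000ULL; /* one 10 ms window */
        unsigned int cur_freq = 600000;             /* kHz, half of max */
        unsigned int max_freq = 1200000;            /* kHz */

        /* Unscaled, the full window counts as 10 ms of demand at max
         * frequency; scaled, it counts as 5 ms. */
        unsigned long long scaled = window_ns * cur_freq / max_freq;

        printf("scaled demand: %llu ns\n", scaled); /* 5000000 */
        return 0;
    }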

Change-Id: I356b4279d44d4f39c8aea881c04327b70ed66183
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:22:58 -07:00
Steve Muckle 80753c7e5e sched: notify cpufreq on over/underprovisioned CPUs
After a migration occurs the source and destination CPUs may
not be running at frequencies which match the new task load on
those CPUs.

Previously, the scheduler was notifying cpufreq anytime a task
greater than a certain size migrated. This is suboptimal, however,
since it does not take into account the CPU's current frequency and
other task activity that may be present.

Change-Id: I5092bda3a517e1343f97e5a455957c25ee19b549
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-22 14:22:58 -07:00
Syed Rameez Mustafa dbd2db2471 sched: Introduce spill threshold tunables to manage overcommitment
When the number of tasks intended for a cluster exceeds the number of
mostly idle CPUs in that cluster, the scheduler currently freely uses
CPUs in other clusters if possible. While this is optimal for
performance, the power trade-off can be quite significant. Introduce
spill threshold tunables that govern the extent to which the scheduler
should attempt to contain tasks within a cluster.
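
A hedged sketch of how such a check might read (the tunable and
helper names are assumptions):

    /* Spill outside the preferred cluster only while the candidate
     * cpu stays under both thresholds. */
    static int can_spill_to(struct rq *rq, struct task_struct *p)
    {
        if (rq->nr_running >= sysctl_sched_spill_nr_run)
            return 0;
        if (cpu_load(rq) + task_load(p) > sysctl_sched_spill_load)
            return 0;
        return 1;
    }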

Change-Id: I797e6c6b2aa0c3a376dad93758abe1d587663624
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:22:58 -07:00
Steve Muckle dd76a2e00a sched: add affinity, task load information to sched tracepoints
Knowing the affinity mask and CPU usage of a task is helpful
in understanding the behavior of the system. Affinity information
has been added to the enq_deq trace event, and the migration
tracepoint now reports the load of the task migrated.

Change-Id: I29d8a610292b4dfeeb8fe16174e9d4dc196649b7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-07-22 14:22:58 -07:00
Steve Muckle bb3f4aae22 sched: add migration load change notifier for frequency guidance
When a task moves between CPUs in two different frequency domains
the cpufreq governor may wish to immediately modify the frequency
of both the source and destination CPUs of the migrating task.

A tunable is provided to establish what size task is considered
"significant" enough to warrant notifying cpufreq.

Also fix a bug that would cause load to not be accounted properly
during wakeup migrations.
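
A hedged sketch of the notification condition (the tunable name and
notifier head are assumptions):

    /* Notify cpufreq for both ends of the migration, but only when
     * the task is big enough to matter. */
    if (task_load(p) > sysctl_sched_freq_migrate_notify) {
        atomic_notifier_call_chain(&load_alert_notifier_head,
                                   0, (void *)(long)src_cpu);
        atomic_notifier_call_chain(&load_alert_notifier_head,
                                   0, (void *)(long)dest_cpu);
    }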

Change-Id: Ie8f6b1cc4d43a602840dac18590b42a81327c95a
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-07-22 14:22:57 -07:00
Syed Rameez Mustafa 8968469989 sched/fair: Limit MAX_PINNED_INTERVAL for more frequent load balancing
Should the system get stuck in a state where load balancing is failing
because all tasks are pinned, deferring load balancing for up to half
a second may cause further performance problems. Eventually not all
tasks will remain pinned, so load balancing should not be deferred for
a great length of time.

Change-Id: I06f93b5448353b5871645b9274ce4419dc9fae0f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:57 -07:00
Syed Rameez Mustafa 6317c544d8 sched/fair: Help out higher capacity CPUs when they are overcommitted
This consists of two parts:

If we have a task to schedule, we currently don't consider CPUs where
it will not fit even if they are idle. Instead we choose the previous
CPU, which is sub-optimal for performance if an idle CPU is
present. This change introduces tracking of any idle CPUs irrespective
of whether the task fits on them or not. If we don't have a good place
to put the task, prefer the lowest power idle CPU.

The other part involves the load balancer, which was unable to move
tasks despite the above mentioned task placement to balance out the
load. The reason is that the load balancer checks the big cluster's
group capacity and determines that it can take twice the amount of
workload as the little cluster. Hence the big cluster does not get
marked as busy. While this behavior is intended for heavily loaded
systems where we want to push more work towards the higher capacity
CPUs, it is sub-optimal when we have idle CPUs. Add the ability to
differentiate between the two scenarios when marking a group as
busy. If load_balance is called from a CPU_NOT_IDLE environment, use
the group capacity to determine whether the group is busy or not. For
everything else use the number of CPUs in the group.
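
A hedged sketch of the two-mode check (the helper's shape is a
stand-in):

    /* Not-idle balance: judge busyness by capacity, so big groups
     * absorb more work under heavy load. Idle balance: judge by cpu
     * count, so idle cpus can relieve a group running one task per
     * cpu even when its capacity is not exceeded. */
    static int group_is_busy(struct sched_group *sg,
                             unsigned long sum_nr_running,
                             unsigned long group_capacity,
                             enum cpu_idle_type idle)
    {
        if (idle == CPU_NOT_IDLE)
            return sum_nr_running > group_capacity;
        return sum_nr_running >= sg->group_weight;
    }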

Change-Id: I4e8290639ad1602541a44a80ba4b2804068cac0f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:57 -07:00
Syed Rameez Mustafa 5349ec0ee8 sched/rt: Introduce power aware scheduling for real time tasks
Real Time task scheduling has historically been geared towards performance
with a significant attempt to keep higher priority tasks on the same CPU.
This is not optimal for power since the task CPU may not be the most
power efficient CPU.

Also, task movement via select_lowest_rq() gives CPU priority the
primary consideration before looking at CPU topology to find a CPU
closest to the task CPU. This again is not optimal for power since
the closest CPU may be significantly worse for power than CPUs
further away.

This patch removes any bias for the task CPU. When the lowest priority
CPUs in the system are found we give no consideration to the CPU topology.
Instead we find the lowest power CPU within local_cpu_mask. This takes care
of select_task_rq_rt() and push_task(). The pull model remains unaffected
since we have no room for power optimization there.
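
A hedged sketch of the selection step; power_cost() is the helper
named elsewhere in this log, the rest is a stand-in:

    /* Return the cheapest-to-run cpu in @mask for @p, deliberately
     * ignoring topology. */
    static int lowest_power_cpu(struct task_struct *p,
                                struct cpumask *mask)
    {
        int cpu, best = -1;
        unsigned int best_cost = UINT_MAX;

        for_each_cpu(cpu, mask) {
            unsigned int cost = power_cost(p, cpu);

            if (cost < best_cost) {
                best_cost = cost;
                best = cpu;
            }
        }
        return best;
    }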

Change-Id: I4162ebe2f74be14240e62476f231f9e4a18bd9e8
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:57 -07:00
Steve Muckle ea296243e2 sched: balance power inefficient CPUs with one task
Normally the load balancer does not pay attention to CPUs with one
task since it is not possible to subdivide that load any further
to achieve better balance. With power aware scheduling however it
may be desirable to relocate that one task if the CPU it is currently
executing on is less power efficient than other CPUs in the system.

Change-Id: Idf3f5e22b88048184323513f0052827b884526b6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:56 -07:00
Steve Muckle 6e6c17d05c sched: check for power inefficient task placement in tick
Although tasks are routed to the most power-efficient CPUs during
task wakeup, a CPU-bound task will not go through this decision point.
Load balancing can help if it is modified to dislodge a single task
from an inefficient CPU. The situation can be further improved if
during the tick, the task placement is checked to see if it is
optimal.

This sort of checking is already being done to ensure proper
task placement in heterogeneous CPU topologies, so checking for
power efficient task placement fits in pretty well.

Change-Id: I71e56d406d314702bc26dee1438c0eeda7699027
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:56 -07:00
Steve Muckle 7edb149465 sched: do nohz load balancing in order of power efficiency
The nohz load balancer CPU does load balancing on behalf of all
idle tickless CPUs. In the interest of power efficiency though, we
should do load balancing on the most power efficient idle tickless
CPU first, and then work our way towards the least power efficient
idle tickless CPU. This will help load find its way to the most
power efficient CPUs in the system.

Since it is unknown what task load would be pulled when selecting
the CPU to balance next, a frequency must be assumed in order to
compare CPU power consumption. The maximum frequency supported by
all CPUs is used for this.

Change-Id: I96c7f4300fde2c677c068dc10fc0e57f763eb9b2
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:56 -07:00
Steve Muckle ac1defc4d3 sched: run idle_balance() on most power-efficient CPU
When a CPU goes idle, it checks to see whether it can pull any load
from other busy CPUs. The CPU going idle may not be the most
power-efficient idle CPU in the system however.

This patch causes the CPU going idle to check to see whether
there is a more power-efficient idle CPU within the same
lowest sched domain. If there is, then it runs the load balancer
on behalf of that CPU instead of itself.

Since it is unknown at this point what task load would be pulled,
a frequency must be assumed for this in order to compare CPU power
consumption. The maximum frequency supported by all CPUs is used
for this.

Change-Id: I5eedddc1f7d10df58ecd358f37dba563eeecf4fc
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:56 -07:00
Steve Muckle 5ef89ee90e sched: add hook for platform-specific CPU power information
To enable power-aware scheduling, provide a hook/infrastructure
for platforms to communicate CPU power requirements for each
supported CPU frequency. This information is then used to estimate
the cost of running a task on a given CPU.

Currently, an assumption is made that the task will be running
by itself on the CPU. Given that the current policy tries to spread
tasks as much as possible, this assumption should not be too far off.
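
The shape of the hook is not quoted here; a plausible sketch with
assumed names:

    /* One row per supported frequency, supplied by the platform. */
    struct cpu_pstate_pwr {                 /* assumed structure */
        unsigned int freq;                  /* kHz */
        unsigned int power;                 /* platform power units */
    };

    /* The platform driver registers its table once per cpu; the
     * scheduler then estimates the cost of running a task as the
     * power at the frequency that task would demand on that cpu. */
    void sched_register_cpu_power(int cpu,
                                  struct cpu_pstate_pwr *table,
                                  int len);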

Change-Id: I19f1fa760a0d43222d2880f8aec0508c468b39bb
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:56 -07:00
Steve Muckle bf1afbbbcd sched: add power aware scheduling sysctl
The sched_enable_power_aware sysctl will control whether
or not scheduling decisions are influenced by the power
consumption of individual CPUs.

Change-Id: I312f892cf76a3fccc4ecc8aa6703908b205267f0
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:55 -07:00
Srivatsa Vaddagiri a6e9741047 sched: Extend update_task_ravg() to accept wallclock as argument
This will make it easier to account interrupt time on a cpu,
introduced in a subsequent patch.

Change-Id: I0e1fb5255c280ca374fd255e7fc19d5de9f8b045
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:55 -07:00
Srivatsa Vaddagiri 61490dcfb4 sched: add sched_get_busy, sched_set_window APIs
sched_get_busy() returns the busy time of a cpu during the most
recent completed window.
sched_set_window() sets the window size and aligns windows across
all CPUs.
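
Hedged signatures for the pair (return and parameter types are
assumptions):

    /* Busy time, in ns, of @cpu during the most recent completed
     * window. */
    u64 sched_get_busy(int cpu);

    /* Set the window size and align window boundaries across all
     * cpus; may return failure if window stats are unavailable. */
    int sched_set_window(u64 window_start, unsigned int window_size);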

Change-Id: Ic53e27f43fd4600109b7b6db979e1c52c7aca103
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:55 -07:00
Steve Muckle d2cc14b9e2 sched: window-stats: adjust RQ curr, prev sums on task migration
Adjust a cpu's busy time in its recent and previous window upon task
migration. This enables the scheduler to provide better input to the
cpufreq governor on a cpu's busy time in a given window.

Change-Id: Idec2ca459382e9f46d882da3af53148412d631c6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:54 -07:00
Steve Muckle d7b56a170f sched: window-stats: Add aggregated runqueue windowed stats
Add per-cpu counters to track each cpu's busy time in the latest
window and the one previous to that. This is needed to track accurate
per-cpu busy time that accounts for migrations: once a task migrates,
its execution time in the current window is migrated as well to the
new cpu.

The idle task's runtime is not accounted since it should not count
towards runqueue busy time.
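
A hedged sketch of the bookkeeping (curr_runnable_sum is named
elsewhere in this log; the other fields follow that pattern):

    struct rq {
        /* ... existing fields ... */
        u64 window_start;       /* start of the current window */
        u64 curr_runnable_sum;  /* busy ns accrued in this window */
        u64 prev_runnable_sum;  /* busy ns of the previous window */
    };

    /* On window rollover curr becomes prev; on migration the task's
     * contribution moves from the source rq's sums to the
     * destination rq's. Idle-task runtime is never added. */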

Change-Id: I4014dd686f95dbbfaa4274269bc36ed716573421
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:54 -07:00
Srivatsa Vaddagiri af4d1578b9 sched: window-stats: add prev_window counter per-task
Currently, windows where a task had no execution time are ignored.
However, accurate accounting of cpu busy time that factors in
migration needs to know a task's actual utilization in the window
previous to the latest one. This helps the scheduler guide the
cpufreq governor with per-cpu busy time that is not subject to
migration-induced errors.

Change-Id: I5841b1732c83e83d69002139de3bdb93333ce347
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:54 -07:00
Srivatsa Vaddagiri 975dbc9783 sched: window-stats: synchronize windows across cpus
Synchronizing windows across cpus for task load measurements
simplifies cpu busy time accounting during migrations. When a task
migrates, its usage in the current window can be carried over to its
new cpu. This lets the cpufreq governor see a correct picture of cpu
busy time that is not affected by migrations.

This patch lines up windows across cpus. One of the cpus, the
sync_cpu, serves as a reference for all others. During bootup the
sync_cpu initializes its window_start (from its sched_clock()).
Other cpus synchronize their window_start with reference to the
sync_cpu. This patch assumes sched_clock() is synchronized across
cpus and may need changes for architectures that do not provide such
a synchronized sched_clock().
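
A hedged arithmetic sketch of the alignment (variable names are
stand-ins; assumes the synchronous sched_clock() noted above):

    /* Snap this cpu's window_start to the most recent boundary of
     * the sync_cpu's window series. */
    u64 wallclock = sched_clock();
    u64 delta = wallclock - ref_window_start;  /* sync_cpu's start */

    rq->window_start = wallclock - (delta % window_size);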

Change-Id: I13381389a72f5f9f85cc2446401d493a55c78ab7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:54 -07:00
Srivatsa Vaddagiri bf90a4be22 sched: window-stats: Do not account wait time
Task load statistics are used for two purposes: cpu frequency
management and placement. A task's load can't be accurately judged by
its wait time. For example, a task could have waited for 10ms and,
when given the opportunity to run, executed for just 1ms. Accounting
11ms as the task's demand would overstate its needs in this example.
For now, remove wait time from task demand and instead derive task
load from its actual exec time. This may need to become a tunable
feature.

Change-Id: I47e94c444c6b44c3b0810347287d50f1ee685038
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:53 -07:00
Srivatsa Vaddagiri bd020d066f sched: window-stats: update during migration and earlier at wakeup
During migrations accounting needs to be done in set_task_cpu() to
subtract the task activity from the source CPU and add it to the
destination CPU. This accounting will require that the task's window
based load statistics be up to date.

Unfortunately, the window-based statistics cannot always be updated in
set_task_cpu() because they are already being updated in the wakeup
path. We cannot update the statistics solely in the wakeup path
because not all wakeups are migrations. Those non-migrating wakeups
will not enter set_task_cpu().

To ensure the window-based stats are always updated for both wakeup
migrations and regular migrations, they are updated earlier in the
wakeup path, and also updated in set_task_cpu if the task is already
runnable (this ensures it is not a wakeup migration, but a regular
migration).
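
A hedged sketch of the set_task_cpu() side (the update_task_ravg()
argument list is assumed):

    /* A task that is already runnable here is a regular migration;
     * wakeup migrations had their stats refreshed earlier in the
     * wakeup path. Bring the window stats up to date before moving
     * the task's load between runqueues. */
    if (p->on_rq)
        update_task_ravg(p, task_rq(p), sched_clock());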

Change-Id: Ib246028741d0be9bb38ce93679d6e6ba25b10756
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:22:53 -07:00
Srivatsa Vaddagiri f0ad6a880a sched: move definition of update_task_ravg()
set_task_cpu() will need to call update_task_ravg(). Move the
definition up to make that possible.

Change-Id: I95c1c9e009bd1805f28708e8d6fd3b7b2166410e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-07-22 14:20:35 -07:00