Commit Graph

710 Commits

Author SHA1 Message Date
Syed Rameez Mustafa 7b0a1a7e87 sched: Avoid pulling all tasks from a CPU during load balance
When running load balance, the destination CPU checks the number
of running tasks on the busiest CPU without holding the busiest
CPU's runqueue lock. This opens the load balancer to a race whereby
a third CPU, running load balance at the same time and having found
the same busiest group and queue, may have already pulled one of the
waiting tasks from the busiest CPU. Under scenarios where the source
CPU is running the idle task and only a single task remains waiting on
the busiest runqueue (nr_running = 1), the destination CPU will end
up pulling the only enqueued task from that CPU, leaving the destination
CPU with nothing left to run. Fix this race by reconfirming nr_running
for the busiest CPU after its runqueue lock has been obtained.
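
A minimal user-space sketch of the re-check pattern described above; the
struct and helper names are illustrative, not the actual kernel code:

    #include <pthread.h>

    struct runqueue {
            pthread_mutex_t lock;
            unsigned int nr_running;
    };

    /* Re-confirm, with the lock held, that the busiest runqueue still has
     * a waiting task in addition to the one currently running. */
    static int still_worth_pulling(struct runqueue *busiest)
    {
            int ok;

            pthread_mutex_lock(&busiest->lock);
            ok = busiest->nr_running > 1;
            pthread_mutex_unlock(&busiest->lock);
            return ok;
    }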

Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-02-28 16:50:37 -08:00
Srivatsa Vaddagiri 8f9ba192b6 sched: Keep track of average nr_big_tasks
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.

Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-26 10:37:26 -08:00
Srivatsa Vaddagiri 3f116af70d sched: Fix bug in average nr_running and nr_iowait calculation
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.

* sched_update_nr_prod() needs to consider that nr_running count can
  change by more than 1 when CFS_BANDWIDTH feature is used

* sched_get_nr_running_avg() needs to sum up nr_iowait count across
  all cpus, rather than just one

* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
  as a result of which it could use curr_time which is behind a cpu's
  'last_time' value. That would lead to erroneous calculation of
  average nr_running or nr_iowait.

While at it, also fix a bug in the BUG_ON() check in
sched_update_nr_prod() and remove the unnecessary nr_running
argument to that function.

Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-02-26 10:33:52 -08:00
Olav Haugan f0d9067045 sched/fair: Respect wake to idle over sync wakeup
Sync wakeup currently takes precedence over the wake to idle flag. A sync
wakeup causes a task to be placed on a non-idle CPU because we expect
this CPU to become idle very shortly. However, even though the sync flag
is set there is no guarantee that the task will go to sleep right away.
As a consequence, performance suffers.

Fix this by preferring an idle CPU over a potentially busy CPU when both
wake to idle and sync wakeup are set.
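
A hedged sketch of the intended precedence; the helper and its arguments
are illustrative, not the actual kernel code:

    /* Illustrative only: an explicit wake-to-idle request should win over
     * the sync hint, since the waker may not sleep right away. */
    static int choose_target(int wake_to_idle, int sync,
                             int idle_cpu, int waker_cpu)
    {
            if (wake_to_idle && idle_cpu >= 0)
                    return idle_cpu;        /* idle CPU wins, even for sync wakeups */
            if (sync)
                    return waker_cpu;       /* expect the waker to sleep shortly */
            return idle_cpu >= 0 ? idle_cpu : waker_cpu;
    }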

Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2015-02-24 18:35:45 -08:00
Joonwoo Park 3a037e913e sched: fix rounding error on scaled execution time calculation
It has been found that the scaled execution time can be less than the actual
time due to rounding errors.  The HMP scheduler accumulates the scaled
execution time of tasks to determine whether tasks are in need of
up-migration.  The rounding error, however, prevents the scheduler from ever
accumulating 100% load, so an up-migrate threshold of 100% can never be reached.
Fix the rounding error by rounding the quotient up.
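
For illustration, a standalone sketch of the difference between truncating
and rounding up when scaling execution time (the scaling formula here is a
simplified stand-in, not the scheduler's exact one):

    #include <stdio.h>

    /* Scale 'delta' ns of execution time by freq/max_freq. Plain integer
     * division truncates, so the scaled time systematically undershoots;
     * rounding the quotient up avoids losing load on every update. */
    static unsigned long long scale_exec_time(unsigned long long delta,
                                              unsigned int freq,
                                              unsigned int max_freq)
    {
            unsigned long long prod = delta * freq;

            return (prod + max_freq - 1) / max_freq;        /* round up */
    }

    int main(void)
    {
            /* 10000ns at 1000MHz with a 1500MHz max: truncation gives 6666,
             * rounding up gives 6667. */
            printf("%llu\n", scale_exec_time(10000, 1000, 1500));
            return 0;
    }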

CRs-fixed: 759041
Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-02-18 19:54:12 -08:00
Srivatsa Vaddagiri 2385d33016 sched: Support CFS_BANDWIDTH feature in HMP scheduler
CFS_BANDWIDTH feature is not currently well-supported by HMP
scheduler. Issues encountered include a kernel panic when
rq->nr_big_tasks count becomes negative. This patch fixes HMP
scheduler code to better handle CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.

Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:19 +05:30
Srivatsa Vaddagiri bbef4c5e1b sched: Consolidate hmp stats into their own struct
Key hmp stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_average) are currently maintained per-cpu in
'struct rq'. Merge those stats into their own structure (struct
hmp_sched_stats) and modify impacted functions to deal with the newly
introduced structure. This cleanup is required for a subsequent patch
which fixes various issues with use of CFS_BANDWIDTH feature in HMP
scheduler.

Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-28 14:13:14 +05:30
Srivatsa Vaddagiri 7e767d3e45 sched: Add userspace interface to set PF_WAKE_UP_IDLE
sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0
so that most tasks wake up to non-idle cpus under mostly_idle
threshold and have specialized tasks override this behavior through
other means. PF_WAKE_UP_IDLE flag per task provides exactly that. It
lets tasks with PF_WAKE_UP_IDLE flag set be woken up to any available
idle cpu independent of the sched_prefer_idle flag setting. Currently
only a kernel-space API exists to set the PF_WAKE_UP_IDLE flag for a task.
This patch adds a user-space API (in the /proc filesystem) to set the
PF_WAKE_UP_IDLE flag for a given task. The /proc/[pid]/sched_wake_up_idle
file can be written to set or clear the PF_WAKE_UP_IDLE flag for that
task.
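
A hedged user-space example of driving the new interface; this assumes the
file accepts "1" to set the flag and "0" to clear it, which the commit does
not spell out:

    #include <stdio.h>
    #include <sys/types.h>

    /* Set or clear PF_WAKE_UP_IDLE for a task via procfs (assumed format). */
    static int set_wake_up_idle(pid_t pid, int enable)
    {
            char path[64];
            FILE *f;

            snprintf(path, sizeof(path), "/proc/%d/sched_wake_up_idle", (int)pid);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fprintf(f, "%d", enable ? 1 : 0);
            fclose(f);
            return 0;
    }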

Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2015-01-27 19:39:57 +05:30
Linux Build Service Account 4c5b2873a2 Merge "sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT" 2015-01-14 17:12:07 -08:00
Linux Build Service Account a2ada2d4c8 Merge "sched: continue to search less power efficient cpu for load balancer" 2015-01-14 17:12:06 -08:00
Joonwoo Park 66bd788705 sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT
Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling.  This option
can give us better debugging and verification capability.

Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Joonwoo Park 2c7cc326ed sched: continue to search less power efficient cpu for load balancer
When choosing a CPU to do power-aware active balance from, the load
balancer currently selects the first eligible CPU it finds, even if
another eligible CPU consumes more power. This can lead to
suboptimal load balancing behavior and extra migrations. Power and
performance will be impacted.

Achieve better power and performance by continuing to search for the least
power-efficient cpu as long as that cpu's load average is higher than or
equal to that of the busiest cpu found so far.

CRs-fixed: 777341
Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2015-01-08 11:22:33 -08:00
Syed Rameez Mustafa 13e853e988 sched: Update cur_freq for offline CPUs in notifier callback
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot removed CPU's cur_freq information
can get stale if there is a frequency change while that CPU is offline.
When the offline CPU is hotplugged back in, all subsequent load
calculations are based off the stale information until another frequency
change occurs and the corresponding set of notifications are sent out.
Avoid this incorrect load tracking by updating the cur_freq for all
CPUs in the same frequency domain.

Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2015-01-05 15:36:33 -08:00
Linux Build Service Account 05e6c29de4 Merge "sched: add preference for prev_cpu in HMP task placement" 2014-12-29 17:31:49 -08:00
Linux Build Service Account 380cadc7f3 Merge "sched: Per-cpu prefer_idle flag" 2014-12-29 17:31:47 -08:00
Linux Build Service Account 307e71816e Merge "sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()" 2014-12-29 17:31:47 -08:00
Olav Haugan 30d383d45b sched: Fix overflow in max possible capacity calculation
The max possible capacity calculation might overflow given large enough
max possible frequency and capacity. Fix potential for overflow.
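
A standalone illustration of this class of overflow; the values and the
formula are made up for the example, not the scheduler's actual calculation:

    #include <stdio.h>

    int main(void)
    {
            unsigned int max_freq_khz = 3000000;    /* 3 GHz */
            unsigned int capacity     = 1024;
            unsigned int efficiency   = 2048;

            /* The 32-bit product wraps around before the division happens. */
            unsigned int wrapped = max_freq_khz * capacity * efficiency / 1024;

            /* Promoting to 64 bits before multiplying keeps the full value. */
            unsigned long long full =
                    (unsigned long long)max_freq_khz * capacity * efficiency / 1024;

            printf("32-bit: %u, 64-bit: %llu\n", wrapped, full);
            return 0;
    }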

Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-26 11:25:32 -08:00
Steve Muckle cecf6c46cd sched: add preference for prev_cpu in HMP task placement
At present the HMP task placement algorithm scans CPUs in numerical
order and if two identical options are found, the first one
encountered is chosen, even if it is different from the task's
previous CPU.

Add a bias towards the task's previous CPU in such situations. Any
time two or more CPUs are considered equivalent (load, C-state, power
cost), if one of them is the task's previous CPU, bias towards that
CPU. The algorithm is otherwise unchanged.

CRs-Fixed: 772033
Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-23 15:54:35 -08:00
Srivatsa Vaddagiri 599bfc7503 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to be the same for
all cpus in a cluster. It thus provides a convenient means to disable
packing in one cluster while allowing packing in another cluster.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:43 +05:30
Srivatsa Vaddagiri 92ba1d55f3 sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()
sysctl_sched_prefer_idle controls the selection of idle cpus for waking
tasks. In some cases, waking to idle cpus helps performance while in
other cases it hurts (as tasks incur the latency associated with C-state
wakeup). Ideally the scheduler would adapt the prefer_idle behavior based
on the task that is waking up, but that is hard for the scheduler to
figure out by itself. The PF_WAKE_UP_IDLE hint can be provided by an
external module/driver in such cases to guide the scheduler in preferring
an idle cpu for select tasks irrespective of the sysctl_sched_prefer_idle
flag.

This patch enhances select_best_cpu() to consider the PF_WAKE_UP_IDLE
hint. A wakeup posted from any task that has PF_WAKE_UP_IDLE set is a
hint for the scheduler to prefer an idle cpu for the waking task.
Similarly, the scheduler will attempt to place any task with
PF_WAKE_UP_IDLE set on an idle cpu when it wakes up.

CRs-Fixed: 773101
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-23 09:52:27 +05:30
Olav Haugan 7e13b27b8b sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:37:33 -08:00
Olav Haugan 294b88dc67 sched: Ensure no active EA migration occurs when EA is disabled
There exists a flag called "sched_enable_power_aware" that is not honored
everywhere. Fix this.

Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-22 14:23:55 -08:00
Joonwoo Park fc994a4b9e sched: take account of irq preemption when calculating irqload delta
If an irq is raised while sched_irqload() is calculating the irqload delta,
sched_account_irqtime() can update the rq's irqload_ts to a value greater
than the jiffies snapshot taken in sched_irqload()'s context, so the delta
can be negative. A negative delta simply means there was a recent irq
occurrence, so remove the improper BUG_ON().
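
A simplified sketch of treating a negative delta as "recent irq activity";
the field names and the two-jiffy threshold are illustrative:

    /* Illustrative only: 'irqload_ts' is the last time irq time was accounted
     * for this cpu; 'now' was sampled before any lock was taken. */
    static unsigned long recent_irqload(unsigned long now,
                                        unsigned long irqload_ts,
                                        unsigned long avg_irqload)
    {
            long delta = (long)(now - irqload_ts);

            /* An irq landing after 'now' was sampled can push irqload_ts past
             * it and make delta negative; treat that as very recent activity
             * rather than tripping a BUG_ON(). */
            if (delta < 0)
                    delta = 0;

            return delta < 2 ? avg_irqload : 0;
    }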

CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 16:56:50 -08:00
Joonwoo Park 2cec55a2e2 sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed, dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when a task was considered a
non-big task because its nice value exceeded upmigrate_min_nice, and
upmigrate_min_nice is later changed to a higher value so that the task
becomes a big task. In this case the runqueue still, incorrectly, has
nr_big_tasks = 0 with the current implementation.  Consequently, the next
scheduler tick sees a big task to schedule and tries to decrease
nr_big_tasks, which is already 0.

Introduce sched_upmigrate_min_nice, which is updated atomically, and
re-count the number of big and small tasks to fix the BUG_ON() trigger.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2014-12-16 11:05:22 -08:00
Olav Haugan 4beca1fd4d sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allows task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick
path. Extend the usage of the tunable to also cover the load
balancer (lb) path.

In addition ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:49 -08:00
Olav Haugan a7bc092692 sched: Avoid migrating tasks to little cores due to EA
If, during the check of whether migration is needed, we find that a
lower-power CPU is available, we commence the search for a new CPU for
this task. However, by the time we search for a new CPU the lower-power
CPU might no longer be available. Abort the attempt to migrate the task
in this case.

CRs-Fixed: 764788
Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 2c320f2ffa sched: Add temperature to cpu_load trace point
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.

CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:43:48 -08:00
Olav Haugan 0c0d18bb15 sched: Only do EA migration when CPU throttling is imminent
We do not want to migrate tasks unnecessarily, in order to avoid cache-related
and other migration latencies that could affect the performance of the system.
Add a check to only try EA migration when CPU frequency throttling is
imminent.

CRs-Fixed: 764788
Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-12-13 06:34:55 -08:00
Srivatsa Vaddagiri 32c6ac7c62 sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when they go idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as more favorable targets for running a task,
leading to the task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in the tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by the sysctl_sched_min_runtime
variable).

Note that currently the sysctl_sched_min_runtime setting is considered
only in the scheduler_tick()->migration_needed() path and not in the
idle_balance() path. In other words, a task could be migrated to
another cpu that did an idle_balance(). This limitation should not
affect the high-frequency migrations typically seen (when a single
high-demand task runs on a high-performance cpu).
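
A hedged sketch of the gating check in the tick path; the names and the
example threshold are illustrative:

    /* Illustrative only: skip EA-driven migration in the tick path unless the
     * task has been running on this cpu for at least sched_min_runtime ns. */
    static unsigned long long sched_min_runtime = 2000000ULL;      /* e.g. 2ms */

    static int ea_migration_allowed(unsigned long long now_ns,
                                    unsigned long long run_start_ns)
    {
            return now_ns - run_start_ns >= sched_min_runtime;
    }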

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-13 06:34:55 -08:00
Steve Muckle 1bfb9a0dd3 sched: treat sync waker CPUs with 1 task as idle
When a CPU with one task performs a sync wakeup, its
one task is expected to sleep immediately so this CPU
should be treated as idle for the purposes of CPU selection
for the waking task.

This is only done when idle CPUs are the preferred targets
for non-small task wakeups. When prefer_idle is 0, the
CPU is left as non-idle in the selection logic so it is still
a preferred candidate for the sync wakeup.

Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:58 -08:00
Syed Rameez Mustafa b3c5c54d72 sched: extend sched_task_load tracepoint to indicate prefer_idle
The prefer_idle tunable determines whether the scheduler prefers an idle
CPU over a busy CPU when waking up a task. Knowing the correct
value of this tunable is essential to understanding the placement
decisions made in select_best_cpu().

Change-Id: I955d7577061abccb65d01f560e1911d9db70298a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-10 23:53:57 -08:00
Steve Muckle 84370f934b sched: extend sched_task_load tracepoint to indicate sync wakeup
Sync wakeups provide a hint to the scheduler about upcoming task
activity. Knowing which wakeups are sync wakeups from logs will
assist in workload analysis.

Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:56 -08:00
Steve Muckle ee9ddb5f3c sched: add sync wakeup recognition in select_best_cpu
If a wakeup is a sync wakeup, we need to discount the currently
running task's load from the waker's CPU as we calculate the best
CPU for the waking task to land on.
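
A minimal sketch of the load discount described above; illustrative, not the
actual select_best_cpu() code:

    /* Illustrative only: when evaluating the waker's CPU for a sync wakeup,
     * subtract the currently running (waker) task's load, since the waker is
     * expected to sleep shortly. */
    static unsigned long effective_cpu_load(unsigned long cpu_load,
                                            unsigned long waker_task_load,
                                            int sync, int is_waker_cpu)
    {
            if (sync && is_waker_cpu && cpu_load >= waker_task_load)
                    return cpu_load - waker_task_load;
            return cpu_load;
    }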

Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:55 -08:00
Srivatsa Vaddagiri 6e778f0cdc sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-10 23:53:54 -08:00
Steve Muckle 75d1c94217 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the sched_cpu_high_irqload
setting easily, so make it a runtime tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:53 -08:00
Steve Muckle 00acd0448b sched: trace: extend sched_cpu_load to print irqload
The irqload is used in determining whether CPUs are mostly idle
so it is useful to know this value while viewing scheduler traces.

Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:51 -08:00
Steve Muckle 51f0d7663b sched: avoid CPUs with high irq activity
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.

Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 23:53:47 -08:00
Steve Muckle a14e01109a sched: refresh sched_clock() after acquiring rq lock in irq path
The wallclock time passed to sched_account_irqtime() may be stale
after we wait to acquire the runqueue lock. This could cause problems
in update_task_ravg because a different CPU may have advanced
this CPU's window_start based on a more up-to-date wallclock value,
triggering a BUG_ON(window_start > wallclock).

Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:46 -08:00
Steve Muckle 5fdc1d3aaa sched: track soft/hard irqload per-RQ with decaying avg
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.

Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):

irqload per window        load value asymptote      # windows to > 10
2ms			  8			    n/a
3ms			  12			    7
4ms			  16			    4
5ms			  20			    3

Of course irqload will not be constant in each window, these are just
given as simple examples.
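
A standalone calculation reproducing the table above (steady per-window
irqload, decay factor 3/4):

    #include <stdio.h>

    int main(void)
    {
            double irq_per_window[] = { 2.0, 3.0, 4.0, 5.0 };       /* ms */

            for (int i = 0; i < 4; i++) {
                    double load = 0.0;
                    int windows_to_10 = -1;

                    for (int w = 1; w <= 100; w++) {
                            load = load * 3.0 / 4.0 + irq_per_window[i];
                            if (load > 10.0 && windows_to_10 < 0)
                                    windows_to_10 = w;
                    }

                    /* The asymptote is irq / (1 - 3/4) = 4 * irq. */
                    if (windows_to_10 < 0)
                            printf("%.0fms/window: asymptote %.0f, never exceeds 10\n",
                                   irq_per_window[i], irq_per_window[i] * 4.0);
                    else
                            printf("%.0fms/window: asymptote %.0f, >10 after %d windows\n",
                                   irq_per_window[i], irq_per_window[i] * 4.0,
                                   windows_to_10);
            }
            return 0;
    }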

Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:45 -08:00
Steve Muckle c5c90f6099 sched: do not set window until sched_clock is fully initialized
The system initially uses a jiffy-based sched clock. When the platform
registers a new timer for sched_clock, sched_clock can jump backwards.
Once sched_clock_postinit() runs it should be safe to rely on it.

Also sched_clock_cpu() relies on completion of sched_clock_init()
and until that happens sched_clock_cpu() returns zero. This is used
in the irq accounting path, which window-based stats rely upon.
So do not set window_start until sched_clock_cpu() is working.

Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-12-10 19:50:44 -08:00
Linux Build Service Account fea8806a70 Merge "sched: Fix inaccurate accounting for real-time task" 2014-12-06 14:38:08 -08:00
Linux Build Service Account cd2d717655 Merge "Revert "sched: update_rq_clock() must skip ONE update"" 2014-12-06 14:38:07 -08:00
Linux Build Service Account b4229d736e Merge "sched: Make RT tasks eligible for boost" 2014-12-05 00:05:48 -08:00
Syed Rameez Mustafa fce95c9a12 sched: Make RT tasks eligible for boost
During sched boost RT tasks currently end up going to the lowest
power cluster. This can be a performance bottleneck especially if
the frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.

Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-12-03 19:50:25 -08:00
Linux Build Service Account a5f4e12c8d Merge "sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster" 2014-12-03 16:31:51 -08:00
Linux Build Service Account 6be10b2d68 Merge "sched: Packing support until a frequency threshold" 2014-12-03 16:31:32 -08:00
Srivatsa Vaddagiri 66b5ce9db0 sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster
When the higher-power (performance) cluster has only one online cpu, we
currently let an idle cpu in the lower-power cluster pull a running task
from the performance cluster via active balance. Active balance for
power-aware reasons is supposed to be restricted to balancing within a
cluster, but the check for this is not implemented correctly.

Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:28:14 +05:30
Srivatsa Vaddagiri 8fd5aa3bf2 sched: Fix inaccurate accounting for real-time task
It is possible that rq->clock_task was not updated in put_prev_task(),
in which case we can potentially overcharge a real-time task for time
it did not run. This is because clock_task could be stale and not
represent the exact time the real-time task started running.

Fix this by forcing an update of rq->clock_task when a real-time task
starts running.

Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 15:09:11 +05:30
Srivatsa Vaddagiri 21357f54c1 Revert "sched: update_rq_clock() must skip ONE update"
This reverts commit ab2ff007fe
as it was found to cause some performance regressions.

Change-Id: Idd71fb04c77f5c9b0bc6bccc66b94ab5a7368471
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 14:46:32 +05:30
Srivatsa Vaddagiri 57da62614c sched: Packing support until a frequency threshold
Add another dimension for task packing, based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in a cluster as long as the
cluster frequency is less than the set threshold.

Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-12-02 11:48:30 +05:30
Linux Build Service Account 72de2d6cb8 Merge "sched: update_rq_clock() must skip ONE update" 2014-11-30 16:19:16 -08:00
Linux Build Service Account b4b0ebc5f9 Merge "sched: tighten up jiffy to sched_clock mapping" 2014-11-29 17:17:43 -08:00
Srivatsa Vaddagiri ab2ff007fe sched: update_rq_clock() must skip ONE update
Prevent large wakeup latencies from being accounted to the wrong task.

Change-Id: Ie9932acb8a733989441ff2dd51c50a2626cfe5c5
Cc: <stable@vger.kernel.org>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
CRs-Fixed: 755576
Patch-mainline: http://permalink.gmane.org/gmane.linux.kernel/1677324
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-25 12:28:35 +05:30
Linux Build Service Account aa285b577f Merge "sched: per-cpu mostly_idle threshold" 2014-11-20 15:36:30 -08:00
Linux Build Service Account 62b1d26801 Merge "sched: Add API to set task's initial task load" 2014-11-20 15:36:29 -08:00
Steve Muckle f17fe85baf sched: tighten up jiffy to sched_clock mapping
The tick code already tracks exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy
to sched_clock mapping that aligns windows between a caller
of sched_set_window and the scheduler itself.

Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-19 15:06:33 -08:00
Syed Rameez Mustafa a40d3ce56e sched: Avoid unnecessary load balance when tasks don't fit on dst_cpu
When considering pulling over a task that does not fit on the
destination CPU, make sure that the busiest group has exceeded its
capacity. While the change is applicable to all groups, the biggest
impact will be on migrating big tasks to little CPUs. This should
only happen when the big cluster is no longer capable of balancing
load within the cluster. This change should have no impact on single
cluster systems.

Change-Id: I6d1ef0e0d878460530f036921ce4a4a9c1e1394b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-13 12:24:31 -08:00
Steve Muckle e3d8a00dab sched: print sched_cpu_load tracepoint for all CPUs
When select_best_cpu() is called because a task is on a suboptimal
CPU, certain CPUs are skipped because moving the task there would
not make things any better. For the purposes of debugging though it
is useful to always see the state of all CPUs.

Change-Id: I76965663c1feef5c4cfab9909e477b0dcf67272d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-11-10 19:22:51 -08:00
Srivatsa Vaddagiri ed7d7749e9 sched: per-cpu mostly_idle threshold
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.

This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior be controlled in a
fine-grained manner.

Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-06 15:27:00 +05:30
Srivatsa Vaddagiri f0e281597c sched: Add API to set task's initial task load
Add a per-task attribute, init_load_pct, that is used to initialize
newly created children's initial task load. This helps important
applications launch their child tasks on cpus with highest capacity.

Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-11-05 14:26:59 +05:30
Syed Rameez Mustafa 297c4ccce8 sched: use C-states in non-small task wakeup placement logic
Currently when a non-small task wakes up, the task placement logic
first tries to find the least loaded CPU before breaking any ties
via the power cost of running the task on those CPUs. When the power
cost is also the same, however, the scheduler just selects the first CPU
it comes across. Use C-states to further break ties when the power
cost is the same for multiple CPUs. The scheduler will now pick a
CPU in the shallowest C-state.

Change-Id: Ie1401b305fa02758a2f7b30cfca1afe64459fc2b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-11-04 14:11:24 -08:00
Linux Build Service Account c44483c313 Merge "sched: Provide an easy method to log context switch latencies" 2014-10-24 22:51:02 -07:00
Linux Build Service Account 0f1e07cdf9 Merge "sched: take rq lock prior to saving idle task's mark_start" 2014-10-24 16:09:25 -07:00
Syed Rameez Mustafa a2006e83ba sched: Provide an easy method to log context switch latencies
Allow logging of various sections of context switch in order to derive
the worst case latencies associated with them. This is required for
scheduler profiling.

Change-Id: I3a5009cb3088cc7ace2cd3130d4d7b24e957bada
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-23 17:38:22 -07:00
Linux Build Service Account 27c06362c4 Merge "sched: update governor notification logic" 2014-10-22 15:58:33 -07:00
Steve Muckle eed96dfa2a sched: take rq lock prior to saving idle task's mark_start
When the idle task is being re-initialized during hotplug its
mark_start value must be retained. The runqueue lock must be
held when reading this value though to serialize this with
other CPUs that could update the idle task's window-based
statistics.

CRs-Fixed: 743991
Change-Id: I1bca092d9ebc32a808cea2b9fe890cd24dc868cd
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2014-10-22 15:12:41 -07:00
Srivatsa Vaddagiri f3386c7cfb sched: update governor notification logic
Make the criteria for notifying the governor per-cpu. The governor is
notified of any large change in a cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.

Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-15 14:57:18 -07:00
Srivatsa Vaddagiri f99927a703 sched: window-stats: Retain idle thread's mark_start
init_idle() is called on a cpu's idle thread once at bootup and
subsequently every time the cpu is hot-added. Since init_idle() calls
__sched_fork(), we end up blowing away the idle thread's ravg.mark_start
value. As a result we will fail to accurately maintain the cpu's
curr/prev_runnable_sum counters. The example below illustrates such a
failure:

CS = curr_runnable_sum, PS = prev_runnable_sum

t0 -> New window starts for CPU2
<after some_task_activity> CS = X, PS = Y
t1 -> <cpu2 is hot-removed. idle_task start's running on cpu2>
      At this time, cpu2_idle_thread.ravg.mark_start = t1

t1 -> t0 + W. One window elapses. CPU2 still hot-removed. We
	defer swapping CS and PS until some future task event occurs

t2 -> CPU2 hot-added.  _cpu_up()->idle_thread_get()->init_idle()
	->__sched_fork() results in cpu2_idle_thread.ravg.mark_start = 0

t3 -> Some task wakes on cpu2. Since mark_start = 0, we don't swap CS
	and PS => which is a BUG!

Fix this by retaining idle task's original mark_start value during
init_idle() call.
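
A hedged sketch of the fix; the structure and helpers below are stand-ins
for __sched_fork()/init_idle(), not the actual code:

    /* Illustrative only: preserve the idle thread's window-stats mark_start
     * across re-initialization during cpu hot-add. */
    struct ravg_sketch {
            unsigned long long mark_start;
    };

    static void reset_window_stats(struct ravg_sketch *r)
    {
            r->mark_start = 0;                  /* stands in for __sched_fork() */
    }

    static void init_idle_sketch(struct ravg_sketch *idle_ravg)
    {
            unsigned long long saved_mark_start = idle_ravg->mark_start;

            reset_window_stats(idle_ravg);              /* wipes mark_start */
            idle_ravg->mark_start = saved_mark_start;   /* fix: restore it  */
    }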

Change-Id: I4ac9bfe3a58fb5da8a6c7bc378c79d9930d17942
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-13 16:11:20 -07:00
Linux Build Service Account 53d2a04a26 Merge "sched: Stop task migration to busy CPUs due to power active balance" 2014-10-12 08:39:38 -07:00
Olav Haugan 72bbd4b7cb sched: Add checks for frequency change
We need to check for frequency change when a task is migrated due to
affinity change and during active balance.

Change-Id: I96676db04d34b5b91edd83431c236a1c28166985
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2014-10-09 15:37:22 -07:00
Srivatsa Vaddagiri 19b3f3f871 sched: Use absolute scale for notifying governor
Make the tunables used for deciding the need for notification be on an
absolute scale. The earlier scale (in percent terms relative to
cur_freq) does not work well with the available range of frequencies. For
example, a 100% tunable value would work well for the lower range of
frequencies but not for the higher range. Having the tunable on an
absolute scale makes tuning more realistic.

Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:56 -07:00
Srivatsa Vaddagiri 2568673dd6 sched: window-stats: Enhance cpu busy time accounting
rq->curr/prev_runnable_sum counters represent cpu demand from various
tasks that have run on a cpu. Any task that runs on a cpu will have a
representation in rq->curr_runnable_sum: its partial_demand value
will be included in rq->curr_runnable_sum. Since partial_demand is
derived from historical load samples for a task, rq->curr_runnable_sum
could represent "inflated/unrealistic" cpu usage. As an example, let's
say that a task with a partial_demand of 10ms runs for only 1ms on a cpu.
What is included in rq->curr_runnable_sum is 10ms (and not the actual
execution time of 1ms). This leads to cpu busy time being over-reported,
causing frequency to stay higher than necessary.

This patch fixes the cpu busy accounting scheme to strictly represent
actual usage. It also provides for conditional fixup of busy time upon
migration and upon heavy-task wakeup.

CRs-Fixed: 691443
Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 14:03:51 -07:00
Srivatsa Vaddagiri dababc266f sched: window-stats: ftrace event improvements
Add two new ftrace events:

* trace_sched_freq_alert, to log notifications sent
  to governor for requesting change in frequency.
* trace_sched_get_busy, to log cpu busytime information returned by
  scheduler

Extend existing ftrace events as follows:

* sched_update_task_ravg() event to log irqtime parameter
* sched_migration_update_sum() to log threadid which is being migrated
  (and thus responsible for update of curr_runnable_sum and
  prev_runnable_sum counters)

Change-Id: Ia68ce0953a2d21d319a1db7f916c51ff6a91557c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:47:29 -07:00
Srivatsa Vaddagiri 86df733742 sched: improve logic for alerting governor
Currently we send notifications to the governor without taking note of
cpus that are synchronized with regard to their frequency. As a result,
the scheduler could send pointless notifications (notification spam!).

Avoid this by considering synchronized cpus and alerting the governor only
when the highest demand of any cpu within the cluster far exceeds or falls
behind the current frequency.

Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-10-03 13:46:18 -07:00
Syed Rameez Mustafa 0b013c8593 sched: Stop task migration to busy CPUs due to power active balance
Power active balance should only be invoked when the destination CPU
is calling load balance with either a CPU_IDLE or a CPU_NEWLY_IDLE
environment. We do not want to push tasks towards busy CPUs even if they
are a more power-efficient place to run that task. This can cause
higher scheduling latencies due to the resulting load imbalance.

Change-Id: I8e0f242338887d189e2fc17acfb63586e7c40839
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-10-02 17:37:09 -07:00
Srivatsa Vaddagiri 6cb3d32976 sched: window-stats: Fix accounting bug in legacy mode
TASK_UPDATE event currently does not result in increment of
rq->curr_runnable_sum in legacy mode, which is wrong. As a result,
cpu busy time reported under legacy mode could be incorrect.

Change-Id: Ifa76c735a0ead23062c1a64faf97e7b801b66bf9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 802a513d90 sched: window-stats: Note legacy mode in fork() and exit()
In legacy mode, mark_task_starting() should avoid adding (new) task's
(initial) demand to rq->curr_runnable_sum and rq->prev_runnable_sum.
Similarly exit() should avoid removing (exiting) task's demand from
rq->curr_runnable_sum and rq->prev_runnable_sum (as those counters
don't include task's demand and partial_demand values in legacy mode).

Change-Id: I26820b1ac5885a9d681d363ec53d6866a2ea2e6f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Srivatsa Vaddagiri 718293c53c sched: Fix reference to stale task_struct in try_to_wake_up()
try_to_wake_up() currently drops p->pi_lock and later checks for the need
to notify the cpufreq governor on task migrations or wakeups. However the
woken task could exit between the time p->pi_lock is released and the
time the test for notification is run. As a result, the test for
notification could refer to an exited task. task_notify_on_migrate(p)
could thus lead to an invalid memory reference.

Fix this by running the test for notification with the task's pi_lock
held.

Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-15 12:00:46 +05:30
Syed Rameez Mustafa 37c0e84719 sched: Remove hack to enable/disable HMP scheduling extensions
The current method of turning HMP scheduling extensions on or off
based on the number of CPUs is inappropriate, as there may be SoCs with
4 or fewer cores that require the use of these extensions. Remove this
hack, as HMP extensions will now be enabled/disabled via command line
options.

Change-Id: Id44b53c2c3b3c3b83e1911a834e2c824f3958135
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-11 09:18:03 -07:00
Linux Build Service Account 7a62303fd9 Merge "sched: add check for cpu idleness when using C-state information" 2014-09-11 01:52:32 -07:00
Linux Build Service Account e81a3dc7f7 Merge "sched: extend sched_task_load tracepoint to indicate small tasks" 2014-09-11 01:52:31 -07:00
Linux Build Service Account cef2bfadb1 Merge "sched: Add C-state tracking to the sched_cpu_load trace event" 2014-09-09 04:48:06 -07:00
Linux Build Service Account 0dbd5f1b7b Merge "sched: window-stats: add a new AVG policy" 2014-09-09 04:47:32 -07:00
Linux Build Service Account 672d3eb95f Merge "sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks" 2014-09-09 00:57:10 -07:00
Srivatsa Vaddagiri 9e37153f17 sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks
A couple of bugs exist with incorrect use of cpu_online_mask in
pre/post_big_small_task() functions, leading to potentially incorrect
computation of load_scale_factor/capacity/nr_big/small_tasks.

pre/post_big_small_task_count_change() use cpu_online_mask in an
unreliable manner. While local_irq_disable() in
pre_big_small_task_count_change() ensures a cpu won't go away in
cpu_online_mask, nothing prevents a cpu from coming online
concurrently. As a result, cpu_online_mask used in
pre_big_small_task_count_change() can be inconsistent with that used
in post_big_small_task_count_change() which can lead to an attempt to
unlock rq->lock which was not taken before.

Secondly, when either max_possible_freq or min_max_freq is changing,
it needs to trigger recomputation of load_scale_factor and capacity
for *all* cpus, even if some are offline. Otherwise, an offline cpu
could later come online with incorrect load_scale_factor/capacity.

While it should be sufficient to scan online cpus for
updating their nr_big/small_tasks in
post_big_small_task_count_change(), unfortunately it sounds pretty
hard to provide a stable cpu_online_mask when its called from
cpufreq_notifier_policy(). cpufreq framework can trigger a
CPUFREQ_NOTIFY notification in multiple contexts, some in cpu-hotplug
paths, which makes it pretty hard to guess whether get_online_cpus()
can be taken without causing deadlocks or not. To workaround the
insufficient information we have about the hotplug-safety context when
CPUFREQ_NOTIFY is issued, have post_big_small_task_count_change()
traverse all possible cpus in updating nr_big/small_task_count.

CRs-Fixed: 717134
Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-08 17:18:24 -07:00
Syed Rameez Mustafa 04953f4035 sched: add check for cpu idleness when using C-state information
Task enqueue on a CPU occurs prior to that CPU exiting an idle state.
For the time duration between enqueue and idle exit, the CPU C-state
information can no longer be relied on for further task placement
since already enqueued/waiting tasks are not taken into account. The
small task placement algorithm implicitly assumes that a non-zero C-state
implies an idle CPU. Since this assumption is incorrect for the
duration described above, make the cpu_idle() check explicit. This
problem can lead to task packing beyond the mostly_idle threshold.

Change-Id: Idb5be85705d6b15f187d011ea2196e1bfe31dbf2
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 15:25:11 -07:00
Syed Rameez Mustafa 444e5dee14 sched: extend sched_task_load tracepoint to indicate small tasks
While debugging, it is always useful to know whether a task is small or
not in order to determine the scheduling algorithm being used. Have the
sched_task_load tracepoint indicate this information rather than
having to do manual calculations for every task placement.

Change-Id: Ibf390095f05c7da80df1ebfe00f4c5af66c97d12
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 14:40:58 -07:00
Syed Rameez Mustafa e85e73f1d7 sched: Add C-state tracking to the sched_cpu_load trace event
C-state information is used by the scheduler for small task placement
decisions. Track this information in the sched_cpu_load trace event.
Also add the trace event in best_small_task_cpu(). This will help
better understand small task placement decisions.

Change-Id: Ife5f05bba59f85c968fab999bd13b9fb6b1c184e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:29:57 -07:00
Syed Rameez Mustafa bf3e6c0e55 sched: window-stats: add a new AVG policy
The current WINDOW_STATS_AVG policy is actually a misnomer since it
uses the maximum of the runtime in the recent window and the
average of the past ravg_hist_size windows. Add a policy that only
uses the average and call it the WINDOW_STATS_AVG policy. Rename all the
other policies to make them shorter and unambiguous.

Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-08 11:07:41 -07:00
Linux Build Service Account c5bc590f13 Merge "sched: Fix compile error" 2014-09-07 08:21:53 -07:00
Srivatsa Vaddagiri 594ce07f48 sched: Fix compile error
sched_get_busy(), sched_set_io_is_busy() and sched_set_window() need
to be defined only when CONFIG_SCHED_FREQ_INPUT is defined, otherwise
we get a compilation error related to the dual definition of those routines.
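
A minimal sketch of the usual guard pattern for such a case; the signature
below is a placeholder, not necessarily the real prototype:

    /* Illustrative only: one real definition under the config option, and a
     * static inline stub otherwise, so callers always see exactly one copy. */
    #ifdef CONFIG_SCHED_FREQ_INPUT
    extern int sched_set_window(unsigned long long window_start,
                                unsigned int window_size);
    #else
    static inline int sched_set_window(unsigned long long window_start,
                                       unsigned int window_size)
    {
            return -22;     /* -EINVAL */
    }
    #endif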

Change-Id: Ifd5c9b6675b78d04c2f7ef0e24efeae70f7ce19b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-09-04 12:14:38 +05:30
Syed Rameez Mustafa e4600ab9eb sched: update ld_moved for active balance from the load balancer
ld_moved is currently left set to 0 when the load balancer calls upon
active balance. This behavior is incorrect as it prevents the termination
of load balance for parent sched domains. Currently the feature is used
quite frequently for power active balance and sched boost. This means that
while sched boost is in effect we could run into a scenario where a more
power efficient newly idle big CPU first triggers active migration from a
less power efficient busy big CPU. It then continues to load balance at the
cluster level causing active migration for a task running on a little CPU.
Consequently the more power efficient big CPU ends up with two tasks whereas
the less power efficient big CPU may become idle. Fix this problem by
updating ld_moved when active migration has been requested.

Change-Id: I52e84eafb77249fd9378ebe531abe2d694178537
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 20:01:44 -07:00
Syed Rameez Mustafa 5f5ecf01d3 sched: actively migrate tasks to idle big CPUs during sched boost
The sched boost feature is currently tick driven, i.e. task placement
decisions only take place at a tick (or wakeup). The load balancer
does not have any knowledge of boost being in effect. Tasks that are
woken up on a little CPU when all big CPUs are busy will continue
executing there at least until the next tick even if one of the big
CPUs becomes idle. Reduce this latency by adding support for detecting
whether boost is in effect or not in the load balancer.  If boost is
in effect any big CPU running idle balance will trigger active
migration from a little CPU with the highest task load.

Change-Id: Ib2828809efa0f9857f5009b29931f63b276a59f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:42:15 -07:00
Syed Rameez Mustafa d3990aabb5 sched: always do idle balance with a NEWLY_IDLE idle environment
With the introduction of energy aware scheduling, if idle_balance() is
to be called on behalf of a different CPU which is idle, CPU_IDLE is
used in the environment for load_balance(). This, however, introduces
subtle differences in load calculations and policies in the load
balancer. For example there are restrictions on which CPU is permitted
to do load balancing during !CPU_NEWLY_IDLE (see update_sg_lb_stats)
and find_busiest_group() uses different criteria to detect the
presence of a busy group. There are other differences as well. Revert
back to using the NEWLY_IDLE environment irrespective of whether
idle_balance() is called for the newly idle CPU or on behalf of an
already idle CPU. This will ensure that the task movement logic
while doing idle balance remains unaffected.

Change-Id: I388b0ad9a38ca550667895c8ed19628f3d25ce1a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:41 -07:00
Syed Rameez Mustafa 9c37494817 sched: fix bail condition in bail_inter_cluster_balance()
Following commit efcad25cbfb (revert "sched: influence cpu_power based
on max_freq and efficiency"), all CPUs in the system have the same
cpu_power and consequently the same group capacity. Therefore, the
check in bail_inter_cluster_balance() can now no longer be used to
distinguish a higher performance cluster from one with lower
performance. The check is currently broken and always returns true for
every load balancing attempt. Fix this by using runqueue capacity
instead which can still be used as a good measure of cluster
capabilities.

Also the logic for distinguishing between idle environments and using
a different sched group capacity in update_sd_pick_busiest() is
redundant. sgs->group_capacity would now always be equal to the number
of CPUs in the group. Use sgs->group_capacity directly in conditional
checks in that function.

Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2014-09-03 19:23:40 -07:00
Srivatsa Vaddagiri 1b36dc118d sched: Initialize env->loop variable to 0
The load_balance() function does not explicitly initialize the env->loop
variable to 0. As a result, there is a vague possibility of
move_tasks() hitting a very long (unnecessary) loop when it is unable to
move tasks from src_cpu. This can lead to unpleasant results like a
watchdog bark. Fix this by explicitly initializing env->loop variable
to 0 (in both load_balance() and active_load_balance_cpu_stop()).

Change-Id: I36b84c91a9753870fa16ef9c9339db7b706527be
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2014-08-25 16:07:57 +05:30
Linux Build Service Account a4e6dcf42b Merge "sched: window-stats: use policy_mutex in sched_set_window()" 2014-08-24 20:01:43 -07:00
Linux Build Service Account 355e55afc6 Merge "sched: window-stats: Avoid taking all cpu's rq->lock for long" 2014-08-24 20:01:43 -07:00
Linux Build Service Account d7bca8f374 Merge "sched: window_stats: Add "disable" mode support" 2014-08-24 20:01:41 -07:00
Linux Build Service Account bf7b729348 Merge "sched: window-stats: Fix exit race" 2014-08-24 20:01:41 -07:00