android_kernel_lge_bullhead

Commit Graph

Author	SHA1	Message	Date
Wanpeng Li	693f42f492	sched/cputime: Fix prev steal time accouting during CPU hotplug Commit: e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug") ... set rq->prev_* to 0 after a CPU hotplug comes back, in order to fix the case where (after CPU hotplug) steal time is smaller than rq->prev_steal_time. However, this should never happen. Steal time was only smaller because of the KVM-specific bug fixed by the previous patch. Worse, the previous patch triggers a bug on CPU hot-unplug/plug operation: because rq->prev_steal_time is cleared, all of the CPU's past steal time will be accounted again on hot-plug. Since the root cause has been fixed, we can just revert commit e9532e69b8d1. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")' Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-08-26 16:02:24 +02:00
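The double-accounting described in this message can be sketched in user-space C. This is an illustrative model only, not the kernel code: `struct rq_sketch`, `account_steal()` and `replay()` are hypothetical names, and the model assumes a steal clock that keeps counting monotonically across CPU hotplug (the situation the message describes once the KVM bug is fixed).

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-runqueue steal-time bookkeeping. */
struct rq_sketch {
    uint64_t prev_steal_time; /* last steal-clock value already accounted */
    uint64_t accounted;       /* total steal time charged so far */
};

/* Charge the steal time that accumulated since the previous call. */
void account_steal(struct rq_sketch *rq, uint64_t steal_clock)
{
    uint64_t delta = steal_clock - rq->prev_steal_time;

    rq->accounted += delta;
    rq->prev_steal_time += delta;
}

/*
 * Replay an unplug/replug cycle. The steal clock reads 500 before the
 * CPU goes down and 520 after it comes back. If prev_steal_time is
 * cleared on hotplug, the first 500 units are charged a second time.
 */
uint64_t replay(int reset_on_hotplug)
{
    struct rq_sketch rq = { 0, 0 };

    account_steal(&rq, 500);
    if (reset_on_hotplug)
        rq.prev_steal_time = 0; /* models the behaviour being reverted */
    account_steal(&rq, 520);
    return rq.accounted;
}
```

Under these assumptions `replay(0)` charges 520 units in total, while `replay(1)` charges 1020, which is the over-accounting the revert is meant to remove.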
Nathan Chancellor	459f05e480	This is the 3.10.102 stable release Merge 3.10.102 into android-msm-bullhead-3.10-oreo-m5 Changes in 3.10.102: (144 commits) pipe: Fix buffer offset after partially failed read x86/iopl/64: Properly context-switch IOPL on Xen PV ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() compiler-gcc: 
integrate the various compiler-gcc[345].h files x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id" KVM: i8254: change PIT discard tick policy KVM: fix spin_lock_init order on x86 EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() PCI: Disable IO/MEM decoding for devices with non-compliant BARs linux/const.h: Add _BITUL() and _BITULL() x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE x86, processor-flags: Fix the datatypes and add bit number defines x86/iopl: Fix iopl capability check on Xen PV sg: fix dxferp in from_to case aacraid: Fix memory leak in aac_fib_map_free be2iscsi: set the boot_kset pointer to NULL in case of failure usb: retry reset if a device times out USB: cdc-acm: more sanity checking USB: iowarrior: fix oops with malicious USB descriptors USB: usb_driver_claim_interface: add sanity checking USB: mct_u232: add sanity checking in probe USB: digi_acceleport: do sanity checking for the number of ports USB: cypress_m8: add endpoint sanity check USB: serial: cp210x: Adding GE Healthcare Device ID USB: option: add "D-Link DWM-221 B1" device id pwc: Add USB id for Philips Spc880nc webcam Input: powermate - fix oops with malicious USB descriptors net: irda: Fix use-after-free in irtty_open() 8250: use callbacks to access UART_DLL/UART_DLM bttv: Width must be a multiple of 16 when capturing planar formats media: v4l2-compat-ioctl32: fix missing length copy in put_v4l2_buffer32 ALSA: intel8x0: Add clock quirk entry for AD1981B on IBM ThinkPad X41. 
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path bcache: fix cache_set_flush() NULL pointer dereference on OOM watchdog: rc32434_wdt: fix ioctl error handling splice: handle zero nr_pages in splice_to_pipe() xtensa: ISS: don't hang if stdin EOF is reached xtensa: clear all DBREAKC registers on start md/raid5: Compare apples to apples (or sectors to sectors) rapidio/rionet: fix deadlock on SMP ipr: Fix out-of-bounds null overwrite ipr: Fix regression when loading firmware drm/radeon: Don't drop DP 2.7 Ghz link setup on some cards. tracing: Have preempt(irqs)off trace preempt disabled functions tracing: Fix crash from reading trace_pipe with sendfile tracing: Fix trace_printk() to print when not using bprintk() scripts/coccinelle: modernize & Input: ims-pcu - sanity check against missing interfaces Input: ati_remote2 - fix crashes on detecting device with invalid descriptor ocfs2/dlm: fix race between convert and recovery ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list mtd: onenand: fix deadlock in onenand_block_markbad sched/cputime: Fix steal time accounting vs. CPU hotplug perf/x86/intel: Fix PEBS data source interpretation on Nehalem/Westmere hwmon: (max1111) Return -ENODEV from max1111_read_channel if not instantiated parisc: Avoid function pointers for kernel exception routines parisc: Fix kernel crash with reversed copy_from_user() ALSA: timer: Use mod_timer() for rearming the system timer net: jme: fix suspend/resume on JMC260 sctp: lack the check for ports in sctp_v6_cmp_addr ipv6: re-enable fragment header matching in ipv6_find_hdr cdc_ncm: toggle altsetting to force reset before setup usbnet: cleanup after bind() in probe() udp6: fix UDP/IPv6 encap resubmit path sh_eth: fix NULL pointer dereference in sh_eth_ring_format() net: Fix use after free in the recvmmsg exit path farsync: fix off-by-one bug in fst_add_one ath9k: fix buffer overrun for ar9287 qlge: Fix receive packets drop. 
ppp: take reference on channels netns qmi_wwan: add "D-Link DWM-221 B1" device id ipv4: l2tp: fix a potential issue in l2tp_ip_recv ipv6: l2tp: fix a potential issue in l2tp_ip6_recv ip6_tunnel: set rtnl_link_ops before calling register_netdevice usb: renesas_usbhs: avoid NULL pointer derefernce in usbhsf_pkt_handler() usb: renesas_usbhs: disable TX IRQ before starting TX DMAC transfer ext4: add lockdep annotations for i_data_sem HID: usbhid: fix inconsistent reset/resume/reset-resume behavior drm/radeon: hold reference to fences in radeon_sa_bo_new (3.17 and older) usbvision-video: fix memory leak of alt_max_pkt_size usbvision: fix leak of usb_dev on failure paths in usbvision_probe() usbvision: fix crash on detecting device with invalid configuration usb: xhci: fix wild pointers in xhci_mem_cleanup usb: hcd: out of bounds access in for_each_companion crypto: gcm - Fix rfc4543 decryption crash nl80211: check netlink protocol in socket release notification Input: gtco - fix crash on detecting device without endpoints i2c: cpm: Fix build break due to incompatible pointer types EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback ASoC: s3c24xx: use const snd_soc_component_driver pointer efi: Fix out-of-bounds read in variable_matches() workqueue: fix ghost PENDING flag while doing MQ IO USB: usbip: fix potential out-of-bounds write paride: make 'verbose' parameter an 'int' again fbdev: da8xx-fb: fix videomodes of lcd panels misc/bmp085: Enable building as a module rtc: vr41xx: Wire up alarm_irq_enable drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors include/linux/poison.h: fix LIST_POISON{1,2} offset Drivers: hv: vmbus: prevent cpu offlining on newer hypervisors perf stat: Document --detailed option ARM: OMAP3: Add cpuidle parameters table for omap3430 compiler-gcc: disable -ftracer for __noclone functions ipvs: correct initial offset of Call-ID header search in SIP persistence engine nbd: ratelimit error msgs after socket close clk: 
versatile: sp810: support reentrance lpfc: fix misleading indentation ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel proc: prevent accessing /proc/<PID>/environ until it's ready batman-adv: Fix broadcast/ogm queue limit on a removed interface MAINTAINERS: Remove asterisk from EFI directory names ACPICA: Dispatcher: Update thread ID for recursive method calls USB: serial: cp210x: add ID for Link ECU USB: serial: cp210x: add Straizona Focusers device ids Input: ads7846 - correct the value got from SPI powerpc: scan_features() updates incorrect bits for REAL_LE crypto: hash - Fix page length clamping in hash walk get_rock_ridge_filename(): handle malformed NM entries Input: max8997-haptic - fix NULL pointer dereference asmlinkage, pnp: Make variables used from assembler code visible ARM: OMAP3: Fix booting with thumb2 kernel decnet: Do not build routes to devices without decnet private data. route: do not cache fib route info on local routes with oif packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface atl2: Disable unimplemented scatter/gather feature net: fix infoleak in llc net: fix infoleak in rtnetlink VSOCK: do not disconnect socket when peer has shutdown SEND only net: bridge: fix old ioctl unlocked net device walk net: fix a kernel infoleak in x25 module fs/cifs: correctly to anonymous authentication via NTLMSSP ring-buffer: Use long for nr_pages to avoid overflow failures ring-buffer: Prevent overflow of size in ring_buffer_resize() mfd: omap-usb-tll: Fix scheduling while atomic BUG mmc: mmc: Fix partition switch timeout for some eMMCs mmc: longer timeout for long read time quirk Bluetooth: vhci: purge unhandled skbs USB: serial: keyspan: fix use-after-free in probe error path USB: serial: quatech2: fix use-after-free in probe error path USB: serial: io_edgeport: fix memory leaks in probe error path USB: serial: option: add support for Cinterion PH8 and AHxx tty: vt, return error when con_startup fails serial: samsung: Reorder the 
sequence of clock control when call s3c24xx_serial_set_termios() Linux 3.10.102 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Conflicts: drivers/media/v4l2-core/v4l2-compat-ioctl32.c fs/pipe.c kernel/trace/trace_printk.c net/core/rtnetlink.c net/socket.c	2018-01-25 17:24:10 -07:00
Thomas Gleixner	0579a12791	sched/cputime: Fix steal time accounting vs. CPU hotplug commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream. On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time value over CPU down and up. So after the CPU comes up again the delta calculation in steal_account_process_tick() wreckages itself due to the unsigned math: u64 steal = paravirt_steal_clock(smp_processor_id()); steal -= this_rq()->prev_steal_time; So if steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting. As a consequence the per CPU stats in /proc/stat become stale. Nice trick to tell the world how idle the system is (100%) while the CPU is 100% busy running tasks. Though we prefer realistic numbers. None of the accounting values which use a previous value to account for fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity check for prev_irq_time and prev_steal_time_rq, but that sanity check solely deals with clock warps and limits the /proc/stat visible wreckage. The prev_time values are still wrong. Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: commit `095c0aa83e` "sched: adjust scheduler cpu power for stolen time" Fixes: commit `aa48380851` "sched: Remove irq time from available CPU power" Fixes: commit `e6e6685acc` "KVM guest: Steal time accounting" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:48 +02:00
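The unsigned-math problem quoted in this message can be reproduced in isolation. The sketch below is plain user-space C, not the scheduler code; `steal_delta()` and `steal_delta_wraps()` are hypothetical helper names that mirror the quoted subtraction.

```c
#include <stdint.h>

/*
 * Naive delta, mirroring the two lines quoted in the commit message.
 * Both operands are unsigned, so the subtraction wraps modulo 2^64
 * whenever steal_now is smaller than prev_steal_time.
 */
uint64_t steal_delta(uint64_t steal_now, uint64_t prev_steal_time)
{
    return steal_now - prev_steal_time;
}

/* Hypothetical helper: returns 1 if the delta above would wrap. */
int steal_delta_wraps(uint64_t steal_now, uint64_t prev_steal_time)
{
    return steal_now < prev_steal_time;
}
```

With a stale `prev_steal_time` of 250 and a fresh clock reading of 100, `steal_delta(100, 250)` yields 2^64 - 150 rather than a small negative number, which is the "insane large value" the message refers to.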
Joonwoo Park	a6bd7ae282	sched: inline function scale_load_to_cpu() Inline relatively small and frequently used function scale_load_to_cpu(). CRs-fixed: 849655 Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:31 -07:00
Syed Rameez Mustafa	6f717a4bf8	sched: Optimize select_best_cpu() to reduce execution time select_best_cpu() is a crucial wakeup routine that determines the time taken by the scheduler to wake up a task. Optimize this routine to get higher performance. The following changes have been made as part of the optimization, listed in the order in which they build on one another: * Several routines called by select_best_cpu() recalculate task load and CPU load even though these are already known quantities. For example, mostly_idle_cpu_sync() calculates CPU load; task_will_fit() calculates task load before spill_threshold_crossed() recalculates both. Remove these redundant calculations by moving the task load and CPU load computations to the select_best_cpu() 'for' loop and passing the results to any functions that need the information. * Rewrite best_small_task_cpu() to avoid the existing two pass approach. The two pass approach was only in place to find the minimum power cluster for small task placement. This information can easily be established by looking at runqueue capacities. The cluster that does not have the highest capacity constitutes the minimum power cluster. A special CPU mask called the mpc_mask is required to safeguard against undue side effects on SMP systems. Also terminate the function early if the previous CPU is found to be mostly_idle. * Reorganize code to ensure that no unnecessary computations or variable assignments are done. For example, there is no need to compute CPU load if that information does not end up getting used in any iteration of the 'for' loop. * The tick logic for EA migrations unnecessarily checks the power of all CPUs only for skip_cpu() to throw away the result later. Ensure that for EA we only check CPUs within the same cluster and avoid running select_best_cpu() whenever possible. 
CRs-fixed: 849655 Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask. added a comment about prerequisite of lower_power_cpu_available(). s/struct rq * rq/struct rq *rq/.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2015-08-24 17:45:09 -07:00
Ramakrishnan Ganesh	8c9b8b6409	Revert "sched: Use only partial wait time as task demand" This reverts commit 2a7c208b28d7558a5e296cec662c691590b0652a. Change-Id: Ie02645574fd93f4149aa6a3f355c71eb412c9bac Signed-off-by: Ramakrishnan Ganesh <ramakris@codeaurora.org>	2015-05-28 12:17:00 -07:00
Syed Rameez Mustafa	a77f89168a	sched: Use only partial wait time as task demand The scheduler currently either considers a task's entire wait time as task demand or completely ignores wait time, based on the tunable sched_account_wait_time. Both approaches have their limitations, however. The former artificially boosts a task's demand when it may not actually be justified. With the latter, the scheduler runs the risk of never being able to recognize true load (consider two CPU hogs on a single little CPU). To achieve a compromise between these two extremes, change the load tracking algorithm to only consider part of a task's wait time as its demand. The portion of wait time accounted as demand is determined by each task's percent load, e.g. for a task that waits for 10 ms and has 60% task load, only 6 ms of the wait will contribute to task demand. This approach is fairer, as the scheduler now tries to determine how much of its wait time a task would actually have spent using the CPU had it been executing. It ensures that tasks with high demand continue to see most of the benefits of accounting wait time as busy time; lower demand tasks, however, don't experience a disproportionately high boost to demand that triggers unjustified big CPU usage. Note that this new approach is only applicable to wait time being considered as task demand and not wait time considered as CPU busy time. To achieve the above effect, ensure that anytime a task is waiting, its runtime in every relevant window segment is appropriately adjusted using its pct load. Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-05-28 12:16:58 -07:00
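The scaling rule in this message (wait time weighted by the task's percent load) reduces to one line of arithmetic. The following is a minimal sketch assuming integer microseconds and a percentage in the range 0..100; `wait_time_as_demand()` is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portion of a task's wait time that counts toward its demand,
 * weighted by the task's percent load (0..100). Units are whatever
 * the caller uses for wait_us; microseconds are assumed here.
 */
uint64_t wait_time_as_demand(uint64_t wait_us, unsigned int pct_load)
{
    assert(pct_load <= 100);
    return wait_us * pct_load / 100;
}
```

For the example in the message, a 10 ms wait (10000 us) at 60% load gives `wait_time_as_demand(10000, 60) == 6000`, i.e. 6 ms of demand.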
Syed Rameez Mustafa	d191fa671d	sched: Update max_capacity when an entire cluster is hotplugged When an entire cluster is hotplugged, the scheduler's notion of max_capacity can get outdated. This introduces the following inefficiencies in behavior: * task_will_fit() does not return true on all tasks. Consequently all big tasks go through fallback CPU selection logic, skipping C-state and power checks in select_best_cpu(). * During boost, migration_needed() returns true unnecessarily, causing an avoidable rerun of select_best_cpu(). * An unnecessary kick is sent to all little CPUs when boost is set. * An opportunity for early bailout from nohz_kick_needed() is lost. Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback, which indicates the last CPU in a cluster being hotplugged out. Also modify update_min_max_capacity() to only iterate through online CPUs instead of possible CPUs. While we can't guarantee the integrity of the cpu_online_mask in the notifier callback, the scheduler will fix up all state soon after any changes to the online mask. The change does have one side effect: early termination from the notifier callback when min_max_freq or max_possible_freq remain unchanged is no longer possible. This is because when the last CPU in a cluster is hot removed, only max_capacity is updated without affecting min_max_freq or max_possible_freq. Therefore, when the first CPU in the same cluster gets hot added at a later point, max_capacity must once again be recomputed despite there being no change in min_max_freq or max_possible_freq. Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2015-05-01 19:40:50 -07:00
Srivatsa Vaddagiri	cbf9e59b96	sched: Add cgroup-based criteria for upmigration It may be desirable to discourage upmigration of tasks belonging to some cgroups. Add a per-cgroup flag (upmigrate_discourage) that discourages upmigration of tasks of a cgroup. Tasks of the cgroup are allowed to upmigrate only under an overcommitted scenario. Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-03-05 15:50:56 -08:00
Srivatsa Vaddagiri	8f9ba192b6	sched: Keep track of average nr_big_tasks Extend sched_get_nr_running_avg() API to return average nr_big_tasks, in addition to average nr_running and average nr_io_wait tasks. Also add a new trace point to record values returned by sched_get_nr_running_avg() API. Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-02-26 10:37:26 -08:00
Srivatsa Vaddagiri	3f116af70d	sched: Fix bug in average nr_running and nr_iowait calculation sched_get_nr_running_avg() returns average nr_running and nr_iowait task count since it was last invoked. Fix several bugs in their calculation. * sched_update_nr_prod() needs to consider that nr_running count can change by more than 1 when CFS_BANDWIDTH feature is used * sched_get_nr_running_avg() needs to sum up nr_iowait count across all cpus, rather than just one * sched_get_nr_running_avg() could race with sched_update_nr_prod(), as a result of which it could use curr_time which is behind a cpu's 'last_time' value. That would lead to erroneous calculation of average nr_running or nr_iowait. While at it, fix also a bug in BUG_ON() check in sched_update_nr_prod() function and remove unnecessary nr_running argument to sched_update_nr_prod() function. Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-02-26 10:33:52 -08:00
Srivatsa Vaddagiri	2385d33016	sched: Support CFS_BANDWIDTH feature in HMP scheduler CFS_BANDWIDTH feature is not currently well-supported by HMP scheduler. Issues encountered include a kernel panic when rq->nr_big_tasks count becomes negative. This patch fixes HMP scheduler code to better handle CFS_BANDWIDTH feature. The most prominent change introduced is maintenance of HMP stats (nr_big_tasks, nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in addition to being maintained in each 'struct rq'. This allows HMP stats to be updated easily when a group is throttled on a cpu. Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-28 14:13:19 +05:30
Srivatsa Vaddagiri	bbef4c5e1b	sched: Consolidate hmp stats into their own struct Key hmp stats (nr_big_tasks, nr_small_tasks and cumulative_runnable_average) are currently maintained per-cpu in 'struct rq'. Merge those stats in their own structure (struct hmp_sched_stats) and modify impacted functions to deal with the newly introduced structure. This cleanup is required for a subsequent patch which fixes various issues with use of CFS_BANDWIDTH feature in HMP scheduler. Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2015-01-28 14:13:14 +05:30
Linux Build Service Account	380cadc7f3	Merge "sched: Per-cpu prefer_idle flag"	2014-12-29 17:31:47 -08:00
Srivatsa Vaddagiri	599bfc7503	sched: Per-cpu prefer_idle flag Remove the global sysctl_sched_prefer_idle flag and replace it with a per-cpu prefer_idle flag. The per-cpu flag is expected to be the same for all cpus in a cluster. It thus provides a convenient means to disable packing in one cluster while allowing packing in another cluster. Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-23 09:52:43 +05:30
Olav Haugan	7e13b27b8b	sched: Add sysctl to enable power aware scheduling Add sysctl to enable energy awareness at runtime. This is useful for performance/power tuning/measurements and debugging. In addition this will match up with the Documentation/scheduler/sched-hmp.txt documentation. Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-22 14:37:33 -08:00
Joonwoo Park	fc994a4b9e	sched: take account of irq preemption when calculating irqload delta If an irq is raised while sched_irqload() is calculating the irqload delta, sched_account_irqtime() can update the rq's irqload_ts to a value greater than the jiffies stored in sched_irqload()'s context, so the delta can be negative. A negative delta means there was a recent irq occurrence, so remove the improper BUG_ON(). CRs-fixed: 771894 Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>	2014-12-16 16:56:50 -08:00
Olav Haugan	2c320f2ffa	sched: Add temperature to cpu_load trace point Add the current CPU temperature to the sched_cpu_load trace point. This will allow us to track the CPU temperature. CRs-Fixed: 764788 Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-12-13 06:43:48 -08:00
Steve Muckle	75d1c94217	sched: make sched_cpu_high_irqload a runtime tunable It may be desirable to be able to alter the sched_cpu_high_irqload setting easily, so make it a runtime tunable value. Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:53 -08:00
Steve Muckle	51f0d7663b	sched: avoid CPUs with high irq activity CPUs with significant IRQ activity will not be able to serve tasks quickly. Avoid them if possible by disqualifying such CPUs from being recognized as mostly idle. Change-Id: I2c09272a4f259f0283b272455147d288fce11982 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 23:53:47 -08:00
Steve Muckle	5fdc1d3aaa	sched: track soft/hard irqload per-RQ with decaying avg The scheduler currently ignores irq activity when deciding which CPUs to place tasks on. If a CPU is getting hammered with IRQ activity but has no tasks it will look attractive to the scheduler as it will not be in a low power mode. Track irqload with a decaying average. This quantity can be used in the task placement logic to avoid CPUs which are under high irqload. The decay factor is 3/4. Note that with this algorithm the tracked irqload quantity will be higher than the actual irq time observed in any single window. Some sample outcomes with steady irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is used as a threshold in a subsequent patch), with columns irqload per window | load value asymptote | # windows to > 10: 2ms | 8 | n/a; 3ms | 12 | 7; 4ms | 16 | 4; 5ms | 20 | 3. Of course irqload will not be constant in each window; these are just given as simple examples. Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-12-10 19:50:45 -08:00
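The sample figures in this message are consistent with a simple recurrence in which each window multiplies the tracked load by 3/4 and then adds that window's irq time. The sketch below assumes that recurrence and uses floating point for clarity; the function names are hypothetical and the kernel's actual integer implementation may differ.

```c
/* One window of the assumed recurrence: decay by 3/4, then add new irq time. */
double irqload_step(double load, double irq_ms)
{
    return load * 0.75 + irq_ms;
}

/*
 * Number of windows of steady irq_ms load needed before the tracked
 * value exceeds threshold, or -1 if it does not within max_windows.
 */
int windows_to_exceed(double irq_ms, double threshold, int max_windows)
{
    double load = 0.0;

    for (int n = 1; n <= max_windows; n++) {
        load = irqload_step(load, irq_ms);
        if (load > threshold)
            return n;
    }
    return -1;
}
```

Under this model a steady load of x ms per window converges to 4x (x divided by 1 - 3/4), which gives the asymptotes 8, 12, 16 and 20. `windows_to_exceed()` with a threshold of 10 returns 7, 4 and 3 for 3 ms, 4 ms and 5 ms, and -1 for 2 ms, since that series never rises above 8.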
Linux Build Service Account	b4229d736e	Merge "sched: Make RT tasks eligible for boost"	2014-12-05 00:05:48 -08:00
Syed Rameez Mustafa	fce95c9a12	sched: Make RT tasks eligible for boost During sched boost RT tasks currently end up going to the lowest power cluster. This can be a performance bottleneck especially if the frequency and IPC differences between clusters are high. Furthermore, when RT tasks go over to the little cluster during boost, the load balancer keeps attempting to pull work over to the big cluster. This results in pre-emption of the executing RT task causing more delays. Finally, containing more work on a single cluster during boost might help save some power if the little cluster can then enter deeper low power modes. Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-12-03 19:50:25 -08:00
Srivatsa Vaddagiri	57da62614c	sched: Packing support until a frequency threshold Add another dimension for task packing based on frequency. This patch adds a per-cpu tunable, rq->mostly_idle_freq, which when set will result in tasks being packed on a single cpu in cluster as long as cluster frequency is less than set threshold. Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-12-02 11:48:30 +05:30
Srivatsa Vaddagiri	ed7d7749e9	sched: per-cpu mostly_idle threshold sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack tasks on cpus to some extent. In some cases, it may be desirable to have different packing limits for different cpus. For example, pack to a higher limit on high-performance cpus compared to power-efficient cpus. This patch removes the global mostly_idle tunables and makes them per-cpu, thus letting task packing behavior be controlled in a fine-grained manner. Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-11-06 15:27:00 +05:30
Srivatsa Vaddagiri	f3386c7cfb	sched: update governor notification logic Make criteria for notifying governor to be per-cpu. Governor is notified of any large change in cpu's busy time statistics (rq->prev_runnable_sum) since the last reported value. Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-10-15 14:57:18 -07:00
Srivatsa Vaddagiri	2568673dd6	sched: window-stats: Enhance cpu busy time accounting rq->curr/prev_runnable_sum counters represent cpu demand from various tasks that have run on a cpu. Any task that runs on a cpu will have a representation in rq->curr_runnable_sum. Their partial_demand value will be included in rq->curr_runnable_sum. Since partial_demand is derived from historical load samples for a task, rq->curr_runnable_sum could represent "inflated/un-realistic" cpu usage. As an example, let's say that a task with a partial_demand of 10ms runs for only 1ms on a cpu. What is included in rq->curr_runnable_sum is 10ms (and not the actual execution time of 1ms). This leads to cpu busy time being over-reported, causing frequency to stay higher than necessary. This patch fixes the cpu busy accounting scheme to strictly represent actual usage. It also provides for conditional fixup of busy time upon migration and upon heavy-task wakeup. CRs-Fixed: 691443 Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-10-03 14:03:51 -07:00
Srivatsa Vaddagiri	86df733742	sched: improve logic for alerting governor Currently we send notification to governor not taking note of cpus that are synchronized with regard to their frequency. As a result, scheduler could send pointless notifications (notification spam!). Avoid this by considering synchronized cpus and alerting governor only when the highest demand of any cpu within cluster far exceeds or falls behind current frequency. Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-10-03 13:46:18 -07:00
Linux Build Service Account	0dbd5f1b7b	Merge "sched: window-stats: add a new AVG policy"	2014-09-09 04:47:32 -07:00
Linux Build Service Account	672d3eb95f	Merge "sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks"	2014-09-09 00:57:10 -07:00
Srivatsa Vaddagiri	9e37153f17	sched: fix wrong load_scale_factor/capacity/nr_big/small_tasks A couple of bugs exist with incorrect use of cpu_online_mask in pre/post_big_small_task() functions, leading to potentially incorrect computation of load_scale_factor/capacity/nr_big/small_tasks. pre/post_big_small_task_count_change() use cpu_online_mask in an unreliable manner. While local_irq_disable() in pre_big_small_task_count_change() ensures a cpu won't go away in cpu_online_mask, nothing prevents a cpu from coming online concurrently. As a result, cpu_online_mask used in pre_big_small_task_count_change() can be inconsistent with that used in post_big_small_task_count_change(), which can lead to an attempt to unlock rq->lock which was not taken before. Secondly, when either max_possible_freq or min_max_freq is changing, it needs to trigger recomputation of load_scale_factor and capacity for all cpus, even if some are offline. Otherwise, an offline cpu could later come online with incorrect load_scale_factor/capacity. While it should be sufficient to scan online cpus for updating their nr_big/small_tasks in post_big_small_task_count_change(), unfortunately it sounds pretty hard to provide a stable cpu_online_mask when it's called from cpufreq_notifier_policy(). cpufreq framework can trigger a CPUFREQ_NOTIFY notification in multiple contexts, some in cpu-hotplug paths, which makes it pretty hard to guess whether get_online_cpus() can be taken without causing deadlocks or not. To work around the insufficient information we have about the hotplug-safety context when CPUFREQ_NOTIFY is issued, have post_big_small_task_count_change() traverse all possible cpus in updating nr_big/small_task_count. CRs-Fixed: 717134 Change-Id: Ife8f3f7cdfd77d5a21eee63627d7a3465930aed5 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-09-08 17:18:24 -07:00
Syed Rameez Mustafa	bf3e6c0e55	sched: window-stats: add a new AVG policy The current WINDOW_STATS_AVG policy is actually a misnomer since it uses the maximum value of the runtime in the recent window and the average of the past ravg_hist_size windows. Add a policy that only uses the average and call it WINDOW_STATS_AVG policy. Rename all the other policies to make them shorter and unambiguous. Change-Id: I080a4ea072a84a88858ca9da59a4151dfbdbe62c Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-09-08 11:07:41 -07:00
Syed Rameez Mustafa	9c37494817	sched: fix bail condition in bail_inter_cluster_balance() Following commit efcad25cbfb (revert "sched: influence cpu_power based on max_freq and efficiency"), all CPUs in the system have the same cpu_power and consequently the same group capacity. Therefore, the check in bail_inter_cluster_balance() can now no longer be used to distinguish a higher performance cluster from one with lower performance. The check is currently broken and always returns true for every load balancing attempt. Fix this by using runqueue capacity instead which can still be used as a good measure of cluster capabilities. Also the logic for distinguishing between idle environments and using a different sched group capacity in update_sd_pick_busiest() is redundant. sgs->group_capacity would now always be equal to the number of CPUs in the group. Use sgs->group_capacity directly in conditional checks in that function. Change-Id: Idecfd1ed221d27d4324b20539e5224a92bf8b751 Signed-off-by: Steve Muckle <smuckle@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-09-03 19:23:40 -07:00
Srivatsa Vaddagiri	4f93bebd20	sched: window-stats: use policy_mutex in sched_set_window() Several configuration variable change will result in reset_all_window_stats() being called. All of them, except sched_set_window(), are serialized via policy_mutex. Take policy_mutex in sched_set_window() as well to serialize use of reset_all_window_stats() function Change-Id: Iada7ff8ac85caa1517e2adcf6394c5b050e3968a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-22 14:45:17 -07:00
Srivatsa Vaddagiri	b432b691fa	sched: window_stats: Add "disable" mode support "disabled" mode (sched_disble_window_stats = 1) disables all window-stats related activity. This is useful when changing key configuration variables associated with window-stats feature (like policy or window size). Change-Id: I9e55c9eb7f7e3b1b646079c3aa338db6259a9cfe Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-22 14:45:15 -07:00
Srivatsa Vaddagiri	85ed6be992	sched: window-stats: legacy mode Support legacy mode, which results in busy time being seen by governor that is close to what it would have seen via existing APIs i.e get_cpu_idle_time_us(), get_cpu_iowait_time_us() and get_cpu_idle_time_jiffy(). In particular, legacy mode means that only task execution time is counted in rq->curr_runnable_sum and rq->prev_runnable_sum. Also task migration does not result in adjustment of those counters. Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-22 14:43:14 -07:00
Srivatsa Vaddagiri	dafe791457	sched: window-stats: Code cleanup Remove code duplication associated with update of various window-stats related sysctl tunables Change-Id: I64e29ac065172464ba371a03758937999c42a71f Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-22 14:43:12 -07:00
Olav Haugan	df91ad278c	sched: Make RAVG_HIST_SIZE tunable Make RAVG_HIST_SIZE available from /proc/sys/kernel/sched_ravg_hist_size to allow tuning of the size of the history that is used in computation of task demand. CRs-fixed: 706138 Change-Id: Id54c1e4b6e974a62d787070a0af1b4e8ce3b4be6 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>	2014-08-12 11:15:20 -07:00
Srivatsa Vaddagiri	098d8371ad	sched: window-stats: 64-bit type for curr/prev_runnable_sum Expand rq->curr_runnable_sum and rq->prev_runnable_sum to be 64-bit counters as otherwise they can easily overflow when a cpu has many tasks. Change-Id: I68ab2658ac6a3174ddb395888ecd6bf70ca70473 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-12 10:51:31 -07:00
Srivatsa Vaddagiri	5e8f14fbbc	sched: window-stats: Allow acct_wait_time to be tuned Add sysctl interface to tune sched_acct_wait_time variable at runtime Change-Id: I38339cdb388a507019e429709a7c28e80b5b3585 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-12 10:51:30 -07:00
Srivatsa Vaddagiri	4da7e167b3	sched: window-stats: Account interrupt handling time as busy time Account cycles spent by idle cpu handling interrupts (irq or softirq) towards its busy time. Change-Id: I84cc084ced67502e1cfa7037594f29ed2305b2b1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-12 10:51:30 -07:00
Srivatsa Vaddagiri	fa15bd9937	sched: Fix herding issue check_for_migration() could run concurrently on multiple cpus, resulting in multiple tasks wanting to migrate to same cpu. This could cause cpus to be underutilized and lead to increased scheduling latencies for tasks. Fix this by serializing select_best_cpu() calls from cpus running check_for_migration() check and marking selected cpus as reserved, so that subsequent call to select_best_cpu() from check_for_migration() will skip reserved cpus. Change-Id: I73a22cacab32dee3c14267a98b700f572aa3900c Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-12 10:51:29 -07:00
Srivatsa Vaddagiri	ea5020bcd2	sched: trigger immediate migration of tasks upon boost Currently turning on boost does not immediately trigger migration of tasks from lower capacity cpus. Tasks could incur migration latency of up to one timer tick (when check_for_migration() is run). Fix this by triggering a migration check on cpus with lower capacity as soon as boost is turned on for first time. Change-Id: I244649f9cb6608862d87631325967b887b7f4b7e Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-06 15:36:59 +05:30
Srivatsa Vaddagiri	a57fe9b6df	sched: window-stats: Handle policy change properly sched_window_stat_policy influences task demand and thus various statistics maintained per-cpu like curr_runnable_sum. Changing policy non-atomically would lead to improper accounting. For example, when task is enqueued on a cpu's runqueue, its demand that is added to rq->cumulative_runnable_avg could be based on AVG policy and when its dequeued its demand that is removed can be based on MAX, leading to erroneous accounting. This change causes policy change to be "atomic" i.e all cpu's rq->lock are held and all task's window-stats are reset before policy is changed. Change-Id: I6a3e4fb7bc299dfc5c367693b5717a1ef518c32d CRs-Fixed: 687409 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-08-06 15:36:59 +05:30
Steve Muckle	1aa9b6992a	sched: fixes for compilation without CONFIG_SCHED_HMP These fixes are necessary to compile without CONFIG_SCHED_HMP enabled. Change-Id: Iabbde3c22a81288242ed3a44fdfdb2a16db8b072 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>	2014-07-22 16:08:02 -07:00
Srivatsa Vaddagiri	fefafa08b7	sched: remove sysctl control for HMP and power-aware task placement There is no real need to control HMP and power-aware task placement at runtime after kernel has booted. Boot-time control should be sufficient. Not allowing for runtime (sysctl) support simplifies the code quite a bit. Also rename sysctl_sched_enable_hmp_task_placement to be shorter. Change-Id: I60cae51a173c6f73b79cbf90c50ddd41a27604aa Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-07-22 16:07:58 -07:00
Srivatsa Vaddagiri	87df0beb43	sched: support legacy mode better It should be possible to bypass all HMP scheduler changes at runtime by setting sysctl_sched_enable_hmp_task_placement and sysctl_sched_enable_power_aware to 0. Fix various code paths to honor this requirement. Change-Id: I74254e68582b3f9f1b84661baf7dae14f981c025 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>	2014-07-22 16:07:05 -07:00
Syed Rameez Mustafa	8f7e5b8ee8	sched: Add a per rq max_possible_capacity for use in power calculations In the absence of a power driver providing real power values, the scheduler currently defaults to using capacity of a CPU as a measure of power. This, however, is not a good measure since the capacity of a CPU can change due to thermal conditions and/or other hardware restrictions. These frequency restrictions have no effect on the power efficiency of those CPUs. Introduce max possible capacity of a CPU to track an absolute measure of capacity which translates into a good absolute measure of power efficiency. Max possible capacity takes the max possible frequency of CPUs into account instead of max frequency. Change-Id: Ia970b853e43a90eb8cc6fd990b5c47fca7e50db8 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-07-22 14:23:04 -07:00
Syed Rameez Mustafa	3d972d3af1	sched: Make task and CPU load calculations safe from truncation Load calculations have been modified to accept and return 64 bit values. Fix up all the places where we make such calculations to store the result in 64 bit variables. This is necessary to avoid issues caused by truncation of values. While at it update scale_task_load() to scale_load_to_cpu(). This is because the API is used to scale load of both individual tasks as well as the cumulative load of CPUs. In this sense the name was a misnomer. Also clean up power_cost() to use max_task_load(). Change-Id: I51e683e1592a5ea3c4e4b2b06d7a7339a49cce9c Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-07-22 14:23:03 -07:00
Syed Rameez Mustafa	8eebaa1826	sched/fair: Introduce C-state aware task placement for small tasks Small tasks execute for small durations. This means that the power cost of taking CPUs out of a low power mode outweighs any performance advantage of using an idle core or power advantage of using the most power efficient CPU. Introduce C-state aware task placement for small tasks. This requires a two pass approach where we first determine the most power efficient CPU and establish a band of CPUs offering a similar power cost for the task. The order of preference then is as follows: 1) Any mostly idle CPU in active C-state in the same power band. 2) A CPU with the shallowest C-state in the same power band. 3) A CPU with the least load in the same power band. 4) Lowest power CPU in a higher power band. The patch also modifies the definition of a small task. Small tasks are now determined relative to minimum capacity CPUs in the system and not the task CPU. Change-Id: Ia09840a5972881cad7ba7bea8fe34c45f909725e Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2014-07-22 14:23:02 -07:00

137 Commits