Commit Graph

233 Commits

Author SHA1 Message Date
Archana Sathyakumar 2eb821458e lpm_levels: Select mode based on steady state power for hotplug
Currently we choose power collapse or standalone power collapse
as default mode for hotplug. Instead select deepest mode based
on the least steady state power.

If lpm isn't probed yet, then check for mode availability for
this spm device and select this mode for cpu hotplug.

Change-Id: Ia54994ae4ed65af20318fdbe68095ac7177ad759
Signed-off-by: Archana Sathyakumar <asathyak@codeaurora.org>
2014-07-14 16:53:28 -06:00
Mahesh Sivasubramanian fb54e802a6 cpuidle: lpm-levels: Fix logic in choosing a cluster low power
If atleast one of the core in a cluster is online, the cluster low power
modes should be determined by the idle characteristics. In one scenario,
when the last core to configure the cluster enters the low power modes
as a part of hotlug but the remainder of the cores are online, the
selection of the cluster low power modes should use parameters related
to idle states and not suspend states.

Change-Id: Ia0af8b57f668225736cfe4a7a33436fed6ffcd4a
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-07-10 15:21:04 -06:00
Venkat Devarasetty 1e1a85c422 msm: lpm: do not allow system level modes if pending rpm ack
Do not allow APSS cores to enter system level mode if there
is a pending acknowledge from RPM. At the same time allow
cores to enter individual power collapse as the cpu collapse
overhead time is low as compared to system level modes.

Change-Id: I6bb4048529690b4ceee8555b27444ece6da82e4a
Signed-off-by: Venkat Devarasetty <vdevaras@codeaurora.org>
2014-07-01 21:44:31 -07:00
Praveen Chidambaram 56f3340c26 msm: lpm-levels: Remove incorrect BUG_ON() called from hotplug path
The cluster_select() code is used for hotplug as well. When cores are
hotplugged as a result of suspend/resume, the cores in the cluster would
be offline, resulting in the cpumask_and() evaluating to 0, resulting
in the control hitting the incorrect BUG_ON().

Change-Id: Ifa4d7e1ee06384a2e371663294bfbe68bb4d523a
Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
2014-06-30 09:58:19 -06:00
Praveen Chidambaram acddfa88e7 msm: lpm-levels: Apply QoS requests only to relevant cpus.
Use the PM QoS cpu/cupmask variant to provide a mimimum CPU_DMA_LATENCY
for the requesting kernel modules. Kernel modules may specify individual
cpus/cpumask but mostly can set the IRQ affinity for QoS. The PM QoS
framework distils the requirement and can provide the latency
requirement for each cpu or a collection of cpus.

Change-Id: I5f5465653496427d3d40a25ec46570d3c183239e
Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
2014-06-25 14:07:25 -06:00
Linux Build Service Account b0903a72c5 Merge "msm: lpm: check for next wake up against current time" 2014-06-24 14:27:45 -07:00
Linux Build Service Account 67a383249d Merge "Merge v3.10.40 and related reverts into msm-3.10" 2014-06-20 00:09:33 -07:00
Ian Maund 491fb5c232 Merge upstream tag 'v3.10.40' into msm-3.10
* commit 'v3.10.40': (203 commits)
  Linux 3.10.40
  ARC: !PREEMPT: Ensure Return to kernel mode is IRQ safe
  drm: cirrus: add power management support
  Input: synaptics - add min/max quirk for ThinkPad Edge E431
  Input: synaptics - add min/max quirk for ThinkPad T431s, L440, L540, S1 Yoga and X1
  lockd: ensure we tear down any live sockets when socket creation fails during lockd_up
  dm thin: fix dangling bio in process_deferred_bios error path
  dm transaction manager: fix corruption due to non-atomic transaction commit
  Skip intel_crt_init for Dell XPS 8700
  mtd: sm_ftl: heap corruption in sm_create_sysfs_attributes()
  mtd: nuc900_nand: NULL dereference in nuc900_nand_enable()
  mtd: atmel_nand: Disable subpage NAND write when using Atmel PMECC
  tgafb: fix data copying
  gpio: mxs: Allow for recursive enable_irq_wake() call
  rtlwifi: rtl8188ee: initialize packet_beacon
  rtlwifi: rtl8192se: Fix regression due to commit 1bf4bbb
  rtlwifi: rtl8192se: Fix too long disable of IRQs
  rtlwifi: rtl8192cu: Fix too long disable of IRQs
  rtlwifi: rtl8188ee: Fix too long disable of IRQs
  rtlwifi: rtl8723ae: Fix too long disable of IRQs
  ...

Change-Id: If5388cf980cb123e35e1b29275ba288c89c5aa18
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-06-18 13:10:54 -07:00
Mahesh Sivasubramanian d38bbf5417 cpuidle: lpm-levels: Add ftrace logging for idle low power modes
Add ftrace events to log entry/exit of cluster and cpu low power modes.
Also, add events to log parameters passed down to secure code during
power collapse.

Change-Id: I76e2faf63a80155509d6e3e610db1daa611c0b6a
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-06-18 09:46:43 -06:00
Linux Build Service Account 00ddd1e8b7 Merge "lpm-levels: Remove hotplug serialization before cluster initialization" 2014-06-16 18:06:00 -07:00
Venkat Devarasetty 0c2194e659 msm: lpm: check for next wake up against current time
If the next wake up time for a core is less than the
current time then we should not subtract current time
from next event time which results in a large negative
value. In case the earliest event is expired already
then do not enter cluster low power mode.

Change-Id: Iab04ef9fa4a64e76817254f9c6af4af2d80abb26
Signed-off-by: Venkat Devarasetty <vdevaras@codeaurora.org>
2014-06-16 23:04:31 +05:30
Karthik Parsha 3a6eceeb8f msm: lpm-levels: Allow clock gating even when sleep_disabled is set
The system can boot with all low power modes disabled by setting
sleep_disabled. In this scenario clock gating would also be disabled.
This could lead to thermal conditions that would result in the cores
being hotplugged. Even in the case where all other low power modes are
disabled allow the system to select clock gating as the default level.

Change-Id: I9460dbb417947987a8519813f98ac68de36ba538
Signed-off-by: Karthik Parsha <kparsha@codeaurora.org>
2014-06-13 16:27:31 -07:00
Archana Sathyakumar 25740ee68d lpm-levels: Remove hotplug serialization before cluster initialization
In the event where cpu hotplug happens before lpm probe completion,
it fails as the per_cpu cluster variables and the remote spin lock
has not been initialized yet.

There is no need to serialize hotplug before reading all the cluster
levels for the target in the probe. Return the default flag and do
not acquire the remote spin lock in lpm_cpu_pre_pc_cb call.

Change-Id: Idbffa9c3a5ca5b4e8edf280760f26558230eb461
Signed-off-by: Archana Sathyakumar <asathyak@codeaurora.org>
2014-06-10 17:31:56 -06:00
Linux Build Service Account a34097a3d7 Merge "Revert "msm: add default enable mode option for lpm levels"" 2014-06-10 11:54:54 -07:00
Karthik Parsha b43f28d4a8 msm: lpm-levels: Match suspend tracking across suspend and resume
When the system enters suspend, this entry is tracked by a variable. Unset
this variable on exiting suspend.

Change-Id: I35a1f2caf38940ef95ad6c9a1627ff107061f0f1
Signed-off-by: Karthik Parsha <kparsha@codeaurora.org>
2014-06-09 10:37:49 -07:00
Venkat Devarasetty b3cee487aa Revert "msm: add default enable mode option for lpm levels"
This reverts commit b774cded4a.
Change is no more needed as 8939 boots with all low power modes
enabled by default.

Change-Id: Ia70c0f1974c01086ea18bba8015925528bdd4c48
Signed-off-by: Venkat Devarasetty <vdevaras@codeaurora.org>
2014-06-09 15:28:09 +05:30
Linux Build Service Account 6ea3516add Merge "msm: lpm-levels: Fix cpu votes for lpm-level nodes" 2014-06-06 06:07:46 -07:00
Linux Build Service Account 92ea4580e3 Merge "msm: add default enable mode option for lpm levels" 2014-06-06 06:05:05 -07:00
Venkat Devarasetty b774cded4a msm: add default enable mode option for lpm levels
With cluster architecture changes all lpm modes are
enabled at boot. There are crashes at boot when all
lpms are enabled at boot. Add an option to enable
only selected modes at boot up.

Change-Id: Iebeab667aad8d12926b7a3a92deb8ca47c68bfb3
Signed-off-by: Venkat Devarasetty <vdevaras@codeaurora.org>
2014-06-06 13:03:13 +05:30
Mahesh Sivasubramanian 7d24a55021 msm: lpm-levels: Fix cpu votes for lpm-level nodes
When kernel is booted with lesser than max cpus possible, the lpm
levels' cpu mask does not account for hotplugged cores to be able to
enter a cluster/system low power mode. Fix by accounting for offline
cpus in the levels structure.

Change-Id: Icca70bf64d9faa511ce8611507101889051702a7
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-06-01 11:01:54 -07:00
Linux Build Service Account 162358c79d Merge "lpm_levels: Fix num_childs_in_sync initialization" 2014-05-31 19:44:17 -07:00
Archana Sathyakumar abf45b0929 lpm_levels: Fix num_childs_in_sync initialization
Currently num_childs_in_sync does not get initialized as cluster
lookup returns NULL. Initialize the variable after per_cpu cluster
nodes are initialized.

Change-Id: Iae30debe7c52324d6e2cb666ef96ee894144358e
Signed-off-by: Archana Sathyakumar <asathyak@codeaurora.org>
2014-05-30 13:27:56 -06:00
Mahesh Sivasubramanian 6feb62015a msm: lpm-levels: Add kernel parameter to disable sleep
By default, the lpm levels are disabled. For debug scenarios, it is
beneficial to start the device with low power modes disabled. Provide a
module parameter to override the default behavior

Change-Id: I79c5fa665fdf2b3d6331e2c51e22a5990ba5bf1f
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-05-28 13:54:18 -06:00
Praveen Chidambaram dfbc6bef45 msm: lpm_levels: Allow enable/disable LPM for cpus and clusters
Add sysfs interface to allow/disallow low power modes. Every level
specified in the Devicetree for each cpu and cluster will have an
idle_enabled and suspend_enabled option that controls the availability of
the low power mode for CPUIdle and HOTPLUG/Suspend frameworks.

Change-Id: Ic27f3a586eb9992c611411d2a13365b909ae48a3
Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
2014-05-22 09:28:51 -06:00
Linux Build Service Account fa06910036 Merge "msm: lpm-levels: Save and restore cpu ctis on L2 SCU power transition" 2014-05-21 21:25:06 -07:00
Karthik Parsha 5b3a36486e msm: lpm-levels: Save and restore cpu ctis on L2 SCU power transition
L2 SCU is turned off on L2 PC and L2 GDHS. Save and then restore cpu ctis
on L2 entering PC or GDHS and restore on exit.

Change-Id: I20360f8be213a634b5a4f9aa4cb7e0ef0aa263be
Signed-off-by: Karthik Parsha <kparsha@codeaurora.org>
2014-05-21 08:54:39 -07:00
Mahesh Sivasubramanian 4f939aeb57 msm: lpm-levels: Add support for reporting statistics.
The pm-stats drivers is rearchitected to support multiple clusters and
report the statistics of individual resources. Add support to invoke the
new APIs to appropriately report the pm-stats for clusters and cpus

Change-Id: Ibcae9e79a2df39a74247968fce89c72a996d9ad3
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-05-20 18:55:00 -06:00
Praveen Chidambaram 1ade5c364e msm: lpm: Support cluster low power modes for cpu hotplug
When all cpus in any cluster are powered down, we would want to do cluster
low power modes and save power draw by devices like L2, associated with the
cluster. Also, bubble up and allow low power modes on top level clusters.

Change-Id: I68bc2ccbb3cff884da9356ef9aad88d5c2207c10
Signed-off-by: Praveen Chidambaram <pchidamb@codeaurora.org>
2014-05-20 18:55:00 -06:00
Mahesh Sivasubramanian 0bae66c38a msm: lpm-levels: Support for cluster power management
Add power management support for multilevel cluster. The code is redesigned
to support low power modes with multiple hierarchies of clusters.

Change-Id: I0d0142e53bf2fe6152e7791f09bcb4d35a82e461
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-05-20 18:54:35 -06:00
Mahesh Sivasubramanian c75831d1ff msm: spm_devices: Support querying of spm devices by name
The current driver assumes that hardware has SPM devices to control CPU and
a L2 only. This restriction doesn't apply to all targets and so adding for
power modules to query spm devices by name to allow multiple spm devices
within the subsystem.

Also, added a new property qcom,cpu which references a CPU's phandle. The
driver will use the phandle to determine the logical CPU map instead if
relying on the qcom,core-id property. qcom,core-id property will be
supported on targets that doesn't support the CPU phandle but would
eventually be deprecated when all the targets have migrated over to
defining CPU phandles.

Also, remove any used APIs in the process.

Change-Id: I3a89fa164d00b91d52f26c6a373af7188cb7908c
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-05-14 15:20:03 -06:00
Mahesh Sivasubramanian 24f47befa4 msm: spm: Unify SPM enums for power modes
Unify the SPM enums for power modes across multiple devices. This
prevents the driver from having to maintain different tables to support
multiple SPM devices in the system. Currently, only two types of SPM
devices are supported, cpu and L2, but multiple such SPM devices could
be available on newer targets.

Change-Id: I89305bee79ca8a1560f9ea1b81e241713a43c8ab
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-05-06 23:50:19 -06:00
Anji Jonnala e9ea12f5de lpm_levels: Do not allow power collapse when multi cores online
When any core power collapse is not selected, no need to allow
idle power collapse on cpu0 when multi cores online. It may leads to
stability issues.

CRs-fixed: 651076
Change-Id: I4c4eb4fb7d40e8bda78fa61358bcb0b00c4f6787
Signed-off-by: Anji Jonnala <anjir@codeaurora.org>
2014-04-22 11:28:16 +05:30
Daniel Fu 6ae69a801b cpuidle: Check the result of cpuidle_get_driver() against NULL
commit 3b9c10e98021e1f92e6f8c7ce1778b86ba68db10 upstream.

If the current CPU has no cpuidle driver, drv will be NULL in
cpuidle_driver_ref().  Check if that is the case before trying
to bump up the driver's refcount to prevent the kernel from
crashing.

[rjw: Subject and changelog]
Signed-off-by: Daniel Fu <danifu@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14 06:42:15 -07:00
Abhijeet Dharmapurikar d3136affb9 msm: krait-regulator-pmic: workaround for soft start issue
There is hw bug in FTS2 where if the gang is disabled and enabled in a
single phase configuration, the subsequent addition of phases do not see
the SS done flag of the gang leader and as a result use SS_CTL timings
for voltage stepping instead of VS_CTL settings.

The sw workaround is to update SS_CTL to the same timing as VS_CTL when
phases are added. Also when the gang is disabled switch back to original
SS_CTL to allow for quick startup when enabled.

Moreover, the recommended settling time after a phase increase with this
workaround is 100uS instead of 50uS. Update the code for this as well.

CRs-Fixed: 640655
Change-Id: I732b526152e581231af19b4aac05b1500a68712e
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
2014-04-09 13:25:31 -07:00
Ian Maund f1b32d4e47 Merge upstream linux-stable v3.10.28 into msm-3.10
The following commits have been reverted from this merge, as they are
known to introduce new bugs and are currently incompatible with our
audio implementation. Investigation of these commits is ongoing, and
they are expected to be brought in at a later time:

86e6de7 ALSA: compress: fix drain calls blocking other compress functions (v6)
16442d4 ALSA: compress: fix drain calls blocking other compress functions

This merge commit also includes a change in block, necessary for
compilation. Upstream has modified elevator_init_fn to prevent race
conditions, requring updates to row_init_queue and test_init_queue.

* commit 'v3.10.28': (1964 commits)
  Linux 3.10.28
  ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
  drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
  serial: amba-pl011: use port lock to guard control register access
  mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
  md/raid5: Fix possible confusion when multiple write errors occur.
  md/raid10: fix two bugs in handling of known-bad-blocks.
  md/raid10: fix bug when raid10 recovery fails to recover a block.
  md: fix problem when adding device to read-only array with bitmap.
  drm/i915: fix DDI PLLs HW state readout code
  nilfs2: fix segctor bug that causes file system corruption
  thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
  ftrace/x86: Load ftrace_ops in parameter not the variable holding it
  SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
  writeback: Fix data corruption on NFS
  hwmon: (coretemp) Fix truncated name of alarm attributes
  vfs: In d_path don't call d_dname on a mount point
  staging: comedi: adl_pci9111: fix incorrect irq passed to request_irq()
  staging: comedi: addi_apci_1032: fix subdevice type/flags bug
  mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
  GFS2: Increase i_writecount during gfs2_setattr_chown
  perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
  perf scripting perl: Fix build error on Fedora 12
  ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
  Linux 3.10.27
  sched: Guarantee new group-entities always have weight
  sched: Fix hrtimer_cancel()/rq->lock deadlock
  sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
  sched: Fix race on toggling cfs_bandwidth_used
  x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  SCSI: sd: Reduce buffer size for vpd request
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  ACPI / TPM: fix memory leak when walking ACPI namespace
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works
  clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks
  clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock
  clk: samsung: exynos4: Correct SRC_MFC register
  clk: clk-divider: fix divisor > 255 bug
  ahci: add PCI ID for Marvell 88SE9170 SATA controller
  parisc: Ensure full cache coherency for kmap/kunmap
  drm/nouveau/bios: make jump conditional
  ARM: shmobile: mackerel: Fix coherent DMA mask
  ARM: shmobile: armadillo: Fix coherent DMA mask
  ARM: shmobile: kzm9g: Fix coherent DMA mask
  ARM: dts: exynos5250: Fix MDMA0 clock number
  ARM: fix "bad mode in ... handler" message for undefined instructions
  ARM: fix footbridge clockevent device
  net: Loosen constraints for recalculating checksum in skb_segment()
  bridge: use spin_lock_bh() in br_multicast_set_hash_max
  netpoll: Fix missing TXQ unlock and and OOPS.
  net: llc: fix use after free in llc_ui_recvmsg
  virtio-net: fix refill races during restore
  virtio_net: don't leak memory or block when too many frags
  virtio-net: make all RX paths handle errors consistently
  virtio_net: fix error handling for mergeable buffers
  vlan: Fix header ops passthru when doing TX VLAN offload.
  net: rose: restore old recvmsg behavior
  rds: prevent dereference of a NULL device
  ipv6: always set the new created dst's from in ip6_rt_copy
  net: fec: fix potential use after free
  hamradio/yam: fix info leak in ioctl
  drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()
  net: inet_diag: zero out uninitialized idiag_{src,dst} fields
  ip_gre: fix msg_name parsing for recvfrom/recvmsg
  net: unix: allow bind to fail on mutex lock
  ipv6: fix illegal mac_header comparison on 32bit
  netvsc: don't flush peers notifying work during setting mtu
  tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0
  net: unix: allow set_peek_off to fail
  net: drop_monitor: fix the value of maxattr
  ipv6: don't count addrconf generated routes against gc limit
  packet: fix send path when running with proto == 0
  virtio: delete napi structures from netdev before releasing memory
  macvtap: signal truncated packets
  tun: update file current position
  macvtap: update file current position
  macvtap: Do not double-count received packets
  rds: prevent BUG_ON triggered on congestion update to loopback
  net: do not pretend FRAGLIST support
  IPv6: Fixed support for blackhole and prohibit routes
  HID: Revert "Revert "HID: Fix logitech-dj: missing Unifying device issue""
  gpio-rcar: R-Car GPIO IRQ share interrupt
  clocksource: em_sti: Set cpu_possible_mask to fix SMP broadcast
  irqchip: renesas-irqc: Fix irqc_probe error handling
  Linux 3.10.26
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  ext4: fix bigalloc regression
  arm64: Use Normal NonCacheable memory for writecombine
  arm64: Do not flush the D-cache for anonymous pages
  arm64: Avoid cache flushing in flush_dcache_page()
  ARM: KVM: arch_timers: zero CNTVOFF upon return to host
  ARM: hyp: initialize CNTVOFF to zero
  clocksource: arch_timer: use virtual counters
  arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
  arm64: dts: Reserve the memory used for secondary CPU release address
  arm64: check for number of arguments in syscall_get/set_arguments()
  arm64: fix possible invalid FPSIMD initialization state
  ...

Change-Id: Ia0e5d71b536ab49ec3a1179d59238c05bdd03106
Signed-off-by: Ian Maund <imaund@codeaurora.org>
2014-03-24 14:28:34 -07:00
Mahesh Sivasubramanian 6e34202746 msm: lpm_levels: Allow hotplug even if the driver hasn't been probed.
When the system runs into a thermal condition, the kernel thermal module
tries to hotplug the core before lpm probe is initialized. If lpm probe
isn't initialized the power collapse modes are not chosen for hotplug and
the thermal condition isn't mitigated resulting in a thermal reset. Fix
issue by choosing the deepest sleep mode for a given cpu.

CRs-fixed:609769
Change-Id: I4f724ba4f682c640ffde5686b8e194a4de6808f8
Signed-off-by: Mahesh Sivasubramanian <msivasub@codeaurora.org>
2014-02-18 10:02:25 -07:00
Murali Nalajala 7e9435318a msm: lpm_levels: move lpm_level driver to cpuidle directory
Part of power driver code reorganisation move lpm_level
driver to a generic/appropriate place in the kernel directory.

Change-Id: I022b76d17b3fa6e647b9b6a85a514e5c092962cd
Signed-off-by: Murali Nalajala <mnalajal@codeaurora.org>
2014-02-17 14:22:53 +05:30
Colin Cross 736899ab70 cpuidle: coupled: fix race condition between pokes and safe state
commit 9e19b73c30a5fa42a53583a1f7817dd857126156 upstream.

The coupled cpuidle waiting loop clears pending pokes before
entering the safe state.  If a poke arrives just before the
pokes are cleared, but after the while loop condition checks,
the poke will be lost and the cpu will stay in the safe state
until another interrupt arrives.  This may cause the cpu that
sent the poke to spin in the ready loop with interrupts off
until another cpu receives an interrupt, and if no other cpus
have interrupts routed to them it can spin forever.

Change the return value of cpuidle_coupled_clear_pokes to
return if a poke was cleared, and move the need_resched()
checks into the callers.  In the waiting loop, if
a poke was cleared restart the loop to repeat the while
condition checks.

Reported-by: Neil Zhang <zhangwm@marvell.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 17:18:02 -07:00
Colin Cross 61704f0366 cpuidle: coupled: abort idle if pokes are pending
commit f983827bcb9d2c34c4d8935861a1e9128aec2baf upstream.

Joseph Lo <josephl@nvidia.com> reported a lockup on Tegra20 caused
by a race condition in coupled cpuidle.  When two or more cpus
enter idle at the same time, the first cpus to arrive may go to the
ready loop without processing pending pokes from the last cpu to
arrive.

This patch adds a check for pending pokes once all cpus have been
synchronized in the ready loop and resets the coupled state and
retries if any cpus failed to handle their pending poke.

Retrying on all cpus may trigger the same issue again, so this patch
also adds a check to ensure that each cpu has received at least one
poke between when it enters the waiting loop and when it moves on to
the ready loop.

Reported-and-tested-by: Joseph Lo <josephl@nvidia.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 17:18:02 -07:00
Rafael J. Wysocki d201a0b94d Revert "cpuidle: Quickly notice prediction failure for repeat mode"
commit 148519120c6d1f19ad53349683aeae9f228b0b8d upstream.

Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

  We believe we've identified a particular commit to the cpuidle code
  that seems to be impacting performance of variety of workloads.
  The simplest way to reproduce is using netperf TCP_RR test, so
  we're using that, on a pair of Sandy Bridge based servers.  We also
  have data from a large database setup where performance is also
  measurably/positively impacted, though that test data isn't easily
  share-able.

  Included below are test results from 3 test kernels:

  kernel       reverts
  -----------------------------------------------------------
  1) vanilla   upstream (no reverts)

  2) perfteam2 reverts e11538d1f0

  3) test      reverts 69a37beabf
                       e11538d1f0

  In summary, netperf TCP_RR numbers improve by approximately 4%
  after reverting 69a37beabf.  When
  69a37beabf is included, C0 residency
  never seems to get above 40%.  Taking that patch out gets C0 near
  100% quite often, and performance increases.

  The below data are histograms representing the %c0 residency @
  1-second sample rates (using turbostat), while under netperf test.

  - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
  - The last pair, which reverts 69a37beabf,
    shows %c0 in the 80,90,100% bins.

  Below each kernel name are netperf TCP_RR trans/s numbers for the
  particular kernel that can be disclosed publicly, comparing the 3
  test kernels.  We ran a 4th test with the vanilla kernel where
  we've also set /dev/cpu_dma_latency=0 to show overall impact
  boosting single-threaded TCP_RR performance over 11% above
  baseline.

  3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
  TCP_RR trans/s 54323.78

  -----------------------------------------------------------
  3.10-rc2 vanilla RX (no reverts)
  TCP_RR trans/s 48192.47

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     1]: *
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [    49]:
  *************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 perfteam2 RX (reverts commit
  e11538d1f0)
  TCP_RR trans/s 49698.69

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    59]:
  ***********************************************************
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  Sender %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [     2]: **
     40.0000 -    50.0000 [    58]:
  **********************************************************
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [     0]:

  -----------------------------------------------------------
  3.10-rc2 test RX (reverts 69a37beabf
  and e11538d1f0)
  TCP_RR trans/s 47766.95

  Receiver %c0
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     1]: *
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    27]: ***************************
     40.0000 -    50.0000 [     2]: **
     50.0000 -    60.0000 [     0]:
     60.0000 -    70.0000 [     2]: **
     70.0000 -    80.0000 [     0]:
     80.0000 -    90.0000 [     0]:
     90.0000 -   100.0000 [    28]: ****************************

  Sender:
      0.0000 -    10.0000 [     1]: *
     10.0000 -    20.0000 [     0]:
     20.0000 -    30.0000 [     0]:
     30.0000 -    40.0000 [    11]: ***********
     40.0000 -    50.0000 [     0]:
     50.0000 -    60.0000 [     1]: *
     60.0000 -    70.0000 [     0]:
     70.0000 -    80.0000 [     3]: ***
     80.0000 -    90.0000 [     7]: *******
     90.0000 -   100.0000 [    38]: **************************************

  These results demonstrate gaining back the tendency of the CPU to
  stay in more responsive, performant C-states (and thus yield
  measurably better performance), by reverting commit
  69a37beabf.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11 18:35:24 -07:00
Rafael J. Wysocki 657e142082 Revert "cpuidle: Quickly notice prediction failure in general case"
commit 228b30234f258a193317874854eee1ca7807186e upstream.

Revert commit e11538d1 (cpuidle: Quickly notice prediction failure in
general case), since it depends on commit 69a37be (cpuidle: Quickly
notice prediction failure for repeat mode) that has been identified
as the source of a significant performance regression in v3.8 and
later.

Requested-by: Jeremy Eder <jeder@redhat.com>
Tested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-11 18:35:24 -07:00
Rohit Vaswani d95704c0ec cpuidle: menu: Remove the unused get_loadavg function
The fuction get_loadavg was unused and this also caused
a warning during build. This change also removes it from
the allowed warnings white-list.

Change-Id: I85b184e1eb0ab8b3cf763cab87a30e140d00332b
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
2013-07-08 06:09:53 -07:00
Colin Cross df99953c42 cpuidle: governor: menu: don't use loadavg
get_loadavg doesn't work as intended.  According to the comments, it
should be returning an average over a few seconds, but it is actually
reading the instantaneous load.  It is almost always returning 0, but
can sometimes, depending on workload, spike very high into the hundreds
even when the average cpu load is under 10%.  Disable it for now.

Change-Id: I63ed100af1cf9463549939b8113ed83676db5f86
Signed-off-by: Colin Cross <ccross@android.com>
2013-07-01 13:34:58 -07:00
Daniel Lezcano a8e39c35b5 cpuidle: add maintainer entry
Currently cpuidle drivers are spread across different archs.

As a result, there are several different paths for cpuidle patch
submissions: cpuidle core changes go through linux-pm, ARM driver
changes go to the arm-soc or SoC-specific trees, sh changes go
through the sh arch tree, pseries changes go through the PowerPC tree
and finally intel changes go through the Len's tree while ACPI idle
changes go through linux-pm.

That makes it difficult to consolidate code and to propagate
modifications from the cpuidle core to the different drivers.

Hopefully, a movement has started to put the majority of cpuidle
drivers under drivers/cpuidle like cpuidle-calxeda.c and
cpuidle-kirkwood.c.

Add a maintainer entry for cpuidle to MAINTAINERS to clarify the
situation and to indicate to new cpuidle driver authors that those
drivers should not go into arch-specific directories.

The upstreaming process is unchanged: Rafael takes patches for
merging into his tree, but with an Acked-by: tag from the driver's
maintainer, so indicate in the drivers' headers who maintains them.

The arrangement will be the same as for cpufreq.

[rjw: Changelog]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Andrew Lunn <andrew@lunn.ch>  #for kirkwood
Acked-by: Jason Cooper <jason@lakedaemon.net> #for kirkwood
Acked-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-26 22:30:25 +02:00
Daniel Lezcano 1c192d047a cpuidle: fix comment format
Fix comment format for the kernel doc script.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-24 00:54:51 +02:00
Daniel Lezcano 30dc72c6fa ARM: kirkwood: cpuidle: use init/exit common routine
Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:23 +02:00
Daniel Lezcano 0b210d96a6 ARM: calxeda: cpuidle: use init/exit common routine
Remove the duplicated code and use the cpuidle common code for initialization.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:23 +02:00
Daniel Lezcano 4c637b2175 cpuidle: make a single register function for all
The usual scheme to initialize a cpuidle driver on a SMP is:

	cpuidle_register_driver(drv);
	for_each_possible_cpu(cpu) {
		device = &per_cpu(cpuidle_dev, cpu);
		cpuidle_register_device(device);
	}

This code is duplicated in each cpuidle driver.

On UP systems, it is done this way:

	cpuidle_register_driver(drv);
	device = &per_cpu(cpuidle_dev, cpu);
	cpuidle_register_device(device);

On UP, the macro 'for_each_cpu' does one iteration:

#define for_each_cpu(cpu, mask)                 \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

Hence, the initialization loop is the same for UP than SMP.

Beside, we saw different bugs / mis-initialization / return code unchecked in
the different drivers, the code is duplicated including bugs. After fixing all
these ones, it appears the initialization pattern is the same for everyone.

Please note, some drivers are doing dev->state_count = drv->state_count. This is
not necessary because it is done by the cpuidle_enable_device function in the
cpuidle framework. This is true, until you have the same states for all your
devices. Otherwise, the 'low level' API should be used instead with the specific
initialization for the driver.

Let's add a wrapper function doing this initialization with a cpumask parameter
for the coupled idle states and use it for all the drivers.

That will save a lot of LOC, consolidate the code, and the modifications in the
future could be done in a single place. Another benefit is the consolidation of
the cpuidle_device variable which is now in the cpuidle framework and no longer
spread accross the different arch specific drivers.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:22 +02:00
Daniel Lezcano 554c06ba3e cpuidle: remove en_core_tk_irqen flag
The en_core_tk_irqen flag is set in all the cpuidle driver which
means it is not necessary to specify this flag.

Remove the flag and the code related to it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>  # for mach-omap2/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-23 13:45:22 +02:00
Daniel Lezcano a06df062a1 cpuidle: initialize the broadcast timer framework
The commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced
the CPUIDLE_FLAG_TIMER_STOP flag where we specify a specific idle
state stops the local timer.

Now use this flag to check at init time if one state will need
the broadcast timer and, in this case, setup the broadcast timer
framework. That prevents multiple code duplication in the drivers.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:28 +02:00
Silviu-Mihai Popescu 488540bf41 cpuidle: kirkwood: fix coccicheck warnings
Convert all uses of devm_request_and_ioremap() to the newly introduced
devm_ioremap_resource() which provides more consistent error handling.

devm_ioremap_resource() provides its own error messages so all explicit
error messages can be removed from the failure code paths.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Daniel Lezcano 9a23fe65cf cpuidle / kirkwood: remove redundant Kconfig option
When the CPU_IDLE and the ARCH_KIRKWOOD options are set it is
pointless to define a new option CPU_IDLE_KIRKWOOD because it
is redundant.

The Makefile drivers directory contains a condition to compile
the cpuidle drivers:

obj-$(CONFIG_CPU_IDLE)          += cpuidle/

Hence, if CPU_IDLE is not set we won't enter this directory.

This patch removes the useless Kconfig option and replaces the
condition in the Makefile by CONFIG_ARCH_KIRKWOOD.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Daniel Lezcano b60e6a0eb0 cpuidle : handle clockevent notify from the cpuidle framework
When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.

The different cpuidle drivers are calling clockevents_notify ENTER/EXIT
when the idle state stops the local timer.

Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-04-01 01:10:27 +02:00
Linus Torvalds bab588fcfb arm-soc: soc-specific updates
This is a larger set of new functionality for the existing SoC families,
 including:
 
 * vt8500 gains support for new CPU cores, notably the Cortex-A9 based wm8850
 * prima2 gains support for the "marco" SoC family, its SMP based cousin
 * tegra gains support for the new Tegra4 (Tegra114) family
 * socfpga now supports a newer version of the hardware including SMP
 * i.mx31 and bcm2835 are now using DT probing for their clocks
 * lots of updates for sh-mobile
 * OMAP updates for clocks, power management and USB
 * i.mx6q and tegra now support cpuidle
 * kirkwood now supports PCIe hot plugging
 * tegra clock support is updated
 * tegra USB PHY probing gets implemented diffently
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUSUyPGCrR//JCVInAQI4YA/+Nb0FaA7qMmTPuJhm7aZNfnwBcGxZ7IZp
 s2xByEl3r5zbLKlKGNGE0x7Q7ETHV4y9tohzi9ZduH2b60dMRYgII06CEmDPu6/h
 4vBap2oLzfWfs9hwpCIh7N9wNzxSj/R42vlXHhNmspHlw7cFk1yw5EeJ+ocxmZPq
 H9lyjAxsGErkZyM/xstNQ1Uvhc8XHAFSUzWrg8hvf6AVVR8hwpIqVzfIizv6Vpk6
 ryBoUBHfdTztAOrafK54CdRc7l6kVMomRodKGzMyasnBK3ZfFca3IR7elnxLyEFJ
 uPDu5DKOdYrjXC8X2dPM6kYiE41YFuqOV2ahBt9HqRe6liNBLHQ6NAH7f7+jBWSI
 eeWe84c2vFaqhAGlci/xm4GaP0ud5ZLudtiVPlDY5tYIADqLygNcx1HIt/5sT7QI
 h34LMjc4+/TGVWTVf5yRmIzTrCXZv5YoAak3UWFoM4nVBo/eYVyNLEt5g9YsfjrC
 P/GWrXJJvOCB3gAi31pgGYJzZg8K7kTTAh/dgxjqzU4f6nGRm5PBydiJe18/lWkH
 qtfNE0RbhxCi3JEBnxW48AIEndVSRbd7jf8upC/s9rPURtFSVXp4APTHVyNUKCip
 gojBxcRYtesyG/53nrwdTyiyHx6GocmWnMNZJoDo0UQEkog2dOef+StdC3zhc2Vm
 9EttcFqWJ+E=
 =PRrg
 -----END PGP SIGNATURE-----

Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC-specific updates from Arnd Bergmann:
 "This is a larger set of new functionality for the existing SoC
  families, including:

   - vt8500 gains support for new CPU cores, notably the Cortex-A9 based
     wm8850

   - prima2 gains support for the "marco" SoC family, its SMP based
     cousin

   - tegra gains support for the new Tegra4 (Tegra114) family

   - socfpga now supports a newer version of the hardware including SMP

   - i.mx31 and bcm2835 are now using DT probing for their clocks

   - lots of updates for sh-mobile

   - OMAP updates for clocks, power management and USB

   - i.mx6q and tegra now support cpuidle

   - kirkwood now supports PCIe hot plugging

   - tegra clock support is updated

   - tegra USB PHY probing gets implemented diffently"

* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (148 commits)
  ARM: prima2: remove duplicate v7_invalidate_l1
  ARM: shmobile: r8a7779: Correct TMU clock support again
  ARM: prima2: fix __init section for cpu hotplug
  ARM: OMAP: Consolidate OMAP USB-HS platform data (part 3/3)
  ARM: OMAP: Consolidate OMAP USB-HS platform data (part 1/3)
  arm: socfpga: Add SMP support for actual socfpga harware
  arm: Add v7_invalidate_l1 to cache-v7.S
  arm: socfpga: Add entries to enable make dtbs socfpga
  arm: socfpga: Add new device tree source for actual socfpga HW
  ARM: tegra: sort Kconfig selects for Tegra114
  ARM: tegra: enable ARCH_REQUIRE_GPIOLIB for Tegra114
  ARM: tegra: Fix build error w/ ARCH_TEGRA_114_SOC w/o ARCH_TEGRA_3x_SOC
  ARM: tegra: Fix build error for gic update
  ARM: tegra: remove empty tegra_smp_init_cpus()
  ARM: shmobile: Register ARM architected timer
  ARM: MARCO: fix the build issue due to gic-vic-to-irqchip move
  ARM: shmobile: r8a7779: Correct TMU clock support
  ARM: mxs_defconfig: Select CONFIG_DEVTMPFS_MOUNT
  ARM: mxs: decrease mxs_clockevent_device.min_delta_ns to 2 clock cycles
  ARM: mxs: use apbx bus clock to drive the timers on timrotv2
  ...
2013-02-21 15:27:22 -08:00
Andrew Lunn 9cfc94eb0f cpuidle: kirkwood: Move out of mach directory
Move the Kirkwood cpuidle driver out of arch/arm/mach-kirkwood and
into drivers/cpuidle. Convert the driver into a platform driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2013-01-31 17:01:37 +00:00
Paul Gortmaker 43720bd601 PM / tracing: remove deprecated power trace API
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release.  Either
way you look at it, we are well past both, so push it off a cliff.

Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API.  Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.

Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-26 00:39:12 +01:00
Daniel Lezcano 8aef33a7cf cpuidle: remove the power_specified field in the driver
We realized that the power usage field is never filled and when it
is filled for tegra, the power_specified flag is not set causing all
of these values to be reset when the driver is initialized with
set_power_state().

However, the power_specified flag can be simply removed under the
assumption that the states are always backward sorted, which is the
case with the current code.

This change allows the menu governor select function and the
cpuidle_play_dead() to be simplified.  Moreover, the
set_power_states() function can removed as it does not make sense
any more.

Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.

As a consequence, this also fixes the bug where on the dynamic
C-states system, the power fields are not initialized.

[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-15 14:18:04 +01:00
Krzysztof Mazur 392370e7aa cpuidle: fix number of initialized/destroyed states
Commit bf4d1b5ddb (cpuidle: support
multiple drivers) changed the number of initialized state kobjects
in cpuidle_add_state_sysfs() from device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs().  The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example.  Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
the original behavior of it).

[rjw: Changelog]
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-11 23:20:09 +01:00
Daniel Lezcano ac34d7c8c8 cpuidle: fix lock contention in the idle path
Commit bf4d1b5 (cpuidle: support multiple drivers) introduced
locking in cpuidle_get_cpu_driver(), which is used in the
idle_call() function.

This leads to a contention problem with a large number of CPUs,
because they all try to run the idle routine at the same time.

The lock can be safely removed because of how is used the cpuidle
API.  Namely, cpuidle_register_driver() is called first, but the
cpuidle idle function is not entered before cpuidle_register_device()
is called, because the cpuidle device is not enabled then. Moreover,
cpuidle_unregister_driver(), which would reset the driver value to
NULL, is not called before cpuidle_unregister_device().

All of the cpuidle drivers use the API in the same way.

In general, a cleanup around the lock is necessary and a proper
refcounting mechanism should be used to ensure the consistency in the
API (for example, cpuidle_unregister_driver() should fail if the
driver's refcount is not 0). However, these modifications will require
some code reorganization and rewrite which will be too intrusive for
a fix.

For this reason, fix the contention problem introduced by commit
bf4d1b5 by simply removing the locking from cpuidle_get_cpu_driver(),
which restores the original behavior of that routine.

[rjw: Changelog.]
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:06 +01:00
Sivaram Nair 92638e2fac cpuidle / coupled: fix ready counter decrement
The ready_waiting_counts atomic variable is compared against the wrong
online cpu count. The latter is computed incorrectly using logical-OR
instead of bit-OR. This patch fixes that.

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Colin Cross <ccross@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:05 +01:00
Sivaram Nair 0e5537b30d cpuidle: Fix finding state with min power_usage
Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
of -1) to init the local copies so that functions that tries to find
cpuidle states with minimum power usage works correctly even if they use
non-negative values.

Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-03 13:11:05 +01:00
Linus Torvalds d027db132b ARM: arm-soc: SoC updates for 3.8
This contains the bulk of new SoC development for this merge window.
 
 Two new platforms have been added, the sunxi platforms (Allwinner A1x
 SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
 series of ARMv7 platforms from them, where the hope is that we can
 keep the platform code generic enough to have them all share one mach
 directory. The new Broadcom platform is contributed by Christian Daudt.
 
 Highbank has grown support for Calxeda's next generation of hardware,
 ECX-2000.
 
 clps711x has seen a lot of cleanup from Alexander Shiyan, and he's also
 taken on maintainership of the platform.
 
 Beyond this there has been a bunch of work from a number of people on
 converting more platforms to IRQ domains, pinctrl conversion, cleanup
 and general feature enablement across most of the active platforms.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQyLCjAAoJEIwa5zzehBx3AdQP/R+L3+EQMjiEWt/p7g/ql5Em
 0SnP92CcGzrjgLTg9z1FeOazfOsGnkZAYUlDRkqfKobH3VqkhYFFtt1/0x0KMahm
 xcowHgMBOyimFdWT9vLK3J8U6DLui5XrEG9LGH2VL+lqmfjIyP/OOF3mVc0/+pV9
 WTLAsYswdBRSeiNuF43kqlfrOwF6xsPLgiNMlc82w6BzHqoHu6dOif5M9MqWaApS
 V74DPmwLD371Tyit6aHqt3JOqpgiPSHlmxkzomK+5idcW3Pa7HnzzFYmx85dk/eN
 J2siqIkoOu7tEfjIbNZTL2MYoX4tUUKv4qZZ3IOl3YSWaV3P5ilMApF01XVrkk8E
 DWOMhzte9hC7L90W+/kCPLF1VyeAhCem2KQWUitO71fKur3r+3ZaUokNVvWzkJIL
 7aduxAJOV2hfLgEqbjbjF3o4S8p63OV3kzivFJM1And15zDJo4+qqOh67+bPo4jj
 +R4du+SqzXriw4i3tDLGVpdjDffk4D41tbLzgkWAtvGyoP45yeYfHAzAh0pDFPRv
 ASfZVmZ5PhwAUAkIMnpC2sjgmxMYff3SYqmDgnsqXES7rbDH/hG+teymtHFTyUQp
 m+f60DNotSMcMvkLdvruLSB4aeTiwbfOqPn/g+aXYUlPuNMq1fVWgN7EJKWkamK4
 nRwaJmLwx1/ojcVbpy2G
 =YMKB
 -----END PGP SIGNATURE-----

Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC updates from Olof Johansson:
 "This contains the bulk of new SoC development for this merge window.

  Two new platforms have been added, the sunxi platforms (Allwinner A1x
  SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
  series of ARMv7 platforms from them, where the hope is that we can
  keep the platform code generic enough to have them all share one mach
  directory.  The new Broadcom platform is contributed by Christian
  Daudt.

  Highbank has grown support for Calxeda's next generation of hardware,
  ECX-2000.

  clps711x has seen a lot of cleanup from Alexander Shiyan, and he's
  also taken on maintainership of the platform.

  Beyond this there has been a bunch of work from a number of people on
  converting more platforms to IRQ domains, pinctrl conversion, cleanup
  and general feature enablement across most of the active platforms."

Fix up trivial conflicts as per Olof.

* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits)
  mfd: vexpress-sysreg: Remove LEDs code
  irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids
  clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids
  irq: versatile: delete dangling variable
  ARM: sunxi: add missing include for mdelay()
  ARM: EXYNOS: Avoid early use of of_machine_is_compatible()
  ARM: dts: add node for PL330 MDMA1 controller for exynos4
  ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412
  ARM: EXYNOS: add UART3 to DEBUG_LL ports
  ARM: S3C24XX: Add clkdev entry for camif-upll clock
  ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers
  ARM: sunxi: Add missing sun4i.dtsi file
  pinctrl: samsung: Do not initialise statics to 0
  ARM i.MX6: remove gate_mask from pllv3
  ARM i.MX6: Fix ethernet PLL clocks
  ARM i.MX6: rename PLLs according to datasheet
  ARM i.MX6: Add pwm support
  ARM i.MX51: Add pwm support
  ARM i.MX53: Add pwm support
  ARM: mx5: Replace clk_register_clkdev with clock DT lookup
  ...
2012-12-12 12:05:15 -08:00
Julius Werner a474a51549 cpuidle: Measure idle state durations with monotonic clock
Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.

If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.

This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.

Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-27 14:17:58 +01:00
Li Zhong a093b93ee0 cpuidle: fix a suspicious RCU usage in menu governor
I saw this suspicious RCU usage on the next tree of 11/15

[   67.123404] ===============================
[   67.123413] [ INFO: suspicious RCU usage. ]
[   67.123423] 3.7.0-rc5-next-20121115-dirty #1 Not tainted
[   67.123434] -------------------------------
[   67.123444] include/trace/events/timer.h:186 suspicious rcu_dereference_check() usage!
[   67.123458]
[   67.123458] other info that might help us debug this:
[   67.123458]
[   67.123474]
[   67.123474] RCU used illegally from idle CPU!
[   67.123474] rcu_scheduler_active = 1, debug_locks = 0
[   67.123493] RCU used illegally from extended quiescent state!
[   67.123507] 1 lock held by swapper/1/0:
[   67.123516]  #0:  (&cpu_base->lock){-.-...}, at: [<c0000000000979b0>] .__hrtimer_start_range_ns+0x28c/0x524
[   67.123555]
[   67.123555] stack backtrace:
[   67.123566] Call Trace:
[   67.123576] [c0000001e2ccb920] [c00000000001275c] .show_stack+0x78/0x184 (unreliable)
[   67.123599] [c0000001e2ccb9d0] [c0000000000c15a0] .lockdep_rcu_suspicious+0x120/0x148
[   67.123619] [c0000001e2ccba70] [c00000000009601c] .enqueue_hrtimer+0x1c0/0x1c8
[   67.123639] [c0000001e2ccbb00] [c000000000097aa0] .__hrtimer_start_range_ns+0x37c/0x524
[   67.123660] [c0000001e2ccbc20] [c0000000005c9698] .menu_select+0x508/0x5bc
[   67.123678] [c0000001e2ccbd20] [c0000000005c740c] .cpuidle_idle_call+0xa8/0x6e4
[   67.123699] [c0000001e2ccbdd0] [c0000000000459a0] .pSeries_idle+0x10/0x34
[   67.123717] [c0000001e2ccbe40] [c000000000014dc8] .cpu_idle+0x130/0x280
[   67.123738] [c0000001e2ccbee0] [c0000000006ffa8c] .start_secondary+0x378/0x384
[   67.123758] [c0000001e2ccbf90] [c00000000000936c] .start_secondary_prolog+0x10/0x14

hrtimer_start was added in 198fd638 and ae515197. The patch below tries
to use RCU_NONIDLE around it to avoid the above report.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-23 00:05:03 +01:00
Daniel Lezcano bf4d1b5ddb cpuidle: support multiple drivers
With the tegra3 and the big.LITTLE [1] new architectures, several cpus
with different characteristics (latencies and states) can co-exists on the
system.

The cpuidle framework has the limitation of handling only identical cpus.

This patch removes this limitation by introducing the multiple driver support
for cpuidle.

This option is configurable at compile time and should be enabled for the
architectures mentioned above. So there is no impact for the other platforms
if the option is disabled. The option defaults to 'n'. Note the multiple drivers
support is also compatible with the existing drivers, even if just one driver is
needed, all the cpu will be tied to this driver using an extra small chunk of
processor memory.

The multiple driver support use a per-cpu driver pointer instead of a global
variable and the accessor to this variable are done from a cpu context.

In order to keep the compatibility with the existing drivers, the function
'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register
the specified driver for all the cpus.

The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver
remains the same except the driver name will be related to the current cpu.

The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added
allowing to read the per cpu driver name.

[1] http://lwn.net/Articles/481055/

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:23 +01:00
Daniel Lezcano 13dd52f11a cpuidle: prepare the cpuidle core to handle multiple drivers
This patch is a preparation for the multiple cpuidle drivers support.

As the next patch will introduce the multiple drivers with the Kconfig
option and we want to keep the code clean and understandable, this patch
defines a set of functions for encapsulating some common parts and splits
what should be done under a lock from the rest.

[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 4168203271 cpuidle: move driver checking within the lock section
The code is racy and the check with cpuidle_curr_driver should be
done under the lock.

I don't find a path in the different drivers where that could happen
because the arch specific drivers are written in such way it is not
possible to register a driver while it is unregistered, except maybe
in a very improbable case when "intel_idle" and "processor_idle" are
competing. One could unregister a driver, while the other one is
registering.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 42f67f2aca cpuidle: move driver's refcount to cpuidle
We want to support different cpuidle drivers co-existing together.
In this case we should move the refcount to the cpuidle_driver
structure to handle several drivers at a time.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:22 +01:00
Daniel Lezcano 8f3e9953e1 cpuidle: fixup device.h header in cpuidle.h
The "struct device" is only used in sysfs.c.

The other .c files including the private header "cpuidle.h"
do not need to pull the entire headers tree from there as they
don't manipulate the "struct device".

This patch fixes this by moving the header inclusion to sysfs.c
and adding a forward declaration for the struct device.

The number of lines generated by the preprocesor:
Without this patch : 17269 loc
With this patch : 16446 loc

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:21 +01:00
Daniel Lezcano 349631e0e4 cpuidle / sysfs: move structure declaration into the sysfs.c file
The structure cpuidle_state_kobj is not used anywhere except
in the sysfs.c file. The definition of this structure is not
needed in the cpuidle header file. This patch moves it to the
sysfs.c file in order to encapsulate the code a bit more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:21 +01:00
Youquan Song c96ca4fb76 cpuidle: Get typical recent sleep interval
The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).

Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.

This should do something sane with a sleep interval series like:

	200 180 210 10000 30 1000 170 200

The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.

The original patch come from Rik van Riel <riel@redhat.com>.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song d73d68dc49 cpuidle: Set residency to 0 if target Cstate not enter
When cpuidle governor choose a C-state to enter for idle CPU, but it notice that
there is tasks request to be executed. So the idle CPU will not really enter
the target C-state and go to run task.

In this situation, it will use the residency of previous really entered target
C-states. Obviously, it is not reasonable.

So, this patch fix it by set the target C-state residency to 0.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song e11538d1f0 cpuidle: Quickly notice prediction failure in general case
The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

The patch extends to general case that prediction logic get a small predicted
residency, so it choose a shallow C-state though the expected residency is large
. Once the prediction will be fail, the CPU will keep staying at shallow C-state
for a long time. Acutally, the CPU has change enter into deep C-state.
So when the expected residency is long enough but governor choose a shallow
C-state, an timer will be added in order to monitor if the prediction failure.

When C-state is waken up prior to the adding timer, the timer will be cancelled
initiatively. When the timer is triggered and menu governor will quickly notice
prediction failure and re-evaluates deeper C-states possibility.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:20 +01:00
Youquan Song 69a37beabf cpuidle: Quickly notice prediction failure for repeat mode
The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

cpuidle menu governor has a method to predict the repeat pattern if there are 8
C-states residency which are continuous and the same or very close, so it will
predict the next C-states residency will keep same residency time.

There is a real case that turbostat utility (tools/power/x86/turbostat)
at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
 governor will predict it is repeat mode and there is another IPI wake up idle
 CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
idle. However, in the turbostat, following 10 registers reading is sleep 5
seconds by default, so the idle CPU will keep at C1 for a long time though it is
 idle until break event occurs.
In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
deep C-state stays at >99.98%.

In the patch, a timer is added when menu governor detects a repeat mode and
choose a shallow C-state. The timer is set to a time out value that greater
than predicted time, and we conclude repeat mode prediction failure if timer is
triggered. When repeat mode happens as expected, the timer is not triggered
and CPU waken up from C-states and it will cancel the timer initiatively.
When repeat mode does not happen, the timer will be time out and menu governor
will quickly notice that the repeat mode prediction fails and then re-evaluates
deeper C-states possibility.

Below is another case which will clearly show the patch much benefit:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help	-h  Print this help\n"
		"  --thread	-n  Thread number\n"
		"  --loop     	-l  Loop times in shallow Cstate\n"
		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
	int idle_num = 1;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}

}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];

	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
			case 'n':
				thread_num = atoi(optarg);
				break;
			case 'l':
				loop = atoi(optarg);
				break;
			case 't':
				delay = atoi(optarg);
				break;
			case 'h':
			default:
				usage();
				exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for(i = 0; i < thread_num ; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
After build the above test application, then run it.
Test plaform can be Intel Sandybridge or other recent platforms.
#./idle_predict -l 10 &
#./powertop

We will find that deep C-state will dangle between 40%~100% and much time spent
on C1 state. It is because menu governor wrongly predict that repeat mode
is kept, so it will choose the C1 shallow C-state even though it has chance to
sleep 1 second in deep C-state.

While after patched the kernel, we find that deep C-state will keep >99.6%.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Daniel Lezcano e45a00d679 cpuidle / sysfs: move kobj initialization in the syfs file
Move the kobj initialization and completion in the sysfs.c
and encapsulate the code more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Daniel Lezcano 1aef40e288 cpuidle / sysfs: change function parameter
The function needs the cpuidle_device which is initially passed to the
caller.

The current code gets the struct device from the struct cpuidle_device,
pass it the cpuidle_add_sysfs function. This function calls
per_cpu(cpuidle_devices, cpu) to get the cpuidle_device.

This patch pass the cpuidle_device instead and simplify the code.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
Rob Herring be6a98d3f0 cpuidle: add Calxeda SOC idle support
Add support for core powergating on Calxeda platforms. Initially, this
supports ECX-1000 (highbank), but support will be added for ECX-2000
later.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
2012-11-07 17:15:36 -06:00
Srivatsa S. Bhat cf31cd1a0c ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug
On a KVM guest, when a CPU is taken offline and brought back online, we hit
the following NULL pointer dereference:

[   45.400843] Unregister pv shared memory for cpu 1
[   45.412331] smpboot: CPU 1 is now offline
[   45.529894] SMP alternatives: lockdep: fixing up alternatives
[   45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1
[   45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock
[   45.571370] KVM setup async PF for cpu 1
[   45.572331] kvm-stealtime: cpu 1, msr 7d0e040
[   45.575031] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] PGD 5dfb067 PUD 5da8067 PMD 0
[   45.576017] Oops: 0000 [#1] SMP
[   45.576017] Modules linked in:
[   45.576017] CPU 0
[   45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs
[   45.576017] RIP: 0010:[<ffffffff81519f98>]  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017] RSP: 0018:ffff880005d93ce8  EFLAGS: 00010286
[   45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006
[   45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000
[   45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222
[   45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001
[   45.576017] FS:  00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
[   45.576017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0
[   45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40)
[   45.576017] Stack:
[   45.576017]  ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689
[   45.576017]  ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01
[   45.576017]  ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd
[   45.576017] Call Trace:
[   45.576017]  [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97
[   45.576017]  [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce
[   45.576017]  [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110
[   45.576017]  [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10
[   45.576017]  [<ffffffff81069050>] __cpu_notify+0x20/0x40
[   45.576017]  [<ffffffff81069085>] cpu_notify+0x15/0x20
[   45.576017]  [<ffffffff816978f1>] _cpu_up+0xee/0x137
[   45.576017]  [<ffffffff81697983>] cpu_up+0x49/0x59
[   45.576017]  [<ffffffff8168758d>] store_online+0x9d/0xe0
[   45.576017]  [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30
[   45.576017]  [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150
[   45.576017]  [<ffffffff811b389c>] vfs_write+0xac/0x180
[   45.576017]  [<ffffffff811b3be2>] sys_write+0x52/0xa0
[   45.576017]  [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b
[   45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00
[   45.576017] RIP  [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80
[   45.576017]  RSP <ffff880005d93ce8>
[   45.576017] CR2: 0000000000000000
[   45.656079] ---[ end trace 433d6c9ac0b02cef ]---

Analysis:
Commit 3d339dc (cpuidle / ACPI : move cpuidle_device field out of the
acpi_processor_power structure()) made the allocation of the dev structure
(struct cpuidle) of a CPU dynamic, whereas previously it was statically
allocated. And this dynamic allocation occurs in acpi_processor_power_init()
if pr->flags.power evaluates to non-zero.

On KVM guests, pr->flags.power evaluates to zero, hence dev is never
allocated. This causes the NULL pointer (dev) dereference in
cpuidle_disable_device() during a subsequent CPU online operation. Fix this
by ensuring that dev is non-NULL before dereferencing.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-10-08 22:52:54 -04:00
Daniel Lezcano ed953472d1 cpuidle: rename function name "__cpuidle_register_driver", v2
The function __cpuidle_register_driver name is confusing because it
suggests, conforming to the coding style of the kernel, it registers
the driver without taking a lock. Actually, it just fill the different
power field states with a decresing value if the power has not been
specified.

Clarify the purpose of the function by changing its name and
move the condition out of this function.

This patch fix nothing and does not change the behavior of the
function. It is just for the sake of clarity.

IHMO, reading in the code:

+       if (!drv->power_specified)
+               set_power_states(drv);

is much more explicit than:

-       __cpuidle_register_driver(drv);

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-22 00:38:32 +02:00
Daniel Lezcano a77de28662 cpuidle: remove some empty lines
This mindless patch is just about removing some trailing
carriage returns.

[rjw: Changed the subject.]

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-19 21:59:42 +02:00
Rafael J. Wysocki 66804c13f7 PM / cpuidle: Make ladder governor use the "disabled" state flag
For the mechanism introduced by commit cbc9ef0 (PM / Domains: Add
preliminary support for cpuidle, v2) to work with the ladder
governor, that governor should respect the "disabled" state flag
added by that commit.  Change the ladder governor accordingly.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-04 01:35:45 +02:00
Carsten Emde 62d6ae880e Honor state disabling in the cpuidle ladder governor
There are two cpuidle governors ladder and menu. While the ladder
governor is always available, if CONFIG_CPU_IDLE is selected, the
menu governor additionally requires CONFIG_NO_HZ.

A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes default and
always will walk through all sleep states - irrespective of whether the
C state was disabled via sysfs or not. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open as long as this setting was required - not very
practical and not suitable for setting a single core in an SMP system.

With this patch, the ladder governor only will promote to the next
C state, if it has not been disabled, and it will demote, if the
current C state was disabled.

Note that the patch does not make the setting of the sysfs variable
"disable" coherent, i.e. if one is disabling a light state, then all
deeper states are disabled as well, but the "disable" variable does not
reflect it. Likewise, if one enables a deep state but a lighter state
still is disabled, then this has no effect. A related section has been
added to the documentation.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-04 01:35:44 +02:00
Jon Medhurst (Tixy) 5fbbb90dfd cpuidle: Prevent null pointer dereference in cpuidle_coupled_cpu_notify
When a kernel is built to support multiple hardware types it's possible
that CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is set but the hardware the
kernel is run on doesn't support cpuidle and therefore doesn't load a
driver for it. In this case, when the system is shut down,
cpuidle_coupled_cpu_notify() gets called with cpuidle_devices set to
NULL. There are quite possibly other circumstances where this
situation can also occur and we should check for it.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-17 19:37:08 +02:00
Colin Cross 63c6ba4352 cpuidle: coupled: fix sleeping while atomic in cpu notifier
The cpu hotplug notifier gets called in both atomic and non-atomic
contexts, it is not always safe to lock a mutex.  Filter out all events
except the six necessary ones, which are all sleepable, before taking
the mutex.

Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-08-17 19:37:01 +02:00
Linus Torvalds 476525004a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI & power management update from Len Brown:
 "Re-write of the turbostat tool.
     lower overhead was necessary for measuring very large system when
     they are very idle.

  IVB support in intel_idle
     It's what I run on my IVB, others should be able to also:-)

  ACPICA core update
     We have found some bugs due to divergence between Linux and the
     upstream ACPICA base.  Most of these patches are to reduce that
     divergence to reduce the risk of future bugs.

  Some cpuidle updates, mostly for non-Intel
     More will be coming, as they depend on this part.

  Some thermal management changes needed by non-ACPI systems.

  Some _OST (OS Status Indication) updates for hot ACPI hot-plug."

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
  Thermal: Documentation update
  Thermal: Add Hysteresis attributes
  Thermal: Make Thermal trip points writeable
  ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
  tools/power: turbostat: fix large c1% issue
  tools/power: turbostat v2 - re-write for efficiency
  ACPICA: Update to version 20120711
  ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
  ACPICA: Update header files copyrights to 2012
  ACPICA: Add new ACPI table load/unload external interfaces
  ACPICA: Split file: tbxface.c -> tbxfload.c
  ACPICA: Add PCC address space to space ID decode function
  ACPICA: Fix some comment fields
  ACPICA: Table manager: deploy new firmware error/warning interfaces
  ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
  ACPICA: Split exception code utilities to a new file, utexcep.c
  ACPI: acpi_pad: tune round_robin_time
  ACPICA: Update to version 20120620
  ACPICA: Add support for implicit notify on multiple devices
  ACPICA: Update comments; no functional change
  ...
2012-07-26 14:28:55 -07:00
Len Brown ec033d0a02 Merge branches 'acpi_pad', 'acpica', 'apei-bugzilla-43282', 'battery', 'cpuidle-coupled', 'cpuidle-tweaks', 'intel_idle-ivb', 'ost', 'red-hat-bz-772730', 'thermal', 'thermal-spear' and 'turbostat-v2' into release 2012-07-26 00:03:58 -04:00
Rafael J. Wysocki 7791bd230c Merge branch 'pm-domains'
* pm-domains:
  PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
  PM / Domains: Replace plain integer with NULL pointer in domain.c file
  PM / Domains: Add missing static storage class specifier in domain.c file
  PM / Domains: Allow device callbacks to be added at any time
  PM / Domains: Add device domain data reference counter
  PM / Domains: Add preliminary support for cpuidle, v2
  PM / Domains: Do not stop devices after restoring their states
  PM / Domains: Use subsystem runtime suspend/resume callbacks by default
2012-07-19 00:03:17 +02:00
Preeti U Murthy 8651f97bd9 PM / cpuidle: System resume hang fix with cpuidle
On certain bios, resume hangs if cpus are allowed to enter idle states
during suspend [1].

This was fixed in apci idle driver [2].But intel_idle driver does not
have this fix. Thus instead of replicating the fix in both the idle
drivers, or in more platform specific idle drivers if needed, the
more general cpuidle infrastructure could handle this.

A suspend callback in cpuidle_driver could handle this fix. But
a cpuidle_driver provides only basic functionalities like platform idle
state detection capability and mechanisms to support entry and exit
into CPU idle states. All other cpuidle functions are found in the
cpuidle generic infrastructure for good reason that all cpuidle
drivers, irrepective of their platforms will support these functions.

One option therefore would be to register a suspend callback in cpuidle
which handles this fix. This could be called through a PM_SUSPEND_PREPARE
notifier. But this is too generic a notfier for a driver to handle.

Also, ideally the job of cpuidle is not to handle side effects of suspend.
It should expose the interfaces which "handle cpuidle 'during' suspend"
or any other operation, which the subsystems call during that respective
operation.

The fix demands that during suspend, no cpus should be allowed to enter
deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
ensures that. Not just that it also kicks all the cpus which are already
in idle out of their idle states which was being done during cpu hotplug
through a CPU_DYING_FROZEN callbacks.

Now the question arises about when during suspend should
cpuidle_uninstall_idle_handler() be called. Since we are dealing with
drivers it seems best to call this function during dpm_suspend().
Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg
operations. In dpm_suspend_noirq(), it would be wise to place this call
before suspend_device_irqs() to avoid ugly interactions with the same.

Ananlogously, during resume.

References:
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
[2] http://marc.info/?l=linux-pm&m=133958534231884&w=2

Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-10 21:34:49 +02:00
Rafael J. Wysocki cbc9ef0287 PM / Domains: Add preliminary support for cpuidle, v2
On some systems there are CPU cores located in the same power
domains as I/O devices.  Then, power can only be removed from the
domain if all I/O devices in it are not in use and the CPU core
is idle.  Add preliminary support for that to the generic PM domains
framework.

First, the platform is expected to provide a cpuidle driver with one
extra state designated for use with the generic PM domains code.
This state should be initially disabled and its exit_latency value
should be set to whatever time is needed to bring up the CPU core
itself after restoring power to it, not including the domain's
power on latency.  Its .enter() callback should point to a procedure
that will remove power from the domain containing the CPU core at
the end of the CPU power transition.

The remaining characteristics of the extra cpuidle state, referred to
as the "domain" cpuidle state below, (e.g. power usage, target
residency) should be populated in accordance with the properties of
the hardware.

Next, the platform should execute genpd_attach_cpuidle() on the PM
domain containing the CPU core.  That will cause the generic PM
domains framework to treat that domain in a special way such that:

 * When all devices in the domain have been suspended and it is about
   to be turned off, the states of the devices will be saved, but
   power will not be removed from the domain.  Instead, the "domain"
   cpuidle state will be enabled so that power can be removed from
   the domain when the CPU core is idle and the state has been chosen
   as the target by the cpuidle governor.

 * When the first I/O device in the domain is resumed and
   __pm_genpd_poweron(() is called for the first time after
   power has been removed from the domain, the "domain" cpuidle
   state will be disabled to avoid subsequent surprise power removals
   via cpuidle.

The effective exit_latency value of the "domain" cpuidle state
depends on the time needed to bring up the CPU core itself after
restoring power to it as well as on the power on latency of the
domain containing the CPU core.  Thus the "domain" cpuidle state's
exit_latency has to be recomputed every time the domain's power on
latency is updated, which may happen every time power is restored
to the domain, if the measured power on latency is greater than
the latency stored in the corresponding generic_pm_domain structure.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Kevin Hilman <khilman@ti.com>
2012-07-03 19:07:42 +02:00
Rafael J. Wysocki 6e797a0788 PM / cpuidle: Add driver reference counter
Add a reference counter for the cpuidle driver, so that it can't
be unregistered when it is in use.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-03 19:06:25 +02:00
ShuoX Liu dc7fd275ae cpuidle: move field disable from per-driver to per-cpu
Andrew J.Schorr raises a question.  When he changes the disable setting on
a single CPU, it affects all the other CPUs.  Basically, currently, the
disable field is per-driver instead of per-cpu.  All the C states of the
same driver are shared by all CPU in the same machine.

The patch changes the `disable' field to per-cpu, so we could set this
separately for each cpu.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-07-03 19:05:31 +02:00
Colin Cross 20ff51a36b cpuidle: coupled: add parallel barrier function
Adds cpuidle_coupled_parallel_barrier, which can be used by coupled
cpuidle state enter functions to handle resynchronization after
determining if any cpu needs to abort.  The normal use case will
be:

static bool abort_flag;
static atomic_t abort_barrier;

int arch_cpuidle_enter(struct cpuidle_device *dev, ...)
{
	if (arch_turn_off_irq_controller()) {
	   	/* returns an error if an irq is pending and would be lost
		   if idle continued and turned off power */
		abort_flag = true;
	}

	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);

	if (abort_flag) {
	   	/* One of the cpus didn't turn off it's irq controller */
	   	arch_turn_on_irq_controller();
		return -EINTR;
	}

	/* continue with idle */
	...
}

This will cause all cpus to abort idle together if one of them needs
to abort.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:49:36 -04:00
Colin Cross 4126c0197b cpuidle: add support for states that affect multiple cpus
On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
cpus cannot be independently powered down, either due to
sequencing restrictions (on Tegra 2, cpu 0 must be the last to
power down), or due to HW bugs (on OMAP4460, a cpu powering up
will corrupt the gic state unless the other cpu runs a work
around).  Each cpu has a power state that it can enter without
coordinating with the other cpu (usually Wait For Interrupt, or
WFI), and one or more "coupled" power states that affect blocks
shared between the cpus (L2 cache, interrupt controller, and
sometimes the whole SoC).  Entering a coupled power state must
be tightly controlled on both cpus.

The easiest solution to implementing coupled cpu power states is
to hotplug all but one cpu whenever possible, usually using a
cpufreq governor that looks at cpu load to determine when to
enable the secondary cpus.  This causes problems, as hotplug is an
expensive operation, so the number of hotplug transitions must be
minimized, leading to very slow response to loads, often on the
order of seconds.

This file implements an alternative solution, where each cpu will
wait in the WFI state until all cpus are ready to enter a coupled
state, at which point the coupled state function will be called
on all cpus at approximately the same time.

Once all cpus are ready to enter idle, they are woken by an smp
cross call.  At this point, there is a chance that one of the
cpus will find work to do, and choose not to enter idle.  A
final pass is needed to guarantee that all cpus will call the
power state enter function at the same time.  During this pass,
each cpu will increment the ready counter, and continue once the
ready counter matches the number of online coupled cpus.  If any
cpu exits idle, the other cpus will decrement their counter and
retry.

To use coupled cpuidle states, a cpuidle driver must:

   Set struct cpuidle_device.coupled_cpus to the mask of all
   coupled cpus, usually the same as cpu_possible_mask if all cpus
   are part of the same cluster.  The coupled_cpus mask must be
   set in the struct cpuidle_device for each cpu.

   Set struct cpuidle_device.safe_state to a state that is not a
   coupled state.  This is usually WFI.

   Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
   state that affects multiple cpus.

   Provide a struct cpuidle_state.enter function for each state
   that affects multiple cpus.  This function is guaranteed to be
   called on all cpus at approximately the same time.  The driver
   should ensure that the cpus all abort together if any cpu tries
   to abort once the function is called.

update1:

cpuidle: coupled: fix count of online cpus

online_count was never incremented on boot, and was also counting
cpus that were not part of the coupled set.  Fix both issues by
introducting a new function that counts online coupled cpus, and
call it from register as well as the hotplug notifier.

update2:

cpuidle: coupled: fix decrementing ready count

cpuidle_coupled_set_not_ready sometimes refuses to decrement the
ready count in order to prevent a race condition.  This makes it
unsuitable for use when finished with idle.  Add a new function
cpuidle_coupled_set_done that decrements both the ready count and
waiting count, and call it after idle is complete.

Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:49:09 -04:00
Colin Cross 3af272ab75 cpuidle: fix error handling in __cpuidle_register_device
Fix the error handling in __cpuidle_register_device to include
the missing list_del.  Move it to a label, which will simplify
the error handling when coupled states are added.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:48:49 -04:00
Colin Cross 56cfbf74a1 cpuidle: refactor out cpuidle_enter_state
Split the code to enter a state and update the stats into a helper
function, cpuidle_enter_state, and export it.  This function will
be called by the coupled state code to handle entering the safe
state and the final coupled state.

Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-02 00:48:31 -04:00
Srivatsa S. Bhat 1b0a0e9a15 cpuidle: add checks to avoid NULL pointer dereference
The existing check for dev == NULL in __cpuidle_register_device() is
rendered useless because dev is dereferenced before the check itself.
Moreover, correctly speaking, it is the job of the callers of this
function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
also happen to be exported functions) to ensure that
__cpuidle_register_device() is called with a non-NULL dev.

So add the necessary dev == NULL checks in the two callers and remove the
(useless) check from __cpuidle_register_device().

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-01 16:07:23 -04:00
Sergey Senozhatsky 0aeb9cac6f cpuidle: remove unused hrtimer_peek_ahead_timers() call
commit 9a6558371b
  Author: Arjan van de Ven <arjan@linux.intel.com>
  Date:   Sun Nov 9 12:45:10 2008 -0800

     regression: disable timer peek-ahead for 2.6.28

     It's showing up as regressions; disabling it very likely just papers
     over an underlying issue, but time is running out for 2.6.28, lets get
     back to this for 2.6.29

 Many years has passed since 2008, so it seems ok to remove whole `#if 0' block.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-01 16:06:48 -04:00
Thomas Gleixner 4a1625133d cpuidle: Use kick_all_cpus_sync()
kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
which is copied all over the arch code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de
2012-05-08 12:35:06 +02:00
Len Brown eeaab2d8af Merge branches 'idle-fix' and 'misc' into release 2012-04-06 21:48:59 -04:00
Toshi Kani ee01e66337 cpuidle: Fix panic in CPU off-lining with no idle driver
Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered.  A cpuidle
driver may be registered at boot-time based on CPU type.  This patch
allows an off-lined CPU to enter HLT-based idle in this condition.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-04-06 15:01:25 -04:00