-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZQspmAAoJEE44bZycYXAvLXMP/3Uqx7K7dGjHvvhGA4DhnzSp
bGLpjeP1sXXnnd932PN+qkGbl2j/NPjS74DobDqGWnrwxKRzQ21F4YkWJGtb4Pe2
JKcY7y2rbKGcwhpS9qDMkSWuaUKJWF5MAsH08LnCWqlGphGwAH/uPTdqS4iI/CJM
aQvaaITe5SVzvpvpyoCVdHqu8K+Ukraf91mvt7hlmrn9OnqO9us9MWulw5sSXQcd
pM8ZbRkBDE5OFeVnPKJDBY+cR2ML41wekMMwvJWt7uRyrX2i5c7oQVXYoeYE4MKx
Pueb7aG7LQwBUzNJCiZA6PAEFQPwNPCoxHZbAax0D6/JyDWOZukappquzjd6gLDM
+U7mxeFTeNZJ5v9tUcUIOb4GaaFcccS3wdDP23V2N8iM88hFVwJn0RSy/pksX37+
ZNDiEyDeJBjz3kh/Kf40zhFIIrABMozFeX3tpSRVVqXb+T6P9l8Y88O2LGY5FCXK
QBbAC+jC4X4YI+4v+QWImg9mkfTwzZyjyAlfyjPlHVSK9KDP9M6LXpr2+jKS7jOc
ievMOh9ku0HIVuSWGUKZSqjvcF01Bh99tFlX+KqipomwNTwa4hKCLmnOVflF1BPE
8sfD9hvenA0e949kXrURUmqpg6Ujkrbb/lXuD7e2CakCu+XjEMf317R11TyTsHNG
10hsmPsGDVcwbyFOFHS3
=mvzl
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEJDfLduVEy2qz2d/TmXOSYMtstxYFAlpqfEUACgkQmXOSYMts
txbJOQ/+Pce1eBSgjESWKuz0OP9BfAe9RpWFi7lBZ/EgRwJVYEx6jau9EYXAQ7YT
roCIsV6eufhMplYGHJz6EHxK2Hieb1zG9ooX9ss9GxiB6qmqeqC0Slm9EQE15yGT
px3fVz9r86edqjtj7UKK0/n8DJUaFh5LWOymLD3d3/115RYQsl/GowugH9F79PvN
pR+OyXq7srtfCmwdhZ65012Ef10RXqBRv0fCYBH6r+jkMqb7uSDFzdR39Z7k3QFk
AM4+3lTm6EEZ4xZkcMyX3GuQWslpPAlvFdEx43TjdCbseXAqURoppmxvz+Izum75
fy0oOdKl5OSpyZArRkUfZ0MnL6BHGcKxwYV4u1LupwvqPyaUT4yiT5VEUdy9EqJo
Syrr0oSR2lrXqQESdxKkmOZVXyul0nF3Fh1p5QlU1/Id9oskMLYqcXegFyhr2Wyp
+A4ZozljEQ4AGm4dYFdH3w8TcNDttjztYoKf8OXnaCOj3p/SEq84tk4Hm3vpoPvh
5OzsZC3UB9gJ1mXsKOVKLJFCPzmg61KOvwhopfAcC6cyiIIf/MPCneZeOzsavtQX
J+atSNcLVNE3jmrXvUrwxSpZ3KCc3Ti5Q8pD9ni6/B6st2+LO8EXPrS6n2+28nvu
hVpjyCXLbghdmn1mjOGW9lvMQEg/Dupj/ocpCPHJnXpbpM8Mcjo=
=3eAv
-----END PGP SIGNATURE-----
Merge 3.10.106 into android-msm-bullhead-3.10-oreo-m5
Changes in 3.10.106: (252 commits)
packet: fix race condition in packet_set_ring
crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks
EVM: Use crypto_memneq() for digest comparisons
libceph: don't set weight to IN when OSD is destroyed
KVM: x86: fix emulation of "MOV SS, null selector"
KVM: x86: Introduce segmented_write_std
posix_acl: Clear SGID bit when setting file permissions
tmpfs: clear S_ISGID when setting posix ACLs
fbdev: color map copying bounds checking
selinux: fix off-by-one in setprocattr
tcp: avoid infinite loop in tcp_splice_read()
xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
KEYS: Change the name of the dead type to ".dead" to prevent user access
KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
ext4: fix data exposure after a crash
locking/rtmutex: Prevent dequeue vs. unlock race
m68k: Fix ndelay() macro
hotplug: Make register and unregister notifier API symmetric
Btrfs: fix tree search logic when replaying directory entry deletes
USB: serial: kl5kusb105: fix open error path
block_dev: don't test bdev->bd_contains when it is not stable
crypto: caam - fix AEAD givenc descriptors
ext4: fix mballoc breakage with 64k block size
ext4: fix stack memory corruption with 64k block size
ext4: reject inodes with negative size
ext4: return -ENOMEM instead of success
f2fs: set ->owner for debugfs status file's file_operations
block: protect iterate_bdevs() against concurrent close
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
scsi: zfcp: do not trace pure benign residual HBA responses at default level
scsi: zfcp: fix rport unblock race with LUN recovery
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
IB/mad: Fix an array index check
IB/multicast: Check ib_find_pkey() return value
powerpc: Convert cmp to cmpd in idle enter sequence
usb: gadget: composite: Test get_alt() presence instead of set_alt()
USB: serial: omninet: fix NULL-derefs at open and disconnect
USB: serial: quatech2: fix sleep-while-atomic in close
USB: serial: pl2303: fix NULL-deref at open
USB: serial: keyspan_pda: verify endpoints at probe
USB: serial: spcp8x5: fix NULL-deref at open
USB: serial: io_ti: fix NULL-deref at open
USB: serial: io_ti: fix another NULL-deref at open
USB: serial: iuu_phoenix: fix NULL-deref at open
USB: serial: garmin_gps: fix memory leak on failed URB submit
USB: serial: ti_usb_3410_5052: fix NULL-deref at open
USB: serial: io_edgeport: fix NULL-deref at open
USB: serial: oti6858: fix NULL-deref at open
USB: serial: cyberjack: fix NULL-deref at open
USB: serial: kobil_sct: fix NULL-deref in write
USB: serial: mos7840: fix NULL-deref at open
USB: serial: mos7720: fix NULL-deref at open
USB: serial: mos7720: fix use-after-free on probe errors
USB: serial: mos7720: fix parport use-after-free on probe errors
USB: serial: mos7720: fix parallel probe
usb: xhci-mem: use passed in GFP flags instead of GFP_KERNEL
usb: musb: Fix trying to free already-free IRQ 4
ALSA: usb-audio: Fix bogus error return in snd_usb_create_stream()
USB: serial: kl5kusb105: abort on open exception path
staging: iio: ad7606: fix improper setting of oversampling pins
usb: dwc3: gadget: always unmap EP0 requests
cris: Only build flash rescue image if CONFIG_ETRAX_AXISFLASHMAP is selected
hwmon: (ds620) Fix overflows seen when writing temperature limits
clk: clk-wm831x: fix a logic error
iommu/amd: Fix the left value check of cmd buffer
scsi: mvsas: fix command_active typo
target/iscsi: Fix double free in lio_target_tiqn_addtpg()
mmc: mmc_test: Uninitialized return value
powerpc/pci/rpadlpar: Fix device reference leaks
ser_gigaset: return -ENOMEM on error instead of success
net, sched: fix soft lockup in tc_classify
net: stmmac: Fix race between stmmac_drv_probe and stmmac_open
gro: Enter slow-path if there is no tailroom
gro: use min_t() in skb_gro_reset_offset()
gro: Disable frag0 optimization on IPv6 ext headers
powerpc: Fix build warning on 32-bit PPC
Input: i8042 - add Pegatron touchpad to noloop table
mm/hugetlb.c: fix reservation race when freeing surplus pages
USB: serial: kl5kusb105: fix line-state error handling
USB: serial: ch341: fix initial modem-control state
USB: serial: ch341: fix open error handling
USB: serial: ch341: fix control-message error handling
USB: serial: ch341: fix open and resume after B0
USB: serial: ch341: fix resume after reset
USB: serial: ch341: fix modem-control and B0 handling
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
powerpc/ibmebus: Fix further device reference leaks
powerpc/ibmebus: Fix device reference leaks in sysfs interface
IB/mlx4: Set traffic class in AH
IB/mlx4: Fix port query for 56Gb Ethernet links
perf scripting: Avoid leaking the scripting_context variable
ARM: dts: imx31: fix clock control module interrupts description
svcrpc: don't leak contexts on PROC_DESTROY
mmc: mxs-mmc: Fix additional cycles after transmission stop
mtd: nand: xway: disable module support
ubifs: Fix journal replay wrt. xattr nodes
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
ite-cir: initialize use_demodulator before using it
fuse: do not use iocb after it may have been freed
crypto: caam - fix non-hmac hashes
drm/i915: Don't leak edid in intel_crt_detect_ddc()
s5k4ecgx: select CRC32 helper
platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT
net: fix harmonize_features() vs NETIF_F_HIGHDMA
tcp: initialize max window for a new fastopen socket
svcrpc: fix oops in absence of krb5 module
ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
mac80211: Fix adding of mesh vendor IEs
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
drm/i915: fix use-after-free in page_flip_completed()
net: use a work queue to defer net_disable_timestamp() work
ipv4: keep skb->dst around in presence of IP options
netlabel: out of bound access in cipso_v4_validate()
ip6_gre: fix ip6gre_err() invalid reads
ping: fix a null pointer dereference
l2tp: do not use udp_ioctl()
packet: fix races in fanout_add()
packet: Do not call fanout_release from atomic contexts
net: socket: fix recvmmsg not returning error from sock_error
USB: serial: mos7840: fix another NULL-deref at open
USB: serial: ftdi_sio: fix modem-status error handling
USB: serial: ftdi_sio: fix extreme low-latency setting
USB: serial: ftdi_sio: fix line-status over-reporting
USB: serial: spcp8x5: fix modem-status handling
USB: serial: opticon: fix CTS retrieval at open
USB: serial: ark3116: fix register-accessor error handling
x86/platform/goldfish: Prevent unconditional loading
goldfish: Sanitize the broken interrupt handler
ocfs2: do not write error flag to user structure we cannot copy from/to
mfd: pm8921: Potential NULL dereference in pm8921_remove()
drm/nv50/disp: min/max are reversed in nv50_crtc_gamma_set()
net: 6lowpan: fix lowpan_header_create non-compression memcpy call
vti4: Don't count header length twice.
net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames
MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
MIPS: Clear ISA bit correctly in get_frame_info()
MIPS: Prevent unaligned accesses during stack unwinding
MIPS: Fix get_frame_info() handling of microMIPS function size
MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
MIPS: Calculate microMIPS ra properly when unwinding the stack
MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
uvcvideo: Fix a wrong macro
scsi: aacraid: Reorder Adapter status check
ath9k: use correct OTP register offsets for the AR9340 and AR9550
fuse: add missing FR_FORCE
RDMA/core: Fix incorrect structure packing for booleans
NFSv4: fix getacl head length estimation
s390/qdio: clear DSCI prior to scanning multiple input queues
IB/ipoib: Fix deadlock between rmmod and set_mode
ktest: Fix child exit code processing
nlm: Ensure callback code also checks that the files match
dm: flush queued bios when process blocks to avoid deadlock
USB: serial: digi_acceleport: fix OOB data sanity check
USB: serial: digi_acceleport: fix OOB-event processing
MIPS: ip27: Disable qlge driver in defconfig
tracing: Add #undef to fix compile error
USB: serial: safe_serial: fix information leak in completion handler
USB: serial: omninet: fix reference leaks at open
USB: iowarrior: fix NULL-deref at probe
USB: iowarrior: fix NULL-deref in write
USB: serial: io_ti: fix NULL-deref in interrupt callback
USB: serial: io_ti: fix information leak in completion handler
vxlan: correctly validate VXLAN ID against VXLAN_N_VID
ipv4: mask tos for input route
locking/static_keys: Add static_key_{en,dis}able() helpers
net: net_enable_timestamp() can be called from irq contexts
dccp/tcp: fix routing redirect race
net sched actions: decrement module reference count after table flush.
perf/core: Fix event inheritance on fork()
isdn/gigaset: fix NULL-deref at probe
xen: do not re-use pirq number cached in pci device msi msg data
net: properly release sk_frag.page
net: unix: properly re-increment inflight counter of GC discarded candidates
Input: ims-pcu - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
USB: uss720: fix NULL-deref at probe
USB: idmouse: fix NULL-deref at probe
USB: wusbcore: fix NULL-deref at probe
uwb: i1480-dfu: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
mmc: ushc: fix NULL-deref at probe
ext4: mark inode dirty after converting inline directory
scsi: libsas: fix ata xfer length
ALSA: ctxfi: Fallback DMA mask to 32bit
ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
ACPI / PNP: Avoid conflicting resource reservations
ACPI / resources: free memory on error in add_region_before()
ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
USB: OHCI: Fix race between ED unlink and URB submission
i2c: at91: manage unexpected RXRDY flag when starting a transfer
ipv4: igmp: Allow removing groups from a removed interface
ptrace: fix PTRACE_LISTEN race corrupting task->state
ring-buffer: Fix return value check in test_ringbuffer()
metag/usercopy: Fix alignment error checking
metag/usercopy: Add early abort to copy_to_user
metag/usercopy: Set flags before ADDZ
metag/usercopy: Fix src fixup in from user rapf loops
metag/usercopy: Add missing fixups
s390/decompressor: fix initrd corruption caused by bss clear
net/mlx4_en: Fix bad WQE issue
net/mlx4_core: Fix racy CQ (Completion Queue) free
char: Drop bogus dependency of DEVPORT on !M68K
powerpc: Disable HFSCR[TM] if TM is not supported
pegasus: Use heap buffers for all register access
rtl8150: Use heap buffers for all register access
tracing: Allocate the snapshot buffer before enabling probe
ring-buffer: Have ring_buffer_iter_empty() return true when empty
netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel
net: phy: handle state correctly in phy_stop_machine
l2tp: take reference on sessions being dumped
MIPS: KGDB: Use kernel context for sleeping threads
ARM: dts: imx31: move CCM device node to AIPS2 bus devices
ARM: dts: imx31: fix AVIC base address
tun: Fix TUN_PKT_STRIP setting
Staging: vt6655-6: potential NULL dereference in hostap_disable_hostapd()
net: sctp: rework multihoming retransmission path selection to rfc4960
perf trace: Use the syscall raw_syscalls:sys_enter timestamp
USB: usbtmc: add missing endpoint sanity check
ping: implement proper locking
USB: fix problems with duplicate endpoint addresses
USB: dummy-hcd: fix bug in stop_activity (handle ep0)
mm/init: fix zone boundary creation
can: Fix kernel panic at security_sock_rcv_skb
Drivers: hv: avoid vfree() on crash
xc2028: avoid use after free
xc2028: unlock on error in xc2028_set_config()
xc2028: Fix use-after-free bug properly
ipv6: fix ip6_tnl_parse_tlv_enc_lim()
ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
ipv6: fix the use of pcpu_tstats in ip6_tunnel
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
sctp: deny peeloff operation on asocs with threads sleeping on it
KVM: x86: clear bus pointer when destroyed
kvm: exclude ioeventfd from counting kvm_io_range limit
KVM: kvm_io_bus_unregister_dev() should never fail
TTY: n_hdlc, fix lockdep false positive
tty: n_hdlc: get rid of racy n_hdlc.tbuf
ipv6: handle -EFAULT from skb_copy_bits
fs: exec: apply CLOEXEC before changing dumpable task flags
mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
dccp/tcp: do not inherit mc_list from parent
char: lp: fix possible integer overflow in lp_setup()
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
Linux 3.10.106
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Conflicts:
drivers/mfd/pm8921-core.c
include/linux/cpu.h
kernel/cpu.c
net/ipv4/inet_connection_sock.c
net/ipv4/ping.c
commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.
Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
notifiers when HOTPLUG_CPU=y while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might keep a stale notifier on the list on the manual clean up during
the pool tear down and thus corrupt the list. Resulting in the following
[ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
<snipped>
[ 145.122628] Call Trace:
[ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
[ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
[ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
[ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
[ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
[ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
[ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
[ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
[ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30
[ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
[ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
[ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40
[ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
[ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
[ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
This can be even triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.
Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.
[js] backport to 3.12
Fixes: 47e627bc8c ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Willy Tarreau <w@1wt.eu>
In case some sysfs nodes needs to be labeled with a different label than
sysfs then user needs to be notified when a core is brought back online.
Bug: 29359497
Change-Id: I0395c86e01cd49c348fda8f93087d26f88557c91
Signed-off-by: Thierry Strudel <tstrudel@google.com>
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
The deadlock is shown below:
CPU 0 CPU 1
----- -----
Acquire cpu_hotplug.lock
[via get_online_cpus()]
CPU online/offline operation
takes cpu_add_remove_lock
[via cpu_maps_update_begin()]
Try to acquire
cpu_add_remove_lock
[via register_cpu_notifier()]
CPU online/offline operation
tries to acquire cpu_hotplug.lock
[via cpu_hotplug_begin()]
*** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.
This can be achieved by writing the callback registration code as follows:
cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
for_each_online_cpu(cpu)
init_cpu(cpu);
/* This doesn't take the cpu_add_remove_lock */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.
Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.
Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Git-commit: 93ae4f978ca7f26d17df915ac7afc919c1dd0353
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>
Lai found that:
WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
...
migration_cpu_stop+0x1d/0x22
was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.
This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").
So set active and online at the same time to avoid this particular
problem.
CRs-Fixed: 680496
Change-Id: I89ac9b6829acf200072975bc7d028a469167f083
Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 24d52daafc
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
commit 6acbfb96976fc3350e30d964acb1dbbdf876d55e upstream.
Lai found that:
WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
...
migration_cpu_stop+0x1d/0x22
was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.
This isn't true since 5fbd036b55 ("sched: Cleanup cpu_active madness").
So set active and online at the same time to avoid this particular
problem.
Fixes: 5fbd036b55 ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is no need for sync_sched() in _cpu_down as stop_machine()
provides that barrier implicitly. Removing it also helps improve
hot-unplug latency.
The sync_sched/rcu were earlier removed for the same reason by the
commit 9ee349ad6d, but recently got
added as part of commit c4575f83b9.
CRs-Fixed: 667325
Change-Id: I97763004454d082d3cc2d9d9dbef7da923608600
Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>
Commit 6acce3ef8:
sched: Remove get_online_cpus() usage
tries to do sync_sched/rcu() inside _cpu_down() but triggers:
INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[<ffffffff811263dc>] synchronize_rcu+0x2c/0x30
[<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340
...
It was caused by that in the rcu boost case we rely on smpboot thread to
finish the rcu callback, which has already been parked before sync in here
and leads to the endless sync_sched/rcu().
This patch exchanges the sequence of smpboot_park_threads() and
sync_sched/rcu() to fix the bug.
CRs-fixed: 647141
Change-Id: Ia8b325d7c3c778f428d9fb51271977c849df32ad
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 106dd5afde3cd10db7e1370b6ddc77f0b2496a75
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>
Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:
- sched_init_smp(); where its completely superfluous since we're in
'early' boot and there simply cannot be any hotplugging.
- sched_getaffinity(); we already take a raw spinlock to protect the
task cpus_allowed mask, this disables preemption and therefore
also stabilizes cpu_online_mask as that's modified using
stop_machine. However switch to active mask for symmetry with
sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
mask stability by inserting sync_rcu/sched() into _cpu_down.
- sched_setaffinity(); we don't appear to need get_online_cpus()
either, there's two sites where hotplug appears relevant:
* cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
for the cpuset case we hold task_lock, which is a spinlock and
thus for mainline disables preemption (might cause pain on RT).
* set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
preemption properly disabled; also it already deals with hotplug
races explicitly where it releases them.
- migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
us with a little trickery. By adding a sync_sched/rcu() after the
CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
cpu_active_mask. Use these to validate that both our cpus are active
when queueing the stop work before we queue the stop_machine works
for take_cpu_down().
CRs-fixed: 647141
Change-Id: Id41e66659574f716de0e7c29f477e56a86db9404
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 6acce3ef84520537f8a09a12c9ddbe814a584dd2
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[kaushalk@codeaurora.org: get_online_cpus has only 3 sites of usage in
kernel/sched/core.c of msm-3.10 so migrate_swap changes are not
applicable here. stop_two_cpus related change is not applicable to
msm-3.10, so skip it.]
Signed-off-by: Kaushal Kumar <kaushalk@codeaurora.org>
* qandroid-3.10: (636 commits)
netfilter: xt_qtaguid: Protect iface list access with necessary lock
HID: magicmouse: Fix build warning
USB: gadget: mtp: Fix OUT endpoint request length usage in read
USB: gadget: f_mtp: Fix using tx buffer pointer
msm: Fix race condition in domain lookup
msm: Add null-pointer checks for domains
base: sync: increase size of sync_timeline name
USB: gadget: mtp: Add module parameters for Tx transfer length
msm: iommu: Lock the genpool allocation
gpu: ion: fix page offset in dma_buf_kmap()
gpu: ion: Fix bug in ion_system_heap map_user
gpu: ion: Only map as much of the vma as the user requested
gpu: ion: use vmalloc to allocate page array to map kernel
gpu: ion: Remove dead comments
gpu: ion: Minimize allocation fallback delay
mmc: sd: Set the card removed if card detect fails
gpu: ion: don't fault in individual pages for the CP heap
gpu: ion: do not ask for compound pages in system heap
gpu: ion: Modify the system heap to try to allocate large/huge pages
gpu: ion: Set the dma_address of the sg list at alloc time
...
Conflicts:
arch/arm/Kconfig
arch/arm/include/asm/hardware/cache-l2x0.h
arch/arm/mm/cache-l2x0.c
drivers/mmc/card/block.c
drivers/usb/gadget/udc-core.c
Add ftrace event trace_sched_cpu_hotplug to track cpu
hot-add and hot-remove events.
This is useful in a variety of power, performance and
debug analysis scenarios.
Change-Id: I5d202c7a229ffacc3aafb7cf9afee0b0ee7b0931
Signed-off-by: Arun Bharadwaj <abharadw@codeaurora.org>
Move the x86_64 idle notifiers originally by Andi Kleen and Venkatesh
Pallipadi to generic.
Change-Id: Idf29cda15be151f494ff245933c12462643388d5
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Todd Poynor <toddpoynor@google.com>
There are instances in the kernel where we would like to disable CPU
hotplug (from sysfs) during some important operation. Today the freezer
code depends on this and the code to do it was kinda tailor-made for
that.
Restructure the code and make it generic enough to be useful for other
usecases too.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull preparatory smp/hotplug patches from Ingo Molnar:
"Some early preparatory changes for the WIP hotplug rework by Thomas
Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Use smpboot threads
stop_machine: Store task reference in a separate per cpu variable
smpboot: Allow selfparking per cpu threads
Use the smpboot thread infrastructure. Mark the stopper thread
selfparking and park it after it has finished the take_cpu_down()
work.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is in preparation for the full dynticks feature. While
remotely reading the cputime of a task running in a full
dynticks CPU, we'll need to do some extra-computation. This
way we can account the time it spent tickless in userspace
since its last cputime snapshot.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Pull x86 BSP hotplug changes from Ingo Molnar:
"This tree enables CPU#0 (the boot processor) to be onlined/offlined on
x86, just like any other CPU. Enabled on Intel CPUs for now.
Allowing this required the identification and fixing of latent CPU#0
assumptions (such as CPU#0 initializations, etc.) in the x86
architecture code, plus the identification of barriers to
BSP-offlining, such as active PIC interrupts which can only be
serviced on the BSP.
It's behind a default-off option, and there's a debug option that
allows the automatic testing of this feature.
The motivation of this feature is to allow and prepare for true
CPU-hotplug hardware support: recent changes to MCE support enable us
to detect a deteriorating but not yet hard-failing L1/L2 cache on a
CPU that could be soft-unplugged - or a failing L3 cache on a
multi-socket system.
Note that true hardware hot-plug is not yet fully enabled by this,
because that requires a special platform wakeup sequence to be sent to
the freshly powered up CPU#0. Future patches for this are planned,
once such a platform exists. Chicken and egg"
* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, topology: Debug CPU0 hotplug
x86/i387.c: Initialize thread xstate only on CPU0 only once
x86, hotplug: Handle retrigger irq by the first available CPU
x86, hotplug: The first online processor saves the MTRR state
x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
x86-32, hotplug: Add start_cpu0() entry point to head_32.S
x86-64, hotplug: Add start_cpu0() entry point to head_64.S
kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
x86, hotplug, suspend: Online CPU0 for suspend or hibernate
x86, hotplug: Support functions for CPU0 online/offline
x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
x86, Kconfig: Add config switch for CPU0 hotplug
doc: Add x86 CPU0 online/offline feature
Even if acpi_processor_handle_eject() offlines cpu, there is a chance
to online the cpu after that. So the patch closes the window by using
get/put_online_cpus().
Why does the patch change _cpu_up() logic?
The patch cares the race of hot-remove cpu and _cpu_up(). If the patch
does not change it, there is the following race.
hot-remove cpu | _cpu_up()
------------------------------------- ------------------------------------
call acpi_processor_handle_eject() |
call cpu_down() |
call get_online_cpus() |
| call cpu_hotplug_begin() and stop here
call arch_unregister_cpu() |
call acpi_unmap_lsapic() |
call put_online_cpus() |
| start and continue _cpu_up()
return acpi_processor_remove() |
continue hot-remove the cpu |
So _cpu_up() can continue to itself. And hot-remove cpu can also continue
itself. If the patch changes _cpu_up() logic, the race disappears as below:
hot-remove cpu | _cpu_up()
-----------------------------------------------------------------------
call acpi_processor_handle_eject() |
call cpu_down() |
call get_online_cpus() |
| call cpu_hotplug_begin() and stop here
call arch_unregister_cpu() |
call acpi_unmap_lsapic() |
cpu's cpu_present is set |
to false by set_cpu_present()|
call put_online_cpus() |
| start _cpu_up()
| check cpu_present() and return -EINVAL
return acpi_processor_remove() |
continue hot-remove the cpu |
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpu_hotplug_pm_callback should have higher priority than
bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug
to avoid race during bsp online checking.
This is to hightlight the priorities between the two callbacks in case people
may overlook the order.
Ideally the priorities should be defined in macro/enum instead of fixed values.
To do that, a seperate patchset may be pushed which will touch serveral other
generic files and is out of scope of this patchset.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.com
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The synchronization between CPU hotplug readers and writers is achieved
by means of refcounting, safeguarded by the cpu_hotplug.lock.
get_online_cpus() increments the refcount, whereas put_online_cpus()
decrements it. If we ever hit an imbalance between the two, we end up
compromising the guarantees of the hotplug synchronization i.e, for
example, an extra call to put_online_cpus() can end up allowing a
hotplug reader to execute concurrently with a hotplug writer.
So, add a WARN_ON() in put_online_cpus() to detect such cases where the
refcount can go negative, and also attempt to fix it up, so that we can
continue to run.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86/asm changes from Ingo Molnar:
"The one change that stands out is the alternatives patching change
that prevents us from ever patching back instructions from SMP to UP:
this simplifies things and speeds up CPU hotplug.
Other than that it's smaller fixes, cleanups and improvements."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unspaghettize do_trap()
x86_64: Work around old GAS bug
x86: Use REP BSF unconditionally
x86: Prefer TZCNT over BFS
x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
x86: Drop unnecessary kernel_eflags variable on 64-bit
x86/smp: Don't ever patch back to UP if we unplug cpus
We still patch SMP instructions to UP variants if we boot with a
single CPU, but not at any other time. In particular, not if we
unplug CPUs to return to a single cpu.
Paul McKenney points out:
mean offline overhead is 6251/48=130.2 milliseconds.
If I remove the alternatives_smp_switch() from the offline
path [...] the mean offline overhead is 550/42=13.1 milliseconds
Basically, we're never going to get those 120ms back, and the
code is pretty messy.
We get rid of:
1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the
documentation is wrong. It's now the default.
2) The skip_smp_alternatives flag used by suspend.
3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end()
which were only used to set this one flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paul.mckenney@us.ibm.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Provide a generic interface for setting up and tearing down percpu
threads.
On registration the threads for already online cpus are created and
started. On deregistration (modules) the threads are stoppped.
During hotplug operations the threads are created, started, parked and
unparked. The datastructure for registration provides a pointer to
percpu storage space and optional setup, cleanup, park, unpark
functions. These functions are called when the thread state changes.
Each implementation has to provide a function which is queried and
returns whether the thread should run and the thread function itself.
The core code handles all state transitions and avoids duplicated code
in the call sites.
[ paulmck: Preemption leak fix ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When hotadd_new_pgdat() is called to create new pgdat for a new node, a
fallback zonelist should be created for the new node. There's code to try
to achieve that in hotadd_new_pgdat() as below:
/*
* The node we allocated has no zone fallback lists. For avoiding
* to access not-initialized zonelist, build here.
*/
mutex_lock(&zonelists_mutex);
build_all_zonelists(pgdat, NULL);
mutex_unlock(&zonelists_mutex);
But it doesn't work as expected. When hotadd_new_pgdat() is called, the
new node is still in offline state because node_set_online(nid) hasn't
been called yet. And build_all_zonelists() only builds zonelists for
online nodes as:
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
build_zonelists(pgdat);
build_zonelist_cache(pgdat);
}
Though we hope to create zonelist for the new pgdat, but it doesn't. So
add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
the new pgdat too.
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Keping Chen <chenkeping@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check:
the function is only suitable for offlined CPUs, and if called
inappropriately, the kernel should scream aloud.
[akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols]
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many architectures clear tasks' mm_cpumask like this:
read_lock(&tasklist_lock);
for_each_process(p) {
if (p->mm)
cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
}
read_unlock(&tasklist_lock);
Depending on the context, the code above may have several problems,
such as:
1. Working with task->mm w/o getting mm or grabing the task lock is
dangerous as ->mm might disappear (exit_mm() assigns NULL under
task_lock(), so tasklist lock is not enough).
2. Checking for process->mm is not enough because process' main
thread may exit or detach its mm via use_mm(), but other threads
may still have a valid mm.
This patch implements a small helper function that does things
correctly, i.e.:
1. We take the task's lock while whe handle its mm (we can't use
get_task_mm()/mmput() pair as mmput() might sleep);
2. To catch exited main thread case, we use find_lock_task_mm(),
which walks up all threads and returns an appropriate task
(with task lock held).
Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
the new helper, instead we take the rcu read lock. We can do this
because the function is called after the cpu is taken down and marked
offline, so no new tasks will get this cpu set in their mm mask.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu areas are already allocated during boot for each possible cpu.
percpu idle threads can be considered as an extension of the percpu areas,
and allocate them for each possible cpu during boot.
This will eliminate the need for workqueue based idle thread allocation.
In future we can move the idle thread area into the percpu area too.
[ tglx: Moved the loop into smpboot.c and added an error check when
the init code failed to allocate an idle thread for a cpu which
should be onlined ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
All SMP architectures have magic to fork the idle task and to store it
for reusage when cpu hotplug is enabled. Provide a generic
infrastructure for it.
Create/reinit the idle thread for the cpu which is brought up in the
generic code and hand the thread pointer to the architecture code via
__cpu_up().
Note, that fork_idle() is called via a workqueue, because this
guarantees that the idle thread does not get a reference to a user
space VM. This can happen when the boot process did not bring up all
possible cpus and a later cpu_up() is initiated via the sysfs
interface. In that case fork_idle() would be called in the context of
the user space task and take a reference on the user space VM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: x86@kernel.org
Acked-by: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de
Start a new file, which will hold SMP and CPU hotplug related generic
infrastructure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124557.035417523@linutronix.de
Preparatory patch to make the idle thread allocation for secondary
cpus generic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
PM / Hibernate: Implement compat_ioctl for /dev/snapshot
PM / Freezer: fix return value of freezable_schedule_timeout_killable()
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
...
Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...
Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches
Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Building rcutorture as a module requires cpu_up() as well as cpu_down()
exported, so apply EXPORT_SYMBOL_GPL().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add __init for functions alloc_frozen_cpus() and cpu_hotplug_pm_sync_init()
because they are only called during boot time.
Add static for function cpu_hotplug_pm_sync_init() because its scope is limited
in this file only.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
The CPU hotplug notifications sent out by the _cpu_up() and _cpu_down()
functions depend on the value of the 'tasks_frozen' argument passed to them
(which indicates whether tasks have been frozen or not).
(Examples for such CPU hotplug notifications: CPU_ONLINE, CPU_ONLINE_FROZEN,
CPU_DEAD, CPU_DEAD_FROZEN).
Thus, it is essential that while the callbacks for those notifications are
running, the state of the system with respect to the tasks being frozen or
not remains unchanged, *throughout that duration*. Hence there is a need for
synchronizing the CPU hotplug code with the freezer subsystem.
Since the freezer is involved only in the Suspend/Hibernate call paths, this
patch hooks the CPU hotplug code to the suspend/hibernate notifiers
PM_[SUSPEND|HIBERNATE]_PREPARE and PM_POST_[SUSPEND|HIBERNATE] to prevent
the race between CPU hotplug and freezer, thus ensuring that CPU hotplug
notifications will always be run with the state of the system really being
what the notifications indicate, _throughout_ their execution time.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else. Revector them
onto the isolated export header for faster compile times.
Nothing to see here but a whole lot of instances of:
-#include <linux/module.h>
+#include <linux/export.h>
This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Change the printk() calls to have the KERN_INFO/KERN_ERROR stuff, and
fixes other coding style errors. Not _all_ of them are gone, though.
[akpm@linux-foundation.org: revert the bits I disagree with]
Signed-off-by: Michael Rodriguez <dkingston02@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, asm: Use fxsaveq/fxrestorq in more places
* 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hwmon: Add core threshold notification to therm_throt.c
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, paravirt: Use native_halt on a halt, not native_safe_halt
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking, lockdep: Convert sprintf_symbol to %pS
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Better struct irqaction layout
During suspend, we disable all the non boot cpus. And during resume we bring
them all back again. So no need to do alternatives_smp_switch() in between.
On my core 2 based laptop, this speeds up the suspend path by 15msec and the
resume path by 5 msec (suspend/resume speed up differences can be attributed
to the different P-states that the cpu is in during suspend/resume).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Oleg mentioned that there is no actual guarantee the dying cpu's
migration thread is actually finished running when we get there, so
replace the BUG_ON() with a spinloop waiting for it.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GCC warns us about:
kernel/cpu.c: In function ‘take_cpu_down’:
kernel/cpu.c:200:15: warning: unused variable ‘cpu’
This variable is unused since param->hcpu is directly
used later on in cpu_notify.
Signed-off-by: Dhaval Giani <dhaval_giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290091494.1145.5.camel@gondor.retis>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While discussing the need for sched_idle_next(), Oleg remarked that
since try_to_wake_up() ensures sleeping tasks will end up running on a
sane cpu, we can do away with migrate_live_tasks().
If we then extend the existing hack of migrating current from
CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
need for the sched_idle_next() abomination disappears as well, since
idle will be the only possible thread left after the migration thread
stops.
This greatly simplifies the hot-unplug task migration path, as can be
seen from the resulting code reduction (and about half the new lines
are comments).
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1289851597.2109.547.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, when a cpu goes down, cpu_active is cleared before
CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
default priority cpu notifier. When a cpu is coming up, it's set
before CPU_ONLINE but cpuset configuration again is updated from the
same cpu notifier.
For cpu notifiers, this presents an inconsistent state. Threads which
a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
migrated to other cpus because the cpu is no more inactive.
Fix it by updating cpu_active in the highest priority cpu notifier and
cpuset configuration in the second highest when a cpu is coming up.
Down path is updated similarly. This guarantees that all other cpu
notifiers see consistent cpu_active and cpuset configuration.
cpuset_track_online_cpus() notifier is converted to
cpuset_update_active_cpus() which just updates the configuration and
now called from cpuset_cpu_[in]active() notifiers registered from
sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus()
degenerates into partition_sched_domains() making separate notifier
for !CONFIG_CPUSETS unnecessary.
This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug
callback creates a kthread and kthread_bind()s it to the target cpu,
and the thread is expected to run on that cpu.
* Ingo's test discovered __cpuinit/exit markups were incorrect.
Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Menage <menage@google.com>
In commit e9fb7631eb ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") the new helper functions access
cpu_chain. As a result, it shouldn't be marked __cpuinitdata (via
section mismatch warning).
Alternatively, the helper functions should be forced inline, or marked
__ref or __cpuinit. In the meantime, this patch silences the warning
the trivial way.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If there's only one CPU online when disable_nonboot_cpus() is called,
the error variable will not be initialized and that may lead to
erroneous behavior. Fix this issue by initializing error in
disable_nonboot_cpus() as appropriate.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e9fb7631eb ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") also introduced this annoying
warning:
kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used
when CONFIG_HOTPLUG_CPU wasn't set.
So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
simplify it while at it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>