Commit Graph

116 Commits

Author SHA1 Message Date
Nathan Chancellor a626beca4c This is the 3.10.105 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYnZNdAAoJEE44bZycYXAvJv0P/jpPc+jKb+D0FVUOiYDkY5Rw
 jxsZ3oeruTIeSFAAIzusVMLm9moBJA6DThuTHU5Kt68mRaKB2lgmqwkkvQAPTSYh
 tnDQwlrF7dOSVmPczJFHalpaLpRdXdQP9r8y+38PibaFZPssKdnZr3BBfdOdi5DT
 lj029AGKfG7co6Hb/iAhsxuAFfPmvGHY4QNwJ2FRbU1m6MDtmCTbXzF0fc6X5AW1
 qrtaWwPulJtZ/5MPk7aFyNpuCpNvIaTEqNaQsZbuz3bHfzDQVLerWze98vgHC0QM
 2YOTP6TnEiHhxHGMb9SywUgSV1ylx0X542YDfxmcfyxBWRr0khlxQh1gpX+waqE3
 pqdSlvN7AFzifw6kubbG2/XjkNvFtJcDTgrL3qco4utIezSijXmoOsDpKNnJuzk/
 kSD5WYd+Q1CSHOkqZX29QPw1Dl/7Ftm7GPfxu7Pis1OBuPByqtRkEfmn9DpiKSs5
 Aja0ljZYiQ3jy3fH+WlEzo6PVSxx0ZxKg0fOShlpgjj8KjMUdGfl9cB1OZxyWnNH
 UiQ9iIWd3tJci7WbsBOfawsQpq3EIJxZKjyUmLYpBht5/YenYxOBDCr/CLJDQBGI
 IQUPAs/E1JGDxGTUY3AmsaMVrcX2yOfhLzjrsVJGqSdote0um+2PdTLZHE4MMiz2
 Dh6CbUVYWS1KNgmQ8T8L
 =k5mW
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEJDfLduVEy2qz2d/TmXOSYMtstxYFAlpqeiwACgkQmXOSYMts
 txaBAQ/+KqZh90YZI+gRHGdczbo3XnlryHMdpp+DTIFtN3zU+2LM352oP+haoJfr
 YhNsixcMhW5TX0is5fg4SkIc0B3ooGKZLVKOPIRw+1NLBAVG5yVuYxW7I1faJgk6
 F37+4rvq7KAOPCNMjAEXRt7GqZ4WZjgvgKy+u5wzKh3k5kUylqDwlP2qdgx2L5Rc
 IxyxgOuaVGV6dZTyAyRlRMild5Tlz+SMY4pWoMe0sulDDXhd5/5PnGNVIgh+XqB6
 m0AGkIIzPVe+wmg6n1iYs93dQO0Jmu6DL47Zv4f3ASZNL/XVSLvU9ie63FyWGZXG
 e52qAPtztXInEOo15vPQSAAq7McZHDTzhHhsU/ZtkBT+LeSUU+rsxXddJ2EO5UgC
 O3cVm11x1FWMzbBtFNFtkqeri2Y2OxvU4O81mfNP1oOUQBTMeSHTzQ8psbCdXeEr
 ktSOtI+nakPmDE3aq4YSaz7BwSgt2tU/vZehkrTxtAQJxt0b88r2xFfThy5WScT1
 v6muoqxlprjjvFld7v99P8cXxJq4QrxKUxXtEBTdB79Q5xtCC29OAcTelpPFDCED
 /KpgZflubzH/Z872AW9Ru8OL9PYty6hBNDOP4aHLSFWfCu3KQxL6BMEeqi5qBjBX
 mJ8JT0dCQYP6xONIWq6a3fICroNMazhNFxdpPSfsQFRhujhjGPg=
 =zhKv
 -----END PGP SIGNATURE-----

Merge 3.10.105 into android-msm-bullhead-3.10-oreo-m5

Changes in 3.10.105: (315 commits)
        sched/core: Fix a race between try_to_wake_up() and a woken up task
        sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule()
        crypto: algif_skcipher - Require setkey before accept(2)
        crypto: af_alg - Disallow bind/setkey/... after accept(2)
        crypto: af_alg - Add nokey compatibility path
        crypto: algif_skcipher - Add nokey compatibility path
        crypto: hash - Add crypto_ahash_has_setkey
        crypto: shash - Fix has_key setting
        crypto: algif_hash - Require setkey before accept(2)
        crypto: skcipher - Add crypto_skcipher_has_setkey
        crypto: algif_skcipher - Add key check exception for cipher_null
        crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
        crypto: algif_hash - Remove custom release parent function
        crypto: algif_skcipher - Remove custom release parent function
        crypto: af_alg - Forbid bind(2) when nokey child sockets are present
        crypto: algif_hash - Fix race condition in hash_check_key
        crypto: algif_skcipher - Fix race condition in skcipher_check_key
        crypto: algif_skcipher - Load TX SG list after waiting
        crypto: cryptd - initialize child shash_desc on import
        crypto: skcipher - Fix blkcipher walk OOM crash
        crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
        MIPS: KVM: Fix unused variable build warning
        KVM: MIPS: Precalculate MMIO load resume PC
        KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
        KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
        KVM: MIPS: Make ERET handle ERL before EXL
        KVM: x86: fix wbinvd_dirty_mask use-after-free
        KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
        KVM: Disable irq while unregistering user notifier
        PM / devfreq: Fix incorrect type issue.
        ppp: defer netns reference release for ppp channel
        x86/mm/xen: Suppress hugetlbfs in PV guests
        xen: Add RING_COPY_REQUEST()
        xen-netback: don't use last request to determine minimum Tx credit
        xen-netback: use RING_COPY_REQUEST() throughout
        xen-blkback: only read request operation from shared ring once
        xen/pciback: Save xen_pci_op commands before processing it
        xen/pciback: Save the number of MSI-X entries to be copied later.
        xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled
        xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled
        xen/pciback: Do not install an IRQ handler for MSI interrupts.
        xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled.
        xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set.
        xen-pciback: Add name prefix to global 'permissive' variable
        x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
        x86/traps: Ignore high word of regs->cs in early_idt_handler_common
        x86/mm: Disable preemption during CR3 read+write
        x86/apic: Do not init irq remapping if ioapic is disabled
        x86/mm/pat, /dev/mem: Remove superfluous error message
        x86/paravirt: Do not trace _paravirt_ident_*() functions
        x86/build: Build compressed x86 kernels as PIE
        x86/um: reuse asm-generic/barrier.h
        iommu/amd: Update Alias-DTE in update_device_table()
        iommu/amd: Free domain id when free a domain of struct dma_ops_domain
        ARM: 8616/1: dt: Respect property size when parsing CPUs
        ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
        ARM: sa1100: clear reset status prior to reboot
        ARM: sa1111: fix pcmcia suspend/resume
        arm64: avoid returning from bad_mode
        arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
        arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
        arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
        MIPS: Malta: Fix IOCU disable switch read for MIPS64
        MIPS: ptrace: Fix regs_return_value for kernel context
        powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
        powerpc/vdso64: Use double word compare on pointers
        powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data()
        powerpc/64: Fix incorrect return value from __copy_tofrom_user
        powerpc/nvram: Fix an incorrect partition merge
        avr32: fix copy_from_user()
        avr32: fix 'undefined reference to `___copy_from_user'
        avr32: off by one in at32_init_pio()
        s390/dasd: fix hanging device after clear subchannel
        parisc: Ensure consistent state when switching to kernel stack at syscall entry
        microblaze: fix __get_user()
        microblaze: fix copy_from_user()
        mn10300: failing __get_user() and get_user() should zero
        m32r: fix __get_user()
        sh64: failing __get_user() should zero
        score: fix __get_user/get_user
        s390: get_user() should zero on failure
        ARC: uaccess: get_user to zero out dest in cause of fault
        asm-generic: make get_user() clear the destination on errors
        frv: fix clear_user()
        cris: buggered copy_from_user/copy_to_user/clear_user
        blackfin: fix copy_from_user()
        score: fix copy_from_user() and friends
        sh: fix copy_from_user()
        hexagon: fix strncpy_from_user() error return
        mips: copy_from_user() must zero the destination on access_ok() failure
        asm-generic: make copy_from_user() zero the destination properly
        alpha: fix copy_from_user()
        metag: copy_from_user() should zero the destination on access_ok() failure
        parisc: fix copy_from_user()
        openrisc: fix copy_from_user()
        openrisc: fix the fix of copy_from_user()
        mn10300: copy_from_user() should zero on access_ok() failure...
        sparc32: fix copy_from_user()
        ppc32: fix copy_from_user()
        ia64: copy_from_user() should zero the destination on access_ok() failure
        fix fault_in_multipages_...() on architectures with no-op access_ok()
        fix memory leaks in tracing_buffers_splice_read()
        arc: don't leak bits of kernel stack into coredump
        Fix potential infoleak in older kernels
        swapfile: fix memory corruption via malformed swapfile
        coredump: fix unfreezable coredumping task
        usb: dwc3: gadget: increment request->actual once
        USB: validate wMaxPacketValue entries in endpoint descriptors
        USB: fix typo in wMaxPacketSize validation
        usb: xhci: Fix panic if disconnect
        USB: serial: fix memleak in driver-registration error path
        USB: kobil_sct: fix non-atomic allocation in write path
        USB: serial: mos7720: fix non-atomic allocation in write path
        USB: serial: mos7840: fix non-atomic allocation in write path
        usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
        USB: change bInterval default to 10 ms
        usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
        USB: serial: cp210x: fix hardware flow-control disable
        usb: misc: legousbtower: Fix NULL pointer deference
        usb: gadget: function: u_ether: don't starve tx request queue
        USB: serial: cp210x: fix tiocmget error handling
        usb: gadget: u_ether: remove interrupt throttling
        usb: chipidea: move the lock initialization to core file
        Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
        ALSA: rawmidi: Fix possible deadlock with virmidi registration
        ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
        ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
        ALSA: timer: fix NULL pointer dereference on memory allocation failure
        ALSA: ali5451: Fix out-of-bound position reporting
        ALSA: pcm : Call kill_fasync() in stream lock
        zfcp: fix fc_host port_type with NPIV
        zfcp: fix ELS/GS request&response length for hardware data router
        zfcp: close window with unblocked rport during rport gone
        zfcp: retain trace level for SCSI and HBA FSF response records
        zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
        zfcp: trace on request for open and close of WKA port
        zfcp: restore tracing of handle for port and LUN with HBA records
        zfcp: fix D_ID field with actual value on tracing SAN responses
        zfcp: fix payload trace length for SAN request&response
        zfcp: trace full payload of all SAN records (req,resp,iels)
        scsi: zfcp: spin_lock_irqsave() is not nestable
        scsi: mpt3sas: Fix secure erase premature termination
        scsi: mpt3sas: Unblock device after controller reset
        scsi: mpt3sas: fix hang on ata passthrough commands
        mpt2sas: Fix secure erase premature termination
        scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
        scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
        scsi: ibmvfc: Fix I/O hang when port is not mapped
        scsi: Fix use-after-free
        scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
        scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
        scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
        ext4: validate that metadata blocks do not overlap superblock
        ext4: avoid modifying checksum fields directly during checksum verification
        ext4: use __GFP_NOFAIL in ext4_free_blocks()
        ext4: reinforce check of i_dtime when clearing high fields of uid and gid
        ext4: allow DAX writeback for hole punch
        ext4: sanity check the block and cluster size at mount time
        reiserfs: fix "new_insert_key may be used uninitialized ..."
        reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
        xfs: fix superblock inprogress check
        libxfs: clean up _calc_dquots_per_chunk
        btrfs: ensure that file descriptor used with subvol ioctls is a dir
        ocfs2/dlm: fix race between convert and migration
        ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
        ubifs: Fix assertion in layout_in_gaps()
        ubifs: Fix xattr_names length in exit paths
        UBIFS: Fix possible memory leak in ubifs_readdir()
        ubifs: Abort readdir upon error
        ubifs: Fix regression in ubifs_readdir()
        UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header
        NFSv4.x: Fix a refcount leak in nfs_callback_up_net
        NFSD: Using free_conn free connection
        NFS: Don't drop CB requests with invalid principals
        NFSv4: Open state recovery must account for file permission changes
        fs/seq_file: fix out-of-bounds read
        fs/super.c: fix race between freeze_super() and thaw_super()
        isofs: Do not return EACCES for unknown filesystems
        hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
        driver core: Delete an unnecessary check before the function call "put_device"
        driver core: fix race between creating/querying glue dir and its cleanup
        drm/radeon: fix radeon_move_blit on 32bit systems
        drm: Reject page_flip for !DRIVER_MODESET
        drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
        qxl: check for kmap failures
        Input: i8042 - break load dependency between atkbd/psmouse and i8042
        Input: i8042 - set up shared ps2_cmd_mutex for AUX ports
        Input: ili210x - fix permissions on "calibrate" attribute
        hwrng: exynos - Disable runtime PM on probe failure
        hwrng: omap - Fix assumption that runtime_get_sync will always succeed
        hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
        i2c-eg20t: fix race between i2c init and interrupt enable
        em28xx-i2c: rt_mutex_trylock() returns zero on failure
        i2c: core: fix NULL pointer dereference under race condition
        i2c: at91: fix write transfers by clearing pending interrupt first
        iio: accel: kxsd9: Fix raw read return
        iio: accel: kxsd9: Fix scaling bug
        thermal: hwmon: Properly report critical temperature in sysfs
        cdc-acm: fix wrong pipe type on rx interrupt xfers
        timers: Use proper base migration in add_timer_on()
        EDAC: Increment correct counter in edac_inc_ue_error()
        IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
        IB/core: Fix use after free in send_leave function
        IB/ipoib: Don't allow MC joins during light MC flush
        IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
        IB/mlx4: Fix create CQ error flow
        IB/uverbs: Fix leak of XRC target QPs
        IB/cm: Mark stale CM id's whenever the mad agent was unregistered
        mtd: blkdevs: fix potential deadlock + lockdep warnings
        mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
        mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
        perf symbols: Fixup symbol sizes before picking best ones
        perf: Tighten (and fix) the grouping condition
        tty: Prevent ldisc drivers from re-using stale tty fields
        tty: limit terminal size to 4M chars
        tty: vt, fix bogus division in csi_J
        vt: clear selection before resizing
        drivers/vfio: Rework offsetofend()
        include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header
        stddef.h: move offsetofend inside #ifndef/#endif guard, neaten
        ipv6: don't call fib6_run_gc() until routing is ready
        ipv6: split duplicate address detection and router solicitation timer
        ipv6: move DAD and addrconf_verify processing to workqueue
        ipv6: addrconf: fix dev refcont leak when DAD failed
        ipv6: fix rtnl locking in setsockopt for anycast and multicast
        ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
        ipv6: correctly add local routes when lo goes up
        ipv6: dccp: fix out of bound access in dccp_v6_err()
        ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
        ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
        ip6_tunnel: disable caching when the traffic class is inherited
        net/irda: handle iriap_register_lsap() allocation failure
        tcp: fix use after free in tcp_xmit_retransmit_queue()
        tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
        tcp: fix overflow in __tcp_retransmit_skb()
        tcp: fix wrong checksum calculation on MTU probing
        tcp: take care of truncations done by sk_filter()
        bonding: Fix bonding crash
        net: ratelimit warnings about dst entry refcount underflow or overflow
        mISDN: Support DR6 indication in mISDNipac driver
        mISDN: Fixing missing validation in base_sock_bind()
        net: disable fragment reassembly if high_thresh is set to zero
        ipvs: count pre-established TCP states as active
        iwlwifi: pcie: fix access to scratch buffer
        svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
        brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill
        brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get()
        brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
        pstore: Fix buffer overflow while write offset equal to buffer size
        net/mlx4_core: Allow resetting VF admin mac to zero
        firewire: net: guard against rx buffer overflows
        firewire: net: fix fragmented datagram_size off-by-one
        netfilter: fix namespace handling in nf_log_proc_dostring
        can: bcm: fix warning in bcm_connect/proc_register
        net: fix sk_mem_reclaim_partial()
        net: avoid sk_forward_alloc overflows
        ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
        packet: call fanout_release, while UNREGISTERING a netdev
        net: sctp, forbid negative length
        sctp: validate chunk len before actually using it
        net: clear sk_err_soft in sk_clone_lock()
        net: mangle zero checksum in skb_checksum_help()
        dccp: do not send reset to already closed sockets
        dccp: fix out of bound access in dccp_v4_err()
        sctp: assign assoc_id earlier in __sctp_connect
        neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
        ipv4: use new_gw for redirect neigh lookup
        mac80211: fix purging multicast PS buffer queue
        mac80211: discard multicast and 4-addr A-MSDUs
        cfg80211: limit scan results cache size
        mwifiex: printk() overflow with 32-byte SSIDs
        ipv4: Set skb->protocol properly for local output
        net: sky2: Fix shutdown crash
        kaweth: fix firmware download
        tracing: Move mutex to protect against resetting of seq data
        kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
        Revert "ipc/sem.c: optimize sem_lock()"
        cfq: fix starvation of asynchronous writes
        drbd: Fix kernel_sendmsg() usage - potential NULL deref
        lib/genalloc.c: start search from start of chunk
        tools/vm/slabinfo: fix an unintentional printf
        rcu: Fix soft lockup for rcu_nocb_kthread
        ratelimit: fix bug in time interval by resetting right begin time
        mfd: core: Fix device reference leak in mfd_clone_cell
        PM / sleep: fix device reference leak in test_suspend
        mmc: mxs: Initialize the spinlock prior to using it
        mmc: block: don't use CMD23 with very old MMC cards
        pstore/core: drop cmpxchg based updates
        pstore/ram: Use memcpy_toio instead of memcpy
        pstore/ram: Use memcpy_fromio() to save old buffer
        mb86a20s: fix the locking logic
        mb86a20s: fix demod settings
        cx231xx: don't return error on success
        cx231xx: fix GPIOs for Pixelview SBTVD hybrid
        gpio: mpc8xxx: Correct irq handler function
        uio: fix dmem_region_start computation
        KEYS: Fix short sprintf buffer in /proc/keys show function
        hv: do not lose pending heartbeat vmbus packets
        staging: iio: ad5933: avoid uninitialized variable in error case
        mei: bus: fix received data size check in NFC fixup
        ACPI / APEI: Fix incorrect return value of ghes_proc()
        PCI: Handle read-only BARs on AMD CS553x devices
        tile: avoid using clocksource_cyc2ns with absolute cycle count
        dm flakey: fix reads to be issued if drop_writes configured
        mm,ksm: fix endless looping in allocating memory when ksm enable
        can: dev: fix deadlock reported after bus-off
        hwmon: (adt7411) set bit 3 in CFG1 register
        mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
        mfd: 88pm80x: Double shifting bug in suspend/resume
        ASoC: omap-mcpdm: Fix irq resource handling
        regulator: tps65910: Work around silicon erratum SWCZ010
        dm: mark request_queue dead before destroying the DM device
        fbdev/efifb: Fix 16 color palette entry calculation
        metag: Only define atomic_dec_if_positive conditionally
        Linux 3.10.105

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Conflicts:
	arch/arm/mach-sa1100/generic.c
	arch/arm64/kernel/traps.c
	crypto/blkcipher.c
	drivers/devfreq/devfreq.c
	drivers/usb/dwc3/gadget.c
	drivers/usb/gadget/u_ether.c
	fs/ubifs/dir.c
	include/net/if_inet6.h
	lib/genalloc.c
	net/ipv6/addrconf.c
	net/ipv6/tcp_ipv6.c
	net/wireless/scan.c
	sound/core/timer.c
2018-01-25 17:45:32 -07:00
Nathan Chancellor 8e9b01b4f8 This is the 3.10.88 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJV9Z+KAAoJEDjbvchgkmk+yoUQALEbk57GLCvTzvi59u1l6R7P
 GicrtO6gC8q354AvTlQHCStnpjvdSl3rGoZx9XHMwq9ZlnuWWd6CBHmCoQbn4yi5
 cABQOF+cvVESPAUjag1qavZKWupImRRirSX714jpFj6acjUCVk+4JTP8aNFbF2Rd
 MMCVmy3XCQXdPSaAw8Y7Foxub3eC36hVnx//Af+dt1C1YdIAK1fFo9KMtnP1RDZK
 SpzAnaJcvj1IjAKdyetqZvDj7KBeVBW8Bg16Y6eOe8CHVNy1ro53whrhi3M4PotQ
 NUuLyGZsI4T8Y8JtZXK1qgW0y5iOidAaFSGDwSCu+PGDEMWoFa7K2mWWtc6BW2vx
 gZf/jQfzSJhHJP42qowJshbMvgq2aUUHFFSpzPpAivNbrr9/SOvqWaiOIV7FjyqE
 Z0CDTPWW5j1vOuTpMcvseobvTFM6UYLaVIQ6QXLCzM3JityKZ5uEmgTiPBDhcge5
 LKS5XNXOpzY01jFPYvgzk3gFcunRicK0bK1Tr2Q7sSCA5pbSXxxjkJw2FSorjort
 MqpT7ZG0rE5tadrDw9NrdeLj6fJ1xF6pq5RN4InFpsnfk437RCcxu7c+IkYzmPpg
 z/mGr4RcvJEZx7lohNl4oB2lyyg6om2SNYk6PJlQEM1REhRKXSTEVas4k851JEow
 Mgmxf/EQTFZ+bkhPttnc
 =hBHg
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEJDfLduVEy2qz2d/TmXOSYMtstxYFAlpqb94ACgkQmXOSYMts
 txZdYxAAnJt6aqAay59MFPKhxkpSFL8zlLPothknNfDM2tTMhMQEdOmZBZj9X6Vv
 sNvP6q7jCTncGlIAWI93k8O5BJ+YoBlaO9f8UxXD7i0EksgRqNsWmv5R3Cyk09tN
 XpIBoP5mV2TxrVVOlpNILFCGRZtjMLZzEieDtpP7g9LIwxwHVimhuJhlRfrJvIN7
 hqAYooPFk3n0gkG+X+BoIhTXACcGVMPuZQHAwSxOVu/bD7YYFxroxlQtIHMdxtbb
 BbJXs/L2aCeAOD83XSXkE9A/Qa3NleVlEwPuZq+x5tqXHzuLIBaC6mBbsctsWxE0
 ArBh0OBY1d2QeMYHIkhGs1xAHBcXT0kAh8XDJ3yv0Czy5/84GSI64WWro4iTM6nA
 uyR8Oj9cwFftjx1BQKtC/Uyxaa5ZKwHVznKBr8UkRk4b8b+AiOwDzafHYnEsXTvw
 6pww0iDDX/J3DeVKeRp5M6jdlpdvTmI7gISSAxGlEvIKQr8Q7dIlpMLFE6K03FNv
 2K3X9CVo+DwxWFazs/quV+pYQxkeHhWB18w2P+ENgJXFzEmsK3J5MTJKU4JLWzIk
 p4w2+jwowdz94tecQEGWrO0frsGYLn3Z2ZJe2+fE6AgkQYKWe3zqIKiH5/yOqAIg
 GuH00dRDSmCa+U0u8Bjz8fBusZYr1+uzyc7NP7EnGnDHuQuE/Ys=
 =IDT3
 -----END PGP SIGNATURE-----

Merge 3.10.88 into android-msm-bullhead-3.10-oreo-m5

Changes in 3.10.88: (12 commits)
        ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
        ipc/sem.c: update/correct memory barriers
        mm/hwpoison: fix page refcount of unknown non LRU page
        perf: Fix fasync handling on inherited events
        dm thin metadata: delete btrees when releasing metadata snapshot
        localmodconfig: Use Kbuild files too
        EDAC, ppc4xx: Access mci->csrows array elements properly
        drm/radeon: add new OLAND pci id
        libfc: Fix fc_fcp_cleanup_each_cmd()
        crypto: caam - fix memory corruption in ahash_final_ctx
        arm64/mm: Remove hack in mmap randomize layout
        Linux 3.10.88

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Conflicts:
	arch/arm64/mm/mmap.c
2018-01-25 17:01:34 -07:00
Willy Tarreau 3d971ee18f Revert "ipc/sem.c: optimize sem_lock()"
This reverts commit 901f6fedc5
(upstream commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1).

As suggested in commit 5864a2fd3088db73d47942370d0f7210a807b9bc
(ipc/sem.c: fix complex_count vs. simple op race) since it introduces
a regression and the candidate fix requires too many changes for 3.10.

Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
2017-02-10 11:03:58 +01:00
Manfred Spraul 62ba50fde4 ipc/sem.c: fully initialize sem_array before making it visible
(cherry pick from commit e8577d1f0329d4842e8302e289fb2c22156abef4)

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma->sem_base and
sma->sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Rik van Riel <riel@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mark Salyzyn <salyzyn@google.com>
Bug: 24551430
Change-Id: I36a8cc2281dfd68e9a399695f1d7e7b038e105d0
2015-10-14 23:59:19 +00:00
Manfred Spraul 30e5bc30f5 ipc/sem.c: update/correct memory barriers
commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-09-13 09:07:59 -07:00
Herton R. Krzesinski 04d2af2854 ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits
commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
    RSP <ffff88003750fd68>
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>
    #include <sys/wait.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <errno.h>

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i < NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] < 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i < NSET; i++) {
                    for (j = 0; j < NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) < 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i < NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p < 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i < NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p < 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-13 09:07:58 -07:00
Manfred Spraul 873be93b1a ipc/sem.c: synchronize semop and semctl with IPC_RMID
commit 6e224f94597842c5eb17f1fc2208d20b6f7f7d49 upstream.

After acquiring the semlock spinlock, operations must test that the
array is still valid.

 - semctl() and exit_sem() would walk stale linked lists (ugly, but
   should be ok: all lists are empty)

 - semtimedop() would sleep forever - and if woken up due to a signal -
   access memory after free.

The patch also:
 - standardizes the tests for .deleted, so that all tests in one
   function leave the function with the same approach.
 - unconditionally tests for .deleted immediately after every call to
   sem_lock - even it it means that for semctl(GETALL), .deleted will be
   tested twice.

Both changes make the review simpler: After every sem_lock, there must
be a test of .deleted, followed by a goto to the cleanup code (if the
function uses "goto cleanup").

The only exception is semctl_down(): If sem_ids().rwsem is locked, then
the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
additional test is required.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-04 10:56:12 -08:00
Manfred Spraul e556ea0191 ipc/sem.c: update sem_otime for all operations
commit 0e8c665699e953fa58dc1b0b0d09e5dce7343cc7 upstream.

In commit 0a2b9d4c79 ("ipc/sem.c: move wake_up_process out of the
spinlock section"), the update of semaphore's sem_otime(last semop time)
was moved to one central position (do_smart_update).

But since do_smart_update() is only called for operations that modify
the array, this means that wait-for-zero semops do not update sem_otime
anymore.

The fix is simple:
Non-alter operations must update sem_otime.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Jia He <jiakernel@gmail.com>
Tested-by: Jia He <jiakernel@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:48 -07:00
Manfred Spraul 83aeb6e344 ipc/sem.c: synchronize the proc interface
commit d8c633766ad88527f25d9f81a5c2f083d78a2b39 upstream.

The proc interface is not aware of sem_lock(), it instead calls
ipc_lock_object() directly.  This means that simple semop() operations
can run in parallel with the proc interface.  Right now, this is
uncritical, because the implementation doesn't do anything that requires
a proper synchronization.

But it is dangerous and therefore should be fixed.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:48 -07:00
Manfred Spraul 901f6fedc5 ipc/sem.c: optimize sem_lock()
commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1 upstream.

Operations that need access to the whole array must guarantee that there
are no simple operations ongoing.  Right now this is achieved by
spin_unlock_wait(sem->lock) on all semaphores.

If complex_count is nonzero, then this spin_unlock_wait() is not
necessary, because it was already performed in the past by the thread
that increased complex_count and even though sem_perm.lock was dropped
inbetween, no simple operation could have started, because simple
operations cannot start when complex_count is non-zero.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:48 -07:00
Manfred Spraul 184076a9f9 ipc/sem.c: fix race in sem_lock()
commit 5e9d527591421ccdb16acb8c23662231135d8686 upstream.

The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count.  The current code does it the other way around - and that
creates a race.  Details are below.

The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith.  It removes all gotos and all loops and thus the
risk of livelocks.

I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.

The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.

Details of the bug:

Assume:
 - sma->complex_count = 0.
 - Thread 1: semtimedop(complex op that must sleep)
 - Thread 2: semtimedop(simple op).

Pseudo-Trace:

Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
			Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
	<<< preempted.

Thread 2: sem_lock():
        static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                      int nsops)
        {
                int locknum;
         again:
                if (nsops == 1 && !sma->complex_count) {
                        struct sem *sem = sma->sem_base + sops->sem_num;

                        /* Lock just the semaphore we are interested in. */
                        spin_lock(&sem->lock);

                        /*
                         * If sma->complex_count was set while we were spinning,
                         * we may need to look at things we did not lock here.
                         */
                        if (unlikely(sma->complex_count)) {
                                spin_unlock(&sem->lock);
                                goto lock_array;
                        }
        <<<<<<<<<
	<<< complex_count is still 0.
	<<<
        <<< Here it is preempted
        <<<<<<<<<

Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
                /*
                 * Another process is holding the global lock on the
                 * sem_array; we cannot enter our critical section,
                 * but have to wait for the global lock to be released.
                 */
                if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                        spin_unlock(&sem->lock);
                        spin_unlock_wait(&sma->sem_perm.lock);
                        goto again;
                }
	<<< sem_perm.lock already dropped, thus no "goto again;"

                locknum = sops->sem_num;

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:48 -07:00
Davidlohr Bueso e84ca33375 ipc: fix race with LSMs
commit 53dad6d3a8e5ac1af8bacc6ac2134ae1a8b085f1 upstream.

Currently, IPC mechanisms do security and auditing related checks under
RCU.  However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition.  Manfred illustrates this nicely,
for instance with shared mem and selinux:

 -> do_shmat calls rcu_read_lock()
 -> do_shmat calls shm_object_check().
     Checks that the object is still valid - but doesn't acquire any locks.
     Then it returns.
 -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
 -> selinux_shm_shmat calls ipc_has_perm()
 -> ipc_has_perm accesses ipc_perms->security

shm_close()
 -> shm_close acquires rw_mutex & shm_lock
 -> shm_close calls shm_destroy
 -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
 -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
 -> ipc_free_security calls kfree(ipc_perms->security)

This patch delays the freeing of the security structures after all RCU
readers are done.  Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept.  Linus states:

 "... the old behavior was suspect for another reason too: having the
  security blob go away from under a user sounds like it could cause
  various other problems anyway, so I think the old code was at least
  _prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:48 -07:00
Davidlohr Bueso 33b7466985 ipc: rename ids->rw_mutex
commit d9a605e40b1376eb02b067d7690580255a0df68f upstream.

Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Manfred Spraul b56e88e25e ipc/sem.c: rename try_atomic_semop() to perform_atomic_semop(), docu update
commit 758a6ba39ef6df4cdc615e5edd7bd86eab81a5f7 upstream.

Cleanup: Some minor points that I noticed while writing the previous
patches

1) The name try_atomic_semop() is misleading: The function performs the
   operation (if it is possible).

2) Some documentation updates.

No real code change, a rename and documentation changes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Manfred Spraul bf6830ad68 ipc/sem.c: replace shared sem_otime with per-semaphore value
commit d12e1e50e47e0900dbbf52237b7e171f4f15ea1e upstream.

sem_otime contains the time of the last semaphore operation that
completed successfully.  Every operation updates this value, thus access
from multiple cpus can cause thrashing.

Therefore the patch replaces the variable with a per-semaphore variable.
The per-array sem_otime is only calculated when required.

No performance improvement on a single-socket i3 - only important for
larger systems.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Manfred Spraul e5639c5288 ipc/sem.c: always use only one queue for alter operations
commit f269f40ad5aeee229ed70044926f44318abe41ef upstream.

There are two places that can contain alter operations:
 - the global queue: sma->pending_alter
 - the per-semaphore queues: sma->sem_base[].pending_alter.

Since one of the queues must be processed first, this causes an odd
priorization of the wakeups: complex operations have priority over
simple ops.

The patch restores the behavior of linux <=3.0.9: The longest waiting
operation has the highest priority.

This is done by using only one queue:
 - if there are complex ops, then sma->pending_alter is used.
 - otherwise, the per-semaphore queues are used.

As a side effect, do_smart_update_queue() becomes much simpler: no more
goto logic.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Manfred Spraul ab63bc97fa ipc/sem: separate wait-for-zero and alter tasks into seperate queues
commit 1a82e9e1d0f1b45f47a97c9e2349020536ff8987 upstream.

Introduce separate queues for operations that do not modify the
semaphore values.  Advantages:

 - Simpler logic in check_restart().
 - Faster update_queue(): Right now, all wait-for-zero operations are
   always tested, even if the semaphore value is not 0.
 - wait-for-zero gets again priority, as in linux <=3.0.9

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Manfred Spraul 0824e44c3f ipc/sem.c: cacheline align the semaphore structures
commit f5c936c0f267ec58641451cf8b8d39b4c207ee4d upstream.

As now each semaphore has its own spinlock and parallel operations are
possible, give each semaphore its own cacheline.

On a i3 laptop, this gives up to 28% better performance:

  #semscale 10 | grep "interleave 2"
  - before:
  Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
  Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
  Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
  Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

  -after:
  Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
  Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
  Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
  Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

i3, with 2 cores and with hyperthreading enabled.  Interleave 2 in order
use first the full cores.  HT partially hides the delay from cacheline
trashing, thus the improvement is "only" 8.7% if 4 threads are running.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:47 -07:00
Davidlohr Bueso 5cd37e9217 ipc: remove unused functions
commit 9ad66ae65fc8d3e7e3344310fb0aa835910264fe upstream.

We can now drop the msg_lock and msg_lock_check functions along with a
bogus comment introduced previously in semctl_down.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:46 -07:00
Davidlohr Bueso ac9bc6e396 ipc: move locking out of ipcctl_pre_down_nolock
commit 7b4cc5d8411bd4e9d61d8714f53859740cf830c2 upstream.

This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:46 -07:00
Davidlohr Bueso 115d40dbef ipc: close open coded spin lock calls
commit cf9d5d78d05bca96df7618dfc3a5ee4414dcae58 upstream.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18 07:45:46 -07:00
Manfred Spraul ab465df9dd ipc/sem.c: Fix missing wakeups in do_smart_update_queue()
do_smart_update_queue() is called when an operation (semop,
semctl(SETVAL), semctl(SETALL), ...) modified the array.  It must check
which of the sleeping tasks can proceed.

do_smart_update_queue() missed a few wakeups:
 - if a sleeping complex op was completed, then all per-semaphore queues
   must be scanned - not only those that were modified by *sops
 - if a sleeping simple op proceeded, then the global queue must be
   scanned again

And:
 - the test for "|sops == NULL) before scanning the global queue is not
   required: If the global queue is empty, then it doesn't need to be
   scanned - regardless of the reason for calling do_smart_update_queue()

The patch is not optimized, i.e.  even completing a wait-for-zero
operation causes a rescan.  This is done to keep the patch as simple as
possible.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-26 15:14:51 -07:00
Rik van Riel de2657f94a ipc,sem: fix semctl(..., GETNCNT)
The semctl GETNCNT returns the number of semops waiting for the
specified semaphore to become nonzero.  After commit 9f1bc2c902
("ipc,sem: have only one list in struct sem_queue"), the semops waiting
on just one semaphore are waiting on that semaphore's list.

In order to return the correct count, we have to walk that list too, in
addition to the sem_array's list for complex operations.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-09 14:17:47 -07:00
Rik van Riel ebc2e5e6a4 ipc,sem: fix semctl(..., GETZCNT)
The semctl GETZCNT returns the number of semops waiting for the
specified semaphore to become zero.  After commit 9f1bc2c902
("ipc,sem: have only one list in struct sem_queue"), the semops waiting
on just one semaphore are waiting on that semaphore's list.

In order to return the correct count, we have to walk that list too, in
addition to the sem_array's list for complex operations.

This bug broke dbench; it works again with this patch applied.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Kent Overstreet <koverstreet@google.com>
Tested-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-09 14:17:47 -07:00
Linus Torvalds 941b0304a7 ipc: simplify rcu_read_lock() in semctl_nolock()
This trivially combines two rcu_read_lock() calls in both sides of a
if-statement into one single one in front of the if-statement.

Split out as an independent cleanup from the previous commit.

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 17:24:59 -07:00
Linus Torvalds c728b9c87b ipc: simplify semtimedop/semctl_main() common error path handling
With various straight RCU lock/unlock movements, one common exit path
pattern had become

	rcu_read_unlock();
	goto out_wakeup;

and in fact there were no cases where we wanted to exit to out_wakeup
_without_ releasing the RCU read lock.

So replace that pattern with "goto out_rcu_wakeup", and remove the old
out_wakeup.

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 17:21:58 -07:00
Linus Torvalds 321310ced2 ipc: move sem_obtain_lock() rcu locking into the only caller
sem_obtain_lock() was another of those functions that returned with the
RCU lock held for reading in the success case.  Move the RCU locking to
the caller (semtimedop()), making it more obvious.  We already did RCU
locking elsewhere in that function.

Side note: why does semtimedop() re-do the semphore lookup after the
sleep, rather than just getting a reference to the semaphore it already
looked up originally?

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 17:20:14 -07:00
Linus Torvalds fbfd1d2862 ipc: fix double sem unlock in semctl error path
Fix another ipc locking buglet introduced by the scalability patches:
when semctl_down() was changed to delay the semaphore locking, one error
path for security_sem_semctl() went through the semaphore unlock logic
even though the semaphore had never been locked.

Introduced by commit 16df3674ef ("ipc,sem: do not hold ipc lock more
than necessary")

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 17:19:59 -07:00
Linus Torvalds 4091fd942e ipc: move the rcu_read_lock() from sem_lock_and_putref() into callers
This is another ipc semaphore locking cleanup, trying to make the
locking more straightforward.  We move the rcu read locking into the
callers of sem_lock_and_putref(), which in general means that we now
mostly do the rcu_read_lock() and rcu_read_unlock() in the same
function.

Mostly.  We still have the ipc_addid/newary/freeary mess, and things
like ipcctl_pre_down_nolock().

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 17:19:39 -07:00
Linus Torvalds 73b29505c3 ipc: sem_putref() does not need the semaphore lock any more
ipc_rcu_putref() uses atomics for the refcount, and the games to lock
and unlock the semaphore just to try to keep the reference counting
working are no longer useful.

Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 11:24:21 -07:00
Linus Torvalds 6d49dab8ae ipc: move rcu_read_unlock() out of sem_unlock() and into callers
The IPC locking is a mess, and sem_unlock() unlocks not only the
semaphore spinlock, it also drops the rcu read lock.  Unlike sem_lock(),
which just gets the spin-lock, and expects the caller to get the rcu
read lock.

This all makes things very hard to follow, and it's very confusing when
you take the rcu read lock in one function, and then release it in
another.  And it has caused actual bugs: the sem_obtain_lock() function
ended up dropping the RCU read lock twice in one error path, because it
first did the sem_unlock(), and then did a rcu_read_unlock() to match
the rcu_read_lock() it had done.

This is just a totally mindless "remove rcu_read_unlock() from
sem_unlock() and add it immediately after each caller" (except for the
aforementioned bug where we did too many rcu_read_unlock(), and in
find_alloc_undo() where we just got the rcu_read_lock() to correct for
the fact that sem_unlock would immediately drop it again).

We can (and should) clean things up further, but this fixes the bug with
the minimal amount of subtlety.

Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-04 11:24:00 -07:00
Al Viro ce857229e0 ipc: fix GETALL/IPC_RM race for sysv semaphores
We can step on WARN_ON_ONCE() in sem_getref() if a semaphore is removed
just as we are about to call sem_getref() from semctl_main(); results
are not pretty.

We should fail with -EIDRM, same as if IPC_RM happened while we'd been
doing allocation there.  This also expands sem_getref() at its only
callsite (and fixed there), while sem_getref_and_unlock() is simply
killed off - it has no callers at all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-02 19:51:31 -07:00
Rik van Riel 6062a8dc05 ipc,sem: fine grained locking for semtimedop
Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.

If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.

If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.

On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:

	vanilla		Davidlohr's	Davidlohr's +	Davidlohr's +
threads			patches		rwlock patches	v3 patches
10	610652		726325		1783589		2142206
20	341570		365699		1520453		1977878
30	288102		307037		1498167		2037995
40	290714		305955		1612665		2256484
50	288620		312890		1733453		2650292
60	289987		306043		1649360		2388008
70	291298		306347		1723167		2717486
80	290948		305662		1729545		2763582
90	290996		306680		1736021		2757524
100	292243		306700		1773700		3059159

[davidlohr.bueso@hp.com: do not call sem_lock when bogus sma]
[davidlohr.bueso@hp.com: make refcounter atomic]
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: Emmanuel Benisty <benisty.e@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 08:12:58 -07:00
Rik van Riel 9f1bc2c902 ipc,sem: have only one list in struct sem_queue
Having only one list in struct sem_queue, and only queueing simple
semaphore operations on the list for the semaphore involved, allows us to
introduce finer grained locking for semtimedop.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 08:12:58 -07:00
Rik van Riel c460b662d5 ipc,sem: open code and rename sem_lock
Rename sem_lock() to sem_obtain_lock(), so we can introduce a sem_lock()
later that only locks the sem_array and does nothing else.

Open code the locking from ipc_lock() in sem_obtain_lock() so we can
introduce finer grained locking for the sem_array in the next patch.

[akpm@linux-foundation.org: propagate the ipc_obtain_object() errno out of sem_obtain_lock()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Chegu Vinod <chegu_vinod@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 08:12:58 -07:00
Davidlohr Bueso 16df3674ef ipc,sem: do not hold ipc lock more than necessary
Instead of holding the ipc lock for permissions and security checks, among
others, only acquire it when necessary.

Some numbers....

1) With Rik's semop-multi.c microbenchmark we can see the following
   results:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
+   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
+  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
+   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
+   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
+   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
+   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check

2) While on an Oracle swingbench DSS (data mining) workload the
   improvements are not as exciting as with Rik's benchmark, we can see
   some positive numbers.  For an 8 socket machine the following are the
   percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8,74%
400 swingbench users: 21,86%
800 swingbench users: 84,35%

With this patchset:
100 swingbench users: 8,11%
400 swingbench users: 19,93%
800 swingbench users: 77,69%

[riel@redhat.com: fix two locking bugs]
[sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Acked-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01 08:12:58 -07:00
Al Viro e1fd1f490f get rid of union semop in sys_semctl(2) arguments
just have the bugger take unsigned long and deal with SETVAL
case (when we use an int member in the union) explicitly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-05 15:14:16 -05:00
Al Viro 22d1a35da0 make HAVE_SYSCALL_WRAPPERS unconditional
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-03-03 22:58:30 -05:00
Eric W. Biederman 1efdb69b0b userns: Convert ipc to use kuid and kgid where appropriate
- Store the ipc owner and creator with a kuid
- Store the ipc group and the crators group with a kgid.
- Add error handling to ipc_update_perms, allowing it to
  fail if the uids and gids can not be converted to kuids
  or kgids.
- Modify the proc files to display the ipc creator and
  owner in the user namespace of the opener of the proc file.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-06 22:17:20 -07:00
Manfred Spraul e57940d719 ipc/sem.c: remove private structures from public header file
include/linux/sem.h contains several structures that are only used within
ipc/sem.c.

The patch moves them into ipc/sem.c - there is no need to expose the
structures to the whole kernel.

No functional changes, only whitespace cleanups and 80-char per line
fixes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Manfred Spraul 0b0577f608 ipc/sem.c: handle spurious wakeups
semtimedop() does not handle spurious wakeups, it returns -EINTR to user
space.  Most other schedule() users would just loop and not return to user
space.  The patch adds such a loop to semtimedop()

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Manfred Spraul 3c24783bb2 ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID)
sys_semtimedop() may return -EIDRM although the semaphore operation
completed successfully:

thread 1:	thread 2:
		semtimedop(), sleeps
semop():
* acquires sem_lock()
		semtimedop() woken up due to timeout
		sem_lock() loops
* notices that thread 2 could be completed.
* performs the operations that thread 2 is sleeping on.
* marks the semaphore operation as IN_WAKEUP
* drops sem_lock(), does wakeup, sets return code to 0
		* thread delayed due to interrupt, whatever
* returns to user space
		* thread still delayed
semctl(IPC_RMID)
* acquires sem_lock()
* ipc_rmid(), ipcp->deleted=1
* drops sem_lock()
		* thread finally continues - but seem_lock()
		  now fails due to ipcp->deleted == 1
		* returns -EIDRM instead of 0

The fix is trivial: Always use the return code in queue.status.

In real world, the race probably doesn't matter:
If the semaphore array is destroyed, the app is probably not interested
if the last operation succeeded or was already cancelled.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:01 -07:00
Manfred Spraul d694ad62bf ipc/sem.c: fix race with concurrent semtimedop() timeouts and IPC_RMID
If a semaphore array is removed and in parallel a sleeping task is woken
up (signal or timeout, does not matter), then the woken up task does not
wait until wake_up_sem_queue_do() is completed.  This will cause crashes,
because wake_up_sem_queue_do() will read from a stale pointer.

The fix is simple: Regardless of anything, always call get_queue_result().
This function waits until wake_up_sem_queue_do() has finished it's task.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=27142

Reported-by: Yuriy Yevtukhov <yuriy@ucoz.com>
Reported-by: Harald Laabs <kernel@dasr.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@kernel.org>		[2.6.35+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:07 -07:00
Lai Jiangshan 693a8b6eec ipc,rcu: Convert call_rcu(free_un) to kfree_rcu()
The rcu callback free_un() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(free_un).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-07-20 14:10:16 -07:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Serge E. Hallyn b0e77598f8 userns: user namespaces: convert several capable() calls
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
	Jan 11: Use task_ns_capable() in place of sched_capable().
	Jan 11: Use nsown_capable() as suggested by Bastian Blank.
	Jan 11: Clarify (hopefully) some logic in futex and sched.c
	Feb 15: use ns_capable for ipc, not nsown_capable
	Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
	Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:47:08 -07:00
Dan Rosenberg 982f7c2b2e sys_semctl: fix kernel stack leakage
The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.

Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:58 -07:00
Manfred Spraul c61284e991 ipc/sem.c: bugfix for semop() not reporting successful operation
The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma->sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task.  Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Luca Tettamanti <kronos.it@gmail.com>
Tested-by: Luca Tettamanti <kronos.it@gmail.com>
Reported-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:40 -07:00
Julia Lawall 4de85cd6d6 ipc/sem.c: use ERR_CAST
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:49 -07:00
Manfred Spraul c5cf6359ad ipc/sem.c: update description of the implementation
ipc/sem.c begins with a 15 year old description about bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
	<do stuff>
	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:49 -07:00