-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZQspmAAoJEE44bZycYXAvLXMP/3Uqx7K7dGjHvvhGA4DhnzSp
bGLpjeP1sXXnnd932PN+qkGbl2j/NPjS74DobDqGWnrwxKRzQ21F4YkWJGtb4Pe2
JKcY7y2rbKGcwhpS9qDMkSWuaUKJWF5MAsH08LnCWqlGphGwAH/uPTdqS4iI/CJM
aQvaaITe5SVzvpvpyoCVdHqu8K+Ukraf91mvt7hlmrn9OnqO9us9MWulw5sSXQcd
pM8ZbRkBDE5OFeVnPKJDBY+cR2ML41wekMMwvJWt7uRyrX2i5c7oQVXYoeYE4MKx
Pueb7aG7LQwBUzNJCiZA6PAEFQPwNPCoxHZbAax0D6/JyDWOZukappquzjd6gLDM
+U7mxeFTeNZJ5v9tUcUIOb4GaaFcccS3wdDP23V2N8iM88hFVwJn0RSy/pksX37+
ZNDiEyDeJBjz3kh/Kf40zhFIIrABMozFeX3tpSRVVqXb+T6P9l8Y88O2LGY5FCXK
QBbAC+jC4X4YI+4v+QWImg9mkfTwzZyjyAlfyjPlHVSK9KDP9M6LXpr2+jKS7jOc
ievMOh9ku0HIVuSWGUKZSqjvcF01Bh99tFlX+KqipomwNTwa4hKCLmnOVflF1BPE
8sfD9hvenA0e949kXrURUmqpg6Ujkrbb/lXuD7e2CakCu+XjEMf317R11TyTsHNG
10hsmPsGDVcwbyFOFHS3
=mvzl
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEJDfLduVEy2qz2d/TmXOSYMtstxYFAlpqfEUACgkQmXOSYMts
txbJOQ/+Pce1eBSgjESWKuz0OP9BfAe9RpWFi7lBZ/EgRwJVYEx6jau9EYXAQ7YT
roCIsV6eufhMplYGHJz6EHxK2Hieb1zG9ooX9ss9GxiB6qmqeqC0Slm9EQE15yGT
px3fVz9r86edqjtj7UKK0/n8DJUaFh5LWOymLD3d3/115RYQsl/GowugH9F79PvN
pR+OyXq7srtfCmwdhZ65012Ef10RXqBRv0fCYBH6r+jkMqb7uSDFzdR39Z7k3QFk
AM4+3lTm6EEZ4xZkcMyX3GuQWslpPAlvFdEx43TjdCbseXAqURoppmxvz+Izum75
fy0oOdKl5OSpyZArRkUfZ0MnL6BHGcKxwYV4u1LupwvqPyaUT4yiT5VEUdy9EqJo
Syrr0oSR2lrXqQESdxKkmOZVXyul0nF3Fh1p5QlU1/Id9oskMLYqcXegFyhr2Wyp
+A4ZozljEQ4AGm4dYFdH3w8TcNDttjztYoKf8OXnaCOj3p/SEq84tk4Hm3vpoPvh
5OzsZC3UB9gJ1mXsKOVKLJFCPzmg61KOvwhopfAcC6cyiIIf/MPCneZeOzsavtQX
J+atSNcLVNE3jmrXvUrwxSpZ3KCc3Ti5Q8pD9ni6/B6st2+LO8EXPrS6n2+28nvu
hVpjyCXLbghdmn1mjOGW9lvMQEg/Dupj/ocpCPHJnXpbpM8Mcjo=
=3eAv
-----END PGP SIGNATURE-----
Merge 3.10.106 into android-msm-bullhead-3.10-oreo-m5
Changes in 3.10.106: (252 commits)
packet: fix race condition in packet_set_ring
crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks
EVM: Use crypto_memneq() for digest comparisons
libceph: don't set weight to IN when OSD is destroyed
KVM: x86: fix emulation of "MOV SS, null selector"
KVM: x86: Introduce segmented_write_std
posix_acl: Clear SGID bit when setting file permissions
tmpfs: clear S_ISGID when setting posix ACLs
fbdev: color map copying bounds checking
selinux: fix off-by-one in setprocattr
tcp: avoid infinite loop in tcp_splice_read()
xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
KEYS: Change the name of the dead type to ".dead" to prevent user access
KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
ext4: fix data exposure after a crash
locking/rtmutex: Prevent dequeue vs. unlock race
m68k: Fix ndelay() macro
hotplug: Make register and unregister notifier API symmetric
Btrfs: fix tree search logic when replaying directory entry deletes
USB: serial: kl5kusb105: fix open error path
block_dev: don't test bdev->bd_contains when it is not stable
crypto: caam - fix AEAD givenc descriptors
ext4: fix mballoc breakage with 64k block size
ext4: fix stack memory corruption with 64k block size
ext4: reject inodes with negative size
ext4: return -ENOMEM instead of success
f2fs: set ->owner for debugfs status file's file_operations
block: protect iterate_bdevs() against concurrent close
scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
scsi: zfcp: do not trace pure benign residual HBA responses at default level
scsi: zfcp: fix rport unblock race with LUN recovery
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
IB/mad: Fix an array index check
IB/multicast: Check ib_find_pkey() return value
powerpc: Convert cmp to cmpd in idle enter sequence
usb: gadget: composite: Test get_alt() presence instead of set_alt()
USB: serial: omninet: fix NULL-derefs at open and disconnect
USB: serial: quatech2: fix sleep-while-atomic in close
USB: serial: pl2303: fix NULL-deref at open
USB: serial: keyspan_pda: verify endpoints at probe
USB: serial: spcp8x5: fix NULL-deref at open
USB: serial: io_ti: fix NULL-deref at open
USB: serial: io_ti: fix another NULL-deref at open
USB: serial: iuu_phoenix: fix NULL-deref at open
USB: serial: garmin_gps: fix memory leak on failed URB submit
USB: serial: ti_usb_3410_5052: fix NULL-deref at open
USB: serial: io_edgeport: fix NULL-deref at open
USB: serial: oti6858: fix NULL-deref at open
USB: serial: cyberjack: fix NULL-deref at open
USB: serial: kobil_sct: fix NULL-deref in write
USB: serial: mos7840: fix NULL-deref at open
USB: serial: mos7720: fix NULL-deref at open
USB: serial: mos7720: fix use-after-free on probe errors
USB: serial: mos7720: fix parport use-after-free on probe errors
USB: serial: mos7720: fix parallel probe
usb: xhci-mem: use passed in GFP flags instead of GFP_KERNEL
usb: musb: Fix trying to free already-free IRQ 4
ALSA: usb-audio: Fix bogus error return in snd_usb_create_stream()
USB: serial: kl5kusb105: abort on open exception path
staging: iio: ad7606: fix improper setting of oversampling pins
usb: dwc3: gadget: always unmap EP0 requests
cris: Only build flash rescue image if CONFIG_ETRAX_AXISFLASHMAP is selected
hwmon: (ds620) Fix overflows seen when writing temperature limits
clk: clk-wm831x: fix a logic error
iommu/amd: Fix the left value check of cmd buffer
scsi: mvsas: fix command_active typo
target/iscsi: Fix double free in lio_target_tiqn_addtpg()
mmc: mmc_test: Uninitialized return value
powerpc/pci/rpadlpar: Fix device reference leaks
ser_gigaset: return -ENOMEM on error instead of success
net, sched: fix soft lockup in tc_classify
net: stmmac: Fix race between stmmac_drv_probe and stmmac_open
gro: Enter slow-path if there is no tailroom
gro: use min_t() in skb_gro_reset_offset()
gro: Disable frag0 optimization on IPv6 ext headers
powerpc: Fix build warning on 32-bit PPC
Input: i8042 - add Pegatron touchpad to noloop table
mm/hugetlb.c: fix reservation race when freeing surplus pages
USB: serial: kl5kusb105: fix line-state error handling
USB: serial: ch341: fix initial modem-control state
USB: serial: ch341: fix open error handling
USB: serial: ch341: fix control-message error handling
USB: serial: ch341: fix open and resume after B0
USB: serial: ch341: fix resume after reset
USB: serial: ch341: fix modem-control and B0 handling
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.
powerpc/ibmebus: Fix further device reference leaks
powerpc/ibmebus: Fix device reference leaks in sysfs interface
IB/mlx4: Set traffic class in AH
IB/mlx4: Fix port query for 56Gb Ethernet links
perf scripting: Avoid leaking the scripting_context variable
ARM: dts: imx31: fix clock control module interrupts description
svcrpc: don't leak contexts on PROC_DESTROY
mmc: mxs-mmc: Fix additional cycles after transmission stop
mtd: nand: xway: disable module support
ubifs: Fix journal replay wrt. xattr nodes
arm64/ptrace: Preserve previous registers for short regset write
arm64/ptrace: Avoid uninitialised struct padding in fpr_set()
arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields
ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation
ite-cir: initialize use_demodulator before using it
fuse: do not use iocb after it may have been freed
crypto: caam - fix non-hmac hashes
drm/i915: Don't leak edid in intel_crt_detect_ddc()
s5k4ecgx: select CRC32 helper
platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT
net: fix harmonize_features() vs NETIF_F_HIGHDMA
tcp: initialize max window for a new fastopen socket
svcrpc: fix oops in absence of krb5 module
ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
mac80211: Fix adding of mesh vendor IEs
scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
drm/i915: fix use-after-free in page_flip_completed()
net: use a work queue to defer net_disable_timestamp() work
ipv4: keep skb->dst around in presence of IP options
netlabel: out of bound access in cipso_v4_validate()
ip6_gre: fix ip6gre_err() invalid reads
ping: fix a null pointer dereference
l2tp: do not use udp_ioctl()
packet: fix races in fanout_add()
packet: Do not call fanout_release from atomic contexts
net: socket: fix recvmmsg not returning error from sock_error
USB: serial: mos7840: fix another NULL-deref at open
USB: serial: ftdi_sio: fix modem-status error handling
USB: serial: ftdi_sio: fix extreme low-latency setting
USB: serial: ftdi_sio: fix line-status over-reporting
USB: serial: spcp8x5: fix modem-status handling
USB: serial: opticon: fix CTS retrieval at open
USB: serial: ark3116: fix register-accessor error handling
x86/platform/goldfish: Prevent unconditional loading
goldfish: Sanitize the broken interrupt handler
ocfs2: do not write error flag to user structure we cannot copy from/to
mfd: pm8921: Potential NULL dereference in pm8921_remove()
drm/nv50/disp: min/max are reversed in nv50_crtc_gamma_set()
net: 6lowpan: fix lowpan_header_create non-compression memcpy call
vti4: Don't count header length twice.
net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames
MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
MIPS: Clear ISA bit correctly in get_frame_info()
MIPS: Prevent unaligned accesses during stack unwinding
MIPS: Fix get_frame_info() handling of microMIPS function size
MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
MIPS: Calculate microMIPS ra properly when unwinding the stack
MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
uvcvideo: Fix a wrong macro
scsi: aacraid: Reorder Adapter status check
ath9k: use correct OTP register offsets for the AR9340 and AR9550
fuse: add missing FR_FORCE
RDMA/core: Fix incorrect structure packing for booleans
NFSv4: fix getacl head length estimation
s390/qdio: clear DSCI prior to scanning multiple input queues
IB/ipoib: Fix deadlock between rmmod and set_mode
ktest: Fix child exit code processing
nlm: Ensure callback code also checks that the files match
dm: flush queued bios when process blocks to avoid deadlock
USB: serial: digi_acceleport: fix OOB data sanity check
USB: serial: digi_acceleport: fix OOB-event processing
MIPS: ip27: Disable qlge driver in defconfig
tracing: Add #undef to fix compile error
USB: serial: safe_serial: fix information leak in completion handler
USB: serial: omninet: fix reference leaks at open
USB: iowarrior: fix NULL-deref at probe
USB: iowarrior: fix NULL-deref in write
USB: serial: io_ti: fix NULL-deref in interrupt callback
USB: serial: io_ti: fix information leak in completion handler
vxlan: correctly validate VXLAN ID against VXLAN_N_VID
ipv4: mask tos for input route
locking/static_keys: Add static_key_{en,dis}able() helpers
net: net_enable_timestamp() can be called from irq contexts
dccp/tcp: fix routing redirect race
net sched actions: decrement module reference count after table flush.
perf/core: Fix event inheritance on fork()
isdn/gigaset: fix NULL-deref at probe
xen: do not re-use pirq number cached in pci device msi msg data
net: properly release sk_frag.page
net: unix: properly re-increment inflight counter of GC discarded candidates
Input: ims-pcu - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
USB: uss720: fix NULL-deref at probe
USB: idmouse: fix NULL-deref at probe
USB: wusbcore: fix NULL-deref at probe
uwb: i1480-dfu: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
mmc: ushc: fix NULL-deref at probe
ext4: mark inode dirty after converting inline directory
scsi: libsas: fix ata xfer length
ALSA: ctxfi: Fallback DMA mask to 32bit
ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
ACPI / PNP: Avoid conflicting resource reservations
ACPI / resources: free memory on error in add_region_before()
ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
USB: OHCI: Fix race between ED unlink and URB submission
i2c: at91: manage unexpected RXRDY flag when starting a transfer
ipv4: igmp: Allow removing groups from a removed interface
ptrace: fix PTRACE_LISTEN race corrupting task->state
ring-buffer: Fix return value check in test_ringbuffer()
metag/usercopy: Fix alignment error checking
metag/usercopy: Add early abort to copy_to_user
metag/usercopy: Set flags before ADDZ
metag/usercopy: Fix src fixup in from user rapf loops
metag/usercopy: Add missing fixups
s390/decompressor: fix initrd corruption caused by bss clear
net/mlx4_en: Fix bad WQE issue
net/mlx4_core: Fix racy CQ (Completion Queue) free
char: Drop bogus dependency of DEVPORT on !M68K
powerpc: Disable HFSCR[TM] if TM is not supported
pegasus: Use heap buffers for all register access
rtl8150: Use heap buffers for all register access
tracing: Allocate the snapshot buffer before enabling probe
ring-buffer: Have ring_buffer_iter_empty() return true when empty
netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel
net: phy: handle state correctly in phy_stop_machine
l2tp: take reference on sessions being dumped
MIPS: KGDB: Use kernel context for sleeping threads
ARM: dts: imx31: move CCM device node to AIPS2 bus devices
ARM: dts: imx31: fix AVIC base address
tun: Fix TUN_PKT_STRIP setting
Staging: vt6655-6: potential NULL dereference in hostap_disable_hostapd()
net: sctp: rework multihoming retransmission path selection to rfc4960
perf trace: Use the syscall raw_syscalls:sys_enter timestamp
USB: usbtmc: add missing endpoint sanity check
ping: implement proper locking
USB: fix problems with duplicate endpoint addresses
USB: dummy-hcd: fix bug in stop_activity (handle ep0)
mm/init: fix zone boundary creation
can: Fix kernel panic at security_sock_rcv_skb
Drivers: hv: avoid vfree() on crash
xc2028: avoid use after free
xc2028: unlock on error in xc2028_set_config()
xc2028: Fix use-after-free bug properly
ipv6: fix ip6_tnl_parse_tlv_enc_lim()
ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
ipv6: fix the use of pcpu_tstats in ip6_tunnel
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
sctp: deny peeloff operation on asocs with threads sleeping on it
KVM: x86: clear bus pointer when destroyed
kvm: exclude ioeventfd from counting kvm_io_range limit
KVM: kvm_io_bus_unregister_dev() should never fail
TTY: n_hdlc, fix lockdep false positive
tty: n_hdlc: get rid of racy n_hdlc.tbuf
ipv6: handle -EFAULT from skb_copy_bits
fs: exec: apply CLOEXEC before changing dumpable task flags
mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
dccp/tcp: do not inherit mc_list from parent
char: lp: fix possible integer overflow in lp_setup()
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
Linux 3.10.106
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Conflicts:
drivers/mfd/pm8921-core.c
include/linux/cpu.h
kernel/cpu.c
net/ipv4/inet_connection_sock.c
net/ipv4/ping.c
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream.
No caller currently checks the return value of
kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on
freeing their device. A stale reference will remain in the io_bus,
getting at least used again, when the iobus gets teared down on
kvm_destroy_vm() - leading to use after free errors.
There is nothing the callers could do, except retrying over and over
again.
So let's simply remove the bus altogether, print an error and make
sure no one can access this broken bus again (returning -ENOMEM on any
attempt to access it).
Fixes: e93f8a0f82 ("KVM: convert io_bus to SRCU")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[wt: no kvm_io_bus_read_cookie in 3.10, slightly different constructs]
Signed-off-by: Willy Tarreau <w@1wt.eu>
commit 6ea34c9b78c10289846db0abeebd6b84d5aca084 upstream.
We can easily reach the 1000 limit by start VM with a couple
hundred I/O devices (multifunction=on). The hardcode limit
already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).
In userspace, we already have maximum file descriptor to
limit ioeventfd count. But kvm_io_bus devices also are used
for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
by maximum file descriptor.
Currently only ioeventfds take too much kvm_io_bus devices,
so just exclude it from counting kvm_io_range limit.
Also fixed one indent issue in kvm_host.h
Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
[wt: next patch depends on this one]
Signed-off-by: Willy Tarreau <w@1wt.eu>
The gfn_to_index function relies on huge page defines which either may
not make sense on systems that don't support huge pages or are defined
in an unconvenient way for other architectures. Since this is
x86-specific, move the function to arch/x86/include/asm/kvm_host.h.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Git-commit: 6d9d41e57440e32a3400f37aa05ef7a1a09ced64
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Ian Maund <imaund@codeaurora.org>
The kvm_host.h header file doesn't handle well
inclusion when archs don't support KVM.
This results in build crashes for such archs when they
want to implement context tracking because this subsystem
includes kvm_host.h in order to implement the
guest_enter/exit APIs but it doesn't handle KVM off case.
To fix this, move the guest_enter()/guest_exit()
declarations and generic implementation to the context
tracking headers. These generic APIs actually belong to
this subsystem, besides other domains boundary tracking
like user_enter() et al.
KVM now properly becomes a user of this library, not the
other buggy way around.
Reported-by: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull kvm updates from Gleb Natapov:
"Highlights of the updates are:
general:
- new emulated device API
- legacy device assignment is now optional
- irqfd interface is more generic and can be shared between arches
x86:
- VMCS shadow support and other nested VMX improvements
- APIC virtualization and Posted Interrupt hardware support
- Optimize mmio spte zapping
ppc:
- BookE: in-kernel MPIC emulation with irqfd support
- Book3S: in-kernel XICS emulation (incomplete)
- Book3S: HV: migration fixes
- BookE: more debug support preparation
- BookE: e6500 support
ARM:
- reworking of Hyp idmaps
s390:
- ioeventfd for virtio-ccw
And many other bug fixes, cleanups and improvements"
* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
kvm: Add compat_ioctl for device control API
KVM: x86: Account for failing enable_irq_window for NMI window request
KVM: PPC: Book3S: Add API for in-kernel XICS emulation
kvm/ppc/mpic: fix missing unlock in set_base_addr()
kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
kvm/ppc/mpic: remove users
kvm/ppc/mpic: fix mmio region lists when multiple guests used
kvm/ppc/mpic: remove default routes from documentation
kvm: KVM_CAP_IOMMU only available with device assignment
ARM: KVM: iterate over all CPUs for CPU compatibility check
KVM: ARM: Fix spelling in error message
ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
KVM: ARM: Fix API documentation for ONE_REG encoding
ARM: KVM: promote vfp_host pointer to generic host cpu context
ARM: KVM: add architecture specific hook for capabilities
ARM: KVM: perform HYP initilization for hotplugged CPUs
ARM: KVM: switch to a dual-step HYP init code
ARM: KVM: rework HYP page table freeing
ARM: KVM: enforce maximum size for identity mapped code
ARM: KVM: move to a KVM provided HYP idmap
...
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it. The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.
The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source. The attribute number is the same as the interrupt
source number.
This does not support irq routing or irqfd yet.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
We hope to at some point deprecate KVM legacy device assignment in
favor of VFIO-based assignment. Towards that end, allow legacy
device assignment to be deconfigured.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The VMX implementation of enable_irq_window raised
KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This
caused infinite loops on vmentry. Fix it by letting enable_irq_window
signal the need for an immediate exit via its return value and drop
KVM_REQ_IMMEDIATE_EXIT.
This issue only affects nested VMX scenarios.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Hook the MPIC code up to the KVM interfaces, add locking, etc.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it. If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.
This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device. Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc. It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.
Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Setting up IRQ routes is nothing IOAPIC specific. Extract everything
that really is generic code into irqchip.c and only leave the ioapic
specific bits to irq_comm.c.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The prototype has been stale for a while, I can't spot any real function
define behind it. Let's just remove it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Quite a bit of code in KVM has been conditionalized on availability of
IOAPIC emulation. However, most of it is generically applicable to
platforms that don't have an IOPIC, but a different type of irq chip.
Make code that only relies on IRQ routing, not an APIC itself, on
CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
The concept of routing interrupt lines to an irqchip is nothing
that is IOAPIC specific. Every irqchip has a maximum number of pins
that can be linked to irq lines.
So let's add a new define that allows us to reuse generic code for
non-IOAPIC platforms.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Userspace may deliver RTC interrupt without query the status. So we
want to track RTC EOI for this case.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The variable kvm_rebooting is a common kvm variable, so move its
declaration from arch/x86/include/asm/kvm_host.h to
include/asm/kvm_host.h.
Fixes this sparse warning when building on arm64:
virt/kvm/kvm_main.c⚠️ symbol 'kvm_rebooting' was not declared. Should it be static?
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
The variables vm_list and kvm_lock are common to all architectures, so
move the declarations from arch/x86/include/asm/kvm_host.h to
include/linux/kvm_host.h.
Fixes sparse warnings like these when building for arm64:
virt/kvm/kvm_main.c: warning: symbol 'kvm_lock' was not declared. Should it be static?
virt/kvm/kvm_main.c: warning: symbol 'vm_list' was not declared. Should it be static?
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page. If the range falls within
the same memslot, then this will be a fast operation. If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.
Tested: Test against kvm_clock unit tests.
Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Merge reason:
From: Alexander Graf <agraf@suse.de>
"Just recently this really important patch got pulled into Linus' tree for 3.9:
commit 1674400aae
Author: Anton Blanchard <anton <at> samba.org>
Date: Tue Mar 12 01:51:51 2013 +0000
Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue.
Could you please merge kvm/next against linus/master, so that I can base my trees against that?"
* upstream/master: (653 commits)
PCI: Use ROM images from firmware only if no other ROM source available
sparc: remove unused "config BITS"
sparc: delete "if !ULTRA_HAS_POPULATION_COUNT"
KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)
KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797)
KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796)
arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS
arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED
inet: limit length of fragment queue hash table bucket lists
qeth: Fix scatter-gather regression
qeth: Fix invalid router settings handling
qeth: delay feature trace
sgy-cts1000: Remove __dev* attributes
KVM: x86: fix deadlock in clock-in-progress request handling
KVM: allow host header to be included even for !CONFIG_KVM
hwmon: (lm75) Fix tcn75 prefix
hwmon: (lm75.h) Update header inclusion
MAINTAINERS: Remove Mark M. Hoffman
xfs: ensure we capture IO errors correctly
xfs: fix xfs_iomap_eof_prealloc_initial_size type
...
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The new context tracking subsystem unconditionally includes kvm_host.h
headers for the guest enter/exit macros. This causes a compile
failure when KVM is not enabled.
Fix by adding an IS_ENABLED(CONFIG_KVM) check to kvm_host so it can
be included/compiled even when KVM is not enabled.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Note that we mark as preempted only when vcpu's task state was
Running during preemption.
Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
for their precious suggestions. Thanks Srikar for an idea on avoiding
rcu lock while checking task state that improved overcommit numbers.
Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Add a new bus type for virtio-ccw devices on s390.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently, eventfd introduces module_init/module_exit functions
to initialize/cleanup the irqfd workqueue. This only works, however,
if no other module_init/module_exit functions are built into the
same module.
Let's just move the initialization and cleanup to kvm_init and kvm_exit.
This way, it is also clearer where kvm startup may fail.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch makes the parameter old a const pointer to the old memory
slot and adds a new parameter named change to know the change being
requested: the former is for removing extra copying and the latter is
for cleaning up the code.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch drops the parameter old, a copy of the old memory slot, and
adds a new parameter named change to know the change being requested.
This not only cleans up the code but also removes extra copying of the
memory slot structure.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This will be used for cleaning up prepare/commit_memory_region() later.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Except ia64's stale code, KVM_SET_MEMORY_REGION support, this is only
used for sanity checks in __kvm_set_memory_region() which can easily
be changed to use slot id instead.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
X86 does not use this any more. The remaining user, s390's !user_alloc
check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no
longer supported.
Note: fixed powerpc's indentations with spaces to suppress checkpatch
errors.
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This field was needed to differentiate memory slots created by the new
API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent,
KVM_SET_MEMORY_REGION, whose support was dropped long before:
commit b74a07beed
KVM: Remove kernel-allocated memory regions
Although we also have private memory slots to which KVM allocates
memory with vm_mmap(), !user_alloc slots in other words, the slot id
should be enough for differentiating them.
Note: corresponding function parameters will be removed later.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
manually, which is fully taken care of by the hardware. This needs
some special awareness into existing interrupr injection path:
- for pending interrupt, instead of direct injection, we may need
update architecture specific indicators before resuming to guest.
- A pending interrupt, which is masked by ISR, should be also
considered in above update action, since hardware will decide
when to inject it at right time. Current has_interrupt and
get_interrupt only returns a valid vector from injection p.o.v.
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
While remotely reading the cputime of a task running in a
full dynticks CPU, the values stored in utime/stime fields
of struct task_struct may be stale. Its values may be those
of the last kernel <-> user transition time snapshot and
we need to add the tickless time spent since this snapshot.
To fix this, flush the cputime of the dynticks CPUs on
kernel <-> user transition and record the time / context
where we did this. Then on top of this snapshot and the current
time, perform the fixup on the reader side from task_times()
accessors.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[fixed kvm module related build errors]
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Do some ground preparatory work before adding guest_enter()
and guest_exit() context tracking callbacks. Those will
be later used to read the guest cputime safely when we
run in full dynticks mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a
core gets its pending external interrupt delivered.
Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.
This patch implements logic for user space to enable EPR exiting,
disable EPR exiting and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.
Signed-off-by: Alexander Graf <agraf@suse.de>
Previous patch "kvm: Minor memory slot optimization" (b7f69c555c)
overlooked the generation field of the memory slots. Re-using the
original memory slots left us with with two slightly different memory
slots with the same generation. To fix this, make update_memslots()
take a new parameter to specify the last generation. This also makes
generation management more explicit to avoid such problems in the future.
Reported-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
We're currently offering a whopping 32 memory slots to user space, an
int is a bit excessive for storing this. We would like to increase
our memslots, but SHRT_MAX should be more than enough.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
struct kvm_userspace_memory_region.flags is a u32 with a comment that
bits 0 ~ 15 are visible to userspace and the other bits are reserved
for kvm internal use. KVM_MEMSLOT_INVALID is the only internal use
flag and it has a comment that bits 16 ~ 31 are internally used and
the other bits are visible to userspace.
Therefore, let's define this as a u32 so we don't waste bytes on LP64
systems. Move to the end of the struct for alignment.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There's no need for this to be an int, it holds a boolean.
Move to the end of the struct for alignment.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Seems like everyone copied x86 and defined 4 private memory slots
that never actually get used. Even x86 only uses 3 of the 4. These
aren't exposed so there's no need to add padding.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is
the user accessible slots and the other is user + private. Make this
more obvious.
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pull KVM updates from Marcelo Tosatti:
"Considerable KVM/PPC work, x86 kvmclock vsyscall support,
IA32_TSC_ADJUST MSR emulation, amongst others."
Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
migration notifier added next to rq migration call-back.
* tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
KVM: emulator: fix real mode segment checks in address linearization
VMX: remove unneeded enable_unrestricted_guest check
KVM: VMX: fix DPL during entry to protected mode
x86/kexec: crash_vmclear_local_vmcss needs __rcu
kvm: Fix irqfd resampler list walk
KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
KVM: MMU: optimize for set_spte
KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
KVM: PPC: bookehv: Add guest computation mode for irq delivery
KVM: PPC: Make EPCR a valid field for booke64 and bookehv
KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
KVM: PPC: e500: Add emulation helper for getting instruction ea
KVM: PPC: bookehv64: Add support for interrupt handling
KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
KVM: PPC: booke: Fix get_tb() compile error on 64-bit
KVM: PPC: e500: Silence bogus GCC warning in tlb code
...
The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still support easily
support eventfd.
Signed-off-by: Alexander Graf <agraf@suse.de>