android_kernel_lge_bullhead

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	801c8a021e	can: Fix kernel panic at security_sock_rcv_skb commit f1712c73714088a7252d276a57126d56c7d37e64 upstream. Zhang Yanmin reported crashes [1] and provided a patch adding a synchronize_rcu() call in can_rx_unregister() The main problem seems that the sockets themselves are not RCU protected. If CAN uses RCU for delivery, then sockets should be freed only after one RCU grace period. Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's ease stable backports with the following fix instead. [1] BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0 Call Trace: <IRQ> [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60 [<ffffffff81d55771>] sk_filter+0x41/0x210 [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0 [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0 [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370 [<ffffffff81f07af9>] can_receive+0xd9/0x120 [<ffffffff81f07beb>] can_rcv+0xab/0x100 [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0 [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0 [<ffffffff81d37f67>] process_backlog+0x127/0x280 [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0 [<ffffffff810c88d4>] __do_softirq+0x184/0x440 [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40 [<ffffffff810c8bed>] do_softirq+0x1d/0x20 [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110 [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520 [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230 [<ffffffff810e3baf>] process_one_work+0x24f/0x670 [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0 [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480 [<ffffffff810ebafc>] kthread+0x12c/0x150 [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70 Reported-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:09 +02:00
Rafael J. Wysocki	62bdbcf4f2	ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage commit 0294112ee3135fbd15eaa70015af8283642dd970 upstream. This effectively reverts the following three commits: 7bc10388ccdd ACPI / resources: free memory on error in add_region_before() 0f1b414d1907 ACPI / PNP: Avoid conflicting resource reservations b9a5e5e18fbf ACPI / init: Fix the ordering of acpi_reserve_resources() (commit b9a5e5e18fbf introduced regressions some of which, but not all, were addressed by commit 0f1b414d1907 and commit 7bc10388ccdd was a fixup on top of the latter) and causes ACPI fixed hardware resources to be reserved at the fs_initcall_sync stage of system initialization. The story is as follows. First, a boot regression was reported due to an apparent resource reservation ordering change after a commit that shouldn't lead to such changes. Investigation led to the conclusion that the problem happened because acpi_reserve_resources() was executed at the device_initcall() stage of system initialization which wasn't strictly ordered with respect to driver initialization (and with respect to the initialization of the pcieport driver in particular), so a random change causing the device initcalls to be run in a different order might break things. The response to that was to attempt to run acpi_reserve_resources() as soon as we knew that ACPI would be in use (commit b9a5e5e18fbf). However, that turned out to be too early, because it caused resource reservations made by the PNP system driver to fail on at least one system and that failure was addressed by commit 0f1b414d1907. That fix still turned out to be insufficient, though, because calling acpi_reserve_resources() before the fs_initcall stage of system initialization caused a boot regression to happen on the eCAFE EC-800-H20G/S netbook. That meant that we only could call acpi_reserve_resources() at the fs_initcall initialization stage or later, but then we might just as well call it after the PNP initalization in which case commit 0f1b414d1907 wouldn't be necessary any more. For this reason, the changes made by commit 0f1b414d1907 are reverted (along with a memory leak fixup on top of that commit), the changes made by commit b9a5e5e18fbf that went too far are reverted too and acpi_reserve_resources() is changed into fs_initcall_sync, which will cause it to be executed after the PNP subsystem initialization (which is an fs_initcall) and before device initcalls (including the pcieport driver initialization) which should avoid the initial issue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581 Link: http://marc.info/?t=143092384600002&r=1&w=2 Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831 Link: http://marc.info/?t=143389402600001&r=1&w=2 Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()" Reported-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:06 +02:00
Rafael J. Wysocki	0f06de41a3	ACPI / PNP: Avoid conflicting resource reservations commit 0f1b414d190724617eb1cdd615592fa8cd9d0b50 upstream. Commit b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()" overlooked the fact that the memory and/or I/O regions reserved by acpi_reserve_resources() may conflict with those reserved by the PNP "system" driver. If that conflict actually takes place, it causes the reservations made by the "system" driver to fail while before commit b9a5e5e18fbf all reservations made by it and by acpi_reserve_resources() would be successful. In turn, that allows the resources that haven't been reserved by the "system" driver to be used by others (e.g. PCI) which sometimes leads to functional problems (up to and including boot failures). To fix that issue, introduce a common resource reservation routine, acpi_reserve_region(), to be used by both acpi_reserve_resources() and the "system" driver, that will track all resources reserved by it and avoid making conflicting requests. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831 Link: http://marc.info/?t=143389402600001&r=1&w=2 Fixes: b9a5e5e18fbf "ACPI / init: Fix the ordering of acpi_reserve_resources()" Reported-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:06 +02:00
Peter Zijlstra	12f1a0f987	locking/static_keys: Add static_key_{en,dis}able() helpers commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream. Add two helpers to make it easier to treat the refcount as boolean. [js] do not involve WARN_ON_ONCE as it causes build failures Suggested-by: Jason Baron <jasonbaron0@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> [wt: only backported for use in next fix ; s/static_key_count(key)/atomic_read(&key->enabled)/] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:03 +02:00
Rik van Riel	2b6aa6271a	tracing: Add #undef to fix compile error commit bf7165cfa23695c51998231c4efa080fe1d3548d upstream. There are several trace include files that define TRACE_INCLUDE_FILE. Include several of them in the same .c file (as I currently have in some code I am working on), and the compile will blow up with a "warning: "TRACE_INCLUDE_FILE" redefined #define TRACE_INCLUDE_FILE syscalls" Every other include file in include/trace/events/ avoids that issue by having a #undef TRACE_INCLUDE_FILE before the #define; syscalls.h should have one, too. Link: http://lkml.kernel.org/r/20160928225554.13bd7ac6@annuminas.surriel.com Fixes: `b8007ef742` ("tracing: Separate raw syscall from syscall tracer") Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:02 +02:00
Trond Myklebust	ebd9572e50	nlm: Ensure callback code also checks that the files match commit 251af29c320d86071664f02c76f0d063a19fefdf upstream. It is not sufficient to just check that the lock pids match when granting a callback, we also need to ensure that we're granting the callback on the right file. Reported-by: Pankaj Singh <psingh.ait@gmail.com> Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:02 +02:00
Jason Gunthorpe	e9a1e1cc1c	RDMA/core: Fix incorrect structure packing for booleans commit 55efcfcd7776165b294f8b5cd6e05ca00ec89b7c upstream. The RDMA core uses ib_pack() to convert from unpacked CPU structs to on-the-wire bitpacked structs. This process requires that 1 bit fields are declared as u8 in the unpacked struct, otherwise the packing process does not read the value properly and the packed result is wired to 0. Several places wrongly used int. Crucially this means the kernel has never, set reversible correctly in the path record request. It has always asked for irreversible paths even if the ULP requests otherwise. When the kernel is used with a SM that supports this feature, it completely breaks communication management if reversible paths are not properly requested. The only reason this ever worked is because opensm ignores the reversible bit. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:47:02 +02:00
Eric Dumazet	96296cddf9	netlabel: out of bound access in cipso_v4_validate() commit d71b7896886345c53ef1d84bda2bc758554f5d61 upstream. syzkaller found another out of bound access in ip_options_compile(), or more exactly in cipso_v4_validate() Fixes: `20e2a86485` ("cipso: handle CIPSO options correctly when NetLabel is disabled") Fixes: `446fda4f26` ("[NetLabel]: CIPSOv4 engine") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paul Moore <paul@paul-moore.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:46:58 +02:00
Herbert Xu	8ec6068428	gro: Disable frag0 optimization on IPv6 ext headers commit 57ea52a865144aedbcd619ee0081155e658b6f7d upstream. The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: `78a478d0ef` ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:46:54 +02:00
Michal Hocko	57bf12f43e	hotplug: Make register and unregister notifier API symmetric commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream. Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its notifiers when HOTPLUG_CPU=y while the registration might succeed even when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap might keep a stale notifier on the list on the manual clean up during the pool tear down and thus corrupt the list. Resulting in the following [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78 [ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40 <snipped> [ 145.122628] Call Trace: [ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20 [ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400 [ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300 [ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10 [ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30 [ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0 [ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20 [ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0 [ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30 [ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70 [ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180 [ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40 [ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0 [ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0 [ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17 This can be even triggered manually by changing /sys/module/zswap/parameters/compressor multiple times. Fix this issue by making unregister APIs symmetric to the register so there are no surprises. [js] backport to 3.12 Fixes: `47e627bc8c` ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU") Reported-and-tested-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:46:48 +02:00
Jan Kara	dd2421b5ed	posix_acl: Clear SGID bit when setting file permissions commit 073931017b49d9458aa351605b43a7e34598caef upstream. When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> [wt: dropped hfsplus changes : no xattr in 3.10] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:46:47 +02:00
James Yonan	5fd53819a3	crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks commit 6bf37e5aa90f18baf5acf4874bca505dd667c37f upstream. When comparing MAC hashes, AEAD authentication tags, or other hash values in the context of authentication or integrity checking, it is important not to leak timing information to a potential attacker, i.e. when communication happens over a network. Bytewise memory comparisons (such as memcmp) are usually optimized so that they return a nonzero value as soon as a mismatch is found. E.g, on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch and up to ~850 cyc for a full match (cold). This early-return behavior can leak timing information as a side channel, allowing an attacker to iteratively guess the correct result. This patch adds a new method crypto_memneq ("memory not equal to each other") to the crypto API that compares memory areas of the same length in roughly "constant time" (cache misses could change the timing, but since they don't reveal information about the content of the strings being compared, they are effectively benign). Iow, best and worst case behaviour take the same amount of time to complete (in contrast to memcmp). Note that crypto_memneq (unlike memcmp) can only be used to test for equality or inequality, NOT for lexicographical order. This, however, is not an issue for its use-cases within the crypto API. We tried to locate all of the places in the crypto API where memcmp was being used for authentication or integrity checking, and convert them over to crypto_memneq. crypto_memneq is declared noinline, placed in its own source file, and compiled with optimizations that might increase code size disabled ("Os") because a smart compiler (or LTO) might notice that the return value is always compared against zero/nonzero, and might then reintroduce the same early-return optimization that we are trying to avoid. Using #pragma or __attribute__ optimization annotations of the code for disabling optimization was avoided as it seems to be considered broken or unmaintained for long time in GCC [1]. Therefore, we work around that by specifying the compile flag for memneq.o directly in the Makefile. We found that this seems to be most appropriate. As we use ("Os"), this patch also provides a loop-free "fast-path" for frequently used 16 byte digests. Similarly to kernel library string functions, leave an option for future even further optimized architecture specific assembler implementations. This was a joint work of James Yonan and Daniel Borkmann. Also thanks for feedback from Florian Weimer on this and earlier proposals [2]. [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html [2] https://lkml.org/lkml/2013/2/10/131 Signed-off-by: James Yonan <james@openvpn.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-06-08 00:46:46 +02:00
Jin Qian	b69c3038bb	UPSTREAM: f2fs: sanity check segment count commit b9dd46188edc2f0d1f37328637860bb65a771124 upstream. F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Bug: 36815012 Change-Id: I30ea36df066bc07e32e767336b7cae12063fe415	2017-06-07 19:32:36 +00:00
Andrew Chant	69ff3a22bb	Merge July 2017 security patches Merge 'android-msm-bullhead-3.10-nyc-mr2' into 'android-msm-bullhead-3.10' July 2017.1 Bug: 38137577 Change-Id: Id2935b141bbaa52d6ec63648551ac5dec3e21487	2017-05-17 23:07:26 -07:00
Dennis Cagle	2c2206a977	ashmem: remove cache maintenance support The cache maintenance routines in ashmem were causing several security issues. Since they are not being used anymore by any drivers, its well to remove them entirely. Bug: 34126808 Bug: 34173755 Bug: 34203176 CRs-Fixed: 1107034, 2001129, 2007786 Change-Id: I955e33d90b888d58db5cf6bb490905283374425b Signed-off-by: Sudarshan Rajagopalan <sudaraja@codeaurora.org> Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>	2017-05-17 17:24:19 +00:00
Chenbo Feng	0155e7c110	ANDROID: Add untag hacks to inet_release function To prevent protential risk of memory leak caused by closing socket with out untag it from qtaguid module, the qtaguid module now do not hold any socket file reference count. Instead, it will increase the sk_refcnt of the sk struct to prevent a reuse of the socket pointer. And when a socket is released. It will delete the tag if the socket is previously tagged so no more resources is held by xt_qtaguid moudle. A flag is added to the untag process to prevent possible kernel crash caused by fail to delete corresponding socket_tag_entry list. Bug: 36374484 Test: compile and run test under system/extra/test/iptables, run cts -m CtsNetTestCases -t android.net.cts.SocketRefCntTest Signed-off-by: Chenbo Feng <fengc@google.com> Change-Id: Iea7c3bf0c59b9774a5114af905b2405f6bc9ee52	2017-05-10 13:07:12 +09:00
Hareesh Gundu	6ddd277a82	msm: kgsl: Allow draw context to perform only replay on recovery Robust context attempts to perform a rendering that takes too long whether due to an infinite loop in a shader or even just a rendering operation that takes too long on the given hardware. This type of attempts can result into GPU faults. Robust context expect driver to replay IB instead skip IB and if it fails on replay context has to be invalidated. KGSL_CONTEXT_INVALIDATE_ON_FAULT flag allows draw context to execute only replay policy on GPU fault recovery instead of going to default recovery policy. User space has to set this flag during the context creation. Bug: 34887800 Change-Id: If42dc5afc7d5ed1226b73ae5abfa2648d7acf2c3 Signed-off-by: Hareesh Gundu <hareeshg@codeaurora.org>	2017-04-21 16:30:08 +00:00
Nick Desaulniers	bd015ed765	Merge branch msm-lge/android-msm-bullhead-3.10-nyc-mr2 into android-msm-bullhead-3.10 June 2017.1 Bug: 37156499 Change-Id: I0a6fb21fc83e5f4e417ccb0ba0667ee38f2c1880	2017-04-11 10:07:27 -07:00
Maciej Żenczykowski	dad64933db	BACKPORT: ipv6 addrconf: implement RFC7559 router solicitation backoff This implements: https://tools.ietf.org/html/rfc7559 Backoff is performed according to RFC3315 section 14: https://tools.ietf.org/html/rfc3315#section-14 We allow setting /proc/sys/net/ipv6/conf//router_solicitations to a negative value meaning an unlimited number of retransmits, and we make this the new default (inline with the RFC). We also add a new setting: /proc/sys/net/ipv6/conf//router_solicitation_max_interval defaulting to 1 hour (per RFC recommendation). Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit bd11f0741fa5a2c296629898ad07759dd12b35bb in DaveM's net-next/master, should make Linus' tree in 4.9-rc1) Change-Id: Ia32cdc5c61481893ef8040734e014bf2229fc39e	2017-04-11 16:47:07 +09:00
Dennis Cagle	b74e5f52ae	msm: camera: cpp: Fixing Heap overflow in output buffer Issue: Missing bound check when writing into the output array buffer, which can lead to out-of-bound heap write. Fix: Addding hardcoded constant 8 in the MSM_OUTPUT_BUF_CNT macro and size check to the place where the array is accessed. Returning '0' if exceeds MSM_OUTPUT_BUF_CNT. Caller will return -EINVAL for '0'. Bug: 34621613 Change-Id: Ic03f86e3e47ece9ca7069527e741a75ad9a0f83f CRs-Fixed: 2004036 Signed-off-by: Pratap Nirujogi <pratapn@codeaurora.org> Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>	2017-04-10 22:53:20 +00:00
Joel Scherpelz	e29230a913	net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Backport of net-next commit bbea124bc99d ("net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.") [lorenzo@google.com: fixed conflicts in include/uapi/linux/ipv6.h] Bug: 33333670 Test: net_test passes Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-07 06:22:09 +00:00
Ecco Park	b6ad97c61a	Merge branch 'android-msm-bullhead-3.10-nyc-mr2' into android-msm-bullhead-3.10 May 2017.1 Bug:36138302	2017-03-15 20:37:51 -07:00
Ecco Park	0c289272eb	Merge branch 'android-msm-bullhead-3.10-nyc-mr1' into android-msm-bullhead-3.10-nyc-mr2 May 2017.1 Bug: 36138302	2017-03-15 20:21:31 -07:00
Steven Rostedt (Red Hat)	a4d7a2b9f5	UPSTREAM: tracing: Fix trace_printk() to print when not using bprintk() The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: `07d777fe8c` "tracing: Add percpu buffers for trace_printk()" Cc: stable@vger.kernel.org # v3.5+ Bug: 34277115 Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Change-Id: I10ce56caa41c7644d9d290d9ed272a6d156c938c	2017-03-16 00:53:51 +00:00
Peter Zijlstra	5bbf533582	perf: Tighten (and fix) the grouping condition commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream. The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Bug: 34515362 Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Change-Id: I72247f51388177f172845ccd4621debcd3158940	2017-03-16 00:53:05 +00:00
Srinivas Girigowda	d0e43cb5d9	Driver to create cld80211 nl family at bootup time Create cnss_genl driver to create a netlink family cld80211 and make it available to cld driver and applications when they query for it. This driver creates multicast groups to facilitate communication from cld driver to userspace and allows cld driver to register for different commands from user space. Resolve compilation errors and tweak netlink family creation Change-Id: I0795dd08b6429fad60187fee724b3fd3ccfa5603 CRs-Fixed: 1100401 Bug: 32775496 Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2017-03-15 12:19:26 -07:00
John Dias	383afae450	Merge branch 'android-msm-bullhead-3.10-nyc-mr2' into android-msm-bullhead-3.10 April 2017.1 Bug: 34977530	2017-02-23 17:05:32 -08:00
Srinivas Girigowda	51b2a85494	cfg80211: Define macro to indicate support for random mac address for scan Define macro to indicate backport support for using random MAC addresses for scan while unassociated. Change-Id: I2b7318ca8f6af29a9eb13d14e8c1e55bd41ae654 CRs-Fixed: 1082480 Bug: 35436707 Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2017-02-16 14:25:16 -08:00
Srinivas Girigowda	823815f02f	cfg80211: allow drivers to support random MAC addresses for scan Add the necessary feature flags and a scan flag to support using random MAC addresses for scan while unassociated. The configuration for this supports an arbitrary MAC address value and mask, so that any kind of configuration (e.g. fixed OUI or full 46-bit random) can be requested. Full 46-bit random is the default when no other configuration is passed. Also add a small helper function to use the addr/mask correctly. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Git-commit: ad2b26abc157460ca6fac1a53a2bfeade283adfa Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next.git [dasaris@codeaurora.org: backport to 3.18 excluding the changes in nl80211_parse_wowlan_nd] Change-Id: Id30d201358654c77a99f46500178ebf975d609d5 CRs-Fixed: 1082480 Bug: 35436707 Signed-off-by: Srinivas Dasari <dasaris@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2017-02-16 14:25:11 -08:00
John Dias	e5706784ae	Merge branch 'android-msm-bullhead-3.10-nyc-mr1' into android-msm-bullhead-3.10-nyc-mr2 April 2017.1 Bug: 34977530	2017-02-15 17:05:47 -08:00
Jan Kara	7f22e818a9	BACKPORT: posix_acl: Clear SGID bit when setting file permissions (cherry pick from commit 073931017b49d9458aa351605b43a7e34598caef) When file permissions are modified via chmod(2) and the user is not in the owning group or capable of CAP_FSETID, the setgid bit is cleared in inode_change_ok(). Setting a POSIX ACL via setxattr(2) sets the file permissions as well as the new ACL, but doesn't clear the setgid bit in a similar way; this allows to bypass the check in chmod(2). Fix that. NB: conflicts resolution included extending the change to all visible users of the near deprecated function posix_acl_equiv_mode replaced with posix_acl_update_mode. We did not resolve the ACL leak in this CL, require additional upstream fixes. References: CVE-2016-7097 Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Bug: 32458736 Change-Id: I19591ad452cc825ac282b3cfd2daaa72aa9a1ac1	2017-02-15 02:04:12 +00:00
Miklos Szeredi	de04c177df	BACKPORT: fs: limit filesystem stacking depth Add a simple read-only counter to super_block that indicates how deep this is in the stack of filesystems. Previously ecryptfs was the only stackable filesystem and it explicitly disallowed multiple layers of itself. Overlayfs, however, can be stacked recursively and also may be stacked on top of ecryptfs or vice versa. To limit the kernel stack usage we must limit the depth of the filesystem stack. Initially the limit is set to 2. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> (cherry picked from commit 69c433ed2ecd2d3264efd7afec4439524b319121) Bug: 32761463 Change-Id: I69b2fba2112db2ece09a1bf61a44f8fc4db00820	2017-02-15 00:22:47 +00:00
Pratyush Anand	f7f849f70b	BACKPORT: hw_breakpoint: Allow watchpoint of length 3,5,6 and 7 (cherry picked from commit 651be3cb085341a21847e47c694c249c3e1e4e5b) We only support breakpoint/watchpoint of length 1, 2, 4 and 8. If we can support other length as well, then user may watch more data with less number of watchpoints (provided hardware supports it). For example: if we have to watch only 4th, 5th and 6th byte from a 64 bit aligned address, we will have to use two slots to implement it currently. One slot will watch a half word at offset 4 and other a byte at offset 6. If we can have a watchpoint of length 3 then we can watch it with single slot as well. ARM64 hardware does support such functionality, therefore adding these new definitions in generic layer. Signed-off-by: Pratyush Anand <panand@redhat.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Pavel Labath <labath@google.com> [pavel: tools/include/uapi/linux/hw_breakpoint.h is not present in this branch] Change-Id: Ie17ed89ca526e4fddf591bb4e556fdfb55fc2eac Bug: 30919905	2017-02-14 15:29:33 +00:00
Biswajit Paul	b41c6bfa46	qseecom: whitelist support for kernel client and listener Add whitelist support for listener to send modified resp to TZ; also add whitelist support for kernel client; and change the method to check whitelist feature. Bug: 31268796 CRs-Fixed: 1021945 Change-Id: I0030b0008d6224cda3fdc1f80308a7e9bcfe4405 Signed-off-by: Zhen Kong <zkong@codeaurora.org> Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>	2017-02-13 14:07:35 -08:00
Biswajit Paul	903dbbabd0	qseecom: support whitelist memory for qseecom_send_modfd_cmd qseecom_send_modfd_cmd converts ION buffer's virtual address to scatter gather(SG) list and then sends them to TA by populating SG list into message buffer. As the physical memory address in SG list is used directly by TA, this allows a malicious TA to access/corrupt arbitrary physical memory and may lead to the process gaining kernel/root privileges. Thus, make changes to have the QSEEComm driver passing a list of whitelist buffers that is allowed to be mapped by TA, and the QSEE kernel, in turn, should add checks to the register_shared_buffer syscall to make sure the shared buffers an application is mapping falls within one of these whitelist buffers. Bug: 31268796 CRs-fixed: 1021945 Change-Id: I776ead0030cad167afcf41ab985db7151a42d126 Signed-off-by: Zhen Kong <zkong@codeaurora.org> Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>	2017-02-13 14:06:56 -08:00
Dan Carpenter	a28d835844	mfd: 88pm80x: Double shifting bug in suspend/resume commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream. set_bit() and clear_bit() take the bit number so this code is really doing "1 << (1 << irq)" which is a double shift bug. It's done consistently so it won't cause a problem unless "irq" is more than 4. Fixes: `70c6cce040` ('mfd: Support 88pm80x in 80x driver') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:04:07 +01:00
Sergei Miroshnichenko	ebee2e2f30	can: dev: fix deadlock reported after bus-off commit 9abefcb1aaa58b9d5aa40a8bb12c87d02415e4c8 upstream. A timer was used to restart after the bus-off state, leading to a relatively large can_restart() executed in an interrupt context, which in turn sets up pinctrl. When this happens during system boot, there is a high probability of grabbing the pinctrl_list_mutex, which is locked already by the probe() of other device, making the kernel suspect a deadlock condition [1]. To resolve this issue, the restart_timer is replaced by a delayed work. [1] https://github.com/victronenergy/venus/issues/24 Signed-off-by: Sergei Miroshnichenko <sergeimir@emcraft.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:04:06 +01:00
Nikolay Aleksandrov	2943dca4f3	ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route commit 2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8 upstream. Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid instead of the previous dst_pid which was copied from in_skb's portid. Since the skb is new the portid is 0 at that point so the packets are sent to the kernel and we get scheduling while atomic or a deadlock (depending on where it happens) by trying to acquire rtnl two times. Also since this is RTM_GETROUTE, it can be triggered by a normal user. Here's the sleeping while atomic trace: [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0 [ 7858.212881] 2 locks held by swapper/0/0: [ 7858.213013] #0: (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350 [ 7858.213422] #1: (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130 [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179 [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 7858.214108] 0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000 [ 7858.214412] ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e [ 7858.214716] 000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f [ 7858.215251] Call Trace: [ 7858.215412] <IRQ> [<ffffffff813a7804>] dump_stack+0x85/0xc1 [ 7858.215662] [<ffffffff810a4a72>] ___might_sleep+0x192/0x250 [ 7858.215868] [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100 [ 7858.216072] [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0 [ 7858.216279] [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460 [ 7858.216487] [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40 [ 7858.216687] [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260 [ 7858.216900] [<ffffffff81573c70>] rtnl_unicast+0x20/0x30 [ 7858.217128] [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0 [ 7858.217351] [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130 [ 7858.217581] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.217785] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.217990] [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350 [ 7858.218192] [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350 [ 7858.218415] [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180 [ 7858.218656] [<ffffffff810fde10>] run_timer_softirq+0x260/0x640 [ 7858.218865] [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f [ 7858.219068] [<ffffffff816637c8>] __do_softirq+0xe8/0x54f [ 7858.219269] [<ffffffff8107a948>] irq_exit+0xb8/0xc0 [ 7858.219463] [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50 [ 7858.219678] [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0 [ 7858.219897] <EOI> [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10 [ 7858.220165] [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10 [ 7858.220373] [<ffffffff810298e3>] default_idle+0x23/0x190 [ 7858.220574] [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20 [ 7858.220790] [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60 [ 7858.221016] [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0 [ 7858.221257] [<ffffffff8164f995>] rest_init+0x135/0x140 [ 7858.221469] [<ffffffff81f83014>] start_kernel+0x50e/0x51b [ 7858.221670] [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120 [ 7858.221894] [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c [ 7858.222113] [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a Fixes: `2942e90050` ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:52 +01:00
Eric Dumazet	9864f15313	net: avoid sk_forward_alloc overflows commit 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d upstream. A malicious TCP receiver, sending SACK, can force the sender to split skbs in write queue and increase its memory usage. Then, when socket is closed and its write queue purged, we might overflow sk_forward_alloc (It becomes negative) sk_mem_reclaim() does nothing in this case, and more than 2GB are leaked from TCP perspective (tcp_memory_allocated is not changed) Then warnings trigger from inet_sock_destruct() and sk_stream_kill_queues() seeing a not zero sk_forward_alloc All TCP stack can be stuck because TCP is under memory pressure. A simple fix is to preemptively reclaim from sk_mem_uncharge(). This makes sure a socket wont have more than 2 MB forward allocated, after burst and idle period. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:52 +01:00
Eric Dumazet	a5b79829a4	net: fix sk_mem_reclaim_partial() commit 1a24e04e4b50939daa3041682b38b82c896ca438 upstream. sk_mem_reclaim_partial() goal is to ensure each socket has one SK_MEM_QUANTUM forward allocation. This is needed both for performance and better handling of memory pressure situations in follow up patches. SK_MEM_QUANTUM is currently a page, but might be reduced to 4096 bytes as some arches have 64KB pages. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:52 +01:00
Mahesh Bandewar	745db35452	bonding: Fix bonding crash commit 24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c upstream. Following few steps will crash kernel - (a) Create bonding master > modprobe bonding miimon=50 (b) Create macvlan bridge on eth2 > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ type macvlan (c) Now try adding eth2 into the bond > echo +eth2 > /sys/class/net/bond0/bonding/slaves <crash> Bonding does lots of things before checking if the device enslaved is busy or not. In this case when the notifier call-chain sends notifications, the bond_netdev_event() assumes that the rx_handler /rx_handler_data is registered while the bond_enslave() hasn't progressed far enough to register rx_handler for the new slave. This patch adds a rx_handler check that can be performed right at the beginning of the enslave code to avoid getting into this situation. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:47 +01:00
Eric Dumazet	56325d9fb7	tcp: take care of truncations done by sk_filter() commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream. With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:46 +01:00
Eric Dumazet	13403121e7	tcp: fix use after free in tcp_xmit_retransmit_queue() commit bb1fceca22492109be12640d49f5ea5a544c6bb4 upstream. When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the tail of the write queue using tcp_add_write_queue_tail() Then it attempts to copy user data into this fresh skb. If the copy fails, we undo the work and remove the fresh skb. Unfortunately, this undo lacks the change done to tp->highest_sack and we can leave a dangling pointer (to a freed skb) Later, tcp_xmit_retransmit_queue() can dereference this pointer and access freed memory. For regular kernels where memory is not unmapped, this might cause SACK bugs because tcp_highest_sack_seq() is buggy, returning garbage instead of tp->snd_nxt, but with various debug features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel. This bug was found by Marco Grassi thanks to syzkaller. Fixes: `6859d49475` ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:45 +01:00
Eli Cooper	b030cd1acc	ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream. skb->cb may contain data from previous layers. In the observed scenario, the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so that small packets sent through the tunnel are mistakenly fragmented. This patch unconditionally clears the control buffer in ip6tunnel_xmit(), which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:44 +01:00
Hannes Frederic Sowa	835b474b8f	ipv6: move DAD and addrconf_verify processing to workqueue commit c15b1ccadb323ea50023e8f1cca2954129a62b51 upstream. addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue. To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing. (v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stop the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too. After the transition it is not needed any more. As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom half out when we lock spin_locks which are also used in bh. Relevant backtrace: [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496) [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1 [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8 [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18 [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000 [ 541.031152] Call Trace: [ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17 [ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180 [ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0 [ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0 [ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90 [ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6] [ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0 [ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6] [ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6] [ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6] [ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6] [ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6] [ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6] [ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6] [ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50 [ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6] [ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] [ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90 [ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> # 3.10.y: b7b1bfce: ipv6: split dad and rs timers Cc: <stable@vger.kernel.org> # 3.10.y [Mike Manning <mmanning@brocade.com>: resolved minor conflicts in addrconf.c] Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:42 +01:00
Hannes Frederic Sowa	973d5956f7	ipv6: split duplicate address detection and router solicitation timer commit b7b1bfce0bb68bd8f6e62a28295922785cc63781 upstream. This patch splits the timers for duplicate address detection and router solicitations apart. The router solicitations timer goes into inet6_dev and the dad timer stays in inet6_ifaddr. The reason behind this patch is to reduce the number of unneeded router solicitations send out by the host if additional link-local addresses are created. Currently we send out RS for every link-local address on an interface. If the RS timer fires we pick a source address with ipv6_get_lladdr. This change could hurt people adding additional link-local addresses and specifying these addresses in the radvd clients section because we no longer guarantee that we use every ll address as source address in router solicitations. Cc: Flavio Leitner <fleitner@redhat.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> # 3.10.y [Mike Manning <mmanning@brocade.com>: resolved conflicts with 36bddb] Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:42 +01:00
Michal KubeÄek	af80b973df	ipv6: don't call fib6_run_gc() until routing is ready commit 2c861cc65ef4604011a0082e4dcdba2819aa191a upstream. When loading the ipv6 module, ndisc_init() is called before ip6_route_init(). As the former registers a handler calling fib6_run_gc(), this opens a window to run the garbage collector before necessary data structures are initialized. If a network device is initialized in this window, adding MAC address to it triggers a NETDEV_CHANGEADDR event, leading to a crash in fib6_clean_all(). Take the event handler registration out of ndisc_init() into a separate function ndisc_late_init() and move it after ip6_route_init(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> # 3.10.y Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:42 +01:00
Joe Perches	349759be02	stddef.h: move offsetofend inside #ifndef/#endif guard, neaten commit 8c7fbe5795a016259445a61e072eb0118aaf6a61 upstream. Commit 3876488444e7 ("include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header") added offsetofend outside the normal include #ifndef/#endif guard. Move it inside. Miscellanea: o remove unnecessary blank line o standardize offsetof macros whitespace style Signed-off-by: Joe Perches <joe@perches.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:42 +01:00
Denys Vlasenko	1ddb794478	include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header commit 3876488444e71238e287459c39d7692b6f718c3e upstream. Suggested by Andy. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425912738-559-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:41 +01:00
Gavin Shan	6cc73a1c19	drivers/vfio: Rework offsetofend() commit b13460b92093b29347e99d6c3242e350052b62cd upstream. The macro offsetofend() introduces unnecessary temporary variable "tmp". The patch avoids that and saves a bit memory in stack. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> [wt: backported only for ipv6 out-of-bounds fix] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:41 +01:00
Peter Zijlstra	ac74acf2bd	perf: Tighten (and fix) the grouping condition commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream. The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:40 +01:00
Dmitry Torokhov	6a0f3597d6	Input: i8042 - break load dependency between atkbd/psmouse and i8042 commit 4097461897df91041382ff6fcd2bfa7ee6b2448c upstream. As explained in 1407814240-4275-1-git-send-email-decui@microsoft.com we have a hard load dependency between i8042 and atkbd which prevents keyboard from working on Gen2 Hyper-V VMs. > hyperv_keyboard invokes serio_interrupt(), which needs a valid serio > driver like atkbd.c. atkbd.c depends on libps2.c because it invokes > ps2_command(). libps2.c depends on i8042.c because it invokes > i8042_check_port_owner(). As a result, hyperv_keyboard actually > depends on i8042.c. > > For a Generation 2 Hyper-V VM (meaning no i8042 device emulated), if a > Linux VM (like Arch Linux) happens to configure CONFIG_SERIO_I8042=m > rather than =y, atkbd.ko can't load because i8042.ko can't load(due to > no i8042 device emulated) and finally hyperv_keyboard can't work and > the user can't input: https://bugs.archlinux.org/task/39820 > (Ubuntu/RHEL/SUSE aren't affected since they use CONFIG_SERIO_I8042=y) To break the dependency we move away from using i8042_check_port_owner() and instead allow serio port owner specify a mutex that clients should use to serialize PS/2 command stream. Reported-by: Mark Laws <mdl@60hz.org> Tested-by: Mark Laws <mdl@60hz.org> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-10 11:03:32 +01:00
Jerry Zhang	60ac2b4a37	usb: gadget: f_fs: Increase EP_ALLOC ioctl number Prevent conflict with possible new upstream ioctls before it itself is upstreamed. Test: None Change-Id: I10cbc01c25f920a626ea7559e8ca80ee08865333 Signed-off-by: Jerry Zhang <zhangjerry@google.com>	2017-02-08 16:19:40 -08:00
Al Viro	138c012118	fix fault_in_multipages_...() on architectures with no-op access_ok() commit e23d4159b109167126e5bcd7f3775c95de7fee47 upstream. Switching iov_iter fault-in to multipages variants has exposed an old bug in underlying fault_in_multipages_...(); they break if the range passed to them wraps around. Normally access_ok() done by callers will prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into such a range and they should not point to any valid objects). However, on architectures where userland and kernel live in different MMU contexts (e.g. s390) access_ok() is a no-op and on those a range with a wraparound can reach fault_in_multipages_...(). Since any wraparound means EFAULT there, the fix is trivial - turn those while (uaddr <= end) ... into if (unlikely(uaddr > end)) return -EFAULT; do ... while (uaddr <= end); Reported-by: Jan Stancek <jstancek@redhat.com> Tested-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 23:33:04 +01:00
Al Viro	61e7341c4b	asm-generic: make copy_from_user() zero the destination properly commit 2545e5da080b4839dd859e3b09343a884f6ab0e3 upstream. ... in all cases, including the failing access_ok() Note that some architectures using asm-generic/uaccess.h have __copy_from_user() not zeroing the tail on failure halfway through. This variant works either way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [wt: s/might_fault/might_sleep] Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 23:33:01 +01:00
Al Viro	44ccf7f165	asm-generic: make get_user() clear the destination on errors commit 9ad18b75c2f6e4a78ce204e79f37781f8815c0fa upstream. both for access_ok() failures and for faults halfway through Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 23:32:59 +01:00
David Vrabel	0a5b4e4cf2	xen: Add RING_COPY_REQUEST() commit 454d5d882c7e412b840e3c99010fe81a9862f6fb upstream. Using RING_GET_REQUEST() on a shared ring is easy to use incorrectly (i.e., by not considering that the other end may alter the data in the shared ring while it is being inspected). Safe usage of a request generally requires taking a local copy. Provide a RING_COPY_REQUEST() macro to use instead of RING_GET_REQUEST() and an open-coded memcpy(). This takes care of ensuring that the copy is done correctly regardless of any possible compiler optimizations. Use a volatile source to prevent the compiler from reordering or omitting the copy. This is part of XSA155. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Herbert Xu	61d0583123	crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 upstream. This patch allows af_alg_release_parent to be called even for nokey sockets. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Herbert Xu	16d4044862	crypto: skcipher - Add crypto_skcipher_has_setkey commit a1383cd86a062fc798899ab20f0ec2116cce39cb upstream. This patch adds a way for skcipher users to determine whether a key is required by a transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Herbert Xu	7040a8428f	crypto: hash - Add crypto_ahash_has_setkey commit a5596d6332787fd383b3b5427b41f94254430827 upstream. This patch adds a way for ahash users to determine whether a key is required by a crypto_ahash transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Herbert Xu	a37d08973d	crypto: af_alg - Add nokey compatibility path commit 37766586c965d63758ad542325a96d5384f4a8c9 upstream. This patch adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Herbert Xu	12514d4693	crypto: af_alg - Disallow bind/setkey/... after accept(2) commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream. Each af_alg parent socket obtained by socket(2) corresponds to a tfm object once bind(2) has succeeded. An accept(2) call on that parent socket creates a context which then uses the tfm object. Therefore as long as any child sockets created by accept(2) exist the parent socket must not be modified or freed. This patch guarantees this by using locks and a reference count on the parent socket. Any attempt to modify the parent socket will fail with EBUSY. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2017-02-06 09:04:07 +01:00
Patrick Tjin	a848c65fb7	Merge branch android-msm-bullhead-3.10-nyc-mr2 into android-msm-bullhead-3.10	2017-01-26 12:02:30 -08:00
Theodore Ts'o	f4387cc432	BACKPORT: random: introduce getrandom(2) system call Almost clean cherry pick of c6e9d6f38894798696f23c8084ca7edbf16ee895, includes change made by merge 0891ad829d2a0501053703df66029e843e3b8365. The getrandom(2) system call was requested by the LibreSSL Portable developers. It is analoguous to the getentropy(2) system call in OpenBSD. The rationale of this system call is to provide resiliance against file descriptor exhaustion attacks, where the attacker consumes all available file descriptors, forcing the use of the fallback code where /dev/[u]random is not available. Since the fallback code is often not well-tested, it is better to eliminate this potential failure mode entirely. The other feature provided by this new system call is the ability to request randomness from the /dev/urandom entropy pool, but to block until at least 128 bits of entropy has been accumulated in the /dev/urandom entropy pool. Historically, the emphasis in the /dev/urandom development has been to ensure that urandom pool is initialized as quickly as possible after system boot, and preferably before the init scripts start execution. This is because changing /dev/urandom reads to block represents an interface change that could potentially break userspace which is not acceptable. In practice, on most x86 desktop and server systems, in general the entropy pool can be initialized before it is needed (and in modern kernels, we will printk a warning message if not). However, on an embedded system, this may not be the case. And so with this new interface, we can provide the functionality of blocking until the urandom pool has been initialized. Any userspace program which uses this new functionality must take care to assure that if it is used during the boot process, that it will not cause the init scripts or other portions of the system startup to hang indefinitely. SYNOPSIS #include <linux/random.h> int getrandom(void buf, size_t buflen, unsigned int flags); DESCRIPTION The system call getrandom() fills the buffer pointed to by buf with up to buflen random bytes which can be used to seed user space random number generators (i.e., DRBG's) or for other cryptographic uses. It should not be used for Monte Carlo simulations or other programs/algorithms which are doing probabilistic sampling. If the GRND_RANDOM flags bit is set, then draw from the /dev/random pool instead of the /dev/urandom pool. The /dev/random pool is limited based on the entropy that can be obtained from environmental noise, so if there is insufficient entropy, the requested number of bytes may not be returned. If there is no entropy available at all, getrandom(2) will either block, or return an error with errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags. If the GRND_RANDOM bit is not set, then the /dev/urandom pool will be used. Unlike using read(2) to fetch data from /dev/urandom, if the urandom pool has not been sufficiently initialized, getrandom(2) will block (or return -1 with the errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags). The getentropy(2) system call in OpenBSD can be emulated using the following function: int getentropy(void buf, size_t buflen) { int ret; if (buflen > 256) goto failure; ret = getrandom(buf, buflen, 0); if (ret < 0) return ret; if (ret == buflen) return 0; failure: errno = EIO; return -1; } RETURN VALUE On success, the number of bytes that was filled in the buf is returned. This may not be all the bytes requested by the caller via buflen if insufficient entropy was present in the /dev/random pool, or if the system call was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. ERRORS EINVAL An invalid flag was passed to getrandom(2) EFAULT buf is outside the accessible address space. EAGAIN The requested entropy was not available, and getentropy(2) would have blocked if the GRND_NONBLOCK flag was not set. EINTR While blocked waiting for entropy, the call was interrupted by a signal handler; see the description of how interrupted read(2) calls on "slow" devices are handled with and without the SA_RESTART flag in the signal(7) man page. NOTES For small requests (buflen <= 256) getrandom(2) will not return EINTR when reading from the urandom pool once the entropy pool has been initialized, and it will return all of the bytes that have been requested. This is the recommended way to use getrandom(2), and is designed for compatibility with OpenBSD's getentropy() system call. However, if you are using GRND_RANDOM, then getrandom(2) may block until the entropy accounting determines that sufficient environmental noise has been gathered such that getrandom(2) will be operating as a NRBG instead of a DRBG for those people who are working in the NIST SP 800-90 regime. Since it may block for a long time, these guarantees do not apply. The user may want to interrupt a hanging process using a signal, so blocking until all of the requested bytes are returned would be unfriendly. For this reason, the user of getrandom(2) MUST always check the return value, in case it returns some error, or if fewer bytes than requested was returned. In the case of !GRND_RANDOM and small request, the latter should never happen, but the careful userspace code (and all crypto code should be careful) should check for this anyway! Finally, unless you are doing long-term key generation (and perhaps not even then), you probably shouldn't be using GRND_RANDOM. The cryptographic algorithms used for /dev/urandom are quite conservative, and so should be sufficient for all purposes. The disadvantage of GRND_RANDOM is that it can block, and the increased complexity required to deal with partially fulfilled getrandom(2) requests. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Zach Brown <zab@zabbo.net> Bug: http://b/29621447 Change-Id: I189ba74070dd6d918b0fdf83ff30bb74ec0f7556 (cherry picked from commit 4af712e8df998475736f3e2727701bd31e3751a9)	2017-01-25 19:06:23 -08:00
Hannes Frederic Sowa	71726ee8fa	BACKPORT: random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized Clean cherry pick of commit 4af712e8df998475736f3e2727701bd31e3751a9. The Tausworthe PRNG is initialized at late_initcall time. At that time the entropy pool serving get_random_bytes is not filled sufficiently. This patch adds an additional reseeding step as soon as the nonblocking pool gets marked as initialized. On some machines it might be possible that late_initcall gets called after the pool has been initialized. In this situation we won't reseed again. (A call to prandom_seed_late blocks later invocations of early reseed attempts.) Joint work with Daniel Borkmann. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: http://b/29621447 Change-Id: I4d20e60b5df16228f3a3699d16ed2b1dddcceb2b (cherry picked from commit 4af712e8df998475736f3e2727701bd31e3751a9)	2017-01-25 19:05:36 -08:00
Dave Weinstein	19c833e083	ANDROID: lib: vsprintf: whitelist stack traces Use the %pP functionality to explicitly allow kernel pointers to be logged for stack traces BUG: 30368199 Change-Id: I495915465565293e9e4da5aa28fbd1d14538d99b Signed-off-by: Dave Weinstein <olorin@google.com>	2017-01-20 13:05:53 -08:00
Patrick Tjin	415ccacc9e	Merge branch 'android-msm-bullhead-3.10-nyc-mr2' into android-msm-bullhead-3.10 March 2017.1 Bug: 34128678	2017-01-18 15:25:56 -08:00
guyang	9a10269841	msm: camera: sensor: Validate eeprom_name string length Validate eeprom_name string length before copying into the userspace buffer. If more data than required is copied, userspace has the access to some of kernel data which is not intended. CRs-Fixed: 1090007 Bug: 32720522 Change-Id: Id40a287e0b1a93cc15d9b02c757fe9f347e285f2 Signed-off-by: Rajesh Bondugula <rajeshb@codeaurora.org> Signed-off-by: VijayaKumar T M <vtmuni@codeaurora.org> Signed-off-by: Yang Guang <guyang@codeaurora.org>	2017-01-18 23:17:14 +00:00
Al Viro	a85e9d8854	BACKPORT: smarter propagate_mnt() The current mainline has copies propagated to all nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Change-Id: I45648e8a405544f768c5956711bdbdf509e2705a Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-01-09 20:43:00 +00:00
Daniel Rosenberg	c3797a00b6	sdcardfs: Change magic value Sdcardfs uses the same magic value as wrapfs. This should not be the case. As it is entirely in memory, the value can be changed without any loss of compatibility. Change-Id: I24200b805d5e6d32702638be99e47d50d7f2f746 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-01-09 20:42:32 +00:00
Daniel Rosenberg	26920bdc5e	vfs: Add setattr2 for filesystems with per mount permissions This allows filesystems to use their mount private data to influence the permssions they use in setattr2. It has been separated into a new call to avoid disrupting current setattr users. Change-Id: I19959038309284448f1b7f232d579674ef546385 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-01-09 20:41:17 +00:00
Daniel Rosenberg	11cf20bdd3	vfs: Add permission2 for filesystems with per mount permissions This allows filesystems to use their mount private data to influence the permssions they return in permission2. It has been separated into a new call to avoid disrupting current permission users. Change-Id: I9d416e3b8b6eca84ef3e336bd2af89ddd51df6ca Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-01-09 20:41:09 +00:00
Daniel Rosenberg	b804216146	vfs: Allow filesystems to access their private mount data Now we pass the vfsmount when mounting and remounting. This allows the filesystem to actually set up the mount specific data, although we can't quite do anything with it yet. show_options is expanded to include data that lives with the mount. To avoid changing existing filesystems, these have been added as new vfs functions. Change-Id: If80670bfad9f287abb8ac22457e1b034c9697097 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-01-09 20:41:01 +00:00
Daniel Rosenberg	70c8b04dc8	mnt: Add filesystem private data to mount points This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2017-01-09 20:40:52 +00:00
Lorenzo Colitti	807911df1c	net: inet: Support UID-based routing in IP protocols. - Use the UID in routing lookups made by protocol connect() and sendmsg() functions. - Make sure that routing lookups triggered by incoming packets (e.g., Path MTU discovery) take the UID of the socket into account. - For packets not associated with a userspace socket, (e.g., ping replies) use UID 0 inside the user namespace corresponding to the network namespace the socket belongs to. This allows all namespaces to apply routing and iptables rules to kernel-originated traffic in that namespaces by matching UID 0. This is better than using the UID of the kernel socket that is sending the traffic, because the UID of kernel sockets created at namespace creation time (e.g., the per-processor ICMP and TCP sockets) is the UID of the user that created the socket, which might not be mapped in the namespace. [Backport of net-next e2d118a1cb5e60d077131a09db1d81b90a5295fe] Bug: 16355602 Change-Id: I126f8359887b5b5bbac68daf0ded89e899cb7cb0 Tested: compiles allnoconfig, allyesconfig, allmodconfig Tested: https://android-review.googlesource.com/253302 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-20 15:27:17 +09:00
Lorenzo Colitti	a66ad98d08	net: core: add UID to flows, rules, and routes - Define a new FIB rule attributes, FRA_UID_RANGE, to describe a range of UIDs. - Define a RTA_UID attribute for per-UID route lookups and dumps. - Support passing these attributes to and from userspace via rtnetlink. The value INVALID_UID indicates no UID was specified. - Add a UID field to the flow structures. [Backport of net-next 622ec2c9d52405973c9f1ca5116eb1c393adfc7d] Bug: 16355602 Change-Id: I7e3ab388ed862c4b7e39dc8b0209d977cb1129ac Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-20 15:27:17 +09:00
Lorenzo Colitti	90bd7a9525	net: core: Add a UID field to struct sock. Protocol sockets (struct sock) don't have UIDs, but most of the time, they map 1:1 to userspace sockets (struct socket) which do. Various operations such as the iptables xt_owner match need access to the "UID of a socket", and do so by following the backpointer to the struct socket. This involves taking sk_callback_lock and doesn't work when there is no socket because userspace has already called close(). Simplify this by adding a sk_uid field to struct sock whose value matches the UID of the corresponding struct socket. The semantics are as follows: 1. Whenever sk_socket is non-null: sk_uid is the same as the UID in sk_socket, i.e., matches the return value of sock_i_uid. Specifically, the UID is set when userspace calls socket(), fchown(), or accept(). 2. When sk_socket is NULL, sk_uid is defined as follows: - For a socket that no longer has a sk_socket because userspace has called close(): the previous UID. - For a cloned socket (e.g., an incoming connection that is established but on which userspace has not yet called accept): the UID of the socket it was cloned from. - For a socket that has never had an sk_socket: UID 0 inside the user namespace corresponding to the network namespace the socket belongs to. Kernel sockets created by sock_create_kern are a special case of #1 and sk_uid is the user that created them. For kernel sockets created at network namespace creation time, such as the per-processor ICMP and TCP sockets, this is the user that created the network namespace. [Backport of net-next 86741ec25462e4c8cdce6df2f41ead05568c7d5e] Bug: 16355602 Change-Id: I73e1a57dfeedf672f4c2dfc9ce6867838b55974b Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-20 15:27:17 +09:00
Lorenzo Colitti	e813633eff	Revert "net: core: Support UID-based routing." This reverts commit `f6f535d3e0`. Bug: 16355602 Change-Id: I5987e276f5ddbe425ea3bd86861cee0ae22212d9	2016-12-20 15:27:17 +09:00
Lorenzo Colitti	cfe727279b	Revert "Handle 'sk' being NULL in UID-based routing." This reverts commit 455b09d66a9ccfc572497ae88375ae343ff9ae66. Bug: 16355602 Change-Id: I54fb9232343d93c115a529be9ce2104bc836d88d	2016-12-20 15:27:17 +09:00
Jerry Zhang	ab4be692ff	usb: gadget: f_fs: Add ioctl for allocating endpoint buffers. This creates an ioctl named FUNCTIONFS_ENDPOINT_ALLOC which will preallocate buffers for a given size. Any reads/writes on that endpoint below that size will use those buffers instead of allocating their own. If the endpoint is not active, the buffer will not be allocated until it becomes active. Change-Id: I4da517620ed913161ea9e21a31f6b92c9a012b44 Signed-off-by: Jerry Zhang <zhangjerry@google.com>	2016-12-14 18:24:42 -08:00
Robert Baldyga	2f6156c390	usb: gadget: f_fs: add ioctl returning ep descriptor This patch introduces ioctl named FUNCTIONFS_ENDPOINT_DESC, which returns endpoint descriptor to userspace. It works only if function is active. Signed-off-by: Robert Baldyga <r.baldyga@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jerry Zhang <zhangjerry@google.com> Change-Id: I55987bf0c6744327f7763b567b5a2b39c50d18e6	2016-12-14 13:49:46 -08:00
Andy Lutomirski	a756dc0830	UPSTREAM: capabilities: ambient capabilities Credit where credit is due: this idea comes from Christoph Lameter with a lot of valuable input from Serge Hallyn. This patch is heavily based on Christoph's patch. ===== The status quo ===== On Linux, there are a number of capabilities defined by the kernel. To perform various privileged tasks, processes can wield capabilities that they hold. Each task has four capability masks: effective (pE), permitted (pP), inheritable (pI), and a bounding set (X). When the kernel checks for a capability, it checks pE. The other capability masks serve to modify what capabilities can be in pE. Any task can remove capabilities from pE, pP, or pI at any time. If a task has a capability in pP, it can add that capability to pE and/or pI. If a task has CAP_SETPCAP, then it can add any capability to pI, and it can remove capabilities from X. Tasks are not the only things that can have capabilities; files can also have capabilities. A file can have no capabilty information at all [1]. If a file has capability information, then it has a permitted mask (fP) and an inheritable mask (fI) as well as a single effective bit (fE) [2]. File capabilities modify the capabilities of tasks that execve(2) them. A task that successfully calls execve has its capabilities modified for the file ultimately being excecuted (i.e. the binary itself if that binary is ELF or for the interpreter if the binary is a script.) [3] In the capability evolution rules, for each mask Z, pZ represents the old value and pZ' represents the new value. The rules are: pP' = (X & fP) \| (pI & fI) pI' = pI pE' = (fE ? pP' : 0) X is unchanged For setuid binaries, fP, fI, and fE are modified by a moderately complicated set of rules that emulate POSIX behavior. Similarly, if euid == 0 or ruid == 0, then fP, fI, and fE are modified differently (primary, fP and fI usually end up being the full set). For nonroot users executing binaries with neither setuid nor file caps, fI and fP are empty and fE is false. As an extra complication, if you execute a process as nonroot and fE is set, then the "secure exec" rules are in effect: AT_SECURE gets set, LD_PRELOAD doesn't work, etc. This is rather messy. We've learned that making any changes is dangerous, though: if a new kernel version allows an unprivileged program to change its security state in a way that persists cross execution of a setuid program or a program with file caps, this persistent state is surprisingly likely to allow setuid or file-capped programs to be exploited for privilege escalation. ===== The problem ===== Capability inheritance is basically useless. If you aren't root and you execute an ordinary binary, fI is zero, so your capabilities have no effect whatsoever on pP'. This means that you can't usefully execute a helper process or a shell command with elevated capabilities if you aren't root. On current kernels, you can sort of work around this by setting fI to the full set for most or all non-setuid executable files. This causes pP' = pI for nonroot, and inheritance works. No one does this because it's a PITA and it isn't even supported on most filesystems. If you try this, you'll discover that every nonroot program ends up with secure exec rules, breaking many things. This is a problem that has bitten many people who have tried to use capabilities for anything useful. ===== The proposed change ===== This patch adds a fifth capability mask called the ambient mask (pA). pA does what most people expect pI to do. pA obeys the invariant that no bit can ever be set in pA if it is not set in both pP and pI. Dropping a bit from pP or pI drops that bit from pA. This ensures that existing programs that try to drop capabilities still do so, with a complication. Because capability inheritance is so broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and then calling execve effectively drops capabilities. Therefore, setresuid from root to nonroot conditionally clears pA unless SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can re-add bits to pA afterwards. The capability evolution rules are changed: pA' = (file caps or setuid or setgid ? 0 : pA) pP' = (X & fP) \| (pI & fI) \| pA' pI' = pI pE' = (fE ? pP' : pA') X is unchanged If you are nonroot but you have a capability, you can add it to pA. If you do so, your children get that capability in pA, pP, and pE. For example, you can set pA = CAP_NET_BIND_SERVICE, and your children can automatically bind low-numbered ports. Hallelujah! Unprivileged users can create user namespaces, map themselves to a nonzero uid, and create both privileged (relative to their namespace) and unprivileged process trees. This is currently more or less impossible. Hallelujah! You cannot use pA to try to subvert a setuid, setgid, or file-capped program: if you execute any such program, pA gets cleared and the resulting evolution rules are unchanged by this patch. Users with nonzero pA are unlikely to unintentionally leak that capability. If they run programs that try to drop privileges, dropping privileges will still work. It's worth noting that the degree of paranoia in this patch could possibly be reduced without causing serious problems. Specifically, if we allowed pA to persist across executing non-pA-aware setuid binaries and across setresuid, then, naively, the only capabilities that could leak as a result would be the capabilities in pA, and any attacker already has those capabilities. This would make me nervous, though -- setuid binaries that tried to privilege-separate might fail to do so, and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have unexpected side effects. (Whether these unexpected side effects would be exploitable is an open question.) I've therefore taken the more paranoid route. We can revisit this later. An alternative would be to require PR_SET_NO_NEW_PRIVS before setting ambient capabilities. I think that this would be annoying and would make granting otherwise unprivileged users minor ambient capabilities (CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than it is with this patch. ===== Footnotes ===== [1] Files that are missing the "security.capability" xattr or that have unrecognized values for that xattr end up with has_cap set to false. The code that does that appears to be complicated for no good reason. [2] The libcap capability mask parsers and formatters are dangerously misleading and the documentation is flat-out wrong. fE is not a mask; it's a single bit. This has probably confused every single person who has tried to use file capabilities. [3] Linux very confusingly processes both the script and the interpreter if applicable, for reasons that elude me. The results from thinking about a script's file capabilities and/or setuid bits are mostly discarded. Preliminary userspace code is here, but it needs updating: https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2 Here is a test program that can be used to verify the functionality (from Christoph): /* * Test program for the ambient capabilities. This program spawns a shell * that allows running processes with a defined set of capabilities. * * (C) 2015 Christoph Lameter <cl@linux.com> * Released under: GPL v3 or later. * * * Compile using: * * gcc -o ambient_test ambient_test.o -lcap-ng * * This program must have the following capabilities to run properly: * Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE * * A command to equip the binary with the right caps is: * * setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test * * * To get a shell with additional caps that can be inherited by other processes: * * ./ambient_test /bin/bash * * * Verifying that it works: * * From the bash spawed by ambient_test run * * cat /proc/$$/status * * and have a look at the capabilities. / / * Definitions from the kernel header files. These are going to be removed * when the /usr/include files have these defined. / static void set_ambient_cap(int cap) { int rc; capng_get_caps_process(); rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap); if (rc) { printf("Cannot add inheritable cap\n"); exit(2); } capng_apply(CAPNG_SELECT_CAPS); / Note the two 0s at the end. Kernel checks for these / if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) { perror("Cannot set cap"); exit(1); } } int main(int argc, char *argv) { int rc; set_ambient_cap(CAP_NET_RAW); set_ambient_cap(CAP_NET_ADMIN); set_ambient_cap(CAP_SYS_NICE); printf("Ambient_test forking shell\n"); if (execv(argv[1], argv + 1)) perror("Cannot exec"); return 0; } Signed-off-by: Christoph Lameter <cl@linux.com> # Original author Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Aaron Jones <aaronmdjones@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew G. Morgan <morgan@kernel.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: Markku Savela <msa@moth.iki.fi> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 58319057b7847667f0c9585b9de0e8932b0fdb08) Bug: 31038224 Test: Builds. Change-Id: Ib4ebe89343b032765b3b1dc79dd3817192ad3788 Signed-off-by: Jorge Lucangeli Obes <jorgelo@google.com>	2016-12-05 12:10:41 -05:00
Jann Horn	6ceb9569a8	BACKPORT: security: fix typo in security_task_prctl Signed-off-by: Jann Horn <jann@thejh.net> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b7f76ea2ef6739ee484a165ffbac98deb855d3d3) Bug: 33340301 Test: Builds. Change-Id: I47185aebc08bb335c466ac3d1174c77187516fb4 Signed-off-by: Jorge Lucangeli Obes <jorgelo@google.com>	2016-12-05 11:38:44 -05:00
Jerry Zhang	e54305df68	Revert "Backport ioctl for getting descriptors." This reverts commit `c57495e6fc`.	2016-10-31 17:46:03 -07:00
Jerry Zhang	c57495e6fc	Backport ioctl for getting descriptors. This is needed for MTP to know if writes are aligned to packet size. Change-Id: If504511e649d46eb8d52f1fafeda071dddeec263 Signed-off-by: Jerry Zhang <zhangjerry@google.com>	2016-10-31 17:20:13 -07:00
Patrick Tjin	01872d075e	Merge branch android-msm-bullhead-3.10-security-next into android-msm-bullhead-3.10 December 2016.1	2016-10-21 15:59:23 -07:00
Linus Torvalds	c33d1bdff9	UPSTREAM: mm: remove gup_flags FOLL_WRITE games from __get_user_pages() (cherry-picked from `9691eac559`) commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream. This is an ancient bug that was actually attempted to be fixed once (badly) by me eleven years ago in commit `4ceb5db975` ("Fix get_user_pages() race for write access") but that was then undone due to problems on s390 by commit `f33ea7f404` ("fix get_user_pages bug"). In the meantime, the s390 situation has long been fixed, and we can now fix it by checking the pte_dirty() bit properly (and do it better). The s390 dirty bit was implemented in `abf09bed3c` ("s390/mm: implement software dirty bits") which made it into v3.9. Earlier kernels will have to look at the page state itself. Also, the VM has become more scalable, and what used a purely theoretical race back then has become easier to trigger. To fix it, we introduce a new internal FOLL_COW flag to mark the "yes, we already did a COW" rather than play racy games with FOLL_WRITE that is very fundamental, and then use the pte dirty flag to validate that the FOLL_COW flag is still valid. Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Nick Piggin <npiggin@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask; s/faultin_page/__get_user_page] Signed-off-by: Willy Tarreau <w@1wt.eu> Change-Id: I42e448ecacad4781b460c4c989026307169ba1b5 Bug: 32141528	2016-10-20 14:49:02 -07:00
Linus Torvalds	9691eac559	mm: remove gup_flags FOLL_WRITE games from __get_user_pages() commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream. This is an ancient bug that was actually attempted to be fixed once (badly) by me eleven years ago in commit `4ceb5db975` ("Fix get_user_pages() race for write access") but that was then undone due to problems on s390 by commit `f33ea7f404` ("fix get_user_pages bug"). In the meantime, the s390 situation has long been fixed, and we can now fix it by checking the pte_dirty() bit properly (and do it better). The s390 dirty bit was implemented in `abf09bed3c` ("s390/mm: implement software dirty bits") which made it into v3.9. Earlier kernels will have to look at the page state itself. Also, the VM has become more scalable, and what used a purely theoretical race back then has become easier to trigger. To fix it, we introduce a new internal FOLL_COW flag to mark the "yes, we already did a COW" rather than play racy games with FOLL_WRITE that is very fundamental, and then use the pte dirty flag to validate that the FOLL_COW flag is still valid. Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Nick Piggin <npiggin@gmail.com> Cc: Greg Thelen <gthelen@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask; s/faultin_page/__get_user_page] Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-10-20 00:46:32 +02:00
Simon Horman	0c8b61323b	PCI: Add Netronome NFP4000 PF device ID commit 69874ec233871a62e1bc8c89e643993af93a8630 upstream. Add the device ID for the PF of the NFP4000. The device ID for the VF, 0x6003, is already present as PCI_DEVICE_ID_NETRONOME_NFP6000_VF. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-10-20 00:46:30 +02:00
Jason S. McMullan	a4697a4c17	PCI: Add Netronome vendor and device IDs commit a755e169031dac9ebaed03302c4921687c271d62 upstream. Device IDs for the Netronome NFP3200, NFP3240, NFP6000, and NFP6000 SR-IOV devices. Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com> [simon: edited changelog] Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-10-20 00:46:30 +02:00
John Dias	bcec0d56cd	perf: protect group_leader from races that cause ctx double-free When moving a group_leader perf event from a software-context to a hardware-context, there's a race in checking and updating that context. The existing locking solution doesn't work; note that it tries to grab a lock inside the group_leader's context object, which you can only get at by going through a pointer that should be protected from these races. To avoid that problem, and to produce a simple solution, we can just use a lock per group_leader to protect all checks on the group_leader's context. The new lock is grabbed and released when no context locks are held. Bug: 30955111 Bug: 31095224 Change-Id: If37124c100ca6f4aa962559fba3bd5dbbec8e052	2016-10-18 07:02:20 +00:00
Wei Wang	b49384434a	ARM: make sure RO local relocations are part of kernel RO section This makes sure that sections generated with -fPIC remain part of the .rodata section so that the kernel marks it correctly read-only. Signed-off-by: Kees Cook <keescook@google.com> Bug: 31703084 Change-Id: I66485b52ba9a801bae614e802a301119f04e507c	2016-10-17 23:02:51 -07:00
Robb Glasser	5120222996	Add padding field to fuse_open_out Bug: 30222859 Change-Id: Iefc66a02a7692a6286dab9b30d4bad7d92afdd77	2016-09-26 17:29:54 -07:00
Lorenzo Colitti	2aae505375	net: inet: diag: expose the socket mark to privileged processes. This adds the capability for a process that has CAP_NET_ADMIN on a socket to see the socket mark in socket dumps. Commit a52e95abf772 ("net: diag: allow socket bytecode filters to match socket marks") recently gave privileged processes the ability to filter socket dumps based on mark. This patch is complementary: it ensures that the mark is also passed to userspace in the socket's netlink attributes. It is useful for tools like ss which display information about sockets. [backport of net-next d545caca827b65aab557a9e9dcdcf1e5a3823c2d] Change-Id: I0c9708aae5ab8dfa296b8a1e6aecceb2a382415a Tested: https://android-review.googlesource.com/270210 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-20 17:45:03 +09:00
David Ahern	451f2271d5	net: diag: support SOCK_DESTROY for UDP sockets This implements SOCK_DESTROY for UDP sockets similar to what was done for TCP with commit c1e64e298b8ca ("net: diag: Support destroying TCP sockets.") A process with a UDP socket targeted for destroy is awakened and recvmsg fails with ECONNABORTED. [backport of net-next 5d77dca82839ef016a93ad7acd7058b14d967752] Change-Id: I84e71e774c859002f98dcdb5e0ca01f35227a44c Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-20 17:45:00 +09:00
Lorenzo Colitti	c3636b6ef1	net: diag: allow socket bytecode filters to match socket marks This allows a privileged process to filter by socket mark when dumping sockets via INET_DIAG_BY_FAMILY. This is useful on systems that use mark-based routing such as Android. The ability to filter socket marks requires CAP_NET_ADMIN, which is consistent with other privileged operations allowed by the SOCK_DIAG interface such as the ability to destroy sockets and the ability to inspect BPF filters attached to packet sockets. [backport of net-next a52e95abf772b43c9226e9a72d3c1353903ba96f] Change-Id: Ic02caf628a71007cc7c48c9da220b4088f5aa4f4 Tested: https://android-review.googlesource.com/261350 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-20 17:44:59 +09:00
David Ahern	2b8a6a453a	net: diag: Add support to filter on device index Add support to inet_diag facility to filter sockets based on device index. If an interface index is in the filter only sockets bound to that index (sk_bound_dev_if) are returned. [backport of net-next 637c841dd7a5f9bd97b75cbe90b526fa1a52e530] Change-Id: Ib430cfb44f1b3b1a771a561247ee9140737e52fd Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-09-20 17:44:56 +09:00
Patrick Tjin	6353437402	Merge branch 'android-msm-bullhead-3.10-security-next' into android-msm-bullhead-3.10 November 2016.1	2016-09-19 15:04:31 -07:00
Mark Salyzyn	931dd77cc7	pstore: drop pmsg bounce buffer Removing a bounce buffer copy operation in the pmsg driver path is always better. We also gain in overall performance by not requesting a vmalloc on every write as this can cause precious RT tasks, such as user facing media operation, to stall while memory is being reclaimed. Added a write_buf_user to the pstore functions, a backup platform write_buf_user that uses the small buffer that is part of the instance, and implemented a ramoops write_buf_user that only supports PSTORE_TYPE_PMSG. Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 31057326 Change-Id: I4cdee1cd31467aa3e6c605bce2fbd4de5b0f8caa	2016-09-17 15:52:54 +00:00
Biswajit Paul	e06cd33b5e	ASoC: msm: Add Buffer overflow check The overflow check is required to ensure that user space data in kernel may not go beyond buffer boundary. Bug: 28751152 CRs-Fixed: 1064411 Change-Id: I54c28a8942cf1a6a47a4e8272f3159b35d753ead Signed-off-by: Karthik Reddy Katta <a_katta@codeaurora.org> Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>	2016-09-17 06:12:53 +00:00
Eric Dumazet	263c0ab0cf	UPSTREAM: tcp: fix use after free in tcp_xmit_retransmit_queue() (cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4) When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the tail of the write queue using tcp_add_write_queue_tail() Then it attempts to copy user data into this fresh skb. If the copy fails, we undo the work and remove the fresh skb. Unfortunately, this undo lacks the change done to tp->highest_sack and we can leave a dangling pointer (to a freed skb) Later, tcp_xmit_retransmit_queue() can dereference this pointer and access freed memory. For regular kernels where memory is not unmapped, this might cause SACK bugs because tcp_highest_sack_seq() is buggy, returning garbage instead of tp->snd_nxt, but with various debug features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel. This bug was found by Marco Grassi thanks to syzkaller. Fixes: `6859d49475` ("[TCP]: Abstract tp->highest_sack accessing & point to next skb") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Change-Id: I58bb02d6e4e399612e8580b9e02d11e661df82f5 Bug: 31183296	2016-09-16 06:56:08 +00:00
Nikhilesh Reddy	702bb6170a	fuse: Add support for shortcircuited read/write for files Add support for shortcircuited read/write for files when enabled through a userspace init option of FUSE_SHORTCIRCUIT. When FUSE_SHORTCIRCUIT is enabled all the reads and writes to the fuse mount point go directly to the native filesystem rather than through the fuse daemon. All requsts that aren't read/write still go thought the userspace code. This allows for significantly better performance on read and writes and the difference between fuse and the native lower filesystem is negligible. Bug: 30222859 Change-Id: I49e21b77813595c2faec6fcba38a74e8f686d020 Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>	2016-09-13 08:53:29 -07:00
Thierry Strudel	2f33172175	Revert "fuse: Add support for fuse stacked I/O" This reverts commit `e24429f34d`. Bug: 30222859 Change-Id: I3bef9796686f356b125576760e1222b97731ff7c	2016-09-13 08:53:29 -07:00
Richard Weinberger	31f476515d	mm: Export migrate_page_move_mapping and migrate_page_copy commit 1118dce773d84f39ebd51a9fe7261f9169cb056e upstream. Export these symbols such that UBIFS can implement ->migratepage. Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [wt: also add the prototype to include/linux/migrate.h] Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-27 11:40:23 +02:00
Jason Gunthorpe	55ac348c64	IB/security: Restrict use of the write() interface commit e6bd18f57aad1a2d1ef40e646d03ed0f2515c9e3 upstream. The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com> [wt: no hfi1 subdir in 3.10. A minimal rdma/ib.h had to be created from 3.11 sources to keep the code similar to mainline] Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-27 11:40:21 +02:00
Tejun Heo	0ba4f4bb1a	printk: do cond_resched() between lines while outputting to consoles commit 8d91f8b15361dfb438ab6eb3b319e2ded43458ff upstream. @console_may_schedule tracks whether console_sem was acquired through lock or trylock. If the former, we're inside a sleepable context and console_conditional_schedule() performs cond_resched(). This allows console drivers which use console_lock for synchronization to yield while performing time-consuming operations such as scrolling. However, the actual console outputting is performed while holding irq-safe logbuf_lock, so console_unlock() clears @console_may_schedule before starting outputting lines. Also, only a few drivers call console_conditional_schedule() to begin with. This means that when a lot of lines need to be output by console_unlock(), for example on a console registration, the task doing console_unlock() may not yield for a long time on a non-preemptible kernel. If this happens with a slow console devices, for example a serial console, the outputting task may occupy the cpu for a very long time. Long enough to trigger softlockup and/or RCU stall warnings, which in turn pile more messages, sometimes enough to trigger the next cycle of warnings incapacitating the system. Fix it by making console_unlock() insert cond_resched() between lines if @console_may_schedule. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Calvin Owens <calvinowens@fb.com> Acked-by: Jan Kara <jack@suse.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Kyle McMartin <kyle@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ciwillia@brocade.com: adjust context for 3.10.y] Signed-off-by: Chas Williams <ciwillia@brocade.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:37 +02:00
Willy Tarreau	f474c525b4	pipe: limit the per-user amount of pages allocated in pipes commit 759c01142a5d0f364a462346168a56de28a80f52 upstream. On no-so-small systems, it is possible for a single process to cause an OOM condition by filling large pipes with data that are never read. A typical process filling 4000 pipes with 1 MB of data will use 4 GB of memory. On small systems it may be tricky to set the pipe max size to prevent this from happening. This patch makes it possible to enforce a per-user soft limit above which new pipes will be limited to a single page, effectively limiting them to 4 kB each, as well as a hard limit above which no new pipes may be created for this user. This has the effect of protecting the system against memory abuse without hurting other users, and still allowing pipes to work correctly though with less data at once. The limit are controlled by two new sysctls : pipe-user-pages-soft, and pipe-user-pages-hard. Both may be disabled by setting them to zero. The default soft limit allows the default number of FDs per process (1024) to create pipes of the default size (64kB), thus reaching a limit of 64MB before starting to create only smaller pipes. With 256 processes limited to 1024 FDs each, this results in 102464kB + (2561024 - 1024) * 4kB = 1084 MB of memory allocated for a user. The hard limit is disabled by default to avoid breaking existing applications that make intensive use of pipes (eg: for splicing). CVE-2016-2847 Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Chas Williams <3chas3@gmail.com>	2016-08-21 23:22:36 +02:00
Alan Stern	122ac0bd1d	USB: EHCI: declare hostpc register as zero-length array commit 7e8b3dfef16375dbfeb1f36a83eb9f27117c51fd upstream. The HOSTPC extension registers found in some EHCI implementations form a variable-length array, with one element for each port. Therefore the hostpc field in struct ehci_regs should be declared as a zero-length array, not a single-element array. This fixes a problem reported by UBSAN. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> Tested-by: Wilfried Klaebe <linux-kernel@lebenslange-mailadresse.de> CC: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:34 +02:00
Florian Westphal	0a0523382f	netfilter: x_tables: introduce and use xt_copy_counters_from_user commit 63ecb81aadf1c823c85c70a2bfd1ec9df3341a72 upstream. commit d7591f0c41ce3e67600a982bab6989ef0f07b3ce upstream The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:32 +02:00
Florian Westphal	fbe426f822	netfilter: x_tables: xt_compat_match_from_user doesn't need a retval commit 0188346f21e6546498c2a0f84888797ad4063fc5 upstream. Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:31 +02:00
Florian Westphal	71f72bbf29	netfilter: x_tables: check for bogus target offset commit ce683e5f9d045e5d67d1312a42b359cb2ab2a13c upstream. We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:29 +02:00
Florian Westphal	305323325e	netfilter: x_tables: add compat version of xt_check_entry_offsets commit fc1221b3a163d1386d1052184202d5dc50d302d1 upstream. 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:28 +02:00
Florian Westphal	ff9be2063c	netfilter: x_tables: add and use xt_check_entry_offsets commit 7d35812c3214afa5b37a675113555259cfd67b98 upstream. Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-08-21 23:22:28 +02:00
vivek mehta	8b4a2cec80	ASoC: msm-lsm-client: free lsm client data in msm_lsm_close Currently lsm client data is deallocated when q6lsm_open() fails which can cause memory corruption if lsm client data is accessed after freed. Fix this issue by deallocating the client data only in msm_lsm_close(). Bug: 30142668 Change-Id: If048c26a0ffd8a346a28622183cbf2ba1e7e5ff3 Signed-off-by: Vidyakumar Athota <vathota@codeaurora.org> Signed-off-by: vivek mehta <mvivek@codeaurora.org>	2016-08-17 07:21:48 +00:00
Peter Zijlstra	0e28feab78	BACKPORT: sched/preempt: Fix preempt_count manipulations Vikram reported that his ARM64 compiler managed to 'optimize' away the preempt_count manipulations in code like: preempt_enable_no_resched(); put_user(); preempt_disable(); Irrespective of that fact that that is horrible code that should be fixed for many reasons, it does highlight a deficiency in the generic preempt_count manipulators. As it is never right to combine/elide preempt_count manipulations like this. Therefore sprinkle some volatile in the two generic accessors to ensure the compiler is aware of the fact that the preempt_count is observed outside of the regular program-order view and thus cannot be optimized away like this. x86; the only arch not using the generic code is not affected as we do all this in asm in order to use the segment base per-cpu stuff. Reported-by: Vikram Mulukutla <markivx@codeaurora.org> Tested-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: a787870924db ("sched, arch: Create asm/preempt.h") Link: http://lkml.kernel.org/r/20160516131751.GH3205@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Bug: 28705983 (cherry picked from commit 2e636d5e66c35dfcbaf617aa8fa963f6847478fe) Signed-off-by: Patrick Tjin <pattjin@google.com> Change-Id: I88dec230b1fb14e4bf6e1497d088198feae1cf74	2016-07-22 15:56:28 -07:00
Linus Torvalds	20a8b65f65	UPSTREAM: kernel: make READ_ONCE() valid on const arguments The use of READ_ONCE() causes lots of warnings witht he pending paravirt spinlock fixes, because those ends up having passing a member to a 'const' structure to READ_ONCE(). There should certainly be nothing wrong with using READ_ONCE() with a const source, but the helper function __read_once_size() would cause warnings because it would drop the 'const' qualifier, but also because the destination would be marked 'const' too due to the use of 'typeof'. Use a union of types in READ_ONCE() to avoid this issue. Also make sure to use parenthesis around the macro arguments to avoid possible operator precedence issues. Tested-by: Ingo Molnar <mingo@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 28705983 (cherry picked from commit dd36929720f40f17685e841ae0d4c581c165ea60) Change-Id: I73b280cfae42245f6adf6a8e50be7f9177c88ebe	2016-07-22 15:56:27 -07:00
Christian Borntraeger	6a6404043f	UPSTREAM: kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val) Feedback has shown that WRITE_ONCE(x, val) is easier to use than ASSIGN_ONCE(val,x). There are no in-tree users yet, so lets change it for 3.19. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Bug: 28705983 (cherry picked from commit 43239cbe79fc369f5d2160bd7f69e28b5c50a58c) Change-Id: I10a969447a3aefa87009c244f787da78338af165	2016-07-22 15:56:26 -07:00
Christian Borntraeger	14f9e22136	UPSTREAM: kernel: Provide READ_ONCE and ASSIGN_ONCE ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via scalar types as suggested by Linus Torvalds. Accesses larger than the machines word size cannot be guaranteed to be atomic. These macros will use memcpy and emit a build warning. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Bug: 28705983 (cherry picked from commit 230fa253df6352af12ad0a16128760b5cb3f92df) Change-Id: I6235247c7b3a5e14093d477417000ed82f7586a3	2016-07-22 15:56:25 -07:00
Peter Zijlstra	ba58c599fd	UPSTREAM: sched: Introduce preempt_count accessor functions Replace the single preempt_count() 'function' that's an lvalue with two proper functions: preempt_count() - returns the preempt_count value as rvalue preempt_count_set() - Allows setting the preempt-count value Also provide preempt_count_ptr() as a convenience wrapper to implement all modifying operations. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-orxrbycjozopqfhb4dxdkdvb@git.kernel.org [ Fixed build failure. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Bug: 28705983 (cherry picked from commit 4a2b4b222743bb07fedf985b884550f2ca067ea9) Change-Id: I1f74e16f9a30d9fdf5f2439d32853568fa68baaa	2016-07-22 15:56:25 -07:00
Thierry Strudel	512ca704a1	Merge branch 'android-msm-bullhead-3.10-security-next' into android-msm-bullhead-3.10	2016-07-20 13:50:14 -07:00
Rainer Weikusat	3b07c73d14	BACKPORT: unix: avoid use-after-free in ep_remove_wait_queue Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: `ec0d215f94` ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 29119002 (cherry picked from commit 7d267278a9ece963d77eefec61630223fce08c6c) Signed-off-by: Aarthi Thiruvengadam <athiru@google.com> Change-Id: Ia374ee061195088f8c777940baa75cedbe897f4e	2016-07-15 02:33:54 +00:00
mukesh agrawal	eafb4203b9	trace: net: use %pK for kernel pointers We want to use network trace events in production builds, to help diagnose Wifi problems. However, we don't want to expose raw kernel pointers in such builds. Change the format specifier for the skbaddr field, so that, if kptr_restrict is enabled, the pointers will be reported as 0. BUG=30090733 TEST=manual (see below) Manual test - Connect device to a wifi network. $ adb root && adb shell device# cd /sys/kernel/debug/tracing device# echo 1 > events/net/net_dev_xmit/enable device# echo 1 > events/net/netif_rx/enable device# echo 1 > tracing_on device# ping google.com device# cat trace - verify that skbaddr is always 0 Change-Id: Ic4bd583d37af6637343601feca875ee24479ddff	2016-07-15 01:09:53 +00:00
Biswajit Paul	1231732736	spmi: prevent showing the address of spmidev Creating devices with the address of the container spmidev is not indicative of the actual hardware device it represents. Instead use an unique id to indicate the device it represents. Bug: 28760543 CRs-Fixed: 1024197 Change-Id: Id18e2a19f4fa1249901a3f275defa8f589270d69 Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org> Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>	2016-07-14 23:43:35 +00:00
Jeff Vander Stoep	fc17b03094	FROMLIST: security,perf: Allow further restriction of perf_event_open When kernel.perf_event_open is set to 3 (or greater), disallow all access to performance events by users without CAP_SYS_ADMIN. Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that makes this value the default. This is based on a similar feature in grsecurity (CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making the variable read-only. It also allows enabling further restriction at run-time regardless of whether the default is changed. https://lkml.org/lkml/2016/1/11/587 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Bug: 29054680 Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8	2016-06-20 18:47:29 +00:00
Eric Dumazet	07bd7f369c	ipv6: add complete rcu protection around np->opt [ Upstream commit 45f6fad84cc305103b28d73482b344d7f5b76f39 ] This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt BUG: 28746669 Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>	2016-06-17 02:48:07 +00:00
Paolo Bonzini	5d814ad8d3	compiler-gcc: disable -ftracer for __noclone functions commit 95272c29378ee7dc15f43fa2758cb28a5913a06d upstream. -ftracer can duplicate asm blocks causing compilation to fail in noclone functions. For example, KVM declares a global variable in an asm like asm("2: ... \n .pushsection data \n .global vmx_return \n vmx_return: .long 2b"); and -ftracer causes a double declaration. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: stable@vger.kernel.org Cc: kvm@vger.kernel.org Reported-by: Linda Walsh <lkml@tlinx.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:52 +02:00
Vasily Kulikov	df1da5a547	include/linux/poison.h: fix LIST_POISON{1,2} offset commit 8a5e5e02fc83aaf67053ab53b359af08c6c49aaf upstream. Poison pointer values should be small enough to find a room in non-mmap'able/hardly-mmap'able space. E.g. on x86 "poison pointer space" is located starting from 0x0. Given unprivileged users cannot mmap anything below mmap_min_addr, it should be safe to use poison pointers lower than mmap_min_addr. The current poison pointer values of LIST_POISON{1,2} might be too big for mmap_min_addr values equal or less than 1 MB (common case, e.g. Ubuntu uses only 0x10000). There is little point to use such a big value given the "poison pointer space" below 1 MB is not yet exhausted. Changing it to a smaller value solves the problem for small mmap_min_addr setups. The values are suggested by Solar Designer: http://www.openwall.com/lists/oss-security/2015/05/02/6 Signed-off-by: Vasily Kulikov <segoon@openwall.com> Cc: Solar Designer <solar@openwall.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:52 +02:00
Steven Rostedt (Red Hat)	2ec6dac110	tracing: Fix trace_printk() to print when not using bprintk() commit 3debb0a9ddb16526de8b456491b7db60114f7b5e upstream. The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: `07d777fe8c` "tracing: Add percpu buffers for trace_printk()" Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:47 +02:00
H. Peter Anvin	bb37dac1d9	linux/const.h: Add _BITUL() and _BITULL() commit 2fc016c5bd8aad2e201cdf71b9fb4573f94775bd upstream. Add macros for single bit definitions of a specific type. These are similar to the BIT() macro that already exists, but with a few exceptions: 1. The namespace is such that they can be used in uapi definitions. 2. The type is set with the _AC() macro to allow it to be used in assembly. 3. The type is explicitly specified to be UL or ULL. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-nbca8p7cg6jyjoit7klh3o91@git.kernel.org [wt: backported to 3.10 only to keep next patch clean] Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:44 +02:00
Bjorn Helgaas	fa75115240	PCI: Disable IO/MEM decoding for devices with non-compliant BARs commit b84106b4e2290c081cdab521fa832596cdfea246 upstream. The PCI config header (first 64 bytes of each device's config space) is defined by the PCI spec so generic software can identify the device and manage its usage of I/O, memory, and IRQ resources. Some non-spec-compliant devices put registers other than BARs where the BARs should be. When the PCI core sizes these "BARs", the reads and writes it does may have unwanted side effects, and the "BAR" may appear to describe non-sensical address space. Add a flag bit to mark non-compliant devices so we don't touch their BARs. Turn off IO/MEM decoding to prevent the devices from consuming address space, since we can't read the BARs to find out what that address space would be. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:44 +02:00
Behan Webster	d4d37e92e0	x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id" commit c4586256f0c440bc2bdb29d2cbb915f0ca785d26 upstream. Similar to the fix in `40413dcb7b` MODULE_DEVICE_TABLE(x86cpu, ...) expects the struct to be called struct x86cpu_device_id, and not struct x86_cpu_id which is what is used in the rest of the kernel code. Although gcc seems to ignore this error, clang fails without this define to fix the name. Code from drivers/thermal/x86_pkg_temp_thermal.c static const struct x86_cpu_id __initconst pkg_temp_thermal_ids[] = { ... }; MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids); Error from clang: drivers/thermal/x86_pkg_temp_thermal.c:577:1: error: variable has incomplete type 'const struct x86cpu_device_id' MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids); ^ include/linux/module.h:145:3: note: expanded from macro 'MODULE_DEVICE_TABLE' MODULE_GENERIC_TABLE(type##_device, name) ^ include/linux/module.h:87:32: note: expanded from macro 'MODULE_GENERIC_TABLE' extern const struct gtype##_id __mod_##gtype##_table \ ^ <scratch space>:143:1: note: expanded from here __mod_x86cpu_device_table ^ drivers/thermal/x86_pkg_temp_thermal.c:577:1: note: forward declaration of 'struct x86cpu_device_id' include/linux/module.h:145:3: note: expanded from macro 'MODULE_DEVICE_TABLE' MODULE_GENERIC_TABLE(type##_device, name) ^ include/linux/module.h:87:21: note: expanded from macro 'MODULE_GENERIC_TABLE' extern const struct gtype##_id __mod_##gtype##_table \ ^ <scratch space>:141:1: note: expanded from here x86cpu_device_id ^ 1 error generated. Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Jan-Simon M�ller <dl9pf@gmx.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: philm@manjaro.org Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:44 +02:00
Joe Perches	a4a4f1cd73	compiler-gcc: integrate the various compiler-gcc[345].h files commit cb984d101b30eb7478d32df56a0023e4603cba7f upstream. As gcc major version numbers are going to advance rather rapidly in the future, there's no real value in separate files for each compiler version. Deduplicate some of the macros #defined in each file too. Neaten comments using normal kernel commenting style. Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ philm: backport to 3.10-stable ] Signed-off-by: Philip M�ller <philm@manjaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>	2016-06-07 10:42:44 +02:00
Daniel Rosenberg	477dcf86b1	fuse: Add support for d_canonical_path Allows FUSE to report to inotify that it is acting as a layered filesystem. The userspace component returns a string representing the location of the underlying file. If the string cannot be resolved into a path, the top level path is returned instead. bug: 23904372 Change-Id: Iabdca0bbedfbff59e9c820c58636a68ef9683d9f Signed-off-by: Daniel Rosenberg <drosen@google.com>	2016-04-26 14:04:36 -07:00
Daniel Rosenberg	63b4724781	vfs: change d_canonical_path to take two paths bug: 23904372 Change-Id: I4a686d64b6de37decf60019be1718e1d820193e6 Signed-off-by: Daniel Rosenberg <drosen@google.com>	2016-04-26 14:04:29 -07:00
Srinivas Girigowda	65781c8991	net: cnss: add cnss common api support for dual WiFi Add common api support of subsystem restart and bus bandwidth for dual wifi. This feature redirect the cnss export api according to the bus type SDIO/PCI. CRs-Fixed: 986275 Change-Id: Iaf13d6c6d68ef62b7e4f6581899ec8325c5e9696 Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-04-13 11:17:12 -07:00
Srinivas Girigowda	4bc0f29764	net: cnss: add subsystem restart support for dual WiFi Subsystem device add support for subsystem restart recovery and ramdump device for cnss firmware dump collection before the subsystem restart. Refactor subsystem restart wrapper APIs to avoid the name space collision in cnss platform driver compilation in dual WiFi mode. CRs-Fixed: 983677 Change-Id: Ib4a8d1a6d0ce8f1faa43ce0aa8312823b1ca3c15 Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-04-13 11:17:07 -07:00
Srinivas Girigowda	5b8801afe7	net: cnss: add PM QoS support for dual WiFi PM QoS adds support to improve the wlan throughput. The cnss platform driver export PM QoS API to wlan host driver. Refactor PM QoS wrapper APIs to avoid the name space collision in cnss platform driver compilation in dual WiFi mode. CRs-Fixed: 983653 Change-Id: Id7a486f2f111476e73d5707eba36611a3530e9cf Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-04-13 11:16:58 -07:00
Ravi Kumar Alamanda	3c4e732a32	ASoC: msm: audio-effects: fix stack overread and heap overwrite Fix overwrite of updt_params allocated in heap, and stack overread where param pointer is passed from user space. Bug: 26876409 Change-Id: Ida8bdb7da2fcb97023dce3b6eafe4b899a51cb66 Signed-off-by: Ravi Kumar Alamanda <arkumar@codeaurora.org>	2016-04-13 11:03:18 -07:00
Jin Qian	3b8a2d233c	Revert "mm: change max readahead size to 512KB" This reverts commit `403678af49`. Bug: 27849539 Change-Id: I023c2be95ebf3eb1146b3047317821c65a08e602	2016-04-07 10:39:48 -07:00
Daniel Rosenberg	a5d5b2480e	sdcardfs: remove effectless config option CONFIG_SDCARD_FS_CI_SEARCH only guards a define for LOOKUP_CASE_INSENSITIVE, which is never used in the kernel. Remove both, along with the option matching that supports it. Change-Id: I363a8f31de8ee7a7a934d75300cc9ba8176e2edf Signed-off-by: Daniel Rosenberg <drosen@google.com> (cherry picked from commit d4ecd64643a034f9445d6c7c54d247235200098a)	2016-03-29 17:59:49 -07:00
Daniel Rosenberg	7bf5d1a707	vfs: add d_canonical_path for stacked filesystem support Inotify does not currently know when a filesystem is acting as a wrapper around another fs. This means that inotify watchers will miss any modifications to the base file, as well as any made in a separate stacked fs that points to the same file. d_canonical_path solves this problem by allowing the fs to map a dentry to a path in the lower fs. Inotify can use it to find the appropriate place to watch to be informed of all changes to a file. Change-Id: I09563baffad1711a045e45c1bd0bd8713c2cc0b6 Signed-off-by: Daniel Rosenberg <drosen@google.com> (cherry picked from commit 223190f1a174b71affbae6d1167fc0b3f0785839)	2016-03-29 17:59:49 -07:00
Daniel Campello	096dc0ac2c	Initial port of sdcardfs Change-Id: I5b5772a2bbff9f3a7dda641644630a7b8afacec0 (cherry picked from commit 725af4e5e73147c79c7788ab80eec1faf1a53477)	2016-03-29 17:59:49 -07:00
Ranjith Kagathi Ananda	fb99fcb112	msm: camera: isp: Update reporting for output error For controllable output, there are 2 possible bufq under same stream. When reporting error, use bufq instead of stream to identify correctly. Update reporting of drop for CDS missing frame by adding new mask. Bug: 27771328 Bug: 27821985 Change-Id: Ib90ac13509008a3b7689bf41656ba39cc2804831 Signed-off-by: Harsh Shah <harshs@codeaurora.org> Signed-off-by: Ranjith Kagathi Ananda <ranjith@codeaurora.org>	2016-03-29 10:01:28 -07:00
Srinivas Girigowda	8f31c1bb18	cnss: Implement API to store WLAN MAC address in platform driver WLAN Functional Drivers Queries cnss platform driver to get the MAC Address. If the OEM doesn't provide the valid MAC address, the WLAN Driver fallbacks to use other approaches to get MAC address. This works under CONFIG_CNSS_MAC feature flag, which will be enabled only on the OEM platforms. For internal platforms, CNSS driver doesn't hold any valid mac addresses. CRs-Fixed: 985585 Change-Id: I1e8a030a32a640cec84cadd6b36b37938d5fe6be Signed-off-by: Komal Kumar <kseelam@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-03-28 10:21:09 -07:00
Srinivas Girigowda	63a4caac99	etherdevice: Use ether_addr_copy to copy an Ethernet address Some systems can use the normally known u16 alignment of Ethernet addresses to save some code/text bytes and cycles. This does not change currently emitted code on x86 by gcc 4.8. Change-Id: Ic9f5ea49e345bd332bf08e75cd9f4957434fd3f8 Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net> Git-commit: 286ab723d4b83d37deb4017008ef1444a95cfb0d Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git [hkadmany@codeaurora.org: trivial conflict fix for 3.14] Signed-off-by: Hamad Kadmany <hkadmany@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-03-28 10:21:09 -07:00
Srinivas Girigowda	7572e7c658	net: cnss: refactor PM QoS request wrapper API Make PM QoS request API generic to pass the type of latency requirement needed by the client instead of hard coding latency type. Add latency type as a function parameter. CRs-Fixed: 972761 Change-Id: Ic912148d2068fe8a758b6a4b3be570ccf870f03a Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-03-28 10:21:09 -07:00
Srinivas Girigowda	135e3aed16	cnss: Expose dump stack functionality Add changes to expose dump stack functionality which can be used by driver to dump stack information when it requires. CRs-Fixed: 979886 Change-Id: Ib929ad0a510b996ac54d17afd2957ea487c62851 Signed-off-by: Abhishek Singh <absingh@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-03-28 10:21:09 -07:00
Thierry Strudel	c0627b67a6	trace: cpufreq: fix typo in min/max cpufreq Change-Id: Ieed402d3a912b7a318826e101efe2c24b07ebfe4 Signed-off-by: Thierry Strudel <tstrudel@google.com>	2016-03-23 10:03:49 -07:00
Martijn Coenen	12c7a54b36	mm: vmpressure: dynamic window sizing. The window size used for calculating vm pressure events was previously fixed at 512 pages. The window size has a big impact on the rate of notifications sent off to userspace, in particular when using the "low" level. On machines with a lot of memory, the current value is likely excessive. On the other hand, if the window size is too big, we might delay memory pressure events for too long, especially at critical levels of memory pressure. This patch attempts to address that problem with two changes. The first change is to calculate the window size based on the machine size, quite similar to how the vm watermarks are being calculated. This reduces the chance of false positives at any pressure level. Using the machine size only makes sense on the root cgroup though; for non-root cgroups, their hard memory limit is used to calculate the window size. If no hard memory limit is set, we fall back to the default window size that was previously used. The second change is based on an idea from Johannes Weiner, to only report medium and low pressure levels for every X windows that we scan. This reduces the frequency with which we report low/medium pressure levels, but at the same time will still report critical memory pressure immediately. Change-Id: Ieffca055b2fb6aa27ae0179e0a588e6fcb173a61 Signed-off-by: Martijn Coenen <maco@google.com>	2016-03-21 10:23:32 +00:00
dcashman	56f2bbb2d8	FROMLIST: drivers: char: random: add get_random_long() (cherry picked from commit https://lkml.org/lkml/2016/2/4/831) d07e22597d1d355 ("mm: mmap: add new /proc tunable for mmap_base ASLR") added the ability to choose from a range of values to use for entropy count in generating the random offset to the mmap_base address. The maximum value on this range was set to 32 bits for 64-bit x86 systems, but this value could be increased further, requiring more than the 32 bits of randomness provided by get_random_int(), as is already possible for arm64. Add a new function: get_random_long() which more naturally fits with the mmap usage of get_random_int() but operates exactly the same as get_random_int(). Also, fix the shifting constant in mmap_rnd() to be an unsigned long so that values greater than 31 bits generate an appropriate mask without overflow. This is especially important on x86, as its shift instruction uses a 5-bit mask for the shift operand, which meant that any value for mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base randomization. Finally, replace calls to get_random_int() with get_random_long() where appropriate. Bug: 26963541 Change-Id: I2baedc5b665bbfc763a949512572ace30be56a16 Signed-off-by: Daniel Cashman <dcashman@android.com> Signed-off-by: Daniel Cashman <dcashman@google.com>	2016-03-16 23:05:14 +00:00
Greg Kroah-Hartman	1f2493fcd8	Revert: "crypto: af_alg - Disallow bind/setkey/... after accept(2)" This reverts commit `5a707f0972` which is commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream. It's been widely reported that this patch breaks existing userspace applications when backported to the stable kernel releases. As no fix seems to be forthcoming, just revert it to let systems work again. Reported-by: "J. Paul Reed" <preed@sigkill.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:37 -07:00
Rusty Russell	d6692b5c5d	modules: fix longstanding /proc/kallsyms vs module insertion race. commit 8244062ef1e54502ef55f54cced659913f244c3e upstream. For CONFIG_KALLSYMS, we keep two symbol tables and two string tables. There's one full copy, marked SHF_ALLOC and laid out at the end of the module's init section. There's also a cut-down version that only contains core symbols and strings, and lives in the module's core section. After module init (and before we free the module memory), we switch the mod->symtab, mod->num_symtab and mod->strtab to point to the core versions. We do this under the module_mutex. However, kallsyms doesn't take the module_mutex: it uses preempt_disable() and rcu tricks to walk through the modules, because it's used in the oops path. It's also used in /proc/kallsyms. There's nothing atomic about the change of these variables, so we can get the old (larger!) num_symtab and the new symtab pointer; in fact this is what I saw when trying to reproduce. By grouping these variables together, we can use a carefully-dereferenced pointer to ensure we always get one or the other (the free of the module init section is already done in an RCU callback, so that's safe). We allocate the init one at the end of the module init section, and keep the core one inside the struct module itself (it could also have been allocated at the end of the module core, but that's probably overkill). [ Rebased for 4.4-stable and older, because the following changes aren't in the older trees: - e0224418516b4d8a6c2160574bac18447c354ef0: adds arg to is_core_symbol - 7523e4dc5057e157212b4741abd6256e03404cf1: module_init/module_core/init_size/core_size become init_layout.base/core_layout.base/init_layout.size/core_layout.size. Original commit: 8244062ef1e54502ef55f54cced659913f244c3e ] Reported-by: Weilong Chen <chenweilong@huawei.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:37 -07:00
Peter Jones	7b35014c77	efi: Make efivarfs entries immutable by default commit ed8b0de5a33d2a2557dce7f9429dca8cb5bc5879 upstream. "rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to accidentally brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:37 -07:00
Peter Jones	d591b6da72	efi: Make our variable validation list include the guid commit 8282f5d9c17fe15a9e658c06e3f343efae1a2a2f upstream. All the variables in this list so far are defined to be in the global namespace in the UEFI spec, so this just further ensures we're validating the variables we think we are. Including the guid for entries will become more important in future patches when we decide whether or not to allow deletion of variables based on presence in this list. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:37 -07:00
Peter Jones	44f21ed11c	efi: Do variable name validation tests in utf8 commit 3dcb1f55dfc7631695e69df4a0d589ce5274bd07 upstream. Actually translate from ucs2 to utf8 before doing the test, and then test against our other utf8 data, instead of fudging it. Signed-off-by: Peter Jones <pjones@redhat.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:37 -07:00
Peter Jones	f0b5aae480	lib/ucs2_string: Add ucs2 -> utf8 helper functions commit 73500267c930baadadb0d02284909731baf151f7 upstream. This adds ucs2_utf8size(), which tells us how big our ucs2 string is in bytes, and ucs2_as_utf8, which translates from ucs2 to utf8.. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:36 -07:00
Steven Rostedt (Red Hat)	1934f322b9	tracing: Fix check for cpu online when event is disabled commit dc17147de328a74bbdee67c1bf37d2f1992de756 upstream. Commit f37755490fe9b ("tracepoints: Do not trace when cpu is offline") added a check to make sure that tracepoints only get called when the cpu is online, as it uses rcu_read_lock_sched() for protection. Commit 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints are disabled") added lockdep checks (including rcu checks) for events that are not enabled to catch possible RCU issues that would only be triggered if a trace event was enabled. Commit f37755490fe9b only stopped the warnings when the trace event was enabled but did not prevent warnings if the trace event was called when disabled. To fix this, the cpu online check is moved to where the condition is added to the trace event. This will place the cpu online check in all places that it may be used now and in the future. Fixes: f37755490fe9b ("tracepoints: Do not trace when cpu is offline") Fixes: 3a630178fd5f3 ("tracing: generate RCU warnings even when tracepoints are disabled") Reported-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-16 08:41:36 -07:00
Harvey Hunt	f105a72e6e	libata: Align ata_device's id on a cacheline commit 4ee34ea3a12396f35b26d90a094c75db95080baa upstream. The id buffer in ata_device is a DMA target, but it isn't explicitly cacheline aligned. Due to this, adjacent fields can be overwritten with stale data from memory on non coherent architectures. As a result, the kernel is sometimes unable to communicate with an ATA device. Fix this by ensuring that the id buffer is cacheline aligned. This issue is similar to that fixed by Commit `84bda12af3` ("libata: align ap->sector_buf"). Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-09 15:31:54 -08:00
Arnd Bergmann	45eb9d85bd	libata: fix HDIO_GET_32BIT ioctl commit 287e6611ab1eac76c2c5ebf6e345e04c80ca9c61 upstream. As reported by Soohoon Lee, the HDIO_GET_32BIT ioctl does not work correctly in compat mode with libata. I have investigated the issue further and found multiple problems that all appeared with the same commit that originally introduced HDIO_GET_32BIT handling in libata back in linux-2.6.8 and presumably also linux-2.4, as the code uses "copy_to_user(arg, &val, 1)" to copy a 'long' variable containing either 0 or 1 to user space. The problems with this are: * On big-endian machines, this will always write a zero because it stores the wrong byte into user space. * In compat mode, the upper three bytes of the variable are updated by the compat_hdio_ioctl() function, but they now contain uninitialized stack data. * The hdparm tool calling this ioctl uses a 'static long' variable to store the result. This means at least the upper bytes are initialized to zero, but calling another ioctl like HDIO_GET_MULTCOUNT would fill them with data that remains stale when the low byte is overwritten. Fortunately libata doesn't implement any of the affected ioctl commands, so this would only happen when we query both an IDE and an ATA device in the same command such as "hdparm -N -c /dev/hda /dev/sda" * The libata code for unknown reasons started using ATA_IOC_GET_IO32 and ATA_IOC_SET_IO32 as aliases for HDIO_GET_32BIT and HDIO_SET_32BIT, while the ioctl commands that were added later use the normal HDIO_* names. This is harmless but rather confusing. This addresses all four issues by changing the code to use put_user() on an 'unsigned long' variable in HDIO_GET_32BIT, like the IDE subsystem does, and by clarifying the names of the ioctl commands. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Soohoon Lee <Soohoon.Lee@f5.com> Tested-by: Soohoon Lee <Soohoon.Lee@f5.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-09 15:31:54 -08:00
Srinivas Girigowda	86e198c56d	cfg80211: Define macro to indicate bssid hint backport support Define macro to indicate backport support for bssid hint connect parameter which can be used by driver if the kernel version is less than 3.15. Change-Id: Ic8d2f29ca1287df8dedb7927cd444242c6ca1c27 CRs-Fixed: 919601 Signed-off-by: Edhar, Mahesh Kumar <MAHESH@codeaurora.org> Signed-off-by: Komal Seelam <kseelam@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-03-07 22:05:02 +00:00
Christoph Hellwig	9c8e8b9e50	nfs: fix nfs_size_to_loff_t commit 50ab8ec74a153eb30db26529088bc57dd700b24c upstream. See http: //www.infradead.org/rpr.html X-Evolution-Source: 1451162204.2173.11@leira.trondhjem.org Content-Transfer-Encoding: 8bit Mime-Version: 1.0 We support OFFSET_MAX just fine, so don't round down below it. Also switch to using min_t to make the helper more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: `433c92379d` ("NFS: Clean up nfs_size_to_loff_t()") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:24 -08:00
James Bottomley	f80e6add95	ses: fix additional element traversal bug commit 5e1033561da1152c57b97ee84371dba2b3d64c25 upstream. KASAN found that our additional element processing scripts drop off the end of the VPD page into unallocated space. The reason is that not every element has additional information but our traversal routines think they do, leading to them expecting far more additional information than is present. Fix this by adding a gate to the traversal routine so that it only processes elements that are expected to have additional information (list is in SES-2 section 6.1.13.1: Additional Element Status diagnostic page overview) Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Tested-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:22 -08:00
Andrey Ryabinin	db13625b09	lockd: create NSM handles per net namespace commit 0ad95472bf169a3501991f8f33f5147f792a8116 upstream. Commit `cb7323fffa` ("lockd: create and use per-net NSM RPC clients on MON/UNMON requests") introduced per-net NSM RPC clients. Unfortunately this doesn't make any sense without per-net nsm_handle. E.g. the following scenario could happen Two hosts (X and Y) in different namespaces (A and B) share the same nsm struct. 1. nsm_monitor(host_X) called => NSM rpc client created, nsm->sm_monitored bit set. 2. nsm_mointor(host-Y) called => nsm->sm_monitored already set, we just exit. Thus in namespace B ln->nsm_clnt == NULL. 3. host X destroyed => nsm->sm_count decremented to 1 4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr dereference of *ln->nsm_clnt So this could be fixed by making per-net nsm_handles list, instead of global. Thus different net namespaces will not be able share the same nsm_handle. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:20 -08:00
Hannes Frederic Sowa	341f09c01a	unix: correctly track in-flight fds in sending process user_struct commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 upstream. The commit referenced in the Fixes tag incorrectly accounted the number of in-flight fds over a unix domain socket to the original opener of the file-descriptor. This allows another process to arbitrary deplete the original file-openers resource limit for the maximum of open files. Instead the sending processes and its struct cred should be credited. To do so, we add a reference counted struct user_struct pointer to the scm_fp_list and use it to account for the number of inflight unix fds. Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets") Reported-by: David Herrmann <dh.herrmann@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:20 -08:00
Steven Rostedt (Red Hat)	c3e07084a8	tracepoints: Do not trace when cpu is offline commit f37755490fe9bf76f6ba1d8c6591745d3574a6a6 upstream. The tracepoint infrastructure uses RCU sched protection to enable and disable tracepoints safely. There are some instances where tracepoints are used in infrastructure code (like kfree()) that get called after a CPU is going offline, and perhaps when it is coming back online but hasn't been registered yet. This can probuce the following warning: [ INFO: suspicious RCU usage. ] 4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S ------------------------------- include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/8/0. stack backtrace: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S 4.4.0-00006-g0fe53e8-dirty #34 Call Trace: [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable) [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170 [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440 [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100 [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150 [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140 [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310 [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60 [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40 [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560 [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360 [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14 This warning is not a false positive either. RCU is not protecting code that is being executed while the CPU is offline. Instead of playing "whack-a-mole(TM)" and adding conditional statements to the tracepoints we find that are used in this instance, simply add a cpu_online() test to the tracepoint code where the tracepoint will be ignored if the CPU is offline. Use of raw_smp_processor_id() is fine, as there should never be a case where the tracepoint code goes from running on a CPU that is online and suddenly gets migrated to a CPU that is offline. Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org Reported-by: Denis Kirjanov <kda@linux-powerpc.org> Fixes: `97e1c18e8d` ("tracing: Kernel Tracepoints") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-03 15:06:18 -08:00
Theodore Ts'o	edf47f2882	ext4 crypto: revalidate dentry after adding or removing the key Add a validation check for dentries for encrypted directory to make sure we're not caching stale data after a key has been added or removed. Also check to make sure that status of the encryption key is updated when readdir(2) is executed. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@google.com> Change-Id: Ic7a90d79d9447272fc512ae2abbd299523de02b8	2016-03-03 14:35:20 -08:00
Srinath Sridharan	ba7c59f518	Introducing per-task/per-process cpufreq statistics. The statistics are provided in /proc/<pid>/cpufreq_stats as a read-only interface. This gives the amount of time spent by a task/process in each of the frequencies on each of the CPUs. The cat output will have "<cpu> <frequency> <time>" triplet in each line, which will mean that a given task spent <time> usertime units of time at <frequency> on CPU <cpu>. usertime units here is 10mS (similar to other time exported in /proc). The statistics are available during the lifetime of the tasks/processes. Change-Id: I3ebb338c27020aabd6b4e258bd26981c51a57771	2016-03-02 18:02:13 -08:00
Konstantin Khlebnikov	c750edf63c	radix-tree: fix oops after radix_tree_iter_retry commit 732042821cfa106b3c20b9780e4c60fee9d68900 upstream. Helper radix_tree_iter_retry() resets next_index to the current index. In following radix_tree_next_slot current chunk size becomes zero. This isn't checked and it tries to dereference null pointer in slot. Tagged iterator is fine because retry happens only at slot 0 where tag bitmask in iter->tags is filled with single bit. Fixes: 46437f9a554f ("radix-tree: fix race in gang lookup") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-25 11:57:49 -08:00
Matthew Wilcox	a202017766	radix-tree: fix race in gang lookup commit 46437f9a554fbe3e110580ca08ab703b59f2f95a upstream. If the indirect_ptr bit is set on a slot, that indicates we need to redo the lookup. Introduce a new function radix_tree_iter_retry() which forces the loop to retry the lookup by setting 'slot' to NULL and turning the iterator back to point at the problematic entry. This is a pretty rare problem to hit at the moment; the lookup has to race with a grow of the radix tree from a height of 0. The consequences of hitting this race are that gang lookup could return a pointer to a radix_tree_node instead of a pointer to whatever the user had inserted in the tree. Fixes: `cebbd29e1c` ("radix-tree: rewrite gang lookup using iterator") Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ohad Ben-Cohen <ohad@wizery.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-25 11:57:49 -08:00
Arnd Bergmann	710636bc7a	tracing: Fix freak link error caused by branch tracer commit b33c8ff4431a343561e2319f17c14286f2aa52e2 upstream. In my randconfig tests, I came across a bug that involves several components: * gcc-4.9 through at least 5.3 * CONFIG_GCOV_PROFILE_ALL enabling -fprofile-arcs for all files * CONFIG_PROFILE_ALL_BRANCHES overriding every if() * The optimized implementation of do_div() that tries to replace a library call with an division by multiplication * code in drivers/media/dvb-frontends/zl10353.c doing u32 adc_clock = 450560; /* 45.056 MHz */ if (state->config.adc_clock) adc_clock = state->config.adc_clock; do_div(value, adc_clock); In this case, gcc fails to determine whether the divisor in do_div() is __builtin_constant_p(). In particular, it concludes that __builtin_constant_p(adc_clock) is false, while __builtin_constant_p(!!adc_clock) is true. That in turn throws off the logic in do_div() that also uses __builtin_constant_p(), and instead of picking either the constant- optimized division, and the code in ilog2() that uses __builtin_constant_p() to figure out whether it knows the answer at compile time. The result is a link error from failing to find multiple symbols that should never have been called based on the __builtin_constant_p(): dvb-frontends/zl10353.c:138: undefined reference to `____ilog2_NaN' dvb-frontends/zl10353.c:138: undefined reference to `__aeabi_uldivmod' ERROR: "____ilog2_NaN" [drivers/media/dvb-frontends/zl10353.ko] undefined! ERROR: "__aeabi_uldivmod" [drivers/media/dvb-frontends/zl10353.ko] undefined! This patch avoids the problem by changing __trace_if() to check whether the condition is known at compile-time to be nonzero, rather than checking whether it is actually a constant. I see this one link error in roughly one out of 1600 randconfig builds on ARM, and the patch fixes all known instances. Link: http://lkml.kernel.org/r/1455312410-1058841-1-git-send-email-arnd@arndb.de Acked-by: Nicolas Pitre <nico@linaro.org> Fixes: `ab3c9c686e` ("branch tracer, intel-iommu: fix build with CONFIG_BRANCH_TRACER=y") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-25 11:57:47 -08:00
Jann Horn	414f6fbc84	ptrace: use fsuid, fsgid, effective creds for fs access checks commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream. By checking the effective credentials instead of the real UID / permitted capabilities, ensure that the calling process actually intended to use its credentials. To ensure that all ptrace checks use the correct caller credentials (e.g. in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS flag), use two new flags and require one of them to be set. The problem was that when a privileged task had temporarily dropped its privileges, e.g. by calling setreuid(0, user_uid), with the intent to perform following syscalls with the credentials of a user, it still passed ptrace access checks that the user would not be able to pass. While an attacker should not be able to convince the privileged task to perform a ptrace() syscall, this is a problem because the ptrace access check is reused for things in procfs. In particular, the following somewhat interesting procfs entries only rely on ptrace access checks: /proc/$pid/stat - uses the check for determining whether pointers should be visible, useful for bypassing ASLR /proc/$pid/maps - also useful for bypassing ASLR /proc/$pid/cwd - useful for gaining access to restricted directories that contain files with lax permissions, e.g. in this scenario: lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar drwx------ root root /root drwxr-xr-x root root /root/foobar -rw-r--r-- root root /root/foobar/secret Therefore, on a system where a root-owned mode 6755 binary changes its effective credentials as described and then dumps a user-specified file, this could be used by an attacker to reveal the memory layout of root's processes or reveal the contents of files he is not allowed to access (through /proc/$pid/cwd). [akpm@linux-foundation.org: fix warning] Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-25 11:57:47 -08:00
Herton R. Krzesinski	8355335f9b	pty: make sure super_block is still valid in final /dev/tty close commit 1f55c718c290616889c04946864a13ef30f64929 upstream. Considering current pty code and multiple devpts instances, it's possible to umount a devpts file system while a program still has /dev/tty opened pointing to a previosuly closed pty pair in that instance. In the case all ptmx and pts/N files are closed, umount can be done. If the program closes /dev/tty after umount is done, devpts_kill_index will use now an invalid super_block, which was already destroyed in the umount operation after running ->kill_sb. This is another "use after free" type of issue, but now related to the allocated super_block instance. To avoid the problem (warning at ida_remove and potential crashes) for this specific case, I added two functions in devpts which grabs additional references to the super_block, which pty code now uses so it makes sure the super block structure is still valid until pty shutdown is done. I also moved the additional inode references to the same functions, which also covered similar case with inode being freed before /dev/tty final close/shutdown. Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-25 11:57:46 -08:00
Herbert Xu	5a707f0972	crypto: af_alg - Disallow bind/setkey/... after accept(2) commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream. Each af_alg parent socket obtained by socket(2) corresponds to a tfm object once bind(2) has succeeded. An accept(2) call on that parent socket creates a context which then uses the tfm object. Therefore as long as any child sockets created by accept(2) exist the parent socket must not be modified or freed. This patch guarantees this by using locks and a reference count on the parent socket. Any attempt to modify the parent socket will fail with EBUSY. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-19 14:22:41 -08:00
Richard Weinberger	fa17bfe3b7	kernel/signal.c: unexport sigsuspend() commit 9d8a765211335cfdad464b90fb19f546af5706ae upstream. sigsuspend() is nowhere used except in signal.c itself, so we can mark it static do not pollute the global namespace. But this patch is more than a boring cleanup patch, it fixes a real issue on UserModeLinux. UML has a special console driver to display ttys using xterm, or other terminal emulators, on the host side. Vegard reported that sometimes UML is unable to spawn a xterm and he's facing the following warning: WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0() It turned out that this warning makes absolutely no sense as the UML xterm code calls sigsuspend() on the host side, at least it tries. But as the kernel itself offers a sigsuspend() symbol the linker choose this one instead of the glibc wrapper. Interestingly this code used to work since ever but always blocked signals on the wrong side. Some recent kernel change made the WARN_ON() trigger and uncovered the bug. It is a wonderful example of how much works by chance on computers. :-) Fixes: `68f3f16d9a` ("new helper: sigsuspend()") Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-02-19 14:22:37 -08:00
Srinivas Girigowda	48ad99e762	wireless: sort and extend element ID list The element ID list is currently almost sorted by amendment or similar topic, but the order is difficult to maintain and not very transparent. Sort the list by ID instead, and add a lot of missing IDs. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Git-commit: 8c78e38025060a00155a73bf722152c156242490 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Change-Id: I080099705fae20a477cfd4b04bfb65e9dfc7bef6 CRs-fixed: 969965 [vidyullatha@codeaurora.org: backport to 3.10 - This commit includes the changes in enum ieee80211_eid to add a number of missing IDs]. Signed-off-by: Vidyullatha Kanchanapally <vidyullatha@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:58:30 +00:00
Srinivas Girigowda	cd08158240	msm: ipa: Reset uC ready cb On calling client callback after uC loaded reset the uC ready cb to avoid further access Change-Id: I92495c373e5c832c9a527eacce1db406aeb0c32e Acked by: Sunil Paidimarri <hisunil@qti.qualcomm.com> Signed-off-by: Subhani Shaik <subhanis@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:56:42 +00:00
Srinivas Girigowda	a9f143423b	Net: CNSS: refactoring CNSS platform Driver Separate the WLAN CNSS platform driver according to the bus interface PCIe and SDIO. Add separate Kernel Config file for the WLAN CNSS platform Driver and its necessary module which has support for the CNSS connectivity Subsystem. Add Kernel Config flag to refactoring the CNSS platform Driver: CONFIG_CNSS Kernel Config add support to CNSS Core driver compilation and export Generic GPL wrappers. CONFIG_CNSS_SDIO Kernel Config add support to CNSS Platform Driver compilation for SDIO based WiFi Devices and export platform driver API's based on the SDIO bus. CONFIG_CNSS_PCI Kernel Config add support to CNSS Platform Driver compilation for PCIe based WiFi Devices and export platform driver API's based on the PCIe bus. CRs-fixed: 939171 Change-Id: I8cce5bbc87e6742179a7967ccba295f55c444b28 Signed-off-by: Kai Liu <kaliu@codeaurora.org> Signed-off-by: Sarada Prasanna Garnayak <sgarna@codeaurora.org> Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:56:31 +00:00
Srinivas Girigowda	24b02450fa	cnss: Refactor PreAlloc Driver for CLD Architecture Compile PreAlloc Driver independent of CNSS/WCNSS config. Refactor PreAlloc Driver to support all platforms independent of platform driver support. CRs-Fixed: 850751 Change-Id: Ic4cad794e9a1aea89dbc388f5aec55779ee64a50 Signed-off-by: Komal Seelam <kseelam@codeaurora.org> Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:56:19 +00:00
Srinivas Girigowda	94cf50d464	cfg80211: Add attributes describing prohibited channel bandwidth. Since there are frequency bands (e.g. 5.9GHz) allowing channels with only 10 or 5 MHz bandwidth, this patch adds attributes that allow keeping track about this information. When channel attributes are reported to user-space, make sure to not break old tools, i.e. if the 'split wiphy dump' is enabled, report the extra attributes (if present) describing the bandwidth restrictions. If the 'split wiphy dump' is not enabled, completely omit those channels that have flags set to either IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ. Add the check for new bandwidth restriction flags in cfg80211_chandef_usable() to comply with the restrictions. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Git-commit: ea077c1cea36a6b5ded1256dcd56c72ff2a22c62 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Change-Id: I8610e5a7594099a99a28b699c42c40a2a62ab397 CRs-Fixed: 754373 Signed-off-by: Samuel Ahn <sahn@codeaurora.org> Signed-off-by: Sunil Dutt <usdutt@codeaurora.org> Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:56:10 +00:00
Srinivas Girigowda	951e222cd8	nl80211/cfg80211: add 5 and 10 MHz defines and wiphy flag. Add defines for 5 and 10 MHz channel width and fix channel handling functions accordingly. Also check for and report the WIPHY_FLAG_SUPPORTS_5_10_MHZ capability. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> [fix spelling in comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Git-commit: 2f301ab29e4656af824592363039d8f6bd5a9f68 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git CRs-Fixed: 754373 Change-Id: Id0795fb95200f532589edfd8d0701c42fa27ebce [sahn@codeaurora.org: resolve merge conflict by redefining WIPHY_FLAG_DFS_OFFLOAD to BIT(24) instead of BIT(22). WIPHY_FLAG_DFS_OFFLOAD is used internally only.] Signed-off-by: Samuel Ahn <sahn@codeaurora.org> Signed-off-by: Sunil Dutt <usdutt@codeaurora.org> Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:56:03 +00:00
Srinivas Girigowda	9cfb426e7d	msm: ipa: Support for wlan init before uC is loaded uC is not loaded during wlan auto bootup. Provide APIs to registeir callback for uC ready. CRs-Fixed: 786658 Change-Id: Ia7c7c10108b7da697da4d97ece359c583355f0c7 Acked-by: Sunil Kumar Paidimarri <hisunil@qti.qualcomm.com> Signed-off-by: Skylar Chang <chiaweic@codeaurora.org> Signed-off-by: Amarnath Hullur Subramanyam <amarnath@codeaurora.org> Signed-off-by: Srinivas Girigowda <sgirigow@codeaurora.org>	2016-02-19 21:55:55 +00:00
Bongkyu Kim	d806e1b71c	UPSTREAM: lz4: fix wrong compress buffer size for 64-bits (cherry picked from commit 06af1c52c9ea234e0b1266cc0b52c3e0c6c8fe9f) The current lz4 compress buffer is 16kb on 32-bits, 32kb on 64-bits system. But, lz4 needs only 16kb on both. On 64-bits, this causes wasted cpu cycles for additional memset during every compression. In case of lz4hc, the current buffer size is (256kb + 8) on 32-bits, (512kb + 16) on 64-bits. But, lz4hc needs only (256kb + 2 * pointer) on both. This patch fixes these wrong compress buffer sizes for 64-bits. Signed-off-by: Bongkyu Kim <bongkyu.kim@lge.com> Cc: Chanho Min <chanho.min@lge.com> Cc: Yann Collet <yann.collet.73@gmail.com> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-02-10 01:37:24 +00:00
Ben Fennema	dd4b8a08e5	nanohub: add nanohub kernel driver Adds nanohub kernel driver files as well as device tree configuration to use nanohub instead of contexthub (spich) Change-Id: I089dd5c3a08968a002e965e8d1ed260b93e32ce3 Signed-off-by: Ben Fennema <fennema@google.com>	2016-02-05 12:51:01 -08:00
Vineeta Srivastava	e793ac05ce	Revert "Revert "ASoC: msm: fix integer overflow for long duration offload playback"" This reverts commit 96cc7c3532999084f8a25f8d42b91e3e8e8c32b4.	2016-02-02 06:08:19 +00:00
Arnd Bergmann	7c2543203b	arm64: fix building without CONFIG_UID16 commit fbc416ff86183e2203cdf975e2881d7c164b0271 upstream. As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16 disabled currently fails because the system call table still needs to reference the individual function entry points that are provided by kernel/sys_ni.c in this case, and the declarations are hidden inside of #ifdef CONFIG_UID16: arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function) __SYSCALL(__NR_lchown, sys_lchown16) I believe this problem only exists on ARM64, because older architectures tend to not need declarations when their system call table is built in assembly code, while newer architectures tend to not need UID16 support. ARM64 only uses these system calls for compatibility with 32-bit ARM binaries. This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is set unconditionally on ARM64 with CONFIG_COMPAT, so we see the declarations whenever we need them, but otherwise the behavior is unchanged. Fixes: `af1839eb4b` ("Kconfig: clean up the long arch list for the UID16 config option") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-28 21:49:35 -08:00
willy tarreau	df87da0783	unix: properly account for FDs passed over unix sockets [ Upstream commit 712f4aad406bb1ed67f3f98d04c044191f0ff593 ] It is possible for a process to allocate and accumulate far more FDs than the process' limit by sending them over a unix socket then closing them to keep the process' fd count low. This change addresses this problem by keeping track of the number of FDs in flight per user and preventing non-privileged processes from having more FDs in flight than their configured FD limit. Reported-by: socketpair@gmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Mitigates: CVE-2013-4312 (Linux 2.0+) Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-28 21:49:33 -08:00
Lorenzo Colitti	630fd91e7e	net: diag: Support destroying TCP sockets. This implements SOCK_DESTROY for TCP sockets. It causes all blocking calls on the socket to fail fast with ECONNABORTED and causes a protocol close of the socket. It informs the other end of the connection by sending a RST, i.e., initiating a TCP ABORT as per RFC 793. ECONNABORTED was chosen for consistency with FreeBSD. [Backport of net-next c1e64e298b8cad309091b95d8436a0255c84f54a] Change-Id: Ice9aad37741fe497341d1d2a51e0b70601a99c90 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-27 16:38:37 +09:00
Lorenzo Colitti	092dfcfd6b	net: diag: Support SOCK_DESTROY for inet sockets. This passes the SOCK_DESTROY operation to the underlying protocol diag handler, or returns -EOPNOTSUPP if that handler does not define a destroy operation. Most of this patch is just renaming functions. This is not strictly necessary, but it would be fairly counterintuitive to have the code to destroy inet sockets be in a function whose name starts with inet_diag_get. [Backport of net-next 6eb5d2e08f071c05ecbe135369c9ad418826cab2] Change-Id: Iee2c858bf11c48f54890b85b87821a2a2d7109e1 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-27 16:38:35 +09:00
Lorenzo Colitti	423a24852d	net: diag: Add the ability to destroy a socket. This patch adds a SOCK_DESTROY operation, a destroy function pointer to sock_diag_handler, and a diag_destroy function pointer. It does not include any implementation code. [Backport of net-next 64be0aed59ad519d6f2160868734f7e278290ac1] Change-Id: I3db262a7e41f1f8452ff0968d4001234598190d8 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-27 16:38:34 +09:00
Lorenzo Colitti	e4fe385f32	net: diag: split inet_diag_dump_one_icsk into two Currently, inet_diag_dump_one_icsk finds a socket and then dumps its information to userspace. Split it into a part that finds the socket and a part that dumps the information. [Backport of net-next b613f56ec9baf30edf5d9d607b822532a273dad7] Change-Id: I7aec27aca9c3e395e41332fe4e59d720042e0609 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-27 16:38:33 +09:00
Hannes Frederic Sowa	0b15bc2925	net: add validation for the socket syscall protocol argument [ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ] 郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. CVE: CVE-2015-8543 Cc: Cong Wang <cwang@twopensource.com> Reported-by: 郭永刚 <guoyonggang@360.cn> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:55 -08:00
Marcelo Ricardo Leitner	33fbe78ae8	sctp: update the netstamp_needed counter when copying sockets [ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ] Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:54 -08:00
Alan Stern	506d8269fb	USB: add quirk for devices with broken LPM commit ad87e03213b552a5c33d5e1e7a19a73768397010 upstream. Some USB device / host controller combinations seem to have problems with Link Power Management. For example, Steinar found that his xHCI controller wouldn't handle bandwidth calculations correctly for two video cards simultaneously when LPM was enabled, even though the bus had plenty of bandwidth available. This patch introduces a new quirk flag for devices that should remain disabled for LPM, and creates quirk entries for Steinar's devices. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:54 -08:00
Daeho Jeong	fe4b6c2682	ext4, jbd2: ensure entering into panic after recording an error in superblock commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream. If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() \| -> journal->j_flags \|= JBD2_ABORT; \| \| __ext4_abort() \| -> jbd2_journal_abort() \| \| -> __journal_abort_soft() \| \| -> if (journal->j_flags & JBD2_ABORT) \| \| return; \| -> panic() \| -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:52 -08:00
Michal Kubeček	7cd9f60220	ipv6: distinguish frag queues by device for multicast and link-local packets [ Upstream commit 264640fc2c5f4f913db5c73fa3eb1ead2c45e9d7 ] If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:52 -08:00
Rainer Weikusat	da8db0830a	unix: avoid use-after-free in ep_remove_wait_queue [ Upstream commit 7d267278a9ece963d77eefec61630223fce08c6c ] Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: `ec0d215f94` ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-01-22 19:47:51 -08:00
dcashman	f6abe1ab29	FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR. (cherry picked from commit https://lkml.org/lkml/2015/12/21/337) ASLR only uses as few as 8 bits to generate the random offset for the mmap base address on 32 bit architectures. This value was chosen to prevent a poorly chosen value from dividing the address space in such a way as to prevent large allocations. This may not be an issue on all platforms. Allow the specification of a minimum number of bits so that platforms desiring greater ASLR protection may determine where to place the trade-off. Bug: 24047224 Signed-off-by: Daniel Cashman <dcashman@android.com> Signed-off-by: Daniel Cashman <dcashman@google.com> Change-Id: I66ac01c6f4f2c8dcfc84d1f1e99490b8385b3ed4	2016-01-14 11:48:20 -08:00
Nikhilesh Reddy	e24429f34d	fuse: Add support for fuse stacked I/O Add support for filesystem stacked read/write of files when enabled through a userspace init option of FUSE_STACKED_IO. When FUSE_STACKED_IO is enabled all the reads and writes to the fuse mount point go directly to the native filesystem rather than through the fuse daemon. All requests that aren't read/write still go thought the userspace code. Mmaped I/O is still not supported through stacking and can be added in. This allows for significantly better performance on read and writes. The difference in performance between fuse and the native lower filesystem is negligible. There is also a significant cpu/power savings that is achieved which is really important on embedded systems that use fuse for I/O Change-Id: Ic2e6b69df5acd92844999a0b7dd9c1c1db185d50 Signed-off-by: Nikhilesh Reddy <reddyn@codeaurora.org>	2016-01-14 11:43:28 -08:00
Vasily Kulikov	28987b16de	include/linux/poison.h: fix LIST_POISON{1,2} offset Poison pointer values should be small enough to find a room in non-mmap'able/hardly-mmap'able space. E.g. on x86 "poison pointer space" is located starting from 0x0. Given unprivileged users cannot mmap anything below mmap_min_addr, it should be safe to use poison pointers lower than mmap_min_addr. The current poison pointer values of LIST_POISON{1,2} might be too big for mmap_min_addr values equal or less than 1 MB (common case, e.g. Ubuntu uses only 0x10000). There is little point to use such a big value given the "poison pointer space" below 1 MB is not yet exhausted. Changing it to a smaller value solves the problem for small mmap_min_addr setups. The values are suggested by Solar Designer: http://www.openwall.com/lists/oss-security/2015/05/02/6 Bug: 26186802 Change-Id: I829d12b95c4793edec07d97d47a3986fd96bedd4 Signed-off-by: Yuan Lin <yualin@google.com>	2016-01-07 15:52:07 -05:00
Vlastimil Babka	895ac1ed5b	mm: when stealing freepages, also take pages created by splitting buddy page When studying page stealing, I noticed some weird looking decisions in try_to_steal_freepages(). The first I assume is a bug (Patch 1), the following two patches were driven by evaluation. Testing was done with stress-highalloc of mmtests, using the mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how often page stealing occurs for individual migratetypes, and what migratetypes are used for fallbacks. Arguably, the worst case of page stealing is when UNMOVABLE allocation steals from MOVABLE pageblock. RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal, so the goal is to minimize these two cases. The evaluation of v2 wasn't always clear win and Joonsoo questioned the results. Here I used different baseline which includes RFC compaction improvements from [1]. I found that the compaction improvements reduce variability of stress-highalloc, so there's less noise in the data. First, let's look at stress-highalloc configured to do sync compaction, and how these patches reduce page stealing events during the test. First column is after fresh reboot, other two are reiterations of test without reboot. That was all accumulater over 5 re-iterations (so the benchmark was run 5x3 times with 5 fresh restarts). Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Page alloc extfrag event 10264225 8702233 10244125 Extfrag fragmenting 10263271 8701552 10243473 Extfrag fragmenting for unmovable 13595 17616 15960 Extfrag fragmenting unmovable placed with movable 7989 12193 8447 Extfrag fragmenting for reclaimable 658 1840 1817 Extfrag fragmenting reclaimable placed with movable 558 1677 1679 Extfrag fragmenting for movable 10249018 8682096 10225696 With Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Page alloc extfrag event 11834954 9877523 9774860 Extfrag fragmenting 11833993 9876880 9774245 Extfrag fragmenting for unmovable 7342 16129 11712 Extfrag fragmenting unmovable placed with movable 4191 10547 6270 Extfrag fragmenting for reclaimable 373 1130 923 Extfrag fragmenting reclaimable placed with movable 302 906 738 Extfrag fragmenting for movable 11826278 9859621 9761610 With Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Page alloc extfrag event 4725990 3668793 3807436 Extfrag fragmenting 4725104 3668252 3806898 Extfrag fragmenting for unmovable 6678 7974 7281 Extfrag fragmenting unmovable placed with movable 2051 3829 4017 Extfrag fragmenting for reclaimable 429 1208 1278 Extfrag fragmenting reclaimable placed with movable 369 976 1034 Extfrag fragmenting for movable 4717997 3659070 3798339 With Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Page alloc extfrag event 5016183 4700142 3850633 Extfrag fragmenting 5015325 4699613 3850072 Extfrag fragmenting for unmovable 1312 3154 3088 Extfrag fragmenting unmovable placed with movable 1115 2777 2714 Extfrag fragmenting for reclaimable 437 1193 1097 Extfrag fragmenting reclaimable placed with movable 330 969 879 Extfrag fragmenting for movable 5013576 4695266 3845887 In v2 we've seen apparent regression with Patch 1 for unmovable events, this is now gone, suggesting it was indeed noise. Here, each patch improves the situation for unmovable events. Reclaimable is improved by patch 1 and then either the same modulo noise, or perhaps sligtly worse - a small price for unmovable improvements, IMHO. The number of movable allocations falling back to other migratetypes is most noisy, but it's reduced to half at Patch 2 nevertheless. These are least critical as compaction can move them around. If we look at success rates, the patches don't affect them, that didn't change. Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%) Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%) Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%) Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%) Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%) Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%) Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%) Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%) Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%) Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%) Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%) Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%) Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%) Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%) Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%) Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%) Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%) Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%) Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%) Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%) Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%) Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%) Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%) Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%) Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%) Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%) Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%) Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%) While there's no improvement here, I consider reduced fragmentation events to be worth on its own. Patch 2 also seems to reduce scanning for free pages, and migrations in compaction, suggesting it has somewhat less work to do: Patch 1: Compaction stalls 4153 3959 3978 Compaction success 1523 1441 1446 Compaction failures 2630 2517 2531 Page migrate success 4600827 4943120 5104348 Page migrate failure 19763 16656 17806 Compaction pages isolated 9597640 10305617 10653541 Compaction migrate scanned 77828948 86533283 87137064 Compaction free scanned 517758295 521312840 521462251 Compaction cost 5503 5932 6110 Patch 2: Compaction stalls 3800 3450 3518 Compaction success 1421 1316 1317 Compaction failures 2379 2134 2201 Page migrate success 4160421 4502708 4752148 Page migrate failure 19705 14340 14911 Compaction pages isolated 8731983 9382374 9910043 Compaction migrate scanned 98362797 96349194 98609686 Compaction free scanned 496512560 469502017 480442545 Compaction cost 5173 5526 5811 As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers of unmovable and reclaimable pageblocks. Configuring the benchmark to allocate like THP page fault (i.e. no sync compaction) gives much noisier results for iterations 2 and 3 after reboot. This is not so surprising given how [1] offers lower improvements in this scenario due to less restarts after deferred compaction which would change compaction pivot. Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-thp-1 5-thp-2 5-thp-3 Page alloc extfrag event 8148965 6227815 6646741 Extfrag fragmenting 8147872 6227130 6646117 Extfrag fragmenting for unmovable 10324 12942 15975 Extfrag fragmenting unmovable placed with movable 5972 8495 10907 Extfrag fragmenting for reclaimable 601 1707 2210 Extfrag fragmenting reclaimable placed with movable 520 1570 2000 Extfrag fragmenting for movable 8136947 6212481 6627932 Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-thp-1 6-thp-2 6-thp-3 Page alloc extfrag event 8345457 7574471 7020419 Extfrag fragmenting 8343546 7573777 7019718 Extfrag fragmenting for unmovable 10256 18535 30716 Extfrag fragmenting unmovable placed with movable 6893 11726 22181 Extfrag fragmenting for reclaimable 465 1208 1023 Extfrag fragmenting reclaimable placed with movable 353 996 843 Extfrag fragmenting for movable 8332825 7554034 6987979 Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-thp-1 7-thp-2 7-thp-3 Page alloc extfrag event 3512847 3020756 2891625 Extfrag fragmenting 3511940 3020185 2891059 Extfrag fragmenting for unmovable 9017 6892 6191 Extfrag fragmenting unmovable placed with movable 1524 3053 2435 Extfrag fragmenting for reclaimable 445 1081 1160 Extfrag fragmenting reclaimable placed with movable 375 918 986 Extfrag fragmenting for movable 3502478 3012212 2883708 Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-thp-1 8-thp-2 8-thp-3 Page alloc extfrag event 3181699 3082881 2674164 Extfrag fragmenting 3180812 3082303 2673611 Extfrag fragmenting for unmovable 1201 4031 4040 Extfrag fragmenting unmovable placed with movable 974 3611 3645 Extfrag fragmenting for reclaimable 478 1165 1294 Extfrag fragmenting reclaimable placed with movable 387 985 1030 Extfrag fragmenting for movable 3179133 3077107 2668277 The improvements for first iteration are clear, the rest is much noisier and can appear like regression for Patch 1. Anyway, patch 2 rectifies it. Allocation success rates are again unaffected so there's no point in making this e-mail any longer. [1] http://marc.info/?l=linux-mm&m=142166196321125&w=2 This patch (of 3): When __rmqueue_fallback() is called to allocate a page of order X, it will find a page of order Y >= X of a fallback migratetype, which is different from the desired migratetype. With the help of try_to_steal_freepages(), it may change the migratetype (to the desired one) also of: 1) all currently free pages in the pageblock containing the fallback page 2) the fallback pageblock itself 3) buddy pages created by splitting the fallback page (when Y > X) These decisions take the order Y into account, as well as the desired migratetype, with the goal of preventing multiple fallback allocations that could e.g. distribute UNMOVABLE allocations among multiple pageblocks. Originally, decision for 1) has implied the decision for 3). Commit `47118af076` ("mm: mmzone: MIGRATE_CMA migration type added") changed that (probably unintentionally) so that the buddy pages in case 3) are always changed to the desired migratetype, except for CMA pageblocks. Commit fef903efcf0c ("mm/page_allo.c: restructure free-page stealing code and fix a bug") did some refactoring and added a comment that the case of 3) is intended. Commit 0cbef29a7821 ("mm: __rmqueue_fallback() should respect pageblock type") removed the comment and tried to restore the original behavior where 1) implies 3), but due to the previous refactoring, the result is instead that only 2) implies 3) - and the conditions for 2) are less frequently met than conditions for 1). This may increase fragmentation in situations where the code decides to steal all free pages from the pageblock (case 1)), but then gives back the buddy pages produced by splitting. This patch restores the original intended logic where 1) implies 3). During testing with stress-highalloc from mmtests, this has shown to decrease the number of events where UNMOVABLE and RECLAIMABLE allocations steal from MOVABLE pageblocks, which can lead to permanent fragmentation. In some cases it has increased the number of events when MOVABLE allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are fixable by sync compaction and thus less harmful. Note that evaluation has shown that the behavior introduced by `47118af076` for buddy pages in case 3) is actually even better than the original logic, so the following patch will introduce it properly once again. For stable backports of this patch it makes thus sense to only fix versions containing 0cbef29a7821. [iamjoonsoo.kim@lge.com: tracepoint fix] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.13+ containing 0cbef29a7821] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-04 14:17:06 -08:00

... 2 3 4 5 6 ...

64551 Commits