summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)AuthorFilesLines
2016-06-01ubsan: fix tree-wide -Wmaybe-uninitialized false positivesAndrey Ryabinin1-0/+5
-fsanitize=* options makes GCC less smart than usual and increase number of 'maybe-uninitialized' false-positives. So this patch does two things: * Add -Wno-maybe-uninitialized to CFLAGS_UBSAN which will disable all such warnings for instrumented files. * Remove CONFIG_UBSAN_SANITIZE_ALL from all[yes|mod]config builds. So the all[yes|mod]config build goes without -fsanitize=* and still with -Wmaybe-uninitialized. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [backport from mainline for UBSAN] Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com> Change-Id: Ia32d029540fcb5ebd19a3ac447a3b6333a173f84
2016-06-01ubsan: cosmetic fix to Kconfig textYang Shi1-1/+3
When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased significantly (~3x). So, it sounds better to have some note in Kconfig. And, fixed a typo. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [backport from mainline for UBSAN] Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com> Change-Id: Ibd4183436723b1cad45eab0ffa5b0a8910edc0e6
2016-06-01UBSAN: run-time undefined behavior sanity checkerAndrey Ryabinin5-0/+574
UBSAN uses compile-time instrumentation to catch undefined behavior (UB). Compiler inserts code that perform certain kinds of checks before operations that could cause UB. If check fails (i.e. UB detected) __ubsan_handle_* function called to print error message. So the most of the work is done by compiler. This patch just implements ubsan handlers printing errors. GCC has this capability since 4.9.x [1] (see -fsanitize=undefined option and its suboptions). However GCC 5.x has more checkers implemented [2]. Article [3] has a bit more details about UBSAN in the GCC. [1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html [2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html [3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ Issues which UBSAN has found thus far are: Found bugs: * out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind") undefined shifts: * d48458d4a768 ("jbd2: use a better hash function for the revoke table") * 10632008b9e1 ("clockevents: Prevent shift out of bounds") * 'x << -1' shift in ext4 - http://lkml.kernel.org/r/<5444EF21.8020501@samsung.com> * undefined rol32(0) - http://lkml.kernel.org/r/<1449198241-20654-1-git-send-email-sasha.levin@oracle.com> * undefined dirty_ratelimit calculation - http://lkml.kernel.org/r/<566594E2.3050306@odin.com> * undefined roundown_pow_of_two(0) - http://lkml.kernel.org/r/<1449156616-11474-1-git-send-email-sasha.levin@oracle.com> * [WONTFIX] undefined shift in __bpf_prog_run - http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com> WONTFIX here because it should be fixed in bpf program, not in kernel. signed overflows: * 32a8df4e0b33f ("sched: Fix odd values in effective_load() calculations") * mul overflow in ntp - http://lkml.kernel.org/r/<1449175608-1146-1-git-send-email-sasha.levin@oracle.com> * incorrect conversion into rtc_time in rtc_time64_to_tm() - http://lkml.kernel.org/r/<1449187944-11730-1-git-send-email-sasha.levin@oracle.com> * unvalidated timespec in io_getevents() - http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com> * [NOTABUG] signed overflow in ktime_add_safe() - http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com> [akpm@linux-foundation.org: fix unused local warning] [akpm@linux-foundation.org: fix __int128 build woes] Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yury Gribov <y.gribov@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [backport from mainline for UBSAN] Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com> Change-Id: I216cb2d9dbfd9fc9e70b8e4515a0e34fcb68822f
2015-06-18revert "cpumask: don't perform while loop in cpumask_next_and()"Andrew Morton1-5/+4
Revert commit 534b483a86e6 ("cpumask: don't perform while loop in cpumask_next_and()"). This was a minor optimization, but it puts a `struct cpumask' on the stack, which consumes too much stack space. Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-14Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds1-2/+2
Pull more MIPS fixes from Ralf Baechle: "Another round of 4.1 MIPS fixes, one fix to a MIPS-specific #if condition in lib/mpi, one fix to the MIPS GIC irqchip driver and one SSB fix. Details: - fix handling of clock in chipco SSB driver. - fix two MIPS-specific #if conditions to correctly work for GCC 5.1. - fix damage to R6 pgtable bits done by XPA support. - fix possible crash due to unloading modules that contain statically defined platform devices. - fix disabling of the MSA ASE on context switch to also work correctly when a new thread/process has the CPU for the very first time. This is part of linux-next and has been beaten to death on Imagination's test farm. While things are not looking too grim this pull request also means the rate of fixes for 4.1 remains nearly constant so I'd not be unhappy if you'd delay the release" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MPI: MIPS: Fix compilation error with GCC 5.1 IRQCHIP: mips-gic: Don't nest calls to do_IRQ() MIPS: MSA: bugfix - disable MSA correctly for new threads/processes. MIPS: Loongson: Do not register 8250 platform device from module. MIPS: Cobalt: Do not build MTD platform device registration code as module. SSB: Fix handling of ssb_pmu_get_alp_clock() MIPS: pgtable-bits: Fix XPA damage to R6 definitions.
2015-06-13MPI: MIPS: Fix compilation error with GCC 5.1Jaedon Shin1-2/+2
This patch fixes mips compilation error: lib/mpi/generic_mpih-mul1.c: In function 'mpihelp_mul_1': lib/mpi/longlong.h:651:2: error: impossible constraint in 'asm' Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com> Cc: Linux-MIPS <linux-mips@linux-mips.org> Patchwork: https://patchwork.linux-mips.org/patch/10546/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-0/+1
Pull networking fixes from David Miller: 1) Fix stack allocation in s390 BPF JIT, from Michael Holzheu. 2) Disable LRO on openvswitch paths, from Jiri Benc. 3) UDP early demux doesn't handle multicast group membership properly, fix from Shawn Bohrer. 4) Fix TX queue hang due to incorrect handling of mixed sized fragments and linearlization in i40e driver, from Anjali Singhai Jain. 5) Cannot use disable_irq() in timer handler of AMD xgbe driver, from Thomas Lendacky. 6) b2net driver improperly assumes pci_alloc_consistent() gives zero'd out memory, use dma_zalloc_coherent(). From Sriharsha Basavapatna. 7) Fix use-after-free in MPLS and ipv6, from Robert Shearman. 8) Missing neif_napi_del() calls in cleanup paths of b44 driver, from Hauke Mehrtens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: replace last open coded skb_orphan_frags with function call net: bcmgenet: power on MII block for all MII modes ipv6: Fix protocol resubmission ipv6: fix possible use after free of dev stats b44: call netif_napi_del() bridge: disable softirqs around br_fdb_update to avoid lockup Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup" mpls: fix possible use after free of device be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent() bridge: use _bh spinlock variant for br_fdb_update to avoid lockup amd-xgbe: Use disable_irq_nosync from within timer function rhashtable: add missing import <linux/export.h> i40e: Make sure to be in VEB mode if SRIOV is enabled at probe i40e: start up in VEPA mode by default i40e/i40evf: Fix mixed size frags and linearization ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() openvswitch: disable LRO s390/bpf: fix bpf frame pointer setup s390/bpf: fix stack allocation
2015-06-07rhashtable: add missing import <linux/export.h>Hauke Mehrtens1-0/+1
rhashtable uses EXPORT_SYMBOL_GPL() without importing linux/export.h directly it is only imported indirectly through some other includes. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-06Merge branch 'stable/for-linus-4.1' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "Tiny little fix which just converts an function to be static. Really tiny" * 'stable/for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: swiotlb: do not export map_single function
2015-06-05swiotlb: do not export map_single functionAlexandre Courbot1-2/+3
The map_single() function is not defined as static, even though it doesn't seem to be used anywhere else in the kernel. Make it static to avoid namespace pollution since this is a rather generic symbol. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-03lib: Clarify the return value of strnlen_user()Jan Kara1-1/+8
strnlen_user() can return a number in a range 0 to count + sizeof(unsigned long) - 1. Clarify the comment at the top of the function so that users don't think the function returns at most count+1. Signed-off-by: Jan Kara <jack@suse.cz> [ Also added commentary about preferably not using this function ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-02lib: Fix strnlen_user() to not touch memory after specified maximumJan Kara1-1/+2
If the specified maximum length of the string is a multiple of unsigned long, we would load one long behind the specified maximum. If that happens to be in a next page, we can hit a page fault although we were not expected to. Fix the off-by-one bug in the test whether we are at the end of the specified range. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-29Merge tag 'xfs-for-linus-4.1-rc6' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This is a little larger than I'd like late in the release cycle, but all the fixes are for regressions introduced in the 4.1-rc1 merge, or are needed back in -stable kernels fairly quickly as they are filesystem corruption or userspace visible correctness issues. Changes in this update: - regression fix for new rename whiteout code - regression fixes for new superblock generic per-cpu counter code - fix for incorrect error return sign introduced in 3.17 - metadata corruption fixes that need to go back to -stable kernels" * tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: fix broken i_nlink accounting for whiteout tmpfile inode xfs: xfs_iozero can return positive errno xfs: xfs_attr_inactive leaves inconsistent attr fork state behind xfs: extent size hints can round up extents past MAXEXTLEN xfs: inode and free block counters need to use __percpu_counter_compare percpu_counter: batch size aware __percpu_counter_compare() xfs: use percpu_counter_read_positive for mp->m_icount
2015-05-29Merge tag 'fixes-for-linus' of ↵Linus Torvalds1-48/+26
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull fixes for cpumask and modules from Rusty Russell: "** NOW WITH TESTING! ** Two fixes which got lost in my recent distraction. One is a weird cpumask function which needed to be rewritten, the other is a module bug which is cc:stable" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: cpumask_set_cpu_local_first => cpumask_local_spread, lament module: Call module notifier on failure after complete_formation()
2015-05-29percpu_counter: batch size aware __percpu_counter_compare()Dave Chinner1-3/+3
XFS uses non-stanard batch sizes for avoiding frequent global counter updates on it's allocated inode counters, as they increment or decrement in batches of 64 inodes. Hence the standard percpu counter batch of 32 means that the counter is effectively a global counter. Currently Xfs uses a batch size of 128 so that it doesn't take the global lock on every single modification. However, Xfs also needs to compare accurately against zero, which means we need to use percpu_counter_compare(), and that has a hard-coded batch size of 32, and hence will spuriously fail to detect when it is supposed to use precise comparisons and hence the accounting goes wrong. Add __percpu_counter_compare() to take a custom batch size so we can use it sanely in XFS and factor percpu_counter_compare() to use it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-28cpumask_set_cpu_local_first => cpumask_local_spread, lamentRusty Russell1-48/+26
da91309e0a7e (cpumask: Utility function to set n'th cpu...) created a genuinely weird function. I never saw it before, it went through DaveM. (He only does this to make us other maintainers feel better about our own mistakes.) cpumask_set_cpu_local_first's purpose is say "I need to spread things across N online cpus, choose the ones on this numa node first"; you call it in a loop. It can fail. One of the two callers ignores this, the other aborts and fails the device open. It can fail in two ways: allocating the off-stack cpumask, or through a convoluted codepath which AFAICT can only occur if cpu_online_mask changes. Which shouldn't happen, because if cpu_online_mask can change while you call this, it could return a now-offline cpu anyway. It contains a nonsensical test "!cpumask_of_node(numa_node)". This was drawn to my attention by Geert, who said this causes a warning on Sparc. It sets a single bit in a cpumask instead of returning a cpu number, because that's what the callers want. It could be made more efficient by passing the previous cpu rather than an index, but that would be more invasive to the callers. Fixes: da91309e0a7e8966d916a74cce42ed170fde06bf Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased) Tested-by: Amir Vadai <amirv@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net>
2015-05-16rhashtable: Add cap on number of elements in hash tableHerbert Xu1-0/+11
We currently have no limit on the number of elements in a hash table. This is a problem because some users (tipc) set a ceiling on the maximum table size and when that is reached the hash table may degenerate. Others may encounter OOM when growing and if we allow insertions when that happens the hash table perofrmance may also suffer. This patch adds a new paramater insecure_max_entries which becomes the cap on the table. If unset it defaults to max_size * 2. If it is also zero it means that there is no cap on the number of elements in the table. However, the table will grow whenever the utilisation hits 100% and if that growth fails, you will get ENOMEM on insertion. As allowing oversubscription is potentially dangerous, the name contains the word insecure. Note that the cap is not a hard limit. This is done for performance reasons as enforcing a hard limit will result in use of atomic ops that are heavier than the ones we currently use. The reasoning is that we're only guarding against a gross over- subscription of the table, rather than a small breach of the limit. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-06Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "An RCU Kconfig fix that eliminates an annoying interactive kconfig question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Control grace-period delays directly from value
2015-05-05kasan: show gcc version requirements in Kconfig and DocumentationJoe Perches1-2/+6
The documentation shows a need for gcc > 4.9.2, but it's really >=. The Kconfig entries don't show require versions so add them. Correct a latter/later typo too. Also mention that gcc 5 required to catch out of bounds accesses to global and stack variables. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-05lib: delete lib/find_last_bit.cYury Norov1-41/+0
The file lib/find_last_bit.c was no longer used and supposed to be deleted by commit 8f6f19dd51 ("lib: move find_last_bit to lib/find_next_bit.c") but that delete didn't happen. This gets rid of it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-1/+1
Pull crypto fixes from Herbert Xu: "This fixes a build problem with bcm63xx and yet another fix to the memzero_explicit function to ensure that the memset is not elided" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm63xx - Fix driver compilation lib: make memzero_explicit more robust against dead store elimination
2015-05-04lib: make memzero_explicit more robust against dead store eliminationDaniel Borkmann1-1/+1
In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. Based on some experiments, icc is a bit special on its own, while it doesn't seem to eliminate the memset(), it could do so with an own implementation, and then result in similar findings as with llvm. The fix in this patch now works for all three compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with gcc/llvm though, with the below code: if we would have used barrier() only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As gcc, clang, but also icc defines __GNUC__, it's sufficient to define this in compiler-gcc.h only to be picked up. For a fallback or otherwise unsupported compiler, we define it as a barrier. Similarly, for ecc which does not support gcc inline asm. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-3/+8
Pull networking fixes from David Miller: 1) mlx4 doesn't check fully for supported valid RSS hash function, fix from Amir Vadai 2) Off by one in ibmveth_change_mtu(), from David Gibson 3) Prevent altera chip from reporting false error interrupts in some circumstances, from Chee Nouk Phoon 4) Get rid of that stupid endless loop trying to allocate a FIN packet in TCP, and in the process kill deadlocks. From Eric Dumazet 5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from Eric Dumazet 6) Fix two bugs in async rhashtable resizing, from Thomas Graf 7) Fix topology server listener socket namespace bug in TIPC, from Ying Xue 8) Add some missing HAS_DMA kconfig dependencies, from Geert Uytterhoeven 9) bgmac driver intends to force re-polling but does so by returning the wrong value from it's ->poll() handler. Fix from Rafał Miłecki 10) When the creater of an rhashtable configures a max size for it, don't bark in the logs and drop insertions when that is exceeded. Fix from Johannes Berg 11) Recover from out of order packets in ppp mppe properly, from Sylvain Rochet * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) bnx2x: really disable TPA if 'disable_tpa' option is set net:treewide: Fix typo in drivers/net net/mlx4_en: Prevent setting invalid RSS hash function mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions netfilter; Add some missing default cases to switch statements in nft_reject. ppp: mppe: discard late packet in stateless mode ppp: mppe: sanity error path rework net/bonding: Make DRV macros private net: rfs: fix crash in get_rps_cpus() altera tse: add support for fixed-links. pxa168: fix double deallocation of managed resources net: fix crash in build_skb() net: eth: altera: Resolve false errors from MSGDMA to TSE ehea: Fix memory hook reference counting crashes net/tg3: Release IRQs on permanent error net: mdio-gpio: support access that may sleep inet: fix possible panic in reqsk_queue_unlink() rhashtable: don't attempt to grow when at max_size bgmac: fix requests for extra polling calls from NAPI tcp: avoid looping in tcp_send_fin() ...
2015-04-24Merge tag 'md/4.1' of git://neil.brown.name/mdLinus Torvalds10-23/+347
Pull md updates from Neil Brown: "More updates that usual this time. A few have performance impacts which hould mostly be positive, but RAID5 (in particular) can be very work-load ensitive... We'll have to wait and see. Highlights: - "experimental" code for managing md/raid1 across a cluster using DLM. Code is not ready for general use and triggers a WARNING if used. However it is looking good and mostly done and having in mainline will help co-ordinate development. - RAID5/6 can now batch multiple (4K wide) stripe_heads so as to handle a full (chunk wide) stripe as a single unit. - RAID6 can now perform read-modify-write cycles which should help performance on larger arrays: 6 or more devices. - RAID5/6 stripe cache now grows and shrinks dynamically. The value set is used as a minimum. - Resync is now allowed to go a little faster than the 'mininum' when there is competing IO. How much faster depends on the speed of the devices, so the effective minimum should scale with device speed to some extent" * tag 'md/4.1' of git://neil.brown.name/md: (58 commits) md/raid5: don't do chunk aligned read on degraded array. md/raid5: allow the stripe_cache to grow and shrink. md/raid5: change ->inactive_blocked to a bit-flag. md/raid5: move max_nr_stripes management into grow_one_stripe and drop_one_stripe md/raid5: pass gfp_t arg to grow_one_stripe() md/raid5: introduce configuration option rmw_level md/raid5: activate raid6 rmw feature md/raid6 algorithms: xor_syndrome() for SSE2 md/raid6 algorithms: xor_syndrome() for generic int md/raid6 algorithms: improve test program md/raid6 algorithms: delta syndrome functions raid5: handle expansion/resync case with stripe batching raid5: handle io error of batch list RAID5: batch adjacent full stripe write raid5: track overwrite disk count raid5: add a new flag to track if a stripe can be batched raid5: use flex_array for scribble data md raid0: access mddev->queue (request queue member) conditionally because it is not set when accessed from dm-raid md: allow resync to go faster when there is competing IO. md: remove 'go_faster' option from ->sync_request() ...
2015-04-22rhashtable: Do not schedule more than one rehash if we can't grow furtherThomas Graf1-2/+2
The current code currently only stops inserting rehashes into the chain when no resizes are currently scheduled. As long as resizes are scheduled and while inserting above the utilization watermark, more and more rehashes will be scheduled. This lead to a perfect DoS storm with thousands of rehashes scheduled which lead to thousands of spinlocks to be taken sequentially. Instead, only allow either a series of resizes or a single rehash. Drop any further rehashes and return -EBUSY. Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22rhashtable: Schedule async resize when sync realloc failsThomas Graf1-1/+6
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow to grow. Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure. This fixes a bug where the table would never continue growing once the utilization is above 100%. Fixes: ccd57b1bd324 ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds1-10/+10
Pull sparc fixes from David Miller: 1) ldc_alloc_exp_dring() can be called from softints, so use GFP_ATOMIC. From Sowmini Varadhan. 2) Some minor warning/build fixups for the new iommu-common code on certain archs and with certain debug options enabled. Also from Sowmini Varadhan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context sparc64: Use M7 PMC write on all chips T4 and onward. iommu-common: rename iommu_pool_hash to iommu_hash_common iommu-common: fix x86_64 compiler warnings
2015-04-22md/raid6 algorithms: xor_syndrome() for SSE2Markus Stockhausen1-3/+227
The second and (last) optimized XOR syndrome calculation. This version supports right and left side optimization. All CPUs with architecture older than Haswell will benefit from it. It should be noted that SSE2 movntdq kills performance for memory areas that are read and written simultaneously in chunks smaller than cache line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR functions. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22md/raid6 algorithms: xor_syndrome() for generic intMarkus Stockhausen1-1/+39
Start the algorithms with the very basic one. It is left and right optimized. That means we can avoid all calculations for unneeded pages above the right stop offset. For pages below the left start offset we still need the syndrome multiplication but without reading data pages. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22md/raid6 algorithms: improve test programMarkus Stockhausen1-15/+36
It is always helpful to have a test tool in place if we implement new data critical algorithms. So add some test routines to the raid6 checker that can prove if the new xor_syndrome() works as expected. Run through all permutations of start/stop pages per algorithm and simulate a xor_syndrome() assisted rmw run. After each rmw check if the recovery algorithm still confirms that the stripe is fine. Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22md/raid6 algorithms: delta syndrome functionsMarkus Stockhausen9-8/+49
v3: s-o-b comment, explanation of performance and descision for the start/stop implementation Implementing rmw functionality for RAID6 requires optimized syndrome calculation. Up to now we can only generate a complete syndrome. The target P/Q pages are always overwritten. With this patch we provide a framework for inplace P/Q modification. In the first place simply fill those functions with NULL values. xor_syndrome() has two additional parameters: start & stop. These will indicate the first and last page that are changing during a rmw run. That makes it possible to avoid several unneccessary loops and speed up calculation. The caller needs to implement the following logic to make the functions work. 1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source blocks inside P/Q between (and including) start and end. 2) modify any block with start <= block <= stop 3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of source blocks into P/Q between (and including) start and end. Pages between start and stop that won't be changed should be filled with a pointer to the kernel zero page. The reasons for not taking NULL pages are: 1) Algorithms cross the whole source data line by line. Thus avoid additional branches. 2) Having a NULL page avoids calculating the XOR P parity but still need calulation steps for the Q parity. Depending on the algorithm unrolling that might be only a difference of 2 instructions per loop. The benchmark numbers of the gen_syndrome() functions are displayed in the kernel log. Do the same for the xor_syndrome() functions. This will help to analyze performance problems and give an rough estimate how well the algorithm works. The choice of the fastest algorithm will still depend on the gen_syndrome() performance. With the start/stop page implementation the speed can vary a lot in real life. E.g. a change of page 0 & page 15 on a stripe will be harder to compute than the case where page 0 & page 1 are XOR candidates. To be not to enthusiatic about the expected speeds we will run a worse case test that simulates a change on the upper half of the stripe. So we do: 1) calculation of P/Q for the upper pages 2) continuation of Q for the lower (empty) pages Signed-off-by: Markus Stockhausen <stockhausen@collogia.de> Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-21Merge tag 'char-misc-4.1-rc1' of ↵Linus Torvalds1-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here's the big char/misc driver patchset for 4.1-rc1. Lots of different driver subsystem updates here, nothing major, full details are in the shortlog. All of this has been in linux-next for a while" * tag 'char-misc-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (133 commits) mei: trace: remove unused TRACE_SYSTEM_STRING DTS: ARM: OMAP3-N900: Add lis3lv02d support Documentation: DT: lis302: update wakeup binding lis3lv02d: DT: add wakeup unit 2 and wakeup threshold lis3lv02d: DT: use s32 to support negative values Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case mei: replace check for connection instead of transitioning mei: use mei_cl_is_connected consistently mei: fix mei_poll operation hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg() Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor function Drivers: hv: hv_balloon: do not online pages in offline blocks hv: remove the per-channel workqueue hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind() hv: run non-blocking message handlers in the dispatch tasklet coresight: moving to new "hwtracing" directory coresight-tmc: Adding a status interface to sysfs coresight: remove the unnecessary configuration coresight-default-sink ...
2015-04-20iommu-common: rename iommu_pool_hash to iommu_hash_commonSowmini Varadhan1-3/+3
When CONFIG_DEBUG_FORCE_WEAK_PER_CPU is set, the DEFINE_PER_CPU_SECTION macro will define an extern __pcpu_unique_##name variable that could conflict with the same definition in powerpc at this time. Avoid that conflict by renaming iommu_pool_hash in iommu-common.c Thanks to Guenter Roeck for catching this, and helping to test the fix. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-20iommu-common: fix x86_64 compiler warningsSowmini Varadhan1-7/+7
Declare iommu_large_alloc as static. Remove extern definition for iommu_tbl_pool_init(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-20Merge tag 'cpumask-next-for-linus' of ↵Linus Torvalds2-32/+0
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull final removal of deprecated cpus_* cpumask functions from Rusty Russell: "This is the final removal (after several years!) of the obsolete cpus_* functions, prompted by their mis-use in staging. With these function removed, all cpu functions should only iterate to nr_cpu_ids, so we finally only allocate that many bits when cpumasks are allocated offstack" * tag 'cpumask-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits) cpumask: remove __first_cpu / __next_cpu cpumask: resurrect CPU_MASK_CPU0 linux/cpumask.h: add typechecking to cpumask_test_cpu cpumask: only allocate nr_cpumask_bits. Fix weird uses of num_online_cpus(). cpumask: remove deprecated functions. mips: fix obsolete cpumask_of_cpu usage. x86: fix more deprecated cpu function usage. ia64: remove deprecated cpus_ usage. powerpc: fix deprecated CPU_MASK_CPU0 usage. CPU_MASK_ALL/CPU_MASK_NONE: remove from deprecated region. staging/lustre/o2iblnd: Don't use cpus_weight staging/lustre/libcfs: replace deprecated cpus_ calls with cpumask_ staging/lustre/ptlrpc: Do not use deprecated cpus_* functions blackfin: fix up obsolete cpu function usage. parisc: fix up obsolete cpu function usage. tile: fix up obsolete cpu function usage. arm64: fix up obsolete cpu function usage. mips: fix up obsolete cpu function usage. x86: fix up obsolete cpu function usage. ...
2015-04-19hexdump: avoid warning in test functionLinus Torvalds1-1/+1
The test_data_1_le[] array is a const array of const char *. To avoid dropping any const information, we need to use "const char * const *", not just "const char **". I'm not sure why the different test arrays end up having different const'ness, but let's make the pointer we use to traverse them as const as possible, since we modify neither the array of pointers _or_ the pointers we find in the array. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-19cpumask: remove __first_cpu / __next_cpuRusty Russell1-21/+0
They were for use by the deprecated first_cpu() and next_cpu() wrappers, but sparc used them directly. They're now replaced by cpumask_first / cpumask_next. And __next_cpu_nr is completely obsolete. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net>
2015-04-18iommu-common: Fix PARISC compile-time warningsSowmini Varadhan1-1/+5
Fixes warnings due to - no DMA_ERROR_CODE on PARISC, - sizeof (unsigned long) == 4 bytes on PARISC. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18Break up monolithic iommu table/lock into finer graularity pools and lockSowmini Varadhan2-1/+267
Investigation of multithreaded iperf experiments on an ethernet interface show the iommu->lock as the hottest lock identified by lockstat, with something of the order of 21M contentions out of 27M acquisitions, and an average wait time of 26 us for the lock. This is not efficient. A more scalable design is to follow the ppc model, where the iommu_map_table has multiple pools, each stretching over a segment of the map, and with a separate lock for each pool. This model allows for better parallelization of the iommu map search. This patch adds the iommu range alloc/free function infrastructure. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18sparc: Revert generic IOMMU allocator.David S. Miller2-225/+1
I applied the wrong version of this patch series, V4 instead of V10, due to a patchwork bundling snafu. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-18Merge branch 'for-mingo' of ↵Ingo Molnar1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fix from Paul E. McKenney: "This series contains a single change that fixes Kconfig asking pointless questions." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2-1/+225
Pull sparc updates from David Miller: "The PowerPC folks have a really nice scalable IOMMU pool allocator that we wanted to make use of for sparc. So here we have a series that abstracts out their code into a common layer that anyone can make use of. Sparc is converted, and the PowerPC folks have reviewed and ACK'd this series and plan to convert PowerPC over as well" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: iommu-common: Fix PARISC compile-time warnings sparc: Make LDC use common iommu poll management functions sparc: Make sparc64 use scalable lib/iommu-common.c functions sparc: Break up monolithic iommu table/lock into finer graularity pools and lock
2015-04-17iommu-common: Fix PARISC compile-time warningsSowmini Varadhan1-1/+5
Fixes warnings due to - no DMA_ERROR_CODE on PARISC, - sizeof (unsigned long) == 4 bytes on PARISC. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-17Merge branch 'akpm' (patches from Andrew)Linus Torvalds9-464/+342
Merge third patchbomb from Andrew Morton: - various misc things - a couple of lib/ optimisations - provide DIV_ROUND_CLOSEST_ULL() - checkpatch updates - rtc tree - befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs - ptrace fixes - fork() fixes - seccomp cleanups - more mmap_sem hold time reductions from Davidlohr * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits) proc: show locks in /proc/pid/fdinfo/X docs: add missing and new /proc/PID/status file entries, fix typos drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic Documentation/spi/spidev_test.c: fix warning drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type .gitignore: ignore *.tar MAINTAINERS: add Mediatek SoC mailing list tomoyo: reduce mmap_sem hold for mm->exe_file powerpc/oprofile: reduce mmap_sem hold for exe_file oprofile: reduce mmap_sem hold for mm->exe_file mips: ip32: add platform data hooks to use DS1685 driver lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text x86: switch to using asm-generic for seccomp.h sparc: switch to using asm-generic for seccomp.h powerpc: switch to using asm-generic for seccomp.h parisc: switch to using asm-generic for seccomp.h mips: switch to using asm-generic for seccomp.h microblaze: use asm-generic for seccomp.h arm: use asm-generic for seccomp.h seccomp: allow COMPAT sigreturn overrides ...
2015-04-17lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help textAndrew Morton1-3/+2
Cc: Yalin Wang <yalin.wang@sonymobile.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17cpumask: don't perform while loop in cpumask_next_and()Sergey Senozhatsky1-4/+5
cpumask_next_and() is looking for cpumask_next() in src1 in a loop and tests if found cpu is also present in src2. remove that loop, perform cpumask_and() of src1 and src2 first and use that new mask to find cpumask_next(). Apart from removing while loop, ./bloat-o-meter on x86_64 shows add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8) function old new delta cpumask_next_and 62 54 -8 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Amir Vadai <amirv@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/bitmap.c: bitmap_[empty,full]: remove code duplicationYury Norov1-30/+0
bitmap_empty() has its own implementation. But it's clearly as simple as: find_first_bit(src, nbits) == nbits The same is true for 'bitmap_full'. Signed-off-by: Yury Norov <yury.norov@gmail.com> Cc: George Spelvin <linux@horizon.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/vsprintf.c: improve put_dec_trunc8 slightlyRasmus Villemoes1-6/+4
I hadn't had enough coffee when I wrote this. Currently, the final increment of buf depends on the value loaded from the table, and causes gcc to emit a cmov immediately before the return. It is smarter to let it depend on r, since the increment can then be computed in parallel with the final load/store pair. It also shaves 16 bytes of .text. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/dma-debug: fix bucket_find_contain()Sebastian Ott1-1/+1
bucket_find_contain() will search the bucket list for a dma_debug_entry. When the entry isn't found it needs to search other buckets too, since only the start address of a dma range is hashed (which might be in a different bucket). A copy of the dma_debug_entry is used to get the previous hash bucket but when its list is searched the original dma_debug_entry is to be used not its modified copy. This fixes false "device driver tries to sync DMA memory it has not allocated" warnings. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Horia Geanta <horia.geanta@freescale.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17lib/vsprintf.c: even faster binary to decimal conversionRasmus Villemoes1-118/+128
The most expensive part of decimal conversion is the divisions by 10 (albeit done using reciprocal multiplication with appropriately chosen constants). I decided to see if one could eliminate around half of these multiplications by emitting two digits at a time, at the cost of a 200 byte lookup table, and it does indeed seem like there is something to be gained, especially on 64 bits. Microbenchmarking shows improvements ranging from -50% (for numbers uniformly distributed in [0, 2^64-1]) to -25% (for numbers heavily biased toward the smaller end, a more realistic distribution). On a larger scale, perf shows that top, one of the big consumers of /proc data, uses 0.5-1.0% fewer cpu cycles. I had to jump through some hoops to get the 32 bit code to compile and run on my 64 bit machine, so I'm not sure how relevant these numbers are, but just for comparison the microbenchmark showed improvements between -30% and -10%. The bloat-o-meter costs are around 150 bytes (the generated code is a little smaller, so it's not the full 200 bytes) on both 32 and 64 bit. I'm aware that extra cache misses won't show up in a microbenchmark as used above, but on the other hand decimal conversions often happen in bulk (for example in the case of top). I have of course tested that the new code generates the same output as the old, for both the first and last 1e10 numbers in [0,2^64-1] and 4e9 'random' numbers in-between. Test and verification code on github: https://github.com/Villemoes/dec. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Tested-by: Jeff Epler <jepler@unpythonic.net> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>