summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2012-07-16memblock: free allocated memblock_reserved_regions laterYinghai Lu1-3/+1
commit 29f6738609e40227dabcc63bfb3b84b3726a75bd upstream. memblock_free_reserved_regions() calls memblock_free(), but memblock_free() would double reserved.regions too, so we could free the old range for reserved.regions. Also tj said there is another bug which could be related to this. | I don't think we're saving any noticeable | amount by doing this "free - give it to page allocator - reserve | again" dancing. We should just allocate regions aligned to page | boundaries and free them later when memblock is no longer in use. in that case, when DEBUG_PAGEALLOC, will get panic: memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39 BUG: unable to handle kernel paging request at ffff88102febd948 IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU 0 Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447 RIP: 0010:[<ffffffff836a5774>] [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155 See the discussion at https://lkml.org/lkml/2012/6/13/469 So try to allocate with PAGE_SIZE alignment and free it later. Reported-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16memory hotplug: fix invalid memory access caused by stale kswapd pointerJiang Liu1-1/+1
commit d8adde17e5f858427504725218c56aef90e90fc7 upstream. kswapd_stop() is called to destroy the kswapd work thread when all memory of a NUMA node has been offlined. But kswapd_stop() only terminates the work thread without resetting NODE_DATA(nid)->kswapd to NULL. The stale pointer will prevent kswapd_run() from creating a new work thread when adding memory to the memory-less NUMA node again. Eventually the stale pointer may cause invalid memory access. An example stack dump as below. It's reproduced with 2.6.32, but latest kernel has the same issue. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81051a94>] exit_creds+0x12/0x78 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/memory/memory391/state CPU 11 Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285 RIP: 0010:exit_creds+0x12/0x78 RSP: 0018:ffff8806044f1d78 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502 RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000 RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10 R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000 R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001 FS: 00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600) Stack: ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1 0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003 Call Trace: __put_task_struct+0x5d/0x97 kthread_stop+0x50/0x58 offline_pages+0x324/0x3da memory_block_change_state+0x179/0x1db store_mem_state+0x9e/0xbb sysfs_write_file+0xd0/0x107 vfs_write+0xad/0x169 sys_write+0x45/0x6e system_call_fastpath+0x16/0x1b Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0 RIP exit_creds+0x12/0x78 RSP <ffff8806044f1d78> CR2: 0000000000000000 [akpm@linux-foundation.org: add pglist_data.kswapd locking comments] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16splice: fix racy pipe->buffers usesEric Dumazet1-4/+4
commit 047fe3605235888f3ebcda0c728cb31937eadfe6 upstream. Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 3.2: - Adjust context in vmsplice_to_pipe() - Update one more call to splice_shrink_spd(), from skb_splice_bits()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16thp: avoid atomic64_read in pmd_read_atomic for 32bit PAEAndrea Arcangeli1-0/+10
commit e4eed03fd06578571c01d4f1478c874bb432c815 upstream. In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under Xen. So instead of dealing only with "consistent" pmdvals in pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals where the low 32bit and high 32bit could be inconsistent (to avoid having to use cmpxchg8b). The only guarantee we get from pmd_read_atomic is that if the low part of the pmd was found null, the high part will be null too (so the pmd will be considered unstable). And if the low part of the pmd is found "stable" later, then it means the whole pmd was read atomically (because after a pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore, and we read the high part after the low part). In the 32bit PAE x86 case, it is enough to read the low part of the pmdval atomically to declare the pmd as "stable" and that's true for THP and no THP, furthermore in the THP case we also have a barrier() that will prevent any inconsistent pmdvals to be cached by a later re-read of the *pmd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race conditionAndrea Arcangeli1-2/+20
commit 26c191788f18129af0eb32a358cdaea0c7479626 upstream. When holding the mmap_sem for reading, pmd_offset_map_lock should only run on a pmd_t that has been read atomically from the pmdp pointer, otherwise we may read only half of it leading to this crash. PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic" #0 [f06a9dd8] crash_kexec at c049b5ec #1 [f06a9e2c] oops_end at c083d1c2 #2 [f06a9e40] no_context at c0433ded #3 [f06a9e64] bad_area_nosemaphore at c043401a #4 [f06a9e6c] __do_page_fault at c0434493 #5 [f06a9eec] do_page_fault at c083eb45 #6 [f06a9f04] error_code (via page_fault) at c083c5d5 EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000 DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0 CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246 #7 [f06a9f38] _spin_lock at c083bc14 #8 [f06a9f44] sys_mincore at c0507b7d #9 [f06a9fb0] system_call at c083becd start len EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00 SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033 CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286 This should be a longstanding bug affecting x86 32bit PAE without THP. Only archs with 64bit large pmd_t and 32bit unsigned long should be affected. With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when the pmd transition from none to stable, by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is enabled a new set of problem arises by the fact could then transition freely in any of the none, pmd_trans_huge or pmd_trans_stable states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad() unconditional isn't good idea and it would be a flakey solution. This should be fully fixed by introducing a pmd_read_atomic that reads the pmd in order with THP disabled, or by reading the pmd atomically with cmpxchg8b with THP enabled. Luckily this new race condition only triggers in the places that must already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix is localized there but this bug is not related to THP. NOTE: this can trigger on x86 32bit systems with PAE enabled with more than 4G of ram, otherwise the high part of the pmd will never risk to be truncated because it would be zero at all times, in turn so hiding the SMP race. This bug was discovered and fully debugged by Ulrich, quote: ---- [..] pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and eax. 496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) 497 { 498 /* depend on compiler for an atomic pmd read */ 499 pmd_t pmdval = *pmd; // edi = pmd pointer 0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi ... // edx = PTE page table high address 0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx ... // eax = PTE page table low address 0xc0507a8e <sys_mincore+574>: mov (%edi),%eax [..] Please note that the PMD is not read atomically. These are two "mov" instructions where the high order bits of the PMD entry are fetched first. Hence, the above machine code is prone to the following race. - The PMD entry {high|low} is 0x0000000000000000. The "mov" at 0xc0507a84 loads 0x00000000 into edx. - A page fault (on another CPU) sneaks in between the two "mov" instructions and instantiates the PMD. - The PMD entry {high|low} is now 0x00000003fda38067. The "mov" at 0xc0507a8e loads 0xfda38067 into eax. ---- Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16SUNRPC: new svc_bind() routine introducedStanislav Kinsbursky1-0/+1
upstream commit 9793f7c88937e7ac07305ab1af1a519225836823. This new routine is responsible for service registration in a specified network context. The idea is to separate service creation from per-net operations. Note also: since registering service with svc_bind() can fail, the service will be destroyed and during destruction it will try to unregister itself from rpcbind. In this case unregistration has to be skipped. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16Lockd: pass network namespace to creation and destruction routinesStanislav Kinsbursky1-2/+2
upstream commit e3f70eadb7dddfb5a2bb9afff7abfc6ee17a29d0. v2: dereference of most probably already released nlm_host removed in nlmclnt_done() and reclaimer(). These routines are called from locks reclaimer() kernel thread. This thread works in "init_net" network context and currently relays on persence on lockd thread and it's per-net resources. Thus lockd_up() and lockd_down() can't relay on current network context. So let's pass corrent one into them. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16PCI: EHCI: fix crash during suspend on ASUS computersAlan Stern1-2/+0
commit dbf0e4c7257f8d684ec1a3c919853464293de66e upstream. Quite a few ASUS computers experience a nasty problem, related to the EHCI controllers, when going into system suspend. It was observed that the problem didn't occur if the controllers were not put into the D3 power state before starting the suspend, and commit 151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during suspend on ASUS computers) was created to do this. It turned out this approach messed up other computers that didn't have the problem -- it prevented USB wakeup from working. Consequently commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b (USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2) was merged; it reverted the earlier commit and added a whitelist of known good board names. Now we know the actual cause of the problem. Thanks to AceLan Kao for tracking it down. According to him, an engineer at ASUS explained that some of their BIOSes contain a bug that was added in an attempt to work around a problem in early versions of Windows. When the computer goes into S3 suspend, the BIOS tries to verify that the EHCI controllers were first quiesced by the OS. Nothing's wrong with this, but the BIOS does it by checking that the PCI COMMAND registers contain 0 without checking the controllers' power state. If the register isn't 0, the BIOS assumes the controller needs to be quiesced and tries to do so. This involves making various MMIO accesses to the controller, which don't work very well if the controller is already in D3. The end result is a system hang or memory corruption. Since the value in the PCI COMMAND register doesn't matter once the controller has been suspended, and since the value will be restored anyway when the controller is resumed, we can work around the BIOS bug simply by setting the register to 0 during system suspend. This patch (as1590) does so and also reverts the second commit mentioned above, which is now unnecessary. In theory we could do this for every PCI device. However to avoid introducing new problems, the patch restricts itself to EHCI host controllers. Finally the affected systems can suspend with USB wakeup working properly. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728 Based-on-patch-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Dâniel Fraga <fragabr@gmail.com> Tested-by: Javier Marcet <jmarcet@gmail.com> Tested-by: Andrey Rahmatullin <wrar@wrar.name> Tested-by: Oleksij Rempel <bug-track@fisher-privat.net> Tested-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16SCSI: libsas: fix taskfile corruption in sas_ata_qc_fill_rtfDan Williams1-2/+4
commit 6ef1b512f4e6f936d89aa20be3d97a7ec7c290ac upstream. fill_result_tf() grabs the taskfile flags from the originating qc which sas_ata_qc_fill_rtf() promptly overwrites. The presence of an ata_taskfile in the sata_device makes it tempting to just copy the full contents in sas_ata_qc_fill_rtf(). However, libata really only wants the fis contents and expects the other portions of the taskfile to not be touched by ->qc_fill_rtf. To that end store a fis buffer in the sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf() implementation. Reported-by: Praveen Murali <pmurali@logicube.com> Tested-by: Praveen Murali <pmurali@logicube.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16SCSI: Fix NULL dereferences in scsi_cmd_to_driverMark Rustad1-1/+7
commit 222a806af830fda34ad1f6bc991cd226916de060 upstream. Avoid crashing if the private_data pointer happens to be NULL. This has been seen sometimes when a host reset happens, notably when there are many LUNs: host3: Assigned Port ID 0c1601 scsi host3: libfc: Host reset succeeded on port (0c1601) BUG: unable to handle kernel NULL pointer dereference at 0000000000000350 IP: [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 <snip> Process scsi_eh_3 (pid: 4144, threadinfo ffff88030920c000, task ffff880326b160c0) Stack: 000000010372e6ba 0000000000000282 000027100920dca0 ffffffffa0038ee0 0000000000000000 0000000000030003 ffff88030920dc80 ffff88030920dc80 00000002000e0000 0000000a00004000 ffff8803242f7760 ffff88031326ed80 Call Trace: [<ffffffff8105b590>] ? lock_timer_base+0x70/0x70 [<ffffffff81352fbe>] scsi_eh_tur+0x3e/0xc0 [<ffffffff81353a36>] scsi_eh_test_devices+0x76/0x170 [<ffffffff81354125>] scsi_eh_host_reset+0x85/0x160 [<ffffffff81354291>] scsi_eh_ready_devs+0x91/0x110 [<ffffffff813543fd>] scsi_unjam_host+0xed/0x1f0 [<ffffffff813546a8>] scsi_error_handler+0x1a8/0x200 [<ffffffff81354500>] ? scsi_unjam_host+0x1f0/0x1f0 [<ffffffff8106ec3e>] kthread+0x9e/0xb0 [<ffffffff81509264>] kernel_thread_helper+0x4/0x10 [<ffffffff8106eba0>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81509260>] ? gs_change+0x13/0x13 Code: 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 87 80 00 00 00 48 8d b5 60 ff ff ff 89 d1 48 89 fb 41 89 d6 4c 89 fa 48 8b 80 b8 00 00 00 <48> 8b 80 50 03 00 00 48 8b 00 48 89 85 38 ff ff ff 48 8b 07 4c RIP [<ffffffff81352bb8>] scsi_send_eh_cmnd+0x58/0x3a0 RSP <ffff88030920dc50> CR2: 0000000000000350 Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Marcus Dennis <marcusx.e.dennis@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16aio: make kiocb->private NUll in init_sync_kiocb()Junxiao Bi1-0/+1
commit 2dfd06036ba7ae8e7be2daf5a2fff1dac42390bf upstream. Ocfs2 uses kiocb.*private as a flag of unsigned long size. In commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned io flag is involved in it to serialize the unaligned aio. As *private is not initialized in init_sync_kiocb() of do_sync_write(), this unaligned io flag may be unexpectly set in an aligned dio. And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased to -1 in ocfs2_dio_end_io(), thus the following unaligned dio will hang forever at ocfs2_aiodio_wait() in ocfs2_file_aio_write(). Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16SCSI & usb-storage: add try_rc_10_first flagAlan Stern1-0/+1
commit 6a0bdffa0073857870a4ed1b4489762146359eb4 upstream. Several bug reports have been received recently for USB mass-storage devices that don't handle READ CAPACITY(16) commands properly. They report bogus sizes, in some cases becoming unusable as a result. The bugs were triggered by commit 09b6b51b0b6c1b9bb61815baf205e4d74c89ff04 (SCSI & usb-storage: add flags for VPD pages and REPORT LUNS), which caused usb-storage to stop overriding the SCSI level reported by devices. By default, the sd driver will try READ CAPACITY(16) first for any device whose level is above SCSI_SPC_2. It seems likely that any device large enough to require the use of READ CAPACITY(16) (i.e., 2 TB or more) would be able to handle READ CAPACITY(10) commands properly. Indeed, I don't know of any devices that don't handle READ CAPACITY(10) properly. Therefore this patch (as1559) adds a new flag telling the sd driver to try READ CAPACITY(10) before READ CAPACITY(16), and sets this flag for every USB mass-storage device. If a device really is larger than 2 TB, sd will fall back to READ CAPACITY(16) just as it used to. This fixes Bugzilla #43391. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Hans de Goede <hdegoede@redhat.com> CC: "James E.J. Bottomley" <JBottomley@parallels.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16rpmsg: make sure inflight messages don't invoke just-removed callbacksOhad Ben-Cohen1-0/+3
commit 15fd943af50dbc5f7f4de33835795c72595f7bf4 upstream. When inbound messages arrive, rpmsg core looks up their associated endpoint (by destination address) and then invokes their callback. We've made sure that endpoints will never be de-allocated after they were found by rpmsg core, but we also need to protect against the (rare) scenario where the rpmsg driver was just removed, and its callback function isn't available anymore. This is achieved by introducing a callback mutex, which must be taken before the callback is invoked, and, obviously, before it is removed. Reported-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16rpmsg: avoid premature deallocation of endpointsOhad Ben-Cohen1-0/+3
commit 5a081caa0414b9bbb82c17ffab9d6fe66edbb72f upstream. When an inbound message arrives, the rpmsg core looks up its associated endpoint and invokes the registered callback. If a message arrives while its endpoint is being removed (because the rpmsg driver was removed, or a recovery of a remote processor has kicked in) we must ensure atomicity, i.e.: - Either the ept is removed before it is found or - The ept is found but will not be freed until the callback returns This is achieved by maintaining a per-ept reference count, which, when drops to zero, will trigger deallocation of the ept. With this in hand, it is now forbidden to directly deallocate epts once they have been added to the endpoints idr. Reported-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16mm: fix slab->page _count corruption when using slubPravin B Shelar1-0/+10
commit abca7c4965845924f65d40e0aa1092bdd895e314 upstream. On arches that do not support this_cpu_cmpxchg_double() slab_lock is used to do atomic cmpxchg() on double word which contains page->_count. The page count can be changed from get_page() or put_page() without taking slab_lock. That corrupts page counter. Fix it by moving page->_count out of cmpxchg_double data. So that slub does no change it while updating slub meta-data in struct page. [akpm@linux-foundation.org: use standard comment layout, tweak comment text] Reported-by: Amey Bhide <abhide@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16net: remove skb_orphan_try()Eric Dumazet1-5/+2
[ Upstream commit 62b1a8ab9b3660bb820d8dfe23148ed6cda38574 ] Orphaning skb in dev_hard_start_xmit() makes bonding behavior unfriendly for applications sending big UDP bursts : Once packets pass the bonding device and come to real device, they might hit a full qdisc and be dropped. Without orphaning, the sender is automatically throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming sk_sndbuf is not too big) We could try to defer the orphaning adding another test in dev_hard_start_xmit(), but all this seems of little gain, now that BQL tends to make packets more likely to be parked in Qdisc queues instead of NIC TX ring, in cases where performance matters. Reverts commits : fc6055a5ba31 net: Introduce skb_orphan_try() 87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try() and removes SKBTX_DRV_NEEDS_SK_REF flag Reported-and-bisected-by: Jean-Michel Hautbois <jhautbois@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16bonding: Fix corrupted queue_mappingEric Dumazet1-2/+5
[ Upstream commit 5ee31c6898ea5537fcea160999d60dc63bc0c305 ] In the transmit path of the bonding driver, skb->cb is used to stash the skb->queue_mapping so that the bonding device can set its own queue mapping. This value becomes corrupted since the skb->cb is also used in __dev_xmit_skb. When transmitting through bonding driver, bond_select_queue is called from dev_queue_xmit. In bond_select_queue the original skb->queue_mapping is copied into skb->cb (via bond_queue_mapping) and skb->queue_mapping is overwritten with the bond driver queue. Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes the packet length into skb->cb, thereby overwriting the stashed queue mappping. In bond_dev_queue_xmit (called from hard_start_xmit), the queue mapping for the skb is set to the stashed value which is now the skb length and hence is an invalid queue for the slave device. If we want to save skb->queue_mapping into skb->cb[], best place is to add a field in struct qdisc_skb_cb, to make sure it wont conflict with other layers (eg : Qdiscc, Infiniband...) This patchs also makes sure (struct qdisc_skb_cb)->data is aligned on 8 bytes : netem qdisc for example assumes it can store an u64 in it, without misalignment penalty. Note : we only have 20 bytes left in (struct qdisc_skb_cb)->data[]. The largest user is CHOKe and it fills it. Based on a previous patch from Tom Herbert. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Tom Herbert <therbert@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Roland Dreier <roland@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16inetpeer: fix a race in inetpeer_gc_worker()Eric Dumazet1-1/+4
[ Upstream commit 55432d2b543a4b6dfae54f5c432a566877a85d90 ] commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with the routing cache) added a race : Before freeing an inetpeer, we must respect a RCU grace period, and make sure no user will attempt to increase refcnt. inetpeer_invalidate_tree() waits for a RCU grace period before inserting inetpeer tree into gc_list and waking the worker. At that time, no concurrent lookup can find a inetpeer in this tree. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-16cipso: handle CIPSO options correctly when NetLabel is disabledPaul Moore1-1/+28
[ Upstream commit 20e2a86485967c385d7c7befc1646e4d1d39362e ] When NetLabel is not enabled, e.g. CONFIG_NETLABEL=n, and the system receives a CIPSO tagged packet it is dropped (cipso_v4_validate() returns non-zero). In most cases this is the correct and desired behavior, however, in the case where we are simply forwarding the traffic, e.g. acting as a network bridge, this becomes a problem. This patch fixes the forwarding problem by providing the basic CIPSO validation code directly in ip_options_compile() without the need for the NetLabel or CIPSO code. The new validation code can not perform any of the CIPSO option label/value verification that cipso_v4_validate() does, but it can verify the basic CIPSO option format. The behavior when NetLabel is enabled is unchanged. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22USB: add NO_D3_DURING_SLEEP flag and revert 151b61284776be2Alan Stern2-2/+2
commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b upstream. This patch (as1558) fixes a problem affecting several ASUS computers: The machine crashes or corrupts memory when going into suspend if the ehci-hcd driver is bound to any controllers. Users have been forced to unbind or unload ehci-hcd before putting their systems to sleep. After extensive testing, it was determined that the machines don't like going into suspend when any EHCI controllers are in the PCI D3 power state. Presumably this is a firmware bug, but there's nothing we can do about it except to avoid putting the controllers in D3 during system sleep. The patch adds a new flag to indicate whether the problem is present, and avoids changing the controller's power state if the flag is set. Runtime suspend is unaffected; this matters only for system suspend. However as a side effect, the controller will not respond to remote wakeup requests while the system is asleep. Hence USB wakeup is not functional -- but of course, this is already true in the current state of affairs. A similar patch has already been applied as commit 151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during suspend on ASUS computers). The patch supersedes that one and reverts it. There are two differences: The old patch added the flag at the USB level; this patch adds it at the PCI level. The old patch applied to all chipsets with the same vendor, subsystem vendor, and product IDs; this patch makes an exception for a known-good system (based on DMI information). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Dâniel Fraga <fragabr@gmail.com> Tested-by: Andrey Rahmatullin <wrar@wrar.name> Tested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-22swap: fix shmem swapping when more than 8 areasHugh Dickins1-3/+5
commit 9b15b817f3d62409290fd56fe3cbb076a931bb0a upstream. Minchan Kim reports that when a system has many swap areas, and tmpfs swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read back the page cannot locate it, and the read fails with -ENOMEM. Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry( swp_entry_to_pte()) technique for determining maximum usable swap offset, without stopping to realize that that actually depends upon the pte swap encoding shifting swap offset to the higher bits and truncating it there. Whereas our radix_tree swap encoding leaves offset in the lower bits: it's swap "type" (that is, index of swap area) that was truncated. Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header(). This does not reduce the usable size of a swap area any further, it leaves it as claimed when making the original commit: no change from 3.0 on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB per swapfile on i386 with PAE. It's not a change I would have risked five years ago, but with x86_64 supported for ten years, I believe it's appropriate now. Hmm, and what if some architecture implements its swap pte with offset encoded below type? That would equally break the maximum usable swap offset check. Happily, they all follow the same tradition of encoding offset above type, but I'll prepare a check on that for next. Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17libata: add a host flag to ignore detected ATA devicesAndy Whitcroft1-0/+1
commit db63a4c8115a0bb904496e1cdd3e7488e68b0d06 upstream. Where devices are visible via more than one host we sometimes wish to indicate that cirtain devices should be ignored on a specific host. Add a host flag indicating that this host wishes to ignore ATA specific devices. Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Cc: Victor Miasnikov <vvm@tut.by> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-17module_param: stop double-calling parameters.Rusty Russell1-5/+5
commit ae82fdb1406ad41d68f07027fe31f2d35ba22a90 upstream. Commit 026cee0086fe1df4cf74691cf273062cc769617d "params: <level>_initcall-like kernel parameters" set old-style module parameters to level 0. And we call those level 0 calls where we used to, early in start_kernel(). We also loop through the initcall levels and call the levelled module_params before the corresponding initcall. Unfortunately level 0 is early_init(), so we call the standard module_param calls twice. (Turns out most things don't care, but at least ubi.mtd does). Change the level to -1 for standard module_param calls. Reported-by: Benoît Thébaudeau <benoit.thebaudeau@advansee.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10drm/radeon/kms: add new SI PCI idsAlex Deucher1-2/+3
commit 7aaa61b3476462b69f1ac7669fcca8d608ce3cb5 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10drm/radeon/kms: add new BTC PCI idsAlex Deucher1-0/+2
commit a2bef8ce826dd1e787fd8ad9b6e0566ba59dab43 upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10drm/radeon/kms: add new Palm, Sumo PCI idsAlex Deucher1-0/+2
commit 4a6991cc1fad514745b79181df3ace72d561e7aa upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10drm/radeon/kms: add new Trinity PCI idsAlex Deucher1-0/+8
commit d430f7dbf7bd6aaaa40c0660b3204df8cf07b22b upstream. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10radix-tree: fix contiguous iteratorKonstantin Khlebnikov1-1/+4
commit fffaee365fded09f9ebf2db19066065fa54323c3 upstream. This patch fixes bug in macro radix_tree_for_each_contig(). If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following radix_tree_next_chunk() switches iterating into next chunk. As result iterating becomes non-contiguous and breaks vfs "splice" and all its users. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reported-and-bisected-by: Hans de Bruin <jmdebruin@xmsnet.nl> Reported-and-bisected-by: Ondrej Zary <linux@rainbow-software.org> Reported-bisected-and-tested-by: Toralf Förster <toralf.foerster@gmx.de> Link: https://lkml.org/lkml/2012/6/5/64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10skb: avoid unnecessary reallocations in __skb_cowFelix Fietkau1-2/+0
[ Upstream commit 617c8c11236716dcbda877e764b7bf37c6fd8063 ] At the beginning of __skb_cow, headroom gets set to a minimum of NET_SKB_PAD. This causes unnecessary reallocations if the buffer was not cloned and the headroom is just below NET_SKB_PAD, but still more than the amount requested by the caller. This was showing up frequently in my tests on VLAN tx, where vlan_insert_tag calls skb_cow_head(skb, VLAN_HLEN). Locally generated packets should have enough headroom, and for forward paths, we already have NET_SKB_PAD bytes of headroom, so we don't need to add any extra space here. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10ipv6: fix incorrect ipsec fragmentGao feng1-0/+1
[ Upstream commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ] Since commit ad0081e43a "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed" the fragment of packets is incorrect. because tunnel mode needs IPsec headers and trailer for all fragments, while on transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. so modify mtu and maxfraglen base on ipsec mode and if fragment is first or last. with my test,it work well(every fragment's size is the mtu) and does not trigger slow fragment path. Changes from v1: though optimization, mtu_prev and maxfraglen_prev can be delete. replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL. add fuction ip6_append_data_mtu to make codes clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-10kbuild: install kernel-page-flags.hUlrich Drepper2-0/+5
commit 9295b7a07c859a42346221b5839be0ae612333b0 upstream. Programs using /proc/kpageflags need to know about the various flags. The <linux/kernel-page-flags.h> provides them and the comments in the file indicate that it is supposed to be used by user-level code. But the file is not installed. Install the headers and mark the unstable flags as out-of-bounds. The page-type tool is also adjusted to not duplicate the definitions Signed-off-by: Ulrich Drepper <drepper@gmail.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01mmc: sdio: avoid spurious calls to interrupt handlersNicolas Pitre1-0/+2
commit bbbc4c4d8c5face097d695f9bf3a39647ba6b7e7 upstream. Commit 06e8935feb ("optimized SDIO IRQ handling for single irq") introduced some spurious calls to SDIO function interrupt handlers, such as when the SDIO IRQ thread is started, or the safety check performed upon a system resume. Let's add a flag to perform the optimization only when a real interrupt is signaled by the host driver and we know there is no point confirming it. Reported-by: Sujit Reddy Thumma <sthumma@codeaurora.org> Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01xen: do not map the same GSI twice in PVHVM guests.Stefano Stabellini1-0/+3
commit 68c2c39a76b094e9b2773e5846424ea674bf2c46 upstream. PV on HVM guests map GSIs into event channels. At restore time the event channels are resumed by restore_pirqs. Device drivers might try to register the same GSI again through ACPI at restore time, but the GSI has already been mapped and bound by restore_pirqs. This patch detects these situations and avoids mapping the same GSI multiple times. Without this patch we get: (XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped and waste a pirq. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-06-01usbhid: prevent deadlock during timeoutOliver Neukum1-0/+3
commit 8815bb09af21316aeb5f8948b24ac62181670db2 upstream. On some HCDs usb_unlink_urb() can directly call the completion handler. That limits the spinlocks that can be taken in the handler to locks not held while calling usb_unlink_urb() To prevent a race with resubmission, this patch exposes usbcore's infrastructure for blocking submission, uses it and so drops the lock without causing a race in usbhid. Signed-off-by: Oliver Neukum <oneukum@suse.de> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-19Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds3-7/+3
Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overlow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()
2012-05-17Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and ↵Linus Torvalds1-0/+2
'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf, x86 and scheduler updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing: Do not enable function event with enable perf stat: handle ENXIO error for perf_event_open perf: Turn off compiler warnings for flex and bison generated files perf stat: Fix case where guest/host monitoring is not supported by kernel perf build-id: Fix filename size calculation * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable x86: Fix section annotation of acpi_map_cpu2node() x86/microcode: Ensure that module is only loaded on supported Intel CPUs * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
2012-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-1/+19
Pull networking tree from David Miller: 1) ptp_pch driver build broke during this merge window due to missing slab.h header, fix from Geery Uytterhoeven. 2) If ipset passes in a bogus hash table size we crash because the size is not validated properly. Compounding this, gcc-4.7 can miscompile ipset such that even when the user specifies legitimate parameters the tool passes in an out-of-range size to the kernel. Fix from Jozsef Kadlecsik. 3) Users have reported that the netdev watchdog can trigger with pch_gbe devices, and it turns out this is happening because of races in the TX path of the driver leading to the transmitter hanging. Fix from Eric Dumazet, reported and tested by Andy Cress. 4) Novatel USB551L devices match the generic class entries for the cdc ethernet USB driver, but they don't work because they have generic descriptors and thus need FLAG_WWAN to function properly. Add the necessary ID table entry to fix this, from Dan Williams. 5) A recursive locking fix in the USBNET driver added a new problem, in that packet list traversal is now racy and we can thus access unlinked SKBs and crash. Avoid this situation by adding some extra state tracking, from Ming Lei. 6) The rtlwifi conversion to asynchronous firmware loading is racy, fix by reordering the probe procedure. From Larry Finger. Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43187 7) Fix regressions with bluetooth keyboards by notifying userland properly when the security level changes, from Gustavo Padovan. 8) Bluetooth needs to make sure device connected events are emitted before other kinds of events, otherwise userspace will think there is no baseband link yet and therefore abort the sockets associated with that connection. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netfilter: ipset: fix hash size checking in kernel ptp_pch: Add missing #include <linux/slab.h> pch_gbe: fix transmit races cdc_ether: add Novatel USB551L device IDs for FLAG_WWAN usbnet: fix skb traversing races during unlink(v2) Bluetooth: mgmt: Fix device_connected sending order Bluetooth: notify userspace of security level change rtlwifi: fix for race condition when firmware is cached
2012-05-16netfilter: ipset: fix hash size checking in kernelJozsef Kadlecsik1-0/+16
The hash size must fit both into u32 (jhash) and the max value of size_t. The missing checking could lead to kernel crash, bug reported by Seblu. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16Merge branch 'for-davem' of ↵David S. Miller1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John Linville says: Here are three more fixes that some of my developers are desperate to see included in 3.4... Johan Hedberg went to some length justifyng the inclusion of these two Bluetooth fixes: "The device_connected fix should be quite self-explanatory, but it's actually a wider issue than just for keyboards. All profiles that do incoming connection authorization (e.g. headsets) will break without it with specific hardware. The reason it wasn't caught earlier is that it only occurs with specific Bluetooth adapters. As for the security level patch, this fixes L2CAP socket based security level elevation during a connection. The HID profile needs this (for keyboards) and it is the only way to achieve the security level elevation when using the management interface to talk to the kernel (hence the management enabling patch being the one that exposes this" The rtlwifi fix addresses a regression related to firmware loading, as described in kernel.org bug 43187. It basically just moves a hunk of code to a more appropriate place. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15Merge branch 'master' of ↵John W. Linville1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-05-15usbnet: fix skb traversing races during unlink(v2)Ming Lei1-1/+2
Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid recursive locking in usbnet_stop()) fixes the recursive locking problem by releasing the skb queue lock before unlink, but may cause skb traversing races: - after URB is unlinked and the queue lock is released, the refered skb and skb->next may be moved to done queue, even be released - in skb_queue_walk_safe, the next skb is still obtained by next pointer of the last skb - so maybe trigger oops or other problems This patch extends the usage of entry->state to describe 'start_unlink' state, so always holding the queue(rx/tx) lock to change the state if the referd skb is in rx or tx queue because we need to know if the refered urb has been started unlinking in unlink_urbs. The other part of this patch is based on Huajun's patch: always traverse from head of the tx/rx queue to get skb which is to be unlinked but not been started unlinking. Signed-off-by: Huajun Li <huajun.li.lee@gmail.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Oliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15block: fix buffer overflow when printing partition UUIDsTejun Heo1-6/+0
6d1d8050b4bc8 "block, partition: add partition_meta_info to hd_struct" added part_unpack_uuid() which assumes that the passed in buffer has enough space for sprintfing "%pU" - 37 characters including '\0'. Unfortunately, b5af921ec0233 "init: add support for root devices specified by partition UUID" supplied 33 bytes buffer to the function leading to the following panic with stackprotector enabled. Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e [<ffffffff815e226b>] panic+0xba/0x1c6 [<ffffffff81b14c7e>] ? printk_all_partitions+0x259/0x26xb [<ffffffff810566bb>] __stack_chk_fail+0x1b/0x20 [<ffffffff81b15c7e>] printk_all_paritions+0x259/0x26xb [<ffffffff81aedfe0>] mount_block_root+0x1bc/0x27f [<ffffffff81aee0fa>] mount_root+0x57/0x5b [<ffffffff81aee23b>] prepare_namespace+0x13d/0x176 [<ffffffff8107eec0>] ? release_tgcred.isra.4+0x330/0x30 [<ffffffff81aedd60>] kernel_init+0x155/0x15a [<ffffffff81087b97>] ? schedule_tail+0x27/0xb0 [<ffffffff815f4d24>] kernel_thread_helper+0x5/0x10 [<ffffffff81aedc0b>] ? start_kernel+0x3c5/0x3c5 [<ffffffff815f4d20>] ? gs_change+0x13/0x13 Increase the buffer size, remove the dangerous part_unpack_uuid() and use snprintf() directly from printk_all_partitions(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Szymon Gruszczynski <sz.gruszczynski@googlemail.com> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-14Merge branch 'v4l_for_linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For a some fix patches for v3.4, including a regression fix at DVB core" Fix up trivial conflicts in Documentation/feature-removal-schedule.txt * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] gspca - sonixj: Fix a zero divide in isoc interrupt [media] media: videobuf2-dma-contig: include header for exported symbols [media] media: videobuf2-dma-contig: quiet sparse noise about plain integer as NULL pointer [media] media: vb2-memops: Export vb2_get_vma symbol [media] s5p-fimc: Correct memory allocation for VIDIOC_CREATE_BUFS [media] s5p-fimc: Fix locking in subdev set_crop op [media] dvb_frontend: fix a regression with DVB-S zig-zag [media] fintek-cir: change || to && [media] V4L: Schedule V4L2_CID_HCENTER, V4L2_CID_VCENTER controls for removal [media] rc: Postpone ISR registration [media] marvell-cam: fix an ARM build error [media] V4L: soc-camera: protect hosts during probing from overzealous user-space
2012-05-14Bluetooth: notify userspace of security level changeGustavo Padovan1-0/+1
It fixes L2CAP socket based security level elevation during a connection. The HID profile needs this (for keyboards) and it is the only way to achieve the security level elevation when using the management interface to talk to the kernel (hence the management enabling patch being the one that exposes this issue). It enables the userspace a security level change when the socket is already connected and create a way to notify the socket the result of the request. At the moment of the request the socket is made non writable, if the request fails the connections closes, otherwise the socket is made writable again, POLL_OUT is emmited. Signed-off-by: Gustavo Padovan <gustavo@padovan.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-05-14Fix blkdev.h build errors when BLOCK=nRussell King1-1/+2
I see builds failing with: CC [M] drivers/mmc/host/dw_mmc.o In file included from drivers/mmc/host/dw_mmc.c:15: include/linux/blkdev.h:1404: warning: 'struct task_struct' declared inside parameter list include/linux/blkdev.h:1404: warning: its scope is only this definition or declaration, which is probably not what you want include/linux/blkdev.h:1408: warning: 'struct task_struct' declared inside parameter list include/linux/blkdev.h:1413: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'blk_needs_flush_plug' make[4]: *** [drivers/mmc/host/dw_mmc.o] Error 1 This is because dw_mmc.c includes linux/blkdev.h as the very first file, and when CONFIG_BLOCK=n, blkdev.h omits all includes. As it requires linux/sched.h even when CONFIG_BLOCK=n, move this out of the #ifdef. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-14/+19
Pull networking fixes from David S. Miller: 1) Since we do RCU lookups on ipv4 FIB entries, we have to test if the entry is dead before returning it to our caller. 2) openvswitch locking and packet validation fixes from Ansis Atteka, Jesse Gross, and Pravin B Shelar. 3) Fix PM resume locking in IGB driver, from Benjamin Poirier. 4) Fix VLAN header handling in vhost-net and macvtap, from Basil Gor. 5) Revert a bogus network namespace isolation change that was causing regressions on S390 networking devices. 6) If bonding decides to process and handle a LACPDU frame, we shouldn't bump the rx_dropped counter. From Jiri Bohac. 7) Fix mis-calculation of available TX space in r8169 driver when doing TSO, which can lead to crashes and/or hung device. From Julien Ducourthial. 8) SCTP does not validate cached routes properly in all cases, from Nicolas Dichtel. 9) Link status interrupt needs to be handled in ks8851 driver, from Stephen Boyd. 10) Use capable(), not cap_raised(), in connector/userns netlink code. From Eric W. Biederman via Andrew Morton. 11) Fix pktgen OOPS on module unload, from Eric Dumazet. 12) iwlwifi under-estimates SKB truesizes, also from Eric Dumazet. 13) Cure division by zero in SFC driver, from Ben Hutchings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits) ks8851: Update link status during link change interrupt macvtap: restore vlan header on user read vhost-net: fix handle_rx buffer size bonding: don't increase rx_dropped after processing LACPDUs connector/userns: replace netlink uses of cap_raised() with capable() sctp: check cached dst before using it pktgen: fix crash at module unload Revert "net: maintain namespace isolation between vlan and real device" ehea: fix losing of NEQ events when one event occurred early igb: fix rtnl race in PM resume path ipv4: Do not use dead fib_info entries. r8169: fix unsigned int wraparound with TSO sfc: Fix division by zero when using one RX channel and no SR-IOV openvswitch: Validation of IPv6 set port action uses IPv4 header net: compare_ether_addr[_64bits]() has no ordering cdc_ether: Ignore bogus union descriptor for RNDIS devices bnx2x: bug fix when loading after SAN boot e1000: Silence sparse warnings by correcting type igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed. ...
2012-05-11block: don't mark buffers beyond end of disk as mappedJeff Moyer1-0/+1
Hi, We have a bug report open where a squashfs image mounted on ppc64 would exhibit errors due to trying to read beyond the end of the disk. It can easily be reproduced by doing the following: [root@ibm-p750e-02-lp3 ~]# ls -l install.img -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null dd: reading `/dev/loop0': Input/output error 277376+0 records in 277376+0 records out 142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s In dmesg, you'll find the following: squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 43.106012] attempt to access beyond end of device [ 43.106029] loop0: rw=0, want=277410, limit=277408 [ 43.106039] Buffer I/O error on device loop0, logical block 138704 [ 43.106053] attempt to access beyond end of device [ 43.106057] loop0: rw=0, want=277412, limit=277408 [ 43.106061] Buffer I/O error on device loop0, logical block 138705 [ 43.106066] attempt to access beyond end of device [ 43.106070] loop0: rw=0, want=277414, limit=277408 [ 43.106073] Buffer I/O error on device loop0, logical block 138706 [ 43.106078] attempt to access beyond end of device [ 43.106081] loop0: rw=0, want=277416, limit=277408 [ 43.106085] Buffer I/O error on device loop0, logical block 138707 [ 43.106089] attempt to access beyond end of device [ 43.106093] loop0: rw=0, want=277418, limit=277408 [ 43.106096] Buffer I/O error on device loop0, logical block 138708 [ 43.106101] attempt to access beyond end of device [ 43.106104] loop0: rw=0, want=277420, limit=277408 [ 43.106108] Buffer I/O error on device loop0, logical block 138709 [ 43.106112] attempt to access beyond end of device [ 43.106116] loop0: rw=0, want=277422, limit=277408 [ 43.106120] Buffer I/O error on device loop0, logical block 138710 [ 43.106124] attempt to access beyond end of device [ 43.106128] loop0: rw=0, want=277424, limit=277408 [ 43.106131] Buffer I/O error on device loop0, logical block 138711 [ 43.106135] attempt to access beyond end of device [ 43.106139] loop0: rw=0, want=277426, limit=277408 [ 43.106143] Buffer I/O error on device loop0, logical block 138712 [ 43.106147] attempt to access beyond end of device [ 43.106151] loop0: rw=0, want=277428, limit=277408 [ 43.106154] Buffer I/O error on device loop0, logical block 138713 [ 43.106158] attempt to access beyond end of device [ 43.106162] loop0: rw=0, want=277430, limit=277408 [ 43.106166] attempt to access beyond end of device [ 43.106169] loop0: rw=0, want=277432, limit=277408 ... [ 43.106307] attempt to access beyond end of device [ 43.106311] loop0: rw=0, want=277470, limit=2774 Squashfs manages to read in the end block(s) of the disk during the mount operation. Then, when dd reads the block device, it leads to block_read_full_page being called with buffers that are beyond end of disk, but are marked as mapped. Thus, it would end up submitting read I/O against them, resulting in the errors mentioned above. I fixed the problem by modifying init_page_buffers to only set the buffer mapped if it fell inside of i_size. Cheers, Jeff Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Nick Piggin <npiggin@kernel.dk> -- Changes from v1->v2: re-used max_block, as suggested by Nick Piggin. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-10sctp: check cached dst before using itNicolas Dichtel1-0/+13
dst_check() will take care of SA (and obsolete field), hence IPsec rekeying scenario is taken into account. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Vlad Yaseivch <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10Revert "net: maintain namespace isolation between vlan and real device"David S. Miller1-9/+0
This reverts commit 8a83a00b0735190384a348156837918271034144. It causes regressions for S390 devices, because it does an unconditional DST drop on SKBs for vlans and the QETH device needs the neighbour entry hung off the DST for certain things on transmit. Arnd can't remember exactly why he even needed this change. Conflicts: drivers/net/macvlan.c net/8021q/vlan_dev.c net/core/dev.c Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10tracing: Do not enable function event with enableSteven Rostedt1-0/+2
With the adding of function tracing event to perf, it caused a side effect that produces the following warning when enabling all events in ftrace: # echo 1 > /sys/kernel/debug/tracing/events/enable [console] event trace: Could not enable event function This is because when enabling all events via the debugfs system it ignores events that do not have a ->reg() function assigned. This was to skip over the ftrace internal events (as they are not TRACE_EVENTs). But as the ftrace function event now has a ->reg() function attached to it for use with perf, it is no longer ignored. Worse yet, this ->reg() function is being called when it should not be. It returns an error and causes the above warning to be printed. By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE) and have all ftrace internel event structures have it set, setting the events/enable will no longe try to incorrectly enable the function event and does not warn. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>