summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2008-04-18SPARC64: Fix __get_cpu_var in preemption-enabled area.David S. Miller1-1/+2
Upstream commit: 69072f6e8e4bd4799d2a54e4ff8771d0657512c1 Reported by Mariusz Kozlowski. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18SPARC64: Fix atomic backoff limit.David S. Miller1-1/+2
Upstream commit: 4cfea5a7dfcc2766251e50ca30271a782d5004ad 4096 will not fit into the immediate field of a compare instruction, in fact it will end up being -4096 causing the check to fail every time and thus disabling backoff. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18VLAN: Don't copy ALLMULTI/PROMISC flags from underlying devicePatrick McHardy1-1/+1
Upstream commit: 0ed21b321a13421e2dfeaa70a6c324e05e3e91e6 Changing these flags requires to use dev_set_allmulti/dev_set_promiscuity or dev_change_flags. Setting it directly causes two unwanted effects: - the next dev_change_flags call will notice a difference between dev->gflags and the actual flags, enable promisc/allmulti mode and incorrectly update dev->gflags - this keeps the underlying device in promisc/allmulti mode until the VLAN device is deleted [ Ported back to 2.6.24 VLAN code. -DaveM ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18TCP: Let skbs grow over a page on fast peersHerbert Xu1-2/+2
Upstream commit: 69d1506731168d6845a76a303b2c45f7c05f3f2c While testing the virtio-net driver on KVM with TSO I noticed that TSO performance with a 1500 MTU is significantly worse compared to the performance of non-TSO with a 16436 MTU. The packet dump shows that most of the packets sent are smaller than a page. Looking at the code this actually is quite obvious as it always stop extending the packet if it's the first packet yet to be sent and if it's larger than the MSS. Since each extension is bound by the page size, this means that (given a 1500 MTU) we're very unlikely to construct packets greater than a page, provided that the receiver and the path is fast enough so that packets can always be sent immediately. The fix is also quite obvious. The push calls inside the loop is just an optimisation so that we don't end up doing all the sending at the end of the loop. Therefore there is no specific reason why it has to do so at MSS boundaries. For TSO, the most natural extension of this optimisation is to do the pushing once the skb exceeds the TSO size goal. This is what the patch does and testing with KVM shows that the TSO performance with a 1500 MTU easily surpasses that of a 16436 MTU and indeed the packet sizes sent are generally larger than 16436. I don't see any obvious downsides for slower peers or connections, but it would be prudent to test this extensively to ensure that those cases don't regress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18TCP: Fix shrinking windows with window scalingPatrick McHardy1-1/+1
Upstream commit: 607bfbf2d55dd1cfe5368b41c2a81a8c9ccf4723 When selecting a new window, tcp_select_window() tries not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size is always a multiple of the window scaling factor, the remaining window size however might not be since it depends on rcv_wup/rcv_nxt. This means we're effectively shrinking the window when scaling it down. The dump below shows the problem (scaling factor 2^7): - Window size of 557 (71296) is advertised, up to 3111907257: IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...> - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes below the last end: IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...> The number 40 results from downscaling the remaining window: 3111907257 - 3111841425 = 65832 65832 / 2^7 = 514 65832 % 2^7 = 40 If the sender uses up the entire window before it is shrunk, this can have chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq() will notice that the window has been shrunk since tcp_wnd_end() is before tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number. This will fail the receivers checks in tcp_sequence() however since it is before it's tp->rcv_wup, making it respond with a dupack. If both sides are in this condition, this leads to a constant flood of ACKs until the connection times out. Make sure the window is never shrunk by aligning the remaining window to the window scaling factor. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18NET: Fix multicast device ioctl checksPatrick McHardy1-2/+2
Upstream commit: 61ee6bd487b9cc160e533034eb338f2085dc7922 SIOCADDMULTI/SIOCDELMULTI check whether the driver has a set_multicast_list method to determine whether it supports multicast. Drivers implementing secondary unicast support use set_rx_mode however. Check for both dev->set_multicast_mode and dev->set_rx_mode to determine multicast capabilities. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18SCTP: Fix local_addr deletions during list traversals.Chidambar 'ilLogict' Zinnoury3-3/+9
Upstream commit: 22626216c46f2ec86287e75ea86dd9ac3df54265 Since the lists are circular, we need to explicitely tag the address to be deleted since we might end up freeing the list head instead. This fixes some interesting SCTP crashes. Signed-off-by: Chidambar 'ilLogict' Zinnoury <illogict@online.fr> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18sch_htb: fix "too many events" situationMartin Devera1-6/+7
Upstream commit: 8f3ea33a5078a09eba12bfe57424507809367756 HTB is event driven algorithm and part of its work is to apply scheduled events at proper times. It tried to defend itself from livelock by processing only limited number of events per dequeue. Because of faster computers some users already hit this hardcoded limit. This patch limits processing up to 2 jiffies (why not 1 jiffie ? because it might stop prematurely when only fraction of jiffie remains). Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18NET: Add preemption point in qdisc_runHerbert Xu1-3/+15
Upstream commit: 2ba2506ca7ca62c56edaa334b0fe61eb5eab6ab0 The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18PPPOL2TP: Fix SMP issues in skb reorder queue handlingJames Chapman1-3/+8
Upstream commit: e653181dd6b3ad38ce14904351b03a5388f4b0f7 When walking a session's packet reorder queue, use skb_queue_walk_safe() since the list could be modified inside the loop. Rearrange the unlinking skbs from the reorder queue such that it is done while the queue lock is held in pppol2tp_recv_dequeue() when walking the skb list. A version of this patch was suggested by Jarek Poplawski. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18PPPOL2TP: Make locking calls softirq-safeJames Chapman1-29/+29
Upstream commit: cf3752e2d203bbbfc88d29e362e6938cef4339b3 Fix locking issues in the pppol2tp driver which can cause a kernel crash on SMP boxes. There were two problems:- 1. The driver was violating read_lock() and write_lock() scheduling rules because it wasn't using softirq-safe locks in softirq contexts. So we now consistently use the _bh variants of the lock functions. 2. The driver was calling sk_dst_get() in pppol2tp_xmit() which was taking sk_dst_lock in softirq context. We now call __sk_dst_get(). Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18netpoll: zap_completion_queue: adjust skb->users counterJarek Poplawski1-2/+4
Upstream commit: 8a455b087c9629b3ae3b521b4f1ed16672f978cc zap_completion_queue() retrieves skbs from completion_queue where they have zero skb->users counter. Before dev_kfree_skb_any() it should be non-zero yet, so it's increased now. Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18LLC: Restrict LLC sockets to rootPatrick McHardy1-0/+3
Upstream commit: 3480c63bdf008e9289aab94418f43b9592978fff LLC currently allows users to inject raw frames, including IP packets encapsulated in SNAP. While Linux doesn't handle IP over SNAP, other systems do. Restrict LLC sockets to root similar to packet sockets. [ Modified Patrick's patch to use CAP_NEW_RAW --DaveM ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18INET: inet_frag_evictor() must run with BH disabledDavid S. Miller1-0/+2
Part of upstream commit: e8e16b706e8406f1ab3bccab16932ebc513896d8 Based upon a lockdep trace from Dave Jones. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18SUNGEM: Fix NAPI assertion failure.David S. Miller1-1/+1
Upstream commit: da990a2402aeaee84837f29054c4628eb02f7493 As reported by Johannes Berg: I started getting this warning with recent kernels: [ 773.908927] ------------[ cut here ]------------ [ 773.908954] Badness at net/core/dev.c:2204 ... If we loop more than once in gem_poll(), we'll use more than the real budget in our gem_rx() calls, thus eventually trigger the caller's assertions in net_rx_action(). Subtract "work_done" from "budget" for the second arg to gem_rx() to fix the bug. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18NET: include <linux/types.h> into linux/ethtool.h for __u* typedefKirill A. Shutemov1-0/+1
Upstream commit: e621e69137b24fdbbe7ad28214e8d81e614c25b7 Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18AX25 ax25_out: check skb for NULL in ax25_kick()Jarek Poplawski1-2/+11
Upstream commit: f47b7257c7368698eabff6fd7b340071932af640 According to some OOPS reports ax25_kick tries to clone NULL skbs sometimes. It looks like a race with ax25_clear_queues(). Probably there is no need to add more than a simple check for this yet. Another report suggested there are probably also cases where ax25 ->paclen == 0 can happen in ax25_output(); this wasn't confirmed during testing but let's leave this debugging check for some time. Reported-and-tested-by: Jann Traschewski <jann@gmx.de> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18ipmi: change device node ordering to reflect probe orderCarol Hebert1-1/+1
upstream commit: abd24df828f1a72971db29d1b74fefae104ea9e2 In 2.6.14 a patch was merged which switching the order of the ipmi device naming from in-order-of-discovery over to reverse-order-of-discovery. So on systems with multiple BMC interfaces, the ipmi device names are being created in reverse order relative to how they are discovered on the system (e.g. on an IBM x3950 multinode server with N nodes, the device name for the BMC in the first node is /dev/ipmiN-1 and the device name for the BMC in the last node is /dev/ipmi0, etc.). The problem is caused by the list handling routines chosen in dmi_scan.c. Using list_add() causes the multiple ipmi devices to be added to the device list using a stack-paradigm and so the ipmi driver subsequently pulls them off during initialization in LIFO order. This patch changes the dmi_save_ipmi_device() list handling paradigm to a queue, thereby allowing the ipmi driver to build the ipmi device names in the order in which they are found on the system. Signed-off-by: Carol Hebert <cah@us.ibm.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18mtd: fix broken state in CFI driver caused by FL_SHUTDOWNAlexey Korolev1-5/+5
upstream commit: fb6d080c6f75dfd7e23d5a3575334785aa8738eb THe CFI driver in 2.6.24 kernel is broken. Not so intensive read/write operations cause incomplete writes which lead to kernel panics in JFFS2. We investigated the issue - it is caused by bug in FL_SHUTDOWN parsing code. Sometimes chip returns -EIO as if it is in FL_SHUTDOWN state when it should wait in FL_PONT (error in order of conditions). The following patch fixes the bug in state parsing code of CFI. Also I've added comments to notify developers if they want to add new case in future. Signed-off-by: Alexey Korolev <akorolev@infradead.org> Reviewed-by: Joern Engel <joern@logfs.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18CRYPTO xcbc: Fix crash when ipsec uses xcbc-mac with big data chunkJoy Latten1-8/+9
upstream commit: 1edcf2e1ee2babb011cfca80ad9d202e9c491669 The kernel crashes when ipsec passes a udp packet of about 14XX bytes of data to aes-xcbc-mac. It seems the first xxxx bytes of the data are in first sg entry, and remaining xx bytes are in next sg entry. But we don't check next sg entry to see if we need to go look the page up. I noticed in hmac.c, we do a scatterwalk_sg_next(), to do this check and possible lookup, thus xcbc.c needs to use this routine too. A 15-hour run of an ipsec stress test sending streams of tcp and udp packets of various sizes, using this patch and aes-xcbc-mac completed successfully, so hopefully this fixes the problem. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [chrisw@sous-sol.org: backport to 2.6.24.4] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18USB: serial: ti_usb_3410_5052: Correct TUSB3410 endpoint requirements.Robert Spanton1-2/+2
upstream commit: 1bfd6693cd66f1e79abce62d3e8c3647e1f59a55 The changes introduced in commit 063a2da8f01806906f7d7b1a1424b9afddebc443 changed the semantics of the num_interrupt_in, num_interrupt_out, num_bulk_in and num_bulk_out entries of the usb_serial_driver struct to be the number of endpoints the device has when probed. This patch changes the ti_1port_device usb_serial_driver struct to reflect this change. The single port devices only have 1 bulk_out endpoint in their initial configuration, and so this patch changes the number of other types to NUM_DONT_CARE. The same change probably needs doing to the ti_2port_device struct, but I don't have a two port device at hand. Signed-off-by: Robert Spanton <rspanton@zepler.net> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18USB: serial: fix regression in Visor/Palm OS module for kernels >= 2.6.24Brad Sawatzky1-1/+1
upstream commit: d04863e9e65767feff7807c8f693ac2719dd1944 Fixes a bug/inconsistency revealed by the additional sanity checking in commit 063a2da8f01806906f7d7b1a1424b9afddebc443 introduced in the original 2.6.24 branch. The Handspring Visor / PalmOS 4 device structure defines .num_bulk_out=2 but the usb-serial probe returns num_bulk_out=3, triggering the check in the above commit and forcing a bail out when the device (a Garmin iQue in my case) attempts to connect. The patch bumps the expected number of endpoints to 3. FWIW, this patch will probably solve the following kernel bug report for Treo users (identical symptoms, different model PalmOS units): <http://bugzilla.kernel.org/show_bug.cgi?id=10118> Signed-off-by: Brad Sawatzky <brad+kernel@swatter.net> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18USB: Allow initialization of broken keyspan serial adapters.Clark Rawlins1-0/+4
upstream commit: 822470537d0fc1dee38a2a9c8b8c398bfbb332bb Fixes the keyspan driver after the addition of additional checking of driver requirements introduced in usb-serial.c commit 063a2da8f01806906f7d7b1a1424b9afddebc443. The initialization of the keyspan usb_serial_driver structs were not initializing the num_interrupt_out field and the additional checking was rejecting the end point so the driver wouldn't finish initializing. This commit initializes the fields to NUM_DONT_CARE. It works for the keyspan USA-49WG and doesn't break the USA-19HS which are the two keyspan devices I have to test with. Signed-off-by: Clark Rawlins <clark.rawlins@escient.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18vmcoreinfo: add the symbol "phys_base"Ken'ichi Ohmichi1-0/+1
upstream commit: 629c8b4cdb354518308663aff2f719e02f69ffbe Fix the problem that makedumpfile sometimes fails on x86_64 machine. This patch adds the symbol "phys_base" to a vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. On x86_64 kernel which compiled with CONFIG_PHYSICAL_START=0x0 and CONFIG_RELOCATABLE=y, makedumpfile fails like the following: # makedumpfile -d31 /proc/vmcore dumpfile The kernel version is not supported. The created dumpfile may be incomplete. _exclude_free_page: Can't get next online node. makedumpfile Failed. # The cause is the lack of the symbol "phys_base" in a vmcoreinfo data. If the symbol "phys_base" does not exist, makedumpfile considers an x86_64 kernel as non relocatable. As the result, makedumpfile misunderstands the physical address where the kernel is loaded, and it cannot translate a kernel virtual address to physical address correctly. To fix this problem, this patch adds the symbol "phys_base" to a vmcoreinfo data. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18hwmon: (w83781d) Fix I/O resource conflict with PNPJean Delvare1-5/+16
upstream commit: 2961cb22ef02850d90e7a12c28a14d74e327df8d Only request I/O ports 0x295-0x296 instead of the full I/O address range. This solves a conflict with PNP resources on a few motherboards. Also request the I/O ports in two parts (4 low ports, 4 high ports) during device detection, otherwise the PNP resource makes the request (and thus the detection) fail. This fixes lm-sensors ticket #2306: http://www.lm-sensors.org/ticket/2306 Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18pci: revert SMBus unhide on HP Compaq nx6110Jean Delvare1-5/+6
upstream commit: a99acc832de1104afaba02d7c2576fd9b9fd6422 This reverts commit 3c0a654e390d00fef9d8faed758f5e1e8078adb5 and fixes kernel bug #10245: http://bugzilla.kernel.org/show_bug.cgi?id=10245 The HP Compaq nc6120 has the same PCI sub-device ID as the nx6110, and the SMBus is used by ACPI for thermal management on the nc6120, so Linux should not attach a native driver to it. This means that this quirk is unsafe and has to be removed. I also added a comment to help developers realize that adding new IDs to this SMBus unhiding quirk table should be done only with great care, and in particular only after checking that ACPI is not making use of the SMBus. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Tomasz Koprowski <tomek@koprowski.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18vfs: fix data leak in nobh_write_end()Dmitri Monakhov1-7/+6
upstream commit: 5b41e74ad1b0bf7bc51765ae74e5dc564afc3e48 Current nobh_write_end() implementation ignore partial writes(copied < len) case if page was fully mapped and simply mark page as Uptodate, which is totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in previous write_begin call. It simply contains garbage from pagecache and result in data leakage. #TEST_CASE_BEGIN: ~~~~~~~~~~~~~~~~ In fact issue triggered by classical testcase open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3 ftruncate(3, 409600) = 0 writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1 ##TESTCASE_SOURCE: ~~~~~~~~~~~~~~~~~ #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/uio.h> #include <sys/mman.h> #include <errno.h> int main(int argc, char **argv) { int fd, ret; void* p; struct iovec iov[2]; fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666); ftruncate(fd, 409600); iov[0].iov_base="a"; iov[0].iov_len=1; iov[1].iov_base=NULL; iov[1].iov_len=4096; ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec)); printf("writev = %d, err = %d\n", ret, errno); return 0; } ##TESTCASE RESULT: ~~~~~~~~~~~~~~~~~~ [root@ts63 ~]# mount | grep mnt2 /dev/mapper/test on /mnt2 type ext2 (rw,nobh) [root@ts63 ~]# /tmp/writev /mnt2/test writev = 1, err = 0 [root@ts63 ~]# hexdump -C /mnt2/test 00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 |aeboot.....Y:...| 00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000020 df df df df df df df df df df df df df df df df |................| 00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 |:...*...!.......| 00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 |`.......@J......| 00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f |time dd if=/dev/| 00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e |loop0 of=/dev/n| skip.. 00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 |........1.......| 00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 |mkfs.ext3 /dev/v| 00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 |zvg/test -b4096.| 00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 |........!.......| 00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 |#1205950404.:...| 00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 | .......!.......| 00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 |................| 00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 |........A.......| 00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f |mount /dev/vzvg/| 00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 |test /vz -o dat| 00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 |a=writeback.....| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| As you can see file's page contains garbage from pagecache instead of zeros. #TEST_CASE_END Attached patch: - Add sanity check BUG_ON in order to prevent incorrect usage by caller, This is function invariant because page can has buffers and in no zero *fadata pointer at the same time. - Always attach buffers to page is it is partial write case. - Always switch back to generic_write_end if page has buffers. This is reasonable because if page already has buffer then generic_write_begin was called previously. Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18alloc_percpu() fails to allocate percpu dataEric Dumazet1-1/+14
upstream commit: be852795e1c8d3829ddf3cb1ce806113611fa555 Some oprofile results obtained while using tbench on a 2x2 cpu machine were very surprising. For example, loopback_xmit() function was using high number of cpu cycles to perform the statistic updates, supposed to be real cheap since they use percpu data pcpu_lstats = netdev_priv(dev); lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id()); lb_stats->packets++; /* HERE : serious contention */ lb_stats->bytes += skb->len; struct pcpu_lstats is a small structure containing two longs. It appears that on my 32bits platform, alloc_percpu(8) allocates a single cache line, instead of giving to each cpu a separate cache line. Using the following patch gave me impressive boost in various benchmarks ( 6 % in tbench) (all percpu_counters hit this bug too) Long term fix (ie >= 2.6.26) would be to let each CPU allocate their own block of memory, so that we dont need to roudup sizes to L1_CACHE_BYTES, or merging the SGI stuff of course... Note : SLUB vs SLAB is important here to *show* the improvement, since they dont have the same minimum allocation sizes (8 bytes vs 32 bytes). This could very well explain regressions some guys reported when they switched to SLUB. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18PERCPU : __percpu_alloc_mask() can dynamically size percpu_data storageEric Dumazet2-2/+2
upstream commit: b3242151906372f30f57feaa43b4cac96a23edb1 Instead of allocating a fix sized array of NR_CPUS pointers for percpu_data, we can use nr_cpu_ids, which is generally < NR_CPUS. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18xen: fix UP setup of shared_infoJeremy Fitzhardinge1-17/+22
upstream commit: 2e8fe719b57bbdc9e313daed1204bb55fed3ed44 We need to set up the shared_info pointer once we've mapped the real shared_info into its fixmap slot. That needs to happen once the general pagetable setup has been done. Previously, the UP shared_info was set up one in xen_start_kernel, but that was left pointing to the dummy shared info. Unfortunately there's no really good place to do a later setup of the shared_info in UP, so just do it once the pagetable setup has been done. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> [chrisw@sous-sol.org: backport to 2.6.24.4] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18xen: mask out SEP from CPUIDJeremy Fitzhardinge1-0/+1
upstream commit: d40e705903397445c6861a0a56c23e5b2e8f9b9a Fix 32-on-64 pvops kernel: we don't want userspace using syscall/sysenter, even if the hypervisor supports it, so mask it out from CPUID. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18xen: fix RMW when unmasking eventsJeremy Fitzhardinge2-3/+8
upstream commit: 04c44a080d2f699a3042d4e743f7ad2ffae9d538 xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff, XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones in one instuction. Unfortunately, the pending flag must be modified with a locked operation since it can be set by another CPU, and the unlocked form of this operation was causing the pending flag to get lost, allowing the processor to return to usermode with pending events and ultimately deadlock. The simple fix would be to make it a locked operation, but that's rather costly and unnecessary. The fix here is to split the mask-clearing and pending-testing into two instructions; the interrupt window between them is of no concern because either way pending or new events will be processed. This should fix lingering bugs in using direct vcpu structure access too. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18slab: fix cache_cache bootstrap in kmem_cache_init()Daniel Yeisley1-2/+2
upstream commit: ec1f5eeeb5a79a0d48036de649a3498da42db565 Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on memoryless node") introduced bootstrap-time cache_cache list3s for all nodes but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This patch fixes list_add() corruption in mm/slab.c seen on the ES7000. Cc: Mel Gorman <mel@csn.ul.ie> Cc: Olaf Hering <olaf@aepfle.de> Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18NOHZ: reevaluate idle sleep length after add_timer_on()Thomas Gleixner3-1/+58
upstream commit: 06d8308c61e54346585b2691c13ee3f90cb6fb2f add_timer_on() can add a timer on a CPU which is currently in a long idle sleep, but the timer wheel is not reevaluated by the nohz code on that CPU. So a timer can be delayed for quite a long time. This triggered a false positive in the clocksource watchdog code. To avoid this we need to wake up the idle CPU and enforce the reevaluation of the timer wheel for the next timer event. Add a function, which checks a given CPU for idle state, marks the idle task with NEED_RESCHED and sends a reschedule IPI to notify the other CPU of the change in the timer wheel. Call this function from add_timer_on(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> -- include/linux/sched.h | 6 ++++++ kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++ kernel/timer.c | 10 +++++++++- 3 files changed, 58 insertions(+), 1 deletion(-)
2008-04-18inotify: remove debug codeNick Piggin2-15/+5
upstream commit: 0d71bd5993b630a989d15adc2562a9ffe41cd26d The inotify debugging code is supposed to verify that the DCACHE_INOTIFY_PARENT_WATCHED scalability optimisation does not result in notifications getting lost nor extra needless locking generated. Unfortunately there are also some races in the debugging code. And it isn't very good at finding problems anyway. So remove it for now. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Lamparter <chunkeey@web.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18inotify: fix raceNick Piggin1-3/+10
upstream commit: d599e36a9ea85432587f4550acc113cd7549d12a There is a race between setting an inode's children's "parent watched" flag when placing the first watch on a parent, and instantiating new children of that parent: a child could miss having its flags set by set_dentry_child_flags, but then inotify_d_instantiate might still see !inotify_inode_watched. The solution is to set_dentry_child_flags after adding the watch. Locking is taken care of, because both set_dentry_child_flags and inotify_d_instantiate hold dcache_lock and child->d_locks. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Robert Love <rlove@google.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Jan Kara <jack@ucw.cz> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christian Lamparter <chunkeey@web.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18USB: new quirk flag to avoid Set-InterfaceAlan Stern3-1/+10
upstream commit: 392e1d9817d0024c96aae237c3c4349e47c976fd This patch (as1057) fixes a problem with the X-Rite/Gretag-Macbeth Eye-One Pro display colorimeter; the device crashes when it receives a Set-Interface request. A new quirk (USB_QUIRK_NO_SET_INTF) is introduced and a quirks entry is created for this device. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> [chrisw@sous-sol.org: backport to 2.6.24.4] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18USB: add support for Motorola ROKR Z6 cellphone in mass storage modeConstantin Baranov3-2/+16
upstream commit: cc36bdd47ae51b66780b317c1fa519221f894405 Motorola ROKR Z6 cellphone has bugs in its USB, so it is impossible to use it as mass storage. Patch describes new "unusual" USB device for it with FIX_INQUIRY and FIX_CAPACITY flags and new BULK_IGNORE_TAG flag. Last flag relaxes check for equality of bcs->Tag and us->tag in usb_stor_Bulk_transport routine. Signed-off-by: Constantin Baranov <const@tltsu.ru> Signed-off-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net> Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18UIO: add pgprot_noncached() to UIO mmap codeJean-Samuel Chenard1-0/+2
upstream commit: c9698d6b1a90929e427a165bd8283f803f57d9bd Mapping of physical memory in UIO needs pgprot_noncached() to ensure that IO memory is not cached. Without pgprot_noncached(), it (accidentally) works on x86 and arm, but fails on PPC. Signed-off-by: Jean-Samuel Chenard <jsamch@gmail.com> Signed-off-by: Hans J Koch <hjk@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18V4L: ivtv: Add missing sg_init_table()Ian Armstrong1-0/+3
upstream commit: 165e1213e13b49761f8b3fd9314701f83cf3db3a If a dma transfer is attempted for either yuv or framebuffer output, a missing sg_init_table() call causes a kernel BUG in scatterlist.h if CONFIG_DEBUG_SG is set. Signed-off-by: Ian Armstrong <ian@iarmst.demon.co.uk> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18md: remove the 'super' sysfs attribute from devices in an 'md' arrayNeilBrown1-12/+0
upstream commit: 0e82989d95cc46cc58622381eafa54f7428ee679 Exposing the binary blob which is the md 'super-block' via sysfs doesn't really fit with the whole sysfs model, and ever since commit 8118a859dc7abd873193986c77a8d9bdb877adc8 ("sysfs: fix off-by-one error in fill_read_buffer()") it doesn't actually work at all (as the size of the blob is often one page). (akpm: as in, fs/sysfs/file.c:fill_read_buffer() goes BUG) So just remove it altogether. It isn't really useful. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18mtd: memory corruption in block2mtd.cIngo van Lil1-1/+0
upstream commit: 2875fb65f8e40401c4b781ebc5002df10485f635 The block2mtd driver (drivers/mtd/devices/block2mtd.c) will kfree an on-stack pointer when handling an invalid argument line (e.g. block2mtd=/dev/loop0,xxx). The kfree was added some time ago when "name" was dynamically allocated. Signed-off-by: Ingo van Lil <inguin@gmx.de> Acked-by: Joern Engel <joern@lazybastard.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18kbuild: soften modpost checks when doing cross buildsSam Ravnborg5-3/+15
upstream commit: 4ce6efed48d736e3384c39ff87bda723e1f8e041 The module alias support in the kernel have a consistency check where it is checked that the size of a structure in the kernel and on the build host are the same. For cross builds this check does not make sense so detect when we do cross builds and silently skip the check in these situations. This fixes a build bug for a wireless driver when cross building for arm. Acked-by: Michael Buesch <mb@bu3sch.de> Tested-by: Gordon Farquharson <gordonfarquharson@gmail.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> [chrisw@sous-sol.org: backport to 2.6.24.4] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-04-18time: prevent the loop in timespec_add_ns() from being optimised awaySegher Boessenkool1-0/+4
upstream commit: 38332cb98772f5ea757e6486bed7ed0381cb5f98 Since some architectures don't support __udivdi3(). Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sedat Dilek <sedat.dilek@googlemail.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-03-24Linux 2.6.24.4v2.6.24.4Chris Wright1-1/+1
2008-03-24S390 futex: let futex_atomic_cmpxchg_pt survive early functional tests.Heiko Carstens1-0/+2
a0c1e9073ef7428a14309cba010633a6cd6719ea "futex: runtime enable pi and robust functionality" introduces a test wether futex in atomic stuff works or not. It does that by writing to address 0 of the kernel address space. This will crash on older machines where addressing mode switching is enabled but where the mvcos instruction is not available. Page table walking is done by hand and therefore the code tries to access current->mm which is NULL. Therefore add an extra check, so we survive the early test. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-03-24slab: NUMA slab allocator migration bugfixJoe Korty1-3/+2
NUMA slab allocator cpu migration bugfix The NUMA slab allocator (specifically, cache_alloc_refill) is not refreshing its local copies of what cpu and what numa node it is on, when it drops and reacquires the irq block that it inherited from its caller. As a result those values become invalid if an attempt to migrate the process to another numa node occured while the irq block had been dropped. The solution is to make cache_alloc_refill reload these variables whenever it drops and reacquires the irq block. The error is very difficult to hit. When it does occur, one gets the following oops + stack traceback bits in check_spinlock_acquired: kernel BUG at mm/slab.c:2417 cache_alloc_refill+0xe6 kmem_cache_alloc+0xd0 ... This patch was developed against 2.6.23, ported to and compiled-tested only against 2.6.25-rc4. Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-03-24relay: fix subbuf_splice_actor() adding too many pagesJens Axboe1-2/+3
If subbuf_pages was larger than the max number of pages the pipe buffer will hold, subbuf_splice_actor() would happily go beyond the array size. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-03-24BLUETOOTH: Fix bugs in previous conn add/del workqueue changes.Dave Young1-2/+6
Jens Axboe noticed that we were queueing &conn->work on both btaddconn and keventd_wq. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-03-24SCSI advansys: Fix bug in AdvLoadMicrocodeMatthew Wilcox1-1/+1
commit: 951b62c11e86acf8c55d9828aa8c921575023c29 buf[i] can be up to 0xfd, so doubling it and assigning the result to an unsigned char truncates the value. Just use an unsigned int instead; it's only a temporary. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>