summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2011-05-28Documentation: Add statistics about nested locksJuri Lelli1-2/+34
Explain what the trailing "/1" on some lock class names of lock_stat output means. Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26Merge branch 'upstream/tidy-xen-mmu-2.6.39' of ↵Linus Torvalds3-275/+50
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: fix compile without CONFIG_XEN_DEBUG_FS Use arbitrary_virt_to_machine() to deal with ioremapped pud updates. Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates. xen/mmu: remove all ad-hoc stats stuff xen: use normal virt_to_machine for ptes xen: make a pile of mmu pvop functions static vmalloc: remove vmalloc_sync_all() from alloc_vm_area() xen: condense everything onto xen_set_pte xen: use mmu_update for xen_set_pte_at() xen: drop all the special iomap pte paths.
2011-05-26selinux: don't pass in NULL avd to avc_has_perm_noauditLinus Torvalds2-11/+4
Right now security_get_user_sids() will pass in a NULL avd pointer to avc_has_perm_noaudit(), which then forces that function to have a dummy entry for that case and just generally test it. Don't do it. The normal callers all pass a real avd pointer, and this helper function is incredibly hot. So don't make avc_has_perm_noaudit() do conditional stuff that isn't needed for the common case. This also avoids some duplicated stack space. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds23-151/+203
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: update email address Squashfs: add extra sanity checks at mount time Squashfs: add sanity checks to fragment reading at mount time Squashfs: add sanity checks to lookup table reading at mount time Squashfs: add sanity checks to id reading at mount time Squashfs: add sanity checks to xattr reading at mount time Squashfs: reverse order of filesystem table reading Squashfs: move table allocation into squashfs_read_table()
2011-05-26m68knommu: use generic find_next_bit_le()Akinobu Mita1-44/+2
The implementation of find_next_bit_le() on m68knommu is identical with the generic implementation of find_next_bit_le(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26s390: use asm-generic/bitops/le.hAkinobu Mita1-35/+2
The previous style change enables to use asm-generic/bitops/le.h on s390. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26arm: use asm-generic/bitops/le.hAkinobu Mita1-38/+5
The previous style change enables to use asm-generic/bitops/le.h on arm. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT}Akinobu Mita24-126/+2
By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT, CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used to test for existence of find bitops anymore. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26bitops: add #ifndef for each of find bitopsAkinobu Mita5-0/+29
The style that we normally use in asm-generic is to test the macro itself for existence, so in asm-generic, do: #ifndef find_next_zero_bit_le extern unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset); #endif and in the architectures, write static inline unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset) #define find_next_zero_bit_le find_next_zero_bit_le This adds the #ifndef for each of the find bitops in the generic header and source files. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26arch: add #define for each of optimized find bitopsAkinobu Mita5-0/+36
The style that we normally use in asm-generic is to test the macro itself for existence, so in asm-generic, do: #ifndef find_next_zero_bit_le extern unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset); #endif and in the architectures, write static inline unsigned long find_next_zero_bit_le(const void *addr, unsigned long size, unsigned long offset) #define find_next_zero_bit_le find_next_zero_bit_le This adds the #define for each of the optimized find bitops in the architectures. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Greg Ungerer <gerg@uclinux.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26m68knommu: fix build error due to the lack of find_next_bit_le()Akinobu Mita1-0/+44
m68knommu can't build ext4, udf, and ocfs2 due to the lack of find_next_bit_le(). This implements find_next_bit_le() on m68knommu by duplicating the generic find_next_bit_le() in lib/find_next_bit.c. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26w1: add Maxim/Dallas DS2780 Stand-Alone Fuel Gauge IC supportClifton Barnes8-0/+1222
Add support for the Maxim/Dallas DS2780 Stand-Alone Fuel Gauge IC. It was suggested to combine this functionality with the current ds2782 driver. Unfortunately, I'm unable to commit the time to refactoring this driver to that extent and I don't have a platform with the ds2782 part to validate that there are no regression issues by adding this functionality. [akpm@linux-foundation.org: use min_t()] Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com> Tested-by: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26w1: have netlink search update kernel listDavid Fries3-5/+16
Reorganize so the netlink connector one wire search command will update the kernel list of detected slave devices. Otherwise, a newly detected device is unusable because unless it's in the kernel list of known devices any commands will result in ENODEV status. Signed-off-by: David Fries <David@Fries.net> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26w1: complete the 1-wire (w1) ds1wm driver search algorithmJean-François Dagenais2-103/+220
This adds multi-slave support of the w1 bus for the ds1wm Synthesizable 1-Wire Bus Master. Also many fixes and tweaks based on the rev3 of the datasheet http://datasheets.maxim-ic.com/en/ds/DS1WM.pdf Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Cc: Matt Reimer <mreimer@vpop.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26w1: add 1-wire (w1) DS2408 8-Channel Addressable Switch supportJean-François Dagenais4-0/+411
This DS2408 w1 slave driver is not complete for all the features of the chip, but its sufficient if you use it as a simple IO expander. [randy.dunlap@oracle.com: fix w1_ds2408.c printk formats] Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Cc: Matt Reimer <mreimer@vpop.net> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26w1: add 1-wire (w1) reset and resume command API supportJean-François Dagenais2-0/+28
The first patch adds generic functionnality to w1_io for Resume Command [A5h] lots of slaves support. I found it useful for multi-commands/reset workflows with the same slave on a multi-slave bus. This DS2408 w1 slave driver is not complete for all the features of the chip, but its sufficient if you use it as a simple IO expander. Enjoy! The ds1wm had Kconfig dependencies towards ARM && HAVE_CLK. I took them out since I was using the ds1wm on an x86_64 platform (ds1wm in a FPGA through pcie) and found them irrelevant. The clock freq/divisors at the top of ds1wm.c did not have the MSB set to 1. This bit is CLK_EN which turns the whole prescaler and dividers on. The driver never mentionned this bit either, so I just included this bit right in the table entries. I also took the liberty to add a couple of entries to the table. The spec doesn't explicitely mentions these possibilities but the description and examination of the core shows the prescalers & dividers can be used for more than the table explicitely shows. The table I enlarged still doesn't cover all possibilities, but it's a good start. I also made a few tweaks to a couple of the read and write algorithms which made sense while I had my head very deep in the ds1wm documentation. We stressed it a lot with 10+ slaves on the bus, many ds2408, ds2431 and ds2433 at the same time doing extensive interaction. It proved quite stable in our production environment. This patch: Add generic functionnality to w1_io for Resume Command [A5h] lots of slaves support. Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu> Cc: Matt Reimer <mreimer@vpop.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26kernel/profile.c: remove some duplicate code from profile_hits()Rakib Mullick1-7/+9
profile_hits() has a common check for prof_on and prof_buffer regardless of SMP or !SMP. So, remove some duplicate code by splitting profile_hits into two. [akpm@linux-foundation.org: make do_profile_hits static] Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26drivers/char/ppdev.c: put gotten port valueJulia Lawall1-0/+1
parport_find_number() calls parport_get_port() on its result, so there should be a corresponding call to parport_put_port() before dropping the reference. Similar code is found in the function register_device() in the same file. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @exists@ local idexpression struct parport * x; expression ra,rr; statement S1,S2; @@ x = parport_find_number(...) ... when != x = rr when any when != parport_put_port(x,...) when != if (...) { ... parport_put_port(x,...) ...} ( if(<+...x...+>) S1 else S2 | if(...) { ... when != x = ra when forall when != parport_put_port(x,...) *return...; } ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()Lai Jiangshan4-55/+18
synchronize_rcu() does the stuff as needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26pid: fix typo in function descriptionSisir Koppaka1-1/+1
finds is misspelt as finr. No functional change. Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26fs/partitions/efi.c: corrupted GUID partition tables can cause kernel oopsTimo Warns1-0/+9
The kernel automatically evaluates partition tables of storage devices. The code for evaluating GUID partitions (in fs/partitions/efi.c) contains a bug that causes a kernel oops on certain corrupted GUID partition tables. This bug has security impacts, because it allows, for example, to prepare a storage device that crashes a kernel subsystem upon connecting the device (e.g., a "USB Stick of (Partial) Death"). crc = efi_crc32((const unsigned char *) (*gpt), le32_to_cpu((*gpt)->header_size)); computes a CRC32 checksum over gpt covering (*gpt)->header_size bytes. There is no validation of (*gpt)->header_size before the efi_crc32 call. A corrupted partition table may have large values for (*gpt)->header_size. In this case, the CRC32 computation access memory beyond the memory allocated for gpt, which may cause a kernel heap overflow. Validate value of GUID partition table header size. [akpm@linux-foundation.org: fix layout and indenting] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26drivers/char/mspec.c: use {k,v}zalloc to allocate memoryRakib Mullick1-3/+2
Let memory allocator initialize the allocated memory as null, thus remove the use of memset. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26ipmi: convert to seq_file interfaceAlexey Dobriyan3-86/+142
The ->read_proc interface is going away, convert to seq_file. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc:Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pagesOlaf Hering2-3/+54
The balloon driver in a Xen guest frees guest pages and marks them as mmio. When the kernel crashes and the crash kernel attempts to read the oldmem via /proc/vmcore a read from ballooned pages will generate 100% load in dom0 because Xen asks qemu-dm for the page content. Since the reads come in as 8byte requests each ballooned page is tried 512 times. With this change a hook can be registered which checks wether the given pfn is really ram. The hook has to return a value > 0 for ram pages, a value < 0 on error (because the hypercall is not known) and 0 for non-ram pages. This will reduce the time to read /proc/vmcore. Without this change a 512M guest with 128M crashkernel region needs 200 seconds to read it, with this change it takes just 2 seconds. Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26proc: fix pagemap_read() error caseKOSAKI Motohiro1-10/+9
Currently, pagemap_read() has three error and/or corner case handling mistake. (1) If ppos parameter is wrong, mm refcount will be leak. (2) If count parameter is 0, mm refcount will be leak too. (3) If the current task is sleeping in kmalloc() and the system is out of memory and oom-killer kill the proc associated task, mm_refcount prevent the task free its memory. then system may hang up. <Quote Hugh's explain why we shold call kmalloc() before get_mm()> check_mem_permission gets a reference to the mm. If we __get_free_page after check_mem_permission, imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. This patch fixes the above three. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jovi Zhang <bookjovi@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Stephen Wilson <wilsons@start.ca> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26proc: put check_mem_permission after __get_free_page in mem_writeKOSAKI Motohiro1-7/+9
It whould be better if put check_mem_permission after __get_free_page in mem_write, to be same as function mem_read. Hugh Dickins explained the reason. check_mem_permission gets a reference to the mm. If we __get_free_page after check_mem_permission, imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. Reported-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Stephen Wilson <wilsons@start.ca> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26proc/stat: use defined macro KMALLOC_MAX_SIZEYuanhan Liu1-3/+3
There is a macro for the max size kmalloc can allocate, so use it instead of a hardcoded number. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26proc: constify status arrayMike Frysinger1-2/+2
No need for this local array to be writable, so mark it const. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26fs/proc: convert to kstrtoX()Alexey Dobriyan2-11/+13
Convert fs/proc/ from strict_strto*() to kstrto*() functions. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26coredump: add support for exe_file in core nameJiri Slaby2-1/+40
Now, exe_file is not proc FS dependent, so we can use it to name core file. So we add %E pattern for core file name cration which extract path from mm_struct->exe_file. Then it converts slashes to exclamation marks and pastes the result to the core file name itself. This is useful for environments where binary names are longer than 16 character (the current->comm limitation). Also where there are binaries with same name but in a different path. Further in case the binery itself changes its current->comm after exec. So by doing (s/$/#/ -- # is treated as git comment): $ sysctl kernel.core_pattern='core.%p.%e.%E' $ ln /bin/cat cat45678901234567890 $ ./cat45678901234567890 ^Z $ rm cat45678901234567890 $ fg ^\Quit (core dumped) $ ls core* we now get: core.2434.cat456789012345.!root!cat45678901234567890 (deleted) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26mm: extract exe_file handling from procfsJiri Slaby6-82/+53
Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/. This was because exe_file was needed only for /proc/<pid>/exe. Since we will need the exe_file functionality also for core dumps (so core name can contain full binary path), built this functionality always into the kernel. To achieve that move that out of proc FS to the kernel/ where in fact it should belong. By doing that we can make dup_mm_exe_file static. Also we can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26kgdbts: unify/generalize gdb breakpoint adjustmentMike Frysinger4-18/+14
The Blackfin arch, like the x86 arch, needs to adjust the PC manually after a breakpoint is hit as normally this is handled by the remote gdb. However, rather than starting another arch ifdef mess, create a common GDB_ADJUSTS_BREAK_OFFSET define for any arch to opt-in via their kgdb.h. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Dongdong Deng <dongdong.deng@windriver.com> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26sh: convert to asm-generic ptrace.hMike Frysinger1-2/+4
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26x86: convert to asm-generic ptrace.hMike Frysinger1-13/+5
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26Blackfin: convert to asm-generic ptrace.hMike Frysinger1-3/+2
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26asm-generic/ptrace.h: start a common low level ptrace helperMike Frysinger1-0/+74
This is a series of low level ptrace unification steps to make it easier for common code (like KGDB) to poke at register state. This also avoids having to duplicate higher level operations for most ports which don't have special needs for accessing things. This patch: This implements a bunch of helper funcs for poking the registers of a ptrace structure. Now common code should be able to portably update specific registers (like kgdb updating the PC). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: add the pagefault count into memcg statsYing Han6-5/+65
Two new stats in per-memcg memory.stat which tracks the number of page faults and number of major page faults. "pgfault" "pgmajfault" They are different from "pgpgin"/"pgpgout" stat which count number of pages charged/discharged to the cgroup and have no meaning of reading/ writing page to disk. It is valuable to track the two stats for both measuring application's performance as well as the efficiency of the kernel page reclaim path. Counting pagefaults per process is useful, but we also need the aggregated value since processes are monitored and controlled in cgroup basis in memcg. Functional test: check the total number of pgfault/pgmajfault of all memcgs and compare with global vmstat value: $ cat /proc/vmstat | grep fault pgfault 1070751 pgmajfault 553 $ cat /dev/cgroup/memory.stat | grep fault pgfault 1071138 pgmajfault 553 total_pgfault 1071142 total_pgmajfault 553 $ cat /dev/cgroup/A/memory.stat | grep fault pgfault 199 pgmajfault 0 total_pgfault 199 total_pgmajfault 0 Performance test: run page fault test(pft) wit 16 thread on faulting in 15G anon pages in 16G container. There is no regression noticed on the "flt/cpu/s" Sample output from pft: TAG pft:anon-sys-default: Gb Thr CLine User System Wall flt/cpu/s fault/wsec 15 16 1 0.67s 233.41s 14.76s 16798.546 266356.260 +-------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 16682.962 17344.027 16913.524 16928.812 166.5362 + 10 16695.568 16923.896 16820.604 16824.652 84.816568 No difference proven at 95.0% confidence [akpm@linux-foundation.org: fix build] [hughd@google.com: shmem fix] Signed-off-by: Ying Han <yinghan@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: add memory.numastat api for numa statisticsYing Han1-0/+155
The new API exports numa_maps per-memcg basis. This is a piece of useful information where it exports per-memcg page distribution across real numa nodes. One of the usecases is evaluating application performance by combining this information w/ the cpu allocation to the application. The output of the memory.numastat tries to follow w/ simiar format of numa_maps like: total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ... file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ... anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ... And we have per-node: total = file + anon + unevictable $ cat /dev/cgroup/memory/memory.numa_stat total=250020 N0=87620 N1=52367 N2=45298 N3=64735 file=225232 N0=83402 N1=46160 N2=40522 N3=55148 anon=21053 N0=3424 N1=6207 N2=4776 N3=6646 unevictable=3735 N0=794 N1=0 N2=0 N3=2941 Signed-off-by: Ying Han <yinghan@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()Ying Han3-9/+9
The caller of the function has been renamed to zone_nr_lru_pages(), and this is just fixing up in the memcg code. The current name is easily to be mis-read as zone's total number of pages. Signed-off-by: Ying Han <yinghan@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: remove unused retry signal from reclaimJohannes Weiner1-1/+1
If the memcg reclaim code detects the target memcg below its limit it exits and returns a guaranteed non-zero value so that the charge is retried. Nowadays, the charge side checks the memcg limit itself and does not rely on this non-zero return value trick. This patch removes it. The reclaim code will now always return the true number of pages it reclaimed on its own. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Rik van Riel<riel@redhat.com> Acked-by: Ying Han<yinghan@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: fix get_scan_count() for small targetsKAMEZAWA Hiroyuki3-35/+34
During memory reclaim we determine the number of pages to be scanned per zone as (anon + file) >> priority. Assume scan = (anon + file) >> priority. If scan < SWAP_CLUSTER_MAX, the scan will be skipped for this time and priority gets higher. This has some problems. 1. This increases priority as 1 without any scan. To do scan in this priority, amount of pages should be larger than 512M. If pages>>priority < SWAP_CLUSTER_MAX, it's recorded and scan will be batched, later. (But we lose 1 priority.) If memory size is below 16M, pages >> priority is 0 and no scan in DEF_PRIORITY forever. 2. If zone->all_unreclaimabe==true, it's scanned only when priority==0. So, x86's ZONE_DMA will never be recoverred until the user of pages frees memory by itself. 3. With memcg, the limit of memory can be small. When using small memcg, it gets priority < DEF_PRIORITY-2 very easily and need to call wait_iff_congested(). For doing scan before priorty=9, 64MB of memory should be used. Then, this patch tries to scan SWAP_CLUSTER_MAX of pages in force...when 1. the target is enough small. 2. it's kswapd or memcg reclaim. Then we can avoid rapid priority drop and may be able to recover all_unreclaimable in a small zones. And this patch removes nr_saved_scan. This will allow scanning in this priority even when pages >> priority is very small. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Ying Han <yinghan@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: reclaim memory from nodes in round-robin orderYing Han3-7/+106
Presently, memory cgroup's direct reclaim frees memory from the current node. But this has some troubles. Usually when a set of threads works in a cooperative way, they tend to operate on the same node. So if they hit limits under memcg they will reclaim memory from themselves, damaging the active working set. For example, assume 2 node system which has Node 0 and Node 1 and a memcg which has 1G limit. After some work, file cache remains and the usages are Node 0: 1M Node 1: 998M. and run an application on Node 0, it will eat its foot before freeing unnecessary file caches. This patch adds round-robin for NUMA and adds equal pressure to each node. When using cpuset's spread memory feature, this will work very well. But yes, a better algorithm is needed. [akpm@linux-foundation.org: comment editing] [kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons] Signed-off-by: Ying Han <yinghan@google.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26MAINTAINERS: add mm/page_cgroup.c into memcg subsystemNamhyung Kim1-0/+1
AFAICS mm/page_cgroup.c is for memcg subsystem, but it was directed only to generic cgroup maintainers. Fix it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: move page-freeing code out of lockNamhyung Kim1-9/+13
Move page-freeing code out of swap_cgroup_mutex in the hope that it could reduce few of theoretical contentions between swapons and/or swapoffs. This is just a cleanup, no functional changes. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: fix off-by-one when calculating swap cgroup map lengthNamhyung Kim1-1/+1
It allocated one more page than necessary if @max_pages was a multiple of SC_PER_PAGE. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: mark init_section_page_cgroup() properlyNamhyung Kim1-2/+2
Commit ca371c0d7e23 ("memcg: fix page_cgroup fatal error in FLATMEM") removes call to alloc_bootmem() in the function so that it can be marked as __meminit to reduce memory usage when MEMORY_HOTPLUG=n. Also as the new helper function alloc_page_cgroup() is called only in the function, it should be marked too. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: remove pointless next_mz nullification in mem_cgroup_soft_limit_reclaim()Michal Hocko1-3/+2
next_mz is assigned to NULL if __mem_cgroup_largest_soft_limit_node selects the same mz. This doesn't make much sense as we assign to the variable right in the next loop. Compiler will probably optimize this out but it is little bit confusing for the code reading. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: add the soft_limit reclaim in global direct reclaim.Ying Han1-2/+14
We recently added the change in global background reclaim which counts the return value of soft_limit reclaim. Now this patch adds the similar logic on global direct reclaim. We should skip scanning global LRU on shrink_zone if soft_limit reclaim does enough work. This is the first step where we start with counting the nr_scanned and nr_reclaimed from soft_limit reclaim into global scan_control. Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26memcg: count the soft_limit reclaim in global background reclaimYing Han4-15/+39
The global kswapd scans per-zone LRU and reclaims pages regardless of the cgroup. It breaks memory isolation since one cgroup can end up reclaiming pages from another cgroup. Instead we should rely on memcg-aware target reclaim including per-memcg kswapd and soft_limit hierarchical reclaim under memory pressure. In the global background reclaim, we do soft reclaim before scanning the per-zone LRU. However, the return value is ignored. This patch is the first step to skip shrink_zone() if soft_limit reclaim does enough work. This is part of the effort which tries to reduce reclaiming pages in global LRU in memcg. The per-memcg background reclaim patchset further enhances the per-cgroup targetting reclaim, which I should have V4 posted shortly. Try running multiple memory intensive workloads within seperate memcgs. Watch the counters of soft_steal in memory.stat. $ cat /dev/cgroup/A/memory.stat | grep 'soft' soft_steal 240000 soft_scan 240000 total_soft_steal 240000 total_soft_scan 240000 This patch: In the global background reclaim, we do soft reclaim before scanning the per-zone LRU. However, the return value is ignored. We would like to skip shrink_zone() if soft_limit reclaim does enough work. Also, we need to make the memory pressure balanced across per-memcg zones, like the logic vm-core. This patch is the first step where we start with counting the nr_scanned and nr_reclaimed from soft_limit reclaim into the global scan_control. Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26mm: move enum vm_event_item into a standalone header fileAndrew Morton2-61/+65
enums are problematic because they cannot be forward-declared: akpm2:/home/akpm> cat t.c enum foo; static inline void bar(enum foo f) { } akpm2:/home/akpm> gcc -c t.c t.c:4: error: parameter 1 ('f') has incomplete type So move the enum's definition into a standalone header file which can be used wherever its definition is needed. Cc: Ying Han <yinghan@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>