From 0e9052eb98a9986ec0669d030604f7a68f6df638 Mon Sep 17 00:00:00 2001
From: Wu Fengguang
Date: Wed, 16 Dec 2009 12:19:57 +0100
Subject: page-types: add standard GPL license header

Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/page-types.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 7a7d9bab32e..66e9358e214 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -1,11 +1,22 @@
 /*
  * page-types: Tool for querying page flags
  *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; version 2.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should find a copy of v2 of the GNU General Public License somewhere on
+ * your Linux system; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place, Suite 330, Boston, MA 02111-1307 USA.
+ *
  * Copyright (C) 2009 Intel corporation
  *
  * Authors: Wu Fengguang
- *
- * Released under the General Public License (GPL).
  */
 
 #define _LARGEFILE64_SOURCE
-- 
cgit v1.2.3

From 847ce401df392b0704369fd3f75df614ac1414b4 Mon Sep 17 00:00:00 2001
From: Wu Fengguang
Date: Wed, 16 Dec 2009 12:19:58 +0100
Subject: HWPOISON: Add unpoisoning support

The unpoisoning interface is useful for stress testing tools to
reclaim poisoned pages (to prevent OOM).

There is no hardware level unpoisoning, so this cannot be used for
real memory errors, only for software injected errors.

Note that it may leak pages silently - those that have been removed from
the LRU cache, but not isolated from page cache/swap cache at hwpoison time.
In particular, stress tests of dirty swap cache pages should reboot the
system before memory is exhausted.

AK: Fix comments, add documentation, add printks, rename symbol

Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index 3ffadf8da61..f047e75acb2 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -98,10 +98,22 @@ madvise(MADV_POISON, ....)
 
 
 hwpoison-inject module through debugfs
-	/sys/debug/hwpoison/corrupt-pfn
 
-Inject hwpoison fault at PFN echoed into this file
+/sys/debug/hwpoison/
 
+corrupt-pfn
+
+Inject hwpoison fault at PFN echoed into this file.
+
+unpoison-pfn
+
+Software-unpoison page at PFN echoed into this file. This
+way a page can be reused again.
+This only works for Linux injected failures, not for real
+memory failures.
+
+Note these injection interfaces are not stable and might change between
+kernel versions
 
 Architecture specific MCE injector
 
-- 
cgit v1.2.3

From 7c116f2b0dbac4a1dd051c7a5e8cef37701cafd4 Mon Sep 17 00:00:00 2001
From: Wu Fengguang
Date: Wed, 16 Dec 2009 12:19:59 +0100
Subject: HWPOISON: add fs/device filters

Filesystem data/metadata present the most tricky-to-isolate pages.
It requires careful code review and stress testing to get them right.

The fs/device filter helps to target the stress tests to some specific
filesystem pages. The filter condition is the block device's major/minor
numbers:
- corrupt-filter-dev-major
- corrupt-filter-dev-minor
When specified (non -1), only page cache pages that belong to that
device will be poisoned.

The filters are checked reliably on the locked and refcounted page.
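
As a rough sketch of how a test script might drive these two filters, the
helper below looks up the major/minor numbers of a block device node and
writes them to the filter files. It assumes GNU stat (whose %t and %T
formats print the device numbers in hex), root privileges, a loaded
hwpoison-inject module, and debugfs mounted so that the files appear under
the directory named by HWPOISON_DIR. The function names and the
HWPOISON_DIR variable are hypothetical and not part of this patch.

```shell
#!/bin/sh
# Sketch only: HWPOISON_DIR and the function names are illustrative assumptions.
HWPOISON_DIR=${HWPOISON_DIR:-/sys/kernel/debug/hwpoison}

# Convert an unprefixed hex string (as printed by GNU stat %t/%T) to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# Restrict injection to page cache pages backed by the block device node $1.
set_dev_filter() {
    dev=$1
    major=$(hex_to_dec "$(stat -c '%t' "$dev")") || return 1
    minor=$(hex_to_dec "$(stat -c '%T' "$dev")") || return 1
    echo "$major" > "$HWPOISON_DIR/corrupt-filter-dev-major" &&
    echo "$minor" > "$HWPOISON_DIR/corrupt-filter-dev-minor"
}
```

A test would call something like `set_dev_filter /dev/sda1` before echoing
PFNs into corrupt-pfn; the device path here is only an example.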

Haicheng: clear PG_hwpoison and drop bad page count if filter not OK
AK: Add documentation

CC: Haicheng Li
CC: Nick Piggin
Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index f047e75acb2..fdf58046432 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -115,6 +115,13 @@ memory failures.
 Note these injection interfaces are not stable and might change between
 kernel versions
 
+corrupt-filter-dev-major
+corrupt-filter-dev-minor
+
+Only handle memory failures to pages associated with the file system defined
+by block device major/minor. -1U is the wildcard value.
+This should be only used for testing with artificial injection.
+
 Architecture specific MCE injector
 
 x86 has mce-inject, mce-test
-- 
cgit v1.2.3

From 31d3d3484f9bd263925ecaa341500ac2df3a5d9b Mon Sep 17 00:00:00 2001
From: Wu Fengguang
Date: Wed, 16 Dec 2009 12:19:59 +0100
Subject: HWPOISON: limit hwpoison injector to known page types

__memory_failure()'s workflow is

	set PG_hwpoison
	//...
	unset PG_hwpoison if didn't pass hwpoison filter

That could kill an unrelated process if it happens to page fault on the
page while PG_hwpoison is (temporarily) set. The race should be big
enough to appear in stress tests.

Fix it by grabbing the page and checking the filter at inject time. This
also avoids the very noisy "Injecting memory failure..." messages.

- we don't touch madvise() based injection, because the filters are
  generally not necessary for it.
- if we want to apply the filters to h/w aided injection, we had better
  rearrange the logic in __memory_failure() instead of this patch.

AK: fix documentation, use drain all, cleanups

CC: Haicheng Li
Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index fdf58046432..4ef7bb30d15 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -103,7 +103,8 @@ hwpoison-inject module through debugfs
 
 corrupt-pfn
 
-Inject hwpoison fault at PFN echoed into this file.
+Inject hwpoison fault at PFN echoed into this file. This does
+some early filtering to avoid corrupted unintended pages in test suites.
 
 unpoison-pfn
 
-- 
cgit v1.2.3

From 478c5ffc0b50527bd2390f2daa46cc16276b8413 Mon Sep 17 00:00:00 2001
From: Wu Fengguang
Date: Wed, 16 Dec 2009 12:19:59 +0100
Subject: HWPOISON: add page flags filter

When specified, only poison pages if ((page_flags & mask) == value).

- corrupt-filter-flags-mask
- corrupt-filter-flags-value

This allows stress testing of many kinds of pages.

Strictly speaking, buddy pages require taking the zone lock, to avoid
setting PG_hwpoison on a "was buddy but now allocated to someone" page.
However we can just do nothing because we set PG_locked in the
beginning, which prevents the page allocator from allocating it to
someone. (It will BUG() on the unexpected PG_locked, which is fine for
hwpoison testing.)

[AK: Add select PROC_PAGE_MONITOR to satisfy dependency]

CC: Nick Piggin
Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index 4ef7bb30d15..f454d3cd4d6 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -123,6 +123,16 @@ Only handle memory failures to pages associated with the file system defined
 by block device major/minor. -1U is the wildcard value.
 This should be only used for testing with artificial injection.
 
+
+corrupt-filter-flags-mask
+corrupt-filter-flags-value
+
+When specified, only poison pages if ((page_flags & mask) == value).
+This allows stress testing of many kinds of pages. The page_flags
+are the same as in /proc/kpageflags. The flag bits are defined in
+include/linux/kernel-page-flags.h and documented in
+Documentation/vm/pagemap.txt
+
 Architecture specific MCE injector
 
 x86 has mce-inject, mce-test
-- 
cgit v1.2.3

From 4fd466eb46a6a917c317a87fb94bfc7252a0f7ed Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Wed, 16 Dec 2009 12:19:59 +0100
Subject: HWPOISON: add memory cgroup filter

The hwpoison test suite needs to inject hwpoison into a collection of
selected task pages, and must not touch pages not owned by them and
thus kill important system processes such as init. (But it's OK to
mis-hwpoison free/unowned pages as well as shared clean pages.
Mis-hwpoison of shared dirty pages will kill all tasks, so the test
suite will target all or none of such tasks in the first place.)

The memory cgroup serves this purpose well. We can put the target
processes under the control of a memory cgroup, and tell the hwpoison
injection code to only kill pages associated with some active memory
cgroup.

The prerequisite for doing hwpoison stress tests with mem_cgroup is
that the mem_cgroup code tracks task pages _accurately_ (unless the page
is locked), which we believe is/should be true.

The benefits are simplification of the hwpoison injector code. Also the
mem_cgroup code will automatically be tested by hwpoison test cases.

The alternative interfaces pin-pfn/unpin-pfn can also delegate the
(process and page flags) filtering functions reliably to user space.
However prototype implementation shows that this scheme adds more
complexity than we wanted.

Example test case:

        mkdir /cgroup/hwpoison

        usemem -m 100 -s 1000 &
        echo `jobs -p` > /cgroup/hwpoison/tasks

        memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ')
        echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg

        page-types -p `pidof init` --hwpoison # shall do nothing
        page-types -p `pidof usemem` --hwpoison # poison its pages

[AK: Fix documentation]
[Add fix for problem noticed by Li Zefan; dentry in the css could be NULL]

CC: KOSAKI Motohiro
CC: Hugh Dickins
CC: Daisuke Nishimura
CC: Balbir Singh
CC: KAMEZAWA Hiroyuki
CC: Li Zefan
CC: Paul Menage
CC: Nick Piggin
CC: Andi Kleen
Signed-off-by: Wu Fengguang
Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index f454d3cd4d6..989e5afe740 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -123,6 +123,22 @@ Only handle memory failures to pages associated with the file system defined
 by block device major/minor. -1U is the wildcard value.
 This should be only used for testing with artificial injection.
 
+corrupt-filter-memcg
+
+Limit injection to pages owned by memgroup. Specified by inode number
+of the memcg.
+
+Example:
+        mkdir /cgroup/hwpoison
+
+        usemem -m 100 -s 1000 &
+        echo `jobs -p` > /cgroup/hwpoison/tasks
+
+        memcg_ino=$(ls -id /cgroup/hwpoison | cut -f1 -d' ')
+        echo $memcg_ino > /debug/hwpoison/corrupt-filter-memcg
+
+        page-types -p `pidof init` --hwpoison # shall do nothing
+        page-types -p `pidof usemem` --hwpoison # poison its pages
 
 corrupt-filter-flags-mask
 corrupt-filter-flags-value
-- 
cgit v1.2.3

From fe194d3e100dea323d7b2de96d3b44d0c067ba7a Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Wed, 16 Dec 2009 12:20:00 +0100
Subject: HWPOISON: Use correct name for MADV_HWPOISON in documentation

Signed-off-by: Andi Kleen
---
 Documentation/vm/hwpoison.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'Documentation')

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index 989e5afe740..12f9ba20ccb 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -92,7 +92,7 @@ PR_MCE_KILL_GET
 
 Testing:
 
-madvise(MADV_POISON, ....)
+madvise(MADV_HWPOISON, ....)
 	(as root)
 	Poison a page in the process for testing
 
-- 
cgit v1.2.3

From facb6011f3993947283fa15d039dacb4ad140230 Mon Sep 17 00:00:00 2001
From: Andi Kleen
Date: Wed, 16 Dec 2009 12:20:00 +0100
Subject: HWPOISON: Add soft page offline support

This is a simpler, gentler variant of memory_failure() for soft page
offlining controlled from user space. It doesn't kill anything, just
tries to invalidate and if that doesn't work migrate the page away.

This is useful for predictive failure analysis, where a page has
a high rate of corrected errors, but hasn't gone bad yet. Instead
it can be offlined early and avoided.

The offlining is controlled from sysfs, including a new generic
entry point for hard page offlining for symmetry too.

We use the page isolate facility to prevent re-allocation
race. Normally this is only used by memory hotplug. To avoid
races with memory allocation I am using lock_system_sleep().
This avoids the situation where memory hotplug is about to isolate
a page range and then hwpoison undoes that work. This is a big hammer,
but currently the simplest solution.

When the page is not free or LRU we try to free pages
from slab and other caches. The slab freeing is currently
quite dumb and does not try to focus on the specific slab
cache which might own the page. This could potentially be
improved later.

Thanks to Fengguang Wu and Haicheng Li for some fixes.

[Added fix from Andrew Morton to adapt to new migrate_pages prototype]

Signed-off-by: Andi Kleen
---
 .../ABI/testing/sysfs-memory-page-offline | 44 ++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-memory-page-offline

(limited to 'Documentation')

diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
new file mode 100644
index 00000000000..e14703f12fd
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -0,0 +1,44 @@
+What:		/sys/devices/system/memory/soft_offline_page
+Date:		Sep 2009
+KernelVersion:	2.6.33
+Contact:	andi@firstfloor.org
+Description:
+		Soft-offline the memory page containing the physical address
+		written into this file. Input is a hex number specifying the
+		physical address of the page. The kernel will then attempt
+		to soft-offline it, by moving the contents elsewhere or
+		dropping it if possible. The kernel will then be placed
+		on the bad page list and never be reused.
+
+		The offlining is done in kernel specific granuality.
+		Normally it's the base page size of the kernel, but
+		this might change.
+
+		The page must be still accessible, not poisoned. The
+		kernel will never kill anything for this, but rather
+		fail the offline. Return value is the size of the
+		number, or a error when the offlining failed. Reading
+		the file is not allowed.
+
+What:		/sys/devices/system/memory/hard_offline_page
+Date:		Sep 2009
+KernelVersion:	2.6.33
+Contact:	andi@firstfloor.org
+Description:
+		Hard-offline the memory page containing the physical
+		address written into this file. Input is a hex number
+		specifying the physical address of the page. The
+		kernel will then attempt to hard-offline the page, by
+		trying to drop the page or killing any owner or
+		triggering IO errors if needed. Note this may kill
+		any processes owning the page. The kernel will avoid
+		to access this page assuming it's poisoned by the
+		hardware.
+
+		The offlining is done in kernel specific granuality.
+		Normally it's the base page size of the kernel, but
+		this might change.
+
+		Return value is the size of the number, or a error when
+		the offlining failed.
+		Reading the file is not allowed.
-- 
cgit v1.2.3
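
To illustrate the two sysfs entries described in the last patch, here is a
minimal shell sketch that turns a page frame number into the hex physical
address those files expect and writes it to soft_offline_page. The helper
names, the MEMORY_SYSFS variable, the optional page-size argument, and the
use of `getconf PAGESIZE` as its default are assumptions made for
illustration; the 0x prefix is likewise a guess at an accepted input form.
Writing requires root and a kernel that provides these files.

```shell
#!/bin/sh
# Sketch only: helper names and MEMORY_SYSFS are illustrative assumptions.
MEMORY_SYSFS=/sys/devices/system/memory

# Print the physical address of page frame $1 in hex.
# $2 is the page size in bytes; it defaults to the system page size.
pfn_to_phys_hex() {
    printf '0x%x\n' $(( $1 * ${2:-$(getconf PAGESIZE)} ))
}

# Ask the kernel to soft-offline the page that contains page frame $1.
soft_offline_pfn() {
    pfn_to_phys_hex "$1" > "$MEMORY_SYSFS/soft_offline_page"
}
```

The same address could be written to hard_offline_page instead, keeping in
mind that, per the description above, doing so may kill processes that own
the page.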