From a368ab67aa55615a03b2c9c00fb965bee3ebeaa4 Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Tue, 7 Apr 2015 14:26:41 -0700
Subject: mm: move zone lock to a different cache line than order-0 free page lists

Huang Ying reported the following problem due to commit 3484b2de9499 ("mm:
rearrange zone fields into read-only, page alloc, statistics and page reclaim
lines") from the Intel performance tests

  24b7e5819ad5cbef  3484b2de9499df23c4604a513b
  ----------------  --------------------------
       %stddev          %change        %stddev
           \               |               \
   152288 ± 0%          -46.2%       81911 ± 0%  aim7.jobs-per-min
      237 ± 0%          +85.6%         440 ± 0%  aim7.time.elapsed_time
      237 ± 0%          +85.6%         440 ± 0%  aim7.time.elapsed_time.max
    25026 ± 0%          +70.7%       42712 ± 0%  aim7.time.system_time
  2186645 ± 5%          +32.0%     2885949 ± 4%  aim7.time.voluntary_context_switches
  4576561 ± 1%          +24.9%     5715773 ± 0%  aim7.time.involuntary_context_switches

The problem is specific to very large machines under stress.  It was not
reproducible with the machines I had used to justify the original patch
because large numbers of CPUs are required.  When pressure is high enough,
the cache line is bouncing between CPUs trying to acquire the lock and the
holder of the lock adjusting free lists.  The intention was that the
acquirer of the lock would automatically have the cache line holding the
free lists but according to Huang, this is not a universal win.

One possibility is to move the zone lock to its own cache line but it
increases the size of the zone.  This patch moves the lock to the other
end of the free lists where they do not contend under high pressure.  It
does mean the page allocator paths now require more cache lines but Huang
reports that it restores performance to previous levels on large machines.

       %stddev          %change        %stddev
           \               |               \
    84568 ± 1%          +94.3%      164280 ± 1%  aim7.jobs-per-min
  2881944 ± 2%          -35.1%     1870386 ± 8%  aim7.time.voluntary_context_switches
      681 ± 1%           -3.4%         658 ± 0%  aim7.time.user_time
  5538139 ± 0%          -12.1%     4867884 ± 0%  aim7.time.involuntary_context_switches
    44174 ± 1%          -46.0%       23848 ± 1%  aim7.time.system_time
      426 ± 1%          -48.4%         219 ± 1%  aim7.time.elapsed_time
      426 ± 1%          -48.4%         219 ± 1%  aim7.time.elapsed_time.max
      468 ± 1%          -43.1%         266 ± 2%  uptime.boot

Signed-off-by: Mel Gorman
Reported-by: Huang Ying
Tested-by: Huang Ying
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/mmzone.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f279d9c158cd..2782df47101e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -474,16 +474,15 @@ struct zone {
 	unsigned long		wait_table_bits;
 
 	ZONE_PADDING(_pad1_)
-
-	/* Write-intensive fields used from the page allocator */
-	spinlock_t		lock;
-
 	/* free areas of different sizes */
 	struct free_area	free_area[MAX_ORDER];
 
 	/* zone flags, see below */
 	unsigned long		flags;
 
+	/* Write-intensive fields used from the page allocator */
+	spinlock_t		lock;
+
 	ZONE_PADDING(_pad2_)
 
 	/* Write-intensive fields used by page reclaim */
--
cgit v1.2.3
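
The patch above relies on the ZONE_PADDING trick: a zero-size, cacheline-aligned
member forces the next field onto a fresh cache line, so groups of fields with
different access patterns do not share a line.  The following is a minimal
standalone sketch of that idea, not the kernel's definitions: the struct, the
field names and the PAD_TO_CACHELINE macro are illustrative stand-ins, the
zero-length array is a GNU extension (GCC/Clang), and a 64-byte cache line is
assumed.

    #include <stddef.h>
    #include <stdio.h>

    /* Illustrative stand-in for the kernel's ZONE_PADDING: a zero-size,
     * cacheline-aligned member pushes the next field onto a new cache
     * line without adding a full line of padding in most layouts. */
    #define CACHELINE_BYTES 64
    #define PAD_TO_CACHELINE(name) \
            char name[0] __attribute__((aligned(CACHELINE_BYTES)));

    struct demo_zone {
            unsigned long   watermarks[3];  /* read-mostly fields */

            PAD_TO_CACHELINE(_pad1_)

            /* Write-intensive allocator fields; the lock sits at the
             * far end of the free lists, as in the patch above, so
             * acquirers do not bounce the line the holder is writing. */
            unsigned long   free_area[11];
            unsigned long   flags;
            int             lock;           /* stand-in for spinlock_t */

            PAD_TO_CACHELINE(_pad2_)

            unsigned long   reclaim_stat;   /* page-reclaim fields */
    };

    int main(void)
    {
            /* Each padded group starts on its own 64-byte boundary. */
            printf("free_area @ %zu, lock @ %zu, reclaim @ %zu\n",
                   offsetof(struct demo_zone, free_area),
                   offsetof(struct demo_zone, lock),
                   offsetof(struct demo_zone, reclaim_stat));
            return 0;
    }
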
From ce66b032ad7b838bf376e3b1bb4d8bce1a69ee5c Mon Sep 17 00:00:00 2001
From: Mark Brown
Date: Tue, 7 Apr 2015 14:26:44 -0700
Subject: include/linux/dmapool.h: declare struct device

dmapool uses struct device in function arguments but relies on an
implicit inclusion to declare struct device, causing warnings in some
configurations:

  include/linux/dmapool.h:31:7: warning: 'struct device' declared inside parameter list

Fix this by adding a struct device declaration to the file.

Signed-off-by: Mark Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 include/linux/dmapool.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/dmapool.h b/include/linux/dmapool.h
index 022e34fcbd1b..52456aa566a0 100644
--- a/include/linux/dmapool.h
+++ b/include/linux/dmapool.h
@@ -14,6 +14,8 @@
 #include <asm/io.h>
 #include <asm/scatterlist.h>
 
+struct device;
+
 struct dma_pool *dma_pool_create(const char *name, struct device *dev,
 			size_t size, size_t align, size_t allocation);
--
cgit v1.2.3

From 6b79c57b92cdd90853002980609af516d14c4f9c Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi
Date: Tue, 7 Apr 2015 14:26:47 -0700
Subject: mm: numa: disable change protection for vma(VM_HUGETLB)

Currently when a process accesses a hugetlb range protected with
PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb
subsystem into a broken/uncontrollable state, where for example
h->resv_huge_pages is subtracted too much and wraps around to a very
large number, and the free hugepage pool is no longer maintainable.

This patch simply stops changing protection for vma(VM_HUGETLB) to fix
the problem.  This also avoids the useless overhead of the resulting
minor faults.

Signed-off-by: Naoya Horiguchi
Suggested-by: Mel Gorman
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: David Rientjes
Cc: Rik van Riel
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
---
 kernel/sched/fair.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bcfe32088b37..241213be507c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2165,8 +2165,10 @@ void task_numa_work(struct callback_head *work)
 		vma = mm->mmap;
 	}
 	for (; vma; vma = vma->vm_next) {
-		if (!vma_migratable(vma) || !vma_policy_mof(vma))
+		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
+			is_vm_hugetlb_page(vma)) {
 			continue;
+		}
 
 		/*
 		 * Shared library pages mapped by multiple processes are not
--
cgit v1.2.3
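
The dmapool patch earlier in this series is an instance of a generic C
pattern: when a header only handles pointers to a type, a forward declaration
of the incomplete type is enough, and the header no longer depends on its
includers having defined the struct first.  A minimal standalone reproduction
follows; the file name, guard and function names are made up for illustration,
not the kernel's.

    /* pool.h -- without the forward declaration below, a 'struct device'
     * first mentioned inside a prototype's parameter list has a scope of
     * just that one declaration, so GCC warns:
     *     'struct device' declared inside parameter list
     * and callers passing a real 'struct device *' hit a type mismatch.
     */
    #ifndef POOL_H
    #define POOL_H

    #include <stddef.h>

    struct device;          /* incomplete type: pointers to it are fine */

    void *pool_create(struct device *dev, size_t size);
    void  pool_destroy(struct device *dev, void *pool);

    #endif /* POOL_H */

A forward declaration is also cheaper than pulling in the full definition via
another #include: it keeps the header's dependencies, and rebuild cascades,
to a minimum.
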