summaryrefslogtreecommitdiff
path: root/drivers/md/dm-bufio.c
AgeCommit message (Collapse)AuthorFilesLines
2014-08-07dm bufio: fully initialize shrinkerGreg Thelen1-1/+1
commit d8c712ea471ce7a4fd1734ad2211adf8469ddddc upstream. 1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-30Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull core block IO changes from Jens Axboe: "The major piece in here is the immutable bio_ve series from Kent, the rest is fairly minor. It was supposed to go in last round, but various issues pushed it to this release instead. The pull request contains: - Various smaller blk-mq fixes from different folks. Nothing major here, just minor fixes and cleanups. - Fix for a memory leak in the error path in the block ioctl code from Christian Engelmayer. - Header export fix from CaiZhiyong. - Finally the immutable biovec changes from Kent Overstreet. This enables some nice future work on making arbitrarily sized bios possible, and splitting more efficient. Related fixes to immutable bio_vecs: - dm-cache immutable fixup from Mike Snitzer. - btrfs immutable fixup from Muthu Kumar. - bio-integrity fix from Nic Bellinger, which is also going to stable" * 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits) xtensa: fixup simdisk driver to work with immutable bio_vecs block/blk-mq-cpu.c: use hotcpu_notifier() blk-mq: for_each_* macro correctness block: Fix memory leak in rw_copy_check_uvector() handling bio-integrity: Fix bio_integrity_verify segment start bug block: remove unrelated header files and export symbol blk-mq: uses page->list incorrectly blk-mq: use __smp_call_function_single directly btrfs: fix missing increment of bi_remaining Revert "block: Warn and free bio if bi_end_io is not set" block: Warn and free bio if bi_end_io is not set blk-mq: fix initializing request's start time block: blk-mq: don't export blk_mq_free_queue() block: blk-mq: make blk_sync_queue support mq block: blk-mq: support draining mq queue dm cache: increment bi_remaining when bi_end_io is restored block: fixup for generic bio chaining block: Really silence spurious compiler warnings block: Silence spurious compiler warnings block: Kill bio_pair_split() ...
2014-01-14dm snapshot: use dm-bufio prefetchMikulas Patocka1-2/+12
This patch modifies dm-snapshot so that it prefetches the buffers when loading the exceptions. The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS macro. The current value for DM_PREFETCH_CHUNKS (12) was found to provide the best performance on a single 15k SCSI spindle. In the future we may modify this default or make it configurable. Also, introduce the function dm_bufio_set_minimum_buffers to setup bufio's number of internal buffers before freeing happens. dm-bufio may hold more buffers if enough memory is available. There is no guarantee that the specified number of buffers will be available - if you need a guarantee, use the argument reserved_buffers for dm_bufio_client_create. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-01-14dm snapshot: use dm-bufioMikulas Patocka1-0/+22
Use dm-bufio for initial loading of the exceptions. Introduce a new function dm_bufio_forget that frees the given buffer. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-12-31Merge tag 'v3.13-rc6' into for-3.14/coreJens Axboe1-0/+5
Needed to bring blk-mq uptodate, since changes have been going in since for-3.14/core was established. Fixup merge issues related to the immutable biovec changes. Signed-off-by: Jens Axboe <axboe@kernel.dk> Conflicts: block/blk-flush.c fs/btrfs/check-integrity.c fs/btrfs/extent_io.c fs/btrfs/scrub.c fs/logfs/dev_bdev.c
2013-12-10dm bufio: initialize read-only module parametersMikulas Patocka1-0/+5
Some module parameters in dm-bufio are read-only. These parameters inform the user about memory consumption. They are not supposed to be changed by the user. However, despite being read-only, these parameters can be set on modprobe or insmod command line, for example: modprobe dm-bufio current_allocated_bytes=12345 The kernel doesn't expect that these variables can be non-zero at module initialization and if the user sets them, it results in BUG. This patch initializes the variables in the module init routine, so that user-supplied values are ignored. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.2+
2013-11-23block: Abstract out bvec iteratorKent Overstreet1-1/+1
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-09-10drivers: convert shrinkers to new count/scan APIDave Chinner1-25/+39
Convert the driver shrinkers to the new API. Most changes are compile tested only because I either don't have the hardware or it's staging stuff. FWIW, the md and android code is pretty good, but the rest of it makes me want to claw my eyes out. The amount of broken code I just encountered is mind boggling. I've added comments explaining what is broken, but I fear that some of the code would be best dealt with by being dragged behind the bike shed, burying in mud up to it's neck and then run over repeatedly with a blunt lawn mower. Special mention goes to the zcache/zcache2 drivers. They can't co-exist in the build at the same time, they are under different menu options in menuconfig, they only show up when you've got the right set of mm subsystem options configured and so even compile testing is an exercise in pulling teeth. And that doesn't even take into account the horrible, broken code... [glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache] Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@openvz.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Kent Overstreet <koverstreet@google.com> Cc: John Stultz <john.stultz@linaro.org> Cc: David Rientjes <rientjes@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-10dm bufio: submit writes outside lockMikulas Patocka1-15/+58
This patch changes dm-bufio so that it submits write I/Os outside of the lock. If the number of submitted buffers is greater than the number of requests on the target queue, submit_bio blocks. We want to block outside of the lock to improve latency of other threads that may need the lock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10dm: use __GFP_HIGHMEM in __vmallocMikulas Patocka1-1/+1
Use __GFP_HIGHMEM in __vmalloc. Pages allocated with __vmalloc can be allocated in high memory that is not directly mapped to kernel space, so use __GFP_HIGHMEM just like vmalloc does. This patch reduces memory pressure slightly because pages can be allocated in the high zone. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm bufio: avoid a possible __vmalloc deadlockMikulas Patocka1-1/+23
This patch uses memalloc_noio_save to avoid a possible deadlock in dm-bufio. (it could happen only with large block size, at most PAGE_SIZE << MAX_ORDER (typically 8MiB). __vmalloc doesn't fully respect gfp flags. The specified gfp flags are used for allocation of requested pages, structures vmap_area, vmap_block and vm_struct and the radix tree nodes. However, the kernel pagetables are allocated always with GFP_KERNEL. Thus the allocation of pagetables can recurse back to the I/O layer and cause a deadlock. This patch uses the function memalloc_noio_save to set per-process PF_MEMALLOC_NOIO flag and the function memalloc_noio_restore to restore it. When this flag is set, all allocations in the process are done with implied GFP_NOIO flag, thus the deadlock can't happen. This should be backported to stable kernels, but they don't have the PF_MEMALLOC_NOIO flag and memalloc_noio_save/memalloc_noio_restore functions. So, PF_MEMALLOC should be set and restored instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20dm verity: avoid deadlockMikulas Patocka1-0/+2
A deadlock was found in the prefetch code in the dm verity map function. This patch fixes this by transferring the prefetch to a worker thread and skipping it completely if kmalloc fails. If generic_make_request is called recursively, it queues the I/O request on the current->bio_list without making the I/O request and returns. The routine making the recursive call cannot wait for the I/O to complete. The deadlock occurs when one thread grabs the bufio_client mutex and waits for an I/O to complete but the I/O is queued on another thread's current->bio_list and is waiting to get the mutex held by the first thread. The fix recognises that prefetching is not essential. If memory can be allocated, it queues the prefetch request to the worker thread, but if not, it does nothing. Signed-off-by: Paul Taysom <taysom@chromium.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@kernel.org
2013-03-01dm bufio: use WRITE_FLUSH instead of REQ_FLUSHMikulas Patocka1-1/+1
Use WRITE_FLUSH instead of REQ_FLUSH for submitted requests to make it consistent with the rest of the kernel. There is no functional change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-2/+1
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-12dm: use ACCESS_ONCE for sysfs valuesMikulas Patocka1-7/+3
Use the ACCESS_ONCE macro in dm-bufio and dm-verity where a variable can be modified asynchronously (through sysfs) and we want to prevent compiler optimizations that assume that the variable hasn't changed. (See Documentation/atomic_ops.txt.) Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-10-12dm bufio: use list_moveWei Yongjun1-2/+1
Use list_move() instead of list_del() + list_add(). spatch with a semantic match was used to find this. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28dm bufio: prefetchMikulas Patocka1-26/+82
This patch introduces a new function dm_bufio_prefetch. It prefetches the specified range of blocks into dm-bufio cache without waiting for i/o completion. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-02-03dm-bufio.c: there's no need to include linux/version.hJesper Juhl1-1/+0
As 'make versioncheck' points out, drivers/md/dm-bufio.c has no need to include linux/version.h, so this patch removes the unneeded include. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-11-07device-mapper: dm-bufio.c needs to include module.hStephen Rothwell1-0/+1
since it uses the module facilities. Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-31dm: add bufioMikulas Patocka1-0/+1699
The dm-bufio interface allows you to do cached I/O on devices, holding recently-read blocks in memory and performing delayed writes. We don't use buffer cache or page cache already present in the kernel, because: * we need to handle block sizes larger than a page * we can't allocate memory to perform reads or we'd have deadlocks Currently, when a cache is required, we limit its size to a fraction of available memory. Usage can be viewed and changed in /sys/module/dm_bufio/parameters/ . The first user is thin provisioning, but more dm users are planned. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>