Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes and a virtio block fix from Rusty Russell:
"Various minor fixes, but a slightly more complex one to fix the
per-cpu overload problem introduced recently by kvm id changes."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: put modules in list much earlier.
module: add new state MODULE_STATE_UNFORMED.
module: prevent warning when finit_module a 0 sized file
virtio-blk: Don't free ida when disk is in use
|
|
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
__devinitconst, and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Chirag Kantharia <chirag.kantharia@hp.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tao Guo <Tao.Guo@emc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When a file system is mounted on a virtio-blk disk, we then remove it
and then reattach it, the reattached disk gets the same disk name and
ids as the hot removed one.
This leads to very nasty effects - mostly rendering the newly attached
device completely unusable.
Trying what happens when I do the same thing with a USB device, I saw
that the sd node simply doesn't get free'd when a device gets forcefully
removed.
Imitate the same behavior for vd devices. This way broken vd devices
simply are never free'd and newly attached ones keep working just fine.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
|
|
This reduces unnecessary interrupts that host could send to guest while
guest is in the progress of irq handling.
If one vcpu is handling the irq, while another interrupt comes, in
handle_edge_irq(), the guest will mask the interrupt via mask_msi_irq()
which is a very heavy operation that goes all the way down to host.
Here are some performance numbers on qemu:
Before:
-------------------------------------
seq-read : io=0 B, bw=269730KB/s, iops=67432 , runt= 62200msec
seq-write : io=0 B, bw=339716KB/s, iops=84929 , runt= 49386msec
rand-read : io=0 B, bw=270435KB/s, iops=67608 , runt= 62038msec
rand-write: io=0 B, bw=354436KB/s, iops=88608 , runt= 47335msec
clat (usec): min=101 , max=138052 , avg=14822.09, stdev=11771.01
clat (usec): min=96 , max=81543 , avg=11798.94, stdev=7735.60
clat (usec): min=128 , max=140043 , avg=14835.85, stdev=11765.33
clat (usec): min=109 , max=147207 , avg=11337.09, stdev=5990.35
cpu : usr=15.93%, sys=60.37%, ctx=7764972, majf=0, minf=54
cpu : usr=32.73%, sys=120.49%, ctx=7372945, majf=0, minf=1
cpu : usr=18.84%, sys=58.18%, ctx=7775420, majf=0, minf=1
cpu : usr=24.20%, sys=59.85%, ctx=8307886, majf=0, minf=0
vdb: ios=8389107/8368136, merge=0/0, ticks=19457874/14616506,
in_queue=34206098, util=99.68%
43: interrupt in total: 887320
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
--ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
--ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
--filename=/dev/vdb --name=seq-read --stonewall --rw=read
--name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
--rw=randread --name=rnd-write --stonewall --rw=randwrite
After:
-------------------------------------
seq-read : io=0 B, bw=309503KB/s, iops=77375 , runt= 54207msec
seq-write : io=0 B, bw=448205KB/s, iops=112051 , runt= 37432msec
rand-read : io=0 B, bw=311254KB/s, iops=77813 , runt= 53902msec
rand-write: io=0 B, bw=377152KB/s, iops=94287 , runt= 44484msec
clat (usec): min=81 , max=90588 , avg=12946.06, stdev=9085.94
clat (usec): min=57 , max=72264 , avg=8967.97, stdev=5951.04
clat (usec): min=29 , max=101046 , avg=12889.95, stdev=9067.91
clat (usec): min=52 , max=106152 , avg=10660.56, stdev=4778.19
cpu : usr=15.05%, sys=57.92%, ctx=7710941, majf=0, minf=54
cpu : usr=26.78%, sys=101.40%, ctx=7387891, majf=0, minf=2
cpu : usr=19.03%, sys=58.17%, ctx=7681976, majf=0, minf=8
cpu : usr=24.65%, sys=58.34%, ctx=8442632, majf=0, minf=4
vdb: ios=8389086/8361888, merge=0/0, ticks=17243780/12742010,
in_queue=30078377, util=99.59%
43: interrupt in total: 1259639
fio --exec_prerun="echo 3 > /proc/sys/vm/drop_caches" --group_reporting
--ioscheduler=noop --thread --bs=4k --size=512MB --direct=1 --numjobs=16
--ioengine=libaio --iodepth=64 --loops=3 --ramp_time=0
--filename=/dev/vdb --name=seq-read --stonewall --rw=read
--name=seq-write --stonewall --rw=write --name=rnd-read --stonewall
--rw=randread --name=rnd-write --stonewall --rw=randwrite
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Smatch complains about the inconsistent NULL checking here. Fix it to
return NULL on failure.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (fixed accidental deletion)
|
|
We need to support both REQ_FLUSH and REQ_FUA for bio based path since
it does not get the sequencing of REQ_FUA into REQ_FLUSH that request
based drivers can request.
REQ_FLUSH is emulated by:
A) If the bio has no data to write:
1. Send VIRTIO_BLK_T_FLUSH to device,
2. In the flush I/O completion handler, finish the bio
B) If the bio has data to write:
1. Send VIRTIO_BLK_T_FLUSH to device
2. In the flush I/O completion handler, send the actual write data to device
3. In the write I/O completion handler, finish the bio
REQ_FUA is emulated by:
1. Send the actual write data to device
2. In the write I/O completion handler, send VIRTIO_BLK_T_FLUSH to device
3. In the flush I/O completion handler, finish the bio
Changes in v7:
- Using vbr->flags to trace request type
- Dropped unnecessary struct virtio_blk *vblk parameter
- Reuse struct virtblk_req in bio done function
Cahnges in v6:
- Reworked REQ_FLUSH and REQ_FUA emulatation order
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This patch introduces bio-based IO path for virtio-blk.
Compared to request-based IO path, bio-based IO path uses driver
provided ->make_request_fn() method to bypasses the IO scheduler. It
handles the bio to device directly without allocating a request in block
layer. This reduces the IO path in guest kernel to achieve high IOPS
and lower latency. The downside is that guest can not use the IO
scheduler to merge and sort requests. However, this is not a big problem
if the backend disk in host side uses faster disk device.
When the bio-based IO path is not enabled, virtio-blk still uses the
original request-based IO path, no performance difference is observed.
Using a slow device e.g. normal SATA disk, the bio-based IO path for
sequential read and write are slower than req-based IO path due to lack
of merge in guest kernel. So we make the bio-based path optional.
Performance evaluation:
-----------------------------
1) Fio test is performed in a 8 vcpu guest with ramdisk based guest using
kvm tool.
Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 28%, 24%, 21%, 16%
Latency improvement: 32%, 17%, 21%, 16%
Long version:
With bio-based IO path:
seq-read : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec
clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
With request-based IO path:
seq-read : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985
2) Fio test is performed in a 8 vcpu guest with Fusion-IO based guest using
kvm tool.
Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : 11%, 11%, 13%, 10%
Latency improvement: 10%, 10%, 12%, 10%
Long Version:
With bio-based IO path:
read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
With request-based IO path:
read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409
3) Fio test is performed in a 8 vcpu guest with normal SATA based guest
using kvm tool.
Short version:
With bio-based IO path, sequential read/write, random read/write
IOPS boost : -10%, -10%, 4.4%, 0.5%
Latency improvement: -12%, -15%, 2.5%, 0.8%
Long Version:
With bio-based IO path:
read : io=124812KB, bw=36537KB/s, iops=9060 , runt= 3416msec
write: io=169180KB, bw=24406KB/s, iops=6065 , runt= 6932msec
read : io=256200KB, bw=2089.3KB/s, iops=520 , runt=122630msec
write: io=257988KB, bw=1545.7KB/s, iops=384 , runt=166910msec
clat (msec): min=1 , max=1527 , avg=28.06, stdev=89.54
clat (msec): min=2 , max=344 , avg=41.12, stdev=38.70
clat (msec): min=8 , max=1984 , avg=490.63, stdev=207.28
clat (msec): min=33 , max=4131 , avg=659.19, stdev=304.71
cpu : usr=4.85%, sys=17.15%, ctx=31593, majf=0, minf=7
cpu : usr=3.04%, sys=11.45%, ctx=39377, majf=0, minf=0
cpu : usr=0.47%, sys=1.59%, ctx=262986, majf=0, minf=16
cpu : usr=0.47%, sys=1.46%, ctx=337410, majf=0, minf=0
With request-based IO path:
read : io=150120KB, bw=40420KB/s, iops=10037 , runt= 3714msec
write: io=194932KB, bw=27029KB/s, iops=6722 , runt= 7212msec
read : io=257136KB, bw=2001.1KB/s, iops=498 , runt=128443msec
write: io=258276KB, bw=1537.2KB/s, iops=382 , runt=168028msec
clat (msec): min=1 , max=1542 , avg=24.84, stdev=32.45
clat (msec): min=3 , max=628 , avg=35.62, stdev=39.71
clat (msec): min=8 , max=2540 , avg=503.28, stdev=236.97
clat (msec): min=41 , max=4398 , avg=653.88, stdev=302.61
cpu : usr=3.91%, sys=15.75%, ctx=26968, majf=0, minf=23
cpu : usr=2.50%, sys=10.56%, ctx=19090, majf=0, minf=0
cpu : usr=0.16%, sys=0.43%, ctx=20159, majf=0, minf=16
cpu : usr=0.18%, sys=0.53%, ctx=81364, majf=0, minf=0
How to use:
-----------------------------
Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk
use_bio=1' to enable ->make_request_fn() based I/O path.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature,
which exposes the cache mode in the configuration space and lets the
driver modify it. The cache mode is exposed via sysfs.
Even if the host does not support the new feature, the cache mode is
visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().
The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.
if (q->queue_lock != &q->__queue_lock)
q->queue_lock = &q->__queue_lock;
However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/3587:
#0: (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250
Other drivers use block layer provided spinlock as well, e.g. SCSI.
Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.
Changes in v2: Improve commit log as Michael suggested.
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
blk_cleanup_queue() will call blk_drian_queue() to drain all the
requests before queue DEAD marking. If we reset the device before
blk_cleanup_queue() the drain would fail.
1) if the queue is stopped in do_virtblk_request() because device is
full, the q->request_fn() will not be called.
blk_drain_queue() {
while(true) {
...
if (!list_empty(&q->queue_head))
__blk_run_queue(q) {
if (queue is not stoped)
q->request_fn()
}
...
}
}
Do no reset the device before blk_cleanup_queue() gives the chance to
start the queue in interrupt handler blk_done().
2) In commit b79d866c8b7014a51f611a64c40546109beaf24a, We abort requests
dispatched to driver before blk_cleanup_queue(). There is a race if
requests are dispatched to driver after the abort and before the queue
DEAD mark. To fix this, instead of aborting the requests explicitly, we
can just reset the device after after blk_cleanup_queue so that the
device can complete all the requests before queue DEAD marking in the
drain process.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
del_gendisk() might not return due to failing to remove the
/sys/block/vda/serial sysfs entry when another thread (udev) is
trying to read it.
virtblk_remove()
vdev->config->reset() : guest will not kick us through interrupt
del_gendisk()
device_del()
kobject_del(): got stuck, sysfs entry ref count non zero
sysfs_open_file(): user space process read /sys/block/vda/serial
sysfs_get_active() : got sysfs entry ref count
dev_attr_show()
virtblk_serial_show()
blk_execute_rq() : got stuck, interrupt is disabled
request cannot be finished
This patch fixes it by calling del_gendisk() before we disable guest's
interrupt so that the request sent in virtblk_serial_show() will be
finished and del_gendisk() will success.
This fixes another race in hot-unplug process.
It is save to call del_gendisk(vblk->disk) before
flush_work(&vblk->config_work) which might access vblk->disk, because
vblk->disk is not freed until put_disk(vblk->disk).
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Benchmark shows small performance improvement on fusion io device.
Before:
seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec
seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec
rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec
rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec
After:
seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec
seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec
rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec
rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
If we reset the virtio-blk device before the requests already dispatched
to the virtio-blk driver from the block layer are finised, we will stuck
in blk_cleanup_queue() and the remove will fail.
blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued
before DEAD marking. However it will never success if the device is
already stopped. We'll have q->in_flight[] > 0, so the drain will not
finish.
How to reproduce the race:
1. hot-plug a virtio-blk device
2. keep reading/writing the device in guest
3. hot-unplug while the device is busy serving I/O
Test:
~1000 rounds of hot-plug/hot-unplug test passed with this patch.
Changes in v3:
- Drop blk_abort_queue and blk_abort_request
- Use __blk_end_request_all to complete request dispatched to driver
Changes in v2:
- Drop req_in_flight
- Use virtqueue_detach_unused_buf to get request dispatched to driver
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Pull virtio fixes from Michael S. Tsirkin:
"Here are some virtio fixes for 3.4: a test build fix, a patch by Ren
fixing naming for systems with a massive number of virtio blk devices,
and balloon fixes for powerpc by David Gibson.
There was some discussion about Ren's patch for virtio disc naming:
some people wanted to move the legacy name mangling function to the
block core. But there's no concensus on that yet, and we can always
deduplicate later. Added comments in the hope that this will stop
people from copying this legacy naming scheme into future drivers."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: fix handling of PAGE_SIZE != 4k
virtio_balloon: Fix endian bug
virtio_blk: helper function to format disk names
tools/virtio: fix up vhost/test module build
|
|
The current virtio block's naming algorithm just supports 18278
(26^3 + 26^2 + 26) disks. If there are more virtio blocks,
there will be disks with the same name.
Based on commit 3e1a7ff8a0a7b948f2684930166954f9e8e776fe, add
a function "virtblk_name_format()" for virtio block to support mass
of disks naming.
Notes:
- Our naming scheme is ugly. We are stuck with it
for virtio but don't use it for any new driver:
new drivers should name their devices PREFIX%d
where the sequence number can be allocated by ida
- sd_format_disk_name has exactly the same logic.
Moving it to a central place was deferred over worries
that this will make people keep using the legacy naming
in new drivers.
We kept code idential in case someone wants to deduplicate later.
Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
Acked-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
If a virtio disk is open in guest and a disk resize operation is done,
(virsh blockresize), new size is not visible to tools like "fdisk -l".
This seems to be happening as we update only part->nr_sects and not
bdev->bd_inode size.
Call revalidate_disk() which should take care of it. I tested growing disk
size of already open disk and it works for me.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
The function will then be enhanced to detect partition block devices
and, in that case, subject the ioctls to whitelisting.
Cc: linux-scsi@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Delete the vq and flush any pending requests from the block queue on the
freeze callback to prepare for hibernation.
Re-create the vq in the restore callback to resume normal function.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
The probe and PM restore functions will share this code.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Fix a theoretical race related to config work
handler: a config interrupt might happen
after we flush config work but before we
reset the device. It will then cause the
config work to run during or after reset.
Two problems with this:
- if this runs after device is gone we will get use after free
- access of config while reset is in progress is racy
(as layout is changing).
As a solution
1. flush after reset when we know there will be no more interrupts
2. add a flag to disable config access before reset
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GPF_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
|
|
Based on a patch by Mark Wu <dwu@redhat.com>
Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while. It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
We want to remove the implicit everywhere presence of module.h
so fix up the people relying on that implicit presence in advance.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
|
It is easier to figure out the context by reading SCSI_SENSE_BUFFERSIZE
instead of plain '96'.
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Wire up the virtio_driver config_changed method to get notified about
config changes raised by the host. For now we just re-read the device
size to support online resizing of devices, but once we add more
attributes that might be changeable they could be added as well.
Note that the config_changed method is called from irq context, so
we'll have to use the workqueue infrastructure to provide us a proper
user context for our changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...
|
|
Remove the BKL usage added in "block: push down BKL into .locked_ioctl".
Virtio-blk doesn't use the BKL for anything, and doesn't implement any
ioctl command by itself, but only uses the generic scsi_cmd_ioctl
which is fine without the BKL.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Conflicts:
block/blk-core.c
drivers/block/loop.c
mm/swapfile.c
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Must drop reference taken by blk_make_request().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .35.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove now unused REQ_HARDBARRIER support. virtio_blk already
supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
questionable at this point, so there's nothing else to do to support
new REQ_FLUSH/FUA interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().
blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
device has write cache and can flush it, it should set REQ_FLUSH. If
the device can handle FUA writes, it should also set REQ_FUA.
All blk_queue_ordered() users are converted.
* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Nobody is making meaningful use of ORDERED_BY_TAG now and queue
draining for barrier requests will be removed soon which will render
the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The
following users are affected.
* brd: converted to ORDERED_DRAIN.
* virtio_blk: ORDERED_TAG path was already marked deprecated. Removed.
* xen-blkfront: ORDERED_TAG case dropped.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
|
|
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This removes q->prepare_flush_fn completely (changes the
blk_queue_ordered API).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
use REQ_FLUSH flag instead.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
On compilation, gcc correctly detects that we do not handle
all types:
In function ‘blk_done’:
warning: enumeration value ‘REQ_TYPE_FS’ not handled in switch
warning: enumeration value ‘REQ_TYPE_SENSE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SUSPEND’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_RESUME’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SHUTDOWN’ not handled in switch
warning: enumeration value ‘REQ_TYPE_LINUX_BLOCK’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_TASKFILE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_PC’ not handled in switch
which is a bit pointless since this is at the end of the request
processessing. Add a default case that just breaks out.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
With the availablility of a sysfs device attribute for examining disk serial
numbers the ioctl is no longer needed. The user-space changes for this aren't
upstream yet so we don't have any users to worry about.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Create a new attribute for virtio-blk devices that will fetch the serial number
of the block device. This attribute can be used by udev to create disk/by-id
symlinks for devices that don't have a UUID (filesystem) associated with them.
ATA_IDENTIFY strings are special in that they can be up to 20 chars long
and aren't required to be nul-terminated. The buffer is also zero-padded
meaning that if the serial is 19 chars or less that we get a nul-terminated
string. When copying this value into a string buffer, we must be careful to
copy up to the nul (if it present) and only 20 if it is longer and not to
attempt to nul terminate; this isn't needed.
Changes since v1:
- Added BUILD_BUG_ON() for PAGE_SIZE check
- Removed min() since BUILD_BUG_ON() handles the check
- Replaced serial_sysfs() by copying id directly to buffer
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
If we want to support barriers with the cache=writethrough mode in qemu
we need to tell the block layer that we only need queue drains to
implement a barrier. Follow the model set by SCSI and IDE and assume
that there is no volatile write cache if the host doesn't advertize it.
While this might imply working barriers on old qemu versions or other
hypervisors that actually have a volatile write cache this is only a
cosmetic issue - these hypervisors don't guarantee any data integrity
with or without this patch, but with the patch we at least provide
data ordering.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
We need at least one S/G element to operate properly, as does the block
layer which increments it to one anyway. We hit this due to a qemu
bug which advertises a sg_elements of 0 under some circumstances.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (tweaked logic)
|
|
Switch virtio_blk to new virtqueue_xxx wrappers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
drivers/block/virtio_blk.c:228:13: warning: multi-character character constant
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
|
|
Return serial string to the guest application via
ioctl driver call.
Note this form of interface to the guest userland
was the consensus when the prior version using
the ATA_IDENTIFY came under dispute.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
Add virtio-blk device id (s/n) support via virtio request.
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
loop: Update mtime when writing using aops
block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
backing-dev: Handle class_create() failure
Block: Fix block/elevator.c elevator_get() off-by-one error
drbd: lc_element_by_index() never returns NULL
cciss: unlock on error path
cfq-iosched: Do not merge queues of BE and IDLE classes
cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
i2o: Remove the dangerous kobj_to_i2o_device macro
block: remove 16 bytes of padding from struct request on 64bits
cfq-iosched: fix a kbuild regression
block: make CONFIG_BLK_CGROUP visible
Remove GENHD_FL_DRIVERFS
block: Export max number of segments and max segment size in sysfs
block: Finalize conversion of block limits functions
block: Fix overrun in lcm() and move it to lib
vfs: improve writeback_inodes_wb()
paride: fix off-by-one test
drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
...
|
|
implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
|
Remove compatibility wrappers and update remaining drivers.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|