From 0f640dca08330dfc7820d610578e5935b5e654b2 Mon Sep 17 00:00:00 2001
From: Mike Snitzer
Date: Thu, 31 Jan 2013 14:11:14 +0000
Subject: [PATCH 1/2] dm thin: fix queue limits stacking

thin_io_hints() is blindly copying the queue limits from the thin-pool
which can lead to incorrect limits being set. The fix here simply
deletes the thin_io_hints() hook which leaves the existing stacking
infrastructure to set the limits correctly.

When a thin-pool uses an MD device for the data device a thin device
from the thin-pool must respect MD's constraints about disallowing a bio
from spanning multiple chunks. Otherwise we can see problems. If the
raid0 chunksize is 1152K and thin-pool chunksize is 256K I see the
following md/raid0 error (with extra debug tracing added to thin_endio)
when mkfs.xfs is executed against the thin device:

md/raid0:md99: make_request bug: can't convert block across chunks or bigger than 1152k 6688 127
device-mapper: thin: bio sector=2080 err=-5 bi_size=130560 bi_rw=17 bi_vcnt=32 bi_idx=0

This extra DM debugging shows that the failing bio is spanning across
the first and second logical 1152K chunk (sector 2080 + 255 takes the
bio beyond the first chunk's boundary of sector 2304). So the bio
splitting that DM is doing clearly isn't respecting the MD limits.

max_hw_sectors_kb is 127 for both the thin-pool and thin device
(queue_max_hw_sectors returns 255 so we'll excuse sysfs's lack of
precision). So this explains why bi_size is 130560.

But the thin device's max_hw_sectors_kb should be 4 (PAGE_SIZE) given
that it doesn't have a .merge function (for bio_add_page to consult
indirectly via dm_merge_bvec) yet the thin-pool does sit above an MD
device that has a compulsory merge_bvec_fn. This scenario is exactly
why DM must resort to sending single PAGE_SIZE bios to the underlying
layer. Some additional context for this is available in the header for
commit 8cbeb67a ("dm: avoid unsupported spanning of md stripe boundaries").

Long story short, the reason a thin device doesn't properly get
configured to have a max_hw_sectors_kb of 4 (PAGE_SIZE) is that
thin_io_hints() is blindly copying the queue limits from the thin-pool
device directly to the thin device's queue limits.

Fix this by eliminating thin_io_hints. Doing so is safe because the
block layer's queue limits stacking already enables the upper level thin
device to inherit the thin-pool device's discard and minimum_io_size and
optimal_io_size limits that get set in pool_io_hints. But avoiding the
queue limits copy allows the thin and thin-pool limits to be different
where it is important, namely max_hw_sectors_kb.

Reported-by: Daniel Browning
Signed-off-by: Mike Snitzer
Cc: stable@vger.kernel.org
Signed-off-by: Alasdair G Kergon
---
 drivers/md/dm-thin.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 675ae527401..5409607d487 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2746,19 +2746,9 @@ static int thin_iterate_devices(struct dm_target *ti,
 	return 0;
 }
 
-/*
- * A thin device always inherits its queue limits from its pool.
- */
-static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
-{
-	struct thin_c *tc = ti->private;
-
-	*limits = bdev_get_queue(tc->pool_dev->bdev)->limits;
-}
-
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 6, 0},
+	.version = {1, 7, 0},
 	.module = THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
@@ -2767,7 +2757,6 @@ static struct target_type thin_target = {
 	.postsuspend = thin_postsuspend,
 	.status = thin_status,
 	.iterate_devices = thin_iterate_devices,
-	.io_hints = thin_io_hints,
 };
 
 /*----------------------------------------------------------------*/

From fe7af2d3babefabd96a39e8b0d58ede88f3c7993 Mon Sep 17 00:00:00 2001
From: Alasdair G Kergon
Date: Thu, 31 Jan 2013 14:23:36 +0000
Subject: [PATCH 2/2] dm: fix write same requests counting

When processing write same requests, fix dm to send the configured
number of WRITE SAME requests to the target rather than the number of
discards, which is not always the same.

Device-mapper WRITE SAME support was introduced by commit
23508a96cd2e857d57044a2ed7d305f2d9daf441 ("dm: add WRITE SAME support").

Signed-off-by: Alasdair G Kergon
Acked-by: Mike Snitzer
---
 drivers/md/dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c72e4d5a961..314a0e2faf7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1188,6 +1188,7 @@ static int __clone_and_map_changing_extent_only(struct clone_info *ci,
 {
 	struct dm_target *ti;
 	sector_t len;
+	unsigned num_requests;
 
 	do {
 		ti = dm_table_find_target(ci->map, ci->sector);
@@ -1200,7 +1201,8 @@ static int __clone_and_map_changing_extent_only(struct clone_info *ci,
 		 * reconfiguration might also have changed that since the
 		 * check was performed.
 		 */
-		if (!get_num_requests || !get_num_requests(ti))
+		num_requests = get_num_requests ? get_num_requests(ti) : 0;
+		if (!num_requests)
 			return -EOPNOTSUPP;
 
 		if (is_split_required && !is_split_required(ti))
@@ -1208,7 +1210,7 @@ static int __clone_and_map_changing_extent_only(struct clone_info *ci,
 		else
 			len = min(ci->sector_count, max_io_len(ci->sector, ti));
 
-		__issue_target_requests(ci, ti, ti->num_discard_requests, len);
+		__issue_target_requests(ci, ti, num_requests, len);
 
 		ci->sector += len;
 	} while (ci->sector_count -= len);
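
Note on the arithmetic in the PATCH 1/2 description: the sector figures quoted there can be checked with a small userspace sketch. This is not kernel code. The helper names kib_to_sectors(), sectors_to_kib() and bio_spans_chunks() are hypothetical and exist only for this illustration; the sketch assumes 512-byte sectors and a 4096-byte PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* assumption: 512-byte sectors */

/* Hypothetical helper: convert a size in KiB to 512-byte sectors. */
static uint64_t kib_to_sectors(uint64_t kib)
{
	return (kib << 10) >> SECTOR_SHIFT;
}

/* Hypothetical helper: sectors to whole KiB, truncating the odd sector. */
static uint64_t sectors_to_kib(uint64_t sectors)
{
	return sectors >> 1;
}

/*
 * Hypothetical helper: true if the range [start, start + nr_sectors)
 * touches more than one chunk of chunk_sectors sectors.
 */
static bool bio_spans_chunks(uint64_t start, uint64_t nr_sectors,
			     uint64_t chunk_sectors)
{
	assert(chunk_sectors > 0 && nr_sectors > 0);
	return start / chunk_sectors !=
	       (start + nr_sectors - 1) / chunk_sectors;
}
```

With the values from the log, kib_to_sectors(1152) gives the 2304-sector chunk boundary, bi_size=130560 is 255 sectors, and sectors_to_kib(255) gives the 127 that sysfs reports. bio_spans_chunks(2080, 255, 2304) is true, which matches the md/raid0 complaint, while an 8-sector (one page) bio at the same start sector stays inside the first chunk.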