From 5e8830dc85d0a6258132977381430b327cf553f2 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 12 Jun 2012 16:20:23 +0200 Subject: fs: Push file_update_time() into __block_page_mkwrite() Tested-by: Kamal Mostafa Tested-by: Peter M. Petrakis Tested-by: Dann Frazier Tested-by: Massimo Morana Signed-off-by: Jan Kara Signed-off-by: Al Viro --- fs/buffer.c | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index c7062c896d7..d5ec360e332 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2318,6 +2318,12 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, loff_t size; int ret; + /* + * Update file times before taking page lock. We may end up failing the + * fault so this update may be superfluous but who really cares... + */ + file_update_time(vma->vm_file); + lock_page(page); size = i_size_read(inode); if ((page->mapping != inode->i_mapping) || -- cgit v1.2.3 From 14da9200140f8d722ad1767dfabadebd8b34f2ad Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 12 Jun 2012 16:20:37 +0200 Subject: fs: Protect write paths by sb_start_write - sb_end_write There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa Tested-by: Peter M. 
Petrakis Tested-by: Dann Frazier Tested-by: Massimo Morana Signed-off-by: Jan Kara Signed-off-by: Al Viro --- fs/buffer.c | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index d5ec360e332..9f6d2e41281 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2306,8 +2306,8 @@ EXPORT_SYMBOL(block_commit_write); * beyond EOF, then the page is guaranteed safe against truncation until we * unlock the page. * - * Direct callers of this function should call vfs_check_frozen() so that page - * fault does not busyloop until the fs is thawed. + * Direct callers of this function should protect against filesystem freezing + * using sb_start_write() - sb_end_write() functions. */ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, get_block_t get_block) @@ -2345,18 +2345,7 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, if (unlikely(ret < 0)) goto out_unlock; - /* - * Freezing in progress? We check after the page is marked dirty and - * with page lock held so if the test here fails, we are sure freezing - * code will wait during syncing until the page fault is done - at that - * point page will be dirty and unlocked so freezing code will write it - * and writeprotect it again. - */ set_page_dirty(page); - if (inode->i_sb->s_frozen != SB_UNFROZEN) { - ret = -EAGAIN; - goto out_unlock; - } wait_on_page_writeback(page); return 0; out_unlock: @@ -2371,12 +2360,9 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, int ret; struct super_block *sb = vma->vm_file->f_path.dentry->d_inode->i_sb; - /* - * This check is racy but catches the common case. The check in - * __block_page_mkwrite() is reliable. 
- */ - vfs_check_frozen(sb, SB_FREEZE_WRITE); + sb_start_pagefault(sb); ret = __block_page_mkwrite(vma, vmf, get_block); + sb_end_pagefault(sb); return block_page_mkwrite_return(ret); } EXPORT_SYMBOL(block_page_mkwrite); -- cgit v1.2.3 From 676ce6d5ca3098339c028d44fe0427d1566a4d2d Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 23 Aug 2012 12:17:36 +0200 Subject: block: replace __getblk_slow misfix by grow_dev_page fix Commit 91f68c89d8f3 ("block: fix infinite loop in __getblk_slow") is not good: a successful call to grow_buffers() cannot guarantee that the page won't be reclaimed before the immediate next call to __find_get_block(), which is why there was always a loop there. Yesterday I got "EXT4-fs error (device loop0): __ext4_get_inode_loc:3595: inode #19278: block 664: comm cc1: unable to read itable block" on console, which pointed to this commit. I've been trying to bisect for weeks, why kbuild-on-ext4-on-loop-on-tmpfs sometimes fails from a missing header file, under memory pressure on ppc G5. I've never seen this on x86, and I've never seen it on 3.5-rc7 itself, despite that commit being in there: bisection pointed to an irrelevant pinctrl merge, but hard to tell when failure takes between 18 minutes and 38 hours (but so far it's happened quicker on 3.6-rc2). (I've since found such __ext4_get_inode_loc errors in /var/log/messages from previous weeks: why the message never appeared on console until yesterday morning is a mystery for another day.) Revert 91f68c89d8f3, restoring __getblk_slow() to how it was (plus a checkpatch nitfix). Simplify the interface between grow_buffers() and grow_dev_page(), and avoid the infinite loop beyond end of device by instead checking init_page_buffers()'s end_block there (I presume that's more efficient than a repeated call to blkdev_max_block()), returning -ENXIO to __getblk_slow() in that case. And remove akpm's ten-year-old "__getblk() cannot fail ... 
weird" comment, but that is worrying: are all users of __getblk() really now prepared for a NULL bh beyond end of device, or will some oops?? Signed-off-by: Hugh Dickins Cc: stable@vger.kernel.org # 3.0 3.2 3.4 3.5 Signed-off-by: Jens Axboe --- fs/buffer.c | 66 ++++++++++++++++++++++++++++--------------------------------- 1 file changed, 30 insertions(+), 36 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 9f6d2e41281..58e2e7b7737 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -914,7 +914,7 @@ link_dev_buffers(struct page *page, struct buffer_head *head) /* * Initialise the state of a blockdev page's buffers. */ -static void +static sector_t init_page_buffers(struct page *page, struct block_device *bdev, sector_t block, int size) { @@ -936,33 +936,41 @@ init_page_buffers(struct page *page, struct block_device *bdev, block++; bh = bh->b_this_page; } while (bh != head); + + /* + * Caller needs to validate requested block against end of device. + */ + return end_block; } /* * Create the page-cache page that contains the requested block. * - * This is user purely for blockdev mappings. + * This is used purely for blockdev mappings. 
*/ -static struct page * +static int grow_dev_page(struct block_device *bdev, sector_t block, - pgoff_t index, int size) + pgoff_t index, int size, int sizebits) { struct inode *inode = bdev->bd_inode; struct page *page; struct buffer_head *bh; + sector_t end_block; + int ret = 0; /* Will call free_more_memory() */ page = find_or_create_page(inode->i_mapping, index, (mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE); if (!page) - return NULL; + return ret; BUG_ON(!PageLocked(page)); if (page_has_buffers(page)) { bh = page_buffers(page); if (bh->b_size == size) { - init_page_buffers(page, bdev, block, size); - return page; + end_block = init_page_buffers(page, bdev, + index << sizebits, size); + goto done; } if (!try_to_free_buffers(page)) goto failed; @@ -982,14 +990,14 @@ grow_dev_page(struct block_device *bdev, sector_t block, */ spin_lock(&inode->i_mapping->private_lock); link_dev_buffers(page, bh); - init_page_buffers(page, bdev, block, size); + end_block = init_page_buffers(page, bdev, index << sizebits, size); spin_unlock(&inode->i_mapping->private_lock); - return page; - +done: + ret = (block < end_block) ? 1 : -ENXIO; failed: unlock_page(page); page_cache_release(page); - return NULL; + return ret; } /* @@ -999,7 +1007,6 @@ failed: static int grow_buffers(struct block_device *bdev, sector_t block, int size) { - struct page *page; pgoff_t index; int sizebits; @@ -1023,22 +1030,14 @@ grow_buffers(struct block_device *bdev, sector_t block, int size) bdevname(bdev, b)); return -EIO; } - block = index << sizebits; + /* Create a page with the proper size buffers.. 
*/ - page = grow_dev_page(bdev, block, index, size); - if (!page) - return 0; - unlock_page(page); - page_cache_release(page); - return 1; + return grow_dev_page(bdev, block, index, size, sizebits); } static struct buffer_head * __getblk_slow(struct block_device *bdev, sector_t block, int size) { - int ret; - struct buffer_head *bh; - /* Size must be multiple of hard sectorsize */ if (unlikely(size & (bdev_logical_block_size(bdev)-1) || (size < 512 || size > PAGE_SIZE))) { @@ -1051,21 +1050,20 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size) return NULL; } -retry: - bh = __find_get_block(bdev, block, size); - if (bh) - return bh; + for (;;) { + struct buffer_head *bh; + int ret; - ret = grow_buffers(bdev, block, size); - if (ret == 0) { - free_more_memory(); - goto retry; - } else if (ret > 0) { bh = __find_get_block(bdev, block, size); if (bh) return bh; + + ret = grow_buffers(bdev, block, size); + if (ret < 0) + return NULL; + if (ret == 0) + free_more_memory(); } - return NULL; } /* @@ -1321,10 +1319,6 @@ EXPORT_SYMBOL(__find_get_block); * which corresponds to the passed block_device, block and size. The * returned buffer has its reference count incremented. * - * __getblk() cannot fail - it just keeps trying. If you pass it an - * illegal block number, __getblk() will happily return a buffer_head - * which represents the non-existent block. Very weird. - * * __getblk() will lock up the machine if grow_dev_page's try_to_free_buffers() * attempt is failing. FIXME, perhaps? */ -- cgit v1.2.3 From 041bbb6d369811e948ae01f3d00414264076be35 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 30 Sep 2012 23:04:56 -0400 Subject: ext4: fix mtime update in nodelalloc mode Commits 5e8830dc85d0 and 41c4d25f78c0 introduced a regression into v3.6-rc1 for ext4 in nodelalloc mode, such that mtime updates would not take place for files modified via mmap if the page was already in the page cache.
This would also affect ext3 file systems mounted using the ext4 file system driver. The problem was that ext4_page_mkwrite() had a shortcut which would avoid calling __block_page_mkwrite() under some circumstances, and the above two commits transferred the responsibility of calling file_update_time() to __block_page_mkwrite(), which wouldn't get called in those circumstances. Since __block_page_mkwrite() only has three callers, block_page_mkwrite(), ext4_page_mkwrite(), and nilfs_page_mkwrite(), the best way to solve this is to move the responsibility for calling file_update_time() to its callers. This problem was found via xfstests #215 with a file system mounted with -o nodelalloc. Signed-off-by: "Theodore Ts'o" Reviewed-by: Jan Kara Cc: KONISHI Ryusuke Cc: stable@vger.kernel.org --- fs/buffer.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 9f6d2e41281..1fe3968357a 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2318,12 +2318,6 @@ int __block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, loff_t size; int ret; - /* - * Update file times before taking page lock. We may end up failing the - * fault so this update may be superfluous but who really cares... - */ - file_update_time(vma->vm_file); - lock_page(page); size = i_size_read(inode); if ((page->mapping != inode->i_mapping) || @@ -2361,6 +2355,13 @@ int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf, struct super_block *sb = vma->vm_file->f_path.dentry->d_inode->i_sb; sb_start_pagefault(sb); + + /* + * Update file times before taking page lock. We may end up failing the + * fault so this update may be superfluous but who really cares...
+ */ + file_update_time(vma->vm_file); + ret = __block_page_mkwrite(vma, vmf, get_block); sb_end_pagefault(sb); return block_page_mkwrite_return(ret); -- cgit v1.2.3 From 45bce8f3e3436bbe2e03dd2b076abdce79ffabb7 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 29 Nov 2012 10:21:43 -0800 Subject: fs/buffer.c: make block-size be per-page and protected by the page lock This makes the buffer size handling be a per-page thing, which allows us to not have to worry about locking too much when changing the buffer size. If a page doesn't have buffers, we still need to read the block size from the inode, but we can do that with ACCESS_ONCE(), so that even if the size is changing, we get a consistent value. This doesn't convert all functions - many of the buffer functions are used purely by filesystems, which in turn results in the buffer size being fixed at mount-time. So they don't have the same consistency issues that the raw device access can have. Signed-off-by: Linus Torvalds --- fs/buffer.c | 79 +++++++++++++++++++++++++++++++++++++------------------------ 1 file changed, 48 insertions(+), 31 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index b5f044283ed..28a74ff5324 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1552,6 +1552,28 @@ void unmap_underlying_metadata(struct block_device *bdev, sector_t block) } EXPORT_SYMBOL(unmap_underlying_metadata); +/* + * Size is a power-of-two in the range 512..PAGE_SIZE, + * and the case we care about most is PAGE_SIZE. 
+ * + * So this *could* possibly be written with those + * constraints in mind (relevant mostly if some + * architecture has a slow bit-scan instruction) + */ +static inline int block_size_bits(unsigned int blocksize) +{ + return ilog2(blocksize); +} + +static struct buffer_head *create_page_buffers(struct page *page, struct inode *inode, unsigned int b_state) +{ + BUG_ON(!PageLocked(page)); + + if (!page_has_buffers(page)) + create_empty_buffers(page, 1 << ACCESS_ONCE(inode->i_blkbits), b_state); + return page_buffers(page); +} + /* * NOTE! All mapped/uptodate combinations are valid: * @@ -1589,19 +1611,13 @@ static int __block_write_full_page(struct inode *inode, struct page *page, sector_t block; sector_t last_block; struct buffer_head *bh, *head; - const unsigned blocksize = 1 << inode->i_blkbits; + unsigned int blocksize, bbits; int nr_underway = 0; int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE); - BUG_ON(!PageLocked(page)); - - last_block = (i_size_read(inode) - 1) >> inode->i_blkbits; - - if (!page_has_buffers(page)) { - create_empty_buffers(page, blocksize, + head = create_page_buffers(page, inode, (1 << BH_Dirty)|(1 << BH_Uptodate)); - } /* * Be very careful. We have no exclusion from __set_page_dirty_buffers @@ -1613,9 +1629,12 @@ static int __block_write_full_page(struct inode *inode, struct page *page, * handle that here by just cleaning them. 
*/ - block = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits); - head = page_buffers(page); bh = head; + blocksize = bh->b_size; + bbits = block_size_bits(blocksize); + + block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits); + last_block = (i_size_read(inode) - 1) >> bbits; /* * Get all the dirty buffers mapped to disk addresses and @@ -1806,12 +1825,10 @@ int __block_write_begin(struct page *page, loff_t pos, unsigned len, BUG_ON(to > PAGE_CACHE_SIZE); BUG_ON(from > to); - blocksize = 1 << inode->i_blkbits; - if (!page_has_buffers(page)) - create_empty_buffers(page, blocksize, 0); - head = page_buffers(page); + head = create_page_buffers(page, inode, 0); + blocksize = head->b_size; + bbits = block_size_bits(blocksize); - bbits = inode->i_blkbits; block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits); for(bh = head, block_start = 0; bh != head || !block_start; @@ -1881,11 +1898,11 @@ static int __block_commit_write(struct inode *inode, struct page *page, unsigned blocksize; struct buffer_head *bh, *head; - blocksize = 1 << inode->i_blkbits; + bh = head = page_buffers(page); + blocksize = bh->b_size; - for(bh = head = page_buffers(page), block_start = 0; - bh != head || !block_start; - block_start=block_end, bh = bh->b_this_page) { + block_start = 0; + do { block_end = block_start + blocksize; if (block_end <= from || block_start >= to) { if (!buffer_uptodate(bh)) @@ -1895,7 +1912,10 @@ static int __block_commit_write(struct inode *inode, struct page *page, mark_buffer_dirty(bh); } clear_buffer_new(bh); - } + + block_start = block_end; + bh = bh->b_this_page; + } while (bh != head); /* * If this is a partial write which happened to make all buffers @@ -2020,7 +2040,6 @@ EXPORT_SYMBOL(generic_write_end); int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc, unsigned long from) { - struct inode *inode = page->mapping->host; unsigned block_start, block_end, blocksize; unsigned to; struct buffer_head *bh, *head; @@ 
-2029,13 +2048,13 @@ int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc, if (!page_has_buffers(page)) return 0; - blocksize = 1 << inode->i_blkbits; + head = page_buffers(page); + blocksize = head->b_size; to = min_t(unsigned, PAGE_CACHE_SIZE - from, desc->count); to = from + to; if (from < blocksize && to > PAGE_CACHE_SIZE - blocksize) return 0; - head = page_buffers(page); bh = head; block_start = 0; do { @@ -2068,18 +2087,16 @@ int block_read_full_page(struct page *page, get_block_t *get_block) struct inode *inode = page->mapping->host; sector_t iblock, lblock; struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE]; - unsigned int blocksize; + unsigned int blocksize, bbits; int nr, i; int fully_mapped = 1; - BUG_ON(!PageLocked(page)); - blocksize = 1 << inode->i_blkbits; - if (!page_has_buffers(page)) - create_empty_buffers(page, blocksize, 0); - head = page_buffers(page); + head = create_page_buffers(page, inode, 0); + blocksize = head->b_size; + bbits = block_size_bits(blocksize); - iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits); - lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits; + iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits); + lblock = (i_size_read(inode)+blocksize-1) >> bbits; bh = head; nr = 0; i = 0; -- cgit v1.2.3 From bbec0270bdd887f96377065ee38b8848b5afa395 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 29 Nov 2012 12:31:52 -0800 Subject: blkdev_max_block: make private to fs/buffer.c We really don't want to look at the block size for the raw block device accesses in fs/block_dev.c, because it may be changing from under us. So get rid of the max_block logic entirely, since the caller should already have done it anyway. That leaves the only user of this function in fs/buffer.c, so move the whole function there and make it static.
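The relocated helper is small enough to model outside the kernel. The sketch below is a userspace approximation, not kernel code: `uint64_t` stands in for `sector_t`, the device size in bytes is passed in directly instead of being read with `i_size_read()`, and `size_bits()`, `max_block()` and `block_in_device()` are hypothetical names. `size_bits()` plays the role of the kernel's `blksize_bits()`, and `block_in_device()` mirrors the end-of-device test that `grow_dev_page()` applies to the returned limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for blksize_bits(): log2 of a power-of-two
 * block size in the range 512..4096. */
static unsigned int size_bits(unsigned int size)
{
	unsigned int bits = 0;

	assert(size >= 512 && size <= 4096 && (size & (size - 1)) == 0);
	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}

/* Same logic as the static blkdev_max_block(): the number of whole blocks
 * of 'size' bytes on a device of 'dev_bytes' bytes, or "no limit" (all
 * ones) while the device size is still unknown (zero). */
uint64_t max_block(uint64_t dev_bytes, unsigned int size)
{
	uint64_t retval = ~(uint64_t)0;

	if (dev_bytes)
		retval = dev_bytes >> size_bits(size);
	return retval;
}

/* The check grow_dev_page() makes with the end_block it gets back:
 * 1 if the block lies inside the device, -ENXIO otherwise. */
int block_in_device(uint64_t block, uint64_t end_block)
{
	return block < end_block ? 1 : -ENXIO;
}
```

With a 41,472-byte device and 4096-byte blocks the limit is 10 blocks, so the trailing 512 bytes do not fill a whole block; a buffer covering them would straddle the end of the device.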
Signed-off-by: Linus Torvalds --- fs/buffer.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 28a74ff5324..3586fb05c8c 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -911,6 +911,18 @@ link_dev_buffers(struct page *page, struct buffer_head *head) attach_page_buffers(page, head); } +static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size) +{ + sector_t retval = ~((sector_t)0); + loff_t sz = i_size_read(bdev->bd_inode); + + if (sz) { + unsigned int sizebits = blksize_bits(size); + retval = (sz >> sizebits); + } + return retval; +} + /* * Initialise the state of a blockdev page's buffers. */ @@ -921,7 +933,7 @@ init_page_buffers(struct page *page, struct block_device *bdev, struct buffer_head *head = page_buffers(page); struct buffer_head *bh = head; int uptodate = PageUptodate(page); - sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode)); + sector_t end_block = blkdev_max_block(I_BDEV(bdev->bd_inode), size); do { if (!buffer_mapped(bh)) { -- cgit v1.2.3 From 57302e0ddf8a210a66fd8a1a2fa50844863b5ded Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 4 Dec 2012 08:25:11 -0800 Subject: vfs: avoid "attempt to access beyond end of device" warnings The block device access simplification that avoided accessing the (racy) block size information (commit bbec0270bdd8: "blkdev_max_block: make private to fs/buffer.c") no longer checks the maximum block size in the block mapping path. That was _almost_ as simple as just removing the code entirely, because the readers and writers all check the size of the device anyway, so under normal circumstances it "just worked". However, the block size may be such that the end of the device may straddle one single buffer_head. At which point we may still want to access the end of the device, but the buffer we use to access it partially extends past the end. 
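The clamping arithmetic this patch adds in guard_bh_eod() can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel function: sectors are 512 bytes, plain integers replace `struct bio` and `struct buffer_head`, and `clamp_to_device()` and `zero_tail()` are hypothetical names. The three early returns follow the same cases as the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* How many bytes of an I/O of 'bytes' bytes, starting at 512-byte sector
 * 'start', should be issued on a device of 'dev_bytes' bytes. */
unsigned int clamp_to_device(uint64_t dev_bytes, uint64_t start,
			     unsigned int bytes)
{
	uint64_t maxsector = dev_bytes >> 9;

	if (!maxsector)			/* device size unknown: leave the I/O alone */
		return bytes;
	if (start >= maxsector)		/* wholly past the end: let it fail with EIO */
		return bytes;
	maxsector -= start;		/* sectors left between start and the end */
	if ((bytes >> 9) <= maxsector)	/* fits entirely: the common case */
		return bytes;
	return (unsigned int)(maxsector << 9);	/* straddles the end: truncate */
}

/* For reads, clear the part of the buffer the truncated I/O will not fill. */
void zero_tail(unsigned char *buf, unsigned int buf_size, unsigned int valid)
{
	assert(valid <= buf_size);
	memset(buf + valid, 0, buf_size - valid);
}
```

On a 41,472-byte device (81 sectors), a 4096-byte buffer at block 10 starts at sector 80. Only one sector remains, so the I/O is cut to 512 bytes and, for a read, the remaining 3584 bytes of the buffer are zeroed.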
The 'bd_set_size()' function intentionally sets the block size to avoid this, but mounting the device - or setting the block size by hand to some other value - can modify that block size. So instead, teach 'submit_bh()' about the special case of the buffer head straddling the end of the device, and turn such an access into a smaller IO access, avoiding the problem. This, btw, also means that unlike before, we can now access the whole device regardless of device block size setting. So now, even if the device size is only 512-byte aligned, we can read and write the last sector even when using a much bigger block size for accessing the rest of the device. So with this, we could now get rid of the 'bd_set_size()' block size code entirely - resulting in faster IO for the common case - but that would be a separate patch. Reported-and-tested-by: Romain Francoise Reported-and-tested-by: Meelis Roos Reported-by: Tony Luck Signed-off-by: Linus Torvalds --- fs/buffer.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 3586fb05c8c..c4e11390a44 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2893,6 +2893,55 @@ static void end_bio_bh_io_sync(struct bio *bio, int err) bio_put(bio); } +/* + * This allows us to do IO even on the odd last sectors + * of a device, even if the bh block size is some multiple + * of the physical sector size. + * + * We'll just truncate the bio to the size of the device, + * and clear the end of the buffer head manually. + * + * Truly out-of-range accesses will turn into actual IO + * errors, this only handles the "we need to be able to + * do IO at the final sector" case.
+ */ +static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh) +{ + sector_t maxsector; + unsigned bytes; + + maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9; + if (!maxsector) + return; + + /* + * If the *whole* IO is past the end of the device, + * let it through, and the IO layer will turn it into + * an EIO. + */ + if (unlikely(bio->bi_sector >= maxsector)) + return; + + maxsector -= bio->bi_sector; + bytes = bio->bi_size; + if (likely((bytes >> 9) <= maxsector)) + return; + + /* Uhhuh. We've got a bh that straddles the device size! */ + bytes = maxsector << 9; + + /* Truncate the bio.. */ + bio->bi_size = bytes; + bio->bi_io_vec[0].bv_len = bytes; + + /* ..and clear the end of the buffer for reads */ + if (rw & READ) { + void *kaddr = kmap_atomic(bh->b_page); + memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes); + kunmap_atomic(kaddr); + } +} + int submit_bh(int rw, struct buffer_head * bh) { struct bio *bio; @@ -2929,6 +2978,9 @@ int submit_bh(int rw, struct buffer_head * bh) bio->bi_end_io = end_bio_bh_io_sync; bio->bi_private = bh; + /* Take care of bh's that straddle the end of the device */ + guard_bh_eod(rw, bio, bh); + bio_get(bio); submit_bio(rw, bio); -- cgit v1.2.3 From 27d7c2a006a81c04fab00b8cd81b99af3b32738d Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 5 Dec 2012 20:01:24 +0300 Subject: vfs: clear to the end of the buffer on partial buffer reads READ is zero so the "rw & READ" test is always false. The intended test was "((rw & RW_MASK) == READ)". 
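A small userspace check makes the bug concrete. The constants are assumptions that mirror the kernel definitions of that era (`READ` is 0, `WRITE` equals `RW_MASK`, which selects the direction bit, 1); `EXAMPLE_SYNC` is a hypothetical extra flag bit, used only to show that the corrected test ignores other bits in `rw`.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the kernel definitions of the time. */
#define READ         0
#define RW_MASK      1
#define WRITE        RW_MASK
#define EXAMPLE_SYNC (1 << 4)	/* hypothetical modifier flag */

/* The original test: ANDing with zero is always zero, so it never fires. */
bool is_read_buggy(int rw)
{
	return (rw & READ) != 0;
}

/* The corrected test: isolate the direction bit, then compare it. */
bool is_read_fixed(int rw)
{
	return (rw & RW_MASK) == READ;
}

/* Sanity checks covering both directions, with and without extra flags. */
void check_direction_tests(void)
{
	assert(!is_read_buggy(READ));	/* a read is never detected */
	assert(is_read_fixed(READ));
	assert(is_read_fixed(READ | EXAMPLE_SYNC));
	assert(!is_read_fixed(WRITE | EXAMPLE_SYNC));
}
```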
Signed-off-by: Dan Carpenter Signed-off-by: Linus Torvalds --- fs/buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index c4e11390a44..ec0aca8ba6b 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2935,7 +2935,7 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh) bio->bi_io_vec[0].bv_len = bytes; /* ..and clear the end of the buffer for reads */ - if (rw & READ) { + if ((rw & RW_MASK) == READ) { void *kaddr = kmap_atomic(bh->b_page); memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes); kunmap_atomic(kaddr); -- cgit v1.2.3 From 252aa6f5be64c90c67b9f066ccff880f6b487d32 Mon Sep 17 00:00:00 2001 From: Rafael Aquini Date: Tue, 11 Dec 2012 16:02:35 -0800 Subject: mm: redefine address_space.assoc_mapping Overhaul struct address_space.assoc_mapping renaming it to address_space.private_data and its type is redefined to void*. By this approach we consistently name the .private_* elements from struct address_space as well as allow extended usage for address_space association with other data structures through ->private_data. Also, all users of old ->assoc_mapping element are converted to reflect its new name and type change (->private_data). Signed-off-by: Rafael Aquini Cc: Rusty Russell Cc: "Michael S. 
Tsirkin" Cc: Rik van Riel Cc: Mel Gorman Cc: Andi Kleen Cc: Konrad Rzeszutek Wilk Cc: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/buffer.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index ec0aca8ba6b..6e9ed48064f 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -555,7 +555,7 @@ void emergency_thaw_all(void) */ int sync_mapping_buffers(struct address_space *mapping) { - struct address_space *buffer_mapping = mapping->assoc_mapping; + struct address_space *buffer_mapping = mapping->private_data; if (buffer_mapping == NULL || list_empty(&mapping->private_list)) return 0; @@ -588,10 +588,10 @@ void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode) struct address_space *buffer_mapping = bh->b_page->mapping; mark_buffer_dirty(bh); - if (!mapping->assoc_mapping) { - mapping->assoc_mapping = buffer_mapping; + if (!mapping->private_data) { + mapping->private_data = buffer_mapping; } else { - BUG_ON(mapping->assoc_mapping != buffer_mapping); + BUG_ON(mapping->private_data != buffer_mapping); } if (!bh->b_assoc_map) { spin_lock(&buffer_mapping->private_lock); @@ -788,7 +788,7 @@ void invalidate_inode_buffers(struct inode *inode) if (inode_has_buffers(inode)) { struct address_space *mapping = &inode->i_data; struct list_head *list = &mapping->private_list; - struct address_space *buffer_mapping = mapping->assoc_mapping; + struct address_space *buffer_mapping = mapping->private_data; spin_lock(&buffer_mapping->private_lock); while (!list_empty(list)) @@ -811,7 +811,7 @@ int remove_inode_buffers(struct inode *inode) if (inode_has_buffers(inode)) { struct address_space *mapping = &inode->i_data; struct list_head *list = &mapping->private_list; - struct address_space *buffer_mapping = mapping->assoc_mapping; + struct address_space *buffer_mapping = mapping->private_data; spin_lock(&buffer_mapping->private_lock); while (!list_empty(list)) { -- 
cgit v1.2.3 From a3f3c29cb290a2d5d26e3cf5504f447fd7256a81 Mon Sep 17 00:00:00 2001 From: Yan Hong Date: Wed, 12 Dec 2012 13:52:15 -0800 Subject: fs/buffer.c: do not inline exported function It makes no sense to inline an exported function. Signed-off-by: Yan Hong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/buffer.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 6e9ed48064f..9083e528e3c 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -46,8 +46,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list); #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers) -inline void -init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private) +void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private) { bh->b_end_io = handler; bh->b_private = private; -- cgit v1.2.3 From 02c0ab684fc41bc13ba8d5ad89b0dc73b092fa08 Mon Sep 17 00:00:00 2001 From: Yan Hong Date: Wed, 12 Dec 2012 13:52:16 -0800 Subject: fs/buffer.c: remove redundant initialization in alloc_page_buffers() buffer_head comes from kmem_cache_zalloc(), no need to zero its fields. 
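The simplified allocation loop can be modelled in userspace, with `calloc()` standing in for `kmem_cache_zalloc()`. This is a sketch, not the kernel code: `struct fake_bh`, `alloc_chain()` and `free_chain()` are hypothetical, the structure keeps only the fields the patch touches, and the chain stays NULL-terminated because the step that later links it to the page is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct buffer_head, reduced to the fields
 * touched by the patch. */
struct fake_bh {
	struct fake_bh *b_this_page;
	void *b_bdev;
	long long b_blocknr;
	unsigned long b_state;
	int b_count;
	size_t b_size;
};

void free_chain(struct fake_bh *head)
{
	while (head) {
		struct fake_bh *next = head->b_this_page;

		free(head);
		head = next;
	}
}

/* Build page_size / size buffers, newest first, like the allocation loop
 * in alloc_page_buffers(). Because calloc() zero-fills, b_bdev, b_state
 * and b_count need no explicit initialisation (this assumes a NULL pointer
 * is all-zero bits, as on mainstream platforms). */
struct fake_bh *alloc_chain(long page_size, long size)
{
	struct fake_bh *head = NULL;
	long offset = page_size;

	assert(size > 0);
	while ((offset -= size) >= 0) {
		struct fake_bh *bh = calloc(1, sizeof(*bh));

		if (!bh) {
			free_chain(head);
			return NULL;
		}
		bh->b_this_page = head;
		bh->b_blocknr = -1;
		head = bh;
		bh->b_size = (size_t)size;
	}
	return head;
}
```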
Signed-off-by: Yan Hong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/buffer.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index 9083e528e3c..c017a2dfb90 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -849,13 +849,10 @@ try_again: if (!bh) goto no_grow; - bh->b_bdev = NULL; bh->b_this_page = head; bh->b_blocknr = -1; head = bh; - bh->b_state = 0; - atomic_set(&bh->b_count, 0); bh->b_size = size; /* Link the buffer to its page */ -- cgit v1.2.3 From 6d283dba3721cc43be014b50a1acc2f35860a65a Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 14 Jan 2013 13:17:50 -0800 Subject: vfs: add missing virtual cache flush after editing partial pages Andrew Morton pointed this out a month ago, and then I completely forgot about it. If we read a partial last page of a block device, we will zero out the end of the page, but since that page can then be mapped into user space, we should also make sure to flush the cache on architectures that have virtual caches. We have the flush_dcache_page() function for this, so use it. Now, in practice this really never matters, because nobody sane uses virtual caches to begin with, and they largely exist on old broken RISC architectures. And even if you did run on one of those obsolete CPUs, the whole "mmap and access the last partial page of a block device" behavior probably doesn't actually exist. The normal IO functions (read/write) will never see the zeroed-out part of the page that might not be coherent in the cache, because they honor the size of the device. So I'm marking this for stable (3.7 only), but I'm not sure anybody will ever care.
Pointed-out-by: Andrew Morton Cc: stable@vger.kernel.org # 3.7 Signed-off-by: Linus Torvalds --- fs/buffer.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs/buffer.c') diff --git a/fs/buffer.c b/fs/buffer.c index c017a2dfb90..7a75c3e0fd5 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2935,6 +2935,7 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh) void *kaddr = kmap_atomic(bh->b_page); memset(kaddr + bh_offset(bh) + bytes, 0, bh->b_size - bytes); kunmap_atomic(kaddr); + flush_dcache_page(bh->b_page); } } -- cgit v1.2.3