aboutsummaryrefslogtreecommitdiffstats
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2009-12-01CacheFiles: Update IMA counters when using dentry_openMarc Dionne1-0/+2
When IMA is active, using dentry_open without updating the IMA counters will result in free/open imbalance errors when fput is eventually called. Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-019p: fix build breakage introduced by FS-CacheDavid Howells1-1/+1
While building 2.6.32-rc8-git2 for Fedora I noticed the following thinko in commit 201a15428bd54f83eccec8b7c64a04b8f9431204 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions"): fs/9p/cache.c: In function '__v9fs_fscache_release_page': fs/9p/cache.c:346: error: 'vnode' undeclared (first use in this function) fs/9p/cache.c:346: error: (Each undeclared identifier is reported only once fs/9p/cache.c:346: error: for each function it appears in.) make[2]: *** [fs/9p/cache.o] Error 1 Fix the 9P filesystem to correctly construct the argument to fscache_maybe_release_page(). Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> [from identical patch] Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> [from identical patch] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2-5/+12
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix sparse warning [CIFS] Duplicate data on appending to some Samba servers [CIFS] fix oops in cifs_lookup during net boot
2009-11-30Merge branch 'for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: reject O_DIRECT flag also in fuse_create
2009-11-30jffs2: Fix memory corruption in jffs2_read_inode_range()David Woodhouse1-3/+6
In 2.6.23 kernel, commit a32ea1e1f925399e0d81ca3f7394a44a6dafa12c ("Fix read/truncate race") fixed a race in the generic code, and as a side effect, now do_generic_file_read() can ask us to readpage() past the i_size. This seems to be correctly handled by the block routines (e.g. block_read_full_page() fills the page with zeroes in case if somebody is trying to read past the last inode's block). JFFS2 doesn't handle this; it assumes that it won't be asked to read pages which don't exist -- and thus that there will be at least _one_ valid 'frag' on the page it's being asked to read. It will fill any holes with the following memset: memset(buf, 0, min(end, frag->ofs + frag->size) - offset); When the 'closest smaller match' returned by jffs2_lookup_node_frag() is actually on a previous page and ends before 'offset', that results in: memset(buf, 0, <huge unsigned negative>); Hopefully, in most cases the corruption is fatal, and quickly causing random oopses, like this: root@10.0.0.4:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc01cd980 Oops: Kernel access of bad area, sig: 11 [#1] [...] NIP [c01cd980] rb_insert_color+0x38/0x184 LR [c0043978] enqueue_hrtimer+0x88/0xc4 Call Trace: [c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable) [c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4 [c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc [c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8 [c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254 [c6c63c30] [c001352c] ret_from_except+0x0/0x14 --- Exception: 901 at memset+0x38/0x5c LR = jffs2_read_inode_range+0x144/0x17c [c6c63cf0] [00000000] (null) (unreliable) This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with JFFS2 filesystem that were failing since 2.6.23 (seems like the bug above also broke the truncation). Reported-By: Anton Vorontsov <avorontsov@ru.mvista.com> Tested-By: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscacheLinus Torvalds21-209/+1322
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits) FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user() CacheFiles: Don't log lookup/create failing with ENOBUFS CacheFiles: Catch an overly long wait for an old active object CacheFiles: Better showing of debugging information in active object problems CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy CacheFiles: Handle truncate unlocking the page we're reading CacheFiles: Don't write a full page if there's only a partial page to cache FS-Cache: Actually requeue an object when requested FS-Cache: Start processing an object's operations on that object's death FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure FS-Cache: Add a retirement stat counter FS-Cache: Handle pages pending storage that get evicted under OOM conditions FS-Cache: Handle read request vs lookup, creation or other cache failure FS-Cache: Don't delete pending pages from the page-store tracking tree FS-Cache: Fix lock misorder in fscache_write_op() FS-Cache: The object-available state can't rely on the cookie to be available FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase FS-Cache: Use radix tree preload correctly in tracking of pages to be stored ...
2009-11-27fuse: reject O_DIRECT flag also in fuse_createCsaba Henk1-0/+3
The comment in fuse_open about O_DIRECT: "VFS checks this, but only _after_ ->open()" also holds for fuse_create, however, the same kind of check was missing there. As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a stub newfile will remain if the fuse server handled the implied FUSE_CREATE request appropriately. Other impact: in the above situation ima_file_free() will complain to open/free imbalance if CONFIG_IMA is set. Signed-off-by: Csaba Henk <csaba@gluster.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Harshavardhana <harsha@gluster.com> Cc: stable@kernel.org
2009-11-25[CIFS] Fix sparse warningSteve French2-1/+10
Also update CHANGES file Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24[CIFS] Duplicate data on appending to some Samba serversSteve French1-2/+0
SMB writes are sent with a starting offset and length. When the server supports the newer SMB trans2 posix open (rather than using the SMB NTCreateX) a file can be opened with SMB_O_APPEND flag, and for that case Samba server assumes that the offset sent in SMBWriteX is unneeded since the write should go to the end of the file - which can cause problems if the write was cached (since the beginning part of a page could be written twice by the client mm). Jeff suggested that masking the flag on posix open on the client is easiest for the time being. Note that recent Samba server also had an unrelated problem with SMB NTCreateX and append (see samba bugzilla bug number 6898) which should not affect current Linux clients (unless cifs Unix Extensions are disabled). The cifs client did not send the O_APPEND flag on posix open before 2.6.29 so the fix is unneeded on early kernels. In the future, for the non-cached case (O_DIRECT, and forcedirectio mounts) it would be possible and useful to send O_APPEND on posix open (for Windows case: FILE_APPEND_DATA but not FILE_WRITE_DATA on SMB NTCreateX) but for cached writes although the vfs sets the offset to end of file it may fragment a write across pages - so we can't send O_APPEND on open (could result in sending part of a page twice). CC: Stable <stable@kernel.org> Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24[CIFS] fix oops in cifs_lookup during net bootSteve French1-2/+2
Fixes bugzilla.kernel.org bug number 14641 Lookup called during network boot (network root filesystem for diskless workstation) has case where nd is null in lookup. This patch fixes that in cifs_lookup. (Shirish noted that 2.6.30 and 2.6.31 stable need the same check) Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Tested-by: Vladimir Stavrinov <vs@inist.ru> CC: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-20FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=nDavid Howells1-0/+1
Provide nop fscache_stat_d() macro if CONFIG_FSCACHE_STATS=n lest errors like the following occur: fs/fscache/cache.c: In function 'fscache_withdraw_cache': fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d' fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function) fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once fs/fscache/cache.c:386: error: for each function it appears in.) fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function) Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULEDavid Howells1-0/+1
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but hasn't been altered to #include <linux/module.h> to provide it, resulting in the following error: fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function) Add the missing #include. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()David Howells1-1/+1
As of the patch: SLOW_WORK: Wait for outstanding work items belonging to a module to clear Wait for outstanding slow work items belonging to a module to clear when unregistering that module as a user of the facility. This prevents the put_ref code of a work item from being taken away before it returns. slow_work_register_user() takes a module pointer as an argument. CIFS must now pass THIS_MODULE as that argument, lest the following error be observed: fs/cifs/cifsfs.c: In function 'init_cifs': fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user' Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19Merge branch 'upstream-linus' of ↵Linus Torvalds5-24/+80
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Trivial cleanup of jbd compatibility layer removal ocfs2: Refresh documentation ocfs2: return f_fsid info in ocfs2_statfs() ocfs2: duplicate inline data properly during reflink. ocfs2: Move ocfs2_complete_reflink to the right place. ocfs2: Return -EINVAL when a device is not ocfs2.
2009-11-19Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds1-1/+1
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr() NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
2009-11-19CacheFiles: Don't log lookup/create failing with ENOBUFSDavid Howells1-2/+3
Don't log the CacheFiles lookup/create object routined failing with ENOBUFS as under high memory load or high cache load they can do this quite a lot. This error simply means that the requested object cannot be created on disk due to lack of space, or due to failure of the backing filesystem to find sufficient resources. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19CacheFiles: Catch an overly long wait for an old active objectDavid Howells5-23/+85
Catch an overly long wait for an old, dying active object when we want to replace it with a new one. The probability is that all the slow-work threads are hogged, and the delete can't get a look in. What we do instead is: (1) if there's nothing in the slow work queue, we sleep until either the dying object has finished dying or there is something in the slow work queue behind which we can queue our object. (2) if there is something in the slow work queue, we return ETIMEDOUT to fscache_lookup_object(), which then puts us back on the slow work queue, presumably behind the deletion that we're blocked by. We are then deferred for a while until we work our way back through the queue - without blocking a slow-work thread unnecessarily. A backtrace similar to the following may appear in the log without this patch: INFO: task kslowd004:5711 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kslowd004 D 0000000000000000 0 5711 2 0x00000080 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8 Call Trace: [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles] [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles] [<ffffffff81353153>] __wait_on_bit+0x43/0x76 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles] [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles] [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles] [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles] [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache] [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache] [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 1 lock held by kslowd004/5711: #0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles] Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19CacheFiles: Better showing of debugging information in active object problemsDavid Howells1-27/+75
Show more debugging information if cachefiles_mark_object_active() is asked to activate an active object. This may happen, for instance, if the netfs tries to register an object with the same key multiple times. The code is changed to (a) get the appropriate object lock to protect the cookie pointer whilst we dereference it, and (b) get and display the cookie key if available. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happyDavid Howells1-2/+2
Mark parent directory locks as I_MUTEX_PARENT in the callers of cachefiles_bury_object() so that lockdep doesn't complain when that invokes vfs_unlink(): ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc6-cachefs #47 --------------------------------------------- kslowd002/3089 is trying to acquire lock: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff810bbf72>] vfs_unlink+0x8b/0x128 but task is already holding lock: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffffa00e4e61>] cachefiles_walk_to_object+0x1b0/0x831 [cachefiles] other info that might help us debug this: 1 lock held by kslowd002/3089: #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffffa00e4e61>] cachefiles_walk_to_object+0x1b0/0x831 [cachefiles] stack backtrace: Pid: 3089, comm: kslowd002 Not tainted 2.6.32-rc6-cachefs #47 Call Trace: [<ffffffff8105ad7b>] __lock_acquire+0x1649/0x16e3 [<ffffffff8118170e>] ? inode_has_perm+0x5f/0x61 [<ffffffff8105ae6c>] lock_acquire+0x57/0x6d [<ffffffff810bbf72>] ? vfs_unlink+0x8b/0x128 [<ffffffff81353ac3>] mutex_lock_nested+0x54/0x292 [<ffffffff810bbf72>] ? vfs_unlink+0x8b/0x128 [<ffffffff8118179e>] ? selinux_inode_permission+0x8e/0x90 [<ffffffff8117e271>] ? security_inode_permission+0x1c/0x1e [<ffffffff810bb4fb>] ? inode_permission+0x99/0xa5 [<ffffffff810bbf72>] vfs_unlink+0x8b/0x128 [<ffffffff810adb19>] ? kfree+0xed/0xf9 [<ffffffffa00e3f00>] cachefiles_bury_object+0xb6/0x420 [cachefiles] [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa00e7e24>] ? cachefiles_check_object_xattr+0x233/0x293 [cachefiles] [<ffffffffa00e51b0>] cachefiles_walk_to_object+0x4ff/0x831 [cachefiles] [<ffffffff81032238>] ? finish_task_switch+0x0/0xb2 [<ffffffffa00e3429>] cachefiles_lookup_object+0xac/0x12a [cachefiles] [<ffffffffa00741e9>] fscache_lookup_object+0x1c7/0x214 [fscache] [<ffffffffa0074fc5>] fscache_object_state_machine+0xa5/0x52d [fscache] [<ffffffffa00754ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 Signed-off-by: Daivd Howells <dhowells@redhat.com>
2009-11-19CacheFiles: Handle truncate unlocking the page we're readingDavid Howells1-6/+93
Handle truncate unlocking the page we're attempting to read from the backing device before the read has completed. This was causing reports like the following to occur: Pid: 4765, comm: kslowd Not tainted 2.6.30.1 #1 Call Trace: [<ffffffffa0331d7a>] ? cachefiles_read_waiter+0xd9/0x147 [cachefiles] [<ffffffff804b74bd>] ? __wait_on_bit+0x60/0x6f [<ffffffff8022bbbb>] ? __wake_up_common+0x3f/0x71 [<ffffffff8022cc32>] ? __wake_up+0x30/0x44 [<ffffffff8024a41f>] ? __wake_up_bit+0x28/0x2d [<ffffffffa003a793>] ? ext3_truncate+0x4d7/0x8ed [ext3] [<ffffffff80281f90>] ? pagevec_lookup+0x17/0x1f [<ffffffff8028c2ff>] ? unmap_mapping_range+0x59/0x1ff [<ffffffff8022cc32>] ? __wake_up+0x30/0x44 [<ffffffff8028e286>] ? vmtruncate+0xc2/0xe2 [<ffffffff802b82cf>] ? inode_setattr+0x22/0x10a [<ffffffffa003baa5>] ? ext3_setattr+0x17b/0x1e6 [ext3] [<ffffffff802b853d>] ? notify_change+0x186/0x2c9 [<ffffffffa032d9de>] ? cachefiles_attr_changed+0x133/0x1cd [cachefiles] [<ffffffffa032df7f>] ? cachefiles_lookup_object+0xcf/0x12a [cachefiles] [<ffffffffa0318165>] ? fscache_lookup_object+0x110/0x122 [fscache] [<ffffffffa03188c3>] ? fscache_object_slow_work_execute+0x590/0x6bc [fscache] [<ffffffff80278f82>] ? slow_work_thread+0x285/0x43a [<ffffffff8024a446>] ? autoremove_wake_function+0x0/0x2e [<ffffffff80278cfd>] ? slow_work_thread+0x0/0x43a [<ffffffff8024a317>] ? kthread+0x54/0x81 [<ffffffff8020c93a>] ? child_rip+0xa/0x20 [<ffffffff8024a2c3>] ? kthread+0x0/0x81 [<ffffffff8020c930>] ? child_rip+0x0/0x20 CacheFiles: I/O Error: Readpage failed on backing file 200000000000810 FS-Cache: Cache cachefiles stopped due to I/O error Reported-by: Christian Kujau <lists@nerdbynature.de> Reported-by: Takashi Iwai <tiwai@suse.de> Reported-by: Duc Le Minh <duclm.vn@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19CacheFiles: Don't write a full page if there's only a partial page to cacheDavid Howells2-7/+36
cachefiles_write_page() writes a full page to the backing file for the last page of the netfs file, even if the netfs file's last page is only a partial page. This causes the EOF on the backing file to be extended beyond the EOF of the netfs, and thus the backing file will be truncated by cachefiles_attr_changed() called from cachefiles_lookup_object(). So we need to limit the write we make to the backing file on that last page such that it doesn't push the EOF too far. Also, if a backing file that has a partial page at the end is expanded, we discard the partial page and refetch it on the basis that we then have a hole in the file with invalid data, and should the power go out... A better way to deal with this could be to record a note that the partial page contains invalid data until the correct data is written into it. This isn't a problem for netfs's that discard the whole backing file if the file size changes (such as NFS). Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Actually requeue an object when requestedDavid Howells1-2/+1
FS-Cache objects have an FSCACHE_OBJECT_EV_REQUEUE event that can theoretically be raised to ask the state machine to requeue the object for further processing before the work function returns to the slow-work facility. However, fscache_object_work_execute() was clearing that bit before checking the event mask to see whether the object has any pending events that require it to be requeued immediately. Instead, the bit should be cleared after the check and enqueue. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Start processing an object's operations on that object's deathDavid Howells4-61/+69
Start processing an object's operations when that object moves into the DYING state as the object cannot be destroyed until all its outstanding operations have completed. Furthermore, make sure that read and allocation operations handle being woken up on a dead object. Such events are recorded in the Allocs.abt and Retrvls.abt statistics as viewable through /proc/fs/fscache/stats. The code for waiting for object activation for the read and allocation operations is also extracted into its own function as it is much the same in all cases, differing only in the stats incremented. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failureDavid Howells1-5/+12
We must make sure that FSCACHE_COOKIE_LOOKING_UP is cleared on lookup failure (if an object reaches the LC_DYING state), and we should clear it before clearing FSCACHE_COOKIE_CREATING. If this doesn't happen then fscache_wait_for_deferred_lookup() may hold allocation and retrieval operations indefinitely until they're interrupted by signals - which in turn pins the dying object until they go away. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Add a retirement stat counterDavid Howells3-2/+7
Add a stat counter to count retirement events rather than ordinary release events (the retire argument to fscache_relinquish_cookie()). Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Handle pages pending storage that get evicted under OOM conditionsDavid Howells6-34/+100
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache under OOM conditions, but that are waiting for write to the cache. Under these conditions, vmscan calls the releasepage() function of the netfs, asking if a page can be discarded. The problem is typified by the following trace of a stuck process: kslowd005 D 0000000000000000 0 4253 2 0x00000080 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8 Call Trace: [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache] [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache] [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs] [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs] [<ffffffff810885d3>] try_to_release_page+0x32/0x3b [<ffffffff81093203>] shrink_page_list+0x316/0x4ac [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb [<ffffffff81093aa2>] shrink_list+0x8d/0x8f [<ffffffff81093d1c>] shrink_zone+0x278/0x33c [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae [<ffffffff810b2e82>] do_sync_write+0xe3/0x120 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles] [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache] [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 In the above backtrace, the following is happening: (1) A page storage operation is being executed by a slow-work thread (fscache_write_op()). (2) FS-Cache farms the operation out to the cache to perform (cachefiles_write_page()). (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's standard write (do_sync_write()) under KERNEL_DS directly from the netfs page. (4) However, for Ext3 to perform the write, it must allocate some memory, in particular, it must allocate at least one page cache page into which it can copy the data from the netfs page. (5) Under OOM conditions, the memory allocator can't immediately come up with a page, so it uses vmscan to find something to discard (try_to_free_pages()). (6) vmscan finds a clean netfs page it might be able to discard (possibly the one it's trying to write out). (7) The netfs is called to throw the page away (nfs_release_page()) - but it's called with __GFP_WAIT, so the netfs decides to wait for the store to complete (__fscache_wait_on_page_write()). (8) This blocks a slow-work processing thread - possibly against itself. The system ends up stuck because it can't write out any netfs pages to the cache without allocating more memory. To avoid this, we make FS-Cache cancel some writes that aren't in the middle of actually being performed. This means that some data won't make it into the cache this time. To support this, a new FS-Cache function is added fscache_maybe_release_page() that replaces what the netfs releasepage() functions used to do with respect to the cache. The decisions fscache_maybe_release_page() makes are counted and displayed through /proc/fs/fscache/stats on a line labelled "VmScan". There are four counters provided: "nos=N" - pages that weren't pending storage; "gon=N" - pages that were pending storage when we first looked, but weren't by the time we got the object lock; "bsy=N" - pages that we ignored as they were actively being written when we looked; and "can=N" - pages that we cancelled the storage of. What I'd really like to do is alter the behaviour of the cancellation heuristics, depending on how necessary it is to expel pages. If there are plenty of other pages that aren't waiting to be written to the cache that could be ejected first, then it would be nice to hold up on immediate cancellation of cache writes - but I don't see a way of doing that. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Handle read request vs lookup, creation or other cache failureDavid Howells3-2/+10
FS-Cache doesn't correctly handle the netfs requesting a read from the cache on an object that failed or was withdrawn by the cache. A trace similar to the following might be seen: CacheFiles: Lookup failed error -105 [exe ] unexpected submission OP165afe [OBJ6cac OBJECT_LC_DYING] [exe ] objstate=OBJECT_LC_DYING [OBJECT_LC_DYING] [exe ] objflags=0 [exe ] objevent=9 [fffffffffffffffb] [exe ] ops=0 inp=0 exc=0 Pid: 6970, comm: exe Not tainted 2.6.32-rc6-cachefs #50 Call Trace: [<ffffffffa0076477>] fscache_submit_op+0x3ff/0x45a [fscache] [<ffffffffa0077997>] __fscache_read_or_alloc_pages+0x187/0x3c4 [fscache] [<ffffffffa00b6480>] ? nfs_readpage_from_fscache_complete+0x0/0x66 [nfs] [<ffffffffa00b6388>] __nfs_readpages_from_fscache+0x7e/0x176 [nfs] [<ffffffff8108e483>] ? __alloc_pages_nodemask+0x11c/0x5cf [<ffffffffa009d796>] nfs_readpages+0x114/0x1d7 [nfs] [<ffffffff81090314>] __do_page_cache_readahead+0x15f/0x1ec [<ffffffff81090228>] ? __do_page_cache_readahead+0x73/0x1ec [<ffffffff810903bd>] ra_submit+0x1c/0x20 [<ffffffff810906bb>] ondemand_readahead+0x227/0x23a [<ffffffff81090762>] page_cache_sync_readahead+0x17/0x19 [<ffffffff8108a99e>] generic_file_aio_read+0x236/0x5a0 [<ffffffffa00937bd>] nfs_file_read+0xe4/0xf3 [nfs] [<ffffffff810b2fa2>] do_sync_read+0xe3/0x120 [<ffffffff81354cc3>] ? _spin_unlock_irq+0x2b/0x31 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff811848e5>] ? selinux_file_permission+0x5d/0x10f [<ffffffff81352bdb>] ? thread_return+0x3e/0x101 [<ffffffff8117d7b0>] ? security_file_permission+0x11/0x13 [<ffffffff810b3b06>] vfs_read+0xaa/0x16f [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff810b3c84>] sys_read+0x45/0x6c [<ffffffff8100ae2b>] system_call_fastpath+0x16/0x1b The object state might also be OBJECT_DYING or OBJECT_WITHDRAWING. This should be handled by simply rejecting the new operation with ENOBUFS. There's no need to log an error for it. Events of this type now appear in the stats file under Ops:rej. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Don't delete pending pages from the page-store tracking treeDavid Howells1-2/+5
Don't delete pending pages from the page-store tracking tree, but rather send them for another write as they've presumably been updated. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Fix lock misorder in fscache_write_op()David Howells4-19/+48
FS-Cache has two structs internally for keeping track of the internal state of a cached file: the fscache_cookie struct, which represents the netfs's state, and fscache_object struct, which represents the cache's state. Each has a pointer that points to the other (when both are in existence), and each has a spinlock for pointer maintenance. Since netfs operations approach these structures from the cookie side, they get the cookie lock first, then the object lock. Cache operations, on the other hand, approach from the object side, and get the object lock first. It is not then permitted for a cache operation to get the cookie lock whilst it is holding the object lock lest deadlock occur; instead, it must do one of two things: (1) increment the cookie usage counter, drop the object lock and then get both locks in order, or (2) simply hold the object lock as certain parts of the cookie may not be altered whilst the object lock is held. It is also not permitted to follow either pointer without holding the lock at the end you start with. To break the pointers between the cookie and the object, both locks must be held. fscache_write_op(), however, violates the locking rules: It attempts to get the cookie lock without (a) checking that the cookie pointer is a valid pointer, and (b) holding the object lock to protect the cookie pointer whilst it follows it. This is so that it can access the pending page store tree without interference from __fscache_write_page(). This is fixed by splitting the cookie lock, such that the page store tracking tree is protected by its own lock, and checking that the cookie pointer is non-NULL before we attempt to follow it whilst holding the object lock. The new lock is subordinate to both the cookie lock and the object lock, and so should be taken after those. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: The object-available state can't rely on the cookie to be availableDavid Howells1-2/+7
The object-available state in the object processing state machine (as processed by fscache_object_available()) can't rely on the cookie to be available because the FSCACHE_COOKIE_CREATING bit may have been cleared by fscache_obtained_object() prior to the object being put into the FSCACHE_OBJECT_AVAILABLE state. Clearing the FSCACHE_COOKIE_CREATING bit on a cookie permits __fscache_relinquish_cookie() to proceed and detach the cookie from the object. To deal with this, we don't dereference object->cookie in fscache_object_available() if the object has already been detached. In addition, a couple of assertions are added into fscache_drop_object() to make sure the object is unbound from the cookie before it gets there. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phaseDavid Howells4-39/+113
Permit the operations to retrieve data from the cache or to allocate space in the cache for future writes to be interrupted whilst they're waiting for permission for the operation to proceed. Typically this wait occurs whilst the cache object is being looked up on disk in the background. If an interruption occurs, and the operation has not yet been given the go-ahead to run, the operation is dequeued and cancelled, and control returns to the read operation of the netfs routine with none of the requested pages having been read or in any way marked as known by the cache. This means that the initial wait is done interruptibly rather than uninterruptibly. In addition, extra stats values are made available to show the number of ops cancelled and the number of cache space allocations interrupted. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Use radix tree preload correctly in tracking of pages to be storedDavid Howells1-1/+3
__fscache_write_page() attempts to load the radix tree preallocation pool for the CPU it is on before calling radix_tree_insert(), as the insertion must be done inside a pair of spinlocks. Use of the preallocation pool, however, is contingent on the radix tree being initialised without __GFP_WAIT specified. __fscache_acquire_cookie() was passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT. The solution is to AND out __GFP_WAIT. Additionally, the banner comment to radix_tree_preload() is altered to make note of this prerequisite. Possibly there should be a WARN_ON() too. Without this fix, I have seen the following recursive deadlock caused by radix_tree_insert() attempting to allocate memory inside the spinlocked region, which resulted in FS-Cache being called back into to release memory - which required the spinlock already held. ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc6-cachefs #24 --------------------------------------------- nfsiod/7916 is trying to acquire lock: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache] but task is already holding lock: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache] other info that might help us debug this: 5 locks held by nfsiod/7916: #0: (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2 #1: (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2 #2: (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache] #3: (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache] #4: (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache] stack backtrace: Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24 Call Trace: [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3 [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3 [<ffffffff8100e27d>] ? dump_trace+0x248/0x257 [<ffffffff8105ad70>] lock_acquire+0x57/0x6d [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffff8135467c>] _spin_lock+0x2c/0x3b [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache] [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache] [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs] [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs] [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70 [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31 [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70 [<ffffffff810934ca>] shrink_list+0x8d/0x8f [<ffffffff81093744>] shrink_zone+0x278/0x33c [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392 [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212 [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1 [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118 [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache] [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs] [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs] [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs] [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc] [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc] [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc] [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2 [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2 [<ffffffff81352433>] ? thread_return+0x3e/0x101 [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc] [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34 [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2 [<ffffffff8104bd21>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44 [<ffffffff8104bca7>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Clear netfs pointers in cookie after detaching object, not beforeDavid Howells1-4/+4
Clear the pointers from the fscache_cookie struct to netfs private data after clearing the pointer to the cookie from the fscache_object struct and releasing the object lock, rather than before. This allows the netfs private data pointers to be relied on simply by holding the object lock, rather than having to hold the cookie lock. This is makes things simpler as the cookie lock has to be taken before the object lock, but sometimes the object pointer is all that the code has. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Add counters for entry/exit to/from cache operation functionsDavid Howells6-9/+118
Count entries to and exits from cache operation table functions. Maintain these as a single counter that's added to or removed from as appropriate. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Allow the current state of all objects to be dumpedDavid Howells12-3/+484
Allow the current state of all fscache objects to be dumped by doing: cat /proc/fs/fscache/objects By default, all objects and all fields will be shown. This can be restricted by adding a suitable key to one of the caller's keyrings (such as the session keyring): keyctl add user fscache:objlist "<restrictions>" @s The <restrictions> are: K Show hexdump of object key (don't show if not given) A Show hexdump of object aux data (don't show if not given) And paired restrictions: C Show objects that have a cookie c Show objects that don't have a cookie B Show objects that are busy b Show objects that aren't busy W Show objects that have pending writes w Show objects that don't have pending writes R Show objects that have outstanding reads r Show objects that don't have outstanding reads S Show objects that have slow work queued s Show objects that don't have slow work queued If neither side of a restriction pair is given, then both are implied. For example: keyctl add user fscache:objlist KB @s shows objects that are busy, and lists their object keys, but does not dump their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is not implied. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work itemsDavid Howells3-6/+91
Annotate slow-work runqueue proc lines for FS-Cache work items. Objects include the object ID and the state. Operations include the object ID, the operation ID and the operation type and state. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19SLOW_WORK: Wait for outstanding work items belonging to a module to clearDavid Howells5-5/+8
Wait for outstanding slow work items belonging to a module to clear when unregistering that module as a user of the facility. This prevents the put_ref code of a work item from being taken away before it returns. Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-17fcntl: rename F_OWNER_GID to F_OWNER_PGRPPeter Zijlstra1-2/+2
This is for consistency with various ioctl() operations that include the suffix "PGRP" in their names, and also for consistency with PRIO_PGRP, used with setpriority() and getpriority(). Also, using PGRP instead of GID avoids confusion with the common abbreviation of "group ID". I'm fine with anything that makes it more consistent, and if PGRP is what is the predominant abbreviation then I see no need to further confuse matters by adding a third one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17procfs: fix /proc/<pid>/stat stack pointer for kernel threadsStefani Seibold1-1/+1
Fix a small issue for the stack pointer in /proc/<pid>/stat. In case of a kernel thread the value of the printed stack pointer should be 0. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2-5/+22
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: copy li_lsn before dropping AIL lock XFS bug in log recover with quota (bugzilla id 855)
2009-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds1-1/+1
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: cifs: clear server inode number flag while autodisabling
2009-11-17Merge branch 'for-linus' of ↵Linus Torvalds3-4/+5
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: deleted inconsistent comment in nilfs_load_inode_block() nilfs2: deleted struct nilfs_dat_group_desc nilfs2: fix lock order reversal in chcp operation
2009-11-17xfs: copy li_lsn before dropping AIL lockNathaniel W. Turner1-3/+20
Access to log items on the AIL is generally protected by m_ail_lock; this is particularly needed when we're getting or setting the 64-bit li_lsn on a 32-bit platform. This patch fixes a couple places where we were accessing the log item after dropping the AIL lock on 32-bit machines. This can result in a partially-zeroed log->l_tail_lsn if xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at least some cases, this can leave the l_tail_lsn with a zero cycle number, which means xlog_space_left will think the log is full (unless CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to processes stuck forever in xlog_grant_log_space. Thanks to Adrian VanderSpek for first spotting the race potential and to Dave Chinner for debug assistance. Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-17XFS bug in log recover with quota (bugzilla id 855)Jan Rekorajski1-2/+2
Hi, I was hit by a bug in linux 2.6.31 when XFS is not able to recover the log after a crash if fs was mounted with quotas. Gory details in XFS bugzilla: http://oss.sgi.com/bugzilla/show_bug.cgi?id=855. It looks like wrong struct is used in buffer length check, and the following patch should fix the problem. xfs_dqblk_t has a size of 104+32 bytes, while xfs_disk_dquot_t is 104 bytes long, and this is exactly what I see in system logs - "XFS: dquot too small (104) in xlog_recover_do_dquot_trans." Signed-off-by: Jan Rekorajski <baggins@sith.mimuw.edu.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-11-16cifs: clear server inode number flag while autodisablingSuresh Jayaraman1-1/+1
Fix the commit ec06aedd44 that intended to turn off querying for server inode numbers when server doesn't consistently support inode numbers. Presumably the commit didn't actually clear the CIFS_MOUNT_SERVER_INUM flag, perhaps a typo. Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Stable <stable@kernel.org> Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-15nilfs2: deleted inconsistent comment in nilfs_load_inode_block()Jiro SEKIBA1-1/+0
The comment says, "Caller of this function MUST lock s_inode_lock", however just above the comment, it locks s_inode_lock in the function. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-14Fix memory corruption caused by nfsd readdir+Petr Vandrovec1-1/+1
Commit 8177e6d6dfb9cd03d9bdeb647c32161f8f58f686 ("nfsd: clean up readdirplus encoding") introduced single character typo in nfs3 readdir+ implementation. Unfortunately that typo has quite bad side effects: random memory corruption, followed (on my box) with immediate spontaneous box reboot. Using 'p1' instead of 'p' fixes my Linux box rebooting whenever VMware ESXi box tries to list contents of my home directory. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-13ocfs2: Trivial cleanup of jbd compatibility layer removalSunil Mushran2-11/+1
Mainline commit 53ef99cad9878f02f27bb30bc304fc42af8bdd6e removed the JBD compatibility layer from OCFS2. This patch removes the last remaining remnants of that. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-11-13nilfs2: fix lock order reversal in chcp operationRyusuke Konishi2-3/+5
Will fix the following lock order reversal lockdep detected: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc6 #7 ------------------------------------------------------- chcp/30157 is trying to acquire lock: (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] but task is already holding lock: (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&nilfs->ns_segctor_sem){++++.+}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c14151e2>] down_read+0x31/0x45 [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2] [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #1 (&type->s_umount_key#31/1){+.+.+.}: [<c105799c>] __lock_acquire+0x109c/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c104c0f3>] down_write_nested+0x34/0x52 [<c10c08fe>] sget+0x22e/0x389 [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2] [<c10c0ccb>] vfs_kern_mount+0x8b/0x124 [<c10c0db2>] do_kern_mount+0x37/0xc3 [<c10d7517>] do_mount+0x64d/0x69d [<c10d75cd>] sys_mount+0x66/0x95 [<c1002a14>] sysenter_do_call+0x12/0x32 -> #0 (&nilfs->ns_mount_mutex){+.+.+.}: [<c1057727>] __lock_acquire+0xe27/0x139d [<c1057d26>] lock_acquire+0x89/0xa0 [<c1414d63>] mutex_lock_nested+0x41/0x23e [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2] [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2] [<c10cca12>] vfs_ioctl+0x27/0x6e [<c10ccf93>] do_vfs_ioctl+0x491/0x4db [<c10cd022>] sys_ioctl+0x45/0x5f [<c1002a14>] sysenter_do_call+0x12/0x32 Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-12__generic_block_fiemap(): fix for files bigger than 4GBMike Hommey1-1/+1
Because of an integer overflow on start_blk, various kind of wrong results would be returned by the generic_block_fiemap() handler, such as no extents when there is a 4GB+ hole at the beginning of the file, or wrong fe_logical when an extent starts after the first 4GB. Signed-off-by: Mike Hommey <mh@glandium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Sandeen <sandeen@sgi.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>