git.neil.brown.name Git - LaFS.git/log
NeilBrown [Sun, 15 Aug 2010 05:08:49 +0000 (15:08 +1000)]
clean.c - assorted tidy-ups
Change some magic constants into named constants, and
improve some comments.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 12:25:38 +0000 (22:25 +1000)]
Check if dirblock can be orphan before making it one.
Only certain sorts of deletions can make a directory block
into an orphan - check them out before committing resources.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:50:14 +0000 (21:50 +1000)]
flush orphans when renaming to empty directory just like rmdir
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:37:56 +0000 (21:37 +1000)]
Cleaner: add a memory barrier to ensure we see i_size promptly.
There is a possible race that we need a barrier to protect against.
I think... or hope.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:11:12 +0000 (21:11 +1000)]
Cleaner: remove pointless signed compare.
bcnt can never be negative, so don't pretend.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:09:40 +0000 (21:09 +1000)]
Allow cleaner to skip new filesystems.
If we can tell that a subset-filesystem is too new to
match what we find in the write-cluster, we can skip it
quickly.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:05:55 +0000 (21:05 +1000)]
When cleaner finds a block beyond EOF, ignore whole descriptor.
All other blocks in descriptor must also be beyond EOF,
so ignore them all at once.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 11:04:08 +0000 (21:04 +1000)]
Update trunc_gen whenever we truncate a file to zero.
That allows some minor optimisations in the cleaner to work.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 10:55:23 +0000 (20:55 +1000)]
cleaner_parse: use truncate number to avoid looking at old inodes.
That is why we have the truncnumber in the cluster head after all.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 10:52:11 +0000 (20:52 +1000)]
Allow cleaner_parse to request multiple inodes at once.
Currently cleaner_parse stops when it hits an inode that it cannot
load immediately. This reduces the opportunities for parallelism.
Instead allow up to 16 -EAGAINs from inode lookups.
This requires that we mark headers for inodes which failed, and
always start again from the beginning of the cluster head.
We already reduce the bcnt to 0, so for inodes that can be
found, we won't lookup the blocks twice.
Signed-off-by: NeilBrown <neilb@suse.de>
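The retry scheme described in the commit above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: struct group_head, its fields other than bcnt, lookup_inode() and parse_pass() are hypothetical names, not the LaFS code. Only the limit of 16, the marking of failed headers, the restart from the beginning and the zeroing of bcnt come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING_LOOKUPS 16	/* limit taken from the commit message */

/* Hypothetical stand-in for one group header in a cluster head. */
struct group_head {
	int inode_ready;	/* illustration only: can the inode be loaded now? */
	int bcnt;		/* blocks still to examine; 0 once handled */
	int retry;		/* marked when the lookup returned -EAGAIN */
};

/* Hypothetical lookup: fails with -EAGAIN until the inode is ready. */
static int lookup_inode(const struct group_head *gh)
{
	return gh->inode_ready ? 0 : -EAGAIN;
}

/*
 * One pass over the cluster head, always starting from the first header.
 * Returns the number of headers that still need a retry.
 */
int parse_pass(struct group_head *gh, size_t n)
{
	int pending = 0;
	size_t i;

	assert(gh != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (gh[i].bcnt == 0)
			continue;	/* already handled on an earlier pass */
		if (lookup_inode(&gh[i]) == -EAGAIN) {
			gh[i].retry = 1;	/* mark the header for a later pass */
			if (++pending >= MAX_PENDING_LOOKUPS)
				break;	/* enough lookups in flight */
			continue;
		}
		gh[i].retry = 0;
		gh[i].bcnt = 0;	/* blocks examined; never look them up twice */
	}
	return pending;
}
```

A caller would repeat parse_pass() until it returns 0. Headers whose bcnt is already 0 are skipped, so restarting from the beginning costs little.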
NeilBrown [Sat, 14 Aug 2010 10:34:45 +0000 (20:34 +1000)]
Refactor try_clean
It is a very big function - change it to 3 moderate sized functions.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 10:08:53 +0000 (20:08 +1000)]
Give to_clean.ss a meaningful name.
It is a flag set when we have a valid segment address.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 08:17:48 +0000 (18:17 +1000)]
Keep track of 'seq' number in cleaner
When reading cluster-heads, track the seq number, both for validation
and, later, for optimisation.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 08:04:47 +0000 (18:04 +1000)]
README update
NeilBrown [Sat, 14 Aug 2010 06:48:39 +0000 (16:48 +1000)]
Clean up interaction between cleaner and checkpoint.
If a checkpoint is wanted, the cleaner shouldn't start any more work.
If the cleaner or segscan is active a checkpoint cannot start, but
when they complete they should wake the checkpoint process.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 06:38:14 +0000 (16:38 +1000)]
Release stray B_Async blocks if we find them.
There could still be some stray index blocks...
maybe fix that later.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 05:41:59 +0000 (15:41 +1000)]
Combine cleaning and orphan list_heads.
A datablock is very rarely both an orphan and requiring cleaning, so
having two list_heads is a waste.
If it is an orphan it will have full parent linkage and addresses already
so it will be handled promptly and removed from the cleaning list.
So arrange that if a block wants to be both, it is preferentially on
the cleaning list, and when removed from the cleaning list it gets
added back to the pending_orphan list in case it needs processing.
Note that only directory and inode blocks can ever be orphans so some
optimisation of spinlocks is possible.
Signed-off-by: NeilBrown <neilb@suse.de>
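A minimal user-space sketch of the shared list_head arrangement described above. The list helpers mimic the kernel's struct list_head; the flag values, struct lists, refile() and done_cleaning() are hypothetical names chosen for illustration and are not taken from the LaFS source.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

enum { B_ORPHAN = 1, B_CLEANING = 2 };	/* hypothetical flag bits */

struct datablock {
	unsigned flags;
	struct list_head link;	/* shared by the cleaning and orphan lists */
};

struct lists { struct list_head cleaning, pending_orphans; };

/* Put the block on the right list: cleaning wins if both flags are set. */
void refile(struct lists *l, struct datablock *b)
{
	assert(l != NULL && b != NULL);
	if (!list_empty(&b->link))
		list_del_init(&b->link);
	if (b->flags & B_CLEANING)
		list_add_tail(&b->link, &l->cleaning);
	else if (b->flags & B_ORPHAN)
		list_add_tail(&b->link, &l->pending_orphans);
}

/* Cleaning finished: the block falls back to the orphan list if needed. */
void done_cleaning(struct lists *l, struct datablock *b)
{
	b->flags &= ~(unsigned)B_CLEANING;
	refile(l, b);
}
```

The sketch leaves out locking entirely; the commit notes that only directory and inode blocks can be orphans, which is what makes some spinlock optimisation possible in the real code.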
NeilBrown [Sat, 14 Aug 2010 05:16:37 +0000 (15:16 +1000)]
Change when orphan blocks are refcounted.
Count while B_Orphan is set, rather than while on a list.
This gives us some freedom to do different things with the list,
and ensures that we never lose the flag by the block disappearing.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 05:01:34 +0000 (15:01 +1000)]
Use a new flag to identify blocks being processed by the cleaner.
This will help a future patch which will unify the cleaning and orphan
list_heads, and make it clear when a refcount is being held.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 04:43:11 +0000 (14:43 +1000)]
Apply single-exit pattern in try_clean
This removes a lot of duplication.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 03:36:36 +0000 (13:36 +1000)]
Implement youth decay.
- After a checkpoint, check if we are close enough to the end
of youth space to need a decay.
- when we record a new youth number, un-decay it if the block hasn't
been decayed yet (and convert endian properly)
- Change scan_seg to update free_block/free_dev atomically in just
one place, and do a block worth of decay at that point.
As part of this, the youth block is only released at one place now.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 02:08:38 +0000 (12:08 +1000)]
Use the right value of creation_age of subsets.
It should be cluster seq number. This never wraps and
is used to compare against write cluster to trim searches of new
filesets early.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 14 Aug 2010 01:16:04 +0000 (11:16 +1000)]
Ensure lafs_orphan_release doesn't block too much
Make sure orphan->i_mutex isn't held for long
periods, and ensure that orphan_abort doesn't block in
erase_dblock.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 11:58:57 +0000 (21:58 +1000)]
README update
NeilBrown [Fri, 13 Aug 2010 11:56:49 +0000 (21:56 +1000)]
Tune checkpoint freq by segments, not blocks.
If nothing else does, we should force a checkpoint
every few segments rather than every so-many blocks.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 11:48:36 +0000 (21:48 +1000)]
Separate thread management from the cleaning.
The thread does a lot more than just 'clean' so don't call it the
'cleaner' any more - just the 'thread' or 'lafsd'.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 11:32:11 +0000 (21:32 +1000)]
roll: don't update index if block address hasn't changed.
This is quite possible if the block was pushed out in the previous
phase, and could save some work
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 04:06:58 +0000 (14:06 +1000)]
Create a backing_dev_info for a lafs filesystem.
As a lafs filesystem can span multiple devices, we need our own
bdi to handle congestion notification and unplugging.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 09:36:06 +0000 (19:36 +1000)]
Give subset objects their own operations
And make sure they 'stat' like the sort of directory
that can be used to create them.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:43:37 +0000 (16:43 +1000)]
Add missing set_anon_super for subset mounts
oops - missed that.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:26:55 +0000 (16:26 +1000)]
Add test code for subset mounts.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:26:30 +0000 (16:26 +1000)]
Maybe add a bug-on in dirty_dblock
I think we want this bug_on, but it doesn't quite work
yet - leave it as a reminder of '15ca/' in README
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:23:28 +0000 (16:23 +1000)]
Better handling of changing a Directory into an InodeFile
- actually change the type !!!
- make sure the on-disk block gets a proper index update.
Note that the checkpointing before creating things in the FS is
important for this to be correct.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:14:27 +0000 (16:14 +1000)]
Various fixes for lafs_get_subset
- make sure root directory is created if it doesn't exist
- also create inode usage map.
- hold a ref on the inode while the fs is mounted.
- free the sb_key at unmount.
- set s_bdi from the prime_sb
NeilBrown [Fri, 13 Aug 2010 06:09:53 +0000 (16:09 +1000)]
lafs_get_subset: balance locks properly
We drop the mutex outside the 'if' so we must take it outside
the 'if' too - which is safer as well.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:06:06 +0000 (16:06 +1000)]
Fix lafs_put_super for subset mounts.
- we still need a checkpoint - though not a final one - to ensure
that all dirty blocks from the fileset are written.
- When it isn't the root for a snapshot, we don't want to put
the root inode - the root inode will be in the main filesystem,
not in this one.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 04:00:56 +0000 (14:00 +1000)]
Improve lafs_iget_fs
Allow getting inodes in other filesystem.
This isn't quite perfect yet though.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:59:10 +0000 (13:59 +1000)]
Set PinPending in flush_data_to_inode.
Should always have this set when we pin a block.
It keeps the block pinned until it is dirtied.
As lafs_pin_block does a refile at the end, it can drop the Pinned
state as soon as it is set.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:56:57 +0000 (13:56 +1000)]
iget_my_inode - fix for case of ino == NULL
igrab doesn't handle NULL inodes, so we must.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:55:46 +0000 (13:55 +1000)]
lafs_write_end: set new file size correctly.
We were setting the size to the start of the write, not the end!!
Signed-off-by: NeilBrown <neilb@suse.de>
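The arithmetic behind this fix, as a small stand-alone sketch. size_after_write() is a hypothetical helper, not a LaFS function; pos and copied follow the usual meaning of the write_end arguments (start offset of the write and number of bytes copied).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the file size after a write of 'copied' bytes starting at
 * offset 'pos'.  The file only grows if the write ends beyond the
 * old size.
 */
int64_t size_after_write(int64_t old_size, int64_t pos, unsigned copied)
{
	int64_t end = pos + (int64_t)copied;	/* end of the write, not its start */

	assert(pos >= 0);
	return end > old_size ? end : old_size;
}
```

Using pos alone, as the bug did, would leave a 512-byte write at offset 4096 with a size of 4096 rather than 4608.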
NeilBrown [Wed, 11 Aug 2010 23:42:56 +0000 (09:42 +1000)]
Discard filesys field from lafs_inode
i_sb can be used just as well.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 11 Aug 2010 22:57:55 +0000 (08:57 +1000)]
Change filesys arg of lafs_new_inode to struct super_block
It is more direct in most cases to use a super_block rather
than a filesys inode.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 10 Aug 2010 05:31:12 +0000 (15:31 +1000)]
Make lafs_new_inode work when given an explicit inode number.
In this case imni->mb isn't set, so we have to cope with that.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 10 Aug 2010 05:16:03 +0000 (15:16 +1000)]
Choose_free_inum: never return number below 16
They are for internal use.
Also fix a missing B_PinPending setting.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 11:17:11 +0000 (21:17 +1000)]
Add 'filesys' arg to lafs_new_inode
This allows it to be called with dir == NULL - when creating an inode
that isn't in a directory.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 11:06:34 +0000 (21:06 +1000)]
Make dir arg to lafs_new_inode optional.
After all, some inodes will be created without a directory (root and
other special inodes).
Make inodbp optional too.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:43:15 +0000 (20:43 +1000)]
Revise handling of filesystem inconsistency: nlink == 0
Our handling wasn't really correct, and made the less-safe
assumption.
So change it to simply increment the linkcount.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:26:25 +0000 (20:26 +1000)]
checkpoint must wait for both dblock and iblock of root to change phase.
Only waiting for iblock isn't enough - dblock might still be in the
old phase, which gets rather confusing.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:24:20 +0000 (20:24 +1000)]
Unpin data blocks from previous phase before allowing them to be dirty.
While checkpointing will unpin PinPending blocks, it might not
manage to do it before the block gets Dirtied again.
So before we Pin the block - which is a required precursor to dirtying
them, unpin the block.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:20:16 +0000 (20:20 +1000)]
Be more careful about waking cleaner in cluster_end_io.
If done was set as well as wake, we didn't wake the cleaner
so *FlushNeeded wouldn't necessarily be effective.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:17:09 +0000 (20:17 +1000)]
Don't release an orphan just because an inode cannot be found.
This is over-reacting. We could be between last_iput and setting
I_Delete for example, so orphan_release would be premature and wrong.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 02:41:42 +0000 (12:41 +1000)]
README update
NeilBrown [Mon, 9 Aug 2010 02:34:47 +0000 (12:34 +1000)]
Use async erase_block in inode orphan handling
otherwise we could deadlock.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 02:06:26 +0000 (12:06 +1000)]
Add tracing for when we actually wait for writeback.
This helps track deadlock bugs, just like the similar code
in lafs_iolock_block.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:53:23 +0000 (13:53 +1000)]
rcu locking protection for ->my_inode
We use rcu to free inodes, and use rcu locking to protect
access to ->my_inode.
Part of this required that once I_Deleting is set it stays set,
so remove the pointless clearing of it.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:38:06 +0000 (13:38 +1000)]
Use lafs_iget_fs rather than multiple get_blocks in orphan lookup.
When compacting the orphan table and so changing the orphan
slot for a block, use lafs_iget_fs to help find the orphan block.
This avoids allocating blocks if the inodes exist (which they
should).
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:01:10 +0000 (13:01 +1000)]
hold ref to inode for directory orphans.
When a directory block is an orphan, make sure we hold
a reference on the inode so it cannot disappear on us.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 02:46:48 +0000 (12:46 +1000)]
Hold ref on inode while truncating.
If I_Trunc is set and I_Deleting is not, then we hold a
reference to the inode and must drop it when clearing I_Trunc.
This ensures that delete_inode won't get called while truncate
is happening, and the inode won't otherwise disappear.
Signed-off-by: NeilBrown <neilb@suse.de>
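The invariant stated above (I_Trunc set and I_Deleting clear implies a held reference) can be modelled like this. It is a toy model with hypothetical names (struct inode_sketch, trunc_start(), trunc_done()) and a plain integer in place of the kernel's inode refcounting.

```c
#include <assert.h>

enum { I_TRUNC = 1, I_DELETING = 2 };	/* hypothetical flag bits */

struct inode_sketch {
	unsigned flags;
	int refcount;
};

/* Start a truncate: take a reference unless the inode is being deleted. */
void trunc_start(struct inode_sketch *ino)
{
	if (!(ino->flags & I_DELETING))
		ino->refcount++;
	ino->flags |= I_TRUNC;
}

/* Truncate finished: drop the reference taken in trunc_start, if any. */
void trunc_done(struct inode_sketch *ino)
{
	assert(ino->flags & I_TRUNC);
	ino->flags &= ~(unsigned)I_TRUNC;
	if (!(ino->flags & I_DELETING))
		ino->refcount--;
}
```

While the extra reference is held the count cannot reach zero, which is what keeps delete_inode from running in the middle of a truncate.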
NeilBrown [Sun, 1 Aug 2010 02:43:04 +0000 (12:43 +1000)]
Hold ref on inode during orphan handling.
Orphan handling will shortly drop references to the inode controlling
the orphan block. As run_orphans needs to drop the mutex at the end
it needs to hold another reference too.
If I_Deleting is set, then the db effectively owns a reference,
so no further igrab is needed, nor will it work.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 00:57:01 +0000 (10:57 +1000)]
wait more effectively for truncate to progress.
The times that we wait for truncate to progress, we hold
i_mutex, so truncate cannot progress.
So if there is a need to wait, we need to call the orphan
handler directly.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 31 Jul 2010 00:00:22 +0000 (10:00 +1000)]
Break linkage between inode and dblock at earliest opportunity.
clear_inode is the first chance to break this linkage,
so do it there.
It is still possible for lafs_iget to get a new inode before
clear_inode has completed, so we need to do the same
test/clear in lafs_iget if b->my_inode is found to be non-NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 12:38:07 +0000 (22:38 +1000)]
Remove some dead code.
This used to play with WritePhase, but as we don't do that
any more, the whole thing can go.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 12:15:06 +0000 (22:15 +1000)]
Don't clear PinPending without a good reason.
This definitely is wrong. If an inode change is waiting for
previous-phase data to be written, we don't want to lose
the PinPending.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:59:08 +0000 (21:59 +1000)]
Make sure we do a cluster-flush when SecondFlushNeeded
Also if FlushNeeded - normally this isn't needed as clhead won't be
empty, but there is room for races to make that not so - this is
safer.
So only test the FlushNeeded flags and CheckpointEnd for cnum==0,
ignore for the cleaner clusters.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:56:29 +0000 (21:56 +1000)]
Tidy up choose_free_inum a bit.
The exact semantics of *bp and when refs were held were
a little confusing.
So only set *bp on function-exit while holding a ref on im,
and if it is set on function entry, drop the ref on it and on im.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:52:34 +0000 (21:52 +1000)]
Remove repeated pin_dblock
inode_map_new_pin already pins this, so don't do it again.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:25:32 +0000 (21:25 +1000)]
Special files should appear to have a link-count > 0
otherwise they might get deleted when we put them for
the last time, which would be bad.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:21:09 +0000 (21:21 +1000)]
cleaner: don't iput while still holding a ref to a block.
As the block->inode ref isn't counted, this isn't really safe.
The inode could disappear and the block might not get killed
when the address-space is truncated.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:15:44 +0000 (21:15 +1000)]
lafs_get_block: fix mem leak on error path.
forgot to put the page...
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 10:25:50 +0000 (20:25 +1000)]
Hold ref to inode-map inode while allocating inode.
The block doesn't explicitly reference the inode, so we need
to hold a reference as long as we reference a block in the file.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 09:22:48 +0000 (19:22 +1000)]
refcount the prime_sb so fs doesn't disappear.
use prime_sb->s_active to refcount the main fs when a snapshot
or subset is mounted, so the fs doesn't disappear on us.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:07:27 +0000 (18:07 +1000)]
Define filesystem type for sub-fileset filesystems
This also allows a sub-fileset to be created by
mounting an empty perm==0 directory as though it were
a sub-fileset already.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 07:05:55 +0000 (17:05 +1000)]
Improve choice of superblock at mount.
Identify superblock by uuid, and ensure that it is unique
when mounting a new LaFS.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 04:38:53 +0000 (14:38 +1000)]
New mount infrastructure for snapshot.
Using the new s_sb_info structure, we add a snapshot number
so we can uniquely identify a snapshot from the superblock and
the 'sget' can be used to find an existing or new superblock.
If it is new, set it up properly as before.
No need to fiddle with 'primary_sb' - we have a ref into it from the
path lookup so it cannot go away, and it shouldn't really matter if it
does.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:36:47 +0000 (18:36 +1000)]
Change s_fs_info to point to root inode and fs
We create a new data structure containing the 'fs' and the root inode
of a filesystem, and store this in the superblock.
This allows easy access to that root in iget, which previously was
impossible in general.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:27:09 +0000 (18:27 +1000)]
Use inline to map sb to fs
because we are about to make the conversion slightly
more complex
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 05:22:40 +0000 (15:22 +1000)]
Discard per-device superblocks
There is no real value in the per-device superblocks.
Just open the device for exclusive access.
This loses the debatable possibility of just using mount
to add devices to an array - that should be remount anyway.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 04:57:20 +0000 (14:57 +1000)]
Use anon super for prime_sb
This is cleaner.
also clean up failure path for lafs_load
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 04:40:09 +0000 (14:40 +1000)]
Store blocksize directly in struct fs
That saves a lot of dereferences through prime_sb
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 17:46:52 +0000 (19:46 +0200)]
Better cleaner flushing
better comment needed
NeilBrown [Sun, 18 Jul 2010 17:17:07 +0000 (19:17 +0200)]
Remove pointless code duplication in refile.
Just get fs from inode once.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 16:44:12 +0000 (18:44 +0200)]
Separate setting of PinPending out
We want PinPending set whenever a transaction might be in progress to
ensure that write_page doesn't flush the block early, or that the
cleaner doesn't clean the block in the middle.
We also want the block to be completely written if a write has already
been scheduled.
So:
- set PinPending - after getting an IOlock and ensure the block is
not in writeback. This is set before the checkpoint lock is
taken.
- Once we have checkpoint lock and call pin_dblock, wait for
writeout to complete again. This can only be in writeout
if the block is being written to the previous phase, and it
is safe to wait for that inside the checkpoint lock.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 15:08:21 +0000 (17:08 +0200)]
Discard WritePhase and phase wait
They didn't really work - we will achieve the same result
a different way
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 15:03:33 +0000 (17:03 +0200)]
write_begin and sync_page fixes.
1/ write_begin needs to drop the page lock on failure,
and generally clean up properly.
2/ sync_page does not need to 'get_block' as a pointer is
readily available - so just use that with appropriate locking.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 16 Jul 2010 09:00:39 +0000 (11:00 +0200)]
Forward port to 2.6.34
Also convert to using kvm for testing - much faster.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 16 Jul 2010 05:56:18 +0000 (07:56 +0200)]
README update
NeilBrown [Fri, 16 Jul 2010 05:56:18 +0000 (07:56 +0200)]
README update
NeilBrown [Thu, 15 Jul 2010 18:44:45 +0000 (20:44 +0200)]
Use write_begin/write_end in place of prepare/commit
As this is the 'new way' and is needed for upgrading the base kernel.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Wed, 14 Jul 2010 10:30:42 +0000 (20:30 +1000)]
Add export operations for NFS export.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 12 Jul 2010 10:56:10 +0000 (20:56 +1000)]
Handle fsync of inodes correctly
get rid of lafs_write_inode as it doesn't do the right thing.
Instead, create updates for inode changes only when
fsync is called on an inode. The only other time we
flush out an inode is 'sync()' which does a checkpoint
so achieves the same with no updates in clusters.
Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 12 Jul 2010 09:12:41 +0000 (19:12 +1000)]
Implement readpages
Also allow readpage (and readpages) to make a single
bio rather than lots of small ones.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 05:47:50 +0000 (15:47 +1000)]
Build larger bios when writing to cluster
We don't submit a bio until a new block doesn't fit, or until
we get to the end of a cluster and request a flush.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 03:26:13 +0000 (13:26 +1000)]
Fix up writeout and flushing.
writepage should never flush.
sync_page should, if any block is dirty
cluster_flush should tell the backing dev to start writing
write_block and related functions don't need or use 'dev' arg.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 02:21:34 +0000 (12:21 +1000)]
Some updates to rules.doc
Too asleep to do much more
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 21:36:37 +0000 (07:36 +1000)]
Remove some FIXME comments that are outdated.
These are no longer relevant.
Also update README with lots of FIXME notes
and fix some white-space issues.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 11 Jul 2010 09:13:47 +0000 (19:13 +1000)]
Fix (Again) handling of new segment for final cluster
There were other things that were being missed when
allocating the final cluster. So change code to take the
same path and make exceptions only where exceptions are clearly
needed.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 10 Jul 2010 11:42:34 +0000 (21:42 +1000)]
Make sure last segment allocated is properly registered in table
otherwise bad things happen when we try to de-register it.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 10 Jul 2010 11:39:06 +0000 (21:39 +1000)]
Better status handling for orphan handlers.
- let them return -ENOMEM resulting in a retry 'soon'.
- let them return -ERESTARTSYS resulting in immediate retry
- general tidy up
This fixes a bug where inode_orphan_handle would do part of
the work and not schedule any more.
Signed-off-by: NeilBrown <neilb@suse.de>
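One way to picture the status handling described above is a small dispatch on the handler's return value. enum next_step and orphan_next_step() are hypothetical names; ERESTARTSYS is kernel-internal, so it is defined here by hand with the kernel's value of 512, and treating every other status as "done" is an assumption.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal; absent from user-space errno.h */
#endif

enum next_step { ORPHAN_DONE, ORPHAN_RETRY_NOW, ORPHAN_RETRY_SOON };

/* Map an orphan handler's return value to what the caller does next. */
enum next_step orphan_next_step(int status)
{
	switch (status) {
	case -ENOMEM:
		return ORPHAN_RETRY_SOON;	/* back off, then try again */
	case -ERESTARTSYS:
		return ORPHAN_RETRY_NOW;	/* more work is ready immediately */
	default:
		return ORPHAN_DONE;	/* assumption: nothing left to schedule */
	}
}
```

Making the retry explicit in the return value is what lets the caller reschedule a handler that has only done part of its work.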
NeilBrown [Fri, 9 Jul 2010 23:17:08 +0000 (09:17 +1000)]
Make sure orphans get run promptly.
Whenever we add an orphan to the list, ask the
cleaner thread to have a look at it.
Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 21:34:25 +0000 (07:34 +1000)]
Reinstitute a BUG_ON in checkpoint_unlock_wait and doco it.
When waiting for the checkpoint to pass, we need to have triggered
a checkpoint to start somehow.
Signed-off-by: NeilBrown <neilb@suse.de>