git.neil.brown.name Git - LaFS.git/log
NeilBrown [Fri, 13 Aug 2010 06:43:37 +0000 (16:43 +1000)]
Add missing set_anon_super for subset mounts

oops - missed that.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:26:55 +0000 (16:26 +1000)]
Add test code for subset mounts.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:26:30 +0000 (16:26 +1000)]
Maybe add a bug-on in dirty_dblock

I think we want this bug_on, but it doesn't quite work
yet - leave it as a reminder of '15ca/' in README

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:23:28 +0000 (16:23 +1000)]
Better handling of changing a Directory into an InodeFile

- actually change the type !!!
- make sure the on-disk block gets a proper index update.

Note that the checkpointing before creating things in the FS is
important for this to be correct.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:14:27 +0000 (16:14 +1000)]
Various fixes for lafs_get_subset

- make sure root directory is created if it doesn't exist
- also create inode usage map.
- hold a ref on the inode while the fs is mounted.
- free the sb_key at unmount.
- set s_bdi from the prime_sb

NeilBrown [Fri, 13 Aug 2010 06:09:53 +0000 (16:09 +1000)]
lafs_get_subset: balance locks properly

We drop the mutex outside the 'if' so we must take it outside
the 'if' too - which is safer as well.
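A minimal userspace sketch of that locking shape, with a pthread mutex standing in for the kernel mutex. The names subset_lock, root_exists and get_subset_root are hypothetical. The point is only that the lock is taken and released at the same level, outside the 'if', so every path through the function stays balanced.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state guarded by the mutex. */
static pthread_mutex_t subset_lock = PTHREAD_MUTEX_INITIALIZER;
static bool root_exists;
static int root_creations;

static void get_subset_root(void)
{
	/* Take the lock outside the 'if'... */
	pthread_mutex_lock(&subset_lock);
	if (!root_exists) {
		/* Stand-in for creating the root directory. */
		root_exists = true;
		root_creations++;
	}
	/* ...and drop it outside the 'if', so both paths are balanced. */
	pthread_mutex_unlock(&subset_lock);
}
```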

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 06:06:06 +0000 (16:06 +1000)]
Fix lafs_put_super for subset mounts.

- we still need a checkpoint - though not a final one - to ensure
  that all dirty blocks from the fileset are written.

- If it isn't the root for a snapshot, we don't want to put
  the root inode - the root inode will be in the main filesystem,
  not in this one.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 04:00:56 +0000 (14:00 +1000)]
Improve lafs_iget_fs

Allow getting inodes in other filesystems.

This isn't quite perfect yet though.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:59:10 +0000 (13:59 +1000)]
Set PinPending in flush_data_to_inode.

We should always have this set when we pin a block:
it keeps the block pinned until it is dirtied.
Without it, as lafs_pin_block does a refile at the end, that
refile can drop the Pinned state as soon as it is set.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:56:57 +0000 (13:56 +1000)]
iget_my_inode - fix for case of ino == NULL

igrab doesn't handle NULL inodes, so we must.
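A sketch of the NULL check, using a stripped-down struct inode and a stand-in igrab() that, like the real one, dereferences its argument without checking it. The signature of iget_my_inode shown here is an assumption, not the LaFS one.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a VFS inode: only a reference count. */
struct inode {
	int i_count;
};

/* Stand-in for igrab(): assumes a valid, non-NULL inode. */
static struct inode *igrab(struct inode *inode)
{
	assert(inode != NULL);	/* the caller must guarantee this */
	inode->i_count++;
	return inode;
}

/* Handle NULL ourselves before taking a reference. */
static struct inode *iget_my_inode(struct inode *ino)
{
	if (ino == NULL)
		return NULL;
	return igrab(ino);
}
```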

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 13 Aug 2010 03:55:46 +0000 (13:55 +1000)]
lafs_write_end: set new file size correctly.

We were setting the size to the start of the write, not the end!!
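The corrected rule as a hypothetical helper: the new size is the end of the write (pos + copied), applied only when that extends the file. This mirrors the size check in the kernel's generic_write_end(), but the helper name and types here are illustrative.

```c
#include <assert.h>

/* Size of the file after copying `copied` bytes at offset `pos`.
 * Use the END of the write, and never shrink the file. */
static long long size_after_write(long long i_size, long long pos,
				  unsigned int copied)
{
	long long end = pos + copied;

	return end > i_size ? end : i_size;
}
```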

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 11 Aug 2010 23:42:56 +0000 (09:42 +1000)]
Discard filesys field from lafs_inode

i_sb can be used just as well.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 11 Aug 2010 22:57:55 +0000 (08:57 +1000)]
Change filesys arg of lafs_new_inode to struct super_block

It is more direct in most cases to use a super_block rather
than a filesys inode.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 10 Aug 2010 05:31:12 +0000 (15:31 +1000)]
Make lafs_new_inode work when given an explicit inode number.

In this case imni->mb isn't set, so we have to cope with that.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 10 Aug 2010 05:16:03 +0000 (15:16 +1000)]
Choose_free_inum: never return number below 16

They are for internal use.
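A sketch of a free-inode search that never hands out a reserved number. The bitmap layout (one bit per inode, least significant bit first within each byte) and the function signature are assumptions; only the threshold of 16 comes from the commit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Inode numbers below this are reserved for internal use. */
#define FIRST_USER_INUM 16

/* Return the first clear bit at or above FIRST_USER_INUM in a bitmap
 * of nbits bits, or -1 if every such inode number is in use. */
static long choose_free_inum(const unsigned char *map, size_t nbits)
{
	for (size_t i = FIRST_USER_INUM; i < nbits; i++) {
		if (!(map[i / CHAR_BIT] & (1u << (i % CHAR_BIT))))
			return (long)i;
	}
	return -1;
}
```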

Also fix a missing B_PinPending setting.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 11:17:11 +0000 (21:17 +1000)]
Add 'filesys' arg to lafs_new_inode

This allows it to be called with dir == NULL - when creating an inode
that isn't in a directory.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 11:06:34 +0000 (21:06 +1000)]
Make dir arg to lafs_new_inode optional.

After all, some inodes will be created without a directory (root and
other special inodes).

Make inodbp optional too.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:43:15 +0000 (20:43 +1000)]
Revise handling of filesystem inconsistency: nlink == 0

Our handling wasn't really correct, and made the less-safe
assumption.
So change it to simply increment the linkcount.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:26:25 +0000 (20:26 +1000)]
checkpoint must wait for both dblock and iblock of root to change phase.

Only waiting for iblock isn't enough - dblock might still be in the
old phase, which gets rather confusing.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:24:20 +0000 (20:24 +1000)]
Unpin data blocks from previous phase before allowing them to be dirty.

While checkpointing will unpin PinPending blocks, it might not
manage to do it before the block gets Dirtied again.
So before we pin the block, which is a required precursor to dirtying
it, unpin it from the previous phase.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:20:16 +0000 (20:20 +1000)]
Be more careful about waking cleaner in cluster_end_io.

If done was set as well as wake, we didn't wake the cleaner
so *FlushNeeded wouldn't necessarily be effective.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 10:17:09 +0000 (20:17 +1000)]
Don't release an orphan just because an inode cannot be found.

This is over-reacting.  We could be between last_iput and setting
I_Delete for example, so orphan_release would be premature and wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 02:41:42 +0000 (12:41 +1000)]
README update

NeilBrown [Mon, 9 Aug 2010 02:34:47 +0000 (12:34 +1000)]
Use async erase_block in inode orphan handling

otherwise we could deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 9 Aug 2010 02:06:26 +0000 (12:06 +1000)]
Add tracing for when we actually wait for writeback.

This helps track deadlock bugs, just like the similar code
in lafs_iolock_block.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:53:23 +0000 (13:53 +1000)]
rcu locking protection for ->my_inode

We use rcu to free inodes, and use rcu locking to protect
access to ->my_inode.

Part of this required that once I_Deleting is set it stays set,
so remove the pointless clearing of it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:38:06 +0000 (13:38 +1000)]
Use lafs_iget_fs rather than multiple get_blocks in orphan lookup.

When compacting the orphan table, and so changing the orphan
slot for a block, use lafs_iget_fs to help find the orphan block.
This avoids allocating blocks if the inodes exist (which they
should).

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 03:01:10 +0000 (13:01 +1000)]
hold ref to inode for directory orphans.

When a directory block is an orphan, make sure we hold
a reference on the inode so it cannot disappear on us.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 02:46:48 +0000 (12:46 +1000)]
Hold ref on inode while truncating.

If I_Trunc is set and I_Deleting is not, then we hold a
reference to the inode and must drop it when clearing I_Trunc.

This ensures that delete_inode won't get called while truncate
is happening, and the inode won't otherwise disappear.
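A sketch of the reference rule above. The struct and helper names are hypothetical; only the I_Trunc and I_Deleting flags come from the commit. It also assumes I_Deleting does not change while I_Trunc is set, so the same test decides both when to take and when to drop the reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode model: a refcount plus the two flags. */
struct lafs_inode_model {
	int  refcnt;
	bool i_trunc;
	bool i_deleting;
};

/* Setting I_Trunc on an inode that is not being deleted takes a
 * reference, so the inode cannot go away while truncation runs. */
static void set_trunc(struct lafs_inode_model *li)
{
	assert(!li->i_trunc);
	li->i_trunc = true;
	if (!li->i_deleting)
		li->refcnt++;
}

/* Clearing I_Trunc drops that reference, if one was taken. */
static void clear_trunc(struct lafs_inode_model *li)
{
	assert(li->i_trunc);
	li->i_trunc = false;
	if (!li->i_deleting)
		li->refcnt--;
}
```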

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 02:43:04 +0000 (12:43 +1000)]
Hold ref on inode during orphan handling.

Orphan handling will shortly drop references to the inode controlling
the orphan block.  As run_orphans needs to drop the mutex at the end
it needs to hold another reference too.

If I_Deleting is set, then the db effectively owns a reference,
so no further igrab is needed, nor will it work.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 1 Aug 2010 00:57:01 +0000 (10:57 +1000)]
wait more effectively for truncate to progress.

Whenever we wait for truncate to progress, we hold
i_mutex, so truncate cannot progress.
So if there is a need to wait, we need to call the orphan
handler directly.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 31 Jul 2010 00:00:22 +0000 (10:00 +1000)]
Break linkage between inode and dblock at earliest opportunity.

clear_inode is the first chance to break this linkage,
so do it there.
It is still possible for lafs_iget to get a new inode before
clear_inode has completed, so we need to do the same
test/clear in lafs_iget if b->my_inode is found to be non-NULL.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 12:38:07 +0000 (22:38 +1000)]
Remove some dead code.

This used to play with WritePhase, but as we don't do that
any more, the whole thing can go.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 12:15:06 +0000 (22:15 +1000)]
Don't clear PinPending without a good reason.

Clearing it here is definitely wrong.  If an inode change is waiting for
previous-phase data to be written, we don't want to lose
the PinPending.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:59:08 +0000 (21:59 +1000)]
Make sure we do a cluster-flush when SecondFlushNeeded

Also if FlushNeeded - normally this isn't needed as clhead won't be
empty, but there is room for races to make that not so - this is
safer.

So only test the FlushNeeded flags and CheckpointEnd for cnum==0,
and ignore them for the cleaner clusters.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:56:29 +0000 (21:56 +1000)]
Tidy up choose_free_inum a bit.

The exact semantics of *bp and when refs were held were
a little confusing.

So only set *bp on function-exit while holding a ref on im,
and if it is set on function entry, drop the ref on it and on im.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:52:34 +0000 (21:52 +1000)]
Remove repeated pin_dblock

inode_map_new_pin already pins this, so don't do it again.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:25:32 +0000 (21:25 +1000)]
Special files should appear to have a link-count > 0

otherwise they might get deleted when we put them for
the last time, which would be bad.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:21:09 +0000 (21:21 +1000)]
cleaner: don't iput while still holding a ref to a block.

As the block->inode ref isn't counted, this isn't really safe.
The inode could disappear and the block might not get killed
when the address-space is truncated.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 28 Jul 2010 11:15:44 +0000 (21:15 +1000)]
lafs_get_block: fix mem leak on error path.

Forgot to put the page.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 10:25:50 +0000 (20:25 +1000)]
Hold ref to inode-map inode while allocating inode.

The block doesn't explicitly reference the inode, so we need
to hold a reference as long as we reference a block in the file.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 09:22:48 +0000 (19:22 +1000)]
refcount the prime_sb so fs doesn't disappear.

use prime_sb->s_active to refcount the main fs when a snapshot
or subset is mounted, so the fs doesn't disappear on us.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:07:27 +0000 (18:07 +1000)]
Define filesystem type for sub-fileset filesystems

This also allows a sub-fileset to be created by
mounting an empty perm==0 directory as though it were
a sub-fileset already.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 07:05:55 +0000 (17:05 +1000)]
Improve choice of superblock at mount.

Identify superblock by uuid, and ensure that it is unique
when mounting a new LaFS.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 25 Jul 2010 04:38:53 +0000 (14:38 +1000)]
New mount infrastructure for snapshot.

Using the new s_sb_info structure, we add a snapshot number
so we can uniquely identify a snapshot from the superblock and
the 'sget' can be used to find an existing or new superblock.

If it is new, set it up properly as before.

No need to fiddle with 'primary_sb' - we have a ref into it from the
path lookup so it cannot go away, and it shouldn't really matter if it
does.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:36:47 +0000 (18:36 +1000)]
Change s_fs_info to point to root inode and fs

We create a new data structure containing the 'fs' and the root inode
of a filesystem, and store this in the superblock.
This allows access to that root from iget, which previously was
impossible in general.
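A sketch of such a structure and the inline mapping helpers, using minimal stand-ins for the kernel types. The name sb_key is borrowed from a later commit in this log ("free the sb_key at unmount"), but its fields, and the helper names fs_from_sb and root_from_sb, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types involved. */
struct fs { int blocksize; };
struct inode { unsigned long i_ino; };
struct super_block { void *s_fs_info; };

/* Per-superblock info: the shared fs plus the root inode of this
 * particular filesystem (main fs, snapshot or subset). */
struct sb_key {
	struct fs    *fs;
	struct inode *root;
};

/* Keeping the sb-to-fs mapping in one inline helper means only this
 * function changes if the layout of s_fs_info changes again. */
static inline struct fs *fs_from_sb(const struct super_block *sb)
{
	const struct sb_key *k = sb->s_fs_info;

	return k->fs;
}

static inline struct inode *root_from_sb(const struct super_block *sb)
{
	const struct sb_key *k = sb->s_fs_info;

	return k->root;
}
```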

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 08:27:09 +0000 (18:27 +1000)]
Use inline to map sb to fs

because we are about to make the conversion slightly
more complex

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 05:22:40 +0000 (15:22 +1000)]
Discard per-device superblocks

There is no real value in the per-device superblocks.
Just open the device for exclusive access.

This loses the debatable possibility of just using mount
to add devices to an array - that should be remount anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 04:57:20 +0000 (14:57 +1000)]
Use anon super for prime_sb

This is cleaner.

Also clean up the failure path for lafs_load.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 19 Jul 2010 04:40:09 +0000 (14:40 +1000)]
Store blocksize directly in struct fs

That saves a lot of dereferences through prime_sb

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 17:46:52 +0000 (19:46 +0200)]
Better cleaner flushing

better comment needed

NeilBrown [Sun, 18 Jul 2010 17:17:07 +0000 (19:17 +0200)]
Remove pointless code duplication in refile.

Just get fs from inode once.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 16:44:12 +0000 (18:44 +0200)]
Separate setting of PinPending out

We want PinPending set whenever a transaction might be in progress to
ensure that write_page doesn't flush the block early, and that the
cleaner doesn't clean the block in the middle.

We also want the block to be completely written if a write has already
been scheduled.

So:
  - set PinPending after getting an IOlock and ensuring the block is
    not in writeback.  This is set before the checkpoint lock is
    taken.
  - Once we have the checkpoint lock and call pin_dblock, wait for
    writeout to complete again.  This can only be in writeout
    if the block is being written to the previous phase, and it
    is safe to wait for that inside the checkpoint lock.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 15:08:21 +0000 (17:08 +0200)]
Discard WritePhase and phase wait

They didn't really work - we will achieve the same result
a different way

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 18 Jul 2010 15:03:33 +0000 (17:03 +0200)]
write_begin and sync_page fixes.

1/ write_begin needs to drop the page lock on failure,
  and generally clean up properly.
2/ sync_page does not need to 'get_block' as a pointer is
  readily available - so just use that with appropriate locking.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 16 Jul 2010 09:00:39 +0000 (11:00 +0200)]
Forward port to 2.6.34

Also convert to using kvm for testing - much faster.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 16 Jul 2010 05:56:18 +0000 (07:56 +0200)]
README update

NeilBrown [Thu, 15 Jul 2010 18:44:45 +0000 (20:44 +0200)]
Use write_begin/write_end in place of prepare/commit

As this is the 'new way' and is needed for upgrading the base kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Wed, 14 Jul 2010 10:30:42 +0000 (20:30 +1000)]
Add export operations for NFS export.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 12 Jul 2010 10:56:10 +0000 (20:56 +1000)]
Handle fsync of inodes correctly

Get rid of lafs_write_inode as it doesn't do the right thing.
Instead, create updates for inode changes only when
fsync is called on an inode.  The only other time we
flush out an inode is 'sync()', which does a checkpoint
and so achieves the same without needing updates in clusters.

Signed-off-by: NeilBrown <neilb@suse.de>
Neil Brown [Mon, 12 Jul 2010 09:12:41 +0000 (19:12 +1000)]
Implement readpages

Also allow readpage (and readpages) to make a single
bio rather than lots of small ones.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 05:47:50 +0000 (15:47 +1000)]
Build larger bios when writing to cluster

We don't submit a bio until a new block doesn't fit, or until
we get to the end of a cluster and request a flush.
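The batching rule can be sketched with a counter standing in for a bio. In the kernel, the "does not fit" signal would come from bio_add_page() adding fewer bytes than requested; here a fixed capacity plays that role, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch standing in for a bio. */
struct batch {
	size_t nblocks;    /* blocks in the current batch */
	size_t max;        /* capacity of one batch */
	int    submitted;  /* batches "submitted" so far */
};

static void batch_submit(struct batch *b)
{
	if (b->nblocks == 0)
		return;            /* nothing pending */
	b->submitted++;
	b->nblocks = 0;
}

/* Only submit the current batch when the new block does not fit. */
static void batch_add_block(struct batch *b)
{
	assert(b->max > 0);
	if (b->nblocks == b->max)
		batch_submit(b);
	b->nblocks++;
}

/* At the end of a cluster, flush whatever is pending. */
static void batch_flush(struct batch *b)
{
	batch_submit(b);
}
```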

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 03:26:13 +0000 (13:26 +1000)]
Fix up writeout and flushing.

writepage should never flush.

sync_page should, if any block is dirty

cluster_flush should tell the backing dev to start writing

write_block and related functions don't need or use 'dev' arg.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 12 Jul 2010 02:21:34 +0000 (12:21 +1000)]
Some updates to rules.doc

Too asleep to do much more

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 21:36:37 +0000 (07:36 +1000)]
Remove some FIXME comments that are outdated.

These are no longer relevant.

Also update README with lots of FIXME notes
and fix some white-space issues.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 11 Jul 2010 09:13:47 +0000 (19:13 +1000)]
Fix (Again) handling of new segment for final cluster

There were other things that were being missed when
allocating the final cluster.  So change code to take the
same path and make exceptions only where exceptions are clearly
needed.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 10 Jul 2010 11:42:34 +0000 (21:42 +1000)]
Make sure last segment allocated is properly registered in table

otherwise bad things happen when we try to de-register it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 10 Jul 2010 11:39:06 +0000 (21:39 +1000)]
Better status handling for orphan handlers.

- let them return -ENOMEM resulting in a retry 'soon'.
- let them return -ERESTARTSYS resulting in immediate retry
- general tidy up
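A sketch of how a caller might map those return values to a retry schedule. The delay values and constant names are hypothetical, and treating every other status (including 0) as "finished" is an assumption. ERESTARTSYS is a kernel-internal code (512) not exported to userspace, so the sketch defines it.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512   /* kernel-internal value */
#endif

#define RETRY_NOW   0     /* run the handler again immediately */
#define RETRY_SOON  10    /* hypothetical short delay, e.g. in ms */
#define RETRY_NEVER (-1)  /* handler finished with this orphan */

/* Map an orphan handler's return value to when it should next run. */
static int orphan_retry_delay(int status)
{
	switch (status) {
	case -ERESTARTSYS:
		return RETRY_NOW;
	case -ENOMEM:
		return RETRY_SOON;
	default:
		return RETRY_NEVER;   /* assumption: anything else is done */
	}
}
```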

This fixes a bug where inode_orphan_handle would do part of
the work and not schedule any more.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 23:17:08 +0000 (09:17 +1000)]
Make sure orphans get run promptly.

Whenever we add an orphan to the list, ask the
cleaner thread to have a look at it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 21:34:25 +0000 (07:34 +1000)]
Reinstitute a BUG_ON in checkpoint_unlock_wait and doco it.

When waiting for the checkpoint to pass, we need to have triggered
a checkpoint to start somehow.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 12:02:50 +0000 (22:02 +1000)]
Make sure we retry orphan handling if mutex unavailable.

We cannot arrange for a wakeup when i_mutex is dropped,
so we need to set a short timeout when i_mutex cannot be
claimed.
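A userspace sketch of the trylock-and-timeout pattern, with a pthread mutex standing in for i_mutex. Note the differing conventions: the kernel's mutex_trylock() returns 1 on success, while pthread_mutex_trylock() returns 0 on success. The function name and the 10 ms timeout are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define ORPHAN_RETRY_MS 10   /* hypothetical short timeout */

/* Try to run work that needs `lock`.  Nothing will wake us when the
 * lock is released, so if it is busy, return a short timeout after
 * which the caller should retry.  Returns 0 if the work ran. */
static int try_orphan_work(pthread_mutex_t *lock, int *work_done)
{
	if (pthread_mutex_trylock(lock) != 0)
		return ORPHAN_RETRY_MS;
	(*work_done)++;          /* stand-in for the orphan handling */
	pthread_mutex_unlock(lock);
	return 0;
}
```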

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 10:49:54 +0000 (20:49 +1000)]
Make sure we don't wait forever on writeback

If we ever wait on writeback on a block, we need to cluster_flush
quite promptly or we could wait forever.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 10:39:45 +0000 (20:39 +1000)]
Only increment pending_nxt when we finish a cluster

cluster_reset is called when we reset a cluster, but also
when we reposition to the start of a new segment - which should be
the same cluster.
pending_nxt should only be changed in the first of those cases.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 06:29:39 +0000 (16:29 +1000)]
README update and spelling fixes.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 06:27:43 +0000 (16:27 +1000)]
Ensure we drop async refs on youthblk extra on unmount.

This shouldn't really be a problem, but it seems to be....

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 06:18:56 +0000 (16:18 +1000)]
Retard cleaner more when space is tight.

If there is no space in any 'cleaner segment' to clean to, then
only clean if there are no 'clean' (but not 'free') segments.
As soon as we make a clean segment, we should stop cleaning and
allow a checkpoint to make the clean segment free so maybe more
progress can be made.
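The rule can be written as a small predicate. The function and parameter names are hypothetical, and when there is room in a cleaner segment this sketch simply returns true rather than modelling the rest of the cleaning policy.

```c
#include <assert.h>
#include <stdbool.h>

/* cleaner_space: blocks available in cleaner segments to clean into.
 * clean_segs:    segments that are clean but not yet free (a
 *                checkpoint is needed to make them free). */
static bool cleaner_should_run(unsigned long cleaner_space,
			       unsigned long clean_segs)
{
	/* Room to clean into: the normal policy applies (not modelled). */
	if (cleaner_space > 0)
		return true;
	/* No room: clean only until one clean segment exists, then stop
	 * and let a checkpoint turn it into a free segment. */
	return clean_segs == 0;
}
```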

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 05:29:36 +0000 (15:29 +1000)]
Protect directory updates from being hit by writepage.

We don't want writepage flushing a directory block while
we are updating it, or credits can be lost.

So set PinPending and leave it set the whole time.  This requires
a change in the handling of PinPending in checkpoint.

We get an iolock on the block to set PinPending to make sure
writepage sees it.  This helps make sure we don't change the page
while it is being written.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 03:37:38 +0000 (13:37 +1000)]
Support fully async iget

iget can block if the inode is being initialised or freed
by a different thread.  It is not acceptable for the cleaner
to block in these cases as the other thread may need to trigger
a checkpoint.

So use special match/set functions to ensure we never
block, and use B_Async to check if we need to wake the
cleaner when done with an inode.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 9 Jul 2010 04:36:50 +0000 (14:36 +1000)]
Clean up handling of B_Async

Follow a uniform structure for the functions that set/clear this bit.

Make sure cleaner is always woken if a block becomes available while
this bit is set.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 6 Jul 2010 10:01:47 +0000 (20:01 +1000)]
A not Valid directory orphan block is OK.

If an orphan block in a directory turns out to be not
B_Valid that is OK.  It could be that it got handled an
extra time or something.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 01:24:17 +0000 (11:24 +1000)]
Add iolock_written_async

There are a couple of places where this is open coded.

Do it properly and use B_Async to keep the block around.

Also wake the cleaner thread when a B_Async block finishes writeback.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 01:00:20 +0000 (11:00 +1000)]
Avoid overallocation to the cleaner

Don't give space to the cleaner at the expense of space for
writing new blocks.  That would be greedy.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:57:53 +0000 (10:57 +1000)]
Don't clear clean_reserved when allocating more

If the allocation should fail we will have reduced the
allocation, and there could still be an open cleaner-segment
which would thus be confused.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:56:17 +0000 (10:56 +1000)]
Revise rules for setting EmergencyClean

This rule for clearing is good I think.

Setting is now a bit too early.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:50:17 +0000 (10:50 +1000)]
Fix some more left-shift-overflow issues.

We need to cast before shifting sometimes.
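A minimal illustration of the overflow, with hypothetical helper names and assuming a 32-bit unsigned int as on common platforms. Without the cast, the shift is done in 32-bit arithmetic and the high bits are lost before the result is widened to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Correct: widen first, so the shift happens in 64-bit arithmetic. */
static uint64_t seg_to_addr(uint32_t segnum, unsigned int seg_shift)
{
	assert(seg_shift < 64);
	return (uint64_t)segnum << seg_shift;
}

/* Buggy form, for comparison: the shift wraps at 32 bits and only
 * the truncated value is widened. */
static uint64_t seg_to_addr_buggy(uint32_t segnum, unsigned int seg_shift)
{
	assert(seg_shift < 32);
	return segnum << seg_shift;
}
```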

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:35:25 +0000 (10:35 +1000)]
Don't reset level of InoIdx which has dirty children

If we do, then confusion will happen when those children
get allocated.
This just delays the 'empty' verdict a little.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:13:00 +0000 (10:13 +1000)]
Make the first segment 'active'.

A segment needs to be marked 'active' while we are writing to
it.  Newly allocated segments get that already, but the segment we
start on didn't.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 3 Jul 2010 00:09:58 +0000 (10:09 +1000)]
Fix some issues with special entries in the segment table.

We weren't handling all the special cases properly.
So fix that up and use #defines to make it more readable.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:44:25 +0000 (09:44 +1000)]
Remove lafs_write_super.

We don't really want to do anything in lafs_write_super
as we write the superblock when needed anyway.
However lafs_sync_fs needs to do what lafs_write_super was doing,
at least sometimes.

lafs_sync_fs will now force a checkpoint exactly when s_dirt is
set.  So revise where it is set a little - I think we only want this if
there are dirty inodes to flush, but that needs to be thought about
more when I fix write_inode.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:41:08 +0000 (09:41 +1000)]
Allow index block to be Realloc during truncate.

Any data blocks in Realloc will have been destroyed in
erase_dblock, but there could legitimately be Realloc index blocks
still, so allow them to be handled.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:38:55 +0000 (09:38 +1000)]
Don't set I_Trunc until pages are invalidated and trunc_next is set.

The block could already be subject to orphan handling, as unlink
sets that up before truncation happens.
So make sure not to set I_Trunc until we are really ready for the
orphan-inode truncation handling to happen.
Without this, truncation can race with the cleaner and weird things
can happen.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:29:34 +0000 (09:29 +1000)]
Close segments properly when we get to the end.

Passing -1 to new_segment was just *wrong*.
Do it right, and make sure to close all segments, and
release refcounts, at unmount.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:21:21 +0000 (09:21 +1000)]
Don't remove a block from cleaning list until B_Realloc is set.

This ensures that a race with erase_dblock will either run
into the mutex, or be able to clear Realloc immediately.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:19:10 +0000 (09:19 +1000)]
keep ->cleaning list in order.

As we process blocks from the segment in order, it is best
to keep them in order for later processing.
They will be sorted again when being added to a cluster,
but the more we help here, the better.
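One way to keep such a list ordered, sketched with a hypothetical singly linked list keyed by block address. If blocks always arrive in address order, appending at the tail is enough (with kernel list_heads, list_add_tail() rather than list_add()); this version walks the list so that it stays sorted even if they do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cleaning-list node, keyed by address in the segment. */
struct clean_blk {
	unsigned long addr;
	struct clean_blk *next;
};

/* Insert keeping ascending address order; equal keys keep arrival
 * order, so blocks found in segment order end up appended. */
static void cleaning_insert(struct clean_blk **head, struct clean_blk *b)
{
	struct clean_blk **pp = head;

	while (*pp && (*pp)->addr <= b->addr)
		pp = &(*pp)->next;
	b->next = *pp;
	*pp = b;
}
```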

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:17:18 +0000 (09:17 +1000)]
reset newblocks for each checkpoint

newblocks is the count of new blocks written to the filesystem in this
checkpoint (roughly the amount of work that roll-forward would have to
do).  We use it to trigger new checkpoints.
So we need to reset it after each checkpoint.
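A sketch of the counter's life cycle. The struct, helper names and the threshold are hypothetical; only the idea of counting new blocks and resetting at each checkpoint comes from the commit. Without the reset the count only grows, so every later write would keep requesting another checkpoint.

```c
#include <assert.h>
#include <stdbool.h>

struct ckpt_state {
	unsigned long newblocks;    /* new blocks since last checkpoint */
	unsigned long threshold;    /* assumed trigger level */
	unsigned int  checkpoints;  /* checkpoints performed */
};

/* Account for n new blocks; returns true when a checkpoint is due. */
static bool note_new_blocks(struct ckpt_state *s, unsigned long n)
{
	s->newblocks += n;
	return s->newblocks >= s->threshold;
}

/* Each checkpoint starts a fresh count. */
static void do_checkpoint(struct ckpt_state *s)
{
	s->checkpoints++;
	s->newblocks = 0;
}
```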

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 2 Jul 2010 23:14:41 +0000 (09:14 +1000)]
Handle print_tree of NULL block cleanly.

We don't want a BUG here.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 1 Jul 2010 07:51:35 +0000 (17:51 +1000)]
Make sure the segment being written is never cleaned.

Cleaning the current segment would be a bad idea as its
usage count isn't really representative of anything useful.
So leave it in the table flags as 'active' to avoid it
becoming cleanable, and remove it when the segment is finished
with.
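A sketch of the 'active' flag guarding a segment from the cleaner. The table entry layout, helper names and usage threshold are hypothetical; the commit only says the segment stays flagged 'active' in the table while it is being written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-segment table entry. */
struct seg_entry {
	unsigned long usage;  /* live blocks; not meaningful while writing */
	bool active;          /* segment is currently being written */
};

static void seg_start_writing(struct seg_entry *e) { e->active = true; }
static void seg_finish_writing(struct seg_entry *e) { e->active = false; }

/* A segment is a cleaning candidate only when it is not active and
 * its usage is below the (assumed) threshold. */
static bool seg_cleanable(const struct seg_entry *e, unsigned long thresh)
{
	return !e->active && e->usage < thresh;
}
```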

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 1 Jul 2010 07:13:47 +0000 (17:13 +1000)]
More issues with wc->seg being explicitly unset at certain times.

We need to clear wc->seg at the end of a cleaning segment when we
choose not to add another, and we need to cope correctly when such
a segment is found.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 1 Jul 2010 07:06:02 +0000 (17:06 +1000)]
cluster_reset

split some common code into cluster_reset which should be
called after flushing a cluster or after setting up a
new segment.

This wasn't really happening at all in one case.

Signed-off-by: NeilBrown <neilb@suse.de>