git.neil.brown.name Git - LaFS.git/log
LaFS.git
15 years agoHold ref to inode-map inode while allocating inode.
NeilBrown [Sun, 25 Jul 2010 10:25:50 +0000 (20:25 +1000)]
Hold ref to inode-map inode while allocating inode.

The block doesn't explicitly reference the inode, so we need
to hold a reference as long as we reference a block in the file.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agorefcount the prime_sb so fs doesn't disappear.
NeilBrown [Sun, 25 Jul 2010 09:22:48 +0000 (19:22 +1000)]
refcount the prime_sb so fs doesn't disappear.

use prime_sb->s_active to refcount the main fs when a snapshot
or subset is mounted, so the fs doesn't disappear on us.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDefine filesystem type for sub-fileset filesystems
Neil Brown [Mon, 19 Jul 2010 08:07:27 +0000 (18:07 +1000)]
Define filesystem type for sub-fileset filesystems

This also allows a sub-fileset to be created by
mounting an empty perm==0 directory as though it were
a sub-fileset already.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoImprove choice of superblock at mount.
NeilBrown [Sun, 25 Jul 2010 07:05:55 +0000 (17:05 +1000)]
Improve choice of superblock at mount.

Identify superblock by uuid, and ensure that it is unique
when mounting a new LaFS.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoNew mount infrastructure for snapshot.
NeilBrown [Sun, 25 Jul 2010 04:38:53 +0000 (14:38 +1000)]
New mount infrastructure for snapshot.

Using the new s_sb_info structure, we add a snapshot number
so we can uniquely identify a snapshot from the superblock and
the 'sget' can be used to find an existing or new superblock.

If it is new, set it up properly as before.

No need to fiddle with 'primary_sb' - we have a ref into it from the
path lookup so it cannot go away, and it shouldn't really matter if it
does.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoChange s_fs_info to point to root inode and fs
Neil Brown [Mon, 19 Jul 2010 08:36:47 +0000 (18:36 +1000)]
Change s_fs_info to point to root inode and fs

We create a new data structure containing the 'fs' and the root inode
of a filesystem, and store this in the superblock.
This allows easy access to that root in iget, which previously was
impossible in general.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoUse inline to map sb to fs
Neil Brown [Mon, 19 Jul 2010 08:27:09 +0000 (18:27 +1000)]
Use inline to map sb to fs

because we are about to make the conversion slightly
more complex

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDiscard per-device superblocks
Neil Brown [Mon, 19 Jul 2010 05:22:40 +0000 (15:22 +1000)]
Discard per-device superblocks

There is no real value in the per-device superblocks.
Just open the device for exclusive access.

This loses the debatable possibility of just using mount
to add devices to an array - that should be remount anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoUse anon super for prime_sb
Neil Brown [Mon, 19 Jul 2010 04:57:20 +0000 (14:57 +1000)]
Use anon super for prime_sb

This is cleaner.

also clean up failure path for lafs_load

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoStore blocksize directly in struct fs
Neil Brown [Mon, 19 Jul 2010 04:40:09 +0000 (14:40 +1000)]
Store blocksize directly in struct fs

That saves a lot of dereferences through prime_sb

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoBetter cleaner flushing
NeilBrown [Sun, 18 Jul 2010 17:46:52 +0000 (19:46 +0200)]
Better cleaner flushing

better comment needed

15 years agoRemove pointless code duplication in refile.
NeilBrown [Sun, 18 Jul 2010 17:17:07 +0000 (19:17 +0200)]
Remove pointless code duplication in refile.

Just get fs from inode once.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoSeparate setting of PinPending out
NeilBrown [Sun, 18 Jul 2010 16:44:12 +0000 (18:44 +0200)]
Separate setting of PinPending out

We want PinPending set whenever a transaction might be in progress to
ensure that write_page doesn't flush the block early, or that the
cleaner doesn't clean the block in the middle.

We also want the block to be completely written if a write has already
been scheduled.

So:
  - set PinPending - after getting an IOlock and ensure the block is
     not in writeback.   This is set before the checkpoint lock is
     taken.
  - Once we have checkpoint lock and call pin_dblock, wait for
    writeout to complete again.  This can only be in writeout
    if the block is being written to the previous phase, and it
    is safe to wait for that inside the checkpoint lock.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDiscard WritePhase and phase wait
NeilBrown [Sun, 18 Jul 2010 15:08:21 +0000 (17:08 +0200)]
Discard WritePhase and phase wait

They didn't really work - we will achieve the same result
a different way

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agowrite_begin and sync_page fixes.
NeilBrown [Sun, 18 Jul 2010 15:03:33 +0000 (17:03 +0200)]
write_begin and sync_page fixes.

1/ write_begin needs to drop the page lock on failure,
  and generally clean up properly.
2/ sync_page does not need to 'get_block' as a pointer is
  readily available - so just use that with appropriate locking.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoForward port to 2.6.34
NeilBrown [Fri, 16 Jul 2010 09:00:39 +0000 (11:00 +0200)]
Forward port to 2.6.34

Also convert to using kvm for testing - much faster.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoREADME update
NeilBrown [Fri, 16 Jul 2010 05:56:18 +0000 (07:56 +0200)]
README update

15 years agoUse write_begin/write_end in place of prepare/commit
NeilBrown [Thu, 15 Jul 2010 18:44:45 +0000 (20:44 +0200)]
Use write_begin/write_end in place of prepare/commit

As this is the 'new way' and is needed for upgrading the base kernel.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd export operations for NFS export.
Neil Brown [Wed, 14 Jul 2010 10:30:42 +0000 (20:30 +1000)]
Add export operations for NFS export.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoHandle fsync of inodes correctly
Neil Brown [Mon, 12 Jul 2010 10:56:10 +0000 (20:56 +1000)]
Handle fsync of inodes correctly

get rid of lafs_write_inode as it doesn't do the right thing.
Instead, create updates for inode changes only when
fsync is called on an inode.  The only other time we
flush out an inode is 'sync()' which does a checkpoint
so achieves the same with no updates in clusters.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoImplement readpages
Neil Brown [Mon, 12 Jul 2010 09:12:41 +0000 (19:12 +1000)]
Implement readpages

Also allow readpage (and readpages) to make a single
bio rather than lots of small ones.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoBuild larger bios when writing to cluster
NeilBrown [Mon, 12 Jul 2010 05:47:50 +0000 (15:47 +1000)]
Build larger bios when writing to cluster

We don't submit a bio until a new block doesn't fit, or until
we get to the end of a cluster and request a flush.

Signed-off-by: NeilBrown <neilb@suse.de>
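
The batching rule in the commit above can be modelled in a few lines. This is a toy Python sketch, not the LaFS kernel code: `BioBatcher`, `max_blocks` and the method names are all hypothetical, and a real bio also has constraints (such as contiguity) that are ignored here.

```python
class BioBatcher:
    """Toy model: gather blocks into one bio, submit only when forced."""

    def __init__(self, max_blocks):
        self.max_blocks = max_blocks  # hypothetical capacity of one bio
        self.pending = []             # blocks in the bio being built
        self.submitted = []           # bios handed to the device, in order

    def _submit(self):
        # Hand the current bio over, if it holds anything.
        if self.pending:
            self.submitted.append(self.pending)
            self.pending = []

    def add_block(self, block):
        # Submit the current bio only when the new block does not fit.
        if len(self.pending) >= self.max_blocks:
            self._submit()
        self.pending.append(block)

    def flush(self):
        # End of cluster: submit whatever has been gathered so far.
        self._submit()


batcher = BioBatcher(max_blocks=4)
for blk in range(6):
    batcher.add_block(blk)
batcher.flush()
print(batcher.submitted)
```

Six blocks with a capacity of four end up in two bios rather than six, which is the point of the change.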
15 years agoFix up writeout and flushing.
NeilBrown [Mon, 12 Jul 2010 03:26:13 +0000 (13:26 +1000)]
Fix up writeout and flushing.

writepage should never flush.

sync_page should, if any block is dirty

cluster_flush should tell the backing dev to start writing

write_block and related functions don't need or use 'dev' arg.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoSome updates to rules.doc
NeilBrown [Mon, 12 Jul 2010 02:21:34 +0000 (12:21 +1000)]
Some updates to rules.doc

Too asleep to do much more

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRemove some FIXME comments that are outdated.
NeilBrown [Fri, 9 Jul 2010 21:36:37 +0000 (07:36 +1000)]
Remove some FIXME comments that are outdated.

These are no longer relevant.

Also update README with lots of FIXME notes
and fix some white-space issues.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix (Again) handling of new segment for final cluster
NeilBrown [Sun, 11 Jul 2010 09:13:47 +0000 (19:13 +1000)]
Fix (Again) handling of new segment for final cluster

There were other things that were being missed when
allocating the final cluster.  So change code to take the
same path and make exceptions only where exceptions are clearly
needed.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMake sure last segment allocated is properly registered in table
NeilBrown [Sat, 10 Jul 2010 11:42:34 +0000 (21:42 +1000)]
Make sure last segment allocated is properly registered in table

otherwise bad things happen when we try to de-register it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoBetter status handling for orphan handlers.
NeilBrown [Sat, 10 Jul 2010 11:39:06 +0000 (21:39 +1000)]
Better status handling for orphan handlers.

- let them return -ENOMEM resulting in a retry 'soon'.
- let them return -ERESTARTSYS resulting in immediate retry
- general tidy up

This fixes a bug where inode_orphan_handle would do part of
the work and not schedule any more.

Signed-off-by: NeilBrown <neilb@suse.de>
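
The status convention described above could be sketched as follows. This is a toy Python model, not the kernel code: `next_action` and the returned strings are hypothetical, and treating any other status as "done" is an assumption. `ERESTARTSYS` is defined by hand because it is a kernel-internal value that Python's `errno` module does not export.

```python
import errno

ERESTARTSYS = 512  # kernel-internal errno value, absent from Python's errno


def next_action(status):
    """Map an orphan handler's return status to the caller's next step."""
    if status == -errno.ENOMEM:
        return "retry_soon"   # out of memory: try again after a delay
    if status == -ERESTARTSYS:
        return "retry_now"    # interrupted part-way: retry immediately
    return "done"             # assumption: anything else needs no retry


for status in (-errno.ENOMEM, -ERESTARTSYS, 0):
    print(status, next_action(status))
```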
15 years agoMake sure orphans gets run promptly.
NeilBrown [Fri, 9 Jul 2010 23:17:08 +0000 (09:17 +1000)]
Make sure orphans gets run promptly.

Whenever we add an orphan to the list, ask the
cleaner thread to have a look at it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoReinstitute a BUG_ON in checkpoint_unlock_wait and doco it.
NeilBrown [Fri, 9 Jul 2010 21:34:25 +0000 (07:34 +1000)]
Reinstitute a BUG_ON in checkpoint_unlock_wait and doco it.

When waiting for the checkpoint to pass, we need to have triggered
a checkpoint to start somehow.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMake sure we retry orphan handling if mutex unavailable.
NeilBrown [Fri, 9 Jul 2010 12:02:50 +0000 (22:02 +1000)]
Make sure we retry orphan handling if mutex unavailable.

We cannot arrange for a wakeup when i_mutex is dropped,
so we need to set a short timeout when i_mutex cannot be
claimed.

Signed-off-by: NeilBrown <neilb@suse.de>
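
The try-lock-or-come-back-later pattern from the commit above, as a toy Python sketch. A `threading.Lock` stands in for `i_mutex`; `run_orphan_handler` and `RETRY_DELAY` are hypothetical names, and the delay value is arbitrary.

```python
import threading

RETRY_DELAY = 0.01  # hypothetical "short timeout", in seconds


def run_orphan_handler(i_mutex, handler):
    """Run handler under i_mutex, or report how long to wait before retrying.

    Returns None when the handler ran, or RETRY_DELAY when the mutex was
    busy. Nothing will wake us when the mutex is dropped, so the caller
    has to poll again after the delay.
    """
    if not i_mutex.acquire(blocking=False):
        return RETRY_DELAY
    try:
        handler()
    finally:
        i_mutex.release()
    return None


lock = threading.Lock()
print(run_orphan_handler(lock, lambda: None))   # mutex free: handler runs
lock.acquire()
print(run_orphan_handler(lock, lambda: None))   # mutex busy: retry later
lock.release()
```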
15 years agoMake sure we don't wait forever on writeback
NeilBrown [Fri, 9 Jul 2010 10:49:54 +0000 (20:49 +1000)]
Make sure we don't wait forever on writeback

If we ever wait on writeback on a block, we need to cluster_flush
quite promptly or we could wait forever.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoOnly increment pending_nxt when we finish a cluster
NeilBrown [Fri, 9 Jul 2010 10:39:45 +0000 (20:39 +1000)]
Only increment pending_nxt when we finish a cluster

cluster_reset is called when we reset a cluster, but also
when we reposition to the start of a new segment - which should be
the same cluster.
pending_nxt should only be changed in the first of those cases.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoREADME update and spelling fixes.
NeilBrown [Fri, 9 Jul 2010 06:29:39 +0000 (16:29 +1000)]
README update and spelling fixes.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoEnsure we drop async refs on youthblk extra on unmount.
NeilBrown [Fri, 9 Jul 2010 06:27:43 +0000 (16:27 +1000)]
Ensure we drop async refs on youthblk extra on unmount.

This shouldn't really be a problem, but it seems to be....

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRetard cleaner more when space is tight.
NeilBrown [Fri, 9 Jul 2010 06:18:56 +0000 (16:18 +1000)]
Retard cleaner more when space is tight.

If there is no space in any 'cleaner segment' to clean to, then
only clean if there are no 'clean' (but not 'free') segments.
As soon as we make a clean segment, we should stop cleaning and
allow a checkpoint to make the clean segment free so maybe more
progress can be made.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoProtect directory updates from being hit by writepage.
NeilBrown [Fri, 9 Jul 2010 05:29:36 +0000 (15:29 +1000)]
Protect directory updates from being hit by writepage.

We don't want writepage flushing a directory block while
we are updating it, or credits can be lost.

So set PinPending and leave it set the whole time.  This requires
a change in the handling of PinPending in checkpoint.

We get an iolock on the block to set PinPending to make sure
writepage sees it.  This helps make sure we don't change the page
while it is being written.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoSupport fully async iget
NeilBrown [Fri, 9 Jul 2010 03:37:38 +0000 (13:37 +1000)]
Support fully async iget

iget can block if the inode is being initialised or freed
by a different thread.  It is not acceptable for the cleaner
to block in these cases as the other thread may need to trigger
a checkpoint.

So use special match/set functions to ensure we never
block, and use B_Async to check if we need to wake the
cleaner when done with an inode.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoClean up handling of B_Async
NeilBrown [Fri, 9 Jul 2010 04:36:50 +0000 (14:36 +1000)]
Clean up handling of B_Async

Follow a uniform structure for the functions that set/clear this bit.

Make sure cleaner is always woken if a block becomes available while
this bit is set.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoA not Valid directory orphan block is OK.
NeilBrown [Tue, 6 Jul 2010 10:01:47 +0000 (20:01 +1000)]
A not Valid directory orphan block is OK.

If an orphan block in a directory turns out to be not
B_Valid that is OK.  It could be that it got handled an
extra time or something.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd iolock_written_async
NeilBrown [Sat, 3 Jul 2010 01:24:17 +0000 (11:24 +1000)]
Add iolock_written_async

There are a couple of places where this is open coded.

Do it properly and use B_Async to keep the block around.

Also wake the cleaner thread when a B_Async block finishes writeback.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAvoid overallocation to the cleaner
NeilBrown [Sat, 3 Jul 2010 01:00:20 +0000 (11:00 +1000)]
Avoid overallocation to the cleaner

Don't give space to the cleaner at the expense of space for
writing new blocks.  That would be greedy.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDon't clear clean_reserved when allocating more
NeilBrown [Sat, 3 Jul 2010 00:57:53 +0000 (10:57 +1000)]
Don't clear clean_reserved when allocating more

If the allocation should fail we will have reduced the
allocation, and there could still be an open cleaner-segment
which would thus be confused.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRevise rules for setting EmergencyClean
NeilBrown [Sat, 3 Jul 2010 00:56:17 +0000 (10:56 +1000)]
Revise rules for setting EmergencyClean

This rule for clearing is good I think.

Setting is now a bit too early.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix some more left-shift-overflow issues.
NeilBrown [Sat, 3 Jul 2010 00:50:17 +0000 (10:50 +1000)]
Fix some more left-shift-overflow issues.

We need to cast before shifting sometimes.

Signed-off-by: NeilBrown <neilb@suse.de>
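
Why the cast has to come before the shift can be shown with a small model. Python integers do not overflow, so this sketch imitates unsigned 32-bit and 64-bit arithmetic by masking; the helper names and the sample values are hypothetical and not taken from the LaFS source.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def shl_u32(value, shift):
    """Shift as an unsigned 32-bit operand: high bits are lost."""
    return (value << shift) & U32_MASK


def shl_u64(value, shift):
    """Shift after widening to 64 bits: the high bits survive."""
    return (value << shift) & U64_MASK


block_number = 0x00300000   # hypothetical 32-bit quantity
shift = 12                  # hypothetical log2 of the unit size

print(hex(shl_u32(block_number, shift)))   # shifted first, then widened
print(hex(shl_u64(block_number, shift)))   # widened (cast) first, then shifted
```

Shifting in the narrow type discards the top bits before the result is ever stored in a wider variable, which is why the operand must be cast first.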
15 years agoDon't reset level of InoIdx which has dirty children
NeilBrown [Sat, 3 Jul 2010 00:35:25 +0000 (10:35 +1000)]
Don't reset level of InoIdx which has dirty children

If we do, then when those children get allocated confusion
will happen.
This just delays the 'empty' verdict a little.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMake the first segment 'active'.
NeilBrown [Sat, 3 Jul 2010 00:13:00 +0000 (10:13 +1000)]
Make the first segment 'active'.

A segment needs to be marked 'active' while we are writing to
it.  Newly allocated segments get that already, but the segment we
start on didn't.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix some issues with special entries in the segment table.
NeilBrown [Sat, 3 Jul 2010 00:09:58 +0000 (10:09 +1000)]
Fix some issues with special entries in the segment table.

We weren't handling all the special cases properly.
So fix that up and use #defines to make it more readable.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRemove lafs_write_super.
NeilBrown [Fri, 2 Jul 2010 23:44:25 +0000 (09:44 +1000)]
Remove lafs_write_super.

We don't really want to do anything in lafs_write_super
as we write the superblock when needed anyway.
However lafs_sync_fs needs to do what lafs_write_super was doing,
at least sometimes.

lafs_sync_fs will now force a checkpoint exactly when s_dirt is
set.  So revise those settings a little - I think we only want this if
there are dirty inodes to flush, but that needs to be thought about
more when I fix write_inode.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAllow index block to be Realloc during truncate.
NeilBrown [Fri, 2 Jul 2010 23:41:08 +0000 (09:41 +1000)]
Allow index block to be Realloc during truncate.

And data blocks in realloc will have been destroyed in
erase_dblock, but there could legitimately be Realloc index blocks
still, so allow them to be handled.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDon't set I_Trunc until pages are invalidated and trunc_next is set.
NeilBrown [Fri, 2 Jul 2010 23:38:55 +0000 (09:38 +1000)]
Don't set I_Trunc until pages are invalidated and trunc_next is set.

The block could already be subject to orphan handling, as unlink
sets that up before truncation happens.
So make sure not to set I_Trunc until we are really ready for the
orphan-inode truncation handling to happen.
Without this, truncation can race with the cleaner and weird things
can happen.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoClose segments properly when we get to the end.
NeilBrown [Fri, 2 Jul 2010 23:29:34 +0000 (09:29 +1000)]
Close segments properly when we get to the end.

Passing -1 to new_segment was just *wrong*.
Do it right, and make sure to close all segments, and
release refcounts, at unmount.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDon't remove a block from cleaning list until B_Realloc is set.
NeilBrown [Fri, 2 Jul 2010 23:21:21 +0000 (09:21 +1000)]
Don't remove a block from cleaning list until B_Realloc is set.

This ensures that a race with erase_dblock will either run
into the mutex, or be able to clear Realloc immediately.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agokeep ->cleaning list in order.
NeilBrown [Fri, 2 Jul 2010 23:19:10 +0000 (09:19 +1000)]
keep ->cleaning list in order.

As we process blocks from the segment in order, it is best
to keep them in order for later processing.
They will be sorted again when being added to a cluster,
but the more we help here, the better.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoreset newblocks for each checkpoint
NeilBrown [Fri, 2 Jul 2010 23:17:18 +0000 (09:17 +1000)]
reset newblocks for each checkpoint

newblocks is the count of new blocks written to the filesystem in this
checkpoint (roughly the amount of work that roll-forward would have to
do).  We use it to trigger new checkpoints.
So we need to reset it after each checkpoint.

Signed-off-by: NeilBrown <neilb@suse.de>
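
A toy Python model of the counter described above. `CheckpointTrigger`, `threshold` and the method names are hypothetical; the commit only says that newblocks is used to trigger checkpoints and must be reset after each one.

```python
class CheckpointTrigger:
    """Count blocks written since the last checkpoint."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical trigger level
        self.newblocks = 0          # blocks written in this checkpoint

    def block_written(self):
        """Record one new block; return True if a checkpoint is due."""
        self.newblocks += 1
        return self.newblocks >= self.threshold

    def checkpoint_done(self):
        # Without this reset, every later write would ask for a checkpoint.
        self.newblocks = 0


trig = CheckpointTrigger(threshold=3)
print([trig.block_written() for _ in range(3)])
trig.checkpoint_done()
print(trig.block_written())
```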
15 years agoHandle print_tree of NULL block cleanly.
NeilBrown [Fri, 2 Jul 2010 23:14:41 +0000 (09:14 +1000)]
Handle print_tree of NULL block cleanly.

Don't want a BUG here..

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMake sure the segment being written is never cleaned.
NeilBrown [Thu, 1 Jul 2010 07:51:35 +0000 (17:51 +1000)]
Make sure the segment being written is never cleaned.

Cleaning the current segment would be a bad idea as its
usage count isn't really representative of anything useful.
So leave it in the table flags as 'active' to avoid it
becoming cleanable, and remove it when the segment is finished
with.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMore issues with wc->seg being explicitly unset at certain times.
NeilBrown [Thu, 1 Jul 2010 07:13:47 +0000 (17:13 +1000)]
More issues with wc->seg being explicitly unset at certain times.

We need to clear wc->seg at the end of a cleaning segment when we
choose not to add another, and we need to cope correctly when such
a segment is found.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agocluster_reset
NeilBrown [Thu, 1 Jul 2010 07:06:02 +0000 (17:06 +1000)]
cluster_reset

split some common code into cluster_reset which should be
called after flushing a cluster or after setting up a
new segment.

This wasn't really happening at all in one case.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix lafs_cluster_allocate to full cluster.
NeilBrown [Thu, 1 Jul 2010 06:58:54 +0000 (16:58 +1000)]
Fix lafs_cluster_allocate to full cluster.

When we unified the two loops in lafs_cluster_allocate,
we broke handling for a nearly-full cluster.  We need to
require wc->remaining is at least 1 before we even
consider a cluster_insert.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoProperly track segrefs held by active segments.
NeilBrown [Tue, 29 Jun 2010 11:57:33 +0000 (21:57 +1000)]
Properly track segrefs held by active segments.

Each active segment holds a segref.  This is currently done in a
slightly haphazard way.

If .dev is >= 0 a segref is held, so new_segment drops it,
then sets .dev to -1, unless a new segment is found.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRemove call to lafs_io_wake in lafs_cluster_allocate
NeilBrown [Tue, 29 Jun 2010 11:45:52 +0000 (21:45 +1000)]
Remove call to lafs_io_wake in lafs_cluster_allocate

We only need io_wake when we unlock or clear writeback, and neither of
those happen here so discard the call, put in a 'break' instead and
turn the (hard to read) do loop into a for loop.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRearrange if structure in lafs_cluster_allocate
NeilBrown [Tue, 29 Jun 2010 11:40:57 +0000 (21:40 +1000)]
Rearrange if structure in lafs_cluster_allocate

try to get fewer indentation levels.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agocombine two loops in lafs_cluster_allocate
NeilBrown [Tue, 29 Jun 2010 11:34:24 +0000 (21:34 +1000)]
combine two loops in lafs_cluster_allocate

There are two loops which try to cluster_flush to get enough space.
One calls new_segment, the other assumes cluster_flush will do
that, which it might or might not.

Combine these into a single loop, and move the handling of
error from new_segment closer to the call.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRearrange code in lafs_cluster_allocate.
NeilBrown [Tue, 29 Jun 2010 11:24:27 +0000 (21:24 +1000)]
Rearrange code in lafs_cluster_allocate.

There are two loops in lafs_cluster_allocate that I want
to combine into one.
Need to get some stuff out of the way first.

EmptyIndex handling can move up to the other special-case
handling - just change writeback_done to iounlock_block.

Converting the iolock to writeback and getting a cluster reference
can also move up easily.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoHandle write clusters which point to themselves.
NeilBrown [Mon, 28 Jun 2010 22:04:45 +0000 (08:04 +1000)]
Handle write clusters which point to themselves.

This can happen at the end of a cleaner segment, and in
general it is best to be cautious.  So if the next pointer
isn't further along in this segment, don't follow it.

Signed-off-by: NeilBrown <neilb@suse.de>
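
One literal reading of the rule above, as a toy Python check. The function name, the flat integer addresses and the exclusive `seg_end` bound are all assumptions made for illustration; the real code works on LaFS cluster headers.

```python
def should_follow_next(current, next_ptr, seg_end):
    """Follow the next pointer only if it moves forward inside this segment.

    current  -- address of the cluster just read
    next_ptr -- the 'next cluster' address stored in that cluster
    seg_end  -- first address past the end of the segment (exclusive)
    """
    return current < next_ptr < seg_end


print(should_follow_next(100, 100, 200))   # points to itself
print(should_follow_next(100, 120, 200))   # further along in the segment
```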
15 years agoAdd EmergencyClean mode
NeilBrown [Mon, 28 Jun 2010 10:27:44 +0000 (20:27 +1000)]
Add EmergencyClean mode

In this mode we are nearly full.
Cleaning just goes for the segment with the most space
even if it is quite new.
Allocation failures return ENOSPC rather than EAGAIN
We clean even if it doesn't look like it will help much.

The heuristic for switching in and out is rather odd...

Signed-off-by: NeilBrown <neilb@suse.de>
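
The two behaviours named above (segment choice and the errno returned on allocation failure) can be sketched like this. It is a toy Python model: the `Segment` fields, `min_age`, and the idea that normal mode simply skips young segments are assumptions, since the commit does not spell out the normal-mode policy.

```python
import errno
from dataclasses import dataclass


@dataclass
class Segment:
    num: int
    free_blocks: int   # space that cleaning this segment would recover
    age: int           # hypothetical age measure; larger means older


def pick_segment(segments, emergency, min_age=10):
    """Choose a segment to clean, or None if nothing qualifies."""
    if emergency:
        candidates = list(segments)   # even quite new segments qualify
    else:
        # Assumption: normal mode leaves young segments alone.
        candidates = [s for s in segments if s.age >= min_age]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.free_blocks)


def alloc_failure_errno(emergency):
    """Nearly full: fail for real rather than ask the caller to retry."""
    return -errno.ENOSPC if emergency else -errno.EAGAIN


segs = [Segment(1, 50, 100), Segment(2, 90, 1)]
print(pick_segment(segs, emergency=False).num)
print(pick_segment(segs, emergency=True).num)
```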
15 years agoReserve space for cleaner segments.
NeilBrown [Mon, 28 Jun 2010 09:40:32 +0000 (19:40 +1000)]
Reserve space for cleaner segments.

Now that we can reserve space specifically for cleaner
segments, do so and limit the number of cleaned segments
to the available number of cleaner segments.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRevise space allocation for cleaning.
NeilBrown [Mon, 28 Jun 2010 08:10:51 +0000 (18:10 +1000)]
Revise space allocation for cleaning.

We prefer to allocate whole segments for cleaning, but can only
do that if there is enough space.
If we cannot allocate whole segments, then just cleaning to the
main segment is perfectly acceptable.

So allow a 'clean_reserved' number which is a number of blocks that
have been reserved for cleaning - normally some number of segments.
The cleaner writes whole segments while this number is big enough,
then gives up so the remainder will go to the main segment and not
create partial clean segments.

CleanSpace now never fails.  The next patch will cause the cleaner
to be more reserved in how much it asks for.

Signed-off-by: NeilBrown <neilb@suse.de>
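
A toy Python model of the split described above: whole segments go to the cleaner while clean_reserved covers them, and the remainder goes to the main segment. `split_cleaning` and its parameters are hypothetical, and requiring a full segment's worth of blocks before opening a cleaner segment is an assumption drawn from "not create partial clean segments".

```python
def split_cleaning(blocks_to_clean, clean_reserved, segment_size):
    """Return (blocks for cleaner segments, blocks for the main segment)."""
    to_cleaner = 0
    # Use a cleaner segment only while the reservation covers a whole
    # segment and there are enough blocks left to fill one.
    while (clean_reserved >= segment_size
           and blocks_to_clean - to_cleaner >= segment_size):
        to_cleaner += segment_size
        clean_reserved -= segment_size
    return to_cleaner, blocks_to_clean - to_cleaner


print(split_cleaning(2500, 2048, 1024))
print(split_cleaning(2500, 1500, 1024))
```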
15 years agoReport directory size without holes.
NeilBrown [Mon, 28 Jun 2010 05:29:44 +0000 (15:29 +1000)]
Report directory size without holes.

Holes in a directory are an implementation detail that does not need
to be exposed in i_size, and doing so is confusing and could leak info
about the hash used.
So when more than one block is used, report size as block size times
number of blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
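
The size rule above amounts to a one-line calculation. In this toy Python sketch the function and parameter names are hypothetical, and reporting i_size unchanged for a single-block directory is an assumption (the commit only covers the multi-block case).

```python
def reported_dir_size(i_size, blocks_used, block_size):
    """Size to report for a directory, hiding any holes."""
    if blocks_used > 1:
        return block_size * blocks_used
    return i_size   # assumption: a single-block directory is reported as-is


# A sparse directory: i_size reflects a large hash offset, but only
# three blocks are actually in use.
print(reported_dir_size(i_size=7_340_032, blocks_used=3, block_size=4096))
```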
15 years agoDon't clear PinPending in blocks in the inode map.
NeilBrown [Mon, 28 Jun 2010 04:51:10 +0000 (14:51 +1000)]
Don't clear PinPending in blocks in the inode map.

We do not enforce exclusive access to these blocks, so it is not safe
to clear PinPending - some other thread might be allocating a nearby
inode and might need PinPending to remain.

It will get cleared at the next checkpoint or when the refcount on the
block hits zero, so there is no pressing need to clear it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agowait for pending truncate in delete_inode
NeilBrown [Mon, 28 Jun 2010 03:33:05 +0000 (13:33 +1000)]
wait for pending truncate in delete_inode

If we truncate then delete, the truncate could be on-going.
Safest to wait for it to finish before deleting (and thus
truncating from 0).
This makes it more consistent with lafs_truncate.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRemove BUG_ON that is no longer valid.
NeilBrown [Mon, 28 Jun 2010 02:59:03 +0000 (12:59 +1000)]
Remove BUG_ON that is no longer valid.

Hasn't been valid for a while really - see comment.
However it finally triggered, so time to go.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMove filesystem shutdown from put_super to lafs_release.
NeilBrown [Mon, 28 Jun 2010 02:44:18 +0000 (12:44 +1000)]
Move filesystem shutdown from put_super to lafs_release.

It needs to happen before invalidate_inodes, else the cleaner
can corrupt the inode list.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix incorrect de-ref of ->my_inode
NeilBrown [Mon, 28 Jun 2010 02:33:18 +0000 (12:33 +1000)]
Fix incorrect de-ref of ->my_inode

my_inode may not be set on a block on writeout - e.g. if it was
just cleaned.
As my_inode does not disappear once set (while a ref is held
on the block) it is safe to simply test it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoFix races between truncate and cleaner.
NeilBrown [Mon, 28 Jun 2010 02:19:39 +0000 (12:19 +1000)]
Fix races between truncate and cleaner.

Not only do we need to recheck the size after putting
the block on the clean list, we also need to check
for inodes that have been cleared. (type == 0).

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoNew bugon for unlink loop.
NeilBrown [Mon, 28 Jun 2010 01:40:25 +0000 (11:40 +1000)]
New bugon for unlink loop.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoChange dirty_inode. Again.
NeilBrown [Mon, 28 Jun 2010 01:22:45 +0000 (11:22 +1000)]
Change dirty_inode.  Again.

Just set I_Dirty and obey that when we write_cluster.

If a change to the inode has to happen after the checkpoint,
it will come through setattr which will wait for the block to
be written (in pin_dblock).

Possibly mtime updates will slip through, but we could set S_NOCTIME
and do the updates more transactionally ourselves.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoREADME update and bug-ons to help catch newly identified problems.
NeilBrown [Sun, 27 Jun 2010 23:16:24 +0000 (09:16 +1000)]
README update and bug-ons to help catch newly identified problems.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoGet block ref in lafs_add_block_address
NeilBrown [Sun, 27 Jun 2010 23:13:17 +0000 (09:13 +1000)]
Get block ref in lafs_add_block_address

It is possible to lose the ref on the parent that we
have before the final use to unlock, so make sure to
hold our own reference.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoMake sure we io_wait when we clear B_Writeback
NeilBrown [Sat, 26 Jun 2010 04:24:47 +0000 (14:24 +1000)]
Make sure we io_wait when we clear B_Writeback

There was one path where we didn't.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd proper locking and refcounting to pin_all_children.
NeilBrown [Sat, 26 Jun 2010 04:06:20 +0000 (14:06 +1000)]
Add proper locking and refcounting to pin_all_children.

We need private_lock to walk the list,
and we don't want to refile a block that might not have
a reference count.

And only pin Dirty children.  Others don't need it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoPin SegmentMap block when they might need to be dirtied.
NeilBrown [Sat, 26 Jun 2010 03:26:37 +0000 (13:26 +1000)]
Pin SegmentMap block when they might need to be dirtied.

This is more in keeping with other practices, and now that SegmentMap
blocks are handled carefully by the cleaner and checkpoint, it is
safe to do this.
They stay pinned until they are no longer referenced.  This may keep
some Credits unavailable but that is not a big cost.

When we pin these we don't hold or need a phase lock, so don't
require it.

Now that we always pin segments when first used (free_get) we don't
need to prealloc in lafs_seg_move.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDrop ssnum arg from lafs_free_get
NeilBrown [Sat, 26 Jun 2010 03:22:17 +0000 (13:22 +1000)]
Drop ssnum arg from lafs_free_get

It doesn't really make sense - we get a free segment for anything
from any snapshot that needs to be written.

Note that this makes more obvious the fact that snapshots are not
at all well supported yet.  Worry about that later.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoRevise which blocks need N* credits.
NeilBrown [Sat, 26 Jun 2010 01:38:08 +0000 (11:38 +1000)]
Revise which blocks need N* credits.

I think it is just those that might be phase-flipped.
Let's hope that is right.
I think the others were there due to cleaning issues which
have not been resolved.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoBe Careful about cleaning PinPending blocks.
NeilBrown [Sat, 26 Jun 2010 01:26:46 +0000 (11:26 +1000)]
Be Careful about cleaning PinPending blocks.

PinPending blocks must never be written to the cleaner segment
as they might still get dirtied and need to be written in this phase,
but the cleaner will have taken their uninc credit.

So if we need to clean a PinPending block, just mark it dirty and
wait for it to be unpinned or written normally.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoChange flushing of space-accounting blocks.
NeilBrown [Sat, 26 Jun 2010 01:16:10 +0000 (11:16 +1000)]
Change flushing of space-accounting blocks.

Space-accounting blocks need to be flushed very late in the
checkpoint.

We were special casing these, but in an awkward way.

Change it so that these blocks are pinned, but that a checkpoint
doesn't handle them straight away but rather performs a phase_flip
and then queues them for later handling.

This means that we get more consistent behaviour of pinned data blocks
and writepage doesn't need to special-case the flushing of segment
usage blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoClean up do_checkpoint a bit.
NeilBrown [Fri, 25 Jun 2010 23:45:41 +0000 (09:45 +1000)]
Clean up do_checkpoint a bit.

There was an extra nest level that was just for debugging,
so remove it.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoTidy up and re-factor lafs_phase_flip
NeilBrown [Fri, 25 Jun 2010 23:25:18 +0000 (09:25 +1000)]
Tidy up and re-factor lafs_phase_flip

There is some common code that can be extracted.
That code doesn't need any locking that I can find.

Once that is moved out, the rest can be made a lot neater too.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDesign Thoughts about PinPending and SegmentMap
NeilBrown [Fri, 25 Jun 2010 23:05:09 +0000 (09:05 +1000)]
Design Thoughts about PinPending and SegmentMap

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoComplete TODO list
NeilBrown [Fri, 25 Jun 2010 11:01:58 +0000 (21:01 +1000)]
Complete TODO list

Also remove some white-space badness.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd some more tracing.
NeilBrown [Fri, 25 Jun 2010 10:57:31 +0000 (20:57 +1000)]
Add some more tracing.

Just stuff that I have thought might be useful recently.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd lots of assertions.
NeilBrown [Fri, 25 Jun 2010 10:46:38 +0000 (20:46 +1000)]
Add lots of assertions.

Add a number of assertions (BUG_ON) that have shown to be valuable.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd tracing to show orphans at shutdown.
NeilBrown [Fri, 25 Jun 2010 10:38:30 +0000 (20:38 +1000)]
Add tracing to show orphans at shutdown.

We have had problems with orphans not disappearing.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd debug tracing to unlink.
NeilBrown [Fri, 25 Jun 2010 10:34:52 +0000 (20:34 +1000)]
Add debug tracing to unlink.

Have had strange cases of unlink failing to find the target
file.  So add lots of tracing in the hope that it will happen
again.
It might be sensitive to the hash chosen.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd loop-check to do_checkpoint
NeilBrown [Fri, 25 Jun 2010 10:29:34 +0000 (20:29 +1000)]
Add loop-check to do_checkpoint

If we cannot make adequate progress in do_checkpoint,
as the root inode doesn't seem to be changing phase, report
the status and abort.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoexport lafs_dir_print
NeilBrown [Fri, 25 Jun 2010 09:21:47 +0000 (19:21 +1000)]
export lafs_dir_print

so dir.c can use it for debugging.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd cluster list tracking to print_tree
NeilBrown [Fri, 25 Jun 2010 09:17:29 +0000 (19:17 +1000)]
Add cluster list tracking to print_tree

In print tree, where we try to print which 'lru' list a block
is on, also check the write-cluster lists, both in preparation,
and waiting for Writeout.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd has_ref to help debugging.
NeilBrown [Fri, 25 Jun 2010 09:15:32 +0000 (19:15 +1000)]
Add has_ref to help debugging.

It is sometimes helpful to BUG_ON whether a block has a certain
ref or not.
So add "has_ref" which returns -1 if we don't know (debugging
disabled) or  0/1 depending on whether ref is held.

Signed-off-by: NeilBrown <neilb@suse.de>
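
The three-valued return described above can be sketched as a toy Python model. Representing a block's references as a set of names and passing `debug_enabled` as an argument are assumptions for illustration; the kernel helper operates on block structures and a build-time debug option.

```python
def has_ref(block_refs, ref_name, debug_enabled):
    """Return -1 if unknown (debugging disabled), else 1 or 0."""
    if not debug_enabled:
        return -1   # reference names are not tracked without debugging
    return 1 if ref_name in block_refs else 0


refs = {"cleaning", "writeback"}   # hypothetical reference names
print(has_ref(refs, "cleaning", debug_enabled=True))
print(has_ref(refs, "orphan", debug_enabled=True))
print(has_ref(refs, "cleaning", debug_enabled=False))
```

Because "unknown" is -1 rather than 0, an assertion has to compare against the specific value it wants to rule out (for example, fail only when the result is exactly 0), so that it does not fire when debugging is disabled.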