git.neil.brown.name Git - LaFS.git/log

NeilBrown [Sat, 15 Aug 2009 07:09:24 +0000 (17:09 +1000)]

Improve credit handling when flushing a data block into the inode.

Both the credit and icredit need to be moved from the data block
to the inode data block (if needed).

NeilBrown [Sat, 15 Aug 2009 07:09:23 +0000 (17:09 +1000)]

make checkpointing more robust

As long as the root inode hasn't changed phase, we keep
looking for things to do.
If things got confused, this could livelock, which would be bad...

NeilBrown [Sat, 15 Aug 2009 07:09:22 +0000 (17:09 +1000)]

Simplify writeout rules for inode data block.

Previously we did not write an inode data block until the
InoIdx was ready.
This is not good if we need to sync an inode well before checkpoint
runs.
So just write an inode data block when we find it, but ensure not to
send an inode data block during checkpoint until the InoIdx block is
ready.

Note: it is now clear why an inode has two sets of credits, one on the
data block and one on the index block. The first set may be needed to
sync the inode metadata. The second may be needed to update the
indexing information - they are copied across to the data block for
this purpose.

NeilBrown [Sat, 15 Aug 2009 07:09:20 +0000 (17:09 +1000)]

Simplify iolocking in get_flushable

The difference between data and index block is not really supportable,
and we cannot just avoid waiting for some blocks.

But we cannot always do a full iowait, as blocks that have been
allocated to a cluster do not complete until the cluster is written
and we don't want to wait for a cluster to be written, especially as
we are the thread that is supposed to do that.

So create an intermediate iowait which waits for iolock to be dropped
or the block to be placed on a list. Once it is on a list we can be
sure not to lose it.

So we wait while incorporation or truncation happens, but not while
writeout happens.

NeilBrown [Sat, 15 Aug 2009 07:03:19 +0000 (17:03 +1000)]

Avoid races when processing blocks for checkpoint.

When doing a checkpoint we need to be sure that every
flushable block is accounted for.  So we need to be able
to wait for anything outstanding.

Currently a block can be removed from the leaflist by writepage
before it is placed on the cluster list.  During this window
the checkpoint thread cannot see it and so might progress without
waiting for it and so will think it has completed prematurely.

So delay the removal from the leaflist until after we have the
writecluster lock.  This assures that every leaf block will be
either on the leaf list or on the cluster list when checkpoint is
looking for it.

When checkpoint calls cluster_done, this will release any blocks
from the cluster and, if needed, put them back on the leaf list where
they can be found again.

NeilBrown [Sat, 15 Aug 2009 05:52:56 +0000 (15:52 +1000)]

Remove dirtying of InoIdx in place of inode data block.

I don't remember why this was here, but until very recently
the code was wrong so it didn't do the "right" thing anyway.
And it doesn't seem to make sense.

When we dirty a dblock, we really want it to be dirty.
We might then write it and roll-forward will be able to pick
it all up except the index information which is always
calculated from addresses that are actually found.

NeilBrown [Sat, 15 Aug 2009 05:52:49 +0000 (15:52 +1000)]

Don't set Valid when setting Dirty.

While a block must be Valid to be Dirty, it is best to
only set Valid when actually putting data in the block,
and then check that it is Valid when marking it Dirty.
That can catch more bugs.
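
A minimal sketch of that rule, not the LaFS code itself: the flag names
and helper functions below are hypothetical, chosen only to show Valid
being set where data arrives and merely checked where Dirty is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real LaFS names and layout may differ. */
enum { VBLK_VALID = 1 << 0, VBLK_DIRTY = 1 << 1 };

struct vblk {
    unsigned flags;
};

/* Called only at the point where data is actually placed in the block. */
static void vblk_fill(struct vblk *b)
{
    b->flags |= VBLK_VALID;
}

/*
 * Marking a block dirty checks Valid instead of setting it.
 * Returns false (and changes nothing) if the block holds no data,
 * which is the bug this check is meant to expose.
 */
static bool vblk_mark_dirty(struct vblk *b)
{
    if (!(b->flags & VBLK_VALID))
        return false;
    b->flags |= VBLK_DIRTY;
    return true;
}
```

In kernel code the failed check would more likely be a BUG_ON than a
return value; the boolean is used here only to keep the sketch testable.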

NeilBrown [Sat, 15 Aug 2009 05:49:46 +0000 (15:49 +1000)]

inode: Fix problems at inode creation.

- we need to pin the new inode dblock to ensure it gets written
- we don't need to dirty it in inode_map_new_commit as it is already
dirtied in lafs_inode_init.

NeilBrown [Sat, 15 Aug 2009 05:49:42 +0000 (15:49 +1000)]

lafs_allocated_block: don't repeat so much for a new_parent

When we find that incorporation has pushed us to a new parent,
we don't need to repeat so much.... maybe not even very much at all

NeilBrown [Sun, 9 Aug 2009 06:11:08 +0000 (16:11 +1000)]

lafs_refile: only use ->inode pointer when we know it is valid.

->inode may be invalid if block is not pinned, or at least
has a parent. So don't try to find 'fs' from it until we know
we will need it.

NeilBrown [Sun, 9 Aug 2009 05:50:54 +0000 (15:50 +1000)]

Don't insist on having UnincCredits for all Index blocks.

When an Index block has room for new addresses, it does not need to
have an UnincCredit because we know it will not split before more
credits are available.

NeilBrown [Sat, 8 Aug 2009 05:25:07 +0000 (15:25 +1000)]

temp_credits accounting.

Allow - in a totally smp-unsafe way - for credits which
aren't stored anywhere to be accounted when checking totals.

NeilBrown [Sat, 8 Aug 2009 04:56:29 +0000 (14:56 +1000)]

Be sure to remove all credits when releasing a block from a page.

As it has just become unlinked.

NeilBrown [Sat, 8 Aug 2009 04:23:40 +0000 (14:23 +1000)]

Make sure inodes don't get forgotten during cleaning.

During cleaning and other times when inodes might have dirty
index blocks we don't want the inode to be pushed out due to
apparently not being in use.

However there is a difficulty in holding a reference on the inode as
that causes the truncate following a final unlink to be delayed.

So to compromise, whenever the InoIdx block for an inode is pinned,
hold a reference to the inode as long as the link count is non-zero.

Once the link count becomes zero, we drop the extra ref and if this
leads to the inode being deleted, the current delaying of this
deletion (while the dblock is referenced) will keep the inode around
just long enough.

NeilBrown [Wed, 5 Aug 2009 06:38:08 +0000 (16:38 +1000)]

cleaner: hold ref on inode while preparing to clean blocks.

b->inode does not own a reference to the inode so we need to have some
other way to make sure it never goes invalid.
Normal filesystem references are safe as the VFS will truncate pages
before freeing the inode. But cleaner accesses don't benefit from
that.

Once a block is Pinned (e.g. when B_Realloc) it owns a reference up to
the InoIdx and so the dblock is also referenced. This ensures that
the inode won't go away even on destroy_inode.
However we hold a block for a short period before it is Pinned, so we
must hold a reference on the inode for that period as well.

NeilBrown [Wed, 5 Aug 2009 05:23:24 +0000 (15:23 +1000)]

cleaner: avoid blocks that are beyond the end of file.

This is a fairly half-hearted attempt which is more to
leave a reminder to get it right later.

NeilBrown [Wed, 5 Aug 2009 05:13:24 +0000 (15:13 +1000)]

Make sure cleaner doesn't start up after the FinalCheckpoint

We need to leave the cleaner thread running for checkpoint processing,
but don't want any real cleaning happening after the FinalCheckpoint.
So check and don't start anything new.

NeilBrown [Mon, 3 Aug 2009 01:28:37 +0000 (11:28 +1000)]

Don't use B_Credit to set B_Realloc

We need to keep B_Credit to possibly set B_Dirty (if a B_Realloc
block gets dirtied while being written to the 'clean' cluster we
still need to write it to a normal cluster).
So find B_Realloc from elsewhere, using lafs_space_alloc if needed,
or as a last resort setting B_Dirty.

NeilBrown [Mon, 3 Aug 2009 00:36:59 +0000 (10:36 +1000)]

Don't clear B_Realloc when setting B_Dirty

If the block has not yet been allocated to a cluster, then
B_Realloc will be cleared just before cluster allocation, and the
allocate won't happen.
If the block has already been allocated to a cluster for cleaning,
then we need the credit implied by B_Realloc so we shouldn't clear it.
The block will later be written to a normal cluster on the basis of
B_Dirty, so try to forget this cleaning, and in particular don't call
lafs_allocated_block as that will be a waste.....
Instead call lafs_pin_block to ensure that the block gets written out
in this phase, which is as good as cleaning.

NeilBrown [Sun, 2 Aug 2009 11:09:49 +0000 (21:09 +1000)]

README update and minor cosmetic changes.

NeilBrown [Sun, 2 Aug 2009 11:09:47 +0000 (21:09 +1000)]

Make sure segusage blocks are uptodate in seg_apply.

During roll-forward they can be not-read-yet.

NeilBrown [Sun, 2 Aug 2009 10:11:32 +0000 (20:11 +1000)]

Relax requirement for UnincCredit in lafs_allocated_block

A block that is being deleted and so has a new phys address of
0 does not need an UnincCredit.

NeilBrown [Sun, 2 Aug 2009 10:08:47 +0000 (20:08 +1000)]

clean: don't allow C or F to go negative.

As these are unsigned values, negatives are bad and confusing.
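
As an illustration only (the helper name is hypothetical and this is
not the code from the commit), a subtraction on an unsigned counter
can be clamped at zero so that it never wraps around to a huge value:

```c
#include <assert.h>

/*
 * Subtract n from an unsigned counter, stopping at zero.
 * A plain "value - n" with n > value would wrap modulo 2^N,
 * which is the confusing result the commit message warns about.
 */
static unsigned sub_clamped(unsigned value, unsigned n)
{
    return n > value ? 0u : value - n;
}
```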

NeilBrown [Sun, 2 Aug 2009 09:52:20 +0000 (19:52 +1000)]

Fix use-wrong-variable bug in cluster_insert

I deleted the wrong thing from a list.

NeilBrown [Sun, 2 Aug 2009 09:50:55 +0000 (19:50 +1000)]

release iolock in lafs_phase_flip

The tail end of lafs_phase_flip needs the iolock to have been
dropped, so always drop it somewhere in lafs_phase_flip.

NeilBrown [Sun, 2 Aug 2009 09:47:40 +0000 (19:47 +1000)]

Add tracing to help find iolock deadlocks

i.e. record the place where the iolock was last claimed
on each block.
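
One common way to do this kind of tracing in C is to wrap the lock
call in a macro that passes __FILE__ and __LINE__. The sketch below
assumes a simple try-lock on a hypothetical block structure; the real
iolock and its field names are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block with a lock bit and the last claim site. */
struct tblk {
    int iolocked;
    const char *lock_file;   /* file where the lock was last claimed */
    int lock_line;           /* line where the lock was last claimed */
};

/* Try to claim the lock; on success remember who claimed it. */
static int tblk_trylock_at(struct tblk *b, const char *file, int line)
{
    if (b->iolocked)
        return 0;            /* already held: claim site is unchanged */
    b->iolocked = 1;
    b->lock_file = file;
    b->lock_line = line;
    return 1;
}

/* Callers use the macro so the call site is recorded automatically. */
#define tblk_trylock(b) tblk_trylock_at((b), __FILE__, __LINE__)

/* Dropping the lock keeps the record, so it still names the last claimer. */
static void tblk_unlock(struct tblk *b)
{
    b->iolocked = 0;
}
```

When a deadlock is suspected, printing lock_file and lock_line for the
stuck block shows which call site still owns the lock.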

NeilBrown [Sun, 2 Aug 2009 09:29:52 +0000 (19:29 +1000)]

cluster_flush credit handling.

Combine calls to space_use and space_return as they do the same thing.
And move them up to immediately after the credits have been calculated
to avoid possibility of the credit-counter finding an incorrect count
that is only transient.

NeilBrown [Sun, 2 Aug 2009 09:29:52 +0000 (19:29 +1000)]

Clear uninc_credit when removing a data block from the tree.

We must drop this credit to keep the counts correct.

NeilBrown [Sun, 2 Aug 2009 09:29:52 +0000 (19:29 +1000)]

Improve prepare_checkpoint locking.

prepare_checkpoint currently takes wc[0].lock.
This is presumably to avoid races with ->checkpointing updates.
However it causes a deadlock if any code that holds the checkpoint
lock needs to flush a cluster or make other updates to a cluster,
which can happen.
So use fs->lock to protect ->checkpointing instead.

NeilBrown [Sun, 2 Aug 2009 09:29:52 +0000 (19:29 +1000)]

flush_data_to_inode fixes.

When flushing data to the inode, mark the dblock dirty
rather than the iblock. We aren't allowed to mark the iblock
dirty unless it is pinned, which it might not be....
It must be preallocated as the data is dirty, but the iblock doesn't
get pinned until the data is actually written.
As the inode data block is not pinned, it might not be written in
a particular phase, but that isn't a problem as long as it gets
written some time.

Add a BUG_ON if we ever try to dirty a non-Pinned index block again.
They are a problem because non-pinned InoIdx can lose their dblock
and then they fall out of the tree.

Also fix the space_return - we have already returned for the Dirty
flag.

NeilBrown [Sun, 2 Aug 2009 09:29:52 +0000 (19:29 +1000)]

fix max height for segment skiplist.

The max is defined as 8, but we use 9 !!!
Fix this and make sure it doesn't happen again.
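
A hedged sketch of one way to keep a skiplist height inside its
declared maximum: size the pointer array from a single constant and cap
the height computation with the same constant. Only the value 8 comes
from the message above; the names and the height function are
assumptions for illustration.

```c
#include <assert.h>

#define SKIP_MAX_HEIGHT 8    /* the maximum quoted in the commit message */

/* The array is sized by the same constant that caps the height. */
struct skip_node {
    struct skip_node *next[SKIP_MAX_HEIGHT];
};

/*
 * Derive a node height from random bits: one level, plus one more for
 * each trailing 1 bit, never exceeding SKIP_MAX_HEIGHT so that
 * next[height - 1] is always a valid index.
 */
static int skip_height(unsigned bits)
{
    int h = 1;

    while ((bits & 1u) && h < SKIP_MAX_HEIGHT) {
        h++;
        bits >>= 1;
    }
    return h;
}
```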

NeilBrown [Sun, 2 Aug 2009 09:29:51 +0000 (19:29 +1000)]

count credits for debugging

Add code to count the number of credits in the tree
and make sure it matches the allocated_blocks number.
Call this at some useful places.
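
The check could look roughly like the sketch below, which assumes a
simplified tree of blocks with child and sibling pointers and one
credit count per block. The structure and function names are
hypothetical; the real LaFS block tree keeps several kinds of credit.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified block: a credit count plus first-child/next-sibling links. */
struct cblk {
    int credits;
    struct cblk *child;
    struct cblk *sibling;
};

/* Sum the credits held by a block, its siblings and all descendants. */
static long count_credits(const struct cblk *b)
{
    long total = 0;

    for (; b != NULL; b = b->sibling)
        total += b->credits + count_credits(b->child);
    return total;
}

/* Debug check: the tree total must match the allocated_blocks counter. */
static int credits_consistent(const struct cblk *root, long allocated_blocks)
{
    return count_credits(root) == allocated_blocks;
}
```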

NeilBrown [Sun, 2 Aug 2009 09:29:51 +0000 (19:29 +1000)]

lafs_put_super - don't putref dblock

There is no need for this putref here - normal inode handling
gets it right.

NeilBrown [Sun, 2 Aug 2009 09:29:51 +0000 (19:29 +1000)]

lafs_shrinker fixes.

1/ only clear inode->iblock if the block we are about to
free is the InoIdx block (i.e. is inode->iblock).
2/ Assume that unreffed iblocks are not pinned, they should
never be.

NeilBrown [Sun, 2 Aug 2009 09:29:49 +0000 (19:29 +1000)]

lafs_release_index: free all index blocks when freeing an inode.

Rather than leaving the index blocks on the free list in a
to-be-freed state, actively free them when the inode is freed.

NeilBrown [Sun, 2 Aug 2009 09:29:49 +0000 (19:29 +1000)]

Fix freeing of inodes and ->dblock

The inode and the dblock each have a reference to the other.  Neither
is counted because:
   We don't want the inode to refcnt the dblock as that wastes space.
   We don't want the dblock to refcnt the inode as that stops it from
     being freed.

So when either is freed, it must remove the reference from the other.
To ease locking,  when the inode is freed it converts the reference,
if present, to a counted reference (using the same rule as
lafs_inode_dblock), then flags the inode for destruction and drops
the reference.

When the last reference to a dblock is dropped, it removes
both references and then calls destroy_inode again.

Note that the dblock only exists while the inode exists - as soon
as the inode is destroyed, any dblock that might be around will
quickly get destroyed too, and the inode destruction is delayed until
this point.

NeilBrown [Sun, 2 Aug 2009 09:29:49 +0000 (19:29 +1000)]

Set I_Deleting on root inode during unmount

I don't remember exactly why I need this unfortunately....

NeilBrown [Sun, 2 Aug 2009 09:29:49 +0000 (19:29 +1000)]

getref_locked fixes.

When lafs_dirty_inode chooses to dirty the dblock, because there
is no iblock, we need to getref_locked that dblock because there
will be no implied reference.

And when doing that, don't dereference ->my_inode unless we are sure
it is valid - i.e. that this is a block in an inode file.

NeilBrown [Sun, 2 Aug 2009 09:29:48 +0000 (19:29 +1000)]

Use write_super to start checkpoint and sync_fs to wait for it.

These are always called in the right order and at the right place,
and this gives us a bit more control.

I don't remember why the cluster_flush was there, so I don't know
where to put it..

NeilBrown [Sun, 2 Aug 2009 09:29:48 +0000 (19:29 +1000)]

roll: use add_block_address rather than allocated block.

This avoids messing around with setting 'dirty' flags.

NeilBrown [Sun, 2 Aug 2009 09:29:48 +0000 (19:29 +1000)]

allocated_block: split out part of code for use in phase_flip.

When we flip phase, delayed incorporation needs to include addresses
into the block. But some of the lafs_allocated_block work has
already been done. So split out the rest into a smaller function
for phase_flip to call.

NeilBrown [Sun, 2 Aug 2009 09:29:48 +0000 (19:29 +1000)]

put_super: release orphan and segsum files.

These files need to be released at unmount...

NeilBrown [Sun, 2 Aug 2009 09:29:27 +0000 (19:29 +1000)]

umount fixups - flush in the right place.

We really need to sync the filesystem before the
final checkpoint and other cleanups in lafs_release.
So move them to lafs_put_super so they get done after
the sync in generic_shutdown_super, but before the superblock
is completely destroyed.

NeilBrown [Sun, 2 Aug 2009 09:28:53 +0000 (19:28 +1000)]

refile: when clearing Pinned, remove from lru too

As only pinned blocks should be on leafs lru.

NeilBrown [Sun, 2 Aug 2009 09:28:53 +0000 (19:28 +1000)]

refile: blocks with B_IOLock are not on lru

so don't assume they are when checking refcounts.

NeilBrown [Sun, 2 Aug 2009 09:28:53 +0000 (19:28 +1000)]

cluster_allocate: remove from leafs list if needed.

When lafs_writepage called cluster_allocate, the block could
be on a leafs list. In that case we need to cleanly remove it
from the list as lru is about to be used for a different purpose,
and it doesn't need to be on the list for a while.
