git.neil.brown.name Git - LaFS.git/log
NeilBrown [Thu, 1 Jul 2010 07:13:47 +0000 (17:13 +1000)]
More issues with wc->seg being explicitly unset at certain times.

We need to clear wc->seg at the end of a cleaning segment when we
choose not to add another, and we need to cope correctly when such
a segment is found.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 1 Jul 2010 07:06:02 +0000 (17:06 +1000)]
cluster_reset

Split some common code into cluster_reset, which should be
called after flushing a cluster or after setting up a
new segment.

This wasn't really happening at all in one case.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 1 Jul 2010 06:58:54 +0000 (16:58 +1000)]
Fix lafs_cluster_allocate for a nearly-full cluster.

When we unified the two loops in lafs_cluster_allocate,
we broke handling for a nearly-full cluster.  We need to
require wc->remaining is at least 1 before we even
consider a cluster_insert.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 29 Jun 2010 11:57:33 +0000 (21:57 +1000)]
Properly track segrefs held by active segments.

Each active segment holds a segref.  This is currently tracked in a
slightly haphazard way.

If .dev is >= 0 a segref is held, so new_segment drops it,
then sets .dev to -1, unless a new segment is found.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 29 Jun 2010 11:45:52 +0000 (21:45 +1000)]
Remove call to lafs_io_wake in lafs_cluster_allocate

We only need io_wake when we unlock or clear writeback, and neither of
those happens here, so discard the call, put in a 'break' instead and
turn the (hard to read) do loop into a for loop.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 29 Jun 2010 11:40:57 +0000 (21:40 +1000)]
Rearrange if structure in lafs_cluster_allocate

try to get fewer indentation levels.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 29 Jun 2010 11:34:24 +0000 (21:34 +1000)]
combine two loops in lafs_cluster_allocate

There are two loops which try to cluster_flush to get enough space.
One calls new_segment, the other assumes cluster_flush will do
that, which it might or might not.

Combine these into a single loop, and move the handling of
error from new_segment closer to the call.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 29 Jun 2010 11:24:27 +0000 (21:24 +1000)]
Rearrange code in lafs_cluster_allocate.

There are two loops in lafs_cluster_allocate that I want
to combine into one.
Need to get some stuff out of the way first.

EmptyIndex handling can move up to the other special-case
handling - just change writeback_done to iounlock_block.

Converting the iolock to writeback and getting a cluster reference
can also move up easily.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 22:04:45 +0000 (08:04 +1000)]
Handle write clusters which point to themselves.

This can happen at the end of a cleaner segment, and in
general it is best to be cautious.  So if the next pointer
isn't further along in this segment, don't follow it.

Signed-off-by: NeilBrown <neilb@suse.de>
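
The caution described in "Handle write clusters which point to themselves" can be sketched in C as follows. This is only an illustration, not LaFS code: `next_cluster_is_safe` and all of its parameters are hypothetical names, and the sketch assumes a segment is a contiguous range of block addresses starting at `seg_start`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; names and layout are assumptions, not LaFS's. */
bool next_cluster_is_safe(uint64_t this_addr, uint64_t next_addr,
                          uint64_t seg_start, uint64_t seg_len)
{
    /* The current cluster is assumed to lie inside the segment. */
    assert(this_addr >= seg_start && this_addr - seg_start < seg_len);

    /* A 'next' pointer that points at us, or behind us, would loop. */
    if (next_addr <= this_addr)
        return false;

    /* It must also still be inside the same segment. */
    return next_addr - seg_start < seg_len;
}
```

A reader of the cluster chain would stop walking the segment as soon as this check fails, rather than following a pointer that could send it round in a loop.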
NeilBrown [Mon, 28 Jun 2010 10:27:44 +0000 (20:27 +1000)]
Add EmergencyClean mode

In this mode we are nearly full.
Cleaning just goes for the segment with the most space
even if it is quite new.
Allocation failures return ENOSPC rather than EAGAIN.
We clean even if it doesn't look like it will help much.

The heuristic for switching in and out is rather odd...

Signed-off-by: NeilBrown <neilb@suse.de>
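
The error-selection rule from "Add EmergencyClean mode" might look roughly like the sketch below. It is not taken from LaFS: `struct fs_state`, `space_alloc` and `reserve_blocks` are hypothetical stand-ins, and the idea that EAGAIN means "retry once the cleaner has made progress" is an assumption drawn from the neighbouring commit messages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for filesystem state; not LaFS's real struct. */
struct fs_state {
    uint64_t free_blocks;
    bool emergency_clean;   /* set when the filesystem is nearly full */
};

/* Low-level step: just succeeds or fails, with no opinion on errno. */
bool space_alloc(struct fs_state *fs, uint64_t want)
{
    assert(fs);
    if (want > fs->free_blocks)
        return false;
    fs->free_blocks -= want;
    return true;
}

/* Caller-facing step: choose the error code from context. */
int reserve_blocks(struct fs_state *fs, uint64_t want)
{
    if (space_alloc(fs, want))
        return 0;
    /* Assumed policy: normally ask the caller to retry later; in
     * emergency mode report that the filesystem is full. */
    return fs->emergency_clean ? -ENOSPC : -EAGAIN;
}
```

Keeping the low-level allocator free of errno decisions matches the later-listed commit "Revise error returns for allocation failure", where only the outer functions pick between -ENOSPC and -EAGAIN.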
NeilBrown [Mon, 28 Jun 2010 09:40:32 +0000 (19:40 +1000)]
Reserve space for cleaner segments.

Now that we can reserve space specifically for cleaner
segments, do so and limit the number of cleaned segments
to the available number of cleaner segments.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 08:10:51 +0000 (18:10 +1000)]
Revise space allocation for cleaning.

We prefer to allocate whole segments for cleaning, but can only
do that if there is enough space.
If we cannot allocate whole segments, then just cleaning to the
main segment is perfectly acceptable.

So allow a 'clean_reserved' number which is a number of blocks that
have been reserved for cleaning - normally some number of segments.
The cleaner writes whole segments while this number is big enough,
then gives up so the remainder will go to the main segment and not
create partial clean segments.

CleanSpace now never fails.  The next patch will cause the cleaner
to be more reserved in how much it asks for.

Signed-off-by: NeilBrown <neilb@suse.de>
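
The 'clean_reserved' accounting in "Revise space allocation for cleaning" can be pictured with a small sketch. The function name `take_cleaner_segment` and the parameter `seg_blocks` are hypothetical; only the name `clean_reserved` comes from the commit message, and the real bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Try to take one whole segment's worth of blocks out of the cleaning
 * reservation.  Returns false when the reservation is too small, in
 * which case the caller is assumed to clean into the main segment
 * instead of creating a partial cleaner segment.
 */
bool take_cleaner_segment(uint64_t *clean_reserved, uint64_t seg_blocks)
{
    assert(clean_reserved && seg_blocks > 0);
    if (*clean_reserved < seg_blocks)
        return false;
    *clean_reserved -= seg_blocks;
    return true;
}
```

With a reservation of 2500 blocks and 1000-block segments, two calls succeed and the third fails, leaving 500 blocks to be written through the main segment.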
NeilBrown [Mon, 28 Jun 2010 05:29:44 +0000 (15:29 +1000)]
Report directory size without holes.

Holes in a directory are an implementation detail that does not need
to be exposed in i_size, and doing so is confusing and could leak info
about the hash used.
So when more than one block is used, report size as block size times
number of blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
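
The size rule from "Report directory size without holes" amounts to a one-line calculation, sketched here. `dir_reported_size` and its parameters are hypothetical names, and returning the raw size unchanged for a single-block directory is an assumption, since the commit message only describes the multi-block case.

```c
#include <assert.h>
#include <stdint.h>

/* Size to report in i_size for a directory (illustrative only). */
uint64_t dir_reported_size(uint64_t raw_size, uint64_t nblocks,
                           uint32_t block_size)
{
    assert(block_size > 0);
    /* More than one block in use: hide holes by reporting whole blocks. */
    if (nblocks > 1)
        return nblocks * (uint64_t)block_size;
    /* Assumption: the single-block case is left as it was. */
    return raw_size;
}
```

For example, a directory using 3 blocks of 4096 bytes would report 12288 bytes regardless of how widely the hash scattered those blocks.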
NeilBrown [Mon, 28 Jun 2010 04:51:10 +0000 (14:51 +1000)]
Don't clear PinPending in blocks in the inode map.

We do not enforce exclusive access to these blocks, so it is not safe
to clear PinPending - some other thread might be allocating a nearby
inode and might need PinPending to remain.

It will get cleared at the next checkpoint or when the refcount on the
block hits zero, so there is no pressing need to clear it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 03:33:05 +0000 (13:33 +1000)]
wait for pending truncate in delete_inode

If we truncate then delete, the truncate could be on-going.
Safest to wait for it to finish before deleting (and thus
truncating from 0).
This makes it more consistent with lafs_truncate.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 02:59:03 +0000 (12:59 +1000)]
Remove BUG_ON that is no longer valid.

Hasn't been valid for a while really - see comment.
However it finally triggered, so time to go.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 02:44:18 +0000 (12:44 +1000)]
Move filesystem shutdown from put_super to lafs_release.

It needs to happen before invalidate_inodes, else the cleaner
can corrupt the inode list.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 02:33:18 +0000 (12:33 +1000)]
Fix incorrect de-ref of ->my_inode

my_inode may not be set on a block on writeout - e.g. if it was
just cleaned.
As my_inode does not disappear once set (while a ref is held
on the block) it is safe to simply test it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 02:19:39 +0000 (12:19 +1000)]
Fix races between truncate and cleaner.

Not only do we need to recheck the size after putting
the block on the clean list, we also need to check
for inodes that have been cleared. (type == 0).

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 01:40:25 +0000 (11:40 +1000)]
New bugon for unlink loop.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 28 Jun 2010 01:22:45 +0000 (11:22 +1000)]
Change dirty_inode.  Again.

Just set I_Dirty and obey that when we write_cluster.

If a change to the inode has to happen after the checkpoint,
it will come through setattr which will wait for the block to
be written (in pin_dblock).

Possibly mtime updates will slip through, but we could set S_NOCTIME
and do the updates more transactionally ourselves.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 27 Jun 2010 23:16:24 +0000 (09:16 +1000)]
README update and bug-ons to help catch newly identified problems.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sun, 27 Jun 2010 23:13:17 +0000 (09:13 +1000)]
Get block ref in lafs_add_block_address

It is possible to lose the ref on the parent that we
have before the final use to unlock, so make sure to
hold our own reference.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 04:24:47 +0000 (14:24 +1000)]
Make sure we io_wake when we clear B_Writeback

There was one path where we didn't.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 04:06:20 +0000 (14:06 +1000)]
Add proper locking and refcounting to pin_all_children.

We need private_lock to walk the list,
and we don't want to refile a block that might not have
a reference count.

And only pin Dirty children.  Others don't need it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 03:26:37 +0000 (13:26 +1000)]
Pin SegmentMap blocks when they might need to be dirtied.

This is more in keeping with other practices, and now that SegmentMap
blocks are handled carefully by the cleaner and checkpoint, it is
safe to do this.
They stay pinned until they are no longer referenced.  This may keep
some Credits unavailable but that is not a big cost.

When we pin these we don't hold or need a phase lock, so don't
require it.

Now that we always pin segments when first used (free_get) we don't
need to prealloc in lafs_seg_move.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 03:22:17 +0000 (13:22 +1000)]
Drop ssnum arg from lafs_free_get

It doesn't really make sense - we get a free segment for anything
from any snapshot that needs to be written.

Note that this makes more obvious the fact that snapshots are not
at all well supported yet.  Worry about that later.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 01:38:08 +0000 (11:38 +1000)]
Revise which blocks need N* credits.

I think it is just those that might be phase-flipped.
Let's hope that is right.
I think the others were there due to cleaning issues which
have not been resolved.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 01:26:46 +0000 (11:26 +1000)]
Be Careful about cleaning PinPending blocks.

PinPending blocks must never be written to the cleaner segment
as they might still get dirtied and need to be written in this phase,
but the cleaner will have taken their uninc credit.

So if we need to clean a PinPending block, just mark it dirty and
wait for it to be unpinned or written normally.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Sat, 26 Jun 2010 01:16:10 +0000 (11:16 +1000)]
Change flushing of space-accounting blocks.

Space-accounting blocks need to be flushed very late in the
checkpoint.

We were special casing these, but in an awkward way.

Change it so that these blocks are pinned, but that a checkpoint
doesn't handle them straight away but rather performs a phase_flip
and then queues them for later handling.

This means that we get more consistent behaviour of pinned data blocks
and writepage doesn't need to special-case the flushing of segment
usage blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 23:45:41 +0000 (09:45 +1000)]
Clean up do_checkpoint a bit.

There was an extra nest level that was just for debugging,
so remove it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 23:25:18 +0000 (09:25 +1000)]
Tidy up and re-factor lafs_phase_flip

There is some common code that can be extracted.
That code doesn't need any locking that I can find.

Once that is moved out, the rest can be made a lot neater too.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 23:05:09 +0000 (09:05 +1000)]
Design Thoughts about PinPending and SegmentMap

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 11:01:58 +0000 (21:01 +1000)]
Complete TODO list

Also remove some white-space badness.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 10:57:31 +0000 (20:57 +1000)]
Add some more tracing.

Just stuff that I have thought might be useful recently.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 10:46:38 +0000 (20:46 +1000)]
Add lots of assertions.

Add a number of assertions (BUG_ON) that have been shown to be valuable.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 10:38:30 +0000 (20:38 +1000)]
Add tracing to show orphans at shutdown.

We have had problems with orphans not disappearing.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 10:34:52 +0000 (20:34 +1000)]
Add debug tracing to unlink.

Have had a strange case of unlink failing to find the target
file.  So add lots of tracing in the hope that it will happen
again.
It might be sensitive to the hash chosen.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 10:29:34 +0000 (20:29 +1000)]
Add loop-check to do_checkpoint

If we cannot make adequate progress in do_checkpoint,
as the root inode doesn't seem to be changing phase, report
the status and abort.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 09:21:47 +0000 (19:21 +1000)]
export lafs_dir_print

so dir.c can use it for debugging.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 09:17:29 +0000 (19:17 +1000)]
Add cluster list tracking to print_tree

In print_tree, where we try to print which 'lru' list a block
is on, also check the write-cluster lists, both in preparation,
and waiting for Writeout.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 09:15:32 +0000 (19:15 +1000)]
Add has_ref to help debugging.

It is sometimes helpful to BUG_ON whether a block has a certain
ref or not.
So add "has_ref" which returns -1 if we don't know (debugging
disabled) or 0/1 depending on whether the ref is held.

Signed-off-by: NeilBrown <neilb@suse.de>
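
The three-valued return described in "Add has_ref to help debugging" could be sketched as below. Only the name `has_ref` and its -1/0/1 convention come from the commit message; the `struct block` layout, the `debug_enabled` field and the string-named holders are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 4

/* Hypothetical debug record of who holds references on a block. */
struct block {
    int debug_enabled;              /* 0 when ref tracking is disabled */
    const char *holders[MAX_REFS];  /* names of current ref holders    */
};

/* Returns -1 if unknown, 1 if 'name' holds a ref, 0 if it does not. */
int has_ref(const struct block *b, const char *name)
{
    size_t i;

    assert(b && name);
    if (!b->debug_enabled)
        return -1;
    for (i = 0; i < MAX_REFS; i++)
        if (b->holders[i] && strcmp(b->holders[i], name) == 0)
            return 1;
    return 0;
}
```

The point of the -1 case is that an assertion written as "has_ref(...) == 0 is a bug" stays quiet when debugging is disabled, instead of firing on missing information.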
NeilBrown [Fri, 25 Jun 2010 08:58:46 +0000 (18:58 +1000)]
README update, typos, FIXME comments etc.  No code.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 06:52:07 +0000 (16:52 +1000)]
Remove stray function declarations from lafs.h

One is duplicated.
The other is for a non-existent function.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 07:00:23 +0000 (17:00 +1000)]
Allow writers to block while the cleaner makes a little progress.

We record how much progress is required, and allow writers to wait
for that much progress to happen, at which point a checkpoint
happens.

Also add 'free_segs' similar to 'free_blocks' (and counting blocks)
which counts the number of blocks in free segs, not including
any current segs.  This forces allocators to wait sooner and
may be more appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 06:20:28 +0000 (16:20 +1000)]
Revise error returns for allocation failure.

lafs_prealloc, like lafs_space_alloc, just succeeds or fails, it
doesn't choose the error type.

lafs_cluster_update_pin and lafs_reserve_block need to indicate
either -ENOSPC or -EAGAIN depending on context.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 05:55:47 +0000 (15:55 +1000)]
Add AccountSpace reservation.

This is used when the space is needed for accounting space usage and
so failure implies the filesystem is corrupt.

Also start returning failure for ReserveSpace as we are gearing up
to handle it.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 06:34:50 +0000 (16:34 +1000)]
Make cluster_flush conditional on there being something to write.

Avoid inadvertent empty clusters by only writing a cluster if
there is clearly something to write.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 06:30:18 +0000 (16:30 +1000)]
Improve flushing of 'cleaner' clusters.

There is no need for the cleaner to ever wait for blocks which
have been written.  Once the write has been requested, the block
will not be Realloc any more, and so will not get back onto the
clean_leaf list anyway.

So just flush out the cluster when everything is done.
Earlier flushes will happen when a segment gets full.

Just to be safe, also flush the cleaner cluster before a checkpoint.
If there is anything awaiting flushing, it will be invisible
to the checkpoint process, but will hold other blocks pinned
so the checkpoint will not be able to proceed.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 06:21:06 +0000 (16:21 +1000)]
Relax loop count restriction in truncation.

It is possible to hit the current count in normal operations,
so make it a lot gentler.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 05:59:39 +0000 (15:59 +1000)]
Minor updates to 'go' script.

Trigger a global stack trace if we block at unmount.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Fri, 25 Jun 2010 05:57:52 +0000 (15:57 +1000)]
Clean up final check of leafs lists.

Clearing the Pinned bit just confused the following
printout, so skip that.
And check that clean_leafs is empty too.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 02:34:35 +0000 (12:34 +1000)]
Stop using AOP_WRITEPAGE_ACTIVATE

This is the wrong thing to do.
I need to redirty and unlock the page.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 02:29:57 +0000 (12:29 +1000)]
Don't allow memory flush to write out segusage blocks.

We always write these after a checkpoint, and there is little to be
gained by writing them earlier, and doing so causes their
dirty status to be lost, which is bad.

They should be treated much like PinPending blocks, but they
are not PinPending as they are written later in the checkpoint.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 02:23:50 +0000 (12:23 +1000)]
Fix inode_orphan_handler issues.

1/ a stray ';' caused a while loop not to work.

2/ If the for loop finds a block with a 'primary' reference,
  just incorporating it won't help.  We need to find the last
  block so we know it has no primary reference, so it will
  get unpinned by the lafs_cluster_allocate call, and so
  will remove the primary reference.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 02:14:34 +0000 (12:14 +1000)]
lafs_refile: fix nextparent handling.

It is now possible that 'nextparent' is already a sibling of
the current block (inoidx vs ino data).  So handle that, and
generally clean up the code.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:53:40 +0000 (11:53 +1000)]
Check we haven't allocated too many flags.

Simple compile-time checking that we haven't
allocated too many B_* flags.

Signed-off-by: NeilBrown <neilb@suse.de>
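
A compile-time check of the kind described in "Check we haven't allocated too many flags" can be sketched with a C11 static assertion. The commit message does not say which mechanism LaFS uses (the Linux kernel offers BUILD_BUG_ON for this purpose), and the flag names below are made up; the sketch also assumes the flags live in a single unsigned long.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical flag list; the real B_* names are not shown in the log. */
enum block_flag {
    B_ExampleDirty,
    B_ExamplePinned,
    B_ExampleWriteback,
    B_NUM_FLAGS          /* must stay last: counts the flags above */
};

/* Assumption: all flags are bits in one unsigned long. */
#define FLAG_WORD_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/* The build fails here if there are more flags than available bits. */
_Static_assert(B_NUM_FLAGS <= FLAG_WORD_BITS, "too many B_* flags");

/* Bit mask for a single flag. */
unsigned long flag_mask(enum block_flag f)
{
    return 1UL << f;
}
```

Because the check is evaluated by the compiler, adding one flag too many is caught at build time rather than showing up later as two flags silently sharing a bit.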
NeilBrown [Wed, 23 Jun 2010 01:52:00 +0000 (11:52 +1000)]
lafs_space_alloc - avoid underflow of unsigned numbers

As the numbers in the calc are unsigned, use only additions
to avoid anything going negative.

Signed-off-by: NeilBrown <neilb@suse.de>
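
The underflow hazard named in "lafs_space_alloc - avoid underflow of unsigned numbers" is easy to reproduce in isolation. The two functions below are hypothetical and do not mirror the real calculation; they only contrast a subtraction-based check with an addition-based one, assuming the sums themselves stay well below the 64-bit limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Risky form: if used + reserved exceeds total, the subtraction wraps
 * to a huge unsigned value and the check wrongly succeeds. */
bool can_alloc_subtracting(uint64_t total, uint64_t used,
                           uint64_t reserved, uint64_t want)
{
    return want <= total - used - reserved;
}

/* Safer form: compare sums, so no intermediate value goes below zero. */
bool can_alloc_adding(uint64_t total, uint64_t used,
                      uint64_t reserved, uint64_t want)
{
    return used + reserved + want <= total;
}
```

With total = 100, used = 60 and reserved = 50, the subtracting form accepts a request for 1 block because 100 - 60 - 50 wraps around, while the adding form correctly refuses it.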
NeilBrown [Wed, 23 Jun 2010 01:48:28 +0000 (11:48 +1000)]
Ensure InoIdx block is really empty before erasing it.

We were missing the test for non-empty children.
So move the block of code to after that test.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:46:22 +0000 (11:46 +1000)]
incorporate_internal: remove unnecessary setting of 'buf'

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:44:34 +0000 (11:44 +1000)]
fix leaf_lookup for indirect blocks.

If the space was an exact multiple of 6 bytes, we would not consider
the final entry as a possible 'next'.

Signed-off-by: NeilBrown <neilb@suse.de>
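
The boundary condition in "fix leaf_lookup for indirect blocks" is a classic off-by-one, sketched here on a plain array of 6-byte entries. This is not the LaFS leaf_lookup: `decode48` and `next_used_entry` are hypothetical helpers, and little-endian byte order is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_BYTES 6

/* Decode one 6-byte address (little-endian order is assumed here). */
uint64_t decode48(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = ENTRY_BYTES - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Index of the first non-zero entry at or after 'start', or -1. */
long next_used_entry(const unsigned char *buf, size_t len, size_t start)
{
    size_t i;

    assert(buf != NULL);
    /* '<=' matters: with '<' the final entry would be skipped whenever
     * len is an exact multiple of ENTRY_BYTES. */
    for (i = start; (i + 1) * ENTRY_BYTES <= len; i++)
        if (decode48(buf + i * ENTRY_BYTES) != 0)
            return (long)i;
    return -1;
}
```

With a 12-byte buffer holding exactly two entries, the second entry is still considered; with only 11 bytes it is incomplete and correctly ignored.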
NeilBrown [Wed, 23 Jun 2010 01:41:21 +0000 (11:41 +1000)]
walk_index: fix "found end of addresses" detection.

As phys was being overloaded slightly, this was broken.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:38:43 +0000 (11:38 +1000)]
leaf_lookup: fix mis-handling of addresses before first entry.

If target < addr, then the first entry should be returned
as 'next', not consumed and ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
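
The rule in "leaf_lookup: fix mis-handling of addresses before first entry" can be illustrated with a simplified extent search. The `struct extent` layout, `extent_lookup` and `NO_NEXT` are hypothetical, the on-disk encoding is ignored entirely, and treating physical address 0 as "not mapped" is an assumption for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_NEXT UINT32_MAX

struct extent {
    uint32_t addr;   /* first file address covered */
    uint32_t len;    /* number of blocks covered   */
    uint64_t phys;   /* physical address of 'addr' */
};

/*
 * Look up 'target' in extents sorted by addr.  Returns the physical
 * address, or 0 if unmapped.  On a miss, *next is set to the start of
 * the first extent beyond target (NO_NEXT if there is none); on a hit
 * *next is left as NO_NEXT.
 */
uint64_t extent_lookup(const struct extent *e, size_t n,
                       uint32_t target, uint32_t *next)
{
    size_t i;

    assert(next != NULL);
    *next = NO_NEXT;
    for (i = 0; i < n; i++) {
        if (target < e[i].addr) {
            /* Target lies before this entry: report it as 'next'
             * rather than skipping over it. */
            *next = e[i].addr;
            return 0;
        }
        if (target - e[i].addr < e[i].len)
            return e[i].phys + (target - e[i].addr);
    }
    return 0;
}
```

The first branch is the case the commit message describes: when the target precedes an entry, that entry is the answer to "what comes next", so it must not be consumed and ignored.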
NeilBrown [Wed, 23 Jun 2010 01:36:55 +0000 (11:36 +1000)]
leaf_lookup: simplify.

Make decodeX calls more uniform, and avoid some
unnecessary arithmetic.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:30:31 +0000 (11:30 +1000)]
leaf_lookup - avoid variable shadowing.

Two vars in same scope called 'len'.
The second should really be called 'elen'.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:27:40 +0000 (11:27 +1000)]
When cluster_allocate an EmptyIndex block, clear Dirty flags

Because that is what cluster allocate is supposed to do...

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 23 Jun 2010 01:25:17 +0000 (11:25 +1000)]
Pin inode data blocks when dirtying them.

Two places we were dirtying an inode data block that might
not have been pinned.  Fix them.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 21:01:52 +0000 (07:01 +1000)]
space_alloc is being called badly.

The if/then/else structure in both these cases is just wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 06:08:51 +0000 (16:08 +1000)]
Set EmptyIndex where appropriate.

When an index block becomes empty during incorp, mark it as
Empty.
All that other special handling we had now seems to be over-kill.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 05:42:49 +0000 (15:42 +1000)]
Allow EmptyIndex blocks to be written

They are stored as phys address 0.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 05:36:37 +0000 (15:36 +1000)]
Allow empty index blocks to be loaded.

If physaddr is 0, then just create a clean index block.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 05:32:05 +0000 (15:32 +1000)]
Honour EmptyIndex during index lookup.

If we find an EmptyIndex that isn't first in the parent, we must
choose an earlier block.
We must check EmptyIndex after getting the lock.
If we have to drop a lock to do IO, and the unlocked block has a
->parent pointer, then we need to retry from the top.
'next' needs special care as it could point to an
EmptyIndex block, so it is possible for leafs earlier in the tree to
have higher fileaddr (unlikely but possible).

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 02:35:32 +0000 (12:35 +1000)]
Add EmptyIndex flag.

This signals that an index block is known to be empty and
should normally be ignored.

It may never be set on an InoIdx block.
Normally it stays set once set.  However for the first index block
in a parent, it can be cleared again if any children appear.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 02:32:07 +0000 (12:32 +1000)]
Filter empty block from uninc chain before incorporation.

It is possible (though unusual) for an uninc chain to have
two blocks with the same fileaddr, one that is empty and being ignored,
and one that is newly split off and needs to be incorporated.
We need to detect this possibility after sorting and discard the
empty block so it doesn't confuse further incorporation.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 02:13:45 +0000 (12:13 +1000)]
Delay hashing of index blocks until they are incorporated.

We don't need an index block in the hash table until its
address is in the parent, as until then we will never try a lookup.

And it is good to delay it as it is possible for there to be two
blocks with the same address, one that is empty and thus ignored
mostly, and one that has since split off an earlier child.
While this is unlikely, we don't want that split-off block to
appear in the hash table until both have been incorporated.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 02:05:12 +0000 (12:05 +1000)]
Revert a21596e51b872635c7cb0683a21fff981f5d3716

As index blocks don't change addresses after all, this isn't needed.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 01:50:49 +0000 (11:50 +1000)]
Revert 7cfdec7d8156a8961c3413cca8e92340768c5b97

Undo that format change.  It turned out to be a bad idea.
Index blocks should never change address, it is too confusing
and not needed.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 22 Jun 2010 01:36:33 +0000 (11:36 +1000)]
Revert 1736b5f072c4f89cc641fb0a1991d9c145c411ec

... well, most of it.

Turns out this was a bad idea.  An index block should never
change address, so we can never try to incorp stuff before the start
of an indirect block.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 05:40:40 +0000 (15:40 +1000)]
introduce lafs_iolock_block_async

In several places we want this and open-code it.
Just write the real code.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 04:55:26 +0000 (14:55 +1000)]
allocate_block fixes.

Try to set up slightly new rules....

1/ Adding an address as unincorporated requires just a spinlock,
    the inode private_lock
2/ The block with an address being added is not iolock, but is
    Writebehind
3/ incorporation happens under iolock,  removing the list of
    pending addresses also happens here.  so under iolock
    addresses can be added to list but not removed (unless I'm doing
    the removing).

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 03:40:16 +0000 (13:40 +1000)]
Revise rule for inode data blocks as leafs.

We cannot process an inode data block as a leaf before processing
the InoIdx block.

Previously we would unpin an inode data block if the InoIdx block
should take priority.  But that is problematic.
Instead we simply take the inode data block off the leaf list.

This means we have to put it back on when the InoIdx gets unpinned
or phase-flipped.

At same time, tidy up determination of 'is a leaf' as this is used
both when adding something to a leaf list, and when taking something
off.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 01:32:12 +0000 (11:32 +1000)]
Change lafs_phase_flip to take an indexblock

As lafs_phase_flip is only ever passed an indexblock, make that
explicit in the signature, and remove any tests for B_Index as
they will always be true.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 01:05:15 +0000 (11:05 +1000)]
Better tracking of whether orphan handling is running.

At unmount we need to wait for all orphan handling to
complete.
Just checking the list of orphan blocks is not enough
as it is empty while the handling is actually happening.
So add a state flag to help out.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 00:57:43 +0000 (10:57 +1000)]
prealloc extra space for directory blocks.

We need double space when preallocing for a transaction,
so that includes Directories as well.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 00:56:28 +0000 (10:56 +1000)]
incorporate: don't remove children prematurely.

We cannot just remove a child.  So to get it out of the PrimaryRef
chain, move it to the end of the child list.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 00:50:06 +0000 (10:50 +1000)]
Don't treat leaf-index blocks with children as empty.

They might be now, but they won't be soon.  Probably.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 00:39:00 +0000 (10:39 +1000)]
Don't let writepage spoil a transaction.

If a block is involved in a transaction (e.g. dir update) we
mustn't allow writepage to flush the page until the transaction
completes.

Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Mon, 21 Jun 2010 00:33:44 +0000 (10:33 +1000)]
Ensure that umount gets woken when a checkpoint completes.

We really need to call wake_up when ->checkpointing becomes 0.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoWait for segment-scan to finish before unmount.
NeilBrown [Mon, 21 Jun 2010 00:19:25 +0000 (10:19 +1000)]
Wait for segment-scan to finish before unmount.

It might be best to come up with a way to abort the scan,
but we won't want it to be running when we unmount, so wait
for it to complete and ensure it doesn't restart.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoDisable cleaner earlier at unmount.
NeilBrown [Mon, 21 Jun 2010 00:23:01 +0000 (10:23 +1000)]
Disable cleaner earlier at unmount.

We need to disable the cleaner a little earlier or it can run after we
have waited for it to finish.
So create a new flag.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoTemp fix for delaying youth updates.
NeilBrown [Mon, 21 Jun 2010 00:11:33 +0000 (10:11 +1000)]
Temp fix for delaying youth updates.

When we allocate a new segment during checkpoint we need to
delay the youth block update, possibly until the roll-forward.

This is just a simple hack to avoid the worst of the problem
but we need to properly delay it at some stage.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoSmall tidy up for lafs_seg_ref_block
NeilBrown [Mon, 21 Jun 2010 00:05:10 +0000 (10:05 +1000)]
Small tidy up for lafs_seg_ref_block

Slight doco improvement, make local variables more local etc.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoSegRef fixes.
NeilBrown [Sun, 20 Jun 2010 23:41:51 +0000 (09:41 +1000)]
SegRef fixes.

We mustn't hold a SegRef for blocks which aren't going to be accounted
in any segment usage counts.

This means we should never hold SegRef on the Root block, and if we
decide not to account certain blocks during unmount, as roll-forward
will account them, we should drop SegRef promptly.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoAdd casts to shifts which should change type width.
NeilBrown [Sun, 20 Jun 2010 23:33:18 +0000 (09:33 +1000)]
Add casts to shifts which should change type width.

Sometimes when we left-shift a value it is possible that the
new value will require more bits to represent.
In those cases we first need to cast the value to the appropriately
sized type.
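
As an illustration only, here is a minimal sketch of the problem. The
helper names are hypothetical and not taken from LaFS, and the sketch
assumes a platform where unsigned int is 32 bits wide. Without the
cast the shift is done in 32 bits and the high bits are lost before
the result is widened.

```c
#include <stdint.h>

/* Hypothetical helpers, not LaFS code: turn a block number into a
 * byte offset by shifting left by the block-size bits.
 */

/* Buggy form: the shift happens in the 32-bit type of 'block', so
 * any bits shifted past bit 31 are discarded before the result is
 * converted to 64 bits (assumes 32-bit unsigned int).
 */
uint64_t block_to_byte_narrow(uint32_t block, unsigned int blockbits)
{
	return block << blockbits;
}

/* Fixed form: widen first, then shift, so no bits are lost. */
uint64_t block_to_byte_wide(uint32_t block, unsigned int blockbits)
{
	return (uint64_t)block << blockbits;
}
```

With block 0x00400000 and 12 block-size bits, the narrow form
returns 0 while the wide form returns 0x400000000.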

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agocleaner: when erasing a datablock, cancel any pending cleaning.
NeilBrown [Fri, 18 Jun 2010 12:32:29 +0000 (22:32 +1000)]
cleaner: when erasing a datablock, cancel any pending cleaning.

This requires a bit of locking, but ensures that after erase_dblock,
the block is no longer in use, so truncate orphan handling doesn't
find children that it doesn't expect.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agounmount: clean up waiting for things.
NeilBrown [Fri, 18 Jun 2010 11:58:34 +0000 (21:58 +1000)]
unmount: clean up waiting for things.

 The unmount thread should not run any orphans.  That should be
 left to the cleaner thread.
 It might be useful to wait for the cleaner to finish up.
 An alternate might be to release all pending-for-clean blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoIncorp: improve setting of address after split.
NeilBrown [Fri, 18 Jun 2010 11:47:27 +0000 (21:47 +1000)]
Incorp: improve setting of address after split.

When we split a block, the address of the second half should be the
first address that won't fit in the original block.
Currently it is the first address that didn't fit.  If we end up
adding blocks in reverse address order, this could cause each new
block to require a split.
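
One possible reading of the two choices, as a sketch rather than the
actual incorporation code: the function names and the flat sorted
array are assumptions made for illustration. The first 'keep'
addresses stay in the original block and the rest move to the new one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 'addrs' holds n addresses in ascending order,
 * and the first 'keep' of them stay in the original block.
 */

/* Old choice: the new block starts at the first address that was
 * actually moved out of the original block.
 */
uint32_t split_addr_first_moved(const uint32_t *addrs, int n, int keep)
{
	assert(keep >= 1 && keep < n);
	return addrs[keep];
}

/* New choice: the new block starts just past the last address that
 * stayed, so any later address in the gap belongs to the new block
 * rather than to the already-full original.
 */
uint32_t split_addr_after_kept(const uint32_t *addrs, int n, int keep)
{
	assert(keep >= 1 && keep < n);
	return addrs[keep - 1] + 1;
}
```

With addresses 10, 20, 30, 40 and two kept, the old choice gives 30
and the new choice gives 21.  Under this reading an address such as
25 added later lands in the new block instead of forcing another
split of the original.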

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agomodify: avoid breaking a PrimaryRef chain
NeilBrown [Fri, 18 Jun 2010 11:29:20 +0000 (21:29 +1000)]
modify: avoid breaking a PrimaryRef chain

When we insert a new block into a PrimaryRef chain, we need to take
the new refcnt on the new block (which is now primary for the
following block) rather than on the block from which we split (on
which a primary_ref is already held).

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agosegments: fix array sizes.
NeilBrown [Fri, 18 Jun 2010 11:15:25 +0000 (21:15 +1000)]
segments: fix array sizes.

Heights range from 0 up, so the array must be sized
one larger than the maximum height.  So define
SEG_NUM_HEIGHTS instead of SEG_MAX_HEIGHT, and make it one more.
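
A minimal sketch of the sizing rule, for illustration only: the
maximum height of 8, the struct and the helper are hypothetical, and
only the name SEG_NUM_HEIGHTS comes from this change.

```c
#include <assert.h>

/* Hypothetical maximum; the real value in LaFS may differ. */
enum {
	EXAMPLE_MAX_HEIGHT = 8,
	/* Heights 0..EXAMPLE_MAX_HEIGHT inclusive need one extra slot. */
	SEG_NUM_HEIGHTS = EXAMPLE_MAX_HEIGHT + 1
};

/* Hypothetical per-height counter table. */
struct height_counts {
	unsigned int count[SEG_NUM_HEIGHTS];
};

/* Count one block at the given height.  The largest valid index is
 * SEG_NUM_HEIGHTS - 1, which is the maximum height itself.
 */
void count_height(struct height_counts *hc, int height)
{
	assert(height >= 0 && height < SEG_NUM_HEIGHTS);
	hc->count[height]++;
}
```

Had the array been sized by the maximum height instead, indexing it
with the maximum height would have run one element past the end.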

Signed-off-by: NeilBrown <neilb@suse.de>
15 years agoREADME update
NeilBrown [Sun, 13 Jun 2010 23:16:46 +0000 (09:16 +1000)]
README update