So what to do?
If block is pinned, then dirty it to ensure writeout.
If not, don't. But copy data in any case.
+
+
+4sep2009
+
+ OK, I've decided that I don't like clearing B_Valid when an index
+ block contains no indexes. The final straw was that I seemed
+ to need to initialise the index block when I didn't hold IOLock.
+ That was probably fixable, but I'm sure more problems were coming.
+
+ So: what to do instead?
+ One issue that must be resolved is that an index block can still
+ have valid children even when it become empty.
+ This can happen if we erase blocks from a file, then add them back
+ after a checkpoint, and so in the next phase.
+ The checkpoint writeout could need to show an empty index block,
+ but the next phase will see real addresses.
+ We cannot easily avoid this, so we must handle it.
+ This interact badly with the index lookup algorithm that finds
+ the best index block currently in the parent, and then scans
+ the children. If there is no index block in the parent, we
+ cannot find any children.
+ This could be handled by responding to an empty index block by
+ scanning all children. But that isn't a full solution as if
+ just one index block got erased, it's unincorporated siblings
+ would still be lost.
+ We could treat empty index blocks like orphans. i.e. don't
+ discard them immediately but leave them with possibly real
+ addresses. Then when they have no children we allocate the
+ 0.
+ But we still need to ensure that index blocks off which siblings
+ have been split but not yet incorporated remain present in the
+ tree to mark the place for their siblings.
+ There is another problem. A horizontal split could leave the
+ new block with no addresses and everything in the uninc list.
+ Nothing can be found in there.
+
+ So maybe we need to revise the lookup mechanism.
+ The goal is to find an index block that starts at or before
+ the target and contains an address at or after the target.
+ Then out search can stop.
+ In rare cases.....
+
+7sep2009
+ I thought about this more over the weekend and think I have an answer.
+ We need to treat internal and leaf index blocks somewhat differently.
+
+ An internal index block must never be empty (while unlocked).
+ Any child block which has not had it's address incorporated must be
+ attached (simply in the sibling list) to a block which has been
+ incorporated. This will be the block that it was split off.
+ The uninc block needs to hold a reference so that the primary isn't
+ released.
+ When a 'primary' becomes empty it cannot be discarded, so the
+ addresses in the first dependent index block must be copied
+ across. This is awkward for indirect blocks so they might be
+ allowed to be empty (they aren't internal so don't violate the
+ above).
+ When a horizontal split break a sequence of dependent blocks
+ between two parents, the second parent must be incorporated
+ immediately so that the first block in the second half of the
+ sequence is incorporated.
+ If an internal index block does become empty and it has no
+ dependent blocks to fill from, it must be invalidated immediately.
+ It cannot have any children - even in next phase - as at least one
+ would have to be incorporated and so the block would not be empty.
+ Invaliding involves allocating to address 0.
+ If index lookup finds a block with PhysValid address of 0, it
+ must look to the previous index block. If there was none .... it
+ gets a bit complex.
+
+ Leaf index blocks can become empty, but we try to avoid it.
+ If a leaf has blocks which have been created in the next phase,
+ and others which have been deleted in this phase, it can be empty
+ but still have children. In this case we just treat it as a real
+ index block that doesn't actually have any addresses. We still
+ write it out even though that is a waste of space.
+
+ We have been working on the assumption that every address always
+ has a corresponding leaf index block. It is the leaf with the
+ highest index at or below the target address.
+ However this requires the every internal index block has a child
+ with the same address as the parent.
+ Preserving this requirement when the first child of an internal
+ become empty requires either:
+ - loading the 'next' child and reassigning this to the start
+ - changing the address of the parent to match the first child.
+ The former requires possibly reading a block from storage.
+ The latter only involves modifying blocks that are due to be
+ written out anyway, but makes block look up slightly interesting.
+ When lookup finds an invalid block that is 'first', it needs to
+ start again from the top.
+ When incorporation creates an invalid block that is first, it
+ needs to walk down from the top and any index block at the same
+ address needs to be relocated/rehashed. If the block is
+ incorporated, the incorporated address needs to be updated.
+ So:
+ - flag for unincorporated index blocks which implies a reference
+ on primary
+ - after split, immediately incorporate second block
+ - change lookup to retry when finding invalid block
+ - When internal block becomes empty, either merge with
+ first dependent or invalidate. If first in parent,
+ update address and parent and recurse.
+ Need some 'clever' locking here.
+ Before unlocking the invalidated block, we take i_alloc_sem,
+ then walk up the ->parent tree locking blocks as
+ required.
+ The index lookup, when it finds an invalid block will take
+ i_alloc_sem, then drop it, then start again.
+ Or maybe some other lock than i_alloc_sem...
+ - When leaf becomes empty, invalidate only if it has no children.
+ When internal leaf becomes unpinned, check if empty.
+
+21sep2009
+ That locking doesn't look like it will work, and we can never 'merge
+ with first dependant' as it is not valid to have a index block
+ where the first child is at a different address.
+ And we cannot always change the parent address, particularly if it
+ is zero - increasing it then cannot work.
+ And there is no need to load a block if we are just going to change
+ its start address (not internal index blocks anyway).
+ Let's drop the idea of relocating the parent.
+ If an internal index block becomes empty:
+ If it is last in parent, no loss, just discard
+ If parent would be empty, need to recurse up.
+ If it is not last relocate the next sibling to this location,
+ rehashing it and updating the parent.
+ If a leaf index block becomes empty we cannot just delegate to
+ next as it might be indirect... not a problem if address is
+ stored. But that requires a format change... now might be a
+ good time!
+
+
+ So:
+ If we hold an index block locked and it becomes empty and we choose
+ to invalidate it, we need to ensure that doing so does not
+ break any indexing paths.
+ So we take a separate lock (i_alloc_sem??) and flag the block as invalid
+ by setting physaddr to 0 while PhysValid is set, and unlock the block.
+ Any lookup that finds such a block must take and release i_alloc_sem,
+ and then restart from the top.
+ - If the block was not incorporated, we just remove from sibling list
+ and all is done - the space in implicitly included in
+ previous block.
+ - If the block has a different fileaddr than the parent then update
+ the parent directly, either removing the entry, or changing it to
+ point to the first unincorporated sibling (if there is one).
+ This requires taking the lock on the parent of course. That is
+ why we dropped the lock on the child.
+ Then all done.
+ - If the block has the same address as the parent we need to find
+ a 'next block' to relocate to the start of the parent.
+ It is either the first unincorporated sibling, or the next
+ block in the index block, or nothing, meaning the parent is
+ about to become empty.
+ We lock the parent (still holding i_alloc_sem), and rehash the
+ chosen child. If it doesn't exist, or is not dirty, we need
+ to update the phys address directly in the
+ accordingly, erasing or replacing the first address.
+ Then we need to rehash the index block, but we need to lock
+ the parent for that.
+ So set a 'busy' flag on the block, unlock it, lock parent,
+ rehash, clear busy flag, and repeat.
+ - We can never relocate a block with fileaddr of zero, as the
+ InoIdx block cannot be relocated. So leaf index block 0
+ must never be erased unless the file is empty. So
+
+28sep2009
+ New idea.
+ We store the start address of an indirect block in the block.
+ These means that the meaning of any index block is completely
+ independent of the location of the block, so we can change the location
+ easily and without touching the block.
+ So if a block becomes empty, we simply move the next block back to
+ fill the gap.
+ i.e. when an index block becomes truely empty (i.e. no children)
+ - if it wasn't incorporated, simply remove it
+ - if it was,
+ - if there is a dependent block, rehash it to take my address
+ - if there is a next block that is dirty, rehash it
+ - if there is a next block that is not dirty,
+ update parent to merge my entry with next, and rehash next
+ if it exists
+ - if there is no next block but we are not first, just update
+ parent
+ - if no next block and we are first, parent becomes empty,
+ recurse upwards.