From 98e5f0bad62ccf0a5ed893fc6b06a109476ec318 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 28 Sep 2009 10:16:15 +1000 Subject: [PATCH] README notes for changes to index management. --- README | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 187 insertions(+) diff --git a/README b/README index da4fab5..0b939bb 100644 --- a/README +++ b/README @@ -3671,3 +3671,190 @@ FIXED orphans don't get cleaned up. It seems a 'create' fails and leaves So what to do? If block is pinned, then dirty it to ensure writeout. If not, don't. But copy data in any case. + + +4sep2009 + + OK, I've decided that I don't like clearing B_Valid when an index + block contains no indexes. The final straw was that I seemed + to need to initialise the index block when I didn't hold IOLock. + That was probably fixable, but I'm sure more problems were coming. + + So: what to do instead? + One issue that must be resolved is that an index block can still + have valid children even when it become empty. + This can happen if we erase blocks from a file, then add them back + after a checkpoint, and so in the next phase. + The checkpoint writeout could need to show an empty index block, + but the next phase will see real addresses. + We cannot easily avoid this, so we must handle it. + This interact badly with the index lookup algorithm that finds + the best index block currently in the parent, and then scans + the children. If there is no index block in the parent, we + cannot find any children. + This could be handled by responding to an empty index block by + scanning all children. But that isn't a full solution as if + just one index block got erased, it's unincorporated siblings + would still be lost. + We could treat empty index blocks like orphans. i.e. don't + discard them immediately but leave them with possibly real + addresses. Then when they have no children we allocate the + 0. + But we still need to ensure that index blocks off which siblings + have been split but not yet incorporated remain present in the + tree to mark the place for their siblings. + There is another problem. A horizontal split could leave the + new block with no addresses and everything in the uninc list. + Nothing can be found in there. + + So maybe we need to revise the lookup mechanism. + The goal is to find an index block that starts at or before + the target and contains an address at or after the target. + Then out search can stop. + In rare cases..... + +7sep2009 + I thought about this more over the weekend and think I have an answer. + We need to treat internal and leaf index blocks somewhat differently. + + An internal index block must never be empty (while unlocked). + Any child block which has not had it's address incorporated must be + attached (simply in the sibling list) to a block which has been + incorporated. This will be the block that it was split off. + The uninc block needs to hold a reference so that the primary isn't + released. + When a 'primary' becomes empty it cannot be discarded, so the + addresses in the first dependent index block must be copied + across. This is awkward for indirect blocks so they might be + allowed to be empty (they aren't internal so don't violate the + above). + When a horizontal split break a sequence of dependent blocks + between two parents, the second parent must be incorporated + immediately so that the first block in the second half of the + sequence is incorporated. + If an internal index block does become empty and it has no + dependent blocks to fill from, it must be invalidated immediately. + It cannot have any children - even in next phase - as at least one + would have to be incorporated and so the block would not be empty. + Invaliding involves allocating to address 0. + If index lookup finds a block with PhysValid address of 0, it + must look to the previous index block. If there was none .... it + gets a bit complex. + + Leaf index blocks can become empty, but we try to avoid it. + If a leaf has blocks which have been created in the next phase, + and others which have been deleted in this phase, it can be empty + but still have children. In this case we just treat it as a real + index block that doesn't actually have any addresses. We still + write it out even though that is a waste of space. + + We have been working on the assumption that every address always + has a corresponding leaf index block. It is the leaf with the + highest index at or below the target address. + However this requires the every internal index block has a child + with the same address as the parent. + Preserving this requirement when the first child of an internal + become empty requires either: + - loading the 'next' child and reassigning this to the start + - changing the address of the parent to match the first child. + The former requires possibly reading a block from storage. + The latter only involves modifying blocks that are due to be + written out anyway, but makes block look up slightly interesting. + When lookup finds an invalid block that is 'first', it needs to + start again from the top. + When incorporation creates an invalid block that is first, it + needs to walk down from the top and any index block at the same + address needs to be relocated/rehashed. If the block is + incorporated, the incorporated address needs to be updated. + So: + - flag for unincorporated index blocks which implies a reference + on primary + - after split, immediately incorporate second block + - change lookup to retry when finding invalid block + - When internal block becomes empty, either merge with + first dependent or invalidate. If first in parent, + update address and parent and recurse. + Need some 'clever' locking here. + Before unlocking the invalidated block, we take i_alloc_sem, + then walk up the ->parent tree locking blocks as + required. + The index lookup, when it finds an invalid block will take + i_alloc_sem, then drop it, then start again. + Or maybe some other lock than i_alloc_sem... + - When leaf becomes empty, invalidate only if it has no children. + When internal leaf becomes unpinned, check if empty. + +21sep2009 + That locking doesn't look like it will work, and we can never 'merge + with first dependant' as it is not valid to have a index block + where the first child is at a different address. + And we cannot always change the parent address, particularly if it + is zero - increasing it then cannot work. + And there is no need to load a block if we are just going to change + its start address (not internal index blocks anyway). + Let's drop the idea of relocating the parent. + If an internal index block becomes empty: + If it is last in parent, no loss, just discard + If parent would be empty, need to recurse up. + If it is not last relocate the next sibling to this location, + rehashing it and updating the parent. + If a leaf index block becomes empty we cannot just delegate to + next as it might be indirect... not a problem if address is + stored. But that requires a format change... now might be a + good time! + + + So: + If we hold an index block locked and it becomes empty and we choose + to invalidate it, we need to ensure that doing so does not + break any indexing paths. + So we take a separate lock (i_alloc_sem??) and flag the block as invalid + by setting physaddr to 0 while PhysValid is set, and unlock the block. + Any lookup that finds such a block must take and release i_alloc_sem, + and then restart from the top. + - If the block was not incorporated, we just remove from sibling list + and all is done - the space in implicitly included in + previous block. + - If the block has a different fileaddr than the parent then update + the parent directly, either removing the entry, or changing it to + point to the first unincorporated sibling (if there is one). + This requires taking the lock on the parent of course. That is + why we dropped the lock on the child. + Then all done. + - If the block has the same address as the parent we need to find + a 'next block' to relocate to the start of the parent. + It is either the first unincorporated sibling, or the next + block in the index block, or nothing, meaning the parent is + about to become empty. + We lock the parent (still holding i_alloc_sem), and rehash the + chosen child. If it doesn't exist, or is not dirty, we need + to update the phys address directly in the + accordingly, erasing or replacing the first address. + Then we need to rehash the index block, but we need to lock + the parent for that. + So set a 'busy' flag on the block, unlock it, lock parent, + rehash, clear busy flag, and repeat. + - We can never relocate a block with fileaddr of zero, as the + InoIdx block cannot be relocated. So leaf index block 0 + must never be erased unless the file is empty. So + +28sep2009 + New idea. + We store the start address of an indirect block in the block. + These means that the meaning of any index block is completely + independent of the location of the block, so we can change the location + easily and without touching the block. + So if a block becomes empty, we simply move the next block back to + fill the gap. + i.e. when an index block becomes truely empty (i.e. no children) + - if it wasn't incorporated, simply remove it + - if it was, + - if there is a dependent block, rehash it to take my address + - if there is a next block that is dirty, rehash it + - if there is a next block that is not dirty, + update parent to merge my entry with next, and rehash next + if it exists + - if there is no next block but we are not first, just update + parent + - if no next block and we are first, parent becomes empty, + recurse upwards. -- 2.39.5