From: NeilBrown
Date: Fri, 25 Jun 2010 23:05:09 +0000 (+1000)
Subject: Design Thoughts about PinPending and SegmentMap
X-Git-Url: http://git.neil.brown.name/?a=commitdiff_plain;h=d4e3f2c2e4f28da7cf29d621e5fd3740dfc47001;p=LaFS.git

Design Thoughts about PinPending and SegmentMap

Signed-off-by: NeilBrown
---

diff --git a/README b/README
index 5b06a17..4ad31ad 100644
--- a/README
+++ b/README
@@ -4790,6 +4790,14 @@ DONE 3/ clean up the various 'scratch' patches discarding any tracing that
 DONE 4/ check in this README file
 DONE 5/ Write rest of the TODO list
+ 5a/ index.c:1982.  Data block with Phys and no UnincCredit
+     It is Dirty but only has *N credits.
+     16/1 ...
+
+ 5b/ phase_flip/pin_all_children/lafs_refile finds refcnt == 0;
+     I guess we should getref/putref.
+
+
 6/ soft lockup in unlink call.
    EIP is at lafs_hash_name+0xa5/0x10f [lafs]
    [] hash_piece+0x18/0x65 [lafs]
@@ -4863,6 +4871,10 @@ Why have I no credits?
 15a/ Find all FIXMEs and add them here.
+15b/ Report directory size less confusingly
+
+15c/ roll-forward should not update index if physaddr hasn't changed (roll_block)
+
 ## Items from 6 jul 2007.
 16/ Update locking.doc
@@ -4953,3 +4965,97 @@ Why have I no credits?
 47/ Write good documentation
 48/ Review all code, improve all comments, remove all bugs.
+
+26June2010
+   Investigating 5a
+
+   Normal sequence is to surrender UnincCredit, then to clear Dirty,
+   then to write.  If anyone re-dirties after Dirty is clear, they
+   will naturally have to add an UnincCredit having reserved space first.
+   However it seems that the Cleaner gets in the way as the block in question
+   has just previously been cleaned, which consumed the UnincCredit
+   Do we need ReallocUnincCredit??  I hope not.
+   We generally need a way to say "I might want to write to this" so cleaner
+   doesn't write it early.
+   For index blocks that is pincnt.  For data it is 'PinPending'.
+   This keeps index blocks off clean_leafs until they are ready, but
+   not data blocks.
+   And in any case, TypeSegmentMap blocks don't get PinPending as they
+   get written *after* the checkpoint.  That is a rather ugly exception.
+   Maybe we make their different handling more explicit.  We put them on
+   a separate list unpinned so the rest of the checkpoint can complete.
+   Then we flush that list?
+   Then PinPending keeps them off the clean_leafs list.
+
+   So to clarify the plan:  If a block is already Pinned to this phase,
+   we can "clean" it by marking it Dirty rather than Realloc.  This is
+   appropriate for blocks that are likely to change soon (as blocks written
+   to the cleaner segment are not likely to change soon).
+   For data blocks we take "PinPending" to say "might change soon".  For
+   index blocks ... we don't know if it is pinned by Realloc or Dirty or
+   PinPending children.  So we set Realloc and wait for any children to
+   be unpinned for whatever reason.  If it is only pinned by Realloc blocks,
+   it will end up on clean_leafs and be processed to the cleaner segment.
+   If it is pinned by anything else it will be found by the checkpoint and
+   processed to the new-data segment.
+
+   So Index blocks always get Realloc, PinPending blocks get Dirty,
+   Other data blocks get Realloc.  Good.
+
+   Must review PinPending usage... always set, then maybe-dirty inside
+   checkpoint lock.  In cases of unlocked usage (inode map) we don't clear
+   PinPending until checkpoint so it has longer exposure to Realloc->Dirty.
+   It is likely to be changing though, so not a big cost.  Even good.
+
+   Could make the distinction later.  PinPending blocks don't go on
+   clean_leafs.  So if they are still realloc at the checkpoint, we Realloc
+   to the new-data segment.  This has the same net effect but is arguably
+   cleaner.  It means that if a realloc block gets pinpending set, it
+   immediately stops being a clean leaf and so is safe.
+   So: just keep PinPending blocks off clean_leafs.  Keep them on phase_leafs.
+   However there is no mechanism for moving things from phase_leafs to clean_leafs.
+   So maybe they stay on clean_leafs, but when the cleaner gets to them, it
+   dirties them and drops them.... that would work.
+
+   So; if cleaner finds a block (on clean_leafs during cleaner-flush) which is
+   Dirty or PinPending, it makes sure it is Dirty and drops it for phase_leafs
+   to pick up.
+
+   BUT: Does this work for TypeSegmentMap blocks?  They aren't PinPending.
+
+   We could treat them specially in the cleaner.  Or we could set PinPending
+   and pin them to the phase, but treat them differently in checkpoint.
+   If we gathered them onto a separate list, then flush the list after
+   the phase had changed, it might be quite neat.  No more getting writepages
+   to do our work for us.
+   They would need to be re-pinned to the next phase, then written out.
+   Or just unpinned, and let seg_inc re-pin as appropriate... except that
+   seg_inc is too late to pin.  It dirties.  We need to pin when we get
+   SegRef.  We currently reserve but we don't pin.
+   We really do need to phase_flip these segmentmap blocks.  But that requires
+   getting extra credits, and Pinning everything if new credits are not available.
+   And we don't really have a good list of 'everything' that depends on a segment.
+   But seeing the space_alloc never fails for these...
+   So Pin them, and flip them with AccountSpace
+
+   So:
+    - split out common 'flip' code
+    - add 'flip' for data blocks
+    - create list of accounting blocks and flip accounting file blocks onto
+      that list during checkpoint
+      Flush should write that list, not the files.
+    - Get cleaner to ignore pinpending blocks, marking them dirty.
+    - pin segusage blocks while ref on them is held.
+    - writepage no longer needs special case for TypeSegmentMap, just PinPending
+    - lafs_prealloc just tests PinPending
+
+
+   [[aside: quota files seem to be handled like segmentmap files.  Is that
+   right??
+   We only track usage of data blocks based on various 'owners' of the file.
+   We need to know if a block was written in one phase or the next, and
+   only count blocks written/allocated in the one.
+   Data blocks can slip into 'this' phase quite late - any time before the
+   parent is finally incorporated.  So we don't write quota blocks
+   until checkpoint is done.  So yes, they are like SegmentMap
+   ]]
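
The cleaner rule these notes settle on ("Index blocks always get Realloc, PinPending blocks get Dirty, other data blocks get Realloc") can be sketched roughly as below. This is only an illustration of the decision, not LaFS code: the `SK_*` flag names, `struct sk_block` and `sk_clean_mark()` are all hypothetical, and the notes do not say whether Realloc should also be cleared when a block is dirtied, so the sketch leaves it alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the real LaFS flags. */
enum {
    SK_INDEX       = 1,  /* block is an index block              */
    SK_DIRTY       = 2,  /* block will be written by checkpoint  */
    SK_REALLOC     = 4,  /* block is to be relocated by cleaner  */
    SK_PIN_PENDING = 8   /* someone may modify this block soon   */
};

/* Hypothetical stand-in for a block; only the flags matter here. */
struct sk_block {
    unsigned flags;
};

/*
 * Decide how the cleaner marks a block it wants to move.
 * Returns the flag that was applied (SK_DIRTY or SK_REALLOC).
 */
static unsigned sk_clean_mark(struct sk_block *b)
{
    assert(b != NULL);

    /* Index blocks always get Realloc; where they end up depends
     * on what their children are pinned by, which is not modelled. */
    if (b->flags & SK_INDEX) {
        b->flags |= SK_REALLOC;
        return SK_REALLOC;
    }

    /* Data that is Dirty or PinPending is likely to change soon:
     * make sure it is Dirty and leave it for phase_leafs. */
    if (b->flags & (SK_DIRTY | SK_PIN_PENDING)) {
        b->flags |= SK_DIRTY;
        return SK_DIRTY;
    }

    /* Any other data block goes to the cleaner segment. */
    b->flags |= SK_REALLOC;
    return SK_REALLOC;
}
```

Under this sketch a block that has only `SK_PIN_PENDING` set comes back marked Dirty, while an index block with the same flag still comes back marked Realloc, which matches the summary line in the notes.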