From: NeilBrown Date: Sun, 19 Sep 2010 04:42:24 +0000 (+1000) Subject: README update X-Git-Url: http://git.neil.brown.name/?a=commitdiff_plain;h=cb2c3de38be79dd01442cb1a2bbac95cb70123ef;p=LaFS.git README update --- diff --git a/README b/README index 26b1947..95ac591 100644 --- a/README +++ b/README @@ -5004,9 +5004,9 @@ DONE 15h/ orphan deadlock DONE 15i/ separate thread management from 'cleaner' name. -15j/ review rules in getref_locked - and document them +DONE 15j/ review rules in getref_locked - and document them - - fix accesses to iblock +DONE - fix accesses to iblock DONE 15k/ newblocks should probably be a count of segments. Review that. @@ -5060,20 +5060,20 @@ DONE 15ac/ if lafs_shrinker cannot reclaim enough index blocks, trigger some DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder if more is needed. -15ae/ refile wonders about a race with cluster_allocate which gets IOLock +DONE 15ae/ refile wonders about a race with cluster_allocate which gets IOLock before removing from lru. -15af/ Review all locking in lafs_refile +DONE 15af/ Review all locking in lafs_refile -15ag/ Don't allocate data part of InoIdx block. +DONE 15ag/ Don't allocate data part of InoIdx block. -15ah/ Is there a problem with lafs_allocated_block putting an +DONE 15ah/ Is there a problem with lafs_allocated_block putting an about-to-be-truncated block on an uninc list? -15ai/ When allocating a new segment during checkpoint, delay the +DONE 15ai/ When allocating a new segment during checkpoint, delay the youth-block update until after the checkpoint -15aj/ When roll-forward finds a new segment, make sure youth number is +DONE 15aj/ When roll-forward finds a new segment, make sure youth number is updated. 15ak/ Load orphan file during roll-forward and make every block an @@ -5081,14 +5081,14 @@ DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder 15al/ set filesystem update_time somewhere. -15am/ filesystem 'name' needs to be handle uniformly. +15am/ filesystem 'name' needs to be handled uniformly. -15an/ can we be sure 'b' will be non-null in delete_inode? +DONE 15an/ can we be sure 'b' will be non-null in delete_inode? 15ao/ determine what locking is needed to walk the children list in lafs_inode_handle_orphans. Probably the address_space private lock. -15ap/ Make sure write_inode has been cleaned up. See if this apply to +15ap/ Make sure write_inode has been cleaned up. See if this applies to rollforward of a symlink (see FIXME) 15aq/ change inode map to be little-endian, not host-endian @@ -5143,7 +5143,7 @@ DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder 15bh/ after roll-forward, check that free_blocks hasn't gone negative. or handle if it has. -15bi/ Set EmergencyClean a bit later - need at least one checkpoint first. +DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first. to twostage. 15bj/ Make sure .last link in segtracker is kepts uptodate, particularly in @@ -5157,11 +5157,11 @@ DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder 15bm/ make sure everything gets free properly on error during mount / lafs_load -15bn/ How does refcountsing of 'struct fs' work with multiple filesets? +15bn/ How does refcounting of 'struct fs' work with multiple filesets? 15bo/ use put_super to drop last refer to superblocks -15bp/ review all superblock - maybe use more anon?? +15bp/ review all superblocks - maybe use more anon?? 15bq/ check readonly status in lafs_get_sb @@ -6621,3 +6621,176 @@ WritePhase - what is that all about? lafs_allocated_block, when this happens, checks that the parent is dirty/realloc as appropriate. Inf this case, realloc isn't an issue, only dirty. lafs_incorporate must have made it dirty and it won't get written while it has these in-phase children, so all is happy. + + 15ae - refile race? Someone might set B_IOLock before removing from lru, so + onlru is 0 and refcnt is elevated so it doesn't seem to be unused. + But then whoever has the refer will refile again when dropping it and + so the right thing will be done. + But more generally, do we really want the lru etc to own a counted reference? + If it didn't: + - we would need to refile when removing from any list + - we would need to get a ref when removing from list. + uhmmm.. + + lafs_refile does: + clear PinPending if refcnt is low + unpin if not PinPending, or dirty etc and data or refcnt is low + place on leaf list - if pinned etc - this can be earlier + drop parent linkm if refcnt is low, and not pinned etc + handle dblock issues + + if lru was not refcounted, then the only things we might do when refcnt isn't zero are: + unpin a dblock once it is not dirty + add to lru + + But if we don't count lru, then we can lose the refcount on dblock + + Hmmm - we cannot leave things on the leaf list forever as they thus hold a reference and + don't get freed. + + I think I want things on 'leafs' list to not hold a counted reference. + Things *only* get removed while walking the list. + InoIdx blocks hold a ref on the dblock both when counted and some other time. Possibly + when pinned. This ensure they are held InoIdx is while a real leaf. + But: When we take that first ref, how do we know the dblock even exists? + + What is the lifetime of ->dblock? + removed when page is released + set by lafs_import_inode + set by lafs_inode_dblock + removed by clear_inode + So if I don't hold a ref, I always need to be ready to call lafs_inode_dblock + This is currently callers of getiref_locked + - erase_dblock_locked ?? shouldn't need a lock + - ihash_lookup - never on InoIdx + - lafs_make_iblock - already have dblock + So none of those really need lafs_inode_dblock + What about when we set Pinned + only really from set_phase ... messy. + What about when we set ->parent + grow index tree - not relevant + ditto do_incorporate_* + block_adopt + Can be called on InoIdx from: + lafs_make_iblock only!! + +15sep2010 + + I have tidied lafs_refile up a lot but I need to make locking a lot cleaner. + In particular I want a single lock I can take when the refcnt hits zero which will ensure no ref + is taken until I have finished my cleanup. I suspect the inode private_lock is the one to use. + I also need to clean up getiref_locked and getref_locked - having both is awkward. + + So: when are they called? + + getref_locked: + lafs_get_flushable - hold fs->lock + first_in_seg - holds private_lock, but shouldn't need _locked as hold a ref through child. + (getiref_locked) + pin_all_children - hold private_lock + find_better - private_lock + getdref_locked + lafs_invalidate_page - to get a ref on each block to either erase or invalidate it + presumably page is locked + lafs_get_block - holds private_lock - plus once with only page_lock + lafs_release_page - holds private_lock + (getiref_locked on dblock) - no locking + lafs_inode_dblock - private_lock of my_inode... + lafs_delete_inode - private_lock of my_inode + lafs_destroy_inode - ditto + lafs_drop_inode - ditto + getiref_locked + erase_dblock_locked - private_lock + lafs_get_flushable - fs->lock + ihash_lookup - lafs_hash_lock + lafs_make_iblock - private_lock + + So private_lock looks like a good choice. Issues are: + - what is the story with dblock on my_inode->private_lock + - what is the lock ordering + - what can refile negate that we need to be careful of. + i.e. we want to keep things stable while refile does its tests, but what do we need to keep + stable for others? + + we break the parent link?? and so the siblings link + + move things to freelist + + can put_page + + free dblock if not page_private + + Lock_ordering. private_lock, then fs->lock, then lafs_hash_lock + So if we have to hold lafs_hash_lock, we increment refcnt, drop the lock, get/drop private_lock + + This is getting messy - I need something nice and clear. + So: + Index Blocks. + If Pinned, either has references or is on a leaf list - possibly both + If no references and not pinned then not on leaf list, so can be on free list + + Pinned can only be set when there are references, and can only be cleared under private_lock + This is violated by phase_flip, which badly reads refcnt + If refcnt is zero and not pinned, then can be moved to free_list + If on freelist and refcnt is zero under hash_lock, can be freed + + So if lafs_get_flushable finds a block that is not pinned, then we can delete and ignore. + Someone else must hold a ref and will put it and it will refile. but that is pointless as + it could immediately be cleared after we test Pinned. + + lafs_get_flushable should get a reference before deleting from list. This ensure it won't be freed + by lafs_shrinker, though it could be on the free list. If it is, then it isn't pinned so it is not + interestin to us. + + + Data Blocks: + These are removed from lru when freed - we just need the extra refcnt check after removing from list. + No we don't - these are only pinned while refcnt or dirty and can only loose dirty while refcnt + so they cannot disappear + + What is the story with my_inode->private_lock though? This is used to protect ->dblock accesses. + I guess we need to get or hold the other lock .... look at what the race is - what else is checked when dblock is cleared? + dblock is cleared in refile for the dblock, + or in clear_inode under the inode rivate lock. + + So: + There are various places that hold a non-counted reference to a block. + These include + - index hash table lafs_hash_lock + - index free list lafs_hash_lock + - phase_leafs / clean_leafs fs->lock only if pinned + - inode->iblock lafs_hash_lock + - inode->dblock inode->i_data.private_lock + + Each of these is protected by its own lock, but not all the same lock. + When we turn one of these into a counted reference, we increment refcnt under the local lock, + then after dropping that lock we take and drop b->inode->i_data.private_lock to ensure refile has + finished. This must be done before changing/using the block in any way. + To free an index block it must first be removed from _leafs list. Then if the refcount is still + zero it can be freed - or put on freelist and subsequently freed. + An InoIdx block - we need to hold hash_lock as well as private_lock to take a reference. + To free a data block we similarly need to recheck refcnt after removing from leaf list. + If it is in an inode file we also take that inode's private_lock to clear dblock. + We use rcu to get the inode, the lock it, then clear dblock if refcnt is still zero. + +17sep2010 + review lafs_refile - are some of those tests redundant? - yes, one is gone. + + So: + 15ah - What about truncated blocks sitting on an uninc chain? + I don't see the problem. It will eventually get incorporated and do the right thing... + + 15ai - We don't want to touch the youth block during a checkpoint else it is awkward to write it out in + a stable way..... + No, I don't think that is really a problem. It only gets written out in the tail of the checkpoint after + the root. I guess it could then get a youth number for a segment that it has no count for, if the root is + written at the end of one segment and the segusage/youth written at the start of the next. + + But I think roll-forward is missing something. Blocks in the next phase need to be counted into segusage. + Are they? oh, yes - they are. - cleaned and index blocks are ignored so they might be some wasted space, + but the important blocks picked up by the roll-forward are handled. + + So.... + + A checkpoint could cover multiple segments. We need to be sure these each get a valid youth number. + Probably most of them will, but we need a consistent approach to be sure. + They don't need to be added to the segtracker, except the last needs to be active, and it already is. + So as we find a new segment we want to do much like was lafs_free_get does youth_update. + But the data block - isn't that youthblk? When it that set? + segsum_find sets if it ssnum == 0