From 5befac9ad5b5fc6280c14c8a344490413ea4e50c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 14 Sep 2010 11:50:34 +1000 Subject: [PATCH] README update and some comment fixes. Still working through the TODO list. Signed-off-by: NeilBrown --- README | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++------ index.c | 7 +- 2 files changed, 183 insertions(+), 20 deletions(-) diff --git a/README b/README index f95490e..26b1947 100644 --- a/README +++ b/README @@ -56,11 +56,6 @@ FIXME index pages never get put on an LRU - how is this supposed to work? - - - - - -------------------------- Thoughts: Inodes live in an address-space, much like a file. To load the @@ -5034,35 +5029,35 @@ DONE 15p/ Review and remove the 'if cleaner is active then don't checkpoint just DONE 15q/ check checksums when reading cluster_header for cleaner This is already done! -15r/ consider further optimisation in cleaner to avoid lookups. +DONE 15r/ consider further optimisation in cleaner to avoid lookups. -15s/ memory barrier for i_size check in cleaner??? +DONE 15s/ memory barrier for i_size check in cleaner??? -15t/ review usable-space calculations in clean. +DONE 15t/ review usable-space calculations in clean. -15u/ Do I need a SegRef when pin-dblock-by-hand in flush_data_to_inode +DONE 15u/ Do I need a SegRef when pin-dblock-by-hand in flush_data_to_inode -15v/ tidy up all code that fiddles bits and credits - maybe make some +DONE 15v/ tidy up all code that fiddles bits and credits - maybe make some common helpers. -15w/ review cluster updates and make sure space used is accounted properly. +DONE 15w/ review cluster updates and make sure space used is accounted properly. -15x/ Consider caching result of a failed dir lookup in case we immediately +DONT BOTHER 15x/ Consider caching result of a failed dir lookup in case we immediately try to create it. Would this actually save anything significant? -15y/ Don't make dir blocks into orphans if it cannot be needed? +DONE 15y/ Don't make dir blocks into orphans if it cannot be needed? -15z/ make sure symlink creation is safe - do I need to log the body?? +DONE 15z/ make sure symlink creation is safe - do I need to log the body?? -15aa/ lafs_rename should flush orphans just like lafs_rmdir does. +DONE 15aa/ lafs_rename should flush orphans just like lafs_rmdir does. -15ab/ Does writepage need to recheck if my_inode and/or iblock have appeared +DONE 15ab/ Does writepage need to recheck if my_inode and/or iblock have appeared after lock is taken on block? -15ac/ if lafs_shrinker cannot reclaim enough index blocks, trigger some +DONE 15ac/ if lafs_shrinker cannot reclaim enough index blocks, trigger some writeout. -15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder +DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder if more is needed. 15ae/ refile wonders about a race with cluster_allocate which gets IOLock @@ -5194,12 +5189,28 @@ DONE 15q/ check checksums when reading cluster_header for cleaner - flags for inconsistencies found, at layout/fileset/file levels(?) - policies of whether old or new data is allowed on each device - policies of how much duplication of metadata is required + - inode map - not host-endian 15cc/ free any stray B_ASync block found in destroy_inode 15cd/ Some code assume a cluster header does not exceed 1 page. Is this safe? Is in true? Is it enforced? +15ce/ classify BUGs as + - internal logic errors + - IO errors + - unusual conditions I want a warning of + - data corruption errors + +15cf/ lafs_iget_fs need to sometimes to in-kernel mounts for subset filesystems + This is needed for the cleaner - the cleaner needs to hold a ref somehow. + +15cg/ lafs_sync_inode is weird - why the lafs_checkpoint_start and update_cluster + stuff?? + +15ch/ Review values of youth and checkpoint_youth and think about off-by-one + issues. + 16/ Update locking.doc 17/ cluster_flush calls lafs_cluster_allocate calls lafs_add_block_address @@ -5328,6 +5339,23 @@ DONE 52/ NFS export 61/ Try to make the inode struct smaller - maybe move some of the fs metadata into a separately-allocated struct. +62/ System/trusted extended attributes: + fileset max size + directory hash/seed + +63/ user extended attributes. + +64/ wonder if index blocks can be flushed out by memory pressure somehow. + e.g. if a data block is written by reclaim, flag the index block. + When a flagged index block has no children, it is incorporated and written. + ?? + +65/ review why lafs_allocated_block needs the new_parent label. Should not + lafs_incorporate leave all parents dirty? Maybe it is just the need for + B_Realloc - so maybe lafs_incorporate should leave the new block either + realloc or dirty rather than lafs_allocated_block doing it.? + See also 15ad below. + 26June2010 Investigating 5a @@ -6461,3 +6489,135 @@ WritePhase - what is that all about? So it should all be finished when we do a checkpoint. So if checkpoint, or release_page, finds an async block, drop it. + + 15r - further optimisations in cleaner to avoid lookups. + We have fsnum,inum,blocknum and cluster seq number and trunc num. + + I want to introduce more async though. Currently it only loads + one inode at a time. + To do more, I need to mark inodes as 'done' when they are and always + restart from the start of the cluster (only do one cluster at a time + for now). + So if we get all the way though a cluster with no 'EAGAIN' we finish + with the cluster. + + 15y - when could a directory block become an orphan? + - when deleting that last entry - we don't know if it can be fully + deleted until we look in next block + - when deleting an entry follows a chain back to the first block + - when deleting the last entry in the block. + + So it could be an orphan if the entry found: + - is at end of block + - is first entry + - is only entry + or first entry is already deleted. + +15Aug2010 + looking at flushing etc when run out of space. + We often force a checkpoint when it won't do any good as + nothing has been cleaned. + In fact we write lots of dead checkpoints to 0/0 until it is full, + then move on, clean 0/0 and suddenly have space. + We shouldn't do that. sync should be what pushes us forwards. + Maybe that is fixed.. + + InoIdx blocks still cause confusion. Should they ever have credits? + or do only the data block have those? Certainly they cannot have + SegRef. + And there is confusion in my mind whether data blocks can be pinned + while the InoIdx block is - need to clarify that. + + +13Sep2010 - now, where was I... + - I've just been dropping the use of SegRef on InoIdx blocks, where it makes no sense. + - test run: block.c:660 - no credits available while dirtying an InoIdx block during + orphan handling. lafs_reserver_block (under checkpoint lock) should have set credit. + Only I just changed reserve_block to do that dblock instead - I wonder why. + OK, I think I cleaned that up... + + + - make_orphan is hanging in checkpoint_unlock_wait. So orphan_pin returned -EAGAIN + so pin_dblock did too. So reserve_block did too, so prealloc or summary_alloc or seg_ref_block + returned error. + Problem is that we don't push a checkpoint when cleaner runs out of things to do. + But we don't want to go back to pushing a checkpoint too often. + Maybe the problem is that we only force the checkpoint when we have enough space to do + new allocations, but we need to force it earlier if nothing new can be cleaned. + + Once we set EmergencyClean, lafs_reserve_block will stop returning EAGAIN for newspace, so + we need to wake 'checkpoint_wait' then. + But for ReleaseSpace we want to wake on every checkpoint... we probably do anyway. + ...anyway, that is sorted now at commit 95b6b05e460 + + + So: InoIdx blocks. + - These never get SegRef as that is meaningless - done. + - These can have credits. It possibly isn't necessary bit it makes things + easier. They are 'written' by transfering the credits to the data block, or discarding them. + - I think dblock and iblock can both be pinned + The problem this caused was that the dblock might get processed as a leaf before iblock. + We now have lafs_is_leaf which causes dblock not be a leaf even if it is pinned, if the iblock + is pinned to the same phase. + lafs_phase_flip refiles the dblock so that it goes back on the leaf list as does lafs_refile when + it unpins an iblock + So lafs_pin_dblock doesn't need to pin the inode instead. + OK, that is fixed. - commit f1c05293bfd Mon Sep 13 15:07:27 2010 +1000 + + 15u - I don't need to get a segref there, but I need to have one from the original dirty block, + so fix that up - commit Mon Sep 13 15:28:08 2010 +100 + + 15v - What do we have? + lafs_dirty_dblock: set Dirty, clear Credit clear NCredit + set Uninc, clear Icredit clear NICredit + lafs_dirty_iblock: set dirty, clear credit + test uninc, clear ICredit, set Unincredit - not essential + mark_cleaning: test realloc, / alloc / set realloc + test dirty / clear realloc/ set credit + set uninc clear icredit + cleaner_flush: set dirty, clear realloc, clear credit + test dirty, clear realloc set credit + flush_data_to_inode: + lafs_cluster_allocate - there is some odd code ther!! + flip_phase + lafs_allocated_block + + all rather different really. + Just do some tiny tidyup in lafs_cluster_allocate when dirtying dblock + + 15w/ Space used by cluster updates?? + It is all fine - just some confusion of function names. + + 15z/ logging symlink creation. + Do I need to log the content? I needs to be safe on a dir sync, and you cannot sync the + symlink itself. So I guess we queue the block for writeout so it will go with the + dir update. + Yes, that works: Mon Sep 13 17:33:54 2010 +100 + + 15ab/ already did that in commit f90959e6f492b6 + + + 15ac/ How can we trigger write-out of dirty index block which have no pin-count, thus allowing them to + be freed after the write completes? A checkpoint could do it, but that would write out index block + that cannot be freed too. A checkpoint would only be good after lots of data pages had been written. + We could just wait and let other processes kick in.. + + I don't think we need to do anything. lafs_shrinker doesn't really know how tight memory + is, and periodic checkpoint will free up any memory that we are pinning. + + .... but something is needed. We need some trigger to write dirty index blocks + Maybe: + - a timeout on checkpoints - every dirty_expire_interval - but that isn't exported. + DONE THAT. + + Not sure this is a complete solution. I might want to incorp/flush index block when they + have no dirty children, but I'm not sure about that. + +14sep2010 + 15ad - lafs_add_block_address call from lafs_phase_flip - do I handle failure correctly? + failure happens when b2 is data block and uninc table is full so we called incorporate on the parent. + This could split the parent which means the block could have been re-parented - it would have been in the + child list and so found and fixed. + lafs_allocated_block, when this happens, checks that the parent is dirty/realloc as appropriate. + Inf this case, realloc isn't an issue, only dirty. lafs_incorporate must have made it dirty and + it won't get written while it has these in-phase children, so all is happy. diff --git a/index.c b/index.c index 65d36c8..18adc7d 100644 --- a/index.c +++ b/index.c @@ -644,8 +644,11 @@ void lafs_phase_flip(struct fs *fs, struct indexblock *ib) b2->chain = NULL; clear_bit(B_Uninc, &b2->flags); while (lafs_add_block_address(fs, b2) == 0) - /* FIXME do I need more like I do in - * lafs_allocated_block */ + /* We had to incorporate b2->parent which might + * have split the parent but as that will have made + * it dirty, there is nothing extra that we need to + * do here + */ ; putref(b2, MKREF(uninc)); } -- 2.39.5