index pages never get put on an LRU - how is this supposed to work?
-
-
-
-
-
--------------------------
Thoughts:
Inodes live in an address-space, much like a file. To load the
DONE 15q/ check checksums when reading cluster_header for cleaner
This is already done!
-15r/ consider further optimisation in cleaner to avoid lookups.
+DONE 15r/ consider further optimisation in cleaner to avoid lookups.
-15s/ memory barrier for i_size check in cleaner???
+DONE 15s/ memory barrier for i_size check in cleaner???
-15t/ review usable-space calculations in clean.
+DONE 15t/ review usable-space calculations in clean.
-15u/ Do I need a SegRef when pin-dblock-by-hand in flush_data_to_inode
+DONE 15u/ Do I need a SegRef when pin-dblock-by-hand in flush_data_to_inode
-15v/ tidy up all code that fiddles bits and credits - maybe make some
+DONE 15v/ tidy up all code that fiddles bits and credits - maybe make some
common helpers.
-15w/ review cluster updates and make sure space used is accounted properly.
+DONE 15w/ review cluster updates and make sure space used is accounted properly.
-15x/ Consider caching result of a failed dir lookup in case we immediately
+DONT BOTHER 15x/ Consider caching result of a failed dir lookup in case we immediately
try to create it. Would this actually save anything significant?
-15y/ Don't make dir blocks into orphans if it cannot be needed?
+DONE 15y/ Don't make dir blocks into orphans if it cannot be needed?
-15z/ make sure symlink creation is safe - do I need to log the body??
+DONE 15z/ make sure symlink creation is safe - do I need to log the body??
-15aa/ lafs_rename should flush orphans just like lafs_rmdir does.
+DONE 15aa/ lafs_rename should flush orphans just like lafs_rmdir does.
-15ab/ Does writepage need to recheck if my_inode and/or iblock have appeared
+DONE 15ab/ Does writepage need to recheck if my_inode and/or iblock have appeared
after lock is taken on block?
-15ac/ if lafs_shrinker cannot reclaim enough index blocks, trigger some
+DONE 15ac/ if lafs_shrinker cannot reclaim enough index blocks, trigger some
writeout.
-15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder
+DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder
if more is needed.
15ae/ refile wonders about a race with cluster_allocate which gets IOLock
- flags for inconsistencies found, at layout/fileset/file levels(?)
- policies of whether old or new data is allowed on each device
- policies of how much duplication of metadata is required
+ - inode map - not host-endian
15cc/ free any stray B_ASync block found in destroy_inode
15cd/ Some code assume a cluster header does not exceed 1 page.
Is this safe? Is in true? Is it enforced?
+15ce/ classify BUGs as
+ - internal logic errors
+ - IO errors
+ - unusual conditions I want a warning of
+ - data corruption errors
+
+15cf/ lafs_iget_fs need to sometimes to in-kernel mounts for subset filesystems
+ This is needed for the cleaner - the cleaner needs to hold a ref somehow.
+
+15cg/ lafs_sync_inode is weird - why the lafs_checkpoint_start and update_cluster
+ stuff??
+
+15ch/ Review values of youth and checkpoint_youth and think about off-by-one
+ issues.
+
16/ Update locking.doc
17/ cluster_flush calls lafs_cluster_allocate calls lafs_add_block_address
61/ Try to make the inode struct smaller - maybe move some of the
fs metadata into a separately-allocated struct.
+62/ System/trusted extended attributes:
+ fileset max size
+ directory hash/seed
+
+63/ user extended attributes.
+
+64/ wonder if index blocks can be flushed out by memory pressure somehow.
+ e.g. if a data block is written by reclaim, flag the index block.
+ When a flagged index block has no children, it is incorporated and written.
+ ??
+
+65/ review why lafs_allocated_block needs the new_parent label. Should not
+ lafs_incorporate leave all parents dirty? Maybe it is just the need for
+ B_Realloc - so maybe lafs_incorporate should leave the new block either
+ realloc or dirty rather than lafs_allocated_block doing it.?
+ See also 15ad below.
+
26June2010
Investigating 5a
So it should all be finished when we do a checkpoint.
So if checkpoint, or release_page, finds an async block, drop it.
+
+ 15r - further optimisations in cleaner to avoid lookups.
+ We have fsnum,inum,blocknum and cluster seq number and trunc num.
+
+ I want to introduce more async though. Currently it only loads
+ one inode at a time.
+ To do more, I need to mark inodes as 'done' when they are and always
+ restart from the start of the cluster (only do one cluster at a time
+ for now).
+ So if we get all the way though a cluster with no 'EAGAIN' we finish
+ with the cluster.
+
+ 15y - when could a directory block become an orphan?
+ - when deleting that last entry - we don't know if it can be fully
+ deleted until we look in next block
+ - when deleting an entry follows a chain back to the first block
+ - when deleting the last entry in the block.
+
+ So it could be an orphan if the entry found:
+ - is at end of block
+ - is first entry
+ - is only entry
+ or first entry is already deleted.
+
+15Aug2010
+ looking at flushing etc when run out of space.
+ We often force a checkpoint when it won't do any good as
+ nothing has been cleaned.
+ In fact we write lots of dead checkpoints to 0/0 until it is full,
+ then move on, clean 0/0 and suddenly have space.
+ We shouldn't do that. sync should be what pushes us forwards.
+ Maybe that is fixed..
+
+ InoIdx blocks still cause confusion. Should they ever have credits?
+ or do only the data block have those? Certainly they cannot have
+ SegRef.
+ And there is confusion in my mind whether data blocks can be pinned
+ while the InoIdx block is - need to clarify that.
+
+
+13Sep2010 - now, where was I...
+ - I've just been dropping the use of SegRef on InoIdx blocks, where it makes no sense.
+ - test run: block.c:660 - no credits available while dirtying an InoIdx block during
+ orphan handling. lafs_reserver_block (under checkpoint lock) should have set credit.
+ Only I just changed reserve_block to do that dblock instead - I wonder why.
+ OK, I think I cleaned that up...
+
+
+ - make_orphan is hanging in checkpoint_unlock_wait. So orphan_pin returned -EAGAIN
+ so pin_dblock did too. So reserve_block did too, so prealloc or summary_alloc or seg_ref_block
+ returned error.
+ Problem is that we don't push a checkpoint when cleaner runs out of things to do.
+ But we don't want to go back to pushing a checkpoint too often.
+ Maybe the problem is that we only force the checkpoint when we have enough space to do
+ new allocations, but we need to force it earlier if nothing new can be cleaned.
+
+ Once we set EmergencyClean, lafs_reserve_block will stop returning EAGAIN for newspace, so
+ we need to wake 'checkpoint_wait' then.
+ But for ReleaseSpace we want to wake on every checkpoint... we probably do anyway.
+ ...anyway, that is sorted now at commit 95b6b05e460
+
+
+ So: InoIdx blocks.
+ - These never get SegRef as that is meaningless - done.
+ - These can have credits. It possibly isn't necessary bit it makes things
+ easier. They are 'written' by transfering the credits to the data block, or discarding them.
+ - I think dblock and iblock can both be pinned
+ The problem this caused was that the dblock might get processed as a leaf before iblock.
+ We now have lafs_is_leaf which causes dblock not be a leaf even if it is pinned, if the iblock
+ is pinned to the same phase.
+ lafs_phase_flip refiles the dblock so that it goes back on the leaf list as does lafs_refile when
+ it unpins an iblock
+ So lafs_pin_dblock doesn't need to pin the inode instead.
+ OK, that is fixed. - commit f1c05293bfd Mon Sep 13 15:07:27 2010 +1000
+
+ 15u - I don't need to get a segref there, but I need to have one from the original dirty block,
+ so fix that up - commit Mon Sep 13 15:28:08 2010 +100
+
+ 15v - What do we have?
+ lafs_dirty_dblock: set Dirty, clear Credit clear NCredit
+ set Uninc, clear Icredit clear NICredit
+ lafs_dirty_iblock: set dirty, clear credit
+ test uninc, clear ICredit, set Unincredit - not essential
+ mark_cleaning: test realloc, / alloc / set realloc
+ test dirty / clear realloc/ set credit
+ set uninc clear icredit
+ cleaner_flush: set dirty, clear realloc, clear credit
+ test dirty, clear realloc set credit
+ flush_data_to_inode:
+ lafs_cluster_allocate - there is some odd code ther!!
+ flip_phase
+ lafs_allocated_block
+
+ all rather different really.
+ Just do some tiny tidyup in lafs_cluster_allocate when dirtying dblock
+
+ 15w/ Space used by cluster updates??
+ It is all fine - just some confusion of function names.
+
+ 15z/ logging symlink creation.
+ Do I need to log the content? I needs to be safe on a dir sync, and you cannot sync the
+ symlink itself. So I guess we queue the block for writeout so it will go with the
+ dir update.
+ Yes, that works: Mon Sep 13 17:33:54 2010 +100
+
+ 15ab/ already did that in commit f90959e6f492b6
+
+
+ 15ac/ How can we trigger write-out of dirty index block which have no pin-count, thus allowing them to
+ be freed after the write completes? A checkpoint could do it, but that would write out index block
+ that cannot be freed too. A checkpoint would only be good after lots of data pages had been written.
+ We could just wait and let other processes kick in..
+
+ I don't think we need to do anything. lafs_shrinker doesn't really know how tight memory
+ is, and periodic checkpoint will free up any memory that we are pinning.
+
+ .... but something is needed. We need some trigger to write dirty index blocks
+ Maybe:
+ - a timeout on checkpoints - every dirty_expire_interval - but that isn't exported.
+ DONE THAT.
+
+ Not sure this is a complete solution. I might want to incorp/flush index block when they
+ have no dirty children, but I'm not sure about that.
+
+14sep2010
+ 15ad - lafs_add_block_address call from lafs_phase_flip - do I handle failure correctly?
+ failure happens when b2 is data block and uninc table is full so we called incorporate on the parent.
+ This could split the parent which means the block could have been re-parented - it would have been in the
+ child list and so found and fixed.
+ lafs_allocated_block, when this happens, checks that the parent is dirty/realloc as appropriate.
+ Inf this case, realloc isn't an issue, only dirty. lafs_incorporate must have made it dirty and
+ it won't get written while it has these in-phase children, so all is happy.