There is some code in the wrong place - probably a hangover from a
previous arrangement before we made lafs_is_leaf a function.
No functional change here.
We cannot include it in an update, so just make sure it goes in the
next write cluster. This will be before any sync or fsync, and
roll-forward should pick it up, so all is OK.
We now allow both iblock and dblock to be pinned at the same time.
So when pinning the inode dblock, just do it and don't go bothering
the inode iblock.
Refinements for triggering checkpoint when we are low on space.
This is getting messy but seems to work.
Not sure now of the difference between CleanerBlocks and
EmergencyPending.
I guess the one makes sure the cleaner does what it can and then
triggers a checkpoint.
The other prepares for EmergencyClean to be set after the next
checkpoint.
NeilBrown [Sun, 15 Aug 2010 08:30:36 +0000 (18:30 +1000)]
Wait for a checkpoint before returning ENOSPC
If we seem to run out of space, it is worth waiting for
a checkpoint as that might free up some space. So add
an extra step to the sequence leading from 'no space' to 'ENOSPC'.
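The escalation described above can be pictured as a small state machine.
This is only a sketch under stated assumptions: the state names and the
functions next_space_state() and reserve_space() are hypothetical and are
not taken from the LaFS source, and the "wait for the cleaner" stage is
assumed for illustration. The only point is where the extra "wait for a
checkpoint" step sits before ENOSPC is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stages between "allocation failed" and ENOSPC. */
enum space_state {
	SPACE_OK,              /* allocation succeeded */
	SPACE_WAIT_CLEANER,    /* assumed earlier step: let the cleaner run */
	SPACE_WAIT_CHECKPOINT, /* the extra step: a checkpoint may free space */
	SPACE_ENOSPC           /* nothing left to try */
};

/* Advance one step; any success short-circuits back to SPACE_OK. */
static enum space_state next_space_state(enum space_state s, bool have_space)
{
	if (have_space)
		return SPACE_OK;
	switch (s) {
	case SPACE_OK:
		return SPACE_WAIT_CLEANER;
	case SPACE_WAIT_CLEANER:
		return SPACE_WAIT_CHECKPOINT;
	default:
		return SPACE_ENOSPC;
	}
}

/*
 * Simulated caller: space "appears" after free_after waits.
 * Returns 0 on success, -ENOSPC once every step has been tried.
 */
static int reserve_space(int free_after)
{
	enum space_state s = SPACE_OK;
	int waits = 0;

	assert(free_after >= 0);
	for (;;) {
		s = next_space_state(s, waits >= free_after);
		if (s == SPACE_OK)
			return 0;
		if (s == SPACE_ENOSPC)
			return -ENOSPC;
		waits++;	/* stands in for actually sleeping */
	}
}
```

With this shape, a request that would succeed after the checkpoint wait
(two waits in the simulation) no longer fails, while a third failure
still ends in -ENOSPC.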
NeilBrown [Sat, 14 Aug 2010 10:52:11 +0000 (20:52 +1000)]
Allow cleaner_parse to request multiple inodes at once.
Currently cleaner_parse stops when it hits an inode that it cannot
load immediately. This reduces the opportunities for parallelism.
Instead, allow up to 16 -EAGAINs from inode lookups.
This requires that we mark headers for inodes which failed, and
always start again from the beginning of the cluster head.
We already reduce the bcnt to 0, so for inodes that can be
found, we won't look up the blocks twice.
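A minimal sketch of the retry scheme described above, assuming a
simplified in-memory header. struct group_head, lookup_inode() and
parse_cluster() are hypothetical stand-ins rather than the real
cleaner_parse code; only the limit of 16, the marking of failed headers,
the restart from the beginning, and the use of bcnt == 0 to skip finished
headers come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_EAGAIN 16	/* limit stated in the commit message */

/* Hypothetical, simplified stand-in for a per-inode header in a cluster. */
struct group_head {
	int ino;
	int bcnt;	/* blocks still to look up; 0 once handled */
	int retry;	/* marked when the inode lookup returned -EAGAIN */
	int ready;	/* simulation knob: would the lookup succeed now? */
};

/* Hypothetical lookup: succeeds only if the inode is already loaded. */
static int lookup_inode(const struct group_head *gh)
{
	return gh->ready ? 0 : -EAGAIN;
}

/*
 * One pass over the cluster head, always starting from the beginning.
 * Returns the number of headers that still need a retry.
 */
static int parse_cluster(struct group_head *gh, size_t n, int *blocks_found)
{
	int again = 0;

	assert(blocks_found != NULL);
	for (size_t i = 0; i < n && again < MAX_EAGAIN; i++) {
		if (gh[i].bcnt == 0)
			continue;	/* handled on an earlier pass */
		if (lookup_inode(&gh[i]) == -EAGAIN) {
			gh[i].retry = 1;	/* mark the header, keep going */
			again++;
			continue;
		}
		*blocks_found += gh[i].bcnt;
		gh[i].bcnt = 0;		/* never look these blocks up twice */
		gh[i].retry = 0;
	}
	return again;
}
```

Because finished headers have bcnt == 0, restarting from the first
header on each pass costs only a cheap skip per completed entry.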
NeilBrown [Sat, 14 Aug 2010 06:48:39 +0000 (16:48 +1000)]
Clean up interaction between cleaner and checkpoint.
If a checkpoint is wanted, the cleaner shouldn't start any more work.
If the cleaner or segscan is active a checkpoint cannot start, but
when they complete they should wake the checkpoint process.
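The rules above amount to a small handshake between the two sides. The
sketch below is illustrative only: struct fs_state and the three helper
functions are hypothetical names, a counter stands in for waking the
checkpoint thread, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, lock-free model of the shared state. */
struct fs_state {
	bool checkpoint_wanted;	/* someone has asked for a checkpoint */
	int  active_workers;	/* cleaner or segscan jobs in flight */
	int  checkpoint_wakeups;/* stands in for waking the checkpoint thread */
};

/* The cleaner starts no new work once a checkpoint is wanted. */
static bool cleaner_try_start(struct fs_state *fs)
{
	if (fs->checkpoint_wanted)
		return false;
	fs->active_workers++;
	return true;
}

/* The last worker to finish wakes a waiting checkpoint. */
static void cleaner_finish(struct fs_state *fs)
{
	assert(fs->active_workers > 0);
	if (--fs->active_workers == 0 && fs->checkpoint_wanted)
		fs->checkpoint_wakeups++;
}

/* A checkpoint cannot start while the cleaner or segscan is active. */
static bool checkpoint_can_start(const struct fs_state *fs)
{
	return fs->checkpoint_wanted && fs->active_workers == 0;
}
```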
NeilBrown [Sat, 14 Aug 2010 05:41:59 +0000 (15:41 +1000)]
Combine cleaning and orphan list_heads.
A datablock is very rarely both an orphan and requiring cleaning, so
having two list_heads is a waste.
If it is an orphan it will have full parent linkage and addresses already
so it will be handled promptly and removed from the cleaning list.
So arrange that if a block wants to be both, it is preferentially on
the cleaning list, and when removed from the cleaning list it gets
added back to the pending_orphan list in case it needs processing.
Note that only directory and inode blocks can ever be orphans so some
optimisation of spinlocks is possible.
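The arrangement can be sketched in user-space C as below. This is an
assumption-laden illustration, not the LaFS implementation: struct
list_node and its helpers are a hypothetical stand-in for the kernel's
list_head, and struct dblock, mark_orphan(), mark_cleaning() and
done_cleaning() are invented names. No spinlocks are shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list, standing in for list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_append(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical data block with ONE link shared by both lists. */
struct dblock {
	struct list_node link;
	bool cleaning;	/* wants to be on the cleaning list */
	bool orphan;	/* wants orphan processing */
};

static struct list_node clean_list, pending_orphans;

static void lists_init(void)
{
	list_init(&clean_list);
	list_init(&pending_orphans);
}

static void dblock_init(struct dblock *b)
{
	list_init(&b->link);
	b->cleaning = b->orphan = false;
}

/* Orphans go on pending_orphans unless the block is being cleaned. */
static void mark_orphan(struct dblock *b)
{
	b->orphan = true;
	if (!b->cleaning && list_is_empty(&b->link))
		list_append(&b->link, &pending_orphans);
}

/* Cleaning takes priority: steal the link from the orphan list. */
static void mark_cleaning(struct dblock *b)
{
	b->cleaning = true;
	list_unlink(&b->link);
	list_append(&b->link, &clean_list);
}

/* When cleaning is done, hand an orphan back for processing. */
static void done_cleaning(struct dblock *b)
{
	assert(b->cleaning);
	b->cleaning = false;
	list_unlink(&b->link);
	if (b->orphan)
		list_append(&b->link, &pending_orphans);
}
```

The flags record what the block wants; the single link records which
list it is actually on, so the rare "both" case costs a requeue rather
than a second list_head in every block.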
NeilBrown [Sat, 14 Aug 2010 03:36:36 +0000 (13:36 +1000)]
Implement youth decay.
- After a checkpoint, check if we are close enough to the end
of youth space to need a decay.
- When we record a new youth number, un-decay it if the block hasn't
been decayed yet (and convert endian properly).
- Change scan_seg to update free_block/free_dev atomically in just
one place, and do a block worth of decay at that point.
As part of this, the youth block is only released at one place now.
NeilBrown [Fri, 13 Aug 2010 06:14:27 +0000 (16:14 +1000)]
Various fixes for lafs_get_subset
- make sure root directory is created if it doesn't exist
- also create inode usage map.
- hold a ref on the inode while the fs is mounted.
- free the sb_key at unmount.
- set s_bdi from the prime_sb
NeilBrown [Fri, 13 Aug 2010 03:59:10 +0000 (13:59 +1000)]
Set PinPending in flush_data_to_inode.
Should always have this set when we pin a block.
It keeps the block pinned until it is dirtied.
As lafs_pin_block does a refile at the end, it can drop the Pinned
state as soon as it is set.
NeilBrown [Mon, 9 Aug 2010 10:24:20 +0000 (20:24 +1000)]
Unpin data blocks from previous phase before allowing them to be dirty.
While checkpointing will unpin PinPending blocks, it might not
manage to do it before the block gets Dirtied again.
So before we Pin the block - which is a required precursor to dirtying
it - unpin the block.
NeilBrown [Sun, 1 Aug 2010 03:38:06 +0000 (13:38 +1000)]
Use lafs_iget_fs rather than multiple get_blocks in orphan lookup.
When compacting the orphan table, and so changing the orphan
slot for a block, use lafs_iget_fs to help find the orphan block.
This avoids allocating blocks if the inodes exist (which they
should).
NeilBrown [Sun, 1 Aug 2010 02:43:04 +0000 (12:43 +1000)]
Hold ref on inode during orphan handling.
Orphan handling will shortly drop references to the inode controlling
the orphan block. As run_orphans needs to drop the mutex at the end
it needs to hold another reference too.
If I_Deleting is set, then the db effectively owns a reference,
so no further igrab is needed, nor will it work.
NeilBrown [Sun, 1 Aug 2010 00:57:01 +0000 (10:57 +1000)]
wait more effectively for truncate to progress.
The times that we wait for truncate to progress, we hold
i_mutex, so truncate cannot progress.
So if there is a need to wait, we need to call the orphan
handler directly.
Break linkage between inode and dblock at earliest opportunity.
clear_inode is the first chance to break this linkage,
so do it there.
It is still possible for lafs_iget to get a new inode before
clear_inode has completed, so we need to do the same
test/clear in lafs_iget if b->my_inode is found to be non-NULL.
cleaner: don't iput while still holding a ref to a block.
As the block->inode ref isn't counted, this isn't really safe.
The inode could disappear and the block might not get killed
when the address-space is truncated.
Using the new s_sb_info structure, we add a snapshot number
so we can uniquely identify a snapshot from the superblock and
the 'sget' can be used to find an existing or new superblock.
If it is new, set it up properly as before.
No need to fiddle with 'primary_sb' - we have a ref into it from the
path lookup so it cannot go away, and it shouldn't really matter if it
does.
Neil Brown [Mon, 19 Jul 2010 08:36:47 +0000 (18:36 +1000)]
Change s_fs_info to point to root inode and fs
We create a new data structure containing the 'fs' and the root inode
of a filesystem, and store this in the superblock.
This allows easy access to that root in iget, which previously was
impossible in general.
We want PinPending set whenever a transaction might be in progress to
ensure that write_page doesn't flush the block early, or that the
cleaner doesn't clean the block in the middle.
We also want the block to be completely written if a write has already
been scheduled.
So:
- set PinPending - after getting an IOlock and ensuring the block is
not in writeback. This is set before the checkpoint lock is
taken.
- Once we have checkpoint lock and call pin_dblock, wait for
writeout to complete again. This can only be in writeout
if the block is being written to the previous phase, and it
is safe to wait for that inside the checkpoint lock.