README update

author NeilBrown <neilb@suse.de>

Thu, 10 Jun 2010 06:15:45 +0000 (16:15 +1000)

committer NeilBrown <neilb@suse.de>

Sun, 13 Jun 2010 07:20:00 +0000 (17:20 +1000)
author NeilBrown <neilb@suse.de>
Thu, 10 Jun 2010 06:15:45 +0000 (16:15 +1000)
committer NeilBrown <neilb@suse.de>
Sun, 13 Jun 2010 07:20:00 +0000 (17:20 +1000)
diff --git a/README b/README

index 2a67921dcf695d9e68b7a9311cee35e0f2403be9..07f805c8a9184792e86ea52857c9537d90c9ed7f 100644 (file)
--- a/README
+++ b/README
@@ -3905,5 +3905,235 @@ FIXED orphans don't get cleaned up.  It seems a 'create' fails and leaves
  
   Now cluster.c:619: [ce9233f8]0/282(0)r2F:Pinned,Phase1,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1)
       cluster.c:619: [ce570a18]0/286(0)r2F:Pinned,Phase1,PinPending,SegRef,CN,CNI,Claimed,PhysValid iblock(1) leaf(1)
+     cluster.c:619: [ce588d6c]0/17(0)r2F:Pinned,Phase0,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1)
+     cluster.c:619: [ce51dfe4]0/283(0)r2F:Pinned,Phase0,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1)
+     cluster.c:619: [cfbb8430]0/328(0)r2F:Pinned,Phase1,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1)
     We are allocating an InoIdx block, but data block is not valid??
  
+ That isn't very reproducible so I'll have to leave it for now...
+    erasedblock had been called on the data block .. inode 17??
+
+  Problem is that I keep changing the rules.
+   I don't erase the InoIdx block any more.
+   I used to, then change it to iolock_block/cluster_allocate->0
+
+ Problem: When all files are removed, usage is still quite high, two
+   segments have over 400 blocks (out of 512).  Cleaning keeps running and
+   not making much progress.
+  segment 6 has usage of 484.
+  'cluster 3072' shows: cluster 3072, 3085, 3086 3092
+    Inode 0:  blocks 267 272 276 
+    Inode 277: blocks 0/4 6/2
+    Inode 0: blocks 0/2 8 16
+    Inode 0: block 16 70/2 131/3 135/4 140/9 150/2 ... 296/7
+    Inode 16: 1/1
+    Inode 17: 0/28
+    Inode 283: 12/18
+          etc.
+
+  All 'old', so must be the product of cleaning, as you would expect.
+  All (most) of this has been deleted though, but count didn't drop.
+   'Count' add to 508, plus the 4 cluster heads makes 512 - good.
+  lafs_seg_move definitely isn't being called on these blocks.
+  it is only called from lafs_summary_update
+  cblocks_used "exactly" matches the number of un-removed blocks.
+
+
+  Another problem
+bad [ce5bcf50]301/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+/home/neilb/work/nfsbrick/fs/module/modify.c:1652: [ce5bcf50]301/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+bad [cfbf6000]327/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+/home/neilb/work/nfsbrick/fs/module/modify.c:1656: [cfbf6000]327/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+bad [cfb62d20]291/0(0)r1E:Index(0),Pinned,Phase1,WPhase1,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+/home/neilb/work/nfsbrick/fs/module/modify.c:1656: [cfb62d20]291/0(0)r1E:Index(0),Pinned,Phase1,WPhase1,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+
+ and
+free_blocks=1842 allocated=449 max_seg=512 clean_reserved=0
+Want dump of usage
+
+------------[ cut here ]------------
+kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028!
+ free list is empty - that should not be.
+
+and another...
+/home/neilb/work/nfsbrick/fs/module/modify.c:1219: [ce9893b0]74/0(0)r1E:Index(1),Pinned,Phase0,WPhase1,InoIdx,Valid,Dirty,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+/home/neilb/work/nfsbrick/fs/module/modify.c:1219: [ce5ba690]74/0(0)r1E:Index(1),Pinned,Phase1,WPhase0,InoIdx,Valid,Dirty,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1)
+ [<d0a57bc8>] ? lafs_get_flushable+0x131/0x191 [lafs]
+ [<d0a5856d>] ? lafs_do_checkpoint+0x1b3/0x3a2 [lafs]
+ [<d0a5fe7e>] ? cleaner+0x105/0x1426 [lafs]
+ [<c02256bf>] ? autoremove_wake_function+0x0/0x33
+ [<d0a5fd79>] ? cleaner+0x0/0x1426 [lafs]
+
+
+08Jun2010
+ Weirdness with truncating.
+ The cleaner relocates a file resulting in the InoIdx block being
+ Maybe-dirty and phys_addr == 0.
+ Then truncate doesn't prune but just incorporates, finding
+  something weird there..
+  file 278, blocks around 4100
+  seem to find 1949 instead??
+
+ Note: When a non-InoIdx block is erased we set PhysValid
+  and physaddr == 0 to record the fact because it will not be stored...
+
+modify.c:1654: [ce5b4460]327/336(16)r4F:Index(1),Pinned,Phase0,WPhase1,Valid,Async,SegRef,C,CI,CN,CNI,IOLock,PhysValid{0,0}[0] leaf(1) inode_handle_orphan2(1) async(1) inode_handle_orphan3(1)
+Async ??
+modify.c:1657: [cfb90690]327/340(787)r4F:Index(1),Pinned,Phase1,WPhase0,Valid,Async,SegRef,C,CI,CN,CNI,IOLock,PhysValid{0,0}[0] leaf(1) inode_handle_orphan2(1) async(1) inode_handle_orphan3(1)
+Still Async ... wonder what it means.
+
+- directory block got corrupted.  Maybe conversion to indexed??
+
+
+Getting bug in remove_from_index because the addr isn't
+there, possibly block is empty.  But incorporation is
+??? instant?  No it isn't.
+If an index block hasn't be incorporated it has B_PrimaryRef
+set as it hold a ref to something earlier index.
+But what if nothing is incorporated?
+
+
+Allocated [ce402230]328/340(0)r5F:Index(1),Pinned,Phase0,WPhase0,Valid,Dirty,Async,SegRef,CN,CNI,UninCredit,IOLock,PhysValid,Uninc{0,0}[0] inode_handle_orphan2(1) leaf(1) uninc(1) async(1) inode_handle_orphan3(1) -> 0
+looping on [ce402230]328/340(0)r5F:Index(1),Pinned,Phase0,Valid,Dirty,Writeback,Async,SegRef,CN,CNI,UninCredit,PhysValid,Uninc{0,0}[0] inode_handle_orphan2(1) cluster(1) uninc(1) async(1) inode_handle_orphan3(1)
+
+Then spin in a soft-lockup in lafs_inode_handle_orphan
+
+
+-----------
+ - grow_index_tree needs to do initial incorporation so things can be found.
+    just like end of do_incorporate_internal.
+   NO - cannot incorp yet as do not have phys addr.  Don't need to as
+   lafs_leaf_find explicitly handles this.
+   For truncate case we don't use the stored address, but ensure all
+   leaf indexes must be dirty (or gone) so whole tree must be
+   accessible for walking around.
+ - do_incorporate_internal needs to set B_PrimaryRef and take the ref
+ - when we remove a B_PrimaryRef without incorporating it, we need to
+   drop a ref if the *next* in the list is B_PrimaryRef
+ - need to use a constant to identify 'async' calls etc.
+ - maybe I need other iolock_block in truncate ?? to ensure it is Valid so
+   it isn't found as async....
+
+09Jun2010
+ STILL struggling with incorporation.
+ We have a premise that any file address is coverred by precisely
+ one leaf index block.  Every leaf index has an implicit address
+ and it covers all addresses from there to the next leaf.  The last
+ leaf covers to EOF.
+ So there must always be a leaf at address 0.
+ This applies within the tree from an internal index block too.
+ Beneath an internal index block there must be a leaf covering every
+ address up to the next internal index block.  So there must be
+ a first.  So storing the first address is pointless.  And harmful.
+ When an index block becomes empty and disappears its coverage is
+ included in the previous block unless there is none, in which case
+ the next index block must be re-addressed.  If there is no 'next',
+ this index block must be empty and so must disappear.
+
+ BUT if we re-address an index block, we implicitly re-address the
+ first child - recursively - so we need to move/rehash them all
+ or lose them... or record where they are.  Or do lookup not by
+ addr....
+ I think just rehashing them all - with an iolock - is simple
+ and safe.  So just do that.
+
+
+ So:  I cleaned up index handling a truncation somewhat.
+  Now running looptest to see what patterns emerge:
+
+  block.c:197 (*9+1) During umount, the Root datablock is
+        Dirty+Realloc
+        Maybe just need for cleaner to become inactive
+        during umount - hope that doesn't deadlock
+        didn't event work...
+  block.c:529 (*4+1)  erase dblock while iblock depth > 0
+        When pruning InoIdx we want to set depth to 0.
+        FIXME is this really want I want, or is depth=0
+        only for data-inode ... FIXME
+  cluster.c:533 (*2) cluster_allocate on invalid block
+          Block is 8/0 in writepage from sync_inodes
+          This is the orphan file.
+                   blocks aren't dirty
+          I guess the file gets truncated while we wait for it.
+          Just need to re-test.
+  index.c:1936 (*2).   An index block is Root - FIXED??
+  modify.c:1056 - secondary bug, ignore for now.
+  modify.c:1650 update_index fails to find target.
+              second call, phys==0
+              Code was bad ... may not be the cause though.
+  modify.c:1696 (*4) lafs_incorporate gets non-dirty Index(1) block
+                   from orphan handler.
+                Maybe just change the do/while back to 'do'.
+  modify.c:1704: (*2) lafs_inc gets leaf with uninc list???
+               Index(0)/InoIdx
+               in do_checkpoint
+               uninc list gets set in lafs_add_block_address (parent of iblk),
+               do_incorporate_internal,
+               Maybe the InoIdx still had children.
+  segments.c:1028.  (*4) The free list becomes empty.
+  super.c:655 (*3)   Busy inodes after umount, and root InoIdx block
+         is still pinned as inode 16 data block was still dirty.
+         segusage slow.  Maybe same as block.c:197 ??
+  invalid address 6b6b6bfb: invalidate_inode_buffers in shutdown
+          finds invalid lock.
+          presumably the inodes was freed before invalidated.
+  spin on writeback during truncate (r3a) 8 times. now 10
+        Probably because writeback cannot proceed while
+        orphan processing keeps looping.
+  kmalloc-1024 problems - (*2)
+          A block - should be start of page - isn't not what it appears...
+
+ Others complete with 'cb' ranging from 202 to 715
+
+
+10 June 2010
+
+ Looking at segment.c:1028
+  We run a seg_scan every checkpoint, so that should keep free segments
+  in the list.....
+  Ahh.. do_checkpoint is looping because root isn't changing phase.
+
+  Lowest block pinned to old phase is 
+  [cfb7df08]0/74(4253)r0E:Pinned,Phase1,WPhase0,Valid,Dirty,CN,CNI,UninCredit,IOLock,Claimed,PhysValid
+  which is not on leaf list because it has IOLock
+  With more debugging:
+  [ce5c5f08]0/74(4250)r0E:Pinned,Phase1,WPhase0,Valid,Dirty,Realloc,SegRef,CN,CNI,UninCredit,IOLock,Claimed,PhysValid</io.c:368>
+  or better (that was in lafs_iolock_written)
+  [ce5c05e8]0/74(4257)r0E:Pinned,Phase0,WPhase0,Valid,Realloc,SegRef,C,CN,CNI,UninCredit,IOLock,Claimed,PhysValid</file.c:247>
+  FIXED - I didn't unlock if it wasn't dirty any more.
+  Well almost - it occurs much less now.
+  Out of 48 runs:
+      8 BUG: soft lockup - CPU#0 stuck for 61s! [lafs_cleaner:1180]
+      1 BUG: unable to handle kernel NULL pointer dereference at 000001b4
+      2 BUG: unable to handle kernel paging request at 6b6b6bfbt
+      3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:197!
+      2 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:529!6
+      1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/modify.c:1650!
+      2 kernel BUG at /home/neilb/work/nfsbrick/fs/module/modify.c:1696!8
+      4 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028!
+      1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:332!!
+
+  So we now have 1/12 rather than 2/3.
+  a/ pinned by IOLock from file.c:220 - FIXED
+  b/ as above
+  c/  Root is pinned by 4 children
+      328/0  with 196 of data blocks in writeback/realloc, in a cluster
+      0/1, 74/0, 0/8   all in a cluster waiting writeout.
+     Don't understand this.
+  d/ as a,b
+
+  Of the 48, 11 ran to completion leaving blocks from 286 to 899
+  
+
+  Looking at the loss of blocks when truncating.
+   tracing show small number of files with remaining blocks at delete.
+     sum is 26+22+14+272+11+2 == 347 cf df shows cb=457
+   next attempt: 14+24+26*11 =324 cf cb=1124
+   next attempt 26+6+15+68+29 == 144 cf cb=383
+   26+18+14+19+284 = 361 cf 379
+    files are (in order)
+   49    bfile       - 30K
+   325   nbfile-49   - 30K
+   320   nbfile-44   - 30K
+   296   nbfile-20   - 30K
+      ??331??
author	NeilBrown <neilb@suse.de>
	Thu, 10 Jun 2010 06:15:45 +0000 (16:15 +1000)
committer	NeilBrown <neilb@suse.de>
	Sun, 13 Jun 2010 07:20:00 +0000 (17:20 +1000)