From fcdcb2b2806b5b3a01306eaf4637bb701899b8a9 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 10 Jun 2010 16:15:45 +1000 Subject: [PATCH] README update --- README | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) diff --git a/README b/README index 2a67921..07f805c 100644 --- a/README +++ b/README @@ -3905,5 +3905,235 @@ FIXED orphans don't get cleaned up. It seems a 'create' fails and leaves Now cluster.c:619: [ce9233f8]0/282(0)r2F:Pinned,Phase1,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1) cluster.c:619: [ce570a18]0/286(0)r2F:Pinned,Phase1,PinPending,SegRef,CN,CNI,Claimed,PhysValid iblock(1) leaf(1) + cluster.c:619: [ce588d6c]0/17(0)r2F:Pinned,Phase0,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1) + cluster.c:619: [ce51dfe4]0/283(0)r2F:Pinned,Phase0,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1) + cluster.c:619: [cfbb8430]0/328(0)r2F:Pinned,Phase1,PinPending,CN,CNI,Claimed,PhysValid iblock(1) leaf(1) We are allocating an InoIdx block, but data block is not valid?? + That isn't very reproducible so I'll have to leave it for now... + erasedblock had been called on the data block .. inode 17?? + + Problem is that I keep changing the rules. + I don't erase the InoIdx block any more. + I used to, then change it to iolock_block/cluster_allocate->0 + + Problem: When all files are removed, usage is still quite high, two + segments have over 400 blocks (out of 512). Cleaning keeps running and + not making much progress. + segment 6 has usage of 484. + 'cluster 3072' shows: cluster 3072, 3085, 3086 3092 + Inode 0: blocks 267 272 276 + Inode 277: blocks 0/4 6/2 + Inode 0: blocks 0/2 8 16 + Inode 0: block 16 70/2 131/3 135/4 140/9 150/2 ... 296/7 + Inode 16: 1/1 + Inode 17: 0/28 + Inode 283: 12/18 + etc. + + All 'old', so must be the product of cleaning, as you would expect. + All (most) of this has been deleted though, but count didn't drop. + 'Count' add to 508, plus the 4 cluster heads makes 512 - good. + lafs_seg_move definitely isn't being called on these blocks. + it is only called from lafs_summary_update + cblocks_used "exactly" matches the number of un-removed blocks. + + + Another problem +bad [ce5bcf50]301/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +/home/neilb/work/nfsbrick/fs/module/modify.c:1652: [ce5bcf50]301/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +bad [cfbf6000]327/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +/home/neilb/work/nfsbrick/fs/module/modify.c:1656: [cfbf6000]327/0(0)r1E:Index(0),Pinned,Phase1,WPhase0,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +bad [cfb62d20]291/0(0)r1E:Index(0),Pinned,Phase1,WPhase1,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +/home/neilb/work/nfsbrick/fs/module/modify.c:1656: [cfb62d20]291/0(0)r1E:Index(0),Pinned,Phase1,WPhase1,InoIdx,Valid,SegRef,C,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) + + and +free_blocks=1842 allocated=449 max_seg=512 clean_reserved=0 +Want dump of usage + +------------[ cut here ]------------ +kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028! + free list is empty - that should not be. + +and another... +/home/neilb/work/nfsbrick/fs/module/modify.c:1219: [ce9893b0]74/0(0)r1E:Index(1),Pinned,Phase0,WPhase1,InoIdx,Valid,Dirty,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) +/home/neilb/work/nfsbrick/fs/module/modify.c:1219: [ce5ba690]74/0(0)r1E:Index(1),Pinned,Phase1,WPhase0,InoIdx,Valid,Dirty,CN,CNI,UninCredit,IOLock,PhysValid{0,0}[0] leaf(1) + [] ? lafs_get_flushable+0x131/0x191 [lafs] + [] ? lafs_do_checkpoint+0x1b3/0x3a2 [lafs] + [] ? cleaner+0x105/0x1426 [lafs] + [] ? autoremove_wake_function+0x0/0x33 + [] ? cleaner+0x0/0x1426 [lafs] + + +08Jun2010 + Weirdness with truncating. + The cleaner relocates a file resulting in the InoIdx block being + Maybe-dirty and phys_addr == 0. + Then truncate doesn't prune but just incorporates, finding + something weird there.. + file 278, blocks around 4100 + seem to find 1949 instead?? + + Note: When a non-InoIdx block is erased we set PhysValid + and physaddr == 0 to record the fact because it will not be stored... + +modify.c:1654: [ce5b4460]327/336(16)r4F:Index(1),Pinned,Phase0,WPhase1,Valid,Async,SegRef,C,CI,CN,CNI,IOLock,PhysValid{0,0}[0] leaf(1) inode_handle_orphan2(1) async(1) inode_handle_orphan3(1) +Async ?? +modify.c:1657: [cfb90690]327/340(787)r4F:Index(1),Pinned,Phase1,WPhase0,Valid,Async,SegRef,C,CI,CN,CNI,IOLock,PhysValid{0,0}[0] leaf(1) inode_handle_orphan2(1) async(1) inode_handle_orphan3(1) +Still Async ... wonder what it means. + +- directory block got corrupted. Maybe conversion to indexed?? + + +Getting bug in remove_from_index because the addr isn't +there, possibly block is empty. But incorporation is +??? instant? No it isn't. +If an index block hasn't be incorporated it has B_PrimaryRef +set as it hold a ref to something earlier index. +But what if nothing is incorporated? + + +Allocated [ce402230]328/340(0)r5F:Index(1),Pinned,Phase0,WPhase0,Valid,Dirty,Async,SegRef,CN,CNI,UninCredit,IOLock,PhysValid,Uninc{0,0}[0] inode_handle_orphan2(1) leaf(1) uninc(1) async(1) inode_handle_orphan3(1) -> 0 +looping on [ce402230]328/340(0)r5F:Index(1),Pinned,Phase0,Valid,Dirty,Writeback,Async,SegRef,CN,CNI,UninCredit,PhysValid,Uninc{0,0}[0] inode_handle_orphan2(1) cluster(1) uninc(1) async(1) inode_handle_orphan3(1) + +Then spin in a soft-lockup in lafs_inode_handle_orphan + + +----------- + - grow_index_tree needs to do initial incorporation so things can be found. + just like end of do_incorporate_internal. + NO - cannot incorp yet as do not have phys addr. Don't need to as + lafs_leaf_find explicitly handles this. + For truncate case we don't use the stored address, but ensure all + leaf indexes must be dirty (or gone) so whole tree must be + accessible for walking around. + - do_incorporate_internal needs to set B_PrimaryRef and take the ref + - when we remove a B_PrimaryRef without incorporating it, we need to + drop a ref if the *next* in the list is B_PrimaryRef + - need to use a constant to identify 'async' calls etc. + - maybe I need other iolock_block in truncate ?? to ensure it is Valid so + it isn't found as async.... + +09Jun2010 + STILL struggling with incorporation. + We have a premise that any file address is coverred by precisely + one leaf index block. Every leaf index has an implicit address + and it covers all addresses from there to the next leaf. The last + leaf covers to EOF. + So there must always be a leaf at address 0. + This applies within the tree from an internal index block too. + Beneath an internal index block there must be a leaf covering every + address up to the next internal index block. So there must be + a first. So storing the first address is pointless. And harmful. + When an index block becomes empty and disappears its coverage is + included in the previous block unless there is none, in which case + the next index block must be re-addressed. If there is no 'next', + this index block must be empty and so must disappear. + + BUT if we re-address an index block, we implicitly re-address the + first child - recursively - so we need to move/rehash them all + or lose them... or record where they are. Or do lookup not by + addr.... + I think just rehashing them all - with an iolock - is simple + and safe. So just do that. + + + So: I cleaned up index handling a truncation somewhat. + Now running looptest to see what patterns emerge: + + block.c:197 (*9+1) During umount, the Root datablock is + Dirty+Realloc + Maybe just need for cleaner to become inactive + during umount - hope that doesn't deadlock + didn't event work... + block.c:529 (*4+1) erase dblock while iblock depth > 0 + When pruning InoIdx we want to set depth to 0. + FIXME is this really want I want, or is depth=0 + only for data-inode ... FIXME + cluster.c:533 (*2) cluster_allocate on invalid block + Block is 8/0 in writepage from sync_inodes + This is the orphan file. + blocks aren't dirty + I guess the file gets truncated while we wait for it. + Just need to re-test. + index.c:1936 (*2). An index block is Root - FIXED?? + modify.c:1056 - secondary bug, ignore for now. + modify.c:1650 update_index fails to find target. + second call, phys==0 + Code was bad ... may not be the cause though. + modify.c:1696 (*4) lafs_incorporate gets non-dirty Index(1) block + from orphan handler. + Maybe just change the do/while back to 'do'. + modify.c:1704: (*2) lafs_inc gets leaf with uninc list??? + Index(0)/InoIdx + in do_checkpoint + uninc list gets set in lafs_add_block_address (parent of iblk), + do_incorporate_internal, + Maybe the InoIdx still had children. + segments.c:1028. (*4) The free list becomes empty. + super.c:655 (*3) Busy inodes after umount, and root InoIdx block + is still pinned as inode 16 data block was still dirty. + segusage slow. Maybe same as block.c:197 ?? + invalid address 6b6b6bfb: invalidate_inode_buffers in shutdown + finds invalid lock. + presumably the inodes was freed before invalidated. + spin on writeback during truncate (r3a) 8 times. now 10 + Probably because writeback cannot proceed while + orphan processing keeps looping. + kmalloc-1024 problems - (*2) + A block - should be start of page - isn't not what it appears... + + Others complete with 'cb' ranging from 202 to 715 + + +10 June 2010 + + Looking at segment.c:1028 + We run a seg_scan every checkpoint, so that should keep free segments + in the list..... + Ahh.. do_checkpoint is looping because root isn't changing phase. + + Lowest block pinned to old phase is + [cfb7df08]0/74(4253)r0E:Pinned,Phase1,WPhase0,Valid,Dirty,CN,CNI,UninCredit,IOLock,Claimed,PhysValid + which is not on leaf list because it has IOLock + With more debugging: + [ce5c5f08]0/74(4250)r0E:Pinned,Phase1,WPhase0,Valid,Dirty,Realloc,SegRef,CN,CNI,UninCredit,IOLock,Claimed,PhysValid + or better (that was in lafs_iolock_written) + [ce5c05e8]0/74(4257)r0E:Pinned,Phase0,WPhase0,Valid,Realloc,SegRef,C,CN,CNI,UninCredit,IOLock,Claimed,PhysValid + FIXED - I didn't unlock if it wasn't dirty any more. + Well almost - it occurs much less now. + Out of 48 runs: + 8 BUG: soft lockup - CPU#0 stuck for 61s! [lafs_cleaner:1180] + 1 BUG: unable to handle kernel NULL pointer dereference at 000001b4 + 2 BUG: unable to handle kernel paging request at 6b6b6bfbt + 3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:197! + 2 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:529!6 + 1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/modify.c:1650! + 2 kernel BUG at /home/neilb/work/nfsbrick/fs/module/modify.c:1696!8 + 4 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028! + 1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:332!! + + So we now have 1/12 rather than 2/3. + a/ pinned by IOLock from file.c:220 - FIXED + b/ as above + c/ Root is pinned by 4 children + 328/0 with 196 of data blocks in writeback/realloc, in a cluster + 0/1, 74/0, 0/8 all in a cluster waiting writeout. + Don't understand this. + d/ as a,b + + Of the 48, 11 ran to completion leaving blocks from 286 to 899 + + + Looking at the loss of blocks when truncating. + tracing show small number of files with remaining blocks at delete. + sum is 26+22+14+272+11+2 == 347 cf df shows cb=457 + next attempt: 14+24+26*11 =324 cf cb=1124 + next attempt 26+6+15+68+29 == 144 cf cb=383 + 26+18+14+19+284 = 361 cf 379 + files are (in order) + 49 bfile - 30K + 325 nbfile-49 - 30K + 320 nbfile-44 - 30K + 296 nbfile-20 - 30K + ??331?? -- 2.39.5