From: NeilBrown Date: Mon, 11 Oct 2010 01:47:26 +0000 (+1100) Subject: roll-forwards : review and minor fixes / cleanups. X-Git-Url: http://git.neil.brown.name/?a=commitdiff_plain;h=4855798bf64dac310b835189c73ef6ccbc7a6379;p=LaFS.git roll-forwards : review and minor fixes / cleanups. Now have a nice list of issues to address. Signed-off-by: NeilBrown --- diff --git a/README b/README index 21ee12f..44150df 100644 --- a/README +++ b/README @@ -5167,9 +5167,9 @@ DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first. 15bn/ How does refcounting of 'struct fs' work with multiple filesets? -15bo/ use put_super to drop last refer to superblocks +DONE 15bo/ use put_super to drop last refer to superblocks -15bp/ review all superblocks - maybe use more anon?? +DONE 15bp/ review all superblocks - maybe use more anon?? 15bq/ check readonly status in lafs_get_sb @@ -5213,7 +5213,7 @@ DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first. - unusual conditions I want a warning of - data corruption errors -15cf/ lafs_iget_fs need to sometimes to in-kernel mounts for subset filesystems +DONE 15cf/ lafs_iget_fs need to sometimes to in-kernel mounts for subset filesystems This is needed for the cleaner - the cleaner needs to hold a ref somehow. 15cg/ lafs_sync_inode is weird - why the lafs_checkpoint_start and update_cluster @@ -5291,8 +5291,23 @@ DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first. - how to destroy 36/ review roll-forward - make sure files with nlink == 0 are handled well - sanity check before trusting clusters + +36a/ make sure files with nlink == 0 are handled well +DONE 36b/ sanity check before trusting clusters +36c/ handle miniblocks which create new inodes. +36d/ Handle DescHole in roll_block +36e/ When dirtying a block in roll_block, maybe use writeback rather + than just iolock, for consistency... +37f/ What to do if table becomes full when add_block_address in + roll_block ?? +37g/ Write roll_mini for directories. +37h/ In roll_one, use the cluster counting code to find block number and + make sure we don't exceed the segment. +37i/ add more general error checking to lafs_mount - + lafs_iget orphans and segsum. Check type is correct. + errors from lafs_count_orphans or lafs_add_orphans. + alloc_page failure for chead - maybe allocate something bigger?? + 37/ Configure index block hash_table at run time base on mem size?? diff --git a/inode.c b/inode.c index acc9261..759365c 100644 --- a/inode.c +++ b/inode.c @@ -304,6 +304,8 @@ lafs_import_inode(struct inode *ino, struct datablock *b) = i->quota_inodes[2] = NULL; nlen = li->metadata_size - offsetof(struct la_inode, metadata[0].fs.name); + if (i->name) + kfree(i->name); if (nlen == 0) i->name = NULL; else { diff --git a/roll.c b/roll.c index b28e5e3..ca80cf7 100644 --- a/roll.c +++ b/roll.c @@ -1,11 +1,53 @@ /* * fs/lafs/roll.c - * Copyright (C) 2005-2009 + * Copyright (C) 2005-2010 * Neil Brown * Released under the GPL, version 2 * * 'rollforward' + * + * This file handles mounting of a filesystem once the superblocks + * have been loaded. + * It loads the root inode (the root of the filesystem, not of the + * directory tree) and then handles roll-forward to pick up and changes + * there are not in the filesystem yet, either due to a crash, or because + * they cannot be consistently stored easily (final segusage/quota info). + * + * Roll-forward reads write-cluster header and handle things as appropriate. + * Data blocks are only processed if they belong to: + * - regular files + * - segusage files + * - quota files. + * A data block in a regular file implies an extension of the file size + * to the end of the block, if it was previously at or before the start + * of the block. Datablocks that were just moved for cleaning are + * ignored. + * + * Index blocks are always ignored - they need to be recalculated. + * + * 'miniblocks' or 'updates' are always processed - they represent an + * atomic update that might affect multiple files - those files for which + * data blocks are ignored. + * Updates are understood: + * - for inodes. The update simply over-writes part of the inode metadata, + * which could affect the link count or size. Such inodes become + * orphans in case truncation or deletion is needed. This can create + * an inode which might affect the inode usage map. + * - for directories. The update identifies a name and an inode number. + * This can imply a change to the inode's link count and again could + * make it an orphan. In some cases updates are paired, possibly across + * different directories. This is needed for 'rename'. + * + * Each write-cluster has three levels of validation. + * Firstly, if the header is internally consistent, with correct tag, + * uuid, and sequence, then we know a write was attempted, and anything that + * must be written before that was successfully written. + * Secondly, if the header has a correct checksum, then it is all correct, + * and the miniblocks are valid. + * Thirdly, if the next or next-but-one header (depending on verify_type) is + * internally consistent, than we know that the data blocks in this cluster + * were all written successfully. */ #include "lafs.h" @@ -39,18 +81,23 @@ roll_valid(struct fs *fs, struct cluster_head *ch, unsigned long long addr) } /* + * roll_locate scopes out the full extent of the required roll-forward. + * It start at the start of the last checkpoint (recorded in the stateblock) + * and checks that the end of the checkpoint exists, and continues following + * the chain as far as valid cluster heads can be found. * roll_locate returns 0 if proper endpoints were found, - * or -EINVAL?? if CheckpointStart and CheckpointEnd weren't found properly + * or -EIO if CheckpointStart and CheckpointEnd weren't found properly * "next" will contain the address of the next cluster to be written to, * "last" the cluster before that, and "seq" the seq number for next cluster + * "maxp" will be used to report the maximum size of a cluster head. */ static int roll_locate(struct fs *fs, u64 start, - u64 *next, u64 *lastp, u64 *seqp, + u64 *nextp, u64 *lastp, u64 *seqp, int *maxp, struct page *p) { struct cluster_head *ch; - u64 this, prev, prev2, last; + u64 this, prev, prev2, last, next; u64 seq = 0; int max = 0; int prevtype, prev2type; @@ -59,6 +106,9 @@ roll_locate(struct fs *fs, u64 start, this = start; prev = start; + /* First we walk through the checkpoint section, which should + * all be valid. + */ do { if (lafs_load_page(fs, p, this, 1) != 0) { printk(KERN_ERR "LaFS: Could not read cluster %llu\n", @@ -109,7 +159,7 @@ roll_locate(struct fs *fs, u64 start, /* 'seq' is sequence number of 'this' */ dprintk("CheckpointEnd found at %llu, seq %llu\n", prev, seq-1); - /* now we need to step forward a bit more carefully. as any + /* now we need to step forward a bit more carefully, as any * cluster we find now could easily be bad. * We keep: * this - address of cluster we are now considering @@ -123,7 +173,7 @@ roll_locate(struct fs *fs, u64 start, */ last = prev; - start = this; + next = this; prev2 = prev; prevtype = prev2type = VerifyNull; @@ -136,7 +186,7 @@ roll_locate(struct fs *fs, u64 start, break; if (le64_to_cpu(ch->seq) != seq) break; - /* FIXME check checksum, and possibly VerifySum */ + /* this head looks valid, so we can possibly verify previous * clusters */ @@ -144,32 +194,35 @@ roll_locate(struct fs *fs, u64 start, max = le16_to_cpu(ch->Hlength); if (prev2type == VerifyNext2) { - start = prev; last = prev2; + next = prev; + last = prev2; } if (prevtype == VerifyNext) { - start = this; + next = this; last = prev; } /* shift prev info back */ - prev2 = prev; prev2type = prevtype; - prev = this ; prevtype = le16_to_cpu(ch->verify_type); + prev2 = prev; + prev2type = prevtype; + prev = this; + prevtype = le16_to_cpu(ch->verify_type); this = le64_to_cpu(ch->next_addr); if (prevtype == VerifyNull) { - start = this; + next = this; last = prev; } seq++; } - dprintk("LaFS: Next address to write is %llu\n", start); - *next = start; + dprintk("LaFS: Next address to write is %llu\n", next); + *nextp = next; *lastp = last; - if (start == this) + if (next == this) *seqp = seq; - else if (start == prev) + else if (next == prev) *seqp = seq-1; - else if (start == prev2) + else if (next == prev2) *seqp = seq-2; else BUG(); @@ -213,21 +266,37 @@ roll_mini(struct fs *fs, int fsnum, int inum, int trunc, int flg, li = LAFSI(inode); switch (li->type) { + default: /* Any unknown type is an error */ + printk(KERN_WARNING "LAFS impossibly file type for roll-forward: %d\n", + li->type); + err = -EIO; + break; + case TypeInodeFile: - BUG_ON(fsnum); /* FIXME should be more careful */ + if (fsnum) { + printk(KERN_WARNING "LAFS: Ignoring impossible sub-subset\n"); + break; + } + lafs_iput_fs(inode); inode = lafs_iget_fs(fs, inum, bnum, SYNC); if (IS_ERR(inode)) { err = PTR_ERR(inode); - if (err == -EIO && offset == 0) { - /* creating new inode */ - } - return PTR_ERR(inode); + if (err != -ENOENT || offset != 0) + return err; + + /* FIXME creating new inode */ + BUG(); } db = lafs_inode_dblock(inode, SYNC, MKREF(roll)); + if (IS_ERR(db)) { + err = PTR_ERR(db); + break; + } + /* Make sure block is in-sync with inode */ + lafs_inode_fillblock(inode); buf = map_dblock(db); - /* FIXME do I sync the inode back to the datablock first? */ memcpy(buf+offset, data, len); unmap_dblock(db, buf); err = lafs_import_inode(inode, db); @@ -243,6 +312,11 @@ roll_mini(struct fs *fs, int fsnum, int inum, int trunc, int flg, } putdref(db, MKREF(roll)); break; + + case TypeDir: + /* Haven't written this yet FIXME */ + BUG(); + break; } lafs_iput_fs(inode); return err; @@ -250,7 +324,7 @@ roll_mini(struct fs *fs, int fsnum, int inum, int trunc, int flg, static int __must_check roll_block(struct fs *fs, int fsnum, int inum, int trunc, int flg, - u32 bnum, u64 baddr, int type, struct page *p) + u32 bnum, u64 baddr, int bytes, struct page *p) { struct inode *inode; struct datablock *blk = NULL; @@ -259,9 +333,9 @@ roll_block(struct fs *fs, int fsnum, int inum, int trunc, int flg, if (flg) return 0; /* "old" blocks aren't interesting */ - if (type == DescIndex) + if (bytes == DescIndex) return 0; /* index blocks aren't interesting either */ - if (type == DescHole) + if (bytes == DescHole) return 0; /* FIXME should I punch a hole here? */ dprintk("Roll Block %d/%d/%d/%lu/%llu\n", @@ -273,8 +347,6 @@ roll_block(struct fs *fs, int fsnum, int inum, int trunc, int flg, if (IS_ERR(inode)) return PTR_ERR(inode); - /* FIXME do I need to 'lock' the inode in any way? */ - /* check type */ li = LAFSI(inode); @@ -342,12 +414,11 @@ roll_block(struct fs *fs, int fsnum, int inum, int trunc, int flg, /* already correctly indexed */ break; - /* FIXME do I need to dirty the inode to flush - * this change into the datablock? - */ if (li->type >= TypeBase && - inode->i_size <= ((loff_t)bnum << inode->i_blkbits)) - inode->i_size = ((loff_t)bnum << inode->i_blkbits) + type; + inode->i_size <= ((loff_t)bnum << inode->i_blkbits)) { + inode->i_size = ((loff_t)bnum << inode->i_blkbits) + bytes; + set_bit(I_Dirty, &LAFSI(inode)->iflags); + } /* FIXME: we pretend this is a dirty, pinned block * so the lower-level code doesn't get confused. diff --git a/super.c b/super.c index bdfbe1c..5ec7029 100644 --- a/super.c +++ b/super.c @@ -1250,6 +1250,7 @@ static struct inode *lafs_alloc_inode(struct super_block *sb) li->iblock = NULL; li->dblock = NULL; li->update_cluster = 0; + li->md.fs.name = NULL; init_rwsem(&li->ind_sem); INIT_LIST_HEAD(&li->free_index);