[PATCH] kNFSd: NFSv4: tweak nfsd_readdir() for NFSv4
This patch makes three small changes to nfsd_readdir().
First, the 'filldir' routine for NFSv4 may return an arbitrary error,
which should become the return value for nfsd_readdir(). I implemented
this by adding an 'nfserr' field to the 'struct readdir_cd'.
Second, in NFSv4 the caller of nfsd_readdir() will specify an attribute
bitmap, which must be communicated to the 'filldir' routine. I implemented
this by adding a @bitmap parameter to nfsd_readdir() and a corresponding
field in the 'struct readdir_cd'. (The bitmap is not interpreted in any
way by nfsd_readdir().)
Finally, NFSv4 defines a new error nfserr_readdir_nospc, which indicates
that there was not enough buffer space to encode a single entry.
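A minimal userspace sketch of the first two changes, not the kernel code: the struct, the callback type, the demo callback and the error value 22 are all hypothetical stand-ins. It shows the callback recording an arbitrary error in the 'nfserr' field, and the bitmap being carried to the callback without being interpreted by the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for 'struct readdir_cd'. */
struct readdir_cd_sketch {
    int nfserr;             /* error reported by the filldir callback */
    const uint32_t *bitmap; /* passed through, never interpreted here */
};

typedef int (*filldir_sketch_t)(struct readdir_cd_sketch *cd,
                                const char *name);

/* Feed each name to the callback; stop at the first failure and
 * return whatever error the callback stored in cd.nfserr. */
static int readdir_sketch(const char *const *names, size_t count,
                          const uint32_t *bitmap, filldir_sketch_t fill)
{
    struct readdir_cd_sketch cd = { .nfserr = 0, .bitmap = bitmap };

    for (size_t i = 0; i < count; i++) {
        if (fill(&cd, names[i]) != 0)
            break;
    }
    return cd.nfserr;
}

/* Example callback: rejects empty names with a made-up error value. */
static int demo_fill(struct readdir_cd_sketch *cd, const char *name)
{
    if (name[0] == '\0') {
        cd->nfserr = 22; /* arbitrary illustrative value */
        return -1;
    }
    return 0;
}
```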
[PATCH] kNFSd: NFSv4: new argument to nfsd_access()
NFSv4 defines a new field in the ACCESS response: a bitmap to indicate
which access bits requested by the client are "supported", i.e. meaningful
for the object in question.
This patch adds a new parameter @supported to nfsd_access(), so that
nfsd_access() can set the value of this bitmap.
[PATCH] kNFSd: NFSv4: tweak nfsd_create_v3() for NFSv4
File creation in NFSv4 is almost the same as in NFSv3, with one minor
difference. If an UNCHECKED create is done, and the file exists, we
don't set any attributes. Exception: If size==0 is specified as part
of the attributes, then we do truncate the file, but only after processing
the rest of the OPEN. (File creation is always part of an OPEN request.)
This patch defines a new argument *truncp to nfsd_create_v3(), which
will be NULL for v3 requests. For v4 requests, it will point to a
variable which should be set to 1 if file truncation is still needed.
The logic in nfsd_create_v3() is changed as follows: If
- *truncp is not NULL
- the create is UNCHECKED
- the file exists
then nfsd_create_v3() returns immediately. If size==0 is specified,
then *truncp is set to 1.
This is kind of a hack, but the only alternative I could see was creating
a new routine nfsd_create_v4(), which would be identical to nfsd_create_v3()
except for this point.
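The early-return rule above can be sketched as a standalone predicate. This is only an illustration of the three conditions; the enum, the function name and the boolean inputs are hypothetical and do not match the real nfsd_create_v3() signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical create modes; only UNCHECKED matters for this sketch. */
enum create_mode_sketch {
    CREATE_UNCHECKED_SKETCH,
    CREATE_GUARDED_SKETCH,
    CREATE_EXCLUSIVE_SKETCH
};

/* Returns true when the create should return immediately without
 * touching attributes. Sets *truncp to 1 if a deferred truncate is
 * still needed (size==0 was requested). */
static bool create_early_return(int *truncp, enum create_mode_sketch mode,
                                bool file_exists, bool size_zero_requested)
{
    if (truncp == NULL)                  /* v3 caller: old behaviour */
        return false;
    if (mode != CREATE_UNCHECKED_SKETCH)
        return false;
    if (!file_exists)
        return false;

    if (size_zero_requested)
        *truncp = 1;                     /* caller truncates after OPEN */
    return true;
}
```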
[PATCH] kNFSd: NFSv4: allow type==0 in nfsd_unlink()
If nfsd_unlink() is called with @type equal to 0, then let it do the
right thing regardless of the type of the file being unlinked. This
is needed for the NFSv4 REMOVE operation, which works for any type of
file, even directories.
[PATCH] kNFSd: NFSv4: overflow check in nfsd_commit()
Sanity check COMMIT arguments by ensuring that (start)+(length) < 2^64.
The check is done in a way which is free of signedness pathologies in
all cases.
This change was inspired by pynfs, Peter Astrand's regression testsuite
for NFSv4 servers. The change is necessary for all of the COMMIT tests
to pass. However, it's a little open to debate whether the change is
really needed. I'm curious to hear the opinions of other developers.
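For reference, a check of this kind can be written with unsigned arithmetic only, so no intermediate value can wrap or go negative. This is a sketch under the assumption that both arguments are unsigned 64-bit quantities; the function name is hypothetical and the form of the check in the actual patch may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when start + length < 2^64, i.e. the sum fits in 64 bits.
 * Rearranged as length <= UINT64_MAX - start so that nothing on
 * either side of the comparison can overflow. */
static bool commit_range_ok(uint64_t start, uint64_t length)
{
    return length <= UINT64_MAX - start;
}
```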
[PATCH] kNFSd: NFSv4: allow resfh==fhp in fh_compose()
Change fh_compose() so that it will do the right thing if fhp==res_fh.
(This is convenient in the NFSv4 LOOKUP operation, which _replaces_
CURRENT_FH with the filehandle obtained by lookup.)
This requires extracting the interesting parts of the reference
filehandle first, then calling fh_put if it is a re-use.
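The ordering matters because releasing the result handle first would destroy the reference data when both pointers are the same. A toy sketch of that ordering, using a hypothetical three-field handle rather than the real struct svc_fh:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified filehandle. */
struct fh_sketch {
    int export_id; /* stands in for the "interesting parts" */
    int ino;
    int held;      /* nonzero while the handle holds references */
};

static void fh_put_sketch(struct fh_sketch *fh)
{
    fh->export_id = 0;
    fh->ino = 0;
    fh->held = 0;
}

/* Fill 'res' for object 'new_ino', inheriting the export from 'ref'.
 * Safe when res == ref: the needed field is copied out before the
 * old contents of 'res' are released. */
static void fh_compose_sketch(struct fh_sketch *res, int new_ino,
                              const struct fh_sketch *ref)
{
    int export_id = ref->export_id; /* extract first */

    if (res == ref)
        fh_put_sketch(res);         /* then drop the re-used handle */

    res->export_id = export_id;
    res->ino = new_ino;
    res->held = 1;
}
```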
[PATCH] kNFSd: NFSv4: wipe out all evidence in fh_put()
When a filehandle is cleared with fh_put(), wipe out all traces by
clearing ->fh_pre_saved and ->fh_post_saved. This prevents
fill_post_wcc() from complaining if the filehandle is later reused.
(This could happen in NFSv4 to CURRENT_FH if, for example, LOOKUP LOOKUP
occurs in a COMPOUND.)
[PATCH] kNFSd: NFSv4: fix type checking in fh_verify()
Change the type checking in fh_verify(). This fixes a bug which
I reported on the mailing list a few days ago, and also adds a
new error code nfserr_symlink (v4 only). This is returned whenever
an operation which is illegal for symlinks is attempted on a symlink,
and takes precedence over ERR_NOTDIR or ERR_INVAL.
Neil Brown [Fri, 23 Aug 2002 04:27:25 +0000 (21:27 -0700)]
[PATCH] md: Remove per-personality 'operational' and 'write_only' flags
raid1, raid5 and multipath maintain their own
'operational' flag. This is equivalent to
!rdev->faulty
and so isn't needed.
Similarly raid1 and raid5 maintain a "write_only" flag
that is equivalent to
!rdev->in_sync
so it isn't needed either.
As part of implementing this change, we introduce some extra
flag bits in raid5 that are meaningful only inside 'handle_stripe'.
Some of these replace the "action" array which recorded what
actions were required (and would be performed after the stripe
spinlock was released). This has the advantage of reducing our
dependence on MD_SB_DISKS which personalities shouldn't need
to know about.
Neil Brown [Fri, 23 Aug 2002 04:26:52 +0000 (21:26 -0700)]
[PATCH] md: Make spare handling simple ... personalities know less
1/ Personalities only know about raid_disks devices.
Some might not be in_sync and so cannot be read from,
but must be written to.
- change MD_SB_DISKS to ->raid_disks
- add tests for .write_only
2/ rdev->raid_disk is now -1 for spares. desc_nr is maintained
by analyse_sbs and sync_sbs.
3/ spare_inactive method is subsumed into hot_remove_disk
spare_writable is subsumed into hot_add_disk.
hot_add_disk decides which slot a new device will hold.
4/ spare_active now finds all non-in_sync devices and marks them
in_sync.
5/ faulty devices are removed by the md recovery thread as soon
as they are idle. Any spares that are available are then added.
Neil Brown [Fri, 23 Aug 2002 04:26:27 +0000 (21:26 -0700)]
[PATCH] md: Keep track of number of pending requests on each component device on an MD array
This will allow us to know, in the event of a device failure, when the
device is completely unused and so can be disconnected from the
array. Currently this isn't a problem as drives aren't normally disconnected
until after a replacement has been rebuilt, which is a LONG TIME, but that
will change shortly...
We always increment the count under a spinlock after checking that
it hasn't been disconnected already (rdev != NULL).
We disconnect under the same spinlock after checking that the
count is zero.
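The protocol described above can be sketched in userspace with a mutex standing in for the spinlock. All names here are hypothetical, and for simplicity the decrement is also done under the lock, which the description does not require.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct rdev_sketch {
    int nr_pending;           /* requests currently in flight */
};

struct slot_sketch {
    pthread_mutex_t lock;     /* stands in for the spinlock */
    struct rdev_sketch *rdev; /* NULL once disconnected */
};

static void slot_init(struct slot_sketch *s, struct rdev_sketch *rdev)
{
    pthread_mutex_init(&s->lock, NULL);
    s->rdev = rdev;
}

/* Take a reference for a new request; NULL if already disconnected. */
static struct rdev_sketch *slot_get(struct slot_sketch *s)
{
    struct rdev_sketch *r;

    pthread_mutex_lock(&s->lock);
    r = s->rdev;
    if (r != NULL)
        r->nr_pending++;
    pthread_mutex_unlock(&s->lock);
    return r;
}

/* Request finished: drop the reference taken by slot_get(). */
static void slot_put(struct slot_sketch *s, struct rdev_sketch *r)
{
    pthread_mutex_lock(&s->lock);
    r->nr_pending--;
    pthread_mutex_unlock(&s->lock);
}

/* Disconnect only if nothing is pending; returns true on success. */
static bool slot_disconnect(struct slot_sketch *s)
{
    bool done = false;

    pthread_mutex_lock(&s->lock);
    if (s->rdev != NULL && s->rdev->nr_pending == 0) {
        s->rdev = NULL;
        done = true;
    }
    pthread_mutex_unlock(&s->lock);
    return done;
}
```

Because the NULL check and the increment share one critical section with the disconnect, a request can never take a reference on a device that has just been detached.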
Neil Brown [Fri, 23 Aug 2002 04:21:49 +0000 (21:21 -0700)]
[PATCH] call svc_sock_setbufsize when socket created.
bufsiz is re-evaluated on recv if SK_CHNGBUF is set,
but recv will never be reached if the buffers are too small.
So we have to set it to something vaguely reasonable
at init time.
Neil Brown [Fri, 23 Aug 2002 04:21:39 +0000 (21:21 -0700)]
[PATCH] Fix two problems with multiple concurrent nfs/tcp connects.
1/ connect requests would get lost...
As the comment at the top of svcsock.c says when discussing
SK_CONN:
* after a set, svc_sock_enqueue must be called.
We didn't and so lost connection requests.
2/ set the max accept backlog to a more reasonable number to cope
with bursts of lots of connection requests.
Andrew Morton [Tue, 20 Aug 2002 10:45:33 +0000 (03:45 -0700)]
[PATCH] Fix YA bug in __page_cache_release
__page_cache_release() needs to check PG_lru inside the lock, because
page reclaim may have taken the page off the LRU while this CPU waits
on the lock.
That's three bugs in a single twenty-line function. So far.
Marcus Alanen [Tue, 20 Aug 2002 10:45:28 +0000 (03:45 -0700)]
[PATCH] vmalloc.c error path fixes
This fixes some problems in vmalloc.c. The first two parts of the diff
fix a spinlock being held if an error occurs in map_vm_area, and the
last part fixes the error path of __vmalloc.
- Use req->rq_received to determine the message length instead of
assuming that it goes to the end of the page.
- If the server returned an illegal record so that we cannot make
progress by retrying the request on a fresh page, truncate the
entire listing and return a syslog error.
Trond Myklebust [Tue, 20 Aug 2002 05:24:14 +0000 (22:24 -0700)]
[PATCH] Improve NFS READ reply sanity checking
- Fix the check for whether or not the received message length has
somehow been truncated: we need to use req->rq_received rather
than the receive buffer length (req->rq_rlen).
- Ensure that we set res->eof correctly. In particular, we need to
clear it if we find ourselves attempting to recover from a
truncated READ.
- Don't set PageUptodate() on those pages that are the victim of
message truncation.
Robert Love [Tue, 20 Aug 2002 05:23:02 +0000 (22:23 -0700)]
[PATCH] spinlock.h cleanup
- cleanup #defines: I do not follow the rationale behind the
odd line-wrapped defines at the beginning of the file. If
we have to use multiple lines, then we might as well do so
cleanly and according to normal practice...
- Remove a level of indirection: do not have spin_lock_foo
use spin_lock - just explicitly call what is needed.
- we do not need to define the spin_lock functions twice, once
for CONFIG_PREEMPT and once for !CONFIG_PREEMPT. Defining
them once with the preempt macros will optimize away fine.
- cleanup preempt.h too
- other misc. cleanup, improved comments, reordering, etc.
This fixes the ptrace wait4() anomaly that can be observed in any
previous Linux kernel I could get my hands on.
If the parent still has other children (that are being traced by
somebody), we wait for them or return immediately without an error in
case of WNOHANG.
Dave Jones [Tue, 20 Aug 2002 01:10:58 +0000 (18:10 -0700)]
[PATCH] struct superblock cleanups.
Finally, this chunk removes the references to the UFS & ROMFS
entries in struct superblock, leaving just ext3 and hpfs as
the only remaining fs's to be fixed up.
Andrew Morton [Mon, 19 Aug 2002 13:04:50 +0000 (06:04 -0700)]
[PATCH] fix uniprocessor lockups
I have a test_and_set_bit(PG_chainlock, page->flags) in page reclaim.
Which works fine on SMP. But on uniprocessor, we made
pte_chain_unlock() a no-op, so all pages end up with PG_chainlock set.
refill_inactive() cannot move any pages onto the inactive list and the
machine dies.
The patch removes the test_and_set_bit optimisation in there and just
uses pte_chain_lock(). If we want that (dubious) optimisation back
then let's do it right and create pte_chain_trylock().
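A rough sketch of what a pte_chain_trylock() consistent with a no-op unlock might look like. This is not kernel code: the SKETCH_SMP switch, the struct and the C11 atomics are assumptions made for illustration. The point is that on uniprocessor builds the trylock must be a no-op that always succeeds, so the bit is never left set.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_CHAINLOCK_SKETCH (1UL << 0)

struct page_sketch {
    atomic_ulong flags;
};

#ifdef SKETCH_SMP
/* SMP: atomically set the bit; succeed only if it was clear before. */
static bool pte_chain_trylock_sketch(struct page_sketch *page)
{
    unsigned long old = atomic_fetch_or(&page->flags, PG_CHAINLOCK_SKETCH);

    return (old & PG_CHAINLOCK_SKETCH) == 0;
}

static void pte_chain_unlock_sketch(struct page_sketch *page)
{
    atomic_fetch_and(&page->flags, ~PG_CHAINLOCK_SKETCH);
}
#else
/* UP: unlock is a no-op, so trylock must not set the bit either. */
static bool pte_chain_trylock_sketch(struct page_sketch *page)
{
    (void)page;
    return true;
}

static void pte_chain_unlock_sketch(struct page_sketch *page)
{
    (void)page;
}
#endif
```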
Andrew Morton [Mon, 19 Aug 2002 13:04:46 +0000 (06:04 -0700)]
[PATCH] Fix a race between __page_cache_release() and shrink_cache()
__page_cache_release() needs to recheck the page count inside the LRU
lock, because shrink_cache() may have found the page on the LRU and
incremented its refcount again.
Which is carefully documented over __pagevec_release(). Duh.
- adds cleanups suggested by Christoph Hellwig: needed unlikely()
statements, a superfluous #define and line length problems.
- splits up the global ptrace list into per-task ptrace lists. This was
pretty straightforward, and this makes the worst-case exit() latency
O(nr_children).
the per-task ptrace lists unearthed a bug that the previous code did not
take care of: tasks on the ptrace list have to be correctly reparented as
well. This patch passed my stresstests as well.
Ingo Molnar [Sat, 17 Aug 2002 05:10:35 +0000 (22:10 -0700)]
[PATCH] Thread exit notification by futex
This updates the CLONE_CLEARTID case to use futexes to make it easier
to wait for a thread exit.
glibc/pthreads had been updated to use the TID-futex, this removes an
extra system-call and it also simplifies the pthread_join() code. The
pthreads testcode works just fine with the new kernel and does not work
with a kernel that does not do the futex wakeup, so it's working fine.
Petr Vandrovec [Fri, 16 Aug 2002 09:57:02 +0000 (02:57 -0700)]
[PATCH] More display -> fb_info fixes for new fbdev
This is the second part of "broken cfb* support in the 2.5.31-bk". I
needed fbcon-cfb2 on one of my systems, and so I went through all
fbcon-* drivers and fixed them.
line_length, type, type_aux and visual were moved from display to
fb_info in James Simmons's last fbdev update. Unfortunately lowlevel
support modules were not updated.
Alexander Viro [Fri, 16 Aug 2002 02:57:50 +0000 (19:57 -0700)]
[PATCH] per-disk gendisks in i2o
Note: I've also fixed several obvious "forgot to update" problems (changed
prototype of blk_init_queue(), etc.) but I hadn't touched the DMA-mapping
stuff, so it still doesn't work with 2.5; moreover, it misses a lot of fixes
done in 2.4, but that's fun for Alan - he's the maintainer
Andrew Morton [Thu, 15 Aug 2002 13:40:14 +0000 (06:40 -0700)]
[PATCH] memory leak in current BK
Well I didn't test that very well. __page_cache_release() is doing a
__free_page() on a zero-ref page, so __free_pages() sends the refcount
negative and doesn't free it. With patch #8, page_cache_release()
almost never frees pages, but it must have been leaking a little bit.
Lucky it showed up.
This fixes it, and also adds a missing PageReserved test in put_page().
Which makes put_page() identical to page_cache_release(), but there are
header file woes. I'll fix that up later.
Ingo Molnar [Thu, 15 Aug 2002 09:48:18 +0000 (02:48 -0700)]
[PATCH] thread management - take three
you have applied my independent-pointer patch already, but i think your
CLEARTID variant is the most elegant solution: it reuses a clone argument,
thus reduces the number of arguments and it's also a nice conceptual pair
to the existing SETTID call. And the TID field can be used as a 'usage'
field as well, because the TID (PID) can never be 0, reducing the number
of fields in the TCB. And we can change the userspace locking code to use
the TID field no problem.
Martin Mares [Thu, 15 Aug 2002 04:34:02 +0000 (21:34 -0700)]
[PATCH] PCI ID's for 2.5.31
I've filtered all submissions to the ID database, merged new ID's from
both 2.4.x and 2.5.x kernels and here is the result -- patch to 2.5.31
pci.ids with all the new stuff. Could you please send it to Linus?
(I would do it myself, but it seems I'll have a lot of work with the
floods in Prague very soon.)
Keith Mannthey [Thu, 15 Aug 2002 04:31:37 +0000 (21:31 -0700)]
[PATCH] for i386 SETUP CODE
The following is a simple fix for an array overrun problem in
mpparse.c. I am working on a multiquad box which has an EISA bus in it
for its service processor. Its local bus number is 18 which is > 3
(see quad_local_to_mp_bus_id). When NR_CPUS is close to the real
number of cpus, adding the EISA bus #18 in the array stomps all over
various things in memory. The EISA bus does not need to be mapped
anywhere in the kernel for anything. This patch will not affect non
clustered apic (multiquad) kernels.
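The overrun is the usual unchecked-index problem. A generic sketch of guarding such a table follows; the table dimensions and function name are assumptions (the description only says that 18 is > 3), and the actual patch may avoid the write differently, for example by skipping the EISA bus by type.

```c
#include <stdbool.h>

#define MAX_QUADS_SKETCH            4 /* assumed, for illustration */
#define LOCAL_BUSES_PER_QUAD_SKETCH 4 /* assumed: local ids 0..3 */

static int quad_local_to_mp_bus_id_sketch[MAX_QUADS_SKETCH]
                                         [LOCAL_BUSES_PER_QUAD_SKETCH];

/* Record a bus mapping, refusing indices that fall outside the table
 * instead of writing past its end. */
static bool record_local_bus(int quad, int local_bus, int mp_bus_id)
{
    if (quad < 0 || quad >= MAX_QUADS_SKETCH)
        return false;
    if (local_bus < 0 || local_bus >= LOCAL_BUSES_PER_QUAD_SKETCH)
        return false;

    quad_local_to_mp_bus_id_sketch[quad][local_bus] = mp_bus_id;
    return true;
}
```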
Trond Myklebust [Thu, 15 Aug 2002 04:29:56 +0000 (21:29 -0700)]
[PATCH] Clean up the RPC socket slot allocation code [2/2]
Patch by Chuck Lever. Remove the timeout logic from call_reserve.
This improves the overall RPC call ordering, and ensures that soft
tasks don't time out and give up before they have attempted to send
their message down the socket.