David Brownell [Tue, 29 Oct 2002 15:43:32 +0000 (07:43 -0800)]
[PATCH] USB: clean up usb structures some more
This patch splits up the usb structures to have two structs,
"usb_XXX_descriptor" with just the descriptor, and "usb_host_XXX" (or
something similar) to wrap it and add the "extra" pointers plus the
array of related descriptors that the host parsed during enumeration.
(2 or 3 words extra in each "usb_host_XXX".) This further matches the
"on the wire" data and enables the gadget drivers to share the same
header file.
Covers all the linux/drivers/usb/* and linux/sound/usb/* stuff, but
not a handful of other drivers (bluetooth, iforce, hisax, irda) that
are out of the usb tree and will likely be affected.
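
As a rough illustration of the split described above, the sketch below
puts a wire-format endpoint descriptor next to a host-side wrapper. The
descriptor fields follow the USB endpoint descriptor layout; the
"_sketch" struct names, the extra/extralen members and the
endpoint_number() helper are assumptions made for illustration, not the
definitions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wire-format endpoint descriptor: exactly the bytes the device sends.
 * Packed so the struct has no padding (GCC/Clang attribute). */
struct usb_endpoint_descriptor_sketch {
    uint8_t  bLength;
    uint8_t  bDescriptorType;
    uint8_t  bEndpointAddress;
    uint8_t  bmAttributes;
    uint16_t wMaxPacketSize;   /* little-endian on the wire */
    uint8_t  bInterval;
} __attribute__((packed));

/* The descriptor-only struct must match the on-the-wire size. */
static_assert(sizeof(struct usb_endpoint_descriptor_sketch) == 7,
              "descriptor must match the wire size");

/* Host-side wrapper (hypothetical): the descriptor plus the "extra"
 * bytes the host found while parsing during enumeration. */
struct usb_host_endpoint_sketch {
    struct usb_endpoint_descriptor_sketch desc;
    unsigned char *extra;
    int extralen;
};

/* Return the endpoint number (low 4 bits of bEndpointAddress). */
int endpoint_number(const struct usb_host_endpoint_sketch *ep)
{
    return ep->desc.bEndpointAddress & 0x0f;
}
```

Only the descriptor-only struct would need to be visible to code on the
device side; the wrapper carries host-only state.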
Josh Myer [Tue, 29 Oct 2002 07:27:28 +0000 (23:27 -0800)]
[PATCH] [PATCH] fix a FIXME in usb.h
In usb.h, there's a FIXME for the URB transfer flags. This patch is
basically a global search and replace to change those all from USB_ to
URB_.
It touches a few things that aren't directly USB-related, and so should
probably be passed by those authors, but I figured I should put it here to
get feedback (ie: "No, moron, you did it all wrong!" or "Oops, that FIXME
wasn't supposed to be there") before bothering them.
Hirofumi Ogawa [Tue, 29 Oct 2002 01:48:40 +0000 (17:48 -0800)]
[PATCH] remove the fat_cvf stuff (2/3)
This removes fat_cvf stuff, and adds printk() level. As far as I
know, all the challengers gave up porting of fat_cvf.
(This patch from Christoph Hellwig)
Jens Axboe [Tue, 29 Oct 2002 00:47:36 +0000 (16:47 -0800)]
[PATCH] arrange request fields sanely
Right now, various fields in struct request are just scattered
throughout the struct. This makes for bad cache behaviour. This patch
puts fields that are commonly referenced together in the same cache lines and also
removes the prefetches in deadline_merge(). The latter was actually
hurting performance here now that struct request is sanely laid out wrt
cache.
This is worth ~40% less deadline_merge() runtime during disk intensive
tests!
Jens Axboe [Tue, 29 Oct 2002 00:47:25 +0000 (16:47 -0800)]
[PATCH] scsi_command_size[] only known when SCSI is enabled
block/scsi_ioctl.c uses scsi_command_size[] to get from opcode to length
of cdb, but that is only available with SCSI enabled. Move it to
block/scsi_ioctl.c from scsi/scsi.c.
Andrew Morton [Tue, 29 Oct 2002 00:34:19 +0000 (16:34 -0800)]
[PATCH] much miscellany
- add locking comments to do_mmap_pgoff(), filemap.c
- use unsigned long for cpu flags in aio.c (Andi)
- An x86-64 typo fix from Andi.
- Fix a tpyo
- Fix an unused var warning in the stack overflow check code
- mptlan compile fix (Rasmus Andersen)
- Update misleading comment in ia32 highmem.c
- "attempting to mount an ext3 fs on a stopped md/raid1 array caused a
divide by 0 error in ext3_fill_super. Fix duplicates check already
in ext2." - Angus Sawyer <angus.sawyer@dsl.pipex.com>
- Someone changed the return type of inl() again! Fix up compiler
warnings in 3c59x.c again.
Andrew Morton [Tue, 29 Oct 2002 00:23:03 +0000 (16:23 -0800)]
[PATCH] thread-aware oom-killer
From Ingo
- performance optimization: do not kill threads in the same thread group
as the OOM-ing thread. (it's still necessary to scan over every thread
though, as it's possible to have CLONE_VM threads in a different thread
group - we do not want those to escape the OOM-kill.)
- do not let newly created child threads slip out of the group-kill. Note
that the 2.4 kernel's OOM handler has the same problem, and it could be
the reason why forkbombs occasionally slip out of the OOM kill.
entries_in_slab and nr_lru_pages can vary a lot. There is a potential
for 32-bit overflows.
I spent ages trying to avoid corner cases which cause a significant
lack of precision while preserving some clarity. Gave up and used
do_div(). The code is called rarely - at most once per 128 kbytes of
reclaim.
The patch adds a tweak to balance_pgdat() to reduce the call rate to
shrink_slab() in the case where the zone is just a little bit below
pages_high.
Also increase SHRINK_BATCH. The things we're shrinking are typically a
few hundred bytes, and a batchcount of 128 gives us a minimum of ten
pages or so per shrinking callout.
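
The overflow concern can be sketched in userspace as follows. The
scaling formula, the function names and the parameter names are
assumptions chosen for illustration (the text above does not give the
exact expression), and plain 64-bit division stands in for the kernel's
do_div().

```c
#include <assert.h>
#include <stdint.h>

/* Scale `entries` by the fraction scanned/lru_pages without overflowing.
 * The product is formed in 64 bits before dividing. */
uint64_t scaled_entries(uint32_t scanned, uint32_t entries, uint32_t lru_pages)
{
    assert(lru_pages != 0);
    uint64_t product = (uint64_t)scanned * entries;
    return product / lru_pages;
}

/* Same computation with a 32-bit product, shown only to expose the wrap. */
uint32_t scaled_entries_32(uint32_t scanned, uint32_t entries, uint32_t lru_pages)
{
    assert(lru_pages != 0);
    uint32_t product = scanned * entries;   /* wraps modulo 2^32 */
    return product / lru_pages;
}
```

With all three inputs set to 100000 the 64-bit version yields 100000,
while the 32-bit product wraps and the result is far too small.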
Andrew Morton [Tue, 29 Oct 2002 00:22:53 +0000 (16:22 -0800)]
[PATCH] uninline the ia32 copy_*_user functions
There's more work to do on these, for well-aligned copies.
Arjan has some stuff for that. First step on that path is
to clean the code up, get it uninlined and have a framework for
making per-CPU-type decisions.
Andrew Morton [Tue, 29 Oct 2002 00:22:49 +0000 (16:22 -0800)]
[PATCH] faster copy_*_user for bad alignments on intel ia32
This patch speeds up copy_*_user for some Intel ia32 processors. It is
based on work by Mala Anand.
It is a good win. Around 30% for all src/dest alignments except 32/32.
In this test a fully-cached one gigabyte file was read into an
8192-byte userspace buffer using read(fd, buf, 8192). The alignment of
the user-side buffer was altered between runs. This is a PIII. Times
are in seconds.
So `rep;movsl' is slower at all non-cache-aligned offsets.
PII is using the PIII alignment. I don't have a PII any more, but I do
recall that it demonstrated the same behaviour as the PIII.
The patch contains an enhancement (based on careful testing) from
Hirokazu Takahashi <taka@valinux.co.jp>. In cases where source and
dest have the same alignment, but that alignment is poor, we do a short
copy of a few bytes to bring the two pointers onto a favourable
boundary and then do the big copy.
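
A minimal userspace sketch of that head-fixup idea follows. It is not
the kernel routine: the 8-byte boundary, the function name
copy_with_head_fixup() and the use of memcpy() for the bulk copy are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_MASK 7u   /* assumed "favourable" boundary: 8 bytes */

/* Copy n bytes. If src and dst share the same misalignment, copy a short
 * head byte-by-byte so both land on an 8-byte boundary, then bulk copy. */
void *copy_with_head_fixup(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((((uintptr_t)d ^ (uintptr_t)s) & ALIGN_MASK) == 0) {
        /* Same alignment: bytes needed to reach the next boundary. */
        size_t head = (size_t)(-(uintptr_t)d & ALIGN_MASK);
        if (head > n)
            head = n;
        for (size_t i = 0; i < head; i++)
            d[i] = s[i];
        d += head;
        s += head;
        n -= head;
    }
    memcpy(d, s, n);   /* stands in for the fast bulk-copy path */
    return dst;
}
```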
And also a bugfix from Hirokazu Takahashi.
As an added bonus, this patch decreases the kernel text by 28 kbytes.
22k of this is in .text and the rest in __ex_table. I'm not really
sure why .text shrunk so much.
These copy routines have no special-case for constant-sized copies. So
a lot of uaccess.h becomes dead code with this patch. The next patch
which uninlines the copy_*_user functions cleans all that up and saves
an additional 5k.
Andrew Morton [Tue, 29 Oct 2002 00:22:43 +0000 (16:22 -0800)]
[PATCH] export nr_running and nr_iowait tasks in /proc
From Rik.
"this trivial patch, against 2.5-current, exports nr_running and
nr_iowait_tasks in /proc/stat. With this patch in, vmstat will no
longer need to walk all the processes in the system just to determine
the number of running and blocked processes."
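
A tool could then read the counts with a single parse of /proc/stat
instead of a walk over every process. The sketch below assumes the
values appear as "<key> <number>" lines; the key names used in the usage
note (procs_running, procs_blocked) are the ones current kernels print
and are an assumption as far as this patch is concerned. stat_lookup()
is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a line of the form "<key> <number>" in /proc/stat-style text and
 * store the number in *out. Returns 0 on success, -1 if the key is absent. */
int stat_lookup(const char *text, const char *key, unsigned long *out)
{
    assert(text != NULL && key != NULL && out != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
            return sscanf(line + klen, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;   /* step past the newline to the next line */
    }
    return -1;
}
```

Given the contents of /proc/stat in a buffer, a caller would use
stat_lookup(buf, "procs_running", &n) and likewise for "procs_blocked".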
Andrew Morton [Tue, 29 Oct 2002 00:22:39 +0000 (16:22 -0800)]
[PATCH] radix_tree_gang_lookup fix
When performing lookups against very sparse trees,
radix_tree_gang_lookup fails to find nodes "far" to the right of the
start point, because it only understands sparseness in the leaf nodes,
not the intermediate nodes.
Nobody noticed this because all callers are incrementing the start
index as they walk the tree.
Change it to terminate the search when it really has inspected the last
possible node for the current tree's height.
Andrew Morton [Tue, 29 Oct 2002 00:22:33 +0000 (16:22 -0800)]
[PATCH] less buslocked operations in the page allocator
Sort-of-but-not-really from Hugh Dickins.
We're doing a lot of buslocked operations in the page allocator just
for debug. Plus when they _do_ trigger, there are so many BUG_ONs in
there that it's rather hard to work out from user reports which one
actually triggered.
So redo all that and also print out some more useful info about the
page state before taking the machine out.
(And yes, we need to take the machine out. Incorrect page handling in
there can cause file corruption).
Andrew Morton [Tue, 29 Oct 2002 00:22:28 +0000 (16:22 -0800)]
[PATCH] add a file_ra_state init function
Provide a function in core kernel to initialise a file_ra_state structure.
Previously this was all taken care of by the fact that new struct
file's are all zeroed out. But now a file_ra_state may be
independently allocated, and we don't want users of it to have to know
how to initialise it.
Andrew Morton [Tue, 29 Oct 2002 00:22:23 +0000 (16:22 -0800)]
[PATCH] permit direct IO with finer-than-fs-blocksize alignments
Mainly from Badari Pulavarty
Traditionally we have only supported O_DIRECT I/O at an alignment and
granularity which matches the underlying filesystem. That typically
means that all IO must be 4k-aligned and a multiple of 4k in size.
Here, we relax that so that direct I/O happens with (typically)
512-byte alignment and multiple-of-512-byte size.
The tricky part is when a write starts and/or ends partway through a
filesystem block which has just been added. We need to zero out the
parts of that block which lie outside the written region.
We handle that by putting appropriately-sized parts of the ZERO_PAGE
into separate BIOs.
The generic_direct_IO() function has been changed so that the
filesystem must pass in the address of the block_device against which
the IO is to be performed. I'd have preferred to not do this, but we
do need that info at that time so that alignment checks can be
performed.
If the filesystem passes in a NULL block_device pointer then we fall
back to the old behaviour - must align with the fs blocksize.
There is no trivial way for userspace to know what the minimum
alignment is - it depends on what bdev_hardsect_size() says about the
device. It is _usually_ 512 bytes, but not always. This introduces
the risk that someone will develop and test applications which work
fine on their hardware, but will fail on someone else's hardware.
It is possible to query the hardsect size using the BLKSSZGET ioctl
against the backing block device. This can be performed at runtime or
at application installation time.
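
A sketch of how an application might do that query and then check a
request before issuing it. BLKSSZGET is the ioctl named above; the two
function names are hypothetical, and requiring the buffer address to be
sector-aligned as well is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSSZGET */

/* Ask the block device behind fd for its sector size.
 * Returns the size in bytes, or -1 if the ioctl fails. */
int query_sector_size(int fd)
{
    int size = 0;

    if (ioctl(fd, BLKSSZGET, &size) < 0)
        return -1;
    return size;
}

/* Check that buffer address, file offset and length are all multiples of
 * the sector size. Returns 1 if aligned, 0 otherwise. */
int dio_request_aligned(const void *buf, uint64_t offset, size_t len, int sect)
{
    assert(buf != NULL);

    if (sect <= 0)
        return 0;
    return ((uintptr_t)buf % (unsigned)sect) == 0 &&
           (offset % (unsigned)sect) == 0 &&
           (len % (unsigned)sect) == 0;
}
```

The file descriptor passed to query_sector_size() must refer to the
backing block device, not to a file on the filesystem.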
Andrew Morton [Tue, 29 Oct 2002 00:22:18 +0000 (16:22 -0800)]
[PATCH] restructure direct-io to suit bio_add_page
The direct IO code was initially designed to allocate a known-sized
BIO, to fill it with pages and to then send it off.
Then along came bio_add_page(). Really, it broke direct-io.c - it
meant that the direct-IO BIO assembly code no longer had a-priori
knowledge of whether a page would fit into the current BIO.
Our attempts to rework the initial design to play well with
bio_add_page() really weren't adequate. The code was getting more and
more twisty and we kept finding corner-cases which failed.
So this patch redesigns the BIO assembly and submission path of the
direct-IO code so that it better suits the bio_add_page() semantics.
It introduces another layer in the assembly phase: the 'cur_page' which
is cached in the dio structure.
The function which walks the file mapping, do_direct_IO(), simply emits a
sequence of (page,offset,len,sector) quads into the next layer down -
submit_page_section().
submit_page_section() is responsible for looking for a merge of the new
quad against the previous page section (same page). If no merge is
possible it passes the currently-cached page down to the next level,
dio_send_cur_page().
dio_send_cur_page() will try to add the current page to the current
BIO. If that fails, the current BIO is submitted for IO and we open a
new one.
So it's all nicely layered. The assembly of sections-of-page into the
current page closely mirrors the assembly of sections-of-BIO into the
current BIO.
At both of these levels everything is done in a "deferred" manner: try
to merge a new request onto the currently-cached one. If that fails
then send the currently-cached request and then cache this one instead.
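
The "deferred" pattern can be sketched in isolation as below. This is
not the direct-io.c code: the struct and function names are
hypothetical, the sector member of the quad is omitted, sending is
reduced to a counter, and treating "mergeable" as "same page and
contiguous" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct section {
    int page;             /* page identifier (hypothetical) */
    unsigned offset;      /* start within the page */
    unsigned len;         /* length in bytes */
};

struct assembler {
    struct section cur;   /* the currently cached section */
    int have_cur;         /* nonzero once something is cached */
    int sent;             /* how many sections were passed down */
};

/* Pass the cached section to the next layer; here we only count it. */
void send_cur(struct assembler *a)
{
    assert(a != NULL);
    if (a->have_cur) {
        a->sent++;
        a->have_cur = 0;
    }
}

/* Deferred submit: merge into the cached section when the new one is on
 * the same page and starts where the cached one ends; otherwise send the
 * cached section and cache the new one instead. */
void submit_section(struct assembler *a, struct section s)
{
    assert(a != NULL);
    if (a->have_cur && a->cur.page == s.page &&
        a->cur.offset + a->cur.len == s.offset) {
        a->cur.len += s.len;
        return;
    }
    send_cur(a);
    a->cur = s;
    a->have_cur = 1;
}
```

Whatever is still cached once the walk finishes has to be flushed with
one final send_cur() call.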
Some variables have been renamed to more closely represent their usage.
Some thought has been put into ownership of the various state variables
within `struct dio'. We were updating and inspecting these in various
places in a rather hard-to-follow manner. So things have been reworked
so that particular functions "own" particular parts of the dio
structure. Violators have been exterminated and commentary has been
added to describe this ownership.
The handling of file holes has been simplified.
As a consequence of all this, the code is clearer and simpler than it
used to be, and it now passes the modified-for-O_DIRECT fsx-linux
testing again.
Andrew Morton [Tue, 29 Oct 2002 00:22:13 +0000 (16:22 -0800)]
[PATCH] invalidate_inode_pages fixes
Two fixes here.
First:
Fixes a BUG() which occurs if you try to perform O_DIRECT IO against a
blockdev which has an fs mounted on it. (We should be able to do
that).
What happens is that do_invalidatepage() ends up calling
discard_buffer() on buffers which it couldn't strip. That clears
buffer_mapped() against useful things like the superblock buffer_head.
The next submit_bh() goes BUG over the write of an unmapped buffer.
So just run try_to_release_page() (aka try_to_free_buffers()) on the
invalidate path.
Second:
The invalidate_inode_pages() functions are best-effort pagecache
shrinkers. They are used against pages inside i_size and are not
supposed to throw away dirty data.
However it is possible for another CPU to run set_page_dirty() against
one of these pages after invalidate_inode_pages() has decided that it
is clean. This could happen if someone was performing O_DIRECT IO
against a file which was also mapped with MAP_SHARED.
So recheck the dirty state of the page inside the mapping->page_lock
and back out if the page has just been marked dirty.
This will also prevent the remove_from_page_cache() BUG which will occur
if someone marks the page dirty between the clear_page_dirty() and
remove_from_page_cache() calls in truncate_complete_page().
Andrew Morton [Tue, 29 Oct 2002 00:22:07 +0000 (16:22 -0800)]
[PATCH] libfs a_ops correctness
simple_prepare_write() currently memsets the entire page. It only
needs to clear the parts which are outside the to-be-written region.
This change makes no difference to performance - that memset was just a
cache preload for the copy_from_user() in generic_file_write(). But
it's more correct.
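
The narrower clearing can be sketched like this in userspace. The page
size constant and the clear_outside_region() helper are assumptions for
illustration; the real function operates on a struct page, not on a
plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Zero only the parts of the page outside [from, to), the region that
 * the caller is about to overwrite. */
void clear_outside_region(unsigned char *page, size_t from, size_t to)
{
    assert(from <= to && to <= SKETCH_PAGE_SIZE);
    memset(page, 0, from);
    memset(page + to, 0, SKETCH_PAGE_SIZE - to);
}
```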
Also, mark the page dirty in simple_commit_write(), not in
simple_prepare_write(), because the page's contents are changed after
prepare_write(). This doesn't matter in practice, but it is setting a
bad example.
Also, add a flush_dcache_page() to simple_prepare_write(). Again, not
really needed because the page cannot be mapped into pagetables if it
is not uptodate. But it is example code and should not be missing such
things.
Andrew Morton [Tue, 29 Oct 2002 00:22:02 +0000 (16:22 -0800)]
[PATCH] move ramfs a_ops into libfs
From Bill Irwin.
Abstract out ramfs readpage(), prepare_write(), and commit_write()
operations.
Ram-backed filesystems are going to be doing a lot of zero-filled read
and write operations. So in this patch, ramfs' implementations are
moved to libfs in anticipation of other callers.
Andrew Morton [Tue, 29 Oct 2002 00:21:56 +0000 (16:21 -0800)]
[PATCH] blkdev_get_block fix
Patch from Hugh Dickins <hugh@veritas.com>
Fix premature -EIO from blkdev_get_block: make bdget initialize
bd_block_size consistently with bd_inode->i_blkbits (assigned by
new_inode). Otherwise, a subsequent set_blocksize can find that
bd_block_size doesn't need updating and skip updating i_blkbits,
leaving them inconsistent.
Now that the device mapper has hit the tree there's no more reason
to keep the uncompiling LVM1 code and its various hacks to other
files around, so this patch removes it.
Alexander Viro [Mon, 28 Oct 2002 10:51:19 +0000 (02:51 -0800)]
[PATCH] ide-{disk,cd,...} got separate block_device_operations
* first application of the fact that block device methods are
per-disk and not per-major - IDE subdrivers got block_device_operations
of their own, redirects in ide.c are gone, so is a bunch of methods of
IDE subdrivers.
Alexander Viro [Mon, 28 Oct 2002 10:51:04 +0000 (02:51 -0800)]
[PATCH] IO counters - per-partition part
This chunk and the next one basically do the equivalent of sard the
right way - counters are exported per-disk in driverfs, as attributes of
disk or partition nodes.
Alexander Viro [Mon, 28 Oct 2002 10:50:55 +0000 (02:50 -0800)]
[PATCH] block_device_operations always picked from gendisk
* do_open() cleaned up
* we always pick block_device_operations from gendisk->fops now
* register_blkdev() just stores the name of driver, nothing more
* ->bd_op and ->bd_queue removed - we have that in gendisk
* get_blkfops() is gone
Alexander Viro [Mon, 28 Oct 2002 10:50:50 +0000 (02:50 -0800)]
[PATCH] saner initialization order in IDE (gendisks allocated slightly earlier)
* we move allocation of gendisks in ide-probe to the moment when
queues are set up, so everything that wants to feed requests in one of
IDE queues can safely set ->rq_disk
Alexander Viro [Mon, 28 Oct 2002 10:49:47 +0000 (02:49 -0800)]
[PATCH] hd.c
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* folded recalibrate[] and special_op[] into hd_info[]
* switched to passing pointers instead of indices
* cleaned up
Alexander Viro [Mon, 28 Oct 2002 10:49:37 +0000 (02:49 -0800)]
[PATCH] nftl
* switched to private queues
* set ->queue and ->private_data
* switched to use of ->bd_disk and ->rq_disk
* fixed the problem with request_module() from open()
* cleaned up
James Morris [Mon, 28 Oct 2002 10:36:29 +0000 (02:36 -0800)]
[CRYPTO]: Cleanups and more consistency checks.
- Removed local_bh_disable() from kmap wrapper, not needed now with
two atomic kmaps.
- Nuked atomic flag, use in_softirq() instead.
- Converted crypto_kmap() and crypto_yield() to check in_softirq().
- Check CRYPTO_MAX_CIPHER_BLOCK_SIZE during alg init.
- Try to initialize as much at compile time as possible
(feedback from Christoph Hellwig).
- Clean up list handling a bit (feedback from Christoph Hellwig).