Kai Mäkisara [Tue, 15 Oct 2002 11:38:24 +0000 (04:38 -0700)]
[PATCH] SCSI tape door lock and reset fixes
- switch to using scsi_ioctl() for drive door locking and unlocking
instead of private code
- use a driver internal flag to save the reset status until tape is
positioned into known location
- set driver state properly for all partitions after reset
- change put_device() to driver_unregister() in st_detach()
- C99 initializer changes (from Art Haas)
Ingo Molnar [Tue, 15 Oct 2002 11:35:16 +0000 (04:35 -0700)]
[PATCH] futex-2.5.42-A2
This is my current futex patchset against BK-curr. It mostly includes
must-have crash/correctness fixes from Martin Wirth, tested and reworked
somewhat by myself:
- crash fix: futex_close did not detach from the vcache. Detach cleanups.
(Martin Wirth)
- memory leak fix: forgotten put_page() in a rare path in __pin_page().
(Martin Wirth)
- crash fix: do not do any quickcheck in unqueue_me(). (Martin, me)
- correctness fix: the fastpath in __pin_page() now handles reserved
pages the same way get_user_pages() does. (Martin Wirth)
- queueing improvement: __attach_vcache() now uses list_add_tail() to
avoid the reversal of the futex queue if a COW happens. (Martin Wirth)
- simplified alignment check in sys_futex. (Martin Wirth)
- comment fix: make it clear how the vcache hash quickcheck works. (me)
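The queueing fix in __attach_vcache() relies on a general property of doubly linked lists. Moving entries one at a time with head insertion reverses their order, while tail insertion preserves it. The following is a minimal userspace sketch of that property only. The list helpers re-implement the semantics of the kernel's list_add()/list_add_tail() rather than including <linux/list.h>. `struct waiter` and requeue() are hypothetical names, not futex code.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's
 * list_head (standalone re-implementation, not <linux/list.h>). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_insert_between(struct list_head *new, struct list_head *prev,
				struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Head insertion: the most recently added entry ends up first. */
static void list_add(struct list_head *new, struct list_head *head)
{
	list_insert_between(new, head, head->next);
}

/* Tail insertion: the most recently added entry ends up last. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	list_insert_between(new, head->prev, head);
}

/* Hypothetical queued waiter; list is the first member so a cast works. */
struct waiter {
	struct list_head list;
	int id;
};

/* Move n waiters (given in queue order) onto a fresh queue one by one,
 * then record the ids in the order the new queue holds them. */
static void requeue(struct waiter *w, int n, int use_tail, int *out)
{
	struct list_head q;
	struct list_head *pos;
	int i = 0;

	list_init(&q);
	for (int k = 0; k < n; k++) {
		if (use_tail)
			list_add_tail(&w[k].list, &q);
		else
			list_add(&w[k].list, &q);
	}
	for (pos = q.next; pos != &q; pos = pos->next)
		out[i++] = ((struct waiter *)pos)->id;
}
```

With use_tail = 0, requeue() turns waiters 1, 2, 3 into 3, 2, 1. With use_tail = 1 it keeps the original order, and with it FIFO wakeup order.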
John Levon [Tue, 15 Oct 2002 11:30:56 +0000 (04:30 -0700)]
[PATCH] oprofile - core
Add the oprofile core. The core design is very similar to the one we
discussed in private mail. The nasty details should be documented in
the patch below.
John Levon [Tue, 15 Oct 2002 11:30:38 +0000 (04:30 -0700)]
[PATCH] oprofile - timer hook
This implements a simple hook into the profiling timer for x86 so that
non-perfctr machines can still use oprofile. This has proven useful for
laptops and the like.
It also reduces header dependencies a bit by centralising readprofile
code.
John Levon [Tue, 15 Oct 2002 11:30:32 +0000 (04:30 -0700)]
[PATCH] oprofile - dcookies
This implements the persistent path-to-dcookies mapping, and adds a
system call for the user-space profiler to look up the profile data, so
it can tag profiles to specific binaries.
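A path-to-cookie mapping is persistent in the sense that the same binary always yields the same cookie. Samples can therefore be recorded as small integers and resolved to names later. The sketch below models that idea in userspace with a fixed-size table. path_to_cookie() and cookie_to_path() are hypothetical names. Keying by path strings is a simplification, since the kernel side works with dentries rather than strings.

```c
#include <stddef.h>
#include <string.h>

#define MAX_COOKIES 64
#define MAX_PATH_LEN 256

/* Hypothetical in-memory table of path <-> cookie pairs. */
struct cookie_entry {
	unsigned long cookie;
	char path[MAX_PATH_LEN];
};

static struct cookie_entry cookie_table[MAX_COOKIES];
static size_t nr_cookies;
static unsigned long next_cookie = 1;	/* 0 is reserved for "no cookie" */

/* Return the cookie for a path, allocating one the first time the path is
 * seen; later calls for the same path return the same value. */
static unsigned long path_to_cookie(const char *path)
{
	for (size_t i = 0; i < nr_cookies; i++)
		if (strcmp(cookie_table[i].path, path) == 0)
			return cookie_table[i].cookie;
	if (nr_cookies == MAX_COOKIES || strlen(path) >= MAX_PATH_LEN)
		return 0;
	cookie_table[nr_cookies].cookie = next_cookie++;
	strcpy(cookie_table[nr_cookies].path, path);
	return cookie_table[nr_cookies++].cookie;
}

/* The reverse direction, which a user-space profiler would need in order
 * to turn a recorded cookie back into a binary name. */
static const char *cookie_to_path(unsigned long cookie)
{
	for (size_t i = 0; i < nr_cookies; i++)
		if (cookie_table[i].cookie == cookie)
			return cookie_table[i].path;
	return NULL;
}
```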
John Levon [Tue, 15 Oct 2002 11:30:26 +0000 (04:30 -0700)]
[PATCH] oprofile - hooks
This implements the simple hooks we need to catch unmappings, and to
make sure no stale task_struct*'s are ever used by the main oprofile
core mechanism. If disabled, it compiles to nothing.
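"Compiles to nothing" usually means the familiar C pattern: provide a real hook when the feature is configured in, and an empty static inline stub otherwise, so call sites need no #ifdef. A minimal sketch, assuming a CONFIG_PROFILING switch and a hypothetical profile_hook_munmap() hook (not necessarily the names used by the patch):

```c
/* #define CONFIG_PROFILING */	/* assumed config switch; off here */

#ifdef CONFIG_PROFILING
#include <stdio.h>

static unsigned long munmap_events;

/* Enabled: tell the profiler about the range before it disappears. */
static void profile_hook_munmap(unsigned long start, unsigned long len)
{
	munmap_events++;
	printf("unmap %#lx+%lu\n", start, len);
}
#else
/* Disabled: an empty static inline that the compiler removes entirely. */
static inline void profile_hook_munmap(unsigned long start, unsigned long len)
{
	(void)start;
	(void)len;
}
#endif

/* Core code calls the hook unconditionally, with no #ifdef at the call. */
static int do_unmap(unsigned long start, unsigned long len)
{
	profile_hook_munmap(start, len);
	/* ... tear down the mapping here ... */
	return 0;
}
```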
Alexander Viro [Tue, 15 Oct 2002 11:25:44 +0000 (04:25 -0700)]
[PATCH] bunch of ->open() killed.
Quite a few drivers don't need ->open() anymore - all it did was check
that the minor is good (== gendisk exists). That is handled by generic code
now...
Alexander Viro [Tue, 15 Oct 2002 11:25:37 +0000 (04:25 -0700)]
[PATCH] bdev->bd_disk introduced
There we go - now we can put a reference to gendisk into block_device. Which
we do in do_open(). Most of the callers of get_gendisk() are simply using
bdev->bd_disk now (and most of the put_disk() calls introduced on previous
step disappear). We also put that pointer into struct request - ->rq_disk.
That allows us to get rid of disk_index() kludges in md.c (we simply count
relevant IO in the struct gendisk fields) and kill the export of get_gendisk().
Notice that by now we can move _all_ IO counters into gendisk. That
will kill a bunch of per-major arrays and, more importantly, allow us to merge
sard in a clean way. FWIW, we probably could show them as disk/partition
attributes in driverfs...
Alexander Viro [Tue, 15 Oct 2002 11:25:32 +0000 (04:25 -0700)]
[PATCH] refcounts for gendisks
Finally. We use disk->dev.refcount as a gendisk refcount. New helper -
get_disk(): atomic_inc on refcount. get_gendisk() does it on return,
callers of get_gendisk() do put_disk() when they are done.
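The get/put discipline can be modelled in userspace with C11 atomics. The creator holds one reference, every lookup in the style of get_gendisk() returns with an extra reference, and whoever drops the last reference frees the object. All names below are hypothetical. The kernel keeps this count in the embedded driverfs device and frees through its release path rather than calling free() directly.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted disk object. */
struct disk {
	atomic_int refcount;
	int major, minor;
};

static int disks_freed;

static struct disk *alloc_disk_model(int major, int minor)
{
	struct disk *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcount, 1);	/* the creator's reference */
	d->major = major;
	d->minor = minor;
	return d;
}

/* Like get_disk(): an atomic increment of the refcount. */
static struct disk *get_disk_model(struct disk *d)
{
	atomic_fetch_add(&d->refcount, 1);
	return d;
}

/* Like put_disk(): drop a reference; the last one frees the object. */
static void put_disk_model(struct disk *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		free(d);
		disks_freed++;
	}
}

/* Lookup in the style of get_gendisk(): returns with a reference held,
 * so the caller must call put_disk_model() when done. */
static struct disk *lookup_disk(struct disk *registered)
{
	return registered ? get_disk_model(registered) : NULL;
}
```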
Alexander Viro [Tue, 15 Oct 2002 11:25:24 +0000 (04:25 -0700)]
[PATCH] preparation to use of driverfs refcounts, part 2 - disk
* disk->disk_dev is initialized in alloc_disk(), device_add()'d in
add_disk(), device_del()'d in unregister_disk() and put_device()'d in
put_disk().
* partition devices are made children of the disk device.
* attributes of the disk device: dev (dev_t of the thing), range (number of
minors) and size (in sectors).
* attributes of partition devices: dev (ditto), start (in sectors) and
size (in sectors).
* disk devices are put on a new bus - "block"
* if caller of add_disk() had set disk->driverfs_dev, we set symlinks:
"device" from disk to underlying device and "block" from underlying
device to disk.
* ->release() of disk_dev frees disk and disk->part.
At that point we have sane driverfs subtree for each gendisk and
refcount of its root (disk->disk_dev) can act as gendisk refcount.
Alexander Viro [Tue, 15 Oct 2002 11:25:18 +0000 (04:25 -0700)]
[PATCH] preparation to use of driverfs refcounts, part 1 - partitions
* update_partition() split into add_partition() and delete_partition().
* all updating of ->part[] is switched to these two (including initial
filling/final cleaning).
* per-partition devices are allocated on-demand and never reused.
We allocate struct device in add_partition() and put reference to it into
hd_struct. ->release() for that struct device frees it. delete_partition()
removes reference from hd_struct and does put_device() on it. Basically,
we get rid of problems with reused struct device by never reusing them...
At that point devices for partitions are nice and sane.
Alexander Viro [Tue, 15 Oct 2002 11:23:37 +0000 (04:23 -0700)]
[PATCH] device_register() splitup
New driverfs helpers - device_initialize/device_add and device_del.
The latter is device_unregister() sans the final put_device(). The former
pair splits device_register() into initialization and insertion into the tree.
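The split can be pictured as two pairs of inverse steps: registration is initialize-then-add, and unregistration is delete-then-put. The object is only released when the last reference goes away. The sketch below is a userspace model with hypothetical dev_* names, not the driverfs implementation.

```c
#include <stdbool.h>

/* Hypothetical device object; not struct device. */
struct device_model {
	int refcount;
	bool in_tree;
	bool released;
};

static void dev_initialize(struct device_model *dev)
{
	dev->refcount = 1;	/* initial reference owned by the creator */
	dev->in_tree = false;
	dev->released = false;
}

static int dev_add(struct device_model *dev)
{
	dev->in_tree = true;	/* insertion into the device tree */
	return 0;
}

static void dev_del(struct device_model *dev)
{
	dev->in_tree = false;	/* out of the tree, but still referenced */
}

static void dev_put(struct device_model *dev)
{
	if (--dev->refcount == 0)
		dev->released = true;	/* a ->release() callback would run here */
}

/* device_register() == initialize + add */
static int dev_register(struct device_model *dev)
{
	dev_initialize(dev);
	return dev_add(dev);
}

/* device_unregister() == del + the final put */
static void dev_unregister(struct device_model *dev)
{
	dev_del(dev);
	dev_put(dev);
}
```

Separating the steps lets a caller initialize a device early, insert it into the tree later, and remove it from the tree while other references keep it alive.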
Consistent naming for Bluetooth functions and constants.
Some of them were named like BT_XXX and bt_xxx, others BLUEZ_XXX and bluez_xxx.
From now on use BT_XXX and bt_xxx throughout Bluetooth code, including CONFIG_ defines.
Clean up small typos and misspellings along the way.
Now that the module name bluetooth.o is not used by the USB subsystem anymore,
we can rename bluez.o to what it should have been from the beginning:
bluetooth.o
Andrew Morton [Sun, 13 Oct 2002 09:59:10 +0000 (02:59 -0700)]
[PATCH] remove kiobufs
This patch from Christoph Hellwig removes the kiobuf/kiovec
infrastructure.
This affects three subsystems:
video-buf.c:
This patch includes an earlier diff from Gerd which converts
video-buf.c to use get_user_pages() directly.
Gerd has acked this patch.
LVM1:
Is now even more broken.
drivers/mtd/devices/blkmtd.c:
blkmtd is broken by this change. I contacted Simon Evans, who
said "I had done a rewrite of blkmtd anyway and just need to convert
it to BIO. Feel free to break it in the 2.5 tree, it will force me
to finish my code."
Neither EVMS nor LVM2 use kiobufs. The only remaining breakage
of which I am aware is a proprietary MPEG2 streaming module. It
could use get_user_pages().
Andrew Morton [Sun, 13 Oct 2002 09:58:45 +0000 (02:58 -0700)]
[PATCH] batched slab shrink and registration API
From Ed Tomlinson, then mauled by yours truly.
The current shrinking of the dentry, inode and dquot caches seems to
work OK, but it is slightly CPU-inefficient: we call the shrinking
functions many times, for tiny numbers of objects.
So here, we just batch that up - shrinking happens at the same rate but
we perform it in larger units of work.
To do this, we need a way of knowing how many objects are currently in
use by individual caches. slab does not actually track this
information, but the existing shrinkable caches do have this on hand.
So rather than adding the counters to slab, we require that the
shrinker callback functions keep their own count - we query that via
the callback.
We add a simple registration API which is exported to modules. A
subsystem may register its own callback function via set_shrinker().
set_shrinker() simply takes a function pointer. The function is called
with
int (*shrinker)(int nr_to_scan, unsigned int gfp_mask);
The shrinker callback must scan `nr_to_scan' objects and free all
freeable scanned objects. Note: it doesn't have to *free* `nr_to_scan'
objects. It need only scan that many. Which is a fairly pedantic
detail, really.
The shrinker callback must return the number of objects which are in
its cache at the end of the scanning attempt. It will be called with
nr_to_scan == 0 when we're just querying the cache size.
The set_shrinker() registration API is passed a hint as to how many
disk seeks a single cache object is worth. Everything uses "2" at
present.
I saw no need to add the traditional `here is my void *data' to the
registration/callback, because there is a one-to-one relationship
between caches and their shrinkers.
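The batching and the callback contract described above can be sketched in userspace. Only the callback signature follows the text. The fixed-size registry, the SHRINK_BATCH value and the choice to scale requested work by the seeks hint are assumptions made for illustration. set_shrinker_model(), shrink_all() and demo_shrinker() are hypothetical names.

```c
#include <stddef.h>

/* Callback shape described above: scan nr_to_scan objects, free what is
 * freeable, return how many remain; nr_to_scan == 0 means "just report". */
typedef int (*shrinker_fn)(int nr_to_scan, unsigned int gfp_mask);

#define MAX_SHRINKERS 8		/* hypothetical registry size */
#define SHRINK_BATCH 128	/* hypothetical batch size */

struct shrinker_model {
	shrinker_fn fn;
	int seeks;	/* cost hint: disk seeks to recreate one object */
	long pending;	/* scan work accumulated but not yet performed */
};

static struct shrinker_model shrinkers[MAX_SHRINKERS];
static int nr_shrinkers;

static struct shrinker_model *set_shrinker_model(int seeks, shrinker_fn fn)
{
	if (nr_shrinkers == MAX_SHRINKERS)
		return NULL;
	shrinkers[nr_shrinkers] = (struct shrinker_model){ fn, seeks, 0 };
	return &shrinkers[nr_shrinkers++];
}

/* Accumulate requested scanning and only invoke a callback once a full
 * batch has built up, instead of calling it for tiny object counts. */
static void shrink_all(long nr_to_scan, unsigned int gfp_mask)
{
	for (int i = 0; i < nr_shrinkers; i++) {
		struct shrinker_model *s = &shrinkers[i];

		s->pending += nr_to_scan / s->seeks;
		while (s->pending >= SHRINK_BATCH) {
			s->fn(SHRINK_BATCH, gfp_mask);
			s->pending -= SHRINK_BATCH;
		}
	}
}

/* Example cache that keeps its own object count, as the patch requires. */
static int cache_objects = 1000;
static int cache_calls;

static int demo_shrinker(int nr_to_scan, unsigned int gfp_mask)
{
	(void)gfp_mask;
	if (nr_to_scan) {
		cache_calls++;
		/* pretend every scanned object was freeable */
		cache_objects -= nr_to_scan < cache_objects ? nr_to_scan : cache_objects;
	}
	return cache_objects;	/* also the answer to a size query */
}
```

With seeks = 2, ten requests of 20 objects accumulate 100 objects of pending work without a single callback. One further request of 60 pushes the total past the batch size and triggers exactly one call that scans 128 objects.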
Various cleanups became possible:
- shrink_icache_memory() is no longer exported to modules.
- shrink_icache_memory() is now static to fs/inode.c
- prune_icache() is now static to fs/inode.c, and made inline (single caller)
- shrink_dcache_memory() is made static to fs/dcache.c
- prune_dcache() is no longer exported to modules
- prune_dcache() is made static to fs/dcache.c
- shrink_dqcache_memory() is made static to fs/dquot.c
- All the quota init code has been moved from fs/dcache.c into fs/dquot.c
- All modifications to inodes_stat.nr_inodes are now inside
inode_lock - the dispose_list one was racy.
David Brownell [Sun, 13 Oct 2002 08:40:00 +0000 (01:40 -0700)]
[PATCH] usbcore doc + minor fixes
Cleaning out my queue of most minor patches:
- Provides some kerneldoc for 'struct usb_interface' now that
the API is highlighting it.
- Fixes usb_set_interface() so it doesn't affect other interfaces.
This provides the right place for an eventual HCD call to clean
out now-invalid records of endpoint state, and also gets rid of
a potential SMP issue where drivers on different interfaces
calling concurrently could clobber each other. (Per-interface
data doesn't need locking except against config changes.)
- It's OK to pass URB_NO_INTERRUPT hints if you're queueing a
bunch of interrupt transfers.
The set_interface call should eventually take the interface as a
parameter; it's one of the few left using the "device plus magic
number" identifier. I have a partial patch for that, but it doesn't
handle the (newish) ALSA usb audio driver or a few other callers.
Russell King [Sun, 13 Oct 2002 19:52:13 +0000 (20:52 +0100)]
[ARM] Update AFS mtd partition parsing.
This updates AFS mtd partition parsing to the current CVS version:
- Don't recognise the AFS SIB as a partition
- Ensure initialisation of afs mtdpart structures.
Russell King [Sun, 13 Oct 2002 17:42:09 +0000 (18:42 +0100)]
[ARM] Make the assabet machine always use the same uart mapping.
Traditionally, the Assabet reverses its mapping of UART1 and UART3
when the Neponset board is connected. This can be (a) confusing
and (b) annoying when the boot loader uses UART1. We therefore
have a fixed mapping between the ttySA names and the physical
UARTs on this platform.
Russell King [Sun, 13 Oct 2002 17:36:03 +0000 (18:36 +0100)]
[ARM] Convert sa1100 PCMCIA drivers to C99 initializers (Art Haas)
The patches convert drivers/pcmcia to use C99 named initializers,
and all the patches are against 2.5.42. There are 25 patches in
total, and "cat"ed together they're more than 20K, so
I'm sending the patches as a compressed attachment. The patches
were CC'd to Linus in the first mail that bounced.
Russell King [Sun, 13 Oct 2002 17:28:03 +0000 (18:28 +0100)]
[ARM ADFS] C99 designated initialisers (Patch from Art Haas)
Here's a small set of patches that switch the code to use C99
designated initializers. Patches are against 2.5.42.
Russell King [Sun, 13 Oct 2002 17:02:58 +0000 (18:02 +0100)]
[ARM] Update acorn scsi code wrt global irq and bitops
This cset removes the global irq handling in the AcornSCSI driver,
and makes the target type for bitops an unsigned long array rather
than an unsigned char array.
Russell King [Sun, 13 Oct 2002 16:49:52 +0000 (17:49 +0100)]
[ARM] Convert boot-time memory permission selection to table.
This removes a compilation warning and makes the code smaller.
It is also more obvious what's going on.
Russell King [Sun, 13 Oct 2002 16:43:23 +0000 (17:43 +0100)]
[ARM] Remove non-existent USB gadget code from mach-sa1100/Makefile
The USB gadget code now lives in arch/arm/mach-sa1100/usb, and
isn't in a mergeable state. We remove the old makefile entries
which are never going to be satisfied, and leave a placeholder for
the usb directory.
Russell King [Sun, 13 Oct 2002 16:38:43 +0000 (17:38 +0100)]
[ARM] dump_stack and show_trace_task
The generic code now uses the name dump_stack(), so we call our version
__dump_stack since we're running out of other descriptive names.
Allow show_trace_task to show the backtrace for the current
thread.
Russell King [Sun, 13 Oct 2002 16:32:58 +0000 (17:32 +0100)]
[ARM] Rudimentary support for Thumb ptracing.
Add rudimentary support for Thumb ptracing; we aren't able to single
step through Thumb branches yet, but this change provides enough
infrastructure to make this possible.
Russell King [Sun, 13 Oct 2002 16:08:51 +0000 (17:08 +0100)]
[ARM] Fix up NCR5380-based Acorn SCSI drivers
This cset updates (as much as is possible) the NCR5380-based Acorn
SCSI drivers, mainly converting them to the new error handling code.
However, they still don't build due to errors in NCR5380.
Russell King [Sun, 13 Oct 2002 15:57:58 +0000 (16:57 +0100)]
[ARM] Remove old Acorn iomd-based keyboard and mouse drivers.
Vojtech has patches that bring their counterparts in the input
subsystem up to date (and into a working state) so these drivers
are no longer required.
Russell King [Sun, 13 Oct 2002 15:34:28 +0000 (16:34 +0100)]
[ARM] Acorn serial port driver update
This cset combines the Atomwide and The Serial Port 16550 driver
modules into one "8250_acorn.c" driver. This new module takes full
advantage of the LDM-based expansion card facilities.
Russell King [Sun, 13 Oct 2002 15:17:51 +0000 (16:17 +0100)]
[ARM] Update Acorn ethernet expansion cards
This cset implements validity checks on the ethernet MAC address when
the device is opened, and refuses to open the device if this check
fails. We also provide the set_mac_address method to allow ifconfig
to change the MAC address to something valid.
In addition, the driver is converted from the old expansion card
discovery methods to the new device model driver framework.
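A validity check of this kind can be sketched as follows. The helper name and the exact criteria are assumptions: it rejects the all-zero address and any address with the group bit set, which covers multicast and broadcast. The commit does not state which checks the Acorn drivers apply or which error code they return.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: treat an address as usable if it is not all-zero
 * and does not have the group (multicast/broadcast) bit set. */
static bool mac_addr_valid(const unsigned char addr[6])
{
	static const unsigned char zero[6];

	if (addr[0] & 0x01)			/* group bit set */
		return false;
	return memcmp(addr, zero, 6) != 0;	/* all-zero is invalid */
}

/* Open path: refuse to bring the interface up with a bad address.
 * -1 stands in for whatever error code the real driver returns. */
static int demo_open(const unsigned char addr[6])
{
	return mac_addr_valid(addr) ? 0 : -1;
}
```

A set_mac_address method would naturally apply the same check, so that a user cannot install an address the open routine would then reject.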
Russell King [Sun, 13 Oct 2002 14:35:29 +0000 (15:35 +0100)]
Convert acorn expansion card probing code to the Linux device model.
Provide LDM-based driver registration/removal interface for drivers
to use. We make the old device discovery code ignore devices
claimed via the LDM framework. However, the LDM framework ignores
devices that may be in use by the old device discovery code. This
is fine since the only devices that will still use the old discovery
code will be SCSI drivers.
Currently, we don't provide a useful dev.name entry.
Russell King [Sun, 13 Oct 2002 13:32:13 +0000 (14:32 +0100)]
[ARM] cpufreq updates for ARM
This updates the Integrator cpufreq code to use the new interfaces,
and makes the sa1100 cpufreq round up the requested frequency.
Russell King [Sun, 13 Oct 2002 13:00:14 +0000 (14:00 +0100)]
[ARM] Update neponset/sa1111 for Linux device model updates.
This updates the neponset and sa1111 support to use the new system
device infrastructure in the Linux device model.
Russell King [Sun, 13 Oct 2002 12:47:15 +0000 (13:47 +0100)]
[ARM] Optimise ARM TLB handling
Sanitise includes of asm/tlbflush.h, asm/cacheflush.h, asm/proc-fns.h
Implement ARM-specific TLB "shootdown" code. It turns out that it
is overall more efficient to unconditionally invalidate the whole
TLB rather than entry by entry when removing areas.
Russell King [Sun, 13 Oct 2002 12:22:31 +0000 (13:22 +0100)]
[ARM] Update RiscPC decompressor for PIC changes
This cset fixes the RiscPC decompressor code for the PIC changes.
We use a pointer to a structure rather than a structure to access
params. With a PIC decompressor, the address of the structure gets
PIC-ified which is not what we want.
Russell King [Sun, 13 Oct 2002 12:05:47 +0000 (13:05 +0100)]
[ARM] Update pcibios_enable_device, supply pci_mmap_page_range()
Update pcibios_enable_device to only enable requested resources,
mainly for IDE. Supply a pci_mmap_page_range() function to allow
user space to mmap PCI regions.
Russell King [Sun, 13 Oct 2002 11:36:29 +0000 (12:36 +0100)]
[ARM] Ensure deselected config variables are defined to 'n'
To keep the Config.in files relatively clean, we use the
following construct:
    if [ "$CONFIG_ARM" = "y" ]; then
       dep_tristate 'Foo' CONFIG_FOO $CONFIG_BAR
    fi
where CONFIG_BAR is some machine implementation or high-level
chip support configuration option. If CONFIG_BAR is left
empty, then the tristate is offered to the user, which isn't
what we want. Defining CONFIG_BAR to 'n' prevents the option from
being offered.
This is a rule I generally try to implement within
arch/arm/config.in.
This cset makes CONFIG_SA1111 and CONFIG_ARM_THUMB behave that
way.
Russell King [Sun, 13 Oct 2002 11:29:00 +0000 (12:29 +0100)]
[ARM] Allow CONFIG_ZBOOT_ROM=y image to be relocated to RAM
Since the decompressor supports PIC, even for CONFIG_ZBOOT_ROM,
we can easily allow an image which has been linked to run at
a particular address in ROM to be moved to RAM. We just need
to make sure that we don't relocate the GOT entries for the
BSS segment.
This cset also implements sa1100-based debugging for the
decompressor.
Russell King [Sun, 13 Oct 2002 11:20:40 +0000 (12:20 +0100)]
[ARM] Move TEXTADDR and DATAADDR out of vmlinux.lds.S
These two variables are used by more than just the linker;
they're also used by head.S to know where it can safely
place the page tables. We therefore need to export them
from the Makefile.
These are also highly machine dependent; we don't want
to duplicate the same set of conditionals for cpp and
for the makefiles.
arch/arm/Makefile also contained a stray close-paren. I'm
submitting this one to the lost property office.
We also always pass -mno-fpu to the assembler; this
guarantees that any floating point will be caught.
Matthew Dharm [Sun, 13 Oct 2002 06:48:00 +0000 (23:48 -0700)]
[PATCH] usb-storage: cache pipe values
This patch to usb-storage makes all pipe values used by the driver an
unsigned int (like they should be), and caches them in the device data
structure.
Barry K. Nathan [Sun, 13 Oct 2002 06:44:18 +0000 (23:44 -0700)]
[PATCH] USB: 2.5.42 partial fix for older pl2303
On Sat, Oct 12, 2002 at 06:16:44PM -0700, Greg KH wrote:
> Now, would you mind taking a look at 2.5, and fixing this there too? :)
Here's a half-successful attempt. With this patch, the device no longer
appears twice, and it always works on the first open (at least, so I've
observed up to this point). The open following a successful open usually
fails (roughly speaking, it appears to play dead), and the open following
a failed open usually (always?) succeeds.
So, on my PL-2303, it's not perfect but it's certainly livable.
This patch is based on the one I did for 2.4, and in fact, this code
functions when it's plugged into 2.4.20-pre10 instead of 2.5.42. In that
scenario, the opens work 100% of the time.
I'd be interested in suggestions or comments regarding this patch.
Anyone who has a PL-2303 working under 2.5 might want to try this patch
just to make sure it doesn't kill their working setup.
The summary in Documentation/Changes has been updated to require
make 3.78, but the other references were not updated. And 3.78 really
is required. This patch updates the other locations.
Andrew Morton [Sun, 13 Oct 2002 02:33:20 +0000 (19:33 -0700)]
[PATCH] reduce the dirty threshold when there's a lot of mapped
Dirty memory thresholds are currently set by /proc/sys/vm/dirty_ratio.
Background writeout levels are controlled by
/proc/sys/vm/dirty_background_ratio.
Problem is that these levels are hard to get right - they are too
static. If there is a lot of mapped memory around then the 40%
clamping level causes too much dirty data. We do lots of scanning in
page reclaim, and the VM generally starts getting into distress. Extra
swapping, extra page unmapping.
It would be much better to simply tell the caller of write(2) to slow
down - to write out their dirty data sooner, to make those written
pages trivially reclaimable. Penalise the offender, not the innocent
page allocators.
This patch changes the writer throttling code so that we clamp down
much harder on writers if there is a lot of mapped memory in the
machine. We only permit memory dirtiers to dirty up to 50% of unmapped
memory before forcing them to clean their own pagecache.
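When everything is expressed as a percentage of total memory, "50% of unmapped memory" is half of the unmapped percentage, and that value caps the configured dirty_ratio. A minimal sketch under that reading; the function name is hypothetical and the integer percentage arithmetic is an assumption.

```c
/* Hypothetical helper: clamp the configured dirty ratio (a percentage of
 * total memory, like /proc/sys/vm/dirty_ratio) to half of the memory
 * that is not mapped into pagetables. */
static int effective_dirty_ratio(unsigned long total_pages,
				 unsigned long mapped_pages,
				 int vm_dirty_ratio)
{
	/* percentage of memory that is NOT mapped */
	int unmapped_ratio = 100 - (int)((mapped_pages * 100) / total_pages);
	int ratio = vm_dirty_ratio;

	/* writers may dirty at most half of the unmapped memory */
	if (ratio > unmapped_ratio / 2)
		ratio = unmapped_ratio / 2;
	return ratio;
}
```

With 60% of memory mapped, a dirty_ratio of 40 is clamped to 20. A real implementation would probably also want a floor, so that a fully mapped machine does not end up with a limit of zero. The text does not describe one.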
Andrew Morton [Sun, 13 Oct 2002 02:33:11 +0000 (19:33 -0700)]
[PATCH] start anon pages on the active list
We're currently adding anon pages to the inactive list. But they're
all referenced, so when they reach the tail of the inactive list the
kernel will always then bump them up to the active list.
Not only does this waste CPU, but it leads to inactive/active
imbalance. We end up with enormous sequences of unreclaimable,
to-be-activated pages hitting the tail of the LRU and large amounts of
scanning need to be done. Which upsets the VM, making it think that it
is "under distress".
Andrew Morton [Sun, 13 Oct 2002 02:33:06 +0000 (19:33 -0700)]
[PATCH] reduced and tunable swappiness
/proc/sys/vm/swappiness controls the VM's tendency to unmap pages and to
swap things out.
100 -> basically current 2.5 behaviour
0 -> not very swappy at all
The mechanism which is used to control swappiness is: to be reluctant
to bring mapped pages onto the inactive list. Prefer to reclaim
pagecache instead.
The control for that mechanism is as follows:
- If there is a large amount of mapped memory in the machine, we
prefer to bring mapped pages onto the inactive list.
- If page reclaim is under distress (more scanning is happening) then
prefer to bring mapped pages onto the inactive list. This is
basically the 2.4 algorithm, really.
- If the /proc/sys/vm/swappiness control is high then prefer to bring
mapped pages onto the inactive list.
The implementation is simple: calculate the above three things as
percentages and add them up. If that's over 100% then start reclaiming
mapped pages.
The `proportion of mapped memory' is downgraded so that we don't swap
just because a lot of memory is mapped into pagetables - we still need
some VM distress before starting to swap that memory out.
For a while I was adding a little bias so that we prefer to unmap
file-backed memory before swapping out anon memory, because file-backed
memory can usually be evicted and re-established with one I/O, not
two. It was unmapping executable text too easily, so here I just treat
them equally.
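The swappiness decision can be sketched as a sum of three percentages. The text specifies the three inputs and the 100% trigger; the concrete formulas here are assumptions. Distress is taken as 100 >> priority, with priority falling towards 0 as reclaim gets harder. The mapped-memory term is "downgraded" by halving it. A sum of exactly 100 is treated as reaching the trigger.

```c
#include <stdbool.h>

/* Hypothetical helper: should reclaim start taking mapped pages? */
static bool should_reclaim_mapped(unsigned long mapped_pages,
				  unsigned long total_pages,
				  int priority, int swappiness)
{
	int mapped_ratio = (int)((mapped_pages * 100) / total_pages);
	int distress = 100 >> priority;	/* 0 when relaxed, 100 at priority 0 */
	int swap_tendency = mapped_ratio / 2 + distress + swappiness;

	return swap_tendency >= 100;
}
```

With swappiness at 100 the sum always reaches 100, which matches "basically current 2.5 behaviour". With swappiness at 0 and no distress, even a fully mapped machine only reaches 50, so mapped pages are left alone.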
Andrew Morton [Sun, 13 Oct 2002 02:33:01 +0000 (19:33 -0700)]
[PATCH] propagate pte reference into page reference during
zap_pte_range() is currently just dropping the pte. Change it to mark
the page referenced if the pte says it was. This has the effect of
delaying the eviction of recently-mapped pagecache.
This means that we're currently marking the page accessed when it is
first faulted in as well as when we drop it from pagetables. Which
matches up with the (strange) behaviour of the VM: it reclaims
PageReferenced pagecache pages off the inactive list.
Probably, it makes sense to remove the mark_page_accessed() from
filemap_nopage() and just use the pte bits everywhere. Reviewing all
the PageReferenced()/mark_page_accessed() usage is on my todo list.
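The change to zap_pte_range() amounts to copying the pte's accessed bit into the page before the pte is discarded. A minimal sketch with hypothetical stand-in types (not the kernel's pte_t or struct page):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page and a page table entry. */
struct page_model {
	bool referenced;
};

struct pte_model {
	bool present;
	bool young;		/* hardware "accessed" bit */
	struct page_model *page;
};

/* Tear down one pte, carrying its accessed bit over to the page so that
 * recently used pagecache is not the first thing evicted. */
static void zap_one_pte(struct pte_model *pte)
{
	if (!pte->present)
		return;
	if (pte->young)
		pte->page->referenced = true;	/* mark the page referenced */
	pte->present = false;
	pte->young = false;
	pte->page = NULL;
}
```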
Andrew Morton [Sun, 13 Oct 2002 02:32:57 +0000 (19:32 -0700)]
[PATCH] small-machine writer throttling fix
The current writer throttling in balance_dirty_pages() assumes that the
writer will be effectively throttled on request queues.
That works fine when the amount of data which can be placed into a
queue is "much less than" total memory.
But if the machine has a small amount of memory, or many disks, or has
large request queues, or large requests, it can go wrong.
For example, with mem=96m and dirty_async_ratio=15, we want to be able
to clamp dirty+writeback memory at 15 megabytes. But it doesn't work,
because a single SCSI request queue can hold 40 megs or more. The
heavy writer keeps on dirtying memory until that queue fills up.
So add a test for that - if we did some writeback, and we're *still*
over the dirty+writeback threshold then make the caller take an
explicit nap on some writes terminating. And keep on doing that until
the dirty+writeback memory subsides.
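The added throttling step can be modelled as a loop. In this hypothetical userspace sketch, starting writeback only moves pages from dirty to under-writeback, so by itself it cannot lower dirty+writeback. Only a nap during which some writes complete does that. The structure, field names and completion rate are assumptions for illustration.

```c
/* Hypothetical VM state; all quantities are in pages. */
struct vm_state {
	long dirty;
	long writeback;
	long queue_capacity;	/* pages the request queue can absorb */
	long completes_per_nap;	/* writes that finish while we sleep */
	int naps;
};

/* Start writeback on up to nr dirty pages, limited by queue space. */
static long start_writeback(struct vm_state *vm, long nr)
{
	long room = vm->queue_capacity - vm->writeback;

	if (nr > vm->dirty)
		nr = vm->dirty;
	if (nr > room)
		nr = room;
	if (nr < 0)
		nr = 0;
	vm->dirty -= nr;
	vm->writeback += nr;
	return nr;
}

/* Stand-in for sleeping until some in-flight writes terminate. */
static void nap_on_writeback(struct vm_state *vm)
{
	long done = vm->completes_per_nap;

	if (done > vm->writeback)
		done = vm->writeback;
	vm->writeback -= done;
	vm->naps++;
}

static void balance_dirty_model(struct vm_state *vm, long thresh, long batch)
{
	while (vm->dirty + vm->writeback > thresh) {
		start_writeback(vm, batch);
		/* still over the limit after doing writeback: the queue can
		 * hold more than thresh, so wait for writes to complete */
		if (vm->dirty + vm->writeback > thresh)
			nap_on_writeback(vm);
	}
}
```

Assuming 4 KB pages, the example above works out as follows. mem=96m is 24576 pages, a 15% threshold is about 3686 pages, and a 40 MB queue holds 10240 pages. Starting from 8000 dirty pages, the loop keeps napping until dirty+writeback is at or below 3686 pages, rather than relying on the queue filling up to block the writer.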