Stephen Lord [Tue, 15 Oct 2002 01:16:03 +0000 (03:16 +0200)]
XFS: Switch to native endian internal representation for extents
Switch xfs from using a big endian internal representation for
the in memory copy of extents to a host byte order representation.
The internal extents are read in once, then modified separately
from the on disk ones. Since we search and manipulate the extents
multiple times, it is cheaper to convert them to host byte order
once and then keep them in that format. Worth about 5 to 10%
reduction in cpu time for some loads. Complicated by the fact
that the in memory extents are written out to the log sometimes,
and when expanding extents are used to write out the initial
block of extents.
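
A minimal userspace sketch of the idea, assuming an extent record is two big-endian 64-bit words on disk. The struct and function names below are hypothetical and are not the XFS ones: convert once on read, work in host order, and convert back only when a copy has to leave memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; not the real XFS structures. */
struct disk_extent { uint8_t l0[8]; uint8_t l1[8]; };  /* big-endian bytes */
struct host_extent { uint64_t l0; uint64_t l1; };      /* host byte order */

/* Decode a big-endian 64-bit value byte by byte, so the host
 * byte order does not matter. */
static uint64_t be64_decode(const uint8_t b[8])
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Encode a host-order value as big-endian bytes. */
static void be64_encode(uint64_t v, uint8_t b[8])
{
    for (size_t i = 0; i < 8; i++)
        b[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Convert once, when the extents are read in. */
static void extents_read_in(const struct disk_extent *src,
                            struct host_extent *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i].l0 = be64_decode(src[i].l0);
        dst[i].l1 = be64_decode(src[i].l1);
    }
}

/* Convert back only when a copy must go to the log or to disk. */
static void extents_write_out(const struct host_extent *src,
                              struct disk_extent *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        be64_encode(src[i].l0, dst[i].l0);
        be64_encode(src[i].l1, dst[i].l1);
    }
}
```

Every search or update between those two points then works on plain integers, which is where the CPU saving would come from.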
Nathan Scott [Tue, 15 Oct 2002 00:56:47 +0000 (02:56 +0200)]
XFS: Sysctl updates
Symlinks are created by default with mode 777 now, old behavior is
still accessible through sysctl. Irixsgid mount option is
deprecated and it too is still accessible through sysctl.
Stephen Lord [Tue, 15 Oct 2002 00:40:32 +0000 (02:40 +0200)]
XFS: Rework dev_t and linux inode handling
This is a two fold change, first it moves the translation between
linux dev_t and kdev_t up the call stack in xfs and makes the bulk
of xfs work in terms of its on disk dev_t format. It also cleans
up a few related chunks of code.
The other part of the change reworks how we keep the linux inode
contents and the xfs inode fields in sync. A number of places where
we resynced the two have been removed, these were basically
replicating work elsewhere in the filesystem. We now also ensure
that the inode fields are filled in before calling unlock_new_inode -
there used to be a window.
Finally all the code which hooks together the linux inode and the xfs
inode is brought together as a more coherent whole rather than being
scattered around the inode create path. Most calls to revalidate the
linux inode from the xfs inode are removed.
Andrew Morton [Sun, 13 Oct 2002 09:59:10 +0000 (02:59 -0700)]
[PATCH] remove kiobufs
This patch from Christoph Hellwig removes the kiobuf/kiovec
infrastructure.
This affects three subsystems:
video-buf.c:
This patch includes an earlier diff from Gerd which converts
video-buf.c to use get_user_pages() directly.
Gerd has acked this patch.
LVM1:
Is now even more broken.
drivers/mtd/devices/blkmtd.c:
blkmtd is broken by this change. I contacted Simon Evans, who
said "I had done a rewrite of blkmtd anyway and just need to convert
it to BIO. Feel free to break it in the 2.5 tree, it will force me
to finish my code."
Neither EVMS nor LVM2 use kiobufs. The only remaining breakage
of which I am aware is a proprietary MPEG2 streaming module. It
could use get_user_pages().
Andrew Morton [Sun, 13 Oct 2002 09:58:45 +0000 (02:58 -0700)]
[PATCH] batched slab shrink and registration API
From Ed Tomlinson, then mauled by yours truly.
The current shrinking of the dentry, inode and dquot caches seems to
work OK, but it is slightly CPU-inefficient: we call the shrinking
functions many times, for tiny numbers of objects.
So here, we just batch that up - shrinking happens at the same rate but
we perform it in larger units of work.
To do this, we need a way of knowing how many objects are currently in
use by individual caches. slab does not actually track this
information, but the existing shrinkable caches do have this on hand.
So rather than adding the counters to slab, we require that the
shrinker callback functions keep their own count - we query that via
the callback.
We add a simple registration API which is exported to modules. A
subsystem may register its own callback function via set_shrinker().
set_shrinker() simply takes a function pointer. The function is called
with
int (*shrinker)(int nr_to_scan, unsigned int gfp_mask);
The shrinker callback must scan `nr_to_scan' objects and free all
freeable scanned objects. Note: it doesn't have to *free* `nr_to_scan'
objects. It need only scan that many. Which is a fairly pedantic
detail, really.
The shrinker callback must return the number of objects which are in
its cache at the end of the scanning attempt. It will be called with
nr_to_scan == 0 when we're just querying the cache size.
The set_shrinker() registration API is passed a hint as to how many
disk seeks a single cache object is worth. Everything uses "2" at
present.
I saw no need to add the traditional `here is my void *data' to the
registration/callback. Because there is a one-to-one relationship
between caches and their shrinkers.
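
A userspace sketch of a callback obeying the contract described above. The toy cache and every toy_* name are hypothetical; only the callback's argument list and return convention come from the text, and each slot visited is counted as one scanned object for simplicity.

```c
#include <stdbool.h>

#define TOY_CACHE_SIZE 8

/* Hypothetical toy cache: a slot is empty, pinned (in use), or freeable. */
enum slot_state { SLOT_EMPTY, SLOT_PINNED, SLOT_FREEABLE };

static enum slot_state toy_cache[TOY_CACHE_SIZE];
static int toy_cache_count;  /* the cache keeps its own object count */
static int toy_scan_pos;     /* where the next scan resumes */

static void toy_cache_add(int slot, bool pinned)
{
    if (toy_cache[slot] == SLOT_EMPTY) {
        toy_cache[slot] = pinned ? SLOT_PINNED : SLOT_FREEABLE;
        toy_cache_count++;
    }
}

/*
 * Scan nr_to_scan slots, free whatever is freeable among them, and
 * return the number of objects left in the cache. With
 * nr_to_scan == 0 nothing is scanned, so the call is a pure size query.
 * gfp_mask is ignored in this sketch.
 */
static int toy_shrinker(int nr_to_scan, unsigned int gfp_mask)
{
    (void)gfp_mask;
    while (nr_to_scan-- > 0) {
        enum slot_state *s = &toy_cache[toy_scan_pos];

        toy_scan_pos = (toy_scan_pos + 1) % TOY_CACHE_SIZE;
        if (*s == SLOT_FREEABLE) {
            *s = SLOT_EMPTY;
            toy_cache_count--;
        }
    }
    return toy_cache_count;
}
```

A caller that wants to batch work would first call toy_shrinker(0, 0) to learn the cache size, then call it again later with a larger nr_to_scan.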
Various cleanups became possible:
- shrink_icache_memory() is no longer exported to modules.
- shrink_icache_memory() is now static to fs/inode.c
- prune_icache() is now static to fs/inode.c, and made inline (single caller)
- shrink_dcache_memory() is made static to fs/dcache.c
- prune_dcache() is no longer exported to modules
- prune_dcache() is made static to fs/dcache.c
- shrink_dqcache_memory() is made static to fs/dquot.c
- All the quota init code has been moved from fs/dcache.c into fs/dquot.c
- All modifications to inodes_stat.nr_inodes are now inside
inode_lock - the dispose_list one was racy.
David Brownell [Sun, 13 Oct 2002 08:40:00 +0000 (01:40 -0700)]
[PATCH] usbcore doc + minor fixes
Cleaning out my queue of most minor patches:
- Provides some kerneldoc for 'struct usb_interface' now that
the API is highlighting it.
- Fixes usb_set_interface() so it doesn't affect other interfaces.
This provides the right place for an eventual HCD call to clean
out now-invalid records of endpoint state, and also gets rid of
a potential SMP issue where drivers on different interfaces
calling concurrently could clobber each other. (Per-interface
data doesn't need locking except against config changes.)
- It's OK to pass URB_NO_INTERRUPT hints if you're queueing a
bunch of interrupt transfers.
The set_interface call should eventually take the interface as a
parameter; it's one of the few left using the "device plus magic
number" identifier. I have a partial patch for that, but it doesn't
handle the (newish) ALSA usb audio driver or a few other callers.
Russell King [Sun, 13 Oct 2002 19:52:13 +0000 (20:52 +0100)]
[ARM] Update AFS mtd partition parsing.
This updates AFS mtd partition parsing to the current CVS version:
- Don't recognise the AFS SIB as a partition
- Ensure initialisation of afs mtdpart structures.
Russell King [Sun, 13 Oct 2002 17:42:09 +0000 (18:42 +0100)]
[ARM] Make the assabet machine always use the same uart mapping.
Traditionally, the Assabet reverses its mapping of UART1 and UART3
when the Neponset board is connected. This can be (a) confusing
and (b) annoying when the boot loader uses UART1. We therefore
have a fixed mapping between the ttySA names and the physical
UARTs on this platform.
Russell King [Sun, 13 Oct 2002 17:36:03 +0000 (18:36 +0100)]
[ARM] Convert sa1100 PCMCIA drivers to C99 initializers (Art Haas)
The patches convert drivers/pcmcia to use C99 named initializers,
and all the patches are against 2.5.42. There are 25 patches in
total, and after "cat"ing them together they're more than 20K, so
I'm sending the patches as a compressed attachment. The patches
were CC'd to Linus in the first mail that bounced.
Russell King [Sun, 13 Oct 2002 17:28:03 +0000 (18:28 +0100)]
[ARM ADFS] C99 designated initialisers (Patch from Art Haas)
Here's a small set of patches that switch the code to use C99
designated initializers. Patches are against 2.5.42.
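
For reference, a sketch of what such a conversion looks like, assuming the old code used the GNU "field: value" initializer extension (the log does not say). The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical operations table, for illustration only. */
struct toy_ops {
    int (*open)(void);
    int (*release)(void);
    const char *name;
};

static int toy_open(void)
{
    return 0;
}

/*
 * Old GNU-style form (obsolete GCC extension):
 *
 *     static const struct toy_ops toy_ops = {
 *             open:  toy_open,
 *             name:  "toy",
 *     };
 */

/* C99 designated initializers; members not named are zero-initialized. */
static const struct toy_ops toy_ops = {
    .open = toy_open,
    .name = "toy",
};
```

Both forms name the members explicitly, so the change is syntactic; the C99 spelling is the one the language standard defines.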
Russell King [Sun, 13 Oct 2002 17:02:58 +0000 (18:02 +0100)]
[ARM] Update acorn scsi code wrt global irq and bitops
This cset removes the global irq handling in the AcornSCSI driver,
and makes the target type for bitops an unsigned long array rather
than an unsigned char array.
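
A userspace sketch of why the target type matters, assuming bit operations that index whole unsigned long words, as the kernel's bitops do. The toy_* helpers are hypothetical stand-ins and are not atomic.

```c
#include <limits.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bit nr lives in word nr / BITS_PER_LONG, at position nr % BITS_PER_LONG. */
static void toy_set_bit(unsigned int nr, unsigned long *addr)
{
    addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static int toy_test_bit(unsigned int nr, const unsigned long *addr)
{
    return (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}
```

A bitmap for n bits is then declared as "unsigned long map[BITS_TO_LONGS(n)]", which guarantees both the size and the alignment that word-sized accesses need. An unsigned char array guarantees neither.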
Russell King [Sun, 13 Oct 2002 16:49:52 +0000 (17:49 +0100)]
[ARM] Convert boot-time memory permission selection to table.
This removes a compilation warning and makes the code smaller.
It is also more obvious what's going on.
Russell King [Sun, 13 Oct 2002 16:43:23 +0000 (17:43 +0100)]
[ARM] Remove non-existent USB gadget code from mach-sa1100/Makefile
The USB gadget code now lives in arch/arm/mach-sa1100/usb, and
isn't in a mergeable state. We remove the old makefile entries
which are never going to be satisfied, and leave a placeholder for
the usb directory.
Russell King [Sun, 13 Oct 2002 16:38:43 +0000 (17:38 +0100)]
[ARM] dump_stack and show_trace_task
dump_stack() got used by the generic code. Call our version
__dump_stack since we're running out of other descriptive names.
Allow show_trace_task to show the backtrace for the current
thread.
Russell King [Sun, 13 Oct 2002 16:32:58 +0000 (17:32 +0100)]
[ARM] Rudimentary support for Thumb ptracing.
Add rudimentary support for Thumb ptracing; we aren't able to single
step through thumb branches yet, but this change provides enough
infrastructure to make this possible.
Russell King [Sun, 13 Oct 2002 16:08:51 +0000 (17:08 +0100)]
[ARM] Fix up NCR5380-based Acorn SCSI drivers
This cset updates (as much as is possible) the NCR5380-based Acorn
SCSI drivers, mainly converting them to the new error handling code.
However, they still don't build due to errors in NCR5380.
Russell King [Sun, 13 Oct 2002 15:57:58 +0000 (16:57 +0100)]
[ARM] Remove old Acorn iomd-based keyboard and mouse drivers.
Vojtech has patches that bring their counterparts in the input
subsystem up to date (and into a working state) so these drivers
are no longer required.
Russell King [Sun, 13 Oct 2002 15:34:28 +0000 (16:34 +0100)]
[ARM] Acorn serial port driver update
This cset combines the Atomwide and The Serial Port 16550 driver
modules into one "8250_acorn.c" driver. This new module takes full
advantage of the LDM-based expansion card facilities.
Russell King [Sun, 13 Oct 2002 15:17:51 +0000 (16:17 +0100)]
[ARM] Update Acorn ethernet expansion cards
This cset implements validity checks on the ethernet MAC address when
the device is opened, and refuses to open the device if this check
fails. We also provide the set_mac_address method to allow ifconfig
to change the mac address to something valid.
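
The log does not say which checks are made. A common definition of a valid station address is "not all zeroes and not a multicast/broadcast address", and the sketch below assumes that. toy_open() is a hypothetical stand-in for the driver's open method.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Valid unicast station address: non-zero and group bit clear. */
static bool mac_is_valid(const uint8_t addr[ETH_ALEN])
{
    bool all_zero = true;

    for (int i = 0; i < ETH_ALEN; i++)
        if (addr[i] != 0)
            all_zero = false;

    /* Bit 0 of the first octet marks multicast; broadcast
     * (ff:ff:ff:ff:ff:ff) has it set too. */
    bool multicast = (addr[0] & 0x01) != 0;

    return !all_zero && !multicast;
}

/* Hypothetical open path: refuse to bring the device up with a bad MAC.
 * The error code is an assumption. */
static int toy_open(const uint8_t addr[ETH_ALEN])
{
    if (!mac_is_valid(addr))
        return -EINVAL;
    return 0;
}
```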
In addition, the driver is converted from the old expansion card
discovery methods to the new device model driver framework.
Russell King [Sun, 13 Oct 2002 14:35:29 +0000 (15:35 +0100)]
Convert acorn expansion card probing code to the Linux device model.
Provide LDM-based driver registration/removal interface for drivers
to use. We make the old device discovery code ignore devices
claimed via the LDM framework. However, the LDM framework ignores
devices that may be in use by the old device discovery code. This
is fine since the only devices that will still use the old discovery
code will be SCSI drivers.
Currently, we don't provide a useful dev.name entry.
Russell King [Sun, 13 Oct 2002 13:32:13 +0000 (14:32 +0100)]
[ARM] cpufreq updates for ARM
This updates the Integrator cpufreq code to use the new interfaces,
and makes the sa1100 cpufreq round up the requested frequency.
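
A sketch of "round up the requested frequency", assuming the driver picks from an ascending table of supported speeds. The function name and table contents are hypothetical.

```c
#include <stddef.h>

/*
 * Return the lowest supported frequency that is >= req_khz.
 * If the request exceeds every entry, fall back to the highest one.
 * 'table' must be sorted in ascending order.
 */
static unsigned int round_up_khz(const unsigned int *table, size_t n,
                                 unsigned int req_khz)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] >= req_khz)
            return table[i];
    return n ? table[n - 1] : 0;
}
```

Rounding up rather than down means the CPU never runs slower than the caller asked for.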
Russell King [Sun, 13 Oct 2002 13:00:14 +0000 (14:00 +0100)]
[ARM] Update neponset/sa1111 for Linux device model updates.
This updates the neponset and sa1111 support to use the new system
device infrastructure in the Linux device model.
Russell King [Sun, 13 Oct 2002 12:47:15 +0000 (13:47 +0100)]
[ARM] Optimise ARM TLB handling
Sanitise includes of asm/tlbflush.h, asm/cacheflush.h, asm/proc-fns.h
Implement ARM-specific TLB "shootdown" code. It turns out that it
is overall more efficient to unconditionally invalidate the whole
TLB rather than entry by entry when removing areas.
Russell King [Sun, 13 Oct 2002 12:22:31 +0000 (13:22 +0100)]
[ARM] Update RiscPC decompressor for PIC changes
This cset fixes the RiscPC decompressor code for the PIC changes.
We use a pointer to a structure rather than a structure to access
params. With a PIC decompressor, the address of the structure gets
PIC-ified which is not what we want.
Russell King [Sun, 13 Oct 2002 12:05:47 +0000 (13:05 +0100)]
[ARM] Update pcibios_enable_device, supply pci_mmap_page_range()
Update pcibios_enable_device to only enable requested resources,
mainly for IDE. Supply a pci_mmap_page_range() function to allow
user space to mmap PCI regions.
Russell King [Sun, 13 Oct 2002 11:36:29 +0000 (12:36 +0100)]
[ARM] Ensure deselected config variables are defined to 'n'
To keep the Config.in files relatively clean, we use the
following construct:
if [ "$CONFIG_ARM" = "y" ]; then
dep_tristate 'Foo' CONFIG_FOO $CONFIG_BAR
fi
where CONFIG_BAR is some machine implementation or high-level
chip support configuration option. If CONFIG_BAR is left
empty, then the tristate is offered to the user, which isn't
what we want. Defining CONFIG_BAR to 'n' prevents the option
being offered.
This is a rule I generally try to implement within
arch/arm/config.in.
This cset makes CONFIG_SA1111 and CONFIG_ARM_THUMB behave that
way.
Russell King [Sun, 13 Oct 2002 11:29:00 +0000 (12:29 +0100)]
[ARM] Allow CONFIG_ZBOOT_ROM=y image to be relocated to RAM
Since the decompressor supports PIC, even for CONFIG_ZBOOT_ROM,
we can easily allow an image which has been linked to run at
a particular address in ROM to be moved to RAM. We just need
to make sure that we don't relocate the GOT entries for the
BSS segment.
This cset also implements sa1100-based debugging for the
decompressor.
Russell King [Sun, 13 Oct 2002 11:20:40 +0000 (12:20 +0100)]
[ARM] Move TEXTADDR and DATAADDR out of vmlinux.lds.S
These two variables are used by more than just the linker;
they're also used by head.S to know where it can safely
place the page tables. We therefore need to export it
from the Makefile.
These are also highly machine dependent; we don't want
to duplicate the same set of conditionals for cpp and
for the makefiles.
arch/arm/Makefile also contained a stray close-paren. I'm
submitting this one to the lost property office.
We also always pass -mno-fpu to the assembler; this
guarantees that any floating point will be caught.
Matthew Dharm [Sun, 13 Oct 2002 06:48:00 +0000 (23:48 -0700)]
[PATCH] usb-storage: cache pipe values
This patch to usb-storage makes all pipe values used by the driver an
unsigned int (like they should be), and caches them in the device data
structure.
Barry K. Nathan [Sun, 13 Oct 2002 06:44:18 +0000 (23:44 -0700)]
[PATCH] USB: 2.5.42 partial fix for older pl2303
On Sat, Oct 12, 2002 at 06:16:44PM -0700, Greg KH wrote:
> Now, would you mind taking a look at 2.5, and fixing this there too? :)
Here's a half-successful attempt. With this patch, the device no longer
appears twice, and it always works on the first open (at least, so I've
observed up to this point). The open following a successful open usually
fails (roughly speaking, it appears to play dead), and the open following
a failed open usually (always?) succeeds.
So, on my PL-2303, it's not perfect but it's certainly livable.
This patch is based on the one I did for 2.4, and in fact, this code
functions when it's plugged into 2.4.20-pre10 instead of 2.5.42. In that
scenario, the opens work 100% of the time.
I'd be interested in suggestions or comments regarding this patch.
Anyone who has a PL-2303 working under 2.5 might want to try this patch
just to make sure it doesn't kill their working setup.
The Documentation/Changes in the summary has been updated to require
make 3.78 but the other references were not updated. And 3.78 really
is required. This patch updates the other locations.
Andrew Morton [Sun, 13 Oct 2002 02:33:20 +0000 (19:33 -0700)]
[PATCH] reduce the dirty threshold when there's a lot of mapped
Dirty memory thresholds are currently set by /proc/sys/vm/dirty_ratio.
Background writeout levels are controlled by
/proc/sys/vm/dirty_background_ratio.
Problem is that these levels are hard to get right - they are too
static. If there is a lot of mapped memory around then the 40%
clamping level causes too much dirty data. We do lots of scanning in
page reclaim, and the VM generally starts getting into distress. Extra
swapping, extra page unmapping.
It would be much better to simply tell the caller of write(2) to slow
down - to write out their dirty data sooner, to make those written
pages trivially reclaimable. Penalise the offender, not the innocent
page allocators.
This patch changes the writer throttling code so that we clamp down
much harder on writers if there is a lot of mapped memory in the
machine. We only permit memory dirtiers to dirty up to 50% of unmapped
memory before forcing them to clean their own pagecache.
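
A sketch of the clamp, assuming it is combined with the existing dirty_ratio limit by taking the smaller of the two. The function name is hypothetical and the min() combination is a guess at the shape, not the patch's code.

```c
/*
 * Dirty limit in pages: the usual dirty_ratio percentage of total
 * memory, but never more than 50% of unmapped memory.
 */
static unsigned long dirty_limit_pages(unsigned long total_pages,
                                       unsigned long mapped_pages,
                                       unsigned int dirty_ratio)
{
    if (mapped_pages > total_pages)
        mapped_pages = total_pages;

    unsigned long by_ratio    = total_pages * dirty_ratio / 100;
    unsigned long by_unmapped = (total_pages - mapped_pages) / 2;

    return by_ratio < by_unmapped ? by_ratio : by_unmapped;
}
```

With 1000 pages and dirty_ratio=40, a machine with 100 mapped pages keeps the 400-page limit, while one with 600 mapped pages is clamped to 200.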
Andrew Morton [Sun, 13 Oct 2002 02:33:11 +0000 (19:33 -0700)]
[PATCH] start anon pages on the active list
We're currently adding anon pages to the inactive list. But they're
all referenced, so when they reach the tail of the inactive list the
kernel will always then bump them up to the active list.
Not only does this waste CPU, but it leads to inactive/active
imbalance. We end up with enormous sequences of unreclaimable,
to-be-activated pages hitting the tail of the LRU and large amounts of
scanning need to be done. Which upsets the VM, making it think that it
is "under distress".
Andrew Morton [Sun, 13 Oct 2002 02:33:06 +0000 (19:33 -0700)]
[PATCH] reduced and tunable swappiness
/proc/sys/vm/swappiness controls the VM's tendency to unmap pages and to
swap things out.
100 -> basically current 2.5 behaviour
0 -> not very swappy at all
The mechanism which is used to control swappiness is: to be reluctant
to bring mapped pages onto the inactive list. Prefer to reclaim
pagecache instead.
The control for that mechanism is as follows:
- If there is a large amount of mapped memory in the machine, we
prefer to bring mapped pages onto the inactive list.
- If page reclaim is under distress (more scanning is happening) then
prefer to bring mapped pages onto the inactive list. This is
basically the 2.4 algorithm, really.
- If the /proc/sys/vm/swappiness control is high then prefer to bring
mapped pages onto the inactive list.
The implementation is simple: calculate the above three things as
percentages and add them up. If that's over 100% then start reclaiming
mapped pages.
The `proportion of mapped memory' is downgraded so that we don't swap
just because a lot of memory is mapped into pagetables - we still need
some VM distress before starting to swap that memory out.
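
The sum can be sketched as below. All three inputs are percentages. Halving the mapped ratio is one assumed way of "downgrading" it, and treating a sum of exactly 100 as reaching the threshold is also an assumption, chosen so that swappiness=100 always reclaims mapped pages.

```c
#include <stdbool.h>

/*
 * mapped_ratio: percentage of memory that is mapped (0..100)
 * distress:     how hard page reclaim is struggling (0..100)
 * swappiness:   the /proc/sys/vm/swappiness setting (0..100)
 */
static bool should_reclaim_mapped(unsigned int mapped_ratio,
                                  unsigned int distress,
                                  unsigned int swappiness)
{
    /* Downgrade the mapped proportion so it cannot trigger
     * swapping on its own. */
    unsigned int swap_tendency = mapped_ratio / 2 + distress + swappiness;

    return swap_tendency >= 100;
}
```

With swappiness=0 and a fully mapped machine, the tendency is only 50, so at least 50% distress is needed before mapped pages are touched.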
For a while I was adding a little bias so that we prefer to unmap
file-backed memory before swapping out anon memory. Because usually
file backed memory can be evicted and reestablished with one I/O, not
two. It was unmapping executable text too easily, so here I just treat
them equally.
Andrew Morton [Sun, 13 Oct 2002 02:33:01 +0000 (19:33 -0700)]
[PATCH] propagate pte reference into page reference during
zap_pte_range() is currently just dropping the pte. Change it to mark
the page referenced if the pte says it was. This has the effect of
delaying the eviction of recently-mapped pagecache.
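
In outline, with hypothetical toy types standing in for the real pte and struct page:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a page table entry and a page. */
struct toy_pte  { bool present; bool young; };
struct toy_page { bool referenced; };

/*
 * Drop one pte. Before it goes, carry its "young" (accessed) bit
 * over to the page, so that reclaim still sees the page as
 * recently used.
 */
static void toy_zap_pte(struct toy_pte *pte, struct toy_page *page)
{
    if (pte->present && pte->young)
        page->referenced = true;

    pte->present = false;
    pte->young = false;
}
```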
This means that we're currently marking the page accessed when it is
first faulted in as well as when we drop it from pagetables. Which
matches up with the (strange) behaviour of the VM: it reclaims
PageReferenced pagecache pages off the inactive list.
Probably, it makes sense to remove the mark_page_accessed() from
filemap_nopage() and just use the pte bits everywhere. Reviewing all
the PageReferenced()/mark_page_accessed() usage is on my todo list.
Andrew Morton [Sun, 13 Oct 2002 02:32:57 +0000 (19:32 -0700)]
[PATCH] small-machine writer throttling fix
The current writer throttling in balance_dirty_pages() assumes that the
writer will be effectively throttled on request queues.
That works fine when the amount of data which can be placed into a
queue is "much less than" total memory.
But if the machine has a small amount of memory, or many disks, or has
large request queues, or large requests, it can go wrong.
For example, with mem=96m and dirty_async_ratio=15, we want to be able
to clamp dirty+writeback memory at 15 megabytes. But it doesn't work,
because a single SCSI request queue can hold 40 megs or more. The
heavy writer keeps on dirtying memory until that queue fills up.
So add a test for that - if we did some writeback, and we're *still*
over the dirty+writeback threshold then make the caller take an
explicit nap on some writes terminating. And keep on doing that until
the dirty+writeback memory subsides.
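
A toy model of that loop. The struct, the toy_* helpers and the page counts are hypothetical; "napping" is modelled as some queued writes completing.

```c
struct toy_vm {
    unsigned long dirty;      /* pages dirtied but not yet queued */
    unsigned long writeback;  /* pages queued for I/O */
    unsigned long threshold;  /* limit on dirty + writeback */
};

static int over_threshold(const struct toy_vm *vm)
{
    return vm->dirty + vm->writeback > vm->threshold;
}

/* Queue up to nr dirty pages for I/O. This moves pages from dirty to
 * writeback, so dirty + writeback is unchanged. */
static void toy_start_writeback(struct toy_vm *vm, unsigned long nr)
{
    unsigned long n = vm->dirty < nr ? vm->dirty : nr;

    vm->dirty -= n;
    vm->writeback += n;
}

/* Sleep until some writes terminate: up to nr queued pages complete. */
static void toy_nap_on_writes(struct toy_vm *vm, unsigned long nr)
{
    unsigned long n = vm->writeback < nr ? vm->writeback : nr;

    vm->writeback -= n;
}

/* Returns how many naps the writer had to take. */
static unsigned int toy_balance_dirty_pages(struct toy_vm *vm,
                                            unsigned long chunk)
{
    unsigned int naps = 0;

    if (chunk == 0)
        return 0;

    while (over_threshold(vm)) {
        toy_start_writeback(vm, chunk);
        /* Still over the limit after starting writeback: wait for
         * some of it to finish instead of dirtying more memory. */
        if (over_threshold(vm)) {
            toy_nap_on_writes(vm, chunk);
            naps++;
        }
    }
    return naps;
}
```

Starting from 40 dirty pages with a threshold of 15 and a chunk of 10, the writer naps three times before dirty+writeback drops to 10.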