Andrew Morton [Tue, 26 Nov 2002 01:57:32 +0000 (17:57 -0800)]
[PATCH] reduced latency in dentry and inode cache shrinking
Shrinking a huge number of dentries or inodes can hold dcache_lock or
inode_lock for a long time. Not only does this hold off preemption -
holding those locks basically shuts down the whole VFS.
A neat fix for all such caches is to chunk the work up at the
shrink_slab() level.
I made the chunksize pretty small, for scalability reasons - avoid
holding the lock for too long so another CPU can come in, acquire it
and go off to do some work.
Andrew Morton [Tue, 26 Nov 2002 01:57:17 +0000 (17:57 -0800)]
[PATCH] swapoff accounting cleanup
From Hugh. Remove some strangeness in the swapoff path.
"it dates from the days when that block set *swap_map to 0, but in
2.4.13 I changed it to set *swap_map to 1 and delete_from_swap_cache
afterwards: it's been wrong ever since, hasn't it? swap_list_locking
can go, it was there to guard nr_swap_pages for si_swapinfo; the
swap_device_locking has to stay because swap_map is an array of
unsigned _shorts_."
Andrew Morton [Tue, 26 Nov 2002 01:57:13 +0000 (17:57 -0800)]
[PATCH] realtime swapspace accounting
There are a couple of statistical functions which scan the entire swap
map counting things up, to display in /proc.
On my machine, these hold spinlocks for 19 milliseconds which is
unacceptable from a scheduling latency point of view.
And an application which sits in a loop reading /proc/swaps on a large
machine is probably a decent denial-of-service attack - it will limit
swap allocations to tens of pages per second.
So add a counter to swap_info_struct and use it to track how many pages
are currently in use, so those reporting functions don't need to add
them all up.
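The accounting change can be sketched like this. The field and function names (inuse_pages, swap_alloc, swap_free) are illustrative stand-ins for the real swap code; the idea is that the counter is maintained at allocate/free time, so the /proc reporting path never has to scan the map.

```c
#include <assert.h>

#define MAP_SIZE 1024

struct swap_info {
	unsigned short map[MAP_SIZE]; /* per-slot use counts */
	int inuse_pages;              /* maintained incrementally */
};

void swap_alloc(struct swap_info *si, int slot)
{
	if (si->map[slot]++ == 0)
		si->inuse_pages++;    /* slot went free -> in use */
}

void swap_free(struct swap_info *si, int slot)
{
	if (--si->map[slot] == 0)
		si->inuse_pages--;    /* slot went in use -> free */
}

/* The old O(map size) scan, shown only as what the counter avoids. */
int count_by_scan(const struct swap_info *si)
{
	int n = 0;
	for (int i = 0; i < MAP_SIZE; i++)
		if (si->map[i])
			n++;
	return n;
}
```

Reading the counter is O(1) and needs no long spinlock hold, which is what closes the /proc/swaps denial-of-service window described above.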
Andrew Morton [Tue, 26 Nov 2002 01:57:08 +0000 (17:57 -0800)]
[PATCH] Add some low-latency scheduling points
This is the first in a little batch of patches which address long-held
locks in the VFS/MM layer which are affecting our worst-case scheduling
latency, and are making CONFIG_PREEMPT not very useful.
We end up with a worst-case of 500 microseconds at 500MHz, which is
very good. Unless you do an exit with lots of mmapped memory.
unmap_page_range() needs work.
Some of these patches also add rescheduling points for non-preemptible
kernels - where I felt that the code path could be long enough to be
perceptible.
Three places in the generic pagecache functions need manual
rescheduling points even for non-preemptible kernels:
- generic_file_read() (Can hold the CPU for seconds)
- generic_file_write() (ditto)
- filemap_fdatawait(). This won't hold the CPU for so long, but it
can walk many thousands of pages under the lock. It needs a lock
drop and scheduling point for both preemptible and non-preemptible
kernels. (This one's a bit ugly...)
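The lock-drop-and-reschedule pattern in such a walk looks roughly like this. It is a userspace sketch, not filemap_fdatawait() itself: the lock functions and cond_resched() are stubs, and the break interval of 256 pages is an arbitrary illustration.

```c
#include <assert.h>

static int resched_count;

static void spin_lock(void)    { }   /* stand-ins for the real lock */
static void spin_unlock(void)  { }
static void cond_resched(void) { resched_count++; }

/* Walk 'nr_pages' pages under the lock, with periodic lock drops. */
int wait_on_pages(int nr_pages)
{
	int done = 0;

	spin_lock();
	while (done < nr_pages) {
		done++;                    /* wait on / process one page */
		if ((done % 256) == 0) {   /* latency break every 256 pages */
			spin_unlock();
			cond_resched();    /* manual scheduling point */
			spin_lock();
		}
	}
	spin_unlock();
	return done;
}
```

The lock drop is what makes this work on non-preemptible kernels too: without it, cond_resched() could not be called while the spinlock is held.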
Andrew Morton [Tue, 26 Nov 2002 01:57:03 +0000 (17:57 -0800)]
[PATCH] reduced context switch rate in writeback
pdflush writes back chunks of ~1000 pages. It currently takes a short
nap if it writes back no pages at all. That could cause it to write
back lots of small batches of pages, as it bounces against a congested
queue.
Change it to sleep if it failed to write back the entire batch against
a congested queue. This reduces the context switch rate a little.
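The difference between the two back-off policies can be sketched as follows. This is a toy simulation, not pdflush: the batch size, the per-attempt queue capacity, and the function names are all illustrative.

```c
#include <assert.h>

#define BATCH 1024   /* illustrative writeback batch size */

/* Old policy: nap only if nothing at all was written. */
int should_sleep_old(int written) { return written == 0; }

/* New policy: nap if the congested queue took less than the batch. */
int should_sleep_new(int written) { return written < BATCH; }

/* Simulate writeback against a queue that accepts only 'cap' pages
 * per attempt; return how many write passes ran before sleeping. */
int passes_before_sleep(int dirty, int cap, int (*should_sleep)(int))
{
	int passes = 0;

	while (dirty > 0) {
		int written = dirty < cap ? dirty : cap;

		dirty -= written;
		passes++;
		if (should_sleep(written))
			break;          /* go to sleep instead of respinning */
	}
	return passes;
}
```

Against a congested queue that only accepts a trickle of pages, the old policy bounces in many small passes (each a wakeup and context switch), while the new policy backs off after the first short write.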
The context switch rate is still fairly high (150/sec) - this appears
to be due to add_disk_randomness() scheduling a work function.
Andrew Morton [Tue, 26 Nov 2002 01:56:58 +0000 (17:56 -0800)]
[PATCH] shrink task_struct by removing per_cpu utime and stime
Patch from Bill Irwin. It has the potential to break userspace
monitoring tools a little bit, and I'm rather uncertain about
how useful the per-process per-cpu accounting is.
Bill sent this out as an RFC on July 29:
"These statistics severely bloat the task_struct and nothing in
userspace can rely on them as they're conditional on CONFIG_SMP. If
anyone is using them (or just wants them around), please speak up."
And nobody spoke up.
If we apply this, the contents of /proc/783/cpu will go from
cpu 1 1
cpu0 0 0
cpu1 0 0
cpu2 1 1
cpu3 0 0
to
cpu 1 1
And we shall save 256 bytes from the ia32 task_struct.
On my SMP build with NR_CPUS=32:
Without this patch, sizeof(task_struct) is 1824, slab uses a 1-order
allocation and we are getting 2 task_structs per page.
With this patch, sizeof(task_struct) is 1568, slab uses a 2-order
allocation and we are getting 2.5 task_structs per page.
So it seems worthwhile.
(Maybe this highlights a shortcoming in slab: for the 1824-byte case
it could have used a 0-order allocation.)
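Where the 256 bytes come from is easy to see: two per-cpu arrays of 32 counters, at 4 bytes each on ia32, is 2 * 32 * 4 = 256. The structs below are toy stand-ins for task_struct, shown only to make the arithmetic concrete (on a 64-bit build the saving would be correspondingly larger, since long is 8 bytes there).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 32

/* Toy stand-in for the old layout: totals plus per-cpu arrays. */
struct task_with_percpu {
	long utime, stime;            /* process totals */
	long per_cpu_utime[NR_CPUS];  /* per-cpu breakdown (removed) */
	long per_cpu_stime[NR_CPUS];
};

/* Toy stand-in for the new layout: totals only. */
struct task_without_percpu {
	long utime, stime;
};
```

The saving is exactly the two arrays; everything userspace-visible except the per-cpu lines of /proc/PID/cpu is derived from the totals, which stay.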
This patch (against 2.5.49) updates Documentation/kernel-parameters.txt
to the current state of the kernel. It had been somewhat abandoned
lately, so I did my best, but it's possible that I still missed some of
the options - thus, if you notice your favourite boot option missing
there, please speak up. Note also that I will probably send another
update after a few further kernel releases.
Also, I attempted to introduce a uniform format for the entries: I
added a format description wherever I was able to find it out and
decipher it, and I also added a large number of external links to the
headers of the source files or to README-like files where the options
are described in more detail. This way, hopefully this file has a
chance to be actually usable for users ;-).
There are almost certainly some entries I missed - there was a huge
number of them, and the main reason is that some of the boot options
don't use the __setup macro, which is what I grep'd for.
I hope the patch is OK; there should be no problems with it. Please apply.
Note that this is the fourth submission of the patch - I took the
opportunity and updated the patch from 2.5.48 to 2.5.49. AFAIK mutt
shouldn't mangle the patch in any way, so it should apply cleanly to
your tree, Linus.
Andrew Morton [Tue, 26 Nov 2002 01:56:47 +0000 (17:56 -0800)]
[PATCH] kernel_stat cleanup
Patch from Dipankar Sarma <dipankar@gamebox.net>
This is a trivial cleanup removing two old unused macros from
kernel_stat.h that made no sense with the new per-CPU kstat.
Also included a few finicky coding style changes. Please apply.
Andrew Morton [Tue, 26 Nov 2002 01:56:31 +0000 (17:56 -0800)]
[PATCH] blk_run_queues() locking fix
blk_run_queues() places queues onto a local list without locking. But
interrupt-time activity in scsi_lib.c will replug these queues, which
involves accessing that list which blk_run_queues() is walking.
Net effect: list corruption and an oops. Very hard to reproduce...
So hold the lock while blk_run_queues() is walking the local list.
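The shape of the fix can be sketched with a minimal stand-in for the kernel's list handling. This is not the real blk_run_queues(); the lock stubs and the singly linked list are illustrative. The point is that the splice onto the local list and the walk of that list both happen under the lock, because interrupt-time replugging can still reach those queues.

```c
#include <assert.h>
#include <stddef.h>

struct queue {
	struct queue *next;
	int ran;
};

static void spin_lock(void)   { }   /* stand-ins for the real lock */
static void spin_unlock(void) { }

/* Run every plugged queue; splice and walk both under the lock so
 * interrupt-time replug cannot corrupt the list mid-walk. */
int run_queues(struct queue **plugged)
{
	struct queue *q, *local;
	int ran = 0;

	spin_lock();
	local = *plugged;          /* splice onto a local list */
	*plugged = NULL;
	for (q = local; q; q = q->next) {  /* walk, still locked */
		q->ran = 1;
		ran++;
	}
	spin_unlock();
	return ran;
}
```

The unlocked version only differed in dropping the lock before the for loop, which left a window for an interrupt to move a queue back onto a plug list while the walk was still traversing it.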
The patch was acked by Jens. It also uninlines a few largeish
functions and adds a couple of WARN_ON()s in code which worried me.