The summary in Documentation/Changes has been updated to require
make 3.78, but the other references were not updated. And 3.78 really
is required, so this patch updates the other locations.
Andrew Morton [Sun, 13 Oct 2002 02:33:20 +0000 (19:33 -0700)]
[PATCH] reduce the dirty threshold when there's a lot of mapped
Dirty memory thresholds are currently set by /proc/sys/vm/dirty_ratio.
Background writeout levels are controlled by
/proc/sys/vm/dirty_background_ratio.
Problem is that these levels are hard to get right - they are too
static. If there is a lot of mapped memory around then the 40%
clamping level causes too much dirty data. We do lots of scanning in
page reclaim, and the VM generally starts getting into distress. Extra
swapping, extra page unmapping.
It would be much better to simply tell the caller of write(2) to slow
down - to write out their dirty data sooner, to make those written
pages trivially reclaimable. Penalise the offender, not the innocent
page allocators.
This patch changes the writer throttling code so that we clamp down
much harder on writers if there is a lot of mapped memory in the
machine. We only permit memory dirtiers to dirty up to 50% of unmapped
memory before forcing them to clean their own pagecache.
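A minimal sketch of the clamp in C, assuming page counts are already available. The function name and parameters are hypothetical and this is not the kernel's balance_dirty_pages() code; it only caps the configured dirty ratio at half the percentage of unmapped memory:

```c
#include <assert.h>

/*
 * Sketch only: hypothetical helper, not kernel code.
 * Returns how many pages a writer may dirty before it is forced
 * to write back its own pagecache.
 */
static long dirty_limit_pages(long total_pages, long mapped_pages,
                              int dirty_ratio /* e.g. /proc/sys/vm/dirty_ratio */)
{
	assert(total_pages > 0 && mapped_pages >= 0 && mapped_pages <= total_pages);

	/* Percentage of memory that is not mapped into pagetables. */
	int unmapped_ratio = (int)(100 - (mapped_pages * 100) / total_pages);

	/* Permit dirtying at most 50% of unmapped memory. */
	if (dirty_ratio > unmapped_ratio / 2)
		dirty_ratio = unmapped_ratio / 2;

	return (dirty_ratio * total_pages) / 100;
}
```

With 80% of a 1000-page machine mapped, a dirty_ratio of 40 is cut to 10 (half of the 20% that is unmapped), so only 100 pages may be dirtied.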
Andrew Morton [Sun, 13 Oct 2002 02:33:11 +0000 (19:33 -0700)]
[PATCH] start anon pages on the active list
We're currently adding anon pages to the inactive list. But they're
all referenced, so when they reach the tail of the inactive list the
kernel will always then bump them up to the active list.
Not only does this waste CPU, but it leads to inactive/active
imbalance. We end up with enormous sequences of unreclaimable,
to-be-activated pages hitting the tail of the LRU, so large amounts of
scanning need to be done. That upsets the VM, making it think that it
is "under distress".
Andrew Morton [Sun, 13 Oct 2002 02:33:06 +0000 (19:33 -0700)]
[PATCH] reduced and tunable swappiness
/proc/sys/vm/swappiness controls the VM's tendency to unmap pages and to
swap things out.
100 -> basically current 2.5 behaviour
0 -> not very swappy at all
The mechanism used to control swappiness is to be reluctant to bring
mapped pages onto the inactive list, and to prefer reclaiming
pagecache instead.
The control for that mechanism is as follows:
- If there is a large amount of mapped memory in the machine, we
prefer to bring mapped pages onto the inactive list.
- If page reclaim is under distress (more scanning is happening) then
prefer to bring mapped pages onto the inactive list. This is
basically the 2.4 algorithm, really.
- If the /proc/sys/vm/swappiness control is high then prefer to bring
mapped pages onto the inactive list.
The implementation is simple: calculate the above three things as
percentages and add them up. If that's over 100% then start reclaiming
mapped pages.
The `proportion of mapped memory' is downgraded so that we don't swap
just because a lot of memory is mapped into pagetables - we still need
some VM distress before starting to swap that memory out.
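The heuristic above can be sketched in C. The names are hypothetical, distress_pct is an assumed 0-100 measure of scanning pressure, halving the mapped ratio is one assumed way of "downgrading" it, and treating a sum of exactly 100 as over the threshold is also an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the "add three percentages" rule; hypothetical names.
 * distress_pct: assumed 0 (no scanning pressure) .. 100 (maximum).
 * swappiness:   value of /proc/sys/vm/swappiness (0..100).
 */
static bool should_reclaim_mapped(long mapped_pages, long total_pages,
                                  int distress_pct, int swappiness)
{
	assert(total_pages > 0);
	int mapped_ratio = (int)((mapped_pages * 100) / total_pages);

	/* The mapped proportion is downgraded (halved in this sketch) so that
	 * lots of mapped memory alone, with little distress, does not swap. */
	int swap_tendency = mapped_ratio / 2 + distress_pct + swappiness;

	/* Reaching 100 starts reclaiming mapped pages (boundary is assumed). */
	return swap_tendency >= 100;
}
```

For example, with swappiness 60 and half of memory mapped, the sum is 85 with no distress, so mapped pages are left alone until scanning pressure adds at least another 15.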
For a while I was adding a little bias so that we prefer to unmap
file-backed memory before swapping out anon memory, because file-backed
memory can usually be evicted and reestablished with one I/O, not
two. But it was unmapping executable text too easily, so here I just
treat them equally.
Andrew Morton [Sun, 13 Oct 2002 02:33:01 +0000 (19:33 -0700)]
[PATCH] propagate pte reference into page reference during
zap_pte_range() is currently just dropping the pte. Change it to mark
the page referenced if the pte says it was. This has the effect of
delaying the eviction of recently-mapped pagecache.
This means that we're currently marking the page accessed when it is
first faulted in as well as when we drop it from pagetables, which
matches up with the (strange) behaviour of the VM: it reclaims
PageReferenced pagecache pages off the inactive list.
Probably, it makes sense to remove the mark_page_accessed() from
filemap_nopage() and just use the pte bits everywhere. Reviewing all
the PageReferenced()/mark_page_accessed() usage is on my todo list.
Andrew Morton [Sun, 13 Oct 2002 02:32:57 +0000 (19:32 -0700)]
[PATCH] small-machine writer throttling fix
The current writer throttling in balance_dirty_pages() assumes that the
writer will be effectively throttled on request queues.
That works fine when the amount of data which can be placed into a
queue is "much less than" total memory.
But if the machine has a small amount of memory, or many disks, or has
large request queues, or large requests, it can go wrong.
For example, with mem=96m and dirty_async_ratio=15, we want to be able
to clamp dirty+writeback memory at 15 megabytes. But it doesn't work,
because a single SCSI request queue can hold 40 megs or more. The
heavy writer keeps on dirtying memory until that queue fills up.
So add a test for that - if we did some writeback, and we're *still*
over the dirty+writeback threshold then make the caller take an
explicit nap on some writes terminating. And keep on doing that until
the dirty+writeback memory subsides.
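The throttling loop can be modelled with a toy simulation. Everything here is hypothetical (struct toy_mem, the helpers, and a fixed number of writes completing per nap); it only shows why queueing writeback alone does not reduce dirty+writeback, and why the caller has to nap until I/O completes:

```c
#include <assert.h>

/* Toy memory state; all names are hypothetical. */
struct toy_mem {
	long dirty;		/* dirty pages */
	long writeback;		/* pages sitting in request queues */
	long io_done_per_nap;	/* writes that terminate while the caller naps */
};

/* Start writeback on up to n dirty pages: they move into the queue. */
static void toy_start_writeback(struct toy_mem *m, long n)
{
	if (n > m->dirty)
		n = m->dirty;
	m->dirty -= n;
	m->writeback += n;
}

/* Explicit nap: some queued writes terminate and leave writeback. */
static void toy_nap(struct toy_mem *m)
{
	long n = m->io_done_per_nap < m->writeback ? m->io_done_per_nap : m->writeback;
	m->writeback -= n;
}

/* Throttle the writer until dirty+writeback subsides. Returns naps taken. */
static int toy_balance_dirty(struct toy_mem *m, long threshold, long write_chunk)
{
	assert(threshold >= 0 && write_chunk > 0 && m->io_done_per_nap > 0);
	int naps = 0;

	while (m->dirty + m->writeback > threshold) {
		/* Moving pages from dirty to writeback leaves the sum unchanged,
		 * which is why a big request queue alone cannot throttle us. */
		toy_start_writeback(m, write_chunk);
		/* Still over the threshold: nap on some writes terminating. */
		toy_nap(m);
		naps++;
	}
	return naps;
}
```

Starting with 40 dirty pages, a threshold of 15, 10 pages queued per pass and 5 completing per nap, the writer naps five times before dirty+writeback falls to 15.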
Andrew Morton [Sun, 13 Oct 2002 02:32:40 +0000 (19:32 -0700)]
[PATCH] /proc/meminfo alterations for hugetlbpages
The patch from Rohit and David M-T changes the hugetlb page info in
/proc/meminfo slightly.
It makes the identifiers a little clearer while ensuring that we don't
add any identifiers which have whitespace. glibc is/shall be parsing
this information to determine the size and alignment requirements of
the hugetlb pages.
This basically means that procfs is a requirement for successful
hugetlb page usage. Not very nice, but I suspect real-world userspace
fails without procfs anyway.
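Because the identifiers contain no whitespace, each line can be parsed as a name, a colon and a number. A sketch in C that parses meminfo-style text from a buffer; meminfo_field_value is a hypothetical helper, the field names in the usage example (HugePages_Total, Hugepagesize) are as they appear in later kernels' /proc/meminfo, and the sample values are made up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch: return the numeric value of field "name" in /proc/meminfo-style
 * text ("<name>: <value> [unit]" per line), or -1 if it is absent.
 */
static long meminfo_field_value(const char *text, const char *name)
{
	assert(text != NULL && name != NULL);
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Identifiers have no spaces, so "name:" must match exactly. */
		if (strncmp(line, name, len) == 0 && line[len] == ':') {
			long value;
			if (sscanf(line + len + 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

For instance, meminfo_field_value(text, "Hugepagesize") yields the huge page size in kB, which is what a library needs to work out size and alignment requirements.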
Ben Collins [Sat, 12 Oct 2002 10:21:27 +0000 (03:21 -0700)]
[PATCH] Linux IEEE-1394 Updates
- Cleanup (purge) some of our old compat code (never touched)
- Fix dv1394 compilation warnings without devfs
- Added new config-rom handling features. Allows for on-the-fly
config-rom generation for dynamic functionality of the host nodes.
- Convert to workqueue from taskqueue interfaces. This is actually
abstracted compatibility code between tqueue/workqueue.
Linus Torvalds [Sat, 12 Oct 2002 10:15:31 +0000 (03:15 -0700)]
Fix type - it used to be "__u8 short", which previous versions
of gcc incorrectly accepted as "short". It got fixed to __u8, but
it really should be __u16.
Andi Kleen [Sat, 12 Oct 2002 10:07:58 +0000 (03:07 -0700)]
[PATCH] Misc core changes for x86-64/2.5.42
And here are all the other x86-64 changes that have accumulated in my tree.
It's various bugfixes and cleanups.
Changes:
- fix nmi watchdog
- remove local timer spreading over CPUs - it's useless here and caused many problems
- New offset.h computation from Kai
- Lots of changes for the C99 initializer syntax
- New MTRR driver from Dave & Mats
- Bugfix: kernel threads don't start with interrupts disabled anymore, which fixes
various boottime hangs (this was fixed a long time ago, but the bug crept in again
by the backdoor)
- Do %gs reload in context switch lockless
- Fix device_not_available entry point race
- New per CPU GDT layout following i386: the layout is not completely
  compatible with i386, which may cause problems with Wine in theory.
  Haven't seen any yet.
- Support disableapic option
- driverfs support removed for now because it caused crashes
- Updates for new signal setup
- Support for kallsyms
- Port TLS clone flags/syscalls: unfortunately made the context switch
even uglier than it already is.
- Security fixes for ptrace
- New in_interrupt()/atomic setup ported from i386
- New makefiles mostly from Kai
- Various updates ported from i386
Andi Kleen [Sat, 12 Oct 2002 10:05:38 +0000 (03:05 -0700)]
[PATCH] Time changes for x86-64
Some timer updates from Vojtech Pavlik for x86-64. In theory they support
HPET timing now, but the support is disabled.
Would actually need vxtime_lock() macros in the generic timer code to
protect xtime updates, but I'm leaving that out now because it's only
needed for vsyscalls and they're currently disabled.
Andi Kleen [Sat, 12 Oct 2002 10:05:32 +0000 (03:05 -0700)]
[PATCH] hotplug cpu changes for x86-64
Port of the hotplug CPU changes from i386 for x86-64. I don't expect
x86-64 hardware to support CPU hotplugging any time soon, but this makes
it all compile & work again and keeps some consistency.
Andi Kleen [Sat, 12 Oct 2002 10:05:09 +0000 (03:05 -0700)]
[PATCH] x86-64 IOMMU & PCI updates
Update for the x86-64 PCI subsystem in 2.5.42. Main new feature is PCI
IOMMU support through the K8 aperture. This allows using more than 4GB
of memory with 32-bit PCI devices. Also some other PCI changes, mostly
merges from i386.
Andi Kleen [Sat, 12 Oct 2002 10:04:57 +0000 (03:04 -0700)]
[PATCH] x86-64 - new memory map handling
New e820 memory map handling for x86-64. Move it all to a new file and
clean it up a lot. Add some simple allocator functions to deal with
holey e820 mappings cleanly. A lot of this is preparation for NUMA (which
works in 2.4, but is not ported to 2.5 yet).
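A sketch of the kind of simple allocator helper described, assuming a simplified entry layout (struct e820_entry_sketch and find_e820_area_sketch are hypothetical names, not the functions in the new file). Type 1 is usable RAM and type 2 reserved, following the BIOS e820 convention; a candidate area must fit entirely inside one usable entry, so holes are never handed out:

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM      1	/* usable memory */
#define E820_RESERVED 2	/* reserved: treated as a hole */

struct e820_entry_sketch {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

/*
 * Return the lowest address >= start where 'size' bytes, aligned to
 * 'align' (a power of two), fit within a single usable entry, or
 * UINT64_MAX if none does. Address overflow is ignored in this sketch.
 */
static uint64_t find_e820_area_sketch(const struct e820_entry_sketch *map, int n,
                                      uint64_t start, uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	uint64_t best = UINT64_MAX;

	for (int i = 0; i < n; i++) {
		if (map[i].type != E820_RAM)
			continue;	/* never allocate from holes */
		uint64_t lo = map[i].addr > start ? map[i].addr : start;
		uint64_t end = map[i].addr + map[i].size;
		lo = (lo + align - 1) & ~(align - 1);	/* round up to alignment */
		if (lo < end && end - lo >= size && lo < best)
			best = lo;	/* map may be unsorted: keep the lowest fit */
	}
	return best;
}
```

Scanning every entry and keeping the lowest fit avoids assuming the firmware map is sorted.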
Andi Kleen [Sat, 12 Oct 2002 10:04:51 +0000 (03:04 -0700)]
[PATCH] x86-64 Bootloader updates
Update the early 32bit bootloader for x86-64. This stuff is almost
completely identical to i386, except for a few Makefile changes to tell
the x86-64 toolkit to compile in 32bit mode.
Russell King [Sun, 13 Oct 2002 00:47:55 +0000 (01:47 +0100)]
[SERIAL] Fix Sparc32/64 handling of CONFIG_SERIAL_CORE{,_CONSOLE}
SPARC was unconditionally setting CONFIG_SERIAL_CORE_CONSOLE to y
and conditionally setting CONFIG_SERIAL_CORE depending on the Sparc
sub-drivers. In addition, the core serial driver for SPARC is
always built, so we end up with link errors.
We instead make CONFIG_SERIAL_CORE{,_CONSOLE} dependent on building
the SPARC core driver (CONFIG_SERIAL_SUNCORE).
Fix warnings of the form
warning: long int format, different type arg (arg 5)
by casting ino_t arguments to unsigned long for printf formats.
In some instances, change %ld to %lu.
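A sketch of the fix pattern in userspace C; format_inode is a hypothetical helper. Casting to unsigned long makes the argument match %lu whether ino_t is unsigned int or unsigned long on a given platform:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * The underlying type of ino_t differs between platforms, so pass it
 * through an explicit cast that matches the %lu conversion. (Where
 * ino_t is wider than unsigned long, the cast would truncate.)
 */
static int format_inode(char *buf, size_t len, ino_t ino)
{
	assert(buf != NULL);
	return snprintf(buf, len, "inode %lu", (unsigned long)ino);
}
```

Without the cast, a platform whose ino_t is unsigned int passes a 32-bit argument where printf expects a long, which is exactly what the warning reports.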
Robert Read [Sat, 12 Oct 2002 02:52:06 +0000 (19:52 -0700)]
[PATCH] InterMezzo for 2.5
This is the initial port of InterMezzo for 2.5. It now compiles,
mounts, and completes I/Os successfully, but we are aware that more
needs to be done and we will do it as quickly as possible.
Adrian Bunk [Sat, 12 Oct 2002 02:47:02 +0000 (19:47 -0700)]
[PATCH] Fix cpufreq compile
The timer-handling split patch moved cpufreq stuff from time.c to
timers/timer_tsc.c but not the corresponding #include <linux/cpufreq.h>,
causing the build to fail.
Steve French [Fri, 11 Oct 2002 20:53:16 +0000 (15:53 -0500)]
Correct compiler warnings for 64-bit platforms, do minor formatting cleanup, and remove a debug function that conflicted with a function of the same name in SCSI.