Andrew Morton [Mon, 19 Apr 2004 05:06:30 +0000 (22:06 -0700)]
[PATCH] From: David Gibson <david@gibson.dropbear.id.au>
hugepage_vma() is both misleadingly named and unnecessary. On most archs it
always returns NULL, and on IA64 the vma it returns is never used. The
function's real purpose is to determine whether the address it is passed is a
special hugepage address which must be looked up in hugepage pagetables,
rather than being looked up in the normal pagetables (which might have
specially marked hugepage PMDs or PTEs).
This patch kills off hugepage_vma() and folds the logic it really needs into
follow_huge_addr(). That now returns a (page *) if called on a special
hugepage address, and an error encoded with ERR_PTR otherwise. This also
requires tweaking the IA64 code to check that the hugepage PTE is present in
follow_huge_addr() - previously this was guaranteed, since it was only called
if the address was in an existing hugepage VMA, and hugepages are always
prefaulted.
Andrew Morton [Mon, 19 Apr 2004 05:06:16 +0000 (22:06 -0700)]
[PATCH] Fix default value for commit interval for older reiserfs filesystems.
From: Bart Samwel <bart@samwel.tk>
The reiserfs patch that adds support for "commit=0" saves the default max
commit age in a variable when the fs is originally mounted, so that it can
later restore it. Unfortunately it makes some mistakes with that:
- The default is not saved when the original mount has a commit=NNN option.
- The default is not correctly saved for older reiserfs filesystems, where
the default was not stored on disk.
Andrew Morton [Mon, 19 Apr 2004 05:05:51 +0000 (22:05 -0700)]
[PATCH] Increase number of dynamic inodes in procfs
From: Nathan Lynch <nathanl@austin.ibm.com>
On some larger ppc64 configurations /proc/device-tree is exhausting procfs'
dynamic (non-pid) inode range (16K). This patch makes the dynamic inode
range 0xf0000000-0xffffffff and changes the inode number allocator to use
the idr.c allocator for the first-fit allocations.
A few weeks ago, Pavel and I agreed that PF_IOTHREAD should be renamed to
PF_NOFREEZE. This reflects the fact that some threads so marked aren't
actually used for IO while suspending, but simply shouldn't be frozen.
This patch, against 2.6.5 vanilla, applies that change. In the
refrigerator calls, the actual value doesn't matter (so long as it's
non-zero) and it makes more sense to use PF_FREEZE so I've used that.
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds the posix message queue syscalls to ppc32 and 64 and fixes
our implementation of compat copy siginfo to 32 bits userland which wasn't
using the si_code but still doing a switch/case on the signal number.
Russell King [Sun, 18 Apr 2004 23:32:07 +0000 (00:32 +0100)]
[ARM] Clean up ARM includes
This removes a number of unnecessary includes from the ARM specific
files throughout the kernel. Most notably asm/pgalloc.h is
needlessly included in several places. There were some places
including it as a means to get at the cache flushing functions,
so this has been corrected.
This is my brown paper bag day, I sent you the wrong patch for
fixing the deadlock in rtas.c, here's one to apply on top of current
bk that fixes build.
Reduce the locking coverage of the oft-used j_list_lock: the per-bh
jbd_lock_bh_state() gives us sufficient locking of buffer_head and
journal_head internals.
Andrew Morton [Sun, 18 Apr 2004 03:55:18 +0000 (20:55 -0700)]
[PATCH] rmap: nonlinear truncation
From: Hugh Dickins <hugh@veritas.com>
The earlier changes introducing PageAnon left truncated pages mapped into
nonlinear vmas unswappable. Once we go to object-based rmap, it's
impossible to find where file page is mapped once page->mapping cleared:
switching them to anonymous is odd, and breaks strict commit accounting.
So now handle truncation of nonlinear vmas correctly. And factor in
Daniel's cluster filesystem needs while we're there: when invalidating
local cache, we do want to unmap shared pages from all mms, but we do not
want to discard private COWed modifications of those pages (which
truncation discards to satisfy the SIGBUS semantics demanded by specs).
Drew from Daniel's patch (LKML 2 Mar 04), but didn't always follow it;
fewer name changes, but still some - "unmap" rather than "invalidate".
zap_page_range is not exported, safe to give it and all the too-many layers
an extra zap_details arg, in normal cases just NULL.
Given details, zap_pte_range checks page mapping or index to skip anon or
untruncated pages. I didn't realize before implementing, that in nonlinear
case, it should set a file pte when truncating - otherwise linear pages
might appear in place of SIGBUS. I suspect this implies that ->populate
functions ought to set file ptes beyond EOF instead of failing, but haven't
changed them as yet.
To avoid making yet another copy of that ugly linear pgidx test, added
inline function linear_page_index (to pagemap.h to get PAGE_CACHE_SIZE,
though as usual things don't really work if it differs from PAGE_SIZE).
Ooh, I thought I'd removed ___add_to_page_cache last time, do so now.
unmap_page_range static, shift its hugepage check up into sole caller
unmap_vmas. Killed "killme" debug from unmap_vmas, not seen it trigger.
unmap_mapping_range is exported without restriction: I'm one of those who
believe it should be generally available. But I'm wrongly placed to decide
that, probably just sob quietly to myself if _GPL added later.
Andrew Morton [Sun, 18 Apr 2004 03:55:06 +0000 (20:55 -0700)]
[PATCH] rmap: swap_unplug page
From: Hugh Dickins <hugh@veritas.com>
Good example of "swapper_space considered harmful": swap_unplug_io_fn was
originally designed for calling via swapper_space.backing_dev_info; but
that way it loses track of which device is to be unplugged, so had to
unplug all swap devices. But now sync_page tests SwapCache anyway, can
call swap_unplug_io_fn with page, which leads direct to the device.
Reverted -mc4's CONFIG_SWAP=n fix, just add another NOTHING for it.
Reverted -mc3's editorial adjustments to swap_backing_dev_info and
swapper_space initializations: they document the few fields which are
actually used now, as comment above them says (sound of slapped wrist).
Andrew Morton [Sun, 18 Apr 2004 03:54:52 +0000 (20:54 -0700)]
[PATCH] rmap: flush_dcache revisited
From: Hugh Dickins <hugh@veritas.com>
One of the callers of flush_dcache_page is do_generic_mapping_read, where
file is read without i_sem and without page lock: concurrent truncation may
at any moment remove page from cache, NULLing ->mapping, making
flush_dcache_page liable to oops. Put result of page_mapping in a local
variable and apply mapping_mapped to that (if we were to check for NULL
within mapping_mapped, it's unclear whether to say yes or no).
parisc and arm do have other locking unsafety in their i_mmap(_shared)
searching, but that's a larger issue to be dealt with down the line.
Andrew Morton [Sun, 18 Apr 2004 03:54:27 +0000 (20:54 -0700)]
[PATCH] Fix unix module
From: Rusty Russell <rusty@rustcorp.com.au>
# lsmod
Module Size Used by
1 26060 6
#
The compiler #define's unix to 1: we use -DKBUILD_MODNAME=unix. We used to
#undef unix at the top of af_unix.c, but now the name is inserted by
modpost, that doesn't help.
Andrew Morton [Sun, 18 Apr 2004 03:54:15 +0000 (20:54 -0700)]
[PATCH] ppc64: Fix CPU hot unplug deadlock
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
My RTAS locking fixes incorrectly added a spinlock around the function used
to stop a CPU, that function never returns, thus the lock becomes stale.
The correct fix is to disable interrupts instead (the RTAS params beeing
per-CPU, this should be safe enough)
Russell King [Sat, 17 Apr 2004 23:19:03 +0000 (00:19 +0100)]
[ARM] Add detailed documentation concerning ARM page tables
This adds detailed documentation concerning how we map the Linux
page table structure onto the hardware tables on ARM. In addition,
it also adds documentation describing how we emulate the "dirty"
and "young" or "accessed" page table bits.
This should be of interest to Linux MM developers.
It occurred to me that if vma and new_vma are one and the same, then
vma_relink_file will not do a good job of linking it after itself - in
that pretty unlikely case when move_page_tables fails.
And more generally, whenever copy_vma's vma_merge succeeds, we have no
guarantee that old vma comes before new_vma in the i_mmap lists, as we
need to satisfy Rajesh's point: that ordering is only guaranteed in the
newly allocated case.
We have to abandon the ordering method when/if we move from lists to
prio_trees, so this patch switches to the less glamorous use of
i_shared_sem exclusion, as in my prio_tree mremap.
Pavel Roskin [Sat, 17 Apr 2004 10:41:18 +0000 (11:41 +0100)]
[PCMCIA] Conversion to module_param
Patch from: Pavel Roskin
As it turns out, mixing MODULE_PARM and module_param in one module is
wrong. The parameters specified in module_param are ignored. I've just
posted a patch to LKML that will detect this condition and warn about it.
The new debugging code used the new-style module_param, which means that
all instances of MODULE_PARM should be converted. The attached patch does
that.
An additional bonus is that module_param_array provides the number of
array elements. This allowed me to change tcic.c and i82365.c to use
this number for IRQ list. This change was tested with i82365. If
"irq_list" is not specified, irq_list_count is 0.
I set all permissions to 0444 to be safe. I think we have no secrets
from the users regarding those parameters. If some parameters can be
changed safely at the runtime, the permissions could be changed to 0644.
I didn't examine how safe (and how useful) it would be, so it's 0444 for
now.
Andrew Morton [Sat, 17 Apr 2004 10:32:46 +0000 (03:32 -0700)]
[PATCH] ARM-related ptep_to_address() fix
From: William Lee Irwin III <wli@holomorphy.com>
rmk mentioned that ARM was borked as the relation, assumed by generic rmap,
PTRS_PER_PTE*sizeof(pte_t) == PAGE_SIZE, fails to hold. The following
patch, developed jointly with him (or depending on POV, by him with me
acting as codemonkey), is reported to resolve the issue.
Specifically, while ARM dedicates an entire PAGE_SIZE -sized block of
memory to each PTE table, the PTE table itself only spans half that, the
remainder being dedicated to hardware-interpreted structures. As the
hardware structure must be contiguous, wider ptes can't be used. So the
core-visible PTE table only spans PAGE_SIZE/2 bytes, violating the
assumption. This corrects masking and scaling done in ptep_to_address().
Andrew Morton [Sat, 17 Apr 2004 10:29:12 +0000 (03:29 -0700)]
[PATCH] Fix bogus get_page() calls in hugepage code
From: David Gibson <david@gibson.dropbear.id.au>
Some versions of follow_huge_addr() and follow_huge_pmd() are doing a
get_page() on the target page. They shouldn't: follow_page() returns an
unpinned page and it is the caller's responsibility to pin the page (if
desired) before dropping page_table_lock.
Andrew Morton [Sat, 17 Apr 2004 10:28:13 +0000 (03:28 -0700)]
[PATCH] reiserfs: fsync() speedup
From: Chris Mason <mason@suse.com>
Updates the reiserfs-logging improvements to use schedule_timeout instead of
yield when letting the transaction grow a little before forcing a commit for
fsync/O_SYNC/O_DIRECT.
Also, when one process forces a transaction to end and plans on doing the
commit (like fsync), it sets a flag on the transaction so the journal code
knows not to bother kicking the journal work queue.
queue_delayed_work is used so that if we get a bunch of tiny transactions
ended quickly, we aren't constantly kicking the work queue.
These significantly improve reiserfs performance during fsync heavy
workloads.
Andrew Morton [Sat, 17 Apr 2004 10:28:02 +0000 (03:28 -0700)]
[PATCH] Add "commit=0" to reiserfs
From: Bart Samwel <bart@samwel.tk>
Add support for value 0 to the commit option of reiserfs. Means "restore
to the default value". For the maximum commit age, this default value is
normally read from the journal; this patch adds an extra variable to cache
the default value for the maximum commit age.
Andrew Morton [Sat, 17 Apr 2004 10:27:51 +0000 (03:27 -0700)]
[PATCH] ppc64: hugepage cleanup
From: David Gibson <david@gibson.dropbear.id.au>
This is a small cleanup to the PPC64 hugepage code. It removes an
unhelpful function, removing some studlyCaps in the process. It was
originally this way to match the normal page path, but that has all been
rewritten since.
Andrew Morton [Sat, 17 Apr 2004 10:27:40 +0000 (03:27 -0700)]
[PATCH] Fix mq 32-bit compatibility
From: Jakub Jelinek <jakub@redhat.com>
The first change removes just a useless put_user (si_int and si_ptr are
part of the same union, si_ptr is on all arches covering whole union), the
rest is fixes for signal handling of SI_MESGQ.
From: Andros: Implement server-side reboot recovery (server now handles
open and lock reclaims). Not completely to spec: we don't yet store the
state in stable storage that would be required to recover correctly in
certain situations.
Andrew Morton [Sat, 17 Apr 2004 10:26:39 +0000 (03:26 -0700)]
[PATCH] kNFSdv4: Set credentials properly when puutrootfh is used
From: NeilBrown <neilb@cse.unsw.edu.au>
The credentials (uid/gid) of a process are set when a filehandle is
verified. Nfsv4 allows requests without an explicit filehandle (instead,
an implicit 'root' filehandle) so we much make sure the credentials are set
for these requests too.
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: added a call to nfsd_setuser in nfsd4_putrootfh so that nfsd
runs as the rpc->cred user.
Andrew Morton [Sat, 17 Apr 2004 10:26:28 +0000 (03:26 -0700)]
[PATCH] kNFSdv4: Improve how locking copes with replays
From: NeilBrown <neilb@cse.unsw.edu.au>
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: Hold state_lock longer so the stateowner doesn't diseappear
out from under us before we get the chance to encode the replay. Don't
attempt to save replay if we failed to find a stateowner.
Andrew Morton [Sat, 17 Apr 2004 10:25:57 +0000 (03:25 -0700)]
[PATCH] kNFSdv4: Keep state to allow replays for 'close' to work.
From: NeilBrown <neilb@cse.unsw.edu.au>
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: Idea is to keep around a list of openowners recently released
by closes, and make sure they stay around long enough so that replays still
work.
Andrew Morton [Sat, 17 Apr 2004 10:25:10 +0000 (03:25 -0700)]
[PATCH] dm: avoid ioctl buffer overrun
From: Kevin Corry <kevcorry@us.ibm.com>
dm-ioctl.c::retrieve_status(): Prevent overrunning the ioctl buffer by making
sure we don't call the target status routine with a buffer size limit of
zero. [Kevin Corry, Alasdair Kergon]
Andrew Morton [Sat, 17 Apr 2004 10:24:54 +0000 (03:24 -0700)]
[PATCH] dm: fix a comment
From: Kevin Corry <kevcorry@us.ibm.com>
Clarify the comment regarding the "next" field in struct dm_target_spec. The
"next" field has different behavior if you're performing a DM_TABLE_STATUS
command than it does if you're performing a DM_TABLE_LOAD command.
See populate_table() and retrieve_status() in drivers/md/dm-ioctl.c for more
details on how this field is used.
Petr Vandrovec [Sat, 17 Apr 2004 10:08:23 +0000 (03:08 -0700)]
[PATCH] Fix exec in multithreaded application
The recent controlling terminal changes broke exec from multithreaded
application because de_thread was not upgraded to new arrangement. I
know that I should not have LD_PRELOAD library which automatically
creates one thread, but it looked like a cool solution to the problem I
had.
de_thread must initialize the controlling terminal information in the
new thread group.
Don Fry [Fri, 16 Apr 2004 09:01:59 +0000 (05:01 -0400)]
[PATCH] pcnet32 transmit performance fix
When the pcnet32 adapter is installed in a system with long PCI latency
and the read burst bit is not set, performance on transmission is very
low (under 20Mbit on a 100Mbit link). This patch against 2.6.6-rc1 will
make sure that read and write bursts are enabled. Tested on ppc64 and
ia32.
Russell King [Fri, 16 Apr 2004 16:27:18 +0000 (17:27 +0100)]
[ARM] Add --no-undefined to linker command line.
Many binutils versions over the last year appear to silently build
assembler files with undefined constants, and able to successfully
create executables from such files. The assembler appears to add
undefined symbols to the symbol table without any corresponding
relocation information. Obviously this is bad news since the
resulting executable may not be what the programmer intended.
Work around the problem by forcing the linker to fail if there are
any undefined symbols in the final object(s).