Reduce the locking coverage of the oft-used j_list_lock: the per-bh
jbd_lock_bh_state() gives us sufficient locking of buffer_head and
journal_head internals.
Andrew Morton [Sun, 18 Apr 2004 03:55:18 +0000 (20:55 -0700)]
[PATCH] rmap: nonlinear truncation
From: Hugh Dickins <hugh@veritas.com>
The earlier changes introducing PageAnon left truncated pages mapped into
nonlinear vmas unswappable. Once we go to object-based rmap, it's
impossible to find where file page is mapped once page->mapping cleared:
switching them to anonymous is odd, and breaks strict commit accounting.
So now handle truncation of nonlinear vmas correctly. And factor in
Daniel's cluster filesystem needs while we're there: when invalidating
local cache, we do want to unmap shared pages from all mms, but we do not
want to discard private COWed modifications of those pages (which
truncation discards to satisfy the SIGBUS semantics demanded by specs).
Drew from Daniel's patch (LKML 2 Mar 04), but didn't always follow it;
fewer name changes, but still some - "unmap" rather than "invalidate".
zap_page_range is not exported, safe to give it and all the too-many layers
an extra zap_details arg, in normal cases just NULL.
Given details, zap_pte_range checks page mapping or index to skip anon or
untruncated pages. I didn't realize before implementing, that in nonlinear
case, it should set a file pte when truncating - otherwise linear pages
might appear in place of SIGBUS. I suspect this implies that ->populate
functions ought to set file ptes beyond EOF instead of failing, but haven't
changed them as yet.
To avoid making yet another copy of that ugly linear pgidx test, added
inline function linear_page_index (to pagemap.h to get PAGE_CACHE_SIZE,
though as usual things don't really work if it differs from PAGE_SIZE).
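
A rough userspace sketch of what such a helper computes is given below. The struct, the names and the 4 KiB page size are stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_PAGE_SHIFT 12            /* assumes 4 KiB pages */

struct sketch_vma {
    uintptr_t vm_start;                 /* first address covered by the mapping */
    uint64_t  vm_pgoff;                 /* file offset of vm_start, in pages */
};

/* File page index that a linear mapping would place at 'address'. */
static inline uint64_t sketch_linear_page_index(const struct sketch_vma *vma,
                                                uintptr_t address)
{
    uint64_t pgoff = (address - vma->vm_start) >> SKETCH_PAGE_SHIFT;

    return pgoff + vma->vm_pgoff;
}
```

A pte whose page index differs from this value is, in this model, a nonlinear one.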
Ooh, I thought I'd removed ___add_to_page_cache last time, do so now.
unmap_page_range static, shift its hugepage check up into sole caller
unmap_vmas. Killed "killme" debug from unmap_vmas, not seen it trigger.
unmap_mapping_range is exported without restriction: I'm one of those who
believe it should be generally available. But I'm wrongly placed to decide
that, probably just sob quietly to myself if _GPL added later.
Andrew Morton [Sun, 18 Apr 2004 03:55:06 +0000 (20:55 -0700)]
[PATCH] rmap: swap_unplug page
From: Hugh Dickins <hugh@veritas.com>
Good example of "swapper_space considered harmful": swap_unplug_io_fn was
originally designed for calling via swapper_space.backing_dev_info; but
that way it loses track of which device is to be unplugged, so had to
unplug all swap devices. But now sync_page tests SwapCache anyway, can
call swap_unplug_io_fn with page, which leads direct to the device.
Reverted -mc4's CONFIG_SWAP=n fix, just add another NOTHING for it.
Reverted -mc3's editorial adjustments to swap_backing_dev_info and
swapper_space initializations: they document the few fields which are
actually used now, as comment above them says (sound of slapped wrist).
Andrew Morton [Sun, 18 Apr 2004 03:54:52 +0000 (20:54 -0700)]
[PATCH] rmap: flush_dcache revisited
From: Hugh Dickins <hugh@veritas.com>
One of the callers of flush_dcache_page is do_generic_mapping_read, where
file is read without i_sem and without page lock: concurrent truncation may
at any moment remove page from cache, NULLing ->mapping, making
flush_dcache_page liable to oops. Put result of page_mapping in a local
variable and apply mapping_mapped to that (if we were to check for NULL
within mapping_mapped, it's unclear whether to say yes or no).
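
The local-variable pattern can be modelled in userspace roughly as follows. All types and names are hypothetical, and the choice to flush immediately when the mapping has gone away is an assumption of this sketch, not something the patch description states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures are far richer. */
struct sketch_mapping {
    int mapped_count;                   /* number of user mappings */
};

struct sketch_page {
    struct sketch_mapping *volatile mapping;  /* may be NULLed by truncation */
};

static bool sketch_mapping_mapped(const struct sketch_mapping *m)
{
    return m->mapped_count > 0;
}

/*
 * Decide whether the cache flush may be deferred.  The mapping pointer is
 * read exactly once into a local, so a concurrent truncation cannot make
 * us dereference NULL between the check and the use.
 */
static bool sketch_can_defer_flush(struct sketch_page *page)
{
    struct sketch_mapping *mapping = page->mapping;

    /* Assumption: with no mapping left, flush now rather than defer. */
    return mapping != NULL && !sketch_mapping_mapped(mapping);
}
```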
parisc and arm do have other locking unsafety in their i_mmap(_shared)
searching, but that's a larger issue to be dealt with down the line.
Andrew Morton [Sun, 18 Apr 2004 03:54:27 +0000 (20:54 -0700)]
[PATCH] Fix unix module
From: Rusty Russell <rusty@rustcorp.com.au>
# lsmod
Module Size Used by
1 26060 6
#
The compiler #define's unix to 1: we use -DKBUILD_MODNAME=unix. We used to
#undef unix at the top of af_unix.c, but now the name is inserted by
modpost, that doesn't help.
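
The effect can be reproduced outside the kernel with a small sketch. SKETCH_MODNAME stands in for a name passed as -DKBUILD_MODNAME=unix, and the macro names are made up for illustration. On compilers that predefine unix (GCC in its default GNU mode on Linux, for example) the name expands to 1 before it is stringified; in strict ISO mode it stays as written.

```c
/* Two-level stringification: the argument is macro-expanded first. */
#define SKETCH_STR2(x) #x
#define SKETCH_STR(x)  SKETCH_STR2(x)

/* Stand-in for a build system passing -DKBUILD_MODNAME=unix. */
#define SKETCH_MODNAME unix

/* Yields "1" where the compiler predefines unix, "unix" otherwise. */
static const char *sketch_modname(void)
{
    return SKETCH_STR(SKETCH_MODNAME);
}
```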
Andrew Morton [Sun, 18 Apr 2004 03:54:15 +0000 (20:54 -0700)]
[PATCH] ppc64: Fix CPU hot unplug deadlock
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
My RTAS locking fixes incorrectly added a spinlock around the function used
to stop a CPU; that function never returns, thus the lock becomes stale.
The correct fix is to disable interrupts instead (the RTAS params being
per-CPU, this should be safe enough).
It occurred to me that if vma and new_vma are one and the same, then
vma_relink_file will not do a good job of linking it after itself - in
that pretty unlikely case when move_page_tables fails.
And more generally, whenever copy_vma's vma_merge succeeds, we have no
guarantee that old vma comes before new_vma in the i_mmap lists, as we
need to satisfy Rajesh's point: that ordering is only guaranteed in the
newly allocated case.
We have to abandon the ordering method when/if we move from lists to
prio_trees, so this patch switches to the less glamorous use of
i_shared_sem exclusion, as in my prio_tree mremap.
Russell King [Sat, 17 Apr 2004 23:19:03 +0000 (00:19 +0100)]
[ARM] Add detailed documentation concerning ARM page tables
This adds detailed documentation concerning how we map the Linux
page table structure onto the hardware tables on ARM. In addition,
it also adds documentation describing how we emulate the "dirty"
and "young" or "accessed" page table bits.
This should be of interest to Linux MM developers.
Pavel Roskin [Sat, 17 Apr 2004 10:41:18 +0000 (11:41 +0100)]
[PCMCIA] Conversion to module_param
Patch from: Pavel Roskin
As it turns out, mixing MODULE_PARM and module_param in one module is
wrong. The parameters specified in module_param are ignored. I've just
posted a patch to LKML that will detect this condition and warn about it.
The new debugging code used the new-style module_param, which means that
all instances of MODULE_PARM should be converted. The attached patch does
that.
An additional bonus is that module_param_array provides the number of
array elements. This allowed me to change tcic.c and i82365.c to use
this number for IRQ list. This change was tested with i82365. If
"irq_list" is not specified, irq_list_count is 0.
I set all permissions to 0444 to be safe. I think we have no secrets
from the users regarding those parameters. If some parameters can be
changed safely at the runtime, the permissions could be changed to 0644.
I didn't examine how safe (and how useful) it would be, so it's 0444 for
now.
Andrew Morton [Sat, 17 Apr 2004 10:32:46 +0000 (03:32 -0700)]
[PATCH] ARM-related ptep_to_address() fix
From: William Lee Irwin III <wli@holomorphy.com>
rmk mentioned that ARM was borked as the relation, assumed by generic rmap,
PTRS_PER_PTE*sizeof(pte_t) == PAGE_SIZE, fails to hold. The following
patch, developed jointly with him (or depending on POV, by him with me
acting as codemonkey), is reported to resolve the issue.
Specifically, while ARM dedicates an entire PAGE_SIZE -sized block of
memory to each PTE table, the PTE table itself only spans half that, the
remainder being dedicated to hardware-interpreted structures. As the
hardware structure must be contiguous, wider ptes can't be used. So the
core-visible PTE table only spans PAGE_SIZE/2 bytes, violating the
assumption. This corrects masking and scaling done in ptep_to_address().
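
The arithmetic can be illustrated with a sketch that does not reproduce ptep_to_address() itself. The constants (4096-byte pages, 4-byte ptes, 512 entries per table) and all names are assumptions for illustration, and the table is assumed to be aligned to its own span.

```c
#include <stdint.h>

typedef uint32_t sketch_pte_t;

#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_PTRS_PER_PTE 512u        /* assumed: table spans PAGE_SIZE/2 */

/* Index derived on the assumption that a table fills a whole page. */
static inline unsigned sketch_pte_index_by_page(uintptr_t ptep)
{
    return (unsigned)((ptep % SKETCH_PAGE_SIZE) / sizeof(sketch_pte_t));
}

/* Index derived from the span the table really covers. */
static inline unsigned sketch_pte_index_by_span(uintptr_t ptep)
{
    uintptr_t span = SKETCH_PTRS_PER_PTE * sizeof(sketch_pte_t);

    return (unsigned)((ptep % span) / sizeof(sketch_pte_t));
}
```

For a table starting halfway into its page, the page-based mask yields indices of 512 and above, which is the kind of mismatch the corrected masking avoids.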
Andrew Morton [Sat, 17 Apr 2004 10:29:12 +0000 (03:29 -0700)]
[PATCH] Fix bogus get_page() calls in hugepage code
From: David Gibson <david@gibson.dropbear.id.au>
Some versions of follow_huge_addr() and follow_huge_pmd() are doing a
get_page() on the target page. They shouldn't: follow_page() returns an
unpinned page and it is the caller's responsibility to pin the page (if
desired) before dropping page_table_lock.
Andrew Morton [Sat, 17 Apr 2004 10:28:13 +0000 (03:28 -0700)]
[PATCH] reiserfs: fsync() speedup
From: Chris Mason <mason@suse.com>
Updates the reiserfs-logging improvements to use schedule_timeout instead of
yield when letting the transaction grow a little before forcing a commit for
fsync/O_SYNC/O_DIRECT.
Also, when one process forces a transaction to end and plans on doing the
commit (like fsync), it sets a flag on the transaction so the journal code
knows not to bother kicking the journal work queue.
queue_delayed_work is used so that if we get a bunch of tiny transactions
ended quickly, we aren't constantly kicking the work queue.
These significantly improve reiserfs performance during fsync heavy
workloads.
Andrew Morton [Sat, 17 Apr 2004 10:28:02 +0000 (03:28 -0700)]
[PATCH] Add "commit=0" to reiserfs
From: Bart Samwel <bart@samwel.tk>
Add support for value 0 to the commit option of reiserfs. Means "restore
to the default value". For the maximum commit age, this default value is
normally read from the journal; this patch adds an extra variable to cache
the default value for the maximum commit age.
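
A minimal sketch of the "0 restores the default" behaviour, using a made-up struct and field names rather than the reiserfs journal structures:

```c
/* Hypothetical journal state; field names are illustrative only. */
struct sketch_journal {
    unsigned max_commit_age;            /* value currently in effect */
    unsigned default_max_commit_age;    /* cached copy of the on-disk default */
};

/* Apply a "commit=N" mount option; N == 0 restores the cached default. */
static void sketch_apply_commit_option(struct sketch_journal *j,
                                       unsigned commit_secs)
{
    if (commit_secs == 0)
        j->max_commit_age = j->default_max_commit_age;
    else
        j->max_commit_age = commit_secs;
}
```

Caching the default avoids having to re-read it from the journal on every remount.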
Andrew Morton [Sat, 17 Apr 2004 10:27:51 +0000 (03:27 -0700)]
[PATCH] ppc64: hugepage cleanup
From: David Gibson <david@gibson.dropbear.id.au>
This is a small cleanup to the PPC64 hugepage code. It removes an
unhelpful function, removing some studlyCaps in the process. It was
originally this way to match the normal page path, but that has all been
rewritten since.
Andrew Morton [Sat, 17 Apr 2004 10:27:40 +0000 (03:27 -0700)]
[PATCH] Fix mq 32-bit compatibility
From: Jakub Jelinek <jakub@redhat.com>
The first change removes just a useless put_user (si_int and si_ptr are
part of the same union, si_ptr is on all arches covering whole union), the
rest is fixes for signal handling of SI_MESGQ.
From: Andros: Implement server-side reboot recovery (server now handles
open and lock reclaims). Not completely to spec: we don't yet store the
state in stable storage that would be required to recover correctly in
certain situations.
Andrew Morton [Sat, 17 Apr 2004 10:26:39 +0000 (03:26 -0700)]
[PATCH] kNFSdv4: Set credentials properly when putrootfh is used
From: NeilBrown <neilb@cse.unsw.edu.au>
The credentials (uid/gid) of a process are set when a filehandle is
verified. Nfsv4 allows requests without an explicit filehandle (instead,
an implicit 'root' filehandle) so we must make sure the credentials are set
for these requests too.
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: added a call to nfsd_setuser in nfsd4_putrootfh so that nfsd
runs as the rpc->cred user.
Andrew Morton [Sat, 17 Apr 2004 10:26:28 +0000 (03:26 -0700)]
[PATCH] kNFSdv4: Improve how locking copes with replays
From: NeilBrown <neilb@cse.unsw.edu.au>
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: Hold state_lock longer so the stateowner doesn't disappear
out from under us before we get the chance to encode the replay. Don't
attempt to save replay if we failed to find a stateowner.
Andrew Morton [Sat, 17 Apr 2004 10:25:57 +0000 (03:25 -0700)]
[PATCH] kNFSdv4: Keep state to allow replays for 'close' to work.
From: NeilBrown <neilb@cse.unsw.edu.au>
From: "J. Bruce Fields" <bfields@fieldses.org>
From: Andros: Idea is to keep around a list of openowners recently released
by closes, and make sure they stay around long enough so that replays still
work.
Andrew Morton [Sat, 17 Apr 2004 10:25:10 +0000 (03:25 -0700)]
[PATCH] dm: avoid ioctl buffer overrun
From: Kevin Corry <kevcorry@us.ibm.com>
dm-ioctl.c::retrieve_status(): Prevent overrunning the ioctl buffer by making
sure we don't call the target status routine with a buffer size limit of
zero. [Kevin Corry, Alasdair Kergon]
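
The guard can be sketched in userspace as below. The status routine, its "ok" output and all names are invented for illustration; the sketch only shows why a zero size limit must never reach a routine that computes maxlen - 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical target status routine: writes a NUL-terminated string of at
 * most maxlen bytes and returns the bytes used.  It assumes maxlen >= 1;
 * with maxlen == 0 the "maxlen - 1" below would wrap around.
 */
static size_t sketch_target_status(char *out, size_t maxlen)
{
    const char *s = "ok";
    size_t n = strlen(s);

    if (n > maxlen - 1)
        n = maxlen - 1;
    memcpy(out, s, n);
    out[n] = '\0';
    return n + 1;
}

/* Returns false if the buffer filled up before every target reported. */
static bool sketch_retrieve_status(char *buf, size_t len,
                                   unsigned num_targets, size_t *used_out)
{
    size_t used = 0;

    for (unsigned i = 0; i < num_targets; i++) {
        size_t remaining = len - used;

        if (remaining == 0) {           /* never call with a zero limit */
            *used_out = used;
            return false;
        }
        used += sketch_target_status(buf + used, remaining);
    }
    *used_out = used;
    return true;
}
```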
Andrew Morton [Sat, 17 Apr 2004 10:24:54 +0000 (03:24 -0700)]
[PATCH] dm: fix a comment
From: Kevin Corry <kevcorry@us.ibm.com>
Clarify the comment regarding the "next" field in struct dm_target_spec. The
"next" field has different behavior if you're performing a DM_TABLE_STATUS
command than it does if you're performing a DM_TABLE_LOAD command.
See populate_table() and retrieve_status() in drivers/md/dm-ioctl.c for more
details on how this field is used.
Petr Vandrovec [Sat, 17 Apr 2004 10:08:23 +0000 (03:08 -0700)]
[PATCH] Fix exec in multithreaded application
The recent controlling terminal changes broke exec from multithreaded
application because de_thread was not upgraded to new arrangement. I
know that I should not have LD_PRELOAD library which automatically
creates one thread, but it looked like a cool solution to the problem I
had.
de_thread must initialize the controlling terminal information in the
new thread group.
Don Fry [Fri, 16 Apr 2004 09:01:59 +0000 (05:01 -0400)]
[PATCH] pcnet32 transmit performance fix
When the pcnet32 adapter is installed in a system with long PCI latency
and the read burst bit is not set, performance on transmission is very
low (under 20Mbit on a 100Mbit link). This patch against 2.6.6-rc1 will
make sure that read and write bursts are enabled. Tested on ppc64 and
ia32.
Russell King [Fri, 16 Apr 2004 16:27:18 +0000 (17:27 +0100)]
[ARM] Add --no-undefined to linker command line.
Many binutils versions over the last year appear to silently build
assembler files with undefined constants, and are able to successfully
create executables from such files. The assembler appears to add
undefined symbols to the symbol table without any corresponding
relocation information. Obviously this is bad news since the
resulting executable may not be what the programmer intended.
Work around the problem by forcing the linker to fail if there are
any undefined symbols in the final object(s).
Bringing network devices up/down with lapbether causes scheduling while
atomic (if preempt is enabled).
The calls to rcu_read_lock are unnecessary since lapb_device_event
is called from notifier with the rtnetlink semaphore held, it is
already protected from the labp_devices list changing.
Jean Tourrilhes [Fri, 16 Apr 2004 07:49:46 +0000 (00:49 -0700)]
[IRDA]: irlan_eth cleanup.
Use IrTTP flow control to stop/wake netif
From Stephen Hemminger.
Change irlan_eth device initialization:
*bug* address never set in DIRECT mode because access not set
in alloc_netdev -> irlan_eth_setup path
+ make eth_XXX handles static and provide alloc_irlandev hook
+ use netdev_priv (and get rid of truly impossible ASSERT's)
+ use skb_queue_purge
Jean Tourrilhes [Fri, 16 Apr 2004 07:48:29 +0000 (00:48 -0700)]
[IRDA]: irlan_common cleanup.
Minor type changes in irlan_common for clarity:
- use const
- init and exit can be static
- use skb_queue_purge to flush queue
- get rid of noisy old comment
Alexander Viro [Fri, 16 Apr 2004 07:27:54 +0000 (00:27 -0700)]
[PATCH] remount: mount flags filtering
- we could pass MS_ACTIVE in mount flags and it would be passed into
->get_sb(), leading to interesting failure modes. This flag is only
for internal use (it's set once fill_super is complete and reset
before the inode eviction on umount); made sure that we never get
tricked into having it set too early.
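
The filtering amounts to masking internal-only bits out of caller-supplied flags. A minimal sketch, with locally defined constants and a made-up function name standing in for the kernel's own:

```c
/* Illustrative constants; defined locally rather than taken from headers. */
#define SKETCH_MS_RDONLY        1UL
#define SKETCH_MS_ACTIVE        (1UL << 30)   /* internal-only flag */
#define SKETCH_MS_INTERNAL_MASK SKETCH_MS_ACTIVE

/* Strip flags that callers must not be able to set themselves. */
static unsigned long sketch_filter_mount_flags(unsigned long flags)
{
    return flags & ~SKETCH_MS_INTERNAL_MASK;
}
```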
Alexander Viro [Fri, 16 Apr 2004 07:27:14 +0000 (00:27 -0700)]
[PATCH] remount: fs/jffs2
- jffs2 ->remount_fs() was buggy - it played with sb->s_flags instead of
doing modifications to *flags (->s_flags will be overwritten using
*flags right after the call of ->remount_fs()). Moreover, it tried
to do the wrong thing - it should just enforce noatime and be done
with that. Fixed, ACKed by maintainer.
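
Why the modification has to go through *flags can be seen in a simplified model. The struct, the function names and the plain assignment in the caller are assumptions of this sketch; the real caller does more than this.

```c
/* Hypothetical, heavily simplified superblock. */
struct sketch_sb {
    unsigned long s_flags;
};

#define SKETCH_MS_NOATIME 1024

/* Filesystem hook: enforce noatime by editing *flags, not sb->s_flags. */
static int sketch_remount_fs(struct sketch_sb *sb, int *flags)
{
    (void)sb;                           /* anything written to s_flags here
                                           would be overwritten by the caller */
    *flags |= SKETCH_MS_NOATIME;
    return 0;
}

/* Caller model: s_flags is rewritten from flags after the hook returns. */
static int sketch_do_remount(struct sketch_sb *sb, int flags)
{
    int err = sketch_remount_fs(sb, &flags);

    if (err)
        return err;
    sb->s_flags = (unsigned long)flags;
    return 0;
}
```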
Alexander Viro [Fri, 16 Apr 2004 07:26:34 +0000 (00:26 -0700)]
[PATCH] remount: fs/sysv fixes
- several variants of sysv fs are supported only r/o. Driver does
force r/o on mount, but doesn't do anything on remount. As the
result, one can remount them r/w and results are Not Pretty(tm).
Missing checks added, code cleaned up.
- we had double-brelse() in v7fs - if sanity checks on the root inode
succeed, but allocation of the root dentry fails, we brelse() the same
buffer_head twice. Fixed.
Use correct types for fields related to spanning tree protocols.
* costs are 32 bit unsigned
* ports are 16 bit unsigned
* booleans are bytes rather than bitfield
* arrange for better packing
Make forwarding database more robust.
+ Don't insert invalid ether address,
+ Report errors back so adding an interface to bridge can fail
+ get rid of unneeded explicit pads in data structure
+ replace bitfields with byte's for simple booleans.
The bridge code needs to keep track of a cost estimate for each
port in the bridge. Instead of a hack based on device name, try
and use ethtool to get port speed from device. This has been tested
on e100 (uses ethtool_ops) and e1000 (does ethtool the hard way)
and dummy (no ethtool).
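
A speed-to-cost mapping of this kind might look like the sketch below. The function name is made up, and the cost values follow commonly cited 802.1D recommendations; they are an assumption here, not taken from the patch.

```c
/*
 * Map a link speed in Mbit/s to a spanning tree path cost.  Unknown or
 * unreported speeds fall back to the 10 Mbit/s cost.
 */
static unsigned sketch_port_cost(unsigned speed_mbps)
{
    switch (speed_mbps) {
    case 10000:
        return 2;
    case 1000:
        return 4;
    case 100:
        return 19;
    case 10:
        return 100;
    default:
        return 100;                     /* assumed fallback for no ethtool data */
    }
}
```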
Need to export dev_ethtool() to allow bridge module to get to
it easily.
Code takes care to maintain same locking and semantics as if ioctl
was being done from application.