Rusty Russell [Fri, 19 Mar 2004 00:03:25 +0000 (16:03 -0800)]
[PATCH] Hotplug CPUs: Make Migration Thread Handle CPUs Going
Change the migration thread to directly use its cpu arg, rather than
smp_processor_id(): if a cpu goes up then down rapidly, it can be on
the wrong cpu just before it is stopped.
Add code to stop the migration thread on CPU_DEAD and CPU_UP_CANCELED.
Rusty Russell [Fri, 19 Mar 2004 00:03:16 +0000 (16:03 -0800)]
[PATCH] Hotplug CPUs: Set prio of migration thread before CPU
We need the migration thread to be RT as soon as the CPU comes online:
for example, stop_machine() (another RT task) expects to yield to it.
Extract the core of setscheduler() and do that when the migration
thread is created. rq lock is a precaution against the (theoretical)
possibility of someone else doing setscheduler on this thread at the
same time.
Rusty Russell [Fri, 19 Mar 2004 00:03:06 +0000 (16:03 -0800)]
[PATCH] Hotplug CPUs: Keep IRQs off in Migration Thread Calling
Currently the migration thread re-enables irqs, then calls
move_task_away which disables IRQs again and actually does the move.
This means there is a race where the migration thread gets preempted,
and the target CPU can go down.
Hold irqs disabled in migration thread across move_task_away(), which
now doesn't need to save flags (the other caller is the hotplug CPU
code, where irqs are also disabled).
Rusty Russell [Fri, 19 Mar 2004 00:02:56 +0000 (16:02 -0800)]
[PATCH] Hotplug CPUs: Take cpu Lock Around Migration
Grab cpu lock around sched_migrate_task() and sys_sched_setaffinity().
This is a noop without CONFIG_HOTPLUG_CPU.
The sched_migrate_task may have a performance penalty on NUMA if lots
of exec rebalancing is happening, however this only applies to
CONFIG_NUMA and CONFIG_HOTPLUG_CPU, which no one does at the moment
anyway.
Also, the scheduler in -mm solves the race another way, so this will
vanish then.
Andrew Morton [Thu, 18 Mar 2004 23:04:08 +0000 (15:04 -0800)]
[PATCH] remove_suid() should return error code
From: Nikita Danilov <Nikita@Namesys.COM>
remove_suid() ignores the return value of notify_change()->i_op->setattr().
This means that even if the file system fails to clear the suid bit,
generic_file_aio_write_nolock() proceeds with write, which is unsafe.
Actually, even ext2's ->setattr() can fail, when trying to update ACL, for
example.
Attached patch modifies remove_suid() to return result of ->setattr(), and
updates in-tree callers.
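A minimal user-space sketch of the pattern described above, not the kernel code itself: every name here (fake_inode, fake_setattr, fake_remove_suid, fake_write) is a hypothetical stand-in, and the failure is simulated with a flag. It only shows the helper propagating the ->setattr() result and the caller refusing to write when the suid bit could not be cleared.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAKE_S_ISUID 04000u

/* Hypothetical stand-in for an inode; not the kernel's struct inode. */
struct fake_inode {
    unsigned int mode;     /* permission bits, including setuid */
    int setattr_fails;     /* nonzero: simulate ->setattr() failing */
};

/* Stand-in for notify_change()->i_op->setattr(); it may fail. */
static int fake_setattr(struct fake_inode *inode, unsigned int new_mode)
{
    if (inode->setattr_fails)
        return -EIO;
    inode->mode = new_mode;
    return 0;
}

/* Returns 0 or a negative errno instead of discarding the result. */
static int fake_remove_suid(struct fake_inode *inode)
{
    if (!(inode->mode & FAKE_S_ISUID))
        return 0;                       /* nothing to clear */
    return fake_setattr(inode, inode->mode & ~FAKE_S_ISUID);
}

/* Caller: bail out before writing if the suid bit is still set. */
static long fake_write(struct fake_inode *inode, size_t len)
{
    int err;

    assert(inode != NULL);
    err = fake_remove_suid(inode);
    if (err)
        return err;                     /* unsafe to proceed */
    return (long)len;                   /* pretend the write succeeded */
}
```

With setattr_fails set, fake_write() returns the error and leaves the mode untouched; otherwise it clears the bit and reports the bytes written.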
Andrew Morton [Thu, 18 Mar 2004 23:03:58 +0000 (15:03 -0800)]
[PATCH] meye driver update
From: Stelian Pop <stelian@popies.net>
This patchlet is just a resync with my tree, it only increments the meye
driver version number and makes some small comment changes as suggested by
Randy Dunlap.
Andrew Morton [Thu, 18 Mar 2004 23:03:49 +0000 (15:03 -0800)]
[PATCH] VM overcommit documentation fixes
From: Andy Whitcroft <andyw@uk.ibm.com>
Whilst looking at the memory overcommit logic I noticed that the pointer to
the documentation from the *_vm_enough_memory calls is incorrect. Also
that in one instance the routine does not have the expected pointers.
Andrew Morton [Thu, 18 Mar 2004 23:03:30 +0000 (15:03 -0800)]
[PATCH] add note about "Copyright" to SubmittingDrivers
From: Grant Grundler <grundler@parisc-linux.org>
This patch adds a comment to "Documentation/SubmittingDrivers" about the
importance of adding a Copyright notice in submitted code.
The parisc-linux port has neglected this in the past and I've been slowly
trying to correct that (along with proper GPL header).
While I make it sound like GPL is the "only" acceptable license, I'll leave
it up to lawyers to determine what other appropriate license could be used
for a new driver.
Andrew Morton [Thu, 18 Mar 2004 23:02:40 +0000 (15:02 -0800)]
[PATCH] EDD: move code from i386-specific locations to generic
From: Matt Domsch <Matt_Domsch@dell.com>
Three patches to move the BIOS Enhanced Disk Drive code from i386-specific
locations into more generic locations, which will allow it to be used on
x86-64 as well.
Andrew Morton [Thu, 18 Mar 2004 23:02:23 +0000 (15:02 -0800)]
[PATCH] Fix uninlined memcmp on i386
From: DHollenbeck <dick@softplc.com>
This patch was needed against a pristine 2.6.4 kernel when compiling with
"gcc 3.4 _very recent_" using the -Os option.
Without this patch, modules would use a non-inline memcmp() and then not
find it in the kernel, causing depmod to complain and some modules not to
load.
Andrew Morton [Thu, 18 Mar 2004 23:01:15 +0000 (15:01 -0800)]
[PATCH] doc. updates/typos
From: "Randy.Dunlap" <rddunlap@osdl.org>
Remove the rest of references to smp.tex
Documentation/cpufreq => Documentation/cpu-freq
DocBook/tulip.{pdf,ps,html} => DocBook/tulip-user.{pdf,ps,html}
Bunch of other typos.
Andrew Morton [Thu, 18 Mar 2004 23:00:47 +0000 (15:00 -0800)]
[PATCH] config: choice fix
From: Roman Zippel <zippel@linux-m68k.org>
When a boolean choice value has a dependency of 'm' it can briefly be
treated as a tristate symbol. This patch fixes that and also adds a small
optimization to precompute the value of the module symbol instead of
checking it all the time.
Andrew Morton [Thu, 18 Mar 2004 23:00:00 +0000 (15:00 -0800)]
[PATCH] don't abuse empty_zero_page (x86)
From: Brian Gerst <bgerst@didntduck.org>
Don't abuse empty_zero_page as temporary storage for boot parameters and
command line. This is a holdover from the days before discardable init
sections.
Andrew Morton [Thu, 18 Mar 2004 22:59:41 +0000 (14:59 -0800)]
[PATCH] sysfs: pin kobjects to fix use-after-free crashes
From: Maneesh Soni <maneesh@in.ibm.com>
Fix a sysfs use-after-free crash. The problem we have is of the kobject
going away while we have a live dentry (the corresponding sysfs directory)
still pointing to it through d_fsdata pointer. The patch makes sure to keep
the kobject alive by taking a reference to it during the life-time of
corresponding dentry.
o The following pins the kobject when sysfs assigns dentry and inode to
the kobject. This ensures that kobject is alive during the life time of
the dentry and inode, and people holding ref. to the dentry can access the
kobject without any problems.
o The ref. taken for the kobject is released through dentry->d_op->d_iput()
call when the dentry ref. count drops to zero and it is being freed. For
this sysfs_dentry_operations is introduced.
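The two points above can be sketched in user space as follows. This is an illustration under stated assumptions, not the sysfs implementation: fake_kobject, fake_dentry and the fake_* helpers are hypothetical names, and the reference count is a plain int with no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kobject with a simple reference count. */
struct fake_kobject {
    int refcount;
    int released;          /* set once the last reference is dropped */
};

static struct fake_kobject *fake_kobject_get(struct fake_kobject *k)
{
    assert(k->refcount > 0);            /* must already be alive */
    k->refcount++;
    return k;
}

static void fake_kobject_put(struct fake_kobject *k)
{
    assert(k->refcount > 0);
    if (--k->refcount == 0)
        k->released = 1;                /* stands in for freeing */
}

/* Hypothetical stand-in for a dentry pointing at its kobject. */
struct fake_dentry {
    struct fake_kobject *d_fsdata;
};

/* Pin the kobject when the dentry starts pointing at it. */
static void fake_sysfs_attach(struct fake_dentry *d, struct fake_kobject *k)
{
    d->d_fsdata = fake_kobject_get(k);
}

/* Drop the pin when the dentry is being freed (the d_iput step). */
static void fake_sysfs_d_iput(struct fake_dentry *d)
{
    fake_kobject_put(d->d_fsdata);
    d->d_fsdata = NULL;
}
```

If the owner drops its own reference while the dentry is still attached, the count stays at one and the object is only released from fake_sysfs_d_iput().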
For testing one has to run the following test on an SMP box:
1) Do insmod/rmmod "dummy.o" network driver in a forever loop.
2) In parallel, do "find /sys/class/net | xargs cat" also in a forever loop.
Andrew Morton [Thu, 18 Mar 2004 22:59:31 +0000 (14:59 -0800)]
[PATCH] Fix dentry refcounting in sysfs_remove_group()
From: Maneesh Soni <maneesh@in.ibm.com>
The following patch fixes the dentry refcounting, during
sysfs_remove_group() and also adds the missing dput() for the "extra" ref
taken during sysfs_create() for the sub-directory dentry corresponding to
attribute group.
I have re-done the patch fixing the race between sysfs_remove_dir() and
dcache_readdir(). If you recall, sysfs_remove_dir(kobj) manipulates the
->d_subdirs list for the dentry corresponding to the sysfs directory being
removed. It can end up deleting the cursor dentry which is added to the
->d_subdirs list during a concurrent dcache_dir_open() ==> dcache_readdir()
for the same directory. And as a result dcache_readdir() can loop forever
holding dcache_lock.
The earlier patch which was included in -mm1 created problems which
resulted in list_del() BUG hits in prune_dcache(). The reason I think is
that in the main loop in sysfs_remove_dir(), dcache_lock is dropped and
re-acquired, and this could result in inconsistent ->d_subdirs list and
prune_dcache() may try to delete an already deleted dentry. I have
corrected this in the new patch as below.
I could do sysfs_remove_dir() more neatly on sysfs backing store patch set
as there I don't use the ->d_subdirs list. Instead the list of children
sysfs_dirent works out well. But until the sysfs backing store patch is
picked up the existing code suffers from this race. This can be easily
tested by running the following two loops on an SMP box
# while true; do insmod drivers/net/dummy.ko; rmmod dummy; done
# while true; do find /sys/class/net > /dev/null; done
o This patch fixes sysfs_remove_dir race with dcache_readdir. There is
no need for sysfs_remove_dir to modify the d_subdirs list for the
directory being deleted as it is taken care of in the final dput. Modifying
this list results in inconsistent d_subdirs list and causes infinite loop
in concurrently occurring dcache_readdir.
o The main loop is restarted every time dcache_lock is re-acquired, in
order to maintain consistency.
Andrew Morton [Thu, 18 Mar 2004 22:59:02 +0000 (14:59 -0800)]
[PATCH] ppc64: Fix POWER3 TCE allocation
From: Anton Blanchard <anton@samba.org>
- Fix for machines with 3GB IO holes (eg nighthawk).
- Increase the maximum number of PHBs and warn if we exceed this (we used
to walk off the end of the array)
- Only allocate an 8MB TCE table on POWER4
Andrew Morton [Thu, 18 Mar 2004 22:58:53 +0000 (14:58 -0800)]
[PATCH] ppc64: Fix SLB reload bug
From: Paul Mackerras <paulus@samba.org>
Recently we found a particularly nasty bug in the segment handling in the
ppc64 kernel. It would only happen rarely under heavy load, but when it
did the machine would lock up with the whole of memory filled with
exception stack frames.
The primary cause was that we were losing the translation for the kernel
stack from the SLB, but we still had it in the ERAT for a while longer.
Now, there is a critical region in various exception exit paths where we
have loaded the SRR0 and SRR1 registers from GPRs and we are loading those
GPRs and the stack pointer from the exception frame on the kernel stack.
If we lose the ERAT entry for the kernel stack in that region, we take an
SLB miss on the next access to the kernel stack. Taking the exception
overwrites the values we have put into SRR0 and SRR1, which means we lose
state. In fact we ended up repeating that last section of the exception
exit path, but using the user stack pointer this time. That caused another
exception (or if it didn't, we loaded a new value from the user stack and
then went around and tried to use that). And it spiralled downwards from
there.
The patch below fixes the primary problem by making sure that we really
never cast out the SLB entry for the kernel stack. It also improves
debuggability in case anything like this happens again by:
- In our exception exit paths, we now check whether the RI bit in the
SRR1 value is 0. We already set the RI bit to 0 before starting the
critical region, but we never checked it. Now, if we do ever get an
exception in one of the critical regions, we will detect it before
returning to the critical region, and instead we will print a nasty
message and oops.
- In the exception entry code, we now check that the kernel stack pointer
value we're about to use isn't a userspace address. If it is, we print a
nasty message and oops.
This has been tested on G5 and pSeries (both with and without hypervisor)
and compile-tested on iSeries.
Andrew Morton [Thu, 18 Mar 2004 22:58:43 +0000 (14:58 -0800)]
[PATCH] ppc64: Add numa=off command line option
From: Anton Blanchard <anton@samba.org>
Add numa=off command line option to disable NUMA support at runtime.
Useful if there are issues with our parsing of the NUMA topology or for
testing NUMA gains.
Alexander Viro [Thu, 18 Mar 2004 05:24:32 +0000 (21:24 -0800)]
[PATCH] hpfs: general cleanup
include files moved to fs/hpfs/, gratuitous #include removed, stuff that
doesn't have to be global made static, misindented chunk of
hpfs_readdir() put in place, etc.
Alexander Viro [Thu, 18 Mar 2004 05:24:22 +0000 (21:24 -0800)]
[PATCH] hpfs: fix locking scheme
Fixed the locking scheme. The need of extra locking was caused by
the fact that hpfs_write_inode() must update directory entry; since HPFS
directories are implemented as b-trees, we must provide protection both
against rename() (to make sure that we update the entry in right directory)
and against rebalancing of the parent.
Old scheme had both deadlocks and races - to start with, we had no
protection against rename()/unlink()/rmdir(), since (a) locking parent
was done without any warranties that it will remain our parent and (b)
check that we still have a directory entry (== have positive nlink) was
done before we tried to lock the parent. Moreover, iget serialization
killed two steps ago gave immediate deadlocks if iget() of parent had
triggered another hpfs_write_inode().
New scheme introduces another per-inode semaphore (hpfs-only,
obviously) protecting the reference to parent. It's taken on
rename/rmdir/unlink victims and inode being moved by rename. Old semaphores
are taken only on parent(s) and only after we grab one(s) of the new kind.
hpfs_write_inode() gets the new semaphore on our inode, checks nlink and
if it's non-zero grabs parent and takes the old semaphore on it.
Order among the semaphores of the same kind is arbitrary - the only
function that might take more than one of the same kind is hpfs_rename()
and it's serialized by VFS.
We might get away with only one semaphore, but then the ordering
issues would bite us big way - we would have to make sure that child is
always locked before parent (hpfs_write_inode() leaves no other choice)
and while that's easy to do for almost all operations, rename() is a bitch -
as always. And per-superblock rwsem giving rename() vs. write_inode()
exclusion on hpfs would make the entire thing too baroque for my taste.
->readdir() takes no locks at all (protection against directory
modifications is provided by VFS exclusion), ditto for ->lookup().
->llseek() on directories switched to use of (VFS) ->i_sem, so
it's safe from directory modifications and ->readdir() is safe from it -
no hpfs locks are needed here.
Alexander Viro [Thu, 18 Mar 2004 05:24:12 +0000 (21:24 -0800)]
[PATCH] hpfs: deadlock fixes
We used to have GFP_KERNEL kmalloc() done by the code that held hpfs
lock on directory. That could trigger a call of hpfs_write_inode() and
deadlock; fixed by switch to GFP_NOFS. Same for hpfs inodes themselves
- hpfs_write_inode() calls iget() and that could trigger both the
deadlocks (avoidable with very baroque locking scheme) and stack
overflows (unavoidable unless we kill potential recursion here).
Alexander Viro [Thu, 18 Mar 2004 05:24:02 +0000 (21:24 -0800)]
[PATCH] hpfs: hpfs iget locking cleanup
Killed the nightmares in hpfs iget handling. Since in some (fairly
frequent) cases hpfs_read_inode() could avoid any IO (basically, lookup
hitting a native HPFS regular file can get all data from directory
entry) hpfs had a flag passed to that sucker. Said flag had been
protected by a semaphore lookalike made out of spit and duct-tape and
callers of iget looked like
hpfs_lock_iget(sb, flag);
result = iget(sb, ino);
hpfs_unlock_iget(sb);
Since now we are calling hpfs_read_inode() directly (note that calling
it without hpfs_lock_iget() would simply break) we can forget all that
crap and get rid of the flag - caller knows what it wants to call.
BTW, that had killed one of the last sleep_on() users in fs/*/*.
Alexander Viro [Thu, 18 Mar 2004 05:23:43 +0000 (21:23 -0800)]
[PATCH] hpfs: new/read/write_inode() cleanups
1) common initialization for all paths in hpfs_read_inode() taken into
a separate helper (hpfs_init_inode())
2) hpfs mkdir(),create(),mknod() and symlink() do not bother with
iget() anymore - they call new_inode(), do initializations and insert new
inode into icache. Handling of OOM failures cleaned up - if we can't
allocate in-core inode, bail instead of corrupting the filesystem.
Allocating in-core inode early also avoids one of the deadlocks here
(hpfs_write_inode() from memory pressure by kmem_cache_alloc() could
deadlock on attempt to lock our directory).
3) hpfs_write_inode() marks the inode dirty again if it fails to
iget() its parent directory. Again, OOM could trigger fs corruption
here.
Alexander Viro [Thu, 18 Mar 2004 05:23:34 +0000 (21:23 -0800)]
[PATCH] hpfs: clean up lock ordering
hpfs_{lock,unlock}_{2,3}inodes() killed; all places that take more than
one lock have ->i_sem held by VFS on all inodes involved and all hpfs per-inode
locks are of the same type. IOW, we can replace these guys with multiple
hpfs_lock_inode() - order doesn't matter here.
Andrew Morton [Wed, 17 Mar 2004 10:55:20 +0000 (02:55 -0800)]
[PATCH] exportfs - Remove unnecessary locking from find_exported_dentry()
From: "Jose R. Santos" <jrsantos@austin.ibm.com>
After discussing it with Neil, he felt that the original justification for
taking the kernel_lock on find_exported_dentry() is no longer valid and
that it should be safe to remove.
This patch fixes an issue while running SpecSFS where under memory
pressure, shrinking dcache causes find_exported_dentry() to allocate
disconnected dentries that later need to be properly connected. The
connecting part of the code was done with the BKL taken, which caused a sharp
drop in performance during iterations, with profiles showing 75% of time spent
in find_exported_dentry(). After applying the patch, time spent in the
function is reduced to <1%.
I have tested this on an 8-way machine with 56 filesystems for several days
now with no problems using ext2, ext3, xfs and jfs.
Andrew Morton [Wed, 17 Mar 2004 10:54:42 +0000 (02:54 -0800)]
[PATCH] ppc64: wrap some stuff in __KERNEL__
From: Anton Blanchard <anton@samba.org>
- remove now unused kernel syscalls.
- wrap recently added defines in #ifdef __KERNEL__, fixes glibc
compile issue
- some of our extra syscalls used asmlinkage, some did not. Make them
consistent
Andrew Morton [Wed, 17 Mar 2004 10:54:33 +0000 (02:54 -0800)]
[PATCH] ppc32: Fix booting some IBM PRePs
From: Tom Rini <trini@kernel.crashing.org>
The following patch comes from Paul Mackerras. Earlier on in 2.6,
arch/ppc/boot/utils/mkprep.c was changed slightly so that it would build
and work on Solaris. Doing this required changing from filling out
pointers to an area to filling out a local copy of the struct. However, a
memcpy was left out, and the info is only needed on some machines to boot.
The following adds in the missing memcpy and allows for IBM PRePs to boot
from a raw floppy again.
Don Fry [Wed, 17 Mar 2004 15:12:15 +0000 (10:12 -0500)]
[PATCH] back out netdev_priv() for loopback
Please apply this fix to backout an erroneous change in loopback.c
The statistics structure is allocated separately from the
loopback_dev structure, and the current code overwrites something
other than the statistics. In my case the scsi_cmd_pool structure.
Krzysztof Halasa [Wed, 17 Mar 2004 07:54:59 +0000 (02:54 -0500)]
[netdrvr de2104x] fix ifup/down and promisc mode
The attached patch fixes the problem: de->macmode variable,
meant to shadow MacMode (CSR6) register, was used inconsistently,
causing some updates to this register to be dropped.
2.4 kernel doesn't shadow this register at all, so I removed
shadowing from 2.6 as well.
David Mosberger [Wed, 17 Mar 2004 02:18:16 +0000 (18:18 -0800)]
ia64: Prevent GCC from clobbering r13. Found by Luming You.
Without this change, GCC thinks it's OK to clobber r13. It doesn't do it
very often, but it's enough if it does it once and it turns out
acpi_bus_receive_event() had code that would trigger this issue.
Fix by declaring r13 as a global register variable.
* wireless/Kconfig: fix typos, add SMC2835W-V2
* islpci_hotplug.c: new version 1.1, authors list, and
module description updated appropriately
* isl_ioctl.c, islpci_dev.c,
islpci_eth.c, islpci_hotplug.c, islpci_mgt.c:
s/ndev->priv/netdev_priv(ndev)/g
* islpci_hotplug.c: Add PCI ID values for SMC2835W-V2 cardbus card
Patch by Manuel Lauss <manuel.lauss@fh-hagenberg.at>
* isl_38xx.[ch]: include firmware.h in header, remove
declaration of headers in c file. Fix compiler warnings.
* islpci_dev.c (islpci_alloc_memory),
* islpci_eth.c (islpci_eth_cleanup_transmit,
islpci_eth_transmit, islpci_eth_receive): deal with skb stray
pointer, declare NULL.
* isl_38xx.c: remove unnecessary __KERNEL_SYSCALLS__ and
re-ordered headers per vger.kernel.org - liking.
* isl_ioctl.c, islpci_mgt.c: move from MODULE_PARAM to the new
module_param, which is type-safe. Includes the new
<linux/moduleparam.h>.
* isl_ioctl.c (prism54_[s|g]et_[maxframeburst|profile]): added.
Not adding ioctls as ajfa is working on moving current private ioctls
to subioctls.
* isl_oid.h (dot11_[maxframeburst|preamblesettings|
slotsettings|nonerpstatus|nonerpprotection]_t): added.
Note: more ioctls can be added here, I believe problems
with mixed modes can be pinpointed here, with these values.
From: Boehm Olaf <olaf.boehm@lanner.de>
From: Jindrich Makovicka <makovick@kmlinux.fjfi.cvut.cz>
Wider range for 33MHz timing and PLL setup for HPT374
(using the HPT370A timing table, as it is the same as
used in the "opensource" driver by HighPoint).
Andrew Morton [Tue, 16 Mar 2004 23:10:45 +0000 (15:10 -0800)]
[PATCH] SHMLBA compat task alignment fix
From: Arun Sharma <arun.sharma@intel.com>
The current Linux implementation of shmat() insists on SHMLBA alignment even
when shmflg & SHM_RND == 0. This is not consistent with the man pages and
the single UNIX spec, which require only a page-aligned address.
However, some architectures require a SHMLBA alignment for correctness in all
cases. Such architectures use __ARCH_FORCE_SHMLBA.
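A rough sketch of the address check this describes, assuming a 4 KiB page and a 16 KiB SHMLBA purely for illustration. The function name, constants and the force_shmlba parameter are hypothetical; in the kernel the last of these is a compile-time choice via __ARCH_FORCE_SHMLBA, not a runtime argument.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_PAGE_SIZE 4096UL    /* assumed page size */
#define FAKE_SHMLBA    16384UL   /* assumed SHMLBA, a multiple of the page size */

/*
 * Validate a caller-supplied attach address.
 * rnd:          nonzero when SHM_RND was passed in shmflg
 * force_shmlba: nonzero on architectures that always need SHMLBA alignment
 * On success stores the address to use in *out and returns 0.
 */
static int fake_shmat_addr(unsigned long addr, int rnd, int force_shmlba,
                           unsigned long *out)
{
    assert(out != 0);
    if (addr & (FAKE_SHMLBA - 1)) {
        if (rnd)
            addr &= ~(FAKE_SHMLBA - 1);   /* round down to SHMLBA */
        else if (force_shmlba || (addr & (FAKE_PAGE_SIZE - 1)))
            return -EINVAL;               /* misaligned for this arch */
    }
    *out = addr;
    return 0;
}
```

Under these assumptions a page-aligned address such as 0x5000 is accepted unchanged without SHM_RND, rounded down to 0x4000 with it, and rejected only when force_shmlba is set.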
Andrew Morton [Tue, 16 Mar 2004 23:09:29 +0000 (15:09 -0800)]
[PATCH] clean up devices.txt
From: "Cagle, John (ISS-Houston)" <john.cagle@hp.com>
Patch 1 cleans up the format by making devices.txt easily parsable.
Mainly this involved adding the word "block" after all the block major
numbers since the previous format didn't include it.
Andrew Morton [Tue, 16 Mar 2004 23:09:08 +0000 (15:09 -0800)]
[PATCH] make config_max_raw_devices work
From: "Kenneth Chen" <kenneth.w.chen@intel.com>
Even though there is a CONFIG_MAX_RAW_DEVS option, it doesn't actually
increase the number of raw devices beyond 256 because during the char
registration, it uses the standard register_chrdev() interface which has
a hard-coded limit of 256 minors. Here is a patch that fixes this problem by using
register_chrdev_region() and cdev_(init/add/del) functions.