Trond Myklebust [Fri, 4 Oct 2002 10:46:25 +0000 (03:46 -0700)]
[PATCH] NFS: readdir reply truncated!
Duh... Even a simple one-liner test can be wrong. The really sad bit
is that I made the same mistake 3 weeks ago, fixed it, and then lost
track of the fix...
To recap the fix to the fix: a valid end-of-directory marker has to read
(entry[0]==0 && entry[1]!=0). Here is the final, correct (I hope) patch.
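The condition quoted above can be written as a small predicate. This is only a sketch, not the patch itself: the helper name is hypothetical, and it assumes that entry[0] is the "another entry follows" word and entry[1] is the EOF word of the readdir reply.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code. `entry` points at two
 * consecutive 32-bit words of the reply: entry[0] is assumed to be the
 * "another entry follows" flag and entry[1] the EOF flag.
 * Zero/non-zero tests do not depend on byte order.
 */
static bool is_valid_eof_marker(const uint32_t *entry)
{
    return entry[0] == 0 && entry[1] != 0;
}
```

With this reading, {0, 1} is a valid end of directory, while {0, 0} means the reply ended without the EOF flag set, i.e. there is more to read.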
[IPV4/IPV6]: General cleanups.
- Use s6_XXX instead of in6_u.s6_XXX
- Use macros not magic numbers
- Avoid __constant_{hton,ntoh}{l,s} in runtime code.
Traditionally, device detection on s390 is done completely
at a _very_ early stage during bootup (from init_irq(),
i.e. before memory management or the console are there).
This has always been a bad idea, but now it broke even more
since the linux driver model requires device detection
to take place after the core_initcalls are done.
We now do only a small amount of scanning (probably
less in the future) at the early stage, the bulk of it
is done from a proper subsys_initcall(). This requires
some changes in related areas:
- the machine check handler initialization is split in
two halves, since we want to catch major machine malfunctions
as early as possible, but device machine checks can only
be caught after the channel subsystem is up.
- some functions that are called from the css initialization
made some assumptions of when to use kmalloc or bootmem_alloc,
which were broken anyway. We fix this here and hopefully
can get rid of bootmem_alloc for the css completely in the future.
- the debug logging feature for s390 was not used for functions
in the initialization before, since it requires the memory
management to be working. Now that we can be sure that it
works, some special cases can be removed.
Now that these changes are done, a partial implementation of the
device model for the channel subsystem is possible, but at this
point, none of the device drivers make use of that yet.
Add 'signal quiesce' feature to s390 hardware console. A signal quiesce
is sent from VM or the service element every time the system should shut
down. We receive the quiesce signal and call ctrl_alt_del(). Finally the
mainframes have ctrl-alt-del as well :-)
Rewrite s390 ptrace code in a more readable and less buggy way. As a part of
this, all psw related definitions are moved into ptrace.h from a number of
different locations.
Jens Axboe [Fri, 4 Oct 2002 03:44:29 +0000 (20:44 -0700)]
[PATCH] pass elevator type by reference, not value
Ingo spotted this one too, it's a leftover from when the elevator type
wasn't a variable. Also don't pass in &q->elevator, it can always be
deduced from the queue itself of course.
Jens Axboe [Fri, 4 Oct 2002 03:42:31 +0000 (20:42 -0700)]
[PATCH] ide-cd updates
Here starts some new ide updates.
o Don't turn on dma until after the packet cdb has been sent
o Clear sense data given in generic command, otherwise the user cannot
trust it. I already sent this patch for 2.4.20-pre inclusion.
Ben Collins [Fri, 4 Oct 2002 03:33:22 +0000 (20:33 -0700)]
[PATCH] IEEE1394 updates to 2.5.40
- Fixup for new tq changes
- Fix dv1394 for use without devfs
- Fix dv1394 for PAL capture
- Fix a hard to trigger bug in nodemgr.c
- Add another broken firmware device to sbp2's list
Matthew Wilcox [Thu, 3 Oct 2002 07:56:22 +0000 (00:56 -0700)]
[PATCH] Remove another for_each_process loop
Convert send_sigurg() to the for_each_task_pid() mechanism. Also in
the case where we were trying to send a signal to a non-existent PID,
don't bother searching for -PID in the PGID array; we won't find it.
The following patch removes the export of the sys_call_table.
There are no uses of this export that are valid and correct. The uses I've
found so far are
1. Calling syscalls from inside kernel modules
iBCS/Linux-abi used to do this (and this is the reason for the export
in the first place); however, it no longer does, because newer gcc's
(2.96/3.x) don't allow function pointer calls with a mismatching type.
Also, it's much better to just call the sys_foo functions directly (most
are already exported, and exporting more if needed wouldn't be a problem;
they are clearly a stable interface). Since gcc no longer allows this
(and I doubt older ones allowed it on all platforms), I consider this an
invalid and unneeded use.
2. Install new syscalls from kernel modules
LiS seems to be doing this. The correct way to do this is how NFS does
it for its syscall, and that doesn't need the syscall table to be
exported for this. Without an in-kernel helper like NFS has, it is not
possible to do this race-free wrt module unloads etc. I.e. this use of the
export is unneeded and incorrect.
3. Intercept system calls
OProfile (and intel's vtune which is similar in function) used to do this;
however what they really need is a notification on certain
events (exec() mostly). The way modules do this is store the original
function pointer, install a new one that calls the old one after storing
whatever info they need. This mechanism breaks badly in the light of
multiple such modules doing this versus modules
unloading/uninstalling their handlers (by restoring their saved pointer
that may or may not point to a valid handler anymore).
I.e. the use of the export here is just a band-aid due to the lack of a
proper mechanism, and is also incorrect and crash-prone.
4. Extend system calls
The mechanism for this is identical to the previous one, except
that now the actual syscall behavior is changed. I don't think open source
modules do this (generally they don't need to, just adding things to the
kernel proper works for them), however I've
seen IBM's closed source cluster fs do this.
The objections to the mechanism are the same as in 3. Also
this effectively changes the userspace ABI, which is undesirable.
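The failure mode described under use 3 (save the old pointer, install a wrapper, restore on unload) can be reproduced in a small user-space simulation. Everything below is a hypothetical stand-in: `table_slot` plays the role of one sys_call_table entry and the two "modules" are plain structs, so this only illustrates the ordering problem, not any real module.

```c
/* Stand-in for one slot of an exported syscall table. */
typedef long (*syscall_fn)(long);

static long real_call(long x) { return x; }
static syscall_fn table_slot = real_call;

/* Each hypothetical module remembers the pointer it found in the slot. */
struct hook_module {
    syscall_fn saved;
    syscall_fn wrapper;
};

static struct hook_module mod_a, mod_b;

/* The wrappers record nothing here; they just chain to the saved pointer. */
static long wrap_a(long x) { return mod_a.saved(x); }
static long wrap_b(long x) { return mod_b.saved(x); }

static void hook_install(struct hook_module *m)
{
    m->saved = table_slot;
    table_slot = m->wrapper;
}

static void hook_remove(struct hook_module *m)
{
    table_slot = m->saved;  /* blind restore, even if the pointer is stale */
}

/* Load A, load B, unload A, unload B; returns what the slot ends up as. */
static syscall_fn simulate_out_of_order_unload(void)
{
    table_slot = real_call;
    mod_a.wrapper = wrap_a;
    mod_b.wrapper = wrap_b;

    hook_install(&mod_a);   /* slot: wrap_a, A saved real_call     */
    hook_install(&mod_b);   /* slot: wrap_b, B saved wrap_a        */
    hook_remove(&mod_a);    /* slot: real_call, B's hook is lost   */
    hook_remove(&mod_b);    /* slot: wrap_a, but A is already gone */
    return table_slot;
}
```

After the sequence the slot points at `wrap_a`, the handler of a module that has already been "unloaded". In a real kernel that would be a pointer into freed module text.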
Manfred Spraul [Thu, 3 Oct 2002 06:34:28 +0000 (23:34 -0700)]
[PATCH] pipe bugfix/cleanup
pipe_write contains a wakeup storm, 2 writers that write into the same
fifo can wake each other up, and spend 100% cpu time with
wakeup/schedule, without making any progress.
The only regression I'm aware of is that
$ dd if=/dev/zero | grep not_there
will fail due to OOM, because grep does something like
if it operates on pipes, and due to the improved syscall merging, read
will always return the maximum possible amount of data. But that's a grep
bug, not a kernel problem.
since the timer functions already did a !timer->base check this did not
have any effect on their fastpath.
- the rule from now on is that timer->base is set upon activation of the
timer, and cleared upon deactivation. This also made it possible to:
- reorganize all the timer handling code to not assume anything about
timer->entry.next and timer->entry.prev - this also removed lots of
unnecessary cleaning of these fields. Removed lots of unnecessary list
operations from the fastpath.
- simplified del_timer_sync(): it now uses del_timer() plus some simple
synchronization code. Note that this also fixes a bug: if mod_timer (or
add_timer) moves a currently executing timer to another CPU's timer
vector, then del_timer_sync() does not synchronize with the handler
properly.
- bugfix: moved run_local_timers() from scheduler_tick() into
update_process_times() .. scheduler_tick() might be called from the fork
code which will not quite have the intended effect ...
- removed the APIC-timer-IRQ shifting done on SMP, Dipankar Sarma's
testing shows no negative effects.
- cleaned up include/linux/timer.h:
- removed the timer_t typedef, and fixed up kernel/workqueue.c to use
the 'struct timer_list' name instead.
- removed unnecessary includes
- renamed the 'list' field to 'entry' (it's an entry not a list head)
- exchanged the 'function' and 'data' fields. This, besides being
more logical, also unearthed the last few remaining places that
initialized timers by assuming some given field ordering, the patch
also fixes these places. (fs/xfs/pagebuf/page_buf.c,
net/core/profile.c and net/ipv4/inetpeer.c)
- removed the defunct sync_timers(), timer_enter() and timer_exit()
prototypes.
- added docbook-style comments.
- other kernel/timer.c changes:
- base->running_timer does not have to be volatile ...
- added consistent comments to all the important functions.
- made the sync-waiting in del_timer_sync preempt- and lowpower-
friendly.
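The field-ordering problem mentioned in the list above (positional initializers breaking when 'function' and 'data' are exchanged) is what designated initializers avoid. The sketch below uses a hypothetical `struct demo_timer`, not the real `struct timer_list`, and is not a claim about how the patch fixed the affected files.

```c
/* Hypothetical struct that loosely mirrors the fields named above. */
struct demo_timer {
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

static unsigned long last_seen;

static void demo_handler(unsigned long data)
{
    last_seen = data;
}

/*
 * Designated initializers name each field, so this keeps working if the
 * struct members are reordered. A positional form such as
 * { 0, 42, demo_handler } would silently depend on the member order.
 */
static struct demo_timer demo = {
    .function = demo_handler,
    .data     = 42,
};

/* Invoke the handler the way a timer core would: function(data). */
static void demo_fire(struct demo_timer *t)
{
    t->function(t->data);
}
```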
i've compiled, booted & tested the patched kernel on x86 UP and SMP. I
have tried moderately high networking load as well, to make sure the timer
changes are correct - they appear to be.
Ingo Molnar [Thu, 3 Oct 2002 06:32:38 +0000 (23:32 -0700)]
[PATCH] sigfix-2.5.40-D6
This fixes all known signal semantics problems.
sigwait() is really evil - i had to re-introduce ->real_blocked. When a
signal has no handler defined then the actual action taken by the kernel
depends on whether the sigwait()-ing thread was blocking the signal
originally or not. If the signal was blocked => specific delivery to the
thread, if the signal was not blocked => kill-all.
fortunately this meant that PF_SIGWAIT could be killed - the real_blocked
field contains all the necessary information to make the right decision at
signal-sending time.
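The rule above for a signal without a handler can be sketched as a small decision function. The names here are hypothetical, and `real_blocked` is modeled as a plain 64-bit mask (bit sig-1 set means the thread had the signal blocked before it entered sigwait()), which is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum unhandled_action { DELIVER_TO_THREAD, KILL_ALL };

/*
 * Hypothetical model of the decision described above for a signal that
 * has no handler: if the sigwait()-ing thread had the signal blocked
 * originally, deliver it to that thread; otherwise do a kill-all.
 * `sig` is expected to be in the range 1..64.
 */
static enum unhandled_action
unhandled_sigwait_action(uint64_t real_blocked, int sig)
{
    bool was_blocked = (real_blocked >> (sig - 1)) & 1;

    return was_blocked ? DELIVER_TO_THREAD : KILL_ALL;
}
```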
i've also cleaned up and made the shared-pending code more robust: now
there's a single central dequeue_signal() function that handles all the
details. Plus upon unqueueing a shared-pending signal we now re-queue the
signal to the current thread, which this time around is not going to end
up in the shared-pending queue. This change handles the following case
correctly: a signal was blocked in every thread, then one thread unblocks
it and gets the signal delivered - but there's no handler for the signal
=> the correct action is to do a kill-all.
i removed the unused shared_unblocked field as well, reported by Oleg
Nesterov.
now we pass both signal-tst1 and signal-tst2, so i'm confident that we got
most of the details right.
Ingo Molnar [Thu, 3 Oct 2002 06:32:09 +0000 (23:32 -0700)]
[PATCH] futex-2.5.40-B5
This does a number of futex bugfixes, performance improvements and
cleanups.
The bugfixes are:
- fix locking bug noticed by Martin Wirth: the ordering of
page_table_lock, vcache_lock and futex_lock was inconsistent and
created the possibility of an SMP deadlock.
- fix spurious wakeup noticed by Andrew Morton: the get_user() in
futex_wait() can set the task state to TASK_RUNNING.
- fix futex_wake COW race, noticed by Martin Wirth - futex_wake() has to
go through the same lookup rules as the futex_wait() code, otherwise it
might end up trying to wake up based on the wrong physical page.
Improvements:
- speed up the basic addrs => page lookup done by the futex code. It used
to do an unconditional get_user_pages() call, which did a vma lookup
and other heavy-handed tactics - while the common case is that the
page is mapped and available. Furthermore, due to the COW-race code we
had to re-check the mapping anyway, which made the get_user_pages()
thing pretty unnecessary. This inefficiency was noticed by Martin
Wirth.
the new lookup code first does a lightweight follow_page(), then if no
page is present we do the get_user_pages() thing.
- locking cleanups - the new lookup code made some things simpler, eg.
the hash calculation can now be done in queue_me().
Ingo Molnar [Thu, 3 Oct 2002 06:31:55 +0000 (23:31 -0700)]
[PATCH] dump_stack() cleanup, BK-curr
This modifies x86's dump_stack() to print out just the backtrace, not
the stack contents. The patch also adds one more whitespace after the
numeric EIP value. The old dump looked this way:
Ivan Kokshaysky [Thu, 3 Oct 2002 06:15:53 +0000 (23:15 -0700)]
[PATCH] PCI: probing read-only BARs
Some pci devices may have base address registers locked with non-zero values.
Examples:
- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is disabled,
this BAR is read-only and read as 0x00000008;
- BAR0-4 of ALi IDE controllers can be non-zero and read-only.
Obviously, we can't calculate the correct size of the respective region in
this case (for the AMD AGP window we'll get a 4 GB resource - ouch).
So I think that we should ignore r/o BARs (let the device specific
fixups deal with them if needed).
Patch appended (note that an extra write(0)/read-back pair is required,
as the BAR might be programmed with all 1s).
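A user-space sketch of the probing idea, under stated assumptions: `struct mock_bar` and its accessors are hypothetical stand-ins for PCI config space access, only 32-bit memory BARs are modeled, and the code is not the appended patch. It shows why both a write of 0 and a write of all 1s are compared before sizing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock 32-bit memory BAR: only bits set in writable_mask can change. */
struct mock_bar {
    uint32_t value;
    uint32_t writable_mask;
};

static uint32_t bar_read(const struct mock_bar *b)
{
    return b->value;
}

static void bar_write(struct mock_bar *b, uint32_t v)
{
    b->value = (b->value & ~b->writable_mask) | (v & b->writable_mask);
}

/*
 * Returns true and stores the region size if the BAR can be sized,
 * false if no bit reacts to writes (read-only or unimplemented BAR).
 */
static bool probe_bar(struct mock_bar *b, uint32_t *size)
{
    uint32_t orig = bar_read(b);
    uint32_t zeros, ones;

    bar_write(b, 0);            /* extra pair: the BAR may hold all 1s */
    zeros = bar_read(b);
    bar_write(b, 0xffffffffu);
    ones = bar_read(b);
    bar_write(b, orig);         /* restore the original programming */

    if (zeros == ones)
        return false;           /* nothing changed: treat as read-only */

    ones &= ~0xfu;              /* drop the memory BAR flag bits */
    *size = ~ones + 1;
    return true;
}
```

For a BAR that reads as 0x00000008 and ignores writes, skipping the `zeros == ones` check would give `~0 + 1`, which wraps to 0 in 32 bits; that is the bogus 4 GB region the message mentions.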