Alexander Viro [Fri, 5 Sep 2003 03:53:56 +0000 (20:53 -0700)]
[PATCH] large dev_t - second series (6/15)
tty redirect handling sanitized. Such ttys (/dev/tty and
/dev/console) get a different file_operations; its ->write() handles
redirects; checks for file->f_op == &tty_fops updated, checks for
major:minor being that of a redirector replaced with check for
->f_op->write value. Piece of code in tty_io.c that had been #if 0
since 0.99<something> had been finally put out of its misery. kdev_val()
is gone.
Alexander Viro [Fri, 5 Sep 2003 03:53:47 +0000 (20:53 -0700)]
[PATCH] large dev_t - second series (5/15)
cdevname() killed, there was only one remaining user
(tty_paranoia_check()) and in that case cdevname() was worse
than plain major:minor (basically, it's "you've got corrupted
inode that was supposed to belong to tty device; here's what
I'd found in ->i_rdev")
Alexander Viro [Fri, 5 Sep 2003 03:53:38 +0000 (20:53 -0700)]
[PATCH] large dev_t - second series (4/15)
cciss cleanup - instead of playing with device numbers, we add
helper functions that get host and drive structures by gendisk and use
them in open/ioctl/release, same as had been done for cpqarray.
Russell King [Fri, 5 Sep 2003 17:36:50 +0000 (18:36 +0100)]
[ARM] Don't read the CPU control reg back - it may be write only.
Some ARM CPUs don't allow CP15 CR1 control register to be read.
Therefore, to ensure that the value hits the control register on
Xscale, read back the CP15 CR0 ID register instead.
Russell King [Fri, 5 Sep 2003 17:28:53 +0000 (18:28 +0100)]
[ARM] Restore preempt count before reporting unbalanced preempt count
On ARM, we oops when we detect that an interrupt handler has unbalanced
the preempt count. We should restore the preempt count when we started
to handle the interrupt and then cause the oops.
[PATCH] incomplete asm constraints in arch/i386/pci/pcbios.c
This fixes a "miscompile" HP reported against gcc 3.3 with
-march=pentium4. It turned out to be an incomplete asm constraint. The
existing constraint on "opt" was on the address of "opt", which allowed
gcc to reorder the setting of the fields inside opt to beyond the asm
that uses it, which is less than useful at best.
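A minimal sketch of what a complete constraint can look like, assuming GCC-style extended asm. The structure and function names are hypothetical stand-ins rather than the code in pcbios.c, and the asm body is left empty so the sketch builds on any target. Passing only the address of "opt" tells the compiler nothing about the contents; adding the object itself as an "m" input (or a "memory" clobber) is what keeps the stores ahead of the asm.

```c
#include <stdint.h>

/* Hypothetical stand-in for the structure handed to the BIOS call. */
struct sketch_routing_options {
    uint16_t size;
    void *table;
};

/*
 * Fill in the structure, then pass it to an asm statement.
 * The asm body is empty so this builds anywhere; the real
 * instructions would go in its place.
 */
static uint16_t sketch_call_with_complete_constraints(void *buf, uint16_t len)
{
    struct sketch_routing_options opt;

    opt.size = len;
    opt.table = buf;

    __asm__ __volatile__(
        ""                /* placeholder for the real instructions */
        :                 /* no outputs */
        : "r" (&opt),     /* the address alone says nothing about the contents */
          "m" (opt)       /* this input tells the compiler opt itself is read */
        : "memory");      /* the callee may also write through opt.table */

    return opt.size;
}
```

With only the "r" (&opt) input, the compiler would be free to sink the two stores past the asm, which is the reordering described above.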
This changes the way futexes are indexed, so that they don't pin pages.
It also fixes some bugs with private mappings and COW pages.
Currently, all futexes look up the page at the userspace address and pin
it, using the pair (page,offset) as an index into a table of waiting
futexes. Any page with a futex waiting on it remains pinned in RAM,
which is a problem when many futexes are used, especially with FUTEX_FD.
Another problem is that the page is not always the correct one, if it
can be changed later by a COW (copy on write) operation. This can
happen when waiting on a futex without writing to it after fork(),
exec() or mmap(), if the page is then written to before attempting to
wake a futex at the same address.
There are two symptoms of the COW problem:
- The wrong process can receive wakeups
- A process can fail to receive required wakeups.
This patch fixes both by changing the indexing so that VM_SHARED
mappings use the triple (inode,offset,index), and private mappings use
the pair (mm,virtual_address).
The former correctly handles all shared mappings, including tmpfs and
therefore all kinds of shared memory (IPC shm, /dev/shm and
MAP_ANON|MAP_SHARED). This works because every mapping which is
VM_SHARED has an associated non-zero vma->vm_file, and hence inode.
(This is ensured in do_mmap_pgoff, where it calls shmem_zero_setup).
The latter handles all private mappings, both files and anonymous. It
isn't affected by COW, because it doesn't care about the actual pages,
just the virtual address.
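The two indexing schemes can be sketched roughly as follows. This is a userspace illustration under simplifying assumptions: the structures, field names and the 4K page size are hypothetical, not the kernel's actual definitions. Shared mappings are keyed on the backing inode plus the position within the file, so every process mapping the same file location gets the same key; private mappings are keyed on the address space plus the virtual address, so no page lookup is needed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical, simplified description of one mapping. */
struct sketch_mapping {
    int shared;            /* non-zero for a VM_SHARED mapping */
    const void *inode;     /* backing inode (shared mappings) */
    const void *mm;        /* owning address space */
    uintptr_t vm_start;    /* first virtual address of the mapping */
    uintptr_t vm_pgoff;    /* file offset of vm_start, in pages */
};

/* Hypothetical key used to index the table of waiting futexes. */
struct sketch_futex_key {
    int shared;
    const void *object;    /* inode for shared, mm for private */
    uintptr_t index;       /* page index in file, or page-aligned address */
    unsigned int offset;   /* offset of the futex within its page */
};

static struct sketch_futex_key
sketch_get_key(const struct sketch_mapping *m, uintptr_t uaddr)
{
    struct sketch_futex_key key;
    uintptr_t page_addr;

    key.offset = (unsigned int)(uaddr & (SKETCH_PAGE_SIZE - 1));
    page_addr = uaddr - key.offset;
    key.shared = m->shared != 0;

    if (m->shared) {
        /* (inode, offset, index): independent of where it is mapped */
        key.object = m->inode;
        key.index = ((page_addr - m->vm_start) >> SKETCH_PAGE_SHIFT)
                    + m->vm_pgoff;
    } else {
        /* (mm, virtual address): independent of the underlying page */
        key.object = m->mm;
        key.index = page_addr;
    }
    return key;
}

static int sketch_key_equal(struct sketch_futex_key a,
                            struct sketch_futex_key b)
{
    return a.shared == b.shared && a.object == b.object &&
           a.index == b.index && a.offset == b.offset;
}
```

Because neither branch touches a struct page, nothing is pinned and a later COW of a private page does not change the key.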
The patch has a few bonuses:
1. It removes the vcache implementation, as only futexes were
using it, and they don't any more.
2. Removing the vcache should make COW page faults a bit faster.
3. Futex operations no longer take the page table lock, walk
the page table, fault in pages that aren't mapped in the
page table, or do a vcache hash lookup - they are mostly a
simple offset calculation with one hash for the futex
table. So they should be noticeably faster.
Special thanks to Hugh Dickins, Andrew Morton and Rusty Russell for
insightful feedback. All suggestions are included.
* get rid of leftover sti
* no longer need MOD_INC/DEC stuff
* get rid of dead code related to MOD_INC/DEC
* use module_init/module_exit to cleanly run init code
[SCTP] Move a local variable declaration ahead of the function code.
Apparently the new gcc 3.2.2 allows a local variable to be declared within
the code of a function if the variable is not used earlier. But older gcc's do
not allow this.
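For illustration, the portable form looks like the sketch below (the function is a hypothetical example, not the SCTP code). The rejected form is shown only in a comment, since a declaration after a statement is a C99 feature that strict C89 compilers refuse.

```c
#include <stddef.h>

/*
 * Not accepted by C89-only compilers, because "total" is declared
 * after a statement:
 *
 *     if (values == NULL)
 *         return 0;
 *     int total = 0;
 *
 * The portable form keeps every declaration at the top of the block.
 */
static int sketch_sum(const int *values, size_t count)
{
    int total = 0;
    size_t i;

    if (values == NULL)
        return 0;

    for (i = 0; i < count; i++)
        total += values[i];

    return total;
}
```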
Cleanup ikconfig
- use single_open for built_with file.
- get rid of unneeded globals
- use copy_to_user instead of char at a time
- only need the read routine, proc defaults to correct behaviour
for the rest.
James Bottomley [Thu, 4 Sep 2003 03:13:28 +0000 (20:13 -0700)]
[PATCH] fix remap of shared read only mappings
When mmap MAP_SHARED is done on a file, it gets marked with VM_MAYSHARE
and, if it's read/write, VM_SHARED. However, if it is remapped with
mremap(), the MAP_SHARED is only passed into the new mapping based on
VM_SHARED. This means that remapped read only MAP_SHARED mappings lose
VM_MAYSHARE. This is causing us a problem on parisc because we have to
align all shared mappings carefully to mitigate cache aliasing problems.
The fix is to key passing the MAP_SHARED flag back into the remapped area
off VM_MAYSHARE not VM_SHARED.
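The difference between the two tests can be sketched as below. The flag values and function names are illustrative stand-ins defined locally, not taken from the kernel headers; only the choice of which VM flag drives MAP_SHARED matters here.

```c
/* Illustrative flag values, defined locally for the sketch. */
#define SKETCH_VM_WRITE    0x02UL
#define SKETCH_VM_SHARED   0x08UL
#define SKETCH_VM_MAYSHARE 0x80UL
#define SKETCH_MAP_SHARED  0x01UL

/* Flags a MAP_SHARED file mapping ends up with, as described above. */
static unsigned long sketch_vm_flags_for_shared(int writable)
{
    unsigned long flags = SKETCH_VM_MAYSHARE;

    if (writable)
        flags |= SKETCH_VM_SHARED | SKETCH_VM_WRITE;
    return flags;
}

/* Old behaviour: read-only shared mappings lose MAP_SHARED on remap. */
static unsigned long sketch_remap_flags_old(unsigned long vm_flags)
{
    return (vm_flags & SKETCH_VM_SHARED) ? SKETCH_MAP_SHARED : 0;
}

/* Fixed behaviour: key off VM_MAYSHARE, which every shared mapping has. */
static unsigned long sketch_remap_flags_new(unsigned long vm_flags)
{
    return (vm_flags & SKETCH_VM_MAYSHARE) ? SKETCH_MAP_SHARED : 0;
}
```

For a read-only shared mapping the old test yields no MAP_SHARED while the new one does; for a read/write mapping both agree.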
[PATCH] ide: remove supports_dma field from ide_driver_t
driver->supports_dma was used together with CONFIG_IDEDMA_ONLYDISK to limit
DMA access to disk devices only. However, Alan introduced a new scheme in 2.5.63,
and this field is no longer needed because all ide drivers support DMA.
Russell King [Thu, 4 Sep 2003 02:43:03 +0000 (19:43 -0700)]
[PATCH] Don't #ifdef prototypes
It seems that changing CONFIG_BLK_DEV_INITRD causes the whole kernel to
rebuild due to an inappropriate ifdef in linux/fs.h - we should not
conditionalise prototypes.
In addition, real_root_dev is only used by two files (kernel/sysctl.c
and init/do_mounts_initrd.c) so it makes even less sense that it was in
linux/fs.h
Andrew Morton [Wed, 3 Sep 2003 18:14:04 +0000 (11:14 -0700)]
[PATCH] hermes.h fails with outw_p() in :?
From: Michael Pruznick <michael_pruznick@mvista.com>
build errors:
hermes.h: In function `hermes_set_irqmask':
hermes.h:337: parse error before "do"
hermes.h:337: parse error before ';' token
hermes.h: In function `hermes_write_words':
In mips, outw_p() is a #define do...while(0) which, in the case of ?:,
results in a statement being used where an expression is required.
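A small sketch of the problem, using hypothetical names rather than the real hermes.h or mips definitions. A macro that expands to do { ... } while (0) is a statement, so it cannot be an operand of ?:. One common way around this, shown here as an assumption about the shape of a fix, is an ordinary if/else.

```c
#define SKETCH_NREGS 16

static unsigned short sketch_port_value[SKETCH_NREGS];  /* fake I/O ports */
static unsigned short sketch_mem_value[SKETCH_NREGS];   /* fake MMIO space */

/* Statement-style macro, like the mips outw_p() described above. */
#define sketch_outw_p(val, port) \
    do { sketch_port_value[(port) % SKETCH_NREGS] = (val); } while (0)

#define sketch_writew(val, off) \
    (sketch_mem_value[(off) % SKETCH_NREGS] = (val))

/*
 * This would fail with a parse error before "do":
 *
 *     use_io ? sketch_outw_p(val, off) : sketch_writew(val, off);
 *
 * An if/else accepts statements on both branches.
 */
static void sketch_write_reg(int use_io, unsigned int off, unsigned short val)
{
    if (use_io)
        sketch_outw_p(val, off);
    else
        sketch_writew(val, off);
}
```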
Andrew Morton [Wed, 3 Sep 2003 18:13:39 +0000 (11:13 -0700)]
[PATCH] i8042 free_irq() aliasing fix
The same address `i8042_request_irq_cookie' is used in three places for the
i8042 request_irq() argument. This means that if someone calls
i8042_check_mux() or i8042_check_aux() while the IRQ is in use, the
free_irq() call in there will free the wrong IRQ handler.
So give all three instances of request_irq() in i8042.c a distinct address by
which to identify the IRQ instance.
(This is probably a non-bug, because the `check' functions are not called
when the device is open, but it is better this way).
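The aliasing can be illustrated with a toy handler table. Everything here is a hypothetical userspace model, not the kernel's IRQ code: handlers are identified by an (irq, cookie) pair, and freeing removes the first entry that matches. If two registrations share a cookie, the one that gets removed is whichever was registered first, which may not be the caller's own.

```c
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 8

struct sketch_irq_action {
    int in_use;
    int irq;
    void *dev_id;   /* cookie that identifies this registration */
};

static struct sketch_irq_action sketch_actions[SKETCH_MAX_ACTIONS];

/* Register a handler; returns the slot used, or -1 if the table is full. */
static int sketch_request_irq(int irq, void *dev_id)
{
    int i;

    for (i = 0; i < SKETCH_MAX_ACTIONS; i++) {
        if (!sketch_actions[i].in_use) {
            sketch_actions[i].in_use = 1;
            sketch_actions[i].irq = irq;
            sketch_actions[i].dev_id = dev_id;
            return i;
        }
    }
    return -1;
}

/* Remove the first handler matching (irq, dev_id); returns its slot or -1. */
static int sketch_free_irq(int irq, void *dev_id)
{
    int i;

    for (i = 0; i < SKETCH_MAX_ACTIONS; i++) {
        if (sketch_actions[i].in_use && sketch_actions[i].irq == irq &&
            sketch_actions[i].dev_id == dev_id) {
            sketch_actions[i].in_use = 0;
            return i;
        }
    }
    return -1;
}

/* One distinct cookie per call site, as the patch does for i8042.c. */
static int sketch_main_cookie, sketch_check_cookie;
```

With distinct cookies, sketch_free_irq(irq, &sketch_check_cookie) can only ever remove the registration made by the check path.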
Andrew Morton [Wed, 3 Sep 2003 18:13:22 +0000 (11:13 -0700)]
[PATCH] Enable SELinux via boot parameter
From: James Morris <jmorris@redhat.com>
This patch adds an 'selinux' boot parameter which must be used to actually
enable SELinux.
It follows some internal discussion about deployment issues, where a vendor
would want to ship a single kernel image with SELinux built-in, without
requiring the user to use it.
Without specifying selinux=1 as a boot parameter, SELinux will not register
with LSM and selinuxfs will not be registered as a filesystem. This causes
SELinux to be bypassed entirely from then on, and no performance overhead
is imposed. Other security modules may then also be loaded if needed.
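The opt-in behaviour can be modelled in userspace as below. This is only a sketch: the function name is hypothetical and the kernel's real command-line handling is not reproduced. The point is the default: the feature stays off unless a "selinux=" token with a non-zero value is present.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Scan a space-separated command line for a "selinux=<n>" token.
 * Returns 1 only when the token is present with a non-zero value;
 * the default, with no token at all, is 0 (disabled).
 */
static int sketch_selinux_enabled(const char *cmdline)
{
    static const char prefix[] = "selinux=";
    const size_t plen = sizeof(prefix) - 1;
    const char *p = cmdline;
    int enabled = 0;

    while (*p) {
        while (*p == ' ')           /* skip separators */
            p++;
        if (strncmp(p, prefix, plen) == 0)
            enabled = strtol(p + plen, NULL, 0) != 0;
        while (*p && *p != ' ')     /* skip to the end of this token */
            p++;
    }
    return enabled;
}
```

When the function returns 0, the model corresponds to the case above where the module never registers with LSM.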
Andrew Morton [Wed, 3 Sep 2003 18:13:12 +0000 (11:13 -0700)]
[PATCH] Remove percpufication of in_flight counter in
From: Ravikiran G Thirumalai <kiran@in.ibm.com>
The routine disk_round_stats showed up considerably under oprofile for high
disk io load (four processes doing dd to the same disk (different
partitions) on a 4 way).
This is because the counter in_flight which is per-cpu right now gets read
every time disk_round_stats gets called. Per cpu counters like disk
statistics improve write speed, but reads are slow (since all cpus' local
counter values have to be read and summed up). Considering the fact that
in_flight counter is modified post disk_round_stats (which reads the
in_flight counter) it is better not to per-cpu this counter.
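The trade-off can be sketched with a simplified model (hypothetical types, a fixed CPU count of four, no real per-cpu machinery). A per-cpu counter makes updates cheap because each CPU touches only its own slot, but every read has to walk all slots; a plain counter is a single load, at the cost of needing whatever serialization the caller already holds.

```c
#define SKETCH_NR_CPUS 4

/* Per-cpu style counter: one slot per CPU. */
struct sketch_percpu_counter {
    long count[SKETCH_NR_CPUS];
};

/* Cheap update: only the local CPU's slot is touched. */
static void sketch_percpu_add(struct sketch_percpu_counter *c,
                              int cpu, long delta)
{
    c->count[cpu] += delta;
}

/* Expensive read: every CPU's slot has to be visited and summed. */
static long sketch_percpu_read(const struct sketch_percpu_counter *c)
{
    long sum = 0;
    int cpu;

    for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += c->count[cpu];
    return sum;
}

/*
 * Mixed layout: statistics that are mostly written stay per-cpu,
 * while in_flight, which is read on every rounding pass, is a plain
 * field. The sketch assumes the caller serializes access to it.
 */
struct sketch_disk_stats {
    struct sketch_percpu_counter ios;
    long in_flight;
};
```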
Following patch does just that. Below is the profile comparison before and
after the change. This was on a 4 way PIII Xeon, 1G ram, 2.6.0-test4-mm2.
Andrew Morton [Wed, 3 Sep 2003 18:13:04 +0000 (11:13 -0700)]
[PATCH] MODULE_ALIAS() in char devices
From: Rusty Russell <rusty@rustcorp.com.au>
Previously, default aliases were hardwired into modutils. Now they should
be inside the modules, using MODULE_ALIAS() (they will be overridden by any
user alias).
Andrew Morton [Wed, 3 Sep 2003 18:12:53 +0000 (11:12 -0700)]
[PATCH] MODULE_ALIAS() in block devices
From: Rusty Russell <rusty@rustcorp.com.au>
Previously, default aliases were hardwired into modutils. Now they should
be inside the modules, using MODULE_ALIAS() (they will be overridden by any
user alias).
Andrew Morton [Wed, 3 Sep 2003 18:12:43 +0000 (11:12 -0700)]
[PATCH] might_sleep() improvements
From: Mitchell Blank Jr <mitch@sfgoth.com>
This patch makes the following improvements to might_sleep():
o Add a "might_sleep_if()" macro for when we might sleep only if some
condition is met. It's a bit tidier, and has an unlikely() in it.
o Add might_sleep checks to skb_share_check() and skb_unshare() which
sometimes need to allocate memory.
o Make all architectures call might_sleep() in both down() and
down_interruptible(). Before only ppc, ppc64, and i386 did this check.
(sh did the check on down() but not down_interruptible())
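A userspace sketch of the conditional check, with hypothetical sketch_ names and a counter standing in for the kernel's warning. It assumes GCC or Clang for __builtin_expect. The macro only runs the check when the condition holds, and the branch is hinted as unlikely.

```c
#include <stdio.h>

#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

static int sketch_atomic_depth;   /* > 0 means sleeping is not allowed */
static int sketch_warnings;       /* how many violations were reported */

/* Stand-in for might_sleep(): complain if called in atomic context. */
static void sketch_might_sleep(void)
{
    if (sketch_atomic_depth > 0) {
        sketch_warnings++;
        fprintf(stderr, "sleeping function called from atomic context\n");
    }
}

/* Only perform the check when the condition says we may sleep. */
#define sketch_might_sleep_if(cond) \
    do { if (sketch_unlikely(cond)) sketch_might_sleep(); } while (0)

#define SKETCH_MAY_WAIT 0x1u

/* Example caller: an allocation that sleeps only if asked to wait. */
static void sketch_alloc_check(unsigned int flags)
{
    sketch_might_sleep_if(flags & SKETCH_MAY_WAIT);
    /* the allocation itself would follow here */
}
```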
Andrew Morton [Wed, 3 Sep 2003 18:12:08 +0000 (11:12 -0700)]
[PATCH] elevator insertion fixes
From: Nick Piggin <piggin@cyberone.com.au>
This fixes a bug in deadline and AS that causes insert_here to be ignored on
blk_fs_requests. This has been causing problems with SCSI requeueing code.
It makes elevator insertion more correct as advertised wrt insert_here and
REQ_SOFTBARRIER.
It also fixes a buglet in the as_requeue code where the request wasn't being
put into the front of the list (in rare cases).
Andrew Morton [Wed, 3 Sep 2003 18:11:33 +0000 (11:11 -0700)]
[PATCH] Fix odd code in bio_add_page
From: Neil Brown <neilb@cse.unsw.edu.au>
With the current code in bio_add_page, if fail_segments is ever set, it
stays set, so bio_add_page will eventually fail having recounted the
segmentation once.
I don't think this is intended. This patch changes the code to allow
success if recounting the segments helps.
Originally from Al Viro
NE11-ltpc
* switched ltpc to dynamic allocation
* ltpc: embedded ->priv
* ltpc: fixed bugs in DMA allocation
* ltpc: fixed resource leaks on failure exits
* ltpc: fixed part of timer bugs (still a-f**ing-plenty of those)
* ltpc: fixed order of freeing bugs
Added
* switch to free_netdev
Now that the whole magic chain of static devices is gone from Space.c,
the initialization of the one remaining static device (i.e. the loopback driver)
can be simplified.
One small change was to reduce possibility of failing the initialization if
allocation of private data failed by just going without statistics.
Based on Al Viro's NE10-sdla
* switched sdla to dynamic allocation
* sdla: embedded ->priv
* sdla: fixed resource leaks on failure exits
Additional fixes:
* get rid of cli/sti
* get rid of MOD_INC/DEC
Builds and probes, but don't have the hardware.
Driver has never built on 2.6 before this.
This set of patches is a mixture of Al's work on device initialization,
and some of my own to complete it for all the ether, tr, sbni, and loopback
devices.
The first patch adds the hook for converting old driver initialization
code over to dynamic allocation.
This part extracted from Al Viro's set of net driver changes for ethertap.
- keep list of arrays for devices and use a lock
- make sure header is contiguous before overlaying data structure
- dynamically allocate dev->priv with alloc_netdev
- get rid of MOD_INC/DEC
- free devices on module unload
- keep refcount on slave devices since holding a ptr
o Big performance improvement. The version of the driver in the kernel
still had a "mdelay(1)" after every register write. This capped
performance at about 8Mbps and ate tons of CPU time. (Luckily, most
users of this card are just terminating a DSL line where it's not too
noticeable)
However, after removing this delay the card started getting out of
sync with the driver under stress tests. After a couple days of chasing
the bug I finally determined that the card's support for transmitting
partial PDUs just doesn't quite work as advertised (before we would send
a partial PDU to completely fill a VCC's transmit buffer and then send
the rest of the skb when more buffer space filled up). The usefulness of
this is somewhat doubtful anyway and removing it cleaned up a lot of code.
I also added some memory barriers to make sure operations to the card
happen in the correct order.
Now for the first time ever we get near line-rate performance out of this
card (~19Mb/s TCP in netperf between two ~300MHz machines)
o Locking changes (essentially the patch Chas sent me a couple weeks ago
with some minor tweaking) I'm still not sure we're getting 100% of the
cases right but it's definitely FAR better than the old lock-less version.
o Cleanup the backlog draining code in lanai_shutdown_tx_vci()
o Remove outdated comment describing how to compile the module
o Got rid of the "service_novcc_[tr]x" stats - it's really the same error
as "service_[tr]x" - there's no reason to count them separately.
o Use the ATM_25_PCR constant instead of computing it for ourselves