Anton Blanchard [Thu, 2 May 2002 03:01:43 +0000 (13:01 +1000)]
ppc64: Only implement thread priority macros on HMT or iSeries kernels
Drop back to eieio in spinlocks for the moment due to performance
issues of sync on power3
Andrew Morton [Wed, 1 May 2002 03:26:08 +0000 (20:26 -0700)]
[PATCH] Unresolved symbol block_flushpage
block_flushpage() used to be a macro which pointed at the
exported discard_bh_page(). I turned block_flushpage() into
a real function but forgot the export.
Douglas Gilbert [Wed, 1 May 2002 03:07:03 +0000 (20:07 -0700)]
[PATCH] scsi disk (sd) driver 2.5.12
This patch has the last bit of Justin Gibbs' patch
described in:
http://marc.theaimsgroup.com/?l=linux-scsi&m=101200279101550&w=2
[this bit for the sd driver]
There is also a major code cleanup of the sd driver with
documentation headers added and a few obvious bugs fixed
described in:
http://marc.theaimsgroup.com/?l=linux-scsi&m=101798201714399&w=2
I did this cleanup.
Justin's patch has been in Dave's tree for several months while
my code cleanup patch has been there since 2.5.9-dj1.
Douglas Gilbert [Wed, 1 May 2002 03:05:08 +0000 (20:05 -0700)]
[PATCH] scsi_error 2.5.12
The attachment is part of a patch from Justin Gibbs
described in:
http://marc.theaimsgroup.com/?l=linux-scsi&m=101200279101550&w=2
The original patch was targeted at lk 2.4 and Dave forward
ported it into 2.5. Other bits (e.g. sr) have already found
their way into your tree. One bit in the sd driver will be
included in my following patch.
Alexander Viro [Wed, 1 May 2002 02:58:03 +0000 (19:58 -0700)]
[PATCH] (4/6) blksize_size[] removal
- put block size in bdev->bd_block_size, make do_open() and
check_partition() set it (see above), switch set_blocksize() and
block_size() to use ->bd_block_size. Remove manipulations of
blksize_size[] from drivers, remove blksize_size[] itself.
Alexander Viro [Wed, 1 May 2002 02:57:27 +0000 (19:57 -0700)]
[PATCH] (1/6) blksize_size[] removal
- preliminary cleanups: make sure that swapoff restores original block
size, kill set_blocksize() (and use of __bread()) in multipath.c,
reorder opening device and finding its block size in mtdblock.c.
William Stinson [Tue, 30 Apr 2002 22:31:51 +0000 (18:31 -0400)]
request_region janitor updates for ultrastor scsi driver:
1) removes calls to check_region
2) checks the result of request_region calls
3) calls release_region where necessary in case of driver initialisation error
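The three steps above follow a common pattern: reserve the ports, check
that the reservation succeeded, and give the ports back if a later
initialisation step fails. The following is a minimal user-space sketch of
that pattern only. The mock_* functions and the port numbers are
hypothetical stand-ins, not the kernel's request_region()/release_region()
and not code from the ultrastor driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for I/O port reservation.
 * Only the calling pattern is meant to mirror the janitor update. */
#define MOCK_MAX_REGIONS 8

struct mock_region {
    unsigned long start;
    unsigned long len;
    int used;
};

static struct mock_region mock_regions[MOCK_MAX_REGIONS];

/* Reserve [start, start + len). Returns NULL if the range overlaps an
 * existing reservation or the table is full. */
static struct mock_region *mock_request_region(unsigned long start,
                                               unsigned long len)
{
    struct mock_region *free_slot = NULL;

    for (int i = 0; i < MOCK_MAX_REGIONS; i++) {
        struct mock_region *r = &mock_regions[i];

        if (!r->used) {
            if (!free_slot)
                free_slot = r;
            continue;
        }
        if (start < r->start + r->len && r->start < start + len)
            return NULL; /* overlaps a region somebody else holds */
    }
    if (!free_slot)
        return NULL;
    free_slot->start = start;
    free_slot->len = len;
    free_slot->used = 1;
    return free_slot;
}

/* Drop a reservation previously made with the same start and length. */
static void mock_release_region(unsigned long start, unsigned long len)
{
    for (int i = 0; i < MOCK_MAX_REGIONS; i++) {
        struct mock_region *r = &mock_regions[i];

        if (r->used && r->start == start && r->len == len)
            r->used = 0;
    }
}

/* Hypothetical driver init: reserve first, probe second, and release the
 * ports again if the probe fails. There is no separate "check" call. */
static int mock_driver_init(unsigned long base, int hardware_present)
{
    if (!mock_request_region(base, 4))
        return -1; /* ports busy */
    if (!hardware_present) {
        mock_release_region(base, 4); /* undo on initialisation error */
        return -2;
    }
    return 0;
}
```

Reserving up front and testing the result avoids the window between a
separate check and the later reservation in which another driver could
claim the same ports.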
William Stinson [Tue, 30 Apr 2002 22:23:54 +0000 (18:23 -0400)]
baycom_ser_fdx hamradio driver request_region janitor updates:
1) remove call to check_region
2) test result of request_region
3) call release_region in case of driver initialisation error later on
Jean Tourrilhes [Tue, 30 Apr 2002 18:07:52 +0000 (14:07 -0400)]
irda update 6/7:
o [CORRECT] Cancel LSAP watchdog when putting socket back to listen
o [CORRECT] Try to close LAP when closing LSAP still active
<Following patch from Felix Tang>
o [CORRECT] Header fix for compile on Alpha architecture
Jean Tourrilhes [Tue, 30 Apr 2002 18:04:25 +0000 (14:04 -0400)]
irda update 2/7:
o [CORRECT] Fix race condition when starting todo timer
o [CORRECT] Fix race condition when stopping higher layer
Higher layer would think it is stopped while we think it is started
o [CORRECT] Give credit even if packets in Tx queue
If Tx queue was stopped, could starve peer and deadlock
o [CORRECT] Protect Rx credit update with spinlock
o [CORRECT] Calculate properly self->avail_credit
Didn't take into account queued Rx fragments
Incremented even if Rx frame not delivered to higher layer
-> would never stop the peer (i.e. not flow control)
-> could become infinite
o [CORRECT] Send credit when higher layer reenable receive
Peer wouldn't restart Tx to us if flow stopped
o [FEATURE] Implement LAP queue not full notification
Lower latency, ...
o [FEATURE] Reduce Tx queue to 8 packets (from 10)
But make sure we can always send a full LAP window (7)
o [FEATURE] Fix and optimise TTP flow control
Make sure peer can always send a full LAP window (7)
Minimise explicit credit updates (give_credit)
o [FEATURE] Remove need for todo timer in Tx/Rx paths
Less potential races, lower latency, lower context switches
Could not use tasklet because broken API, better anyway ;-)
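The credit items above all concern credit-based flow control: the sender
may only transmit while it holds credit, and the receiver returns credit
as it delivers frames to the higher layer. The following is a generic
sketch of that idea, not the IrTTP code. All names are hypothetical; only
the window size of 7 comes from the text.

```c
#include <assert.h>

#define LAP_WINDOW 7 /* from the text: a full LAP window is 7 packets */

/* Hypothetical sender state: credit left and packets sent so far. */
struct mock_tx {
    int credit;
    int sent;
};

/* Hypothetical receiver state: credit earned but not yet returned. */
struct mock_rx {
    int pending_credit;
    int delivered;
};

/* Send one packet if credit allows. Returns 1 if sent, 0 if stalled. */
static int mock_tx_send(struct mock_tx *tx)
{
    if (tx->credit == 0)
        return 0;
    tx->credit--;
    tx->sent++;
    return 1;
}

/* A frame reached the higher layer, so one credit can go back later.
 * Credit is earned only on delivery, not on mere reception. */
static void mock_rx_deliver(struct mock_rx *rx)
{
    rx->delivered++;
    rx->pending_credit++;
}

/* Return all earned credit to the peer. Returns the amount given. */
static int mock_rx_give_credit(struct mock_rx *rx, struct mock_tx *peer)
{
    int n = rx->pending_credit;

    peer->credit += n;
    rx->pending_credit = 0;
    return n;
}
```

In this model a sender that starts with LAP_WINDOW credits stalls on the
eighth packet until the receiver delivers frames and gives credit back,
which is why a receiver that withholds credit can starve its peer.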
Jean Tourrilhes [Tue, 30 Apr 2002 18:03:22 +0000 (14:03 -0400)]
irda update 1/7:
o [FEATURE] Reduce LAP Tx queue to 2 packets (from 10)
Improve latency, reduce buffer usage
o [FEATURE] LAP Tx queue not full notification (flow start)
Poll higher layer to fill synchronously LAP window (7 packets)
o [FEATURE] LMP LSAP scheduler
Ensure Tx fairness between LSAPs (sockets, IrCOMM, IrNET...)
Jeff Garzik [Tue, 30 Apr 2002 17:38:46 +0000 (13:38 -0400)]
Add new PC300 WAN driver (courtesy Cyclades), and my own changes:
* patch added stuff to include/linux. move these three new headers
to drivers/net/wan.
* change the code to support these changes
* slightly better Config.in entry. needs more work, though.
David Gibson [Tue, 30 Apr 2002 07:42:48 +0000 (00:42 -0700)]
[PATCH] orinoco driver update
The following patch against 2.5.11 updates the orinoco driver. As well
as miscellaneous updates to the driver core it adds a new module
supporting Prism 2.5 based PCI wireless cards, and adds a MAINTAINERS
entry for the driver.
Frank Davis [Tue, 30 Apr 2002 07:41:56 +0000 (00:41 -0700)]
[PATCH] 2.5.11 : drivers/net/ppp_generic.c
Linus,
During a 'make bzImage', I received a warning on ppp_generic.c that ret
wasn't initialized (also for 2.5.10). I have attached a patch that sets
ret = count, thus removing the warning. Please review for inclusion.
Dave Hansen [Tue, 30 Apr 2002 07:00:53 +0000 (00:00 -0700)]
[PATCH] shift BKL out of vfs_readdir
This patch takes the BKL out of vfs_readdir() and moves it into the
individual filesystems, all 35 of them. I have the feeling that this
wasn't done before because there are a lot of these to change and it was
a pain to find them all. I definitely got all of those that were
defined in the structure declaration like this: "readdir:
fs_readdir;". vxfs_readdir was assigned strangely, but I found it anyway.
I also left devfs out of this one. Richard seems confident that devfs
has no need for the BKL.
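As an illustration of the change described above, here is a user-space
sketch in which the generic readdir entry point takes no lock and each
filesystem's own readdir decides whether to take it. Everything here is a
hypothetical model: the mock_* names, the lock counter standing in for the
BKL, and the two example filesystems are assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the big kernel lock: a nesting counter. */
static int mock_bkl_depth;

static void mock_lock_kernel(void)   { mock_bkl_depth++; }
static void mock_unlock_kernel(void) { mock_bkl_depth--; }

/* Simplified operations table with only the readdir method. */
struct mock_file_operations {
    int (*readdir)(int *lock_seen);
};

/* A filesystem that still relies on the lock now takes it itself. */
static int legacyfs_readdir(int *lock_seen)
{
    mock_lock_kernel();
    *lock_seen = mock_bkl_depth > 0; /* record: lock held during the work */
    mock_unlock_kernel();
    return 0;
}

/* A filesystem that needs no such lock simply does not take it. */
static int lockfreefs_readdir(int *lock_seen)
{
    *lock_seen = mock_bkl_depth > 0; /* record: lock not held */
    return 0;
}

/* The generic entry point no longer touches the lock at all. */
static int mock_vfs_readdir(const struct mock_file_operations *fops,
                            int *lock_seen)
{
    if (!fops->readdir)
        return -1;
    return fops->readdir(lock_seen);
}
```

Pushing the lock down this way keeps behaviour unchanged for filesystems
that still take it, while letting individual filesystems drop it later
without touching the generic code.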
Martin Dalecki [Tue, 30 Apr 2002 06:59:09 +0000 (23:59 -0700)]
[PATCH] 2.5.11 IDE 48
Tue Apr 30 13:23:13 CEST 2002 ide-clean-48
This fixes the "performance" degradation partially, because we don't
miss that many jiffies in choose_urgent_device() anymore. However
choose_urgent_device() still has an off-by-one error that needs fixing so
that it doesn't loop for a whole 1/100 second before submitting the next
request.
- Include small declaration bits for Jens. (WIN_NOP fix in esp.)
- Fix ide-pmac to conform to the recent API changes.
- Prepare and improve the handling of the request queue. It now sucks in as many
requests as possible. This improves the performance.
Martin Dalecki [Tue, 30 Apr 2002 06:59:01 +0000 (23:59 -0700)]
[PATCH] 2.5.11 IDE 47
- Rewrite choose_drive() to iterate explicitly over the channels and devices
on them. It is not performance critical to iterate over this typically quite
small array of disks and allows us to let them act on the natural entity,
namely the channel as well as to remove the drive->next field from struct
ata_device. Make the device eviction code in ide_do_request() more
intelligible. Add some comments explaining the reasoning behind the code
there.
- Now that the code for choosing the drive which will be serviced next is
finally intelligible, it became obvious that the attempt to choose the next
drive based on the duration of the last request was entirely bogus. (Because
for example wakeups can take a long time, but this doesn't indicate that the
drive is slow.) Remove this criterion and the corresponding accounting
therefore. Treat all drives fairly right now.
Surprise surprise the overall system throughput increased :-).
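One way to picture "iterate over the channels and devices and treat all
drives fairly" is a plain round-robin scan that resumes just after the
drive serviced last. The sketch below assumes that policy for
illustration. The actual selection logic in the patch is not shown in this
log, and the mock_* types and array sizes are hypothetical.

```c
#include <assert.h>

#define MAX_CHANNELS 2
#define MAX_DRIVES   2 /* drives per channel in this model */

/* Hypothetical, heavily simplified drive and channel records. */
struct mock_drive {
    int present; /* a device is attached */
    int pending; /* number of queued requests */
};

struct mock_channel {
    struct mock_drive drives[MAX_DRIVES];
};

/* Pick the next drive to service. 'last' is the flat index
 * (channel * MAX_DRIVES + unit) of the drive serviced most recently, or
 * -1 if none. Returns the flat index of the chosen drive, or -1 when no
 * present drive has pending work. No per-drive timing is consulted. */
static int mock_choose_drive(const struct mock_channel *channels, int last)
{
    const int total = MAX_CHANNELS * MAX_DRIVES;

    for (int step = 1; step <= total; step++) {
        int idx = (last + step) % total;
        const struct mock_drive *d =
            &channels[idx / MAX_DRIVES].drives[idx % MAX_DRIVES];

        if (d->present && d->pending > 0)
            return idx;
    }
    return -1;
}
```

Because the scan always starts after the last serviced drive, a busy drive
cannot be picked twice in a row while another drive has work queued.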
Martin Dalecki [Tue, 30 Apr 2002 06:57:32 +0000 (23:57 -0700)]
[PATCH] 2.5.11 IDE 46
- Remove the specific CONFIG_IDEDMA_PCI_WIP in favor of using the generic
CONFIG_EXPERIMENTAL tag. (Pointed out by Vojtech Pavlik).
- Change the signature of the IRQ handler to take the request directly as a
parameter. This doesn't blow the code up but makes it much more obvious and
finally it's reducing the number of side effects of the hwgroup->rq field.
- A second sharp look after the above change allowed us to remove the wrq field
from the hwgroup struct. It's just not used at all.
- Change the signature of the end_request member of struct ata_operations to
take the request as a second argument. Similar for __ide_end_request()
and ide_end_request().
- Remove BUG_ON() items just before ide_set_handler(). The check in
ide_set_handler is clever enough now.
- Remove the rq subfield from ide-scsi packet structure. We have now the
request context always in place. Same for floppy.
- Let the timer expiry function take the request as a direct argument.
Yes I know those changes are extensive. But they are a necessary step
in between for the following purposes:
- Consolidate the whole ATA/ATAPI stuff on passing a single unified request
handling object. Because after eliminating those side effects it's far easier
to see what's passed where.
- Minimizing the amount of side effects in the overall code. That's a good
thing anyway and it *doesn't* cost us either performance or space, since
the stack depths are small anyway here.
- Minimizing the usage of hwgroup - which should go away if possible.
Andrew Morton [Tue, 30 Apr 2002 06:54:18 +0000 (23:54 -0700)]
[PATCH] page writeback locking update
- Fixes a performance problem - callers of
prepare_write/commit_write, etc are locking pages, which synchronises
them behind writeback, which also locks these pages. Significant
slowdowns for some workloads.
- So pages are no longer locked while under writeout. Introduce a
new PG_writeback and associated infrastructure to support this design
change.
- Pages which are under read I/O still use PageLocked. Pages which
are under write I/O have PageWriteback() true.
I considered creating Page_IO instead of PageWriteback, and marking
both readin and writeout pages as PageIO(). So pages are unlocked
during both read and write. There just doesn't seem a need to do
this - nobody ever needs unblocking access to a page which is under
read I/O.
- Pages under swapout (brw_page) are PageLocked, not PageWriteback.
So their treatment is unchanged.
It's not obvious that pages which are under swapout actually need
the more asynchronous behaviour of PageWriteback.
I was setting the swapout pages PageWriteback and unlocking them
prior to submitting the buffers in brw_page(). This led to deadlocks
on the exit_mmap->zap_page_range->free_swap_and_cache path. These
functions call block_flushpage under spinlock. If the page is
unlocked but has locked buffers, block_flushpage->discard_buffer()
sleeps. Under spinlock. So that will need fixing if for some reason
we want swapout to use PageWriteback.
Kernel has called block_flushpage() under spinlock for a long time.
It is assuming that a locked page will never have locked buffers.
This appears to be true, but it's ugly.
- Adds new function wait_on_page_writeback(). Renames wait_on_page()
to wait_on_page_locked() to remind people that they need to call the
appropriate one.
- Renames filemap_fdatasync() to filemap_fdatawrite(). It's more
accurate - "sync" implies, if anything, writeout and wait. (fsync,
msync) Or writeout. It's not clear.
- Subtly changes the filemap_fdatawrite() internals - this function
used to do a lock_page() - it waited for any other user of the page
to let go before submitting new I/O against a page. It has been
changed to simply skip over any pages which are currently under
writeback.
This is the right thing to do for memory-cleansing reasons.
But it's the wrong thing to do for data consistency operations (eg,
fsync()). For those operations we must ensure that all data which
was dirty *at the time of the system call* are tight on disk before
the call returns.
So all places which care about this have been converted to do:
filemap_fdatawait(mapping); /* Wait for current writeback */
filemap_fdatawrite(mapping); /* Write all dirty pages */
filemap_fdatawait(mapping); /* Wait for I/O to complete */
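To see why the leading wait matters, consider a page that was redirtied
while its earlier writeout was still in flight. The following toy model is
a sketch under stated assumptions: the mock_* names, the two boolean page
flags, and "waiting completes the I/O" are all simplifications, not the
kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

/* Hypothetical page state: dirty data and/or write I/O in flight. */
struct mock_page {
    bool dirty;
    bool writeback;
};

struct mock_mapping {
    struct mock_page pages[NPAGES];
    int writes_completed;
};

/* Start I/O on dirty pages, skipping any page already under writeback,
 * as described for the new filemap_fdatawrite() behaviour. */
static void mock_fdatawrite(struct mock_mapping *m)
{
    for (int i = 0; i < NPAGES; i++) {
        struct mock_page *p = &m->pages[i];

        if (p->writeback)
            continue; /* skipped, even if it was redirtied */
        if (p->dirty) {
            p->dirty = false;
            p->writeback = true;
        }
    }
}

/* Wait for in-flight I/O. In this model the I/O finishes when waited on. */
static void mock_fdatawait(struct mock_mapping *m)
{
    for (int i = 0; i < NPAGES; i++) {
        if (m->pages[i].writeback) {
            m->pages[i].writeback = false;
            m->writes_completed++;
        }
    }
}

/* True when no page is dirty and no write is in flight. */
static bool mock_all_clean(const struct mock_mapping *m)
{
    for (int i = 0; i < NPAGES; i++)
        if (m->pages[i].dirty || m->pages[i].writeback)
            return false;
    return true;
}

/* The three-step sequence used for data consistency operations. */
static void mock_fsync(struct mock_mapping *m)
{
    mock_fdatawait(m);  /* wait for current writeback */
    mock_fdatawrite(m); /* write all dirty pages */
    mock_fdatawait(m);  /* wait for I/O to complete */
}
```

Without the first wait, the write pass skips the redirtied page because it
is still under writeback, and that page's newer data stays dirty after the
call returns.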
- Fixes a truncate_inode_pages problem - truncate currently will
block when it hits a locked page, so it ends up getting into lockstep
behind writeback and all of the file is pointlessly written back.
One fix for this is for truncate to simply walk the page list in the
opposite direction from writeback.
I chose to use a separate cleansing pass. It is more
CPU-intensive, but it is surer and clearer. This is because there is
no reason why the per-address_space ->vm_writeback and
->writeback_mapping functions *have* to perform writeout in
->dirty_pages order. They may choose to do something totally
different.
(set_page_dirty() is an a_op now, so address_spaces could almost
privatise the whole dirty-page handling thing. Except
truncate_inode_pages and invalidate_inode_pages assume that the pages
are on the address_space lists. hmm. So making truncate_inode_pages
and invalidate_inode_pages a_ops would make some sense).
Andrew Morton [Tue, 30 Apr 2002 06:53:51 +0000 (23:53 -0700)]
[PATCH] cleanup of bh->flags
Moves all buffer_head-related stuff out of linux/fs.h and into
linux/buffer_head.h. buffer_head.h is currently included at the very
end of fs.h. So it is possible to include buffer_head.h directly from
all .c files and remove this nested include.
Also rationalises all the set_buffer_foo() and mark_buffer_bar()
functions. We have:
BUFFER_FNS() and TAS_BUFFER_FNS() macros generate all the above real
inline functions. Normally not a big fan of cpp abuse, but in this
case it fits. These function-generating macros are available to
filesystems to expand their own b_state functions. JBD uses this in
one case.
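As a rough illustration of function-generating macros of this kind, the
sketch below expands one macro invocation into set/clear/test inline
functions for a state bit. It is a user-space approximation: the struct is
cut down to one field, the bit operations are plain non-atomic C rather
than the kernel's bitops, and the exact macro bodies in buffer_head.h may
differ.

```c
#include <assert.h>

/* Simplified stand-in for struct buffer_head: just the state word. */
struct buffer_head {
    unsigned long b_state;
};

/* Bit numbers within b_state (a small illustrative subset). */
enum bh_state_bits {
    BH_Uptodate,
    BH_Dirty
};

/* Generate set_buffer_foo(), clear_buffer_foo() and buffer_foo(). */
#define BUFFER_FNS(bit, name)                                        \
static inline void set_buffer_##name(struct buffer_head *bh)         \
{                                                                    \
    bh->b_state |= 1UL << BH_##bit;                                  \
}                                                                    \
static inline void clear_buffer_##name(struct buffer_head *bh)       \
{                                                                    \
    bh->b_state &= ~(1UL << BH_##bit);                               \
}                                                                    \
static inline int buffer_##name(const struct buffer_head *bh)        \
{                                                                    \
    return (int)((bh->b_state >> BH_##bit) & 1UL);                   \
}

/* Generate test-and-set / test-and-clear variants that return the old
 * value of the bit. Not atomic in this sketch. */
#define TAS_BUFFER_FNS(bit, name)                                    \
static inline int test_set_buffer_##name(struct buffer_head *bh)     \
{                                                                    \
    int old = buffer_##name(bh);                                     \
    set_buffer_##name(bh);                                           \
    return old;                                                      \
}                                                                    \
static inline int test_clear_buffer_##name(struct buffer_head *bh)   \
{                                                                    \
    int old = buffer_##name(bh);                                     \
    clear_buffer_##name(bh);                                         \
    return old;                                                      \
}

BUFFER_FNS(Uptodate, uptodate)
BUFFER_FNS(Dirty, dirty)
TAS_BUFFER_FNS(Dirty, dirty)
```

A filesystem with a private state bit could invoke the same macros with
its own bit name to get matching accessors, which is the kind of reuse the
text mentions for JBD.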
Andrew Morton [Tue, 30 Apr 2002 06:53:41 +0000 (23:53 -0700)]
[PATCH] remove show_buffers()
Remove show_buffers(). It really has nothing to show any more. Just
buffermem_pages() - move that out into the callers.
There's a lot of duplication in this code. A better approach would be to
remove all the duplicated code out in the architectures and implement
a generic show_memory_state(). Later.
Andrew Morton [Tue, 30 Apr 2002 06:53:20 +0000 (23:53 -0700)]
[PATCH] remove i_dirty_data_buffers
Removes inode.i_dirty_data_buffers. It's no longer used - all dirty
buffers have their pages marked dirty and filemap_fdatasync() /
filemap_fdatawait() catches it all.
Updates all callers.
This required a change in JFS - it has "metapages" which
are a container around a page which holds metadata. They
were holding these pages locked and were relying on fsync_inode_data_buffers
for writing them out. So fdatasync() deadlocked.
I've changed JFS to not lock those pages. Change was acked
by Dave Kleikamp <shaggy@austin.ibm.com> as the right
thing to do, but may not be complete. Probably igrab()
against ->host is needed to pin the address_space down.
Andrew Morton [Tue, 30 Apr 2002 06:52:37 +0000 (23:52 -0700)]
[PATCH] cleanup page flags
page->flags cleanup.
Moves the definitions of the page->flags bits and all the PageFoo
macros into linux/page-flags.h. That file is currently included from
mm.h, but the stage is set to remove that and include page-flags.h
directly in all .c files which require that. (120 of them).
The patch also makes all the page flag macros and functions consistent: