For some cases, we cannot decide if a target would change just by looking
at its prerequisites, i.e. it's quite likely that it remains the same
even though a prerequisite changed. The updated timestamp would cause
a lot of unnecessary recompiles. In this case, we actually generate
a temporary file, compare it to the old file, and only if the contents
are different do we overwrite the old file.
The "update-if-changed" snippet remains always the same, so let's
put it into a macro instead of duplicating it. After that change,
scripts/mkversion_h is so small that we rather put the three remaining
lines directly into the Makefile.
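As a rough illustration of the update-if-changed idea (the real snippet is a
shell macro in the Makefile, not C), the compare-then-replace logic might look
like the sketch below. The names same_contents() and update_if_changed() are
made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if both files can be opened and have identical contents, else 0. */
static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int same = (fa != NULL && fb != NULL);

    while (same) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);

        if (ca != cb)
            same = 0;       /* differing byte, or one file is shorter */
        else if (ca == EOF)
            break;          /* both ended at the same point */
    }
    if (fa)
        fclose(fa);
    if (fb)
        fclose(fb);
    return same;
}

/*
 * Replace target with tmp only if the contents differ, so that the
 * timestamp of target stays untouched when nothing really changed.
 * Returns 1 if target was replaced, 0 if it was kept, -1 on error.
 */
static int update_if_changed(const char *tmp, const char *target)
{
    assert(tmp != NULL && target != NULL);

    if (same_contents(tmp, target))
        return remove(tmp) == 0 ? 0 : -1;

    remove(target);         /* may fail if target does not exist yet */
    return rename(tmp, target) == 0 ? 1 : -1;
}
```

Because the target is left alone when the generated contents match, anything
depending on it is not rebuilt.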
So here we are: make vmlinux/bzImage/whatever will now also build
modules as it goes. Other than that, everything works as usual.
"make modules" builds only the modules (but you shouldn't need
it anymore). If you don't want the modules built, you can
do "make KBUILD_MODULES= vmlinux/whatever" to only compile built-in
objects.
If people want it, I can also allow for "make vmlinux/whatever nomodules"
to do the same.
Also, add ' ' in Rules.make to properly align output in quiet mode.
If an object was changed to not export symbols anymore, the
corresponding stale .ver file would have been left lying around
and been picked up when generating modversions.h.
The obvious solution to remove include/linux/modules/* at the
beginning of "make dep" is not really good, since that means
that .ver files would be regenerated unconditionally, thus causing
a lot of possibly unnecessary rebuilds.
So, instead, we build a temporary shadow tree of all export-objs
(as empty files) during the recursive "make fastdep" phase, and use
that to generate modversions.h.
Ensure that we touch include/linux/modversions.h if any of the
.ver files changes, that's our marker to rebuild all modversions
affected files.
Rules.make now has three targets:
o default (a.k.a first_rule): The actual build. Whether to build
built-in or modular or both is decided by
$(KBUILD_MODULES) and $(KBUILD_BUILTIN) now, instead of using
different targets
o fastdep: doesn't actually generate dependencies anymore, only generates
modversions
o modules_install: Well, you can guess what that does.
Cleaned up descending, and no more differentiating between
$(subdir-y) and $(subdir-m). That means $(mod-subdirs) can
go away now.
kbuild: Use deep directory structure for include/linux/modules
We used to force the obvious deep structure of all objects which
export symbols into a flat list in include/linux/modules. This
initially caused the restriction that no two exporting objects could
have the same name (Ever wondered why there's ksyms.c and
i386_ksyms.c?)
With the ALSA merge this restriction was mostly lifted by some hack,
but some cases still don't work right (Hi XFS). As it's much cleaner
to just use a normal tree under include/linux/modules, reflecting the
source tree, we now do just that.
Making dependencies once up front is not ideal. For one, you don't need
them initially, since when you don't have the .o file, you bet you have
to build it no matter what the dependencies say - dependencies are about
deciding when to *re*build.
There's more reasons, like:
o you don't even know which files you'll build, so you have to go over
all files (even over drivers/{sbus,s390,...} on i386)
o generated files don't exist yet, so you cannot pick up dependencies
on them
o even if dependencies are right initially, they change when you work on
your tree or patch it, and nobody will notice unless you run "make dep"
explicitly again
Anyway, gcc knows how to emit a correct dependency list, so we just use
that. Well, a little bit of hacking is necessary to remove the dependency
on autoconf.h and put in individual CONFIG_WHAT_EVER dependencies instead,
since otherwise changing one config option would cause everything to be
rebuilt.
I should add that I didn't come up with this all by myself, most work
is actually done in gcc and there were discussions about using -MD on
kbuild-devel way back, so I should mention Keith Owens and Michael
Elizabeth Chastain, and probably others that I forgot, so I apologize
just in case.
We source some scripts, but still pass parameters to them, e.g.
. mk_version_h $@ $(KERNELRELEASE) $(VERSION) ...
This does not work for all kinds of /bin/sh (it does for bash, that's
why I did not notice).
The fix is easy: Just mark the scripts executable and call instead of
source them.
Unfortunately, patch(1) doesn't understand about propagating chmod.
bk does, so changing the tree isn't hard, and we introduce an explicit
chmod a+x executed during the build for propagating this change into
those trees which get "traditionally" patched up.
Just use 'make some/dir/foo.lst' to produce mixed source code and
assembly for debugging. (If the object gets linked in and you have
a System.map, it'll relocate appropriately)
Apart from the needed Makefile bits, also clean up the script
"makelst".
kbuild: Split Makefile into needs / needs not .config
The current top-level Makefile has a fundamental problem which
makes "make oldconfig vmlinux" impossible:
It includes .config, which is changed by "oldconfig". So after "oldconfig"
.config has changed and the .config the Makefile had read is obsolete.
make provides a mechanism to cope with this, it'll restart automatically
if any of the files it included changed, if you let it know that you
changed it, just using a normal rule which has .config as its target.
However, once you tell make that "make oldconfig" changes .config, you
have another problem: oldconfig always causes .config to be remade, there's
no mechanism to tell if it's up to date. So make notices that .config
has changed, restarts, makes oldconfig again, notices that .config has
changed, restarts, ... you get the picture.
The way to solve this is to do a proper two-stage approach: If you just
say "make oldconfig", there's no need for the Makefile to even read the
.config. If it does not, it won't restart and recurse infinitely.
So we divide the Makefile into two sections: One for targets which don't
need the variables from .config, like *config, clean, mrproper and
one section which does the actual build, which needs to know the
CONFIG_ options.
If one of the "noconfig" targets is given, we handle those, without
reading .config. From there, we call make again, filtering out the already
handled targets, to do the main work.
The fact that this actually works correctly can be seen by trying
"make vmlinux oldconfig" which will execute things in the right
order - and this is not just nitpicking, it means that "-j" will
get this case right, too.
The $(CONFIGURATION) hack used to start "make config" automatically
can go away now, too. Since we don't know which of make *config the
user prefers, we'll just ask him to call "make whatever-config" himself,
instead of forcing "make config" on him.
Tom Rini [Wed, 5 Jun 2002 02:42:57 +0000 (19:42 -0700)]
[PATCH] Cleanup i386 <linux/init.h> abuses
The following patch cleans up the i386 usage of <linux/init.h>.
This removes <linux/init.h> from <asm-i386/system.h>, which did not need
it, and from <asm-i386/highmem.h>, which only had it due to an extern using
__init, which is not needed.
This adds <linux/init.h> to <asm-i386/bugs.h> which actually has
numerous __init functions and adds <linux/init.h> to 9 files inside of
arch/i386 which were indirectly including <linux/init.h> previously.
We do not need to pass the file descriptor to the fcntl_[sg]etlk
functions, only the struct file * which we have already got
from the file descriptor and verified.
Martin Dalecki [Wed, 5 Jun 2002 02:39:20 +0000 (19:39 -0700)]
[PATCH] 2.5.20 IDE 85
- Work a bit on the automatic CRC error recovery handling. System still hangs.
But one thing for sure - we don't have to use any specialized irq handler for
it.
- Since ioctls no longer submit requests to the queue without
rq->special set, we can safely remove args_error handling from
start_request.
- Make REQ_SPECIAL usage in ide-floppy obvious.
- Use REQ_SPECIAL everywhere instead of REQ_DRIVE_ACB. This is actually a bit
dangerous but should work as far as I can see.
- Unfold the now not REQ_SPECIAL specific dequeuing part of ide_end_drive_cmd().
Turns out that we can thereafter remove the calls to ide_end_drive_cmd() at
places where the request type isn't REQ_SPECIAL any longer. (tcq.c)
- After the above operation it turns out that there are just three places where
ide_end_drive_cmd remains, namely: ata_error, task_no_data_intr and
drive_cmd_intr.
We can avoid it by changing the logic in ata_error slightly.
So now just two cases remain where ide_end_drive_cmd is still used:
drive_cmd_intr and task_no_data_intr.
- Now looking a bit closer we can realize that drive_cmd_intr and
task_no_data_intr can be easily merged together, since the usage of
task_no_data_intr implied taskfile.sector_number == 0.
- Use one single ata_special_intr function for the handling of interrupts
submitted through ide_raw_cmd.
- Move the remaining artefacts of ide_end_drive_cmd directly to
ata_special_intr. Oh we don't need to check for REQ_SPECIAL any longer there,
since the context is already known.
- Set the ata_special_intr handler and command type directly inside
ide_raw_taskfile to save code.
Russell King [Wed, 5 Jun 2002 01:19:28 +0000 (02:19 +0100)]
[ARM] Clean up map_desc structure
- remove LAST_DESC in favour of passing the array size to iotable_init
- replace domain and permission bits/cache attributes with a generic
"type", which can be easily converted to the required domain and
permission bits/cache attributes at run time. It also removes the
possibility for getting such things wrong and (accidentally)
allowing all user space to fiddle with devices directly.
Robert Love [Tue, 4 Jun 2002 07:31:55 +0000 (00:31 -0700)]
[PATCH] remove suser()
Attached patch replaces the lone remaining suser() call with capable()
and then removes suser() itself in a triumphant celebration of the glory
of capable(). Or something. ;-)
Small cleanup of capable() and some comments, too.
Martin Dalecki [Tue, 4 Jun 2002 02:34:23 +0000 (19:34 -0700)]
[PATCH] 2.5.20 IDE 84
- Simplify ide_cmd_type_parse by removing the handling of commands which we
never use.
- Realize that pre_task_out_intr and pre_task_mulout_intr are semantically
identical. Use only pre_task_out_intr(). This allowed us to
eliminate the prehandler altogether.
- Updated fix for misconfigured host chips by Vojtech Pavlik.
- Be more permissive about ioctl handling to allow device type drivers to do
their own checks.
- ali14xx cleanups by Andrej Panin.
- Unfold the usage of ide_cmd_type_parser in tcq.c code. This makes this operation
local to ide-disk.c. Move it as well as the interrupt handlers used only for
the handling of disk requests there too.
- Guard against calling handler before the drive is ready for it in
ata_taskfile()! Well this bug was there before, but right now we inform
about it.
- Unfold ide_cmd_type_parser in ide-disk.c. Merge the remaining bits of it with
get_command. Well it's no more.
- Move recal_intr to ide.c - the only place where it's used.
This doesn't change the "mechanics" of the code but it makes it a lot more
"obvious" what's going on.
Russell King [Tue, 4 Jun 2002 02:28:48 +0000 (19:28 -0700)]
[PATCH] fix 2.5.20 ramdisk
2.5.20 seems to be incapable of executing binaries in a ramdisk-based
root filesystem. The ramdisk in question is an ext2fs, with a 1K
block size loaded via the compressed ramdisk loader in do_mounts().
It appears that, in the case of a 1K block sized filesystem, we attempt
to read two 512-byte sectors into a BIO vector. The first one is copied
into the first 512 bytes. The second sector, however, is copied over
the first 512 bytes. Obviously not what we really want.
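To make the failure mode concrete, here is a minimal sketch of the intended
behaviour, assuming a block is assembled from consecutive 512-byte sectors.
copy_sectors() and SECTOR_SIZE are illustrative names, not the actual ramdisk
code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/*
 * Copy nr_sectors consecutive sectors from src into the block buffer dst.
 * Each sector must land at its own offset; using offset 0 for every
 * sector reproduces the symptom described above, where the second
 * sector overwrites the first 512 bytes.
 */
static void copy_sectors(unsigned char *dst, const unsigned char *src,
                         size_t nr_sectors)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    for (i = 0; i < nr_sectors; i++)
        memcpy(dst + i * SECTOR_SIZE, src + i * SECTOR_SIZE, SECTOR_SIZE);
}
```

With a 1K block this means two copies, to offsets 0 and 512.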
Patrick Mochel [Mon, 3 Jun 2002 12:53:50 +0000 (05:53 -0700)]
PCI driver mgmt:
- Make sure proper pci id is passed to probe()
- make sure pci_dev->driver is set and reset on driver registration/unregistration
- call remove_driver to force unload of driver on unregistration
Patrick Mochel [Mon, 3 Jun 2002 12:51:12 +0000 (05:51 -0700)]
Do manual traversing of drivers' devices list when unbinding the driver.
driver_unbind was called when drv->refcount == 0.
It would call driver_for_each_dev to do the unbinding
The first thing that would do was get_device, which...
BUG()'d if drv->refcount == 0.
Duh.
Patrick Mochel [Mon, 3 Jun 2002 12:48:44 +0000 (05:48 -0700)]
device model update:
- make sure drv->devices is initialized on registration (from Peter Osterlund)
- add remove_driver for forcing removal of driver
There was a potential race with the module unload code. When a pci driver was unloaded, it would call pci_unregister_driver, which would simply call put_driver.
If the driver's refcount wasn't 0, it wouldn't unbind it from devices, but the module unload would still continue.
If something tried to access the driver later (since everyone thinks it's still there), Bad Things would happen.
This fixes it until there can be tighter integration between the device model and module unload code.
Pavel Machek [Mon, 3 Jun 2002 06:25:54 +0000 (23:25 -0700)]
[PATCH] Cleanup swsusp in 2.5.20
This cleans up swsusp in 2.5.20. Killed sysrq-D support (it is too
much trouble to support suspending from interrupt), kill unused
define, fix compile-time warnings (thanks to Adam).
Pavel Machek [Mon, 3 Jun 2002 06:25:44 +0000 (23:25 -0700)]
[PATCH] Fix suspend-to-RAM in 2.5.20
I created arch/i386/suspend.c not to clash with ACPI people so much in
future. (More stuff is going to move into it in the future, to clean
up functions that really do not belong to the headers.)
Zwane Mwaikambo [Mon, 3 Jun 2002 05:55:16 +0000 (22:55 -0700)]
[PATCH] bluesmoke merge
This patch merges in all the currently outstanding bluesmoke bits from
2.5-dj to 2.5.20, it also has the pleasant side effect of fixing the
compilation. Test compiled with and without MCE.
Dan Kegel [Mon, 3 Jun 2002 05:35:05 +0000 (22:35 -0700)]
[PATCH] must be __KERNEL__ for byteorder/generic.h
Here's that patch again (MIME this time, so tabs don't get
lost by my silly gui mailer); applies cleanly against 2.4.19-pre8.
Nobody commented on it last time I posted it, and it does
make compiling gcc easier, so I guess that makes it trivial patch
monkey fodder. Or am I making a silly mistake?
Randy Hron [Mon, 3 Jun 2002 05:34:56 +0000 (22:34 -0700)]
[PATCH] remove space in cache names
Most /proc/slabinfo cache_names are in the format:
cache_name. There are a couple with spaces in the
name, which is inconsistent and requires a special case
when scripting.
Changes "fasync cache" and "file lock cache" to have
the usual underscore.
inactive_list and active_list are global, yet they are repeatedly
initialized using INIT_LIST_HEAD() in free_area_init_core(). This
patch is originally due to Christoph Hellwig, and by some reports
has been implemented before in 2.4-based trees by Andrea Arcangeli.
The memlist_* macros serve as nothing but an insulation layer from the
Linux-native generic list operations. This patch removes them in favor
of using generic list operations directly.
This comment, describing how to optimize for gcc-2.2.2, is so outdated
it should be removed. It's also quite doubtful it should ever have been
placed in this file at all (perhaps something under Documentation/ ?).
This patch removes it.
The comment describing the usage of zone_table[] assumes the existence
of an unsigned char page->zone field from the original implementation
of page->zone size reduction. This patch corrects the comment to
accurately describe the lookup mechanism used by page_zone() and also
to mention explicitly the sole user of the table, page_zone().
Kazuto Miyoshi [Mon, 3 Jun 2002 05:33:06 +0000 (22:33 -0700)]
[PATCH] fix for /proc operation:
I found that 'max' pointer is not updated in proc_dointvec_minmax()
and proc_doulongvec_minmax(), when I write smaller values than min to
/proc/sys entry (and val<*min++ check becomes true.)
This may lead to min/max checking of values with bogus maximum.
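The problem pattern is a post-increment hidden inside a short-circuited
condition: when the min check already succeeds, the max side is never
evaluated and its pointer falls behind. The sketch below shows one way to keep
both pointers in step. It is not the kernel's code, and store_in_range() is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Store vals[i] into out[i] only if min[i] <= vals[i] <= max[i].
 * Both limit pointers are advanced in the loop header, so a value
 * rejected by the min check cannot leave max pointing at a stale slot.
 * Returns the number of values stored.
 */
static size_t store_in_range(const int *vals, size_t n,
                             const int *min, const int *max, int *out)
{
    size_t i, stored = 0;

    assert(vals != NULL && min != NULL && max != NULL && out != NULL);

    for (i = 0; i < n; i++, min++, max++) {
        if (vals[i] < *min || vals[i] > *max)
            continue;       /* out of range: leave out[i] untouched */
        out[i] = vals[i];
        stored++;
    }
    return stored;
}
```

With limits {1..10} and {1..100}, writing 0 and then 50 rejects the first
value but still checks the second one against 100, not against the stale 10.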
Johan Adolfsson [Mon, 3 Jun 2002 05:32:25 +0000 (22:32 -0700)]
[PATCH] Missing include in mm/bootmem.c
Missing include of asm/io.h in mm/bootmem.c:
Submitted this trivial patch on May 14, but nothing has happened yet.
Perhaps better chance if you took care of it.
It's needed by phys_to_virt() but it happens to work on i386 etc.
since dma.h includes io.h for that arch.
Trond Myklebust [Mon, 3 Jun 2002 05:12:03 +0000 (22:12 -0700)]
[PATCH] Fix Oops due to use of incorrect km_type in RPC socket code...
The following has been vetted with davem w.r.t. the change to
KM_SKB_DATA. Apologies for the bug...
include/asm-*/kmap_types.h:
Replace the unused km_type slot KM_SKB_DATA with
KM_SKB_SUNRPC_DATA.
net/sunrpc/xdr.c:
Replace the use of KM_USER0 with KM_SKB_SUNRPC_DATA for copying
data from an skb into the page cache when in the sk->data_ready()
callback.
oom-loop fixes error handling after a netlink failure: previously no
cleanup was done, which made every subsequent call to ip_fw_check detect a
loop and drop the packet.
Robert Love [Mon, 3 Jun 2002 03:56:04 +0000 (20:56 -0700)]
[PATCH] sys_sysinfo cleanup
Looks like sys_sysinfo has not been touched in years. Among other
things, it uses a global cli() for protection; I switched it to an
existing rwlock. I also pulled it out of info.c and stuck it in timer.c
(I chose timer.c because it shares dependencies there already).
The details:
- move sys_sysinfo to kernel/timer.c from kernel/info.c:
why one small syscall got its own file is beyond me.
- delete kernel/info.c
- stop the global cli! now grab a read_lock on xtime_lock.
this is safe as we moved the write_unlock on xtime_lock
down one line to cover the calculating of avenrun.
Robert Love [Mon, 3 Jun 2002 03:55:55 +0000 (20:55 -0700)]
[PATCH] make smp.c preempt-safe
The attached patch cleans up some per-CPU code in arch/i386/kernel/smp.c
that could be problematic under preemption.
The first I solve with the new get_cpu interface, for the second two I
explicitly disable preemption. I also changed 1 to 1UL in the shift to
properly match the type.
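The 1 to 1UL change matters because a plain 1 is an int, so the shift would be
carried out in int width before the result is widened to the unsigned long
mask. A minimal sketch, with cpu_bit() as a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>

/* Build a one-bit mask for the given CPU number in unsigned long width. */
static unsigned long cpu_bit(unsigned int cpu)
{
    /* Shifting by the full width or more would be undefined behaviour. */
    assert(cpu < sizeof(unsigned long) * CHAR_BIT);

    return 1UL << cpu;
}
```

Using 1UL keeps the shift well defined for every bit position of an unsigned
long, including the topmost one.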
Robert Love [Mon, 3 Jun 2002 03:55:35 +0000 (20:55 -0700)]
[PATCH] trivial misc. scheduler cleanups
Resend of trivial bits from my scheduler tree...:
- shift cpu by 1UL not 1 to match type
- clarify various comments
- remove the barrier from preempt_schedule. This was here
because I used to check need_resched before returning from
preempt_schedule but we do not now (although should). The
barrier ensured need_resched and preempt_count were in sync
now and after an interrupt that could occur.
Robert Love [Mon, 3 Jun 2002 03:55:27 +0000 (20:55 -0700)]
[PATCH] capability.c cleanup
I started looking into a couple FIXMEs in kernel/capability.c and I
ended up with a fairly largish patch (although not quite so many changes
to object code).
First, it is unsafe to touch task->cap_* while not holding
task_capability_lock. The most notable occurrence of this is sys_access
which saves the current cap_* values, changes them, does its business,
then restores them. In between all this they can change and then be
restored to old values. Unfortunately we cannot just grab the lock here
since the function can sleep - I marked this with a FIXME for now.
Second, I formalized the locking rules with task_capability_lock. I
declared the lock in include/linux/capability.h so other code can grab
it.
Finally, there is a whole boatload of code cleanup:
- remove conditional locking/unlocking - that is just gross
- don't pointlessly grab the read_lock twice
- add/remove/edit comments
- change some types (int -> pid_t, etc)
- static inline two small functions that are called only
once each
- remove two FIXMEs
- general code cleanup for readability and performance
TODO:
- fix sys_access and other cap_* accesses
- do something about the annoying oddball 5-space indentation
in kernel/capability.c !!
Robert Love [Mon, 3 Jun 2002 03:55:19 +0000 (20:55 -0700)]
[PATCH] remove wq_lock_t cruft
This patch removes the whole wq_lock_t abstraction, forcing the behavior
to be that of a standard spinlock and changes all the wq_lock code in
the tree appropriately.
Removes lots of code - always a Good Thing to me. New behavior is same
as previous behavior (USE_RW_WAIT_QUEUE_SPINLOCK unset).
Martin Dalecki [Mon, 3 Jun 2002 03:52:31 +0000 (20:52 -0700)]
[PATCH] 2.5.20 IDE 83
- Remove last parameter from ide_dump_status. This information is now
permanently present in device->status field, so there is no need to pass it
around.
- Patch for DVD read through ide-scsi. There is the possibility that we can get
request structures passed down, which don't have the queue field set.
At least on the BIO code path this seems to be something worth further
investigation. Found by Adam J. Richter. (Jens?)
- Revert my change to the hostdata handling. I did get it wrong about the way
host structures are allocated by the generic SCSI layer. It plays
tricks there.
- piix driver updates by Vojtech Pavlik.
- We have a ata_out_regfile, so we should have ata_in_regfile too.
Linus Torvalds [Mon, 3 Jun 2002 03:17:35 +0000 (20:17 -0700)]
Split up "iput()" and make it more readable.
Add "drop_inode" VFS interface to make FS operations cleaner
and race-free. Remove old force_delete interface, and update
filesystems that used it to use the new infrastructure.
Martin Dalecki [Sun, 2 Jun 2002 10:28:08 +0000 (03:28 -0700)]
[PATCH] 2.5.19 blk.h and more about the ugly kids.
- Remove DEVICE_INTR and associated code from floppy driver.
- Salvage s390 xpram code from kernel version dependent compilation disease.
- Eliminate SET_INTR code from the places where it was used.
- Eliminate bogus support for multiple sbpcd controllers. The driver didn't
even compile right now; before we could think about further supporting it at
all we have to get rid of this hack first.
Don't call invalidate_buffers in the release method there.
Why should it be necessary?
- Resurrect sonycd535 compilation.
- Let CURRENT request macro use the same primitive as the remaining QUEUE macro
in blk.h, which is still not quite right, but first things first :-).
Andrew Morton [Sun, 2 Jun 2002 10:24:28 +0000 (03:24 -0700)]
[PATCH] rename flushpage to invalidatepage
Fixes a pet peeve: the identifier "flushpage" implies "flush the page
to disk". Which is very much not what the flushpage functions actually
do.
The patch renames block_flushpage and the flushpage
address_space_operation to "invalidatepage".
It also fixes a buglet in invalidate_this_page2(), which was calling
block_flushpage() directly - it needs to call do_flushpage() (now
do_invalidatepage()) so that the filesystem's ->flushpage (now
->invalidatepage) a_op gets a chance to relinquish any interest which
it has in the page's buffers.
Andrew Morton [Sun, 2 Jun 2002 10:24:15 +0000 (03:24 -0700)]
[PATCH] tmpfs bugfixes
A patch from Hugh Dickins which fixes a couple of error-path leaks
related to tmpfs (I think).
Also fixes a yield()-inside-spinlock bug.
It also includes code to clear the final page outside i_size on
truncate. tmpfs should be returning zeroes when a truncated file is
later expanded and it currently is not.
Andrew Morton [Sun, 2 Jun 2002 10:24:04 +0000 (03:24 -0700)]
[PATCH] put in-memory filesystem dirty pages on the correct list
Replaces SetPageDirty() with set_page_dirty() in several places related
to in-memory filesystems.
SetPageDirty() is basically always the wrong thing to do. Pages should
be moved to the ->dirty_pages list when dirtied so that writeback can
see them.
Without this change, dirty pages against in-memory filesystems would
churn around on the inactive list all the time, rather than getting
pushed away onto the active list. A minor efficiency thing.
The patch also removes all the block_flushpage() calls from the swap
code in favour of a direct call to try_to_free_buffers().
The theory is that the page is locked, there is no I/O underway, nobody
else has access to the buffers so they MUST be freeable. A bunch of
BUG() checks have been added, and unless someone manages to trigger
one, the "block_flushpage() inside spinlock" problem is fixed.
Andrew Morton [Sun, 2 Jun 2002 10:22:42 +0000 (03:22 -0700)]
[PATCH] buffer_boundary() for ext3
Implement buffer_boundary() for ext3.
buffer_boundary() is an I/O scheduling hint which the filesystem's
get_block() function passes up to the BIO assembly code. It is
described in fs/mpage.c
The time to read 1,000 52 kbyte files goes from 8.6 seconds down to 2.9
seconds. 52 kbytes is the worst-case size.
Andrew Morton [Sun, 2 Jun 2002 10:22:29 +0000 (03:22 -0700)]
[PATCH] speed up writes
Speeds up generic_file_write() by not calling mark_inode_dirty() when
the mtime and ctime didn't change.
There may be concerns over the fact that this restricts mtime and ctime
updates to one-second resolution. But the interface doesn't support
that anyway - all the filesystem knows is that its dirty_inode()
superop was called. It doesn't know why.
So filesystems which support high-resolution timestamps already need to
make their own arrangements. We need an update_mtime i_op to support
those properly.
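The idea can be sketched in a few lines, assuming one-second timestamps. The
structure and function names below are invented for illustration, and the
dirty_calls counter merely stands in for a call to mark_inode_dirty().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the few inode fields that matter here. */
struct inode_sketch {
    time_t mtime;
    time_t ctime;
    int dirty_calls;    /* counts would-be mark_inode_dirty() calls */
};

/* Update the timestamps and dirty the inode only if they really change. */
static void update_write_times(struct inode_sketch *inode, time_t now)
{
    assert(inode != NULL);

    if (inode->mtime == now && inode->ctime == now)
        return;         /* same second as last write: nothing to do */

    inode->mtime = now;
    inode->ctime = now;
    inode->dirty_calls++;
}
```

Many small writes within the same second then dirty the inode only once.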
time to write a one megabyte file one-byte-at-a-time:
Andrew Morton [Sun, 2 Jun 2002 10:22:17 +0000 (03:22 -0700)]
[PATCH] fix swapcache packing in the radix tree
First some terminology: this patch introduces a kernel-wide `pgoff_t'
type. It is the index of a page into the pagecache. The thing at
page->index. For most mappings it is also the offset of the page into
that mapping. This type has a very distinct function in the kernel and
it needs a name. I don't have any particular plans to go and migrate
everything so we can support 64-bit pagecache indices on x86, but this
would be the way to do it.
This patch improves the packing density of swapcache pages in the radix
tree.
A swapcache page is identified by the `swap type' (indexes the swap
device) and the `offset' (into that swap device). These two numbers
are encoded into a `swp_entry_t' machine word in arch-specific code
because the resulting number is placed into pagetables in a form which
will generate a fault.
The kernel also needs to generate a pgoff_t for that page to index it
into the swapper_space radix tree. That pgoff_t is usually
bitwise-identical to the swp_entry_t. That worked OK when the
pagecache was using a hash. But with a radix tree, it produces
catastrophically bad results.
x86 (and many other architectures) place the `type' field into the
low-order bits of the swp_entry_t. So *all* swapcache pages are
basically identical in the eight low-order bits. This produces a very
sparse radix tree for swapcache. I'm observing packing densities of 1%
to 2%: so the typical 128-slot radix tree node has only one or two
pages in it.
The end result is that the kernel needs to allocate approximately one
new radix-tree node for each page which is added to the swapcache. So
no wonder we're having radix-tree node exhaustion during swapout!
(It's actually quite encouraging that the kernel works as well as it
does).
The patch changes the encoding of the swp_entry_t so that its
most-significant bits contain the `type' field and the
least-significant bits contain the `offset' field, right-aligned.
That is: the encoding in swp_entry_t is now arch-independent. The new
file <linux/swapops.h> has conversion functions which convert the
swp_entry_t to and from its machine pte representation.
Packing density in the swapper_space mapping goes up to around 90%
(observed) and the kernel is tons happier under swap load.
An alternative approach would be to create new conversion functions
which convert an arch-specific swp_entry_t to and from a pgoff_t. I
tried that. It worked, but I liked it less.
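The arch-independent layout described above (type in the most-significant
bits, offset right-aligned in the least-significant bits) can be sketched as
follows. The 5-bit type width and all names here are assumptions made for the
example, not the definitions in <linux/swapops.h>.

```c
#include <assert.h>
#include <limits.h>

/* Assumed width of the type field; the real value may differ. */
#define SK_TYPE_BITS   5
#define SK_OFFSET_BITS (sizeof(unsigned long) * CHAR_BIT - SK_TYPE_BITS)
#define SK_OFFSET_MASK ((1UL << SK_OFFSET_BITS) - 1)

typedef struct { unsigned long val; } swp_entry_sketch_t;

/* Pack type into the top bits and offset into the low bits. */
static swp_entry_sketch_t sk_entry(unsigned long type, unsigned long offset)
{
    swp_entry_sketch_t e;

    assert(type < (1UL << SK_TYPE_BITS));
    assert(offset <= SK_OFFSET_MASK);

    e.val = (type << SK_OFFSET_BITS) | offset;
    return e;
}

static unsigned long sk_type(swp_entry_sketch_t e)
{
    return e.val >> SK_OFFSET_BITS;
}

static unsigned long sk_offset(swp_entry_sketch_t e)
{
    return e.val & SK_OFFSET_MASK;
}
```

With this layout, neighbouring offsets on one swap device map to neighbouring
index values, which is what lets them share radix tree nodes.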
Andrew Morton [Sun, 2 Jun 2002 10:21:53 +0000 (03:21 -0700)]
[PATCH] list_head debugging
A common and very subtle bug is to use list_heads which aren't on any
lists. It causes kernel memory corruption which is observed long after
the offending code has executed.
The patch nulls out the dangling pointers so we get a nice oops at the
site of the buggy code.
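A minimal sketch of the technique on a simplified doubly linked list, assuming
the pointers are cleared at removal time. The names are made up and this is not
the kernel's list.h.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head that points at itself. */
static void list_node_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_node_add(struct list_node *entry, struct list_node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Unlink entry and clear its pointers. A later list operation on the
 * removed entry then dereferences NULL right away instead of silently
 * corrupting whatever the stale pointers happened to reference.
 */
static void list_node_del(struct list_node *entry)
{
    assert(entry != NULL && entry->next != NULL && entry->prev != NULL);

    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
}
```

The cost is two extra stores per removal.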
Jens Axboe [Sun, 2 Jun 2002 10:15:36 +0000 (03:15 -0700)]
[PATCH] update to the update
Too much copy'n paste between 2.4 and 2.5 code base, attached patch on
top of the previous block tag fixes makes it work/compile again. Sorry
about that.
Jens Axboe [Sun, 2 Jun 2002 10:15:02 +0000 (03:15 -0700)]
[PATCH] documentation and tq_disk removals
This should be the last of tq_disk, at least the trivial ones. md still
has some queue_task references, I'll let Ingo/Neil clean those up.
suspend is still broken, it was broken before too though. I guess Pavel
will want to fix that.
Martin Dalecki [Sun, 2 Jun 2002 09:42:20 +0000 (02:42 -0700)]
[PATCH] 2.5.19 IDE 80
- Sanitize the handling of the ioctl's and fix a bug on the way in dealing with
the WIN_SMART command where arguments were exchanged.
- Finally sanitize ioctl further until it turned out that we could get rid of
the special request type REQ_DRIVE_CMD entirely. We are now using
consistently REQ_DRIVE_ACB.
One hidden code path less again!
- Realize that ide_end_drive_cmd can be on the REQ_DRIVE_ACB only for ioctl() to
a disk. Eliminate its usage from device type driver modules.
- Remove command member from struct hd_drive_task_hdr and place it in struct
ata_taskfile. It is not common between the normal register file and HOB.
We will have to introduce some helper functions for particular command types.
Martin Dalecki [Sun, 2 Jun 2002 09:41:38 +0000 (02:41 -0700)]
[PATCH] 2.5.19 IDE 78
- Move ide_fixstring() from ide.c to probe.c, since this is the place where it's
most used.
- Remove GET_STAT() - it's not used any longer.
- Remove last parameter of ide_error. Rename it to ata_error().
- Don't use ide_fixstring in qd65xx.c host chip driver. The model name is
already fixed in probe.c.
- Invent ata_irq_enable() for the handling of the trice nIEN bit of the
control register. Consistently use ch->intrproc method every time we toggle
this bit. This simply wasn't the case before!
- Disable interrupts on a previous channel only when we share them indeed.
- Eliminate simple drive command handling function drive_cmd.
- Simplify the ioctl handler. Move it to ioctl, since that's the only place
where it's actually used.