Linus Torvalds [Fri, 23 Nov 2007 20:18:08 +0000 (15:18 -0500)]
Linux 2.2.1 - the Brown Paper Bag release
The subject says it all. We did have a few paper-bag-inducing bugs in
2.2.0, so there's a 2.2.1 out there now, just a few days after 2.2.0.
Oh, well. These things happen,
Linus
- the stupid off-by-one 'execute a coredump' crash found by Ingo
- __down_interruptible on alpha
- move "esstype" to outside a #ifdef MODULE
- NFSD rename/rmdir fixes
- revert to old array.c
- change comment about __PAGE_OFFSET
- missing "vma = NULL" case for avl find_vma()
Linus Torvalds [Fri, 23 Nov 2007 20:18:06 +0000 (15:18 -0500)]
Linux 2.2.0
> Compile this code
>
> ---- cut here ----
> #include <fcntl.h>
> int main( int argc, char *argv[] ) {
> open( argv[ 1 ], O_WRONLY|O_CREAT|O_TRUNC, 0666 );
> return 0;
> }
> ---- and here ----
>
> and run it like this
>
> strace ./a.out >(cat - )
>
> with 2.0.36 & 2.2.0-pre[67] you get:
>
> open("/dev/fd/63", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
>
> with 2.2.0-pre[89] you get:
>
> open("/dev/fd/63", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No
> such file or directory)
Ok, this seems to be due to pre9 removing some rather bogus code that
happened to hide another problem in open_namei().
I haven't actually tested this, but it looks really obvious, so does this
patch fix it for you? (This should also fix a potential performance
bogosity - there's absolutely no reason why we should get the directory
lock when we don't need to for a normal open of an existing file).
Linus Torvalds [Fri, 23 Nov 2007 20:18:05 +0000 (15:18 -0500)]
2.2.0-final
Hoya,
there's now a 2.2.0-pre9 on ftp.kernel.org, and when you compile it it
will call itself 2.2.0-final. The reason is fairly obvious: enough is
enough, and I can't make pre-kernels forever, it just dilutes the whole
idea. The only reason the tar-file is not called 2.2.0 is that I want to
avoid having any embarrassing typos that cause it to not compile under
reasonable configurations or something like that. Unreasonable
configurations I no longer care about.
Every program has bugs, and I'm sure there are still bugs in this. Get
over it - we've done our best, and nobody ever believed that there
wouldn't be 2.2.x kernels to fix problems as they come up, and delaying
2.2.0 forever is not an option.
I have a wedding anniversary and a company party coming up, so I'm taking
a few days off - when I get back I expect to take this current 2.2.0-final
and just remove the "-final" from the Makefile, and that will be it. I
suspect somebody _will_ find something embarrassing enough that I would
fix it too, but let's basically avoid planning on that.
In short, before you post a bug-report about 2.2.0-final, I'd like you to
keep the following simple guidelines in mind:
"Is this something Linus would be embarrassed enough about that he would
wear a brown paper bag over his head for a month?"
and
"Is this something that normal people would ever really care deeply
about?"
If the answer to either question is "probably not", then please consider
just politely discussing it as a curiosity on the kernel mailing lists
rather than even sending email about it to me: I've been too busy the last
few weeks, and I'd really appreciate it if I could just forget the worries
of a release for a few days..
But if you find something hilariously stupid I did, feel free to share it
with me, and we'll laugh about it together (and I'll avoid wearing the
brown paper bag on my head during the month of February). Do we have a
deal?
I've seen people working on a 2.2.0 announcement, and I'm happy - I've
been too busy to think straight, much less worry about details like that.
If everything turns out ok, I'll have a few memorable bloopers in my
mailbox but nothing worse than that, and I can sit down and actually read
the announcement texts that people have been discussing.
ObFeatures:
- m68k sync
- various minor driver fixes (irda, net drivers, scsi, video, isdn)
- SGI Visual Workstation support
- adjtimex update to the latest standards
- vfat silly buglet fix
- semaphores work on alpha again
- drop the inline strstr() that gcc got wrong whatever we did
- kswapd needed to be a bit more aggressive
- minor TCP retransmission and delack fixes
Until Monday,
Linus
Linus Torvalds [Fri, 23 Nov 2007 20:18:01 +0000 (15:18 -0500)]
Linux 2.2.0pre7
Ok, I think I now know why pre-6 looks so unbalanced. It's two issues.
Basically, trying to swap out a large number of pages from one process
context is just doomed. It basically sucks, because
- it has bad latency. This is further exacerbated by the per-process
"thrashing_memory" flag, which means that if we were unlucky enough to
be selected to be the process that frees up memory, we'll probably be
stuck with it for a long time. That can make it extremely unfair under
some circumstances - other processes may allocate the pages we freed
up, so that we keep on being counted as a memory thrasher even if we
really aren't.
Note that this shows most under "moderate" load - the problem doesn't
tend to show itself if you have some process that is _really_
allocating a lot of pages, because then that process will be correctly
found by the thrashing logic. But if you have lots of "normal load"
processes, some of those can get really badly hurt by this.
In particular, in the worst case you have a number of processes that all
allocate memory, but not very quickly - certainly not more quickly than
we can page things out. What happens is that under these circumstances
one of them gets marked as a "scapegoat", and once that happens all the
others will just live off the pages that the scapegoat frees up, while
the scapegoat itself doesn't make much progress at all because it is
always just freeing memory for others.
The really bad behaviour tends to go away reasonably quickly, but while
it happens it's _really_ unfair.
- try_to_free_pages() just goes overboard, and starts paging stuff out
without getting back to the nice balanced behaviour. This is what
Andrea noticed.
Essentially, once it starts failing the shrink_mmap() tests, it will
just page things out crazily. Normally this is avoided by just always
starting from shrink_mmap(), but if you ask try_to_free_pages() to try
to free up a ton of pages, the balancing that it does is basically
bypassed.
So basically pre-6 works _really_ well for the kind of stress-me stuff
that it was designed for: a few processes that are extremely memory
hungry. It gets close to perfect swap-out behaviour, simply because it is
optimized for getting into a paging rut.
That makes for nice benchmarks, but it also explains (a) why it's
sometimes just not very nice for interactive behaviour and (b) why under
normal load it can easily swap much too eagerly.
Anyway, the first problem is fixed by making "thrashing" be a global flag
rather than a per-process flag. Being per-process is really nice when it
finds the right process, but it's really unfair under a lot of other
circumstances. I'd rather be fair than get the best possible page-out
speed.
Note that even a global flag helps: it still clusters the write-outs, and
means that processes that allocate more pages tend to be more likely to be
hit by it, so it still does a large part of what the per-process flag did,
but without the unfairness (admittedly, being unfair sometimes gets you
better performance - you just have to be _very_ careful whom you target
with the unfairness, and that's the hard part).
The second problem actually goes away by simply not asking
try_to_free_pages() to free too many pages - and having the global
thrashing flag makes it unnecessary to do so anyway because the flag will
essentially cluster the page-outs even without asking for them to be all
done in one large chunk (and now it's not just one process that gets hit
any more).
There's a "pre-7.gz" on ftp.kernel.org in testing, anybody interested?
It's not the real thing, as I haven't done the write semaphore deadlock
thing yet, but that one will not affect normal users anyway so for
performance testing this should be equivalent.
Linus Torvalds [Fri, 23 Nov 2007 20:17:58 +0000 (15:17 -0500)]
Linux 2.2.0pre5
Oh, well.. Based on what the arca-[678] patches did, there's now a pre-5
out there. Not very similar, but it should incorporate the basic idea:
namely much more aggressively asynchronous swap-outs from a process
context.
Comment away,
Linus
Linus Torvalds [Fri, 23 Nov 2007 20:17:56 +0000 (15:17 -0500)]
Linux 2.2.0pre4
Ok, you know the drill by now. This fixes:
- yes, people told me about the new and improved ksymoops. Much better,
no need for C++, and this one actually seems to compile and work
reliably.
- ntfs fixes
- the vfat thing _really_ works now
- NFS fix for deleting files while writebacks active.
- ppa/imm driver updated
- minor mm balancing patches
- Alan took up the gauntlet and cleaned up some CONFIG_PROC_FS stuff.
More on Monday,
Linus Torvalds [Fri, 23 Nov 2007 20:17:48 +0000 (15:17 -0500)]
Linux 2.2.0pre2 (December 31 1998)
Well, some people obviously had problems with the first 2.2.0pre, so
there's a second one there. Most of it is almost purely syntactic sugar:
configuration issues and jiffies wraparound, but there were a few problems
wrt some IDE disk geometry stuff in particular that made 2.2.0pre1 not
boot for some people.
Other real changes:
- nfsd updated, and we have an official maintainer for knfsd (and I was
happy to see how many people were ready to stand up for it. Good show,
guys!)
- network driver updates (tulip/eepro)
- some TCP fixes for occasional but nasty performance problems.
- fix for an attack where you could cause a complete and utter lockup of
the kernel as a normal user. Thanks to Michael Chastain for keeping the
faith on this one and reminding me to fix it.
If you haven't had problems with pre1, there should be no major cause to
look at pre2. But if you haven't even looked at pre1 yet, please consider
looking at the pre-2.2.0 kernels before it's too late. I'm going to be
extremely rude to people who knew better but didn't test out the pre-
kernels and then send me bug-reports on the released 2.2.0.
Linus Torvalds [Fri, 23 Nov 2007 20:17:46 +0000 (15:17 -0500)]
Linux 2.2.0 (pre1) (28 Dec 1998)
we're in the pre-2.2.0 series now, I'm all synched up with Alan, and I
don't have anything pending any more. Over the internet nobody can hear
you all scream in pain over all your favourite features that didn't make
it.
Linus "another year older and wise as hell by now" Torvalds
Linus Torvalds [Fri, 23 Nov 2007 20:17:38 +0000 (15:17 -0500)]
pre-2.1.132-4..
There's a new pre-patch on ftp.kernel.org. I've been waiting for a few
other things, but the pre-patches are getting to be so big that it's
getting unwieldy, so I'll probably make a real 2.1.132 real soon now. In
the meantime, there's a pre-patch that people can verify for sanity (this
one should have coda-fs back to working order, for example - patch
craziness corrupted a simple update in pre-3).
Linus Torvalds [Fri, 23 Nov 2007 20:17:35 +0000 (15:17 -0500)]
pre-2.1.132-2..
..is out there, and has everybody's favourite fix, ie the version number
has been bumped this time. In addition, compared to pre-1, it has:
- autofs fix (uninitialized inode number could lead to "interesting"
problems)
- some more NFS fixes (file truncation with pending write-backs this time)
- disable_irq()/enable_irq() now nests properly, as Alan convinced me
(quite rightly) that they have to nest in order to work sanely with
shared interrupts and multiple CPUs and various other scenarios.
- more merges from Alan, we're getting closer to being synched up.
Most of the bulk of the thing is the irda stuff, that most people can
ignore.
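Why disable_irq()/enable_irq() must nest can be shown with a per-line depth counter: every disable increments it, and only the enable that balances the outermost disable unmasks the line again. This is a userspace sketch under stated assumptions; `toy_irq_line`, `toy_disable_irq()` and `toy_enable_irq()` are hypothetical names, not the kernel's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line (not the kernel's data). */
struct toy_irq_line {
    unsigned int depth;   /* number of outstanding disable calls */
    bool masked;          /* state of the simulated hardware mask */
};

/* Disables nest: only the first one actually masks the line. */
static void toy_disable_irq(struct toy_irq_line *line)
{
    if (line->depth++ == 0)
        line->masked = true;
}

/* Only the enable matching the outermost disable unmasks the line. */
static void toy_enable_irq(struct toy_irq_line *line)
{
    assert(line->depth > 0 && "unbalanced enable");
    if (--line->depth == 0)
        line->masked = false;
}
```

With two drivers sharing a line, or two CPUs disabling it concurrently, the line stays masked until both have re-enabled it; a plain on/off flag would let the first enable unmask it while the other user still relies on it being off.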
Linus Torvalds [Fri, 23 Nov 2007 20:17:33 +0000 (15:17 -0500)]
Linux 2.1.132pre1
There's a new pre-patch out there. I'm back from Finland, and have caught
up with just about half the email that I got during the stay. However,
I may have missed things even in the part I caught up with, because (for
obvious reasons) I didn't read it as carefully (*) as I usually do.
This should fix at least part of the NFS problems people have reported:
there was code to completely incorrectly invalidate quite valid write
requests under some circumstances. The pre-patch also contains the first
batch of patches merged in from Alan, and the "rmdir" problems should be
fixed (mostly thanks to Al Viro).
This pre-patch also gets rid of some imho completely unnecessary
complexity in some of the VM memory freeing routines. There have been
patches floating around that added more heuristics on when to do
something, and this tries to get the same result by just removing old
heuristics that didn't make much sense.
Linus
(*) Even my usual "careful" is not very careful by other people's
standards. So when _I_ say that I wasn't very careful, you should just
assume that I was reading my email about as carefully as a hyper-active
hedgehog on some serious uppers. Can you say "ignored email" three times
quickly while chewing on an apple?
Linus Torvalds [Fri, 23 Nov 2007 20:17:31 +0000 (15:17 -0500)]
Linux 2.1.131
2.1.131 is out there now - and will be the last kernel release for a
while. I'm going to Finland for a week and a half, and will be back in
mid-December. During that time I hope people will beat on this. I'll be able
to read email when I'm gone, but as I haven't been back in over a year,
I'm not very likely to.
Alan, I haven't got any replies (positive or negative) about the VFS fixes in
pre-2.1.131-3 (which are obviously in the real 131 too), so I hope that
means that I successfully fixed all filesystems. The chance of that being
true is remote, but hey, I can hope. If not, I assume you'll be doing
your ac patches anyway (any bugs wrt rmdir() should be fairly obvious once
seen), and people might as well consider those official..
Linus Torvalds [Fri, 23 Nov 2007 20:17:30 +0000 (15:17 -0500)]
pre-patch-2.1.131-3
Ok,
I've made a new pre-patch-2.1.131-3.
The basic problem (that Alexander Viro correctly diagnosed) is that the
inode locking was horribly and subtly wrong for the case of a "rmdir()"
call. What rmdir() did was essentially something like
- VFS: lock the directory that contains the directory to remove
(this is normal and required to make sure that the name updates are
completely atomic - so removing or adding anything requires you to hold
the lock on the directory that contains the removee/addee)
- low-level filesystem: lock the directory you're going to remove, in
order to atomically check that it's empty.
So far so good, the above makes tons of sense. HOWEVER, the problem is
fairly obvious if anybody before Alexander had actually bothered to think
about it: when we hold two locks, we had better make sure that we get the
locks in the right order, or we may end up deadlocked with two (or more)
processes getting the locks in the wrong order and waiting on each other.
Now, if it was only rmdir(), things would be fine, because the directory
hierarchy itself imposes a lock order for rmdir(). But we have another
case that needs to lock two directories: "rename()". And that one doesn't
have the same kind of obvious order, and uses a different way to order the
two locks it gets. BOOM.
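The general cure for this class of deadlock is to impose one global order on lock acquisition, so that any two processes needing the same pair of locks always take them in the same sequence. A minimal userspace sketch, assuming pthread mutexes as stand-ins for the directory semaphores and using the lock's address as the (arbitrary) total order; it illustrates the technique only, not the VFS fix described next.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for two directory locks (assumption: plain mutexes). */
static pthread_mutex_t dir_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dir_b = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;

/* Pick one global order for any pair of locks: lower address first. */
static void order_pair(pthread_mutex_t *x, pthread_mutex_t *y,
                       pthread_mutex_t **first, pthread_mutex_t **second)
{
    assert(x != NULL && y != NULL);
    if ((uintptr_t)x <= (uintptr_t)y) {
        *first = x;
        *second = y;
    } else {
        *first = y;
        *second = x;
    }
}

static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_t *first, *second;
    order_pair(x, y, &first, &second);
    pthread_mutex_lock(first);
    if (second != first)          /* same lock twice: take it once */
        pthread_mutex_lock(second);
}

static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);      /* release order does not matter */
    if (y != x)
        pthread_mutex_unlock(y);
}

/* One thread asks for (a, b), the other for (b, a). Taken literally in
 * those orders this can deadlock; with order_pair() it cannot. */
static void *worker(void *arg)
{
    int reversed = *(const int *)arg;
    for (int i = 0; i < 100000; i++) {
        if (reversed)
            lock_pair(&dir_b, &dir_a);
        else
            lock_pair(&dir_a, &dir_b);
        shared_counter++;
        unlock_pair(&dir_a, &dir_b);
    }
    return NULL;
}

static long run_lock_order_demo(void)
{
    pthread_t t1, t2;
    int forward = 0, reversed = 1;
    shared_counter = 0;
    pthread_create(&t1, NULL, worker, &forward);
    pthread_create(&t2, NULL, worker, &reversed);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Any total order works; what matters is that every path that holds two of these locks agrees on it, which is exactly what rmdir() and rename() failed to do.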
As far as I can tell, this is a problem in 2.0.x too, but while it's a
potential really nasty DoS-opening, it does have the saving grace that the
window to trigger it is really really small. I don't know if you can
actually make an exploit for it that has any real chance of hitting it,
but it's at least conceptually possible.
Now, the only sane fix was to actually make the VFS layer do all the
locking for rmdir(), and thus let the VFS layer make sure the order is
correct, so that low-level filesystems don't need to worry their pretty
heads. I tried to do that in the previous pre-patch, and it worked well
for ext2, but not all that much else. The problem was that too many
filesystems "knew" what the rmdir() downcall used to do. Oh, well.
Anyway, I've fixed the low-level filesystems as far as I can tell, and the
end result is a much cleaner interface (and one less bug). But it's an
interface change at a fairly late date, and while the fixes to smbfs etc
looked for the most part obvious, I haven't been able to test them, so
I've done most of them "blind".
Sadly, this bug couldn't just be glossed over, because a normal user could
(by knowing the exact right incantation) force tons of unkillable
processes that held critical filesystem resources (any lookup on a
directory that was locked would in turn also lock up). So I'd ask people
who have done filesystems for Linux to look over my changes, and if the
filesystems are not part of the standard distribution please look over
your own locally maintained fs code. I think we can ignore 2.0.x by virtue
of it probably being virtually impossible to trigger. I'll leave the
decision up to Alan.
Most especially, I'd like people who use/maintain vfat and umsdos
filesystems to test out that I actually made those filesystems happy with
my changes. The other filesystems were more straightforward.
Oh, and thanks to Alexander. Not that I really needed another bug to fix,
but it feels good to plug holes.
Linus
The change is basically:
- the VFS layer locks the directory to be removed for you (as opposed to
just the directory that contains the directory to be removed as it used
to). A lot of filesystems didn't actually do this, and it is required,
because otherwise the test for an empty directory may be subverted by a
clever hacker.
- the VFS layer will have done a dcache "prune" operation on the
directory, and if there were no other uses for that dcache entry, it
will have done a "d_drop()" on it too.
- the above essentially means that any filesystem can do a
    if (!list_empty(&dentry->d_hash))
        return -EBUSY;
to test whether there are other users of this directory. No need to do
any extra pruning etc - if it's been dropped there won't be any new
users of the dentry afterwards, so there are no races. So after doing
the above test you know that you'll have exclusive access to the dentry
forever.
Most notably, the low-level filesystem should _not_ look at the
dentry->d_count member to see how many users there are. The VFS layer
currently artificially raises the dentry count to make sure
"d_delete()" doesn't get rid of the inode early.
- however: traditional local UNIX-type filesystems tend to want to allow
removal of the directory even if it is in use by something else. This
requires that the inode be accessible even after the rmdir() - even
though it doesn't necessarily need to actually _do_ anything.
For a normal UNIX-like filesystem this tends to be trivial and quite
automatic behaviour, but you need to think about whether your
filesystem is of the kind where the inode stays around even after the
delete until we locally do the final "iput()". For example, on
networked filesystems this is generally not true, simply because the
server will have de-allocated the inode even if we still have a
reference to it locally.
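The hashed/unhashed protocol above can be modelled in userspace. A minimal sketch, assuming a cut-down list implementation in the style of the kernel's `list_head`; `toy_dentry`, `toy_d_drop()` and `toy_fs_rmdir()` are hypothetical stand-ins for the real dentry, `d_drop()` and a filesystem's rmdir method.

```c
#include <assert.h>
#include <errno.h>

/* Minimal doubly linked list in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink and re-initialise, so list_empty() on the entry becomes true. */
static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

/* Hypothetical, heavily reduced dentry: only the hash linkage matters. */
struct toy_dentry { struct list_head d_hash; };

/* What the VFS does before calling down if nobody else uses the entry. */
static void toy_d_drop(struct toy_dentry *d) { list_del_init(&d->d_hash); }

/* Low-level rmdir: still hashed means there may be other users. */
static int toy_fs_rmdir(struct toy_dentry *d)
{
    if (!list_empty(&d->d_hash))
        return -EBUSY;
    /* ...check that the directory is empty and remove it on disk... */
    return 0;
}
```

Because an unhashed dentry can never be found by a new lookup, the single `list_empty()` test is enough: once it passes, no new users can appear, so the filesystem does not need to inspect `d_count` at all.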
Linus Torvalds [Fri, 23 Nov 2007 20:17:29 +0000 (15:17 -0500)]
Linux 2.1.131pre2
There's a pre-131-2 patch there on ftp.kernel.org in the testing
directory. This should have the NFS locking issues worked out (please
test), and also has a rather subtle but potentially very nasty deadlock
due to incorrect semaphore ordering with rmdir() hopefully fixed for good.
Alan, the regparm patches are also there.
Linus
nfs: write back everything whenever some lock is changed (not just for
unlock), and always invalidate the caches.
Linus Torvalds [Fri, 23 Nov 2007 20:17:27 +0000 (15:17 -0500)]
The Basted Turkey Release (aka 2.1.130)
Following hot on the heels of the greased weasel, the basted turkey rears
its handsome head.
The basted turkey release fixes some problems that our dear weasel had,
namely:
- NFS reference counting was wrong. It had been wrong for a long time,
but apparently the more aggressively asynchronous code was more easily
able to show the resultant random memory corruption. That should be
gone.
- The UP flu is officially fixed (this has been in most of the 2.1.129
patches)
- kernel_thread() used to be able to cause bad things in init-routines at
bootup. Fixed.
- itimers could lead to bad things in SMP under heavy itimer load.
- various mm tweaks to make it behave better under load. Things for dirty
buffers still under consideration.
- IP masquerading check fixes.
- acenic gigabit ethernet driver
- some drunken revelers fixed some MCA issues.
- alpha PCI setup updates and video drivers
- hfs and minix filesystem fixes.
On the whole, an excellent thing to do this evening, and goes together
remarkably well with some good red wine. Amaze your friends and relatives
by completely ignoring them, sitting in a corner with your own basted
turkey, and getting wasted on red wine. Much more fun than your average
Thanksgiving dinner,
Linus Torvalds [Fri, 23 Nov 2007 20:17:26 +0000 (15:17 -0500)]
pre-2.1.130-3
There's a new pre-patch for people who want to test these things out: I'll
probably make a real 2.1.130 soon just to make sure all the silly problems
in 2.1.129 are left behind (i.e. the UP flu in particular, which people are
still discussing even though there's a known cure).
The pre-patch fixes a rather serious problem with wall-clock itimer
functions, that admittedly was very very hard to trigger in real life (the
only reason we found it was due to the diligent help from John Taves, who
saw sporadic problems under some very specific circumstances - thanks
John).
It also fixes a very silly NFS path revalidation issue: when we
revalidated a cached NFS path component, we didn't update the revalidation
time, so we ended up doing a lookup over the wire every time after the
first time - essentially making the dcache useless for path component
caching of NFS. If you use NFS heavily, you _will_ notice this change (it
also fixes some rather ugly uses of dentries and inodes in the NFS code
where we didn't update the counter so the inode wasn't guaranteed to even
be there any more!).
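The revalidation bug is easy to see in a small model: if a successful check over the wire doesn't record when it happened, every later lookup finds the entry "stale" and goes back to the server. A minimal sketch; `toy_nfs_entry`, `toy_revalidate()` and the 30-tick lifetime are hypothetical, and the `update_time` parameter exists only to contrast the buggy and fixed behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached path component; not the real NFS dentry fields. */
struct toy_nfs_entry {
    long validated_at;        /* time of the last successful server check */
    unsigned wire_lookups;    /* how often we had to ask the server */
};

#define TOY_REVALIDATE_INTERVAL 30  /* assumed cache lifetime, in ticks */

/* Returns true if the cached entry may be used at time "now". */
static bool toy_revalidate(struct toy_nfs_entry *e, long now, bool update_time)
{
    if (now - e->validated_at < TOY_REVALIDATE_INTERVAL)
        return true;               /* still fresh: no network traffic */
    e->wire_lookups++;             /* simulated lookup over the wire */
    if (update_time)
        e->validated_at = now;     /* the fix: remember that we checked */
    return true;                   /* assume the server confirmed it */
}
```

Over ten consecutive ticks past the lifetime, the version that forgets to update `validated_at` hits the server ten times, while the fixed one hits it once and then serves the rest from the cache.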
Also, thanks to Richard Gooch &co, who found the rather nasty race
condition when a kernel thread was started from an init-region. The
trivial fix was to not have the kernel thread function be inlined, but
while fixing it was trivial, it wasn't trivial to notice in the first
place. Good debugging.
And the UP flu is obviously fixed here (as it was in earlier pre-patches
and in various other patches floating around).
Linus Torvalds [Fri, 23 Nov 2007 20:17:23 +0000 (15:17 -0500)]
Linux 2.1.129
To a large degree this is more merges for PPC and Sparc (and
somehow I must have missed ARM _again_, so I'll have to find that).
But there are a few other things in there:
- ncr53c8xx tag fix
- more sound fixes.
- NFS fixed
- some subtle TCP issues fixed
- and lots of mm smoothness tweaks (most of those have been floating
around for some time - like getting rid of the last vestiges of page
ages which just complicated and hurt the code)
Have fun with it, and tell me if it breaks. But it won't. I'm finally
getting the old "greased weasel" feeling back. In short, this is the much
awaited perfect and bug-free release, and the only reason I don't call it
2.2 is that I'm chicken.
Linus Torvalds [Fri, 23 Nov 2007 20:17:18 +0000 (15:17 -0500)]
Linux 2.1.129-pre3
I don't know how I made an old pre-patch available: I've made a pre-3 that
has the proper proc thing so that it compiles (it is otherwise identical
to pre-2, so if you got pre-2 to compile by patching by hand, then there's
no reason to get pre-3).
Linus Torvalds [Fri, 23 Nov 2007 20:17:10 +0000 (15:17 -0500)]
Linux 2.1.127
Ok,
after two fairly hectic weeks for me, 2.1.127 is finally out there.
This kernel does:
- various small but important networking fixes from Davem (thanks). One
of them is the "anti-nagle" bit to allow programs that know what they
are doing to avoid nagling by telling the kernel so. This is mainly
things like Web servers and ftp-servers that can use this option
together with "sendfile()".
- scheduling timeout interface change: the new interface is much more
logical than the old one, and allows us to get the jiffies wrap-around
case right. Thanks to Andrea Arcangeli.
- Various driver updates: specialix, sonycd,
- Memory management fixups. Handle out-of-memory conditions correctly,
and handle high memory load much more gracefully.
- sparc and PowerPC architecture updates
- 3c509 SMP fix, tlan PCI probe update.
- scsi driver updates: ncr53c8xx, aic7xxx, dc390
- filesystem updates: autofs, hfs, umsdos
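Getting the jiffies wrap-around case right usually comes down to comparing tick counts by their signed difference instead of comparing them directly. A minimal sketch, assuming a 32-bit tick counter and hypothetical helper names; later kernels expose the same idea as the `time_after()` macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit tick counter that wraps, like jiffies on 32-bit. */
typedef uint32_t ticks_t;

/* True if tick "a" is later than tick "b". The signed difference stays
 * correct across a wrap as long as the two values are less than half
 * the counter range apart (assumes two's-complement conversion). */
static int ticks_after(ticks_t a, ticks_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Timeout-style helper: ticks left until "deadline", or 0 if passed. */
static ticks_t ticks_remaining(ticks_t now, ticks_t deadline)
{
    return ticks_after(deadline, now) ? deadline - now : 0;
}
```

For example, a deadline of 10 set at tick `UINT32_MAX - 5` is 16 ticks in the future; a naive `deadline > now` says it has already expired, while `ticks_after()` gets it right.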
Linus Torvalds [Fri, 23 Nov 2007 20:17:07 +0000 (15:17 -0500)]
>> Btw, I've been looking at why Andrea thinks his patches are needed,
>> because I looked very deep and the patches really shouldn't have made any
>> real difference..
>> The reason - tadaam - is so silly that it's embarrassing. The thing is,
>> that the things that should use GFP_USER don't. They use GFP_KERNEL
>> instead, and that is sufficient to explain all the problems that Andrea
>> saw. Because GFP_KERNEL will continue to allow allocations even after the
>> freeing up of another page has failed.
>> After fixing that in mm/memory.c and mm/filemap.c, the problem seems to be
>> properly fixed.
> I thought to change that but I was not sure (and in fact a few emails ago I
> asked you about it too). I have not changed it myself because I was
> worried that userspace allocation could be too light. It would be
> nice to know if using GFP_USER and disabling kswapd (at the end of
> vmscan.c) causes processes to segfault (so that we can know if a real-time
> process can alloc/swapout memory safely).
I wonder why it wasn't GFP_USER - that's exactly what the thing is there
for, and I don't know when it was changed. Probably with the new page
cache or something. I just looked at the memory allocator, and it looked
like it was doing the right thing, and it _was_ - but because it was
called with GFP_KERNEL it tried harder than it should have to return a
good page even when it ran out of memory.
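The effect of the wrong flag can be illustrated with a toy allocator in which one priority gives up once reclaim fails, while the other keeps dipping into the reserve. A minimal sketch under stated assumptions: the `toy_*` names, the single reserve threshold and the two-level priority are hypothetical simplifications, not the 2.1.x `__get_free_pages()` logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical priorities, loosely named after GFP_USER and GFP_KERNEL. */
enum toy_gfp { TOY_GFP_USER, TOY_GFP_KERNEL };

struct toy_zone {
    unsigned free_pages;
    unsigned min_free;      /* at or below this, try to reclaim first */
    unsigned reclaimable;   /* pages that reclaim could still recover */
};

/* Simulated reclaim: returns true if it managed to free a page. */
static bool toy_try_to_free(struct toy_zone *z)
{
    if (z->reclaimable == 0)
        return false;
    z->reclaimable--;
    z->free_pages++;
    return true;
}

/* Returns true if a page was handed out. */
static bool toy_alloc_page(struct toy_zone *z, enum toy_gfp gfp)
{
    if (z->free_pages <= z->min_free) {
        bool freed = toy_try_to_free(z);
        /* User allocations give up once reclaim fails; kernel
         * allocations keep going and eat into the reserve. */
        if (!freed && gfp == TOY_GFP_USER)
            return false;
    }
    if (z->free_pages == 0)
        return false;
    z->free_pages--;
    return true;
}
```

Starting from 10 free pages with a reserve of 8 and nothing reclaimable, user-priority requests stop after 2 pages, while kernel-priority requests drain all 10, which is the "tried harder than it should have" behaviour described above.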
Anyway, I made a pre-patch-2.1.127-6 and put it on ftp.kernel.org (pre-4
and pre-5 have been my internal pre-patches and don't show up there). This
has the timeout code basic fixes and the mm fixes, and doesn't fall over
for me with Andrea's memory load case.
Linus Torvalds [Fri, 23 Nov 2007 20:17:04 +0000 (15:17 -0500)]
Linux 2.1.127pre2
I just found a case that could certainly result in endless page faults,
and an endless stream of __get_free_page() calls. It's been there forever,
and I basically thought it could never happen, but thinking about it some
more it can happen a lot more easily than I thought.
The problem is that the page fault handling code will give up if it cannot
allocate a page table entry. We have code in place to handle the final
page allocation failure, but the "mid-way" failures just failed, and
caused the page fault to be done over and over again.
More importantly, this could happen from kernel mode when a system call
was trying to fill in a user page, in which case it wouldn't even be
interruptible.
It's really unlikely to happen (because the page tables tend to be set up
already), but I suspect it can be triggered by execve'ing a new process
which is not going to have any existing page tables. Even then we're
likely to have old pages available (the ones we freed from the previous
process), but at least it doesn't sound impossible that this could be a
problem.
I've not seen this behaviour myself, but it could have caused Andrea's
problems, especially the harder to find ones. Andrea, can you check this
patch (against clean 2.1.126) out and see if it makes any difference to
your testing?
(Right now it does the wrong error code: it will cause a SIGSEGV instead
of a SIGBUS when we run out of memory, but that's a small detail).
Essentially, instead of trying to call "oom()" and sending a signal (which
doesn't work for kernel level accesses anyway), the code returns the
proper return value from handle_mm_fault(), which allows the caller to do
the right thing (which can include following the exception tables). That
way we can handle the case of running out of memory from a kernel mode
access too..
(This is also why the fault gets the wrong signal - I didn't bother to fix
up the x86 fault handler all that much ;)
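The shape of that change, where the fault path reports failure and the caller picks the response, can be sketched as follows. All names and the two-value result are hypothetical; the real x86 fault handler and `handle_mm_fault()` return convention differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault results (not the real handle_mm_fault() values). */
enum toy_fault { TOY_FAULT_OK, TOY_FAULT_OOM };

/* Each level of the page-table walk may need a fresh page. */
static enum toy_fault toy_handle_mm_fault(bool have_pmd, bool have_pte,
                                          bool have_page)
{
    /* Report "mid-way" failures instead of silently returning, which
     * would just make the CPU take the same fault again forever. */
    if (!have_pmd || !have_pte || !have_page)
        return TOY_FAULT_OOM;
    return TOY_FAULT_OK;
}

enum toy_action { TOY_RETURN, TOY_SIGNAL_USER, TOY_FIXUP, TOY_OOPS };

/* The caller, not the fault handler, decides what a failure means. */
static enum toy_action toy_do_page_fault(enum toy_fault r, bool user_mode,
                                         bool has_exception_fixup)
{
    if (r == TOY_FAULT_OK)
        return TOY_RETURN;
    if (user_mode)
        return TOY_SIGNAL_USER;   /* signal the faulting process */
    if (has_exception_fixup)
        return TOY_FIXUP;         /* exception table: syscall fails cleanly */
    return TOY_OOPS;              /* kernel access with no recovery path */
}
```

Sending the signal from inside the fault handler can't express the kernel-mode case at all; returning a status lets a system call that was filling in a user page follow its exception-table fixup instead of looping uninterruptibly.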
Btw, the reason I'm sending out these patches in emails instead of just
putting them on ftp.kernel.org is that the machine has had disk problems
for the last week, and finally gave up completely last Friday or so. So
ftp.kernel.org is down until we have a new raid array or the old one
magically recovers. Sorry about the spamming.
Linus Torvalds [Fri, 23 Nov 2007 20:17:03 +0000 (15:17 -0500)]
Linux 2.1.127pre1
I have an alternate patch for low memory circumstances that I'd like you
to test out.
The problem with the old kswapd setup was at least partly that kswapd was
woken up too late - by the time kswapd was woken up, it really had to work
fairly hard. Also, kswapd really shouldn't be real-time at all: normally
it should just be a fairly low-priority process, and the priority should
grow as there is more urgent need for memory.
This alternate approach seems to work for me, and is designed to avoid the
"spikes" of heavy real-time kswapd activity during which the machine is
fairly unusable in the old scheme.
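One way to express "wake up earlier, and let priority grow with the need for memory" is a function from free memory to urgency that is zero above a wake-up threshold and ramps towards a maximum as memory runs out. A minimal sketch; the linear ramp, the thresholds and the names are illustrative assumptions, not the actual kswapd code.

```c
#include <assert.h>

/* Hypothetical urgency scale: 0 = kswapd may sleep, max = critical. */
#define TOY_MAX_URGENCY 10u

static unsigned toy_kswapd_urgency(unsigned free_pages,
                                   unsigned wake_threshold,
                                   unsigned critical_threshold)
{
    assert(wake_threshold > critical_threshold);
    if (free_pages >= wake_threshold)
        return 0;                         /* plenty of memory */
    if (free_pages <= critical_threshold)
        return TOY_MAX_URGENCY;           /* really short on memory */
    /* Rise from 1 towards the maximum as free memory drops. */
    unsigned span = wake_threshold - critical_threshold;
    unsigned deficit = wake_threshold - free_pages;
    return 1 + (TOY_MAX_URGENCY - 1) * deficit / span;
}
```

With a wake-up threshold of 100 pages and a critical level of 20, kswapd is woken at low urgency as soon as free memory dips below 100 and only reaches full urgency at 20, instead of sleeping until the last moment and then running flat out.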
Linus Torvalds [Fri, 23 Nov 2007 20:17:02 +0000 (15:17 -0500)]
Linux-2.1.126
- architecture updates for alpha and MIPS (and some minor PPC updates
too)
- joystick updates
- MCA stuff from Alan. The guy has too much free time on his hands.
- stallion driver cosmetic update
- nasty SMP race with "task queues" (not the scheduling kind), where we
were mixing atomic metaphors, resulting in a mess. Usually a benign
one, but occasionally you could force oopses.
- some floppy and ide updates
- PS/2 mouse driver integrated into the PC keyboard controller. That got
rid of a lot of really nasty problems (it's the same controller,
accessing it from two different drivers was always messy)
- various driver updates: floppy, ide, network drivers, sound, video..
- various small FS fixes - finally _really_ getting the ENOENT vs ENOTDIR
stuff right, nfsd updates, remounting fixes, filesize limits on NFS
and smbfs, ntfs and ufs updates...
- shm updates from Alan
- cleanup of some MM stuff, I hope Andrea will re-do the patches and I'll
look at the other parts.
- unix fd garbage collection fix, getting rid of circular dependencies..
And probably various other small fixes that I have thankfully forgotten
about.
Linus Torvalds [Fri, 23 Nov 2007 20:16:57 +0000 (15:16 -0500)]
Linux-2.1.125 ... pre-2.2 imminent
It seems that I've finally found the mysterious bug that caused some SMP
machines to lock up at bootup if they had no keyboard enabled. It turns
out that the keyboard was a complete red herring, and that it just changed
timings of bottom half handling in particular. The real culprit was some
misguided locking attempts by the console driver at a really bad time.
Anyway, that means that the last of my personal show-stopper bugs in 2.1.x
seems to be finally history. I still expect to sync up with Alan Cox's
patches in particular, but I'm mentally getting ready for a real 2.2.
I still haven't decided on whether I'll make the same kind of "pre-2.2"
that I did before the 2.0 release, but there are strong psychological
reasons to do so to get people to more actively test it out with a "this
really should be stable" mindset.
In the meantime, there's now 2.1.125. Most of 2.1.125 is driver updates
for various things, most notably perhaps joystick and the new 5.10 version
of the Adaptec aic7xxx driver by Doug Ledford (but there are various other
driver updates). The fix for the mysterious lock-up is a few embarrassing
lines removed, but makes me feel a lot better ;)
Linus Torvalds [Fri, 23 Nov 2007 20:16:52 +0000 (15:16 -0500)]
Linux-2.1.124...
.. is out there now, and includes:
- subtle fix for lazy FP save and restore on x86. The bug has been there
for a long time, but was apparently triggered by the re-write of the
low-level scheduling function. It could result in corrupted i387 state
under certain (admittedly fairly unlikely) circumstances.
- various networking updates. Some of the bugs fixed could result in
kernel Oopses. None of them were common, though.
- fixes for both filesystem accounting and quota handling.
- the much-ado-about-little video driver merge.
- PPC and Sparc updates
- i386/SMP interrupt handling falls back on the safe mode.. Please tell
me whether there are still machines with problems.
- some new network drivers and updates
- final (we hope) IP masquerade update
I still have a problem with certain machines that apparently don't want to
boot with the keyboard not plugged in even though they should. Kill me
now. If you have problems with i386/SMP on a machine without a keyboard,
plug one in and send me a report..
Linus Torvalds [Fri, 23 Nov 2007 20:16:38 +0000 (15:16 -0500)]
2.1.122pre1
This may or may not fix the APM problems, and the INITRD ones. The
INITRD one in particular was a case of a fairly inexplicable test that
shouldn't have been there in the first place breaking when something
completely unrelated was cleaned up..
The APM breakage was simply due to it being in the wrong place. The
patch looks bigger than it really is - it really only moves the file to
the proper directory, and makes sure that it should compile with the
standard assembler..
Linus Torvalds [Fri, 23 Nov 2007 20:16:24 +0000 (15:16 -0500)]
Linux 2.1.117
I made a 117 to fix the silly things left in 116 in my excitement over it
passing all my crashtests. This should fix the things with the kernel
thinking it was out of memory much sooner than it actually was etc.
Alan still reports some funnies with unix domain sockets, but he's
reportedly fixed the behaviour of NFS over TCP. He didn't make it sound as
if you really want to use it yet, though ;)
Linus Torvalds [Fri, 23 Nov 2007 20:16:21 +0000 (15:16 -0500)]
Linux 2.1.116
I just released Linux-2.1.116. I've tested it fairly extensively on my SMP
box, both with little memory and much, and I cannot make it lock up any
more.
Special thanks to Dean Gaudet who helped me set up an Apache configuration
that finally made me able to repeat the lockup, and made me able to debug
the thing.
Most of the 2.1.116 patches are "just" alpha and m68k updates, and can be
ignored by most people. The bugfixes are, roughly:
- fixed serious low-memory situation problem, where a critical resource
allocation problem could result in nasty behaviour. Notably, doing TCP
under low memory could result in TCP trying to allocate memory in a
tight loop and locking out kswapd completely so that the situation
would never be rectified. In short, the machine hung.
This problem has been there forever, the only reason it doesn't show up
under 2.0.x seems to be because under 2.0.x the TCP allocation was
always for a single page, for which this situation never arises. Under
2.1.x the slab code forced multi-page allocations.
If you've seen lockups with 2.1.x, this may be the cause. This was what
held up 2.1.116 for so long.
- various minor driver updates. Networking, radio, bttv.
- NFS over TCP still doesn't work, but at least it fails due to new
reasons.
Alan, try your squid thing under 2.1.116. I suspect it will hold up now,
Linus Torvalds [Fri, 23 Nov 2007 20:16:17 +0000 (15:16 -0500)]
Linux-2.1.115 - code freeze.
Ok, we've been in a tentative code freeze for a long time, and now it's
final. I've made a 2.1.115 that I hope is good enough, and I won't be
accepting anything but bug-fixes until 2.2..
There are two long-standing patches that I'm still considering:
- devfs
- dynamic fd's
and I kind of expect that they'll go in (devfs is configurable, so if you
don't want it you don't need to care, and the dynamic fd's save some
memory and speed certain things up a bit). The reason they're not in now
is mainly that I've been trying to get everything else off my plate, and I
want to ruminate on them in peace for a while.
Bug-fixes are still (and will always be) accepted,
Linus Torvalds [Fri, 23 Nov 2007 20:15:58 +0000 (15:15 -0500)]
What 2.1.110-3 does is to much more aggressively throw out dentries (and
thus inodes) under low-memory circumstances. It may be _too_ aggressive
right now, but if so that just gives a good mid point to strive for.
I'd really like to hear comments about how this "feels" (and numbers
too, if you have them). It's fairly hard for me to judge, as whenever I
run Linux on small-memory machines it always feels slower than I'm used
to, regardless of whether Linux does the right thing or not ;)
Linus Torvalds [Fri, 23 Nov 2007 20:15:54 +0000 (15:15 -0500)]
Linux-2.1.109.. preliminary code freeze.
Ok, it's out there now in all its glory...
2.1.109 does the following thing:
- CPU detection in C code (and thus much easier to expand upon,
especially as it's all thrown away after booting now that it is
"initfunc()"). This should finally get the Cyrix case right, for
example. Please test.
- too many people convinced me that sendfile() really wants to act like
writep().
- sound driver updates from Alan.
- console updates, so now we have the full old functionality again as far
as I'm concerned (but I'm sure people will tell me something is still
missing)
- task switch and user space return cleanly handles bad segment
descriptors etc, so people should no longer be able to cause kernel
messages by misusing the LDT (and I was just informed that you could
actually completely hang a 2.1.x SMP kernel by doing nasty things -
this fixes it)
- wine should work again thanks to Bill Hawes (other LDT fixes)
- de4x5 driver update
- token ring driver update
- ppp driver update
- coda-fs update
- "shared writable" bug fixed (thanks to a lot of people for testing and
working on this - the actual fix was trivial once the problem was
understood)
In addition, I've spent a large part of my day running with a 12MB
machine, and low-memory behaviour seems to be reasonable. People who have
been unhappy with low-memory behaviour should check out 109 and comment on
it - the heuristics are fairly different, and seem to be better.
As of this release, I won't be looking at the "incoming" directory at the
linux-patches site any more. I'll only be looking at "urgent" things, on
the theory that I'm (a) lazy and (b) getting into code freeze.
If you have important patches in "incoming", feel free to move them to
"urgent". However, I will warn that if I don't consider them to be 2.2
material, I'll just move them to "discarded".
The same goes for patches in email. I will accept patches, but I've just
raised the bar for acceptance.
Linus Torvalds [Fri, 23 Nov 2007 20:15:52 +0000 (15:15 -0500)]
pre-2.1.109-2..
To get people away from their normally scheduled copyright discussions, I
made a pre-2.1.109 to try out. I would have made a real 2.1.109, but my
computer room has been taken over by visiting relatives, and they want to
go to sleep. Ye Gods!
Get it from ftp.kernel.org, /pub/linux/kernel/testing as usual. It has
- CPU detection in C code (and thus much easier to expand upon,
especially as it's all thrown away after booting now that it is
"initfunc()"). This should finally get the Cyrix case right, for
example. Please test.
- too many people convinced me that sendfile() really wants to act like
writep().
- sound driver updates from Alan.
- console updates, so now we have the full old functionality again as far
as I'm concerned (but I'm sure people will tell me something is still
missing)
- task switch and user space return cleanly handles bad segment
descriptors etc, so people should no longer be able to cause kernel
messages by misusing the LDT.
- wine should work again thanks to Bill Hawes (other LDT fixes)
- de4x5 driver update
- token ring driver update
- ppp driver update
- coda-fs update
- "shared writable" bug fixed (thanks to a lot of people for testing and
working on this - the actual fix was trivial once the problem was
understood)
Check it out,
Linus Torvalds [Fri, 23 Nov 2007 20:15:49 +0000 (15:15 -0500)]
Linux 2.1.108
I just made a pre-2.1.108 and put it on ftp.kernel.org - it fixes a
problem where my sendfile() forgot to get the kernel lock (blush), so it
randomly didn't work correctly on SMP.
I've also done some more testing of sendfile(), and the nice thing is that
when I compared a file copy done with sendfile against a plain "cp",
the sendfile implementation was about twice as fast (at least my version
of "cp" will just do read+write pairs over and over again). When I copied
a 38MB file the "cp" took 1:58 seconds while sendfile took 1:08 seconds
according to "time" (I have 512MB of RAM, so this was all cached,
obviously)..
I haven't done any network tests, because I don't think I'd be able to see
any difference, and it does need the "SO_CONSTIPATED" thing and a way to
push the end of data for best performance.
Some final words on sendfile():
- it does report errors correctly. That doesn't mean that you necessarily
can know _which_ fd produced the error; that you have to find out on
your own. A file read access can generally result in EIO and EACCES
(the latter with NFS and other "protection-at-read-time" non-UNIX
filesystems), while the output write() can result in a number of errors
as the output fd can be any kind of socket/tty/file. Depending on the
mode of the output file, the output errors can include EINTR, EAGAIN
etc, and you can mix sendfile() with select() on the output socket, for
example.
- you can give it a length of MAX_ULONG, and it will write as much as it
can. This is entirely consistent with the notion that it is equivalent
to write(out, tmpbuf, read(in, tmpbuf, size)) where "tmpbuf" is
essentially infinite - the read() will read all of the file and return
the file length in the process. Thus you don't even need to know the
size of the file beforehand.
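To make the read()+write() model above concrete, here is a minimal user-space sketch of what the "cp" side of the comparison does: plain read()+write() pairs through a fixed buffer, repeated until read() returns 0, so the total length never has to be known in advance. The function name `copy_read_write` and the 8 KB buffer are illustrative choices, not taken from the note. The sketch retries EINTR and hands every other error back to the caller, who (as with sendfile()) cannot tell from the -1 alone which descriptor failed.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd using
   read()+write() pairs until read() reports end of file.
   Returns the number of bytes copied, or -1 with errno set. */
long long copy_read_write(int out_fd, int in_fd)
{
    char buf[8192];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* EOF: the whole input was copied */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;                  /* e.g. EIO or EACCES on the input */
        }
        /* write() may be partial on sockets, ttys and pipes, so loop. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* retry the same chunk */
                return -1;              /* e.g. EAGAIN or EPIPE on the output */
            }
            off += w;
        }
        total += n;
    }
}
```

A non-blocking output fd will surface EAGAIN here, at which point the caller can select() on it and call again, much as the note describes for sendfile().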
The file copy test was essentially done with a single
error = sendfile(out, in, ~0);
and I'm appending my current test-program.
This is going to be in 2.2, btw. The changes are so small and so obviously
have to work that it would be ridiculous not to have this - the only
question is whether I'll try to make it a "copyfd()" system call instead,
falling back on read+write when I can't use the page cache directly. I
suspect I won't.
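The test program mentioned above is not reproduced here. As a rough illustration only, the sketch below does the same kind of whole-file copy with the sendfile(2) interface as it exists in Linux today. That interface differs from the three-argument call in the note: it takes a fourth `off_t *offset` argument (NULL here, meaning "use and advance in_fd's own file position"), and a single call moves at most 0x7ffff000 bytes, so the code loops until sendfile() returns 0 rather than passing ~0. Copying into a regular file assumes Linux 2.6.33 or later; older kernels required the output to be a socket. The name `copy_file_with_sendfile` and the `SENDFILE_CHUNK` constant are our own, and no read+write fallback (the "copyfd()" idea) is attempted.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Largest count a single Linux sendfile() call will transfer. */
#define SENDFILE_CHUNK 0x7ffff000UL

/* Hypothetical helper: copy src_path to dst_path with sendfile(2).
   Returns the number of bytes copied, or -1 with errno set. */
long long copy_file_with_sendfile(const char *dst_path, const char *src_path)
{
    long long total = 0;
    int saved_errno = 0;

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        saved_errno = errno;
        close(in);
        errno = saved_errno;
        return -1;
    }

    for (;;) {
        /* NULL offset: read from in's file position and advance it. */
        ssize_t n = sendfile(out, in, NULL, SENDFILE_CHUNK);
        if (n == 0)
            break;                      /* end of the input file */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: try again */
            saved_errno = errno;        /* could be from either fd */
            total = -1;
            break;
        }
        total += n;
    }

    close(in);
    if (close(out) < 0 && total >= 0) {
        saved_errno = errno;            /* late write error on the output */
        total = -1;
    }
    if (total < 0)
        errno = saved_errno;
    return total;
}
```

Compared with the read()+write() loop, the data never passes through a user-space buffer, which is where the speedup in the timing above comes from.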
Linus Torvalds [Fri, 23 Nov 2007 20:15:45 +0000 (15:15 -0500)]
Linux 2.1.107pre2
Are there people out there that use the loopback device with SMP, and have
been irritated at it not working lately?
I gave up on waiting for any real loop device maintainer to step up and
fix this, so I made a very small patch that I suspect may fix the problem.
I'm not going to test it myself, and I'm fairly disgusted with how badly
the loop device is being maintained at all. But if people feel they want
to test it out, go to
ftp.kernel.org//pub/linux/kernel/testing
and fetch the current pre-107-2 patch.
It also has some other patches to the loop device that I picked up and
that looked like the right thing to do (use dentry pointers instead of
inodes to make mount/umount happy).
Linus Torvalds [Fri, 23 Nov 2007 20:15:39 +0000 (15:15 -0500)]
Linux 2.1.105
Linux-2.1.105 is out there, and is mainly a "synch to other people and fix
silly problems" release. It has the 104 kmod and compilation problems
fixed, and updates some pending patches (notably sound and ham radio
drivers).
Linus Torvalds [Fri, 23 Nov 2007 20:15:38 +0000 (15:15 -0500)]
[tytso] include/asm-i386/posix_types.h
This quick fix eliminates a lot of warning messages when
compiling e2fsprogs under glibc. This is because the glibc header files
define their own versions of FD_SET, FD_ZERO, etc., and so if you need to
#include the kernel include files, you get a lot of duplicate macro
definition warnings. This patch simply #ifdef's out the kernel
versions of these functions if the kernel is not being compiled and the
glibc header files are in use.
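The guard described above can be sketched roughly as follows. Everything here is hypothetical: the real header wraps its own internal fd-set macros, whereas this sketch uses made-up `KSKETCH_*` names so it compiles standalone next to glibc, and the exact condition (including the `__GLIBC__ < 2` version check) is an assumption modeled on the description rather than copied from the patch. In the glibc branch the sketch simply forwards to glibc's own macros, to show which definitions end up in use once the kernel versions are #ifdef'd out.

```c
#include <string.h>      /* memset, used by the private definitions */
#include <sys/select.h>  /* libc's fd_set and FD_* macros; defines __GLIBC__ on glibc */

/* Assumed guard: keep the private versions when building the kernel,
   or when no glibc 2.x headers are in use. */
#if defined(__KERNEL__) || !defined(__GLIBC__) || (__GLIBC__ < 2)
#define KSKETCH_NFDBITS (8 * sizeof(unsigned long))
typedef struct {
    unsigned long fds_bits[1024 / KSKETCH_NFDBITS];
} ksketch_fd_set;
#define KSKETCH_FD_ZERO(s)      memset((s), 0, sizeof(*(s)))
#define KSKETCH_FD_SET(fd, s)   \
    ((s)->fds_bits[(fd) / KSKETCH_NFDBITS] |= 1UL << ((fd) % KSKETCH_NFDBITS))
#define KSKETCH_FD_ISSET(fd, s) \
    (((s)->fds_bits[(fd) / KSKETCH_NFDBITS] >> ((fd) % KSKETCH_NFDBITS)) & 1UL)
#else
/* User space on glibc 2.x: glibc already provides these, so the private
   versions are left out and nothing is defined twice. */
typedef fd_set ksketch_fd_set;
#define KSKETCH_FD_ZERO(s)      FD_ZERO(s)
#define KSKETCH_FD_SET(fd, s)   FD_SET((fd), (s))
#define KSKETCH_FD_ISSET(fd, s) FD_ISSET((fd), (s))
#endif
```

Either branch gives callers the same `KSKETCH_FD_*` interface, which is the point of the fix: user space built against glibc sees exactly one set of definitions.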