SOLVED: Modern kernels fail to boot on old quad processor Pentium Pro server (Updated x3)

I’ve got an ancient Olivetti Netstrada, a deskside server system with quad Pentium Pro 200MHz processors, 256MB RAM, dual power supplies and five 4GB SCSI drives.

It’s been running Ubuntu 8.04 for ages, but my partitioning layout (set up for testing ZFS-fuse long ago) meant I couldn’t upgrade it without major surgery, so I decided I’d just put Debian on it instead. That’s where I hit problems:

  1. Debian/kfreebsd (Squeeze & daily) – kernel panics very early with “panic: vm_fault: fault on nofault entry, addr: c3925000”.
  2. Debian/Linux Squeeze – CD boot loader hangs before getting to the menu.
  3. Debian Lenny – install kernel panics when uncompressing the initramfs, claiming it’s out of memory.

Fortunately the Debian Etch install CD boots and installs correctly. The only problem is that Etch is now archived and there are no updates for it.

I dist-upgraded to Lenny and found that the latest kernel there still panics on boot, but the user space is OK. Then I went to Squeeze and found that yes, the Squeeze kernel hangs very early, just after uncompressing itself and announcing that it is booting the kernel. Unfortunately the udev in Squeeze won’t work with the Etch kernel, but all that’s broken so far is bringing up the network interface, and I can do that by hand with dhclient eth0. Oh, and grub2 hangs (which I suspect is the same issue as the install CD).

I’ve tried building my own kernel using 2.6.38.3, starting with an “allnoconfig” to disable everything and only turning on the minimum necessary, but that has the same behaviour as the 2.6.32 kernel that is in Squeeze. The last thing printed to the console is:

Booting the kernel.

which is at the end of the decompress_kernel() function in arch/x86/boot/compressed/misc.c.

Does anyone have any ideas before I go and throw myself on the tender mercies of the LKML?

Update: Both Alan Cox and Ingo Molnar suggested using the earlyprintk=vga option, which I’d not stumbled across before. That revealed that the 2.6.39-rc4 kernel is misdetecting LOWMEM as 16MB rather than 256MB, which could explain a lot. It also reminded me that I’d seen this before and had an off-list conversation with H. Peter Anvin about it in 2008, which tailed off due to work pressures on his part.

Update 2: Thanks to Thomas Meyer and H. Peter Anvin it’s now known what happened – the commit message from hpa for Thomas’s patch describes it best:

When we use BIOS function e801 to probe memory, we should use ax/bx (or cx/dx) as a pair, not mix and match. This was a typo during the translation from assembly code, and breaks at least one set of machines in the field (which return cx = dx = 0).

The patch has been accepted by Linus and will be in 2.6.39!
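
To make the commit message a bit more concrete, here is a small sketch in C of what “mix and match” means. It is not the kernel’s code; the struct, the function names and the register values are all made up for illustration. It assumes the usual meaning of the e801 results as I understand them: ax is the memory between 1MB and 16MB in KB, bx is the memory above 16MB in 64KB blocks, and cx/dx are a second pair in the same units. It also assumes a 256MB machine would report ax = 15360 and bx = 3840 with cx = dx = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container for the registers returned by BIOS function e801.
 * Assumed meanings: ax/cx = KB between 1MB and 16MB,
 *                   bx/dx = 64KB blocks above 16MB. */
struct e801_regs {
    uint16_t ax, bx, cx, dx;
};

/* Mixed pair (the kind of typo the commit message describes):
 * low part taken from ax, high part taken from dx.
 * Returns KB of memory above 1MB, or -1 if the low part is implausible. */
long e801_mixed_kb(struct e801_regs r)
{
    if (r.ax > 15 * 1024)
        return -1;
    if (r.ax == 15 * 1024)
        return ((long)r.dx << 6) + r.ax;   /* dx does not belong with ax */
    return r.ax;
}

/* Consistent pair: use cx/dx together if the BIOS filled them in,
 * otherwise use ax/bx together. (Preferring cx/dx is a choice made
 * for this sketch.) */
long e801_paired_kb(struct e801_regs r)
{
    uint16_t lo = r.ax, hi = r.bx;

    if (r.cx || r.dx) {
        lo = r.cx;
        hi = r.dx;
    }
    if (lo > 15 * 1024)
        return -1;
    if (lo == 15 * 1024)
        return ((long)hi << 6) + lo;       /* 64KB blocks -> KB */
    return lo;
}

/* Hypothetical register values for a 256MB machine with cx = dx = 0. */
void e801_example_check(void)
{
    struct e801_regs r = { 15 * 1024, 3840, 0, 0 };

    assert(e801_mixed_kb(r) == 15360);     /* 15MB above 1MB: looks like 16MB */
    assert(e801_paired_kb(r) == 261120);   /* 255MB above 1MB: the full 256MB */
}
```

Under those assumptions the mixed version sees only 15MB above the first megabyte, which is consistent with the 16MB misdetection that earlyprintk showed, while the paired version accounts for all 256MB.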

Update 3: The patch is in 2.6.39-rc6 and that now successfully boots all the way to userspace with the kernel parameters “noapic scsi_mod.scan=sync”! Hooray!