[dm-devel] kernel update and dmraid causing grub errors

Heinz Mauelshagen heinzm at redhat.com
Wed Nov 3 12:04:53 UTC 2010


Hi David,

Because you can access your configuration fine with some Arch LTS
kernels, it doesn't make sense to analyze your metadata up front.
The failures may be caused by any of the following:

- an initramfs issue: dmraid not activating the ATARAID mappings properly

- missing drivers needed to access the mappings

- host protected area (HPA) handling that changed along with the kernel
  (e.g. the "Error 24: Attempt to access block outside partition");
  to test for this one, try the libata.ignore_hpa kernel parameter
  described in Documentation/kernel-parameters.txt in the kernel source
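Testing for the HPA case means appending the parameter to the kernel line
in menu.lst. Documentation/kernel-parameters.txt lists libata.ignore_hpa=0
(keep the BIOS limit, the default) and libata.ignore_hpa=1 (ignore the
limit and use the full disk). A minimal Python sketch of that edit on the
kernel line quoted below; set_kernel_param is a hypothetical helper, and
editing menu.lst by hand does the same job.

```python
def set_kernel_param(kernel_line: str, key: str, value: str) -> str:
    """Return a GRUB legacy 'kernel' line with key=value present exactly once.

    Hypothetical helper: any existing 'key' or 'key=...' word is dropped
    first, so running it twice does not stack duplicate parameters.
    """
    words = kernel_line.split()
    kept = [w for w in words if w != key and not w.startswith(key + "=")]
    return " ".join(kept + [f"{key}={value}"])


# Kernel line from the report; 1 = ignore the HPA limit and use the full disk.
line = "kernel /vmlinuz root=/dev/mapper/nvidia_baaccajap5 ro vga=0x31a"
print(set_kernel_param(line, "libata.ignore_hpa", "1"))
```

After rebooting, cat /proc/cmdline shows whether the parameter actually
reached the kernel.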

FYI: in general, dmraid doesn't rely on a particular controller, only on
the metadata signatures it discovers. You could attach the disks to
another SATA controller and still access your RAID sets.

Regards,
Heinz

On Mon, 2010-11-01 at 17:27 -0500, David C. Rankin wrote:
> dmraid devs,
> 
> 	Over the past 8-9 months, I have had numerous dmraid-related boot failures with
> the past 6-8 kernels. It seems like a Russian-roulette type problem: some
> kernels work with dmraid, some cause grub errors. The problem is most acute on
> an MSI SLI Platinum-based board (MS-7374) with a Phenom X4 (9850) and the following
> PCI bus config:
> 
> [15:48 archangel:/home/david/bugs/aa] # lspci
> 00:00.0 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a2)
> 00:01.0 ISA bridge: nVidia Corporation MCP78S [GeForce 8200] LPC Bridge (rev a2)
> 00:01.1 SMBus: nVidia Corporation MCP78S [GeForce 8200] SMBus (rev a1)
> 00:01.2 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
> 00:01.3 Co-processor: nVidia Corporation MCP78S [GeForce 8200] Co-Processor (rev a2)
> 00:01.4 RAM memory: nVidia Corporation MCP78S [GeForce 8200] Memory Controller (rev a1)
> 00:02.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
> 00:02.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
> 00:04.0 USB Controller: nVidia Corporation MCP78S [GeForce 8200] OHCI USB 1.1 Controller (rev a1)
> 00:04.1 USB Controller: nVidia Corporation MCP78S [GeForce 8200] EHCI USB 2.0 Controller (rev a1)
> 00:06.0 IDE interface: nVidia Corporation MCP78S [GeForce 8200] IDE (rev a1)
> 00:07.0 Audio device: nVidia Corporation MCP72XE/MCP72P/MCP78U/MCP78S High Definition Audio (rev a1)
> 00:08.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
> 00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
> 00:0a.0 Ethernet controller: nVidia Corporation MCP77 Ethernet (rev a2)
> 00:10.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
> 00:12.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Express Bridge (rev a1)
> 00:13.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
> 00:14.0 PCI bridge: nVidia Corporation MCP78S [GeForce 8200] PCI Bridge (rev a1)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
> 00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
> 01:06.0 Serial controller: 3Com Corp, Modem Division 56K FaxModem Model 5610 (rev 01)
> 01:09.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
> 02:00.0 VGA compatible controller: nVidia Corporation G92 [GeForce 8800 GT] (rev a2)
> 04:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
> 04:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 03)
> 
> full dmidecode information at:
>   http://www.3111skyline.com/dl/Archlute/bugs/aa-dmidecode.txt

Not accessible.

> 
> 	Booting the current Arch Linux kernel (2.6.35.8-1) fails and the boot hangs at
> the very start. The kernel line I use hasn't changed in a long time:
> 
>   kernel /vmlinuz root=/dev/mapper/nvidia_baaccajap5 ro vga=0x31a
> 
> 	Booting first stopped with the following error:
> 
> Booting 'Arch Linux on Archangel'
> 
> root (hd1,5)
>   Filesystem type is ext2fs, Partition type 0x83
> Kernel /vmlinuz26 root=/dev/mapper/nvidia_baacca_jap5 ro vga=794
> 
> Error 24: Attempt to access block outside partition
> 
> Press any key to continue...
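The kernel line quoted above and the one GRUB echoed at boot are not
character-for-character identical: /vmlinuz vs /vmlinuz26,
nvidia_baaccajap5 vs nvidia_baacca_jap5, and vga=0x31a vs vga=794. The vga
values select the same mode, since 0x31a = 3*256 + 1*16 + 10 = 794; the
other two may just be transcription slips, but they are cheap to rule out
against menu.lst. A minimal Python sketch of that comparison; kernel_args,
vga_mode and differences are hypothetical helpers, not part of GRUB or
dmraid.

```python
def kernel_args(line: str) -> dict:
    """Split a GRUB legacy 'kernel' line into its image path and options.

    words[0] is the 'kernel' command itself and words[1] the image path;
    bare options such as 'ro' map to True.
    """
    words = line.split()
    args = {"image": words[1]}
    for word in words[2:]:
        key, sep, value = word.partition("=")
        args[key] = value if sep else True
    return args


def vga_mode(value: str) -> int:
    """vga= appears as 0x31a in one line and 794 in the other; base 0 parses both."""
    return int(value, 0)


def differences(a: dict, b: dict) -> list:
    """Sorted option names whose values differ, comparing vga= numerically."""
    diffs = []
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key), b.get(key)
        if key == "vga" and isinstance(va, str) and isinstance(vb, str):
            va, vb = vga_mode(va), vga_mode(vb)
        if va != vb:
            diffs.append(key)
    return diffs


configured = kernel_args("kernel /vmlinuz root=/dev/mapper/nvidia_baaccajap5 ro vga=0x31a")
echoed = kernel_args("Kernel /vmlinuz26 root=/dev/mapper/nvidia_baacca_jap5 ro vga=794")
for key in differences(configured, echoed):
    print(f"{key}: {configured.get(key)!r} vs {echoed.get(key)!r}")
```

For these two lines it reports only image and root; vga drops out because
both spellings parse to mode 794.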
> 
> 	Upgrading to device-mapper-2.02.75-1 completely changes the error to:
> 
> Error 5: Partition table invalid or corrupt
> 
> 	Rebooting into 2.6.35.7-1 or 2.6.32.25-1 (the Arch LTS kernel) works just fine,
> so the problem is not with the partition or the partition table. The Arch Linux
> developer (Tobias Powalowski) referred me here because the problem isn't a kernel
> problem but something strange happening with dmraid.
> 
> 	The only guess I have is that it is a dmraid/GeForce controller issue that is
> triggered when dmraid loads under certain circumstances.
> 
> 	This box has 2 dmraid arrays:
> 
> [17:15 archangel:/home/david/bugs/aa] # dmraid -r
> /dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
> /dev/sda: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
> /dev/sdb: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
> /dev/sdc: nvidia, "nvidia_fdaacfde", mirror, ok, 976773166 sectors, data@ 0
> 
> [17:15 archangel:/home/david/bugs/aa] # dmraid -s
> *** Active Set
> name   : nvidia_baaccaja
> size   : 1465149056
> stride : 128
> type   : mirror
> status : ok
> subsets: 0
> devs   : 2
> spares : 0
> *** Active Set
> name   : nvidia_fdaacfde
> size   : 976773120
> stride : 128
> type   : mirror
> status : ok
> subsets: 0
> devs   : 2
> spares : 0
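The two listings above can be cross-checked mechanically: each set reported
by dmraid -s should have as many member disks in dmraid -r as its devs
count, every status should be ok, and each half of a mirror should be at
least as large as the set. A minimal Python sketch, assuming the output
keeps exactly the layout shown here (other dmraid versions or metadata
formats may print it differently); parse_dmraid_r, parse_dmraid_s and check
are hypothetical names.

```python
import re
from collections import defaultdict

# One member line of 'dmraid -r', in the layout shown in the report:
# /dev/sdd: nvidia, "nvidia_baaccaja", mirror, ok, 1465149166 sectors, data@ 0
MEMBER = re.compile(
    r'^(?P<dev>\S+): (?P<fmt>\w+), "(?P<name>[^"]+)", (?P<level>\w+), '
    r'(?P<status>\w+), (?P<sectors>\d+) sectors, data@ (?P<offset>\d+)$'
)


def parse_dmraid_r(text: str) -> dict:
    """Group member disks by set name: {set: [(device, sectors, status), ...]}."""
    members = defaultdict(list)
    for line in text.splitlines():
        m = MEMBER.match(line.strip())
        if m:
            members[m["name"]].append((m["dev"], int(m["sectors"]), m["status"]))
    return dict(members)


def parse_dmraid_s(text: str) -> dict:
    """Turn '*** Active Set' blocks of 'key : value' lines into {set: {key: value}}."""
    sets, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("***"):
            current = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            current[key] = value
            if key == "name":
                sets[value] = current
    return sets


def check(members: dict, sets: dict) -> list:
    """Cross-check both listings and return a list of findings (empty if none)."""
    problems = []
    for name, info in sets.items():
        disks = members.get(name, [])
        if len(disks) != int(info["devs"]):
            problems.append(f"{name}: -s reports {info['devs']} devs, -r lists {len(disks)}")
        if info["status"] != "ok" or any(status != "ok" for _, _, status in disks):
            problems.append(f"{name}: status not ok")
        if info["type"] == "mirror":
            # Each half of a mirror holds the whole set, so it must be at least set-sized.
            if len({sectors for _, sectors, _ in disks}) > 1:
                problems.append(f"{name}: mirror members differ in size")
            for dev, sectors, _ in disks:
                if sectors < int(info["size"]):
                    problems.append(f"{name}: {dev} has {sectors} sectors, set needs {info['size']}")
    return problems
```

Feed it the text of both commands with the leading "> " quote markers
removed, e.g. check(parse_dmraid_r(r_text), parse_dmraid_s(s_text)); for
the output quoted here it returns an empty list.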
> 
> 	All disks check out fine with smartctl, so it isn't a disk-hardware problem.
> The detailed information on the GeForce controller (lspci -vv) is:
> 
> 00:09.0 RAID bus controller: nVidia Corporation MCP78S [GeForce 8200] SATA Controller (RAID mode) (rev a2)
>         Subsystem: Micro-Star International Co., Ltd. Device 7374
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0 (750ns min, 250ns max)
>         Interrupt: pin A routed to IRQ 28
>         Region 0: I/O ports at b080 [size=8]
>         Region 1: I/O ports at b000 [size=4]
>         Region 2: I/O ports at ac00 [size=8]
>         Region 3: I/O ports at a880 [size=4]
>         Region 4: I/O ports at a800 [size=16]
>         Region 5: Memory at f9e76000 (32-bit, non-prefetchable) [size=8K]
>         Capabilities: [44] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [8c] SATA HBA v1.0 InCfgSpace
>         Capabilities: [b0] MSI: Enable+ Count=1/8 Maskable- 64bit+
>                 Address: 00000000fee0f00c  Data: 4191
>         Capabilities: [ec] HyperTransport: MSI Mapping Enable+ Fixed+
>         Kernel driver in use: ahci
>         Kernel modules: ahci
> 
> 
>     Basically, I'm stumped here. Nothing has changed with this box in over a
> year (same grub menu.lst, same hardware). The only oddity is that 4 of the
> last 6 or so kernels have failed to boot with this weird grub error, which has
> nothing to do with grub (because it boots all other kernels fine) but is
> something that results from dmraid and the way it gets initialized (which I'm
> clueless about).
> 
>     Let me know what you think and what data or testing you want from me;
> I'll be happy to do it. I last filed this bug with Arch against 2.6.35-1,
> and the problem was never fixed, only (solved) by upgrading to the next
> (testing) kernel, so the actual cause was never found. The URL of the closed
> report is:
> 
> https://bugs.archlinux.org/task/20918?
> 
>     Thanks for any ideas or help you can give.
> 