Ticket #1905 (closed defect: fixed)
Field Return: flash corruption - OpenFirmware complaining of 'unknown node type 2006'.
|Reported by:||dwmw2||Owned by:||wad|
|Priority:||high||Milestone:||8.2.0 (was Update.2)|
|Keywords:||Cc:||wmb@…, dwmw2, jg, gary, Luna.Huang@…, Elvis.Wu@…|
|Deployments affected:||Blocked By:|
Description (last modified by kimquirk)
A B2 machine was handed to me which failed to boot from NAND, with OpenFirmware complaining of 'unknown node type 2006'.
This is a somewhat bogus diagnostic message from OpenFirmware. It _does_ understand the node type 0x2006, which is a summary node. It's just that these ones have bad CRCs.
There seems to have been corruption on the write path, between CPU, RAM and CAFÉ. An example...
01fdb310 30 17 eb 15 21 00 00 3b 85 19 01 e0 36 00 00 00 |0...!..;....6...|
01fdb320 a4 e1 55 df 60 05 00 80 3d 06 00 00 3f 06 00 00 |..U.`...=...?...|
01fdb330 a1 a5 0a 46 0e 08 00 00 06 7e be ae 18 18 99 b3 |...F.....~......|
01fdb340 70 69 6e 6b 5f 72 6f 75 6e 64 2e 67 69 66 ff ff |pink_round.gif..|
This provokes the following report from the kernel:
JFFS2 notice: (2554) read_direntry: header CRC failed on dirent node at 0x1fdb318: read 0xaebe7e06, calculated 0x1432b5ee
The parent inode value of 0x80000560 looks very suspicious. Flipping the MSB of the byte at 01fdb327 back to a more reasonable 0x00 makes the calculated crc32 match what's on the flash.
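For anyone who wants to repeat that check, here is a rough Python sketch that decodes the dirent at 0x1fdb318 from the bytes quoted above and undoes the suspected bit flip. The field layout is taken from struct jffs2_raw_dirent in the kernel's jffs2.h. The names (DUMP, parse_dirent, jffs2_crc) are made up for illustration, and the CRC convention (seed 0, no final inversion) is an assumption about how JFFS2 calls crc32().

```python
import struct
import zlib

# The 64 bytes quoted in the hexdump, which starts at flash offset 0x01fdb310.
DUMP_BASE = 0x01FDB310
DUMP = bytes.fromhex(
    "3017eb152100003b851901e036000000"
    "a4e155df600500803d0600003f060000"
    "a1a50a460e080000067ebeae181899b3"
    "70696e6b5f726f756e642e676966ffff"
)

NODE_OFFSET = 0x01FDB318  # node address from the kernel message
BAD_BYTE = 0x01FDB327     # byte singled out in this ticket

# struct jffs2_raw_dirent header (little-endian): 40 bytes, then the name.
DIRENT_FMT = "<HH6IBB2sII"
DIRENT_FIELDS = ("magic", "nodetype", "totlen", "hdr_crc", "pino", "version",
                 "ino", "mctime", "nsize", "type", "unused",
                 "node_crc", "name_crc")


def jffs2_crc(data):
    """CRC32 with seed 0 and no final inversion (assumed JFFS2 convention)."""
    return zlib.crc32(data, 0xFFFFFFFF) ^ 0xFFFFFFFF


def parse_dirent(node):
    """Decode the fixed-size dirent header and the name that follows it."""
    hdr_len = struct.calcsize(DIRENT_FMT)
    fields = dict(zip(DIRENT_FIELDS, struct.unpack_from(DIRENT_FMT, node)))
    fields["name"] = node[hdr_len:hdr_len + fields["nsize"]]
    return fields


node = DUMP[NODE_OFFSET - DUMP_BASE:]
as_read = parse_dirent(node)

# Undo the suspected flip: toggle the top bit of the byte at 0x01fdb327.
patched = bytearray(node)
patched[BAD_BYTE - NODE_OFFSET] ^= 0x80
repaired = parse_dirent(bytes(patched))

# node_crc covers the header minus its two trailing CRC words (40 - 8 bytes).
covered = struct.calcsize(DIRENT_FMT) - 8

print("name      : %s" % as_read["name"].decode("ascii"))
print("pino      : %#010x -> %#010x" % (as_read["pino"], repaired["pino"]))
print("node_crc  : %#010x (stored)" % as_read["node_crc"])
print("crc as-is : %#010x" % jffs2_crc(node[:covered]))
print("crc fixed : %#010x" % jffs2_crc(bytes(patched[:covered])))
```

If the analysis above is right and the CRC assumption holds, "crc fixed" should equal the stored node_crc, and "crc as-is" should be the value the kernel reported as calculated.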
There are no ECC errors reported: what's on the flash seems to be what reached the CAFÉ in the DMA transfer when this block was being written. So this doesn't seem to be an error between CAFÉ and NAND. And the stored crc32 is sane too (it matches the data with the flipped bit restored), so the node was intact when its CRC was calculated, which makes memory corruption or program error seem unlikely. I suspect hardware.
I'll look at other nodes (there are many broken ones) and see if there's a pattern to the corruption.
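One way to hunt for such a pattern, sketched here rather than taken from any existing tool: for each broken node, try every single-bit flip in the CRC-covered bytes and note which ones make the CRC match the stored value. The function names are hypothetical, and the CRC convention (seed 0, no final inversion) is again an assumption about JFFS2.

```python
import zlib


def jffs2_crc(data):
    """CRC32 with seed 0 and no final inversion (assumed JFFS2 convention)."""
    return zlib.crc32(data, 0xFFFFFFFF) ^ 0xFFFFFFFF


def single_bit_repairs(covered, stored_crc):
    """Return (byte_index, bit) pairs whose flip makes the CRC match.

    covered    -- the bytes the stored CRC was computed over
    stored_crc -- the CRC value read back from flash
    """
    hits = []
    buf = bytearray(covered)
    for index in range(len(buf)):
        for bit in range(8):
            buf[index] ^= 1 << bit      # flip one bit
            if jffs2_crc(bytes(buf)) == stored_crc:
                hits.append((index, bit))
            buf[index] ^= 1 << bit      # restore it before moving on
    return hits


# Synthetic demonstration: corrupt a known-good buffer, then find the flip.
good = bytes(range(32))
crc = jffs2_crc(good)
bad = bytearray(good)
bad[15] ^= 0x80                          # flip bit 7 of byte 15
print(single_bit_repairs(bytes(bad), crc))
```

Collecting the (byte_index, bit) hits together with each node's flash offset would show whether the flips cluster on one data line or at particular address alignments.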