Ticket #5391 (closed defect: fixed)

Opened 7 years ago

Last modified 7 years ago

Q2D05/6 bricks machines with bad RTC data

Reported by: wmb@… Owned by: wmb@…
Priority: blocker Milestone: Ship.2
Component: ofw - open firmware Version: 1.0-firmware-Q2D05
Keywords: Cc:
Action Needed: Verified: no
Deployments affected: Blocked By:
Blocking:

Description

Install Q2D05 or Q2D06 firmware. Remove all power including the RTC battery. Wait a couple of minutes for the RTC to discharge. Reinstall the RTC battery and reconnect power. Turn on the machine. Congratulations, you have a brick. The screen won't even come on.

If you have a serial console, you can recover as follows:

The last line on the serial console is Page Fault

Do this:

ok probe-pci probe-usb

Now you can reload the firmware from one of the usual sources with the "flash" command.

The root cause of this problem is a deficiency in the "factory-mode" code added by svn 736. If the month value in the RTC is 0 (which is not in the valid range 1..12), the routine that converts the date to the "seconds since 1970" format accesses outside the valid range of the days per month table and faults.

The fix is to force that value to be within the range 1-12 before accessing the table.
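For illustration only, here is a minimal sketch of that kind of fix. It is written in C rather than the Forth used by Open Firmware, and it is not the code from svn 752. The names rtc_to_unix_seconds, clamp, and days_before_month are hypothetical, and the cumulative-days table layout is an assumption about how such a conversion is typically written. The sketch also assumes year >= 1970 and does not validate the other RTC fields.

```c
#include <stdint.h>

/* Days elapsed before the first of each month in a non-leap year.
 * Index 0 is January. Indexing with (month - 1) when month == 0
 * would read outside this table, which is the class of fault
 * described in this ticket. */
static const int days_before_month[12] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Force a value into the inclusive range [lo, hi]. */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Hypothetical helper: convert RTC fields to seconds since
 * 1970-01-01 00:00:00. Assumes year >= 1970. */
int64_t rtc_to_unix_seconds(int year, int month, int day,
                            int hour, int min, int sec)
{
    int64_t days = 0;
    int y;

    /* The fix: force the month into 1..12 before it is used
     * as a table index. */
    month = clamp(month, 1, 12);

    /* Whole years since 1970. */
    for (y = 1970; y < year; y++)
        days += is_leap(y) ? 366 : 365;

    /* Whole months in the current year, plus the leap day if it
     * has already passed. */
    days += days_before_month[month - 1];
    if (month > 2 && is_leap(year))
        days += 1;

    days += day - 1;

    return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

With the clamp in place, a month value of 0 from a discharged RTC is treated as January, so the result is a wrong but harmless date instead of a fault.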

This is a very bad bug and I will issue a new version with a fix immediately.

Change History

Changed 7 years ago by wmb@…

Fixed by svn 752. Will be released as q2d07.

This release needs to be deployed as soon as possible. In manufacturing, it is critical to use this instead of Q2D05 or Q2D06. In the field, it can wait until the next convenient update time. If the machine has Q2D04 or earlier, this bug will not be present. If the machine has Q2D05 or Q2D06, everything will be okay so long as the RTC does not lose its data. The only ways in which the bug would manifest on a field machine would be if

a) The RTC battery were to be removed (requires system disassembly) or to fail (should not happen for several years).

or

b) Someone explicitly sets the RTC month field to 0 (not easy to do from Linux; could be done from OFW if you went out of your way).

Changed 7 years ago by jg

  • milestone changed from Never Assigned to Ship.2

Changed 7 years ago by wmb@…

  • status changed from new to closed
  • resolution set to fixed

Changed 7 years ago by gnu

Turns out that some early mass-production motherboards contained bad clock-battery holders (due to a worn plastic mold), which resulted in some batteries popping out during shipping, e.g. when a box was dropped. The resulting loss of RTC date caused a certain amount of DOA bricking, *without* anyone doing disassembly of units.
