Ticket #11089 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

XO-1.75 B1 brick due to efface-md failure to flash-open (after RTC power fail or sufficient elapsed time)

Reported by: Quozl
Owned by: greenfeld
Priority: high
Milestone: 1.75-firmware
Component: ofw - open firmware
Version: 1.75-B1
Keywords:
Cc:
Action Needed: no action
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description (last modified by Quozl)

An XO-1.75 B1 running Q4B05 with an "md" tag will brick if the RTC loses power, or if enough time has elapsed since the "md" tag was created in the factory.

Diagnosis: the OpenFirmware serial port shows an "ok" prompt after the error "Unsupported SPI FLASH ID". Running flash-open manually at this point shows no error.

Workaround: flash q4b05jt (svn 2402) or later.

Prevention: manually change the "md" tag to an "MD" tag before the time has elapsed or the RTC loses power.

Attachments

sku199-1.log (1.7 kB) - added by Quozl 3 years ago.
firmware boot prior to removing RTC battery
sku199-2.log (1.5 kB) - added by Quozl 3 years ago.
firmware boot after removing RTC battery, note error, md tag, and reset clock.
sku199-3.log (5.0 kB) - added by Quozl 3 years ago.
firmware boot after removing RTC battery, note error, md tag, and reset clock, then reflash to q4b02d, then boot, then note MD tag, then reflash to Q4B05, then boot.
svn2396.log (1.3 kB) - added by Quozl 3 years ago.
With SVN 2396, the SPI ID read is 00, and CForth reports that OFW has not fully started.

Change History

Changed 3 years ago by wmb@…

In initial bringup, I had problems with SPI initialization - sometimes it would work and sometimes it wouldn't. This was in the CForth realm, which has to access the SPI FLASH several times. I suspect that this may be a similar problem. My guess is that the SPI interface hardware might have an extra byte stuck in the FIFO, confusing the SPI device identification code. I'm not sure why it works with q4b02d and fails with q4b05, but I suspect that bisection is not the best approach. My plan is to debug spi-start in the failing case and check the status of the SSP hardware, looking for initialization glitches.
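
As a toy illustration of the guess above, and not a model of the real SSP hardware, a single stale byte at the head of a receive FIFO is enough to shift every byte of an ID read. The class name, the drain step, and the three ID bytes are all hypothetical; none of them come from OpenFirmware or from the logs attached to this ticket.

```python
from collections import deque

# Hypothetical three-byte device ID; placeholder values, not a real chip's ID.
EXPECTED_ID = bytes([0x11, 0x22, 0x33])


class ToyRxFifo:
    """Minimal first-in, first-out byte queue standing in for a receive FIFO."""

    def __init__(self):
        self._bytes = deque()

    def push(self, data):
        # Append each byte of `data` to the tail of the queue.
        self._bytes.extend(data)

    def pop(self, count):
        # Remove and return `count` bytes from the head of the queue.
        return bytes(self._bytes.popleft() for _ in range(count))

    def drain(self):
        # Discard anything left over from an earlier transfer.
        self._bytes.clear()


def read_id(fifo, drain_first):
    """Simulate an ID read: the chip's reply lands behind whatever is queued."""
    if drain_first:
        fifo.drain()
    fifo.push(EXPECTED_ID)
    return fifo.pop(len(EXPECTED_ID))


# Case 1: one stale byte is left in the FIFO before the ID read.
stale_fifo = ToyRxFifo()
stale_fifo.push(b"\x00")
stale_result = read_id(stale_fifo, drain_first=False)

# Case 2: same starting state, but the FIFO is drained first.
clean_fifo = ToyRxFifo()
clean_fifo.push(b"\x00")
clean_result = read_id(clean_fifo, drain_first=True)

print(stale_result.hex(), clean_result.hex())
```

In the first case the ID comes back shifted by one byte, which an identification routine would reject as unknown. Whether this is what actually happened on the hardware is exactly what the debugging plan above set out to determine.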

Changed 3 years ago by Quozl

Another report in #11096.

Changed 3 years ago by wmb@…

svn 2400 may fix this problem. It turns out that calling ssp-spi-start was confusing the SPI FLASH chip, but only under strange circumstances such as when find-drop-in was being called in a nested fashion, once to read a compressed dropin and once to read the inflater to inflate it.

Changed 3 years ago by Quozl

  • next_action changed from diagnose to test in build

Provided build q4b05jt (svn 2402) for testing by Quanta.

Changed 3 years ago by Quozl

  • description modified
  • summary changed from "XO-1.75 B1 brick after RTC power fail due to efface-md failure to flash-open" to "XO-1.75 B1 brick due to efface-md failure to flash-open (after RTC power fail or sufficient elapsed time)"

Changed 3 years ago by wmb@…

  • owner changed from wmb@… to greenfeld
  • next_action changed from test in build to test in release

Deployed in q4b06

Changed 3 years ago by Quozl

Possible test plan:

  • flash with Q4B06,
  • set md tag to 20110721T020256Z,
  • set RTC to nine days later,
  • power off,
  • test that power on and boot is normal, and that the md tag has remained md,
  • wait a day or two,
  • test that power on and boot is normal, and that the md tag has changed to MD.

Repeat test with an RTC battery removal after the md tag is set.
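
The RTC values for the checkpoints in this plan can be worked out with a short script. This is only a convenience sketch: the helper name is hypothetical, the timestamp layout is inferred from the sample md value in the plan, and the expected tags are simply those the plan states.

```python
from datetime import datetime, timedelta

# Layout inferred from the sample value 20110721T020256Z in the test plan.
MD_FORMAT = "%Y%m%dT%H%M%SZ"
MD_VALUE = "20110721T020256Z"


def rtc_value(days_after_md):
    """Return the timestamp that lies `days_after_md` days after the md value."""
    md_time = datetime.strptime(MD_VALUE, MD_FORMAT)
    return (md_time + timedelta(days=days_after_md)).strftime(MD_FORMAT)


# (step, days after the md value, tag the plan expects to find)
checkpoints = [
    ("RTC set nine days later, first power on", 9, "md"),
    ("after waiting one more day", 10, "MD"),
    ("after waiting two more days", 11, "MD"),
]

for step, days, expected_tag in checkpoints:
    print(f"{rtc_value(days)}  expect {expected_tag}  ({step})")
```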

Changed 3 years ago by gnu

The lowercase "md" tag isn't documented at http://wiki.laptop.org/go/Manufacturing_data - and in particular there's no hint that the system will refuse to boot if it's not properly set. (There's a "Req" field for manufacturing data fields that are required for proper operation of the laptop.)

And am I correctly inferring that during boot-up, OFW is looking for "md" and changing it to "MD"? That seems a bit odd/dangerous to be doing at the very first user boot.

Changed 3 years ago by Quozl

  • status changed from new to closed
  • next_action changed from test in release to no action
  • resolution set to fixed

Yes, the tag was not documented. I've added an entry to the table now. I'm not sure the documentation is needed, as I think it is a tag that should not escape the manufacturing process. There are other tags that are deleted before shipping.

The tag is not required for proper operation. There will have been no hint about refusing to boot, since the root cause was a defect in OpenFirmware.

Yes, during the first boot that occurs at least ten days after the value in the md tag, OpenFirmware will change it. The code that does that is between lines 1293 and 1302, see source.
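
The ten-day rule can be sketched as follows. This is a Python illustration, not the Forth code referenced above: the function names are hypothetical, the timestamp format is inferred from the sample md value in the test plan, and treating exactly ten days as expired is an assumption. The sketch does not model what happens when the RTC loses power.

```python
from datetime import datetime, timedelta, timezone

# "at least ten days after the value in the md tag", per the comment above.
GRACE_PERIOD = timedelta(days=10)

# Layout inferred from the sample md value 20110721T020256Z.
MD_FORMAT = "%Y%m%dT%H%M%SZ"


def parse_md(value):
    """Parse an md-style timestamp string into an aware UTC datetime."""
    return datetime.strptime(value, MD_FORMAT).replace(tzinfo=timezone.utc)


def grace_expired(md_value, rtc_now):
    """Return True once the RTC is at least GRACE_PERIOD past the md value."""
    return rtc_now - parse_md(md_value) >= GRACE_PERIOD


md_value = "20110721T020256Z"
nine_days = parse_md("20110730T020256Z")
ten_days = parse_md("20110731T020256Z")

print(grace_expired(md_value, nine_days))  # still within the grace period
print(grace_expired(md_value, ten_days))   # grace period over
```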

Yes, it is a bit dangerous, but it is necessary to provide a grace period for the manufacturer to avoid the deployment security system.

I'm satisfied with the testing so far, so I'm closing this ticket.
