Ticket #5422 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Pending firmware update apparently prevents boot until AC power is applied

Reported by: gnu
Owned by: wmb@…
Priority: normal
Milestone: Update.1
Component: ofw - open firmware
Version:
Keywords: release?
Cc: cscott, rharrison, tomeu, rafael, Eben
Action Needed:
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

B4, 650, Q2D05.

I was running a ship.1 candidate and upgraded to 650 with olpc-update. That worked. Upon reboot, the machine did not upgrade its firmware, because it wasn't running in secure mode.

On my next reboot, I held the X key to get a secure reboot. The result is that the laptop would not boot: It noticed the new firmware, noticed that the machine had no AC power, and decided to do the worst thing possible. It complained, left the message up for ten seconds and then powered itself off.

It could have just ignored the firmware and booted up the existing OS. It could have ignored the firmware and booted the backup OS. It could have installed the firmware while running on batteries (the battery was >80% full). No -- it bricked the machine until AC arrived. (Perhaps this was the "security over usability" tradeoff desired. If so, the tradeoff should be re-evaluated.)

(People sold 9V battery power plugs for old Macintoshes, that would fool the computer into thinking it had AC power. This allowed suspending then swapping the battery, without crashing. I suspect the same trick would work on the OLPC, as a circumvention.)

(Because my machine isn't DRM'd, I could just power it back on without holding the "X" key, and it booted fine. And because I was close to my charger and working AC power, I was able to plug it in and power it on, at which point it upgraded the firmware to Q2D06. This bug is being reported for the people who don't have either option handy.)

[I suggest curing this by testing MP hardware's ability to correctly do firmware updates on battery power, and relaxing that restriction if possible.]

Change History

  Changed 6 years ago by jg

  • cc cscott added
  • keywords security DRM removed

Here's the problem, John....

The one thing that can "brick" a machine is interrupting a firmware update.

So we require the presence of a battery *and* a power supply before proceeding to reflash the firmware. This is to prevent a failure that literally causes the machine to be permanently unusable (short of replacing the SPI flash).
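As a rough illustration of that precondition, here is a minimal sketch in Python. It is not OFW's actual code (OFW is written in Forth); the `PowerState` type and the `safe_to_reflash` name are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    """Hypothetical snapshot of the laptop's power sources."""
    battery_present: bool
    ac_present: bool


def safe_to_reflash(power: PowerState) -> bool:
    """Allow a reflash only with redundant power: battery *and* AC supply.

    If either source is missing, losing the other mid-write could
    interrupt the update, which is the failure the rule guards against.
    """
    return power.battery_present and power.ac_present
```

In the case reported here (battery installed, no AC), `safe_to_reflash(PowerState(battery_present=True, ac_present=False))` is false, so the reflash is refused.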

Now, one might argue that it is a bug that the system did not boot anyway and defer the firmware update until the next opportunity...

For safety's sake, it may need to reboot into the "old" version of the system, as running new code against old firmware may cause major headaches under some circumstances.

  Changed 6 years ago by cscott

  • summary changed from "Pending signed firmware update kills DRM'd machine til AC power is applied" to "Pending firmware update apparently prevents boot until AC power is applied"

It doesn't kill the machine. You can use the 'alt-boot' key (O) to bypass the firmware upgrade.

Arguably we need a much better message for this case, translated into many languages.

There's another bug filed about enabling firmware upgrade in 'insecure boot' mode. That might also mitigate the problem somewhat.

Please don't drag "DRM" into it -- this particular problem is entirely unrelated. We can't safely upgrade firmware without redundant power sources, and we can't safely boot into an upgrade without doing the firmware reflash, since there may be dependencies. The fact that this problem doesn't appear on a developer key'ed machine is an irrelevant artifact, and a bug in its own right (trac #5371).

Future versions of the XO will hopefully remove the hardware problem which makes firmware reflash so unsafe (trac #5314).

  Changed 6 years ago by jg

  • milestone changed from Never Assigned to Update.1

Seems like the OFW message might be more informative, or we'll have support problems. Worth fixing in a future version of OFW.

follow-up: ↓ 5   Changed 6 years ago by rharrison

  • cc rharrison added

Would something along the lines of "Connect AC power and type boot to continue" be a better option?

in reply to: ↑ 4   Changed 6 years ago by tomeu

  • cc tomeu added

Replying to rharrison:

Would something along the lines of "Connect AC power and type boot to continue" be a better option?

Just wanted to point out that "AC power" is a technical term that most people don't understand. Perhaps we should instead use a less correct but more universally known term?

  Changed 6 years ago by RafaelOrtiz

  • cc rafael added

Speaking from the Spanish side of things, "AC power" is known, but if there is a more universal term I would like to know it so I can translate it.

  Changed 6 years ago by Eben

This kind of process needs to be as friendly as possible. The kids don't know what firmware is, what re-flashing means, or perhaps even why it's happening. What we really need is a simple translated message such as "Please connect the laptop to power" with an image of the supplied AC adapter so that reading the message isn't even necessary. This should be the only icon/text on screen, centered and large, so that there is no confusion about what is required.

Ideally, the user wouldn't have to do anything else but connect the power. The laptop should detect this and continue the boot without further interaction. If this isn't possible, we could simply add "then press Enter" to the message, but requiring the kids to type "boot" just seems wrong.
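The "connect power and the boot continues by itself" behaviour described above could be sketched roughly as follows. This is an illustrative Python sketch only, not firmware code: `wait_for_power`, `ac_present`, `show_message`, and `max_polls` are hypothetical names, and the message text is the one proposed in this comment.

```python
import time
from typing import Callable, Optional


def wait_for_power(
    ac_present: Callable[[], bool],
    show_message: Callable[[str], None],
    poll_seconds: float = 0.5,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Show one friendly message, then poll until AC power appears.

    Returns True as soon as power is detected, so the caller can carry on
    without any typing. Returns False if max_polls is set and exhausted,
    in which case the caller could fall back to booting without the update.
    """
    show_message("Please connect the laptop to power")
    polls = 0
    while not ac_present():
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)
        polls += 1
    return True
```

Passing the power check and the display routine in as callables keeps the sketch independent of any real hardware interface.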

Are there other similar issues in the boot/update process which expose information to the user and/or ask for their input? If so, it would be helpful to have those working on these processes create a ticket enumerating the details of when and why, so the design team can suggest visual and interaction designs to make this a straightforward experience for the kids. Thanks!

  Changed 6 years ago by Eben

  • cc Eben added

  Changed 6 years ago by gnu

The team is agreed on a desired resolution, but it isn't implemented yet:

<jg> When I was in au, I found that OFW wouldn't continue to try to boot if it wasn't on power and the firmware needed updating....
<jg> did this get resolved?
<jg> it was logistically a PITA to have to plug in machines to get them fully upgraded.

<cjb> hm, it certainly got to Mitch, who was convinced that the status quo wouldn't work.
<cjb> smithbone_: any idea on that?
<cjb> I don't see anything likely in the SVN log for OFW.

<smithbone_> cjb: I think it was me who actually had the most resistance.

<gnu{-> jg: not fixed yet. It's http://dev.laptop.org/ticket/5422

<smithbone_> If you allow re-flashing without power then you will get bricks
<smithbone_> redundant power that is.

<cjb> smithbone_: so let's not do that, but let's not *refuse to boot* when there is an update pending and we don't have power.

<jg> that's my point.

<smithbone_> cjb: Yes. I supported that. I think that's already in OFW head.

<cjb> I thought that was the consensus view we came up with. Failing to apply an update and booting anyway is less bad than trying to apply an update on bad power.

<cjb> smithbone_: ok. we need it in a release. also, I don't see it in the SVN log.

<jg> the first time the system is booted on power, it gets reflashed.

<smithbone_> Hmm...

<jg> smithbone_: can you follow up with Mitch?

<smithbone_> jg: I will.

<cjb> jg: Thanks for remembering about that.

<jg> 'cause I found that a real headache just updating 100 machines; much less 40K.
<jg> of course, with 40k you can afford to set up updating stations, so it may not be all that bad.
<jg> not as bad as I experienced.
<jg> so I don't think we should hold things up on this change, but we should see it gets done soon.

<smithbone_> If we accept the fact that unsafe flashing can brick then we can remove the hard requirement. But I think the current view was that if it saves one child's laptop from getting bricked then it's worth the extra work.

<jg> smithbone_: no, we just delay the reflash until you boot on power.
<jg> of course, some of our stuff really wants the later firmware.

<cjb> smithbone_: We were all convinced by your argument that it is unsafe. That's your call. What's up for debate is what happens when there is an update pending and we don't have power -- do we boot anyway, or do we crash to the ok prompt.
<cjb> And I heard everyone agreeing that we boot anyway.

<jg> boot anyway.
<jg> yes.
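The policy agreed in the log above can be summarised as a small decision function. This is only a sketch in Python of the consensus as described in this ticket, not the OFW implementation; `BootAction` and `decide_boot_action` are hypothetical names.

```python
from enum import Enum


class BootAction(Enum):
    BOOT_NORMALLY = "no update pending: boot normally"
    REFLASH_THEN_BOOT = "redundant power available: reflash, then boot"
    BOOT_SKIP_UPDATE = "no redundant power: skip the update and boot anyway"


def decide_boot_action(update_pending: bool,
                       battery_present: bool,
                       ac_present: bool) -> BootAction:
    """Never reflash without redundant power, but never refuse to boot."""
    if not update_pending:
        return BootAction.BOOT_NORMALLY
    if battery_present and ac_present:
        return BootAction.REFLASH_THEN_BOOT
    # The update stays pending and is applied the first time the
    # system is booted with both power sources available.
    return BootAction.BOOT_SKIP_UPDATE
```

Under this sketch the originally reported case (update pending, battery only) yields `BOOT_SKIP_UPDATE` rather than a power-off.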

follow-up: ↓ 11   Changed 6 years ago by cscott

I thought the agreement was that we *alt* boot anyway. That way we're not trying to run a new version of the OS without the firmware it may need.

But if we're committed to supporting all existing OFW versions in all future kernels, then "boot anyway" might be reasonable.

On another note, I'd like to get some of OFW's messages translated into (say) five languages. "Security failure" and "AC not present" are the ones I can think of that could stand to be more friendly. Mitch might know others (or he might prefer to get icons for all such messages).

in reply to: ↑ 10   Changed 6 years ago by Eben

Replying to cscott:

On another note, I'd like to get some of OFW's messages translated into (say) five languages. "Security failure" and "AC not present" are the ones I can think of that could stand to be more friendly. Mitch might know others (or he might prefer to get icons for all such messages).

I commented similarly above, noting that the user experience really needs improvement at this level. For instance, "AC not present" really isn't a message that a general audience will understand, whereas "please connect to power" might be. In addition to translating these friendlier messages into a few languages, I'd really like to be given a list of all such messages (in another ticket?) so that we can come up with some friendly static screens to show in these circumstances.

As a side note, do we still want to indicate (even when "booting anyway") that the update was skipped due to lack of power? It seems wrong to silently bypass this completely. Perhaps the "Please connect to power" screen could offer a way to skip the update and boot anyway?

  Changed 6 years ago by rsmith

  • status changed from new to closed
  • resolution set to fixed

Fixed in q2d14

follow-up: ↓ 15   Changed 6 years ago by cscott

  • status changed from closed to reopened
  • resolution deleted

q2d14 is currently being tested in joyride, although we'll need a q2d15 in order to properly exercise the new update code paths.

Reopening pending migration to update.1 and testing.

  Changed 6 years ago by Blaketh

  • keywords release? added

in reply to: ↑ 13   Changed 6 years ago by rsmith

Replying to cscott:

q2d14 is currently being tested in joyride, although we'll need a q2d15 in order to properly exercise the new update code paths. Reopening pending migration to update.1 and testing.

q2d14a has been created for this purpose. It's identical to q2d14 except for the version bump. Should show up in joyride soon.

follow-up: ↓ 17   Changed 6 years ago by mstone

  • status changed from reopened to closed
  • resolution set to fixed

Confirmed fixed in update.1-702. I tested by reflashing 702, booting, reflashing joyride-1794 (with q2d14a), removing power, rebooting (firmware update was skipped and boot continued), then rebooting with power applied.

Also tested the same path but with olpc-update --usb instead of reflash into joyride-1794.

in reply to: ↑ 16   Changed 6 years ago by rsmith

Replying to mstone:

Confirmed fixed in update.1-702. I tested by reflashing 702, booting, reflashing joyride-1794 (with q2d14a), removing power, rebooting (firmware update was skipped and boot continued), then rebooting with power applied. Also tested the same path but with olpc-update --usb instead of reflash into joyride-1794.

If you want complete testing you need to test the secure side of this as well. In my testing I built q2d14 with my dev version number scheme (q207u) such that q2d13 would be considered an upgrade. Then I flashed in my test, ran enable-security on a system with a signed version of q2d13.

q2d14a needs to be signed and then you can re-run the test in secure mode.

  Changed 6 years ago by mstone

Success: (install q2d14; put signed q2d14a in /boot/bootfw.zip; disable-security; remove power; boot (firmware update skipped; boot continues [though it fails because I've got an unsigned build]); apply power, reboot, firmware is updated).
