Ticket #11021 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

EC turns on CPU with keyboard unresponsive

Reported by: wmb@…
Owned by: rsmith
Priority: high
Milestone: 11.3.0
Component: embedded controller
Version: not specified
Keywords:
Cc: dsd, pgf, Quozl, rsmith, sridhar
Action Needed: no action
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

Samuel and I have been investigating a keyboard problem on one of his XO-1.5s. It first showed up as a failure to respond to the keyboard during a "more?" prompt inside rocker-induced selftest, but we eventually discovered that it is much more general. The machine will often (about 2 failures in 3 tries) fail to respond to ESC to get to the ok prompt.

We eventually found a way to reproduce the problem on other machines:

One way is to boot Linux, run "halt", then power off with the power button.

Another way is to run OFW, then test /mouse . While running your finger on the touchpad, simultaneously press the power button to turn off the machine. Then power on again. With high probability, you will not be able to break into OFW with the ESC key.

If you inspect the machine state over a serial console, you will see that the keyboard status register (port 0x64) contains 0x30 instead of 0x10. This indicates that the PS/2 hardware is in the state "the byte in the output buffer is coming from the touchpad, but there is no byte in the output buffer".
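To make the 0x30 versus 0x10 difference easier to read, here is a minimal sketch that prints the flags set in a status byte. It assumes the EC follows the conventional 8042 PS/2 status layout (bit 0 output buffer full, bit 1 input buffer full, bit 3 last write was a command, bit 4 keyboard not inhibited, bit 5 data is from the aux/touchpad port). The word name .kbd-status and the flag labels are made up for illustration; they are not part of OFW.

```forth
\ Sketch only: decode an 8042-style status byte into named flags.
\ Bit meanings assume the conventional PS/2 controller layout.
: .kbd-status  ( status -- )
   dup h# 01 and  if  ." out-buf-full "            then
   dup h# 02 and  if  ." in-buf-full "             then
   dup h# 08 and  if  ." last-write-was-command "  then
   dup h# 10 and  if  ." kbd-not-inhibited "       then
   dup h# 20 and  if  ." aux-data "                then
   drop
;
```

Under those assumptions, "h# 30 .kbd-status" prints kbd-not-inhibited and aux-data but not out-buf-full, which is the inconsistent state described above, while "h# 10 .kbd-status" prints only kbd-not-inhibited. On an affected machine, "h# 64 pc@ .kbd-status" would decode the live register.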

Samuel was able to reproduce the problem with Q3B08 too, but it didn't seem to fail as readily there.

You can see the 0x30 value very early in startup: just "i to interact" and type

ok 64 pc@ .

Change History

  Changed 3 years ago by wmb@…

I have what I think is a reliable OFW recipe for recovering from this bad state. The following code, when added to the 8042 driver's open method, seems to work on an ENE 1.5 system:

: consume  ( -- )
   \ Discard any data that is already queued up
   begin  d# 10 ms  stat@  out-buf-full and  while
      data@ drop
   repeat
   d# 10 ms
;
: reset-ps2  ( -- )
   consume

   \ Reset the mouse if the controller is expecting a mouse command
   stat@ h# 38 =  if  h# ff data!  consume   then

   \ Reset the mouse if the controller is reporting an aux port ready bit
   stat@ h# 30 =  if
      h# ff h# d4 put-ctlr-cmd2  consume
   then

   \ Reset the keyboard just for good measure
   h# ff data!  consume
;

It first reads and discards any data that is already queued up (the initial "consume").

It then checks to see if the interface is expecting a command to the mouse (the "38" test); if so it sends the reset command and consumes any result bytes.

It then checks to see if the interface is in "mouse state" (the "30" check); if so, it directs a reset command to the mouse and consumes any result bytes.

Finally, it sends a reset command to the keyboard just to kick it.

This sequence of operations was arrived at experimentally by booting many times, looking at possible error states, and finding successful recovery sequences.

Ideally, the EC would put the interface into a consistent and workable state. I don't know how hard that will be in the case where the mouse happens to be turned on at power-cycle time. If it is too hard for the EC, we can consider putting the code in OFW.

  Changed 3 years ago by dsd

  • cc dsd added
  • milestone changed from Not Triaged to 11.3.0

This has been bugging me as well. I can't put my finger on it, but I think it's a recent regression, or at least it seems worse than before.

I'd very much appreciate it if you both could look at this with a high priority once you have taken a breather after the current bringup. Invasive fixes are welcome as we are entering a new development cycle.

  Changed 3 years ago by wmb@…

  • cc pgf, Quozl added

It will be a while before I can get to this, as I didn't bring a 1.5 with me on the bringup trip, and I'm heading off for vacation as soon as I return. Perhaps James and Paul can take point on this issue?

  Changed 3 years ago by wmb@…

I believe that Richard found a way to fix the problem by initializing a bit in the EC's PS/2 interface. If he has some code ready to go, I wonder if others could make a release and get it tested.

  Changed 3 years ago by rsmith

Replying to wmb@firmworks.com:

> I believe that Richard found a way to fix the problem by initializing a bit in the EC's PS/2 interface. If he has some code ready to go, I wonder if others could make a release and get it tested.

Yes, I have this bug fixed. It's a trivial fix. I'll push it to master.

  Changed 3 years ago by dsd

Great, thanks!

Now would be a good time to roll this out. Can someone do a new release? Or at least provide a test firmware to me and Sam?

  Changed 3 years ago by Quozl

  • cc rsmith added

Richard, is the change in an EC release, and if so which one?

  Changed 3 years ago by rsmith

Replying to Quozl:

> Richard, is the change in an EC release, and if so which one?

The fixes are in EC 2.2.7. It contains the small fix for this bug and also fixes for two other quirks discovered while fixing it.

I have a test version available here: http://dev.laptop.org/~rsmith/q3z1081.rom

  Changed 3 years ago by Quozl

  • next_action changed from diagnose to test in build

Q3B14 released with EC 2.2.7, please test.

  Changed 3 years ago by dsd

test in 11.3.0 build 1

  Changed 3 years ago by greenfeld

  • status changed from new to closed
  • next_action changed from test in build to no action
  • resolution set to fixed

Found an XO-1.5 on which I could easily reproduce the issue with Q3B13. After upgrading it to Q3B14, I could no longer reproduce the issue.

  Changed 3 years ago by sridhar

  • cc sridhar added