Opened 3 years ago

Last modified 3 years ago

#11223 new defect

XO-1 & XO-1.5 remapping keyboard on resume

Reported by: greenfeld Owned by: rsmith
Priority: blocker Milestone: 11.3.0
Component: embedded controller Version: Development build as of this date
Keywords: Cc: dsd, rsmith
Blocked By: Blocking:
Deployments affected: Action Needed: code
Verified: no

Description

An XO-1 (SHC930004B6F) has a problem where when this XO is not used for a while, the DCON is showing the last frozen frame, and then the touchpad (or keyboard?) is used to resume. When this happens the touchpad works properly but the keyboard is entirely remapped to a mix of invalid and different keys.

To an end user the remap makes no sense (k outputs 6, 3 outputs 4, etc.), but it may make sense internally. The same key will consistently output the same incorrect value, if it outputs anything usable at all.

The internal keyboard remapping happens both in X Windows and on the Linux console. Attaching a USB keyboard works and is not remapped.

Rebooting the XO (even softly via Linux) will get rid of the improper remap.

Seen in 11.3.0 os3.

Attachments (6)

logs.tgz (45.5 KB) - added by greenfeld 3 years ago.
Logs from affected XO-1
evpressed (6.6 KB) - added by dsd 3 years ago.
evpressed app
evpressed.c (2.5 KB) - added by dsd 3 years ago.
evpressed.c source code
evpressed.log (5.3 KB) - added by pgf 3 years ago.
the Fri entries are normal. the Mon entries are while the problem occurs
keyb.log (73.4 KB) - added by pgf 3 years ago.
messages.save (1.6 MB) - added by pgf 3 years ago.

Download all attachments as: .zip

Change History (26)

Changed 3 years ago by greenfeld

Logs from affected XO-1

comment:1 Changed 3 years ago by greenfeld

Also seen for the first time on an XO-1.75 SHC12901A83 after holding down the power key to shut it down while at the Linux console, and pressing ESC rapidly on startup to get into OFW.

The resulting keymap errors appear to be the same, suggesting a improper initialization of something.

Powering off by holding the power key again and waiting for the startup chime to get into OFW resolved the issue.

Some example invalid keymaps:

  • n -> t
  • k -> y
  • i -> 6
  • 1 -> 3
  • ` (US keyboard upper corner) -> 1
  • 6 -> 8
  • Left Shift -> W

comment:2 Changed 3 years ago by pgf

is the invalid mapping ever observed at the OFW level? or only in linux? being able to make it happen at OFW would go a long way to isolating it.

if you can reproduce again, please "stop olpc-kbdshim-udev", and see if bad mapping remains.

comment:3 Changed 3 years ago by pgf

i'm seeing a (possible) variant of this behavior on my XO-1.5. after running for many hours, mostly on, but also transported home in lid-close suspend, then plugged in and opened up, the keyboard is no longer fully functional. pressing keys sometimes gives a ctrl- version of the same key. mostly it gives nothing at all. sometimes the key produces itself (!).

i was prepared, and had booted initially with serial connected, so i had access to a serial console. in this state, an strace of kbdshim shows that it is seeing every keypress.

running

DISPLAY=:0 xev

at the console showed that X11 is not seeing input events from the keyboard. (the touchpad remains fully functional, throughout.)

stopping kbdshim had no effect. restarting kbdshim also had no effect.

switching to a VT ("chvt 2" at the serial console) showed that the VT session was not getting keyboard events either.

however: running "evtest", and selecting the uinput device (when kbdshim was running) or the keyboard (when kbdshim was not) showed every single character typed being made available, and correctly.

i'm running os5, kernel 2.6.35.13_xo1.5-20110801.1409.olpc.66488fe .

comment:4 Changed 3 years ago by pgf

i tried "stop prefdm" and "start prefdm". no change in behavior -- characters very sporadically "get through", but mostly do not.

comment:5 Changed 3 years ago by dsd

  • Cc dsd added
  • Milestone changed from Not Triaged to 11.3.0

Once reproduced, I suggest:

echo 1 > /sys/module/i8042/parameters/debug

Then record a few key presses against the logs, and we can look at the byte data that comes in.

comment:6 Changed 3 years ago by dsd

Ah wait, I misread. if evtest works fine then i8042 data won't help us.

This suggests an X problem. I guess another possibility is that a key is incorrectly recorded as being held down, which could be a result of #11260.

comment:7 Changed 3 years ago by pgf

i like the "held-down" theory. is there a way to ask evdev for the current state of all keys?

Changed 3 years ago by dsd

evpressed app

Changed 3 years ago by dsd

evpressed.c source code

comment:8 Changed 3 years ago by dsd

The attached evpressed program tells you which input state bits are set.

./evpressed <device> <mode>

where device is /dev/input/eventX

and <mode> is key, led, snd, or sw

I suggest running this program after reproducing the issue. Check sw and key on the keyboard, mouse, lid switch, ebook switch, uinput node.

Outputs info if any bits are set in the evdev state. Produces no output if no bits are set (e.g. no keys pressed).

comment:9 Changed 3 years ago by dsd

Please also run this kernel on your XO-1.5:
http://dev.laptop.org/~dsd/20110930/kernel-2.6.35.13_xo1.5-20110930.1817.olpc.f25eb72.i586.rpm

and after hitting the issue, and checking evpressed as above, do:

echo 1 > /sys/module/i8042/parameters/extra_debug

I added quite a bit of logging to a suspect part of the kernel that takes key presses from the input layer and maps them and passes them through the vt layer. Here is some reference output for a keypress (and release) of the 'j' key.

[   85.728884] keyboard: kbd_event type=4 code=4 value=24
[   85.734902] keyboard: kbd_event type=1 code=24 value=1
[   85.738852] keyboard: kbd_keycode keycode=24 down=1 hw_raw=0
[   85.738852] keyboard: kbd_keycode param set shift=0 ledstate=0
[   85.738852] keyboard: kbd_keycode regular map 24 -> fb6a
[   85.738852] keyboard: kbd_keycode call chain value fb6a
[   85.738852] keyboard: kbd_keycode sent keysym 6a !down 0 ledstate 0
[   85.738852] keyboard: kbd_event type=0 code=0 value=0
[   85.925724] keyboard: kbd_event type=4 code=4 value=24
[   85.935700] keyboard: kbd_event type=1 code=24 value=0
[   85.935700] keyboard: kbd_keycode keycode=24 down=0 hw_raw=0
[   85.935700] keyboard: kbd_keycode param set shift=0 ledstate=0
[   85.935700] keyboard: kbd_keycode regular map 24 -> fb6a
[   85.935700] keyboard: kbd_keycode call chain value fb6a
[   85.935700] keyboard: kbd_keycode sent keysym 6a !down 1 ledstate 0
[   85.935700] keyboard: kbd_event type=0 code=0 value=0

comment:10 Changed 3 years ago by pgf

thanks! i've added this to appropriate places in both olpc-rotate and powerd so i can invoke it w/o keyboard or serial (i.e., on the bus):

(
    date
    for x in /dev/input/event?
    do
        echo $x ---- $(grep ^ /sys/class/input/input${x##/dev/input/event}/name)
        echo key:
        evpressed $x key
        echo sw:
        evpressed $x sw
    done
) >> /home/olpc/evpressed.log

(this comment mainly for backup purposes)

comment:12 Changed 3 years ago by dsd

  • Priority changed from normal to blocker

comment:13 Changed 3 years ago by pgf

attaching the output of my evpressed capture script. i ran it several times, as the problem was occurring -- it looks like bits are accumulating over time.

Changed 3 years ago by pgf

the Fri entries are normal. the Mon entries are while the problem occurs

Changed 3 years ago by pgf

comment:14 Changed 3 years ago by pgf

keyb.log is a debug message log from a later occurrence. the evpressed info during that time only showed a single bit set, which is likely the rotate button used to generate the report:

Mon Oct  3 15:40:04 EDT 2011
from olpc-rotate
/dev/input/event0 ---- OLPC AC power jack
key:
sw:
byte 0: 20
/dev/input/event1 ---- Lid Switch
key:
sw:
/dev/input/event2 ---- Power Button
key:
sw:
/dev/input/event3 ---- EBook Switch
key:
sw:
/dev/input/event4 ---- AT Translated Set 2 keyboard
key:
byte 28: 8
sw:
/dev/input/event5 ---- PS/2 Synaptics TouchPad
key:
sw:
/dev/input/event6 ---- olpc-kbdshim virtual input
key:
sw:

comment:15 Changed 3 years ago by dsd

Thanks for the logs. For some reason it looks like the keyboard has been put in raw mode (presumably via the KDSKBMODE ioctl).

Please install this new test kernel: http://dev.laptop.org/~dsd/20111004/kernel-2.6.35.13_xo1.5-20111004.1752.olpc.a23ea57.i586.rpm

it has the same debugging stuff as the last one, but will additionally print a stack trace and message when a process changes kbdmode.

comment:16 Changed 3 years ago by pgf

i'm attaching messages.save, which is the full /var/log/messages for the current boot. i've annotated the "olpc-rotate: checkpoint" messages (which i generate with logger, from /usr/bin/olpc-rotate) with the keyboard key that i hit (and released) just after each one. there are four instances of that. the log also includes a backtrace from the setting of kbdmode, but that happened days ago -- long before the keyboard "broke".

also, here are two dmesg snippets:

three 'h' characters:
[31360.060916] keyboard: kbd_event type=1 code=23 value=2
[31360.066377] keyboard: kbd_keycode keycode=23 down=2 hw_raw=0
[31360.070734] keyboard: kbdmode 2 rep 1
[31360.070734] keyboard: kbd_keycode raw mode, emulate raw
[31360.070734] keyboard: kbd_keycode param set shift=4 ledstate=0
[31360.070734] keyboard: kbd_keycode regular map 23 -> f008
[31360.070734] keyboard: kbd_keycode call chain value f008
[31360.070734] keyboard: kbd_keycode raw mode type 0 bailout
[31360.112741] keyboard: kbd_event type=0 code=0 value=0
[31360.292870] keyboard: kbd_event type=1 code=23 value=2
[31360.298322] keyboard: kbd_keycode keycode=23 down=2 hw_raw=0
[31360.302758] keyboard: kbdmode 2 rep 1
[31360.302758] keyboard: kbd_keycode raw mode, emulate raw
[31360.302758] keyboard: kbd_keycode param set shift=4 ledstate=0
[31360.302758] keyboard: kbd_keycode regular map 23 -> f008
[31360.302758] keyboard: kbd_keycode call chain value f008
[31360.302758] keyboard: kbd_keycode raw mode type 0 bailout
[31360.337616] keyboard: kbd_event type=0 code=0 value=0
[31360.446639] keyboard: kbd_event type=1 code=23 value=2
[31360.452094] keyboard: kbd_keycode keycode=23 down=2 hw_raw=0
[31360.456530] keyboard: kbdmode 2 rep 1
[31360.456530] keyboard: kbd_keycode raw mode, emulate raw
[31360.456530] keyboard: kbd_keycode param set shift=4 ledstate=0
[31360.456530] keyboard: kbd_keycode regular map 23 -> f008
[31360.456530] keyboard: kbd_keycode call chain value f008
[31360.456530] keyboard: kbd_keycode raw mode type 0 bailout
[31360.491447] keyboard: kbd_event type=0 code=0 value=0

the 'j' key:
[31542.210971] keyboard: kbd_event type=1 code=24 value=2
[31542.216442] keyboard: kbd_keycode keycode=24 down=2 hw_raw=0
[31542.220861] keyboard: kbdmode 2 rep 1
[31542.220861] keyboard: kbd_keycode raw mode, emulate raw
[31542.220861] keyboard: kbd_keycode param set shift=4 ledstate=0
[31542.220861] keyboard: kbd_keycode regular map 24 -> f00a
[31542.220861] keyboard: kbd_keycode call chain value f00a
[31542.220861] keyboard: kbd_keycode raw mode type 0 bailout

Changed 3 years ago by pgf

comment:17 Changed 3 years ago by dsd

Debugged further on IRC:

My earlier worrying about kbdmode sent us a little in the wrong direction. kbdmode is 3 on the console (VC_UNICODE) but 2 in X (VC_RAW). not sure what changes it, my last kernel doesn't catch it, but that seems irrelevant now. its normal for raw mode to be active if X is on screen when testing, and this is what Paul is doing.

The above logs show that no key releases are being received. enabling i8042 debugging confirms this - no key release interrupts. basic EC debugging confirms this too.

EC debugging was turned on at runtime with sdkit:

1 fbfe ec!

Paul is running old EC code on this laptop, he'll now try with the latest version. not that we expect to fix anything, but worth a try.

comment:18 Changed 3 years ago by pgf

  • Action Needed changed from diagnose to code
  • Cc rsmith added
  • Component changed from olpc-kbdshim to embedded controller
  • Owner changed from pgf to rsmith

i think i've fully diagnosed this, and have a (partial) fix.

the EC code contains a ring buffer into which characters are put before being picked up for transmission to the host. the code managing this buffer is sufficiently arcane that i was able to fix four separate bugs in the way it's managed. two of those bugs conspired to cause my symptoms. (the other two were found by inspection, and we may never actually hit them in practice.)

to reproduce the symptoms: use a magnet to put the laptop into "lid-closed" suspend, and hit enough keys to fill the ring buffer -- typing slowly, 16 to 20 characters should do it (or maybe more -- i don't have an unpatched laptop handy to try it again). the ring buffer fills because the upstream interface to the host is blocked (sort of -- more below), so characters accumulate. once you've typed enough,
simulate "lid open" by removing the magnet. you may have to type a few more characters to start seeing characters dropped, errant control chars, etc.

i believe that in actual use, the bug was occurring for me because it's possible to depress keys on the mechanical HS model by squeezing the laptop when it's closed. this likely happens in my shoulder bag while i'm commuting, or perhaps i have a characteristic way of holding it with my right hand while connecting power. not sure.

it turns out that characters typed while in lid-suspend state are not completely blocked from the host. some will be lost entirely, but others (about half, up to the size of the ring buffer, i think) will be sent to the host after the laptop is awakened. i think this behavior is wrong -- no characters that arrive while wakeups are masked should be saved and sent up (with the possible exception of the break code for the key that got us into suspend, if any. but since this is a lid-closed suspend, that isn't as much of an issue as it is with lid-open suspend.)

this bug was present in 10.1.3, and is likely present on earlier releases and on XO-1 as well. this makes it a) not a regression, and b) probably not a blocker. i have a fix, but i think it may be worth fixing the rest of the problem at the same time.

comment:19 Changed 3 years ago by greenfeld

  • Summary changed from XO-1 remapping keyboard on resume to XO-1 & XO-1.5 remapping keyboard on resume

Clarifying the title to note similar issues happen on more than one platform.

comment:20 Changed 3 years ago by pgf

i think i'd like to ask for help in characterizing the severity of this bug, and verifying that the symptoms i've described (and, possibly those described by greenfeld) can be recreated by forcing keystrokes during lid-closed suspend.

i feel like i'm seeing this bug a lot -- i don't know whether it's just the way i handle and transport the laptop, or whether my keyboard has worse mechanical issues, or what, but i feel like we should be seeing reports of this keyboard lockup more often.

one other thing i do that's probably unusual is this: i edit powerd.conf to set a very aggressive suspend policy in ebook mode, with

config_EBOOK_TIME_SLEEP="3"

i do this for e-reading on my commute. i'm now wondering whether that also is a (partial?) trigger.

Note: See TracTickets for help on using tickets.