Ticket #11260 (new defect)

Opened 3 years ago

Last modified 23 months ago

i8042 data loss over suspend/resume

Reported by: dsd Owned by: rsmith
Priority: normal Milestone: Future Release
Component: embedded controller Version: 1.5-C2
Keywords: Cc: pgf
Action Needed: never set Verified: no
Deployments affected: Blocked By:
Blocking:

Description

Testing my XO-1.5, reproduced with both the synaptics-mouse base section that it came shipped with, and the AVC-mouse base shipped borrowed from my XO-1.75 B1.

I am finding that if I suspend by typing "echo mem > /sys/power/state" at the terminal (and terminating with the 'enter' key of course), sometimes the system suspends before the "enter released" interrupt comes in, and this interrupt then never properly arrives on resume. I am resuming with the power button. Reproducible within a few tries (especially on a more modern kernel which seems to suspend quicker).

Upon wakeup, the i8042 resume code simulates an interrupt (not sure exactly why, but if we don't do this then no interrupts arrive later). I8042_STATUS_REG reads 0x10 (i.e. no OBF bit set), indicating no data.

Then Linux receives a real interrupt, but I8042_STATUS_REG reads 0x10 still (no data). And the result is that the "enter released" interrupt (expected data 0x9c) was never received and the system incorrectly thinks that enter is held down.

[  656.532452] i8042: [65410] read I8042_STATUS_REG=11
[  656.532466] i8042: [65410] read I8042_DATA_REG=1c
[  656.532478] i8042: [65410] 1c <- i8042 (interrupt, 0, 1)
[  656.539243] PM: Syncing filesystems ... done.
[  656.547792] PM: Preparing system for mem sleep
[  656.547813] Freezing user space processes ... (elapsed 0.01 seconds) done.
[  656.572828] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[  656.592816] PM: Entering mem sleep
[  656.593599] i8042 kbd 00:04: wake-up capability enabled by ACPI
[  656.599616] i8042 aux 00:03: wake-up capability disabled by ACPI
[  656.605834] ehci_hcd 0000:00:10.4: PCI INT D disabled
[  656.610999] uhci_hcd 0000:00:10.2: PCI INT C disabled
[  656.616305] uhci_hcd 0000:00:10.0: PCI INT A disabled
[  656.621507] viafb_suspend!
[  656.624849] viafb 0000:00:01.0: PCI INT A disabled
[  656.629836] ACPI handle has no context!
[  656.629850] sdhci-pci 0000:00:0c.0: PCI INT A disabled
[  656.635060] uhci_hcd 0000:00:10.1: PCI INT B disabled
[  656.640212] ACPI handle has no context!
[  656.660048] PM: suspend of devices complete after 66.956 msecs
[  656.740257] PM: late suspend of devices complete after 74.286 msecs
[  656.746712] ACPI: Preparing to enter system sleep state S3
[  656.850073] PM: Saving platform NVS memory
[  656.850073] ACPI: Low-level resume complete
[  656.850073] PM: Restoring platform NVS memory
[  656.850073] ACPI: Waking up from system sleep state S3
[  656.860330] viafb 0000:00:01.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[  656.860381] sdhci-pci 0000:00:0c.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[  656.860409] sdhci-pci 0000:00:0c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80002000)
[  656.860426] sdhci-pci 0000:00:0c.0: restoring config space at offset 0x5 (was 0x0, writing 0x80001000)
[  656.860443] sdhci-pci 0000:00:0c.0: restoring config space at offset 0x4 (was 0x0, writing 0x80000000)
[  656.860463] sdhci-pci 0000:00:0c.0: restoring config space at offset 0x1 (was 0x2100010, writing 0x2100016)
[  656.880042] uhci_hcd 0000:00:10.0: BAR 4: set to [io  0x8000-0x801f] (PCI address [0x8000-0x801f])
[  656.889122] uhci_hcd 0000:00:10.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
[  656.889157] uhci_hcd 0000:00:10.0: restoring config space at offset 0x1 (was 0x2100010, writing 0x2100013)
[  656.900032] uhci_hcd 0000:00:10.1: BAR 4: set to [io  0x8020-0x803f] (PCI address [0x8020-0x803f])
[  656.909089] uhci_hcd 0000:00:10.1: restoring config space at offset 0xf (was 0x200, writing 0x209)
[  656.909124] uhci_hcd 0000:00:10.1: restoring config space at offset 0x1 (was 0x2100010, writing 0x2100013)
[  656.920031] uhci_hcd 0000:00:10.2: BAR 4: set to [io  0x8040-0x805f] (PCI address [0x8040-0x805f])
[  656.929085] uhci_hcd 0000:00:10.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
[  656.929120] uhci_hcd 0000:00:10.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100013)
[  656.940032] ehci_hcd 0000:00:10.4: BAR 0: set to [mem 0x80003000-0x800030ff] (PCI address [0x80003000-0x800030ff])
[  656.950508] ehci_hcd 0000:00:10.4: restoring config space at offset 0xf (was 0x400, writing 0x40a)
[  656.950542] ehci_hcd 0000:00:10.4: restoring config space at offset 0x1 (was 0x2100010, writing 0x2100012)
[  656.950681] pci 0000:00:14.0: restoring config space at offset 0xf (was 0x100, writing 0x109)
[  656.950709] pci 0000:00:14.0: restoring config space at offset 0x4 (was 0x8fff0004, writing 0x80004004)
[  656.950890] PM: early resume of devices complete after 90.837 msecs
[  656.957521] viafb 0000:00:01.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  656.964493] sdhci-pci 0000:00:0c.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[  656.971814] uhci_hcd 0000:00:10.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[  656.979054] usb usb2: root hub lost power or was reset
[  656.984305] uhci_hcd 0000:00:10.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
[  656.991583] uhci_hcd 0000:00:10.2: PCI INT C -> GSI 21 (level, low) -> IRQ 21
[  656.998809] usb usb4: root hub lost power or was reset
[  657.004176] ehci_hcd 0000:00:10.4: PCI INT D -> GSI 23 (level, low) -> IRQ 23
[  657.011464] pci 0000:00:13.0: setting latency timer to 64
[  657.024832] viafb_resume!
[  657.027960] viafb_resume: Init 3d engine!
[  657.032287] usb usb3: root hub lost power or was reset
[  657.032330] sdhci-pci 0000:00:0c.0: setting latency timer to 64
[  657.032366] sdhci-pci 0000:00:0c.0: setting latency timer to 64
[  657.032388] sdhci-pci 0000:00:0c.0: setting latency timer to 64
[  657.052220] i8042 kbd 00:04: wake-up capability disabled by ACPI
[  657.070911] i8042: [65464] soft resume interrupt
[  657.070923] i8042: [65464] read I8042_STATUS_REG=10
[  657.070942] i8042: [65464] read I8042_STATUS_REG=10
[  657.070951] i8042: [65464] Interrupt 1, without any data
[  657.280197] PM: resume of devices complete after 322.805 msecs
[  657.286390] PM: Finishing wakeup.
[  657.286397] Restarting tasks ... done.

Kernel which you can use to debug this, which prints all i8042 I/O (as above): http://dev.laptop.org/~dsd/20110920/bzImage

Be sure to enable debugging and keyboard wakeup before suspending, with:

echo 1 > /sys/module/i8042/parameters/debug
echo 1 > /sys/devices/platform/i8042/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio0/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio1/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio0/input/*/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio1/input/*/power/wakeup

Change History

  Changed 3 years ago by dsd

  • cc pgf added

  Changed 3 years ago by dsd

Strangely when doing this from the VT (instead of the terminal activity) this does seem to work almost all of the time. During resume the "soft interrupt" reads the 0x9c data byte. It took me a lot of tries to reproduce this on the VT.

To reproduce:

Flash http://build.laptop.org/11.3.0/os5/xo-1.5/os5.zd4 on a XO-1.5

Prepare a USB disk with:

Boot with USB disk inserted.

Open Terminal activity, run:

sudo su
echo 1 > /sys/module/i8042/parameters/debug
echo 1 > /sys/devices/platform/i8042/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio0/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio1/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio0/input/*/power/wakeup
echo 1 > /sys/devices/platform/i8042/serio1/input/*/power/wakeup

Then suspend with:

echo mem > /sys/power/state

Resume with power button.

If enter key is not held down on resume (or if you can see the 9c interrupt in the logs either before or after suspend), suspend again, and resume with the power button. And so on.

  Changed 3 years ago by dsd

(by the way, the DCON will die an ugly death on suspend on that kernel. can be ignored, it'll be fine on wakeup)

  Changed 3 years ago by dsd

Richard reproduced and has found a possible workaround in the EC. Testing with http://dev.laptop.org/~rsmith/11260.rom

Observations:

1. When suspending with sleep 5; echo mem > /sys/power/state (i.e. no keypresses for a long time before suspend) and resuming with the power button, a 1c (make enter) 9c (break enter) interrupt sequence is correctly received before suspend, however on resume a duplicate 9c (break enter) interrupt is received.

2. The same experiment repeated but waking the system with the 'j' key results in 1c,9c before suspend, and 24,a4 on resume. (all correct)

3. Reproducing the conditions described on this ticket, where the system suspends on a 1c event (make enter) before the 9c break interrupt has been received, upon power-button-triggered resume, the Linux-simulated 'soft' interrupt receives a 9c interrupt, then Linux receives a 'real' interrupt with no data.

4. If the system suspends on a 1c event (make enter) before the 9c break interrupt has been received, if I resume with the 'j' key the wrong thing happens. The system wakes with 24,a4 (j make, j break) codes, but no '9c' enter break is ever received, so Linux continues believing that enter is held down.

5. If the system suspends on a 1c event (make enter) before the 9c break interrupt has been received, if I resume with the 'j' key the wrong thing happens. The system wakes with 1c,9c (enter make, enter break). However, this interrupt pattern suggests that I had held enter down long enough for one interrupt-autorepeat to happen, and then released just once. In reality, I pressed and released enter once to suspend, and pressed and released enter once to resume.

In summary, I think (1) is probably no big deal on its own, (3) similarly a bit strange behaviour but unlikely to have adverse effects, (4) is a problem however because it would cause Linux/X to effectively ignore the next keypress of the "stuck key", and (5) is also a problem because I effectively lose a keypress/keyrelease from the userspace perspective.

follow-up: ↓ 6   Changed 3 years ago by pgf

is all this behavior a regression? or has it always been this way? if a regression, since when?

in reply to: ↑ 5   Changed 3 years ago by rsmith

Replying to pgf:

is all this behavior a regression? or has it always been this way? if a regression, since when?

We need to to find this out. We have always had problems with key repeats on resume. It's possible that the suspend speed of the previous kernels was slow enough that this didn't happen so much.

It important to know this because the core of this problem is that for some reason when the system suspends the EC sees a read of the keyboard data register so it thinks that the break code has been read by the host. If this has always been the case and then there is perhaps some sort of hardware glitch and solving the root cause may not be possible. If the false key read does not occur on previous releases then there's hope that some sort of kernel tweak can solve this.

Solving it directly in the EC is tricky because you have to interrupt the normal path before suspend since the EC isn't doing anything different. Its sees a read and goes on to wait for the the next keypress. By the time the EC sees that the system has suspended its too late because the data has been lost.

The 11260 hack was a trivial attempt at a fix and didn't pan out. A 2nd idea I'm having is to use the SCI inhibit system we have for XO-1. On XO-1 the kernel sends an SCI_INHIBIT prior to the suspend so the EC knows not to send up an SCI until the system has has a chance to go all the way into suspend otherwise XO-1 will hang. Its unused on 1.5 but all the code is still present. We can possibly use the SCI inhibit period as a keypress processing inhibit period. I can explore that option but I need a kernel which sends the SCI INHIBIT command prior to going into suspend.

  Changed 3 years ago by dsd

One relevant difference between 2.6.31 kernels (from 10.1 releases) and 2.6.35 kernels (in 11.x releases) is that 2.6.35 kernels disable i8042 interrupts much earlier going into suspend. This behaviour was changed to fix some races in resume in #10650.

I tested in 10.1.3 (build 860, kernel 2.6.31) on XO-1.5, running firmware Q3A62 (EC code 2.0.1). It is not easy to reproduce the issue there, because the interrupts are enabled for much longer, and suspend itself is slower than 2.6.35. I did reproduce it exactly once. I also ran several times into the issues in #10650 where the interrupts are processed in the wrong order during resume (kernel bug, which is why we ended up disabling them early). This made it difficult to perform many tests as I had to keep rebooting.

I then retested with Q3B19 against that old OS build. Again reproduced exactly once, but with difficulty, and with several occurances of #10650 getting in my way.

  Changed 3 years ago by dsd

Here is a kernel as requested. Set up a USB key as follows:

This sends EC command 0x32 late before going into suspend, and 0x34 during early resume.

These commands appear to be having some effect already - this kernel enables input wakeups by default, but the system doesn't wake up with keyboard/mouse input. I double checked that building without my modifications (http://dev.laptop.org/~dsd/20111010/sci-inhibit.patch) allows the system to wake up as normal.

  Changed 2 years ago by Quozl

  • milestone changed from Not Triaged to 12.1.0

  Changed 23 months ago by dsd

  • milestone changed from 12.1.0 to Future Release
Note: See TracTickets for help on using tickets.