Opened 8 years ago

Closed 7 years ago

Last modified 2 years ago

#9967 closed defect (fixed)

2.6.31.6: libertas suspend fails on XO-1

Reported by: sascha_silbe Owned by: dsaxena
Priority: normal Milestone:
Component: kernel Version: Development source as of this date
Keywords: libertas suspend Cc: bernie, pgf
Blocked By: Blocking:
Deployments affected: Action Needed: no action
Verified: no

Description

Kernel built from latest git (branch olpc-2.6.31, last commit f119223) fails to suspend:

[  204.998608] PM: Syncing filesystems ... done.
[  205.026089] PM: Preparing system for mem sleep
[  205.026115] Freezing user space processes ... (elapsed 0.00 seconds) done.
[  205.029741] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
[  205.030458] PM: Entering mem sleep
[  205.032054] dcon_source_switch to DCON
[  205.058837] olpc-dcon: The DCON has control
[  205.137688] libertas: PREP_CMD: command 0x0043 failed: 1
[  205.138035] libertas: HOST_SLEEP_CFG failed 1
[  205.138308] libertas: Host sleep configuration failed: 1
[  205.138637] usb8xxx 1-1:1.0: suspend error 1
[  205.138924] pm_op(): usb_dev_suspend+0x0/0xf returns 1
[  205.139246] PM: Device 1-1 failed to suspend: error 1
[  205.139563] PM: Some devices failed to suspend
[  205.270777] dcon_source_switch to CPU
[  205.299204] olpc-dcon: The CPU has control
[  205.381422] PM: Finishing wakeup.
[  205.381440] Restarting tasks ... done.
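The log shows one failing callback aborting the whole transition. HOST_SLEEP_CFG (command 0x0043) fails, the usb8xxx suspend returns 1, and the PM core reports "Some devices failed to suspend" and wakes everything back up. Below is a minimal C model of that abort-and-roll-back behaviour. `struct sim_dev` and `sim_suspend_all()` are hypothetical. The model is much simpler than the real PM core, which has more phases and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record standing in for one device's
 * suspend/resume callbacks; not the kernel's struct device. */
struct sim_dev {
	const char *name;
	int suspend_ret;	/* what this device's suspend callback returns */
	int suspended;		/* 1 while the device is suspended */
};

/*
 * Suspend devices in order. On the first nonzero return, resume the
 * devices already suspended, in reverse order, and propagate the error
 * (the "Some devices failed to suspend" case in the log).
 * Returns 0 on success or the failing callback's value; *failed_idx
 * receives the index of the failing device, or -1 if none failed.
 */
int sim_suspend_all(struct sim_dev *devs, size_t n, int *failed_idx)
{
	assert(devs != NULL || n == 0);
	assert(failed_idx != NULL);
	*failed_idx = -1;

	for (size_t i = 0; i < n; i++) {
		int ret = devs[i].suspend_ret;

		if (ret != 0) {
			*failed_idx = (int)i;
			/* Roll back everything suspended so far, newest first. */
			for (size_t j = i; j-- > 0; )
				devs[j].suspended = 0;
			return ret;
		}
		devs[i].suspended = 1;
	}
	return 0;
}
```

With the usb device's callback returning 1, the call returns 1 and every device ends up awake again. This matches the log, where the system resumes instead of sleeping.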

Attachments (3)

0002-olpc-convenience-funcs-to-check-for-XO-1-XO-1.5.patch (1.3 KB) - added by bernie 8 years ago.
Preparation infrastructure for the actual bugfix
0003-libertas-Do-not-issue-HOST_SLEEP_CFG-on-XO-1.patch (1.4 KB) - added by bernie 8 years ago.
Bugfix
ehs_remove_fixup_on_xo_1.patch (2.9 KB) - added by dsaxena 8 years ago.
Remove call to EHS_REMOVE_WAKEUP on firmware that does not support it


Change History (24)

comment:1 follow-up: Changed 8 years ago by dsd

thanks, reported to Marvell.

This was introduced by http://dev.laptop.org/git/olpc-2.6/commit/?h=olpc-2.6.31&id=14f2fe76fe0b16685b409ab1d2471b9f4c1dc54d

If they don't respond quickly (we already have them working on higher priority issues) it may be wise simply to make this code only execute on SD8686 (the wifi chip in the XO-1.5) for the time being - anyone interested in cracking out a quick patch?

comment:2 in reply to: ↑ 1 Changed 8 years ago by sascha_silbe

Replying to dsd:

> This was introduced by http://dev.laptop.org/git/olpc-2.6/commit/?h=olpc-2.6.31&id=14f2fe76fe0b16685b409ab1d2471b9f4c1dc54d

Confirmed: reverting that commit fixes it.

> If they don't respond quickly (we already have them working on higher priority issues) it may be wise simply to make this code only execute on SD8686 (the wifi chip in the XO-1.5) for the time being - anyone interested in cracking out a quick patch?

Since they're busy with CES, there won't be a quick response. Because there was some restructuring, I have no idea which parts to make conditional, so I'll leave it to someone more familiar with the libertas driver.

comment:3 Changed 8 years ago by Quozl

  • Milestone changed from Not Triaged to 1.5-software-later

comment:4 Changed 8 years ago by pgf

  • Milestone changed from 1.5-software-later to 1.0-software-later

comment:5 Changed 8 years ago by bernie

I'm testing an upstreamable patch.

comment:6 Changed 8 years ago by bernie

Attaching proposed patches.

Changed 8 years ago by bernie

Preparation infrastructure for the actual bugfix

comment:7 Changed 8 years ago by sascha_silbe

With bernie's patches (and the Marvell commit not reverted), wifi always dies for me on resume. The link LED stays lit during suspend and only turns off after the WLAN module gets reset.

[ 2849.889038] Freezing user space processes ... (elapsed 0.00 seconds) done.
[ 2849.894925] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
[ 2849.895156] PM: Entering mem sleep
[ 2849.896641] libertas: Not issuing HOST_SLEEP_CFG on XO-1
[ 2849.916888] dcon_source_switch to DCON
[ 2849.953080] olpc-dcon: The DCON has control
[ 2849.977765] mmc_suspend_host: turning off power on mmc0
[ 2849.977797] sdhci_set_ios: power off for mmc0 from c0699829
[ 2849.977819] sdhci_set_power: new power value = 0
[ 2849.977929] sdhci-pci 0000:00:0c.1: PME# disabled
[ 2849.983133] sdhci_pci_probe: Enable PME set to 0x1a0c108
[ 2849.984589] ehci_hcd 0000:00:0f.5: PME# disabled
[ 2849.996827] ohci_hcd 0000:00:0f.4: PME# disabled
[ 2850.009076] olpc_do_sleep!
[ 2850.009076] CAFÉ NAND 0000:00:0c.0: restoring config space at offset 0xf (was 0x8080100, writing 0x808010b)
[ 2850.009076] CAFÉ NAND 0000:00:0c.0: restoring config space at offset 0x3 (was 0x800000, writing 0x802000)
[ 2850.009076] sdhci-pci 0000:00:0c.1: restoring config space at offset 0x1 (was 0x2b00002, writing 0x2b00006)
[ 2850.009076] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0xf (was 0x8080100, writing 0x808010b)
[ 2850.009076] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0x3 (was 0x800000, writing 0x802000)
[ 2850.009076] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0x1 (was 0x2b00002, writing 0x2b00006)
[ 2850.009076] ohci_hcd 0000:00:0f.4: PME# disabled
[ 2850.009076] ehci_hcd 0000:00:0f.5: PME# disabled
[ 2850.009076] olpc-pm:  SCI 0x1 received
[ 2850.009076] olpc-pm:  SCI 0x0 received
[ 2850.416496] sdhci_set_power: new power value = 14
[ 2850.456485] sdhci_set_power: new power value = 12
[ 2851.713031] sdhci_pci_probe: Enable PME is 0x1a0c008
[ 2851.733303] cs5535audio 0000:00:0f.3: setting latency timer to 64
[ 2851.737665] ohci_hcd 0000:00:0f.4: PME# disabled
[ 2851.737698] ohci_hcd 0000:00:0f.4: setting latency timer to 64
[ 2851.828897] usb usb2: root hub lost power or was reset
[ 2851.828934] ehci_hcd 0000:00:0f.5: PME# disabled
[ 2851.828966] ehci_hcd 0000:00:0f.5: setting latency timer to 64
[ 2851.828991] usb usb1: root hub lost power or was reset
[ 2851.829026] ehci_hcd 0000:00:0f.5: cache line size of 32 is not supported
[ 2852.022369] dcon_source_switch to CPU
[ 2852.062229] olpc-dcon: The CPU has control
[ 2852.146955] usb 1-1: reset high speed USB device using ehci_hcd and address 12
[ 2852.295547] libertas: Not issuing HOST_SLEEP_CFG on XO-1
[ 2852.297546] PM: Finishing wakeup.
[ 2852.297563] Restarting tasks ... done.
[ 2855.516513] libertas: command 0x000b timed out
[ 2855.516695] libertas: requeueing command 0x000b due to timeout (#1)
[ 2858.516544] libertas: command 0x000b timed out
[ 2858.516723] libertas: requeueing command 0x000b due to timeout (#2)
[ 2861.516573] libertas: command 0x000b timed out
[ 2861.516755] libertas: requeueing command 0x000b due to timeout (#3)
[ 2864.516608] libertas: command 0x000b timed out
[ 2864.516789] libertas: Excessive timeouts submitting command 0x000b
[ 2864.516822] Resetting OLPC wireless via EC...
[ 2864.522000] usb 1-1: USB disconnect, address 12
[ 2864.522879] libertas: PREP_CMD: command 0x000b failed: -110
[ 2866.376580] usb 1-1: new high speed USB device using ehci_hcd and address 13
[ 2866.531582] usb 1-1: configuration #1 chosen from 1 choice
[ 2866.550480] usb 1-1: firmware: using built-in firmware usb8388.bin
[ 2867.250640] usb8xxx: Firmware ready event received
[ 2867.268234] libertas: 00:17:c4:0c:ef:28, fw 5.110.22p23, cap 0x000003a3
[ 2867.287450] libertas: eth0: Marvell WLAN 802.11 adapter
[ 2867.297794] libertas: Not issuing HOST_SLEEP_CFG on XO-1
[ 2867.298861] libertas: PREP_CMD: command 0x0074 failed: 2
[ 2867.298887] usb8xxx: Firmware does not seem to support PS mode
[ 2874.038865] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 2884.389559] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
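The timeout lines above follow a retry-then-reset pattern. Command 0x000b is requeued three times, and the fourth timeout produces "Excessive timeouts" followed by an EC reset of the wireless module. The sketch below reproduces only that observed pattern. The limit of three and the names `SIM_MAX_REQUEUES` and `sim_on_timeout()` are inferred from the log for illustration; they are not taken from the libertas source.

```c
#include <assert.h>
#include <stdio.h>

/* Inferred from the log: three requeues, then reset on the next
 * timeout. Hypothetical name, not from the libertas source. */
#define SIM_MAX_REQUEUES 3

enum sim_action { SIM_REQUEUE, SIM_RESET };

/*
 * Called on each timeout of the same command; *requeues counts how
 * often it has already been requeued. Prints lines shaped like the
 * log and says whether to requeue the command or reset the module.
 */
enum sim_action sim_on_timeout(unsigned int cmd, int *requeues)
{
	assert(requeues != NULL);
	printf("command 0x%04x timed out\n", cmd);
	if (*requeues < SIM_MAX_REQUEUES) {
		(*requeues)++;
		printf("requeueing command 0x%04x due to timeout (#%d)\n",
		       cmd, *requeues);
		return SIM_REQUEUE;
	}
	printf("Excessive timeouts submitting command 0x%04x\n", cmd);
	return SIM_RESET;
}
```

In the log, the reset is followed by a fresh probe ("Not issuing HOST_SLEEP_CFG on XO-1" appears again), which is why the interface comes back only after re-association.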

comment:8 Changed 8 years ago by sascha_silbe

For comparison, this is without bernie's patches but with the Marvell patch reverted. Although the link was lost during suspend (it had been sleeping much longer than in the test above, where it kept the link over suspend), no libertas reset was required after resume:

[ 3014.752592] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 3014.763188] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
[ 3014.763414] PM: Entering mem sleep
[ 3014.808055] mmc_suspend_host: turning off power on mmc0
[ 3014.808088] sdhci_set_ios: power off for mmc0 from c0699611
[ 3014.808111] sdhci_set_power: new power value = 0
[ 3014.808222] sdhci-pci 0000:00:0c.1: PME# disabled
[ 3014.813490] sdhci_pci_probe: Enable PME set to 0x1a0c108
[ 3014.814962] ehci_hcd 0000:00:0f.5: PME# disabled
[ 3014.820349] ohci_hcd 0000:00:0f.4: PME# disabled
[ 3014.839344] olpc_do_sleep!
[ 3014.839344] CAFÉ NAND 0000:00:0c.0: restoring config space at offset 0xf (was 0x8080100, writing 0x808010b)
[ 3014.839344] CAFÉ NAND 0000:00:0c.0: restoring config space at offset 0x3 (was 0x800000, writing 0x802000)
[ 3014.839344] sdhci-pci 0000:00:0c.1: restoring config space at offset 0x1 (was 0x2b00002, writing 0x2b00006)
[ 3014.839344] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0xf (was 0x8080100, writing 0x808010b)
[ 3014.839344] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0x3 (was 0x800000, writing 0x802000)
[ 3014.839344] cafe1000-ccic 0000:00:0c.2: restoring config space at offset 0x1 (was 0x2b00002, writing 0x2b00006)
[ 3014.839344] ohci_hcd 0000:00:0f.4: PME# disabled
[ 3014.839344] ehci_hcd 0000:00:0f.5: PME# disabled
[ 3014.839344] olpm-pm:  PM_PWRBTN wakeup event received
[ 3015.246391] sdhci_set_power: new power value = 14
[ 3015.286384] sdhci_set_power: new power value = 12
[ 3016.623091] sdhci_pci_probe: Enable PME is 0x1a0c008
[ 3016.643304] cs5535audio 0000:00:0f.3: setting latency timer to 64
[ 3016.647670] ohci_hcd 0000:00:0f.4: PME# disabled
[ 3016.647703] ohci_hcd 0000:00:0f.4: setting latency timer to 64
[ 3016.738895] usb usb2: root hub lost power or was reset
[ 3016.738933] ehci_hcd 0000:00:0f.5: PME# disabled
[ 3016.738965] ehci_hcd 0000:00:0f.5: setting latency timer to 64
[ 3016.738989] usb usb1: root hub lost power or was reset
[ 3016.739025] ehci_hcd 0000:00:0f.5: cache line size of 32 is not supported
[ 3017.036455] usb 1-1: reset high speed USB device using ehci_hcd and address 2
[ 3017.196575] PM: Finishing wakeup.
[ 3017.196594] Restarting tasks ... done.
[ 3017.530516] i2c-adapter i2c-0: timeout in state address

comment:9 follow-ups: Changed 8 years ago by dsaxena

  • Status changed from new to assigned

I wasn't really sure that the patch removing the call to host_sleep_cfg() was the right solution, so I started doing some testing. What Marvell's patch does is move the call to host_sleep_cfg() out of the WOL-configuration path and into the suspend path. By default, WOL is disabled, but if I run "ethtool -s eth0 wol u" and then "echo mem > /sys/power/state", the system will go to sleep and I can wake it up with a ping from a host. When we resume, I still see a "command 0x0043 failed" and the reason is b/c the WOL-disable command (EHS_REMOVE_WAKEUP) does not seem to be supported by the 8388 firmware. This is not fatal as the card keeps running.

The proper fix would be to update the userspace on XO-1 F11 to submit the proper WOL commands on suspend as we do on XO-1.5 but this still leaves us with the case where we are closing the lid and we want full WOL disable which we can't issue. :/ A workaround would be to unload the driver on lid close and re-load it on lid-open, which is reasonable as resume speed is not critical for this use case.

Another option is to add a flag to the lbs_private structure that we fill in at probe time to let the suspend/resume code know not to submit the PM commands, and to move the WOL-enable back into the WOL path for that case. This is not much different from Bernie's patches but is a bit more upstream-friendly than checking the machine type on every suspend. I will first test Bernie's patches and see if I can reproduce the issue Sascha saw, as I would likely see the same issue with the other approach.
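Here is a rough sketch of the probe-time flag idea. It uses a hypothetical `struct sim_priv` in place of the driver's private structure and a stand-in `sim_host_sleep_cfg()` that only counts calls. The flag is decided once at probe. When the flag is clear, HOST_SLEEP_CFG is sent from the WOL (ethtool) path, as the old XO-1 code did, and the suspend path leaves the firmware alone. How the real driver would derive the flag from the firmware is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a few fields of the driver's private
 * structure; names are illustrative only. */
struct sim_priv {
	bool pm_cmds_ok;		/* set at probe: firmware takes suspend PM cmds */
	uint32_t wol_criteria;		/* last WOL setting from ethtool */
	int host_sleep_cfg_sent;	/* count of simulated HOST_SLEEP_CFG cmds */
};

/* Stand-in for sending HOST_SLEEP_CFG: records the call only. */
void sim_host_sleep_cfg(struct sim_priv *priv, uint32_t criteria)
{
	(void)criteria;
	priv->host_sleep_cfg_sent++;
}

/* Probe: decide once whether suspend/resume may issue PM commands. */
void sim_probe(struct sim_priv *priv, bool fw_supports_pm_cmds)
{
	assert(priv != NULL);
	priv->pm_cmds_ok = fw_supports_pm_cmds;
	priv->wol_criteria = 0;
	priv->host_sleep_cfg_sent = 0;
}

/* WOL path: without PM-command support, configure the firmware now
 * (the old XO-1 timing); otherwise just remember the setting. */
void sim_set_wol(struct sim_priv *priv, uint32_t criteria)
{
	priv->wol_criteria = criteria;
	if (!priv->pm_cmds_ok)
		sim_host_sleep_cfg(priv, criteria);
}

/* Suspend path: only touches the firmware when the flag allows it.
 * In this model it never fails, so it cannot abort system suspend. */
int sim_suspend(struct sim_priv *priv)
{
	if (priv->pm_cmds_ok)
		sim_host_sleep_cfg(priv, priv->wol_criteria);
	return 0;
}
```

The design point is that the check happens once, against the firmware rather than the machine type. This is what makes it friendlier to upstream than a per-suspend XO-1 test.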

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by sascha_silbe

Replying to dsaxena:

> By default, WOL is disabled, but if I run "ethtool -s eth0 wol u" and then "echo mem > /sys/power/state", the system will go to sleep and I can wake it up with a ping from a host. When we resume, I still see a "command 0x0043 failed" and the reason is b/c the WOL-disable command (EHS_REMOVE_WAKEUP) does not seem to be supported by the 8388 firmware. This is not fatal as the card keeps running.

If the command is not supported, why does the kernel issue it?

> The proper fix would be to update the userspace on XO-1 F11 to submit the proper WOL commands on suspend as we do on XO-1.5 but this still leaves us with the case where we are closing the lid and we want full WOL disable which we can't issue. :/ A workaround would be to unload the driver on lid close and re-load it on lid-open, which is reasonable as resume speed is not critical for this use case.

I'm a bit confused about the kernel/userspace interaction here, probably because I'm on Debian, not on Fedora. Can you please elaborate on what the user space parts are and what exactly they do on Fedora? Do you use powerd or OHM?

comment:11 in reply to: ↑ 9 Changed 8 years ago by Quozl

Replying to dsaxena:

> The proper fix would be to update the userspace on XO-1 F11 to submit the proper WOL commands on suspend as we do on XO-1.5 [...]

I dislike the idea of putting something in userspace that could be handled by the kernel, especially if a default behaviour is obvious.

Our kernel is used by distributions other than our own.

However, I can see your point; the kernel can't tell the difference (at the moment) between a suspend for idle and a suspend for lid-close, and we've given that job to userspace.

comment:12 Changed 8 years ago by bernie

  • Cc bernie added

Changed 8 years ago by dsaxena

Remove call to EHS_REMOVE_WAKEUP on firmware that does not support it

comment:13 Changed 8 years ago by dsaxena

I've uploaded a patch that fixes the issue without depending on XO-1 machine checks. I will commit it to the olpc-2.6.31-updates branch.

comment:14 in reply to: ↑ 10 Changed 8 years ago by dsaxena

  • Cc pgf added

Replying to sascha_silbe:

> Replying to dsaxena:

> > By default, WOL is disabled, but if I run "ethtool -s eth0 wol u" and then "echo mem > /sys/power/state", the system will go to sleep and I can wake it up with a ping from a host. When we resume, I still see a "command 0x0043 failed" and the reason is b/c the WOL-disable command (EHS_REMOVE_WAKEUP) does not seem to be supported by the 8388 firmware. This is not fatal as the card keeps running.

> If the command is not supported, why does the kernel issue it?

The code was written specifically to handle proper suspend/resume on XO-1.5, where the command is supported. Even in the old XO-1 code/kernel, we would still issue the command on "ethtool -s eth0 wol d"; it would just happen at the time of the user command rather than later in the suspend path.

> > The proper fix would be to update the userspace on XO-1 F11 to submit the proper WOL commands on suspend as we do on XO-1.5 but this still leaves us with the case where we are closing the lid and we want full WOL disable which we can't issue. :/ A workaround would be to unload the driver on lid close and re-load it on lid-open, which is reasonable as resume speed is not critical for this use case.

> I'm a bit confused about the kernel/userspace interaction here, probably because I'm on Debian, not on Fedora. Can you please elaborate on what the user space parts are and what exactly they do on Fedora? Do you use powerd or OHM?

We are currently using OHM and are working on switching to powerd. Paul (whom I'm cc:ing) can fill in the details.

comment:15 Changed 8 years ago by sascha_silbe

Thanks for the explanations; it's slowly starting to make sense. With your patches in olpc-2.6.31-updates, it works fine for the first suspend/resume cycle. But after that cycle WOL is disabled, so it no longer wakes up on the next suspend/resume cycle. What puzzles me is that resetting the WOL state seems to be intentional:

drivers/net/wireless/libertas/main.c:

{{{
int lbs_resume(struct lbs_private *priv)
{
	int ret;

	lbs_deb_enter(LBS_DEB_FW);

	priv->fw_ready = 1;
	priv->wol_criteria |= EHS_REMOVE_WAKEUP;

	ret = lbs_host_sleep_cfg(priv, priv->wol_criteria,
			(struct wol_config *)NULL);

	netif_device_attach(priv->dev);
	if (priv->mesh_dev)
		netif_device_attach(priv->mesh_dev);

	lbs_deb_leave_args(LBS_DEB_FW, "ret %d", ret);
	return ret;
}
}}}

After commenting out the priv->wol_criteria change and the lbs_host_sleep_cfg() call, it works as expected.
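The quoted `lbs_resume()` ORs EHS_REMOVE_WAKEUP into `priv->wol_criteria` itself, so the user's setting does not survive the first cycle. The minimal model below shows the difference. It assumes, for illustration only, that the remove value is all-ones. In that case the OR leaves nothing but the remove value behind, and the next suspend asks the firmware to disable wakeup. `SIM_WOL_REMOVE` and all other names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for EHS_REMOVE_WAKEUP; all-ones is an assumption here. */
#define SIM_WOL_REMOVE 0xFFFFFFFFu

struct sim_priv {
	uint32_t wol_criteria;	/* user's WOL setting (from ethtool) */
	uint32_t fw_criteria;	/* last value sent to the simulated firmware */
};

/* Stand-in for lbs_host_sleep_cfg(): records what was sent. */
void sim_host_sleep_cfg(struct sim_priv *priv, uint32_t criteria)
{
	assert(priv != NULL);
	priv->fw_criteria = criteria;
}

void sim_suspend(struct sim_priv *priv)
{
	sim_host_sleep_cfg(priv, priv->wol_criteria);
}

/* Shape of the quoted lbs_resume(): the remove flag is ORed into the
 * stored criteria, so the next suspend sends "remove" again. */
void sim_resume_quoted(struct sim_priv *priv)
{
	priv->wol_criteria |= SIM_WOL_REMOVE;
	sim_host_sleep_cfg(priv, priv->wol_criteria);
}

/* Alternative: pass the remove value as a one-off argument and leave
 * the user's setting in place for the next suspend. */
void sim_resume_preserving(struct sim_priv *priv)
{
	sim_host_sleep_cfg(priv, SIM_WOL_REMOVE);
}
```

Commenting out both lines, as above, drops the cancel-on-resume step entirely. The second variant keeps that step but stops clobbering the stored setting.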

comment:16 Changed 8 years ago by pgf

i had assumed it was meant to be this way -- powerd and ohmd both reinitialize the wake-on-lan options by calling ethtool before every suspend. but we should find out. it would be preferable to be able to skip that step, though in practice it might be safer not to (since the setting might have been changed "out of band", and might not get back in sync very soon). (for testing on XO-1, you can enable the ethtool calls in powerd for XO-1 by commenting out the obvious line in set_wake_on_wlan())
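Because powerd and ohmd re-apply the WOL setting through ethtool before each suspend, the kernel sees a fresh WAKE_* mask each time. The sketch below shows the translation ethtool performs on the `wol` option letters before passing that mask to the driver. It also includes a hypothetical policy helper for the lid-close case discussed earlier, which requests a full disable. The bit values mirror `<linux/ethtool.h>` but are redefined locally. `sim_parse_wol()` and `sim_wol_for_suspend()` are illustrative names, and only the classic option letters are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit values as WAKE_* in <linux/ethtool.h>, redefined locally
 * so the sketch builds without kernel headers. */
#define SIM_WAKE_PHY		(1u << 0)
#define SIM_WAKE_UCAST		(1u << 1)
#define SIM_WAKE_MCAST		(1u << 2)
#define SIM_WAKE_BCAST		(1u << 3)
#define SIM_WAKE_ARP		(1u << 4)
#define SIM_WAKE_MAGIC		(1u << 5)
#define SIM_WAKE_MAGICSECURE	(1u << 6)

/*
 * Translate a "wol" option string (as in "ethtool -s eth0 wol u")
 * into a WAKE_* mask. 'd' clears everything accumulated so far.
 * Returns 0 on success, -1 on an unknown letter (mask untouched).
 */
int sim_parse_wol(const char *opts, uint32_t *mask)
{
	assert(opts != NULL && mask != NULL);
	uint32_t m = 0;

	for (const char *p = opts; *p != '\0'; p++) {
		switch (*p) {
		case 'p': m |= SIM_WAKE_PHY; break;
		case 'u': m |= SIM_WAKE_UCAST; break;
		case 'm': m |= SIM_WAKE_MCAST; break;
		case 'b': m |= SIM_WAKE_BCAST; break;
		case 'a': m |= SIM_WAKE_ARP; break;
		case 'g': m |= SIM_WAKE_MAGIC; break;
		case 's': m |= SIM_WAKE_MAGICSECURE; break;
		case 'd': m = 0; break;
		default: return -1;
		}
	}
	*mask = m;
	return 0;
}

/* Hypothetical policy: keep the user's WOL letters for an idle
 * suspend, request a full disable ("d") when the lid is closed. */
const char *sim_wol_for_suspend(bool lid_closed, const char *user_opts)
{
	return lid_closed ? "d" : user_opts;
}
```

On XO-1, the full disable chosen for lid close is exactly the EHS_REMOVE_WAKEUP request that the 8388 firmware rejects. That is why unloading the driver on lid close was suggested as the workaround.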

comment:17 Changed 8 years ago by pgf

is there a kernel rpm for XO-1 built from the -updates branch i can try? i think this fix would make power management on XO-1 a lot more reliable, for the f11-on-xo1 builds. currently powerd gets really confused, because although it looks like the system slept and woke up, it didn't actually ever sleep completely, and the observed wakeup reason is left over from a previous wakeup. so powerd may do a shutdown, thinking the power button was pushed, when actually it was just a keypress after the power button caused a wakeup some time ago. (for example)

comment:18 Changed 8 years ago by pgf

dsaxena -- if you think this fix is correct, then i (or you) can move it to olpc-2.6.31.

comment:19 Changed 7 years ago by pgf

  • Action Needed changed from diagnose to add to build
  • Resolution set to fixed
  • Status changed from assigned to closed

i've reworked the code surrounding libertas commands during suspend/resume (8db05fd8) and changed the way in which we avoid the failing EHS_REMOVE_WAKEUP setting (515dc731). i believe this bug is fixed.

comment:20 Changed 7 years ago by Quozl

  • Action Needed changed from add to build to no action
  • Milestone changed from 1.0-software-later to 10.1.2

comment:21 Changed 2 years ago by Quozl

  • Milestone 10.1.2 deleted

