Ticket #7922 (new defect)

Opened 6 years ago

Last modified 6 years ago

hang on suspend when USB-ethernet is connected

Reported by: dsd
Owned by: dsaxena
Priority: normal
Milestone: 8.2.0 (was Update.2)
Component: kernel
Version: not specified
Keywords: joyride-2281:- blocks-:8.2.0 relnote
Cc: cjb, gregorio, mikus
Action Needed: never set
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

I'm running an XO with joyride 2281 and a USB ethernet adapter plugged in. When I hit the power button to suspend it, the system crashes somewhere in the suspend routine, before the power light goes out.

Here are the final log messages from serial console:

[  830.949080] olpc-ec:  received 0x92                                          
[  830.949186] olpc-ec:  running cmd 0x1b                                       
[  830.958978] olpc-ec:  sending cmd arg 0x90                                   
[  830.959847] PM: Syncing filesystems ... done.                                
[  830.965290] PM: Preparing system for mem sleep                               
[  830.969358] Freezing user space processes ... (elapsed 0.00 seconds) done.   
[  830.984952] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.                                                                              
[  830.992611] PM: Entering mem sleep                                           
[  830.996022] Suspending console(s)  

Power to the usb ethernet adapter remains in place - the blinky lights continue to blink even at the point when the system has hung.

Unsurprisingly it is not possible to resume the system.

This is reproducible every time, and the system seems to suspend fine without the USB ethernet adapter connected. The adapter is one of the XO-branded USB ethernet adapters floating around the office (it uses the asix driver).

Change History

  Changed 6 years ago by dsd

  • keywords blocks?:8.2.0 added

  Changed 6 years ago by brian

Turning off suspend (baby-out-with-the-bathwater solution):

http://wiki.laptop.org/go/Tests/Suspend_Resume#Turning_off_suspend.2Fresume

  Changed 6 years ago by dsaxena

Daniel, I don't have an ethernet dongle ATM, so can you boot with no_console_suspend on the kernel command line so we can get some more data?
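
One way to confirm the flag actually took effect after rebooting is to look for it in /proc/cmdline. A minimal sketch follows; the helper name has_kernel_arg and its optional file argument are made up for illustration and are not part of any OLPC tool.

```shell
# Succeeds if kernel argument $1 appears as a whole word in the
# command-line file $2 (default: /proc/cmdline).
# has_kernel_arg is a hypothetical helper name, not an existing tool.
has_kernel_arg() {
    # Split the space-separated command line into one argument per line,
    # then look for an exact, fixed-string match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

On the XO this would be used as something like: has_kernel_arg no_console_suspend && echo "flag is set"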

  Changed 6 years ago by dsd

That doesn't add any new messages, it just makes the "Suspending console(s) " line go away.

When it is then hung in that state, if I press the power button I get "olpm-pm: PM_PWRBTN event received" messages. So it's not completely hung.

  Changed 6 years ago by cscott

  • cc cjb added

Is an ohm workaround possible?

follow-up: ↓ 12   Changed 6 years ago by cjb

We have an open feature request for "don't suspend when a USB keyboard or ethernet adaptor is plugged in", which would be a workaround. It's not ready for code writing yet, though.

I think we need kernel work for this. I want to know what's crashing and whether anything else might crash in the same way.

  Changed 6 years ago by cjb

Please retry adding "echo 9 > /proc/sys/kernel/printk" before the suspend.
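
To double-check that the setting stuck, the first field of /proc/sys/kernel/printk is the current console loglevel. A rough sketch, with helper names (console_loglevel, loglevel_at_least) invented here for illustration:

```shell
# Print the current console loglevel: the first of the four fields in
# the printk sysctl file ($1, default /proc/sys/kernel/printk).
console_loglevel() {
    awk '{ print $1; exit }' "${1:-/proc/sys/kernel/printk}"
}

# Succeed only if the console loglevel is at least $1.
# A level of 8 or more lets every message through, including debug ones.
# $2 optionally names the file to read, as above.
loglevel_at_least() {
    [ "$(console_loglevel "$2")" -ge "$1" ]
}
```

For example: loglevel_at_least 8 || echo "console loglevel is too low to show debug messages"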

  Changed 6 years ago by dsd

I already had that set.

  Changed 6 years ago by cjb

> I already had that set.

Then we should be seeing many more messages at resume. Can you attach a full log?

  Changed 6 years ago by cjb

.. oh, it crashes at suspend. Still, please do attach a full log.

  Changed 6 years ago by cjb

There are other asix ethernet dongles (the chunky dlink ones) around the office too; we should try with those as well to see whether it's device-specific.

in reply to: ↑ 6   Changed 6 years ago by dsaxena

Replying to cjb:

> We have an open feature request for "don't suspend when a USB keyboard or ethernet adaptor is plugged in", which would be a workaround. It's not ready for code writing yet, though.

I don't know that we want to code that workaround as it should not be default behaviour. Someone who doesn't want suspend/resume should just disable it from control-panel and we document that as a workaround.

> I think we need kernel work for this. I want to know what's crashing and whether anything else might crash in the same way.

+1

follow-up: ↓ 15   Changed 6 years ago by cjb

> Someone who doesn't want suspend/resume should just disable it from control-panel and we document that as a workaround.

We don't have a control panel option for disabling suspend/resume on lid close or power button press, just for disabling idle suspend.

  Changed 6 years ago by dsd

full log:

[ 2399.203070] olpm-pm:  PM_PWRBTN event received                     
[ 2399.207772] olpc-ec:  running cmd 0x26                                       
[ 2399.213070] olpc-ec:  sending cmd arg 0x0                                    
[ 2399.223070] olpc-ec:  running cmd 0x1c                                       
[ 2399.223070] olpc-ec:  received 0xff                                          
[ 2399.223173] olpc-ec:  running cmd 0x1b                                       
[ 2399.232961] olpc-ec:  sending cmd arg 0xdf                                   
[ 2399.243070] olpc-ec:  running cmd 0x1c                                       
[ 2399.243070] olpc-ec:  received 0xdf                                          
[ 2399.243180] olpc-ec:  running cmd 0x1b                                       
[ 2399.252965] olpc-ec:  sending cmd arg 0xde                                   
[ 2399.263070] olpc-ec:  running cmd 0x1c                                       
[ 2399.263070] olpc-ec:  received 0xde                                          
[ 2399.263195] olpc-ec:  running cmd 0x1b                                       
[ 2399.272981] olpc-ec:  sending cmd arg 0x9e                                   
[ 2399.283070] olpc-ec:  running cmd 0x1c                                       
[ 2399.283070] olpc-ec:  received 0x9e                                          
[ 2399.283221] olpc-ec:  running cmd 0x1b                                       
[ 2399.293011] olpc-ec:  sending cmd arg 0x96                                   
[ 2399.303070] olpc-ec:  running cmd 0x1c                                       
[ 2399.303070] olpc-ec:  received 0x96                                          
[ 2399.303192] olpc-ec:  running cmd 0x1b                                       
[ 2399.312984] olpc-ec:  sending cmd arg 0x92                                   
[ 2399.323070] olpc-ec:  running cmd 0x1c                                       
[ 2399.323070] olpc-ec:  received 0x92                                          
[ 2399.323167] olpc-ec:  running cmd 0x1b                                       
[ 2399.332958] olpc-ec:  sending cmd arg 0x90                                   
[ 2399.333827] PM: Syncing filesystems ... done.                                
[ 2399.339235] PM: Preparing system for mem sleep                               
[ 2399.343345] Freezing user space processes ... (elapsed 0.00 seconds) done.   
[ 2399.361311] Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.                                                                              
[ 2399.366608] PM: Entering mem sleep                                           

in reply to: ↑ 13   Changed 6 years ago by dsaxena

Replying to cjb:

> > Someone who doesn't want suspend/resume should just disable it from control-panel and we document that as a workaround.
>
> We don't have a control panel option for disabling suspend/resume on lid close or power button press, just for disabling idle suspend.

How hard would it be to add an option that is the opposite of "extreme power management", that is, a "no power management" option that simply does not do anything?

  Changed 6 years ago by cjb

> How hard would it be to add an option that is the opposite of "extreme power management", that is, a "no power management" option that simply does not do anything?

We're past feature and string freeze (and about to hit code freeze) so that doesn't sound likely. touch /etc/ohm/inhibit-suspend is your suspend inhibit API.
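
For anyone scripting around that flag file, a minimal sketch could look like the following. The function names are invented for illustration, and it is only an assumption that removing the file re-enables suspend; the thread confirms the touch half only.

```shell
# Path of the flag file named in this ticket; each helper accepts an
# alternative path as $1 so it can be tried without touching /etc.
OHM_INHIBIT_FLAG=/etc/ohm/inhibit-suspend

# Create the flag file (this is the documented "inhibit" step).
inhibit_suspend() { touch "${1:-$OHM_INHIBIT_FLAG}"; }

# ASSUMPTION: deleting the flag file re-enables suspend.
allow_suspend() { rm -f "${1:-$OHM_INHIBIT_FLAG}"; }

# Succeeds while the flag file exists.
suspend_inhibited() { [ -e "${1:-$OHM_INHIBIT_FLAG}" ]; }
```

Writing to the real path under /etc will need root.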

  Changed 6 years ago by cjb

I'm not getting a hang with a "pegasus" usb1 ethernet device, fwiw.

  Changed 6 years ago by gregorio

  • cc gregorio added

Hi Chris,

What is the next step on this one? Can we close it as fixed or unreproducible?

Thanks,

Greg S

  Changed 6 years ago by dsd

It's 100% reproducible, so I don't think we can close it.

  Changed 6 years ago by cjb

I think the best we could do is find out that either:

  • the bug only happens on that particular instance of the asix chipset, in which case it might be unlikely to hit it in the field, assuming we aren't going to be providing those adapters to people.
  • the bug is also exhibited on non-XO machines with an upstream kernel, meaning that fixing it is less of our responsibility.

If we can't prove either of these true, I think we need to fix this soon.

  Changed 6 years ago by gregorio

  • keywords blocks-:8.2.0 relnote added; blocks?:8.2.0 removed

  Changed 6 years ago by pgf

i only just re-found this bug. i've been suspending and resuming using an XO-branded asix adapter on two different laptops for two days now. (i'm trying to figure out what keeps the ethernet from resuming correctly (i think it's the lack of "reset_resume" functionality).) but i definitely don't get the crash/hang that dsd has seen.

(running 8.2-759)

  Changed 6 years ago by pgf

i take it back. i may have just reproduced it. system hung, power light still on. no response to power button.

eventually tried pulling the ethernet dongle, and retried the power button. system tries to come up (screen is restored) but immediately suspends again. since the power light was still on when i pulled the ethernet adapter, i don't know if pulling it let the power go out (i.e., let it finish suspending) or if a new press of the button caused it to go out.

oh -- pressing the power button then using the keyboard while the screen was (very briefly) on caused the system to stay on.

in any case, i think the original bug (i.e. hang/crash during suspend) is not "100% reproducible". too bad. :-/

  Changed 6 years ago by cjb

Just tested in a master build, and it doesn't crash, which means this was fixed upstream in the meantime somewhere. If someone cares enough we can backport the patch, else just wait for 9.1 and master.

  Changed 6 years ago by leejc

FYI, I'm not sure if it's the same issue, but I'm seeing similar behavior for my Linksys USB200M ethernet adapter running under 8.2.0-767. If I have the adapter in during boot, the XO comes up just fine and I can access the net through the adapter... provided I keep interacting with the system. If I leave it alone for a minute or two, however, the system locks up; it continues to show the same thing on the screen, but the keyboard and mouse are unresponsive. At that point, the only thing I can do is hold down the power button until the system turns off.

follow-up: ↓ 27   Changed 6 years ago by dsaxena

I have reproduced this on 767 though it is intermittent and I have yet to figure out what conditions cause it. Once we know what is causing it, we can try to reproduce on Joyride (2.6.27) to see if the problem still exists.

in reply to: ↑ 26   Changed 6 years ago by cjb

Replying to dsaxena:

> I have reproduced this on 767 though it is intermittent and I have yet to figure out what conditions cause it. Once we know what is causing it, we can try to reproduce on Joyride (2.6.27) to see if the problem still exists.

I already did this test, with the results above:

> cjb:
>
> Just tested in a master build, and it doesn't crash, which means this was fixed upstream in the meantime somewhere.

So, it is fixed upstream. I think we probably don't care about fixing it in 8.2. This bug could be closed WONTFIX.

follow-up: ↓ 29   Changed 6 years ago by gregorio

  • cc mikus added

Hi Chris,

If I read you right, it sounds like this is going to be fixed in a future release by taking the "upstream". I think that means it will be fixed instead of wontfix, right?

Just a question of semantics.

The only reason I am commenting on this one is that I believe it was a hot item for Mikus. He has been so productive in testing for us that I think we should resolve it by 9.1 and thereby hopefully get Mikus to help testing power save modes for that release.

Thanks,

Greg S

in reply to: ↑ 28 ; follow-up: ↓ 30   Changed 6 years ago by cjb

Hi Greg,

Replying to gregorio:

> If I read you right, it sounds like this is going to be fixed in a future release by taking the "upstream". I think that means it will be fixed instead of wontfix, right?

Yes, it is *already* fixed, in Joyride-which-will-become-9.1.

I'd use "WONTFIX" because there's no action we're going to take to fix the bug against the milestone it was filed against. Closing as fixed is okay too, but we can't really do that until we make the release that contains the fix, I guess.

in reply to: ↑ 29   Changed 6 years ago by mikus

Replying to cjb:

> > If I read you right, it sounds like this is going to be fixed in a future release by taking the "upstream". I think that means it will be fixed instead of wontfix, right?
>
> Yes, it is *already* fixed, in Joyride-which-will-become-9.1. I'd use "WONTFIX" because there's no action we're going to take to fix the bug against the milestone it was filed against. Closing as fixed is okay too, but we can't really do that until we make the release that contains the fix, I guess.


Let me suggest updating Trac (this wiki) to add a new "resolution" to the 'Action' panel - "accepted upstream".

To me, "wontfix" sounds definitely "cold shoulder".

  Changed 6 years ago by mikus


OFF TOPIC - Metadiscussion follows:

I had not cc'd this ticket because I was not experiencing the exact original description. [I went through similar hair-splitting back in spring 2008, when I wrote a ticket about not being able to connect to a Jabber server. The specific problem was found and fixed. Me *still* not being able to connect from home to a Jabber server was shelved. (These days, if I want to connect to a Jabber server, I go to someplace with an AP.)]

So "fixing" (or "upstreaming") ticket #7922 does not directly affect me, because I never noticed the specific symptom described by #7922. Up to this moment, I've been using Ticket #5990 as the "carrier" of the problem I do experience -- if I have ethernet working, and do a suspend/resume, I no longer have ethernet working. [Perhaps I've not seen #7922 because I normally run with suspend inhibited for this reason, and thus am not looking for whatever happens minutes after a resume.]


To get back to the meta-discussion, it may be that whatever the "current solution" for #7922 is, will also be considered to close #5990. If so, I'll just open a new ticket for no longer having an ethernet connection after suspend/resume -- as far as I can tell, even with the latest Joyride (0.83), although the ethernet adapter gets powered-on by resume, and although the 'eth0' interface is re-established, that interface is given an IPv6 address. Before the suspend, my connection was running with an IPv4 address. If resume now assigns an IPv6 address, that result leaves my XO without a working ethernet connection.
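
A quick way to capture that symptom for a new ticket would be to check, before and after resume, whether eth0 still carries an IPv4 address. This is only a sketch: it assumes the iproute2 ip tool is available on the build, and has_ipv4 is a name invented here.

```shell
# Reads the output of `ip -o addr show dev IFACE` on stdin and succeeds
# if at least one line describes an IPv4 ("inet") address.
# In the one-line format the third field is the address family, so
# "inet6" lines do not match.
has_ipv4() {
    awk '$3 == "inet" { found = 1 } END { exit !found }'
}
```

For example: ip -o addr show dev eth0 | has_ipv4 || echo "eth0 has no IPv4 address"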
