Ticket #9744 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

XO-1.5 WLAN errors after multiple suspend/resumes

Reported by: wad
Owned by: mbletsas
Priority: high
Milestone: 1.5-software-later
Component: wireless
Version: Development build as of this date
Keywords: XO-1.5 WLAN
Cc: rchokshi, cjb, edmcnierney
Action Needed: diagnose
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description (last modified by cjb)

On an XO-1.5 B3 prototype (#17), running Q3A16 and os45, I had WLAN errors after multiple suspend/resumes.

The laptop was modified with the XO1.5 WLAN SR ECO:

http://wiki.laptop.org/go/XO1.5_WLAN_SR_ECO

The laptop was set to autowack by adding:

50 autowack-delay

autowack-on

to the olpc.fth file.

It was associated with an access point.

From the terminal activity, I enabled EC wakeup events using:

sudo sh

echo EC > /proc/acpi/wakeup

I then started a script (dosr) running:

for i in $(seq 1 100000); do echo $i; sleep 1; echo mem > /sys/power/state; done

After twenty cycles, I stopped it, checked the interface using iwconfig, and could ping the gateway just fine.

I restarted it, and after ten or so cycles I noticed that the WLAN LED was flashing very quickly. I stopped the script, and when I typed iwconfig it took many seconds to respond. This happened repeatedly.

This is easy to reproduce.
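
A minimal sketch of how the loop above could be made to stop by itself at the first sign of trouble, instead of being interrupted by hand. The function names, the 5-second limit, and the STATE_FILE override are assumptions for illustration and were not part of the original test. It must run as root, like the original script.

```shell
#!/bin/sh
# Hypothetical helpers; names and thresholds are illustrative assumptions.

# Return 0 if the given command finishes within LIMIT seconds, else 1.
# Only elapsed time is checked; the command's own exit status is ignored.
# Usage: responds_within LIMIT command [args...]
responds_within() {
    limit=$1; shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    [ $((end - start)) -le "$limit" ]
}

# Suspend/resume up to COUNT times. Stop as soon as iwconfig becomes
# slow or the gateway no longer answers a single ping.
# Usage: sr_loop COUNT GATEWAY_IP
sr_loop() {
    count=$1
    gateway=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$i"
        sleep 1
        echo mem > "${STATE_FILE:-/sys/power/state}"
        if ! responds_within 5 iwconfig; then
            echo "cycle $i: iwconfig slow" >&2
            return 1
        fi
        if ! ping -c 1 -W 5 "$gateway" >/dev/null 2>&1; then
            echo "cycle $i: gateway unreachable" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

It would be invoked as, for example, sr_loop 100000 followed by the gateway address. A single missed ping right after resume may only mean the link has not settled yet, so a failure here is a prompt to look closer rather than proof of the bug.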

Attachments

slow_log (152.1 kB) - added by wad 4 years ago.
This is the serial log of a laptop which is having problems with the WLAN, including a dmesg dump which contains the entire episode.

Change History

Changed 4 years ago by wad

This is the serial log of a laptop which is having problems with the WLAN, including a dmesg dump which contains the entire episode.

  Changed 4 years ago by wad

Sorry, but the initial report is wrong (thanks to cut & paste). I didn't have a reboot instead of a resume. I had a very slow (if functional at all) WLAN after some number of suspend/resume cycles.

  Changed 4 years ago by cjb

  • description modified

  Changed 4 years ago by cjb

  • cc rchokshi added; ronak removed

Well, that's all extremely broken. I'm hoping this is the mdelay()s too.

  Changed 4 years ago by Quozl

  • milestone changed from Not Triaged to 1.5-software-beta

triage.

  Changed 4 years ago by cjb

please test with os46

  Changed 4 years ago by wad

I've seen a laptop run for 32K S/R cycles while not associated, and it didn't lose the WLAN. Running while associated (and pinging) causes the WLAN to be lost after between one minute and twenty minutes of S/R cycling.

  Changed 4 years ago by dsaxena

Reproduced on my B2 with WLAN_EN on EC ECO.

Also just tried running the loop continuously w/o stopping in the middle, and after about 70 cycles the card just disappeared (that is, it does not even show up in 'iwconfig' or in /sys/bus/sdio/devices).
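
A rough sketch of a check for the "card disappeared" case, assuming it is enough to count entries under the sysfs directory named above. The function name is hypothetical, and the count covers every SDIO function on the bus, not only the WLAN.

```shell
#!/bin/sh
# Hypothetical helper: print how many entries exist in an SDIO device
# directory. Defaults to the sysfs path mentioned in the comment above.
# Prints 0 when the directory is empty or missing.
sdio_device_count() {
    dir=${1:-/sys/bus/sdio/devices}
    n=0
    for entry in "$dir"/*; do
        # An unmatched glob stays literal, so test that the entry exists.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling it after each resume and stopping the loop when it prints 0 would record the cycle at which the card vanished.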

  Changed 4 years ago by dsaxena

  • cc cjb, edmcnierney added

I've been running lots of S/R loops and grabbing the console logs to see if I can find a pattern. I've got some info, but I really need to look at what is happening on the firmware side and correlate it with the kernel to move forward.

One thing I have determined is that every time the WLAN card stops responding, we see the following message very close to when the WLAN starts misbehaving. Either we get a timeout on CMD 0x0006 (SCAN), requeue it, and later start seeing the list_add() and list_del() messages, or we see the following and then immediately see the list corruption warning.

[  383.770098] mmc2: Timeout waiting for hardware interrupt.
[  383.775777] sdhci: ============== REGISTER DUMP ==============
[  383.780085] sdhci: Sys addr: 0x00000000 | Version:  0x00000000
[  383.780085] sdhci: Blk size: 0x00000000 | Blk cnt:  0x00000000
[  383.780085] sdhci: Argument: 0x10004000 | Trn mode: 0x00000000
[  383.780085] sdhci: Present:  0x01f70000 | Host ctl: 0x00000001
[  383.780085] sdhci: Power:    0x0000000f | Blk gap:  0x00000000
[  383.780085] sdhci: Wake-up:  0x00000000 | Clock:    0x00000107
[  383.780085] sdhci: Timeout:  0x00000000 | Int stat: 0x00000000
[  383.780085] sdhci: Int enab: 0x00ff0103 | Sig enab: 0x00ff0103
[  383.780085] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000
[  383.780085] sdhci: Caps:     0x056030b0 | Max curr: 0x00f001f0
[  383.780085] sdhci: ===========================================
[  383.851333] mmc2:0001:1: resume: we're back

Overall, what we're seeing with WLAN is very similar to #7458 on XO-1 except that it happens much more quickly (which, on the plus side, does make it easier to reproduce and debug).
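
To help look for the pattern across many captured logs, something like the following could pull out the kernel timestamp of each interrupt timeout. This is only a sketch: the function name is made up, and the match is keyed to the exact message text shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel timestamp of every
# "Timeout waiting for hardware interrupt" line from an mmc host.
# Reads the named log files, or standard input when none are given.
sdhci_timeouts() {
    sed -n 's/^\[ *\([0-9.]*\)] mmc[0-9]*: Timeout waiting for hardware interrupt.*/\1/p' "$@"
}
```

Run against a saved dmesg or serial log, it prints one timestamp per timeout, which can then be lined up against the resume messages.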

  Changed 4 years ago by culseg

With Power Auto Mgm unchecked in os50 I get wireless disconnects after about 2 minutes, regardless of whether I access any network or press any keys. It was stable in os48 for 1-2 days.

These are full restarts/reboots, so perhaps this should be a different Trac ticket.

See: http://pastebin.com/d1136377b

  Changed 4 years ago by culseg

Possible wireless and touchpad connection found: os50

Wireless stays on if I keep making small or large circles on the touchpad continuously for over 5 minutes. The wireless LED lights, even blinks, but stays on, and the icon shows connectivity.

Soon after stopping active contact with touchpad, wireless fails.

  Changed 4 years ago by cjb

It must be suspending.

  Changed 4 years ago by dsaxena

Just saw this during one of my tests, possibly related.

[  672.412656] mmc1: Controller never released inhibit bit(s).                  
[  672.412656] sdhci: ============== REGISTER DUMP ==============               
[  672.412656] sdhci: Sys addr: 0xffffffff | Version:  0x0000ffff               
[  672.412656] sdhci: Blk size: 0x0000ffff | Blk cnt:  0x0000ffff               
[  672.412656] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff               
[  672.412656] sdhci: Present:  0xffffffff | Host ctl: 0x000000ff               
[  672.412656] sdhci: Power:    0x000000ff | Blk gap:  0x000000ff               
[  672.412656] sdhci: Wake-up:  0x000000ff | Clock:    0x0000ffff               
[  672.412656] sdhci: Timeout:  0x000000ff | Int stat: 0xffffffff               
[  672.412656] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff               
[  672.412656] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff               
[  672.412656] sdhci: Caps:     0xffffffff | Max curr: 0xffffffff               
[  672.412656] sdhci: ===========================================               
[  672.502468] mmc1: Reset 0x2 never completed.                                 
[  672.502468] sdhci: ============== REGISTER DUMP ==============               
[  672.502468] sdhci: Sys addr: 0xffffffff | Version:  0x0000ffff               
[  672.502468] sdhci: Blk size: 0x0000ffff | Blk cnt:  0x0000ffff               
[  672.502468] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff               
[  672.502468] sdhci: Present:  0xffffffff | Host ctl: 0x000000ff               
[  672.502468] sdhci: Power:    0x000000ff | Blk gap:  0x000000ff               
[  672.502468] sdhci: Wake-up:  0x000000ff | Clock:    0x0000ffff               
[  672.502468] sdhci: Timeout:  0x000000ff | Int stat: 0xffffffff               
[  672.502468] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff               
[  672.502468] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff               
[  672.502468] sdhci: Caps:     0xffffffff | Max curr: 0xffffffff               
[  672.502468] sdhci: ===========================================               
[  672.502468] mmc1: Reset 0x4 never completed.                                 
[  672.502468] sdhci: ============== REGISTER DUMP ==============               
[  672.502468] sdhci: Sys addr: 0xffffffff | Version:  0x0000ffff               
[  672.502468] sdhci: Blk size: 0x0000ffff | Blk cnt:  0x0000ffff               
[  672.502468] sdhci: Argument: 0xffffffff | Trn mode: 0x0000ffff               
[  672.502468] sdhci: Present:  0xffffffff | Host ctl: 0x000000ff               
[  672.502468] sdhci: Power:    0x000000ff | Blk gap:  0x000000ff               
[  672.502468] sdhci: Wake-up:  0x000000ff | Clock:    0x0000ffff               
[  672.502468] sdhci: Timeout:  0x000000ff | Int stat: 0xffffffff               
[  672.502468] sdhci: Int enab: 0xffffffff | Sig enab: 0xffffffff               
[  672.502468] sdhci: AC12 err: 0x0000ffff | Slot int: 0x0000ffff               
[  672.502468] sdhci: Caps:     0xffffffff | Max curr: 0xffffffff               
[  672.502468] sdhci: =========================================== 
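
In the dumps above every register reads back as all ones at its printed width (0xffffffff, 0x0000ffff or 0x000000ff). A small sketch for flagging such dumps in a log follows. The function name is hypothetical, the three accepted values are taken from this dump only, and reading all ones as "controller not responding" is an assumption that this ticket does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the input contains sdhci register
# values and every one of them is all ones at its printed width.
# Reads the named log files, or standard input when none are given.
dump_is_all_ones() {
    awk '
        /sdhci: .*0x[0-9a-fA-F]+/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^0x[0-9a-fA-F]+$/) {
                    total++
                    if ($i !~ /^0x(ffffffff|0000ffff|000000ff)$/)
                        other++
                }
        }
        END { exit !(total > 0 && other == 0) }
    ' "$@"
}
```

Feeding it a single register dump at a time keeps the answer meaningful, since one healthy dump anywhere in the input makes it fail.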

  Changed 4 years ago by culseg

After the quickly dropped wireless with a briefly flashing light (unless the touchpad was kept active), I tested re-enabling "Power Auto Mgm" and restarting, then unchecking the same power management option and restarting. Now wireless appears to be back and stable.

The only other anomaly is that the Record LED has stayed on, stopping Record.

On os51 with ver 18 rom

  Changed 4 years ago by Quozl

Reproduced the loss of wireless despite Automatic power management being disabled. I'll raise a new ticket for it.

Sandy, it was quite definitely suspending despite Automatic power management being off. The evidence you gave in IRC confirms this.

I've just got a unit to do this. The checkbox is off, yet the unit does an idle dim and suspend.

I'm fairly sure it is as a result of installing os50, turning off the checkbox, olpc-update to os51, then rebooting. I'll try to reproduce it carefully.

  Changed 4 years ago by cjb

Replying to Quozl:

I'm fairly sure it is as a result of installing os50, turning off the checkbox, olpc-update to os51, then rebooting. I'll try to reproduce it carefully.

Yep, I think that might do it. Please file a separate OHM bug.

  Changed 4 years ago by Quozl

The idle suspend in violation of configuration settings has been moved to new ticket #9802.

This ticket #9744 remains WLAN errors after multiple suspend and resumes.

  Changed 4 years ago by cjb

  • status changed from new to closed
  • resolution set to fixed

Closing as not reproducible with current hardware, yell if incorrect.
