Opened 7 years ago

Closed 6 years ago

#6586 closed defect (fixed)

SCAN command fails, timer doesn't fire

Reported by: dwmw2
Owned by: dwmw2
Priority: blocker
Milestone: Update.1
Component: wireless
Version:
Keywords: release?
Cc: mblestas, yani, carrano, kim, jg, cscott, dwmw2, ashish, mstone, mtd
Blocked By:
Blocking:
Deployments affected:
Action Needed:
Verified: no

Description

In testing we've seen that sometimes the command queue gets full. This shouldn't happen -- we set a timer to resubmit commands, and then reset the device completely, when commands don't complete. But for some reason it doesn't fire.

We think it's a SCAN command because when we unload the module, we get a 'SCAN failed' message as it's cancelled.

Obviously there's an underlying firmware bug, but there's also a driver bug -- I don't understand why the timer isn't firing. Have added some debugging messages when the timer is armed and cancelled; would like to see debug output while this happens.
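
For readers unfamiliar with the recovery policy described above (resubmit a command when its timer expires, then reset the device if it still does not complete), here is a minimal sketch in C. It is not the libertas code: cmd_state, on_cmd_timeout and the retry budget are hypothetical names and values chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical model of the timeout policy; not taken from the driver. */
enum timeout_action { ACTION_REQUEUE, ACTION_RESET_DEVICE };

struct cmd_state {
    int retries;     /* timeouts seen so far for the current command */
    int max_retries; /* assumed retry budget before a full device reset */
};

/* Decide what to do when the command timer expires. */
enum timeout_action on_cmd_timeout(struct cmd_state *st)
{
    assert(st->max_retries >= 0);

    if (st->retries < st->max_retries) {
        st->retries++;          /* resubmit the same command */
        return ACTION_REQUEUE;
    }

    st->retries = 0;            /* budget exhausted: start over after reset */
    return ACTION_RESET_DEVICE;
}
```

With a budget of two retries, the first two timeouts requeue the command and the third asks for a reset. None of this helps, of course, if the timer never fires or if nothing acts on the requeued command.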

Attachments (2)

iwlist.dump (189 bytes) - added by yani 7 years ago.
logs.CSN7440001C.2008-03-11.15-47-18.tar.bz2 (23.6 KB) - added by yani 7 years ago.


Change History (12)

comment:1 Changed 7 years ago by dwmw2

  • Component changed from distro to wireless
  • Owner changed from jg to dwmw2

comment:2 Changed 7 years ago by dwmw2

  • Cc giannisgalanis@… added
  • Status changed from new to assigned

I've been unable to reproduce this -- perhaps because our testing is now focused on unicast communication with the school server rather than multicast communication, so the network is less busy. I cannot see why the command timer would ever fail to fire -- I need to catch it in the act with debugging enabled. I don't think I'll manage that at home though...

comment:3 Changed 7 years ago by yani

  • Cc mblestas yani carrano kim jim cscott added; giannisgalanis@… removed
  • Priority changed from normal to blocker

We had several instances of this bug with 698.

overview:
Couldn't connect to any mesh
iwlist displayed "Failed to read scan data : Resource temporarily unavailable"
rmmod usb8xxx displayed "libertas: SCAN_CMD failed"

It happened on 3 XOs:

1) worked fine; clicked register to connect to jabber; when I rebooted, it had the bug

2) worked fine (connection to the net was confirmed); left alone overnight; the bug was seen the next morning

3) rebooted; it had the bug right from the start

I also saw something similar (the same iwlist output) some weeks ago.

Changed 7 years ago by yani

comment:4 Changed 7 years ago by jg

  • Cc jg dwmw2 added; jim removed
  • Milestone changed from Never Assigned to Update.1

comment:5 Changed 7 years ago by dwmw2

Those logs lack the debugging output, and don't shed much light.

They do show this:

[   77.298195] libertas: Command 16 timed out
[   77.298237] libertas: requeueing command 16 due to timeout (#1)

... but no indication of whether the command eventually completes or not, due to the lack of debugging. I'll take another look at the timeout code, which I tested and verified to be working a few weeks ago, but I don't think I'll magically see the problem this time; I need it reproduced with debug logs.

comment:6 Changed 7 years ago by Blaketh

  • Keywords release? added

comment:7 Changed 7 years ago by ashish

  • Cc ashish added

I'm not sure whether the following observations are related to this, but they are worth investigating.

Sometimes, while sending down a command or data, the tx_urb callback does not get called, which leaves priv->dnld_sent set forever.
As long as it is set, the libertas driver does not send any data packet or command down to the firmware, which is the expected behavior.
When this happened, nothing was pending in the firmware, the firmware-to-host path was still up, and tx_urb->status was stuck at EINPROGRESS.
The following can be used to reproduce it, although not reliably.

XO1 and XO2 on channel 1. AP configured on channel 6.
XO1:
ping <XO2>
while true; do iwconfig eth0 channel 1; sleep 1; iwconfig eth0 mode mana essid AP; iwconfig; sleep 1; done


After a while, iwconfig stops returning any result and tx_urb->status is stuck at EINPROGRESS (115).

comment:8 Changed 6 years ago by mstone

  • Cc mstone added

comment:9 Changed 6 years ago by mtd

  • Cc mtd added

comment:10 Changed 6 years ago by bcavagnolo

  • Resolution set to fixed
  • Status changed from assigned to closed

Hello,

I managed to reproduce the problem by forcing the driver to drop an occasional command packet destined for the USB. As observed in some of the above posts, the dmesg output reveals that the timer is in fact firing. The problem was that the dnld_sent state variable was not being updated after the timer expired, so lbs_execute_next_command was not being called. I submitted a patch to devel .AT. laptop.org. See http://lists.laptop.org/pipermail/devel/2008-May/014035.html
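
To make the failure mode above concrete, here is a small stand-alone model in C. It is only a sketch and does not reproduce the submitted patch: model_priv, service_queue and command_timeout are hypothetical names, and only dnld_sent and the role of lbs_execute_next_command come from this ticket. The clear_dnld_sent flag switches between the behaviour described as broken and the behaviour assumed to be the fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the driver state; only dnld_sent is named
 * in the ticket, the other fields are assumptions for the model. */
struct model_priv {
    bool dnld_sent; /* a transfer to the firmware is believed in flight */
    int pending;    /* commands waiting in the queue */
    int submitted;  /* commands handed to the modelled USB layer */
};

/* Models the gate in front of lbs_execute_next_command: nothing is
 * submitted while dnld_sent is still set. */
void service_queue(struct model_priv *p)
{
    assert(p->pending >= 0);

    if (p->dnld_sent || p->pending == 0)
        return;

    p->pending--;
    p->submitted++;
    p->dnld_sent = true;
}

/* Models the command timer expiring: the command is requeued, and
 * dnld_sent is cleared only when clear_dnld_sent is true. */
void command_timeout(struct model_priv *p, bool clear_dnld_sent)
{
    p->pending++;
    if (clear_dnld_sent)
        p->dnld_sent = false;
    service_queue(p);
}
```

In this model, a timeout with clear_dnld_sent set to false leaves the requeued command sitting in the queue forever, which matches the "timer fires but nothing happens" symptom. With it set to true, the requeued command is submitted again.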
