Ticket #6586 (closed defect: fixed)

Opened 7 years ago

Last modified 6 years ago

SCAN command fails, timer doesn't fire

Reported by: dwmw2
Owned by: dwmw2
Priority: blocker
Milestone: Update.1
Component: wireless
Version:
Keywords: release?
Cc: mblestas, yani, carrano, kim, jg, cscott, dwmw2, ashish, mstone, mtd
Action Needed:
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

In testing we've seen that sometimes the command queue gets full. This shouldn't happen -- we set a timer to resubmit commands, and then reset the device completely, when commands don't complete. But for some reason it doesn't fire.

We think it's a SCAN command because when we unload the module, we get a 'SCAN failed' message as it's cancelled.

Obviously there's an underlying firmware bug, but there's also a driver bug -- I don't understand why the timer isn't firing. Have added some debugging messages when the timer is armed and cancelled; would like to see debug output while this happens.
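
For readers unfamiliar with the intended behaviour, here is a rough sketch of the resubmit-then-reset policy described above. It is a toy model in Python, not the libertas code: the class name, method names, and the retry limit of 3 are all assumptions made for illustration, since the ticket does not state the driver's actual limit.

```python
from dataclasses import dataclass, field

# Assumed value for illustration only; the ticket does not give the real limit.
MAX_RETRIES = 3


@dataclass
class CommandTimeoutModel:
    """Toy model: resubmit a timed-out command a few times, then reset."""

    retries: int = 0
    log: list = field(default_factory=list)

    def on_timeout(self, command: int) -> str:
        """Called when the command timer fires before the firmware replied."""
        if self.retries < MAX_RETRIES:
            self.retries += 1
            self.log.append(f"requeue command {command} (#{self.retries})")
            return "requeue"
        # Retries exhausted: give up on the command and reset the device.
        self.retries = 0
        self.log.append("reset device")
        return "reset"

    def on_complete(self) -> None:
        """Called when the firmware answers; the retry counter starts over."""
        self.retries = 0
```

In this model the command queue can only stay full if `on_timeout` is never reached, or if reaching it has no effect, which is the behaviour the debugging output is meant to distinguish.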

Attachments

iwlist.dump (189 bytes) - added by yani 6 years ago.
logs.CSN7440001C.2008-03-11.15-47-18.tar.bz2 (23.6 kB) - added by yani 6 years ago.

Change History

Changed 7 years ago by dwmw2

  • owner changed from jg to dwmw2
  • component changed from distro to wireless

Changed 7 years ago by dwmw2

  • cc giannisgalanis@… added
  • status changed from new to assigned

I've been unable to reproduce this -- perhaps because our testing is now focused on unicast communication with the school server rather than multicast communication, so the network is less busy. I cannot see why the command timer would ever fail to fire -- I need to catch it in the act with debugging enabled. I don't think I'll manage that at home though...

Changed 6 years ago by yani

  • cc mblestas, yani, carrano, kim, jim, cscott added; giannisgalanis@… removed
  • priority changed from normal to blocker

We had several instances of this bug with 698.

Overview: could not connect to any mesh. iwlist displayed "Failed to read scan data : Resource temporarily unavailable", and rmmod usb8xxx displayed "libertas: SCAN_CMD failed".

It happened on 3 XOs:

1) worked fine; clicked register to connect to jabber; when I rebooted, it had the bug

2) worked fine (connection to the net was confirmed); left alone overnight; the bug was seen the next morning

3) rebooted; it had the bug right from the start

I also saw something similar (the same iwlist output) some weeks ago.

Changed 6 years ago by yani

Changed 6 years ago by yani

Changed 6 years ago by jg

  • cc jg, dwmw2 added; jim removed
  • milestone changed from Never Assigned to Update.1

Changed 6 years ago by dwmw2

Those logs lack the debugging output, and don't shed much light.

They do show this:

[   77.298195] libertas: Command 16 timed out
[   77.298237] libertas: requeueing command 16 due to timeout (#1)

... but no indication of whether the command eventually completes or not, due to the lack of debugging. I'll take another look at the timeout code, which I tested and verified to be working a few weeks ago, but I don't think I'll magically see the problem this time; I need it reproduced with debug logs.
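
When more logs come in, a small script along these lines could pull the timeout and requeue messages out of a dmesg dump. It is only a sketch: the two patterns are taken from the two lines quoted above, the function name is made up, and the command number is kept as an opaque string because the log does not say whether it is decimal or hex.

```python
import re
from collections import Counter

# Patterns derived from the two log lines quoted in this comment.
TIMEOUT_RE = re.compile(r"libertas: Command (\S+) timed out")
REQUEUE_RE = re.compile(
    r"libertas: requeueing command (\S+) due to timeout \(#(\d+)\)"
)


def summarize_timeouts(lines):
    """Return (timeouts per command, highest requeue attempt per command)."""
    timeouts = Counter()
    max_requeue = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = REQUEUE_RE.search(line)
        if match:
            command, attempt = match.group(1), int(match.group(2))
            max_requeue[command] = max(attempt, max_requeue.get(command, 0))
    return dict(timeouts), max_requeue
```

This reports how often each command timed out and how far the retries went. It cannot tell whether a command eventually completed, which still requires the debugging output.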

Changed 6 years ago by Blaketh

  • keywords release? added

Changed 6 years ago by ashish

  • cc ashish added

Not sure if the following observations are related to this, but they are worth investigating.

Sometimes while sending down a command or data, the tx_urb callback does not get called, leaving priv->dnld_sent set forever. As long as it is set, the libertas driver does not send any data packet or command down to the firmware, which is expected behavior. When this happened, nothing was pending in the firmware, the firmware-to-host path was still up, and tx_urb->status was stuck at EINPROGRESS. The following can be used to reproduce this, although it does not reproduce every time.

XO1 and XO2 on channel 1. AP configured on channel 6.
XO1:
ping <XO2>
while true; do iwconfig eth0 channel 1; sleep 1; iwconfig eth0 mode mana essid AP; iwconfig; sleep 1; done

After a while iwconfig stops returning any result, and tx_urb->status is stuck at EINPROGRESS (115).

Changed 6 years ago by mstone

  • cc mstone added

Changed 6 years ago by mtd

  • cc mtd added

Changed 6 years ago by bcavagnolo

  • status changed from assigned to closed
  • resolution set to fixed

Hello,

I managed to reproduce the problem by forcing the driver to drop an occasional command packet destined for the USB. As observed in some of the above posts, the dmesg output reveals that the timer is in fact firing. The problem was that the dnld_sent state variable was not being updated after the timer expired, so lbs_execute_next_command was not being called. I submitted a patch to devel .AT. laptop.org. See http://lists.laptop.org/pipermail/devel/2008-May/014035.html
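
To illustrate the failure mode, here is a simplified Python model of the state machine as described in this ticket. It is not the patch and not the driver code: `DriverModel`, `clear_on_timeout`, and the method names are hypothetical, and only `dnld_sent` and the idea of "execute next command" come from the discussion above.

```python
from collections import deque


class DriverModel:
    """Toy model of the command path gated by the dnld_sent flag."""

    def __init__(self, clear_on_timeout: bool):
        # Hypothetical switch: False models the bug, True models the fix.
        self.clear_on_timeout = clear_on_timeout
        self.dnld_sent = False  # True while a download is outstanding
        self.queue = deque()    # pending commands, head is the current one
        self.sent = []          # everything handed to the (simulated) USB

    def main_loop_step(self):
        # Nothing is sent while a previous download is still outstanding.
        if not self.dnld_sent:
            self.execute_next_command()

    def execute_next_command(self):
        if self.queue:
            self.dnld_sent = True
            self.sent.append(self.queue[0])

    def queue_command(self, command):
        self.queue.append(command)
        self.main_loop_step()

    def command_timeout(self):
        # The timer fires and the command stays queued for resubmission.
        if self.clear_on_timeout:
            self.dnld_sent = False
        self.main_loop_step()


def run_lost_command(clear_on_timeout: bool):
    """Send one command whose packet is dropped, then let the timer fire."""
    driver = DriverModel(clear_on_timeout)
    driver.queue_command("SCAN")  # packet dropped: no tx callback, no reply
    driver.command_timeout()
    return driver.sent
```

With `run_lost_command(False)` the timer fires but the command is never resubmitted, because the flag still blocks the main loop. With `run_lost_command(True)` the command goes out a second time.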
