Opened 7 years ago

Closed 7 years ago

#4927 closed defect (fixed)

[firmware] beacon interval gets reset by other operations

Reported by: carrano Owned by: mbletsas
Priority: normal Milestone: Future Release
Component: wireless Version:
Keywords: beacon Cc: mbletsas, jcardona@…, kim, dwmw2
Blocked By: Blocking:
Deployments affected: Action Needed:
Verified: no

Description

Beacon interval settings made with iwpriv ethX bcn_control are overridden, after some time, by NM, and return to the default value (1 100: enabled, 100ms).

The following tests were performed with joyride-269 build in an environment where the only XO present is the one under test.

1 - Starting with default settings (XO just turned on), we set beacon interval to 10 seconds:
iwpriv eth0 bcn_control 1 10000
It works for some time (less than a minute) but then returns to the default value of 100ms.
See attached file: beacon-interval-test-build269

2 - Kill NetworkManager and repeat the test (reissue the iwpriv command)
Attached cap file shows that the config stays: beacon-interval-test-build269-butnoNM
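The capture files can be checked for the point where the interval reverts. The following is only a minimal sketch, assuming the beacon arrival times have already been exported from the capture as a list of seconds; the function names and the sample timestamps are hypothetical and are not taken from the attached files.

```python
def beacon_intervals_ms(timestamps_s):
    """Return the gaps between consecutive beacon timestamps, in milliseconds."""
    return [round((b - a) * 1000.0, 3)
            for a, b in zip(timestamps_s, timestamps_s[1:])]


def find_interval_change(timestamps_s, expected_ms, tolerance=0.2):
    """Return the index of the first gap that deviates from expected_ms
    by more than the given fraction, or None if every gap is within range."""
    for i, gap in enumerate(beacon_intervals_ms(timestamps_s)):
        if abs(gap - expected_ms) > tolerance * expected_ms:
            return i
    return None


# Hypothetical timestamps: two 10 s gaps, then the interval falls back to 100 ms.
sample = [0.0, 10.0, 20.0, 20.1, 20.2, 20.3]
print(beacon_intervals_ms(sample))
print(find_interval_change(sample, expected_ms=10000))
```

The tolerance is loose on purpose, since captured beacon spacing is never exact.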

Attachments (5)

beacon-interval-test-build269 (14.9 KB) - added by carrano 7 years ago.
beacon interval returns to 100ms
beacon-interval-test-build269-butnoNM (1.8 KB) - added by carrano 7 years ago.
beacon interval is set to 10s and stays
dmesg_output (119.9 KB) - added by carrano 7 years ago.
'iwlist scan' overrides beacon interval
capture-crash.pcap (3.8 MB) - added by carrano 7 years ago.
during 20p4 wireless crash
after-kernel-panic (588 bytes) - added by carrano 7 years ago.
sending out frames after kernel panic

Change History (21)

Changed 7 years ago by carrano

beacon interval returns to 100ms

Changed 7 years ago by carrano

beacon interval is set to 10s and stays

comment:1 Changed 7 years ago by carrano

  • Cc jcardona@… kim added; jcardona removed

comment:2 Changed 7 years ago by dcbw

NM doesn't explicitly touch beacon interval at all. It's probably a side-effect of some other unrelated wireless operation.

comment:3 Changed 7 years ago by carrano

Only to add that disabling the beacons (iwpriv eth0 bcn_control 0) also gets reversed to its default (1 100) value if NM is running (and not if NM is not running).

This is very easy to reproduce, and you can even observe it without sniffing. Just check the output of "iwpriv eth0 bcn_control". It reverts to the default in less than one minute.
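The check can be automated by polling the command and reporting when its output changes. This is a rough sketch, not a tested tool: it compares the raw output text only, because the exact output format of "iwpriv eth0 bcn_control" is not assumed here, and the helper names are hypothetical.

```python
import subprocess
import time


def read_bcn_control(iface="eth0", run=subprocess.run):
    """Return the raw text printed by `iwpriv <iface> bcn_control`."""
    result = run(["iwpriv", iface, "bcn_control"],
                 capture_output=True, text=True)
    return result.stdout.strip()


def wait_for_change(read, poll_s=1.0, timeout_s=120.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll read() until its output differs from the first reading.

    Returns (elapsed_seconds, old_output, new_output), or None on timeout.
    """
    start = clock()
    baseline = read()
    while clock() - start < timeout_s:
        sleep(poll_s)
        current = read()
        if current != baseline:
            return clock() - start, baseline, current
    return None


# Example use on the XO, after issuing "iwpriv eth0 bcn_control 0":
#     print(wait_for_change(read_bcn_control))
```

Running it once with NM active and once with NM killed would give the time-to-revert for each case.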

comment:4 Changed 7 years ago by jg

  • Milestone changed from Never Assigned to Future Release

comment:5 Changed 7 years ago by dcbw

  • Component changed from network manager to wireless
  • Owner changed from dcbw to mbletsas

I'd suggest putting some debugging info into the driver to report back what values are actually returned from the firmware. The driver does _not_ store the requested beacon interval internally for long: any time you execute a get-beacon, it overwrites the internal value with the value the firmware returns. So it may be that the firmware isn't keeping the right value around either.

Furthermore, the value is sent to the firmware during adhoc association, so any time the association happens the value may change. You should put a debug print in the driver where the driver sends the beacon_period to the firmware and see what that value is, and also a debug print when the value gets read from the firmware.

Basically, NM never touches this value. It doesn't matter what results happen when NM is running and when it's not, because NM isn't the problem. It only makes the problem happen more often.

Whatever the driver or firmware is doing is somehow overwriting the beacon period value as a result of other operations in the driver or firmware. The value returned from iwpriv _always_ is what the firmware reports, and the firmware is only ever told what the beacon period is through (a) the iwpriv command, and (b) when starting an adhoc network. Somebody needs to put more debugging info into the driver to trace down the driver/firmware interaction here.

comment:6 Changed 7 years ago by ashish

  • Summary changed from NetworkManager overrides iwpriv command to l

I would like to add a few things here.
The driver does store the beacon interval and also whether beacons are enabled. All network start (adhoc start/infra join) commands pass this stored beacon interval to the firmware, and the firmware updates the mesh beacon frequency based on the value received from these commands.
Also, when we get the beacon interval with an iwpriv call, it returns the already stored beacon interval and does not fetch beacon information from the firmware.
However, when we set the beacon, it modifies the driver's stored beacon interval and also passes the new beacon interval down to the firmware.
By default, beacons are enabled with an interval of 100ms, so I suspect that when NM is running it is probably passing down an ad-hoc start/join or infra join command with the default beacon interval of 100ms to the firmware.
A driver log enabled with
echo 0x126000 > /sys/module/libertas/parameters/libertas_debug
can shed more light on this.
A sniffer capture may also help.
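The behaviour described in this comment can be pictured with a toy model. This is only a sketch of the description above, not the libertas code: the class and method names are hypothetical, and the optional interval_ms argument to start_network exists only to illustrate the suspicion that some command reaches the firmware carrying the default 100ms.

```python
from dataclasses import dataclass, field

DEFAULT_ENABLED = True
DEFAULT_INTERVAL_MS = 100


@dataclass
class Firmware:
    beacon_enabled: bool = DEFAULT_ENABLED
    beacon_interval_ms: int = DEFAULT_INTERVAL_MS


@dataclass
class Driver:
    firmware: Firmware = field(default_factory=Firmware)
    beacon_enabled: bool = DEFAULT_ENABLED
    beacon_interval_ms: int = DEFAULT_INTERVAL_MS

    def set_bcn_control(self, enabled, interval_ms):
        # A set updates the driver's stored copy and is passed to the firmware.
        self.beacon_enabled = enabled
        self.beacon_interval_ms = interval_ms
        self.firmware.beacon_enabled = enabled
        self.firmware.beacon_interval_ms = interval_ms

    def get_bcn_control(self):
        # A get returns the stored copy without asking the firmware.
        return self.beacon_enabled, self.beacon_interval_ms

    def start_network(self, interval_ms=None):
        # Network start passes an interval to the firmware: the stored one,
        # unless the (hypothetical) caller supplies a different value.
        if interval_ms is None:
            interval_ms = self.beacon_interval_ms
        self.firmware.beacon_interval_ms = interval_ms


drv = Driver()
drv.set_bcn_control(True, 10000)
drv.start_network()                      # firmware keeps 10000
drv.start_network(DEFAULT_INTERVAL_MS)   # firmware falls back to 100
print(drv.get_bcn_control(), drv.firmware.beacon_interval_ms)
```

In this model the stored value and the firmware value can disagree, which is why a driver log and a sniffer capture taken together would be more informative than either alone.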

comment:7 Changed 7 years ago by tomeu

  • Summary changed from l to NetworkManager overrides iwpriv command

comment:8 Changed 7 years ago by carrano

In fact, the iwpriv command gets overridden by wireless operations.

An "iwlist scan" is an example of such an operation that will revert the beacon interval to its default value (1 100).

Another such example is an association operation (iwconfig eth0 mode managed essid test).

For the attached file (dmesg_output) the beacon was disabled (iwpriv eth0 bcn_control 0). After some seconds a scan (iwlist scan) was executed, overriding the configuration: the beacons returned.

Changed 7 years ago by carrano

'iwlist scan' overrides beacon interval

comment:9 Changed 7 years ago by ashish

Could you please verify this with firmware release 5.110.21.p1?

comment:10 Changed 7 years ago by carrano

Tested with 21p1 - failed (doesn't accept private ioctls / Operation not permitted).
*but* I am not sure that we would expect it to work anyway. I used 643 with a customized kernel (from Javier, to implement mesh stop). We do not have driver patches in the stable builds (to the best of my knowledge).

Tested also with 20p4x - it gets overridden the same way.

Changed 7 years ago by carrano

during 20p4 wireless crash

Changed 7 years ago by carrano

sending out frames after kernel panic

comment:11 follow-up: Changed 7 years ago by carrano

I have been trying for some hours now to reproduce the blinking-light/broken-wireless status in
order to determine whether, after the crash, the XO still forwards frames.

I started testing 20p42 in four XOs (with builds varying from 640 to 643) and I got very
stable results. One of the XOs is pinging the AP ("media lab 802.11") for 3+ hours now
using 2k packets and an interval of 100ms with 0% loss. The other 3 are doing default
pings, also with no packet loss.

I then went back to 20p4. Again I couldn't break it the same way, but I could break it in other ways.

In the uploaded capture file (capture-crash.pcap), after a successful association and some pinging to the AP, the XO stopped pinging and began to dump "usb_tx_block using URB already in flight" on the screen. After some time we had a kernel panic (not recorded in the capture file, which was interrupted first).

Kernel Panic - Not syncing: Fatal exception in interrupt
atkbd.c: Spurious ACK on isa0060/serio0. Some program might be trying to access hardware
directly.

The after-kernel-panic file shows packets captured after the kernel panic.

In summary, 20p42, at least in my tests, is much more reliable than 20p4 or 20p41.

The 20p4, on the other hand, is relatively easy to break. Just associating and generating some traffic will do the job (at least here at 1cc). But so far I have not been able to get the continuously blinking LED scenario again.

comment:12 in reply to: ↑ 11 Changed 7 years ago by carrano

Replying to carrano:

I am sorry! This does not belong here! Please ignore this post.

comment:13 Changed 7 years ago by dwmw2

  • Cc dwmw2 added
  • Summary changed from NetworkManager overrides iwpriv command to [firmware] beacon interval gets reset by other operations

comment:14 Changed 7 years ago by carrano

iwpriv msh0 bcn_control still doesn't work (build now is update.1 702)

comment:15 Changed 7 years ago by carrano

Firmware version 22.p8 fixes this. But since this firmware is still not approved (see #6854), I will keep this open until we have the fix in an approved release.

comment:16 Changed 7 years ago by carrano

  • Resolution set to fixed
  • Status changed from new to closed

Since firmware 22.p14 is now officially released, I am closing this. Fixed.
