Opened 7 years ago

Closed 7 years ago

#5485 closed defect (fixed)

Doesn't accept my WEP key anymore

Reported by: gdesmott Owned by: mbletsas
Priority: high Milestone: Update.1
Component: wireless Version:
Keywords: Cc: reinier@…, morgan.collett@…, dwmw2@…, sjoerd, JensJorgensen, dcbw, carrano@…, homunq, sdague, yani
Blocked By: Blocking:
Deployments affected: Action Needed:
Verified: no

Description

Since I upgraded my B4 to Joyride-1415, the XO doesn't accept my WEP key anymore.

Maybe a kernel issue as it was upgraded to kernel.i586 0:2.6.22-20071213.5.olpc.fa094abd8cdf4d6 in this build?

Attachments (9)

wpa_supplicant.log (65.7 KB) - added by rwh 7 years ago.
libertas-wpa-oldcode-working.cap (72.5 KB) - added by dwmw2 7 years ago.
Old code, working when wpa_supplicant is run without debugging
libertas-wpa-oldcode-notworking.cap (76.8 KB) - added by dwmw2 7 years ago.
Old code, failing when wpa_supplicant is run with debugging
Trac.zip (2.5 KB) - added by reillysl 7 years ago.
Several logs showing the wpa_supplicant issue
log.dmesg (122.7 KB) - added by homunq 7 years ago.
yet another dmesg log on q2d09 and joyride-1551. Enter wrong key 1 time and right key 3 times, fail to connect.
dmesg-656 (61.3 KB) - added by carrano 7 years ago.
dmesg-656.2 (61.3 KB) - added by carrano 7 years ago.
wep works
dmesg-657 (100.2 KB) - added by carrano 7 years ago.
wep fails
dmesg-1608 (78.6 KB) - added by carrano 7 years ago.
WEP fails (joyride 1608)


Change History (90)

comment:1 Changed 7 years ago by rwh

  • Cc reinier@… added
  • Priority changed from normal to high

I'm using WPA and can't connect at all. Upon a wpa_supplicant -dWext -ieth0 -c... (which always used to work), I get the following messages:

Trying to associate with 00:1b:11:6b:29:89 (SSID='villabeter' freq=2422 MHz)
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 10 value 0x1 - ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 8 value 0x0 - Association request to the driver failed

Wireless driver issue?

comment:2 Changed 7 years ago by mbletsas

  • Status changed from new to assigned

Ricardo,

Can you verify?

M.

comment:3 Changed 7 years ago by dcbw

auth param 10 = IW_AUTH_PRIVACY_INVOKED -> doesn't matter
auth param 8 = IW_AUTH_RX_UNENCRYPTED_EAPOL -> doesn't matter

Can you please attach a full wpa_supplicant run using the "-ddd" switch to enable full debug output. Please log it to /var/log/wpa_supplicant.log using the "-f" switch as well:

/usr/sbin/wpa_supplicant -i eth0 -D wext -c /etc/wpa_supplicant.conf -ddd -f
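
For anyone else decoding the "WEXT auth param N" lines in these logs, here is a minimal lookup sketch. The index values are the IW_AUTH_* constants from <linux/wireless.h>; the helper name iw_auth_param_name is made up for this illustration and is not part of wpa_supplicant or the kernel.

```c
#include <stddef.h>

/* IW_AUTH_* parameter indices, in the order defined by <linux/wireless.h>. */
static const char *const iw_auth_names[] = {
    "IW_AUTH_WPA_VERSION",          /* 0 */
    "IW_AUTH_CIPHER_PAIRWISE",      /* 1 */
    "IW_AUTH_CIPHER_GROUP",         /* 2 */
    "IW_AUTH_KEY_MGMT",             /* 3 */
    "IW_AUTH_TKIP_COUNTERMEASURES", /* 4 */
    "IW_AUTH_DROP_UNENCRYPTED",     /* 5 */
    "IW_AUTH_80211_AUTH_ALG",       /* 6 */
    "IW_AUTH_WPA_ENABLED",          /* 7 */
    "IW_AUTH_RX_UNENCRYPTED_EAPOL", /* 8 */
    "IW_AUTH_ROAMING_CONTROL",      /* 9 */
    "IW_AUTH_PRIVACY_INVOKED",      /* 10 */
};

/* Hypothetical helper: map the number printed by wpa_supplicant to a name. */
const char *iw_auth_param_name(unsigned int param)
{
    size_t count = sizeof(iw_auth_names) / sizeof(iw_auth_names[0]);

    return param < count ? iw_auth_names[param] : "unknown";
}
```

With this table, "auth param 10" and "auth param 8" resolve to the two names given above.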

comment:4 Changed 7 years ago by carrano

Yes, I also get these errors, but after them I get associated:

Trying to associate with 00:18:0a:0f:1f:e0 (SSID='roofnet' freq=2462 MHz)
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 10 value 0x1 - ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 8 value 0x0 - Association request to the driver failed
Associated with 00:18:0a:0f:1f:e0

Can you confirm that you will not associate even if you wait a little?

Changed 7 years ago by rwh

comment:5 Changed 7 years ago by rwh

Attached wpa_supplicant.log, hope it helps!

comment:6 Changed 7 years ago by morgs

  • Cc morgan.collett@… added

comment:7 Changed 7 years ago by gdesmott

Same problem with joyride-1422

Note that I'm using WEP, not WPA.

comment:8 Changed 7 years ago by erikos

Same here with WPA and WEP on joyride-1415. I used WPA just fine before, on 1393.

comment:9 Changed 7 years ago by mbletsas

  • Cc dwmw2@… added

David,

It seems that encryption is completely broken in the latest joyride builds.
The only difference seems to be the kernel thus everybody suspects the driver.

Any clue?

M.

comment:10 Changed 7 years ago by sjoerd

  • Cc sjoerd added

I just upgraded to 1428 and hit this issue too. After downgrading to 1410 things started working again

comment:11 Changed 7 years ago by JensJorgensen

  • Cc JensJorgensen added

comment:12 Changed 7 years ago by dwmw2

Do we have an AP here that I can test against?

comment:13 Changed 7 years ago by dwmw2

Scratch that; I have one I can use. The current driver seems to work fine with WEP, but not WPA. I haven't changed anything there, to my knowledge.

comment:14 Changed 7 years ago by tomeu

In latest joyrides I cannot connect to my WEP AP.

I could without problems until joyride 1407.

comment:15 Changed 7 years ago by dwmw2

Testing first with the libertas code from 9679b65c8c5ed6e685f00872eec0a832d16ee545, which predates the merge of my changes.

When wpa_supplicant is run from the command line, it works -- but only if debugging is not enabled. If debugging in wpa_supplicant is enabled, it fails. Will post logs...

Changed 7 years ago by dwmw2

Old code, working when wpa_supplicant is run without debugging

Changed 7 years ago by dwmw2

Old code, failing when wpa_supplicant is run with debugging

comment:16 Changed 7 years ago by dwmw2

When it works, it looks like this:

[root@localhost ~]# wpa_supplicant -Dwext -ieth0 -c /etc/wpa_supplicant/wpa_supplicant.conf
ioctl[SIOCSIWPMKSA]: Invalid argument
Trying to associate with 00:11:95:05:31:09 (SSID='OLPCOFW' freq=2412 MHz)
ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 10 value 0x1 - ioctl[SIOCSIWAUTH]: Operation not supported
WEXT auth param 8 value 0x0 - Association request to the driver failed
WPA: No SSID info found (msg 1 of 4).
Associated with 00:11:95:05:31:09
WPA: Key negotiation completed with 00:11:95:05:31:09 [PTK=CCMP GTK=CCMP]
CTRL-EVENT-CONNECTED - Connection to 00:11:95:05:31:09 completed (auth) [id=0 id_str=]

Capturing the failing output is less easy -- it only seems to fail when the output is going directly to the screen, and not even via tee(1).

comment:17 Changed 7 years ago by dwmw2

I admit that WPA seems to have been working when I started working on the upstream code, and now seems not to be. Working on that now.

The WEP thing seems different -- tomeu has shown me debug output which looks like this:

[  754.783253] libertas host: CMD_RESP: response 0x800b, seq 190, size 62, jiffies 42852
[  754.783279] libertas host: CMD_RESP: error 0x0004 in command reply 0x800b

So that's a dupe of #429.
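
A note for readers of these CMD_RESP lines: in the driver logs quoted elsewhere in this ticket, each reply code is the command code with the top bit set (0x0013 is answered by 0x8013, 0x0028 by 0x8028, 0x0006 by 0x8006). The sketch below assumes that pattern holds in general, which is inferred from the logs and not from any firmware documentation; the names are made up for the illustration.

```c
#include <stdint.h>

/* Assumption inferred from the logs: a reply is the command code OR 0x8000. */
#define RESP_FLAG 0x8000u

/* Hypothetical helper: true if the code looks like a command reply. */
int is_response_code(uint16_t code)
{
    return (code & RESP_FLAG) != 0;
}

/* Hypothetical helper: recover the command a reply code belongs to. */
uint16_t command_for_response(uint16_t resp)
{
    return (uint16_t)(resp & 0x7fffu);
}
```

Under that assumption, "response 0x800b" is the reply to command 0x000b, and the following line reports result code 0x0004 for it.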

comment:18 Changed 7 years ago by mbletsas

David,

WEP works in some builds and doesn't work on others using the same firmware.
Let's try to figure out what the driver does differently so that we can fix things (or ask Marvell to fix things on their end) - at this point your suggestion doesn't really help.

M.

comment:19 Changed 7 years ago by dwmw2

It works for me with the current driver. It also works for Tomeu with the current driver, sometimes.
But sometimes the firmware gets into a state where it returns 0x0004 to all commands. There really isn't a lot we can do about that. I _know_ it's not wonderfully helpful to say 'dupe of #429', but that _really_ is all I can do. This needs to be debugged in the firmware, and we can't do that.

The best I can do is kick it in the head when it does this to us -- which I was about to hook up when I got distracted into debugging the WPA thing, which really does look like it's in the driver.

comment:20 Changed 7 years ago by dwmw2

  • Cc dcbw added

git-bisect suggests that the commit which broke WPA was:
http://git.infradead.org/?p=libertas-2.6.git;a=commitdiff;h=57a463d96e783700373c779f8944ace1f45d0497

I don't see a smoking gun there; will investigate further and compare behaviour.

comment:21 Changed 7 years ago by dwmw2

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:22 Changed 7 years ago by dwmw2

To clarify: that commit fixes WPA, not WEP.

If anyone can show a debug log (echo 0x186484 > /sys/module/libertas/parameters/libertas_debug) of WEP failing while the firmware is actually alive and _responding_ to the driver instead of either ignoring or deferring all commands, I'll be happy to look into that further.

comment:23 follow-up: Changed 7 years ago by gdesmott

WEP still not working with Joyride 1477...

comment:24 in reply to: ↑ 23 Changed 7 years ago by dwmw2

Replying to gdesmott:

WEP still not working with Joyride 1477...

Does it work if you set the WEP key manually with iwconfig? Does 'iwlist scan' even work? What kernel output do you see after echo 0x6184 > /sys/module/libertas/parameters/libertas_debug and then trying to set the key? Is the key shown when you run 'iwconfig' to view the settings?

comment:25 Changed 7 years ago by mbletsas

  • Cc carrano@… added

comment:26 Changed 7 years ago by gdesmott

Replying to dwmw2:

Does it work if you set the WEP key manually with iwconfig?

Well, it's weird. I tried running iwconfig and then dhclient, and it didn't get an IP (none was displayed in ifconfig). But after a few minutes, it magically got one and the network was working fine.

Does 'iwlist scan' even work?

yes

What kernel output do you see after echo 0x6184 > /sys/module/libertas/parameters/libertas_debug and then trying to set the key?

SSID: <my-sid>
chann: 6
band: 0
mode: 2
BSSID: <mac-of-my-AP>
secinfo: WEP
auth_mode: 1

And in dmesg:
[ 1200.447011] libertas host: PREP_CMD: command 0x0013
[ 1200.447208] libertas cmd: SET_WEP: add key 0 (104 bit)
[ 1200.447401] libertas host: QUEUE_CMD: inserted command 0x0013 into cmdpendingq
[ 1200.447623] libertas host: EXEC_NEXT_CMD: sending command 0x0013
[ 1200.447654] libertas host: DNLD_CMD: command 0x0013, seq 1277, size 80, jiffies 88030
[ 1200.447692] libertas cmd: DNLD_CMD: sent command 0x0013, jiffies 88030
[ 1200.448210] libertas host: PREP_CMD: wait for response
[ 1200.448539] libertas host: CMD_RESP: response 0x8013, seq 1277, size 80, jiffies 88030
[ 1200.448796] libertas host: PREP_CMD: command 0x0028
[ 1200.448987] libertas cmd: MAC_CONTROL: action 0x2b, size 12
[ 1200.449182] libertas host: QUEUE_CMD: inserted command 0x0028 into cmdpendingq
[ 1200.449397] libertas host: EXEC_NEXT_CMD: sending command 0x0028
[ 1200.449427] libertas host: DNLD_CMD: command 0x0028, seq 1278, size 12, jiffies 88030
[ 1200.449461] libertas cmd: DNLD_CMD: sent command 0x0028, jiffies 88030
[ 1200.449977] libertas host: PREP_CMD: command 0x0028
[ 1200.450169] libertas cmd: MAC_CONTROL: action 0x2b, size 12
[ 1200.450282] libertas host: CMD_RESP: response 0x8028, seq 1278, size 12, jiffies 88030
...

Is the key shown when you run 'iwconfig' to view the settings?

yes

comment:27 Changed 7 years ago by dwmw2

Hm, sounds like it worked then. Does it pick up IPv6 addresses via RA? When you use tcpdump does it see traffic? Make sure you use the -p option with tcpdump.

comment:28 follow-up: Changed 7 years ago by karmaflux

  • Verified set

Build joyride-1477, firmware q2d07, and I have the same issue. Removing the executable bit from /etc/rc.d/init.d/NetworkManager* allows me to manually associate via iwconfig and dhclient. If I leave NetworkManager and NetworkManagerDispatcher running, I get thousands of Rx frag errors reported in iwconfig; even though iwconfig reports me as associated, I cannot request an IP address from a dhcp server.

I'd say this is NetworkManager's fault, since when I disable it everything else works fine.

comment:29 Changed 7 years ago by carrano

I also confirmed that joyride 1477 won't connect to a WEP AP via UI.

But in terms of manual connection via iwconfig, I am afraid that this will depend on the AP, as it generally does when it comes to WEP.

As an example, you still can connect manually to a Buffalo AirStation with WEP enabled:

 iwconfig eth0 mode managed
 iwconfig eth0 essid <myessid> key s:<mypassphrase>
 dhclient eth0

Note that I did _not_ kill NetworkManager.

But the same procedure will _not_ work for an Apple Express

comment:30 follow-up: Changed 7 years ago by dwmw2

If you don't kill NetworkManager, the test is completely invalid. It'll overwrite all the settings at random times. Does the AirportExpress work when NetworkManager is dead?

comment:31 in reply to: ↑ 30 Changed 7 years ago by carrano

Replying to dwmw2:

If you don't kill NetworkManager, the test is completely invalid.

Not true, for two reasons:
1 - NM is not overwriting the settings. The XO has been associated for an hour now - still pinging.
2 - Even if it did (overwrite), the test proves that the AP (at least some) _will_ accept the wep key. Accepting a wep key entered via command line has nothing to do with the fact that NM is or is not running.

And no, you won't associate to the Apple even if you kill NetworkManager.

It does not look as "simple" as in:

I'd say this is NetworkManager's fault, since when I disable it everything else works fine.

comment:32 Changed 7 years ago by carrano

Here is what you need to do to connect manually to the Apple:

iwconfig eth0 mode managed
iwconfig eth0 essid <myessid> key <hex_key>
dhclient eth0

NM won't interfere (it is running now)
Again the point is what to send out (it usually is in wep).
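
To make the difference between the two key notations concrete: with iwconfig, the s: prefix means the ASCII characters themselves are used as the key bytes, so a 13-character string is a 104-bit key and can equally be written as 26 hex digits. The sketch below does that conversion. The function name and the sample key are made up, and this says nothing about APs that derive the hex key from a passphrase in some other way.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the hex form of an ASCII WEP key into out.
 * Returns 0 on success, -1 if out is too small for 2 * strlen(ascii) + 1 bytes.
 */
int wep_ascii_to_hex(const char *ascii, char *out, size_t outlen)
{
    size_t len = strlen(ascii);

    if (outlen < 2 * len + 1)
        return -1;
    out[0] = '\0';              /* keeps the empty-key case well defined */
    for (size_t i = 0; i < len; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned char)ascii[i]);
    return 0;
}
```

For example, the made-up key "abcdefghijklm" (13 characters) becomes 6162636465666768696a6b6c6d, which is what would go after "key" in the hex form of the command.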

comment:33 in reply to: ↑ 28 Changed 7 years ago by mbletsas

I'd say this is NetworkManager's fault, since when I disable it everything else works fine.

Not really, since NM and the wireless firmware haven't changed.
What NM does though is expose issues with command processing in the driver/firmware since it constantly issues commands to it.

M.

comment:34 follow-up: Changed 7 years ago by carrano

Associating to a WEP AP via sugar is not working in builds 674 and joyride 1514.

It does work with manual (iwconfig) association/dhcp.

Tests were performed with Apple and Buffalo APs (like the ones described above).

comment:35 in reply to: ↑ 34 Changed 7 years ago by carrano

Replying to carrano:

Associating to a WEP AP via sugar is not working in builds 674 and joyride 1514.

It does work with manual (iwconfig) association/dhcp.

Tests were performed with Apple and Buffalo APs (like the ones described above).

Updating: same for build 684.

comment:36 Changed 7 years ago by jg

  • Resolution fixed deleted
  • Status changed from closed to reopened

Changed 7 years ago by reillysl

Several logs showing the wpa_supplicant issue

comment:37 Changed 7 years ago by jg

  • Milestone changed from Never Assigned to Update.1
  • Verified unset

Dan, could you take a look at these logs?

comment:38 Changed 7 years ago by dcbw

The logs do indicate that wpa_supplicant is failing to get the association event from the driver. At this point, if even wpa_supplicant is failing (ie, no NM involved) then we'd need driver logs during the association request to figure out why it's failing in the driver.

comment:39 Changed 7 years ago by dwmw2

The firmware does often fail to associate. If you try again a few times, does it ever work?

To get the driver logs as Dan requests, echo 0x6184 > /sys/module/libertas/parameters/libertas_debug, then run dmesg -c to clear the kernel's log buffer, then run wpa_supplicant once, and run dmesg again to see the logs. Redirect the output to a file and attach...

comment:40 Changed 7 years ago by homunq

I'm having what looks like this problem in q2d09 and joyride-1551 (that is, essentially update-1RC1). I have turned off WEP on my AP for the night but will turn it on again in the morning and send logs.

comment:41 follow-up: Changed 7 years ago by homunq

  • Cc homunq added

Changed 7 years ago by homunq

yet another dmesg log on q2d09 and joyride-1551. Enter wrong key 1 time and right key 3 times, fail to connect.

comment:42 Changed 7 years ago by sdague

  • Cc sdague added

comment:43 follow-up: Changed 7 years ago by homunq

Let me try to summarize the situation with WEP, as I'm reading it from the comments in this bug:

  1. It was working, consistently?, in 1398, 1407, ??? , and 1410.
  2. It was broken on the same machines in 1415, ????, 1422, and 1428.
  3. One report from tomeu on this machine shows a total meltdown in the closed firmware. It responds 0x0004 to everything, ie, something has exposed bug 429.
  4. More recent reports, including the latest logs, show something different; failure with the firmware still responding.
  5. Manual association using iwconfig seems to be working. NetworkManager may or may not be getting in the way, but it certainly isn't helping.

Conclusions:
This was working at one point with the current firmware. It is working now, manually. This is not simply a firmware problem.

There is only 1 report of the 0x0004 bug. Although it is apparently on a machine which had been working at one point, it really could be some intermittent/hardware/other problem.

Either the current problem was caused by something between 1410 and 1415, or something there caused the 0x0004 bug, which has since been fixed, but which hid the true onset of the current bug.

Is this hard to test? Do you need my help making more logs? (ie, trying to do this both failing through UI and succeeding manually using iwconfig, showing logs for both; or answering "Is the key shown when you run 'iwconfig' to view the settings?"; or something)

comment:44 Changed 7 years ago by homunq

Sorry, in the above comment, the "this machine" which showed the firmware bug was the one that broke sometime after 1407.

comment:45 Changed 7 years ago by carrano

As posted, it is possible to connect to the WEP APs using iwconfig only. Don't believe it is the firmware.

At least one UI issue is related to this... #6182.

comment:46 Changed 7 years ago by mbletsas

There was a major rewrite in the driver. Network Manager uses wpa_supplicant for encryption which is known to be very sensitive to timing.

Best guess at this point is that wpa_supplicant does not like the responses that it gets from the driver.

The firmware hasn't changed, it is the least suspect component at this point.

M.

comment:47 in reply to: ↑ 43 ; follow-up: Changed 7 years ago by carrano

Replying to homunq:

Let me try to summarize the situation with WEP, as I'm reading it from the comments in this bug:

I would propose the following summary:

1 - It is not the firmware because we can manually associate

2 - It is not the UI, because manually writing to /home/olpc/.sugar/default/nm/networks.cfg does not work. (This is an old trick that worked when the problem was in the UI. And, btw, #6182 is fixed, so nothing points to the UI anymore.)

3 - It used to work

If the 3 above are correct, the Libertas driver, wpa_supplicant and NetworkManager are the possible candidates. The first underwent the deepest changes, so it would be the logical candidate.

So, maybe if we grab an old kernel with an old driver and test it in a recent joyride, this would give us a hint, right?

comment:48 in reply to: ↑ 47 Changed 7 years ago by carrano

.

So, maybe if we grab an old kernel with an old driver and test it in a recent joyride, this would give us a hint, right?

... and it works:
joyride 1579 + 2.6.22-20071121.7.olpc.af3dd731d18bc39.

So, it seems to be in the driver.

comment:49 follow-up: Changed 7 years ago by dwmw2

Using git-bisect on the libertas-2.6.git tree to isolate the commit which made the difference, and comparing the driver's debug output in the last working version and the first non-working version, would be the next step in diagnosis.

comment:50 in reply to: ↑ 41 Changed 7 years ago by dwmw2

Replying to homunq:

Thanks for the log. It seems to show that after a successful association (and I can't see how that can happen without a corresponding SIOCGIWAP event), we keep being asked for signal levels, and after a while we're asked to reassociate. Adding 0x20 (LBS_DEB_WEXT) to the debug flags might be more enlightening.

Could we be giving wpa_supplicant stats it doesn't like? Dan?
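
For reference, here is how the debug masks used in this ticket can be composed. Only LBS_DEB_WEXT = 0x20 is stated here; the other flag values are assumptions recalled from the libertas driver's defs.h of that era and should be checked against the tree you are actually building.

```c
/*
 * Debug flag bits. LBS_DEB_WEXT comes from this ticket; the rest are
 * ASSUMED values and must be verified against the driver's defs.h.
 */
#define LBS_DEB_MAIN   0x00000004u
#define LBS_DEB_WEXT   0x00000020u
#define LBS_DEB_SCAN   0x00000080u
#define LBS_DEB_ASSOC  0x00000100u
#define LBS_DEB_HOST   0x00002000u
#define LBS_DEB_CMD    0x00004000u

/* The mask suggested earlier in this ticket; under the values above this is 0x6184. */
unsigned int debug_mask_base(void)
{
    return LBS_DEB_MAIN | LBS_DEB_SCAN | LBS_DEB_ASSOC |
           LBS_DEB_HOST | LBS_DEB_CMD;
}

/* Add wireless-extensions logging to an existing mask. */
unsigned int debug_mask_with_wext(unsigned int mask)
{
    return mask | LBS_DEB_WEXT;
}
```

Adding 0x20 to 0x6184 gives 0x61a4, which is the value to echo into /sys/module/libertas/parameters/libertas_debug.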

comment:51 Changed 7 years ago by dcbw

Should also get LBS_DEB_ASSOC and LBS_DEB_SCAN logging.

I'm a bit concerned here about the scanning. First off, these bits:

[   78.610543] libertas host: EXEC_NEXT_CMD: sending command 0x0006
[   78.610569] libertas host: DNLD_CMD: command 0x0006, seq 518, size 63, jiffies 18826
[   78.610615] libertas cmd: DNLD_CMD: sent command 0x0006, jiffies 18826
[   79.056908] libertas host: CMD_RESP: response 0x8006, seq 518, size 11, jiffies 18870
[   79.056941] libertas scan: SCAN_RESP: bssdescriptsize 0
[   79.056959] libertas scan: SCAN_RESP: scan results 0
[   79.060104] libertas assoc: Association Request:

This indicates that the firmware didn't return any results at all, let alone the AP that the card is supposed to currently be connected to.

In any case, having LBS_DEB_WEXT would be quite informative here.

comment:52 in reply to: ↑ 49 Changed 7 years ago by carrano

Replying to dwmw2:

Using git-bisect on the libertas-2.6.git tree to isolate the commit which made the difference, and comparing the driver's debug output in the last working version and the first non-working version, would be the next step in diagnosis.

Ok, I am starting with:

653 - I know it works

675 - I know it does not.

But bisect obviously returns a lot of libertas-related changes between these two.

kimquirk emails that 656 was the last that worked. So I'll confirm it works in 656 and then check if it breaks in 657. But any info here that helps narrowing it down will be more than welcome.

comment:53 follow-ups: Changed 7 years ago by carrano

Puzzle:

  • 656 connects to WEP. The kernel is 2.6.22-20071231.2.olpc.83e0631da83a269.
  • 657 _does_not_ connect to WEP. The kernel is 2.6.22-20071121.7.olpc.af3dd731d18bc39 (older)

656 is the last ship2, and 657 is the first Update1, and brought some older stuff back (like firmware 20.p42 and the kernel).

But af3dd.. is the same kernel that, when put in joyride-1579 made the WEP work. We seem to be dealing with many causes here (as usual).

I'll continue this until I find two consecutive builds where:

  • WEP works with build N but not with N+1
  • Build N+1 uses a newer kernel than build N (which should happen when inside the same branch, as update.1)
  • Libertas firmware version is the same in both builds (if not, force it and see if it changes anything)

But it helps if anyone knows of an Update.1 build where WEP works (if ever)

In any case, I am attaching dmesg outputs with lbs_deb_{SCAN, ASSOC, WEXT, CMD and HOST} on for 656 and 657.

Changed 7 years ago by carrano

Changed 7 years ago by carrano

wep works

Changed 7 years ago by carrano

wep fails

comment:54 in reply to: ↑ 53 Changed 7 years ago by dwmw2

Replying to carrano:

I'll continue this until I find two consecutive builds where:

It would be more useful to use git-bisect on the kernel tree, building kernels until you find the actual commit which made the difference. Keeping _everything_ else in userspace the same.

You should be able to do this on any machine with an active antenna, running wpa_supplicant to test connection to the WEP-enabled network.

comment:55 Changed 7 years ago by dwmw2

(And I mean the libertas-2.6.git tree as mentioned before, not the olpc tree)

comment:56 Changed 7 years ago by dcbw

I'm having a really tough time getting wireless-2.6/everything (which is caught up to libertas-2.6 minus two irrelevant patches) to associate to anything WEP right now and pass traffic with either iwconfig or wpa_supplicant (no NM involved). I need to investigate more but until I can get that working I have doubts that everything is OK in OLPC-land too.

I saw a few issues today with wireless-2.6/everything on my thinkpad with F7's 2.6.23.12-52.fc7:

1) Scans would timeout using both 5.110.20.p0 and 5.110.17.p5 and hang the card while doing so

2) Can't rmmod/insmod the modules and have the card work again; you need to actually unplug the thing to get it to take firmware upload again. Maybe the driver isn't correctly resetting the 8388 when the driver starts up?

3) Even once associated with a WEP AP, 3 out of 4 times DHCP wouldn't work and I couldn't get an address. Happened randomly.

#1 worries me the most here.

comment:57 Changed 7 years ago by dcbw

I will try libertas-2.6 for completeness' sake, but given that it's not substantially different than wireless-2.6/everything I'm not too hopeful it'll work. Only one way to find out though.

comment:58 Changed 7 years ago by dwmw2

Try the libertas-2.6 tree. There may well be unrelated changes in the wireless tree, and the patches are recommitted in arbitrary order so that my known and tested versions don't exist and git-bisect is useless (the wireless 'git' tree is just a patch stack really, and makes a mockery of git as a version control system).

comment:59 Changed 7 years ago by carrano

(David: Yes, you are right. My tests were performed in the OLPC git tree, including the following, that I will post anyway hoping it is still useful):

Build 656 comes with:
kernel-2.6.22-20071231.2.olpc.83e0631da83a269 - OK

The next kernel release is:
kernel-2.6.22-20071231.3.olpc.71454c965b73c4e - BAD (tested in 656)

This kernel release seems to break WEP (and WPA) if put in 656.

If my poor git skills are not fooling me, the 71454c965b73c4e commit introduced 46 driver changes in the kernel, coded between 12/20 and 12/31.

comment:60 Changed 7 years ago by dcbw

David; no luck with up-to-date libertas-2.6 against either a WRT54G or a WRT54GC, both of which work just fine with the ipw2200. Symptoms are the same; the driver will report an association, but DHCP will never complete successfully.

I tried an unencrypted connection to the WRT54GC with libertas-2.6, and that actually worked like a charm. I'm starting to suspect some command mis-ordering or race conditions here like back when WPA didn't work (which was a command sequence issue). Since the association completes successfully, that indicates that the 8388 can exchange frames with the AP, but since no actual traffic appears to pass, that indicates that perhaps there's something amiss in the WEP key setting functions or sequence in the driver.

I could try to git-bisect libertas-2.6 I guess...

comment:61 Changed 7 years ago by dcbw

261d52b4874d5575f98b494ad4e669154f141a94 is first bad commit
commit 261d52b4874d5575f98b494ad4e669154f141a94
Author: David Woodhouse <dwmw2@infradead.org>
Date:   Tue Dec 11 18:56:42 2007 -0500

    libertas: add lbs_mesh sysfs attribute for enabling mesh
    
    Signed-off-by: David Woodhouse <dwmw2@infradead.org>

:040000 040000 d3c51ed8fd49fe7bb5ed3b456d471e7b9d3aedc2 ec9cd9f6baa37859b8039f9c62a874b9cbb1e165 M      drivers

WEP works before that commit (with fbbe497b58e3a5f1dca9cc363fc3957820e66299) but not with or after that commit. I couldn't understand why that patch would affect WEP so I tried switching between those two commits manually and got the same results as the bisect. Could something with mesh autostart be screwing us again? I'm using the latest "stable" firmware release from TechWiki (5.110.17.p5).

Also, when bisecting, note that I usually had to manually fix up the build with:

diff --git a/drivers/net/wireless/libertas/assoc.c b/drivers/net/wireless/libertas/assoc.c
index bd9cfe1..b93a51b 100644
--- a/drivers/net/wireless/libertas/assoc.c
+++ b/drivers/net/wireless/libertas/assoc.c
@@ -171,8 +171,10 @@ static int update_channel(struct lbs_private *priv)
        lbs_deb_enter(LBS_DEB_ASSOC);
 
        ret = lbs_get_channel(priv);
-       if (ret > 0)
+       if (ret > 0) {
                priv->curbssparams.channel = (u8) ret;
+               ret = 0;
+       }
 
        lbs_deb_leave_args(LBS_DEB_ASSOC, "ret %d", ret);
        return ret;

to fix the bug I introduced with my channel command refactor; otherwise the driver won't complete association.
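
A stripped-down model of what that fixup changes, for anyone who wants to see the effect outside the kernel. The struct and the stub are stand-ins invented for this sketch; the assumption is that callers of update_channel() treat any nonzero return as failure, which would explain why association never completes.

```c
/* Stand-in for the driver's private data; not the real struct lbs_private. */
struct fake_priv {
    unsigned char channel;   /* cached current channel */
    int fw_channel;          /* what the (simulated) firmware reports */
};

/* Stub for lbs_get_channel(): > 0 is a channel number, < 0 is an error. */
static int fake_get_channel(const struct fake_priv *priv)
{
    return priv->fw_channel;
}

/* Before the fixup: the channel number leaks out as the return value. */
int update_channel_buggy(struct fake_priv *priv)
{
    int ret = fake_get_channel(priv);

    if (ret > 0)
        priv->channel = (unsigned char)ret;
    return ret;              /* channel 6 returns 6, which looks like an error */
}

/* After the fixup: success is reported as 0, errors pass through unchanged. */
int update_channel_fixed(struct fake_priv *priv)
{
    int ret = fake_get_channel(priv);

    if (ret > 0) {
        priv->channel = (unsigned char)ret;
        ret = 0;
    }
    return ret;
}
```

Both variants cache the channel; only the fixed one lets a caller that checks for a nonzero return carry on.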

comment:62 Changed 7 years ago by dcbw

Using libertas-2.6 head and applying the following makes it work for me:

diff --git a/drivers/net/wireless/libertas/main.c b/drivers/net/wireless/libertas/main.c
index 91b2f23..0bd0c60 100644
--- a/drivers/net/wireless/libertas/main.c
+++ b/drivers/net/wireless/libertas/main.c
@@ -938,6 +938,22 @@ static int lbs_setup_firmware(struct lbs_private *priv)
 		goto done;
 	}
 
+	/* Disable mesh autostart */
+	if (1) {
+		struct cmd_ds_mesh_access mesh_access;
+		memset(&mesh_access, 0, sizeof(mesh_access));
+		mesh_access.data[0] = cpu_to_le32(0);
+		ret = lbs_mesh_access(priv, CMD_ACT_MESH_SET_AUTOSTART_ENABLED,
+				      &mesh_access);
+		if (ret) {
+			printk("Mesh autostart set failed\n");
+			ret = 0;
+			//ret = -1;
+			goto done;
+		}
+		priv->mesh_autostart_enabled = 0;
+	}
+
 	ret = 0;
 done:
 	lbs_deb_leave_args(LBS_DEB_FW, "ret %d", ret);

The bisected commit 261d52b4874d5575f98b494ad4e669154f141a94 apparently moves mesh stuff around such that priv->mesh_dev is NULL at firmware setup time, so mesh autostart never gets turned off. I believe we should unconditionally be turning mesh autostart OFF, because autostart is only really useful in the active antenna case. If the card is ever controlled by a driver then mesh autostart isn't useful, since the driver/user knows when to start mesh better than the firmware itself. We've had mesh autostart screw us hard with a sledgehammer before.
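
The hypothesis in this comment can be modelled in a few lines. Everything here is a stand-in invented for the sketch (the structs, the function names, and the assumption that the firmware powers up with autostart enabled); it only shows why a disable that is guarded by priv->mesh_dev is skipped when that pointer is still NULL during setup.

```c
#include <stddef.h>

/* Simulated firmware state; assumed to start with autostart enabled. */
struct fake_fw {
    int mesh_autostart;
};

/* Stand-in for the driver's private data. */
struct fake_priv {
    void *mesh_dev;          /* NULL until the mesh interface is created */
    struct fake_fw *fw;
};

/* Guarded disable: does nothing if mesh_dev has not been set up yet. */
void setup_firmware_conditional(struct fake_priv *priv)
{
    if (priv->mesh_dev)
        priv->fw->mesh_autostart = 0;
}

/* Unconditional disable, as proposed in the patch above. */
void setup_firmware_unconditional(struct fake_priv *priv)
{
    priv->fw->mesh_autostart = 0;
}
```

With mesh_dev still NULL, the guarded version leaves autostart on and the unconditional version turns it off.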

comment:63 in reply to: ↑ 53 Changed 7 years ago by morgs

Replying to carrano:

Puzzle:

  • 656 connects to WEP. The kernel is 2.6.22-20071231.2.olpc.83e0631da83a269.
  • 657 _does_not_ connect to WEP. The kernel is 2.6.22-20071121.7.olpc.af3dd731d18bc39 (older)

656 is the last ship2, and 657 is the first Update1, and brought some older stuff back (like firmware 20.p42 and the kernel).

Actually, no. Update.1 builds originally started at 651 - see http://xs-dev.laptop.org/~cscott/olpc/streams/update.1/ which is the old location for Update.1 builds - so we have some overlapping version numbers. If you want the build before 657, look at 656 in this location.

comment:64 Changed 7 years ago by dwmw2

I thought I was told that autostart was a NOP in current firmware. Hence removing all references to it in commit ddaf86a9f0c71cdad4f15de93169909cba87ce8c.

Obviously, life would be a lot easier if we actually had visibility into the firmware.

comment:65 Changed 7 years ago by dwmw2

Ok, have committed the above patch to restore the explicit disable on init.

comment:66 Changed 7 years ago by dcbw

Yeah; that hunk should probably be there. I started using 5.110.20.p49 and that appears to work OK with wpa_supplicant and WEP so far _without_ the mesh autostart disable thing; will keep poking at it.

comment:67 Changed 7 years ago by dcbw

So the actual problem is that should_deauth_infrastructure() in assoc.c wasn't returning the correct value. Change "return 0;" -> "return ret;" and that should fix the issues.

Holger posted a patch to linux-wireless yesterday that contains the fix, though I didn't find that again until I'd already tracked it down independently :( See the "make association debug output nicer" patch. Need to get this in OLPC kernels ASAP, I think.

comment:68 Changed 7 years ago by dwmw2

Hm, interesting. It looks like that 'return 0' has been there since the beginning, so how did this _ever_ work?

comment:69 follow-up: Changed 7 years ago by dcbw

I assume because the code probably used to just 'return -1' where it currently calls 'goto out'. And when somebody cleaned it up, likely to ensure that the lbs_deb_leave_args() call fired, they just forgot to change the return 0 to a return ret.
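
The pattern being described can be sketched as follows. This is not the driver's code: the parameter names, the logging stub, and the value 1 for "yes, deauth" are all invented for the illustration. It only shows how a cleanup that funnels every exit through one label can keep the debug output correct while the function still returns a hard-coded 0.

```c
/* Stand-in for the lbs_deb_leave_args() call: remembers what was logged. */
static int last_logged_ret;

static void log_leave(int ret)
{
    last_logged_ret = ret;
}

/* After the cleanup, as described: ret is logged, then thrown away. */
int should_deauth_buggy(int ssid_changed, int mode_changed)
{
    int ret = 0;

    if (ssid_changed) {
        ret = 1;
        goto out;
    }
    if (mode_changed) {
        ret = 1;
        goto out;
    }
out:
    log_leave(ret);
    return 0;                /* bug: should be ret */
}

/* The one-word fix: return what was computed. */
int should_deauth_fixed(int ssid_changed, int mode_changed)
{
    int ret = 0;

    if (ssid_changed) {
        ret = 1;
        goto out;
    }
    if (mode_changed) {
        ret = 1;
        goto out;
    }
out:
    log_leave(ret);
    return ret;
}
```

In the buggy variant the log shows the computed value while the caller always receives 0, which fits with this being hard to spot from debug output alone.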

comment:70 Changed 7 years ago by carrano

Tested joyride-1608 which brings ...

the new kernel 1b84a9e4bedcd661ec8f1fca0cc87b20c40dc37a which brings ...

the "libertas: should_deauth_infrastructure() fix"

No WEP (and no WPA either)

I'll post dmesg logs.

Changed 7 years ago by carrano

WEP fails (joyride 1608)

comment:71 follow-up: Changed 7 years ago by carrano

Mmm. It works if you upgrade the firmware to 22.p1.

I'll double check...

comment:72 in reply to: ↑ 71 Changed 7 years ago by carrano

Replying to carrano:

Mmm. It works if you upgrade the firmware to 22.p1.

I'll double check...

Another thing that made it work: manually adding an entry to networks.cfg
(No, I did not change the firmware this time.)

But, anyway, an out-of-the-box 1608 still won't connect to WEP APs. Two recipes that seem to make it work (either one):

1 - upgrade the firmware to 22.p1; or

2 - manually edit networks.cfg

The fact that these two items are so unrelated is very uncomfortable. But I tested twice with two different XOs. Could anyone replicate this? There must be dust in my eyes.

comment:73 in reply to: ↑ 69 Changed 7 years ago by dwmw2

Replying to dcbw:

I assume because the code probably used to just 'return -1' where it currently calls 'goto out'. And when somebody cleaned it up, likely to ensure that the lbs_deb_leave_args() call fired, they just forgot to change the return 0 to a return ret.

Hm. Indeed so. Bad Holger, no biscuit. Dan, do you mind if I take over as libertas maintainer upstream and start vetting all changes to it?

comment:74 Changed 7 years ago by dcbw

I'm able to associate with NetworkManager (on Fedora Core 8) to both WEP and WPA APs using firmware 5.110.20.p49 and head libertas-2.6 with my patch. Before the patch I was not. Somebody should ensure that libertas-2.6 and olpc-2.6 are in sync before going much further, I guess.

comment:75 Changed 7 years ago by dcbw

Manual wpa_supplicant associations with dhclient runs work for me with both WEP and WPA on joyride 1608 with kernel 2.6.22-20080129.2.olpc.1b84a9e4bedcd66 to the WRT54GC.

NetworkManager association via Sugar to WEP AP (using 104-bit hex key) works too

comment:76 follow-up: Changed 7 years ago by carrano

Joyride-1612 (kernel 13dc6cf365d32ae8fe52e7b4e10ea784f4d6ba4c) which brings:

Revert "libertas: Restore explicit mesh autostart disable on init."

WEP works again.

I believe we have to turn our attention to WPA testing (maybe on another ticket). In preliminary tests it is (still) broken. I'll start gathering data.

comment:77 in reply to: ↑ 76 Changed 7 years ago by carrano


I believe we have to turn our attention to WPA testing (maybe on another ticket). In preliminary tests it is (still) broken. I'll start gathering data.

Updated #6191

comment:78 Changed 7 years ago by dwmw2

I'd be very interested to know why disabling mesh autostart was fixing this, and why the change in should_deauth_infrastructure makes a difference too. The latter problem was that we weren't sending the CMD_802_11_DEAUTHENTICATE command before trying to associate again -- but should that have caused a failure to associate? It doesn't _seem_ to have caused a failure to associate -- the kernel logs suggest that it associated just fine.

I don't really consider this fixed properly while we still don't know what's going on.

comment:79 Changed 7 years ago by dcbw

For the mesh autostart thing, it was always kind of flaky. ISTR that mesh autostart cleared some device state in the firmware or something like that that screwed the existing connection. That wasn't my understanding of how it was _supposed_ to work (which was just to bring up the mesh if no association was performed within 10 seconds or something). It just screwed with the internal firmware state and stuff just didn't work.

For the should_deauth_infrastructure() issue, when the CMD_802_11_DEAUTHENTICATE command response comes back from the firmware, lbs_mac_event_disconnected() is called which clears priv->connect_status, frees current rx & tx packets, and sends the SIOCSIWAP "disassociated" WEXT signal. It's quite unclear what the firmware's behavior actually is in this case, and I can't really see what in the driver would depend on priv->connect_status to substantially change behavior.

If the status is already LBS_CONNECTED, that doesn't mean anything for the association process really. The driver logs indicate that the correct association parameters were sent to the firmware, and that the driver received an appropriate association response from the firmware after completing the association and authentication commands. Note that each test I did was a separate run of the supplicant, like so:

a) do something in driver
b) build driver
c) insmod modules
d) run wpa_supplicant
e) run dhclient
f) killall -TERM dhclient
g) Ctrl+C wpa_supplicant
h) go to (d)

The second run of the supplicant would correctly associate, but no packets ever got received by the driver and passed up to the kernel. They got filtered out somewhere in firmware, or perhaps the DHCP requests never got received by the APs I was using.

To ensure that it's not the driver, but would be the firmware, we would need to instrument the driver at every applicable place where priv->connect_status is checked and ensure that the branch taken there had no real effect on the association process. A quick look over the code doesn't seem like it does, and the diffs of the driver dmesg log output I looked at didn't show any significant differences _other_ than the should_deauth_infrastructure() change. I didn't see any real differences in command #s sent to the driver (with the exception of the missing DEAUTHENTICATE), but I did not inspect the actual command contents.
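
One way to do the instrumentation described above is to route every read of the status field through a logging helper. The following is only a user-space sketch under stated assumptions: the struct, enum, macro, and function names are hypothetical, and in the real driver the fprintf would be replaced by whatever kernel debug facility is already in use.

```c
#include <stdio.h>

/* Hypothetical stand-ins; not the real libertas definitions. */
enum sketch_status { SKETCH_DISCONNECTED = 0, SKETCH_CONNECTED = 1 };

struct sketch_priv {
    enum sketch_status connect_status;
    unsigned int status_checks;   /* how many instrumented reads happened */
};

/* Log where the status was read and what it was, then return it unchanged. */
static enum sketch_status read_status(struct sketch_priv *priv,
                                      const char *file, int line)
{
    priv->status_checks++;
    fprintf(stderr, "%s:%d: connect_status=%d\n", file, line,
            (int)priv->connect_status);
    return priv->connect_status;
}

/* Replacement for a bare priv->connect_status read at a call site. */
#define CONNECT_STATUS(priv) read_status((priv), __FILE__, __LINE__)

/* Example call site: a check that would otherwise read the field directly. */
static int needs_teardown(struct sketch_priv *priv)
{
    return CONNECT_STATUS(priv) == SKETCH_CONNECTED;
}
```

Comparing the logged file and line pairs between a working and a failing run would show which status-dependent branches actually differ, without changing the behavior of any branch.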

comment:80 Changed 7 years ago by yani

  • Cc yani added

comment:81 Changed 7 years ago by jg

  • Resolution set to fixed
  • Status changed from reopened to closed