Ticket #10270 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

wlan goes missing during runin

Reported by: rsmith Owned by: dsaxena
Priority: high Milestone: Future Release
Component: kernel Version: not specified
Keywords: XO-1.5 WLAN Cc: wad, dsd
Action Needed: reproduce Verified: no
Deployments affected: Blocked By:
Blocking:

Description

This is a top level ticket for 3 different wlan problems that are occurring during runin.

The following issues have been observed:

1) Wlan sometimes shows up as eth1 rather than eth0

This issue has a temporary fix in runin versions '0.9.40'. It still generates an error but the script will delete the persistent rules file for udev so that on the next boot it will return to eth0. The long term fix is to create a udev rule that pins the wlan to eth0 always.

2) Libertas fails to load firmware.

This issue has 2 different flavors. The first happens if you remove the libertas module from the kernel but do not power cycle the wlan card. The 2nd has the same result but happens at odd times in the runin sequence: sometimes on the 1st boot and sometimes after a suspend/resume.

3) Wlan card fails to respond after power up.

In this case the wlan fails to respond after an mmc power up. Kernel logs look like:

<4>[ 1245.312719] sdhci_reset: wlan w/u control is 0x0
<7>[ 1245.312731] sdhci-pci 0000:00:0c.0: setting latency timer to 64
<4>[ 1245.312757] sdhci_set_power: new power value = 14
<4>[ 1245.322795] sdhci_set_power: new power value = 14
<4>[ 1245.373206] sdhci_set_ios: power off for mmc0 from b061ba25
<4>[ 1245.373214] sdhci_set_power: new power value = 0
<4>[ 1245.373225] sdhci_set_power: new power value = 14
<4>[ 1245.427424] sdhci_set_ios: power off for mmc1 from b061ba25
<4>[ 1245.427432] sdhci_set_power: new power value = 0

This seems to occur after a lot of suspend/resumes.
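When comparing many runin logs for issue 3, it may help to pull the power events out mechanically. The following is a minimal sketch, not part of the runin scripts: it assumes the log lines keep the exact wording shown above, and the names `summarize` and `SAMPLE` are made up for illustration. It only extracts events; which sequence counts as a failure is left to the reader, since the ticket does not show a passing log for comparison.

```python
import re

# Matches lines like: <4>[ 1245.312757] sdhci_set_power: new power value = 14
LINE_RE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s+(.*)$")
POWER_RE = re.compile(r"sdhci_set_power: new power value = (\d+)")
OFF_RE = re.compile(r"sdhci_set_ios: power off for (mmc\d+)")

# The log excerpt quoted in this ticket.
SAMPLE = """\
<4>[ 1245.312719] sdhci_reset: wlan w/u control is 0x0
<7>[ 1245.312731] sdhci-pci 0000:00:0c.0: setting latency timer to 64
<4>[ 1245.312757] sdhci_set_power: new power value = 14
<4>[ 1245.322795] sdhci_set_power: new power value = 14
<4>[ 1245.373206] sdhci_set_ios: power off for mmc0 from b061ba25
<4>[ 1245.373214] sdhci_set_power: new power value = 0
<4>[ 1245.373225] sdhci_set_power: new power value = 14
<4>[ 1245.427424] sdhci_set_ios: power off for mmc1 from b061ba25
<4>[ 1245.427432] sdhci_set_power: new power value = 0
"""


def summarize(log_text):
    """Return (power_values, powered_off) as lists of (timestamp, value)."""
    power_values = []
    powered_off = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a kernel log line in the expected format
        timestamp = float(m.group(2))
        message = m.group(3)
        p = POWER_RE.search(message)
        if p:
            power_values.append((timestamp, int(p.group(1))))
        o = OFF_RE.search(message)
        if o:
            powered_off.append((timestamp, o.group(1)))
    return power_values, powered_off


if __name__ == "__main__":
    powers, offs = summarize(SAMPLE)
    print("power values:", [v for _, v in powers])
    print("hosts powered off:", [h for _, h in offs])
```

Run against the excerpt above, this reports both mmc0 and mmc1 being powered off within about 55 ms of each other.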

So far, issues 2 and 3 cannot be duplicated by OLPC or Quanta. Quanta has tried to pin them to specific hardware, but so far that has failed. Usually the tests pass when you re-run them. Quanta has taken 2 of the cards that failed on the line and run them again for 48 hours with no failures.

Change History

follow-up: ↓ 3   Changed 4 years ago by rsmith

Additional info on Problem #2

While studying Quanta's logs Paul discovered today that the failure to load firmware coincides with a recovery on the ext3 filesystem. The firmware is stored in the initrd but somehow a journal recovery seems to trip that up. (At least that's how it looks right now)

We have so far been unable to duplicate this here at 1cc. Power cycling the runin tests while they are in progress recreates the recovery on next boot but we have not seen the load failure.

This brings up the question of why there is any sort of journal recovery going on during runin. Is it due to a restart? A normal shutdown or a test failure should result in a clean shutdown.

The latest version of the runin tests tracks how many times the tests have been restarted. Logs from those tests should show up tonight (2010-08-03)

  Changed 4 years ago by mikus

Comment on Problem #1

I use an external USB ethernet adapter. I have not seen this problem in the last year or so, but back when I was running Joyride builds -- sometimes the first boot-up of one new build would assign eth0 to the radio and eth1 to the ethernet, but then the first boot-up of the next new build might assign eth0 to the ethernet and eth1 to the radio. [The interface assignment order appeared to reflect the order in which the kernel? accessed the respective hardware devices.]

To protect myself from "reverse-assignment-order", I ended up manually editing /etc/udev/rules.d/70-persistent-net.rules to ensure that the ethernet got interface eth1.
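A quick way to see which name each device has been given is to read the address-to-name pairs out of that rules file. The sketch below is only an illustration: it assumes the file uses the usual generated layout with an `ATTR{address}==` match and a `NAME=` assignment on each rule line, the function name `interface_names` is hypothetical, and the MAC addresses in the sample are placeholders.

```python
import re

ADDRESS_RE = re.compile(r'ATTR\{address\}=="([^"]+)"')
# Single '=' is the assignment; '==' would be a match key and is skipped.
NAME_RE = re.compile(r'\bNAME="([^"]+)"')

# Placeholder rules; the MAC addresses are made up for this example.
SAMPLE = '''
# generated-style entries
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:00:00:00:00:01", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="02:00:00:00:00:02", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
'''


def interface_names(rules_text):
    """Map each MAC address found in the rules text to its assigned name."""
    names = {}
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        addr = ADDRESS_RE.search(line)
        name = NAME_RE.search(line)
        if addr and name:
            names[addr.group(1).lower()] = name.group(1)
    return names


if __name__ == "__main__":
    for mac, name in sorted(interface_names(SAMPLE).items()):
        print(mac, "->", name)
```

If the wlan's address maps to anything other than eth0, the file has recorded the swapped order described in problem #1.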

in reply to: ↑ 1 ; follow-up: ↓ 5   Changed 4 years ago by Quozl

Replying to rsmith:

While studying Quanta's logs Paul discovered today that the failure to load firmware coincides with a recovery on the ext3 filesystem. The firmware is stored in the initrd but somehow a journal recovery seems to trip that up. (At least that's how it looks right now)

Comment: I seem to remember (i.e. I'm possibly wrong) that journal recovery blocks access to only the filesystem being recovered, and that the firmware load is done by a process created by a kernel thread. There's an assumption in the firmware load sequence that we can afford to wait for the filesystem to read the firmware before it is sent to the device. Is the device unable to handle that delay? I imagine that what is happening is that the mount of the filesystem occurs, but that journal recovery must happen. If the mount is occluding /lib/firmware, then threads created for firmware loading after that mount (but before it returns) might use the mounted filesystem despite journal recovery being in progress.

  Changed 4 years ago by wad

  • keywords XO-1.5 WLAN added

Regarding case 3:

According to a confidential Marvell AN, we should be asserting RESET externally after powering up the WLAN card.

The Linux driver should assert WLAN_RESETn low for between 1 and 20 µs, about 1 ms after turning on power to the card. WLAN_RESETn is tied to GPIO0 on the VX855.

I was under the impression that there was a cap on the reset line which obviated the need for this (as on the XO-1 WLAN module), but a search of the WLAN card schematics shows no such capacitor.
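The ordering described above can be sketched as follows. This is an illustration of the sequence only, not driver code: `set_power` and `set_reset_n` are hypothetical callables standing in for whatever controls card power and the WLAN_RESETn line, and a Python `time.sleep` cannot actually guarantee microsecond timing, so a real implementation would live in the kernel.

```python
import time


def power_up_with_reset(set_power, set_reset_n, sleep=time.sleep,
                        settle_s=1e-3, pulse_s=10e-6):
    """Power the card, wait about 1 ms, then pulse WLAN_RESETn low.

    set_power(bool) and set_reset_n(int) are hypothetical hooks.
    pulse_s must stay within the 1 to 20 microsecond window.
    """
    if not 1e-6 <= pulse_s <= 20e-6:
        raise ValueError("reset pulse must be between 1 and 20 microseconds")
    set_power(True)    # turn on power to the card
    sleep(settle_s)    # about 1 ms after power on
    set_reset_n(0)     # assert reset (active low)
    sleep(pulse_s)     # hold it low for the pulse width
    set_reset_n(1)     # release reset


if __name__ == "__main__":
    events = []
    power_up_with_reset(
        lambda on: events.append(("power", on)),
        lambda level: events.append(("reset_n", level)),
        sleep=lambda s: events.append(("sleep", s)),
    )
    for event in events:
        print(event)
```

Passing the sleep function in as a parameter keeps the sequence checkable without real hardware or real delays.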

in reply to: ↑ 3   Changed 4 years ago by pgf

Replying to Quozl:

Comment: I seem to remember (i.e. I'm possibly wrong) that journal recovery blocks access to only the filesystem being recovered, and that the firmware load is done by a process created by a kernel thread.

i think you're right. the firmware is coming from the initrd, not the ext3 root fs that's in need of recovery. if indeed there's a correlation (what i've seen in the logs may be coincidental) it could have to do with competing for the sdio controller. but after some testing yesterday (dd'ing to and from an SD card while booting the wlan card) i doubt that. the problem is far more likely to be related to the reset issue that wad points out.

  Changed 3 years ago by dsd

  • cc wad added

Paul did commit the above GPIO tweak (commit e9bee721fb0cc), so it's included in 10.1.2.

Do we have any more feedback from Quanta about issues 2&3? Are they still seeing it?

We are fighting a new issue where Linux can now power down SD cards without suspending the rest of the machine. We can't get it to turn libertas back on when this happens. The reset GPIO tweak does not help.

  Changed 3 years ago by dsd

  • cc dsd added
  • milestone changed from Not Triaged to Future Release

It would be good to get feedback about this. If Paul's tweak made a difference I will try to upstream it. Otherwise I won't; it's not an easy one to do cleanly.

  Changed 3 years ago by dsd

  • status changed from new to closed
  • resolution set to fixed

Further discussion with Marvell suggests that the GPIO dance is not needed.

There is an SD8686 bug: it cannot accept two initialisation sequences without a power cycle (that is what this erratum was about). However, an SDIO-level reset was identified and confirmed by Marvell as another way of overcoming this limitation. Linux does this in the normal probe cycle, so we were never going to face this issue anyway.

With that knowledge, I'm going to assume that this issue was fixed by other means and close this ticket. The GPIO tweak will remain in olpc-2.6.35 but be dropped from future kernels, as it seems that this is not needed and wouldn't have made a difference anyway. If we discover otherwise later, we can revive this issue and implement the GPIO tweaking using Linux's regulator framework.

On a similar topic, we also made related findings: another way of working around this issue is to power cycle the card using normal SD bus mechanisms. This obviously causes a reset. And Linux's new SD runtime power management will cause power to be cycled during rmmod/modprobe.

One thing to be aware of: On our motherboard, the SD card power supply is not clamped, so it is necessary to wait a while (approx 300ms) after powering the card down, before powering it up again, to ensure that this reset happens. However, again, as sdio_reset() happens during the probe path, we were never going to face this issue anyway.
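For anyone scripting a power cycle outside the normal probe path, the minimum off time could be enforced along these lines. This is a sketch under assumptions: `CardPower` and the `set_power` callable are hypothetical, and the 300 ms default is just the approximate figure quoted above, not a measured specification.

```python
import time


class CardPower:
    """Refuses to power the card back up until it has been off long enough."""

    def __init__(self, set_power, min_off_s=0.3,
                 clock=time.monotonic, sleep=time.sleep):
        self._set_power = set_power  # hypothetical hook: set_power(bool)
        self._min_off_s = min_off_s  # approx 300 ms, per the note above
        self._clock = clock
        self._sleep = sleep
        self._off_since = None       # time of the last power-off, if any

    def power_off(self):
        self._set_power(False)
        self._off_since = self._clock()

    def power_on(self):
        if self._off_since is not None:
            elapsed = self._clock() - self._off_since
            remaining = self._min_off_s - elapsed
            if remaining > 0:
                # The supply is not clamped, so wait out the rest of the
                # off time before powering up again.
                self._sleep(remaining)
        self._set_power(True)
        self._off_since = None


if __name__ == "__main__":
    card = CardPower(lambda on: print("power", "on" if on else "off"))
    card.power_off()
    card.power_on()  # blocks for roughly 0.3 s before powering on
```

Tracking the power-off time, rather than always sleeping a fixed 300 ms, avoids an unnecessary delay when the card has already been off for a while.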
