Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#10071 closed defect (fixed)

external SD cards not working on XO-1.5

Reported by: sascha_silbe
Owned by: wmb@…
Priority: normal
Milestone: 1.5-firmware
Component: ofw - open firmware
Version: 1.5-C2
Keywords:
Cc: mikus, wad
Blocked By:
Blocking:
Deployments affected:
Action Needed: test in release
Verified: no

Description

I'm having trouble using either of my two SD cards in an XO-1.5 (CL1B / D4). While the cards are detected fine and listing the root directory seems to work, trying to access a subdirectory or running the self-test yields one of the following errors:

SDHCI: Error: ISR = 8000 ESR = 10 Data Timeout
SDHCI: Error: ISR = 8002 ESR = 10 Data Timeout

Happens with OFW Q3A33 / EC 1.9.21 and OFW Q3A35 / EC 1.9.24.
SD cards: Silicon Power SDHC Class 6 16GB, takeMS SDHC Class 6 4GB.

The 16GB card was used for running Debian on an XO-1, so it's known to work fine. The only issue on XO-1 was that it often required two boot attempts ("SD card failed to power up" or something like that).
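
For readers decoding the error lines above: a minimal sketch, assuming that OFW's "ISR" and "ESR" correspond to the Normal Interrupt Status and Error Interrupt Status registers of the SD Host Controller register layout. The bit names follow that layout; the helper names `decode` and `describe` are illustrative only and are not part of OFW.

```python
# Bit positions follow the SD Host Controller register layout.
# Assumption: OFW's ISR/ESR map onto these two status registers.
NORMAL_INT_BITS = {
    0: "Command Complete",
    1: "Transfer Complete",
    2: "Block Gap Event",
    3: "DMA Interrupt",
    4: "Buffer Write Ready",
    5: "Buffer Read Ready",
    6: "Card Insertion",
    7: "Card Removal",
    8: "Card Interrupt",
    15: "Error Interrupt",
}
ERROR_INT_BITS = {
    0: "Command Timeout",
    1: "Command CRC",
    2: "Command End Bit",
    3: "Command Index",
    4: "Data Timeout",
    5: "Data CRC",
    6: "Data End Bit",
    7: "Current Limit",
    8: "Auto CMD12",
}


def decode(value, names):
    """Return the names of the bits set in a 16-bit value, lowest bit first."""
    return [names.get(bit, f"bit {bit}") for bit in range(16) if value & (1 << bit)]


def describe(isr, esr):
    """Format both status registers as one readable line."""
    isr_names = ", ".join(decode(isr, NORMAL_INT_BITS))
    esr_names = ", ".join(decode(esr, ERROR_INT_BITS))
    return f"ISR: {isr_names}; ESR: {esr_names}"


# The two register combinations reported in this ticket.
for isr in (0x8000, 0x8002):
    print(describe(isr, 0x10))
```

Under that assumption, both reports carry the same error (Data Timeout); the second one additionally has the Transfer Complete bit set.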

Change History (11)

comment:1 Changed 5 years ago by mikus

  • Cc mikus added

comment:2 Changed 5 years ago by sascha_silbe

Reproducible on a second XO-1.5 (same model) with the same set of firmware versions.
The SD cards are working fine (dd if=/dev/mmcblk0 of=/dev/null, 16..19MB/s read speed) from within Linux (OS114).

comment:3 follow-up: Changed 5 years ago by pgf

  • Cc wad added

cc'ing wad, to be sure he's seen this. the fact that one of the cards was also problematic on XO-1 is suspicious. does the self-test work on these cards on XO-1?

comment:4 in reply to: ↑ 3 Changed 5 years ago by sascha_silbe

Replying to pgf:

the fact that one of the cards was also problematic on XO-1 is suspicious.

I don't think it's related as the same error happened twice on XO-1.5 now, but on second attempt it worked as "well" as every other time (i.e. root dir works, but some other accesses time out).
However using the 16GB card I now managed to trigger exactly the same error (i.e. SDHCI: Error: ISR = 8000 ESR = 10 Data Timeout) on XO-1 using dir sd:\security. This command (with sd replaced by ext for XO-1.5) fails on both systems whereas dir sd:\boot fails only on XO-1.5. Even though the XO-1 cannot list sd:\security it recognizes the developer key in sd:\security\develop.sig quite fine.

It appears the cards are quirky/unreliable on XO-1 as well and happened to work just well enough for me not to encounter the error during regular use. This applies to OFW only; the cards work fine on any system from within Linux. They are often quite slow (~300KB/s write speed), but I think that's more related to write amplification and the like, not a sign of the cards being problematic / quirky. No errors or warnings in the logs.

does the self-test work on these cards on XO-1?

"test /pci/sd" doesn't produce any output for either of the cards.

Is there a way to turn on some debug output that might shed more light on this issue?

comment:5 follow-up: Changed 5 years ago by wmb@…

It's possible that the device is taking a long time to respond to reads of certain blocks - perhaps due to internal error correction or retrying or being busy with background wear leveling. I suspect that Linux retries data timeouts a few times, so maybe a subsequent try would work. I think we saw cases on XO-1 where the timeout was too short, so there were lots of timeouts, but the only thing you noticed at the user level was that some operations took nearly forever.
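
The retry idea above can be sketched as follows. This is only an illustration of retrying a read after a data timeout, not how Linux or OFW implement it; `DataTimeout`, `read_with_retry` and `make_flaky_reader` are hypothetical names, and the fake reader stands in for a card that is briefly busy.

```python
class DataTimeout(Exception):
    """Hypothetical error standing in for an SDHCI data timeout."""


def read_with_retry(read_block, block, attempts=3):
    """Call read_block(block), retrying on DataTimeout up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return read_block(block)
        except DataTimeout as err:
            last_error = err  # remember the failure and try again
    raise last_error


def make_flaky_reader(failures):
    """Return a fake reader that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def read_block(block):
        if state["left"] > 0:
            state["left"] -= 1
            raise DataTimeout(f"block {block}")
        return bytes(512)  # one 512-byte block of zeros

    return read_block


# A card that times out twice is still readable with three attempts.
data = read_with_retry(make_flaky_reader(2), block=0)
print(len(data))
```

With this shape, a card that is slow only occasionally shows up to the user as a long delay rather than as an error, which matches the behaviour described for XO-1.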

comment:6 Changed 5 years ago by sascha_silbe

Is there a way to increase this timeout at runtime to validate this theory? Or is it a hardware (i.e. SD host controller) timeout and the only way to handle it is making OFW retry?

comment:7 Changed 5 years ago by wmb@…

The timeout is implemented by hardware and the register that controls the timeout length is already set to its maximum value of 2.8 seconds. There is an "off by a factor of 2" bug in the XO-1's CaFe chip, so that the timeout value on XO-1 is actually 1.4 seconds.
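
As a worked check of those numbers: in the SD Host Controller layout the data timeout is a count of 2^(13+n) timeout-clock cycles, with n at most 0xE. The 48 MHz timeout clock below is an assumption, chosen only because it reproduces the 2.8 second figure; the ticket does not state the clock rate, and `data_timeout_seconds` is an illustrative helper.

```python
def data_timeout_seconds(timeout_clock_hz, counter_value):
    """Timeout of an SDHCI-style counter: 2**(13 + counter_value) clock cycles."""
    if not 0 <= counter_value <= 0xE:
        raise ValueError("counter value must be in 0..0xE")
    return (1 << (13 + counter_value)) / timeout_clock_hz


TMCLK_HZ = 48_000_000  # assumption: not stated in the ticket

max_timeout = data_timeout_seconds(TMCLK_HZ, 0xE)
print(f"maximum setting:     {max_timeout:.2f} s")
print(f"with factor-2 error: {max_timeout / 2:.2f} s")
```

Under that assumption the maximum setting works out to roughly 2.8 seconds, and halving it gives roughly 1.4 seconds.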

comment:8 Changed 4 years ago by wmb@…

  • Action Needed changed from never set to test in release
  • Status changed from new to assigned

This could be related to the fix in svn 1890. I discovered that, under certain conditions, the Via VX855 will report a Data Timeout error on a reset card (CMD0) command, even though CMD0 is not a data transfer command.

svn 1890 seems to suppress this spurious CMD0 timeout by setting the data timeout control register before doing any card commands. Previously it was set before any real data transfer command, but after several non-data commands.
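
The ordering change can be illustrated with a toy model. This is not the OFW driver and not a model of the real VX855: `FakeHost`, `init_old` and `init_new` are hypothetical, the command sequence is simplified, and the fake controller is assumed to report the spurious timeout on CMD0 whenever the timeout control register has not been written yet.

```python
TIMEOUT_CONTROL = 0x2E  # Timeout Control register offset in the SDHCI layout


class FakeHost:
    """Toy controller: CMD0 reports a spurious data timeout whenever the
    timeout control register has not been programmed yet (an assumption)."""

    def __init__(self):
        self.timeout_programmed = False
        self.log = []

    def write_reg(self, offset, value):
        if offset == TIMEOUT_CONTROL:
            self.timeout_programmed = True
        self.log.append(f"write {offset:#x}={value:#x}")

    def command(self, index):
        spurious = index == 0 and not self.timeout_programmed
        self.log.append(f"CMD{index}" + (" -> Data Timeout" if spurious else ""))
        return not spurious


def init_old(host):
    """Old order: non-data setup commands first, timeout register afterwards."""
    results = [host.command(i) for i in (0, 8, 2, 3)]
    host.write_reg(TIMEOUT_CONTROL, 0xE)
    return all(results)


def init_new(host):
    """New order: timeout register is set before any card command."""
    host.write_reg(TIMEOUT_CONTROL, 0xE)
    return all([host.command(i) for i in (0, 8, 2, 3)])


for init in (init_old, init_new):
    host = FakeHost()
    ok = init(host)
    print(init.__name__, "ok" if ok else "failed", host.log)
```

In this model only the old ordering logs a Data Timeout on CMD0, which is the symptom the change is meant to suppress.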

This change will appear in Q3A44.

comment:9 Changed 4 years ago by Quozl

  • Milestone changed from Not Triaged to 1.5-firmware

Sascha, Q3A44 is released, please check to see if this problem is solved for you. I've no card that reproduces this symptom.

comment:10 Changed 4 years ago by Quozl

  • Resolution set to fixed
  • Status changed from assigned to closed

Sascha, when you get time, please check with recent firmware and reopen this ticket if the problem still occurs. There's also a new 900 microsecond delay before writes.

comment:11 in reply to: ↑ 5 Changed 4 years ago by mikus

Replying to wmb@firmworks.com:

It's possible that the device is taking a long time to respond to reads of certain blocks - perhaps due to internal error correction or retrying or being busy with background wear leveling. I suspect that Linux retries data timeouts a few times, so maybe a subsequent try would work. I think we saw cases on XO-1 where the timeout was too short, so there were lots of timeouts, but the only thing you noticed at the user level was that some operations took nearly forever.

XO-1 (q2e44): I keep the develop.sig on the "permanent" SD card in each XO (as well as in /security in jffs2). Bought a new SD card (make+model same as what I already had). With this new SD card, OFW would give me a "10 timeout" while trying to access /security/develop.sig on that SD card - and would go on to use /security/develop.sig from jffs2. Then suddenly OFW stopped getting that "10 timeout" -- now that new SD card works as I expect it to. [I myself had made no changes to OFW, nor to the SD card, when that problem spontaneously went away.]
