Opened 6 years ago

Closed 5 years ago

Last modified 14 months ago

#10632 closed defect (wontfix)

USB failure when updating SD using OFW

Reported by: wad
Owned by: wmb@…
Priority: normal
Milestone:
Component: ofw - open firmware
Version: 1.75-A2
Keywords: XO-1.75
Cc: gary, wad, martin.langhoff
Blocked By:
Blocking:
Deployments affected:
Action Needed: no action
Verified: yes


On many, if not all, XO-1.75 A2 motherboards, using fs-update to update the internal or external SD card or the eMMC fails with a "Short read of zdata file" error message from OFW.

The failure usually occurs either around block 50 or around block 1200.

This was tested running OFW version Q4A10i. On motherboards that have shown this failure, running a wear-levelling test from Linux on the USB device doesn't find any errors.

I spent some time looking at the power rails (Vmain, +3.3V, +5V_USB, Vin). There doesn't seem to be any abnormal activity when OFW fails. The voltage spikes caused by the Vmain power-supply switching transient are slightly larger when running OFW fs-update than when running the Linux tests, but reducing those spikes did not change how often the failure occurred.

Increasing Vmain from 1.25V to 1.35V had no effect on the crash frequency.

Change History (8)

comment:1 Changed 6 years ago by wad

On 1/11/11, Mitch discovered that:

a) The USB errors are "Babble", i.e. the host controller detects data on the USB bus after the end of a transaction. According to the USB spec, that is a fatal error that requires the endpoint to be halted.

b) The error only happens if the length of an individual USB transfer exceeds 4K. If I restrict transfers to 4K each, I can do long operations, including a complete fs-update, without error. If the transfer length exceeds 4K by even one sector (4K + 512 bytes), I quickly see babble conditions.

c) I tried it with the ellisys USB Explorer, which did not notice a babble condition on the external bus at the error point. I'm not sure that's conclusive, but it suggests that the babble might be somehow related to conditions at the Hub/SOC.

d) I don't know the max transfer length for Linux; Lennart speculates that it might be 4K, per the filesystem block size and page size, but that is not confirmed. If Linux does indeed restrict to 4K, that would explain why Linux hasn't seen the problem.
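Item (b) points to a workaround: cap each transfer at 4K (8 blocks of 512 bytes) and split longer requests into several transfers. The Python sketch below shows only that splitting logic. It is not OFW code, and `read_blocks` is a hypothetical callback standing in for whatever the driver uses to issue a single transfer.

```python
SECTOR_SIZE = 512                         # bytes per block ("4K + 512 bytes" = one extra sector)
MAX_TRANSFER = 4096                       # largest transfer length that ran error-free in item (b)
MAX_BLOCKS = MAX_TRANSFER // SECTOR_SIZE  # 8 blocks per transfer


def split_transfers(start_block, num_blocks, max_blocks=MAX_BLOCKS):
    """Split a request into (block, count) pairs, each at most max_blocks long."""
    chunks = []
    block, remaining = start_block, num_blocks
    while remaining > 0:
        count = min(remaining, max_blocks)
        chunks.append((block, count))
        block += count
        remaining -= count
    return chunks


def read_in_chunks(read_blocks, start_block, num_blocks):
    """Read num_blocks through read_blocks(block, count) -> bytes (hypothetical
    driver callback) without ever issuing a transfer longer than 4K."""
    data = bytearray()
    for block, count in split_transfers(start_block, num_blocks):
        chunk = read_blocks(block, count)
        if len(chunk) != count * SECTOR_SIZE:
            raise IOError(f"short read at block {block}")
        data += chunk
    return bytes(data)
```

For example, `split_transfers(0, 20)` returns `[(0, 8), (8, 8), (16, 4)]`: two full 4K transfers and one 2K transfer.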

Here are two versions of a test script:

ok text-off \ Speed things up by turning off screen output

ok select u:0 \ Open USB disk in raw mode

ok 1000 0 do (cr i . load-base 0 8 read-blocks 8 <> abort" x" loop

That script re-reads the same 8 blocks (4K) starting at block 0. You can replace the 8 with larger numbers (2 places) to increase the read length. For me, it works at b (decimal 11) and fails at c (decimal 12) on one USB stick, while it works at 8 and fails at 9 on a different stick.

ok text-off \ Speed things up by turning off screen output

ok select u:0 \ Open USB disk in raw mode

ok 1000 0 do (cr i . load-base i 8 * 8 read-blocks 8 <> abort" x" loop

The above script moves across the "disk" as it reads, instead of always reading the same blocks. It works at 8 and fails at 9 on the first stick above. Aha! It fails at 8 on the second stick. A third USB stick failed at a (decimal 10) blocks/transfer on the first test and failed at 9 on the second.
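For readers less familiar with OFW Forth, the two test loops can be modelled in Python as shown below. Two details come from the ticket itself. First, OFW reads numbers in hex (hence "b (decimal 11)"), so `1000 0 do` makes 4096 passes. Second, the script compares the count left by `read-blocks` with the requested block count (`8 <>`). Everything else is an illustrative assumption, including the Python names and the omission of the progress printing (`(cr i .`) and the buffer address (`load-base`). The ticket also does not say whether the stride in `i 8 *` was changed along with the block count.

```python
ITERATIONS = 0x1000  # OFW reads "1000" as hex, so each loop makes 4096 passes


def run_test(read_blocks, n_blocks, sweep=False, iterations=ITERATIONS):
    """Model of the two OFW test loops.

    read_blocks(start, count) is a hypothetical stand-in for OFW's read-blocks;
    like the original, it returns the number of blocks actually read.
    sweep=False rereads from block 0 on every pass (first script);
    sweep=True starts pass i at block i*8 (second script).
    Returns the index of the first failing pass, or None if every pass succeeds.
    """
    for i in range(iterations):
        start = i * 8 if sweep else 0
        if read_blocks(start, n_blocks) != n_blocks:
            return i  # where the OFW loop would abort with "x"
    return None
```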

comment:2 Changed 5 years ago by martin.langhoff

  • Cc martin.langhoff added

comment:3 Changed 5 years ago by Quozl

  • Milestone changed from Not Triaged to 1.75-firmware

Fix milestone.

comment:4 Changed 5 years ago by wmb@…

Increasing the poll-delay in dev/usb2/hcd/ehci/qhtd.fth from 100 us to 300 us appears to make the problem go away.
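The ticket does not describe how poll-delay is used inside qhtd.fth. Purely as an illustration, assume it is the pause between successive checks of a transfer's completion status. Under that assumption, a polling loop built around such a parameter might look like the hypothetical Python sketch below, whose function and arguments are not taken from OFW. This ticket does not establish why a longer delay avoids the babble errors.

```python
import time


def wait_for_completion(is_done, poll_delay_us=300, timeout_ms=1000):
    """Poll is_done() until it returns True, pausing poll_delay_us between checks.

    Hypothetical model only: is_done stands in for reading a transfer's
    status. Returns True on completion, False on timeout.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_delay_us / 1_000_000)  # the delay raised from 100 us to 300 us
    return is_done()  # one final check after the deadline
```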

comment:5 Changed 5 years ago by wad

Verified that increasing the poll-delay from 100 us to 300 us fixed the USB errors from OFW on 1.75 A3 #37, a motherboard that had failed with the poll-delay value used in Q4A13H.

comment:6 Changed 5 years ago by edmcnierney

Ed is updating this ticket to check Gary's notification account.

comment:7 Changed 5 years ago by Quozl

  • Action Needed changed from never set to no action
  • Resolution set to wontfix
  • Status changed from new to closed

This hasn't been seen on B1, and it has been reasonably reliable on A3; closing.

comment:8 Changed 14 months ago by Quozl

  • Milestone 1.75-firmware deleted

