Ticket #12169 (new defect)

Opened 19 months ago

Last modified 18 months ago

XO-4 B1 SDHCI error on boot

Reported by: rsmith
Owned by: wad
Priority: normal
Milestone: 4-hardware
Component: hardware
Version: 4-B1
Keywords: XO-4, SDHCI, OFW
Cc:
Action Needed: diagnose
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

Sporadic SDHCI errors have been reported on various XO-4 B1 units. A unit at Twine is showing this error on a repeatable basis.

Change History

Changed 19 months ago by rsmith

The following is an e-mail/IRC exchange between Richard and Mitch while trying to diagnose the failure on the unit at Twine:

=============

I played with my B1 that has the SDHCI problem a bit just now. It's still showing the problem and I've discovered a few things.

If I have my boot USB flash drive inserted then it errors every time. This is what I was using to boot cjb's kernel. If I do not have the flash drive in then it errors only sometimes.

In both cases, if I boot with the "check" key enabled then the problem does not occur, at least not with enough frequency that I can see it in 10 or so tries.

The same USB disk does not cause the error on 2 other B1 units I tried.

=============

One thing to try: With the USB key installed, power on, then as soon as the screen turns white, press the rotate button.

Then, on the serial port, type "resume"

The intention here is to see if any delay fixes the problem.

We have already seen that the "check" key fixes the problem. One effect of the check key is the delay introduced by the "Release the game keys to continue" step. I wonder if introducing delays at other points works similarly.

===================

Then, on the serial port, type "resume"

That does not fix it.

The intention here is to see if any delay fixes the problem. We have already seen that the "check" key fixes the problem. One effect of the check key is the delay introduced by the "Release the game keys to continue" step. I wonder if introducing delays at other points works similarly.

If I release the check key after the CForth numbers but before OFW can check for the key, then the problem still happens.

=========

Via IRC:

[09 19:41:14] <MitchBradley> The thing I was going to suggest was ok debug startup
[09 19:41:29] <MitchBradley> or ok debug stand-init
[09 19:42:00] <MitchBradley> and find out where it starts failing when you do 'g' from various points

Changed 19 months ago by wad

I've been seeing this on SHC238000C7, running OFW Q7B02 and Q7B03. It does have a USB key inserted at boot time.

I'm going to take a look at this with a scope soon.

Changed 19 months ago by wad

  • next_action changed from never set to diagnose
  • component changed from not assigned to hardware
  • priority changed from normal to blocker
  • owner set to wad
  • version changed from not specified to 4-B1
  • milestone changed from Not Triaged to 4-hardware
  • keywords XO-4, SDHCI, OFW added

BTW, the USB key was the "Belkin silver" key. Other USB keys (or other laptops?) don't see the problem.

Changed 18 months ago by wmb@…

"Fixed" by svn 3407 and CForth git 12ec3e2. Unknown whether Linux will play.

Changed 18 months ago by wad

  • next_action changed from diagnose to test in build

The OFW fix also requires svn 3412. Otherwise, Linux doesn't play.

Linux still requires a fix. It now asserts eMMC_RST# properly but doesn't quite manage to restore communications with the eMMC.

The root cause of the problem is leakage from the Marvell PXA2128 into the eMMC when the eMMC is powered down. If the MMC interface in the SoC is connected to the pins, but the interface isn't powered up and clocked, the eMMC power line is held around 2.8V. This means that on some units, the eMMC doesn't properly power-on-reset when power is turned on to +3.3V.

Just adding a clamp isn't the solution. Significant current is dissipated, and Linux still has problems.

The above changes to CForth and OFW set the eMMC pins as GPIO inputs when not driving the eMMC (and ensure that there is time for the leakage across the caps to decay).
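For anyone following along, the ordering described above can be sketched roughly as below. This is not the code from svn 3407 or CForth 12ec3e2. The pin count, the decay time, and every name here are assumptions made for illustration, and the hardware is modelled as a plain struct so the ordering can be checked on a host machine.

```c
#include <stdbool.h>

/* Hypothetical pin modes; the real PXA2128 pin-mux encoding is not given in this ticket. */
enum pin_mode { PIN_GPIO_INPUT, PIN_MMC_FUNCTION };

#define EMMC_PIN_COUNT   10   /* assumption: CLK, CMD and 8 data lines */
#define LEAKAGE_DECAY_MS 50u  /* assumption: the ticket gives no figure */

/* Host-side model of the eMMC port state, standing in for real registers. */
struct emmc_port {
    enum pin_mode pins[EMMC_PIN_COUNT];
    bool power_on;
    unsigned waited_ms;   /* time elapsed since the pins were parked */
};

static void set_all_pins(struct emmc_port *p, enum pin_mode mode)
{
    for (int i = 0; i < EMMC_PIN_COUNT; i++)
        p->pins[i] = mode;
}

/* Stop driving the eMMC: power off and park the pins as GPIO inputs
 * so the SoC no longer leaks into the eMMC supply. */
void emmc_park(struct emmc_port *p)
{
    p->power_on = false;
    set_all_pins(p, PIN_GPIO_INPUT);
    p->waited_ms = 0;
}

/* Account for elapsed time; real firmware would busy-wait or sleep here. */
void emmc_wait(struct emmc_port *p, unsigned ms)
{
    p->waited_ms += ms;
}

/* Power up only after the rail has had time to decay, then hand the
 * pins back to the MMC controller. Returns false if called too early. */
bool emmc_power_up(struct emmc_port *p)
{
    if (p->waited_ms < LEAKAGE_DECAY_MS)
        return false;
    p->power_on = true;
    set_all_pins(p, PIN_MMC_FUNCTION);
    return true;
}
```

The point of the model is only the order: park the pins, wait, then power on and reconnect. Powering up before the wait has elapsed is refused, which corresponds to the case where the eMMC would miss its power-on reset.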

Marvell's normal recommendation is to leave the eMMC tied to +3.3V (powered), and assert the eMMC reset line to reset it. While we can use this within Linux, there is a chicken and egg problem if we also try to use it for OFW. The eMMC has to be configured before it responds to the RESET# line. If it is one of the 1% of units showing this problem, we would never get OFW to configure it properly.
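The chicken and egg problem can be illustrated with a small sketch. In the eMMC specification the RST_n pin is ignored until the RST_n_FUNCTION byte (EXT_CSD index 162) has been set to 0x1, and writing that byte requires a device that is already communicating. The function and enum names below are hypothetical and are not taken from OFW or Linux; only the EXT_CSD index and bit values come from the eMMC spec.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_CSD_RST_N_FUNCTION 162   /* EXT_CSD byte index for RST_n_FUNCTION */
#define RST_N_EN_MASK          0x03  /* bits [1:0] hold the enable state */
#define RST_N_ENABLED          0x01  /* 0x0 = temporarily disabled (default) */

/* Hypothetical names for the two ways of getting the eMMC back to a known state. */
enum emmc_recovery { RECOVER_RST_N, RECOVER_POWER_CYCLE };

/* Decide how a wedged eMMC can be recovered.
 * ext_csd_readable is false when the device never came up well enough
 * to be read, which is the failing case described in this ticket. */
enum emmc_recovery choose_recovery(bool ext_csd_readable, const uint8_t *ext_csd)
{
    if (!ext_csd_readable)
        return RECOVER_POWER_CYCLE;   /* cannot even check the RST_n setting */

    if ((ext_csd[EXT_CSD_RST_N_FUNCTION] & RST_N_EN_MASK) == RST_N_ENABLED)
        return RECOVER_RST_N;         /* device will honour the reset line */

    return RECOVER_POWER_CYCLE;       /* RST_n is still ignored by the device */
}
```

A unit that fails its power-on reset falls into the first branch, so firmware that relied on RST_n alone would have no way to bring it back.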

Changed 18 months ago by wad

  • priority changed from blocker to normal
  • next_action changed from test in build to diagnose

Changed the priority, as we discussed this and agreed that other Linux issues are higher priority at this time. Also changed the action needed to reflect the Linux side of the problem.
