Ticket #3479 (closed defect: fixed)

Opened 7 years ago

Last modified 6 years ago

i2c timeout hang

Reported by: cjb Owned by: dilinger
Priority: blocker Milestone: Update.1
Component: kernel Version:
Keywords: Cc: marcelo, JordanCrouse, jg, wad
Action Needed: Verified: no
Deployments affected: Blocked By:
Blocking:

Description

[    1.314226] psmouse serio1: resuming
[    1.365035] OLPC-DCON 1-000d: resuming
[    1.568271] i2c-adapter i2c-1: timeout in state address
[    1.781125] i2c-adapter i2c-1: timeout in state address
[    1.993986] i2c-adapter i2c-1: timeout in state address
(repeated forever)

Seen on a C-Test with the #1835 mod, during resume testing.

full log: http://dev.laptop.org/~cjb/libertas-20070914-2

Change History

Changed 7 years ago by jg

  • cc marcelo added
  • owner changed from marcelo to dilinger
  • milestone changed from Untriaged to Trial-3

Changed 7 years ago by dilinger

  • cc JordanCrouse added

Changed 7 years ago by dilinger

Well crap. Hey Jordan, remember this messiness? I bet it's related. We might need a more general-purpose solution: force the SMBus read/write callbacks to poll until the bus is stable, or something similar. Some sort of resume callback reordering/sys relationship stuff (piggy?) is another possibility, I suppose.

commit 7b088e9fbf1d4d310efd4f9806e6e6bce447c1d2
Author: Andres Salomon <dilinger@…>
Date: Thu Jun 14 20:21:55 2007 -0400

DCON: in the resume path, ensure bus stability

We're seeing some smbus weirdness; upon resume, the smbus takes a read or two to get into a sane state. So, let's do a proper check before attempting to switch the dcon source.

Signed-off-by: Andres Salomon <dilinger@…>
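The check described in the commit (keep reading until the SMBus gives a sane answer, and only then switch the DCON source) can be sketched in plain C. This is a minimal illustration under assumptions, not the code from commit 7b088e9: the bus is modeled by a hypothetical `smbus_read_fn` callback, and `DCON_BUS_RETRIES`, `wait_for_bus_stable()`, `dcon_switch_source_safely()` and the mock bus are names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus read callback: returns the byte read (>= 0) or a
 * negative value on error (e.g. a timeout). */
typedef int (*smbus_read_fn)(void *ctx, unsigned char reg);

#define DCON_BUS_RETRIES 5 /* hypothetical upper bound on attempts */

/* Read repeatedly until one read succeeds, giving up after
 * DCON_BUS_RETRIES attempts. Returns the attempt number that
 * succeeded, or -1 if the bus never settled. */
static int wait_for_bus_stable(smbus_read_fn bus_read, void *ctx, unsigned char reg)
{
    assert(bus_read != NULL);
    for (int attempt = 1; attempt <= DCON_BUS_RETRIES; attempt++) {
        if (bus_read(ctx, reg) >= 0)
            return attempt;
    }
    return -1;
}

/* Switch the DCON source only after the bus has answered. */
static bool dcon_switch_source_safely(smbus_read_fn bus_read, void *ctx,
                                      unsigned char reg,
                                      void (*switch_source)(void *ctx))
{
    if (wait_for_bus_stable(bus_read, ctx, reg) < 0)
        return false; /* leave the source alone rather than hang */
    switch_source(ctx);
    return true;
}

/* Mock bus for demonstration: the first `failures_left` reads fail,
 * mimicking an SMBus that needs a read or two after resume. */
struct mock_bus {
    int failures_left;
    int reads;
    bool switched;
};

static int mock_read(void *ctx, unsigned char reg)
{
    struct mock_bus *bus = ctx;
    (void)reg;
    bus->reads++;
    if (bus->failures_left > 0) {
        bus->failures_left--;
        return -1;
    }
    return 0; /* any non-negative value means "read succeeded" */
}

static void mock_switch(void *ctx)
{
    ((struct mock_bus *)ctx)->switched = true;
}
```

With a mock bus whose first two reads fail, `dcon_switch_source_safely()` succeeds on the third read and then switches the source; with a bus that never answers, it gives up after `DCON_BUS_RETRIES` reads instead of retrying forever.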

Changed 7 years ago by wmb@…

If anyone wants to try turning off the firmware SMBus workaround that is in the resume path, here is a recipe that works on q2c26:

ok f0455 3e 90 fill

That line overwrites the RAM copy of the workaround code sequence (0x3e bytes starting at 0xf0455) with NOPs (0x90 bytes).
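In Open Firmware's Forth, `fill` takes an address, a length, and a byte value, and numbers are read in hexadecimal by default, so the recipe stores 0x3e (62) bytes of 0x90 starting at address 0xf0455. The same operation written in C, against a simulated memory array rather than real physical memory (`fake_ram`, `forth_fill()` and `nop_out_smbus_workaround()` are hypothetical names for this illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP_OPCODE 0x90 /* x86 one-byte NOP */

/* C equivalent of the Forth word `fill ( addr len byte -- )`:
 * store `byte` into `len` consecutive bytes starting at `addr`. */
static void forth_fill(unsigned char *addr, size_t len, unsigned char byte)
{
    assert(addr != NULL);
    memset(addr, byte, len);
}

/* Hypothetical stand-in for the first 1 MiB of physical memory; the
 * real command writes to physical address 0xf0455 directly. */
static unsigned char fake_ram[0x100000];

/* Equivalent of `f0455 3e 90 fill` typed at the ok prompt. */
static void nop_out_smbus_workaround(void)
{
    forth_fill(&fake_ram[0xf0455], 0x3e, NOP_OPCODE);
}
```

Only the 62 bytes of the workaround sequence change; the bytes just before 0xf0455 and from 0xf0493 onward are untouched.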

Changed 7 years ago by wad

My first suspicion was that the problem came from the DCON I2C bus being left floating during suspend. Checking with Himax indicated, however, that the I2C controller is in an unpowered domain during suspend:

On Sep 17, 2007, at 9:42 PM, abnern_chen@… wrote:

:The DCON has three power domains (DCON_1.8V, DCON_2.5V and DCON_3.3V). I guess the DCON_3.3V power is powered down when you suspend the laptop. We call this state S3. If DCON_3.3V power is powered down, the DCON I/O, including the I2C I/O, would power down (drop to 0V) in the DCON_3.3V power domain. Maybe you can check the DCON_3.3V or +3.3V power domain.

The only reset line for the DCON runs off the core power supply (1.8V), which isn't powered down during suspend. We've always had to reset the I2C controller by doing a dummy bus read to address 0x1a (ignoring the result and any errors) in OFW upon resume.

As there is no hardware fix possible short of respinning the chip, this will have to be remedied in software. If repeated attempts to reset the I2C via dummy bus transactions fail, software should power cycle the DCON by having the EC deassert DCON_EN for 100 ms.
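The recovery procedure above (a dummy read to 0x1a with the result and errors ignored, repeated a few times, then a DCON_EN power cycle through the EC if the bus still won't respond) could be structured roughly as follows. This is a sketch under assumptions: the platform hooks in `struct dcon_hw_ops`, the retry count, and the use of a read from the DCON's own address (0x0d, per the `OLPC-DCON 1-000d` line in the log) as the "did it work" test are hypothetical choices for this example, and mocks stand in for the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUMMY_ADDR     0x1a /* address OFW uses for its dummy read */
#define DCON_ADDR      0x0d /* DCON address, per "OLPC-DCON 1-000d" in the log */
#define RESET_ATTEMPTS 3    /* hypothetical number of dummy-read rounds */
#define DCON_EN_OFF_MS 100  /* how long DCON_EN stays deasserted */

/* Hypothetical platform hooks; a real driver would wire these to the
 * I2C adapter, the EC, and a sleep primitive. */
struct dcon_hw_ops {
    int  (*i2c_read)(void *ctx, unsigned char addr); /* < 0 on error */
    void (*ec_set_dcon_en)(void *ctx, bool on);
    void (*sleep_ms)(void *ctx, unsigned int ms);
};

/* Returns 0 if the DCON answered after dummy transactions alone,
 * 1 if it only answered after a DCON_EN power cycle, -1 otherwise. */
static int dcon_recover_bus(const struct dcon_hw_ops *ops, void *ctx)
{
    assert(ops != NULL);
    for (int i = 0; i < RESET_ATTEMPTS; i++) {
        (void)ops->i2c_read(ctx, DUMMY_ADDR); /* result and errors ignored */
        if (ops->i2c_read(ctx, DCON_ADDR) >= 0)
            return 0;
    }
    /* Dummy transactions did not help: power-cycle the DCON via the EC. */
    ops->ec_set_dcon_en(ctx, false);
    ops->sleep_ms(ctx, DCON_EN_OFF_MS);
    ops->ec_set_dcon_en(ctx, true);
    return ops->i2c_read(ctx, DCON_ADDR) >= 0 ? 1 : -1;
}

/* Mock hardware: a "stuck" DCON ignores reads until it is power-cycled. */
struct mock_hw {
    bool stuck;
    int dummy_reads;
    int power_cycles;
    unsigned int slept_ms;
};

static int mock_i2c_read(void *ctx, unsigned char addr)
{
    struct mock_hw *hw = ctx;
    if (addr == DUMMY_ADDR) {
        hw->dummy_reads++;
        return -1; /* mock: the dummy read always fails (caller ignores it) */
    }
    return hw->stuck ? -1 : 0;
}

static void mock_set_dcon_en(void *ctx, bool on)
{
    struct mock_hw *hw = ctx;
    if (!on) {
        hw->power_cycles++;
        hw->stuck = false; /* mock: losing power clears the hung state */
    }
}

static void mock_sleep_ms(void *ctx, unsigned int ms)
{
    ((struct mock_hw *)ctx)->slept_ms += ms;
}

static const struct dcon_hw_ops mock_ops = {
    .i2c_read = mock_i2c_read,
    .ec_set_dcon_en = mock_set_dcon_en,
    .sleep_ms = mock_sleep_ms,
};
```

Returning a distinct code for the power-cycle path makes it easy to log how often the heavier recovery is actually needed.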

Changed 7 years ago by JordanCrouse

When B3 first came out, we had numerous problems with various components on the southbridge, all of which can now be explained by #1835. This was one of the workarounds we added to kick the system back into shape. Before moving on, I would like to check what happens with the workarounds reverted.

Changed 7 years ago by cjb

The next step should be rolling back the workaround, as detailed by Mitch above.

Changed 7 years ago by jg

  • cc jg added

Changed 7 years ago by dilinger

  • cc wad added

It sounds as though wad

Changed 7 years ago by dilinger

er..

It sounds as though wad has fixed this through ECOs to the machine. He had to clean up some power rails; there was noise on them during resume.

Changed 7 years ago by dilinger

But we should be following Himax's instructions:

We suggest that when +3.3V is turned on, you first send eight SMB clocks while keeping SMDAT high. The DCON state machine will then reset to its initial state, and SMBus can be controlled normally.
22:23 < Mitch_Bradley> #define SMBPINS 0xc000
                       #define SMBCLR (SMBPINS<<16)
                       outl(SMBCLR,0x1010);outl(SMBCLR,0x1014);outl(SMBCLR,0x1034);
                       outl(SMBPINS,0x1000);outl(SMBPINS,0x1004);

Changed 7 years ago by dilinger

Oh, and:

22:24 < Mitch_Bradley> for(i=0;i<8;i++){usdelay(5);outl(0x4000<<16,0x1000);usdelay(5);outl(0x4000,0x1000);} usdelay(5);

Changed 7 years ago by dilinger

And finally,

22:25 < Mitch_Bradley> but there is a problem with that, which could be what
                       was causing the issue I saw:  I forgot to put the pins
                       back to SMBUS mode, instead leaving them in GPIO mode.
22:31 < Mitch_Bradley> hmm, actually I didn't forget to put the pins back.  the 
                       code in question is right before the code that restores 
                       the GPIO pins to their pre-suspend values, so that 
                       should put them right.
22:33 < Mitch_Bradley> but dilinger will need to put them back.   
                       outl(SMBPINS,0x1034);outl(SMBPINS,0x1010);

Changed 7 years ago by dilinger

For posterity, here's the kernel code:

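#define SMBCLR (0xc000<<16)   /* high half of a write clears bits 14 and 15 */
/* take both SMBus pins away from the SMBus controller (GPIO mode) */
outl(SMBCLR, gpio_base + GPIO_OUTPUT_AUX1);
outl(SMBCLR, gpio_base + GPIO_OUTPUT_AUX2);
outl(SMBCLR, gpio_base + GPIO_INPUT_AUX1);
/* clear output value and output enable for both pins */
outl(SMBCLR, gpio_base + GPIO_OUTPUT_VAL);
outl(SMBCLR, gpio_base + GPIO_OUTPUT_ENABLE);
/* eight times: clear, then set, bit 14 of the output value */
for (x=0; x<8; x++) {
        udelay(5);
        outl(0x4000<<16, gpio_base + GPIO_OUTPUT_VAL);
        udelay(5);
        outl(0x4000, gpio_base + GPIO_OUTPUT_VAL);
        udelay(5);
}
/* clear INPUT_AUX1 and OUTPUT_AUX1 for both pins */
outl(SMBCLR, gpio_base + GPIO_INPUT_AUX1);
outl(SMBCLR, gpio_base + GPIO_OUTPUT_AUX1);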


However, it breaks things badly. Not worth dealing with.
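Laid side by side, the posted kernel snippet and Mitch's IRC recipe differ in four writes, assuming the recipe's 0x1000/0x1004/0x1010/0x1034 correspond to GPIO_OUTPUT_VAL/GPIO_OUTPUT_ENABLE/GPIO_OUTPUT_AUX1/GPIO_INPUT_AUX1 as the order of the writes suggests: the snippet writes SMBCLR to GPIO_OUTPUT_VAL and GPIO_OUTPUT_ENABLE where the recipe writes SMBPINS, and it writes SMBCLR rather than SMBPINS when handing the pins back through GPIO_INPUT_AUX1 and GPIO_OUTPUT_AUX1. The ticket does not say whether that is why it broke things. The sketch below follows the recipe as written, against a mock register file rather than real hardware; the offsets are inferred as above, `mock_outl()`, `mock_udelay()` and `smbus_reset_dcon_state_machine()` are hypothetical names, and the mock models the "low half sets, high half clears" write convention implied by the recipe's SMBPINS/SMBCLR pair.

```c
#include <assert.h>
#include <stdint.h>

/* GPIO register offsets, inferred from the recipe's addresses (base
 * 0x1000) and matched by order to the names in the kernel snippet. */
#define GPIO_OUTPUT_VAL    0x00
#define GPIO_OUTPUT_ENABLE 0x04
#define GPIO_OUTPUT_AUX1   0x10
#define GPIO_OUTPUT_AUX2   0x14
#define GPIO_INPUT_AUX1    0x34

#define SMBPINS 0xc000u          /* the two SMBus pins (bits 14 and 15) */
#define SMBCLR  (SMBPINS << 16)  /* high half of a write clears bits */
#define SMBCLK  0x4000u          /* bit toggled as the clock in the recipe */

/* Mock GPIO block: low 16 bits of a write set bits, high 16 bits clear
 * them. Writes that set and clear the same bit are not used or modeled. */
static uint16_t regs[0x40];
static int clock_rising_edges;

static void mock_outl(uint32_t val, unsigned int off)
{
    uint16_t before = regs[off];
    regs[off] = (uint16_t)((before | (val & 0xffffu)) & ~(val >> 16));
    /* Count clock edges only while the clock pin is enabled as an output. */
    if (off == GPIO_OUTPUT_VAL && (regs[GPIO_OUTPUT_ENABLE] & SMBCLK) &&
        !(before & SMBCLK) && (regs[off] & SMBCLK))
        clock_rising_edges++;
}

static void mock_udelay(unsigned int us) { (void)us; } /* no real timing */

/* The IRC recipe end to end: take the pins away from the SMBus
 * controller, drive both high, clock eight times with data high,
 * then hand the pins back. */
static void smbus_reset_dcon_state_machine(void)
{
    /* Switch both pins to plain GPIO. */
    mock_outl(SMBCLR, GPIO_OUTPUT_AUX1);
    mock_outl(SMBCLR, GPIO_OUTPUT_AUX2);
    mock_outl(SMBCLR, GPIO_INPUT_AUX1);
    /* Drive clock and data high and enable them as outputs. */
    mock_outl(SMBPINS, GPIO_OUTPUT_VAL);
    mock_outl(SMBPINS, GPIO_OUTPUT_ENABLE);
    /* Eight clock pulses; the data bit is never touched, so it stays high. */
    for (int i = 0; i < 8; i++) {
        mock_udelay(5);
        mock_outl(SMBCLK << 16, GPIO_OUTPUT_VAL); /* clock low */
        mock_udelay(5);
        mock_outl(SMBCLK, GPIO_OUTPUT_VAL);       /* clock high */
    }
    mock_udelay(5);
    /* Return the pins to SMBus mode, as in Mitch's 22:33 message. */
    mock_outl(SMBPINS, GPIO_INPUT_AUX1);
    mock_outl(SMBPINS, GPIO_OUTPUT_AUX1);
}
```

Against the mock, this produces eight rising edges on the clock bit while both pins stay high, and leaves the INPUT_AUX1 and OUTPUT_AUX1 bits set again; whether that sequence is safe on real hardware is not something the ticket establishes.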

Changed 7 years ago by jg

  • milestone changed from Untriaged to First Deployment, V1.0

Changed 7 years ago by dilinger

A variation of this recipe has been committed: http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=ab7afec1af5bfa4a3a35300388edd26856f0ebf8 and http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=0da1d1c1291712c80ebbca58d7ec0151bcd31169

We still have the DCON workaround in place, but I don't believe we've seen the I2C glitch with this recipe.

Changed 6 years ago by dilinger

Alright, considering that we're not seeing the DCON glitch on our C2 tests (right?), I'm going to consider this closed and re-add the BUG_ON() so that things explode properly on C2+ machines.

Changed 6 years ago by dilinger

  • status changed from new to closed
  • resolution set to fixed