Opened 7 years ago

Last modified 7 years ago

#4479 reopened defect

DCON corruption during suspend/resume

Reported by: wad Owned by: dilinger
Priority: high Milestone: 8.2.0 (was Update.2)
Component: kernel Version:
Keywords: DCON Cc: dilinger, wmb, dwmw2, rsmith
Blocked By: Blocking:
Deployments affected: Action Needed:
Verified: no

Description

When doing suspend/resume testing, we have encountered at least one laptop in which the DCON stops loading after a while. This results in a system which looks fine while running, but the screen has colored static on it when suspended.

This machine was a C2 running build 617 w. kernel 2.6.22-20071023.3.olpc.0a5a6b07e5...

The log indicated communication problems with the DCON. Relevant portions are attached.

This problem was seen regularly in earlier kernels, but appeared to be fixed in more recent ones.

Attachments (2)

log_dconstatic.txt (34.6 KB) - added by wad 7 years ago.
This is a log from a machine which is no longer latching an image during resume
dcon-reinit.patch (1.4 KB) - added by dwmw2 7 years ago.

Change History (21)

Changed 7 years ago by wad

This is a log from a machine which is no longer latching an image during resume

comment:1 Changed 7 years ago by dilinger

  • Component changed from distro to kernel
  • Owner changed from jg to dilinger

gaaahhh

comment:2 Changed 7 years ago by jg

  • Milestone changed from Never Assigned to XM - killjoy
  • Priority changed from normal to high

comment:3 Changed 7 years ago by wad

There are two problems here. One is a hardware problem in which the DCON does not respond to the I2C reset sequence recommended by Himax, requiring a power cycle. The second is that when the chip is power cycled, it isn't being reset to the same state that OFW places it in on first boot.

I've searched through the logs generated on the testbed (these laptops don't have displays attached), and we are seeing I2C resets on all 40 machines. Around six to eight of these showed the problem described above (it is difficult to determine the exact number as logs are not strongly tied to one machine.)

We will continue to look for hardware problems causing the failure of the I2C reset, but we need to get the power cycling of the DCON to reliably reset it to a working state.

comment:4 Changed 7 years ago by wad

One significant change between the ECOd C1 that dilinger has been using to debug this problem and the C2 units we are seeing this on is that, on C2, Quanta modified the turn-on circuit in a way that also increases the turn-off time. We will confirm the new time required, but in the meantime, can the driver assume that it takes a second or so to power off the DCON?

comment:5 Changed 7 years ago by wad

Please ignore the previous comment. The turn-off time should not be adversely affected.

comment:6 Changed 7 years ago by dilinger

For those playing along at home, the following patches affect the DCON:

http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=d3bead635e30d25bdeea26bf5a1d85161d52e9fe
http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=519ba4494b25d5e08111ccea215fa1054b9ea0f6
http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=7896b2dff75048f0163eba9bd5d9230086561759

Note that the logs show smbus timeout errors during suspend/resume, as well as the error case where the smbus goes screwy and we're forced to reinit the dcon. I'd *really* like to know the cause(s) for that. Dave also mentioned wanting to see traces.

comment:7 Changed 7 years ago by wad

Yeah, yeah. I'd like traces too, but we're not going to get them while I'm stuck in Changshu with a shallow-trace two channel scope swiped from a PCB repair station.

Himax has been sent two boards showing this problem, and will also be looking into why we get the timeouts.

comment:8 Changed 7 years ago by dwmw2

It sounds like SDRAM is uninitialised. Can you poke the SDRAM control registers manually using the i2cset tool, and make it initialise properly?

Something along these lines... (Himax or the peanut gallery will confirm):

   i2cset -y 0 13 0x3a 0xc040 w
   i2cset -y 0 13 0x41 0 w
   i2cset -y 0 13 0x41 0x101 w

comment:9 Changed 7 years ago by wad

  • Cc wmb dwmw2 added

We just confirmed that the kernel changes made to mfgtest 1031.2 do not fix the problem.

Looking at the code (olpc_dcon.c), I can't see where the DCON is reinitialized after power cycling. I suggest that the following be added, right before the final return statement in dcon_bus_stabilize() (line 164):

   if (is_powered_down) {
           dcon_hw_init(?, 0);
   }

I'm not familiar enough with the code to figure out where to obtain the struct i2c_client * to pass to dcon_hw_init().

comment:10 Changed 7 years ago by dwmw2

In the dcon_resume() code path you're right -- we aren't reinitialising the hardware. But that's because it's not powered down in that case -- that's when we put the CPU into suspend, but the screen remains on. Hence the 'is_powered_down' argument being zero.

The _other_ case is dcon_sleep(), and in that case we _do_ seem to get it right -- after calling dcon_bus_stabilize(1) we do call dcon_hw_init().

Did you try reinitialising by hand with i2cset tools from userspace? Just adding a printk in dcon_hw_init() should suffice to confirm that we're following that path when we should be.

comment:11 Changed 7 years ago by dwmw2

  • Cc rsmith added

comment:12 Changed 7 years ago by wad

But you SHOULD be reinitializing the DCON in the resume path, if we fail to communicate with it and power cycle it.

We powered the chip down at line 131 because it wouldn't talk to us, set is_powered_down to one in line 133, then jumped back to the beginning of the dcon_bus_stabilize() function, where we now power the chip back up but forget to re-initialize it. Since nothing outside dcon_bus_stabilize() knows that we've power cycled the chip, nothing else will re-initialize its registers.

No, I didn't reinitialize all the registers by hand to test this. It was too glaring a code error, and Richard couldn't get i2cset or sdtools up and running on the affected laptop.
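
To make the control flow described above concrete, here is a minimal, self-contained C sketch of it. This is not the code in olpc_dcon.c and not necessarily what any patch on this ticket does: struct fake_dcon, the fake_* helpers, the retry limit of 3, the reinit_after_cycle flag and resume_leaves_dcon_initialized() are all hypothetical names invented for illustration. Only dcon_bus_stabilize(), dcon_hw_init() and is_powered_down are names taken from this ticket, and their bodies here are simplified stand-ins.

```c
#include <stdbool.h>

/* Hypothetical model of the DCON; not the real driver state. */
struct fake_dcon {
    bool powered;
    bool registers_initialized; /* lost whenever power is removed */
    int  failures_left;         /* bus probes that will still time out */
    int  power_cycles;          /* number of times power was cut */
};

/* Hypothetical helper: cutting power also discards register contents. */
static void fake_power_off(struct fake_dcon *d)
{
    d->powered = false;
    d->registers_initialized = false;
    d->power_cycles++;
}

/* Hypothetical helper: restore power (registers stay uninitialized). */
static void fake_power_on(struct fake_dcon *d)
{
    d->powered = true;
}

/* Hypothetical probe: true when the chip answers on the bus. */
static bool fake_bus_responds(struct fake_dcon *d)
{
    if (d->failures_left > 0) {
        d->failures_left--;
        return false;
    }
    return d->powered;
}

/* Simplified stand-in for the function named in the ticket. */
static void dcon_hw_init(struct fake_dcon *d)
{
    d->registers_initialized = true;
}

/*
 * Simplified stand-in for dcon_bus_stabilize(). If the chip does not
 * answer, power it off and retry with is_powered_down set. The
 * reinit_after_cycle flag (an assumption of this sketch) switches the
 * proposed re-initialization on or off so both behaviours can be compared.
 */
static int dcon_bus_stabilize(struct fake_dcon *d, bool is_powered_down,
                              bool reinit_after_cycle)
{
    int attempt;

    for (attempt = 0; attempt < 3; attempt++) {
        if (is_powered_down)
            fake_power_on(d);

        if (fake_bus_responds(d)) {
            /* The step reported missing: re-init after a power cycle. */
            if (is_powered_down && reinit_after_cycle)
                dcon_hw_init(d);
            return 0;
        }

        fake_power_off(d);
        is_powered_down = true;
    }
    return -1;
}

/* Model a resume in which the first bus probe times out once. */
static bool resume_leaves_dcon_initialized(bool reinit_after_cycle)
{
    struct fake_dcon d = {
        .powered = true,
        .registers_initialized = true,
        .failures_left = 1,
        .power_cycles = 0,
    };

    if (dcon_bus_stabilize(&d, false, reinit_after_cycle) != 0)
        return false;
    return d.registers_initialized;
}
```

In this model, resume_leaves_dcon_initialized(false) returns false: the bus recovers and the function reports success, but the registers were lost in the power cycle and the caller has no way of knowing. With the flag set to true, the registers are re-initialized before returning.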

Changed 7 years ago by dwmw2

comment:13 Changed 7 years ago by dwmw2

Yeah, you're right. Try this patch.

comment:15 Changed 7 years ago by dilinger

thanks wad, nice catch!

comment:16 Changed 7 years ago by dilinger

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in build 625.

comment:17 Changed 7 years ago by dilinger

  • Resolution fixed deleted
  • Status changed from closed to reopened

Er, didn't mean to do that.

comment:18 Changed 7 years ago by jg

Is this in current joyride now?

comment:19 Changed 7 years ago by wad

I downloaded Joyride 327 (w. q204c firmware) and installed them on my fully modified C1. When I looked back after a few minutes, it had snow crashed. Tapping a key restored the user registration window, but this seems to indicate that it did not get into joyride...
