Opened 6 years ago

Closed 4 years ago

#7958 closed defect (duplicate)

DCON showed old screen image during suspend, with extra black "dusty" spots

Reported by: gnu
Owned by: dilinger
Priority: blocker
Milestone: Future Release
Component: kernel
Version: not specified
Keywords:
Cc: dilinger, rsmith
Blocked By:
Blocking:
Deployments affected:
Action Needed: never set
Verified: no

Description

This is a rare condition that I think I may have seen or heard of once before.

XO G1G1 MP "Xoroaster", S/N CSN7500230F, Joyride 2263, firmware is a custom special test version by rsmith: "Q20107", a post-Q2E12 but pre-Q2E13 version.

I had just run a power test for rsmith in a terminal activity. The machine was configured for Mesh channel 1 (Simple Mesh) during the test. The machine had suspended at the end of that test, showing that the battery was fully charged. (The screen shows a log of battery status every 10 seconds or so, from the olpc-pwr-log command.) I woke up the suspended system with a keypress, tried to scp and failed, switched the network configuration to use a local access point, and scp'd the files to another machine so I could send them to Richard. I sent the email at about 22:40 (Pacific time) and then got into an extended irc conversation.

When I glanced back at the XO, at 23:07, it was suspended, and the screen was "spotty", with a lot of black dust mixed into the image. But the most interesting part is that the screen image was the image at the end of the battery test -- not including the subsequent commands!

I realized this was probably a DCON issue, and took two photos of the screen. From the IRC log, the image persisted until 23:20, when I had set up a camera to take a video of what happened when I resumed the system with a keypress. Those are attached. My prediction was that the screen would jump to show the correct contents immediately upon resume. Indeed, it did.

My theory is that the system suspended normally during the end of the power test. But its next suspend, 65 seconds after I finished scp-ing the files off the XO, was abnormal. The DCON missed the DCONLOAD signal that should've copied the current screen contents into the DCON's little 1MB DRAM buffer. When the suspend code switched the screen so the DCON would refresh it, it started refreshing from the *prior* contents of that buffer -- with some bit-rot speckles because the DRAM buffer doesn't get refreshed when it isn't in use. That's the theory.

Some time after the above, I captured "dmesg" output and have attached that as well. It seems to have the last four suspends. There are some odd kernel messages, but they're about the CAFE chip, not about the DCON.

In the GMT timezone of the laptop, the last power file was written at 2008-08-14 05:26, and the subsequent dmesg command was at 06:33.
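
To line the laptop's GMT timestamps up with the Pacific-time narrative above, the two values can be converted as in the following sketch. It assumes Pacific Daylight Time (UTC-7) was in effect in mid-August 2008; the helper name `to_pacific` is illustrative only and is not part of any OLPC tool.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time (UTC-7) applies in mid-August 2008.
PDT = timezone(timedelta(hours=-7), "PDT")


def to_pacific(gmt_text):
    """Convert a 'YYYY-MM-DD HH:MM' GMT timestamp to a PDT string."""
    gmt = datetime.strptime(gmt_text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return gmt.astimezone(PDT).strftime("%Y-%m-%d %H:%M %Z")


if __name__ == "__main__":
    # Timestamps quoted in the ticket description.
    for label, stamp in [
        ("last power log", "2008-08-14 05:26"),
        ("dmesg capture", "2008-08-14 06:33"),
    ]:
        print(label, "->", to_pacific(stamp))
```

Under that assumption the last power log lands at 22:26 and the dmesg capture at 23:33 Pacific time on 2008-08-13, which brackets the 22:40 email and the 23:07 to 23:20 observations described above.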

Richard remembers some i2c problems with the CPU talking to the DCON that were never fully diagnosed; perhaps that's the root cause. He says the EC is not involved unless the DCON needs to be reset. (I didn't see any indication of a DCON reset in the dmesg log, but I don't know what to look for.)

(For contrast, see #2358 for a very early DCONLOAD problem while suspend was originally being debugged.)

Attachments (4)

dmess-dconload-prob (121.5 KB) - added by gnu 6 years ago.
dmesg from the ailing system (after resuming everything with a keypress)
cimg7823.jpg (4.0 MB) - added by gnu 6 years ago.
Screen shot while suspended (note power LED is off); also note black spots scattered randomly in the white area of the screen (examine photo at full resolution)
cimg7824.jpg (3.9 MB) - added by gnu 6 years ago.
A second screen image, showing a bit more detail.
dconload-movie.avi (19.0 MB) - added by gnu 6 years ago.
Blurry but useful video (21MB) of the OLPC screen during suspend and then switching dramatically when a press on the spacebar woke it from suspend. I had to truncate the last megabyte or so of it, because of the 20MB file size limit in TRAC. Nothing useful was lost.

Change History (10)

comment:1 Changed 6 years ago by mikus

This has nothing to do with 'suspend' or ticket #7958, but on random occasions I have seen at shutdown an old screen image (usually "left over" from boot), with extra black "dusty" spots, that is OTHER than the "don't do this" image.

comment:2 Changed 6 years ago by pgf

possibly related: running joyride-2294, just once when auto-suspend took effect, the screen went to "colored snow", i.e. uniform multi-colored speckles. hitting the touchpad woke up the machine, refreshed the screen properly, and the symptom hasn't been reproduced since.

comment:3 Changed 6 years ago by gnu

Additional comments from Mikus (by email):

What hardware do you see this in? (G1G1? Serial number?)

G1G1. CSN74804910

What firmware is it running and what OS release?

Impossible to tell. I install new firmware whenever I hear about it, and rarely let my OS (Joyride) get more than three days behind. So the firmware and OS would normally have been "current" whenever I saw anything.

Please take a guess at how many random occasions you have seen it happen on?

My guess would be six or so. The first was probably in January, on old releases (was using Update.1 builds whenever they became available -- did not switch to Joyride until later). Can't positively swear to any of this.

comment:4 Changed 6 years ago by cjb

  • Milestone changed from 8.2.0 (was Update.2) to Future Release
  • Priority changed from normal to blocker

Pushing to blocker for future release. I'd like this fixed by the time we turn on idle suspend. I'll add it to the tracker bug for that.

comment:5 Changed 6 years ago by wmb@…

I have been experimenting (using OFW) with the timing of deasserting DCONLOAD relative to the CPU's vertical scan line counter (register DC+0x6c). If you deassert between lines 0 and 35 inclusive, you get screen artifacts near the bottom of the screen while the DCON is in frozen mode. The artifacts are worse when the deassertion happens at scan line 0, dwindling to rare artifacts on the last scan line of the screen when deasserting at scan line 35.

The following deassertion procedure works reliably for me, without artifacts:

Step 1) Wait until vertical sync is reported in that register (bit 29).
Step 2) Wait until scan line 38 is reported in the low 11 bits of that register.
Step 3) Deassert DCONLOAD

In Step 2, you can wait for any line between 36 and 911 inclusive; it makes no difference.

It is tempting to consider a shorter-wait procedure, which doesn't work:

Bad) Wait until the scan line number is greater than 35.

That procedure does eliminate artifacts, but it can lose (not display during frozen mode) the last thing that was written to the frame buffer prior to the deassertion of DCONLOAD if that last write happened in the same frame time as the deassertion.
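
The three-step procedure above can be sketched as a polling loop. This is an illustrative Python model, not the OFW code used in the experiments: the bit positions (bit 29 for vertical sync, low 11 bits for the scan line) and the line ranges come from this comment, while `read_reg`, `deassert`, `max_polls`, and the function names are hypothetical stand-ins for whatever register access and GPIO control the real firmware or kernel provides.

```python
VSYNC_BIT = 1 << 29     # vertical sync flag in the line-counter register (DC+0x6c)
LINE_MASK = 0x7FF       # low 11 bits report the current scan line
SAFE_FIRST_LINE = 36    # lines 0..35 reportedly produce artifacts
SAFE_LAST_LINE = 911
TARGET_LINE = 38        # the line used in the procedure above


def in_vsync(reg):
    """True when the register value reports vertical sync."""
    return bool(reg & VSYNC_BIT)


def scan_line(reg):
    """Extract the scan line number from the register value."""
    return reg & LINE_MASK


def deassert_dconload_safely(read_reg, deassert,
                             target_line=TARGET_LINE, max_polls=1_000_000):
    """Hypothetical helper: wait for vsync, then the target line, then deassert.

    read_reg -- callable returning the current register value (assumed)
    deassert -- callable that deasserts DCONLOAD (assumed)
    """
    if not SAFE_FIRST_LINE <= target_line <= SAFE_LAST_LINE:
        raise ValueError("target_line must be between 36 and 911 inclusive")

    polls = 0

    def poll():
        # Bound the busy-wait so a stuck controller cannot hang the caller.
        nonlocal polls
        polls += 1
        if polls > max_polls:
            raise TimeoutError("display controller never reached the expected state")
        return read_reg()

    # Step 1: wait until vertical sync is reported.
    while not in_vsync(poll()):
        pass
    # Step 2: wait until the target scan line is reported.
    while scan_line(poll()) != target_line:
        pass
    # Step 3: deassert DCONLOAD.
    deassert()
```

Waiting for vertical sync first, rather than only checking that the line number exceeds 35, is what distinguishes this from the "Bad" shortcut: a line-38 reading that arrives before vertical sync is ignored, so the deassertion always happens in a fresh frame.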

comment:6 Changed 4 years ago by dsd

  • Resolution set to duplicate
  • Status changed from new to closed

we've taken steps to partially resolve this in #9664
