Ticket #12453 (closed defect: fixed)

Opened 20 months ago

Last modified 19 months ago

[CL4] System randomly hangs up at big 01.

Reported by: tomyin
Owned by: cjb
Priority: blocker
Milestone: 13.1.0
Component: not assigned
Version: not specified
Keywords:
Cc: pgf, wad, wmb@…, dsd
Action Needed: no action
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

OS: 31022o4
OFW: Q7B10
EC: 0.3.07

Procedure: System randomly hangs up at big 01.
1. Update to q7b11, then reboot.
2. At the ok prompt, enter “update-nn-flash”, then reboot.
3. Move the cursor to Paint using the touchpad.
4. Open Paint via the touchscreen. ==> The system hangs and then displays 01.

No logs could be captured.

Attachments

big01.jpg (0.6 MB) - added by tomyin 20 months ago.
big01

Change History

Changed 20 months ago by tomyin

big01

Changed 20 months ago by tomyin

1.1 Go to Sugar.
1.2 Operate the machine normally.
1.3 The system randomly hangs at big 01.

2.1 Go to Sugar and leave the machine idle until it enters suspend mode.
2.2 Wake the machine via the touchpad after the panel turns off. ==> The system hangs and then displays 01.

Changed 20 months ago by dsd

  • cc pgf, wad, wmb@… added
  • owner set to cjb
  • next_action changed from never set to diagnose
  • milestone changed from Not Triaged to 13.1.0

I've seen this a handful of times as well, and have seen it mentioned on IRC. Can't see an existing ticket for it though. Some more reports/logs are in #12458.

Changed 20 months ago by dsd

Walter in #12471 has managed to reproduce it quite easily in Browse, even with power management disabled.

Changed 20 months ago by dsd

#12433 suggests that various mmc errors are printed at the time of the crash. #12486 suggests that opening the Sugar frame by moving the mouse to a hot corner is a likely way to trigger the issue.

Changed 20 months ago by wmb@…

The reproduction recipe in #12486 stopped failing after gonzalo opened up the machine and connected a serial port. He said that it is the first time that he has removed the battery in a long time, but that he has updated the EC code since the last battery removal.

Changed 20 months ago by walter

Got this twice tonight (once while in Measure, and once while trying to access the Journal from the Home View). Nothing of interest in the logs.

Changed 20 months ago by dsd

  • cc dsd added
  • priority changed from normal to blocker

This will block XO-4 production if it affects runin. Even if it doesn't, it should still be treated with importance.

Changed 20 months ago by wmb@…

The problem has been tracked down to corruption of CForth's interrupt stack, specifically the saved PC value. Moving the interrupt stack from SRAM to TCM works around the problem. The root cause of the corruption is as yet undetermined. It could be hardware, or a bug in the CForth interrupt handling code, or a bad setting for the suspend/resume parameters relating to SRAM, or a driver bug that causes writing to some SRAM locations that are not owned by the driver.

Changed 20 months ago by wmb@…

The workaround is encoded in CForth git commit aeea08d.

Changed 20 months ago by dsd

...and released in XO-4 firmware Q7B12.

Changed 19 months ago by dsd

  • next_action changed from diagnose to test in build

This workaround can be tested in 13.1.0 build 27.

Changed 19 months ago by greenfeld

  • status changed from new to closed
  • next_action changed from test in build to no action
  • resolution set to fixed

I have not seen any 01 failure reboots over the past few days with Q7B12 (os25/26) & Q7B14 (os27).
