Opened 6 years ago

Closed 6 years ago

#8234 closed defect (fixed)

Software update (in Control Panel) crashes X-server.

Reported by: sj Owned by: cscott
Priority: blocker Milestone: 8.2.0 (was Update.2)
Component: upgrade utility Version: Development build as of this date
Keywords: blocks:8.2.0 r+ 8.2-764:? Cc: marco
Blocked By: Blocking:
Deployments affected: Action Needed: test in release
Verified: no

Description

Build 757: the software update part of the control panel still crashes Sugar on occasion after adding a group. From the one instance I noticed today, I'm not sure what sets it off.

Attachments (5)

tmpw9cmtb.log (76.1 KB) - added by thomaswamm 6 years ago.
/var/log/messages after Software update crash
logs.CSN74500056.2008-09-13.01-31-59.tar.bz2 (145.5 KB) - added by mstone 6 years ago.
contains a suspicious sugar traceback
dbus-sugar.patch (729 bytes) - added by marco 6 years ago.
dbus-toolkit.patch (1.5 KB) - added by marco 6 years ago.
dbus-minimal.patch (324 bytes) - added by marco 6 years ago.


Change History (33)

comment:1 Changed 6 years ago by cjb

  • Milestone changed from Not Triaged to 8.2.0 (was Update.2)
  • Priority changed from normal to blocker

Scott, ping.

comment:2 Changed 6 years ago by cscott

I saw X crash today after clicking "install updates". dsd has seen X crashes, too. No one has succeeded so far in obtaining a reproducible result or logs. =(

Changed 6 years ago by thomaswamm

/var/log/messages after Software update crash

comment:3 Changed 6 years ago by thomaswamm

  • Action Needed changed from never set to reproduce
  • Summary changed from Update software crashes X after adding a new group to Software update (in Control Panel) crashes X-server.
  • Version changed from not specified to Development build as of this date

In the attached /var/log/messages, the following lines have a timestamp corresponding approximately to when I saw Control Panel - Software update crash after I clicked on 'Install selected':

782	Sep  6 03:07:06 localhost init: prefdm main process ended, respawning
783	Sep  6 03:07:06 localhost init: rainbow main process (1155) killed by TERM signal
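A saved copy of /var/log/messages can be searched for lines like these with a few lines of Python. This is only a minimal sketch: the helper name `find_x_restarts` and the choice of marker strings are assumptions taken from the two lines quoted above, not part of any OLPC tool, and the "killed by TERM signal" marker may also match unrelated services.

```python
# Hypothetical helper: scan syslog lines for signs that init restarted the
# display manager (prefdm), which is how the X crash shows up in the log.
MARKERS = (
    "prefdm main process ended, respawning",
    "killed by TERM signal",
)


def find_x_restarts(lines):
    """Return (line_number, text) pairs for lines containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


# The two lines quoted in this comment, used as sample input.
sample = [
    "Sep  6 03:07:06 localhost init: prefdm main process ended, respawning\n",
    "Sep  6 03:07:06 localhost init: rainbow main process (1155) killed by TERM signal\n",
]

for number, text in find_x_restarts(sample):
    print(number, text)
```

To run it against a real log, pass an open file object instead of `sample`, since iterating a file yields its lines.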

I have seen Software update crash 3 times, each time immediately when I clicked on 'Install selected'. Each crash was shortly after a cold start or restart of the XO. All 3 times it worked on 2nd try, but that was a warm start, because only the X-server (with Sugar) crashed and restarted automagically.

One crash happened in 8.2-757 (after olpc-update).

Two crashes happened in 8.2-759 (after olpc-update), by which point I was watching for it. The most recent crash was after I called up Software update manually in Home view.

So to reproduce this bug, I suggest you cold-start your XO, then try Software update. (Just guessing.)

comment:4 Changed 6 years ago by cscott

See also #8345, which was a dup, and suggested trying to reproduce using the sugar-on-debian packages, since they seem to crash more reliably.

Changed 6 years ago by mstone

contains a suspicious sugar traceback

comment:5 Changed 6 years ago by mstone

See sugar/1221260285/shell.log in the attached log file. (Also, marvel at bzip2's compression ratio on the output of my new olpc-log #8457!)

comment:6 Changed 6 years ago by mstone

  • Keywords marco added

comment:7 Changed 6 years ago by thomaswamm

From 8.2-759, I did $ sudo olpc-update 8.2-760.

It went very smoothly (about 30 minutes), including manual reboot, auto firmware update to Q2E17, then the Software update of 4 activities. The only unsettling surprise was that after the 4 updates installed, the Software updates dialog box vanished suddenly without any feedback or "Finished" message. I re-accessed Software updates in the Control Panel, did Refresh, and saw no more updates offered. Home list view showed new version numbers for the 4 activities, so I presume it worked.

comment:8 Changed 6 years ago by marco

afaict the trace should be unrelated to the X/Sugar crash.

(it's great to have all the old logs, thanks!)

comment:9 Changed 6 years ago by gnu

Just happened to me, twice, on my first install of 8.2-760 on two different laptops. Nothing obvious in the logs. Only one odd thing there:

sugar-shell:1302: DEBUG: sms_error_handler (0x959dc40, FALSE, 3, 9, 32771, 0)

Don't know if it's related, or just a random message someone forgot to remove.

"last" says:

root tty1 Thu Sep 18 01:10 still logged in
olpc console Thu Sep 18 01:05 still logged in
olpc console Thu Sep 18 00:46 - 01:05 (00:18)
reboot system boot 2.6.25-... Thu Sep 18 00:46 (00:37)

i.e. init noticed the crash (3rd line, logout of "olpc").

comment:10 Changed 6 years ago by bemasc

  • Action Needed changed from reproduce to diagnose

Still crashes in 761.

comment:11 Changed 6 years ago by cjb

I attached gdb to Xorg before the crash, and got:

Program received signal SIGTERM, Terminated.
0x0812e140 in ?? ()
(gdb) bt
#0  0x0812e140 in ?? ()
#1  <signal handler called>
#2  0xb7f90424 in __kernel_vsyscall ()
#3  0xb7c250bc in writev () from /lib/libc.so.6
#4  0x0813275e in ?? ()
#5  0x081317af in _XSERVTransWritev ()
#6  0x0812c722 in FlushClient ()
#7  0x0812d09f in FlushAllOutput ()
#8  0x08085c15 in Dispatch ()
#9  0x0806b60d in main ()

comment:12 Changed 6 years ago by marco

Installing debuginfo might give a better trace. The updater is not doing UI calls from the other threads, is it?

comment:13 Changed 6 years ago by mstone

  • Keywords blocks:8.2.0 added; marco removed

From Greg: We are _considering_ holding 8.2.0 for this bug. Please find root cause and suggest some patches. We will consider them carefully.

comment:14 Changed 6 years ago by cjb

  • Cc marco added

Here's a trace with debuginfo:

(gdb) bt
#0  SmartScheduleTimer (sig=14) at utils.c:1551
#1  <signal handler called>
#2  0xb7f58424 in __kernel_vsyscall ()
#3  0xb7bed0bc in writev () from /lib/libc.so.6
#4  0x0813275e in _XSERVTransSocketWritev (ciptr=0x96fd720, buf=0xbfd76994, 
    size=1) at /usr/include/X11/Xtrans/Xtranssock.c:2297
#5  0x081317af in _XSERVTransWritev (ciptr=0x96fd720, buf=0xbfd76994, size=1)
    at /usr/include/X11/Xtrans/Xtrans.c:922
#6  0x0812c722 in FlushClient (who=0x9705720, oc=0x9705d68, extraBuf=0x0, 
    extraCount=0) at io.c:930
#7  0x0812d09f in FlushAllOutput () at io.c:682
#8  0x08085c15 in Dispatch () at dispatch.c:473
#9  0x0806b60d in main (argc=7, argv=0xbfd76b44, envp=0x8089e71) at main.c:441
(gdb)

comment:15 Changed 6 years ago by marco

It looks like dbus-glib threads are never initialized, neither in sugar nor in the updater.

http://dbus.freedesktop.org/doc/dbus-python/api/dbus.mainloop.glib-module.html
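For reference, the initialization described on that page can be sketched as follows. This is an illustration of the dbus-python calls only, not a reproduction of the attached patches; the wrapper name `init_dbus_threads` is hypothetical, and the import guard is there only so the snippet also runs on machines without the GLib bindings.

```python
def init_dbus_threads():
    """Enable thread support in dbus-glib and install the GLib main loop.

    Returns True if the initialization ran, False if dbus-python's GLib
    bindings are not installed on this machine.
    """
    try:
        import dbus.mainloop.glib
    except ImportError:
        return False
    # Call this before D-Bus connections are used from more than one thread.
    dbus.mainloop.glib.threads_init()
    # Make the GLib main loop the default for connections created afterwards.
    dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
    return True


if __name__ == "__main__":
    print("dbus-glib threads initialized:", init_dbus_threads())
```

The point of doing this early, before any bus connection is created, is that later connections pick up the default main loop and the thread-safe setup without further changes.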

Changed 6 years ago by marco

Changed 6 years ago by marco

comment:16 Changed 6 years ago by marco

The two patches above clean up dbus initialization in the sugar shell and activities. I don't know if it actually helps with this issue, but I think it's worth a try.

I can test them out tomorrow, if someone posts instructions on how to reproduce the crash. Or feel free to do it directly :)

comment:17 Changed 6 years ago by tomeu

The patches look good to me.

comment:18 Changed 6 years ago by cscott

I will test this today. Good catch!

comment:19 Changed 6 years ago by marco

afaict it fixes it... I can't reproduce the crash reliably though, so I can't say for sure.

I'm going to push a smaller patch (without the cleanups) for testing in joyride.

Changed 6 years ago by marco

comment:20 Changed 6 years ago by marco

dbus-minimal.patch is in sugar-0.82.8-2.olpc3

comment:21 Changed 6 years ago by tomeu

  • Action Needed changed from diagnose to review
  • Keywords r+ added

comment:22 Changed 6 years ago by tomeu

  • Action Needed changed from review to package

pushed to both branches

comment:23 Changed 6 years ago by marco

  • Action Needed changed from package to test in build
  • Keywords 8.2-764:? added

comment:25 Changed 6 years ago by cscott

  • Action Needed changed from test in build to approve for release

Haven't seen a crash since joyride-2483 (when the first version went in). sugar-0.82.9-1 is in joyride-2485.

Please consider for next stable build.

comment:26 Changed 6 years ago by mstone

  • Action Needed changed from approve for release to add to release

Approved.

|TestCase|

Repeatedly run the software updater. X should never restart.
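One possible way to automate this test case is to compare the X server's process ID before and after each run. The sketch below is hypothetical throughout: `count_x_restarts`, `run_updater`, and `get_x_pid` are illustrative names, and the caller is assumed to supply the two callables, since this ticket does not describe a scripted way to drive the updater.

```python
def count_x_restarts(run_updater, get_x_pid, trials=10):
    """Run the updater `trials` times; return the trial numbers after which
    the X server's PID had changed (that is, X was restarted)."""
    restarted = []
    pid = get_x_pid()
    for trial in range(1, trials + 1):
        run_updater()
        new_pid = get_x_pid()
        if new_pid != pid:
            restarted.append(trial)
            pid = new_pid  # Track the new server for the remaining trials.
    return restarted


# Demonstration with stand-in callables; no real updater or X server is used.
fake_pids = iter([100, 100, 100, 250, 250])
print(count_x_restarts(lambda: None, lambda: next(fake_pids), trials=4))
```

An empty result means X never restarted during the trials, which is the pass condition stated above.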

comment:27 Changed 6 years ago by cscott

  • Action Needed changed from add to release to test in release

Packages added to stable repository in commit 59fd5419 for build 764. Please confirm the package versions are correct and test in stable build 764 or later.

comment:28 Changed 6 years ago by mchua

  • Resolution set to fixed
  • Status changed from new to closed

On 765 (gg-765-2), I ran the software updater 10 times in succession (running and then closing the update panel, and reopening it from sugar-control-panel each time - not just hitting 'refresh'). No X restarts. I also ran the updater on 33 machines running 765 right now, and none of them exhibited this problem.

I'm marking this as closed because we've got a lot more bugs to test tonight, but if you want this hammered more and longer, tell me a number of trials/XOs you'd like hit.
