#12543 closed defect (fixed)

stopping X from inside X breaks the display

Reported by: dsd
Owned by: dsd
Priority: blocker
Milestone: 13.2.0
Component: x window system
Version: not specified
Keywords:
Cc:
Blocked By:
Blocking:
Deployments affected:
Action Needed: add to build
Verified: no

Description

As of 13.1.0 build 29 for XO-4, running:

sudo systemctl stop olpc-dm.service

from inside the Sugar terminal causes the display to break (white screen, no clear way to recover).

Change History (11)

comment:1 Changed 20 months ago by dsd

  • Priority changed from normal to blocker

This can be seen as well when shutting down Sugar, as it too triggers an X shutdown from inside X. The factory has identified this as a blocker - Jon, please prioritise.

It is a regression over 13.1.0 build 28, i.e. it is likely related to the graphics driver changes that were implemented to bring power usage back to normal.

comment:2 Changed 20 months ago by pgf

this is probably exacerbated by some shutdowns taking a Really Long Time.

here is a log from the first reboot after an install of os32. i fs-update'd, untarred a set of kernel modules, installed my new kernel, and typed "reboot". i got the white screen, and it stayed there for over 90 seconds until the shutdown proceeded:

bash-4.2# reboot
[  247.982602] pxa168fb_open GFX layer, fbi 0 opened 3 times ----
[  248.090391] pxa168fb: set_screen for fbi 0
[  248.158951] gcmkONERROR: status=-17(gcvSTATUS_INVALID_DATA) @ gckKERNEL_DeleteRecord(475)
[  248.194746] gcmkONERROR: status=-17(gcvSTATUS_INVALID_DATA) @ gckKERNEL_RemoveProcessDB(994)
[  248.220908] gcmkONERROR: status=-17(gcvSTATUS_INVALID_DATA) @ gckKERNEL_Dispatch(1151)
[  248.253122] Process 538 released: 
[  248.253188]  -- VidMem:         used bytes   12712 KB, max bytes   26546 KB, total bytes   88675 KB
[  248.291547]  -- NonPaged Mem:   used bytes       0 KB, max bytes       0 KB, total bytes       0 KB
[  248.319332]  -- Contiguous Mem: used bytes   51712 KB, max bytes   51712 KB, total bytes   51712 KB
[  248.345773]  -- MapUserMemory:  used bytes       0 KB, max bytes     384 KB, total bytes   49982 KB
[  248.374046]  -- MapMemory:      used bytes       0 KB, max bytes  153600 KB, total bytes  153600 KB
[  248.401702] [538]Marvell Technology Group Ltd(GC Ver0.8.4609p8)
[  248.401702] idle register: [3D][0x00][idle], [2D][0x00][idle]
[  248.401702] clock register: [0x00]
[  248.401702] clock rate: [0] MHz
[  248.401702] Total reserved video mem:    51200 KB
[  248.401702]   - used video mem:              0 KB
[  248.401702]   - contiguousPaged:             0 KB
[  248.401702]   - virtualPaged:                0 KB
[  248.401702]   - contiguousNonPaged:          0 KB
[  248.401702] Video memory usage in details:
[  248.401702]   - Index:                       0 KB
[  248.401702]   - Vertex:                      0 KB
[  248.401702]   - Texture:                     0 KB
[  248.401702]   - RenderTarget:                0 KB
[  248.401702]   - Depth:                       0 KB
[  248.401702]   - Bitmap:                      0 KB
[  248.401702]   - TileStatus:                  0 KB
[  248.401702]   - Image:                       0 KB
[  248.401702]   - Mask:                        0 KB
[  248.704357] pxa168fb_release GFX layer, fbi 0 opened 4 times ----
[  248.733332] pxa168fb_release GFX layer, fbi 0 opened 3 times ----
[  341.434699] systemd-journald[263]: Received SIGTERM
[  346.571507] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  346.743113] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  346.902487] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  347.072407] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  347.239805] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  347.399961] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  347.737299] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.769830] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.769830] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.879866] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  347.907328] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.907328] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.921340] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  347.926805] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[  348.177936] Restarting system.
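
The "over 90 seconds" figure can be checked from the kernel timestamps in the log above. The following is only a small sketch: the helper name largest_gap is made up for illustration, and the sample text is a handful of lines copied from the log. It reports the longest silence between consecutive messages, which here falls between the last pxa168fb_release line and the journald SIGTERM line (about 92.7 seconds).

```python
import re

# A few lines copied from the log above; only the timestamps matter here.
LOG = """\
[  247.982602] pxa168fb_open GFX layer, fbi 0 opened 3 times ----
[  248.704357] pxa168fb_release GFX layer, fbi 0 opened 4 times ----
[  248.733332] pxa168fb_release GFX layer, fbi 0 opened 3 times ----
[  341.434699] systemd-journald[263]: Received SIGTERM
[  346.571507] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[  348.177936] Restarting system.
"""

# Kernel log prefix: "[", optional padding, seconds.microseconds, "]".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(log_text):
    """Return (gap_seconds, before, after) for the longest silence in the log.

    Hypothetical helper, not part of any OLPC tooling.
    """
    stamps = []
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    # Tuples compare by their first element, so max() picks the longest gap.
    return max((b - a, a, b) for a, b in zip(stamps, stamps[1:]))


gap, before, after = largest_gap(LOG)
print(f"longest silence: {gap:.1f} s (between {before} and {after})")
```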

comment:3 Changed 20 months ago by jnettlet

This bug is a problem with the extra layer of complexity added to the Vivante codebase to handle dual cores. When we shut down X, everything gladly cleans up all the structures and closes the connection to the kernel module; however, there may still be command queues waiting. Our code does push all the command queues, but the calculation of when a command queue is done seems very broken in this code base.

All this is very timing dependent. Doing a chvt or telinit 3 allows everything to work fine, but a shutdown or killing the X server on the same runlevel just gives a white screen.

comment:4 Changed 20 months ago by Quozl

We might chvt as workaround then?
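
A rough sketch of what such a workaround could look like, assuming VT 1 is a text console (that number is a guess) and that the chvt utility is installed. The function name stop_display_manager and the injectable run parameter are hypothetical and exist only so the sequence can be dry-run without touching the display. This is not what was eventually done for this ticket.

```python
import subprocess


def stop_display_manager(run=subprocess.check_call, console_vt=1):
    """Switch to a text VT first, then stop the display manager.

    Hypothetical helper: console_vt=1 is an assumption, and `run` can be
    replaced to record the commands instead of executing them.
    """
    commands = [
        ["chvt", str(console_vt)],                  # leave the X VT first
        ["systemctl", "stop", "olpc-dm.service"],   # command from the report
    ]
    for cmd in commands:
        run(cmd)
    return commands


# Dry run: collect the commands rather than executing them.
planned = []
stop_display_manager(run=planned.append)
print(planned)
```

Calling stop_display_manager() with no arguments would run the two commands for real and needs root.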

comment:5 Changed 19 months ago by dsd

Trying to get a better handle on this. Exploring the possibility that there is some code somewhere that makes the return-to-VT-mode change successful when chvt happens, which does not happen on X shutdown:

  • Crippling pxa168fb_set_par does not cripple chvt.
  • Couldn't find anything relevant in galcore to cripple - but I didn't hunt very hard.
  • Crippling mrvl_scn_leavevt in the dove driver also does not cripple chvt.

So I was unable to find such code.

This might suggest that since we don't actually change video mode between X and VT, no special code is needed. However, if I "killall -9 X", the video does hang, which would suggest otherwise - something different is happening in the chvt path.

Looking at the chvt case step-by-step there seem to be 3 important things that happen at different points:

  1. The screen goes black
  2. The cursor appears
  3. The console text appears

Looking in detail at number 2: this happens in the chvt case because the VT_RELDISP ioctl calls complete_change_console, which calls do_unblank_screen, and from there we go into the core of the screen mode setting for console and cursor drawing.

Why doesn't that happen on shutdown?

On shutdown we would expect do_unblank_screen() to be called when KDSETMODE(text) is sent from the X server. But X isn't sending that.

This is probably because X always seems to exit with SIGTERM:

#0  0xb6b45e54 in writev () from /lib/libc.so.6
#1  0x0007989c in _XSERVTransWritev (ciptr=ciptr@entry=0x4fe278,
    buf=buf@entry=0xbe881418, size=<optimized out>)
    at /usr/include/X11/Xtrans/Xtrans.c:884
#2  0x0006e9b0 in FlushClient (who=0x1cb000, who@entry=0x4e6738, oc=0x0,
    oc@entry=0x4c76f0, __extraBuf=0x0, extraCount=0) at io.c:892
#3  0x0006f214 in FlushAllOutput () at io.c:638
#4  0x000389c8 in Dispatch () at dispatch.c:450
#5  0x000282e8 in main (
    argc=<error reading variable: Cannot access memory at address 0xbe8814f0>,
    argv=<error reading variable: can't compute CFA for this frame>,
    envp=<optimized out>) at main.c:298

and that doesn't seem right.
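
For reference, the KDSETMODE(text) request mentioned above is an ordinary ioctl on the console file descriptor. The sketch below shows the shape of that call, using the request and mode values from <linux/kd.h>. The function name restore_text_mode and the injectable ioctl parameter are illustrative only, so the example can run without a real VT; it is not taken from the X server source.

```python
import fcntl

# Values from <linux/kd.h>.
KDSETMODE = 0x4B3A
KD_TEXT = 0x00
KD_GRAPHICS = 0x01


def restore_text_mode(fd, ioctl=fcntl.ioctl):
    """Ask the console driver to put the VT behind fd back into text mode.

    Hypothetical helper; `ioctl` is injectable so the call can be recorded
    instead of issued.
    """
    ioctl(fd, KDSETMODE, KD_TEXT)


# Dry run with a recording stand-in for fcntl.ioctl (no console needed).
calls = []
restore_text_mode(fd=-1, ioctl=lambda fd, request, arg: calls.append((request, arg)))
print([(hex(request), arg) for request, arg in calls])
```

On a real system fd would be an open descriptor for the VT (for example from os.open on the tty device), and the call needs suitable privileges.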

comment:6 Changed 19 months ago by Quozl

Possibly related; I have an XO-4 that was fs-update'd several weeks ago, since yum update'd, which occasionally boots to X with a text cursor block visible, and occasionally hangs during power button shutdown with a static X display.

comment:7 Changed 19 months ago by dsd

Actually SIGTERM is normal here - X then catches that and tries to shut down cleanly.

In this case we are crashing in mrvlExaShutdownHal() - it is calling a non-existent symbol gckOS_SuspendInterrupt. Removing that call causes a kernel crash on X shutdown.

Also confirmed that this happens in build 28 (I stated otherwise above), which is the first build that included the dove driver.

comment:8 Changed 19 months ago by dsd

Kernel crash is:

gcmkONERROR: status=-17(gcvSTATUS_INVALID_DATA) @ gckKERNEL_DeleteRecord(475)
gcmkONERROR: status=-17(gcvSTATUS_INVALID_DATA) @ gckKERNEL_Dispatch(1151)
pxa168fb_open GFX layer, fbi 0 opened 3 times ----
pxa168fb_release GFX layer, fbi 0 opened 4 times ----
pxa168fb: set_screen for fbi 0
Unable to handle kernel NULL pointer dereference at virtual address 0000000a
pgd = ecab0000
[0000000a] *pgd=2c9b5831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: fuse xt_tcpudp iptable_filter ip_tables x_tables mousedev joydev uinput mwifiex_sdio mwifiex psmouse mmp_camera syscopyarea sysfillrect sysimgblt fb_sys_fops videobuf2_dma_sg videobuf2_vmalloc videobuf2_memops videobuf2_core zforce ov7670 [last unloaded: udlfb]
CPU: 0    Not tainted  (3.5.7_xo4-20130405.1732.olpc.cc05f92 #1)
PC is at gckOS_ReadRegisterEx+0x30/0x94
LR is at gckHARDWARE_Interrupt+0x40/0xb4
pc : [<c0246fac>]    lr : [<c0258bac>]    psr: a0070193
sp : eca6fa20  ip : eca6fa38  fp : eca6fa34
r10: eca6fc38  r9 : ec12f000  r8 : 00000000
r7 : 00000049  r6 : ec006740  r5 : 00000000  r4 : ec053dbc
r3 : eca6fa44  r2 : 00000010  r1 : 00400040  r0 : 00000002
Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 2cab0059  DAC: 00000015
Process X (pid: 516, stack limit = 0xeca6e2f8)
Stack: (0xeca6fa20 to 0xeca70000)
fa20: eca6fa44 ec053dbc eca6fa5c eca6fa38 c0258bac c0246f88 00000000 00000000
fa40: 78afc3a5 c0246aa8 ec053dbc eca6e000 eca6fa6c eca6fa60 c02465cc c0258b78
fa60: eca6fa84 eca6fa70 c02439b8 c02465b0 ec34d140 eca6e000 eca6fac4 eca6fa88
fa80: c007e464 c024399c c0247308 c045a4c4 eca6facc 00000000 c0247ec0 ec006740
faa0: eca6e000 ec34d140 c068cb84 00000000 ec12f000 eca6fc38 eca6fae4 eca6fac8
fac0: c007e6d8 c007e3d8 00020000 ec006740 c068cb84 000000fa eca6fafc eca6fae8
fae0: c008118c c007e67c c00810a0 00000049 eca6fb14 eca6fb00 c007dc80 c00810ac
fb00: 00000000 00000000 eca6fb3c eca6fb18 c001eeb8 c007dc5c c01e82f8 00000001
fb20: 00000008 00000000 fe282104 eca6fbb4 eca6fb54 eca6fb40 c007dc80 c001ee18
fb40: 0000016f 00000008 eca6fb6c eca6fb58 c000fc9c c007dc5c c0252448 60070113
fb60: eca6fb7c eca6fb70 c0008550 c000fc38 eca6fbf4 eca6fb80 c000e84c c0008544
fb80: 00000000 00000000 00000204 00000001 00000000 ec351740 ec34e780 00000000
fba0: 00000000 ec12f000 eca6fc38 eca6fbf4 00000000 eca6fbc8 c0247308 c0252448
fbc0: 60070113 ffffffff c0254028 c024abd0 00000001 00000001 00000000 ec351740
fbe0: 00000000 ec351740 eca6fc1c eca6fbf8 c0252a8c c0252374 00000000 00000000
fc00: ec351740 ec12f000 00000000 00000000 eca6fc6c eca6fc20 c0261060 c02529fc
fc20: eca6fc38 c045a4c4 eca6fc6c 00000001 c0260820 002472e4 00000008 dffff010
fc40: ec368480 00000000 00000000 00000000 ec12f000 00000000 00000000 00000000
fc60: eca6fd54 eca6fc70 c0261378 c0260ca0 ec602440 c00560cc 00000000 00000204
fc80: 00000000 002e9b20 c163d520 c163d520 eca6fcbc eca6fca0 c00aa164 c01e415c
fca0: ec360378 c071ea68 eca6fcc8 00000000 eca6fcfc eca6fcc0 c01e415c c0247a60
fcc0: eca6fcec 00000000 ec360378 c0247308 eca6fcec eca6fce0 c0247308 c045a4c4
fce0: eca6e000 ec360340 ec360378 0000000d eca6fd14 c0247308 eca6fd14 eca6fd08
fd00: c0247308 c045a4c4 eca6fd44 c004322c eca6fd2c eca6fd20 c004322c c0043110
fd20: eca6fd3c eca6fd30 c0246788 c0043208 eca6fe28 ec34e880 00000001 00007530
fd40: ec754e40 befdc994 eca6fdf4 eca6fd58 c02509cc c02611ac 7823934f 00000001
fd60: eca6fd94 eca6fd70 c00d7204 c001aadc 7823934f 00000000 eca1eb78 00000204
fd80: 00000000 ec8b21c4 eca6fde4 eca6fd98 c00ca418 c00d71b4 00000001 ec9b7240
fda0: b6890000 00000200 00000028 0000006f b6890000 c162b6e0 eca6fe3c 782393cf
fdc0: eca1eb78 00000001 00271000 ec35c400 eca2da80 eca6e000 00007530 ec754e40
fde0: eca6e000 befdc994 eca6feec eca6fdf8 c0245dac c024fc7c eca6fdf8 eca6fdf8
fe00: eca6fe48 00000000 befdc9d0 000000a0 befdc9d0 000000a0 ec4c0840 eca1eb78
fe20: ecab0008 00000029 0000000f 00000001 b6825bec 001f8208 b6825efe b6825d0c
fe40: 00363790 b6fd64c0 000b7dc4 b6821000 00000000 00000000 b68d800c b6fddd48
fe60: befdcc64 b6fc6f20 00000000 00000001 00000001 00000000 00000000 002dea40
fe80: 00000008 00000000 002dea20 00000000 b68d800c b6fddd48 befdcc64 b68ba224
fea0: 002dea20 b68ba298 00000000 b683a4f4 002709b0 002dea20 002709bc 00000000
fec0: 00000000 b683a754 eca6ff3c befdc994 ec754e40 00007530 00000006 ec35ecf8
fee0: eca6fefc eca6fef0 c00f3190 c0245ac0 eca6ff7c eca6ff00 c00f3d70 c00f3164
ff00: c0008428 c0019ee0 ec76c3d0 ffffffff ffffffff ec754540 eca6ff34 eca6ff28
ff20: c00ff540 c00ff3d0 eca6ff6c eca6ff38 c00e467c c00ff518 00000000 00000000
ff40: 00000000 ec754540 00000000 ec755cc0 00000006 befdc994 ec754e40 00007530
ff60: 00000006 c000ef28 eca6e000 00020000 eca6ffa4 eca6ff80 c00f3e0c c00f3814
ff80: eca6ffa4 00000000 00007530 00002710 b68dc604 00000036 00000000 eca6ffa8
ffa0: c000ecc0 c00f3dd4 00007530 00002710 00000006 00007530 befdc994 001f9548
ffc0: 00007530 00002710 b68dc604 00000036 b68d8798 000000a0 000000a0 befdcc64
ffe0: b68d89e4 befdc98c b68bac94 b6b74d5c 600f0010 00000006 1c00fb00 e3001c10
[<c0246fac>] (gckOS_ReadRegisterEx+0x30/0x94) from [<c0258bac>] (gckHARDWARE_Interrupt+0x40/0xb4)
[<c0258bac>] (gckHARDWARE_Interrupt+0x40/0xb4) from [<c02465cc>] (gckKERNEL_Notify+0x28/0x34)
[<c02465cc>] (gckKERNEL_Notify+0x28/0x34) from [<c02439b8>] (isrRoutine+0x28/0x50)
[<c02439b8>] (isrRoutine+0x28/0x50) from [<c007e464>] (handle_irq_event_percpu+0x98/0x2a4)
[<c007e464>] (handle_irq_event_percpu+0x98/0x2a4) from [<c007e6d8>] (handle_irq_event+0x68/0x84)
[<c007e6d8>] (handle_irq_event+0x68/0x84) from [<c008118c>] (handle_level_irq+0xec/0x124)
[<c008118c>] (handle_level_irq+0xec/0x124) from [<c007dc80>] (generic_handle_irq+0x30/0x40)
[<c007dc80>] (generic_handle_irq+0x30/0x40) from [<c001eeb8>] (icu_mux_irq_demux+0xac/0xdc)
[<c001eeb8>] (icu_mux_irq_demux+0xac/0xdc) from [<c007dc80>] (generic_handle_irq+0x30/0x40)
[<c007dc80>] (generic_handle_irq+0x30/0x40) from [<c000fc9c>] (handle_IRQ+0x70/0x94)
[<c000fc9c>] (handle_IRQ+0x70/0x94) from [<c0008550>] (asm_do_IRQ+0x18/0x1c)
[<c0008550>] (asm_do_IRQ+0x18/0x1c) from [<c000e84c>] (__irq_svc+0x4c/0x94)
Exception stack(0xeca6fb80 to 0xeca6fbc8)
fb80: 00000000 00000000 00000204 00000001 00000000 ec351740 ec34e780 00000000
fba0: 00000000 ec12f000 eca6fc38 eca6fbf4 00000000 eca6fbc8 c0247308 c0252448
fbc0: 60070113 ffffffff
[<c000e84c>] (__irq_svc+0x4c/0x94) from [<c0252448>] (_IncrementCommitAtom+0xe0/0x134)
[<c0252448>] (_IncrementCommitAtom+0xe0/0x134) from [<c0252a8c>] (gckCOMMAND_ExitCommit+0x9c/0xe0)
[<c0252a8c>] (gckCOMMAND_ExitCommit+0x9c/0xe0) from [<c0261060>] (gckEVENT_Submit+0x3cc/0x4e8)
[<c0261060>] (gckEVENT_Submit+0x3cc/0x4e8) from [<c0261378>] (gckEVENT_Commit+0x1d8/0x24c)
[<c0261378>] (gckEVENT_Commit+0x1d8/0x24c) from [<c02509cc>] (gckKERNEL_Dispatch+0xd5c/0x2374)
[<c02509cc>] (gckKERNEL_Dispatch+0xd5c/0x2374) from [<c0245dac>] (drv_ioctl+0x2f8/0x404)
[<c0245dac>] (drv_ioctl+0x2f8/0x404) from [<c00f3190>] (vfs_ioctl+0x38/0x4c)
[<c00f3190>] (vfs_ioctl+0x38/0x4c) from [<c00f3d70>] (do_vfs_ioctl+0x568/0x5c0)
[<c00f3d70>] (do_vfs_ioctl+0x568/0x5c0) from [<c00f3e0c>] (sys_ioctl+0x44/0x70)
[<c00f3e0c>] (sys_ioctl+0x44/0x70) from [<c000ecc0>] (ret_fast_syscall+0x0/0x30)
Code: e59f1068 eb0056c9 e3e00000 e89da818 (e590c008) 
---[ end trace f8be7dc69c2c0634 ]---
Kernel panic - not syncing: Fatal exception in interrupt
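
As a cross-check on the oops above, the parenthesised word on the "Code:" line (e590c008) is the faulting instruction, and it can be decoded by hand. The sketch below decodes it as an ARM immediate-offset load and adds the offset to r0 from the register dump; the result matches the reported fault address 0000000a. The helper decode_ldr_imm is written for this illustration and only handles this one encoding class. Reading this as "the pointer passed in r0 was already invalid" is an interpretation, not something established in this ticket.

```python
def decode_ldr_imm(word):
    """Decode an ARM (A32) LDR/STR word that uses an immediate offset.

    Illustrative helper: covers only the immediate-offset single data
    transfer encoding (bits 27:26 == 01, bit 25 == 0).
    """
    if ((word >> 26) & 0b11) != 0b01 or ((word >> 25) & 1):
        raise ValueError("not an immediate-offset LDR/STR encoding")
    return {
        "load": bool((word >> 20) & 1),         # L bit: 1 = LDR, 0 = STR
        "byte": bool((word >> 22) & 1),         # B bit: byte vs word access
        "pre_indexed": bool((word >> 24) & 1),  # P bit
        "add": bool((word >> 23) & 1),          # U bit: add or subtract offset
        "rn": (word >> 16) & 0xF,               # base register
        "rt": (word >> 12) & 0xF,               # destination register
        "imm12": word & 0xFFF,                  # immediate offset
    }


FAULTING_WORD = 0xE590C008  # the parenthesised word on the "Code:" line
R0_AT_FAULT = 0x00000002    # r0 from the register dump

fields = decode_ldr_imm(FAULTING_WORD)
offset = fields["imm12"] if fields["add"] else -fields["imm12"]
address = (R0_AT_FAULT + offset) & 0xFFFFFFFF

print(fields)
print(hex(address))
```

The decoded fields correspond to "ldr ip, [r0, #8]" (register 12 is ip), so with r0 = 2 the load touches address 0xa.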

comment:9 Changed 19 months ago by dsd

Kernel crash fixed in arm-3.5 2334dc80762d7fa67bd3b7820da2d95a572a62b3

comment:10 Changed 19 months ago by dsd

  • Action Needed changed from never set to add to build
  • Milestone changed from 13.1.0 to 13.2.0
  • Owner changed from jnettlet to dsd

It is strange that gckOS_SuspendInterrupt() was placed in the CloseScreen path. The "k" in gck means kernel - as if we were trying to call kernel code directly from userspace? Not sure of the origin here, it doesn't seem to be marvell symplicity #472550 where (I think) the XO-4 support code is sourced from.

Deleted that call in xorg-x11-drv-dove-0.3.6.xo4; things are now working.

comment:11 Changed 18 months ago by dsd

  • Resolution set to fixed
  • Status changed from new to closed

Shutdown from Sugar in 13.2.0 build 4 works fine, X quits as it should and the ul-warning shutdown splash is displayed.
