Ticket #10568 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

XO-1 graphical boot hangs if SISUSB VGA adapter attached

Reported by: greenfeld
Owned by: dsd
Priority: low
Milestone: 12.1.0
Component: initscripts
Version: Development source as of this date
Keywords:
Cc: martin.langhoff, kevgor
Action Needed: test in build
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

  1. Plug a SISUSB VGA device into an XO-1 with 10.1.3 os360.
  2. Power on the XO-1.

Expected: The system should boot.

Actual: The system hangs at the graphical boot screen, with the arrow pointing at the first dot.

Workaround: When a SISUSB VGA device is attached, start the XO-1 with a verbose (text-printing) boot from Open Firmware, for example by typing "boot" at the OFW prompt on an unsecured laptop.

XO-1.5s are not affected by this at this time.

Attachments

usbvgahang.log (28.6 kB) - added by greenfeld 3 years ago.
Serial console log showing hang condition from two runs of it
vgahang2.jpg (434.4 kB) - added by greenfeld 3 years ago.
Photo of hang condition
messages (225.9 kB) - added by greenfeld 3 years ago.
Kernel messages along with sysrq output

Change History

  Changed 4 years ago by martin.langhoff

This is nasty. I am certain I have pretty-booted XO-1s with the USB2VGA. Have we seen it in more than one XO?

Next step: log a pretty boot with the USB2VGA attached, using a serial adapter.

  Changed 4 years ago by martin.langhoff

  • cc martin.langhoff added

  Changed 4 years ago by greenfeld

Debugging this may be hard, or we could be dealing with a flaky machine:

Variables which may or may not make sense

  • May happen on only one XO-1 (SHC92805C12)
  • May depend on the USB port used, if we are using external power, if the adapter is plugged into the VGA monitor, etc.
  • Happens often when we don't want it to happen, but rarely happens when we are trying to debug it.
  • In general this is being a Heisenbug.

  Changed 4 years ago by martin.langhoff

Rough assessment: happens 20% of the time. Only spotted on a particular XO-1 unit so far.

  Changed 4 years ago by dsd

  • milestone changed from 11.2.0-M3 to 11.2.0-M4

  Changed 3 years ago by dsd

  • priority changed from normal to low

  Changed 3 years ago by dsd

This needs boot logs captured with serial console up to the point of the crash.

  Changed 3 years ago by greenfeld

Intermittently reproduced on 11.2.0 os20 with SHC92805C12 using the USB port under the audio jacks for the VGA adapter on battery power. I don't know how relevant all these factors are, but since they're in the ticket I'll stick with them.

Repeatedly reproduced using the same configuration on SHC92805B44, which has a serial port cable on it so I could attach logs. While the serial log makes it look like a possible libertas issue, two more lines are seen on the console in verbose mode (where it still hangs on this system). I took a photo of this: at the point where the sisusb adapter attaches, we see an [ OK ] from the init scripts with no item, and then "Setting hostname localhost.localdomain... [ OK ]" as the final line.

Changed 3 years ago by greenfeld

Serial console log showing hang condition from two runs of it

Changed 3 years ago by greenfeld

Photo of hang condition

  Changed 3 years ago by dsd

The libertas messages always appear and are unrelated.

The logs don't provide as much info as I was hoping for. A few things to look at for further diagnosis:

  1. Do any other sisusb messages usually appear after "Allocated 8 buffers"?
  2. Can you get a sysrq task dump at the point of hang?
  3. See if you can reproduce it as follows: move the sisusbvga.ko file somewhere modprobe can't find it, boot the system, and stop X. Then run an insmod/rmmod loop at the shell and see if that makes it possible to reproduce the hang quickly.
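
A rough sketch of the insmod/rmmod loop from step 3, written in Python so that a stuck insmod can be detected with a timeout. The module path, the cycle count and the 30-second timeout are assumptions picked for illustration and are not from this ticket; the script must run as root on the XO-1 with the adapter attached.

```python
import subprocess


def _finishes(cmd, timeout, popen):
    """Start cmd and report whether it exits within timeout seconds."""
    proc = popen(cmd)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # Deliberately no kill-and-wait here: a task blocked inside the
        # kernel may never be reaped, and waiting on it would block us too.
        return False


def run_cycles(module_path, module_name="sisusbvga", cycles=50,
               timeout=30.0, popen=subprocess.Popen):
    """Load and unload the module repeatedly.

    Returns (cycle, command) for the first command that did not finish
    in time, or None if every cycle completed.
    """
    for cycle in range(1, cycles + 1):
        if not _finishes(["insmod", module_path], timeout, popen):
            return (cycle, "insmod")
        if not _finishes(["rmmod", module_name], timeout, popen):
            return (cycle, "rmmod")
    return None


if __name__ == "__main__":
    # Hypothetical location of the moved module; adjust to wherever it was put.
    print(run_cycles("/root/sisusbvga.ko"))
```

The `popen` parameter exists only so the loop can be exercised without touching real modules.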

  Changed 3 years ago by greenfeld

Moved the kernel module out of the way, booted an XO-1, stopped prefdm, and manually insmod'd sisusbvga later.

If the USB2VGA adapter was inserted while the computer was turned on, the following lines are seen, followed by insmod hanging:

[  840.586868] usb 2-2: new high speed USB device using ehci_hcd and address 3
[  851.641357] usb 2-2: USB2VGA dongle found at address 3
[  851.658333] usb 2-2: Allocated 8 output buffers

If the USB2VGA adapter is inserted after the computer fully boots and the above steps were taken, two more dmesg lines appear after the three above:

[  851.854686] usb 2-2: 8MB 1 ch/1 r SDR SDRAM, bus width 32
[  852.799084] usbcore: registered new interface driver sisusb
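
The two outcomes above can be told apart mechanically from a captured log. The following is a minimal sketch that only looks for the message substrings quoted in this comment; the function name and the result labels are made up for illustration.

```python
def classify_sisusb_probe(dmesg_text):
    """Classify a dmesg capture by which sisusb messages it contains."""
    lines = dmesg_text.splitlines()

    def seen(fragment):
        return any(fragment in line for line in lines)

    if not seen("USB2VGA dongle found"):
        return "no-device"
    if seen("registered new interface driver sisusb"):
        return "completed"
    if seen("Allocated 8 output buffers"):
        # Buffers were allocated but the driver never registered:
        # this matches the hanging case described above.
        return "stalled-after-buffers"
    return "incomplete"


hung_log = """\
[  840.586868] usb 2-2: new high speed USB device using ehci_hcd and address 3
[  851.641357] usb 2-2: USB2VGA dongle found at address 3
[  851.658333] usb 2-2: Allocated 8 output buffers
"""
print(classify_sisusb_probe(hung_log))
```

A log that also contains the SDRAM and "registered new interface driver sisusb" lines is classified as "completed".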

Changed 3 years ago by greenfeld

Kernel messages along with sysrq output

  Changed 3 years ago by dsd

Analysis of the above messages attachment:

(gdb) list *sisusb_init_gfxdevice+0x9fa
0x3139 is in sisusb_init_gfxdevice (drivers/usb/misc/sisusbvga/sisusb.c:1510).
1505		ret |= WRITEL(ramptr +  4, 0x456789ab);
1506		ret |= WRITEL(ramptr +  8, 0x89abcdef);
1507		ret |= WRITEL(ramptr + 12, 0xcdef0123);
1508		ret |= WRITEL(ramptr + 16, 0x55555555);
1509		ret |= WRITEL(ramptr + 20, 0x55555555);
1510		ret |= WRITEL(ramptr + 24, 0xffffffff);
1511		ret |= WRITEL(ramptr + 28, 0xffffffff);
1512		ret |= READL(ramptr +  0, &t0);
1513		ret |= READL(ramptr +  4, &t1);
1514		ret |= READL(ramptr +  8, &t2);
(gdb) list *sisusb_send_packet+0x4f
0x9b5 is in sisusb_send_packet (drivers/usb/misc/sisusbvga/sisusb.c:573).
568	
569		/* 1. send the packet */
570		ret = sisusb_send_bulk_msg(sisusb, SISUSB_EP_GFX_OUT, len,
571				(char *)packet, NULL, 0, &bytes_transferred, 0, 0);
572	
573		if ((ret == 0) && (len == 6)) {
574	
575			/* 2. if packet len == 6, it means we read, so wait for 32bit
576			 *    return value and write it to packet->data
577			 */
(gdb) list *sisusb_send_bulk_msg+0x2f6
0x45a is in sisusb_send_bulk_msg (drivers/usb/misc/sisusbvga/sisusb.c:256).
251		/* Submit URB */
252		retval = usb_submit_urb(urb, GFP_KERNEL);
253	
254		/* If OK, and if timeout > 0, wait for completion */
255		if ((retval == 0) && timeout) {
256			wait_event_timeout(sisusb->wait_q,
257					   (!(sisusb->urbstatus[index] & SU_URB_BUSY)),
258					   timeout);
259			if (sisusb->urbstatus[index] & SU_URB_BUSY) {
260				/* URB timed out... kill it and report error */

  Changed 3 years ago by dsd

Please reproduce this again, and capture another crash dump (I want to see if it always hangs in exactly the same place).

Then, wait 5 seconds, and without rebooting, capture another dump. This will clarify if the system has hung in 1 place or is just looping.
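
A small sketch of how the two dumps could be compared. It assumes the stack frames of the insmod task have been copied by hand from each sysrq dump into lists of `symbol+0xoffset` strings, as used in the gdb queries earlier in this ticket; parsing the raw sysrq output is not attempted here, and the result labels are invented for illustration.

```python
def _functions(trace):
    """Drop the +0x... offsets, keeping only the function names."""
    return [frame.split("+", 1)[0] for frame in trace]


def compare_dumps(first, second):
    """Compare two stack traces taken a few seconds apart."""
    if first == second:
        # Identical frames and offsets: consistent with the task being
        # blocked in one place (though it does not prove it).
        return "same-place"
    if _functions(first) == _functions(second):
        # Same call chain but different offsets: the task is making
        # some progress inside the same functions.
        return "same-functions"
    return "moving"


first = ["sisusb_send_bulk_msg+0x2f6",
         "sisusb_send_packet+0x4f",
         "sisusb_init_gfxdevice+0x9fa"]
print(compare_dumps(first, list(first)))
```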

  Changed 3 years ago by dsd

Then, install this kernel: http://dev.laptop.org/~dsd/20110728/kernel-2.6.35.13_xo1-20110728.1633.olpc.377bb5a.i586.rpm

It will be quite noisy in its output. Please post messages from boot until the point of hang, followed by a task dump.

  Changed 3 years ago by dsd

  • cc kevgor added

This seems to be caused by OHCI being loaded before EHCI (#10746). Making both OHCI and EHCI built into the kernel works around the issue, as in this test kernel: http://dev.laptop.org/~dsd/20111021/kernel-2.6.35.13_xo1-20111021.1415.olpc.9d949a3.i586.rpm

USB was originally made modular for XO-1 because it resulted in more attractive suspend/resume behaviour (IIRC, can't find references). Need to see if it would be safe to make them builtin as in the above kernel, or alternatively we can look at udev to see if there is a way that it can enforce this load order quirk.

  Changed 3 years ago by dsd

  • next_action changed from diagnose to add to build
  • milestone changed from 11.3.0 to 12.1.0

udev makes no effort to load these modules in the right order; it leaves this up to the distro. Both Fedora and Ubuntu use USB=y, presumably for this reason.
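
To check whether a given kernel build has the USB host controllers built in or modular, its config file can be inspected. This is a minimal sketch; it assumes "USB=y" above refers to the kernel option CONFIG_USB and that the EHCI and OHCI drivers are controlled by CONFIG_USB_EHCI_HCD and CONFIG_USB_OHCI_HCD. The sample config text is made up.

```python
SYMBOLS = ("CONFIG_USB", "CONFIG_USB_EHCI_HCD", "CONFIG_USB_OHCI_HCD")


def usb_host_config(config_text, symbols=SYMBOLS):
    """Map each symbol to 'builtin', 'module' or 'unset'."""
    state = {name: "unset" for name in symbols}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and stay 'unset'.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in state:
            state[key] = {"y": "builtin", "m": "module"}.get(value, value)
    return state


def both_hcds_builtin(state):
    """True when neither EHCI nor OHCI depends on module load order."""
    return (state["CONFIG_USB_EHCI_HCD"] == "builtin"
            and state["CONFIG_USB_OHCI_HCD"] == "builtin")


modular = "CONFIG_USB=m\nCONFIG_USB_EHCI_HCD=m\nCONFIG_USB_OHCI_HCD=m\n"
print(both_hcds_builtin(usb_host_config(modular)))
```

With all three options set to `m`, as in the sample, load order is left to userspace and the check returns False.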

Looks like the earlier reason for modular USB was to enable "extreme mode" power savings: #6935

However, we don't currently offer this functionality in our images, and carrying out this power saving by unloading the module doesn't seem like the correct approach (especially in light of issues such as this one).

XO-1 USB now builtin (like XO-1.5 and XO-1.75) in x86-3.1 2098f73d22

follow-up: ↓ 17   Changed 3 years ago by dsd

  • next_action changed from add to build to test in build

Kevin, please test this in 12.1.0 build 2.

in reply to: ↑ 16   Changed 3 years ago by kevgor

Replying to dsd:

Kevin, please test this in 12.1.0 build 2.

DSD: All good, works on multiple XO-1s with build 2. KG.

  Changed 3 years ago by dsd

  • status changed from new to closed
  • resolution set to fixed

thanks
