Ticket #12516 (closed defect: fixed)

Opened 18 months ago

Last modified 18 months ago

Reimaging with fs-update fails if USB keyboard attached to other XO-4 USB port

Reported by: greenfeld
Owned by: Quozl
Priority: normal
Milestone: 4-firmware
Component: ofw - open firmware
Version: Development build as of this date
Keywords:
Cc:
Action Needed: no action
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

  1. Attach a USB keyboard (in this case a Lenovo ThinkPad Keyboard/Touchpad/J-Mouse unit) to the USB port on the audio jack side of a SKU 296 XO. (A different 296 unit was used than the one with the known runin hub issues.)
  2. Attach a SanDisk 16 GB USB stick to the other USB port.
  3. Power on the XO (with Q7B14) and get into OFW.
  4. Run "fs-update u:\31028o4.zd" and wait a few minutes. You are likely to encounter a data abort before the re-imaging process completes.
  5. Power off the XO and remove the keyboard. Power on the XO and try re-imaging again. Every time I tried without the USB keyboard present, the re-imaging succeeded.

Attachments

usbabort.txt (5.0 kB) - added by greenfeld 18 months ago.
Abort log with USB failure

Change History

Changed 18 months ago by greenfeld

Abort log with USB failure

Changed 18 months ago by Quozl

Thanks.

I do not have a SanDisk 16 GB USB drive to test with, but I have tested with a SanDisk Cruzer Colors 4 GB USB drive, and a flexible USB keyboard. With a test build that fixes #12466 and avoids heap allocation in the USB keyboard interrupt driver, I have reproduced several symptom patterns:

  • a successful fs-update but with loss of USB keyboard response afterwards,
  • a "Wrong expanded data length" error,
  • a "Short read of zdata file" error followed by loss of USB drive access, and
  • a hang,

but not your symptom.

Could you please also try with the USB keyboard interrupt driver turned off? (Theory: interference between the keyboard driver and the storage driver.) It fixes the symptom for me, but I'd like to see if your device combination does the same. Here is how I did it:

ok .alarms \ display the scheduled alarms
Action                Ihandle  Interval  Remaining
poll-tty                    0      a         3
get-scan             fd9ffa60      1         1
get-scan             fd9f77d0      a         3
ok usb-keyboard-ih . \ check that the USB keyboard instance handle is in the list
fd9f77d0
ok usb-keyboard-ih iselect ' get-scan 0 alarm iunselect \ remove it
ok .alarms \ check that it was removed
Action                Ihandle  Interval  Remaining
poll-tty                    0      a         9
get-scan             fd9ffa60      1         1
ok fs-update ...
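
The effect of the commands above can be modelled with a small Python sketch. It is a toy model, not OFW code: the class AlarmList and its methods are hypothetical names, and keying each alarm by (action, ihandle) is an assumption inferred from the .alarms listing. It relies on the standard Open Firmware behaviour that an interval of 0 passed to alarm stops the periodic action for the currently selected instance, which is why the keyboard instance is selected with iselect first.

```python
class AlarmList:
    """Toy model of a firmware alarm list (hypothetical, for illustration only)."""

    def __init__(self):
        # (action name, instance handle) -> interval in timer ticks
        self.entries = {}

    def alarm(self, action, ihandle, interval):
        """Install a periodic action; an interval of 0 removes it instead."""
        key = (action, ihandle)
        if interval == 0:
            self.entries.pop(key, None)  # removing an absent alarm is a no-op
        else:
            self.entries[key] = interval

    def listing(self):
        """Return the installed (action, ihandle) pairs in installation order."""
        return list(self.entries)


# Values copied from the .alarms output in the transcript above.
alarms = AlarmList()
alarms.alarm("poll-tty", 0x0, 0xA)
alarms.alarm("get-scan", 0xFD9FFA60, 0x1)
alarms.alarm("get-scan", 0xFD9F77D0, 0xA)

usb_keyboard_ih = 0xFD9F77D0
alarms.alarm("get-scan", usb_keyboard_ih, 0)  # models: ' get-scan 0 alarm

for action, ihandle in alarms.listing():
    print(f"{action:<10} {ihandle:8x}")
```

In this model only the get-scan entry that belongs to the selected instance disappears; the other get-scan alarm stays installed, matching the second .alarms listing.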

If the above test does not pass without error, could you please also try with an externally powered USB hub between the laptop and both USB devices? (Theory: USB power problems.)

Changed 18 months ago by greenfeld

Retesting with Q7B14, I saw the "Wrong expanded data length" symptom twice in two tries without disabling the keyboard handler, but not the Data Abort one.

After removing the USB keyboard handler, I re-imaged successfully twice with the USB keyboard attached, power cycling and re-entering the disabling commands in between.

The layout of the USB devices I am using, as seen by OFW, is:

/usb@d4208000/hub@0,0
/usb@d4208000/hub@0,0/hub@3,0
/usb@d4208000/hub@0,0/scsi@2,0
/usb@d4208000/hub@0,0/hub@3,0/device@4,1
/usb@d4208000/hub@0,0/hub@3,0/mouse@4,0
/usb@d4208000/hub@0,0/hub@3,0/hid@3,1
/usb@d4208000/hub@0,0/hub@3,0/keyboard@3,0
/usb@d4208000/hub@0,0/scsi@2,0/disk
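
To make the topology easier to read, here is a minimal Python sketch that counts how many hub nodes sit above each device in the paths listed above. The helper names hub_depth and group_by_hub_depth are hypothetical, and the sketch only parses the path strings; it makes no claim about which physical hub each node corresponds to.

```python
DEVICE_PATHS = [
    "/usb@d4208000/hub@0,0",
    "/usb@d4208000/hub@0,0/hub@3,0",
    "/usb@d4208000/hub@0,0/scsi@2,0",
    "/usb@d4208000/hub@0,0/hub@3,0/device@4,1",
    "/usb@d4208000/hub@0,0/hub@3,0/mouse@4,0",
    "/usb@d4208000/hub@0,0/hub@3,0/hid@3,1",
    "/usb@d4208000/hub@0,0/hub@3,0/keyboard@3,0",
    "/usb@d4208000/hub@0,0/scsi@2,0/disk",
]


def hub_depth(path):
    """Count the hub nodes that lie above the final node of a device path."""
    ancestors = path.strip("/").split("/")[:-1]
    return sum(1 for node in ancestors if node.startswith("hub@"))


def group_by_hub_depth(paths):
    """Map each hub depth to the final node names found at that depth."""
    groups = {}
    for path in paths:
        leaf = path.rsplit("/", 1)[1]
        groups.setdefault(hub_depth(path), []).append(leaf)
    return groups


for depth, leaves in sorted(group_by_hub_depth(DEVICE_PATHS).items()):
    print(depth, ", ".join(leaves))
```

Read this way, the keyboard, mouse, and hid nodes sit behind two hub nodes, while the storage device (scsi@2,0 and its disk child) sits behind one.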

Changed 18 months ago by Quozl

Thanks.

I have tested continuous XO-4 fs-update with the USB keyboard interrupt handler disabled, for 73 hours straight, with no exceptions reported.

Therefore the problem relates to the re-entrancy of the USB stack, or the interrupt handler.

Adding a coarse interlock that prevents execution of the handler when the USB stack is in use by a command does work, but then the USB keyboard is unresponsive at some unusual times, such as at the "more" prompt of the directory command.

I am testing continuous XO-1.5 fs-update with the USB keyboard interrupt handler enabled, with no exceptions reported so far.

Then I shall test XO-1.5 with a hub, in the hope that this is a re-entrancy issue associated with USB hubs alone.

Changed 18 months ago by Quozl

I have tested continuous XO-1.5 fs-update with the USB keyboard interrupt handler enabled, for 20 hours straight, with no exceptions reported. A USB keyboard was directly attached.

I have tested XO-1.5 fs-update with a USB keyboard and USB drive attached via a hub. Failures occur, including:

  • a hang 1, 2 (keyboard works, drive doesn't work, USB ERROR),
  • a Page Fault (keyboard works, drive works),
  • a Bad hash for eblock#, and
  • two instances of Short read of zdata file.

I have tested XO-1.5 fs-update with a USB keyboard attached via a hub, and a USB drive directly attached. Failures occur, including:

  • a hang 3 (keyboard works, drive doesn't work, USB ERROR).

I have tested XO-1.5 fs-update with a USB drive attached via a hub, and a USB keyboard directly attached. No failures occur, over 29 cycles.

I have tested XO-4 fs-update with a USB bar code scanner attached. The scanner appears as a keyboard. No failures occur, over 24 cycles.

Therefore the problem relates to the re-entrancy of the USB stack, or the interrupt handler, when specific models of USB keyboard or bar code scanner are used on XO-1.75 or XO-4, or attached via a hub on XO-1.5.

Changed 18 months ago by Quozl

  • next_action changed from diagnose to test in build
  • milestone changed from Not Triaged to 4-firmware

Fixed in svn 3536, which adds an interlock to skip the USB keyboard interrupt handler if the non-interrupt execution path is in the middle of a USB bulk or interrupt pipe method.
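
The idea of the interlock can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the svn 3536 change itself (which is firmware code): the class UsbStack, the busy flag, and the tick callback are all hypothetical names. The model sets a flag for the duration of a pipe method, and the periodic keyboard handler returns without touching the USB stack while the flag is set.

```python
class UsbStack:
    """Toy model of a USB stack shared by foreground code and a timer handler."""

    def __init__(self):
        self.busy = False        # interlock flag (hypothetical name)
        self.skipped_polls = 0   # handler invocations that were skipped
        self.scans = 0           # handler invocations that polled the keyboard

    def bulk_transfer(self, nblocks, tick=None):
        """Foreground pipe method; tick() stands in for a timer alarm firing."""
        self.busy = True
        try:
            for _ in range(nblocks):
                if tick is not None:
                    tick()       # an alarm may fire in the middle of a transfer
        finally:
            self.busy = False    # always release the interlock

    def keyboard_alarm(self):
        """Periodic handler: skip the poll while a pipe method is in progress."""
        if self.busy:
            self.skipped_polls += 1
            return False
        self.scans += 1
        return True


stack = UsbStack()
stack.bulk_transfer(3, tick=stack.keyboard_alarm)
print(stack.skipped_polls, stack.scans)  # prints "3 0"
print(stack.keyboard_alarm())            # prints "True"
```

Because the flag is held only for the duration of an individual pipe method in this model, the handler can run again as soon as the transfer returns.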

Test builds:

  • q7b14jc.rom for XO-4 (testcase: attach USB keyboard directly, attach USB drive directly, see below), and
  • q3c10jb.rom for XO-1.5 (testcase: attach USB keyboard via an external hub, attach USB drive directly, see below).

Common testcase conclusion: run fs-update repeatedly; each run should complete without error, and the USB keyboard should be functional afterwards.

Changed 18 months ago by Quozl

Passed 268 cycles on XO-4, and 271 cycles on XO-1.5.

(A flexible USB keyboard did stop responding on the XO-4 test, with control-set operations returning USB error code 8 (STALL), but the significance is not yet clear.)

Changed 18 months ago by Quozl

  • status changed from new to closed
  • next_action changed from test in build to no action
  • resolution set to fixed

Passed 981 cycles on XO-4. The fix is in Q7B15.
