Opened 2 years ago

Closed 22 months ago

#12101 closed defect (fixed)

cl4: touchpad missing after reboot

Reported by: pgf Owned by:
Priority: blocker Milestone: 4-software
Component: kernel Version: Development source as of this date
Keywords: XO-4, touchpad Cc: wad, dsd, greenfeld
Blocked By: Blocking:
Deployments affected: Action Needed: no action
Verified: no

Description

observed just once, on os20, unmodified kernel. system booted normally from powerup, then on reboot, got this:

psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0

numerous retry attempts were not successful.

full log leading to error:

bash-4.2# reboot
[   49.214788] dcon_freeze_store: 0
[   59.145352] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   59.407417] Restarting system.

CForth built 2012-09-06 23:39 from commit 7b873ab6...
Matrix mode on
releasing
USB devices:
/usb@d4208000/hub@0,0
OLPC 4A2, 1063 MHz, 2 GiB memory, 4 GB internal storage, S/N Unknown
OpenFirmware  Q7A13   EC Firmware 0.1.08   2012-09-10 18:34:40 UTC

Type the Esc key to interrupt automatic startup
Boot device: /sd/sdhci@d4281000/disk:\boot\olpc.fth  Arguments: 
Boot device: /sd/sdhci@d4281000/disk:\boot\vmlinuz  Arguments: ttyS2 zforce.multitouch=y
Loading ramdisk image from /sd/sdhci@d4281000/disk:\boot\initrd.img ...
Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.5.0_xo1.75-20120907.1358.olpc.b5fea0f (martin@koji3.laptop.org) (gcc version 4.7.0 20120507 (Red Hat 4.7.0-5) (GCC) ) #1 PREEMPT Fri Sep 7 14:03:56 EDT 2012
[    0.000000] CPU: ARMv7 Processor [562f5842] revision 2 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
[    0.000000] Machine: Marvell MMP2 (Device Tree Support), model: 4A2
[    0.000000] cma: CMA: reserved 64 MiB at 28000000
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 518160
[    0.000000] Kernel command line: console=ttyS2,115200 console=tty0 selinux=0 fbcon=font:SUN12x22 ttyS2 zforce.multitouch=y
[    0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 2040MB = 2040MB total
[    0.000000] Memory: 1994556k/1994556k available, 94404k reserved, 1310720K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
[    0.000000]       .text : 0xc0008000 - 0xc05a1fac   (5736 kB)
[    0.000000]       .init : 0xc05a2000 - 0xc05dc000   ( 232 kB)
[    0.000000]       .data : 0xc05dc000 - 0xc0617a38   ( 239 kB)
[    0.000000]        .bss : 0xc0617a5c - 0xc06b2c8c   ( 621 kB)
[    0.000000] NR_IRQS:16 nr_irqs:16 16
[    0.000000] matched_mmp3_of_clk_apmu_setup
[    0.000000] matched_mmp3_of_clk_apbc_setup
[    0.000000] sched_clock: 32 bits at 6MHz, resolution 153ns, wraps every 660764ms
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [tty0] enabled
[    0.040154] Calibrating delay loop... 1059.22 BogoMIPS (lpj=5296128)
[    0.040154] pid_max: default: 32768 minimum: 301
[    0.040193] Mount-cache hash table entries: 512
[    0.044120] CPU: Testing write buffer coherency: ok
[    0.044120] ftrace: allocating 15659 entries in 46 pages
[    0.072463] Setting up static identity map for 0x4084f8 - 0x408550
[    0.074313] devtmpfs: initialized
[    0.082329] NET: Registered protocol family 16
[    0.092371] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.094959] Tauros2: Disabling L2 prefetch.
[    0.095014] Tauros2: Enabling L2 cache.
[    0.095035] Tauros2: L2 cache support initialised in ARMv7 mode.
[    0.095079] Wow!  Can't register IRQ for DMA
[    0.095079] OLPC board revision 4A2 (EC api 4)
[    0.105207] bio: create slab <bio-0> at 0
[    0.105503] SCSI subsystem initialized
[    0.106058] usbcore: registered new interface driver usbfs
[    0.106383] usbcore: registered new interface driver hub
[    0.106522] usbcore: registered new device driver usb
[    0.107629] i2c-gpio camera-i2c.0: using pins 166 (SDA) and 165 (SCL)
[    0.108519] i2c-gpio dcon-i2c.1: using pins 167 (SDA) and 168 (SCL)
[    0.109792] I2C: i2c-2: PXA I2C adapter
[    0.110547] I2C: i2c-3: PXA I2C adapter
[    0.111555] I2C: i2c-5: PXA I2C adapter
[    0.112788] I2C: i2c-4: PXA I2C adapter
[    0.112788] Linux video capture interface: v2.00
[    0.112980] Advanced Linux Sound Architecture Driver Version 1.0.25.
[    0.114087] cfg80211: Calling CRDA to update world regulatory domain
[    0.114593] Switching to clocksource clocksource
[    0.136817] NET: Registered protocol family 2
[    0.137144] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.137144] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.137804] TCP bind hash table entries: 65536 (order: 6, 262144 bytes)
[    0.139109] TCP: Hash tables configured (established 131072 bind 65536)
[    0.139450] TCP: reno registered
[    0.139501] UDP hash table entries: 512 (order: 1, 8192 bytes)
[    0.139501] UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
[    0.139540] NET: Registered protocol family 1
[    0.140147] Trying to unpack rootfs image as initramfs...
[    0.199095] Freeing initrd memory: 4588K
[    0.201948] highmem bounce pool size: 64 pages
[    0.202417] VFS: Disk quotas dquot_6.5.2
[    0.202582] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    0.203504] msgmni has been set to 1472
[    0.204987] io scheduler noop registered
[    0.204987] io scheduler deadline registered
[    0.205150] io scheduler cfq registered (default)
[    0.214940] Console: switching to colour frame buffer device 100x40
[    0.234027] pxa2xx-uart.1: ttyS2 at MMIO 0xd4017000 (irq = 28) is a UART3
[    0.674245] console [ttyS2] enabled
[    0.685274] brd: module loaded
[    0.708741] lis3lv02d: 8 bits 3DC sensor found
[    0.795198] input: ST LIS3LV02DL Accelerometer as /devices/platform/lis3lv02d/input/input0
[    0.804078] OLPC XO-1.75 Embedded Controller driver probed
[    0.814149] input: Power Button as /devices/d4037000.ec-spi/input/input1
[    0.844921] input: OLPC OLS notify as /devices/ols.4/input/input2
[    0.856143] OLPC Outdoor Light Sensor driver probed
[    0.865814] OLPC XO-1.75 lid and ebook switches
[    0.875029] input: OLPC lid switch as /devices/virtual/input/input3
[    0.886689] input: OLPC ebook switch as /devices/virtual/input/input4
[    0.898220] usbcore: registered new interface driver ipheth
[    0.909381] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.920957] pxau2o-ehci d4208000.usb: pxa9xx ehci
[    0.931089] pxau2o-ehci d4208000.usb: new USB bus registered, assigned bus number 1
[    0.948256] pxau2o-ehci d4208000.usb: irq 44, io mem 0xd4208000
[    0.974976] pxau2o-ehci d4208000.usb: USB 2.0 started, EHCI 1.00
[    0.985985] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    0.997562] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.009557] usb usb1: Product: pxa9xx ehci
[    1.018316] usb usb1: Manufacturer: Linux 3.5.0_xo1.75-20120907.1358.olpc.b5fea0f ehci_hcd
[    1.031364] usb usb1: SerialNumber: d4208000.usb
[    1.041441] hub 1-0:1.0: USB hub found
[    1.050166] hub 1-0:1.0: 1 port detected
[    1.059324] Initializing USB Mass Storage driver...
[    1.069083] usbcore: registered new interface driver usb-storage
[    1.079818] USB Mass Storage support registered.
[    1.089249] usbcore: registered new interface driver libusual
[    1.099751] usbcore: registered new interface driver mdc800
[    1.110084] mdc800: v0.7.5 (30/10/2000):USB Driver for Mustek MDC800 Digital Camera
[    1.122414] usbcore: registered new interface driver usbserial
[    1.133271] usbcore: registered new interface driver usbserial_generic
[    1.144706] USB Serial support registered for generic
[    1.154676] usbserial: USB Serial Driver core
[    1.164577] OLPC SP keyboard/touchpad driver probed
[    1.285241] zforce_input_setup: running in multitouch mode
[    1.295622] input: zForce touchscreen as /devices/d4033000.i2c/i2c-5/5-0050/input/input5
[    1.314085] input: AT Translated Set 2 keyboard as /devices/d4290000.ap-sp/serio0/input/input6
[    1.339097] zforce_i2c_probe: found zForce firmware 0.0.0.2
[    1.384913] usb 1-1: new high-speed USB device number 2 using pxau2o-ehci
[    1.404259] sa1100-rtc d4010000.wakeup-rtc: rtc core: registered d4010000.wakeup-rtc as rtc0
[    1.418561] rtc_idt1338 3-0068: rtc core: registered rtc_idt1338 as rtc1
[    1.546716] usb 1-1: New USB device found, idVendor=05e3, idProduct=0608
[    1.558663] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    1.570942] usb 1-1: Product: USB2.0 Hub
[    1.580905] hub 1-1:1.0: USB hub found
[    1.590167] hub 1-1:1.0: 4 ports detected
[    1.679449] sdhci: Secure Digital Host Controller Interface driver
[    1.690601] sdhci: Copyright(c) Pierre Ossman
[    1.700136] mmc0: FIXME: setting broken card detection quirk, remove this hack once the 'broken-cd' DT property is in-place.
[    1.754954] mmc0: SDHCI controller on d4280800.sdhci [d4280800.sdhci] using ADMA
[    1.767718] mmc1: FIXME: setting broken card detection quirk, remove this hack once the 'broken-cd' DT property is in-place.
[    1.819377] mmc0: new SDIO card at address 0001
[    1.829360] mmc1: SDHCI controller on d4281000.sdhci [d4281000.sdhci] using ADMA
[    1.842503] mmc2: FIXME: setting broken card detection quirk, remove this hack once the 'broken-cd' DT property is in-place.
[    1.904947] mmc2: SDHCI controller on d4280000.sdhci [d4280000.sdhci] using ADMA
[    1.918121] mmc3: FIXME: setting broken card detection quirk, remove this hack once the 'broken-cd' DT property is in-place.
[    1.943347] mmc1: new high speed DDR MMC card at address 0001
[    1.956261] mmcblk0: mmc1:0001 SEM04G 3.68 GiB 
[    1.966846] mmcblk0boot0: mmc1:0001 SEM04G partition 1 1.00 MiB
[    1.978732] mmc3: SDHCI controller on d4217000.sdhci [d4217000.sdhci] using ADMA
[    1.992624] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.004292] mmcblk0boot1: mmc1:0001 SEM04G partition 2 1.00 MiB
[    2.017719] usbcore: registered new interface driver usbhid
[    2.029460]  mmcblk0: p1 p2
[    2.039609] usbhid: USB HID core driver
[    2.049537] olpc-dcon:  Discovered DCON version 2
[    2.061136]  mmcblk0boot1: unknown partition table
[    2.076354] i2c-core: driver [olpc_dcon] using legacy suspend method
[    2.089654] i2c-core: driver [olpc_dcon] using legacy resume method
[    2.102161]  mmcblk0boot0: unknown partition table
[    2.113387] usbcore: registered new interface driver snd-usb-audio
[    2.415864] asoc: rt5631-hifi <-> mmp-sspa-dai.0 mapping ok
[    2.428970] input: OLPC XO-1.75 Headphone Jack as /devices/platform/olpc-xo-1-75-audio/sound/card0/input7
[    2.449826] input: OLPC XO-1.75 Mic Jack as /devices/platform/olpc-xo-1-75-audio/sound/card0/input8
[    2.471467] TCP: bic registered
[    2.481072] Initializing XFRM netlink socket
[    2.491316] NET: Registered protocol family 10
[    2.502625] NET: Registered protocol family 17
[    2.514227] lib80211: common routines for IEEE802.11 drivers
[    2.526473] Key type dns_resolver registered
[    2.536427] VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 6
[    2.549596] ThumbEE CPU extension supported.
[    2.559266] Registering SWP/SWPB emulation handler
[    2.569519] PJ4 iWMMXt coprocessor enabled.
[    2.583991] rtc_idt1338 3-0068: setting system clock to 2012-09-10 18:34:48 UTC (1347302088)
[    2.599438] ALSA device list:
[    2.608843]   #0: OLPC XO-1.75
[    2.617273] Freeing init memory: 232K
[    2.671258] tmpfs: No value for mount option 'strictatime'
[    2.707942] tmpfs: No value for mount option 'strictatime'
[    2.735672] tmpfs: No value for mount option 'strictatime'
[    2.811257] dracut: dracut-018-36.git20120510.fc17
[    2.868936] udevd[71]: starting version 182
[    4.451002] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[    4.500469] dracut: Mounted root filesystem /dev/disk/mmc/mmc1p2
[    4.610994] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    4.721603] EXT4-fs (mmcblk0p1): mounted filesystem without journal. Opts: (null)
[    4.766184] Dsize 7733248 Psize 7593984 Pstart 139264 Pend 7733248
[    5.011980] dracut: Switching root
[    5.358604] systemd[1]: systemd 44 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP; fedora)
[    5.401292] systemd[1]: Set hostname to <xo-e4-fc-36.localdomain>.
[    6.272214] udevd[173]: starting version 182
[    6.846673] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    7.850241] ov7670 0-0021: chip found @ 0x42 (i2c-gpio-1)
[    8.147330] mousedev: PS/2 mouse device common for all mice
[    8.252202] got GPIOs power=150 reset=102
[    8.411628] calling clk_mmp_enable_ccic
[    8.684345] psmouse serio1: sentelic: Finger Sensing Pad, hw: 13.2.1, sn: 0, sw: 1.1.0-K
[    9.063017] input: FSPPS/2 Sentelic FingerSensingPad as /devices/d4290000.ap-sp/serio1/input/input9
[    9.321507] libertas_sdio: Libertas SDIO driver
[    9.336222] usbcore: registered new interface driver sisusb
[    9.395734] calling clk_mmp_disable_ccic
[    9.418559] usbcore: registered new interface driver udlfb
[    9.433600] libertas_sdio: Copyright Pierre Ossman
[    9.844355] usbcore: deregistering interface driver udlfb
[   10.134935] runin-check: noop
[   10.773100] input: olpc-kbdshim virtual touchscreen as /devices/virtual/input/input10
[   12.544978] calling clk_mmp_enable_ccic
[   12.946663] input: olpc-kbdshim virtual keyboard as /devices/virtual/input/input11
[   13.001264] input: olpc-kbdshim virtual mouse as /devices/virtual/input/input12
[   13.742819] libertas_sdio mmc0:0001:1: (unregistered net_device): 20:7c:8f:74:39:14, fw 9.70.3p36, cap 0x000003a3
[   13.768712] libertas_sdio mmc0:0001:1: wlan0: Marvell WLAN 802.11 adapter
[   14.265013] calling clk_mmp_disable_ccic
[   14.519982] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   14.889860] ip_tables: (C) 2000-2006 Netfilter Core Team
[   15.210542] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   15.544948] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   16.554941] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[   17.665065] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   19.175045] fuse init (API version 7.19)
[   19.434544] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[   21.535023] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0

bash-4.2# 
bash-4.2# 

Attachments (3)

mouse-lost.txt (56.5 KB) - added by dsd 23 months ago.
log from 13.1.0 build 29
mouse-lost2.sp.txt (5.8 KB) - added by dsd 23 months ago.
failure case with SP logging from http://dev.laptop.org/~dsd/20130212/consoleio-debug.patch
mouse-lost2.kernel.txt (130.4 KB) - added by dsd 23 months ago.
kernel log corresponding to the same failure instance

Download all attachments as: .zip

Change History (27)

comment:1 follow-up: Changed 2 years ago by pgf

not reliably reproducible, but perhaps the problem is related to this shutdown issue, which seems much easier to make happen (i.e., all the time for me, currently). i'm running q7b06pa, and kernel 3.5.4-00608-g334324e, on os8.

bash-4.2# reboot
[   62.287276] systemd[1]: olpc-switchd.service: main process exited, code=exited, status=1
[   62.314936] systemd[1]: Unit olpc-switchd.service entered failed state.
[   62.322246] systemd[1]: olpc-kbdshim.service: main process exited, code=exited, status=1
[   62.714089] dcon_freeze_store: 0
[   71.058659] systemd-journald[224]: Received SIGTERM
[   71.143134] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[   71.383127] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[   71.783135] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   71.783135] psmouse serio1: sentelic: Unable get OPC state.
[   71.993136] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[   73.193130] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   74.193135] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[   76.193124] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[   79.193118] Restarting system.


comment:2 Changed 2 years ago by wad

  • Component changed from not assigned to kernel

Also seen on13.1.0, os15 w. q7b07 running on an XO-4 C1.

Assigning to kernel, as systemd isn't a supported Component in trac.

comment:3 Changed 2 years ago by erikos

see as well #12370 for another case on os15.

comment:4 Changed 2 years ago by dsd

  • Cc wad added
  • Priority changed from normal to blocker

I've seen this a couple of times, and wad has seen it in #12460.

comment:5 Changed 2 years ago by dsd

Sorry, I meant #12459.

comment:6 Changed 2 years ago by wad

  • Keywords XO-4 touchpad added

Interesting that I saw this on a Synaptics touchpad (log at #12459), and this ticket is for a Sentellic. This problem has been more frequent than the few trac tickets indicate.

comment:7 Changed 2 years ago by dsd

  • Cc dsd added

I ran my C1 in a scripted reboot loop overnight, didn't catch any instance of this message during bootup (didn't check for it during shutdown).

comment:8 Changed 2 years ago by pgf

  • Cc cerlyn added

samuel reports (on the eng. call) that he sees this problem sporadically in his runin logs. (at least i think that's what he said.)

comment:9 in reply to: ↑ 1 Changed 23 months ago by dsd

  • Cc greenfeld added; cerlyn removed

Replying to pgf:

not reliably reproducible, but perhaps the problem is related to this shutdown issue, which seems much easier to make happen (i.e., all the time for me, currently). i'm running q7b06pa, and kernel 3.5.4-00608-g334324e, on os8.

I was seeing the exact same messages on shutdown as you on 13.1.0 build 16, Q7B07, however after upgrading to build 27 and Q7B14, the messages have gone away. Testing with XO-4 B1.

This also coincides with the fact that I haven't seen the loss-of-input-devices problem recently on my B1 nor the C1, so I wonder if we have fixed this. Has anyone seen this problem or these kernel messages recently?

comment:10 Changed 23 months ago by greenfeld

I still have seen the kernel messages at the end of runin on one B1 unit.

I have not tried to boot the system into interactive mode yet after seeing it happen in runin to see if the mouse was truly lost.

comment:11 Changed 23 months ago by dsd

Please detail the OS build, firmware version and EC version running, and whether or not the mouse was actaully lost.

comment:12 Changed 23 months ago by greenfeld

I forced a runin stop to check this.

Still seen with 13.1.0 os27/Q7B14/EC 3.0.10 on a SKU 292 (1 GHz/1 GB RAM/8 GB MMC).

[70838.253578] runin-main: reboot
[70838.489256] pxa168fb_open GFX layer, fbi 0 opened 1 times ----
[70838.854549] pxa168fb: set_screen for fbi 0
[70839.029622] pxa168fb_release GFX layer, fbi 0 opened 2 times ----
[70846.966687] systemd-journald[219]: Received SIGTERM
[70851.991357] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70851.998003] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.004430] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.010810] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.017122] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.031524] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.304483] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.312443] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.318117] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.325014] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[70852.340216] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.346070] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.355104] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.360541] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[70852.569258] psmouse serio1: Failed to disable mouse on olpc_touchpad/serio0
[70852.769241] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[70853.169240] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[70853.169253] psmouse serio1: sentelic: Unable get OPC state.
[70854.179248] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[70856.179235] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[70857.179251] psmouse serio1: Failed to deactivate mouse on olpc_touchpad/serio0
[70859.179236] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0
[70862.179239] Restarting system.

Changed TS tag from DEVL to SHIP (which forced a second reboot) and the Sentelic touchpad worked. After this I tried to force runin to run once via filesystem flags, suspend 20 times, and then force runin to stop, but this did not cause the issue.

comment:13 Changed 23 months ago by dsd

Putting my B1 in a reboot loop (13.1.0 build 27 with the included firmware versions) and logging serial, after a few hours I caught a case where these messages appeared during both bootup and shutdown - I guess the mouse was also not usable at this time. So it seems likely that this issue still exists. I'll dig further into the errors.

comment:14 Changed 23 months ago by dsd

That reboot loop is running again, with some extra kernel debugging which will hopefully give more insight into the problem.

In the mean time, Jon Nettleton encountered the problem again and noted that /proc/interrupts contained:

169:        359   icu_irq

(ICU IRQ 169 claimed by no handler?)

This doesn't appear "normally", so there is some suspicion that this is related.

comment:15 Changed 23 months ago by pgf

more data to capture, if this occurs on a machine with EC serial hooked up -- capture the output of the 'X' command at the EC prompt. this will let us see the state of the kbd/tpad flow controller bits.

comment:16 Changed 23 months ago by dsd

Reproduced this on a cold boot. I have a feeling that this bug appears quite often on a cold boot and not so much in a reboot loop.

There was no weird IRQ169 handler.

For an unknown reason we couldn't get anything except junk from the EC serial port, and didn't want to reboot. So we used sdkit. Thanks to Paul for his help looking at the details.

After a while the keyboard lag issue came into play as well (#12370). So it is not exactly clear if we are debugging the failure to communicate with the mouse, or the keyboard lag (also seen on XO-1.75, #11543), or both.

ok d4290000 1000 mmap constant sp-base
ok sp-base 84 + l@ .

The SP code that runs on the other end of the ap-sp interface is in cforth git at src/platform/arm-xo-1.75/consoleio.c

Looking at register values on this confused system ("bad") vs a working unit ("good")

SECURE_PROCESSOR_COMMAND 40 = 1f3 (bad), 00ff (good)
COMMAND_RETURN_STATUS 80 = e0 (bad), 1c (good)
COMMAND_FIFO_STATUS c4 = 104 (bad), 100 (good)
PJ_RST_INTERRUPT c8 = 20000 (bad), 0 (good)
PJ_INTERRUPT_MASK cc = ~1 (both)
SP_INTERRUPT_REGISTER 218 = 802 (bad), 804 (good)
SP_INTERRUPT_MASK 21c = ~1 (bad), 0xfffffffd (good)
SP_CONTROL 220 = 1 (bad), 0 (good)
PJ_INTERRUPT_SET 234 = 20000 (bad), 0 (good)

The COMMAND_FIFO_STATUS value shows that on a bad system, there are 4 commands queued in the FIFO, which is the maximum allowed amount on MMP2 and MMP3 (according to datasheet). We found a bug in olpc_keyboard where it will continue writing commands even when there are 4 commands queued, fixed in da81b98b1e17.

SP_INTERRUPT_MASK shows that the command interrupt has been masked. This explains why the queue isn't being emptied - the SP isn't listening.

In the SP code you can see that irq_handler() does mask the interrupt, and applies some non-trivial logic for when to unmask it. It looks like we have triggered a bug in this code, or confused it significantly with bad data.

There is a possibility the olpc_keyboard bug has caused the SP to be confused, but we haven't come up with a solid explanation that would back up this possibility, so for now we will go back to testing with the fix and look at logging when the FIFO is full.

Other open questions:

  • irq_handler() in the SP only processes one command per interrupt. Is that correct? Would be interesting to run an experiment: SP masks the interrupt, CPU queues two commands, SP unmasks interrupt, what happens now? How many interrupts?
  • Can olpc_kbd_pad_write() be called twice concurrently or is there some serialization happening at a higher layer? This code would be racy if ran concurrently in two different processes.

comment:17 Changed 23 months ago by dsd

See #11363 for a potentially related issue that we saw on XO-1.75.

Changed 23 months ago by dsd

log from 13.1.0 build 29

comment:18 Changed 23 months ago by dsd

I reproduced this in 13.1.0 build 29, so the above kernel fix has not solved the problem.

With logging, I have more info on what is happening, logs attached. During the normal bootup conversation, a 0xf4 command is sent to the mouse, and no response is received, causing the message:

[    3.248793] psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0

At this point the FIFO is empty, suggesting that the command was actually sent and flushed. But from this point we stop receiving any data from the device, and any commands we send do not get flushed (we queue 4 in the FIFO, then the rest just timeout as the FIFO is full).

Changed 23 months ago by dsd

Changed 23 months ago by dsd

kernel log corresponding to the same failure instance

comment:19 Changed 23 months ago by dsd

In the above trace, we attempt to submit command 0x33 to the device. The SP drives the clock line low, and programs a timer interrupt to arrive about 110uS later.

The timer interrupt arrives, the SP drives data low and clock high. This is all in accordance with my reading of PS2 specs.

The relevant GPIO is then configured to generate interrupts upon the clock's falling edge. But we do not receive any interrupts at this point. Given that the device should always be generating the clock signal, and the host has it unhibited at this point, it seems strange that this is failing here; I would have thought that listening for the clock signal would be reliable/trivial.

comment:20 Changed 23 months ago by dsd

My understanding of the clock line was incorrect. The device will only generate a clock signal when there is data to be sent in either direction. Anyway, we tried increasing the delay to 220uS, but this didn't help.

Using sdkit to examine the GPIOs when the situation occurs:

ok d# 160 gpio-out? .
ffffffff
ok d# 107 gpio-out? .
ffffffff

This means that GPIO 160 (the clock line) is an output, which means that it has been driven low. This is the opposite of what we would expect - it should have been set as an input, waiting for a clock pulse.

Flipping it to an input causes the mouse to start working again (albeit in confused fashion).

comment:21 Changed 23 months ago by erikos

I have just seen the issue again in build 29.

psmouse serio1: Failed to enable mouse on olpc_touchpad/serio0

is in /var/log/messages, I backed it up, let me know if you need it.

comment:22 Changed 23 months ago by dsd

  • Action Needed changed from reproduce to add to build

Pretty sure I caught this by changing tactics and looking for Linux interference with this GPIO.

The failure always seems to happen around dcon init time, which happens to mess with 5-6 GPIOs. Different ones from the mouse, however these GPIOs are organised into big banks, and one register controls a whole bank.

The linux gpio driver was using a read-modify-write sequence to set gpio direction bits (on the whole bank). This was likely racing with the SP setting a GPIO direction bit in the same bank at the same time.

Fixed in arm-3.5 411fb2816d52a1bc630495237156a6f64320c146 : we now use a bit-wise scheme to set GPIO direction bits, which doesn't have such race conditions associated with it.

My test unit has survived around 4 hours in a reboot loop without seeing the problem - thats more than before. Overnight testing will confirm.

comment:23 Changed 23 months ago by dsd

  • Action Needed changed from add to build to test in build

Test in 13.1.0 build 30.

comment:24 Changed 22 months ago by greenfeld

  • Action Needed changed from test in build to no action
  • Resolution set to fixed
  • Status changed from new to closed

I have not lost the use of the touchpad without seeing the entire system being hung (and therefore the cause not being this ticket) in 13.1.0 os30 or 31.

Note: See TracTickets for help on using tickets.