Ticket #9008 (closed defect: fixed)

Opened 5 years ago

Last modified 4 years ago

touchpad suddenly stops working (recalibration failed)

Reported by: HoboPrimate
Owned by: dsaxena
Priority: normal
Milestone: 8.2.1
Component: kernel
Version: not specified
Keywords: cjbfor9.1.0
Cc: dilinger, wad, pgf, joe
Action Needed: finalize
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

Build 767, on a B4 laptop.

I nand-erased and installed build 767 yesterday. The touchpad has stopped working twice today: once while running on battery, and once while plugged into the wall.

/var/log/messages shows the following error at the time the touchpad stops working:

"psmouse serio1: recalibration failed!"

Restarting Sugar doesn't fix it. I don't have extreme power management activated, and both times it happened while I was using Browse or in a zoom-level view (the B4 doesn't suspend anyway in this build).

Change History

  Changed 5 years ago by dsaxena

  • cc dilinger, wad added

Dilinger, have we seen this before on any spin of HW and B-4 in particular?

  Changed 5 years ago by pgf

i was just talking to dilinger about this the other day.

i've seen it on my MP machine, running 767. i partially discounted it, because my machine has a replacement usb keyboard, which could in principle cause differences down near the psmouse layer.

in my case it happened several times in a week of daily use. (i haven't been using the laptop so regularly just lately.) i'm set up to instrument psmouse and do more logging, if you have suggestions.

follow-up: ↓ 6   Changed 5 years ago by dsaxena

My guess from looking at the code is that we're running into this path in hgpk_force_recalibrate():

        /*
         * XXX: If a finger is down during this delay, recalibration will
         * detect capacitance incorrectly.  This is a hardware bug, and
         * we may need to work around that here.
         */

        if (ps2_command(ps2dev, NULL, PSMOUSE_CMD_ENABLE))
                return -1;

If we return -1 here, psmouse->state is left as PSMOUSE_INITIALIZING and the psmouse interrupt handler will discard all packets. An X restart will not fix this b/c it does not cause the driver to rebind to the device and reinitialize this.
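
A minimal userspace sketch of the failure path described above. It is a simplified model, not the driver code: the `model_*` names are hypothetical, and the handler is assumed to drop every packet unless the device is in the activated state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the psmouse states named in this ticket. */
enum model_state { MODEL_INITIALIZING, MODEL_ACTIVATED };

struct model_pad {
    enum model_state state;
    unsigned delivered;   /* packets passed on to the input layer */
    unsigned discarded;   /* packets dropped by the handler */
};

/* Hypothetical interrupt handler: only an activated device delivers packets. */
static void model_interrupt(struct model_pad *pad)
{
    assert(pad != NULL);
    if (pad->state != MODEL_ACTIVATED) {
        pad->discarded++;
        return;
    }
    pad->delivered++;
}

/*
 * Hypothetical recalibration. enable_ok stands in for the result of the
 * final ps2_command(ps2dev, NULL, PSMOUSE_CMD_ENABLE) call.
 */
static int model_recalibrate(struct model_pad *pad, bool enable_ok)
{
    assert(pad != NULL);
    pad->state = MODEL_INITIALIZING;  /* packets are ignored while recalibrating */
    if (!enable_ok)
        return -1;                    /* early return: state is never restored */
    pad->state = MODEL_ACTIVATED;
    return 0;
}
```

In this model, once `model_recalibrate()` returns -1 every later call to `model_interrupt()` only increments `discarded`, which matches the "touchpad is dead until something reinitializes it" symptom.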

  Changed 5 years ago by dilinger

Paul, is your machine available to play with? I'd like to stick a kernel w/ a bit of debugging on there.

  Changed 5 years ago by dilinger

  • cc pgf added

in reply to: ↑ 3   Changed 5 years ago by pgf

Replying to dsaxena:

> My guess from looking at the code is that we're running into this path in hgpk_force_recalibrate():

>         /*
>          * XXX: If a finger is down during this delay, recalibration will
>          * detect capacitance incorrectly.  This is a hardware bug, and
>          * we may need to work around that here.
>          */
> 
>         if (ps2_command(ps2dev, NULL, PSMOUSE_CMD_ENABLE))
>                 return -1;
> 

> If we return -1 here, psmouse->state is left as PSMOUSE_INITIALIZING and the psmouse interrupt handler will discard all packets. An X restart will not fix this b/c it does not cause the driver to rebind to the device and reinitialize this.

i think the same is true if we fail any of the immediately previous steps in the recalibration command itself.

what's the right fix? just set it to PSMOUSE_ACTIVATED, and assume we'll recalibrate shortly, i suppose?

  Changed 5 years ago by pgf

it's not that easy -- the recalibration also has to be scheduled. but if i do both, it works. i have a patch, but there are some other things in it (all the recal timeouts and delays are tuneable, mainly). i can separate.

  Changed 5 years ago by mstone-xmlrpc

  • keywords cjbfor9.1.0 added
  • milestone changed from 8.2.1 to 9.1.0

Pushing out to 9.1.0, per edmcnierney's request.

  Changed 5 years ago by cscott

  • next_action changed from never set to add to build
  • milestone changed from 9.1.0 to 8.2.1

This fix is included in dsaxena's RPMs for 8.1.0.

Please put the appropriate kernel RPM in yr ~/public_rpms/staging on dev. (Also in joyride, if appropriate.)

  Changed 5 years ago by pgf

  • summary changed from B4, touchpad suddenly stops working to touchpad suddenly stops working (recalibration failed)

note: after a "recalibration failed!" event, the touchpad will stop working (in 8.2 vintage kernels), but a suspend/resume cycle will resuscitate it.

also, changing summary to better reflect the thread and fix.

  Changed 5 years ago by joe

  • cc joe added

Saw it on a laptop running 8.2.1 Staging-7.

  Changed 5 years ago by cjb

  • next_action changed from add to build to test in build

Verified that kernel-2.6.25-20081217.24 is present in staging-9.

  Changed 5 years ago by pgf

just realized i'm not sure anyone else knows how to test this.

it's fairly easy to generate the "recalibration failed" error: i do it by taking my finger and continuously doing quick sweeps, alternating from the two sides of the touchpad. mix it up -- some short, some long, but the goal is to cause touchpad jumps (which means closely spaced large relocations of your finger with no contact on the pad) and lots of touchpad traffic (which involves contact with the pad). the former (large displacements) force the recal, and the latter (traffic) interferes. at least that's my theory. you'll know when you get it, because the mouse will lock up, and you'll see the "recalibration failed!" message in the log. you can recover by suspending/resuming the laptop.

with the fix in, you'll still see the message, but the recalibration will be retried in 500msec, and should succeed the next time. the driver currently retries forever, at half second intervals, if the recalibration keeps failing. upstream has rejected the patch because of this, and would rather we re-init the ps2 subsystem after some number of failures. i have it on my todo list to look at this, but my OLPC work isn't currently at the top of that list. i don't know whether simply putting a limit on the number of retry attempts (5? 10?), which would be a much easier change, would be acceptable upstream.
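
A rough sketch of the "limit the number of retries" alternative mentioned above, written as a userspace simulation rather than driver code. The 500 ms interval comes from this ticket; `recal_with_retries()`, `fail_n_times()` and the retry budget are hypothetical, and time is only counted, not actually slept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RETRY_INTERVAL_MS 500   /* retry delay quoted in this ticket */

/* Hypothetical callback: returns true when a recalibration attempt succeeds. */
typedef bool (*recal_attempt_fn)(void *ctx);

struct retry_result {
    bool succeeded;
    unsigned attempts;     /* attempts made, including the first one */
    unsigned elapsed_ms;   /* simulated time spent waiting between attempts */
};

/*
 * Attempt recalibration, retrying every RETRY_INTERVAL_MS on failure.
 * max_retries is the number of retries allowed after the first attempt;
 * when the budget runs out the caller can escalate (for example by
 * reinitializing the device) instead of retrying forever.
 */
static struct retry_result recal_with_retries(recal_attempt_fn attempt,
                                              void *ctx, unsigned max_retries)
{
    struct retry_result r = { false, 0, 0 };

    assert(attempt != NULL);
    for (;;) {
        r.attempts++;
        if (attempt(ctx)) {
            r.succeeded = true;
            return r;
        }
        if (r.attempts > max_retries)
            return r;                      /* retry budget exhausted */
        r.elapsed_ms += RETRY_INTERVAL_MS; /* wait before the next try */
    }
}

/* Example attempt: fails until the counter pointed to by ctx reaches zero. */
static bool fail_n_times(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

With a budget of 5 retries, two failures followed by a success finish on the third attempt after 1000 simulated ms, while a pad that keeps failing gives up after six attempts and 2500 ms.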

final note: we have not looked at _why_ we're getting a failure from ps2_command() when it is called by hgpk_force_recalibrate(), nor exactly which of the 5 possible calls is failing. there are already special cases in ps2_command() for some commands that can take a long time -- it's possible that one of the commands we're sending (which aren't #defined -- grr) needs such an exception. (i actually think that ps2_command() should get an additional "how long to wait for reply" parameter of some sort, which would usually specify a default value -- knowledge of per-command delay req'ts should be with the caller.)
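
To make the "how long to wait for reply" idea concrete, here is a small sketch of what a caller-supplied timeout could look like. This is not the ps2_command() API: `model_command()`, `struct model_dev` and the 200 ms default are all hypothetical, and the device is reduced to a fixed reply latency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default, chosen arbitrarily for this sketch. */
#define MODEL_DEFAULT_TIMEOUT_MS 200

/* Hypothetical device model: acknowledges any command after latency_ms. */
struct model_dev {
    unsigned latency_ms;
};

/*
 * Hypothetical command call with a caller-chosen reply timeout.
 * timeout_ms == 0 selects the default, so callers that know a command
 * is slow pass a longer value instead of relying on special cases
 * inside the command routine itself.
 * Returns 0 if the reply arrives in time, -1 otherwise.
 */
static int model_command(const struct model_dev *dev, unsigned char cmd,
                         unsigned timeout_ms)
{
    assert(dev != NULL);
    (void)cmd;  /* the command byte does not affect this simplified model */

    if (timeout_ms == 0)
        timeout_ms = MODEL_DEFAULT_TIMEOUT_MS;
    return dev->latency_ms <= timeout_ms ? 0 : -1;
}
```

The point of the design is that knowledge of per-command delay requirements lives with the caller: a slow command fails with the default but succeeds when the caller asks for a longer wait.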

  Changed 5 years ago by dsd

  • next_action changed from test in build to finalize

I tried and tried but I cannot get the touchpad to recalibrate based on rapid finger movements. Perhaps my technique is wrong. However, over the course of testing today I have triggered two "recalibration failed" messages and the touchpad still works. So I can confirm this is working in staging-9.

I suggest that we open a separate ticket for the other issues (upstream rejected our fix, and we have unanswered questions as above).

  Changed 5 years ago by pgf

i can usually get recalibration to occur pretty easily by using two fingers, and using them to tap repeatedly, some distance apart. (no swiping motion)

  Changed 5 years ago by pgf

  • status changed from new to closed
  • resolution set to fixed

we now understand this bug. it was a regression, introduced at approximately q2e16, when the EC firmware was modified with code from quanta to support the new keyboard controller. that code contained a register change which "adds a 16us delay on some parts of the ps2 protocol and was screwing up the reception of 0xfa (acks) from the touchpad". (that's a quote from richard's commit for the fix, which was introduced in q2e34.)

the OLPC kernel tree has a workaround for the recalibration failed issue in the form of forced retries. that workaround is not upstream, but it now doesn't seem that it needs to be, given the firmware fix.
