Opened 6 years ago

Last modified 5 years ago

#7788 new defect

Touchpad behavior deteriorates under joyride-2212, joyride-2230

Reported by: tvoverbeek Owned by: dilinger
Priority: blocker Milestone: 8.2.0 (was Update.2)
Component: kernel Version: olpc-3
Keywords: 8.2.0:? blocks-:8.2.0 Cc: dsd, dsaxena, smithbone, holt
Blocked By: Blocking:
Deployments affected: Action Needed: diagnose
Verified: no

Description

Compared with Update.1 (Build 708), the touchpad behaves much worse under joyride-2212 and joyride-2230. After a few minutes the cursor starts to jump around and move erratically. The four-finger salute helps, but only temporarily.
Booting back into Build 708 shows stable behavior.
Is this only me, or is it generic?

Attachments (11)

dmesg.log (92.6 KB) - added by tvoverbeek 6 years ago.
dmesg output on joyride-2280
dmesg2298-1.log (39.4 KB) - added by tvoverbeek 6 years ago.
dmesg2298-2.log (39.2 KB) - added by tvoverbeek 6 years ago.
dmesg2298-3.log (46.2 KB) - added by tvoverbeek 6 years ago.
dmesg2298-4.log (41.8 KB) - added by tvoverbeek 6 years ago.
dmesg2298-5.log (42.4 KB) - added by tvoverbeek 6 years ago.
dmesg708.log (62.2 KB) - added by tvoverbeek 6 years ago.
080816.tar.bz2 (132.5 KB) - added by tvoverbeek 6 years ago.
log files 2008-08-16
080817.tar.bz2 (122.5 KB) - added by tvoverbeek 6 years ago.
log files 2008-08-17
080820.tar.bz2 (26.9 KB) - added by tvoverbeek 6 years ago.
Log files 2008-08-20
080821.tar.bz2 (86.1 KB) - added by tvoverbeek 6 years ago.
Log files 2008-08-21

Change History (41)

comment:1 Changed 6 years ago by auser

While I did not boot back into 708 to double-check, I can say that the touchpad became much more jumpy with 2230 for me as well.

comment:2 Changed 6 years ago by dsd

  • Cc dsd added
  • Keywords 8.2.0:? added

comment:3 Changed 6 years ago by tvoverbeek

  • Keywords blocks?:8.2.0 added
  • Version changed from not specified to olpc-3

Just stumbled over #7341. This might be a duplicate of #7341. If so, please close it.

Changed 6 years ago by tvoverbeek

dmesg output on joyride-2280

comment:4 follow-ups: Changed 6 years ago by tvoverbeek

I have attached the output of dmesg on a joyride-2280 run.
The first part is on battery only, with battery almost empty.
The last part has mains power plugged in.
Note the extended EC traffic every 100 sec. At the end (while plugged in) the message sequence is much shorter.
Note also the nested touchpad calibration messages at e.g. 213 sec (short)
and 850 (long), 1476 (very long with lost synchronization)

comment:5 Changed 6 years ago by dsaxena

  • Cc dsaxena added

comment:6 follow-up: Changed 6 years ago by kimquirk

  • Action Needed changed from never set to diagnose

we need to monitor this as we believe we've made significant improvements. needs some thoughts on how to quantify 'better' or 'worse' for the touchpad performance.
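
One rough way to put a number on 'better' or 'worse' would be to count recalibration events per minute of uptime in each dmesg log. The sketch below assumes the usual `[ seconds.micros]` printk timestamp prefix and assumes that recalibration lines contain the substring "recalibrat". That marker and the sample lines are hypothetical, not the driver's actual message text, so adjust them to whatever the attached logs show.

```python
import re

# Standard printk timestamp prefix, e.g. "[  213.456789] ..."
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def recalibrations_per_minute(lines, marker="recalibrat"):
    """Return (events per minute, list of event timestamps in seconds).

    `marker` is an ASSUMED substring identifying a recalibration
    message; it is not taken from the real driver output.
    """
    event_times = []
    last_time = 0.0
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp
        last_time = float(match.group(1))
        if marker in line.lower():
            event_times.append(last_time)
    if last_time == 0.0:
        return 0.0, event_times
    return len(event_times) / (last_time / 60.0), event_times


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "[   10.000000] boot message",
        "[   60.000000] hypothetical: recalibrating touchpad",
        "[  120.000000] some other message",
    ]
    rate, times = recalibrations_per_minute(sample)
    print(f"{rate:.2f} recalibrations/minute at {times}")
```

Comparing this rate across builds, with the same activity and the same run length, would give at least a crude metric; it says nothing about jumps the driver fails to detect.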

comment:7 Changed 6 years ago by dsaxena

  • Cc smithbone added

comment:8 in reply to: ↑ 4 Changed 6 years ago by dsaxena

Replying to tvoverbeek:

I have attached the output of dmesg on a joyride-2280 run.
The first part is on battery only, with battery almost empty.
The last part has mains power plugged in.
Note the extended EC traffic every 100 sec. At the end (while plugged in) the message sequence is much shorter.
Note also the nested touchpad calibration messages at e.g. 213 sec (short)
and 850 (long), 1476 (very long with lost synchronization)

So the issue we have here is that if we are in the middle of recalibrating and the user touches the mouse, we need to start all over again. Were you moving the mouse around a lot when you saw the nested recalibrations? If not, that means you were seeing completely spurious packets from the mouse. :(

We've heard nothing but goodness about the new driver so far, so this is a bit worrisome.

comment:9 Changed 6 years ago by mstone

tvoverbeek will spend some more time looking at this, perhaps with smithbone's help. smithbone would like us to find him access to the equipment required to do controlled experiments. jg would like us to fix #1407. Otherwise, nothing new is known.

Finally, dsaxena: there have been other reports of touchpad deterioration. Search trac.

comment:10 in reply to: ↑ 4 ; follow-up: Changed 6 years ago by dilinger

Replying to tvoverbeek:

I have attached the output of dmesg on a joyride-2280 run.
The first part is on battery only, with battery almost empty.
The last part has mains power plugged in.
Note the extended EC traffic every 100 sec. At the end (while plugged in) the message sequence is much shorter.
Note also the nested touchpad calibration messages at e.g. 213 sec (short)
and 850 (long), 1476 (very long with lost synchronization)

The resync thing looks like a race that was fixed here:

http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=91414e51364fbb8ca855ea37b4d66f8ce9c19aed

As for the nested touchpad calibration messages: you need to stop using the touchpad for 2s while it's recalibrating, otherwise it will keep recalibrating.

comment:11 Changed 6 years ago by dilinger

Also, note that the recalibration stuff is in response to the cursor behaving erratically. The jumpy cursor is a hardware bug, and the touchpad driver versions should not make a difference at all (other than the current driver actually forcing the hardware to fix itself, and the older driver attempting to do so far less successfully).

I'm not convinced that switching back to 708 and it working better is anything other than luck, unless you can reliably prove that (say, switching back and forth between the two images 20 times and always seeing similar behavior).
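
To illustrate why a handful of runs is weak evidence, here is a minimal sketch of a one-sided Fisher exact test on "bad run" counts for two images. The function name and the idea of labelling each run simply good or bad are assumptions made for illustration; nobody in this ticket proposed this particular test.

```python
from math import comb


def fisher_one_sided(bad_a, good_a, bad_b, good_b):
    """Probability of seeing at least `bad_a` bad runs on image A
    purely by chance, if both images were equally likely to misbehave
    and the total number of bad runs were fixed.
    """
    runs_a = bad_a + good_a
    runs_b = bad_b + good_b
    total_bad = bad_a + bad_b
    denominator = comb(runs_a + runs_b, total_bad)
    p_value = 0.0
    # Sum the hypergeometric probabilities of every outcome at least
    # as lopsided as the observed one.
    for k in range(bad_a, min(runs_a, total_bad) + 1):
        p_value += comb(runs_a, k) * comb(runs_b, total_bad - k) / denominator
    return p_value


if __name__ == "__main__":
    # 5 bad joyride runs versus 1 good 708 run
    print(fisher_one_sided(5, 0, 0, 1))
    # 5 bad joyride runs versus 5 good 708 runs
    print(fisher_one_sided(5, 0, 0, 5))
```

Under these assumptions a single good 708 run against five bad joyride runs could still easily be luck, while five against five would be much harder to explain away.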

comment:12 in reply to: ↑ 10 Changed 6 years ago by dsaxena

Replying to dilinger:

Replying to tvoverbeek:

I have attached the output of dmesg on a joyride-2280 run.
The first part is on battery only, with battery almost empty.
The last part has mains power plugged in.
Note the extended EC traffic every 100 sec. At the end (while plugged in) the message sequence is much shorter.
Note also the nested touchpad calibration messages at e.g. 213 sec (short)
and 850 (long), 1476 (very long with lost synchronization)

The resync thing looks like a race that was fixed here:

http://dev.laptop.org/git?p=olpc-2.6;a=commitdiff;h=91414e51364fbb8ca855ea37b4d66f8ce9c19aed

I'll pull this into the testing kernel.

Changed 6 years ago by tvoverbeek

Changed 6 years ago by tvoverbeek

Changed 6 years ago by tvoverbeek

Changed 6 years ago by tvoverbeek

Changed 6 years ago by tvoverbeek

Changed 6 years ago by tvoverbeek

comment:13 Changed 6 years ago by tvoverbeek

I downloaded joyride-2298 which I believe has the fix for the synchronization race included in the kernel (correct?).
I did 5 runs with joyride 2298. Before each run I powered down, removed the battery, reinserted it and then booted. Each run showed the jumpy touchpad behavior. See the corresponding 5 logs: dmesg2298-1.log .. dmesg2298-5.log.
Then I booted back into Update.1 (708), again after powering down, removing and reinserting the battery. In this run no touchpad problems. In the dmesg708.log there are no recalibration messages.

So I maintain my point that for me touchpad behavior has deteriorated in 8.2.

What else do you want me to test (or repeat)?

comment:14 follow-up: Changed 6 years ago by dilinger

Thanks for the logs!

The lack of recalibration messages in 708 does not mean that the cursor isn't messed up. When you boot and use 708, is the mouse cursor jumpy at all?

There is a lot of strangeness in your 2298 logs. The fact that 3/5 times, the driver detects miscalibration at around the same time (160s after boot) makes me wonder what's happening there. What point in the boot sequence are you at when it happens? Is sugar completely up? Are you launching any activities, or doing anything else other than moving the cursor around?

The eth0/msh0 messages make me think that NetworkManager has just come up, so the hardware was miscalibrated right from the start. I wonder if the large amount of EC commands happening during boot are screwing something up.. I'm also wondering wtf the PCI EHCI messages are about.

Do you keep your finger on the touchpad during bootup? Also, when you see the miscalibration errors in the logs (and you remove your fingers from the touchpad for a few seconds), does the touchpad driver fix itself, or is the cursor still jumpy afterwards?

comment:15 in reply to: ↑ 6 Changed 6 years ago by AlbertCahalan

Replying to kimquirk:

we need to monitor this as we believe we've made significant improvements. needs some thoughts on how to quantify 'better' or 'worse' for the touchpad performance.

fake finger attached to a motor, continuously going around in a circle

comment:16 in reply to: ↑ 14 ; follow-up: Changed 6 years ago by tvoverbeek

Replying to dilinger:

Thanks for the logs!

The lack of recalibration messages in 708 does not mean that the cursor isn't messed up. When you boot and use 708, is the mouse cursor jumpy at all?

No, under 708 I get no noticeable cursor jumps.

There is a lot of strangeness in your 2298 logs. The fact that 3/5 times, the driver detects miscalibration at around the same time (160s after boot) makes me wonder what's happening there. What point in the boot sequence are you at when it happens? Is sugar completely up? Are you launching any activities, or doing anything else other than moving the cursor around?

When booting I wait till the home view is complete (circle view with activities) without touching the touchpad. Then I start a terminal session and start Memorize. Memorize
requires a lot of mouse movement. Play a while in Memorize until the cursor starts jumping.
Then switch to Bounce and play a while.
At 160 sec I am in Memorize.

The eth0/msh0 messages make me think that NetworkManager has just come up, so the hardware was miscalibrated right from the start. I wonder if the large amount of EC commands happening during boot are screwing something up.. I'm also wondering wtf the PCI EHCI messages are about.

The EHCI messages only started to show up with the most recent kernels.
I cannot give you a specific date/version. Sorry.
Also I am in a quite noisy WiFi environment at home. There are more than 15 WiFi networks in the
neighborhood view. My home access point has WPA-PSK. Sometimes the XO prompts me for the
access point password. I cancel the dialog and then go to the neighborhood view, select my network, click Connect again and it associates without prompting for the password again.
Might have nothing to do with this.

Do you keep your finger on the touchpad during bootup? Also, when you see the miscalibration errors in the logs (and you remove your fingers from the touchpad for a few seconds), does the touchpad driver fix itself, or is the cursor still jumpy afterwards?

As written above, no finger on the touchpad during boot. After recalibration it fixes itself
for a short while, but then starts jumping again (as you can see in the logs).

Saw Richard Smith's message about the new firmware with faster/better EC handling.
My XO firmware is on Q2E12. Could the interaction between recent firmwares (Q2E12) and 2.6.25
kernel scheduling have something to do with this?
Should I try again when Q2E13 shows up in joyride?


Cut and pasted this from my reply on devel@. Why don't replies regarding bugs sent by email
get added to trac automatically?

comment:17 in reply to: ↑ 16 Changed 6 years ago by rsmith

Replying to tvoverbeek:

As written above, no finger on the touchpad during boot. After recalibration it fixes itself
for a short while, but then starts jumping again (as you can see in the logs).

So far nobody has been able to duplicate your findings. Using your additional info we can try again. I'd like you to keep repeating your tests daily, i.e. run the test under joyride 5 times, switch back to 708 and run it 5 times there, then go back to joyride and test again. If that pattern continues to hold true then congratulations, you just became the touchpad testbed. So far it has never held up to long-term repeatability.

There is also another test. The new touchpad driver was backported to the older kernel.

From deepak:

================
I have built an RPM with the 2.6.22 kernel + driver backport that folks
running <= 703 can use for this purpose:

http://dev.laptop.org/~dsaxena/kernel-2.6.22-20080710.1.olpc.0.i586.rpm
================

Dunno if this will actually work on 708 but it's worth a try.

Saw Richard Smith's message about the new firmware with faster/better EC handling.
My XO firmware is on Q2E12. Could the interaction between recent firmwares (Q2E12) and 2.6.25
kernel scheduling have something to do with this?
Should I try again when Q2E13 shows up in joyride?

Many EC commands in a row will indeed affect the touchpad data stream. The old EC command speed would have the kernel blocked for 20 ms at a time. During that time it can't read pad data from the EC. Since the pad sends an update every 12 ms, the data either has to be stored or discarded. The pad appears to discard the data. So if this is related to all the polling of battery data that HAL is doing, then yes, the new EC firmware might make a difference.
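
As a back-of-the-envelope check of those numbers, the sketch below estimates how many pad packets fall inside one blocked window. It assumes strictly periodic packets and no buffering anywhere, which is a simplification; the function names and the burst size of 10 commands are made up for illustration.

```python
import math


def packets_in_window(block_ms, period_ms):
    """Return (min, max) packets arriving during one blocked window.

    Depending on where the window starts relative to the packet
    stream, a window of `block_ms` catches either the floor or the
    ceiling of block_ms / period_ms packets.
    """
    ratio = block_ms / period_ms
    return math.floor(ratio), math.ceil(ratio)


def burst_loss_estimate(commands, block_ms, period_ms):
    """Average packets lost for `commands` back-to-back EC commands,
    ASSUMING every packet arriving while blocked is discarded.
    """
    return commands * block_ms / period_ms


if __name__ == "__main__":
    low, high = packets_in_window(20, 12)
    print(f"one 20 ms command: {low} to {high} packets at risk")
    print(f"burst of 10 commands: ~{burst_loss_estimate(10, 20, 12):.1f} packets")
```

So each old-speed EC command would cost at least one pad update, and a long polling burst would punch a visible hole in the motion data.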

I thought the new joyrides had the HAL polling fixed.


Cut and pasted this from my reply on devel@. Why don't replies regarding bugs sent by email
get added to trac automatically?

Our trac does not have an e-mail interface. I've seen some traffic about a new module that does this but we don't have it installed. We may wait until after 8.2 before we mess with trac.

comment:18 Changed 6 years ago by tvoverbeek

Here are today's logs, collected into 080816.tar.bz2.
These were done after going back to Q2E12 firmware. With Q2E13 the mouse was not usable in 703.
There are 3 runs on build 708 and three runs on joyride-2302.
Each run was done while exercising Bounce (3Dpong).
Before each run the power was reset (power off, battery removed and reinserted).
The logs ending in .0.log are taken without setting /sys/module/psmouse/parameters/tpdebug to 1.
The *.1.log files have tpdebug set to 1.
The touchpad behavior was normal under 703.
The touchpad seems to behave better with tpdebug set to 1 under joyride-2302.
There are again a number of recalibrations in the 2302-x.0.log files
So it smells like some marginal timing issue somewhere.
I'll try to do another set of runs tomorrow.

Changed 6 years ago by tvoverbeek

log files 2008-08-16

Changed 6 years ago by tvoverbeek

log files 2008-08-17

comment:19 follow-up: Changed 6 years ago by tvoverbeek

Here are today's (Aug 17) log files. There are again 3 runs on both Update.1 (708) and joyride-2302.
Each run again with 2 log files as yesterday.
The first run felt pretty good, even on joyride-2302.
The second run after a battery recharge was pretty lousy. Needed the 4-finger salute.
Even the 3rd run was not very good. Even in Update.1 there were jumps. Some of them were caught in the dmesg708-3.1.log file: there are regularly 2 consecutive packets with x=0, y=0 in between 'normal' packets. The following 2302 run started very badly. After the 4-finger salute everything worked nicely again. Turning on tpdebug then made things worse again.

Especially the last run makes me believe that we have an issue with timing between the EC firmware and the driver. In 2302 the packet frequency is twice that in 708. So 2302 is more sensitive to correct timing, which would explain the worse initial results in 2302 in the 3rd run.
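
The "2 consecutive packets with x=0, y=0" pattern mentioned above could be flagged mechanically once the packets are extracted from a tpdebug log. The sketch below works on a plain list of (x, y) tuples; how to parse those out of the real log is not shown because the log format is not reproduced in this ticket, and the sample data is invented.

```python
def find_zero_runs(packets, min_run=2):
    """Return (start_index, length) for each run of at least `min_run`
    consecutive (0, 0) packets in `packets`, a list of (x, y) tuples.
    """
    runs = []
    start = None
    for index, (x, y) in enumerate(packets):
        if x == 0 and y == 0:
            if start is None:
                start = index  # a new run of zero packets begins
        else:
            if start is not None and index - start >= min_run:
                runs.append((start, index - start))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(packets) - start >= min_run:
        runs.append((start, len(packets) - start))
    return runs


if __name__ == "__main__":
    # Invented sample: one pair of zero packets, then a lone zero.
    sample = [(10, 10), (11, 10), (0, 0), (0, 0), (12, 11), (0, 0), (13, 11)]
    print(find_zero_runs(sample))
```

Counting such runs per log would make it easier to compare 708 and 2302 than reading the raw packet dumps by eye.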

comment:20 in reply to: ↑ 19 ; follow-up: Changed 6 years ago by rsmith

Especially the last run makes me believe that we have an issue with timing between the EC firmware and the driver. In 2302 the packet frequency is twice that in 708. So 2302 is more sensitive to correct timing, which would explain the worse initial results in 2302 in the 3rd run.

Try my latest and greatest test firmware http://dev.laptop.org/~rsmith/q20158.rom

comment:21 in reply to: ↑ 20 ; follow-ups: Changed 6 years ago by tvoverbeek

Replying to rsmith:
Updated to firmware q2e13d.
Did 2 runs on 708 and 2203.
708 worked fine with this firmware. 2203 still showed jumps and recalibrations, but it surely felt it was behaving better.
Attached today's log files in 080820.tar.bz2.
Waiting now till Deepak's psmouse driver change makes it into joyride/8.2.

Changed 6 years ago by tvoverbeek

Log files 2008-08-20

comment:22 in reply to: ↑ 21 ; follow-up: Changed 6 years ago by rsmith

Waiting now till Deepak's psmouse driver change makes it into joyride/8.2.

It's in. Use the latest joyride.

comment:23 in reply to: ↑ 21 ; follow-up: Changed 6 years ago by pgf

Replying to tvoverbeek:

Replying to rsmith:
Updated to firmware q2e13d.
Did 2 runs on 708 and 2203.
708 worked fine with this firmware. 2203 still showed jumps and recalibrations, but it surely felt it was behaving better.

i assume you meant "2302", above, correct?

i also assume (but feel free to verify :-) that you're careful when using the touchpad that one and only one finger or thumb is on or near the pad, correct? multiple touches (i get this if my thumb is lazy and dangles) can easily cause a perceived big jump, and therefore a recal.

(and thanks, btw, for your persistent testing.)

comment:24 in reply to: ↑ 22 ; follow-up: Changed 6 years ago by tvoverbeek

Replying to rsmith:

Waiting now till Deepak's psmouse driver change makes it into joyride/8.2.

It's in. Use the latest joyride.

Are you sure?
kernel package in both joyride-2302 and latest one is ...20080813.4.olpc.cc86...
Original commit in kernel git was July 31 and in the testing head (which is what I am assuming you are using for the joyride builds) the commit is Aug 13, which is about the time the troubles with koji started.
If it really was already in 2302, I have been using it all along.
Will try to do some more testing and also generate logs with tpdebug=1 with firmware q2e13d.

comment:25 in reply to: ↑ 23 Changed 6 years ago by tvoverbeek

Replying to pgf:

i assume you meant "2302", above, correct?

Yes

i also assume (but feel free to verify :-) that you're careful when using the touchpad that one and only one finger or thumb is on or near the pad, correct? multiple touches (i get this if my thumb is lazy and dangles) can easily cause a perceived big jump, and therefore a recal.

Yes, I am aware of this touchpad behavior, so I am trying very hard to have only
one finger on the pad ;-).

comment:26 in reply to: ↑ 24 Changed 6 years ago by rsmith

Replying to tvoverbeek:

Replying to rsmith:

Waiting now till Deepak's psmouse driver change makes it into joyride/8.2.

It's in. Use the latest joyride.

Are you sure?
kernel package in both joyride-2302 and latest one is ...20080813.4.olpc.cc86...

Yep.

Patch is here:

http://dev.laptop.org/git?p=olpc-2.6;a=commit;h=cc866cfe0c31220bd03a44e6c5d9e86decd63aaa

It hit joyride in 2298:

joyride build 2298 (pkgs)

Size delta: 0.00M

-kernel 2.6.25-20080812.3.olpc.9b42ff8eb9564f7
+kernel 2.6.25-20080813.4.olpc.cc866cfe0c31220

comment:27 Changed 6 years ago by tvoverbeek

OK, so I have been testing with the psmouse driver change all along.
Here are the promised logs (080821.tar.bz2), with tpdebug=1 also included.
There were 2 runs on 708 and 2 runs on 2302.
Each run with 2 logs: the x.0.log with tpdebug=0 and the x.1.log with tpdebug=1.
The results are the usual again: 708 behaving better than 2302, although there were some good stretches on 2302.
It seems it starts out OK after a reboot and deteriorates later. A 4-finger salute usually helps for a while.
This time the runs started with battery half full and almost empty at the last run.
Runs were done in order 2302-1, 708-1, 2302-2, 708-2.
I'll leave it at this, unless you suggest some more/other tests.

Changed 6 years ago by tvoverbeek

Log files 2008-08-21

comment:28 Changed 6 years ago by cjb

  • Keywords blocks-:8.2.0 added; blocks?:8.2.0 removed

We'll consider this if we get more reports of regression.

comment:29 Changed 6 years ago by dsd

I view this as a regression. I have only ever seen a touchpad jump once under 8.1, but under 8.2 it jumps all the time. I made a bridge-building activity at the weekend (http://wiki.laptop.org/go/Bridge) and I am unable to build and bolt a bridge on the XO running 8.2 without seeing 3 or more jumps during the process. Feel free to come to my desk and try it...

The recalibration stuff does seem to work (hands off for a few seconds), but the number of recalibrations needed seems excessive.

comment:30 Changed 6 years ago by holt

  • Cc holt added