Ticket #10878 (new defect)

Opened 3 years ago

Last modified 3 years ago

Aggressive power management hindering XO collaboration

Reported by: sridhar
Owned by:
Priority: normal
Milestone: Future Release
Component: not assigned
Version: 1.5/1.0 Software Build os860 aka 10.1.3
Keywords:
Cc: sridhar, sascha_silbe
Action Needed: never set
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

It appears to us that the aggressive power management enabled on the XO can sometimes create confusion when children are collaborating on activities.

Take for example a turn-based game like Memorise. If a child is waiting for their turn, they might leave the XO untouched. In that time, power management can kick in, and the XO stops communicating over the wireless network. Waking up the XO (e.g. by touching the pad) doesn't always rejoin the XO to the game properly. The whole game is stalled because the turn cannot be completed.

Are there any ways we can manage this in our deployments? Perhaps some guidelines to give to teachers so that they have a reasonable expectation?

Change History

Changed 3 years ago by sridhar

This is with OLPC OS 10.1.3 (XO-AU 10.1.3-au2 - we haven't made any changes that would affect reliability of collaboration). The XOs were registered and using an XS.

Workaround 1: prevent power management from starting by maintaining keyboard/trackpad activity

Workaround 2: turn off power management completely in My Settings

Workaround 3: active player should exit and save, then open the saved Journal entry and share again

Downstream report: http://dev.laptop.org.au/issues/636

Changed 3 years ago by Quozl

Is this with or without a school server? Does it work for you in 11.2.0 development builds? Any fix will be there, and that's where current work is focused. We do not plan an update to 10.1.3. See also #9854 and http://wiki.laptop.org/go/Release_notes/10.1.3#Restoring_wireless_connection_takes_too_long

Changed 3 years ago by Quozl

Was with a school server, per mailing list posting.

Changed 3 years ago by sridhar

Yes, it was with a school server.

This one is hard to troubleshoot because we were seeing this in a remote school (so we can't just go there and try new builds) and the problem was intermittent. I can say that the wireless was connected (or at least was reported as connected by Sugar). We gave it plenty of time to connect to the game. According to the XO, it was still the previous player's turn. According to the other XOs, it was that player's turn. The game was halted as a result.

I'll provide more information as it comes.

Changed 3 years ago by Quozl

Were you able to reproduce it locally?

Retransmissions on the wireless network due to noise, distance, or faulty devices can produce the same symptom in conjunction with power management ... such as delays obtaining a DHCP lease on resume. It isn't clear yet that the symptom is entirely due to power management; even if disabling power management is a viable workaround, that doesn't exclude other factors from contributing.

Changed 3 years ago by greenfeld

This is wild speculation, but how long are the kids waiting between turns before the connection stalls?

Something I've noticed previously with XO laptops is that they do not seem to be configured to wake up on ARP requests. If a system knows an XO's Ethernet address<->IP mapping, it can wake the XO up with unicast data; but if it doesn't know the mapping, the XO will never respond.

So if the TCP connection between the school server and the laptop is not kept alive with keepalives or data before it dies, or if the ARP entry expires out of the ARP cache for some reason, no system may know how to contact the XO.
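For illustration, here is a minimal sketch of the "kept alive with keepalives" idea, assuming a Linux host and Python's standard `socket` module. It turns on TCP keepalive probes so an idle connection keeps carrying periodic traffic instead of going silent. The helper name `enable_keepalive` and the timing values are assumptions chosen for the example, not settings taken from the XO or XS software. Whether the probes actually reach a suspended XO still depends on unicast frames reliably waking it.

```python
import socket

# Assumed values for illustration only; not taken from XO/XS configuration.
KEEPALIVE_IDLE_S = 60      # idle seconds before the first keepalive probe
KEEPALIVE_INTERVAL_S = 15  # seconds between unanswered probes
KEEPALIVE_COUNT = 4        # unanswered probes before the kernel drops the connection


def enable_keepalive(sock: socket.socket,
                     idle: int = KEEPALIVE_IDLE_S,
                     interval: int = KEEPALIVE_INTERVAL_S,
                     count: int = KEEPALIVE_COUNT) -> None:
    """Hypothetical helper: turn on TCP keepalive probes for a TCP socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket timers are platform-specific (these names exist on Linux);
    # fall back to the system-wide defaults where they are unavailable.
    names = ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT")
    if all(hasattr(socket, n) for n in names):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

A server-side process would call `enable_keepalive(conn)` on each accepted connection. Keeping the idle time shorter than the XO's idle-suspend delay is the point of the exercise, but the right numbers would have to come from measurement.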

The downside to waking up to ARP requests is that XOs are likely to wake up significantly more often, if they can get to suspend mode at all.
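The reasoning above can be captured in a toy model. This is a hypothetical sketch: `ArpCache`, `server_can_reach`, and the 60-second TTL are invented for illustration and do not describe how the XO, the XS, or the Linux neighbour table are actually implemented. It shows three cases. A server holding a fresh IP<->MAC entry can still address a sleeping XO with unicast. An expired entry leaves the server stuck if the XO ignores ARP requests. Waking on ARP avoids that trap.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

ARP_TTL_S = 60.0  # hypothetical lifetime of a server-side cache entry


@dataclass
class ArpCache:
    """Toy table kept by the school server: IP -> (MAC, time last confirmed)."""
    ttl: float = ARP_TTL_S
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def learn(self, ip: str, mac: str, now: float) -> None:
        self.entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self.entries.get(ip)
        if entry is None or now - entry[1] > self.ttl:
            return None  # missing or expired: server would have to broadcast ARP
        return entry[0]


def server_can_reach(cache: ArpCache, ip: str, now: float,
                     xo_asleep: bool, wakes_on_arp: bool) -> bool:
    """Can the server get a packet to the XO at time `now` in this toy model?"""
    if not xo_asleep:
        return True  # an awake XO answers ARP requests normally
    if cache.lookup(ip, now) is not None:
        return True  # assumed: a unicast frame to a known MAC wakes the XO
    return wakes_on_arp  # otherwise only an ARP-triggered wake-up helps
```

The model leaves out the cost side. With `wakes_on_arp=True`, every broadcast ARP request on the network for that XO's address would also wake it, which is the extra wake-up traffic described above.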

In any case, running tcpdump on a schoolserver on the network interface with the XOs to obtain a packet capture while reproducing this issue might be useful.

Changed 3 years ago by pgf

The laptops aren't configured for wake-on-ARP because the driver/firmware doesn't support it. The older XO-1 driver might have supported it via the packet signature mechanism, but that was never used, and the mechanism was removed in newer versions of the driver, IIRC.

(And yes, this might be contributing to the current behavior.)

Changed 3 years ago by dsd

Another likely contributor to this issue, especially if it's intermittent, is #9960, where wake-on-wlan frequently fails to wake the system.

Changed 3 years ago by sascha_silbe

  • cc sascha_silbe added

Changed 3 years ago by dsd

  • milestone changed from Not Triaged to Future Release

Another possible contributor is #10912.

My view: we have enough bugs around idle-suspend vs networking that you are definitely going to get frequent problems with such a setup.

Changed 3 years ago by sridhar

Unfortunately I'm not able to reproduce the issue here. I understand that this makes it difficult to solve the problem. I am awaiting feedback from the school where we experienced this problem.

Changed 3 years ago by sridhar

The impact of this problem is large. Teachers are not using collaboration because they cannot depend upon it. We can turn off power management to make it smoother, but that shortens the time the XO can be used in the day.

There is some good discussion in January and February (see the threads titled "Impacts of disabling Automatic Power Management").

A 'perfect' solution is proving hard to come by, and it is quite common for deployments to disable automatic power management. As a workaround, we can suspend automatic power management whenever a collaboration session is active.
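One way to structure the "suspend automatic power management while collaborating" workaround is a reference count. Each shared activity takes an inhibit when it starts sharing and releases it when it stops. Idle-suspend is allowed only when no sessions remain. The sketch below assumes nothing about Sugar's or the power manager's real interfaces. `CollaborationSuspendInhibitor` and its methods are hypothetical. An actual implementation would have to call whatever inhibit mechanism the power manager provides.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class CollaborationSuspendInhibitor:
    """Hypothetical guard: block idle-suspend while any shared session is active."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # sessions may start/stop from different threads
        self._active_sessions = 0

    def session_started(self) -> None:
        with self._lock:
            self._active_sessions += 1

    def session_ended(self) -> None:
        with self._lock:
            if self._active_sessions == 0:
                raise RuntimeError("session_ended() called with no active session")
            self._active_sessions -= 1

    def suspend_allowed(self) -> bool:
        """A power manager would consult this before idle-suspending."""
        with self._lock:
            return self._active_sessions == 0

    @contextmanager
    def session(self) -> Iterator[None]:
        """Hold an inhibit for the duration of one shared activity."""
        self.session_started()
        try:
            yield
        finally:
            self.session_ended()  # released even if the activity crashes
```

Counting sessions, rather than using a single on/off flag, keeps one activity that stops sharing from re-enabling suspend while another shared activity is still running. Battery cost is paid only while a session is actually shared, which addresses the concern that disabling power management outright shortens the XO's usable day.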

Changed 3 years ago by erikos

Hmm, I would be interested in knowing whether disabling automatic power management solves all the collaboration issues (for you), or whether we are mixing several issues and attributing them to one root cause. I agree that power management can have an impact (see #10363 for a detailed discussion and a Sugar workaround that inhibits suspend when collaborating), but I would like to make sure we don't mix several issues.

Changed 3 years ago by sridhar

#10363 probably explains most of our problems, based on what I've seen and heard. We haven't been able to narrow it down any further.
