Ticket #7893 (new defect)

Opened 11 months ago

Last modified 9 months ago

Presence service gets confused in simple mesh

Reported by: martin.langhoff Owned by: daf
Priority: high Milestone: 9.1.0
Component: telepathy-other Version: not specified
Keywords: joyride-2270 blocks-:8.2.0 Cc: morgs, jg, dsd, aly, joe, Collabora
Action Needed: diagnose Verified: no
Deployments affected: Blocked By:
Blocking: #7417

Description

In a simple mesh with 5 XOs, all running joyride-2270, after ~2 hours of interacting over the network and with various reboots of the machines...

Shared activities of a given XO continue to appear in the neighbourhood view of the XO itself, even right after a reboot with no activities open! Some of those activities had been closed gracefully; others had been killed, or the machine had been powered off ungracefully.

In general, the quality of the mesh interactions worsened significantly, and the neighbourhood/friends views were completely out of sync with reality.

This seems to lack a way to revalidate or reset the status of the presence service.

Attachments

logs.SHF7200007E.2008-09-16.10-23-57.tar.bz2 (285.3 kB) - added by gdesmott 10 months ago.

Change History

  Changed 11 months ago by morgs

  • cc morgs added

follow-up: ↓ 4   Changed 11 months ago by kimquirk

  • cc jg, dsd added

I agree with this as I have now seen it in the 20+ laptop testing we are doing here. When we first start everything up, the neighborhood view is good... and then hours later or after some sharing, you can't rely on the view.

Does a restart of sugar (not a full reboot) kick the presence?

  Changed 11 months ago by kimquirk

  • keywords blocks?:8.2.0 added
  • priority changed from normal to high

in reply to: ↑ 2   Changed 11 months ago by tomeu

Replying to kimquirk:

Does a restart of sugar (not a full reboot) kick the presence?

Killing X should restart the presence service, yeah.

  Changed 11 months ago by cjb

But Martin said that even a full reboot of the affected machines didn't help.

  Changed 11 months ago by kimquirk

  • keywords blocks:8.2.0 added; blocks?:8.2.0 removed

  Changed 11 months ago by cjb

Collabora folks, we'd like you to look at/reproduce this if you can.

  Changed 11 months ago by kimquirk

Anyone else who sees this - please include logs.

  Changed 11 months ago by morgs

http://wiki.laptop.org/go/Telepathy_debugging explains how to turn on the appropriate logging.

follow-up: ↓ 14   Changed 11 months ago by daf

The first question that springs to mind is: what changed? Do we know which build this behaviour started manifesting itself in? That might give us a clue as to what's causing it.

We'll try reproducing it in Cambridge UK.

  Changed 10 months ago by mstone

  • next_action changed from never set to reproduce

  Changed 10 months ago by martin.langhoff

The Wellington testers team did try to repro it, and had some success, but the test run saw lots of apps crashing due to the sound-hard-lockup and the "crashes when loses focus" problem (#8072).

So the logs we have are spotty and I doubt they are useful.

Is there a rough script you'd like us to follow to narrow down on reproducibility?

  Changed 10 months ago by martin.langhoff

  • cc aly added

in reply to: ↑ 10   Changed 10 months ago by carrano

Replying to daf:

The first question that springs to mind is: what changed? Do we know which build this behaviour started manifesting itself in? That might give us a clue as to what's causing it. We'll try reproducing it in Cambridge UK.

In terms of having outdated information on the mesh view, I don't think that we ever got this working properly. So, I don't think that anything has to change for us to get inconsistent data.

  Changed 10 months ago by kimquirk

  • owner changed from Collabora to joe

Can we recreate this on a simple mesh in a non-RF-busy environment with 10 laptops? (Joe)

  Changed 10 months ago by joe

  • owner changed from joe to sjoerd
  • next_action changed from reproduce to diagnose

Tested with 8.2-759 in a 10-laptop testbed on a mesh network. Got results similar to Martin's.

Joe

  Changed 10 months ago by joe

  • cc sjoerd added

  Changed 10 months ago by joe

  • cc joe added

  Changed 10 months ago by sjoerd

  • owner changed from sjoerd to daf

  Changed 10 months ago by gdesmott

Martin: Could you tell me a bit more about this problem? Activity announcements are known to be buggy (#8441). Did you observe situations where *buddies* in the mesh view didn't match reality?

Note that you can use "sugar-xos" to display all the buddies known by the PS (you need at least 8.1-760).

  Changed 10 months ago by martin.langhoff

Hi! I've seen #8441 quite frequently, and I have seen buddies that were off or not around appearing as "current" in the listing, and I've also seen "missing" buddies that others could see. All of this in simple mesh mode.

Good to hear we have sugar-xos now. Will use it in the next round of testing :-)

  Changed 10 months ago by gnu

  • blocking 7417 added

  Changed 10 months ago by gdesmott

  • cc Collabora added; sjoerd removed

  Changed 10 months ago by gdesmott

I think I finally reproduced this bug.

This XO (olpc-or) doesn't see "olpc-bg", which is connected and whose service is announced.

-bash-3.2# olpc-xos 
Time   : 10:44:24
Total  : 2

10.0.0.181 	 olpc-or
--> TOTAL: 1
-bash-3.2# olpc-xos -avahi | grep olpc-
d48ee750@xo-0C-E8-0B	169.254.5.155	olpc-bg
0c4328f9@xo-03-55-EC	10.0.0.181	olpc-or
0c4328f9@xo-03-55-EC	169.254.7.153	olpc-or

So olpc-bg's service is only announced on the mesh interface, which is normal, as olpc-bg is not connected to the AP.

-bash-3.2# avahi-browse _presence._tcp | grep d48ee750
+ msh0 IPv4 d48ee750@xo-0C-E8-0B                          iChat Presence       local

Attaching logs of olpc-or.
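
For anyone repeating this check on more machines, the comparison above can be sketched in a few lines of Python. This is only a rough helper, not part of olpc-xos or the presence service: the function names are made up for illustration, and the parsing assumes the two outputs keep exactly the column layout pasted above (IP then nick for the PS listing; id@host, IP, nick for the Avahi listing).

```python
# Hypothetical helper: compare the buddies known to the PS with those
# announced via Avahi, using text shaped like the transcripts above.

def parse_ps_buddies(text):
    """Collect nicks from lines shaped like '<IPv4> <nick>'."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        # Header/footer lines ('Time : ...', '--> TOTAL: 1') have 3 fields.
        if len(parts) == 2 and parts[0].count(".") == 3:
            names.add(parts[1])
    return names


def parse_avahi_buddies(text):
    """Collect nicks from lines shaped like '<id>@<host> <IPv4> <nick>'."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and "@" in parts[0]:
            names.add(parts[2])
    return names


def missing_from_ps(ps_text, avahi_text):
    """Nicks announced via Avahi that the PS listing does not show."""
    return parse_avahi_buddies(avahi_text) - parse_ps_buddies(ps_text)


# Sample data copied from the transcripts in this comment.
PS_OUTPUT = """Time   : 10:44:24
Total  : 2

10.0.0.181 \t olpc-or
--> TOTAL: 1
"""

AVAHI_OUTPUT = """d48ee750@xo-0C-E8-0B\t169.254.5.155\tolpc-bg
0c4328f9@xo-03-55-EC\t10.0.0.181\tolpc-or
0c4328f9@xo-03-55-EC\t169.254.7.153\tolpc-or
"""

print(sorted(missing_from_ps(PS_OUTPUT, AVAHI_OUTPUT)))
```

A non-empty result means a buddy is announced on the network but not shown by the PS, which is the symptom described here.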

  Changed 10 months ago by gdesmott

Actually this bug could be a consequence of #8441. Because of #8441, Salut will potentially announce a lot of _olpc-activity1._udp services, which could make Avahi reach its service limits and so make it impossible to announce our presence. We should try to reproduce again once #8441 is fixed (very soon, hopefully).
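
One rough way to test this theory on an affected XO is to count how many activity services are currently announced per interface. The sketch below assumes the semicolon-separated layout of avahi-browse's parsable mode (event;interface;protocol;name;type;domain) and that names contain no unescaped semicolons. The function name and the sample service names are invented for illustration, and the sketch does not know what the actual Avahi limits are.

```python
from collections import Counter

# Hypothetical helper: count services still announced per interface,
# given text in the assumed parsable layout
#   event;interface;protocol;name;type;domain
# where event is '+' (service appeared) or '-' (service removed).

def live_services_per_interface(parsable_text):
    live = set()
    for line in parsable_text.splitlines():
        fields = line.split(";")
        if len(fields) < 6:
            continue  # skip blank or unrelated lines
        event, key = fields[0], tuple(fields[1:6])
        if event == "+":
            live.add(key)
        elif event == "-":
            live.discard(key)
    # The first element of each key is the interface name.
    return Counter(key[0] for key in live)


# Made-up sample input, not taken from a real log.
SAMPLE = "\n".join([
    "+;msh0;IPv4;activity-a;_olpc-activity1._udp;local",
    "+;msh0;IPv4;activity-b;_olpc-activity1._udp;local",
    "+;msh0;IPv4;activity-c;_olpc-activity1._udp;local",
    "+;eth0;IPv4;activity-a;_olpc-activity1._udp;local",
    "-;msh0;IPv4;activity-c;_olpc-activity1._udp;local",
])

for iface, count in sorted(live_services_per_interface(SAMPLE).items()):
    print(iface, count)
```

If the count keeps growing on a machine with no activities shared, that would be consistent with stale announcements piling up as described in #8441.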

  Changed 10 months ago by gdesmott

#8441 should be fixed in Joyride 2452. Could you retry with that version please?

I manually installed the new Salut package on 8.2-760 and haven't been able to reproduce this bug since.

  Changed 9 months ago by gdesmott

It would be good to know whether anyone still experiences this problem with 8.2-766.

  Changed 9 months ago by gregorio

  • keywords blocks-:8.2.0 added; blocks:8.2.0 removed
  • milestone changed from 8.2.0 (was Update.2) to 9.1.0