Opened 6 years ago

Last modified 6 years ago

#7893 new defect

Presence service gets confused in simple mesh

Reported by: martin.langhoff Owned by: daf
Priority: high Milestone: 9.1.0-cancelled
Component: telepathy-other Version: not specified
Keywords: joyride-2270 blocks-:8.2.0 Cc: morgs, jg, dsd, aly, joe, Collabora
Blocked By: Blocking: #7417
Deployments affected: Action Needed: diagnose
Verified: no

Description

In a simple mesh with 5 XOs, all running joyride-2270, after ~2 hours of interacting over the network and with various reboots of the machines...

Shared activities of a given XO continue to appear in the neighbourhood view of the XO itself, even right after a reboot with no activities open! Some of those activities had been closed gracefully; others had been killed, or the machine had been powered off ungracefully.

In general, the quality of the mesh interactions worsened significantly, and the neighbourhood/friends views were completely out of sync with reality.

There seems to be no way to revalidate or reset the state of the presence service.

Attachments (1)

logs.SHF7200007E.2008-09-16.10-23-57.tar.bz2 (285.3 KB) - added by gdesmott 6 years ago.


Change History (29)

comment:1 Changed 6 years ago by morgs

  • Cc morgs added

comment:2 follow-up: Changed 6 years ago by kimquirk

  • Cc jg dsd added

I agree with this as I have now seen it in the 20+ laptop testing we are doing here. When we first start everything up, the neighborhood view is good... and then hours later or after some sharing, you can't rely on the view.

Does a restart of sugar (not a full reboot) kick the presence?

comment:3 Changed 6 years ago by kimquirk

  • Keywords blocks?:8.2.0 added
  • Priority changed from normal to high

comment:4 in reply to: ↑ 2 Changed 6 years ago by tomeu

Replying to kimquirk:

Does a restart of sugar (not a full reboot) kick the presence?

Killing X should restart the presence service, yeah.

comment:5 Changed 6 years ago by cjb

But Martin said that even a full reboot of the affected machines didn't help.

comment:6 Changed 6 years ago by kimquirk

  • Keywords blocks:8.2.0 added; blocks?:8.2.0 removed

comment:7 Changed 6 years ago by cjb

Collabora folks, we'd like you to look at/reproduce this if you can.

comment:8 Changed 6 years ago by kimquirk

Anyone else who sees this - please include logs.

comment:9 Changed 6 years ago by morgs

http://wiki.laptop.org/go/Telepathy_debugging explains how to turn on the appropriate logging.

comment:10 follow-up: Changed 6 years ago by daf

The first question that springs to mind is: what changed? Do we know which build this behaviour started manifesting itself in? That might give us a clue as to what's causing it.

We'll try reproducing it in Cambridge UK.

comment:11 Changed 6 years ago by mstone

  • Action Needed changed from never set to reproduce

comment:12 Changed 6 years ago by martin.langhoff

The wellington testers team did try to repro it, and had some success, but the test run saw lots of apps crashing due to the sound-hard-lockup and the "crashes when loses focus" problem (#8072).

So the logs we have are spotty and, I suspect, not very useful.

Is there a rough script you'd like us to follow to narrow down reproducibility?

comment:13 Changed 6 years ago by martin.langhoff

  • Cc aly added

comment:14 in reply to: ↑ 10 Changed 6 years ago by carrano

Replying to daf:

The first question that springs to mind is: what changed? Do we know which build this behaviour started manifesting itself in? That might give us a clue as to what's causing it.

We'll try reproducing it in Cambridge UK.

In terms of having outdated information in the mesh view, I don't think that we ever got this working properly. So, I don't think that anything had to change for us to get inconsistent data.

comment:15 Changed 6 years ago by kimquirk

  • Owner changed from Collabora to joe

Can we recreate this on a simple mesh with 10 laptops in an environment that is not RF-busy? (Joe)

comment:16 Changed 6 years ago by joe

  • Action Needed changed from reproduce to diagnose
  • Owner changed from joe to sjoerd

Tested with 8.2-759 in a 10-laptop testbed on a mesh network. Got results similar to Martin's.

Joe

comment:17 Changed 6 years ago by joe

  • Cc sjoerd added

comment:18 Changed 6 years ago by joe

  • Cc joe added

comment:19 Changed 6 years ago by sjoerd

  • Owner changed from sjoerd to daf

comment:20 Changed 6 years ago by gdesmott

Martin: Could you tell me a bit more about this problem?
Activity announcements are known to be buggy (#8441). Did you observe situations where *buddies* in the mesh view didn't match reality?

Note that you can use "sugar-xos" to display all the buddies known by the PS (you need at least 8.1-760).

comment:21 Changed 6 years ago by martin.langhoff

Hi! I've seen #8441 quite frequently, and I have seen buddies that were off or not around appearing as "current" in the listing, and I've also seen "missing" buddies that others could see. All of this in simple mesh mode.

Good to hear we have sugar-xos now. Will use it in the next round of testing :-)

comment:22 Changed 6 years ago by gnu

  • Blocking 7417 added

comment:23 Changed 6 years ago by gdesmott

  • Cc Collabora added; sjoerd removed

comment:24 Changed 6 years ago by gdesmott

I think I have finally reproduced this bug.

This XO (olpc-or) doesn't see "olpc-bg", which is connected and whose service is announced.

-bash-3.2# olpc-xos 
Time   : 10:44:24
Total  : 2

10.0.0.181 	 olpc-or
--> TOTAL: 1
-bash-3.2# olpc-xos -avahi | grep olpc-
d48ee750@xo-0C-E8-0B	169.254.5.155	olpc-bg
0c4328f9@xo-03-55-EC	10.0.0.181	olpc-or
0c4328f9@xo-03-55-EC	169.254.7.153	olpc-or

So olpc-bg's service is only announced on the mesh interface, which is normal, as olpc-bg is not connected to the AP.

-bash-3.2# avahi-browse _presence._tcp | grep d48ee750
+ msh0 IPv4 d48ee750@xo-0C-E8-0B                          iChat Presence       local

Attaching logs of olpc-or.
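
For anyone repeating this check across several XOs, the comparison above can be scripted. The following is only a minimal sketch: it assumes the two listings are laid out exactly as in the transcript above (an address and a nickname per buddy row, and an id, address and nickname per Avahi row). The function names are hypothetical and are not part of any OLPC tool.

```python
# Sample text copied from the transcript above.
PS_OUTPUT = (
    "Time   : 10:44:24\n"
    "Total  : 2\n"
    "\n"
    "10.0.0.181 \t olpc-or\n"
    "--> TOTAL: 1\n"
)
AVAHI_OUTPUT = (
    "d48ee750@xo-0C-E8-0B\t169.254.5.155\tolpc-bg\n"
    "0c4328f9@xo-03-55-EC\t10.0.0.181\tolpc-or\n"
    "0c4328f9@xo-03-55-EC\t169.254.7.153\tolpc-or\n"
)


def parse_ps_listing(text):
    """Collect nicknames from rows shaped like '<IPv4 address> <nickname>'."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        # Buddy rows have exactly two fields, the first a dotted address.
        if len(parts) == 2 and parts[0].count(".") == 3:
            names.add(parts[1])
    return names


def parse_avahi_listing(text):
    """Collect nicknames from rows shaped like '<id>@<host> <address> <nickname>'."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and "@" in parts[0]:
            names.add(parts[2])
    return names


def missing_buddies(ps_text, avahi_text):
    """Nicknames announced over Avahi but unknown to the presence service."""
    return sorted(parse_avahi_listing(avahi_text) - parse_ps_listing(ps_text))


print(missing_buddies(PS_OUTPUT, AVAHI_OUTPUT))
```

Any nickname it prints is a buddy that Avahi announces but the presence service does not list, which is the symptom described here.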

Changed 6 years ago by gdesmott

comment:25 Changed 6 years ago by gdesmott

Actually, this bug could be a consequence of #8441. Because of #8441, Salut can announce a large number of _olpc-activity1._udp services, which could make Avahi reach its service limits and so make it impossible to announce our presence. We should try to reproduce this again once #8441 is fixed (very soon, hopefully).
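
One rough way to test this hypothesis would be to count how many activity announcements are still live in a saved avahi-browse capture. The sketch below is not a definitive tool. It assumes lines in the same "+ msh0 IPv4 <name> <type> <domain>" layout shown in comment:24, that service names contain no spaces, and that the raw type string _olpc-activity1._udp appears on each activity line. The function name and the sample service names (act-1, act-2) are hypothetical.

```python
def live_services(browse_text, service_type="_olpc-activity1._udp"):
    """Return the set of (interface, protocol, name) still announced.

    A line starting with '+' adds a service and a line starting with '-'
    removes it, so the result reflects what is live at the end of the capture.
    """
    live = set()
    for line in browse_text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] not in ("+", "-"):
            continue  # not an add/remove event line
        if service_type not in parts[4:]:
            continue  # a different service type, e.g. presence
        key = (parts[1], parts[2], parts[3])
        if parts[0] == "+":
            live.add(key)
        else:
            live.discard(key)
    return live


# Hypothetical capture: two activities announced, one later withdrawn.
SAMPLE = (
    "+ msh0 IPv4 act-1 _olpc-activity1._udp local\n"
    "+ msh0 IPv4 act-2 _olpc-activity1._udp local\n"
    "- msh0 IPv4 act-1 _olpc-activity1._udp local\n"
    "+ msh0 IPv4 d48ee750@xo-0C-E8-0B iChat Presence local\n"
)

print(len(live_services(SAMPLE)))
```

A count that keeps growing while no activities are actually shared would be consistent with stale announcements piling up.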

comment:26 Changed 6 years ago by gdesmott

#8441 should be fixed in Joyride 2452. Could you retry with that version please?

I manually installed the new Salut package on 8.2-760 and have not been able to reproduce this bug since.

comment:27 Changed 6 years ago by gdesmott

It would be good to know whether anyone still experiences this problem with 8.2-766.

comment:28 Changed 6 years ago by gregorio

  • Keywords blocks-:8.2.0 added; blocks:8.2.0 removed
  • Milestone changed from 8.2.0 (was Update.2) to 9.1.0