Ticket #6888 (new defect)

Opened 6 years ago

Last modified 6 years ago

Laptop connects to presence server, but not seen by other laptops

Reported by: wad
Owned by: Collabora
Priority: normal
Milestone: 8.2.0 (was Update.2)
Component: presence-service
Version:
Keywords: schoolserver, presence, ejabberd
Cc:
Action Needed: diagnose
Verified: yes
Deployments affected:
Blocked By:
Blocking:

Description

Using either mesh or WiFi connections to a school presence service, we occasionally see laptops that appear to be connected normally but are seen by only a few other laptops.

The "missing" laptop will show other laptops connected to the school server, and is seen on some other laptops, but it is missing from most neighborhood views. For example, in test 0410B, laptop X59 was seen on 2 of the 30 neighborhood views checked.

Logs and packet traces for the WiFi case are available at:  http://wiki.laptop.org/go/Collab_Network_School_Wifi_Tests#Test_0410B

Change History

  Changed 6 years ago by gdesmott

Didn't see anything suspicious in X59's log. I'd like to have logs from one laptop which was able to see X59 and one from a laptop which wasn't. Thanks.

  Changed 6 years ago by gdesmott

When you have this kind of problem, it would be worth checking whether the buddies displayed in the mesh view are the same as the ones known by the PS (using the Analyze activity).
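The comparison suggested above amounts to a set difference between two buddy lists. A minimal sketch, assuming the nicknames are copied by hand from the mesh view and from the Analyze activity; the function name and the sample lists are hypothetical and not part of any Sugar API:

```python
def compare_buddy_views(mesh_view, presence_service):
    """Return the buddies that appear in only one of the two lists."""
    shown = set(mesh_view)
    known = set(presence_service)
    return {
        "only_in_mesh_view": sorted(shown - known),
        "only_in_presence_service": sorted(known - shown),
    }


# Hypothetical data, typed in by hand from each source.
mesh_view = ["X12", "X31", "X58"]
presence_service = ["X12", "X31", "X58", "X59"]

diff = compare_buddy_views(mesh_view, presence_service)
print(diff)
```

A buddy listed under `only_in_presence_service` would point at the mesh view rather than at the PS or the server.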

  Changed 6 years ago by wad

A similar case was logged with a school mesh at:  http://wiki.laptop.org/go/Collab_Network_School_Mesh_Tests#Test_0414E

We'll try to get logs from both a laptop that sees the "missing laptop" (hard to find!) and one that doesn't see it.

Jabber logs from the server for both tests are available at:  http://xs-dev.laptop.org/mesh/test0414/ejabberd/

  Changed 6 years ago by gdesmott

In Test 0414E, X58 (b79e24ede504c970c50dd12a42285d6d0a09f977) can't see X59 (d0fe86c2281b5f8978138160318ba33b5d86c2b4).

That makes sense, since X58's PS log shows:

1208220146.358327 DEBUG s-p-s.presenceservice: Buddy left: X59 (#00A0FF,#9900E6)

because of (X58 Gabble log):

RECV [263]:
-----------------------------------
'<presence from='d0fe86c2281b5f8978138160318ba33b5d86c2b4@schoolserver.annex.xs.laptop.org/Telepathy' to='b79e24ede504c970c50dd12a42285d6d0a09f977@schoolserver.annex.xs.laptop.org/Telepathy' type='unavailable'><status>Replaced by new connection</status></presence>'
-----------------------------------

That's weird because I didn't see any problem in X59 Gabble logs.
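To find every such event in a PS log rather than spotting one by eye, the "Buddy left" lines can be pulled out with a regular expression. This is only a sketch: the pattern is inferred from the single log line quoted above, and treating the leading number as Unix epoch seconds is an assumption.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one PS log line quoted in this ticket.
BUDDY_LEFT = re.compile(
    r"^(?P<ts>\d+\.\d+) DEBUG s-p-s\.presenceservice: "
    r"Buddy left: (?P<nick>.+?) \((?P<colors>[^)]*)\)"
)


def buddy_left_events(lines):
    """Return (UTC datetime, nickname) for each 'Buddy left' line."""
    events = []
    for line in lines:
        match = BUDDY_LEFT.match(line.strip())
        if match:
            # Assumption: the timestamp is Unix epoch seconds.
            when = datetime.fromtimestamp(float(match.group("ts")), tz=timezone.utc)
            events.append((when, match.group("nick")))
    return events


sample = [
    "1208220146.358327 DEBUG s-p-s.presenceservice: Buddy left: X59 (#00A0FF,#9900E6)",
    "some unrelated log line",
]
events = buddy_left_events(sample)
for when, nick in events:
    print(when.isoformat(), nick)
```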

  Changed 6 years ago by gdesmott

Actually, X59's Gabble logs contain:

** (telepathy-gabble:1925): DEBUG: gabble_roster_presence_cb: ignoring presence from ourselves on another resource:
<presence type="unavailable" to="d0fe86c2281b5f8978138160318ba33b5d86c2b4@schoolserver.annex.xs.laptop.org/Telepathy" from="d0fe86c2281b5f8978138160318ba33b5d86c2b4@schoolserver.annex.xs.laptop.org/Telepathy"> <status>Replaced by new connection</status>
</presence>

It's really weird because it could mean that another laptop was connected using the same login/pass. Did you copy its ~/.sugar to another XO? Another possibility is that Gabble reconnected but the server didn't close its first stream, so the old connection still appears to be connected. That would be a server issue.
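For scanning a whole Gabble log for this symptom, a stanza can be checked mechanically. A minimal sketch, assuming each stanza has already been extracted from the log as a standalone XML string; the helper names are made up for illustration and are not Gabble functions:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def is_replaced_presence(stanza_xml, jid):
    """True if the stanza is an 'unavailable' presence from `jid` whose
    status text is 'Replaced by new connection'."""
    root = ET.fromstring(stanza_xml)
    if local_name(root.tag) != "presence":
        return False
    if root.get("type") != "unavailable" or root.get("from") != jid:
        return False
    statuses = [
        (child.text or "").strip()
        for child in root
        if local_name(child.tag) == "status"
    ]
    return "Replaced by new connection" in statuses


# Stanza and JID taken from the X59 Gabble log quoted above.
x59 = "d0fe86c2281b5f8978138160318ba33b5d86c2b4@schoolserver.annex.xs.laptop.org/Telepathy"
stanza = (
    '<presence type="unavailable" to="%s" from="%s">'
    "<status>Replaced by new connection</status></presence>" % (x59, x59)
)
print(is_replaced_presence(stanza, x59))
```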

  Changed 6 years ago by gdesmott

I tried to connect an XO (A) using the same login/pass as an already connected XO (B). B was properly disconnected because of a stream error from the server:

RECV [195]:
-----------------------------------
'<stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/><text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>Replaced by new connection</text></stream:error></stream:stream>'
-----------------------------------
** (telepathy-gabble:1736): DEBUG: connection_stream_error_cb: got stream error:
<stream:error> <conflict xmlns="urn:ietf:params:xml:ns:xmpp-streams"></conflict>
 <text xmlns="urn:ietf:params:xml:ns:xmpp-streams" xml:lang="">Replaced by new connection</text>
</stream:error>

** (telepathy-gabble:1736): DEBUG: connection_stream_error_cb: found conflict node, emiting status change
** (telepathy-gabble:1736): DEBUG: tp_base_connection_change_status: was 0, now 2, for reason 5

It would be worth testing the same scenario using a jabber server running the same configuration as the one used for Test 0414E.
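When repeating that test, the stream error can be checked for the conflict condition automatically. The sketch below assumes the error has been cut out of the log as a bare `<stream:error>` fragment (without the trailing `</stream:stream>`), so it is wrapped in a root element that declares the `stream` prefix before parsing. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"
ERRORS_NS = "urn:ietf:params:xml:ns:xmpp-streams"


def stream_error_condition(fragment):
    """Return the stream error condition (e.g. 'conflict'), or None."""
    # The logged fragment uses the 'stream:' prefix without declaring it,
    # so wrap it in an element that binds the prefix.
    wrapped = '<stream:stream xmlns:stream="%s">%s</stream:stream>' % (
        STREAMS_NS,
        fragment,
    )
    root = ET.fromstring(wrapped)
    error = root.find("{%s}error" % STREAMS_NS)
    if error is None:
        return None
    for child in error:
        in_errors_ns = child.tag.startswith("{%s}" % ERRORS_NS)
        if in_errors_ns and not child.tag.endswith("}text"):
            return child.tag.rsplit("}", 1)[-1]
    return None


# Fragment copied from the log above, minus the closing </stream:stream>.
fragment = (
    "<stream:error><conflict xmlns='urn:ietf:params:xml:ns:xmpp-streams'/>"
    "<text xml:lang='' xmlns='urn:ietf:params:xml:ns:xmpp-streams'>"
    "Replaced by new connection</text></stream:error>"
)
condition = stream_error_condition(fragment)
print(condition)
```

If a laptop shows the "Replaced by new connection" presence but this check never finds a conflict in its log, that would be consistent with the stream error not reaching the client.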

  Changed 6 years ago by gdesmott

<Robot101> Robert McQueen: is it normal you'd receive a <presence>... Replaced ... </presence> for yourself before you get the <stream:error><conflict>?
<Robot101> cromain: yes
<Robot101> cassidy: I think in this case, what happens is the server sends a FIN packet with the <stream:error> which doesn't get to us...
<Robot101> cassidy: I don't know how or why the other laptop is failing to connect, but this /must/ be another instance connecting

  Changed 6 years ago by wad

The server jabber configuration is well documented. It is from build 160.

While I am sure that nothing from .sugar was copied from machine to machine, tests are run sequentially. If the jabber server still thinks it has a connection from laptop A, what happens when laptop A shows up again ?

Perhaps a hash collision ?

  Changed 6 years ago by gdesmott

Replying to wad:

While I am sure that nothing from .sugar was copied from machine to machine, tests are run sequentially. If the jabber server still thinks it has a connection from laptop A, what happens when laptop A shows up again ?

It's supposed to close the first connection, but according to the logs it doesn't.

Perhaps a hash collision ?

Very improbable. The login is a hash of the public key and the pass is a hash of the private key, so the probability of 2 laptops sharing the same login *and* pass is ~0.
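A rough back-of-the-envelope check of that claim, for the login alone. The use of SHA-1 is an assumption inferred from the 40 hex characters in the JIDs above (the ticket only says "a hash"), and the key bytes are placeholders:

```python
import hashlib


def account_name(public_key_bytes):
    """Hex digest used as the login. Assumption: SHA-1 (40 hex chars)."""
    return hashlib.sha1(public_key_bytes).hexdigest()


def collision_probability(n_accounts, hash_bits=160):
    """Birthday-bound estimate that any two of n accounts share a hash:
    roughly (number of pairs) / 2**hash_bits."""
    pairs = n_accounts * (n_accounts - 1) // 2
    return pairs / 2 ** hash_bits


login = account_name(b"placeholder public key bytes")
print(login, len(login))

# Even with a million laptops on one server the estimate is negligible.
p = collision_probability(1_000_000)
print(p)
```

Requiring the pass to collide as well only makes the estimate smaller, so a collision is not a plausible explanation for the duplicate connection.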

  Changed 6 years ago by marco

  • keywords 8.2.0:? added
  • milestone changed from Never Assigned to 8.2.0 (was Update.2)

  Changed 6 years ago by marco

  • keywords 8.2.0:? removed
  • next_action set to diagnose

  Changed 6 years ago by sjoerd

This might be caused by a server bug, since we're sent a presence saying that we're offline while we clearly are not. One possibility is that on a restart of the Sugar session the D-Bus session and Gabble are kept alive, causing a second login on the same account. Even so, this shouldn't have that effect. At some point gdesmott tested this theory, but it worked fine for him, so it might be fixed in newer ejabberd versions.

I'm working on a test plan to test various jabber scalability and stability issues. I'll include a section with some test instructions to see if this ever happens with our current software stack.
