Ticket #7553 (closed defect: invalid)

Opened 12 months ago

Last modified 11 months ago

Instability of mesh while chatting

Reported by: joe
Owned by: mbletsas
Priority: normal
Milestone: 8.2.0 (was Update.2)
Component: wireless
Version: Build 708
Keywords:
Cc: kimquirk, mstone, Charlie, mel@…, morgs, Collabora, gregorio
Action Needed: never set
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

"Under the tree" configuration, no school server, build 708.

Regardless of the number of XOs participating in the chat, laptops periodically leave and then rejoin the network. This behaviour usually starts between half an hour and an hour after the XOs first joined the chat.

Attachments

logs.CSN74804BE8.2008-07-21.12-10-38.tar.bz2 (34.0 kB) - added by joe 12 months ago.
Joe-E
logs.CSN74702114.2008-07-21.12-09-41.tar.bz2 (23.9 kB) - added by joe 12 months ago.
X56

Change History

  Changed 12 months ago by joe

  • cc mel@… added

  Changed 12 months ago by morgs

  • cc morgs, Collabora added
  • component changed from chat-activity to wireless

Reassigning to wireless, this is not a Chat issue.

Joe, please provide more details such as logs, output of olpc-netstatus...

  Changed 12 months ago by mchua

Joe asked me to try reproducing this at the ILXO office. I used 5 MP XOs newly flashed with build 708, in an under-the-tree configuration (mesh network 1), in a radio-intensive environment, running the Chat Activity.

Somewhere between 25 and 35 minutes into the test, one of the XOs left and rejoined the chat. It is now 110 minutes into the test and no other connectivity events have been detected. Did the testbed XOs at 1CC flicker in and out of the network more frequently? Any ideas on how to better reproduce this?

  Changed 12 months ago by gdesmott

Salut seems to be broken with current Joyride in simple mesh configuration because of multicast issues. See #7319

Changed 12 months ago by joe

Joe-E

Changed 12 months ago by joe

X56

follow-up: ↓ 7   Changed 12 months ago by joe

  • cc mchua added

Took 2 XOs (Joe-E and X56) home (a supposedly radio-quiet environment) and ran Chat in the simple mesh network. Neither laptop dropped in and out of the chat. After a while, however, X56 dropped from the network completely and could no longer see the Joe-E laptop; it took several reboots to get it connected again. I'm attaching log files collected from both laptops.

Preliminary conclusion: in the radio-quiet environment, simple mesh networking should be stable enough for expected deployments.

Got a 2.4 GHz spectrum analyzer, will measure "radio-intensity" at 1CC and at home.

  Changed 12 months ago by gdesmott

Could you post telepathy-salut.log with SALUT_DEBUG=all and GIBBER_DEBUG=all please? See http://wiki.laptop.org/go/Telepathy_debugging for details.

A tcpdump file could be useful too.

in reply to: ↑ 5   Changed 12 months ago by morgs

Replying to joe:

Took 2 XOs (Joe-E and X56) home (a supposedly radio-quiet environment) and ran Chat in the simple mesh network. Neither laptop dropped in and out of the chat. After a while, however, X56 dropped from the network completely and could no longer see the Joe-E laptop; it took several reboots to get it connected again. I'm attaching log files collected from both laptops. Preliminary conclusion: in the radio-quiet environment, simple mesh networking should be stable enough for expected deployments. Got a 2.4 GHz spectrum analyzer, will measure "radio-intensity" at 1CC and at home.

You've tested the "2 kids under a tree" case. We have deployments of >500 XOs per school.

The moment more XOs join the mesh network, it ceases to be a nice simple "radio quiet" environment. You will get different results with 3, 5, 10, 30 XOs. You will get different results depending on which activities are shared, how many at once, how active the participants are, and how large the data (PDF in Read for example) is.

  Changed 12 months ago by joe

The solution (per Michail's advice) is not to proceed with the mesh network (with all its limitations), but to switch to connecting through a school server with an active antenna. I'm going to conduct these tests.

  Changed 12 months ago by mbletsas

Actually, the best way to conduct these tests is to use an access point.

M

  Changed 12 months ago by joe

Sorry, I actually meant an access point. ;-( Probably was distracted while typing...

  Changed 11 months ago by kimquirk

  • cc gregorio added; mchua removed
  • milestone changed from 8.1.1 (was Update1.1) to 8.2.0 (was Update.2)

I actually don't think this is a 'bug', but normal behavior of an RF channel. If a laptop cannot get its packets through the channel (probably due to noise or congestion), then it will fall off the network and rejoin again later. If you guarantee a low-RF environment for testing, then you will see much less of this.

Here is a better-defined requirement for chat in simple mesh mode: in a low-noise, low-RF environment, 10 laptops should be able to see each other and chat with each other over a 1-hour period without any laptops crashing, hanging, or losing connectivity. If you don't have a low-noise environment, then you may lose connectivity from time to time. As long as the laptops do come back to the chat session, the test is successful.
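For anyone scripting this test, the pass/fail rule above could be checked roughly as follows. This is only a sketch: the event format (time in seconds, laptop name, "leave" or "join") and the function name `chat_run_passes` are made up for illustration and do not correspond to any existing OLPC tool or log format. It also assumes every laptop is in the chat at the start of the run.

```python
from typing import Iterable, List, Tuple

# Hypothetical event record: (seconds since test start, laptop name, "leave" or "join").
Event = Tuple[float, str, str]


def chat_run_passes(events: Iterable[Event], laptops: List[str],
                    low_noise: bool, duration_s: float = 3600.0) -> bool:
    """Apply the requirement to one test run.

    Low-noise environment: pass only if no laptop ever left the chat.
    Noisy environment: drops are tolerated, but every laptop must be
    back in the chat by the end of the run.
    """
    present = {name: True for name in laptops}  # assumption: all joined at t=0
    any_drop = False
    for t, name, kind in sorted(events, key=lambda e: e[0]):
        if t > duration_s or name not in present:
            continue  # ignore events outside the test window or from unknown laptops
        if kind == "leave":
            present[name] = False
            any_drop = True
        elif kind == "join":
            present[name] = True
    if low_noise:
        return not any_drop
    return all(present.values())


if __name__ == "__main__":
    xos = [f"xo{i}" for i in range(10)]
    blip = [(1500.0, "xo3", "leave"), (1560.0, "xo3", "join")]
    print(chat_run_passes(blip, xos, low_noise=True))
    print(chat_run_passes(blip, xos, low_noise=False))
```

With this rule, a single leave/rejoin fails a low-noise run but passes a noisy one, which matches the distinction drawn above.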

  Changed 11 months ago by kimquirk

  • status changed from new to closed
  • resolution set to invalid

Please re-open this if someone would like to argue that this is a bug, or that this laptop should have been able to communicate 100% of the time. Otherwise, I believe it is good to get this feedback that a laptop cannot communicate.

  Changed 11 months ago by morgs

This scenario will work fine for Chat, where we are simply using an XMPP text channel. People can leave and come back without any harm. However, it breaks tubes connections, and hence all other collaborative activities. See the scenarios described in this mail:

http://lists.freedesktop.org/archives/telepathy/2008-August/002112.html

So collaboration fails silently and mysteriously, as we have no feedback in the UI.
