Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#7319 closed defect (fixed)

multicast RX broken in joyride

Reported by: dsd
Owned by: jcardona
Priority: high
Milestone: 8.2.0 (was Update.2)
Component: wireless
Version: olpc-3
Keywords: 8.2.0:+ olpc3-20:- blocks:8.2.0 joyride-2103:+ joyride-2181:- joyride-2200:+
Cc: morgs, wad, dsaxena, Collabora
Blocked By:
Blocking: #7383, #7393
Deployments affected:
Action Needed: qa signoff
Verified: no

Description

I have an update1 laptop running on mesh network 11 which sees a lot of other XOs in the neighbourhood view.

I then sign my olpc3 laptop onto the same mesh. It appears in the neighbourhood on the update1 system.

No XOs ever appear on the neighbourhood view on the olpc3 system.

Attachments (1)

presenceservice.log (1.7 KB) - added by dsd 6 years ago.


Change History (46)

Changed 6 years ago by dsd

comment:1 Changed 6 years ago by dsd

  • Cc morgs added

comment:2 Changed 6 years ago by morgs

I pinged bpepple about building F9 telepathy-gabble/salut packages with olpc patches and --enable-olpc.

comment:3 follow-up: ↓ 24 Changed 6 years ago by dsd

Confirmed that rebuilding telepathy-salut-0.2.3-1.fc9 with --enable-olpc solves the problem with salut presence.

Can't test gabble presence at the moment because our school server jabber daemon is down.

comment:4 Changed 6 years ago by dsd

gabble is working

comment:5 Changed 6 years ago by gdesmott

So I guess we should close this bug when olpc3 has the right packages.

comment:6 Changed 6 years ago by dsd

Agreed. Morgan, has there been any word from Brian?

comment:7 Changed 6 years ago by marco

  • Keywords 8.2.0:+ added; olpc3-20:- removed
  • Milestone changed from Never Assigned to 8.2.0 (was Update.2)

comment:8 Changed 6 years ago by marco

  • Keywords olpc3-20:- added

comment:9 Changed 6 years ago by dsd

  • Blocking 7383 added

comment:10 follow-up: ↓ 11 Changed 6 years ago by dsd

  • Action Needed set to never set

Morgan says: telepathy-salut with --enable-olpc has been pushed to F9 updates. There are still some patches we need for stream tubes (IIRC) to circumvent normal security in the presence of rainbow that are too crackful to push to F9. dgilmore suggested a runtime option to enable that - cassidy is consulting with daf, sjoerd and m_stone.

comment:11 in reply to: ↑ 10 Changed 6 years ago by sjoerd

Replying to dsd:

Morgan says: telepathy-salut with --enable-olpc has been pushed to F9 updates. There are still some patches we need for stream tubes (IIRC) to circumvent normal security in the presence of rainbow that are too crackful to push to F9. dgilmore suggested a runtime option to enable that - cassidy is consulting with daf, sjoerd and m_stone.

With my salut upstream hat on, I really want to keep those patches out of salut. They completely circumvent the basic security checking that salut and gabble do. If we want to move something like this upstream then it should be fixed in a better way.

comment:12 Changed 6 years ago by mstone

  • Keywords blocks:8.2.0 added

comment:13 Changed 6 years ago by dsd

Due to time constraints, we have decided to recreate the OLPC-3 branch with our ugly patches. Morgan will take care of this. Hopefully we can find a nicer solution for 9.1. I added this to the nastiness list:
http://wiki.laptop.org/go/Distro_Version_Migration_Nastiness

comment:14 Changed 6 years ago by morgs

  • Keywords joyride-2103:+ added

I definitely get salut presence on joyride 2103, with telepathy-salut I built from the OLPC-3 branch.

comment:15 Changed 6 years ago by morgs

I have built telepathy-gabble 0.7.4 with OLPC patches on OLPC-3.

Guillaume has built the latest salut 0.3.3 with OLPC patches on OLPC-3 now that we are on F9.

comment:16 Changed 6 years ago by morgs

I've built telepathy-gabble 0.7.6 with OLPC patches on OLPC-3.

comment:17 Changed 6 years ago by morgs

All these telepathy updates are in joyride as of 2111.

comment:18 Changed 6 years ago by dsd

I'm having trouble with joyride-2123. Presence of other XOs only works when associated with a network; no XOs appear when you are on a mesh channel. This is using salut; I'll try gabble when we have a working school server again.

In terms of collaboration, I can't get it working at all. I've been trying to share Chat and Write activities between two XOs over salut (with both on the same access point). When sharing with the neighborhood, the activity appears in the neighborhood view on the other XO, but joining it just opens a blank activity.

I also tried inviting the other XO to the activity, and that didn't result in any visual invitation appearing (but I'm not exactly sure where to be looking for that in this UI redesign).

comment:19 Changed 6 years ago by morgs

Please make sure you have logs enabled when testing, and post presenceservice.log, the activity log, and the output of olpc-netstatus from both XOs.

comment:20 Changed 6 years ago by morgs

#7457 is probably a duplicate; please coordinate with Charlie.

Sjoerd can possibly help debug this, although he's at Guadec at the moment.

comment:21 Changed 6 years ago by gdesmott

Did some tests using Joyride 2155.

Presence and collaboration worked fine with telepathy-salut connected to an AP but not on simple mesh.
avahi-browse doesn't show services from the other XO so I suspect a multicast problem in the mesh.

comment:22 Changed 6 years ago by dsd

I've been working on this today as well and have made the same findings so far.

comment:23 Changed 6 years ago by gdesmott

  • Component changed from presence-service to wireless
  • Owner changed from Collabora to dwmw2

It seems mDNS is completely broken with simple mesh. I published a dummy service using avahi-publish and it was not displayed on my other XO using "avahi-browse --all".
avahi-browse still displays local services, though, and the XOs can ping each other.

Re-assigning to the wireless component as I suspect a multicast bug in the mesh code.

comment:24 in reply to: ↑ 3 Changed 6 years ago by gdesmott

Replying to dsd:

Can't test gabble presence at the moment because our school server jabber daemon is down

I tested server collaboration with Joyride 2155 and it worked fine.

comment:25 Changed 6 years ago by dsd

  • Cc wad added
  • Summary changed from Presence broken in olpc3 to multicast RX broken in joyride

I have done further testing and I believe joyride cannot receive multicast frames.

I am using this app to test:
http://www.venaas.no/multicast/ssmping/

I put two 708 systems on the same mesh channel. One runs ssmpingd, the other runs ssmping, and the ssmping box receives both unicast and multicast responses as expected.

Swap one of the 708 systems for joyride, and repeat the test. Joyride running ssmpingd, 708 running ssmping: 708 receives both unicast and mcast responses - good.

Swap the arrangement so that 708 runs ssmpingd and joyride runs ssmping: joyride only receives unicast responses.

Running libertas firmware 5.110.22p14 on all the systems. This is the version shipped in joyride and in 708.
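
A rough way to repeat the receive half of this test without ssmping is a small UDP multicast probe. The sketch below is only an illustration, not the tool used above: the group address 239.255.42.99 and port 50000 are arbitrary placeholders, and the helper names are hypothetical. It assumes IPv4, and that the caller passes the local address of the mesh interface when the default route would otherwise pick a different interface.

```python
import socket

GROUP = "239.255.42.99"  # placeholder test group (assumption, not from the ticket)
PORT = 50000             # placeholder test port (assumption)


def membership_request(group, iface_addr="0.0.0.0"):
    """Build the ip_mreq payload: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def open_receiver(group=GROUP, port=PORT, iface_addr="0.0.0.0", timeout=5.0):
    """Return a UDP socket that has joined `group` and waits up to `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what asks the kernel (and driver) to accept these frames.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_addr))
    sock.settimeout(timeout)
    return sock


def send_probe(payload=b"mcast-probe", group=GROUP, port=PORT,
               iface_addr=None, ttl=1):
    """Send one datagram to the group; returns the number of bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL 1 keeps the probe on the local link.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        if iface_addr is not None:
            # Force the outgoing interface by its local IPv4 address.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                            socket.inet_aton(iface_addr))
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

To use it, call open_receiver() and then recvfrom() on the machine under test, and send_probe() on the other machine. A receive timeout while unicast ping still works would match the symptom described in this comment.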

comment:26 Changed 6 years ago by dsaxena

  • Cc dsaxena added

comment:27 Changed 6 years ago by cjb

Try with a stable branch (2.6.22) kernel on the same joyride build?

comment:28 follow-up: ↓ 29 Changed 6 years ago by dsd

I determined that joyride-2072 worked and joyride-2081 failed. From that I went through and found that this change in joyride-2074 is what caused the regression:

-libertas-usb8388-firmware.noarch 2:5.110.20.p49-1.fc9
+libertas-usb8388-firmware.noarch 2:5.110.22.p14-1.fc9

Interestingly the version we upgraded to is the same one present in update1. It looks like the joyride kernel is currently incompatible with the firmware being used in update1?
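
Comparing the package lists of a working and a failing build is what surfaces a change like the one above. A minimal sketch, assuming each build's package list is available as plain text with one "name version" pair per line (that input format and the function names are assumptions, not something the joyride build system is known to provide):

```python
def parse_packages(text):
    """Map package name -> version from lines of the form 'name version'."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, version = parts
            packages[name] = version
    return packages


def diff_packages(old_text, new_text):
    """Return '-name version' / '+name version' lines for packages that differ."""
    old = parse_packages(old_text)
    new = parse_packages(new_text)
    lines = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) == new.get(name):
            continue  # unchanged package
        if name in old:
            lines.append("-%s %s" % (name, old[name]))
        if name in new:
            lines.append("+%s %s" % (name, new[name]))
    return lines
```

Running diff_packages() on the lists of two adjacent builds gives output in the same -/+ form as the firmware change shown in this comment.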

comment:29 in reply to: ↑ 28 Changed 6 years ago by dsaxena

  • Owner changed from dwmw2 to jcardona

Replying to dsd:

I determined that joyride-2072 worked and joyride-2081 failed. From that I went through and found that this change in joyride-2074 is what caused the regression:

-libertas-usb8388-firmware.noarch 2:5.110.20.p49-1.fc9
+libertas-usb8388-firmware.noarch 2:5.110.22.p14-1.fc9

Interestingly the version we upgraded to is the same one present in update1. It looks like the joyride kernel is currently incompatible with the firmware being used in update1?

Probably a regression in the driver brought in from upstream merge. Javier, can you take a look at this one?

comment:30 Changed 6 years ago by dsaxena

  • Blocking 7393 added

comment:31 Changed 6 years ago by mstone

  • Action Needed changed from never set to diagnose

comment:32 Changed 6 years ago by andrey

Hi. In joyride-2081 it looks like 'depmod' was never run so there is no modules.dep and, therefore, no modules loaded (including libertas).

After running depmod and rebooting, we're fine and avahi shows services advertised on our LAN, so mDNS and multicast are working.

comment:33 Changed 6 years ago by dsd

Are you definitely using salut over the mesh?

It works fine for us using salut over infrastructure networks and gabble. It's just the mesh case that is broken.

You could also try the ssmping test.

comment:34 Changed 6 years ago by andrey

Ah, sorry, that was (unintentionally) over infra. Turning off multicast on eth0 to get that out of the way:

# ifconfig eth0 -multicast

...and indeed I can't see the other XOs. However it looks like I can't ping via the msh0 interface either, so I'll look into that.

comment:35 Changed 6 years ago by dsd

FWIW, unicast pinging over the mesh interface was definitely working here, even when mcast was broken.

comment:36 Changed 6 years ago by gdesmott

  • Cc collabora added

comment:37 Changed 6 years ago by dsd

The stable kernel branch has several libertas commits which are not in testing.
I tried to put them in testing, but it does not seem to have helped the case. My work can be found at users/dsd/olpc-2.6 branch "testing"

comment:38 Changed 6 years ago by dsd

Downgrading to 2.6.22 definitely fixes it.

I then tried "backporting" the 2.6.22 libertas driver to 2.6.25. I just copied the directory over verbatim. Booted that kernel, and surprisingly, multicast still does NOT work.

So, this seems to be a kernel regression outside of the wireless driver.

comment:39 Changed 6 years ago by dsd

I attempted to reconfirm those results today: I took the current testing kernel, copied drivers/net/wireless/libertas from stable, and retested. Multicast is working, so perhaps I made a mistake yesterday. It looks like something at the driver level.

Now to resync the drivers... The diff at the moment is:

27 files changed, 1525 insertions(+), 1140 deletions(-)

comment:40 Changed 6 years ago by dsd

OK, no idea what went wrong yesterday, but my attempts to forward port the most recent libertas commits from stable to testing do indeed solve the problem.

So, please pull users/dsd/olpc-2.6 branch testing

comment:41 Changed 6 years ago by dsd

  • Keywords joyride-2181:- joyride-2200:+ added
  • Resolution set to fixed
  • Status changed from new to closed

new kernel pulled in, fixed in joyride-2180

comment:42 Changed 6 years ago by dsd

that should say: fixed in joyride-2200

comment:43 Changed 6 years ago by gdesmott

  • Cc Collabora added; collabora removed

comment:44 Changed 6 years ago by gdesmott

Collaboration seems to work fine with Joyride-2216

comment:45 Changed 6 years ago by gregorio

  • Action Needed changed from diagnose to qa signoff
  • Priority changed from normal to high