Opened 7 years ago

Last modified 6 years ago

#6818 new defect

Driver does not set link level multicast addresses into firmware when ip address assigned to mesh interface

Reported by: shailen Owned by: cjb
Priority: high Milestone: 9.1.0-cancelled
Component: wireless Version:
Keywords: blocks-:8.2.0 cjbfor9.1.0 Cc: ashishs@…, rbhagwat@…, rchokshi@…, carrano, cjb, dsaxena, gnu, gregorio, morgs
Blocked By: Blocking:
Deployments affected: Action Needed: diagnose
Verified: no

Description

It has been observed that driver sets multicast addresses into the firmware only when user assigns IP address to eth interface. This may create a problem in pure mesh mode when no eth interface configured with IP address. In above case, multicast address filter in firmware will not get configured and will drop all received valid multicast packets. Is it by design?

Attachments (3)

mesh_mcast.patch (519 bytes) - added by ashish 7 years ago.
mesh_mcast_v2.patch (7.8 KB) - added by ashish 7 years ago.
Independent multicast list for eth and msh interface
0001-Separate-multicast-configuration-for-mesh-and-wlan-i.patch (9.5 KB) - added by jcardona 6 years ago.

Download all attachments as: .zip

Change History (29)

Changed 7 years ago by ashish

comment:1 follow-up: Changed 7 years ago by ashish

The attached patch http://dev.laptop.org/attachment/ticket/6818/mesh_mcast.patch populates default multicast addresses in the firmware when IP address is assigned to mshX interface.
This fixes mesh view problem in firmware version 22.p8/9.
This patch is not tested for any other side that it might have.

comment:2 in reply to: ↑ 1 Changed 7 years ago by carrano

Replying to ashish:

The attached patch http://dev.laptop.org/attachment/ticket/6818/mesh_mcast.patch populates default multicast addresses in the firmware when IP address is assigned to mshX interface.
This fixes mesh view problem in firmware version 22.p8/9.
This patch is not tested for any other side that it might have.

Tested a kernel with this patch and confirm that Mesh view is back to normal.
Please keep this open until driver patch is committed.

comment:3 Changed 7 years ago by dwmw2

  • Cc wmb added

If a 400ms delay is needed instead of the currently implemented 200ms delay, then we'll need to change that in OpenFirmware too.

comment:4 Changed 7 years ago by dwmw2

oops, sorry; wrong bug.

comment:5 Changed 7 years ago by dwmw2

  • Cc wmb removed

comment:6 follow-up: Changed 7 years ago by dwmw2

This patch doesn't do the right thing when both ethX and mshX interfaces are in use, does it? We need to combine the desired addresses on _both_ interfaces, and listen for all of them. And honour the ALLMULTI and PROMISC flags if they're set on either interface.

comment:7 Changed 7 years ago by carrano

After some unknown time (I was not looking but it is less than 1 hour) the mesh view was again displaying no XOs. We have to keep investigating this.

David,
Do you believe that this can be explained by your comments?

comment:8 follow-up: Changed 7 years ago by dwmw2

Maybe. If set_multicast_list() gets called on the eth0 interface and it's not listening on some of the addresses that the msh0 interface is, those addresses will probably get dropped from the multicast list in the device.

Run 'ip maddr'. Are there addresses listed on msh0 that aren't on eth0?

comment:9 in reply to: ↑ 6 Changed 7 years ago by ashish

Replying to dwmw2:

This patch doesn't do the right thing when both ethX and mshX interfaces are in use, does it?

The basic stuff worked when we used both ethx and mshx interfaces at the same time.

We need to combine the desired addresses on _both_ interfaces, and listen for all of them. And honour the ALLMULTI and PROMISC flags if they're set on either interface.

Agree.

comment:10 in reply to: ↑ 8 Changed 7 years ago by carrano

Run 'ip maddr'. Are there addresses listed on msh0 that aren't on eth0?

ip maddr is always (*) diplaying the same results:

(*) always means:

  • An XO free of this bug, because it is running stock 703 (with stock fw version 22p1)
  • An XO that uses the patched kernel plus 22.p9 and is not (yet) manifesting this bug
  • An XO that uses the patched kernel plus 22.p9 and manifested this bug after some time

ip maddr output:

1:	lo
	inet  224.0.0.1
	inet6 ff02::1
2:	eth0
	link  33:33:ff:05:27:a3
	link  01:00:5e:00:00:01
	link  33:33:00:00:00:01
	inet  224.0.0.1
	inet6 ff02::1:ff05:27a3
	inet6 ff02::1
3:	msh0
	link  33:33:ff:05:27:a3
	link  33:33:00:00:00:01
	link  01:00:5e:00:00:fb
	link  01:00:5e:00:00:01
	inet  224.0.0.251
	inet  224.0.0.1
	inet6 ff02::1:ff05:27a3
	inet6 ff02::1

Changed 7 years ago by ashish

Independent multicast list for eth and msh interface

comment:11 Changed 7 years ago by ashish

Patch https://dev.laptop.org/attachment/ticket/6818/mesh_mcast_v2.patch maintains two independent multicast list one for eth and other for msh interface.
Therefore, any change in multicast list of one interface would not affect changes in other interface's multicast list.
The patch is based on http://dev.laptop.org/git?p=olpc-2.6;a=snapshot;h=bb855af96a4caa7e6dc80dd9e81266c235961a02 kernel, and not clean (just to get feedback) wrt. to formatting etc.

comment:12 Changed 7 years ago by carrano

It seems we finished the first part of the story: Making the multicast filter not break anything. But it seems we still have work to do on the second: make the multicast filter actually work on a suspended XO.

I just did the following test (on a 703 build, with firmware 22.p10 and a kernel patched as above (mesh_mcast_v2.patch):

Supended a node and tested resume for the following events (all of than should bring the node back, I believe)

1 - ping multicast address 224.0.0.1 => OK (the node resumed)

2 - shared an activity in another node (this causes multicast frames to be sent to 01:00:5e:00:00:fb) => FAIL (the suspended node ignored). Expected: the activity would show up on the mesh view (as it did in the control group) - it did not.

3 - added another node to the mesh (this also causes multicast frames to be sent to 01:00:5e:00:00:fb) => FAIL (the suspended node ignored). Expected: the new XO would show up on the mesh view (as it did in the control group) - it did not.

I would like to stress that:

A - the ip maddr output is not providing us with the information we need (it always report the expected bindings, for example to 01:00:5e:00:00:fb). I would say it does not report actual multicast addresses informed to the firmware.

B - Once again the 224.0.0.1 address (bound to 01:00:5e:00:00:01) works while another multicast addresses fail. (please refer to http://dev.laptop.org/ticket/6854#comment:8). My guess is that understanding this will provide us with an insight.

comment:13 Changed 6 years ago by ashish

  • Cc rchokshi@… added

I did the following to confirm wakeup with two XOs, running 703 patched kernel (mesh_mcast_v2.patch) firmware 22.p10.
XO1

ethtool -s msh0 wol m
ip maddr
echo mem > /sys/power/state

on XO2 populated static ARP entries for the multicast addresses listed by ip maddr command above.
e.g.,

arp -s 10.0.0.144 01:00:5e:00:00:fb
ping 10.0.0.144 -c1

And it waked up XO1.

comment:14 Changed 6 years ago by carrano

Ashish,

You are right. I assumed that wake on multicast was activated because the node wakes to pings at 224.0.0.1, but it was a bad assumption. Once "ethtool -s msh0 wol um" was issued the suspended node began responding as expected, waking to:

  • multicast pings at 224.0.0.1
  • the inclusion of a new node in the mesh
  • the sharing of an activity

Actually as presence information is periodic, the nodes wakes up every few minutes for them (independent of the conditions above).

I believe this is the expected behavior. Any one?

comment:15 follow-up: Changed 6 years ago by jcardona

I've uploaded a patch that extends Ashish's approach and I believe addresses dwmw's concerns.

Ricardo, can you take it for a ride?

Feedback appreciated.

comment:16 Changed 6 years ago by carrano

  • Cc carrano added

comment:17 in reply to: ↑ 15 ; follow-up: Changed 6 years ago by carrano

Replying to jcardona:

I've uploaded a patch that extends Ashish's approach and I believe addresses dwmw's concerns.

Ricardo, can you take it for a ride?

Feedback appreciated.

Javier,
Tested the new patch. The multicast filter seems to be running fine!

comment:18 in reply to: ↑ 17 Changed 6 years ago by gnu

Yes, the real test will be how it integrates in the whole system: With the presence service running, with "ethtool -s msh0 wol uma", and with auto-suspend: Will we drop the unicast or multicast packet that wakes us up (#6528), or will it actually reach the application that's awaiting it?

And, secondarily, can we stay suspended long enough to save power? Or will the application level multicast traffic be so constant that we always wake a few seconds after we suspend (in which case we need to fix the applications so they aren't so chatty)?

comment:19 Changed 6 years ago by gregorio

  • Milestone Never Assigned deleted

Milestone Never Assigned deleted

comment:20 follow-up: Changed 6 years ago by gnu

  • Action Needed set to diagnose
  • Cc cjb dsaxena gnu added
  • Keywords blocks?:8.2.0 added
  • Priority changed from normal to high

Chris, can you confirm whether Ohm is setting the proper "wake-on" parameters on eth0 and msh0 when it suspends?

When suspending for lid-close, it should "wake-on: d" (disabled, i.e. wake on nothing).
When suspending automatically, it should "wake-on: uma" (unicast, multicast that we care about, and arps that we care about).

This should fix up a bunch of Presence bugs that occur in the "presence" of suspend. Ethtool sources can tell you how to set these driver settings.

Currently, in joyride-2263, I have auto-suspend on, and it's suspending regularly, but the network drivers are set to "Wake-on: u".

There's a separate issue, which is that the latest Marvell firmware should support wake-on-arp,
but the driver doesn't report that it supports wake-on-arp ("a"). See #3732. In the absence of wake-on-arp, we should STILL fix wake-on-multicast (and ideally code it so it'll automatically turn on wake-on-arp once the driver supports it).

comment:21 in reply to: ↑ 20 Changed 6 years ago by carrano

Replying to gnu:

Chris, can you confirm whether Ohm is setting the proper "wake-on" parameters on eth0 and msh0 when it suspends?

When suspending for lid-close, it should "wake-on: d" (disabled, i.e. wake on nothing).
When suspending automatically, it should "wake-on: uma" (unicast, multicast that we care about, and arps that we care about).

This should fix up a bunch of Presence bugs that occur in the "presence" of suspend. Ethtool sources can tell you how to set these driver settings.

Currently, in joyride-2263, I have auto-suspend on, and it's suspending regularly, but the network drivers are set to "Wake-on: u".

There's a separate issue, which is that the latest Marvell firmware should support wake-on-arp,
but the driver doesn't report that it supports wake-on-arp ("a"). See #3732. In the absence of wake-on-arp, we should STILL fix wake-on-multicast (and ideally code it so it'll automatically turn on wake-on-arp once the driver supports it).

wake on ARP is supported in the firmware by setting a signature based filter in the driver. Please refer to: http://dev.laptop.org/ticket/6993, particularly http://dev.laptop.org/ticket/6993#comment:3

comment:22 Changed 6 years ago by gregorio

  • Cc gregorio added

Hi Guys,

If I understood Chris correctly, we don't wake the XO on any network traffic arriving at the NICs (radio or otherwise).

Is the original problem reported resolved? If so let's close this and move the "wake" on network traffic to a new bug and make it milestone 9.1.

If the original problem is not resolved, let me know how it affects the user (e.g. does the XO need to join a multicast?) and we'll reconsider its blocker status.

Thanks,

Greg S

comment:23 Changed 6 years ago by cjb

  • Keywords blocks-:8.2.0 added; blocks?:8.2.0 removed
  • Owner changed from dwmw2 to cjb

I think this isn't going to block 8.2.0, but I'll try to look at it and see what the smallest fix we could make is, and how necessary it is.

comment:24 Changed 6 years ago by cjb

  • Milestone set to 8.2.1

comment:25 Changed 6 years ago by morgs

  • Cc morgs added

comment:26 Changed 6 years ago by mstone-xmlrpc

  • Keywords cjbfor9.1.0 added
  • Milestone changed from 8.2.1 to 9.1.0

Pushing out to 9.1.0, per edmcnierney's request.

Note: See TracTickets for help on using tickets.