Ticket #4616 (new defect)

Opened 7 years ago

Last modified 6 years ago

Mesh doesn't resume from suspend on receipt of multicast packets

Reported by: gnu
Owned by: cjb
Priority: normal
Milestone: 8.2.0 (was Update.2)
Component: power manager (OHM)
Version: Development build as of this date
Keywords:
Cc: jg, mbletsas, dwmw2, Collabora
Action Needed:
Verified: no
Deployments affected:
Blocked By: #6993
Blocking:

Description

B4, Q2D03, build 616.

Set up two XOs next to each other. On one of them, run:

ping6 -I msh0 ff02::1

This pings the IPv6 link-local "all nodes" multicast address.

This will print two lines (packets) per second, one from its own node, and one from the other XO. (If there are more XOs nearby, they will also show up as additional lines.)

Now suspend the target XO with the power button. It stops responding to the pings. When you resume it by pressing a key on the keyboard, it shortly starts responding to pings again.

Since ieee802 multicast packets are used for IPv6 neighbor discovery (the ARP equivalent), this also prevents ordinary unicast pings or unicast TCP connections from reaching a suspended machine. You can demonstrate this by reading off the source address of the pings that did come back, and sending a ping6 directly to that address. It works while not suspended, but not after suspending.
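To illustrate why a unicast connection depends on multicast: IPv6 neighbor discovery sends its solicitation to the target's solicited-node multicast group, which is built from the low 24 bits of the target's unicast address. The following is only a sketch of that derivation; the link-local address used is a hypothetical example, not one taken from this ticket.

```python
import ipaddress

# Solicited-node multicast prefix, ff02::1:ff00:0/104.
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(unicast: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


# Hypothetical link-local address of the suspended XO.
target = "fe80::217:c4ff:fe05:1234"
print(solicited_node_multicast(target))
```

A sender that has no neighbor cache entry for the target must first reach it through this group, so a laptop that sleeps through multicast never gets to answer, and the unicast packets that would follow are never sent.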

I believe that this "bug" results from some of the massive confusion about suspending -- is the power-button suspend a "mac-style" or "pc-style" suspend that requires manual action to resume from? Or is it an automatic "olpc-style" suspend that is just a power saving measure and requires no manual action? As we further debug and evolve the suspend architecture, we'll resolve this question and can then either fix (or not) this possible bug.

An automatic suspend should be awakened by these packets.

A manual suspend should not be awakened by these packets.

Change History

  Changed 7 years ago by wad

Please retest with the most recent release (624). This should have been fixed in recent WLAN firmware, which I don't think was included in 616.

  Changed 7 years ago by dilinger

625 is the latest release, actually. ;)

It has the DCON fix.

  Changed 7 years ago by jg

  • cc jg added

Do we have a way to distinguish automatic vs. user requested suspends in the kernel now?

  Changed 7 years ago by gnu

Tested in 623. B4 does not wake from suspend when it receives a multicast packet. My first try at upgrading to 624 didn't work, so I reflashed 623.

  Changed 7 years ago by jg

  • owner changed from dilinger to cjb
  • component changed from kernel to power manager (OHM)
  • milestone changed from Never Assigned to Update.1

follow-up: ↓ 7   Changed 7 years ago by cjb

  • cc mbletsas added

If this doesn't work now, there's nothing OHM can do to make it start working. The action item for OHM would be *after* we have these wakeups, in order to ignore them when we go to hard-sleep instead of suspend.

Michailis, should receipt of a multicast packet ever cause the wireless module to assert a wakeup?

in reply to: ↑ 6   Changed 7 years ago by mbletsas

The way things are set up right now, we didn't have a use scenario for wakeup on multicast.

M.

  Changed 7 years ago by gnu

This is what I mean about massive confusion about suspends.

The use scenario is that we've done an automatic suspend to save power, and somebody opens a TCP connection to the laptop, using IPv6. IPv6 does its "ARP" equivalent using multicast packets, to avoid waking up every node on the network like ARP (broadcasts) do. Pings will produce the same IPv6 neighbor discovery multicast.

In general, the chip should have three bits that say:

  • wake us on unicast
  • wake us on multicast
  • wake us on broadcast

and for efficiency in supporting IPv4 ARP, it should have a fourth:

  • wake us on broadcast, if and only if the bcast packet has byte value X in byte position Y.

Then the host can turn these four bits on and off (and set the two bytes used by the arp kludge) as it desires, depending on why it suspended in the first place.
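A rough sketch of the four-bit model described above, written as a host-side simulation rather than firmware. All names here (WakeConfig, should_wake, config_for) are hypothetical, and the policy in config_for is only one possible reading of "depending on why it suspended".

```python
from dataclasses import dataclass

BROADCAST = bytes([0xFF] * 6)


@dataclass
class WakeConfig:
    """Hypothetical wake settings the host would hand to the chip."""
    unicast: bool = False
    multicast: bool = False
    broadcast: bool = False
    broadcast_match: bool = False  # the "ARP kludge" bit
    match_offset: int = 0          # byte position Y
    match_value: int = 0           # byte value X


def should_wake(cfg: WakeConfig, own_mac: bytes, frame: bytes) -> bool:
    """Decide whether a received frame should wake the host."""
    dest = frame[:6]
    if dest == BROADCAST:
        if cfg.broadcast:
            return True
        return (cfg.broadcast_match
                and len(frame) > cfg.match_offset
                and frame[cfg.match_offset] == cfg.match_value)
    if dest[0] & 0x01:  # group bit set: a multicast destination
        return cfg.multicast
    return cfg.unicast and dest == own_mac


def config_for(suspend_kind: str) -> WakeConfig:
    """Assumed policy: automatic suspends wake on traffic, manual ones do not."""
    if suspend_kind == "automatic":
        return WakeConfig(unicast=True, multicast=True)
    return WakeConfig()
```

The point of the sketch is that the chip only evaluates bits and two bytes; the decision about which bits to set stays in the host.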

  Changed 7 years ago by gnu

(the arp kludge is to fix #3732 without waking up the laptop every time an ARP for somebody else is received on wireless.)

  Changed 7 years ago by mbletsas

"wake up on multicast" should really be "wake up on certain multicasts" We don't really need to wake up when multicast distribution is going on.

For that reason, "wake on multicast" becomes more like the fourth case that you describe, something that we are looking into but which has performance implications.

M.

follow-up: ↓ 13   Changed 7 years ago by gnu

"Wake up on multicast" is ALREADY "wake up on certain multicasts". The kernel loads in a set of multicast addresses that the chip should be listening for. If a multicast comes along that isn't in that set, don't listen for it and don't wake up for it. (Most chips do this in hardware.) This is totally standard on every Ethernet chip made, by every vendor.

If you do "/sbin/ip maddr" it will list you the (link level, IPv4 and IPv6) multicast addresses that each interface is listening to. What gets loaded into the chip is the link level addresses, which are the 802.11 MAC addresses that correspond to each of those higher level addresses.
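As an illustration of that correspondence: an IPv6 multicast group maps to a link-level address made of the bytes 33:33 followed by the low 32 bits of the group address. The sketch below builds a filter table from a list of joined groups and checks a destination against it; the group list is a made-up example, and the set lookup merely stands in for whatever the chip does in hardware.

```python
import ipaddress


def ipv6_multicast_to_mac(group: str) -> str:
    """Map an IPv6 multicast group to its link-level multicast address."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low32 = int(addr) & 0xFFFFFFFF
    octets = bytes([0x33, 0x33]) + low32.to_bytes(4, "big")
    return ":".join(f"{b:02x}" for b in octets)


def accepts(filter_table: set, dest_mac: str) -> bool:
    """True if the destination is one of the addresses the host asked for."""
    return dest_mac.lower() in filter_table


# Hypothetical groups, similar to what "ip maddr" might list for an interface.
joined = ["ff02::1", "ff02::1:ff05:1234"]
table = {ipv6_multicast_to_mac(g) for g in joined}
print(sorted(table))
```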

  Changed 7 years ago by gnu

Another way to think about automatic suspends is that the system is in full operation, and it wants to wake up from any interrupt that comes along, from any source. The suspend is just a behind-the-scenes way to save power.

Now if we had an interrupt controller that was powered during suspend, and if our USB host controller didn't require constant DMA that doesn't work during suspend in order to process interrupts, and our network chip wasn't on the wrong side of a powered-down USB bus, then the chip could just signal an interrupt in the usual fashion, and the system would wake up and (eventually) take the interrupt. That's the model we'd like to shoot for -- simple and straightforward. (Maybe for Gen2.)

In the meantime, the convolutions that we have to do to bypass dead hardware shouldn't obscure the ultimate goal, which is to make this be a lot like a normal interrupt. So the kernel and the firmware could use the normal interrupt enables in the chip to turn various sources on and off. And if an interrupt should happen and it's enabled, but we're suspended, then tickle the system to wake up, using the special secret un-suspend wire. Eventually it will come back, and turn on the USB bus, and enumerate the network chip, and... then... it can take the interrupt.

Does that model work better for the firmware than the "four bits" model? It should handle a bunch of interruptible conditions better -- like loss of AP connection, and other interrupts that don't involve a packet arriving.

in reply to: ↑ 11   Changed 7 years ago by mbletsas

I agree that from the kernel point of view, what you describe is straightforward, however from the firmware point of view things are different.

The firmware doesn't understand anything above layer-2 and currently treats broadcast and multicast frames in the same manner (i.e. keeps track of recently forwarded ones so that it doesn't retransmit them).

M

follow-up: ↓ 15   Changed 7 years ago by gnu

If the firmware isn't implementing multicast filtering like any other Ethernet chip, then it needs to implement multicast filtering, like every other Ethernet chip. Pretty straightforward.

This has nothing to do with the mesh. It has everything to do with what the firmware does with received packets (that it happens to receive either via the mesh, or via ad-hoc or access point packets). Some such packets are ignored e.g. if they aren't addressed to us. Some such packets are reported to the host, e.g. if they ARE addressed to us, or are a broadcast, or are a multicast that the host is interested in.

Just because one corner of your firmware implements a mesh, doesn't mean you are exempt from making normal 802.x networking functions work.

in reply to: ↑ 14   Changed 7 years ago by mbletsas

I don't think that the firmware doesn't implement standard 802.x functions. It is the wakeup behavior that we are discussing here.

M.

follow-up: ↓ 17   Changed 7 years ago by gnu

Why is "the wakeup behavior" any different from "standard 802.x functions"? The chip should not do a CPU wakeup unless it has an interrupt to present to the host. So, normal interrupt masking should be all the CPU needs to set up in advance, to determine which possible events will cause a wakeup. And there should already be ways for the CPU to tell the chip "don't interrupt me about multicasts, except ones addressed to these addresses".

This seems like a no-brainer to me, why is it so hard to get across?

in reply to: ↑ 16   Changed 7 years ago by mbletsas

"This seems like a no-brainer to me, why is it so hard to get across?"

Inferior brain on the receiver side, I guess ;-)

The radio is connected to the host via the USB bus plus 1 wakeup line into the motherboard's embedded controller (EC). During suspend there is no way for the radio to wake up the host via the USB bus. If a received frame meets the wakeup criteria, then the radio wakes up the host via the line to the EC.

Maybe I am confused since the beginning of this thread because I don't get how a multicast frame will only wake up a subset of hosts (or in other words, how do you define which multicast group each subset listens to).

M.

  Changed 7 years ago by gnu

First: "If a received frame meets the wakeup criteria, then the radio wakes up the host via the line to the EC."

Why are there any "wakeup criteria"?

Why not just have the rule be: "Whenever the radio has an unmasked interrupt pending for the host, and its USB interface is suspended, then it wakes up the host via the line to the EC."

Then the software can just set the usual interrupt masks to tell the firmware which packets (or events, like loss of association) it cares about. If it wants no wakeups, it can just mask off all interrupts. If it only wants loss-of-assoc, it can set that mask bit. If it wants unicast but not multicast, it can set that interrupt mask.
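The proposed rule can be sketched in a few lines. This is a simulation under assumed names: the Irq bits below are hypothetical and are not taken from the libertas firmware or driver.

```python
from enum import IntFlag


class Irq(IntFlag):
    """Hypothetical interrupt sources on the radio."""
    RX_UNICAST = 0x1
    RX_MULTICAST = 0x2
    RX_BROADCAST = 0x4
    LOSS_OF_ASSOC = 0x8


def should_assert_wakeup(pending: Irq, enabled: Irq, usb_suspended: bool) -> bool:
    """Pull the EC wakeup line only for an unmasked interrupt during suspend."""
    return usb_suspended and bool(pending & enabled)


# Host wants unicast packets and loss of association, but not multicast.
enabled = Irq.RX_UNICAST | Irq.LOSS_OF_ASSOC
print(should_assert_wakeup(Irq.RX_MULTICAST, enabled, usb_suspended=True))
print(should_assert_wakeup(Irq.LOSS_OF_ASSOC, enabled, usb_suspended=True))
```

With this rule there is no separate set of wakeup criteria to keep in sync: masking every interrupt gives a suspend that nothing on the network can end.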

Second: "how do you define which multicast group each subset [of hosts] listens to?"

There's an interface. http://dev.laptop.org/git?p=olpc-2.6;a=blob;f=drivers/net/wireless/libertas/cmd.c;h=ce2841731231fb8ca91213d94fbdc22721268f23;hb=b776810bcc6492addb10a0aa3d9b05a650243cad#l635 shows it on line 635, where it uses the CMD_MAC_MULTICAST_ADR command to pass a table of MAC addresses to the firmware. The rest of the kernel already knows how and when to call such an interface (for an Ethernet or anything else that supports multicast). The kernel starts off the table with its own internal multicast addresses (the all-ipv6-nodes-multicast address, the multicast address it uses for neighbor discovery, etc). When user programs join particular multicast groups by binding a socket to a particular IPv6 multicast address, the kernel adds the corresponding MAC address to the table and passes it down to the hardware, via this function. When such a socket is closed, the kernel removes the address from the table. According to drivers/net/wireless/libertas/defs.h, it can hold 32 multicast addresses to look for. The driver keeps its local copy of the table in its struct _wlan_adapter (see dev.h) in the member "multicastlist".
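The bookkeeping described above might look roughly like this. It is a toy model, not the kernel's code: the class name, the per-address reference counting, and the send_to_firmware callback (standing in for the CMD_MAC_MULTICAST_ADR command) are all assumptions made for illustration. Only the limit of 32 entries comes from the text.

```python
MAX_MULTICAST_ADDRS = 32  # table capacity quoted in the ticket


class MulticastTable:
    """Toy model of a driver-side multicast list pushed down to firmware."""

    def __init__(self, send_to_firmware):
        self._refs = {}                # MAC address -> number of joined sockets
        self._send = send_to_firmware  # hypothetical stand-in for the command

    def join(self, mac: str) -> None:
        if mac not in self._refs and len(self._refs) >= MAX_MULTICAST_ADDRS:
            raise OverflowError("multicast table is full")
        self._refs[mac] = self._refs.get(mac, 0) + 1
        if self._refs[mac] == 1:       # new address: update the chip
            self._push()

    def leave(self, mac: str) -> None:
        count = self._refs.get(mac, 0)
        if count == 0:
            return                     # never joined, nothing to do
        if count == 1:                 # last socket closed: drop the address
            del self._refs[mac]
            self._push()
        else:
            self._refs[mac] = count - 1

    def _push(self) -> None:
        self._send(sorted(self._refs))


sent = []
table = MulticastTable(sent.append)
table.join("33:33:00:00:00:01")
table.join("33:33:ff:05:12:34")
table.leave("33:33:00:00:00:01")
print(sent[-1])
```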

Multicast is handled already -- if only the firmware was looking in the right place in its own implementation!

  Changed 7 years ago by mbletsas

Excuse our ignorance, we are kind of slow here...

We can currently wake up the host upon receipt of a frame addressed to its unicast address, a broadcast address and a range of anycast addresses. You are also asking to wake up the host when certain multicast frames are received.

The purpose of that would be to wake up a subset of hosts and not all of the hosts present in the local mesh network (as a wakeup on broadcast would do).

Every XO right now listens to 4 link-level multicast layer-2 addresses: 3 of them are common among all of them and 1 is derived from its own MAC address.

The piece that I am missing, is how are we going to wake up specific XOs using multicast frames (without keeping track of them a priori to the wakeup request).

M.

follow-up: ↓ 22   Changed 7 years ago by dwmw2

  • cc dwmw2 added

The libertas driver supports the standard wake-on-lan configuration via ethtool. You can enable wake on multicast (as well as unicast) as follows:

ethtool -s eth0 wol um

I'm not sure if that makes it wake on _all_ multicast packets, or only the ones which it was configured to listen to with the CMD_MAC_MULTICAST_ADR command. If the former, that should be filed as a firmware bug.
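For readers unfamiliar with the "wol" option string: each letter selects one wake condition, and the letters are combined into a bitmask. The sketch below decodes such a string. The bit values are the WAKE_* constants as I recall them from linux/ethtool.h, so treat them as an assumption to check against your headers; how the libertas driver translates them into its own flags is not shown here.

```python
# Letter -> bit, following the generic ethtool Wake-on-LAN flags (assumed values).
WOL_FLAGS = {
    "p": 1 << 0,  # PHY activity
    "u": 1 << 1,  # unicast
    "m": 1 << 2,  # multicast
    "b": 1 << 3,  # broadcast
    "a": 1 << 4,  # ARP
    "g": 1 << 5,  # magic packet
    "s": 1 << 6,  # SecureOn password for magic packet
}


def parse_wol(opts: str) -> int:
    """Turn an ethtool-style wol string such as "um" into a bitmask."""
    if opts == "d":  # "d" disables wake-on-LAN entirely
        return 0
    mask = 0
    for ch in opts:
        if ch not in WOL_FLAGS:
            raise ValueError(f"unknown wol option {ch!r}")
        mask |= WOL_FLAGS[ch]
    return mask


print(hex(parse_wol("um")))
```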

  Changed 6 years ago by morgs

  • cc Collabora added

in reply to: ↑ 20   Changed 6 years ago by gnu

Replying to dwmw2:

"I'm not sure if that makes it wake on _all_ multicast packets, or only the ones which it was configured to listen to with the CMD_MAC_MULTICAST_ADR command."

I just tested it in update.2-691. After I run "ethtool -s eth0 wol um", it works correctly, only awakening the host when a multicast packet arrives which the kernel has indicated an interest in. I tested it with various link-local multicast addresses. No wakey if the address isn't in the recipient's "ip maddr" list. Wakes promptly (but drops the wakeup packet) if the address IS in the recipient's "ip maddr" list.

The default "wol" setting is "u", meaning to wake only on unicast packets. This is defined in the kernel in libertas/if_usb.c: if_usb_setup_firmware(). It could either be changed there, or be changed by invoking ethtool later (perhaps in an rc script, or from ohm). Best to fix it in the driver, rather than require startup scripts to reconfigure it, I think. Proposed patch:

- lbs_host_sleep_cfg(priv, EHS_WAKE_ON_UNICAST_DATA);
+ lbs_host_sleep_cfg(priv, EHS_WAKE_ON_UNICAST_DATA|EHS_WAKE_ON_MULTICAST_DATA);

  Changed 6 years ago by gnu

Filed #6527 (Mesh does not forward multicast packets (most of the time)), #6528 (Packets that wake the laptop from suspend are often lost), #6529 (Multicast ping over eth0 (not mesh) sometimes produces duplicate packets) for other bugs I discovered while reproducing this bug.

  Changed 6 years ago by gnu

  • blockedby 6993 added

#6993 contains firmware improvements that fix the wake-on-multicast table. If they work, and the driver changes take advantage of them, and Ohm or the driver sets the right bits based on how we suspend, then system-level testing should be finally able to close this bug.
