Ticket #3732 (new defect)

Opened 7 years ago

Last modified 2 years ago

arp broadcasts don't wake up autosuspended laptop

Reported by: wad Owned by: cjb
Priority: high Milestone: 9.1.0-cancelled
Component: power manager (OHM) Version:
Keywords: cjbfor9.1.0 Cc: mbletsas, jcardona, wad, rchokshi, dwmw2, dsaxena, gregorio, cjb, carrano, ashish, sascha_silbe, sridhar
Action Needed: code Verified: no
Deployments affected: Blocked By: #6993
Blocking:

Description

An arp request broadcast over the mesh doesn't seem to wake up the laptop. This means that a laptop that is in suspend will cease to be visible to other laptops.

This can be tested as follows:

1.) Set up an ad-hoc network with a couple of laptops.
2.) Start one laptop (the trigger laptop) pinging the other laptop, with an interval of five seconds.
3.) On the second laptop (the test laptop), enter the following script:

count=0
while true; do ((count=count+1)); logger -s $count; echo mem > /sys/power/state; done

After a few seconds, the trigger laptop declares "Destination unreachable". Waking the test laptop up manually (possibly once or twice until it is awake at the right time) restores the connection until arp times out again.

This was discovered in exploring #2621.

Change History

  Changed 7 years ago by wad

I forgot to mention a critical piece of evidence: This problem doesn't happen if the trigger laptop has a static arp mapping for the test laptop.

  Changed 7 years ago by jcardona

The firmware supports wake-up on broadcast, but needs to be properly configured. For that, set EHS_WAKE_ON_BROADCAST_DATA in the hostsleep wake-up conditions.

- config.conditions = EHS_WAKE_ON_UNICAST_DATA;
+ config.conditions = EHS_WAKE_ON_UNICAST_DATA | EHS_WAKE_ON_BROADCAST_DATA;

http://dev.laptop.org/git?p=olpc-2.6;a=blob;f=drivers/net/wireless/libertas/if_usb.c;h=13a7263db59c20d40daf48d276c2cd8e62ffbe20;hb=HEAD#l996

  Changed 7 years ago by wad

  • cc mbletsas added

Is the correct solution to set the interface to unconditionally wake the laptop up on all broadcasts?

follow-up: ↓ 7   Changed 7 years ago by jg

  • cc cscott, jcardona, wad added
  • priority changed from normal to high
  • milestone changed from Untriaged to First Deployment, V1.0

This (wakeup on all broadcasts) isn't a reasonable solution; many networks are way too chatty for that; we can't afford to wake up every second, and go back to sleep.

ARP and IPv6 NDP are similar; I don't know what the usual timeouts are.

I suspect we'll need some sort of sane filtering at the firmware level for key protocols. It also isn't clear to me that we should not forget our IP address after a while anyway; after a long suspend a delay for ARP and/or NDP is reasonable, as it is quite likely the laptop has been moved anyway.

  Changed 7 years ago by mbletsas

This is uncharted territory for the most part. WOL (Wake-on-LAN) functionality is the closest thing; however, at this point what we see cannot be characterized as a bug, since ARP requests are broadcasts and we should not really wake up on broadcasts.

One possible solution is to make the ARP entries timeout very slowly. In that way one host can contact another without having to go through ARP discovery.

M

  Changed 7 years ago by jcardona

I agree that waking up on broadcast is not the solution. But this was reported as a bug and so I provided instructions on how to configure the wireless module to wake up on broadcast traffic if needed.

Task Group V of the 802.11 working group is trying to come up with a standard wake-on-wlan mechanism, but nothing has been standardized yet. The existing implementations of wake-on-wlan are based on pattern matching of incoming traffic (e.g. the host passes its IP address to the wireless card, which will only wake the host on arp packets for that specific IP).

in reply to: ↑ 4   Changed 7 years ago by rchokshi

Replying to jg:

This (wakeup on all broadcasts) isn't a reasonable solution; many networks are way too chatty for that; we can't afford to wake up every second, and go back to sleep. ARP and IPv6 NDP are similar; I don't know what the usual timeouts are. I suspect we'll need some sort of sane filtering at the firmware level for key protocols. It also isn't clear to me that we should not forget our IP address after a while anyway; after a long suspend a delay for ARP and/or NDP is reasonable, as it is quite likely the laptop has been moved anyway.

The advanced firmware that will run on our next-generation SoC, i.e. the 88W8682, will have much more flexibility in software to do nifty filtering based on key protocols (L2 as well as L3), packet types, etc., and wake up the host only under certain pre-configured conditions.

  Changed 7 years ago by rchokshi

  • cc rchokshi added

  Changed 7 years ago by jg

that's nice to know.

Do you have any concrete suggestions of what to do in the meanwhile?

  Changed 7 years ago by rchokshi

Michail's suggestion seems to be the most efficient at the moment.

  Changed 7 years ago by jg

Seems like this should be pretty easy to set.

http://www.ams-ix.net/technical/config_guide/linux_configuration_hints.html

Opinions on what to set arp and ND timeouts to?

  Changed 7 years ago by jg

  • owner changed from jg to cjb
  • summary changed from arp broadcasts don't wake up laptop to arp and NDP timeouts on Linux are bizzarely short.

See also #2639.

  Changed 7 years ago by gnu

Jim, your assertion is incorrect that ipv6 Neighbor Discovery Protocol (NDP) is like ARP. NDP and router advertisements use *multicast* packets, not broadcasts. They are carefully tuned so that they should only wake up the nodes which they are addressed to. (The IPv6 designers learned from the mistakes of IPv4.)

See NDP RFC 4861's use of the "solicited-node multicast address", section 7.2.2 (page 60). This address is computed from the target IPv6 address via the algorithm in IPv6 Address Architecture RFC 4291, page 15-16. It uses a link-local multicast IPv6 address whose low 24 bits match the destination IPv6 address. This is translated to a multicast Ethernet MAC address 0x3333nnnnnnnn whose low 32 bits match the multicast IPv6 address (see IPv6-on-Ethernet RFC 2464, page 4). As long as the chip implements any kind of half-assed or better multicast address matching, the only node that will see this multicast is the node the NDP packet is looking for.
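To make the address mapping described above concrete, here is a minimal Python sketch of the two steps: deriving the solicited-node multicast address from a target IPv6 address, and deriving the Ethernet multicast MAC from that. The function names and the example link-local address are illustrative only and do not come from the ticket.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return ff02::1:ffXX:XXXX, where XX:XXXX are the low 24 bits of addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)


def ethernet_multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Return 33:33 followed by the low 32 bits of the IPv6 multicast address."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    # Hypothetical link-local address, chosen only for illustration.
    target = "fe80::217:c4ff:fe0d:5e11"
    group = solicited_node_multicast(target)
    print(group, ethernet_multicast_mac(group))
```

A chip that filters on the resulting MAC address would only need that one entry (plus any other groups the node has joined) in its multicast table to see neighbor solicitations for this address.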

Therefore, the wireless chip *should* be set up to wake-on-multicast. It should have its multicast address table or hashtable preloaded with the subset of multicasts of interest to this node. All existing Ethernet drivers already preload these addresses, on every Ethernet chip that supports multicast.

This would allow IPv6 Neighbor Discovery to wake the laptop when suspended, fixing this bug for IPv6.

[As for IPv4: I think that, though it's a layer violation, OLPC/Cozybit should implement a wake-on-wlan hack that passes the node's IPv4 address to the chip, and asks it to wake-on-arp-for-that-address. It could be made arbitrarily simple: pass a byte value and an offset within the packet, and only wake on broadcasts where that byte matches. Pointing it at the low byte of the IP address in an ARP packet would only wake you for 1/256th of random broadcast packets; and in real life they are seldom random, so you win bigger. This would eliminate the need for unusual "arp and NDP timeouts" in every host or gateway that wants to initiate a connection to an OLPC laptop.]
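The "byte value plus offset" idea for IPv4 can be sketched as follows. This is only a model of the proposal, not firmware code: it assumes a plain Ethernet II frame layout for simplicity (offsets in a real 802.11 frame differ), and the helper names `build_arp_request` and `wake_on_byte` are made up for this example.

```python
import struct


def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build a broadcast Ethernet II ARP request (14-byte header + 28-byte ARP body)."""
    eth = b"\xff" * 6 + sender_mac + struct.pack("!H", 0x0806)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += sender_mac + sender_ip + b"\x00" * 6 + target_ip
    return eth + arp


# Offset of the low byte of the ARP target IP in an Ethernet II frame:
# 14 (Ethernet header) + 24 (ARP fields before the target IP) + 3.
TARGET_IP_LOW_BYTE = 14 + 24 + 3


def wake_on_byte(frame: bytes, offset: int, value: int) -> bool:
    """Wake only if the byte at `offset` exists and equals `value`."""
    return offset < len(frame) and frame[offset] == value


if __name__ == "__main__":
    mac = bytes.fromhex("0017c40d5e11")
    for_us = build_arp_request(mac, bytes([192, 168, 11, 8]), bytes([192, 168, 11, 14]))
    for_other = build_arp_request(mac, bytes([192, 168, 11, 8]), bytes([192, 168, 11, 15]))
    print(wake_on_byte(for_us, TARGET_IP_LOW_BYTE, 14))
    print(wake_on_byte(for_other, TARGET_IP_LOW_BYTE, 14))
```

A single-byte match like this still wakes the host for unrelated broadcasts that happen to carry the same byte at that offset, which is the 1/256 trade-off described above.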

follow-up: ↓ 16   Changed 6 years ago by gnu

Rumor on IRC is that wad wants a way to wake up every XO "without having to preload the ARP table" (due to this bug, sending a broadcast ping doesn't work). Instead, use multicast to the "all IPv6 nodes" link-level multicast address:

ping6 -I msh0 ff02::1

That should wake all nodes on the mesh out of suspend. (I'm not sure how the mesh implements bcast/multicast on the three meshes on three different channels.)

  Changed 6 years ago by cjb

(due to this bug, sending a broadcast ping doesn't work)

This isn't true -- we are using a v4 ping to the broadcast address, and it is correctly waking up all of the laptops associated to the AP.

in reply to: ↑ 14   Changed 6 years ago by gnu

ping6 -I msh0 ff02::1

That should wake all nodes on the mesh out of suspend.

I just tested this from a B1 @623 to a B4 @616. It doesn't work. When the B4 is awake, it responds fine to those pings. When suspended, it does not wake up for them. In addition, it does not wake up when pinged at its own link-local IPv6 address (the address that the initial ping responses came from).

#4616 reports this possible bug.

  Changed 6 years ago by gnu

  • cc dwmw2 added
  • component changed from distro to kernel

I think this bug should be retitled back to: arp broadcasts don't wake up laptop

Then we can fix it! I found out how.

The firmware interface document, and the Unix "ethtool", define a way for us to set up wake-on-arp separately from wake-on-broadcast. We just haven't done the minor work involved. (Note: I don't know that the Libertas firmware *meets* its spec. But since this is part of the basic Ethernet/802.11 wake infrastructure, it probably does. See errata in #2177.)

"ethtool eth0 wol" configures the wake-on-lan for the interface. It allows setting various bits. Our current interface supports the "p" (physical layer event), "u" (unicast), "m" (multicast), and "b" (broadcast) bits. There's an additional "a" (ARP) bit defined, but we don't support it yet.

Page 85-86 of the Marvell spec (http://wiki.laptop.org/images/f/f3/Firmware-Spec-v5.1-MV-S103752-00.pdf) defines the wakeup conditions, passed to the firmware in a CMD_802_11_HOST_SLEEP_CFG command. It has bits defined for b, u, p, and m as above. It also says, "The optional MrvlIETypes_HostSleepFilterType1 TLV is used to further specify the EthType packets that can wake up the host."

Page 131 of the Marvell firmware spec defines the "MrvlIETypes_HostSleepFilterType1 TLV", and gives an example of how to use it for ARP. You can match each packet against three fields: AddrType (bcast/unicast/multicast), EthType (2-byte Ethernet address type), and Ipv4Addr (our IPv4 address). Most of these can be -1 for don't care. If any row matches, the packet causes a wakeup.

I suggest that we support the "a" bit in the Linux interface, and hook it to this filter. When the "a" bit is set, we should configure the chip with this HostSleepFilterType setting:

AddrType 1 (bcast), EthType 0x0806 (ARP), Ipv4Addr (our IP address)

In addition, if we're using this filter, we need to add a row for each of the other packet type bits we have on, e.g.:

AddrType 3 (mcast), EthType 0xFFFF (dontcare), Ipv4Addr 0xFFFFFFFF (dontcare)
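The row-matching behaviour described above ("if any row matches, the packet causes a wakeup") can be modelled in a few lines of Python. This is a sketch of the semantics only, not of the TLV wire format: the class and function names are invented here, and only the AddrType values given in this comment (1 for bcast, 3 for mcast) are used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ETH_ANY = 0xFFFF        # "don't care" for EthType
IP_ANY = 0xFFFFFFFF     # "don't care" for Ipv4Addr
ADDR_BCAST = 1          # value quoted in the comment above
ADDR_MCAST = 3          # value quoted in the comment above


@dataclass(frozen=True)
class FilterRow:
    addr_type: int
    eth_type: int = ETH_ANY
    ipv4_addr: int = IP_ANY

    def matches(self, addr_type: int, eth_type: int, ipv4_addr: Optional[int]) -> bool:
        """A row matches when every field is equal or marked don't-care."""
        return (
            self.addr_type == addr_type
            and self.eth_type in (ETH_ANY, eth_type)
            and self.ipv4_addr in (IP_ANY, ipv4_addr)
        )


def should_wake(rows: Iterable[FilterRow], addr_type: int, eth_type: int,
                ipv4_addr: Optional[int] = None) -> bool:
    """Wake if any row in the table matches the packet."""
    return any(row.matches(addr_type, eth_type, ipv4_addr) for row in rows)


if __name__ == "__main__":
    our_ip = 0xC0A80B0E  # 192.168.11.14, used here only as an example
    table = [
        FilterRow(ADDR_BCAST, 0x0806, our_ip),  # broadcast ARP for our address
        FilterRow(ADDR_MCAST),                  # any multicast
    ]
    print(should_wake(table, ADDR_BCAST, 0x0806, our_ip))       # ARP for us
    print(should_wake(table, ADDR_BCAST, 0x0806, 0xC0A80B08))   # ARP for someone else
    print(should_wake(table, ADDR_MCAST, 0x86DD))               # IPv6 multicast
```

With several IPv4 addresses on the interface, the table would simply carry one ARP row per address.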

I further believe that by default, the driver should configure itself to wake on unicast, multicast, and arp. It shouldn't take external twiddling to make this happen. This will permit ordinary IPv4 and IPv6 TCP and UDP packets to wake the laptop (but not non-ARP broadcasts). Doing this requires sending a new Host Sleep setup command whenever our IPv4 address changes. If this interface has several IPv4 addresses, then we'd add several arp rows to this table. (We may need to put our anycast addresses in here too, if they get arp'd for; I don't know how that currently works.)

  Changed 6 years ago by gnu

  • blockedby 6993 added

The new firmware in #6993 adds the wake-on-signature feature that's mentioned in the comment above. This could be used by our driver, initscripts, and network manager to permit wake-on-incoming-ARP. I don't believe that we have all of that infrastructure set up to use the new feature, though.

Doing that would fix this "autosuspended laptop doesn't wake on incoming ARP packets" bug, once and for all.

I believe it's mostly a driver issue; whenever an IPv4 address is added to the interface, it needs to be added to the wake-on-signature table so that we'll wake when an ARP for that address comes in. (Note that we can have several IPv4 addresses at the same time on the same interface.) When an IPv4 address is removed, we'd remove the entry from the wake-on-signature table.

Then during suspend, the kernel would have to know whether we are doing an automatic suspend (for power management, invisibly to the user) or a manual suspend (because the user told us to turn off). A manual suspend should not wake when packets come in (any packets). An automatic suspend should wake when packets come in (any packets for us, including ARP, multicast that we're listening for, or unicast addressed to us).

follow-up: ↓ 20   Changed 6 years ago by cjb

Thanks, John. Would you be willing to verify that it's possible to enable working wake on ARP now? Once we have a recipe, it can easily be added to userspace.

If we think we still need driver support, we should find someone to task that driver work to.

in reply to: ↑ 19   Changed 6 years ago by mbletsas

Replying to cjb:

Thanks, John. Would you be willing to verify that it's possible to enable working wake on ARP now? Once we have a recipe, it can easily be added to userspace. If we think we still need driver support, we should find someone to task that driver work to.

Ricardo has already verified that. He should probably rerun his tests now that the driver changes just made it into the kernel, since we were forced to rewrite the (already working) code.

And we have already tasked Javier to deal with such issues.

M.

  Changed 6 years ago by gnu

  • cc dsaxena added
  • keywords blocks?:8.2.0 added
  • next_action set to code
  • summary changed from arp and NDP timeouts on Linux are bizzarely short. to arp broadcasts don't wake up autosuspended laptop

Wake-on-ARP support is not in the kernel in joyride-2263. ("ethtool eth0" says "Supports: wake-on pumb" but doesn't say it supports "a". Ditto for msh0.) Comment 17 above describes how to implement it in the driver. The lack of this support means that autosuspended XOs don't awaken when another node tries to open a connection to them (e.g. by offering to share something with them, or from an incoming ssh connection).

  Changed 6 years ago by gregorio

  • cc gregorio added

Hi Chris,

Can you code this and get it in ASAP?

I'll leave the blocker status alone for now but my impression is that we would ship without this being resolved...

I think that the design discussion above has converged. Let me know if you still need consensus on what to code.

Thanks,

Greg S

  Changed 6 years ago by cscott

  • cc cjb added; cscott removed
  • owner changed from cjb to dsaxena

From reading the comments above, this bug is in dsaxena's lap at the moment, not cjb's. (Although both will probably have to write a bit of code.)

We could use a better testbed for exercising the "wake on network traffic" functionality, since we've got quite a bit of anecdotal evidence that it is at least slightly broken. Once we fix this bug, it would be nice to be able to exercise the functionality more thoroughly to convince ourselves there aren't other lingering issues. Perhaps a pair of simple scripts to exchange traffic between two XOs, forcing suspend at various times. (What if, for example, the suspend decision happens after the kernel has sent an ACK for a packet but before the data is sent up to userland? Probably better kernel integration is necessary to solve these problems completely.)

follow-up: ↓ 25   Changed 6 years ago by cjb

  • cc carrano added

Hi Greg,

I'd like to get this in too. It sounds like it isn't exposed by ethtool, but I think I saw mention of it exposed by the new packet filter code in a different bug. Ricardo, would you be able to provide a recipe for us to set wake on ARP with it?

in reply to: ↑ 24 ; follow-up: ↓ 27   Changed 6 years ago by carrano

Replying to cjb:

Hi Greg, I'd like to get this in too. It sounds like it isn't exposed by ethtool, but I think I saw mention of it exposed by the new packet filter code in a different bug. Ricardo, would you be able to provide a recipe for us to set wake on ARP with it?

Chris, The recipe would be:

1 - any arp requests via msh0 interface

iwpriv msh0 set_wol_rule "b m 0x50430806.FFFFFFFF@04"

2 - any arp request via eth0

iwpriv eth0 set_wol_rule "b 0x00000806.FFFFFFFF@04"

Details in: http://dev.laptop.org/ticket/6993#comment:2

However, I am very concerned about the current status of the driver with respect to these bits.

  Changed 6 years ago by dsaxena

I've merged the bits for these iwpriv calls into the testing kernel, and they are in the latest joyrides.

in reply to: ↑ 25 ; follow-up: ↓ 28   Changed 6 years ago by dsaxena

Replying to carrano:

Chris, The recipe would be: 1 - any arp requests via msh0 interface iwpriv msh0 set_wol_rule "b m 0x50430806.FFFFFFFF@04" 2 - any arp request via eth0 iwpriv eth0 set_wol_rule "b 0x00000806.FFFFFFFF@04"

Since we merge the filter rules before sending to the HW, we only need one of these if I understand everything correctly.

in reply to: ↑ 27   Changed 6 years ago by dsaxena

Replying to dsaxena:

Replying to carrano:

Chris, The recipe would be: 1 - any arp requests via msh0 interface iwpriv msh0 set_wol_rule "b m 0x50430806.FFFFFFFF@04" 2 - any arp request via eth0 iwpriv eth0 set_wol_rule "b 0x00000806.FFFFFFFF@04"

Since we merge the filter rules before sending to the HW, we only need one of these if I understand everything correctly.

My bad here. We do need both rules.

  Changed 6 years ago by dsaxena

  • owner changed from dsaxena to cjb
  • component changed from kernel to power manager (OHM)

Moving to OHM since the kernel bits (#6993) are in.

follow-up: ↓ 31   Changed 6 years ago by cjb

Hm, this doesn't have much to do with OHM, and I think we *always* want the global wake-on-ARP behavior. Is there a kernel method for turning this on we could use?

in reply to: ↑ 30   Changed 6 years ago by dsaxena

Replying to cjb:

Hm, this doesn't have much to do with OHM, and I think we *always* want the global wake-on-ARP behavior. Is there a kernel method for turning this on we could use?

iwpriv is the interface into the kernel to do this. If not ohm, can we put this into an initscript of some sort?

  Changed 6 years ago by mstone

  • keywords blocks-:8.2.0 relnote added; blocks?:8.2.0 removed
  • milestone changed from 8.2.0 (was Update.2) to 9.1.0

We turned off idlesuspend because we know that it breaks more things than we can fix for 8.2.0; hence, this is just a relnote bug for 8.2.0. Sorry!

  Changed 6 years ago by gregorio

  • keywords blocks-:8.2.0 relnote removed

  Changed 6 years ago by cjb

  • milestone changed from 9.1.0 to 8.2.0 (was Update.2)

I think this is a good 8.2.1 candidate.

  Changed 6 years ago by cjb

  • milestone changed from 8.2.0 (was Update.2) to 8.2.1

  Changed 5 years ago by carrano

I am having trouble setting the WOL filter to wake up the XO on receipt of ARP requests destined for its IP address.

In this example, the XO has eth0 address 192.168.11.14 and is set to sleep.

This is a captured frame that should wake up the XO:

0000  00 00 18 00 ee 58 00 00  10 02 9e 09 a0 00 ac 9c   .....X.. ........
0010  46 00 00 10 4d 7d 34 63  08 02 00 00 ff ff ff ff   F...M}4c ........
0020  ff ff 00 16 01 84 2b 0f  00 17 c4 0d 5e 11 30 d8   ......+. ....^.0.
0030  aa aa 03 00 00 00 08 06  00 01 08 00 06 04 00 01   ........ ........
0040  00 17 c4 0d 5e 11 c0 a8  0b 08 00 00 00 00 00 00   ....^... ........
0050  c0 a8 0b 0e 4d 7d 34 63  
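For readers who want to check the capture by hand, the following Python sketch decodes the frame above. It assumes the layout the bytes suggest: a radiotap header, a 24-byte three-address 802.11 data header without QoS, an LLC/SNAP header, then the ARP body. The function name `decode_arp` and the dictionary keys are made up for this example.

```python
import ipaddress

# Frame bytes copied from the capture above.
FRAME_HEX = """
00 00 18 00 ee 58 00 00 10 02 9e 09 a0 00 ac 9c
46 00 00 10 4d 7d 34 63 08 02 00 00 ff ff ff ff
ff ff 00 16 01 84 2b 0f 00 17 c4 0d 5e 11 30 d8
aa aa 03 00 00 00 08 06 00 01 08 00 06 04 00 01
00 17 c4 0d 5e 11 c0 a8 0b 08 00 00 00 00 00 00
c0 a8 0b 0e 4d 7d 34 63
"""

LLC_SNAP = bytes.fromhex("aaaa03000000")  # LLC/SNAP header with a zero OUI


def decode_arp(frame: bytes) -> dict:
    # Radiotap stores its own total length as a little-endian u16 at bytes 2-3.
    radiotap_len = int.from_bytes(frame[2:4], "little")
    # Assumption: plain 3-address 802.11 data header (24 bytes, no QoS field).
    llc = radiotap_len + 24
    if frame[llc:llc + 6] != LLC_SNAP:
        raise ValueError("LLC/SNAP header not found where expected")
    arp = llc + 8  # 6 bytes of LLC/SNAP plus 2 bytes of EtherType
    return {
        "ethertype": int.from_bytes(frame[llc + 6:llc + 8], "big"),
        "opcode": int.from_bytes(frame[arp + 6:arp + 8], "big"),
        "sender_ip": ipaddress.IPv4Address(frame[arp + 14:arp + 18]),
        "target_ip": ipaddress.IPv4Address(frame[arp + 24:arp + 28]),
        "target_ip_offset_from_llc": arp + 24 - llc,
    }


if __name__ == "__main__":
    frame = bytes.fromhex("".join(FRAME_HEX.split()))
    for name, value in decode_arp(frame).items():
        print(name, value)
```

For this capture the sketch reports an ARP request (EtherType 0x0806, opcode 1) from 192.168.11.8 for 192.168.11.14, with the target address starting 32 (0x20) bytes after the start of the LLC header. The ticket does not state what base or starting point the rule offsets use, so any mapping from these positions to rule offsets is an inference.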

As per my understanding of the signature based wol filter, the rule to achieve so would be:

iwpriv eth0 set_wol_filter "b 00000806@4 && 0ca80b0e@20"

Which *seems* to work, but it does not do what we want, as I explain below:

The following works (wakes up the XO)

iwpriv eth0 set_wol_filter "b 00000806@4"

While the following doesn't work (does not wake up the XO as it should)

iwpriv eth0 set_wol_filter "b 0ca80b0e@20"

Javier,

Apart from the above, I still suspect that there is an issue with the evaluation of the expression with AND (&&), since:

The following works:

iwpriv eth0 set_wol_filter "b 00000806@4 && 0ca80b0e@20"

While the following does not:

iwpriv eth0 set_wol_filter "b 0ca80b0e@20 && 00000806@4"

It seems that if the first term is true, the second is not evaluated. In short, 'iwpriv eth0 set_wol_filter "b 00000806@4 && 0ca80b0e@20"' is actually waking up the XO to *any* ARP request (tests confirmed this).

  Changed 5 years ago by mstone-xmlrpc

  • keywords cjbfor9.1.0 added
  • milestone changed from 8.2.1 to 9.1.0

Pushing out to 9.1.0, per edmcnierney's request.

follow-up: ↓ 39   Changed 5 years ago by carrano

Repeated tests reported in the above comment but this time with firmware release 22.p23. (btw there is a consistent typo in this comment: the iwpriv commands are {set,get,reset}_wol_rule, not {set,get,reset}_wol_filter)

The results were not good either. Now the && (AND) operator seems completely broken.

When the host receives an arp for its IP addr (192.168.11.14) it will wake up if configured with the following filters:

iwpriv eth0 set_wol_filter "b 00000806@4"

or

iwpriv eth0 set_wol_filter "b 0ca80b0e@20"

But will not wake up for:

iwpriv eth0 set_wol_filter "b 00000806@4 && 0ca80b0e@20"

or

iwpriv eth0 set_wol_filter "b 0ca80b0e@20 && 00000806@4"

(so it's a little different from what we had with 22.p20, as discussed here)

in reply to: ↑ 38 ; follow-up: ↓ 40   Changed 5 years ago by ashish

  • cc ashish added

Replying to carrano:

Repeated tests reported in the above comment but this time with firmware release 22.p23. (btw there is a consistent typo in this comment: the iwpriv commands are {set,get,reset}_wol_rule, not {set,get,reset}_wol_filter) The results were not good either. Now the && (AND) operator seems completely broken. When the host receives an arp for its IP addr (192.168.11.14) it will wake up if configured with the following filters:

iwpriv eth0 set_wol_filter "b 00000806@4"

or

iwpriv eth0 set_wol_filter "b 0ca80b0e@20"

But will not wake up for:

iwpriv eth0 set_wol_filter "b 00000806@4 && 0ca80b0e@20"

or

iwpriv eth0 set_wol_filter "b 0ca80b0e@20 && 00000806@4"

(so it's a little different from what we had with 22.p20, as discussed here)

Could you please confirm if the rule for IP 192.168.11.14 was

c0a80b0e@20

instead of

0ca80b0e@20

(which is anyway incorrect)? I remember we did verify this simple test. Thanks

in reply to: ↑ 39 ; follow-up: ↓ 42   Changed 5 years ago by carrano

Replying to ashish: [...]

Could you please confirm if the rule for IP 192.168.11.14 was c0a80b0e@20 instead of 0ca80b0e@20 (which is anyway incorrect)? I remember we did verify this simple test. Thanks

You're right, of course: c0, not 0c. Stupid mistake.
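Since the hex pattern is easy to mistype, one way to avoid this class of mistake is to generate it from the dotted-quad address. A minimal Python sketch, where the helper name `ipv4_rule_pattern` is invented for this example:

```python
import ipaddress


def ipv4_rule_pattern(addr: str) -> str:
    """Return the 8-hex-digit, network-byte-order form of an IPv4 address."""
    return ipaddress.IPv4Address(addr).packed.hex()


if __name__ == "__main__":
    # The address of the test XO used in the comments above.
    print(ipv4_rule_pattern("192.168.11.14"))
```

The output can then be pasted into the pattern part of a rule term in place of a hand-converted value.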

Also, I repeated the tests that uncovered the bounding limit issue with good results.

  • iwpriv msh0 reset_wol_rule; iwpriv msh0 set_wol_rule "u ef@40" will make the XO wake up for "ping -p ef" [OK]
  • iwpriv msh0 reset_wol_rule; iwpriv msh0 set_wol_rule "u ef@60" will *not* make the XO wake up for "ping -p ef" [OK]
  • iwpriv msh0 reset_wol_rule; iwpriv msh0 set_wol_rule "u ef@60 && ef@40" will *not* make the XO wake up for "ping -p ef" [OK]
  • iwpriv msh0 reset_wol_rule; iwpriv msh0 set_wol_rule "u ef@40 && ef@60" will *not* make the XO wake up for "ping -p ef" [OK]

So it seems that the "Boundary condition for WOL rule has been fixed".
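The behaviour these four tests expect from the && operator can be written down as a small reference model: every term must match, the order of terms must not matter, and a term whose offset falls outside the packet never matches. The Python sketch below is only that model. The 56-byte buffer is an arbitrary stand-in for a ping payload filled with 0xef, and nothing here reflects how the firmware actually counts offsets or why ef@60 does not match on the real hardware.

```python
from typing import Iterable, Tuple

Term = Tuple[bytes, int]  # (pattern bytes, offset into the buffer)


def term_matches(payload: bytes, pattern: bytes, offset: int) -> bool:
    """A term matches only if the whole pattern lies inside the payload."""
    end = offset + len(pattern)
    return end <= len(payload) and payload[offset:end] == pattern


def rule_matches(payload: bytes, terms: Iterable[Term]) -> bool:
    """AND semantics: all terms must match, regardless of their order."""
    return all(term_matches(payload, pattern, offset) for pattern, offset in terms)


if __name__ == "__main__":
    payload = b"\xef" * 56  # hypothetical buffer, chosen only for illustration
    ef40 = (b"\xef", 40)
    ef60 = (b"\xef", 60)
    print(rule_matches(payload, [ef40]))
    print(rule_matches(payload, [ef60]))
    print(rule_matches(payload, [ef60, ef40]))
    print(rule_matches(payload, [ef40, ef60]))
```

A model like this gives a fixed expectation to compare firmware releases against, including the order-dependence seen with the earlier release.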

  Changed 5 years ago by ashish

I will confirm the other problems reported here and post an update. Thanks

in reply to: ↑ 40   Changed 5 years ago by carrano

Also, I repeated the tests that uncovered the bounding limit issue with good results.

To clarify, this means that firmware release 22.p23 fixed the boundary condition for WOL rules.

  Changed 4 years ago by sascha_silbe

  • cc sascha_silbe added

  Changed 2 years ago by sridhar

  • cc sridhar added