Ticket #6869 (new defect)

Opened 6 years ago

Last modified 6 years ago

Firmware release - 5.110.22.p9

Reported by: carrano
Owned by: ashish
Priority: high
Milestone:
Component: distro
Version:
Keywords:
Cc: mbletsas, mstone
Action Needed:
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

FW release W8388-5.110.22.p9

New Features

1. Firmware ready event.

The firmware sends event 0x30 (FIRMWARE_READY) once it has been downloaded from the host and is up and running.
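On the host side, this event lets the driver wait until the firmware is actually running instead of guessing. Below is a minimal user-space sketch of that handshake, assuming a driver that waits for event 0x30 but gives up after a fixed timeout (200 ms is the figure mentioned later in this ticket) so that older firmware, which never sends the event, still works. The FirmwareLoader class and its methods are hypothetical and are not the libertas driver API.

```python
import threading

FIRMWARE_READY = 0x30  # event code from the release notes
READY_TIMEOUT_S = 0.2  # assumed 200 ms fallback, as discussed in this ticket


class FirmwareLoader:
    """Hypothetical model of a driver waiting for the firmware after download."""

    def __init__(self):
        self._ready = threading.Event()

    def on_firmware_event(self, event_code):
        # Called from the (hypothetical) event-receive path for every firmware event.
        if event_code == FIRMWARE_READY:
            self._ready.set()

    def wait_until_ready(self, timeout=READY_TIMEOUT_S):
        # True if FIRMWARE_READY arrived in time; False if the timeout expired
        # (for example, older firmware that never sends the event).
        return self._ready.wait(timeout)


if __name__ == "__main__":
    loader = FirmwareLoader()
    # Simulate the firmware announcing itself 50 ms after the download.
    threading.Timer(0.05, loader.on_firmware_event, args=(FIRMWARE_READY,)).start()
    if loader.wait_until_ready():
        print("FIRMWARE_READY received; continuing initialization")
    else:
        print("no FIRMWARE_READY within the timeout; continuing anyway")
```

Falling back to the timeout rather than failing keeps the driver compatible with firmware releases that predate the event.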

Bug Fixes

1. OLPC ticket 6589.

http://dev.laptop.org/ticket/6589: fixed a leak of the timers used for mesh routing when the mesh is stopped.
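The firmware source is not part of this ticket, so the following is only a user-space sketch of the general pattern behind this kind of fix. It assumes a hypothetical MeshRouter in which every route owns an expiry timer: refreshing a route must cancel the old timer, and stopping the mesh must cancel every timer that is still armed.

```python
import threading


class MeshRouter:
    """Hypothetical model: every mesh route owns an expiry timer."""

    def __init__(self):
        self._timers = {}  # destination -> threading.Timer
        self._lock = threading.Lock()

    def add_route(self, dest, lifetime_s):
        timer = threading.Timer(lifetime_s, self._expire, args=(dest,))
        timer.daemon = True
        with self._lock:
            old = self._timers.pop(dest, None)
            if old is not None:
                old.cancel()  # refreshing a route must not orphan the old timer
            self._timers[dest] = timer
        timer.start()

    def _expire(self, dest):
        # Runs in the timer's own thread; only drop the entry if it is still ours.
        with self._lock:
            if self._timers.get(dest) is threading.current_thread():
                del self._timers[dest]

    def stop(self):
        # Mesh stop: cancel every armed timer. Skipping this step leaves
        # timers alive after the mesh is gone, i.e. a leak.
        with self._lock:
            timers = list(self._timers.values())
            self._timers.clear()
        for timer in timers:
            timer.cancel()

    def active_timers(self):
        with self._lock:
            return len(self._timers)
```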

Note

This firmware is based on release 5.110.22.p8 (#6854).

Attachments

usb8388.bin (120.1 kB) - added by carrano 6 years ago.
Firmware release - 5.110.22.p9

Change History

Changed 6 years ago by carrano

Firmware release - 5.110.22.p9

  Changed 6 years ago by carrano

  • owner changed from dgilmore to ashish

Initial tests with this firmware on build 702.

There are some problems in a mesh scenario. The symptoms, without further analysis, are:

- Sometimes the interfaces (both msh0 and eth0) are started with MAC address FF:FF:FF:FF:FF:FF.

- Even if the interface is started with the correct MAC address and a link-local IP is correctly assigned, nodes do not appear in each other's mesh view (though pinging between them is possible).
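The first symptom can be checked from user space: Linux publishes each interface's current MAC address in /sys/class/net/<iface>/address. The sketch below flags the all-ones address; the interface names eth0 and msh0 come from this ticket, while the helper names are illustrative.

```python
from pathlib import Path

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def is_broadcast_mac(mac):
    """True if the address is the all-ones MAC seen after a failed initialization."""
    return mac.strip().lower() == BROADCAST_MAC


def interface_mac(iface):
    # Linux exposes each interface's current MAC address in sysfs.
    return Path(f"/sys/class/net/{iface}/address").read_text().strip()


if __name__ == "__main__":
    for iface in ("eth0", "msh0"):
        try:
            mac = interface_mac(iface)
        except OSError:
            print(f"{iface}: not present")
            continue
        status = "BAD (initialization failed?)" if is_broadcast_mac(mac) else "ok"
        print(f"{iface}: {mac} {status}")
```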

follow-up: ↓ 3   Changed 6 years ago by ashish

Is this with the driver that waits either for the FIRMWARE_READY event (0x30) or for 200ms?

in reply to: ↑ 2   Changed 6 years ago by carrano

Replying to ashish:

> Is this with the driver which either waits for FIRMWARE_READY event (0x30) or 200ms?

No, but the tests with p250 were not either. Does this firmware not work unless we change the driver?

follow-up: ↓ 5   Changed 6 years ago by ashish

I don't know why you did not see the problem with p250. I would strongly vote for a driver fix. Could you please try something like:

rmmod usb8xxx
modprobe usb8xxx

And then start NM/your test? If my understanding is correct and the problem is caused by the firmware download, the above should work.

in reply to: ↑ 4 ; follow-up: ↓ 6   Changed 6 years ago by carrano

Replying to ashish:

> I don't know why you did not see problem with p250.

Me neither. But it is a statistical fact I cannot ignore. Isn't it possible that we are still missing part of the puzzle?

> I would strongly vote for a driver fix.

And I agree (and am trying to get this into a build).

> Could you please try something like
>
> rmmod usb8xxx
> modprobe usb8xxx
>
> And start NM/your test? If my understanding is correct and problem is because of firmware download, the above should work.

This fixes the first problem, which is clearly an initialization failure. But it doesn't help with the second one (mesh view), which seems unrelated to #6589 and looks like a side effect of the firmware changes. If you think this firmware release is worth more time, I'm happy to analyze the problem further and post the results here.

in reply to: ↑ 5 ; follow-up: ↓ 7   Changed 6 years ago by ashish

Replying to carrano:

> This fixes the first problem, which is clearly a failure in initialization. But it doesn't help with the second (mesh view). That seems unrelated to #6589. It seems a side effect of the changes in the firmware. If you judge that this firmware release is worth more time investment, I'm happy to do further analysis of the problem and post here.

Thanks! I believe this is a different problem. It would be very useful if you could provide the following information:

1. Is the mesh view based on mesh beacons and probe responses?
2. Could you please try iwpriv msh0 bcn_control to see whether beacons are enabled?
3. Also, using the probe response API, could you check whether the device is sending probe responses?

in reply to: ↑ 6   Changed 6 years ago by carrano

> Thanks! I believe this is other problem, it will be very useful if you can provide the following information.
> 1. Does mesh view based on mesh beacons and probe response?

The mesh view is based on presence information for XOs and on beacons for APs.

> 2. Could you please try out iwpriv msh0 bcn_control to see if beacons are enabled.
> 3. And, also with the help of probe response API whether device is sending probe response.

Sniffing reveals that the failing nodes are sending out beacons and probe requests, and are responding with probe responses normally.

(more on this soon)

  Changed 6 years ago by mbletsas

Just to be clear: XO presence doesn't use mesh beacons in any manner.

  Changed 6 years ago by carrano

Sniffing reveals that the nodes are *not* sending out presence information, though they are perfectly capable of sending and receiving multicast frames (tested with synthetic traffic generators and multicast pings).

On the failing nodes, telepathy-salut is not running, and that seems to be the cause. So what is causing salut to crash? (sugar-presence-service is still running.) I will continue investigating.

  Changed 6 years ago by carrano

On a failing node, avahi-browse returns only information about the host itself:

+ msh0 IPv4 xo-0C-E8-EB [00:17:c4:0c:e8:eb]               Workstation          local
+ msh0 IPv4 xo-0C-E8-EB                                   SSH Remote Terminal  local

Another node (one that is not failing) displays the first (failing) node in its avahi-browse entries:

+ msh0 IPv4 a5800894@xo-05-2C-AE                          iChat Presence       local
+ msh0 IPv4 8701e7af@xo-05-23-02                          iChat Presence       local
+ msh0 IPv4 bd500be5@xo-05-2A-79                          iChat Presence       local
+ msh0 IPv4 xo-05-23-02                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-05-2A-79                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-0C-E8-EB                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-05-23-02 [00:17:c4:05:23:02]               Workstation          local
+ msh0 IPv4 xo-05-2A-79 [00:17:c4:05:2a:79]               Workstation          local
+ msh0 IPv4 xo-0C-E8-EB [00:17:c4:0c:e8:eb]               Workstation          local
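Comparing the two listings, the failing node xo-0C-E8-EB shows up with Workstation and SSH Remote Terminal entries but with no iChat Presence entry. "iChat Presence" is how avahi labels the _presence._tcp service type used by link-local XMPP, which telepathy-salut advertises, so its absence is consistent with salut not running. The sketch below automates that comparison. It assumes the column layout pasted above (fields separated by at least two spaces); the function names are illustrative.

```python
import re
from typing import NamedTuple, Optional

# Output pasted above from the non-failing node ('+' lines of avahi-browse).
SAMPLE = """\
+ msh0 IPv4 a5800894@xo-05-2C-AE                          iChat Presence       local
+ msh0 IPv4 8701e7af@xo-05-23-02                          iChat Presence       local
+ msh0 IPv4 bd500be5@xo-05-2A-79                          iChat Presence       local
+ msh0 IPv4 xo-05-23-02                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-05-2A-79                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-0C-E8-EB                                   SSH Remote Terminal  local
+ msh0 IPv4 xo-05-23-02 [00:17:c4:05:23:02]               Workstation          local
+ msh0 IPv4 xo-05-2A-79 [00:17:c4:05:2a:79]               Workstation          local
+ msh0 IPv4 xo-0C-E8-EB [00:17:c4:0c:e8:eb]               Workstation          local
"""


class Entry(NamedTuple):
    iface: str
    proto: str
    name: str
    service_type: str
    domain: str


def parse_avahi_line(line: str) -> Optional[Entry]:
    """Parse one '+' (service added) line; return None for anything else."""
    parts = line.split(None, 3)  # '+', interface, protocol, rest
    if len(parts) < 4 or parts[0] != "+":
        return None
    fields = re.split(r"\s{2,}", parts[3].strip())  # name, type, domain
    if len(fields) != 3:
        return None
    return Entry(parts[1], parts[2], *fields)


def host_of(name: str) -> str:
    # Names look like 'user@host', 'host [mac]' or plain 'host'.
    return name.split(" [", 1)[0].rsplit("@", 1)[-1]


def hosts_missing_presence(lines) -> set:
    """Hosts that advertise some service but no 'iChat Presence' (_presence._tcp)."""
    seen, with_presence = set(), set()
    for line in lines:
        entry = parse_avahi_line(line)
        if entry is None:
            continue
        host = host_of(entry.name)
        seen.add(host)
        if entry.service_type == "iChat Presence":
            with_presence.add(host)
    return seen - with_presence


if __name__ == "__main__":
    print(sorted(hosts_missing_presence(SAMPLE.splitlines())))
```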

follow-up: ↓ 12   Changed 6 years ago by mbletsas

Well, since we have established that telepathy-salut is not running on the failing nodes, we know where the problem lies. The question now is what makes that process go away. Does the wireless firmware pass traffic up that it didn't use to? Does 5.110.22.p8 have the same issue?

M.

in reply to: ↑ 11   Changed 6 years ago by carrano

> Does 5.110.22.p8 have the same issue?

Yes. 5.110.22.p8 also shows the mesh view/avahi issue.

follow-up: ↓ 19   Changed 6 years ago by carrano

Summary

This firmware release has two failure modes:

1 - It sometimes does not start the wireless interface correctly; the symptom is a MAC address of ff:ff:ff:ff:ff:ff. This seems to be #6589 manifesting in a different way. A reboot or a reload of the usb8xxx module fixes it (as expected from our experience with #6589).

2 - It does not display neighbors in the mesh view, which seems related to avahi-browse not returning the expected data. We have established that the nodes are still able to send and receive multicast frames.

We also know that:

- Firmware release 22.p8 fails in the second mode but not in the first.

  Changed 6 years ago by carrano

In light of the above summary, I believe we should not continue testing this firmware version, since its baseline is compromised. The strategy of skipping 22.p8 was not successful.

I suggest we go back to 22.p8 to investigate the avahi/mesh view issue. Once we have found and fixed the root cause, we will need to release and test another version, and only then add the "firmware ready" event.

M, what do you say?

  Changed 6 years ago by carrano

This firmware version, used together with the driver patch described in http://dev.laptop.org/ticket/6589#comment:16, seems to fix #6589 effectively.

follow-up: ↓ 17   Changed 6 years ago by carrano

Tests with the patch in http://dev.laptop.org/attachment/ticket/6818/mesh_mcast.patch confirmed that the root cause for the mesh view problem (#6818) is the multicast filter not being populated by the driver.
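A missing multicast filter entry can break avahi while leaving other multicast traffic working, because the filter matches Ethernet MAC addresses rather than IP groups. avahi speaks mDNS on the IPv4 group 224.0.0.251, and RFC 1112 maps an IPv4 group to a MAC by placing the low-order 23 bits of the group address after the prefix 01:00:5e. This ticket does not say which entries were missing; the sketch below only computes the mapping with Python's standard ipaddress module and does not reproduce the driver patch.

```python
import ipaddress

MDNS_GROUP = "224.0.0.251"  # IPv4 mDNS group used by avahi


def ipv4_multicast_mac(group):
    """Return the Ethernet MAC an IPv4 multicast group maps to (RFC 1112)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits of the group are kept
    octets = (0x01, 0x00, 0x5E, low23 >> 16, (low23 >> 8) & 0xFF, low23 & 0xFF)
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    for group in (MDNS_GROUP, "224.0.0.1"):
        # 224.0.0.251 -> 01:00:5e:00:00:fb, 224.0.0.1 -> 01:00:5e:00:00:01
        print(group, "->", ipv4_multicast_mac(group))
```

Because 5 bits of the group address are discarded, 32 IPv4 groups share each MAC, so the filter is a coarse first pass and the IP stack still discards groups it has not joined.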

in reply to: ↑ 16   Changed 6 years ago by carrano

Replying to carrano:

> Tests with the patch in http://dev.laptop.org/attachment/ticket/6818/mesh_mcast.patch confirmed that the root cause for the mesh view problem (#6818) is the multicast filter not being populated by the driver.

After some time (I was not watching, but it was less than an hour), the mesh view was again displaying no XOs. We have to keep investigating #6818.

  Changed 6 years ago by ashish

Currently, firmware 5.110.22.p8/p9 does not support more than 8 multicast MAC addresses. Is it possible that, at any given point in time, more than 8 multicast addresses are required?
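One common way drivers cope with a hardware multicast filter that is smaller than the list of subscribed groups is to fall back to accepting all multicast frames instead of programming a truncated list. The sketch below models that decision using the 8-entry limit stated in this comment. The RxFilter type and program_multicast function are hypothetical, and this ticket does not establish whether the libertas driver or this firmware behaves this way.

```python
from dataclasses import dataclass, field

# Limit stated in this ticket for firmware 5.110.22.p8/p9.
FW_MAX_MCAST = 8


@dataclass
class RxFilter:
    """Hypothetical receive-filter settings a driver would push to the firmware."""
    all_multicast: bool = False
    mcast_list: list = field(default_factory=list)


def program_multicast(group_macs, allmulti=False, limit=FW_MAX_MCAST):
    """Build the filter for the interface's current multicast MAC list.

    If the list does not fit in the firmware table, accept all multicast
    frames instead of programming a truncated (and therefore lossy) list.
    """
    # Normalize case and drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(mac.lower() for mac in group_macs))
    if allmulti or len(unique) > limit:
        return RxFilter(all_multicast=True)
    return RxFilter(all_multicast=False, mcast_list=unique)


if __name__ == "__main__":
    print(program_multicast(["01:00:5e:00:00:fb"]))
    print(program_multicast([f"01:00:5e:00:00:{i:02x}" for i in range(9)]))
```

Accepting all multicast costs some host processing on a busy mesh, but it avoids silently dropping a group, such as the mDNS one that avahi depends on.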

in reply to: ↑ 13 ; follow-up: ↓ 20   Changed 6 years ago by ashish

Replying to carrano:

> Summary
>
> This firmware release fails in two modes:
>
> 1 - It sometimes does not start the wireless interface correctly. Symptom is the mac address equals to ff:ff:ff:ff:ff:ff. It seems that this is #6589 manifesting in a different way. A reboot or a reload in the usb8xxx module will fix it (expected from our experience with #6589).

Firmware version 5.110.22.p10 fixes the above.

> 2 - It does not display neighbors in the mesh view. Which seems related to the fact that avahi-browse does not return the expected data. It is established that the nodes are still able to send out and receive multicast frames.
>
> We also know that:
>
> - Firmware release 22.p8 fails in the second and not in the first.

in reply to: ↑ 19   Changed 6 years ago by carrano

Replying to ashish:

> Firmware version 5.110.22.p10 fixes above.

This firmware release is being skipped in favor of 22.p10 (#6931).

  Changed 6 years ago by gregorio

  • milestone deleted

Milestone Never Assigned deleted
