Ticket #5746 (closed defect: fixed)

Opened 7 years ago

Last modified 6 years ago

msh0 interface with a bogus name (msh0_rename)

Reported by: carrano Owned by: cscott
Priority: normal Milestone: 8.2.0 (was Update.2)
Component: distro Version:
Keywords: review? Cc: Blaketh, mbletsas, carrano, dwmw2, cscott, wad
Action Needed: never set Verified: no
Deployments affected: Blocked By:
Blocking:

Description

In joyride-1477 (and also in older joyride versions) the msh0 interface is reported (baptised) as msh0_rename.

This will, if nothing else, break some scripts.

Attachments

0002-Disable-udev-renaming-of-network-interfaces.patch (1.3 kB) - added by dsd 7 years ago.
fix (implemented in pilgrim stream)

Change History

  Changed 7 years ago by jg

  • milestone changed from Never Assigned to Future Release

  Changed 7 years ago by dwmw2

  • cc dwmw2 added
  • component changed from wireless to distro

this is udev

  Changed 7 years ago by dsd

  • keywords review? added
  • milestone changed from Future Release to Retriage, Please!

The problem is that the newer udev contains some new rules for persistent interface naming based on MAC address, and tries to rename all network interfaces.

udev generates the following rule for the XO: SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:17:c4:05:26:1f", ATTR{type}=="1", NAME="eth0"

This functionality does not play too nicely with our case, where we have 2 interfaces with the same MAC address.

As per the above rule, udev tries to rename both eth0 and msh0 to "eth0". Only one can win, obviously. msh0 fails to be renamed. I'm not sure why udev adds the _rename suffix before the final rename, but presumably there is a reason.

The presence of msh0_rename rather than msh0 is breaking NetworkManager, see ticket #5931.

I suggest adding a simple udev rule to disable network interface renaming.
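The collision described in this comment can be sketched as a toy model in Python. This is not udev's code: the function `apply_rule` and its data layout are hypothetical, and the `_rename` suffix is modelled only because that is the name observed in this ticket, not because the reason udev adds it is known here.

```python
# Toy model of MAC-keyed persistent naming; not udev's actual logic.
def apply_rule(interfaces, rule_mac, new_name):
    """Try to rename every interface whose MAC matches rule_mac to new_name."""
    taken = {name for name, _ in interfaces}
    result = []
    for name, mac in interfaces:
        if mac == rule_mac and name != new_name:
            if new_name in taken:
                # Target name is already in use, so the rename cannot finish
                # and the interface is left with a temporary-looking name.
                result.append((name + "_rename", mac))
                continue
            taken.discard(name)
            taken.add(new_name)
            result.append((new_name, mac))
        else:
            result.append((name, mac))
    return result

XO_MAC = "00:17:c4:05:26:1f"  # MAC from the generated rule quoted above
interfaces = [("eth0", XO_MAC), ("msh0", XO_MAC)]
renamed = apply_rule(interfaces, XO_MAC, "eth0")
print(renamed)
```

Because the rule is keyed on the MAC address alone, it cannot tell the two interfaces apart, so only one of them can end up as eth0.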

Changed 7 years ago by dsd

fix (implemented in pilgrim stream)

  Changed 7 years ago by jg

  • cc cscott added
  • owner changed from dwmw2 to dgilmore
  • milestone changed from Retriage, Please! to Update.1

  Changed 7 years ago by jg

  • blocking 5153 added

  Changed 7 years ago by cscott

Dave, can we get your review of this patch? Does this seem reasonable to you?

Surely there are other aliased devices in Linux Land -- how are those handled? (But maybe we're the only ones who have aliased devices which share a single MAC?)

  Changed 7 years ago by cscott

From my review, this patch is appropriate for ship.2.2 (if necessary), but I'm not convinced that it is necessary for update.1/joyride, pending word from Dave on how network device naming "should" work. We need to hear from Dave.

  Changed 7 years ago by cscott

Patch applied for joyride (and in the master branch) since the scuttlebutt I'm hearing is that Dave approves (at least in principle) of fixing this in udev.

  Changed 7 years ago by yani

  • blocking 5153 removed

(In #5153) I don't think this is related to msh0/msh0_rename (5746)

When applying David's patch

ifconfig eth0 down
echo $TRAFFIC_MASK > /sys/class/net/eth0/lbs_rtap

everything works fine.

However, ifconfig eth0 down still sometimes takes forever to complete.

Ricardo, can you give some hints here, so we can finally update the script?

  Changed 7 years ago by yani

Replying to cscott:

Patch applied for joyride (and in the master branch) since the scuttlebutt I'm hearing is that Dave approves (at least in principle) of fixing this in udev.

In 1540 it seems to have been fixed.

It must also be mentioned that there were some occasions in 1537 where some XOs did NOT have msh0_rename, whereas others did. The problem appeared to be intermittent.

  Changed 7 years ago by carrano

Replying to yani:

(In #5153) I don't think this is related to msh0/msh0_rename (5746)

When applying David's patch

ifconfig eth0 down
echo $TRAFFIC_MASK > /sys/class/net/eth0/lbs_rtap

everything works fine.

Yani: that's what I meant. The change in the rtap initialization mechanism broke the script, and it would still be broken if msh0 were used instead of eth0. So there are (were) other scripts broken until the udev fix gets (got) in.

  Changed 7 years ago by dwmw2

This kind of udev rule will allow you to distinguish between eth%d and msh%d interfaces and stop trying to assign them both the same name...

# USB device 0x1286:0x2001 (usb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:43:28:0a:d9", ATTR{type}=="1", ATTR{lbs_rtap}=="0x0", NAME="lbsethB"
# USB device 0x1286:0x2001 (usb)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:50:43:28:0a:d9", ATTR{type}=="1", ATTR{anycast_mask}=="0x0", NAME="lbsmshB"
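How rules of this kind tell the two interfaces apart can be sketched with a toy matcher in Python. This is not udev: the function `first_match` and the attribute dictionaries are hypothetical, and the sketch assumes (as the two rules above imply) that only the eth interface exposes `lbs_rtap` and only the msh interface exposes `anycast_mask`.

```python
# Toy matcher for the two rules quoted above; udev itself is not involved.
MAC = "00:50:43:28:0a:d9"

# Each rule: (ATTR conditions that must all hold, NAME to assign).
RULES = [
    ({"address": MAC, "type": "1", "lbs_rtap": "0x0"}, "lbsethB"),
    ({"address": MAC, "type": "1", "anycast_mask": "0x0"}, "lbsmshB"),
]

def first_match(attrs, rules=RULES):
    """Return the NAME of the first rule whose every condition holds."""
    for conditions, name in rules:
        if all(attrs.get(key) == value for key, value in conditions.items()):
            return name
    return None  # no rule applies; the kernel-assigned name would be kept

# Hypothetical sysfs attribute sets: same MAC, different extra attribute.
eth_attrs = {"address": MAC, "type": "1", "lbs_rtap": "0x0"}
msh_attrs = {"address": MAC, "type": "1", "anycast_mask": "0x0"}

print(first_match(eth_attrs))  # lbsethB
print(first_match(msh_attrs))  # lbsmshB
```

The extra attribute condition is what breaks the tie that a MAC-only rule cannot.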

  Changed 7 years ago by bernie

Replying to cscott:

Patch applied for joyride (and in the master branch) since the scuttlebutt I'm hearing is that Dave approves (at least in principle) of fixing this in udev.

I also have it in olpc-utils.

Please drop it from pilgrim to reduce the number of hacks outside rpm packages.

  Changed 7 years ago by bernie

  • owner changed from dgilmore to ApprovalForUpdate

Please tag olpc-utils-0.67-1 in Update.1

  Changed 7 years ago by jg

  • owner changed from ApprovalForUpdate to dgilmore

approved.

  Changed 7 years ago by cscott

dwmw2, should we be using your rule instead of the one in the attached 0002 patch?

  Changed 7 years ago by dwmw2

The best fix is to patch whatever _creates_ those rules, make it recognise libertas devices, and make it create the rules accordingly. That way it's consistent with what the system normally does for device naming, and will work nicely for us too.

  Changed 7 years ago by cscott

  • cc wad added
  • owner changed from dgilmore to cscott
  • priority changed from high to normal
  • milestone changed from Update.1 to Update.2

Fixed via the above patch for update.1. Retargeting bug to update.2, for which I'll make sure dwmw2's improved version gets into both the school server and XO builds.

  Changed 7 years ago by Blaketh

  • cc Blaketh added

As the previous fix prevented NetworkManager from learning the existence of wired interfaces, Mike Stone and I have written a slightly less broad rule for udev that disables all rules on msh* interfaces rather than all net interfaces. This rule was released in olpc-utils-0.71-1, and is ready for testing. Sadly this patch will not work well for laptops with more than one mesh interface. Laptops with an Active Antenna may show race conditions unless a more comprehensive fix is created.

My thoughts on a conservative way to write udev rules that will work stably with such devices:

A) Device index should persist according to MAC address.
B) A pair of mesh and ethernet interfaces that share a MAC address should share a device index.
C) It is hard for me to figure out whether a given device is libertas or not, so for each device hwaddr I see I shall generate a pair of rules as follows:
   i) Find the appropriate index i for this MAC address in the persistence store.
   ii) The first rule takes devices named msh* with this MAC and renames them msh$i.
   iii) The second rule takes devices named eth* with this MAC and renames them eth$i.
D) The storage will be in /etc/udevd/rules.d/70-persist-net.rules.
E) The generation shall take place in our fork of /lib/udev/write_net_rules from the udev package in the extras/ directory.

Apologies for the poor formatting, but I am excited to attack more bugs!
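The scheme proposed in this comment can be sketched in Python. Everything here is an assumption made for illustration: the in-memory dictionary stands in for the persistence store, the helper names `index_for` and `rules_for` are hypothetical, and the rule text is modelled on the rules quoted earlier in this ticket with a `KERNEL` match added for the name pattern.

```python
# Sketch of per-MAC index persistence with a msh/eth rule pair per MAC.
RULE = ('SUBSYSTEM=="net", ACTION=="add", ATTR{{address}}=="{mac}", '
        'KERNEL=="{base}*", NAME="{base}{index}"')

def index_for(store, mac):
    """Return the persistent index for mac, allocating the next free one."""
    mac = mac.lower()
    if mac not in store:
        store[mac] = len(store)  # indices are never reused in this sketch
    return store[mac]

def rules_for(store, mac):
    """Generate the msh rule and the eth rule sharing one index for mac."""
    index = index_for(store, mac)
    return [RULE.format(mac=mac.lower(), base=base, index=index)
            for base in ("msh", "eth")]

store = {}  # stand-in for the rules file named in D)
for line in rules_for(store, "00:17:C4:05:26:1F"):
    print(line)
for line in rules_for(store, "00:50:43:28:0A:D9"):
    print(line)
```

In this sketch every new MAC address takes a fresh index and keeps it forever, so the index only ever grows as new devices are seen.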

  Changed 7 years ago by wad

B is a nice idea.

C. All existing Libertas devices have MAC addresses in the following three ranges:

00:17 00:50 00:79

I have a problem with A: This means that subsequent inserts of active antennas won't work as expected. Most software expects that when there is one interface, it is if0, and a second interface will be if1, etc. By introducing A you've made each insertion of a new device create a new interface number.

I recently saw this behavior as a bug on an XO, where I kept inserting new active antennas and the interface number kept increasing. It persisted across reboots! I wiped the machine after reaching eth24, and haven't seen that behavior on others.

  Changed 7 years ago by wad

Damn non-wiki trac formatting!

00:17

00:50

00:79

  Changed 6 years ago by dwmw2

It seems sufficient to use the base part ('eth', 'msh', 'wlan') of the kernel's own device name as one of the criteria for matching. The kernel doesn't consistently _number_ the devices, but it is at least consistent with that part.

I've filed an appropriate patch in Red Hat bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=440568#c19
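The idea of matching on the base part of the kernel name can be sketched in Python. This is only an illustration: `base_name` and `persistent_rule` are hypothetical helpers, and the rule text is modelled on the rule quoted earlier in this ticket with a `KERNEL` match added. Whether the filed patch emits exactly this form is not confirmed here.

```python
# Sketch: add the kernel name's base part as an extra match criterion.
def base_name(kernel_name):
    """Strip the trailing index: 'eth0' -> 'eth', 'wlan12' -> 'wlan'."""
    return kernel_name.rstrip("0123456789")

def persistent_rule(mac, kernel_name, index):
    """Build a rule that matches both the MAC and the kernel name's base."""
    base = base_name(kernel_name)
    return ('SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
            'ATTR{address}=="%s", ATTR{type}=="1", '
            'KERNEL=="%s*", NAME="%s%d"' % (mac.lower(), base, base, index))

XO_MAC = "00:17:c4:05:26:1f"
eth_rule = persistent_rule(XO_MAC, "eth0", 0)
msh_rule = persistent_rule(XO_MAC, "msh0", 0)
print(eth_rule)
print(msh_rule)
```

With the base name in the match, two interfaces that share a MAC address get two distinct rules, so neither rule tries to rename msh0 to eth0.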

  Changed 6 years ago by dwmw2

My patch is now in udev upstream.

  Changed 6 years ago by cscott

  • status changed from new to closed
  • next_action set to never set
  • resolution set to fixed