Opened 7 years ago

Closed 7 years ago

#4960 closed defect (fixed)

NetworkManager SEGV

Reported by: dwmw2 Owned by: marco
Priority: normal Milestone: Update.1
Component: network manager Version:
Keywords: Cc: jg, kimquirk, cjb, dsd
Blocked By: Blocking:
Deployments affected: Action Needed:
Verified: no

Description

If the wireless device goes away (because it craps itself and the libertas driver kicks it in the head), NetworkManager segfaults.

I find this behaviour suboptimal.

{{{-bash-3.2# NetworkManager --no-daemon
NetworkManager: <info> starting...
NetworkManager: <info> eth0: Device is fully-supported using driver 'usb'.
NetworkManager: <info> nm_device_init(): waiting for device's worker thread to start
NetworkManager: <info> nm_device_init(): device's worker thread started, continuing.
NetworkManager: <info> Now managing wireless (802.11) device 'eth0'.
NetworkManager: <info> Deactivating device eth0.
NetworkManager: <info> Setting hostname: 'xo-05-2B-F4.localdomain'
NetworkManager: <WARN> nm_spawn_process(): nm_spawn_process('/sbin/restorecon /etc/hosts'): could not spawn process. (Failed to execute child process "/sbin/restorecon" (No such file or directory))

NetworkManager: <info> msh0: Device is fully-supported using driver 'usb'.
NetworkManager: <info> nm_device_init(): waiting for device's worker thread to start
NetworkManager: <info> msh0: found 802.11 companion device eth0.
NetworkManager: <info> nm_device_init(): device's worker thread started, continuing.
NetworkManager: <info> Now managing OLPC Mesh (802.11) device 'msh0'.
NetworkManager: <info> Deactivating device msh0.
NetworkManager: <info> aipd_cleanup(): called
NetworkManager: <info> Anycast addresses: server (c0:27:c0:27:c0:00) xo (c0:27:c0:27:c0:01)
NetworkManager: <info> Updating allowed wireless network lists.
NetworkManager: <WARN> nm_dbus_get_networks_cb(): error received: org.freedesktop.NetworkManagerInfo.NoNetworks - Traceback (most recent call last):

File "/usr/lib/python2.5/site-packages/dbus/service.py", line 692, in _message_cb

retval = candidate_method(self, *args, keywords)

File "/usr/share/sugar/shell/hardware/nminfo.py", line 355, in getNetworks

raise NoNetworks()

NoNetworks: org.freedesktop.NetworkManagerInfo.NoNetworks:
.
NetworkManager: <info> msh0: Got association; scheduling association handler
NetworkManager: <info> msh0: got association event from driver.
NetworkManager: <info> initial wireless suppression timeout up
NetworkManager: <info> SWITCH: no current connection, found better connection 'msh0'.
NetworkManager: <info> Will activate connection 'msh0'.
NetworkManager: <info> Device msh0 activation scheduled...
NetworkManager: <info> Activation (msh0) started...
NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) scheduled...
NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) started...
NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) complete.
NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) scheduled...
NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) starting...
NetworkManager: <info> Activation (msh0/mesh) level 1: School Mesh Portal.
NetworkManager: <info> Activation (msh0/mesh) Stage 2 of 6 (Device Configure) looking for a mesh on channel 1.
NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) complete.
NetworkManager: <info> Old device 'msh0' activating, won't change.
NetworkManager: <info> Old device 'msh0' activating, won't change.
NetworkManager: <info> msh0: Got association; scheduling association handler
NetworkManager: <info> msh0: got association event from driver.
NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) scheduled.
NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) started...
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): enter, pid 0
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): returning, zero pid 0
NetworkManager: <info> Activation (msh0) Beginning DHCP transaction.
NetworkManager: <info> msh0: nm_dhcp_manager_begin_transaction():662 started timeout source 0x80b18d8
Internet Systems Consortium DHCP Client V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
NetworkManager: <info> dhclient started with pid 3315
NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) complete.
NetworkManager: <info> DHCP: device msh0 state changed (null) -> preinit
Listening on LPF/msh0/00:17:c4:05:2b:f4
Sending on LPF/msh0/00:17:c4:05:2b:f4
Sending on Socket/fallback
DHCPDISCOVER on msh0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on msh0 to 255.255.255.255 port 67 interval 1
DHCPOFFER from 172.18.16.1
DHCPREQUEST on msh0 to 255.255.255.255 port 67
DHCPACK from 172.18.16.1
NetworkManager: <info> nm_dhcp_dbus_set_state(): calling timeout_cleanup
NetworkManager: <info> msh0: nm_dhcp_device_timeout_cleanup():184 cleared timeout source 0x80b18d8
NetworkManager: <info> DHCP: device msh0 state changed preinit -> bound
NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) scheduled...
NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) started...
NetworkManager: <info> address 172.18.19.231
NetworkManager: <info> netmask 255.255.240.0
NetworkManager: <info> broadcast 172.18.31.255
NetworkManager: <info> gateway 172.18.16.1
NetworkManager: <info> nameserver '172.18.0.1'
NetworkManager: <info> domain name 'cambridge.xs.laptop.org'
NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) scheduled...
NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) complete.
NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) started...
bound to 172.18.19.231 -- renewal in 260 seconds.
NetworkManager: <info> Activation (msh0) successful, device activated.
NetworkManager: <info> Activation (msh0) Finish handler scheduled.
NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) complete.
receive_packet failed on msh0: Network is down
NetworkManager: <debug> [1195210971.312661] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/net_00_17_c4_05_2b_f4').
NetworkManager: <info> Deactivating device msh0.
NetworkManager: <info> msh0: will stop DHCP
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): enter, pid 3315
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): kill -TERM-ing pid 3315
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): waiting for 3315 to exit
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): i was 19 after wait pid 3315
NetworkManager: <info> msh0: canceled DHCP transaction, dhclient pid 3315
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): calling timeout_cleanup
NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): exit
NetworkManager: <info> aipd_cleanup(): called
NetworkManager: <info> msh0: will stop DHCP
NetworkManager: <debug> [1195210972.057726] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_1286_2001_noserial_if0').

(NetworkManager:3283): GLib-GObject-WARNING : invalid unclassed pointer in cast to `NMDevice'
NetworkManager: <WARN> nm_signal_handler(): Caught signal 11. Generating backtrace...
NetworkManager:
* START
NetworkManager: (no debugging symbols found)
NetworkManager: Using host libthread_db library "/lib/libthread_db.so.1".
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: [Thread debugging using libthread_db enabled]
NetworkManager: [New Thread -1211812160 (LWP 3283)]
NetworkManager: [New Thread -1228600432 (LWP 3316)]
NetworkManager: [New Thread -1211815024 (LWP 3284)]
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: (no debugging symbols found)
NetworkManager: 0xb7fb8410 in kernel_vsyscall ()
NetworkManager: #0 0xb7fb8410 in
kernel_vsyscall ()
NetworkManager: #1 0xb7e0843b in waitpid () from /lib/libpthread.so.0
NetworkManager: #2 0x080749f0 in ?? ()
NetworkManager: #3 0x08074ae1 in ?? ()
NetworkManager: #4 <signal handler called>
NetworkManager: #5 0x08053c9f in nm_device_notify_no_best_device ()
NetworkManager: #6 0x08071adb in ?? ()
NetworkManager: #7 0xb7e3da91 in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: #8 0xb7e3f7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
NetworkManager: #9 0xb7e427cf in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: #10 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0
NetworkManager: #11 0x080707d8 in main ()
NetworkManager:
NetworkManager: Thread 3 (Thread -1211815024 (LWP 3284)):
NetworkManager: #0 0xb7fb8410 in kernel_vsyscall ()
NetworkManager: No symbol table info available.
NetworkManager: #1 0xb7d6c1a3 in poll () from /lib/libc.so.6
NetworkManager: No symbol table info available.
NetworkManager: #2 0xb7e42803 in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #3 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #4 0x0805791e in ?? ()
NetworkManager: No symbol table info available.
NetworkManager: #5 0xb7e5d66f in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #6 0xb7e004fb in start_thread () from /lib/libpthread.so.0
NetworkManager: No symbol table info available.
NetworkManager: #7 0xb7d7620e in clone () from /lib/libc.so.6
NetworkManager: No symbol table info available.
NetworkManager:
NetworkManager: Thread 2 (Thread -1228600432 (LWP 3316)):
NetworkManager: #0 0xb7fb8410 in
kernel_vsyscall ()
NetworkManager: No symbol table info available.
NetworkManager: #1 0xb7e074eb in read () from /lib/libpthread.so.0
NetworkManager: No symbol table info available.
NetworkManager: #2 0xb7e4044d in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #3 0xb7e5d66f in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #4 0xb7e004fb in start_thread () from /lib/libpthread.so.0
NetworkManager: No symbol table info available.
NetworkManager: #5 0xb7d7620e in clone () from /lib/libc.so.6
NetworkManager: No symbol table info available.
NetworkManager:
NetworkManager: Thread 1 (Thread -1211812160 (LWP 3283)):
NetworkManager: #0 0xb7fb8410 in kernel_vsyscall ()
NetworkManager: No symbol table info available.
NetworkManager: #1 0xb7e0843b in waitpid () from /lib/libpthread.so.0
NetworkManager: No symbol table info available.
NetworkManager: #2 0x080749f0 in ?? ()
NetworkManager: No symbol table info available.
NetworkManager: #3 0x08074ae1 in ?? ()
NetworkManager: No symbol table info available.
NetworkManager: #4 <signal handler called>
NetworkManager: No symbol table info available.
NetworkManager: #5 0x08053c9f in nm_device_notify_no_best_device ()
NetworkManager: No symbol table info available.
NetworkManager: #6 0x08071adb in ?? ()
NetworkManager: No symbol table info available.
NetworkManager: #7 0xb7e3da91 in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #8 0xb7e3f7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #9 0xb7e427cf in g_str_equal () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #10 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0
NetworkManager: No symbol table info available.
NetworkManager: #11 0x080707d8 in main ()
NetworkManager: No symbol table info available.
NetworkManager: #0 0xb7fb8410 in
kernel_vsyscall ()
NetworkManager: The program is running. Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]

(NetworkManager:3283): GLib-GObject-WARNING : invalid unclassed pointer in cast to `NMDevice80211MeshOLPC'

(NetworkManager:3283): GLib-GObject-WARNING : invalid unclassed pointer in cast to `NMDevice'
NetworkManager:
* END
}}}

Change History (15)

comment:1 Changed 7 years ago by dwmw2

Program received signal SIGSEGV, Segmentation fault.                            
[Switching to Thread -1211885888 (LWP 2090)]                                    
0x08053c9f in nm_device_notify_no_best_device (self=0x80b4030)                  
    at nm-device.c:1959                                                         
1959            if (NM_DEVICE_GET_CLASS (self)->notify_no_best_device)          
(gdb) bt                                                                        
#0  0x08053c9f in nm_device_notify_no_best_device (self=0x80b4030)              
    at nm-device.c:1959                                                         
#1  0x08071adb in nm_policy_device_change_check (data=0x80a80d8)                
    at NetworkManagerPolicy.c:346                                               
#2  0xb7e2ba91 in g_str_equal () from /lib/libglib-2.0.so.0                     
#3  0xb7e2d7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0         
#4  0xb7e307cf in g_str_equal () from /lib/libglib-2.0.so.0                     
#5  0xb7e30b79 in g_main_loop_run () from /lib/libglib-2.0.so.0                 
#6  0x080707d8 in main (argc=1918989871, argv=0xbfe8ec64)                       
    at NetworkManager.c:1053                                                    

comment:2 Changed 7 years ago by jg

  • Component changed from distro to network manager
  • Milestone changed from Never Assigned to Update.1
  • Owner changed from jg to dcbw

comment:3 Changed 7 years ago by dcbw

Good catch, looks pretty simple.

comment:4 Changed 7 years ago by cjb

  • Cc jg kimquirk cjb added

I don't think we can ship Update.1 (with OHM) without this fixed -- today I saw the firmware on the 8388 crap out during resume, so the kernel reset the device, and NM crashed (as expected).

Jim/Kim, please raise this to blocker?

comment:5 Changed 7 years ago by dcbw

What are the steps to induce the wireless device to crap itself and go away? Is an rmmod enough?

comment:6 Changed 7 years ago by dwmw2

Yes, that should suffice. NetworkManager doesn't care _why_ it goes away; it just crashes every time it goes away, I believe.

comment:7 Changed 7 years ago by dcbw

comment:8 Changed 7 years ago by dsd

  • Cc dsd added

This isn't that easy to reproduce. Tried on a B4 on joyride-1520:

rmmod usb8xxx does not cause a NM crash (confirmed by seeing the process remaining in ps), nor does David's wlanfrob program.

Based on a suggestion that it might only happen during scans, I tried wlanfrob and rmmod while "iwlist eth0 scan" was running in a repeated loop, no crash. I also established the periodicity of when NM scans and ran wlanfrom in the middle of a NetworkManager-invoked scan, no crash.

comment:9 Changed 7 years ago by dcbw

I was easily able to reproduce with build 450 on a B2... it might be that rmmod in your case doesn't actually remove the module due to usage counts or something...

comment:10 Changed 7 years ago by dcbw

Note that the fixed RPMS got automatically pulled into joyride 1524. The version I built above also increased the wpa_supplicant authentication timeout which should help connections in low-memory or high-load conditions.

comment:11 Changed 7 years ago by dsd

Ah, indeed, it just takes a while to crash (about a minute in some cases). Confirmed that a few times on both ship2.2-656 and joyride-1520.

I upgraded one of those systems to joyride-1524 and can confirm that the problem goes away (no more crashing on rmmod). So I guess the next stages can be taken in getting this in the update2 build.

comment:12 Changed 7 years ago by marco

  • Owner changed from dcbw to ApprovalForUpdate

Asking for approval since dsd tested it in joyride. (dcbw I hope you don't mind).

NetworkManager-0.6.5-0.8.svn3218.olpc2

comment:13 Changed 7 years ago by jg

  • Owner changed from ApprovalForUpdate to dgilmore

comment:14 Changed 7 years ago by dgilmore

  • Owner changed from dgilmore to marco

Please test update.1 build 684

comment:15 Changed 7 years ago by marco

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.