Ticket #4960 (closed defect: fixed)

Opened 7 years ago

Last modified 6 years ago

NetworkManager SEGV

Reported by: dwmw2 Owned by: marco
Priority: normal Milestone: Update.1
Component: network manager Version:
Keywords: Cc: jg, kimquirk, cjb, dsd
Action Needed: Verified: no
Deployments affected: Blocked By:
Blocking:

Description

If the wireless device goes away (because it craps itself and the libertas driver kicks it in the head), NetworkManager segfaults.

I find this behaviour suboptimal.

{{{-bash-3.2# NetworkManager --no-daemon NetworkManager: <info> starting... NetworkManager: <info> eth0: Device is fully-supported using driver 'usb'. NetworkManager: <info> nm_device_init(): waiting for device's worker thread to start NetworkManager: <info> nm_device_init(): device's worker thread started, continuing. NetworkManager: <info> Now managing wireless (802.11) device 'eth0'. NetworkManager: <info> Deactivating device eth0. NetworkManager: <info> Setting hostname: 'xo-05-2B-F4.localdomain' NetworkManager: <WARN> nm_spawn_process(): nm_spawn_process('/sbin/restorecon /etc/hosts'): could not spawn process. (Failed to execute child process "/sbin/restorecon" (No such file or directory))

NetworkManager: <info> msh0: Device is fully-supported using driver 'usb'. NetworkManager: <info> nm_device_init(): waiting for device's worker thread to start NetworkManager: <info> msh0: found 802.11 companion device eth0. NetworkManager: <info> nm_device_init(): device's worker thread started, continuing. NetworkManager: <info> Now managing OLPC Mesh (802.11) device 'msh0'. NetworkManager: <info> Deactivating device msh0. NetworkManager: <info> aipd_cleanup(): called NetworkManager: <info> Anycast addresses: server (c0:27:c0:27:c0:00) xo (c0:27:c0:27:c0:01) NetworkManager: <info> Updating allowed wireless network lists. NetworkManager: <WARN> nm_dbus_get_networks_cb(): error received: org.freedesktop.NetworkManagerInfo.NoNetworks - Traceback (most recent call last):

File "/usr/lib/python2.5/site-packages/dbus/service.py", line 692, in _message_cb

retval = candidate_method(self, *args, **keywords)

File "/usr/share/sugar/shell/hardware/nminfo.py", line 355, in getNetworks

raise NoNetworks()

NoNetworks: org.freedesktop.NetworkManagerInfo.NoNetworks: . NetworkManager: <info> msh0: Got association; scheduling association handler NetworkManager: <info> msh0: got association event from driver. NetworkManager: <info> initial wireless suppression timeout up NetworkManager: <info> SWITCH: no current connection, found better connection 'msh0'. NetworkManager: <info> Will activate connection 'msh0'. NetworkManager: <info> Device msh0 activation scheduled... NetworkManager: <info> Activation (msh0) started... NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) scheduled... NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) started... NetworkManager: <info> Activation (msh0) Stage 1 of 5 (Device Prepare) complete. NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) scheduled... NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) starting... NetworkManager: <info> Activation (msh0/mesh) level 1: School Mesh Portal. NetworkManager: <info> Activation (msh0/mesh) Stage 2 of 6 (Device Configure) looking for a mesh on channel 1. NetworkManager: <info> Activation (msh0) Stage 2 of 5 (Device Configure) complete. NetworkManager: <info> Old device 'msh0' activating, won't change. NetworkManager: <info> Old device 'msh0' activating, won't change. NetworkManager: <info> msh0: Got association; scheduling association handler NetworkManager: <info> msh0: got association event from driver. NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) scheduled. NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) started... NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): enter, pid 0 NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): returning, zero pid 0 NetworkManager: <info> Activation (msh0) Beginning DHCP transaction. NetworkManager: <info> msh0: nm_dhcp_manager_begin_transaction():662 started timeout source 0x80b18d8 Internet Systems Consortium DHCP Client V3.0.5-RedHat Copyright 2004-2006 Internet Systems Consortium. All rights reserved. For info, please visit http://www.isc.org/sw/dhcp/ NetworkManager: <info> dhclient started with pid 3315 NetworkManager: <info> Activation (msh0) Stage 3 of 5 (IP Configure Start) complete. NetworkManager: <info> DHCP: device msh0 state changed (null) -> preinit Listening on LPF/msh0/00:17:c4:05:2b:f4 Sending on LPF/msh0/00:17:c4:05:2b:f4 Sending on Socket/fallback DHCPDISCOVER on msh0 to 255.255.255.255 port 67 interval 1 DHCPDISCOVER on msh0 to 255.255.255.255 port 67 interval 1 DHCPOFFER from 172.18.16.1 DHCPREQUEST on msh0 to 255.255.255.255 port 67 DHCPACK from 172.18.16.1 NetworkManager: <info> nm_dhcp_dbus_set_state(): calling timeout_cleanup NetworkManager: <info> msh0: nm_dhcp_device_timeout_cleanup():184 cleared timeout source 0x80b18d8 NetworkManager: <info> DHCP: device msh0 state changed preinit -> bound NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) scheduled... NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) started... NetworkManager: <info> address 172.18.19.231 NetworkManager: <info> netmask 255.255.240.0 NetworkManager: <info> broadcast 172.18.31.255 NetworkManager: <info> gateway 172.18.16.1 NetworkManager: <info> nameserver '172.18.0.1' NetworkManager: <info> domain name 'cambridge.xs.laptop.org' NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) scheduled... NetworkManager: <info> Activation (msh0) Stage 4 of 5 (IP Configure Get) complete. NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) started... bound to 172.18.19.231 -- renewal in 260 seconds. NetworkManager: <info> Activation (msh0) successful, device activated. NetworkManager: <info> Activation (msh0) Finish handler scheduled. NetworkManager: <info> Activation (msh0) Stage 5 of 5 (IP Configure Commit) complete. receive_packet failed on msh0: Network is down NetworkManager: <debug> [1195210971.312661] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/net_00_17_c4_05_2b_f4'). NetworkManager: <info> Deactivating device msh0. NetworkManager: <info> msh0: will stop DHCP NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): enter, pid 3315 NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): kill -TERM-ing pid 3315 NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): waiting for 3315 to exit NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): i was 19 after wait pid 3315 NetworkManager: <info> msh0: canceled DHCP transaction, dhclient pid 3315 NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): calling timeout_cleanup NetworkManager: <info> nm_dhcp_manager_cancel_transaction_real(): exit NetworkManager: <info> aipd_cleanup(): called NetworkManager: <info> msh0: will stop DHCP NetworkManager: <debug> [1195210972.057726] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/usb_device_1286_2001_noserial_if0').

(NetworkManager:3283): GLib-GObject-WARNING **: invalid unclassed pointer in cast to `NMDevice' NetworkManager: <WARN> nm_signal_handler(): Caught signal 11. Generating backtrace... NetworkManager: ******************* START ********************************** NetworkManager: (no debugging symbols found) NetworkManager: Using host libthread_db library "/lib/libthread_db.so.1". NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: [Thread debugging using libthread_db enabled] NetworkManager: [New Thread -1211812160 (LWP 3283)] NetworkManager: [New Thread -1228600432 (LWP 3316)] NetworkManager: [New Thread -1211815024 (LWP 3284)] NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: (no debugging symbols found) NetworkManager: 0xb7fb8410 in kernel_vsyscall () NetworkManager: #0 0xb7fb8410 in kernel_vsyscall () NetworkManager: #1 0xb7e0843b in waitpid () from /lib/libpthread.so.0 NetworkManager: #2 0x080749f0 in ?? () NetworkManager: #3 0x08074ae1 in ?? () NetworkManager: #4 <signal handler called> NetworkManager: #5 0x08053c9f in nm_device_notify_no_best_device () NetworkManager: #6 0x08071adb in ?? () NetworkManager: #7 0xb7e3da91 in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: #8 0xb7e3f7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0 NetworkManager: #9 0xb7e427cf in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: #10 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0 NetworkManager: #11 0x080707d8 in main () NetworkManager: NetworkManager: Thread 3 (Thread -1211815024 (LWP 3284)): NetworkManager: #0 0xb7fb8410 in kernel_vsyscall () NetworkManager: No symbol table info available. NetworkManager: #1 0xb7d6c1a3 in poll () from /lib/libc.so.6 NetworkManager: No symbol table info available. NetworkManager: #2 0xb7e42803 in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #3 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #4 0x0805791e in ?? () NetworkManager: No symbol table info available. NetworkManager: #5 0xb7e5d66f in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #6 0xb7e004fb in start_thread () from /lib/libpthread.so.0 NetworkManager: No symbol table info available. NetworkManager: #7 0xb7d7620e in clone () from /lib/libc.so.6 NetworkManager: No symbol table info available. NetworkManager: NetworkManager: Thread 2 (Thread -1228600432 (LWP 3316)): NetworkManager: #0 0xb7fb8410 in kernel_vsyscall () NetworkManager: No symbol table info available. NetworkManager: #1 0xb7e074eb in read () from /lib/libpthread.so.0 NetworkManager: No symbol table info available. NetworkManager: #2 0xb7e4044d in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #3 0xb7e5d66f in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #4 0xb7e004fb in start_thread () from /lib/libpthread.so.0 NetworkManager: No symbol table info available. NetworkManager: #5 0xb7d7620e in clone () from /lib/libc.so.6 NetworkManager: No symbol table info available. NetworkManager: NetworkManager: Thread 1 (Thread -1211812160 (LWP 3283)): NetworkManager: #0 0xb7fb8410 in kernel_vsyscall () NetworkManager: No symbol table info available. NetworkManager: #1 0xb7e0843b in waitpid () from /lib/libpthread.so.0 NetworkManager: No symbol table info available. NetworkManager: #2 0x080749f0 in ?? () NetworkManager: No symbol table info available. NetworkManager: #3 0x08074ae1 in ?? () NetworkManager: No symbol table info available. NetworkManager: #4 <signal handler called> NetworkManager: No symbol table info available. NetworkManager: #5 0x08053c9f in nm_device_notify_no_best_device () NetworkManager: No symbol table info available. NetworkManager: #6 0x08071adb in ?? () NetworkManager: No symbol table info available. NetworkManager: #7 0xb7e3da91 in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #8 0xb7e3f7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #9 0xb7e427cf in g_str_equal () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #10 0xb7e42b79 in g_main_loop_run () from /lib/libglib-2.0.so.0 NetworkManager: No symbol table info available. NetworkManager: #11 0x080707d8 in main () NetworkManager: No symbol table info available. NetworkManager: #0 0xb7fb8410 in kernel_vsyscall () NetworkManager: The program is running. Quit anyway (and detach it)? (y or n) [answered Y; input not from terminal]

(NetworkManager:3283): GLib-GObject-WARNING **: invalid unclassed pointer in cast to `NMDevice80211MeshOLPC'

(NetworkManager:3283): GLib-GObject-WARNING **: invalid unclassed pointer in cast to `NMDevice' NetworkManager: ******************* END ********************************** }}}

Change History

Changed 7 years ago by dwmw2

Program received signal SIGSEGV, Segmentation fault.                            
[Switching to Thread -1211885888 (LWP 2090)]                                    
0x08053c9f in nm_device_notify_no_best_device (self=0x80b4030)                  
    at nm-device.c:1959                                                         
1959            if (NM_DEVICE_GET_CLASS (self)->notify_no_best_device)          
(gdb) bt                                                                        
#0  0x08053c9f in nm_device_notify_no_best_device (self=0x80b4030)              
    at nm-device.c:1959                                                         
#1  0x08071adb in nm_policy_device_change_check (data=0x80a80d8)                
    at NetworkManagerPolicy.c:346                                               
#2  0xb7e2ba91 in g_str_equal () from /lib/libglib-2.0.so.0                     
#3  0xb7e2d7f2 in g_main_context_dispatch () from /lib/libglib-2.0.so.0         
#4  0xb7e307cf in g_str_equal () from /lib/libglib-2.0.so.0                     
#5  0xb7e30b79 in g_main_loop_run () from /lib/libglib-2.0.so.0                 
#6  0x080707d8 in main (argc=1918989871, argv=0xbfe8ec64)                       
    at NetworkManager.c:1053                                                    

Changed 7 years ago by jg

  • owner changed from jg to dcbw
  • component changed from distro to network manager
  • milestone changed from Never Assigned to Update.1

Changed 7 years ago by dcbw

Good catch, looks pretty simple.

Changed 7 years ago by cjb

  • cc jg, kimquirk, cjb added

I don't think we can ship Update.1 (with OHM) without this fixed -- today I saw the firmware on the 8388 crap out during resume, so the kernel reset the device, and NM crashed (as expected).

Jim/Kim, please raise this to blocker?

Changed 7 years ago by dcbw

What are the steps to induce the wireless device to crap itself and go away? Is an rmmod enough?

Changed 7 years ago by dwmw2

Yes, that should suffice. NetworkManager doesn't care _why_ it goes away; it just crashes every time it goes away, I believe.

Changed 7 years ago by dcbw

Please test the RPMs available here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=337166

Changed 7 years ago by dsd

  • cc dsd added

This isn't that easy to reproduce. Tried on a B4 on joyride-1520:

rmmod usb8xxx does not cause a NM crash (confirmed by seeing the process remaining in ps), nor does David's wlanfrob program.

Based on a suggestion that it might only happen during scans, I tried wlanfrob and rmmod while "iwlist eth0 scan" was running in a repeated loop, no crash. I also established the periodicity of when NM scans and ran wlanfrom in the middle of a NetworkManager-invoked scan, no crash.

Changed 7 years ago by dcbw

I was easily able to reproduce with build 450 on a B2... it might be that rmmod in your case doesn't actually remove the module due to usage counts or something...

Changed 7 years ago by dcbw

Note that the fixed RPMS got automatically pulled into joyride 1524. The version I built above also increased the wpa_supplicant authentication timeout which should help connections in low-memory or high-load conditions.

Changed 7 years ago by dsd

Ah, indeed, it just takes a while to crash (about a minute in some cases). Confirmed that a few times on both ship2.2-656 and joyride-1520.

I upgraded one of those systems to joyride-1524 and can confirm that the problem goes away (no more crashing on rmmod). So I guess the next stages can be taken in getting this in the update2 build.

Changed 7 years ago by marco

  • owner changed from dcbw to ApprovalForUpdate

Asking for approval since dsd tested it in joyride. (dcbw I hope you don't mind).

NetworkManager-0.6.5-0.8.svn3218.olpc2

Changed 7 years ago by jg

  • owner changed from ApprovalForUpdate to dgilmore

Changed 7 years ago by dgilmore

  • owner changed from dgilmore to marco

Please test update.1 build 684

Changed 6 years ago by marco

  • status changed from new to closed
  • resolution set to fixed
Note: See TracTickets for help on using tickets.