Ticket #10366 (closed defect: fixed)

Opened 4 years ago

Last modified 3 years ago

idle suspend causes network connection in progress to fail

Reported by: greenfeld Owned by:
Priority: normal Milestone: 11.3.0
Component: not assigned Version: not specified
Keywords: Cc: pgf, sridhar
Action Needed: no action Verified: no
Deployments affected: Blocked By:
Blocking:

Description

This likely needs to be triaged, as I probably keep guessing wrong about why eth0 loses track of its ESSID/IP address information. Again, this is with the RPMs from #9845 installed, but at first glance the problem appears to be below the Sugar layer.

My production XO-1.5 laptops occasionally lose track of what to connect to, even though the #9845 changes are supposed to make them default to an Adhoc network. One time, after I cleared the Network History from Sugar, clicked the checkmark to close the Network control panel, and pressed Ctrl-Alt-Esc to restart X and Sugar, I noticed that the #9845 change did not reconnect to the Adhoc #1 network by default as it should. So I enabled Sugar and presence server debugging, then repeatedly cleared the history and restarted X on the laptop showing the issue (and on one that wasn't), trying to get the problem to reproduce or go away. {It happened twice on the broken laptop and persisted for a while each time; it never happened on the non-broken XO-1.5, although that one has shown this behavior before.}

Martin took an initial look, and although it may not be related to the above, the area around "Sep 15 17:14:52" in var/log/messages looked interesting. There, the laptop went to sleep in the middle of NetworkManager setting up a connection. As a result, NetworkManager decided that the connection was invalid and did not try to restore it.

Attached please find the system/sugar/powersave log files from the system that lost track of its connections. I turned on verbose Sugar debugging after the issue was first spotted.

Attachments

logbundle.tgz (119.8 kB) - added by greenfeld 4 years ago.
Bundle of log files from the system which lost track of its ESSID & didn't connect to AdHoc1 instead

Change History

Changed 4 years ago by greenfeld

Bundle of log files from the system which lost track of its ESSID & didn't connect to AdHoc1 instead

Changed 4 years ago by Quozl

  • next_action changed from never set to diagnose
  • cc pgf added
  • component changed from network manager to not assigned
  • summary changed from Possible Networkmanager/S3 sleep race condition and/or other issue(s) to idle suspend causes network connection in progress to fail
  • milestone changed from Not Triaged to 10.1.3
  • owner dsd deleted

Triage may include reworking the problem description, reproducing, prioritising, and proposing a milestone.

The problem is that an idle suspend interrupts the establishment of a connection by Network Manager, causing the attempt to connect to fail.

I have observed this in other contexts, with os852 unpatched, so I consider it reproduced.

I agree with a normal priority.

I propose 10.1.3 as milestone.

I don't think the problem is necessarily in Network Manager; it might well be in powerd. I don't think Network Manager was ever intended to finish establishing a connection if that operation is interrupted by suspend.

Changed 4 years ago by martin.langhoff

NM should know that we're in the process of suspending. I propose investigating whether there is a POSIX signal or D-Bus message that powerd should be sending to NM.

Changed 4 years ago by Quozl

I don't understand. Once powerd requests a suspend, the NetworkManager process may not execute again until resume. If NetworkManager is in the process of connecting (e.g. it has issued I/O calls to the network device and is waiting for the result), it won't get the result until resume, and this delay may invalidate the connection timing, such as the DHCP negotiation.
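To illustrate the timing argument, here is a toy model, not NetworkManager code. It assumes the peer (access point or DHCP server) keeps its own timeout running while the laptop is suspended, whereas the laptop's processes are frozen, so the laptop's next message is held back by the whole suspend interval. The step names, latencies, and timeouts are hypothetical and purely illustrative, not measured values.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical handshake model: (step name, local processing seconds, peer timeout seconds).
Step = Tuple[str, float, float]


def run_handshake(steps: Sequence[Step],
                  suspend_before_step: Optional[int] = None,
                  suspend_s: float = 0.0) -> Optional[str]:
    """Return the name of the first step the peer gives up on, or None if all succeed.

    While the laptop is suspended its processes do not run, but the peer keeps
    counting down its own timeout, so a suspend in the middle of a step adds the
    whole suspend interval to the laptop's response delay for that step.
    """
    for i, (name, local_s, peer_timeout_s) in enumerate(steps):
        delay = local_s
        if i == suspend_before_step:
            delay += suspend_s  # reply held back until resume
        if delay > peer_timeout_s:
            return name
    return None


# Illustrative numbers only.
STEPS = [
    ("association", 0.05, 1.0),
    ("dhcp", 0.05, 4.0),
]
```

For example, `run_handshake(STEPS)` completes, while `run_handshake(STEPS, suspend_before_step=1, suspend_s=30.0)` fails at `"dhcp"`: a 30-second idle suspend is far longer than any per-step timeout in the model. Real stacks retry and may renegotiate after resume, so this only captures why a suspend in mid-negotiation can make an in-progress attempt fail.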

Changed 3 years ago by dsd

  • next_action changed from diagnose to test in build

powerd now monitors NM state and avoids idle-suspending while connecting to Wi-Fi. Please test 11.3.0 build 5.
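A minimal sketch of the kind of gate described, assuming powerd can read NetworkManager's global state as an integer. This is not powerd's actual code; the function and parameter names are hypothetical. The state values follow the NetworkManager 0.7/0.8-era `NM_STATE_*` numbering as best I recall it and should be checked against the NetworkManager D-Bus API documentation for the version actually shipped.

```python
from enum import IntEnum


class NMState(IntEnum):
    # Assumed NetworkManager 0.8-era global state values; verify for your NM version.
    UNKNOWN = 0
    ASLEEP = 1
    CONNECTING = 2
    CONNECTED = 3
    DISCONNECTED = 4


def may_idle_suspend(idle_s: float, idle_threshold_s: float, nm_state: NMState) -> bool:
    """Hypothetical idle-suspend gate.

    Suspend only when the machine has been idle long enough AND NetworkManager
    is not in the middle of bringing up a connection.
    """
    if idle_s < idle_threshold_s:
        return False  # not idle long enough yet
    if nm_state == NMState.CONNECTING:
        return False  # postpone: suspending now would freeze NM mid-handshake
    return True
```

With this gate, `may_idle_suspend(300, 240, NMState.CONNECTING)` returns `False`, while the same idle time in `CONNECTED` returns `True`. Note that `DISCONNECTED` still permits suspend, so a check like this does not stop the laptop from suspending before any connection attempt has started.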

Changed 3 years ago by greenfeld

  • status changed from new to closed
  • next_action changed from test in build to no action
  • resolution set to fixed

I have not seen a clear case of this happening, so the issue is presumably fixed in the 11.3.0 series.

However, we may still suspend before any network connection is attempted.

Changed 3 years ago by sridhar

  • cc sridhar added