Ticket #10366 (closed defect: fixed)

Opened 3 years ago

Last modified 19 months ago

idle suspend causes network connection in progress to fail

Reported by: greenfeld
Owned by:
Priority: normal
Milestone: 11.3.0
Component: not assigned
Version: not specified
Keywords:
Cc: pgf, sridhar
Action Needed: no action
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

This likely needs to be triaged, as I keep guessing wrong about why eth0 loses track of its ESSID/IP address information. Again, this is with the RPMs from #9845 installed, but at first glance the problem appears to be below the Sugar layer.

My production XO-1.5 laptops occasionally lose track of what to connect to, even though with the #9845 changes they are supposed to default to an Adhoc network. On one occasion I cleared the Network History from Sugar, clicked the checkmark to close the Network control panel, and pressed Ctrl-Alt-Esc to restart X and Sugar, and noticed that the #9845 change did not reconnect to the Adhoc #1 network by default as it should. So I enabled Sugar and presence service debugging, and repeatedly cleared the history and restarted X, both on the laptop showing the issue and on one that was not, trying to get the problem to reproduce or go away. (It happened twice on the affected laptop and persisted for a while each time. It never happened on the unaffected XO-1.5 during this test, although that laptop has shown this behavior before.)

Martin took an initial look, and although it may not be related to the above, the area around "Sep 15 17:14:52" in var/log/messages looked interesting. There, the laptop went to sleep while NetworkManager was in the middle of setting up a connection. As a result, NetworkManager treated the connection as invalid and did not try to restore it.

Attached are the system, Sugar, and powersave log files from the laptop that lost track of its connections. I turned on verbose Sugar debugging after the issue was first spotted.

Attachments

logbundle.tgz (119.8 kB) - added by greenfeld 3 years ago.
Bundle of log files from the system which lost track of its ESSID and did not connect to AdHoc1 instead

Change History

Changed 3 years ago by greenfeld

attachment logbundle.tgz added

Changed 3 years ago by Quozl

  • next_action changed from never set to diagnose
  • cc pgf added
  • component changed from network manager to not assigned
  • summary changed from Possible Networkmanager/S3 sleep race condition and/or other issue(s) to idle suspend causes network connection in progress to fail
  • milestone changed from Not Triaged to 10.1.3
  • owner dsd deleted

Triage may include reworking the problem description, reproducing, prioritising, and proposing a milestone.

The problem is that an idle suspend interrupts NetworkManager while it is establishing a connection, causing the connection attempt to fail.

I have observed this in other contexts, with os852 unpatched, so I consider it reproduced.

I agree with a normal priority.

I propose 10.1.3 as milestone.

I don't think the problem is necessarily in NetworkManager; it might well be in powerd. I don't think NetworkManager was ever intended to complete the establishment of a connection if that operation is interrupted by suspend.

Changed 3 years ago by martin.langhoff

NM should know when we are in the process of suspending. I propose investigating whether there is a POSIX signal or D-Bus message that powerd should be sending to NM.

Changed 3 years ago by Quozl

I don't understand. Once powerd requests a suspend, the NetworkManager process may not execute again until resume. If NetworkManager is in the process of connecting (for example, it has issued I/O calls to the network device and is waiting for the result), it won't get the result until after resume, and that delay may break time-sensitive parts of the connection, such as the DHCP negotiation.

Changed 20 months ago by dsd

  • next_action changed from diagnose to test in build

powerd now monitors NM state and avoids idle suspend while a Wi-Fi connection is in progress. Please test 11.3.0 build 5.
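The ticket does not show how powerd implements this check, so the following is only a rough Python sketch of the policy under stated assumptions. The `NMState` names, the `idle_suspend_allowed` function and its parameters are hypothetical. A real daemon would learn the state from NetworkManager itself (for example through its D-Bus `State` property or `StateChanged` signal) rather than receive it as an argument.

```python
from enum import Enum


class NMState(Enum):
    """Hypothetical subset of NetworkManager's overall states.

    Only the names matter in this sketch; NetworkManager's real numeric
    state values differ between releases, so none are assumed here.
    """
    ASLEEP = "asleep"
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    CONNECTED = "connected"


def idle_suspend_allowed(nm_state: NMState, idle_seconds: float,
                         idle_limit_seconds: float) -> bool:
    """Hypothetical powerd-style check: may the machine idle-suspend now?"""
    # Not idle long enough yet: never suspend.
    if idle_seconds < idle_limit_seconds:
        return False
    # Defer idle suspend while a connection is being established, so the
    # association and DHCP exchange are not frozen mid-flight.
    if nm_state is NMState.CONNECTING:
        return False
    # Any other state (including DISCONNECTED) still permits suspend.
    return True


if __name__ == "__main__":
    # Idle past the limit, but NetworkManager is still connecting.
    print(idle_suspend_allowed(NMState.CONNECTING, 600, 300))  # False
    # Same idle time once the connection is up.
    print(idle_suspend_allowed(NMState.CONNECTED, 600, 300))   # True
```

In this sketch only the in-progress state blocks suspend, so a laptop that is idle and `DISCONNECTED` can still suspend before any connection attempt starts.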

Changed 19 months ago by greenfeld

  • status changed from new to closed
  • next_action changed from test in build to no action
  • resolution set to fixed

I have not seen a clear case of this happening since, so the issue is presumably fixed in the 11.3.0 series.

However, the laptop may still suspend before any network connection is attempted.

Changed 19 months ago by sridhar

  • cc sridhar added