Ticket #10802 (new defect)

Opened 4 years ago

Last modified 3 years ago

suspend sometimes hangs with several blocked tasks

Reported by: sascha_silbe Owned by: buytenh
Priority: normal Milestone: 11.3.0
Component: kernel Version: Development source as of this date
Keywords: Cc:
Action Needed: diagnose Verified: no
Deployments affected: Blocked By:
Blocking:

Description

With a kernel based on OLPC 2.6.35 (commit 2447ae65) suspend sometimes hangs (magic SysRq via serial console still works):

[ 2346.129716] Restarting tasks ... done.
[ 2359.363513] PM: Syncing filesystems ... done.
[ 2359.377963] Freezing user space processes ... (elapsed 0.01 seconds) done.
[ 2359.403030] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 2359.425706] dcon_source_switch to DCON
[ 2359.465513] olpc-dcon: The DCON has control
[ 2359.470673] HDA Intel 0000:00:14.0: PCI INT A disabled
[ 2359.476266] ehci_hcd 0000:00:10.4: PCI INT D disabled
[ 2359.481681] uhci_hcd 0000:00:10.2: PCI INT C disabled
[ 2359.487075] uhci_hcd 0000:00:10.1: PCI INT B disabled
[ 2359.492489] uhci_hcd 0000:00:10.0: PCI INT A disabled
[ 2359.497920] viafb_suspend!
[ 2359.501522] viafb 0000:00:01.0: PCI INT A disabled
[ 2359.506666] mmc1:0001:1: suspend: PM flags = 0x3
[ 2523.010056] INFO: task kjournald:1438 blocked for more than 120 seconds.
[ 2523.017190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2523.025536] kjournald     D ea21ee60  2596  1438      2 0x00000000
[ 2523.032169]  ea21ee80 00000046 0b300001 ea21ee60 ea21ee64 b0420fcf 00000001 ea35d530
[ 2523.040558]  a63754ab 00000226 ea35d530 ea35d530 00000000 ea21eecc ea21ee8c b088cdf0
[ 2523.048942]  ea21eec4 ea21ee94 b04ce488 ea21eeb0 b088d2d7 b04ce455 b19cc9b8 b04ce455
[ 2523.057297] Call Trace:
[ 2523.059912]  [<b0420fcf>] ? get_parent_ip+0xb/0x31
[ 2523.065051]  [<b088cdf0>] io_schedule+0x21/0x31
[ 2523.069874]  [<b04ce488>] sync_buffer+0x33/0x37
[ 2523.074730]  [<b088d2d7>] __wait_on_bit+0x39/0x60
[ 2523.079737]  [<b04ce455>] ? sync_buffer+0x0/0x37
[ 2523.084683]  [<b04ce455>] ? sync_buffer+0x0/0x37
[ 2523.089611]  [<b088d39d>] out_of_line_wait_on_bit+0x9f/0xa7
[ 2523.095578]  [<b043de1c>] ? wake_bit_function+0x0/0x3c
[ 2523.101062]  [<b04ce3e9>] __wait_on_buffer+0x1e/0x21
[ 2523.106356]  [<b0540051>] journal_commit_transaction+0x9e8/0xf5b
[ 2523.112772]  [<b088cda7>] ? schedule+0x448/0x470
[ 2523.117688]  [<b0890ae4>] ? sub_preempt_count+0x9/0x8e
[ 2523.123183]  [<b0420fcf>] ? get_parent_ip+0xb/0x31
[ 2523.128286]  [<b05430c5>] kjournald+0xc1/0x1e1
[ 2523.133053]  [<b043dde8>] ? autoremove_wake_function+0x0/0x34
[ 2523.139155]  [<b0543004>] ? kjournald+0x0/0x1e1
[ 2523.143992]  [<b043dab0>] kthread+0x63/0x68
[ 2523.148442]  [<b043da4d>] ? kthread+0x0/0x68
[ 2523.153003]  [<b0402cf6>] kernel_thread_helper+0x6/0x10
[ 2523.158625] INFO: task rtcwake:895 blocked for more than 120 seconds.
[ 2523.165502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2523.173839] rtcwake       D e9c89e50  2744   895   2399 0x00000000
[ 2523.180481]  e9c89e70 00000086 00000001 e9c89e50 b0890b5c 00000246 e9c89e5c ea2ce0c0
[ 2523.188853]  5e5aa79f 00000225 ea2ce0c0 58a5589b 00000225 e9c89e84 e9c89ea4 b044344d
[ 2523.197214]  00000000 0000156a b0a961b0 00000000 ea2ce0c0 b043dde8 b0a961a8 b0a961a8
[ 2523.205618] Call Trace:
[ 2523.208241]  [<b0890b5c>] ? sub_preempt_count+0x81/0x8e
[ 2523.213828]  [<b044344d>] async_synchronize_cookie_domain+0xaf/0x103
[ 2523.220594]  [<b043dde8>] ? autoremove_wake_function+0x0/0x34
[ 2523.226708]  [<b04434b3>] async_synchronize_cookie+0x12/0x14
[ 2523.232751]  [<b04434cd>] async_synchronize_full+0x18/0x36
[ 2523.238590]  [<b06adbbc>] dpm_suspend_start+0x344/0x36b
[ 2523.244182]  [<b0452790>] suspend_devices_and_enter+0x4a/0x17f
[ 2523.250392]  [<b0452988>] enter_state+0xc3/0x10a
[ 2523.255311]  [<b0452269>] state_store+0x98/0xac
[ 2523.260155]  [<b04521d1>] ? state_store+0x0/0xac
[ 2523.265069]  [<b060a175>] kobj_attr_store+0x16/0x22
[ 2523.270283]  [<b04f302b>] sysfs_write_file+0xc4/0xef
[ 2523.275564]  [<b04b07f5>] vfs_write+0x8f/0x117
[ 2523.280320]  [<b04f2f67>] ? sysfs_write_file+0x0/0xef
[ 2523.285695]  [<b04b091e>] sys_write+0x40/0x65
[ 2523.290360]  [<b040278c>] sysenter_do_call+0x12/0x22
[ 2523.295638] INFO: task async/1:897 blocked for more than 120 seconds.
[ 2523.302499] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2523.310839] async/1       D 00000225  3368   897      2 0x00000000
[ 2523.317433]  ea35be88 00000046 5e5a98f8 00000225 ea7925b0 eaa7561c b0a93b70 eaa755f0
[ 2523.325843]  5e5a77f1 00000225 eaa755f0 eb1209d4 7fffffff 7fffffff ea35bedc b088cf5c
[ 2523.334184]  00000000 86ced74a 0000001c 00000000 eaa7561c 00000001 ea7925e4 b0a93b70
[ 2523.342596] Call Trace:
[ 2523.345223]  [<b088cf5c>] schedule_timeout+0x1c/0x1f2
[ 2523.350617]  [<b0420fcf>] ? get_parent_ip+0xb/0x31
[ 2523.355714]  [<b0890b5c>] ? sub_preempt_count+0x81/0x8e
[ 2523.361289]  [<b088c876>] wait_for_common+0xb9/0x101
[ 2523.366570]  [<b0425f25>] ? default_wake_function+0x0/0x12
[ 2523.372426]  [<b06acd4d>] ? dpm_wait_fn+0x0/0x14
[ 2523.377344]  [<b088c95d>] wait_for_completion+0x17/0x19
[ 2523.382934]  [<b06acd2f>] dpm_wait+0x27/0x29
[ 2523.387487]  [<b06acd5d>] dpm_wait_fn+0x10/0x14
[ 2523.392333]  [<b06a6f6d>] device_for_each_child+0x2d/0x57
[ 2523.398089]  [<b06ad7ba>] __device_suspend+0x22/0xe0
[ 2523.403393]  [<b06adbff>] async_suspend+0x1c/0x4a
[ 2523.408404]  [<b044324f>] async_thread+0xf0/0x1d3
[ 2523.413432]  [<b0425f25>] ? default_wake_function+0x0/0x12
[ 2523.419274]  [<b044315f>] ? async_thread+0x0/0x1d3
[ 2523.424388]  [<b043dab0>] kthread+0x63/0x68
[ 2523.433414]  [<b0402cf6>] kernel_thread_helper+0x6/0x10

This is on an XO-1.5 running from an external, ext3 formatted SD card. To reproduce the bug, I ran powerd with a 5 second timeout and triggered a wake-up via IPv6 ping with an interval of 30s. It took over an hour to trigger; during regular use, it happens about once per day.

Diff against log messages of a "good" suspend ("-" = good, "+" = hang):

 Restarting tasks ... done.
 PM: Syncing filesystems ... done.
-PM: Preparing system for mem sleep
-Freezing user space processes ... (elapsed 0.02 seconds) done.
+Freezing user space processes ... (elapsed 0.01 seconds) done.
 Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
-PM: Entering mem sleep
 dcon_source_switch to DCON
 olpc-dcon: The DCON has control
 HDA Intel 0000:00:14.0: PCI INT A disabled
@@ -14,9 +12,251 @@ uhci_hcd 0000:00:10.0: PCI INT A disabled
 viafb_suspend!
 viafb 0000:00:01.0: PCI INT A disabled
 mmc1:0001:1: suspend: PM flags = 0x3
-sdhci_reset: slot wakeups enabled
-sdhci_pci_probe: Enable PME set to 0x103
-PM: suspend of devices complete after 101.093 msecs
-PM: late suspend of devices complete after 78.596 msecs
-ACPI: Preparing to enter system sleep state S3
-PM: Saving platform NVS memory
+INFO: task kjournald:1438 blocked for more than 120 seconds.
[...]

Change History

Changed 3 years ago by dsd

  • milestone changed from Not Triaged to 11.2.0-final

Is this still reproducible with more recent 2.6.35 kernels?

Note: See TracTickets for help on using tickets.