Ticket #12580 (closed defect: fixed)

Opened 17 months ago

Last modified 14 months ago

XO-1.5 audio playback hang

Reported by: dsd Owned by: dsd
Priority: normal Milestone: 13.2.0
Component: kernel Version: not specified
Keywords: Cc:
Action Needed: test in build Verified: no
Deployments affected: Blocked By:
Blocking:

Description

Seen in Nicaragua's 12.1.0 spin. "I know my ABC" activity was running, and had hung, blocked trying to play a sound via gst-launch, no audio being produced.

The strace of that process just showed a poll loop.

Trying to run speaker-test as root also hung before playing the first sound.

After killing the gst-launch process, the activity spawned another one which also hung. Now speaker-test (as root) would apparently cycle through the sounds, but there was no audible output.

After killing the new gst-launch process, no further sounds could be played. speaker-test (as user) complains: unable to create IPC semaphore

ipcs shows that the memory and semaphore was owned by root. Deleted it with ipcrm, and now things started playing again. Maybe me running speaker-test as root had screwed something up.

Indeed just running the gst-launch command in a loop will reproduce this problem:

while true; do
/usr/bin/gst-launch-0.10 filesrc location=/home/olpc/Activities/IKnowMyABCs.activity/sounds/es/letra_o.ogg  ! oggdemux ! vorbisdec ! audioconvert ! alsasink
done

Most of the time it plays the sound fine, sometimes it doesn't give any errors but produces no audible output, and sometimes it hangs.

Change History

Changed 15 months ago by dsd

Caught this in gdb:

(gdb) t 2
[Switching to thread 2 (Thread 0xa5fffb40 (LWP 1772))]
#0  0xa7708424 in __kernel_vsyscall ()
(gdb) bt
#0  0xa7708424 in __kernel_vsyscall ()
#1  0xa7200bdb in poll () from /lib/libc.so.6
#2  0xa697f468 in ?? () from /lib/libasound.so.2
#3  0xa697f64c in snd_pcm_wait () from /lib/libasound.so.2
#4  0xa6a275cb in gst_alsasink_write (asink=0x9747998, data=0xa604d0e8, 
    length=8192) at gstalsasink.c:857
#5  0xa6d29b9d in audioringbuffer_thread_func (buf=0x974d800)
    at gstaudiosink.c:246
#6  0xa7362644 in ?? () from /lib/libglib-2.0.so.0
#7  0xa72d3aff in start_thread () from /lib/libpthread.so.0
#8  0xa720b0ee in clone () from /lib/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0xa6916b40 (LWP 1771))]
#0  0xa7708424 in __kernel_vsyscall ()
(gdb) bt
#0  0xa7708424 in __kernel_vsyscall ()
#1  0xa72d718c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xa737ed47 in g_cond_wait () from /lib/libglib-2.0.so.0
#3  0xa6d07570 in wait_segment (buf=buf@entry=0x974d800)
    at gstringbuffer.c:1513
#4  0xa6d07769 in default_commit (buf=0x974d800, sample=0xa69154a8, 
    data=0xa6060ef8 "", in_samples=64, out_samples=64, accum=0xa691546c)
    at gstringbuffer.c:1713
#5  0xa6d08b1f in gst_ring_buffer_commit_full (buf=buf@entry=0x974d800, 
    sample=sample@entry=0xa69154a8, 
    data=data@entry=0xa605f0f8 "@\316\002\v\340\003g\017 \f\032\020`\253&\024\340\347\325\024 \247r\030@\205\302\031\240_\337\034`\th\037\300\"\036\"@\027\372$\300yf'\300\060U)\300U\210+\300", in_samples=in_samples@entry=1024, 
    out_samples=out_samples@entry=1024, accum=accum@entry=0xa691546c)
    at gstringbuffer.c:1815
#6  0xa6d1f061 in gst_base_audio_sink_render (bsink=0x9747998, 
    buf=<optimized out>) at gstbaseaudiosink.c:1873
#7  0xa6de5509 in ?? () from /lib/libgstbase-0.10.so.0
#8  0xa6de93eb in ?? () from /lib/libgstbase-0.10.so.0
#9  0xa6de9a80 in ?? () from /lib/libgstbase-0.10.so.0
#10 0xa6de9ee6 in ?? () from /lib/libgstbase-0.10.so.0
#11 0xa766a918 in gst_pad_push () from /lib/libgstreamer-0.10.so.0
#12 0xa6dfb140 in ?? () from /lib/libgstbase-0.10.so.0

Tracing through gst_alsasink_write() should help figure this out.

Changed 15 months ago by dsd

snd_pcm_wait() returns success ("you can write audio data now") but snd_pcm_writei fails with EAGAIN. So gstreamer gets confused.

Disabling dmix doesn't help.

We can catch this inside the kernel. With some logging messages added, in azx_interrupt() we can see that in the failure case, the system receives a flood of stream FIFO error interrupts (SD_STS 0x8), rather than the normal "audio buffer was played" interrupts (SD_STS 0x24).

It's not clear why such a FIFO error occurred, nor what the FIFO actually is. And the HDA driver currently has no code to handle such conditions.

http://mailman.alsa-project.org/pipermail/alsa-devel/2013-May/061990.html

Changed 14 months ago by dsd

  • next_action changed from never set to add to build
  • milestone changed from Future Release to 13.2.0

That FIFO error couldn't be reproduced later, but other kinds of issues were popping up. After looking closely, I found that the codec was being powered up and down at odd points. Backporting recent fixes from upstream allows 2 XO-1.5s to survive an overnight test so I am confident this is fixed.

Fixed in x86-3.3 c0f817d..f12d9e6

Changed 14 months ago by dsd

  • next_action changed from add to build to test in build

Test in 13.2.0 build 8.

Changed 14 months ago by dsd

  • status changed from new to closed
  • resolution set to fixed

The original test case has been running for most of the day, it is working perfectly. No hangs, no silent playback.

Note: See TracTickets for help on using tickets.