#12580 closed defect (fixed)

XO-1.5 audio playback hang

Reported by: dsd Owned by: dsd
Priority: normal Milestone: 13.2.0
Component: kernel Version: not specified
Keywords: Cc:
Blocked By: Blocking:
Deployments affected: Action Needed: test in build
Verified: no

Description

Seen in Nicaragua's 12.1.0 spin. "I know my ABC" activity was running, and had hung, blocked trying to play a sound via gst-launch, no audio being produced.

The strace of that process just showed a poll loop.

Trying to run speaker-test as root also hung before playing the first sound.

After killing the gst-launch process, the activity spawned another one which also hung. Now speaker-test (as root) would apparently cycle through the sounds, but there was no audible output.

After killing the new gst-launch process, no further sounds could be played. speaker-test (as user) complains: unable to create IPC semaphore

ipcs shows that the memory and semaphore was owned by root. Deleted it with ipcrm, and now things started playing again. Maybe me running speaker-test as root had screwed something up.

Indeed just running the gst-launch command in a loop will reproduce this problem:

while true; do
/usr/bin/gst-launch-0.10 filesrc location=/home/olpc/Activities/IKnowMyABCs.activity/sounds/es/letra_o.ogg  ! oggdemux ! vorbisdec ! audioconvert ! alsasink
done

Most of the time it plays the sound fine, sometimes it doesn't give any errors but produces no audible output, and sometimes it hangs.

Change History (5)

comment:1 Changed 18 months ago by dsd

Caught this in gdb:

(gdb) t 2
[Switching to thread 2 (Thread 0xa5fffb40 (LWP 1772))]
#0  0xa7708424 in __kernel_vsyscall ()
(gdb) bt
#0  0xa7708424 in __kernel_vsyscall ()
#1  0xa7200bdb in poll () from /lib/libc.so.6
#2  0xa697f468 in ?? () from /lib/libasound.so.2
#3  0xa697f64c in snd_pcm_wait () from /lib/libasound.so.2
#4  0xa6a275cb in gst_alsasink_write (asink=0x9747998, data=0xa604d0e8, 
    length=8192) at gstalsasink.c:857
#5  0xa6d29b9d in audioringbuffer_thread_func (buf=0x974d800)
    at gstaudiosink.c:246
#6  0xa7362644 in ?? () from /lib/libglib-2.0.so.0
#7  0xa72d3aff in start_thread () from /lib/libpthread.so.0
#8  0xa720b0ee in clone () from /lib/libc.so.6
(gdb) t 3
[Switching to thread 3 (Thread 0xa6916b40 (LWP 1771))]
#0  0xa7708424 in __kernel_vsyscall ()
(gdb) bt
#0  0xa7708424 in __kernel_vsyscall ()
#1  0xa72d718c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xa737ed47 in g_cond_wait () from /lib/libglib-2.0.so.0
#3  0xa6d07570 in wait_segment (buf=buf@entry=0x974d800)
    at gstringbuffer.c:1513
#4  0xa6d07769 in default_commit (buf=0x974d800, sample=0xa69154a8, 
    data=0xa6060ef8 "", in_samples=64, out_samples=64, accum=0xa691546c)
    at gstringbuffer.c:1713
#5  0xa6d08b1f in gst_ring_buffer_commit_full (buf=buf@entry=0x974d800, 
    sample=sample@entry=0xa69154a8, 
    data=data@entry=0xa605f0f8 "@\316\002\v\340\003g\017 \f\032\020`\253&\024\340\347\325\024 \247r\030@\205\302\031\240_\337\034`\th\037\300\"\036\"@\027\372$\300yf'\300\060U)\300U\210+\300", in_samples=in_samples@entry=1024, 
    out_samples=out_samples@entry=1024, accum=accum@entry=0xa691546c)
    at gstringbuffer.c:1815
#6  0xa6d1f061 in gst_base_audio_sink_render (bsink=0x9747998, 
    buf=<optimized out>) at gstbaseaudiosink.c:1873
#7  0xa6de5509 in ?? () from /lib/libgstbase-0.10.so.0
#8  0xa6de93eb in ?? () from /lib/libgstbase-0.10.so.0
#9  0xa6de9a80 in ?? () from /lib/libgstbase-0.10.so.0
#10 0xa6de9ee6 in ?? () from /lib/libgstbase-0.10.so.0
#11 0xa766a918 in gst_pad_push () from /lib/libgstreamer-0.10.so.0
#12 0xa6dfb140 in ?? () from /lib/libgstbase-0.10.so.0

Tracing through gst_alsasink_write() should help figure this out.

comment:2 Changed 18 months ago by dsd

snd_pcm_wait() returns success ("you can write audio data now") but snd_pcm_writei fails with EAGAIN. So gstreamer gets confused.

Disabling dmix doesn't help.

We can catch this inside the kernel. With some logging messages added, in azx_interrupt() we can see that in the failure case, the system receives a flood of stream FIFO error interrupts (SD_STS 0x8), rather than the normal "audio buffer was played" interrupts (SD_STS 0x24).

It's not clear why such a FIFO error occurred, nor what the FIFO actually is. And the HDA driver currently has no code to handle such conditions.

http://mailman.alsa-project.org/pipermail/alsa-devel/2013-May/061990.html

comment:3 Changed 17 months ago by dsd

  • Action Needed changed from never set to add to build
  • Milestone changed from Future Release to 13.2.0

That FIFO error couldn't be reproduced later, but other kinds of issues were popping up. After looking closely, I found that the codec was being powered up and down at odd points. Backporting recent fixes from upstream allows 2 XO-1.5s to survive an overnight test so I am confident this is fixed.

Fixed in x86-3.3 c0f817d..f12d9e6

comment:4 Changed 17 months ago by dsd

  • Action Needed changed from add to build to test in build

Test in 13.2.0 build 8.

comment:5 Changed 17 months ago by dsd

  • Resolution set to fixed
  • Status changed from new to closed

The original test case has been running for most of the day, it is working perfectly. No hangs, no silent playback.

Note: See TracTickets for help on using tickets.