[vdr] vdr-xine: what's wrong with this piece of code -- threading issue?

Reinhard Nissl rnissl at gmx.de
Fri Mar 25 00:18:19 CET 2005


Hi,

I'm facing a deadlock situation when the code below is modified to
ignore the "r == 0" cases (which is what the original code in
vdr-xine-0.7.2 does):

   int cXineLib::xread(int f, void *b, int n)
   {
     bool atEOF = false;

     int t = 0;

     while (t < n)
     {
       void (* const sigPipeHandler)(int) = ::signal(SIGPIPE, SIG_IGN);

       errno = 0;
       int r = ::read(f, ((char *)b) + t, n - t);
       int myErrno = errno;

       ::signal(SIGPIPE, sigPipeHandler);

       if (r < 0
         || (r == 0 && atEOF))
       {
         if (EAGAIN == myErrno)
           continue;

         fprintf(stderr, "lib::read(%d) failed (atEOF: %d) %d: ",
                 n, atEOF, myErrno);
         errno = myErrno;
         perror("");

         disconnect();

         return r;
       }
       else if (r == 0)
       {
         cPoller Poller(f);
         atEOF = Poller.Poll(0);
         fprintf(stderr, "--- lib read 0, atEOF %d\n", atEOF);
       }
       else
         atEOF = false;

       t += r;
     }

     return t;
   }

Some more information:
- File descriptor "f" represents a FIFO, which is opened for reading and
should be in blocking mode by design.
- As this function is called by different threads (synchronized outside
via a mutex), I save and restore the handler for SIGPIPE, as I don't
want any of VDR's signal handlers to be triggered.

The read() should block until data is available and typically return the
number of bytes read, which should be greater than zero. But there are
some cases where read() returns zero:
a) the writing side of the FIFO was closed.
b) a signal interrupted the blocking read().
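
To make case a) concrete, here is a minimal sketch that uses an
anonymous pipe as a stand-in for the FIFO. The helper name
readAfterWriterClosed is hypothetical and not part of vdr-xine. For
case b), POSIX specifies that a read() interrupted by a signal before
any data was transferred returns -1 with errno set to EINTR, so that
case may well show up as "r < 0" rather than "r == 0".

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical demo helper (not part of vdr-xine): creates a pipe, closes
// the writing side and then reads from the reading side.
// Returns the result of read(), or -2 if the pipe could not be created.
int readAfterWriterClosed()
{
  int fds[2];
  if (::pipe(fds) != 0)
    return -2;
  assert(fds[0] >= 0 && fds[1] >= 0);

  ::close(fds[1]);                       // no writer is left

  char buf[16];
  int r = (int)::read(fds[0], buf, sizeof (buf)); // returns at once: EOF

  ::close(fds[0]);
  return r;
}
```

Calling readAfterWriterClosed() yields 0 without blocking, which is the
end-of-file indication that xread() sees in case a).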

The original code simply ignores a result of zero: for case a), a
different function (xwrite), which should be called by a different
thread, should see a SIGPIPE and initiate the disconnect(); case b)
should simply go on reading the remaining data. But this leads to a
deadlock situation where read() never returns anything != 0 and
therefore the original loop spins forever. This is most likely to
trigger if you move cutting marks (on my machine it only triggers when
xine uses -V xshm and when moving cutting marks in HDTV recordings).
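
For the xwrite side mentioned above, the following sketch shows what a
write() reports when the reading side is gone and SIGPIPE is ignored,
again with an anonymous pipe standing in for the FIFO. The helper name
errnoOfWriteWithoutReader is hypothetical; the real xwrite is not shown
here and may look different.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical demo helper (not part of vdr-xine): writes one byte into a
// pipe whose reading side is already closed, with SIGPIPE ignored.
// Returns the errno of the failed write(), 0 if the write unexpectedly
// succeeded, or -1 if the pipe could not be created.
int errnoOfWriteWithoutReader()
{
  int fds[2];
  if (::pipe(fds) != 0)
    return -1;
  assert(fds[0] >= 0 && fds[1] >= 0);

  ::close(fds[0]);                       // no reader is left

  // Same save/restore pattern as in xread() above.
  void (* const sigPipeHandler)(int) = ::signal(SIGPIPE, SIG_IGN);

  errno = 0;
  const char byte = 'x';
  ssize_t r = ::write(fds[1], &byte, 1);
  int myErrno = errno;

  ::signal(SIGPIPE, sigPipeHandler);
  ::close(fds[1]);

  return (r < 0) ? myErrno : 0;
}
```

With SIGPIPE ignored, the write() fails with EPIPE instead of raising
the signal, so the writing thread can detect the closed FIFO from the
return value alone.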

The new code above tries to detect case a) by asking a cPoller whether
data is available on file descriptor f after the read() returned 0.
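
The idea can be sketched with plain poll() instead of cPoller, so that
the returned event bits are visible. This is only an illustration under
the assumption that the FIFO behaves like an anonymous pipe; probeFifo
and FifoState are hypothetical names, and the POLLHUP behaviour is what
I would expect on Linux.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

enum class FifoState { NoData, DataReady, WriterGone, Error };

// Hypothetical helper (not part of vdr-xine or VDR): asks poll() with a
// timeout of 0 ms what a read() on fd would currently run into.
FifoState probeFifo(int fd)
{
  assert(fd >= 0);

  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;

  int rc = ::poll(&pfd, 1, 0);
  if (rc < 0)
    return FifoState::Error;             // poll() itself failed
  if (rc == 0)
    return FifoState::NoData;            // a blocking read() would block
  if (pfd.revents & POLLIN)
    return FifoState::DataReady;         // read() should return > 0
  return FifoState::WriterGone;          // POLLHUP, POLLERR or POLLNVAL
}
```

A boolean "something happened" result cannot tell DataReady from
WriterGone, which is why the sketch looks at revents explicitly.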

Let's assume Poll() returns true because data is available: atEOF is
set pessimistically, but the next read() should return something > 0,
which resets atEOF. The loop continues.

When Poll() returns false because no further data is available, the
next read() should block. The loop continues.

But Poll() might also return true in an error condition (e.g. the
writing side of the FIFO was closed). Then the next read() is expected
to return something <= 0. The loop terminates and a disconnect()
happens.
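
The branches of xread() can be condensed into a small decision function
that maps one read() result to what the loop does next. This is just a
restatement of the code above for easier reasoning; classify and Action
are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>

enum class Action { Retry, Disconnect, PollThenRetry, Accumulate };

// Mirrors the branch structure of xread() for a single read() result:
// r is the return value, atEOF the flag from the previous iteration and
// myErrno the errno saved right after the read().
Action classify(int r, bool atEOF, int myErrno)
{
  if (r < 0
    || (r == 0 && atEOF))
  {
    if (EAGAIN == myErrno)
      return Action::Retry;              // "continue" in xread()

    return Action::Disconnect;           // perror(), disconnect(), return r
  }
  else if (r == 0)
    return Action::PollThenRetry;        // atEOF = Poller.Poll(0)

  assert(r > 0);                         // only real data reaches this point
  return Action::Accumulate;             // atEOF = false, t += r
}
```

Written this way, it becomes visible that every r < 0 with an errno
other than EAGAIN ends in Disconnect, including EINTR, and that a
second read() of 0 directly after a Poll() that returned true does so
as well.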

The strange thing now is that a disconnect() happens occasionally when
moving cutting marks.

Any help appreciated! Thanks!

Bye.
-- 
Dipl.-Inform. (FH) Reinhard Nissl
mailto:rnissl at gmx.de


