Mailing List archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[vdr] Re: Is cMutex class thread safe? Possible bug?



Hi Colin,

On Thursday 30 October 2003 12:27 pm, Colin Paton wrote:
> Hi,
>
> I've found some code that looks a bit 'funny' to me and I'm wondering if
> anybody could advise.

you are not the only one that doesn't like this particular piece of code ;-)

> The cMutex class is roughly as follows:
>
> void cMutex::Lock(void)
> {
>   if (getpid() != lockingPid || !locked)
>   {
>     pthread_mutex_lock(&mutex);
>     lockingPid = getpid();
>   }
>   locked++;
> }
>
> void cMutex::Unlock(void)
> {
>   if (!--locked)
>   {
>     lockingPid = 0;
>     pthread_mutex_unlock(&mutex);
>   }
> }
>
> It looks to me as though there could be a potential race condition here in
> that the 'locked' count isn't protected by the mutex. This condition could
> occur if one thread tried to unlock the mutex while the other thread tried
> to lock it. In this case, 'locked' could be decremented by the thread
> calling 'Unlock()' at the same time as a different thread tested 'locked'
> in the Lock() method.

i spend some time looking at this code before and it *is* safe. The thread 
attempting to aquire the lock will increment locked only and only when it has 
gotten the lock or when it already held the lock.

> I think that recursive mutexes should be used instead. That way, the
> pthreads library keeps track of lock counts, effectively having its own
> internal 'locked' and 'lockingPid' variables which are accessed in a thread
> safe way.

well, i tried that. Most locks in vdr do not need to be recursive. The only 
lock that truly is recursive is in the eit code.

> Therefore, the calls would just look like:
>
> void cMutex::Lock(void)
> {
>   pthread_mutex_lock(&mutex);
> }
>
> void cMutex::Unlock(void)
> {
>   pthread_mutex_unlock(&mutex);
> }
>
> ...and the mutex would be created with the type
> PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP in pthread_mutex_init().

It's not that easy. VDR uses (has used, does it still?) cond_var's. Cond_var's 
do not work with recursive mutexes! So, you have to be extra carefull with 
which mutexes you make recursive. Simply using pthread_recursive mutexes by 
default will break some plugins and making them non recusrsive by default 
will break almost everything. So, this certainly is a 1.3 topic.

> Of course, having written all this I have noticed that a similar discussion
> appeared in 2001.
>
> http://www.linuxtv.org/mailinglists/vdr/2001/09-2001/msg00270.html
>
> This seems to state that using recursive mutexes leads to channel switching
> hanging, which is exactly the problem that I am encountering, using the
> default vdr 1.2.5 code (without using pthreads recursive mutexes). I
> sometimes have to 'kill -9 vdr'

I do have an old patch floating arround that use pure mutexes. I'm not exactly 
sure why i didn't pursue this further or wether it caused or fixed the 
channel switch hangups. I will dig it out and post an update.

> It seems that using the 'buggy' mutex above is potentially just hiding
> another problem, and it would be best to use mutex code without a race
> condition and then try to find out what is causing the hang/deadlock.

I don't think so. The recursive behaviour used there looks a bit strange at 
first sight, but is certainly not buggy.

> Before I do this however, I wonder if anybody could advise. The discussion
> above occurred two years ago, and I'd just like to confirm that nothing has
> changed before I try to track down the problem.

Well, the current code will not work on any system using the new NPTL 
implementation. This basicly mean that RH9 will definetly break vdr. The easy 
workarround is to put a LD_ASSUME_KERNEL="2.4.1" in your vdr start script.

> I am also wondering if the use of pthread_cancel is correct? It would seem
> that if the thread has become locked for more than 'Cancel' time then a
> fatal error should occur - something has become frozen? Again, I can
> reproduce the freezing problem by halting xine (using Ctrl-Z in the shell)
> and changing channel. I leave xine halted for more than 3 seconds and then
> unfreeze it. VDR has died, as it has killed the transfer thread which still
> (I believe) has the locked mutex.

well, canceling a thread that holds locks will not release those locks. So 
using pthread_cancel is always a bad thing(tm).
I'm not so sure about this xine thing however. Beeing an media player type 
application it will almost certainly fidles with it process priority. It 
could simply be, that halting xine causes the system to freeze or blocks some 
resource (e.g. hdd throughput) causing vdr's transfer thread to block.

> As usual with threads problems I may been looking in the wrong area, but I
> would appreciate comments.

Could you describe your setup a bit more detailed?

Andreas



-- 
Info:
To unsubscribe send a mail to ecartis@linuxtv.org with "unsubscribe vdr" as subject.



Home | Main Index | Thread Index