This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: 1.5.24 (and later): race condition in sigproc.cc
- From: Christopher Faylor <cgf-use-the-mailinglist-please at cygwin dot com>
- To: cygwin-developers at cygwin dot com
- Date: Fri, 13 Jul 2007 22:29:45 -0400
- Subject: Re: 1.5.24 (and later): race condition in sigproc.cc
- References: <94511A2526233044AC4E4C278B80B3BA918BC6@IRIS.electric-cloud.com>
- Reply-to: cygwin-developers at cygwin dot com
On Fri, Jul 13, 2007 at 05:09:24PM -0700, Scott Stanton wrote:
>I have found what I believe to be a race condition in sigproc.cc and
>exceptions.cc. The problem is that any access to the in_dllentry
>variable defined in init.cc is vulnerable to a race condition when a new
>thread is being initialized.
>
>The initial symptom is that under heavy load on a multiprocessor machine
>cygwin processes intermittently fail with a "fork: Resource temporarily
>unavailable" error. I tracked this to the sig_send() calls inside
>fork(). These calls were failing, causing fork to return EAGAIN. The
>sig_send() call was failing on the first no_signals_available() test.
>After expanding the macro to see which arm of the test was failing, it
>turns out that sig_send() was seeing a non-zero value for in_dllentry.
>This boolean is set during the call to dll_entry() whenever a process or
>thread attaches to the cygwin dll. Because the in_dllentry variable is
>checked without holding a mutex, threads calling sig_send() can
>temporarily see the value as true when a new thread is starting. If the
>sig_send() code is modified to retry after yielding the processor, the
>second attempt succeeds.
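The retry described in the quoted report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Cygwin code: in_dllentry_flag, signals_blocked(), send_with_retry() and demo_transient_flag() are hypothetical names, a std::atomic<bool> stands in for the plain variable in init.cc, and a time limit replaces whatever retry policy the reporter actually used.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the flag that is set while dll_entry() runs.
std::atomic<bool> in_dllentry_flag{false};

// Hypothetical analogue of the failing arm of no_signals_available().
bool signals_blocked() {
    return in_dllentry_flag.load(std::memory_order_acquire);
}

// Retry the check, yielding the processor between attempts, until the flag
// clears or the timeout expires. Returns true if the send could proceed;
// false corresponds to the caller giving up (EAGAIN in the report).
bool send_with_retry(std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if (!signals_blocked())
            return true;
        std::this_thread::yield();
    } while (std::chrono::steady_clock::now() < deadline);
    return false;
}

// Simulates a thread attach that holds the flag briefly and then clears it.
bool demo_transient_flag() {
    in_dllentry_flag.store(true, std::memory_order_release);
    std::thread attacher([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        in_dllentry_flag.store(false, std::memory_order_release);
    });
    const bool ok = send_with_retry(std::chrono::milliseconds(2000));
    attacher.join();
    return ok;
}
```

A single unguarded check would fail here while the flag is set; the retrying version succeeds once the simulated attach finishes. Retrying only hides the symptom, though, and does not explain why the flag is visible to another thread in the first place.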
When a process or thread attaches via dll_entry, it is supposed to be
single-threaded at that point, so another thread isn't supposed to be
able to see that variable as true. So, what you may be seeing is that
the value of in_dllentry is being duplicated by fork, and that is causing
the problem. The fix for that is simple. I've checked it in.

If in_dllentry is really still getting set and other threads can see
that, then much more work will need to be done, since that violates a lot
of assumptions about what goes on via process/thread attach/detach.
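To illustrate the fork-duplication theory, here is a minimal sketch. It is not the change that was checked in, whose details are not given in this message. ProcessState, copy_for_child(), fixup_after_fork() and signals_available() are hypothetical names, and clearing the flag in the child is an assumed remedy used only to show the idea.

```cpp
#include <cassert>

// Hypothetical per-process state; in Cygwin the flag lives in init.cc.
struct ProcessState {
    bool in_dllentry = false;
};

// Models fork copying the parent's data into the child verbatim,
// including whatever value the flag happened to hold at that moment.
ProcessState copy_for_child(const ProcessState& parent) {
    return parent;
}

// Assumed remedy: the child discards the inherited flag value,
// since the child itself is not inside dll_entry().
void fixup_after_fork(ProcessState& child) {
    child.in_dllentry = false;
}

// Hypothetical analogue of the check that sig_send() performs.
bool signals_available(const ProcessState& state) {
    return !state.in_dllentry;
}
```

If the parent's flag is true at the moment of the copy, the child starts out with signals_available() returning false until something resets the flag, which would match the reported failure without any second thread having observed the flag while dll_entry() was running.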
cgf