This is the mail archive of the cygwin-developers@cygwin.com mailing list for the Cygwin project.



Re: hang in sig_wait waiting for debug lock


Hi!

Friday, 06 September, 2002 Christopher Faylor cgf@redhat.com wrote:

>>The ChangeLog states, however, that the setclexec stuff isn't needed.
>>Yet I can't see why we shouldn't process the protected handle list as
>>long as we are recreating handles during the set-close-on-exec
>>operation. Can you comment?

CF> I assume that you mean this entry:

CF> 2002-07-14  Christopher Faylor  <cgf@redhat.com>

CF>         * dcrt0.cc (dll_crt0_1): Move debug_init call back to here.  Avoid a
CF>         compiler warning.
CF>         * shared.cc (memory_init): Remove debug_init call.
CF>         * debug.h (handle_list): Change "clexec" to "inherited".
CF>         * debug.cc: Remove a spurious declaration.
CF>         (setclexec): Conditionalize away since it is currently unused.
CF>         (add_handle): Use inherited field rather than clexec.
CF>         (debug_fixup_after_fork_exec): Ditto.  Move debugging output to
CF>         delete_handle.
CF>         (delete_handle): Add debugging output.
CF>         * fhandler.cc (fhandler_base::set_inheritance): Don't bother setting
CF>         inheritance in debugging table since the handle was never protected
CF>         anyway.
CF>         (fhandler_base::fork_fixup): Ditto.

CF> I'm at a loss to understand why adding additional things into the
CF> protected handle table would solve a race.

I thought about it again, and here's a hypothesis of what may be
happening.

I suspect that it's not exactly a race. That is, it's caused not by
randomness in the order in which different threads of control are
executed, but by randomness in which handle values the OS allocates.
If the value of a handle allocated in one process is equal to the
value of a handle we were dealing with in another, we may get warnings
from add_handle.

system_printf pumps data to STD_ERROR_HANDLE, which is possibly a
pipe to the tty master. Handling data in the tty master thread is
quite complicated, and may end up in the same add_handle(), but with
the muto already locked. Normally that's not a big problem, since
system_printf() returns without waiting for the tty master, and the
muto is then unlocked. But here we have the second nasty random thing
that may happen: the pipe may fill up. In that case WriteFile in
system_printf blocks until the master drains the data from the pipe.
And the master may be blocked because it wants to protect a handle
but the debug muto is locked.
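
To make the suspected circular wait concrete, here is a tiny
single-threaded model of it. It is only a sketch: World, Writer, Master
and deadlocks() are hypothetical names, not Cygwin code, and it assumes
the pipe is already full and that the master must take the debug lock
before it gets back to draining the pipe.

```cpp
#include <cstddef>

// Hypothetical model state; none of these names come from Cygwin.
struct World {
    bool debug_lock_held = false;  // stand-in for the debug muto
    std::size_t pipe_used = 0;     // bytes currently buffered in the pipe
    std::size_t pipe_cap = 1;      // tiny capacity so the pipe fills at once
};

enum class Writer { NeedsToWrite, Done };  // the side calling system_printf
enum class Master { NeedsLock, Done };     // the tty master side

// Returns true if the model reaches a state where neither side can proceed.
bool deadlocks(bool unlock_before_write) {
    World w;
    w.pipe_used = w.pipe_cap;   // assumption: the pipe is already full
    w.debug_lock_held = true;   // assumption: the writer is inside add_handle()
    Writer writer = Writer::NeedsToWrite;
    Master master = Master::NeedsLock;

    // The variant under test: drop the lock before printing.
    if (unlock_before_write) w.debug_lock_held = false;

    for (;;) {
        bool progressed = false;

        // Writer: the write can only finish once the pipe has room.
        if (writer == Writer::NeedsToWrite && w.pipe_used < w.pipe_cap) {
            ++w.pipe_used;
            w.debug_lock_held = false;  // returning releases the lock
            writer = Writer::Done;
            progressed = true;
        }

        // Master: needs the debug lock, and only then drains the pipe.
        if (master == Master::NeedsLock && !w.debug_lock_held) {
            w.pipe_used = 0;
            master = Master::Done;
            progressed = true;
        }

        if (writer == Writer::Done && master == Master::Done) return false;
        if (!progressed) return true;  // both sides wait on each other
    }
}
```

In this model deadlocks(false) reports a hang, because the writer waits
for pipe space while the master waits for the lock. deadlocks(true)
runs to completion, because the master can take the lock, drain the
pipe, and let the write finish.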

I've noticed a special here.unlock() before debug_printf() in
add_handle(). Could it be that it was added there for similar reasons?
If not, then it's not clear why we should unlock the muto explicitly
when it will be unlocked on the next line anyway, when the 'return'
statement is executed.
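
For illustration, this is the pattern I mean, written with standard
C++ locks instead of the Cygwin muto. It is a simplified sketch and not
the real debug.cc code: HandleTable, its Logger callback and size() are
hypothetical, and std::mutex only stands in for the muto. The point is
that the lock is released before the potentially blocking output, so a
logging path that ends up in add_handle() again does not wait on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified protected-handle table; not the Cygwin code.
class HandleTable {
public:
    using Logger = std::function<void(const std::string&)>;

    explicit HandleTable(Logger log) : log_(std::move(log)) {}

    // Records a handle value. Returns false, and logs a warning, if the
    // same numeric value is already in the table.
    bool add_handle(std::uintptr_t h, const std::string& name) {
        std::unique_lock<std::mutex> here(lock_);
        auto it = table_.find(h);
        if (it != table_.end()) {
            std::string old_name = it->second;  // copy while still locked
            here.unlock();  // release before output that might block
            log_("handle " + std::to_string(h) + " already protected as " +
                 old_name + ", requested as " + name);
            return false;
        }
        table_[h] = name;
        return true;  // 'here' releases the lock on scope exit
    }

    // Number of recorded handles.
    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return table_.size();
    }

private:
    std::mutex lock_;  // stand-in for the debug muto
    std::map<std::uintptr_t, std::string> table_;
    Logger log_;
};
```

If the logger itself calls add_handle() with a different handle value,
that nested call succeeds, because the lock has already been dropped.
Without the explicit unlock, the nested call would try to take a
std::mutex that the same thread still holds, which is undefined
behaviour and in practice usually hangs.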

CF> There are too many places where the fd handle is manipulated but
CF> not protected for this code to be turned on.  And since there is
CF> no easy way to get distinct handle name information into the
CF> table, it wouldn't make sense to add the protection anyway.

Egor.            mailto:deo@logos-m.ru ICQ 5165414 FidoNet 2:5020/496.19

