This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



How to make child of failed fork exit cleanly?


Hi all,

I'm working on some changes to fork() that would detect early the case where a parent-child pair has unresolvable differences in address-space layout (e.g. thread stacks, heaps, or statically-linked dlls that have moved).
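For the moved-dll case, detection can be pictured as comparing where each dll sat in the parent with where the loader put it in the child. On Windows a module handle (HMODULE) is the module's base address, so comparing handles amounts to comparing load addresses. This is a hypothetical sketch: DllRecord and find_layout_mismatch are illustrative names and do not reflect how Cygwin's dll_list is actually structured.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative record (hypothetical); not Cygwin's actual dll_list entry.
struct DllRecord {
    std::string name;
    std::uintptr_t parent_base;  // where the dll was loaded in the parent
    std::uintptr_t child_base;   // where the loader placed it in the child
};

// Returns the index of the first dll whose base address differs between
// parent and child, or -1 if every dll landed where the parent expects.
// Pointers copied from the parent into a moved dll's data would dangle
// in the child, so a mismatch means the child should give up before
// running that dll's initializers.
long find_layout_mismatch(const std::vector<DllRecord>& dlls) {
    for (std::size_t i = 0; i < dlls.size(); ++i) {
        if (dlls[i].parent_base != dlls[i].child_base)
            return static_cast<long>(i);
    }
    return -1;
}
```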

Detecting the problem turned out to be pretty easy, but making the child exit cleanly has not been. This leads to two questions, followed by what I have figured out so far while attempting to answer them myself.

1. What's the best way to make a child process notify the parent that the fork() cannot succeed, and exit cleanly?

2. When the child does exit, how to prevent finalizers from running for dlls which did not load properly?

Context for the first question: the existing fork failure code calls api_fatal(), which does make the parent's fork() call fail with an error, as desired, but also sends messages to the terminal and generates a stack trace. Further, Windows 7 treats such an exit as grounds for an automatic process restart and respawns the failed child up to five more times before giving up. The result is a screen full of error messages and stack traces even if the fork eventually succeeds. This is especially annoying under terminal apps like emacs or screen, where the messages clutter up the display pretty badly.

Given that the cause of the fork failure is known (rather than some surprise or bug), I propose that the messages go to some strace channel (perhaps a new one for fork) and that the child exit without attempting to generate a dump file, especially since dump generation itself has a tendency to cause crashes. It would also be good, in cases where the parent is the reason for the fork failure, to prevent Windows from respawning the process so many times (though respawning is admittedly handy when the child was the problem and the fork succeeds on the nth try). All of this still leaves open the question of how to exit the child process "properly," though. For example, is it necessary to wait for dll initialization to finish first?
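One way to picture the retry part of this proposal: the child reports why it gave up, and whatever drives the retries stops early when the cause lies in the parent, since every further attempt would fail the same way. This is a hypothetical sketch; ChildFailure, RetryResult and fork_with_retries are illustrative names, not Cygwin code, and it assumes the retrying logic can see the child's reason (for example via a distinguished exit status), which may not hold if the respawning is done by Windows itself.

```cpp
#include <cassert>
#include <functional>

// Hypothetical reasons a forked child could report back to the parent.
enum class ChildFailure {
    None,            // fork succeeded
    ChildTransient,  // child-side problem (e.g. a dll landed elsewhere this
                     // time); a retry may succeed
    ParentLayout     // parent-side layout can't be reproduced (e.g. a thread
                     // stack or heap in the way); every retry will fail
};

struct RetryResult {
    int attempts;    // how many times the child was started
    bool succeeded;  // whether any attempt worked
};

// Starts the child up to max_attempts times, but stops as soon as it
// reports a failure that retrying cannot fix.
RetryResult fork_with_retries(const std::function<ChildFailure()>& attempt,
                              int max_attempts) {
    assert(max_attempts > 0);
    for (int n = 1; n <= max_attempts; ++n) {
        ChildFailure f = attempt();
        if (f == ChildFailure::None)
            return {n, true};
        if (f == ChildFailure::ParentLayout)
            return {n, false};  // permanent: give up after this attempt
        // ChildTransient: loop around and try again
    }
    return {max_attempts, false};
}
```

With max_attempts set to 6, this mirrors the initial attempt plus the five respawns described above, except that a parent-side failure ends things after the first try.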

Context for the second question: exiting the child tends to trigger access violations, often in a pthread_mutex destructor call (la-la land). Some of these can be avoided by disabling stack dumping in api_fatal (see my separate email about alloca and stack walking), but the rest remain a mystery.

Overall, AFAICT, the cygwin dll design assumes that all dlls have loaded properly, and a failed fork breaks that invariant. I worry that some "properly-loaded" dll accesses the state of a "not-properly-loaded" dependency, but I haven't yet been able to fully rule out two simpler explanations:

(a) A statically-linked dll maps to a different address in the child than the parent, and because copied-over dll state references addresses which are valid in the parent but not the child, dll initialization crashes. For example, this was probably responsible for the access violations I reported earlier [1]. I've verified that this can be avoided by checking for handle mismatches in dll_list::alloc and forcing an early exit, but this leads to...

(b) Finalizers run for a dynamically-linked dll that never loaded (and/or a statically-linked dll that loaded at the wrong address; I can't tell which). I've tried inserting checks in a few places so that finalizers don't run unless the after-fork initialization completed, by extending dll_list entries to record whether a given dll initialized properly, but I clearly haven't isolated the cause, because the access violations continue. Part of the challenge is that the dll_list copied over from the parent process will always say that every dll initialized properly. It also doesn't help that many dll initializers run before cygwin1.dll (sometimes even other cygwin dlls, if they've been dynamically rebased), so the value of in_forkee is unreliable.
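The per-dll flag approach only works if the flags describe the child rather than the parent. A hypothetical sketch of that idea: clear every flag as soon as the copied list arrives, set each one only when that dll's after-fork initialization finishes, and skip unflagged entries at exit. DllEntry, reset_after_fork, mark_initialized and run_finalizers are illustrative names, and running finalizers in reverse load order is an assumption. The sketch does not solve the ordering problem with initializers that run before cygwin1.dll; those would still need a way to set their flag that doesn't depend on in_forkee.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical dll list entry (not Cygwin's actual dll_list layout).
struct DllEntry {
    std::string name;
    bool initialized;                 // true only once init ran in THIS process
    std::function<void()> finalizer;  // stand-in for the dll's destructors
};

// Called in the forked child right after the parent's list is copied in:
// the copied flags describe the parent, so they must not be trusted.
void reset_after_fork(std::vector<DllEntry>& dlls) {
    for (DllEntry& d : dlls)
        d.initialized = false;
}

// Called as each dll finishes its after-fork initialization.
void mark_initialized(DllEntry& d) { d.initialized = true; }

// Runs finalizers in reverse load order, skipping dlls whose
// initialization never completed here. Returns how many were run.
int run_finalizers(std::vector<DllEntry>& dlls) {
    int ran = 0;
    for (auto it = dlls.rbegin(); it != dlls.rend(); ++it) {
        if (!it->initialized)
            continue;
        if (it->finalizer)
            it->finalizer();
        it->initialized = false;  // never finalize the same dll twice
        ++ran;
    }
    return ran;
}
```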

[1] http://cygwin.com/ml/cygwin-developers/2011-04/msg00006.html

Thoughts?
Ryan

