This is the mail archive of the cygwin mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: fork failure?


Charles Wilson wrote:
>   1) look at the parent's stack trace when it is inside fork(). Ditto
>      its return address.
>   2) after the child longjmp's back into fork() from dll_crt0_1,
>      look at its stack trace and return address. (although I can't
>      really catch it that early. I can only catch it in the debugger
>      just after the CYGWIN_FORK_SLEEP...but at least I'm still
>      back inside fork() at that point.
> 
> They ought to match in all respects, correct?

I think...maybe...I've located the primary symptom.  I don't yet know
the cause, so I don't have fix.  But...

The stack traces are "close", but do not match exactly.  Here's an
interleaved version, "P" for parent, "C" for child (and edited for
formatting, and to redact private data):

A: I /think/ this is the import thunk for fork, rather than
   the /actual/ _fork in the cygwin1.dll. Blame mingw gdb.
B: where I set the breakpoint in child. So that's reassuring.


P	#0  0x00439bb4 in fork ()  [[[[ B ]]]]
C	#0  fork () at fork.cc:617 [[[[ A ]]]]

P	#1  0x0043843f in pipe_connect_unix (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:234

C	#1  0x610cd03c in _sigfe () from /usr/bin/cygwin1.dll
C	#2  0x0137ad68 in ?? ()
C	#3  0x00001120 in ?? ()
C	#4  0x00000019 in ?? ()

P	#2  0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:920
C	#5  0x00438f56 in assuan_pipe_connect_ext (ctx=0x7ff8fa30,
	    name=0x137ad68 "/usr/bin/pinentry-gtk-2.exe",
	    argv=0x7ff8fa1c, fd_child_list=0x7ff8fa10, atfork=0x40c803
	    <atfork_cb>, atforkvalue=0x136b948, flags=128)
	    at assuan-pipe-connect.c:920

P	#3  0x0040cbd0 in start_pinentry (ctrl=0x136b948)
	    at agent/call-pinentry.c:316
C	#6  0x0040cbd0 in start_pinentry (ctrl=0x136b948)
	    at agent/call-pinentry.c:316		
		
P	#4  0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
	    retpass=0x7ff8fec0,
	    desc=0x136bae9 "Please enter the passphrase to unlock the
	    secret key for the OpenPGP certificate:REDACTED"...,
	    prompt=0x0,errtext=0x0, with_qualitybar=0)
	    at agent/call-pinentry.c:809		
C	#7  0x0040e013 in agent_get_passphrase (ctrl=0x136b948,
	    retpass=0x7ff8fec0,
	    desc=0x136bae9 "Please enter the passphrase to unlock the
	    secret key for the OpenPGP certificate:REDACTED"...,
	    prompt=0x0, errtext=0x0, with_qualitybar=0)
	    at agent/call-pinentry.c:809
		
P	#5  0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
	    line=0x136babc "REDACTED")
	    at agent/command.c:1111
C	#8  0x00406f15 in cmd_get_passphrase (ctx=0x136ba50,
	    line=0x136babc "REDACTED")
	    at agent/command.c:1111

P	#6  0x00435115 in dispatch_command (ctx=0x136ba50,
	    line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
	    at assuan-handler.c:497
C	#9  0x00435115 in dispatch_command (ctx=0x136ba50,
	    line=0x136baa7 "--data --repeat=0 -- REDACTED", linelen=282)
	    at assuan-handler.c:497
		
P	#7  0x004355a9 in process_request (ctx=0x136ba50)
	    at assuan-handler.c:720
C	#10 0x004355a9 in process_request (ctx=0x136ba50)
	    at assuan-handler.c:720

P	#8  0x004355d7 in assuan_process (ctx=0x136ba50)
	    at assuan-handler.c:742
C	#11 0x004355d7 in assuan_process (ctx=0x136ba50)
	    at assuan-handler.c:742
		
P	#9  0x00408519 in start_command_handler (ctrl=0x136b948,
	    listen_fd=-1, fd=7)
	    at agent/command.c:1944
C	#12 0x00408519 in start_command_handler (ctrl=0x136b948,
	    listen_fd=-1, fd=7)
	    at agent/command.c:1944

P	#10 0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755	
C	#13 0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755
		
P	#11 0x6fa44bba in ?? ()
C	#14 0x6fa44bba in pth_exit () from /usr/bin/cygpth-20.dll

P	#12 0x0136b948 in ?? ()
C	#15 0x00000000 in ?? ()

P	#13 0x00000202 in ?? ()
P	#14 0x0022c9e8 in ?? ()


The register dump and $esp inspection for both parent and child are in
the attached file, for the truly interested.


So, what's the smoking gun?  This, in the child:

(gdb) p *(child_info_fork *)child_proc_info
$5 = {<child_info> = {msv_count = 0, cb = 288, intro = 2936076800,
    magic = 3897586042, type = 2, subproc_ready = 0x230, user_h = 0xb4,
    parent = 0x234, cygheap = 0x61245650, cygheap_max = 0x6124c1ac,
    cygheap_reserve_sz = 0, flag = 2 '\002', fhandler_union_cb = 464,
    retry = 10, exit_code = 0, static retry_count = 10},
  forker_finished = 0x238, stacksize = 0, jmp = {1, 22793784, 0, 0, 7,
    4210969, 2147022580, 2147022140, 1627853804, 3866659, 2293760,
    2281064, 2284368, 0 <repeats 39 times>}, stacktop = 0x7ff8f53c,
  stackbottom = 0x230000, filler = "\000\000\000"}

fork_info->stacksize is zero. According to fork.cc in frok::child:

  /* If we've played with the stack, stacksize != 0.  That means that
     fork() was invoked from other than the main thread.  Make sure that
     the threadinfo information is properly set up.  */
  if (fork_info->stacksize)
    {
      _main_tls = &_my_tls;
      _main_tls->init_thread (NULL, NULL);
      _main_tls->local_clib = *_impure_ptr;
      _impure_ptr = &_main_tls->local_clib;
    }

So, since in this child, we have stacksize == 0, that must mean that
fork() was invoked from the main thread (of the parent, since a child
/never/ "invokes" fork. It can't; parents invoke fork).  Right?

Wrong: look at frame #10(parent)/#13(child)
0x004041a2 in start_connection_thread (arg=0x136b948)
	    at agent/gpg-agent.c:1755

Here's the start of that function:

/* This is the standard connection thread's main function.  */
static void *
start_connection_thread (void *arg)
{
...
}

It is launched inside:


/* Connection handler loop.  Wait for connection requests and spawn a
   thread after accepting a connection.  */
static void
handle_connections (gnupg_fd_t listen_fd, gnupg_fd_t listen_fd_ssh)
{
...
  for (;;)
    {
...
     /* We now might create new threads and because we don't want any
         signals (as we are handling them here) to be delivered to a
         new thread.  Thus we need to block those signals. */
      pth_sigmask (SIG_BLOCK, &sigs, NULL);

      if (!shutdown_pending && FD_ISSET (FD2INT (listen_fd),
                                         &read_fdset))
        {
>>> HERE <<<  if (!pth_spawn (tattr, start_connection_thread, ctrl))
                {
                  log_error ("error spawning connection handler: %s\n",
                             strerror (errno) );
                  assuan_sock_close (fd);
                  xfree (ctrl);
                }

        }
...
    }
...
}


So, we have a thread, which is not the main thread, which calls fork.
Yet, when the forkee (child) checks that, it thinks that it WAS launched
from the main thread, and does not do the main_tls/_my_tls fixup stuff
for the stack.


Right now the "CYGWIN_FORK_SLEEP" check comes AFTER the main_tls fixup
stuff, so there's another cygwin kernel rebuild in my future, if I want
to step into the child early enough to look at the main_tls fixup stuff.
However, even if I poke at fork_info->stacksize and make it non-zero, so
that the main_tls/my_tls fixup happens --- I still need to figure out
exactly WHAT non-zero value it should have. I suspect that's almost as
hard as figuring out why cygwin didn't do it in the first place -- and
if I knew that, I'd just fix it there, rather than poking at it manually
using gdb...

I have a hunch that an STC (okay, less-hellaciously-complicated test
case) could be developed, using just GNU pth and avoiding all the
libassuan/gnupg gobbledygook.

Anyway, I'm going to be AFK (or, at least AF-This-Problem) for the weekend.

I've attached the entire log of relevant data from parent and child,
gzipped.

--
Chuck

Attachment: fork-data.tar.gz
Description: application/gzip

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]