This is the mail archive of the cygwin mailing list for the Cygwin project.



select() hangs sometimes, for TCP connections


Problem: sometimes select() doesn't return.

Context: I run a DB replication scenario from cron,
every 5 minutes. Nothing changes in the DB, so the
scenario is identical on every run. Most of the time
it works. But eventually (after some minutes or
hours), a process A waits forever in select() for a
response on a TCP socket.
With gdb I can see that the other end, B, is back in
its ReadCommand() function, meaning it has sent its
response and is waiting for a new command, so that
side should be OK.

Here is the stack for A :
(gdb) bt
#0  0x77f682cb in ntdll!ZwWaitForMultipleObjects () from /cygdrive/c/WINNT/System32/NTDLL.DLL
#1  0x77f1ce6b in WaitForMultipleObjectsEx () from /cygdrive/c/WINNT/system32/KERNEL32.DLL
#2  0x77f1cd76 in WaitForMultipleObjects () from /cygdrive/c/WINNT/system32/KERNEL32.DLL
#3  0x61073703 in sigpending () from /usr/bin/cygwin1.dll
#4  0x61069de2 in select () from /usr/bin/cygwin1.dll
#5  0x002145e0 in ?? ()

Adding many printf traces makes the problem appear
sooner, so I can now reproduce it very easily:
a single cron run is enough.
(Perhaps because the file descriptors see more activity?)

I know that I'm not providing much information to
reproduce the problem, but my environment is not
simple. It involves:
cron -> sh -> pgtclsh (Tcl interpreter of PostgreSQL)
-> a connection is made to a PostgreSQL backend

It seems that the last command executed by PostgreSQL
in the replication scenario is a
"copy <table> to <file>", which opens additional file
descriptors. That may be a clue.

I was initially on version 1.5.5-1. Searching the
archives, I found
http://sources.redhat.com/ml/cygwin/2003-11/msg00137.html
http://sources.redhat.com/ml/cygwin/2003-10/msg01812.html
reporting some problems with select() in 1.5.5-1.
My first reaction was to upgrade to 1.5.7-1, hoping
the problem would be solved, but it was not.
Those reports concerned UDP or pipes rather than
TCP sockets, but the symptoms are similar.

Could a fix of the same kind as the one made for
pipes apply in this case?

Feel free to ask me to investigate further; I'm
willing to help. Just tell me how.

Patrick



Attachment: cygcheck.out
Description: cygcheck.out

