This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: Cygwin multithreading performance


On Nov 21 01:21, Mark Geisert wrote:
> Kacper Michajlow wrote:
> >Thanks for the reply, and sorry for not being specific enough before. 'git
> >gc' is a driver which runs various git commands to clean up the
> >repository, though I'm mostly concerned about the code I linked.
> >Instead of 'git gc' it is better to test 'git repack -a -f' directly,
> >preferably on a repository where it takes some time.
> >'git://sourceware.org/git/newlib-cygwin.git' is a good test case.
> >Although the performance hit is bigger with bigger repositories, this is
> >a good example to see what's going on.
> 
> I appreciate the more specific info on how you experience the issue.
> 
> >I'm well aware that forking on Windows is problematic, but I'm
> >explicitly interested in the parallelized part of execution. I don't care
> >about forks; while they slow things down too, they are not used in the
> >compression process, which is parallelized over all the CPU threads.
> >Each command is indeed forked, but I'm only interested in the
> >pack-objects part, hence the code I linked.
> 
> OK, we're on the same page now :).
> 
> >$ strace --mask=debug+syscall+thread -o git.strace git repack -a -f
> >Counting objects: 156690, done.
> >Delta compression using up to 12 threads.
> >Compressing objects: 100% (154730/154730), done.
> >Writing objects: 100% (156690/156690), done.
> >Total 156690 (delta 123449), reused 33146 (delta 0)
> >
> >$ grep "fork(" git.strace
> >   559   53728 [main] git 24340 fork: 24368 = fork()
> >   465   54022 [main] git 24368 fork: 0 = fork()
> >
> >Only two forks were created, while during compression only 25% CPU was
> >used (on a big repo like the Linux kernel it doesn't exceed 8%). With native
> >git the same workload easily uses 95-100% CPU and therefore is a lot
> >faster.
> 
> I was able to reproduce your issue using a cloned newlib-cygwin repo. On a
> 6-CPU machine I saw max 36% CPU utilization during the compression phase.
> ProcessExplorer showed all 6 threads were getting CPU time (to varying
> degrees) and when suspended they were always trying to acquire a mutex.  I'd
> like to run some more straces and perhaps investigate with some other tools
> before saying more.  This may take a while.
> 
> What I've done so far is install the git-debuginfo and cygwin-debuginfo
> packages so that I can convert hex RIP addresses to line numbers.  I've run
> the testcase under gdb so I can interrupt at random times and poke around.
> The straces from this testcase are ginormous so I hope I can figure out a
> better way to see why the compression threads aren't CPU-bound like they
> should be.  If you don't already know, 'strace --help' shows the available
> mask values.  The threads are each writing to disk, so I wonder if there's
> some unintentional serialization going on somewhere, but I don't know yet
> how I could verify that theory.
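
One way to cut a very large strace file down to something readable is to count trace lines per thread and per function. The following is a minimal sketch in Python, not something taken from this thread. It assumes every trace line has the layout of the two `fork` lines quoted above (delta microseconds, elapsed microseconds, `[thread]`, program, pid, then `function:`); the names `LINE_RE`, `summarize` and `print_top` are hypothetical, and lines that do not fit the assumed layout are skipped.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the sample lines quoted in this thread:
#   <delta-usecs> <elapsed-usecs> [<thread>] <program> <pid> <function>: <message>
LINE_RE = re.compile(
    r"^\s*(?P<delta>\d+)\s+(?P<elapsed>\d+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<prog>\S+)\s+(?P<pid>\d+)\s+(?P<func>[^:\s]+):"
)


def summarize(lines):
    """Count trace lines per (thread, function) pair; skip unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group("thread"), m.group("func"))] += 1
    return counts


def print_top(path, n=20):
    """Print the n most frequent (thread, function) pairs in a trace file."""
    with open(path, errors="replace") as fh:
        counts = summarize(fh)
    for (thread, func), hits in counts.most_common(n):
        print(f"{hits:10d}  [{thread}]  {func}")
```

Calling `print_top("git.strace")` on the file produced by the strace command above would list the busiest pairs first. A line count is only a rough proxy for where the threads spend their time, so a function that dominates the list is a hint about where to look with gdb, not proof of serialization.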

If I'm allowed to make an educated guess, the big serializers in Cygwin
are probably the calls to malloc, calloc, realloc, and free.  We desperately
need a new malloc implementation better suited to multi-threading.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


