This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: Cygwin Filesystem Performance degradation 1.7.5 vs 1.7.7, and methods for improving performance


Hi,

> Right.  Another way of looking at this is that the mount options offer
> consistency.  The notion of setting an environment variable in Window A
> to get one behavior and not setting it in Window B is, IMO, a support
> nightmare and a recipe for end-user confusion.

1) For applications that set this option internally, there is nothing confusing or inconsistent. The application author knows that the application does not use inode/nlink and sets the option in main(). The end user needs to know nothing about it, and the application's behavior does not change (apart from the better performance), since it never uses the inode/nlink info. The application will behave exactly the same.

2) For users who want to set this themselves: setting PATH can make the same application or bash script behave completely differently. The same goes for LD_LIBRARY_PATH, SHELL, COMSPEC, TMPDIR, etc. They can make the same application behave differently in different shells. This confuses only end users who touch things they do not understand. If you don't know what LD_LIBRARY_PATH is, don't touch it! A Unix system does not try to protect itself from ignorant end users who touch things they are not supposed to touch (unlike GUI applications, which try to). Nothing will protect against an end user setting an incorrect PATH. If an end user does not know what PATH is, he should not touch it!
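
To make point 1 concrete, here is a minimal sketch of an application that, by construction, never reads st_ino or st_nlink. The struct file_sig and file_sig_get() names are made up for illustration and are not taken from git or any other real program; the only real API used is POSIX stat().

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical "signature" an application might keep per file.
 * It deliberately has no inode or link-count member. */
struct file_sig {
    off_t  size;   /* copied from st_size  */
    time_t mtime;  /* copied from st_mtime */
};

/* Fill *out from stat(); returns 0 on success, -1 if stat() fails.
 * st_ino and st_nlink are never read, so the application cannot
 * notice whether the library computed them accurately. */
int file_sig_get(const char *path, struct file_sig *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    out->size = st.st_size;
    out->mtime = st.st_mtime;
    return 0;
}
```

An author who writes all file checks this way knows at coding time that the two fields are unused, which is the knowledge the proposed setting would let the application pass on to Cygwin.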

> Or, another way of looking at this is, instead of implementing their own
> potentially buggy, imprecise stat() they could have not thought of
> Cygwin as a black box and either 1) offered improvements for the DLL or
> 2) engaged the Cygwin community with requirements.  If there is ifdef'ed
> __CYGWIN__ code in git now that means that any performance improvements that
> we (i.e., Corinna) has made will never be noticed and that code will be
> maintained forever.

And this is exactly what Yoni Londner is trying to do. He not only complained about performance, but also provided a practical patch that uses setenv("CYGWIN") to solve the performance problems.

I am sure the git developers were not happy to have to write their own version of stat() specifically for __CYGWIN__. But it seems that the simple-to-implement setenv("CYGWIN", "no_ino no_nlink") is being rejected here without any good reason.
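
As a rough sketch of what that call could look like inside an application: the helper name cygwin_add_options() is hypothetical, and "no_ino" / "no_nlink" are only the option names proposed in this thread. It is also an assumption that the Cygwin DLL would honor a CYGWIN value changed after process start; that depends on the proposed patch. The helper appends to any existing CYGWIN value instead of overwriting what the user has set.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append `extra` to the CYGWIN environment
 * variable, keeping whatever the user already put there.
 * Returns 0 on success, -1 on failure. */
int cygwin_add_options(const char *extra)
{
    char buf[512];
    const char *cur = getenv("CYGWIN");
    int n;

    if (cur != NULL && strlen(cur) > 0)
        n = snprintf(buf, sizeof buf, "%s %s", cur, extra);
    else
        n = snprintf(buf, sizeof buf, "%s", extra);

    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;              /* combined value would not fit */

    /* setenv() copies buf, so a local buffer is fine here. */
    return setenv("CYGWIN", buf, 1);
}
```

An application would call cygwin_add_options("no_ino no_nlink") once, at the top of main(), before any filesystem access.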

> So, you're trading ifdef __CYGWIN__ in git with lots of if's in the very
> parts of Cygwin code path where people complain about slowness.

The slowness of the Cygwin filesystem calls does not come from if()s in Cygwin's code.

A typical CPU today can perform around 1,000,000,000 if()s per second (around 1 nanosecond per if()),
while the cost of a WinNT system call is at least 20,000 ns, and many filesystem calls take much longer.


So adding an if() (or even 10 if()s) to save a system call is always worth it.
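
The arithmetic, spelled out as a small sketch. The two constants are the rough figures quoted above, not measurements, and the function names are made up for illustration.

```c
/* Rough figures from this message, in nanoseconds. */
#define IF_COST_NS       1L      /* one if() */
#define SYSCALL_COST_NS  20000L  /* cheapest WinNT system call */

/* Time saved by adding `extra_ifs` checks that let us skip
 * `syscalls_avoided` system calls. Positive means a net win. */
long net_saving_ns(long extra_ifs, long syscalls_avoided)
{
    return syscalls_avoided * SYSCALL_COST_NS - extra_ifs * IF_COST_NS;
}

/* Number of if()s that cost as much as one system call. */
long break_even_ifs(void)
{
    return SYSCALL_COST_NS / IF_COST_NS;
}
```

With these figures, 10 extra if()s that avoid a single system call save 19,990 ns, and it would take 20,000 if()s to cost as much as the one call they avoid.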

Derry

On 9/29/2010 5:10 PM, Christopher Faylor wrote:
On Wed, Sep 29, 2010 at 11:08:21AM +0200, Derry Shribman wrote:
Hi,

Doesn't the 'noacl' mount option provide that already?

Partially, there are also the ihash and the exec/notexec options. A lot has been already discussed on the cygwin-patches list, see, for instance

The problem with mount options is that they are 'static'. They require a cygwin 'reboot' and they do not allow 'inheritance' for subprocesses, and do not allow concurrent processes running in different modes.

Right. Another way of looking at this is that the mount options offer consistency. The notion of setting an environment variable in Window A to get one behavior and not setting it in Window B is, IMO, a support nightmare and a recipe for end-user confusion.

Dynamic options via the CYGWIN env allow setting things at runtime, in /etc/profile,
~/.bashrc, or for specific commands (and their subprocesses), such as:
CYGWIN=no_nlink rsync c:/... z:/...

This allows the user to be free to decide where to relax POSIX compliance in
order to achieve speed.

It also allows application developers (such as 'git'), to decide in their code
how they want Cygwin to behave.
Git, for example, does not need stat's st_nlink (number of hard links), and
actually not even st_ino (the inode number). Cygwin's git performance was
ultra-slow, and they improved it by not using Cygwin's stat(), rather
re-implementing their own 'quick-stat' which worked directly with Win32 API.

If Cygwin had supported dynamic options (as opposed to mount time
options), instead of the large 'ifdef __CYGWIN__' code, it would simply be
adding 'setenv("CYGWIN", "no_nlink no_inode")' to the code in git's main().

Or, another way of looking at this is, instead of implementing their own potentially buggy, imprecise stat() they could have not thought of Cygwin as a black box and either 1) offered improvements for the DLL or 2) engaged the Cygwin community with requirements. If there is ifdef'ed __CYGWIN__ code in git now that means that any performance improvements that we (i.e., Corinna) has made will never be noticed and that code will be maintained forever.

This allows applications to declare they will never look into the 'st_ino' and
'st_nlink'. The authors of an application, at the time of writing it, know
whether their code accesses these fields or not.

So, you're trading ifdef __CYGWIN__ in git with lots of if's in the very parts of Cygwin code path where people complain about slowness.

But, anyway, if we were going to implement something like this, it wouldn't
be with environment variables, it would be with the proposed api that Eric
Blake has mentioned in the past.

cgf



