This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Performance optimization in av::fixup - use buffered IO, not mapped file


Emacs "make bootstrap" runs Emacs as a compiler, generating .elc files from .el
files. The build system runs Emacs once for each .el file we compile, of which
there are thousands. Now, Emacs takes about two seconds to start on my system,
so compiling thousands of files takes a while; the actual .el to .elc
compilation is nearly instantaneous.

According to xperf, Emacs spends most of its startup time re-reading emacs.exe
code pages from disk.

~/edev/trunk.nox/src
$ time ./emacs --batch -Q --eval '(kill-emacs)'

real    0m2.236s
user    0m0.015s
sys     0m0.015s

~/edev/trunk.nox/src
$ time ./emacs --batch -Q --eval '(kill-emacs)'

real    0m2.343s
user    0m0.062s
sys     0m0.016s

We shouldn't need to read this file more than once. After the first run, the
system should be able to read emacs.exe from the standby list, not the disk.

Now, if we run emacs.exe from cmd instead of bash, that's exactly what happens;
only the first run is slow:

C:\Users\dancol\edev\trunk.nox\src
> type bench-emacs.cmd
@echo off
echo %TIME%
.\emacs --batch -Q --eval "(kill-emacs)"
echo %TIME%

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:39:46.31
16:39:48.73

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:39:50.91
16:39:50.96

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:39:51.32
16:39:51.37

I came up with a simple test case that reproduces, from cmd, the slowdown I see
when I run Emacs from bash; its source is listed below. Here, I've compiled
a.exe with -DSLOW:

C:\Users\dancol\edev\trunk.nox\src
> type .\bench-emacs2.cmd
@echo off
%TMP%\a.exe emacs.exe
echo %TIME%
.\emacs --batch -Q --eval "(kill-emacs)"
echo %TIME%

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs2
Success
16:41:55.12
16:41:57.24

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs2
Success
16:41:57.62
16:41:59.69

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs2
Success
16:42:00.05
16:42:02.20

Here's the program that generates a.exe:

#define UNICODE 1
#define _UNICODE 1
#include <windows.h>
#include <stdio.h>

/* Build with -DFAST (buffered ReadFile) or -DSLOW (create a section
   object, as av::fixup does).  Usage: a.exe FILE */
int
main(int argc, char* argv[])
{
    HANDLE file;
    LARGE_INTEGER size;
#if defined FAST
    BYTE Buffer[64*1024];
    DWORD BytesRead;
#elif defined SLOW
    HANDLE section;
#endif

    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    file = CreateFileA(argv[1],
                      SYNCHRONIZE | GENERIC_READ,
                      FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                      NULL,
                      OPEN_EXISTING,
                      FILE_ATTRIBUTE_NORMAL,
                      NULL);

    if (file == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile: 0x%lx\n", GetLastError());
        return 1;
    }

    if (!GetFileSizeEx(file, &size)) {
        fprintf(stderr, "GetFileSizeEx: 0x%lx\n", GetLastError());
        CloseHandle(file);
        return 1;
    }

    /* Look at no more than the first 64k of the file. */
    if (size.QuadPart > 64*1024) {
        size.QuadPart = 64*1024;
    }

#if defined FAST
    /* Ordinary buffered read: no section object is created. */
    if (!ReadFile(file, Buffer, size.LowPart, &BytesRead, NULL)) {
        fprintf(stderr, "ReadFile: 0x%lx\n", GetLastError());
        CloseHandle(file);
        return 1;
    }

    printf("Read %lu bytes\n", BytesRead);
#elif defined SLOW
    /* Create (but never map) a section object backed by the file. */
    section = CreateFileMapping(file, NULL, PAGE_READONLY,
                                size.HighPart, size.LowPart, NULL);
    if (!section) {
        fprintf(stderr, "CreateFileMapping: 0x%lx\n", GetLastError());
        CloseHandle(file);
        return 1;
    }
    CloseHandle(section);
#else
#error Define FAST or SLOW
#endif

    CloseHandle(file);
    printf("Success\n");
    return 0;
}

As you can see, a.exe merely creates a section object for emacs.exe; it doesn't
even map it into memory. Still, after running a.exe on emacs.exe, the system
reloads all of emacs.exe's code pages from disk the next time we run emacs.exe.

If we build a.exe with -DFAST instead of -DSLOW, then a.exe grabs the first 64k
of emacs.exe using ordinary, buffered ReadFile instead of trying to create a
section object. When compiled this way, a.exe seems to have no effect on Emacs
startup time:

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:48:38.25
16:48:40.54

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:48:42.03
16:48:42.08

C:\Users\dancol\edev\trunk.nox\src
> .\bench-emacs
16:48:42.38
16:48:42.43

a.exe with -DSLOW mimics what av::fixup does when trying to determine whether an
executable is a Cygwin program. If av::fixup used ordinary ReadFile instead of
memory-mapped IO, programs would start drastically faster, at least for my
workload.
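For illustration, here is one way such a check could be done with plain buffered reads. This is a hypothetical sketch, not Cygwin's av::fixup code: it assumes that "is this a Cygwin program?" amounts to "does the PE import table name cygwin1.dll?", and it uses portable stdio (fopen/fread) in place of CreateFileMapping, so no section object is created. The names HEADER_BYTES, read_header, rva_to_off, ci_equal and imports_dll are invented for this example, and the 64 KiB limit simply mirrors a.exe.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit; mirrors the 64k that a.exe reads. */
#define HEADER_BYTES (64 * 1024)

/* Read up to cap bytes from the start of path with buffered stdio.
   No file mapping (section object) is created.  Returns bytes read. */
static size_t
read_header (const char *path, unsigned char *buf, size_t cap)
{
  FILE *f;
  size_t n;
  assert (buf != NULL && cap > 0);
  if (!(f = fopen (path, "rb")))
    return 0;
  n = fread (buf, 1, cap, f);
  fclose (f);
  return n;
}

/* PE fields are little-endian regardless of the host. */
static uint32_t
rd16 (const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8);
}

static uint32_t
rd32 (const unsigned char *p)
{
  return rd16 (p) | (rd16 (p + 2) << 16);
}

/* Map an RVA to a file offset via the 40-byte section headers at sec;
   returns 0 if no section's raw data contains it. */
static size_t
rva_to_off (const unsigned char *sec, uint32_t nsec, uint32_t rva)
{
  for (uint32_t i = 0; i < nsec; i++, sec += 40)
    {
      uint32_t va = rd32 (sec + 12), raw = rd32 (sec + 16);
      if (rva >= va && rva - va < raw)
        return (size_t) rd32 (sec + 20) + (rva - va);
    }
  return 0;
}

static int                      /* case-insensitive compare of n bytes */
ci_equal (const unsigned char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (tolower (a[i]) != tolower ((unsigned char) b[i]))
      return 0;
  return 1;
}

/* 1 if the image in buf[0..len) imports dll, 0 if not, -1 if the headers
   are malformed or the import table lies beyond len. */
static int
imports_dll (const unsigned char *buf, size_t len, const char *dll)
{
  size_t pe, opt, dirs, sec, off, name, dlen = strlen (dll);
  uint32_t nsec;
#define FITS(o, n) ((o) <= len && (size_t) (n) <= len - (o))
  if (!FITS (0, 0x40) || buf[0] != 'M' || buf[1] != 'Z')
    return -1;
  pe = rd32 (buf + 0x3c);                     /* e_lfanew */
  if (!FITS (pe, 24) || memcmp (buf + pe, "PE\0\0", 4) != 0)
    return -1;
  nsec = rd16 (buf + pe + 6);                 /* NumberOfSections */
  opt = pe + 24;                              /* start of optional header */
  if (!FITS (opt, 2))
    return -1;
  /* Data directories: +96 in PE32, +112 in PE32+; entry 1 is imports. */
  dirs = opt + (rd16 (buf + opt) == 0x20b ? 112 : 96);
  sec = opt + rd16 (buf + pe + 20);           /* + SizeOfOptionalHeader */
  if (!FITS (dirs, 16) || !FITS (sec, (size_t) nsec * 40))
    return -1;
  if (rd32 (buf + dirs + 8) == 0)
    return 0;                                 /* no import table */
  if (!(off = rva_to_off (buf + sec, nsec, rd32 (buf + dirs + 8))))
    return -1;
  /* Walk 20-byte import descriptors until one has a zero Name RVA. */
  for (; FITS (off, 20) && rd32 (buf + off + 12) != 0; off += 20)
    {
      name = rva_to_off (buf + sec, nsec, rd32 (buf + off + 12));
      if (name != 0 && FITS (name, dlen + 1)
          && ci_equal (buf + name, dll, dlen) && buf[name + dlen] == '\0')
        return 1;
    }
  return FITS (off, 20) ? 0 : -1;
#undef FITS
}
```

A caller would fill a HEADER_BYTES buffer with read_header and then call imports_dll(buf, n, "cygwin1.dll"). A result of -1 means the headers were malformed or the import table lies past the bytes read; a real implementation would presumably need a fallback for that case rather than guessing.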

I'm running Windows Server 2008 R2. I'm not running any AV products, disk
scanners, or other exotic software. CYGWIN=detect_bloda reports nothing.

$ uname -a
CYGWIN_NT-6.1-WOW64 xyzzy 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin




