From adc25dc19ade01fe42d4a4446a24b45ddd01f106 Mon Sep 17 00:00:00 2001 From: Mark Geisert Date: Thu, 10 Mar 2016 23:56:56 -0800 Subject: [PATCH] Documentation covering profiling Cygwin programs. --- winsup/doc/gprof.xml | 250 +++++++++++++++++++++++++++++++++++++++++++++ winsup/doc/programming.xml | 1 + winsup/doc/setup-env.xml | 9 +- 3 files changed, 259 insertions(+), 1 deletion(-) create mode 100644 winsup/doc/gprof.xml diff --git a/winsup/doc/gprof.xml b/winsup/doc/gprof.xml new file mode 100644 index 0000000..b2043cb --- /dev/null +++ b/winsup/doc/gprof.xml @@ -0,0 +1,250 @@ + + + +Profiling Cygwin Programs + +Introduction +Profiling is a way to analyze your program to find out where it is +spending its time. You might need to do this if it seems your program is +taking more time to do its job than you think it should. It is always +preferable to profile your program than to just guess where the time is +being spent; even expert programmers are known to guess badly at this. + + +In Cygwin, you enable profiling with a compiler flag and you display +the resulting profiling data with gprof. Read on to find out how. + + +To enable profiling of your program, first compile it with an +additional gcc flag: -pg. That flag should be used +when compiling every source file of the program. If your program has a +Makefile, you would add the flag to all gcc compilation commands or to the +CFLAGS= setting. A manual compilation that enables profiling looks like this: + + + +bash$ gcc -pg -g -o myapp myapp.c + + +The -pg flag causes gcc to do two additional +things as it compiles your program. First, a small bit of code is added to +the beginning of each function that records its address and the address it +was called from at run time. gprof uses this data to generate a call graph. +Second, gcc arranges to have a special "front end" added to the beginning +of your program. The front end starts a recurring timer and every time the +timer fires, 100 times per second, the currently executing address is saved. +gprof uses this data to generate a "flat profile" showing where your +program is spending its time. + + +After compiling your program (and linking it, if you do that as a +separate step), you are ready to profile it. Just run it as you normally +would. If there are specific code paths you want to profile, take the +actions that would exercise those code paths. When your program exits, +you will have an additional file in the current directory: gmon.out. +That file contains the profiling data gprof processes and displays. + + +gprof has many flags to control its operation. The +gprof man page details everything gprof can do. We +will only use a few of gprof's flags here. You launch gprof as follows: + +bash$ gprof [flags] appname [datafile]... + +If you don't specify any flags, gprof operates as if you gave it flags +-p -q which means: generate a flat profile with +descriptive text and generate a call graph with more descriptive text. In +the examples below we will give specific flags to gprof to demonstrate +specific displays. We'll also use flag -b which +means: be brief, i.e. don't display the descriptive text. You can also +specify a trailing list of one or more profiling data files. If you don't, +gprof assumes gmon.out is the only file to process and display. + + + +Examples +Flat profile + +bash$ gprof -b -p myapp +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 25.11 13.34 13.34 1 13.34 13.34 func0 + 25.00 26.62 13.28 1 13.28 13.28 func1 + 25.00 39.90 13.28 1 13.28 13.28 func3 + 24.89 53.12 13.22 1 13.22 13.22 func2 + + +Call graph + +bash$ gprof -b -q myapp + Call graph + + +granularity: each sample hit covers 4 byte(s) for 0.02% of 53.12 seconds + +index % time self children called name + <spontaneous> +[1] 100.0 0.00 53.12 main [1] + 13.34 0.00 1/1 func0 [2] + 13.28 0.00 1/1 func1 [3] + 13.28 0.00 1/1 func3 [4] + 13.22 0.00 1/1 func2 [5] +----------------------------------------------- + 13.34 0.00 1/1 main [1] +[2] 25.1 13.34 0.00 1 func0 [2] +----------------------------------------------- + 13.28 0.00 1/1 main [1] +[3] 25.0 13.28 0.00 1 func1 [3] +----------------------------------------------- + 13.28 0.00 1/1 main [1] +[4] 25.0 13.28 0.00 1 func3 [4] +----------------------------------------------- + 13.22 0.00 1/1 main [1] +[5] 24.9 13.22 0.00 1 func2 [5] +----------------------------------------------- + + +Index by function name + + [2] func0 [5] func2 + [3] func1 [4] func3 + + +Source line profile + +bash$ gprof -b -l myapp +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 25.11 13.34 13.34 1 13.34 13.34 func0 (myapp.c:9 @ 1004010e0) + 25.00 26.62 13.28 1 13.28 13.28 func1 (myapp.c:10 @ 10040111a) + 25.00 39.90 13.28 1 13.28 13.28 func3 (myapp.c:12 @ 10040118e) + 24.89 53.12 13.22 1 13.22 13.22 func2 (myapp.c:11 @ 100401154) + + + Call graph + + +granularity: each sample hit covers 4 byte(s) for 0.02% of 53.12 seconds + +index % time self children called name + 13.34 0.00 1/1 main (myapp.c:26 @ 10040123c) [1] +[2] 25.1 13.34 0.00 1 func0 (myapp.c:9 @ 1004010e0) [2] +----------------------------------------------- + 13.28 0.00 1/1 main (myapp.c:26 @ 10040123c) [1] +[3] 25.0 13.28 0.00 1 func1 (myapp.c:10 @ 10040111a) [3] +----------------------------------------------- + 13.28 0.00 1/1 main (myapp.c:28 @ 100401246) [5] +[4] 25.0 13.28 0.00 1 func3 (myapp.c:12 @ 10040118e) [4] +----------------------------------------------- + 13.22 0.00 1/1 main (myapp.c:27 @ 100401241) [7] +[6] 24.9 13.22 0.00 1 func2 (myapp.c:11 @ 100401154) [6] +----------------------------------------------- + + +Index by function name + + [2] func0 (myapp.c:9 @ 1004010e0) [6] func2 (myapp.c:11 @ 100401154) + [3] func1 (myapp.c:10 @ 10040111a) [4] func3 (myapp.c:12 @ 10040118e) + + + +Special situations +Profiling multi-threaded programs +Multi-threaded programs are profiled just like single-threaded programs. +There is no mechanism to turn profiling on or off for specific threads. +gprof combines the data for all threads when generating its displays. + + + +Profiling programs that fork +Programs that fork, i.e., use the fork() system call with or without +using exec() afterwards, require special care. Since there is only one +gmon.out file, profiling data from the parent process might get overwritten +by the child process, or vice-versa, after a fork(). You can avoid this by +setting the environment variable GMON_OUT_PREFIX before running your +program. If the variable is non-empty, its contents will be used as a +prefix to name the profiling data files. Here's an example: + + + + +bash$ export GMON_OUT_PREFIX=myapp.out +bash$ ./myapp -fork +bash$ ls myapp.out* +myapp.out.2728 myapp.out.3224 + +bash$ gprof -bp myapp myapp.out.2728 +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 50.25 30.28 30.28 2 15.14 15.14 func3 + 24.99 45.34 15.06 1 15.06 15.06 func1 + 24.76 60.26 14.92 1 14.92 14.92 func2 + +bash$ gprof -bp myapp myapp.out.3224 +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 49.25 29.36 29.36 2 14.68 14.68 func3 + 25.43 44.52 15.16 1 15.16 15.16 func1 + 25.33 59.62 15.10 1 15.10 15.10 func2 + +bash$ gprof -bp myapp myapp.out* +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls s/call s/call name + 49.75 59.64 59.64 4 14.91 14.91 func3 + 25.21 89.86 30.22 2 15.11 15.11 func1 + 25.04 119.88 30.02 2 15.01 15.01 func2 + +As the last gprof command above shows, gprof can combine the data +from a selection of profiling data files to generate its displays. Just +list the names of those files at the end of the gprof command; you can use +a wildcard here. NOTE: If you update your program, remember to remove stale +profiling data files before profiling your program again. If you aren't +careful about this, gprof could combine data from your most recent version +with stale data from prior versions, possibly giving misleading displays. + + + +Getting better profiling resolution +To get better resolution (i.e., more data points) when profiling +your program, try running it multiple times with the environment variable +GMON_OUT_PREFIX set, as described in the previous situation. There will be +multiple profiling data files generated and you can have gprof combine +the data from all of them into one display. + + + +Profiling programs with their libraries +At the time of this writing Cygwin's profiling support only allows +for one range of addresses per program. It is hard-wired to be the range +covering the .text segment of your program, which is where your code resides. +If you build your program with static libraries (e.g., libfoo.a), the code +from those libraries is linked into your program's .text segment so will be +included when profiling. But dynamic libraries (e.g., libfoo.dll) reside in +other address ranges and code within them won't be included. + + + +Profiling Cygwin itself +Due to the issue mentioned in the previous situation and other issues, +at the time of this writing there is no support for profiling Cygwin itself. + + + + + diff --git a/winsup/doc/programming.xml b/winsup/doc/programming.xml index 4b65c40..05ca12e 100644 --- a/winsup/doc/programming.xml +++ b/winsup/doc/programming.xml @@ -9,4 +9,5 @@ + diff --git a/winsup/doc/setup-env.xml b/winsup/doc/setup-env.xml index 4b519a2..ca5acc9 100644 --- a/winsup/doc/setup-env.xml +++ b/winsup/doc/setup-env.xml @@ -71,9 +71,16 @@ first starts. Most Cygwin applications do not make use of the dlopen () call and do not need this variable. +The GMON_OUT_PREFIX environment variable is helpful +in some situations when profiling Cygwin programs. For more information, +see . Only software developers with a +need to profile their programs have a use for this variable. + + In addition to PATH, HOME, -and LD_LIBRARY_PATH, there are three other environment +LD_LIBRARY_PATH, and GMON_OUT_PREFIX, +there are three other environment variables which, if they exist in the Windows environment, are converted to UNIX format: TMPDIR, TMP, and TEMP. The first is not set by default in the -- 2.7.0