How to generate a core file unobtrusively

This method is not nearly as powerful as some of the other options, but it has one big advantage: you can get a core file from a running daemon without killing it and restarting. It just interrupts the daemon for as long as it takes to write out the core file.

cd /path/to/space/for/core/file
gdb -p <pid of daemon>
gcore
quit

Now you have a core file. You can load it up in gdb and walk around in data structures to see if some of them are larger than expected. You can also run 'strings' on it to see if anything unexpected jumps out. For example, if a particular string is repeated many more times than it should be, that could be a clue to help you find a leak or source of bloat.

strings corefile | sort | uniq -c | sort -n
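
For walking the data structures, load the core alongside the daemon's binary in gdb. A minimal sketch (the binary path, core file name, and the printed symbol are placeholders for whatever you dumped):

gdb /path/to/condor_collector core.<pid>
# inside gdb:
bt
print sizeof(SomeSuspectTable)   # hypothetical symbol; substitute a structure you suspect
quit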

How to use valgrind to check a daemon for memory leaks

Valgrind is a powerful tool for catching memory leaks and sources of bloat. The main disadvantage of the tool is that it slows things down quite a lot (~20 times in my experience). You also need to stop the running daemon and restart it under valgrind.

Here is an example of how to use it. In this example, it is the collector that is being checked.

condor_off -collector
# become the user that runs HTCondor
sudo su root
env _CONDOR_USE_CLONE_TO_CREATE_PROCESSES=False \
  valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --leak-resolution=high \
  /path/to/condor_collector -f -p 9621 < /dev/null >& /tmp/valgrind.log &

Note that valgrind (as of 2.4.0) crashes due to the way we use clone(), so the above example disables the clone() optimization. As of 7.1.2, HTCondor should auto-detect that it is running under valgrind and automatically disable the clone optimization.

To check for leaks, run for a while and then do a graceful shutdown (kill -TERM). To check for bloat, kill with SIGINT instead. This will prevent it from doing a normal exit, freeing up memory, etc. That way, we can see memory that it has referenced (but which may be unexpectedly bloated). The valgrind output will contain information on the blocks of memory that were allocated at exit.
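
For example, with the collector started under valgrind as above (the pid is a placeholder), use one or the other depending on what you are hunting:

# leak check: graceful shutdown, so the daemon frees what it still tracks
kill -TERM <pid of condor_collector>
# bloat check: SIGINT skips the normal cleanup, so still-reachable memory stays in the report
kill -INT <pid of condor_collector>
# then look at the summary in the valgrind log
grep -A 5 'LEAK SUMMARY' /tmp/valgrind.log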

Valgrind gives more detailed information (including line numbers in call stacks) when you run it with an unstripped binary.

Running the test suite under valgrind

batch_test.pl is the means by which the test suite is run. It can be told to start up its own personal HTCondor and use that for testing. The tests run by batch_test.pl may also start their own personal condors.

In order to test all of these personal condors under valgrind, valgrind can be told to follow all forked/execed children and to write its output to a particular place. So we'll do just that and start up the master. You can be either root or a normal user for this, depending upon what you are testing.

After the test suite codes are built, cd into condor_tests/.

We'll invoke batch_test.pl under valgrind. We tell batch_test.pl to start up its own personal condors and clean up everything when it is done. We don't ask it to run tests in parallel, since the tests likely have race conditions when run that way.

We make sure to specify the log file pattern in a valid and empty directory, because valgrind writes a new log file, named with the pid of the process, for every program executed under batch_test.pl. You can put them into /tmp like I have here, but I recommend a dedicated directory, since there will be LOTS of log files.

valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --leak-resolution=high \
--log-file=/tmp/tests.%p --trace-children=yes --num-callers=16 ./batch_test.pl -b -c

Then, in the same directory as the pile of tests.* files, run this script, which will collate the tests.* files associated with HTCondor daemons and tools into directories.

#! /usr/bin/env perl

use warnings;
use strict;
use File::Basename;
use File::Path;

sub collate_valgrind_logs
{
    my $prefix = shift @_;
    my @files;
    my $file;
    my @lines;
    my $line;
    my @pieces;

    @files = `/bin/ls -1 $prefix*`;
    map {chomp $_} @files;

    # foreach file which has a pattern of ^$prefix[0-9]+$ (some corefiles might
    # exist which could have that prefix which we don't want to process) we
    # figure out the command associated with it
    foreach $file (@files) {
        next if ($file !~ /^\Q$prefix\E[0-9]+$/);

        # Get the line out of the file which represents the command that
        # the file is valgrinding.
        @lines = `egrep -e '== Command:' $file`;
        next if ((scalar @lines) == 0);
        map {chomp $_} @lines;

        # figure out the actual command
        @pieces = split / /, $lines[0];
        map {s/^\s+//; chomp $_} @pieces;

        # make sure I have something to work with.
        next if (!defined($pieces[2]));

        # get rid of an easy set of commands that aren't HTCondor processes.
        next if ($pieces[2] !~ m:condor_\w+$:);

        # get rid of any HTCondor process that appears to be a test suite
        # executable
        next if ($pieces[2] =~ m:condor_exec:);

        # get the basename of the file
        $pieces[2] = basename($pieces[2]);

        # Now, mkdir the basename and copy the file into it.
        mkpath($pieces[2]);
        if (-d $pieces[2]) {
            system("cp $file $pieces[2]");
        }

        print "$file, $pieces[2]\n";

    }
}

sub main
{
    collate_valgrind_logs("tests.");
}

exit main();
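
For example, if the script above is saved as collate_logs.pl (the name is arbitrary):

cd /tmp                # or wherever the tests.* files were written
perl collate_logs.pl   # creates one directory per daemon/tool and copies its logs there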

How to use the google heap profiler

The google heap profiler is part of the google-perftools package, which may be easily downloaded and compiled. This example used version 0.98.

The heap profiler is useful for watching heap usage over time. Although the package documentation recommends linking the application with the tcmalloc library, I have had success with the LD_PRELOAD mechanism.

This example shows how I used the google heap profiler on the startd:

# from the central manager
condor_off -startd
# back on the startd node
# become the user who runs HTCondor
ksu
env LD_PRELOAD=/p/condor/workspaces/danb/google-perftools-0.98-rh5/lib/libtcmalloc.so HEAPPROFILE=/tmp/startd.hprof HEAP_PROFILE_ALLOCATION_INTERVAL=5000000  /unsup/condor/sbin/condor_startd -f >& /tmp/startd.hprof.log < /dev/null &

Every 5M of heap allocation will result in a new file /tmp/startd.hprof.000x.heap. These files can be analyzed using the pprof tool from the google-perftools. In addition to looking at individual snapshots, you can look at the difference between two files. One nice way to look at the output is with kcachegrind, the callgrind GUI. Example:

pprof --callgrind /path/to/condor_startd /tmp/startd.hprof.0001.heap > /tmp/startd.0001.callgrind
kcachegrind /tmp/startd.0001.callgrind
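
To look at the difference between two snapshots, pprof can subtract a baseline profile with --base. A sketch, assuming your pprof supports --base and using a hypothetical later snapshot file:

pprof --base=/tmp/startd.hprof.0001.heap --callgrind /path/to/condor_startd /tmp/startd.hprof.0005.heap > /tmp/startd.diff.callgrind
kcachegrind /tmp/startd.diff.callgrind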

Then you can look at the call tree and see which paths allocated lots of heap, hopefully leading you to discover the source of trouble. If you configure kcachegrind to know where the source code is, you can even pull up a panel showing where in the code the allocations happened.

One thing to beware of is that any code compiled with -fomit-frame-pointer may be untracked by the heap profiler. For example, openssl is usually compiled this way, and I have found that the pprof --text report attributes some allocations made in libcrypto code to default_malloc_ex(), while the pprof --callgrind report does not show that memory at all. By turning off that compiler option when compiling openssl, I was able to get things to work as expected. The same issue may affect other memory profiling tools too.
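
For reference, a sketch of rebuilding openssl with frame pointers, assuming a source build (openssl's ./config passes unrecognized compiler flags through to the compiler):

./config -fno-omit-frame-pointer
make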

How to use igprof for memory profiling

Igprof (the ignominous profiler, http://igprof.sf.net) is another convenient tool for tracking CPU performance and memory. RHEL5 RPMs are available here:

http://koji.hep.caltech.edu/koji/buildinfo?buildID=557.

HTCondor isn't really performance critical, but let's use it for tracking leaks. Igprof normally tracks a process from beginning to end and dumps a profile at process exit. Instead, we'll use it to monitor the HTCondor tree and trigger heap dumps periodically. A useful invocation of condor_master under igprof follows:

igprof -D /var/log/condor/dump_profile -mp condor_master

Whenever a process sees the file /var/log/condor/dump_profile, it will dump a heap profile into $CWD (/var/log/condor if you use the RPMs). An example output filename follows:

igprof.condor_schedd.21553.1324688401.945964.gz

If you only want to monitor the condor_schedd, you can have igprof filter the processes it tracks with "-t":

igprof -D /var/log/condor/dump_profile -t condor_schedd -mp condor_master

Be careful about the dump_profile file: the HTCondor process will attempt to remove it after dumping the profile; if it is owned by root and the process runs as user condor, the removal will fail and the heap dump will occur again 1/3 second later.
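
One way to sidestep the ownership problem is to create the trigger file as the user the daemons run as, for example (the user name depends on your installation):

sudo -u condor touch /var/log/condor/dump_profile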

If I'm tracking a slow leak, I set up a cron job to do a periodic dump:

*/10 * * * * condor touch /var/log/condor/dump_profile

Igprof produces compressed ASCII dumps about memory allocations which are pretty useless to humans. However, igprof-analyse can analyze the dumps and produce a sqlite3 database:

igprof-analyse -r MEM_LIVE --sqlite -d -v -g igprof.condor_schedd.21553.1324687875.867257.gz | sqlite3 ~/igreport_condor_schedd.sql3

This will show the origination of all currently live memory allocations on the heap. A more useful snapshot would be to compare the current heap to a previous snapshot to see the difference:

igprof-analyse -r MEM_LIVE --sqlite -d -v -g --diff-mode -b igprof.condor_schedd.21553.1324687875.867257.gz igprof.condor_schedd.21553.1324689001.637857.gz | sqlite3 ~/igreport_condor_schedd.sql3

Finally, sqlite3 databases aren't very useful to humans either, so igprof can run an embedded webserver to display its results:

igprof-navigator ~/igreport_condor_schedd.sql3

How to use gdb to trap memory allocations

I mention this only because it is a clever idea from Greg Thain that might be helpful in some situations. In practice, I have found it to slow down the application so much that timeouts happen and normal operation ceases. Too bad!

The idea is to attach to a running daemon and log all of the memory allocation calls that are being made. After sufficient time, you can detach and let the program continue running.

Here's a gdb script that we used to watch the 64-bit collector:

add-symbol-file /tmp/condor_collector.dbg
set pagination off
break malloc
commands 1
  where
  finish
end
define hookpost-finish
  info reg rax
  continue
end
break free
commands 2
  info reg rdi
  continue
end
continue

With the above commands in a file named gdb_malloc_script, you can then attach to the daemon with the following command:

gdb -x gdb_malloc_script -p <pid> >& /path/to/collector_malloc.log

Stopping it is a bit annoying. You have to kill gdb and then kill -CONT the program if it is in T state.
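
For example (the pids are placeholders):

kill <pid of gdb>
ps -o stat= -p <pid of daemon>   # "T" means the daemon is still stopped
kill -CONT <pid of daemon>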

With a suitable script for analyzing the malloc log, you could then see things like places in the code that allocated a lot of new memory during the time you were watching.
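
As a rough sketch, assuming gdb's output was captured in the log as above, this counts the most frequent immediate callers of malloc recorded by the 'where' command:

grep '^#1 ' /path/to/collector_malloc.log | sed 's/ (.*//' | sort | uniq -c | sort -n | tail -20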

How to look at static (compile time) allocations for a daemon

This simple script will print out the sizes of the text, data, and bss segments of the shadow and of each library it loads.

#!/bin/sh
# print size of Text, Data, & BSS for the condor_shadow and each .so it needs.
shadow=`condor_config_val shadow`
size "$shadow"
ldd "$shadow" | awk '{print $3}' | while read L; do size $L 2>/dev/null | sed '1 d'; done

This simple script will print out the names of all of the data symbols in the condor_utils library, sorted by size. (Note that the 7.6.x shadow and the 7.9.5 static shadow don't link with condor_utils, so this script doesn't work with them.)

#!/bin/sh
shadow=`condor_config_val shadow`
utillib=`ldd "$shadow" | grep condor_util | awk '{print $3}'`
echo $shadow
echo $utillib
nm -BStd --size-sort "$utillib" | egrep ' [BbDd] '
