One thing to beware of is that code compiled with -fomit-frame-pointer may go untracked by the heap profiler. For example, openssl is usually compiled this way. I have found that the pprof --text report attributes some allocations made in libcrypto code to default_malloc_ex(), while the pprof --callgrind report does not show this memory allocation at all. By turning off the compiler option when compiling openssl, I was able to get things to work as expected. This same issue may affect other memory profiling tools too.

{section: How to use igprof for memory profiling}

Igprof (the ignominious profiler, http://igprof.sf.net) is another convenient tool for tracking CPU performance and memory. RHEL5 RPMs are available here:

http://koji.hep.caltech.edu/koji/buildinfo?buildID=557

Condor isn't really performance critical, so here we use igprof only for tracking leaks. Igprof normally tracks a process from beginning to end and dumps a profile at process exit. Instead, we'll use it to monitor the Condor process tree and dump a heap profile on demand. A useful invocation of condor_master under igprof follows:

{verbatim}
igprof -D /var/log/condor/dump_profile -mp condor_master
{endverbatim}

Whenever a process sees the file /var/log/condor/dump_profile, it will dump a heap profile into $CWD (/var/log/condor if you use the RPMs). An example output filename follows:

{verbatim}
igprof.condor_schedd.21553.1324688401.945964.gz
{endverbatim}

If you only want to monitor the condor_schedd, you can have igprof filter the processes it tracks with "-t":

{verbatim}
igprof -D /var/log/condor/dump_profile -t condor_schedd -mp condor_master
{endverbatim}

Be careful about the dump_profile file: the condor process will attempt to remove it after dumping the profile. If the file is owned by root and the process runs as user condor, the removal will fail and the heap dump will occur again 1/3 second later.
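
One way to guard against the ownership pitfall is to create the trigger file through a small wrapper. The following is only a minimal sketch: the function name request_heap_dump is hypothetical (it is not part of igprof or Condor), and it assumes a shell whose test builtin supports -O (bash and dash do).

```shell
#!/bin/sh
# Hypothetical helper, not part of igprof or Condor.
# Creates the igprof trigger file only when the caller is the user the
# profiled daemons run as, so the daemon can remove the file afterwards.
request_heap_dump() {
    trigger=$1    # e.g. /var/log/condor/dump_profile
    owner=$2      # e.g. condor

    # Refuse to create the file as the wrong user (for example root).
    if [ "$(id -un)" != "$owner" ]; then
        echo "refusing: running as $(id -un), expected $owner" >&2
        return 1
    fi

    touch "$trigger" || return 1

    # -O is true when the file is owned by the effective user ID.
    # This catches a stale trigger file left behind by another user.
    if [ ! -O "$trigger" ]; then
        echo "warning: $trigger is not owned by $owner" >&2
        return 1
    fi
}
```

Run as the condor user, it would be called as request_heap_dump /var/log/condor/dump_profile condor. Run as root, it refuses and leaves no file behind.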

If I'm tracking a slow leak, I set up a cron job (system crontab format, which includes the user field) to do a periodic dump:

{verbatim}
*/10 * * * * condor touch /var/log/condor/dump_profile
{endverbatim}

Igprof produces compressed ASCII dumps of memory allocations, which are pretty useless to humans. However, igprof-analyse can analyze the dumps and produce a sqlite3 database:

{verbatim}
igprof-analyse -r MEM_LIVE --sqlite -d -v -g igprof.condor_schedd.21553.1324687875.867257.gz | sqlite3 ~/igreport_condor_schedd.sql3
{endverbatim}

This will show the origin of all currently live memory allocations on the heap. It is often more useful to compare the current heap against an earlier snapshot to see the difference:

{verbatim}
igprof-analyse -r MEM_LIVE --sqlite -d -v -g --diff-mode -b igprof.condor_schedd.21553.1324687875.867257.gz igprof.condor_schedd.21553.1324689001.637857.gz | sqlite3 ~/igreport_condor_schedd.sql3
{endverbatim}

Finally, sqlite3 databases aren't very useful to humans either, so igprof-navigator can run an embedded web server to display the results:

{verbatim}
igprof-navigator ~/igreport_condor_schedd.sql3
{endverbatim}

{section: How to use gdb to trap memory allocations}

I mention this only because it is a clever idea from Greg Thain that might be helpful in some situations. In practice, I have found that it slows down the application so much that timeouts happen and normal operation ceases. Too bad!