HTCondorWiki: Execution Profiling

How to sample the stack

This method is useful when you are called in to diagnose a condor daemon that is using a lot of CPU (or blocking heavily on I/O) for an unknown reason. Although crude, it has led me to the source of trouble in numerous cases.

If pstack or gstack is available, just use that. If not, you can use gdb.
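
As a minimal sketch of that choice, the helper below prints the first tool from a list that is actually installed, and a wrapper uses it to take one stack sample. The function names first_available and sample_stack are hypothetical, not part of condor, and the gdb fallback assumes a gdb that accepts the -batch and -ex options.

```shell
#!/bin/sh

# Hypothetical helper: print the first command from the argument
# list that is found on PATH; return 1 if none is available.
first_available() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper: take one stack sample of the given pid,
# preferring pstack or gstack and falling back to gdb.
sample_stack() {
  pid=$1
  tool=$(first_available pstack gstack gdb) || {
    echo "no stack sampling tool found" >&2
    return 1
  }
  case "$tool" in
    gdb) gdb -batch -ex "set pagination off" -ex "where" -p "$pid" ;;
    *)   "$tool" "$pid" ;;
  esac
}
```

After sourcing the file, a call such as "sample_stack <pid>" prints one sample; run it a few times and compare the output.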

Here’s an example gdb script for the schedd (saved as gdb_stack_sampler for the command below):

add-symbol-file /path/to/condor_schedd.dbg
set pagination off
where
quit

Run it a few times with the following command:

gdb -p <pid> < gdb_stack_sampler

If the stack shows that the schedd is usually in a certain part of the code, this may lead you to the source of trouble.

If you need to get the condor admin to do this for you, then it is nice to make it even simpler. Below is a simple script to do the sampling. (Note that it doesn’t load the .dbg file, just to keep things simple.)

#!/bin/sh

PID=$1
# -z is portable; "==" inside [ ] is not accepted by every /bin/sh.
if [ -z "$PID" ]; then
  echo "USAGE: $0 pid"
  exit 2
fi

LOG=stack_sample.$PID
echo "Writing output to $LOG."
max=10
for i in `seq 1 $max`; do
  echo "Sample $i of $max"
  date >> "$LOG"

  # Attach gdb, dump the current stack, and detach again.
  gdb -p "$PID" >> "$LOG" <<EOF
set pagination off
where
quit
EOF

  sleep 5
done
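
Once a log of samples exists, it can help to count how often each function shows up across all of them. The sketch below is one rough way to do that. The function name count_frames is hypothetical, and it assumes gdb's usual frame-line layout ("#N  0x... in func (args) ..." or "#N  func (args) ...") and function names without embedded spaces, so templated C++ names may be cut short.

```shell
#!/bin/sh

# Hypothetical helper: count how often each function name appears
# in the frame lines of gdb "where" output, most frequent first.
count_frames() {
  awk '/^#[0-9]+ / {
         # "#1  0x... in func (...)" puts the name in field 4;
         # "#0  func (...)" puts it in field 2.
         name = ($3 == "in") ? $4 : $2
         count[name]++
       }
       END { for (n in count) print count[n], n }' "$@" | sort -rn
}
```

For example, "count_frames stack_sample.<pid> | head" lists the functions seen most often. A function that appears in nearly every sample is a good place to start looking.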

How to use callgrind to profile condor

Callgrind is a wonderful profiling tool. Its one big disadvantage is that it slows down the application considerably (about 20 times, in my experience).

Here’s an example of how to run callgrind on the collector.

condor_off -collector
# become the user who runs condor
sudo su root
valgrind --tool=callgrind /path/to/condor_collector -f -p 9618 >& /tmp/callgrind.log < /dev/null &
PID=$!

You will see some files in the current working directory named “callgrind.out.$PID” and “callgrind.info.$PID”. Change the ownership of these from root to the condor user or callgrind will have trouble writing to them when the program exits.

chown condor:condor callgrind.*.$PID

After the collector has run for a sufficient time, stop it.

kill -TERM $PID
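
Since the profile is written when the program exits, it may be worth waiting until the process is really gone before opening the output. The helper below is a minimal sketch of that. The name wait_for_exit and the default timeout of 60 seconds are assumptions, not anything provided by condor or valgrind.

```shell
#!/bin/sh

# Hypothetical helper: wait up to $2 seconds (default 60) for
# process $1 to exit. Returns 0 once it is gone, 1 on timeout.
wait_for_exit() {
  pid=$1
  remaining=${2:-60}
  # kill -0 sends no signal; it only checks that the pid exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}
```

For example, "wait_for_exit $PID 120 || echo still running" gives the slowed-down collector up to two minutes to shut down.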

You can then analyze the profile using kcachegrind.

kcachegrind /path/to/callgrind.out.$PID

The call tree is very useful. If you configure kcachegrind to know the path to the code, then you can also see the code for points in the call graph, annotated with profiling information. It is very nice!