Page History
- 2012-Nov-13 15:51 adesmet
- 2009-Nov-25 08:09 danb
- 2009-Jul-10 11:14 danb
- 2009-Jan-29 10:20 danb
How to sample the stack
This method is useful when you are called in to diagnose a condor daemon that is using lots of CPU (or blocking a lot on I/O) for an unknown reason. Although crude, this method has led me to the source of trouble in numerous cases.
If pstack or gstack is available, just use that. If not, you can use gdb.
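To make the choice of tool concrete, here is a minimal sketch that picks whichever tool is on the PATH and takes one sample. The function name pick_stack_tool and the idea of saving this as sample_stack.sh are my own illustration, not part of condor, and the gdb fallback assumes a gdb that supports the -batch and -ex options.

```shell
#!/bin/sh
# Print the first stack-dumping tool found on PATH, preferring pstack and gstack.
# pick_stack_tool is a hypothetical helper name used only for this sketch.
pick_stack_tool() {
  for tool in pstack gstack gdb; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Take one sample only when a pid is given, e.g.: sh sample_stack.sh 1234
if [ -n "${1:-}" ]; then
  tool=$(pick_stack_tool) || { echo "no pstack, gstack, or gdb found" >&2; exit 1; }
  if [ "$tool" = gdb ]; then
    # Fallback: attach, print the stack of the current thread, and detach.
    gdb -batch -p "$1" -ex where
  else
    "$tool" "$1"
  fi
fi
```

Run it a few times against the same pid and compare the stacks, just as with the gdb script.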
Here’s an example gdb script for the schedd:
add-symbol-file /path/to/condor_schedd.dbg
set pagination off
where
quit
Run it a few times with the following command:
gdb -p <pid> < gdb_stack_sampler
If the stack shows that the schedd is usually in a certain part of the code, this may lead you to the source of trouble.
If you need to get the condor admin to do this for you, then it is nice to make it even simpler. Below is a simple script to do the sampling. (Note that it doesn’t load the .dbg file, just to keep things simple.)
#!/bin/sh
# Take 10 stack samples of the given process, 5 seconds apart.
PID=$1
# Use "=" here: "==" is not understood by every /bin/sh.
if [ "$PID" = "" ]; then
  echo "USAGE: $0 pid"
  exit 2
fi
LOG=stack_sample.$PID
echo "Writing output to $LOG."
max=10
for i in `seq 1 $max`; do
  echo "Sample $i of $max"
  date >> "$LOG"
  gdb -p "$PID" >> "$LOG" <<EOF
set pagination off
where
quit
EOF
  sleep 5
done
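Once you have a log of samples, reading through it by eye gets tedious. Here is a rough sketch that counts how often each function name shows up in the frames of a sample log. The function name summarize_stack_log is hypothetical, and the parsing assumes gdb frame lines of the form "#1  0x... in func (args) at file:line" or "#0  func (args) at file:line". Function names that contain spaces (some C++ templates) will be cut short.

```shell
#!/bin/sh
# Count how often each function appears in the gdb "where" output of a sample log.
# summarize_stack_log is a hypothetical helper name used only for this sketch.
summarize_stack_log() {
  awk '
    /^#[0-9]+ / {
      # Skip the optional "0x... in" prefix to find the function name.
      name = ($2 ~ /^0x/ && $3 == "in") ? $4 : $2
      count[name]++
    }
    END {
      for (f in count) print count[f], f
    }
  ' "$1" | sort -k1,1nr -k2,2
}

# Run only when a log file is given, e.g.: sh summarize_stack.sh stack_sample.1234
if [ -n "${1:-}" ]; then
  summarize_stack_log "$1"
fi
```

Functions near the top of the output were on the stack in the most samples. Expect main and the daemon's event loop to show up in nearly every sample; the interesting entries are the ones just below them.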
How to use callgrind to profile condor
Callgrind is a wonderful profiling tool. Its one big disadvantage is that it slows down the application considerably (about 20 times in my experience).
Here’s an example of how to run callgrind on the collector.
condor_off -collector
# become the user who runs condor
sudo su root
valgrind --tool=callgrind /path/to/condor_collector -f -p 9618 >& /tmp/callgrind.log < /dev/null &
PID=$!
You will see some files in the current working directory named “callgrind.out.$PID” and “callgrind.info.$PID”. Change the ownership of these from root to the condor user or callgrind will have trouble writing to them when the program exits.
chown condor:condor callgrind.*.$PID
After running for sufficient time, stop the collector.
kill -TERM $PID
You can then analyze the profile using kcachegrind.
kcachegrind /path/to/callgrind.out.$PID
The call tree is very useful. If you configure kcachegrind to know the path to the code, then you can also see the code for points in the call graph, annotated with profiling information. It is very nice!