Page History

Turn Off History

Reading the Gridmanager Log

Normal Logging Level

This section describes entries that always appear in the gridmanager log.

Like all Condor daemons, the gridmanager prints a banner when it starts up.

11/08/11 23:38:56 ******************************************************
11/08/11 23:38:56 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
11/08/11 23:38:56 ** /afs/cs.wisc.edu/unsup/condor-production/condor-7.6.4/i386_rhel5/sbin/condor_gridmanager
11/08/11 23:38:56 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(12) class=DAEMON(1)
11/08/11 23:38:56 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
11/08/11 23:38:56 ** $CondorVersion: 7.6.4 Oct 17 2011 BuildID: 378655 PRE-RELEASE-UWCS $
11/08/11 23:38:56 ** $CondorPlatform: x86_rhap_5 $
11/08/11 23:38:56 ** PID = 15864
11/08/11 23:38:56 ** Log last touched 11/8 23:32:33
11/08/11 23:38:56 ******************************************************
11/08/11 23:38:56 Using config source: /p/condor/workspaces/jfrey/test.68/condor_config
11/08/11 23:38:56 Using local config sources:
11/08/11 23:38:56    /p/condor/workspaces/jfrey/test.68/condor_config.local
11/08/11 23:38:56 DaemonCore: command socket at <128.105.165.29:37201>
11/08/11 23:38:56 DaemonCore: private command socket at <128.105.165.29:37201>

This banner includes the Condor version and platform, daemon pid, when the log file was last touched, config files, and the daemon's command port.

It's possible for multiple gridmanger processes to be writing to the log at the same time. So after the banner, the gridmanager includes its pid in square brackets on every line.

When the gridmanager starts managing a job (whether newly submitted or released from hold, it writes this to the log:

11/08/11 23:38:59 [15864] Found job 716.0 --- inserting

Lines in the log that relate to a specific job will often include the job id in parentheses:

11/08/11 23:38:59 [15864] (716.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, globusState 32

With GRAM, whenever the jobmanager notifies the gridmanager of a job status change, it's logged:

11/08/11 23:39:10 [15864] (716.0) gram callback: state 1, errorcode 0

When a GAHP command fails, you can usually find a line mentioning it:

11/08/11 23:42:40 [15864] (716.0) gmState GM_RESTART, globusState 128: globus_gram_client_job_request() returned Globus error 93

When the gridmanager exits normally, it prints a goodbye message:

11/08/11 23:42:44 [15864] **** condor_gridmanager (condor_GRIDMANAGER) pid 15864 EXITING WITH STATUS 0

Verbose Logging Level

When something goes wrong, it's useful to increase the verbosity of the gridmanager's logging to collect more information. When you do so, you may also want to increase the maximum log size

GRIDMANAGER_DEBUG = D_FULLDEBUG
MAX_GRIDMANAGER_LOG = 50000000

This section will describe some of the additional information that will appear in the log when D_FULLDEBUG is enabled.

All data exchanged with GAHP servers is logged. The gridmanager communicates with a GAHP server over a pair of pipes. The GAHP acknowledges the issuance of each command. The results of the command are supplied later, in response to a RESULTS command.

11/03/11 16:51:54 [29399] GAHP[29509] <- 'GRAM_JOB_SIGNAL 10632 https://stress2.
chtc.wisc.edu:46861/16217854812539030796/8230609966348670593/ 10 NULL'
11/03/11 16:51:54 [29399] GAHP[29509] -> 'S'
...
11/03/11 16:51:54 [29399] GAHP[29509] <- 'RESULTS'
11/03/11 16:51:54 [29399] GAHP[29509] -> 'R'
11/03/11 16:51:54 [29399] GAHP[29509] -> 'S' '1'
11/03/11 16:51:54 [29399] GAHP[29509] -> '10632' '0' '0' '8'

In this example, 29509 is the pid of the GAHP server, GRAM_JOB_SIGNAL is the type of command being issued, 10632 is a unique id for the command, and the rest of the line is arguments for the command. To learn more about the GAHP protocol, see http://http://www.cs.wisc.edu/condor/gahp