{section: Reading the Gridmanager Log} {subsection: Normal Logging Level} This section describes entries that always appear in the gridmanager log. Like all Condor daemons, the gridmanager prints a banner when it starts up. {code} 11/08/11 23:38:56 ****************************************************** 11/08/11 23:38:56 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP 11/08/11 23:38:56 ** /afs/cs.wisc.edu/unsup/condor-production/condor-7.6.4/i386_rhel5/sbin/condor_gridmanager 11/08/11 23:38:56 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(12) class=DAEMON(1) 11/08/11 23:38:56 ** Configuration: subsystem:GRIDMANAGER local: class:DAEMON 11/08/11 23:38:56 ** $CondorVersion: 7.6.4 Oct 17 2011 BuildID: 378655 PRE-RELEASE-UWCS $ 11/08/11 23:38:56 ** $CondorPlatform: x86_rhap_5 $ 11/08/11 23:38:56 ** PID = 15864 11/08/11 23:38:56 ** Log last touched 11/8 23:32:33 11/08/11 23:38:56 ****************************************************** 11/08/11 23:38:56 Using config source: /p/condor/workspaces/jfrey/test.68/condor_config 11/08/11 23:38:56 Using local config sources: 11/08/11 23:38:56 /p/condor/workspaces/jfrey/test.68/condor_config.local 11/08/11 23:38:56 DaemonCore: command socket at <128.105.165.29:37201> 11/08/11 23:38:56 DaemonCore: private command socket at <128.105.165.29:37201> {endcode} This banner includes the Condor version and platform, daemon pid, when the log file was last touched, config files, and the daemon's command port. It's possible for multiple gridmanger processes to be writing to the log at the same time. So after the banner, the gridmanager includes its pid in square brackets on every line. When the gridmanager starts managing a job (whether newly submitted or released from hold, it writes this to the log: {code} 11/08/11 23:38:59 [15864] Found job 716.0 --- inserting {endcode} Lines in the log that relate to a specific job will often include the job id in parentheses: {code} 11/08/11 23:38:59 [15864] (716.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, globusState 32 {endcode} With GRAM, whenever the jobmanager notifies the gridmanager of a job status change, it's logged: {code} 11/08/11 23:39:10 [15864] (716.0) gram callback: state 1, errorcode 0 {endcode} When a GAHP command fails, you can usually find a line mentioning it: {code} 11/08/11 23:42:40 [15864] (716.0) gmState GM_RESTART, globusState 128: globus_gram_client_job_request() returned Globus error 93 {endcode} When the gridmanager exits normally, it prints a goodbye message: {code} 11/08/11 23:42:44 [15864] **** condor_gridmanager (condor_GRIDMANAGER) pid 15864 EXITING WITH STATUS 0 {endcode} {subsection: Verbose Logging Level} When something goes wrong, it's useful to increase the verbosity of the gridmanager's logging to collect more information. When you do so, you may also want to increase the maximum log size {code} GRIDMANAGER_DEBUG = D_FULLDEBUG MAX_GRIDMANAGER_LOG = 50000000 {endcode} This section will describe some of the additional information that will appear in the log when D_FULLDEBUG is enabled. All data exchanged with GAHP servers is logged. The gridmanager communicates with a GAHP server over a pair of pipes. The GAHP acknowledges the issuance of each command. The results of the command are supplied later, in response to a RESULTS command. {code} 11/03/11 16:51:54 [29399] GAHP[29509] <- 'GRAM_JOB_SIGNAL 10632 https://stress2. chtc.wisc.edu:46861/16217854812539030796/8230609966348670593/ 10 NULL' 11/03/11 16:51:54 [29399] GAHP[29509] -> 'S' ... 11/03/11 16:51:54 [29399] GAHP[29509] <- 'RESULTS' 11/03/11 16:51:54 [29399] GAHP[29509] -> 'R' 11/03/11 16:51:54 [29399] GAHP[29509] -> 'S' '1' 11/03/11 16:51:54 [29399] GAHP[29509] -> '10632' '0' '0' '8' {endcode} In this example, 29509 is the pid of the GAHP server, GRAM_JOB_SIGNAL is the type of command being issued, 10632 is a unique id for the command, and the rest of the line is arguments for the command. To learn more about the GAHP protocol, see http://http://www.cs.wisc.edu/condor/gahp The stderr of the GAHP server is also logged: {code} 09/21/11 15:26:28 [4205] GAHP[4209] (stderr) -> 2011-09-21 15:26:28,909 INFO - CreamProxy_Delegate::execute() - Signing returned proxy with our proxy certificate [/tmp/x509up_u17063]... {endcode} The gridmanager logs when it observes the job's status as reported by the remote system change: {code} 11/09/11 00:03:25 [16311] (717.0) globus state change: ACTIVE -> STAGE_OUT {endcode} {code} 11/09/11 00:17:39 [16741] (718.0) cream state change: IDLE -> REALLY-RUNNING {endcode} The gridmanager's connections to its schedd are also logged. This includes every job attribute it modifies. {code} 11/03/11 16:44:47 [29399] in doContactSchedd() 11/03/11 16:44:47 [29399] Updating classad values for 714.933: 11/03/11 16:44:47 [29399] GlobusStatus = 64 11/03/11 16:44:47 [29399] GridJobStatus = "STAGE_IN" 11/03/11 16:44:47 [29399] LastRemoteStatusUpdate = 1320356686 11/03/11 16:44:47 [29399] NumGlobusSubmits = 1 11/03/11 16:44:47 [29399] leaving doContactSchedd() {endcode}