All platforms

<SUBSYS>_DEBUG_WAIT

When <SUBSYS>_DEBUG_WAIT is true, the daemon pauses on startup in a loop equivalent to bool debug_wait = true; while (debug_wait) { sleep(1); }. You can then attach to the daemon with a debugger, set debug_wait to false (in GDB: set var debug_wait = 0), and continue the run.

(In builds from before late June 2012 this does not work reliably, because debug_wait could be optimized out. See #3064 for details and a workaround.)

On Windows you might prefer the slightly more convenient <SUBSYS>_WAIT_FOR_DEBUGGER, documented below.

GDB specific issues

clone(), gdb internal-errors

HTCondor uses clone() in a way that causes GDB grief. Running an HTCondor daemon under GDB will likely fail with errors similar to:

warning: Can't attach LWP -1: No such process
../../gdb/linux-thread-db.c:389: internal-error: thread_get_info_callback: Assertion `inout->thread_info != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

The solution is to tell HTCondor not to use clone(), but to fall back on the fork() code path. In your HTCondor configuration file, use:

USE_CLONE_TO_CREATE_PROCESSES = false

(This page primarily exists as search bait, so that others hitting this error can quickly find the solution.)

The adventurous (or those debugging the clone calls themselves) can instead apply the patch found on #2208.

Debug a core file produced by a stripped binary

Say you have a.out.stripped and a.out.with_symbols, and a core file from a.out.stripped that you want to debug but are being thwarted by the lack of symbols. The trick is to still run gdb on the file that produced the core, and then load the symbols from the unstripped binary at the address of the text segment of the file you are debugging.

   % gdb a.out.stripped core

... gdb will complain about missing debug info ...

   (gdb) maint info sections

In the output from 'maint info sections', look for the .text section. The address in the first column is what you need for the next command. For instance, one of the output lines from 'maint info sections' will be

    0x00400390->0x00400588 at 0x00000390: .text ALLOC LOAD READONLY CODE HAS_CONTENTS

The address in the first column, 0x00400390 in this example, is used in the next gdb command:

   (gdb) add-symbol-file a.out.with_symbols 0x00400390

Of course, replace 0x00400390 with the .text address from your own output.
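If you do this often, picking the address out by eye gets tedious. The following is a minimal Python sketch, not part of HTCondor or gdb, that pulls the start address of a named section out of pasted 'maint info sections' output. The helper names (parse_sections, section_start) are made up for illustration, and the regular expression assumes lines shaped like the example above, optionally preceded by a bracketed index as some newer gdb versions print.

```python
import re

# Assumed line shape: "0xSTART->0xEND at 0xFILEOFF: NAME FLAGS...",
# optionally preceded by an index such as "[12]".
SECTION_RE = re.compile(
    r"(?:\[\d+\]\s+)?"
    r"(0x[0-9a-fA-F]+)->(0x[0-9a-fA-F]+)\s+at\s+(0x[0-9a-fA-F]+):\s+(\S+)\s*(.*)"
)

def parse_sections(output):
    """Return one dict per section line found in the pasted gdb output."""
    sections = []
    for line in output.splitlines():
        match = SECTION_RE.match(line.strip())
        if match is None:
            continue  # skip headers and anything else that is not a section line
        start, end, file_offset, name, flags = match.groups()
        sections.append({
            "start": int(start, 16),
            "end": int(end, 16),
            "file_offset": int(file_offset, 16),
            "name": name,
            "flags": flags.split(),
        })
    return sections

def section_start(output, name=".text"):
    """Return the start address of the named section, or None if absent."""
    for section in parse_sections(output):
        if section["name"] == name:
            return section["start"]
    return None

# The example line from this page.
sample = "    0x00400390->0x00400588 at 0x00000390: .text ALLOC LOAD READONLY CODE HAS_CONTENTS"
address = section_start(sample)
print("add-symbol-file a.out.with_symbols 0x%08x" % address)
```

The printed line is the add-symbol-file command to paste back into gdb.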

That will get your symbols loaded for the main executable, but you still won't have any symbols for libraries that were dynamically loaded, which these days is most of HTCondor.

So, to get those, you need to do a couple of things.

First, run gdb on the unstripped version of the .so you would like to load, find its .text section just as above, and write down the offset (the first column); you'll need it later.

Second, go back to gdb with the stripped binary and the core. If you look at any particular stack frame, you'll see the instruction pointer in the second column:

    #7  0x00007f034f8c1834 in KeyCache::remove ()

Now comes the black-magic portion. Run "maint info sections" again and look at the first column. Find the entry whose address range contains the above instruction pointer. It should be marked "ALLOC READONLY CODE". Now, add the start address of that entry to the offset of the .text segment from the .so file you found above. Finally, you can add the symbols from the shared library with

    add-symbol-file unstripped.so 0xMagicAddress

You'll have to do this for each shared library (libcondor_util, libclassad, etc.) if you need the symbols for that particular stack frame.
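The address arithmetic in the recipe above can be sketched in a few lines of Python. This is only an illustration of the steps as described on this page, not an HTCondor or gdb tool. The function names are made up, and the address ranges and the .text offset below are hypothetical values; only the instruction pointer is taken from the example stack frame.

```python
def find_mapping_start(ranges, ip):
    """Return the start of the (start, end) range that contains ip, or None."""
    for start, end in ranges:
        if start <= ip < end:
            return start
    return None

def shared_lib_symbol_address(ranges, ip, so_text_offset):
    """Apply the recipe: start of the entry containing ip, plus the .so's .text offset."""
    base = find_mapping_start(ranges, ip)
    if base is None:
        raise ValueError("no section contains instruction pointer 0x%x" % ip)
    return base + so_text_offset

# Hypothetical (start, end) pairs read off the first column of
# 'maint info sections' for the stripped binary plus core.
ranges = [
    (0x00400000, 0x00401000),
    (0x7f034f800000, 0x7f034fa00000),
]
ip = 0x00007f034f8c1834      # instruction pointer from the example frame #7
so_text_offset = 0x5a3c0     # hypothetical .text offset noted from the unstripped .so

addr = shared_lib_symbol_address(ranges, ip, so_text_offset)
print("add-symbol-file unstripped.so 0x%x" % addr)
```

Repeat the calculation with the instruction pointer and .text offset of each shared library whose symbols you need.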

Windows specific issues

<SUBSYS>_WAIT_FOR_DEBUGGER

In addition to <SUBSYS>_DEBUG_WAIT, on Windows (#ifdef WIN32) you have access to <SUBSYS>_WAIT_FOR_DEBUGGER. It's very similar, but is smart enough to automatically detect when a debugger is attached.