Interactive Singularity Jobs and condor_ssh_to_job

Starting with HTCondor 8.8, condor_ssh_to_job, and hence also interactive jobs, use an sshd running directly on the execute node with user privileges. Since 8.8.10, most issues have been ironed out, and connecting into the job happens via condor_nsenter, an nsenter-like tool that "enters" container namespaces in a generic way. The starter spawns this tool in parallel to sshd.

A few issues related to X11 forwarding remain. They can be worked around, and they partly depend on the setup in use. These are discussed on this page.

X11 forwarding

X11 forwarding in general works by running xauth as a child of the sshd process on the execute node. Before doing so, sshd prunes most of the environment and sets a new DISPLAY variable pointing to a forwarded X11 port. It then runs xauth, which by default stores the X11 authorization information in the user's home directory.
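
To make the default location concrete: xauth uses the file named by the XAUTHORITY environment variable if it is set, and falls back to .Xauthority in the user's home directory otherwise. The following is a minimal sketch of that lookup; the function name xauth_file is hypothetical and only serves as illustration.

```bash
#!/bin/bash
# Hypothetical helper (not part of HTCondor or xauth): print the authority
# file that xauth would use by default in the current environment.
xauth_file() {
  # XAUTHORITY wins if set and non-empty, otherwise fall back to the home directory.
  echo "${XAUTHORITY:-${HOME}/.Xauthority}"
}

# Example usage:
#   xauth_file
#   XAUTHORITY=/some/job/dir/.Xauthority xauth_file
```

This is why redirecting XAUTHORITY is enough to move the authorization file away from the home directory.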

Two issues arise:

A possible workaround

To solve both issues at once, we can make use of the fact that sshd is spawned and configured by HTCondor via the condor_ssh_to_job_sshd_config_template. The location of this template can be set via the knob SSH_TO_JOB_SSHD_CONFIG_TEMPLATE. We can patch the file shipped with HTCondor and add the line:

XAuthLocation /usr/local/bin/condor_xauth_wrapper

Subsequently, we can create the wrapper script (make sure it is executable) with the following content:
#!/bin/bash

# Walk up the process tree until we find the second sshd which rewrites cmdline to "sshd: user@tty".
# The first sshd is our parent process which does not log itself.
SSHD_PID=$$
SSHD_CNT=0
while true; do
  # cmdline is NUL-separated; convert NULs to spaces so bash does not warn about them.
  CMDLINE=$(tr '\0' ' ' < /proc/${SSHD_PID}/cmdline)
  #echo "Checking ID ${SSHD_PID}, cmdline ${CMDLINE^^}"
  SSHD_MATCHER="^SSHD: "
  if [[ ${CMDLINE^^} =~ ${SSHD_MATCHER} ]]; then
    # We found the sshd!
    SSHD_CNT=$(( SSHD_CNT + 1))
    if [ ${SSHD_CNT} -gt 1 ]; then
      break;
    fi
  fi
  SSHD_PID=$(ps -o ppid= -p ${SSHD_PID} | awk '{print $1}')
  if [ -z "${SSHD_PID}" ] || [ "${SSHD_PID}" -le 1 ]; then
    # We arrived at the INIT process (or lost track of the parent), something is very wrong... Let's stop and alert the user.
    echo "Error: Could not determine sshd process, X11 forwarding will not work!"
    echo "       Please let your admins know you got this error!"
    exit 0
  fi
done
#echo "SSHD PID is ${SSHD_PID}."

# Find sshd.log, checking through fds.
FOUND_SSHD_LOG=0
for FD in $(ls -1 /proc/${SSHD_PID}/fd/); do
  FILE=$(readlink -f /proc/${SSHD_PID}/fd/$FD)
  #echo "Checking FD $FD, file is $FILE"
  SSHD_LOG_MATCHER="sshd\.log$"
  if [[ "${FILE}" =~ ${SSHD_LOG_MATCHER} ]]; then
    #echo "Found ${FILE}!"
    FOUND_SSHD_LOG=1
    SSH_TO_JOB_DIR=$(dirname "${FILE}")
    JOB_WORKING_DIR=$(dirname "${SSH_TO_JOB_DIR}")
    break;
  fi
done

if [ ${FOUND_SSHD_LOG} -eq 0 ]; then
  # We could not identify sshd.log, let's stop and alert the user.
  echo "Error: Could not determine sshd process' (PID: ${SSHD_PID}) log, X11 forwarding will not work!"
  echo "       Please let your admins know you got this error!"
  exit 0
fi

# Finally, if we arrive here, all is well.

# This does NOT work, since env.sh is sourced as forced command, too early.
#echo "export DISPLAY=${DISPLAY}" >> ${SSH_TO_JOB_DIR}/env.sh

# Ugly hack needed with HTCondor 8.8.10 which does not yet pass through DISPLAY.
echo "export DISPLAY=${DISPLAY}" > "${JOB_WORKING_DIR}/.display"

export XAUTHORITY="${JOB_WORKING_DIR}/.Xauthority"
/usr/bin/xauth "$@" </dev/stdin

Please note that this script is pretty verbose, and handles very unlikely errors not observed in practice (yet). Most of the code is just there to find out which directory is used as the execute directory for the job, then place the DISPLAY environment variable inside a file .display in there, and finally adjust the environment variable XAUTHORITY to place the .Xauthority file there.
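
The template change described before the script could be automated roughly as follows. This is only a sketch under assumptions: the function name patch_sshd_template and all paths in the usage comment are hypothetical, and it relies on sshd using the first value it finds for a keyword, so the new line is placed at the top and any existing XAuthLocation line is dropped.

```bash
#!/bin/bash
# Hypothetical helper: write a patched copy of the sshd config template
# that points XAuthLocation at the wrapper script.
#   $1: template shipped with HTCondor
#   $2: patched copy to create
#   $3: path of the wrapper script
patch_sshd_template() {
  local src="$1" dst="$2" wrapper="$3"
  [ -r "${src}" ] || return 1
  {
    # Put our setting first, since sshd keeps the first value it reads.
    echo "XAuthLocation ${wrapper}"
    # Copy everything else, dropping any pre-existing XAuthLocation line.
    grep -vi '^[[:space:]]*XAuthLocation' "${src}" || true
  } > "${dst}"
}

# Example usage on an execute node (all target paths are assumptions):
#   SRC=$(condor_config_val SSH_TO_JOB_SSHD_CONFIG_TEMPLATE)
#   patch_sshd_template "${SRC}" /etc/condor/sshd_config_template.x11 \
#       /usr/local/bin/condor_xauth_wrapper
#   chmod 0755 /usr/local/bin/condor_xauth_wrapper
#   # then set SSH_TO_JOB_SSHD_CONFIG_TEMPLATE to the patched copy and run condor_reconfig
```

Working on a copy keeps the shipped file untouched, so a package update does not silently revert the change.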

This script works combined with two environment hacks inside the container:
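
As a rough sketch of how the two files written by the wrapper could be picked up inside the container (an assumption for illustration, not necessarily the exact hacks meant here): the function name load_x11_env is hypothetical, and the example assumes that the job's execute directory is available as _CONDOR_SCRATCH_DIR inside the session.

```bash
#!/bin/bash
# Hypothetical helper, e.g. defined in a shell profile inside the container.
#   $1: execute directory of the job
load_x11_env() {
  local job_dir="$1"
  # Pick up DISPLAY as written by the wrapper, if the file exists.
  if [ -r "${job_dir}/.display" ]; then
    . "${job_dir}/.display"
  fi
  # Point X11 clients at the authority file written via the wrapper.
  if [ -r "${job_dir}/.Xauthority" ]; then
    export XAUTHORITY="${job_dir}/.Xauthority"
  fi
}

# Example usage (assumes _CONDOR_SCRATCH_DIR is set in the session):
#   load_x11_env "${_CONDOR_SCRATCH_DIR}"
```

Both steps are skipped silently when the files are missing, so sessions without X11 forwarding are not affected.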

An alternative workaround

Note that if your sshd is recent enough to understand SetEnv (OpenSSH 7.8 and later), you could also patch /usr/libexec/condor/condor_ssh_to_job_sshd_setup instead and inject the XAUTHORITY location via SetEnv. In that script, ${base_dir} refers to the execute directory. This solution has not yet been tested, but should achieve the same result. However, you will still need the above condor_xauth_wrapper for now to transport the DISPLAY variable by hooking into sshd. As soon as a future HTCondor version passes DISPLAY through by itself, the lengthy script can be dropped.
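
A minimal, equally untested sketch of what such an injection could look like is shown below. The function name xauthority_setenv_line and the variable sshd_config in the usage comment are hypothetical; how the line is best merged into the configuration generated by condor_ssh_to_job_sshd_setup depends on that script and is not shown.

```bash
#!/bin/bash
# Hypothetical helper: print the sshd_config line that pins XAUTHORITY
# to the execute directory of the job.
#   $1: execute directory (${base_dir} in condor_ssh_to_job_sshd_setup)
xauthority_setenv_line() {
  printf 'SetEnv XAUTHORITY=%s/.Xauthority\n' "$1"
}

# Example usage (sshd_config is a hypothetical variable naming the generated config):
#   xauthority_setenv_line "${base_dir}" >> "${sshd_config}"
```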