Syscall mode
In the processes of puttering with the virtual file table, Jim and I (Doug Thain) came across some strange bits of code relating to the system call mode. After some discussion, we agreed on the enclosed interpretation of the system call mode.thain 1-Feb-1999
Abstract
The syscall mode moves along two axes:
- SYS_LOCAL/SYS_REMOTE describes whether non-filesystem calls are executed locally, or sent to the shadow.
- SYS_MAPPED/SYS_UNMAPPED describes whether filesystem calls are redirected through the virtual file table.
All four combinations are valid, and some have surprising side effects. The existing HTCondor code does not use or implement these states consistently. Below, we propose an interpretation to use in the future.
Background
In earlier versions of HTCondor, there were two ways to open a file -- either on the local host, or via a remote system call. However, the number of ways to access a file is expanding, and this is causing some confusion in the meaning of the system call flags.
System calls related to the file system (open, close, read, write, etc.) will pass through a virtual file table (vft). The vft hides a bunch of things from the user. Primarily, it hides the access method -- for a given fd, you might be communicating with the shadow, or an ioserver, or any number of other entities.
When SYS_MAPPED is in effect, the vft is consulted for all filesystem calls. For example, a SYS_MAPPED read() is routed through the vft. The user-supplied fd is converted into a real fd, and the vft indicates the access method for the read (local, shadow, ioserver, etc.).
What about SYS_UNMAPPED? A SYS_UNMAPPED read() can be performed in only one of two ways: on the local machine or on the shadow. The choice is determined by the other axis of the syscall mode. (Ah hah!)
SYS_LOCAL/SYS_REMOTE is (almost) orthogonal to SYS_MAPPED/SYS_UNMAPPED. When SYS_LOCAL is in effect, non-filesystem calls are executed locally. When SYS_REMOTE is in effect, non-filesystem calls are sent to the shadow for execution. When mapping is not in effect, system calls such as read() and write() are executed according to the SYS_LOCAL/SYS_REMOTE mode.
The Four Possible States
SetSyscalls(SYS_LOCAL|SYS_UNMAPPED) is the simplest case. All system calls are executed on the local system. Examples: gettimeofday() is executed locally. read() is executed locally with the given file descriptor.
SetSyscalls(SYS_REMOTE|SYS_UNMAPPED) indicates that ALL system calls are to be dispatched to the shadow. An operation on an fd is not interpreted by the user job -- it is simply passed on to it's representative on the shadow. Examples: gettimeofday() is executed on the shadow. read() is executed on the shadow with the given file descriptor
SetSyscalls(SYS_REMOTE|SYS_MAPPED) is the common mode of execution in HTCondor. MOST system calls are sent to the shadow. However, filesystem calls are routed through the virtual file table. From there, they are dispatched according to the access method of the appropriate file. Examples: gettimeofday() is executed on the shadow. read() is executed according to the access method stored in the vft.
SetSyscalls(SYS_LOCAL|SYS_MAPPED) is a little unusual. Non-filesystem calls are executed locally. Filesystem calls are routed through the virtual file table, and dispatched according to their access methods. IN THIS MODE, A FILE MIGHT STILL BE ACCESSED REMOTELY. Despite the fact that SYS_LOCAL is in effect, the vft could route an fd to any kind of access method, including remote access. Examples: gettimeofday() is executed locally. read() is executed according to the access method stored in the vft.
Open is a Little Tricky
open() considers both axes of the system call mode. The first axis is used to establish the appropriate access method for the file being opened.
SYS_LOCAL: Access method is local. Perform a local open. SYS_REMOTE: Ask the shadow, "What access method should I use for this file name?" (local, remote, ioserver, etc.), and then perform the appropriate open.
The second axis is used to determine what fd to return to the user.
SYS_UNMAPPED: Don't touch the vft. Return a real fd. SYS_MAPPED: Install this file in the vft, and return a virtual file descriptor.
Things to note
- SYS_MAPPED refers to signals as well as files. You might consider the SYS_MAPPED/SYS_UNMAPPED access to be akin to supervisor/user mode.
- SYS_LOCAL by itself does NOT prevent all remote system calls.
- The checkpoint code also has a third axis, STANDALONE/REMOTE. In standalone mode, the user job has been run without a starter or a shadow, so SYS_REMOTE is not possible. (However, SYS_MAPPED still is!)