HTCondorWiki: Vm Universe

Introduction

Comprehensive design outline of the vm-universe, your one stop shopping location for all things virtual in HTCondor. For a breakdown of the current work that is being done, please see Ticket #883

Related Tickets

Ticket #883: VM universe refactor

Goals and Objectives

The vm-universe gives HTCondor the ability to run virtual machines as a "job", these vm's can either run a dedicated task, spin up extra resources (which can be added to the existing pool), create a virtual pool.

Requirements

Engineering/Design Requirements

(E1) The VM gahp should be made as lightweight and minimalistic as possible.
(E2) The VM Gahp shall support a pluggable hypervisor mechanism to allow for multiple technolgies. Primary support will be for libvirt - which can support most if not all hypervisors which one would care about.
(E3) Support for multiple CPU's and hard disks (needs further description) of what running a N-CPU job actually means on the host machine.
(E4) Support for multiple network interfaces. These mutliple network interfaces may use different types of networking. Also a network interface may be present but disabled ( virtualization equivalent of no cable plugged in ).
(E5) Support for running multiple virtual machines and/or hypervisors from a single execute node.
(E6) Administrators can allow/override/reject any virtual machine requirement requests. via script which "could" xsl transform the incoming xml (e.g. limit memory size, reject MAC address settings, etc.)

Functional/Behavioral Requirements

Initialization

(TBD) shall support the following networking capabilities
- Tunneling the network interface? (ssh to my job).
- Bridged networking interfaces (assign a MAC address).

Discovery & Advertising

(F1) The VM-Gahp shall support discovery of a machines vm-capabilities and subsequently advertise that information in the machine classad.
- VM_TYPES := script-managed, libvirt-managed
- VM_HYPERVISORS := xen, vmware, virtualbox, kvm etc (populated by discovery)
- VM_<HyperVisor>_ATTR_<name> (for hypervisor specific capabilities)
(TBD) Reporting of NUM_CPUS, HD, & MEMORY? SLOTS?

Submission

(F2) Submit side processing of different hypervisor specific vm formats need to be defined. This should be capable of building the VM's requirements and populating the classad. {Consensus still needed on what all goes into the classad. ?? }
- Shall support submission of libvirt compatible .xml files
  - Can this be a frontend (e.g. condor_libvirt_submit) that generates the appropriate job submission file for HTCondor? Similar front ends could be developed for other formats (e.g. condor_ovf_submit).
- VM_TYPE=script-managed, libvirt-managed (support multiple ones?)
- VM_HYPERVISOR=xen,vmware,... (support multiple ones?)
- VM_ATTR_<name>=... (for generic attributes like memory, cpus, etc.)
- VM_<HyperVisor>_ATTR_<name>=... (for hypervisor specific attributes)
  - Can this be open ended? Leave it up to the vm manager to decide what to do with attributes that are unknown? How is this information reported back to the user?
(TBD) Shall support shared folders.

Matching

(F3) The gahp will only manage a single virtual machine at a time. Functionality should be handed off to libvirt / script where ever possible.
(F4) Post matching, the script / libvirt should support a verification stage which checks to see if the vm's requirement is in accordance with what the classad claims it to be.
(TBD) authentication

Running

(TBD) gahp reporting
- Memory, cycles, etc.
(TBD) Checkpointing and migrating
- what requirements get "peg'd?"
- define actions (pause, resume, stop ...)

Testability of Requirements

TBD Matrix which outlines F#, E# and how/if it is testable

Design Choices and "Views"

Logical Design View

The Design View captures the classes, interfaces, and patterns that describe the representation of the problem domain and how the software will be built to address it. The design view almost always uses class diagrams, object diagrams, activity diagrams, composite structure diagrams, and sequence diagrams to convey the design of a system. The design view typically doesn't address how the system will be implemented or executed.

Deployment View

The Deployment View captures how a system is configured, installed, and executed. It often consists of component diagrams, deployment diagrams, and interaction diagrams. The Deployment View also captures how the physical layout of the hardware communicates to execute the system.

Implementation View

The Implementation View emphasizes the components, files, and resources used by the system. Typically the Implementation View focuses on the configuration management of a system; what components depend on what, what source files implement what classes, etc. Implementation views almost always use one or more component diagrams and may include interaction diagrams, statechart diagrams, and composite structure diagrams.

Process View

The Process View of a system is intended to capture concurrency, performance, and scalability information, which shows how the system will behave at runtime. The process view often uses some form of interaction diagrams and activity diagrams.

Use Case Senarios (Tie views together)

Libvirt submission & Run
Script submission & Run
- User specifies VM_TYPE=script-managed VM_HYPERVISOR=virtualbox in their job submission file (as well as additional VM requirements). On the startd/vm_gahp side, a script matching the VM_HYPERVISOR value needs to exist.
Pause & Resume
Terminating a Job
Libvirt is a hypervisor abstraction layer. This means, libvirt can manage virtual machines intended for different hypervisors while giving a uniform, hypervisor independent interface to the interacting user/application. While HTCondor interfaces with libvirt, this interface as of now is not hypervisor independent. Ideally, HTCondor should be able to use libvirt to manage virtual machines intended for any hypervisor that libvirt supports, even if it is one that was not previously supported by libvirt while the HTCondor installation was being coded.
Libvirt being a hypervisor abstraction layer, can also be viewed as a hypervisor. Libvirt has it's virtual machine description format which is a libvirt xml file. HTCondor users must therefore be given an option where they may submit a libvirt xml virtual machine description to HTCondor. HTCondor must be capable of accepting this input and running the virtual machine on behalf of the user using libvirt.
The "actions" that the HTCondor vm-gahp performs on a virtual machine that it manages is finite and enumerable. (At times, not all hypervisors would support all of these actions). This motivates a use case where third parties are given the flexibility to develop support for previously unsupported hypervisors on HTCondor. This can be achieved through supporting the option of a script callout with the virtual machine and the action as parameters. In this case, the script becomes the black box, that interfaces the hypervisor with HTCondor.
- allow the user to submit such a script along with the virtual machine ??
  - I don't think users should be allowed to submit a virtual machine manager script. I think that violates the administrative controls of the resource owner. Of course a user could try to submit a vanilla job that launches a virtual machine and it would be up to the resource owner to allow/disallow such an option.
The case of the user not caring about the choice of hypervisor: many hypervisors these days support the vmware standard vmdk disk images. Likewise, another incarnation of virtualization that is gaining popularity is virtual appliances which may be run on several different hypervisors. The user may be allowed to submit such hypervisor independent virtual machines in which case HTCondor should pick the hypervisor for the jobs.
- See submission functional behavior. One way to support this is to allow for the user to specify a list of types and hypervisors that can be used. Maybe add a special "any" wildcard or have the absence of VM_TYPE and VM_HYPERVISOR indicate "I don't care." I have a feeling that leaving out VM_TYPE and VM_HYPERVISOR would lead to more user errors, however.
A collection of virtual machine configurations and disk configurations may be pre-deployed/staged in the pool. One possibility is to allow a vm universe job specification to refer to a pre-deployed virtual machine and provide a secondary disk containing job specific data.
Note on Ticket #1023 for VirtualBox. This can be supported with VirtualBox's VBoxHeadless --capture option. Not sure what other hypervisors provide.

Raw Thoughts, Ideas, and Questions (Migrate to above later)

Script managed virtual machines are provided job class ads specifying the virtual machine universe job to be run. The script is responsible for parsing the class ad, making any modifications to the ad necessary to meet administrative requirements, and carry out the specified vm_gahp command (e.g., start, preempt, checkpoint, etc.). Commands may be executed, rejected, modified, or not-implemented.
- Should these status codes be returned to the user?
- If a classad is modified, should the user be notified? Should the user receive indications of what was modified? What are the security implications of providing that information?
An interesting side effect of script managed virtual machine is that it is relatively straight forward to override and fill in attributes in any arbitrary way. For example, it would be possible for a script implementation to call out to a MAC address and IP address management system to assign MAC and IP addresses to machines. Some care is needed to recover if the script is unable to gather all the information necessary to launch a virtual machine.
- That would be very useful. The problem of brokering and assigning MAC addresses would be simple. However, that of IP address management needs more pondering since the only fathomable way to do it is to have some kind of a daemon which speaks the DHCP language.
Also note, "scripts" is a fairly loose term. It is entirely possible to implement scripts in compiled languages like C or C++, Java, or traditional scripting languages.
Do VM universe jobs interact with job hooks or job wrappers? If so, how?
- Within the current code, there's only two hooks that I can think of as pertinent: the job router hooks and the starter hooks. For both, there's no functionality support required from th part of the script / libvirt.
Supporting multiple CDROM and Floppy images, and checking and disabling use of the cdrom/floppy device on the execute host?

Vm Universe