*Justification* -Condor is very good at distributing and managing large jobs that might take hours or more to complete. Managing these jobs includes moving job binaries and input files to an execute machine, executing the job, and returning any output to the submit machine. In this case, the overhead of staging the execute machine with binaries and input, and the return of the output is made trivial by the amount of time the job requires to run. There are, however, situations in which a job may only take a few seconds or less. This job might need to be run hundreds or thousands of times with variability in the data used to produce the results. It may also be desired that these jobs complete in as little time as possible. In this case, the overhead of moving the job binary each time the task is required to run may not be as trivial as before. To aid in execution of these jobs, we will add the concept of High Frequency Computing to Condor. +HTCondor is very good at distributing and managing large jobs that might take hours or more to complete. Managing these jobs includes moving job binaries and input files to an execute machine, executing the job, and returning any output to the submit machine. In this case, the overhead of staging the execute machine with binaries and input, and the return of the output is made trivial by the amount of time the job requires to run. There are, however, situations in which a job may only take a few seconds or less. This job might need to be run hundreds or thousands of times with variability in the data used to produce the results. It may also be desired that these jobs complete in as little time as possible. In this case, the overhead of moving the job binary each time the task is required to run may not be as trivial as before. To aid in execution of these jobs, we will add the concept of High Frequency Computing to HTCondor. *High Frequency Computing* -At it's core, High Frequency Computing (HFC) is an attempt to run as many jobs as possible in as short a time as possible. These jobs are expected to take a very short time to complete (a few seconds or less) and work on relatively small input data (a couple hundred to a couple thousand bytes). These special jobs will be called Tasks. For an HFC job, the user will provide a Task or batch of Tasks that will be submitted to Condor. Once the tasks have been completed, the results will be returned to the user. +At it's core, High Frequency Computing (HFC) is an attempt to run as many jobs as possible in as short a time as possible. These jobs are expected to take a very short time to complete (a few seconds or less) and work on relatively small input data (a couple hundred to a couple thousand bytes). These special jobs will be called Tasks. For an HFC job, the user will provide a Task or batch of Tasks that will be submitted to HTCondor. Once the tasks have been completed, the results will be returned to the user. -Tasks will be defined as the data required to complete the task. In this way it is different from Condor jobs as the task itself isn't capable of doing anything. The task must be given to a worker that knows how to interpret the task and produce a result from it. Workers can be thought of as task servers; send a worker a task and it will send the results back. In this way the transfer of the worker (the executable) is only done once. Once the worker is established, it will be able to process any tasks sent to it. By removing the need to transfer binaries and input files, the time to complete a task will be greatly reduced. +Tasks will be defined as the data required to complete the task. In this way it is different from HTCondor jobs as the task itself isn't capable of doing anything. The task must be given to a worker that knows how to interpret the task and produce a result from it. Workers can be thought of as task servers; send a worker a task and it will send the results back. In this way the transfer of the worker (the executable) is only done once. Once the worker is established, it will be able to process any tasks sent to it. By removing the need to transfer binaries and input files, the time to complete a task will be greatly reduced. -The user will be provided a method of staging the workers (most likely via condor_submit) and submitting tasks. Condor will perform the majority of the work required to manage the tasks. This will include all scheduling duties. Task result processing will occur inside a user created Results Processor. The results processor will be fed the results as they become available. +The user will be provided a method of staging the workers (most likely via condor_submit) and submitting tasks. HTCondor will perform the majority of the work required to manage the tasks. This will include all scheduling duties. Task result processing will occur inside a user created Results Processor. The results processor will be fed the results as they become available. -The user will be responsible for providing the worker executable and the result processing executable. All other duties will be performed by Condor. +The user will be responsible for providing the worker executable and the result processing executable. All other duties will be performed by HTCondor. *Summary* @@ -27,13 +27,13 @@ *: User: Submits as many Tasks as desired and provides a results processor. -*: Condor: Schedules and submits tasks to available workers. +*: HTCondor: Schedules and submits tasks to available workers. -*: Worker: Processes tasks as Condor submits them and returns results. +*: Worker: Processes tasks as HTCondor submits them and returns results. -*: Condor: Receives results and forwards them to the results processor. +*: HTCondor: Receives results and forwards them to the results processor. -*: Results Processor: Processes results from Condor into whatever form the user wishes. +*: Results Processor: Processes results from HTCondor into whatever form the user wishes. The benifit of using HFC is derived form the fact that reducing the amount of data that needs to be sent over the network will decrease task completion time and increase task frequency. @@ -54,7 +54,7 @@ Task Transmission occurs between the shadow of the worker process and the the starter of the worker process. Currently, all communication is initialized in the starter using the NTsender interface. The shadow simply listens for communications from the starter and responds as necessary. -The issue is that as tasks are fed to the shadow by Condor, the shadow should be able to signal the starter that there are tasks available. In practice this has lead to deadlock conditions in which the starter is attempting to pull form the shadow while the shadow is waiting for an ack from the starter. +The issue is that as tasks are fed to the shadow by HTCondor, the shadow should be able to signal the starter that there are tasks available. In practice this has lead to deadlock conditions in which the starter is attempting to pull form the shadow while the shadow is waiting for an ack from the starter. *Task Processing*