Flow of the TransferDaemon
--------------------------
Start of TD daemon:
+ Globally construct a TransferD object.
MAIN() CONTROL:
libc calls main():
+ Set up daemoncore, call dc_main() which calls main_init()
main_init():
+ Call g_td.init();
+ Parse command line and create object representing command line options.
+ The TransferD object has a "Features" object called m_features. This
holds command line options and all their defaults.
+ The valid command line options are:
--schedd <sinfulstring>
To which schedd should the transferd register itself.
--stdin
Get TransferRequests from stdin.
--id <ascii_key>
Used by the transferd to identify itself to the schedd. A shared
"secret" used only to pair an incoming request of a registering
transferd and the schedd's forking of it. This is because the schedd
can fork more than one at a time and need to know which registration
mmatches which reason the schedd needed it.
--timeout <number of seconds>
If there is no work to do, exit in number of seconds.
--shadow <upload|download> [DEMO option, may not be used in production]
The transferd must get a transferrequest via stdin, and then
will connect to a shadow to perform up/down load as requested.
+ If using stdin, call TransferD::accept_transfer_request().
+ Read the first line of stdin, this represents an "encapsulation
method" which dictates how to process the stuff that comes
after. The function encap_method() converts this line to an
enum. Currently, there is only one: ENCAP_METHOD_OLD_CLASSADS (and
ENCAP_METHOD_UNKNOWN which means I don't know what kind to use).
+ If the encapsulation is validly ENCAP_METHOD_OLD_CLASSADS, then call
the function accept_transfer_request_encapsulation_old_classads().
[accept_transfer_request_encapsulation_old_classads()]:
+ Read the TransferRequest ClassAd from stdin. This details how many
additional job classads are coming down the pipe for this transfer.
+ Create a single TransferRequest object from the above ad.
+ Read multiple work ads (each representing a jobad which the FileTransfer
Object will use as a unit of work for a transfer) and store them into
the TransferRequest object.
+ Since we can only have one transfer request given to the transferd in
the stdin mode, we will generate a "capability" that represents the
TransferRequest and associate the transfer request with the capability.
+ Since we have work to do, ensure our inactivity timer will not be fired.
+ Mark ourselves initialized.
+ Tell the schedd I'm alive and ready: [g_td.register_to_schedd()]
+ Determine the location of the schedd from the Features object.
+ Get my own sinful string.
+ Create a DCSchedd object to the schedd.
+ The DCSched object knows how to register the treansferd via the
schedd.register_transferd(...) call: [DCSchedd::register_transferd()]
+ StartCommand the TRANSFERD_REGISTER command to the schedd.
+ Force authentication
+ Send a registration request classad that includes:
- ATTR_TREQ_TD_SINFUL: sinful string of TransferDaemon
- ATTR_TREQ_TD_ID: id string of secret key given by the schedd on the
command line.
+ The reponse ad from the schedd contains:
- ATTR_TREQ_INVALID_REQUEST which is either TRUE or FALSE
- If FALSE, then ATTR_TREQ_INVALID_REASON is defined.
+ If the registration is:
Not valid: EXCEPT
Valid: Store the socket to the schedd as m_update_sock. This is a connection
that is used to send updates about TransferRequests from the transferd to
the schedd.
+ Return from g_td.init();
+ Call g_td.register_timers():
+ Register: TransferD::process_active_requests_timer()
This timer is commented out.
I apologize because I don't know why, and this is one of the major
means by which the TransferD performs its functions. I think it was
commented out so all of this stuff could be merged into mainline
HTCondor and not affect anything before we could finish it all.
+ Register: TransferD::exit_due_to_inactivity_timer()
This will cause the transferd to exit if ther eis no work to
perform. Coupled with the above, it means in mainline HTCondor, if the
transferd starts, it will automatically exit because it won't process
any of its queued work.
+ Call g_td.register_handlers():
+ Register:
DUMP_STATE -> TransferD::dump_state_handler
This function will emit information to condor_squawk.
+ Register:
TRANSFERD_CONTROL_CHANNEL -> TransferD::setup_transfer_request_handler()
This function manages a permanent connection to the schedd and is the control
channel between the schedd and the transferd. There are scalability
issues with this architecture which were known about, but we decided to
solve later once the architecture of how the schedd communicated to the
transferd was more finalized.
+ Register:
TRANSFERD_WRITE_FILES -> TransferD::write_files_handler();
This function uses some mechanism (like the file transfer object
itself) to write transfered files into a local directory. It could
be the spool, the initialdir, or whatever.
+ Register:
TRANSFERD_READ_FILES -> TransferD::read_files_handler();
This function uses some mechanism (like the file transfer object
itself) to read transferring files from a local directory. It could
be the spool, the initialdir, or whatever.
+ Register a Repaer:
"Reaper" -> TransferD::reaper_handler()
This function handles the exiting of any process which wasn't a process
devoted specifically to reading/writing files.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Handler DUMP_STATE: TransferD::dump_state_handler
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
When this handler is invoked, it constructs a classad and sends it back on
the wire.
The contents of this classad can includes all sorts of status information
or other statistics about the transferd.
Currently, it will create a classad with:
Uid = getuid()
OutstandingTransferRequests = <how many requests left to do>
and send it back to the client which requested it. Afterwards, the stream
is not kept and daemoncore closes the connection.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Handler TRANSFERD_CONTROL_CHANNEL: TransferD::setup_transfer_request_handler()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
ASIDE: The transferd registration processes looks like this:
1. Schedd fork/execs transferd with secret key on command line.
2. TransferD connects back to the schedd with the secret key and closes the
connection.
3. Schedd validates the key and sinful of the transferd with what it is
expecting.
4. The schedd connects back to transferd, this is called the control
connection, and the tether stays active until the transferd exits.
More transfer requests, among other things, can come down the control
connection.
This handler manages step 4:
+ Make sure we're authenticated by calling rsock->triedAuthentication().
+ Register_Socket the control socket to handle any future TransferRequests
from the schedd:
Control Socket Fd -> TransferD::accept_transfer_request_handler();
+ We keep the stream alive, and return.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TRANSFERD_WRITE_FILES -> TransferD::write_files_handler()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
This handler is called when a client wishes to send a TransferRequest to
the TransferD which results in files being written *to* the TransferD.
How the actual files themselves are written depends upon the ATTR_TREQ_FTP
attribute in the request ad from the client. FTP in this attribute means
"File Transfer Protocol", but NOT in the same manner as the ftp program.
The values of that attribute dictate HOW the transferd should accept
and write those files. Currently, the only supported method is the
enumeration FTP_CFTP, which means "File Transfer Protocol: HTCondor File
Transfer Protocol".
+ Make sure we're authenticated by calling rsock->triedAuthentication().
+ Get the fully qualified user name fromthe socket.
+ Grab the ClassAd from the socket, and lookup the capability string.
This capability represents a TransferRequest association, not a FileTransfer
capability as found in a job ad.
+ If the capability is unknown, abort the request with a classad and close the
connection, otherwise, find the TransferRequest object associated with the
capability.
+ Get the ATTR_TREQ_FTP protocol from the request ad.
It can only be FTP_CFTP because I had implemented one protocol so far!
+ Ensure that the request is ONLY of the "upload" variety. Upload in this
context means the transfer is ocurring from the client to the TransferD.
+ If everything is ok, send a success classad back to the client.
[Now we set up a "thread" using DaemonCore CreateThread which
_actually_ performs the file transfer in an asynchronous manner using
the FTP protocol defined.]
+ Create a reaper to handle the call to TransferD::write_files_thread() which
will enact the actual transfer.
+ Create a ThreadArg object which holds the FTP protocol enumeration and
the TransferRequest.
+ Create the TransferD::write_files_thread() thread passing in the ThreadArg
object (so it knows what to do) and associating it with the aforementioned
reaper id.
+ TODO: If this creation fails, I don't do anything! I should.
+ Associate the newly created thread with its reaper id in a data structure so
the transferd can refer to it later when it finishes.
+ Return, closing the stream to the client.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
THREAD TransferD::write_files_thread
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The purpose of this thread, which was created in
TransferD::write_files_handler(), is to make asynchronous the act
of having a client upload a set of files in a TransferRequest to
the TransferD. Otherwise the TransferD would have to block while the
transfer happened and that would suck _a lot_. This function is running
in either another thread, or another process under the TransferD.
[NOTE: This function is barely implemented to just exactly perform the task
at hand. It needs obvious improvement before it can be production
quality.]
+ First thing we do is sleep(1), it is a hack that the comments explain.
+ We pick out the protocol and TransferRequest form the passed in ThreadArg.
+ TODO: We should have different actions based upon the FTP protocol here,
however, we don't and just assume FTP_CFTP in the control flow. This needs
to be fixed for it to be production code.
+ The client is actually waiting to perform the transfer on this socket, so
we'll set the timeout to be very large (8 hours) so the transfer can finish
without timeouts happening.
+ Now, we iterate over the tasks in the TransferReqeust object. Each task is
a job ad that we can instantiate a FileTransfer object with.
For each job ad in the TransferRequest:
+ Create a FileTransfer object.
+ SimpleInit() it.
+ SetPeetVersion() whatever the TranferRequest stated.
+ DownloadFiles() and store all the files in the local file system wherever
the job ad stated it should go.
+ When we're done will all of them it is end of message to the client.
+ Send back a classad to the client that everything is ok.
+ Exit the thread/process with EXIT_SUCCESS. The reaper cares about this.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
REAPER TransferD::write_files_reaper
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
This handler gets called with a write_files_thread image/thread exits. The
main act of this handler is to inform the schedd that the TransferRequest
has been attempted (and hopefully completed!) and what the results of that was.
+ Ensure that the just exited transfer thread that I just caught the reaped
status of is something that I actually asked to happen. This is a consistency
check.
+ Construct a status ad which contains the capability of the TransferRequest
and how the sub-thread/process succeeded or failed.
+ Send the status ad back to the schedd.
+ Destroy the TransferRequest. It is completely finished. If it failed, the
idea is that the schedd will send another one with the same tasks to
be performed.
+ If there are no more requests, then set the inactivity timer.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TRANSFERD_READ_FILES -> TransferD::read_files_handler()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[This handler is a very similar analogue to TransferD::write_files_handler().]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
THREAD TransferD::read_files_thread
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[This thread is a very similar analogue to TransferD::write_files_thread().]
The only _real_ difference is that UploadFiles() is called on the FileTransfer
object.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
REAPER TransferD::read_files_reaper
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[This reaper is a very similar analogue to TransferD::write_files_reaper().]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TIMER TransferD::exit_due_to_inactivity_timer()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+ If m_inactivity_timer == 0, the timer is off and we return.
+ Otherwise, we subtract m_inactivity_timer from now, and see if it is
greater then the timeout specified (on the command line, or by default).
+ If we timeout due to inactivity, we DC_Exit(TD_EXIT_TIMEOUT).from now,
and see if it is greater then the timeout specified (on the command line,
or by default).
+ If not, the timer fires again later...
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
TIMER TransferD::process_active_requests_timer() DEMO code!
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The purpose of this timer is to see if there are any pending pending
TransferRequests and to process them. Currently this code path is demo
code in that it performs the first active TransferRequest it knows about,
and then exits the process.
ASIDE: TransferRequests have two main modes, and a third hacked mode.
The first mode is ACTIVE. This means the transferd itself initiates the
processing of a TransferRequest.
The second mode is PASSIVE. This means that a client initiated the
processing of a TransferRequest.
The hacked one is ACTIVE_SHADOW. This means that the transferd is going
to be initiating the handling of a TransferRequest, except it will do
so only be hard coded communication with a condor_shadow.
WARNING: This timer code is smushy in that it was never fully developed
and/or finished. Coupled with that, there is some demo code in here
that connects to a running shadow to pull files from wherever the shadow
states to get them (which in this case is itself).
+ Iterate over the known about TransferRequests
+ [TREQ_MODE_ACTIVE: No behavior implemented, TODO in source]
+ [TREQ_MODE_PASSIVE: Don't do anything, since a client is managing it]
+ [TREQ_MODE_ACTIVE_SHADOW: Demo code, talk to a shadow and process it]
process_active_shadow_request():
+ Figure out how many jobads (AKA transfer ads) to process. However I only
will be processing the first jobad in the TransferRequest!
+ Create a FileTransfer object. Init() it so it uses Daemoncore.
+ Set up a callback to TransferD::active_shadow_transfer_completed() which
will get called by Daemoncore when the file transfer is completed.
+ Depending upon if the TransferD was invoked with 'upload' or 'download'
on the command line, tell the FileTransfer object to UploadFiles() or
DownloadFiles() respectively. This is a nasty hack since an active
TransferRequest should know if it is uploading or downloading--the actual
TransferRequest object can know this information, but who specified
it wasn't completed.
+ If I found any active TransferRequests, return that I found some.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
CALLBACK TransferD::active_shadow_transfer_completed()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
This is the control continuation from
TransferD::process_active_requests_timer().
+ Ensure the file transfer suceeded, except otherwise.
+ DC_Exit();