Hi there,

I am in the process of setting up HTCondor as a batch queueing/scheduling
system for our Beowulf cluster. But before I can get started I already
ran into a few problems:

[...]

However, the release.tar contains only what looks like static
libraries (lib*.a). Am I missing something?

2) I am installing HTCondor on the master node of the cluster and want to
   use it on the private network 172.16.0.0. The hostname structure is

   172.16.0.1 b001 (master node)

[...]

   used for MPI communication, the second (on the 172.17.0.0 network)
   is used for NFS. Thus I set NETWORK_INTERFACE = 172.16.0.# in each
   of the condor_config.local files.
   Does it create a problem with HTCondor that NFS traffic (including
   exporting of the HTCondor home directory and the HTCondor binaries and
   libraries) is actually on a different interface?

Thanks a lot for your help in advance!
Also, thanks for making HTCondor available.

Regards,
Martin

[...]

Hello,

> I am in the process of setting up HTCondor as a batch queueing/scheduling
> system for our Beowulf cluster. But before I can get started I already
> ran into a few problems:
>
[...]

No. We statically link in all of our own libraries, hence the size of the
executables.

> 2) I am installing HTCondor on the master node of the cluster and want to
> use it on the private network 172.16.0.0.
> The hostname structure is
>
> 172.16.0.1 b001 (master node)

[...]

> used for MPI communication, the second (on the 172.17.0.0 network)
> is used for NFS. Thus I set NETWORK_INTERFACE = 172.16.0.# in each
> of the condor_config.local files.
> Does it create a problem with HTCondor that NFS traffic (including
> exporting of the HTCondor home directory and the HTCondor binaries and
> libraries) is actually on a different interface?

Consensus amongst the development team is no, it shouldn't create any
[...]

> Hello,
>
> > I am in the process of setting up HTCondor as a batch queueing/scheduling
> > system for our Beowulf cluster. But before I can get started I already
> > ran into a few problems:
> >
[...]

> No. We statically link in all of our own libraries, hence the size of the
> executables.

Doesn't that mean that every time a job is checkpointed all the HTCondor
libraries are dumped into the checkpoint directory for each and every
job, over and over again, leading to an unnecessarily high demand for
disk space? I would prefer adding /usr/local/condor/lib to
/etc/ld.so.conf (just a thought - I have not actually estimated the
typical disk space needs on my cluster in the checkpoint directory).

> > 2) I am installing HTCondor on the master node of the cluster and want to
> > use it on the private network 172.16.0.0. The hostname structure is
> >
> > 172.16.0.1 b001 (master node)

[...]

As far as I understand I can set FILESYSTEM_DOMAIN to any string and as
long as it is the same on all hosts it will work.
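As a sketch of that idea (the domain value below is an arbitrary illustrative label, not an HTCondor default), each node's config could contain:

```
## Any string works here, as long as every host that shares the
## NFS-exported filesystem uses the identical value (illustrative):
FILESYSTEM_DOMAIN = beowulf.cluster
```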
I could set UID_DOMAIN to *. Then I could firewall all HTCondor ports
on the master node so that only machines on the private network can
talk to the HTCondor daemons. This seems to be desirable anyway.
Which HTCondor ports would I have to block in that case?
Anything else that I must consider (with respect to security), if I set
UID_DOMAIN = *?

> > used for MPI communication, the second (on the 172.17.0.0 network)
> > is used for NFS. Thus I set NETWORK_INTERFACE = 172.16.0.# in each
> > of the condor_config.local files.
> > Does it create a problem with HTCondor that NFS traffic (including
> > exporting of the HTCondor home directory and the HTCondor binaries and
> > libraries) is actually on a different interface?
>
> Consensus amongst the development team is no, it shouldn't create any
[...]

I'll definitely let you know. But for now I ran into another problem:
SMP configuration.
All of the machines in my cluster are identical dual-processor boxes.
I understand that by default HTCondor would split those boxes up into
two virtual machines with an equal amount of memory. This is not really
what I would like to do. I'd rather make all of the memory available to
the first job and then make [total_memory - (memory taken by first job)]
available to the second machine. Is that possible?

[...]
it into two parts: 96 CPUs dedicated to MPI (and possibly PVM) jobs and
96 CPUs for everything else. Right now I do not have enough MPI jobs to
fill the MPI processors, but I have (at least sometimes) more than 192 serial
jobs. I'd like to configure HTCondor such that a serial job goes preferably
on those 96 processors that are not dedicated to MPI jobs.
If those are already busy, serial jobs go onto the MPI processors as
well. When an MPI job claims those processors, the serial jobs vacate
immediately
[...]
3.11.11.5? Or can I actually use "START = True" everywhere, thus just
define it in the global config file?

Also, is there a way of forcing users to submit their jobs to HTCondor
instead of starting them directly in the background? I.e., can HTCondor
stop jobs that are not started via HTCondor?

Any pointers are appreciated. Thanks a lot for your help!

[...]
To: condor-admin@cs.wisc.edu
Subject: Re: [condor-admin #4081] HTCondor on a beowulf
Date: Tue, 30 Jul 2002 16:39:28 -0500
From: Derek Wright

> Doesn't that mean that every time a job is checkpointed all the HTCondor
> libraries are dumped into the checkpoint directory for each and every
> job, over and over again, leading to an unnecessarily high demand for
> disk space?

basically, yes. however, there is state in the HTCondor libraries
linked in with your code, specific to each job, that can't be shared
across the different jobs.

[...]
> my cluster in the checkpoint directory).

if we put in a lot of effort, we might be able to make a shareable
section of the HTCondor libraries and put that in a dynamic library.
however, the problems associated with checkpointing dynamically linked
jobs (particularly on linux) were so awful that we decided it wasn't
worth our effort to continue to support it. 80 gig drives are so
[...]

sort of. that's not really what you want, though.

as luck would have it, there's active work on the HTCondor team to
support pools without fully qualified hostnames. however, i'm not
involved with that, so i put that question of yours into a new message
in our tracking system. i'll assign it to someone else, and let them
[...]

To: condor-admin@cs.wisc.edu
Subject: Re: [condor-admin #4081] HTCondor on a beowulf
Date: Tue, 30 Jul 2002 19:10:49 -0500
From: Derek Wright

> SMP configuration.
> All of the machines in my cluster are identical dual-processor boxes.
> I understand that by default HTCondor would split those boxes up into
> two virtual machines with an equal amount of memory. This is not really
> what I would like to do. I'd rather make all of the memory available to
> the first job and then make [total_memory - (memory taken by first job)]
> available to the second machine. Is that possible?

no. i agree with you, that's how i think it should work, too.
however, when i was implementing the smp support in HTCondor, i was told
it had to work the way it does, so that's how it is. here's the deal:
you can partition the memory however you want between the two cpus
(50/50, 73/27, whatever you want), but you have to do it ahead of
[...]

> but I have (at least sometimes) more than 192 serial jobs.

no problem.
this mix of dedicated parallel jobs and "opportunistic"
serial jobs is exactly the kind of environment HTCondor is set up to
handle.

> I'd like to configure HTCondor such that a serial job goes preferably
> on those 96 processors that are not dedicated to MPI jobs.

you would do this with the "Rank" expression in the job's submit
[...]

> processors jobs of the user currently occupy, i.e., the "history"
> should not play a role.

that's a different issue. this is the responsibility of the HTCondor
"accountant", which lives inside the condor_negotiator daemon. the
knob you want to turn is called "PRIORITY_HALFLIFE". think of your
user priority as a radioactive substance. :) consider a priority that
[...]

> Quite honestly: I am kind of lost after having read through various
> chapters of the manual.

yeah, HTCondor is incredibly flexible and, therefore, incredibly
complicated to configure and document. i'm sorry. we get this stuff
wrong in our own pool on occasion, even with the developers editing the
config files. :)

[...]

> Also I guess that I should set KeyboardIdle to True and KeyboardBusy
> and ConsoleBusy to False, correct?

KeyboardIdle and ConsoleIdle are computed for you by HTCondor. they're
just a measure of how long it's been since someone logged into a tty,
or touched the physical console. you can't set them to anything.
however, all that really matters are the "policy expressions" (start,
[...]
"START = True" in your global config file and that'll be the behavior
on all the machines in the pool.
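The always-run behavior described here can be sketched as a global policy block using the standard startd policy expressions (the particular combination of values below is just an illustration of one reasonable setup):

```
## Always-run policy: start jobs unconditionally and never suspend,
## preempt, or kill them because of keyboard/console activity.
START    = True
SUSPEND  = False
CONTINUE = True
PREEMPT  = False
KILL     = False
```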
> Also, is there a way of forcing users to submit their jobs to HTCondor
> instead of starting them directly in the background? I.e., can
> HTCondor stop jobs that are not started via HTCondor?

sort of, but it's quite convoluted at this point. your best bet for
now is to just have a cron job run every minute or so, check for
[...]

i hope this helps. if you have further questions about any of this,
reply to this message. if you have other questions or comments about
HTCondor, just send a new note to condor-admin.

thanks,
-derek
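The cron-job approach above is only hinted at in the reply; as a rough, hypothetical sketch (the function name, input format, and enforcement step are all assumptions, not HTCondor features), the filtering part of such a per-node watchdog might look like:

```shell
#!/bin/sh
# Hypothetical watchdog sketch: print the PID of every process whose
# parent command is not condor_starter, i.e. jobs that were started
# outside HTCondor.  It reads "pid parent_command" lines on stdin so
# the filter itself is easy to test; a real crontab entry would feed
# it a suitably formatted ps(1) listing.

find_rogues() {
    # $1 = pid, $2 = parent command; keep only non-condor_starter children
    awk '$2 != "condor_starter" { print $1 }'
}

# Enforcement (site policy, deliberately commented out):
# find_rogues < "$PS_SNAPSHOT" | xargs -r kill
```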