We have found it useful to stress-test a release candidate by giving it to Igor Sfiligoi to run in =glideinWMS=. This exercises a lot of HTCondor features that are important to OSG-type users: glideins, HTCondor-G, the CE schedd, and CCB. Igor suggested that it would be a good idea if we could run this test in-house, and we agreed, so we have set up the machinery to do so. This document describes roughly how we installed it.

{section: High-level View}

*: *ghost pool* - runs light-weight jobs on cluster nodes. The setup is in /unsup/condor/ghost-pool. Some nodes in CHTC also join the ghost pool; these are configured in the CHTC cfengine.

*: *CE: vdt-itb.cs.wisc.edu* - an OSG CE that is part of the ITB (integration test bed). This node will be referred to as the CE. The schedd on this node is part of the ghost pool. The jobmanager is vdt-itb.cs.wisc.edu/jobmanager-condornfslite.

*: *factory: c157.chtc.wisc.edu* - the glideinWMS factory, installed in /data1/itb-stress. It submits glideins to the CE. This node is part of the GLOW pool (despite its domain name) and normally runs HTCondor jobs, so turn off HTCondor on it when doing a large-scale test.

*: *submit: c158.chtc.wisc.edu* - the glideinWMS VO frontend, the user job submission node, and the collector tree for the glidein pool, all installed in /data1/itb-stress. This node also normally runs HTCondor jobs, so turn off HTCondor on it when doing a large-scale test.

{section: glidein factory setup}

Go to c157.chtc.wisc.edu. Install the factory in /data1/itb-stress. Where an unprivileged user account is needed, we are using zmiller. Documentation for the glideinWMS factory may be found here:

http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.v2/install/factory_install.html

{verbatim}
cd /data1/itb-stress
cvs -d :pserver:anonymous@cdcvs.fnal.gov:/cvs/cd_read_only co -r snapshot_091214 glideinWMS

perl-Time-HiRes appears to be already installed as part of the main perl install.

httpd was already installed, so I just enabled it:
/sbin/chkconfig --level 3 httpd on
/etc/init.d/httpd start

wget http://atlas.bu.edu/~youssef/pacman/sample_cache/tarballs/pacman-3.28.tar.gz
tar xvzf pacman-3.28.tar.gz
cd pacman-3.28
. ./setup.sh
cd ..

mkdir osg_1_2_4
cd osg_1_2_4/
export VDTSETUP_CONDOR_CONFIG=/data1/itb-stress/personal-condor/etc/condor_config
export VDTSETUP_CONDOR_BIN=/data1/itb-stress/personal-condor/bin
pacman -get http://software.grid.iu.edu/osg-1.2:client
. ./setup.sh
ln -s /etc/grid-security/certificates globus/TRUSTED_CA
cd ..

yum install rrdtool-python

The M2Crypto python module appears to be already installed:
m2crypto-0.16-6.el5.3.x86_64
This is older than Igor's requirement of 0.17, but I am trying it to see if it works.

Download javascriptrrd-0.4.2.zip and unzip it.
Download flot-0.5.tar.gz and untar it.

I made a service cert for glideinwms and installed it in certs, using the following command:
cert-gridadmin -email dan@hep.wisc.edu -affiliation osg -vo glow -host c157.chtc.wisc.edu -service glideinwms

A proxy for the factory may be made with this command:
grid-proxy-init -cert certs/c157.chtc.wisc.edu_glideinwms_cert.pem -key certs/c157.chtc.wisc.edu_glideinwms_key.pem -hours 100 -out certs/c157.chtc.wisc.edu_proxy.pem
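# Optional sanity check before running the installer: confirm the service cert
# and the freshly generated proxy look right.  (Assumes openssl and
# grid-proxy-info are on the PATH via the OSG client setup.sh sourced above.)
openssl x509 -in certs/c157.chtc.wisc.edu_glideinwms_cert.pem -noout -subject -dates
grid-proxy-info -file certs/c157.chtc.wisc.edu_proxy.pem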
cd glideinWMS/install
chown zmiller /data1/itb-stress
su zmiller
./glideinWMS_install
choose [4] pool Collector
Where do you have the HTCondor tarball? /afs/cs.wisc.edu/p/condor/public/binaries/v7.4/7.4.1/condor-7.4.1-linux-x86_64-rhel5-dynamic.tar.gz
Where do you want to install it?: [/home/zmiller/glidecondor] /data1/itb-stress/glidecondor
Where are the trusted CAs installed?: [/data1/itb-stress/osg_1_2_4/globus/TRUSTED_CA] /etc/grid-security/certificates
Will you be using a proxy or a cert? (proxy/cert) cert
Where is your certificate located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_glideinwms_cert.pem
Where is your certificate key located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_glideinwms_key.pem
What name would you like to use for this pool?: [My pool] ITB-Stress
How many secondary schedds do you want?: [9]

In this version of glideinWMS, condor_config.local needs to be edited to replace HOSTALLOW with ALLOW.

./glideinWMS_install
[2] Glidein Factory
Where is your proxy located?: /data1/itb-stress/certs/c157.chtc.wisc.edu_proxy.pem
Where will you host your config and log files?: [/home/zmiller/glideinsubmit] /data1/itb-stress/glideinsubmit
Give a name to this Glidein Factory?: [mySites-c157] ITB-Stress
Do you want to use CCB (requires HTCondor 7.3.0 or better)?: (y/n) y
Do you want to use gLExec?: (y/n) n
Force VO frontend to provide its own proxy?: (y/n) [y]
Do you want to fetch entries from RESS?: (y/n) [n]
Do you want to fetch entries from BDII?: (y/n) [n]
Entry name (leave empty when finished): ITB
Gatekeeper for 'ITB': vdt-itb.cs.wisc.edu/jobmanager-condornfslite
RSL for 'ITB':
Work dir for 'ITB':
Site name for 'ITB': [ITB]
Should glideins use the more efficient Match authentication (works for HTCondor v7.1.3 and later)?: (y/n) y
Do you want to create the glidein (as opposed to just the config file)?: (y/n) [n]

/data1/itb-stress/glideinWMS/creation/create_glidein /data1/itb-stress/glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml
Submit files can be found in /data1/itb-stress/glideinsubmit/glidein_v1_0
Support files are in /var/www/html/glidefactory/stage/glidein_v1_0
Monitoring files are in /var/www/html/glidefactory/monitor/glidein_v1_0

cd /data1/itb-stress
# make setup.sh set X509_CERT_DIR and X509_USER_PROXY
. ./setup.sh
glideinsubmit/glidein_v1_0/factory_startup start

I added the following entries to glidecondor/certs/condor_mapfile:
GSI "/DC=org/DC=doegrids/OU=Services/CN=glideinwms/c158.chtc.wisc.edu" vofrontend
FS zmiller gfactory

The FS mapping of zmiller to gfactory appears to be necessary to make the factory ad have AuthenticatedIdentity gfactory, which is what I configured the VO frontend to expect. I am not sure why it is using FS to authenticate to the collector, though.

Initially, the glideins were failing on startup because the -dir option to glidein_startup.sh was given no value. We edited the factory xml file and set the directory to '.':

bash-3.2$ vi ../glidein_v1_0.cfg/glideinWMS.xml
bash-3.2$ ./factory_startup reconfig ../glidein_v1_0.cfg/glideinWMS.xml
Shutting down glideinWMS factory v1_0@ITB-Stress:      [OK]
Reconfigured glidein 'v1_0'
Active entries are:
  ITB
Submit files are in /data1/itb-stress/glideinsubmit/glidein_v1_0
Reconfiguring the factory                              [OK]
Starting glideinWMS factory v1_0@ITB-Stress:           [OK]

Then the glideins failed to run because glidein_startup.sh could not find the worker node client software; the ITB gatekeeper does not point OSG_GRID at a shared worker-node-client install. I added a few lines to the top of /data1/itb-stress/glideinsubmit/glidein_v1_0/glidein_submit.sh to hack around this problem:

# Dan hack: fall back to the AFS worker node client if OSG_GRID is not usable
if ! [ -f "${OSG_GRID}" ]; then
    OSG_GRID=/afs/hep.wisc.edu/osg/wnclient
fi

I had to remove the gass cache on the CE to make this change take effect!

Then there was a problem with the glidein HTCondor binaries being incompatible with the SL4 machines in CHTC. To change the binaries used by the glidein, I unpacked the desired HTCondor tarball as /data1/itb-stress/condor_versions/7.4.1_x86_rhel3 and then edited glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml: remove the existing entry in condor_tarballs and replace it with this:
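(A sketch of the entry; the attribute names are assumed from the glideinWMS v2
schema, so check the factory documentation for your version.  The key point is
that base_dir names the unpacked tarball; the reconfig below fills in tar_file.)

   <condor_tarball arch="default" os="default" version="default"
                   base_dir="/data1/itb-stress/condor_versions/7.4.1_x86_rhel3"/>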
[ -f "${OSG_GRID}" ]; then OSG_GRID=/afs/hep.wisc.edu/osg/wnclient fi I had to remove the gass cache on the CE to make this change take effect! Then there was a problem with the glidein HTCondor binaries being incompatible with the SL4 machines in CHTC. To change the binaries used by the glidein: I unpackd the desired HTCondor tarball and named it /data1/itb-stress/condor_versions/7.4.1_x86_rhel3. I then edited glideinsubmit/glidein_v1_0.cfg/glideinWMS.xml. Remove the existing entry in condor_tarballs and replace with this: Then reconfig the factory: glideinsubmit/glidein_v1_0 ./factory_startup reconfig ../glidein_v1_0.cfg/glideinWMS.xml Note that this will slightly rewrite the entry we just added to glideinWMS.xml to insert the tar_file that it generates. {endverbatim} {section: submit node setup} The submit node is c158.chtc.wisc.edu. The documentation for setting up the glideinWMS VO frontend and pool collector are here: http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.v2/install/ {verbatim} This node will run the glidein pool collector, the schedd that submits jobs, and the glideinWMS VO frontend. Install all the same glideinWMS prereqs as on c057. I made a service cert for glideinwms and installed it in certs. (I used the following command.) cert-gridadmin -email dan@hep.wisc.edu -affiliation osg -vo glow -host c158.chtc.wisc.edu -service glideinwms To create the proxy used by the VO frontend: grid-proxy-init -cert certs/c158.chtc.wisc.edu_glideinwms_cert.pem -key certs/c158.chtc.wisc.edu_glideinwms_key.pem -hours 100 -out certs/c158.chtc.wisc.edu_proxy.pem cd glideinWMS/install/ ./glideinWMS_install [4] User Pool Collector [5] User Schedd Please select: 4,5 Where do you have the HTCondor tarball? /afs/cs.wisc.edu/p/condor/public/binaries/v7.4/7.4.1/condor-7.4.1-linux-x86_64-rhel5-dynamic.tar.gz Where do you want to install it?: [/home/zmiller/glidecondor] /data1/itb-stress/glidecondor Do you want to split the config files between condor_config and condor_config.local?: (y/n) [y] y Do you want to get it from VDT?: (y/n) n Where are the trusted CAs installed?: [/etc/grid-security/certificates] Where is your certificate located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_glideinwms_cert.pem Where is your certificate key located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_glideinwms_key.pem How many slave collectors do you want?: [5] 50 What name would you like to use for this pool?: [My pool] ITB-Stress-Submit What port should the collector be running?: [9618] (I had to hand edit condor_config.local to change HOSTALLOW --> ALLOW) Installing user submit schedds Do you want to use the more efficient Match authentication (works for HTCondor v7.1.3 and later)?: (y/n) y How many secondary schedds do you want?: [9] ./glideinWMS_install [7] VO Frontend Where is javascriptRRD installed?: /data1/itb-stress/javascriptrrd-0.4.2 Where is your proxy located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_proxy.pem For security reasons, we need to know what will the WMS collector map us to. 
./glideinWMS_install
[7] VO Frontend
Where is javascriptRRD installed?: /data1/itb-stress/javascriptrrd-0.4.2
Where is your proxy located?: /data1/itb-stress/certs/c158.chtc.wisc.edu_proxy.pem
For security reasons, we need to know what will the WMS collector map us to.
It will likely be something like joe@collector1.myorg.com
What is the mapped name?: vofrontend@c157.chtc.wisc.edu
Where will you host your config and log files?: [/home/zmiller/frontstage] /data1/itb-stress/frontstage
Where will the web data be hosted?: [/var/www/html/vofrontend]
What node is the WMS collector running?: c157.chtc.wisc.edu
What is the classad identity of the glidein factory?: [gfactory@c157.chtc.wisc.edu]
Collector name(s): [c158.chtc.wisc.edu:9618] c158.chtc.wisc.edu:9618,c158.chtc.wisc.edu:9620-9669
What kind of jobs do you want to monitor?: [(JobUniverse==5)&&(GLIDEIN_Is_Monitor =!= TRUE)&&(JOB_Is_Monitor =!= TRUE)]
The following schedds have been found:
 [1] c158.chtc.wisc.edu
 [2] schedd_jobs1@c158.chtc.wisc.edu
 [3] schedd_jobs2@c158.chtc.wisc.edu
 [4] schedd_jobs3@c158.chtc.wisc.edu
 [5] schedd_jobs4@c158.chtc.wisc.edu
 [6] schedd_jobs5@c158.chtc.wisc.edu
 [7] schedd_jobs6@c158.chtc.wisc.edu
 [8] schedd_jobs7@c158.chtc.wisc.edu
 [9] schedd_jobs8@c158.chtc.wisc.edu
 [10] schedd_jobs9@c158.chtc.wisc.edu
Do you want to monitor all of them?: (y/n) y
What kind of jobs do you want to monitor?: [(JobUniverse==5)&&(GLIDEIN_Is_Monitor =!= TRUE)&&(JOB_Is_Monitor =!= TRUE)]
Give a name to the main group: [main]
Match string: [True]
VO frontend proxy = '/data1/itb-stress/certs/c158.chtc.wisc.edu_proxy.pem'
Do you want to use it to submit glideins: (y/n) [y]

/data1/itb-stress/glideinWMS/creation/create_frontend /data1/itb-stress/frontstage/instance_v1_0.cfg/frontend.xml
Work files can be found in /data1/itb-stress/frontstage/frontend_myVO-c158-v1_0
Support files are in /var/www/html/vofrontend/stage/frontend_myVO-c158-v1_0
Monitoring files are in /var/www/html/vofrontend/monitor/frontend_myVO-c158-v1_0

Added /DC=org/DC=doegrids/OU=Services/CN=glideinwms/c157.chtc.wisc.edu to GSI_DAEMON_NAME.

# To start it up:
cd frontstage/frontend_myVO-c158-v1_0/
./frontend_startup start

# Logs are in:
frontstage/frontend_myVO-c158-v1_0/group_main/log
frontstage/frontend_myVO-c158-v1_0/log
{endverbatim}

{section: setting up CE}

The CE on vdt-itb.cs.wisc.edu was already set up except for an HTCondor jobmanager.

{verbatim}
SET UP HTCondor

1) install condor-rhel5.repo into /etc/yum.repos.d
2) yum install condor
3) edit /etc/condor/condor_config (if needed)
4) edit /etc/condor/condor_config.local:
     change CONDOR_HOST = chopin.cs.wisc.edu
     change COLLECTOR_HOST = $(CONDOR_HOST):15396
     change DAEMON_LIST = MASTER, SCHEDD
5) export CONDOR_CONFIG=/etc/condor/condor_config
6) /usr/sbin/condor_master

okay, that's working.

SET UP JOBMANAGER

7) cd /opt/itb
8) DO NOT DO THIS: pacman -trust-all-caches -get http://software.grid.iu.edu/osg-1.1.14:Globus-Condor-Setup
9) instead, do:
   [root@vdt-itb ~]# export CONDOR_CONFIG=/etc/condor/condor_config
   [root@vdt-itb ~]# export VDT_CONDOR_LOCATION=/usr
   [root@vdt-itb ~]# export VDT_CONDOR_CONFIG=/etc/condor/condor_config
10) cd /opt/itb/
11) pacman -install http://vdt.cs.wisc.edu/vdt_2099_cache:Globus-CondorNFSLite-Setup
12) edit $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condornfslite.pm:
      update the condor_config location and the condor_rm and condor_submit paths
13) edit $VDT_LOCATION/edg/etc/grid-mapfile-local, adding the needed DNs
14) run $VDT_LOCATION/edg/sbin/edg-mkgridmap
{endverbatim}
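Once the factory, frontend, and CE are all up, a quick end-to-end check is to submit a handful of vanilla sleep jobs on c158 and watch glideins report back to the glidein pool. The sketch below assumes the layout above; the submit file name, job count, and sleep length are arbitrary. The frontend's job query above only matches vanilla-universe jobs (JobUniverse==5), so no special requirements are needed.

{verbatim}
# check the CE jobmanager directly first (needs a valid grid proxy)
globus-job-run vdt-itb.cs.wisc.edu/jobmanager-condornfslite /bin/hostname

# sleep.sub -- a minimal smoke-test job
universe   = vanilla
executable = /bin/sleep
arguments  = 600
log        = sleep.log
output     = sleep.$(Cluster).$(Process).out
error      = sleep.$(Cluster).$(Process).err
queue 20

# submit to one of the secondary schedds on c158 and watch the pool fill in
# (assumes CONDOR_CONFIG points at the glidecondor install on c158)
condor_submit -name schedd_jobs1@c158.chtc.wisc.edu sleep.sub
condor_q -name schedd_jobs1@c158.chtc.wisc.edu
condor_status -pool c158.chtc.wisc.edu:9618
{endverbatim}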