Student Project Ideas
This page contains an unsorted list of brainstorm ideas that could be good candidates for student hourly work.
This search may turn up other tasks.
- I think it will be useful to start a document that records wisdom/insight about the context in which expressions are evaluated in HTCondor. A good assignment for a student would be to go through the expressions we have and record the context in which they are evaluated. -Miron 9/21/09
- #173 Have a vanilla job "flavor" for running Excel as a job on Win32. (talk to Todd or Ben if you don't understand this) (Current Status).
- #617 Use getopt() everywhere to parse creepy command line options consistently! Whoot!
- #789 Go through the Coverity bugs and fix the obvious ones.
- Add support for standard universe to the new starter/shadow, then dump the old one. This will remove all kinds of inconsistencies between vanilla and standard universe, and speed up future development, as we won't have to add code in multiple places. (NOTE: perhaps this isn't such a great idea anymore if DMTCP adoption goes as hoped)
- There are a number of cases where condor_q -analyze doesn't know why jobs can't run (negotiation cycle hasn't happened yet, job retirement in progress, parallel universe jobs, etc.). It would be nice to add smarts to -analyze to catch these cases.
- Add stress tests to the HTCondor test suite. It would be great if we had tests that stressed the individual daemons, without trying to run jobs through the system. This way, we'd find memory leaks and other embarrassing problems before our users did.
- #596, #597 Currently, CondorView is difficult to install and maintain. This is because it is written in perl, java, javascript and C++. I believe we could implement something much better in javascript + SOAP to the collector, and have it installed by default with HTCondor. This would just work without any tinkering around, installing apache, downloading jar files or anything other than just a normal HTCondor install.
- Make metronome scalable. There's been a bunch of times where a test fails intermittently. We've got plenty of machines. I'd love to be able to submit 100 test runs overnight and look at the results in the morning. Currently, metronome can't do this.
- Add more effort to writing tests, incl unit tests. I.e. more than just Bill.
- Cool/interesting Job Sandbox transfer plugins: BitTorrent, gridftp, ...
- Work on the Stork supplement work.
- Help wrapping new classads w/ old classad API, and/or make a shared library.
- Add private-to-private proxying in the CCB broker.
- Explore use of Parrot w/ HTCondor.
- Explore a secure HTCondor setup by default, with use of MiniCA and/or online CA.
- Super CHTC turnkey package for installation on Windoze machines around campus, using combo of VirtualBox + HTCondor, in collaboration with Marquette Univ and/or Purdue.
- Quill to MySQL (or SQLite?)
- Test HTCondor w/ out of disk space.
- Get a distcc setup going, put some abstractions into Metronome so you can automatically use it
- There are some memory-testing scripts running on GLOW using Hawkeye/startd cron. Include those in the default install of HTCondor
- Do a clipped port to Solaris 10/x86
- Prototype a 'condor_job_backup' command (not sure you can finish this in a few weeks)
- a .NET/Managed code universe. Run the same job on the CLR on Win32 or on Mono (or, think real hard about how to abstract the Java universe, .Net universe, Python universe, etc)
- Replace the checkpoint server protocol with HTTP GETs and PUTs
- Create a couple of "HTCondor Technical Articles", in the 5-10 page range. Some ideas:
- A document exploring all of the issues involved in running HTCondor in a root squash environment. Document all of the different user ids HTCondor uses and what it does while it is in each of those UIDs
- A document that explores HTCondor and DNS - figure out all of the places HTCondor requires a working DNS, look at where it doesn't need it, and what you can override. Don't be afraid to start at square one with these documents, and explain what root squash is or what reverse DNS is.
- #152 move git to pinguino
- #653: Report operating system, distribution, and version in world ads - Lots of straightforward work detecting different distributions.
- Whenever a user reports a problem or a bug, there's a standard bunch of questions we have: what are your log files, configuration, condor_q -analyze output, etc. The VDT has a nice script which automatically discovers a bunch of stuff about a system, and generates a report the user can mail in with their bug report. We should do something similar for HTCondor, and ask the VDT people if we can integrate our script into theirs, or if we can share parts of their script.
- Remove code related to CVS and the defunct git repositories (CONDOR_EXT, CONDOR_TEST_LRG, CONDOR_TEST_CNFDTL) from HTCondor's NMI build scripts.
- Seamless network migration for virtual machines.
If a virtual machine is using networking, and it gets checkpointed and migrated, its network connections can be dropped. Likewise, allowing in-bound connections is a tricky problem with virtual machines. This is an early-stage design proposal of mine. It consists of two parts: a homing daemon, and tap interfaces on execute hosts.
- The homing daemon is basically an HTCondor job which runs as a NAT. These HTCondor jobs can execute only on select hosts (as set up by the pool admin), maybe marked by a specific attribute. (Problem: what if the homing daemon is checkpointed and migrated? Oops!)
- On the execute host for virtual machines, the NIC of the virtual machine should be connected to a tap interface on the host. The starter sets up a tunnel connecting this tap interface and the homing daemon.
- when a virtual machine job is matched, the negotiator fires a homing daemon job (unless one's already running), and passes the configs to the starter / adds it in the classad.
- load balancing can be done by starting more homing daemons when more vm jobs are encountered.
- a DHCP server could also be run as a part of the homing daemon. MAC addresses could be made a part of the VM universe job. Thus if a VM job checkpoints and migrates, when it comes up again, it will get the same IP address.
- the homing daemon could just run as a switch, and VM universe jobs could specify a VLAN. In this case, homing daemons can be used to achieve intranetworking for VM universe jobs.
(vmathew)
- Create virtual machine add-ons (guest additions) for the standard VM hypervisors (VMware, VirtualBox, etc.) to communicate with the HTCondor instance on the execute host. This would be an HTCondor-guest-addition.iso CD image that installs a service inside the guest (or maybe it's okay to require compiling and installing it), so we need support for common guest operating systems.
Maybe, if we can use these guest additions to allow VMs to communicate with the outside world via the HTCondor API, then we may not need to support more than NAT networking. That depends fully on how rich the guest additions can make networking.
(vmathew)
- Currently, the architecture of the submit machine is taken as the required architecture for VM universe jobs. However, if I create a 32-bit VM on a 64-bit machine, that VM can just as well run on a 32-bit machine with VMware. Yet the architecture requirement prevents such matches. This needs to be fixed.
(vmathew)
- I see a lot of D_ALWAYS used in the VM universe code. Evaluate these and change them to something appropriate. Maybe create a debugging mode for the VM universe? (vmathew)
- support for mobility:
This is an alternative to the seamless network migration idea that I've written down above. Basically, the way I deploy a new VM HTCondor node so that it joins the pool and becomes a worker node is by using NAT and CCB. If the virtual machine is checkpointed and migrated, both the host machine IP address and the guest machine IP address (assigned by VMware with NAT) change, thereby changing the identity of the host. Likewise, it has also been observed that VMware assigns a different IP address for NAT when the DHCP lease expires for the guest. This too causes the identity of the machine to change from HTCondor's perspective.
Hence, we need a mechanism by which host machines can be uniquely identified in HTCondor. Maybe the CM generates a unique host identifier for each host and signs it cryptographically, thereby making a sufficiently persistent host identification for the purposes of VM migration. (vmathew)
- #2416: Remove support for antique Visual Studio C++. Straightforward task, requires basic C++ knowledge (preprocessor), ability to compile and run the regression tests on Windows.
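As a starting point for the "support for mobility" item above, here is a minimal Python sketch of a CM-issued host identifier that survives IP address changes. Everything in it is an assumption made for illustration: the names `CM_SECRET`, `issue_host_id`, and `verify_host_id` are hypothetical, nothing here is an existing HTCondor API, and an HMAC with a secret held by the CM stands in for whatever signing scheme (for example, public-key signatures) a real design would choose.

```python
import hashlib
import hmac
import uuid

# Hypothetical secret known only to the central manager (CM).
CM_SECRET = b"example-cm-secret"


def issue_host_id(secret: bytes = CM_SECRET) -> str:
    """Create a random host identifier and attach an HMAC-SHA256 tag.

    The returned token has the form '<host id>.<hex tag>' and does not
    depend on the host's current IP address.
    """
    host_id = uuid.uuid4().hex
    tag = hmac.new(secret, host_id.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{host_id}.{tag}"


def verify_host_id(token: str, secret: bytes = CM_SECRET) -> bool:
    """Return True if the token carries a valid tag for its host id."""
    host_id, sep, tag = token.partition(".")
    if not sep:
        return False  # malformed token: no separator
    expected = hmac.new(secret, host_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes to avoid leaking tag prefixes.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    token = issue_host_id()
    print("issued token valid:", verify_host_id(token))
```

In this sketch the guest would store the token and present it after each migration or DHCP lease change, so the CM could recognize the same machine under a new address. Because HMAC is symmetric, only the CM can check the token; a design in which other daemons verify identities would need a public-key signature instead.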