Ticket #2530: Xen/KVM modified disk images not transferred back to submit machine
When a Xen or KVM VM shuts down and the vm-gahp's call to virDomainLookupByName() fails with error VIR_ERR_NO_DOMAIN, the vm-gahp removes all files in the execute directory when it shuts down. This means that the modified disk images don't get transferred back to the submit machine.
Should the vm-gahp ever remove all files in the execute directory? The starter will decide if files should be transferred back to the submit machine and clean up the execute directory.
This was reported by the BNL ATLAS group. Another thing they noticed is that should_transfer_files and when_to_transfer_output aren't consulted by condor_submit for Xen jobs. It even prints warnings about not consulting them.
2011-Oct-12 15:26:34 by tstclair:
The patch looks ok, but what is the root cause for this use case?
2011-Oct-12 15:36:25 by jfrey:
The BNL ATLAS user was seeing his VM jobs complete after experiencing the VIR_ERR_NO_DOMAIN case in VirshType::Status(). The jobs enter the completed state and leave the queue, but the vm-gahp deletes the disk image files.
Are you saying that the VIR_ERR_NO_DOMAIN case shouldn't occur when a vm shuts itself down? That it's a sign of a failure? If so, we can't treat it as a successfully completed job. We need to return it to idle status to rerun or put it on hold.
I have a call with the user this afternoon. I'll confirm whether his vms shut themselves down after completing their work.
2011-Oct-12 16:14:15 by tstclair:
My mistake, I misread an offset while reading. That is a normal workflow.
2011-Oct-12 16:42:35 by jfrey:
I am leery of marking a vm as done in the VIR_ERR_SYSTEM_ERROR case. We may need to handle that case differently.
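The distinction discussed above can be sketched as a small decision function. This is only an illustration, not the actual vm-gahp code: `LookupResult`, `VmStatus`, `StatusDecision`, and `decide()` are hypothetical names standing in for the libvirt error codes and the logic in VirshType::Status(). Treating the system-error case as an error rather than a completion is an assumption, since the ticket leaves that handling open.

```cpp
#include <cassert>

// Hypothetical stand-ins for the outcome of virDomainLookupByName().
// NoDomain mirrors VIR_ERR_NO_DOMAIN, SystemError mirrors VIR_ERR_SYSTEM_ERROR.
enum class LookupResult { Found, NoDomain, SystemError };

// Hypothetical VM states as seen by the caller of the status check.
enum class VmStatus { Running, Stopped, Error };

struct StatusDecision {
    VmStatus status;
    bool gahp_removes_files;  // whether the vm-gahp wipes the execute directory
};

// Map a lookup outcome to a status. In no case does the vm-gahp remove
// files here; cleanup and file transfer are left to the starter.
StatusDecision decide(LookupResult result) {
    switch (result) {
    case LookupResult::Found:
        return {VmStatus::Running, false};
    case LookupResult::NoDomain:
        // The VM shut itself down: a normal completion, so keep the disk images.
        return {VmStatus::Stopped, false};
    case LookupResult::SystemError:
        // Assumption: not treated as done, so the job could be rerun or held.
        return {VmStatus::Error, false};
    }
    return {VmStatus::Error, false};
}
```

With this shape, `decide(LookupResult::NoDomain)` reports a stopped VM and leaves the modified disk images in place for the starter to transfer back.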
2011-Oct-17 16:56:39 by jfrey:
The problem with should_transfer_files and when_to_transfer_output is in a child ticket (#2556).
File transfer attributes ignored for vm universe
Check-in: Xen/KVM modified disk images not transferred back to submit machine #2530
When a running Xen or KVM vm is reported 'not found' by libvirt, the vm-gahp reported the vm as completed, but ended up removing all files in the execute directory. It no longer removes the files. ===VersionHistory:Complete=== [...]
(By Jaime Frey)
1258 bytes added by jfrey on 2011-Oct-07 19:40:26 UTC.
Patch to stop vm-gahp from removing all files when a libvirt vm shuts down.