Ticket #2530: Xen/KVM modified disk images not transferred back to submit machine

When a Xen or KVM VM shuts down and the vm-gahp's call to virDomainLookupByName() fails with error VIR_ERR_NO_DOMAIN, the vm-gahp removes all files in the execute directory when the vm-gahp itself shuts down. As a result, the modified disk images are never transferred back to the submit machine.

Should the vm-gahp ever remove all files in the execute directory? The starter will decide whether files should be transferred back to the submit machine, and it will clean up the execute directory.
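
A minimal sketch of the behavior this ticket asks for, written as standalone C++ rather than the actual vm-gahp code. All names here (LookupError, VmOutcome, StatusResult, classify_lookup) are hypothetical stand-ins; in particular, LookupError only models the libvirt error seen after a failed virDomainLookupByName() call and does not use libvirt's own types or values. The handling of the system-error case is an assumption for illustration, not a settled decision.

```cpp
// Hypothetical model of the error seen after virDomainLookupByName() fails.
// NoDomain stands in for VIR_ERR_NO_DOMAIN, SystemError for
// VIR_ERR_SYSTEM_ERROR; these are not libvirt's enum values.
enum class LookupError { None, NoDomain, SystemError };

// Hypothetical outcomes the vm-gahp could report for a polled VM.
enum class VmOutcome { Running, Completed, NeedsReview };

struct StatusResult {
    VmOutcome outcome;
    // True would mean the vm-gahp wipes the execute directory, which
    // destroys the modified disk images before the starter can transfer them.
    bool remove_execute_files;
};

// Classify the result of a status poll for a VM that was previously running.
constexpr StatusResult classify_lookup(LookupError err) {
    switch (err) {
    case LookupError::None:
        // Domain still exists: nothing to clean up.
        return {VmOutcome::Running, false};
    case LookupError::NoDomain:
        // The VM shut itself down: report completion, but leave the files
        // so the starter can decide about transfer and cleanup.
        return {VmOutcome::Completed, false};
    case LookupError::SystemError:
        // Assumption: not reported as a successful completion.
        return {VmOutcome::NeedsReview, false};
    }
    // Unknown codes are treated conservatively and never delete files.
    return {VmOutcome::NeedsReview, false};
}
```

The point of the sketch is that no branch sets remove_execute_files, so cleanup stays with the starter regardless of how the lookup failed.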

This was reported by the BNL ATLAS group. They also noticed that condor_submit doesn't consult should_transfer_files and when_to_transfer_output for Xen jobs. It even prints warnings saying that they are not consulted.


Remarks:

2011-Oct-12 15:26:34 by tstclair:
The patch looks OK, but what is the root cause for this use case?


2011-Oct-12 15:36:25 by jfrey:
The BNL ATLAS user was seeing his VM jobs complete after experiencing the VIR_ERR_NO_DOMAIN case in VirshType::Status(). The jobs enter the completed state and leave the queue, but the vm-gahp deletes the disk image files.

Are you saying that the VIR_ERR_NO_DOMAIN case shouldn't occur when a VM shuts itself down? That it's a sign of a failure? If so, we can't treat it as a successfully completed job. We would need to return the job to idle status so it reruns, or put it on hold.

I have a call with the user this afternoon. I'll confirm whether his vms shut themselves down after completing their work.


2011-Oct-12 16:14:15 by tstclair:
My mistake; I was offset while reading. That is a normal workflow.


2011-Oct-12 16:42:35 by jfrey:
I am leery of marking a vm as done in the VIR_ERR_SYSTEM_ERROR case. We may need to handle that case differently.


2011-Oct-17 16:56:39 by jfrey:
The problem with should_transfer_files and when_to_transfer_output is in a child ticket (#2556).

Properties:

Type: defect           Last Change: 2011-Oct-17 16:56
Status: resolved          Created: 2011-Oct-07 14:36
Fixed Version: v070604           Broken Version: v070600 
Priority:          Subsystem: VM 
Assigned To: jfrey           Derived From:  
Creator: jfrey  Rust:  
Customer Group: atlas  Visibility: public 
Notify: tstclair@redhat.com  Due Date: 20111014 

Derived Tickets:

#2556   File transfer attributes ignored for vm universe

Related Check-ins:

2011-Oct-12 11:21   Check-in [27783]: Xen/KVM modified disk images not transferred back to submit machine #2530 When a running Xen or KVM vm is reported 'not found' by libvirt, the vm-gahp reported the vm as completed, but ended up removing all files in the execute directory. It no longer removes the files. ===VersionHistory:Complete=== [...] (By Jaime Frey )

Attachments: