Ticket #1018: queue super user cannot drop privs

As of 7.2.0, the queue super user has the power to change the value of Owner to any owner who has a job in the queue or who has ever had a job in the queue during the life of the current running schedd. This was added for JobRouter.

The problem with this is that condor_shadow authenticates to the schedd as the condor user (always a super user) when performing set_job_attr operations on behalf of chirp calls made by the job. Therefore, the job can change its own Owner attribute, requeue itself, and gain access to somebody else's account. The only accounts it can reach this way are the "condor" user and accounts that have submitted a job to condor; it cannot gain access to root, because condor will not run jobs as root.

A pithy example of how to reproduce the problem appears in a remark from tannenba below.

The solution we decided to pursue in 7.2.5 and 7.4.1 is to add a qmgmt command that allows the queue super user to reduce its privileges to those of an ordinary user, so that all subsequent operations in the qmgmt session are performed as though by that user (like seteuid() in Unix). The job queue updater used by the shadow (and local starter) does this in all cases immediately after establishing a qmgmt session. This addresses the chirp problem and also reduces the risk of other operations being allowed that should not be.
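
The intended semantics can be illustrated with a small toy model. This is only a sketch and not HTCondor's code: the class QmgmtSession and its method names are hypothetical, and it assumes that dropping privileges is one-way for the rest of the session.

```python
class QmgmtSession:
    """Toy model of a queue-management session; not HTCondor's implementation."""

    def __init__(self, authenticated_user, super_users):
        self.authenticated_user = authenticated_user
        self.super_users = frozenset(super_users)
        # Until privileges are dropped, the session acts as the authenticated user.
        self.effective_owner = authenticated_user

    def is_super_user(self):
        # Privileges follow the effective owner, not the authenticated name.
        return self.effective_owner in self.super_users

    def set_effective_owner(self, owner):
        # Only a session that still has super user power may switch owners.
        # Assumption: the drop cannot be undone within the same session.
        if not self.is_super_user():
            raise PermissionError("privileges already dropped")
        self.effective_owner = owner

    def may_modify(self, job):
        # Ordinary users may only touch jobs they own.
        return self.is_super_user() or job.get("Owner") == self.effective_owner


# A job queue updater would drop privileges right after connecting.
session = QmgmtSession("condor", super_users={"condor"})
session.set_effective_owner("alice")
print(session.may_modify({"Owner": "alice"}))  # alice's own job
print(session.may_modify({"Owner": "bob"}))    # somebody else's job
```

In this model, once the session has switched to "alice", requests coming from the job through chirp are checked as alice rather than as the condor user.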

Changing the shadow to authenticate as the user instead of condor is another possible solution. This is problematic, because not all authentication methods support authenticating as the user from within a condor daemon. For those methods that do support authentication based on priv state (such as FS), the security session cache does not keep track of what priv state was used to create the session, so we would either need to fix that, or change all shadow --> schedd communication to be done as the user (including the daemon keepalive and possibly other things I am forgetting).

Removing the power of the super user to directly change Owner is another option. In fact, with the "seteuid()" qmgmt operation, the JobRouter should no longer need the queue super user to have the power to directly set Owner. However, changing the JobRouter seems like a larger than necessary change in the stable series, so this should probably be done in the current development series, leaving the extra super user power as is in the stable series if not beyond.


Remarks:

2009-Dec-02 18:41:01 by danb:
Todd had an idea: just make Owner immutable. JobRouter does not need to change the owner. It just needs to be able to set it.

I'm planning to make a patch that does that in 7.2 and forward. I already have a patch that implements qmgmt seteuid, and it's not very complex, but given the immutable Owner, I think the seteuid patch can be safely put in 7.5 instead of the stable series.
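
A rough sketch of what "immutable Owner" means, as a toy model only. The class JobAd and its set_attr method are hypothetical names, not the schedd's actual code, and the real patch may differ in details.

```python
class JobAd:
    """Toy job ad in which Owner can be set once but never changed."""

    def __init__(self):
        self.attrs = {}

    def set_attr(self, name, value):
        # Setting Owner on a new ad is allowed; changing an existing Owner is not.
        if name == "Owner" and "Owner" in self.attrs and self.attrs["Owner"] != value:
            raise PermissionError("Owner is immutable once set")
        self.attrs[name] = value


# JobRouter-style use: set Owner while creating the ad.
ad = JobAd()
ad.set_attr("Owner", "alice")

# Chirp-style attack: try to change Owner on an existing ad.
try:
    ad.set_attr("Owner", "condor")
    changed = True
except PermissionError:
    changed = False
print(ad.attrs["Owner"], changed)
```

Other attributes stay writable in this model; only a change to an already-set Owner is refused.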


2009-Dec-03 10:18:58 by danb:
The problem is slightly worse than we previously thought. Even before 7.2.0, chirp can set the owner to the condor user. This is possible because in all existing versions of condor, any user is allowed to set Owner to the authenticated name of the qmgmt connection, and the shadow authenticates as the condor user.

So in all versions of condor that support chirp's set attribute operation (appears to be 6.5.4+), it is possible to become the condor user. In 7.2.0+, it is also possible to become other users who have previously submitted jobs. Since the condor user is always the queue super user (regardless of configuration), becoming the condor user gives one power to modify all other jobs in the queue, effectively allowing one to become those users as well. Therefore, the only additional exposure added in 7.2.0+ is the ability to become users who no longer have jobs in the queue but who submitted jobs to this schedd in the past (since the schedd was last started).
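
The two rules described above can be written down as a small model to make the exposure concrete. This is a sketch of the rules as stated in this remark, not the schedd's actual check; the function may_set_owner and its parameters are hypothetical.

```python
QUEUE_SUPER_USER = "condor"  # per this remark, condor is always a queue super user


def may_set_owner(session_user, new_owner, known_owners, is_7_2_0_or_later):
    """Return True if the (unfixed) rules would allow Owner to become new_owner."""
    # All versions: Owner may be set to the authenticated name of the connection.
    if new_owner == session_user:
        return True
    # 7.2.0+: the super user may also pick any owner seen since the schedd started.
    if is_7_2_0_or_later and session_user == QUEUE_SUPER_USER:
        return new_owner in known_owners
    return False


# The shadow authenticates as condor, so chirp's set_job_attr is checked as condor.
known = {"alice", "bob"}  # owners who have submitted since the schedd started
for target in ("condor", "bob", "carol"):
    before = may_set_owner("condor", target, known, is_7_2_0_or_later=False)
    after = may_set_owner("condor", target, known, is_7_2_0_or_later=True)
    print(target, before, after)
```

In the model, "condor" is reachable under both rule sets, "bob" only under the 7.2.0+ rules, and "carol" (who never submitted a job) under neither.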


2009-Dec-03 15:18:11 by danb:
I have attached two patches, one proposed for stable series 7.2 and 7.4 and the other for the development series 7.5. The stable series patch makes Owner immutable. The development series patch makes the job queue updater (used by shadow and local starter) set its effective qmgmt owner name to the user.


2009-Dec-04 13:16:03 by tannenba:
Pithy example of how to reproduce this issue.

When you first submit the job below, the Owner attribute will be that of the submitting user. However, after the job runs, the Owner attribute will change.

This assumes that you are running bash and that Condor is in your path.

1. First, create the file test.sub:

universe = vanilla
+WantIOProxy = True
executable = $ENV(CONDOR_CHIRP)
arguments = set_job_attr Owner \"$ENV(CONDOR_USER)\"
when_to_transfer_output = ON_EXIT
leave_in_queue = JobStatus==4
queue

2. Then enter the following commands:

bash
export CONDOR_USER=`condor_config_val schedd_log | xargs ls -l  | awk '{print $3}'`
export CONDOR_CHIRP=`condor_config_val libexec`/condor_chirp
condor_submit test.sub


2009-Dec-04 14:56:35 by tannenba:
Finished code review of stable series fix, and pushed it to V7_2-branch, then merged this into V7_4_1-branch, V7_4-branch, and master.


2009-Dec-07 13:25:59 by danb:
The devel-series fix has been committed for 7.5.0 and both JobRouter and Gridmanager have been updated to use the new "set effective owner" qmgmt mode. I am closing this ticket.


2010-Feb-01 13:23:15 by tannenba:
Security embargo period is over, making this information public.

Properties:

Type: defect
Status: resolved
Last Change: 2010-Feb-01 13:23
Created: 2009-Dec-02 13:34
Fixed Version: v070401
Broken Version: v060504
Priority:
Subsystem: Daemons
Assigned To: tannenba
Derived From:
Creator: danb
Rust:
Customer Group: other
Visibility: public
Notify: tannenba@cs.wisc.edu, zmiller@cs.wisc.edu, matt@cs.wisc.edu
Due Date: 20091204

Related Check-ins:

2022-Dec-06 16:30   Check-in [70786]: Merged [70758], Merge pull request #1018 from htcondor/V10_0-HTCONDOR-1468-cmake-3-25 V10_0-HTCONDOR-1468-cmake-3-25 Committer: GitHub (By Tim Theisen)
2009-Dec-04 17:23   Check-in [16505]: Devel series fix for issue described in #1018 (By Dan Bradley)
2009-Dec-04 14:04   Check-in [16495]: See issue #1018. (By Todd Tannenbaum)

Attachments: