Ticket #1786: review treatment of undefined policy settings in startd
Miron suggested that we review the treatment of undefined policy expressions, starting with the startd. I have summarized below the current treatment as of 7.5.4.
Meaning of the columns in the following table:
- Unconfigured: treatment when not specified in configuration
- Undefined: treatment when expression evaluates to undefined
- Error: treatment when expression evaluates to error
Meaning of the values in the following table:
- (S): silent treatment; no log message
- Ignore: use more generic policy variable in place of this one
- Error: shut down the startd and log the reason why
| Variable | Unconfigured | Undefined | Error |
| START | Error | False (S) | False (S) |
| PREEMPT | Error | Error | Error |
| SUSPEND | Error | Error | Error |
| CONTINUE | Error | Error | Error |
| WANT_HOLD | False (S) | False | Error |
| WANT_SUSPEND | Error | False (S) | False (S) |
| WANT_VACATE | Error | Error | Error |
| KILL | Error | Error | Error |
| PERIODIC_CHECKPOINT | False (S) | False (S) | False (S) |
| PREEMPT_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
| SUSPEND_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
| CONTINUE_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
| WANT_SUSPEND_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
| WANT_VACATE_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
| KILL_VANILLA | Ignore (S) | Ignore (S) | Ignore (S) |
Prior to 7.4.2, a WANT_SUSPEND expression that evaluated to undefined was treated as an error. This was changed to false because admins were repeatedly surprised to have their startd exit when a job showed up lacking some attribute that the WANT_SUSPEND expression referenced. Discussion concluded that in every known case of this problem the desired behavior was for WANT_SUSPEND to be treated as false, so the code was changed to implement this behavior (#1001).
Proposal
Treat 'unconfigured', 'undefined', and 'error' as semantically equivalent. In every case in the table above where 'undefined' leads to an error, it should instead be equivalent to 'false'. We should, however, provide useful diagnostics. For example, the startd should not log that SUSPEND is false when it is actually 'undefined' or 'error'.
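A minimal sketch of the proposed treatment, written in Python for brevity (the startd itself is C++). The names `PolicyState`, `resolve_policy`, the logger name, and the log wording are all hypothetical and not taken from the HTCondor source. The only point illustrated is that all three non-boolean states map to false while the diagnostic names the actual state.

```python
import enum
import logging

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("startd_policy_sketch")


class PolicyState(enum.Enum):
    """Why a policy expression did not yield a boolean (hypothetical names)."""
    UNCONFIGURED = "unconfigured"  # knob absent from the configuration
    UNDEFINED = "undefined"        # expression evaluated to undefined
    ERROR = "error"                # expression evaluated to error


def resolve_policy(name, result):
    """Return the effective boolean for a policy expression.

    `result` is either a bool or a PolicyState. Every PolicyState is
    treated as false, but the log message reports the real state instead
    of claiming the expression evaluated to false.
    """
    if isinstance(result, PolicyState):
        log.warning("%s is %s; treating it as false", name, result.value)
        return False
    return bool(result)
```

Under these assumptions, `resolve_policy("SUSPEND", PolicyState.UNDEFINED)` returns `False` and logs that SUSPEND is undefined, rather than logging that it is false.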
Remarks:
2010-Nov-30 23:41:49 by matt:
The proposal sounds good, but what about making CONTINUE default to True? Also, please make a policy object to create a central place where code managing these parameters can be documented and handled consistently.
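One possible shape for such a policy object, sketched in Python for brevity. Everything here is hypothetical: the class names, the `default` and `doc` fields, and the two knobs chosen for the example. The `CONTINUE` default of true reflects the suggestion in this remark, not current startd behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyKnob:
    """One policy expression with its default and documentation."""
    name: str
    default: bool
    doc: str


class StartdPolicy:
    """Central registry for policy knobs (hypothetical design)."""

    def __init__(self, knobs):
        self._knobs = {k.name: k for k in knobs}

    def effective(self, name, value=None):
        """Return `value` if it is a bool, else the knob's default.

        `None` stands in for unconfigured, undefined, or error.
        An unregistered knob name raises KeyError.
        """
        knob = self._knobs[name]
        return value if isinstance(value, bool) else knob.default

    def describe(self, name):
        """Return a one-line description of the knob and its default."""
        knob = self._knobs[name]
        return f"{knob.name} (default {knob.default}): {knob.doc}"


# Example registry; the documentation strings are illustrative only.
POLICY = StartdPolicy([
    PolicyKnob("SUSPEND", False, "whether to suspend the running job"),
    PolicyKnob("CONTINUE", True, "whether to resume a suspended job"),
])
```

Keeping the default next to the documentation string means the code that applies a default and the text that describes it cannot drift apart.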
2010-Dec-01 17:08:01 by danb:
I notice that the <x>_VANILLA knobs are barely documented. I propose that we phase them out (warning in 7.5, remove in 7.7). We can ask the users first to see if anybody cares. I would be surprised if anybody does. In my opinion, they just make things more complicated, and it's not clear why knobs exist for just the vanilla universe. I see half-implemented support for similar VM universe knobs.
2011-Jan-27 14:50:47 by danb:
Bulk change of target version from v070505 to v070506 using ./ticket-target-mover.
2011-Feb-01 16:20:02 by tannenba:
Bulk change of target version from v070506 to NULL using ./ticket-target-mover.
Properties:
| Type: | enhance | Last Change: | 2011-Feb-01 16:21 |
| Status: | new | Created: | 2010-Nov-29 18:20 |
| Fixed Version: | | Broken Version: | v070504 |
| Priority: | 4 | Subsystem: | Daemons |
| Assigned To: | | Derived From: | |
| Creator: | danb | Rust: | |
| Customer Group: | other | Visibility: | public |
| Notify: | dan@hep.wisc.edu, miron@cs.wisc.edu, tannenba@cs.wisc.edu, matt@cs.wisc.edu | Due Date: | |