Ticket #2299: memory leaks in negotiator

I noticed an idiom used several times in the HGQ code to check if a parameter is defined:

	if (NULL != param_without_default("GROUP_QUOTA_MAX_ALLOCATION_ROUNDS"))

This leaks memory if the parameter is defined, because the return value of param_without_default() is not freed.

I suggest adding a new function param_is_defined(name). It could probably be implemented more efficiently than by calling param_without_default(), but until there is need to optimize, a simple implementation is probably best.

[Append remarks]

Remarks:

2011-Sep-06 15:42:51 by eje:
REPRO/TEST

There is no functional change made by this fix. The negotiator should work exactly as before. I exercised the fix with the following functional test for HGQ (also attached as a tarball):

Configuration:

NEGOTIATOR_DEBUG = D_FULLDEBUG
SCHEDD_INTERVAL	= 15
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE

NUM_CPUS = 40

# turn on round-robin for this test, since we're testing
# an overlapping-effective-pool scenario
HFS_ROUND_ROBIN_RATE = 1.0

# turn on multiple allocation rounds, since we're testing
# slot rejection scenario as well
HFS_MAX_ALLOCATION_ROUNDS = 3

GROUP_NAMES = a, b, a.a, a.b, b.a, b.b

GROUP_QUOTA_DYNAMIC_a = 0.5
GROUP_QUOTA_DYNAMIC_b = 0.5
GROUP_QUOTA_DYNAMIC_a.a = 0.5
GROUP_QUOTA_DYNAMIC_a.b = 0.5
GROUP_QUOTA_DYNAMIC_b.a = 0.5
GROUP_QUOTA_DYNAMIC_b.b = 0.5

# Have "a" and "b" subtrees manage their quota independently
GROUP_AUTOREGROUP = TRUE
GROUP_AUTOREGROUP_a = FALSE
GROUP_AUTOREGROUP_b = FALSE

Submit file:

universe = vanilla
cmd = /bin/sleep
arguments = 10m
# group "a.a" should be able to use slots unused by "a.b"
# set up requirements to stay out of lower slots (used by b.a and b.b)
Requirements = (SlotID > 10)
+AccountingGroup = "a.a.user"
queue 25
# set up jobs for "a.b" to fail:
Requirements = False
+AccountingGroup = "a.b.user"
queue 25
# set up "b.a" and "b.b" to compete
Requirements = (SlotID <= 10)
+AccountingGroup = "b.a.user"
queue 25
+AccountingGroup = "b.b.user"
queue 25

Expected negotiator results (copied from original test README)

This test is designed to test behavior in two cases of negotiation failure:
1) A group cannot use all the slots it is allocated (in which case other should be able to use surplus)
2) Two (or more) groups are competing for same subset of slots, in which case neither should starve.

Best to begin this by nuking your accounting log.
Install "neg_failures.config" into your configuration, then start up condor.

Submit "neg_failures.submit":
% condor_submit neg_failures.submit
Submitting job(s)....................................................................................................
100 job(s) submitted to cluster 2.

You should see the following distribution: "b.a" and "b.b" are sharing their subset, and neither is starved.  "a.a" is able to u
se the surplus that "a.b" cannot use.  (note, there are 10 of the total 40 slots unused here, by design):

% qvhist AccountingGroup -c 'JobStatus == 2'
     20 a.a.user
      5 b.a.user
      5 b.b.user
     30 total


2011-Sep-23 15:25:10 by danb:
Code review: looks good. Thanks! --Dan
[Append remarks]

Properties:

Type: defect           Last Change: 2011-Sep-23 15:25
Status: resolved          Created: 2011-Jul-13 10:11
Fixed Version: v070604           Broken Version: v070506 
Priority:          Subsystem: Daemons 
Assigned To: eje           Derived From:  
Creator: danb  Rust:  
Customer Group: other  Visibility: public 
Notify: dan@hep.wisc.edu, tstclair@redhat.com, eje@cs.wisc.edu  Due Date:  

Related Check-ins:

2011-Sep-06 15:31   Check-in [27103]: Fixed a memory leak from some calls to param_without_default() by introducing a nicer abstraction param_defined(). ===VersionHistory:Complete=== ===GT:Fixed=== #2299 (By Erik Erlandson )

Attachments: