Ticket #3288: RFE: expose general_stats 'recent' ring buffer quantization to configuration

The ring buffer used to maintain RecentXxx statistics uses a hard-coded quantization level. To improve quality of statistics as viewed from tools such as cumin, it will be useful to expose this quantization to configuration and allow it to be given increased time resolution.

My plan is to implement a new config param STATISTICS_WINDOW_QUANTUM, and update stats logic to obtain its quantization level from this variable (when it is present).

Remarks:

2012-Oct-23 15:53:36 by johnkn:
The SCHEDD has 2 collections of statistics, the SCHEDD stats and the DC stats. It might be wise to make the quantum individually changeable for these two sets.

so maybe 2 config knobs.

use SCHEDD_STATISTICS_WINDOW_QUANTUM if it is defined, if not, use STATISTICS_WINDOW_QUANTUM.

or

use DC_STATISTICS_WINDOW_QUANTUM if it is defined, if not, use STATISTICS_WINDOW_QUANTUM.

Note also that a timer needs to be running at the quantum for DC stats (the SCHEDD stats are opportunistic and accumulate in CountJobs and when a shadow exits).
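
A minimal sketch of the timer-driven versus opportunistic distinction, in Python for illustration only (this is not the HTCondor code, and `slots_to_advance` is a hypothetical helper). A timer calls it once per quantum; an opportunistic caller invokes it whenever it happens to run and may find that several quanta have elapsed at once.

```python
def slots_to_advance(last_advance, now, quantum):
    """Return (slots, new_last_advance) for a ring buffer with the given quantum.

    Hypothetical helper: 'last_advance' and 'now' are timestamps in seconds.
    Only whole quanta are consumed, so leftover time carries into the next call.
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")
    elapsed = now - last_advance
    if elapsed < quantum:
        # Not enough time has passed to rotate even one slot.
        return 0, last_advance
    slots = int(elapsed // quantum)
    return slots, last_advance + slots * quantum
```

With a 20-second quantum, a caller that last advanced at t=0 and next runs at t=45 would rotate two slots and record t=40 as the new advance time.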


2012-Oct-23 16:00:20 by tstclair:
So by default the param system already has the DAEMON.PARAM override, e.g. SCHEDD.STATISTICS_WINDOW_QUANTUM would already work.

I'd have to look about DC though.
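
For illustration, the daemon-prefixed override could be modeled roughly as below. This is a Python sketch over a plain dict, not the real param system; `lookup_param` is a hypothetical name.

```python
def lookup_param(config, name, subsys=None):
    """Look up 'name', preferring a 'SUBSYS.name' entry when one exists.

    Hypothetical model: 'config' is a dict of raw string values.
    Returns None when neither form is configured.
    """
    if subsys is not None:
        scoped = f"{subsys}.{name}"
        if scoped in config:
            return config[scoped]
    return config.get(name)
```

With both STATISTICS_WINDOW_QUANTUM = 240 and SCHEDD.STATISTICS_WINDOW_QUANTUM = 20 configured, a lookup on behalf of the SCHEDD would see 20 under this model, while other daemons would see 240.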


2012-Oct-23 16:17:59 by eje:
For dealing with named statistics subsets, we might adopt the acct-group convention.

So, to alter the 'DC' stats, on the schedd only, you'd do:

SCHEDD.STATISTICS_WINDOW_QUANTUM_DC = 10

A side benefit would be it would allow me to just focus on STATISTICS_WINDOW_QUANTUM, without precluding named-subset support under that convention, although adding that as well might not be so much effort.


2012-Oct-23 17:17:59 by eje:
One thing about using <name>_STATISTICS_WINDOW_QUANTUM, is that:

SCHEDD_STATISTICS_WINDOW_QUANTUM

would be equivalent to:

SCHEDD.STATISTICS_WINDOW_QUANTUM

under the older (although deprecated) config semantics. In that regard I think it could confuse users (it continues to confuse me). STATISTICS_WINDOW_QUANTUM_<name> should reduce confusion, since it works like the acct group convention.


2012-Oct-23 17:32:13 by johnkn:
No, you're confusing the names of daemons, and the names of statistics collections. If you use SCHEDD.STATISTICS_WINDOW_QUANTUM, then you are setting the quantum for ALL statistics collections in the SCHEDD.

But the SCHEDD has two collections, the so-called "SCHEDD" collection, and the "DC" collection. We want to be able to set the quantum for ONE of these collections in the SCHEDD-daemon without changing the quantum for all collections in the SCHEDD-daemon.

Thus SCHEDD_STATISTICS_WINDOW_QUANTUM is not the same as SCHEDD.STATISTICS_WINDOW_QUANTUM. You could call it SCHEDULER_STATISTICS_WINDOW_QUANTUM and DAEMONCORE_STATISTICS_WINDOW_QUANTUM instead of my original suggestion if that makes it less confusing.


2012-Oct-23 17:49:31 by eje:
I understand the distinction, I just think that using this:

STATISTICS_WINDOW_QUANTUM_SCHEDD

makes the distinction more clear, and has the added benefit of being consistent with our already-existing naming convention for acct group variables.


2012-Oct-24 06:45:51 by matt:
+1 eje's STATISTICS_WINDOW_QUANTUM approach

And <name> -> <collection>, because it is a new concept and should be differentiated.

<collection> should not be allowed to share a value with <subsys>, e.g. both SCHEDD, to avoid confusion.

We already have confusion between <subsys> and <daemon>, let's not introduce another undifferentiated concept to the mix.

IMHO, having <collection> is premature.


2012-Oct-24 08:57:12 by tstclair:
Say it's 10 years from now, how would one iterate the collection space in the off chance that there are multiple collections within a single daemon?

+1 on suffix.


2012-Nov-06 17:07:47 by eje:
Proposed implementation on topic branch: V7_9-gt3288-stats-window-quantum

the config var STATISTICS_WINDOW_QUANTUM (which currently defaults to 240) is the global default quantum for all stats collections.

There are also: STATISTICS_WINDOW_QUANTUM_SCHEDD (synonym: _SCHEDULER), and STATISTICS_WINDOW_QUANTUM_DC (synonym: _DAEMONCORE) that can override the global default for respective collections.
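
The lookup order described above can be sketched as follows. This is an illustrative Python model, not the code on the topic branch: `resolve_quantum` is a hypothetical name, and the order in which the two synonyms are tried is an assumption.

```python
GLOBAL_KNOB = "STATISTICS_WINDOW_QUANTUM"
DEFAULT_QUANTUM = 240  # current global default, per the remark above

# Collection name -> accepted suffixes (short name first; order is an assumption).
COLLECTION_SUFFIXES = {
    "SCHEDD": ("SCHEDD", "SCHEDULER"),
    "DC": ("DC", "DAEMONCORE"),
}

def resolve_quantum(config, collection):
    """Return the quantum for a stats collection from a dict of raw settings."""
    for suffix in COLLECTION_SUFFIXES.get(collection, (collection,)):
        knob = f"{GLOBAL_KNOB}_{suffix}"
        if knob in config:
            # A collection-specific knob overrides the global default.
            return int(config[knob])
    # No collection-specific value: fall back to the global knob or its default.
    return int(config.get(GLOBAL_KNOB, DEFAULT_QUANTUM))
```

For example, with STATISTICS_WINDOW_QUANTUM = 20 and STATISTICS_WINDOW_QUANTUM_DAEMONCORE = 5, this model gives 5 for the "DC" collection and 20 for the "SCHEDD" collection.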

You can test the new feature, with visible 'saw-tooth' every 20 seconds using this configuration:

STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
# one minute window for Recent* stats
STATISTICS_WINDOW_SECONDS = 60
# 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
STATISTICS_WINDOW_QUANTUM = 20

Kick off a script that submits one job per second (or every couple seconds, as long as it's a regular interval << 20 seconds).
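
One way such a script might look, sketched in Python. The submit file name `sleep.sub` is hypothetical, and the pacing helper is only an illustration; any loop that submits at a steady interval well under 20 seconds would do.

```python
import subprocess
import time

def submit_at_fixed_rate(submit, count, interval_seconds=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Call submit(i) 'count' times, one call per interval, without drift."""
    next_due = clock()
    for i in range(count):
        submit(i)
        # Schedule against the ideal timeline so slow submits do not accumulate lag.
        next_due += interval_seconds
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)

def condor_submit(_index, submit_file="sleep.sub"):
    """Submit one job; 'sleep.sub' is a hypothetical submit description file."""
    subprocess.run(["condor_submit", submit_file], check=True)

if __name__ == "__main__":
    submit_at_fixed_rate(condor_submit, count=300, interval_seconds=1.0)
```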

Saw-toothing is visible in tools like cumin, or I also tested using 'watch' on a 'recent' stat from "SCHEDD" collection and one from "DC" collection:

watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'

In the above, both of the statistics drop off every 20 seconds, then begin to grow again.

If you change STATISTICS_WINDOW_QUANTUM to '1', and restart the scheduler, then you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
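
The saw-tooth can be reproduced with a simplified model of a quantized ring buffer. The Python sketch below is an assumption-laden illustration, not the general_stats implementation: `RecentCounter` and `simulate` are hypothetical names, the buffer holds window/quantum slots (rounded up), and a "Recent" value is taken to be the sum of all slots.

```python
from collections import deque

class RecentCounter:
    """Simplified model of a 'Recent' statistic backed by a quantized ring buffer."""

    def __init__(self, window_seconds, quantum_seconds):
        if quantum_seconds <= 0:
            raise ValueError("quantum must be positive")
        self.quantum = quantum_seconds
        # Number of slots needed to cover the window, rounded up.
        slots = max(1, -(-window_seconds // quantum_seconds))
        self.slots = deque([0] * slots, maxlen=slots)
        self.current_tick = 0

    def advance_to(self, now):
        """Rotate one slot per quantum boundary crossed since the last call."""
        tick = now // self.quantum
        steps = tick - self.current_tick
        if steps > 0:
            for _ in range(min(steps, len(self.slots))):
                self.slots.append(0)  # maxlen discards the oldest slot
            self.current_tick = tick

    def add(self, now, amount=1):
        self.advance_to(now)
        self.slots[-1] += amount

    def recent(self, now):
        self.advance_to(now)
        return sum(self.slots)

def simulate(window_seconds, quantum_seconds, duration_seconds):
    """Record the recent value once per second while adding one event per second."""
    counter = RecentCounter(window_seconds, quantum_seconds)
    samples = []
    for t in range(duration_seconds):
        counter.add(t)
        samples.append(counter.recent(t))
    return samples
```

In this model, with a 60-second window and one event per second, a quantum of 20 makes the value climb from 41 to 60 and then drop back each time a 20-second slot is discarded, while a quantum of 1 holds steady at 60 once the window has filled.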


2012-Nov-08 15:26:20 by tstclair:
please review V7_9-gt3288-stats-window-quantum


2012-Nov-12 13:58:06 by johnkn:
Why is the type of STATISTICS_WINDOW_QUANTUM_DC and others set to string?


2012-Nov-12 14:34:54 by eje:
I set STATISTICS_WINDOW_QUANTUM_[name] to string because the 'int' wouldn't let me leave it with no default value. The name-specific variations need to have no default so they aren't active when the admin doesn't configure a value.
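
A rough Python sketch of the idea (not the actual param table or C++ code; `parse_optional_quantum` is a hypothetical name): the raw value is kept as a string so that "unset" stays distinguishable from any integer, and it is converted only when present.

```python
from typing import Optional

def parse_optional_quantum(raw: Optional[str]) -> Optional[int]:
    """Convert a string-typed knob to an int, or None when it is not configured."""
    if raw is None or raw.strip() == "":
        # No value configured: the name-specific override is inactive.
        return None
    value = int(raw.strip())  # raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError("quantum must be a positive number of seconds")
    return value
```

A caller would fall back to the global STATISTICS_WINDOW_QUANTUM value whenever this returns None.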


2012-Nov-12 14:45:27 by tstclair:
That is a general failing in the param system, which is why lots of the defaults which should be 'int' are 'string'


2012-Nov-12 14:50:43 by johnkn:
reviewed by: TJ, 12-Nov-2012. approved.

Properties:

Type: enhance                Last Change: 2012-Nov-12 17:44
Status: resolved             Created: 2012-Oct-23 15:11
Fixed Version: v070903       Broken Version: v070800
Priority:                    Subsystem: Daemons
Assigned To: eje             Derived From: #2197
Creator: johnkn              Rust:
Customer Group: other        Visibility: public
Notify: eje@cs.wisc.edu, tstclair@cs.wisc.edu, johnkn@cs.wisc.edu
Due Date:

Related Check-ins:

2013-Jan-11 10:10   Check-in [34613]: fix crash caused by incomplete change of STATISTICS_WINDOW_QUANTUM from constant to config knob. #3288 quantum value was never set in the schedd stats pools created by SCHEDD_COLLECT_STATS_BY_*. ===VersionHistory:None=== bug never shipped. (By John (TJ) Knoeller )
2013-Jan-08 12:22   Check-in [34555]: revise 7.9.3 version history item ===GT=== #3288 (By Karen Miller )
2013-Jan-07 13:41   Check-in [34539]: Rectify definitions for STATISTICS_WINDOW_SECONDS and STATISTICS_WINDOW_QUANTUM ===GT=== #3288 (By Karen Miller )
2012-Nov-13 10:48   Check-in [34100]: Fix bug in DaemonCore::Stats init logic: #3288 (By Erik Erlandson )
2012-Nov-12 17:42   Check-in [34097]: ===VersionHistory:Completed=== ===GT=== #3288 (By Erik Erlandson )
2012-Nov-12 17:40   Check-in [34096]: Documentation for STATISTICS_WINDOW_QUANTUM[_<collection>] #3288 (By Erik Erlandson )
2012-Nov-12 17:39   Check-in [34095]: Expose ring buffer quantization to configuration: STATISTICS_WINDOW_QUANTUM[_<collection>] ===GT:Fixed=== #3288 (By Erik Erlandson )