Ticket #3288: RFE: expose general_stats 'recent' ring buffer quantization to configu
The ring buffer used to maintain RecentXxx statistics uses a hard-coded quantization level. To improve quality of statistics as viewed from tools such as cumin, it will be useful to expose this quantization to configuration and allow it to be given increased time resolution.
My plan is to implement a new config param STATISTICS_WINDOW_QUANTUM, and update stats logic to obtain its quantization level from this variable (when it is present).
2012-Oct-23 15:53:36 by johnkn:
The SCHEDD has 2 collections of statistics, the SCHEDD stats and the DC stats, so it might be wise to make the quantum individually changeable for these two sets.
So maybe 2 config knobs.
SCHEDD_STATISTICS_WINDOW_QUANTUM if it is defined, if not, use
DC_STATISTICS_WINDOW_QUANTUM if it is defined, if not, use
Note also that a timer needs to be running at the quantum for DC stats (the SCHEDD stats are opportunistic and accumulate in CountJobs and when a shadow exits).
2012-Oct-23 16:00:20 by tstclair:
So by default the param system already has the DAEMON.PARAM override, e.g. SCHEDD.STATISTICS_WINDOW_QUANTUM would already work.
I'd have to look into DC though.
2012-Oct-23 16:17:59 by eje:
For dealing with named statistics subsets, we might adopt the acct-group convention.
- STATISTICS_WINDOW_QUANTUM -- "default for all subsets, all daemons"
- SCHEDD.STATISTICS_WINDOW_QUANTUM -- "default for all subsets, schedd daemon"
- STATISTICS_WINDOW_QUANTUM_<name> -- "used for subset <name> (any daemon)"
- SCHEDD.STATISTICS_WINDOW_QUANTUM_<name> -- "subset <name> on schedd daemon"
So, to alter the 'DC' stats, on the schedd only, you'd do:
SCHEDD.STATISTICS_WINDOW_QUANTUM_DC = 10
A side benefit would be it would allow me to just focus on STATISTICS_WINDOW_QUANTUM, without precluding named-subset support under that convention, although adding that as well might not be so much effort.
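A rough sketch of how a lookup under this convention could behave. The precedence shown (collection-specific names before the generic one, and the daemon-prefixed form before the bare form at each step) is my assumption; the convention above does not spell out an ordering. The `config` dict and both function names are hypothetical stand-ins, not the real param system.

```python
BASE = "STATISTICS_WINDOW_QUANTUM"

def candidate_knobs(daemon, collection):
    """List knob names to try, most specific first (assumed ordering)."""
    keys = []
    for name in (f"{BASE}_{collection}", BASE):
        keys.append(f"{daemon}.{name}")  # daemon-prefixed override
        keys.append(name)                # applies to any daemon
    return keys

def lookup_quantum(config, daemon, collection):
    """Return the first configured quantum as an int, or None if unset."""
    for key in candidate_knobs(daemon, collection):
        if key in config:
            return int(config[key])
    return None

# Alter the 'DC' stats on the schedd only; everything else uses the default.
config = {
    "STATISTICS_WINDOW_QUANTUM": "240",
    "SCHEDD.STATISTICS_WINDOW_QUANTUM_DC": "10",
}
print(lookup_quantum(config, "SCHEDD", "DC"))      # schedd DC collection
print(lookup_quantum(config, "SCHEDD", "SCHEDD"))  # schedd SCHEDD collection
```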
2012-Oct-23 17:17:59 by eje:
One thing about using <name>_STATISTICS_WINDOW_QUANTUM is that:
SCHEDD_STATISTICS_WINDOW_QUANTUM
would be equivalent to:
SCHEDD.STATISTICS_WINDOW_QUANTUM
under the older (although deprecated) config semantic. In that regard I think it could cause confusion to users (it continues to confuse me). STATISTICS_WINDOW_QUANTUM_<name> should reduce confusion, since it works like the acct group convention.
2012-Oct-23 17:32:13 by johnkn:
No, you're confusing the names of daemons and the names of statistics collections. If you use SCHEDD.STATISTICS_WINDOW_QUANTUM, then you are setting the quantum for ALL statistics collections in the SCHEDD.
But the SCHEDD has two collections, the so-called "SCHEDD" collection and the "DC" collection. We want to be able to set the quantum for ONE of these collections in the SCHEDD daemon without changing the quantum for all collections in the SCHEDD daemon.
SCHEDD_STATISTICS_WINDOW_QUANTUM is not the same as SCHEDD.STATISTICS_WINDOW_QUANTUM. You could call it DAEMONCORE_STATISTICS_WINDOW_QUANTUM instead of my original suggestion if that makes it less confusing.
2012-Oct-23 17:49:31 by eje:
I understand the distinction, I just think that using this:
STATISTICS_WINDOW_QUANTUM_SCHEDD
makes the distinction clearer, and has the added benefit of being consistent with our already-existing naming convention for acct group variables.
2012-Oct-24 06:45:51 by matt:
+1 eje's STATISTICS_WINDOW_QUANTUM approach
And <name> -> <collection>, because it is a new concept and should be differentiated.
<collection> should not be allowed to share a value with <subsys>, e.g. both SCHEDD, to avoid confusion.
We already have confusion between <subsys> and <daemon>, let's not introduce another undifferentiated concept to the mix.
IMHO, having <collection> is premature.
2012-Oct-24 08:57:12 by tstclair:
Say it's 10 years from now: how would one iterate the collection space on the off chance that there are multiple collections within a single daemon?
+1 on suffix.
2012-Nov-06 17:07:47 by eje:
Proposed implementation on topic branch: V7_9-gt3288-stats-window-quantum
The config var STATISTICS_WINDOW_QUANTUM (which currently defaults to 240) is the global default quantum for all stats collections.
There are also: STATISTICS_WINDOW_QUANTUM_SCHEDD (synonym: _SCHEDULER), and STATISTICS_WINDOW_QUANTUM_DC (synonym: _DAEMONCORE) that can override the global default for respective collections.
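The override behavior described above can be sketched roughly as follows. This is an illustration only, not the code on the topic branch: `config` is a plain dict standing in for the param system, `resolve_quantum` is a hypothetical name, and the order in which a name and its synonym are checked is an assumption.

```python
DEFAULT_QUANTUM = 240  # global default stated above

# Collection names and their synonyms, as listed above.
SYNONYMS = {
    "SCHEDD": ("SCHEDD", "SCHEDULER"),
    "DC": ("DC", "DAEMONCORE"),
}

def resolve_quantum(config, collection):
    """Return the quantum (in seconds) to use for one stats collection."""
    for suffix in SYNONYMS.get(collection, (collection,)):
        value = config.get(f"STATISTICS_WINDOW_QUANTUM_{suffix}")
        if value:  # unset or empty means "not configured by the admin"
            return int(value)
    # Fall back to the global knob, then to the built-in default.
    value = config.get("STATISTICS_WINDOW_QUANTUM")
    return int(value) if value else DEFAULT_QUANTUM

config = {
    "STATISTICS_WINDOW_QUANTUM": "20",
    "STATISTICS_WINDOW_QUANTUM_DAEMONCORE": "5",
}
print(resolve_quantum(config, "DC"))      # collection-specific override
print(resolve_quantum(config, "SCHEDD"))  # falls back to the global value
```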
You can test the new feature, with visible 'saw-tooth' every 20 seconds, using this configuration:
    STATISTICS_TO_PUBLISH = SCHEDD:2 DC:2
    # one minute window for Recent* stats
    STATISTICS_WINDOW_SECONDS = 60
    # 20-second ring buffer quantization - recent-stats will sawtooth every 20 secs
    STATISTICS_WINDOW_QUANTUM = 20
Kick off a script that submits one job per second (or every couple seconds, as long as it's a regular interval << 20 seconds).
Saw-toothing is visible in tools like cumin, or I also tested using 'watch' on a 'recent' stat from the "SCHEDD" collection and one from the "DC" collection:
    watch -n 5 'condor_status -l -schedd | grep -e RecentJobsSubmitted -e RecentDCSelectWaittime'
In the above, both of the statistics drop off every 20 seconds, then begin to grow again.
If you change STATISTICS_WINDOW_QUANTUM to '1', and restart the scheduler, then you will see both statistics reach steady-state values, with no major drop-offs or visible saw-toothing.
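To illustrate why the quantum produces the saw-tooth, here is a toy model of a quantized ring buffer. It is a simplified sketch, not the actual stats code: it assumes the window is an exact multiple of the quantum, that events land in the newest slot, and that the oldest slot is dropped whole at each quantum boundary. The function name and the one-event-per-second rate are made up for the example.

```python
from collections import deque

def simulate_recent(window, quantum, duration, rate=1):
    """Return the 'recent' total, sampled once per second.

    Toy model: the window is split into window // quantum slots. Events
    accumulate in the newest slot, and every `quantum` seconds the oldest
    slot is discarded and a fresh empty slot is started.
    """
    nslots = window // quantum
    slots = deque([0] * nslots, maxlen=nslots)
    samples = []
    for t in range(1, duration + 1):
        slots[-1] += rate           # one second's worth of events
        samples.append(sum(slots))  # what a monitoring tool would read
        if t % quantum == 0:
            slots.append(0)         # maxlen silently drops the oldest slot
    return samples

coarse = simulate_recent(window=60, quantum=20, duration=180)
fine = simulate_recent(window=60, quantum=1, duration=180)
print(min(coarse[120:]), max(coarse[120:]))
print(min(fine[120:]), max(fine[120:]))
```

In this model, once the window has filled, the 20-second quantum makes the value swing between 41 and 60 as a whole slot is thrown away at once, while the 1-second quantum holds it at 60.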
2012-Nov-08 15:26:20 by tstclair:
please review V7_9-gt3288-stats-window-quantum
2012-Nov-12 13:58:06 by johnkn:
Why is the type of STATISTICS_WINDOW_QUANTUM_DC and others set to string?
2012-Nov-12 14:34:54 by eje:
I set STATISTICS_WINDOW_QUANTUM_[name] to string because the 'int' type wouldn't let me leave it with no default value. The name-specific variations need to have no default so they aren't active when the admin doesn't configure a value.
2012-Nov-12 14:45:27 by tstclair:
That is a general failing in the param system, which is why lots of the defaults which should be 'int' are 'string'.
2012-Nov-12 14:50:43 by johnkn:
reviewed by: TJ, 12-Nov-2012. approved.
Type: enhance
Last Change: 2012-Nov-12 17:44
Status: resolved
Created: 2012-Oct-23 15:11
Fixed Version: v070903
Broken Version: v070800
Priority: 2
Subsystem: Daemons
Assigned To: eje
Derived From: #2197
Creator: johnkn
Rust:
Customer Group: other
Visibility: public
Notify: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
Due Date:
|2013-Jan-11 10:10||Check-in : fix crash caused by incomplete change of STATISTICS_WINDOW_QUANTUM from constant to config knob. #3288 quantum value was never set in the schedd stats pools created by SCHEDD_COLLECT_STATS_BY_*. ===VersionHistory:None=== bug never shipped. (By John (TJ) Knoeller )|
|2013-Jan-08 12:22||Check-in : revise 7.9.3 version history item ===GT=== #3288 (By Karen Miller )|
|2013-Jan-07 13:41||Check-in : Rectify definitions for STATISTICS_WINDOW_SECONDS and STATISTICS_WINDOW_QUANTUM ===GT=== #3288 (By Karen Miller )|
|2012-Nov-13 10:48||Check-in : Fix bug in DaemonCore::Stats init logic: #3288 (By Erik Erlandson )|
|2012-Nov-12 17:42||Check-in : ===VersionHistory:Completed=== ===GT=== #3288 (By Erik Erlandson )|
|2012-Nov-12 17:40||Check-in : Documentation for STATISTICS_WINDOW_QUANTUM[_<collection>] #3288 (By Erik Erlandson )|
|2012-Nov-12 17:39||Check-in : Expose ring buffer quantization to configuration: STATISTICS_WINDOW_QUANTUM[_<collection>] ===GT:Fixed=== #3288 (By Erik Erlandson )|