Ticket #2541: Memory utilization of classads (revised)

Problem Description

When running heap profiles experiments on the schedd, we've found that doubling the queue count == doubling of the classad count in the schedd even though most of the data is the same across ads.

Also when doing performance calculations on the collector it was determined that most of the time inside of the collector was spent churning on redundant data.

Proposed solution

Keep a cache of the current in memory classads.

Design

Approach key [str_name] = key[str_value] = weak_ptr<~ExprTree>

After several iterations on possible implementations I've settled on this implementation because it has optimal speed performance without sacrificing any space savings.

One thing that becomes apart when analyzing the data is that there is a large % or redundant data and a small % of data which churns. We could easily save more memory by having meta information in classads (possibly through nested ads).

m_Cache[name][value] = weak_ptr<Cache_Element>

All caching code backs against the Insert function(s) in classads. It was optimized so the main path by which messages come across the wire will do a look-up using a single string based function in the classads layer:

bool Insert(const std::string& serialized_nvp);

Initial Results

Space savings depends on a couple of variables:

Debugging

There are a couple of internal features which have been added to aid in the diagnostics of the behavior of caching. There is an internal only knob which is hooked through DAEMON_CORE which enables a daemon to dump it's cache to a <subsystem>_classad_cache file under LOG when DEBUG_CLASSAD_CACHE=TRUE . The file dumping is hooked through SIGUSR2

kill -s 12 <collector_pid> -> drops $LOG/COLLECTOR_classad_cache

NOTES

[Append remarks]

Remarks:

2012-May-15 16:24:04 by tstclair:
After taking it off the shelf, dusting off, and giving this ticket much needed love things are looking good. Initial tests yield hit rates ~80% on average with a fair amount of variance depending on the workload.

I will update this ticket to include the design. Once my tests come back green I will push a final squash commit for review.


2012-May-16 15:11:39 by tstclair:


2012-May-16 15:14:03 by tstclair:
Any extra lexer optimization would still be handy, because as entries get flushed from the cache they still need to be lex'd prior to re-entry.


2012-May-22 16:25:15 by tstclair:
removing lexer mods and cleaning up + running through local tests post mods to revised approach. Will then commit and create new tickets which outline further potential gains.


2012-May-30 15:34:11 by jfrey:
CODE REVIEW of first iteration ([32102], git hash 10ae0cd4):

These have been removed and cleaned

Non starter, does not affect perf

I actually don't like this recommendation

I've verified behavior and went through that code many times so I think I will need to see exactly what you are talking about

Fixed

Fixed

All virtual function go through self so it should be clean, which is why the Env->Letter idiom was chosen

Non-starter with redesign, and has been tested at length

Will let other test justify

Will need example, as all tests pass

Outside of scope of this ticket


2012-May-30 15:36:53 by jfrey:
While reviewing this code, I had a few ideas for additional memory optimizations. The sizes mentioned are on x86_64 linux:


2012-Jun-26 21:54:26 by jfrey:
Followup on code review:

Because I want to ensure open semantics and it's intended to be a debug only utility to be used only by developers so safefile semantics do not matter

Fixed


2012-Jun-27 13:46:40 by jfrey:
I found some additional problems with nested ads. Your most recent fix addresses when a nested ad is the value of an attribute in the parent ad. But it doesn't address when the nested ad is a subexpression. I'll attach a demonstration program similar to the previous one.

Fixed


2012-Jun-29 14:12:37 by johnkn:
Wranger sez - This item needs version history. please compose a couple of sentences and send to me and or Karen.
[Append remarks]

Properties:

Type: enhance           Last Change: 2013-Jan-14 14:18
Status: resolved          Created: 2011-Oct-11 15:39
Fixed Version: v070900           Broken Version: v070600 
Priority:          Subsystem: Daemons 
Assigned To: tstclair           Derived From: #2961
Creator: tstclair  Rust:  
Customer Group: other  Visibility: public 
Notify: matt@cs.wisc.edu tstclair@redhat.com, jfrey@cs.wisc.edu tannenba@cs.wisc.edu eje@cs.wisc.edu  Due Date:  

Derived Tickets:

#2899   classad lex/parse uses up to 50% of the compute time for daemons
#3075   Proper build fails on Centos 5
#3089   Reduce size of classad::Value
#3092   MatchClassAd::ReplaceLeftAd() and ReplaceRightAd() need fixing
#3099   Bug in unparsing quoted ClassAd attribute names
#3100   New ClassAd::Insert() doesn't handle attribute names properly
#3127   Caching code makes ClassAds much less thread-safe
#3175   ClassAd caching leaks memory in Condor
#3441   Fixes for classad caching.
#3448   Improving classads hash function
#3457   reduce the size of classad exprtree nodes

Related Check-ins:

2013-Mar-22 09:58   Check-in [35262]: Remove data members from classad Operation class, and create subclasses by arity #3457 parent ticket #2541 each subclass of Operation class will have only the necessary data members, so we don't waste space for 2nd & 3rd ExprTree pointers in nodes that have no need of them. ===VersionHistory:Pending=== [...] (By John (TJ) Knoeller )
2013-Mar-22 09:58   Check-in [35261]: Remove nodeKind member from ExprTree, and replace it with a pure virtual method #3457 that all of the derived classes must implement. parent ticket #2541 (By John (TJ) Knoeller )
2012-Dec-04 14:41   Check-in [34268]: Ticket #2541 Fix Cache - Unparsing issue in xml, and cleanup SameAs f(n)'s. (By Timothy St. Clair )
2012-Jul-17 14:13   Check-in [32823]: edit of version history item and knob defn related to new caching of ClassAds ===GT=== #3127 ===GT=== #2541 (By Karen Miller )
2012-Jul-11 10:44   Check-in [32702]: improve wording of version history item about caching ClassAds ===GT=== #2541 (By Karen Miller )
2012-Jun-29 14:38   Check-in [32548]: ===GT=== #2541 ===VersionHistory:Complete=== (By Timothy St. Clair )
2012-Jun-29 12:52   Check-in [32541]: Ticket #2541 Update string::npos check for invalid input strings (By Timothy St. Clair )
2012-Jun-27 14:36   Check-in [32496]: Ticket #2541 Update for ExprList (By Timothy St. Clair )
2012-Jun-27 12:10   Check-in [32490]: Ticket #2541 Update caching for nested classads use case. (By Timothy St. Clair )
2012-Jun-26 18:35   Check-in [32484]: Fix building of cream and blahp externals. #2541 classad_stl.h wasn't being installed into the externals directory. The blahp configure line was getting two include directories for boost. (By Jaime Frey )
2012-Jun-25 13:17   Check-in [32464]: Fix building of cream_gahp for RHEL 6 #2541 (watched by tstclair) (By Nathan W. Panike )
2012-Jun-22 16:11   Check-in [32460]: Ticket #2541 Windows requires explicit inclusion of weak_ptr.hpp (By Timothy St. Clair )
2012-Jun-22 15:28   Check-in [32459]: Ticket #3075 and #2541 remove namespace collision with prefix classad_ (By Timothy St. Clair )
2012-Jun-22 15:01   Check-in [32458]: Ticket #3075 and #2541 Add in order checks for standard features C++11 -> tr1 -> boost. Sadly our code base needs tender love for C++11 support. Also hides boost from classads exposure where possible. (By Timothy St. Clair )
2012-Jun-22 11:08   Check-in [32457]: Ticket #3075 and #2541 Update build chain to prefer local system boost when possible even when non-proper. This ensures distributed libraries match system where possible. (By Timothy St. Clair )
2012-Jun-21 17:22   Check-in [32456]: Ticket #2541 & Ticket #3075 Update cream to use prefixed boost with prefixed classads (By Timothy St. Clair )
2012-Jun-21 15:26   Check-in [32450]: Ticket #3075 and #2541 Where boost doesn't exist prefer tr1. (By Timothy St. Clair )
2012-Jun-21 12:46   Check-in [32448]: Ticket #2541 NMI build and test updates Fixes Fedora 16 -Werror build Fixes lib_job_router_local failure (By Timothy St. Clair )
2012-Jun-21 08:06   Check-in [32443]: Ticket #2541 - Squashed feature commit for classad caching. (By Timothy St. Clair )

Attachments: