Write-through and Generational Caching Make a Great Team
We all know that a great way to speed up any website is to throw some caching at it, right? When considering caching strategies there are a lot of questions you should be asking. What type of caching is right for my specific situation? Do we have to worry about serving up stale data, and if so how do we go about handling that? Plus another dozen or so questions. My goal in this post is to show a generic caching strategy that can be implemented in most web applications and just work.
Caching Considerations
“So what exactly are you looking for when it comes to a generic non-website specific caching strategy?” you might probably didn’t ask. Good question.
- Generic implementation
By that I mean I can just throw it at any new project I’m working on and don’t have to customize it for the type of data I’m dealing with, or any specific cache implementation (i.e. Memcached, in-memory cache, etc.) - Stays out of my way
I don’t want to have to specifically clear out the cache for my entity whenever it is updated, or manage cache keys to invalidate previous queries. - Don’t serve up stale data
There are certain situations where stale data is acceptable, but for a generic implementation we want to be conservative and only serve up to date data.
There are 2 types of main data you will be dealing with in your application: single entities (think CRUD) and queries (give me all the posts in a category). I’ll be dealing with these types differently. For single entities, I’ll implement write-through caching and for queries I’ll be using generational caching.
Write-Through Caching
Here’s how to sum up write-through caching: when you write to the database you write to the cache. Done and done.
Write-through caching is great for serving up single entities, not queries. The single entity is always in cache so it will serve up really fast. A great place for the write-through caching logic is in your repository layer and even better is to include it in a generic repository that is the base class for all your specific repository instances (that keeps it DRY), but that is information for another post.
Pretty simple, write? (terrible pun used for my satisfaction only I’m sure)
Let’s get to some code
I’m using a generic repository that uses Entity Framework 5 to access the database, so my methods will have a generic for the entity type. If you implement this directly in a typed repository you will just need to change accordingly.
Let’s start with the caching provider. Here I have the interface that we will code against. It provides a generic caching interface and we will provide the concrete class via injection (personally I like using StructureMap). The beauty of this approach is that you can start out with a simple in-memory caching provider and then scale up to a distributed cache like Memcached or something else later very easily.
public interface ICachingProvider : IDisposable
{
void Set<T>(string key, T value, int? timeoutInSeconds = null);
void Clear(string key);
bool Exists(string key);
bool Get<T>(string key, out T value);
int Increment(string key, int defaultValue, int value);
}
And the in-memory implementation looks like this. It uses the built-in .NET runtime caching.
/// <summary>
/// Uses the .NET built-in MemoryCache as the caching provider.
/// </summary>
public class InMemoryCachingProvider : ICachingProvider
{
private static ObjectCache Cache
{
get { return MemoryCache.Default; }
}
private static readonly object LockObject = new object();
public void Set<T>(string key, T value, int? cacheTime = null)
{
if (String.IsNullOrEmpty(key)) throw new ArgumentNullException("key");
var policy = new CacheItemPolicy();
if (cacheTime.HasValue)
{
policy.AbsoluteExpiration = DateTime.Now + TimeSpan.FromSeconds(cacheTime.Value);
}
Cache.Set(key, value, policy);
}
public void Clear(string key)
{
if (String.IsNullOrEmpty(key)) throw new ArgumentNullException("key");
Cache.Remove(key);
}
public bool Exists(string key)
{
if (String.IsNullOrEmpty(key)) throw new ArgumentNullException("key");
return Cache.Any(x => x.Key == key);
}
public bool Get<T>(string key, out T value)
{
if (String.IsNullOrEmpty(key)) throw new ArgumentNullException("key");
value = default(T);
try
{
if (!Exists(key))
return false;
value = (T)Cache[key];
}
catch (Exception)
{
// ignore and use default
return false;
}
return true;
}
public int Increment(string key, int defaultValue, int value)
{
if (String.IsNullOrEmpty(key)) throw new ArgumentNullException("key");
lock (LockObject)
{
int current;
if (!Get(key, out current))
{
current = defaultValue;
}
var newValue = current + value;
Set(key, newValue);
return newValue;
}
}
public void Dispose()
{
// no need to do anything
}
}
Now onto the repository bits. I’m not going to show the whole thing because I don’t want the details of the generic repository to take away from the caching bits (plus I’m using a simplified version for my example here), so I’ll only be showing parts I think are necessary to understand what is going on.
Let’s get started with some helper methods we use for the write-through caching. There is a method for setting the cache value and for making the cache key based on the type of the entity and the primary key.
private void SetWriteThroughCache(int id, T item)
{
_cachingProvider.Set(CacheKey(id), item);
}
private string CacheKey(int id)
{
return String.Concat(_typeName, "/", id);
}
The Get method is pretty straightforward I think. It checks to see if the entity is already in cache and if not, get it from the database and put it into cache so it’s there the next time.
public T Get(int id)
{
T item;
// if the item is in the cache already then just return it
if (_cachingProvider.Get(CacheKey(id), out item))
{
Debug.WriteLine("Found in cache for {0} with id of {1}", _typeName, id);
return item;
}
Debug.WriteLine("NOT FOUND in cache for {0} with id of {1}", _typeName, id);
// otherwise we need to call EF and get the item
item = DbSet.Find(id);
// do the write-through caching so next time it won't hit the DB
SetWriteThroughCache(id, item);
return item;
}
I’ll just show one of the CRUD operations since they are basically the same.
public void Update(int id, T item)
{
// update the entity using EF
Context.Entry(item).State = EntityState.Modified;
Context.SaveChanges();
// do the write-through caching
SetWriteThroughCache(id, item);
}
When we update the item in the database, we need to also update the cache, simple as that. Same goes for the Add method, while in the Delete method we will just remove it from cache and then delete from the database. All fairly straightforward. Okay, so how can I query my entities instead of just getting one item at a time? Wow, you ask great questions. That leads right into my next part.
Generational Caching
The basic idea behind generational caching is that the majority of sites will have more reads than writes on a particular table. If this is true, then there is a performance benefit to caching queries against that table until it is written to, then we clear out that cache. Where generational caching shines is in the management of what information should be kept in cache and what should be cleared out. Keeping track of what cached queries are stale because of a table update is tough. To do this manually you’d have to loop through your cache (which caches are generally not designed for) look at each query and determine which tables it used or what rows within the table it used and if they have been updated or not. Not fun.
Generational caching takes a simple, conservative approach. If a table gets written to (insert, update or delete), then all cached queries for that table are no longer valid. This is conservative because if I update a single row in a table then all queries (even those that don’t involve that row) are no longer used. But it’s this simplicity that makes it a beautiful approach. We don’t have to manage that process and we can get some good performance benefits as long as our site has more reads than writes in general.
Here’s how we do this. For each entity type we are tracking (most times it maps to one database table per entity) we setup a generation for each one that initially starts at 1 and is incremented each time the table is written to. The generation will be stored in cache and will use a key like “Post/Generation”. Then each time we cache queries or look for query results we use this generation as part of its key. That’s it, that’s all there is to it. “Really? But you aren’t dealing with clearing the cache at all.” Exactly, that’s the best part. Since my generation is updated each time the table is written to and my cache lookup logic is using that generation, I will never serve up stale data and the old data won’t ever be accessed again. The caching provider being used will clear out old data when the cache fills up so those old queries will get flushed on their own, no need to manually remove them.
Time for more code
Again we’ll start with the helper methods that deal with the generational part of generational caching. The GetGeneration method gets the current generation from cache for this particular entity type (each type of entity gets its own generation and is independent of the others, so if you update the Post table it doesn’t affect the User cache). NextGeneration is in charge of incrementing the generation by one, while the CacheKey builds up the appropriate cache key to use.
private int GetGeneration()
{
int generation;
return !_cachingProvider.Get(GetGenerationKey(), out generation) ? 1 : generation;
}
private int NextGeneration()
{
return _cachingProvider.Increment(GetGenerationKey(), 1, 1);
}
private string GetGenerationKey()
{
return String.Format("{0}/Generation", _typeName);
}
private string CacheKey(Expression<Func<T, bool>> predicate = null)
{
var key = predicate == null ? "All" : Md5Helper.CalculateMd5(predicate.ToString());
return String.Concat(_typeName, "/", GetGeneration(), "/", key);
}
The CacheKey version that accepts an Expression might need some additional explanation. In our repository, we have a GetAll method that accepts an Expression (think LINQ) as a way to get only certain entities from a table. To make a unique cache key for this we use the current generation (like we do for all generational caching keys) as well as an MD5 hash of the predicate itself. Calling ToString() on the Expression returns a string representation of the query (isn’t that clever) which includes the specific values passed in, so it will be unique each time, and taking an MD5 of a unique string will itself be unique, so we are good there.
Only a couple more parts to go, I promise.
The GetAll method has an optional Expression parameter which lets us query our repository using LINQ syntax (or call GetAll() without an Expression to get everything). We do the same basic logic in here as the Get() method where we check the cache and if it exists return it directly or query the database, cache the results and then return. Pretty simple.
public IEnumerable<T> GetAll(Expression<Func<T, bool>> predicate = null)
{
IEnumerable<T> items;
// get the appropriate cache key for this query
var key = CacheKey(predicate);
// check if it's in the cache already
if (!_cachingProvider.Get(key, out items))
{
Debug.WriteLine(String.Format("NOT FOUND cache for: {0}", key));
// if not, then run the query
items = predicate == null ? DbSet.ToList() : DbSet.Where(predicate).ToList();
// and cache the results for next time
_cachingProvider.Set(key, items);
}
else
{
Debug.WriteLine(String.Format("Found cache for: {0}", key));
}
// return the results either from cache or that were just run
return items;
}
So the most important piece that brings this all together is updating the generation whenever the table is updated. So let’s add that to the repository and then we are done. In Add(), Update() and Delete() we need to add a call to NextGeneration() and that takes care of it.
Here’s our updated Update method (couldn’t resist) but each one is similar.
public void Update(int id, T item)
{
// udpate the entity using EF
Context.Entry(item).State = EntityState.Modified;
Context.SaveChanges();
// do the write-through caching
SetWriteThroughCache(id, item);
// update the generation
NextGeneration();
}
To sum it all up
The combination of a repository layer, write-through caching and generational caching is a pretty powerful thing. It is a straightforward way to gain some performance without really having to think too much. It’s a conservative, generic approach that will work for most web applications, especially those that have many more reads than writes (which I’m betting is most sites you develop).
In the future, think about adding this strategy when doing new development or add it to an existing repository layer to gain a little boost right now.
Leave a comment
Recent Posts
- iOS Unit Testing With OCMock
- Why Stakeholders Need To Be Involved In Scrum
- NuGet Config File Transformation Causes Duplicate Entries On Update
- Load Testing with Locust on Windows
- Writing A Custom LINQ Provider With Re-linq
- AutoMapper Profile Organization
- Rails 3.2: A Nested-Form Demo Part 4: Switch to Targeting Computer!
- SharpRepository: Configuration
- Rails 3.2: A Nested-Form Demo, Part 3: We’re Starting Our Attack Run!
- Rails 3.2: A Nested-Form Demo, Part 2: Accelerate to Attack Speed!
- Rails 3.2: A Nested-Form Demo, Part 1: All Wings Report In!
- iOS Behind the Curve
- Distributed Transaction Coordinators, Port 135, and Firewalls – Oh My!
- SharpRepository: Getting Started
- Find Performance Problems Using JMeter, MySQL and Xdebug/Webgrind
- Taming Hot Key Context Shifting When Running A Windows VM In Virtualbox On OSX
- Integrating Twitter’s Bootstrap Into Your Project
- Mobile payments, tags and more using NFC
- Stress Pig
- Dear Client Services, What Works?
- What Would Steve Do?
- Still Using Fiddler to Test & Debug Your REST Services?
- Write-through and Generational Caching Make a Great Team
- Thinking Recursively
- Development Incentives, What’s the Payoff?
- How do you like them Apples?
- “Optional” Software Development Practices Series — Code Review
- Adding Images to Select Lists in MVC3
- “Optional” Software Development Practices Series
- You Get What You Pay For…
- Outsourcing Safety Tips
- Facebook IPO
- The Ballad of Tim Toady
- The Little Schemer
- Newsflash: Mom leaves tech job at 5p.m.
- Flashback!
- I <negative_emotion> Windows 8!
- Prefix vs. Postfix Increment and Decrement Operators in C++
- Corporate videos: viral boon or epic fail?
- Recruitin’ Time!
- Reference vs. pointer parameters in C++
- The IE8 "hover" Bug: The Most Awesome IE Bug Ever?
- When is perfect perfect enough?
- SOPA/PIPA: Anti-Censorship Protest or Techies Revenge?
- A Decade of Fairway
- Handling Session Timeout Gracefully
- Generating Software Diagrams
- The Audacity of Nope
- The Origins of Culture
- Scrum Overview in Prezi – not another boring slideshow
- Numbers don’t lie: LinkedIn Statistics
- What is your favorite software development tool?
- Best Practices for Selecting Onshore, Nearshore or Offshore Information Technology Outsourcing (ITO) Providers
- Sign of the Times
- Advantages and Risks of Offshoring, Nearshoring or Onshoring
- Does Outsourcing Mean Offshoring?
- Too little, too late?
- New Favorite Lunch Spot
- Why should I care about functions as first-class citizens?
- PHP Remote Debugging with XDebug and NetBeans
- Installing SubText with Web PI
- ROI Primer
- Learn Domain-Driven Design
- Learn Behavior-Driven Development
- Mario Kart Tournament
- F# in 90 Seconds
- Website Vulnerabilities
- Scrum Overview
- Language Club
- Top 12 Favorite Podcasts Ever…
- Fairway Dart Tournament
- Learn Lean Software Development and Kanban Systems
- Android – Eclipse Quick Start
- Learn Functional Programming
- Backup & Restore Strategy
- Smartphone Screens – Another Wireless Variable
- Wireless Application Market
- Head First AOP





