Better Caching for Express Applications
node.js is awesome
node.js is known for its high-throughput I/O capabilities and its lightning-fast JavaScript execution engine, V8. Together these enable developers to easily create high-performance, lightweight web applications. Despite these strengths, bottlenecks are sometimes unavoidable in web applications, even those built with node.js and the awesome express web framework.
solving bottlenecks
Sometimes there’s a simple solution to such problems: caching. By caching responses from slower external systems at the node.js layer, we can dramatically reduce load on those systems and improve our response times in one fell swoop! While caching is a great solution, the implementation is often less than ideal, and frequently looks something like this.
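A minimal sketch of that hand-rolled pattern is below. The `getUsers` function here is a stand-in for whatever slow external call your route makes, and the route handler is written as a plain function so the caching logic is easy to see:

```javascript
// getUsers is a placeholder for a slow external call, e.g. a database
// query or an outbound HTTP request
function getUsers(callback) {
  setTimeout(() => {
    callback(null, [{ id: 1, name: 'jane' }]);
  }, 250);
}

// naive in-memory cache, keyed by URL, with a fixed ttl
const cache = new Map();
const TTL = 30 * 1000; // cache entries for 30 seconds

function usersHandler(req, res) {
  const hit = cache.get(req.originalUrl);

  // serve from the cache if an entry exists and has not expired
  if (hit && Date.now() - hit.storedAt < TTL) {
    return res.json(hit.data);
  }

  // cache miss - call the slow external system and store the result
  getUsers((err, users) => {
    if (err) {
      return res.status(500).json({ message: 'error fetching users' });
    }

    cache.set(req.originalUrl, { data: users, storedAt: Date.now() });
    res.json(users);
  });
}

// wired up in express as: app.get('/users', usersHandler);
```

Every route that needs caching ends up repeating this same get/check/set dance, which is exactly the problem discussed next.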
There’s nothing particularly wrong with this code, but I’d rather not write it for each of my routes: it means more testing effort, a greater likelihood of bugs, and it isn’t DRY. Also, what if you need custom caching logic based on the response from the getUsers call? Things can get messy quickly, and this approach is often too intrusive for my liking since it affects how the functions themselves are written.
npm to the rescue?
A better solution to this problem is an express middleware that does the caching for you. Naturally I thought, “surely this has been done already”, and headed to npmjs.org to find such a module, but was met with some disappointment. Most of the modules in the wild do perform caching, but come with caveats:
- They override functions on the response (res) Object, which can lead to errors and confusion, and can even slow down response times if done incorrectly
- Some functions on the response (res) Object are forgotten about, meaning caching silently fails to occur and you’re left wondering why
- A cache store interface is not exposed, so it’s not possible to programmatically interact with the underlying cache when you need to delete or modify entries
- They use node.js process memory for caching, meaning your node.js process could consume far more resources than you’d like
- If a module does use a separate datastore, you’re forced to use that specific one, e.g. redis (redis is great, but might not be viable for everyone)
- They don’t account for parallel requests for the same resource, resulting in many large buffers in memory and the memory and CPU spikes that come with them
- There’s no way to override the ttl for different status codes, meaning a temporary error such as a 500 could cause a bad response to be cached and served repeatedly for longer than necessary
- Cache keys cannot be generated using a custom function, meaning a loss of control over cache conflicts
- They’re unable to cache piped responses, e.g. request.get(url).pipe(res)
Since I tend to be a perfectionist at times, these issues wouldn’t stand. There was only one solution: write yet another caching module.
expeditious to the rescue?
In a move that was either bold or insane, I decided to try my hand at creating a better solution to this problem, and ended up creating the project expeditious. Needless to say, it got out of hand quickly, and there are currently three expeditious-related modules I’ve published:
- expeditious - The core caching interface
- expeditious-engine-memory - An in-memory storage engine that can be passed to a core expeditious instance
- express-expeditious - A caching middleware for express that uses an instance of expeditious as the cache
The benefits of this modular approach are numerous:
- Core is lightweight and independent of the underlying data store
- Data stores can be created as needed. You need to store data in CouchDB? Go ahead!
- The API is the same regardless of the storage engine used, meaning changing the data store requires only a single line of code
- expeditious instances used by middleware such as express-expeditious remain available for programmatic modification, meaning the middleware is not a black box
Besides this, express-expeditious offers a solution for the drawbacks I found in the existing express caching modules. The expeditious core also serves as a standalone caching solution that can be used in any node.js application, or in JavaScript running in a browser.
using express-expeditious
The end result? Pretty neat I think:
This is a trivial example and is by no means the only way to use express-expeditious. You can apply it to specific routes or to express.Router instances for more granular control. Thanks to the manner in which these modules work in unison, we can also programmatically call cache.del to remove the “/ping” entry if some event occurs and new data becomes available.
the results
Taking a look at this fabricated test, where only 25 concurrent requests are allowed, the total time for 1000 requests drops from 80.4 seconds to 2.4 seconds, a roughly 33x improvement. Not bad!
without cache: 1000 requests in 80.4 seconds
with cache: 1000 requests in 2.4 seconds
summary
There are many ways to architect a solution, and while this works for some node.js applications, you might have another approach or layer for solving your caching needs.
If you’d like to get started, check out the example for express-expeditious.
Go forth and cut those response times down to size!