Overkilling Website Performance
19 Nov 2019 Tagged: sysadmin
For the past year or so, I’ve been using GCP multi-region storage buckets with CloudFlare in front (for caching and TLS) to serve all of my static sites. However, I was never thrilled with the TTFB numbers I was getting out of the combo on uncached pages, and GCP having four relatively major outages in 6 months pushed it over the edge for me.
I did some initial tests in AWS with a single S3 bucket and Cloudfront, and while results were slightly better, they still were not fantastic - times to load pages not in Cloudfront’s edge caches were around 60-70ms (vs ~100ms for GCS). Keep in mind however that those numbers are from a client in NYC with the site hosted in us-east-1 - one of the best cases possible (latency-wise). Visitors on the other side of the US would be right back to ~100ms load times, and visitors outside of the Americas would easily be 150ms+.
Not having anything better to do one night, I decided to figure out how I could completely overkill site performance. My main concern was performance when cache misses happened (the majority of page loads on my site due to it not getting much traffic), and my objective was to get ~60ms response times on any page (cached or not) from anywhere in the world, not just within the US.
Sticking with Amazon as a provider here, there are a few options I started exploring:
S3 + Cloudfront with extremely large TTLs
Based on the docs, Cloudfront employs a two-layer cache. POPs have their own, independent caches (very standard), however Cloudfront also has regional caches which, based on maps, seem to correspond to AWS regions. If an item is not in the POP’s cache, it will reach back to the regional cache, which can then return the object from there or go back to the origin if necessary. Since my site is not visited very frequently, it’s highly unlikely that any given page will be cached in the POP closest to the visitor (even with high TTLs), so I would be relying on regional caches keeping content basically indefinitely. In theory this layout could work (assuming regional caches effectively never expire items that are within their TTL), however this is not guaranteed, and it also means I would have to explicitly invalidate a number of pages each time I make a change to the site.
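To make the lookup order concrete, here is a toy model of the POP → regional cache → origin path described above. It is only a sketch: the `Cache` class and `fetch` function are made-up names, and it assumes entries live exactly until their TTL, which is precisely the behaviour that real caches do not guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Cache:
    """Toy TTL cache: path -> (body, expiry time). Never evicts early."""
    ttl: float
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def get(self, path: str, now: float) -> Optional[str]:
        hit = self.entries.get(path)
        if hit is not None and hit[1] > now:
            return hit[0]
        return None

    def put(self, path: str, body: str, now: float) -> None:
        self.entries[path] = (body, now + self.ttl)


def fetch(path: str, pop: Cache, regional: Cache,
          origin: Dict[str, str], now: float) -> Tuple[str, str]:
    """Return (body, layer that answered), filling caches on the way back."""
    body = pop.get(path, now)
    if body is not None:
        return body, "pop"
    body = regional.get(path, now)
    if body is not None:
        pop.put(path, body, now)
        return body, "regional"
    body = origin[path]  # full round trip to the origin
    regional.put(path, body, now)
    pop.put(path, body, now)
    return body, "origin"
```

With two POPs sharing one regional cache, the first request goes all the way to the origin, a request through the second POP is answered by the regional cache, and a repeat request is answered by the POP itself. The model also shows the downside of large TTLs: if the origin content changes, both layers keep serving the old body until the TTL runs out or the path is invalidated.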
Multi-region S3 + ???
This was actually my first thought: just stick a copy of the site on each continent. While the content replication is easy to do within S3, there don’t seem to be any ways to make origin decisions in Cloudfront based on geolocation. The closest thing I found was to use a Lambda@Edge function to dynamically proxy the request based on geo, but then I got to thinking… If I already need to have a function at the edge to determine where to proxy incoming requests, could I just have the function return the site itself?
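For reference, the geo-proxy idea could look roughly like the sketch below: an "Origin Request" handler that rewrites the S3 origin based on the viewer's country. The bucket names, regions, and country table are hypothetical placeholders, and it assumes the distribution is configured to forward the `CloudFront-Viewer-Country` header to the origin request. This is not something I deployed.

```python
# Hypothetical buckets (domain name, region); substitute real ones.
BUCKETS = {
    "NA": ("example-site-us.s3.us-east-1.amazonaws.com", "us-east-1"),
    "EU": ("example-site-eu.s3.eu-west-1.amazonaws.com", "eu-west-1"),
    "AS": ("example-site-ap.s3.ap-northeast-1.amazonaws.com", "ap-northeast-1"),
}

# Tiny illustrative table; a real one would cover every country code.
COUNTRY_TO_CONTINENT = {"US": "NA", "CA": "NA", "DE": "EU", "FR": "EU", "JP": "AS"}
DEFAULT_CONTINENT = "NA"


def handler(event, context):
    """Origin-request trigger: point the request at the nearest bucket."""
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    # Header names are lowercased keys; values are lists of {key, value}.
    country_header = headers.get("cloudfront-viewer-country")
    country = country_header[0]["value"] if country_header else None
    continent = COUNTRY_TO_CONTINENT.get(country, DEFAULT_CONTINENT)
    domain, region = BUCKETS[continent]

    request["origin"] = {
        "s3": {
            "domainName": domain,
            "region": region,
            "authMethod": "none",
            "path": "",
            "customHeaders": {},
        }
    }
    # S3 routes on the Host header, so it has to match the new origin.
    headers["host"] = [{"key": "Host", "value": domain}]
    return request
```

Returning the modified request tells Cloudfront to carry on to whichever origin the function picked, so every cache miss still pays for a round trip to S3.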
This reminded me of a blog post by CloudFlare which talks about deploying a static site to their edge using their Workers product (storing the site in their K/V store). I was curious to see if I could do something similar on Amazon, mainly because Workers has a $5/mo minimum price which I’d rather not have to pay if I can avoid it.
Amazon has a vaguely similar product to Workers called Lambda@Edge which, after a bit of reading, seems to be a bit of a misleading name (in my opinion). From what I can tell (based on docs and timing), the Lambda functions (at least for “Origin Request” triggered calls) are invoked in the nearest Amazon region, not at the POP/edge itself. Either way, if I can easily get the site contents stored in every Amazon region, that definitely gets me very close to the goal of delivering uncached pages in ~60ms anywhere on Earth.
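A minimal sketch of what "the function returns the site itself" can look like as an "Origin Request" handler. The `PAGES` dict and its contents are hypothetical stand-ins; I'm assuming here that the rendered pages get bundled into the function at build time, and Lambda@Edge size limits on generated responses and deployment packages still apply.

```python
# Hypothetical page table; in practice this would be generated from the
# static site's build output and shipped inside the function package.
PAGES = {
    "/": "<html><body>Home</body></html>",
    "/blog/overkill/": "<html><body>Overkilling Website Performance</body></html>",
}


def _response(status, description, body):
    """Build a response in the shape Cloudfront expects from a trigger."""
    return {
        "status": status,
        "statusDescription": description,
        "headers": {
            "content-type": [
                {"key": "Content-Type", "value": "text/html; charset=utf-8"}
            ],
        },
        "body": body,
    }


def handler(event, context):
    """Origin-request trigger: answer from memory, never contact an origin."""
    request = event["Records"][0]["cf"]["request"]
    body = PAGES.get(request["uri"])
    if body is None:
        return _response("404", "Not Found", "<html><body>Not found</body></html>")
    return _response("200", "OK", body)
```

Because the handler returns a response rather than the request, Cloudfront never goes on to contact an origin at all.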
A bonus of Lambdas that I only realized later is that the timing characteristics of Lambda functions end up coinciding nicely with visitor usage. If a visitor comes along and is the first one in a while in the entire AWS region, it will take a hundred milliseconds or so for the Lambda function to start up, which, while not ideal, also isn’t the worst thing since DNS resolution, the initial TCP handshake, TLS, etc. will have also taken up a bit of time. Keep in mind this all only happens in the case that Cloudfront doesn’t have the page cached, either in the POP or in the regional cache, which certain pages (the home page for instance) likely will be just due to background traffic. The interesting part about Lambdas is on subsequent requests. Requests to other pages (like the visitor clicking on a blog entry) are unlikely to be cached (since Cloudfront didn’t have the prior page cached), however we now have a running Lambda instance up that can serve requests in a couple of milliseconds. So regardless of where the user is, they will either hit a cached page in Cloudfront (taking basically round-trip time to respond), or will be proxied to a warmed Lambda instance through Cloudfront.
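The three cases above (cached, warm Lambda, cold Lambda) can be written down as a back-of-the-envelope model. The constants are the rough figures from the paragraph; the function name and the round-trip inputs are assumptions for illustration, and the model ignores DNS, TCP, and TLS setup entirely.

```python
COLD_START_MS = 100.0  # "a hundred milliseconds or so" to start a Lambda
WARM_EXEC_MS = 2.0     # "a couple of milliseconds" on a running instance


def estimate_ttfb_ms(viewer_to_pop_rtt_ms: float,
                     pop_to_region_rtt_ms: float,
                     cached: bool,
                     warm: bool) -> float:
    """Rough time to first byte for one request, in milliseconds."""
    if cached:
        # Served straight from Cloudfront: basically one round trip.
        return viewer_to_pop_rtt_ms
    ttfb = viewer_to_pop_rtt_ms + pop_to_region_rtt_ms + WARM_EXEC_MS
    if not warm:
        ttfb += COLD_START_MS  # first request in the region for a while
    return ttfb
```

For a viewer 20ms from their POP, with the POP 30ms from the nearest region (made-up numbers), this gives 20ms for a cached page, 52ms for an uncached page on a warm instance, and 152ms for the very first request.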
While I only ended up implementing the full Lambda@Edge solution (and so don’t have concrete numbers for the others), we can make some deductions about relative performance:
- vs. multi-region S3
  - Even with a bucket in every region, the Lambda function would still have to do an intra-region request/response to fetch the content from S3
  - If there was only one bucket per continent, there would be additional inter-region latency
- vs. large TTLs
  - Strictly better in the case of a complete cache miss (no round trip to the origin)
  - Only worse on first page load in a region with no running Lambda
There are also a few non-performance benefits to a pure Lambda@Edge solution (vs. large TTLs):
- Don’t have to deal with invalidation on every site update
- Eliminates the single point of failure of the origin (just in case https://aws.amazon.com/message/41926/ happens again)
Old vs. New
With all of that theory discussed, let’s look at some actual measurements (taken with TurboBytes Pulse):
The old method (GCS + CloudFlare) gave mean TTFBs of ~350ms on effectively every page unless it happened to be in CloudFlare’s cache.
The new method gives global average response times of ~210ms for the first connection in the region, and subsequent loads of uncached pages in ~70ms.
TTFBs of pages in POP caches average ~30ms on both.
With a bit more effort it should be possible to keep a Lambda instance warm in each region, which should completely eliminate the first page TTFB penalty, giving consistent uncached TTFBs of ~70ms. And with that, I’ve basically achieved my original goal (averaging just 10ms higher than I hoped for), significantly bringing down page load times and making the blog extremely snappy.
Will many people notice? No. But was it fun? Absolutely.