CSAW CTF 2017 Infrastructure Overview
25 Sep 2017
Tagged: sysadmin
(Cross-posted from my entry in the OSIRIS Lab’s blog: https://blog.isis.poly.edu/2017/09/25/csaw-ctf-2017-infra/)
We’ve had a few people ask us over the past couple of years how we deploy CTFd and our challenges to serve the more than 2000 teams we have in CSAW CTF Quals, so here’s a quick post explaining how we do it.
First things first: you’re going to need hardware. Not a lot, but definitely at least 4 cores. We run CTFd as an LXC container on some very large (40 core, 160GB RAM) machines the OSIRIS lab owns. However, even at peak traffic the box’s CPU ends up being <10% utilized, so 4 cores should be able to handle the traffic we serve. A t2.xlarge on EC2 (or an equivalent with 4 cores and 8+ GB of RAM on another cloud provider) should do nicely.
Next, we use Cloudflare. Even at peak, we don’t end up sending much traffic outbound so this is primarily just to make the site more responsive for people outside the US by caching static resources (JS & CSS primarily) on their CDN. Being able to offload ~1 million queries never hurts!
In between Cloudflare and gunicorn (more on that in a minute), we have nginx. nginx is mostly here to handle any static assets that get through Cloudflare more efficiently, but we also use it as another layer to filter requests and drop bad traffic.
Finally on the www side, we have gunicorn itself, which is responsible for running CTFd. We run gunicorn with 16 workers, so that at the ~50 req/sec we see sustained, each worker handles just a few requests per second. We’ve ab’d this (that is, benchmarked it with ApacheBench), and this setup can easily handle more than 150 concurrent requests, so we’ve got plenty of room to grow. Note here: uwsgi DOES NOT work with CTFd. As far as we can tell, this is due to uwsgi forking Python after the DB connection(s) have been created, which causes multiple worker processes to share the same underlying connection, which is a bad thing™.
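As a rough illustration of the gunicorn setup described above, a config file could look something like the following. This is a minimal sketch, not our actual config: the bind address, the `expected_rps` name, and the `per_worker_rate` helper are assumptions made up for illustration. Only the worker count and the ~50 req/sec figure come from this post.

```python
# gunicorn.conf.py: illustrative sketch, not the actual CSAW config.
# Started with something like:
#   gunicorn -c gunicorn.conf.py "CTFd:create_app()"

# Assumption: gunicorn listens on localhost only, with nginx proxying to it.
bind = "127.0.0.1:8000"

# From the post: 16 worker processes.
workers = 16


def per_worker_rate(requests_per_second, worker_count):
    """Average requests per second each worker sees if load splits evenly."""
    return requests_per_second / worker_count


# Assumption for illustration: the ~50 req/sec sustained load mentioned above.
expected_rps = 50

if __name__ == "__main__":
    # Running this file directly just prints the back-of-the-envelope number.
    print(f"{per_worker_rate(expected_rps, workers):.1f} req/sec per worker")
```

gunicorn only picks up the names it recognizes as settings (`bind` and `workers` here), so the helper at the bottom is just there to show the arithmetic.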
And last but not least, the database. We have an internal 3-host MariaDB/Galera cluster which handles all DB traffic. None of the nodes is anything fancy, so running a MySQL/MariaDB instance on the same host as CTFd should work fine.
The last couple of years we’ve been deploying challenges with Docker, which has made it so much easier to manage and reset challenges when they inevitably go down or break. We have 5 different VMs (all roughly 4c/8g of RAM like the CTFd instance above), each of which handles the challenges for one of the main 5 categories in CTF: pwn, RE, web, crypto, and misc. The RE VM in particular is not needed most of the time, so this could easily be folded down into 4 VMs.
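To give a feel for what "resetting" a Docker-hosted challenge can look like, here is a small sketch that builds the docker commands to throw away a broken container and start a fresh one. The challenge name, image, port, and restart policy are all hypothetical placeholders, not our real challenges or settings, and the script only prints the commands rather than running them.

```python
import shlex

# Hypothetical challenge definitions: name -> (image, port).
# These are placeholders, not real CSAW challenges.
CHALLENGES = {
    "example-pwn": ("example-pwn:latest", 8000),
}


def reset_commands(name, image, port):
    """Return the docker CLI commands that replace a container with a fresh one."""
    return [
        # Remove the old container whether or not it is still running.
        # (docker exits non-zero here if no container with that name exists.)
        ["docker", "rm", "-f", name],
        # Start a clean copy, published on the same host port.
        ["docker", "run", "-d", "--name", name,
         "--restart", "unless-stopped",
         "-p", f"{port}:{port}", image],
    ]


if __name__ == "__main__":
    for name, (image, port) in CHALLENGES.items():
        for cmd in reset_commands(name, image, port):
            # Dry run: print the command instead of executing it.
            # To actually run it, use subprocess.run(cmd).
            print(shlex.join(cmd))
```

Because a container's writable layer is thrown away when it is removed, the new container starts from the pristine image, which is what makes this kind of reset cheap.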
This year, we added another layer into the mix which was super useful: our own Docker registry. The registry ran on a completely separate internal box, and was responsible for rebuilding the Docker images for each challenge as they were added or changed in the challenge repo. This allowed us to push changes to the repo, and just seconds later pull down new Docker images from the challenge VMs without having to worry about manually rebuilding or burning disk space on the challenge VMs with old/temporary images.
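The build-then-pull flow above can be sketched roughly as follows. This is not our actual tooling: the registry hostname, the `<category>/<challenge>/Dockerfile` repo layout, and all function names are assumptions for illustration. The sketch only constructs the `docker build`, `docker push`, and `docker pull` commands so the flow is visible without running anything.

```python
from pathlib import Path

# Assumption: hypothetical registry address; substitute your own.
REGISTRY = "registry.example.internal:5000"


def image_ref(category, challenge, tag="latest"):
    """Registry-qualified image name, e.g. <registry>/pwn/example:latest."""
    return f"{REGISTRY}/{category}/{challenge}:{tag}"


def build_and_push_commands(challenge_dir):
    """Commands the registry box would run for one challenge directory.

    Assumes a hypothetical repo layout of <category>/<challenge>/Dockerfile.
    """
    path = Path(challenge_dir)
    ref = image_ref(path.parent.name, path.name)
    return [
        ["docker", "build", "-t", ref, path.as_posix()],
        ["docker", "push", ref],
    ]


def pull_command(category, challenge):
    """Command a challenge VM would run to fetch the freshly built image."""
    return ["docker", "pull", image_ref(category, challenge)]


if __name__ == "__main__":
    # Dry run: print what would happen for one hypothetical challenge.
    for cmd in build_and_push_commands("pwn/example"):
        print(" ".join(cmd))
    print(" ".join(pull_command("pwn", "example")))
```

The point of the split is that the challenge VMs only ever run the pull side, so build caches and intermediate layers pile up on the registry box instead.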
Dockerfiles also give us the ability to quickly redeploy elsewhere in the case of catastrophic failure. Since they’ve become essentially the industry standard for containers, we can easily drop them onto AWS ECS (or equivalent) and be back up and running on completely different hardware in just a few minutes.