CSAW War Stories
21 Sep 2019 Tagged: sysadmin
While writing up my recent post about debugging a problem with CSAW CTF’s website this year, I remembered this post which I started writing about a year ago documenting all of the “fun” stories I have from running CTF over the past 4 years.
This was originally supposed to go along with a blog post from hyper on how to run a good CTF, but, well, that post hasn’t been written yet, and I wanted to go ahead and put this out before I lose it.
2016 Quals - SQL
Relevant background information: we run two completely separate CTFd instances leading up to CSAW CTF going live:
- The production instance which has no challenges loaded into it until the hour or so before CTF starts (mainly so that it’s impossible for challenges to leak if, for instance, the start time is set wrong).
- An internal-only dev instance where the challenge board is curated, descriptions are written, points are assigned, etc.
In 2016, the production instance was running on our freshly minted multi-host MySQL cluster while the development instance was using SQLite - the stock CTFd configuration.
This was also the first year that hyper, breadchris, and I were running the competition, so we had some “time management” issues and were still frantically getting the board set up 15 minutes before the competition started (at 6pm).
It was around that time when I called for a freeze on dev so we could migrate all of the challenge metadata and files to prod since, again, nothing was actually on prod besides team registration at this point.
So in short, we needed to get data from a SQLite DB to a MySQL DB. Seems simple, right? I thought it would be. I exported the challenge-related tables from SQLite to a flat SQL file, and then tried to load it into the MySQL cluster.
But there were syntax errors, lots and lots of syntax errors.
It turns out, unbeknownst to me at the time, that SQLite’s dialect of SQL is juuusssttt different enough from MySQL’s that this doesn’t really work. I forget the exact issue (I believe it had to do with different quotes being used or something like that), but at this point it was 5:50pm, the competition started in 10 minutes, and we had no challenges in prod.
So I open the file in vim and start finding and replacing things by hand. Whether it was the right idea or not, it worked, and at 5:58pm the DB was loaded, and the competition started 2 minutes later.
Except file downloads didn’t work.
Oops, forgot to scp over the files.
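For the SQLite-to-MySQL step, here is a minimal sketch of the kind of find-and-replace I was doing by hand, written as a script instead. It is not what I ran that night, and since I don't remember the exact errors, the two rewrites are assumptions about the usual suspects: older sqlite3 dumps wrap table names in double quotes (which stock MySQL reads as string literals), and the dump includes PRAGMA and BEGIN TRANSACTION lines that MySQL is expected to reject. The function name `sqlite_dump_to_mysql` is made up for illustration.

```python
import re

# Lines a sqlite3 .dump typically emits that MySQL is assumed not to accept.
SKIP_PREFIXES = ("PRAGMA", "BEGIN TRANSACTION")

# Matches only the table name directly after INSERT INTO / CREATE TABLE,
# so double quotes inside string values are left untouched.
TABLE_NAME = re.compile(r'^(INSERT INTO|CREATE TABLE(?: IF NOT EXISTS)?) "([^"]+)"')


def sqlite_dump_to_mysql(dump: str) -> str:
    """Rewrite a SQLite dump line by line into something closer to MySQL's dialect."""
    converted = []
    for line in dump.splitlines():
        if line.upper().startswith(SKIP_PREFIXES):
            continue  # drop SQLite-only statements
        # "table" -> `table`
        converted.append(TABLE_NAME.sub(r"\1 `\2`", line))
    return "\n".join(converted)


if __name__ == "__main__":
    sample = 'PRAGMA foreign_keys=OFF;\nINSERT INTO "challenges" VALUES(1,\'pwn 100\');'
    print(sqlite_dump_to_mysql(sample))
```

This is line-based, so quoted column names inside CREATE TABLE and values that span multiple lines would still need attention. Doing the export and import with matching database engines on dev and prod avoids the problem entirely.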
2016 Finals - no flags 4 u
Having learned a lot from running Quals, 2016 Finals was definitely in a better shape on the day of the competition than Quals was. We had an astounding 30 minutes to get things working on prod before the competition started!
But everything was going smoothly, and I think I even managed to make it down to the competition floor before CTF officially started.
So we announce the competition is live at 10pm, and immediately run into issues - nobody can submit flags.
They always get a 403 Forbidden back despite being logged in.
So I go find a slightly quieter corner of the room and start bug hunting, having no idea what could be going wrong. This was the exact same box running the exact same CTFd that we had in Quals (where flag submission worked just fine), so what could be going wrong?
After something like an hour of looking at this thinking it was an issue with authentication, I resort to (gasp) testing in prod to debug, still thinking this is auth related.
Trying and failing with that, I happened to run git status inside of the CTFd directory on prod, and found that challenges.py has an uncommitted change. That’s odd.
git diff showed that, for some reason unknown to me, an abort(403) line had been inserted as the first line of the flag submission function.
After reverting that change and restarting CTFd all was well.
I later learned that one of the other CTF leads had added that line in just after Quals had ended because flags were still being accepted even though Quals had ended, and I never found out about it because I was fast asleep at that point having spent something like 36 of the 48 prior hours awake.
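A hard-coded abort(403) closes submissions forever. What was actually wanted was "reject flags outside the competition window". Here is a minimal sketch of that idea in plain Python. It is not CTFd's code or API: `submissions_open`, `submit_flag`, and the (status, message) return shape are all hypothetical names chosen for illustration.

```python
from datetime import datetime, timezone


def submissions_open(now: datetime, start: datetime, end: datetime) -> bool:
    """True only while the competition is running (start inclusive, end exclusive)."""
    return start <= now < end


def submit_flag(submitted: str, expected: str,
                now: datetime, start: datetime, end: datetime):
    """Return a (status_code, message) pair for a flag submission."""
    if not submissions_open(now, start, end):
        # Equivalent of abort(403), but driven by configuration instead of
        # an uncommitted edit on the production box.
        return 403, "competition is not running"
    if submitted == expected:
        return 200, "correct"
    return 200, "incorrect"


if __name__ == "__main__":
    start = datetime(2016, 11, 10, 22, 0, tzinfo=timezone.utc)  # made-up dates
    end = datetime(2016, 11, 12, 10, 0, tzinfo=timezone.utc)
    print(submit_flag("flag{x}", "flag{x}", datetime.now(timezone.utc), start, end))
```

With the window stored as configuration, ending Quals and starting Finals only means changing two timestamps, and nothing on prod has to be patched by hand.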
2017 Quals - git push -f
The afternoon before CSAW CTF Quals 2017 was going just as the past year had gone - frantic last minute fixes to nearly all of the challenges. But problems were being resolved, and it looked like we would have everything set up and ready to go (maybe even with more than 2 minutes to spare this year!).
Basically all lab members were in at this point (around 6pm or so) working on resolving issues, writing last minute challenges, etc.
Like I said before, we have time management issues.
One particular member (who will remain anonymous) said that they had just committed a fix for some challenge or another, and so we all pulled to make sure we were up to date.
A few minutes go by, and I think somebody comments that something hasn’t been done yet and for somebody else to add it to the TODO list.
But I had taken care of that issue earlier in the day - like 3 or 4 hours ago at this point.
I checked the file and sure enough, my changes were nowhere to be seen.
Thinking I was on a branch or something, I check out master again, pull, etc. but no, it’s just not there.
I check git log, and I don’t even see the commit message.
But what I do see is a large time gap between the last few commits and the commits before that.
My memory is a bit fuzzy here, but I believe 2 or 3 of us came to the same conclusion at almost the exact same time - somebody had force pushed over the last few hours of work.
At this point, some people start asking on Slack for anybody who might have cloned the challenges repo within the last hour, some people started trying to go through the reflog to find the old commits (maybe?), and I believe hyper put out the message on IRC that we may be delayed for a bit since we were “working on some things”.
At this point it’s convenient to say that I back up everything. Almost religiously. I have real-time file synchronization between 3 different machines, point in time backups of my primary laptop, and occasionally take full-disk images.
Real-time sync won’t help since I pulled the force-pushed repo, and the last full disk image I had taken was months old, but maybe Time Machine has a snapshot from before the force push.
ghost [18:34] I think I know how we can get stuff back if you haven't found a newer clone already
hypersonic [18:34] we havne't
ghost [18:34] I have time machine backups
hypersonic [18:34] lmaoooo
ghost [18:34] Should put us within the past hour
Now you may ask, “why was this in Slack if we were all in the same room?” Well, it’s because I actually came to this realization in the restroom.
After quickly finishing up business there, I hop back into the lab and sure enough, there’s a Time Machine backup from around 5pm which isn’t perfect,
but it would save us about 3 hours of re-doing work. I try to copy it out, but for some reason it fails copying out files inside of
Great. So I pull everything except .git, which thankfully works, completely reinitialize the repo, and add everything back with a single commit message:
2017 Quals - Theme
A very minor blip, but a fun one.
In 2017, we had a new CTFd theme. It was super sweet, but had a few bugs when the competition started, so we (properly) fixed the issues on our dev instance while the competition was running, and copied them over to prod once resolved.
Except while doing this, we also copied the CSAW CTF logo image that had “DEV DEV” written over it in big red letters.
Somebody reloaded the page to make sure things worked on prod, noticed it, and we had the issue fixed in about 2 minutes, but it was a funny mess-up nonetheless and it did get a few comments on IRC.
2019 Quals - Website Slowness
The post I mentioned at the top of the article goes into much more detail about actually debugging this, but the TL;DR is that the CTF site was incredibly slow, with some pages like the scoreboard timing out entirely. After a few hours of debugging, we eventually found that this was due to around a hundred Redis lookups being required per page load, absolutely tanking performance. Moving the particularly common configuration lookups to a local dictionary in Python fixed the problem, taking most page load times from seconds to milliseconds, and making the scoreboard work.
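The "local dictionary" fix can be sketched roughly as below. This is an illustration rather than the actual patch: the class name `CachedConfig`, the `fetch` callable (standing in for whatever does the real Redis lookup), and the time-based expiry are all assumptions I'm adding here. The expiry is there so an admin's config change still shows up eventually.

```python
import time
from typing import Callable, Dict, Tuple


class CachedConfig:
    """Keep recently fetched config values in a process-local dict."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # the slow lookup, e.g. a network round trip
        self._ttl = ttl_seconds  # how long a cached value is trusted
        self._clock = clock      # injectable so the expiry can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # served from the local dict, no round trip
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value


if __name__ == "__main__":
    def slow_lookup(key: str) -> str:
        time.sleep(0.01)         # pretend this is a network call
        return "value-for-" + key

    config = CachedConfig(slow_lookup)
    for _ in range(100):         # roughly one page load's worth of lookups
        config.get("ctf_name")   # only the first call pays the 10 ms
```

The trade-off is staleness: each worker process may serve an old value for up to `ttl_seconds`, which seems acceptable for settings that almost never change during a competition.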
- SQLite != MySQL
- Time manage
- Don’t force push
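The last lesson can be partly automated. Here is a minimal sketch, assuming you recorded the remote branch's commit hash before fetching: git merge-base --is-ancestor exits with 0 when the first commit is an ancestor of the second and 1 when it is not, so a non-ancestor result means history was rewritten. The function name `is_fast_forward` and the injectable `run` parameter are hypothetical. We did not have anything like this in place at the time.

```python
import subprocess


def is_fast_forward(old_tip: str, new_tip: str, repo: str = ".",
                    run=subprocess.run) -> bool:
    """True if old_tip is an ancestor of new_tip, i.e. no commits were dropped."""
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", old_tip, new_tip],
        check=False,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False   # old_tip is not in new_tip's history: likely a force push
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

Called with the old and new hashes of the remote branch, a False result is the cue to stop pulling and go find a clone (or a backup) that still has the old commits.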