Our on-call engineers were paged on November 13 at 12:21pm with an alert indicating that our production website was loading slowly.
We identified that Rack::Attack, our anti-DDoS protection middleware, was causing significant slowdowns in page loads. Specifically, a cache write during the request was the source of much of the slowdown.
This pointed to an issue with our cache. As a temporary fix, we recreated our memcached instance, which immediately resolved the issue.
Since this incident, we’ve added a break-glass switch that lets us quickly disable Rack::Attack in the event that it’s unable to read from its cache. We’ve also switched it to a dedicated, highly available Redis instance for its caching layer, separate from the rest of the production application.
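To make the remediation concrete, here is a minimal sketch of a fail-open counter store with a break-glass switch. It is not our actual configuration and does not use Rack::Attack itself: `FailOpenStore`, `MemoryStore`, `BrokenStore`, `allowed?`, the `THROTTLE_DISABLED` variable, and the 50 ms timeout are all hypothetical names and values chosen for illustration, and only the Ruby standard library is used.

```ruby
require "timeout"

# Hypothetical wrapper around any counter store that responds to #increment.
# If the store is slow or raises, the wrapper returns nil instead of blocking
# or failing the request (it "fails open").
class FailOpenStore
  def initialize(store, timeout: 0.05, env: ENV)
    @store = store
    @timeout = timeout # seconds to wait on the cache (assumed value)
    @env = env
  end

  # Break-glass: operators set THROTTLE_DISABLED=1 (assumed name) to bypass
  # the cache entirely.
  def disabled?
    @env["THROTTLE_DISABLED"] == "1"
  end

  def increment(key, amount = 1)
    return nil if disabled?
    Timeout.timeout(@timeout) { @store.increment(key, amount) }
  rescue StandardError
    # Timeout::Error and ordinary cache errors both land here.
    nil
  end
end

# A request is allowed when the counter is under the limit, or when the
# counter could not be obtained at all.
def allowed?(store, key, limit)
  count = store.increment(key)
  count.nil? || count <= limit
end

# Stand-ins for a healthy and an unreachable cache.
class MemoryStore
  def initialize
    @counts = Hash.new(0)
  end

  def increment(key, amount = 1)
    @counts[key] += amount
  end
end

class BrokenStore
  def increment(_key, _amount = 1)
    raise IOError, "cache unreachable"
  end
end
```

With a healthy store, requests past the limit are rejected. With a broken store, or with the switch set, every request is allowed. The trade-off in this sketch is that throttling protection is lost while the cache is unhealthy, in exchange for the site staying responsive.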