QCon New York, June 15-19, 2020 | Nonconformist Resilience: DB-Backed Job Queues

What You’ll Learn

Discover the hidden complexity implicit in common message-bus-based approaches to background work.
Reset expectations of what your platform can bring to correctness and resilience at high velocity and team scale.
Understand the qualities that might make a database-backed job queue right for your next app.

Abstract

Resilience in the face of chaos is a tall order. As a vertically integrated financial institution where rapidly delivered features with complete data consistency and scrupulous correctness are all non-negotiable, Betterment had its work cut out for it. So we moved the goalposts: inward. By eliminating complexity that many teams consider table stakes, we’ve built a distributed software ecosystem that empowers engineers to do their best work with a minimum of high-wire distributed systems thinking.

One of the complexity-obliterating weapons in our arsenal is our approach to background work. I’ll present how we use, deploy, and even love Delayed::Job (DJ; yes, a database-backed job queue) at Betterment for its transactional enqueue semantics, safe retry with exponential backoff, and its storage model, which lends itself to simple but powerful SLA-based monitoring and alerting. DJ enables engineers to pour their creativity into their features and get resilience by default.
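Retry with backoff and SLA-style monitoring both fall out of jobs being ordinary rows in a table. The following is a minimal sketch using Python's standard-library `sqlite3` module; the `jobs` schema, `MAX_ATTEMPTS`, `BASE_DELAY_SECONDS`, and the function names are hypothetical and are not Delayed::Job's actual schema or API. It assumes successful jobs are deleted, so any row still pending well past its `run_at` signals a backlog.

```python
import sqlite3

# Hypothetical jobs table with integer Unix-second timestamps. The schema is
# an assumption for illustration, not Delayed::Job's actual schema.
SCHEMA = """
CREATE TABLE jobs (
    id         INTEGER PRIMARY KEY,
    handler    TEXT    NOT NULL,
    attempts   INTEGER NOT NULL DEFAULT 0,
    run_at     INTEGER NOT NULL,
    last_error TEXT,
    failed_at  INTEGER
)
"""

MAX_ATTEMPTS = 5         # assumption: stop retrying after this many failures
BASE_DELAY_SECONDS = 5   # assumption: base of the exponential schedule


def backoff_seconds(attempts: int) -> int:
    """Exponential backoff: the delay doubles with each failed attempt."""
    return BASE_DELAY_SECONDS * 2 ** attempts


def run_job(conn: sqlite3.Connection, job_id: int, func, now: int) -> None:
    """Run one job: delete its row on success; on error, reschedule or fail it."""
    (attempts,) = conn.execute(
        "SELECT attempts FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    try:
        func()
    except Exception as exc:  # any exception counts as a failed attempt
        attempts += 1
        if attempts >= MAX_ATTEMPTS:
            # Give up, but keep the row (and its error) for a human to inspect.
            conn.execute(
                "UPDATE jobs SET attempts = ?, last_error = ?, failed_at = ? WHERE id = ?",
                (attempts, repr(exc), now, job_id),
            )
        else:
            # Retry later; run_at moves further out after each failure.
            conn.execute(
                "UPDATE jobs SET attempts = ?, last_error = ?, run_at = ? WHERE id = ?",
                (attempts, repr(exc), now + backoff_seconds(attempts), job_id),
            )
    else:
        # Success: the work is done, so the row goes away.
        conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
    conn.commit()


def overdue_count(conn: sqlite3.Connection, now: int, sla_seconds: int) -> int:
    """SLA check: pending (not failed) jobs that came due more than sla_seconds ago."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE failed_at IS NULL AND run_at < ?",
        (now - sla_seconds,),
    ).fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO jobs (handler, run_at) VALUES ('flaky', 0)")
    conn.commit()

    def flaky():
        raise RuntimeError("downstream unavailable")

    run_job(conn, 1, flaky, now=100)
    # (1, 110): first failure, retry scheduled 10 seconds out
    print(conn.execute("SELECT attempts, run_at FROM jobs WHERE id = 1").fetchone())
    print(overdue_count(conn, now=1000, sla_seconds=600))  # 1: due at 110, past the SLA
```

Because the alert is just a `COUNT(*)` over the same table the workers use, it needs no extra infrastructure. A real worker would also have to claim rows so that two workers never run the same job at once; that locking is omitted from this sketch.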

Question:

QCon: What is the focus of your work today?

Answer:

John: I lead software architecture at Betterment, which means I work with people across Betterment’s engineering team: staying apprised of new developments and challenges throughout the org, cross-pollinating best practices and a shared vision for our platform, and regularly diving deep into the code alongside domain owners.

Question:

QCon: What’s the motivation for your talk?

Answer:

John: A lot of companies end up selecting patterns based on industry norms, but sometimes the accepted patterns have rough edges that can permanently leak into your app layer and cause pain. There’s a strong sense in the industry right now that you should never use a database as a job queue, and should instead delegate to a product that’s called a queue. There are valid reasons to prefer a dedicated queue, but there are also reasons not to, and those often get short shrift. At Betterment, we build a suite of products whose correctness and consistency people rightly care a great deal about. What folks don’t generally realize is how hard it is, when you coordinate across two datastores (and a queue is a second datastore), to perform a transaction that also enqueues background work, and then to ensure that the background work gets worked if and only if the transaction commits. Many teams end up addressing these edge cases in their business logic on a per-feature basis, rather than simply eliminating the problem by unfashionably using the database as a work queue.
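To make the transactional-enqueue point concrete, here is a minimal sketch using Python's standard-library `sqlite3` module as a stand-in for the application database. The `transfers` and `jobs` tables, the `send_confirmation_email` handler name, and `create_transfer` are all hypothetical; this is not Betterment's or Delayed::Job's code. Because the job row is inserted in the same database transaction as the business write, a rollback discards both.

```python
import sqlite3

# Hypothetical schema: a business table and a jobs table in the SAME database.
SCHEMA = """
CREATE TABLE transfers (id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, handler TEXT NOT NULL, arg INTEGER NOT NULL);
"""


def create_transfer(conn, account, amount, fail_after_enqueue=False):
    """Write a transfer and enqueue its follow-up job in one transaction.

    `with conn:` commits if the block completes and rolls back if it raises,
    so the job row exists if and only if the transfer row does.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO transfers (account, amount) VALUES (?, ?)", (account, amount)
        )
        conn.execute(
            "INSERT INTO jobs (handler, arg) VALUES (?, ?)",
            ("send_confirmation_email", cur.lastrowid),
        )
        if fail_after_enqueue:
            # Simulate a validation error discovered after the enqueue.
            raise ValueError("transfer rejected")


def counts(conn):
    """Return (number of transfers, number of enqueued jobs)."""
    transfers = conn.execute("SELECT COUNT(*) FROM transfers").fetchone()[0]
    jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return transfers, jobs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    create_transfer(conn, "acct-1", 500)
    try:
        create_transfer(conn, "acct-2", 900, fail_after_enqueue=True)
    except ValueError:
        pass
    print(counts(conn))  # (1, 1): the rejected transfer left no orphan job
```

With a separate broker, a message published inside the transaction could outlive a rollback, and a message published after commit could be lost if the process died in between. The single-database version has neither window, at the cost of putting queue load on the database.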

There will definitely be pushback from some folks on the basis of scalability and throughput, and those are real concerns for some applications, but certainly not all. In many cases, there are other levers you should think about pulling to alleviate those concerns before switching jobs to a dedicated queue. I’ll be presenting an honest, warts-and-all accounting of the tradeoffs so that, ideally, folks in the room can apply them to their distinctive problem spaces and come away with better outcomes, fully aware of the pros and cons of the choices they make.

Question:

QCon: How would you describe the persona of the target audience of this talk?

Answer:

John: Engineers building new platforms or evaluating technology for future revs of their platforms would be the sweet spot. Background work is something that most grown-up apps need to perform, and there doesn’t seem to be much info out there about the pros and cons of different approaches.

Speaker: John Mileham

VP Architecture @Betterment

John Mileham is VP of Architecture at Betterment. After a decade building industry-defining products like Berklee Online, John, a glutton for punishment, decided to switch to FinTech, where he could build industry-defining products that, just for fun, must also protect people's life savings and most sensitive personal information: first at ImpulseSave, and now at Betterment.

John is passionate about humane software, teaching, and unlocking teams' potential to move fast, safely. Fittingly, in his free time, he teaches people to drive fast, safely, at race tracks throughout the Northeast as an instructor with the Audi Club.