Presentation: CockroachDB: Architecture of a Geo-Distributed SQL Database
What You’ll Learn
- Hear about distributed SQL databases: what it takes to build one and some of the problems that need to be solved.
- Hear how the team addressed the complexity of distributed databases with CockroachDB.
- Find out why it is better to use a battle-tested distributed database than to build an in-house solution.
Abstract
In this talk Cockroach Labs' CTO and co-founder, Peter Mattis, will speak to the architecture of an open-source, geo-distributed, SQL database. The talk will be a whirlwind tour of CockroachDB’s internals, covering the usage of Raft for consensus, the challenges of data distribution, distributed transactions, distributed SQL execution, and distributed SQL optimizations.
Could you tell me what the focus of your work today is?
I am VP of Engineering and CTO of Cockroach Labs. I keep the lights on for the engineering department at Cockroach Labs and make sure product development stays on track for CockroachDB. I've been involved in every aspect of CockroachDB, but today my hands-on engineering work is much more limited, focusing on specifics like the performance of our low-level key-value storage. That gets about 20% of my time; the other 80% goes to helping the rest of the teams function and work smoothly.
What's the motivation for your talk?
To give an architectural overview of the components that go into building this distributed SQL database. I want to talk about this from the base level: how we do replication and how our data is spread across the cluster. Then I'll step up through the levels: how we do distributed transactions and some of the optimizations involved there, then up past the distributed transactional store into SQL, how SQL execution works, how we map SQL data down onto key-value data, and finally, since we have a cost-based SQL optimizer, how the SQL optimizer plays into a distributed SQL database.
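To make the "mapping SQL data down onto key-value data" idea concrete, here is a minimal sketch in Go. It assumes a simplified textual key layout of the form /table/index/primary-key/column with one KV entry per non-key column; the real CockroachDB encoding is a binary format and differs in detail, so treat the shapes and names here as illustrative assumptions only.

```go
package main

import "fmt"

// Row is a hypothetical SQL row: one primary-key column and two
// ordinary columns.
type Row struct {
	ID    int64  // primary key
	Name  string // column "name"
	Email string // column "email"
}

// encodeRow maps one row of a table into ordered key-value pairs.
// Key shape (an illustrative assumption, not the real encoding):
//   /<tableID>/<indexID>/<primary key>/<column name>
// where index 1 is the primary index.
func encodeRow(tableID int64, r Row) map[string]string {
	kvs := make(map[string]string)
	kvs[fmt.Sprintf("/%d/1/%d/name", tableID, r.ID)] = r.Name
	kvs[fmt.Sprintf("/%d/1/%d/email", tableID, r.ID)] = r.Email
	return kvs
}

func main() {
	row := Row{ID: 42, Name: "carl", Email: "carl@example.com"}
	for k, v := range encodeRow(51, row) {
		fmt.Println(k, "=>", v)
	}
}
```

Because keys share the /table/index prefix and sort by primary key, a SQL table scan becomes an ordered range scan over the underlying key-value store.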
How would you describe the persona and level of the target audience?
I think this talk could be interesting to anybody who uses SQL databases. It might be more interesting to people who have some knowledge of the internals, but even if you don't have knowledge of the internals of a SQL database, I think you'd get something out of this talk: understanding what's going on behind the scenes, peeling back the curtain. Sometimes, if you're not familiar with the inner workings of a system, you attribute black magic to what's going on there.
What is it that makes data distribution difficult?
There's a whole bunch of factors that go into deciding where you want to place your data. You want to spread the storage load out across the system, and you also want to spread the CPU usage across the system. You could have heterogeneous data within your cluster, and you need to take that into account. Then, when you start to create geographic clusters, you start taking latency into account: you want the data to be close to where it is being accessed, and that changes over time. An example of that is that during the course of a day, news sites will see more traffic as the country wakes up and starts reading the morning’s news. There will be different traffic during different parts of the day, and you see hotspots. That's one aspect of it: you're having to adjust to changing load and changing access patterns, moving the data around dynamically.
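As a toy illustration of weighing those placement factors, here is a sketch in Go of the kind of scoring a rebalancer might do when choosing a node for a replica. The factor names, weights, and candidates are invented for illustration; CockroachDB's real allocator is considerably more involved.

```go
package main

import "fmt"

// Node is a hypothetical placement candidate with a few load and
// latency signals.
type Node struct {
	Name         string
	DiskFraction float64 // fraction of disk in use, 0..1
	CPUFraction  float64 // fraction of CPU in use, 0..1
	RTTMillis    float64 // latency to where the data is mostly accessed
}

// score returns a lower-is-better placement score combining storage
// load, CPU load, and proximity to the data's consumers. The weights
// are arbitrary assumptions for this sketch.
func score(n Node) float64 {
	return 0.4*n.DiskFraction + 0.4*n.CPUFraction + 0.2*(n.RTTMillis/100)
}

func main() {
	candidates := []Node{
		{Name: "us-east-1", DiskFraction: 0.70, CPUFraction: 0.55, RTTMillis: 5},
		{Name: "us-west-2", DiskFraction: 0.30, CPUFraction: 0.20, RTTMillis: 70},
	}
	best := candidates[0]
	for _, n := range candidates[1:] {
		if score(n) < score(best) {
			best = n
		}
	}
	fmt.Println("place replica on:", best.Name)
}
```

Because the inputs (load, access locality) change over the course of a day, a rebalancer re-evaluates scores like these continuously and moves replicas as the answer changes.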
I saw in the abstract you are using Raft for your consensus protocol. I was curious as to why you would choose that over Paxos.
When we started working on CockroachDB, Paxos was well known, and Diego Ongaro had just written his thesis about Raft. This was four or five years ago now, and Raft was considered to be an easier, more approachable consensus protocol. Now, having worked with Raft, and having come to thoroughly understand Paxos as well, I'm not sure the simplifications are as dramatic as advertised. These things are all complex at some level. It's interesting, though: having worked on Cockroach for four-plus years now, Raft is one bit of complexity, but it's actually understandable. There are other levels out there that I think are more significant than Raft. Looking back in hindsight, would we have chosen Raft if we were starting today? I don't know. Heidi Howard just published a hundred-fifty-page thesis about Paxos, and there are ideas there that we might be adopting.
We're wedded to Raft in one way, which is that we need to maintain backwards compatibility, but we can evolve the distributed consensus protocol that we use, and we're looking to do that in the next release. Something that was described in Diego's thesis but never implemented in our version of Raft is a protocol for changing the members of a Raft group, and that's something we're looking to implement in the next release.
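For context, the membership-change protocol in Ongaro's thesis uses joint consensus: while the group transitions from the old configuration to the new one, an entry commits only when a majority of both configurations acknowledge it. Here is a minimal conceptual sketch in Go of that commit rule; the types and names are assumptions for illustration, not CockroachDB's or etcd/raft's implementation.

```go
package main

import "fmt"

// Config is a hypothetical Raft voting configuration.
type Config struct {
	Voters []string
}

// majority reports whether a majority of the configuration's voters
// appear in the acknowledgement set.
func majority(c Config, acks map[string]bool) bool {
	n := 0
	for _, v := range c.Voters {
		if acks[v] {
			n++
		}
	}
	return n > len(c.Voters)/2
}

// committedJoint implements the joint-consensus commit rule: while in
// the transitional configuration C_old,new, an entry is committed only
// if majorities of BOTH the old and the new configuration acknowledge it.
func committedJoint(oldCfg, newCfg Config, acks map[string]bool) bool {
	return majority(oldCfg, acks) && majority(newCfg, acks)
}

func main() {
	oldCfg := Config{Voters: []string{"n1", "n2", "n3"}}
	newCfg := Config{Voters: []string{"n2", "n3", "n4"}}
	acks := map[string]bool{"n2": true, "n3": true}
	fmt.Println("committed in joint config:", committedJoint(oldCfg, newCfg, acks))
}
```

The joint rule is what keeps the group safe mid-transition: neither the old nor the new membership can unilaterally commit entries, so there is never a moment with two disjoint quorums.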
What do you want attendees who come to the talk to walk away with?
I want them to walk away with some understanding of the hard work that goes into building a distributed database, an appreciation for what goes on. Some people think, hey, distributed SQL databases or distributed databases are easy, we just need to take Raft and something else, stack them together, and build it. I want them to walk away saying, oh, this is really hard software engineering at various levels, and you're better off using a system that has been hardened and battle-tested and is being worked on seriously by a group of database engineers. Databases in general should be taking hard work off of applications, making applications easier to build and run. For too long, the NoSQL movement in the industry said, we need horizontal scalability, so we'll push the burden onto the application developers, and we felt that was an error. I think the industry is starting to agree: it's better for the database to provide transactions and indexes and SQL than to have application devs provide that themselves. I want application developers who come to watch this talk to say, yes, I want the database to take care of that, because there are hard problems being solved there.