Presentation: Streaming Microservices: Contracts & Compatibility

Track: Stream Processing at Large

Location: Broadway Ballroom South, 6th fl.

Level: Intermediate - Advanced

Persona: Data Scientist, Developer

What You’ll Learn

  • Understand the need and role of defining types in an Enterprise Application.
  • Learn techniques and approaches to manage these schemas.
  • Hear war stories and lessons about what can happen if you don’t follow clean practices when defining your types.

Abstract

In a world of microservices that communicate via unbounded streams of events, schemas are the contracts between the services. Having an agreed contract allows the teams developing those services to move fast, by reducing the risk involved in making changes. Yet delivering events with schema change in mind isn’t yet common practice.

In this presentation, we’ll discuss patterns of schema design, schema storage and schema evolution that help development teams build better contracts through better collaboration - and deliver resilient applications faster. We’ll look at how schemas were used in the past, how their meaning has changed over the years, and why they gained particular importance with the rise of stream processing.

Question: 

QCon: What’s the motivation for your talk?

Answer: 

Gwen: I’ve been concerned about the way companies manage their metadata since… 2012 probably, when I first moved from managing relational databases to managing Hadoop clusters. DBAs take metadata, especially schemas, for granted. And then suddenly Hadoop was this wild west: people just dumped data and no one knew how to use it. You create all those crazy dependencies between teams, because whoever writes the data makes decisions that affect everyone and can break downstream apps at any time. This is even more difficult with stream processing because of the real-time and microservices nature of the applications.

I’ve spent the last five years working with customers on solving this problem with different tools and environments. I feel like I have quite a lot to share.

Question: 

QCon: How would you describe the persona of the target audience of this talk?

Answer: 

Gwen: The relevant role is usually “enterprise architect”, because they have overall responsibility for how different applications communicate and play together. Although I hope that many responsible engineers care as well. My target audience is usually from medium to huge companies - you need to be of a certain size before questions of compatibility become important.

Question: 

QCon: How are you going to address these things?

Answer: 

Gwen: I'm going to spend part of my time just telling horror stories about what happens if you don't manage your schemas (I have four years' worth of horror stories to share). Then I'm going to talk about how it's a general problem. It doesn't matter whether you use Avro, JSON, or something else. It doesn't even matter whether you do stream processing at all; it's a very generic problem of how components, services, and teams communicate.

Then I'm going to go into some solutions, including the Confluent Schema Registry. It's important to note, though, that there are lots of other solutions you can use too.
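To make the compatibility idea concrete, here is a minimal sketch of the kind of backward-compatibility rule a schema registry enforces for Avro-style record schemas: a new schema can still read old data only if every field it adds carries a default, and shared fields keep their types. This is a simplified illustration written for this article, not the registry's actual algorithm (real checkers also handle type promotion, unions, aliases, and nested records).

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified backward-compatibility check for Avro-style records."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            # Added field: must have a default so records written with the
            # old schema can still be read.
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            # Type changes would need promotion rules; treat as incompatible.
            return False
    return True

v1 = {"type": "record", "name": "Order", "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
]}

# v2 adds a field with a default: backward compatible.
v2 = {"type": "record", "name": "Order", "fields": v1["fields"] + [
    {"name": "currency", "type": "string", "default": "USD"},
]}

# v3 adds a required field with no default: old records can't be read.
v3 = {"type": "record", "name": "Order", "fields": v1["fields"] + [
    {"name": "customer_id", "type": "string"},
]}

print(is_backward_compatible(v1, v2))  # True
print(is_backward_compatible(v1, v3))  # False
```

A registry runs a check like this at registration time, so an incompatible schema is rejected before any producer ships it.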

I want to end the talk with a few examples of the potential in this kind of centralized stream and schema management. In addition to the immediate compatibility benefits, a centralized metadata store can be used for data discovery and for governance. I hope to share some examples of what forward-looking enterprise architects in some organizations are currently exploring.

Question: 

QCon: QCon targets advanced architects and senior development leads. What do you feel will be the actionable takeaways that type of persona will walk away from your talk with?

Answer: 

Gwen: If you use events to communicate between applications (and this includes all stream processing apps), you absolutely need to figure out a way to detect and prevent schema compatibility issues early in the development process. You also need reasonable ways to allow schemas to change without breaking things. My talk is full of suggestions on how to do both.
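The second half of that advice, letting schemas change without breaking consumers, rests on schema resolution: a consumer on a newer schema reads a record written with an older one by filling missing fields from defaults. The resolver below is a simplified hypothetical written for this article, not a real Avro library's API; real implementations also handle type promotion, unions, and nested records.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Read a record written with an older schema using a newer reader
    schema, filling absent fields from the reader schema's defaults."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} missing and has no default")
    return resolved

# Reader schema v2 added user_agent with a default.
reader_v2 = {"type": "record", "name": "Click", "fields": [
    {"name": "url", "type": "string"},
    {"name": "user_agent", "type": "string", "default": "unknown"},
]}

old_record = {"url": "https://example.com"}  # written with the v1 schema
print(resolve(old_record, reader_v2))
# {'url': 'https://example.com', 'user_agent': 'unknown'}
```

Because the new field carries a default, consumers can upgrade before producers (or after), and neither side breaks during the transition.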

Question: 

QCon: What do you feel is the most important thing/practice/tech/technique for a developer/leader in your space to be focused on today?

Answer: 

Gwen: The transition from both request-response processing and batch processing to stream processing.

Every business has many applications that are either request-response or batch for historical reasons - but the real business process they model is a continuous stream of events. Using new technologies to model the business process more accurately in the applications will help make the entire process more efficient and more timely.

I am typically wary of cutting-edge technologies and prefer to use proven systems (like Kafka!), but one of the technologies I am currently most curious about is Lyft’s Envoy. I hope to learn more about it at QCon NYC.

Speaker: Gwen Shapira

Engineering manager @Confluent, Apache Kafka PMC, author of “Kafka: The Definitive Guide”

Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating microservices, relational, and big data technologies. She currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or building data pipelines, you can find her pedaling on her bike exploring the roads and trails of California, and beyond.