Track: Stream Processing at Large

The software industry has learned that the world’s data can be represented as unbounded queues of changes. Those queues can be sliced into sliding windows, aggregated, rolled up, and analyzed. We can choose from a number of ways to do this work, such as Kafka Streams or Spark Streaming, or we can opt for Apache Beam, Storm, Samza, Flume, or Flink. We have a large pool of options on which to build powerful systems, but accidental complexity lurks in every one of these choices (a minimal sketch of such a pipeline follows the questions below):

  • What if I need to rebuild all the data?
  • How do I know when my system is not healthy?
  • How do I reason about time in this system?
  • What if things arrive out of order?
  • How do I know things have arrived?
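
As a concrete illustration of the kind of pipeline described above, here is a minimal sketch of a windowed aggregation, assuming a recent Kafka Streams API; the topic names, broker address, application id, and five-minute window are hypothetical placeholders rather than recommendations.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.*;

    public class EventCountsPerWindow {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-event-counts"); // hypothetical application id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // hypothetical broker

            StreamsBuilder builder = new StreamsBuilder();

            // An unbounded queue of changes, keyed by entity id (hypothetical topic name).
            KStream<String, String> events =
                builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));

            // Slice the stream into 5-minute windows and count events per key.
            KTable<Windowed<String>, Long> counts = events
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .count();

            // Emit the rolled-up counts downstream for further analysis.
            counts.toStream()
                  .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count))
                  .to("event-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }

Even in a pipeline this small, the questions above still apply: replaying the input topic rebuilds the counts, but reasoning about event time, late arrivals, and completeness has to be designed in deliberately.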

This track walks through uses of streaming technologies at large, the problems encountered, and how teams are coping with the state of this new world. As streaming systems approach maturity, the companies using them are growing ecosystems and best practices around building and operating them. They are discovering new ways to reason about monitoring, testing, performance, and failure. This track is an opportunity to learn from their experiences.

Track Host: Michelle Brush

SRE Manager @Google

Michelle Brush is a math geek turned computer geek with over 15 years of software development experience. She has developed algorithms and data structures for pathfinding, search, compression, and data mining in embedded as well as distributed systems. In her current role as an SRE Manager for Google, she leads teams of SREs that ensure GCE's APIs are reliable. Previously, she served as the Director of HealtheIntent Architecture for Cerner Corporation, responsible for the data processing platform for Cerner’s Population Health solutions. Prior to her time at Cerner, she was the lead engineer for Garmin's automotive routing algorithm.

Track Host Interview

  • QCon: So who is the audience for the track that you're hosting?
  • Michelle: This track is for someone who has some exposure to streaming systems, either because they're working with one, have buyer's remorse, and are asking themselves whether they picked the right streaming engine, or because they're an architect or software engineer who is currently dealing with a batch system and thinking about moving to streaming. This track will give attendees an idea of what's in store when it comes to streaming.

  • QCon: What do you hope someone walks away from this track with?
  • Michelle: First, that they need a schema registry! Besides that, gaining an awareness of how much complexity is still left after you've made the initial decision about which framework or engine you're going to pick. Streaming as a whole is not yet in a state that's anywhere near consumable or usable, which is evidenced by the constant explosion of thrilling new frameworks still being built out right now. It's getting much better, but it's still far from being solved.

  • QCon: What questions will your track answer for attendees?
  • Michelle: Streaming is the direction that I see most architectures going in the future. If you're in microservices, you're probably going to move to streams combined with functions as a service. If you're in batch, demands for lower latency are going to increase, and people are going to want feedback on their algorithms, analytics, and machine learning faster.

Personalizing Netflix with Streaming Datasets

Streaming applications have historically been complex to design and implement because of the significant infrastructure investment they require. However, recent active development in various streaming platforms provides an easier transition to stream processing and enables analytics applications and experiments to consume near-real-time data without massive development cycles.

This talk will cover the experiences Netflix’s Personalization Data team had in stream processing unbounded datasets. The datasets consisted of - but were not limited to - the stream of playback events (all of Netflix’s plays worldwide) that are used as feedback for recommendation algorithms. These datasets, when ultimately consumed by the team's machine learning models, directly affect the customer’s personalized experience. As such, the impact is high and the tolerance for failure is low.

This talk will provide guidance on how to determine whether a streaming solution is the right fit for your problem. It will compare micro-batch versus event-based approaches, with Apache Spark and Apache Flink as examples. Finally, the talk will discuss the impact that moving to stream processing had on the team's customers, and (most importantly) the challenges faced.
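
To make the contrast concrete, here is a minimal sketch of the event-at-a-time side, assuming Flink's DataStream API; the PlaybackEvent type, the placeholder source, the window size, and the out-of-orderness bound are hypothetical stand-ins, not a description of Netflix's actual pipeline.

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class PlaybackEventCounts {
        // Hypothetical event type: one record per play, carrying the title and when it happened.
        public static class PlaybackEvent {
            public String titleId = "title-1";
            public long eventTimeMillis = System.currentTimeMillis();
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder source; in practice this would be a Kafka (or similar) connector.
            DataStream<PlaybackEvent> plays = env.fromElements(new PlaybackEvent());

            plays
                // Event time with bounded out-of-orderness, so late-arriving plays still land in the right window.
                .assignTimestampsAndWatermarks(
                    WatermarkStrategy.<PlaybackEvent>forBoundedOutOfOrderness(Duration.ofMinutes(1))
                        .withTimestampAssigner((event, ts) -> event.eventTimeMillis))
                // Each event is processed as it arrives, rather than waiting for a micro-batch boundary.
                .map(event -> Tuple2.of(event.titleId, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(pair -> pair.f0)
                .window(TumblingEventTimeWindows.of(Time.minutes(10)))
                .sum(1)
                .print();

            env.execute("playback-event-counts");
        }
    }

In a micro-batch engine such as Spark Streaming, the same logic would be expressed over small batches triggered at a fixed interval, trading per-event latency for simpler batching semantics.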

Shriya Arora, Senior Data Engineer @Netflix

Adopting Stream Processing for Instrumentation

In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.

Taking inspiration from REPLs, literate programming, and DevOps practices, we've designed an interface to our instrumentation system that focuses on interactive feedback, note-taking, and team communication. An engineer can both experiment with new flows at low risk, and codify longer-term practices into runbooks that embed live visualizations of instrumentation data. As a result, we can start to free our users from understanding the mechanics of the stream processor and instead focus on the domain of instrumentation.

In this talk, we will discuss how the interface described above works, how the stream processor manages flows on behalf of the user, and some tradeoffs we have encountered while preparing the system to roll out into our organization.

Sean Cribbs, Software Engineer @Comcast

Streaming Microservices: Contracts & Compatibility

In a world of microservices that communicate via unbounded streams of events, schemas are the contracts between the services. Having an agreed contract allows the teams developing those services to move fast by reducing the risk involved in making changes. Yet delivering events with schema change in mind is not yet common practice.

In this presentation, we’ll discuss patterns of schema design, schema storage, and schema evolution that help development teams build better contracts through better collaboration - and deliver resilient applications faster. We’ll look at how schemas were used in the past, how their meaning has changed over the years, and why they gained particular importance with the rise of stream processing.
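
As a small illustration of what such a contract check can look like, here is a sketch using Apache Avro's compatibility API; the record name, fields, and the specific evolution shown (adding a field with a default value) are illustrative assumptions, not the patterns the talk prescribes.

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;
    import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

    public class ContractCheck {
        // Version 1 of the event contract, as published by the producing service (hypothetical record).
        static final String V1 =
            "{\"type\":\"record\",\"name\":\"OrderPlaced\",\"fields\":["
          + " {\"name\":\"orderId\",\"type\":\"string\"},"
          + " {\"name\":\"amountCents\",\"type\":\"long\"}"
          + "]}";

        // Version 2 adds a field with a default, so consumers reading with v2 can still decode v1 events.
        static final String V2 =
            "{\"type\":\"record\",\"name\":\"OrderPlaced\",\"fields\":["
          + " {\"name\":\"orderId\",\"type\":\"string\"},"
          + " {\"name\":\"amountCents\",\"type\":\"long\"},"
          + " {\"name\":\"currency\",\"type\":\"string\",\"default\":\"USD\"}"
          + "]}";

        public static void main(String[] args) {
            Schema writer = new Schema.Parser().parse(V1);
            Schema reader = new Schema.Parser().parse(V2);

            // Can data written with the old schema be read with the new one?
            SchemaCompatibilityType result = SchemaCompatibility
                .checkReaderWriterCompatibility(reader, writer)
                .getType();

            System.out.println("v2 reads v1: " + result); // expected: COMPATIBLE
        }
    }

A schema registry can run this kind of check at publish time and reject changes that would break existing consumers, which is one reason the track host’s first takeaway is that you need a schema registry.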

Gwen Shapira, Engineering Manager @Confluent, Apache Kafka PMC, author of Kafka: The Definitive Guide

Survival of the Fittest - Streaming Architectures

“Perfect is the enemy of good” - Voltaire

On the journey through life, we learn and adapt via trial and error - software development is no different. We realize and accept that we won’t build the perfect solution the first time around, it takes many iterations. At Gilt.com, now part of HBC Digital, we started processing and streaming event data nearly 5 years ago. Our initial solution was dramatically different from our current solution - and will likely be different from our solution 5 years from now.

The Gilt.com banner at HBC Digital is in the business of flash sales, which makes for some interesting use cases in the world of streaming. We release new sales of top designer labels, at up to 70% off retail, on the web and in our mobile app every day at noon and 9 p.m. Around the time of these releases, we experience volume spikes of 10x to 100x on our streams.

Numerous streaming frameworks, both homemade and open source, did not pass the evolutionary tests. Frameworks come and go, so this talk is not about the “best” framework or platform to use; rather, it’s about core principles that will stand the tests of streaming evolution. The talk also covers major potential pitfalls that you may stumble over on your path to streaming, and how to avoid them. Finally, it will cover what the next evolutionary step in streaming at HBC Digital will be.

Michael Hansen, Principal Data Engineer @hbcdigital

When Streams Fail: Kafka Off the Shore

How good is your streaming framework at failure? Does it die gracefully, telling you exactly at which point it died? Does it tell you why it died? Does it pick up where it left off afterwards? Can it easily skip the "erroneous" portions of the stream? Do you always know what was processed and what wasn't? Does it even have to die when a process, host, or data center fails?

In this talk, we focus on "what if" scenarios and how to evaluate and architect a streaming platform with a high level of resilience. We'll look at Kafka and Spark Streaming as specific examples and share our experience of using these frameworks to process financial transactions, answering the questions above along the way. We'll also show examples of tools that we built along our streaming journey and found invaluable during failure scenarios.
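
One concrete way to keep "what was processed" explicit is to manage Kafka offsets manually, committing only after records have been handled, so that a restart resumes from the last committed position and a poison message can be skipped deliberately. Below is a minimal sketch assuming the plain Kafka consumer API; the topic name, group id, and processing logic are hypothetical.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class TransactionConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");            // hypothetical broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "txn-processor");                      // hypothetical group id
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");                     // we decide when an offset counts as "done"

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("transactions"));               // hypothetical topic

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        try {
                            process(record);                                                  // hypothetical business logic
                        } catch (Exception e) {
                            // A poison message: log it (or route it to a dead-letter topic) and move on,
                            // rather than letting the whole stream die on one bad record.
                            System.err.println("skipping offset " + record.offset() + ": " + e);
                        }
                    }
                    // Offsets are committed only after the batch is handled; on restart,
                    // the consumer picks up from the last committed offset.
                    consumer.commitSync();
                }
            }
        }

        static void process(ConsumerRecord<String, String> record) {
            System.out.println(record.partition() + "/" + record.offset() + " -> " + record.value());
        }
    }

Committing after processing gives at-least-once behavior: after a crash, records between the last commit and the failure will be reprocessed, so downstream handling needs to be idempotent.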

Anton Gorshkov, Managing Director @GoldmanSachs

