Presentation: When Streams Fail: Kafka Off the Shore

Track: Stream Processing at Large

Level: Intermediate

Persona: Data Scientist

Abstract

How good is your streaming framework at failure? Does it die gracefully, telling you exactly where it died? Does it tell you why it died? Does it pick up where it left off afterwards? Can it easily skip the "erroneous" portions of the stream? Do you always know what was processed and what wasn't? Does it even have to die when a process, host, or data center fails?

In this talk we focus on "what if" scenarios and on how to evaluate and architect a streaming platform with a high level of resilience. We'll look at Kafka and Spark Streaming as specific examples and share our experience of using these frameworks to process financial transactions, answering the questions above along the way. We'll also show examples of tools we built along our streaming journey that proved invaluable during failure scenarios.

Speaker: Anton Gorshkov

Managing Director @GoldmanSachs

Anton Gorshkov is a Managing Director at Goldman Sachs Asset Management where he runs a global Core Platform team, focusing on GSAM’s data strategy and real-time services. Anton started at Goldman 15 years ago and worked with numerous groups throughout his career, mostly focusing on data-oriented concerns, ranging from data warehouses to in-memory key-value stores to building a custom language and framework used to generate investment signals.
