Streaming applications have historically been complex to design and implement because of the significant infrastructure investment they require. However, recent developments across a range of streaming platforms have made the transition to stream processing far easier, enabling analytics applications and experiments to consume near-real-time data without massive development cycles.
This talk will cover the experiences Netflix’s Personalization Data team had in stream processing unbounded datasets. The datasets included - but were not limited to - the stream of playback events (all of Netflix’s plays worldwide) that are used as feedback for recommendation algorithms. These datasets, when ultimately consumed by the team's machine learning models, directly affect the customer’s personalized experience. As such, the impact is high and the tolerance for failure is low.
This talk will provide guidance on how to determine whether a streaming solution is the right fit for your problem. It will compare microbatch and event-based approaches, with Apache Spark and Apache Flink as examples. Finally, the talk will discuss the impact moving to stream processing had on the team's customers, and (most importantly) the challenges faced.
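To make the comparison concrete, here is a minimal sketch of the micro-batch side (the broker address and topic name are hypothetical): Spark Structured Streaming re-runs an incremental query on a trigger interval, whereas Flink’s event-based model hands each record to the operators as it arrives.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.streaming.Trigger

    // Sketch of a micro-batch pipeline: each 30-second trigger processes the
    // records that arrived since the previous batch.
    val spark = SparkSession.builder.appName("playback-counts").getOrCreate()
    import spark.implicits._

    val plays = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")   // hypothetical broker
      .option("subscribe", "playback-events")              // hypothetical topic
      .load()

    val counts = plays
      .select($"timestamp", $"value".cast("string").as("titleId"))
      .groupBy(window($"timestamp", "1 minute"), $"titleId")
      .count()

    counts.writeStream
      .outputMode("update")
      .trigger(Trigger.ProcessingTime("30 seconds"))       // micro-batch interval
      .format("console")
      .start()
      .awaitTermination()

An equivalent Flink job would look much the same but processes each event individually, which tends to matter when per-event latency, rather than throughput, is the constraint.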
In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.
Taking inspiration from REPLs, literate programming, and DevOps practices, we've designed an interface to our instrumentation system that focuses on interactive feedback, note-taking, and team communication. An engineer can both experiment with new flows at low risk and codify longer-term practices into runbooks that embed live visualizations of instrumentation data. As a result, we can start to free our users from the mechanics of the stream processor so they can focus instead on the domain of instrumentation.
In this talk, we will discuss how the interface described above works, how the stream processor manages flows on behalf of the user, and some tradeoffs we have encountered while preparing the system to roll out into our organization.
In a world of microservices that communicate via unbounded streams of events, schemas are the contracts between the services. Having an agreed contract allows the teams developing those services to move fast, by reducing the risk involved in making changes. Yet designing and delivering events with schema change in mind is still not common practice.
In this presentation, we’ll discuss patterns of schema design, schema storage, and schema evolution that help development teams build better contracts through better collaboration - and deliver resilient applications faster. We’ll look at how schemas were used in the past, how their meaning has changed over the years, and why they gained particular importance with the rise of stream processing.
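As a small illustration of the kind of evolution pattern the talk covers, here is a sketch (with made-up event and field names) of a backward-compatible change in Avro: adding a field with a default lets readers on the new schema keep decoding events written with the old one.

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
    import org.apache.avro.io.{DecoderFactory, EncoderFactory}
    import java.io.ByteArrayOutputStream

    // v1 of a hypothetical event contract.
    val v1 = new Schema.Parser().parse(
      """{"type":"record","name":"PlayEvent","fields":[
        |  {"name":"userId","type":"string"},
        |  {"name":"titleId","type":"string"}
        |]}""".stripMargin)

    // v2 adds an optional field with a default: a backward-compatible change.
    val v2 = new Schema.Parser().parse(
      """{"type":"record","name":"PlayEvent","fields":[
        |  {"name":"userId","type":"string"},
        |  {"name":"titleId","type":"string"},
        |  {"name":"deviceType","type":"string","default":"unknown"}
        |]}""".stripMargin)

    // Write an event with the old (v1) schema...
    val event = new GenericData.Record(v1)
    event.put("userId", "u-123")
    event.put("titleId", "t-456")

    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](v1).write(event, encoder)
    encoder.flush()

    // ...and read it with the new (v2) schema: the missing field takes its default.
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    val upgraded = new GenericDatumReader[GenericRecord](v1, v2).read(null, decoder)
    println(upgraded.get("deviceType")) // "unknown"

This is essentially the check a schema registry performs before accepting a new version under a compatibility rule, rather than something each consumer has to reason about by hand.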
Gwen Shapira, Engineering Manager @Confluent, Apache Kafka PMC member, co-author of Kafka: The Definitive Guide
“Perfect is the enemy of good” - Voltaire
On the journey through life, we learn and adapt via trial and error - software development is no different. We realize and accept that we won’t build the perfect solution the first time around; it takes many iterations. At Gilt.com, now part of HBC Digital, we started processing and streaming event data nearly 5 years ago. Our initial solution was dramatically different from our current one - which in turn will likely be different from our solution 5 years from now.
The Gilt.com banner at HBC Digital is in the business of flash sales, which makes for some interesting use cases in the world of streaming. We release new sales of top designer labels, at up to 70% off retail, on the web and in our mobile app every day at noon and 9pm. Around the time of these releases, we experience volume spikes of between 10x and 100x on our streams.
Numerous streaming frameworks, homemade as well as open source, did not pass these evolutionary tests. Frameworks come and go, so this talk is not about the “best” framework or platform to use; rather, it’s about core principles that will stand the test of streaming evolution. This talk also covers major potential pitfalls that you may stumble over on your path to streaming, and how to avoid them. Finally, it will cover the next evolutionary step in streaming at HBC Digital.
How good is your streaming framework at failure? Does it die gracefully, telling you exactly at which point it died? Does it tell you why it died? Does it pick up where it left off afterwards? Can it easily skip the "erroneous" portions of the stream? Do you always know what was processed and what wasn't? Does it even have to die when a process, host, or data center fails?
In this talk we focus on "what if" scenarios and how to evaluate and architect a streaming platform with a high level of resilience. We'll look at Kafka and Spark Streaming as specific examples and share our experience of using these frameworks to process financial transactions, answering the questions above along the way. We'll also show examples of tools we built along our streaming journey that we found invaluable during failure scenarios.
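One pattern that repeatedly helps with the "do you always know what was processed" question is making acknowledgement explicit. Below is a minimal sketch using the plain Kafka consumer API (the topic, group, and handler names are made up): auto-commit is disabled and offsets are committed only after processing succeeds, so a failure replays from the last committed position rather than silently skipping records.

    import java.time.Duration
    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.jdk.CollectionConverters._

    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")   // hypothetical broker
    props.put("group.id", "txn-processor")          // hypothetical consumer group
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")        // we decide when progress is recorded

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("transactions"))

    def handle(value: String): Unit = {
      // process the transaction; throw to stop before the offset is committed
    }

    while (true) {
      val records = consumer.poll(Duration.ofMillis(500))
      for (record <- records.asScala) {
        handle(record.value())
      }
      consumer.commitSync() // only acknowledge work that actually completed
    }

The trade-off is at-least-once delivery: records processed after the last commit may be replayed after a restart, so the handler needs to be idempotent or deduplicate downstream.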