Presentation: Building and Operating a Serverless Data Pipeline
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Hear about Intent’s experience transitioning to a serverless platform, some of the benefits it brought and some of the pain points.
- Learn about serverless: when it is worth considering and what it takes to implement it.
Abstract
At Intent, our machine learning platform processes real-time and historical data to predict user intent on billions of page views a month. At the heart of this system is a serverless data pipeline that allows us to gather, process, store, and analyze data from disparate data sources. In this talk, we’ll discuss the motivations for switching to a serverless infrastructure and the lessons learned while building and operating such a system at scale, focusing on operability, stability, scalability, and ease of development.
What is the focus of your work these days?
I'm the lead of the data platform team at Intent. Intent is a data science company that helps commerce sites maximize the value for each person who visits their site. Our main product is an ad network that runs on travel sites. On the data platform team we oversee our data pipeline, which collects data from ad servers and various other sources, processes that data, and stores it in a data lake. We also oversee data warehouses and aggregation jobs that compute summaries of that data, making it available to analysts and other business users. Our team is also responsible for automated reporting solutions for our publishers and advertisers.
What is the motivation for your talk?
I want to talk about our experience at Intent building out our serverless data platform, going from our legacy platform to this new one: some of the wins we experienced and also some of the pain points we had during that transition. I also think serverless is becoming a bit of a buzzword, so it’s worth clarifying what serverless is, along with some of its benefits and tradeoffs.
Will the examples apply only to your domain, or are they more general?
They will be aimed at being generic for the data pipeline processing domain. We are on AWS, so a lot of our experiences are based on that, but I will discuss things that are applicable to other platforms. Things like how systems compose and encapsulation are important in any serverless application.
How would you describe the persona and the level of the target audience?
A tech lead / senior developer / engineering manager who's interested in serverless, is thinking about building out a data platform, or thinking about ways they might evolve their own stack.
What do you want this persona to walk away with?
I want them to have insight into why serverless can be a boon for their team, and into some of the pain points or pitfalls they might encounter going down that road.