QCon New York June 15-19, 2020 | Building and Operating a Serverless Data Pipeline

This presentation is now available to view on InfoQ.com

What You’ll Learn

Hear about Intent’s experience transitioning to a serverless platform, including some of the benefits it brought and some of the pain points.
Learn about serverless: when it is worth considering and what it takes to implement it.

Abstract

At Intent, our machine learning platform processes real-time and historical data to predict user intent on billions of page views a month. At the heart of this system is a serverless data pipeline that allows us to gather, process, store, and analyze data from disparate data sources. In this talk, we’ll discuss the motivations for switching to a serverless infrastructure and the lessons learned while building and operating such a system at scale, focusing on operability, stability, scalability, and ease of development.

Question:

What is the focus of your work these days?

Answer:

I'm the lead of the data platform team at Intent. Intent is a data science company that helps commerce sites maximize the value of each person who visits their site. Our main product is an ad network that runs on travel sites. On the data platform team we oversee our data pipeline, which collects data from ad servers and various other sources, processes it, and stores it in a data lake. We also oversee data warehouses and the aggregation jobs that compute summaries of that data, making it available to various analysts and other business users. Our team is also responsible for automated reporting solutions for our publishers and advertisers.

Question:

What is the motivation for your talk?

Answer:

To talk about our experiences at Intent building out our serverless data platform and going from our legacy platform to this new one: some of the wins we experienced, and also some of the pain points we had during that transition. I also think that serverless is becoming a bit of a buzzword, so it’s worth clarifying what serverless is, along with some of its benefits and tradeoffs.

Question:

Will the examples apply only to your domain, or are they more general?

Answer:

They will be aimed at being generic for the data pipeline processing domain. We are on AWS, so a lot of our experiences are based on that, but I will discuss things that are applicable to other platforms. Things like encapsulation and how systems compose are important in any serverless application.

Question:

How would you describe the persona and the level of the target audience?

Answer:

A tech lead / senior developer / engineering manager who's interested in serverless, is thinking about building out a data platform, or is thinking about ways they might evolve their own stack.

Question:

What do you want this persona to walk away with?

Answer:

I want them to have insight into why serverless can be a boon for their team, and into some of the pain points or pitfalls they might encounter going down that road.

Speaker: Will Norman

Director of Engineering at Intent

Will is a Director of Engineering at Intent, where he leads the Data Platform team. He has over 10 years of experience architecting and building high-volume systems in the FinTech and AdTech industries. These days Will is glad to be spending more time wrangling services than servers. Outside of tech he enjoys cooking, traveling, and spending time with his family.