QCon New York June 15-19, 2020 | Machine-Learned Indexes

Abstract

Modern data processing systems are designed to be general purpose: they can handle a wide variety of schemas, data types, and data distributions, and they aim to provide efficient access and computation over this data. This “one-size-fits-all” nature results in systems that do not take advantage of the unique characteristics of each application, the user’s data, or the workload. What these traditional designs overlook is that machine learning excels at understanding and adapting to particular datasets. We present a vision (with evidence) for the future of data processing systems: by learning models of the application, data, and workload, we can redesign and customize nearly every component of data processing systems. We will take a deep dive into how traditional index structures can be reframed as machine learning problems, and show that, by doing so and through careful model design and code synthesis, we can outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory on several real-world datasets. Building on these same modeling techniques, we find that we can achieve improvements in sorting, multi-dimensional indexing, and query optimization, all areas that have historically been the domain of traditional discrete algorithms and complex systems engineering.
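The central idea of reframing an index as a machine learning problem is to treat lookup as prediction: a model learns where a key sits in a sorted array. The following is a minimal, hypothetical Python sketch of that idea. It assumes a sorted, in-memory array of numeric keys and a single least-squares linear model, and the class name `LearnedIndex` and its methods are illustrative only. It is not the model design or code synthesis described in the talk. The sketch fits one line to the key-to-position mapping, which approximates the keys' cumulative distribution. It then records the worst-case prediction error and confines a binary search to that error window.

```python
import bisect
from typing import List


class LearnedIndex:
    """Hypothetical single-model learned index: predict a position, then search nearby."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("LearnedIndex requires at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares,
        # i.e. a linear approximation of the keys' CDF scaled by n.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and true positions over all stored keys;
        # this bound guarantees every stored key lies inside its search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        # Model output rounded and clamped to a valid array index.
        pos = round(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key: float) -> int:
        """Return the index of the first occurrence of key, or -1 if absent."""
        pred = self._predict(key)
        lo = max(pred - self.max_err, 0)
        hi = min(pred + self.max_err + 1, len(self.keys))
        # Binary search only within [pred - max_err, pred + max_err].
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return -1
```

When the model fits the key distribution well, the error window is small, so the final search touches far fewer entries than a binary search over the whole array. When the fit is poor, the window widens and the lookup degrades toward an ordinary binary search. This is why the choice of model matters.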

Question:

What is the focus of your work today?

Answer:

My research focuses on machine learning and data mining applications, in particular ML for data systems, machine learning fairness, and recommender systems.

Question:

What’s the motivation for this talk?

Answer:

This talk focuses on recent research on using machine learning algorithms to improve traditional data processing systems. In particular, we have recently studied how machine-learned models can improve traditional data structures, which has exciting implications for databases and other core components of computer science.

Question:

How would you describe the persona and level of the target audience?

Answer:

This talk is geared toward both researchers and engineers exploring how machine learning and data mining can interact with traditional computer engineering and infrastructure challenges.

Question:

What do you want this persona to walk away from your talk with?

Answer:

My hope is that folks will leave the talk with a new perspective on traditional computer science problems and a better understanding of when it may be beneficial to frame some tasks as machine learning tasks.

Speaker: Alex Beutel

Senior Research Scientist @Google

Alex Beutel is a Staff Research Scientist in Google Brain SIR, leading a team working on fairness in machine learning as well as working on neural recommendation and ML for Systems. He received his Ph.D. in 2016 from Carnegie Mellon University’s Computer Science Department, and previously received his B.S. from Duke University in computer science and physics. His Ph.D. thesis on large-scale user behavior modeling, covering recommender systems, fraud detection, and scalable machine learning, was named runner-up for the SIGKDD 2017 Doctoral Dissertation Award. He received the Best Paper Award at KDD 2016 and ACM GIS 2010, was a best paper finalist at KDD 2014 and ASONAM 2012, and was awarded the Facebook Fellowship in 2013 and the NSF Graduate Research Fellowship in 2011. More details can be found at alexbeutel.com.