Presentation: Machine-Learned Indexes - Research from Google
Abstract
Modern data processing systems are designed to be general purpose: they handle a wide variety of schemas, data types, and data distributions, and aim to provide efficient access and computation over this data. This “one-size-fits-all” nature results in systems that do not take advantage of the unique characteristics of each application, user's data, or workload. The design of these systems overlooks the fact that machine learning excels at understanding and adapting to particular datasets. We present a vision (with evidence) for the future of data processing systems: by learning models of the application, data, and workload, we can redesign and customize nearly every component of data processing systems. We will take a deep dive into how traditional index structures can be reframed as machine learning problems and show that, by doing so, and through careful model design and code synthesis, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. Building on these same modeling techniques, we find that we can achieve improvements in sorting, multi-dimensional indexing, and query optimization, all areas that have historically been the domain of traditional discrete algorithms and complex systems engineering.
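To make the idea of reframing an index as a learning problem concrete, here is a minimal sketch in Python. It assumes a sorted array of numeric keys and fits a single least-squares line that predicts a key's position; the worst-case prediction errors seen at build time then bound a short binary search. The class name `LinearLearnedIndex` and its methods are hypothetical, and a single linear model is a deliberate simplification for illustration, not the model design or code synthesis described in the talk.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical single-model learned index over sorted numeric keys."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos)
                  for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Record the worst under- and over-prediction over the stored keys.
        # These bounds guarantee that a stored key lies inside the window.
        errors = [p - self._predict(k) for p, k in enumerate(self.keys)]
        self.min_err = min(errors)
        self.max_err = max(errors)

    def _predict(self, key):
        # Model output: an approximate position for the key.
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the error window to valid array bounds.
        lo = min(n, max(0, guess + self.min_err))
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside the window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 108, 4096])
position = index.lookup(23)   # position of 23 in the sorted keys
missing = index.lookup(5)     # None, since 5 is not stored
```

The model here plays the role that the inner nodes of a B-Tree would play: it narrows the search to a small region, and the recorded error bounds keep lookups correct for every stored key. How small that region is depends on how well the model fits the key distribution, which is why a skewed key set like the one above benefits little from a single line.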
What is the focus of your work today?
My research focuses on machine learning and data mining applications, in particular ML for data systems, machine learning fairness, and recommender systems.
What’s the motivation for this talk?
This talk focuses on recent research on using machine learning algorithms to improve traditional data processing systems. In particular, we have recently studied how machine-learned models can improve traditional data structures, which has exciting implications for databases and other core components of computer science.
How would you describe the persona and level of the target audience?
This talk is geared toward both researchers and engineers exploring how machine learning and data mining can interact with traditional computer engineering and infrastructure challenges.
What do you want this persona to walk away from your talk with?
My hope is that folks will leave the talk with a new perspective on traditional computer science problems and a better understanding of when it may be beneficial to frame some tasks as machine learning tasks.