Speaker: Mayank Bansal
Staff Engineer @Uber
SESSION + Live Q&A
Peloton - Uber's Webscale Unified Scheduler on Mesos & Kubernetes
With the increasing scale of Uber’s business, efficient use of cluster resources is important to reduce the cost per trip. As we have learned from operating Mesos clusters in production, it is challenging to overcommit resources for latency-sensitive services because their resource usage patterns vary widely. Uber also has significant demand for running large-scale batch jobs for marketplace intelligence, fraud detection, maps, self-driving vehicles, and more.
In this talk, we will present Peloton, a Unified Resource Scheduler for collocating heterogeneous workloads in shared Mesos clusters. The goal of Peloton is to manage compute resources more efficiently while providing hierarchical max-min fairness guarantees for different teams. Peloton schedules large-scale batch jobs with millions of tasks and also supports distributed TensorFlow jobs with thousands of GPUs.
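To make "hierarchical max-min fairness" concrete, here is a minimal sketch of the general technique, not Peloton's actual implementation. It assumes a single resource dimension (say, CPU cores), equal weights for sibling pools, and no reservations or limits. The `Pool` class, the `max_min_share` and `allocate` functions, and the team names and demand figures are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical resource pool: a leaf carries a demand, an inner node has children."""
    name: str
    demand: float = 0.0
    children: list[Pool] = field(default_factory=list)

    def total_demand(self) -> float:
        # An inner pool's demand is the sum of its children's demands.
        if not self.children:
            return self.demand
        return sum(c.total_demand() for c in self.children)


def max_min_share(capacity: float, demands: list[float]) -> list[float]:
    """Water-filling: fully satisfy the smallest demands, split the rest evenly."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    pending = sorted(range(len(demands)), key=lambda i: demands[i])
    while pending:
        fair = remaining / len(pending)
        i = pending[0]
        if demands[i] <= fair:
            # The smallest pending demand fits under the fair share: grant it in full.
            alloc[i] = demands[i]
            remaining -= demands[i]
            pending.pop(0)
        else:
            # Every remaining demand exceeds the fair share: cap them all at it.
            for j in pending:
                alloc[j] = fair
            break
    return alloc


def allocate(pool: Pool, capacity: float, out: dict[str, float] | None = None) -> dict[str, float]:
    """Apply max-min sharing at each level of the pool tree, top down."""
    out = {} if out is None else out
    out[pool.name] = min(capacity, pool.total_demand())
    if pool.children:
        shares = max_min_share(out[pool.name], [c.total_demand() for c in pool.children])
        for child, share in zip(pool.children, shares):
            allocate(child, share, out)
    return out


# Illustrative tree: three teams share 100 units; one team has two sub-pools.
root = Pool("root", children=[
    Pool("maps", children=[Pool("maps/batch", 50), Pool("maps/services", 10)]),
    Pool("fraud", 20),
    Pool("ml", 80),
])
result = allocate(root, 100)
print(result)
```

In this example the small `fraud` demand is met in full, and the leftover capacity is split evenly between `maps` and `ml`; the same rule is then applied again inside `maps`. A production scheduler would additionally need to handle multiple resource types, weights, and preemption, all of which this sketch leaves out.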