Presentation: Alibaba Container Platform Infrastructure - a Kubernetes Approach
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Hear how Alibaba runs Kubernetes on very large clusters.
- Find out how Alibaba extended Kubernetes to help with their scalability needs.
- Hear about Alibaba’s plans to open source their Controller and Scheduler extensions.
Abstract
As one of the biggest data companies in the world, Alibaba provides thousands of online/offline services to various customers to support their business. Most Alibaba applications are fully containerized and run on top of the Alibaba container platform, which manages a huge number of clustered physical machines. A typical cluster of this kind consists of tens of thousands of nodes and manages more than a hundred thousand heterogeneous applications.
With its rapidly increasing adoption and active development in the community, Kubernetes has become the dominant cloud operating system for managing cloud-native applications. To adopt this emerging technology, we decided to fully integrate upstream Kubernetes into the existing Alibaba container management system. In this talk, we will present how we extended and scaled Kubernetes to make this integration succeed.
Overall, we’d like to deliver the following key takeaways from this talk.
- Architecturally, Kubernetes is a scalable container operating system. It can manage large-scale clusters (more than 10K nodes) with minimal modifications.
- Kubernetes can support complicated application deployment/upgrade requirements by leveraging its strong extensibility. We built a set of new controllers to satisfy our application requirements.
- The scheduler plug-in mechanism makes it possible to overcome the default scheduling limitations by developing a new scheduler as a replacement. We will share the design of our in-house scheduler, which scales extremely well.
- It is important to keep the Kubernetes APIs intact during the integration in order to preserve a standard for upper-layer PaaS clients. We will share some of our integration best practices for this purpose.
What is the focus of your work today?
I'm currently working on the Alibaba Container Platform team, focusing on integrating Kubernetes. That includes extending it for our workloads and needs: creating new controllers, making the scheduler more scalable, and providing native Kubernetes APIs to the upper PaaS layer, as demanded by other clients in the Alibaba cloud infrastructure.
What is the motivation for your talk?
I want to give this talk because I hear from people in our company and at other companies that, when using Kubernetes as a container platform, they have worries about its scalability or about whether Kubernetes can satisfy the workloads they are running. I want to ease doubts on both of these aspects. First, Kubernetes allows you to add extensions to overcome certain scalability limitations. Second, it is not very complicated to develop support for certain kinds of workloads. I take Alibaba as a strong use case because more than 10,000 applications run in containers using Kubernetes.
How would you describe the persona for your presentation?
I think the audience will be mixed. It is not an introductory talk, because most people already use Kubernetes. I'll touch a bit on the technology, especially on the scalability side: how we made the scheduler scalable enough to handle more than 10,000 nodes in a very large cluster. I believe this is one of the biggest Kubernetes deployments in the world. On the scalability side there will be some technical details, but mostly the talk is about how we extend Kubernetes to support more workloads, what kinds of workloads we support, and what functions our controllers provide.
What do you want this persona to walk away with?
First of all, we plan to open source some of our solutions to the community, because some of them, such as the controllers and even the scheduler, were developed in house. We feel our experience can benefit the community. So we are seeking collaborations with other companies and organizations. We hope to open source some of these solutions and let the community contribute to and improve them. That's the main goal of the talk.