Presentation: Alibaba Container Platform Infrastructure - a Kubernetes Approach

Track: Software Defined Infrastructure: Kubernetes, Service Meshes, & Beyond

Location: Broadway Ballroom North, 6th fl.

Duration: 10:35am - 11:25am

What You’ll Learn

  1. Hear about how Alibaba is using a very large cluster of Kubernetes systems.
  2. Find out how Alibaba extended Kubernetes to help with their scalability needs.
  3. Hear about Alibaba’s plans to open source their Controller and Scheduler extensions.


As one of the biggest data companies in the world, Alibaba provides thousands of online and offline services to various customers to support their business. Most Alibaba applications are fully containerized and run on top of the Alibaba container platform, which manages a huge number of clustered physical machines. A typical container cluster consists of tens of thousands of nodes and manages more than a hundred thousand heterogeneous applications.

With its rapidly increasing adoption and active community development, Kubernetes has become the dominant cloud operating system for managing cloud native applications. To adopt this emerging technology, we decided to fully integrate upstream Kubernetes into the existing Alibaba container management system. In this talk, we will present how we extended and scaled Kubernetes to make this integration succeed.

Overall, we’d like to deliver the following key takeaways from this talk.

Architecturally, Kubernetes is a scalable container operating system. It can manage a large-scale cluster (more than 10K nodes) with minimal modifications.

Kubernetes can support complicated application deployment and upgrade requirements thanks to its strong extensibility. We built a set of new controllers to satisfy our application requirements.
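
As a rough illustration of what a controller does, the sketch below shows the general reconcile pattern: compare a desired state with the observed state and emit the actions that close the gap. It is a toy in plain Python, not the Alibaba controllers and not the Kubernetes client API; `DesiredState`, `reconcile`, and the instance naming scheme are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DesiredState:
    """Hypothetical spec: how many instances of an app should exist."""
    app: str
    replicas: int


def reconcile(desired: DesiredState, running: list[str]) -> list[tuple[str, str]]:
    """Return the actions that move the observed state toward the spec.

    `running` is the list of instance names currently observed for the app.
    """
    actions: list[tuple[str, str]] = []
    diff = desired.replicas - len(running)
    if diff > 0:
        # Too few instances: create the missing ones, skipping names in use.
        existing = set(running)
        index = 0
        while diff > 0:
            name = f"{desired.app}-{index}"  # illustrative naming only
            if name not in existing:
                actions.append(("create", name))
                diff -= 1
            index += 1
    elif diff < 0:
        # Too many instances: delete the surplus from the end of the list.
        for name in running[diff:]:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    print(reconcile(DesiredState("web", 3), ["web-0"]))
```

A real controller would run this comparison repeatedly, triggered by changes to the objects it watches, so that the cluster converges on the spec even after failures.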

The scheduler plug-in mechanism makes it possible to overcome the default scheduling limitations by developing a new scheduler as a replacement. We will share the design of our in-house scheduler, which scales extremely well.
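
For readers unfamiliar with what a scheduler decides, here is a minimal sketch of the filter-then-score approach that schedulers of this kind commonly follow: discard nodes that cannot fit the pod, then rank the rest. It is not the in-house scheduler described in the talk. The `Node` and `Pod` types, the resource fields, and the scoring rule (prefer the node with the most capacity left) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float  # free CPU cores (hypothetical unit)
    free_mem: float  # free memory in GiB (hypothetical unit)


@dataclass
class Pod:
    name: str
    cpu: float
    mem: float


def schedule(pod: Pod, nodes: list[Node]) -> str | None:
    """Place `pod` on a node and return the node name, or None if none fits."""
    # Filter: keep only nodes with enough free CPU and memory.
    feasible = [n for n in nodes if n.free_cpu >= pod.cpu and n.free_mem >= pod.mem]
    if not feasible:
        return None

    # Score: a toy rule that prefers the node with the most capacity
    # remaining after placement, which spreads load across nodes.
    def score(n: Node) -> float:
        return (n.free_cpu - pod.cpu) + (n.free_mem - pod.mem)

    best = max(feasible, key=score)
    # Bind: account for the resources the pod now consumes.
    best.free_cpu -= pod.cpu
    best.free_mem -= pod.mem
    return best.name


if __name__ == "__main__":
    cluster = [Node("a", 2, 4), Node("b", 8, 16), Node("c", 1, 1)]
    print(schedule(Pod("web-0", 2, 2), cluster))
```

Scanning every node for every pod is the simplest design. At tens of thousands of nodes, this per-pod pass is one natural place for scalability work to focus.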

It is important to keep the Kubernetes APIs intact during the integration in order to preserve a standard for upper PaaS clients. We will share some of our integration best practices for this purpose.


What is the focus of your work today?


I'm currently working on the Alibaba Container Platform team, focusing on the Kubernetes integration. That includes extending Kubernetes for our workloads and needs: creating new controllers, making the scheduler more scalable, and providing a native Kubernetes API to the upper PaaS layer, which other clients in the Alibaba cloud infrastructure require.


What is the motivation for your talk?


The reason I want to give this talk is that I hear, both within our company and from other companies using Kubernetes as a container platform, worries about scalability and about whether Kubernetes can satisfy the workloads they are running. I want to ease some doubts on these two points. First, Kubernetes allows you to add extensions to overcome certain scalability limitations. Second, it is not very complicated to develop support for certain kinds of workloads. I take Alibaba as a strong use case because more than 10,000 applications are running in containers using Kubernetes.


How would you describe the persona for your presentation?


I think this is going to be a mixed one. It is not an introductory talk, because most people use Kubernetes already. I'll touch a bit on the technology, especially on the scalability side: how we make the scheduler scalable enough to handle more than 10,000 nodes in a very large cluster. I'm assuming this is one of the biggest Kubernetes deployments in the world. On the scalability side there are going to be some technical details. But mostly the talk is about how we extend Kubernetes to support more workloads, what kinds of workloads we support, and what kinds of functions our controllers provide.


What do you want this persona to walk away with?


First of all, we plan to open source some of our solutions to the community, because we have developed some of the controllers, and even the scheduler, in house. We feel our experience can benefit the community. So we are seeking collaboration with other companies and organizations. We hope to open source some of the solutions and let the community contribute to and improve them. That's the main goal of the talk.

Speaker: Fei Guo

Senior Staff Engineer in Alibaba Container Platform Group

Fei Guo is currently a senior staff engineer in the Alibaba Container Platform Group. He has more than 10 years of experience in compute resource management and performance optimization for virtualized and containerized environments. Before joining Alibaba, Fei worked at VMware and was the tech lead for vSphere DRS (Distributed Resource Scheduler). He has authored or co-authored a few white papers and academic papers in these technical areas.