QCon New York June 15-19, 2020 | How to Evolve Kubernetes Resource Management Model

This presentation is now available to view on InfoQ.com

What You’ll Learn

Learn about the differences between the traditional application development model and the one based on containers.
Hear how to tune up the Kubernetes engine to maximize the performance, isolation and resilience of systems.

Abstract

Built on Linux container and cgroup technologies, Kubernetes provides an efficient framework for deploying different kinds of application workloads across multiple machines and compute platforms. Over the past five years, Kubernetes has evolved to support increasingly complex and diverse classes of applications, such as web services, databases, big data, and AI/ML workloads. As people adopt Kubernetes to run even more diverse enterprise-class, cloud-native, and web-scalable workloads, we are seeing growing demand to improve the Kubernetes resource management model with better isolation, utilization, and performance consistency, while still providing a flexible and extensible framework that lets people enable more hardware- and workload-specific optimizations on Kubernetes.

In this talk, we will first provide an overview of the current Kubernetes resource model and best practice guidance on managing compute resources and specifying application resource requirements on Kubernetes. We will then discuss some recent and ongoing work on extending the Kubernetes resource model to provide better resource isolation, support more diverse hardware, facilitate fast and flexible application and resource scaling, and promote more consistent application performance across different compute platforms.

Question:

What is the focus of your work today?

Answer:

I'm mostly working on two major things: a) Improving Kubernetes reliability in terms of resource isolation. We want to provide better isolation between containers by hardening the resource model. b) Looking at how we can better support non-primary resources such as GPUs.

Question:

What is the motivation for the talk?

Answer:

The motivation for this talk is mostly to let people know what the differences are between the traditional application model and the container-based application model. We try to make it as transparent as possible, but I think there are some differences people should be aware of so that they can better utilize the Kubernetes environment.

Question:

How would you describe the persona targeted by the talk?

Answer:

It's mostly for people who are considering moving from the traditional application development model to container-based CI/CD. They have been using Kubernetes for some time and want to know what kind of improvements they can benefit from, like auto-scaling, better resource isolation, and better resource efficiency.

Question:

What do you want attendees to walk away with?

Answer:

I hope they can get some tips, such as how to modify their applications to improve performance, reliability, and robustness.

Question:

Does the resource manager in Kubernetes manage CPU and memory, or does it go beyond that?

Answer:

CPU and memory are the two primary resources. We start with those because they are the most used, but Kubernetes resource management actually covers much more: any kind of resource that is shared by all the containers running on a node. For example, containers also share the number of processes the node allows, the number of file descriptors, networking resources, and I/O bandwidth, and today people are also using GPUs. They are all considered resources.
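As a rough illustration of how the two primary resources are expressed, the sketch below parses quantity strings in the style Kubernetes uses for CPU ("500m" meaning half a core) and memory ("128Mi" meaning 128 × 2^20 bytes). The helper names and the sample `container_resources` dictionary are hypothetical, and only a subset of the quantity syntax is handled; this is not the Kubernetes parser itself.

```python
from decimal import Decimal

# Subset of the suffixes Kubernetes accepts in memory quantities:
# binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu_millis(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def parse_memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "1G" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


# Hypothetical container settings, shaped like the "resources" stanza
# of a container spec (requests for scheduling, limits for enforcement).
container_resources = {
    "requests": {"cpu": "250m", "memory": "64Mi"},
    "limits": {"cpu": "500m", "memory": "128Mi"},
}


def normalize(resources: dict) -> dict:
    """Return the same structure with CPU in millicores and memory in bytes."""
    return {
        kind: {
            "cpu_millis": parse_cpu_millis(values["cpu"]),
            "memory_bytes": parse_memory_bytes(values["memory"]),
        }
        for kind, values in resources.items()
    }


if __name__ == "__main__":
    print(normalize(container_resources))
```

Keeping CPU in integer millicores and memory in integer bytes avoids floating-point rounding when values from several containers are summed for a node.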

Question:

Can you tell me about some of the recent changes to the Kubernetes resource model?

Answer:

We are trying to improve resource isolation. As I mentioned, we started from CPU and memory. They have a long history as shared resources and benefit from the OS cgroup isolation support. But we are also trying to provide better isolation for other types of resources, and trying to make performance more consistent through that isolation. We are also trying to better support other types of resources.
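To make the cgroup connection concrete, here is a minimal sketch of how CPU requests and limits can be translated into Linux cgroup v1 CPU settings (`cpu.shares` and `cpu.cfs_quota_us`). The constants reflect my understanding of the defaults commonly used on Linux nodes (1024 shares per core, a 100 ms CFS period) and should be treated as assumptions; the function names are hypothetical, and cgroup v2 uses different control files.

```python
from typing import Optional

# Assumed constants, labeled as such: typical cgroup v1 defaults.
SHARES_PER_CPU = 1024    # cpu.shares granted per full core requested
MIN_SHARES = 2           # smallest cpu.shares value the kernel accepts
CFS_PERIOD_US = 100_000  # cpu.cfs_period_us: 100 ms scheduling period
MIN_QUOTA_US = 1_000     # floor applied to very small limits


def cpu_request_to_shares(millicpu: int) -> int:
    """Map a CPU request in millicores to a relative cpu.shares weight."""
    return max(MIN_SHARES, millicpu * SHARES_PER_CPU // 1000)


def cpu_limit_to_quota_us(millicpu: int,
                          period_us: int = CFS_PERIOD_US) -> Optional[int]:
    """Map a CPU limit in millicores to cpu.cfs_quota_us.

    Returns None when no limit is set (millicpu == 0), meaning the
    container is not throttled by a CFS quota.
    """
    if millicpu == 0:
        return None
    return max(MIN_QUOTA_US, millicpu * period_us // 1000)


if __name__ == "__main__":
    # A container requesting 250m and limited to 500m of CPU.
    print(cpu_request_to_shares(250))  # 256
    print(cpu_limit_to_quota_us(500))  # 50000
```

Shares only matter under contention (they are relative weights), while the quota is a hard cap per period. This is one reason a container's performance can differ between an idle node and a busy one even with identical settings.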

Speaker: Jiaying Zhang

Software Engineer @Google Kubernetes team

Jiaying Zhang is a software engineer on the Google Kubernetes team. She has more than 10 years of experience in Linux system development in areas including NFS, ext4, kernel tracing, flash storage, and software-defined networking. She has worked on Kubernetes for 2 years, focusing on improving compute resource management to better support more diverse workloads and hardware types.