Presentation: Lyft's Envoy: Embracing a Service Mesh

Track: Architectures You've Always Wondered About

Location: Broadway Ballroom North, 6th fl.

Duration: 10:35am - 11:25am



Level: Intermediate

Persona: Architect, Developer, DevOps Engineer

This presentation is now available to view on InfoQ.com


What You’ll Learn

  • Hear from the creator of Envoy the sorts of problems Lyft was facing that ultimately led to the creation of Envoy.

  • Understand how Lyft used Envoy to focus on writing more business-logic code and less infrastructure code.

  • Learn more about Envoy and why so many companies are making it part of their infrastructure when deploying Microservices.

Abstract

Over the past several years, facing considerable operational difficulties with its initial microservice deployment primarily rooted in networking and observability, Lyft migrated to a sophisticated service mesh powered by Envoy (https://www.envoyproxy.io/), a high-performance distributed proxy that aims to make the network transparent to applications. Envoy’s out-of-process architecture allows it to be used alongside any language or runtime.

At its core, Envoy is an L4 proxy with a pluggable filter chain model. It includes a full HTTP stack with a parallel pluggable L7 filter chain. This programming model allows Envoy to be used for a variety of different scenarios, including HTTP/2, gRPC, MongoDB, Redis, rate limiting, etc. Envoy provides advanced load balancing support, including eventually consistent service discovery, circuit breakers, retries, and zone-aware load balancing. Envoy also has best-in-class observability, using statistics, logging, and distributed tracing.
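The pluggable filter chain idea described above can be illustrated with a minimal Python sketch. This is not Envoy's API (Envoy is written in C++ and configured declaratively); the names `FilterStatus`, `Request`, `run_chain`, and the three filters are hypothetical and exist only for illustration. The sketch assumes each filter inspects a request in order and either passes it on or stops the chain.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class FilterStatus(Enum):
    CONTINUE = "continue"  # hand the request to the next filter
    STOP = "stop"          # short-circuit the rest of the chain


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)  # records which filters ran


# Hypothetical filter signature: take a request, return a status.
Filter = Callable[[Request], FilterStatus]


def tracing_filter(request: Request) -> FilterStatus:
    # Illustrative only: tag the request with a fixed ID if it has none.
    request.headers.setdefault("x-request-id", "req-1")
    request.log.append("tracing")
    return FilterStatus.CONTINUE


def make_rate_limit_filter(max_requests: int) -> Filter:
    seen = 0

    def rate_limit_filter(request: Request) -> FilterStatus:
        nonlocal seen
        seen += 1
        if seen > max_requests:
            request.log.append("rate_limited")
            return FilterStatus.STOP
        request.log.append("rate_limit_ok")
        return FilterStatus.CONTINUE

    return rate_limit_filter


def router_filter(request: Request) -> FilterStatus:
    # Stand-in for forwarding the request to an upstream service.
    request.log.append(f"routed:{request.path}")
    return FilterStatus.CONTINUE


def run_chain(filters: List[Filter], request: Request) -> bool:
    """Run filters in order; return False if any filter stops the chain."""
    for current_filter in filters:
        if current_filter(request) is FilterStatus.STOP:
            return False
    return True


chain = [tracing_filter, make_rate_limit_filter(max_requests=1), router_filter]
first = Request(path="/rides")
second = Request(path="/rides")
first_ok = run_chain(chain, first)    # passes through all three filters
second_ok = run_chain(chain, second)  # stopped by the rate limiter
```

Because each concern lives in its own filter, supporting a new scenario in this sketch means adding or reordering entries in `chain` rather than changing application code, which is the property the abstract attributes to the filter chain model.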

Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh paradigm provides, with a particular focus on microservice networking observability.

 

Question: 

QCon: You created Envoy. How did you come up with the idea for Envoy?

Answer: 

Matt: I've been working on Internet-scale networking for the last 10 years at places like Amazon, Twitter, and Lyft.

The migration from single-language stacks to more polyglot stacks over the last five to seven years has made it clear that people are embracing microservice architectures. A stack with many different languages brings with it a lot of different problems. For example, you have hugely heterogeneous environments across different types of architectures and even across different on-prem and cloud providers. We realized that networking and unobservable behavior were quickly becoming the largest impediments to scale. The missing pieces are things like advanced load balancing, timeouts, retries, circuit breakers, tracing, and logging.

Looking around the ecosystem you see a lot of great tooling around the JVM (things like Finagle or Hystrix from Netflix). But when you start looking in the polyglot environment, there really did not exist any cohesive set of technologies that allow people to deploy distributed system best practices (particularly, across networking and observability).

So when I came into Lyft, the company had a monolith environment with mostly PHP. They had some services in Python and were looking to add more services in Go. We were facing a lot of the same problems any company would face around this type of architecture. Rather than solving these problems with yet another library, it became clear that an out-of-process proxy that was extensible, high performance, and had best-in-class load balancing and observability would be compelling and would help improve Lyft's architecture. In addition, if you could use the same proxy for internal services and for traffic at the edge, that's a pretty great thing from an operational perspective. So we felt there was a really great opportunity to help Lyft scale. That solution became Envoy.

Question: 

QCon: What's the focus of the QConNYC talk?

Answer: 

Matt: What we're going to do is dig deep into what Lyft's problems were prior to Envoy existing. So we'll try to set the stage for why Lyft was rolling out a microservice architecture. I'll discuss what we were hoping to gain from it and what operational problems we were actually having, and then we're going to dig into the main design points of Envoy and how it helped fix those problems. We'll probably spend a considerable amount of time actually talking about the operational aspects of those problems. I'll show a lot of the internal dashboarding that we use. I'll talk a little bit about the alarming, the tracing, the logging, and try to give people a good understanding of how (from an operational perspective) Envoy and the service mesh actually help people scale their microservice architectures.

Question: 

QCon: Why do you think this is an important story today?

Answer: 

Matt: I think deploying microservice architectures is obviously all the rage right now. I think that there are very good reasons for organizations to do that, but I think, at the same time, the current state of the industry is such that organizations undertake microservice migrations without fully understanding all the operational complexity.

I think many organizations get stuck, and I think Lyft was in that position. We wanted to unlock the people agility around microservices but faced major operational concerns, particularly around networking and observability. That's where Envoy comes in and helps bridge that gap. How do you allow people to come in and build microservice architectures and scale them in such a way that they don't spend all their time debugging?

Question: 

QCon: Who are you talking to in this talk?

Answer: 

Matt: First off, I think I'm talking to two different types of people. The first are people who are building infrastructure. So people who are building the foundational systems that the application developers are going to run their business logic on. The second set of people are application developers. I think a lot of application developers spend a lot of time dealing with infrastructure problems and not focusing on business logic. For that audience, my goal is to try to help them understand that there is a better way. If the infrastructure is mature enough (and provides enough abstractions), they can spend more time focusing on business logic than on dealing with debugging random problems.

Speaker: Matt Klein

Creator of Envoy & Software Engineer @Lyft

Matt Klein is a software engineer at Lyft and the creator of Envoy (www.envoyproxy.io). Matt has been working on operating systems, virtualization, distributed systems, networking, and making systems easy to operate for more than 15 years across a variety of companies. Some highlights include leading the development of Twitter’s L7 edge proxy and working on high-performance computing and networking in Amazon’s EC2.

