Presentation: Forced Evolution: Shopify's Journey to Kubernetes
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Understand the design goals and approaches to developing a common platform based on Kubernetes that deploys across on-prem and cloud environments.
- Learn how Golang can be seamlessly integrated into Kubernetes to create a highly customized platform for your use cases.
- Understand how and why Shopify built a customized platform on top of Kubernetes.
Abstract
Shopify, in 2014, was one of the first large-scale users of Docker in production. We ran 100% of our production web workers in hundreds of containers. We saw the value of containerization and aspired to also introduce a real orchestration layer.
Fast forward two years to 2016, when instead we had a clumsy and fragile homemade middleware for controlling containers. We started looking at orchestration solutions again and the technology behind Kubernetes intrigued us.
In this talk I'll briefly go over the challenges we saw in moving from a traditional host-based infrastructure to a cloud native one, moving not only our core app to Kubernetes but also hundreds of our other apps at the same time. I'll focus on the cluster tooling solutions we've built, such as controllers, cluster creators, and deploy tools. We've automated things ranging from our DNS to certificates and even complex cluster creations - and all with a real programming language rather than a handful of random scripts.
The ability to extend Kubernetes to fit our needs has been the greatest reward of this project. It's given us a new paradigm to build on rather than relying on old patterns.
What’s the focus of your work today?
I'm on the cloud platform team. Mostly I'm working with Kubernetes. The main goal is to make sure our platform for developers is working. That may mean that I'm debugging some issues with Docker or Kubernetes, or I’m writing Golang to automate the platform. Today, I'm actually working on improving our cluster life cycle; we are trying to make it more dynamic. We are building new tooling to speed up the creation of new clusters, and building up some sort of cluster registry and a deletion workflow.
Your background is in Java, but you write a lot of Golang these days. What are your thoughts on Golang?
I got started in the software industry with Java development. I was always interested in the domain of systems engineering, and my responsibilities back then were mostly about infrastructure concerns, such as how to build deployment pipelines. I wrote C++ in university, and I always felt like I wanted something in between C++ and Java. It’s hard to do simple customized tasks in Java, and C++ takes a lot of effort for a simple task.
Golang strikes a perfect balance between these two because you can write complex software and also write small scripts of 10 to 20 lines. I like the approach of using uber jars in Java, and Golang takes this even further. You can build a static binary, so you don't have to have a specific environment. You just make sure your binary is compiled for the platform that you will be running on. In a way, I like getting something stronger and more powerful than Java, especially for the systems engineering domain.
What was the motivation for your talk?
My motivation was really about showing people the effort that went into building this kind of platform ourselves. I think the audience will be able to identify with us, because we run our applications in the data center, in the cloud, and on Heroku. We had a three-pronged approach and it got messy. It got messy because you have to decide, say, what goes into Heroku and what goes into AWS. If your application gets large enough that you have to move off Heroku, you basically have to think about how to start from scratch. We had to get on common ground for developers so they have a single story for running their application. It shouldn’t matter if it’s our most critical and largest application or if it's an application written in 15 minutes.
Who is the target audience for your talk?
I think the main target is going to be anybody running applications at a non-trivial scale. Meaning, if you have a single application, you can be specific and customize how you want to run it (and where you want to run it). But when you go into microservices or if you have hundreds of different applications (especially in an enterprise environment), you have to think about how to have a single story, a single way of approaching the same problem. It could be a solution architect who is thinking about this in a way that fits their company or their customers. Also, developers would be interested in how their company could adopt this kind of approach to make their lives easier by allowing them to write automations without involving operations people or figuring out where to put those scripts.
What are some of the takeaways from your talk?
One of the biggest takeaways is that we can write native Golang on top of Kubernetes and everything works as if it's part of Kubernetes. We can basically expand Kubernetes, which is already a really powerful tool. Before Kubernetes, you could write random scripts on top of, say, a configuration management system, but usually you ended up going behind it or working around it by writing different kinds of integrations. We’ve been able to build in automation with Golang in a Kubernetes native way.
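To make the idea of Kubernetes-native automation more concrete, here is a minimal sketch of the reconcile pattern that controllers are typically built around: compare the desired state with the observed state and compute the actions needed to converge. This is not Shopify's code. The `Action` type, the `Reconcile` function, and the DNS record names are hypothetical and chosen only for illustration, and the sketch uses plain Go maps rather than the Kubernetes API, so a real controller would watch cluster resources through a client library instead.

```go
package main

import (
	"fmt"
	"sort"
)

// Action is a hypothetical description of one change needed to move
// the observed state toward the desired state.
type Action struct {
	Op    string // "create", "update", or "delete"
	Name  string // record name
	Value string // target value (empty for deletes)
}

// Reconcile compares desired and observed DNS records (name -> target)
// and returns the actions needed to converge, sorted by record name.
func Reconcile(desired, observed map[string]string) []Action {
	var actions []Action
	for name, want := range desired {
		got, ok := observed[name]
		switch {
		case !ok:
			actions = append(actions, Action{Op: "create", Name: name, Value: want})
		case got != want:
			actions = append(actions, Action{Op: "update", Name: name, Value: want})
		}
	}
	for name := range observed {
		if _, ok := desired[name]; !ok {
			actions = append(actions, Action{Op: "delete", Name: name})
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Slice(actions, func(i, j int) bool { return actions[i].Name < actions[j].Name })
	return actions
}

func main() {
	// Illustrative data only; in a controller these would come from
	// declared resources and from the external DNS provider.
	desired := map[string]string{
		"shop.example.com": "10.0.0.1",
		"api.example.com":  "10.0.0.2",
	}
	observed := map[string]string{
		"shop.example.com": "10.0.0.9",
		"old.example.com":  "10.0.0.3",
	}
	for _, a := range Reconcile(desired, observed) {
		fmt.Println(a.Op, a.Name, a.Value)
	}
}
```

The point of the pattern is that the function is safe to run repeatedly: once the observed state matches the desired state, it returns no actions, which is what lets this kind of automation replace one-off scripts.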