QCon New York June 15-19, 2020 | | EBtree - Design for a Scheduler and Use (Almost) Everywhere

This presentation is now available to view on InfoQ.com

What You’ll Learn

Hear about elastic binary trees, what they are and what they are good for.
Listen what were the goals, design and the choices behind the implementation of EBtree used in HAProxy.
Find out about the benefits of optimizing the data structure of critical code.

Abstract

As the demand on Internet infrastructures grows, so do the requirements on its components, like load balancers, to process ever more data. EBtree is a different take on the ubiquitous tree data structure, and is helping HAProxy, a high performance Open-Source software load balancer, to keep up with demands for over a decade. In this talk we explore the goals, design and the choices behind the implementations of EBtree, and how they combine to produce a very fast and versatile data storage for many of HAProxys advanced features.

Question:

What is the focus of your work today?

Answer:

Currently I work on making sure that HAProxy is and remains the fastest and most widely used software load balancer today. My team and I work on improving and extending HAProxy. We constantly try to find new ways to integrate it into existing and emerging platforms and environments. We know that HAProxy is often an indispensable piece of software for our users and as they migrate through their infrastructure lifecycles so does HAProxy.

Question:

And what's the motivation for the talk?

Answer:

The motivation for the talk is to remind software architects and service architects to think outside of the box. Even though we often have to resort to using off-the-shelf components to develop software faster and our services faster, investing effort into particular performance critical parts of our stacks can really provide long term benefits in various unforeseen ways. For us in this example, this is investing in binary trees and our environment was optimizing for CPU cache sizes and minimizing memory allocations. That's where we started our improvements and they cascaded upwards through our software stacks, but any place is a good place to take an off-the-shelf component and see how it can be improved and replaced.

Question:

For people who don't know, can you briefly describe what an elastic binary tree is?

Answer:

An elastic binary tree is a data structure that tries to optimize the speed of insertion and retrieval of data. More specifically it’s a variant of a prefix or radix tree that is exhibiting some practical benefits of a red-black tree. It was developed over a number of years primarily to be used within a task scheduler to sort tasks and retrieve task data in the order of those tasks needing to be processed. It shares some characteristics at a theoretical level with prefix trees, but it has some other advantages over them. Namely reducing memory allocations which then in practice makes it a lot more usable for not just scheduler application but for many other applications in HAProxy. For example, ACL lists or cache indexing and so on.

Question:

What are the specific properties that make an EBTree suitable for HAProxy?

Answer:

Andjelko Specific properties are less memory allocation and better use of CPU caches and even branch prediction algorithms, which gives us fast inserts, lookups and deletes of tree entries. A lot of work has gone into optimizing the memory footprint and layout of tree nodes in order to reduce accessing data outside of the CPU caches, because every time we do that we get a big performance drop. A lot of techniques are employed which are closely tailored to actual hardware rather than just being a translation of a theoretical concept into code.

Question:

How would you describe the persona and the target audience?

Answer:

I would say anyone who has the desire and willingness to go and tinker with any part of their software stack. In this case we're talking about a fairly low level data structure. But it can be anything else, it can be a balancing algorithm. It could be a performance measurement part of some distributed application. Those interested to take a look at those most important parts of their architecture, people who are in the position to holistically view their service or their product, and also change any part that they can identify as needing to be improved whether it's low level or high level.

Question:

What do you want people who come to your talk to walk away with?

Answer:

I was hinting at that in my previous answer, I’d like people to walk away with an itch to, every now and then, take a look at their architecture, whether it's software or service or whatever else, and invest into improving a fundamental component for that particular service or software. More often than not that opens up new ways of either delivering the service or new levels of performance and it keeps giving those benefits over the years. Don't just always use off-the-shelf stuff but every now and then take a deep look and pick one part and optimize a lot.

Speaker: Andjelko Iharos

Director of Engineering at HAProxy Technologies

Andjelko is a Director of Engineering at HAProxy Technologies, the company behind the worlds fastest and most widely-used software load balancer. He is responsible for designing and overseeing implementation of solutions and services at all scales, from low level and microsecond optimized software to massive automated clusters processing tens of billions of requests per day. Andjelko works with tools such as Python, C, Golang and many more.