Presentation: Evaluating Machine Learning Models: A Case Study
What You’ll Learn
- Understand the importance of developing a simulation-based framework for reasoning about machine learning models.
- Hear a three-step approach to evaluating models against metrics that matter to the business.
- Learn how Opendoor uses machine learning to drive its pricing models.
Abstract
American homes represent a $25 trillion asset class, with very little liquidity. Selling a home on the market takes months of hassle and uncertainty. Opendoor offers to buy houses from sellers, charging a fee for this service. Opendoor bears the risk in reselling the house and needs to understand the effectiveness of different hazard-based liquidity models.
This talk focuses on how to estimate the business impact of launching various machine learning models, in particular, those we use for modeling the liquidity of houses. For instance, if AUC increases by a certain amount, what is the likely impact on various business metrics such as volume and margin?
With the rise of machine learning, there has been a spate of work in integrating such techniques into other fields. One such application area is in econometrics and causal inference (cf. Varian, Athey, Pearl, Imbens), where the goal is to leverage advances in machine learning to better estimate causal effects. Given that a typical A/B test of a real estate liquidity model can run many months in order to fully realize resale outcomes, we use a simulation-based framework to estimate the causal impact on the business (e.g. on volume and margin) of launching a new model.
QCon: What does Opendoor do?
Nelson: We make it as easy as possible to buy and sell houses. The way it works is you sell your house to us, and then we later sell it to a market buyer. This is pretty risky, so one of my teams focuses on modeling that risk.
For example, we're able to resell some houses quickly. Perhaps they are in more central areas of town, at lower price points with larger buyer pools to draw from, or it's a favorable time of year (like spring). It just depends on the market. In these cases, the fee that we charge the seller is quite low, because we're incurring very little risk in holding the house and then reselling it. So we have a series of machine learning models in our pricing that are focused on modeling this kind of liquidity.
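To make this concrete, here is a minimal sketch of the kind of hazard-based liquidity model the abstract alludes to: a discrete-time hazard gives the probability a listed house sells in a given week, and the survival product turns that into a resale probability over a horizon. The feature names and coefficients here are illustrative assumptions, not Opendoor's actual model.

```python
import numpy as np

def weekly_hazard(week, features, coefs):
    """Logistic hazard: P(house sells in week t | unsold through week t-1)."""
    z = coefs["intercept"] + coefs["week"] * week
    for name, value in features.items():
        z += coefs.get(name, 0.0) * value
    return 1.0 / (1.0 + np.exp(-z))

def prob_sold_within(n_weeks, features, coefs):
    """Chance the house resells within n_weeks, via the survival product."""
    surviving = 1.0
    for t in range(1, n_weeks + 1):
        surviving *= 1.0 - weekly_hazard(t, features, coefs)
    return 1.0 - surviving

# Illustrative coefficients only -- not a fitted model.
coefs = {"intercept": -3.0, "week": 0.05, "central_location": 0.6,
         "low_price_point": 0.4, "spring_listing": 0.3}
liquid_house = {"central_location": 1, "low_price_point": 1, "spring_listing": 1}
remote_house = {"central_location": 0, "low_price_point": 0, "spring_listing": 0}
print(prob_sold_within(12, liquid_house, coefs))  # high: cheap to hold, low fee
print(prob_sold_within(12, remote_house, coefs))  # lower: riskier, higher fee
```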
QCon: What’s the motivation for your talk?
Nelson: I think that backtesting machine learning systems is very well understood, and running A/B tests to see how a new model launch is affecting business metrics is also quite common. However, in many application areas it’s important to be able to backtest the business impact of a new machine learning model. There aren’t many resources on how to do that, and I’d love to spread awareness.
QCon: From a high-level, how will you go about discussing testing and evaluating models in this talk?
Nelson: You need a simulation of the business. In most cases, that's a user model. For example, in real estate we want to know the probability that someone will sell to us given the price we're offering. This is a demand curve, which is generally downward sloping. You can add as many features as you want to make it a more accurate reflection of the business. Other domains will have other user models.
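As a rough illustration, such a demand curve might be sketched as a logistic function of the fee charged, so that higher fees convert fewer sellers. The slope and midpoint below are made-up parameters for the sketch, not Opendoor's estimates.

```python
import numpy as np

def p_sell(fee_pct, slope=0.8, midpoint=7.0):
    """P(seller accepts our offer) as a function of the fee we charge (%).
    Downward sloping: the higher the fee, the fewer sellers convert."""
    return 1.0 / (1.0 + np.exp(slope * (fee_pct - midpoint)))

for fee in (5.0, 7.0, 9.0):
    print(f"fee {fee:.0f}% -> P(accept) = {p_sell(fee):.2f}")
```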
So that's the main business simulation. Then you feed in the predictions from the new machine learning model and from the old one; the difference in the simulated outcomes is the impact on the business metrics. I am going to discuss this approach and our use case, and go over a three-step process for generalizing it to new problems.
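Here is a minimal sketch of that comparison, under stated assumptions: each model's liquidity prediction sets the fee, the demand curve above turns fees into acceptance decisions, and we difference the simulated volume and margin. The fee rule, the cost model, and the noise levels (the "new" model simply has less prediction error) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_sell(fee_pct):
    """Demand curve from the sketch above (illustrative parameters)."""
    return 1.0 / (1.0 + np.exp(0.8 * (fee_pct - 7.0)))

def simulate(liquidity_preds, home_values, resale_costs):
    """Business simulation for one model:
    predictions -> fees -> seller decisions -> volume and margin."""
    # Assumption: less liquid homes get a higher fee to cover holding risk.
    fees = 5.0 + 4.0 * (1.0 - liquidity_preds)             # fee, in percent
    accept = rng.random(len(fees)) < p_sell(fees)          # seller decisions
    volume = int(accept.sum())                             # homes bought
    margin = float(np.sum(fees[accept] / 100 * home_values[accept]
                          - resale_costs[accept]))         # revenue - risk cost
    return volume, margin

n = 10_000
home_values = rng.uniform(150_000, 450_000, n)
true_liquidity = rng.beta(2, 2, n)
resale_costs = home_values * 0.06 * (1.0 - true_liquidity)  # illiquid = costly
old_preds = np.clip(true_liquidity + rng.normal(0, 0.20, n), 0, 1)  # noisier
new_preds = np.clip(true_liquidity + rng.normal(0, 0.10, n), 0, 1)  # sharper

v_old, m_old = simulate(old_preds, home_values, resale_costs)
v_new, m_new = simulate(new_preds, home_values, resale_costs)
print(f"volume delta: {v_new - v_old:+d}, margin delta: ${m_new - m_old:+,.0f}")
```

One thing even this toy version captures is adverse selection: sellers accept precisely when the fee is too low for the risk, so a noisier model tends to give up margin even when its errors are symmetric.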
QCon: What do you want someone to walk away from your talk with?
Nelson: You might be experimenting with some extreme change, or, as in our case, it might just take a very long time to get results because of what I call metric measurement lag.
If either of those applies, consider this simulation-based approach and the steps I'll discuss as a way to reason about selecting your models.