Presentation: Evaluating Machine Learning Models: A Case Study

Track: Machine Learning 2.0

Location: Majestic Complex, 6th fl.

Level: Advanced

Persona: Data Scientist

What You’ll Learn

  • Understand the importance of developing a simulation-based framework for reasoning about machine learning models.
  • Hear about a three-step approach to evaluating models against metrics that matter to the business.
  • Learn how Opendoor uses machine learning to drive its pricing models.

Abstract

American homes represent a $25 trillion asset class, with very little liquidity. Selling a home on the market takes months of hassle and uncertainty. Opendoor offers to buy houses from sellers, charging a fee for this service. Opendoor bears the risk in reselling the house and needs to understand the effectiveness of different hazard-based liquidity models.

This talk focuses on how to estimate the business impact of launching various machine learning models, in particular, those we use for modeling the liquidity of houses. For instance, if AUC increases by a certain amount, what is the likely impact on various business metrics such as volume and margin?

With the rise of machine learning, there has been a spate of work integrating these techniques into other fields. One such application area is econometrics and causal inference (cf. Varian, Athey, Pearl, Imbens), where the goal is to leverage advances in machine learning to better estimate causal effects. Given that a typical A/B test of a real estate liquidity model can run for many months before resale outcomes are fully realized, we use a simulation-based framework to estimate the causal impact of launching a new model on the business (e.g. on volume and margin).

Question: 

QCon: What does Opendoor do?

Answer: 

Nelson: We make it as easy as possible to buy and sell houses. The way it works is you sell your house to us, and then we later sell it to a market buyer. This is pretty risky, so one of my teams focuses on modeling that risk.

For example, we’re able to resell some houses quickly. Perhaps they are in more central areas of town, at lower price points with larger buyer pools to draw from, or it's a favorable time of year (like spring). It just depends on the market. In these cases, the fee that we charge the seller is quite low because we're incurring very little risk in holding the house and then reselling it. So we have a series of machine learning models in our pricing that focus on modeling this kind of liquidity.
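
As a rough illustration of what a hazard-based liquidity model can look like, here is a minimal sketch using the lifelines survival-analysis library; the feature names and data are made up for illustration and this is not Opendoor's actual implementation:

    # Hypothetical sketch of a hazard-based liquidity model using the
    # lifelines library. Feature names and data are made up for illustration.
    import pandas as pd
    from lifelines import CoxPHFitter

    # Toy resale data: one row per house; resold=0 means still held (censored).
    homes = pd.DataFrame({
        "days_on_market":    [12, 45, 90, 30, 150, 60, 20, 75, 40, 110],
        "resold":            [1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
        "miles_from_center": [2.0, 8.5, 15.0, 3.1, 20.0, 6.0, 1.5, 12.0, 4.0, 18.0],
        "list_price_k":      [250, 400, 650, 300, 700, 350, 220, 500, 280, 600],
        "listed_in_spring":  [1, 0, 0, 1, 0, 1, 1, 0, 1, 0],
    })

    # Cox proportional hazards: a higher hazard means a faster expected resale,
    # i.e. a more liquid house. A small penalty stabilizes the fit on toy data.
    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(homes, duration_col="days_on_market", event_col="resold")
    cph.print_summary()

    # Per-home liquidity estimate, e.g. predicted median days to resale.
    print(cph.predict_median(homes))

In this framing, the fitted hazard feeds pricing: a less liquid home carries more holding risk and therefore a higher fee.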

Question: 

QCon: What’s the motivation for your talk?

Answer: 

Nelson: I think that backtesting machine learning systems is very well understood, and running A/B tests to see how a new model launch affects business metrics is also quite common. However, in many application areas it’s important to be able to backtest the business impact of a new machine learning model. There aren’t many resources on how to do that, and I’d love to spread awareness.

Question: 

QCon: From a high-level, how will you go about discussing testing and evaluating models in this talk?

Answer: 

Nelson: You need a simulation of the business. In most cases, that's a user model. So, for example, in real estate we want to know the probability that someone will sell to us given the price we're offering. This is a demand curve, which is generally downward sloping. You can add as many features as you want to make it a more accurate reflection of the business. Other domains will have other user models.

So that's the main business simulation. Then you feed in the predictions from the new machine learning model and from the old one. The difference in simulated outcomes is the estimated impact on the business metrics, as in the sketch below. I am going to discuss this approach and our use case and go over a three-step process for generalizing to new problems.
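
To make that concrete, here is a minimal sketch of such a simulation, with an assumed logistic demand curve and flat fee rates standing in for the old and new models' outputs; all numbers are illustrative, not Opendoor's:

    # Minimal simulation sketch: a downward-sloping demand curve gives
    # P(seller accepts | offer), and we compare business metrics under the
    # fees implied by an old vs. a new liquidity model. Numbers are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    n_homes = 100_000
    home_value = rng.uniform(200_000, 500_000, n_homes)  # true resale value

    def accept_prob(offer, value, a=8.0, b=9.0):
        """Downward-sloping demand: acceptance falls as the offer drops below value."""
        return 1.0 / (1.0 + np.exp(-(a * offer / value - b)))

    def simulate(fee_rate):
        """Volume and margin if each seller is quoted the given fee rate."""
        offer = home_value * (1.0 - fee_rate)
        sells = rng.random(n_homes) < accept_prob(offer, home_value)
        volume = int(sells.sum())
        margin = float((home_value[sells] - offer[sells]).sum())
        return volume, margin

    # Suppose the old model's risk estimates imply a 7% fee, while the new,
    # sharper model lets us quote 6% on average (assumed numbers).
    for label, fee in [("old model", 0.07), ("new model", 0.06)]:
        volume, margin = simulate(fee)
        print(f"{label}: volume={volume:,} homes, margin=${margin:,.0f}")

In practice the fee would come from each model's per-home liquidity prediction rather than a flat rate, and the demand curve would be fit to observed seller behavior; the sketch just shows how the two models' outputs are differenced through the same business simulation.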

Question: 

QCon: What do you want someone to walk away from your talk with?

Answer: 

Nelson: You might be experimenting with some extreme change, or, as in our case, it might just take a very long time to get results because of what I call metric measurement lag.

If either of these is the case, consider this simulation-based approach and the steps I'll discuss as a way to reason about selecting your models.

Speaker: Nelson Ray

Data Scientist @Opendoor

Nelson manages the Risk Science group at Opendoor in San Francisco. His team is responsible for per-home liquidity estimation and developing responsive risk models. Prior to joining Opendoor, Nelson was a data scientist at Google and a software engineer at Metamarkets. He holds a BS in mathematics and an MS and PhD in statistics from Stanford University.
