Presentation: Probabilistic Programming from Scratch

Track: Modern CS in the Real World

Location: Soho Complex, 7th fl.

Duration: 4:10pm - 5:00pm

Level: Intermediate - Advanced

Persona: Architect, Data Scientist, Developer

This presentation is now available to view on InfoQ.com

What You’ll Learn

  • Gain a deeper understanding of how Probabilistic Programming can be used to help engineers solve problems around incomplete or partial data.
  • Learn a new programming paradigm using Python and PyMC3.
  • Hear how probabilistic programming is being used in places like Facebook, Twitter, and Google in time series forecasting systems.

Abstract

This talk is for anyone who deals with real-world data. Such data is always incomplete or imperfect in some way. Bayesian inference is a framework that allows us to draw conclusions from that data. And despite its reputation for mathematical and computational complexity, you don’t need a statistics background to understand Bayes at a conceptual level. We’ll develop that understanding by building a lightweight probabilistic programming system from scratch in simple Python. We’ll use the code we write to solve two real data problems: an A/B test and the German Tank problem. We’ll also look at how we’d solve those problems using PyMC3, a much more powerful, fully featured probabilistic programming system.
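
To give a flavor of what "from scratch" can look like, here is a minimal sketch of an A/B test solved by rejection sampling (a simple form of Approximate Bayesian Computation). It is an illustration, not the code from the talk: the choice of algorithm, the function names, the uniform prior, and the visitor counts are all assumptions made for this example.

```python
import random

def simulate_conversions(rate, n_visitors, rng):
    """Simulate how many of n_visitors convert when the true rate is `rate`."""
    return sum(rng.random() < rate for _ in range(n_visitors))

def posterior_samples(n_visitors, n_conversions, n_samples, rng):
    """Keep only the guessed rates whose simulated outcome matches the data."""
    kept = []
    while len(kept) < n_samples:
        guess = rng.random()  # assumed prior: any rate from 0 to 1 is equally likely
        if simulate_conversions(guess, n_visitors, rng) == n_conversions:
            kept.append(guess)
    return kept

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data: layout A converts 4 of 40 visitors, layout B converts 8 of 40.
a_rates = posterior_samples(40, 4, 1000, rng)
b_rates = posterior_samples(40, 8, 1000, rng)

# Fraction of paired samples in which B's rate beats A's.
prob_b_better = sum(b > a for a, b in zip(a_rates, b_rates)) / len(a_rates)
print(f"Estimated probability that B converts better than A: {prob_b_better:.2f}")
```

The surviving guesses approximate the posterior distribution of each conversion rate, so comparing them pair by pair gives a direct estimate of how strongly the data favors one layout. The approach is slow because most guesses are thrown away, which is one reason a system like PyMC3 uses more sophisticated samplers.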

Question: 

What do you want someone to leave your talk with? 

Answer: 

The audience will leave with a strong non-mathematical intuition for how Bayesian inference allows us to quantify the strength of conclusions drawn from real-world data. They’ll hopefully be excited to solve other toy problems with the tool we put together during the talk, and keen to check out PyMC3.

This talk is perhaps most useful for people who deal with real-world data and face concrete statistical problems. But Bayesian inference also provides a powerful day-to-day mental model for thinking about data and belief. And in keeping with the CS track, this talk will be an introduction to a new programming language paradigm for some. So I hope it will be at least interesting to a very wide audience!

Question: 

Is probabilistic programming a real thing? Can you give me an example of where it's being used today? 

Answer: 

Yes, it's a real thing! The most prominent examples of tech companies using these ideas in the real world are Facebook's Prophet time series forecasting system (which I'll discuss in the talk), and Uber's release of Pyro, an open source deep probabilistic programming system built on top of PyTorch. And Google are now getting involved with TensorFlow Probability.

Question: 

What is the level of experience someone attending this talk should have?

Answer: 

This might seem like a talk about statistics, mathematics and computer science. But my goal is that everyone who can write a for loop will understand everything we do. I attempt to ensure this by implementing things from scratch, and choosing a Bayesian inference algorithm that is particularly transparent and non-mathematical. I happen to use Python a little in this talk, but it's not essential that you can code in Python. And very importantly: no mathematics is required!
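
As a rough idea of how little is needed beyond a for loop, the sketch below applies the same guess-simulate-compare idea to the German Tank problem. It is not the speaker's code: the serial numbers, the upper bound on the prior, and the shortcut of comparing only the largest serial number (rather than the full set) are assumptions chosen to keep the example short.

```python
import random
import statistics

def estimate_tank_count(observed_serials, max_tanks, n_trials, rng):
    """Return the guessed totals whose simulated capture matches the data."""
    k = len(observed_serials)
    largest_seen = max(observed_serials)
    accepted = []
    for _ in range(n_trials):
        # Assumed prior: any total from 1 to max_tanks is equally likely.
        guess = rng.randint(1, max_tanks)
        if guess < k:
            continue  # cannot capture k distinct tanks from fewer than k
        # Simulate capturing k distinct tanks numbered 1..guess.
        simulated = rng.sample(range(1, guess + 1), k)
        # Simplification: compare only the largest serial number.
        if max(simulated) == largest_seen:
            accepted.append(guess)
    return accepted

rng = random.Random(1)  # fixed seed so the run is repeatable
serials = [10, 56, 32, 41]  # hypothetical captured serial numbers
plausible_totals = estimate_tank_count(serials, max_tanks=500, n_trials=200_000, rng=rng)
print("Median estimate of total tanks:", statistics.median(plausible_totals))
```

Every accepted guess is at least as large as the biggest serial number seen, and the spread of the accepted values shows how uncertain the estimate is.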

Speaker: Mike Lee Williams

Research engineer @Cloudera Fast Forward Labs

Mike Lee Williams does applied research into computer science, statistics and machine learning at Cloudera Fast Forward Labs. While getting his PhD in astrophysics he spent 2% of his time observing the heavens in beautiful far west Texas, and the other 98% trying to figure out how to fit straight lines to data. He once did a postdoc at the Max Planck Institute for Extraterrestrial Physics, which, amazingly, is a real place.
