Presentation: ML Data Pipelines for Real-Time Fraud Prevention @PayPal
This presentation is now available to view on InfoQ.com
What You’ll Learn
- Understand how real-time inference is supported by near-real-time data streaming and offline analytical computation of features
- Learn how trained models are packaged, configured, and served in a large-scale production environment
- Hear how the data tier is organized and how data is managed at PayPal
Abstract
PayPal processes about a billion dollars of payment volume daily ($451bn in FY2017); complex decisions are made for each transaction or user action to manage risk and compliance while also ensuring a good user experience. PayPal users can make payments immediately in 200 regions with the assurance that the company’s transactions are secure. How does PayPal achieve this goal in today's complex environment, filled with "high-level" fraudsters as well as constantly increasing customer demand? While many industry solutions rely on fast analytics performed in near-real time over streaming data, our business requirements demand real-time, millisecond-range response. This talk will focus on the architectural approach behind our internally built real-time service platform, which leverages Machine Learning models and delivers unparalleled performance and quality of decisions. This platform has established a fine balance between Big Data and sustainable support for a high volume of real-time decision requests. A well-structured design, together with a domain modeling methodology, provides high adaptability to emerging fraud patterns and behavioral variations. Deployment on a real-time, event-driven, fast-data, in-memory architecture accelerates detection and decisions, thereby reducing losses, improving customer experience, and allowing efficient new integrations.
Is this talk a repeat or has it changed?
This talk builds on my previous talks at QCon London and QCon.ai earlier this year. The main focus will be production inference rather than the complete end-to-end ML development pipeline. Model inception, training, and testing will be mentioned only briefly, while most of the time will go to a deeper look at the large-scale production stack.
Previous Talks on this Topic:
- QCon London 2018: Real-Time Data Analysis and ML for Fraud Prevention
- QCon.ai: Data Pipelines for Real-Time Fraud Prevention at Scale
What is the level of experience someone attending this talk should have?
The talk is geared towards existing as well as aspiring ML practitioners with a general understanding of the Machine Learning landscape: not necessarily data scientists, but rather architects and engineers focused on delivering ML inference compute as a production capability at scale.