Presentation: ML Data Pipelines for Real-Time Fraud Prevention @PayPal

Track: Practical Machine Learning

Location: Empire Complex, 7th fl.

Duration: 11:50am - 12:40pm

Day of week:

Level: Advanced

Persona: Architect, Data Scientist, Developer

This presentation is now available to view on InfoQ.com

What You’ll Learn

    • Understand how real-time inference is supported by near-real-time data streaming and offline analytical computation of features
    • Learn how trained models are packaged, configured, and served in a large-scale production environment
    • Hear how the data tier is organized and how data is managed at PayPal

Abstract

PayPal processes about a billion dollars of payment volume daily ($451bn in FY2017). Complex decisions are made for each transaction or user action to manage risk and compliance while also ensuring a good user experience. PayPal users can make payments immediately in 200 regions with the assurance that the company’s transactions are secure. How does PayPal achieve this goal in today's complex environment, filled with "high-level" fraudsters as well as constantly increasing customer demand? While many industry solutions rely on fast analytics performed in near-real time over streaming data, our business requirements demand real-time, millisecond-range response. This talk will focus on the architectural approach behind our internally built real-time service platform, which leverages Machine Learning models and delivers unparalleled performance and quality of decisions. This platform has established a fine balance between Big Data and sustainable support for a high volume of real-time decision requests. A well-structured design, along with a domain modeling methodology, provides high adaptability to emerging fraud patterns and behavioral variations. Deployment on a real-time, event-driven, in-memory fast-data architecture accelerates detection and decisions, thereby reducing losses, improving customer experience, and allowing efficient new integrations.

Question: 

Is this talk a repeat or has it changed?

Answer: 

This talk builds on my previous talks at QCon-London and QCon.ai earlier this year. The main focus will be on production inference, rather than on covering the complete end-to-end ML development pipeline. Model inception, training, and testing will be mentioned only briefly, while most of the time will go to a deeper level of detail on the large-scale production stack.

Question: 

What is the level of experience someone attending this talk should have?

Answer: 

The talk is geared towards existing as well as aspiring practitioners of ML who have a general understanding of the Machine Learning landscape: not necessarily data scientists, but rather architects and engineers focused on delivering ML inference compute as a production capability at scale.

Speaker: Mikhail Kourjanski

Lead Data Architect, Risk and Compliance Management Platform @PayPal

Mikhail Kourjanski is the Lead Data Architect at PayPal, responsible for the data architecture of the PayPal real-time decisioning platform, which handles billions of events per day and maintains dozens of petabytes of data. For the fraud prevention function alone, this platform saves more than $500M in annual profits.

Mikhail has over 20 years of work experience, including high-tech software engineering, academic research, and consulting for the Financial Services industry. Mikhail’s architecture work includes a number of innovative developments, such as high-performance distributed processing over eventually consistent data, a multi-layer security model for data-in-transit middleware, and service domain models for banking and Fintech clients. Mikhail has delivered multiple engagements for Top-10 banks in the roles of trusted advisor up to CIO level, lead architect, and IT delivery executive. Prior to the consulting period of his career, he proved a successful entrepreneur, running his own company and winning and delivering R&D projects for US Government agencies. Mikhail earned his Ph.D. degree in applied mathematics from the Moscow State (Lomonosov) University, Russia, followed by a post-doctoral research position at UC Berkeley. Mikhail’s academic research focused on large-scale distributed systems and real-time simulations for the Transportation industry and Smart Cars technologies.