Silei Huo

Data-driven problem solver with extensive experience in Banking and Technology (Payments), focusing on Strategic Planning, Product Management, and Data Analytics. MBA from London Business School.

Data Science | FinTech | Payment Processing | Banking

View My LinkedIn Profile

View My GitHub Profile

Online Payment Fraud Detection

Project Background

This project applies machine learning models to an e-commerce transactions dataset, whose features range from device type to product features, to detect fraudulent transactions. The goal is to make fraud alerts more effective, reducing fraud losses while sparing customers the hassle of false positives.

Photo by Paul Felberbauer on Unsplash

Data Source & Description

The data comes from real-world e-commerce transactions (source: IEEE-CIS Fraud Detection).

PART 1 - Data Wrangling & EDA

Overall Target Variable (IsFraud) & Transaction Amount Distribution

Figure: Percentage of Fraudulent Transactions
Figure: Transaction Amount Distribution across Two Classes

Card Types

Figure: Transaction Distribution across Card Types

Email Domains

Figure: Transaction Distribution across Email Domains

Counting Information

(such as how many addresses are found to be associated with the payment card)

Because the counting variables are heavily right-skewed, it is worth looking further into their higher quantile values. Interestingly, for most of these columns the fraud class has much higher values; only columns C4 and C9 show the opposite pattern.

Figure: Quantiles for Counting Variables
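The per-class quantile comparison described above could be sketched roughly as follows. This is only an illustration, not the code used in the project: the helper name `upper_quantiles_by_class`, the chosen quantile levels, and the tiny synthetic data frame are all assumptions made for the example. The column names `isFraud`, `C1`, and `C4` follow the IEEE-CIS dataset's naming.

```python
import pandas as pd


def upper_quantiles_by_class(df, count_cols, target="isFraud",
                             qs=(0.90, 0.95, 0.99)):
    """Return upper quantiles of each counting column, split by class.

    Hypothetical helper for illustration; the quantile levels are assumed.
    """
    # Grouping by the target and passing a list of quantiles yields a
    # two-level index: (class label, quantile level).
    table = df.groupby(target)[count_cols].quantile(list(qs))
    table.index.names = [target, "quantile"]
    return table


# Synthetic toy data (NOT real transactions), built so that C1 is larger
# in the fraud class while C4 is larger in the non-fraud class.
toy = pd.DataFrame({
    "isFraud": [0] * 6 + [1] * 4,
    "C1": [1, 1, 1, 2, 2, 3, 1, 5, 20, 40],
    "C4": [0, 0, 1, 1, 2, 6, 0, 0, 0, 1],
})

result = upper_quantiles_by_class(toy, ["C1", "C4"])
print(result)
```

On the real data, the same call would take the full list of counting columns in place of the two toy columns. Looking at the upper tail rather than the mean or median is what makes the class difference visible when most values sit near zero.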

Web - Browser Types

Figure: Browser Types

PART 2 - Analysis & Modeling

Model Performance Comparison

Feature Importance Comparison

Figure: Feature Importance (XGBoost)
Figure: Feature Importance (LGBM)