Why CNP Fraud is a Constantly Evolving Threat to eCommerce Merchants
Card-not-present (CNP) fraud is a significant concern for ecommerce merchants. And just when you feel you have it under control, it evolves and new risks emerge. By 2023, ecommerce merchants are projected to lose $130B to fraud. Unfortunately, the steps many merchants have taken to control fraud end up costing them more than their direct fraud losses. False positives (legitimate orders declined on suspicion of fraud) cost merchants $118B in an average year.
The Challenges are Significant
Merchants face an uphill battle against CNP fraud. They recognize the risks and have put solutions in place, yet they often fall short of the desired result. False declines carry a huge financial impact: that is revenue merchants leave on the table, and the customers they decline frequently shop elsewhere next time. Merchants may also face a high rate of fraud chargebacks. They know they are absorbing too much fraud, but their efforts to control the problem are not working. Rule-based fraud strategies always seem to be one step behind the criminals, while the costs of fraud control systems and chargeback management keep climbing with very little to show for it. Developing accurate and timely detection methods is essential but difficult.
An effective fraud control system must:
- Address diverse and hidden fraud behavior
- Recognize evolving fraud patterns
- Provide timely feedback
- Adjust quickly and accurately to emerging threats
- Significantly reduce false positives
What is the Difference Between Proactive and Reactive Fraud Control Systems?
Most fraud control systems begin with proactive, rule-based detection, and these tools have severe limitations. They can only catch simple fraud patterns based on a limited number of attributes or variables, and they require significant manual effort to define and code each rule. Once in effect, these systems are typically slow to respond because of burdensome processing.
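To illustrate why rule-based detection struggles, here is a minimal, hypothetical rule set. The field names (`amount`, `billing_country`, `ip_country`, `orders_last_hour`) and the thresholds are assumptions for illustration, not rules from any real system. Each rule inspects only one or two attributes and has to be written and maintained by hand.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order attributes; real orders carry many more.
    amount: float
    billing_country: str
    ip_country: str
    orders_last_hour: int


# Each hand-written rule inspects only one or two attributes.
RULES = [
    ("high_amount", lambda o: o.amount > 1000),
    ("country_mismatch", lambda o: o.billing_country != o.ip_country),
    ("high_velocity", lambda o: o.orders_last_hour > 5),
]


def triggered_rules(order: Order) -> list[str]:
    """Return the name of every rule the order trips."""
    return [name for name, check in RULES if check(order)]


def decide(order: Order) -> str:
    # A single hit declines the order, so a legitimate but unusual
    # order (e.g. a customer shopping while abroad) becomes a false positive.
    return "decline" if triggered_rules(order) else "approve"


traveler = Order(amount=250.0, billing_country="US", ip_country="FR", orders_last_hour=1)
print(decide(traveler), triggered_rules(traveler))  # decline ['country_mismatch']
```

The legitimate traveler is declined by one rule looking at one attribute pair, which is exactly the false-positive problem described above.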
Reactive fraud control systems based on artificial intelligence (AI) and machine learning (ML) are superior to the proactive rule-based approach. They detect complex patterns and non-linear correlations far better, they can handle hundreds of attributes rather than just a few, and they can analyze vast amounts of data and respond more quickly. A key benefit of AI/ML systems is their ability to learn a wide array of fraud patterns and to automatically develop and train new models. New candidate models are developed in the ML pipeline, tested for efficacy, and, once proven, deployed to production.
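The "tested for efficacy, then deployed" step can be sketched as a champion/challenger comparison: score the same labeled holdout set with the production model and the candidate, and promote the candidate only if it wins by a margin. The choice of ROC AUC as the metric and the `min_gain` margin are assumptions; the text does not say which metric or threshold is used.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC: chance a random fraud case outscores a random legitimate one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fraudulent and one legitimate example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


def promote_candidate(prod_scores, cand_scores, labels, min_gain=0.005):
    """Promote the candidate only if it beats production on the same holdout set."""
    return auc(cand_scores, labels) >= auc(prod_scores, labels) + min_gain


# Both models score the same labeled holdout transactions (1 = fraud).
labels = [1, 0, 1, 0, 0]
production = [0.6, 0.7, 0.4, 0.2, 0.1]
candidate = [0.9, 0.3, 0.8, 0.2, 0.1]
print(promote_candidate(production, candidate, labels))  # True
```

The pairwise AUC loop is quadratic and only suits a toy example; a real evaluation would use a sort-based or library implementation on a much larger holdout set.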
Why Machine Learning is Not a Silver Bullet
At this point, a merchant might jump to the conclusion that an AI/ML system will solve every fraud challenge. On its own, it won’t.
AI/ML systems have several dependencies and can fail for a variety of reasons.
- Feature engineering by experts is key to the performance of the system. Domain expertise is essential to fine-tune and optimize models.
- Data quality determines the upper limit of model performance. The adage “garbage in, garbage out” is especially valid when evaluating the effectiveness of any AI/ML system. Without access to consortium data and collaborative data sharing, the insight and accuracy of a model will be limited.
- Each algorithm targets a specific threat, so in isolation a model will never be adequate. A cohesive strategy that provides orchestrated coverage is key.
- AI/ML is not a standalone solution and should not be expected to perform as one. Leading fraud systems leverage a wide range of technologies, including:
  - Consortium data
  - Graph link analysis
  - Advanced identity authentication
  - Silent pending
  - And more…
How Vesta’s Approach to CNP Fraud is Different
Our goal is to increase revenue for merchants.
We Follow a Multi-Layered Approach to Fraud Prevention using:
- Behavioral & Device Data Analysis
  - Overall, Vesta utilizes more than 200M unique data assets, including third-party data, social network data, email, phone, digital fingerprint, and much more.
- Policies act as a first-level filter to stop obvious fraud and can be used to eliminate easy-to-detect outliers.
- Machine Learning
  - ML includes numerous technologies, such as:
    - Graph link analysis
    - Ensemble techniques
    - Supervised learning
    - Anomaly detection and fraud alerts
- Human touch
  - Fraud analyst review is an important part of an effective fraud control system. To be clear, this is not the same as manual review. Fraud analysts are trained data scientists with advanced degrees in mathematics, statistics, and data analysis who use their expertise to refine models for optimal performance.
- Delayed decisioning via silent pending allows the final fraud decision to be deferred while additional data is gathered and evaluated.
- Secondary authentication, such as 3-D Secure (3DS) routing, can be used when further authentication is required to validate the legitimacy of a transaction.
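How these layers might hand off to one another can be sketched as a single decision function: a policy rule runs first, then score tiers choose between approving, silently pending, stepping up to 3DS, or declining. The thresholds, the `card_on_blocklist` and `supports_3ds` fields, and the tier order are all hypothetical; the source does not publish how the layers are wired together.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DECLINE = "decline"
    PEND = "silent_pend"        # defer the final decision while more data arrives
    STEP_UP = "3ds_step_up"     # route to secondary (3DS) authentication


# Hypothetical thresholds on a 0-1 fraud score.
DECLINE_AT = 0.90
STEP_UP_AT = 0.60
PEND_AT = 0.40


def layered_decision(order: dict, score: float) -> Decision:
    """Apply the layers in order: policy filter, then score-based tiers.

    `score` stands in for the ML model output; in a real flow it would only
    be computed for orders that pass the policy layer.
    """
    # Layer 1: a policy rule stops obvious fraud outright.
    if order.get("card_on_blocklist"):
        return Decision.DECLINE
    # Layer 2: the ML score decides the remaining tiers.
    if score >= DECLINE_AT:
        return Decision.DECLINE
    if score >= STEP_UP_AT:
        # Step up when 3DS is available; otherwise pend for analyst review.
        return Decision.STEP_UP if order.get("supports_3ds") else Decision.PEND
    if score >= PEND_AT:
        return Decision.PEND
    return Decision.APPROVE


print(layered_decision({"supports_3ds": True}, 0.72).value)  # 3ds_step_up
```

Only the top tier declines outright; the middle tiers buy time or extra evidence instead of turning a borderline score into a false decline.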
The Key Pillars of Vesta’s Machine Learning Strategy include:
- Supervised ML
  - This is the most common technique in our ML portfolio. Models are trained on billions of transactions labeled legitimate or fraudulent. They are refined through an iterative learning cycle in which real-time predictions are made and their outcomes are fed back into the training process.
- Unsupervised ML
  - Where labeled transaction data is unavailable or very limited, self-learning models can be used. These systems are designed to detect anomalous behavior by using outlier detection to find unique patterns. Their recommendations are incorporated to refine new candidate models, which are evaluated before deployment to production.
- Adaptive ML
  - Adaptive ML handles evolving fraud patterns through continuous updates based on fraud feedback.
- Real-time graph link analysis
  - This technique identifies linked entities, attributes, and groups. These associations contribute to a risk assessment based on the known linkages between the current transaction and historical data.
All of these methods work together simultaneously, as part of a cohesive strategy, to produce a final risk profile.
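The supervised pillar's iterative cycle (predict in real time, then feed confirmed outcomes back into training) can be sketched with online logistic regression trained by stochastic gradient descent. This is a swapped-in technique chosen for brevity; the source does not say which algorithms run this loop, and the two-feature vectors below are made up.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticModel:
    """Logistic regression updated one labeled transaction at a time (SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: list[float]) -> float:
        """Probability that the transaction is fraudulent."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x: list[float], label: int) -> None:
        """One gradient step on log loss once the true outcome is known."""
        error = self.predict(x) - label
        self.bias -= self.learning_rate * error
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]


model = OnlineLogisticModel(n_features=2)
# Hypothetical stream: feature 0 marks risky orders, feature 1 safe ones.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for x, label in stream:
    score = model.predict(x)   # real-time decision would use this score
    model.update(x, label)     # outcome (e.g. a chargeback) fed back later
print(round(model.predict([1.0, 0.0]), 2), round(model.predict([0.0, 1.0]), 2))
```

In practice the label for each transaction arrives weeks later, so the update step runs on matured outcomes rather than immediately after the prediction as in this toy loop.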
CNP Fraud is Dynamic and so is Vesta’s Technology
Vesta’s transaction decisioning platform is a highly integrated ecosystem comprising a real-time decisioning system (UFS) and a near-real-time data warehouse that constantly analyzes transactions. This system is connected to a cloud computing service where the machine learning models are trained. The trained models are then deployed to production to make real-time decisions.
Machine learning plays a key role in Vesta’s decisioning system. For our risk decisions, we collect the minimum amount of data from the end customer in real time to ensure the best possible customer experience.
Sophisticated Data Collection Strategy:
- Collect the minimum amount of data from customers to ensure the best possible customer experience
- Get the right types of data from third-party sources
- Collect the maximum amount of available data from third parties to reduce customer impact
- Collect behavioral data, device data, and social data when legally allowed
We obtain certain data from third parties to increase our predictive power. The risk team aggregates these data into a set of derived variables, and each transaction can usually be described by a few thousand variables. With all these data, data scientists train machine learning models to extract patterns that predict whether a future transaction is legitimate or fraudulent.
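To make "derived variables" concrete, the sketch below turns one raw order plus prior order history into a handful of model inputs. The field names (`email`, `card_hash`, `billing_zip`, and so on) and the 24-hour window are hypothetical choices for illustration; a real transaction may carry a few thousand such variables.

```python
from datetime import datetime, timedelta


def derive_features(order: dict, history: list[dict]) -> dict:
    """Derive model inputs from one order and the orders that preceded it.

    All field names are hypothetical examples.
    """
    window_start = order["time"] - timedelta(hours=24)
    # Prior orders from the same email address within the last 24 hours.
    same_email = [
        h for h in history
        if h["email"] == order["email"] and window_start <= h["time"] < order["time"]
    ]
    return {
        "amount": order["amount"],
        "email_orders_24h": len(same_email),
        "email_distinct_cards_24h": len({h["card_hash"] for h in same_email}),
        "billing_shipping_zip_match": int(order["billing_zip"] == order["shipping_zip"]),
        "hour_of_day": order["time"].hour,
    }


now = datetime(2024, 1, 1, 12)
history = [
    {"email": "a@example.com", "card_hash": "c1", "time": now - timedelta(hours=1)},
    {"email": "a@example.com", "card_hash": "c2", "time": now - timedelta(hours=3)},
    {"email": "a@example.com", "card_hash": "c3", "time": now - timedelta(days=2)},
    {"email": "b@example.com", "card_hash": "c4", "time": now - timedelta(hours=1)},
]
order = {"email": "a@example.com", "amount": 80.0, "billing_zip": "10001",
         "shipping_zip": "94105", "time": now}
print(derive_features(order, history))
# {'amount': 80.0, 'email_orders_24h': 2, 'email_distinct_cards_24h': 2, 'billing_shipping_zip_match': 0, 'hour_of_day': 12}
```

Two different cards used by one email address within a day, plus a billing/shipping mismatch, is the kind of signal no single raw field exposes on its own.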
We have made significant investments in scaling our behavioral data collection product. Easy-to-implement browser and native app solutions feed data into our decision platform for use in behavioral modeling, further increasing approval rates. Our data warehouse is continually growing, and our patented link analysis processes let us leverage links across all order history to identify fraudulent and non-fraudulent linkages over time (deep link analysis). Running real-time link analysis over a shorter time horizon, in-line with the transaction, lets us create critical linkage-based model features that deliver exceptional results.

Our machine learning pipeline, developed in-house, provides an automated training and model selection process that lets us retrain and create advanced models in a matter of hours. This gives us the ability to maintain client-specific models where needed to enhance overall approval rates. The pipeline supports advanced techniques such as ensemble modeling and deep learning. Unsupervised models evaluate all of our transactions for anomalous behavior and create alerts that our team of fraud analysts review for fraudulent activity. If fraud patterns are discovered, our automated pipeline allows us to retrain models with the updated intelligence within hours.
How Vesta’s Machine Learning Pipeline Works
Vesta’s machine learning pipeline provides an end-to-end solution that can:
- Handle machine learning tasks from raw data to production models
- Build and compare multiple ML models in one run: random forests, gradient-boosted trees, deep neural networks, etc.
- Grid-search for the best model
- Produce feature ranking and performance reports
- Build complex models in hours
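Grid search means exhaustively trying every combination of hyperparameter values and keeping the best-scoring one. A minimal, library-free sketch follows; the `evaluate` callback and the toy weighted-score "model" in the usage are hypothetical stand-ins for real training runs such as gradient-boosted trees.

```python
from itertools import product


def grid_search(evaluate, param_grid: dict) -> tuple[dict, float]:
    """Try every combination in param_grid and keep the best-scoring one.

    `evaluate` trains/validates a model for one parameter set and returns a
    validation score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy validation set: ((feature_a, feature_b), label), 1 = fraud.
validation = [((0.9, 0.8), 1), ((0.2, 0.9), 0), ((0.8, 0.1), 1), ((0.1, 0.2), 0)]


def evaluate(params: dict) -> float:
    """Accuracy of a hypothetical model: flag if weighted score >= threshold."""
    w, t = params["weight"], params["threshold"]
    correct = sum(int((w * a + (1 - w) * b >= t) == bool(y)) for (a, b), y in validation)
    return correct / len(validation)


grid = {"weight": [0.0, 0.5, 1.0], "threshold": [0.3, 0.5, 0.7]}
print(grid_search(evaluate, grid))  # ({'weight': 1.0, 'threshold': 0.3}, 1.0)
```

Only a strictly better score replaces the current best, so ties keep the first combination found; the grid grows multiplicatively with each added hyperparameter, which is why automating it matters.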
AutoML: Machine Learning Pipeline Automation
Before this pipeline existed, training a machine learning model required data scientists to go through a complex, multi-step process:
- Import data from the data warehouse
- Perform variable analysis to examine distributions
- Transform the data into a format suitable for the proposed algorithm
- Select only the subset of variables that are most useful, to reduce the computational burden
- Train multiple models by tuning the hyperparameters
- Evaluate and compare the models to choose the best one
- Deploy the model to production
Data scientists used to go through this process manually, repeating it many times to train a single model.
Vesta built an ML pipeline platform to standardize the model training process and improve its efficiency. The pipeline contains each of these steps as a building block. Data scientists build models by modifying a configuration file that puts these building blocks together and customizes them. Based on the configuration file, the platform builds a production model from raw data and automatically generates a model performance report. The platform supports algorithms ranging from classic linear models to advanced machine learning models such as gradient-boosted trees, deep neural networks, and random forests. It also supports advanced techniques based on our research, such as ensemble modeling, which combines multiple models to reach the best results.
With the ML pipeline (MLP), data scientists can build complex non-linear models in hours, compared with days or weeks in the past.
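The configuration-driven pattern can be sketched as a registry of step functions plus a config that lists which steps to run, in order, with their parameters. This assumes a JSON config; the step names, config schema, and toy threshold "model" are hypothetical and far simpler than real pipeline steps.

```python
import json

# Registry of pipeline building blocks; all step names are hypothetical.
STEPS = {}


def step(name):
    """Decorator that registers a function as a named building block."""
    def register(fn):
        STEPS[name] = fn
        return fn
    return register


@step("load")
def load(ctx, rows):
    # Stand-in for "import data from the data warehouse".
    ctx["rows"] = rows
    return ctx


@step("select_features")
def select_features(ctx, keep):
    ctx["rows"] = [{k: r[k] for k in keep + ["label"]} for r in ctx["rows"]]
    ctx["features"] = keep
    return ctx


@step("train_threshold_model")
def train_threshold_model(ctx, feature):
    # Toy model: flag fraud above the midpoint of the two class means.
    fraud = [r[feature] for r in ctx["rows"] if r["label"] == 1]
    legit = [r[feature] for r in ctx["rows"] if r["label"] == 0]
    midpoint = (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2
    ctx["model"] = {"feature": feature, "threshold": midpoint}
    return ctx


def run_pipeline(config_text: str) -> dict:
    """Run the steps listed in a JSON config, in order, passing a shared context."""
    ctx = {}
    for spec in json.loads(config_text)["steps"]:
        ctx = STEPS[spec["name"]](ctx, **spec.get("params", {}))
    return ctx


CONFIG = """
{"steps": [
  {"name": "load", "params": {"rows": [
    {"amount": 900, "velocity": 6, "label": 1},
    {"amount": 40, "velocity": 1, "label": 0},
    {"amount": 700, "velocity": 4, "label": 1},
    {"amount": 60, "velocity": 2, "label": 0}]}},
  {"name": "select_features", "params": {"keep": ["amount"]}},
  {"name": "train_threshold_model", "params": {"feature": "amount"}}
]}
"""

ctx = run_pipeline(CONFIG)
print(ctx["model"])  # {'feature': 'amount', 'threshold': 425.0}
```

Changing the model means editing the config (for example, selecting `velocity` instead of `amount`) rather than rewriting code, which is the efficiency gain the paragraph describes.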
ML Pipeline Reporting
Vesta’s machine learning pipeline reports provide insights into the data that guide the creation and refinement of ML models. Data scientists can access these reports to further refine models by adjusting variable thresholds.
Below are a few examples of the charts and tables generated by the automated ML pipeline reporting process:
- Feature ranking reports provide an indication of the importance of each variable
- Model performance curves compare the performance of multiple models
- Grid search comparisons give metrics of top variables
- Training-test score distributions show how the various models score the training set versus the test set.
- Score distributions provide metrics for different score buckets, allowing Vesta data scientists to choose a cutoff threshold.
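A score-distribution report and cutoff choice can be sketched as follows: bucket the fraud scores, report volume and fraud rate per bucket, then pick the lowest bucket edge whose decline volume stays within a budget. The ten equal-width buckets and the decline-rate budget are assumed criteria; the source does not say how Vesta's data scientists actually choose the cutoff.

```python
def score_buckets(scores, labels, n_buckets=10):
    """Group 0-1 fraud scores into equal-width buckets with volume and fraud rate."""
    buckets = [{"low": i / n_buckets, "count": 0, "fraud": 0} for i in range(n_buckets)]
    for score, label in zip(scores, labels):
        i = min(int(score * n_buckets), n_buckets - 1)  # a score of 1.0 joins the top bucket
        buckets[i]["count"] += 1
        buckets[i]["fraud"] += label
    for b in buckets:
        b["fraud_rate"] = b["fraud"] / b["count"] if b["count"] else 0.0
    return buckets


def choose_cutoff(buckets, max_decline_rate):
    """Lowest bucket edge whose decline volume stays within the budget.

    Orders scoring at or above the returned cutoff would be declined.
    """
    total = sum(b["count"] for b in buckets)
    declined = 0
    cutoff = 1.0  # default when even the top bucket exceeds the budget
    for b in reversed(buckets):
        declined += b["count"]
        if declined / total > max_decline_rate:
            break
        cutoff = b["low"]
    return cutoff


scores = [0.05, 0.12, 0.18, 0.33, 0.41, 0.55, 0.62, 0.78, 0.91, 0.97]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
buckets = score_buckets(scores, labels)
for b in buckets:
    if b["count"]:
        print(f'{b["low"]:.1f}  count={b["count"]}  fraud_rate={b["fraud_rate"]:.2f}')
print(choose_cutoff(buckets, max_decline_rate=0.35))  # 0.7
```

A decline budget is one way to cap false positives; other criteria, such as a minimum fraud rate per declined bucket, would plug into the same report.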
Model Auto-Refresh System (MARS)
Our experiments showed that models need to be retrained frequently to capture new risk patterns. This is especially true in fraud risk, where bad actors try to work around existing models once they are blocked, so retraining models to capture new fraud patterns is critical. However, the manual effort required to maintain and retrain multiple production models on an ongoing basis was a huge challenge.
MARS provides numerous advantages over a manual approach:
- Full automation: production risk models are refreshed automatically on a calendar schedule or based on preset criteria
- High scalability: able to handle hundreds of models
- A manageable scheduler and job monitor
- An interactive dashboard for model version control
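The two refresh triggers named above (calendar-based and preset criteria) can be sketched as a scheduler check that scans every production model. The 30-day maximum age, the AUC-drop criterion, and the `ProductionModel` fields are hypothetical; MARS's actual criteria are not described in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProductionModel:
    # Hypothetical metadata a refresh scheduler might track per model.
    name: str
    trained_at: datetime
    baseline_auc: float  # measured when the model was deployed
    recent_auc: float    # measured on the latest labeled data


def refresh_reason(model, now, max_age=timedelta(days=30), max_auc_drop=0.02):
    """Return why a model should be retrained, or None if it can stay."""
    if now - model.trained_at >= max_age:
        return "calendar"           # calendar-based refresh
    if model.baseline_auc - model.recent_auc > max_auc_drop:
        return "performance_drop"   # preset-criteria refresh
    return None


def plan_refreshes(models, now):
    """Scan every production model and list those due for retraining."""
    plan = []
    for model in models:
        reason = refresh_reason(model, now)
        if reason is not None:
            plan.append((model.name, reason))
    return plan


now = datetime(2024, 6, 1)
models = [
    ProductionModel("retail_us", now - timedelta(days=45), 0.91, 0.90),
    ProductionModel("travel_eu", now - timedelta(days=5), 0.93, 0.88),
    ProductionModel("telecom", now - timedelta(days=5), 0.90, 0.895),
]
print(plan_refreshes(models, now))  # [('retail_us', 'calendar'), ('travel_eu', 'performance_drop')]
```

Because each check is cheap, the same scan scales to hundreds of models; the expensive part, retraining, is only queued for the models it flags.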
How to Get Prepared to Address Key Fraud Challenges
The measure of an effective fraud system lies in its ability to address emerging and complex challenges. The five most common challenges faced include:
1. Diverse Behavior
- Different vertical markets have unique fraud patterns
- Data attributes carry different weight and significance in different verticals:
  - Shipping address for ecommerce
  - Receiving email for money transfer
  - Origin and destination, but not shipping address, for airline travel
- Regional fraud patterns add another layer of diversity
- Issuing banks differ in their verification and authorization capabilities
- Long-term and short-term events introduce additional fraud patterns:
  - Seasonal patterns such as Black Friday
  - Quarterly sales, holiday sales, etc.
The solution to diverse behavior is to use multiple models combined with stacking techniques. However, this can place a significant burden on ML infrastructure and fraud system performance, so an automated ML system is required to train, deploy, and maintain such a high volume of models.
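Stacking feeds the outputs of several base models into a second-level (meta) model that learns how much to trust each one. In this sketch the two "vertical" base models are hypothetical fixed rules and the meta-model is a logistic regression fit by gradient descent. In real stacking the meta-model is fit on held-out (out-of-fold) base-model predictions; that step is skipped here because the toy base models are not trained.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical base models, each tuned to a signal that matters in one vertical.
def ecommerce_model(order: dict) -> float:
    return 0.9 if order["ship_bill_mismatch"] else 0.1


def travel_model(order: dict) -> float:
    return 0.8 if order["one_way_same_day"] else 0.2


BASE_MODELS = [ecommerce_model, travel_model]


def base_scores(order: dict) -> list[float]:
    return [model(order) for model in BASE_MODELS]


def fit_meta(orders, labels, epochs=500, lr=0.5):
    """Fit the stacking layer: logistic regression over base-model scores."""
    w = [0.0] * len(BASE_MODELS)
    b = 0.0
    for _ in range(epochs):
        for order, y in zip(orders, labels):
            x = base_scores(order)
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            b -= lr * err
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b


def stacked_score(order: dict, w: list[float], b: float) -> float:
    x = base_scores(order)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


orders = [
    {"ship_bill_mismatch": True, "one_way_same_day": False},
    {"ship_bill_mismatch": True, "one_way_same_day": True},
    {"ship_bill_mismatch": False, "one_way_same_day": True},
    {"ship_bill_mismatch": False, "one_way_same_day": False},
]
labels = [1, 1, 0, 0]
weights, bias = fit_meta(orders, labels)
print([round(wi, 2) for wi in weights], round(bias, 2))
```

In this toy data only the shipping signal separates fraud from legitimate orders, so the meta-model learns a much larger weight for the first base model than for the second.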
2. Hidden Behavior
- Stolen data, stolen cards, fake entities, stolen IDs, stolen phone numbers, and changing IP addresses are all forms of hidden behavior used by criminals to mask their identity.
- As long as one attribute remains the same, it can be used to link all the other attributes and uncover the hidden behavior.
- The technique used to address this threat is graph link analysis.
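The linking principle ("as long as one attribute remains the same") can be sketched with a union-find (disjoint-set) structure that merges transactions sharing any attribute value, directly or transitively. This illustrates the idea only; it is not Vesta's patented link analysis, and the attribute names are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_transactions(transactions, keys=("email", "phone", "device", "card")):
    """Group transactions that share any attribute value, directly or transitively."""
    uf = UnionFind()
    first_seen = {}  # (attribute, value) -> first transaction id carrying it
    for txn in transactions:
        uf.find(txn["id"])  # register the transaction even if it links to nothing
        for key in keys:
            value = txn.get(key)
            if value is None:
                continue
            node = (key, value)
            if node in first_seen:
                uf.union(first_seen[node], txn["id"])
            else:
                first_seen[node] = txn["id"]
    groups = defaultdict(set)
    for txn in transactions:
        groups[uf.find(txn["id"])].add(txn["id"])
    return list(groups.values())


txns = [
    {"id": "t1", "email": "a@example.com", "device": "d1"},
    {"id": "t2", "email": "b@example.com", "device": "d1"},      # new email, same device
    {"id": "t3", "email": "b@example.com", "phone": "555-0101"},  # shares t2's email
    {"id": "t4", "email": "c@example.com", "device": "d9"},
]
print(sorted(sorted(g) for g in link_transactions(txns)))  # [['t1', 't2', 't3'], ['t4']]
```

Transactions t1 and t3 share no attribute directly, yet they land in the same group through t2, which is how a fraudster rotating emails and phones can still be tied together.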
3. Evolving Patterns
- Fraudsters attempt to trick the system by changing their behavior to mimic safe behavior
- A combination of unsupervised and adaptive models is effective in combating this threat:
  - Unsupervised Anomaly Detection
    Identifies unseen patterns and notifies a fraud analyst to update the ML model. This process is much faster than waiting for chargeback data, which can often take 60-90 days or more to arrive. With this approach, models can be adjusted in real time.
  - Adaptive ML
    Adaptive ML continuously refreshes production models based on new and emerging data. New candidate models are developed and then tested against the production models. If a candidate model proves more effective, it is deployed to production.
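Unsupervised anomaly detection needs no fraud labels: it flags values that deviate sharply from the recent baseline and hands them to an analyst. A minimal sketch using robust z-scores (median and median absolute deviation) follows. This is a swapped-in technique for illustration; the source does not name its anomaly detection algorithms, and the hourly-orders metric and the 3.5 threshold are assumptions.

```python
from statistics import median


def robust_z_scores(values: list[float]) -> list[float]:
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # MAD is 0 when most values equal the median; this sketch then
        # reports no deviations rather than dividing by zero.
        return [0.0 for _ in values]
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]


def anomaly_alerts(series: list[float], threshold: float = 3.5) -> list[tuple[int, float]]:
    """Return (index, value) pairs an analyst should review. No labels needed."""
    scores = robust_z_scores(series)
    return [(i, v) for i, (v, z) in enumerate(zip(series, scores)) if abs(z) > threshold]


# Hypothetical hourly order counts for one card BIN.
hourly_orders = [20, 22, 19, 21, 23, 20, 18, 95, 21, 22]
print(anomaly_alerts(hourly_orders))  # [(7, 95)]
```

The median and MAD are used instead of the mean and standard deviation so that the spike itself does not inflate the baseline it is measured against.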
4. Fraud Feedback Delay
Chargeback data can often take 60 days or more to reach the fraud system. During that period, models degrade and false positives can persist, leading to a poor customer experience and lost revenue.
A multi-layered approach is required to update models quickly. An effective feedback loop will leverage graph link analysis in concert with unsupervised machine learning and review by fraud analysts.
5. Vesta’s Data is Best in Class
Poor data quality can cause any model to fail. Vesta is in a unique position. We have built an extensive set of consortium data that includes more than 200M unique data assets assembled over 20+ years in business. Our data includes:
- Transactional and behavioral data.
- Updated and stored in real-time, providing immediate insight to our ML pipeline and decision models.
- Our collection and storage methods are fully compliant with all card network and country specific data security and privacy regulations.
The Best Defense is a Good Offense:
Fraud is one of the greatest threats businesses face today. Find out how AI/ML can position your business for stronger revenue growth.
Explore our Free Guide to start taking charge of CNP Fraud today!