Insurance Fraud Claims Detection
Arul Kumar ARK
225229103
I MSc Data Science
Bishop Heber College (Autonomous), Trichy
INTRODUCTION
Insurance claim fraud is the illegal act of filing a false insurance
claim or exaggerating a legitimate claim for financial gain.
Fraudulent claims not only cause financial losses for insurance
companies but also drive up premiums for honest policyholders.
Insurance companies therefore invest significant resources in
detecting and preventing fraudulent claims.
There are various techniques that insurance companies can use to detect
fraud. Some of the commonly used methods include:
● Data analytics
● Machine learning
● Social media monitoring
● Investigative techniques
● Fraud detection software
Machine learning is increasingly being used for insurance fraud claims
detection. Machine learning algorithms can analyze large amounts of
data to detect patterns that indicate fraud. There are several techniques
that can be used in machine learning for insurance fraud claims
detection, including:
● Supervised learning
● Unsupervised learning
● Deep learning
● Ensemble learning
MOTIVATION:
The motivation behind fraud claims
detection is to protect insurance
companies from the financial losses
that fraudulent activity can cause.
This work uses machine learning
algorithms to detect fraudulent
claims.
Dataset description
The Insurance Fraud Claims Detection dataset is a collection of insurance claims made by
policyholders. The dataset is designed to help insurance companies detect fraudulent claims
and improve their claims processing accuracy. The dataset contains a total of 1000 instances
and 40 features, including both numerical and categorical variables.
Each instance in the dataset represents a single insurance claim, and the features describe
various aspects of the claim, such as the policyholder's age, gender, location, type of insurance,
claim amount, and other related information. The target variable in the dataset is a binary label
indicating whether the claim is fraudulent or not. About 14.4% of the claims in the dataset are
labeled as fraudulent.
Columns
'months_as_customer', 'age', 'policy_number',
'policy_bind_date', 'policy_state', 'policy_csl',
'policy_deductable', 'policy_annual_premium',
'umbrella_limit', 'insured_zip',
'insured_sex', 'insured_education_level',
'insured_occupation', 'insured_hobbies',
'insured_relationship', 'capital-gains', 'capital-loss',
'incident_date', 'incident_type', 'collision_type',
'incident_severity', 'authorities_contacted',
'incident_state', 'incident_city', 'incident_location',
'incident_hour_of_the_day',
'number_of_vehicles_involved', 'property_damage',
'bodily_injuries', 'witnesses',
'police_report_available', 'total_claim_amount',
'injury_claim', 'property_claim', 'vehicle_claim',
'auto_make', 'auto_model', 'auto_year',
'fraud_reported', '_c39'
Numerical columns with respect to the fraud report
Categorical columns with respect to the fraud report
Plot Heatmap:
Heatmap to check correlation (correlation describes how strongly two or more variables are
related to each other)
Check Outliers:
*Outliers can distort a correlation coefficient (often lowering it) and weaken the regression relationship*
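To make the heatmap and outlier slides concrete, here is a minimal, dependency-free sketch of the Pearson correlation coefficient, the quantity a correlation heatmap usually displays. The function name pearson_r and all data values are made up for illustration and are not taken from the dataset. The example shows one case where a single outlier lowers the coefficient sharply.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (not from the insurance dataset).
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 6, 8, 10]  # perfectly linear in x

# The same data plus one point that breaks the linear pattern.
x_outlier = x_clean + [6]
y_outlier = y_clean + [0]

r_clean = pearson_r(x_clean, y_clean)        # 1.0
r_outlier = pearson_r(x_outlier, y_outlier)  # 1/7, about 0.143

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

An outlier that happens to lie along the trend can also inflate the coefficient, so the direction of the effect depends on where the point falls.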
StandardScaler is used to
standardize the features of a dataset
(zero mean, unit variance)
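As a rough illustration of what standardization computes, the sketch below rescales one feature to z-scores using only the standard library. The function name standardize and the premium values are hypothetical. To my understanding StandardScaler also divides by the population standard deviation, which is what this sketch does.

```python
import math


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


# Hypothetical values standing in for a numeric column such as
# 'policy_annual_premium'; they are not taken from the dataset.
premiums = [1000.0, 1200.0, 1400.0]
z = standardize(premiums)

print([round(v, 4) for v in z])
```

After the transformation the feature has mean 0 and variance 1, so features measured on very different scales contribute comparably to distance-based models such as K-nearest neighbors.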
LabelEncoder is used for encoding
categorical variables as numerical
variables. It converts each unique
categorical value into an integer.
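The encoding step can be sketched without any library: collect the distinct values, sort them, and replace each value by its position in that sorted list. The helper label_encode and the sample values are hypothetical; sorting the classes mirrors how LabelEncoder orders them, as far as I am aware.

```python
def label_encode(values):
    """Map each distinct value to an integer code (sorted order, from 0)."""
    classes = sorted(set(values))
    index = {label: code for code, label in enumerate(classes)}
    return classes, [index[v] for v in values]


# Hypothetical categorical values, for illustration only.
collision = ["Rear", "Front", "Side", "Rear"]
classes, codes = label_encode(collision)

print(classes)  # ['Front', 'Rear', 'Side']
print(codes)    # [1, 0, 2, 1]
```

The integer codes imply an ordering (Front < Rear < Side) that has no real meaning, which tree-based models tolerate better than linear or distance-based ones.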
Split
● X: the array of feature values
● y: the array of target values
● test_size: the proportion of the
data to be used for testing (usually
between 0.2 and 0.3)
● random_state: a random seed for
reproducibility
● X_train: the array of feature values
for the training set
● X_test: the array of feature values
for the testing set
● y_train: the array of target values
for the training set
● y_test: the array of target values
for the testing set
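The parameters and return values listed above can be illustrated with a small standard-library sketch. The function train_test_split_sketch is a hypothetical stand-in, not the library routine: it shuffles row indices with a seeded generator and slices off the test portion, and its rounding of the test-set size is a simplification.

```python
import random


def train_test_split_sketch(X, y, test_size=0.2, random_state=None):
    """Shuffle row indices with a seed, then split features and targets."""
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 10 rows, 2 features each; the target is the parity of feature 0.
X = [[i, i * 10] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split_sketch(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```

Because the shuffle is seeded by random_state, calling the function again with the same seed yields the same split, which is what makes results reproducible.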
Fit And Transform
Algorithms
LogisticRegression
KNeighborsClassifier
DecisionTreeClassifier
LogisticRegression
KNeighborsClassifier
DecisionTreeClassifier
Tree
Comparison
LogisticRegression
Accuracy Score : 0.72
Mean Squared Error : 0.28
KNeighborsClassifier
Accuracy Score : 0.685
Mean Squared Error : 0.315
DecisionTreeClassifier
Accuracy Score : 0.805
Mean Squared Error : 0.19
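Each accuracy and MSE pair above sums to 1 (0.72 + 0.28, 0.685 + 0.315, and 0.805 + 0.19 up to rounding). That is what one would expect if the target is encoded as 0/1, which the slides do not state but which is assumed here: every wrong prediction then contributes a squared error of exactly 1, so MSE equals 1 minus accuracy. The sketch below checks this on made-up labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up 0/1 labels: 2 of the 10 predictions are wrong.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

acc = accuracy(y_true, y_pred)            # 0.8
mse = mean_squared_error(y_true, y_pred)  # 0.2

print(f"accuracy = {acc:.3f}, MSE = {mse:.3f}, sum = {acc + mse:.3f}")
```

Under this assumption MSE carries no information beyond accuracy. With about 14.4% of claims labeled fraudulent, a model that always predicts "not fraudulent" would already score roughly 0.856 accuracy on a similarly distributed sample, so the confusion matrices and classification reports are the more informative comparison.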
Comparison: Visualization
Confusion Matrix Comparison: Logistic Regression, K-Nearest Neighbors, Decision Tree
The model with the lowest MSE, and therefore the one
selected, is DecisionTreeClassifier.
DecisionTreeClassifier : Best estimator
*GridSearchCV*
Best Parameters :
{'criterion': 'entropy',
'max_depth': 3,
'min_samples_leaf': 1,
'min_samples_split': 3}
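A grid search tries every combination of the candidate values and keeps the best-scoring one. The slides report only the winning combination, so the grid below is purely hypothetical; it is chosen so that it contains the reported best parameters. The 5-fold cross-validation count is likewise an assumption. The sketch only enumerates the search space and does not train any model.

```python
from itertools import product

# Hypothetical search grid (the slides do not show the grid that was used).
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 2, 3],
    "min_samples_split": [2, 3, 5],
}

# Every combination of one value per parameter is a candidate model.
names = list(param_grid)
candidates = [dict(zip(names, combo)) for combo in product(*param_grid.values())]

# Best parameters as reported on the slide.
reported_best = {
    "criterion": "entropy",
    "max_depth": 3,
    "min_samples_leaf": 1,
    "min_samples_split": 3,
}

n_candidates = len(candidates)     # 2 * 3 * 3 * 3 = 54
cv_folds = 5                       # assumed number of cross-validation folds
n_fits = n_candidates * cv_folds   # 270 model fits in total

print(n_candidates, n_fits, reported_best in candidates)
```

The number of fits grows multiplicatively with each added parameter, which is why grids are usually kept small for each hyperparameter.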
Important features
DecisionTreeClassifier : Important features
Classification Report: DTC vs. DTC (important features) vs. DTC (best estimator)
Confusion Matrix Comparison: DTC vs. DTC (important features) vs. DTC (best estimator)
Function : plot_confusion_matrix
The confusion matrix is a table that is used to evaluate the performance of a classification model by comparing
the predicted labels of the model with the true labels. The confusion matrix shows the number of true positives
(TP), true negatives (TN), false positives (FP), and false negatives (FN) that the model has produced.
The plot_confusion_matrix function takes a trained classifier and a set of test data as inputs and plots a
colored matrix that represents the values in the confusion matrix. The rows of the matrix represent the true
labels, while the columns represent the predicted labels. The diagonal of the matrix represents the correct
predictions, while the off-diagonal elements represent the incorrect predictions. The color of each cell
represents the number of instances that have been classified in that category.
The plot_confusion_matrix function can help in understanding the performance of a classifier by visualizing
how well the model is predicting each class. It can also be used to compare the performance of different
classifiers or different hyperparameters of the same classifier.
Overall, plot_confusion_matrix is a useful tool in the evaluation and comparison of classification models, as it
provides an intuitive way to visualize and understand the performance of the models.
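The counts behind such a plot can be computed in a few lines. The sketch below uses made-up labels and a hypothetical helper confusion_counts; it follows the layout described above, with rows for true labels and columns for predicted labels. If the slides refer to scikit-learn's plot_confusion_matrix helper, note that newer scikit-learn releases have removed it in favour of ConfusionMatrixDisplay; this sketch avoids plotting altogether.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns predictions."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif t != positive and p == positive:
            fp += 1
        else:
            fn += 1
    return [[tn, fp], [fn, tp]]


# Made-up labels: 1 = fraudulent, 0 = not fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred)
(tn, fp), (fn, tp) = matrix

precision = tp / (tp + fp)  # share of flagged claims that are truly fraudulent
recall = tp / (tp + fn)     # share of fraudulent claims that were flagged

print(matrix)  # [[5, 1], [1, 3]]
print(precision, recall)  # 0.75 0.75
```

For fraud detection the bottom-left cell (false negatives, fraudulent claims that were missed) is often the costliest one to watch.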
Receiver Operating Characteristic (ROC): DTC vs. DTC (important features) vs. DTC (best estimator)
When comparing ROC curves, we are typically interested in determining which model performs better at
distinguishing between the positive and negative cases. The ROC curve can help us to visualize this comparison
by showing the trade-off between true positive rate (TPR) and false positive rate (FPR) for each model.
In general, a better model will have an ROC curve that is closer to the top-left corner of the plot, which
corresponds to higher TPR and lower FPR. Conversely, a worse model will have an ROC curve that is closer to the
diagonal line, which corresponds to random guessing.
Another way to compare ROC curves is to calculate the area under the curve (AUC) for each model. The AUC is a
metric that summarizes the overall performance of the model, with a perfect classifier having an AUC of 1 and a
random classifier having an AUC of 0.5.
If the AUC values of two models are compared, the model with the higher AUC is considered to be a better model.
This is because the AUC provides a single value that summarizes the overall performance of the model across all
possible classification thresholds.
In summary, when comparing ROC curves, we can visually compare the trade-off between TPR and FPR for each
model, and we can also compare the AUC values to determine which model has better overall performance.
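The ROC construction described above can be sketched with the standard library alone: sort the examples by predicted score, lower the threshold one distinct score at a time, record (FPR, TPR) after each step, and integrate with the trapezoid rule. The function names roc_points and auc and the scores are made up for illustration.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    prev_score = None
    for score, label in pairs:
        # Emit a point only when the score changes, so ties move together.
        if prev_score is not None and score != prev_score:
            points.append((fp / n_neg, tp / n_pos))
        if label == 1:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels (1 = fraudulent) and predicted fraud scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)  # 8/9, about 0.889

print(curve)
print(round(area, 3))
```

The area equals the fraction of (fraudulent, non-fraudulent) pairs in which the fraudulent claim receives the higher score: here 8 of the 9 possible pairs.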
CONCLUSION
Detecting fraudulent insurance claims is an important application of
supervised learning in the insurance industry. It helps insurers to identify
and prevent fraudulent activities by predicting whether a given insurance claim is
fraudulent or not. By reducing their financial losses, insurers can offer competitive
premiums to their customers and improve customer satisfaction. Moreover,
detecting fraudulent activities can also help insurers to maintain their reputation in
the market by preventing negative publicity due to fraudulent claims. Therefore, the
use of Machine Learning in Insurance Fraud Claims Detection is beneficial for both
insurers and policyholders alike.

Detect Insurance Fraud with ML

  • 1. Insurance Fraud Claims Detection Arul Kumar ARK 225229103 I MSc Data Science Bishop Heber College (Autonomous), Trichy
  • 2. INTRODUCTION Insurance fraud claims refer to the illegal act of filing a false insurance claim or exaggerating a legitimate claim for financial gain. Fraudulent insurance claims not only result in financial losses for the insurance companies but also drive up the premiums for honest policyholders. Therefore, insurance companies invest significant resources in detecting and preventing insurance fraud claims.
  • 3. There are various techniques that insurance companies can use to detect fraud. Some of the commonly used methods include: ● Data analytics ● Machine learning ● Social media monitoring ● Investigative techniques ● Fraud detection software
  • 4. Machine learning is increasingly being used for insurance fraud claims detection. Machine learning algorithms can analyze large amounts of data to detect patterns that indicate fraud. There are several techniques that can be used in machine learning for insurance fraud claims detection, including: ● Supervised learning ● Unsupervised learning ● Deep learning ● Ensemble learning
  • 5. MOTIVATION: The motivation behind fraud claims detection is to protect insurance companies from financial losses that can result from fraudulent activities. This project uses machine learning algorithms to detect fraudulent claims.
  • 10. Dataset description The Insurance Fraud Claims Detection dataset is a collection of insurance claims made by policyholders. The dataset is designed to help insurance companies detect fraudulent claims and improve their claims processing accuracy. The dataset contains a total of 1000 instances and 40 features, including both numerical and categorical variables. Each instance in the dataset represents a single insurance claim, and the features describe various aspects of the claim, such as the policyholder's age, gender, location, type of insurance, claim amount, and other related information. The target variable in the dataset is a binary label indicating whether the claim is fraudulent or not. About 14.4% of the claims in the dataset are labeled as fraudulent.
  • 11. Columns: 'months_as_customer', 'age', 'policy_number', 'policy_bind_date', 'policy_state', 'policy_csl', 'policy_deductable', 'policy_annual_premium', 'umbrella_limit', 'insured_zip', 'insured_sex', 'insured_education_level', 'insured_occupation', 'insured_hobbies', 'insured_relationship', 'capital-gains', 'capital-loss', 'incident_date', 'incident_type', 'collision_type', 'incident_severity', 'authorities_contacted', 'incident_state', 'incident_city', 'incident_location', 'incident_hour_of_the_day', 'number_of_vehicles_involved', 'property_damage', 'bodily_injuries', 'witnesses', 'police_report_available', 'total_claim_amount', 'injury_claim', 'property_claim', 'vehicle_claim', 'auto_make', 'auto_model', 'auto_year', 'fraud_reported', '_c39'
  • 12. Numerical columns in relation to the fraud report
  • 13. Categorical columns in relation to the fraud report
  • 14. Plot Heatmap: a heatmap to check correlation (correlation describes how two or more variables are related to each other)
  • 15. Check Outliers: an outlier can distort the correlation coefficient, often lowering it, and weaken the regression relationship
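
To make the outlier claim concrete, here is a minimal sketch using only the standard library. The data is a hypothetical toy series (not taken from the dataset), and `pearson` is a helper written for this illustration. It shows a perfect linear relationship losing correlation once a single off-trend point is added.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: y is exactly 2 * x, so the correlation is perfect.
x_clean = list(range(1, 11))
y_clean = [2 * x for x in x_clean]
r_clean = pearson(x_clean, y_clean)

# Append a single point that lies far from the trend.
x_outlier = x_clean + [11]
y_outlier = y_clean + [0]
r_outlier = pearson(x_outlier, y_outlier)

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

An outlier that happens to lie along the trend can also inflate the coefficient, so the direction of the effect depends on where the point falls.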
  • 16. StandardScaler is used to standardize the features of a dataset. LabelEncoder is used to encode categorical variables as numerical variables: it converts each unique categorical value into a numerical value. Split ● X: the array of feature values ● y: the array of target values ● test_size: the proportion of the data to be used for testing (usually between 0.2 and 0.3) ● random_state: a random seed for reproducibility ● X_train: the array of feature values for the training set ● X_test: the array of feature values for the testing set ● y_train: the array of target values for the training set ● y_test: the array of target values for the testing set. Fit and Transform
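
The preprocessing steps on this slide could be sketched roughly as follows with scikit-learn. The rows are hypothetical toy values, only two of the dataset's columns are used, and fitting the scaler on the training split alone is an assumption of this sketch, not necessarily what the original notebook did.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy rows standing in for the real dataset.
insured_sex = np.array(["MALE", "FEMALE", "FEMALE", "MALE",
                        "MALE", "FEMALE", "MALE", "FEMALE"])
total_claim_amount = np.array([5000.0, 72000.0, 34000.0, 6500.0,
                               63000.0, 4800.0, 51000.0, 70000.0])
fraud_reported = np.array(["N", "Y", "N", "N", "Y", "N", "N", "Y"])

# LabelEncoder maps each distinct category to an integer (in sorted order).
sex_encoded = LabelEncoder().fit_transform(insured_sex)
y = LabelEncoder().fit_transform(fraud_reported)  # "N" -> 0, "Y" -> 1

X = np.column_stack([sex_encoded, total_claim_amount])

# Hold out 25% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the scaler on the training split, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Reusing the training-set statistics for the test split keeps information about the test rows out of the fitted scaler.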
  • 21. Tree
  • 22. Comparison: LogisticRegression (Accuracy Score: 0.72, Mean Squared Error: 0.28); KNeighborsClassifier (Accuracy Score: 0.685, Mean Squared Error: 0.315); DecisionTreeClassifier (Accuracy Score: 0.805, Mean Squared Error: 0.19)
  • 24. Confusion Matrix Comparison: Logistic Regression, K-Nearest Neighbors, Decision Tree
  • 25. The best model, selected for having the lowest MSE, is DecisionTreeClassifier.
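
A note on reading these two metrics together: when the labels and predictions are both 0/1, each squared error is 1 for a misclassified row and 0 otherwise, so the mean squared error equals the error rate (1 minus accuracy). The sketch below checks this on hypothetical toy predictions. The helper functions are written for this illustration and are not the scikit-learn versions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 0/1 labels and predictions (two of the ten are wrong).
y_true = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0, 0, 1]

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"accuracy = {acc}, mse = {mse}, sum = {acc + mse}")
```

This is why each model's reported accuracy and MSE add up to roughly 1, and why ranking the models by lowest MSE gives the same order as ranking them by highest accuracy.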
  • 26. DecisionTreeClassifier : Best estimator *GridSearchCV* Best Parameters : {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 3}
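
A hyperparameter search of this kind might look like the sketch below. The synthetic data from `make_classification` and the candidate values in `param_grid` are assumptions made for illustration. Only the four parameter names come from the slide, so the best parameters printed here will not necessarily match the reported ones.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real project would use the encoded claims.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical search grid over the four parameters named on the slide.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7],
    "min_samples_leaf": [1, 2, 3],
    "min_samples_split": [2, 3],
}

# Evaluate every combination with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
best_tree = search.best_estimator_  # refitted on all of X, y
```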
  • 27. DecisionTreeClassifier : Best estimator *GridSearchCV*
  • 30. Classification Report: DTC vs. DTC (important features) vs. DTC (best estimator)
  • 31. Confusion Matrix Comparison: DTC vs. DTC (important features) vs. DTC (best estimator)
  • 32. Function : plot_confusion_matrix The confusion matrix is a table that is used to evaluate the performance of a classification model by comparing the predicted labels of the model with the true labels. The confusion matrix shows the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) that the model has produced. The plot_confusion_matrix function takes a trained classifier and a set of test data as inputs and plots a colored matrix that represents the values in the confusion matrix. The rows of the matrix represent the true labels, while the columns represent the predicted labels. The diagonal of the matrix represents the correct predictions, while the off-diagonal elements represent the incorrect predictions. The color of each cell represents the number of instances that have been classified in that category. The plot_confusion_matrix function can help in understanding the performance of a classifier by visualizing how well the model is predicting each class. It can also be used to compare the performance of different classifiers or different hyperparameters of the same classifier. Overall, plot_confusion_matrix is a useful tool in the evaluation and comparison of classification models, as it provides an intuitive way to visualize and understand the performance of the models.
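
The four counts described above can be computed by hand, as in this minimal sketch on hypothetical labels. It uses the same layout as the slide: rows are true labels and columns are predicted labels. Note that in scikit-learn 1.2 and later `plot_confusion_matrix` is no longer available, and `ConfusionMatrixDisplay.from_estimator` fills the same role.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix

# Hypothetical labels: 1 = fraudulent claim, 0 = legitimate claim.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)  # [[3, 1], [1, 3]]
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

The diagonal entries (TN and TP) are the correct predictions. For fraud detection the FN cell, fraudulent claims predicted as legitimate, is usually the costliest one to watch.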
  • 33. ROC: DTC vs. DTC (important features) vs. DTC (best estimator)
  • 34. Receiver Operating Characteristic (ROC) When comparing ROC curves, we are typically interested in determining which model performs better at distinguishing between the positive and negative cases. The ROC curve can help us to visualize this comparison by showing the trade-off between true positive rate (TPR) and false positive rate (FPR) for each model. In general, a better model will have an ROC curve that is closer to the top-left corner of the plot, which corresponds to higher TPR and lower FPR. Conversely, a worse model will have an ROC curve that is closer to the diagonal line, which corresponds to random guessing. Another way to compare ROC curves is to calculate the area under the curve (AUC) for each model. The AUC is a metric that summarizes the overall performance of the model, with a perfect classifier having an AUC of 1 and a random classifier having an AUC of 0.5. If the AUC values of two models are compared, the model with the higher AUC is considered to be a better model. This is because the AUC provides a single value that summarizes the overall performance of the model across all possible classification thresholds. In summary, when comparing ROC curves, we can visually compare the trade-off between TPR and FPR for each model, and we can also compare the AUC values to determine which model has better overall performance.
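
As an illustration of the AUC values mentioned above, the sketch below uses an equivalent pairwise definition: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function `auc_pairwise` and the scores are hypothetical and written for this example. In practice `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical predicted fraud probabilities for four claims.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_pairwise(y_true, scores)
print(auc)  # 0.75

# A model that scores every positive above every negative reaches 1.0.
perfect = auc_pairwise([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
print(perfect)  # 1.0
```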
  • 35. CONCLUSION Insurance Fraud Claims Detection in Machine Learning is a crucial application of supervised learning algorithms in the insurance industry. It helps insurers to identify and prevent fraudulent activities by predicting whether a given insurance claim is fraudulent or not. By reducing their financial losses, insurers can offer competitive premiums to their customers and improve customer satisfaction. Moreover, detecting fraudulent activities can also help insurers to maintain their reputation in the market by preventing negative publicity due to fraudulent claims. Therefore, the use of Machine Learning in Insurance Fraud Claims Detection is beneficial for both insurers and policyholders alike.