Data Mining.pptx

• Cleaned demographic data from the US Census Bureau covering 263.4 million residents in 1994, including handling outliers
• Built supervised learning models (Naive Bayes, logit, decision tree, and random forest) for a multinational banking enterprise to predict whether a family's annual income exceeds $50,000
• Determined from confusion matrices and ROC curves that the random forest had the best accuracy (~93%), precision, and sensitivity; its predictions were used in a $25 million direct marketing campaign
• Deployed K-means, KNN, and neural networks to identify individuals who are more likely to default on loans in the future, and compared the models using ROC and accuracy statistics

PREDICTING LOAN DEFAULT FROM A BANK
Cooper Clark, Rayten Tiano, Yash Guptaa

EXECUTIVE SUMMARY
• Our banking enterprise asked us to analyze 30,000 customers to determine whether we can predict an important outcome: loan default
• The data comes from this research: Yeh, I. C., & Lien, C. H. (2009). Expert Systems with Applications, 36(2), 2473–2480
• Our analysis produced models (a simple KNN, K-means-segmented models, and an artificial neural network) with overall accuracies above 80%; the best, the ANN, reached 81.72%

SLICE & DICE
• Total of 30,000 customers, the majority of them female
• 11,888 male customers, of whom 2,873 have defaulted (24.17%)
• 18,112 female customers, of whom 3,763 have defaulted (20.78%)
Table A: Customers by marital status
Marital status   Customers
Married          13,659
Single           15,964
Other            323

Table B: Demographic distribution of sex (30,000 customers)
Sex       Defaults   No defaults   Total
Male      2,873      9,015         11,888
Female    3,763      14,349        18,112
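The per-sex default rates follow directly from the counts in Table B. The short Python check below recomputes them to two decimals and also derives the overall default rate, which the slides do not state explicitly; the dictionary layout and the `default_rate` helper are illustrative choices, not part of the original analysis.

```python
# Counts from Table B: (defaults, total customers) by sex.
counts = {
    "Male": (2873, 11888),
    "Female": (3763, 18112),
}


def default_rate(defaults: int, total: int) -> float:
    """Share of customers who defaulted, as a percentage."""
    return 100 * defaults / total


for sex, (defaults, total) in counts.items():
    print(f"{sex}: {defaults}/{total} = {default_rate(defaults, total):.2f}% defaulted")

# Overall rate across both sexes.
all_defaults = sum(d for d, _ in counts.values())
all_customers = sum(t for _, t in counts.values())
print(f"Overall: {all_defaults}/{all_customers} = "
      f"{default_rate(all_defaults, all_customers):.2f}% defaulted")
```

Roughly one customer in five defaults, so a model that always predicts "no default" is already right about 78% of the time; that is useful context for the accuracies reported on the model slides.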

VISUALIZATIONS
Chart 1: BILL_AMT1 distribution for males | Peak: 4,743
Chart 2: BILL_AMT1 distribution for females | Peak: 7,958
Chart 3: Age against defaults
• BILL_AMT1 is positively skewed for both sexes
• Defaults peak at the mean age of 61.6
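A positively (right-) skewed distribution has a long tail of large values that pulls the mean above the bulk of the data. The minimal sketch below computes the Fisher-Pearson skewness coefficient g1 = m3 / m2^1.5. It uses hypothetical bill amounts because the BILL_AMT1 column is not reproduced here. A positive result indicates right skew. With pandas, `Series.skew()` reports a bias-adjusted variant of the same statistic.

```python
import statistics


def sample_skewness(values: list[float]) -> float:
    """Fisher-Pearson coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical bill amounts (NOT the real BILL_AMT1 data): most are small, a few are very large.
bills = [500, 800, 1200, 1500, 2000, 2500, 3000, 45000, 90000]
print(round(sample_skewness(bills), 2))
```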

SIMPLE KNN (K=35)
Confusion matrix (test set, 9,000 rows):
                  No (predicted)   Yes (predicted)
No (actual)       6,751            280
Yes (actual)      1,420            549
Class statistics:
Accuracy                  81.11 %
Misclassification rate    18.89 %
True positive rate        0.279
False negative rate       0.721
False positive rate       0.040
Specificity               0.960
Precision                 0.662
Prevalence                0.219
Data is partitioned in a 70/30 (training/test) split for model building
Graph: ROC curve for the simple KNN model (k = 35), AUC = 0.7427
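Every class statistic on these slides is a ratio of the four confusion-matrix cells. Note that 1 − TPR gives the false negative (miss) rate, FN / (FN + TP), and that the false positive rate is FP / (FP + TN) = 1 − specificity. The minimal sketch below computes all of them from the KNN counts; the `ConfusionMatrix` class is an illustrative helper, not part of the original workflow.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tn: int  # actual No, predicted No
    fp: int  # actual No, predicted Yes
    fn: int  # actual Yes, predicted No
    tp: int  # actual Yes, predicted Yes

    def stats(self) -> dict[str, float]:
        total = self.tn + self.fp + self.fn + self.tp
        actual_yes = self.tp + self.fn
        actual_no = self.tn + self.fp
        return {
            "accuracy": (self.tp + self.tn) / total,
            "misclassification_rate": (self.fp + self.fn) / total,
            "true_positive_rate": self.tp / actual_yes,   # sensitivity / recall
            "false_negative_rate": self.fn / actual_yes,  # = 1 - TPR (miss rate)
            "false_positive_rate": self.fp / actual_no,   # = 1 - specificity
            "specificity": self.tn / actual_no,
            "precision": self.tp / (self.tp + self.fp),
            "prevalence": actual_yes / total,
        }


# Simple KNN (k = 35) test-set counts from the slide.
knn = ConfusionMatrix(tn=6751, fp=280, fn=1420, tp=549)
for name, value in knn.stats().items():
    print(f"{name:>24}: {value:.3f}")
```

The same class reproduces the statistics on the K-means and ANN slides when you pass in their counts.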

K-MEANS (2 CLUSTERS, K = 27)
Age <= 37 (AUC: 0.74)
                  No (predicted)   Yes (predicted)
No (actual)       4,279            210
Yes (actual)      869              329
Accuracy                  81.03 %
Misclassification rate    18.97 %
True positive rate        0.275
False negative rate       0.725
False positive rate       0.047
Specificity               0.953
Precision                 0.610
Prevalence                0.211

Age > 37 (AUC: 0.73)
                  No (predicted)   Yes (predicted)
No (actual)       2,454            66
Yes (actual)      579              214
Accuracy                  80.53 %
Misclassification rate    19.47 %
True positive rate        0.270
False negative rate       0.730
False positive rate       0.026
Specificity               0.974
Precision                 0.764
Prevalence                0.239

Data is partitioned in a 70/30 (training/test) split for model building
The unsegmented (simple KNN) model has better overall accuracy, by 0.33 percentage points
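The overall K-means (2 clusters) accuracy in the concluding table, 80.78%, equals the unweighted mean of the two segment accuracies. The segments differ in size, so pooling their confusion matrices weights every test row equally and gives a slightly different figure. The sketch below computes both. Segment names are generic because only the counts matter here.

```python
# (correct predictions, rows) per segment, read off the two confusion matrices:
# segment 1: 4279 + 329 correct of 5687 rows; segment 2: 2454 + 214 correct of 3313 rows.
segments = {"segment 1": (4279 + 329, 5687), "segment 2": (2454 + 214, 3313)}

accuracies = {name: correct / rows for name, (correct, rows) in segments.items()}

# Each segment counts once, regardless of how many rows it has.
unweighted_mean = sum(accuracies.values()) / len(accuracies)
# Each test row counts once.
pooled = sum(c for c, _ in segments.values()) / sum(r for _, r in segments.values())

print(f"unweighted mean of segment accuracies: {unweighted_mean:.2%}")
print(f"pooled accuracy over all test rows:    {pooled:.2%}")
```

Both figures remain below the simple KNN's 81.11%, so the slide's conclusion holds under either calculation.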

K-MEANS (3 CLUSTERS, K = 18)
Age <= 31 (AUC: 0.73)
                  No (predicted)   Yes (predicted)
No (actual)       2,715            143
Yes (actual)      585              227
Accuracy                  80.16 %
Misclassification rate    19.84 %
True positive rate        0.280
False negative rate       0.720
False positive rate       0.050
Specificity               0.950
Precision                 0.614
Prevalence                0.221

Age 32–41 (AUC: 0.735)
                  No (predicted)   Yes (predicted)
No (actual)       2,530            95
Yes (actual)      526              183
Accuracy                  81.37 %
Misclassification rate    18.63 %
True positive rate        0.258
False negative rate       0.742
False positive rate       0.036
Specificity               0.964
Precision                 0.658
Prevalence                0.213

Age >= 42 (AUC: 0.73)
                  No (predicted)   Yes (predicted)
No (actual)       1,455            67
Yes (actual)      340              135
Accuracy                  79.62 %
Misclassification rate    20.38 %
True positive rate        0.284
False negative rate       0.716
False positive rate       0.044
Specificity               0.956
Precision                 0.668
Prevalence                0.238

Data is partitioned in a 70/30 (training/test) split for model building
The unsegmented (simple KNN) model has better overall accuracy, by 0.73 percentage points
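The segments on these slides are labelled by age range, which suggests the clusters were formed on customer age. The deck does not say which features were clustered, so treat that as an assumption. The minimal one-dimensional k-means sketch below (Lloyd's algorithm) runs on hypothetical ages and shows how such cut points can arise. Each age joins its nearest centroid, each centroid moves to its cluster's mean, and in one dimension the boundary between neighbouring clusters is the midpoint of their centroids. The `kmeans_1d` helper and its spread-out initialisation are illustrative choices.

```python
def kmeans_1d(values: list[float], k: int, iterations: int = 100) -> list[float]:
    """Lloyd's algorithm in one dimension; returns the sorted centroids."""
    data = sorted(values)
    # Deterministic initialisation: spread the starting centroids across the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: list[list[float]] = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to its cluster mean (keep it in place if the cluster is empty).
        new_centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return sorted(centroids)


# Hypothetical customer ages (NOT the study's data).
ages = [22, 24, 25, 27, 29, 30, 33, 35, 36, 38, 40, 41, 45, 48, 52, 55, 60, 66]
centroids = kmeans_1d(ages, k=3)
# In 1-D, the boundary between neighbouring clusters is the midpoint of their centroids.
boundaries = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
print("centroids:", [round(c, 1) for c in centroids])
print("age cut points:", [round(b, 1) for b in boundaries])
```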

ANN (1,000 EPOCHS, LEARNING RATE 0.3, MOMENTUM 0.2)
Confusion matrix (test set, 9,000 rows):
                  No (predicted)   Yes (predicted)
No (actual)       6,691            332
Yes (actual)      1,313            664
Class statistics:
Accuracy                  81.72 %
Misclassification rate    18.28 %
True positive rate        0.336
False negative rate       0.664
False positive rate       0.047
Specificity               0.953
Precision                 0.667
Prevalence                0.220
Data is partitioned in a 70/30 (training/test) split for model building
Graph: ROC curve for the ANN model, AUC = 0.7434
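The ANN heading lists its training hyperparameters. To show roughly what the learning rate and momentum control, the sketch below applies the classical (heavy-ball) momentum update to a toy one-parameter loss. The deck does not state which software or exact update rule was used, so this formulation is an assumption; some tools scale or order the terms differently. In real training an epoch is one pass over the training data, whereas here each "epoch" is a single gradient step on the toy loss.

```python
from typing import Callable


def sgd_momentum(grad: Callable[[float], float], w0: float, learning_rate: float = 0.3,
                 momentum: float = 0.2, epochs: int = 1000) -> float:
    """Classical momentum: v <- momentum * v - learning_rate * grad(w); w <- w + v."""
    w, velocity = w0, 0.0
    for _ in range(epochs):
        # Momentum keeps a fraction of the previous step; the learning rate scales the new gradient.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


# Toy loss f(w) = (w - 3)**2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_final, 6))
```

A larger learning rate takes bigger steps and risks overshooting. A larger momentum carries more of the previous step forward, which smooths noisy gradients but can also cause oscillation.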

CONCLUDING POINTS
• The ANN (neural network) model had the highest accuracy of the models tested: 81.72% from the confusion matrix
• Its ROC curve (AUC = 0.7434) makes it a reasonable choice for predicting loan defaulters, although its AUC is only marginally above the simple KNN's (0.7427)
• The ANN also has the highest true positive rate (sensitivity), 0.336
Model accuracy:
Simple KNN              81.11 %
K-means (2 clusters)    80.78 %
K-means (3 clusters)    80.38 %
ANN                     81.72 %
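The concluding points compare models by ROC AUC, which has a direct interpretation. AUC is the probability that the model scores a randomly chosen defaulter above a randomly chosen non-defaulter, with ties counting as half. The minimal pairwise sketch below uses hypothetical scores, not the study's model outputs. For real data, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]  # defaulters
    negatives = [s for s, y in zip(scores, labels) if y == 0]  # non-defaulters
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted default probabilities (NOT the study's model outputs).
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.38, 0.70, 0.80, 0.90]
print(roc_auc(labels, scores))
```

On this scale, 0.5 is a random ranking and 1.0 a perfect one. The ANN's 0.7434 and the KNN's 0.7427 therefore rank defaulters almost identically well.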

