Presented by: Pallavi Mohanty
01 Introduction
02 Research Objective
03 Data Collection
04 Data Preprocessing
05 Exploratory Data Analysis (EDA)
06 Feature Engineering
07 Data Splitting
08 Model Selection
09 Model Training
10 Model Evaluation
11 Model Testing
12 Results and Conclusion
In today's financial landscape, credit scoring plays a pivotal role in shaping
individuals' access to credit, loans, and financial opportunities. Whether
you're a consumer seeking a mortgage, a business owner looking for
capital, or a lender evaluating risk, understanding credit scoring is essential
for navigating the complex world of finance.
Definition of Credit Scoring
Credit scoring is a statistical method used by lenders and financial
institutions to evaluate the creditworthiness of individuals or entities
seeking to borrow money. It involves the systematic assessment of
various factors related to an individual's financial history, behavior, and
risk profile to generate a numerical score, often referred to as a credit
score. This score serves as a quantitative measure of the likelihood
that a borrower will repay their debts responsibly and on time.
Overall, credit scoring is a cornerstone of the modern credit system,
facilitating efficient and equitable allocation of credit while balancing
the interests of borrowers and lenders.
01 Develop a Robust Credit Scoring Model: The primary objective of this project is to develop a machine learning model capable of accurately classifying individuals into credit score brackets based on their credit-related information.
02 Enhance Credit Assessment Efficiency: By automating the credit scoring process, the project aims to reduce manual effort and streamline the evaluation of loan and credit applicants.
03 Evaluate Key Credit Assessment Factors: Another objective is to identify and evaluate the most influential factors affecting credit scores. By analyzing features such as payment behavior, credit utilization ratio, and credit history age, the project seeks to determine which variables have the greatest impact on creditworthiness.
04 Facilitate Financial Inclusion and Fairness: The project aims to promote financial inclusion by developing a credit scoring model that considers a diverse range of factors beyond traditional credit metrics.
These objectives align with the overarching goal of building an intelligent system to
classify individuals into credit score brackets, ultimately benefiting both financial
companies and consumers in the lending process. A further aim is to understand the
financial behavior of customers and identify patterns or trends that may influence
their creditworthiness.
• ID: Unique ID of the record
• Customer_ID: Unique ID of the customer
• Month: Month of the year
• Name: The name of the person
• Age: The age of the person
• SSN: Social Security Number of the person
• Occupation: The occupation of the person
• Annual_Income: The Annual Income of the person
• Monthly_Inhand_Salary: Monthly in-hand salary of the person
• Num_Bank_Accounts: The number of bank accounts of the person
• Num_Credit_Card: The number of credit cards held by the person
• Interest_Rate: The interest rate on the credit card of the person
• Num_of_Loan: The number of loans taken by the person from the bank
• Type_of_Loan: The types of loans taken by the person from the bank
• Delay_from_due_date: The average number of days by which the person's payments are delayed past the due date
• Num_of_Delayed_Payment: Number of payments delayed by the person
• Changed_Credit_Limit: The percentage change in the credit card limit of the person
• Num_Credit_Inquiries: The number of credit card inquiries by the person
• Credit_Mix: Classification of Credit Mix of the customer
• Outstanding_Debt: The outstanding balance of the person
• Credit_Utilization_Ratio: The credit utilization ratio of the credit card of the customer
• Credit_History_Age: The age of the credit history of the person
• Payment_of_Min_Amount: Yes if the person paid the minimum amount to be paid
only, otherwise no.
• Total_EMI_per_month: The total EMI per month of the person
• Amount_invested_monthly: The monthly amount invested by the person
• Payment_Behaviour: The payment behaviour of the person
• Monthly_Balance: The monthly balance left in the account of the person
• Credit_Score: The credit score of the person
The dataset contains detailed information about
individuals' financial profiles, including their age,
occupation, annual income, and credit-related
metrics such as the number of bank accounts, credit
cards, and loans they hold. The target variable,
"Credit_Score," represents each individual's
creditworthiness as a credit score bracket.
1. Cleaning
The dataset contains erroneous placeholder values such as "_", "NM", "!@9#%8", and "_______", and several columns are stored
with the wrong data types. We resolved these issues by replacing the invalid values and converting the columns to appropriate types.
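The cleaning step can be illustrated with a minimal pandas sketch. The placeholder strings are the ones listed above; the helper name `clean_value`, the sample rows, and the choice to strip stray underscores from otherwise valid entries are assumptions made for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd

# Placeholder strings reported in the slides.
PLACEHOLDERS = {"_", "NM", "!@9#%8", "_______"}

def clean_value(value):
    """Map placeholder strings to NaN and strip stray underscores (hypothetical helper)."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text in PLACEHOLDERS:
        return np.nan
    text = text.strip("_")  # assumption: values such as "28_" carry a stray underscore
    return text if text else np.nan

# Made-up sample rows that mimic the kinds of errors described above.
raw = pd.DataFrame({
    "Age": ["23", "-500", "28_", "_"],
    "Payment_of_Min_Amount": ["Yes", "NM", "No", "Yes"],
    "Changed_Credit_Limit": ["11.27", "_", "", "6.27"],
})

cleaned = raw.copy()
# Numeric columns: clean the text first, then convert; unparseable values become NaN.
for col in ["Age", "Changed_Credit_Limit"]:
    cleaned[col] = pd.to_numeric(raw[col].map(clean_value), errors="coerce")
# Categorical column: only replace the placeholders.
cleaned["Payment_of_Min_Amount"] = raw["Payment_of_Min_Amount"].map(clean_value)

print(cleaned.dtypes)
```

Note that this sketch only fixes the encoding of the values. Implausible numbers such as an Age of -500 remain and have to be handled in a later step.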
2. Missing Values
We filled in the empty values in the loan type variables with the KNN Imputer method. We visualized the missing data
with the help of the missingno library and examined the correlation between missing values. If the correlation is high, the
missing data did not occur randomly. In this case, we removed these observations from the data set. Each ID
represents a customer, and each customer has multiple records. Taking this into account, we filled in
the missing values.
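The two imputation ideas mentioned above can be sketched as follows, under stated assumptions. The slides apply the KNN Imputer to the loan type variables; this sketch applies scikit-learn's `KNNImputer` to made-up numeric columns instead, and it interprets "taking the customer's multiple records into account" as filling a gap with that customer's most frequent known value. The helper `fill_with_customer_mode` and `n_neighbors=2` are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Made-up records: two customers with three monthly rows each.
df = pd.DataFrame({
    "Customer_ID": ["C1", "C1", "C1", "C2", "C2", "C2"],
    "Occupation": ["Teacher", None, "Teacher", None, "Engineer", "Engineer"],
    "Annual_Income": [30000.0, 30000.0, np.nan, 82000.0, 82000.0, 82000.0],
    "Monthly_Inhand_Salary": [2400.0, np.nan, 2400.0, 6700.0, 6700.0, np.nan],
    "Num_Bank_Accounts": [3.0, 3.0, 3.0, 5.0, 5.0, 5.0],
})

# Share of missing values per column (a text-only stand-in for a missingno plot).
missing_share = df.isna().mean()

def fill_with_customer_mode(group: pd.Series) -> pd.Series:
    """Fill gaps with the customer's most frequent known value (hypothetical helper)."""
    known = group.dropna()
    if known.empty:
        return group
    return group.fillna(known.mode().iloc[0])

# Categorical gap: reuse information from the same customer's other rows.
df["Occupation"] = df.groupby("Customer_ID")["Occupation"].transform(fill_with_customer_mode)

# Numeric gaps: each missing value becomes the mean of its nearest neighbours' values.
numeric_cols = ["Annual_Income", "Monthly_Inhand_Salary", "Num_Bank_Accounts"]
imputer = KNNImputer(n_neighbors=2)
df[numeric_cols] = imputer.fit_transform(df[numeric_cols])

print(df)
```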
Observation -
1. Changed_Credit_Limit could not be converted to float, because the value "" cannot be converted to float.
2. The minimum Age value is -500. Age should not have negative values.
3. The minimum Num_Bank_Accounts value is -1. Num_Bank_Accounts should not have negative values.
4. The minimum Num_of_Loan value is -100. Num_of_Loan should not have negative values.
5. A customer may have paid a loan before the due date, so Delay_from_due_date can legitimately contain
negative values.
6. The numerical variables contain outliers.
7. There is a moderate positive correlation between Delay_from_due_date and Outstanding_Debt.
8. There is a moderate positive correlation between Changed_Credit_Limit and Outstanding_Debt.
3. Outlier Detection
We handled outliers using the IQR method. We replaced the outlier observations in continuous variables with the median
value of the relevant variable.
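A minimal sketch of the IQR rule with median replacement is shown below. The multiplier `k=1.5` is the conventional fence and is an assumption here, as is computing the median on the column before the outliers are removed; the slides do not state either detail. The function name and sample values are made up.

```python
import pandas as pd

def replace_outliers_with_median(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the column median."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = (series < lower) | (series > upper)
    # Assumption: the median is taken over the full column, outliers included.
    return series.mask(is_outlier, series.median())

# Made-up ages with two implausible entries.
ages = pd.Series([23, 25, 27, 29, 31, 33, 35, -500, 8000], dtype="float64")
cleaned_ages = replace_outliers_with_median(ages)
print(cleaned_ages.tolist())
```

Here Q1 is 25 and Q3 is 33, so the fences are 13 and 45; the values -500 and 8000 fall outside them and are replaced by the median, 29.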
Continuous variables in which class distinctions are
evident:
• Num_Bank_Accounts
• Num_Credit_Card
• Interest_Rate
• Num_of_Loan
• Delay_from_due_date
• Num_of_Delayed_Payment
• Num_Credit_Inquiries
• Outstanding_Debt
• Credit_History_Age
We perform EDA to understand the characteristics
of the credit data, visualizing trends, patterns, and
correlations within the data and exploring factors such
as credit utilization, payment history, income, and
types of loans.
OBSERVATION-
• The dependent variable is evenly distributed in the data set.
OBSERVATION-
• Credit score averages are close to each other across the groups of the Month, Occupation, and Payment_Behaviour variables.
• In Credit_Mix and Payment_of_Min_Amount, the credit score averages differ clearly between groups.
• We therefore merge the Payment_Behaviour groups whose credit score averages are close to each other into a single group.
One-Way ANOVA Test
• The ANOVA test checks statistically whether the means of at least two groups differ.
• The test assumes normality and homogeneity of variances. Since the sample is large, we rely on the central limit theorem and treat the normality assumption as satisfied. We test whether the variances are homogeneous. If they are not, we use a non-parametric alternative to ANOVA.
• H0: μ1 = μ2 = ... = μn
• H1: At least one group mean differs from the others.
Homogeneity of Variances Test
• H0: Variances are homogeneous.
• H1: Variances are not homogeneous.
Chi-square Test of Independence
• H0: There is no relationship between the two variables.
• H1: There is a relationship between the two variables.
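The testing procedure above can be sketched with SciPy. All numbers are synthetic and purely illustrative. Levene's test stands in for the homogeneity-of-variances check and the Kruskal-Wallis test for the non-parametric alternative to ANOVA; the slides name neither test, so both are assumptions, as are the 0.05 significance level and the Credit_Mix categories.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed significance level

# Synthetic Outstanding_Debt values for three credit score classes (illustrative only).
rng = np.random.default_rng(0)
poor = rng.normal(1500, 300, 200)
standard = rng.normal(1200, 300, 200)
good = rng.normal(800, 300, 200)

# Step 1: homogeneity of variances (H0: variances are homogeneous).
levene_stat, levene_p = stats.levene(poor, standard, good)

# Step 2: compare group means with ANOVA, or fall back to a rank-based test.
if levene_p > ALPHA:
    test_name = "one-way ANOVA"
    stat, p_value = stats.f_oneway(poor, standard, good)
else:
    test_name = "Kruskal-Wallis"
    stat, p_value = stats.kruskal(poor, standard, good)
print(f"{test_name}: statistic={stat:.2f}, p={p_value:.4g}")

# Chi-square test of independence on a made-up Credit_Mix x Credit_Score table.
table = pd.DataFrame(
    [[90, 30, 10], [40, 60, 40], [10, 30, 90]],
    index=["Bad", "Standard", "Good"],       # Credit_Mix (assumed categories)
    columns=["Poor", "Standard", "Good"],    # Credit_Score
)
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square: statistic={chi2:.2f}, dof={dof}, p={chi_p:.4g}")
```

A p-value below the significance level leads to rejecting H0 in each case: the group means differ, or the two categorical variables are related.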
In this context, 'X_train' contains the
independent variables or features from the
original dataset, excluding the
"Credit_Score" column, while 'y_train'
comprises the corresponding
"Credit_Score" values.
The testing split, denoted by 'X_test' and 'y_test'
in the given code, is a distinct subset of
the original dataset that is reserved for evaluating
the performance of the trained machine learning
model. 'X_test' comprises the independent
variables, excluding the "Credit_Score" column,
while 'y_test' holds the corresponding "Credit_Score" values.
The dataset's independent variables ('X') are split into two subsets,
'X_train' and 'X_test', while the corresponding dependent variable ('y') is
split into 'y_train' and 'y_test'. The test_size parameter is set to 0.20,
so 20% of the data is allocated to the testing set and
the remaining 80% is used for training the model. Additionally, the
random_state parameter is set to 42, which makes the split reproducible by fixing
the random seed.
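The split described above corresponds to a call to scikit-learn's `train_test_split`. The sketch below uses the stated `test_size=0.20` and `random_state=42`; the ten-row data frame and its values are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data with two features and the target column.
df = pd.DataFrame({
    "Outstanding_Debt": [810, 605, 1303, 632, 943, 1377, 548, 352, 1704, 1058],
    "Num_of_Delayed_Payment": [7, 4, 12, 6, 9, 15, 3, 1, 18, 10],
    "Credit_Score": ["Good", "Good", "Poor", "Standard", "Standard",
                     "Poor", "Good", "Good", "Poor", "Standard"],
})

X = df.drop(columns=["Credit_Score"])  # independent variables
y = df["Credit_Score"]                 # dependent variable

# 80% of the rows for training, 20% held out for testing; fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(X_train.shape, X_test.shape)
```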
Principal Component Analysis
PCA reduces dimensionality by projecting high-dimensional data onto the directions that capture the most variance.
Our dataset is high-dimensional, so we use PCA to continue the analysis with fewer variables without losing
too much information from our data set.
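A short sketch of this reduction with scikit-learn follows. The data are synthetic (ten correlated features driven by three hidden factors). Standardizing the features before PCA and keeping enough components to explain 95% of the variance are assumptions for illustration; the slides do not state how many components were retained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 observed features generated from 3 hidden factors plus small noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

# PCA is scale-sensitive, so the features are standardized first (assumed step).
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the smallest number of components reaching that variance share.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("components kept:", X_reduced.shape[1])
print("variance explained:", round(float(pca.explained_variance_ratio_.sum()), 4))
```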
K-Nearest Neighbors
KNN is a supervised machine learning algorithm used for classification and regression tasks. It works
by identifying the 'k' nearest data points in the feature space to a given input, and the output is
determined by the majority class or the average of the 'k' nearest neighbors.
Random Forest
Random Forest is an ensemble learning method based on constructing a multitude of decision trees
during training and outputting the class that is the mode of the classes (classification) or mean
prediction (regression) of the individual trees.
Bagging Classifier
Bagging, short for Bootstrap Aggregating, is an ensemble meta-algorithm that aims to improve the
stability and accuracy of machine learning algorithms.
XGBoost
XGBoost is an efficient and scalable implementation of gradient boosting. It is widely used for
supervised learning tasks and has gained popularity for its speed and performance.
• In this part, we create classification models without hyperparameter optimization. We then
apply hyperparameter optimization only to the models that achieve the highest accuracy values.
• We take this approach because optimizing every model would exceed the available CPU capacity.
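The baseline comparison described above might look like the following sketch: each model is fitted with its default settings and ranked by test accuracy. The data come from `make_classification` and are purely synthetic. XGBoost is left out to keep the example dependent on scikit-learn only; if the xgboost package is installed, an `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class problem standing in for the credit score brackets.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Default hyperparameters only; tuning is deferred to the best candidates.
models = {
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

Only the top-ranked models would then be passed on to the more expensive tuning step.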
Hyperparameter Tuning
Since the data set is very large, the available CPU capacity is insufficient for
extensive hyperparameter optimization.
We find the n_neighbors value that gives the best
results for the KNN model. We then build a Random
Forest classifier and compare the two models.
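A lightweight version of this search can be sketched as a loop over candidate `n_neighbors` values. Scoring each candidate with 5-fold cross-validation on the training set, the candidate range, and the synthetic data are all assumptions; the slides do not describe how the best value was chosen.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Evaluate odd k values from 1 to 15 with 5-fold cross-validation on the training set.
cv_accuracy = {}
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_accuracy[k] = cross_val_score(
        knn, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
best_k = max(cv_accuracy, key=cv_accuracy.get)

# Refit the best KNN and compare it with a Random Forest on the held-out test set.
knn_best = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
comparison = {
    "KNN": knn_best.score(X_test, y_test),
    "Random Forest": forest.score(X_test, y_test),
}

print("best n_neighbors:", best_k)
print(comparison)
```

Tuning a single parameter this way keeps the cost at one model fit per candidate and fold, which suits a CPU-constrained setting.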
• The data set shows an unbalanced class distribution, which may lead to biased estimates. We therefore use SMOTE, an oversampling technique that generates synthetic observations.
• Synthetic observations were added to the data set with the SMOTE method, so that the classes of the dependent variable became equal in size. In this way, we try to prevent biased learning.
• In the ensemble model, the prediction of credit scores with the Good label improved. Accuracy increased to 0.79.
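To make the oversampling idea concrete, here is a simplified, hand-written SMOTE-style sketch in NumPy. It is not the implementation used in the project: the imbalanced-learn library provides a full `SMOTE` class, while the function `smote_like_oversample`, its parameters, and the sample data below are hypothetical. Each synthetic point is placed on the line segment between a minority sample and one of its nearest same-class neighbours.

```python
from collections import Counter

import numpy as np

def smote_like_oversample(X, y, k=3, seed=42):
    """Grow every class to the size of the largest one by interpolating between
    a sample and one of its k nearest same-class neighbours (simplified sketch)."""
    rng = np.random.default_rng(seed)
    counts = Counter(y.tolist())
    target = max(counts.values())
    X_parts, y_parts = [X], [y]
    for label, count in counts.items():
        deficit = target - count
        if deficit == 0:
            continue
        minority = X[y == label]
        if len(minority) < 2:
            raise ValueError("each class needs at least two samples to interpolate")
        # Pairwise distances within the class; a point is never its own neighbour.
        dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
        np.fill_diagonal(dists, np.inf)
        k_eff = min(k, len(minority) - 1)
        neighbours = np.argsort(dists, axis=1)[:, :k_eff]
        # Pick a random base sample and one of its neighbours for each synthetic point.
        base = rng.integers(0, len(minority), size=deficit)
        picked = neighbours[base, rng.integers(0, k_eff, size=deficit)]
        gap = rng.random((deficit, 1))
        synthetic = minority[base] + gap * (minority[picked] - minority[base])
        X_parts.append(synthetic)
        y_parts.append(np.full(deficit, label))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Made-up imbalanced data: 10 Good, 40 Standard, 25 Poor.
rng = np.random.default_rng(0)
X = rng.normal(size=(75, 2))
y = np.array(["Good"] * 10 + ["Standard"] * 40 + ["Poor"] * 25)

X_res, y_res = smote_like_oversample(X, y)
print(Counter(y_res.tolist()))
```

Oversampling is usually applied to the training split only, so that the test set keeps the original class distribution.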
Credit Scoring Capstone Project- Pallavi Mohanty.pptx

A Critique of the Proposed National Education Policy Reform
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Credit Scoring Capstone Project- Pallavi Mohanty.pptx

  • 3. In today's financial landscape, credit scoring plays a pivotal role in shaping individuals' access to credit, loans, and financial opportunities. Whether you're a consumer seeking a mortgage, a business owner looking for capital, or a lender evaluating risk, understanding credit scoring is essential for navigating the complex world of finance. Definition of Credit Scoring Credit scoring is a statistical method used by lenders and financial institutions to evaluate the creditworthiness of individuals or entities seeking to borrow money. It involves the systematic assessment of various factors related to an individual's financial history, behavior, and risk profile to generate a numerical score, often referred to as a credit score. This score serves as a quantitative measure of the likelihood that a borrower will repay their debts responsibly and on time. Overall, credit scoring is a cornerstone of the modern credit system, facilitating efficient and equitable allocation of credit while balancing the interests of borrowers and lenders.
  • 4. 01. Develop a Robust Credit Scoring Model: The primary objective of this project is to develop a machine learning model capable of accurately classifying individuals into credit score brackets based on their credit-related information.
    02. Enhance Credit Assessment Efficiency: By automating the credit scoring process, the project aims to reduce manual effort and streamline the evaluation of loan and credit applicants.
    03. Evaluate Key Credit Assessment Factors: Another objective is to identify and evaluate the most influential factors affecting credit scores. By analyzing features such as payment behavior, credit utilization ratio, and credit history age, the project seeks to determine which variables have the greatest impact on creditworthiness.
    04. Facilitate Financial Inclusion and Fairness: The project aims to promote financial inclusion by developing a credit scoring model that considers a diverse range of factors beyond traditional credit metrics.
    These objectives align with the overarching goal of building an intelligent system to classify individuals into credit score brackets, ultimately benefiting both financial companies and consumers in the lending process. Understand the financial behavior of customers and identify patterns or trends that may influence their creditworthiness.
  • 5. The dataset contains detailed information about individuals' financial profiles, including their age, occupation, annual income, and credit-related metrics such as the number of bank accounts, credit cards, and loans they hold. The columns are:
      • ID: Unique ID of the record
      • Customer_ID: Unique ID of the customer
      • Month: Month of the year
      • Name: The name of the person
      • Age: The age of the person
      • SSN: Social Security Number of the person
      • Occupation: The occupation of the person
      • Annual_Income: The annual income of the person
      • Monthly_Inhand_Salary: Monthly in-hand salary of the person
      • Num_Bank_Accounts: The number of bank accounts of the person
      • Num_Credit_Card: The number of credit cards the person holds
      • Interest_Rate: The interest rate on the credit card of the person
      • Num_of_Loan: The number of loans taken by the person from the bank
      • Type_of_Loan: The types of loans taken by the person from the bank
      • Delay_from_due_date: The average number of days the person's payments were delayed past the due date
      • Num_of_Delayed_Payment: Number of payments delayed by the person
      • Changed_Credit_Limit: The percentage change in the credit card limit of the person
      • Num_Credit_Inquiries: The number of credit card inquiries by the person
      • Credit_Mix: Classification of the credit mix of the customer
      • Outstanding_Debt: The outstanding balance of the person
      • Credit_Utilization_Ratio: The credit utilization ratio of the credit card of the customer
      • Credit_History_Age: The age of the credit history of the person
      • Payment_of_Min_Amount: Yes if the person paid only the minimum amount due, otherwise No
      • Total_EMI_per_month: The total EMI per month of the person
      • Amount_invested_monthly: The monthly amount invested by the person
      • Payment_Behaviour: The payment behaviour of the person
      • Monthly_Balance: The monthly balance left in the account of the person
      • Credit_Score: The credit score of the person
    The target variable, "Credit_Score", represents the credit score bracket of each individual and serves as the measure of creditworthiness that the model predicts.
  • 6. 1. Cleaning
    The dataset contains erroneous placeholder values such as "_", "NM", "!@9#%8" and "_______", as well as incorrect data types. We resolved these by replacing the placeholder values and converting the columns to the appropriate types.
    2. Missing Values
    We filled in the empty values in the loan type variables with the KNN Imputer method. We visualized the missing data with the help of the missingno library and examined the correlation between missing values. If the correlation is high, the missing data did not occur randomly; in this case, we removed these observations from the data set. Each ID represents a customer, and each customer has multiple transactions recorded. We took this structure into account when filling in the missing values.
    Observations:
      1. Changed_Credit_Limit could not be converted to float, because "" cannot be converted to float.
      2. The minimum Age value is -500. Age should not have negative values.
      3. The minimum of Num_Bank_Accounts is -1. Num_Bank_Accounts should not have negative values.
      4. The minimum of Num_of_Loan is -100. Num_of_Loan should not have negative values.
      5. A customer may have paid a loan before the due date, so Delay_from_due_date can legitimately contain negative values.
      6. Numerical variables include outlier values.
      7. There is a moderate positive correlation between Delay_from_due_date and Outstanding_Debt.
      8. There is a moderate positive correlation between Changed_Credit_Limit and Outstanding_Debt.
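The cleaning step on this slide can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the helper names `to_float` and `clean_count` and the `PLACEHOLDERS` set are hypothetical, and two choices are assumptions, namely that stray underscores around a number are noise and that negative counts or ages are treated as missing.

```python
# Placeholder tokens listed on the slide, plus the empty string; all are
# treated as missing values in this sketch.
PLACEHOLDERS = {"", "_", "NM", "!@9#%8", "_______"}


def to_float(raw):
    """Return raw as a float, or None if it is a placeholder or unparseable."""
    text = str(raw).strip()
    if text in PLACEHOLDERS:
        return None
    # Assumption: underscores attached to a number (e.g. "34_") are noise.
    text = text.strip("_")
    try:
        return float(text)
    except ValueError:
        return None


def clean_count(raw):
    """Parse a column such as Age or Num_of_Loan that must not be negative.

    Assumption: negative values are marked as missing (None) so that a later
    imputation step can fill them.
    """
    value = to_float(raw)
    if value is None or value < 0:
        return None
    return value
```

Columns such as Delay_from_due_date, where negative values are valid, would go through `to_float` only.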
  • 7. 3. Outlier Detection
    We handled outliers using the IQR method: outlier observations in continuous variables were replaced with the median value of the relevant variable.
    Continuous variables in which class distinctions are evident:
      • Num_Bank_Accounts
      • Num_Credit_Card
      • Interest_Rate
      • Num_of_Loan
      • Delay_from_due_date
      • Num_of_Delayed_Payment
      • Num_Credit_Inquiries
      • Outstanding_Debt
      • Credit_History_Age
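A minimal sketch of the IQR rule described on this slide, using only the standard library. The function name `replace_outliers_with_median` and the conventional fence multiplier `k=1.5` are assumptions; the slides do not state which multiplier or quantile method was used.

```python
from statistics import median, quantiles


def replace_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the column
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    return [med if (v < low or v > high) else v for v in values]
```

For example, in the list `[1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]` only the value 1000 falls outside the fences and is replaced by the median.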
  • 8. Performing EDA to understand the characteristics of the credit data. Visualizing trends, patterns, and correlations within the data. Exploring factors such as credit utilization, payment history, income and types of loans.
  • 9. OBSERVATION: The dependent variable is evenly distributed in the data set.
  • 10. OBSERVATIONS:
      • Credit score averages are close to each other across the groups of the Month, Occupation and Payment_Behaviour variables.
      • For Credit_Mix and Payment_of_Min_Amount, the difference in credit score averages between groups is clear.
      • We will therefore merge the Payment_Behaviour groups whose credit score averages are close to each other into a single group.
  • 11.
  • 12.
  • 13. One-Way ANOVA Test
      • The ANOVA test checks statistically whether the means of at least two groups differ.
      • It assumes normality and homogeneity of variances. Because the number of observations is large, the normality assumption is taken to hold by the central limit theorem. We will test whether the variances are homogeneous; if they are not, we will use a non-parametric ANOVA test.
      • H0: u1 = u2 = ... = un
      • H1: at least one group mean differs from the others
    Homogeneity of Variances Test
      • H0: Variances are homogeneous.
      • H1: Variances are not homogeneous.
    Chi-Square Test of Independence
      • H0: There is no relationship between the two variables.
      • H1: There is a relationship between the two variables.
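To make the two test statistics on this slide concrete, here is a small hand-computed sketch. The function names are hypothetical and the code only produces the statistics, not p-values; in practice these would normally come from a statistics library (for instance `scipy.stats.f_oneway`, `scipy.stats.levene` and `scipy.stats.chi2_contingency`), which the slides do not confirm were used.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

A large F statistic is evidence against H0 of equal group means, and a large chi-square statistic is evidence against H0 of independence.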
  • 14. The dataset's independent variables ('x') are split into two subsets, 'X_train' and 'X_test', while the corresponding dependent variable ('y') is split into 'y_train' and 'y_test'. 'X_train' contains the independent variables, or features, from the original dataset, excluding the "Credit_Score" column, while 'y_train' comprises the corresponding "Credit_Score" values. The testing split, denoted by 'X_test' and 'y_test', is a distinct subset of the original dataset reserved for evaluating the performance of the trained machine learning model; 'X_test' likewise comprises the independent variables, excluding the "Credit_Score" column. The test_size parameter is set to 0.20, indicating that 20% of the data is allocated to the testing set, leaving the remaining 80% for training the model. Additionally, the random_state parameter is set to 42, ensuring reproducibility by fixing the random seed for the data split.
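The code this slide refers to is not in the transcript. The parameter names `test_size` and `random_state` match scikit-learn's `train_test_split`, so that function was presumably used. The sketch below is a dependency-free illustration of the same idea, not the library implementation; the function name `split_train_test` is hypothetical and its shuffling will not reproduce scikit-learn's exact row assignment.

```python
import random


def split_train_test(X, y, test_size=0.20, random_state=42):
    """Shuffle row indices with a fixed seed, then hold out test_size of them."""
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # fixed seed: reproducible
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

With 10 rows and `test_size=0.20`, 2 rows go to the test set and 8 to the training set, and calling the function again with the same `random_state` returns the same split.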
  • 15. Principal Component Analysis
    PCA reduces the dimensionality of high-dimensional data while retaining the directions of highest variance. Our dataset is high dimensional, so we will try to reduce its size and continue the analysis with fewer variables without losing too much information.
    K-Nearest Neighbors
    KNN is a supervised machine learning algorithm used for classification and regression tasks. It identifies the 'k' nearest data points in the feature space to a given input, and the output is the majority class (classification) or the average (regression) of those 'k' nearest neighbors.
    Random Forest
    Random Forest is an ensemble learning method that constructs a multitude of decision trees during training and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
    Bagging Classifier
    Bagging, short for Bootstrap Aggregating, is an ensemble meta-algorithm that aims to improve the stability and accuracy of machine learning algorithms.
    XGBoost
    XGBoost is an efficient and scalable implementation of gradient boosting. It is widely used for supervised learning tasks and has gained popularity for its speed and performance.
  • 16. • In this part, we first create classification models without hyperparameter optimization, and then apply hyperparameter optimization only to the models that achieve the highest accuracy.
      • We take this approach because of CPU limitations.
    Hyperparameter Tuning
    Since the data set is very large, the CPU is insufficient for hyperparameter optimization across all models. We will find the n_neighbors value that gives the best results for the KNN model, then build a Random Forest classifier and compare the two models.
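The search for the best n_neighbors value can be sketched as a loop over candidate values scored on held-out data. This is a small from-scratch illustration under stated assumptions: the helper names are hypothetical, a plain hold-out set stands in for whatever validation scheme the project used, and Euclidean distance with majority voting is assumed.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Majority class among the k training points closest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k):
    """Share of validation points that KNN with this k classifies correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def best_n_neighbors(train_X, train_y, val_X, val_y, candidates):
    """Return the candidate k with the highest validation accuracy, plus all scores."""
    scores = {k: accuracy(train_X, train_y, val_X, val_y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the first candidate
    return best_k, scores
```

Scoring every candidate costs one full pass of distance computations per value of k, which is why restricting the search to a single parameter of one model keeps the CPU cost manageable.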
  • 17. • The data set shows an unbalanced class distribution, which may cause biased estimates. We therefore use SMOTE, an oversampling technique that generates synthetic data.
      • Synthetic observations were added to the data set with the SMOTE method, so that the classes of the dependent variable became equal in size. In this way, we try to prevent biased learning.
      • In the ensemble model, the prediction of the credit score with the "Good" label improved. Accuracy increased to 0.79.
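The core idea of SMOTE is to create a synthetic point on the line segment between a minority-class sample and one of its nearest minority-class neighbors. The sketch below illustrates only that interpolation step; the function name `smote_oversample` and its defaults are hypothetical, and a real project would more likely rely on a library implementation such as `imblearn.over_sampling.SMOTE` (the slides do not say which was used).

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=2, seed=42):
    """Generate n_new synthetic points by interpolating between neighbors.

    minority: list of numeric tuples belonging to the under-represented class.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbors of the chosen base point.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the original class distribution.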