SALARY PREDICTION
Introduction:
• In today's dynamic job market, predicting salaries accurately plays a pivotal role in various aspects of workforce
management, recruitment, and financial planning. The ability to estimate salaries based on a range of factors
empowers organizations to make informed decisions regarding budget allocation, employee compensation, and
talent acquisition strategies. Therefore, the development of robust salary prediction models has become
increasingly valuable in modern business operations.
• The goal of our project is to construct a reliable salary prediction system that leverages machine learning
techniques to forecast salaries for individuals based on relevant attributes such as education, experience, skills,
and geographic location. By analyzing historical salary data and identifying patterns within the job market, our
aim is to create a model capable of providing accurate salary estimates for new job listings or assessing the
competitiveness of compensation packages offered by employers.
• Through this project, we seek to address several key challenges in salary prediction, including the inherent
variability in compensation across industries, regions, and job roles, as well as the complex interplay of factors
influencing salary determination. By applying advanced machine learning algorithms and feature engineering
techniques to large-scale datasets, we aim to develop a predictive model that not only achieves high accuracy
but also provides insights into the factors driving salary disparities and trends within the job market.
• Ultimately, our salary prediction project aims to empower businesses, recruiters, and job seekers alike with
actionable insights into salary expectations, thereby facilitating more transparent and equitable negotiations,
optimizing resource allocation, and supporting informed decision-making in the realm of human resource
management.
Problem Statement:
• In today's competitive job market, accurately predicting salaries for job positions is essential for organizations
to make informed decisions regarding budget allocation, compensation strategies, and talent acquisition.
However, the task of salary prediction presents several challenges due to the multifaceted nature of salary
determinants and the inherent variability within the job market.
• The primary challenge we aim to address with our salary prediction project is the accurate estimation of
salaries for individuals based on a diverse set of attributes, including but not limited to education level, years of
experience, specialized skills, industry sector, and geographic location. Additionally, we seek to account for the
complex interactions between these factors and their impact on salary levels across different job roles and
industries.
• Furthermore, the availability and quality of data for salary prediction can vary significantly, posing challenges in
terms of data preprocessing, feature selection, and model generalization. Additionally, factors such as inflation,
market demand, and economic conditions introduce temporal variability that must be accounted for in the
prediction process.
• By developing a robust salary prediction model, our objective is to address these challenges and provide
stakeholders with a reliable tool for estimating salaries with a high degree of accuracy and precision. This model
will not only aid organizations in optimizing their recruitment and compensation strategies but also assist job
seekers in negotiating fair and competitive salaries based on their qualifications and market demand.
• In summary, our salary prediction project seeks to bridge the gap between employer expectations and
candidate aspirations by leveraging machine learning techniques to provide transparent and data-driven salary
estimations, thereby facilitating more equitable and informed decision-making in the realm of human resource
management.
About Dataset
• The "Salary Prediction Dataset" is a synthetic dataset generated for the purpose of exploring salary prediction tasks. It
contains simulated data reflecting various factors influencing salary levels such as education, experience, location, job title,
age, and gender. This dataset can be used for predictive modeling tasks that estimate salaries from these factors.
• # Data Collection
• Salary Prediction Data: predict the salary from the features
• (From kaggle.com)
• Explore, clean and prepare the dataset:
• Imported the dataset from file.csv
• Checked the shape of the original dataset:
• We have 1000 rows and 7 columns in total
Detailed steps of Data Exploration:
Data exploration covers the info summary of the data, unique values, duplicate values, null values, and the describe output,
which shows the count, min, mean and max values of the data.
1. Info of data 2. Unique values 3. Null values
4. Data describe 5. Duplicate values
The data has 0 null values and 0 duplicate values.
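The exploration steps listed above could be reproduced with pandas roughly as follows. This is a minimal sketch, not the author's notebook: the six-row table and its column names (for example `Job_Title`) are hypothetical stand-ins for the Kaggle file that the slides load from file.csv.

```python
import pandas as pd

# Hypothetical miniature stand-in for the dataset; values and column names are assumptions.
# With the real file this would be something like: df = pd.read_csv("file.csv")
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Experience": [2, 5, 8, 15, 3, 10],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Age": [24, 29, 35, 48, 26, 40],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Salary": [40000, 65000, 110000, 190000, 52000, 125000],
})

print(df.shape)                               # (rows, columns)
df.info()                                     # 1. column dtypes and non-null counts
unique_counts = df.nunique()                  # 2. number of unique values per column
null_counts = df.isnull().sum()               # 3. number of null values per column
summary = df.describe()                       # 4. count, mean, std, min, quartiles, max
duplicate_rows = int(df.duplicated().sum())   # 5. number of fully duplicated rows

print(unique_counts, null_counts, summary, duplicate_rows, sep="\n\n")
```

On the real dataset, the slides report 1000 rows, 7 columns, 0 nulls and 0 duplicates for these checks.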
Exploratory Data Analysis (EDA):
First we plotted a pie chart of the gender value counts.
In the gender split, male has 51.60%
and female has 48.40%.
#From this we concluded that men have a higher salary than women, although the pie chart itself shows only the share of each gender in the records.
• A bar plot is generated to display the "Job Title" distribution, as a first step toward relating job title to salary.
• The bar plot offers insight into the frequency of the different job titles.
As the bar plot shows, the manager job title has the highest frequency
and the engineer job title the lowest.
The director and analyst job titles have average frequency.
• Donut charts are plotted to visualize the education distribution and the location distribution and to look for relationships between them.
By education, the salary percentages are almost equal, but
people whose qualification is high school have the highest-paid packages.
The location distribution chart suggests that people in rural and
suburban areas have high packages.
A heatmap is generated to visualize the correlation matrix of the dataset, providing a comprehensive overview of the relationships
between numerical variables.
Each cell in the heatmap corresponds to the correlation coefficient between the variables represented by its row and column.
The color intensity indicates the strength and direction of the correlation:
1. Dark blue indicates a strong negative correlation.
2. Dark red indicates a strong positive correlation.
3. White indicates no correlation (correlation coefficient close to zero).
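The matrix behind such a heatmap can be computed with pandas alone. The sketch below assumes hypothetical column names and values; it keeps only the numeric columns, since a Pearson correlation is not defined for raw text categories.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has 1000 rows.
df = pd.DataFrame({
    "Age": [24, 29, 35, 48, 26, 40],
    "Experience": [2, 5, 8, 15, 3, 10],
    "Salary": [40000, 65000, 110000, 190000, 52000, 125000],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

print(corr.round(2))
```

If seaborn is available, `seaborn.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)` would render this matrix with the blue-to-red scale described above.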
A stacked column chart is plotted for the "Age Distribution", offering insight into the frequency of the different age groups.
1. The plot shows that the 25 to 45 age group has a high frequency of good salary packages.
2. The other age groups have average salary packages.
3. The middle age group has high salary packages, while the 30 to 40 and 50 to 60 age groups
have lower salary packages.
Generated a box plot to visualize the distribution of salary.
• Found the min, median and max values of the salary package in the dataset.
• As the box plot shows, the min salary is 40k.
• The median salary package is 1.5 lakhs.
• The max salary package is 1.9 lakhs.
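The numbers a box plot draws can also be computed directly. The following sketch uses nine hypothetical salary values (not the real data) and the common 1.5 x IQR whisker rule, which is also matplotlib's default.

```python
import numpy as np

# Hypothetical salaries in rupees; 1 lakh = 100,000.
salaries = np.array([40000, 60000, 80000, 100000, 110000,
                     120000, 140000, 160000, 190000])

q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1                       # height of the box

# Points beyond these fences are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print("min:", salaries.min(), "median:", median, "max:", salaries.max())
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "outliers:", outliers)
```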
1. A scatter plot displays the relationship between age and salary. 2. A scatter plot displays the relationship between experience and salary.
1. In the first scatter plot, most people across age groups earn a package between 1 lakh and 1.2 lakh.
2. Highly experienced people are less frequent, and only a few of them earn the highest salary package, which is 2 lakh.
3. People in older age groups, who have more experience, get lower salary packages.
#Mean salary for each category
Generated multiple plots to show the mean salary
of each category.
1. In the education category, people with a master's degree have the highest
mean salary.
2. In the location category, people located in suburban areas have the highest mean salary.
3. In the job title category, the manager job role has the highest mean salary.
4. In the gender category, the salary package is distributed equally.
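The per-category means plotted on this slide correspond to a pandas group-by. This is a sketch on hypothetical rows with assumed column names, so the printed numbers do not reproduce the findings above.

```python
import pandas as pd

# Hypothetical sample rows and column names.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Salary": [40000, 65000, 110000, 190000, 52000, 125000],
})

categorical = ["Education", "Location", "Job_Title", "Gender"]

# One Series of mean salaries per categorical column, highest mean first.
mean_salary = {
    col: df.groupby(col)["Salary"].mean().sort_values(ascending=False)
    for col in categorical
}

for col, means in mean_salary.items():
    print(f"\nMean salary by {col}:")
    print(means)
```

Each Series can be passed to a bar plot, giving one panel per categorical variable.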
#Data distribution using pair-plots
Pair plot 1: "Age", "Experience", "Salary" by "Education". Pair plot 2: "Age", "Experience", "Salary" by "Job Title".
The first pair plot visualizes the relationships between "Age", "Experience", and "Salary" for different levels of "Education". Each
scatterplot in the pair plot represents the relationship between two variables, and the diagonal contains histograms showing
the distribution of each variable.
#Encoding data
1. Import the LabelEncoder class from scikit-learn and
iterate over each categorical variable in the list 'categorical'.
2. Check the data types of all columns in the DataFrame.
There are 2 types of data: int and object.
3. Call df.head() to inspect the first rows and columns of the data.
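The encoding steps above might look like the sketch below. The list name `categorical` comes from the slide; the sample rows, the column names and the `encoders` dictionary are assumptions added for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows and column names.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD", "Bachelor", "Master"],
    "Location": ["Urban", "Rural", "Suburban", "Urban", "Rural", "Suburban"],
    "Job_Title": ["Analyst", "Engineer", "Manager", "Director", "Analyst", "Manager"],
    "Gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
    "Experience": [2, 5, 8, 15, 3, 10],
    "Salary": [40000, 65000, 110000, 190000, 52000, 125000],
})

categorical = ["Education", "Location", "Job_Title", "Gender"]

# Keep one fitted encoder per column so the integer codes can be mapped back later.
encoders = {}
for col in categorical:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])   # classes are sorted, then numbered from 0
    encoders[col] = encoder

print(df.dtypes)
print(df.head())
```

Label encoding assigns integers in alphabetical order, which imposes an arbitrary ranking on the categories. Tree-based models usually tolerate this, while linear models and SVR may read the codes as magnitudes; one-hot encoding is a common alternative in that case.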
# Train-test split & # Scaling the data
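A split-then-scale step of this kind is commonly written as below. This is a sketch on synthetic data; the 80/20 split ratio, the random seed and the use of `StandardScaler` are assumptions, since the slides do not state which settings were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features (think: Age, Experience) and a salary-like target.
rng = np.random.default_rng(0)
X = rng.normal(loc=[35, 10], scale=[8, 5], size=(100, 2))
y = 30000 + 5000 * X[:, 1] + rng.normal(0, 5000, size=100)

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows,
# so that no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```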
Linear Regression
The linear regression method is not suitable for my dataset; it shows low accuracy (0.57%).
Finding R-squared: R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained
by the independent variables in a regression model. It is a key metric used to evaluate the goodness of fit of the model to the
observed data.
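The definition above corresponds to R-squared = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations from the mean. A small sketch with hypothetical salary values shows how the score behaves; the helper name `r_squared` is illustrative.

```python
def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Hypothetical salaries.
y_true = [40000, 60000, 80000, 100000]

perfect = r_squared(y_true, y_true)                            # exact predictions
baseline = r_squared(y_true, [70000] * 4)                      # always predict the mean
reversed_fit = r_squared(y_true, [100000, 80000, 60000, 40000])

print(perfect, baseline, reversed_fit)
```

A perfect fit scores 1, always predicting the mean scores 0, and a model worse than the mean scores below 0, so R-squared is not a percentage of correct predictions in the way classification accuracy is.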
Random Forest
Used a Random Forest model with hyperparameter tuning; it shows 97% accuracy.
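The slides do not say how the Random Forest was tuned. One common approach is a cross-validated grid search, sketched below on synthetic data; the parameter grid, the 3-fold setting and the data itself are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with a non-linear, salary-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 40000 + 100000 * X[:, 0] * X[:, 1] + rng.normal(0, 3000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Small illustrative grid; a real search would usually cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,            # 3-fold cross-validation on the training rows only
    scoring="r2",
)
search.fit(X_train, y_train)

# Score the refitted best model on rows it has never seen.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Reporting the score on the held-out test rows, not on the training rows, gives a fairer picture of how the tuned model generalizes.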
AdaBoost Regressor
The AdaBoost regression model gave 81% accuracy.
We will try another model to get better accuracy.
Support vector regressor
Support vector regressor provides flexible options for customizing the SVR model, including the choice of kernel function,
regularization parameter, and other hyperparameters.
The support vector regressor gave 2.14% accuracy. This model is not suitable for my dataset.
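Whether it applies to this project is not stated in the slides, but one frequent cause of a near-zero SVR score on salary-sized targets is leaving the target unscaled with scikit-learn's defaults (`C=1.0`, `epsilon=0.1`). The sketch below compares a default SVR with a scaled pipeline on synthetic data; the data and the choice of `C=10.0` are assumptions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: experience drives a salary in the tens of thousands.
rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=200)
age = 22 + experience + rng.uniform(0, 10, size=200)
X = np.column_stack([age, experience])
y = 30000 + 5000 * experience + rng.normal(0, 5000, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Default SVR on raw features and a raw target.
default_svr = SVR()
default_svr.fit(X_train, y_train)
default_r2 = default_svr.score(X_test, y_test)

# Same kernel, but with standardized features and a standardized target.
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    transformer=StandardScaler(),
)
scaled_svr.fit(X_train, y_train)
scaled_r2 = scaled_svr.score(X_test, y_test)

print(f"default R2: {default_r2:.3f}, scaled R2: {scaled_r2:.3f}")
```

With a small `C`, the raw-target model can move its predictions only slightly away from a constant, which is why scaling the target (or raising `C` substantially) tends to matter for SVR.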
XGBoost Regressor
After applying 5 different models, the XGBoost regression model gave 99% accuracy, so this is the
best model for my dataset.
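A five-model comparison like the one described can be run in a single loop. The sketch below uses synthetic data and, to stay within scikit-learn, substitutes `GradientBoostingRegressor` for XGBoost; `xgboost.XGBRegressor` offers the same fit/predict interface and could be dropped in where the separate xgboost package is installed. The scores it prints are not the ones reported in the slides.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data with an interaction term that a linear model cannot capture.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = 40000 + 90000 * X[:, 0] * X[:, 1] + 20000 * X[:, 2] + rng.normal(0, 3000, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "SVR": SVR(),
    # Stand-in for XGBoost (assumption, see the note above).
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training rows and score it on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} R2 = {score:.3f}")
print("best:", best)
```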
Conclusion and Insights:
Gender
1. The pie chart shows the percentage distribution of gender. Each slice of the pie represents a gender category, and the size of each slice
corresponds to the proportion of that gender category in the dataset. The percentage labels on each slice provide additional information about
the relative frequency of each gender category. We found that male employees have a better salary package than female employees.
Job Title
2. In the second slide, the bar plot shows the job title distribution. Each bar represents a unique job title, and the height of each bar corresponds to the frequency
(count) of that job title in the dataset. The text labels on top of each bar provide the exact count for each job title category. People
working in a manager position have a good salary package.
Education & Location
3. The donut charts show the education distribution and the location distribution. The percentage labels on each slice provide additional
information about the relative frequency of each category. In summary, these visualizations provide insights into the distribution of categorical
variables in the dataset. The outcome is almost the same across categories: people who studied up to high school and people with a master's degree have a
good salary package, and in the location distribution rural and suburban people earn equally.
Used a heatmap to find correlations.
Insights from the Heatmap:
1. Strong positive correlations (values close to 1) between variables appear as bright red cells in the heatmap.
2. Strong negative correlations (values close to -1) between variables appear as bright blue cells in the heatmap.
3. Weak correlations (values close to 0) appear as cells with colors closer to white or gray.
4. By examining the heatmap, you can identify patterns and relationships between different numerical variables in the dataset. For
example, variables with high positive correlations may indicate dependencies or interactions between them, while variables with high
negative correlations may indicate inverse relationships.
Application and Further Analysis:
1. The heatmap provides valuable insights into the relationships between variables, which can inform feature selection, model building, and
data preprocessing steps in data analysis and machine learning tasks.
2. Further analysis can involve investigating the identified correlations in more detail, exploring causality, and validating the relationships
through additional statistical tests or domain knowledge.
Histogram:
Application and Further Analysis:
1. The histogram provides insights into the age distribution of the dataset, which can inform demographic analysis, segmentation, and
targeted marketing strategies.
2. Further analysis can involve comparing the age distribution across different groups or segments in the dataset, identifying outliers or
anomalies, and assessing the impact of age on other variables or outcomes of interest.
• In summary, the histogram visualization of the age distribution helps in understanding the demographic composition of the dataset and
provides valuable insights for data-driven decision-making and analysis.
• Insights from the Box Plot:
The box plot provides insights into the central tendency (median) and spread of salary values in the dataset. The length of the box (IQR)
indicates the spread of salary values, with longer boxes representing greater variability. The position of the median line within the box indicates
the central tendency of salary values. Outliers, if present, are identified as individual data points beyond the whiskers, suggesting potential
extreme or unusual salary values.
In the box plot, the min salary is 40k, the median is 1.10 lakh and the max is 1.90 lakh.
Scatter plot
The scatter plot of the age-salary relationship provides insights into the patterns and variability in the dataset, facilitating data
exploration and analysis for salary prediction or related tasks.
We found that as age increases the salary package decreases, and that the 20 to 40 age group has an
average salary package but a high frequency. Highly experienced people are less frequent, and only a few of them earn the highest salary
package, which is 2 lakh.
Mean salary for each category
• Insights from the Bar Plots:
The height of each bar indicates the average salary for the corresponding category.
By comparing the heights of bars within each plot, you can identify variations in mean salary across different categories within the same
categorical variable.
Differences in mean salary between categories may suggest potential factors influencing salary variations within the dataset.
Models used:
Used all of these models (Linear Regression, Random Forest, AdaBoost Regressor, Support Vector Regressor, XGBoost Regressor) to measure
accuracy on the dataset. After applying the 5 different models, the XGBoost regression model gave 99% accuracy, so it is the best
model for my dataset.
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf

More Related Content

Similar to Predicting Salary Using Data Science: A Comprehensive Analysis.pdf

Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxdarwinming1
 
Executive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinExecutive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinBetseyCalderon89
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statisticsLamineKaba6
 
Work structure and pay structure - HRM
Work structure   and pay structure - HRMWork structure   and pay structure - HRM
Work structure and pay structure - HRMjalajaAnilkumar
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.Souma Maiti
 
Chapter 03
Chapter 03Chapter 03
Chapter 03bmcfad01
 
Designing Pay Structure.pptx
Designing Pay Structure.pptxDesigning Pay Structure.pptx
Designing Pay Structure.pptxJerome Formalejo
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
6 Cutting-Edge HR Metrics to Measure in 2019
6 Cutting-Edge HR Metrics to Measure in 20196 Cutting-Edge HR Metrics to Measure in 2019
6 Cutting-Edge HR Metrics to Measure in 2019Namely
 
direct marketing in banking using data mining
direct marketing in banking using data miningdirect marketing in banking using data mining
direct marketing in banking using data miningHossein Malekinezhad
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfAmmarAhmedSiddiqui2
 
Statistics
StatisticsStatistics
Statisticspikuoec
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...Smarten Augmented Analytics
 
5 Steps to Master Microsoft Excel: Workbooks
5 Steps to Master Microsoft Excel: Workbooks5 Steps to Master Microsoft Excel: Workbooks
5 Steps to Master Microsoft Excel: WorkbooksTuan Yang
 
Webinar - The State of Remote Work in 2023
Webinar - The State of Remote Work in 2023 Webinar - The State of Remote Work in 2023
Webinar - The State of Remote Work in 2023 PayScale, Inc.
 

Similar to Predicting Salary Using Data Science: A Comprehensive Analysis.pdf (20)

Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
 
Executive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinExecutive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poin
 
Lu2 introduction to statistics
Lu2 introduction to statisticsLu2 introduction to statistics
Lu2 introduction to statistics
 
Work structure and pay structure - HRM
Work structure   and pay structure - HRMWork structure   and pay structure - HRM
Work structure and pay structure - HRM
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
 
Chapter 03
Chapter 03Chapter 03
Chapter 03
 
Designing Pay Structure.pptx
Designing Pay Structure.pptxDesigning Pay Structure.pptx
Designing Pay Structure.pptx
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
6 Cutting-Edge HR Metrics to Measure in 2019
6 Cutting-Edge HR Metrics to Measure in 20196 Cutting-Edge HR Metrics to Measure in 2019
6 Cutting-Edge HR Metrics to Measure in 2019
 
direct marketing in banking using data mining
direct marketing in banking using data miningdirect marketing in banking using data mining
direct marketing in banking using data mining
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdf
 
Statistics
StatisticsStatistics
Statistics
 
Measurement and scaling
Measurement and scalingMeasurement and scaling
Measurement and scaling
 
Bank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptxBank Customer Churn Prediction- Saurav Singh.pptx
Bank Customer Churn Prediction- Saurav Singh.pptx
 
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
5 Steps to Master Microsoft Excel: Workbooks
5 Steps to Master Microsoft Excel: Workbooks5 Steps to Master Microsoft Excel: Workbooks
5 Steps to Master Microsoft Excel: Workbooks
 
Webinar - The State of Remote Work in 2023
Webinar - The State of Remote Work in 2023 Webinar - The State of Remote Work in 2023
Webinar - The State of Remote Work in 2023
 

More from Boston Institute of Analytics

NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesBoston Institute of Analytics
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionBoston Institute of Analytics
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachBoston Institute of Analytics
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationBoston Institute of Analytics
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Boston Institute of Analytics
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Boston Institute of Analytics
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Boston Institute of Analytics
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Boston Institute of Analytics
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...Predicting the Perfect Purchase: Student Presentation on Customer Transaction...
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...Boston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 

More from Boston Institute of Analytics (20)

E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
NLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile PricesNLP Based project presentation: Analyzing Automobile Prices
NLP Based project presentation: Analyzing Automobile Prices
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Analyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning projectAnalyzing Movie Reviews : Machine learning project
Analyzing Movie Reviews : Machine learning project
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud DetectionCombating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
 
Predicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning ApproachPredicting Liver Disease in India: A Machine Learning Approach
Predicting Liver Disease in India: A Machine Learning Approach
 
Employee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project PresentationEmployee Churn Prediction: Artificial Intelligence Project Presentation
Employee Churn Prediction: Artificial Intelligence Project Presentation
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf

  • 2. Introduction:
    • In today's dynamic job market, predicting salaries accurately plays a pivotal role in various aspects of workforce management, recruitment, and financial planning. The ability to estimate salaries based on a range of factors empowers organizations to make informed decisions regarding budget allocation, employee compensation, and talent acquisition strategies. Therefore, the development of robust salary prediction models has become increasingly valuable in modern business operations.
    • The goal of our project is to construct a reliable salary prediction system that leverages machine learning techniques to forecast salaries for individuals based on relevant attributes such as education, experience, skills, and geographic location. By analyzing historical salary data and identifying patterns within the job market, our aim is to create a model capable of providing accurate salary estimates for new job listings or assessing the competitiveness of compensation packages offered by employers.
    • Through this project, we seek to address several key challenges in salary prediction, including the inherent variability in compensation across industries, regions, and job roles, as well as the complex interplay of factors influencing salary determination. By applying advanced machine learning algorithms and feature engineering techniques to large-scale datasets, we aim to develop a predictive model that not only achieves high accuracy but also provides insights into the factors driving salary disparities and trends within the job market.
    • Ultimately, our salary prediction project aims to empower businesses, recruiters, and job seekers alike with actionable insights into salary expectations, thereby facilitating more transparent and equitable negotiations, optimizing resource allocation, and supporting informed decision-making in the realm of human resource management.
  • 3. Problem Statement:
    • In today's competitive job market, accurately predicting salaries for job positions is essential for organizations to make informed decisions regarding budget allocation, compensation strategies, and talent acquisition. However, the task of salary prediction presents several challenges due to the multifaceted nature of salary determinants and the inherent variability within the job market.
    • The primary challenge we aim to address with our salary prediction project is the accurate estimation of salaries for individuals based on a diverse set of attributes, including but not limited to education level, years of experience, specialized skills, industry sector, and geographic location. Additionally, we seek to account for the complex interactions between these factors and their impact on salary levels across different job roles and industries.
    • Furthermore, the availability and quality of data for salary prediction can vary significantly, posing challenges in terms of data preprocessing, feature selection, and model generalization. Additionally, factors such as inflation, market demand, and economic conditions introduce temporal variability that must be accounted for in the prediction process.
    • By developing a robust salary prediction model, our objective is to address these challenges and provide stakeholders with a reliable tool for estimating salaries with a high degree of accuracy and precision. This model will not only aid organizations in optimizing their recruitment and compensation strategies but also assist job seekers in negotiating fair and competitive salaries based on their qualifications and market demand.
    • In summary, our salary prediction project seeks to bridge the gap between employer expectations and candidate aspirations by leveraging machine learning techniques to provide transparent and data-driven salary estimations, thereby facilitating more equitable and informed decision-making in the realm of human resource management.
  • 4. About Dataset
    • The "Salary Prediction Dataset" is a synthetic dataset generated for exploring salary prediction tasks. It contains simulated data reflecting factors that influence salary levels: education, experience, location, job title, age, and gender. It can be used for predictive modeling tasks that estimate salaries from these factors.
    • # Data Collection
    • Salary Prediction Data: predict the salary according to the features (from kaggle.com).
    • Explore, clean, and prepare the dataset:
    • Check the shape of the original dataset: we have 1000 rows and 7 columns in total.
    • Imported the dataset from the CSV file.
  • 5. Detailed steps of data exploration: The process covers the data info, unique values, duplicate values, null values, and a description of the data that shows the count, min, average, and max values.
    1. Info of data
    2. Unique values
    3. Null values
    4. Data describe
    5. Duplicate values
    The data has 0 null values and 0 duplicate values.
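The five exploration steps above can be sketched with pandas as follows. This is a minimal sketch, not the project's notebook: the column names and the five sample rows are assumptions made up for illustration, and the real data would be loaded from the project's CSV file instead of the inline string.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file; column names and values are
# illustrative assumptions, not taken from the slides.
csv_text = """Education,Experience,Location,Job_Title,Age,Gender,Salary
High School,8,Urban,Manager,63,Male,84620
PhD,11,Suburban,Director,59,Male,142591
Bachelor,28,Suburban,Manager,61,Female,97800
High School,29,Rural,Director,45,Male,96834
Master,25,Urban,Analyst,26,Female,132157
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                               # (rows, columns)
df.info()                                     # 1. column dtypes and non-null counts
unique_counts = df.nunique()                  # 2. distinct values per column
null_counts = df.isnull().sum()               # 3. missing values per column
summary = df.describe()                       # 4. count, mean, min, max, quartiles
duplicate_count = int(df.duplicated().sum())  # 5. fully duplicated rows

print(unique_counts, null_counts, summary, duplicate_count, sep="\n\n")
```

On the real dataset, `null_counts.sum()` and `duplicate_count` would both be expected to be 0, matching the result stated on the slide.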
  • 6. Exploratory Data Analysis (EDA): First we plotted a pie chart of the gender value counts. Male records make up 51.60% of the dataset and female records 48.40%. These percentages are shares of records, not salaries, so the pie chart alone does not show which group earns more; the salary comparison by gender comes from the mean-salary plots on slide 13.
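The percentages on the pie chart are value counts expressed as shares of all rows. A minimal sketch, assuming a column named `Gender` (the name is an assumption) and using the counts implied by 1000 rows split 51.60% / 48.40%:

```python
import pandas as pd

# 516 male and 484 female rows are the counts implied by the slide's
# percentages over 1000 rows; the column name "Gender" is an assumption.
gender = pd.Series(["Male"] * 516 + ["Female"] * 484, name="Gender")

# normalize=True turns counts into fractions of all rows.
share = gender.value_counts(normalize=True).mul(100).round(2)
print(share)
```

Because `value_counts` only counts rows, it carries no salary information; any salary comparison needs a separate aggregation over the salary column.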
  • 7.
    • A bar plot is generated to display the "Job Title" distribution.
    • The bar plot offers insight into the frequency of the different job titles. As the plot shows, Manager is the most frequent job title and Engineer the least frequent; Director and Analyst have average frequency.
  • 8.
    • Donut charts are plotted to visualize the education distribution and the location distribution and to look for relationships with salary. For education, the categories have almost equal shares, but people qualified from high school have the highest-paid packages. The location distribution chart suggests that people in rural and suburban areas have high packages.
  • 9. A heatmap is generated to visualize the correlation matrix of the entire dataset, providing a comprehensive overview of the relationships between numerical variables. Each cell in the heatmap corresponds to the correlation coefficient between the variables represented by its row and column. The color intensity indicates the strength and direction of the correlation:
    1. Dark blue indicates a strong negative correlation.
    2. Dark red indicates a strong positive correlation.
    3. White indicates no correlation (correlation coefficient close to zero).
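The matrix behind such a heatmap can be computed directly with pandas. The sketch below is illustrative only: the column names and the six rows are made-up assumptions, and the non-numeric column is dropped before computing Pearson correlations.

```python
import pandas as pd

# Made-up sample rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "Age": [25, 32, 41, 48, 56, 63],
    "Experience": [2, 8, 15, 20, 27, 29],
    "Salary": [52000, 71000, 98000, 90000, 120000, 84000],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

Passing `corr` to a plotting call such as seaborn's `heatmap` with a diverging colormap centred at zero would give the red/blue picture described above.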
  • 10. A stacked column chart is plotted for the "Age Distribution", offering insight into the frequency of the different age groups.
    1. In the plot we can see that the 25 to 45 age group has a high frequency of good salary packages.
    2. The other age groups have average salary packages.
    3. Overall, the middle age group has the high salary packages, while the 30 to 40 and 50 to 60 age groups have lower salary packages.
  • 11. Generated a box plot to visualize the distribution of salary.
    • Found the min, median, and max values of the salary package in the dataset.
    • As shown in the box plot, the min salary is 40k.
    • The median salary package is 1.5 lakhs.
    • The max salary package is 1.9 lakhs.
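The numbers a box plot draws can be computed without plotting. A minimal sketch with made-up salary values (only the 40k minimum and 1.9 lakh maximum mirror the slide; everything else is an assumption), using the common 1.5 × IQR rule for whiskers:

```python
import numpy as np

# Made-up salaries for illustration; not the project's data.
salaries = np.array([40_000, 70_000, 95_000, 120_000, 140_000, 160_000, 190_000])

q1, median, q3 = np.percentile(salaries, [25, 50, 75])
iqr = q3 - q1                      # length of the box
lower_fence = q1 - 1.5 * iqr       # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr       # points above this are drawn as outliers
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)]

print(salaries.min(), q1, median, q3, salaries.max(), outliers)
```

Reading the median from `np.percentile` (or `Series.median()` on the real salary column) is more reliable than estimating it by eye from the plot.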
  • 12. Two scatter plots are used:
    1. A scatter plot to display the relationship between age and salary.
    2. A scatter plot to display the relationship between experience and salary.
    Observations:
    1. In the first scatter plot, most people across age groups have a salary package between 1 lakh and 1.2 lakh.
    2. Highly experienced people are less frequent, and only a few of them have a high salary package of 2 lakh.
    3. Older people with more experience get a lower salary package.
  • 13. #Mean salary for each category
    Generated multiple plots to show the mean salary of each category.
    1. In the education category, people with a master's degree have the highest mean salary.
    2. In the location category, people located in suburban areas have the highest mean salary.
    3. In the job title category, the manager role has the highest mean salary.
    4. In the gender category, the salary package is distributed equally.
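Mean salary per category is a group-by aggregation. The sketch below uses six made-up rows and assumed column names (`Education`, `Gender`, `Salary`), so the resulting numbers say nothing about the real dataset:

```python
import pandas as pd

# Made-up rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "Education": ["Master", "Bachelor", "Master", "PhD", "Bachelor", "PhD"],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Salary": [100000, 80000, 110000, 130000, 90000, 120000],
})

# One Series of mean salaries per categorical column.
mean_by = {col: df.groupby(col)["Salary"].mean() for col in ["Education", "Gender"]}

for col, means in mean_by.items():
    print(means.sort_values(ascending=False), end="\n\n")
```

Each Series in `mean_by` holds the bar heights for one of the mean-salary plots, so categories can be compared numerically as well as visually.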
  • 14. #Data distribution using pair plots
    Two pair plots are drawn: "Age", "Experience", and "Salary" grouped by "Education", and "Age", "Experience", and "Salary" grouped by "Job Title". The first pair plot visualizes the relationships between "Age", "Experience", and "Salary" for different levels of "Education". Each scatter plot in the pair plot represents the relationship between two variables, and the diagonal contains histograms showing the distribution of each variable.
  • 15. #Encoding data
    1. Import the LabelEncoder class from scikit-learn and iterate over each categorical variable in the list 'categorical'.
    2. Check the data types of all columns in the DataFrame. There are 2 types of data: int and object.
    3. Use df.head() to inspect the first rows and columns of the data.
    # Splitting the data into train and test sets
    # Scaling the data
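The encode, split, and scale steps above can be sketched with scikit-learn as follows. The eight rows, the column names, the 25% test size, and the choice of `StandardScaler` are all assumptions for illustration; the slides do not state which scaler or split ratio was used.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows; column names are assumptions for illustration.
df = pd.DataFrame({
    "Education": ["High School", "Bachelor", "Master", "PhD",
                  "Bachelor", "Master", "High School", "PhD"],
    "Location": ["Urban", "Rural", "Suburban", "Urban",
                 "Rural", "Suburban", "Urban", "Rural"],
    "Gender": ["Male", "Female", "Male", "Female",
               "Male", "Female", "Male", "Female"],
    "Experience": [2, 5, 9, 14, 18, 22, 26, 29],
    "Age": [24, 29, 35, 41, 46, 52, 58, 63],
    "Salary": [45000, 60000, 82000, 110000, 95000, 125000, 70000, 140000],
})

# Encode each categorical column as integers, keeping the fitted encoders.
categorical = ["Education", "Location", "Gender"]
encoders = {}
for col in categorical:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop(columns="Salary")
y = df["Salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```

`LabelEncoder` assigns integers in alphabetical order, which imposes an arbitrary ordering on categories such as location; scikit-learn's `OneHotEncoder` or `OrdinalEncoder` are common alternatives for input features.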
  • 16. Linear Regression
    The linear regression method is not suitable for my dataset; it shows low accuracy (0.57%). The metric is R-squared, a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. It is a key metric used to evaluate the goodness of fit of the model to the observed data.
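The definition of R-squared above corresponds to 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with made-up numbers (the helper name `r_squared` is hypothetical; scikit-learn's `r2_score` computes the same quantity):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions for illustration.
y_true = [50, 60, 70, 80]
y_pred = [52, 58, 71, 79]
score = r_squared(y_true, y_pred)
print(round(score, 2))  # 0.98
```

A model that always predicts the mean scores 0, a perfect model scores 1, and a model worse than the mean scores below 0, so R-squared is a goodness-of-fit measure rather than a classification-style accuracy.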
  • 17. Random Forest
    Used a Random Forest model with hyperparameter tuning; it shows 97% accuracy.
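One common way to tune a random forest is a cross-validated grid search. The sketch below is an assumption about how such tuning could look, not the project's actual search: the data is synthetic, and the grid values are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: two numeric features and a noisy salary-like target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(120, 2))
y = 40_000 + 3_000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 2_000, size=120)

# Arbitrary illustrative grid; real searches usually cover more values.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,            # 3-fold cross-validation for each combination
    scoring="r2",    # same metric as the slides' R-squared
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated R-squared of the best combination, which is a more conservative estimate than a score measured on the training data.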
  • 18. AdaBoost Regressor
    The AdaBoost regression model gave 81% accuracy. We will try another model to get better accuracy.
  • 19. Support Vector Regressor
    The support vector regressor (SVR) provides flexible options for customizing the model, including the choice of kernel function, regularization parameter, and other hyperparameters. The support vector regressor gave 2.14% accuracy, so this model is not suitable for my dataset.
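The hyperparameters mentioned above map to the `kernel`, `C`, and `epsilon` arguments of scikit-learn's `SVR`. The sketch below sets them explicitly and, on synthetic data, compares a fit on the raw target with a fit where features and target are standardized. Whether target scale explains the low score reported on the slide is an assumption; the sketch only shows that SVR can be sensitive to it.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with a salary-scale target (tens of thousands).
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(200, 2))
y = 40_000 + 3_000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 2_000, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# SVR fitted directly on the raw features and raw target.
raw_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)

# Same SVR settings, but features and target are standardized first.
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1)),
    transformer=StandardScaler(),
).fit(X_train, y_train)

raw_r2 = raw_svr.score(X_test, y_test)
scaled_r2 = scaled_svr.score(X_test, y_test)
print(round(raw_r2, 3), round(scaled_r2, 3))
```

With `C=1.0` the raw model can move its predictions only slightly away from a constant, so on a target measured in tens of thousands its R-squared stays near zero, while the standardized version fits the same data much better.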
  • 20. XGBoost Regressor
    After applying 5 different models, the XGBoost regression model gave 99% accuracy, so this is the best model for my dataset.
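Comparing several regressors on the same split can be sketched as a loop over models. This is an illustration under assumptions, not the project's code: the data is synthetic, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost so that the sketch needs no extra package.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data with a mildly non-linear salary-like target.
rng = np.random.default_rng(1)
X = rng.uniform(0, 30, size=(300, 3))
y = 40_000 + 3_000 * X[:, 0] + 40 * X[:, 1] ** 2 + rng.normal(0, 3_000, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=1),
    "AdaBoost": AdaBoostRegressor(random_state=1),
    "SVR": SVR(),
    # Stand-in for XGBoost; not the model used on the slides.
    "Gradient Boosting": GradientBoostingRegressor(random_state=1),
}

# Fit every model on the training split and score R-squared on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

best = max(scores, key=scores.get)
for name, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
print("best:", best)
```

If the `xgboost` package is installed, its `XGBRegressor` could replace the gradient-boosting stand-in. Scoring on a held-out test split, as here, helps check that a very high score is not just a fit to the training data.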
  • 21. Conclusion and Insights:
    1. Gender: The pie chart reveals the percentage distribution of gender. Each slice of the pie represents a gender category, and the size of each slice corresponds to the proportion of that category in the dataset. The percentage labels on each slice provide additional information about the relative frequency of each gender category. We found that male employees have a better salary package than female employees.
    2. Job Title: In the second slide, each bar of the bar plot represents a unique job title, and the height of each bar corresponds to the frequency (count) of that job title in the dataset. The text labels on top of each bar provide the exact count for each job title category. People working in manager positions have a good salary package.
    3. Education & Location: The donut charts show the education distribution and the location distribution. The percentage labels on each slice provide additional information about the relative frequency of each category. In summary, these visualizations provide insights into the distribution of categorical variables in the dataset. The outcome is almost the same across categories: people who studied up to high school and people with a master's degree have good salary packages, and rural and suburban people earn equally.
    A heatmap was used to find correlations. Insights from the heatmap:
    1. Strong positive correlations (values close to 1) between variables appear as bright red cells in the heatmap.
    2. Strong negative correlations (values close to -1) between variables appear as bright blue cells in the heatmap.
    3. Weak correlations (values close to 0) appear as cells with colors closer to white or gray.
    4. By examining the heatmap, you can identify patterns and relationships between different numerical variables in the dataset.
    For example, variables with high positive correlations may indicate dependencies or interactions between them, while variables with high negative correlations may indicate inverse relationships.
  • 22. Application and Further Analysis:
    1. The heatmap provides valuable insights into the relationships between variables, which can inform feature selection, model building, and data preprocessing steps in data analysis and machine learning tasks.
    2. Further analysis can involve investigating the identified correlations in more detail, exploring causality, and validating the relationships through additional statistical tests or domain knowledge.
    Histogram: Application and Further Analysis:
    1. The histogram provides insights into the age distribution of the dataset, which can inform demographic analysis, segmentation, and targeted marketing strategies.
    2. Further analysis can involve comparing the age distribution across different groups or segments in the dataset, identifying outliers or anomalies, and assessing the impact of age on other variables or outcomes of interest.
    • In summary, the histogram visualization of the age distribution helps in understanding the demographic composition of the dataset and provides valuable insights for data-driven decision-making and analysis.
    • Insights from the Box Plot: The box plot provides insights into the central tendency (median) and spread of salary values in the dataset. The length of the box (IQR) indicates the spread of salary values, with longer boxes representing greater variability. The position of the median line within the box indicates the central tendency of salary values. Outliers, if present, are identified as individual data points beyond the whiskers, suggesting potential extreme or unusual salary values. In the box plot the min salary is 40k, the median is 1.10 lakh, and the max is 1.90 lakh.
  • 23. Scatter plot
    The scatter plot visualization of the age-salary relationship provides insights into the patterns and variability in the dataset, facilitating data exploration and analysis for salary prediction or related tasks. We found that as age increases, the salary package decreases, and that the 20 to 40 age group has an average salary package but a high frequency. Highly experienced people are less frequent, and only a few of them have a high salary package of 2 lakh.
    Mean salary for each category
    • Insights from the Bar Plots: The height of each bar indicates the average salary for the corresponding category. By comparing the heights of bars within each plot, you can identify variations in mean salary across different categories within the same categorical variable. Differences in mean salary between categories may suggest potential factors influencing salary variations within the dataset.
    Models used: All five models (Linear Regression, Random Forest, AdaBoost Regressor, Support Vector Regressor, and XGBoost Regressor) were applied to measure accuracy on the dataset. After applying the 5 models, the XGBoost regression model gave 99% accuracy, so this is the best model for my dataset.