UCI Heart Disease Prediction Data
By Supriya Kamble
Introduction
Cardiovascular diseases have been the most common cause of death worldwide over the last few decades, in
developed as well as underdeveloped and developing countries. Early detection of cardiac disease and
continuous supervision by clinicians can reduce the mortality rate. However, it is not possible to monitor
every patient accurately every day, and round-the-clock consultation with a doctor is not available,
since it requires more time and expertise than clinicians can provide.
Every day, the average human heart beats around 100,000 times, pumping 2,000 gallons of blood through the
body's 60,000 miles of blood vessels. That is a lot of work for an organ that is about the size of a large
fist and weighs between 8 and 12 ounces.
The signs of a heart attack are often much less noticeable in women than in men. In women, a heart attack
may feel like uncomfortable squeezing, pressure, fullness, or pain in the center of the chest. It may also
cause pain in one or both arms, the back, neck, jaw, or stomach, as well as shortness of breath, nausea,
and other symptoms.
Men typically experience chest pain, discomfort, and stress. They may also experience pain in other areas,
such as the arms, neck, back, and jaw, as well as shortness of breath, sweating, and discomfort that
mimics heartburn.
Objective of Data
The objective of the UCI Heart Disease dataset is to facilitate research and analysis aimed at developing
predictive models for the detection and assessment of heart disease. Specifically, the dataset aims to:
• Enable Prediction: Provide a diverse set of medical attributes and corresponding diagnoses to enable
the development of machine learning models capable of predicting the likelihood of heart disease in
patients.
• Support Research: Serve as a valuable resource for researchers and data scientists interested in
studying the factors associated with heart disease and exploring novel approaches to its diagnosis and
treatment.
• Promote Healthcare Innovation: Foster innovation in healthcare by empowering healthcare providers,
businesses, and policymakers with data-driven insights into heart disease risk assessment and
management.
• Improve Patient Outcomes: Ultimately, the primary objective of the dataset is to contribute to the
improvement of patient outcomes by facilitating early detection, intervention, and personalized
treatment of heart disease.
How data can help businesses
1) Healthcare Providers: Hospitals and clinics can use these models to assess the risk of heart disease in
patients during routine check-ups. This can lead to early detection and intervention, ultimately
improving patient outcomes and reducing healthcare costs.
2) Insurance Companies: Insurance companies can utilize these models to assess the risk of heart
disease in their policyholders. By identifying high-risk individuals, they can offer targeted
interventions or wellness programs to mitigate the risk and reduce claims.
3) Pharmaceutical Companies: Pharmaceutical companies can use predictive models to identify
potential candidates for clinical trials of new drugs aimed at preventing or treating heart disease. This
can streamline the drug development process and bring new treatments to market more efficiently.
4) Healthtech Startups: Startups focused on digital health and wellness can develop applications or
wearable devices that utilize heart disease prediction models to provide personalized health
recommendations to users. This can empower individuals to take proactive steps toward preventing
heart disease.
Real-life Applications
1) Clinical Decision Support: Healthcare professionals can use these models as decision-support tools
during patient consultations. By inputting patient data into the model, clinicians can obtain risk scores
and recommendations for further evaluation or treatment.
2) Public Health Initiatives: Public health authorities can utilize predictive models to identify
populations at high risk of heart disease and implement targeted prevention strategies, such as
educational campaigns, screening programs, or policy interventions.
3) Remote Monitoring: Remote monitoring devices equipped with heart disease prediction algorithms
can continuously monitor individuals at risk and alert them or their caregivers of any significant
changes or warning signs, enabling timely medical intervention.
4) Personalized Medicine: Predictive models can facilitate the shift towards personalized medicine by
enabling healthcare providers to tailor treatment plans based on an individual's risk profile and
genetic predisposition to heart disease.
About Dataset
• This is a multivariate dataset: each patient record involves several statistical variables, so the
analysis is a multivariate one.
• The commonly used subset has 14 attributes: age, sex, chest pain type, resting blood pressure, serum
cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved,
exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the
peak exercise ST segment, number of major vessels, Thalassemia, and the predicted attribute (num).
• The full database includes 76 attributes, but all published studies use a subset of 14 of them.
One major task for this dataset is to predict, from the given attributes of a patient, whether
that person has heart disease. The other is exploratory: to find insights in the data that
help in understanding the problem.
Column Descriptions
1) id: Unique identifier for each patient
2) age: Age of the patient in years
3) origin: Place of study
4) sex: Gender of the patient
5) cp: Chest pain type (typical angina, atypical angina, non-anginal, asymptomatic)
6) trestbps: Resting blood pressure (mm Hg on admission)
7) chol: Serum cholesterol level (mg/dl)
8) fbs: Whether fasting blood sugar is > 120 mg/dl (True/False)
9) restecg: Resting electrocardiographic results
   (values: normal, ST-T abnormality, left ventricular hypertrophy)
10) thalch: Maximum heart rate achieved
11) exang: Exercise-induced angina (True/False)
12) oldpeak: ST depression induced by exercise relative to rest
13) slope: Slope of the peak exercise ST segment
14) ca: Number of major vessels colored by fluoroscopy (0-3)
15) thal: Thalassemia diagnosis (normal, fixed defect, reversible defect)
16) num: Predicted attribute indicating presence of heart disease (0 = no disease, 1 to 4 = disease categories)
Challenges
1) Data Quality: Ensuring the accuracy and reliability of the medical data is crucial for building effective
prediction models. Incomplete or inaccurate data can lead to biased or unreliable predictions.
2) Feature Selection: Identifying the most relevant features or attributes from the dataset that
contribute to the prediction of heart disease is essential. This requires domain knowledge and careful
analysis of the data.
3) Imbalanced Data: Imbalance in the distribution of classes (i.e., presence or absence of heart disease)
can affect the performance of machine learning algorithms. Techniques such as oversampling, under-
sampling, or using algorithms that handle imbalanced data well are necessary to address this issue.
4) Interpretability: Building models that not only provide accurate predictions but also offer insights into
the factors contributing to the prediction is important for gaining trust from healthcare professionals
and patients.
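As a rough illustration of the oversampling idea mentioned under Imbalanced Data, the sketch below duplicates randomly chosen minority-class rows until every class is as large as the biggest one. The function name random_oversample and the toy rows are made up for this example and are not taken from the deck.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the largest class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical toy data: four patients without disease (0), two with disease (1).
rows = [[54], [61], [47], [58], [66], [39]]
labels = [0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Resampling of this kind is normally applied to the training split only, so that duplicated rows cannot leak into the test set.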
Data Understanding
Begin by loading the dataset into Python. Verify that the dataset loaded correctly and
examine the first few rows to get a glimpse of the data structure.
The dataset has 920 rows and 16 attributes; num is the dependent variable for which we
have to make the prediction.
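The loading and first-look step can be sketched with the standard library alone. The three-row CSV excerpt below is a hypothetical stand-in for the real 920-row file, and load_rows is a helper name invented here; the deck most likely used pandas, whose read_csv, shape, and head cover the same ground.

```python
import csv
import io

# Hypothetical three-row excerpt standing in for the real 920-row file.
SAMPLE_CSV = """id,age,sex,cp,trestbps,chol,num
1,63,Male,typical angina,145,233,0
2,67,Male,asymptomatic,160,286,2
3,41,Female,atypical angina,130,,0
"""

def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())

# Shape check: (number of rows, number of columns).
shape = (len(rows), len(columns))
print("shape:", shape)

# Peek at the first rows to confirm the columns were parsed as expected.
for row in rows[:2]:
    print(row)
```

Note that csv.DictReader returns every value as a string and represents a missing cell as an empty string, so numeric columns still need converting before analysis.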
Dataset Overview
Based on the summary above, it appears that the data
consists of a total of 920 observations. However, many
features in this dataset have missing values, including
trestbps, chol, fbs, restecg, thalch, exang, oldpeak, slope, ca,
and thal. In addition, the dataset contains both numeric and
categorical variables.
Exploring Numerical and Categorical Features
Exploratory Data Analysis (EDA)
Categorical Features – Countplot
Numerical Features – histplot
Outlier Detection
Based on the box plot above, trestbps, chol, and thalch exhibit outliers, especially chol. In contrast, age and
exang are two features that do not have outliers.
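Box plots conventionally flag points that lie more than 1.5 times the interquartile range beyond the quartiles. Assuming that rule, a minimal sketch of the same check in code follows; the cholesterol values are invented for illustration and iqr_outliers is a hypothetical helper name.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical cholesterol readings (mg/dl) with one extreme value.
chol = [200, 210, 220, 230, 240, 250, 260, 603]
outliers = iqr_outliers(chol)
print(outliers)
```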
Pattern of Missingness
• Based on the heatmap above, missing
values appear intensively starting
from the 300th row.
• The top three variables with the
highest number of observations with
missing values are slope, ca, and thal.
• So far, it does not look like the
missing values are distributed
randomly.
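The counts behind a missingness heatmap can be reproduced with a short sketch. The four rows below are hypothetical, with None marking a missing cell, and missing_counts is a name chosen for this example.

```python
def missing_counts(rows):
    """Count missing (None) cells per column."""
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] += 1
    return counts

# Hypothetical rows; None marks a missing value.
rows = [
    {"trestbps": 145, "slope": "flat", "ca": 0, "thal": "normal"},
    {"trestbps": 130, "slope": None, "ca": None, "thal": None},
    {"trestbps": None, "slope": None, "ca": None, "thal": "normal"},
    {"trestbps": 120, "slope": "upsloping", "ca": None, "thal": None},
]

counts = missing_counts(rows)
# Columns ordered from most to least missing.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)

# A text version of the heatmap: one line per row, X where a cell is missing.
pattern = ["".join("X" if row[col] is None else "." for col in row) for row in rows]
print("\n".join(pattern))
```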
Correlation Matrix
• From the heatmap above, we
observe a strong relationship of
missing values between thalch
and trestbps, exang and
trestbps, oldpeak and trestbps,
etc.
• Once again, the pattern of
missing values among variables
does not appear random.
• As we mentioned above, the
dataset includes 15 variables.
However, at least 10 variables
have missing values.
• Hence, we will apply 2
imputation methods
(Median/Mode imputation and
Random Forest imputation) to
fill in the missing values.
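One way to quantify how strongly two variables go missing together is to correlate their 0/1 missingness indicators. The sketch below assumes this is what the heatmap shows; the rows are hypothetical and pearson is a small helper written for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

# Hypothetical rows; None marks a missing value.
rows = [
    {"trestbps": 145, "thalch": 150, "chol": 233},
    {"trestbps": None, "thalch": None, "chol": 250},
    {"trestbps": 130, "thalch": 172, "chol": None},
    {"trestbps": None, "thalch": None, "chol": 204},
]

# 1 where the value is missing, 0 where it is present.
indicators = {col: [1 if row[col] is None else 0 for row in rows] for col in rows[0]}

r_same = pearson(indicators["trestbps"], indicators["thalch"])
r_diff = pearson(indicators["trestbps"], indicators["chol"])
print(r_same, r_diff)
```

A value near 1 means the two columns tend to be missing in the same rows, which is the kind of non-random pattern described above.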
Imputing Missing Values
Median/Mode Imputation
We will start with the simplest imputation method, median/mode imputation, to fill in the missing values.
If a feature is numerical, we fill in its missing values with the median. For categorical features,
we use the mode to replace the missing values.
Numeric variables ==> median value
Categorical variables ==> mode value
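The median/mode rule above can be sketched as follows. The column values are hypothetical, None marks a missing entry, and impute_column is a name invented for this example rather than the deck's own code.

```python
from statistics import median, mode

def impute_column(values):
    """Fill None entries with the median (numeric column) or the mode (categorical column)."""
    observed = [v for v in values if v is not None]
    # Assumes at least one observed value; median() and mode() raise on empty input.
    if all(isinstance(v, (int, float)) for v in observed):
        fill = median(observed)
    else:
        fill = mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical columns with missing entries.
chol = [233, None, 250, 204, None]
thal = ["normal", None, "reversible defect", "normal", None]

chol_filled = impute_column(chol)
thal_filled = impute_column(thal)
print(chol_filled)
print(thal_filled)
```

Because Python treats booleans as integers, a True/False column would take the numeric branch in this sketch; such columns would need to be routed to the mode explicitly.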
Bivariate Analysis
Distribution of Age Among Patients with and without Heart Disease
People between the ages of 40 and 70 are the most affected by heart disease.
Heart Disease Prevalence by Sex
Men are more susceptible to heart disease at all levels.
Relationship Between Cholesterol Levels and Heart Disease
• The box plot illustrates cholesterol
levels across five heart disease
categories, showing median
values, range variability, and
outliers.
• Categories 1 to 4 have similar
medians, but the spread and
outliers differ, with category 0
showing the most variability
Maximum Heart Rate and Heart Disease
The plot shows a negative
correlation where the maximum
heart rate tends to decrease as
age increases.
The Impact of Exercise-Induced Angina on Heart Disease
• Most cases in category 0 do not
report angina, while categories 1
through 4 show a more varied
distribution, with both angina
and non-angina cases present.
• The data suggests that exercise-
induced angina is more
commonly reported in individuals
with heart disease categories 1
to 4 compared to category 0.
Average Resting Blood Pressure by Heart Disease Status
• All categories show similar
average blood pressures ranging
slightly above 120 mm Hg.
• The error bars indicate some
variability in the measurements,
with a slight trend toward
increasing variability from status
0 to 4.
Distribution of Chest Pain Type among Patients
• 'Asymptomatic' is the most common
type of chest pain across all heart
disease statuses except for status 0,
where 'typical angina' is more
prevalent.
• 'Non-anginal' pain is notably
frequent in heart disease status 4,
while 'atypical angina' is relatively
less common across all statuses.
Fasting Blood Sugar and Heart Disease
• The majority of individuals across all
heart disease statuses have fasting
blood sugar levels at or below 120
mg/dl.
• For those with higher blood sugar
levels, the counts are notably lower,
suggesting that elevated fasting blood
sugar is less common among these
individuals regardless of their heart
disease status.
Heart Disease Prevalence by Resting Electrocardiographic Results
• Most individuals with a normal
ECG result fall into the '0' heart
disease category, indicating no
presence of heart disease.
• In contrast, those with ST-T
abnormalities show a higher
count of heart disease statuses 1
through 4.
• Left ventricular hypertrophy is
less common but shows some
presence across all heart disease
categories.
Data Preprocessing
Some of the features have categorical values, so we have to one-hot encode them. The original dataset also
encodes the target as 0, 1, 2, 3, 4. Because we only want to identify the presence of disease, we treat the
problem as binary classification and convert all the target values in the num column into 1/0.
One-Hot Encoding
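A minimal sketch of both preprocessing steps follows. It assumes that num = 0 maps to 0 (no disease) and any value from 1 to 4 maps to 1, which is the usual reading of this target; the helper names and sample values are hypothetical.

```python
def binarize_target(num):
    """Collapse the 0-4 target into 0 (no disease) or 1 (any disease category)."""
    return 1 if num > 0 else 0

def one_hot(values, prefix):
    """Return (column names, rows of 0/1 flags) with one column per distinct category."""
    categories = sorted(set(values))
    names = [f"{prefix}_{c}" for c in categories]
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return names, encoded

# Hypothetical target values and chest pain types.
num = [0, 2, 1, 0, 4]
target = [binarize_target(n) for n in num]

cp = ["typical angina", "asymptomatic", "non-anginal", "asymptomatic"]
names, encoded = one_hot(cp, "cp")

print(target)
print(names)
print(encoded)
```

With pandas, the same encoding is usually done with get_dummies; the hand-written version is shown only to make the transformation explicit.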
Splitting the Dependent and Independent Features
We separate the dependent and independent features and split them with train_test_split from the sklearn
library, using an 80-20 train-test ratio.
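To show what an 80-20 split does, here is a dependency-free sketch. The function split_rows and the toy data are invented for this example; with sklearn the equivalent call is train_test_split(X, y, test_size=0.2), usually with a fixed random_state for reproducibility.

```python
import random

def split_rows(X, y, test_size=0.2, seed=42):
    """Shuffle row indices, then hold out a test_size fraction as the test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Hypothetical data: ten patients with a single feature (age) and a binary label.
X = [[age] for age in range(30, 40)]
y = [0, 1] * 5

X_train, X_test, y_train, y_test = split_rows(X, y)
print(len(X_train), len(X_test))
```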
Feature Scaling
• Normalization
Min-max normalization is used to normalize the data. This method rescales each feature to the range [0, 1].
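Min-max normalization maps each value x to (x - min) / (max - min). The sketch below assumes the minimum and maximum are learned from the training data and then reused on the test data; the helper names and cholesterol values are hypothetical.

```python
def fit_min_max(values):
    """Learn the minimum and maximum from the training values."""
    return min(values), max(values)

def transform_min_max(values, lo, hi):
    """Rescale values with x' = (x - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

# Hypothetical cholesterol values (mg/dl).
train_chol = [200, 250, 300]
test_chol = [225, 350]

lo, hi = fit_min_max(train_chol)
train_scaled = transform_min_max(train_chol, lo, hi)
test_scaled = transform_min_max(test_chol, lo, hi)
print(train_scaled)
print(test_scaled)
```

Training values land exactly in [0, 1]. Test values can fall outside that range when they exceed the training extremes, as the second test value does here.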
Machine Learning Model
Logistic Regression
In the figure above, the red dots represent the predicted values, which are either 0 or 1, and the blue line and
dots represent the actual value for each patient. Where a red dot and a blue dot do not overlap, the prediction
is wrong; where both dots overlap, the prediction is right.
Model Evaluation
• Logistic regression gave an accuracy of 77.71%.
• From the confusion matrix, we can say the model can classify whether the disease is present or not. However,
the false positives and false negatives are also high. To reduce them, we will fit another classification model.
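The quantities read off a confusion matrix can be computed directly from the true and predicted labels. The labels below are hypothetical and are not the deck's results; the function names are chosen for this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = summarize(tp, fp, fn, tn)
print(tp, fp, fn, tn, metrics)
```

In a screening setting, false negatives (missed disease) are usually the costlier error, so recall is worth tracking alongside accuracy.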
A ROC curve, or receiver operating characteristic curve, is a graph that shows how well a classification
model performs across decision thresholds by plotting the true positive rate against the false positive rate.
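A ROC curve can be built by sweeping a threshold over the predicted scores and recording the false positive rate and true positive rate at each step; the area under it (AUC) then follows from the trapezoidal rule. The labels and scores below are hypothetical, and the function names are chosen for this sketch.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a case is predicted positive when score >= threshold.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical true labels and predicted probabilities of disease.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking of diseased above non-diseased cases.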
Coefficients
Logistic regression computes a weighted sum of the different features, where each coefficient is the weight
of one feature, and passes that sum through the sigmoid function to obtain a probability.
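A logistic regression prediction can be sketched as a weighted sum of the features passed through the sigmoid function. The coefficients, intercept, and feature values below are invented for illustration and are not the fitted values from the deck.

```python
from math import exp

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def predict_proba(features, coefficients, intercept):
    """Probability of the positive class: sigmoid(intercept + sum of weight * feature)."""
    z = intercept + sum(w * x for w, x in zip(coefficients, features))
    return sigmoid(z)

# Hypothetical coefficients for two scaled features.
coefficients = [0.8, -0.5]
intercept = -0.1
features = [1.0, 0.2]

probability = predict_proba(features, coefficients, intercept)
prediction = 1 if probability >= 0.5 else 0
print(round(probability, 3), prediction)
```

A positive coefficient pushes the predicted probability of disease up as the feature grows, and a negative one pushes it down, which is how a coefficient plot is read.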
Random Forest Classifier
Random Forest gave an accuracy of 79.34%, which is better than Logistic Regression. The precision,
recall, and F1 scores also improved over the previous model.
Naïve Bayes
Naïve Bayes gave an accuracy of 77.7%, which is the same as Logistic Regression. The precision,
recall, and F1 scores have improved in this model.
Gradient Boosting Classifier
Gradient Boosting has performed better than all the models so far, with an accuracy of 80.43%. The
model also classifies whether the disease is present or not more accurately.
XGBoost Classifier
After applying the XGBoost classifier, the true positives and true negatives in the confusion matrix increased
compared with the previous model.
LightGBM
Here, the accuracy increased to 81.52%, and the false
negatives and false positives decreased, so the model
classifies the cases more reliably.
Hyperparameter Tuning
Hyperparameters are external configurations that guide the learning process but are not learned from the
data. Hyperparameter tuning is the systematic optimization of these settings to enhance a model's performance.
The process often employs techniques like grid search, which explores different combinations of hyperparameter
values to find the set that maximizes model accuracy or another performance metric.
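The grid-search idea can be sketched without any library: enumerate every combination in a parameter grid, score each one, and keep the best. The toy rule (predict disease when oldpeak reaches a cutoff), the grid values, and the validation rows are all invented for illustration; the deck's actual XGBoost and LightGBM grids are not shown in the source.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical validation data: oldpeak values and binary disease labels.
oldpeak = [0.0, 0.2, 0.8, 1.2, 1.6, 2.5]
y_val = [0, 0, 0, 1, 1, 1]

def accuracy(params):
    """Validation accuracy of a toy rule: predict 1 when oldpeak >= cutoff, optionally inverted."""
    preds = [int((x >= params["cutoff"]) != params["invert"]) for x in oldpeak]
    return sum(p == t for p, t in zip(preds, y_val)) / len(y_val)

param_grid = {"cutoff": [0.5, 1.0, 1.5, 2.0], "invert": [False, True]}
best_params, best_score = grid_search(param_grid, accuracy)
print(best_params, best_score)
```

Library implementations such as sklearn's GridSearchCV follow the same loop but score each combination with cross-validation rather than a single validation set.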
The accuracy of XGBoost did not
improve after hyperparameter
tuning.
The accuracy of LightGBM also did not improve.
Model Selection
• The accuracy of both XGBoost and LightGBM did not increase after hyperparameter tuning.
However, LightGBM has a high accuracy of about 82% and was able to classify the classes correctly.
Therefore, LightGBM is the best model for the heart disease prediction data.
• As per the results, the model has a precision score of around 82%, which is quite acceptable for predicting
heart disease in an individual based on age, sex, cp, trestbps, chol, fbs, restecg,
thalch, exang, oldpeak, slope, ca, and thal.
Summary
1) The patients' ages range from 29 to 77 years, with an average age of 54.
2) The majority of the patients are male (75.9%), and the most common type of chest pain experienced by
the patients is typical angina (39.6%).
3) The average resting blood pressure is 131.6 mmHg and the average cholesterol level is 246 mg/dL.
4) The average maximum heart rate achieved during exercise is 139.9 bpm.
5) Most patients (70.3%) do not experience exercise-induced angina.
6) The average ST depression induced by exercise is 1.04 mm, and the majority of the patients (54.8%) have a
normal ECG result.
7) Several classification models were trained and evaluated, including Logistic Regression, Random Forest,
Naive Bayes, Gradient Boosting, XGBoost, and LightGBM.
8) The LightGBM model achieved the highest accuracy of 80.97% after hyperparameter tuning.
9) The ROC curves and AUC scores for each model were analyzed to assess their performance.
10) The results suggest that the XGBoost and LightGBM models are suitable for predicting the presence or
absence of heart disease based on the available features.
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
Nmap project presentation : Unlocking Network Secrets: Mastering Port Scannin...
 
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
Cyber Security Project Presentation : Essential Reconnaissance Tools and Tech...
 
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
Identifying and Eradicating Web Application Vulnerabilities : Cyber Security ...
 
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
Cyber Security Project Presentation: Unveiling Reconnaissance Tools and Techn...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...Predicting the Perfect Purchase: Student Presentation on Customer Transaction...
Predicting the Perfect Purchase: Student Presentation on Customer Transaction...
 

Recently uploaded

Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 

NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data Science.pdf

  • 2. Introduction Cardiovascular diseases have been the most common cause of death worldwide over the last few decades, in developed as well as underdeveloped and developing countries. Early detection of cardiac diseases and continuous supervision by clinicians can reduce the mortality rate. However, it is not possible to monitor every patient accurately every day, and round-the-clock consultation with a doctor is not available, since it requires considerable judgment, time, and expertise. Every day, the average human heart beats around 100,000 times, pumping 2,000 gallons of blood through the body. Inside your body, there are 60,000 miles of blood vessels. That is a lot of work for an organ that is about the size of a large fist and weighs between 8 and 12 ounces. The signs of a heart attack are much less noticeable in women than in men. In women, a heart attack may feel like uncomfortable squeezing, pressure, fullness, or pain in the center of the chest. It may also cause pain in one or both arms, the back, neck, jaw, or stomach, as well as shortness of breath, nausea, and other symptoms. Men experience typical symptoms of heart attack, such as chest pain, discomfort, and stress. They may also experience pain in other areas, such as the arms, neck, back, and jaw, along with shortness of breath, sweating, and discomfort that mimics heartburn.
  • 3. Objective of Data The objective of the UCI Heart Disease dataset is to facilitate research and analysis aimed at developing predictive models for the detection and assessment of heart disease. Specifically, the dataset aims to: • Enable Prediction: Provide a diverse set of medical attributes and corresponding diagnoses to enable the development of machine learning models capable of predicting the likelihood of heart disease in patients. • Support Research: Serve as a valuable resource for researchers and data scientists interested in studying the factors associated with heart disease and exploring novel approaches to its diagnosis and treatment. • Promote Healthcare Innovation: Foster innovation in healthcare by empowering healthcare providers, businesses, and policymakers with data-driven insights into heart disease risk assessment and management. • Improve Patient Outcomes: Ultimately, the primary objective of the dataset is to contribute to the improvement of patient outcomes by facilitating early detection, intervention, and personalized treatment of heart disease.
  • 4. How data can help businesses 1) Healthcare Providers: Hospitals and clinics can use these models to assess the risk of heart disease in patients during routine check-ups. This can lead to early detection and intervention, ultimately improving patient outcomes and reducing healthcare costs. 2) Insurance Companies: Insurance companies can utilize these models to assess the risk of heart disease in their policyholders. By identifying high-risk individuals, they can offer targeted interventions or wellness programs to mitigate the risk and reduce claims. 3) Pharmaceutical Companies: Pharmaceutical companies can use predictive models to identify potential candidates for clinical trials of new drugs aimed at preventing or treating heart disease. This can streamline the drug development process and bring new treatments to market more efficiently. 4) Healthtech Startups: Startups focused on digital health and wellness can develop applications or wearable devices that utilize heart disease prediction models to provide personalized health recommendations to users. This can empower individuals to take proactive steps toward preventing heart disease.
  • 5. Real-life Applications 1) Clinical Decision Support: Healthcare professionals can use these models as decision-support tools during patient consultations. By inputting patient data into the model, clinicians can obtain risk scores and recommendations for further evaluation or treatment. 2) Public Health Initiatives: Public health authorities can utilize predictive models to identify populations at high risk of heart disease and implement targeted prevention strategies, such as educational campaigns, screening programs, or policy interventions. 3) Remote Monitoring: Remote monitoring devices equipped with heart disease prediction algorithms can continuously monitor individuals at risk and alert them or their caregivers of any significant changes or warning signs, enabling timely medical intervention. 4) Personalized Medicine: Predictive models can facilitate the shift towards personalized medicine by enabling healthcare providers to tailor treatment plans based on an individual's risk profile and genetic predisposition to heart disease.
  • 6. About Dataset • This is a multivariate dataset, meaning that each observation is described by several variables and the analysis involves multivariate numerical data. • It is composed of 14 attributes: age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels, thalassemia, and the predicted attribute num. • The full database includes 76 attributes, but all published studies use a subset of 14 of them. One major task for this dataset is to predict, from a patient's attributes, whether that person has heart disease. The other is the experimental task of diagnosing and extracting insights from the data that help in understanding the problem.
  • 7. Column Descriptions 1) id: Unique identifier for each patient 2) age: Age of the patient in years 3) origin: Place of study 4) sex: Gender of the patient 5) cp: Chest pain type (typical angina, atypical angina, non-anginal, asymptomatic) 6) trestbps: Resting blood pressure (mm Hg on admission) 7) chol: Serum cholesterol level (mg/dl) 8) fbs: Whether fasting blood sugar exceeds 120 mg/dl (True/False) 9) restecg: Resting electrocardiographic results (normal, ST-T abnormality, left ventricular hypertrophy) 10) thalch: Maximum heart rate achieved 11) exang: Exercise-induced angina (True/False) 12) oldpeak: ST depression induced by exercise relative to rest 13) slope: Slope of the peak exercise ST segment 14) ca: Number of major vessels colored by fluoroscopy (0-3) 15) thal: Thalassemia diagnosis (normal, fixed defect, reversible defect) 16) num: Target attribute indicating the presence of heart disease
  • 8. Challenges 1) Data Quality: Ensuring the accuracy and reliability of the medical data is crucial for building effective prediction models. Incomplete or inaccurate data can lead to biased or unreliable predictions. 2) Feature Selection: Identifying the most relevant features or attributes from the dataset that contribute to the prediction of heart disease is essential. This requires domain knowledge and careful analysis of the data. 3) Imbalanced Data: Imbalance in the distribution of classes (i.e., presence or absence of heart disease) can affect the performance of machine learning algorithms. Techniques such as oversampling, undersampling, or using algorithms that handle imbalanced data well are necessary to address this issue. 4) Interpretability: Building models that not only provide accurate predictions but also offer insights into the factors contributing to the prediction is important for gaining trust from healthcare professionals and patients.
  • 9. Data Understanding Begin by loading the dataset into Python. Verify that the dataset is loaded correctly and examine the first few rows to get a glimpse of the data structure. The dataset has 920 rows and 16 attributes, of which num is the dependent variable to be predicted.
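The loading and inspection step described above can be sketched as follows. This is a minimal illustration, assuming pandas is used; the three inline rows are made-up stand-ins for the real file, and only a subset of the columns is shown.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for the real CSV file;
# the values are illustrative, not taken from the dataset.
csv_text = """id,age,sex,origin,cp,trestbps,chol,num
1,63,Male,Cleveland,typical angina,145,233,0
2,67,Male,Cleveland,asymptomatic,160,286,2
3,67,Male,Cleveland,asymptomatic,120,,1
"""

# With a real file this would be pd.read_csv("<path to the csv>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows, to check the columns were parsed as expected
print(df.shape)    # (rows, columns)
df.info()          # dtypes and non-null counts per column
```

On the full dataset, `df.shape` is where the 920 rows and 16 attributes mentioned above would show up.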
  • 10. Dataset Overview Based on the summary above, it appears that the data consists of a total of 920 observations. However, many features in this dataset have missing values, including trestbps, chol, fbs, restecg, thalch, exang, oldpeak, slope, ca, and thal. In addition, the dataset contains both numeric and categorical variables.
  • 11. Exploring Numerical and Categorical Features
  • 12. Exploratory Data Analysis (EDA) Categorical Features – Countplot
  • 13.
  • 14.
  • 15.
  • 17.
  • 18. Outlier Detection Based on the box plot above, trestbps, chol, and thalch exhibit outliers, especially chol. On the contrary, age and exang are two features that do not have outliers.
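A box plot conventionally flags points that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule directly, assuming pandas; the `iqr_outliers` helper and the cholesterol values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Made-up cholesterol readings with one extreme value.
chol = pd.Series([200, 210, 220, 230, 240, 250, 600])
outliers = iqr_outliers(chol)
print(outliers)
```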
  • 19. Pattern of Missingness • Based on the heatmap above, missing values appear intensively starting from the 300th row. • The top three variables with the highest number of observations with missing values are slope, ca, and thal. • So far, it does not look like the missing values are distributed randomly.
  • 20. Correlation Matrix • From the heatmap above, we observe a strong relationship of missing values between thalch and trestbps, exang and trestbps, oldpeak and trestbps, etc. • Once again, the pattern of missing values among variables does not appear random. • As we mentioned above, the dataset includes 15 variables. However, at least 10 variables have missing values. • Hence, we will apply 2 imputation methods (Median/Mode imputation and Random Forest imputation) to fill in the missing values.
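One way to quantify the missingness pattern described above is to count missing values per column and then correlate the missing-value indicators. This is a sketch on a made-up frame, assuming pandas and NumPy; it is not necessarily how the heatmaps in the slides were produced.

```python
import numpy as np
import pandas as pd

# Toy frame: trestbps and thalch are missing in the same rows.
df = pd.DataFrame({
    "trestbps": [130, np.nan, 120, np.nan, 140, 150],
    "thalch": [150, np.nan, 160, np.nan, 170, 155],
    "chol": [230, 250, np.nan, 240, 260, 210],
})

# Number of missing values per column, highest first.
missing_counts = df.isna().sum().sort_values(ascending=False)

# Correlation between missingness indicators (1 = missing, 0 = present).
# Values near 1 mean two columns tend to be missing together.
miss_corr = df.isna().astype(int).corr()

print(missing_counts)
print(miss_corr)
```

Columns with no missing values have a constant indicator, so their correlations come out as NaN and are usually dropped before plotting.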
  • 21. Imputing Missing Values Median/Mode Imputation We will start with the simplest imputation method, Median/Mode Imputation. For numerical features, we fill in the missing values with the median. For categorical features, we replace the missing values with the mode. Numeric variables ==> median value Categorical variables ==> mode value
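The median/mode rule above can be sketched in a few lines, assuming pandas. The two-column frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one categorical column.
df = pd.DataFrame({
    "chol": [200.0, np.nan, 240.0, 260.0],
    "thal": ["normal", None, "normal", "fixed defect"],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric variable: fill with the median of the observed values.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical variable: fill with the most frequent value.
        df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

When a train/test split is involved, it is usually safer to compute the medians and modes on the training portion only and reuse them on the test portion.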
  • 23. Distribution of Age Among Patients with and without Heart Disease We can notice that people between the ages of 40 and 70 are the most affected by heart disease.
  • 24. Heart Disease Prevalence by Sex We can notice that men are more susceptible to heart disease at all levels.
  • 25. Relationship Between Cholesterol Levels and Heart Disease • The box plot illustrates cholesterol levels across five heart disease categories, showing median values, range variability, and outliers. • Categories 1 to 4 have similar medians, but the spread and outliers differ, with category 0 showing the most variability
  • 26. Maximum Heart Rate and Heart Disease The plot shows a negative correlation where the maximum heart rate tends to decrease as age increases.
  • 27. The Impact of Exercise-Induced Angina on Heart Disease • Most cases in category 0 do not report angina, while categories 1 through 4 show a more varied distribution, with both angina and non-angina cases present. • The data suggests that exercise-induced angina is more commonly reported in individuals with heart disease categories 1 to 4 compared to category 0.
  • 28. Average Resting Blood Pressure by Heart Disease Status • All categories show similar average blood pressures ranging slightly above 120 mm Hg. • The error bars indicate some variability in the measurements, with a slight trend toward increasing variability from status 0 to 4.
  • 29. Distribution of Chest Pain Type among Patients • 'Asymptomatic' is the most common type of chest pain across all heart disease statuses except for status 0, where 'typical angina' is more prevalent. • 'Non-anginal' pain is notably frequent in heart disease status 4, while 'atypical angina' is relatively less common across all statuses.
  • 30. Fasting Blood Sugar and Heart Disease • The majority of individuals across all heart disease statuses have fasting blood sugar levels at or below 120 mg/dl. • For those with higher blood sugar levels, the counts are notably lower, suggesting that elevated fasting blood sugar is less common among these individuals regardless of their heart disease status.
  • 31. Heart Disease Prevalence by Resting Electrocardiographic Results • Most individuals with a normal ECG result fall into the '0' heart disease category, indicating no presence of heart disease. • In contrast, those with ST-T abnormalities show a higher count of heart disease statuses 1 through 4. • Left ventricular hypertrophy is less common but shows some presence across all heart disease categories.
  • 32. Data Preprocessing Looking at the data, we see that some of the features have categorical values, so we one-hot encode them. Also, the original dataset codes the target as 0, 1, 2, 3, 4. Since we only want to identify the presence of disease, we treat this as binary classification: every value in the num column is converted to 0 (no disease) or 1 (disease present).
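A minimal sketch of both preprocessing steps, assuming pandas. The rows are made up, and the `target` column name is a hypothetical choice for the binarized label.

```python
import pandas as pd

# Toy rows with one categorical feature and the multi-level target.
df = pd.DataFrame({
    "age": [63, 67, 41],
    "cp": ["typical angina", "asymptomatic", "non-anginal"],
    "num": [0, 2, 4],
})

# Binarize the target: 0 stays 0, levels 1-4 become 1.
df["target"] = (df["num"] > 0).astype(int)

# One-hot encode the categorical column; num is dropped once target exists.
encoded = pd.get_dummies(df.drop(columns="num"), columns=["cp"])

print(encoded)
```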
  • 34. Splitting the Dependent and Independent Features The dependent and independent features are split using train_test_split from the sklearn library. The split is 80-20: 80% of the rows are used for training and 20% for testing.
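The 80-20 split can be sketched with scikit-learn's `train_test_split`. The arrays are synthetic; `random_state=42` and `stratify=y` are assumptions added for reproducibility and class balance, not settings stated in the slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 100 samples, 2 features, balanced binary labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.2 holds out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```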
  • 35. Feature Scaling • Normalization Min-Max Normalization is used to normalize the data. This method rescales each feature to the range [0,1].
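Min-max normalization maps each value x to (x - min) / (max - min), per feature. The sketch below uses scikit-learn's `MinMaxScaler` on made-up numbers; fitting on the training rows only is an assumption about good practice, not something the slides state.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training and test rows with two features.
X_train = np.array([[100.0, 30.0], [150.0, 50.0], [200.0, 70.0]])
X_test = np.array([[125.0, 80.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min and max per column
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

print(X_train_scaled)
print(X_test_scaled)
```

Because the test rows reuse the training minimum and maximum, a test value beyond the training range (80 here, against a training maximum of 70) scales to a number above 1.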
  • 37. In the above figure, the red dots represent the predicted values, which are either 0 or 1, and the blue line and dots represent the actual value for each patient. Where the red and blue dots do not overlap, the prediction is wrong; where they overlap, the prediction is correct.
  • 38. Model Evaluation • Logistic regression gave an accuracy of 77.71%. • From the confusion matrix, we can say the model is able to classify whether the disease is present or not. However, the numbers of False Positives and False Negatives are also high. To reduce them, we will fit another classification model.
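The evaluation described above can be sketched as follows, assuming scikit-learn. The data comes from `make_classification`, so the numbers it prints are unrelated to the 77.71% reported for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the heart dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

print(f"accuracy={acc:.4f}  TN={tn} FP={fp} FN={fn} TP={tp}")
```

Accuracy is (TN + TP) divided by the total, so a model can look acceptable on accuracy while FP and FN are still high, which is why the confusion matrix is inspected separately.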
  • 39. A ROC curve, or receiver operating characteristic curve, is a graph that shows how well a classification model performs across all decision thresholds, plotting the true positive rate against the false positive rate.
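A small worked example of a ROC curve and its area under the curve (AUC), assuming scikit-learn. The four labels and scores are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(fpr, tpr)
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```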
  • 40. Coefficients Logistic regression computes a weighted sum of the features and converts it into a probability with the sigmoid function. The coefficients are the weights in that sum.
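To make the role of the coefficients concrete, here is a sketch of how a logistic regression model turns a weighted sum into a probability. It uses NumPy only; the weights, intercept, and feature values are hypothetical.

```python
import numpy as np


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients for two scaled features, plus an intercept.
weights = np.array([0.8, -0.5])
intercept = -0.3
x = np.array([1.0, 1.0])

# Weighted sum of the features: -0.3 + 0.8*1.0 - 0.5*1.0, which is about 0.
z = intercept + weights @ x

# The sigmoid of a score near 0 is a probability near 0.5.
p = sigmoid(z)
print(z, p)
```

A positive coefficient raises the predicted probability as its feature grows and a negative one lowers it.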
  • 41. Random Forest Classifier Random Forest gave an accuracy of 79.34%, which is better than Logistic Regression. The precision, recall, and F1 scores also improved over the previous model.
  • 42.
  • 43. Naïve Bayes Naïve Bayes has given an accuracy of 77.7% which is the same as Logistic Regression. Also, the precision, recall, and F1 scores have improved in this model.
  • 44.
  • 45. Gradient Boosting Classifier Gradient Boosting has performed better than all models so far, with an accuracy of 80.43%. The model also classifies whether the disease is present or not more accurately.
  • 46.
  • 47. XGBoost Classifier After applying the XGBoost classifier, the True Positive and True Negative counts in the confusion matrix increased compared with the previous model.
  • 48.
  • 49. LightGBM Here, the accuracy increased to 81.52%, and the false negatives and false positives decreased, so the model classifies patients more reliably.
  • 50.
  • 51. Hyperparameter Tuning Hyperparameters are external configurations that guide the learning process but are not learned from the data. Hyperparameter tuning is the systematic optimization of these settings to enhance a model's performance. The process often employs techniques like grid search, which explores different combinations of hyperparameter values to find the set that maximizes model accuracy or another performance metric. The accuracy of XGBoost didn't improve after hyperparameter tuning.
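Grid search as described above can be sketched with scikit-learn's `GridSearchCV`. To stay free of extra dependencies, this example tunes scikit-learn's own `GradientBoostingClassifier` rather than XGBoost or LightGBM; the data is synthetic and the parameter grid is a hypothetical choice, not the one used in the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the heart dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grid: 2 x 2 = 4 candidate combinations.
param_grid = {"n_estimators": [20, 50], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation for each combination
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best combination
```

The best cross-validated score is not guaranteed to beat the default settings on a held-out test set, which is consistent with tuning not improving accuracy here.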
  • 52. The accuracy of LightGBM also didn’t improve.
  • 53. Model Selection • The accuracy of both XGBoost and LightGBM didn't increase after hyperparameter tuning. However, LightGBM has a high accuracy of 82%, and the model was able to classify the classes correctly. Therefore, LightGBM is the best model for the heart disease prediction data. • As per the result, the model has a precision score of around 82%, which is quite acceptable for predicting heart disease in an individual based on the characteristics age, sex, cp, trestbps, chol, fbs, restecg, thalch, exang, oldpeak, slope, ca, and thal.
  • 54. Summary 1) The patients' ages range from 29 to 77 years, with an average age of 54. 2) The majority of the patients are male (75.9%) and the most common type of chest pain experienced by the patients is typical angina (39.6%). 3) The average resting blood pressure is 131.6 mmHg and the average cholesterol level is 246 mg/dL. 4) The average maximum heart rate achieved during exercise is 139.9 bpm. 5) Most patients (70.3%) do not experience exercise-induced angina. 6) The average ST depression induced by exercise is 1.04 mm, and the majority of the patients (54.8%) have a normal ECG result. 7) Several classification models were trained and evaluated, including Logistic Regression, Random Forest, Naive Bayes, Gradient Boosting, XGBoost, and LightGBM. 8) The LightGBM model achieved the highest accuracy of 80.97% after hyperparameter tuning. 9) The ROC curves and AUC scores for each model were analyzed to assess their performance. 10) The results suggest that the XGBoost and LightGBM models are suitable for predicting the presence or absence of heart disease based on the available features.