Can data analysis help predict the future of your heart health?
The Boston Institute of Analytics (BIA) presents a collection of student presentations on data analysis projects tackling the critical topic of heart attack prediction.
Join us as we delve into the world of healthcare analytics and explore how data can be harnessed to identify individuals at risk of heart attack. These presentations offer valuable insights for:
Medical professionals seeking to develop preventative healthcare strategies
Individuals interested in understanding their own heart health risks
Data analysts passionate about applying data analysis for social good
Here's what you'll learn by watching these presentations:
The power of data analysis in predicting heart attacks
Various data analysis techniques used for risk assessment
Real-world examples of heart attack prediction models
Insights and findings from the research of dedicated BIA students
Empower yourself and others with the knowledge of heart health prediction. Watch these presentations and unlock the potential of data analysis in saving lives!
Visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
3. Project Contents
1. Introduction and Problem Statement
2. Data Exploration
3. Data Cleaning
4. Data Visualization
5. Data Preprocessing
6. Model Building and Evaluation
7. Model Comparison
8. Conclusion
4. Heart Attack Risk Prediction
Heart attack risk prediction refers to the process of using machine learning and medical
data to assess an individual's likelihood of experiencing a heart attack. It involves analyzing
various factors related to a person's health, lifestyle, and medical history to identify
patterns that may indicate an elevated risk of a heart attack.
Predicting the risk of a heart attack is a complex task that often involves applying
machine learning models to medical data. The medical data was provided to me, and I used
it to carry out the risk prediction.
Problem Statement:
The objective of this experiment is to assess how accurately ML models can predict heart
attack risk from the data set provided to me.
5. Read Data
Import the required libraries.
Read the CSV file located at the specified path and assign it to a pandas DataFrame called
‘data’.
The head() and tail() functions display the first 5 and the last 5 rows of a DataFrame by default,
providing a quick overview of its structure and content.
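A minimal sketch of this step with pandas follows. The original file path is not given, so the snippet reads a tiny inline CSV through io.StringIO as a stand-in. The column names and values are made-up placeholders. In practice you would pass the real path to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; with the actual data you would
# call pd.read_csv("<path to the CSV>") instead. All values are made up.
csv_source = io.StringIO(
    "Age,Sex,Cholesterol,Diabetes,Heart Attack Risk\n"
    "45,Male,210,0,0\n"
    "52,Female,250,1,1\n"
    "38,Male,190,0,0\n"
    "61,Female,280,1,1\n"
    "47,Male,230,0,0\n"
    "59,Male,300,1,1\n"
    "33,Female,180,0,0\n"
)

data = pd.read_csv(csv_source)

print(data.head())  # first 5 rows (n=5 is the default)
print(data.tail())  # last 5 rows
```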
6. Data Exploration
I used data.info() to check the non-null counts and data types of each column.
I used data['column name'].value_counts() to count how many times each unique value occurs in
whichever column I needed.
The data.isnull().sum() call provides a count of null values in each column of the DataFrame.
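The following sketch shows these three checks on a small made-up DataFrame. The column names are illustrative assumptions, and one value is deliberately missing so that the null count is non-zero. Note that isnull() on its own returns a True/False mask, and the .sum() turns that mask into counts per column.

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame standing in for the real data set.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Male", "Female"],
    "Cholesterol": [210, 250, np.nan, 280, 230],
    "Heart Attack Risk": [0, 1, 0, 1, 0],
})

# Column dtypes and non-null counts (printed to stdout; the method returns None).
data.info()

# How many times each unique value occurs in one column.
sex_counts = data["Sex"].value_counts()
print(sex_counts)

# isnull() gives a boolean mask; summing it counts the nulls in each column.
null_counts = data.isnull().sum()
print(null_counts)
```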
7. Data Exploration
I used data.describe() to provide descriptive statistics of the data set.
I used data.corr() to calculate the correlation matrix of the numeric columns in the data set.
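This is a minimal sketch of the two calls on made-up data. It assumes pandas 1.5 or later, where DataFrame.corr accepts numeric_only. On pandas 2.x, text columns such as the hypothetical "Sex" column here would otherwise raise an error.

```python
import pandas as pd

# Made-up values; column names are illustrative assumptions.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female"],
    "Cholesterol": [200, 220, 240, 260],
    "Diabetes": [0, 0, 1, 1],
    "Heart Attack Risk": [0, 0, 1, 1],
})

# Count, mean, std, min, quartiles and max for each numeric column.
summary = data.describe()
print(summary)

# Pairwise Pearson correlations between numeric columns only.
corr_matrix = data.corr(numeric_only=True)
print(corr_matrix)
```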
8. Checking for Unique and Duplicate Values
The data.nunique() function displays the count of unique values in each column.
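The slide does not show how duplicates were checked. The use of DataFrame.duplicated() below is therefore an assumption about one way to do it. The data is a made-up example with one repeated row.

```python
import pandas as pd

# Made-up rows; the third row repeats the first.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Male"],
    "Age": [45, 52, 45, 60],
    "Diabetes": [0, 1, 0, 1],
})

# Number of distinct values in each column.
unique_counts = data.nunique()
print(unique_counts)

# duplicated() flags rows identical to an earlier row; sum() counts them.
n_duplicates = data.duplicated().sum()
print(f"Duplicate rows: {n_duplicates}")
```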
9. Data Cleaning
All of the information in the data set plays a part in the risk of heart attack or cardiac arrest,
so I did not need to clean the dataset.
10. Data Visualization using Python
All of the following graphs were produced with Python to help us understand the data.
12. • Heart attack risk has the highest correlation with Diabetes, Cholesterol, and Exercise Hours Per Week
• Heart attack risk shows little dependence on Sedentary Hours Per Day
• Alcohol Consumption has no strong link with heart attack risk
• Smoking shows no strong association with heart attack risk in this data set
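A ranking like the one above can be produced by correlating each numeric column with the target and sorting by absolute value. Below is a sketch on made-up numbers. The output does not reflect the real data set, and correlation alone says nothing about cause.

```python
import pandas as pd

# Made-up values; column names follow the ones mentioned on the slide.
data = pd.DataFrame({
    "Cholesterol": [180, 200, 220, 240, 260, 280],
    "Diabetes": [0, 1, 0, 1, 1, 1],
    "Smoking": [1, 0, 1, 0, 1, 0],
    "Heart Attack Risk": [0, 0, 0, 1, 1, 1],
})

# Correlation of every numeric feature with the target, strongest (by
# absolute value) first; the target's correlation with itself is dropped.
target_corr = (
    data.corr(numeric_only=True)["Heart Attack Risk"]
    .drop("Heart Attack Risk")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```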
13. Data Preprocessing
Label encoding is used to transform categorical variables into numeric form so that they can be used by machine learning
algorithms. Because many machine learning algorithms require numerical input, encoding categorical variables as
numeric features is essential for the algorithms to process them effectively.
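Here is a minimal sketch with scikit-learn's LabelEncoder. The "Diet" column and its values are hypothetical examples, and one encoder is fitted per column. LabelEncoder assigns integers in sorted label order.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns with made-up values.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Female", "Male"],
    "Diet": ["Healthy", "Average", "Unhealthy", "Average"],
})

# Fit a separate encoder per column; each maps the sorted unique labels
# to 0, 1, 2, ... and is kept so the mapping can be inverted later.
encoders = {}
for column in ["Sex", "Diet"]:
    encoders[column] = LabelEncoder()
    data[column] = encoders[column].fit_transform(data[column])

print(data)
```

scikit-learn documents LabelEncoder mainly for encoding target labels. Integer codes also imply an order (Average < Healthy < Unhealthy here), so for unordered features one-hot encoding (pd.get_dummies or OneHotEncoder) is a common alternative.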
Scaling is performed to ensure that all numerical features in a dataset are on a similar scale. This prevents features with
large ranges from dominating, enables fair comparisons between features, and helps machine learning algorithms converge,
leading to better model performance.
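The slides do not name the scaler that was used. This sketch assumes scikit-learn's StandardScaler, which rescales each column to zero mean and unit variance. The two feature columns are made-up values on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows with two features on very different scales
# (e.g. cholesterol in mg/dL and BMI).
X = np.array([
    [200.0, 22.0],
    [250.0, 30.0],
    [300.0, 26.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

When a train/test split is used, the scaler is usually fitted on the training set only and then applied to the test set with transform(), so that test statistics do not leak into training.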
14. Splitting Data
This code randomly splits the dataset into two separate sets: the training set and the testing set. The split is done
with a test size of 0.20, meaning that 20% of the data will be allocated for testing, while the remaining 80% will be
used for training.
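Here is a sketch of an 80/20 split with scikit-learn's train_test_split on made-up arrays. The random_state value is an assumption added for reproducibility. Passing stratify=y would also keep the class balance equal in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up samples with 3 features and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold out 20% for testing; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
print(X_train.shape, X_test.shape)
```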
15. Modeling
Models used:
1. Logistic Regression: Logistic Regression provides interpretable results, allowing you to understand the
impact and significance of each independent variable on the probability of a heart attack.
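To illustrate the interpretability point, this sketch fits scikit-learn's LogisticRegression on made-up data in which only the first feature drives the target. The feature names are hypothetical. Each coefficient is a change in log-odds per unit of the feature, and exponentiating it gives an odds ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["Cholesterol (scaled)", "Noise"]  # hypothetical features
X = rng.normal(size=(500, 2))
# Made-up target that depends on the first feature only.
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Change in log-odds of the positive class per unit increase in each
# feature, holding the other fixed; exp() converts it to an odds ratio.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.3f} (odds ratio {np.exp(coef):.2f})")

# Predicted probability of the positive class for the first sample.
print(model.predict_proba(X[:1])[0, 1])
```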
16. Modeling
Models used:
2. Decision Tree: Decision trees can effectively handle non-linear relationships between the features and the target.
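A small made-up example of a non-linear pattern follows. The target is positive when two features share the same sign (an XOR pattern), which no single straight line can separate. The sketch compares training accuracy only, to show what each model can represent. A fully grown tree like this one fits the training data perfectly and would usually be pruned (e.g. with max_depth) before use on new data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
# XOR-style target: 1 when both features have the same sign.
y = ((X[:, 0] > 0) == (X[:, 1] > 0)).astype(int)

# Unlimited depth: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
linear = LogisticRegression().fit(X, y)

print(f"Decision tree training accuracy:       {tree.score(X, y):.2f}")
print(f"Logistic regression training accuracy: {linear.score(X, y):.2f}")
```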
17. Modeling
Models used:
3. Random Forest: Random Forest combines predictions from multiple decision trees, which can lead to a
more accurate and stable model compared to individual decision trees.
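This sketch uses synthetic data from make_classification as a stand-in for the heart-attack features. It shows that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities. It also prints a single tree's test accuracy next to the forest's. Those two numbers depend on the data and are not guaranteed to favour the forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed heart-attack features.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest's predicted probabilities equal the mean over its individual trees.
averaged = np.mean([t.predict_proba(X_test) for t in forest.estimators_], axis=0)
print(np.allclose(averaged, forest.predict_proba(X_test)))

print(f"Single tree test accuracy:   {single_tree.score(X_test, y_test):.3f}")
print(f"Random forest test accuracy: {forest.score(X_test, y_test):.3f}")
```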
19. Model Comparison
After evaluating different models for heart attack risk prediction, including Logistic Regression, Decision Tree,
Random Forest, and KNN, it can be concluded that the Logistic Regression and Random Forest models outperform the
others across the performance metrics compared.
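A comparison like this can be sketched as one loop over the four models, scoring each on the same held-out test set. The sketch runs on synthetic make_classification data, so its ranking says nothing about the real results. Wrapping Logistic Regression and KNN with StandardScaler in a pipeline is an assumption, made because both are sensitive to feature scale. The hyperparameters are defaults or illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real heart-attack data set is not reproduced here.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Scale-sensitive models get a StandardScaler, which the pipeline fits on the
# training split only.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

# Rank the models by F1 score, best first.
ranking = sorted(results, key=lambda name: results[name]["f1"], reverse=True)
for name in ranking:
    print(f"{name:20s} accuracy={results[name]['accuracy']:.3f} f1={results[name]['f1']:.3f}")
```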
20. Conclusion
In conclusion, the analysis of the provided dataset involved the application of various machine
learning models, including Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, and Decision Tree.
Each model was trained on a portion of the dataset and evaluated on a separate test set.
Performance metrics such as accuracy, precision, and F1 score were computed to assess the
models' effectiveness in capturing patterns within the data.
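To make the metrics concrete, here is a hand-checkable example on made-up labels (1 = high risk). It has 2 true positives, 1 true negative, 2 false positives, and 1 false negative. That gives accuracy 3/6, precision 2/4, recall 2/3, and F1 = 2·TP / (2·TP + FP + FN) = 4/7.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = high heart-attack risk, 0 = low risk.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1]

# For binary labels, ravel() returns the counts as tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (tp + tn) / total
precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(tn, fp, fn, tp)      # 1 2 1 2
print(round(accuracy, 3))  # 0.5
print(round(precision, 3)) # 0.5
print(round(recall, 3))    # 0.667
print(round(f1, 3))        # 0.571
```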