Decoding the Heart: Student Presentation on Heart Attack Prediction with Data Analysis

Heart Attack Risk Prediction
Presented by: Shreyank Thakker

Project Contents
1. Introduction and Problem Statement
2. Data Exploration
3. Data Cleaning
4. Data Visualization
5. Data Preprocessing
6. Model Building and Evaluation
7. Model Comparison
8. Conclusion

Heart attack risk prediction refers to the process of utilizing machine learning and medical
data to assess an individual's likelihood of experiencing a heart attack. It involves analyzing
various factors related to a person's health, lifestyle, and medical history to identify
patterns that may indicate an elevated risk of a heart attack.
Predicting the risk of a heart attack is a complex task that often involves applying
machine learning models to medical data. I carried out the risk prediction on a medical
dataset that was provided to me.
Problem Statement:
The objective of this experiment is to measure how accurately ML models can predict heart
attack risk from the dataset given to me.

Read Data
• Import the libraries which will be used.
• Read the CSV file located at the specified path and assign it to a pandas DataFrame called 'data'.
• The head() and tail() functions display the first 5 and the last 5 rows of a DataFrame, providing a
quick overview of its structure and content.

Data Exploration
• I used data.info() to check the non-null counts and data types.
• I used data['column name'].value_counts() to count the occurrences of each unique value in whichever
column I needed.
• data.isnull().sum() provides a count of null values in each column in the DataFrame.
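A minimal sketch of these three checks, using a tiny hypothetical DataFrame (the column names and values are assumptions, with one missing value added on purpose so the null count is visible):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real dataset's columns and values may differ.
data = pd.DataFrame({
    "Age": [67, 21, 84, 66, 54],
    "Sex": ["Male", "Male", "Female", "Male", "Female"],
    "Cholesterol": [208.0, 389.0, np.nan, 383.0, 318.0],
})

# Prints column names, non-null counts and dtypes.
data.info()

# Count how often each unique value occurs in one column.
sex_counts = data["Sex"].value_counts()

# isnull() marks missing cells; sum() adds them up per column.
null_counts = data.isnull().sum()

print(sex_counts)
print(null_counts)
```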

• I used data.describe() to provide descriptive statistics of the data set.
• I used data.corr() to calculate the correlation matrix of the data set.

Checking for Unique and Duplicate Values
• data.nunique() displays the count of unique values for each column.
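These calls can be sketched on a small hypothetical DataFrame as shown below. The data values are made up, and data.duplicated() is an assumption about how duplicate rows were checked, since the slides only name nunique().

```python
import pandas as pd

# Hypothetical numeric sample; the first and last "21" rows are identical on purpose.
data = pd.DataFrame({
    "Age": [67, 21, 84, 21],
    "Cholesterol": [208, 389, 324, 389],
    "Heart Attack Risk": [0, 1, 1, 1],
})

summary = data.describe()        # count, mean, std, min, quartiles, max
corr_matrix = data.corr()        # pairwise Pearson correlations
unique_counts = data.nunique()   # number of distinct values per column
duplicate_rows = data.duplicated().sum()  # rows that repeat an earlier row

print(summary)
print(corr_matrix)
print(unique_counts)
print("Duplicate rows:", duplicate_rows)
```

If the DataFrame still contains text columns, recent pandas versions need data.corr(numeric_only=True), or the columns have to be encoded first.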

Data Cleaning
Every feature provided in the data set has a part to play when it comes to heart attack or cardiac arrest, so I did
not need to clean the dataset.

Data Visualization using Python
All the following graphs were produced using Python to help us understand the data.

• Heart Attack Risk has the highest correlation with Diabetes, Cholesterol and Exercise Hours Per Week
• Heart Attack Risk is not strongly dependent on Sedentary Hours Per Day
• Alcohol Consumption shows no strong link with Heart Attack Risk
• Smoking does not appear to be a major factor for Heart Attack Risk in this dataset
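Observations like these are usually read off a correlation heatmap. As a rough sketch of the numbers behind such a plot, the snippet below ranks features by the absolute value of their correlation with the target. The data values are invented for illustration and do not reproduce the results above. The column names are taken from the bullets.

```python
import pandas as pd

# Illustrative values only; not the presentation's dataset.
data = pd.DataFrame({
    "Diabetes": [0, 1, 0, 1, 0, 0],
    "Cholesterol": [208, 389, 324, 383, 318, 297],
    "Sedentary Hours Per Day": [6.6, 4.9, 9.4, 7.6, 1.5, 7.9],
    "Heart Attack Risk": [0, 1, 0, 1, 1, 0],
})

target = "Heart Attack Risk"

# Correlation of every feature with the target, strongest first.
strength = (
    data.corr()[target]
    .drop(target)          # the target always correlates perfectly with itself
    .abs()
    .sort_values(ascending=False)
)

print(strength)
```

The same data.corr() matrix could be passed to a plotting library such as matplotlib or seaborn to draw the heatmap itself.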

Data Preprocessing
Label encoding transforms categorical variables into numeric form, which makes them easier to use in machine learning
algorithms. Since many machine learning algorithms require numerical input, encoding categorical variables is
essential for representing them as numeric features that the algorithms can process.
Scaling ensures that all numerical features in a dataset are on a similar scale. This prevents features with large
ranges from dominating the others, enables fair comparisons, and helps machine learning algorithms converge,
leading to better model performance.
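A minimal sketch of both steps is shown below. It assumes scikit-learn's LabelEncoder and StandardScaler, which the slides do not name explicitly, and uses hypothetical columns and values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical sample with one categorical and one numeric column.
data = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female"],
    "Cholesterol": [200.0, 300.0, 250.0, 250.0],
})

# Label encoding: classes are sorted, so Female -> 0 and Male -> 1.
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Standard scaling: subtract the mean and divide by the standard deviation.
scaler = StandardScaler()
data["Cholesterol"] = scaler.fit_transform(data[["Cholesterol"]]).ravel()

print(data)
```

In a full pipeline the scaler is normally fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.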

Splitting Data
The dataset is randomly split into two separate sets: the training set and the testing set. The split uses
a test size of 0.20, meaning that 20% of the data is allocated for testing, while the remaining 80% is
used for training.
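The split described above can be sketched with scikit-learn's train_test_split. The feature matrix and labels are placeholders, and random_state=42 is an assumption added here only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 2 features and a binary label.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

# test_size=0.20 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(X_train.shape, X_test.shape)
```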

Modeling
Models used :
1. Logistic Regression : Logistic Regression provides interpretable results, allowing you to understand the
impact and significance of each independent variable on the probability of heart attack.

2. Decision Tree : Decision trees can effectively handle non-linear relationships between features.

3. Random Forest : Random Forest combines predictions from multiple decision trees, which can lead to a
more accurate and stable model compared to individual decision trees.

4. k-Nearest Neighbors (KNN)
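The four models can be trained and scored in one loop, as in the sketch below. It assumes the scikit-learn implementations, uses a synthetic dataset from make_classification in place of the heart attack data, and picks hyperparameters (max_iter, n_estimators, n_neighbors, random_state) purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Hyperparameters here are illustrative assumptions, not the presenter's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # train on the 80% split
    scores[name] = model.score(X_test, y_test)   # accuracy on the 20% split

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

KNN classifies a sample by majority vote among its nearest training points, so it is the model here that depends most directly on the features having been scaled.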

Model Comparison
After evaluating different models for heart attack risk prediction, including Logistic Regression, Decision Tree,
Random Forest, and KNN, it can be concluded that the Logistic Regression and Random Forest models outperform the
others when compared on each of the performance metrics.

Conclusion
In conclusion, the analysis of the provided dataset involved the application of various machine
learning models: Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors (KNN).
Each model was trained on a portion of the dataset and evaluated on a separate test set.
Performance metrics such as accuracy, precision, and F1 score were computed to assess the
models' effectiveness in capturing patterns within the data.
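As a small worked example of the three metrics, the sketch below uses scikit-learn's metric functions on hand-made labels. The labels are hypothetical and chosen so the arithmetic is easy to follow: 3 true positives, 0 false positives, 1 false negative and 4 true negatives.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Hypothetical true labels and predictions (1 = at risk, 0 = not at risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (3 + 4) / 8 = 0.875
precision = precision_score(y_true, y_pred)  # 3 / (3 + 0) = 1.0
f1 = f1_score(y_true, y_pred)                # 2 * 1.0 * 0.75 / 1.75, about 0.857

print(f"accuracy={accuracy:.3f} precision={precision:.3f} f1={f1:.3f}")
```

Accuracy alone can look good on an imbalanced dataset, which is why precision and F1 score are reported alongside it.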
