Leveraging Machine Learning for Breast Cancer Prediction
Presented by: Arwa Marfatia
Introduction
• Machine learning technologies have a wide range of potential uses
in healthcare, from improving patient data, medical research,
diagnosis, and treatment to reducing costs and improving patient
safety.
• Breast cancer is one of the most common cancers in women and is
associated with various clinical, lifestyle, social, and economic
factors.
• Machine learning, with its predictive capabilities, offers a
transformative approach to understanding and predicting breast
cancer in patients.
Through data-driven insights and predictive modeling, this presentation aims to showcase my
Machine Learning Capstone Project focused on predicting breast cancer in the Healthcare
Sector.
Why Healthcare Domain?
Machine learning provides an exciting opportunity in healthcare to improve the
accuracy of diagnoses, personalize healthcare, and find novel solutions to decades-
old problems.
Application of Machine Learning in Healthcare:
• Improve trauma-care response: By creating sensors and devices that can send a
patient’s vital information to the hospital before they arrive via ambulance or
other emergency transport, there is less time between when the patient arrives
and when they are able to receive life-saving treatment.
• Disease prediction: You can use machine learning to find trends, create
connections, and make conclusions based on large data sets. This can include
predicting disease outbreaks in communities and tracking habits leading to
patient disease.
• Visualization of biomedical data: You can use machine learning to create three-
dimensional visualisations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously
unrecognisable symptom patterns and compare them with larger data sets to
diagnose diseases earlier in their development.
Project’s Significance and its Benefits to Healthcare
• Early Diagnosis: Combining multiple risk factors in modeling for breast cancer
prediction could help the early diagnosis of the disease with necessary care plans.
• Collection, storage, and management: Collecting, storing, and managing diverse data, together
with intelligent systems that use multiple factors to predict breast cancer, supports effective disease management.
• Visualization of biomedical data: You can use machine learning to create three-
dimensional visualizations of biomedical data such as RNA sequences, protein
structure, and genomic profiles.
• Improved diagnosis and disease identification: Identify previously unrecognisable
symptom patterns and compare them with larger data sets to diagnose diseases
earlier in their development.
Dataset Information
Here are the key details about the dataset used in this project:
• Number of records: Our dataset is comparatively small, consisting of
569 records. Each record represents a unique entry.
• Features/Columns: The dataset is characterized by a diverse set of features.
Features are computed from a digitized image of a fine needle aspirate (FNA) of
a breast mass. They describe characteristics of the cell nuclei present in the
image. In total, there are 30 features/columns that form the basis of our
predictive modeling.
• Source of the Data: The dataset is sourced from Kaggle. The data's origin
shapes the context of the analysis and helps keep it grounded in real-world
scenarios.
Exploratory Data Analysis (EDA)
• Exploring the data allowed us to gain a comprehensive overview of the
data's structure. It uncovered potential patterns, helped us identify key
trends and get essential insights from the dataset.
• Throughout the EDA process, we analyzed the distribution of individual
features, investigated correlations, and explored any inherent
relationships between variables.
• Visualizations also played a crucial role in providing a clear
representation of the data, offering insights into breast cancer
prediction.
• First, we checked the dataset for null values and duplicates. Only one column contained null
values, and because it was entirely null, it was dropped. Otherwise, the dataset was clean
to begin with.
• Then, we checked whether each column provided useful information for us to
work with. We found that columns like “ID” and “Unnamed 32” weren't contributing
to the predictions. Hence, we decided to drop them during preprocessing.
• Some columns were highly correlated with one another, which could lead to overfitting, so they were dropped.
• To ensure consistent scales for numerical features, we decided to employ Standard Scaler
during preprocessing.
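The cleaning checks described above can be sketched in plain Python. This is only an illustration, not the project's actual code: the three toy records and the "radius_mean" column name are assumptions, and the column names "ID" and "Unnamed 32" are taken from the slides.

```python
# Toy records standing in for rows of the dataset (values are made up).
rows = [
    {"ID": 101, "Diagnosis": "M", "radius_mean": 17.9, "Unnamed 32": None},
    {"ID": 102, "Diagnosis": "B", "radius_mean": 12.4, "Unnamed 32": None},
    {"ID": 103, "Diagnosis": "B", "radius_mean": 11.1, "Unnamed 32": None},
]

def null_counts(rows):
    """Count missing (None) values per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}

def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row (same column order assumed)."""
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r.items())
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes

def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

nulls = null_counts(rows)
# Columns in which every value is missing carry no information.
all_null = [col for col, n in nulls.items() if n == len(rows)]
duplicates = count_duplicates(rows)
# Drop the all-null column and the identifier column.
cleaned = drop_columns(rows, set(all_null) | {"ID"})
```

With a DataFrame library the same checks are usually one-liners; the version above only spells out what they compute.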
Visualizations
Our target variable ‘Diagnosis’ has 357 Benign
(Negative cases) and 212 Malignant (Positive
cases).
The heatmap shows multicollinearity among several of the
columns. As a result, some columns will be dropped.
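One way to turn the heatmap observation into a rule is to compute pairwise Pearson correlations and keep a column only if it is not strongly correlated with a column already kept. The sketch below assumes a threshold of 0.9 and uses made-up columns; the slides do not state which threshold or which columns were used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns: perimeter is almost a multiple of radius.
columns = {
    "radius": [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [63.0, 75.5, 88.0, 100.4, 113.1],
    "texture": [20.0, 15.0, 22.0, 14.0, 19.0],
}
kept = drop_correlated(columns)
```

Here "perimeter" is dropped because it is nearly a linear function of "radius", while "texture" is kept. The order in which columns are visited decides which member of a correlated pair survives.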
Preprocessing
• First, the “ID” and “Unnamed 32” columns were dropped as they didn’t provide any useful
information for our predictions.
• Since there is multicollinearity, columns that were highly correlated with others were
dropped.
• Then, we encoded the categorical data as numerical data with the help of mapping.
Mapping assigns a binary numeric value to each unique class present in a column of
categorical data.
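The mapping step can be sketched as a dictionary lookup. The label letters "M" and "B" are an assumption about how the Diagnosis column is coded; encoding Malignant as 1 follows the slides' description of Malignant as the positive class.

```python
# Assumed raw labels: "M" = Malignant (positive), "B" = Benign (negative).
label_map = {"M": 1, "B": 0}

diagnosis = ["M", "B", "B", "M", "B"]  # made-up sample of the target column

# A plain lookup raises KeyError on an unexpected label,
# which surfaces dirty data instead of silently encoding it.
encoded = [label_map[d] for d in diagnosis]
```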
Splitting the data into X and y
• In this step, we partitioned the dataset into two components: X and y.
• The variable X encompasses all independent variables, representing the features
that contribute to our predictions.
• On the other hand, y encapsulates the dependent variable or target variable,
serving as the outcome we aim to predict.
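As a minimal sketch of this step, the records below are separated into a feature matrix X and a target vector y. The feature names and values are hypothetical; only the target column name "Diagnosis" comes from the slides.

```python
# Preprocessed toy records (target already encoded as 0/1).
rows = [
    {"Diagnosis": 1, "radius_mean": 17.9, "texture_mean": 10.4},
    {"Diagnosis": 0, "radius_mean": 12.4, "texture_mean": 15.7},
    {"Diagnosis": 0, "radius_mean": 11.1, "texture_mean": 14.2},
]

target = "Diagnosis"

# Every column except the target is an independent variable.
feature_names = [col for col in rows[0] if col != target]

X = [[r[col] for col in feature_names] for r in rows]  # one feature row per record
y = [r[target] for r in rows]                          # one label per record
```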
Train-Test Split
• We then split the dataset into training data and testing data.
• We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is
Testing Data. So, our test size was set to 0.2.
• We set the random state to 40, which makes the split reproducible across
different runs.
• We also used Stratify = y so that the training and testing sets both preserve the class
proportions of our target variable (y).
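To show what a stratified 80:20 split does, here is a dependency-free sketch that splits each class separately. It is not the library routine used in the project; the function name is hypothetical and the single placeholder feature is made up. Only the class counts (357 Benign, 212 Malignant), the test size of 0.2, and the seed of 40 come from the slides.

```python
import random
from collections import defaultdict

def stratified_split(X, y, test_size=0.2, seed=40):
    """Split X and y so each class contributes about test_size of its rows to the test set."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

# Class counts from the slides: 357 Benign (0) and 212 Malignant (1).
y = [0] * 357 + [1] * 212
X = [[float(i)] for i in range(len(y))]  # placeholder feature values

X_train, X_test, y_train, y_test = stratified_split(X, y)
```

In this sketch the test set holds 71 Benign and 42 Malignant rows, so its share of Malignant cases stays close to the overall 212/569. A library implementation may round differently and differ by a row.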
Standard Scaler
• We used Standard Scaler to standardize the features of the dataset.
• This rescaled every feature to zero mean and unit variance, so all features are on a comparable scale.
• Standardization is crucial for certain machine learning algorithms, promoting optimal
model performance by mitigating the influence of varying magnitudes among features.
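The computation behind standardization is z = (x - mean) / standard deviation, applied per feature. The sketch below uses the population standard deviation and made-up values. Fitting the statistics on the training data only and reusing them for the test data is a common practice that we assume here; the slides do not say how the scaler was fitted.

```python
from statistics import fmean, pstdev

def fit_scaler(X_train):
    """Compute per-feature mean and population standard deviation."""
    cols = list(zip(*X_train))
    means = [fmean(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds

def transform(X, means, stds):
    """Apply z = (x - mean) / std to every value, feature by feature."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

# Two made-up features on very different scales.
X_train = [[10.0, 200.0], [12.0, 400.0], [14.0, 600.0]]
X_test = [[13.0, 300.0]]

means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)  # reuses the training statistics
```

After scaling, both training features have mean 0 and standard deviation 1, so neither dominates a distance-based model such as SVC merely because of its units.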
Applying Machine Learning Algorithms
Breast cancer prediction is a binary classification problem.
Models used:
• Logistic Regression : Logistic Regression is a powerful tool in binary classification. It's very good at
modeling the probability of an event occurring, making it suitable for scenarios where understanding the
likelihood that a sample is malignant is essential.
• Random Forest : It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
• Decision Tree : A decision tree is a supervised learning algorithm that models decisions based on input
features.
• Support Vector Classification (SVC) : Support Vector Classification is a robust algorithm employed for
classification tasks, especially when there's a need for clear separation between classes.
• Naive Bayes : Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency.
It assumes that features are independent, making calculations easier. It's often used when simplicity and
speed are crucial.
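To illustrate how Logistic Regression models a probability, the sketch below passes a weighted sum of features through the logistic (sigmoid) function and thresholds the result at 0.5. The weights, bias, and feature values are made up for illustration; they are not fitted parameters from the project.

```python
from math import exp

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical parameters and one standardized sample.
weights, bias = [1.5, -0.5], 0.0
sample = [2.0, 1.0]

p = predict_proba(sample, weights, bias)  # linear score z = 2.5
label = int(p >= 0.5)                     # 1 = Malignant, 0 = Benign
```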
Model Selection and Considerations
• SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in
all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to
be a promising model for our task.
• Based on the provided metrics, SVC stands out as the best-performing model overall. It
achieves a good balance between precision and recall, making it suitable for our Breast
Cancer prediction task.
• Hence, we will go with Support Vector Classification as our final model as it is quite
evident that it performs best for our Breast Cancer problem.
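The four metrics used for the comparison can all be derived from the counts of true and false positives and negatives. The sketch below uses made-up labels and predictions purely to show the arithmetic; it does not reproduce the project's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up ground truth and predictions (1 = Malignant, 0 = Benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives. For a screening task, recall deserves particular attention because a false negative is a missed Malignant case.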
Conclusion
• With the help of several insights, patterns and trends in our data, we’ve used Machine
Learning to address the intricate challenge of predicting Breast Cancer.
• This project offers significant benefits to healthcare:
  • Combining multiple risk factors in modeling for breast cancer prediction could help
the early diagnosis of the disease with necessary care plans.
  • Collection, storage, and management of different data and intelligent systems
based on multiple factors for predicting breast cancer are effective in disease
management.
  • The proposed machine-learning approaches could predict breast cancer early. Early
detection of this disease could help slow down its progress and
reduce the mortality rate through appropriate therapeutic interventions at the
right time.
Thank You!

The basics of sentences session 2pptx copy.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 

Breast Cancer Prediction - Arwa Marfatia.pptx

  • 1.
  • 2. Leveraging Machine Learning for Breast Cancer Prediction Presented by: Arwa Marfatia
  • 3. Introduction • Machine learning technologies have a wide range of potential uses in healthcare, from improving patient data, medical research, diagnosis and treatment to reducing costs and improving patient safety. • Breast cancer is one of the most common cancers in women and is influenced by various clinical, lifestyle, social and economic factors. • Machine learning, with its predictive capabilities, offers a transformative approach to understanding and predicting breast cancer in patients. Through data-driven insights and predictive modeling, this presentation showcases my Machine Learning Capstone Project, which focuses on predicting breast cancer in the healthcare sector.
  • 4. Why Healthcare Domain? Machine learning provides an exciting opportunity in healthcare to improve the accuracy of diagnoses, personalize healthcare, and find novel solutions to decades-old problems. Applications of Machine Learning in Healthcare: • Improve trauma-care response: Sensors and devices that send a patient’s vital information to the hospital before the patient arrives by ambulance or other emergency transport shorten the time between arrival and life-saving treatment. • Disease prediction: You can use machine learning to find trends, create connections, and draw conclusions from large data sets. This can include predicting disease outbreaks in communities and tracking habits that lead to disease. • Visualization of biomedical data: You can use machine learning to create three-dimensional visualizations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognizable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 5. Project’s Significance and its Benefits to Healthcare • Early Diagnosis: Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. • Collection, storage, and management of different data, together with intelligent systems based on multiple factors for predicting breast cancer, are effective in disease management. • Visualization of biomedical data: You can use machine learning to create three-dimensional visualizations of biomedical data such as RNA sequences, protein structure, and genomic profiles. • Improved diagnosis and disease identification: Identify previously unrecognizable symptom patterns and compare them with larger data sets to diagnose diseases earlier in their development.
  • 6. Dataset Information Here are the key details about the dataset used in this project: • Number of records: The dataset is comparatively small, with 569 records. Each record represents a unique entry. • Features/Columns: The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. In total, 30 features/columns form the basis of our predictive modeling. • Source of the Data: The dataset is sourced from Kaggle. Because the measurements come from real FNA samples, the analysis is grounded in a real-world scenario.
  • 7. Exploratory Data Analysis (EDA) • Exploring the data allowed us to gain a comprehensive overview of the data's structure. It uncovered potential patterns, helped us identify key trends and get essential insights from the dataset. • Throughout the EDA process, we analyzed the distribution of individual features, investigated correlations, and explored any inherent relationships between variables. • Visualizations also played a crucial role in providing a clear representation of the data, offering insights into breast cancer prediction.
  • 8. Exploratory Data Analysis (EDA) • First, we checked the dataset for null values and duplicates. Only one column contained null values, and since it was entirely null it was dropped. Apart from that, the dataset was clean to begin with. • Then, we checked whether each column provided useful information to work with. We found that columns like “ID” and “Unnamed 32” did not contribute to the predictions, so we decided to drop them during preprocessing. • Some columns were highly correlated with each other, which could lead to overfitting, so they were dropped as well. • To ensure consistent scales for numerical features, we decided to apply Standard Scaler during preprocessing.
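The cleaning checks described on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny DataFrame is a made-up stand-in for the Kaggle CSV, and the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`, `Unnamed: 32`) are assumptions about its layout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kaggle CSV; values and column names are assumptions.
df = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.99, 13.54, 13.08, 20.57],
    "texture_mean": [10.38, 14.36, 15.71, 17.77],
    "Unnamed: 32": [np.nan] * 4,
})

# Count missing values per column and fully duplicated rows.
null_counts = df.isnull().sum()
n_duplicates = int(df.duplicated().sum())

# Columns that contain nothing but nulls carry no information.
all_null_cols = [col for col in df.columns if df[col].isnull().all()]

# Drop the empty column(s) and the identifier, which is not a predictive feature.
cleaned = df.drop(columns=all_null_cols + ["id"])

print(null_counts)
print("duplicates:", n_duplicates)
print("remaining columns:", list(cleaned.columns))
```

Checking for all-null columns programmatically, rather than hard-coding the name, keeps the step valid if the export happens to name the empty column differently.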
  • 9. Visualizations Our target variable ‘Diagnosis’ has 357 Benign (negative) cases and 212 Malignant (positive) cases.
  • 10. The heatmap shows multicollinearity among several columns. As a result, some of the highly correlated columns will be dropped.
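One common way to act on a correlation heatmap is to drop one column from every highly correlated pair. The sketch below shows that idea on synthetic data. The helper name `drop_highly_correlated`, the 0.9 threshold, and the generated columns are all assumptions for illustration; the slides do not state which threshold or columns were used.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(features: pd.DataFrame, threshold: float = 0.9):
    """Drop the later column of each pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


# Synthetic example: perimeter is almost a linear function of radius.
rng = np.random.default_rng(0)
radius = rng.normal(14.0, 3.0, size=200)
features = pd.DataFrame({
    "radius_mean": radius,
    "perimeter_mean": 2 * np.pi * radius + rng.normal(0.0, 0.5, size=200),
    "texture_mean": rng.normal(19.0, 4.0, size=200),
})

reduced, dropped = drop_highly_correlated(features, threshold=0.9)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which member of a correlated pair to keep is a judgment call; this sketch simply keeps the column that appears first.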
  • 11. Preprocessing • First, the “ID” and “Unnamed 32” columns were dropped as they didn’t provide any useful information for our predictions. • Since there is multicollinearity, columns that were highly correlated with others were dropped. • Then, we encoded the categorical data as numerical data using mapping, which assigns a numeric value to each unique class in a categorical column (here, a binary value for the two diagnosis classes). Splitting the data into X and y • In this step, we partitioned the dataset into two components: X and y. • The variable X holds all independent variables, the features that contribute to our predictions. • The variable y holds the dependent (target) variable, the outcome we aim to predict.
  • 12. Train-Test Split • We then split the dataset into training data and testing data. • We used an 80:20 split, meaning 80% of our data is training data and 20% is testing data, so the test size was set to 0.2. • We set the random state to 40, which makes the split reproducible across runs. • We also used Stratify = y so that the classes of the target variable (y) are represented proportionally in both sets. Standard Scaler • We used Standard Scaler to standardize the features of the dataset. • This puts all features on a consistent scale. • Standardization is important for certain machine learning algorithms, because it keeps features with large magnitudes from dominating the model.
  • 13. Applying Machine Learning Algorithms The Breast Cancer Prediction problem is a Binary Classification problem. Models used: • Logistic Regression: Logistic Regression is a powerful tool in binary classification. It is very good at modeling the probability of an event occurring, making it suitable for scenarios where understanding the likelihood of breast cancer is essential. • Random Forest: It is based on ensemble learning, the process of combining multiple classifiers to solve a complex problem and improve the performance of the model. • Decision Tree: A decision tree is a supervised learning algorithm that models decisions based on input features. • Support Vector Machine (SVC): Support Vector Classification is a robust algorithm for classification tasks, especially when there is a need for clear separation between classes. • Naive Bayes: Naive Bayes is a probabilistic classification algorithm known for its simplicity and efficiency. It assumes that features are independent, which makes the calculations easier. It is often used when simplicity and speed are crucial.
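The mapping and the X/y split might look like the sketch below. The small DataFrame and its column names are hypothetical, and the choice of 0 for Benign and 1 for Malignant is an assumption that follows the deck's description of Malignant as the positive class.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned dataset.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M", "B"],
    "radius_mean": [17.99, 13.54, 13.08, 20.57, 9.50],
    "texture_mean": [10.38, 14.36, 15.71, 17.77, 12.44],
})

# Map the two diagnosis classes to binary values (assumed: Benign = 0, Malignant = 1).
df["diagnosis"] = df["diagnosis"].map({"B": 0, "M": 1})

# X holds the independent variables, y the target variable.
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]

print(X.shape, y.tolist())
```

An explicit dictionary makes the encoding visible in the code, which helps later when precision and recall are read with Malignant as the positive class.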
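The split and scaling settings on this slide can be sketched with scikit-learn as shown below. To keep the example self-contained it loads scikit-learn's bundled breast cancer dataset, which also has 569 records and 30 features; using it in place of the Kaggle CSV is an assumption. Fitting the scaler on the training data only is a common practice assumed here, since the slides do not say how the scaler was fitted.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the Kaggle CSV (assumption): 569 records, 30 features.
X, y = load_breast_cancer(return_X_y=True)

# 80:20 split, fixed random state, class proportions preserved via stratify.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=40, stratify=y
)

# Fit the scaler on the training data only, then apply it to both sets,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
print("class-1 share (train, test):", y_train.mean(), y_test.mean())
```

After scaling, every training feature has mean 0 and standard deviation 1, while the test features are only approximately standardized because they reuse the training statistics.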
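As a rough sketch of how the five models could be trained side by side, the example below fits each one inside a scikit-learn pipeline with a standard scaler. Several things here are assumptions rather than the project's setup: the bundled scikit-learn dataset stands in for the Kaggle CSV, all 30 features are kept (the deck dropped some correlated columns), Naive Bayes is taken to be `GaussianNB`, and all hyperparameters are library defaults apart from `max_iter` and `random_state`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# The bundled data encodes malignant as 0; flip it so Malignant = 1 (positive class).
y = 1 - y

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=40, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=40),
    "Decision Tree": DecisionTreeClassifier(random_state=40),
    "SVC": SVC(),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    # Scaling inside the pipeline is fitted on the training data only.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # test-set accuracy

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

The numbers this prints are not the project's reported results; they depend on the feature set and settings assumed above.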
  • 14. Model Selection and Considerations • SVC outperforms Logistic Regression, Random Forest, Decision Tree and Naive Bayes in all metrics, demonstrating higher Accuracy, Precision, Recall, and F1-Score. It seems to be a promising model for our task. • Based on the provided metrics, SVC stands out as the best-performing model overall. It achieves a good balance between precision and recall, making it suitable for our Breast Cancer prediction task. • Hence, we will go with Support Vector Classification as our final model as it is quite evident that it performs best for our Breast Cancer problem.
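For reference, the four metrics named on this slide can be computed from the confusion counts as in the worked example below. The labels are invented for illustration (1 = Malignant, 0 = Benign) and are not the project's predictions; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example: 4 malignant and 6 benign cases,
# with one missed malignant case and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.8; precision, recall and F1 all 0.75
```

In a screening context recall deserves particular attention, because a false negative corresponds to a missed malignant case.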
  • 15. Conclusion • With the help of several insights, patterns and trends in our data, we’ve used Machine Learning to address the intricate challenge of predicting Breast Cancer. • This project offers significant benefits to healthcare: • Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. • Collection, storage, and management of different data, together with intelligent systems based on multiple factors for predicting breast cancer, are effective in disease management. • The proposed machine-learning approaches could predict breast cancer early. Early detection could help slow the progress of the disease and reduce the mortality rate through appropriate therapeutic interventions at the right time.