Semantic Analysis of Online Health Information
Seeking for Cardiovascular Diseases
Ashutosh Jadhav
ashutosh@knoesis.org
AMIA 2014 Annual Symposium
Washington, DC
Disclosure
• The speaker discloses that he has no relationships with commercial interests.
Collaborators
Prof. Amit Sheth (PhD Advisor)
Kno.e.sis Center, Wright State
University, OH, USA
Dr. Jyotishman Pathak (Mentor)
Mayo Clinic, Rochester, MN, USA
Internet Users in the World
• Around 3 billion (40%) of the world population
• Around 300 million (87%) of the US population
Source: http://www.internetlivestats.com/internet-users/
Online Health Information Seeking
Online Health Resources
Online Health Information Seeking
According to the Pew survey, approximately 8 in 10 online
health inquiries start at a search engine.
Fox S, Duggan M. Pew Internet & American Life Project. 2013. Health online 2013
Cardiovascular Diseases (CVD): Use Case
• According to the Centers for Disease Control and
Prevention, in the United States
– CVD is one of the most common chronic
diseases
– CVD is the leading cause of death (1 in every 4 deaths)
• CVD is common across all socioeconomic
groups and demographics
• Online health resources are a “significant
information supplement” for patients with
chronic conditions
Motivation
• Although cardiovascular diseases (CVD) affect a large
percentage of the population, few studies have
investigated what and how users search for CVD related
information online
• Such knowledge can be applied to improve the online
health search experience as well as to develop more
advanced next-generation knowledge and content
delivery systems
Methods Overview
Dataset Creation
• Data:
– CVD-related search queries (SQ)
– Limited to the United States
• Data timeframe:
– September 2011 to August 2013
• Data collection tool:
– IBM NetInsight On Demand (web analytics tool)
• Dataset size:
– 10 million CVD-related search queries
– A significantly large dataset for a single class of diseases
Top CVD Search Queries
Top 1-5 Queries                 Top 6-10 Queries
heart attack symptom            congestive heart failure
blood pressure chart            low blood pressure
how to lower blood pressure     stroke symptoms
heart rate                      normal blood pressure
broken heart syndrome           high blood pressure symptoms
Health Categories
• Selected 14 consumer-oriented health categories
representing health information needs
• Methods
– Focus group study (published in JMIR)
– Online health information seeking literature
– Empirical data analysis
– Health categories on popular health websites
• The health categories and the classification scheme were reviewed and
validated by Mayo Clinic clinicians and domain experts.
Health Categories
 1  Symptoms                   8  Living with
 2  Causes                     9  Prevention
 3  Risks & Complications     10  Side effects
 4  Drugs and Medications     11  Medical devices
 5  Treatments                12  Diseases and conditions
 6  Tests and Diagnosis       13  Age-group References
 7  Food and Diet             14  Vital signs
Health Categories Example
Drugs and Medications: tylenol raise blood pressure, ibuprofen heart rate,
dextromethorphan blood pressure, medications pulmonary hypertension
Search Query                         Health Categories
Heart palpitations with headache     Symptoms
Tylenol and blood pressure           Medication, Vital sign
Pump for pulmonary hypertension      Medical device, Disease
Red wine heart disease               Food, Disease
Bypass surgery                       Treatment
Classification: Possible Approaches
• Statistical machine learning algorithms
– Require training data
– A multiclass classification problem with 14 classes
needs a lot of training data
– Training data
• is expensive to create, as it must be created manually by
domain experts
• has limited coverage
– Do not consider the semantics of queries
Domain Constraint
A classifier trained for one disease may
not work for other diseases, as the
symptoms, treatments, drugs and
medications vary by disease
Background Knowledge
• UMLS (Unified Medical Language System)
– Comprises over 1 million biomedical concepts and 5
million concept names
– Incorporates a variety of medical vocabularies and concepts,
and maps each concept to semantic types
– Contains the Consumer Health Vocabulary (CHV)
• Hair loss => Alopecia
– Updated quarterly with new concepts
Semantic Analysis
• UMLS Semantic Type
– Example: symptom or sign, disease or syndrome
• UMLS Concepts
– Example: blood pressure, heart rate
• UMLS MetaMap
– Tool for recognizing UMLS concepts in the text
MetaMap Usage Challenge and Solution
Hadoop-MapReduce framework with 16 Nodes
Functional overview of a mapper
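A minimal sketch of what one mapper might do, assuming each mapper receives a share of the search queries and annotates them one by one. The function names `map_queries` and `fake_annotate`, the tab-separated output layout, and the lookup used in place of MetaMap are all hypothetical; the real MetaMap invocation is not shown here.

```python
def map_queries(lines, annotate):
    """Yield one tab-separated record per non-empty query line.

    `annotate` is any callable returning (semantic_types, concepts) for a
    query; in a real job it would wrap a MetaMap call (assumption).
    """
    for line in lines:
        query = line.strip()
        if not query:
            continue  # skip blank input lines
        semantic_types, concepts = annotate(query)
        yield "\t".join([
            query,
            ",".join(sorted(semantic_types)),
            ",".join(sorted(concepts)),
        ])


def fake_annotate(query):
    """Hypothetical stand-in for MetaMap: a hand-written lookup."""
    if "hypertension" in query:
        return {"DSYN"}, {query}
    return set(), set()


records = list(map_queries(["pulmonary hypertension\n", "\n"], fake_annotate))
print(records)
```

Passing the annotator in as a parameter keeps the mapper logic testable without a MetaMap installation.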
Gold Standard Dataset Creation
• Randomly selected 2000 search queries from the analysis
dataset.
• Two domain experts manually annotated the 2000 search queries,
labeling each search query with zero, one, or more than one
health category.
• The annotators first discussed and agreed upon the annotation
scheme.
• To reduce the probability of human error and subjectivity, the
two annotators discussed and annotated each query together,
creating a gold standard dataset of 2000 search queries.
• The gold standard dataset was further divided into training and
testing datasets with 1000 search queries each.
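The sampling and splitting steps can be sketched as follows. This assumes a uniform random sample and a simple half-and-half split; the slide does not say how the division was actually made, and `sample_and_split` and its parameters are hypothetical names. In practice the manual annotation happens between sampling and splitting.

```python
import random


def sample_and_split(queries, sample_size=2000, seed=0):
    """Draw `sample_size` queries without replacement, then halve them."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(queries, sample_size)
    half = sample_size // 2
    return sample[:half], sample[half:]


# Placeholder queries; a real run would use the analysis dataset.
all_queries = [f"query {i}" for i in range(5000)]
training, testing = sample_and_split(all_queries)
print(len(training), len(testing))
```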
Health Category: Drugs and Medications
Categorization Rule
• ST: ORCH|PHSU, CLND, PHSU
• CC: medication, medicine, drugs, dose, dosage, tablet, pill
• KW: meds
• without CC: alcohol, caffeine, fruit, prevent
Example
• Tylenol raise blood pressure
• Medications pulmonary hypertension
• ibuprofen heart rate
• Dextromethorphan blood pressure
Intent classes and their rules: UMLS Semantic Types (ST), UMLS Concepts (CC) and Keywords (KW)
Symptoms
– ST: SOSY; CC: symptoms, signs
Causes
– CC: cause, reason
Risks & Complications
– CC: risk, complications
Drugs and Medications
– ST: ORCH|PHSU, CLND, PHSU; CC: medication, medicine, drugs, dose, dosage, tablet, pill; KW: meds (without CC: alcohol, caffeine, fruit, prevent)
Treatments
– ST: TOPP, FTCN (treatment, surgery), CNCE (treatment); CC: remedy, remediate (without CC: prevention and ‘Drugs and Medication’ queries)
Tests and Diagnosis
– ST: DIAP, LBPR, LBTR; CC: test, diagnosis (without ST: DIAP|TOPP, CC: alcohol, blood caffeine)
Food and Diet
– ST: FOOD; CC: caffeine, recipe, meal, menu, diet, eat, breakfast, lunch, dinner, alcohol, drink
Living with
– CC: control, manage, reduce, lower, coping, cure, recover; KW: living with, bring down, low down
Prevention
– CC: prevent, avoidance, low risk
Side effects
– CC: side effect; KW: side effect
Medical devices
– ST: MEDD
Diseases and conditions
– ST: DSYN
Age-group References
– ST: AGGP
Vital signs
– CC: blood pressure, heart rate, pulse rate, temperature, heart beat, blood glucose (without high/low blood pressure, which are considered under ‘Diseases and Conditions’)
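A rough sketch of how rules like these could be applied, using a subset of the categories. Several things here are assumptions rather than the authors' implementation: the `Rule` structure, the `classify` function, treating ST, CC and KW as alternatives (any match assigns the category), treating the "without" lists as hard exclusions, and omitting the compound type ORCH|PHSU. The semantic types and concepts passed in are hand-written stand-ins for MetaMap output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    semantic_types: frozenset = frozenset()
    concepts: frozenset = frozenset()
    keywords: tuple = ()
    excluded_concepts: frozenset = frozenset()


# Subset of the rule table; abbreviations are copied from the slide.
RULES = {
    "Symptoms": Rule(
        semantic_types=frozenset({"SOSY"}),
        concepts=frozenset({"symptoms", "signs"}),
    ),
    "Drugs and Medications": Rule(
        semantic_types=frozenset({"PHSU", "CLND"}),
        concepts=frozenset({"medication", "medicine", "drugs", "dose",
                            "dosage", "tablet", "pill"}),
        keywords=("meds",),
        excluded_concepts=frozenset({"alcohol", "caffeine", "fruit", "prevent"}),
    ),
    "Medical devices": Rule(semantic_types=frozenset({"MEDD"})),
    "Diseases and conditions": Rule(semantic_types=frozenset({"DSYN"})),
    "Age-group References": Rule(semantic_types=frozenset({"AGGP"})),
}


def classify(text, semantic_types, concepts):
    """Return every category whose rule matches the annotated query."""
    text = text.lower()
    types = {t.upper() for t in semantic_types}
    found = {c.lower() for c in concepts}
    labels = []
    for category, rule in RULES.items():
        if rule.excluded_concepts & found:
            continue  # a "without CC" term blocks the category
        if (rule.semantic_types & types
                or rule.concepts & found
                or any(k in text for k in rule.keywords)):  # substring match
            labels.append(category)
    return labels


labels = classify("pump for pulmonary hypertension",
                  {"MEDD", "DSYN"}, {"pump", "pulmonary hypertension"})
print(labels)
```

Because a query can satisfy several rules, the function returns a list, which matches the multi-label examples such as "Pump for pulmonary hypertension" being both a medical device and a disease query.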
Evaluation: Micro-Average Precision and Recall
• Classified 1000 search queries from the testing dataset
using the rule-based classifier
• On this testing dataset, the classification approach achieved
the following micro-averaged scores:
– Precision: 0.8842
– Recall: 0.8642
– F-score: 0.8723
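For reference, micro-averaging over multi-label output pools true positives, false positives and false negatives across all queries before computing the scores. The sketch below uses the standard definitions on made-up toy labels; `micro_prf` is a hypothetical helper and the data are not from the study.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F-score for label sets."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(predicted_labels)
        tp += len(g & p)   # labels correctly assigned
        fp += len(p - g)   # labels assigned but not in the gold standard
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy example: three queries, the last one has no gold label.
gold = [{"Symptoms"}, {"Medication", "Vital sign"}, set()]
predicted = [{"Symptoms"}, {"Medication"}, {"Disease"}]
precision, recall, f_score = micro_prf(gold, predicted)
print(precision, recall, f_score)
```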
Evaluations: Precision and Recall
Analysis for each Health Category
Results
No   Intent Class              Total Queries   Percentage Distribution
 1   Diseases                  4,232,398       40.66
 2   Vital signs               3,455,809       33.20
 3   Symptoms                  1,422,826       13.67
 4   Living with               1,178,756       11.32
 5   Treatments                  955,701        9.18
 6   Food and Diet               779,949        7.49
 7   Med Devices                 665,484        6.39
 8   Drugs and Medications       603,905        5.80
 9   Causes                      599,895        5.76
10   Tests & Diagnosis           344,747        3.31
11   Risks and Complication      277,294        2.66
12   Prevention                  136,428        1.31
13   Age-group References         87,929        0.84
14   Side effects                 25,655        0.25
     Total                    10,408,921      100
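The percentages appear to be each class count divided by the 10,408,921 total queries, so they can add up to more than 100 when a query falls into several classes. A small check of that reading, using three rows from the table (`percentage` is a hypothetical helper name):

```python
TOTAL_QUERIES = 10_408_921

# Three rows copied from the results table.
counts = {
    "Diseases": 4_232_398,
    "Vital signs": 3_455_809,
    "Side effects": 25_655,
}


def percentage(count, total=TOTAL_QUERIES):
    """Share of all queries, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {name: percentage(count) for name, count in counts.items()}
print(shares)
```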
Results
Distribution of search queries by number of
intent classes in which they are categorized
Number of intent classes    Share of queries
0                           8%
1                           48%
2                           40%
3                           4%
4 and 5                     0%
Data Analysis Results
• Average search query length for CVD is 3.88 words and 22.22 characters
• Around 80% of the CVD search queries have 3 or more words.
• CVD search queries are longer than previously reported non-medical as well
as medical queries
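Length statistics of this kind can be computed as in the sketch below. It assumes words are separated by whitespace and that spaces count as characters; the slide does not state either convention. The sample queries are taken from the top-query list, and `query_length_stats` is a hypothetical helper.

```python
def query_length_stats(queries):
    """Average words, average characters, and share with 3+ words."""
    if not queries:
        raise ValueError("at least one query is required")
    word_counts = [len(q.split()) for q in queries]
    char_counts = [len(q) for q in queries]  # spaces included (assumption)
    n = len(queries)
    return {
        "avg_words": sum(word_counts) / n,
        "avg_chars": sum(char_counts) / n,
        "share_3_plus_words": sum(c >= 3 for c in word_counts) / n,
    }


stats = query_length_stats([
    "heart rate",
    "blood pressure chart",
    "how to lower blood pressure",
    "stroke symptoms",
])
print(stats)
```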
Discussion and Conclusion
• We found the use of MetaMap and UMLS concepts/semantic types
to be a very good approach for customized health categorization.
• The top searched health categories for CVD are ‘Diseases and
Conditions’, ‘Vital Signs’, ‘Symptoms’, and ‘Living with’.
• Most of the queries (around 88%) are categorized into either one
or two health categories.
• To the best of our knowledge, there is not much research on
understanding online health information searching for chronic
diseases and especially for CVD.
• This study addresses this knowledge gap and extends our
knowledge about online health information search behavior.
Thanks!
Ashutosh Jadhav
ashutosh@knoesis.org

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases, Ashutosh Jadhav, AMIA 2014 Annual Symposium

  • 1. Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases. Ashutosh Jadhav (ashutosh@knoesis.org), AMIA 2014 Annual Symposium, Washington, DC
  • 2. Disclosure • The speaker discloses that he has no relationships with commercial interests.
  • 3. Collaborators • Prof. Amit Sheth (PhD Advisor), Kno.e.sis Center, Wright State University, OH, USA • Dr. Jyotishman Pathak (Mentor), Mayo Clinic, Rochester, MN, USA
  • 4. Internet Users in the World • Around 3 billion (40%) of the world population • Around 300 million (87%) of the US population • Source: http://www.internetlivestats.com/internet-users/
  • 7. Online Health Information Seeking • According to the Pew Survey, approximately 8 in 10 online health inquiries start at a search engine. Fox S, Duggan M. Pew Internet & American Life Project. 2013. Health online 2013
  • 8. Cardiovascular Diseases (CVD) Use-case • According to the Centers for Disease Control and Prevention, in the United States – CVD is one of the most common chronic diseases – CVD is the leading cause of death (1 in every 4 deaths) • CVD is common across all socioeconomic groups and demographics • Online health resources are a “significant information supplement” for patients with chronic conditions
  • 9. Motivation • Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated what and how users search for CVD related information online • Such knowledge can be applied to improve the online health search experience as well as to develop more advanced next-generation knowledge and content delivery systems
  • 11. Dataset Creation • Data: – CVD-related search queries – Limited to the United States • Data timeframe: – September 2011 to August 2013 • Data collection tool: – IBM NetInsight On Demand (Web Analytics tool) • Dataset size: – 10 million CVD-related search queries (SQ) – A significantly large dataset for a single class of diseases
  • 12. Top CVD Search Queries (slide columns: Top 1-5 Queries, Top 6-10 Queries): heart attack symptom; congestive heart failure; blood pressure chart; low blood pressure; how to lower blood pressure; stroke symptoms; heart rate; normal blood pressure; broken heart syndrome; high blood pressure symptoms
  • 13. Health Categories • Selected 14 “consumer oriented” health categories, representing health information needs • Methods – Focus group study (published in JMIR) – Online health information seeking literature – Empirical data analysis – Health categories on popular health websites • The health categories and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts.
  • 14. Health Categories: 1 Symptoms; 2 Causes; 3 Risks & Complications; 4 Drugs and Medications; 5 Treatments; 6 Tests and Diagnosis; 7 Food and Diet; 8 Living with; 9 Prevention; 10 Side effects; 11 Medical devices; 12 Diseases and conditions; 13 Age-group References; 14 Vital signs. Example queries for Drugs and Medications: tylenol raise blood pressure; ibuprofen heart rate; dextromethorphan blood pressure; medications pulmonary hypertension
  • 15. Health Categories Example (Search Query => Health Categories): Heart palpitations with headache => Symptoms; Tylenol and blood pressure => Medication, Vital sign; Pump for pulmonary hypertension => Medical device, Disease; Red wine heart disease => Food, Disease; Bypass surgery => Treatment
  • 16. Classification: Possible Approaches • Statistical machine learning algorithms – Require training data – A multiclass classification problem with 14 classes needs a lot of training data – Training data is expensive to create, as it must be created manually by domain experts, and its coverage will be limited – Do not consider the semantics of queries
  • 17. Domain Constraint • A classifier trained for one disease may not work for other diseases, as the symptoms, treatments, drugs, and medications vary by disease
  • 18. Background Knowledge • UMLS (Unified Medical Language System) – Comprises over 1 million biomedical concepts and 5 million concept names – Incorporates a variety of medical vocabularies and concepts, and maps each concept to semantic types – Contains the Consumer Health Vocabulary (CHV), e.g., Hair loss => Alopecia – Updated quarterly with new concepts
  • 19. Semantic Analysis • UMLS Semantic Type – Example: symptom or sign, disease or syndrome • UMLS Concepts – Example: blood pressure, heart rate • UMLS MetaMap – A tool for recognizing UMLS concepts in text
  • 20. MetaMap Usage Challenge and Solution • Hadoop-MapReduce framework with 16 nodes • (Figure: functional overview of a mapper)
  • 21. Gold Standard Dataset Creation • Randomly selected 2000 search queries from the analysis dataset. • Two domain experts manually annotated the 2000 search queries, labeling each search query with zero, one, or more than one health category. • The annotators first discussed and agreed upon the annotation scheme. • To reduce the probability of human error and subjectivity, the two annotators discussed and annotated each query together, creating a gold standard dataset of 2000 search queries. • The gold standard dataset was further divided into training and testing datasets with 1000 search queries each.
  • 22. Categorization Rule Example
      – Health category: Drugs and Medications
      – Categorization rule: ST: ORCH|PHSU, CLND, PHSU; CC: medication, medicine, drugs, dose, dosage, tablet, pill; KW: meds; without CC: alcohol, caffeine, fruit, prevent
      – Example queries: Tylenol raise blood pressure; Medications pulmonary hypertension; ibuprofen heart rate; Dextromethorphan blood pressure
  • 23. Intent classes with their UMLS Semantic Types (ST), UMLS Concepts (CC) and Keywords (KW)
      – Symptoms: ST: SOSY; CC: symptoms, signs
      – Causes: CC: cause, reason
      – Risks & Complications: CC: risk, complications
      – Drugs and Medications: ST: ORCH|PHSU, CLND, PHSU; CC: medication, medicine, drugs, dose, dosage, tablet, pill; KW: meds (without CC: alcohol, caffeine, fruit, prevent)
      – Treatments: ST: TOPP, FTCN (treatment, surgery), CNCE (treatment); CC: remedy, remediate (without CC: prevention and ‘Drugs and Medication’ queries)
      – Tests and Diagnosis: ST: DIAP, LBPR, LBTR; CC: Test, diagnosis (without ST: DIAP|TOPP, CC: alcohol, blood caffeine)
      – Food and Diet: ST: FOOD; CC: caffeine, recipe, meal, menu, diet, eat, breakfast, lunch, dinner, alcohol, drink
      – Living with: CC: control, manage, reduce, lower, coping, cure, recover; KW: living with, bring down, low down
      – Prevention: CC: prevent, avoidance, low risk
      – Side effects: CC: side effect; KW: side effect
      – Medical devices: ST: MEDD
      – Diseases and conditions: ST: DSYN
      – Age-group References: ST: AGGP
      – Vital signs: CC: blood pressure, heart rate, pulse rate, temperature, Heart beat, blood glucose (without high/low blood pressure, as we considered them under ‘Diseases and Conditions’)
  • 24. Evaluation: Micro-Average Precision and Recall • Classified 1000 search queries from the testing dataset using the rule-based classifier • Based on the evaluation, our classification approach has very good micro-average scores – Precision: 0.8842 – Recall: 0.8642 – F-Score: 0.8723
  • 25. Evaluations: Precision and Recall Analysis for each Health Category (figure)
  • 26. Results
      No   Intent class              Total queries   Percentage distribution
      1    Diseases                  4,232,398       40.66
      2    Vital signs               3,455,809       33.20
      3    Symptoms                  1,422,826       13.67
      4    Living with               1,178,756       11.32
      5    Treatments                955,701         9.18
      6    Food and Diet             779,949         7.49
      7    Med Devices               665,484         6.39
      8    Drugs and Medications     603,905         5.80
      9    Causes                    599,895         5.76
      10   Tests & Diagnosis         344,747         3.31
      11   Risks and Complication    277,294         2.66
      12   Prevention                136,428         1.31
      13   Age-group References      87,929          0.84
      14   Side effects              25,655          0.25
           Total                     10,408,921      100
  • 27. Results: Distribution of search queries by number of intent classes in which they are categorized: 0 classes: 8%; 1 class: 48%; 2 classes: 40%; 3 classes: 4%; 4 and 5 classes: 0%
  • 29. Data Analysis Results • The average search query length for CVD is 3.88 words and 22.22 characters • Around 80% of the CVD search queries have 3 or more words • CVD search queries are longer than previously reported non-medical as well as medical queries
  • 30. Discussion and Conclusion • We found the use of MetaMap and UMLS concepts/semantic types to be a very good approach for customized health categorization • The top searched health categories for CVD are ‘Diseases and Conditions’, ‘Vital Signs’, ‘Symptoms’, and ‘Living with’. • Most of the queries (around 88%) are categorized into either one or two health categories. • To the best of our knowledge, there is not much research on understanding online health information searching for chronic diseases, and especially for CVD. • This study addresses this knowledge gap and extends our knowledge about online health information search behavior.
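
The rule table on slide 23 can be read as a small multi-label classifier: a query receives a health category when one of its UMLS semantic types, or one of the listed terms, matches that category's rule. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the semantic types for a query have already been obtained (for example from MetaMap output) and are passed in as plain strings, it approximates concept and keyword matching with simple substring checks, it covers only a handful of the 14 categories, and it omits the "without" exclusions. The names `Rule`, `RULES`, and `categorize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # UMLS semantic type abbreviations (ST) that trigger the category.
    semantic_types: frozenset = frozenset()
    # Lowercase terms standing in for concept (CC) and keyword (KW) matches.
    terms: frozenset = frozenset()


# Illustrative subset of the slide's rule table (assumption: simplified).
RULES = {
    "Symptoms": Rule(frozenset({"SOSY"}), frozenset({"symptom"})),
    "Food and Diet": Rule(frozenset({"FOOD"}), frozenset({"diet", "recipe"})),
    "Prevention": Rule(terms=frozenset({"prevent"})),
    "Medical devices": Rule(frozenset({"MEDD"})),
    "Diseases and conditions": Rule(frozenset({"DSYN"})),
}


def categorize(query, semantic_types):
    """Return the set of health categories whose rule matches the query.

    `semantic_types` is the collection of UMLS semantic type abbreviations
    already extracted for the query by some upstream step.
    """
    text = query.lower()
    found = {st.upper() for st in semantic_types}
    labels = set()
    for category, rule in RULES.items():
        type_match = bool(rule.semantic_types & found)
        term_match = any(term in text for term in rule.terms)
        if type_match or term_match:
            labels.add(category)
    return labels


if __name__ == "__main__":
    print(categorize("pump for pulmonary hypertension", {"MEDD", "DSYN"}))
    print(categorize("cardiac surgeon", set()))
```

A query can end up with zero, one, or several labels, which matches the annotation scheme described for the gold standard dataset. Substring matching is a crude stand-in for concept recognition and would over-match in practice.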

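Slide 24 reports micro-averaged precision, recall, and F-score for a classifier that may assign several categories to one query. As a rough sketch of how micro-averaging is commonly computed for multi-label output, the function below pools true positives, false positives, and false negatives over all queries before dividing. The function name `micro_prf` and the toy labels are hypothetical, and the slides do not state the exact formula the authors used.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    `gold` and `predicted` are equal-length sequences of label sets,
    one set per query.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, predicted):
        tp += len(gold_labels & pred_labels)   # labels correctly assigned
        fp += len(pred_labels - gold_labels)   # labels assigned but not in gold
        fn += len(gold_labels - pred_labels)   # gold labels that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the study.
    gold = [{"Symptoms"}, {"Drugs and Medications", "Vital signs"}, set()]
    pred = [{"Symptoms"}, {"Drugs and Medications"}, {"Causes"}]
    print(micro_prf(gold, pred))
```

Pooling the counts means frequent categories weigh more heavily than rare ones, which is why the per-category analysis on slide 25 is a useful complement.
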
Editor's Notes

  1. Over the last decade, Internet literacy and the number of Internet users have increased exponentially.
  2. With the growing availability of the Internet and online health resources, consumers are increasingly using the Internet to seek health-related information.
  3. Online health resources are easily accessible and provide information about most of the health topics. These resources can help non-experts to make more informed decisions and play a vital role in improving health literacy.
  4. One of the most common ways to seek online health information is via Web search engines such as Google, Yahoo! and Bing. Therefore, studying health-related search logs can help us understand what health topics Online Health Information Seekers (OHIS) search for (“information need”) and how they formulate search queries (“expression of information need”).
  5. While selecting the intent classes, we studied the health categories on popular health websites (e.g., Mayo Clinic, WebMD, etc.) and the types of information frequently mentioned along with CVD search queries. Note that some of the intent classes may overlap, but in our analysis we treated them as separate intent classes in order to gain a fine-grained understanding of search intent. These intent classes and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts.
  7. National Library of Medicine (NLM). Approach based on medical background knowledge (UMLS).
  8. MetaMap is a tool for recognizing UMLS concepts in text. For a given search query, MetaMap identifies one or more UMLS concepts, their semantic types, Concept Unique Identifiers (CUIs), and other details. MetaMap uses a knowledge-intensive approach based on natural-language processing (NLP) and computational-linguistic techniques.
  9. To check the performance of the classification approach for individual health categories.
  10. One in every two searches is related to either ‘Diseases and Conditions’ or ‘Vital signs’. Other popular health categories that users search for include ‘Symptoms’, ‘Living with’, ‘Treatments’, ‘Food and Diet’ and ‘Causes’. Although CVD can be prevented with some lifestyle and diet changes, interestingly very few OHISs search for CVD ‘Prevention’.
  11. Our approach did not categorize 8.13% of the queries into any health category. After studying the uncategorized search queries, we found that a few queries do not fit into any of the selected 14 categories, such as cardiac surgeon, cardiology mayo, video on cardiovascular, pediatric cardiology, and orthostatic. A search query can be categorized into zero, one, or more health categories. Using our categorization approach, we categorized 92% of the 10 million CVD-related queries into at least one health category. Most of the queries (around 88%) are categorized into either one or two categories. Very few CVD queries (4.28%) are categorized into 3 or more categories.
  12. Users predominantly formulate search queries using keywords (80%), though queries with Wh-questions are also significant. Few queries (2.5%) are formulated as Yes/No questions. In Wh-questions, OHISs mostly use “How” and “What” in the search queries, and both generally signify that more descriptive information is needed. Yes/No questions are usually used to check some factual information. In Yes/No questions, OHISs most often start the search queries with “does”, “can”, and “is”.
  13. Longer search queries also denote users’ interest in more specific information about the disease; subsequently users use more words to narrow down to a particular health topic.
  14. of the health related search queries, as UMLS incorporates a variety of medical vocabularies and concepts and maps each concept to semantic types. However, for customized categorization, we have to carefully select or eliminate UMLS semantic types and concepts, considering how well their scope aligns with the desired categories.
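
The query-form breakdown in note 12 (keyword queries, Wh-questions, Yes/No questions) and the length figures on slide 29 can be approximated with a few lines of string handling. The sketch below is a simplification under stated assumptions: it looks only at the first word of a query, the word lists are illustrative rather than the ones used in the study, and the names `query_form` and `length_stats` are hypothetical.

```python
# Illustrative word lists (assumption: not the study's actual lists).
WH_WORDS = {"how", "what", "why", "when", "where", "which", "who"}
YES_NO_STARTERS = {"does", "can", "is", "do", "are", "should", "will"}


def query_form(query):
    """Label a query as 'wh_question', 'yes_no_question', or 'keyword'."""
    words = query.lower().split()
    if not words:
        return "keyword"
    first = words[0]
    if first in WH_WORDS:
        return "wh_question"
    if first in YES_NO_STARTERS:
        return "yes_no_question"
    return "keyword"


def length_stats(queries):
    """Return (average words per query, average characters per query)."""
    if not queries:
        raise ValueError("queries must not be empty")
    total_words = sum(len(q.split()) for q in queries)
    total_chars = sum(len(q) for q in queries)
    return total_words / len(queries), total_chars / len(queries)


if __name__ == "__main__":
    sample = [
        "how to lower blood pressure",
        "does tylenol raise blood pressure",
        "blood pressure chart",
    ]
    for q in sample:
        print(q, "->", query_form(q))
    print(length_stats(sample))
```

Checking only the first word misses questions phrased mid-query, so the counts from a sketch like this would be a lower bound on question-style queries.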