Our paper presented at AMIA 2014 Annual Symposium. Paper is available at:
http://www.knoesis.org/library/resource.php?id=2002
Citation: Ashutosh Jadhav, Amit Sheth, Jyotishman Pathak 'Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal', AMIA Annual Symposium 2014, Washington DC, November 15-19, 2014
Semantic Analysis of Online Health Information Seeking for Cardiovascular Diseases, Ashutosh Jadhav, AMIA 2014 Annual Symposium
1. Semantic Analysis of Online Health Information
Seeking for Cardiovascular Diseases
Ashutosh Jadhav
ashutosh@knoesis.org
AMIA 2014 Annual Symposium
Washington, DC
2. Disclosure
• The speaker discloses that he has no relationships with commercial interests.
3. Collaborators
Prof. Amit Sheth (PhD Advisor)
Kno.e.sis Center, Wright State
University, OH, USA
Dr. Jyotishman Pathak (Mentor)
Mayo Clinic, Rochester, MN, USA
7. Online Health Information Seeking
According to the Pew Survey, approximately 8 in 10 online
health inquiries initiate from a search engine.
Fox S, Duggan M. Pew Internet & American Life Project. 2013. Health online 2013
8. Cardiovascular Diseases (CVD) Use-case
• According to the Centers for Disease Control and Prevention, in the United States
– CVD is one of the most common chronic diseases
– CVD is the leading cause of death (1 in every 4 deaths)
• CVD is common across all socioeconomic groups and demographics
• Online health resources are a “significant information supplement” for patients with chronic conditions
9. Motivation
• Although cardiovascular diseases (CVD) affect a large
percentage of the population, few studies have
investigated what and how users search for CVD related
information online
• Such knowledge can be applied to improve the online
health search experience as well as to develop more
advanced next-generation knowledge and content
delivery systems
11. Dataset Creation
• Data:
– CVD related search queries (SQ)
– Limited to the United States
• Data timeframe:
– September 2011 to August 2013
• Data collection tool:
– IBM NetInsight On Demand (Web Analytics tool)
• Dataset size:
– 10 million CVD related search queries
– A significantly large dataset for a single class of diseases
12. Top CVD Search Queries
Rank  Query
1     heart attack symptom
2     blood pressure chart
3     how to lower blood pressure
4     heart rate
5     broken heart syndrome
6     congestive heart failure
7     low blood pressure
8     stroke symptoms
9     normal blood pressure
10    high blood pressure symptoms
13. Health Categories
• Selected 14 “consumer-oriented” health categories
representing health information needs
• Methods
– Focus group study (Published in JMIR)
– Online health information seeking literature
– Empirical data analysis
– Health categories on popular health websites
• The health categories and the classification scheme were reviewed and
validated by Mayo Clinic clinicians and domain experts.
14. Health Categories
#   Health Category
1   Symptoms
2   Causes
3   Risks & Complications
4   Drugs and Medications
5   Treatments
6   Tests and Diagnosis
7   Food and Diet
8   Living with
9   Prevention
10  Side effects
11  Medical devices
12  Diseases and conditions
13  Age-group References
14  Vital signs
Drugs and Medications: tylenol raise blood pressure, ibuprofen heart rate,
dextromethorphan blood pressure, medications pulmonary hypertension,
15. Health Categories Example
Search Query                        Health Categories
Heart palpitations with headache    Symptoms
Tylenol and blood pressure          Medication, Vital sign
Pump for pulmonary hypertension     Medical device, Disease
Red wine heart disease              Food, Disease
Bypass surgery                      Treatment
16. Classification: Possible Approaches
• Statistical machine learning algorithms
– Require training data
– A multiclass classification problem with 14 classes needs a lot of
training data
– Training data
• is expensive to create, as it must be created manually by
domain experts
• has limited coverage
– Do not consider the semantics of queries
17. Domain Constraint
A classifier trained for one disease may
not work for other diseases, as the
symptoms, treatments, drugs, and
medications vary by disease
18. Background Knowledge
• UMLS (Unified Medical Language System)
– Comprises over 1 million biomedical concepts and 5
million concept names
– Incorporates a variety of medical vocabularies and concepts,
and maps each concept to semantic types
– Contains the Consumer Health Vocabulary (CHV)
• Hair loss => Alopecia
– Updated quarterly with new concepts
19. Semantic Analysis
• UMLS Semantic Type
– Example: symptom or sign, disease or syndrome
• UMLS Concepts
– Example: blood pressure, heart rate
• UMLS MetaMap
– Tool for recognizing UMLS concepts in the text
20. MetaMap Usage Challenge and Solution
Hadoop-MapReduce framework with 16 Nodes
Functional overview of a mapper
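The mapper figure from this slide is not part of the transcript. As a rough illustration only, one mapper could take a shard of search queries, annotate each query, and emit one record per recognized concept. The sketch below assumes a tab-separated output format and uses `toy_annotate` with a tiny `TOY_LEXICON` as a hypothetical stand-in for a MetaMap call; none of these names come from the slides.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical toy lexicon standing in for MetaMap output (term -> semantic type).
# A real mapper would call a MetaMap instance available on the worker node.
TOY_LEXICON = {
    "heart disease": "DSYN",
    "headache": "SOSY",
    "red wine": "FOOD",
}


def toy_annotate(query: str) -> List[Tuple[str, str]]:
    """Return (concept, semantic type) pairs found in the query (toy stand-in)."""
    text = query.lower()
    return [(term, st) for term, st in TOY_LEXICON.items() if term in text]


def mapper(
    lines: Iterable[str],
    annotate: Callable[[str], List[Tuple[str, str]]] = toy_annotate,
) -> Iterator[str]:
    """Emit one tab-separated record per concept: query, concept, semantic type."""
    for line in lines:
        query = line.strip()
        if not query:
            continue  # skip blank input lines
        for concept, sem_type in annotate(query):
            yield f"{query}\t{concept}\t{sem_type}"


records = list(mapper(["Red wine heart disease\n", "\n"]))
```

Because each query is annotated independently, the input can be split across nodes without coordination, which is what makes the workload a natural fit for MapReduce.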
21. Gold Standard Dataset Creation
• Randomly selected 2000 search queries from the analysis dataset.
• Two domain experts manually annotated the 2000 search queries, labeling each query with zero, one, or more than one health category.
• The annotators first discussed and agreed upon the annotation scheme.
• To reduce the probability of human error and subjectivity, the two annotators discussed and annotated each query together, creating a gold standard dataset of 2000 search queries.
• The gold standard dataset was further divided into training and testing datasets of 1000 search queries each.
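The sampling and splitting steps can be sketched as follows. This is a minimal illustration, assuming a uniform random sample and a simple half-and-half split; the slides do not say how the 1000/1000 split was drawn, and the function name `make_gold_split` and the fixed `seed` are hypothetical. In the study, manual annotation happens between sampling and splitting.

```python
import random
from typing import List, Sequence, Tuple


def make_gold_split(
    queries: Sequence[str], sample_size: int = 2000, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Sample queries without replacement and split the sample into two halves."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    sample = rng.sample(list(queries), sample_size)
    half = sample_size // 2
    return sample[:half], sample[half:]


# Placeholder query strings; the real input would be the analysis dataset.
all_queries = [f"query {i}" for i in range(5000)]
training, testing = make_gold_split(all_queries)
```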
23. Intent classes with their UMLS Semantic Types (ST), UMLS Concepts (CC) and Keywords (KW)
• Symptoms
– ST: SOSY
– CC: symptoms, signs
• Causes
– CC: cause, reason
• Risks & Complications
– CC: risk, complications
• Drugs and Medications
– ST: ORCH|PHSU, CLND, PHSU
– CC: medication, medicine, drugs, dose, dosage, tablet, pill
– KW: meds
– (without CC: alcohol, caffeine, fruit, prevent)
• Treatments
– ST: TOPP, FTCN (treatment, surgery), CNCE (treatment)
– CC: remedy, remediate
– (without CC: prevention and ‘Drugs and Medication’ queries)
• Tests and Diagnosis
– ST: DIAP, LBPR, LBTR
– CC: Test, diagnosis
– (without ST: DIAP| TOPP, CC: alcohol, blood caffeine)
• Food and Diet
– ST: FOOD
– CC: caffeine, recipe, meal, menu, diet, eat, breakfast, lunch, dinner, alcohol, drink
• Living with
– CC: control, manage, reduce, lower, coping, cure, recover
– KW: living with, bring down, low down
• Prevention
– CC: prevent, avoidance, low risk
• Side effects
– CC: side effect
– KW: side effect
• Medical devices
– ST: MEDD
• Diseases and conditions
– ST: DSYN
• Age-group References
– ST: AGGP
• Vital signs
– CC: blood pressure, heart rate, pulse rate, temperature, Heart beat, blood glucose
– (without high/low blood pressure, as we considered them under ‘Diseases and Conditions’)
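To make the rule table concrete, here is a minimal sketch of how such rules might be applied to one query. It covers only a subset of the 14 classes, omits the "without" exclusion clauses, and approximates UMLS concept (CC) matching with plain word matching on the query text. The semantic types are assumed to be supplied by an upstream MetaMap run; `RULES`, `has_term`, and `classify` are hypothetical names.

```python
import re
from typing import Dict, List, Set

# Subset of the rule table: semantic types ("st") and terms per health category.
RULES: Dict[str, Dict[str, Set[str]]] = {
    "Symptoms": {"st": {"SOSY"}, "terms": {"symptom", "sign"}},
    "Food and Diet": {"st": {"FOOD"}, "terms": {"diet", "recipe", "meal", "menu"}},
    "Prevention": {"st": set(), "terms": {"prevent", "avoidance", "low risk"}},
    "Side effects": {"st": set(), "terms": {"side effect"}},
    "Medical devices": {"st": {"MEDD"}, "terms": set()},
    "Diseases and conditions": {"st": {"DSYN"}, "terms": set()},
    "Age-group References": {"st": {"AGGP"}, "terms": set()},
}


def has_term(text: str, term: str) -> bool:
    """True if the term (optionally pluralized with 's') occurs as whole words."""
    return re.search(r"\b" + re.escape(term) + r"s?\b", text) is not None


def classify(query: str, semantic_types: Set[str]) -> List[str]:
    """Return every category whose semantic type or term rule fires (multi-label)."""
    text = query.lower()
    labels = []
    for category, rule in RULES.items():
        st_hit = bool(rule["st"] & semantic_types)
        term_hit = any(has_term(text, term) for term in rule["terms"])
        if st_hit or term_hit:
            labels.append(category)
    return labels


example = classify("red wine heart disease", {"FOOD", "DSYN"})
```

A query that fires no rule gets an empty list, which corresponds to the "zero categories" case in the annotation scheme.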
24. Evaluation: Micro-average Precision and Recall
• Classify 1000 search queries from the testing dataset
using the rule-based classifier
• On this test set, the classification approach achieved the following
micro-averaged scores:
– Precision: 0.8842
– Recall: 0.8642
– F-Score: 0.8723
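For readers unfamiliar with micro-averaging in a multi-label setting, the sketch below pools true positives, false positives, and false negatives over all queries before computing the scores. The toy labels are invented for illustration and are not from the study; the F-score is taken here as the harmonic mean of precision and recall, which is an assumption, since the slides do not state how the reported F-score was derived.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], predicted: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-score over multi-label predictions."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in gold
        fn += len(g - p)  # gold labels that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example (hypothetical labels, not study data).
gold = [{"Symptoms"}, {"Drugs and Medications", "Vital signs"}, {"Treatments"}]
pred = [{"Symptoms"}, {"Vital signs"}, {"Treatments", "Diseases and conditions"}]
precision, recall, f_score = micro_prf(gold, pred)
```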
29. Data Analysis Results
• Average search query length for CVD is 3.88 words and 22.22 characters
• Around 80% of the CVD search queries have 3 or more words.
• CVD search queries are longer than previously reported for both non-medical
and medical queries
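Length statistics of this kind can be computed as in the sketch below. It assumes whitespace tokenization for word counts and counts spaces as characters; the slides do not specify either choice, and `length_stats` is a hypothetical name. The sample queries are taken from the top-queries slide.

```python
from typing import Dict, List


def length_stats(queries: List[str]) -> Dict[str, float]:
    """Average words, average characters, and share of queries with 3+ words."""
    word_counts = [len(q.split()) for q in queries]  # whitespace tokenization
    char_counts = [len(q) for q in queries]          # spaces included
    n = len(queries)
    return {
        "avg_words": sum(word_counts) / n,
        "avg_chars": sum(char_counts) / n,
        "share_3plus_words": sum(c >= 3 for c in word_counts) / n,
    }


stats = length_stats([
    "heart attack symptom",
    "blood pressure chart",
    "how to lower blood pressure",
    "heart rate",
])
```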
30. Discussion and Conclusion
• We found the use of MetaMap and UMLS concepts/semantic types
to be an effective approach for customized health categorization
• The top searched health categories for CVD are ‘Diseases and
Conditions’, ‘Vital Signs’, ‘Symptoms’, and ‘Living with’.
• Most of the queries (around 88%) are categorized into either one
or two health categories.
• To the best of our knowledge, there is not much research on
understanding online health information searching for chronic
diseases and especially for CVD.
• This study addresses this knowledge gap and extends our
knowledge about online health information search behavior.
Over the last decade, Internet literacy and the number of Internet users have grown rapidly.
With the growing availability of the Internet and online health resources, consumers are increasingly using the Internet to seek health related information.
Online health resources are easily accessible and provide information about most health topics.
These resources can help non-experts make more informed decisions and play a vital role in improving health literacy.
One of the most common ways to seek online health information is via Web search engines such as Google, Yahoo! and Bing.
Therefore, studying health related search logs can help us understand what health topics Online Health Information Seekers (OHIS) search for (“information need”) and how they formulate search queries (“expression of information need”).
While selecting the intent classes, we studied the health categories on popular health websites (e.g., Mayo Clinic, WebMD) and the types of information frequently mentioned along with CVD search queries.
Note that some of the intent classes may overlap, but in our analysis we treated overlapping classes as separate intent classes in order to gain a fine-grained understanding of search intent.
These intent classes and the classification scheme were reviewed and validated by Mayo Clinic clinicians and domain experts.
National Library of Medicine (NLM)
Approach based on medical background knowledge (UMLS)
MetaMap is a tool for recognizing UMLS concepts in text. For a given search query, MetaMap identifies one or more UMLS concepts, their semantic types, Concept Unique Identifiers (CUIs), and other details.
MetaMap uses a knowledge-intensive approach based on natural-language processing (NLP) and computational-linguistic techniques.
To check the performance of the classification approach for individual health categories.
One in every two searches is related to either ‘Diseases and Conditions’ or ‘Vital signs’.
Other popular health categories that users search for include ‘Symptoms’, ‘Living with’, ‘Treatments’, ‘Food and Diet’ and ‘Causes’.
Although CVD can be prevented with some lifestyle and diet changes, interestingly, very few OHISs search for CVD ‘Prevention’.
Our approach did not categorize 8.13% of the queries into any health category. After studying the uncategorized search queries, we found that a few queries do not fit into any of the selected 14 categories, such as cardiac surgeon, cardiology mayo, video on cardiovascular, pediatric cardiology, and orthostatic. A search query can be categorized into zero, one, or more health categories.
Using our categorization approach, we categorized 92% of the 10 million CVD related queries into at least one health category.
Most of the queries (around 88%) are categorized into either one or two categories.
Very few CVD queries (4.28%) are categorized into 3 or more categories.
Users predominantly formulate search queries using keywords (80%), though queries with Wh-questions are also significant.
Few queries (2.5%) are formulated as Yes/No type questions.
In Wh-questions, OHISs mostly use “How” and “What” in the search queries, and both generally signify that more descriptive information is needed.
Yes/No questions are usually used to check some factual information. In Yes/No questions, OHISs more often start the search queries with “does”, “can” and “is”.
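A simple way to separate these three query forms is to look at the first word of the query, as in the sketch below. This is an illustrative heuristic, not the method used in the study: checking only the first token is an assumption, and the word lists extend the examples named above ("how", "what", "does", "can", "is") with a few extra words that are likewise assumptions.

```python
WH_WORDS = {"how", "what", "why", "when", "where", "which", "who"}
YES_NO_STARTERS = {"does", "can", "is", "do", "are"}  # "do"/"are" are assumed extras


def query_form(query: str) -> str:
    """Label a query as 'wh-question', 'yes/no question', or 'keyword'."""
    tokens = query.lower().split()
    if not tokens:
        return "keyword"
    first = tokens[0]
    if first in WH_WORDS:
        return "wh-question"
    if first in YES_NO_STARTERS:
        return "yes/no question"
    return "keyword"


forms = [
    query_form("how to lower blood pressure"),
    query_form("can ibuprofen raise heart rate"),
    query_form("blood pressure chart"),
]
```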
Longer search queries also denote users’ interest in more specific information about the disease; subsequently users use more words to narrow down to a particular health topic.
of the health related search queries as UMLS incorporates a variety of medical vocabularies and concepts and maps each concept to semantic types. However, for customized categorization, we have to carefully select or eliminate UMLS semantic types and concepts, considering how well their scope aligns with the desired categories.