Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Workshop)
1. Researcher Dilemmas
using Behavioral Big Data in Healthcare
INFORMS Workshop on Data Mining & Decision Analysis
Houston, TX, Oct 21, 2017
Galit Shmueli 徐茉莉
Institute of Service Science
2. What is Behavioral Big Data (BBD)
Special type of Big Data
• Behavioral: people’s measurable “everyday” behavior, interactions, self-reported opinions, thoughts, feelings
• Human and social aspects: intentions, deception, emotion, reciprocation, herding,…
When people are aware of data collection, they may modify their behavior (to avoid legal risks, embarrassment, unwanted solicitation)
3. BBD vs. Inanimate Big Data
[Diagram. Behavioral Big Data: Researcher, Human Subjects, Research Question. Inanimate Big Data: Researcher, Research Question.]
Human subjects:
1. Aware, ongoing interaction with the BBD: they “contaminate” BBD with intention, deception, emotion, herding…
2. Can be harmed by BBD
4. Figure 1: The types of physiological data points and the wearable sensors under development or on the market to monitor them.
Elenko, Underwood & Zohar (2015), “Defining Digital Medicine”, Nature Biotechnology 33, 456-461
[Diagram labels: Physiological Big Data, Human Subjects]
5. BBD vs. Physiological Big Data
Physiological Big Data:
• Physical/bio measurements
• Data collection timing often set by medical system
• Clinical trials: awareness & vested interest
BBD:
• People’s daily actions, interactions, self-reported feelings, opinions, thoughts (UGC)
• Data generation timing often chosen by user
• Experiments: users often unaware; goal not always in user’s interest
Different research methods in life sciences and behavioral sciences
• Measurement instruments
• Models (latent variable models, social network analysis)
• Human subjects risks
6. “Behavioral Health” Data vs. BBD
• Behaviors: substance abuse & mental health
• Population: patients with mental illness /
substance abuse
• Specific (defined) behavior of patient
• “Big”?
www.carolinashealthcare.org/medical-services/prevention-wellness/behavioral-health
8. Hospital Data on Patients, Staff, Assets
Patients
Personal info
Medical history (visits, tests,
medication, hospitalization...)
Scheduled events, billing
Physicians
Scheduled + actual appointments,
procedures, prescriptions,…
Entries of patient info/data
Nurses
Location, work hours,…
Pharmacy staff
Speed of service
Quality of service
Lab staff
Speed of service
Quality of service
Other staff
Finance/accounting
Cleaning
Receptionists
Volunteers
Food court…
Data Collection
Technologies:
• Medical devices
• HIT systems
(EHR, HR for
Health Info
System)
---
Smart Hospital
• Cameras
• Sensors
• GPS
• IoT
9. Typical Research Fields using Hospital Data
Operations Researchers and Industrial Engineers
For: Hospital Management and Operations
(staffing, scheduling,…)
Medical/Healthcare Researchers & Clinicians
For: Improved Medical Treatment
(safety, effectiveness,…)
Information Systems Researchers
For: Improved Design & Use of Medical IS
(value of IS, effectiveness, standardization,…)
11. Behavioral big data also on…
Interactions between
Patients – doctors/nurses
Doctors – other doctors
Patients – other patients
Patient family – hospital staff
Patients – social network “friends”
...
12. Health-related BBD: Online
• Medical/health websites
• Online forums
• Social networks
• Search engines
Info voluntarily entered by users: personal details, photos, comments,
messages, search terms, likes, payment information, connections with “friends”
Passive footprints: duration on the website, pages browsed, sequence,
referring website, Internet browser, operating system, location, IP address
14. “Some hospitals are collecting new information
from patients directly, while others have sought
data from companies that sell consumer and
financial information, or federal agencies that
provide statistics on poverty, housing density
and unemployment.”
The big obstacle: access to the data. Doctors and nurses have limited time to collect new data, and patients bombarded with questions about their lives may suffer “interview fatigue”.
Health-unrelated
BBD
15. Research Using New Medical BBD: Challenges
[Diagram: Behavioral Big Data, Researcher, Human Subjects, Research Question]
Different (conflicting) Goals:
• Scientific vs. Clinical vs. Commercial
• Explain vs. Predict
Generalization Challenges:
• Unit of analysis vs. Unit of measurement
• Under/over-coverage
• Average effect vs. individual effect (ATE vs. Individual)
New ethical challenges:
• New risks (privacy, liability, security, HIPAA compliance)
Data contaminated by:
• Users (self-selection, spill-over, knowledge of allocation, network)
• Company algorithms
Acquire + analyze data:
• Technical expertise
New modes of connection & information (social networks, forums, IoT)
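The “average effect vs. individual effect” item can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not drawn from any study cited in the talk: the two user groups, the effect sizes of +2 and -2, and the function names `simulate` and `mean_difference` are all assumptions made for illustration. It shows how an average treatment effect (ATE) near zero can coexist with large, opposite effects for individuals.

```python
import random


def simulate(n, seed=1):
    """Synthetic randomized experiment with two equally likely user groups.

    Assumption for illustration only: the treatment raises the outcome
    by 2 for group "A" and lowers it by 2 for group "B".
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        treated = rng.random() < 0.5          # random allocation
        effect = 2.0 if group == "A" else -2.0
        outcome = rng.gauss(10.0, 1.0) + (effect if treated else 0.0)
        rows.append((group, treated, outcome))
    return rows


def mean_difference(rows):
    """Difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate(4000)
ate = mean_difference(rows)
effect_a = mean_difference([r for r in rows if r[0] == "A"])
effect_b = mean_difference([r for r in rows if r[0] == "B"])
print(f"ATE: {ate:+.2f}, group A: {effect_a:+.2f}, group B: {effect_b:+.2f}")
```

In this toy setup the overall estimate is close to zero while each group shows a strong effect, so a conclusion drawn from the average alone would not describe any individual user well.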
16. Sample Behavioral
Healthcare-Related BBD Studies
Vocal Minority and Silent Majority:
How Do Online Ratings Reflect
Population Perception of Quality?
Gao et al. (MISQ 2015)
Outcomes matter: estimating
pre-transplant survival rates
of kidney-transplant patients
using simulator-based
propensity scores
Yahav & Shmueli (Annals of
Oper. Research, 2014)
Emotional Contagion in
Social Networks
Kramer et al. (PNAS, 2014)
Detecting influenza
epidemics using search
engine query data
Ginsberg et al. (Nature, 2009)
17. Emotional Contagion in Social Networks
Kramer et al. (2014) Proceedings of the National Academy of Sciences
• Can emotional states be transferred to others via emotional contagion?
• BBD from large-scale experiment run by FB, manipulating users’
exposure level to emotional expressions in their Facebook News Feed
• No IRB
“[The work] was consistent with Facebook’s Data
Use Policy, to which all users agree prior to
creating an account on Facebook, constituting
informed consent for this research.”
• PNAS editorial Expression of Concern
• Varied response from public, academia, press,
ethicists, corporates
20. Detecting influenza epidemics using search engine query data
Ginsberg et al. (2009), Nature
• “Up-to-date influenza estimates may enable public health officials and health professionals to better respond to seasonal epidemics”
• Researchers from Google and CDC
• BBD: automated search results for 50M keywords on Google.com (2003-2007). For each query, collected {query text, IP address}
• Analysis: Fit 450M different models, correlating each query text with CDC data; combined the 45 queries with highest correlation
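The analysis step on this slide (correlate each query with CDC data, then combine the top-ranked queries) can be sketched roughly as follows. This is a toy illustration on synthetic series, not the pipeline used by Ginsberg et al.: the plain Pearson correlation, the simple averaging of selected queries, the query names, and the functions `pearson`, `top_k_queries` and `combined_signal` are all assumptions for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_k_queries(query_series, target, k):
    """Rank queries by correlation with the target series; keep the k best."""
    ranked = sorted(
        query_series,
        key=lambda q: pearson(query_series[q], target),
        reverse=True,
    )
    return ranked[:k]


def combined_signal(query_series, selected):
    """Average the selected query series week by week into one predictor."""
    n_weeks = len(query_series[selected[0]])
    return [
        sum(query_series[q][t] for q in selected) / len(selected)
        for t in range(n_weeks)
    ]


# Synthetic data (hypothetical): a seasonal "official" illness series,
# 5 queries that track it, and 20 queries that are pure noise.
rng = random.Random(0)
weeks = 104
ili = [2.0 + 1.5 * math.sin(2 * math.pi * t / 52) for t in range(weeks)]
queries = {}
for i in range(5):
    queries[f"flu_like_{i}"] = [v + rng.gauss(0, 0.3) for v in ili]
for i in range(20):
    queries[f"noise_{i}"] = [rng.gauss(2.0, 1.0) for _ in range(weeks)]

selected = top_k_queries(queries, ili, k=5)
signal = combined_signal(queries, selected)
print(selected, round(pearson(signal, ili), 2))
```

Screening by correlation alone selects anything that shares the target's seasonal pattern, whether or not it is related to illness, so with a very large candidate pool some selected terms may be spurious.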
21. Researchers: epidemiologists + data science academics
Dalton et al. (2016), “Flutracking weekly online community
survey of influenza-like illness annual report, 2015”
Communicable diseases intelligence quarterly report
Challenge: Acquire data
22. • The algorithm detects “flu” or “winter”
• Persistent over-estimation
• Performs worse than lagged (3-week-old) CDC data
• Google never released the 45 terms used
• Changes to Google’s search algorithm (displaying potential diagnoses and recommending searches for treatment, which brings more advertising) increased search volume
• Lazer et al. recommend combining/calibrating GFT (Google Flu Trends) with CDC data
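The comparison against lagged CDC data, and the idea of calibrating a search-based estimate with it, can be sketched on synthetic numbers. This is not the model proposed by Lazer et al.: the constant upward bias of 0.8, the 3-week lag, the additive bias correction, and the names `LAG`, `mae`, `official` and `search_est` are assumptions chosen for illustration.

```python
import math
import random

LAG = 3  # weeks of reporting delay assumed for the official data


def mae(pred, truth):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


# Synthetic series (hypothetical): a seasonal official rate, and a
# search-based estimate that persistently over-estimates it.
rng = random.Random(0)
weeks = 156
official = [2.0 + 1.5 * math.sin(2 * math.pi * t / 52) for t in range(weeks)]
search_est = [v + 0.8 + rng.gauss(0, 0.1) for v in official]

targets = official[LAG:]     # what we want to know for weeks LAG..end
raw = search_est[LAG:]       # search-based estimate used as-is
lagged = official[:-LAG]     # baseline: reuse the 3-week-old official value

# Calibration: subtract the gap observed LAG weeks ago, when official
# data for that week has become available.
calibrated = [
    search_est[t] - (search_est[t - LAG] - official[t - LAG])
    for t in range(LAG, weeks)
]

print(f"raw:        {mae(raw, targets):.3f}")
print(f"lagged:     {mae(lagged, targets):.3f}")
print(f"calibrated: {mae(calibrated, targets):.3f}")
```

In this toy setup the uncalibrated estimate loses to the lagged baseline, while the calibrated estimate beats both, because the lagged official data removes the persistent bias and the search signal supplies the recent movement.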
26. … and new challenges
[Challenges diagram repeated from slide 15]
27. Analytics
Humanity
Responsibility
Galit Shmueli 徐茉莉
Institute of Service Science
Editor's Notes
Inanimate:
Medical devices and drug manufacturing (quality control, safety)
Laboratory testing
“patients as well as medical staff will be communicating in a non-private environment. It is very important to understand, monitor and control your own content for its privacy implications. More dangerous and needing control will be the reach of patient-to-patient identification and communication.” - http://www.medicalwebtimes.com/thetimes/medical-headlines/top-10-pros-cons-for-medical-practices-using-social-networking-web-sites/
Kayhan Parsi, JD, PhD, and Nanette Elster, JD, MPH. Why Can't We Be Friends? A Case-Based Analysis of Ethical Issues with Social Media in Health Care. AMA Journal of Ethics, November 2015 DOI: 10.1001/journalofethics.2015.17.11.peer1-1511
HHS proposed new IRB exemption criteria for research using publicly available data (including data that is purchased)
Council for Big Data, Ethics & Society’s letter: “these criteria for exclusion focus on the status of the dataset… not the content of the dataset nor what will be done with the dataset, which are more accurate criteria for determining the risk profile of the proposed research”
How Does Flutracking work?
It takes only 10 - 15 seconds each week. We ask if you have had fever or cough in the last week. This will help us find ways to detect both seasonal influenza and hopefully pandemic influenza and other diseases so we can better protect the community from epidemics.
FluNearYou.org