PRACTICAL ANALYSIS IN THE FIGHT AGAINST CANCER:
Advice for data scientists
Brian Dolan, Chief Scientist + Co-founder
deep6.ai
WHAT WE’LL TALK ABOUT
• Deep 6 AI and the fight against cancer
• Why graphs for massive data?
• Applications of graphs
• Guidance on NLP
• Sage-like wisdom
• Final thoughts / Q+A
WHAT WE WON’T* SAY
the “c” word, repeatedly
*Let’s use a euphemism instead.
“DEEP 6 AI IS A GAME-CHANGER.”
CUSTOMER TESTIMONIAL
WEBINAR
TOP 100 MOST DISRUPTIVE COMPANIES IN THE WORLD
Deep 6 AI applies AI and NLP to clinical data to find patients for clinical trials in minutes, not months.
ALUM OF TECHSTARS, STARTX, AND HEALTHBOX ACCELERATORS
WIN AT SXSW 2017 ACCELERATOR: ENTERPRISE + SMART DATA
INNOVATION HAPPENS IN CLINICAL TRIALS
… BUT TOO FEW PEOPLE PARTICIPATE
3.7M PATIENT SHORTFALL
5.9M
goal
2015
SOURCES: clinicaltrials.gov, CISCRP
(2017: 6.7M)
2.2M
trial participants
PATIENT DATA IS HARD TO READ
THREE PILLARS OF BEING A “GRAPH COMPANY”
• neo4j: graph database (est. 2000)
• igraph: graph processing system (est. 2006)
• Graph analytics (est. 1736)
WHY GRAPHS?
> Not mole
> Not mole
> Not mole
! Stage IV mole attack
Hidden correlation structures make a huge difference in mole attacks
The field of Algebraic Graph Theory is quite well developed and offers a lot of machinery for analysis
SOME GRAPH ANALYTICS
• Basic descriptions include:
  • Connectedness: can you go from any node to another node?
  • Degree of a vertex
  • Transitivity
  • Betweenness
• Community detection is a variety of methods to find dense sub-networks
• Read “Statistical Analysis of Network Data” by Kolaczyk (supplement by Csardi is really good, too)
• Strong body of Algebraic Graph Theory (next!)
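A few of the basic descriptions above can be computed by hand. The sketch below is a minimal, dependency-free illustration on a made-up five-node undirected graph; the node names and edges are assumptions, not data from the talk. In practice a package such as igraph would do this work.

```python
from collections import deque

# Hypothetical undirected graph stored as adjacency sets (not from the talk).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),  # isolated node, so the whole graph is not connected
}


def degree(g, v):
    """Degree of a vertex: how many neighbors it has."""
    return len(g[v])


def is_connected(g):
    """Connectedness: can every node be reached from an arbitrary start node?"""
    if not g:
        return True
    start = next(iter(g))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(g)


def transitivity(g):
    """Fraction of connected triples (two edges sharing a node) that close into a triangle."""
    closed = 0
    triples = 0
    for v, nbrs in g.items():
        ordered = sorted(nbrs)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                triples += 1
                if ordered[j] in g[ordered[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


print(degree(graph, "C"), is_connected(graph), transitivity(graph))
```

Betweenness and community detection need more machinery (shortest-path counting, modularity or similar) and are left to a graph library.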
ADJACENCY MATRICES
[Figure: directed graph on nodes A, B, C, D, E, F, G]
  A B C D E F G
A 0 1 0 0 1 0 0
B 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0
E 0 1 0 0 0 0 0
F 0 0 0 0 0 0 1
G 0 0 0 1 0 0 0
Directed edge = Asymmetric matrix
# {Length n Paths}
Transformation of Graph by Graph
• Just like that, we have turned an arbitrary collection of objects into a Linear Algebra problem
• Any PCA or spectral decomposition you do will translate into the Edge space
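The “# {Length n Paths}” note presumably refers to the standard fact that entry (i, j) of the n-th power of an adjacency matrix counts the walks of length n from node i to node j. A minimal sketch using the 7 x 7 matrix from the slide follows; the helper names `matmul`, `matpow` and `is_symmetric` are illustrative only.

```python
nodes = ["A", "B", "C", "D", "E", "F", "G"]

# Adjacency matrix from the slide: row = source node, column = target node.
A = [
    [0, 1, 0, 0, 1, 0, 0],  # A -> B, A -> E
    [0, 0, 0, 0, 0, 0, 0],  # B
    [0, 0, 0, 0, 0, 0, 0],  # C
    [0, 0, 0, 0, 0, 0, 0],  # D
    [0, 1, 0, 0, 0, 0, 0],  # E -> B
    [0, 0, 0, 0, 0, 0, 1],  # F -> G
    [0, 0, 0, 1, 0, 0, 0],  # G -> D
]


def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def matpow(M, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(M)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, M)
    return result


def is_symmetric(M):
    return all(M[i][j] == M[j][i] for i in range(len(M)) for j in range(len(M)))


A2 = matpow(A, 2)
A3 = matpow(A, 3)

# List every pair joined by a walk of exactly two edges.
for i, row in enumerate(A2):
    for j, count in enumerate(row):
        if count:
            print(f"{nodes[i]} -> {nodes[j]}: {count} walk(s) of length 2")
```

In this graph the only two-step routes are A to B (via E) and F to D (via G), and no route has three steps, so the third power is the zero matrix.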
BIGRAPH: PATIENTS / SYMPTOM PAIRS
Assume X is n x m.
[Figure: bipartite graph with nodes A, B, C, D, E and W, X, Y, Z]
• X is a rectangular matrix (in our case, very tall)
• X’X counts the Patients each pair of Symptoms has in common and corresponds to a Symptom Graph
• X X’ counts the Symptoms each pair of Patients has in common and corresponds to a Patient Graph
• Both matrices are square and symmetric
• Both can be analyzed as weighted, undirected graphs
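The two products can be checked on a toy example. The sketch below assumes a made-up 4 x 3 patient-by-symptom incidence matrix (rows are patients, columns are symptoms, 1 means the patient has the symptom); the data and helper names are hypothetical.

```python
# Hypothetical incidence matrix: 4 patients (rows) x 3 symptoms (columns).
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


# X'X is m x m: entry (j, k) counts patients who have both symptom j and symptom k.
symptom_graph = matmul(transpose(X), X)

# X X' is n x n: entry (i, l) counts symptoms shared by patient i and patient l.
patient_graph = matmul(X, transpose(X))

print(symptom_graph)
print(patient_graph)
```

The diagonal of X’X holds how many patients have each symptom, and the diagonal of X X’ holds how many symptoms each patient has; the off-diagonal entries are the edge weights of the two graphs.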
EXAMPLE: SUPERVISED LEARNING IN GRAPHS
[Figure: graph of terms: Discoloration (mole), Mole (rodent), Lyme Disease, Malignant neoplasms, Neoplasms of the lung, Neoplasms of the skin, Malignancies, Disease vector. Legend: Lexically related, Semantically related, + User labels, - User labels]
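The slide does not say which algorithm is used. One common way to learn from a handful of + and - user labels on a graph is label propagation, sketched below as an assumption rather than as the speaker's method. The edges and the choice of seed labels are invented for illustration; only the node names come from the figure.

```python
# Hypothetical undirected edges between the terms in the figure.
edges = [
    ("Malignancies", "Malignant neoplasms"),
    ("Malignant neoplasms", "Neoplasms of the lung"),
    ("Malignant neoplasms", "Neoplasms of the skin"),
    ("Neoplasms of the skin", "Discoloration (mole)"),
    ("Discoloration (mole)", "Mole (rodent)"),   # lexical link: same word
    ("Mole (rodent)", "Disease vector"),
    ("Disease vector", "Lyme Disease"),
]

# Hypothetical user labels: +1 = relevant to the query, -1 = not relevant.
seeds = {"Malignancies": 1.0, "Lyme Disease": -1.0}


def propagate(edges, seeds, rounds=200):
    """Repeatedly set each unlabeled node to the mean score of its neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(rounds):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # user labels stay fixed
            else:
                updated[node] = sum(scores[m] for m in nbrs) / len(nbrs)
        scores = updated
    return scores


scores = propagate(edges, seeds)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{score:+.2f}  {node}")
```

Terms near the positive label end up with positive scores, terms near the negative label with negative ones, and the ambiguous “Discoloration (mole)” node sits close to zero because it lies between the two.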
DECIDE THE DOMAIN
• Analyze X’X to find patterns in Symptoms.
• Unlike methods like k-means, you are operating on the relationships between the objects, not the objects themselves
• By default, everything you do is context-sensitive.
“This thing makes more or less sense in the presence of that thing”
• That is semantic analysis at a primitive, but extremely practical and effective level.
“RA”
Candidate meanings: Rheumatoid Arthritis, Room Air, Refractory Anemias
Context clues: Inpatient? Lung cancer? J44.9? COPD? Hospitalist? Anti-inflammatories? General practitioner? ER visit? Dysplasia? Bone marrow? Leukemia? Joint pain?
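The “RA” example can be sketched as a tiny context-overlap scorer: each candidate meaning is scored by how many of its associated clues appear alongside the abbreviation. The grouping of clues under each meaning below is an assumption made for illustration (the slide does not spell it out), and a real system would learn these associations from co-occurrence data rather than hard-code them.

```python
# Hypothetical clue sets; the assignment of clues to senses is assumed, not from the talk.
SENSE_CONTEXT = {
    "Rheumatoid Arthritis": {"joint pain", "anti-inflammatories", "general practitioner"},
    "Room Air": {"copd", "j44.9", "lung cancer", "inpatient", "hospitalist", "er visit"},
    "Refractory Anemias": {"bone marrow", "leukemia", "dysplasia"},
}


def disambiguate(context_terms):
    """Return the sense sharing the most clues with the context, or None if nothing matches."""
    observed = {term.lower() for term in context_terms}
    scores = {sense: len(observed & clues) for sense, clues in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate(["COPD", "J44.9"]))
print(disambiguate(["Bone marrow", "Leukemia"]))
print(disambiguate(["Joint pain", "Anti-inflammatories"]))
```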
NATURAL LANGUAGE PROCESSING
PITFALLS OF NLP IN PRACTICE
“D. tested negative for the following: sepsis, secondary infection, metastatic nodules.”
Negations are VERY hard and the subject of active research. They are ubiquitous in non-trivial domains, i.e., anything beyond Twitter or movie reviews.
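To make the difficulty concrete, here is a deliberately naive, trigger-phrase sketch of negation detection. The trigger list and the “scope runs to the next period” rule are simplifying assumptions, not the approach used at Deep 6 AI; it handles the example sentence but would fail on many real clinical notes.

```python
# Hypothetical trigger phrases; real clinical text needs a far richer list.
NEGATION_TRIGGERS = ("tested negative for", "negative for", "no evidence of", "denies")


def negated_terms(sentence, terms):
    """Return the terms that appear after a negation trigger, up to the next period."""
    lowered = sentence.lower()
    negated = set()
    for trigger in NEGATION_TRIGGERS:
        start = lowered.find(trigger)
        if start == -1:
            continue
        scope_start = start + len(trigger)
        scope_end = lowered.find(".", scope_start)
        if scope_end == -1:
            scope_end = len(lowered)
        scope = lowered[scope_start:scope_end]
        negated.update(term for term in terms if term in scope)
    return negated


terms = ["sepsis", "secondary infection", "metastatic nodules"]
note = "D. tested negative for the following: sepsis, secondary infection, metastatic nodules."
print(sorted(negated_terms(note, terms)))
```

A keyword search for “sepsis” would match this note even though the finding is explicitly ruled out, which is why negation handling matters for patient matching.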
tf/idf rewards the wrong things, ignores contextual cues and has few theoretical underpinnings.
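One possible reading of “rewards the wrong things” is that rarity alone drives the score. The sketch below uses the textbook definition (term frequency times log of inverse document frequency) on three invented notes; the documents and the misspelling are made up for illustration.

```python
import math

# Hypothetical mini-corpus; "chset" is a deliberate typo for "chest".
docs = [
    "patient denies chest pain".split(),
    "patient reports chest pain".split(),
    "patient reports chset pain".split(),
]


def tf_idf(term, doc, corpus):
    """Textbook tf-idf: relative term frequency times log(N / document frequency)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


typo_score = tf_idf("chset", docs[2], docs)
negation_score = tf_idf("denies", docs[0], docs)
chest_score = tf_idf("chest", docs[0], docs)
common_score = tf_idf("patient", docs[0], docs)

print(typo_score, negation_score, chest_score, common_score)
```

The typo scores exactly as high as “denies”, the one word that reverses the meaning of its note, and both outrank the correctly spelled “chest”. The score also carries no notion that “denies” changes how the neighboring words should be read.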
Latent Dirichlet Allocation assumes topics can be
expressed as permutations of tokens.
Because there will always be domains of knowledge, there will always be domains in NLP.
And it follows that there will always be some degree of feature engineering. In humans, this
is analogous to “college.”
BUT BRIAN, WHAT ABOUT DEEP LEARNING?
• Pretty cool results in limited domains
• Almost certainly requires more data than you have in your domain
• Long Short-Term Memory (LSTM) assumes you want to predict the “next token” or mimic a series of tokens
• The corpus needs to provide similar context with different tokens A LOT of times
• There are always relationships that appear to be errors, but actually occur in the data
• Violates its own promise of “no feature engineering”
WISDOM OF THE ANCIENT
• Indexing data is not analyzing data
• Storing data is just kicking the can to the next guy
• We must try to be smarter, better and more relevant to the world
• Let’s generate universal truths if we can
• Don’t ask your software what analyses you should do
• Learn the math from first principles
• Take time to align the methods to the problem; don’t rely on mental furniture
Don’t ask your barber if you need a haircut
The map is not the terrain
THINGS HOLDING MY INTEREST NOW
igraph
Politics family ice hockey robots
coffee tacos PDEs Irish music
management theory VCs
vacation with my wife naps,
solar energy health care for all
marine ecosystems Blender
biking Markov Chains Americana
Sleeping in a Wigwam! AYSO
sales gun control…
THE FUTURE IS MY RESPONSIBILITY
deep6.ai
STAY POSITIVE!
brian@deep6.ai
THANK YOU!
deep6.ai
Brian Dolan
brian@deep6.ai

A Practical Use of Artificial Intelligence in the Fight Against Cancer by Brian Dolan

  • 1. PRACTICAL ANALYSIS IN THE FIGHT AGAINST CANCER: Advice for data scientists deep6.ai Brian Dolan, Chief Scientist + Co-founder
  • 2. • Deep 6 AI and the fight against cancer • Why graphs for massive data? • Applications of graphs • Guidance on NLP • Sage-like wisdom • Final thoughts / Q+A WHAT WE’LL TALK ABOUT
  • 3. WHAT WE WON’T* SAY the “c” word, repeatedly *Let’s use a euphemism instead.
  • 4. “DEEP 6 AI IS A GAME-CHANGER.” CUSTOMER TESTIMONIAL WEBINAR TOP 100 MOST DISRUPTIVE COMPANIES IN THE WORLD Deep 6 AI applies AI and NLP to clinical data to find patients for clinical trials in minutes, not months. ALUM OF TECHSTARS, STARTX, AND HEALTHBOX ACCELERATORS WIN AT SXSW 2017 ACCELERATOR: ENTERPRISE + SMART DATA
  • 5. INNOVATION HAPPENS IN CLINICAL TRIALS … BUT TOO FEW PEOPLE PARTICIPATE 3.7M PATIENT SHORTFALL 5.9M goal 2015 SOURCES: clinicaltrials.gov, CISCRP (2017: 6.7M) 2.2M trial participants
  • 6. PATIENT DATA IS HARD TO READ
  • 7. THREE PILLARS OF BEING A “GRAPH COMPANY” neo4j graph database (est. 2000) igraph graph processing system (est. 2006) Graph analytics (est. 1736)
  • 9. WHY GRAPHS? > Not mole > Not mole
  • 10. WHY GRAPHS? > Not mole > Not mole > Not mole
  • 11. WHY GRAPHS? > Not mole > Not mole > Not mole ! Stage IV mole attack Hidden correlation structures make a huge difference in mole attacks The field of Algebraic Graph Theory is quite well developed and offers a lot of machinery for analysis
  • 12. SOME GRAPH ANALYTICS
      • Basic descriptions include:
          • Connectedness: can you go from any node to any other node?
          • Degree of a vertex
          • Transitivity
          • Betweenness
      • Community detection is a family of methods for finding dense sub-networks
      • Read “Statistical Analysis of Network Data” by Kolaczyk (the supplement by Csardi is really good, too)
      • Strong body of Algebraic Graph Theory (next!)
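
Three of the basic descriptions on slide 12 can be sketched in a few lines. This is a minimal illustration on a made-up undirected toy graph (the graph, node names and function names are assumptions, not from the talk), using only the Python standard library. Betweenness is left out because a correct implementation (e.g. Brandes' algorithm) is longer; a package such as igraph, which the talk mentions, covers it.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected toy graph as an adjacency dict (not from the slides).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),  # isolated node, so the graph is not connected
}


def degree(g):
    """Degree of each vertex: the number of neighbours it has."""
    return {v: len(nbrs) for v, nbrs in g.items()}


def is_connected(g):
    """True if every node can be reached from an arbitrary start node (BFS)."""
    if not g:
        return True
    start = next(iter(g))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in g[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(g)


def transitivity(g):
    """Global transitivity: closed two-paths divided by all two-paths."""
    closed = total = 0
    for v, nbrs in g.items():
        for u, w in combinations(sorted(nbrs), 2):
            total += 1          # u - v - w is a two-path centred on v
            if w in g[u]:
                closed += 1     # the two-path is closed by the edge u - w
    return closed / total if total else 0.0


print(degree(graph))
print(is_connected(graph))
print(transitivity(graph))
```

On this toy graph the single triangle a-b-c closes three of the five two-paths, and the isolated node "e" is what makes the connectedness check fail.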
  • 13. ADJACENCY MATRICES
      [Figure: directed graph on nodes A to G, shown next to its adjacency matrix]

             A  B  C  D  E  F  G
          A  0  1  0  0  1  0  0
          B  0  0  0  0  0  0  0
          C  0  0  0  0  0  0  0
          D  0  0  0  0  0  0  0
          E  0  1  0  0  0  0  0
          F  0  0  0  0  0  0  1
          G  0  0  0  1  0  0  0

      Figure labels: # {Length n Paths}; Directed edge = Asymmetric matrix; Transformation of Graph by Graph
      • Just like that, we have turned an arbitrary collection of objects into a Linear Algebra problem
      • Any PCA or spectral decomposition you do will translate into the edge space
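
The "# {Length n Paths}" label on slide 13 presumably refers to the standard result that entry (i, j) of the n-th power of an adjacency matrix counts the directed walks of length n from i to j. The sketch below checks this on the slide's own matrix. The helper names (`matmul`, `matrix_power`, `path_counts`) are illustrative, and plain Python lists are used so nothing beyond the standard library is needed.

```python
nodes = ["A", "B", "C", "D", "E", "F", "G"]

# Adjacency matrix copied from slide 13: rows are sources, columns are targets.
adj = [
    [0, 1, 0, 0, 1, 0, 0],  # A -> B, A -> E
    [0, 0, 0, 0, 0, 0, 0],  # B
    [0, 0, 0, 0, 0, 0, 0],  # C
    [0, 0, 0, 0, 0, 0, 0],  # D
    [0, 1, 0, 0, 0, 0, 0],  # E -> B
    [0, 0, 0, 0, 0, 0, 1],  # F -> G
    [0, 0, 0, 1, 0, 0, 0],  # G -> D
]


def matmul(p, q):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def matrix_power(m, n):
    """m multiplied by itself n times, for n >= 1."""
    result = m
    for _ in range(n - 1):
        result = matmul(result, m)
    return result


def path_counts(m, n):
    """Map (source, target) to the number of length-n walks, nonzero entries only."""
    powered = matrix_power(m, n)
    return {
        (nodes[i], nodes[j]): powered[i][j]
        for i in range(len(nodes))
        for j in range(len(nodes))
        if powered[i][j]
    }


for n in (1, 2, 3):
    print(n, path_counts(adj, n))
```

For this graph the square of the matrix has exactly two nonzero entries, for the two-step routes A to E to B and F to G to D, and the cube is all zeros because no route has three steps.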
  • 14. BIGRAPH: PATIENT / SYMPTOM PAIRS
      [Figure: bipartite graph with node sets A to E and W to Z]
      Assume X is n x m (n Patients as rows, m Symptoms as columns).
      • X is a rectangular matrix; in our case, very tall
      • Entry (i, j) of X’X is the number of Patients that Symptoms i and j have in common; X’X corresponds to a Symptom Graph
      • Entry (i, j) of X X’ is the number of Symptoms that Patients i and j have in common; X X’ corresponds to a Patient Graph
      • Both matrices are square
      • Both can be analyzed as directed graphs
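
The two products on slide 14 can be checked by hand on a tiny example. The sketch below assumes a made-up 4 x 3 binary patient-by-symptom matrix (the data and the names `symptom_graph` and `patient_graph` are hypothetical) and computes both products in plain Python.

```python
# Hypothetical incidence matrix: 4 patients (rows) x 3 symptoms (columns).
# X[p][s] == 1 means patient p has symptom s. Not real clinical data.
X = [
    [1, 1, 0],  # patient 0
    [1, 0, 1],  # patient 1
    [1, 1, 0],  # patient 2
    [0, 0, 1],  # patient 3
]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    """Plain-Python matrix product."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


# X'X is m x m: entry (i, j) counts patients who have both symptom i and symptom j.
symptom_graph = matmul(transpose(X), X)

# X X' is n x n: entry (i, j) counts symptoms shared by patient i and patient j.
patient_graph = matmul(X, transpose(X))

for row in symptom_graph:
    print(row)
for row in patient_graph:
    print(row)
```

The diagonals carry the marginal counts (patients per symptom, symptoms per patient), and both products come out symmetric, so in practice the off-diagonal entries can be read as weights on edges between symptoms or between patients.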
  • 15. EXAMPLE: SUPERVISED LEARNING IN GRAPHS Discoloration (mole) Mole (rodent) Lyme Disease Malignant neoplasms Neoplasms of the lung Neoplasms of the skin Malignancies Disease vector Lexically related Semantically related + User labels - User labels
  • 16. EXAMPLE: SUPERVISED LEARNING IN GRAPHS Discoloration (mole) Mole (rodent) Lyme Disease Malignant neoplasms Neoplasms of the lung Neoplasms of the skin Malignancies Disease vector Lexically related Semantically related + User labels - user labels
  • 17. DECIDE THE DOMAIN
      • Analyze X’X to find patterns in Symptoms.
      • Unlike methods like k-means, you are operating on the relationships between the objects, not the objects themselves
      • By default, everything you do is context-sensitive. “This thing makes more or less sense in the presence of that thing”
      • That is semantic analysis at a primitive, but extremely practical and effective level.
  • 18. NATURAL LANGUAGE PROCESSING
      “RA” can mean Rheumatoid Arthritis, Room Air, or Refractory Anemias.
      [Figure: the three expansions surrounded by context cues] Inpatient? Lung cancer? J44.9? COPD? Hospitalist? Anti-inflammatories? General practitioner? ER visit? Dysplasia? Bone marrow? Leukemia? Joint pain?
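
One way to read slide 18 is that the surrounding context picks the expansion of “RA”. The sketch below scores each expansion by counting which context cues appear in a note. The assignment of cues to expansions is an assumption (the flattened slide does not say which cue belongs to which sense), the example notes are invented, and `rank_senses` is a hypothetical helper, not the method used by Deep 6 AI.

```python
# ASSUMED grouping of the slide's context cues under each expansion of "RA".
SENSE_CUES = {
    "Rheumatoid Arthritis": {"joint pain", "anti-inflammatories", "general practitioner"},
    "Room Air": {"inpatient", "copd", "j44.9", "lung cancer", "hospitalist", "er visit"},
    "Refractory Anemias": {"dysplasia", "bone marrow", "leukemia"},
}


def rank_senses(context):
    """Rank expansions of 'RA' by how many of their cues occur in the context."""
    text = context.lower()
    scores = {
        sense: sum(cue in text for cue in cues)
        for sense, cues in SENSE_CUES.items()
    }
    # sorted() is stable, so ties keep the dictionary's insertion order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented example notes, for illustration only.
print(rank_senses("Inpatient with COPD, sats 94% on RA"))
print(rank_senses("Pt reports joint pain, RA managed with anti-inflammatories"))
```

Plain substring counting is deliberately crude. It is only meant to show that the same two letters resolve differently once neighbouring terms are taken into account.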
  • 19. PITFALLS OF NLP IN PRACTICE
      “D. tested negative for the following: sepsis, secondary infection, metastatic nodules.”
      • Negations are VERY hard and the subject of active research. They are ubiquitous in non-trivial domains (i.e., not Twitter or movie reviews).
      • tf/idf rewards the wrong things, ignores contextual cues and has few theoretical underpinnings.
      • Latent Dirichlet Allocation assumes topics can be expressed as permutations of tokens.
      • Because there will always be domains of knowledge, there will always be domains in NLP. And it follows that there will always be some degree of feature engineering. In humans, this is analogous to “college.”
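
The negation example on slide 19 is easy to reproduce. The sketch below runs a naive keyword match over the slide's sentence and then applies a deliberately crude rule that treats a term as negated when a negation cue appears anywhere before it. The cue list and function names are hypothetical, and the rule is a toy for illustration, not a recommended negation detector.

```python
note = (
    "D. tested negative for the following: sepsis, secondary infection, "
    "metastatic nodules."
)
terms = ["sepsis", "secondary infection", "metastatic nodules"]

# Hypothetical cue list, for illustration only.
NEGATION_CUES = ("tested negative for", "no evidence of", "denies")


def naive_mentions(text, terms):
    """Bag-of-words style matching: every term that occurs counts as a finding."""
    lowered = text.lower()
    return [term for term in terms if term in lowered]


def negated_mentions(text, terms):
    """Crude rule: a term is negated if any cue occurs somewhere before it."""
    lowered = text.lower()
    negated = []
    for term in terms:
        pos = lowered.find(term)
        if pos == -1:
            continue
        # str.find returns -1 when the cue is absent, which fails the 0 <= test.
        if any(0 <= lowered.find(cue) < pos for cue in NEGATION_CUES):
            negated.append(term)
    return negated


found = naive_mentions(note, terms)
negated = negated_mentions(note, terms)
affirmed = [term for term in found if term not in negated]

print(found)
print(negated)
print(affirmed)
```

The naive matcher reports all three conditions as present even though the note rules every one of them out. The crude rule fixes this sentence but breaks on the next one: in “no evidence of sepsis, but metastatic nodules are present” it would wrongly negate the nodules too, because it has no notion of where the negation's scope ends.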
  • 20. BUT BRIAN, WHAT ABOUT DEEP LEARNING?
      • Pretty cool results in limited domains
      • Almost certainly requires more data than you have in your domain
      • Long Short-Term Memory (LSTM) assumes you want to predict the “next token” or mimic a series of tokens
      • The corpus needs to provide similar context with different tokens A LOT of times
      • There are always relationships that appear to be errors, but actually occur in the data
      • Violates its own promise of “no feature engineering”
  • 21. WISDOM OF THE ANCIENT
      • Indexing data is not analyzing data
      • Storing data is just kicking the can to the next guy
      • We must try to be smarter, better and more relevant to the world
      • Let’s generate universal truths if we can
      • Don’t ask your software what analyses you should do
      • Learn the math from first principles
      • Take time to align the methods to the problem, don’t rely on mental furniture
      Don’t ask your barber if you need a haircut.
      The map is not the terrain.
  • 22. THINGS HOLDING MY INTEREST NOW igraph Politics family ice hockey robots coffee tacos PDEs Irish music management theory VCs vacation with my wife naps, solar energy health care for all marine ecosystems Blender biking Markov Chains Americana Sleeping in a Wigwam! AYSO sales gun control… THE FUTURE IS MY RESPONSIBILITY

Editor's Notes

  1. This is hard because
  2. Why Graphs? Very well studied mathematically, making a comeback with modern computing. Diseases are expressed as clusters or constellations of symptoms. The feature space of symptoms shifts over time. Systems of relationship define the status of an illness, not just the symptoms.
  6. You can describe a graph with n nodes as an n x n matrix with the entries as edge strength. You can take a matrix X and make it a graph G. Because of this, you can multiply a Graph with another Graph. And your favorite Markov Chain is also a graph. Directed graphs, including bigraphs, have asymmetric matrices*. LINDA: This slide is a visual mess right now.
  7. Semantic Analysis: Term co-opted from linguists by computer scientists. Now generally means “understanding context of data points”. Think of going from a graph with no edges to a graph with edges. Deep Learning Techniques: Pretty cool results. Almost certainly require more data than you have in your domain.
  8. Not bad, but Cool results on some domains Requires a lot of data and
  9. Let’s be realistic about how much “Science” we are doing. Science has always been about data, hypothesis testing and peer review. Many people in that role now are simply throwing pre-packaged routines against data, and they haven’t checked the assumptions of the models. That job title is going to be obviated by better software packages. We must try to be smarter, better and more relevant to the world. Let’s generate universal truths if we can.