This document summarizes a study that used machine learning to predict cancer patient survival based on integrating multiple types of molecular and clinical data from The Cancer Genome Atlas. The study found that combining molecular data like gene expression, methylation, and mutations with clinical data significantly improved survival prediction for kidney, ovarian, and lung cancers compared to using single data types alone. Analyzing the models provided biological insights into molecular subtypes and markers correlated with survival outcomes. The results suggest that more comprehensive molecular profiling of tumors could help stratify patients and identify targets for personalized cancer treatment.
Machine Learning Predicts Cancer Patient Survival Using Multi-Omics Data
1. Academia Sinica
Taiwan
Gul Muneer
INSTITUTE OF CHEMISTRY
Coach Professor: Dr. Hwang Ming-Jing
Sit-in Professor: Dr. Carmay Lim
Class Coordinator: Dr. Takashi Angata
ANALYSIS _computational
BIOLOGY
3. Predicting patient survival: how do clinicians predict?
How do oncologists know how long you have left to live?
Prognostics: relating to “prediction”
“How long have I got, Doc?”
Doctors can’t accurately predict survival time (prognosis)
Survival prediction: the task of predicting a patient’s remaining lifetime
Prognosis: guides doctors in planning ahead and selecting therapies
4. Traditional prognosis
[Figure: Traditional and recent prediction approaches for prognosis. Panels: tumor stage (S1, S2, S3); molecular classification (Luminal A, Luminal B, ERBB2+, Basal-like; ER+ / ER-; proliferation, differentiation); molecular and genomic prognosis (70-gene signature, 76-gene signature, wound signature, recurrence score, genomic grade, invasiveness gene signature); outcomes ranging from good to poor prognosis.]
These studies have used “single-platform data” and have been limited to a “single cancer type”
5. Aims and motivations of the study
[Figure: four cancer types (kidney, glioblastoma, lung, ovarian) characterized by survival and clinical variables plus five molecular data types: somatic copy number alteration (deletion / normal / amplification), DNA methylation (cytosine vs. 5-methylcytosine), mRNA expression, miRNA expression, and protein expression.]
How and to what extent does molecular profiling affect oncology practice?
Prognostic utility (survival prediction)?
Therapeutic utility (clinically actionable genes)?
Target selection for drug development
Clinical trial design
Identifying patient populations for targeted therapies
7. Overview of TCGA samples
kidney renal clear cell carcinoma (KIRC); glioblastoma multiforme (GBM)
ovarian serous cystadenocarcinoma (OV); lung squamous cell carcinoma (LUSC)
(i) SCNA: ~100 arm or focal alterations
(ii) DNA methylation: ~20,000 genes
(iii) mRNA expression: ~20,000 genes
(iv) microRNA expression: >500 microRNAs
(v) protein expression: ~170 proteins
8. Overview of the computational approach
Cox regression: builds a predictive model for time-to-event data
LASSO: identifies informative features and reduces feature redundancy
Random survival forest (RSF): an algorithm for the analysis of survival data
C-index: quantifies the predictive power of a model
C-index = 1 indicates perfect prediction accuracy
C-index = 0.5 is no better than a random guess
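To make the C-index concrete, here is a minimal sketch of a Harrell-style concordance computation in plain Python. The function name `concordance_index`, its arguments, and the toy data are illustrative assumptions, not the study's code. The sketch also simplifies tie handling: pairs with equal observed times are skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times:       observed follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output; a higher score means shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed death.
        # Pairs with tied times are skipped (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk died first: correct ordering
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # perfect ranking
print(concordance_index(times, events, [1, 1, 1, 1]))  # uninformative scores
```

With these toy inputs, the perfectly ordered scores give 1.0 and the constant scores give 0.5, matching the two reference points listed on the slide.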
9. Prognostic power of molecular and clinical data
Integrated data showed higher predictive power for both cancers.
[Figure panels: kidney cancer; ovarian cancer]
10. Prognostic power of molecular and clinical data
Lung cancer protein expression had predictive power comparable to clinical data.
[Figure panels: glioblastoma; lung cancer]
11. Predictive power of clinical, molecular and integrated data
Integrated models showed significant predictive power in 3 cancer types.
LUSC protein expression was the only molecular data type that, on its own, performed similarly to clinical data.
Similar trends were seen in the Cox and RSF models.
12. Biological insights from prognostic models
Building classifiers by 5-fold cross-validation
NMF (non-negative matrix factorization) subtypes reveal distinct survival patterns
Predicted NMF subtypes show expected survival
NMF subtypes (derived from miRNA expression) showed distinct survival patterns
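The 5-fold cross-validation step mentioned on this slide can be sketched as follows. This is a generic illustration in plain Python, not the study's pipeline; the helper name `k_fold_indices`, the fixed seed, and the sample count are assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each sample lands in exactly one test fold, so every patient gets a
    prediction from a classifier that never saw that patient in training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [idx
                     for f, fold in enumerate(folds) if f != held_out
                     for idx in fold]
        yield train_idx, test_idx


# Hypothetical cohort of 23 patients.
for train_idx, test_idx in k_fold_indices(23, k=5):
    # Here one would fit a subtype classifier on train_idx and predict
    # the subtype labels for the held-out patients in test_idx.
    print(len(train_idx), len(test_idx))
```

Collecting the held-out predictions from all five folds gives one predicted subtype per patient, which can then be compared against survival as the slide describes.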
13. Survival pattern of NMF subtypes matches the survival correlation of individual protein markers.
Molecular subtypes defined by LUSC protein expression
14. KIRC miRNAs correlated with survival
Higher or lower expression of signature miRNAs is correlated with survival.
[Figure panel labels: better prognosis; worse prognosis; better prognosis]
18. Common feature for cross-tumor predictive power
“12q” is crucial for cross-tumor predictive power
Shared biological features provide insights into mechanistic connections between two cancer types
19. Modeling factors affecting prediction of survival data
Predictive power of molecular data strongly depends on the cancer type
20. Variation by modeling factors and their interactions
Cancer type, data type, and their interactions are the dominant sources of variability
[Figure values: 35.7%, 17.4%, 11.8%]
21. Somatic alterations in clinically relevant genes
[Figure labels: other (n = 623,096); no alterations; nonsynonymous in 121 actionable genes (n = 10,281)]
10,281 somatic alterations across 12 tumor types in 2,928 of 3,277 patients (89.4%)
Actionable genes and matching therapies:
ERBB2: neratinib
AKT1: AKT inhibitors
FGFR1: FGFR inhibitors
MAP2K1 and MAP2K2: MEK/ERK inhibitors
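The 89.4% figure on this slide follows directly from the patient counts. A short check in Python, using only the numbers quoted above (the function name `percent_with_alterations` is illustrative):

```python
def percent_with_alterations(n_altered, n_total):
    """Percentage of patients carrying at least one qualifying alteration."""
    return round(100 * n_altered / n_total, 1)


# Counts quoted on the slide: 2,928 of 3,277 patients.
print(percent_with_alterations(2928, 3277))  # → 89.4
```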
22. Expanding genome profiling to exome sequencing
Increases the percentage of patients with clinically actionable alterations
23. Alterations in clinically relevant genes
Global surveys of mutational patterns may help stratify patients who are resistant to certain therapies
24. Conclusion
[Figure, repeated from slide 5: survival and clinical variables plus five molecular data types (somatic copy number alteration, DNA methylation, mRNA expression, miRNA expression, protein expression).]
How and to what extent does molecular profiling affect oncology practice?
Prognostic utility (survival prediction)?
Therapeutic utility (clinically actionable genes)?
Survival prediction significantly improved for 3 cancer types (2.2% to 23.9%, FDR < 0.05)
10,281 somatic mutations in 2,928 of 3,277 patients (89.4%) across 12 cancer types
This information could be helpful in setting treatment targets.
25. Discussion and future perspective
No obvious improvement from adding “omics” data to clinical data; the difference is unlikely to be predictive.
Predictive power of molecular data depended on cancer type, yet cross-tumor models were still used.
Upgrading from hotspot profiling to exome sequencing will yield a more complete and clinically useful patient tumor profile.
Genes that are rarely mutated in any single tumor type are altered more regularly when studies are aggregated.
Fortunately, doctors don’t use it! So, I’ll start with the definition of XXXX or prognosis. So these pictures here illustrate the concept of prediction