Multi-trait modeling in polygenic scores
Yosuke Tanigawa
Postdoc @ Computational Biology Lab
(PI: Prof. Manolis Kellis), MIT CSAIL
2022/1/28 (Fri.) 2:30 pm (ET) @ Zoom
Debora Marks Lab Journal Club
@yk_tani
https://yosuketanigawa.com/
The main paper for journal club presentation
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Joint work w/ Nasa Sinnott-Armstrong
Polygenic risk scores (PRSs) combine
genetic associations across many variants
- Large-scale cohorts enabled discovery of GWAS associations
- Polygenic risk score (PRS), also called polygenic score (PGS)
- From “inference” to “prediction”
i-th individual
j-th variant
G: genotype
β: effect size
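The labels above describe the usual additive score, in which the PRS of the i-th individual is the sum over variants j of β_j times G_ij. A minimal sketch in Python, assuming genotypes are coded as 0/1/2 allele dosages; the function name and the toy numbers are illustrative and do not come from the talk:

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Return PRS_i = sum_j beta_j * G_ij for every individual i."""
    G = np.asarray(genotypes, dtype=float)        # shape: (individuals, variants)
    beta = np.asarray(effect_sizes, dtype=float)  # shape: (variants,)
    if G.ndim != 2 or G.shape[1] != beta.shape[0]:
        raise ValueError("genotypes must be (n, p) and effect_sizes must be (p,)")
    return G @ beta

# Toy example: 2 individuals, 3 variants (made-up values).
G = np.array([[0, 1, 2],
              [2, 0, 1]])
beta = np.array([0.10, -0.20, 0.05])
scores = polygenic_score(G, beta)
```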
PRS predictions are sometimes useful
- Differences in (overlapping) PRS distributions are sometimes useful for
population-level risk stratification.
- PRS can be used as an instrumental variable in causal inference
N. R. Wray et al., JAMA Psychiatry (2020); Sakaue*, Kanai*, et al., Nat Med (2020).
PRS(biomarker)
associations with lifespan
PRS models often contain many variants
- One challenge in PRS modeling is the LD structure
- Bayesian regression with GWAS summary statistics + LD reference
has been successful
- Genome-wide polygenic risk score (Khera et al) with 6M+ variants
- We don’t assume 6M causal variants for common complex traits
Khera, et al., Nat Gen (2018).
Sparse regression model with Lasso
- One alternative: regularized regression on individual-level data
- e.g. Lasso
- Challenge: dataset is large (n = 300k, p = 1M+)
- Does not fit in memory, etc.
- We developed Batch screening iterative Lasso (BASIL)
- Efficient screening based on “strong rule” (Tibshirani et al 2012)
- Solves Lasso via iterative procedure
Junyang Qian
Qian, Tanigawa, et al. PLOS Gen. (2020).
Batch screening iterative Lasso (BASIL)
BASIL (= BAtch Screening Iterative Lasso) in the R snpnet package
3 steps per iteration
1. Screening
2. Lasso Fit (glmnet)
3. KKT Check
Qian, Tanigawa, et al. PLOS Gen. (2020).
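The three steps per iteration can be sketched as a small working-set loop. This is a simplified illustration only: it handles a single penalty value `lam`, screens by taking the top `batch_size` variants ranked by absolute correlation with the residual, and uses a plain coordinate-descent Lasso in place of glmnet. The function names and defaults are assumptions, not the snpnet implementation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_delta = max(max_delta, abs(new - old))
        if max_delta < tol:
            break
    return beta

def basil_like_fit(X, y, lam, batch_size=10, max_rounds=50, kkt_tol=1e-8):
    """Screen -> fit on the screened set -> KKT check on all variants."""
    n, p = X.shape
    active = np.zeros(p, dtype=bool)
    beta = np.zeros(p)
    for _ in range(max_rounds):
        # 1. Screening: add the inactive variants most correlated with the residual.
        score = np.abs(X.T @ (y - X @ beta)) / n
        candidates = np.where(~active)[0]
        top = candidates[np.argsort(-score[candidates])[:batch_size]]
        active[top] = True
        # 2. Lasso fit restricted to the screened variants.
        beta = np.zeros(p)
        beta[active] = lasso_cd(X[:, active], y, lam)
        # 3. KKT check: every left-out variant must satisfy |x_j' r| / n <= lam.
        score = np.abs(X.T @ (y - X @ beta)) / n
        if np.all(score[~active] <= lam + kkt_tol):
            return beta
    raise RuntimeError("KKT conditions not satisfied within max_rounds")
```

When the KKT check passes, the solution on the screened subset is also a solution of the full problem, which is why the full genotype matrix never has to enter the Lasso solver at once.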
BASIL/snpnet models are sparse, yet have
comparable predictive performance
- The snpnet PRS models (Lasso & Elastic-Net) have predictive
performance comparable to SBayesR
- Standing height was one of the most polygenic traits.
- The height PRS model has 47k variants (about 5% of variants have non-zero BETAs)
Qian, Tanigawa, et al. PLOS Gen. (2020); Tanigawa, Qian, et al. medRxiv (2021)
[Figure: hold-out test set R² and hold-out test set AUC]
Genetics of 35 biomarkers study in UK Biobank
349 rare (MAF < 1%) non-synonymous variant associations
1,381 (1,134 novel) associations on non-synonymous variants
Biomarker categories: Cardiovascular, Bone and Joint, Diabetes, Liver, Hormone, Renal
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Polygenic risk scores (PRSs) for 35 biomarkers
• Created a 70% training / 10% validation / 20% test split for white British individuals
• Tested 4 additional UKB sub-populations of different ancestries
• Limited trans-ethnic predictive performance of PRSs
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
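A 70/10/20 split like the one described above can be sketched as a random partition of individual indices. The function name, the fixed seed, and the rounding rule are assumptions for illustration; the study's actual procedure may differ.

```python
import numpy as np

def split_indices(n, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition range(n) into training / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
```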
Disease cases are enriched in PRS tails
Take the extremes of the PRS distribution for each biomarker
Compare the odds ratio for the disease
outcome relative to the 40-60th percentile bin
Applied PheWAS to ~160 diseases
Lewis, C. M. & Vassos, E. Genome Medicine (2020).
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
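The tail comparison can be sketched as an unadjusted odds ratio between the top PRS tail and the 40-60th percentile reference bin. The tail cutoff, the function name, and the absence of covariate adjustment are assumptions made for this sketch; the published analysis may differ.

```python
import numpy as np

def tail_odds_ratio(prs, is_case, ref=(0.40, 0.60), tail=0.99):
    """Odds of disease above the `tail` quantile relative to the reference bin."""
    prs = np.asarray(prs, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    q_lo, q_hi, q_tail = np.quantile(prs, [ref[0], ref[1], tail])
    in_ref = (prs >= q_lo) & (prs <= q_hi)
    in_tail = prs > q_tail

    def odds(mask):
        cases = np.sum(is_case & mask)
        controls = np.sum(~is_case & mask)
        if controls == 0:
            raise ValueError("no controls in bin; odds are undefined")
        return cases / controls

    return odds(in_tail) / odds(in_ref)
```

For a binary disease outcome, a value above 1 means cases are enriched in the tail relative to the middle of the distribution.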
Identify diseases with biomarker PRS
associations
Multi-PRS - a linear combination of a disease
PRS and biomarker PRSs
- Multiple observations suggest “biomarkers → disease” links
- PRS-PheWAS analysis
- Biomarkers are more heritable than disease
- Mendelian Randomization
- Multi-PRS is a weighted sum of PRSs,
i.e. w_1 (PRS_1) + w_2 (PRS_2) + w_3 (PRS_3) + …
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
Weights of multi-PRS come from Lasso
Multi-PRS: w_1 (PRS_1) + w_2 (PRS_2) + w_3 (PRS_3) + …
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
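A minimal sketch of the idea: treat each per-trait PRS as a feature and fit L1-penalized weights. For brevity this uses a linear model with centered inputs and a short proximal-gradient (ISTA) loop; a disease outcome would more naturally use a logistic model, and the function names, `lam`, and the iteration count are assumptions rather than the paper's settings.

```python
import numpy as np

def fit_multi_prs_weights(prs_matrix, outcome, lam, n_iter=1000):
    """L1-penalized least squares: (1/2n)||y - P w||^2 + lam * ||w||_1."""
    P = np.asarray(prs_matrix, dtype=float)   # shape: (individuals, PRSs)
    y = np.asarray(outcome, dtype=float)
    n, k = P.shape
    step = n / np.linalg.norm(P, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(k)
    for _ in range(n_iter):
        grad = -P.T @ (y - P @ w) / n
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return w

def multi_prs(prs_matrix, weights):
    """Weighted sum w_1 * PRS_1 + w_2 * PRS_2 + ... for every individual."""
    return np.asarray(prs_matrix, dtype=float) @ weights
```

The L1 penalty sets the weights of uninformative PRSs exactly to zero, so the combined score keeps only the traits that help predict the outcome.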
multi-PRS improves disease prevalence prediction
[Figure panels: chronic kidney disease (CKD); other diseases in UK Biobank]
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
multi-PRS models improve incident disease
prediction in FinnGen
The multi-PRS model is replicated in a Finnish cohort (FinnGen)
Nina Mars
Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
What did we learn from multi-PRS?
- Two complementary approaches to improve predictive performance
- 1) Larger sample size → increase in power
- 2) Multi-trait analysis
- Why does multi-PRS work?
- Quantitative traits have more power (J. Yang et al 2010)
- Genetic correlation between biomarkers and disease
- Phenotyping challenges in some disease phenotypes
- When does multi-PRS work best?
- Exact conditions are not fully clear (yet)
- The multi-phenotype model
- multi-PRS:
Genetics → Biomarkers (Molecular traits) → Disease
- Alternatives (other models):
- “Genetic component”-based model
Extreme polygenicity & pleiotropy in
the genetics of common complex traits
Genetic
variants
Complex
traits
- Polygenicity: many variants - one trait
- Pleiotropy: one variant - many traits
- Large number of associations in
population-based cohorts
- Can we group them together for enhanced interpretation?
Decomposition of genetic associations (DeGAs)
Tanigawa*, Li*, et al. Nat Comm (2019).
Low-rank representation of association summary
statistics provides latent components
1. Genome- & phenome-wide association summary statistic matrix
(summary statistics from association analysis: beta or log odds ratio)
2. Truncated singular value decomposition (TSVD)
3. Quantify the variant & trait loadings
on each component
“Paint” the disease genetics with components!
Tanigawa*, Li*, et al. Nat Comm (2019).
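The three steps can be sketched with a plain truncated SVD in NumPy. The orientation (rows are traits, columns are variants), the function names, and the normalized squared-loading summary are assumptions made for illustration rather than the DeGAs code itself.

```python
import numpy as np

def truncated_svd(W, k):
    """Keep the top-k singular triplets of the summary statistic matrix W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    trait_loadings = U[:, :k]       # shape: (traits, k)
    variant_loadings = Vt[:k, :].T  # shape: (variants, k)
    return trait_loadings, s[:k], variant_loadings

def squared_loading_share(loadings):
    """Share of each row (trait or variant) in each component; columns sum to 1."""
    sq = loadings ** 2
    return sq / sq.sum(axis=0, keepdims=True)
```

Ranking rows by their share within a component gives the traits or variants that drive that component, which is what makes the components interpretable.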
Biplot annotation helps interpretation of
DeGAs latent components
Tanigawa*, Li*, et al. Nat Comm (2019).
DeGAs was subsequently extended to PRS models
- DeGAs-PRS (dPRS)
- Derive a “component” score
- Disease PRS as a sum of
component scores
- It offers better interpretation
Aguirre, Tanigawa, et al. Eur J Hum Gen (2021).
Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges them all
1. BASIL/snpnet (Lasso) – sparse PRS models
2. multi-PRS – linear combination of snpnet PRSs
w_1 (PRS_1) + w_2 (PRS_2) + w_3 (PRS_3) + …
3. DeGAs-PRS – genetic component-based PRSs
w_1 (cPRS_1) + w_2 (cPRS_2) + w_3 (cPRS_3) + …
cPRS comes from the TSVD of GWAS associations
SRRR/multiSnpnet fits a penalized multivariate multi-response model
Sparse reduced-rank regression (SRRR) in the
multiSnpnet package bridges the two approaches
- One can show that (1) and (2) are equivalent. Note: the problem is NOT convex
- Group lasso penalty
- We select features that influence multiple responses (traits)
- DeGAs (TSVD)-based approach offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
(1)
(2)
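The formulations labeled (1) and (2) were images on the slide and are missing from this transcript. As a rough guide only, the pair can be sketched as below, assuming the standard SRRR setup with an n × p genotype matrix X, an n × q response matrix Y, target rank r, and penalty λ. All symbols, and the assignment of the labels (1) and (2), are assumptions rather than the slide's own notation.

```latex
\begin{aligned}
(1)\quad & \min_{C \in \mathbb{R}^{p \times q}} \ \tfrac{1}{2}\lVert Y - XC \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert C_{j\cdot} \rVert_2
  \quad \text{s.t. } \operatorname{rank}(C) \le r \\
(2)\quad & \min_{U \in \mathbb{R}^{p \times r},\, V \in \mathbb{R}^{q \times r}} \ \tfrac{1}{2}\lVert Y - XUV^\top \rVert_F^2
  + \lambda \sum_{j=1}^{p} \lVert U_{j\cdot} \rVert_2
  \quad \text{s.t. } V^\top V = I_r
\end{aligned}
```

Under this sketch, writing C = UV^T with orthonormal V makes each row norm of C equal to the corresponding row norm of U, so the two group lasso penalties coincide. The rank constraint is what makes the problem non-convex.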
Junyang Qian
multiSnpnet/SRRR applied to UK Biobank
- Asthma & clinically related traits
- Predictive performance improvements
for asthma & basophil count
- SVD of the coefficients offers interpretation
Qian, Tanigawa, et al. Ann Appl Stat (in press).
Summary & future directions
Summary
- Polygenic risk score (PRS) models compute the genetic liability of
diseases by aggregating effects across multiple genetic variants
- Sparse snpnet PRS models have competitive performance
- Multi-trait aware PRS can improve the predictive power
Future direction & discussion
- Integrate with fine-mapping, conservation, and variant and gene annotation?
- Incorporate (cell-type-specific) biological knowledge as a prior
- It may help improve the predictive performance / transferability?
- Machine-learning-based PRS models
- Non-linear combination of multiple traits
- Incorporate biological priors
Acknowledgements
Dept. Biomedical Data Science
- Matthew Aguirre
- Manuel A. Rivas
- the Rivas lab
Dept. Statistics
- Junyang Qian
- Trevor Hastie
- Rob Tibshirani
Dept. Genetics, Stanford
- Nasa Sinnott-Armstrong
- Jonathan Pritchard
University of Helsinki
- Nina Mars
- Samuli Ripatti
References
- Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). (PMID: 33462484)
- Genetics of 35 biomarkers, multi-PRS
- Qian, Tanigawa, et al. PLoS Gen. (2020). (PMID: 33095761)
- Batch screening iterative Lasso (BASIL) & R snpnet package
- Qian, Tanigawa, et al. Ann Appl Stat. (in press). (doi: 10.1101/2020.05.30.125252)
- Sparse reduced rank regression (SRRR) & R multiSnpnet package
- Tanigawa, Li, et al. Nat Comm (2019). (PMID: 31492854)
- DeGAs - decomposition of genetic associations
- Aguirre, Tanigawa, et al. Eur J Hum Genet. (2021). (PMID: 33558700)
- DeGAs-PRS (dPRS)
- Tanigawa, Qian, et al. medRxiv (2021) (doi: 10.1101/2021.09.02.21262942)
- Phenome-wide application of BASIL/snpnet
multiSnpnet efficiently solves SRRR
BASIL-like iterative procedure
3 steps per iteration
1. Screening
2. Fitting (SVD & group lasso)
3. KKT Check
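One ingredient commonly used when fitting with a group lasso penalty is row-wise soft-thresholding, which shrinks each variant's coefficient vector as a group and zeroes it entirely when its norm falls below the threshold. The sketch below shows only that operator; the function name is an assumption, and this is not the multiSnpnet fitting routine itself.

```python
import numpy as np

def group_soft_threshold(U, threshold):
    """Shrink each row of U toward zero by `threshold` in Euclidean norm."""
    U = np.asarray(U, dtype=float)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    # Rows with norm <= threshold are set to zero; the floor avoids dividing by zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return scale * U
```

Because a whole row is kept or dropped together, a variant is either used for all components or removed from the model, which gives the feature selection across multiple traits.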
Qian, Tanigawa, et al. Ann Appl Stat (in press).
Variant prioritization w/ predicted consequence
does not help improve the performance
- Lasso penalty factor.
- Penalty factor = 0 → no regularization on the variable
- Protein-truncating and known pathogenic variants = 0.5
- Protein-altering and known likely-pathogenic variants = 0.75
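The penalty factors above scale the Lasso penalty per variant, so a variant with a smaller factor needs less evidence to enter the model. A minimal sketch follows; the consequence labels are hypothetical names, and the default factor of 1.0 for all other variants is an assumption that the slide does not state.

```python
import numpy as np

# Factors from the slide; the dictionary keys are hypothetical labels.
PENALTY_FACTOR = {
    "protein_truncating": 0.5,
    "pathogenic": 0.5,
    "protein_altering": 0.75,
    "likely_pathogenic": 0.75,
}
DEFAULT_PENALTY_FACTOR = 1.0  # assumption: all other variants get the full penalty

def penalty_factors(consequences):
    """Map each variant's predicted consequence to its penalty factor."""
    return np.array([PENALTY_FACTOR.get(c, DEFAULT_PENALTY_FACTOR)
                     for c in consequences])

def weighted_soft_threshold(z, lam, factors):
    """Soft-threshold with a per-variant penalty of lam * factor."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam * factors, 0.0)

factors = penalty_factors(["protein_truncating", "protein_altering", "intronic"])
shrunk = weighted_soft_threshold([0.08, 0.08, 0.08], lam=0.1, factors=factors)
```

With the same signal strength, only the variants with reduced penalty factors keep a non-zero coefficient in this toy example.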
Tanigawa, Qian, et al. medRxiv (2021)
Sex-specific genetic effects for testosterone
Emily Flynn
Flynn, Tanigawa, et al. EJHG (2021).
Improved genetic prediction of testosterone
levels with sex-specific PRS models
Sex-specific polygenic risk model for testosterone outperforms polygenic
risk scores that combine males and females
Flynn, Tanigawa, et al. EJHG (2021).

More Related Content

What's hot

20100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture0620100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture06Computer Science Club
 
Ed Millensted - 100,00 Genomes Project
Ed Millensted - 100,00 Genomes ProjectEd Millensted - 100,00 Genomes Project
Ed Millensted - 100,00 Genomes ProjectInnovation Agency
 
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancer
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast CancerWhen to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancer
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancerbkling
 
Management of recurrent Glioblastoma and role of Bevacizumab
Management of recurrent Glioblastoma and role of BevacizumabManagement of recurrent Glioblastoma and role of Bevacizumab
Management of recurrent Glioblastoma and role of BevacizumabAjeet Gandhi
 
Comparative genomics presentation
Comparative genomics presentationComparative genomics presentation
Comparative genomics presentationEmmanuel Aguon
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018David Cook
 
Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment  Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment MarliaGan
 
Analysis and Interpretation of Cell-free DNA
Analysis and Interpretation of Cell-free DNAAnalysis and Interpretation of Cell-free DNA
Analysis and Interpretation of Cell-free DNAQIAGEN
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingDayananda Salam
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.Varsha Gayatonde
 
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENT
APPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENTAPPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENT
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENTDinie Fariz
 
NGS in cancer treatment
NGS in cancer treatmentNGS in cancer treatment
NGS in cancer treatmentNur Suhaida
 
Should triple negative breast cancer (tnbc) subtype
Should triple negative breast cancer (tnbc) subtypeShould triple negative breast cancer (tnbc) subtype
Should triple negative breast cancer (tnbc) subtypeEreny Samwel
 
Conventional and next generation sequencing ppt
Conventional and next generation sequencing pptConventional and next generation sequencing ppt
Conventional and next generation sequencing pptAshwini R
 

What's hot (20)

20100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture0620100515 bioinformatics kapushesky_lecture06
20100515 bioinformatics kapushesky_lecture06
 
Ed Millensted - 100,00 Genomes Project
Ed Millensted - 100,00 Genomes ProjectEd Millensted - 100,00 Genomes Project
Ed Millensted - 100,00 Genomes Project
 
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancer
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast CancerWhen to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancer
When to Consider Multi-Gene Testing in Early-Stage and Metastatic Breast Cancer
 
Management of recurrent Glioblastoma and role of Bevacizumab
Management of recurrent Glioblastoma and role of BevacizumabManagement of recurrent Glioblastoma and role of Bevacizumab
Management of recurrent Glioblastoma and role of Bevacizumab
 
Comparative genomics presentation
Comparative genomics presentationComparative genomics presentation
Comparative genomics presentation
 
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018scRNA-Seq Workshop Presentation - Stem Cell Network 2018
scRNA-Seq Workshop Presentation - Stem Cell Network 2018
 
Lecture 7 gwas full
Lecture 7 gwas fullLecture 7 gwas full
Lecture 7 gwas full
 
Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment  Next generation sequencing in cancer treatment
Next generation sequencing in cancer treatment
 
Analysis and Interpretation of Cell-free DNA
Analysis and Interpretation of Cell-free DNAAnalysis and Interpretation of Cell-free DNA
Analysis and Interpretation of Cell-free DNA
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
Genome wide association mapping
Genome wide association mappingGenome wide association mapping
Genome wide association mapping
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.
 
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENT
APPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENTAPPLICATION OF  NEXT GENERATION SEQUENCING (NGS)  IN CANCER TREATMENT
APPLICATION OF NEXT GENERATION SEQUENCING (NGS) IN CANCER TREATMENT
 
NGS in cancer treatment
NGS in cancer treatmentNGS in cancer treatment
NGS in cancer treatment
 
Should triple negative breast cancer (tnbc) subtype
Should triple negative breast cancer (tnbc) subtypeShould triple negative breast cancer (tnbc) subtype
Should triple negative breast cancer (tnbc) subtype
 
Conventional and next generation sequencing ppt
Conventional and next generation sequencing pptConventional and next generation sequencing ppt
Conventional and next generation sequencing ppt
 
Genome Assembly 2018
Genome Assembly 2018Genome Assembly 2018
Genome Assembly 2018
 
Clinical Applications of Next Generation Sequencing
Clinical Applications of Next Generation SequencingClinical Applications of Next Generation Sequencing
Clinical Applications of Next Generation Sequencing
 
Snp
SnpSnp
Snp
 

Similar to Multi-trait modeling in polygenic scores, journal club talk at Debora Marks lab

2015. Patrik Schnable. Trait associated SNPs provide insights into heterosis...
2015. Patrik Schnable. Trait associated SNPs provide insights  into heterosis...2015. Patrik Schnable. Trait associated SNPs provide insights  into heterosis...
2015. Patrik Schnable. Trait associated SNPs provide insights into heterosis...FOODCROPS
 
Fast forward genetic mapping provides candidate genes for resistance to fusar...
Fast forward genetic mapping provides candidate genes for resistance to fusar...Fast forward genetic mapping provides candidate genes for resistance to fusar...
Fast forward genetic mapping provides candidate genes for resistance to fusar...ICRISAT
 
Swansea University (October-2020): Challenges of using GWAS in bacteria
Swansea University (October-2020): Challenges of using GWAS in bacteriaSwansea University (October-2020): Challenges of using GWAS in bacteria
Swansea University (October-2020): Challenges of using GWAS in bacteriaBen Pascoe
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationIJAEMSJORNAL
 
Jcb 2005-12-1103
Jcb 2005-12-1103Jcb 2005-12-1103
Jcb 2005-12-1103Farah Diba
 
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...John Blue
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...tuxette
 
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...Md Rahman
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisJosh Neufeld
 
Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Laurence Dawkins-Hall
 
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...Mandy Brown
 
Final From journal on website
Final From journal on websiteFinal From journal on website
Final From journal on websiteMichael Clawson
 
Genome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleGenome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleLaurence Dawkins-Hall
 
EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13Jonathan Eisen
 
IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED
 
A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...Laurence Dawkins-Hall
 
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...Sara Alvarez
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_versionDago Noel
 

Similar to Multi-trait modeling in polygenic scores, journal club talk at Debora Marks lab (20)

2015. Patrik Schnable. Trait associated SNPs provide insights into heterosis...
2015. Patrik Schnable. Trait associated SNPs provide insights  into heterosis...2015. Patrik Schnable. Trait associated SNPs provide insights  into heterosis...
2015. Patrik Schnable. Trait associated SNPs provide insights into heterosis...
 
Fast forward genetic mapping provides candidate genes for resistance to fusar...
Fast forward genetic mapping provides candidate genes for resistance to fusar...Fast forward genetic mapping provides candidate genes for resistance to fusar...
Fast forward genetic mapping provides candidate genes for resistance to fusar...
 
Swansea University (October-2020): Challenges of using GWAS in bacteria
Swansea University (October-2020): Challenges of using GWAS in bacteriaSwansea University (October-2020): Challenges of using GWAS in bacteria
Swansea University (October-2020): Challenges of using GWAS in bacteria
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferation
 
Jcb 2005-12-1103
Jcb 2005-12-1103Jcb 2005-12-1103
Jcb 2005-12-1103
 
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
Robust Prediction of Cancer Disease Using Pattern Classification of Microarra...
 
Introduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysisIntroduction to 16S rRNA gene multivariate analysis
Introduction to 16S rRNA gene multivariate analysis
 
Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...Systemic analysis of data combined from genetic qtl's and gene expression dat...
Systemic analysis of data combined from genetic qtl's and gene expression dat...
 
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...
An Enrichment Analysis For Cardiometabolic Traits Suggests Non-Random Assignm...
 
Final From journal on website
Final From journal on websiteFinal From journal on website
Final From journal on website
 
Genome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattleGenome responses of trypanosome infected cattle
Genome responses of trypanosome infected cattle
 
Kishor Presentation
Kishor PresentationKishor Presentation
Kishor Presentation
 
QTL mapping
QTL mappingQTL mapping
QTL mapping
 
EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13EVE 161 Winter 2018 Class 13
EVE 161 Winter 2018 Class 13
 
IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED-V2I1P5
IJSRED-V2I1P5
 
A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...A systematic, data driven approach to the combined analysis of microarray and...
A systematic, data driven approach to the combined analysis of microarray and...
 
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...
A Critical Assessment Of Mus Musculus Gene Function Prediction Using Integrat...
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 

More from Yosuke Tanigawa

Multi-trait analysis informs genetic disease studies (IIBMP 2020)
Multi-trait analysis informs genetic disease studies (IIBMP 2020)Multi-trait analysis informs genetic disease studies (IIBMP 2020)
Multi-trait analysis informs genetic disease studies (IIBMP 2020)Yosuke Tanigawa
 
人類遺伝学の謎に コンピュータを使って挑む 〜ワクワクを追求する人生のつくりかた〜
人類遺伝学の謎に コンピュータを使って挑む  〜ワクワクを追求する人生のつくりかた〜人類遺伝学の謎に コンピュータを使って挑む  〜ワクワクを追求する人生のつくりかた〜
人類遺伝学の謎に コンピュータを使って挑む 〜ワクワクを追求する人生のつくりかた〜Yosuke Tanigawa
 
20180802 Yosuke Tanigawa public
20180802 Yosuke Tanigawa public20180802 Yosuke Tanigawa public
20180802 Yosuke Tanigawa publicYosuke Tanigawa
 
20180715 海外大学院留学説明会
20180715 海外大学院留学説明会20180715 海外大学院留学説明会
20180715 海外大学院留学説明会Yosuke Tanigawa
 
Why do we need a computer to study biology (20180505 splash B6476)
Why do we need a computer to study biology (20180505 splash B6476)Why do we need a computer to study biology (20180505 splash B6476)
Why do we need a computer to study biology (20180505 splash B6476)Yosuke Tanigawa
 
20161222 米国大学院学生会説明会資料
20161222 米国大学院学生会説明会資料20161222 米国大学院学生会説明会資料
20161222 米国大学院学生会説明会資料Yosuke Tanigawa
 
ゲノム科学への招待
ゲノム科学への招待ゲノム科学への招待
ゲノム科学への招待Yosuke Tanigawa
 
ゲノム科学への招待 (2016.5.19 draft)
ゲノム科学への招待 (2016.5.19 draft)ゲノム科学への招待 (2016.5.19 draft)
ゲノム科学への招待 (2016.5.19 draft)Yosuke Tanigawa
 
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)Yosuke Tanigawa
 
生物情報科学科 ガイダンス (2016/5/17)
生物情報科学科 ガイダンス (2016/5/17)生物情報科学科 ガイダンス (2016/5/17)
生物情報科学科 ガイダンス (2016/5/17)Yosuke Tanigawa
 

More from Yosuke Tanigawa (10)

Multi-trait analysis informs genetic disease studies (IIBMP 2020)
Multi-trait analysis informs genetic disease studies (IIBMP 2020)Multi-trait analysis informs genetic disease studies (IIBMP 2020)
Multi-trait analysis informs genetic disease studies (IIBMP 2020)
 
人類遺伝学の謎に コンピュータを使って挑む 〜ワクワクを追求する人生のつくりかた〜
人類遺伝学の謎に コンピュータを使って挑む  〜ワクワクを追求する人生のつくりかた〜人類遺伝学の謎に コンピュータを使って挑む  〜ワクワクを追求する人生のつくりかた〜
人類遺伝学の謎に コンピュータを使って挑む 〜ワクワクを追求する人生のつくりかた〜
 
20180802 Yosuke Tanigawa public
20180802 Yosuke Tanigawa public20180802 Yosuke Tanigawa public
20180802 Yosuke Tanigawa public
 
20180715 海外大学院留学説明会
20180715 海外大学院留学説明会20180715 海外大学院留学説明会
20180715 海外大学院留学説明会
 
Why do we need a computer to study biology (20180505 splash B6476)
Why do we need a computer to study biology (20180505 splash B6476)Why do we need a computer to study biology (20180505 splash B6476)
Why do we need a computer to study biology (20180505 splash B6476)
 
20161222 米国大学院学生会説明会資料
20161222 米国大学院学生会説明会資料20161222 米国大学院学生会説明会資料
20161222 米国大学院学生会説明会資料
 
ゲノム科学への招待
ゲノム科学への招待ゲノム科学への招待
ゲノム科学への招待
 
ゲノム科学への招待 (2016.5.19 draft)
ゲノム科学への招待 (2016.5.19 draft)ゲノム科学への招待 (2016.5.19 draft)
ゲノム科学への招待 (2016.5.19 draft)
 
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)
6分でわかる遺伝子検査のしくみ ―21世紀のゲノム医科学― (2016.5.12)
 
生物情報科学科 ガイダンス (2016/5/17)
生物情報科学科 ガイダンス (2016/5/17)生物情報科学科 ガイダンス (2016/5/17)
生物情報科学科 ガイダンス (2016/5/17)
 

Recently uploaded

Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
Multi-trait modeling in polygenic scores, journal club talk at Debora Marks lab

  • 1. Multi-trait modeling in polygenic scores Yosuke Tanigawa Postdoc @ Computational Biology Lab (PI: Prof. Manolis Kellis), MIT CSAIL 2022/1/28 (Fri.) 2:30 pm (ET) @ Zoom Debora Marks Lab Journal Club 1 @yk_tani https://yosuketanigawa.com/
  • 2. The main paper for journal club presentation 2 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021 Joint work w/ Nasa Sinnott-Armstrong
  • 3. Polygenic risk scores (PRSs) combine genetic associations across many variants - Large-scale cohorts enabled discovery of GWAS associations - Polygenic risk score (PRS) (or polygenic score [PGS]) “Inference” to “Prediction” 3 i-th individual j-th variant G: genotype β: effect size
  • 4. PRS predictions are sometimes useful - Differences in (overlapping) PRS distributions are sometimes useful for population stratification. - PRS can be used as an instrumental variable in causal inference 4 N. R. Wray et al., JAMA Psychiatry (2020); Sakaue*, Kanai*, et al., Nat Med (2020). PRS(biomarker) associations with lifespan
  • 5. PRS models often contain many variants - One challenge in PRS modeling is the LD structure - Bayesian regression with GWAS summary statistics + LD reference has been successful - Genome-wide polygenic risk score (Khera et al) with 6M+ variants - We don’t assume 6M causal variants for common complex traits 5 Khera, et al., Nat Gen (2018).
  • 6. Sparse regression model with Lasso - One alternative: regularized regression on individual-level data - e.g. Lasso - Challenge: dataset is large (n = 300k, p = 1M+) - Does not fit in memory, etc. - We developed Batch screening iterative Lasso (BASIL) - Efficient screening based on “strong rule” (Tibshirani et al 2012) - Solves Lasso via iterative procedure 6 Junyang Qian Qian, Tanigawa, et al. PLOS Gen. (2020).
  • 7. Batch screening iterative Lasso (BASIL) BASIL (= BAtch Screening Iterative Lasso) in R snpnet package 7 3 steps per iteration 1. Screening 2. Lasso Fit (glmnet) 3. KKT Check Qian, Tanigawa, et al. PLOS Gen. (2020).
  • 8. BASIL/snpnet models are sparse, yet have comparable predictive performance - The snpnet PRS models (Lasso & Elastic-Net) have comparable predictive performance with SBayesR - Standing height was one of the most polygenic traits. - Height PRS model has 47k variants (5% of non-zero BETAs) 8 Qian, Tanigawa, et al. PLOS Gen. (2020).; Tanigawa, Qian, et al. medRxiv (2021) Hold-out test set R² Hold-out test set AUC
  • 9. Genetics of 35 biomarkers study in UK Biobank 9 349 rare (MAF < 1%) non-synonymous variant associations 1,381 (1,134 novel) associations on non-synonymous variants Cardiovascular Bone and Joint Diabetes Liver Hormone Renal Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 10. Genetics of 35 biomarkers study in UK Biobank 10 Cardiovascular Bone and Joint Diabetes Liver Hormone Renal Polygenic risk scores (PRSs) for 35 biomarkers Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 11. Polygenic risk scores (PRSs) for 35 biomarkers • Created 70% training/10% validation/ 20% test split for white British • Tested 4 additional UKB sub-populations of different ancestries • Limited trans-ethnic predictive performance of PRSs 11 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 12. Disease cases are enriched in PRS tails Take extreme in PRS for biomarkers Compare odds ratio for disease outcome relative to 40-60%ile bin Applied PheWAS for ~160 diseases 12 Lewis, C. M. & Vassos, E. Genome Medicine (2020). Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 13. Disease cases are enriched in PRS tails Take extreme in PRS for biomarkers Identify diseases with biomarker PRS associations Compare odds ratio for disease outcome relative to 40-60%ile bin Applied PheWAS for ~160 diseases 13 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 14. Multi-PRS - a linear combination of a disease PRS and biomarker PRSs - Multiple observations suggest “biomarkers → disease” links - PRS-PheWAS analysis - Biomarkers are more heritable than disease - Mendelian Randomization - Multi-PRS is a weighted sum of PRSs i.e. w1 (PRS1 ) + w2 (PRS2 ) + w3 (PRS3 ) + … 14 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 15. Weights of multi-PRS come from Lasso Multi-PRS: w1 (PRS1 ) + w2 (PRS2 ) + w3 (PRS3 ) + … 15 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 16. multi-PRS improves disease prevalence prediction Chronic kidney disease (CKD) Other diseases in UK Biobank 16 Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 17. multi-PRS models improve incident disease prediction in FinnGen The multi-PRS model is replicated in Finnish cohort (FinnGen) 17 Nina Mars Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. 2021
  • 18. What did we learn from multi-PRS? - Two complementary approaches to improve predictive performance - 1) Sample size → increase in power - 2) Multi-trait analysis - Why does multi-PRS work? - Quantitative traits have more power (J. Yang et al 2010) - Genetic correlation between biomarkers and disease - Phenotyping challenges in some disease phenotypes - When does multi-PRS work the best? - Exact conditions are not fully clear (yet) - The multi-phenotype model - multi-PRS: Genetics → Biomarkers (Molecular traits) → Disease - Alternatives (other models): - “Genetic component”-based model 18
  • 19. Extreme polygenicity & pleiotropy in the genetics of common complex traits 19 Genetic variants Complex traits - Polygenicity: many variants - one trait - Pleiotropy: one variant - many traits - Large number of associations in population-based cohorts - Can we group them together for enhanced interpretation?
  • 20. Decomposition of genetic associations (DeGAs) 20 Tanigawa*, Li*, et al. Nat Comm (2019).
  • 21. Low-rank representation of association summary statistics provides latent components 1. Genome & phenome-wide association summary statistic matrix 2. Truncated-singular value decomposition (TSVD) 3. Quantify the variant & trait-loadings on each component “paint” the disease genetics with components! Summary statistics from association analysis (beta or log odds ratio) 21 Tanigawa*, Li*, et al. Nat Comm (2019).
  • 22. Biplot annotation helps interpretation of DeGAs latent components 22 Tanigawa*, Li*, et al. Nat Comm (2019).
  • 23. DeGAs is subsequently extended to PRS model - DeGAs-PRS (dPRS) - Derive “component”-score - Disease PRS as sum of component-score - It offers better interpretation 23 Aguirre, Tanigawa, et al. Eur J Hum Gen (2021).
  • 24. Sparse reduced-rank regression (SRRR) in multiSnpnet package bridges them all 1. BASIL/snpnet (Lasso) – sparse PRS models 2. multi-PRS – linear combination of snpnet PRSs w1 (PRS1 ) + w2 (PRS2 ) + w3 (PRS3 ) + … 3. DeGAs-PRS – genetic component-based PRSs w1 (cPRS1 ) + w2 (cPRS2 ) + w3 (cPRS3 ) + … cPRS comes from tSVD of GWAS associations SRRR/multiSnpnet fits penalized multivariate multi-response model 24
  • 25. Sparse reduced-rank regression (SRRR) in multiSnpnet package bridges the two approaches 25 - One can show (1) and (2) are equivalent. Note: it’s NOT convex - Group lasso penalty - We select features that influence multiple responses (traits) - DeGAs (tSVD)-based approach offers interpretation Qian, Tanigawa, et al. Ann Appl Stat (in press). (1) (2) Junyang Qian
  • 26. multiSnpnet/SRRR applied on UK Biobank - Asthma & clinically related traits - Predictive performance improvements for asthma & basophil count - SVD of the coefficients offers interpretation 26 Qian, Tanigawa, et al. Ann Appl Stat (in press).
  • 27. Summary & future directions Summary - Polygenic risk score models (PRSs) compute genetic liability of diseases by aggregating effects across multiple genetic variants - Sparse snpnet PRS models have competitive performance - Multi-trait aware PRS can improve the predictive power Future direction & discussion - Integrate with fine-mapping, conservation, variant, gene annotation? - Incorporate (cell-type-specific) biological knowledge as prior - It may help improve the predictive performance / transferability? - Machine-learning-based PRS models - Non-linear combination of multiple traits - Incorporate biological priors 27
  • 28. Acknowledgements Dept. Biomedical Data Science - Matthew Aguirre - Manuel A. Rivas - the Rivas lab Dept. Statistics - Junyang Qian - Trevor Hastie - Rob Tibshirani Dept. Genetics, Stanford - Nasa Sinnott-Armstrong - Jonathan Pritchard University of Helsinki - Nina Mars - Samuli Ripatti 28 Funding supports: Nasa Sinnott-Armstrong Junyang Qian
  • 29. References - Sinnott-Armstrong*, Tanigawa*, et al. Nat Gen. (2021). (PMID: 33462484) - Genetics of 35 biomarkers, multi-PRS - Qian, Tanigawa, et al. PLoS Gen. (2020). (PMID: 33095761) - Batch screening iterative Lasso (BASIL) & R snpnet package - Qian, Tanigawa, et al. Ann Appl Stat. (in press). (doi: 10.1101/2020.05.30.125252) - Sparse reduced rank regression (SRRR) & R multiSnpnet package - Tanigawa, Li, et al. Nat Comm (2019). (PMID: 31492854) - DeGAs - decomposition of genetic associations - Aguirre, Tanigawa, et al. Eur J Hum Genet. (2021). (PMID: 33558700) - DeGAs-PRS (dPRS) - Tanigawa, Qian, et al. medRxiv (2021) (doi: 10.1101/2021.09.02.21262942) - Phenome-wide application of BASIL/snpnet 29
  • 30. 30
  • 31. multiSnpnet efficiently solves SRRR BASIL-like iterative procedure 31 3 steps per iteration 1. Screening 2. Fitting (SVD & group lasso) 3. KKT Check Qian, Tanigawa, et al. Ann Appl Stat (in press).
  • 32. Variant prioritization w/ predicted consequence does not help improve the performance - Lasso penalty factor. - Penalty factor = 0 → no regularization on the variable - Protein-truncating and known pathogenic variants = 0.5 - Protein-altering and known likely-pathogenic variants = 0.75 32 Tanigawa, Qian, et al. medRxiv (2021)
  • 33. Sex-specific genetic effects for testosterone 33 Emily Flynn Flynn, Tanigawa, et al. EJHG (2021).
  • 34. Improved genetic prediction of testosterone levels with sex-specific PRS models Sex-specific polygenic risk model for testosterone outperforms polygenic risk scores that combine males and females 34 Flynn, Tanigawa, et al. EJHG (2021).
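
As a recap of the scoring formulas on slides 3 and 14, the following is a minimal sketch, not the authors' implementation. It assumes a small dosage matrix G (individuals × variants) and per-variant effect sizes β, and computes PRS_i = Σ_j G_ij β_j, then a multi-PRS as the weighted sum w1 (PRS1) + w2 (PRS2). The function names, the toy genotypes, the effect sizes, and the weights are all hypothetical; in the talk the weights come from a Lasso fit, which is not reproduced here.

```python
import numpy as np


def polygenic_score(G, beta):
    """Return PRS_i = sum_j G[i, j] * beta[j] for every individual i."""
    G = np.asarray(G, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return G @ beta


def multi_prs(scores, weights):
    """Return w1*PRS1 + w2*PRS2 + ... per individual.

    scores  : array of shape (n_individuals, n_scores), one column per PRS
    weights : array of shape (n_scores,)
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return scores @ weights


# Toy genotype dosages (0, 1, or 2 copies of the effect allele);
# rows are individuals, columns are variants. Hypothetical values.
G = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1]])

# Hypothetical effect sizes for a disease PRS and one biomarker PRS.
beta_disease = np.array([0.10, -0.20, 0.05])
beta_biomarker = np.array([0.30, 0.00, -0.10])

prs_disease = polygenic_score(G, beta_disease)
prs_biomarker = polygenic_score(G, beta_biomarker)

# Hypothetical fixed weights standing in for Lasso-estimated ones.
weights = np.array([1.0, 0.5])
combined = multi_prs(np.column_stack([prs_disease, prs_biomarker]), weights)

print(prs_disease, prs_biomarker, combined)
```

Both steps are plain matrix-vector products, so the same two functions apply unchanged to any number of variants or component scores that fit in memory.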