Whole exome sequencing (WES|WXS) and its data analysis
Feb 28, 2023
Haibo Liu
Senior Bioinformatician
UMass Medical School, Worcester, MA
Email: haibol2017@gmail.com
Eukaryotic Exome
The human exome contains about 180,000 exons. These constitute about 1% of
the human genome (roughly 30–40 Mb).
Exome sequencing
• An NGS method that selectively sequences the exons (protein-coding
regions) of the genome.
• Provides a cost-effective alternative to WGS
• Produces a smaller, more manageable data set for faster, easier data
analysis (4–5 Gb WES vs ~90 Gb WGS)
• Identifies both somatic and germline variants
• Single Nucleotide Polymorphisms (SNPs)
• Small Insertions-Deletions (indels)
• Loss of Heterozygosity (LOH)
• Copy Number Variants (CNVs), structural variants (SVs)
• Microsatellite instability (MSI)
Performance of WES in clinical studies
Workflow of WES
Genotyping by Microarray, WES, and WGS
(Not updated; data analysis cost not included)
Experimental design of WES
• Tissue sampling
• Somatic mutations
• Tumor (tumor purity and freshness are critical)
• Normal tissue or blood sample
• Germline mutations
• Blood or any other tissue
• Sample size and sample population
• Cohort (disease vs. healthy)
• Trio, related family (non-carrier, carrier, and patient)
• Capture methods
• Sequencing strategies
• Platform, paired-end (PE) or single-end (SE), UMI, read length, sequencing depth
Rescuing DNA preparation from FFPE samples
Exome capture: Target-enrichment strategies
Array-based capture
https://en.wikipedia.org/wiki/Exome_sequencing
Capture toolkits
• Twist Exome 2.0 (Twist Bioscience)
• Nextera Rapid Capture Exomes (Illumina)
• xGen WES (IDT)
• SureSelect (Agilent)
• KAPA HyperExome (Roche)
• SeqCap (NimbleGen)
• …
UMIs for detecting low-frequency mutations in prenatal or cancer research
The Cell3™ Target library preparation behind our whole exome enrichment incorporates error suppression
technology. This includes unique molecular indexes (UMIs) and unique dual indexes (UDIs), to remove both
PCR and sequencing errors and index hopping events. This error suppression technique, combined with our
excellent uniformity of coverage, allows you to confidently and accurately call mutations down to 0.1% VAF
and enables generation of sequencing libraries from as little as 1 ng cfDNA input.
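The error-suppression idea behind UMIs can be illustrated with a minimal sketch. It assumes reads have already been grouped by alignment position and UMI, and that reads within a family have equal length; the function name, the tuple layout, and both thresholds are hypothetical and are not taken from any vendor's pipeline.

```python
from collections import Counter, defaultdict

def umi_consensus(reads, min_family_size=2, min_agreement=0.9):
    """Collapse reads sharing (position, UMI) into one consensus sequence.

    reads: iterable of (position, umi, sequence) tuples. Sequences in a
    family are assumed to be the same length (an assumption of this sketch).
    """
    families = defaultdict(list)
    for pos, umi, seq in reads:
        families[(pos, umi)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to out-vote an error
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Mask positions where the family disagrees too much.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus

reads = [
    (100, "ACGT", "TTGACC"),
    (100, "ACGT", "TTGACC"),
    (100, "ACGT", "TTGTCC"),  # one PCR/sequencing error at index 3
    (100, "GGCA", "TTGACC"),  # singleton family, dropped
]
relaxed = umi_consensus(reads, min_family_size=2, min_agreement=0.6)
strict = umi_consensus(reads)
print(relaxed)
print(strict)
```

With the relaxed threshold the lone error is out-voted; with the stricter default the disputed position is masked as N instead. Which behavior is preferable depends on family sizes and the target variant allele frequency.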
Comparison of different library preparation methods
Sequencing depth
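As a rough worked example, the data volumes quoted earlier (4–5 Gb for WES, ~90 Gb for WGS) can be converted to an approximate mean depth. The 3.1 Gb genome size and the 60% usable (on-target, non-duplicate) fraction are illustrative assumptions, not figures from the slides.

```python
def mean_depth(yield_bases, target_bases, usable_fraction=1.0):
    """Approximate mean depth = usable sequenced bases / target size."""
    return yield_bases * usable_fraction / target_bases

# WES: ~4.5 Gb of sequence over a ~40 Mb target, assuming 60% is usable.
wes_depth = mean_depth(4.5e9, 40e6, usable_fraction=0.6)
# WGS: ~90 Gb of sequence over an assumed ~3.1 Gb genome.
wgs_depth = mean_depth(90e9, 3.1e9)

print(f"WES ~{wes_depth:.0f}x, WGS ~{wgs_depth:.0f}x")
```

Under these assumptions the exome reaches a higher mean depth than the genome from a much smaller data set, which is the trade-off the comparison is pointing at.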
Quality control in WES
Raw data QC → BAM QC → Variant QC
Raw data QC
• QC tools
• FastQC/MultiQC
• NGS QC toolkit (https://github.com/mjain-lab/NGSQCToolkit)
• QC-chain (contamination detection)
• PRINSEQ
• QC3
• Important QC metrics
• Base quality
• Nucleotide distribution along cycles
• GC content distribution
• Duplication rate
• Adaptor content
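Two of the metrics listed above, mean base quality per cycle and GC content, can be computed directly from FASTQ records. The sketch below assumes four-line records and Phred+33 encoding; it is a simplified illustration and not how FastQC or the other listed tools are implemented.

```python
def fastq_records(lines):
    """Yield (sequence, quality string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual

def qc_summary(lines, phred_offset=33):
    """Return overall GC fraction and mean Phred quality at each cycle."""
    per_cycle_sum, per_cycle_n = [], []
    gc = total = 0
    for seq, qual in fastq_records(lines):
        gc += sum(seq.upper().count(b) for b in "GC")
        total += len(seq)
        for i, ch in enumerate(qual):
            if i == len(per_cycle_sum):
                per_cycle_sum.append(0)
                per_cycle_n.append(0)
            per_cycle_sum[i] += ord(ch) - phred_offset
            per_cycle_n[i] += 1
    mean_q = [s / n for s, n in zip(per_cycle_sum, per_cycle_n)]
    return {
        "gc_fraction": gc / total if total else 0.0,
        "mean_quality_per_cycle": mean_q,
    }

example = ["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "II##"]
summary = qc_summary(example)
print(summary)
```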
QC3
Read trimming
• Trimmomatic, cutadapt, fastp (auto adaptor detection), …
• Quality/adaptor trimming
• Don’t trim the 5’ end (duplicate marking, e.g. MarkDuplicates, relies on the 5’ alignment position)
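A minimal sketch of quality trimming that touches only the 3' end, in line with the advice above. It assumes Phred+33 qualities and a hypothetical threshold of Q20; it is a simplified stand-in and does not reproduce the algorithms used by Trimmomatic, cutadapt, or fastp.

```python
def trim_3prime(seq, qual, min_q=20, phred_offset=33):
    """Remove trailing bases with Phred score below min_q; keep the 5' end intact."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

trimmed = trim_3prime("ACGTACGT", "IIIII##!")
# A low-quality base at the 5' end is deliberately left in place.
untouched = trim_3prime("ACGT", "#III")
print(trimmed, untouched)
```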
From raw fastq to analysis-ready BAM
Aligner
• BWA-MEM
• Bowtie2, Novoalign, GMAP
Selection of reference genomes
• Completeness
• Decoy-containing genome (1000 Genomes analysis pipeline)
• EBV (human herpesvirus 4 type 1, accession NC_007605) and ~36 Mb of decoy sequences derived from HuRef, human BAC and fosmid clones, and NA12878
• T2T-CHM13v1.1, a complete telomere-to-telomere human reference genome
Quality control in WES
Raw data QC → BAM QC → Variant QC
BAM QC
• Important QC metrics
• % of reads that map to the reference
• % of reads that map to the baits
• Coverage depth distribution (target regions)
• Coverage unevenness & Cohort Coverage Sparseness
• Insert size distribution
• Duplicate rate
• Tools
• Alfred
• QC3
• Various Picard Collect*Metrics tools (e.g. CollectHsMetrics)
• covReport
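Several of the BAM-level metrics above reduce to simple summaries once per-base depths over the target regions are available. The sketch below assumes such depths have already been extracted into a list; the 20x threshold is illustrative, and the coefficient of variation is a generic unevenness measure, not the CCS or UE score.

```python
from statistics import mean, pstdev

def coverage_summary(depths, min_depth=20):
    """Summarize per-base read depth over the target regions."""
    avg = mean(depths)
    return {
        "mean_depth": avg,
        "fraction_ge_min": sum(d >= min_depth for d in depths) / len(depths),
        # Coefficient of variation: higher means less even coverage.
        "cv": pstdev(depths) / avg if avg else float("nan"),
    }

def on_target_rate(reads_on_target, reads_mapped):
    """Fraction of mapped reads that overlap the baits/targets."""
    return reads_on_target / reads_mapped

cov = coverage_summary([10, 30, 50, 30], min_depth=20)
rate = on_target_rate(700, 1000)
print(cov, rate)
```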
Cohort Coverage Sparseness (CCS) and Unevenness (UE) scores for a detailed assessment of the distribution of coverage of sequence reads
https://www.nature.com/articles/s41598-017-01005-x
Local and global non-uniformity of different capture toolkits
Differences from Capture toolkits
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4092227/
Exome probe design is one of the major culprits
• Most of the observed bias in modern WES stems from
mappability limitations of short reads and exome probe design
rather than sequence composition.
https://www.nature.com/articles/s41598-020-59026-y
Alfred QC metrics
https://academic.oup.com/bioinformatics/article/35/14/2489/5232224
Alignment Metric | DNA-Seq (WGS) | DNA-Seq (Capture) | RNA-Seq | ChIP-Seq/ATAC-Seq | Chart Type
Mapping Statistics | ✔ | ✔ | ✔ | ✔ | Table
Duplicate Statistics | ✔ | ✔ | ✔ | ✔ | Table
Sequencing Error Rates | ✔ | ✔ | ✔ | ✔ | Table
Base Content Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart
Read Length Distribution | ✔ | ✔ | ✔ | ✔ | Line Chart
Base Quality Distribution | ✔ | ✔ | ✔ | ✔ | Line Chart
Coverage Histogram | ✔ | ✔ | ✔ | ✔ | Line Chart
Insert Size Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart
InDel Size Distribution | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart
InDel Context | ✔ | ✔ | ✔ | ✔ | Bar Chart
GC Content | ✔ | ✔ | ✔ | ✔ | Grouped Line Chart
On-Target Rate | - | ✔ | - | - | Line Chart
Target Coverage Distribution | - | ✔ | - | - | Line Chart
TSS Enrichment | - | - | - | ✔ | Table
DNA pitch / Nucleosome pattern | - | - | - | ✔ | Grouped Line Chart
https://www.gear-genomics.com/docs/alfred/webapp/#features
CovReport
From BAM to VCF
GATK:
• Pad (slop) exon intervals by 200 bp
• Run the analysis separately for each chromosome
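The two GATK preparation steps above, padding exon intervals by 200 bp and splitting the work by chromosome, can be sketched as follows. The sketch assumes BED-style 0-based half-open intervals; the function names are hypothetical, and in practice this is usually done with interval tools rather than hand-written code.

```python
def pad_and_merge(intervals, pad=200, chrom_sizes=None):
    """Pad (chrom, start, end) intervals, clamp to chromosome bounds, merge overlaps."""
    chrom_sizes = chrom_sizes or {}
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - pad)
        new_end = end + pad
        if chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))

    merged = []
    for chrom, start, end in sorted(padded):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def split_by_chromosome(intervals):
    """Group intervals by chromosome so each group can be processed separately."""
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    return by_chrom

exons = [("chr1", 100, 300), ("chr1", 600, 800), ("chr2", 5000, 5100)]
padded = pad_and_merge(exons, pad=200, chrom_sizes={"chr2": 5200})
per_chrom = split_by_chromosome(padded)
print(padded)
print(per_chrom)
```

Merging after padding matters because neighboring exons often overlap once padded, and overlapping intervals would otherwise be processed twice.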
Variant callers
• GATK HaplotypeCaller (germline), GATK Mutect2 (somatic)
• FreeBayes, SAMtools, DeepVariant
• BreakSeq, LUMPY, Hydra, DELLY, CNVnator, Pindel
GATK Best Practices for population-based germline variant calling
GATK Mutect2 Best Practices for population-based somatic variant calling
Discrepancy of variants called by different callers
Integrated variant calling
• Integration of multiple tools’ results
• Isma (integrative somatic mutation analysis)
• Ensemble Machine learning method
• BAYSIC
• SomaticSeq
• NeoMutate
• SMuRF
(Bartha and Gyorffy, 2019)
(Nanni et al., 2019)
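The simplest form of integrating several callers' results is a vote: keep a variant only if enough callers report it. The sketch below shows that baseline under the assumption that each call set is a set of (chrom, pos, ref, alt) tuples; the caller names are placeholders, and the tools listed above use more elaborate Bayesian or machine-learning models rather than plain voting.

```python
from collections import Counter

def consensus_calls(callsets, min_callers=2):
    """Return variants supported by at least min_callers callers, with their support count.

    callsets: dict mapping caller name -> iterable of (chrom, pos, ref, alt).
    """
    support = Counter()
    for variants in callsets.values():
        support.update(set(variants))  # count each caller at most once per variant
    return {v: n for v, n in support.items() if n >= min_callers}

callsets = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "caller_b": {("chr1", 1000, "A", "G")},
    "caller_c": {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "GA")},
}
kept = consensus_calls(callsets)
print(kept)
```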
Quality control in WES
Raw data QC → BAM QC → Variant QC
Sample-level Variant QC
• Tools
• GATK CollectVariantCallingMetrics, VCFtools, PLINK/SEQ, QC3
• Important QC metrics
• Ti/Tv ratio, nonsynonymous/synonymous ratio, heterozygous/nonreference-homozygous (het/nonref-hom) ratio, mean depth
• Genotype missing rate
• Genotype concordance with related data (different platforms)
• Cross-sample DNA contamination (VerifyBamID)
• Identity-by-descent (IBD) analysis (PLINK) to detect related samples
• PCA (EIGENSTRAT) to detect population stratification (ancestry)
• Sex check (PLINK)
Ti/Tv ratio and het/nonref-hom ratio
• The Ti/Tv ratio varies greatly by genome region and
functionality, but not by ancestry.
• The het/nonref-hom ratio varies greatly by ancestry,
but not by genome regions and functionality.
• Extreme guanine + cytosine (GC) content (either high or low) is negatively associated with the magnitude of the Ti/Tv ratio.
• When performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region.
https://academic.oup.com/bioinformatics/article/31/3/318/2366248
Ti/Tv too low ==> high false-positive rate; too high ==> bias.
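Both ratios are straightforward to compute once SNVs and genotypes are in hand. The following is a minimal sketch that assumes SNVs are given as (ref, alt) base pairs and genotypes as pairs of allele indices (0 = reference); it does no VCF parsing, and the example data are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """Transitions are purine<->purine or pyrimidine<->pyrimidine changes."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES

def titv_ratio(snvs):
    """Ratio of transitions to transversions among (ref, alt) SNVs."""
    ti = sum(is_transition(r, a) for r, a in snvs)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")

def het_hom_ratio(genotypes):
    """Ratio of heterozygous to non-reference homozygous genotypes."""
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / hom_alt if hom_alt else float("inf")

snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
genotypes = [(0, 1), (1, 1), (0, 1), (0, 0), (0, 1), (1, 1)]
titv = titv_ratio(snvs)
het_hom = het_hom_ratio(genotypes)
print(titv, het_hom)
```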
Example report
Potential error sources in next-generation sequencing workflow
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1659-6
Origin of variant artifacts
• Artifacts introduced by sample/library preparation
• Low-quality base calls (read-end artifacts and other low-quality bases)
• Alignment artifacts
• Local misalignment near indels
• Erroneous alignments in low-complexity regions
• Paralogous alignments of reads not well represented in the reference
• Strand orientation bias artifacts (Strand Orientation Bias Detector (SOBDetector), Fisher score): https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-666
Artifacts (https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00791-w)
[Figure panels: low base quality, read end, strand bias, low-complexity misalignment, paralog misalignment]
Variant-level QC
• Important QC metrics
• Genotype missing rate
• Hardy-Weinberg equilibrium (HWE) p-value (use with caution)
• Mendelian error rate
• Allele balance of heterozygous calls
• Variant quality score (GATK): filter SNPs and indels separately (https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants)
• Hard filter
• QualByDepth (QD)
• FisherStrand (FS)
• StrandOddsRatio (SOR)
• RMSMappingQuality (MQ)
• MappingQualityRankSumTest (MQRankSum)
• ReadPosRankSumTest (ReadPosRankSum)
• Machine learning-based filtering: Variant Quality Score Recalibration (VQSR)
Example SNP hard filter (GATK3-style argument names; GATK4 spells them --filter-expression and --filter-name):
--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"
--filterName "my_snp_filter"
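The logic of the SNP filter expression above can be mirrored in a few lines of Python to make the thresholds explicit. This is an illustrative sketch operating on a plain dictionary of annotation values, not a replacement for GATK; skipping annotations that are absent from a record is an assumption of the sketch.

```python
def fails_snp_hard_filter(info):
    """Return True if any available annotation crosses its threshold.

    info: dict of annotation name -> numeric value for one SNP.
    Missing annotations are skipped (an assumption of this sketch).
    """
    checks = [
        ("QD", lambda v: v < 2.0),
        ("FS", lambda v: v > 60.0),
        ("MQ", lambda v: v < 40.0),
        ("MQRankSum", lambda v: v < -12.5),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    return any(name in info and test(info[name]) for name, test in checks)

good = {"QD": 15.2, "FS": 1.3, "MQ": 60.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
low_qd = dict(good, QD=1.5)
no_ranksums = {"QD": 10.0, "FS": 2.0, "MQ": 59.0}
print(fails_snp_hard_filter(good), fails_snp_hard_filter(low_qd))
```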
Variant-level filtering
• Tools
• GATK, VCFtools, PLINK/SEQ
• Sequencing data-based filtering
• Exclude potential artifacts
• Database-based filtering:
• Exclude known variants present in public SNP databases, published studies, or in-house databases, on the assumption that common variants represent harmless variation
• Pedigree-based filtering
• Each generation introduces up to 4.5 deleterious mutations, so a de novo mutation may well be causing the disease.
• Function-based filtering
• Caution: filtering risks removing the pathogenic variant
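Pedigree-based filtering in a trio can be sketched with two small checks: whether a child's genotype is consistent with Mendelian inheritance, and whether a variant is a de novo candidate. The sketch assumes diploid autosomal genotypes given as pairs of allele indices (0 = reference); real pipelines additionally weigh depth and genotype quality in all three samples.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child could have inherited one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def is_de_novo_candidate(child, father, mother):
    """True if the child carries a non-reference allele and both parents are homozygous reference."""
    return any(allele != 0 for allele in child) and father == (0, 0) and mother == (0, 0)

inherited = is_mendelian_consistent((0, 1), (0, 0), (0, 1))
violation = is_mendelian_consistent((1, 1), (0, 0), (0, 1))
de_novo = is_de_novo_candidate((0, 1), (0, 0), (0, 0))
print(inherited, violation, de_novo)
```

A de novo candidate is by construction also a Mendelian inconsistency, so apparent de novo calls should be checked against genotyping error before interpretation.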
Allelic balance
https://www.cureffi.org/2012/09/19/exome-sequencing-pipeline-using-gatk/
• slivar: filters on genotype quality, sequencing depth, allele balance, and
population allele frequency (https://github.com/brentp/slivar)
https://onlinelibrary.wiley.com/doi/full/10.1002/humu.23674
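Allele balance for a heterozygous call is the fraction of reads supporting the alternate allele, expected to be near 0.5. The sketch below combines it with depth and genotype-quality checks; all three thresholds are illustrative assumptions and are not claimed to be slivar's defaults.

```python
def allele_balance(ref_depth, alt_depth):
    """Fraction of reads supporting the alternate allele, or None with no reads."""
    total = ref_depth + alt_depth
    return alt_depth / total if total else None

def het_call_passes(ref_depth, alt_depth, gq,
                    min_depth=10, min_gq=20, ab_range=(0.2, 0.8)):
    """Apply depth, genotype-quality, and allele-balance checks to a het call."""
    ab = allele_balance(ref_depth, alt_depth)
    if ab is None:
        return False
    return (ref_depth + alt_depth >= min_depth
            and gq >= min_gq
            and ab_range[0] <= ab <= ab_range[1])

balanced = het_call_passes(14, 16, gq=99)
skewed = het_call_passes(27, 3, gq=99)
shallow = het_call_passes(3, 3, gq=99)
print(balanced, skewed, shallow)
```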
Variant annotation tools
VAT: annotation of variants by functionality in a cloud computing environment.
Variant annotation databases
Functional predictors/Prioritization tools
SnpSift http://pcingola.github.io/SnpEff/ss_introduction/
(Hintzsche et al., 2016)
VAAST https://github.com/Yandell-Lab/VVP-pub
VarSifter, VarSight
gNome, KGGseq
A recent AI tool for inferring the effects of missense mutations: AlphaMissense
(Cheng et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense.
Science, 19 Sep 2023, Vol. 381, Issue 6664, DOI: 10.1126/science.adg7492)
(Hintzsche et al., 2016)
Tools and resources for linking variants to therapeutics
Variant visualization tools
• VIVA, vcfR
• OncoPrint for visualizing cohort variants
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6895801/
Beyond variants
Summary
WES and its data analysis
WES data analysis pipelines
• DRAGEN (Illumina)
• https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/dragen-enrichment.html
• JWES
• Sentieon: a high-performance commercial solution (https://www.sentieon.com/products/)
• Improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 

WES Data Analysis and Clinical Insights

  • 1. Whole exome sequencing (WES|WXS) and its data analysis. Feb 28, 2023. Haibo Liu, Senior Bioinformatician, UMass Medical School, Worcester, MA. Email: haibol2017@gmail.com
  • 2. Eukaryotic exome: The human exome contains about 180,000 exons. These constitute about 1% of the human genome (~40 Mb).
  • 3. Exome sequencing
      • An NGS method that selectively sequences the exonic (mostly protein-coding) regions of the genome.
      • Provides a cost-effective alternative to WGS
      • Produces a smaller, more manageable data set for faster, easier data analysis (4–5 Gb WES vs ~90 Gb WGS)
      • Identifies both somatic and germline variants: Single Nucleotide Polymorphisms (SNPs); small insertions-deletions (indels); Loss of Heterozygosity (LOH); Copy Number Variants (CNVs) and structural variants (SVs); microsatellite stability
  • 4. Performance of WES in clinical studies
  • 6. Genotyping by Microarray, WES, and WGS (not updated, data analysis cost not included)
  • 7. Experimental design of WES
      • Tissue sampling
          • Somatic mutations: tumor (tumor purity and freshness are critical); normal tissue or blood sample
          • Germline mutations: blood or any other tissue
      • Sample size and sample population: cohort (disease vs. healthy); trio, related family (non-carrier, carrier, and patient)
      • Capture methods
      • Sequencing strategies: platform, PE|SE, UMI, read length, seq. depth
  • 8. Rescue of DNA preparation from FFPE samples
  • 9. Exome capture: Target-enrichment strategies
      • Array-based capture (https://en.wikipedia.org/wiki/Exome_sequencing)
      • Capture toolkits: Twist Exome 2.0 (Twist Bioscience), Nextera Rapid Capture Exomes (Illumina), xGen WES (IDT), SureSelect (Agilent), KAPA HyperExome (Roche), SeqCap (NimbleGen), …
  • 10. UMI for detecting low-frequency mutations for prenatal or cancer research
      • The Cell3™ Target library preparation behind our whole exome enrichment incorporates error suppression technology. This includes unique molecular indexes (UMIs) and unique dual indexes (UDIs), to remove both PCR and sequencing errors and index hopping events. This error suppression technique, combined with our excellent uniformity of coverage, allows you to confidently and accurately call mutations down to 0.1% VAF and enables generation of sequencing libraries from as little as 1 ng cfDNA input.
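The error-suppression idea on slide 10 can be illustrated with a minimal sketch. It assumes that reads sharing a mapping position and a UMI derive from one original molecule, so a per-base majority vote within each family removes isolated PCR or sequencing errors. The tuple layout and the function name `umi_consensus` are hypothetical, and the sketch ignores errors in the UMI itself, which production tools have to handle.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads into one consensus sequence per (position, UMI) family.

    `reads` is an iterable of (position, umi, sequence) tuples. Sequences in
    one family are assumed to have equal length (a simplification).
    """
    families = defaultdict(list)
    for pos, umi, seq in reads:
        families[(pos, umi)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        # Majority vote at each column of the family.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


example_reads = [
    (100, "ACGT", "TTGACA"),
    (100, "ACGT", "TTGACA"),
    (100, "ACGT", "TTGCCA"),  # one read carries an error at the 4th base
    (100, "GGTA", "TTGACA"),  # different UMI, so a different molecule
]
families = umi_consensus(example_reads)
```

A variant that appears in only one read of a family is voted out, while a variant present in the original molecule survives in every read of that family.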
  • 11. Comparison of different library preparation methods
  • 12. Comparison of different library preparation methods
  • 14. Quality control in WES: raw data QC, BAM QC, variant QC
  • 15. Raw data QC
      • QC tools: FastQC/MultiQC, NGS QC Toolkit (https://github.com/mjain-lab/NGSQCToolkit), QC-chain (contamination detection), PRINSEQ, QC3
      • Important QC metrics: base quality, nucleotide distribution along cycles, GC content distribution, duplication rate, adaptor content
  • 16. Read trimming
      • Trimmomatic, cutadapt, fastp (auto adaptor detection), …
      • Quality/adaptor trimming
      • Do not trim the 5’ end: duplicate marking (MarkDuplicates) relies on the 5’ alignment positions of reads
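Two of the raw-data metrics on slide 15, GC content and base quality along cycles, can be computed directly from FASTQ records. The following is a minimal sketch, not a replacement for FastQC; it assumes Phred+33 quality encoding, and the function name `fastq_qc` and the record layout are hypothetical.

```python
def fastq_qc(records, phred_offset=33):
    """Summarise GC fraction and mean base quality per sequencing cycle.

    `records` is an iterable of (sequence, quality_string) pairs.
    Phred+33 encoding is assumed by default.
    """
    total_bases = 0
    gc_bases = 0
    qual_sums = []    # sum of quality scores at each cycle
    qual_counts = []  # number of reads covering each cycle
    for seq, qual in records:
        total_bases += len(seq)
        gc_bases += sum(base in "GCgc" for base in seq)
        for cycle, char in enumerate(qual):
            if cycle == len(qual_sums):
                qual_sums.append(0)
                qual_counts.append(0)
            qual_sums[cycle] += ord(char) - phred_offset
            qual_counts[cycle] += 1
    return {
        "gc_fraction": gc_bases / total_bases if total_bases else 0.0,
        "mean_quality_per_cycle": [
            s / c for s, c in zip(qual_sums, qual_counts)
        ],
    }


example_records = [("ACGT", "IIII"), ("GGCC", "II5I")]
summary = fastq_qc(example_records)
```

A drop in `mean_quality_per_cycle` toward the last cycles is the typical pattern that motivates 3’ quality trimming.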
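The trimming advice on slide 16 (trim by quality, but leave the 5’ end alone) can be sketched as a simple trailing-quality trim. This is an illustration only, assuming Phred+33 encoding; the function name `trim_3prime` and the default threshold of 20 are hypothetical choices, and the listed tools use more elaborate algorithms.

```python
def trim_3prime(seq, qual, min_q=20, phred_offset=33):
    """Remove trailing bases whose quality is below `min_q`.

    Only the 3' end is trimmed, so the 5' start of the read, which
    duplicate marking depends on, is left untouched.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "IIII##")
```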
  • 17. From raw fastq to analysis-ready BAM
  • 19. Selection of reference genomes
      • Completeness
      • Decoyed genome (1000 Genomes analysis pipeline): EBV (herpesvirus 4 type 1, AC:NC_007605) and decoy sequences derived from HuRef, human BAC and fosmid clones, and NA12878 (~36 Mb)
      • T2T-CHM13v1.1, a complete (telomere-to-telomere) human reference genome
  • 20. Quality control in WES: raw data QC, BAM QC, variant QC
  • 21. BAM QC
      • Important QC metrics: % of reads that map to the reference; % of reads that map to the baits; coverage depth distribution (target regions); coverage unevenness and Cohort Coverage Sparseness; insert size distribution; duplicate rate
      • Tools: Alfred, QC3, various Picard CollectMetrics tools, covReport
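Several of the BAM QC metrics on slide 21 reduce to interval arithmetic. The sketch below computes an on-target read fraction, the mean depth over one target, and the fraction of target bases reaching a minimum depth. It assumes half-open coordinates on a single chromosome; the function name `target_coverage` and the input layout are hypothetical, and real tools work from the BAM and the bait/target interval files.

```python
def target_coverage(target, reads, min_depth=20):
    """Coverage summary for one target interval.

    `target` is a (start, end) pair and `reads` a list of (start, end)
    alignments, all half-open and on the same chromosome.
    """
    t_start, t_end = target
    depth = [0] * (t_end - t_start)
    on_target = 0
    for r_start, r_end in reads:
        lo, hi = max(r_start, t_start), min(r_end, t_end)
        if lo < hi:  # the read overlaps the target
            on_target += 1
            for pos in range(lo, hi):
                depth[pos - t_start] += 1
    n = len(depth)
    return {
        "on_target_fraction": on_target / len(reads) if reads else 0.0,
        "mean_depth": sum(depth) / n if n else 0.0,
        "fraction_at_min_depth": (
            sum(d >= min_depth for d in depth) / n if n else 0.0
        ),
    }


metrics = target_coverage(
    (100, 110), [(95, 105), (100, 110), (200, 210)], min_depth=2
)
```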
  • 22. Cohort Coverage Sparseness (CCS) and Unevenness (UE) Scores for a detailed assessment of the distribution of coverage of sequence reads https://www.nature.com/articles/s41598-017-01005-x
  • 23. Local and global non-uniformity of different capture toolkits
  • 25. Differences from Capture toolkits https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4092227/
  • 26. Exome probe design is one of the major culprits • Most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. https://www.nature.com/articles/s41598-020-59026-y
  • 27. Alfred QC metrics (https://academic.oup.com/bioinformatics/article/35/14/2489/5232224; https://www.gear-genomics.com/docs/alfred/webapp/#features)
      • Reported for all four assay types (DNA-Seq WGS, DNA-Seq capture, RNA-Seq, ChIP-Seq/ATAC-Seq), with chart type in parentheses: Mapping Statistics (table), Duplicate Statistics (table), Sequencing Error Rates (table), Base Content Distribution (grouped line chart), Read Length Distribution (line chart), Base Quality Distribution (line chart), Coverage Histogram (line chart), Insert Size Distribution (grouped line chart), InDel Size Distribution (grouped line chart), InDel Context (bar chart), GC Content (grouped line chart)
      • Reported for DNA-Seq capture only: On-Target Rate (line chart), Target Coverage Distribution (line chart)
      • Reported for ChIP-Seq/ATAC-Seq only: TSS Enrichment (table), DNA pitch / Nucleosome pattern (grouped line chart)
  • 29. From BAM to VCF (GATK)
      • Pad (slop) exon intervals by 200 bp
      • Run the analysis separately for each chromosome
  • 30. Variant callers
      • Small variants: Mutect2, HaplotypeCaller, FreeBayes/SAMtools, DeepVariant
      • Structural and copy-number variants: BreakSeq, LUMPY, Hydra, DELLY, CNVnator, Pindel
  • 31. GATK best practices for population-based germline variant calling
  • 32. GATK Mutect2 best practices for population-based somatic variant calling
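The two preparation steps on slide 29 (padding exon intervals by 200 bp and splitting the work by chromosome) can be sketched as follows. This is a minimal illustration assuming 0-based, half-open coordinates; the function name `pad_and_merge` and the toy chromosome sizes are hypothetical.

```python
from collections import defaultdict


def pad_and_merge(exons, chrom_sizes, pad=200):
    """Pad each exon by `pad` bp, clip to chromosome bounds, and merge
    overlapping intervals. Returns one interval list per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append(
            (max(0, start - pad), min(chrom_sizes[chrom], end + pad))
        )

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous one
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


example_exons = [
    ("chr1", 100, 300),
    ("chr1", 600, 800),
    ("chr1", 5000, 5100),
    ("chr2", 50, 150),
]
per_chrom = pad_and_merge(example_exons, {"chr1": 5200, "chr2": 1000})
```

Each entry of `per_chrom` can then be handed to an independent per-chromosome calling job.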
  • 33. Discrepancy of variants called by different callers
  • 34. Integrated variant calling (Bartha and Gyorffy 2019) (Nanni et al. 2019)
      • Integration of multiple tools’ results: Isma (integrative somatic mutation analysis)
      • Ensemble machine learning methods: BAYSIC, SomaticSeq, NeoMutate, SMuRF
  • 35. Quality control in WES: raw data QC, BAM QC, variant QC
  • 36. Sample-level variant QC
      • Tools: GATK CollectVariantCallingMetrics, VCFtools, PLINK/seq, QC3
      • Important QC metrics: Ti/Tv ratio, nonsynonymous/synonymous ratio, heterozygous/nonreference-homozygous (het/nonref-hom) ratio, mean depth
      • Genotype missing rate
      • Genotype concordance to related data (different platforms)
      • Cross-sample DNA contamination (VerifyBamID)
      • Identity-by-descent (IBD) analysis (PLINK): related samples
      • PCA (EIGENSTRAT): population stratum (ethnicity)
      • Sex check (PLINK)
  • 37. Ti/Tv ratio and het/nonref-hom ratio (https://academic.oup.com/bioinformatics/article/31/3/318/2366248)
      • The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry.
      • The het/nonref-hom ratio varies greatly by ancestry, but not by genome region and functionality.
      • Extreme guanine + cytosine content (either high or low) is negatively associated with the magnitude of the Ti/Tv ratio.
      • When performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region.
      • A Ti/Tv ratio that is too low suggests a high false-positive rate; one that is too high suggests bias.
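The simplest form of integrating several callers' results is a support-count vote, sketched below. This is not the algorithm used by any of the tools named on slide 34, which rely on statistical or machine learning models; the function name `consensus_calls`, the caller labels, and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def consensus_calls(callsets, min_support=2):
    """Keep variants reported by at least `min_support` callers.

    `callsets` maps a caller name to an iterable of variants, each given
    as a (chrom, pos, ref, alt) tuple.
    """
    support = Counter()
    for calls in callsets.values():
        support.update(set(calls))  # count each caller at most once
    return sorted(v for v, n in support.items() if n >= min_support)


v1 = ("chr1", 100, "A", "T")
v2 = ("chr1", 200, "G", "C")
v3 = ("chr2", 50, "C", "T")
agreed = consensus_calls(
    {"caller_a": [v1, v2], "caller_b": [v1, v3], "caller_c": [v1, v2]}
)
```

Raising `min_support` trades sensitivity for precision, which is one way to reason about the discrepancies between callers.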
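Both ratios on slide 37 are simple counts over a call set. The sketch below assumes biallelic SNVs given as (ref, alt) pairs and diploid genotypes given as pairs of allele indices (0 for the reference allele); the function names are hypothetical.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snvs):
    """Transition/transversion ratio for a list of (ref, alt) SNVs."""
    ti = sum((ref.upper(), alt.upper()) in TRANSITIONS for ref, alt in snvs)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


def het_to_nonref_hom(genotypes):
    """Ratio of heterozygous to non-reference homozygous genotypes.

    Any genotype with two different alleles counts as heterozygous.
    """
    het = sum(1 for a, b in genotypes if a != b)
    hom_alt = sum(1 for a, b in genotypes if a == b and a != 0)
    return het / hom_alt if hom_alt else float("inf")


titv = titv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")])
het_hom = het_to_nonref_hom([(0, 1), (0, 1), (1, 1), (0, 0)])
```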
  • 39. Potential error sources in next-generation sequencing workflow https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1659-6
  • 40. Origin of variant artifacts
      • Artifacts introduced by sample/library preparation
      • Low-quality base calls (read-end artifacts and other low-quality bases)
      • Alignment artifacts: local misalignment near indels; erroneous alignments in low-complexity regions; paralogous alignments of reads not well represented in the reference
      • Strand orientation bias artifacts (Strand Orientation Bias Detector (SOBDetector), Fisher score): https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-666
  • 41. Artifacts (https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00791-w): low base quality, read end, strand bias, low-complexity misalignment, paralog misalignment
  • 42. Variant-level QC
      • Important QC metrics: genotype missing rate; Hardy-Weinberg equilibrium p-value (use with caution); Mendelian error rate; allele balance of heterozygous calls
      • Variant quality score (GATK): filter SNPs and indels separately (https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants)
      • Hard filter annotations: QualByDepth (QD), FisherStrand (FS), StrandOddsRatio (SOR), RMSMappingQuality (MQ), MappingQualityRankSumTest (MQRankSum), ReadPosRankSumTest (ReadPosRankSum)
      • Example SNP hard filter: --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "my_snp_filter"
      • Machine learning-based filtering: Variant Quality Score Recalibration (VQSR)
  • 43. Variant-level filtering
      • Tools: GATK, VCFtools, PLINK/Seq
      • Sequencing data-based filtering: exclude potential artifacts
      • Database-based filtering: exclude known variants that are present in public SNP databases, published studies, or in-house databases, on the assumption that common variants are harmless
      • Pedigree-based filtering: each generation introduces up to 4.5 deleterious mutations, so a de novo mutation may well be causing the disease
      • Function-based filtering
      • Caution: filtering risks removing the pathogenic variant
  • 45. Allelic balance
      • slivar: genotype quality, sequencing depth, allele balance, and population allele frequency (https://github.com/brentp/slivar; https://onlinelibrary.wiley.com/doi/full/10.1002/humu.23674)
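The SNP hard-filter expression on slide 42 is a disjunction of per-annotation thresholds, which can be mirrored in a few lines. The sketch below reuses the thresholds from the slide; the function name `failed_filters` and the plain-dict representation of the INFO field are hypothetical, and treating a missing annotation as "not failing" is an assumption of this sketch rather than a statement about GATK's behaviour.

```python
# Thresholds copied from the slide's --filterExpression for SNPs.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(info):
    """Return the names of the annotations that trip a threshold.

    `info` maps annotation names to numeric values. Annotations absent
    from `info` are skipped (assumption: missing means not failing).
    """
    return [
        name
        for name, fails in SNP_FILTERS.items()
        if name in info and fails(info[name])
    ]


low_qd = failed_filters({"QD": 1.5, "FS": 10.0, "MQ": 60.0})
clean = failed_filters({"QD": 25.0, "FS": 3.2, "MQ": 60.0, "MQRankSum": 0.1})
```

A variant with a non-empty result would be tagged with the filter name (here "my_snp_filter") rather than removed, so the decision stays auditable.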
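Pedigree-based filtering for de novo candidates in a trio can be sketched as a genotype comparison: the child carries an alternate allele that neither parent carries. This is a simplified illustration that ignores genotype quality, depth, and sex chromosomes; the function name `is_de_novo_candidate` and the genotype encoding (tuples of allele indices, 0 for reference) are hypothetical.

```python
def is_de_novo_candidate(child, mother, father):
    """True if the child has a non-reference allele absent from both parents.

    Genotypes are tuples of allele indices, for example (0, 1).
    """
    child_alts = {allele for allele in child if allele != 0}
    parental = set(mother) | set(father)
    return bool(child_alts - parental)


de_novo = is_de_novo_candidate((0, 1), (0, 0), (0, 0))
inherited = is_de_novo_candidate((0, 1), (0, 1), (0, 0))
```

Candidates found this way still need the sequencing data-based checks listed on the slide, since an apparent de novo call is often a missed parental call or an artifact.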
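Allele balance is the fraction of reads at a site that support the alternate allele, and heterozygous calls are expected to sit near 0.5. The sketch below shows the calculation with illustrative thresholds; the function names and the default values for `low`, `high`, and `min_depth` are hypothetical and are not slivar's defaults.

```python
def allele_balance(ref_depth, alt_depth):
    """Fraction of reads supporting the alternate allele, or None at zero depth."""
    total = ref_depth + alt_depth
    return alt_depth / total if total else None


def het_call_ok(ref_depth, alt_depth, low=0.2, high=0.8, min_depth=10):
    """Accept a heterozygous call only if depth and allele balance are plausible."""
    if ref_depth + alt_depth < min_depth:
        return False
    return low <= allele_balance(ref_depth, alt_depth) <= high


balanced = allele_balance(10, 10)
skewed_ok = het_call_ok(18, 2)
```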
  • 46. Variant annotation tools: VAT (annotation of variants by functionality in a cloud computing environment)
  • 48. Functional predictors/prioritization tools (Hintzsche et al., 2016): SnpSift (http://pcingola.github.io/SnpEff/ss_introduction/), VAAST (https://github.com/Yandell-Lab/VVP-pub), VarSifter, VarSight, gNome, KGGseq
  • 49. Latest, advanced AI tool for inferring the effect of missense mutations: AlphaMissense (Cheng et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science, 19 Sep 2023, Vol 381, Issue 6664, DOI: 10.1126/science.adg7492) (Hintzsche et al., 2016)
  • 50. Tools and resources for linking variants to therapeutics
  • 52. Oncoprint for visualizing cohort variants
  • 55. WES and its data analysis
  • 57. WES data analysis pipelines
      • DRAGEN (Illumina): https://www.illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/dragen-enrichment.html
      • JWES
  • 58. WES data analysis pipelines
      • Sentieon: a high-performance commercial solution (https://www.sentieon.com/products/)
      • Improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system