SlideShare a Scribd company logo
1 of 30
Rare Variant Analysis
Workflows:
Analyzing NGS Data in
Large Cohorts
Nov 13, 2013
Bryce Christensen
Statistical Geneticist / Director of Services
Rare Variant Analysis
Workflows:
Analyzing NGS Data in
Large Cohorts
Nov 13, 2013
Bryce Christensen
Statistical Geneticist / Director of Services
Use the Questions pane in
your GoToWebinar window
Questions during
the presentation
Golden Helix
Leaders in Genetic Analytics
 Founded in 1998
 Multi-disciplinary: computer science,
bioinformatics, statistics, genetics
 Software and analytic services
About Golden Helix
GenomeBrowse
 Free sequencing visualization tool
 Launched in 2011
 Makes the process of exploring DNA-
seq and RNA-seq pile-up and
coverage data intuitive and powerful
 Stream public annotations via the
cloud
 Use it to validate variant calls, trio
exploration, de Novo discovery, and
more
Core
Features
Packages
Core Features
 Powerful Data Management
 Rich Visualizations
 Robust Statistics
 Flexible
 Easy-to-use
Applications
 Genotype Analysis
 DNA sequence analysis
 CNV Analysis
 RNA-seq differential expression
 Family Based Association
SNP & Variation Suite (SVS)
Merging of Two Great Products
Previous Webcast Recording Available
Agenda
Brief review of upstream and QC considerations
NGS workflow design in SVS
2
3
4
Overview of RV analysis approaches
Define the problem: What is rare variant (RV) analysis?1
Interactive software demo5
What about exome chips?6
GenomeBrowse SVS 8: Exploratory tools, Analysis workflows
The Problem
 Array-based GWAS has been the primary technology for gene-
finding research for most of the past decade
- Common variant – common disease hypothesis
 NGS technology, particularly whole-exome sequencing, makes it
possible to include rare variants (RVs) in the analysis
 Individual RVs lack statistical power for standard GWAS
approaches
- How do we utilize that information?
 Proposed solution: combine RVs into logical groups and analyze
them as a single unit
- AKA “Collapsing” or “Burden” tests.
From the Vault:
January 2011 Slide on RV Analysis
What have we learned
since then?
NGS Analysis
Primary
Analysis
Secondary
Analysis
Tertiary
Analysis
“Sense Making”
 Analysis of hardware generated data, on-machine real-time stats.
 Production of sequence reads and quality scores
 Typical product is “FASTQ” file
 Recalibrating, de-duplication, QA and clipping/filtering reads
 Alignment/Assembly of reads
 Variant calling on aligned reads
 Typical products are “BAM” and/or “VCF” files
 QA and filtering of variant calls
 Annotation and filtering of variants
 Multi-sample integration
 Visualization of variants in genomic context
 Experiment-specific inheritance/population analysis
 “Small-N” and “Large-N” approaches
NGS Analysis
Primary
Analysis
Secondary
Analysis
Tertiary
Analysis
“Sense Making”
 Analysis of hardware generated data, on-machine real-time stats.
 Production of sequence reads and quality scores
 Typical product is “FASTQ” file
 Recalibrating, de-duplication, QA and clipping/filtering reads
 Alignment/Assembly of reads
 Variant calling on aligned reads
 Typical products are “BAM” and/or “VCF” files
 QA and filtering of variant calls
 Annotation and filtering of variants
 Multi-sample integration
 Visualization of variants in genomic context
 Experiment-specific inheritance/population analysis
 “Small-N” and “Large-N” approaches
Most Importantly: Be Consistent!
Gholson Lyon, 2012
Things That Can Confound Your Experiment
Library preparation errors Sequencing errors Analysis errors
 PCR amplification point
mutations (e.g. TruSeq
protocol, amplicons)
 Emulsion PCR
amplification point
mutations (454, Ion
Torrent and SOLiD)
 Bridge amplification errors
(Illumina)
 Chimera generation
(particularly during
amplicon protocols)
 Sample contamination
 Amplification errors
associated with high or low
GC content
 PCR duplicates
 Base miscalls due to low
signal
 InDel errors (particular
PacBio)
 Short homopolymer
associated InDels (Ion
Torrent PGM)
 Post-homopolymeric tract
SNPs (Illumina) and/or
read-through problems
 Associated with inverted
repeats (Illumina)
 Specific motifs particularly
with older Illumina
chemistry
 Calling variants without
sufficient reads mapping
 Bad mapping (incorrectly
placed read)
 Correctly placed read
but InDels misaligned
 Multi-mapping to
paralogous regions
 Sequence contamination
e.g. adaptors
 Error in reference
sequence
 Alignment to ends of
contigs in draft assemblies
 Incorrect trimming of
reads, aligning adaptors
 Inclusion of PCR
duplicates
Nick Loman: Sequencing data: I want the truth! (You can’t handle the truth!)
Qual et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent,
Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul
NGS Analysis
Primary
Analysis
Secondary
Analysis
Tertiary
Analysis
“Sense Making”
 Analysis of hardware generated data, on-machine real-time stats.
 Production of sequence reads and quality scores
 Typical product is “FASTQ” file
 Recalibrating, de-duplication, QA and clipping/filtering reads
 Alignment/Assembly of reads
 Variant calling on aligned reads
 Typical products are “BAM” and/or “VCF” files
 QA and filtering of variant calls
 Annotation and filtering of variants
 Multi-sample integration
 Visualization of variants in genomic context
 Experiment-specific inheritance/population analysis
 “Small-N” and “Large-N” approaches
Two Primary Approaches
 Direct search for susceptibility variants
- Assume highly penetrant variant and/or Mendelian disease
- Extensive reliance on bioinformatics for variant annotation and filtering
- Sample sizes usually small—from single case up to nuclear families
 Rare Variant (RV) “collapsing” methods
- More common in complex disease research
- May require very large sample sizes!
- Assume that any of several LOF variants in a susceptibility gene may lead to
same disease or trait
- Many statistical tests available
- Also relies heavily on bioinformatics
Families of Collapsing Tests
 Burden Tests
- Combine minor alleles across multiple variant sites…
- Without weighting (CMC, CAST, CMAT)
- With fixed weights based on allele frequency (WSS, RWAS)
- With data-adaptive weights (Lin/Tang, KBAC)
- With data-adaptive thresholds (Step-Up, VT)
- With extensions to allow for effects in either direction (Ionita-Laza/Lange, C-alpha)
 Kernel Tests
- Allow for individual variant effects in either direction and permit covariate
adjustment based on kernel regression
- Kwee et al., AJHG, 2008
- SKAT
- SKAT-O
Credit: Schaid et al., Genet Epi, 2013
CMC: Combined Multivariate and Collapsing
 Multivariate test: simultaneous test for association of common
and rare variants in gene
 Flexibility in variant frequency bin definition
 Testing methods include Hotelling T2 and Regression
 Regression method allows for covariate correction
 Li and Leal, AJHG, 2008
KBAC: Kernel Based Adaptive Clustering
 Per-gene tests models the risk associated with multi-locus
genotypes at a per-gene level
 Adaptive weighting procedure that gives higher weights to
genotypes with higher sample risks
- Meant to attain good balance between classification accuracy and the number of
estimated parameters
 SVS implementation includes option for 1- or 2-tailed test
- But most powerful when all variants in gene have unidirectional effect
 Permutation testing or regression options
- Regression allows for covariate correction
 Liu and Leal, PLoS Genetics, 2010
NGS Analysis Workflow Development in SVS
 SVS is very flexible in workflow design.
 SVS includes a broad range of tools for data manipulation and
variant annotation and visualization that can be used together to
guide us on an interactive exploration of the data.
 We begin by defining the final goal and the steps needed to help us
reach that goal:
- Are we looking for a very rare, non-synonymous variant that causes a dominant
Mendelian trait?
- Are we looking for a gene with excess rare variation in cases vs controls?
 Once we know what we are looking for, we can identify the
available annotation sources that will help us answer the question.
Python Integration in SVS
 Allows rapid
development and
iteration of new
functions
 API access to most
SVS functions
 Access to extensive
Python analytic
libraries
 Fully documented in
manual
SVS Online Scripts Repository
 Downloadable add-on
functions for a variety of
analysis and data
management tasks
 “Plug-and-play”
 Some contributed by
customers
 Popular scripts often get
adopted into the “shipped”
version of SVS.
 Scripts in repository are
forward compatible to
SVS 8.0
Today’s Featured Scripts
 Activate Variants by Genotype Count Threshold
- Identify variants that occur with a specified frequency in one or several groups
 Filter by Marker Map Field
- Variant-level “INFO” fields from VCF files are imported to the SVS marker map
- This script allows you to filter markers based on those variables
 Many more useful scripts to take a look at:
- Add Annotation Data to Marker Map from Spreadsheet
- Nonparametric association tests
- Import Unsorted VCF Files
- Build Variant Spreadsheet
- Many, many more
Interactive Demonstration
 GenomeBrowse
- Exploring multi-sample VCF files in our free genome viewer software
 SVS 8.0
- Exploratory analysis workflow
- Using downloaded scripts
- Using basic analysis tools to create advanced workflows
- Simulate the development of a burden test
- RV association testing workflow
- KBAC
- CMC
- Data visualization
SVS Demo
What about Exome Chips?
 Exome chips CAN be used
with RV association tests
 Exome chips include both
common and rare variants
 Remember: Exome chips
don’t capture all rare variants.
 Exome chips are thus less
powerful than WES for RV
associations, but also
significantly cheaper.
A Note about Exome Chips
 Exome chips are not GWAS chips
- GWAS chips focus on common SNPs, have uniform spacing, minimal LD and
are designed to capture population variability
- Exome chips include rare variants and the content is anything but uniform
 Most GWAS functions can be used with exome chips, but
require some workflow adjustments
- Gender checking
- IBD estimation
- Principal components
 Not unlike other chips with custom/targeted content
- Cardio-MetaboChip
- ImmunoChip
Questions or
more info:
 info@goldenhelix.com
 Request a copy of SVS at
www.goldenhelix.com
 Download GenomeBrowse
for free at
www.GenomeBrowse.com
Use the Questions pane in
your GoToWebinar window
Any Questions?

More Related Content

What's hot

A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...Jan Aerts
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Li Shen
 
LUGM-Update of the Illumina Analysis Pipeline
LUGM-Update of the Illumina Analysis PipelineLUGM-Update of the Illumina Analysis Pipeline
LUGM-Update of the Illumina Analysis PipelineHai-Wei Yen
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizeAnn Loraine
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
rnaseq_from_babelomics
rnaseq_from_babelomicsrnaseq_from_babelomics
rnaseq_from_babelomicsFrancisco Garc
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Prof. Wim Van Criekinge
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqTimothy Tickle
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingmikaelhuss
 
Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015Torsten Seemann
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012Dan Gaston
 
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2QIAGEN
 
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...QIAGEN
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platformsAllSeq
 
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then someRNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then somebasepairtech
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Gunnar Rätsch
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analysesrjorton
 

What's hot (20)

A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
 
ChIP-seq Theory
ChIP-seq TheoryChIP-seq Theory
ChIP-seq Theory
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015
 
LUGM-Update of the Illumina Analysis Pipeline
LUGM-Update of the Illumina Analysis PipelineLUGM-Update of the Illumina Analysis Pipeline
LUGM-Update of the Illumina Analysis Pipeline
 
wings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualizewings2014 Workshop 1 Design, sequence, align, count, visualize
wings2014 Workshop 1 Design, sequence, align, count, visualize
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
rnaseq_from_babelomics
rnaseq_from_babelomicsrnaseq_from_babelomics
rnaseq_from_babelomics
 
ChipSeq Data Analysis
ChipSeq Data AnalysisChipSeq Data Analysis
ChipSeq Data Analysis
 
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
Galaxy dna-seq-variant calling-presentationandpractical_gent_april-2016
 
Introduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seqIntroduction to Single-cell RNA-seq
Introduction to Single-cell RNA-seq
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processing
 
Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015Long read sequencing - LSCC lab talk - fri 5 june 2015
Long read sequencing - LSCC lab talk - fri 5 june 2015
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2
Digital RNAseq for Gene Expression Profiling: Digital RNAseq Webinar Part 2
 
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W...
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platforms
 
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then someRNA-Seq Analysis: Everything You Always Wanted to Know...and then some
RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analyses
 

Similar to Rare Variant Analysis Workflows: Analyzing NGS Data in Large Cohorts

Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsDelaina Hawkins
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsGolden Helix Inc
 
Population-Based DNA Variant Analysis
Population-Based DNA Variant AnalysisPopulation-Based DNA Variant Analysis
Population-Based DNA Variant AnalysisGolden Helix
 
Production Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionProduction Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionChris Dwan
 
Cassava genome hub
Cassava genome hubCassava genome hub
Cassava genome hubCIAT
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSGolden Helix Inc
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014Anil Thanki
 
Prediction and Meta-Analysis
Prediction and Meta-AnalysisPrediction and Meta-Analysis
Prediction and Meta-AnalysisGolden Helix
 
Prediction and Meta-Analysis
Prediction and Meta-AnalysisPrediction and Meta-Analysis
Prediction and Meta-AnalysisGolden Helix Inc
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Two Clinical Workflows - From Unfiltered Variants to a Clinical Report
Two Clinical Workflows - From Unfiltered Variants to a Clinical ReportTwo Clinical Workflows - From Unfiltered Variants to a Clinical Report
Two Clinical Workflows - From Unfiltered Variants to a Clinical ReportGolden Helix Inc
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious DiseaseJoão André Carriço
 
Next generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic TechnologyNext generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic TechnologyGenotypic Technology
 
VS-CNV Annotations from the User's Perspective
VS-CNV Annotations from the User's PerspectiveVS-CNV Annotations from the User's Perspective
VS-CNV Annotations from the User's PerspectiveGolden Helix
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsGolden Helix Inc
 

Similar to Rare Variant Analysis Workflows: Analyzing NGS Data in Large Cohorts (20)

Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research Workflows
 
Using VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research WorkflowsUsing VarSeq to Improve Variant Analysis Research Workflows
Using VarSeq to Improve Variant Analysis Research Workflows
 
Population-Based DNA Variant Analysis
Population-Based DNA Variant AnalysisPopulation-Based DNA Variant Analysis
Population-Based DNA Variant Analysis
 
Production Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionProduction Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on Production
 
Cassava genome hub
Cassava genome hubCassava genome hub
Cassava genome hub
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
 
TGAC Browser bosc 2014
TGAC Browser bosc 2014TGAC Browser bosc 2014
TGAC Browser bosc 2014
 
Prediction and Meta-Analysis
Prediction and Meta-AnalysisPrediction and Meta-Analysis
Prediction and Meta-Analysis
 
Prediction and Meta-Analysis
Prediction and Meta-AnalysisPrediction and Meta-Analysis
Prediction and Meta-Analysis
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Two Clinical Workflows
Two Clinical WorkflowsTwo Clinical Workflows
Two Clinical Workflows
 
Two Clinical Workflows - From Unfiltered Variants to a Clinical Report
Two Clinical Workflows - From Unfiltered Variants to a Clinical ReportTwo Clinical Workflows - From Unfiltered Variants to a Clinical Report
Two Clinical Workflows - From Unfiltered Variants to a Clinical Report
 
Computational Resources In Infectious Disease
Computational Resources In Infectious DiseaseComputational Resources In Infectious Disease
Computational Resources In Infectious Disease
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
 
Next generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic TechnologyNext generation sequencing & microarray-- Genotypic Technology
Next generation sequencing & microarray-- Genotypic Technology
 
2015_CV_J_SHELTON_linked
2015_CV_J_SHELTON_linked2015_CV_J_SHELTON_linked
2015_CV_J_SHELTON_linked
 
Folker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data AnnotationFolker Meyer: Metagenomic Data Annotation
Folker Meyer: Metagenomic Data Annotation
 
VS-CNV Annotations from the User's Perspective
VS-CNV Annotations from the User's PerspectiveVS-CNV Annotations from the User's Perspective
VS-CNV Annotations from the User's Perspective
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and Variants
 

More from Golden Helix Inc

Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...
Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...
Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...Golden Helix Inc
 
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...Golden Helix Inc
 
Authoring Clinical Reports in VarSeq
Authoring Clinical Reports in VarSeqAuthoring Clinical Reports in VarSeq
Authoring Clinical Reports in VarSeqGolden Helix Inc
 
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...Golden Helix Inc
 
The Molecular Sciences Made Personal
The Molecular Sciences Made PersonalThe Molecular Sciences Made Personal
The Molecular Sciences Made PersonalGolden Helix Inc
 
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeqIntroducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeqGolden Helix Inc
 
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...Golden Helix Inc
 
Cancer Workflows in VarSeq
Cancer Workflows in VarSeqCancer Workflows in VarSeq
Cancer Workflows in VarSeqGolden Helix Inc
 
Getting Started with VSWarehouse - The User Experience
Getting Started with VSWarehouse - The User ExperienceGetting Started with VSWarehouse - The User Experience
Getting Started with VSWarehouse - The User ExperienceGolden Helix Inc
 
Pharmacogenomic Prediction of Antracycline-induced Cardiotoxicity
Pharmacogenomic Prediction of Antracycline-induced CardiotoxicityPharmacogenomic Prediction of Antracycline-induced Cardiotoxicity
Pharmacogenomic Prediction of Antracycline-induced CardiotoxicityGolden Helix Inc
 
Using WES in Distant Relationships to Identify Cardiomyopathy Genes
Using WES in Distant Relationships to Identify Cardiomyopathy GenesUsing WES in Distant Relationships to Identify Cardiomyopathy Genes
Using WES in Distant Relationships to Identify Cardiomyopathy GenesGolden Helix Inc
 
Using Clinical Reports as a part of a Gene Panel Pipeline
Using Clinical Reports as a part of a Gene Panel PipelineUsing Clinical Reports as a part of a Gene Panel Pipeline
Using Clinical Reports as a part of a Gene Panel PipelineGolden Helix Inc
 
Investigating Shared Additive Genetic Variation for Alcohol Dependence
Investigating Shared Additive Genetic Variation for Alcohol DependenceInvestigating Shared Additive Genetic Variation for Alcohol Dependence
Investigating Shared Additive Genetic Variation for Alcohol DependenceGolden Helix Inc
 
Personalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor SequencingPersonalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor SequencingGolden Helix Inc
 
Clinical Reporting Made Easy
Clinical Reporting Made EasyClinical Reporting Made Easy
Clinical Reporting Made EasyGolden Helix Inc
 
Population Structure & Genetic Improvement in Livestock
Population Structure & Genetic Improvement in LivestockPopulation Structure & Genetic Improvement in Livestock
Population Structure & Genetic Improvement in LivestockGolden Helix Inc
 

More from Golden Helix Inc (20)

Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...
Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...
Pharmacological Induction of FoxO3 is a Potential Treatment for Sickle Cell D...
 
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...
Uncovering novel candidate genes for pyridoxine-responsive epilepsy in a cons...
 
Authoring Clinical Reports in VarSeq
Authoring Clinical Reports in VarSeqAuthoring Clinical Reports in VarSeq
Authoring Clinical Reports in VarSeq
 
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...
SETBP1 as a novel candidate gene for neurodevelopmental disorders of speech a...
 
The Molecular Sciences Made Personal
The Molecular Sciences Made PersonalThe Molecular Sciences Made Personal
The Molecular Sciences Made Personal
 
A Walk Through GWAS
A Walk Through GWASA Walk Through GWAS
A Walk Through GWAS
 
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeqIntroducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq
 
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...
MM-KBAC – Using Mixed Models to Adjust for Population Structure in a Rare-var...
 
Cancer Workflows in VarSeq
Cancer Workflows in VarSeqCancer Workflows in VarSeq
Cancer Workflows in VarSeq
 
Getting Started with VSWarehouse - The User Experience
Getting Started with VSWarehouse - The User ExperienceGetting Started with VSWarehouse - The User Experience
Getting Started with VSWarehouse - The User Experience
 
Pharmacogenomic Prediction of Antracycline-induced Cardiotoxicity
Pharmacogenomic Prediction of Antracycline-induced CardiotoxicityPharmacogenomic Prediction of Antracycline-induced Cardiotoxicity
Pharmacogenomic Prediction of Antracycline-induced Cardiotoxicity
 
Custom Family Workflows
Custom Family WorkflowsCustom Family Workflows
Custom Family Workflows
 
Using WES in Distant Relationships to Identify Cardiomyopathy Genes
Using WES in Distant Relationships to Identify Cardiomyopathy GenesUsing WES in Distant Relationships to Identify Cardiomyopathy Genes
Using WES in Distant Relationships to Identify Cardiomyopathy Genes
 
Using Clinical Reports as a part of a Gene Panel Pipeline
Using Clinical Reports as a part of a Gene Panel PipelineUsing Clinical Reports as a part of a Gene Panel Pipeline
Using Clinical Reports as a part of a Gene Panel Pipeline
 
Investigating Shared Additive Genetic Variation for Alcohol Dependence
Investigating Shared Additive Genetic Variation for Alcohol DependenceInvestigating Shared Additive Genetic Variation for Alcohol Dependence
Investigating Shared Additive Genetic Variation for Alcohol Dependence
 
Personalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor SequencingPersonalized Medicine through Tumor Sequencing
Personalized Medicine through Tumor Sequencing
 
CNV Analysis in VarSeq
CNV Analysis in VarSeqCNV Analysis in VarSeq
CNV Analysis in VarSeq
 
Beagle Imputation in SVS
Beagle Imputation in SVSBeagle Imputation in SVS
Beagle Imputation in SVS
 
Clinical Reporting Made Easy
Clinical Reporting Made EasyClinical Reporting Made Easy
Clinical Reporting Made Easy
 
Population Structure & Genetic Improvement in Livestock
Population Structure & Genetic Improvement in LivestockPopulation Structure & Genetic Improvement in Livestock
Population Structure & Genetic Improvement in Livestock
 

Recently uploaded

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicAditi Jain
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 

Recently uploaded (20)

User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 

Rare Variant Analysis Workflows: Analyzing NGS Data in Large Cohorts

  • 1. Rare Variant Analysis Workflows: Analyzing NGS Data in Large Cohorts Nov 13, 2013 Bryce Christensen Statistical Geneticist / Director of Services
  • 2. Rare Variant Analysis Workflows: Analyzing NGS Data in Large Cohorts Nov 13, 2013 Bryce Christensen Statistical Geneticist / Director of Services
  • 3. Use the Questions pane in your GoToWebinar window Questions during the presentation
  • 4. Golden Helix Leaders in Genetic Analytics  Founded in 1998  Multi-disciplinary: computer science, bioinformatics, statistics, genetics  Software and analytic services About Golden Helix
  • 5. GenomeBrowse  Free sequencing visualization tool  Launched in 2011  Makes the process of exploring DNA- seq and RNA-seq pile-up and coverage data intuitive and powerful  Stream public annotations via the cloud  Use it to validate variant calls, trio exploration, de Novo discovery, and more
  • 6. Core Features Packages Core Features  Powerful Data Management  Rich Visualizations  Robust Statistics  Flexible  Easy-to-use Applications  Genotype Analysis  DNA sequence analysis  CNV Analysis  RNA-seq differential expression  Family Based Association SNP & Variation Suite (SVS)
  • 7. Merging of Two Great Products
  • 9. Agenda Brief review of upstream and QC considerations NGS workflow design in SVS 2 3 4 Overview of RV analysis approaches Define the problem: What is rare variant (RV) analysis?1 Interactive software demo5 What about exome chips?6 GenomeBrowse SVS 8: Exploratory tools, Analysis workflows
  • 10. The Problem  Array-based GWAS has been the primary technology for gene- finding research for most of the past decade - Common variant – common disease hypothesis  NGS technology, particularly whole-exome sequencing, makes it possible to include rare variants (RVs) in the analysis  Individual RVs lack statistical power for standard GWAS approaches - How do we utilize that information?  Proposed solution: combine RVs into logical groups and analyze them as a single unit - AKA “Collapsing” or “Burden” tests.
  • 11. From the Vault: January 2011 Slide on RV Analysis What have we learned since then?
  • 12. NGS Analysis Primary Analysis Secondary Analysis Tertiary Analysis “Sense Making”  Analysis of hardware generated data, on-machine real-time stats.  Production of sequence reads and quality scores  Typical product is “FASTQ” file  Recalibrating, de-duplication, QA and clipping/filtering reads  Alignment/Assembly of reads  Variant calling on aligned reads  Typical products are “BAM” and/or “VCF” files  QA and filtering of variant calls  Annotation and filtering of variants  Multi-sample integration  Visualization of variants in genomic context  Experiment-specific inheritance/population analysis  “Small-N” and “Large-N” approaches
  • 13. NGS Analysis Primary Analysis Secondary Analysis Tertiary Analysis “Sense Making”  Analysis of hardware generated data, on-machine real-time stats.  Production of sequence reads and quality scores  Typical product is “FASTQ” file  Recalibrating, de-duplication, QA and clipping/filtering reads  Alignment/Assembly of reads  Variant calling on aligned reads  Typical products are “BAM” and/or “VCF” files  QA and filtering of variant calls  Annotation and filtering of variants  Multi-sample integration  Visualization of variants in genomic context  Experiment-specific inheritance/population analysis  “Small-N” and “Large-N” approaches
  • 14. Most Importantly: Be Consistent! Gholson Lyon, 2012
  • 15. Things That Can Confound Your Experiment Library preparation errors Sequencing errors Analysis errors  PCR amplification point mutations (e.g. TruSeq protocol, amplicons)  Emulsion PCR amplification point mutations (454, Ion Torrent and SOLiD)  Bridge amplification errors (Illumina)  Chimera generation (particularly during amplicon protocols)  Sample contamination  Amplification errors associated with high or low GC content  PCR duplicates  Base miscalls due to low signal  InDel errors (particular PacBio)  Short homopolymer associated InDels (Ion Torrent PGM)  Post-homopolymeric tract SNPs (Illumina) and/or read-through problems  Associated with inverted repeats (Illumina)  Specific motifs particularly with older Illumina chemistry  Calling variants without sufficient reads mapping  Bad mapping (incorrectly placed read)  Correctly placed read but InDels misaligned  Multi-mapping to paralogous regions  Sequence contamination e.g. adaptors  Error in reference sequence  Alignment to ends of contigs in draft assemblies  Incorrect trimming of reads, aligning adaptors  Inclusion of PCR duplicates Nick Loman: Sequencing data: I want the truth! (You can’t handle the truth!) Qual et al. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012 Jul
  • 16. NGS Analysis Primary Analysis Secondary Analysis Tertiary Analysis “Sense Making”  Analysis of hardware generated data, on-machine real-time stats.  Production of sequence reads and quality scores  Typical product is “FASTQ” file  Recalibrating, de-duplication, QA and clipping/filtering reads  Alignment/Assembly of reads  Variant calling on aligned reads  Typical products are “BAM” and/or “VCF” files  QA and filtering of variant calls  Annotation and filtering of variants  Multi-sample integration  Visualization of variants in genomic context  Experiment-specific inheritance/population analysis  “Small-N” and “Large-N” approaches
  • 17. Two Primary Approaches  Direct search for susceptibility variants - Assume highly penetrant variant and/or Mendelian disease - Extensive reliance on bioinformatics for variant annotation and filtering - Sample sizes usually small—from single case up to nuclear families  Rare Variant (RV) “collapsing” methods - More common in complex disease research - May require very large sample sizes! - Assume that any of several LOF variants in a susceptibility gene may lead to same disease or trait - Many statistical tests available - Also relies heavily on bioinformatics
  • 18. Families of Collapsing Tests  Burden Tests - Combine minor alleles across multiple variant sites… - Without weighting (CMC, CAST, CMAT) - With fixed weights based on allele frequency (WSS, RWAS) - With data-adaptive weights (Lin/Tang, KBAC) - With data-adaptive thresholds (Step-Up, VT) - With extensions to allow for effects in either direction (Ionita-Laza/Lange, C-alpha)  Kernel Tests - Allow for individual variant effects in either direction and permit covariate adjustment based on kernel regression - Kwee et al., AJHG, 2008 - SKAT - SKAT-O Credit: Schaid et al., Genet Epi, 2013
  • 19. CMC: Combined Multivariate and Collapsing  Multivariate test: simultaneous test for association of common and rare variants in gene  Flexibility in variant frequency bin definition  Testing methods include Hotelling T2 and Regression  Regression method allows for covariate correction  Li and Leal, AJHG, 2008
  • 20. KBAC: Kernel Based Adaptive Clustering  Per-gene tests models the risk associated with multi-locus genotypes at a per-gene level  Adaptive weighting procedure that gives higher weights to genotypes with higher sample risks - Meant to attain good balance between classification accuracy and the number of estimated parameters  SVS implementation includes option for 1- or 2-tailed test - But most powerful when all variants in gene have unidirectional effect  Permutation testing or regression options - Regression allows for covariate correction  Liu and Leal, PLoS Genetics, 2010
  • 21. NGS Analysis Workflow Development in SVS  SVS is very flexible in workflow design.  SVS includes a broad range of tools for data manipulation and variant annotation and visualization that can be used together to guide us on an interactive exploration of the data.  We begin by defining the final goal and the steps needed to help us reach that goal: - Are we looking for a very rare, non-synonymous variant that causes a dominant Mendelian trait? - Are we looking for a gene with excess rare variation in cases vs controls?  Once we know what we are looking for, we can identify the available annotation sources that will help us answer the question.
  • 22. Python Integration in SVS  Allows rapid development and iteration of new functions  API access to most SVS functions  Access to extensive Python analytic libraries  Fully documented in manual
  • 23. SVS Online Scripts Repository  Downloadable add-on functions for a variety of analysis and data management tasks  “Plug-and-play”  Some contributed by customers  Popular scripts often get adopted into the “shipped” version of SVS.  Scripts in repository are forward compatible to SVS 8.0
  • 24. Today’s Featured Scripts  Activate Variants by Genotype Count Threshold - Identify variants that occur with a specified frequency in one or several groups  Filter by Marker Map Field - Variant-level “INFO” fields from VCF files are imported to the SVS marker map - This script allows you to filter markers based on those variables  Many more useful scripts to take a look at: - Add Annotation Data to Marker Map from Spreadsheet - Nonparametric association tests - Import Unsorted VCF Files - Build Variant Spreadsheet - Many, many more
  • 25. Interactive Demonstration  GenomeBrowse - Exploring multi-sample VCF files in our free genome viewer software  SVS 8.0 - Exploratory analysis workflow - Using downloaded scripts - Using basic analysis tools to create advanced workflows - Simulate the development of a burden test - RV association testing workflow - KBAC - CMC - Data visualization
  • 27. What about Exome Chips?  Exome chips CAN be used with RV association tests  Exome chips include both common and rare variants  Remember: Exome chips don’t capture all rare variants.  Exome chips are thus less powerful than WES for RV associations, but also significantly cheaper.
  • 28. A Note about Exome Chips  Exome chips are not GWAS chips - GWAS chips focus on common SNPs, have uniform spacing, minimal LD and are designed to capture population variability - Exome chips include rare variants and the content is anything but uniform  Most GWAS functions can be used with exome chips, but require some workflow adjustments - Gender checking - IBD estimation - Principal components  Not unlike other chips with custom/targeted content - Cardio-MetaboChip - ImmunoChip
  • 29. Questions or more info:  info@goldenhelix.com  Request a copy of SVS at www.goldenhelix.com  Download GenomeBrowse for free at www.GenomeBrowse.com
  • 30. Use the Questions pane in your GoToWebinar window Any Questions?