Elucidata’s data harmonization platform, Polly, delivers the highest-quality single-cell data to fit diverse analysis methods and pipelines. https://www.elucidata.io/polly/data/single-cell
2. Polly by Elucidata
Elucidata’s data harmonization platform, Polly, delivers the highest-quality single-cell data to fit diverse analysis methods and pipelines. All datasets are Polly Verified, i.e., harmonized with a configurable, granular, and transparent curation process.
3. Streamlined Journey to Improving the Quality of Single-Cell Data
Data at Source (tabular files such as MTX or CSV; TXT files)
● 50% missing annotations
● <2% harmonized
● Different access nuances
● Formats vary across datasets & samples
Polly Harmonization
● Processing
● Metadata Harmonization
● Cell Annotation
● Quality Assurance
Data on Polly
● <1% missing annotations
● 100% harmonized
● 4X new fields added
● Consistent H5AD format
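The Metadata Harmonization step and the before/after percentages on this slide can be illustrated with a toy example. The sketch below assumes a hypothetical synonym table (`TISSUE_SYNONYMS`) that maps free-text tissue labels to a controlled term. It then reports what fraction of samples are missing a label and what fraction map cleanly. Polly's actual vocabularies and curation rules are not described here, so treat this as an illustration only.

```python
from typing import Optional

# Hypothetical synonym table: free-text labels -> controlled-vocabulary term.
# (Illustrative only; not Polly's actual ontology mapping.)
TISSUE_SYNONYMS = {
    "colon": "colon",
    "large intestine": "colon",
    "colonic mucosa": "colon",
    "synovium": "synovial membrane",
    "synovial tissue": "synovial membrane",
}


def harmonize(label: Optional[str]) -> Optional[str]:
    """Map a raw label to a controlled term; return None if missing or unmapped."""
    if label is None or not label.strip():
        return None
    return TISSUE_SYNONYMS.get(label.strip().lower())


def annotation_stats(raw_labels):
    """Return (fraction missing, fraction harmonized) over all samples."""
    n = len(raw_labels)
    missing = sum(1 for x in raw_labels if x is None or not x.strip())
    harmonized = sum(1 for x in raw_labels if harmonize(x) is not None)
    return missing / n, harmonized / n


samples = ["Large Intestine", "colonic mucosa", None, "Synovial tissue", ""]
print([harmonize(s) for s in samples])
print(annotation_stats(samples))
```

Both empty strings and `None` count as missing here. Labels that are present but absent from the synonym table would count as neither missing nor harmonized, which is the gap a curation process is meant to close.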
4. Single-Cell Data Options on Polly
Raw Counts
● What is it? Raw, unfiltered counts extracted from the source, cleaned, and annotated with metadata
● Useful for: Re-processing and annotating data with in-house pipelines
● Output file(s): Unfiltered raw counts with 30 metadata fields (H5AD)
Polly Processed Counts
● What is it? Harmonized single-cell data, consistently processed and cell-type annotated using a validated Polly pipeline
● Useful for: Making data comparable and interoperable for large-scale comparative analyses
● Output file(s): Polly processed counts with cell type annotations and 32 other fields (H5AD); raw counts with 30 fields (H5AD)
Author Processed Counts
● What is it? Single-cell data processed and cell-type annotated using author-provided parameters
● Useful for: Replicating a published study of interest
● Output file(s): Author processed counts with cell type annotations and 32 other fields (H5AD); raw counts with 30 fields (H5AD)
5. Why Access Single-Cell Data on Polly?
Data You Can Trust
~50 QA checks are performed on all data and metadata to ensure quality and provenance.
Complete Transparency
Learn how each dataset was processed and annotated through comprehensive QA reports.
Customizable Harmonization
Request custom metadata fields or cell type annotation with your own markers.
Flexible Ways to Consume Data
Work with Polly’s data in the tools and environments of your choice. No download restrictions apply!
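The ~50 QA checks are not enumerated in this deck. As a rough illustration of what automated data and metadata checks can look like, here is a minimal sketch with a handful of checks. The check names and `REQUIRED_FIELDS` are hypothetical and are not Polly's actual checklist.

```python
# Hypothetical required fields; the deck mentions 30 metadata fields but does not list them.
REQUIRED_FIELDS = ("organism", "tissue", "disease")


def qa_report(cells, counts, metadata):
    """Run a few illustrative QA checks and return {check_name: passed}."""
    return {
        # Every cell barcode should appear only once.
        "unique_cell_ids": len(cells) == len(set(cells)),
        # Raw counts should be non-negative integers.
        "counts_nonnegative_integers": all(
            isinstance(c, int) and c >= 0 for row in counts for c in row
        ),
        # The count matrix should have exactly one row per cell.
        "one_count_row_per_cell": len(counts) == len(cells),
        # Required metadata fields should be present and non-empty.
        "required_metadata_present": all(
            metadata.get(f) not in (None, "") for f in REQUIRED_FIELDS
        ),
    }


report = qa_report(
    cells=["AAAC-1", "AAAG-1", "AAAC-1"],  # duplicate barcode
    counts=[[0, 3, 1], [2, 0, 0], [1, 1, 0]],
    metadata={"organism": "Homo sapiens", "tissue": "colon", "disease": ""},
)
print(report)
```

Returning a dictionary of named pass/fail results, rather than raising on the first failure, makes it easy to roll the outcome into a per-dataset QA report.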
6. How We Deliver: Data Concierge
Data Audits
● Experts identify datasets relevant to your research, on and off Polly
● Requirements gathering for curation and processing of the identified data
Store in your Atlas
● Domain-specific repository of analysis-ready data
● All datasets are QC-ed, custom curated, and Polly Verified
Exploration and Analysis
● Explore on Polly via CellxGene
● Download data with Polly’s APIs or GUI and explore it in the tools of your choice
● Customized solutions as a service: GSEA, knowledge graphs, ML classifiers, and dashboards for analysis and visualization
7. Target Identification & Validation with Curated Single-Cell Data: Case Study
About the Customer
The customer is an early-stage therapeutics startup based in Boston that is developing biologics for inflammatory and autoimmune diseases. The company was looking to identify potential targets for these indications.
Objective
Find and integrate single-cell datasets specific to inflammatory diseases from public sources.
Perform a meta-analysis to arrive at fibroblast-specific gene targets for further exploration.
9. How Was the Data Processed?
● Data at Source: mtx, csv, tsv, h5ad, Seurat, and h5 files
● Unfiltered Raw Counts: H5AD files with HUGO symbols, QC metrics, and curated metadata fields
● Filtering & Normalization: Consistent filtering criteria, normalization, and batch-effect correction
● Cell Type Annotation: Marker lists from publications used to derive cell annotations
● Store on Atlas: H5AD files with curated metadata and consistently annotated cells
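The filtering, normalization, and marker-based annotation stages of this pipeline can be sketched in plain Python on a toy count table. The assumptions are as follows:

- `min_genes=2` is a toy threshold.
- `target_sum=1e4` followed by log(1 + x) is a common choice in single-cell tutorials.
- The `MARKERS` dictionary is a hypothetical stand-in for the publication-derived marker lists.
- Batch-effect correction is omitted.

This is not the Polly pipeline itself.

```python
import math

# Hypothetical marker lists; the case study derived markers from publications.
MARKERS = {
    "Fibroblasts": ["COL1A1", "DCN"],
    "T cells": ["CD3E", "CD3D"],
    "B cells": ["MS4A1", "CD79A"],
}


def filter_cells(counts, min_genes=2):
    """Keep cells with at least `min_genes` genes detected (count > 0)."""
    return {
        cell: genes
        for cell, genes in counts.items()
        if sum(1 for c in genes.values() if c > 0) >= min_genes
    }


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    out = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0:
            continue  # skip empty cells rather than divide by zero
        out[cell] = {g: math.log1p(c / total * target_sum) for g, c in genes.items()}
    return out


def annotate(norm, markers=MARKERS):
    """Assign each cell the type whose markers have the highest mean expression."""
    labels = {}
    for cell, genes in norm.items():
        scores = {
            ctype: sum(genes.get(g, 0.0) for g in glist) / len(glist)
            for ctype, glist in markers.items()
        }
        labels[cell] = max(scores, key=scores.get)
    return labels


raw = {
    "cell1": {"COL1A1": 30, "DCN": 12, "CD3E": 0, "MS4A1": 1},
    "cell2": {"COL1A1": 0, "CD3E": 25, "CD3D": 9, "MS4A1": 0},
    "cell3": {"COL1A1": 0, "CD3E": 0, "CD3D": 0, "MS4A1": 3},  # only 1 gene detected
}
kept = filter_cells(raw, min_genes=2)
print(annotate(normalize_log1p(kept)))
```

Applying the same `min_genes`, `target_sum`, and marker lists to every dataset is what makes the outputs comparable across studies. Any change to these parameters should be applied uniformly.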
10. Meta-Analysis for Target Identification and Validation
● 13 datasets identified and 3 datasets merged
● Differential expression analysis of the merged data to get the top 250 DEGs
● Results refined to the top 20 genes with a random forest (RF) model and point-biserial scores
● Expression examined to narrow the list down to 10 genes
● Literature review and pathway analysis performed to arrive at 5 targets
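[Figure: UMAP plots (UMAP 1 vs. UMAP 2) of the integrated data, colored by integrated cell type (B cells, T cells, myeloid cells, plasma, stem or enterocyte cells, mast cells, vascular cells, fibroblasts), by group (diseased fibroblast, normal fibroblast, other), and by expression of Gene 1]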
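The point-biserial score used in the refinement step is the correlation between a continuous variable (a gene's expression) and a binary label. For example, the label could be diseased vs. normal fibroblast. The formula is r = (M1 − M0) / s · √(p·q), where s is the population standard deviation of all values and p, q are the group proportions. The sketch below shows how genes could be ranked by |r|. The gene names, toy values, and labeling are hypothetical, and the random forest part of the deck's step is not shown.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between continuous `values` and 0/1 `labels`."""
    n = len(values)
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    if not group1 or not group0:
        raise ValueError("both label groups must be non-empty")
    mean_all = sum(values) / n
    s = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)  # population SD
    if s == 0:
        return 0.0  # constant expression carries no signal
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    p, q = len(group1) / n, len(group0) / n
    return (m1 - m0) / s * math.sqrt(p * q)


def rank_genes(expr, labels, top_k=20):
    """Rank genes by absolute point-biserial correlation with the label."""
    scores = {g: point_biserial(v, labels) for g, v in expr.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top_k]


labels = [1, 1, 1, 0, 0, 0]  # 1 = diseased fibroblast, 0 = normal fibroblast (hypothetical)
expr = {
    "GENE_A": [5.0, 4.5, 5.2, 1.0, 0.8, 1.2],  # up in diseased cells
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference
    "GENE_C": [0.5, 0.2, 0.4, 3.0, 3.1, 2.9],  # down in diseased cells
}
print(rank_genes(expr, labels, top_k=2))
```

Ranking by the absolute value keeps strongly down-regulated genes as candidates alongside up-regulated ones. The sign of r tells you the direction.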
11. Impact
● Single-Cell Data Curation: 156 scRNA-seq datasets specific to inflammatory diseases were identified and annotated with relevant metadata.
● Target Identification & Validation: Shortlisted 4 novel targets and validated 5 pre-identified targets using meta-analysis.
● Time Savings: Up to 4X acceleration in the target identification process (from 8-10 months to 2.5 months).
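As a quick check of the time-savings figure, the speed-up is the ratio of the baseline duration to the new duration:

```latex
\text{speed-up} = \frac{t_{\text{baseline}}}{t_{\text{new}}}, \qquad
\frac{8\ \text{months}}{2.5\ \text{months}} = 3.2, \qquad
\frac{10\ \text{months}}{2.5\ \text{months}} = 4
```

The acceleration therefore ranges from 3.2X to 4X depending on the baseline. The 4X figure corresponds to the 10-month end of the range.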
12. Reach out to us at info@elucidata.io or book a demo with us to learn more.