Abstract
AIRR-seq data (antibody/B-cell and T-cell receptor sequences from Adaptive Immune Receptor Repertoires) can describe the adaptive immune response in exquisite detail, and comparison and analysis of these data across studies and institutions can greatly contribute to the development of diagnostics and therapeutics, including the discovery of monoclonal antibodies for treatment of autoimmune diseases.
The AIRR Community has developed protocols and standards for curating, analyzing and sharing AIRR-seq data (www.airr-community.org), and supports the AIRR Data Commons (ADC), a set of geographically distributed repositories that follow the AIRR Community’s metadata standards and the FAIR principles. The ADC currently comprises more than 5 billion receptor sequences from over 86 studies and ~9000 repertoires. The data model of the ADC has recently been expanded to include gene expression and cell phenotype data from single immune receptor cells, as well as MHC/HLA genotyping.
The iReceptor Gateway (ireceptor.org) queries the AIRR Data Commons for specific metadata (e.g., “find all repertoires from T1D studies”) or for specific CDR3 sequences (e.g., “find all repertoires from healthy individuals expressing this CDR3 sequence”). Data from these federated repositories can then be analyzed through the Gateway with several analysis tools, or downloaded for further analysis offline. The iReceptor Team at Simon Fraser University has recently initiated a collaboration to greatly expand the amount of bulk and single-cell immune profiling data from T1D studies in the AIRR Data Commons. For more information on obtaining or sharing AIRR-seq data, contact support@ireceptor.org.
The top 3 key questions that the Adaptive Immune Receptor Repertoire (AIRR) can answer:
1. A researcher observes that many individuals with Type 1 Diabetes express a specific B-cell or T-cell receptor compared to controls (i.e., a “public” clonotype). To what degree is this receptor observed to be public across other T1D studies or other autoimmune disease populations?
2. Can Machine Learning be used to identify individuals who will respond well to a new cancer immunotherapy based on differences in their antibody/B-cell or T-cell receptor repertoires as curated in the AIRR Data Commons?
3. Is there an association between particular HLA, immunoglobulin (IG), or T-cell receptor (TR) germline gene polymorphisms and propensity toward specific infectious or autoimmune diseases?
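The first question above, how "public" a receptor is across subjects, can be illustrated with a minimal sketch. It assumes each repertoire has already been reduced to a list of CDR3 amino-acid sequences per subject. The function names, the `min_subjects` threshold, and the toy sequences are all hypothetical and are not taken from the webinar or from any iReceptor tool.

```python
from collections import defaultdict


def clonotype_publicity(repertoires):
    """Map each CDR3 amino-acid sequence to the set of subjects expressing it.

    `repertoires` is an iterable of (subject_id, list_of_cdr3_aa) pairs.
    """
    subjects_by_cdr3 = defaultdict(set)
    for subject_id, cdr3_list in repertoires:
        for cdr3 in set(cdr3_list):  # count each subject once per CDR3
            subjects_by_cdr3[cdr3].add(subject_id)
    return subjects_by_cdr3


def public_clonotypes(repertoires, min_subjects=2):
    """Return {cdr3: n_subjects} for CDR3s seen in at least `min_subjects` subjects."""
    publicity = clonotype_publicity(repertoires)
    return {
        cdr3: len(subjects)
        for cdr3, subjects in publicity.items()
        if len(subjects) >= min_subjects
    }


# Toy input with made-up sequences (not real study data).
toy_repertoires = [
    ("subject_A", ["CASSLGQF", "CASSPDRF"]),
    ("subject_B", ["CASSLGQF", "CASSYEQF"]),
    ("subject_C", ["CASSLGQF", "CASSPDRF", "CASSPDRF"]),
]
shared = public_clonotypes(toy_repertoires)
```

In this toy input, `CASSLGQF` is shared by three subjects and `CASSPDRF` by two. A real analysis would typically also match on V and J gene calls and compare against controls, which this sketch leaves out.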
Presenters:
Dr. Felix Breden, Scientific Director, iReceptor
Dr. Brian Corrie, Technical Director, iReceptor
Dr. Kira Neller, Bioinformatics Director, iReceptor
Upcoming webinars schedule: https://dknet.org/about/webinar
dkNET Webinar: FAIR Data Curation of Antibody/B-cell and T-cell Receptor Sequences in the AIRR Data Commons (01/27/2023)
1. FAIR Data Curation of Antibody/B-cell and T-cell Receptor Sequences in the AIRR Data Commons
Felix Breden, Brian Corrie, Kira Neller
iReceptor
dkNET Webinar
January 27, 2023
2. Presentation Overview
• Introduction to the AIRR (Adaptive Immune Receptor Repertoire) Community
• AIRR Data Commons and iReceptor
• iReceptor v4.0 – an Overview
• Navigating the iReceptor Platform – T1D Use-Cases
• Navigating iReceptor v4.0 Features – Clones, Cells, Analyses
4. Introduction to the AIRR Community:
The Adaptive Immune System
• Focus on antibody/B-cell receptors and T-cell receptors – AIRR-seq data (Adaptive Immune Receptor Repertoire)
• Critical to development of vaccines, drugs suppressing autoimmune diseases, new cancer immunotherapies, etc.
• Adaptive immune system evolves within the body in response to pathogens (bacteria, viruses, etc.)
• Incredibly variable to recognize and remove bacteria and viruses (including new ones, e.g. novel coronavirus)
• AIRR-seq repertoires are highly diverse: ~10^13 potential human B-cell receptors
• Systemic sclerosis (de Bourcy et al.) - 700M B-Cell receptors
5. AIRR-seq data are difficult to share and compare: Somatic recombination demands unique database model & analysis tools
• Clones are sets of B cells or T cells descended from an ancestral cell produced by V(D)J recombination
• Immunoglobulin and T-cell receptor genes are the only genes in the eukaryote genome that undergo this somatic recombination
6. AIRR-seq data are difficult to share and compare: Many ways for experiments to differ
• Cell/sample prep
• Library prep
Yaari & Kleinstein 2015 Genome Medicine 7:121-135
7. B-cell Clonal Lineage Expansion in Health and Disease
• Chronic Lymphocytic Leukemia (CLL) is characterized by the expansion of a few dominant clones in B-cell repertoire (Bashford-Rogers et al. 2019)
• FDA approved Adaptive Biotechnologies clonoSEQ® test for Minimal Residual Disease (MRD) based on searching for these CLL-associated, expanded clones
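The idea of "a few dominant clones" can be made concrete with a small sketch that computes the share of a repertoire taken up by its largest clones. This is an illustration only, assuming each cell or sequence already carries a clone identifier. It is not the clonoSEQ method, and the function names and toy clone labels are hypothetical.

```python
from collections import Counter


def clone_fractions(clone_ids):
    """Return {clone_id: fraction of all observations assigned to that clone}."""
    counts = Counter(clone_ids)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def top_clone_share(clone_ids, k=1):
    """Return the combined fraction of the k largest clones."""
    fractions = sorted(clone_fractions(clone_ids).values(), reverse=True)
    return sum(fractions[:k])


# Toy repertoires with made-up clone labels.
balanced = ["c1", "c2", "c3", "c4"] * 25           # four clones, 25 observations each
expanded = ["c1"] * 85 + ["c2", "c3", "c4"] * 5    # one clone dominates

balanced_share = top_clone_share(balanced)
expanded_share = top_clone_share(expanded)
```

Here the largest clone accounts for a quarter of the balanced repertoire but 85% of the expanded one. Which threshold counts as a clinically meaningful expansion is outside the scope of this sketch.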
8. Adaptive Immune Receptor Repertoire (AIRR) Community
• The AIRR Community (2015) is a grass-roots group of immunologists, bioinformaticists, computer scientists, experts in legal, ethical and IP issues, who are developing guidelines and standards for the generation, annotation and storage of high-throughput AIRR-seq data to facilitate its use by the larger research community.
• Ability to share AIRR-seq data greatly increases the value of any one data set:
• Each researcher may have small N, large amount of data per sample
• Increase sample sizes, statistical power
• AI approaches demand huge sample sizes and number of data points
• Facilitate comparisons between affected/controls/multiple disease states
9. AIRR Community Working Groups
1. Biological Resources – Biological calibrators and reagents for evaluation of AIRR-seq data
2. Common Repository – Data Commons for AIRR-seq data, following FAIR principles
3. Diagnostics – Facilitate development of diagnostics and markers for disease
4. Germline Database – Germline gene inference from AIRR-seq data
5. Legal and Ethics – Standards for human subjects
6. Software – Interoperability of analysis software
7. Standards – For publishing or depositing AIRR-seq data (MiAIRR)
10. AIRR Community Working Groups Develop Standards
• Minimal Standards WG – MiAIRR: Minimal metadata standard for depositing AIRR-seq data. Nature Immunology (2017)
• Data Representation WG – DataRep Standard: File format and specification for sharing AIRR-seq rearrangement data. Frontiers in Immunology (2018)
• Common Repository WG – ADC API: AIRR repository web API for data exploration. Frontiers in Big Data (2020)
Standards (Publications) are ratified by full AIRR Community
Work with us: www.airr-community.org
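To show what a shared rearrangement file format makes possible, here is a minimal sketch that reads tab-separated rearrangement rows and keeps the productive ones. It assumes the column names `sequence_id`, `productive`, `v_call`, `j_call` and `junction_aa`, and the `T`/`F` encoding of booleans, as recalled from the AIRR Rearrangement schema; verify both against the published specification. The function name and the example rows are made up for illustration.

```python
import csv
import io


def read_productive(tsv_text):
    """Parse AIRR-style TSV text and return the rows flagged as productive."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Assumption: booleans are encoded as "T" / "F" in the TSV.
    return [row for row in reader if row.get("productive") == "T"]


# Made-up example rows using a small subset of columns.
EXAMPLE_TSV = (
    "sequence_id\tproductive\tv_call\tj_call\tjunction_aa\n"
    "seq1\tT\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQF\n"
    "seq2\tF\tTRBV5-1*01\tTRBJ1-2*01\t\n"
)

productive_rows = read_productive(EXAMPLE_TSV)
```

Because every compliant tool writes the same column names, the same few lines can read output from different annotation pipelines. The AIRR Community also publishes its own reference libraries, which would be the better choice for real work.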
12. AIRR Data Commons
• Philosophy: Distributed set of AIRR compliant repositories – AIRR Data Commons
• AIRR Standards: Search across study (time points), subject (age), sample (tissue, disease state)
• Allows for scalable repositories (billions of sequences) and for 10s - 100s of repositories
• Data curated at home institution under local data policy
• Researcher needs: AIRR-seq data that is FAIR
• Find data, federate data (Accessible and Interoperable)
• Reuse data to derive new insights
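The distributed design described above, one query sent to many independently hosted repositories, can be sketched as a simple fan-out and merge. This is a toy model under stated assumptions: the repositories are in-memory lists rather than remote services, and the names `query_repository`, `federated_query`, `repo_one` and `repo_two` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def query_repository(records, filters):
    """Return the records whose fields equal every value in `filters`."""
    return [
        rec for rec in records
        if all(rec.get(field) == value for field, value in filters.items())
    ]


def federated_query(repositories, filters):
    """Send the same filters to every repository and collect results by name."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(query_repository, records, filters)
            for name, records in repositories.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Toy repositories with made-up metadata records.
toy_repositories = {
    "repo_one": [
        {"repertoire_id": "r1", "disease": "T1D", "tissue": "blood"},
        {"repertoire_id": "r2", "disease": "healthy", "tissue": "blood"},
    ],
    "repo_two": [
        {"repertoire_id": "r3", "disease": "T1D", "tissue": "pancreas"},
    ],
}
t1d_hits = federated_query(toy_repositories, {"disease": "T1D"})
```

Each repository answers only for its own data, which mirrors the point that data stay at the home institution. In a real deployment each call would be an HTTP request, and error handling for unreachable repositories would be needed.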
13. The iReceptor Approach
• iReceptor Scientific Gateway: Interactive web-based data discovery, exploration, and analytics
• Data Federation and Data Query: Hide complexity from the user (finding, federating, and analyzing data)
• AIRR Data Commons: Distributed AIRR-seq data repositories, based on standards developed by the international AIRR community
14. Growth in the AIRR Data Commons (ADC)
[Figure: AIRR Data Commons Growth. Sequence Annotations (Millions, axis up to 7,000) by Year-Month, broken down by repository: T1D (Canada), Roche (Canada), Muenster (Germany), NICD (South Africa), sciReptor (Germany), Sorbonne (France), VDJBase (Israel), VDJServer (US), iReceptor COVID-19 (Canada), iReceptor Public Archive (Canada). Annotations: COVID-19; 5 new international repositories; small step increases; large step increases; new T1D repository (Jan 2023).]
15. iReceptor – Usage trends
• International user base
• 24% new users from industry
• June 2020: COVID-19 data available
16. COVID-19: Disease specific data sharing driving research
COVID-19 data sharing (this is not normal):
• Researchers reaching out to publish data
• Researchers collaborating with us to publish data
Schultheiß et al
• Pre-published in ADC before pre-print
• iReceptor cited as source for annotated data!
• Incredibly rich data set
• 46 subjects, IG+TR data, 15M annotations
• Time series data – out to 55 days, up to 9 time points
[Figure: COVID-19 study timeline, 2020-06 to 2021-12: Goel et al.; Turner et al.; Goel et al.; Schmitz et al.; Nielsen et al. (data); Nielsen et al. (preprint); Galson et al.; Minervina et al.; Schultheiß et al.; Liao et al.; Shomuradova et al.; Alsoussi et al.; Kim et al.; Kuri-Cervantes et al.; Wen et al.; Montague et al.; Nolan et al.; Mor et al.; Sokal et al.]
COVID-19 curation continues
19. iReceptor V4.0: Curate both AIRR-seq and Cell/GEX data in the ADC!
Combining AIRR-seq and Single Cell Immune Profiling
• Bulk AIRR-seq provides deep sampling across many cells (10^6 – 10^7)
• Single cell provides paired chains, better clone resolution, gene expression (GEX) across a smaller number of cells (10^3)
• Complementary - gain a better understanding of immune cell phenotype and immune system state
Wen et al., DOI: 10.1038/s41421-020-0168-9
Early recovery COVID-19, ERS1-GEX: 5,107 cells; 9 clusters
ADC: https://gateway.ireceptor.org/samples?query_id=51948
AIRR-seq: Top clone (dark blue), 167/1392 (12%)
What is the gene expression signature?
20. Single Cell Immune Profiling in the AIRR Data Commons
• AIRR Community developed Single Cell/GEX extension (collaboration with 10x Genomics)
• Load Cell/GEX data into ADC repositories (matrix/features/barcodes files)
• Query studies based on both VDJ and Cell/GEX data
• AIRR standard released August 2022
• iReceptor v4.0
• iReceptor Turnkey repositories can store and query Cell & GEX data
• iReceptor Gateway extended to support Single Cell immunology workflows
• In production Dec 2022 – curated Single Cell data & Single Cell user workflows
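The matrix/features/barcodes files mentioned above are commonly a sparse count matrix in Matrix Market coordinate format plus two lists of labels. The sketch below turns such a triple into per-cell gene counts. It assumes the usual single-cell layout where rows are features and columns are cell barcodes, with 1-based indices. How ADC repositories actually ingest these files is not shown in the slides, and the function names, barcodes and counts here are made up.

```python
def parse_matrix_market(text):
    """Parse coordinate-format Matrix Market text into a shape and {(row, col): value}."""
    # Drop blank lines and header/comment lines, which start with "%".
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in lines[0].split())
    entries = {}
    for ln in lines[1:]:
        row, col, value = ln.split()
        entries[(int(row) - 1, int(col) - 1)] = int(value)  # convert to 0-based
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return (n_rows, n_cols), entries


def counts_per_cell(features, barcodes, entries):
    """Return {barcode: {feature: count}}, assuming rows=features, cols=barcodes."""
    per_cell = {barcode: {} for barcode in barcodes}
    for (feature_idx, barcode_idx), count in entries.items():
        per_cell[barcodes[barcode_idx]][features[feature_idx]] = count
    return per_cell


# Made-up miniature example: 3 genes, 2 cells, 3 non-zero counts.
MATRIX_TEXT = (
    "%%MatrixMarket matrix coordinate integer general\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
FEATURES = ["CD3E", "CD19", "MS4A1"]
BARCODES = ["AAAC-1", "TTTG-1"]

shape, entries = parse_matrix_market(MATRIX_TEXT)
cells = counts_per_cell(FEATURES, BARCODES, entries)
```

Storing only the non-zero counts is what keeps single-cell expression data manageable, since most genes are not detected in most cells. For real data, an established reader such as the one in SciPy would be preferable to hand-written parsing.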
21. iReceptor v4.0 – an integrative approach (data mashup)
• iReceptor v3.0: A platform for finding AIRR-seq data in the AIRR Data Commons
• AIRR v1.4: Clones, Cells/GEX
• iReceptor Scientific Gateway: Interactive web-based data discovery, exploration, and analytics
• Data Federation and Data Query over the AIRR Data Commons (distributed AIRR-seq data repositories)
• Antigen specificity: IEDB
• Cells: Human Cell Atlas
• Ontologies: EBI OLS
• Clones, Cells/GEX Analysis Workflows: Job Management, Analysis Results
• Complex analysis tools: Analyze rearrangements, clones, cells across entire ADC
23. The iReceptor Gateway – How Does it Work?
• Searches the AIRR Data Commons: 8 repositories, 86 studies, 9346 repertoires, >5 billion sequence annotations
• Two workflows: Searching study metadata, searching sequence annotations
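The two workflows, a metadata search and a sequence search, correspond to two kinds of request bodies in the AIRR Data Commons API. The sketch below only builds those JSON bodies and sends nothing over the network. The filter structure (`op`, `content`, `field`, `value`) and the field names are written from memory of the ADC API and AIRR schema and should be checked against the official documentation. The helper names, the disease label and the example CDR3 are hypothetical.

```python
import json


def eq_filter(field, value):
    """Build a single equality filter."""
    return {"op": "=", "content": {"field": field, "value": value}}


def and_filter(*filters):
    """Combine several filters so that all must match."""
    return {"op": "and", "content": list(filters)}


def repertoire_query(disease_label, locus):
    """Metadata workflow: find repertoires by diagnosis and receptor locus."""
    return {
        "filters": and_filter(
            # Assumed field names; verify against the AIRR schema.
            eq_filter("subject.diagnosis.disease_diagnosis.label", disease_label),
            eq_filter("sample.pcr_target.pcr_target_locus", locus),
        )
    }


def rearrangement_query(junction_aa, repertoire_ids):
    """Sequence workflow: find a CDR3 (junction) within chosen repertoires."""
    return {
        "filters": and_filter(
            eq_filter("junction_aa", junction_aa),
            {"op": "in", "content": {"field": "repertoire_id",
                                     "value": list(repertoire_ids)}},
        )
    }


metadata_body = repertoire_query("type 1 diabetes mellitus", "TRB")
sequence_body = rearrangement_query("CASSLGQF", ["r1", "r3"])
metadata_json = json.dumps(metadata_body, indent=2)
```

Under these assumptions, the first body would be posted to a repository's repertoire endpoint and the second to its rearrangement endpoint. The Gateway performs both steps on the user's behalf across all repositories.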
35. IEDB – 206 insulin-binding TCRs are associated with T1D
36. iReceptor Gateway Sequence Search Links Out to IEDB – Discovery of novel antigen/epitope specificity
• CDR3 search in gateway simultaneously queries IEDB for known binding interactions!
• Associated with previous infection?
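The linking step described above, pairing a CDR3 search result with known binding interactions, amounts to a lookup against a reference table. The sketch below assumes that table has already been exported to a local dictionary. It does not call IEDB or reproduce the Gateway's implementation, and the function name, sequences, repertoire IDs and antigen labels are invented placeholders.

```python
def annotate_with_epitopes(cdr3_hits, known_binders):
    """Attach known antigen labels to each CDR3 found in the repertoire search.

    cdr3_hits:     {cdr3_aa: iterable of repertoire IDs where it was found}
    known_binders: {cdr3_aa: list of antigen labels from a reference table}
    """
    return [
        {
            "cdr3": cdr3,
            "repertoires": sorted(repertoire_ids),
            "known_antigens": known_binders.get(cdr3, []),
        }
        for cdr3, repertoire_ids in sorted(cdr3_hits.items())
    ]


# Placeholder data, not real binding records.
toy_hits = {
    "CASSLGQF": {"r3", "r1"},
    "CASSPDRF": {"r2"},
}
toy_reference = {
    "CASSLGQF": ["example antigen X"],
}
annotated = annotate_with_epitopes(toy_hits, toy_reference)
```

A CDR3 with an empty `known_antigens` list simply has no recorded interaction in the reference table, which is different from evidence that it binds nothing.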
42. iReceptor v4.0: Analysis Applications
iReceptor v3.0: Select data of interest and download!
iReceptor v4.0: Choose the Analysis to run on the data!
Run the analysis on the data directly
User and data never leave the platform!
43. iReceptor v4.0: Analysis Applications
Choose an Analysis App: CellTypist
Submit a Job
Gateway does all the work
- Downloads data from ADC
- Stages data to computation
- Stages app to computation
- Runs job
- Tabulates results
- Presents results to the user
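The sequence of steps the Gateway performs for a job can be pictured as an ordered pipeline over shared state. The sketch below is a toy model of that ordering only. Every step is a stub, nothing here invokes CellTypist or any Gateway service, and all function names and values are hypothetical.

```python
def run_analysis_job(app_name, query, steps):
    """Run each (name, function) step in order on a shared state dictionary."""
    state = {"app": app_name, "query": query, "completed": []}
    for name, step in steps:
        step(state)
        state["completed"].append(name)
    return state


# Stub steps standing in for the real work.
def download_data(state):
    state["data"] = ["cell_1", "cell_2", "cell_3"]


def stage_data(state):
    state["staged_data"] = list(state["data"])


def stage_app(state):
    state["staged_app"] = state["app"]


def run_app(state):
    state["raw"] = {cell: "unlabeled" for cell in state["staged_data"]}


def tabulate(state):
    state["table"] = sorted(state["raw"].items())


def present(state):
    state["report"] = f"{state['app']}: {len(state['table'])} rows"


STEPS = [
    ("download", download_data),
    ("stage data", stage_data),
    ("stage app", stage_app),
    ("run", run_app),
    ("tabulate", tabulate),
    ("present", present),
]
job = run_analysis_job("CellTypist", {"study": "example"}, STEPS)
```

Keeping the steps as an explicit ordered list makes it easy to see where a job stopped if one stage fails, since `completed` records every stage that finished.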
48. Opportunities for T1D & AIRR Data Commons
• Achieve “Network Effect”
• More repositories/more data to the AIRR Data Commons
• AIRR Data Commons “go-to” place for combined AIRR-seq and Single Cell Immune Profile study data
• Extend COVID-19 sharing culture to T1D
• All disease areas would benefit from the same spirit of sharing – can we do it for T1D?
• We are working with T1D colleagues to curate a critical mass of T1D AIRR-seq studies
• Systems Immunology – AIRR-seq data doesn’t stand alone!
• Bringing multi-omics data together (AIRR-seq, Epitope, GEX, …)
To contribute to or explore the AIRR Data Commons: support@ireceptor.org
49. Acknowledgements
• Colleagues in the AIRR Community
• Collaborators: Partners in CIHR/EU Horizon 2020 iReceptor Plus project
• Funders
• CANARIE
• Canada Foundation for Innovation
• CIHR
• BC Knowledge Development Fund
• EU Horizon 2020 Research and Innovation Programme
• Simon Fraser University