Orphan and non-orphan EC number distribution across superkingdoms
Integrated approaches for the discovery of novel enzymatic activities
Guillaume REBOUL, Maria SOROKINA, Jonathan MERCIER, Karine BASTARD, Mark STAM, David VALLENET, Claudine MEDIGUE – CEA, Genoscope, LABGeM
Annotation Rules
Functional annotation rules
UniRule: HAMAP + PIRSF + RuleBase
Pathway Rules
Consistency against biological processes
Knowledge Base
Rule Engine
Enzyme Activity Discovery workflow
Biological Facts
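// A fact that is required, not avoided, and not observed is flagged as missing.
// no-loop guards against re-firing on engines without property reactivity.
rule "Missing state"
    no-loop true
when
    $fact : Fact( present == "no", require == "yes", avoid == "no" )
then
    // modify() takes the fact, then a block of setter calls
    modify( $fact ) { setState( "missing" ) }
end

// Every pathway attached to an organism is asserted as a required fact.
rule "require Pathway"
when
    $org  : Organism()
    $path : Pathway( org == $org )
then
    Fact fact = new Fact( $org, $path );
    fact.setRequire( "yes" );
    insert( fact );
end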
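The two rules above can be read as a small forward-chaining loop. The sketch below mirrors that logic in plain Python, without a rule engine. The `Fact` fields follow the names used in the rules; the default values, the `"unknown"` initial state, and the organism and pathway names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Illustrative stand-in for the Fact objects used in the rules."""
    organism: str
    pathway: str
    present: str = "no"     # assumed default: nothing observed yet
    require: str = "no"
    avoid: str = "no"
    state: str = "unknown"  # assumed initial state


def require_pathways(pathways_by_organism):
    """Mirror of rule "require Pathway": one required Fact per (organism, pathway)."""
    facts = []
    for organism, pathways in pathways_by_organism.items():
        for pathway in pathways:
            facts.append(Fact(organism, pathway, require="yes"))
    return facts


def mark_missing(facts):
    """Mirror of rule "Missing state": required, not avoided, not present."""
    for fact in facts:
        if fact.present == "no" and fact.require == "yes" and fact.avoid == "no":
            fact.state = "missing"
    return facts


def run_example():
    # Hypothetical organism and pathways, for illustration only.
    facts = require_pathways({"organism_A": ["glycolysis", "TCA cycle"]})
    facts[0].present = "yes"  # pretend the annotation supports glycolysis
    return [(f.pathway, f.state) for f in mark_missing(facts)]
```

Calling `run_example()` leaves the observed pathway untouched and flags the unobserved, required one as missing.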
The “Novel Enzymatic Activities” group
• Group of the LABGeM: Laboratory of Bioinformatics
Analyses for Genomics and Metabolism
• Part of the CEA (French Alternative Energies and
Atomic Energy Commission)
• 3 Researchers
• 2 PhD students
• 1 sandwich placement Master Student
• 1 undergraduate student
• 1 post-doctoral placement available
Pool of information that can reveal Novel Enzymatic Activities:
• Protein or Domain Families
• Protein Annotation and Sequences
• Literature
• Enzymatic Reactions
Own database: NEADB
Summary
?
• iso-functional groups
• multi-functional group
• non-“activity” groups
• Motifs and key residues for function assignment
3
Promiscuity
& Specificity
Modeling of
compounds
in active sites
+
Full family
→ new metabolic functions and associated pathways
4
Metabolic
Role Biochemical
validation
+
Genomic context
Representative enzymes of the family
1
Define one reaction
+ Multiple alignment
one generic reaction: A + B <-> A’ + B’
• A family of unknown function
• with experimental evidence
• with one available structure
17 substrates
new
reactions
2
Selection &
Screening
+Enzymatic
screening
Statistical
analysis
+
Family partitioning Potential metabolites
Set of sequences
BLAST PDB
Homology Modeling - MODELLER
3D Models
Cavity Detection - FPOCKET
3D-Active Sites
Structural Alignment - MULTALIGN
Hierarchical Clustering - WEKA
Specificity Determining Residues are determined by a log-likelihood analysis
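The poster does not give the exact statistic behind the log-likelihood analysis, so the following is only a sketch of one common form: for each aligned active-site position and each group, the residue counts in the group are scored against the residue frequencies of the whole family. Function names, the input layout, and the toy sequences are assumptions, not the ASMC implementation.

```python
import math
from collections import Counter


def column_log_likelihood(group_residues, family_residues):
    """Sum over residues of n_group(a) * ln(p_group(a) / p_family(a))."""
    group_counts = Counter(group_residues)
    family_counts = Counter(family_residues)
    n_group = len(group_residues)
    n_family = len(family_residues)
    score = 0.0
    for residue, count in group_counts.items():
        p_group = count / n_group
        # The group is a subset of the family, so p_family is never zero here.
        p_family = family_counts[residue] / n_family
        score += count * math.log(p_group / p_family)
    return score


def rank_positions(active_sites, groups):
    """Rank (score, group, position) triples, highest score first.

    active_sites: dict of sequence id -> aligned active-site residues (equal length)
    groups:       dict of sequence id -> group label from the clustering step
    """
    length = len(next(iter(active_sites.values())))
    results = []
    for label in sorted(set(groups.values())):
        members = [s for s in active_sites if groups[s] == label]
        for pos in range(length):
            family_col = [active_sites[s][pos] for s in active_sites]
            group_col = [active_sites[s][pos] for s in members]
            results.append((column_log_likelihood(group_col, family_col), label, pos))
    return sorted(results, reverse=True)


def example_ranking():
    # Toy data: position 1 separates the two groups (K versus R).
    sites = {"s1": "DKG", "s2": "DKG", "s3": "DRG", "s4": "DRG"}
    groups = {"s1": "G1", "s2": "G1", "s3": "G2", "s4": "G2"}
    return rank_positions(sites, groups)
```

In the toy example, only the middle position scores above zero, because it is conserved within each group but differs between groups. Positions that are identical across the whole family score zero.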
Pfam unknown
family (DUF 846)
Next-generation sequencing technology has dramatically increased the number of sequences available in
public databases. At the same time, many enzymatic activities (~22%) are orphans: no protein sequence
is associated with them (Sorokina et al., 2014). The large number of available protein sequences is an
opportunity to discover enzymes associated with new reactions. We present here an integrated
bioinformatics approach to reduce this gap in metabolic knowledge and to propose new activity/protein
associations for experimental validation. With this objective, the “Novel Enzymatic Activities” group
of the LABGeM team is developing several methods. The CanOE method combines genomic and metabolic
contexts to predict candidate genes for orphan enzymes (Smith et al., 2012). This approach is currently
being extended to the detection of conserved chemical transformation motifs in metabolism
(Sorokina et al., submitted).
From a structural point of view, the ASMC (Active Site Modeling and Clustering) method
finds and compares active site pockets to classify the enzymes of a family and detects
residues that are important for substrate specificity (de Melo-Minardi et al., 2010).
These methods were successfully applied to elucidate the enzymatic diversity
of a protein family of unknown function (Bastard et al., 2014). Their
results, together with existing knowledge, need to be unified in a
database that supports strategies for selecting enzyme families of interest.
This work is supported by genomic and metabolic network data from
MicroScope, a platform for microbial genome analysis
(Vallenet et al., 2013).
Exploration of archaeal enzyme activities:
ARCHAEOACTOME research project
Literature references
Bastard, K. et al. Revealing the hidden functional diversity of an enzyme family. Nat. Chem. Biol. 10, 42–9 (2014).
de Melo-Minardi, R. C., Bastard, K. & Artiguenave, F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 26, 3075–82 (2010).
Smith, A. A. T., Belda, E., Viari, A., Médigue, C. & Vallenet, D. The CanOE strategy: Integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes. PLoS Comput. Biol. 8, (2012).
Sorokina, M., Stam, M., Médigue, C., Lespinet, O. & Vallenet, D. Profiling the orphan enzymes. Biol. Direct 9, 10 (2014).
Vallenet, D. et al. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res. 41, D636–47 (2013).
Sorokina et al. A novel metabolic network representation for the discovery of conserved modules of chemical transformations. Submitted.
Mercier, J. & Vallenet, D. GROOLS: Reactive Graph Reasoning for Genome Annotation. RuleML 2015 Conference.
Active Sites Classification (ASMC)
The CanOE strategy
Reaction Molecular Signature Network
The dynamics of enzyme discovery
MicroScope
From genomes to biological systems
Reactions sharing the same RMS
Reaction network reduction into an RMS network
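A minimal sketch of this reduction, under stated assumptions: each reaction already carries an RMS label, reactions are linked in the reaction network by an edge list, and two RMS nodes are connected whenever any of their member reactions are linked. How signatures are computed and how edges are defined are not given on the poster. All identifiers below are hypothetical.

```python
from collections import defaultdict


def reduce_to_rms_network(reaction_rms, reaction_edges):
    """Collapse a reaction network into an RMS network.

    reaction_rms:   dict of reaction id -> RMS label
    reaction_edges: iterable of (reaction id, reaction id) links
    Returns (members per RMS, set of undirected RMS-RMS edges).
    """
    members = defaultdict(set)
    for reaction, rms in reaction_rms.items():
        members[rms].add(reaction)

    rms_edges = set()
    for left, right in reaction_edges:
        a, b = reaction_rms[left], reaction_rms[right]
        if a != b:  # links between reactions of the same RMS disappear
            rms_edges.add(tuple(sorted((a, b))))
    return dict(members), rms_edges


def example_reduction():
    # Hypothetical reactions and signatures, for illustration only.
    reaction_rms = {"R1": "RMS_a", "R2": "RMS_a", "R3": "RMS_b", "R4": "RMS_c"}
    reaction_edges = [("R1", "R3"), ("R2", "R3"), ("R3", "R4"), ("R1", "R2")]
    return reduce_to_rms_network(reaction_rms, reaction_edges)
```

In the example, four reactions and four links reduce to three RMS nodes and two edges, since R1 and R2 share a signature and their parallel links to R3 merge into one.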
Microbial genome analysis
Metabolic network
>3,900 genomes
1-10 Mb
ASMC method
Classification of a family into groups of similar active sites
NEA team
Workbench
Data Integration
Orphan Enzymes
GROOLS
Structural Analysis
Metabolic Network
MicroScope
