Scientific Workflow Systems
and
Multi-Objective Evolutionary Algorithms
for
Life Sciences Informatics
Christos C. Kannas
Computer Science, University of Cyprus
6th June 2017
Table of Contents
1 Introduction
Scientific Workflow Management Systems
Self-Adaptive Multi-Objective Evolutionary Algorithms
Virtual Screening & De Novo Molecular Design
2 Life Sciences Informatics platform
About Life Sciences Informatics platform
LiSIs Showcase
LiSIs Showcase Discussion
3 Self-Adaptive Multi-Objective Evolutionary Algorithm
About Self-Adaptive MOEA
Self-Adaptive MOEA Showcases
Self-Adaptive MOEA Showcases Discussion
4 Concluding Remarks
Concluding Remarks - LiSIs platform
Concluding Remarks - Self-Adaptive MOEA
5 Future Work
Future Work - LiSIs platform
Future Work - Self-Adaptive MOEA
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 1 / 130
Introduction
Scientific Workflow Management Systems
SWMSs Application Domains
Self-Adaptive Multi-Objective Evolutionary
Algorithms
Multi-Objective Evolutionary Algorithms:
Family of algorithms inspired by nature:
Evolve a population
Mutation and Crossover
Select fittest individuals by Pareto ranking
Handle 1 to 3 objectives
Self-Adaptive Techniques:
Optimise search parameters:
Population Size
Mutation Rate
Crossover Rate
Generation Gap
Scaling Window
Optimise reproduction operators:
Mutation Operator(s)
Crossover Operator(s)
Parent Selection Operator
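The Pareto-ranking selection step listed above can be illustrated with a minimal sketch. It assumes every objective is minimised and that individuals are given as tuples of objective values. The function names `dominates` and `pareto_rank` are illustrative and do not come from the thesis, and the front-peeling scheme shown here is one common way to rank, not necessarily the one used by the author.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_rank(points: List[Sequence[float]]) -> List[int]:
    """Rank 1 for the non-dominated points, rank 2 for the next front, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # A point belongs to the current front if nothing still unranked dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Toy population with two objectives, both minimised (values are made up).
population = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
ranks = pareto_rank(population)
print(ranks)
```

Individuals with rank 1 form the current Pareto front; a selection operator would favour them when choosing parents.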
Drug Discovery Process - Steps
Drug Discovery Process - Timeline
Life Sciences Informatics platform
Motivation & Objectives
Motivation
Provide an easy to use web based platform,
Focused on Virtual Screening (VS) of natural products, and
Aimed towards cancer chemoprevention researchers.
Objectives
Design and develop a web based Scientific Workflow
Management System (SWMS),
Provide tools for VS, and
Evaluate it on use cases for identifying novel chemopreventive
agents.
Scientific Workflow Management Systems for Virtual Screening
Application | Technology | Scientific Field(s)
Open Source
Taverna | Java | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
Galaxy | Python | Life Sciences, Bioinformatics
Knime | Java | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis
Commercial
Inforsence/DiscoveryNet | - | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
Pipeline Pilot | - | Biology, Chemistry, Material Science
Funding Support
The work has been partially supported through the EU-FP7
GRANATUM project, "A Social Collaborative Working
Space Semantically Interlinking Biomedical Researchers,
Knowledge and data for the design and execution of In Silico
Models and Experiments in Cancer Chemoprevention",
contract number 270139.
The work also supported the research of the EU-FP7 Linked2Safety project, "A
Next-Generation, Secure Linked Data Medical Information
Space For Semantically-Interconnecting Electronic Health
Records and Clinical Trials Systems Advancing Patients
Safety In Clinical Research", contract number 288328.
Life Sciences Informatics platform
Life Sciences Informatics (LiSIs) is a web based SWMS for
VS [Kannas et al., 2015].
LiSIs is based on the Galaxy SWMS [Goecks et al., 2010],
[Blankenberg et al., 2010], [Giardine et al., 2005].
LiSIs modules
LiSIs Showcase
LiSIs Showcase Information
LiSIs was successfully used to discover promising agents with
chemopreventive properties that are able to bind to
Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β
(ER-β).
Datasets:
2414 compounds from Indofine,
55 compounds characterized by Medina-Franco et al.
[Medina-Franco et al., 2010], and
21 known ER ligands retrieved from PubChem.
LiSIs Showcase Workflow
LiSIs Showcase Docking Results
(a) ER-α Docking Score (b) ER-β Docking Score
LiSIs Showcase Discussion
From the Indofine dataset (2414 compounds), based on
natural-like criteria and docking results, we selected:
18 potential ER ligands,
which were further investigated in vitro with the ER binding assay
described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005]
with minor modifications.
15 out of the 18 compounds (83.3%) were experimentally
confirmed active.
Self-Adaptive Multi-Objective Evolutionary
Algorithm
Multi-Objective Algorithms for Molecular Design
Name | MO Method | Search Method | Remarks | Reference
EA-Inventor | Weighted | Evolutionary Algorithm | Ligand | [Feher et al., 2008]
GANDI | Weighted | Parallel Evolutionary Algorithm | Structure | [Dey and Caflisch, 2008]
FOG | Weighted | Evolutionary Algorithm | Ligand | [Kutchukian et al., 2009]
MEGA | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Nicolaou et al., 2009a]
PLD | Pareto based | Evolutionary Algorithm | ADME related properties | [Ekins et al., 2010]
NovoFLAP | Weighted | Evolutionary Algorithm | Ligand | [Damewood et al., 2010]
PhDD | Weighted | Workflow | Pharmacophore | [Huang et al., 2010]
DOGS | Weighted | Workflow | Ligand | [Hartenfeller et al., 2012]
LiGen | Weighted | Workflow | Ligand, Structure & Pharmacophore | [Beccari et al., 2013]
MOARF | Weighted | Workflow | Ligand & Structure | [Firth et al., 2015]
Synopsis | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Daeyaert and Deem, 2016]
Motivation & Objectives
Motivation
Find suitable search parameters for an algorithm in a given
problem, and
Automate this process.
Objectives
Design and develop an algorithm:
To search for the fittest search parameters of MOEAs,
To be problem agnostic, and
Evaluate on our previously proposed eMEGA for molecular
De Novo Design.
About Self-Adaptive MOEA
Meta-level algorithmic approach influenced by Grefenstette
[Grefenstette, 1986] and Kramer [Kramer, 2010],
Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA)
[Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
Meta-level EA is a modified MOGA
[Fonseca and Fleming, 1998],
Optimise eMEGA parameters:
Mutation Rate,
Crossover Rate,
Parent Selection Type,
Population Diversity Type.
Objective fitness functions for the meta-level:
The percentage of non-dominated solutions each eMEGA has
per iteration,
The percentage of unique solutions each eMEGA has per
iteration, and
The Pareto Front Hypervolume each eMEGA has per iteration.
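To make the three meta-level objectives concrete, the sketch below computes them for one eMEGA population. It is an illustration under stated assumptions, not the thesis implementation: both inner objectives are minimised and scaled to [0, 1], each solution is identified by a hypothetical string key (for example a canonical SMILES), and the hypervolume is the two-objective case measured against the reference point (1, 1).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (duplicates are kept)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def fraction_non_dominated(points: List[Point]) -> float:
    return len(non_dominated(points)) / len(points)


def fraction_unique(keys: List[str]) -> float:
    return len(set(keys)) / len(keys)


def hypervolume_2d(points: List[Point], ref: Point = (1.0, 1.0)) -> float:
    """Area dominated by the front and bounded by ref, for two minimised objectives."""
    front = sorted(set(non_dominated(points)))  # ascending in objective 1
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        if x < ref[0] and y < prev_y:
            # Horizontal strip between this point and the previous one.
            area += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return area


# Made-up population: objective values and identity keys of five solutions.
objs = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.6), (0.5, 0.5)]
keys = ["A", "B", "C", "D", "B"]

frac_nd = fraction_non_dominated(objs)
frac_unique = fraction_unique(keys)
hv = hypervolume_2d(objs)
print(frac_nd, frac_unique, hv)
```

In a meta-level search, these three numbers would be the fitness vector of the parameter set that produced the population.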
Self-Adaptive MOEA Pseudocode
Self-Adaptive MOEA Chromosome
Chromosomes Example
Objective Fitness Functions
Objective Fitness Function | Range | Example
Non-dominated Solutions % | 0 - 1.0 | 0.90
Unique Solutions % | 0 - 1.0 | 0.88
Pareto Front Hypervolume | 0 - 1.0 | 0.56
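One way to picture a meta-level chromosome is as a record holding the four eMEGA parameters that the meta-level optimises. The sketch below is hypothetical: the class name, field names, and mutation scheme are assumptions, and the option lists contain only the values that appear in the results tables of this presentation (roulette, tournament, best; genotype, phenotype).

```python
import random
from dataclasses import dataclass, replace

SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class MetaChromosome:
    """Hypothetical encoding of one candidate eMEGA parameter set."""
    mutation_rate: float
    crossover_rate: float
    selection_type: str
    diversity_type: str


def random_chromosome(rng: random.Random) -> MetaChromosome:
    return MetaChromosome(
        mutation_rate=rng.random(),
        crossover_rate=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


def mutate(c: MetaChromosome, rng: random.Random, sigma: float = 0.05) -> MetaChromosome:
    """Change one gene: Gaussian step for rates (clamped to [0, 1]), resample for types."""
    gene = rng.choice(("mutation_rate", "crossover_rate", "selection_type", "diversity_type"))
    if gene in ("mutation_rate", "crossover_rate"):
        value = min(1.0, max(0.0, getattr(c, gene) + rng.gauss(0.0, sigma)))
    elif gene == "selection_type":
        value = rng.choice(SELECTION_TYPES)
    else:
        value = rng.choice(DIVERSITY_TYPES)
    return replace(c, **{gene: value})


rng = random.Random(42)
parent = MetaChromosome(0.15, 0.80, "roulette", "phenotype")
child = mutate(parent, rng)
print(parent)
print(child)
```

Keeping the record immutable means each mutated parameter set is a new object, which makes it simple to evaluate parents and children side by side.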
eMEGA Chromosome
Graph based, and
Information related to evolutionary design process.
Self-Adaptive MOEA Flowchart
Self-Adaptive MOEA Showcases
Validation of Self-Adaptive MOEA: About
Compare SAMOEA, eMEGA and MOARF
[Firth et al., 2015].
Design molecules that are similar in structure and chemical
properties to the target molecule, Seliciclib.
Figure: Seliciclib (CYC202, R-roscovitine)
Validation of Self-Adaptive MOEA: Starting
Datasets
Starting Molecules datasets:
Maybridge’s Screening Library that contains 53953 molecules
(Dataset 1),
Asinex’s Elite Libraries that contains 104577 molecules
(Dataset 2).
Validation of Self-Adaptive MOEA: Settings
eMEGA Settings
Dataset | Objectives | Population | Iterations | Evolutionary Operations
Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype
SAMOEA Settings
Level | Dataset | Objectives | Population | Iterations | Evolutionary Operations
SAMOEA | Dataset 1, Dataset 2 | Non Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype
eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA's chromosomes.
Validation of Self-Adaptive MOEA: Results
Validation of Self-Adaptive MOEA: Results - Search Settings (1)
SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Mutation     Crossover    Selection   Diversity  Non Dominated  Unique Solutions  Rank
Probability  Probability  Type        Type       %              %
0.029        0.694        roulette    genotype   0.9            0.986             1
0.175        0.818        roulette    phenotype  0.914          0.961             1
0.172        0.818        tournament  phenotype  0.934          0.9533            1
0.026        0.694        roulette    phenotype  0.928          0.955             1
0.001        0.963        roulette    phenotype  0.982          0.848             1
0.177        0.818        roulette    phenotype  0.921          0.956             1
0.083        0.73         tournament  phenotype  0.95           0.946             1
0.086        0.798        tournament  genotype   0.976          0.928             1
0.172        0.818        best        genotype   0.914          0.973             2
0.176        0.818        roulette    genotype   0.9312         0.956             2
Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the
actual %. The smaller the number listed here the better. 'Rank' is their non-dominance
rank.
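The Rank column of the Maybridge table can be checked with a small non-dominance ranking sketch. It assumes both listed values are minimised (as the note states) and uses a simple peel-off-the-front procedure; the function names are illustrative and are not taken from the MEGA framework.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points: Sequence[Sequence[float]]) -> List[int]:
    """Rank 1 goes to the non-dominated points; remove them, rank the next front 2, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs copied from the Maybridge table, in row order.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955), (0.982, 0.848),
    (0.921, 0.956), (0.95, 0.946), (0.976, 0.928), (0.914, 0.973), (0.9312, 0.956),
]
ranks = non_dominance_ranks(maybridge)
print(ranks)  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

For these ten rows the sketch gives rank 1 to the first eight settings and rank 2 to the last two, matching the table.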
Validation of Self-Adaptive MOEA: Results - Search Settings (2)
SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Mutation     Crossover    Selection   Diversity  Non Dominated  Unique Solutions  Rank
Probability  Probability  Type        Type       %              %
0.105        1.0          best        phenotype  0.988          0.931             1
0.139        0.963        tournament  phenotype  0.962          0.956             1
0.089        0.694        tournament  genotype   0.976          0.943             1
0.139        0.969        best        phenotype  0.96           0.96              1
0.108        0.69         tournament  genotype   0.955          0.962             1
0.1          1.0          best        phenotype  0.988          0.942             1
0.088        0.685        tournament  genotype   0.96           0.962             1
0.139        0.966        roulette    phenotype  0.965          0.948             1
0.089        0.709        tournament  genotype   0.964          0.957             2
Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the
actual %. The smaller the number listed here the better. 'Rank' is their non-dominance
rank.
Use Case 1: About
Design molecules that bind to ER-α based on:
Structural similarity to Tamoxifen, and
Structural dissimilarity to Ibuproxam.
Figure: (a) Tamoxifen. (b) Ibuproxam.
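The two objectives of this use case can be sketched as a pair of values to minimise. The slides do not say which similarity measure or fingerprint is used, so the Tanimoto coefficient over sets of "on" fingerprint bits is an assumption here, and the bit sets are made-up toy data rather than real fingerprints of Tamoxifen or Ibuproxam.

```python
from typing import Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def objectives(candidate: Set[int], reference: Set[int], anti_reference: Set[int]) -> Tuple[float, float]:
    """Return (dissimilarity to the reference, similarity to the anti-reference).

    Both values are to be minimised: the candidate should resemble the
    reference molecule and differ from the anti-reference molecule.
    """
    return (1.0 - tanimoto(candidate, reference), tanimoto(candidate, anti_reference))


# Hypothetical bit sets standing in for real fingerprints.
reference_bits = {1, 2, 3, 4, 5, 6}      # plays the role of Tamoxifen
anti_reference_bits = {5, 6, 7, 8}       # plays the role of Ibuproxam
candidate_bits = {1, 2, 3, 4, 9}

scores = objectives(candidate_bits, reference_bits, anti_reference_bits)
print(scores)
```

In practice the fingerprints would come from a cheminformatics toolkit; only the shape of the two-objective evaluation is illustrated.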
Use Case 1: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (Substances with "clean" reactivity),
In-vitro (Substances reported or inferred active at 10 µM or better in direct binding assays) and
Now (Immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.
Use Case 1: Results - In objective space
Use Case 1: Results - Designed molecules
Use Case 1: Results - AutoDock Vina docking
Molecule Id                Docking Affinity (kcal/mol)
Tamoxifen                  -8.2
DnD 6 SP 20 4 X 13a        -7.9
DnD 31 SP 150 37 M 19      -7.9
DnD 8 SP 9 2 M 13          -7.8
DnD 4 SP 199 49 X 46b      -7.7
DnD 12 SP 75 18 M 13       -7.6
DnD 31 SP 6 1 M 16         -7.2
DnD 15 SP 168 41 M 0       -7.2
DnD 11 SP 74 18 M 4        -7.1
DnD 31 SP 193 48 X 76b     -6.9
DnD 1 SP 78 19 X 84a       -6.8
Use Case 1: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto              Rank
Probability  Probability  Type        Type       %              Hypervolume
0.15777      0.80279      tournament  genotype   0.634          0.341               1
0.15613      0.88305      tournament  genotype   0.634          0.341               1
0.15627      0.88891      tournament  genotype   0.634          0.341               1
0.15688      0.88891      roulette    genotype   0.649          0.340               1
0.00552      0.94308      best        genotype   0.624          0.427               1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the
number listed here the better. 'Rank' is their non-dominance rank.
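For readers unfamiliar with the Pareto hypervolume column, the following is a minimal sketch of the standard two-objective hypervolume for minimisation. The slides do not state the reference point, the normalisation, or how SAMOEA orients this value, so the reference point (1.0, 1.0) and the example front are assumptions made purely for illustration.

```python
from typing import Iterable, Tuple


def hypervolume_2d(front: Iterable[Tuple[float, float]],
                   reference: Tuple[float, float] = (1.0, 1.0)) -> float:
    """Area dominated by `front` and bounded by `reference` (both objectives minimised)."""
    # Only points that improve on the reference in both objectives contribute.
    pts = sorted(p for p in front if p[0] < reference[0] and p[1] < reference[1])
    area = 0.0
    best_y = reference[1]
    for x, y in pts:
        if y < best_y:
            # New horizontal slab between the previous best y and this y.
            area += (reference[0] - x) * (best_y - y)
            best_y = y
        # Points with y >= best_y are dominated and add no area.
    return area


# Hypothetical front; (0.6, 0.6) is dominated by (0.5, 0.5) and is ignored.
example_front = [(0.2, 0.8), (0.5, 0.5), (0.6, 0.6), (0.8, 0.2)]
hv = hypervolume_2d(example_front)
print(round(hv, 6))  # 0.37
```

A larger area means the front covers more of the objective space relative to the chosen reference point.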
Use Case 3: About
Design molecules that bind to ER-α based on:
Structural similarity to Raloxifene, and
Chemical Properties similarity to Raloxifene.
Figure: Raloxifene.
Use Case 3: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (Substances with "clean" reactivity),
In-vitro (Substances reported or inferred active at 10 µM or better in direct binding assays) and
Now (Immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.
Use Case 3: Results - In objective space
Use Case 3: Results - Designed molecules
Use Case 3: Results - AutoDock Vina docking
Molecule Id                Docking Affinity (kcal/mol)
DnD 31 SP 194 48 M 49      -8.2
DnD 34 SP 197 49 X 13a     -5.9
Raloxifene                 -2.2 (-11.70 PubChem)
Use Case 3: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto              Rank
Probability  Probability  Type        Type       %              Hypervolume
0.12927      0.98597      roulette    genotype   0.997          0.274               1
0.12897      0.98588      roulette    genotype   0.997          0.274               1
0.12933      0.98588      roulette    genotype   0.997          0.274               1
0.12946      0.98559      roulette    genotype   0.997          0.274               1
0.12928      0.98582      roulette    genotype   0.997          0.274               1
0.12897      0.98588      tournament  genotype   0.997          0.274               1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the
number listed here the better. 'Rank' is their non-dominance rank.
Use Case 4: About
Design molecules that bind to Proteasome B5 based on:
Structural similarity to Ixazomib, and
Chemical Properties similarity to Ixazomib.
Figure: Ixazomib.
Use Case 4: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (Substances with "clean" reactivity),
In-vitro (Substances reported or inferred active at 10 µM or better in direct binding assays) and
Now (Immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.
Use Case 4: Results - In objective space
Use Case 4: Results - Designed molecules
Use Case 4: Results - AutoDock 4 docking
Molecule Id                Docking Affinity (kcal/mol)
DnD 19 SP 196 48 X 59b     -7.19
DnD 49 SP 193 48 X 123b    -6.68
DnD 1 SP 196 48 X 67a      -6.08
Use Case 4: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation     Crossover    Selection   Diversity  Non Dominated  Pareto              Rank
Probability  Probability  Type        Type       %              Hypervolume
0.09507      0.98194      tournament  phenotype  0.993          0.442               1
0.09507      0.9819       roulette    phenotype  0.991          0.442               1
0.09471      0.98178      roulette    genotype   0.997          0.426               1
0.09484      0.98183      roulette    phenotype  0.996          0.441               1
0.09277      0.98235      roulette    genotype   0.996          0.441               1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the
number listed here the better. 'Rank' is their non-dominance rank.
Self-Adaptive MOEA Showcases Discussion
SAMOEA proposed interesting solutions for every problem it
was applied to,
Further in-vitro investigation is required, and
The eMEGA settings proposed by SAMOEA differ by
problem and dataset (no silver bullet).
Concluding Remarks
Concluding Remarks - LiSIs platform
Features a Web-based Virtual Screening platform focused on
Cancer Chemoprevention Research.
To be expanded in the future with tools featuring the
algorithms of the MEGA framework.
A number of SWs were implemented for:
preparing docking models,
preparing predictive models,
performing docking experiments,
using predictive models to predict biochemical properties
and behaviour, and
performing VS workflows.
Concluding Remarks - Self-Adaptive MOEA (1)
Drawbacks:
Needs a lot of time to terminate, and
Very slow convergence.
Concluding Remarks - Self-Adaptive MOEA (2)
Advantages:
Searches a larger space,
Generates far more solutions per iteration,
Proposes the fittest parameter sets that eMEGA should
use for the given problem,
Has been built to be adaptable,
Uses objective fitness functions that can evaluate the
effectiveness and the progression of any MOEA,
Can be used on other problems,
SAMOEA’s chromosome can be expanded with additional
search parameters, and
Leverages multi-core parallelism (needs more memory).
Future Work
Future Work - LiSIs platform
Develop LiSIs 2.0:
Based on the latest Galaxy platform, and
Redesign of tools to be compatible with Galaxy’s ToolShed
for easy deployment,
Update LiSIs with a feature to visualise intermediate results
from various tools,
Expand LiSIs tools with tools featuring the MEGA line-up of
algorithms and SAMOEA,
Explore resource management in SWMSs:
Novel Multi-Objective Optimization SW design approaches, and
Novel Multi-Objective Optimization SW scheduling
approaches.
Future Work - Self-Adaptive MOEA
Optimise the MEGA framework (memory management and
parallelism),
Implement a self-adaptive technique for selecting genetic
operators,
Extend Self-Adaptive MOEA to use other MOEAs,
Implement models for other problems, and
Implement new objective functions.
List of Publications
Table of Contents
6 List of Publications
7 References
8 Backup Frames
Validation of Self-Adaptive MOEA
Use Case 1
Use Case 2
Use Case 3
Use Case 4
List of Publications
Book Chapters
C. A. Nicolaou and C. C. Kannas, “Molecular Library Design Using Multi-Objective Optimization Methods,” in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.
Journals
C. Kannas et al., “LiSIs: An Online Scientific Workflow System for Virtual Screening,” Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.
C. A. Nicolaou, C. Kannas, and E. Loizidou, “Multi-objective optimization methods in de novo drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.
C. Nicolaou, C. Kannas, and C. Pattichis, “Knowledge-driven multi-objective de novo drug design,” Chemistry Central Journal, vol. 3, p. P22, 2009.
Conferences
C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22-24 June 2017, pp. 1–6.
P. Hasapis et al., “Molecular clustering via knowledge mining from biomedical scientific corpora,” in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1–5.
C. C. Kannas et al., “A workflow system for virtual screening in cancer chemoprevention,” in 2012 IEEE 12th International Conference on Bioinformatics and Bioengineering (BIBE), 2012, pp. 439–446.
K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, “Open source workflow systems in life sciences informatics,” in 2012 IEEE 12th International Conference on Bioinformatics and Bioengineering (BIBE), 2012, pp. 552–558.
C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A Parallel implementation of a Multi-objective Evolutionary Algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
Abstracts
C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11-15 July 2017.
References
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 66 / 130
Table of Contents
6 List of Publications
7 References
8 Backup Frames
Validation of Self-Adaptive MOEA
Use Case 1
Use Case 2
Use Case 3
Use Case 4
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 66 / 130
References I
Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G.
(2013). LiGen: A High Performance Workflow for Chemistry
Driven de Novo Design. Journal of Chemical Information and
Modeling.
Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G.,
Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010).
Galaxy: A Web-Based Genome Analysis Tool for
Experimentalists. In Current Protocols in Molecular Biology.
John Wiley & Sons, Inc.
Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for
Efficient De Novo Design of Multi-functional Molecules.
Molecular Informatics, pages n/a–n/a.
References II
Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010).
NovoFLAP: A ligand-based de novo design approach for the
generation of medicinally relevant ideas. Journal of Chemical
Information and Modeling, 50(7):1296–1303.
Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand
design by multiobjective evolutionary optimization. Journal of
Chemical Information and Modeling, 48(3):679–690.
Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving
molecules using multi-objective optimization: applying to
ADME/Tox. Drug Discovery Today, 15(11-12):451–460.
References III
Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders,
J. (2008). The use of ligand-based de novo design for scaffold
hopping and sidechain optimization: two case studies. Bioorganic
& Medicinal Chemistry, 16(1):422–427.
Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015).
MOARF, an Integrated Workflow for Multiobjective
Optimization: Implementation, Synthesis, and Biological
Evaluation. Journal of Chemical Information and Modeling.
Fonseca, C. and Fleming, P. (1998). Multiobjective optimization
and multiple constraint handling with evolutionary algorithms. I.
A unified formulation. IEEE Transactions on Systems, Man and
Cybernetics, Part A: Systems and Humans, 28(1):26–37.
References IV
Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski,
L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J.,
Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A
Platform for Interactive Large-Scale Genome Analysis. Genome
Research, 15(10):1451–1455.
Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T.
(2010). Galaxy: A comprehensive approach for supporting
accessible, reproducible, and transparent computational research
in the life sciences. Genome Biology, 11(8):R86.
Grefenstette, J. (1986). Optimization of Control Parameters for
Genetic Algorithms. IEEE Transactions on Systems, Man and
Cybernetics, 16(1):122–128.
References V
Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J.
H. N. (2005). A novel microplate reader-based high-throughput
assay for estrogen receptor binding. International Journal of
Environmental Analytical Chemistry, 85(3):149–161.
Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F.,
Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012).
DOGS: Reaction-Driven de novo Design of Bioactive
Compounds. PLoS Comput Biol, 8(2):e1002380.
Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new
pharmacophore-based de novo design method of drug-like
molecules combined with assessment of synthetic accessibility.
Journal of Molecular Graphics and Modelling, 28(8):775–787.
References VI
Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva,
C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D.,
Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser,
C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs:
An Online Scientific Workflow System for Virtual Screening.
Combinatorial Chemistry & High Throughput Screening,
18(3):281 – 295.
Kramer, O. (2010). Evolutionary self-adaptation: a survey of
operators and strategy parameters. Evolutionary Intelligence,
3(2):51–65.
References VII
Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG:
Fragment Optimized Growth algorithm for the de novo
generation of molecules occupying druglike chemical space.
Journal of Chemical Information and Modeling, 49(7):1630–1642.
Medina-Franco, J. L., L´opez-Vallejo, F., Kuck, D., and Lyko, F.
(2010). Natural products as DNA methyltransferase inhibitors: a
computer-aided discovery approach. Molecular Diversity,
15:293–304.
Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De
Novo Drug Design Using Multiobjective Evolutionary Graphs.
Journal of Chemical Information and Modeling, 49(2):295–307.
References VIII
Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b).
Optimal graph design using a knowledge-driven multi-objective
evolutionary graph algorithm. In 2009 9th International
Conference on Information Technology and Applications in
Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.
Backup Frames
Pareto Ranking
LiSIs Showcase - Known ER Ligands
No.  Estrogen Ligand        Docking Score ER-α  Docking Score ER-β
1    Raloxifene             -11.70              -8.72
2    Lilly-117018           -11.53              -3.80
3    3-HydroxyTamoxifen     -11.02              N/A
4    Nafoxidine             -10.88              N/A
5    ICI-182780             -10.73              N/A
6    Pyrolidine             -10.04              N/A
7    Clomiphene A           -10.01              N/A
8    Nitrofinene Citrate    -9.87               N/A
9    ICI-164384             -9.82               -9.13
10   Moxestrol              -9.38               -9.77
11   Naringenine            -8.55               -7.80
12   Triphenylethylene      -8.50               N/A
13   Afema                  -8.15               -7.78
14   Danazol                -6.99               N/A
15   Ethamoxytriphetol      -6.67               N/A
16   4-HydroxyTamoxifen     -6.60               N/A
17   Dioxin                 -6.22               N/A
18   Estralutin             -5.86               -3.80
19   Cyclopentanone         -4.88               N/A
20   Miproxifene Phosphate  -4.48               N/A
21   EM-800                 N/A                 N/A
Note: The list was retrieved from PubChem and includes compounds characterized as
“estrogen ligands”. N/A: no binding affinity.
LiSIs Showcase - Natural-like Rule of 5 filter
GRANATUM Rule of 5 filter:
1 MW between 160 and 700,
2 HBD less than or equal to 5,
3 HBA less than or equal to 10,
4 TPSA less than 140, and
5 cLogP between -0.4 and 5.6.
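The five rules above can be expressed as a single predicate. The following is a minimal sketch, not the LiSIs implementation: the function name and argument names are hypothetical, the descriptor values are assumed to be computed elsewhere, and treating "between" as inclusive on both ends is an assumption the slide does not confirm.

```python
def passes_granatum_ro5(mw, hbd, hba, tpsa, clogp):
    """Return True if the descriptor values satisfy all five filter rules.

    Thresholds are copied from the slide; inclusive bounds for the
    "between" rules are an assumption.
    """
    return (
        160 <= mw <= 700           # rule 1: molecular weight
        and hbd <= 5               # rule 2: hydrogen bond donors
        and hba <= 10              # rule 3: hydrogen bond acceptors
        and tpsa < 140             # rule 4: topological polar surface area
        and -0.4 <= clogp <= 5.6   # rule 5: calculated logP
    )


# Hypothetical descriptor values, for illustration only.
print(passes_granatum_ro5(mw=350.0, hbd=2, hba=5, tpsa=80.0, clogp=3.1))
print(passes_granatum_ro5(mw=350.0, hbd=2, hba=5, tpsa=140.0, clogp=3.1))
```

The second call fails only rule 4, because TPSA must be strictly less than 140.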
eMEGA Settings
Table: eMEGA experimental design settings
Datasets: Dataset 1, Dataset 2
Objectives: Structural Similarity, Chemical Descriptor Similarity
Population: 500
Iterations: 500
Evolutionary Operations:
Mutation Probability: 15%
Crossover Probability: 80%
Selection Type: Roulette
Diversity Type: Genotype
SAMOEA Settings
Table: SAMOEA experimental design settings
SAMOEA
Datasets: Dataset 1, Dataset 2
Objectives: Non-Dominated Solutions Percentage, Unique Solutions Percentage
Population: 20
Iterations: 100
Evolutionary Operations:
Mutation Probability: 15%
Crossover Probability: 80%
Selection Type: Roulette
Diversity Type: Phenotype
eMEGA
Datasets: Dataset 1, Dataset 2
Objectives: Structural Similarity, Chemical Descriptor Similarity
Population: 100
Iterations: 1
Evolutionary Operations: defined at run time, based on SAMOEA's chromosomes.
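As an illustration of eMEGA settings being defined at run time from a SAMOEA chromosome, the sketch below decodes a four-gene tuple into a validated settings object. The class name, field names, and gene order are hypothetical and not taken from the thesis code. Only the four parameters and their possible values (roulette, tournament, best; genotype, phenotype) come from the slides.

```python
from dataclasses import dataclass

# Values that appear in the settings tables of this deck.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EmegaSettings:
    """Hypothetical container for one eMEGA configuration."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str

    def __post_init__(self):
        # Probabilities are stored as fractions in [0, 1].
        for name in ("mutation_probability", "crossover_probability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError(f"unknown selection type: {self.selection_type}")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError(f"unknown diversity type: {self.diversity_type}")


def decode_chromosome(genes):
    """Map a 4-gene tuple (assumed order) to an EmegaSettings instance."""
    mutation, crossover, selection, diversity = genes
    return EmegaSettings(float(mutation), float(crossover),
                         str(selection).lower(), str(diversity).lower())


# Fixed baseline from the eMEGA settings table, and one run-time candidate.
baseline = EmegaSettings(0.15, 0.80, "roulette", "genotype")
candidate = decode_chromosome((0.175, 0.818, "Roulette", "phenotype"))
print(baseline)
print(candidate)
```

Validating in `__post_init__` means an out-of-range gene is rejected before an eMEGA run is started with it.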
Virtual Machine Specifications
Table: Specifications of the virtual machine on which the experimental runs were
performed
Linux Virtual Machine
CPU 4x Virtual CPU @ 2GHz
RAM 16GB
OS CentOS 6
eMEGA Maybridge Run 1
Figure: eMEGA Run 1 results for Maybridge dataset.
eMEGA Maybridge Run 2
Figure: eMEGA Run 2 results for Maybridge dataset.
eMEGA Maybridge Run 3
Figure: eMEGA Run 3 results for Maybridge dataset.
eMEGA Maybridge Run 4
Figure: eMEGA Run 4 results for Maybridge dataset.
eMEGA Maybridge Run 5
Figure: eMEGA Run 5 results for Maybridge dataset.
eMEGA Maybridge All Runs
Figure: eMEGA results for Maybridge dataset.
eMEGA Maybridge All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Maybridge dataset.
eMEGA Maybridge All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Maybridge dataset compared with
Seliciclib. The part of each molecule highlighted in red is its common
core with Seliciclib.
eMEGA Asinex Run 1
Figure: eMEGA Run 1 results for Asinex dataset.
eMEGA Asinex Run 2
Figure: eMEGA Run 2 results for Asinex dataset.
eMEGA Asinex Run 3
Figure: eMEGA Run 3 results for Asinex dataset.
eMEGA Asinex Run 4
Figure: eMEGA Run 4 results for Asinex dataset.
eMEGA Asinex Run 5
Figure: eMEGA Run 5 results for Asinex dataset.
eMEGA Asinex All Runs
Figure: eMEGA results for Asinex dataset.
eMEGA Asinex All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Asinex dataset.
eMEGA Asinex All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Asinex dataset compared with
Seliciclib. The part of each molecule highlighted in red is its common
core with Seliciclib.
SAMOEA Maybridge Run 1
Figure: SAMOEA Run 1 results for Maybridge dataset.
SAMOEA Maybridge Run 2
Figure: SAMOEA Run 2 results for Maybridge dataset.
SAMOEA Maybridge Run 3
Figure: SAMOEA Run 3 results for Maybridge dataset.
SAMOEA Maybridge Run 4
Figure: SAMOEA Run 4 results for Maybridge dataset.
SAMOEA Maybridge Run 5
Figure: SAMOEA Run 5 results for Maybridge dataset.
SAMOEA Maybridge All Runs
Figure: SAMOEA results for Maybridge dataset.
SAMOEA Maybridge All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Maybridge dataset.
SAMOEA Maybridge All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Maybridge dataset compared with
Seliciclib. The part of each molecule highlighted in red is its common
core with Seliciclib.
SAMOEA Top 10 proposed settings for eMEGA
for Maybridge dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Unique Solutions %  Rank
0.029                 0.694                  roulette        genotype        0.9              0.986               1
0.175                 0.818                  roulette        phenotype       0.914            0.961               1
0.172                 0.818                  tournament      phenotype       0.934            0.9533              1
0.026                 0.694                  roulette        phenotype       0.928            0.955               1
0.001                 0.963                  roulette        phenotype       0.982            0.848               1
0.177                 0.818                  roulette        phenotype       0.921            0.956               1
0.083                 0.73                   tournament      phenotype       0.95             0.946               1
0.086                 0.798                  tournament      genotype        0.976            0.928               1
0.172                 0.818                  best            genotype        0.914            0.973               2
0.176                 0.818                  roulette        genotype        0.9312           0.956               2
Note: The values listed for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the
actual fraction, so smaller values are better. 'Rank' is the non-dominance
rank of each setting.
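The non-dominance rank can be illustrated on the two objective columns of the Maybridge table. The sketch below is a plain front-peeling procedure written for this illustration, not the SAMOEA code. It assumes both objectives are minimized, as the note states, and it ranks only the ten listed rows, whereas the original ranks were presumably computed over the whole SAMOEA population.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_ranks(points):
    """Assign rank 1 to the non-dominated front, rank 2 to the next front, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # A point is in the current front if nothing still remaining dominates it.
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs from the Maybridge table, in row order.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
    (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
    (0.914, 0.973), (0.9312, 0.956),
]
print(pareto_ranks(maybridge))  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

For these ten rows the computed ranks agree with the 'Rank' column. The ninth row is dominated by the second (same first objective, worse second objective) and the tenth by the sixth.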
SAMOEA Asinex Run 1
Figure: SAMOEA Run 1 results for Asinex dataset.
SAMOEA Asinex Run 2
Figure: SAMOEA Run 2 results for Asinex dataset.
SAMOEA Asinex Run 3
Figure: SAMOEA Run 3 results for Asinex dataset.
SAMOEA Asinex Run 4
Figure: SAMOEA Run 4 results for Asinex dataset.
SAMOEA Asinex All Runs
Figure: SAMOEA results for Asinex dataset.
SAMOEA Asinex All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Asinex dataset.
SAMOEA Asinex All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Asinex dataset compared with
Seliciclib. The part of each molecule highlighted in red is its common
core with Seliciclib.
SAMOEA Top 10 proposed settings for eMEGA
for Asinex dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Unique Solutions %  Rank
0.105                 1.0                    best            phenotype       0.988            0.931               1
0.139                 0.963                  tournament      phenotype       0.962            0.956               1
0.089                 0.694                  tournament      genotype        0.976            0.943               1
0.139                 0.969                  best            phenotype       0.96             0.96                1
0.108                 0.69                   tournament      genotype        0.955            0.962               1
0.1                   1.0                    best            phenotype       0.988            0.942               1
0.088                 0.685                  tournament      genotype        0.96             0.962               1
0.139                 0.966                  roulette        phenotype       0.965            0.948               1
0.089                 0.709                  tournament      genotype        0.964            0.957               2
Note: The values listed for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the
actual fraction, so smaller values are better. 'Rank' is the non-dominance
rank of each setting.
MOARF Results
Figure: MOARF’s results compared with Seliciclib.
Compare SAMOEA, eMEGA and MOARF
Figure: Comparison of all Top 10 results with MOARF’s results and
Seliciclib.
Discussion (1)
eMEGA and SAMOEA generate molecules that approximate
Seliciclib,
The common core shared with Seliciclib differs between datasets
and between algorithms,
MOARF approximates Seliciclib better than eMEGA and
SAMOEA:
It generates molecules in a more chemistry-oriented way, with
fewer stochastic operations,
It starts from a core selected for the target and then attaches
new fragments to it,
SAMOEA explores the space better than eMEGA and
MOARF.
Discussion (2)
The tables of SAMOEA-proposed eMEGA settings show that
different settings are favoured for each dataset.
Maybridge dataset:
Mutation probability around 17%,
Crossover probability around 80%,
Selection type either roulette or tournament, and
Diversity type: both genotype and phenotype are valid choices.
Asinex dataset:
Mutation probability around 10%,
Crossover probability around 96%,
Selection type either best or tournament, and
Diversity type: both genotype and phenotype are valid choices.
Discussion (3)
The objective fitness scores for the proposed settings are very
high, which means that the actual percentages are very low, mostly
below 10% and often below 5%. From this we can conclude the following:
eMEGA instances generate a large number of identical
solutions even though they have different
configurations. We noticed the same behaviour in
previous experiments comparing MEGA, eMEGA and
MOGA [Nicolaou et al., 2009b], and
The objective fitness functions we chose to use in SAMOEA
compete with each other, which means that getting eMEGA instances
to generate a high number of unique and non-dominated
solutions (above 20%) proves to be a difficult task.
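Because the tables list 1 minus the actual fraction, converting a listed score back to a percentage is a one-line calculation. The helper name below is hypothetical, and the example scores are taken from the settings tables.

```python
def actual_percentage(score):
    """Convert a listed score (1 minus the actual fraction) to a percentage."""
    return round((1.0 - score) * 100.0, 2)


# First Maybridge row: Non Dominated % = 0.9, Unique Solutions % = 0.986.
print(actual_percentage(0.9))    # 10.0
print(actual_percentage(0.986))  # 1.4
# Fourth Asinex row: both scores are 0.96.
print(actual_percentage(0.96))   # 4.0
```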
Use Case 1: Docked designed molecules (1)
Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.
Use Case 1: Docked designed molecules (2)
Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.
Use Case 1: Docked designed molecules (3)
Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.
Use Case 1: Docked designed molecules (4)
Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.
Use Case 1: Docked designed molecules (5)
Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.
Use Case 1: Docked designed molecules (6)
Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.
Use Case 1: Docked designed molecules (7)
Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.
Use Case 1: Docked designed molecules (8)
Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.
Use Case 1: Docked designed molecules (9)
Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.
Use Case 1: Docked designed molecules (10)
Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.
Use Case 2: About
Design molecules that bind to ER-α based on:
Structural similarity to Tamoxifen, and
Chemical Properties similarity to Tamoxifen.
Figure: Tamoxifen.
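The slides do not state which measure is used for structural similarity. As an assumption, the sketch below uses the Tanimoto coefficient on fingerprint bit sets, a common choice in cheminformatics. The bit sets are made up for illustration and are not real fingerprints of Tamoxifen or of any designed molecule.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical bit sets, for illustration only.
reference_bits = {1, 2, 3, 4}
candidate_bits = {3, 4, 5}
print(tanimoto(reference_bits, candidate_bits))  # 0.4
```

A value of 1.0 means the two bit sets are identical and 0.0 means they share no bits. In a minimization setting the objective would typically be 1 minus this value.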
Use Case 2: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (substances with "clean" reactivity),
In-vitro (substances reported or inferred active at 10 µM or
better in direct binding assays), and
Now (immediate delivery, includes in-stock and agent).
The collection contains 7035 molecules.
Use Case 2: Results - In objective space
Use Case 2: Results - Designed molecules
Use Case 2: Results - AutoDock Vina docking
Molecule Id Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b -10.1
DnD 17 SP 199 49 M 4 -10
DnD 33 SP 189 47 X 66b -9.9
DnD 48 SP 193 48 M 5 -9.6
Tamoxifen -8.2
Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA
Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Pareto Hypervolume  Rank
0.02707               0.97973                tournament      genotype        0.983            0.153               1
0.02758               0.97965                tournament      phenotype       0.988            0.152               1
Note: The values listed for 'Non Dominated %' are 1 minus the actual fraction, so
smaller values are better. 'Rank' is the non-dominance rank of each setting.
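The table reports a Pareto hypervolume, but the slides do not say how it is computed. As a hedged illustration, the sketch below computes the hypervolume of a two-objective front under minimization, relative to a reference point. The function name, the reference point (1, 1), and the example points are assumptions and do not reproduce the values in the table.

```python
def hypervolume_2d(points, ref):
    """Area dominated by the points and bounded by ref, with both objectives minimized."""
    # Only points that are strictly better than the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y < best_y:
            # Add the new horizontal strip this point contributes.
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical front in a normalized objective space.
front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
print(hypervolume_2d(front, ref=(1.0, 1.0)))
```

Sweeping the points in order of the first objective means each point adds only the strip not already covered, so dominated points contribute nothing.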
Use Case 2: Docked designed molecules (1)
Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.
Use Case 2: Docked designed molecules (2)
Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.
Use Case 2: Docked designed molecules (3)
Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.
Use Case 2: Docked designed molecules (4)
Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.
Use Case 3: Docked designed molecules (1)
Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.
Use Case 3: Docked designed molecules (2)
Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.
Use Case 4: Docked designed molecules (1)
Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to
Proteasome B5.
Use Case 4: Docked designed molecules (2)
Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to
Proteasome B5.
Use Case 4: Docked designed molecules (3)
Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to
Proteasome B5.

 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 

Recently uploaded (20)

Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 

CKannas PhD Thesis Slides

  • 1. Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics. Christos C. Kannas, Computer Science, University of Cyprus, 6th June 2017
  • 2. Table of Contents
      1 Introduction: Scientific Workflow Management Systems; Self-Adaptive Multi-Objective Evolutionary Algorithms; Virtual Screening & De Novo Molecular Design
      2 Life Sciences Informatics platform: About Life Sciences Informatics platform; LiSIs Showcase; LiSIs Showcase Discussion
      3 Self-Adaptive Multi-Objective Evolutionary Algorithm: About Self-Adaptive MOEA; Self-Adaptive MOEA Showcases; Self-Adaptive MOEA Showcases Discussion
      4 Concluding Remarks: Concluding Remarks - LiSIs platform; Concluding Remarks - Self-Adaptive MOEA
      5 Future Work: Future Work - LiSIs platform; Future Work - Self-Adaptive MOEA
      C. C. Kannas (CS, UCY), Ph.D Thesis, 06-06-2017
  • 3. Introduction
  • 5. Scientific Workflow Management Systems
  • 6. SWMSs Application Domains
  • 7. Self-Adaptive Multi-Objective Evolutionary Algorithms. Multi-Objective Evolutionary Algorithms: a family of algorithms inspired by nature that evolve a population, apply mutation and crossover, select the fittest individuals by Pareto ranking, and handle 1 to 3 objectives. Self-Adaptive Techniques: optimise search parameters (Population Size, Mutation Rate, Crossover Rate, Generation Gap, Scaling Window) and optimise reproduction operators (Mutation Operator(s), Crossover Operator(s), Parent Selection Operator).
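The slide above mentions selecting individuals by Pareto ranking. As a minimal sketch of that idea, the Python below ranks a small population by repeatedly peeling off non-dominated fronts. It assumes all objectives are minimised; the front-peeling scheme, the function names and the sample population are illustrative assumptions, not necessarily the ranking used in the thesis.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives minimised: an assumption)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_rank(points: List[Sequence[float]]) -> List[int]:
    """Rank 1 is the non-dominated front; rank k is the front that
    becomes non-dominated once all lower ranks are removed."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # Individuals not dominated by any other remaining individual.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical two-objective population (values are made up).
population = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
print(pareto_rank(population))
```

A selection operator could then prefer individuals with a lower rank, for example by sorting the population on `pareto_rank` before choosing parents.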
  • 13. Drug Discovery Process - Steps
  • 14. Drug Discovery Process - Timeline
  • 15. Life Sciences Informatics platform
  • 17. Motivation & Objectives. Motivation: provide an easy-to-use, web-based platform focused on Virtual Screening (VS) of natural products and aimed at cancer chemoprevention researchers. Objectives: design and develop a web-based Scientific Workflow Management System (SWMS), provide tools for VS, and evaluate it on use cases for identifying novel chemopreventive agents.
  • 23. Scientific Workflow Management Systems for Virtual Screening
      Applications | Technology | Scientific Field(s)
      Open Source:
      Taverna | Java | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
      Galaxy | Python | Life Sciences, Bioinformatics
      KNIME | Java | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis
      Commercial:
      InforSense/DiscoveryNet | - | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
      Pipeline Pilot | - | Biology, Chemistry, Material Science
  • 24. Funding Support. The work has been partially supported through the EU-FP7 GRANATUM project, “A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and data for the design and execution of In Silico Models and Experiments in Cancer Chemoprevention”, contract number 270139. The work also supports the research of the EU-FP7 Linked2Safety project, “A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research”, contract number 288328.
  • 26. Life Sciences Informatics platform. Life Sciences Informatics (LiSIs) is a web-based SWMS for VS [Kannas et al., 2015]. LiSIs is based on the Galaxy SWMS [Goecks et al., 2010], [Blankenberg et al., 2010], [Giardine et al., 2005].
  • 28. LiSIs modules
  • 32. LiSIs Showcase
  • 33. LiSIs Showcase Information. LiSIs was successfully used to discover promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β). Datasets: 2414 compounds from Indofine, 55 compounds characterized by Medina-Franco et al. [Medina-Franco et al., 2010], and 21 known ER ligands retrieved from PubChem.
  • 38. LiSIs Showcase Workflow
  • 39. LiSIs Showcase Docking Results: (a) ER-α Docking Score, (b) ER-β Docking Score
  • 40. LiSIs Showcase Discussion. From the Indofine dataset (2414 compounds), we selected 18 potential ER ligands based on their natural-like criteria and docking results. These were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005], with minor modifications. 15 out of 18 compounds (83.3%) were experimentally confirmed active.
  • 43. Self-Adaptive Multi-Objective Evolutionary Algorithm
  • 45. Multi-Objective Algorithms for Molecular Design
      Name | MO Method | Search Method | Remarks | Reference
      EA-Inventor | Weighted | Evolutionary Algorithm | Ligand | [Feher et al., 2008]
      GANDI | Weighted | Parallel Evolutionary Algorithm | Structure | [Dey and Caflisch, 2008]
      FOG | Weighted | Evolutionary Algorithm | Ligand | [Kutchukian et al., 2009]
      MEGA | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Nicolaou et al., 2009a]
      PLD | Pareto based | Evolutionary Algorithm | ADME related properties | [Ekins et al., 2010]
      NovoFLAP | Weighted | Evolutionary Algorithm | Ligand | [Damewood et al., 2010]
      PhDD | Weighted | Workflow | Pharmacophore | [Huang et al., 2010]
      DOGS | Weighted | Workflow | Ligand | [Hartenfeller et al., 2012]
      LiGen | Weighted | Workflow | Ligand, Structure & Pharmacophore | [Beccari et al., 2013]
      MOARF | Weighted | Workflow | Ligand & Structure | [Firth et al., 2015]
      Synopsis | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Daeyaert and Deem, 2016]
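The table above separates "Weighted" from "Pareto based" multi-objective methods. A rough sketch of the difference follows: a weighted sum collapses the objectives into one score and returns a single winner that depends on the chosen weights, while a Pareto-based method keeps every non-dominated candidate. The candidate names, scores and weights are hypothetical, and both objectives are assumed to be maximised.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float]  # two objectives, both maximised (assumption)


def weighted_sum(scores: Scores, weights: Tuple[float, float]) -> float:
    """Collapse several objectives into a single scalar score."""
    return sum(w * s for w, s in zip(weights, scores))


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b: at least as good everywhere, better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def best_by_weights(cands: Dict[str, Scores],
                    weights: Tuple[float, float]) -> str:
    """Weighted approach: a single winner for a given weight vector."""
    return max(cands, key=lambda name: weighted_sum(cands[name], weights))


def pareto_set(cands: Dict[str, Scores]) -> List[str]:
    """Pareto-based approach: keep every non-dominated candidate."""
    return sorted(
        name for name in cands
        if not any(dominates(cands[other], cands[name])
                   for other in cands if other != name)
    )


# Hypothetical candidate molecules with made-up objective scores.
candidates = {
    "mol_a": (0.9, 0.2),
    "mol_b": (0.6, 0.6),
    "mol_c": (0.2, 0.9),
    "mol_d": (0.5, 0.5),
}

print(best_by_weights(candidates, (0.5, 0.5)))  # one compromise solution
print(best_by_weights(candidates, (0.8, 0.2)))  # winner shifts with weights
print(pareto_set(candidates))                   # all trade-off solutions
```

The weighted approach needs the weights fixed before the search, whereas the Pareto set leaves the choice among trade-offs to the user afterwards.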
  • 46. Motivation & Objectives. Motivation: find suitable search parameters for an algorithm on a given problem, and automate this process. Objectives: design and develop an algorithm that searches for the fittest search parameters of MOEAs and is problem agnostic, and evaluate it on our previously proposed eMEGA for molecular De Novo Design.
  • 51. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 52. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 53. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 54. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 55. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 56. About Self-Adaptive MOEA Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010] Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b], Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998], Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type. Objective fitness functions for the meta-level: The percentage of non-dominated solutions each eMEGA has per iteration, The percentage of unique solutions each eMEGA has per iteration. Pareto Front Hypervolume each eMEGA has per iteration. C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 22 / 130
  • 57. About Self-Adaptive MOEA
      Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010],
      Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
      Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998],
      Optimise eMEGA parameters: Mutation Rate, Crossover Rate, Parent Selection Type, Population Diversity Type.
      Objective fitness functions for the meta-level:
        The percentage of non-dominated solutions each eMEGA has per iteration,
        The percentage of unique solutions each eMEGA has per iteration,
        The Pareto front hypervolume each eMEGA has per iteration.
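The first two meta-level objectives listed above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes every eMEGA solution is represented by a tuple of objective scores that are all minimised, and that two solutions count as identical when their score tuples are equal. The function names `dominates`, `non_dominated_fraction` and `unique_fraction` are hypothetical.

```python
from typing import Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fraction(population: Sequence[Scores]) -> float:
    """Share of individuals that no other individual in the population dominates."""
    if not population:
        return 0.0
    front = [p for p in population if not any(dominates(q, p) for q in population)]
    return len(front) / len(population)


def unique_fraction(population: Sequence[Scores]) -> float:
    """Share of distinct individuals (assumption: distinct score tuples)."""
    if not population:
        return 0.0
    return len(set(population)) / len(population)


# Made-up population of five individuals with two objectives each.
population = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.6, 0.6), (0.6, 0.6)]
print(non_dominated_fraction(population))  # 3 of the 5 individuals are non-dominated
print(unique_fraction(population))         # 4 of the 5 individuals are distinct
```

In a real molecular design run the uniqueness check would more plausibly compare molecular structures than score tuples; the tuple comparison is only a stand-in.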
  • 63. Self-Adaptive MOEA Pseudocode
  • 64. Self-Adaptive MOEA Chromosome
      Chromosomes Example
      Objective Fitness Functions
        Objective Fitness Function | Range   | Example
        Non-dominated Solutions %  | 0 - 1.0 | 0.90
        Unique Solutions %         | 0 - 1.0 | 0.88
        Pareto Front Hypervolume   | 0 - 1.0 | 0.56
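A SAMOEA chromosome encodes one candidate eMEGA configuration. The sketch below shows one way such a chromosome could be represented. The class name `MetaChromosome`, the field names, the validation rules and the uniform random initialisation are all assumptions made for illustration; only the four parameters, and the selection and diversity options that appear in the results tables, come from the slides.

```python
import random
from dataclasses import dataclass

# Option values as they appear in the results tables of this deck.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class MetaChromosome:
    """One candidate eMEGA configuration (hypothetical representation)."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str

    def __post_init__(self) -> None:
        # Reject values outside the ranges a probability or option can take.
        for value in (self.mutation_probability, self.crossover_probability):
            if not 0.0 <= value <= 1.0:
                raise ValueError("probabilities must lie in [0, 1]")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError(f"unknown selection type: {self.selection_type}")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError(f"unknown diversity type: {self.diversity_type}")


def random_chromosome(rng: random.Random) -> MetaChromosome:
    """Draw a configuration uniformly at random (assumed initialisation scheme)."""
    return MetaChromosome(
        mutation_probability=rng.random(),
        crossover_probability=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


# The fixed eMEGA baseline from the validation settings, expressed in this form.
baseline = MetaChromosome(0.15, 0.80, "roulette", "genotype")
print(baseline)
print(random_chromosome(random.Random(0)))
```

Making the dataclass frozen keeps each configuration hashable, which is convenient if duplicate configurations need to be detected in the meta-level population.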
  • 66. eMEGA Chromosome
      Graph based, and
      Information related to evolutionary design process.
  • 69. Self-Adaptive MOEA Flowchart
  • 73. Self-Adaptive MOEA Showcases
  • 74. Validation of Self-Adaptive MOEA: About
      Compare SAMOEA, eMEGA and MOARF [Firth et al., 2015].
      Design molecules that are similar to the target molecule, Seliciclib, in both structure and chemical properties.
      Figure: Seliciclib (CYC202, R-roscovitine)
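Structural similarity to a target molecule is commonly scored with the Tanimoto coefficient over molecular fingerprints. The slides do not say which measure was used, so the sketch below is only an illustration of that common choice. Fingerprints are modelled as sets of "on" bit indices, the bit values are made up, and returning 1.0 for two empty fingerprints is an assumed convention.

```python
from typing import FrozenSet


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits present in either fingerprint."""
    if not fp_a and not fp_b:
        return 1.0  # assumed convention: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
target = frozenset({1, 4, 7, 9, 12})
candidate = frozenset({1, 4, 8, 9, 15})
print(tanimoto(target, candidate))  # 3 shared bits out of 7 distinct bits
```

A dissimilarity objective, such as the one used against Ibuproxam in Use Case 1, could be expressed as one minus this value.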
  • 75. Validation of Self-Adaptive MOEA: Starting Datasets
      Starting Molecules datasets:
        Maybridge’s Screening Library, which contains 53953 molecules (Dataset 1),
        Asinex’s Elite Libraries, which contain 104577 molecules (Dataset 2).
  • 77. Validation of Self-Adaptive MOEA: Settings
      eMEGA Settings
        Dataset | Objectives | Population | Iterations | Evolutionary Operations
        Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype
      SAMOEA Settings
        Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations
        SAMOEA | Dataset 1, Dataset 2 | Non-Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype
        eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA’s chromosomes.
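The settings above name a parent selection type, and the results tables list roulette, tournament and best. The sketch below shows the textbook forms of the first two for readers unfamiliar with them. It is not taken from eMEGA: it assumes each individual has a single non-negative fitness value where higher is better, and the function names and the default tournament size are hypothetical.

```python
import random
from typing import Sequence


def roulette_select(fitness: Sequence[float], rng: random.Random) -> int:
    """Return an index chosen with probability proportional to its fitness."""
    if sum(fitness) <= 0:
        # All-zero fitness: fall back to a uniform choice.
        return rng.randrange(len(fitness))
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


def tournament_select(fitness: Sequence[float], rng: random.Random, size: int = 2) -> int:
    """Sample `size` distinct contenders and return the index of the fittest one."""
    contenders = rng.sample(range(len(fitness)), k=min(size, len(fitness)))
    return max(contenders, key=lambda i: fitness[i])


rng = random.Random(42)
fitness = [0.1, 0.9, 0.5]  # made-up fitness values
print(roulette_select(fitness, rng))
print(tournament_select(fitness, rng, size=3))  # a full-size tournament always picks index 1
```

Roulette selection keeps some chance for weak individuals to reproduce, while larger tournaments increase selection pressure, which is one reason the preferred type can differ between problems.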
  • 79. Validation of Self-Adaptive MOEA: Results
  • 81. Validation of Self-Adaptive MOEA: Results - Search Settings (1)
      SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
        Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
        0.029                | 0.694                 | roulette       | genotype       | 0.9             | 0.986              | 1
        0.175                | 0.818                 | roulette       | phenotype      | 0.914           | 0.961              | 1
        0.172                | 0.818                 | tournament     | phenotype      | 0.934           | 0.9533             | 1
        0.026                | 0.694                 | roulette       | phenotype      | 0.928           | 0.955              | 1
        0.001                | 0.963                 | roulette       | phenotype      | 0.982           | 0.848              | 1
        0.177                | 0.818                 | roulette       | phenotype      | 0.921           | 0.956              | 1
        0.083                | 0.73                  | tournament     | phenotype      | 0.95            | 0.946              | 1
        0.086                | 0.798                 | tournament     | genotype       | 0.976           | 0.928              | 1
        0.172                | 0.818                 | best           | genotype       | 0.914           | 0.973              | 2
        0.176                | 0.818                 | roulette       | genotype       | 0.9312          | 0.956              | 2
      Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non-dominance rank.
  • 82. Validation of Self-Adaptive MOEA: Results - Search Settings (2)
      SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
        Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank
        0.105                | 1.0                   | best           | phenotype      | 0.988           | 0.931              | 1
        0.139                | 0.963                 | tournament     | phenotype      | 0.962           | 0.956              | 1
        0.089                | 0.694                 | tournament     | genotype       | 0.976           | 0.943              | 1
        0.139                | 0.969                 | best           | phenotype      | 0.96            | 0.96               | 1
        0.108                | 0.69                  | tournament     | genotype       | 0.955           | 0.962              | 1
        0.1                  | 1.0                   | best           | phenotype      | 0.988           | 0.942              | 1
        0.088                | 0.685                 | tournament     | genotype       | 0.96            | 0.962              | 1
        0.139                | 0.966                 | roulette       | phenotype      | 0.965           | 0.948              | 1
        0.089                | 0.709                 | tournament     | genotype       | 0.964           | 0.957              | 2
      Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non-dominance rank.
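The 'Rank' column can be illustrated with a plain non-dominated sorting routine applied to the two objective columns of the Maybridge table, both of which are minimised because they are listed as 1 minus the actual percentage. This is a sketch under assumptions: the function names are hypothetical, and SAMOEA presumably ranks its whole meta-level population rather than only the ten rows shown, so agreement with the slide is specific to these rows.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points: Sequence[Scores]) -> List[int]:
    """Rank 1 is the non-dominated front; rank k is the front left after removing ranks below k."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# ('Non Dominated %', 'Unique Solutions %') for the ten Maybridge rows, in table order.
maybridge = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955), (0.982, 0.848),
    (0.921, 0.956), (0.95, 0.946), (0.976, 0.928), (0.914, 0.973), (0.9312, 0.956),
]
ranks = non_dominance_ranks(maybridge)
print(ranks)  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

The last two rows drop to rank 2 because the second row dominates the ninth and the fourth row dominates the tenth.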
  • 83. Use Case 1: About
      Design molecules that bind to ER-α based on:
        Structural similarity to Tamoxifen, and
        Structural dissimilarity to Ibuproxam.
      (a) Tamoxifen. (b) Ibuproxam.
  • 84. Use Case 1: Starting Dataset
      Starting Molecules dataset:
        Molecules retrieved from ZINC15,
        Applied filters: Clean (substances with "clean" reactivity), In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays) and Now (immediate delivery, includes in-stock and agent).
        The collection contains 7035 molecules.
  • 90. Use Case 1: Results - In objective space
  • 91. Use Case 1: Results - Designed molecules
  • 92. Use Case 1: Results - AutoDock Vina docking
        Molecule Id             | Docking Affinity (kcal/mol)
        Tamoxifen               | -8.2
        DnD 6 SP 20 4 X 13a     | -7.9
        DnD 31 SP 150 37 M 19   | -7.9
        DnD 8 SP 9 2 M 13       | -7.8
        DnD 4 SP 199 49 X 46b   | -7.7
        DnD 12 SP 75 18 M 13    | -7.6
        DnD 31 SP 6 1 M 16      | -7.2
        DnD 15 SP 168 41 M 0    | -7.2
        DnD 11 SP 74 18 M 4     | -7.1
        DnD 31 SP 193 48 X 76b  | -6.9
        DnD 1 SP 78 19 X 84a    | -6.8
  • 93. Use Case 1: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
        Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
        0.15777              | 0.80279               | tournament     | genotype       | 0.634           | 0.341              | 1
        0.15613              | 0.88305               | tournament     | genotype       | 0.634           | 0.341              | 1
        0.15627              | 0.88891               | tournament     | genotype       | 0.634           | 0.341              | 1
        0.15688              | 0.88891               | roulette       | genotype       | 0.649           | 0.340              | 1
        0.00552              | 0.94308               | best           | genotype       | 0.624           | 0.427              | 1
      Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non-dominance rank.
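The Pareto hypervolume objective measures how much of the objective space a front dominates. The slides do not give the exact definition used, so the following is only a sketch of the standard two-objective case. It assumes both objectives are minimised and normalised to the range 0 to 1, and it uses (1.0, 1.0) as the reference point; the function name `hypervolume_2d` and the example front are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(front: Iterable[Point], reference: Point = (1.0, 1.0)) -> float:
    """Area dominated by `front` and bounded by `reference` (two objectives, minimisation)."""
    ref_x, ref_y = reference
    # Ignore points that do not improve on the reference point in both objectives.
    points = sorted(p for p in front if p[0] < ref_x and p[1] < ref_y)
    area = 0.0
    best_y = ref_y
    for x, y in points:
        if y < best_y:
            # Add the horizontal strip this point dominates beyond earlier points.
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area


front = [(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)]  # made-up front
print(hypervolume_2d(front))  # 0.375
```

With normalised objectives and this reference point the result stays between 0 and 1, which matches the range given for the hypervolume objective on the chromosome slide.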
  • 94. Use Case 3: About
      Design molecules that bind to ER-α based on:
        Structural similarity to Raloxifene, and
        Chemical Properties similarity to Raloxifene.
      Figure: Raloxifene.
  • 95. Use Case 3: Starting Dataset
      Starting Molecules dataset:
        Molecules retrieved from ZINC15,
        Applied filters: Clean (substances with "clean" reactivity), In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays) and Now (immediate delivery, includes in-stock and agent).
        The collection contains 7035 molecules.
  • 96. Use Case 3: Results - In objective space
  • 97. Use Case 3: Results - Designed molecules
  • 98. Use Case 3: Results - AutoDock Vina docking
        Molecule Id             | Docking Affinity (kcal/mol)
        DnD 31 SP 194 48 M 49   | -8.2
        DnD 34 SP 197 49 X 13a  | -5.9
        Raloxifene              | -2.2 (-11.70 PubChem)
  • 99. Use Case 3: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
        Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
        0.12927              | 0.98597               | roulette       | genotype       | 0.997           | 0.274              | 1
        0.12897              | 0.98588               | roulette       | genotype       | 0.997           | 0.274              | 1
        0.12933              | 0.98588               | roulette       | genotype       | 0.997           | 0.274              | 1
        0.12946              | 0.98559               | roulette       | genotype       | 0.997           | 0.274              | 1
        0.12928              | 0.98582               | roulette       | genotype       | 0.997           | 0.274              | 1
        0.12897              | 0.98588               | tournament     | genotype       | 0.997           | 0.274              | 1
      Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non-dominance rank.
  • 100. Use Case 4: About
      Design molecules that bind to Proteasome B5 based on:
        Structural similarity to Ixazomib, and
        Chemical Properties similarity to Ixazomib.
      Figure: Ixazomib.
  • 101. Use Case 4: Starting Dataset
      Starting Molecules dataset:
        Molecules retrieved from ZINC15,
        Applied filters: Clean (substances with "clean" reactivity), In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays) and Now (immediate delivery, includes in-stock and agent).
        The collection contains 7035 molecules.
  • 102. Use Case 4: Results - In objective space
  • 103. Use Case 4: Results - Designed molecules
  • 104. Use Case 4: Results - AutoDock 4 docking
        Molecule Id              | Docking Affinity (kcal/mol)
        DnD 19 SP 196 48 X 59b   | -7.19
        DnD 49 SP 193 48 X 123b  | -6.68
        DnD 1 SP 196 48 X 67a    | -6.08
  • 105. Use Case 4: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
        Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank
        0.09507              | 0.98194               | tournament     | phenotype      | 0.993           | 0.442              | 1
        0.09507              | 0.9819                | roulette       | phenotype      | 0.991           | 0.442              | 1
        0.09471              | 0.98178               | roulette       | genotype       | 0.997           | 0.426              | 1
        0.09484              | 0.98183               | roulette       | phenotype      | 0.996           | 0.441              | 1
        0.09277              | 0.98235               | roulette       | genotype       | 0.996           | 0.441              | 1
      Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non-dominance rank.
  • 106. Self-Adaptive MOEA Showcases Discussion
      SAMOEA proposed interesting solutions in all problems it has been applied to,
      Further in-vitro investigation is required, and
      SAMOEA’s proposed eMEGA settings differ based on problem and dataset (no silver bullet).
  • 109. Concluding Remarks
  • 111. Concluding Remarks - LiSIs platform
      Features a Web-based Virtual Screening platform, focused on Cancer Chemoprevention Research.
      To be expanded in the future with tools featuring the algorithms from the MEGA framework.
      A number of SWs were implemented for:
        preparing docking models,
        preparing predictive models,
        performing docking experiments,
        using predictive models to predict biochemical properties and behaviour, and
        performing VS workflows.
  • 114. Concluding Remarks - Self-Adaptive MOEA (1)
      Drawbacks:
        Needs a lot of time to terminate, and
        Very slow convergence.
  • 117. Concluding Remarks - Self-Adaptive MOEA (2)
      Advantages:
        Searches a larger space,
        Generates far more solutions per iteration,
        Proposes the fittest parameter sets that eMEGA should use for the given problem,
        Has been built to be adaptable,
        Uses objective fitness functions that can evaluate the effectiveness and the progression of any MOEA,
        Can be used on other problems,
        SAMOEA’s chromosome can be expanded with additional search parameters, and
        Leverages multi-core parallelism (needs more memory).
  • 126. Future Work
  • 128. Future Work - LiSIs platform
      Develop LiSIs 2.0:
        Based on latest Galaxy platform, and
        Redesign of tools to be compatible with Galaxy’s ToolShed for easy deployment,
      Update LiSIs with a feature to visualise intermediate results from various tools,
      Expand LiSIs tools with tools featuring the MEGA line-up of algorithms and SAMOEA,
      Explore resource management in SWMSs:
        Novel Multi-Objective Optimization SW design approaches,
        Novel Multi-Objective Optimization SW scheduling approaches.
  • 136. Future Work - Self-Adaptive MOEA
      Optimise MEGA framework (memory management and parallelism),
      Implement self-adaptive technique for selecting genetic operators,
      Extend Self-Adaptive MOEA to use other MOEAs,
      Implement models for other problems, and
      Implement new objective functions.
  • 142. List of Publications
  • 143. Table of Contents
      6 List of Publications
      7 References
      8 Backup Frames
          Validation of Self-Adaptive MOEA
          Use Case 1
          Use Case 2
          Use Case 3
          Use Case 4
  • 144. List of Publications I
      Book Chapters
        C. A. Nicolaou and C. C. Kannas, “Molecular Library Design Using Multi-Objective Optimization Methods,” in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.
      Journals
        C. Kannas et al., “LiSIs: An Online Scientific Workflow System for Virtual Screening,” Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.
        C. A. Nicolaou, C. Kannas, and E. Loizidou, “Multi-objective optimization methods in de novo drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.
  • 145. List of Publications II
      Journals (continued)
        C. Nicolaou, C. Kannas, and C. Pattichis, “Knowledge-driven multi-objective de novo drug design,” Chemistry Central Journal, vol. 3, p. P22, 2009.
      Conferences
        C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22–24 June 2017, pp. 1–6.
        P. Hasapis et al., “Molecular clustering via knowledge mining from biomedical scientific corpora,” in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1–5.
  • 146. List of Publications III
      Conferences (continued)
        C. C. Kannas et al., “A workflow system for virtual screening in cancer chemoprevention,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 439–446.
        K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, “Open source workflow systems in life sciences informatics,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 552–558.
        C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
  • 147. List of Publications IV
      Conferences (continued)
        C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A Parallel implementation of a Multi-objective Evolutionary Algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
      Abstracts
        C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11–15 July 2017.
  • 148. References
  • 150. References I Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G. (2013). LiGen: A High Performance Workflow for Chemistry Driven de Novo Design. Journal of Chemical Information and Modeling. Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. In Current Protocols in Molecular Biology. John Wiley & Sons, Inc. Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for Efficient De Novo Design of Multi-functional Molecules. Molecular Informatics, pages n/a–n/a.
  • 151. References II Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010). NovoFLAP: A ligand-based de novo design approach for the generation of medicinally relevant ideas. Journal of Chemical Information and Modeling, 50(7):1296–1303. Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand design by multiobjective evolutionary optimization. Journal of Chemical Information and Modeling, 48(3):679–690. Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving molecules using multi-objective optimization: applying to ADME/Tox. Drug Discovery Today, 15(11-12):451–460.
  • 152. References III Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders, J. (2008). The use of ligand-based de novo design for scaffold hopping and sidechain optimization: two case studies. Bioorganic & Medicinal Chemistry, 16(1):422–427. Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015). MOARF, an Integrated Workflow for Multiobjective Optimization: Implementation, Synthesis, and Biological Evaluation. Journal of Chemical Information and Modeling. Fonseca, C. and Fleming, P. (1998). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):26–37.
  • 153. References IV Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A Platform for Interactive Large-Scale Genome Analysis. Genome Research, 15(10):1451–1455. Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86. Grefenstette, J. (1986). Optimization of Control Parameters for Genetic Algorithms. IEEE Transactions on Systems, Man and Cybernetics, 16(1):122–128.
  • 154. References V Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J. H. N. (2005). A novel microplate reader-based high-throughput assay for estrogen receptor binding. International Journal of Environmental Analytical Chemistry, 85(3):149–161. Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F., Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012). DOGS: Reaction-Driven de novo Design of Bioactive Compounds. PLoS Comput Biol, 8(2):e1002380. Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new pharmacophore-based de novo design method of drug-like molecules combined with assessment of synthetic accessibility. Journal of Molecular Graphics and Modelling, 28(8):775–787.
  • 155. References VI Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva, C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D., Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser, C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs: An Online Scientific Workflow System for Virtual Screening. Combinatorial Chemistry & High Throughput Screening, 18(3):281–295. Kramer, O. (2010). Evolutionary self-adaptation: a survey of operators and strategy parameters. Evolutionary Intelligence, 3(2):51–65.
  • 156. References VII Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space. Journal of Chemical Information and Modeling, 49(7):1630–1642. Medina-Franco, J. L., López-Vallejo, F., Kuck, D., and Lyko, F. (2010). Natural products as DNA methyltransferase inhibitors: a computer-aided discovery approach. Molecular Diversity, 15:293–304. Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De Novo Drug Design Using Multiobjective Evolutionary Graphs. Journal of Chemical Information and Modeling, 49(2):295–307.
  • 157. References VIII Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b). Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm. In 2009 9th International Conference on Information Technology and Applications in Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.
  • 158. Backup Frames
  • 161. LiSIs Showcase - Known ER Ligands
      A/A  Estrogen Ligand         Docking Score ER-α  Docking Score ER-β
      1    Raloxifene              -11.70              -8.72
      2    Lilly-117018            -11.53              -3.80
      3    3-HydroxyTamoxifen      -11.02              N/A
      4    Nafoxidine              -10.88              N/A
      5    ICI-182780              -10.73              N/A
      6    Pyrolidine              -10.04              N/A
      7    Clomiphene A            -10.01              N/A
      8    Nitrofinene Citrate     -9.87               N/A
      9    ICI-164384              -9.82               -9.13
      10   Moxestrol               -9.38               -9.77
      11   Naringenine             -8.55               -7.80
      12   Triphenylethylene       -8.50               N/A
      13   Afema                   -8.15               -7.78
      14   Danazol                 -6.99               N/A
      15   Ethamoxytriphetol       -6.67               N/A
      16   4-HydroxyTamoxifen      -6.60               N/A
      17   Dioxin                  -6.22               N/A
      18   Estralutin              -5.86               -3.80
      19   Cyclopentanone          -4.88               N/A
      20   Miproxifene Phosphate   -4.48               N/A
      21   EM-800                  N/A                 N/A
      Note: The list was retrieved from PubChem and includes compounds characterized as “estrogen ligands”. N/A: no binding affinity.
  • 162. LiSIs Showcase - Natural-like Rule of 5 filter
      GRANATUM Rule of 5 filter:
      1. MW between 160 and 700,
      2. HBD less than or equal to 5,
      3. HBA less than or equal to 10,
      4. TPSA less than 140, and
      5. cLogP between -0.4 and 5.6.
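The five thresholds above can be expressed as a single predicate. The following is a minimal sketch, not the LiSIs implementation: the `Descriptors` container and the function name are hypothetical, the descriptor values are assumed to be precomputed elsewhere, and "between" is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mw: float     # molecular weight
    hbd: int      # number of hydrogen-bond donors
    hba: int      # number of hydrogen-bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_granatum_ro5(d: Descriptors) -> bool:
    """Return True when all five thresholds listed on the slide hold.

    Assumption: the "between" ranges are inclusive at both ends.
    """
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140
        and -0.4 <= d.clogp <= 5.6
    )


if __name__ == "__main__":
    # Illustrative descriptor values only, not measured data.
    candidates = {
        "inside_all_ranges": Descriptors(mw=350.0, hbd=2, hba=5, tpsa=80.0, clogp=3.1),
        "tpsa_at_limit": Descriptors(mw=350.0, hbd=2, hba=5, tpsa=140.0, clogp=3.1),
    }
    for name, desc in candidates.items():
        print(name, passes_granatum_ro5(desc))
```

Keeping the descriptors in a plain container separates the filter logic from whichever cheminformatics toolkit computes the values.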
  • 163. eMEGA Settings
      Table: eMEGA experimental design settings
      Datasets: Dataset 1, Dataset 2
      Objectives: Structural Similarity, Chemical Descriptor Similarity
      Population: 500
      Iterations: 500
      Evolutionary Operations: Mutation Probability 15%, Crossover Probability 80%, Selection Type Roulette, Diversity Type Genotype
  • 164. SAMOEA Settings
      Table: SAMOEA experimental design settings
      SAMOEA
        Datasets: Dataset 1, Dataset 2
        Objectives: Non Dominated Solutions Percentage, Unique Solutions Percentage
        Population: 20
        Iterations: 100
        Evolutionary Operations: Mutation Probability 15%, Crossover Probability 80%, Selection Type Roulette, Diversity Type Phenotype
      eMEGA
        Datasets: Dataset 1, Dataset 2
        Objectives: Structural Similarity, Chemical Descriptor Similarity
        Population: 100
        Iterations: 1
        Evolutionary Operations: Defined during run time, based on SAMOEA’s chromosomes.
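Since the eMEGA operations are taken from SAMOEA's chromosomes at run time, one way to picture a chromosome is as a record of the four eMEGA settings. The sketch below is an illustration under stated assumptions, not the thesis code: the class and function names are hypothetical, the option lists are the values that appear in the deck's result tables, and uniform random initialisation is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List

# Option values as they appear in the deck's result tables.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EmegaSettings:
    """One hypothetical SAMOEA chromosome: a full eMEGA configuration."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str


def random_settings(rng: random.Random) -> EmegaSettings:
    """Draw one configuration uniformly at random (assumed initialisation)."""
    return EmegaSettings(
        mutation_probability=rng.random(),
        crossover_probability=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


def initial_population(size: int, seed: int = 0) -> List[EmegaSettings]:
    """Build the outer population; each member configures one eMEGA run."""
    rng = random.Random(seed)
    return [random_settings(rng) for _ in range(size)]


if __name__ == "__main__":
    # 20 matches the SAMOEA population size in the settings table.
    for settings in initial_population(20)[:3]:
        print(settings)
```

Each chromosome would then be scored by running eMEGA with those settings and measuring the two SAMOEA objectives.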
  • 165. Virtual Machine Specifications
      Table: Specifications of the virtual machine on which the experimental runs were performed
      Linux Virtual Machine
      CPU: 4x Virtual CPU @ 2 GHz
      RAM: 16 GB
      OS: CentOS 6
  • 166. eMEGA Maybridge Run 1 Figure: eMEGA Run 1 results for Maybridge dataset.
  • 167. eMEGA Maybridge Run 2 Figure: eMEGA Run 2 results for Maybridge dataset.
  • 168. eMEGA Maybridge Run 3 Figure: eMEGA Run 3 results for Maybridge dataset.
  • 169. eMEGA Maybridge Run 4 Figure: eMEGA Run 4 results for Maybridge dataset.
  • 170. eMEGA Maybridge Run 5 Figure: eMEGA Run 5 results for Maybridge dataset.
  • 171. eMEGA Maybridge All Runs Figure: eMEGA results for Maybridge dataset.
  • 172. eMEGA Maybridge All Runs Top 10 Results (1) Figure: eMEGA Top 10 results for Maybridge dataset.
  • 173. eMEGA Maybridge All Runs Top 10 Results (2) Figure: eMEGA Top 10 results for Maybridge dataset compared with Seliciclib; the red-highlighted part of each molecule is the core it shares with Seliciclib.
  • 174. eMEGA Asinex Run 1 Figure: eMEGA Run 1 results for Asinex dataset.
  • 175. eMEGA Asinex Run 2 Figure: eMEGA Run 2 results for Asinex dataset.
  • 176. eMEGA Asinex Run 3 Figure: eMEGA Run 3 results for Asinex dataset.
  • 177. eMEGA Asinex Run 4 Figure: eMEGA Run 4 results for Asinex dataset.
  • 178. eMEGA Asinex Run 5 Figure: eMEGA Run 5 results for Asinex dataset.
  • 179. eMEGA Asinex All Runs Figure: eMEGA results for Asinex dataset.
  • 180. eMEGA Asinex All Runs Top 10 Results (1) Figure: eMEGA Top 10 results for Asinex dataset.
  • 181. eMEGA Asinex All Runs Top 10 Results (2) Figure: eMEGA Top 10 results for Asinex dataset compared with Seliciclib; the red-highlighted part of each molecule is the core it shares with Seliciclib.
  • 182. SAMOEA Maybridge Run 1 Figure: SAMOEA Run 1 results for Maybridge dataset.
  • 183. SAMOEA Maybridge Run 2 Figure: SAMOEA Run 2 results for Maybridge dataset.
  • 184. SAMOEA Maybridge Run 3 Figure: SAMOEA Run 3 results for Maybridge dataset.
  • 185. SAMOEA Maybridge Run 4 Figure: SAMOEA Run 4 results for Maybridge dataset.
  • 186. SAMOEA Maybridge Run 5 Figure: SAMOEA Run 5 results for Maybridge dataset.
  • 187. SAMOEA Maybridge All Runs Figure: SAMOEA results for Maybridge dataset.
  • 188. SAMOEA Maybridge All Runs Top 10 Results (1) Figure: SAMOEA Top 10 results for Maybridge dataset.
  • 189. SAMOEA Maybridge All Runs Top 10 Results (2) Figure: SAMOEA Top 10 results for Maybridge dataset compared with Seliciclib; the red-highlighted part of each molecule is the core it shares with Seliciclib.
  • 190. SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
      Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
      Mutation     Crossover    Selection   Diversity  Non          Unique       Rank
      Probability  Probability  Type        Type       Dominated %  Solutions %
      0.029        0.694        roulette    genotype   0.9          0.986        1
      0.175        0.818        roulette    phenotype  0.914        0.961        1
      0.172        0.818        tournament  phenotype  0.934        0.9533       1
      0.026        0.694        roulette    phenotype  0.928        0.955        1
      0.001        0.963        roulette    phenotype  0.982        0.848        1
      0.177        0.818        roulette    phenotype  0.921        0.956        1
      0.083        0.73         tournament  phenotype  0.95         0.946        1
      0.086        0.798        tournament  genotype   0.976        0.928        1
      0.172        0.818        best        genotype   0.914        0.973        2
      0.176        0.818        roulette    genotype   0.9312       0.956        2
      Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here, the better. 'Rank' is their non-dominance rank.
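The 'Rank' column can be illustrated with a plain non-dominated sorting pass over the two objective columns, both treated as values to minimise. This is a minimal sketch using a simple front-peeling loop, which is an assumption and not necessarily the ranking procedure used in SAMOEA; the function names are hypothetical, and the data points are the rounded (Non Dominated %, Unique Solutions %) pairs of the ten Maybridge rows.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """When minimising, a dominates b if it is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points: Sequence[Point]) -> List[int]:
    """Peel off successive non-dominated fronts; front k gets rank k."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (Non Dominated %, Unique Solutions %) pairs, in the table's row order.
MAYBRIDGE_TOP10: List[Point] = [
    (0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
    (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
    (0.914, 0.973), (0.9312, 0.956),
]

if __name__ == "__main__":
    print(non_dominance_ranks(MAYBRIDGE_TOP10))
```

The peeling loop is quadratic per front, which is adequate for a ten-row table; larger populations would usually call for a faster sorting scheme.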
  • 191. SAMOEA Asinex Run 1 Figure: SAMOEA Run 1 results for Asinex dataset.
  • 192. SAMOEA Asinex Run 2 Figure: SAMOEA Run 2 results for Asinex dataset.
  • 193. SAMOEA Asinex Run 3 Figure: SAMOEA Run 3 results for Asinex dataset.
  • 194. SAMOEA Asinex Run 4 Figure: SAMOEA Run 4 results for Asinex dataset.
  • 195. SAMOEA Asinex All Runs Figure: SAMOEA results for Asinex dataset.
  • 196. SAMOEA Asinex All Runs Top 10 Results (1) Figure: SAMOEA Top 10 results for Asinex dataset.
  • 197. SAMOEA Asinex All Runs Top 10 Results (2) Figure: SAMOEA Top 10 results for Asinex dataset compared with Seliciclib; the red-highlighted part of each molecule is the core it shares with Seliciclib.
  • 198. SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
      Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
      Mutation     Crossover    Selection   Diversity  Non          Unique       Rank
      Probability  Probability  Type        Type       Dominated %  Solutions %
      0.105        1.0          best        phenotype  0.988        0.931        1
      0.139        0.963        tournament  phenotype  0.962        0.956        1
      0.089        0.694        tournament  genotype   0.976        0.943        1
      0.139        0.969        best        phenotype  0.96         0.96         1
      0.108        0.69         tournament  genotype   0.955        0.962        1
      0.1          1.0          best        phenotype  0.988        0.942        1
      0.088        0.685        tournament  genotype   0.96         0.962        1
      0.139        0.966        roulette    phenotype  0.965        0.948        1
      0.089        0.709        tournament  genotype   0.964        0.957        2
      Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here, the better. 'Rank' is their non-dominance rank.
  • 199. MOARF Results Figure: MOARF’s results compared with Seliciclib.
  • 200. Compare SAMOEA, eMEGA and MOARF Figure: Compare all Top 10 results with MOARF’s results and Seliciclib.
  • 201. Discussion (1): eMEGA and SAMOEA generate molecules that approximate Seliciclib. The common core shared with Seliciclib differs between datasets and algorithms. MOARF approximates Seliciclib better than eMEGA and SAMOEA: it generates molecules in a more chemistry-oriented way, with fewer stochastic operations, and it starts from a core selected for the target, onto which it then attaches new fragments. SAMOEA explores the space better than eMEGA and MOARF.
  • 207. Discussion (2): From the tables of SAMOEA-proposed eMEGA settings we can see that different settings are favoured for each dataset. Maybridge dataset: mutation probability around 17%, crossover probability around 80%, selection type either roulette or tournament, and both diversity types are valid choices. Asinex dataset: mutation probability around 10%, crossover probability around 96%, selection type either best or tournament, and both diversity types are valid choices.
  • 217. Discussion (3): The objective fitness scores for the proposed settings are very high; because each score is 1 minus the actual percentage, the actual percentages are really low, below 5%. From this we can conclude the following: eMEGA instances generate a large number of identical solutions even though they have different configurations, something we also noticed in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b]; and the objective fitness functions we chose to use in SAMOEA compete with each other, which means that having eMEGA instances generate a high number of unique and non-dominated solutions (above 20%) proves to be a difficult task.
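To make the "1 minus the actual percentage" scores concrete, the sketch below computes both SAMOEA objectives for the final population of one inner run. It is an illustration under assumptions, not the thesis implementation: the function names are hypothetical, each solution is assumed to be an (identifier, objective vector) pair with objectives to be minimised, and identical identifiers are assumed to mark duplicate solutions.

```python
from typing import Hashable, Sequence, Tuple

Solution = Tuple[Hashable, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """When minimising, a dominates b if it is no worse in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def samoea_objectives(population: Sequence[Solution]) -> Tuple[float, float]:
    """Return (1 - non-dominated fraction, 1 - unique fraction).

    Both values are minimised, so a run that produces many unique,
    non-dominated solutions scores close to 0 on both.
    """
    if not population:
        raise ValueError("population must not be empty")
    scores = [objectives for _, objectives in population]
    non_dominated = sum(
        1 for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    )
    unique = len({identifier for identifier, _ in population})
    n = len(population)
    return 1 - non_dominated / n, 1 - unique / n


if __name__ == "__main__":
    # Toy population: "A" appears twice, "C" is dominated by "B".
    toy = [
        ("A", (0.1, 0.9)),
        ("A", (0.1, 0.9)),
        ("B", (0.5, 0.5)),
        ("C", (0.6, 0.6)),
    ]
    print(samoea_objectives(toy))
```

In the toy population the duplicated "A" lowers the unique fraction without affecting the non-dominated count, since identical vectors do not dominate each other; the two scores respond to different properties of the same population.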
  • 219. Use Case 1: Docked designed molecules (1) Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.
  • 220. Use Case 1: Docked designed molecules (2) Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.
  • 221. Use Case 1: Docked designed molecules (3) Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.
  • 222. Use Case 1: Docked designed molecules (4) Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.
  • 223. Use Case 1: Docked designed molecules (5) Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.
  • 224. Use Case 1: Docked designed molecules (6) Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.
  • 225. Use Case 1: Docked designed molecules (7) Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.
  • 226. Use Case 1: Docked designed molecules (8) Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.
  • 227. Use Case 1: Docked designed molecules (9) Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.
  • 228. Use Case 1: Docked designed molecules (10) Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.
  • 229. Use Case 2: About. Design molecules that bind to ER-α based on structural similarity to Tamoxifen and chemical-property similarity to Tamoxifen. Figure: Tamoxifen.
  • 230. Use Case 2: Starting Dataset. Starting molecules dataset: molecules retrieved from ZINC15 with the filters Clean (substances with “clean” reactivity), In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays) and Now (immediate delivery, includes in-stock and agent). The collection contains 7035 molecules.
  • 231. Use Case 2: Results - In objective space
  • 232. Use Case 2: Results - Designed molecules
  • 233. Use Case 2: Results - AutoDock Vina docking
      Molecule Id               Docking Affinity (kcal/mol)
      DnD 42 SP 194 48 X 96b    -10.1
      DnD 17 SP 199 49 M 4      -10
      DnD 33 SP 189 47 X 66b    -9.9
      DnD 48 SP 193 48 M 5      -9.6
      Tamoxifen                 -8.2
  • 234. Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA
      Mutation     Crossover    Selection   Diversity  Non          Pareto       Rank
      Probability  Probability  Type        Type       Dominated %  Hypervolume
      0.02707      0.97973      tournament  genotype   0.983        0.153        1
      0.02758      0.97965      tournament  phenotype  0.988        0.152        1
      Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here, the better. 'Rank' is their non-dominance rank.
  • 235. Use Case 2: Docked designed molecules (1) Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.
  • 236. Use Case 2: Docked designed molecules (2) Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.
  • 237. Use Case 2: Docked designed molecules (3) Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.
  • 238. Use Case 2: Docked designed molecules (4) Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.
  • 239. Use Case 3: Docked designed molecules (1) Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.
  • 240. Use Case 3: Docked designed molecules (2) Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.
  • 241. Use Case 4: Docked designed molecules (1) Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.
  • 242. Use Case 4: Docked designed molecules (2) Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.
  • 243. Use Case 4: Docked designed molecules (3) Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.