16. Table of Contents
1 Introduction
Scientific Workflow Management Systems
Self-Adaptive Multi-Objective Evolutionary Algorithms
Virtual Screening & De Novo Molecular Design
2 Life Sciences Informatics platform
About Life Sciences Informatics platform
LiSIs Showcase
LiSIs Showcase Discussion
3 Self-Adaptive Multi-Objective Evolutionary Algorithm
About Self-Adaptive MOEA
Self-Adaptive MOEA Showcases
Self-Adaptive MOEA Showcases Discussion
4 Concluding Remarks
Concluding Remarks - LiSIs platform
Concluding Remarks - Self-Adaptive MOEA
5 Future Work
Future Work - LiSIs platform
Future Work - Self-Adaptive MOEA
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 8 / 130
17. Motivation & Objectives
Motivation
Provide an easy-to-use, web-based platform,
Focused on Virtual Screening (VS) of natural products, and
Aimed at cancer chemoprevention researchers.
Objectives
Design and develop a web-based Scientific Workflow Management System (SWMS),
Provide tools for VS, and
Evaluate it on use cases for identifying novel chemopreventive agents.
23. Scientific Workflow Management Systems for Virtual Screening
Application | Technology | Scientific Field(s)
Open Source
Taverna | Java | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music
Galaxy | Python | Life Sciences, Bioinformatics
KNIME | Java | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis
Commercial
InforSense/DiscoveryNet | - | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling
Pipeline Pilot | - | Biology, Chemistry, Material Science
24. Funding Support
The work has been partially supported through the EU-FP7 GRANATUM project, “A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and data for the design and execution of In Silico Models and Experiments in Cancer Chemoprevention”, contract number 270139.
The work also supported research in the EU-FP7 Linked2Safety project, “A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research”, contract number 288328.
26. Life Sciences Informatics platform
Life Sciences Informatics (LiSIs) is a web-based SWMS for VS [Kannas et al., 2015].
LiSIs is based on the Galaxy SWMS [Goecks et al., 2010],
[Blankenberg et al., 2010], [Giardine et al., 2005].
33. LiSIs Showcase Information
LiSIs was successfully used to discover promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β).
Datasets:
2414 compounds from Indofine,
55 compounds characterized by Medina-Franco et al.
[Medina-Franco et al., 2010], and
21 known ER ligands retrieved from PubChem.
40. LiSIs Showcase Discussion
From the Indofine dataset (2414 compounds), based on natural-product-likeness criteria and docking results, we selected:
18 potential ER ligands,
These were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005], with minor modifications, and
15 of the 18 compounds (83.3%) were experimentally confirmed active.
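The hit rate quoted above is the fraction of selected compounds that the binding assay confirmed. A minimal Python sketch of that arithmetic, using only the showcase numbers (the function name `hit_rate` is illustrative and not part of LiSIs):

```python
def hit_rate(confirmed_active: int, tested: int) -> float:
    """Return the percentage of tested compounds that were confirmed active."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return 100.0 * confirmed_active / tested


# Showcase numbers: 18 compounds selected from 2414, 15 confirmed active in vitro.
print(f"Hit rate: {hit_rate(15, 18):.1f}%")  # Hit rate: 83.3%
print(f"Selected: {100.0 * 18 / 2414:.2f}% of the library")
```

Reporting the selected fraction alongside the hit rate shows how strongly the natural-product-likeness and docking filters narrowed the library before any wet-lab work.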
45. Multi-Objective Algorithms for Molecular Design
Name | MO Method | Search Method | Remarks | Reference
EA-Inventor | Weighted | Evolutionary Algorithm | Ligand | [Feher et al., 2008]
GANDI | Weighted | Parallel Evolutionary Algorithm | Structure | [Dey and Caflisch, 2008]
FOG | Weighted | Evolutionary Algorithm | Ligand | [Kutchukian et al., 2009]
MEGA | Pareto-based | Evolutionary Algorithm | Ligand & Structure | [Nicolaou et al., 2009a]
PLD | Pareto-based | Evolutionary Algorithm | ADME-related properties | [Ekins et al., 2010]
NovoFLAP | Weighted | Evolutionary Algorithm | Ligand | [Damewood et al., 2010]
PhDD | Weighted | Workflow | Pharmacophore | [Huang et al., 2010]
DOGS | Weighted | Workflow | Ligand | [Hartenfeller et al., 2012]
LiGen | Weighted | Workflow | Ligand, Structure & Pharmacophore | [Beccari et al., 2013]
MOARF | Weighted | Workflow | Ligand & Structure | [Firth et al., 2015]
Synopsis | Pareto-based | Evolutionary Algorithm | Ligand & Structure | [Daeyaert and Deem, 2016]
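The MO Method column separates tools that collapse the objectives into one weighted score from Pareto-based tools that keep every trade-off. The toy Python sketch below contrasts the two on made-up objective vectors (both objectives minimised); it illustrates the distinction only and does not reproduce any of the listed tools.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def weighted_best(cands: Dict[str, Objectives], weights: Tuple[float, float]) -> str:
    """Weighted approach: collapse objectives into one score and keep the single minimum."""
    return min(cands, key=lambda name: sum(w * f for w, f in zip(weights, cands[name])))


def pareto_set(cands: Dict[str, Objectives]) -> List[str]:
    """Pareto-based approach: keep every candidate that no other candidate dominates."""
    return sorted(name for name, f in cands.items()
                  if not any(dominates(g, f) for g in cands.values()))


candidates = {"A": (0.1, 0.9), "B": (0.5, 0.5), "C": (0.9, 0.1), "D": (0.6, 0.6)}
print(weighted_best(candidates, (0.7, 0.3)))  # A
print(weighted_best(candidates, (0.3, 0.7)))  # C
print(pareto_set(candidates))                 # ['A', 'B', 'C']
```

The weighted answer changes with the chosen weights, whereas the Pareto set exposes the whole trade-off at once (D is dropped because B dominates it).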
46. Motivation & Objectives
Motivation
Find suitable search parameters for an algorithm on a given problem, and
Automate this process.
Objectives
Design and develop an algorithm:
To search for the fittest search parameters of MOEAs,
That is problem-agnostic, and
Evaluate it on our previously proposed eMEGA for de novo molecular design.
51. About Self-Adaptive MOEA
Meta-level algorithmic approach influenced by Grefenstette
[Grefenstette, 1986] and Kramer [Kramer, 2010]
Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA)
[Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
Meta-level EA is a modified MOGA
[Fonseca and Fleming, 1998],
Optimise eMEGA parameters:
Mutation Rate,
Crossover Rate,
Parent Selection Type,
Population Diversity Type.
Objective fitness functions for the meta-level:
The percentage of non-dominated solutions of each eMEGA per iteration,
The percentage of unique solutions of each eMEGA per iteration, and
The Pareto front hypervolume of each eMEGA per iteration.
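The first two meta-level objectives can be computed directly from an eMEGA population. The Python sketch below assumes each individual is a molecule string (for example a canonical SMILES) paired with an objective vector in which every objective is minimised; both are assumptions, as the slides do not specify the representation. It returns 1 minus each fraction, matching the convention used in the result tables.

```python
from typing import Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b, with all objectives minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fraction(objs: Sequence[Objectives]) -> float:
    """Fraction of individuals that no other individual dominates."""
    count = sum(
        1
        for i, a in enumerate(objs)
        if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)
    )
    return count / len(objs)


def unique_fraction(molecules: Sequence[str]) -> float:
    """Fraction of distinct molecule strings in the population."""
    return len(set(molecules)) / len(molecules)


def meta_objectives(molecules: Sequence[str],
                    objs: Sequence[Objectives]) -> Tuple[float, float]:
    """Meta-level objectives to minimise: (1 - non-dominated fraction, 1 - unique fraction)."""
    return 1.0 - non_dominated_fraction(objs), 1.0 - unique_fraction(molecules)


population = ["CCO", "CCO", "c1ccccc1", "CCN"]
scores = [(0.2, 0.5), (0.2, 0.5), (0.1, 0.9), (0.3, 0.6)]
print(meta_objectives(population, scores))  # (0.25, 0.25)
```

Two copies with identical objective vectors do not dominate each other, so both count as non-dominated here; it is the unique-solutions objective that penalises such repetition.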
74. Validation of Self-Adaptive MOEA: About
Compare SAMOEA, eMEGA and MOARF
[Firth et al., 2015].
Design molecules that are structurally and chemically similar to the target molecule, Seliciclib.
Figure: Seliciclib (CYC202, R-roscovitine)
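The slides do not name the structural similarity measure. A common choice is the Tanimoto coefficient over binary fingerprints; the Python sketch below assumes fingerprints are given as sets of "on" bit indices, which in practice a cheminformatics toolkit would generate. The bit sets are toy values, not real Seliciclib fingerprints.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


target = {1, 4, 7, 9}   # hypothetical on-bits of the target molecule
candidate = {1, 4, 8}   # hypothetical on-bits of a designed molecule
similarity = tanimoto(target, candidate)
print(similarity)        # 0.4
print(1.0 - similarity)  # distance form, usable as an objective to minimise
```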
75. Validation of Self-Adaptive MOEA: Starting Datasets
Starting molecule datasets:
Maybridge’s Screening Library, which contains 53953 molecules (Dataset 1),
Asinex’s Elite Libraries, which contain 104577 molecules (Dataset 2).
77. Validation of Self-Adaptive MOEA: Settings
eMEGA Settings
Dataset | Objectives | Population | Iterations | Evolutionary Operations
Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype
SAMOEA Settings
Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations
SAMOEA | Dataset 1, Dataset 2 | Non-Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype
eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined at run time, based on SAMOEA’s chromosomes.
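The last row says that eMEGA's operators are set at run time from SAMOEA's chromosomes. A minimal Python sketch of such a decoding, assuming a hypothetical four-gene chromosome (two probabilities plus two categorical indices). The option names are those that appear in the SAMOEA result tables, but the gene layout, the clamping, and the index mapping are illustrative assumptions, not the thesis's encoding.

```python
from dataclasses import dataclass
from typing import Sequence

# Option names taken from the SAMOEA result tables; their order here is arbitrary.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EMegaSettings:
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str


def _clamp01(p: float) -> float:
    """Keep a probability gene inside [0, 1] after mutation or crossover."""
    return min(max(p, 0.0), 1.0)


def decode(chromosome: Sequence[float]) -> EMegaSettings:
    """Map a hypothetical [mutation, crossover, selection, diversity] chromosome to settings."""
    mutation, crossover, selection_gene, diversity_gene = chromosome
    return EMegaSettings(
        mutation_probability=_clamp01(mutation),
        crossover_probability=_clamp01(crossover),
        selection_type=SELECTION_TYPES[int(selection_gene) % len(SELECTION_TYPES)],
        diversity_type=DIVERSITY_TYPES[int(diversity_gene) % len(DIVERSITY_TYPES)],
    )


print(decode([0.029, 0.694, 0, 0]))
```

With this layout, the chromosome `[0.029, 0.694, 0, 0]` decodes to the first Maybridge setting reported in the results (mutation 0.029, crossover 0.694, roulette selection, genotype diversity).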
81. Validation of Self-Adaptive MOEA: Results - Search Settings (1)
SAMOEA's top 10 proposed settings for eMEGA on the Maybridge dataset
Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non-Dominated % | Unique Solutions % | Rank
0.029 | 0.694 | roulette | genotype | 0.9 | 0.986 | 1
0.175 | 0.818 | roulette | phenotype | 0.914 | 0.961 | 1
0.172 | 0.818 | tournament | phenotype | 0.934 | 0.9533 | 1
0.026 | 0.694 | roulette | phenotype | 0.928 | 0.955 | 1
0.001 | 0.963 | roulette | phenotype | 0.982 | 0.848 | 1
0.177 | 0.818 | roulette | phenotype | 0.921 | 0.956 | 1
0.083 | 0.73 | tournament | phenotype | 0.95 | 0.946 | 1
0.086 | 0.798 | tournament | genotype | 0.976 | 0.928 | 1
0.172 | 0.818 | best | genotype | 0.914 | 0.973 | 2
0.176 | 0.818 | roulette | genotype | 0.9312 | 0.956 | 2
Note: The values for 'Non-Dominated %' and 'Unique Solutions %' are 1 minus the actual fraction, so smaller values are better. 'Rank' is the non-dominance rank.
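The Rank column can be checked by Pareto-ranking the two minimised objectives of the ten Maybridge settings listed above. Fonseca and Fleming's original MOGA ranks an individual by how many others dominate it; the Python sketch below instead peels off successive non-dominated fronts, which is an assumption about how the ranks were produced. Applied to just these ten rows, it reproduces the Rank column (the thesis ranks were presumably computed over the whole SAMOEA population).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: Sequence[Point]) -> List[int]:
    """Rank 1 for the non-dominated front, rank 2 for the next front, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# (1 - non-dominated fraction, 1 - unique fraction) for the ten Maybridge settings
maybridge = [(0.9, 0.986), (0.914, 0.961), (0.934, 0.9533), (0.928, 0.955),
             (0.982, 0.848), (0.921, 0.956), (0.95, 0.946), (0.976, 0.928),
             (0.914, 0.973), (0.9312, 0.956)]
print(pareto_ranks(maybridge))  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```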
82. Validation of Self-Adaptive MOEA: Results - Search Settings (2)
SAMOEA's top 10 proposed settings for eMEGA on the Asinex dataset
Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non-Dominated % | Unique Solutions % | Rank
0.105 | 1.0 | best | phenotype | 0.988 | 0.931 | 1
0.139 | 0.963 | tournament | phenotype | 0.962 | 0.956 | 1
0.089 | 0.694 | tournament | genotype | 0.976 | 0.943 | 1
0.139 | 0.969 | best | phenotype | 0.96 | 0.96 | 1
0.108 | 0.69 | tournament | genotype | 0.955 | 0.962 | 1
0.1 | 1.0 | best | phenotype | 0.988 | 0.942 | 1
0.088 | 0.685 | tournament | genotype | 0.96 | 0.962 | 1
0.139 | 0.966 | roulette | phenotype | 0.965 | 0.948 | 1
0.089 | 0.709 | tournament | genotype | 0.964 | 0.957 | 2
Note: The values for 'Non-Dominated %' and 'Unique Solutions %' are 1 minus the actual fraction, so smaller values are better. 'Rank' is the non-dominance rank.
83. Use Case 1: About
Design molecules that bind to ER-α based on:
Structural similarity to Tamoxifen, and
Structural dissimilarity to Ibuproxam.
(a) Tamoxifen. (b) Ibuproxam.
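Under the minimisation convention used in the result tables, the two Use Case 1 objectives can be written as follows, assuming a similarity measure T in [0, 1] between molecules (for example Tanimoto on fingerprints; the slides do not name the measure):

```latex
\min_{x}\; \bigl(f_1(x),\, f_2(x)\bigr), \qquad
f_1(x) = 1 - T\bigl(x,\, x_{\text{Tamoxifen}}\bigr), \qquad
f_2(x) = T\bigl(x,\, x_{\text{Ibuproxam}}\bigr)
```

Driving f_1 down rewards structural similarity to Tamoxifen, while driving f_2 down rewards structural dissimilarity to Ibuproxam.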
84. Use Case 1: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (substances with “clean” reactivity),
In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
Now (immediate delivery; includes in-stock and agent).
The collection contains 7035 molecules.
90. Use Case 1: Results - In objective space
91. Use Case 1: Results - Designed molecules
92. Use Case 1: Results - AutoDock Vina docking
Molecule Id | Docking Affinity (kcal/mol)
Tamoxifen | -8.2
DnD 6 SP 20 4 X 13a | -7.9
DnD 31 SP 150 37 M 19 | -7.9
DnD 8 SP 9 2 M 13 | -7.8
DnD 4 SP 199 49 X 46b | -7.7
DnD 12 SP 75 18 M 13 | -7.6
DnD 31 SP 6 1 M 16 | -7.2
DnD 15 SP 168 41 M 0 | -7.2
DnD 11 SP 74 18 M 4 | -7.1
DnD 31 SP 193 48 X 76b | -6.9
DnD 1 SP 78 19 X 84a | -6.8
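A table like the one above can be assembled from AutoDock Vina's output files, which record one "REMARK VINA RESULT:" line per pose with the affinity (kcal/mol) as its first value. The Python sketch below only post-processes such output and compares designed molecules to the reference ligand; the receptor preparation and search box used in the thesis are not given, and the sample PDBQT lines are illustrative, not real output from this study.

```python
import re
from typing import Dict, List, Tuple

# Vina writes one "REMARK VINA RESULT:" line per pose in its output PDBQT;
# the first number on that line is the predicted affinity in kcal/mol.
VINA_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(-?\d+(?:\.\d+)?)", re.MULTILINE)


def best_affinity(pdbqt_text: str) -> float:
    """Return the most negative (best) affinity among all poses in a Vina output file."""
    scores = [float(s) for s in VINA_RESULT.findall(pdbqt_text)]
    if not scores:
        raise ValueError("no 'REMARK VINA RESULT' lines found")
    return min(scores)


def rank_against_reference(affinities: Dict[str, float],
                           reference: str) -> List[Tuple[str, float, float]]:
    """Sort molecules best-first and report each one's gap to the reference (kcal/mol)."""
    ref = affinities[reference]
    rows = [(name, score, round(score - ref, 2)) for name, score in affinities.items()]
    return sorted(rows, key=lambda row: row[1])


example_output = "\n".join([
    "MODEL 1",
    "REMARK VINA RESULT:      -7.9      0.000      0.000",
    "ENDMDL",
    "MODEL 2",
    "REMARK VINA RESULT:      -7.4      1.832      2.511",
    "ENDMDL",
])
print(best_affinity(example_output))  # -7.9

table = {"Tamoxifen": -8.2, "DnD 6 SP 20 4 X 13a": -7.9, "DnD 1 SP 78 19 X 84a": -6.8}
for name, score, gap in rank_against_reference(table, "Tamoxifen"):
    print(f"{name}: {score} kcal/mol ({gap:+.2f} vs Tamoxifen)")
```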
93. Use Case 1: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non-Dominated % | Pareto Hypervolume | Rank
0.15777 | 0.80279 | tournament | genotype | 0.634 | 0.341 | 1
0.15613 | 0.88305 | tournament | genotype | 0.634 | 0.341 | 1
0.15627 | 0.88891 | tournament | genotype | 0.634 | 0.341 | 1
0.15688 | 0.88891 | roulette | genotype | 0.649 | 0.340 | 1
0.00552 | 0.94308 | best | genotype | 0.624 | 0.427 | 1
Note: The values for 'Non-Dominated %' are 1 minus the actual fraction, so smaller values are better. 'Rank' is the non-dominance rank.
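The Pareto hypervolume measures how much of the objective space a front covers. For two minimised objectives it reduces to a sweep over the points sorted by the first objective, adding one rectangle per improvement in the second. The Python sketch below assumes objectives normalised to [0, 1] and a reference point of (1, 1); the slides give neither the normalisation nor the reference point that was used.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point = (1.0, 1.0)) -> float:
    """Area dominated by 2-objective points (both minimised) and bounded by ref."""
    # Keep only points that improve on the reference point in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:        # sweep in increasing f1
        if f2 < best_f2:      # dominated points add no area
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


front = [(0.2, 0.6), (0.5, 0.3), (0.6, 0.7)]  # toy points; (0.6, 0.7) is dominated
print(round(hypervolume_2d(front), 4))          # 0.47
```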
94. Use Case 3: About
Design molecules that bind to ER-α based on:
Structural similarity to Raloxifene, and
Chemical property similarity to Raloxifene.
Figure: Raloxifene.
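Chemical property similarity can be scored by comparing descriptor vectors. The slides do not list the descriptors or the metric, so the Python sketch below is purely illustrative: it min-max scales each descriptor using assumed bounds and maps the Euclidean distance to a similarity in (0, 1]. The descriptor choice, bounds and values are hypothetical and are not Raloxifene's.

```python
import math
from typing import List, Sequence


def min_max_scale(values: Sequence[float], lows: Sequence[float],
                  highs: Sequence[float]) -> List[float]:
    """Scale each descriptor into [0, 1] using assumed per-descriptor bounds."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs)]


def descriptor_similarity(a: Sequence[float], b: Sequence[float],
                          lows: Sequence[float], highs: Sequence[float]) -> float:
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance of the scaled descriptor vectors)."""
    distance = math.dist(min_max_scale(a, lows, highs), min_max_scale(b, lows, highs))
    return 1.0 / (1.0 + distance)


# Hypothetical descriptors: (molecular weight, logP, H-bond donors, H-bond acceptors)
lows, highs = [0.0, -5.0, 0.0, 0.0], [1000.0, 10.0, 10.0, 20.0]
reference = [450.0, 5.0, 2.0, 5.0]
designed = [420.0, 4.0, 2.0, 6.0]
print(round(descriptor_similarity(reference, designed, lows, highs), 3))
```

Scaling first keeps a large-valued descriptor such as molecular weight from dominating the distance.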
95. Use Case 3: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (substances with “clean” reactivity),
In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
Now (immediate delivery; includes in-stock and agent).
The collection contains 7035 molecules.
96. Use Case 3: Results - In objective space
97. Use Case 3: Results - Designed molecules
98. Use Case 3: Results - AutoDock Vina docking
Molecule Id | Docking Affinity (kcal/mol)
DnD 31 SP 194 48 M 49 | -8.2
DnD 34 SP 197 49 X 13a | -5.9
Raloxifene | -2.2 (-11.70 PubChem)
99. Use Case 3: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non-Dominated % | Pareto Hypervolume | Rank
0.12927 | 0.98597 | roulette | genotype | 0.997 | 0.274 | 1
0.12897 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1
0.12933 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1
0.12946 | 0.98559 | roulette | genotype | 0.997 | 0.274 | 1
0.12928 | 0.98582 | roulette | genotype | 0.997 | 0.274 | 1
0.12897 | 0.98588 | tournament | genotype | 0.997 | 0.274 | 1
Note: The values for 'Non-Dominated %' are 1 minus the actual fraction, so smaller values are better. 'Rank' is the non-dominance rank.
100. Use Case 4: About
Design molecules that bind to Proteasome B5 based on:
Structural similarity to Ixazomib, and
Chemical property similarity to Ixazomib.
Figure: Ixazomib.
101. Use Case 4: Starting Dataset
Starting Molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (substances with “clean” reactivity),
In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
Now (immediate delivery; includes in-stock and agent).
The collection contains 7035 molecules.
102. Use Case 4: Results - In objective space
103. Use Case 4: Results - Designed molecules
104. Use Case 4: Results - AutoDock 4 docking
Molecule Id | Docking Affinity (kcal/mol)
DnD 19 SP 196 48 X 59b | -7.19
DnD 49 SP 193 48 X 123b | -6.68
DnD 1 SP 196 48 X 67a | -6.08
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 50 / 130
105. Use Case 4: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation Prob.  Crossover Prob.  Selection Type  Diversity Type  Non-Dominated %  Pareto Hypervolume  Rank
0.09507         0.98194          tournament      phenotype       0.993            0.442               1
0.09507         0.9819           roulette        phenotype       0.991            0.442               1
0.09471         0.98178          roulette        genotype        0.997            0.426               1
0.09484         0.98183          roulette        phenotype       0.996            0.441               1
0.09277         0.98235          roulette        genotype        0.996            0.441               1
Note: The values in the 'Non-Dominated %' column are 1 minus the actual percentage, so
smaller values are better. 'Rank' is the non-dominance rank.
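The 'Pareto Hypervolume' column is not defined further on the slides (objectives, normalisation, and reference point are not given). For two minimised objectives, the hypervolume is the area dominated by a front and bounded by a reference point; the standard sweep below computes it. The reference point and the function name are assumptions, and this is not necessarily how SAMOEA computes the value.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives minimised), bounded by `ref`."""
    # Only points strictly better than the reference point contribute area.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume = 0.0
    best_y = ref[1]
    for x, y in pts:  # sweep in increasing order of the first objective
        if y < best_y:  # points not improving the second objective are dominated
            volume += (ref[0] - x) * (best_y - y)
            best_y = y
    return volume
```

For example, the front [(0.2, 0.6), (0.5, 0.3)] with reference point (1.0, 1.0) covers 0.32 + 0.15 = 0.47; adding the dominated point (0.6, 0.7) leaves the value unchanged.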
106. Self-Adaptive MOEA Showcases Discussion
SAMOEA proposed promising solutions in all the problems to
which it was applied,
Further in vitro investigation is required, and
SAMOEA's proposed eMEGA settings differ by
problem and dataset (no silver bullet).
110. Table of Contents
1 Introduction
Scientific Workflow Management Systems
Self-Adaptive Multi-Objective Evolutionary Algorithms
Virtual Screening & De Novo Molecular Design
2 Life Sciences Informatics platform
About Life Sciences Informatics platform
LiSIs Showcase
LiSIs Showcase Discussion
3 Self-Adaptive Multi-Objective Evolutionary Algorithm
About Self-Adaptive MOEA
Self-Adaptive MOEA Showcases
Self-Adaptive MOEA Showcases Discussion
4 Concluding Remarks
Concluding Remarks - LiSIs platform
Concluding Remarks - Self-Adaptive MOEA
5 Future Work
Future Work - LiSIs platform
Future Work - Self-Adaptive MOEA
111. Concluding Remarks - LiSIs platform
A web-based Virtual Screening platform focused on
Cancer Chemoprevention research.
To be expanded in the future with tools implementing the
algorithms of the MEGA framework.
A number of SWs were implemented for:
preparing docking models,
preparing predictive models,
performing docking experiments,
using predictive models to predict biochemical properties
and behaviour, and
performing VS workflows.
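A scientific workflow of this kind can be viewed as a directed acyclic graph of steps, where each step runs only after its inputs are ready. The sketch below is a minimal illustration using Python's standard `graphlib` (3.9+); the step names and the `run_workflow` helper are hypothetical and do not reflect how LiSIs or Galaxy schedule tools internally.

```python
from graphlib import TopologicalSorter

# Hypothetical VS workflow: each step maps to the set of steps it depends on.
VS_WORKFLOW = {
    "load_library": set(),
    "filter_library": {"load_library"},
    "prepare_docking_model": set(),
    "prepare_predictive_model": set(),
    "dock": {"filter_library", "prepare_docking_model"},
    "predict_properties": {"filter_library", "prepare_predictive_model"},
    "rank_hits": {"dock", "predict_properties"},
}


def execution_order(workflow):
    """Return an order in which every step runs after all of its inputs."""
    return list(TopologicalSorter(workflow).static_order())


def run_workflow(workflow, steps):
    """Run each step, passing it a dict of its dependencies' results."""
    results = {}
    for name in execution_order(workflow):
        inputs = {dep: results[dep] for dep in workflow[name]}
        results[name] = steps[name](inputs)
    return results
```

Independent branches (docking and property prediction here) have no ordering constraint between them, which is what allows a workflow system to run them in parallel.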
114. Concluding Remarks - Self-Adaptive MOEA (1)
Drawbacks:
Long running time before termination, and
Very slow convergence.
117. Concluding Remarks - Self-Adaptive MOEA (2)
Advantages:
Searches a larger space,
Generates far more solutions per iteration,
Proposes the fittest eMEGA parameter sets for the given
problem,
Has been built to be adaptable,
Uses objective fitness functions that can evaluate the
effectiveness and progression of any MOEA,
Can be applied to other problems,
SAMOEA's chromosome can be expanded with additional
search parameters, and
Leverages multi-core parallelism (at the cost of higher memory usage).
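The idea of a SAMOEA chromosome that encodes eMEGA settings, and that can be extended with further search parameters, can be sketched as follows. The gene values (mutation and crossover probabilities, roulette/tournament/best selection, genotype/phenotype diversity) come from the settings tables; the encoding, the Gaussian and resampling mutation, and the uniform crossover are assumptions for illustration, and all names are hypothetical.

```python
import random
from dataclasses import dataclass, fields

# Categorical gene values seen in the SAMOEA settings tables.
SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")


@dataclass(frozen=True)
class EmegaSettings:
    """Hypothetical SAMOEA chromosome: one candidate eMEGA configuration."""
    mutation_prob: float
    crossover_prob: float
    selection: str
    diversity: str


def random_settings(rng):
    return EmegaSettings(rng.random(), rng.random(),
                         rng.choice(SELECTION_TYPES), rng.choice(DIVERSITY_TYPES))


def _clip01(x):
    return min(1.0, max(0.0, x))


def mutate(s, rng, gene_rate=0.25, sigma=0.05):
    """Gaussian step for real-valued genes, uniform resampling for categorical ones."""
    mp, cp, sel, div = s.mutation_prob, s.crossover_prob, s.selection, s.diversity
    if rng.random() < gene_rate:
        mp = _clip01(mp + rng.gauss(0.0, sigma))
    if rng.random() < gene_rate:
        cp = _clip01(cp + rng.gauss(0.0, sigma))
    if rng.random() < gene_rate:
        sel = rng.choice(SELECTION_TYPES)
    if rng.random() < gene_rate:
        div = rng.choice(DIVERSITY_TYPES)
    return EmegaSettings(mp, cp, sel, div)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from either parent with equal probability."""
    return EmegaSettings(**{f.name: getattr(a if rng.random() < 0.5 else b, f.name)
                            for f in fields(EmegaSettings)})
```

Under this encoding, adding a new search parameter means adding one field to `EmegaSettings` and one mutation rule; `crossover` picks up new fields automatically.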
128. Future Work - LiSIs platform
Develop LiSIs 2.0:
Based on the latest Galaxy platform, and
Tools redesigned to be compatible with Galaxy's ToolShed
for easy deployment,
Update LiSIs with a feature to visualise intermediate results
from various tools,
Expand LiSIs with tools implementing the MEGA family of
algorithms and SAMOEA,
Explore resource management in SWMSs:
Novel Multi-Objective Optimization approaches to SW design, and
Novel Multi-Objective Optimization approaches to SW scheduling.
136. Future Work - Self-Adaptive MOEA
Optimise the MEGA framework (memory management and
parallelism),
Implement a self-adaptive technique for selecting genetic
operators,
Extend Self-Adaptive MOEA to use other MOEAs,
Implement models for other problems, and
Implement new objective functions.
143. Table of Contents
6 List of Publications
7 References
8 Backup Frames
Validation of Self-Adaptive MOEA
Use Case 1
Use Case 2
Use Case 3
Use Case 4
144. List of Publications I
Book Chapters
C. A. Nicolaou and C. C. Kannas, “Molecular Library
Design Using Multi-Objective Optimization
Methods,” in Chemical Library Design, J. Z. Zhou, Ed.
Humana Press, 2011, pp. 53–69.
Journals
C. Kannas et al., “LiSIs: An Online Scientific Workflow
System for Virtual Screening,” Combinatorial Chemistry
& High Throughput Screening, vol. 18, no. 3, pp. 281–295,
Mar. 2015.
C. A. Nicolaou, C. Kannas, and E. Loizidou,
“Multi-objective optimization methods in de novo
drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp.
979–987, Sep. 2012.
145. List of Publications II
C. Nicolaou, C. Kannas, and C. Pattichis,
“Knowledge-driven multi-objective de novo drug
design,” Chemistry Central Journal, vol. 3, p. P22, 2009.
Conferences
C. C. Kannas and C. S. Pattichis, “Self-Adaptive
Multi-Objective Evolutionary Algorithm for
Molecular Design,” in 30th IEEE International
Symposium on Computer-Based Medical Systems,
Thessaloniki, Greece, 22-24 June 2017, pp. 1–6.
P. Hasapis et al., “Molecular clustering via knowledge
mining from biomedical scientific corpora,” in 2013
IEEE 13th International Conference on Bioinformatics and
Bioengineering (BIBE), 2013, pp. 1–5.
146. List of Publications III
C. C. Kannas et al., “A workflow system for virtual
screening in cancer chemoprevention,” in 2012 IEEE
12th International Conference on Bioinformatics &
Bioengineering (BIBE), 2012, pp. 439–446.
K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S.
Pattichis, and V. J. Promponas, “Open source workflow
systems in life sciences informatics,” in 2012 IEEE 12th
International Conference on Bioinformatics & Bioengineering
(BIBE), 2012, pp. 552–558.
C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal
graph design using a knowledge-driven
multi-objective evolutionary graph algorithm,” in
2009 9th International Conference on Information
Technology and Applications in Biomedicine, Larnaka,
Cyprus, 2009, pp. 1–6.
147. List of Publications IV
C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A
Parallel implementation of a Multi-objective
Evolutionary Algorithm,” in 2009 9th International
Conference on Information Technology and Applications in
Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
Abstracts
C. C. Kannas and C. S. Pattichis, “Self-Adaptive
Multi-Objective Evolutionary Algorithm for
Molecular Design,” in 39th Annual International
Conference of the IEEE Engineering in Medicine and Biology
Society, Jeju Island, Korea, 11-15 July 2017.
150. References I
Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G.
(2013). LiGen: A High Performance Workflow for Chemistry
Driven de Novo Design. Journal of Chemical Information and
Modeling.
Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G.,
Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010).
Galaxy: A Web-Based Genome Analysis Tool for
Experimentalists. In Current Protocols in Molecular Biology.
John Wiley & Sons, Inc.
Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for
Efficient De Novo Design of Multi-functional Molecules.
Molecular Informatics, pages n/a–n/a.
151. References II
Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010).
NovoFLAP: A ligand-based de novo design approach for the
generation of medicinally relevant ideas. Journal of Chemical
Information and Modeling, 50(7):1296–1303.
Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand
design by multiobjective evolutionary optimization. Journal of
Chemical Information and Modeling, 48(3):679–690.
Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving
molecules using multi-objective optimization: applying to
ADME/Tox. Drug Discovery Today, 15(11-12):451–460.
152. References III
Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders,
J. (2008). The use of ligand-based de novo design for scaffold
hopping and sidechain optimization: two case studies. Bioorganic
& Medicinal Chemistry, 16(1):422–427.
Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015).
MOARF, an Integrated Workflow for Multiobjective
Optimization: Implementation, Synthesis, and Biological
Evaluation. Journal of Chemical Information and Modeling.
Fonseca, C. and Fleming, P. (1998). Multiobjective optimization
and multiple constraint handling with evolutionary algorithms. I.
A unified formulation. IEEE Transactions on Systems, Man and
Cybernetics, Part A: Systems and Humans, 28(1):26–37.
153. References IV
Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski,
L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J.,
Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A
Platform for Interactive Large-Scale Genome Analysis. Genome
Research, 15(10):1451–1455.
Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T.
(2010). Galaxy: A comprehensive approach for supporting
accessible, reproducible, and transparent computational research
in the life sciences. Genome Biology, 11(8):R86.
Grefenstette, J. (1986). Optimization of Control Parameters for
Genetic Algorithms. IEEE Transactions on Systems, Man and
Cybernetics, 16(1):122–128.
154. References V
Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J.
H. N. (2005). A novel microplate reader-based high-throughput
assay for estrogen receptor binding. International Journal of
Environmental Analytical Chemistry, 85(3):149–161.
Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F.,
Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012).
DOGS: Reaction-Driven de novo Design of Bioactive
Compounds. PLoS Comput Biol, 8(2):e1002380.
Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new
pharmacophore-based de novo design method of drug-like
molecules combined with assessment of synthetic accessibility.
Journal of Molecular Graphics and Modelling, 28(8):775–787.
155. References VI
Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva,
C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D.,
Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser,
C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs:
An Online Scientific Workflow System for Virtual Screening.
Combinatorial Chemistry & High Throughput Screening,
18(3):281–295.
Kramer, O. (2010). Evolutionary self-adaptation: a survey of
operators and strategy parameters. Evolutionary Intelligence,
3(2):51–65.
156. References VII
Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG:
Fragment Optimized Growth algorithm for the de novo
generation of molecules occupying druglike chemical space.
Journal of Chemical Information and Modeling, 49(7):1630–1642.
Medina-Franco, J. L., López-Vallejo, F., Kuck, D., and Lyko, F.
(2010). Natural products as DNA methyltransferase inhibitors: a
computer-aided discovery approach. Molecular Diversity,
15:293–304.
Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De
Novo Drug Design Using Multiobjective Evolutionary Graphs.
Journal of Chemical Information and Modeling, 49(2):295–307.
157. References VIII
Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b).
Optimal graph design using a knowledge-driven multi-objective
evolutionary graph algorithm. In 2009 9th International
Conference on Information Technology and Applications in
Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.
162. LiSIs Showcase - Natural-like Rule of 5 filter
GRANATUM Rule of 5 filter:
1 MW between 160 and 700,
2 HBD less than or equal to 5,
3 HBA less than or equal to 10,
4 TPSA less than 140, and
5 cLogP between -0.4 and 5.6.
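The five criteria above translate directly into a predicate. This sketch assumes the descriptors are already computed by some cheminformatics toolkit, and it treats "between" as inclusive at both ends, which the slide does not state. The `Descriptors` class and `passes_natural_like_ro5` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (field names are illustrative)."""
    mw: float     # molecular weight
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_natural_like_ro5(d):
    """GRANATUM natural-like Rule of 5; 'between' bounds assumed inclusive."""
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140  # strict, as stated on the slide
        and -0.4 <= d.clogp <= 5.6
    )
```

Filtering a library is then a list comprehension over its descriptor records, keeping those for which the predicate is true.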
164. SAMOEA Settings
Table: SAMOEA experimental design settings
SAMOEA
  Datasets: Dataset 1 and Dataset 2
  Objectives: Non-dominated solutions percentage, Unique solutions percentage
  Population: 20
  Iterations: 100
  Evolutionary operations: Mutation probability 15%, Crossover probability 80%,
  Selection type Roulette, Diversity type Phenotype
eMEGA
  Datasets: Dataset 1 and Dataset 2
  Objectives: Structural similarity, Chemical descriptor similarity
  Population: 100
  Iterations: 1
  Evolutionary operations: Defined at run time, based on SAMOEA's chromosomes.
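SAMOEA scores each candidate eMEGA configuration by the non-dominated and unique solutions percentages of the resulting eMEGA population. A minimal sketch of that scoring follows, assuming the population is a list of (molecule id, objective vector) pairs whose similarity objectives are maximised, and that duplicates are detected by molecule id. Both values are returned as 1 minus the fraction, following the convention stated in the settings tables. The helper names are hypothetical.

```python
def dominates_max(a, b):
    """True if `a` Pareto-dominates `b` when every objective is maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def samoea_objectives(population):
    """Return (1 - non-dominated fraction, 1 - unique fraction) for one eMEGA run.

    `population` is a list of (molecule_id, objective_vector) pairs; SAMOEA
    would minimise both returned values.
    """
    n = len(population)
    if n == 0:
        raise ValueError("empty population")
    vectors = [vec for _, vec in population]
    non_dominated = sum(
        1 for v in vectors if not any(dominates_max(u, v) for u in vectors)
    )
    unique = len({mol_id for mol_id, _ in population})
    return 1.0 - non_dominated / n, 1.0 - unique / n
```

Identical objective vectors do not dominate each other, so duplicated molecules still count as non-dominated here; the unique-solutions objective is what penalises them.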
165. Virtual Machine Specifications
Table: Specifications of the virtual machine on which the experimental runs were
performed
Linux Virtual Machine
CPU 4x Virtual CPU @ 2GHz
RAM 16GB
OS CentOS 6
172. eMEGA Maybridge All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Maybridge dataset.
173. eMEGA Maybridge All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Maybridge dataset compared with
Seliciclib; the part of each molecule highlighted in red is the core
it shares with Seliciclib.
174. eMEGA Asinex Run 1
Figure: eMEGA Run 1 results for Asinex dataset.
175. eMEGA Asinex Run 2
Figure: eMEGA Run 2 results for Asinex dataset.
176. eMEGA Asinex Run 3
Figure: eMEGA Run 3 results for Asinex dataset.
177. Results - eMEGA Asinex Run 4
Figure: eMEGA Run 4 results for Asinex dataset.
178. eMEGA Asinex Run 5
Figure: eMEGA Run 5 results for Asinex dataset.
179. eMEGA Asinex All Runs
Figure: eMEGA results for Asinex dataset.
180. eMEGA Asinex All Runs Top 10 Results (1)
Figure: eMEGA Top 10 results for Asinex dataset.
181. eMEGA Asinex All Runs Top 10 Results (2)
Figure: eMEGA Top 10 results for Asinex dataset compared with
Seliciclib; the part of each molecule highlighted in red is the core
it shares with Seliciclib.
188. SAMOEA Maybridge All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Maybridge dataset.
189. SAMOEA Maybridge All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Maybridge dataset compared with
Seliciclib; the part of each molecule highlighted in red is the core
it shares with Seliciclib.
190. SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Mutation Prob.  Crossover Prob.  Selection Type  Diversity Type  Non-Dominated %  Unique Solutions %  Rank
0.029           0.694            roulette        genotype        0.9              0.986               1
0.175           0.818            roulette        phenotype       0.914            0.961               1
0.172           0.818            tournament      phenotype       0.934            0.9533              1
0.026           0.694            roulette        phenotype       0.928            0.955               1
0.001           0.963            roulette        phenotype       0.982            0.848               1
0.177           0.818            roulette        phenotype       0.921            0.956               1
0.083           0.73             tournament      phenotype       0.95             0.946               1
0.086           0.798            tournament      genotype        0.976            0.928               1
0.172           0.818            best            genotype        0.914            0.973               2
0.176           0.818            roulette        genotype        0.9312           0.956               2
Note: The values in the 'Non-Dominated %' and 'Unique Solutions %' columns are 1 minus
the actual percentage, so smaller values are better. 'Rank' is the non-dominance rank.
191. SAMOEA Asinex Run 1
Figure: SAMOEA Run 1 results for Asinex dataset.
192. SAMOEA Asinex Run 2
Figure: SAMOEA Run 2 results for Asinex dataset.
193. SAMOEA Asinex Run 3
Figure: SAMOEA Run 3 results for Asinex dataset.
194. SAMOEA Asinex Run 4
Figure: SAMOEA Run 4 results for Asinex dataset.
196. SAMOEA Asinex All Runs Top 10 Results (1)
Figure: SAMOEA Top 10 results for Asinex dataset.
197. SAMOEA Asinex All Runs Top 10 Results (2)
Figure: SAMOEA Top 10 results for Asinex dataset compared with
Seliciclib; the part of each molecule highlighted in red is the core
it shares with Seliciclib.
198. SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Mutation Prob.  Crossover Prob.  Selection Type  Diversity Type  Non-Dominated %  Unique Solutions %  Rank
0.105           1.0              best            phenotype       0.988            0.931               1
0.139           0.963            tournament      phenotype       0.962            0.956               1
0.089           0.694            tournament      genotype        0.976            0.943               1
0.139           0.969            best            phenotype       0.96             0.96                1
0.108           0.69             tournament      genotype        0.955            0.962               1
0.1             1.0              best            phenotype       0.988            0.942               1
0.088           0.685            tournament      genotype        0.96             0.962               1
0.139           0.966            roulette        phenotype       0.965            0.948               1
0.089           0.709            tournament      genotype        0.964            0.957               2
Note: The values in the 'Non-Dominated %' and 'Unique Solutions %' columns are 1 minus
the actual percentage, so smaller values are better. 'Rank' is the non-dominance rank.
200. Compare SAMOEA, eMEGA and MOARF
Figure: Compare all Top 10 results with MOARF’s results and
Seliciclib.
201. Discussion (1)
eMEGA and SAMOEA generate molecules that approximate
Seliciclib,
Each dataset and algorithm yields a different common core with
Seliciclib,
MOARF approximates Seliciclib better than eMEGA and
SAMOEA, because it:
Generates molecules in a more chemistry-oriented way, with
fewer stochastic operations, and
Starts from a core selected for the target and then attaches
new fragments onto it,
SAMOEA explores the search space better than eMEGA and
MOARF.
207. Discussion (2)
From the SAMOEA-proposed eMEGA settings tables, we can see that different settings are favoured for each dataset.
Maybridge dataset:
Mutation probability around 17%,
Crossover probability around 80%,
Selection type: either roulette or tournament, and
Diversity type: both options are valid.
Asinex dataset:
Mutation probability around 10%,
Crossover probability around 96%,
Selection type: either best or tournament, and
Diversity type: both options are valid.
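The settings that SAMOEA tunes for eMEGA can be written as a small configuration record. This makes the search space concrete. The sketch assumes that the four parameters listed above fully describe one eMEGA configuration. The class `EMEGASettings`, its field names and the `validate` helper are hypothetical. The option sets contain only the selection types named in these slides and the diversity types "genotype" and "phenotype" from the Use Case 2 table; eMEGA may support others. The probabilities are the approximate values quoted above, not exact table entries.

```python
from dataclasses import dataclass

# Option sets as named in the slides; eMEGA may support further options (assumption).
SELECTION_TYPES = {"best", "roulette", "tournament"}
DIVERSITY_TYPES = {"genotype", "phenotype"}


@dataclass(frozen=True)
class EMEGASettings:
    """Hypothetical encoding of one eMEGA configuration proposed by SAMOEA."""
    mutation_probability: float
    crossover_probability: float
    selection_type: str
    diversity_type: str

    def validate(self) -> None:
        """Raise ValueError if any setting lies outside its allowed domain."""
        for name in ("mutation_probability", "crossover_probability"):
            p = getattr(self, name)
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name}={p} is not a probability")
        if self.selection_type not in SELECTION_TYPES:
            raise ValueError(f"unknown selection type {self.selection_type!r}")
        if self.diversity_type not in DIVERSITY_TYPES:
            raise ValueError(f"unknown diversity type {self.diversity_type!r}")


# Approximate favoured settings per dataset, as summarised above:
# two acceptable selection types each, and both diversity types.
FAVOURED = {
    "Maybridge": [EMEGASettings(0.17, 0.80, sel, div)
                  for sel in ("roulette", "tournament")
                  for div in sorted(DIVERSITY_TYPES)],
    "Asinex": [EMEGASettings(0.10, 0.96, sel, div)
               for sel in ("best", "tournament")
               for div in sorted(DIVERSITY_TYPES)],
}

for dataset, candidates in FAVOURED.items():
    for settings in candidates:
        settings.validate()
    print(dataset, len(candidates), "favoured configurations")
```

The mutation and crossover probabilities are continuous, and the selection and diversity types are categorical. A self-adaptive layer therefore has to search over a mixed space of this kind.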
217. Discussion (3)
The objective fitness scores for the proposed settings are very high. Since these scores are 1 minus the actual percentage of unique non-dominated solutions, the actual percentage is very low, below 5%. From this we can conclude the following:
eMEGA instances generate a large number of identical solutions despite having different configurations; we also noticed this in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b], and
The objective fitness functions we chose to use in SAMOEA compete with each other, which means that getting eMEGAs to generate a high proportion of unique and non-dominated solutions (above 20%) proves to be a difficult task.
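The "Non Dominated %" objective and its 1-minus convention can be illustrated with a short sketch. The sketch makes three hypothetical assumptions, none of which the slides confirm:

* Each generated solution is identified by a molecule key, for example a canonical SMILES string.
* All objectives are minimised.
* The percentage is taken over all solutions an eMEGA run produced, duplicates included.

The function names are illustrative. The returned value follows the 1-minus convention, so smaller is better.

```python
from typing import Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_score(solutions: Sequence[Tuple[str, Objectives]]) -> float:
    """Return 1 - (unique non-dominated solutions / all generated solutions).

    `solutions` holds (molecule_key, objectives) pairs; repeated keys are
    duplicates of the same molecule and count only once in the numerator.
    """
    if not solutions:
        raise ValueError("no solutions to score")
    unique = dict(solutions)  # collapse duplicates: one entry per molecule key
    front = [key for key, obj in unique.items()
             if not any(dominates(other, obj) for other in unique.values())]
    return 1.0 - len(front) / len(solutions)


# Toy run: five solutions, but only three distinct molecules (A, B, C);
# C is dominated by A, so two of them are non-dominated: score 1 - 2/5.
run = [("A", (0.2, 0.5)), ("A", (0.2, 0.5)), ("B", (0.3, 0.4)),
       ("C", (0.4, 0.6)), ("A", (0.2, 0.5))]
print(non_dominated_score(run))
```

Under this convention, the 20% target corresponds to a score of 0.8 or lower. Scores above 0.95 mean that fewer than 5% of the generated solutions were both unique and non-dominated. In the toy run, the duplicates of A inflate the denominator, which is the effect described above.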
219. Use Case 1: Docked designed molecules (1)
Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.
220. Use Case 1: Docked designed molecules (2)
Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.
221. Use Case 1: Docked designed molecules (3)
Figure: Designed molecule DnD 8 SP 9 2 M 13 docked to ER-α.
222. Use Case 1: Docked designed molecules (4)
Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.
223. Use Case 1: Docked designed molecules (5)
Figure: Designed molecule DnD 12 SP 75 18 M 13 docked to ER-α.
224. Use Case 1: Docked designed molecules (6)
Figure: Designed molecule DnD 31 SP 6 1 M 16 docked to ER-α.
225. Use Case 1: Docked designed molecules (7)
Figure: Designed molecule DnD 15 SP 168 41 M 0 docked to ER-α.
226. Use Case 1: Docked designed molecules (8)
Figure: Designed molecule DnD 11 SP 74 18 M 4 docked to ER-α.
227. Use Case 1: Docked designed molecules (9)
Figure: Designed molecule DnD 31 SP 193 48 X 76b docked to ER-α.
228. Use Case 1: Docked designed molecules (10)
Figure: Designed molecule DnD 1 SP 78 19 X 84a docked to ER-α.
229. Use Case 2: About
Design molecules that bind to ER-α based on:
Structural similarity to Tamoxifen, and
Chemical property similarity to Tamoxifen.
Figure: Tamoxifen.
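Common similarity measures can illustrate the two design criteria. The slides do not say which measures were used, so both functions here are stand-ins:

* The Tanimoto coefficient on fingerprint bit sets stands in for structural similarity.
* A distance-based score on pre-scaled property vectors stands in for property similarity.

The fingerprints and property values in the usage lines are toy numbers, not Tamoxifen data.

```python
import math
from typing import AbstractSet, Sequence


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def property_similarity(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Map the Euclidean distance between pre-scaled property vectors into (0, 1]."""
    if len(p_a) != len(p_b):
        raise ValueError("property vectors must have the same length")
    return 1.0 / (1.0 + math.dist(p_a, p_b))


# Toy inputs only: not real fingerprints or properties of Tamoxifen.
reference_fp, candidate_fp = {1, 4, 7, 9}, {1, 4, 8}
reference_props, candidate_props = (0.0, 0.0), (3.0, 4.0)
print(tanimoto(reference_fp, candidate_fp))  # 2 shared bits / 5 bits in the union
print(property_similarity(reference_props, candidate_props))  # distance 5 -> 1/6
```

Each criterion yields a separate score in [0, 1]. This suits a multi-objective design setting, where the scores are kept as separate objectives rather than merged into one.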
230. Use Case 2: Starting Dataset
Starting molecules dataset:
Molecules retrieved from ZINC15,
Applied filters:
Clean (substances with "clean" reactivity),
In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
Now (immediate delivery; includes in-stock and agent).
The collection contains 7035 molecules.
231. Use Case 2: Results - In objective space
232. Use Case 2: Results - Designed molecules
233. Use Case 2: Results - AutoDock Vina docking
Molecule Id               Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b    -10.1
DnD 17 SP 199 49 M 4      -10
DnD 33 SP 189 47 X 66b    -9.9
DnD 48 SP 193 48 M 5      -9.6
Tamoxifen                 -8.2
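One way to read the docking table is to express each designed molecule's score relative to Tamoxifen. The affinities below are copied from the table. The helper `gains_over_reference` is hypothetical. It only restates AutoDock Vina scores, where more negative means stronger predicted binding; these are not measured affinities.

```python
from typing import Dict, List, Tuple

# AutoDock Vina affinities (kcal/mol) copied from the table above.
affinities: Dict[str, float] = {
    "DnD 42 SP 194 48 X 96b": -10.1,
    "DnD 17 SP 199 49 M 4": -10.0,
    "DnD 33 SP 189 47 X 66b": -9.9,
    "DnD 48 SP 193 48 M 5": -9.6,
}
TAMOXIFEN = -8.2


def gains_over_reference(scores: Dict[str, float], reference: float) -> List[Tuple[str, float]]:
    """Return (molecule, reference - score) pairs, strongest predicted binder first."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # most negative first
    return [(mol, round(reference - score, 2)) for mol, score in ranked]


gains = gains_over_reference(affinities, TAMOXIFEN)
for mol, gain in gains:
    print(f"{mol}: {gain} kcal/mol more favourable than Tamoxifen")
```

All four designed molecules score between 1.4 and 1.9 kcal/mol more favourably than Tamoxifen. Docking scores are approximate, so such differences indicate candidates worth testing rather than confirmed binders.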
234. Use Case 2: Results - Self-Adaptive MOEA non-dominated settings for eMEGA
Mutation       Crossover      Selection      Diversity      Non Dominated  Pareto         Rank
Probability    Probability    Type           Type           %              Hypervolume
0.02707        0.97973        tournament     genotype       0.983          0.153          1
0.02758        0.97965        tournament     phenotype      0.988          0.152          1
Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here, the better. 'Rank' is their non-dominance rank.
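A naive non-dominated sorting routine can reproduce the "Rank" column. This sketch assumes that both "Non Dominated %" and "Pareto Hypervolume", as listed, are to be minimised. The slides state this only for the first column, but the assumption is consistent with both rows sharing rank 1. The `hypothetical` entry is invented to show a second front and is not a SAMOEA result. The routine is a generic technique, not necessarily the one SAMOEA uses.

```python
from typing import Dict, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominance_ranks(points: Dict[str, Tuple[float, ...]]) -> Dict[str, int]:
    """Rank 1 for the non-dominated front, rank 2 for the next front, and so on."""
    remaining = dict(points)
    ranks: Dict[str, int] = {}
    rank = 1
    while remaining:
        # A point is on the current front if no other remaining point dominates it.
        front = [k for k, p in remaining.items()
                 if not any(dominates(q, p) for j, q in remaining.items() if j != k)]
        for k in front:
            ranks[k] = rank
            del remaining[k]
        rank += 1
    return ranks


# (Non Dominated %, Pareto Hypervolume) from the table, plus one invented point.
settings = {
    "tournament/genotype": (0.983, 0.153),
    "tournament/phenotype": (0.988, 0.152),
    "hypothetical": (0.990, 0.160),  # not a SAMOEA result; dominated by both rows
}
ranks = non_dominance_ranks(settings)
print(ranks)
```

Each of the two table rows is better in one objective, so neither dominates the other and both receive rank 1. The peeling loop is quadratic per front, which is adequate for a handful of settings. For large populations, faster schemes such as the fast non-dominated sort used in NSGA-II exist.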
235. Use Case 2: Docked designed molecules (1)
Figure: Designed molecule DnD 42 SP 194 48 X 96b docked to ER-α.
236. Use Case 2: Docked designed molecules (2)
Figure: Designed molecule DnD 17 SP 199 49 M 4 docked to ER-α.
237. Use Case 2: Docked designed molecules (3)
Figure: Designed molecule DnD 33 SP 189 47 X 66b docked to ER-α.
238. Use Case 2: Docked designed molecules (4)
Figure: Designed molecule DnD 48 SP 193 48 M 5 docked to ER-α.
239. Use Case 3: Docked designed molecules (1)
Figure: Designed molecule DnD 31 SP 194 48 M 49 docked to ER-α.
240. Use Case 3: Docked designed molecules (2)
Figure: Designed molecule DnD 34 SP 197 49 X 13a docked to ER-α.
241. Use Case 4: Docked designed molecules (1)
Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.
242. Use Case 4: Docked designed molecules (2)
Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.
243. Use Case 4: Docked designed molecules (3)
Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.