CKannas PhD Thesis Slides

1. Scientific Workflow Systems and Multi-Objective Evolutionary Algorithms for Life Sciences Informatics
Christos C. Kannas
Computer Science, University of Cyprus
6th June 2017

2. Table of Contents
1 Introduction
  - Scientific Workflow Management Systems
  - Self-Adaptive Multi-Objective Evolutionary Algorithms
  - Virtual Screening & De Novo Molecular Design
2 Life Sciences Informatics platform
  - About Life Sciences Informatics platform
  - LiSIs Showcase
  - LiSIs Showcase Discussion
3 Self-Adaptive Multi-Objective Evolutionary Algorithm
  - About Self-Adaptive MOEA
  - Self-Adaptive MOEA Showcases
  - Self-Adaptive MOEA Showcases Discussion
4 Concluding Remarks
  - Concluding Remarks - LiSIs platform
  - Concluding Remarks - Self-Adaptive MOEA
5 Future Work
  - Future Work - LiSIs platform
  - Future Work - Self-Adaptive MOEA
C. C. Kannas (CS, UCY) Ph.D Thesis 06-06-2017 1 / 130

3. Introduction

5. Scientific Workflow Management Systems

6. SWMSs Application Domains

7. Self-Adaptive Multi-Objective Evolutionary Algorithms
Multi-Objective Evolutionary Algorithms:
- Family of algorithms inspired by nature:
  - Evolve a population
  - Mutation and Crossover
  - Select fittest individuals by Pareto ranking
- Handle 1 to 3 objectives
Self-Adaptive Techniques:
- Optimise search parameters:
  - Population Size
  - Mutation Rate
  - Crossover Rate
  - Generation Gap
  - Scaling Window
- Optimise reproduction operators:
  - Mutation Operator(s)
  - Crossover Operator(s)
  - Parent Selection Operator
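
Selecting the fittest individuals by Pareto ranking can be illustrated with a minimal sketch. It assumes every objective is minimised and uses a front-peeling scheme (rank 1 is the non-dominated set, rank 2 is what becomes non-dominated once rank 1 is removed, and so on). MOGA-style ranking counts dominators instead, so this is an illustration, not necessarily the ranking used in the thesis. The names `dominates`, `pareto_rank` and the sample population are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one
    (all objectives are assumed to be minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_rank(points: List[Sequence[float]]) -> List[int]:
    """Peel off successive non-dominated fronts and return the rank of each point."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Made-up two-objective population, for illustration only.
population = [(0.2, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.95, 0.95)]
ranks = pareto_rank(population)
print(ranks)
```

In this toy population the first three points trade off against each other and share rank 1, while the last two are dominated and receive higher ranks.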

13. Drug Discovery Process - Steps

14. Drug Discovery Process - Timeline

15. Life Sciences Informatics platform

17. Motivation & Objectives
Motivation:
- Provide an easy-to-use web-based platform,
- Focused on Virtual Screening (VS) of natural products, and
- Aimed towards cancer chemoprevention researchers.
Objectives:
- Design and develop a web-based Scientific Workflow Management System (SWMS),
- Provide tools for VS, and
- Evaluate it on use cases for identifying novel chemopreventive agents.

23. Scientific Workflow Management Systems for Virtual Screening

| License | Application | Technology | Scientific Field(s) |
|---|---|---|---|
| Open Source | Taverna | Java | Bioinformatics, Chemistry, Astronomy, Data Mining, Text Mining, Music |
| Open Source | Galaxy | Python | Life Sciences, Bioinformatics |
| Open Source | Knime | Java | Life Sciences, Chemoinformatics, Bioinformatics, High Performance Data Analysis |
| Commercial | InforSense/DiscoveryNet | | Life Sciences, Healthcare, Environmental Monitoring, Geo-hazard Modelling |
| Commercial | Pipeline Pilot | | Biology, Chemistry, Material Science |

24. Funding Support
- The work has been partially supported through the EU-FP7 GRANATUM project, “A Social Collaborative Working Space Semantically Interlinking Biomedical Researchers, Knowledge and data for the design and execution of In Silico Models and Experiments in Cancer Chemoprevention”, contract number 270139.
- The work also supported the research of the EU-FP7 Linked2Safety project, “A Next-Generation, Secure Linked Data Medical Information Space For Semantically-Interconnecting Electronic Health Records and Clinical Trials Systems Advancing Patients Safety In Clinical Research”, contract number 288328.

26. Life Sciences Informatics platform
- Life Sciences Informatics (LiSIs) is a web-based SWMS for VS [Kannas et al., 2015].
- LiSIs is based on the Galaxy SWMS [Goecks et al., 2010], [Blankenberg et al., 2010], [Giardine et al., 2005].

28. LiSIs modules

32. LiSIs Showcase

33. LiSIs Showcase Information
LiSIs was successfully used to discover promising agents with chemopreventive properties that are able to bind to Estrogen Receptor-α (ER-α) and/or Estrogen Receptor-β (ER-β).
Datasets:
- 2414 compounds from Indofine,
- 55 compounds characterized by Medina-Franco et al. [Medina-Franco et al., 2010], and
- 21 known ER ligands retrieved from PubChem.

38. LiSIs Showcase Workflow

39. LiSIs Showcase Docking Results
Figure: (a) ER-α Docking Score, (b) ER-β Docking Score.

40. LiSIs Showcase Discussion
- From the Indofine dataset (2414 compounds), we selected 18 potential ER ligands based on natural-like criteria and docking results.
- These were further investigated in vitro with the ER binding assay described by Gurer-Orhan et al. [Gurer-Orhan et al., 2005], with minor modifications.
- 15 out of 18 compounds (83.3%) were experimentally confirmed active.

43. Self-Adaptive Multi-Objective Evolutionary Algorithm

45. Multi-Objective Algorithms for Molecular Design

| Name | MO Method | Search Method | Remarks | Reference |
|---|---|---|---|---|
| EA-Inventor | Weighted | Evolutionary Algorithm | Ligand | [Feher et al., 2008] |
| GANDI | Weighted | Parallel Evolutionary Algorithm | Structure | [Dey and Caflisch, 2008] |
| FOG | Weighted | Evolutionary Algorithm | Ligand | [Kutchukian et al., 2009] |
| MEGA | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Nicolaou et al., 2009a] |
| PLD | Pareto based | Evolutionary Algorithm | ADME related properties | [Ekins et al., 2010] |
| NovoFLAP | Weighted | Evolutionary Algorithm | Ligand | [Damewood et al., 2010] |
| PhDD | Weighted | Workflow | Pharmacophore | [Huang et al., 2010] |
| DOGS | Weighted | Workflow | Ligand | [Hartenfeller et al., 2012] |
| LiGen | Weighted | Workflow | Ligand, Structure & Pharmacophore | [Beccari et al., 2013] |
| MOARF | Weighted | Workflow | Ligand & Structure | [Firth et al., 2015] |
| Synopsis | Pareto based | Evolutionary Algorithm | Ligand & Structure | [Daeyaert and Deem, 2016] |

46. Motivation & Objectives
Motivation:
- Find suitable search parameters for an algorithm in a given problem, and
- Automate this process.
Objectives:
- Design and develop an algorithm:
  - To search for the fittest search parameters of MOEAs,
  - To be problem agnostic, and
- Evaluate on our previously proposed eMEGA for molecular De Novo Design.

51. About Self-Adaptive MOEA
- Meta-level algorithmic approach influenced by Grefenstette [Grefenstette, 1986] and Kramer [Kramer, 2010],
- Main Evolutionary Algorithm (EA) is elite-MEGA (eMEGA) [Nicolaou et al., 2009a], [Nicolaou et al., 2009b],
- Meta-level EA is a modified MOGA [Fonseca and Fleming, 1998],
- Optimise eMEGA parameters:
  - Mutation Rate,
  - Crossover Rate,
  - Parent Selection Type,
  - Population Diversity Type.
- Objective fitness functions for the meta-level:
  - The percentage of non-dominated solutions each eMEGA has per iteration,
  - The percentage of unique solutions each eMEGA has per iteration,
  - The Pareto Front Hypervolume each eMEGA has per iteration.
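
The three meta-level fitness functions listed above could be computed along the following lines. This is a sketch under stated assumptions, not the thesis implementation: there are two objectives, both minimised and normalised to [0, 1], uniqueness is judged on objective vectors rather than on molecules, and the hypervolume reference point is (1, 1). All function names and the sample population are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[Point]:
    """Distinct non-dominated points, sorted by the first objective."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def non_dominated_fraction(points: List[Point]) -> float:
    """Share of population members that lie on the Pareto front."""
    front = set(non_dominated(points))
    return sum(p in front for p in points) / len(points)


def unique_fraction(points: List[Point]) -> float:
    """Share of distinct solutions in the population."""
    return len(set(points)) / len(points)


def hypervolume_2d(points: List[Point], ref: Point = (1.0, 1.0)) -> float:
    """Area dominated by the front and bounded by the reference point."""
    area = 0.0
    prev_y = ref[1]
    # Along the sorted front the second objective decreases, so each point
    # contributes one horizontal strip between prev_y and its own y value.
    for x, y in non_dominated(points):
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


# Made-up population with one duplicated, dominated member.
population = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2), (0.9, 0.9), (0.9, 0.9)]
nd = non_dominated_fraction(population)
uq = unique_fraction(population)
hv = hypervolume_2d(population)
print(nd, uq, round(hv, 3))
```

All three values fall in the range 0 to 1.0, so a meta-level EA can compare candidate parameter sets on the same scale.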

63. Self-Adaptive MOEA Pseudocode

64. Self-Adaptive MOEA Chromosome
Chromosomes Example
Objective Fitness Functions

| Objective Fitness Function | Range | Example |
|---|---|---|
| Non-dominated Solutions % | 0 - 1.0 | 0.90 |
| Unique Solutions % | 0 - 1.0 | 0.88 |
| Pareto Front Hypervolume | 0 - 1.0 | 0.56 |
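
The chromosome figure is not preserved in this transcript, so the following is only a hedged sketch of how a meta-level chromosome holding the four optimised eMEGA parameters might be encoded. The class `MetaChromosome`, the helpers `random_chromosome` and `mutate`, and the Gaussian step size `sigma` are hypothetical. Only the parameter names and the category values (roulette, tournament, best; genotype, phenotype) come from the slides.

```python
import random
from dataclasses import dataclass, replace

SELECTION_TYPES = ("roulette", "tournament", "best")
DIVERSITY_TYPES = ("genotype", "phenotype")
GENES = ("mutation_rate", "crossover_rate", "selection_type", "diversity_type")


@dataclass(frozen=True)
class MetaChromosome:
    """One candidate eMEGA parameter set (illustrative encoding)."""
    mutation_rate: float    # probability in [0, 1]
    crossover_rate: float   # probability in [0, 1]
    selection_type: str     # one of SELECTION_TYPES
    diversity_type: str     # one of DIVERSITY_TYPES


def random_chromosome(rng: random.Random) -> MetaChromosome:
    """Draw every gene uniformly at random."""
    return MetaChromosome(
        mutation_rate=rng.random(),
        crossover_rate=rng.random(),
        selection_type=rng.choice(SELECTION_TYPES),
        diversity_type=rng.choice(DIVERSITY_TYPES),
    )


def mutate(c: MetaChromosome, rng: random.Random, sigma: float = 0.05) -> MetaChromosome:
    """Perturb one randomly chosen gene (assumed operator, not from the thesis)."""
    gene = rng.choice(GENES)
    if gene in ("mutation_rate", "crossover_rate"):
        # Gaussian step, clamped so the rate stays a valid probability.
        value = min(1.0, max(0.0, getattr(c, gene) + rng.gauss(0.0, sigma)))
    elif gene == "selection_type":
        value = rng.choice(SELECTION_TYPES)
    else:
        value = rng.choice(DIVERSITY_TYPES)
    return replace(c, **{gene: value})


rng = random.Random(42)
parent = random_chromosome(rng)
child = mutate(parent, rng)
print(parent)
print(child)
```

Keeping the chromosome immutable means each meta-level individual can be handed to an independent eMEGA run without risk of shared state.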

66. eMEGA Chromosome
- Graph based, and
- Information related to evolutionary design process.

69. Self-Adaptive MOEA Flowchart

73. Self-Adaptive MOEA Showcases

74. Validation of Self-Adaptive MOEA: About
- Compare SAMOEA, eMEGA and MOARF [Firth et al., 2015].
- Design molecules that are similar in structure and chemical properties to the target molecule, Seliciclib.
Figure: Seliciclib (CYC202, R-roscovitine)

75. Validation of Self-Adaptive MOEA: Starting Datasets
Starting Molecules datasets:
- Maybridge’s Screening Library, which contains 53953 molecules (Dataset 1),
- Asinex’s Elite Libraries, which contain 104577 molecules (Dataset 2).

77. Validation of Self-Adaptive MOEA: Settings
eMEGA Settings

| Dataset | Objectives | Population | Iterations | Evolutionary Operations |
|---|---|---|---|---|
| Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype |

SAMOEA Settings

| Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations |
|---|---|---|---|---|---|
| SAMOEA | Dataset 1, Dataset 2 | Non-Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype |
| eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA’s chromosomes |

79. Validation of Self-Adaptive MOEA: Results

81. Validation of Self-Adaptive MOEA: Results - Search Settings (1)
SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank |
|---|---|---|---|---|---|---|
| 0.029 | 0.694 | roulette | genotype | 0.9 | 0.986 | 1 |
| 0.175 | 0.818 | roulette | phenotype | 0.914 | 0.961 | 1 |
| 0.172 | 0.818 | tournament | phenotype | 0.934 | 0.9533 | 1 |
| 0.026 | 0.694 | roulette | phenotype | 0.928 | 0.955 | 1 |
| 0.001 | 0.963 | roulette | phenotype | 0.982 | 0.848 | 1 |
| 0.177 | 0.818 | roulette | phenotype | 0.921 | 0.956 | 1 |
| 0.083 | 0.73 | tournament | phenotype | 0.95 | 0.946 | 1 |
| 0.086 | 0.798 | tournament | genotype | 0.976 | 0.928 | 1 |
| 0.172 | 0.818 | best | genotype | 0.914 | 0.973 | 2 |
| 0.176 | 0.818 | roulette | genotype | 0.9312 | 0.956 | 2 |

Note: The numbers for ‘Non Dominated %’ and ‘Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

82. Validation of Self-Adaptive MOEA: Results - Search Settings (2)
SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank |
|---|---|---|---|---|---|---|
| 0.105 | 1.0 | best | phenotype | 0.988 | 0.931 | 1 |
| 0.139 | 0.963 | tournament | phenotype | 0.962 | 0.956 | 1 |
| 0.089 | 0.694 | tournament | genotype | 0.976 | 0.943 | 1 |
| 0.139 | 0.969 | best | phenotype | 0.96 | 0.96 | 1 |
| 0.108 | 0.69 | tournament | genotype | 0.955 | 0.962 | 1 |
| 0.1 | 1.0 | best | phenotype | 0.988 | 0.942 | 1 |
| 0.088 | 0.685 | tournament | genotype | 0.96 | 0.962 | 1 |
| 0.139 | 0.966 | roulette | phenotype | 0.965 | 0.948 | 1 |
| 0.089 | 0.709 | tournament | genotype | 0.964 | 0.957 | 2 |

Note: The numbers for ‘Non Dominated %’ and ‘Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

83. Use Case 1: About
Design molecules that bind to ER-α based on:
- Structural similarity to Tamoxifen, and
- Structural dissimilarity to Ibuproxam.
Figure: (a) Tamoxifen. (b) Ibuproxam.
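
A similarity/dissimilarity objective pair of this kind can be sketched as follows. The slides do not state which similarity measure was used. The Tanimoto coefficient over fingerprint bit sets is an assumption made here for illustration, and the fingerprints below are made-up bit positions rather than real fingerprints of Tamoxifen or Ibuproxam. The names `tanimoto` and `objectives` are hypothetical.

```python
from typing import FrozenSet, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def objectives(
    candidate: FrozenSet[int],
    target: FrozenSet[int],
    anti_target: FrozenSet[int],
) -> Tuple[float, float]:
    """Two values to minimise: distance to the target, similarity to the anti-target."""
    return (1.0 - tanimoto(candidate, target), tanimoto(candidate, anti_target))


# Toy fingerprints (sets of "on" bit indices), for illustration only.
target = frozenset({1, 2, 3, 4})
anti_target = frozenset({7, 8, 9})
candidate = frozenset({1, 2, 3, 7})
scores = objectives(candidate, target, anti_target)
print(scores)
```

Expressing both goals as quantities to minimise lets a Pareto-based algorithm treat them uniformly, with no weighting between similarity and dissimilarity.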

84. Use Case 1: Starting Dataset
Starting Molecules dataset:
- Molecules retrieved from ZINC15,
- Applied filters:
  - Clean (substances with “clean” reactivity),
  - In-vitro (substances reported or inferred active at 10 µM or better in direct binding assays), and
  - Now (immediate delivery, includes in-stock and agent).
- The collection contains 7035 molecules.

90. Use Case 1: Results - In objective space

91. Use Case 1: Results - Designed molecules

92. Use Case 1: Results - AutoDock Vina docking

| Molecule Id | Docking Affinity (kcal/mol) |
|---|---|
| Tamoxifen | -8.2 |
| DnD 6 SP 20 4 X 13a | -7.9 |
| DnD 31 SP 150 37 M 19 | -7.9 |
| DnD 8 SP 9 2 M 13 | -7.8 |
| DnD 4 SP 199 49 X 46b | -7.7 |
| DnD 12 SP 75 18 M 13 | -7.6 |
| DnD 31 SP 6 1 M 16 | -7.2 |
| DnD 15 SP 168 41 M 0 | -7.2 |
| DnD 11 SP 74 18 M 4 | -7.1 |
| DnD 31 SP 193 48 X 76b | -6.9 |
| DnD 1 SP 78 19 X 84a | -6.8 |

93. Use Case 1: Results - Self-Adaptive MOEA non dominated settings for eMEGA

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank |
|---|---|---|---|---|---|---|
| 0.15777 | 0.80279 | tournament | genotype | 0.634 | 0.341 | 1 |
| 0.15613 | 0.88305 | tournament | genotype | 0.634 | 0.341 | 1 |
| 0.15627 | 0.88891 | tournament | genotype | 0.634 | 0.341 | 1 |
| 0.15688 | 0.88891 | roulette | genotype | 0.649 | 0.340 | 1 |
| 0.00552 | 0.94308 | best | genotype | 0.624 | 0.427 | 1 |

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

94. Use Case 3: About
Design molecules that bind to ER-α based on:
- Structural similarity to Raloxifene, and
- Chemical Properties similarity to Raloxifene.
Figure: Raloxifene.

98. Use Case 3: Results - AutoDock Vina docking

| Molecule Id | Docking Affinity (kcal/mol) |
|---|---|
| DnD 31 SP 194 48 M 49 | -8.2 |
| DnD 34 SP 197 49 X 13a | -5.9 |
| Raloxifene | -2.2 (-11.70 PubChem) |

99. Use Case 3: Results - Self-Adaptive MOEA non dominated settings for eMEGA

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank |
|---|---|---|---|---|---|---|
| 0.12927 | 0.98597 | roulette | genotype | 0.997 | 0.274 | 1 |
| 0.12897 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1 |
| 0.12933 | 0.98588 | roulette | genotype | 0.997 | 0.274 | 1 |
| 0.12946 | 0.98559 | roulette | genotype | 0.997 | 0.274 | 1 |
| 0.12928 | 0.98582 | roulette | genotype | 0.997 | 0.274 | 1 |
| 0.12897 | 0.98588 | tournament | genotype | 0.997 | 0.274 | 1 |

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

100. Use Case 4: About
Design molecules that bind to Proteasome B5 based on:
- Structural similarity to Ixazomib, and
- Chemical Properties similarity to Ixazomib.
Figure: Ixazomib.

104. Use Case 4: Results - AutoDock 4 docking

| Molecule Id | Docking Affinity (kcal/mol) |
|---|---|
| DnD 19 SP 196 48 X 59b | -7.19 |
| DnD 49 SP 193 48 X 123b | -6.68 |
| DnD 1 SP 196 48 X 67a | -6.08 |

105. Use Case 4: Results - Self-Adaptive MOEA non dominated settings for eMEGA

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Pareto Hypervolume | Rank |
|---|---|---|---|---|---|---|
| 0.09507 | 0.98194 | tournament | phenotype | 0.993 | 0.442 | 1 |
| 0.09507 | 0.9819 | roulette | phenotype | 0.991 | 0.442 | 1 |
| 0.09471 | 0.98178 | roulette | genotype | 0.997 | 0.426 | 1 |
| 0.09484 | 0.98183 | roulette | phenotype | 0.996 | 0.441 | 1 |
| 0.09277 | 0.98235 | roulette | genotype | 0.996 | 0.441 | 1 |

Note: The numbers for ‘Non Dominated %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

106. Self-Adaptive MOEA Showcases Discussion
- SAMOEA proposed interesting solutions in all problems to which it has been applied,
- Further in-vitro investigation is required, and
- SAMOEA’s proposed eMEGA settings differ based on problem and dataset (no silver bullet).

109. Concluding Remarks

111. Concluding Remarks - LiSIs platform
- Features a web-based Virtual Screening platform, focused on Cancer Chemoprevention Research.
- To be expanded in the future with tools featuring the algorithms from the MEGA framework.
- A number of scientific workflows (SWs) were implemented for:
  - preparing docking models,
  - preparing predictive models,
  - performing docking experiments,
  - using predictive models to predict biochemical properties and behaviour, and
  - performing VS workflows.

114. Concluding Remarks - Self-Adaptive MOEA (1)
Drawbacks:
- Needs a lot of time to terminate, and
- Very slow convergence.

117. Concluding Remarks - Self-Adaptive MOEA (2)
Advantages:
- Searches a larger space,
- Generates far more solutions per iteration,
- Proposes the fittest parameter sets that eMEGA should use for the given problem,
- Has been built to be adaptable,
- Uses objective fitness functions that can evaluate the effectiveness and the progression of any MOEA,
- Can be used on other problems,
- SAMOEA’s chromosome can be expanded with additional search parameters, and
- Leverages multi-core parallelism (needs more memory).

126. Future Work

128. Future Work - LiSIs platform
- Develop LiSIs 2.0:
  - Based on latest Galaxy platform, and
  - Redesign of tools to be compatible with Galaxy’s ToolShed for easy deployment,
- Update LiSIs with a feature to visualise intermediate results from various tools,
- Expand LiSIs tools with tools featuring the MEGA line-up of algorithms and SAMOEA,
- Explore resource management in SWMSs:
  - Novel Multi-Objective Optimization SW design approaches,
  - Novel Multi-Objective Optimization SWs scheduling approaches.

136. Future Work - Self-Adaptive MOEA
- Optimise MEGA framework (memory management and parallelism),
- Implement self-adaptive technique for selecting genetic operators,
- Extend Self-Adaptive MOEA to use other MOEAs,
- Implement models for other problems, and
- Implement new objective functions.

142. List of Publications

143. Table of Contents
6 List of Publications
7 References
8 Backup Frames
  - Validation of Self-Adaptive MOEA
  - Use Case 1
  - Use Case 2
  - Use Case 3
  - Use Case 4

144. List of Publications I
Book Chapters
- C. A. Nicolaou and C. C. Kannas, “Molecular Library Design Using Multi-Objective Optimization Methods,” in Chemical Library Design, J. Z. Zhou, Ed. Humana Press, 2011, pp. 53–69.
Journals
- C. Kannas et al., “LiSIs: An Online Scientific Workflow System for Virtual Screening,” Combinatorial Chemistry & High Throughput Screening, vol. 18, no. 3, pp. 281–295, Mar. 2015.
- C. A. Nicolaou, C. Kannas, and E. Loizidou, “Multi-objective optimization methods in de novo drug design,” Mini Rev Med Chem, vol. 12, no. 10, pp. 979–987, Sep. 2012.

145. List of Publications II
- C. Nicolaou, C. Kannas, and C. Pattichis, “Knowledge-driven multi-objective de novo drug design,” Chemistry Central Journal, vol. 3, p. P22, 2009.
Conferences
- C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 30th IEEE International Symposium on Computer-Based Medical Systems, Thessaloniki, Greece, 22-24 June 2017, pp. 1-6.
- P. Hasapis et al., “Molecular clustering via knowledge mining from biomedical scientific corpora,” in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), 2013, pp. 1-5.

146. List of Publications III
- C. C. Kannas et al., “A workflow system for virtual screening in cancer chemoprevention,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 439–446.
- K. G. Achilleos, C. C. Kannas, C. A. Nicolaou, C. S. Pattichis, and V. J. Promponas, “Open source workflow systems in life sciences informatics,” in 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), 2012, pp. 552–558.
- C. A. Nicolaou, C. Kannas, and C. S. Pattichis, “Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.

147. List of Publications IV
- C. C. Kannas, C. A. Nicolaou, and C. S. Pattichis, “A Parallel implementation of a Multi-objective Evolutionary Algorithm,” in 2009 9th International Conference on Information Technology and Applications in Biomedicine, Larnaka, Cyprus, 2009, pp. 1–6.
Abstracts
- C. C. Kannas and C. S. Pattichis, “Self-Adaptive Multi-Objective Evolutionary Algorithm for Molecular Design,” in 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jeju Island, Korea, 11-15 July 2017.

148. References

150. References I Beccari, A. R., Cavazzoni, C., Beato, C., and Costantino, G. (2013). LiGen: A High Performance Workflow for Chemistry Driven de Novo Design. Journal of Chemical Information and Modeling. Blankenberg, D., Kuster, G. V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., and Taylor, J. (2010). Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. In Current Protocols in Molecular Biology. John Wiley & Sons, Inc. Daeyaert, F. and Deem, M. W. (2016). A Pareto Algorithm for Efficient De Novo Design of Multi-functional Molecules. Molecular Informatics, pages n/a–n/a.

151. References II Damewood, Jr, J. R., Lerman, C. L., and Masek, B. B. (2010). NovoFLAP: A ligand-based de novo design approach for the generation of medicinally relevant ideas. Journal of Chemical Information and Modeling, 50(7):1296–1303. Dey, F. and Caflisch, A. (2008). Fragment-based de novo ligand design by multiobjective evolutionary optimization. Journal of Chemical Information and Modeling, 48(3):679–690. Ekins, S., Honeycutt, J. D., and Metz, J. T. (2010). Evolving molecules using multi-objective optimization: applying to ADME/Tox. Drug Discovery Today, 15(11-12):451–460.

152. References III Feher, M., Gao, Y., Baber, J. C., Shirley, W. A., and Saunders, J. (2008). The use of ligand-based de novo design for scaffold hopping and sidechain optimization: two case studies. Bioorganic & Medicinal Chemistry, 16(1):422–427. Firth, N. C., Atrash, B., Brown, N., and Blagg, J. (2015). MOARF, an Integrated Workflow for Multiobjective Optimization: Implementation, Synthesis, and Biological Evaluation. Journal of Chemical Information and Modeling. Fonseca, C. and Fleming, P. (1998). Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28(1):26–37.

153. References IV Giardine, B., Riemer, C., Hardison, R. C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., Miller, W., Kent, W. J., and Nekrutenko, A. (2005). Galaxy: A Platform for Interactive Large-Scale Genome Analysis. Genome Research, 15(10):1451–1455. Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team, T. (2010). Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology, 11(8):R86. Grefenstette, J. (1986). Optimization of Control Parameters for Genetic Algorithms. IEEE Transactions on Systems, Man and Cybernetics, 16(1):122–128.

154. References V Gurer-Orhan, H., Kool, J., Vermeulen, N. P. E., and Meerman, J. H. N. (2005). A novel microplate reader-based high-throughput assay for estrogen receptor binding. International Journal of Environmental Analytical Chemistry, 85(3):149–161. Hartenfeller, M., Zettl, H., Walter, M., Rupp, M., Reisen, F., Proschak, E., Weggen, S., Stark, H., and Schneider, G. (2012). DOGS: Reaction-Driven de novo Design of Bioactive Compounds. PLoS Comput Biol, 8(2):e1002380. Huang, Q., Li, L.-L., and Yang, S.-Y. (2010). PhDD: a new pharmacophore-based de novo design method of drug-like molecules combined with assessment of synthetic accessibility. Journal of Molecular Graphics and Modelling, 28(8):775–787.

155. References VI Kannas, C., Kalvari, I., Lambrinidis, G., Neophytou, C., Savva, C., Kirmitzoglou, I., Antoniou, Z., Achilleos, K., Scherf, D., Pitta, C., Nicolaou, C., Mikros, E., Promponas, V., Gerhauser, C., Mehta, R., Constantinou, A., and Pattichis, C. (2015). LiSIs: An Online Scientific Workflow System for Virtual Screening. Combinatorial Chemistry & High Throughput Screening, 18(3):281–295. Kramer, O. (2010). Evolutionary self-adaptation: a survey of operators and strategy parameters. Evolutionary Intelligence, 3(2):51–65.

156. References VII Kutchukian, P. S., Lou, D., and Shakhnovich, E. I. (2009). FOG: Fragment Optimized Growth algorithm for the de novo generation of molecules occupying druglike chemical space. Journal of Chemical Information and Modeling, 49(7):1630–1642. Medina-Franco, J. L., López-Vallejo, F., Kuck, D., and Lyko, F. (2010). Natural products as DNA methyltransferase inhibitors: a computer-aided discovery approach. Molecular Diversity, 15:293–304. Nicolaou, C. A., Apostolakis, J., and Pattichis, C. S. (2009a). De Novo Drug Design Using Multiobjective Evolutionary Graphs. Journal of Chemical Information and Modeling, 49(2):295–307.

157. References VIII Nicolaou, C. A., Kannas, C., and Pattichis, C. S. (2009b). Optimal graph design using a knowledge-driven multi-objective evolutionary graph algorithm. In 2009 9th International Conference on Information Technology and Applications in Biomedicine, pages 1–6, Larnaka, Cyprus. IEEE.

158. Backup Frames

160. Pareto Ranking

161. LiSIs Showcase - Known ER Ligands

| A/A | Estrogen Ligand | Docking Score ER-α | Docking Score ER-β |
|---|---|---|---|
| 1 | Raloxifene | -11.70 | -8.72 |
| 2 | Lilly-117018 | -11.53 | -3.80 |
| 3 | 3-HydroxyTamoxifen | -11.02 | N/A |
| 4 | Nafoxidine | -10.88 | N/A |
| 5 | ICI-182780 | -10.73 | N/A |
| 6 | Pyrolidine | -10.04 | N/A |
| 7 | Clomiphene A | -10.01 | N/A |
| 8 | Nitrofinene Citrate | -9.87 | N/A |
| 9 | ICI-164384 | -9.82 | -9.13 |
| 10 | Moxestrol | -9.38 | -9.77 |
| 11 | Naringenine | -8.55 | -7.80 |
| 12 | Triphenylethylene | -8.50 | N/A |
| 13 | Afema | -8.15 | -7.78 |
| 14 | Danazol | -6.99 | N/A |
| 15 | Ethamoxytriphetol | -6.67 | N/A |
| 16 | 4-HydroxyTamoxifen | -6.60 | N/A |
| 17 | Dioxin | -6.22 | N/A |
| 18 | Estralutin | -5.86 | -3.80 |
| 19 | Cyclopentanone | -4.88 | N/A |
| 20 | Miproxifene Phosphate | -4.48 | N/A |
| 21 | EM-800 | N/A | N/A |

Note: The list was retrieved from PubChem and it includes compounds characterized as “estrogen ligands”. N/A: no binding affinity.

162. LiSIs Showcase - Natural-like Rule of 5 filter
GRANATUM Rule of 5 filter:
1. MW between 160 and 700,
2. HBD less than or equal to 5,
3. HBA less than or equal to 10,
4. TPSA less than 140, and
5. cLogP between -0.4 and 5.6.
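
The five criteria above translate directly into a predicate over precomputed descriptors. The sketch below assumes the descriptors have already been calculated by a cheminformatics toolkit and that “between” means inclusive bounds, which the slide does not specify. The `Descriptors` class, the function name and the example values are hypothetical and do not describe any real compound.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors for one compound."""
    mw: float     # molecular weight
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors
    tpsa: float   # topological polar surface area
    clogp: float  # calculated logP


def passes_granatum_ro5(d: Descriptors) -> bool:
    """Apply the five GRANATUM natural-like criteria (inclusive ranges assumed)."""
    return (
        160 <= d.mw <= 700
        and d.hbd <= 5
        and d.hba <= 10
        and d.tpsa < 140
        and -0.4 <= d.clogp <= 5.6
    )


# Made-up descriptor values, for illustration only.
inside = Descriptors(mw=350.0, hbd=2, hba=5, tpsa=80.0, clogp=3.1)
too_heavy = Descriptors(mw=820.0, hbd=2, hba=5, tpsa=80.0, clogp=3.1)
too_polar = Descriptors(mw=350.0, hbd=2, hba=5, tpsa=140.0, clogp=3.1)
print(passes_granatum_ro5(inside), passes_granatum_ro5(too_heavy), passes_granatum_ro5(too_polar))
```

Note that TPSA uses a strict bound, so a value of exactly 140 fails the filter while HBD of exactly 5 passes.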

163. eMEGA Settings
Table: eMEGA experimental design settings

| Dataset | Objectives | Population | Iterations | Evolutionary Operations |
|---|---|---|---|---|
| Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 500 | 500 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Genotype |

164. SAMOEA Settings
Table: SAMOEA experimental design settings

| Algorithm | Dataset | Objectives | Population | Iterations | Evolutionary Operations |
|---|---|---|---|---|---|
| SAMOEA | Dataset 1, Dataset 2 | Non-Dominated Solutions Percentage; Unique Solutions Percentage | 20 | 100 | Mutation Probability: 15%; Crossover Probability: 80%; Selection Type: Roulette; Diversity Type: Phenotype |
| eMEGA | Dataset 1, Dataset 2 | Structural Similarity; Chemical Descriptor Similarity | 100 | 1 | Defined during run time, based on SAMOEA’s chromosomes |

165. Virtual Machine Specifications
Table: Specifications of the virtual machine on which the experimental runs were performed

| Linux Virtual Machine | |
|---|---|
| CPU | 4x Virtual CPU @ 2GHz |
| RAM | 16GB |
| OS | CentOS 6 |

166. eMEGA Maybridge Run 1 Figure: eMEGA Run 1 results for Maybridge dataset.

171. eMEGA Maybridge All Runs Figure: eMEGA results for Maybridge dataset.

172. eMEGA Maybridge All Runs Top 10 Results (1) Figure: eMEGA Top 10 results for Maybridge dataset.

173. eMEGA Maybridge All Runs Top 10 Results (2) Figure: eMEGA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

174. eMEGA Asinex Run 1 Figure: eMEGA Run 1 results for Asinex dataset.

177. Results - eMEGA Asinex Run 4 Figure: eMEGA Run 4 results for Asinex dataset.

179. eMEGA Asinex All Runs Figure: eMEGA results for Asinex dataset.

180. eMEGA Asinex All Runs Top 10 Results (1) Figure: eMEGA Top 10 results for Asinex dataset.

181. eMEGA Asinex All Runs Top 10 Results (2) Figure: eMEGA Top 10 results for Asinex dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

182. SAMOEA Maybridge Run 1 Figure: SAMOEA Run 1 results for Maybridge dataset.

187. SAMOEA Maybridge All Runs Figure: SAMOEA results for Maybridge dataset.

188. SAMOEA Maybridge All Runs Top 10 Results (1) Figure: SAMOEA Top 10 results for Maybridge dataset.

189. SAMOEA Maybridge All Runs Top 10 Results (2) Figure: SAMOEA Top 10 results for Maybridge dataset compared with Seliciclib; the red highlighted part of the molecules is their common core.

190. SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Maybridge dataset

| Mutation Probability | Crossover Probability | Selection Type | Diversity Type | Non Dominated % | Unique Solutions % | Rank |
|---|---|---|---|---|---|---|
| 0.029 | 0.694 | roulette | genotype | 0.9 | 0.986 | 1 |
| 0.175 | 0.818 | roulette | phenotype | 0.914 | 0.961 | 1 |
| 0.172 | 0.818 | tournament | phenotype | 0.934 | 0.9533 | 1 |
| 0.026 | 0.694 | roulette | phenotype | 0.928 | 0.955 | 1 |
| 0.001 | 0.963 | roulette | phenotype | 0.982 | 0.848 | 1 |
| 0.177 | 0.818 | roulette | phenotype | 0.921 | 0.956 | 1 |
| 0.083 | 0.73 | tournament | phenotype | 0.95 | 0.946 | 1 |
| 0.086 | 0.798 | tournament | genotype | 0.976 | 0.928 | 1 |
| 0.172 | 0.818 | best | genotype | 0.914 | 0.973 | 2 |
| 0.176 | 0.818 | roulette | genotype | 0.9312 | 0.956 | 2 |

Note: The numbers for ‘Non Dominated %’ and ‘Unique Solutions %’ are 1 minus the actual %. The smaller the number listed here, the better. ‘Rank’ is their non-dominance rank.

191. SAMOEA Asinex Run 1 Figure: SAMOEA Run 1 results for Asinex dataset.

195. SAMOEA Asinex All Runs Figure: SAMOEA results for Asinex dataset.

196. SAMOEA Asinex All Runs Top 10 Results (1) Figure: SAMOEA Top 10 results for Asinex dataset.

197. SAMOEA Asinex All Runs Top 10 Results (2) Figure: SAMOEA Top 10 results for Asinex dataset compared with Seliciclib. The red highlighted part of each molecule is the core it shares with Seliciclib.

198. SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset
Table: SAMOEA Top 10 proposed settings for eMEGA for Asinex dataset

Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Unique Solutions %  Rank
0.105                 1.0                    best            phenotype       0.988            0.931               1
0.139                 0.963                  tournament      phenotype       0.962            0.956               1
0.089                 0.694                  tournament      genotype        0.976            0.943               1
0.139                 0.969                  best            phenotype       0.96             0.96                1
0.108                 0.69                   tournament      genotype        0.955            0.962               1
0.1                   1.0                    best            phenotype       0.988            0.942               1
0.088                 0.685                  tournament      genotype        0.96             0.962               1
0.139                 0.966                  roulette        phenotype       0.965            0.948               1
0.089                 0.709                  tournament      genotype        0.964            0.957               2

Note: The numbers for 'Non Dominated %' and 'Unique Solutions %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.

199. MOARF Results Figure: MOARF’s results compared with Seliciclib.

200. Compare SAMOEA, eMEGA and MOARF Figure: Compare all Top 10 results with MOARF’s results and Seliciclib.

201. Discussion (1) eMEGA and SAMOEA generate molecules that approximate Seliciclib. The core shared with Seliciclib differs between datasets and between algorithms. MOARF approximates Seliciclib better than eMEGA and SAMOEA: it generates molecules in a more chemistry-oriented way, with fewer stochastic operations, and it starts from a core selected for the target and then attaches new fragments to it. SAMOEA explores the search space better than eMEGA and MOARF.

207. Discussion (2) The tables of SAMOEA proposed eMEGA settings show that different settings are favoured for each dataset. Maybridge dataset: mutation probability around 17%, crossover probability around 80%, selection type either roulette or tournament, and both diversity types are valid choices. Asinex dataset: mutation probability around 10%, crossover probability around 96%, selection type either best or tournament, and both diversity types are valid choices.
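The selection type preferences can be checked by tallying the 'Selection Type' column of the two settings tables. The sketch below is only a quick count over the rows as listed on the slides (ten for Maybridge, nine for Asinex); the variable and function names are illustrative and not part of the thesis code.

```python
from collections import Counter
from typing import List, Tuple

# 'Selection Type' column of each settings table, in table order.
MAYBRIDGE_SELECTION = [
    "roulette", "roulette", "tournament", "roulette", "roulette",
    "roulette", "tournament", "tournament", "best", "roulette",
]
ASINEX_SELECTION = [
    "best", "tournament", "tournament", "best", "tournament",
    "best", "tournament", "roulette", "tournament",
]


def top_two(values: List[str]) -> List[Tuple[str, int]]:
    """Return the two most frequent entries together with their counts."""
    return Counter(values).most_common(2)


maybridge_top = top_two(MAYBRIDGE_SELECTION)
asinex_top = top_two(ASINEX_SELECTION)
print("Maybridge:", maybridge_top)
print("Asinex:", asinex_top)
```

The two most frequent selection types per dataset are the ones named in the discussion: roulette and tournament for Maybridge, tournament and best for Asinex.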

217. Discussion (3) The objective fitness scores for the proposed settings are very high. Because they are listed as 1 minus the actual value, this means that the actual percentages of non dominated and unique solutions are really low, below 5%. From this we can conclude the following: eMEGA instances generate a large number of identical solutions even though they have different configurations, something we also noticed in previous experiments comparing MEGA, eMEGA and MOGA [Nicolaou et al., 2009b]; and the objective fitness functions we chose to use in SAMOEA compete with each other, which means that getting eMEGA instances to generate a high number of unique and non dominated solutions (above 20%) proves to be a difficult task.

219. Use Case 1: Docked designed molecules (1) Figure: Designed molecule DnD 6 SP 20 4 X 13a docked to ER-α.

220. Use Case 1: Docked designed molecules (2) Figure: Designed molecule DnD 31 SP 150 37 M 19 docked to ER-α.

222. Use Case 1: Docked designed molecules (4) Figure: Designed molecule DnD 4 SP 199 49 X 46b docked to ER-α.

229. Use Case 2: About Design molecules that bind to ER-α based on: Structural similarity to Tamoxifen, and Chemical Properties similarity to Tamoxifen. Figure: Tamoxifen.

233. Use Case 2: Results - AutoDock Vina docking

Molecule Id              Docking Affinity (kcal/mol)
DnD 42 SP 194 48 X 96b   -10.1
DnD 17 SP 199 49 M 4     -10
DnD 33 SP 189 47 X 66b   -9.9
DnD 48 SP 193 48 M 5     -9.6
Tamoxifen                -8.2

234. Use Case 2: Results - Self-Adaptive MOEA non dominated settings for eMEGA

Mutation Probability  Crossover Probability  Selection Type  Diversity Type  Non Dominated %  Pareto Hypervolume  Rank
0.02707               0.97973                tournament      genotype        0.983            0.153               1
0.02758               0.97965                tournament      phenotype       0.988            0.152               1

Note: The numbers for 'Non Dominated %' are 1 minus the actual %. The smaller the number listed here the better. 'Rank' is their non dominance rank.
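The "1 minus the actual %" convention in the note can be undone with a one-line conversion. This is a minimal sketch that assumes the listed values are fractions between 0 and 1; the function name `actual_percent` is illustrative and does not come from the thesis.

```python
def actual_percent(listed_score: float) -> float:
    """Convert a listed '1 minus actual' score back to an actual percentage."""
    return round((1.0 - listed_score) * 100.0, 1)


# 'Non Dominated %' values of the two Use Case 2 settings, in table order.
listed = [0.983, 0.988]
actual = [actual_percent(score) for score in listed]
print(actual)
```

Read this way, the two settings correspond to roughly 1.7% and 1.2% non dominated solutions.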

241. Use Case 4: Docked designed molecules (1) Figure: Designed molecule DnD 19 SP 196 48 X 59b docked to Proteasome B5.

242. Use Case 4: Docked designed molecules (2) Figure: Designed molecule DnD 49 SP 193 48 X 123b docked to Proteasome B5.

243. Use Case 4: Docked designed molecules (3) Figure: Designed molecule DnD 1 SP 196 48 X 67a docked to Proteasome B5.
