16 Years of the Chemistry Development Kit (CDK)
Christoph Steinbeck and the CDK Developers
http://cdk.sourceforge.net
The Chemistry Development Kit (CDK)
Open Source Cheminformatics in Java
The CDK after 16 years
• 16,521 commits made by 115 contributors
• 564,171 lines of code
• mostly written in Java
• well established, mature codebase
• maintained by a large development team
• with stable year-over-year commits
• estimated 151 years of effort (COCOMO model)
• first commit in October 2000
• most recent commit 1 day ago
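The effort figure above comes from a COCOMO estimate. As a rough illustration of how such a number is derived from a line count, here is a minimal sketch assuming the basic COCOMO "organic" coefficients (a = 2.4, b = 1.05). The slide does not say which COCOMO variant or which line count was used, so the result should land in the same range as the quoted 151 years without necessarily equalling it. The class and method names are hypothetical.

```java
/** Basic COCOMO sketch: effort in person-months = A * KLOC^B. */
final class CocomoSketch {
    // Assumed coefficients: basic COCOMO, "organic" mode (Boehm, 1981).
    static final double A = 2.4;
    static final double B = 1.05;

    /** Estimated effort in person-months for the given number of source lines. */
    static double personMonths(long linesOfCode) {
        double kloc = linesOfCode / 1000.0; // COCOMO works in thousands of lines
        return A * Math.pow(kloc, B);
    }

    /** The same estimate expressed in person-years. */
    static double personYears(long linesOfCode) {
        return personMonths(linesOfCode) / 12.0;
    }

    public static void main(String[] args) {
        long loc = 564_171; // lines of code quoted on the slide
        System.out.printf("%.0f person-months, about %.0f person-years%n",
                personMonths(loc), personYears(loc));
    }
}
```

Because the exponent is only slightly above 1, the estimate grows almost linearly with code size, which is why it is best read as an order-of-magnitude indicator rather than a measurement.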
A bit of history
1990
Computer-Assisted Structure Elucidation (CASE)
Steinbeck, C. Angewandte Chemie International Edition in English 1996, 35, 1984-1986
Steinbeck, C. J. Chem. Inf. Comput. Sci. 2001, 41, 6, 1500
1992 - now
Growth over the Years
Bibliometrics
Build your own
Blue Obelisk
The Doctor Who Model of Open Source
John May
Rajarshi Guha
Egon Willighagen
Development Model
• Open Source Principles
• Release Early, Release Often
• All the Raymond stuff (Cathedral …
• Persistence
• People contribute what they need
• You need a Doctor Who who cares
• code quality, build systems, etc
CDK
Current status
John May
Maven Switch Over
• Maven describes how a project is built and its dependencies.
• Simplifies both building from source and linking against a distributed JAR.
• Dependencies are dynamically downloaded and kept in sync.
• “Convention over configuration”
• Many new Java projects choose Maven; CDK is 15+ years old, so switching was a challenge.
• Splitting one source tree into 76 interdependent modules.
• Modularisation was started by Egon Willighagen in Ant.
• Test fail = build fail, which required resolving 150+ existing regressions.
• CDK 1.5.10+ is available from The Central Repository, a geographically distributed collection of dedicated servers.
tinyurl.com/cdk-mavencentral
1.5.x: Cleaner, More Efficient, More Robust, More Stable
Example: Generate depiction of a molecule.
[Figure: the same molecule depicted with 1.4.x and with 1.5.x]
+ Improved Layout
+ Improved Render
+ Easy highlighting
+ Abbreviations
Molecule 2D layout and rendering from SMILES
[Figure: Examples 1-4 rendered with 1.4.x and 1.5.x, including an "Error" case]
Examples 1-4: Clark A, et al. 2D structure depiction. JCIM, 46, 1107-1123 (2006)
Try it: http://cdkdepict-openchem.rhcloud.com/
Reaction 2D layout and rendering from SMILES
[Figure: a reaction rendered with 1.4.x, with 1.5.x, and with 1.5.x +abbr +colmap]
Reaction: Lowe, D. Extraction of chemical structures and reactions from the literature. PhD Thesis. 2012
[Figure: CHEMBL590010 as shown on the ChEMBL website, next to CHEMBL590010 generated from SMILES with CDK 1.5.x]
Example: SMARTS match for intramolecular Hydrogen Bonds:
O=[C,N]aa[N,O;!H0] in NCI Aug00 (~250,000 molecules) [1-3]
1.4.x: 16 mins (64 err)
1.5.x: 16 secs (0 err)
+ Lazy algorithm
+ Stereochemistry Match
+ Component Grouping
+ Adaptive (e.g. ring membership only if needed)
+ New Pattern API hides differences between SMARTS/Substructure/Isomorphism queries
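One plausible reading of the "lazy algorithm" bullet is that mappings are produced on demand, so a yes/no query can stop at the first hit instead of enumerating every match. The sketch below illustrates that idea in plain Java on a toy labelled graph. It is not CDK's implementation: the class `LazyMatcher`, the `bonds` helper, and the graph representation are hypothetical, atoms are compared by element label only, and bond orders are ignored.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Lazily enumerates mappings of a small query graph onto a target graph. */
final class LazyMatcher implements Iterator<int[]> {
    private final String[] qLabels, tLabels;
    private final boolean[][] qAdj, tAdj;
    private final int[] map;      // map[q] = target atom chosen for query atom q, or -1
    private final int[] cursor;   // cursor[q] = next target atom to try for query atom q
    private final boolean[] used; // used[t] = target atom t is already mapped
    private int depth = 0;        // query atom being placed; -1 once the search is exhausted
    private int[] pending;        // mapping found by hasNext() but not yet returned

    LazyMatcher(String[] qLabels, boolean[][] qAdj, String[] tLabels, boolean[][] tAdj) {
        this.qLabels = qLabels; this.qAdj = qAdj;
        this.tLabels = tLabels; this.tAdj = tAdj;
        this.map = new int[qLabels.length];
        Arrays.fill(map, -1);
        this.cursor = new int[qLabels.length];
        this.used = new boolean[tLabels.length];
        if (qLabels.length == 0) depth = -1; // an empty query yields no mappings here
    }

    /** Builds a symmetric adjacency matrix from a list of atom index pairs. */
    static boolean[][] bonds(int n, int[][] pairs) {
        boolean[][] adj = new boolean[n][n];
        for (int[] p : pairs) { adj[p[0]][p[1]] = true; adj[p[1]][p[0]] = true; }
        return adj;
    }

    /** Can query atom q map to target atom t, given the atoms placed so far? */
    private boolean feasible(int q, int t) {
        if (used[t] || !qLabels[q].equals(tLabels[t])) return false;
        for (int p = 0; p < q; p++)
            if (qAdj[q][p] && !tAdj[t][map[p]]) return false; // query bond missing in target
        return true;
    }

    /** Resumes the backtracking search and returns the next full mapping, or null. */
    private int[] advance() {
        while (depth >= 0) {
            if (map[depth] >= 0) { used[map[depth]] = false; map[depth] = -1; } // undo last choice
            int t = cursor[depth];
            while (t < tLabels.length && !feasible(depth, t)) t++;
            if (t == tLabels.length) { cursor[depth] = 0; depth--; continue; }  // backtrack
            cursor[depth] = t + 1;
            map[depth] = t;
            used[t] = true;
            if (depth == map.length - 1) return map.clone(); // stay here so the next call resumes
            depth++;
        }
        return null;
    }

    @Override public boolean hasNext() {
        if (pending == null && depth >= 0) pending = advance();
        return pending != null;
    }

    @Override public int[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        int[] m = pending;
        pending = null;
        return m;
    }

    public static void main(String[] args) {
        // Query: O bonded to C. Target: C-C(O)O, with atoms indexed 0..3.
        LazyMatcher m = new LazyMatcher(
                new String[] {"O", "C"}, bonds(2, new int[][] {{0, 1}}),
                new String[] {"C", "C", "O", "O"}, bonds(4, new int[][] {{0, 1}, {1, 2}, {1, 3}}));
        System.out.println("matches: " + m.hasNext()); // only the first mapping is computed
    }
}
```

A caller that only needs to know whether a pattern matches calls `hasNext()` once, while a caller that wants every mapping keeps iterating. When screening a large collection with a single pattern, stopping at the first mapping avoids work that the answer does not depend on.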
[1] Weininger D. Chemistry Cartridge CGI Examples. EMug (1998)
[2] Sayle R. Cheminformatics Toolkits: a personal perspective, RDKit UGM (2012)
[3] May J. All The Small Things. http://efficientbits.blogspot.co.uk/2013/10/
Robustness
CDK 1.5.x moves away from default atom type perception/sanitisation.
+ Much faster
+ High fidelity IO: round trip [CH2] through SMILES/InChI/Molfile
+ Exact Kekulization
+ Exact ring perception
+ Portable canonical Kekulé SMILES
+ Multiple aromaticity models, “Horses for courses”
+ Accurate MMFF94 partial charges
Stability
Java APIs can be more fluid than native: aim to keep public API fixed.
Continuous integration and regression testing with Jenkins and Travis.
Other Features of 1.5.x
Stereochemistry
• Tetrahedral, CisTrans, Extended Tetrahedral (Allene)
• Representation and round tripping between formats
• Query Matching
• Perspective conversion (Haworth, Chairs, Fischer)
File Formats
• Molfile Sgroup support: Repeat Units, Display Shortcuts
• CXSMILES
• Coming soon: HELM 2.0
Fingerprints
• Count fingerprints
• Efficient Circular Fingerprint and Model Building: Clark et al. J. Cheminform. 6:38 (2014)
• Coming soon: FPS readers, mmap indexes
Updated and Super Quick Fundamental Algorithms
• Ring Finding: May and Steinbeck. J. Cheminform. 6:3 (2014)
• Subgraph Isomorphism
• Canonical labelling
• Aromaticity
• Kekulization
• Molecular Hash Codes, Automorphism Group, and much more
Acknowledgement
• All CDK developers
• The Blue Obelisk Community
• You for your attention
iCASE PhD Studentships