From Words to Wonders: Language Models for Life Sciences (Room L1.02)
Robots Unleashed: The Rise of AI-Driven Chemical Discovery (Room L1.01)
16 November 2023
The Rise of AI-driven Chemical Discovery: Language Models, Robotics, and Digitizing Operations
ChemAI 2023, Amsterdam, 16 November 2023
Dr. Amol Thakkar (tha@zurich.ibm.com)
Research Scientist, AI for Scientific Discovery, IBM Research
The research lab is the central element in scientific discovery
IBM Research| ©2023 IBM Corporation
- Up to 70% of experimentation is not reproducible because of flawed experimental data or metadata [1]
- Only one-third to one-half of original findings are also observed in replication studies [2]
- Studies can last years instead of months due to small differences in protocols [3]
- In 2020, companies spent an average of 17% of revenue on digital initiatives and 4% of revenue on R&D [4]
[1] Accenture, Digital Transformation in the Lab (2020)
[2] Aarts, A. A. et al., Science 349, 943 (2015)
[3] Hines, W. C. et al., Cell Rep. 6, 779–781 (2014)
[4] Gartner, Digital Technologies in R&D (2019); S&P 500 (2020)
AI breakthroughs for language are changing scientific discovery
[Figure: two parallel pipelines, both built on transformers. Natural Language Processing (NLP): letters → words → sentences; language data feed AI models that improve language tasks such as question answering. Accelerated chemistry: atoms → molecules → properties/reactions; chemistry data feed AI models that improve chemistry tasks.]
Generative modeling and transformers are achieving new breakthroughs in chemistry.
Generative Toolkit for Scientific Discovery (GT4SD)
Open-source library to accelerate hypothesis generation in scientific discovery; a winner of the 2022 IEEE Open Software Services Award.
- GT4SD makes generative AI algorithms and models easier to use in scientific discovery: https://github.com/GT4SD/gt4sd-core
- Applications include hypothesis generation for inverse design and discovery of materials and therapeutics such as antivirals and antimicrobials.
- Workflow: 1. train generative models; 2. create inference pipelines; 3. run inference pipelines.
[Figure: example molecules generated using GT4SD, placed in a study → hypothesize → test cycle.]
Manica et al., npj Comput. Mater. 9, 69 (2023)
Chemistry as a language
A molecule written in SMILES, a textual representation, is a "sentence of atoms". Predicting a reaction therefore resembles translating English into Spanish: synthesis translates reactants + reagents into products, and retrosynthesis translates in the opposite direction.
Product (SMILES): CC1(CCCC(N1OC(OC1=CC=C([N+](=O)[O-])C=C1)=O)(C)C)C
Tokenized: C C 1 ( C C C C ( N 1 O C ( O C 1 = C C = C ( [N+] ( = O ) [O-] ) C = C 1 ) = O ) ( C ) C ) C
Reactants + reagents (SMILES): CC1(CCCC(N1O)(C)C)C.O(C(=O)Cl)C1=CC=C([N+](=O)[O-])C=C1
Tokenized: C C 1 ( C C C C ( N 1 O ) ( C ) C ) C . O ( C ( = O ) Cl ) C 1 = C C = C ( [N+] ( = O ) [O-] ) C = C 1
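The spaced strings above are simply SMILES split into atom-level tokens. A minimal tokenizer needs only one regular expression; the pattern below is modeled on the SMILES regex published with the Molecular Transformer (Schwaller et al., ACS Cent. Sci. 2019), but treat it as a sketch rather than the exact preprocessing behind the models on these slides.

```python
import re

# Pattern modeled on the SMILES regex published with the Molecular Transformer:
# bracket atoms such as [N+] stay whole, and Br/Cl are tried before B/C.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Tokenization must be lossless; otherwise some characters were not matched.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


reactants = "CC1(CCCC(N1O)(C)C)C.O(C(=O)Cl)C1=CC=C([N+](=O)[O-])C=C1"
print(" ".join(tokenize_smiles(reactants)))
# C C 1 ( C C C C ( N 1 O ) ( C ) C ) C . O ( C ( = O ) Cl ) C 1 = C C = C ( [N+] ( = O ) [O-] ) C = C 1
```

The lossless check matters: a silently dropped character would produce a "sentence" that no longer describes the same molecule.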
Automating lab synthesis and experimentation with the help of AI
Vaucher et al., Nat. Commun. 12, 2573 (2021)
1. AI-based chemical reaction prediction: a transformer "translates" reactants into products (synthesis) and products back into reactants (retrosynthesis).
Product (tokenized SMILES): O = C ( N C c 1 c c c c ( Cl ) c 1 ) c 1 c c c ( C Br ) c c 1
Reactants (tokenized SMILES): N C c 1 c c c c ( Cl ) c 1 . O = C ( Cl ) c 1 c c c ( C Br ) c c 1
Schwaller et al., Chem. Sci. 9, 6091 (2018)
Schwaller et al., ACS Cent. Sci. 5, 1572 (2019)
Schwaller et al., Chem. Sci. 11, 3316 (2020)
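In the translation framing, every recorded reaction yields two training pairs: a forward pair for synthesis and the reversed pair for retrosynthesis. A minimal sketch using the reaction on this slide, written as reaction SMILES with `>>` separating precursors from products; merging reagents into the precursor side is an assumption of this sketch, and the published models may handle them differently.

```python
def reaction_to_pairs(reaction_smiles: str) -> dict[str, tuple[str, str]]:
    """Turn one reaction SMILES into two (source, target) "translation" pairs.

    Accepts 'reactants>>products' or 'reactants>reagents>products'. Reagents are
    merged into the precursor side here, which is an assumption of this sketch.
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected 'reactants>reagents>products'")
    reactants, reagents, products = parts
    precursors = ".".join(part for part in (reactants, reagents) if part)
    return {
        "synthesis": (precursors, products),       # forward prediction
        "retrosynthesis": (products, precursors),  # backward prediction
    }


pairs = reaction_to_pairs("NCc1cccc(Cl)c1.O=C(Cl)c1ccc(CBr)cc1>>O=C(NCc1cccc(Cl)c1)c1ccc(CBr)cc1")
print(pairs["retrosynthesis"])
```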
2. Chemical procedures from text (Paragraph2Actions): the same translation approach converts an experimental paragraph into a sequence of actions.
Text: "The TFA was removed in vacuo and a saturated solution of NaHCO3 was added."
Actions: Concentrate(), Add(name='saturated solution of NaHCO3')
Vaucher et al., Nat. Commun. 11, 3601 (2020)
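The action notation is structured enough to be parsed back into data that a robot or an electronic lab notebook could consume. A minimal sketch: the `Action` class and the regex-based parser are hypothetical, not part of the published Paragraph2Actions code, and they assume property values are single-quoted and contain no closing parenthesis.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical structured action: a type name plus its keyword properties."""
    kind: str
    properties: dict[str, str] = field(default_factory=dict)


ACTION_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")         # e.g. Add(...) or Concentrate()
PROPERTY_PATTERN = re.compile(r"(\w+)\s*=\s*'([^']*)'")  # e.g. name='...'


def parse_actions(text: str) -> list[Action]:
    """Parse the comma-separated action notation into Action objects.

    Limitation: property values must be single-quoted and must not contain ')'.
    """
    text = text.replace("\u2018", "'").replace("\u2019", "'")  # normalize curly quotes
    return [
        Action(kind, dict(PROPERTY_PATTERN.findall(args)))
        for kind, args in ACTION_PATTERN.findall(text)
    ]


for action in parse_actions("Concentrate(), Add(name='saturated solution of NaHCO3')"):
    print(action)
```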
3. Chemical procedures from reactions (Smiles2Actions)
Reaction: C(=NC1CCCCC1)=NC1CCCCC1.ClCCl.CC1(C)CC(=O)Nc2cc(C(=O)O)ccc21.Nc1ccccc1>>CC1(C)CC(=O)Nc2cc(C(=O)Nc3ccccc3)ccc21
1. ADD $1$
2. ADD $4$
3. ADD $2$
4. ADD $3$
5. STIR for @3@ at #4#
6. FILTER keep precipitate
7. RECRYSTALLIZE from ethanol
8. YIELD $-1$
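The predicted steps refer to molecules through placeholders rather than spelling out SMILES. To turn them into an executable procedure, the placeholders must be mapped back onto the reaction. A sketch under our own reading of the slide, which is an assumption and not a documented format: `$n$` is a 1-based index into the dot-separated precursors and `$-1$` is the product. The duration (`@n@`) and temperature (`#n#`) tokens are left untouched because their encoding is not shown.

```python
import re

PLACEHOLDER = re.compile(r"\$(-?\d+)\$")


def resolve_molecules(action: str, reaction_smiles: str) -> str:
    """Substitute $n$ placeholders with SMILES taken from the reaction.

    Assumes $n$ is a 1-based index into the precursors and $-1$ is the product.
    """
    precursor_part, _, product = reaction_smiles.partition(">>")
    precursors = precursor_part.split(".")

    def lookup(match: re.Match) -> str:
        index = int(match.group(1))
        if index == -1:
            return product
        if not 1 <= index <= len(precursors):
            raise ValueError(f"no precursor number {index}")
        return precursors[index - 1]

    return PLACEHOLDER.sub(lookup, action)


reaction = (
    "C(=NC1CCCCC1)=NC1CCCCC1.ClCCl.CC1(C)CC(=O)Nc2cc(C(=O)O)ccc21.Nc1ccccc1"
    ">>CC1(C)CC(=O)Nc2cc(C(=O)Nc3ccccc3)ccc21"
)
steps = ["ADD $1$", "ADD $4$", "ADD $2$", "ADD $3$", "STIR for @3@ at #4#",
         "FILTER keep precipitate", "RECRYSTALLIZE from ethanol", "YIELD $-1$"]
for step in steps:
    print(resolve_molecules(step, reaction))
```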
Automated synthesis via cloud + robotic lab: https://rxn.res.ibm.com/ (30k+ global users via the cloud; 40+ million reaction predictions)
Making AI for chemistry available to everyone
http://rxn.app.accelerate.science/
Multi-modal foundation models will accelerate fundamental research tasks
Christofidellis et al., ICML 2023
An example from chemistry…
- Vendor lock-in for devices, poor interoperability, and no APIs provided
- Documentation burden on laboratory scientists
- Data and metadata not captured; reporting bias
Need to improve data quality!
Building on the Experience of Domain Experts
[Figure: domain-specific models progress from expert systems through machine learning to deep learning; the models require data, which domain experts generate.]
- User-driven ecosystem: multiple models for specific tasks
- Intelligent ecosystem: orchestration of domain-specific models based on context
Creating the Lab that Learns
Key innovations:
• AI foundation models for automatic documentation of manual procedures and validation of outcomes
• Hybrid and multi-cloud computing to automatically integrate all data and metadata of any experiment
Benefits for the lab:
• Capture all details needed to fully describe an experiment
• Minimize the time wasted using different tools to organize data
• Reproduce any version of an experiment at any point in time
• Discover patterns by continuously learning over all experimental data
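One way to make "reproduce any version of an experiment at any point in time" concrete is content addressing: hash a canonical serialization of each experiment record and never overwrite a stored version. This is a sketch under our own assumptions (a flat JSON record schema, SHA-256 over sorted-key JSON, a hypothetical `ExperimentStore`); the slides do not describe how the system versions experiments.

```python
import copy
import hashlib
import json


def experiment_version(record: dict) -> str:
    """Content hash identifying one exact version of an experiment record."""
    # Canonical JSON (sorted keys, fixed separators): equal records hash equally
    # regardless of key insertion order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExperimentStore:
    """Hypothetical append-only store: every saved version remains retrievable."""

    def __init__(self) -> None:
        self._versions: dict[str, dict] = {}

    def save(self, record: dict) -> str:
        version = experiment_version(record)
        self._versions.setdefault(version, copy.deepcopy(record))
        return version

    def load(self, version: str) -> dict:
        return copy.deepcopy(self._versions[version])


store = ExperimentStore()
v1 = store.save({"step": "develop", "developer": "1:3 MIPK:IPA", "duration_s": 10})
v2 = store.save({"step": "develop", "developer": "1:3 MIPK:IPA", "duration_s": 12})
print(v1[:12], v2[:12], store.load(v1)["duration_s"])
```

Because the identifier is derived from the content, two labs that record the same protocol obtain the same version ID, and any change, however small, yields a new one.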
The AI-enabled lab of the future for a new era of reproducible and collaborative experimentation
[Figure: multi-modal data (video + gaze, audio, text, and raw device bytes) from an e-beam lithography process with a manual puddle development step feed a foundation model that generates workflows, e.g. the documented step "Chip AZ33 was developed in 1:3 MIPK:IPA for 10 seconds".]
Consider a Laboratory Environment
[Figure: a lab user in a chemistry laboratory performs an action (a measurement). Problem: recording the volume.]
IBM Research | ©2023 IBM Corporation 'image: Flaticon.com'. This graphic has been designed using images from Flaticon.com 16
Multi-Modal Models for Automated Documentation
[Figure: the lab user in the chemistry laboratory wears lab goggles with video that record the measurement.]
Multi-Modal Models for Automated Documentation
[Figure: video from the lab goggles and a containerised data watcher on laboratory device 1 feed automatic step inference.]
- Predicts the step taken
- Automatically extracts:
  - Volume
  - Measurement data
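A containerised data watcher can be as simple as a loop that notices new or modified instrument files. A minimal polling sketch, assuming the instrument software exports results into a directory the container can read; `DirectoryWatcher` and `upload()` are hypothetical names, and a production watcher might use OS file-system notifications instead of polling.

```python
from pathlib import Path


class DirectoryWatcher:
    """Polling watcher reporting files that are new or changed since the last poll.

    A hypothetical stand-in for a containerised data watcher; it assumes the
    instrument software exports its results into a directory the container can read.
    """

    def __init__(self, directory: str) -> None:
        self.directory = Path(directory)
        self._seen: dict[Path, tuple[int, int]] = {}

    def poll(self) -> list[Path]:
        changed = []
        for path in sorted(self.directory.iterdir()):
            if not path.is_file():
                continue
            stat = path.stat()
            signature = (stat.st_size, stat.st_mtime_ns)  # cheap change detector
            if self._seen.get(path) != signature:
                self._seen[path] = signature
                changed.append(path)
        return changed


# Typical use inside the container (upload() is hypothetical):
# watcher = DirectoryWatcher("/data/device1")
# while True:
#     for path in watcher.poll():
#         upload(path)
#     time.sleep(1.0)
```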
Multi-Modal Models for Automated Documentation
[Figure: from goggle video and the containerised data watcher on laboratory device 1, automatic step inference recognizes a "Stir" step.]
- It is easy to forget steps that are trivial for domain experts
- Automatic documentation of which steps were taken and in which order
Architecture Overview
[Figure: videos of the lab user and measurements from devices 1 and 2 are collected into data storage; an AI engine (actions, notes, digit recognition) consolidates them into a workflow.]
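Consolidating a workflow requires, at minimum, interleaving what is inferred from video with what the devices report. A sketch, assuming every source emits events already sorted by timestamp on a shared clock; the `Event` schema and the `consolidate` helper are ours, not the system's actual interfaces.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Event:
    """Timestamped observation from one source (hypothetical schema)."""
    timestamp: float  # seconds since the session started
    source: str       # e.g. "video" or "device_1"
    description: str


def consolidate(*streams: list[Event]) -> list[Event]:
    """Merge per-source streams, each already sorted by time, into one timeline."""
    return list(heapq.merge(*streams))


video = [Event(0.0, "video", "pick up pipette"), Event(12.5, "video", "Stir")]
device = [Event(4.2, "device_1", "balance reads 1.234 g")]
for event in consolidate(video, device):
    print(f"{event.timestamp:6.1f}s  {event.source:9s} {event.description}")
```

`heapq.merge` streams lazily, so the same approach scales to long sessions without loading and re-sorting every source.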
Interoperable Across Multiple Devices
- Vendor-independent data streaming
- Interoperable across multiple devices and networks
- Containerised solution for automatic data capture
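Vendor-independent streaming usually comes down to an adapter per device that maps its proprietary output onto one shared schema. A sketch with two made-up balance formats; `Measurement`, the adapter classes, and both raw formats are hypothetical illustrations, not real vendor protocols.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Measurement:
    """Vendor-neutral measurement record (hypothetical common schema)."""
    device_id: str
    quantity: str
    value: float
    unit: str


class DeviceAdapter(ABC):
    """Translates one vendor's raw output into the common Measurement schema."""

    @abstractmethod
    def parse(self, raw: str) -> Measurement:
        ...


class VendorABalance(DeviceAdapter):
    """Hypothetical raw format: 'WT,1.2340,g'."""

    def parse(self, raw: str) -> Measurement:
        _, value, unit = raw.strip().split(",")
        return Measurement("balance_A", "mass", float(value), unit)


class VendorBBalance(DeviceAdapter):
    """Hypothetical raw format: 'mass=1234.0 mg', normalized to grams."""

    GRAMS_PER_UNIT = {"mg": 1e-3, "g": 1.0, "kg": 1e3}

    def parse(self, raw: str) -> Measurement:
        value, unit = raw.strip().removeprefix("mass=").split()
        return Measurement("balance_B", "mass", float(value) * self.GRAMS_PER_UNIT[unit], "g")


readings = [VendorABalance().parse("WT,1.2340,g"), VendorBBalance().parse("mass=1234.0 mg")]
print(readings)
```

Downstream code (storage, the AI engine) then depends only on `Measurement`, so adding a new instrument means writing one adapter rather than touching the pipeline.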
Interoperable Cloud Environments with Multi-Cloud
- Vendor-independent cloud solution
- On premises, public cloud, or a combination
- AI models and data can live in different cloud environments
Summary
[Figure: the Lab that Learns at the centre of the research cycle: question, study, hypothesize, experiment, analyze, report.]
- Vendor-independent solution for device integration
- Interoperable across multiple devices and networks
- Automatically documents steps for experiments and workflows
- Extracts crucial data and metadata relevant to the experiment
- Reproducible workflows
What can we discover together?
Acknowledgments
Adriano Martinelli
Alain Vaucher
Alek Sobczyk
Alessandra Toniato
Alice Driessen
Amol Thakkar
Andrea Giovannini
Antonio Foncubierta
Anuv Chakraborty
Artem Leonov
Carlo Baldassari
Dimitrios Christofidellis
Federico Zipoli
Gianmarco Gabrieli
Jannis Born
Joris Cadow-Gossweiler
Mara Graziani
Marianna Rapsomaniki
Marvin Alberts
Matteo Manica
Miruna Cretu
Nicolas Deutschmann
Nikita Janakarajan
Oliver Schilter
Pushpak Pati
Patrick Ruch
Teodoro Laino
Wilhelm Huck
Radboud University
Introducing the Nationaal Groeifonds (Dutch National Growth Fund) ‘Big Chemistry’ consortium
Partners: TU/e, RUG, Radboud, AMOLF, Fontys Hogeschool Eindhoven, Max Planck Gesellschaft
Complex society, complex problems
We know we need to change... but change is too slow!
Chemistry needs to innovate
Changing the paradigm of chemical research
Sanchez-Lengeling & Aspuru-Guzik, Science 361, 360–365 (2018), DOI: 10.1126/science.aat2663
Evolution is the ‘problem solver’ in biology
We need an evolution machine:
- Genotypes are defined as the collection of all experimental parameters of a system (e.g. molecular composition, pH, temperature).
- Phenotypes are defined as the collection of all experimental properties of a system (e.g. fluorescence, turbidity, spatiotemporal patterns, shape).
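With genotypes as experimental parameters and phenotypes as measured properties, an "evolution machine" is an optimization loop: vary the parameters, run the experiment, keep what performs better. The slides do not name an algorithm, so the sketch below swaps in a simple (1 + λ) evolution strategy; the two-parameter genotype, its bounds, and `measure_phenotype` (a made-up score standing in for a robotic experiment) are all hypothetical.

```python
import random

# Hypothetical genotype: two experimental parameters with allowed ranges.
BOUNDS = {"pH": (2.0, 12.0), "temperature_C": (10.0, 80.0)}


def measure_phenotype(genotype: dict[str, float]) -> float:
    """Stand-in for a robotic experiment: a made-up score peaking at pH 7 and 37 °C."""
    return -((genotype["pH"] - 7.0) ** 2) - ((genotype["temperature_C"] - 37.0) / 10.0) ** 2


def mutate(genotype: dict[str, float], rng: random.Random, scale: float = 0.1) -> dict[str, float]:
    """Gaussian mutation of every parameter, clipped to its bounds."""
    child = {}
    for name, value in genotype.items():
        low, high = BOUNDS[name]
        child[name] = min(high, max(low, value + rng.gauss(0.0, scale * (high - low))))
    return child


def evolve(generations: int = 50, offspring: int = 8, seed: int = 0):
    """(1 + lambda) evolution strategy: keep the parent unless a child scores at least as well."""
    rng = random.Random(seed)
    parent = {name: rng.uniform(low, high) for name, (low, high) in BOUNDS.items()}
    parent_score = measure_phenotype(parent)
    for _ in range(generations):
        # Each child costs one experiment, so every phenotype is measured exactly once.
        children = [mutate(parent, rng) for _ in range(offspring)]
        scored = [(measure_phenotype(child), child) for child in children]
        best_score, best_child = max(scored, key=lambda pair: pair[0])
        if best_score >= parent_score:
            parent, parent_score = best_child, best_score
    return parent, parent_score


best, score = evolve()
print({name: round(value, 2) for name, value in best.items()}, round(score, 3))
```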
Chemistry is more than synthesis
Focus on molecular properties
What?
a) Solubility of molecules in water and organic solvents
b) Predicting critical micelle concentration (CMC) and surface tension
c) Predicting reactivity
d) Vapour pressures
Possible applications
a) Aiding the formulation of stable emulsions
b) Creating a desired smell from a mixture of compounds
c) Discovering catalytic activity
d) Automated synthesis using solubility, reactivity, and kinetics prediction
The Big Chemistry ecosystem
[Figure: a central RobotLab facility at the Max Planck Research Campus connects industry, start-ups, specialized CROs (online formulation), and fundamental research at TU/e, RUG, RU, AMOLF, and Fontys, with the aim of transforming formulation from an art into a science-based technology.]
Example: LLM for solubility prediction
MegaMolBART (MMB) was trained on the ZINC database (approx. 1.5 billion molecules)
Database: https://zenodo.org/records/5970538
Vermeire et al., J. Am. Chem. Soc. 2022, 144 (24), 10785–10797
megaMOLBart trained on AqueousSolu da
compounds)
Promising result:
MMB is as good as high-level theoretical calculations at predicting solubility… (with only a small regression head of 600k parameters trained)
megaMOLBart does not understand chem
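The "small regression head" idea, keeping the pretrained encoder frozen and training only a light model on its embeddings, can be sketched as follows. Assumptions: we swap in closed-form ridge regression, a linear head with d + 1 parameters for embedding width d, for whatever head was actually used (its 600k parameters imply a larger network); the embeddings and log-solubility targets are synthetic placeholders, not MegaMolBART outputs or real measurements.

```python
import numpy as np


def fit_linear_head(embeddings: np.ndarray, targets: np.ndarray, alpha: float = 1e-3) -> np.ndarray:
    """Ridge regression on frozen embeddings; returns weights with the bias last."""
    X = np.hstack([embeddings, np.ones((embeddings.shape[0], 1))])  # bias column
    penalty = alpha * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ targets)


def predict(weights: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    return embeddings @ weights[:-1] + weights[-1]


# Placeholder data: random vectors stand in for frozen MegaMolBART embeddings,
# and the targets are synthetic log-solubility values, not real measurements.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
log_s = embeddings @ rng.normal(size=16) - 2.0 + 0.01 * rng.normal(size=200)

weights = fit_linear_head(embeddings, log_s)
rmse = float(np.sqrt(np.mean((predict(weights, embeddings) - log_s) ** 2)))
print(f"parameters in head: {weights.size}, train RMSE: {rmse:.3f}")
```

Only the head is fitted, so training is cheap; whether the frozen embeddings carry enough chemistry is exactly what such a probe tests.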
Broadening: predicting logCMC values
Data: a manually curated dataset of 1316 compounds
Types of surfactant:
i. Anionic: 225 compounds
ii. Anionic-cationic salt: 13 compounds
iii. Cationic: 693 compounds
iv. Nonionic: 366 compounds
v. Zwitterionic: 19 compounds
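The class sizes sum to 1316, but they are very unbalanced: cationic surfactants make up about 53% of the set, while only 13 salts and 19 zwitterions are available. A purely random train/test split could leave the rare classes barely represented in evaluation. A stratified split is one standard remedy; the sketch below is our illustration, not the authors' stated protocol, and uses class labels only (no real compound data).

```python
import random
from collections import defaultdict

# Class sizes from the curated logCMC dataset (1316 compounds in total).
SURFACTANT_COUNTS = {
    "anionic": 225,
    "anionic-cationic salt": 13,
    "cationic": 693,
    "nonionic": 366,
    "zwitterionic": 19,
}


def stratified_split(labels: list[str], test_fraction: float = 0.2, seed: int = 0):
    """Return (train_idx, test_idx) keeping each class's share roughly constant."""
    by_class: dict[str, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))  # keep rare classes in the test set
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = [label for label, n in SURFACTANT_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```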
Next steps
Explore possibilities for multi-property prediction
solubility + pCMC + surface tension
solubility in multiple organic solvents
Expand experimental datasets
Ensure each additional datapoint yields maximum information
Develop high throughput analytical methods
Beyond pure compounds
Predict properties of mixtures of molecules
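"Ensure each additional datapoint yields maximum information" is the goal of active learning. One common criterion, chosen here as an assumption since the slide does not name a strategy, is uncertainty sampling: measure next where an ensemble of models disagrees most. The models below are hypothetical toy functions of a single descriptor.

```python
import statistics
from typing import Callable, Sequence, TypeVar

Candidate = TypeVar("Candidate")


def select_next_experiments(
    candidates: Sequence[Candidate],
    ensemble: Sequence[Callable[[Candidate], float]],
    batch_size: int = 2,
) -> list[Candidate]:
    """Uncertainty sampling: choose candidates where ensemble predictions disagree most."""

    def disagreement(candidate: Candidate) -> float:
        return statistics.pvariance([model(candidate) for model in ensemble])

    return sorted(candidates, key=disagreement, reverse=True)[:batch_size]


# Toy ensemble: three hypothetical models predicting a property from one descriptor
# (say, surfactant tail length). They agree for short tails and diverge for long ones.
ensemble = [lambda x: 0.5 * x, lambda x: 0.5 * x + 0.1 * x**2, lambda x: 0.4 * x]
print(select_next_experiments([1, 2, 3, 10, 12], ensemble))  # [12, 10]
```

After the selected compounds are measured, the models are retrained and the loop repeats, so each round of experiments targets the region where current predictions are least reliable.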
Acknowledgements
Co-PIs: Bert Meijer, Ghislaine Vantomme, Ben Feringa, Nathalie Katsonis
Big Chemistry board: Marcel Wubbolts
Radboud team: Tal Kachman, Stefan Hödl, Will Robinson, Aigars Piruska, Luc Hermans, Peter Korevaar, Jana Roit
Collaborators: RUG, TU/e, AMOLF, Fontys

More Related Content

Similar to Workshop Chemical Robotics ChemAI 231116.pptx

An investigation of extreme programming practices and its impact on software ...
An investigation of extreme programming practices and its impact on software ...An investigation of extreme programming practices and its impact on software ...
An investigation of extreme programming practices and its impact on software ...Roberto Pepato
 
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16Boris Adryan
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Pi school-dli-presentation de nobili
Pi school-dli-presentation de nobiliPi school-dli-presentation de nobili
Pi school-dli-presentation de nobiliDeep Learning Italia
 
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSFACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSIRJET Journal
 
Fogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation FrameworkFogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation FrameworkMoysisSymeonides
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...INRIA-OAK
 
20072311272506
2007231127250620072311272506
20072311272506Vinod Vyas
 
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Eugenio Villar
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 

Similar to Workshop Chemical Robotics ChemAI 231116.pptx (20)

An investigation of extreme programming practices and its impact on software ...
An investigation of extreme programming practices and its impact on software ...An investigation of extreme programming practices and its impact on software ...
An investigation of extreme programming practices and its impact on software ...
 
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud Library
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud Library
 
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
Design Optimization of Safety Critical Component for Fatigue and Strength Usi...
 
Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016Dsp lab manual 15 11-2016
Dsp lab manual 15 11-2016
 
Pi school-dli-presentation de nobili
Pi school-dli-presentation de nobiliPi school-dli-presentation de nobili
Pi school-dli-presentation de nobili
 
IT6511 Networks Laboratory
IT6511 Networks LaboratoryIT6511 Networks Laboratory
IT6511 Networks Laboratory
 
Embedded systems
Embedded systemsEmbedded systems
Embedded systems
 
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSFACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDS
 
UNIT 1.pdf
UNIT 1.pdfUNIT 1.pdf
UNIT 1.pdf
 
UNIT 1.pptx
UNIT 1.pptxUNIT 1.pptx
UNIT 1.pptx
 
Fogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation FrameworkFogify: A Fog Computing Emulation Framework
Fogify: A Fog Computing Emulation Framework
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
 
Deep learning in manufacturing predicting and preventing manufacturing defect...
Deep learning in manufacturing predicting and preventing manufacturing defect...Deep learning in manufacturing predicting and preventing manufacturing defect...
Deep learning in manufacturing predicting and preventing manufacturing defect...
 
20072311272506
2007231127250620072311272506
20072311272506
 
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
Tutorial at the European Nanoelectronics Applications, Design & Technology Co...
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Edge-Fog Cloud
Edge-Fog CloudEdge-Fog Cloud
Edge-Fog Cloud
 

Recently uploaded

Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 

Recently uploaded (20)

Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 

Workshop Chemical Robotics ChemAI 231116.pptx

  • 1. From Words to Wonders: Language Models for Life Sciences Room L1.02 Robots Unleashed: The Rise of AI- Driven Chemical Discovery Room L1.01 16 November 2023
  • 2. The Rise of AI-driven Chemical Discovery - Language Models, Robotics, and Digitizing Operations ChemAI 2023 Amsterdam 16 November 2023 Dr. Amol Thakkar �tha@zurich.ibm.com Research Scientist AI for Scientific Discovery IBM Research
  • 3. The research lab is the central element in scientific discovery 2 IBM Research| ©2023 IBM Corporation
  • 4. Up to 70% of experimentation is not reproducible because of flawed experimental data or metadata1 Only one-third to one-half of original findings are also observed in replication studies2 Studies can last years instead of months due to small differences in protocols3 In 2020, companies spent an average of 17% of revenue on digital initiatives and 4% of revenue on R&D4 IBM Research| ©2023 IBM Corporation 3 1 Accenture, Digital Transformation in the Lab (2020) 2 Aarts, A. A. et al. Science 349, 943 (2015) 3 Hines, W. C. et al. Cell Rep. 6, 779–781 (2014) 4 Gartner, Digital Technologies in R&D (2019), S&P 500 (2020)
  • 5. AI breakthroughs for language are changing scientific discovery 4 IBM Research| ©2023 IBM Corporation AI Data AI Models Question Answering Improved Language Tasks Sentences Words Letters Language AI Data AI Models Improved Chemistry Tasks Properties/Reactions Molecules Atoms Chemistry Generative modeling and transformers are achieving new breakthroughs in chemistry Natural Language Processing (NLP) Accelerated Chemistry Transformers Transformers
  • 6. Generative Toolkit for Scientific Discovery (GT4SD) Open-source library to accelerate hypothesis generation in scientific discovery A winner of the 2022 IEEE Open Software Services Award GT4SD makes generative AI algorithms and models easier to use in scientific discovery https://github.com/GT4SD/gt4sd-core Applications include hypothesis generation for inverse design and discovery of materials and therapeutics like antivirals and antimicrobials Example molecules generated using GT4SD Test Hypothesize Study 1. Train generative models 2. Create inference pipelines 3. Run inference pipelines W Manica et al., npj Comput. Mater. 9, 69 (2023) AI for Scientific Discovery | IBM Research | ©2023 IBM Corporation 6
  • 7. Chemistry as a language: a molecule written in a textual representation (SMILES) is a “sentence of atoms”, so reaction prediction can be treated like translation from English to Spanish. Synthesis translates reactants + reagents into products; retrosynthesis translates products back into reactants + reagents. Example: the reactants CC1(CCCC(N1O)(C)C)C and O(C(=O)Cl)C1=CC=C([N+](=O)[O-])C=C1 give the product CC1(CCCC(N1OC(OC1=CC=C([N+](=O)[O-])C=C1)=O)(C)C)C. As token sequences, the reactants are C C 1 ( C C C C ( N 1 O ) ( C ) C ) C . O ( C ( = O ) Cl ) C 1 = C C = C ( [N+] ( = O ) [O-] ) C = C 1 and the product is C C 1 ( C C C C ( N 1 O C ( O C 1 = C C = C ( [N+] ( = O ) [O-] ) C = C 1 ) = O ) ( C ) C ) C.
  • 8. Automating lab synthesis and experimentation with the help of AI. 1. AI-based chemical reaction prediction: synthesis and retrosynthesis as "translation" with a transformer, e.g. reactants N C c 1 c c c c ( Cl ) c 1 . O = C ( Cl ) c 1 c c c ( C Br ) c c 1 ↔ product O = C ( N C c 1 c c c c ( Cl ) c 1 ) c 1 c c c ( C Br ) c c 1 (Schwaller et al., Chem. Sci. 9 (2018) 6091; Schwaller et al., ACS Cent. Sci. 5 (2019) 1572; Schwaller et al., Chem. Sci. 11 (2020) 3316). 2. Chemical procedures from text (Paragraph2Actions): "The TFA was removed in vacuo and a saturated solution of NaHCO3 was added." → Concentrate(), Add(name='saturated solution of NaHCO3') (Vaucher et al., Nat. Commun. 11 (2020) 3601). 3. Chemical procedures from reactions (Smiles2Actions): C(=NC1CCCCC1)=NC1CCCCC1.ClCCl.CC1(C)CC(=O)Nc2cc(C(=O)O)ccc21.Nc1ccccc1>>CC1(C)CC(=O)Nc2cc(C(=O)Nc3ccccc3)ccc21 → 1. ADD $1$; 2. ADD $4$; 3. ADD $2$; 4. ADD $3$; 5. STIR for @3@ at #4#; 6. FILTER keep precipitate; 7. RECRYSTALLIZE from ethanol; 8. YIELD $-1$ (Vaucher et al., Nat. Commun. 12 (2021) 2573). Impact: 30k+ global users via the cloud, 40+ million reaction predictions, and automated synthesis via the cloud and a robotic lab (https://rxn.res.ibm.com/).
  • 9. Making AI for chemistry available to everyone: http://rxn.app.accelerate.science/
  • 10. Multi-modal foundation models will accelerate fundamental research tasks (Christofidellis et al., ICML 2023). An example from chemistry…
  • 12. Vendor lock-in for devices, poor interoperability, and no APIs provided. Documentation burden on laboratory scientists. Data and metadata not captured; reporting bias. Need to improve data quality!
  • 13. Building on the experience of domain experts. Domain-specific models (expert systems, machine learning, deep learning) require data, which domain experts generate. User-driven ecosystem: multiple models for specific tasks. Intelligent ecosystem: orchestration of domain-specific models based on context.
  • 14. Creating the Lab that Learns: the AI-enabled lab of the future for a new era of reproducible and collaborative experimentation. Key innovations: AI foundation models for automatic documentation of manual procedures and validation of outcomes; hybrid and multi-cloud computing to automatically integrate all data and metadata of any experiment. Benefits for the lab: capture all details needed to fully describe an experiment; minimize the time wasted using different tools to organize data; reproduce any version of an experiment at any point in time; discover patterns by continuously learning over all experimental data. [Figure: multi-modal data (video + gaze, audio, text, bytes) feeds a foundation model that generates workflows, e.g. E-beam lithography followed by manual puddle development: "Chip AZ33 was developed in 1:3 MIPK:IPA for 10 seconds".]
  • 16. Consider a laboratory environment: a lab user in a chemistry laboratory performs an action (a measurement). Problem: recording the volume. 'image: Flaticon.com'. This graphic has been designed using images from Flaticon.com.
  • 17. Multi-modal models for automated documentation: the lab user in the chemistry laboratory wears lab goggles with video, which capture the measurement step ("Meas").
  • 18. Multi-modal models for automated documentation: lab goggles with video and a containerised data watcher on the laboratory device (Device 1) enable automatic step inference. The system predicts the step taken (measurement) and automatically extracts the volume and the measurement data.
  • 19. Multi-modal models for automated documentation: the same setup (lab goggles with video, a containerised data watcher on Device 1, automatic step inference) recognises the next step, stirring. It's easy to forget steps that are trivial for domain experts, so the system automatically documents which steps were taken and in which order.
  • 20. Architecture overview: in the chemistry laboratory, videos of the lab user and measurements from Device 1 and Device 2 are collected into data storage; an AI engine (actions, notes, digit recognition) consolidates them into a workflow.
  • 21. Interoperable across multiple devices (Device 1, Device 2): vendor-independent data streaming; interoperability across multiple devices and networks; a containerised solution for automatic data capture.
  • 22. Interoperable cloud environments with multi-cloud: a vendor-independent cloud solution; on-premises, public cloud, or a combination; AI models (actions, notes, digit recognition) and data can live in different cloud environments.
  • 23. Summary: the Lab that Learns, spanning the research cycle (experiment, question, hypothesize, study, analyze, report). A vendor-independent solution for device integration; interoperable across multiple devices and networks; automatically documents steps for experiments and workflows; extracts crucial data and metadata relevant to the experiment; reproducible workflows.
  • 24. What can we discover together? Acknowledgments: Adriano Martinelli, Alain Vaucher, Alek Sobczyk, Alessandra Toniato, Alice Driessen, Amol Thakkar, Andrea Giovannini, Antonio Foncubierta, Anuv Chakraborty, Artem Leonov, Carlo Baldassari, Dimitrios Christofidellis, Federico Zipoli, Gianmarco Gabrieli, Jannis Born, Joris Cadow-Gossweiler, Mara Graziani, Marianna Rapsomaniki, Marvin Alberts, Matteo Manica, Miruna Cretu, Nicolas Deutschmann, Nikita Janakarajan, Oliver Schilter, Pushpak Pati, Patrick Ruch, Teodoro Laino.
  • 26. Introducing the Nationaal Groeifonds ‘Big Chemistry’ consortium
  • 27. Partners: TU/e, RUG, Radboud, AMOLF, Fontys Hogeschool Eindhoven, Max Planck Gesellschaft.
  • 29. We know we need to change… but change is too slow!
  • 30. Chemistry needs to innovate
  • 31. Changing the paradigm of chemical research (B. Sanchez-Lengeling & A. Aspuru-Guzik, Science 361, 360–365 (2018), DOI: 10.1126/science.aat2663).
  • 32. Evolution is the ‘problem solver’ in biology.
  • 33. We need an evolution machine. Genotypes are defined as the collection of all experimental parameters of a system (e.g. molecular composition, pH, temperature). Phenotypes are defined as the collection of all experimental properties of a system (e.g. fluorescence, turbidity, shape of spatiotemporal patterns).
  • 34. Chemistry is more than synthesis: focus on molecular properties. What? a) solubility of molecules in water and organic solvents; b) predicting CMC (critical micelle concentration) and surface tension; c) predicting reactivity; d) vapour pressures. Possible applications: a) aiding the formulation of stable emulsions; b) creating a desired smell from a mixture of compounds; c) discovering catalytic activity; d) automated synthesis using solubility, reactivity, and kinetics prediction.
  • 35. The Big Chemistry ecosystem: RobotLab central facility; Max Planck research campus; industry, transforming formulation from an art into a science-based technology; start-ups; specialized CROs; online formulation; fundamental research (TU/e, RUG, RU, AMOLF, Fontys).
  • 36. Example: LLM for solubility prediction. MegaMolBART (MMB) was trained on the ZINC database, approx. 1.5 billion molecules.
  • 37. MegaMolBART trained on AqueousSolu da[…] compounds. Database: https://zenodo.org/records/5970538 (Vermeire et al., J. Am. Chem. Soc. 2022, 144 (24), 10785–10797). Promising result: MMB is as good as high-level theoretical calculations in predicting solubility (with a small trained regression head of 600k parameters).
  • 38. MegaMolBART does not understand chem…
  • 39. Broadening: predicting logCMC values. Data: a manually curated dataset containing 1316 compounds. Types of surfactants: i. anionic: 225 compounds; ii. anionic-cationic salt: 13 compounds; iii. cationic: 693 compounds; iv. nonionic: 366 compounds; v. zwitterionic: 19 compounds.
  • 40. Next steps: explore possibilities for multi-property prediction (solubility + pCMC + surface tension; solubility in multiple organic solvents); expand experimental datasets, ensuring each additional datapoint yields maximum information; develop high-throughput analytical methods; go beyond pure compounds to predict properties of mixtures of molecules.
  • 41. Acknowledgements. Co-PIs: Bert Meijer, Ghislaine Vantomme, Ben Feringa, Nathalie Katsonis. Board Big C: Marcel Wubbolts. Radboud team: Tal Kachman, Stefan Hödl, Will Robinson, Aigars Piruska, Luc Hermans, Peter Korevaar, Jana Roit. Collaborators: RUG, TU/e, AMOLF, Fontys.
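Slide 7 ("Chemistry as a language") shows SMILES split into space-separated tokens before "translation". The following is a minimal Python sketch of that preprocessing. It assumes the atom-level regular expression popularized by the Molecular Transformer work; the slides do not say which tokenizer the RXN models use. The sketch tokenizes a reaction SMILES of the form precursors>>product and returns source and target "sentences" for the synthesis direction. The function names are illustrative and are not part of any IBM API.

```python
import re

# Atom-level SMILES regex in the style of the Molecular Transformer work
# (Schwaller et al.): bracket atoms such as [N+] stay whole, and the
# two-letter halogens Cl and Br become single tokens.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, the 'words' of the 'sentence of atoms'."""
    tokens = SMILES_REGEX.findall(smiles)
    # Tokenization must be lossless; otherwise an unsupported symbol was skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize losslessly: {smiles}")
    return tokens


def reaction_to_translation_pair(reaction_smiles: str) -> tuple[str, str]:
    """Turn 'precursors>>product' into space-separated source and target sentences."""
    precursors, product = reaction_smiles.split(">>")
    source = " ".join(tokenize_smiles(precursors))
    target = " ".join(tokenize_smiles(product))
    return source, target


if __name__ == "__main__":
    rxn = (
        "CC1(CCCC(N1O)(C)C)C.O(C(=O)Cl)C1=CC=C([N+](=O)[O-])C=C1"
        ">>CC1(CCCC(N1OC(OC1=CC=C([N+](=O)[O-])C=C1)=O)(C)C)C"
    )
    src, tgt = reaction_to_translation_pair(rxn)
    print(src)  # synthesis direction: source sentence
    print(tgt)  # target sentence; retrosynthesis swaps the two
```

Swapping source and target gives the retrosynthesis direction. The lossless check matters because a regex that silently skipped a symbol would corrupt the training pairs.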
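On slide 8, the Smiles2Actions output refers to compounds through placeholder tokens such as $1$ and $-1$. The sketch below resolves these placeholders back to SMILES under one assumption, which the slide does not state: $n$ (n ≥ 1) is the n-th dot-separated precursor of the reaction SMILES, and $-1$ is the product. This reading matches the example, where YIELD $-1$ ends the procedure. The slide does not give the meaning of @3@ and #4#, so the sketch leaves them untouched. All names here are hypothetical.

```python
import re

# ASSUMPTION (not stated on the slide): $n$ with n >= 1 is the n-th
# dot-separated precursor in the reaction SMILES, and $-1$ is the product.
# Other placeholders such as @3@ and #4# are deliberately left untouched.
COMPOUND_TOKEN = re.compile(r"\$(-?\d+)\$")

EXAMPLE_REACTION = (
    "C(=NC1CCCCC1)=NC1CCCCC1.ClCCl.CC1(C)CC(=O)Nc2cc(C(=O)O)ccc21.Nc1ccccc1"
    ">>CC1(C)CC(=O)Nc2cc(C(=O)Nc3ccccc3)ccc21"
)
EXAMPLE_ACTIONS = [
    "ADD $1$", "ADD $4$", "ADD $2$", "ADD $3$", "STIR for @3@ at #4#",
    "FILTER keep precipitate", "RECRYSTALLIZE from ethanol", "YIELD $-1$",
]


def resolve_compound_tokens(actions: list[str], reaction_smiles: str) -> list[str]:
    """Replace $n$ placeholders in predicted actions with the SMILES they refer to."""
    precursors, product = reaction_smiles.split(">>")
    molecules = precursors.split(".")

    def lookup(match: re.Match) -> str:
        index = int(match.group(1))
        if index == -1:
            return product
        if not 1 <= index <= len(molecules):
            raise ValueError(f"placeholder ${index}$ has no matching precursor")
        return molecules[index - 1]

    return [COMPOUND_TOKEN.sub(lookup, action) for action in actions]


if __name__ == "__main__":
    for step in resolve_compound_tokens(EXAMPLE_ACTIONS, EXAMPLE_REACTION):
        print(step)
```

The one-fragment-per-compound assumption breaks for salts written as several dot-separated fragments. A real pipeline would need the model's own compound indexing.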
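Slide 13 contrasts two ecosystems. In a user-driven ecosystem, the user picks one of several task-specific models. In an intelligent ecosystem, the system orchestrates domain-specific models based on context. The sketch below models the latter as a dispatch table. A hypothetical context rule decides which registered model handles a query: reaction SMILES contain ">>", and anything else is treated as a molecule. The task names, the rule, and the stand-in models are assumptions for illustration only.

```python
from typing import Callable, Dict

Model = Callable[[str], str]


def infer_task(query: str) -> str:
    """Hypothetical context rule: reaction SMILES contain '>>'; anything else is a molecule."""
    return "reaction" if ">>" in query else "molecule"


class Orchestrator:
    """Routes each query to a registered domain-specific model based on its context."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, task: str, model: Model) -> None:
        self._models[task] = model

    def __call__(self, query: str) -> str:
        task = infer_task(query)
        if task not in self._models:
            raise LookupError(f"no model registered for task {task!r}")
        return self._models[task](query)


if __name__ == "__main__":
    orchestrator = Orchestrator()
    # Stand-in "models"; real ones might be property or procedure predictors.
    orchestrator.register("molecule", lambda q: f"property model <- {q}")
    orchestrator.register("reaction", lambda q: f"procedure model <- {q}")
    print(orchestrator("CCO"))
    print(orchestrator("CCO.CC(=O)O>>CC(=O)OCC"))
```

A richer context would also draw on lab state or user history. The routing step stays the same: infer the task, then look up the model.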
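In the architecture on slide 20, videos and device measurements are collected into storage, and an AI engine consolidates them into a workflow. One plausible consolidation step is merging time-stamped events from each source into a single ordered log. The sketch below does this with heapq.merge. It assumes every event carries a timestamp on a shared clock and that each stream is already sorted. The Event fields and the example readings are hypothetical; step inference and digit recognition are out of scope.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(order=True)
class Event:
    """One time-stamped observation; only the timestamp takes part in ordering."""
    timestamp: float                         # seconds on a shared clock (assumed)
    source: str = field(compare=False)       # hypothetical labels, e.g. "video", "device-1"
    description: str = field(compare=False)  # inferred step or extracted reading


def consolidate(*streams: Iterable[Event]) -> List[str]:
    """Merge per-source streams, each already time-sorted, into one ordered workflow log."""
    merged = heapq.merge(*streams)  # lazily interleaves events by timestamp
    return [f"{event.timestamp:.1f}s [{event.source}] {event.description}" for event in merged]


if __name__ == "__main__":
    video = [Event(0.0, "video", "Measure"), Event(42.0, "video", "Stir")]
    device = [Event(5.5, "device-1", "Dispensed volume: 10.0 mL")]
    for line in consolidate(video, device):
        print(line)
```

Because heapq.merge only needs each input to be sorted, new device streams can be added without changing the consolidation code.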
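Slide 21 describes a containerised, vendor-independent solution for automatic data capture. Many instruments can export result files to a folder. One vendor-neutral approach, assumed here rather than taken from the talk, is therefore a watcher that polls that folder and reports each new file exactly once. Polling with os.scandir keeps the sketch portable across operating systems and containers. The file name and column in the demo are made up.

```python
import os
import tempfile
import time
from typing import Iterator, List, Optional, Set


def poll_new_files(directory: str, seen: Set[str]) -> List[str]:
    """One polling pass: return files in `directory` that were not reported before."""
    new_files = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.path not in seen:
                seen.add(entry.path)
                new_files.append(entry.path)
    return sorted(new_files)


def watch(directory: str, interval_s: float = 1.0,
          max_polls: Optional[int] = None) -> Iterator[str]:
    """Yield new files as they appear; stop after `max_polls` passes if given."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        yield from poll_new_files(directory, seen)
        polls += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as export_dir:
        # Simulate a device writing a (hypothetical) result file to its export folder.
        with open(os.path.join(export_dir, "run_001.csv"), "w") as f:
            f.write("volume_mL\n10.0\n")
        for path in watch(export_dir, interval_s=0.1, max_polls=2):
            print("captured", os.path.basename(path))
```

A production watcher would also need to wait until a file is fully written and to persist its `seen` set across restarts.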
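Slide 33 defines genotypes as all the experimental parameters of a system and phenotypes as all its measured properties. Read as an optimization loop, an "evolution machine" varies genotypes, measures phenotypes, and keeps the fittest. The sketch below is a generic (mu + lambda) evolutionary loop under stated assumptions:

- a two-parameter search space (BOUNDS);
- Gaussian mutation;
- toy_experiment as a stand-in for a robotic measurement.

None of these come from the Big Chemistry setup.

```python
import random
from typing import Callable, Dict, List

Genotype = Dict[str, float]   # experimental parameters of a system
Phenotype = Dict[str, float]  # measured properties of that system

# Hypothetical search space; a real one would list every controllable parameter.
BOUNDS = {"pH": (0.0, 14.0), "temperature_C": (4.0, 90.0)}


def random_genotype(rng: random.Random) -> Genotype:
    return {key: rng.uniform(lo, hi) for key, (lo, hi) in BOUNDS.items()}


def mutate(parent: Genotype, rng: random.Random, scale: float = 0.1) -> Genotype:
    """Gaussian perturbation of every parameter, clipped to its bounds."""
    child = {}
    for key, value in parent.items():
        lo, hi = BOUNDS[key]
        child[key] = min(hi, max(lo, value + rng.gauss(0.0, scale * (hi - lo))))
    return child


def evolve(experiment: Callable[[Genotype], Phenotype],
           fitness: Callable[[Phenotype], float],
           pop_size: int, generations: int, seed: int = 0) -> List[Genotype]:
    """(mu + lambda) loop: vary genotypes, measure phenotypes, keep the fittest."""
    rng = random.Random(seed)
    population = [random_genotype(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Parents are re-measured each generation; a real lab would cache results.
        scored = [(fitness(experiment(g)), g) for g in population + offspring]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        population = [g for _, g in scored[:pop_size]]
    return population  # best genotype first


def toy_experiment(g: Genotype) -> Phenotype:
    """Stand-in for a robotic measurement; the 'signal' peaks at pH 5 and 40 °C."""
    return {"signal": -((g["pH"] - 5.0) ** 2) - ((g["temperature_C"] - 40.0) / 10.0) ** 2}


if __name__ == "__main__":
    best = evolve(toy_experiment, lambda p: p["signal"], pop_size=8, generations=60)[0]
    print(best, toy_experiment(best))
```

Keeping parents alongside offspring (elitism) means the best measured phenotype never gets worse, which matters when every evaluation costs lab time.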
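Slides 36 and 37 report that a small trained regression head (600k parameters) on top of MegaMolBART predicts solubility well. The sketch below swaps in a simpler head, closed-form ridge regression on frozen embeddings, to illustrate training only a light model on fixed representations. It assumes the embeddings have already been extracted from the pretrained encoder. The demo uses random synthetic data, not solubility measurements, and is not the authors' setup.

```python
import numpy as np


def fit_ridge_head(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression on frozen embeddings X of shape (n_samples, n_features)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centre so the intercept is not penalised
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Synthetic stand-ins: random "embeddings" and a linear target, NOT solubility data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 64))
    y = X @ rng.normal(size=64) + 0.1 * rng.normal(size=500)
    w, b = fit_ridge_head(X[:400], y[:400], alpha=1.0)
    print("held-out RMSE:", rmse(y[400:], predict(X[400:], w, b)))
```

Only the head is trained, so fitting is cheap even when the encoder is large. The ridge penalty `alpha` would normally be chosen on a validation split.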
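Slide 40 lists "ensure each additional datapoint yields maximum information" as a next step without naming a method. One common active-learning heuristic, swapped in here as an assumption, is uncertainty sampling: fit a bootstrap ensemble, then measure next the candidate on which the ensemble members disagree most. The sketch uses linear least-squares members for brevity; any regressor could take their place.

```python
import numpy as np


def with_intercept(X: np.ndarray) -> np.ndarray:
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_bootstrap_ensemble(X: np.ndarray, y: np.ndarray, n_models: int = 20,
                           seed: int = 0) -> np.ndarray:
    """Fit linear least-squares models on bootstrap resamples; returns (n_models, d + 1)."""
    rng = np.random.default_rng(seed)
    A = with_intercept(X)
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        coefs.append(coef)
    return np.array(coefs)


def most_informative(candidates: np.ndarray, coefs: np.ndarray) -> int:
    """Index of the candidate on which the ensemble disagrees most (highest variance)."""
    predictions = with_intercept(candidates) @ coefs.T  # (n_candidates, n_models)
    return int(np.argmax(predictions.var(axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(30, 1))  # synthetic 1-D "measurements"
    y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=30)
    coefs = fit_bootstrap_ensemble(X, y)
    print(most_informative(np.array([[0.5], [5.0]]), coefs))  # favours the far point
```

Disagreement is largest far from existing data, so this heuristic steers experiments toward unexplored regions. It does not account for measurement cost.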