Dr. Marcus D. Hanwell
marcus.hanwell@kitware.com
http://openchemistry.org/
June 8, 2013
LA-SiGMA
Baton Rouge, LA
Input Preparation, Data Visualization & Analysis
  
Outline
•  Introduction
•  Kitware
•  Open Chemistry
•  Avogadro 2
•  MoleQueue
•  MongoChem
•  The Future
•  Summary
  
Introduction
•  User-friendly desktop integration with
– Computational codes
– HPC/cloud resources
– Database/informatics resources
  
Introduction
•  Bringing real change to chemistry
– Open-source frameworks
– Developed openly
– Cross-platform compatibility
– Tested and verified
– Contribution model
– Supported by Kitware experts
•  Liberally-licensed to facilitate research
  
Open Chemistry Development Team
•  Inter-disciplinary team at Kitware
•  The first three worked on open-source
chemistry in their spare time
•  The final two are computer scientists with
years of open-source experience
•  Seeking partners in industry, research, and labs
  
Kitware
•  Founded in 1998 by five former GE Research employees
•  118 current employees; 39 with PhDs
•  Privately held, profitable from creation, no debt
•  Rapidly Growing: >30% in 2011, 7M web-visitors/quarter
•  Offices
–  Clifton Park, NY
–  Carrboro, NC
–  Santa Fe, NM
–  Lyon, France
•  2011 Small Business
Administration’s
Tibbetts Award
•  HPCWire Readers
and Editor’s Choice
•  Inc’s 5000 List: 2008
to 2011
Kitware: Core Technologies
  
CMake
CDash
Supercomputing Visualization
•  Scientific Visualization
•  Informatics
•  Large Data Visualization
•  3D Interaction
•  Volume Rendering
Medical Image Analysis
•  Image Processing
•  Segmentation
•  Registration
•  Measurement &
Analysis
CMake provides the software process
for many popular projects
Allegro library
Armadillo
Avidemux
Awesome
Blender 3D
Bullet Physics Engine
Chicken Scheme
Chipmunk physics engine
Clang
Compiz
Conky
Doomsday Engine
Drishti
Gammu
GDCM
Gmsh
Hypertable
Hugin
iCub
IGSTK
ITK
KDE SC 4
Kicad
LMMS
LLVM
MariaDB
MiKTeX
MuseScore
MySQL
OGRE
OpenSceneGraph
OpenSync
OpenCV
ParaView
Poppler
PvPGN
Quantum GIS
QutIM
Raw Therapee
ROS
Scribus
Second Life
Spring RTS
SuperTux
Slicer
Stellarium
VTK
VXL
YARP
Arbor is an NSF-funded project to enable evolutionary
biological research by making it easy for biologists to
•  Create
•  Test
•  Visualize
algorithms on the Tree of Life. The slide shows
the evolutionary tree for Heliconia
(Lobster Claw) plants coupled to a character
matrix of observational data such as color, feature
measurements, and range.
DARPA XData Project
•  Addressing needs of big data analysis
•  Large, collaborative project
•  PI: Jeffrey Baumes, Kitware Inc.
– Jeffrey Heer, Stanford
– Hanspeter Pfister, Harvard
– John Stasko, Georgia Institute of Technology
– Miriah Meyer, University of Utah
– Curtis Lisle, KnowledgeVis LLC
  
Building Community
•  Communities are grown around
open source projects
•  Using Kitware software process
–  Ensuring quality with continuous
testing
–  Code contributions via the web
–  Public mailing lists, bug trackers,
and code review
•  Promoting projects and
participation
–  Publications
–  Conferences
–  Workshops
[Figure: software process diagram with the labels Software Repository; Build, Test & Package; Community Review; Developers & Users]

Business Model: Open Source
•  Open-source Software
– Normally BSD-licensed
– Collaboration platforms
•  Collaborative Research and Development
•  Technology Integration
•  Service and Support
•  Consulting
•  Training and webinars
  
Business Model: Open Source
•  Open-source platforms used in:
–  Research
–  Teaching
–  Commercial applications
•  Software is created by:
–  Internationally-recognized (Kitware) experts
–  Extended open-source communities
•  Using a rigorous, quality-inducing software
development process
  
Commercialization Strategy
•  Services & Consulting Model
–  Kitware develops widely-used software
frameworks and serves them through consulting.
•  Collaborative R&D
•  Custom solution development
•  Value-added products (e.g., training, support, books)
•  Services comprise approximately 2/3 of the global
software market
•  Companies such as IBM, HP, and Oracle realize
massive business from services
  
Value of Open Source
•  Access to and ownership of the code
•  Collaborative relationships are natural
•  Rapid, responsive development process
•  Partners can participate in development
•  Reduced or (often) no licensing fees
•  Maintenance burden taken up by broader
community
•  Maintenance often represents the greatest part of the cost
of software
  
Beginnings of Open Chemistry
•  The Avogadro project began in 2006
•  One of very few open-source 3D chemical editors
•  Draw/edit structure
•  Generate input for codes
•  Analyze output of codes
•  Open-source, GPLv2 GUI
•  Google Summer of Code in 2007 (KDE)
•  Used by Kalzium, a KDE educational tool
•  Over 300,000 downloads, 20+ translations
  
Avogadro Paper Published 8/13/12
  
http://www.jcheminf.com/content/4/1/17
NWChem, FoX, Avogadro Paper
Published 5/24/13
  
http://www.jcheminf.com/content/5/1/25
Vision
•  Advancing the state-of-the-art
•  Tight integration is needed
•  Computational codes
•  Clusters/supercomputers
•  Data repositories
•  Reduce, reuse, and recycle!
•  Facilitate sharing and
searching of data
•  Embracing data-centric workflows
  
Overview
•  Desktop chemistry application suite
– 3D structure editor, pre- and post-processing
– HPC integration for easily running codes
– Cheminformatics to store, index, and analyze
•  Each program can work independently
– Enhanced functionality when used together
– One-click HPC job submission
– Easily open structure found in database
– Coordination of job submission
  
Open Chemistry Project Approach
•  An open approach to chemistry software
– Open-source frameworks
– Developed openly
– Cross-platform
– Tested and verified
– Contribution model
– Supported by Kitware experts
•  BSD-licensed to facilitate research/reuse
  
Opening Up Chemistry
•  Computational chemistry is currently one
of the more closed sciences
•  Lots of black box proprietary codes
– Only a few have access to the code
– Publishing results from black box codes
– Many file formats in use, little agreement
•  More papers should be including data
•  Growing need for open standards
  
OpenChemistry.org
•  Web presence to promote Open Chemistry
•  Hosting of project-specific pages
•  Providing an identity for related projects
•  Promote shared ownership of projects
–  Website
–  Code submission and review
–  Testing infrastructure
–  Wiki, mailing lists, news, and galleries
  
Applications Being Developed
•  Three independent applications
•  Communication handled with local sockets
•  Avogadro 2: Structure editing, input generation,
output viewing, and analysis
•  MoleQueue: Running and managing local and remote jobs in
standalone programs
•  MongoChem: Storage of data, searching, entry,
and annotation
  
Open Frameworks
•  AvogadroLibs: Core data structures and
algorithms shared across codes
•  Split into dedicated libraries; e.g. core, io,
rendering, qtgui, qtopengl, qtplugins, quantum
•  Core maintains a minimal dependency set
•  Intended for use on server, command line,
and in a full-blown desktop application
•  VTK: Chemistry visualization and data
structures, building on the libraries above
  
Project Diagram: Libraries/Apps
[Diagram labels: AvogadroLibs, MongoChem, Avogadro, MoleQueue, VTK; role groupings: HPC; GUI/Visualization; Core, command line]

Workflow in Open Chemistry
[Diagram labels: Avogadro2, Job Submission, Calculation, Results, Input File, Local, Remote, Log File, MoleQueue, MongoChem]

Avogadro2
•  Rewrite of Avogadro
•  Split into libraries &
application (plugin-based)
•  Still one of very few open source editors
•  Still using Qt, C++, Eigen, OpenGL, CMake
•  Use AvogadroLibs for core data
•  Introduces client-server dataflow/patterns
•  New, efficient rendering code
•  More liberally-licensed – from GPL to BSD
  
Avogadro: Visualization
•  GPU-accelerated rendering
•  VTK for advanced visualization
•  Support for 2D and 3D plots of data
•  Optimized data structures
– Large data
– Streaming
•  Reworked interface
– Tighter database and workflow integration
  
Advanced Impostor Rendering
•  Using a scene, vertex buffer objects, and
OpenGL shading language
•  Impostor techniques
– Sphere goes from 100s of triangles to 2!
– No artifacts from triangulation
– Scales to millions of spheres on modest GPU
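As a rough illustration of the impostor idea (not the project's shader code), the per-fragment work for a sphere drawn on a two-triangle quad can be sketched in Python. The function names and the quad-local coordinate convention are assumptions made for this sketch; a real renderer would do this in the OpenGL shading language.

```python
import math


def sphere_impostor_fragment(u, v):
    """Shade one fragment of a sphere impostor.

    (u, v) are quad-local coordinates in [-1, 1] (an assumed convention).
    Returns the unit surface normal of the sphere at that fragment, or
    None when the fragment lies outside the silhouette and is discarded.
    """
    r2 = u * u + v * v
    if r2 > 1.0:
        return None  # outside the circle: discard the fragment
    w = math.sqrt(1.0 - r2)  # height of the sphere surface above the quad
    return (u, v, w)  # unit length, since u^2 + v^2 + w^2 = 1


def lambert(normal, light=(0.0, 0.0, 1.0)):
    """Diffuse intensity for a unit normal and a unit light direction."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))
```

Because the normal is computed analytically for every fragment, the silhouette and shading stay smooth at any zoom level, which is why no triangulation artifacts appear.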
  
Electronic Structure Visualization
•  Read quantum output files
–  Calculates cubes for molecular orbitals
–  Shows isosurface or volume rendering
–  Multithreaded C++ code to perform calculations –
scales very well
  
Scriptable Simulation Input Generator
•  Previous input generators were C++
•  Executes a simple Python script
– Script can output JSON with parameters
– Input is parameters specified by user
– Chemical JSON with full structure
– Supports syntax highlighting rules
•  New input generator is as simple as
adding a new Python script
– Implement 2-3 entry points and done
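A minimal sketch of what such a script might look like follows. The flag names (`--print-options`, `--generate-input`), the JSON keys (`options`, `molecule`, `coordinates`, `filename`, `contents`), and the generic input-deck text are all assumptions chosen for illustration; only `atoms` and `elementType` appear on the slides, and the real entry points may differ.

```python
import json
import sys


def print_options():
    """Entry point 1: describe the parameters the user may set."""
    return {"options": {"title": "Untitled", "theory": "HF", "basis": "6-31G(d)"}}


def generate_input(request):
    """Entry point 2: turn user parameters plus a structure into input text."""
    opts = request["options"]
    atoms = request["molecule"]["atoms"]
    lines = ['title "{}"'.format(opts["title"]), "geometry"]
    for symbol, (x, y, z) in zip(atoms["elementType"], atoms["coordinates"]):
        lines.append("  {} {:.5f} {:.5f} {:.5f}".format(symbol, x, y, z))
    lines += ["end", "basis " + opts["basis"], "task " + opts["theory"]]
    return {"filename": "job.inp", "contents": "\n".join(lines) + "\n"}


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Dispatch on the (hypothetical) command-line flag; JSON in, JSON out."""
    if "--print-options" in argv:
        json.dump(print_options(), stdout)
    elif "--generate-input" in argv:
        json.dump(generate_input(json.load(stdin)), stdout)


if __name__ == "__main__":
    main(sys.argv[1:])
```

The application only needs to run the script and exchange JSON over standard input and output, so supporting a new code means adding one file rather than recompiling C++.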
  
Avogadro to Input Deck
  
Quantum Data in AvogadroLibs
•  Reads in key quantum data
– Basis set used in calculation
– Eigenvectors for molecular orbitals
– Density matrix for electron density
– Standard geometry
•  Multi-threaded calculation
– Produces regular grids of scalar data
– Molecular orbitals, electron density…
  
Molecular Orbitals and Electron Density
•  Quantum files store basis sets and
matrices
•  Using these equations, and the supplied
matrices – calculate cubes
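$$\mathrm{GTO} = c\, e^{-\alpha r^{2}}$$
$$\phi_i = \sum_{\mu} c_{\mu i}\, \phi_{\mu}$$
$$\rho(r) = \sum_{\mu} \sum_{\nu} P_{\mu\nu}\, \phi_{\mu} \phi_{\nu}$$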
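The three expressions above can be sketched in plain Python. This is a simplified illustration, not the AvogadroLibs code: it assumes uncontracted s-type Gaussians only (no angular terms or normalization), and the function names and the `(center, alpha, coeff)` tuple layout are made up for the example.

```python
import math


def gto(center, alpha, coeff, point):
    """s-type Gaussian primitive: c * exp(-alpha * r^2), r measured from center."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return coeff * math.exp(-alpha * r2)


def molecular_orbital(basis, mo_coeffs, point):
    """phi_i(r) = sum over mu of c_{mu i} * phi_mu(r)."""
    return sum(c * gto(*b, point) for c, b in zip(mo_coeffs, basis))


def electron_density(basis, density_matrix, point):
    """rho(r) = sum over mu, nu of P_{mu nu} * phi_mu(r) * phi_nu(r)."""
    values = [gto(*b, point) for b in basis]
    size = len(values)
    return sum(
        density_matrix[m][n] * values[m] * values[n]
        for m in range(size)
        for n in range(size)
    )


def orbital_cube(basis, mo_coeffs, origin, spacing, counts):
    """Sample one molecular orbital on a regular grid (a small 'cube')."""
    nx, ny, nz = counts
    ox, oy, oz = origin
    return [
        [
            [
                molecular_orbital(
                    basis, mo_coeffs,
                    (ox + i * spacing, oy + j * spacing, oz + k * spacing),
                )
                for k in range(nz)
            ]
            for j in range(ny)
        ]
        for i in range(nx)
    ]
```

Every grid point is independent of the others, which is what makes this kind of calculation straightforward to spread across threads.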
  
Calling Stand-alone Programs
•  Many are already supported:
•  NWChem, GAMESS, GAMESS-UK, Molpro,
Q-Chem, MOPAC, Gaussian, Dalton
•  Very easy to add more
•  MongoChem and Avogadro 2 use libraries
•  Custom applications are simple
•  Now with simpler BSD licensing, testing, …
•  Started looking at/prototyping other areas
•  Molecular dynamics, plane-wave, APBS
  
New CML I/O
•  Development of modular CML code
•  Allow for multi-pass parsing of CML
•  Keep the CML closer to application
•  Much faster, easier to extend and change
•  Moving from simple CML to full semantic
documents that can be edited
•  Learned from previous work in VTK and
Open Babel
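One way to picture multi-pass parsing is a first pass that collects atoms and a second pass that resolves bonds against them. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the project's C++ code; the element and attribute names follow CML, while the coordinates are illustrative values and `parse_cml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

CML = """<molecule xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
    <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.469"/>
    <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.469"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""

NS = {"cml": "http://www.xml-cml.org/schema"}


def parse_cml(text):
    """Return (atoms, bonds) from a CML string using two passes."""
    root = ET.fromstring(text)

    # Pass 1: collect atoms and remember the index of each atom id.
    atoms, index = [], {}
    for atom in root.findall("cml:atomArray/cml:atom", NS):
        index[atom.get("id")] = len(atoms)
        position = tuple(float(atom.get(k)) for k in ("x3", "y3", "z3"))
        atoms.append((atom.get("elementType"), position))

    # Pass 2: resolve bond references against the atom index.
    bonds = []
    for bond in root.findall("cml:bondArray/cml:bond", NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((index[first], index[second], int(bond.get("order"))))
    return atoms, bonds
```

Further passes, for example over properties or computational metadata, can be added without touching the earlier ones.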
  
File Format: CML & HDF5
•  Leverage our experience with XDMF
•  Early prototype already implemented
•  CML stores semantic data
– Name, formula, atoms, bonds
– Computational code, theory, basis set
•  HDF5 used to store heavy data
– Basis set, intermediate data
– Eigenvectors, SCF matrix
– Volumetric data (MOs, electron density)
  
Avogadro: Client-Server
•  Currently in early stages of development
•  Off-loads more calculations to cluster
•  Streams data, geometry
•  Loading/creation of data remotely
•  Analysis of large data
– Processing nodes
– Rendering nodes
•  Scales to very large data
  
MoleQueue: Job Management
•  Tighter integration with remote queues
•  Integration with databases
– Retains full log of computational jobs
– Triggers actions on completion
•  Plugin-based system
– Easy addition of new codes
– Easy addition of new queue systems
•  Provides client API for applications
  
MoleQueue
•  Supports configuration for a variety of
remote clusters and queuing software
•  Transparently switches between local and
remote execution of codes
MoleQueue: Queue Types
•  Several transports implemented
– Command line SSH/plink (Windows)
– libssh2 (experimental)
– HTTPS (SOAP)
•  Several queue types
– Local (execute and marshal processes)
– Sun Grid Engine, PBS, SLURM
– UIT (ezHPC with largely PBS dialect)
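To make the split between transports and queue types concrete, here is a small Python sketch of how a queue type might map to a submission command sent over command-line SSH. It is not MoleQueue's implementation: the table, the function name, and the host and user values are hypothetical, although `qsub` and `sbatch` are the usual submit commands for those schedulers.

```python
import shlex

# Queue type -> submit command normally provided by that scheduler.
SUBMIT_COMMANDS = {
    "Sun Grid Engine": "qsub",
    "PBS": "qsub",
    "SLURM": "sbatch",
}


def build_submission(queue_type, script, host=None, user=None):
    """Return the argument list that would run or submit `script`."""
    if queue_type == "Local":
        # Local queue: execute the job script directly on this machine.
        return ["sh", script]
    remote_command = "{} {}".format(SUBMIT_COMMANDS[queue_type], shlex.quote(script))
    # Remote queue: run the scheduler's submit command through ssh.
    return ["ssh", "{}@{}".format(user, host), remote_command]
```

Keeping the scheduler-specific part in a lookup table is one simple way to let new queue systems be added without changing the calling code.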
  
Using JSON
•  MongoDB stores data as BSON
–  JSON: JavaScript Object Notation
–  BSON: Binary form, type safe
•  JSON is very compact, standardized
{
"name": "water",
"atoms": {
"elementType": ["H", "H", "O"]
},
"properties": { "molecular weight": 18.0153 }
}
  
JSON-RPC interface
Applications can submit jobs via a local
socket or ZeroMQ connection:
Client request:
{ "jsonrpc": "2.0",
"method": "submitJob",
"params": {
"queue": "Remote cluster PBS",
"program": "MOPAC",
"description": "PM6 H2 optimization",
"inputAsString": "PM6\n\nH 0.0 0.0 0.0\nH 1.0 0.0 0.0\n"
},
"id": "XXX" }
Server reply:
{ "jsonrpc": "2.0",
"result": {
"moleQueueId": 17,
"queueId": 123456,
"workingDirectory": "/tmp/MoleQueue/17/"
},
"id": "XXX" }
Chemical JSON
•  Stores molecular structure,
geometry, identifiers, and
descriptors as a JSON object
•  Benefits:
–  More compact than XML/CML
–  Native language of MongoDB
and JSON-RPC
–  Easily converted to a binary
representation (BSON)
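Both the molecule object and the JSON-RPC job submission shown on the earlier slides can be handled with nothing more than a JSON library. The Python sketch below builds the `submitJob` request, checks a reply, and serializes a molecule compactly. The helper names are hypothetical, the transport (local socket or ZeroMQ) is left out, and the input string assumes the line breaks in the slide's MOPAC input are newline escapes.

```python
import json


def submit_job_request(request_id, queue, program, description, input_text):
    """Build a JSON-RPC 2.0 'submitJob' request like the one on the slide."""
    return {
        "jsonrpc": "2.0",
        "method": "submitJob",
        "params": {
            "queue": queue,
            "program": program,
            "description": description,
            "inputAsString": input_text,
        },
        "id": request_id,
    }


def parse_reply(raw, request_id):
    """Decode a reply and return its 'result' after matching it to the request."""
    reply = json.loads(raw)
    if reply.get("jsonrpc") != "2.0" or reply.get("id") != request_id:
        raise ValueError("reply does not match the request")
    if "error" in reply:  # JSON-RPC 2.0 error object
        raise RuntimeError(reply["error"].get("message", "unknown error"))
    return reply["result"]


# Molecule using only the fields shown on the "Using JSON" slide.
water = {
    "name": "water",
    "atoms": {"elementType": ["H", "H", "O"]},
    "properties": {"molecular weight": 18.0153},
}

request = submit_job_request(
    "XXX", "Remote cluster PBS", "MOPAC", "PM6 H2 optimization",
    "PM6\n\nH 0.0 0.0 0.0\nH 1.0 0.0 0.0\n",
)
wire_text = json.dumps(request)  # text that would be written to the socket
compact = json.dumps(water, separators=(",", ":"))  # no optional whitespace
```

Matching the reply `id` against the request `id` is what lets a client keep several submissions in flight over one connection.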
  
MongoChem Overview
•  A desktop cheminformatics tool
– Chemical data exploration and analysis
– Interactive, editable, and searchable database
•  Leverages several open-source projects
– Qt, VTK, MongoDB, Avogadro 2, Open Babel
•  Designed to look at many molecules
•  Spots patterns, outliers; runs many jobs
•  Scales to studies with ~3 million structures
Architecture Overview
•  Native, cross-platform C++ application built with Qt
•  Stores chemical data in a NoSQL MongoDB database
•  Uses VTK for 2D and 3D dataset visualization
  
Computational Job Storage
•  Jobs associated
with molecules
•  Searchable based
on structure/job
parameters
  
Charts and Plots in ChemData
  
K-Means Clustering
•  >30 numerical molecular descriptors
•  Extraction and filtering into clusters
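For readers unfamiliar with the method, a bare-bones k-means loop over descriptor vectors can be sketched in Python as follows. This is a generic textbook version, not MongoChem's implementation (which the slides describe as built on VTK); the function names are made up, and the starting centroids are supplied by the caller to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: squared_distance(p, centroids[i]),
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, max_iterations=50):
    """Alternate assignment and centroid update until the centroids settle."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old  # keep an empty cluster's centroid in place
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

With real data each point would hold the numerical descriptors of one molecule, and descriptors on very different scales would normally be standardized first.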
  
ParaViewWeb and MongoChem
•  Uses ParaView's client-server architecture
•  Interactive 3D rendering
•  Runs in any modern web browser
•  Same MongoDB server as MongoChem
•  Move more to the client JavaScript code
•  Moving to a simple, Python-based server
– Easy to add new APIs
– Easy to deploy/integrate into other solutions
  
ParaViewWeb and Open Chemistry
  
Software Process
•  Source code publicly hosted using Git
•  Gerrit for online code review
•  CTest/CDash for testing/summary
– Gerrit can use CDash@Home
•  Test proposed changes before merge
•  CDash can now provide binaries
– Built nightly, available for direct download
•  Wiki, mailing list, and bug tracker
  
Vision for the Future
•  Find partners to develop targeted solutions
•  Improved tight integration is needed
•  Computational codes
•  Clusters/supercomputers
•  Data repositories
•  Improve and extend client-server architecture
•  Co-processing/in-situ visualization/analysis
•  Embracing open, semantically rich data
•  Address semantic and large data in chemistry
  
Avogadro: Visualization
•  GPGPU accelerated rendering/interop
•  More VTK for advanced visualization
•  Support for 2D and 3D plots of data
•  Optimized data structures
– Streaming of large data
– Real-time ray-tracing
•  Reworked interface
– Tighter database/workflow integration
  
MoleQueue: Complex Jobs
•  Tighter integration with remote queues
•  Integration with databases
– Retain full log of computational jobs
– Trigger actions on completion
•  Manage complex jobs
– Restarts, dependent jobs, triggers
– Meta-scheduling – choose best resource
– Classify completed job success/failure/status
  
MongoChem: Chemistry Data
•  Substructure searches
– Fingerprints support substructure searching
•  Tighter integration with applications
– Communication to search/retrieve/submit
•  Easier addition/annotation of data
– Enable full annotation and searching
•  Web frontend for wider sharing
•  Simple command line tools – batch jobs
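The fingerprint bullet above refers to a standard screening trick: if every bit set in the query fingerprint is also set in a candidate's fingerprint, the candidate may contain the query substructure. A Python sketch follows, with hypothetical feature strings and a simple hashed fingerprint that stands in for whatever scheme the database would actually use.

```python
import zlib


def fingerprint(features, n_bits=64):
    """Fold a collection of feature strings into an integer bit mask."""
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode("utf-8")) % n_bits)
    return bits


def may_contain(target_fp, query_fp):
    """Screen: every bit set in the query must also be set in the target."""
    return (target_fp & query_fp) == query_fp


def screen(database, query_features):
    """Return the names of entries that pass the fingerprint screen."""
    query_fp = fingerprint(query_features)
    return [name for name, fp in database.items() if may_contain(fp, query_fp)]
```

The screen can only rule candidates out. Hash collisions produce false positives, so anything that passes still needs a full substructure match.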
  
Quixote: Parser Technology
•  From punch cards and line printers…
•  Implement C++ parsers
– Using regular expressions
– Provide editor/simulator
– Easily update parser for new terms
•  Dictionaries
– Documenting the log files
•  Facilitating data storage and exchange
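The planned parsers are C++, but the dictionary-plus-regular-expression idea is easy to sketch in Python. Everything specific here is hypothetical: the log lines, the term names, and the dictionary layout are invented for illustration and do not correspond to any particular code's output.

```python
import re

# Hypothetical dictionary: term -> (pattern, converter, units).
DICTIONARY = {
    "total_energy": (
        re.compile(r"Total energy\s*=\s*(-?\d+\.\d+)"), float, "hartree"),
    "scf_iterations": (
        re.compile(r"SCF converged in\s+(\d+)\s+iterations"), int, None),
}


def parse_log(text, dictionary=DICTIONARY):
    """Extract every dictionary term found in a log file's text."""
    results = {}
    for term, (pattern, convert, units) in dictionary.items():
        match = pattern.search(text)
        if match:
            results[term] = {"value": convert(match.group(1)), "units": units}
    return results
```

Supporting a new term then means adding one dictionary entry, and the dictionary itself doubles as documentation of what each log line means.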
  
Building Community
•  Community around chemistry
projects
•  Using Kitware software process
–  Ensuring quality with continuous
testing
–  Code contributions on the web
–  Public mailing lists, bug trackers,
code review
•  Promoting projects and
participation
–  Publication
–  Conferences
–  Workshops
–  Social media
[Figure: software process diagram with the labels Software Repository; Build, Test & Package; Community Review; Developers & Users]

Rethinking Input File Generation
•  Can we create a CML representation?
– Could be loaded directly by some codes
– Could be translated to input files for others
•  Would allow search on input and output
•  Could be stored and published
•  Make it easier to set up calculations
•  Create a more uniform experience
•  Input generators currently use a JSON representation
  
Overview
•  Avogadro 2 – molecular editor/analysis
•  MoleQueue – manage external code execution
•  MongoChem – data management, visualization
•  AvogadroLibs – core data structures, algorithms
•  VTK – advanced visualization and analysis
•  A strong ecosystem for computational chemistry
–  Documentation and training materials
–  Collaborative research and development
–  Work with scientists on real research problems
–  Provision of support and consulting services
  
Conclusions
•  Real opportunity to make an impact
•  Bringing best practices to chemistry
•  Improve research, industry, and teaching
•  Semantic data at the center of our work
–  Storage
–  Search
–  Interaction with computational codes
–  Comparison with experimental data
•  Actively seeking collaborators for future work
  
Gerrit
  
CDash
  

 

Similar to Open Chemistry: Input Preparation, Data Visualization & Analysis (20)

nstitutional repositories, item and research data metrics
nstitutional repositories, item and research data metricsnstitutional repositories, item and research data metrics
nstitutional repositories, item and research data metrics
 
Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)Online Journal Management using Open Journal Systems (OJS)
Online Journal Management using Open Journal Systems (OJS)
 
ufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdfufsojs-161024084446 (1).pdf
ufsojs-161024084446 (1).pdf
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
UI Dev in Big data world using open source
UI Dev in Big data world using open sourceUI Dev in Big data world using open source
UI Dev in Big data world using open source
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
 
A FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning ModelsA FAIR Approach to Publishing and Sharing Machine Learning Models
A FAIR Approach to Publishing and Sharing Machine Learning Models
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
Reproducible data science: review of Pachyderm, Data Version Control and GIT ...
 
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
Advanced Automated Analytics Using OSS Tools, GA Tech FDA Conference 2016
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
Adam bosc-071114
Adam bosc-071114Adam bosc-071114
Adam bosc-071114
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
 
Developing XWiki
Developing XWikiDeveloping XWiki
Developing XWiki
 
General Introduction to the Oxford e-Research Centre
General Introduction to the Oxford e-Research CentreGeneral Introduction to the Oxford e-Research Centre
General Introduction to the Oxford e-Research Centre
 
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding CattranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cat
 
Grid Computing (An Up-Coming Technology)
Grid Computing (An Up-Coming Technology)Grid Computing (An Up-Coming Technology)
Grid Computing (An Up-Coming Technology)
 
OGCE SC10
OGCE SC10OGCE SC10
OGCE SC10
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
εξελιξη πληροφοριακων συστηματων στη διαχειρiση καινοτομιας
εξελιξη πληροφοριακων συστηματων στη διαχειρiση καινοτομιαςεξελιξη πληροφοριακων συστηματων στη διαχειρiση καινοτομιας
εξελιξη πληροφοριακων συστηματων στη διαχειρiση καινοτομιας
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Open Chemistry: Input Preparation, Data Visualization & Analysis

  • 1. Dr. Marcus D. Hanwell marcus.hanwell@kitware.com http://openchemistry.org/ June 8, 2013 LA-SiGMA Baton Rouge, LA Input Preparation, Data Visualization & Analysis 1  
  • 2. Outline •  Introduction •  Kitware •  Open Chemistry •  Avogadro 2 •  MoleQueue •  MongoChem •  The Future •  Summary 2  
  • 3. Introduction •  User-friendly desktop integration with – Computational codes – HPC/cloud resources – Database/informatics resources 3  
  • 4. Introduction •  Bringing real change to chemistry – Open-source frameworks – Developed openly – Cross-platform compatibility – Tested and verified – Contribution model – Supported by Kitware experts •  Liberally-licensed to facilitate research 4  
  • 5. Open Chemistry Development Team •  Inter-disciplinary team at Kitware •  The first three worked on open-source chemistry in their spare time •  The final two are computer scientists with years of open-source experience •  Seeking partners in industry & research, labs 5  
  • 6. Outline •  Introduction •  Kitware •  Open Chemistry •  Avogadro 2 •  MoleQueue •  MongoChem •  The Future •  Summary 6  
  • 7. Kitware •  Founded in 1998 by five former GE Research employees •  118 current employees; 39 with PhDs •  Privately held, profitable from creation, no debt •  Rapidly Growing: >30% in 2011, 7M web-visitors/quarter •  Offices –  Clifton Park, NY –  Carrboro, NC –  Santa Fe, NM –  Lyon, France •  2011 Small Business Administration’s Tibbetts Award •  HPCWire Readers and Editor’s Choice •  Inc’s 5000 List: 2008 to 2011
  • 9. Supercomputing Visualization •  Scientific Visualization •  Informatics •  Large Data Visualization •  3D Interaction •  Volume Rendering
  • 10. Medical Image Analysis •  Image Processing •  Segmentation •  Registration •  Measurement & Analysis
  • 11. CMake provides the software process for many popular projects Allegro library Armadillo Avidemux Awesome Blender 3D Bullet Physics Engine Chicken Scheme Chipmunk physics engine Clang Compiz Conky Doomsday Engine Drishti Gammu GDCM Gmsh Hypertable Hugin iCub IGSTK ITK KDE SC 4 Kicad LMMS LLVM MariaDB MiKTeX MuseScore MySQL OGRE OpenSceneGraph OpenSync OpenCV ParaView Poppler PvPGN Quantum GIS QutIM Raw Therapee ROS Scribus Second Life Spring RTS SuperTux Slicer Stellarium VTK VXL YARP
  • 12. Arbor is an NSF funded project to enable evolutionary biological research by making it easy for biologists to •  Create •  Test •  Visualize Algorithms on the Tree of Life. Below is the evolutionary tree for Heliconia (Lobster Claw) plants coupled to a character matrix of observational data such as color, feature measurements and range.
  • 13. DARPA XData Project •  Addressing needs of big data analysis •  Large, collaborative project •  PI: Jeffrey Baumes, Kitware Inc. – Jeffrey Heer, Stanford – Hanspeter Pfister, Harvard – John Stasko, Georgia Institute of Technology – Miriah Meyer, University of Utah – Curtis Lisle, KnowledgeVis LLC 13  
  • 15. Building Community •  Communities are grown around open source projects •  Using Kitware software process –  Ensuring quality with continuous testing –  Code contributions via the web –  Public mailing lists, bug trackers, and code review •  Promoting projects and participation –  Publications –  Conferences –  Workshops 15   Software Repository Build, Test & Package Community Review Developers & Users
  • 16. Business Model: Open Source •  Open-source Software – Normally BSD-licensed – Collaboration platforms •  Collaborative Research and Development •  Technology Integration •  Service and Support •  Consulting •  Training and webinars 16  
  • 17. Business Model: Open Source •  Open-source platforms used in: –  Research –  Teaching –  Commercial applications •  Software is created by: –  Internationally-recognized (Kitware) experts –  Extended open-source communities •  Using a rigorous, quality-inducing software development process 17  
  • 18. Commercialization Strategy •  Services & Consulting Model –  Kitware develops widely-used software frameworks and serves them through consulting. •  Collaborative R&D •  Custom solution development •  Value-added products (e.g., training, support, books) •  Services comprise approximately 2/3 of the global software market •  Companies such as IBM, HP, and Oracle realize massive business from services 18  
  • 19. Value of Open Source •  Access to and ownership of the code •  Collaborative relationships are natural •  Rapid, responsive development process •  Partners can participate in development •  Reduced or (often) no licensing fees •  Maintenance burden taken up by broader community – Maintenance often represents the greatest part of the cost of software
  • 20. Outline •  Introduction •  Kitware •  Open Chemistry •  Avogadro 2 •  MoleQueue •  MongoChem •  The Future •  Summary 20  
  • 21. Beginnings of Open Chemistry •  The Avogadro project began in 2006 •  One of very few open-source 3D chemical editors •  Draw/edit structure •  Generate input for codes •  Analyze output of codes •  Open-source, GPLv2 GUI •  Google Summer of Code in 2007 (KDE) •  Used by Kalzium, a KDE educational tool •  Over 300,000 downloads, 20+ translations 21  
  • 22. Avogadro Paper Published 8/13/12 22   http://www.jcheminf.com/content/4/1/17
  • 23. NWChem, FoX, Avogadro Paper Published 5/24/13 23   http://www.jcheminf.com/content/5/1/25
  • 24. Vision •  Advancing the state-of-the-art •  Tight integration is needed •  Computational codes •  Clusters/supercomputers •  Data repositories •  Reduce, reuse, and recycle! •  Facilitate sharing and searching of data •  Embracing data-centric workflows 24  
  • 25. Overview •  Desktop chemistry application suite – 3D structure editor, pre- and post-processing – HPC integration for easily running codes – Cheminformatics to store, index, and analyze •  Each program can work independently – Enhanced functionality when used together – One-click HPC job submission – Easily open structure found in database – Coordination of job submission
  • 26. Open Chemistry Project Approach •  An open approach to chemistry software – Open-source frameworks – Developed openly – Cross-platform – Tested and verified – Contribution model – Supported by Kitware experts •  BSD-licensed to facilitate research/reuse 26  
  • 27. Opening Up Chemistry •  Computational chemistry is currently one of the more closed sciences •  Lots of black box proprietary codes – Only a few have access to the code – Publishing results from black box codes – Many file formats in use, little agreement •  More papers should be including data •  Growing need for open standards 27  
  • 28. OpenChemistry.org •  Web presence to promote Open Chemistry •  Hosting of project-specific pages •  Providing an identity for related projects •  Promote shared ownership of projects –  Website –  Code submission and review –  Testing infrastructure –  Wiki, mailing lists, news, and galleries 28  
  • 30. Applications Being Developed •  Three independent applications •  Communication handled with local sockets •  Avogadro 2: Structure editing, input generation, output viewing, and analysis •  MoleQueue: Running local and remote jobs in standalone programs, and management •  MongoChem: Storage of data, searching, entry, and annotation 30  
  • 31. Open Frameworks •  AvogadroLibs: Core data structures and algorithms shared across codes •  Split into dedicated libraries; e.g. core, io, rendering, qtgui, qtopengl, qtplugins, quantum •  Core maintains a minimal dependency set •  Intended for use on server, command line, and in a full-blown desktop application •  VTK: Chemistry visualization and data structures, use of above 31  
  • 32. Project Diagram: Libraries/Apps [diagram labels: AvogadroLibs, MongoChem, Avogadro, MoleQueue, VTK; layers: HPC; GUI/Visualization; Core, command line]
  • 33. Workflow in Open Chemistry [diagram labels: Avogadro2, Input File, Job Submission (Local, Remote), Calculation, Log File, Results, MoleQueue, MongoChem]
  • 34. Avogadro2 •  Rewrite of Avogadro •  Split into libraries & application (plugin-based) •  Still one of very few open source editors •  Still using Qt, C++, Eigen, OpenGL, CMake •  Use AvogadroLibs for core data •  Introduces client-server dataflow/patterns •  New, efficient rendering code •  More liberally-licensed – from GPL to BSD 34  
  • 35. Avogadro: Visualization •  GPU-accelerated rendering •  VTK for advanced visualization •  Support for 2D and 3D plots of data •  Optimized data structures – Large data – Streaming •  Reworked interface – Tighter database and workflow integration 35  
  • 36. Advanced Impostor Rendering •  Using a scene, vertex buffer objects, and OpenGL shading language •  Impostor techniques – Sphere goes from 100s of triangles to 2! – No artifacts from triangulation – Scales to millions of spheres on modest GPU 36  
  • 37. Electronic Structure Visualization •  Read quantum output files –  Calculates cubes for molecular orbitals –  Shows isosurface or volume rendering –  Multithreaded C++ code to perform calculations – scales very well 37  
  • 38. Scriptable Simulation Input Generator •  Previous input generators were C++ •  Executes a simple Python script – Script can output JSON with parameters – Input is parameters specified by user – Chemical JSON with full structure – Supports syntax highlighting rules •  New input generator is as simple as adding a new Python script – Implement 2-3 entry points and done 38  
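A scripted generator of the kind described on slide 38 might look like the following minimal sketch. The entry-point names (`print_options`, `generate_input`), the option keys, the `coords` layout, and the output file name are all assumptions made for illustration; the slides do not specify the actual interface.

```python
import json


def print_options():
    """Entry point 1 (hypothetical): describe the user-adjustable parameters."""
    return {
        "userOptions": {
            "Theory": {"type": "stringList", "values": ["PM6", "AM1"], "default": 0},
            "Title": {"type": "string", "default": "Untitled"},
        }
    }


def generate_input(request):
    """Entry point 2 (hypothetical): turn options plus a molecule into an input deck.

    `request` holds the parameters chosen by the user and a Chemical-JSON-like
    molecule with element symbols and a flat coordinate list (x0, y0, z0, x1, ...).
    """
    options = request["options"]
    atoms = request["cjson"]["atoms"]
    symbols = atoms["elementType"]
    coords = atoms["coords"]
    lines = [options["Theory"], options["Title"], ""]
    for i, symbol in enumerate(symbols):
        x, y, z = coords[3 * i : 3 * i + 3]
        lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return {"files": [{"filename": "job.inp", "contents": "\n".join(lines) + "\n"}]}


# Example round trip: the caller would send this request as JSON text.
example_request = {
    "options": {"Theory": "PM6", "Title": "H2"},
    "cjson": {"atoms": {"elementType": ["H", "H"], "coords": [0, 0, 0, 1, 0, 0]}},
}
example_output = json.dumps(generate_input(example_request))
```

Keeping both entry points as plain functions that take and return JSON-serializable data means the calling application only needs to start the script and exchange text with it.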
  • 39. Avogadro to Input Deck 39  
  • 40. Quantum Data in AvogadroLibs •  Reads in key quantum data – Basis set used in calculation – Eigenvectors for molecular orbitals – Density matrix for electron density – Standard geometry •  Multi-threaded calculation – Produces regular grids of scalar data – Molecular orbitals, electron density… 40  
  • 41. Molecular Orbitals and Electron Density •  Quantum files store basis sets and matrices •  Using these equations, and the supplied matrices – calculate cubes $\mathrm{GTO} = c\,e^{-\alpha r^{2}}$, $\phi_i = \sum_{\mu} c_{\mu i}\,\phi_{\mu}$, $\rho(r) = \sum_{\mu}\sum_{\nu} P_{\mu\nu}\,\phi_{\mu}\phi_{\nu}$
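The three relations on slide 41 (a Gaussian basis function, a molecular orbital as a weighted sum of basis functions, and the density as a double sum over the density matrix) can be sketched in plain Python. This is not the multithreaded C++ code from AvogadroLibs: it assumes s-type Gaussians only, and the function names and the two-function example basis are invented for illustration.

```python
import math


def gto(c, alpha, center, point):
    """Value of one s-type Gaussian, c * exp(-alpha * r^2), at `point`."""
    r2 = sum((p - q) ** 2 for p, q in zip(point, center))
    return c * math.exp(-alpha * r2)


def basis_values(basis, point):
    """Evaluate every basis function phi_mu at `point`."""
    return [gto(c, alpha, center, point) for c, alpha, center in basis]


def molecular_orbital(coeffs, basis, point):
    """phi_i(r) = sum over mu of c_{mu i} * phi_mu(r)."""
    return sum(c * phi for c, phi in zip(coeffs, basis_values(basis, point)))


def electron_density(P, basis, point):
    """rho(r) = sum over mu, nu of P_{mu nu} * phi_mu(r) * phi_nu(r)."""
    phi = basis_values(basis, point)
    n = len(phi)
    return sum(P[m][v] * phi[m] * phi[v] for m in range(n) for v in range(n))


def sample_grid(func, origin, spacing, shape):
    """Sample func(point) on a regular grid (a "cube"); x varies slowest."""
    values = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (
                    origin[0] + i * spacing,
                    origin[1] + j * spacing,
                    origin[2] + k * spacing,
                )
                values.append(func(point))
    return values


# Illustrative two-center basis: (c, alpha, center) for each function.
basis = [(1.0, 0.5, (0.0, 0.0, 0.0)), (1.0, 0.5, (1.4, 0.0, 0.0))]
coeffs = [0.5, 0.5]  # one doubly occupied orbital
P = [[2 * a * b for b in coeffs] for a in coeffs]  # P = 2 * c * c^T

mo_cube = sample_grid(
    lambda r: molecular_orbital(coeffs, basis, r), (-1.0, -1.0, -1.0), 0.5, (8, 5, 5)
)
```

With a single doubly occupied orbital the density reduces to twice the square of that orbital, which gives a quick consistency check between the two functions.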
  • 42. Calling Stand-alone Programs •  Many are already supported: •  NWChem, GAMESS, GAMESS-UK, Molpro, Q-Chem, MOPAC, Gaussian, Dalton •  Very easy to add more •  MongoChem and Avogadro 2 use libraries •  Custom applications are simple •  Now with simpler BSD licensing, testing, … •  Started looking at/prototyping other areas •  Molecular dynamics, plane-wave, APBS 42  
  • 43. New CML I/O •  Development of modular CML code •  Allow for multi-pass parsing of CML •  Keep the CML closer to application •  Much faster, easier to extend and change •  Moving from simple CML to full semantic documents that can be edited •  Learned from previous work in VTK and Open Babel 43  
  • 44. File Format: CML & HDF5 •  Leverage our experience with XDMF •  Early prototype already implemented •  CML stores semantic data – Name, formula, atoms, bonds – Computational code, theory, basis set •  HDF5 used to store heavy data – Basis set, intermediate data – Eigenvectors, SCF matrix – Volumetric data (MOs, electron density) 44  
  • 45. Avogadro: Client-Server •  Currently in early stages of development •  Off-loads more calculations to cluster •  Streams data, geometry •  Loading/creation of data remotely •  Analysis of large data – Processing nodes – Rendering nodes •  Scales to very large data 45  
  • 46. MoleQueue: Job Management •  Tighter integration with remote queues •  Integration with databases – Retains full log of computational jobs – Triggers actions on completion •  Plugin-based system – Easy addition of new codes – Easy addition of new queue systems •  Provides client API for applications 46  
  • 47. MoleQueue •  Supports configuration for a variety of remote clusters and queuing software •  Transparently switches between local and remote execution of codes
  • 48. MoleQueue: Queue Types •  Several transports implemented – Command line SSH/plink (Windows) – libssh2 (experimental) – HTTPS (SOAP) •  Several queue types – Local (execute and marshal processes) – Sun Grid Engine, PBS, SLURM – UIT (ezHPC with largely PBS dialect) 48  
  • 49. Using JSON •  MongoDB stores data as BSON –  JSON: JavaScript Object Notation –  BSON: Binary form, type safe •  JSON is very compact, standardized { "name": "water", "atoms": { "elementType": ["H", "H", "O"] }, "properties": { "molecular weight": 18.0153 } }
  • 50. JSON-RPC interface Applications can submit jobs via a local socket or ZeroMQ connection: Client request: { "jsonrpc": "2.0", "method": "submitJob", "params": { "queue": "Remote cluster PBS", "program": "MOPAC", "description": "PM6 H2 optimization", "inputAsString": "PM6\n\nH 0.0 0.0 0.0\nH 1.0 0.0 0.0\n" }, "id": "XXX" } Server reply: { "jsonrpc": "2.0", "result": { "moleQueueId": 17, "queueId": 123456, "workingDirectory": "/tmp/MoleQueue/17/" }, "id": "XXX" }
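As a small illustration of the JSON document on slide 49, the sketch below parses the water example with Python's standard `json` module and writes it back in compact form. The variable names are illustrative, and converting to BSON is left out because it needs a MongoDB driver that the slides do not name.

```python
import json

# The water document from the slide, written as valid JSON.
text = """
{
  "name": "water",
  "atoms": { "elementType": ["H", "H", "O"] },
  "properties": { "molecular weight": 18.0153 }
}
"""

molecule = json.loads(text)

# Ordinary dict and list access after parsing.
element_counts = {}
for symbol in molecule["atoms"]["elementType"]:
    element_counts[symbol] = element_counts.get(symbol, 0) + 1

# Compact encoding with no optional whitespace, suitable for sending between applications.
compact = json.dumps(molecule, separators=(",", ":"))
```

Parsing `compact` again yields the same structure, so the whitespace-free form loses nothing.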
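A client for the exchange on slide 50 could build the request and check the reply roughly as follows. The field names come from the slide; the helper names, the integer request ids, and the canned reply are assumptions for this sketch, and the socket or ZeroMQ transport is deliberately omitted.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_submit_request(queue, program, description, input_text):
    """Build a JSON-RPC 2.0 submitJob request using the slide's field names."""
    return {
        "jsonrpc": "2.0",
        "method": "submitJob",
        "params": {
            "queue": queue,
            "program": program,
            "description": description,
            "inputAsString": input_text,
        },
        "id": next(_request_ids),
    }


def parse_reply(raw, expected_id):
    """Return the result of a reply, or raise if it is an error or mismatched."""
    reply = json.loads(raw)
    if reply.get("id") != expected_id:
        raise ValueError("reply does not match the pending request")
    if "error" in reply:
        raise RuntimeError(reply["error"].get("message", "unknown error"))
    return reply["result"]


request = make_submit_request(
    "Remote cluster PBS",
    "MOPAC",
    "PM6 H2 optimization",
    "PM6\n\nH 0.0 0.0 0.0\nH 1.0 0.0 0.0\n",
)
wire = json.dumps(request)  # text that would be written to the connection

# Stand-in for the server's answer, mirroring the reply shown on the slide.
canned_reply = json.dumps(
    {
        "jsonrpc": "2.0",
        "result": {
            "moleQueueId": 17,
            "queueId": 123456,
            "workingDirectory": "/tmp/MoleQueue/17/",
        },
        "id": request["id"],
    }
)
result = parse_reply(canned_reply, request["id"])
```

Matching the reply `id` against the pending request is what allows several submissions to be in flight on one connection.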
  • 51. Chemical JSON •  Stores molecular structure, geometry, identifiers, and descriptors as a JSON object •  Benefits: –  More compact than XML/CML –  Native language of MongoDB and JSON-RPC –  Easily converted to a binary representation (BSON) 51  
  • 52. MongoChem Overview •  A desktop cheminformatics tool – Chemical data exploration and analysis – Interactive, editable, and searchable database •  Leverages several open-source projects – Qt, VTK, MongoDB, Avogadro 2, Open Babel •  Designed to look at many molecules •  Spots patterns, outliers; runs many jobs •  Scales to studies with ~3 million structures
  • 53. Architecture Overview •  Native, cross-platform C++ application built with Qt •  Stores chemical data in a NoSQL MongoDB database •  Uses VTK for 2D and 3D dataset visualization 53  
  • 54. Computational Job Storage •  Jobs associated with molecules •  Searchable based on structure/job parameters 54  
  • 55. Charts and Plots in ChemData 55  
  • 56. K-Means Clustering •  >30 numerical molecular descriptors •  Extraction and filtering into clusters 56  
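To make the clustering step on slide 56 concrete, here is a minimal k-means sketch in plain Python. It is not MongoChem's implementation: the deterministic initialization from the first `k` points, the fixed iteration count, and the two-descriptor toy data are simplifications chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster descriptor vectors; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two descriptors per molecule, forming two well-separated groups.
descriptors = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(descriptors, k=2)
```

In practice each vector would hold the 30 or more numerical descriptors mentioned on the slide, usually scaled so that no single descriptor dominates the distance.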
  • 57. ParaViewWeb and MongoChem •  Uses ParaView's client-server architecture •  Interactive 3D rendering •  Runs in any modern web browser •  Same MongoDB server as MongoChem •  Move more to the client JavaScript code •  Moving to a simple, Python-based server – Easy to add new APIs – Easy to deploy/integrate into other solutions 57  
  • 58. ParaViewWeb and Open Chemistry 58  
  • 59. Software Process •  Source code publicly hosted using Git •  Gerrit for online code review •  CTest/CDash for testing/summary – Gerrit can use CDash@Home •  Test proposed changes before merge •  CDash can now provide binaries – Built nightly, available for direct download •  Wiki, mailing list, and bug tracker 59  
  • 61. Outline •  Introduction •  Kitware •  Open Chemistry •  Avogadro 2 •  MoleQueue •  MongoChem •  The Future •  Summary 61  
  • 62. Vision for the Future •  Find partners to develop targeted solutions •  Improved tight integration is needed •  Computational codes •  Clusters/supercomputers •  Data repositories •  Improve and extend client-server architecture •  Co-processing/in-situ visualization/analysis •  Embracing open, semantically rich data •  Address semantic and large data in chemistry 62  
  • 63. Avogadro: Visualization •  GPGPU accelerated rendering/interop •  More VTK for advanced visualization •  Support for 2D and 3D plots of data •  Optimized data structures – Streaming of large data – Real-time ray-tracing •  Reworked interface – Tighter database/workflow integration
  • 64. MoleQueue: Complex Jobs •  Tighter integration with remote queues •  Integration with databases – Retain full log of computational jobs – Trigger actions on completion •  Manage complex jobs – Restarts, dependent jobs, triggers – Meta-scheduling – choose best resource – Classify completed job success/failure/status
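Dependent jobs can be pictured as a directed acyclic graph in which a job runs only after its prerequisites have succeeded. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the job names and the `run_order` helper are hypothetical and are not part of MoleQueue.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
dependencies = {
    "optimize": set(),
    "frequencies": {"optimize"},
    "single_point": {"optimize"},
    "report": {"frequencies", "single_point"},
}


def run_order(deps, failed=()):
    """Return jobs in a valid execution order, skipping anything downstream of a failure."""
    skipped = set(failed)
    order = []
    for job in TopologicalSorter(deps).static_order():
        if job in skipped or deps[job] & skipped:
            skipped.add(job)  # a failed prerequisite blocks this job too
            continue
        order.append(job)
    return order


order = run_order(dependencies)
partial = run_order(dependencies, failed={"frequencies"})
```

Classifying a completed job as failed and feeding that back into `run_order` shows why success/failure classification matters: it decides which downstream jobs can still be submitted.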
  • 65. MongoChem: Chemistry Data •  Substructure searches – Fingerprints support substructure searching •  Tighter integration with applications – Communication to search/retrieve/submit •  Easier addition/annotation of data – Enable full annotation and searching •  Web frontend for wider sharing •  Simple command line tools – batch jobs
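Fingerprints usually support substructure searching as a fast pre-screen: a molecule can contain the query only if every bit set in the query fingerprint is also set in the molecule's fingerprint. The following sketch shows that subset test with tiny hand-assigned fingerprints; the bit assignments and the `screen` function are assumptions for illustration, and real fingerprints are generated by a toolkit such as Open Babel.

```python
# Hypothetical fingerprints as integers; each set bit stands for a structural feature.
HYDROXYL, CARBONYL, AROMATIC_RING, AMINE = 1, 2, 4, 8

library = {
    "ethanol": HYDROXYL,
    "phenol": HYDROXYL | AROMATIC_RING,
    "acetone": CARBONYL,
    "aniline": AMINE | AROMATIC_RING,
}


def screen(query_bits, fingerprints):
    """Keep molecules whose fingerprint contains every bit set in the query."""
    return sorted(name for name, bits in fingerprints.items()
                  if (bits & query_bits) == query_bits)


candidates = screen(HYDROXYL | AROMATIC_RING, library)
```

Molecules that pass the screen are only candidates; an exact subgraph match is still needed to confirm each hit, but the screen keeps that expensive step off most of the database.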
  • 66. Quixote: Parser Technology •  From punch cards and line printers… •  Implement C++ parsers – Using regular expressions – Provide editor/simulator – Easily update parser for new terms •  Dictionaries – Documenting the log files •  Facilitating data storage and exchange
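A dictionary-driven, regular-expression parser of the kind outlined above might look like the sketch below. The slides call for C++ parsers; Python is used here only to keep the example short. The log excerpt is invented, and the term names and the `parse_log` function are assumptions rather than Quixote's actual dictionaries.

```python
import re

# Made-up log excerpt; real program output differs in wording and layout.
LOG = """\
 Total SCF energy =    -76.026765
 Number of basis functions =   25
 Total SCF energy =    -76.026801
"""

# A small "dictionary": term name -> (pattern, converter).
TERMS = {
    "scf_energy": (re.compile(r"Total SCF energy\s*=\s*(-?\d+\.\d+)"), float),
    "basis_functions": (re.compile(r"Number of basis functions\s*=\s*(\d+)"), int),
}


def parse_log(text):
    """Collect every match for each term, in the order they appear."""
    return {name: [convert(m.group(1)) for m in pattern.finditer(text)]
            for name, (pattern, convert) in TERMS.items()}


parsed = parse_log(LOG)
```

Supporting a new term then means adding one entry to `TERMS`, which is the sense in which a parser can be updated easily without touching the parsing loop.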
  • 67. Building Community •  Community around chemistry projects •  Using Kitware software process –  Ensuring quality with continuous testing –  Code contributions on the web –  Public mailing lists, bug trackers, code review •  Promoting projects and participation –  Publication –  Conferences –  Workshops –  Social media (Diagram labels: Software Repository; Build, Test & Package; Community Review; Developers & Users)
  • 68. Rethinking Input File Generation •  Can we create a CML representation? – Could be loaded directly by some codes – Could be translated to input files for others •  Would allow search on input and output •  Could be stored and published •  Make it easier to set up calculations •  Create a more uniform experience •  Input generators currently use a JSON representation
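Translating a structured representation into an input file can be sketched as below. Everything here is illustrative: the JSON keys, the approximate water coordinates, the `render_input` function, and the keyword-style output format are invented for the example and do not correspond to Avogadro 2's input generators or to any particular code's input syntax.

```python
import json

# Hypothetical JSON representation of a calculation; the keys are assumptions.
spec = json.loads("""
{
  "title": "water",
  "theory": "B3LYP",
  "basis": "6-31G(d)",
  "charge": 0,
  "multiplicity": 1,
  "atoms": [["O", 0.000, 0.000, 0.117],
            ["H", 0.000, 0.757, -0.469],
            ["H", 0.000, -0.757, -0.469]]
}
""")


def render_input(calc):
    """Translate the JSON representation into an invented keyword-style input."""
    lines = [f"title {calc['title']}",
             f"method {calc['theory']}/{calc['basis']}",
             f"charge {calc['charge']} multiplicity {calc['multiplicity']}",
             "geometry"]
    lines += [f"  {el:<2} {x:10.5f} {y:10.5f} {z:10.5f}"
              for el, x, y, z in calc["atoms"]]
    lines.append("end")
    return "\n".join(lines) + "\n"


text = render_input(spec)
```

Because the calculation lives in a structured form rather than in the generated text, the same record can be stored, searched, and rendered for a different code by swapping only the rendering function.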
  • 69. Outline •  Introduction •  Kitware •  Open Chemistry •  The Future •  Summary
  • 70. Overview •  Avogadro 2 – molecular editor/analysis •  MoleQueue – manage external code execution •  MongoChem – data management, visualization •  AvogadroLibs – core data structures, algorithms •  VTK – advanced visualization and analysis •  A strong ecosystem for computational chemistry –  Documentation and training materials –  Collaborative research and development –  Work with scientists on real research problems –  Provision of support and consulting services
  • 71. Conclusions •  Real opportunity to make an impact •  Bringing best practices to chemistry •  Improve research, industry, and teaching •  Semantic data at the center of our work –  Storage –  Search –  Interaction with computational codes –  Comparison with experimental data •  Actively seeking collaborators for future work